Last Updated: Aug 04, 2021

Terms

| Term | Description |
| --- | --- |
| project | An organizational unit in DataHub. Each project contains one or more topics. DataHub projects are independent of MaxCompute projects: you cannot use a MaxCompute project as a DataHub project. |
| topic | The minimum unit for data subscription and publishing in DataHub. You can use topics to distinguish different types of streaming data. For more information about the limits on the number of projects and topics, see Limits. |
| time-to-live (TTL) period of a topic | The period for which each record is retained in a topic. Unit: days. Valid values: 1 to 7. |
| shard | A channel for writing data to a topic concurrently. Each shard has a unique ID and can be in different states. For more information about the states, see the table in the "Shard states" section of this document. Each active shard consumes server resources, so we recommend that you create only as many shards as you need. |
| shard hash key range | The range of hash key values for a shard, in the [Starting hash key, Ending hash key) format: the starting hash key is included and the ending hash key is excluded. The hashing mechanism ensures that all records with the same shard key are written to the same shard. For more information, see DataHub SDK for Java. |
| shard merge | The operation that merges two adjacent shards. Two shards are adjacent if their hash key ranges form a contiguous range with no gap. For more information, see Manage shards. |
| shard split | The operation that splits one shard into two adjacent shards. |
| record | A unit of data that is written to DataHub. |
| record type | The data type of records in a topic. TUPLE and BLOB are supported. A record in a TUPLE topic is a sequence of immutable objects. A record in a BLOB topic is a chunk of binary data stored as a single entity. |
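The shard hash key range, shard merge, and shard split terms above can be illustrated with a minimal Python sketch. It is not DataHub code: the `Shard` class, the helper names, the 128-bit key space, and the use of a truncated SHA-256 digest as the hash function are all assumptions made for illustration, because this page does not state how DataHub computes hash keys.

```python
import hashlib
from dataclasses import dataclass

# Assumption: a 128-bit key space. The real width is not stated on this page.
KEY_SPACE = 2 ** 128


@dataclass(frozen=True)
class Shard:
    shard_id: str
    begin: int  # starting hash key, inclusive
    end: int    # ending hash key, exclusive

    def contains(self, key: int) -> bool:
        return self.begin <= key < self.end


def hash_key(shard_key: str) -> int:
    """Map a shard key into the key space (illustrative hash, not DataHub's)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


def route(shards, shard_key):
    """Return the shard whose half-open range contains the hashed key."""
    key = hash_key(shard_key)
    return next(s for s in shards if s.contains(key))


def are_adjacent(a, b):
    """Two ranges are adjacent if one ends exactly where the other begins."""
    return a.end == b.begin or b.end == a.begin


def merge(a, b, new_id):
    if not are_adjacent(a, b):
        raise ValueError("only adjacent shards can be merged")
    return Shard(new_id, min(a.begin, b.begin), max(a.end, b.end))


def split(shard, split_key, left_id, right_id):
    if not shard.begin < split_key < shard.end:
        raise ValueError("split key must fall strictly inside the shard range")
    return (Shard(left_id, shard.begin, split_key),
            Shard(right_id, split_key, shard.end))
```

Because each range is half-open, a key equal to the boundary between two adjacent shards belongs to exactly one of them, which is why a split always yields two adjacent shards and a merge of adjacent shards leaves no gap.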

Data types

  • The following table describes the data types that are supported in a topic of the TUPLE type.

| Type | Description | Valid values |
| --- | --- | --- |
| BIGINT | An 8-byte signed integer. | -9223372036854775807 to 9223372036854775807 |
| DOUBLE | A double-precision floating-point number. It is eight bytes in length. | -1.0 × 10^308 to 1.0 × 10^308 |
| BOOLEAN | The Boolean data type. | True and False, true and false, or 0 and 1 |
| TIMESTAMP | The timestamp data type. | The value is accurate to microseconds. |
| STRING | A string. Only UTF-8 encoding is supported. | The size of a string must not exceed 2 MB. |
| TINYINT | A single-byte integer. | -128 to 127 |
| SMALLINT | A two-byte integer. | -32768 to 32767 |
| INTEGER | A four-byte integer. | -2147483648 to 2147483647 |
| FLOAT | A single-precision floating-point number. It is four bytes in length. | -3.40292347 × 10^38 to 3.40292347 × 10^38 |
| DECIMAL | A decimal numeral. | -10^38 + 1 to 10^38 - 1 |

DataHub SDK for Java V2.16.1-public and later supports the TINYINT, SMALLINT, INTEGER, and FLOAT data types.

  • In a topic of the BLOB type, a chunk of binary data is stored as a record. Records written to DataHub are Base64 encoded.
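The documented value ranges can be checked on the client before a record is written. The following Python sketch covers the integer types, STRING, and BOOLEAN from the table, and shows Base64 encoding of a BLOB payload. The function names are hypothetical, and the sketch assumes that 2 MB means 2 × 1024 × 1024 bytes of UTF-8 data. This page does not say whether an SDK performs the Base64 encoding for you, so treat `encode_blob` as an illustration of the encoding only.

```python
import base64

# Inclusive ranges copied from the table above.
INTEGER_RANGES = {
    "TINYINT": (-128, 127),
    "SMALLINT": (-32768, 32767),
    "INTEGER": (-2147483648, 2147483647),
    "BIGINT": (-9223372036854775807, 9223372036854775807),
}

# Assumption: "2 MB" is measured in bytes of the UTF-8 encoding.
MAX_STRING_BYTES = 2 * 1024 * 1024


def check_field(type_name, value):
    """Return True if value fits the documented range for type_name."""
    if type_name in INTEGER_RANGES:
        low, high = INTEGER_RANGES[type_name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        return low <= value <= high
    if type_name == "STRING":
        return (isinstance(value, str)
                and len(value.encode("utf-8")) <= MAX_STRING_BYTES)
    if type_name == "BOOLEAN":
        if isinstance(value, str):
            return value in {"True", "False", "true", "false"}
        return isinstance(value, int) and value in (0, 1)
    raise ValueError(f"type not covered by this sketch: {type_name}")


def encode_blob(payload: bytes) -> str:
    """Base64-encode a binary payload into an ASCII string."""
    return base64.b64encode(payload).decode("ascii")
```

For example, `check_field("TINYINT", 128)` returns `False` because 128 is outside the TINYINT range, and `encode_blob(b"hello")` returns `"aGVsbG8="`.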

Shard states

| State | Description |
| --- | --- |
| Opening | When a topic is created, all of its shards are in the Opening state while they are being activated. You cannot perform read or write operations on a shard in this state. |
| Active | Read and write operations are allowed when a shard is in the Active state. |
| Closing | While a shard is being split or two shards are being merged, the affected shards are in the Closing state. You cannot perform read or write operations on shards in this state. |
| Closed | A shard enters the Closed state when the split or merge operation is complete. A shard in the Closed state is read-only. |
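The read and write rules for the four shard states can be written as a small lookup table. This Python sketch is illustrative only: the `ShardState` enum, the `ALLOWED` mapping, and the `can` helper are hypothetical names, not part of any DataHub SDK.

```python
from enum import Enum


class ShardState(Enum):
    OPENING = "Opening"
    ACTIVE = "Active"
    CLOSING = "Closing"
    CLOSED = "Closed"


# Operations permitted in each state, as described in the table above.
ALLOWED = {
    ShardState.OPENING: frozenset(),
    ShardState.ACTIVE: frozenset({"read", "write"}),
    ShardState.CLOSING: frozenset(),
    ShardState.CLOSED: frozenset({"read"}),
}


def can(state: ShardState, operation: str) -> bool:
    """Return True if the operation ("read" or "write") is allowed."""
    return operation in ALLOWED[state]
```

A client could call `can(state, "write")` before it sends data, so that it skips shards that are not in the Active state.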

Error codes

| Error code | HTTP status code | Description |
| --- | --- | --- |
| InvalidUriSpec | 400 | The specified URI is invalid. |
| InvalidParameter | 400 | The specified parameter is invalid. Check the returned error message for details. |
| Unauthorized | 401 | A signature error occurred. |
| NoPermission | 403 | The account does not have the permissions to perform the operation. |
| InvalidSchema | 400 | The schema format is invalid. |
| InvalidCursor | 400 | The cursor is invalid or has expired. |
| NoSuchProject | 404 | The specified project does not exist. |
| NoSuchTopic | 404 | The specified topic does not exist. |
| NoSuchShard | 404 | The specified shard ID does not exist. |
| ProjectAlreadyExist | 400 | The project name already exists. |
| TopicAlreadyExist | 400 | The topic name already exists. |
| InvalidShardOperation | 405 | The operation on the shard is not allowed. For example, you cannot write data to a shard in the Closed state. |
| LimitExceeded | 400 | A limit is exceeded. For example, you create more than 512 shards in a topic. |
| InternalServerError | 500 | An unknown or internal error occurred, or the system is being updated. |
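A client may find it convenient to keep the error codes and their HTTP status codes in one mapping. The Python sketch below copies the values from the table above. The `is_retryable` rule is an assumption of this example, not documented DataHub behavior: it treats only server-side errors (HTTP 5xx) as worth retrying without changing the request.

```python
# Error code -> HTTP status code, copied from the table above.
ERROR_STATUS = {
    "InvalidUriSpec": 400,
    "InvalidParameter": 400,
    "Unauthorized": 401,
    "NoPermission": 403,
    "InvalidSchema": 400,
    "InvalidCursor": 400,
    "NoSuchProject": 404,
    "NoSuchTopic": 404,
    "NoSuchShard": 404,
    "ProjectAlreadyExist": 400,
    "TopicAlreadyExist": 400,
    "InvalidShardOperation": 405,
    "LimitExceeded": 400,
    "InternalServerError": 500,
}


def is_retryable(error_code: str) -> bool:
    """Assumption: retry only 5xx errors; unknown codes are not retried."""
    return ERROR_STATUS.get(error_code, 0) >= 500


def codes_for_status(status: int) -> list:
    """Return all error codes that map to the given HTTP status, sorted."""
    return sorted(code for code, s in ERROR_STATUS.items() if s == status)
```

With this rule, `is_retryable("InternalServerError")` is `True`, while client-side errors such as `NoSuchTopic` or `InvalidCursor` return `False` and need a corrected request instead.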