DataHub: Manage shards

Last Updated: May 10, 2022

You can horizontally expand shards or split and merge shards. Take note of the following points:

  1. If horizontal expansion of shards is enabled, you cannot merge shards. If horizontal expansion is not enabled, you can split and merge shards.

  2. If you use Kafka to consume topics, you must enable horizontal expansion of shards.

  3. After horizontal expansion of shards is enabled, hash key ranges can no longer be used, because all shards have the same hash key range. Data cannot be written in HashKey or PartitionKey mode. Instead, you must perform the hash modulus in the application layer. Note that expanding the topic changes the shard to which a given key is written.
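
The hash modulus mentioned in point 3 can be sketched in Java as follows. This is a minimal illustration, not part of the DataHub SDK: the class name ShardRouter, the method shardIndexFor, and the choice of CRC32 as the hash function are all assumptions. The returned index still has to be mapped to an actual shard ID of the topic.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper (not a DataHub SDK class): picks a shard index in the application layer. */
final class ShardRouter {
    private ShardRouter() {}

    /** Returns an index in [0, shardCount) for the given key. CRC32 is an arbitrary choice of hash. */
    static int shardIndexFor(String key, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is non-negative too.
        return (int) (crc.getValue() % shardCount);
    }

    public static void main(String[] args) {
        String key = "device-42";
        // The same key may map to a different index once the shard count changes.
        System.out.println("4 shards -> index " + shardIndexFor(key, 4));
        System.out.println("6 shards -> index " + shardIndexFor(key, 6));
    }
}
```

Because the index depends on the shard count, records with the same key may be written to different shards before and after an expansion. Consumers that rely on per-key ordering need to account for this.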

Horizontally expand shards

DataHub allows you to horizontally expand shards in a topic. You can enable the shard extension mode when you create a topic.

Step 1

Enable the shard extension mode.

Step 2

Click the Edit icon to change the number of shards.

Step 3

View the shard list after horizontal expansion.

Split and merge shards

DataHub allows you to scale topics out or in by splitting or merging shards.

Scenario

DataHub supports scaling. You can increase the number of shards to handle traffic surges and decrease the number to save resources. For example, if a topic cannot handle a traffic surge during Double 11, you can split its shards into as many as 256 shards, which increases the throughput to 1,280 MB/s. As the traffic decreases after Double 11, you can merge shards to reduce the number of shards as needed.
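
The figures in this scenario imply a throughput of about 5 MB/s per shard (1,280 MB/s divided by 256 shards). The following Java sketch uses that inferred value to estimate how many shards a target throughput requires. The per-shard value is an inference from the example, not a documented quota, and the class ShardCapacity is hypothetical.

```java
/** Hypothetical capacity estimate based on the figures in the scenario above. */
final class ShardCapacity {
    // Inferred from the example: 1,280 MB/s / 256 shards = 5 MB/s per shard (assumption).
    static final int MB_PER_SECOND_PER_SHARD = 5;
    // The scenario mentions splitting to up to 256 shards.
    static final int MAX_SHARDS = 256;

    private ShardCapacity() {}

    /** Returns the smallest shard count whose combined throughput covers the target. */
    static int shardsNeeded(int targetMbPerSecond) {
        if (targetMbPerSecond <= 0) {
            throw new IllegalArgumentException("target throughput must be positive");
        }
        // Integer ceiling division.
        int shards = (targetMbPerSecond + MB_PER_SECOND_PER_SHARD - 1) / MB_PER_SECOND_PER_SHARD;
        if (shards > MAX_SHARDS) {
            throw new IllegalArgumentException("target exceeds the capacity of " + MAX_SHARDS + " shards");
        }
        return shards;
    }

    public static void main(String[] args) {
        System.out.println(shardsNeeded(1280)); // 256
        System.out.println(shardsNeeded(12));   // 3
    }
}
```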

Shard properties

You can call the ListShard operation to query the details of all shards.

{
    "ShardId": "string",
    "State": "string",
    "ClosedTime": uint64,
    "BeginHashKey": "string",
    "EndHashKey": "string",
    "ParentShardIds": ["string", "string"],
    "LeftShardId": "string",
    "RightShardId": "string"
}
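
The BeginHashKey and EndHashKey properties are hexadecimal strings, so they can be compared as 128-bit integers. The following Java sketch checks whether a hash key falls within a shard's range. It assumes that the range is half-open, that is, BeginHashKey is inclusive and EndHashKey is exclusive. That is an assumption, not something the shard properties state, and the class HashKeyRange is hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical model of a shard's hash key range (not a DataHub SDK class). */
final class HashKeyRange {
    private final BigInteger begin;
    private final BigInteger end;

    HashKeyRange(String beginHashKey, String endHashKey) {
        // Hash keys are parsed as base-16 integers.
        this.begin = new BigInteger(beginHashKey, 16);
        this.end = new BigInteger(endHashKey, 16);
    }

    /** Assumes a half-open range: begin <= key < end. */
    boolean contains(String hashKey) {
        BigInteger key = new BigInteger(hashKey, 16);
        return key.compareTo(begin) >= 0 && key.compareTo(end) < 0;
    }

    public static void main(String[] args) {
        // String.repeat requires Java 11 or later.
        HashKeyRange range = new HashKeyRange("0".repeat(32), "A".repeat(32));
        System.out.println(range.contains("5".repeat(32))); // true
        System.out.println(range.contains("A".repeat(32))); // false (end is treated as exclusive)
    }
}
```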

Split a shard

You can split a shard by using a DataHub SDK or in the DataHub console. To split a shard, specify the shard ID and a 128-bit hash key value. After the shard is split into two child shards, the IDs and hash key ranges of the child shards are returned. The status of the parent shard is changed to CLOSED. For example, the following parent shard exists before the split operation:

ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Split the shard by using a DataHub SDK:

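// Assumes that projectName, topicName, and client are already initialized.
String shardId = "0";
// The last argument is the 128-bit hash key at which the shard is split.
SplitShardRequest req = new SplitShardRequest(projectName, topicName, shardId, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
SplitShardResult resp = client.splitShard(req);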

After the split operation, the topic contains the following three shards: the closed parent shard and two active child shards.

ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ShardId:2 Status:ACTIVE BeginHashKey:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
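
If you want to split a shard into two halves of equal size, you can compute the split key as the midpoint of the shard's hash key range. The following Java sketch shows one way to do this. The class SplitKeys and the method midpoint are hypothetical helpers, not part of the DataHub SDK.

```java
import java.math.BigInteger;

/** Hypothetical helper (not a DataHub SDK class) that computes a midpoint split key. */
final class SplitKeys {
    private SplitKeys() {}

    /** Returns the midpoint of the range as a 32-digit uppercase hexadecimal string. */
    static String midpoint(String beginHashKey, String endHashKey) {
        BigInteger begin = new BigInteger(beginHashKey, 16);
        BigInteger end = new BigInteger(endHashKey, 16);
        // The range must be wide enough for the midpoint to lie strictly between its bounds.
        if (end.subtract(begin).compareTo(BigInteger.ONE) <= 0) {
            throw new IllegalArgumentException("hash key range is too narrow to split");
        }
        BigInteger mid = begin.add(end).shiftRight(1);
        // Pad with leading zeros to the 32 hexadecimal digits of a 128-bit key.
        return String.format("%032X", mid);
    }

    public static void main(String[] args) {
        // String.repeat requires Java 11 or later.
        // Prints 7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF for the full 128-bit range.
        System.out.println(midpoint("0".repeat(32), "F".repeat(32)));
    }
}
```

The returned string could then be passed as the hash key argument of the split request.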

Merge shards

You can merge two adjacent shards by using a DataHub SDK or in the DataHub console. After the two shards are merged, the ID and hash key range of the new shard are returned. The status of both parent shards is changed to CLOSED. For example, the following two parent shards exist before the merge operation:

ShardId:0 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:ACTIVE BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Merge the two shards by using a DataHub SDK:

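// Assumes that projectName, topicName, and client are already initialized.
String shardId = "0";
// The second shard must be adjacent to the first shard.
String adjacentShardId = "1";
MergeShardRequest req = new MergeShardRequest(projectName, topicName, shardId, adjacentShardId);
MergeShardResult resp = client.mergeShard(req);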

After the merge operation, the topic contains the following three shards: the two closed parent shards and the new active shard.

ShardId:0 Status:CLOSED BeginHashKey:00000000000000000000000000000000
                    EndHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
ShardId:1 Status:CLOSED BeginHashKey:7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ShardId:2 Status:ACTIVE BeginHashKey:00000000000000000000000000000000
                    EndHashKey:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
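
Before you request a merge, you may want to verify in your application that the two shards are adjacent. The following Java sketch treats two shards as adjacent when the EndHashKey of one equals the BeginHashKey of the other, as in the preceding example. This definition is inferred from the example, and the class MergePlanner is a hypothetical helper, not part of the DataHub SDK.

```java
import java.math.BigInteger;

/** Hypothetical helper (not a DataHub SDK class) that checks adjacency before a merge. */
final class MergePlanner {
    private MergePlanner() {}

    /**
     * Returns {begin, end} of the range that a merged shard would cover.
     * Throws if neither range ends where the other begins.
     */
    static String[] mergedRange(String beginA, String endA, String beginB, String endB) {
        if (sameKey(endA, beginB)) {
            return new String[] {beginA, endB};
        }
        if (sameKey(endB, beginA)) {
            return new String[] {beginB, endA};
        }
        throw new IllegalArgumentException("shards are not adjacent");
    }

    /** Compares hash keys numerically so that letter case and leading zeros do not matter. */
    private static boolean sameKey(String left, String right) {
        return new BigInteger(left, 16).equals(new BigInteger(right, 16));
    }

    public static void main(String[] args) {
        // Shortened keys are used here for readability; real hash keys have 32 hex digits.
        String[] merged = mergedRange("00", "7F", "7F", "FF");
        System.out.println(merged[0] + " .. " + merged[1]); // 00 .. FF
    }
}
```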

Usage notes

After a split or merge operation is performed, the status of each parent shard is changed to CLOSED. New data cannot be written to a closed shard, but the existing data in the closed shard can still be consumed. A closed shard cannot be split or merged with other shards. When the time-to-live (TTL) period of the data in the shard expires, the shard is deleted. If a DataConnector is configured for the shard, the DataConnector is stopped after all data in the shard is archived to the destination service. The DataConnector is automatically deleted after the shard is deleted.

A new shard can be used only after its status is changed to ACTIVE. The status is changed to ACTIVE within 5 seconds.
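
Because a new shard cannot be used until it is active, an application may want to wait before it writes to the shard. The following Java sketch polls a shard state until it becomes ACTIVE or a timeout elapses. The state is read through a generic Supplier, which in a real application would wrap a call that lists the shards of the topic. The class ShardReadiness, the method awaitActive, and the placeholder state name OPENING are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper (not a DataHub SDK class) that waits for a shard to become active. */
final class ShardReadiness {
    private ShardReadiness() {}

    /** Polls the state until it is ACTIVE. Returns false on timeout or interruption. */
    static boolean awaitActive(Supplier<String> stateSupplier, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if ("ACTIVE".equalsIgnoreCase(stateSupplier.get())) {
                return true;
            }
            // Subtraction keeps the comparison correct even if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated state source: reports a placeholder state twice, then ACTIVE.
        AtomicInteger polls = new AtomicInteger();
        Supplier<String> simulated = () -> polls.incrementAndGet() < 3 ? "OPENING" : "ACTIVE";
        // A 5-second timeout matches the activation time stated above.
        System.out.println(awaitActive(simulated, 5_000, 100));
    }
}
```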