ComputeSplitPointsBySize to divide table data by size - Tablestore

Divides data in a table into several logical splits whose sizes are approximately the specified value. The split points between the splits and the information about hosts on which the splits reside are returned. This operation is used by compute engines to determine execution plans such as concurrency plans.

Request syntax

message ComputeSplitPointsBySizeRequest {
    required string table_name = 1;
    required int64 split_size = 2; // in 100MB
    optional int64 split_size_unit_in_byte = 3;
    optional int32 split_point_limit = 4;
}

Parameter	Type	Required	Description
table_name	string	Yes	The name of the table whose data you want to divide.
split_size	int64	Yes	The approximate size of each split. Unit: 100 MB.
split_size_unit_in_byte	int64	No	The size unit, in bytes, that is used for splitting. Specify this parameter to make the split point calculation more accurate.
split_point_limit	int32	No	The maximum number of split points to return.
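
As an illustration of how the request parameters fit together, the following sketch converts a target split size in bytes into a split_size value. It is not an SDK API: the build_request helper and the dictionary layout are hypothetical, and it assumes that one split_size unit is 100 MB, interpreted here as 100 * 1024 * 1024 bytes.

```python
import math

# Assumption: one split_size unit is 100 MB, taken here as 100 * 1024 * 1024 bytes.
ASSUMED_UNIT_BYTES = 100 * 1024 * 1024


def build_request(table_name, target_split_bytes, split_point_limit=None):
    """Hypothetical helper: return the request fields as a plain dict."""
    if target_split_bytes <= 0:
        raise ValueError("target_split_bytes must be positive")
    # Round up to a whole number of units, never below one unit.
    split_size = max(1, math.ceil(target_split_bytes / ASSUMED_UNIT_BYTES))
    request = {"table_name": table_name, "split_size": split_size}
    # Optional fields are included only when the caller sets them.
    if split_point_limit is not None:
        request["split_point_limit"] = split_point_limit
    return request
```

For example, under these assumptions a target of 1 GiB per split rounds up to a split_size of 11.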

Response syntax

message ComputeSplitPointsBySizeResponse {
    required ConsumedCapacity consumed = 1;
    repeated PrimaryKeySchema schema = 2;

    /**
     * Split points between splits, in increasing order.
     *
     * A split is a consecutive range of primary keys,
     * whose data size is about the split_size specified in the request.
     * The size is approximate and might not be precise.
     *
     * A split point is an array of primary-key columns w.r.t. the table schema,
     * which is never longer than that of the table schema.
     * Trailing -inf values are omitted to reduce transmission payloads.
     */
    repeated bytes split_points = 3;

    /**
     * Locations where the splits reside.
     *
     * Because Tablestore is a managed service, these locations are no more than hints.
     * If a location is not suitable to be exposed, an empty string is returned in its place.
     */
    repeated SplitLocation locations = 4;
}

Parameter	Type	Description
consumed	ConsumedCapacity	The number of capacity units (CUs) that are consumed by this request.
schema	PrimaryKeySchema	The schema of the table. The schema is the same as the schema that was defined when the table was created.
split_points	repeated bytes	The split points between splits, in monotonically increasing order. Each split point is a row in the PlainBuffer format that contains only primary key columns. Trailing -inf values of each split point are omitted to reduce the amount of transmitted data.
locations	repeated SplitLocation	The information about the hosts on which the splits reside. These locations are only hints, and a location can be an empty string.

For example, assume that a table contains three primary key columns and the data type of the first primary key column is string. A call to this operation might return the following five splits: (-inf,-inf,-inf) to ("a",-inf,-inf), ("a",-inf,-inf) to ("b",-inf,-inf), ("b",-inf,-inf) to ("c",-inf,-inf), ("c",-inf,-inf) to ("d",-inf,-inf), and ("d",-inf,-inf) to (+inf,+inf,+inf). The first three splits reside on machine-A and the other two splits reside on machine-B. In this case, the value of split_points is [("a"),("b"),("c"),("d")], and the value of locations is "machine-A"*3, "machine-B"*2.
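
The example above can be sketched in code to show how a client might rebuild the split ranges from the response. This is an illustration under stated assumptions, not the SDK implementation: split points are shown as already-decoded tuples rather than PlainBuffer bytes, the strings "-inf" and "+inf" stand in for the infinity markers, locations are modeled as hypothetical (host, repeat) pairs that mirror the "machine-A"*3 notation, and all function names are made up for this sketch.

```python
NEG_INF = "-inf"  # placeholder for the minimum primary key value
POS_INF = "+inf"  # placeholder for the maximum primary key value


def pad_split_point(point, pk_column_count):
    """Restore the omitted trailing -inf values of a split point."""
    return tuple(point) + (NEG_INF,) * (pk_column_count - len(point))


def expand_locations(locations):
    """Expand (host, repeat) pairs into one host per split."""
    hosts = []
    for host, repeat in locations:
        hosts.extend([host] * repeat)
    return hosts


def build_splits(split_points, locations, pk_column_count):
    """Return a (lower_bound, upper_bound, host) tuple for each split."""
    lowest = (NEG_INF,) * pk_column_count
    highest = (POS_INF,) * pk_column_count
    padded = [pad_split_point(p, pk_column_count) for p in split_points]
    # N split points delimit N + 1 consecutive ranges.
    bounds = [lowest] + padded + [highest]
    hosts = expand_locations(locations)
    if len(hosts) != len(bounds) - 1:
        raise ValueError("expected one location per split")
    return [(bounds[i], bounds[i + 1], hosts[i]) for i in range(len(hosts))]


splits = build_splits(
    [("a",), ("b",), ("c",), ("d",)],
    [("machine-A", 3), ("machine-B", 2)],
    pk_column_count=3,
)
for lower, upper, host in splits:
    print(lower, "to", upper, "on", host)
```

A compute engine could then schedule one task per tuple, preferring the hinted host when it is not an empty string.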

Use Tablestore SDKs

Tablestore SDK for Java: Split data into shards of a specific size

CU consumption

The number of read CUs that are consumed is the same as the number of splits. No write CUs are consumed.
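
Because N split points delimit N + 1 splits (four split points and five splits in the earlier example), the read CU cost of a call can be estimated from the response. The function below is a hypothetical helper written for this page, not part of any SDK.

```python
def read_cus_consumed(split_point_count):
    """Estimate read CUs for one call: one per split, where splits = split points + 1."""
    if split_point_count < 0:
        raise ValueError("split_point_count must not be negative")
    return split_point_count + 1
```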