Divides data in a table into several logical splits whose sizes are approximately the same as the specified value. The split points between the splits and the information about hosts on which the splits reside are returned. This operation is used by compute engines to determine execution plans such as concurrency plans.

Request syntax

message ComputeSplitPointsBySizeRequest {
    required string table_name = 1;
    required int64 split_size = 2; // in 100MB
}

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| table_name | string | Yes | The name of the table whose data you want to divide. |
| split_size | int64 | Yes | The approximate size of each split. Unit: 100 MB. |
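
As a rough illustration of how a caller might choose split_size, the sketch below converts a target split size in bytes into the request value. It relies on the comment in the request syntax, which gives the unit as 100 MB. The helper name `to_split_size` and the reading of 100 MB as 100 * 1024 * 1024 bytes are assumptions made for this example and are not part of the API.

```python
# Assumption: one split_size unit is 100 MB, read here as 100 * 1024 * 1024 bytes.
ASSUMED_UNIT_BYTES = 100 * 1024 * 1024


def to_split_size(target_bytes: int, unit_bytes: int = ASSUMED_UNIT_BYTES) -> int:
    """Round a target split size in bytes up to a whole number of units.

    Hypothetical helper: the result is the value a caller would place in
    the split_size field of the request.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Integer ceiling division, so any positive target yields at least 1.
    return (target_bytes + unit_bytes - 1) // unit_bytes
```

Rounding up keeps each split at or above the requested size. Because split sizes are only approximate, the exact rounding rule has little practical effect.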

Response syntax

message ComputeSplitPointsBySizeResponse {
    required ConsumedCapacity consumed = 1;
    repeated PrimaryKeySchema schema = 2;

    /**
     * Split points between splits, in increasing order.
     * A split is a consecutive range of primary keys
     * whose data size is about the split_size specified in the request.
     * The size is only approximate.
     * A split point is an array of primary key columns that follows the table schema
     * and is never longer than the table schema.
     * Trailing -inf values are omitted to reduce the transmission payload.
     */
    repeated bytes split_points = 3;

    /**
     * Locations where the splits reside.
     * Because TableStore is a managed service, these locations are no more than hints.
     * If a location cannot be exposed, an empty string is returned in its place.
     */
    message SplitLocation {
        required string location = 1;
        required sint64 repeat = 2;
    }
    repeated SplitLocation locations = 4;
}

| Parameter | Type | Description |
| --- | --- | --- |
| consumed | ConsumedCapacity | The number of capacity units (CUs) that are consumed by this request. |
| schema | repeated PrimaryKeySchema | The schema of the table. The schema is the same as the schema that was defined when the table was created. |
| split_points | repeated bytes | The split points between splits, returned in monotonically increasing order. Each split point is a row of data in the PlainBuffer format and contains only primary key columns. Trailing -inf values in each split point are omitted to reduce the amount of transmitted data. |
| locations | repeated SplitLocation | The information about the hosts on which the splits reside. These values are only hints, and a location can be an empty string. |

For example, suppose that a table contains three primary key columns, the data type of the first primary key column is string, and this operation divides the table into the following five splits: (-inf,-inf,-inf) to ("a",-inf,-inf), ("a",-inf,-inf) to ("b",-inf,-inf), ("b",-inf,-inf) to ("c",-inf,-inf), ("c",-inf,-inf) to ("d",-inf,-inf), and ("d",-inf,-inf) to (+inf,+inf,+inf). The first three splits reside on machine-A and the other two splits reside on machine-B. In this case, the value of split_points is [("a"),("b"),("c"),("d")], with the trailing -inf values omitted, and the value of locations is "machine-A"*3, "machine-B"*2, that is, two SplitLocation entries whose repeat values are 3 and 2.
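
The example above can be sketched in code. The snippet below is a minimal illustration, not SDK code: it assumes that the split points have already been decoded from the PlainBuffer format into tuples of column values, and it represents each SplitLocation as a (location, repeat) pair. The function names and the "-inf" and "+inf" string markers are hypothetical.

```python
NEG_INF = "-inf"  # placeholder marker for the minimum primary key value
POS_INF = "+inf"  # placeholder marker for the maximum primary key value


def pad_split_point(point, key_columns):
    """Restore the trailing -inf values that were omitted from a split point."""
    return tuple(point) + (NEG_INF,) * (key_columns - len(point))


def expand_locations(locations):
    """Turn (location, repeat) pairs into one host entry per split."""
    hosts = []
    for location, repeat in locations:
        hosts.extend([location] * repeat)
    return hosts


def build_splits(split_points, locations, key_columns):
    """Return a list of (lower_bound, upper_bound, host) tuples, one per split."""
    padded = [pad_split_point(p, key_columns) for p in split_points]
    lower_bounds = [(NEG_INF,) * key_columns] + padded
    upper_bounds = padded + [(POS_INF,) * key_columns]
    hosts = expand_locations(locations)
    # N split points delimit N + 1 splits, so the host list must match.
    if len(hosts) != len(lower_bounds):
        raise ValueError("locations do not cover every split")
    return list(zip(lower_bounds, upper_bounds, hosts))


splits = build_splits(
    split_points=[("a",), ("b",), ("c",), ("d",)],
    locations=[("machine-A", 3), ("machine-B", 2)],
    key_columns=3,
)
for lower, upper, host in splits:
    print(lower, "to", upper, "on", host)
```

Each split reuses the upper bound of the previous split as its lower bound, which is why four split points describe five splits.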

Use Tablestore SDKs

Tablestore SDK for Java: Split data by a specified size

CU consumption

The number of read CUs that are consumed is the same as the number of splits. No write CUs are consumed.
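
As a small worked illustration of this rule, the sketch below derives the read CU count from the number of returned split points. It assumes, as in the example earlier in this topic, that N split points delimit N + 1 splits. The function name is hypothetical.

```python
def read_cus_for_response(split_point_count: int) -> int:
    """Estimate the read CUs consumed, given the number of returned split points.

    Assumption: N split points delimit N + 1 splits, and each split
    costs one read CU. No write CUs are consumed.
    """
    if split_point_count < 0:
        raise ValueError("split_point_count must not be negative")
    return split_point_count + 1


# Four split points, as in the machine-A/machine-B example, give five splits.
print(read_cus_for_response(4))
```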