Splits the data in a table into logical partitions of approximately the specified size, and returns the split points between the partitions along with information about the hosts on which the partitions reside. Compute engines typically use this operation to plan execution, for example to decide the degree of concurrency.

Request syntax

message ComputeSplitPointsBySizeRequest {
    required string table_name = 1;
    required int64 split_size = 2; // approximate partition size, in units of 100 MB
}
            
Parameter | Type | Required | Description
table_name | string | Yes | The name of the table whose data you want to split.
split_size | int64 | Yes | The approximate size of each partition. Unit: 100 MB.
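
Because split_size is expressed in units of 100 MB according to the comment in the request message, a caller that thinks in bytes has to convert before sending the request. The following is a minimal sketch of that conversion. The helper name to_split_size is hypothetical, and the sketch assumes that one unit equals 100 × 1024 × 1024 bytes and that rounding up is acceptable; neither assumption is confirmed by this reference.

```python
# Assumption: one split_size unit is 100 MiB. The reference only says "100 MB".
UNIT_BYTES = 100 * 1024 * 1024


def to_split_size(target_bytes: int) -> int:
    """Convert a desired partition size in bytes to a split_size value.

    Hypothetical helper: rounds up to a whole unit and never returns less than 1.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Ceiling division using integers only, to avoid float rounding.
    units = -(-target_bytes // UNIT_BYTES)
    return max(1, units)


# Example: ask for partitions of roughly 1 GiB each.
split_size = to_split_size(1024 ** 3)
```

Under these assumptions, a 1 GiB target rounds up to the next whole unit, so the requested partitions are slightly larger than the target rather than smaller.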

Response syntax

message ComputeSplitPointsBySizeResponse {
    required ConsumedCapacity consumed = 1;
    repeated PrimaryKeySchema schema = 2;

    /**
     * Split points between splits, in increasing order.
     *
     * A split is a consecutive range of primary keys
     * whose data size is approximately the split_size specified in the request.
     * The size is not guaranteed to be exact.
     *
     * A split point is an array of primary-key columns that follows the table schema
     * and is never longer than the table schema.
     * Trailing -inf values are omitted to reduce the transmission payload.
     */
    repeated bytes split_points = 3;

    /**
     * Locations where the splits reside.
     *
     * Because TableStore is a managed service, these locations are only hints.
     * If a location cannot be exposed, an empty string is returned in its place.
     */
    message SplitLocation {
        required string location = 1;
        required sint64 repeat = 2;
    }
    repeated SplitLocation locations = 4;
}
            
Parameter | Type | Description
consumed | ConsumedCapacity | The number of capacity units (CUs) consumed by this request.
schema | repeated PrimaryKeySchema | The primary key schema of the table. The schema is the same as the one defined when the table was created.
split_points | repeated bytes | The split points between partitions, returned in increasing order. Each split point is a row encoded in the PlainBuffer format that contains only primary key columns. Trailing -inf values in a split point are omitted to reduce the amount of transmitted data.
locations | repeated SplitLocation | Hints about the hosts on which the partitions reside. A location can be an empty string.

For example, suppose a table has three primary key columns, the first of which is of the string type, and this operation returns the following five partitions: (-inf,-inf,-inf) to ("a",-inf,-inf), ("a",-inf,-inf) to ("b",-inf,-inf), ("b",-inf,-inf) to ("c",-inf,-inf), ("c",-inf,-inf) to ("d",-inf,-inf), and ("d",-inf,-inf) to (+inf,+inf,+inf). The first three partitions reside on machine-A and the other two reside on machine-B. In this case, the value of split_points is [("a"),("b"),("c"),("d")], and the value of locations is "machine-A"*3, "machine-B"*2, where *N means that the location applies to N consecutive partitions.
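
The relationship between split_points, locations, and the resulting partitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SDK code: the function names are hypothetical, the split points are assumed to be already decoded from PlainBuffer into tuples of primary key values, the strings "-inf" and "+inf" stand in for the infinite bounds, and each location is represented as a (location, repeat) pair.

```python
from itertools import chain, repeat

# Assumption: string placeholders for the infinite primary key bounds.
NEG_INF = "-inf"
POS_INF = "+inf"


def pad_split_point(point, key_columns):
    """Restore the trailing -inf values that were omitted from a split point."""
    return tuple(point) + (NEG_INF,) * (key_columns - len(point))


def expand_partitions(split_points, locations, key_columns):
    """Return a list of (lower_bound, upper_bound, host) tuples.

    split_points: decoded split points, in increasing order.
    locations:    (location, repeat) pairs; may be empty, because hosts are hints.
    key_columns:  number of primary key columns in the table schema.
    """
    first = (NEG_INF,) * key_columns
    last = (POS_INF,) * key_columns
    bounds = [first] + [pad_split_point(p, key_columns) for p in split_points] + [last]
    ranges = list(zip(bounds[:-1], bounds[1:]))

    # Unroll the run-length encoded locations into one host per partition.
    hosts = list(chain.from_iterable(repeat(loc, n) for loc, n in locations))
    # Pad with empty strings if fewer hosts than partitions were reported.
    hosts += [""] * (len(ranges) - len(hosts))

    return [(lo, hi, host) for (lo, hi), host in zip(ranges, hosts)]


partitions = expand_partitions(
    split_points=[("a",), ("b",), ("c",), ("d",)],
    locations=[("machine-A", 3), ("machine-B", 2)],
    key_columns=3,
)
```

With the inputs from the example, the four split points yield five partitions, the first three paired with machine-A and the last two with machine-B.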

CU consumption

The number of read CUs consumed equals the number of partitions returned. In the preceding example, five partitions are returned, so five read CUs are consumed. No write CUs are consumed.