
ComputeSplitPointsBySize

Last Updated: Mar 23, 2018

Action

Logically splits the data in a table into several shards, each close to the specified size, and returns the split points between the shards along with hints about the machines where the shards are located. This API is typically used by compute engines when planning execution, for example to decide how many tasks to run concurrently.

Request structure

  message ComputeSplitPointsBySizeRequest {
      required string table_name = 1;
      required int64 split_size = 2; // in units of 100 MB
  }

table_name:

  • Type: string
  • Required parameter: Yes
  • The name of the table holding the data to be split.

split_size:

  • Type: int64
  • Required parameter: Yes
  • The approximate size of each shard, in units of 100 MB. For example, a value of 2 requests shards of about 200 MB each.
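
As a rough illustration of the unit, the helper below converts a target shard size in bytes into a split_size value. The function name is hypothetical and not part of any SDK. Treating 1 MB as 1024 * 1024 bytes and using a minimum value of 1 are assumptions of this sketch; the description above only states the unit as 100 MB.

```python
# Assumption: 1 MB = 1024 * 1024 bytes. The API text only says "100 MB".
UNIT_BYTES = 100 * 1024 * 1024


def to_split_size(target_shard_bytes: int) -> int:
    """Convert a desired shard size in bytes to split_size (units of 100 MB).

    Rounds up to the next whole unit, so the result is never less than 1.
    Hypothetical helper for illustration only.
    """
    if target_shard_bytes <= 0:
        raise ValueError("target_shard_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (target_shard_bytes + UNIT_BYTES - 1) // UNIT_BYTES
```

Rounding up keeps the requested shard size at or above the target, which seems the safer default when the goal is to limit the number of shards.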

Response message structure

  message ComputeSplitPointsBySizeResponse {
      required ConsumedCapacity consumed = 1;
      repeated PrimaryKeySchema schema = 2;
      /**
       * Split points between splits, in increasing order.
       *
       * A split is a consecutive range of primary keys
       * whose data size is about the split_size specified in the request.
       * The size is only approximate because the actual size is hard to determine exactly.
       *
       * A split point is an array of primary-key columns w.r.t. the table schema,
       * and is never longer than the table schema.
       * Trailing -inf values are omitted to reduce the transmission payload.
       */
      repeated bytes split_points = 3;
      /**
       * Locations where the splits lie.
       *
       * Because TableStore is a managed service, these locations are no more than hints.
       * If a location should not be exposed, an empty string is used in its place.
       */
      message SplitLocation {
          required string location = 1;
          required sint64 repeat = 2;
      }
      repeated SplitLocation locations = 4;
  }

consumed:

  • Type: ConsumedCapacity
  • The service capacity units consumed by this request.

schema:

  • Type: repeated PrimaryKeySchema
  • The table’s primary key schema, identical to the schema specified when the table was created.

split_points:

  • Type: repeated bytes
  • The split points between shards, in monotonically increasing order. Each split point is a PlainBuffer-encoded row that contains only primary key columns. Trailing -inf values in each split point are omitted to reduce the amount of data transmitted.

locations:

  • Type: repeated SplitLocation
  • Hints about the machines where the shards are located. This field may be empty.

    For example, suppose a table has three primary key columns, the first of which is of string type. Calling this API yields five shards:

    • (-inf,-inf,-inf) to ("a",-inf,-inf)
    • ("a",-inf,-inf) to ("b",-inf,-inf)
    • ("b",-inf,-inf) to ("c",-inf,-inf)
    • ("c",-inf,-inf) to ("d",-inf,-inf)
    • ("d",-inf,-inf) to (+inf,+inf,+inf)

      The first three shards are located on “machine-A” and the last two on “machine-B”. In this case, split_points is [("a"),("b"),("c"),("d")], because the trailing -inf values are omitted, and locations is "machine-A"*3, "machine-B"*2, where each location is paired with a repeat count.
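
The example above can be sketched in code: pad each split point back to full length with -inf, add the implicit lower and upper bounds, and expand the repeat counts in locations into one entry per shard. This is a minimal sketch that assumes the split points have already been decoded from their encoded form into tuples of primary key values. The function names and the "-inf"/"+inf" string placeholders are hypothetical and do not come from any SDK.

```python
NEG_INF = "-inf"  # placeholder for the minimum primary key value (assumption)
POS_INF = "+inf"  # placeholder for the maximum primary key value (assumption)


def pad(point, pk_count):
    """Restore the trailing -inf columns that were omitted from a split point."""
    return tuple(point) + (NEG_INF,) * (pk_count - len(point))


def shard_ranges(split_points, locations, pk_count):
    """Return a (start, end, location) tuple for each shard.

    split_points: already-decoded split points, in increasing order.
    locations: (location, repeat) pairs, mirroring SplitLocation.
    pk_count: number of primary key columns in the table schema.
    """
    lower = (NEG_INF,) * pk_count
    upper = (POS_INF,) * pk_count
    bounds = [lower] + [pad(p, pk_count) for p in split_points] + [upper]
    shard_count = len(bounds) - 1

    # Expand the repeat counts into one location entry per shard.
    hosts = [loc for loc, repeat in locations for _ in range(repeat)]
    # Locations are only hints and may be missing; fall back to "".
    hosts += [""] * (shard_count - len(hosts))

    return [(bounds[i], bounds[i + 1], hosts[i]) for i in range(shard_count)]


# Values taken from the example above.
ranges = shard_ranges(
    [("a",), ("b",), ("c",), ("d",)],
    [("machine-A", 3), ("machine-B", 2)],
    pk_count=3,
)
```

Because the locations are hints only, a scheduler built on this would presumably treat them as a placement preference rather than a guarantee.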

Capacity unit consumption

The number of read capacity units consumed equals the number of shards returned. No write capacity units are consumed.
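
Going by the five-shard example in the locations description, where four split points delimit five shards, the read cost of a call can be derived from the number of split points in the response. A minimal sketch with a hypothetical function name:

```python
def consumed_read_units(split_point_count: int) -> int:
    """Read capacity units for one call: one per shard.

    N split points delimit N + 1 shards, e.g. 4 split points give 5 shards.
    Hypothetical helper for illustration only.
    """
    if split_point_count < 0:
        raise ValueError("split_point_count must not be negative")
    return split_point_count + 1
```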
