Compute splits by size - Tablestore SDK for Java - Tablestore

To scan an entire table in parallel from a compute engine, divide the data into sub-ranges that can be processed concurrently. Tablestore SDK for Java generates primary key range splits of a specified size. Pass these splits directly to range read APIs to retrieve data concurrently.

Prerequisites

Tablestore SDK for Java is installed.
A Tablestore client is initialized.

Description

public ComputeSplitsBySizeResponse computeSplitsBySize(ComputeSplitsBySizeRequest request) throws TableStoreException, ClientException

On the server side, Tablestore logically divides a table into splits of a specified size. Each split is returned with its primary key range (lowerBound / upperBound) and a hint for the machine that hosts the split (location). Pass these primary key ranges directly to RangeRowQueryCriteria, then read the splits in parallel, either by range (getRange) or with an iterator (createRangeIterator). Compute engines typically use this pattern to plan execution parallelism.

The following example divides the split_demo table into splits of approximately 200 MB and prints the location and primary key range of each split.

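// Imports needed by this snippet. The snippet assumes an initialized Tablestore client named "client".
import com.alicloud.openservices.tablestore.model.ComputeSplitsBySizeRequest;
import com.alicloud.openservices.tablestore.model.ComputeSplitsBySizeResponse;
import com.alicloud.openservices.tablestore.model.Split;
import java.util.Iterator;
import java.util.List;

String tableName = "split_demo";

// Divide all data in the table into splits of approximately 200 MB (2 * 100 MB).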
ComputeSplitsBySizeRequest request =
        new ComputeSplitsBySizeRequest(tableName, 2);

ComputeSplitsBySizeResponse response = client.computeSplitsBySize(request);

System.out.println("RequestId: " + response.getRequestId());
System.out.println("PrimaryKeySchema: " + response.getPrimaryKeySchema());
System.out.println("ConsumedCapacity: " + response.getConsumedCapacity().jsonize());

List<Split> splits = response.getSplits();
System.out.println("Splits size: " + splits.size());

Iterator<Split> iterator = splits.iterator();
while (iterator.hasNext()) {
    Split split = iterator.next();
    // The primary key ranges returned by getLowerBound() and getUpperBound() can be passed directly
    // to RangeRowQueryCriteria for parallel reads with getRange or createRangeIterator.
    System.out.println("Location: " + split.getLocation());
    System.out.println("LowerBound: " + split.getLowerBound().jsonize());
    System.out.println("UpperBound: " + split.getUpperBound().jsonize());
}
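
Once the splits are available, a compute engine typically hands each primary key range to a separate worker. The following is a minimal sketch of that fan-out pattern using only the Java standard library, not the SDK itself. The class ParallelSplitScan, the method scanAll, and the readRange callback are hypothetical names introduced for illustration. In a real job, the callback would build a RangeRowQueryCriteria from a split's getLowerBound() and getUpperBound() values and read it with getRange or createRangeIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs one read task per split range and sums the row counts. */
final class ParallelSplitScan {

    /**
     * @param ranges    one element per split (stand-in for a lowerBound/upperBound pair)
     * @param readRange hypothetical callback that reads one range and returns its row count
     * @param threads   size of the worker pool
     */
    static <R> long scanAll(List<R> ranges, Function<R, Long> readRange, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per range so that ranges are read concurrently.
            List<Future<Long>> futures = new ArrayList<>();
            for (R range : ranges) {
                Callable<Long> task = () -> readRange.apply(range);
                futures.add(pool.submit(task));
            }
            // Wait for every task and accumulate the per-range row counts.
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while reading splits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A split read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in ranges: [start, end) over integer keys instead of real primary keys.
        List<long[]> ranges = Arrays.asList(
                new long[]{0, 100}, new long[]{100, 250}, new long[]{250, 300});
        // The fake reader reports the width of each range as its row count.
        long rows = scanAll(ranges, r -> r[1] - r[0], 3);
        System.out.println("Rows read: " + rows);
    }
}
```

Because each task only needs its own range, the pool size can be chosen independently of the number of splits; extra splits simply queue until a worker is free.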

Note

splitSize is measured in units of 100 MB. Each split is approximately N × 100 MB, where N is the value you pass.
Splits are logical, and their sizes are approximate. Exact sizes cannot be guaranteed.
The location field hints at the machine that hosts the split. The field may be empty in some cases.
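
Because the location hint may be empty, code that uses it for scheduling needs a fallback. The sketch below shows one possible approach, assuming the hints have already been collected as strings (for example from split.getLocation()). The class LocationGrouping and the "<unknown>" bucket are hypothetical and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups split indexes by their location hint. */
final class LocationGrouping {

    /** Bucket used for splits whose location hint is null or empty. */
    static final String UNKNOWN = "<unknown>";

    /**
     * @param locations location hint of each split, in split order; entries may be null or empty
     * @return map from location hint to the indexes of the splits that carry it
     */
    static Map<String, List<Integer>> groupByLocation(List<String> locations) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < locations.size(); i++) {
            String location = locations.get(i);
            // Fall back to a shared bucket when the hint is missing.
            String key = (location == null || location.isEmpty()) ? UNKNOWN : location;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return groups;
    }
}
```

Splits in the fallback bucket can be assigned to any worker, since no locality information is available for them.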

Parameters

Name	Type	Description
tableName (required)	String	The name of the table.
splitSize (required)	long	The approximate size of each split, in units of 100 MB. For example, passing 2 divides the data into splits of approximately 200 MB each.
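
The unit arithmetic for splitSize can be sketched as follows. The class SplitSizing and its methods are hypothetical helpers, not part of the SDK. The split count is only a rough estimate derived from the approximate split size, so the number of splits the server actually returns may differ.

```java
/** Hypothetical helper for reasoning about splitSize values. */
final class SplitSizing {

    /** splitSize is expressed in units of 100 MB. */
    static final long UNIT_MB = 100;

    /** Approximate size of one split, in MB, for a given splitSize value. */
    static long approxSplitMb(long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        return Math.multiplyExact(splitSize, UNIT_MB);
    }

    /** Rough estimate of the split count: ceil(tableSizeMb / approximate split size). */
    static long estimateSplitCount(long tableSizeMb, long splitSize) {
        long splitMb = approxSplitMb(splitSize);
        return (tableSizeMb + splitMb - 1) / splitMb;
    }

    public static void main(String[] args) {
        // splitSize = 2 corresponds to splits of roughly 200 MB.
        System.out.println("Approximate split size (MB): " + approxSplitMb(2));
        // A table of about 1000 MB would yield roughly 5 such splits.
        System.out.println("Estimated split count: " + estimateSplitCount(1000, 2));
    }
}
```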