To scan an entire table in parallel from a compute engine, divide the data into sub-ranges that can be processed concurrently. Tablestore SDK for Java generates primary key range splits of a specified size. Pass these splits directly to range read APIs to retrieve data concurrently.
Prerequisites
Tablestore SDK for Java installed.
A Tablestore client initialized.
Description
public ComputeSplitsBySizeResponse computeSplitsBySize(ComputeSplitsBySizeRequest request) throws TableStoreException, ClientException
On the server side, Tablestore logically divides a table into splits of a specified size. Each split is returned with its primary key range (lowerBound / upperBound) and a hint for the machine that hosts the split (location). Pass these primary key ranges directly to RangeRowQueryCriteria, then read the splits in parallel with Read data by range or Read data with an iterator. Compute engines typically use this pattern to plan execution parallelism.
The following example divides the split_demo table into splits of approximately 200 MB and prints the location and primary key range of each split.
String tableName = "split_demo";

// Divide the full data of the table into splits of approximately 200 MB (2 * 100 MB).
ComputeSplitsBySizeRequest request =
        new ComputeSplitsBySizeRequest(tableName, 2);
ComputeSplitsBySizeResponse response = client.computeSplitsBySize(request);

System.out.println("RequestId: " + response.getRequestId());
System.out.println("PrimaryKeySchema: " + response.getPrimaryKeySchema());
System.out.println("ConsumedCapacity: " + response.getConsumedCapacity().jsonize());

List<Split> splits = response.getSplits();
System.out.println("Splits size: " + splits.size());

Iterator<Split> iterator = splits.iterator();
while (iterator.hasNext()) {
    Split split = iterator.next();
    // The primary key ranges returned by getLowerBound() and getUpperBound() can be passed directly
    // to RangeRowQueryCriteria for parallel reads with getRange or createRangeIterator.
    System.out.println("Location: " + split.getLocation());
    System.out.println("LowerBound: " + split.getLowerBound().jsonize());
    System.out.println("UpperBound: " + split.getUpperBound().jsonize());
}
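The fan-out step that a compute engine performs once it has the splits can be sketched with the standard library alone. This is a minimal illustration, not SDK code: `ParallelSplitScan`, `scanAll`, and the `readSplit` callback are hypothetical names, and the callback stands in for whatever range read you run against one split's primary key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToLongFunction;

/** Hypothetical helper: reads every split on a bounded thread pool. */
final class ParallelSplitScan {
    private ParallelSplitScan() {}

    /**
     * Calls readSplit once per split and returns the sum of the per-split results
     * (for example, row counts). S is whatever object describes one split.
     * Assumption: readSplit is thread-safe and reads only its own range.
     */
    static <S> long scanAll(List<S> splits, int parallelism, ToLongFunction<S> readSplit) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit one task per split; the pool caps how many run at once.
            List<CompletableFuture<Long>> pending = new ArrayList<>();
            for (S split : splits) {
                pending.add(CompletableFuture.supplyAsync(() -> readSplit.applyAsLong(split), pool));
            }
            // Wait for every task; join() rethrows a failed read as an unchecked exception.
            long total = 0;
            for (CompletableFuture<Long> task : pending) {
                total += task.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real job, `S` would be the SDK's `Split` type, and the callback would build a `RangeRowQueryCriteria` from that split's lower and upper bounds before reading. Bounding the pool size keeps the number of concurrent range reads independent of the number of splits returned.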
splitSize is measured in units of 100 MB. Each split is approximately N × 100 MB, where N is the value you pass.
Splits are logical, and their sizes are approximate. Exact sizes cannot be guaranteed.
The location field hints at the machine that hosts the split. The field may be empty in some cases.
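Because splitSize is expressed in 100 MB units rather than bytes, it can help to convert a target size explicitly. The sketch below is illustrative only: `SplitSizing` and its methods are hypothetical names, rounding up is one possible choice, and the split-count figure is only a planning estimate because actual split sizes are approximate.

```java
/** Hypothetical helper for choosing a splitSize value; not part of the SDK. */
final class SplitSizing {
    /** Size of one splitSize unit in MB, as described above. */
    static final long UNIT_MB = 100;

    private SplitSizing() {}

    /** Converts a target split size in MB to splitSize units, rounding up. */
    static long toSplitSizeUnits(long targetMb) {
        if (targetMb <= 0) {
            throw new IllegalArgumentException("targetMb must be positive");
        }
        return (targetMb + UNIT_MB - 1) / UNIT_MB;
    }

    /** Rough number of splits to expect for a table of tableMb megabytes. */
    static long estimateSplitCount(long tableMb, long splitSizeUnits) {
        if (tableMb < 0 || splitSizeUnits <= 0) {
            throw new IllegalArgumentException("tableMb must be >= 0 and splitSizeUnits > 0");
        }
        long splitMb = splitSizeUnits * UNIT_MB;
        return (tableMb + splitMb - 1) / splitMb;
    }
}
```

For example, a 250 MB target rounds up to 3 units, and the result of `toSplitSizeUnits` can be passed as the second argument of `ComputeSplitsBySizeRequest`.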
Parameters
| Name | Type | Description |
| --- | --- | --- |
| tableName (required) | String | The name of the table. |
| splitSize (required) | long | The approximate size of each split, in units of 100 MB. For example, passing |