If the order of query results does not matter, you can use parallel scan to retrieve the matching rows quickly.

Prerequisites

  • A Tablestore client is initialized. For more information, see Initialization.
  • A data table is created and data is written to the table.
  • A search index is created for the data table. For more information, see Create search indexes.

Parameters

  • tableName: The name of the data table.
  • indexName: The name of the search index.
  • scanQuery: The scan settings, which consist of the following fields.
      • query: The query statement for the search index. The operation supports term query, fuzzy query, range query, geo query, and nested query, which work in a similar way to those of the Search operation.
      • limit: The maximum number of rows that can be returned by each ParallelScan call.
      • maxParallel: The maximum number of parallel scan tasks per request. This value varies with the data volume: a larger volume of data requires more parallel scan tasks. You can call the ComputeSplits operation to query it.
      • currentParallelId: The ID of the parallel scan task in the request. Valid values: [0, Value of maxParallel).
      • token: The token that is used to paginate query results. The response to a ParallelScan request contains the token for the next page. Pass that token in the next request to retrieve the next page.
      • aliveTime: The validity period of the current parallel scan task, which is also the validity period of the token. Unit: seconds. Default value: 60. We recommend that you use the default value. If the next request is not initiated within the validity period, no more data can be queried. The validity period of the token is refreshed each time you send a request.
        Note: The server processes expired tasks asynchronously. A task does not expire within its validity period, but Tablestore does not guarantee that the task expires as soon as the validity period ends.
  • columnsToGet: The columns to return. Parallel scan can read data only from search indexes. To use parallel scan for a search index, you must set store to true when you create the search index.
  • sessionId: The session ID of the parallel scan task. You can call the ComputeSplits operation to create a session and query the maximum number of parallel scan tasks that the parallel scan request supports.
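
The relationship between maxParallel and currentParallelId can be illustrated with a small sketch. It assumes that maxParallel has already been obtained from ComputeSplits; buildScanQueries is a hypothetical helper written for illustration, not part of the SDK, and the query object is a placeholder.

```javascript
// Hypothetical helper (not part of the Tablestore SDK): build one scanQuery
// per parallel task so that the task IDs cover the range [0, maxParallel).
function buildScanQueries(maxParallel, query, limit = 1000, aliveTime = 60) {
    if (!Number.isInteger(maxParallel) || maxParallel < 1) {
        throw new RangeError('maxParallel must be a positive integer');
    }
    const scanQueries = [];
    for (let currentParallelId = 0; currentParallelId < maxParallel; currentParallelId++) {
        scanQueries.push({
            query,
            limit,
            aliveTime,
            token: undefined,   // No token on the first request of each task.
            currentParallelId,  // Unique per task.
            maxParallel,        // Identical for every task in the session.
        });
    }
    return scanQueries;
}

// Placeholder query object; a real request would use a TableStore query.
const scanQueries = buildScanQueries(3, { queryType: 'MATCH_ALL_QUERY_PLACEHOLDER' });
console.log(scanQueries.map((q) => q.currentParallelId)); // prints [ 0, 1, 2 ]
```

Each of these objects would be sent with the same sessionId, and each task then tracks its own token independently.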

Examples

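// This example assumes that `client` is an initialized Tablestore client, that `tableName` and
// `indexName` are defined, and that the code runs inside an async function so that `await` is allowed.
// 1. Obtain the session ID.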
let computeSplits = await new Promise((resolve, reject) => {
    client.computeSplits({
        tableName: tableName,
        searchIndexSplitsOptions: {
            indexName: indexName,
        }
    }, function (err, data) {
        if (err) {
            console.log('computeSplits error:', err.toString());
            reject(err);
        } else {
            console.log('computeSplits success:', data);
            resolve(data)
        }
    })
})

// 2. Specify query conditions. 
const scanQuery = {
    query: {
        queryType: TableStore.QueryType.MATCH_ALL_QUERY,
    },
    limit: 1000,
    aliveTime: 30,
    token: undefined,
    currentParallelId: 0,
    maxParallel: 1,
}

// 3. Create a ParallelScan request. This example wraps the callback-based call in a Promise so that requests can be awaited one after another. You can issue requests concurrently based on your business requirements.
const parallelScanPromise = function () {
    return new Promise(function (resolve, reject) {
        client.parallelScan({
            tableName: tableName,
            indexName: indexName,
            columnToGet: {
                returnType: TableStore.ColumnReturnType.RETURN_ALL_FROM_INDEX,
            },
            sessionId: computeSplits.sessionId,
            scanQuery: scanQuery,
        }, function (err, data) {
            if (err) {
                console.log('parallelScan error:', err.toString());
                reject(err);
            } else {
                console.log("parallelScan, rows:", data.rows.length)
                resolve(data)
            }
        });
    })
}
let totalCount = 0 // Running total of the rows returned across all pages.
let parallelScanResponse = await parallelScanPromise()
totalCount = totalCount + parallelScanResponse.rows.length
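// 4. Keep requesting the next page with nextToken until all data is pulled.
// The truthiness check also covers an undefined nextToken, which would otherwise throw on `.length`.
while (parallelScanResponse.nextToken && parallelScanResponse.nextToken.length > 0) {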
    scanQuery.token = parallelScanResponse.nextToken
    parallelScanResponse = await parallelScanPromise()
    totalCount += parallelScanResponse.rows.length
}
console.log("total rows:", totalCount)
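
The example above scans with a single task (maxParallel: 1). When ComputeSplits reports that more tasks are supported, one way to use them is to run a separate token loop for each task ID and merge the results. The following is a minimal sketch under stated assumptions rather than a definitive implementation: scanPage is a hypothetical function that performs one ParallelScan call for the given scanQuery and resolves to an object with rows and nextToken, and scanOneTask and scanAllTasks are illustrative names.

```javascript
// Sketch: run one scan loop per parallel task ID and merge the rows.
// `scanPage` is a hypothetical function (an assumption, not an SDK method) that
// performs one ParallelScan call and resolves to { rows, nextToken }.
async function scanOneTask(scanPage, baseQuery, currentParallelId) {
    // Each task works on its own copy of the query and tracks its own token.
    const scanQuery = { ...baseQuery, currentParallelId, token: undefined };
    const rows = [];
    let hasMore = true;
    while (hasMore) {
        const response = await scanPage(scanQuery);
        for (const row of response.rows) {
            rows.push(row);
        }
        scanQuery.token = response.nextToken;
        hasMore = Boolean(scanQuery.token) && scanQuery.token.length > 0;
    }
    return rows;
}

async function scanAllTasks(scanPage, baseQuery) {
    const tasks = [];
    // Task IDs cover the range [0, maxParallel).
    for (let id = 0; id < baseQuery.maxParallel; id++) {
        tasks.push(scanOneTask(scanPage, baseQuery, id));
    }
    // Promise.all keeps the per-task order; the order of rows across tasks is not meaningful.
    const rowsPerTask = await Promise.all(tasks);
    return rowsPerTask.flat();
}
```

With the client from the example, scanPage could wrap client.parallelScan in a Promise in the same way as parallelScanPromise, taking the scanQuery as an argument instead of reading a shared variable. Giving each task its own query object prevents concurrent tasks from overwriting each other's token.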