Tablestore provides API operations that allow you to read a single row of data, multiple rows of data at a time, or data in a primary key range from a data table. To read a single row or multiple rows at a time, you must specify the values of all primary key columns. To read data by range, you must specify a range for the values of all primary key columns or for a prefix of the primary key columns. When you read data, you can also specify the attribute columns to return, the number of data versions to return, the time range of the data, and filter conditions.
Query methods
Tablestore provides the GetRow, BatchGetRow, and GetRange operations that you can call to read data. Before you read data, select the appropriate query method based on the actual query scenario.
If you want to read data from a table that contains an auto-increment primary key column, make sure that you have obtained the values of all primary key columns, including the value of the auto-increment primary key column. For more information, see Configure an auto-increment primary key column. If you did not record the value of the auto-increment primary key column, you can call the GetRange operation to read data in a range that is determined by the values of the first primary key column.
Query method | Description | Scenario |
Read a single row of data | You can call the GetRow operation to read a single row of data. | This method is applicable to scenarios in which the values of all primary key columns of the row to be queried can be determined and the number of rows to be read is small. |
Read multiple rows of data at a time | You can call the BatchGetRow operation to read multiple rows of data from one or more tables at a time. A BatchGetRow request is equivalent to multiple GetRow operations, and each of them is constructed in the same way as a standalone GetRow request. | This method is applicable to scenarios in which the values of all primary key columns of the rows to be queried can be determined and the number of rows to be read is large or data is to be read from multiple tables. |
Read data whose primary key values are in the specified range | You can call the GetRange operation to read data whose primary key values are in the specified range, in a forward or backward direction. You can also specify the number of rows to read. If the range is large and the number of scanned rows or the volume of scanned data exceeds the upper limit, the scan stops, and the rows that are read and the primary key of the next row are returned. You can then send another request that starts from the returned primary key to read the remaining rows. | This method is applicable to scenarios in which the range of the values of all primary key columns or of a prefix of the primary key columns of the rows to be queried can be determined. Important If you cannot determine a prefix of the primary key columns, you can set every column of the start primary key to INF_MIN and every column of the end primary key to INF_MAX to cover the whole table. This operation scans all data in the table and consumes a large amount of computing resources. Proceed with caution. |
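The selection rules in the table above can be summarized as a small decision function. This is a minimal sketch only; choose_read_operation is a hypothetical helper, not part of the Tablestore SDK, and it encodes nothing beyond the scenarios listed in the table.

```python
def choose_read_operation(all_pk_values_known, row_count=1, table_count=1):
    """Return the name of the read operation suggested by the query-method table.

    Hypothetical helper: it mirrors the scenarios described in this topic and
    is not an official Tablestore API.
    """
    if not all_pk_values_known:
        # Only a range or a prefix of the primary key is known.
        return "GetRange"
    if row_count == 1 and table_count == 1:
        # One fully specified row from one table.
        return "GetRow"
    # Several fully specified rows, possibly from several tables.
    return "BatchGetRow"


print(choose_read_operation(True))                 # GetRow
print(choose_read_operation(True, row_count=20))   # BatchGetRow
print(choose_read_operation(False))                # GetRange
```

Note that BatchGetRow reads at most 100 rows per request, so larger sets of known keys still need to be split across several requests.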
Prerequisites
- The OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
- A data table is created. Data is written to the table.
Read a single row of data
You can call the GetRow operation to read a single row of data. After you call the GetRow operation, one of the following results may be returned:
If the row exists, the primary key columns and attribute columns of the row are returned.
If the row does not exist, no row is returned and no error is reported.
Syntax
"""
Description: This operation reads a single row of data.
table_name: the name of the table.
primary_key: the primary key information of the row. Type: LIST.
columns_to_get: optional. The columns that you want to read. If you do not specify this parameter, all columns are returned. Type: LIST.
column_filter: optional. The filter conditions for columns. Only rows that meet the conditions are returned.
max_version: optional. The maximum number of data versions that can be returned. You must specify at least one of the max_version and time_range parameters.
time_range: optional. The time range of versions or a specific version that you want to read. You must specify at least one of the max_version and time_range parameters.
start_column: optional. The column from which the wide-column read operation starts.
end_column: optional. The column at which the wide-column read operation ends.
token: optional. The start column for the current wide-column read operation. The value of this parameter is returned by the previous wide-column read operation, and encoded as binary data.
Response: The number of capacity units (CUs) consumed by the operation, the returned row, and next_token.
consumed: the number of CUs that are consumed by the operation. The consumed parameter is an instance of the tablestore.metadata.CapacityUnit class.
return_row: the row that is returned, or None if no row is returned. return_row.primary_key contains the primary key columns, for example [('PK0', value0), ('PK1', value1)], and return_row.attribute_columns contains the attribute columns.
next_token: the start column for the next wide-column read operation, encoded as binary data.
"""
def get_row(self, table_name, primary_key, columns_to_get=None,
            column_filter=None, max_version=None, time_range=None,
            start_column=None, end_column=None, token=None):
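The primary_key argument is a list of (column name, value) tuples, and, as the Parameters table notes, its columns must match the primary key columns of the table. The following sketch shows a client-side check that you could run before calling get_row. The validate_primary_key helper and its schema format, a list of (column name, Python type) pairs in primary key order, are hypothetical and not part of the SDK.

```python
def validate_primary_key(primary_key, schema):
    """Check a primary_key list such as [('gid', 1), ('uid', 101)] against a schema.

    schema is a hypothetical list of (column_name, python_type) pairs in the
    table's primary key order, for example [('gid', int), ('uid', int)].
    Raises ValueError if the number, order, names, or types do not match.
    """
    if len(primary_key) != len(schema):
        raise ValueError("expected %d primary key columns, got %d"
                         % (len(schema), len(primary_key)))
    for (name, value), (expected_name, expected_type) in zip(primary_key, schema):
        if name != expected_name:
            raise ValueError("expected column %r, got %r" % (expected_name, name))
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError("column %r expects %s, got %r"
                             % (name, expected_type.__name__, value))


schema = [('gid', int), ('uid', int)]
validate_primary_key([('gid', 1), ('uid', 101)], schema)  # passes silently
try:
    validate_primary_key([('gid', 1)], schema)
except ValueError as e:
    print(e)  # expected 2 primary key columns, got 1
```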
Parameters
Parameter | Description |
table_name | The name of the table. |
primary_key | The primary key information of the row. The value of this parameter contains the name, type, and value of each primary key column. Important The number and types of primary key columns that you specify must be the same as the actual number and types of primary key columns in the table. |
columns_to_get | The columns that you want to read. You can specify the names of primary key columns or attribute columns. Note If you do not specify this parameter, all columns are returned. |
max_version | The maximum number of data versions that can be returned. Important You must specify at least one of the max_version and time_range parameters. |
time_range | The time range of versions or a specific version that you want to read. For more information, see TimeRange. Important You must specify at least one of the max_version and time_range parameters. You can specify either a time range or a specific version (specific_time), but not both. Timestamps are in milliseconds, and valid values start from 0. |
column_filter | The filter that you want to use to filter the query results on the server side. Only rows that meet the filter conditions are returned. For more information, see Filter. Note If you specify both the columns_to_get and column_filter parameters, Tablestore queries the columns that are specified by the columns_to_get parameter, and then returns the rows that meet the filter conditions. |
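The note on column_filter means that the filter is evaluated against the columns that were read, not against the whole row. The following sketch is a simplified in-memory model of that order of operations, not the server implementation. It assumes a row is a dict of attribute columns and that conditions are (column, predicate, pass_if_missing) triples combined with AND, loosely mirroring CompositeColumnCondition with LogicalOperator.AND. pass_if_missing defaults to True, as described in the GetRange example.

```python
def apply_read(row, columns_to_get, conditions):
    """Simplified model of how columns_to_get and column_filter interact.

    row: dict of attribute column name -> value (hypothetical representation).
    columns_to_get: list of column names; empty or None means all columns.
    conditions: list of (column, predicate, pass_if_missing), combined with AND.
    Returns the projected row, or None if the row is filtered out.
    """
    # Step 1: read only the requested columns.
    if columns_to_get:
        projected = {k: v for k, v in row.items() if k in columns_to_get}
    else:
        projected = dict(row)
    # Step 2: evaluate every condition against the columns that were read.
    for column, predicate, pass_if_missing in conditions:
        if column not in projected:
            if not pass_if_missing:
                return None
        elif not predicate(projected[column]):
            return None
    return projected


row = {'name': 'Hangzhou', 'growth': 0.5, 'type': 'city'}
conds = [('growth', lambda v: v != 0.9, True), ('name', lambda v: v == 'Hangzhou', True)]
print(apply_read(row, ['name', 'growth', 'type'], conds))  # the full row
print(apply_read(row, ['type'], conds))  # {'type': 'city'}: filter columns were not read
```

The second call shows why it is usually safest to include the filtered columns in columns_to_get: if they are not read, the conditions are decided by pass_if_missing instead of by the actual values.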
Example
The following sample code provides an example on how to read a row of data from a table:
from tablestore import *

# client is an OTSClient instance that is initialized as described in the Prerequisites section.
# Specify the primary key of the row that you want to read. In this example, the primary key consists of two primary key columns: gid, whose value is the integer 1, and uid, whose value is the integer 101.
primary_key = [('gid', 1), ('uid', 101)]
# Specify the columns that you want to read. In this example, the name, growth, and type attribute columns are read. If you do not specify columns_to_get, all columns are returned.
columns_to_get = ['name', 'growth', 'type']
# Specify a filter. In this example, the row is returned only if the value of the growth column is not 0.9 and the value of the name column is Hangzhou.
cond = CompositeColumnCondition(LogicalOperator.AND)
cond.add_sub_condition(SingleColumnCondition("growth", 0.9, ComparatorType.NOT_EQUAL))
cond.add_sub_condition(SingleColumnCondition("name", 'Hangzhou', ComparatorType.EQUAL))
try:
    # Call the GetRow operation. Replace <table_name> with the name of your table.
    # The last argument, 1, is max_version: only the latest version of each column is returned.
    consumed, return_row, next_token = client.get_row('<table_name>', primary_key, columns_to_get, cond, 1)
    print('Read succeed, consume %s read cu.' % consumed.read)
    # return_row is None if the row does not exist or does not meet the filter conditions.
    if return_row is None:
        print('No row is returned.')
    else:
        print('Value of primary key: %s' % return_row.primary_key)
        print('Value of attribute: %s' % return_row.attribute_columns)
        for att in return_row.attribute_columns:
            # Display the name, value, and timestamp (version) of each attribute column.
            print('name:%s\tvalue:%s\ttimestamp:%d' % (att[0], att[1], att[2]))
# Client exceptions are generally caused by parameter errors or network exceptions.
except OTSClientError as e:
    print('get row failed, http_status:%d, error_message:%s' % (e.get_http_status(), e.get_error_message()))
# Server exceptions are generally caused by parameter or throttling errors.
except OTSServiceError as e:
    print('get row failed, http_status:%d, error_code:%s, error_message:%s, request_id:%s' % (e.get_http_status(), e.get_error_code(), e.get_error_message(), e.get_request_id()))
For more information about the detailed sample code, see get_row.py on GitHub.
Read multiple rows of data at a time
You can call the BatchGetRow operation to read multiple rows of data from one or more tables at a time. A BatchGetRow request is equivalent to multiple GetRow operations, and each of them is constructed in the same way as a standalone GetRow request.
Each GetRow operation in a BatchGetRow request is performed independently, and Tablestore returns a separate result for each of them.
Usage notes
When you call the BatchGetRow operation to read multiple rows at a time, some rows may fail to be read. In this case, Tablestore does not raise an exception. Instead, the returned BatchGetRowResponse includes information about the failed rows. Therefore, you must check the result of each row to determine whether it was read successfully.
The BatchGetRow operation uses the same parameter settings for all rows that are read from the same table. For example, if the columns_to_get parameter of a table is set to ['colA'], only the value of the colA column is read from all rows of that table.
You can call the BatchGetRow operation to read a maximum of 100 rows at a time.
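The two usage notes above, the 100-row limit and per-row failures, suggest splitting large key sets into batches and collecting the keys that failed so that they can be retried. The following is a minimal sketch in plain Python. split_into_batches and collect_failed_keys are hypothetical helpers, and collect_failed_keys assumes that per-row results come back in request order and expose an is_ok attribute, as the items returned by get_result_by_table() do in the example in this section. FakeItem is a stand-in used only for the demonstration.

```python
from collections import namedtuple

MAX_ROWS_PER_BATCH = 100  # BatchGetRow limit stated in the usage notes


def split_into_batches(primary_keys, batch_size=MAX_ROWS_PER_BATCH):
    """Split a list of primary keys into chunks that each fit in one request."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [primary_keys[i:i + batch_size]
            for i in range(0, len(primary_keys), batch_size)]


def collect_failed_keys(primary_keys, row_results):
    """Return the requested keys whose per-row result is not OK.

    Assumes row_results is in the same order as primary_keys.
    """
    if len(primary_keys) != len(row_results):
        raise ValueError("expected one result per requested row")
    return [pk for pk, item in zip(primary_keys, row_results) if not item.is_ok]


# Stand-in for the SDK's per-row result objects; only is_ok is modelled.
FakeItem = namedtuple("FakeItem", ["is_ok"])

keys = [[('gid', i), ('uid', i + 1)] for i in range(250)]
batches = split_into_batches(keys)
print([len(b) for b in batches])  # [100, 100, 50]
failed = collect_failed_keys(batches[0], [FakeItem(i % 10 != 0) for i in range(100)])
print(len(failed))  # 10
```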
Syntax
"""
Description: This operation reads multiple rows of data at a time.
request = BatchGetRowRequest()
request.add(TableInBatchGetRowItem(myTable0, primary_keys, columns_to_get=None, column_filter=None))
request.add(TableInBatchGetRowItem(myTable1, primary_keys, columns_to_get=None, column_filter=None))
request.add(TableInBatchGetRowItem(myTable2, primary_keys, columns_to_get=None, column_filter=None))
request.add(TableInBatchGetRowItem(myTable3, primary_keys, columns_to_get=None, column_filter=None))
response = client.batch_get_row(request)
response: the results returned. The response parameter is an instance of the tablestore.metadata.BatchGetRowResponse class.
"""
def batch_get_row(self, request):
Parameters
For more information about parameters, see the Parameters table of the "Read a single row of data" section.
Example
The following sample code reads three rows from each of two tables in a single BatchGetRow request:
from tablestore import *

# client is an initialized OTSClient instance. For more information, see the Prerequisites section.
# Specify the columns that you want to read.
columns_to_get = ['name', 'mobile', 'address', 'age']
# Specify the primary keys of the three rows that you want to read.
rows_to_get = []
for i in range(0, 3):
    primary_key = [('gid', i), ('uid', i + 1)]
    rows_to_get.append(primary_key)
# Specify a filter. Only rows whose name column is John and whose address column is China are returned.
cond = CompositeColumnCondition(LogicalOperator.AND)
cond.add_sub_condition(SingleColumnCondition("name", "John", ComparatorType.EQUAL))
cond.add_sub_condition(SingleColumnCondition("address", 'China', ComparatorType.EQUAL))
# Construct a request to read multiple rows of data.
request = BatchGetRowRequest()
# Specify the rows that you want to read from the first table. The last argument, 1, is max_version: only the latest version of each column is read.
request.add(TableInBatchGetRowItem('<table_name1>', rows_to_get, columns_to_get, cond, 1))
# Specify the rows that you want to read from the second table.
request.add(TableInBatchGetRowItem('<table_name2>', rows_to_get, columns_to_get, cond, 1))
try:
    result = client.batch_get_row(request)
    # is_all_succeed() returns False if at least one row failed to be read.
    print('Result status: %s' % (result.is_all_succeed()))
    # A failed row does not raise an exception, so check the result of every row.
    for table_name in ('<table_name1>', '<table_name2>'):
        print('Check result of table %s:' % table_name)
        for item in result.get_result_by_table(table_name):
            if not item.is_ok:
                print('Read failed, error code: %s, error message: %s' % (item.error_code, item.error_message))
            elif item.row is None:
                # The row does not exist or does not meet the filter conditions.
                print('Read succeed, but no row is returned.')
            else:
                print('Read succeed, PrimaryKey: %s, Attributes: %s' % (item.row.primary_key, item.row.attribute_columns))
# Client exceptions are generally caused by parameter errors or network exceptions.
except OTSClientError as e:
    print('batch get row failed, http_status:%d, error_message:%s' % (e.get_http_status(), e.get_error_message()))
# Server exceptions are generally caused by parameter or throttling errors.
except OTSServiceError as e:
    print('batch get row failed, http_status:%d, error_code:%s, error_message:%s, request_id:%s' % (e.get_http_status(), e.get_error_code(), e.get_error_message(), e.get_request_id()))
For more information about the detailed sample code, see batch_get_row.py on GitHub.
Read data whose primary key values are in the specified range
You can call the GetRange operation to read data whose primary key values are in the specified range.
The GetRange operation allows you to read data whose primary key values are in the specified range in a forward or backward direction. You can also specify the number of rows to read. If the range is large and the number of scanned rows or the volume of scanned data exceeds the upper limit, the scan stops, and the rows that are read and information about the primary key of the next row are returned. You can initiate a request to start from where the last operation left off and read the remaining rows based on the information about the primary key of the next row returned by the previous operation.
All rows in a Tablestore table are sorted by primary key, and the primary key consists of all primary key columns in order. Therefore, rows are sorted by the combined primary key, not by any single primary key column.
Usage notes
The GetRange operation follows the leftmost matching principle: Tablestore compares primary keys column by column, from the first primary key column to the last. For example, the primary key of a data table consists of the primary key columns PK1, PK2, and PK3. Tablestore first compares the PK1 value of a row with the PK1 values of the start and end primary keys. If the PK1 value is strictly between the two boundary values, the row is in the range regardless of its PK2 and PK3 values. If the PK1 value is equal to a boundary value, Tablestore compares PK2 in the same manner, and then PK3 if necessary. If the PK1 value is outside the range, the row is not returned.
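The leftmost matching rule is ordinary column-by-column (lexicographic) comparison. The following sketch models it in plain Python for a forward range, where INF_MIN and INF_MAX are local stand-in objects that sort below and above every ordinary value. They are not the tablestore module's INF_MIN and INF_MAX, and the helper functions are hypothetical.

```python
# Local stand-ins for the SDK's INF_MIN and INF_MAX markers (not the tablestore objects).
INF_MIN = object()
INF_MAX = object()


def compare_value(a, b):
    """Return -1, 0, or 1, treating INF_MIN as the smallest and INF_MAX as the largest value."""
    if a is b:
        return 0
    if a is INF_MIN or b is INF_MAX:
        return -1
    if a is INF_MAX or b is INF_MIN:
        return 1
    return (a > b) - (a < b)


def compare_primary_key(pk_a, pk_b):
    """Compare two primary keys given as [(name, value), ...] lists, column by column."""
    for (_, a), (_, b) in zip(pk_a, pk_b):
        result = compare_value(a, b)
        if result != 0:
            # The leftmost column that differs decides the order; later columns are ignored.
            return result
    return 0


def in_forward_range(pk, inclusive_start, exclusive_end):
    """True if inclusive_start <= pk < exclusive_end."""
    return (compare_primary_key(inclusive_start, pk) <= 0
            and compare_primary_key(pk, exclusive_end) < 0)


start = [('gid', 1), ('uid', INF_MIN)]
end = [('gid', 5), ('uid', INF_MAX)]
print(in_forward_range([('gid', 3), ('uid', -100)], start, end))  # True
print(in_forward_range([('gid', 5), ('uid', 999)], start, end))   # True, because 999 < INF_MAX
print(in_forward_range([('gid', 6), ('uid', 0)], start, end))     # False
```

With these boundaries, rows with any uid value are included for every gid value from 1 to 5, because the uid comparison only matters when gid equals a boundary value.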
If one of the following conditions is met, the GetRange operation may stop and return data:
The amount of scanned data reaches 4 MB.
The number of scanned rows reaches 5,000.
The number of returned rows reaches the value of the limit parameter.
The read throughput is insufficient to read the next row of data because all reserved read throughput is consumed.
Because a single GetRange call stops when it has scanned 5,000 rows or 4 MB of data, it may not return all rows that meet the query conditions, even if more such rows exist in the range. To read the remaining rows, page through the range: pass the next_start_primary_key value from each response as the inclusive_start_primary_key of the next request until next_start_primary_key is None.
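The paging pattern can be simulated without a Tablestore instance. In the following sketch, fake_get_range is a hypothetical in-memory stand-in for one GetRange call over pre-sorted rows. It models only the 5,000-row scan limit and the limit parameter, not the 4 MB or throughput limits, and it uses a list index in place of next_start_primary_key.

```python
def fake_get_range(rows, start_index, limit=None, max_scan=5000):
    """Hypothetical stand-in for one GetRange call over pre-sorted rows.

    Returns (rows_read, next_start_index). next_start_index is None once the
    range is exhausted, like next_start_primary_key in a real response.
    """
    if limit is not None and limit <= 0:
        raise ValueError("limit must be greater than 0")
    cap = max_scan if limit is None else min(limit, max_scan)
    end = min(start_index + cap, len(rows))
    next_start = end if end < len(rows) else None
    return rows[start_index:end], next_start


def read_all(rows, limit=None):
    """Call the fake GetRange until next_start is None and collect every page."""
    result, next_start, calls = [], 0, 0
    while next_start is not None:
        page, next_start = fake_get_range(rows, next_start, limit)
        result.extend(page)
        calls += 1
    return result, calls


data = list(range(12000))
rows, calls = read_all(data)
print(len(rows), calls)  # 12000 3
```

A real response can also end a page early because of the 4 MB or throughput limits, so code should rely only on whether next_start_primary_key is None, never on the size of a page.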
Syntax
"""
Description: This operation reads rows whose primary key values are in the specified range.
table_name: the name of the table.
direction: the order in which the rows are read and returned. Type: STRING. Valid values: FORWARD and BACKWARD.
inclusive_start_primary_key: the primary key from which the read operation starts. The row whose primary key equals this value is returned if it exists.
exclusive_end_primary_key: the primary key at which the read operation ends. The row whose primary key equals this value is not returned.
columns_to_get: optional. The columns that you want to read. If you do not specify this parameter, all columns are returned. Type: LIST.
limit: optional. The maximum number of rows that can be returned. If you do not specify this parameter, no row-count limit is applied, but a single call still stops at the scan limits that are described in Usage notes.
column_filter: optional. The filter conditions for columns. Only rows that meet the conditions are returned.
max_version: optional. The maximum number of data versions that can be returned. You must specify at least one of the max_version and time_range parameters.
time_range: optional. The time range of versions or a specific version that you want to read. You must specify at least one of the max_version and time_range parameters.
start_column: optional. The column from which the wide-column read operation starts.
end_column: optional. The column at which the wide-column read operation ends.
token: optional. The start column for the current wide-column read operation. The value of this parameter is returned by the previous wide-column read operation, and encoded as binary data.
Response: the results that meet the conditions.
consumed: the number of CUs that are consumed by the operation. The consumed parameter is an instance of the tablestore.metadata.CapacityUnit class.
next_start_primary_key: the primary key from which the next GetRange operation starts, in the same format as inclusive_start_primary_key. The value is None if all data in the range has been read.
row_list: the returned rows of data. Format: [Row, ...].
next_token: the start column for the next wide-column read operation, encoded as binary data.
"""
def get_range(self, table_name, direction,
              inclusive_start_primary_key,
              exclusive_end_primary_key,
              columns_to_get=None,
              limit=None,
              column_filter=None,
              max_version=None,
              time_range=None,
              start_column=None,
              end_column=None,
              token=None):
Parameters
Parameter | Description |
table_name | The name of the table. |
direction | The order in which the rows are read and returned. Valid values: FORWARD and BACKWARD. For example, a table has two primary key values A and B, and A is smaller than B. If you set direction to FORWARD, specify A as the start primary key and B as the end primary key: the rows whose primary key values are greater than or equal to A and smaller than B are returned in ascending order. If you set direction to BACKWARD, specify B as the start primary key and A as the end primary key: the rows whose primary key values are smaller than or equal to B and greater than A are returned in descending order. |
inclusive_start_primary_key and exclusive_end_primary_key | The start primary key and the end primary key of the range that you want to read. Each of them must contain the same number of columns as the primary key of the table. The value of each column can be a valid primary key value or one of the virtual values INF_MIN and INF_MAX. INF_MIN is smaller than all values of other types, and INF_MAX is greater than all values of other types. The rows in a table are sorted in ascending order of primary key values, and the range is a left-closed, right-open interval: if data is read in the forward direction, the rows whose primary key values are greater than or equal to the start primary key value and smaller than the end primary key value are returned. |
limit | The maximum number of rows that can be returned. The value of this parameter must be greater than 0. Tablestore stops an operation after the maximum number of rows that can be returned in the forward or backward direction is reached, even if some rows in the specified range are not returned. You can use the value of the next_start_primary_key parameter returned in the response to read data in the next request. |
columns_to_get | The columns that you want to read. You can specify the names of primary key columns or attribute columns. Note If you do not specify this parameter, all columns are returned. |
max_version | The maximum number of data versions that can be returned. Important You must specify at least one of the max_version and time_range parameters. |
time_range | The time range of versions or a specific version that you want to read. For more information, see TimeRange. Important You must specify at least one of the max_version and time_range parameters. You can specify either a time range or a specific version (specific_time), but not both. Timestamps are in milliseconds, and valid values start from 0. |
column_filter | The filter that you want to use to filter the query results on the server side. Only rows that meet the filter conditions are returned. For more information, see Filter. Note If you specify both the columns_to_get and column_filter parameters, Tablestore queries the columns that are specified by the columns_to_get parameter, and then returns the rows that meet the filter conditions. |
next_start_primary_key | The primary key from which the next read request starts. If the value is None, all data in the range has been read. Otherwise, pass the value as inclusive_start_primary_key in the next request to read the remaining data. |
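The direction row above implies that the start primary key is the smaller boundary in a forward read and the larger boundary in a backward read. The following sketch models that behavior on a sorted list of plain comparable values that stand in for full primary keys. read_range is a hypothetical helper, not an SDK function.

```python
def read_range(sorted_keys, start, end, direction="FORWARD"):
    """Model which keys a range read returns and in what order.

    FORWARD:  start <= key < end, in ascending order.
    BACKWARD: end < key <= start, in descending order.
    Keys are plain comparable values that stand in for full primary keys.
    """
    if direction == "FORWARD":
        return [k for k in sorted_keys if start <= k < end]
    if direction == "BACKWARD":
        return [k for k in reversed(sorted_keys) if end < k <= start]
    raise ValueError("direction must be FORWARD or BACKWARD")


keys = ['a', 'b', 'c', 'd']
print(read_range(keys, 'b', 'd'))              # ['b', 'c']
print(read_range(keys, 'd', 'b', 'BACKWARD'))  # ['d', 'c']
print(read_range(keys, 'd', 'b'))              # []: a forward read needs start < end
```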
Example
In the following example, rows whose gid value ranges from 1 to 5 are read in the forward direction. Because the uid column of the start primary key is INF_MIN and the uid column of the end primary key is INF_MAX, rows with any uid value are included, including rows whose gid value is 5. After each call, the example checks whether next_start_primary_key in the response is None. If it is not, the GetRange operation is called again from next_start_primary_key until next_start_primary_key is None.
from tablestore import *

# client is an initialized OTSClient instance. For more information, see the Prerequisites section.
# Specify the start primary key.
inclusive_start_primary_key = [('gid', 1), ('uid', INF_MIN)]
# Specify the end primary key.
exclusive_end_primary_key = [('gid', 5), ('uid', INF_MAX)]
# Query all columns.
columns_to_get = []
# Set limit to 90 so that each call returns a maximum of 90 rows. If a total of 100 rows meet the query conditions, the first call returns 0 to 90 rows and next_start_primary_key is not None.
limit = 90
# Specify a filter. In this example, only rows whose address column is China and whose age column is smaller than 50 are returned.
cond = CompositeColumnCondition(LogicalOperator.AND)
# pass_if_missing determines whether a row that does not contain the filtered column meets the filter conditions.
# If you do not specify pass_if_missing or set it to True, such a row meets the filter conditions.
# If you set pass_if_missing to False, such a row does not meet the filter conditions.
cond.add_sub_condition(SingleColumnCondition("address", 'China', ComparatorType.EQUAL, pass_if_missing=False))
cond.add_sub_condition(SingleColumnCondition("age", 50, ComparatorType.LESS_THAN, pass_if_missing=False))
try:
    all_rows = []
    next_start_primary_key = inclusive_start_primary_key
    # Call the GetRange operation until next_start_primary_key is None, which means that all data in the range has been read.
    while next_start_primary_key is not None:
        consumed, next_start_primary_key, row_list, next_token = client.get_range(
            '<table_name>', Direction.FORWARD,
            next_start_primary_key, exclusive_end_primary_key,
            columns_to_get,
            limit,
            column_filter=cond,
            max_version=1,
            # Return only data whose timestamp is greater than or equal to 1557125059000 and smaller than 1557129059000.
            # The same time_range is passed in every call so that all pages use the same conditions.
            time_range=(1557125059000, 1557129059000)
        )
        all_rows.extend(row_list)
    # Display the primary key columns and attribute columns.
    for row in all_rows:
        print(row.primary_key, row.attribute_columns)
    print('Total rows: ', len(all_rows))
# Client exceptions are generally caused by parameter errors or network exceptions.
except OTSClientError as e:
    print('get range failed, http_status:%d, error_message:%s' % (e.get_http_status(), e.get_error_message()))
# Server exceptions are generally caused by parameter or throttling errors.
except OTSServiceError as e:
    print('get range failed, http_status:%d, error_code:%s, error_message:%s, request_id:%s' % (e.get_http_status(), e.get_error_code(), e.get_error_message(), e.get_request_id()))
For more information about the detailed sample code, see get_range.py on GitHub.
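The example hard-codes time_range as a pair of millisecond timestamps in which the end is exclusive. The following sketch shows one way to derive such a pair from timezone-aware datetime values by using only the standard library. to_millis and time_range_between are hypothetical helpers, and the UTC times below are the ones that correspond to the timestamps in the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def time_range_between(start, end):
    """Build a (start_ms, end_ms) tuple; the end is exclusive, as in the example."""
    start_ms, end_ms = to_millis(start), to_millis(end)
    if start_ms >= end_ms:
        raise ValueError("start must be earlier than end")
    return (start_ms, end_ms)


utc = timezone.utc
print(time_range_between(datetime(2019, 5, 6, 6, 44, 19, tzinfo=utc),
                         datetime(2019, 5, 6, 7, 50, 59, tzinfo=utc)))
# (1557125059000, 1557129059000)
```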
References
If you want to use indexes to accelerate data queries, you can use the secondary index or search index feature. For more information, see Secondary index or Search index.
If you want to visualize data in a table, you can connect the table to DataV or Grafana. For more information, see Data visualization tools.
If you want to download data from a table to a local file, you can use DataX or the Tablestore CLI. For more information, see Download data in Tablestore to a local file.
If you want to compute and analyze data in a table, you can use the SQL query feature of Tablestore. For more information, see Overview.
Note You can also use compute engines such as MaxCompute, Spark, Hive, HadoopMR, Function Compute, and Realtime Compute for Apache Flink to compute and analyze data in a table. For more information, see Overview.