Use a secondary index to read data by using Tablestore SDK for Python - Tablestore

Tablestore allows you to read a single row of data or data whose primary key values are within a specific range from an index table. If the index table contain the attribute columns that you want to return, you can read the index table to obtain the data. Otherwise, you need to query the data from the data table for which the index table is created.

Prerequisites

An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A secondary index is created. For more information, see Create a secondary index.

Usage notes

You can use an index table only to read data.
The first primary key column of a local secondary index must be the same as the first primary key column of the data table.
If the attribute columns that you want to return are not contained in the index table, you need to query data from the data table for which the index table is created to obtain the required attribute columns.

Read a single row of data

You can call the GetRow operation to read a single row of data. For more information, see Read a single row of data.

Parameters

When you call the GetRow operation to read data from an index table, take note of the following items:

You must set table_name to the name of the index table.
Tablestore automatically adds the primary key columns of the data table that are not specified as index columns to an index table as the primary key columns of the index table. Therefore, when you specify the primary key columns of a row in an index table, you must specify the index columns based on which you create the index table and the primary key columns of the data table.

Examples

The following sample code provides an example on how to read a row of data in an index table based on the primary key of the row:

# Construct the primary key. The primary key columns of the index table consist of definedcol1, pk1, and pk2. pk2 is the primary key column of the data table that is not specified as the index column of the index table. The three primary key columns are of the INTEGER type. The primary key value based on which you want to read data is 1 in the definedcol1 column, 101 in the pk1 column, and 11 in the pk2 column. 
# If you want to read data from a local secondary index, the first primary key column of the index table must be the same as the first primary key column of the data table. 
primary_key = [('definedcol1', 1), ('pk1', 101), ('pk2', 11)]

# Specify the attribute columns of the index table that you want to return. In this example, definedcol2 and definedcol3 are specified. If you set the columns_to_get parameter to [], all attribute columns of the index table are returned. 
columns_to_get = ['definedcol2', 'definedcol3']

# Specify a filter for the columns. In this example, the filter is used to return the rows in which the value of the definedcol2 column is not equal to 1, and the value of the definedcol3 column is 'test'. 
cond = CompositeColumnCondition(LogicalOperator.AND)
cond.add_sub_condition(SingleColumnCondition("definedcol2", 1, ComparatorType.NOT_EQUAL))
cond.add_sub_condition(SingleColumnCondition("definedcol3", 'test', ComparatorType.EQUAL))

try:
    # Call the get_row operation to query data. 
    # Specify the name of the index table. The last parameter value 1 indicates that only one version of the value is returned. 
    consumed, return_row, next_token = client.get_row('<INDEX_NAME>', primary_key, columns_to_get, cond, 1)
    print('Read succeed, consume %s read cu.' % consumed.read)
    print('Value of primary key: %s' % return_row.primary_key)
    print('Value of attribute: %s' % return_row.attribute_columns)
    for att in return_row.attribute_columns:
        # Display the key, value, and version of each column. 
        print('name:%s\tvalue:%s' % (att[0], att[1]))
# In most cases, client exceptions are caused by parameter errors or network exceptions. 
except OTSClientError as e:
    print('get row failed, http_status:%d, error_message:%s' % (e.get_http_status(), e.get_error_message()))
# In most cases, server exceptions are caused by parameter or throttling errors. 
except OTSServiceError as e:
    print('get row failed, http_status:%d, error_code:%s, error_message:%s, request_id:%s' % (e.get_http_status(), e.get_error_code(), e.get_error_message(), e.get_request_id()))

Read data whose primary key values are within a specific range

You can call the GetRange operation to read data whose primary key values are within a specific range. For more information, see Read data whose primary key values are within a specific range.

Parameters

When you call the GetRange operation to read data from an index table, take note of the following items:

You must set table_name to the name of the index table.
Tablestore automatically adds the primary key columns of the data table that are not specified as index columns to an index table as the primary key columns of the index table. Therefore, when you specify the start primary key and end primary key of the range that you want to query, you must specify the index columns based on which you create the index table and the primary key columns of the data table.

Examples

The following sample code provides an example on how to read data whose primary key values are within a specific range from a secondary index.

# Construct the start primary key of the range. If you want to read data from a local secondary index, the first primary key column of the index table must be the same as the first primary key column of the data table. 
inclusive_start_primary_key = [('definedcol1', 1), ('pk1', INF_MIN), ('pk2', INF_MIN)]

# Construct the end primary key of the range. 
exclusive_end_primary_key = [('definedcol1', 5), ('pk1', INF_MAX), ('pk2', INF_MIN)]

# Specify that all columns of the rows that meet the query conditions are returned. 
columns_to_get = []

# Set the limit parameter to 90 to return a maximum of 90 rows of data. If a total of 100 rows meet the query conditions, the number of rows that are returned in the first read operation ranges from 0 to 90. The value of the next_start_primary_key parameter is not None. 
limit = 90

# Specify a filter for the columns. In this example, the filter is used to return the rows in which the value of the definedcol2 column is smaller than 50, and the value of the definedcol3 column is 'China'. 
cond = CompositeColumnCondition(LogicalOperator.AND)
# Specify the pass_if_missing parameter to determine whether a row meets the filter conditions if the row does not contain a specific column. 
# If you do not specify the pass_if_missing parameter or set the parameter to True, a row meets the filter conditions if the row does not contain a specific column. 
# If you set the pass_if_missing parameter to False, a row does not meet the filter conditions if the row does not contain a specific column. 
cond.add_sub_condition(SingleColumnCondition("definedcol3", 'China', ComparatorType.EQUAL, pass_if_missing=False))
cond.add_sub_condition(SingleColumnCondition("definedcol2", 50, ComparatorType.LESS_THAN, pass_if_missing=False))

try:
    # Call the get_range operation. 
    # Specify the name of the index table. 
    consumed, next_start_primary_key, row_list, next_token = client.get_range(
        '<INDEX_NAME>', Direction.FORWARD,
        inclusive_start_primary_key, exclusive_end_primary_key,
        columns_to_get,
        limit,
        column_filter=cond,
        max_version=1,
        time_range=(1557125059000, 1557129059000)  # Specify the time range of data. In this example, the start time is greater than or equal to 1557125059000. The end time is smaller than 1557129059000. 
    )

    all_rows = []
    all_rows.extend(row_list)

    # If the next_start_primary_key parameter is not empty, continue to read data. 
    while next_start_primary_key is not None:
        inclusive_start_primary_key = next_start_primary_key
        consumed, next_start_primary_key, row_list, next_token = client.get_range(
            '<INDEX_NAME>', Direction.FORWARD,
            inclusive_start_primary_key, exclusive_end_primary_key,
            columns_to_get, limit,
            column_filter=cond,
            max_version=1
        )
        all_rows.extend(row_list)

    # Display the primary key columns and attribute columns. 
    for row in all_rows:
        print(row.primary_key, row.attribute_columns)
    print('Total rows: ', len(all_rows))
# In most cases, client exceptions are caused by parameter errors or network exceptions. 
except OTSClientError as e:
    print('get row failed, http_status:%d, error_message:%s' % (e.get_http_status(), e.get_error_message()))
# In most cases, server exceptions are caused by parameter or throttling errors. 
except OTSServiceError as e:
    print('get row failed, http_status:%d, error_code:%s, error_message:%s, request_id:%s' % (e.get_http_status(), e.get_error_code(), e.get_error_message(), e.get_request_id())

FAQ

References

If your business requires multi-dimensional queries and data analysis, you can create a search index and specify the required attributes as the fields of the search index. Then, you can query and analyze data by using the search index. For example, you can use a search index to perform queries based on non-primary key columns, Boolean queries, and fuzzy queries. You can also use a search index to obtain the maximum and minimum values, collect statistics about the number of rows, and group query results. For more information, see Search index.
If you want to execute SQL statements to query and analyze data, you can use the SQL query feature. For more information, see SQL query.