When a query matches a large number of rows that share the same value in a particular column, you can use the collapse (distinct) feature to collapse the result set based on that column. Each value of the column then appears only once in the query results, which keeps the returned results diverse.
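The effect of collapsing can be pictured with a small local sketch. It is plain Python and does not call Tablestore; the sample rows and the helper name collapse_rows are hypothetical. Keeping the first row in input order is an assumption made for illustration and does not describe how the service chooses which row to keep.

```python
def collapse_rows(rows, field_name):
    """Keep the first row seen for each distinct value of field_name."""
    seen = set()
    collapsed = []
    for row in rows:
        value = row[field_name]
        if value not in seen:
            seen.add(value)
            collapsed.append(row)
    return collapsed


# Hypothetical sample data: three matched rows, two of them for the same product.
rows = [
    {"product_name": "phone", "price": 1000, "seller": "A"},
    {"product_name": "phone", "price": 1000, "seller": "B"},
    {"product_name": "laptop", "price": 1000, "seller": "C"},
]
collapsed = collapse_rows(rows, "product_name")
print([row["seller"] for row in collapsed])  # ['A', 'C']
```

Three rows match the condition, but only two are kept after collapsing on product_name, one per distinct product.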

Prerequisites

  • The OTSClient is initialized. For more information, see Initialization.
  • A data table is created. Data is written to the table.
  • A search index is created for the data table. For more information, see Create search indexes.

Usage notes

  • If you use the collapse (distinct) feature, you can perform pagination only by specifying offset and limit instead of token.
  • If you aggregate and collapse a result set at the same time, the result set is aggregated before it is collapsed.
  • If you collapse the query results, the total number of results that are returned is determined by the sum of the offset and limit values. A maximum of 50,000 results can be returned.
  • The total number of rows in the response indicates the number of rows that meet the query conditions before you use the collapse (distinct) feature. After the result set is collapsed, the total number of distinct values cannot be queried.
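Because pagination relies on offset and limit, it can help to compute the page windows up front so that offset plus limit never exceeds the cap. The following is a minimal sketch in plain Python; the helper name page_windows is hypothetical, and it assumes that the 50,000 limit applies to the sum of offset and limit, as the notes above suggest.

```python
MAX_COLLAPSED_RESULTS = 50000  # cap on offset + limit, taken from the usage notes


def page_windows(page_size, max_results=MAX_COLLAPSED_RESULTS):
    """Yield (offset, limit) pairs such that offset + limit <= max_results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while offset < max_results:
        # Shrink the last window so that it ends exactly at the cap.
        limit = min(page_size, max_results - offset)
        yield offset, limit
        offset += limit


# A small cap keeps the example output short.
windows = list(page_windows(page_size=100, max_results=250))
print(windows)  # [(0, 100), (100, 100), (200, 50)]
```

Each pair can be passed as the offset and limit of one query. The last window is shortened so that the request stays within the cap.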

Parameters

Parameter        Description
query            The query type. You can set this parameter to any query type.
collapse         The configuration of the collapse parameter, including field_name.

                 field_name: the name of the column based on which the result set is collapsed. Only columns whose values are of the INTEGER, FLOATING-POINT, and KEYWORD data types are supported.

offset           The position from which the current query starts.
limit            The maximum number of rows that you want the current query to return.

                 To query only the number of matched rows without returning specific data, you can set limit to 0. This way, Tablestore returns the number of matched rows without specific data from the table.

get_total_count  Specifies whether to return the total number of rows that meet the query conditions. The default value of this parameter is false, which indicates that the total number of rows that meet the query conditions is not returned.

                 If you set this parameter to true, the query performance is compromised.

table_name       The name of the data table.
index_name       The name of the search index.
columns_to_get   Specifies whether to return all columns of each matched row. You can configure return_type and column_names for this parameter.
                   • If you set return_type to ColumnReturnType.SPECIFIED, you can use column_names to specify the columns to return.
                   • If you set return_type to ColumnReturnType.ALL, all columns are returned.
                   • If you set return_type to ColumnReturnType.NONE, only the primary key columns are returned.

Examples

from tablestore import *

# client, table_name, and index_name are assumed to be defined already (see Prerequisites).
# Construct a query condition that matches all products whose price is 1000.
query = TermQuery('price', 1000)

# Collapse the result set based on the "product_name" column.
collapse = Collapse('product_name')
search_response = client.search(
    table_name, index_name,
    SearchQuery(query, limit=100, get_total_count=False, collapse_field=collapse),
    columns_to_get=ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))
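To read more than one page of collapsed results, the query above can be repeated with an increasing offset. The sketch below shows one way to drive such a loop. fetch_page is a hypothetical callable, not part of the SDK, that returns the rows for one (offset, limit) page; here it is replaced by a stand-in list so that the loop can run without a Tablestore instance.

```python
def fetch_all_collapsed(fetch_page, page_size=100, max_results=50000):
    """Collect rows page by page until a short page or the cap is reached.

    fetch_page(offset, limit) is a hypothetical callable that returns the
    list of rows for one page, for example a thin wrapper around
    client.search with collapse_field set.
    """
    rows = []
    offset = 0
    while offset < max_results:
        limit = min(page_size, max_results - offset)
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:  # a short page means nothing is left to read
            break
        offset += limit
    return rows


# Stand-in for the service: 250 already-collapsed rows, sliced by offset and limit.
fake_rows = ["row-%d" % i for i in range(250)]
result = fetch_all_collapsed(lambda offset, limit: fake_rows[offset:offset + limit])
print(len(result))  # 250
```

With the SDK, fetch_page would typically build a SearchQuery with the given offset and limit plus the same collapse_field, call client.search, and return the rows from the response.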