All Products
Search
Document Center

Tablestore:Aggregation

Last Updated:Apr 23, 2026

The aggregation feature of a search index allows you to perform value calculations and grouping statistics on query results. Value aggregations include minimum value, maximum value, sum, average value, count, distinct count, and percentile statistics. Grouping aggregations support grouping by field value, range, geographical location, filter, histogram, and composite aggregation. You can combine multiple aggregations in a single request.

Note

The aggregation feature is supported in Python SDK V5.2.1 and later.

Features

The following tables describe the aggregation features.

Value aggregation

Feature

Description

Minimum value

Returns the minimum value from a field, similar to the SQL min function.

Maximum value

Returns the maximum value from a field, similar to the SQL max function.

Sum

Returns the sum of values in a numeric field, similar to the SQL sum function.

Average value

Returns the average of values in a numeric field, similar to the SQL avg function.

Count

Returns the number of values for a specified field or the total number of rows in a search index, similar to the SQL count function.

Distinct count

Returns the number of distinct values for a specified field, similar to the SQL count(distinct) function.

Percentile statistics

Calculates the percentile distribution of a dataset. For example, during routine system operations and maintenance (O&M), you can use this feature to analyze the distribution of request latencies, such as the P25, P50, P90, and P99 values.

Grouping aggregation

Feature

Description

Group by field value

Groups query results based on the values of a field. Rows with the same field value are placed in the same group. The response returns each group's value and its row count.

Note

When the number of groups is large, the results of this aggregation may be approximate.

Group by range

Groups query results based on specified ranges of a field. Rows with field values that fall into the same range are placed in the same group. The response returns the number of items in each range.

Group by geographical location

Groups query results based on the distance from a central point. Rows within the same distance range are placed in the same group. The response returns the number of items in each range.

Group by filter

Groups query results based on a set of filters. The response returns the number of documents that match each filter. The results are returned in the same order that the filters are specified.

Histogram aggregation

Groups query results based on a specified numeric interval. Rows with field values that fall into the same interval are placed in the same group. The response returns each group's value and its corresponding count.

Composite aggregation

Groups query results by multiple fields, similar to GROUP BY column1, column2, ... in SQL. You can use pagination to retrieve all the grouped results.

Value aggregation

Minimum value

Returns the minimum value from a field, similar to the SQL min function.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to aggregate. The data type of the field must be Long, Double, or Date.

missing

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

Calculate the minimum score among people who are 18 years old.

query = TermQuery('age', 18)
agg = Min('score', name='min')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Maximum value

Returns the maximum value from a field, similar to the SQL max function.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to aggregate. The data type of the field must be Long, Double, or Date.

missing

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

Calculate the maximum score among people who are 18 years old. If a person does not have a score, a default value of 0 is used.

query = TermQuery('age', 18)
agg = Max('score', missing_value=0, name='max')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Sum

Returns the sum of values in a numeric field, similar to the SQL sum function.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to aggregate. The data type of the field must be Long or Double.

missing

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

Calculate the sum of all scores for people who are 18 years old.

query = TermQuery('age', 18)
agg = Sum('score', name='sum')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Average value

Returns the average of values in a numeric field, similar to the SQL avg function.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to aggregate. The data type of the field must be Long, Double, or Date.

missing

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

Calculate the average score for people who are 18 years old.

query = TermQuery('age', 18)
agg = Avg('score', name='avg')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Count

Returns the number of values for a specified field or the total number of rows in a search index, similar to the SQL count function.

Note

You can count rows that match a query in two ways:

  • To get the total number of matching rows, set the get_total_count query parameter to True.

  • To count only the rows where a specific field has a value (useful for sparse columns), use the count aggregation on that field.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to be used for the aggregation. The data type of the field must be Long, Double, Boolean, Keyword, GeoPoint, or Date.

Example

Count the number of people who are 18 years old and have a recorded exam score.

query = TermQuery('age', 18)
agg = Count('score', name='count')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Distinct count

Returns the number of distinct values for a specified field, similar to the SQL count(distinct) function.

Note

The result of a distinct count is an approximation.

  • When the distinct count is less than 10,000, the result is nearly exact.

  • When the distinct count reaches 100 million, the error rate is approximately 2%.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to be used for the aggregation. The data type of the field must be Long, Double, Boolean, Keyword, GeoPoint, or Date.

missing

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

Count the number of distinct names among people who are 18 years old.

query = TermQuery('age', 18)
agg = DistinctCount('name', name='distinct_name')

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s, value: %s" % (agg_result.name, str(agg_result.value)))

Percentile statistics

Calculates the percentile distribution of a dataset. For example, during routine system O&M, you can use this feature to analyze the distribution of request latencies, such as the P25, P50, P90, and P99 values.

Note

Percentile statistics are an approximation. Accuracy is higher for percentiles at the extremes of the distribution (such as 1% and 99%) than for those in the middle (such as 50%).

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to be used for the aggregation. The data type of the field must be Long, Double, or Date.

percentiles

A list of percentile values, such as [50, 90, 99]. You can specify one or more values.

missing_value

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

query = TermQuery('product', '10010')
agg = Percentiles('latency', percentiles_list=[50, 90, 95])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, aggs=[agg]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for agg_result in search_response.agg_results:
    print("name: %s" % agg_result.name)
    for item in agg_result.value:
        print("  percentile: %s, value: %s" % (str(item.key), str(item.value)))

Grouping aggregation

Group by field value

Groups query results based on the values of a field. Rows with the same field value are placed in the same group. The response returns each group's value and its row count.

Note

When the number of groups is large, the results of this aggregation may be approximate.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to group by. The data type of the field must be Long, Double, Boolean, Keyword, or Date.

size

The number of groups to return. The default value is 10. The maximum value is 2,000. If the number of groups exceeds 2,000, only the first 2,000 groups are returned.

group_by_sort

The sorting rule for the returned groups. By default, groups are sorted by row count in descending order. If multiple rules are set, they are applied sequentially. You can sort by group key in lexicographical order, by row count, or by the result of a sub-aggregation in ascending or descending order.

sub_aggs and sub_group_bys

Sub-aggregations and sub-group-bys that perform further analysis on the grouped data. For example, after grouping products by category, you can add Max and Min sub-aggregations to find the highest and lowest price for each category.

Example 1

Group people who are 18 years old by score, and retrieve the 10 most common scores and the number of people for each score.

query = TermQuery('age', 18)
group_by = GroupByField('score', size=10)

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  key: %s, count: %d" % (item.key, item.row_count))

Example 2

Group people who are 18 years old by score, and retrieve the two least common scores and the number of people for each score.

group_by = GroupByField('score', size=2, group_by_sort=[RowCountSort(sort_order=SortOrder.ASC)])

search_response = client.search(table_name, index_name,
    SearchQuery(TermQuery('age', 18), limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  key: %s, count: %d" % (item.key, item.row_count))

Example 3

Group people who are 18 years old by score, retrieve the two most common scores and their counts, and then get information about the top three individuals in each group, sorted by primary key.

sort = RowCountSort(sort_order=SortOrder.DESC)
sub_agg = [TopRows(limit=3, sort=Sort([PrimaryKeySort(sort_order=SortOrder.DESC)]), name='top_rows')]
group_by = GroupByField('score', size=2, group_by_sort=[sort], sub_aggs=sub_agg)

search_response = client.search(table_name, index_name,
    SearchQuery(TermQuery('age', 18), limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  key: %s, count: %d" % (item.key, item.row_count))
        for sub_agg in item.sub_aggs:
            print("    sub_agg: %s" % sub_agg.name)
            for entry in sub_agg.value:
                print("      value: %s" % str(entry))

Example 4

Group people who are 18 years old by score and gender.

sort = RowCountSort(sort_order=SortOrder.ASC)
sub_group = GroupByField('sex', size=10, group_by_sort=[sort])
group_by = GroupByField('score', size=10, group_by_sort=[sort], sub_group_bys=[sub_group])

search_response = client.search(table_name, index_name,
    SearchQuery(TermQuery('age', 18), limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  key: %s, count: %d" % (item.key, item.row_count))
        for sub_group in item.sub_group_bys:
            print("    sub_group: %s" % sub_group.name)
            for sub_item in sub_group.items:
                print("      key: %s, count: %s" % (str(sub_item.key), str(sub_item.row_count)))

Group by range

Groups query results based on specified ranges of a field. Rows with field values that fall into the same range are placed in the same group. The response returns the count of items in each range.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to group by. The data type of the field must be Long or Double.

range[double_from, double_to)

The range for grouping. The start and end values can be set to represent negative and positive infinity.

sub_aggs and sub_group_bys

Sub-aggregations and sub-group-bys that perform further analysis on the grouped data. For example, to find which province contributes the most to sales within a certain sales volume range, you can group by sales volume and then add a GroupByField as a sub-group-by for the province.

Example

Count the number of people who are 18 years old and have scores in the [80, 90) and [90, 100) ranges.

query = TermQuery('age', 18)
group_by = GroupByRange(field_name='score', ranges=[(80, 90), (90, 100)])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  range: %.1f~%.1f, count: %d" % (item.range_from, item.range_to, item.row_count))

Group by geographical location

Groups query results based on the distance from a central point. Rows within the same distance range are placed in the same group. The response returns the count of items in each range.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field must be of the GeoPoint type.

origin(double lat, double lon)

The latitude and longitude of the central point. lat is the latitude and lon is the longitude.

range[double_from, double_to)

The range for grouping, in meters. The start and end values can be set to represent negative and positive infinity.

sub_aggs and sub_group_bys

Sub-aggregations and sub-group-bys that perform further analysis on the grouped data.

Example

Count the number of people who are 18 years old and whose homes are within 1 kilometer, and between 1 and 2 kilometers, from their school. The school's coordinates are (31, 116).

query = TermQuery('age', 18)
group_by = GroupByGeoDistance(field_name='address', origin=GeoPoint(31, 116), ranges=[(0, 1000), (1000, 2000)])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  range: %.1f~%.1f, count: %d" % (item.range_from, item.range_to, item.row_count))

Group by filter

Groups query results based on a set of filters. The response returns the number of documents that match each filter. The results are returned in the same order that the filters are specified.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

filter

The filter conditions. Results are returned in the same order that the filters are specified.

sub_aggs and sub_group_bys

Sub-aggregations and sub-group-bys that perform further analysis on the grouped data.

Example

Count the number of people who are 18 years old and scored 100 in math, and the number of people who scored 100 in Chinese.

query = TermQuery('age', 18)
filter1 = TermQuery('math', 100)
filter2 = TermQuery('chinese', 100)
filters = [filter1, filter2]
group_by = GroupByFilter(filters)

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    i = 0
    for item in group_by_result.items:
        print("  filter: %s=%s, count: %d" % (str(filters[i].field_name), str(filters[i].column_value), item.row_count))
        i += 1

Histogram aggregation

Groups query results based on a specified numeric interval. Rows with field values that fall into the same interval are placed in the same group. The response returns each interval and its corresponding document count.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to retrieve its result.

field

The field to be used for the aggregation. The data type of the field must be Long or Double.

interval

The interval for the aggregation.

field_range[min,max]

The range for the aggregation. This parameter works with interval to limit the number of groups. The value of (fieldRange.max - fieldRange.min) / interval cannot exceed 2,000.

min_doc_count

The minimum document count for a group to be included in the results. Groups with fewer documents than this value are omitted.

missing_value

A default value for documents missing the specified field. If this parameter is set, its value is used in the aggregation. Otherwise, documents missing the field are ignored.

Example

query = TermQuery('product', '10010')
group_by = GroupByHistogram(field_name='latency', interval=100, field_range=FieldRange(0, 10000), missing_value=0)

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.ALL_FROM_INDEX))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  key: %s, value: %s" % (item.key, item.value))

Composite aggregation

Composite aggregation groups query results based on multiple fields, similar to the GROUP BY column1, column2, ... clause in SQL. Unlike group by field value, composite aggregation supports grouping across multiple sources simultaneously and supports pagination to retrieve all grouped results. A maximum of 32 sources are allowed in sources.

Note

Python SDK 6.4.4 and later supports composite aggregation.

Parameters

Parameter

Description

name

A custom name for the aggregation, used to distinguish different aggregations and obtain their corresponding results. The default value is group_by_composite.

sources

The list of sources. Multiple GroupBy types, such as GroupByField, can be used as sources. The sources list must be non-empty and can contain a maximum of 32 sources.

Note
  • The size parameter cannot be set for GroupBy in a source. Simply specify field_name and name. The number of groups is controlled by the size or suggested_size parameter of GroupByComposite.

  • Set the name parameter for GroupBy in a source. The name is returned in the source_group_by_names field, which helps identify each source.

size

The number of groups to return in each response. This is an optional parameter with a default value of 10 and a maximum value of 2000. It precisely controls the number of groups to return. If the specified value exceeds the maximum value, the server returns an error. This parameter cannot be specified at the same time as suggested_size.

next_token

The next token. You do not need to set this for the first request. For subsequent requests, use the next_token from the previous response.

suggested_size

The suggested number of groups. This is an optional parameter that accepts any positive integer or -1. In suggestion mode, if the specified value exceeds the maximum value (2000), it is automatically adjusted to the maximum value without returning an error. A value of -1 also defaults to the maximum value. This is suitable for scenarios such as data exploration and batch processing where you want to retrieve as much data as possible without triggering an out-of-bounds error. This parameter cannot be specified at the same time as size.

sub_aggs and sub_group_bys

Sub-aggregations and sub-group-bys that perform further analysis on the grouped data.

Note
  • In the returned result, the keys for each group is a list of strings that corresponds one-to-one with the sources. When the field value corresponding to a source is empty, the value at the corresponding position in keys is None.

  • When the number of groups is large, we recommend that you set the size or suggested_size parameter and use next_token to paginate through the results to avoid returning too much data at once. When the next_token in the response is None, it indicates that all group results have been retrieved.

Example 1

Group people who are 18 years old by the score field, and get the number of rows for each score value.

query = TermQuery('age', 18)
# When a GroupByField is used as a source, you only need to specify the field_name and name parameters. The size parameter cannot be set.
source = GroupByField('score', name='group_by_score')
group_by = GroupByComposite(sources=[source])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.NONE))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    print("source_names: %s" % str(group_by_result.source_group_by_names))
    for item in group_by_result.items:
        print("  keys: %s, count: %d" % (str(item.keys), item.row_count))

Example 2

Group by the score and sex fields for individuals aged 18 to obtain the number of rows for each score and sex combination.

query = TermQuery('age', 18)
source1 = GroupByField('score', name='group_by_score')
source2 = GroupByField('sex', name='group_by_sex')
group_by = GroupByComposite(sources=[source1, source2])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.NONE))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    print("source_names: %s" % str(group_by_result.source_group_by_names))
    for item in group_by_result.items:
        # The keys parameter is a list of strings that corresponds to the sources in order. If a field value is empty, the corresponding element is None.
        print("  keys: %s, count: %d" % (str(item.keys), item.row_count))

Example 3

Use next_token to retrieve all grouped results in a paginated manner. The results are grouped by the score field, and you can retrieve 2 groups at a time to obtain all results through pagination.

query = TermQuery('age', 18)
source = GroupByField('score')
group_by = GroupByComposite(sources=[source], size=2)

# Initial request
search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.NONE))

group_by_result = search_response.group_by_results[0]
all_items = list(group_by_result.items)

# Paginate to retrieve the remaining results. The operation is complete when next_token is None.
while group_by_result.next_token is not None:
    group_by = GroupByComposite(sources=[source], size=2, next_token=group_by_result.next_token)
    search_response = client.search(table_name, index_name,
        SearchQuery(query, limit=0, get_total_count=True, group_bys=[group_by]),
        ColumnsToGet(return_type=ColumnReturnType.NONE))

    group_by_result = search_response.group_by_results[0]
    all_items.extend(group_by_result.items)

# Print all the grouped results.
for item in all_items:
    print("keys: %s, count: %d" % (str(item.keys), item.row_count))

Example 4

You can use a sub-aggregation to group data by the score field and calculate the maximum value of score for each group.

query = TermQuery('age', 18)
source = GroupByField('score')
sub_agg = Max('score')
group_by = GroupByComposite(sources=[source], sub_aggs=[sub_agg])

search_response = client.search(table_name, index_name,
    SearchQuery(query, limit=0, get_total_count=True, group_bys=[group_by]),
    ColumnsToGet(return_type=ColumnReturnType.NONE))

for group_by_result in search_response.group_by_results:
    print("name: %s" % group_by_result.name)
    for item in group_by_result.items:
        print("  keys: %s, count: %d" % (str(item.keys), item.row_count))
        for sub_agg in item.sub_aggs:
            print("    sub_agg: %s, value: %s" % (sub_agg.name, str(sub_agg.value)))

Related documentation