aggregate clause

Last Updated: Sep 09, 2021

Overview

A single search query can match tens of thousands of documents, which is far more than you can view one by one. In such cases, you often need statistics about the matched documents rather than the documents themselves. The aggregate clause collects these statistics.

Function

Syntax of the aggregate clause:

group_key:field, range:number1~number2, agg_fun:func1#func2, agg_filter:filter_clause, agg_sampler_threshold:number, agg_sampler_step:number, max_group:number

Parameters:

| Parameter | Type | Required | Valid value | Default value | Description |
|---|---|---|---|---|---|
| group_key | field (an attribute field) | Yes | Fields of the INT, LITERAL, INT_ARRAY, or LITERAL_ARRAY type. If the field is of the INT_ARRAY or LITERAL_ARRAY type and an item in the array is repeated, each occurrence is counted. | | Specifies the name of the field on which you want to collect statistics. The field must be an attribute field. |
| agg_fun | | Yes | The built-in functions count(), sum(), max(), and min() | | Set func to count(), sum(), max(), or min() to calculate the number of documents, the sum of field values, the maximum field value, or the minimum field value. To use multiple functions, separate them with number signs (#). In sum(), max(), and min(), you can combine multiple fields by using basic arithmetic operators, for example max(hits+replies). |
| range | | No | number1~number2, which covers the values between number1 and number2 and the values greater than number2. Distribution statistics are not supported for fields of the STRING type. | | Collects statistics by value range. Use this parameter to obtain the distribution of the data. Only one field can be referenced. |
| agg_filter | | No | | | Collects statistics only on documents that meet the specified filter condition. |
| agg_sampler_threshold | INT | No | | | Specifies the sampling threshold. Documents ranked before the threshold are counted one by one. Documents ranked after the threshold are sampled at the interval specified by agg_sampler_step. |
| agg_sampler_step | INT | No | | | Specifies the sampling step size, which is the interval at which documents ranked after the agg_sampler_threshold value are sampled. For sum() and count(), the system multiplies the statistics of the sampled documents by the step size to obtain an estimate, and then adds the estimate to the statistics of the documents ranked before the threshold to produce the final result. |
| max_group | INT | No | | 1000 | Specifies the maximum number of groups (key-items pairs) that can be returned. |
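The estimation described for agg_sampler_threshold and agg_sampler_step can be sketched locally as follows. This is only an illustration of the arithmetic, not the engine's implementation. The function name estimate_with_sampling is hypothetical, and the choice of which document in each step is sampled (here, the first one) is an assumption that the documentation does not specify.

```python
from typing import List, Tuple


def estimate_with_sampling(values: List[float], threshold: int, step: int) -> Tuple[float, int]:
    """Estimate sum() and count() over ranked documents with sampling.

    values holds one field value per matched document, in rank order
    (best-ranked document first).
    """
    if threshold < 0 or step < 1:
        raise ValueError("threshold must be >= 0 and step must be >= 1")

    head = values[:threshold]   # documents before the threshold: counted one by one
    tail = values[threshold:]   # documents after the threshold: sampled
    # ASSUMPTION: every step-th document is taken, starting with the first one.
    sampled = tail[::step]

    # Statistics of the sampled documents are multiplied by the step size,
    # then added to the exact statistics of the head.
    estimated_sum = sum(head) + sum(sampled) * step
    estimated_count = len(head) + len(sampled) * step
    return estimated_sum, estimated_count


# Twelve documents with values 1..12, threshold 2, step 5.
est_sum, est_count = estimate_with_sampling(list(range(1, 13)), threshold=2, step=5)
print(est_sum, est_count)
```

The estimated sum generally differs from the exact sum, which is the trade-off that sampling makes in exchange for less work on low-ranked documents.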

Usage notes

  • The aggregate clause is optional.

  • The fields that are referenced in the preceding parameters must be configured as attribute fields when you define the application schema.

  • The result of the aggregate clause is returned in the facet node of the search result. Each statistic appears under the name of the function specified in agg_fun, such as sum or count.

  • You can collect statistics on multiple fields in a single aggregate clause by separating the per-field definitions with semicolons (;).

  • To display the statistics in the return result, you must set the format in the config clause to full JSON (fulljson).

  • Due to engine performance limits, the aggregate clause returns accurate statistics for a maximum of 100,000 documents. Statistics for documents beyond this limit may be inaccurate.
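The syntax and the semicolon rule above can be illustrated with a small helper that assembles an aggregate clause from its parts. This is a minimal sketch: build_group and build_aggregate are hypothetical helper names, not part of any SDK, and the order in which optional parameters are emitted is an assumption based on the order used in the examples in this topic.

```python
from typing import Optional, Sequence, Tuple


def build_group(
    group_key: str,
    agg_fun: Sequence[str],
    value_range: Optional[Tuple[int, int]] = None,
    agg_filter: Optional[str] = None,
    sampler_threshold: Optional[int] = None,
    sampler_step: Optional[int] = None,
    max_group: Optional[int] = None,
) -> str:
    """Build the statistics definition for one group_key field."""
    if not agg_fun:
        raise ValueError("agg_fun is required")
    # Required parameters first; multiple functions are joined with '#'.
    parts = [f"group_key:{group_key}", "agg_fun:" + "#".join(agg_fun)]
    if value_range is not None:
        parts.append(f"range:{value_range[0]}~{value_range[1]}")
    if agg_filter is not None:
        parts.append(f"agg_filter:{agg_filter}")
    if sampler_threshold is not None:
        parts.append(f"agg_sampler_threshold:{sampler_threshold}")
    if sampler_step is not None:
        parts.append(f"agg_sampler_step:{sampler_step}")
    if max_group is not None:
        parts.append(f"max_group:{max_group}")
    return ",".join(parts)


def build_aggregate(*groups: str) -> str:
    """Join per-field definitions with semicolons into one aggregate clause."""
    return "aggregate=" + ";".join(groups)


clause = build_aggregate(
    build_group("group_id", ["sum(price)", "max(price)"]),
    build_group("company_id", ["count()"]),
)
print(clause)
```

The helper only concatenates strings. It does not URL-encode the clause or check that the referenced fields are attribute fields.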

Examples

  1. Use the following query clause to search for documents that contain "浙大" and collect statistics grouped by the group_id and company_id fields. For each group_id value, the sum and the maximum value of the price field are calculated. For each company_id value, the number of matched documents is counted.

    query=default:'浙大'&&aggregate=group_key:group_id,agg_fun:sum(price)#max(price);group_key:company_id,agg_fun:count()

    Sample return result:

    {
      status: "OK",
      result: {
        searchtime: 0.015634,
        total: 5,
        num: 1,
        viewtotal: 5,
        items: [        //The return result.
          { ... }
        ],
        facet: [
          {
            key: "group_id",
            items: [
              {
                value: 43,
                sum: 81,
                max: 20,
              },
              {
                value: 63,
                sum: 91,
                max: 50,
              },
            ],
          },
          {
            key: "company_id",
            items: [
              {
                value: 13,
                count: 4,
              },
              {
                value: 10,
                count: 1,
              },
            ],
          },
        ],
      },
      errors: [ ],
      tracer: "",
    }

  2. Use the following query clause to search for documents that contain "浙大" and calculate the sum of the price field for each group_id value. The first 10,000 ranked documents are counted one by one, and the documents ranked after them are sampled at a step size of 5.

    query=default:'浙大'&&aggregate=group_key:group_id,agg_fun:sum(price),agg_sampler_threshold:10000,agg_sampler_step:5

  3. Use the following query clause to search for documents that contain "浙大" and collect distribution statistics on the group_id field. The query counts the documents whose group_id value is less than 10, between 10 and 50, and greater than 50.

    query=default:'浙大'&&aggregate=group_key:group_id,agg_fun:count(),range:10~50

  4. Use the following query clause to search for documents that contain "浙大" and collect statistics grouped by the group_id field. Only documents whose create_timestamp value is greater than 1423456781 are included. For each group, the maximum value of hits + replies is calculated.

    query=default:'浙大'&&aggregate=group_key:group_id,agg_fun:max(hits+replies),agg_filter:create_timestamp>1423456781
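Once the response has been parsed, the facet node can be indexed by group key and group value so that individual statistics are easy to look up. The following sketch assumes the response has already been decoded into a Python dict with the structure of the sample return result in example 1. The function name facet_to_dict is hypothetical, and sample_result reproduces only the facet part of that sample.

```python
from typing import Any, Dict


def facet_to_dict(result: Dict[str, Any]) -> Dict[str, Dict[Any, Dict[str, Any]]]:
    """Index facet statistics as stats[group_key][group_value][function_name]."""
    stats: Dict[str, Dict[Any, Dict[str, Any]]] = {}
    for group in result.get("facet", []):
        stats[group["key"]] = {
            # Everything except "value" is a statistic named after its function.
            item["value"]: {name: v for name, v in item.items() if name != "value"}
            for item in group["items"]
        }
    return stats


# The facet node from the sample return result in example 1.
sample_result = {
    "total": 5,
    "facet": [
        {
            "key": "group_id",
            "items": [
                {"value": 43, "sum": 81, "max": 20},
                {"value": 63, "sum": 91, "max": 50},
            ],
        },
        {
            "key": "company_id",
            "items": [
                {"value": 13, "count": 4},
                {"value": 10, "count": 1},
            ],
        },
    ],
}

stats = facet_to_dict(sample_result)
print(stats["group_id"][43]["sum"])     # 81
print(stats["company_id"][13]["count"])  # 4
```

A result without a facet node, for example when no aggregate clause was sent, yields an empty dict.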