
distinct clause

Last Updated: Sep 09, 2021

You can use distinct clauses to diversify search results and improve user experience. For example, a query may retrieve a large number of documents in which several documents of one user have high scores and rank near the top. As a result, most of the results displayed on the same page come from that user, which degrades the display and the user experience. To resolve this issue, you can use a distinct clause to extract a limited number of documents from each user so that documents from different users are displayed.

Syntax

Syntax format: dist_key:field,dist_count:1,dist_times:1,reserved:false.
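The following Python helper is a minimal sketch that assembles a distinct clause from individual settings. The function name build_distinct_clause is hypothetical and is not part of any OpenSearch SDK. Parameters that are left unset are omitted so that their defaults apply, and values are inserted verbatim (no escaping is attempted).

```python
from typing import List, Optional


def build_distinct_clause(
    dist_key: str,
    dist_count: Optional[int] = None,
    dist_times: Optional[int] = None,
    reserved: Optional[bool] = None,
    update_total_hit: Optional[bool] = None,
    dist_filter: Optional[str] = None,
    grade: Optional[List[float]] = None,
) -> str:
    """Assemble a distinct clause as comma-separated key:value pairs.

    Hypothetical helper: unset parameters are left out so that their
    defaults apply, and values are inserted verbatim without escaping.
    """
    if not dist_key:
        raise ValueError("dist_key is required")
    parts = [f"dist_key:{dist_key}"]
    if dist_count is not None:
        parts.append(f"dist_count:{dist_count}")
    if dist_times is not None:
        parts.append(f"dist_times:{dist_times}")
    if reserved is not None:
        # Booleans are written in lowercase, as in the syntax format.
        parts.append(f"reserved:{str(reserved).lower()}")
    if update_total_hit is not None:
        parts.append(f"update_total_hit:{str(update_total_hit).lower()}")
    if dist_filter is not None:
        parts.append(f"dist_filter:{dist_filter}")
    if grade:
        # Multiple thresholds are separated by vertical bars, e.g. 3.0|5.0.
        parts.append("grade:" + "|".join(str(t) for t in grade))
    return ",".join(parts)


print(build_distinct_clause("field", dist_count=1, dist_times=1, reserved=False))
# dist_key:field,dist_count:1,dist_times:1,reserved:false
```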

The following parameters are supported. For each parameter, the type, whether it is required, and the default value (if any) are given in parentheses.

  • dist_key (string, required): The field based on which documents are scattered.

  • dist_times (int, optional, default: 1): The number of extractions.

  • dist_count (int, optional, default: 1): The number of documents extracted for each value of dist_key in one extraction.

  • reserved (true/false, optional, default: true): Specifies whether to retain the documents that remain after extraction. If this parameter is set to false, the remaining documents are discarded. As a result, the total number of matching results is inaccurate.

  • update_total_hit (true/false, optional, default: false): This parameter is relevant only when reserved is set to false. If update_total_hit is set to true, the discarded documents are excluded from the total number of matching results. If it is set to false, the discarded documents are still included in the total.

  • dist_filter (string, optional): A filter condition. Documents that are filtered out by this condition are not scattered; instead, they are sorted together with the first group of scattered documents. If this parameter is not set, all documents are scattered.

  • grade (float, optional): One or more score thresholds that divide documents into categories. The documents in each category are scattered separately based on the other parameters of the distinct clause. Separate multiple thresholds with vertical bars (|). The number of thresholds is not limited. If this parameter is not set, all documents belong to a single category.
    Example 1: grade:3.0 divides documents into two categories. Documents with a score less than 3.0 are in the first category, and documents with a score greater than or equal to 3.0 are in the second category.
    Example 2: grade:3.0|5.0 divides documents into three categories. Documents with a score less than 3.0 are in the first category, documents with a score greater than or equal to 3.0 but less than 5.0 are in the second category, and documents with a score greater than or equal to 5.0 are in the third category.
    The categories are sorted in the same order as the first sort dimension: if documents are sorted in descending order by the first sort dimension, the categories are also sorted in descending order, and vice versa.
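To make the grade thresholds concrete, the following Python sketch assigns a document score to a category using the rule from the grade examples: a score equal to a threshold falls into the higher category. The helper names parse_grade and grade_category are hypothetical. The thresholds are sorted as a precaution (an assumption; the parameter description does not say whether they must be given in ascending order), and how the categories are then ordered and scattered is not modeled.

```python
import bisect
from typing import List


def parse_grade(value: str) -> List[float]:
    """Parse a grade value such as "3.0|5.0" into ascending thresholds."""
    return sorted(float(t) for t in value.split("|"))


def grade_category(score: float, thresholds: List[float]) -> int:
    """Return the 0-based category index for a document score.

    bisect_right places a score equal to a threshold in the higher
    category, matching "greater than or equal to 3.0" in the examples.
    """
    return bisect.bisect_right(thresholds, score)


thresholds = parse_grade("3.0|5.0")
for score in (2.5, 3.0, 4.9, 5.0, 7.2):
    # Print 1-based category numbers to match "first", "second", "third".
    print(score, "-> category", grade_category(score, thresholds) + 1)
```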

Description of the dist_count and dist_times parameters

The following examples show how the dist_count and dist_times parameters work. Assume that a query returns the following six documents, ranked in the order listed. id is the primary key, and name is the field based on which documents are scattered.

doc1: id:11 name:a

doc2: id:22 name:a

doc3: id:33 name:a

doc4: id:44 name:b

doc5: id:55 name:c

doc6: id:66 name:c

Example 1: distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false. One extraction is performed, and up to two documents are extracted for each name value. The following results are obtained after scattering: doc1, doc2, doc4, doc5, and doc6. doc3 is discarded because reserved is set to false.

Example 2: distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false. Two extractions are performed, and one document is extracted for each name value in each extraction. The first extraction returns doc1, doc4, and doc5, and the second returns doc2 and doc6. The following results are obtained after scattering: doc1, doc4, doc5, doc2, and doc6.

Example 3: distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false. One extraction is performed, and one document is extracted for each name value. The following results are obtained after scattering: doc1, doc4, and doc5.
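The three examples can be reproduced with a small Python simulation. This is a sketch of the behavior described above, not OpenSearch's implementation. It assumes that the input list is already in rank order and that each extraction keeps that order. The handling of reserved set to true (leftover documents appended in their original order) is also an assumption, because the examples only use reserved:false. The scatter function is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Doc = Tuple[int, str]  # (id, value of the dist_key field)


def scatter(docs: List[Doc], dist_count: int = 1, dist_times: int = 1,
            reserved: bool = True) -> List[int]:
    """Approximate the documented distinct behavior on a ranked doc list.

    In each of dist_times extractions, take up to dist_count documents that
    have not been extracted yet for every dist_key value, in rank order.
    Extractions are concatenated. Leftover documents are appended after the
    extracted ones if reserved is True (assumption), else dropped.
    """
    taken = [False] * len(docs)
    result: List[int] = []
    for _ in range(dist_times):
        per_key: Dict[str, int] = defaultdict(int)  # docs taken per key this round
        for i, (doc_id, key) in enumerate(docs):
            if not taken[i] and per_key[key] < dist_count:
                per_key[key] += 1
                taken[i] = True
                result.append(doc_id)
    if reserved:
        result.extend(doc_id for i, (doc_id, _) in enumerate(docs) if not taken[i])
    return result


docs = [(11, "a"), (22, "a"), (33, "a"), (44, "b"), (55, "c"), (66, "c")]
print(scatter(docs, dist_count=2, dist_times=1, reserved=False))  # [11, 22, 44, 55, 66]
print(scatter(docs, dist_count=1, dist_times=2, reserved=False))  # [11, 44, 55, 22, 66]
print(scatter(docs, dist_count=1, dist_times=1, reserved=False))  # [11, 44, 55]
```

The ids printed correspond to doc1 through doc6 in the examples (11 is doc1, 22 is doc2, and so on).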

Usage notes

  1. distinct clauses are optional.

  2. The fields referenced in a distinct clause must be configured as attribute fields when you define the application schema.

  3. The ARRAY type is not supported. Only the INT and LITERAL types are supported.

  4. You can specify only one field to be scattered.
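The usage notes can be expressed as a pre-flight check on the application schema before a query is sent. In the following Python sketch, FieldSpec and check_dist_key are hypothetical, and type names such as LITERAL_ARRAY are placeholders rather than confirmed OpenSearch type identifiers. The check mirrors notes 2 and 3; note 4 is reflected by dist_key accepting a single field name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FieldSpec:
    """Hypothetical description of one field in an application schema."""
    field_type: str      # placeholder names, e.g. "INT", "LITERAL", "INT_ARRAY"
    is_attribute: bool   # True if the field is configured as an attribute field


SUPPORTED_TYPES = {"INT", "LITERAL"}  # usage note 3


def check_dist_key(dist_key: str, schema: Dict[str, FieldSpec]) -> None:
    """Raise ValueError if dist_key violates the usage notes."""
    spec = schema.get(dist_key)
    if spec is None:
        raise ValueError(f"{dist_key!r} is not defined in the schema")
    if not spec.is_attribute:  # usage note 2
        raise ValueError(f"{dist_key!r} must be configured as an attribute field")
    if spec.field_type not in SUPPORTED_TYPES:
        raise ValueError(
            f"{dist_key!r} has type {spec.field_type}; only INT and LITERAL are supported")


schema = {
    "company_id": FieldSpec("INT", True),
    "tags": FieldSpec("LITERAL_ARRAY", True),
    "title": FieldSpec("TEXT", False),
}
check_dist_key("company_id", schema)  # passes silently
```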

distinct uniq plug-in

As described above, if the reserved parameter is set to false, the total and viewtotal values returned in search results are inaccurate, and paging or other processing based on these values may produce errors. To address this, OpenSearch provides the distinct uniq plug-in, which keeps the total and viewtotal values accurate when the dist_times, dist_count, and reserved parameters are set to 1, 1, and false, respectively. To use the plug-in, set kvpairs to duniqfield:field. Example: kvpairs=duniqfield:name.

Notes:

  • The value of the field parameter must be the same as the value of the dist_key parameter in the distinct clause.

  • This plug-in works only when the dist_times, dist_count, and reserved parameters are set to 1, 1, and false, respectively. If any of these parameters is set to a different value, the plug-in does not take effect.

  • For performance reasons, this plug-in returns at most 5,000 search results per query, even if more results actually match.
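Because the plug-in only takes effect for one exact combination of settings, a client may want to derive the kvpairs value from the same settings it uses for the distinct clause. The following Python sketch does that; uniq_kvpairs is a hypothetical helper, and it returns an empty string when the settings fall outside the 1, 1, false combination.

```python
def uniq_kvpairs(dist_key: str, dist_count: int, dist_times: int,
                 reserved: bool) -> str:
    """Return the kvpairs value for the distinct uniq plug-in, or "" when
    the plug-in would not take effect for these distinct settings."""
    if (dist_times, dist_count, reserved) == (1, 1, False):
        # duniqfield must name the same field as dist_key.
        return f"duniqfield:{dist_key}"
    return ""


print(uniq_kvpairs("name", dist_count=1, dist_times=1, reserved=False))  # duniqfield:name
```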

Examples

  1. You want to search for documents that contain "Zhejiang University" and whose create_time value is greater than 1402301230. The retrieved documents are scattered based on the company_id field. A total of 10 extractions are performed, and in each extraction, two documents are extracted for each company_id value. The remaining documents are ranked after the extracted documents.

    query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10
  2. You want to search for documents that contain "Zhejiang University". The retrieved documents are scattered based on the company_id field. One extraction is performed, and one document is extracted for each company_id value. The remaining documents are discarded, so only the extracted documents are returned. The distinct uniq plug-in is enabled by using kvpairs=duniqfield:company_id so that the total number of results stays accurate.

    query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
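Both example queries join clauses with &&. The following Python sketch rebuilds them from individual clauses so that they can be generated programmatically. build_query is hypothetical, and any URL encoding or other request preparation is out of scope.

```python
from typing import Dict


def build_query(clauses: Dict[str, str]) -> str:
    """Join clauses as name=value pairs separated by &&, in insertion order."""
    return "&&".join(f"{name}={value}" for name, value in clauses.items())


# Example 1: filter plus distinct with 10 extractions of 2 documents each.
q1 = build_query({
    "query": "default:'Zhejiang University'",
    "filter": "create_time>1402301230",
    "distinct": "dist_key:company_id,dist_count:2,dist_times:10",
})

# Example 2: one extraction of 1 document, remaining documents discarded,
# and the distinct uniq plug-in enabled on the same field as dist_key.
q2 = build_query({
    "query": "default:'Zhejiang University'",
    "distinct": "dist_key:company_id,dist_count:1,dist_times:1,reserved:false",
    "kvpairs": "duniqfield:company_id",
})

print(q1)
print(q2)
```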