OpenSearch: distinct clause

Last Updated: Mar 31, 2026

The distinct clause diversifies search results by limiting how many documents that share the same field value appear in a result set. This prevents one dominant value from monopolizing the results page.

Common use cases:

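  • Deduplication: Return only one result per title, company, or product SKU. Set dist_count:1, dist_times:1, and reserved:false to keep exactly one document per field value. With the default reserved:true, the remaining documents are retained after the extracted ones instead of being discarded.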

  • Correcting skew: When one seller or author dominates your top results, use dist_count and dist_times to cap their share and surface documents from other values.

Syntax

distinct=dist_key:<field>,dist_count:<n>,dist_times:<n>,reserved:<true|false>

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| dist_key | string | Yes | | The field to scatter results by. Must be an attribute field of INT or LITERAL type. |
| dist_count | int | No | 1 | Number of documents to extract per scatter operation. |
| dist_times | int | No | 1 | Number of scatter operations to perform. |
| reserved | true/false | No | true | Whether to retain documents that were not extracted. Set to false to discard them. |
| update_total_hit | true/false | No | false | Applies only when reserved is false. When true, the total_hit value is adjusted by subtracting the number of discarded documents, but may still be inaccurate. When false, total_hit includes discarded documents. |
| dist_filter | string | No | | A filter condition that exempts matching documents from scattering. Exempted documents are sorted alongside the first group of scattered documents. By default, all documents are scattered. |
| grade | float | No | | One or more score thresholds (separated by a vertical bar) that split documents into categories before scattering. Each category is scattered independently using the same dist_count and dist_times values. Categories are sorted in the same order as the first category. If omitted, all documents are treated as one category. |

Warning

When reserved is set to false, the total and viewtotal response values become inaccurate. If your application uses these values for pagination or display, see distinct uniq plug-in.
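
As a rough illustration of the syntax and defaults above, the following Python sketch assembles a distinct clause from the four parameters shown in the syntax line. The helper name build_distinct_clause is hypothetical and not part of any OpenSearch SDK; the sketch only formats a string and leaves out update_total_hit, dist_filter, and grade.

```python
def build_distinct_clause(dist_key, dist_count=1, dist_times=1, reserved=True):
    """Format a distinct clause string (hypothetical helper, not an SDK call).

    Defaults mirror the parameter table: dist_count=1, dist_times=1, reserved=true.
    """
    if not dist_key:
        # dist_key is the only required parameter.
        raise ValueError("dist_key is required")
    parts = [
        f"dist_key:{dist_key}",
        f"dist_count:{dist_count}",
        f"dist_times:{dist_times}",
        # The clause uses lowercase true/false rather than Python's True/False.
        f"reserved:{'true' if reserved else 'false'}",
    ]
    return "distinct=" + ",".join(parts)


print(build_distinct_clause("company_id", dist_count=2, dist_times=10))
# distinct=dist_key:company_id,dist_count:2,dist_times:10,reserved:true
```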

How dist_count and dist_times work

dist_count controls how many documents to extract per operation. dist_times controls how many operations to run. The extracted documents are placed at the front of the result set, in the order of each operation.

Test data:

| Document | id | name |
| --- | --- | --- |
| doc1 | 1 | a |
| doc2 | 2 | a |
| doc3 | 3 | a |
| doc4 | 4 | b |
| doc5 | 5 | c |
| doc6 | 6 | c |

Example 1 — Extract 2 documents per operation, 1 operation:

distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false

Result: doc1, doc2, doc4, doc5, doc6

Each name value contributes up to 2 documents. name:a contributes doc1 and doc2; name:b contributes doc4; name:c contributes doc5 and doc6. doc3 (third document with name:a) is discarded.

Example 2 — Extract 1 document per operation, 2 operations:

distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false

Result: doc1, doc4, doc5, doc2, doc6

Operation 1 takes the top document from each value: doc1 (a), doc4 (b), doc5 (c). Operation 2 takes the next document from each value: doc2 (a), doc6 (c). doc3 is discarded.

Example 3 — Extract 1 document per operation, 1 operation:

distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false

Result: doc1, doc4, doc5

One document per value is kept. All remaining documents are discarded.
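
The three examples can be reproduced with a small client-side simulation. This is only a sketch of the behavior described above, not the engine's implementation. It assumes that the input list is already in ranked order, that each operation visits documents in that order, and that retained documents (reserved=true) follow the extracted ones in their original order. The function name scatter is hypothetical.

```python
from collections import defaultdict


def scatter(docs, key, dist_count=1, dist_times=1, reserved=True):
    """Simulate distinct scattering over docs, which must already be ranked."""
    used = set()   # indexes of documents already extracted
    result = []
    for _ in range(dist_times):
        taken = defaultdict(int)  # documents taken per field value in this operation
        for i, doc in enumerate(docs):
            if i in used:
                continue
            if taken[doc[key]] < dist_count:
                taken[doc[key]] += 1
                used.add(i)
                result.append(doc)
    if reserved:
        # Assumption: non-extracted documents keep their original order at the back.
        result.extend(doc for i, doc in enumerate(docs) if i not in used)
    return result


DOCS = [
    {"doc": "doc1", "name": "a"},
    {"doc": "doc2", "name": "a"},
    {"doc": "doc3", "name": "a"},
    {"doc": "doc4", "name": "b"},
    {"doc": "doc5", "name": "c"},
    {"doc": "doc6", "name": "c"},
]

for count, times in [(2, 1), (1, 2), (1, 1)]:
    out = scatter(DOCS, "name", dist_count=count, dist_times=times, reserved=False)
    print(count, times, [d["doc"] for d in out])
# 2 1 ['doc1', 'doc2', 'doc4', 'doc5', 'doc6']
# 1 2 ['doc1', 'doc4', 'doc5', 'doc2', 'doc6']
# 1 1 ['doc1', 'doc4', 'doc5']
```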

grade parameter

Use grade to classify documents into score-based categories before scattering. Each category is scattered independently using the same dist_count and dist_times values. Categories are sorted in the same order as the first category.

Single threshold: grade:3.0 creates two categories:

| Category | Score range |
| --- | --- |
| First | score < 3.0 |
| Second | score >= 3.0 |

Two thresholds: grade:3.0|5.0 creates three categories:

| Category | Score range |
| --- | --- |
| First | score < 3.0 |
| Second | 3.0 <= score < 5.0 |
| Third | score >= 5.0 |

There is no limit on the number of thresholds.
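
The mapping from score to category can be sketched in Python with a binary search over the thresholds. The function names parse_grade and category_index are hypothetical, and the sketch assumes that the thresholds are listed in ascending order.

```python
from bisect import bisect_right


def parse_grade(grade):
    """Split a grade value such as '3.0|5.0' into a list of float thresholds."""
    return [float(t) for t in grade.split("|")]


def category_index(score, thresholds):
    """Return 0 for the first category, 1 for the second, and so on.

    bisect_right places a score equal to a threshold in the higher category,
    which matches the 'score >= threshold' boundaries in the tables above.
    """
    return bisect_right(thresholds, score)


thresholds = parse_grade("3.0|5.0")
for score in (2.9, 3.0, 4.9, 5.0):
    print(score, category_index(score, thresholds))
# 2.9 0
# 3.0 1
# 4.9 1
# 5.0 2
```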

Usage notes

  • The distinct clause is optional.

  • Fields referenced in dist_key must be configured as attribute fields in the application schema.

  • dist_key supports only INT and LITERAL field types. ARRAY fields are not supported.

  • Specify only one field per distinct clause.

  • The sort feature does not remove duplicates. To deduplicate by a field (for example, title), use a distinct clause with dist_count:1, dist_times:1, and reserved:false.

distinct uniq plug-in

When reserved is false, the total and viewtotal response values are inaccurate. The distinct uniq plug-in corrects these values, but only when dist_times is 1, dist_count is 1, and reserved is false.

To enable the plug-in, add duniqfield:<field> to the kvpairs clause:

kvpairs=duniqfield:<field>

The <field> value must match dist_key.

Limitations:

  • Works only when dist_times=1, dist_count=1, and reserved=false. Changing any of these values disables the plug-in.

  • Returns a maximum of 5,000 results per query, even if more results match.

  • May time out on queries that hit millions of records.
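
Because the plug-in stops working when any of these conditions is not met, it can help to check the combination on the client before sending a query. The following Python sketch shows one way to do that; the function name uniq_plugin_applies is hypothetical, and the check covers only the conditions listed above.

```python
def uniq_plugin_applies(dist_key, dist_count, dist_times, reserved, duniqfield):
    """Return True when the documented conditions for the distinct uniq plug-in hold."""
    return (
        dist_count == 1
        and dist_times == 1
        and reserved is False
        and duniqfield == dist_key  # duniqfield must name the same field as dist_key
    )


print(uniq_plugin_applies("company_id", 1, 1, False, "company_id"))  # True
print(uniq_plugin_applies("company_id", 2, 1, False, "company_id"))  # False
```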

Examples

Diversify results by company, keep all documents:

Search for documents containing "Zhejiang University" with create_time > 1402301230, and scatter by company_id using 10 operations of 2 documents each. Because reserved defaults to true, non-extracted documents are retained and ranked after the extracted ones.

query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10

Deduplicate by company with accurate result count:

Search for documents containing "Zhejiang University", keep only one document per company_id, and use the distinct uniq plug-in to get accurate total and viewtotal values.

query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id