The distinct clause diversifies search results by limiting how many documents from the same field value appear in a result set. This prevents one dominant value from monopolizing the results page.
Common use cases:
Deduplication: Return only one result per title, company, or product SKU. Set dist_count:1 and dist_times:1 to keep exactly one document per field value.
Correcting skew: When one seller or author dominates your top results, use dist_count and dist_times to cap their share and surface documents from other values.
Syntax
distinct=dist_key:<field>,dist_count:<n>,dist_times:<n>,reserved:<true|false>
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| dist_key | string | Yes | - | The field to scatter results by. Must be an attribute field of INT or LITERAL type. |
| dist_count | int | No | 1 | Number of documents to extract per scatter operation. |
| dist_times | int | No | 1 | Number of scatter operations to perform. |
| reserved | true/false | No | true | Whether to retain documents that were not extracted. Set to false to discard them. |
| update_total_hit | true/false | No | false | Applies only when reserved is false. When true, the total_hit value is adjusted by subtracting the number of discarded documents, but may still be inaccurate. When false, total_hit includes discarded documents. |
| dist_filter | string | No | - | A filter condition that exempts matching documents from scattering. Exempted documents are sorted alongside the first group of scattered documents. By default, all documents are scattered. |
| grade | float | No | - | One or more score thresholds (separated by \|) that split documents into categories before scattering. Each category is scattered independently using the same dist_count and dist_times values. Categories are sorted in the same order as the first category. If omitted, all documents are treated as one category. |
When reserved is set to false, the total and viewtotal response values become inaccurate. If your application uses these values for pagination or display, see distinct uniq plug-in.
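As a rough illustration of the syntax above, the following Python sketch assembles a distinct clause string from the four basic parameters. The helper name `build_distinct_clause` is hypothetical and not part of any SDK. It only formats text and does not validate the field against the application schema.

```python
def build_distinct_clause(dist_key, dist_count=1, dist_times=1, reserved=True):
    """Format a distinct clause string (hypothetical helper).

    The defaults mirror the parameter table: dist_count=1, dist_times=1,
    reserved=true.
    """
    parts = [
        f"dist_key:{dist_key}",
        f"dist_count:{dist_count}",
        f"dist_times:{dist_times}",
        # The clause uses lowercase true/false, not Python's True/False.
        f"reserved:{'true' if reserved else 'false'}",
    ]
    return "distinct=" + ",".join(parts)


print(build_distinct_clause("name", dist_count=2, dist_times=1, reserved=False))
# distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false
```

Optional parameters such as update_total_hit, dist_filter, and grade are left out of this sketch on purpose.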
How dist_count and dist_times work
dist_count controls how many documents to extract per operation. dist_times controls how many operations to run. The extracted documents are placed at the front of the result set, in the order of each operation.
Test data:
| Document | id | name |
|---|---|---|
| doc1 | 11 | a |
| doc2 | 22 | a |
| doc3 | 33 | a |
| doc4 | 44 | b |
| doc5 | 55 | c |
| doc6 | 66 | c |
Example 1 — Extract 2 documents per operation, 1 operation:
distinct=dist_key:name,dist_count:2,dist_times:1,reserved:false
Result: doc1, doc2, doc4, doc5, doc6
Each name value contributes up to 2 documents. name:a contributes doc1 and doc2; name:b contributes doc4; name:c contributes doc5 and doc6. doc3 (third document with name:a) is discarded.
Example 2 — Extract 1 document per operation, 2 operations:
distinct=dist_key:name,dist_count:1,dist_times:2,reserved:false
Result: doc1, doc4, doc5, doc2, doc6
Operation 1 takes the top document from each value: doc1 (a), doc4 (b), doc5 (c). Operation 2 takes the next document from each value: doc2 (a), doc6 (c). doc3 is discarded.
Example 3 — Extract 1 document per operation, 1 operation:
distinct=dist_key:name,dist_count:1,dist_times:1,reserved:false
Result: doc1, doc4, doc5
One document per value is kept. All remaining documents are discarded.
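The three examples can be reproduced with a small client-side model. This is only a sketch of the behavior described here, not the engine's actual implementation. It assumes the documents arrive already ranked in the order doc1 through doc6, and the function name `scatter` is made up for illustration.

```python
from collections import defaultdict


def scatter(docs, key, dist_count=1, dist_times=1, reserved=True):
    """Simplified model of dist_count/dist_times (not the real engine).

    docs must already be in ranked order. Each round takes up to
    dist_count documents per key value; extracted documents keep the
    order of the round that extracted them.
    """
    remaining = list(docs)
    result = []
    for _ in range(dist_times):
        taken_this_round = defaultdict(int)
        leftover = []
        for doc in remaining:
            if taken_this_round[doc[key]] < dist_count:
                taken_this_round[doc[key]] += 1
                result.append(doc)
            else:
                leftover.append(doc)
        remaining = leftover
    if reserved:
        # Non-extracted documents are kept and placed at the back.
        result.extend(remaining)
    return result


# Test data from the table above.
DOCS = [
    {"doc": "doc1", "id": 11, "name": "a"},
    {"doc": "doc2", "id": 22, "name": "a"},
    {"doc": "doc3", "id": 33, "name": "a"},
    {"doc": "doc4", "id": 44, "name": "b"},
    {"doc": "doc5", "id": 55, "name": "c"},
    {"doc": "doc6", "id": 66, "name": "c"},
]


def names(result):
    return [d["doc"] for d in result]


print(names(scatter(DOCS, "name", 2, 1, reserved=False)))
# ['doc1', 'doc2', 'doc4', 'doc5', 'doc6']
print(names(scatter(DOCS, "name", 1, 2, reserved=False)))
# ['doc1', 'doc4', 'doc5', 'doc2', 'doc6']
print(names(scatter(DOCS, "name", 1, 1, reserved=False)))
# ['doc1', 'doc4', 'doc5']
```

With reserved left at true, the same model appends the non-extracted documents after the extracted ones instead of dropping them.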
grade parameter
Use grade to classify documents into score-based categories before scattering. Each category is scattered independently using the same dist_count and dist_times values. Categories are sorted in the same order as the first category.
Single threshold — grade:3.0 creates two categories:
| Category | Score range |
|---|---|
| First | score < 3.0 |
| Second | score >= 3.0 |
Two thresholds — grade:3.0|5.0 creates three categories:
| Category | Score range |
|---|---|
| First | score < 3.0 |
| Second | 3.0 <= score < 5.0 |
| Third | score >= 5.0 |
There is no limit on the number of thresholds.
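The threshold rules in the tables above amount to a sorted-boundary lookup. The sketch below shows one way to model the category assignment in Python. The names `parse_grade` and `grade_category` are hypothetical, and the code only illustrates the ranges listed here, not how the engine computes scores.

```python
from bisect import bisect_right


def parse_grade(grade):
    """Turn a grade value such as '3.0|5.0' into a sorted list of floats."""
    return sorted(float(t) for t in grade.split("|"))


def grade_category(score, thresholds):
    """Return the 0-based category index for a score.

    Index 0 is the first category (score below the lowest threshold).
    A score equal to a threshold falls into the higher category,
    matching the >= bounds in the tables.
    """
    return bisect_right(thresholds, score)


thresholds = parse_grade("3.0|5.0")
for score in (2.9, 3.0, 4.9, 5.0):
    print(score, "-> category", grade_category(score, thresholds) + 1)
# 2.9 -> category 1
# 3.0 -> category 2
# 4.9 -> category 2
# 5.0 -> category 3
```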
Usage notes
The distinct clause is optional.
Fields referenced in dist_key must be configured as attribute fields in the application schema.
dist_key supports only INT and LITERAL field types. ARRAY fields are not supported.
Specify only one field per distinct clause.
The sort feature does not remove duplicates. To deduplicate by a field (for example, title), use a distinct clause with dist_count:1 and dist_times:1.
distinct uniq plug-in
When reserved is false, the total and viewtotal response values are inaccurate. The distinct uniq plug-in corrects these values, but only when dist_times, dist_count, and reserved are set to 1, 1, and false, respectively.
To enable the plug-in, add duniqfield:<field> to the kvpairs clause:
kvpairs=duniqfield:<field>
The <field> value must match dist_key.
Limitations:
Works only when dist_times=1, dist_count=1, and reserved=false. Changing any of these values disables the plug-in.
Returns a maximum of 5,000 results per query, even if more results match.
May time out on queries that hit millions of records.
Examples
Diversify results by company, keep all documents:
Search for documents containing "Zhejiang University" with create_time > 1402301230, scatter by company_id with 10 operations of 2 documents each. Non-extracted documents are retained and ranked at the back.
query=default:'Zhejiang University'&&filter=create_time>1402301230&&distinct=dist_key:company_id,dist_count:2,dist_times:10
Deduplicate by company with accurate result count:
Search for documents containing "Zhejiang University", keep only one document per company_id, and use the distinct uniq plug-in to get accurate total and viewtotal values.
query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
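A deduplication query like the one above can be assembled programmatically so that dist_key and duniqfield always name the same field. The following Python sketch uses a hypothetical helper, `build_dedup_query`. It assumes the clauses are joined with && as in the examples and does no URL encoding, which a real HTTP client would likely need.

```python
def build_dedup_query(keywords, field):
    """Build a dedup query string with the distinct uniq plug-in enabled.

    Hypothetical helper: it hard-codes dist_count:1, dist_times:1, and
    reserved:false, the only settings under which the plug-in works, and
    reuses the same field for dist_key and duniqfield.
    """
    clauses = [
        f"query=default:'{keywords}'",
        f"distinct=dist_key:{field},dist_count:1,dist_times:1,reserved:false",
        f"kvpairs=duniqfield:{field}",
    ]
    return "&&".join(clauses)


print(build_dedup_query("Zhejiang University", "company_id"))
# query=default:'Zhejiang University'&&distinct=dist_key:company_id,dist_count:1,dist_times:1,reserved:false&&kvpairs=duniqfield:company_id
```

Deriving both clauses from a single field argument removes one way to misconfigure the plug-in, because the two values cannot drift apart.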