OpenSearch delivers results through a three-stage pipeline: query analysis, document matching, and relevance scoring. Understanding each stage helps you diagnose unexpected results and tune search quality.
How it works
When a search request arrives, OpenSearch:
Analyzes the search query into terms using the configured analyzer.
Matches documents against those terms using the logical operators in your query clause.
Scores matched documents using a rough sort expression, then re-ranks the top results using a fine sort expression.
Returns results based on the start and hit parameters.
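The four steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OpenSearch's implementation: the whitespace analyzer, the in-memory documents, and the precomputed scores are hypothetical, and the two sort stages are collapsed into a single score lookup.

```python
def analyze(query: str) -> list[str]:
    # Hypothetical analyzer: lowercase the query and split on whitespace.
    return query.lower().split()


def match(docs: dict[int, str], terms: list[str]) -> list[int]:
    # Default AND semantics: keep documents whose text contains every term.
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.lower().split() for term in terms)]


def search(docs: dict[int, str], scores: dict[int, float], query: str,
           start: int = 0, hit: int = 10) -> list[int]:
    terms = analyze(query)                                          # 1. analyze the query
    matched = match(docs, terms)                                    # 2. match documents
    ranked = sorted(matched, key=lambda d: scores[d], reverse=True)  # 3. score and sort
    return ranked[start:start + hit]                                # 4. page with start and hit


docs = {1: "Apple mobile phone", 2: "Apple laptop", 3: "mobile phone case"}
scores = {1: 0.9, 2: 0.5, 3: 0.7}
print(search(docs, scores, "apple phone"))  # → [1]
```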
For details on analyzers, see Text analyzers.
Control what gets matched
Logical operators determine which documents are retrieved. OpenSearch uses AND by default, so only documents containing all analyzed terms are returned. To broaden or refine retrieval, use the operators in your query clause.
Operators and their effects
| Operator | Example | Effect |
|---|---|---|
| (default) | query=title:"Apple Mobile phone" | Retrieves documents where the title contains both "Apple" and "Mobile phone". |
| AND | query=title:'Apple' AND cate:'Mobile phone' | Returns the intersection: documents where the title contains "Apple" and the cate field contains "Mobile phone". |
| OR | query=title:'Apple' OR cate:'Mobile phone' | Returns the union: documents where the title contains "Apple" or the cate field contains "Mobile phone". |
| RANK | query=title:'Apple' RANK cate:'Mobile phone' | Retrieves documents where the title contains "Apple". Documents where cate also contains "Mobile phone" receive a higher relevance score. |
| ANDNOT | query=title:'Apple' ANDNOT cate:'Mobile phone' | Retrieves documents where the title contains "Apple" and the cate field does not contain "Mobile phone". |
Operator precedence, from highest to lowest: (), ANDNOT, AND, OR, RANK.
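The operators behave like set operations on the documents each clause retrieves. The sketch below models that with Python sets; the tiny inverted index, its contents, and the RANK boost value are hypothetical. Python's set operators also happen to bind in the same relative order as ANDNOT, AND, and OR (`-` before `&` before `|`), which makes the precedence rule easy to see.

```python
# Hypothetical inverted index: field -> term -> set of matching document IDs.
INDEX = {
    "title": {"apple": {1, 2, 3}},
    "cate": {"mobile phone": {2, 3, 4}},
}


def retrieve(field: str, term: str) -> set[int]:
    return INDEX.get(field, {}).get(term, set())


a = retrieve("title", "apple")          # title:'Apple'
b = retrieve("cate", "mobile phone")    # cate:'Mobile phone'

and_result = a & b      # AND: intersection -> {2, 3}
or_result = a | b       # OR: union -> {1, 2, 3, 4}
andnot_result = a - b   # ANDNOT: in a but not in b -> {1}


def rank(left: set[int], right: set[int], boost: float = 1.0) -> dict[int, float]:
    # RANK: retrieval comes from the left operand only; the right operand
    # only adds a (hypothetical) score boost to documents it also matches.
    return {doc: (boost if doc in right else 0.0) for doc in left}


rank_result = rank(a, b)  # {1: 0.0, 2: 1.0, 3: 1.0}

# Precedence: & binds tighter than |, just as AND binds tighter than OR.
c = {4}
assert a | b & c == a | (b & c) == {1, 2, 3, 4}
assert (a | b) & c == {4}
```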
FAQ
Can I find documents where a specific term appears at the start of a field?
No. OpenSearch does not support position-based retrieval. It is not possible to restrict results to documents where a term such as "KFC" appears at the beginning of a field.
Control how results are scored and sorted
After retrieval, OpenSearch runs a two-stage scoring pipeline to rank documents by relevance.
The two-stage pipeline
Stage 1 — Rough sort
OpenSearch scores up to rank_size documents (default: one million) using a rough sort expression. This stage determines which top-N documents advance to fine sort. A fast rough sort expression keeps latency low, and an accurate one helps the highest-quality documents reach the fine sort stage.
Typical rough sort expressions use one or more of:
Simple forward index fields
static_bm25(): scores documents by text relevance
timeliness(): scores documents by recency
Stage 2 — Fine sort
The top-N documents from rough sort are re-scored using a fine sort expression, which supports mathematical and logical operations. Fine sort typically processes only hundreds of documents, so its expressions can combine text relevance, recency, business metrics, and human-intervention signals. After fine sort, results are returned in order of fine sort score.
If the number of documents to return exceeds N, the remaining documents are ranked by their rough sort scores.
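The interaction between the two stages, including the fallback to rough sort order beyond N, can be sketched as follows. The document data and score values are hypothetical, and taking the first rank_size documents as rough sort candidates is a simplification; OpenSearch decides internally which documents enter rough sort.

```python
from typing import Callable


def two_stage_rank(docs: list[dict], rough_score: Callable[[dict], float],
                   fine_score: Callable[[dict], float], n: int,
                   rank_size: int = 1_000_000) -> list[dict]:
    # Stage 1 (rough sort): score at most rank_size candidates with the cheap expression.
    rough_sorted = sorted(docs[:rank_size], key=rough_score, reverse=True)
    # Stage 2 (fine sort): re-score only the top N with the more expensive expression.
    fine_sorted = sorted(rough_sorted[:n], key=fine_score, reverse=True)
    # Documents beyond N keep the order assigned by their rough sort scores.
    return fine_sorted + rough_sorted[n:]


docs = [
    {"id": "a", "rough": 0.2, "fine": 0.9},
    {"id": "b", "rough": 0.8, "fine": 0.1},
    {"id": "c", "rough": 0.5, "fine": 0.6},
    {"id": "d", "rough": 0.4, "fine": 1.0},
]
ranked = two_stage_rank(docs, lambda d: d["rough"], lambda d: d["fine"], n=2)
print([d["id"] for d in ranked])  # → ['c', 'b', 'd', 'a']
```

Document d has the highest fine score but never reaches fine sort, because its rough score places it outside the top N. This is why the quality of the rough sort expression matters even though fine sort decides the final order.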
What `sort=-RANK` means
If no sort clause is specified, OpenSearch uses sort=-RANK by default — sorting documents from highest to lowest relevance score. To add a tiebreaker, append a secondary sort field:
sort=-RANK;+bonus
This sorts by relevance descending, then by bonus ascending for documents with equal scores. See Rough sort functions and Fine sort functions for the full function reference.
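The effect of a multi-key sort clause such as sort=-RANK;+bonus can be illustrated with Python's stable sort. This hypothetical helper assumes every key carries an explicit + or - prefix and treats RANK as an ordinary precomputed field; it is not how OpenSearch parses sort clauses.

```python
from operator import itemgetter


def parse_sort(clause: str) -> list[tuple[str, bool]]:
    # "-RANK;+bonus" -> [("RANK", True), ("bonus", False)]; True means descending.
    # Assumes every key has an explicit "+" or "-" prefix.
    return [(part[1:], part[0] == "-") for part in clause.split(";")]


def apply_sort(docs: list[dict], clause: str) -> list[dict]:
    result = list(docs)
    # Sort by the last key first; Python's sort is stable (also with reverse=True),
    # so earlier keys in the clause end up taking priority.
    for field, descending in reversed(parse_sort(clause)):
        result.sort(key=itemgetter(field), reverse=descending)
    return result


docs = [
    {"id": 1, "RANK": 0.9, "bonus": 5},
    {"id": 2, "RANK": 0.7, "bonus": 1},
    {"id": 3, "RANK": 0.9, "bonus": 2},
]
print([d["id"] for d in apply_sort(docs, "-RANK;+bonus")])  # → [3, 1, 2]
```

Documents 1 and 3 tie on RANK, so the ascending bonus key decides that document 3 comes first.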
OpenSearch also provides built-in application schemas and sort expressions that you can use as references or starting points for common scenarios.
Sort expression examples by vertical
The following examples show how rough sort and fine sort work together for different use cases.
| Scenario | Stage | Expression | What it scores |
|---|---|---|---|
| Forum | Rough sort | static_bm25() | Text relevance |
| Forum | Fine sort | text_relevance(title)*3+text_relevance(body)+if(text_relevance(title)>0.07,timeliness(create_timestamp),timeliness(create_timestamp)*0.5)+(topped+special+atan(hits)*0.5+atan(replies))*0.1 | Text relevance weighted by field, timeliness, and thread activity (views, replies) |
| O2O (online to offline) | Rough sort | sold_score+general_score*2 | Sales volume and offline store quality score |
| O2O (online to offline) | Fine sort | 2*sold_score+0.5*reward- 10*distance(lon,lat,u_posx,u_posy)+ if ((flags&2) =2, 2, 0)+if(is_open=5,10,0)+ special_score | Sales volume, delivery performance, distance, store status, and business rules |
| Fiction | Rough sort | static_bm25()*0.7+hh_hot*0.00003 | Text relevance and popularity |
| Fiction | Fine sort | pow(min(0.5,max(text_relevance(category),max(text_relevance(title), text_relevance(author)))),2)+ general_score*2+ 1.5*(1/(1+pow(2.718281,-((log10(hh_hot)-2)*2-5)))) | Category, title, and author relevance; novel quality; popularity |
| E-commerce | Rough sort | static_bm25()+general_score*2+timeliness(end_time) | Text relevance, comprehensive item score, and listing expiration |
| E-commerce | Fine sort | text_relevance(title)*3+text_relevance(category)+ general_score*2+boughtScore*2+ tag_match(ctr_query_value,doc_value,mul,sum,false,true)+.. | Text relevance, category relevance, popularity, seller ratings, and click-through rate (CTR) estimation |
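To see how the forum fine sort expression combines its terms, it can be transcribed into Python. This is a sketch: text_relevance() and timeliness() are OpenSearch functions, so their values are passed in here as plain numbers, and the inputs used in any checks are made up.

```python
import math


def forum_fine_score(tr_title: float, tr_body: float, timeliness: float,
                     topped: float, special: float, hits: int, replies: int) -> float:
    # tr_title and tr_body stand in for text_relevance(title) and text_relevance(body);
    # timeliness stands in for timeliness(create_timestamp).
    score = tr_title * 3 + tr_body
    # if(cond, a, b): full timeliness credit only when the title is relevant enough.
    score += timeliness if tr_title > 0.07 else timeliness * 0.5
    # atan() saturates, so very large view and reply counts cannot dominate the score.
    score += (topped + special + math.atan(hits) * 0.5 + math.atan(replies)) * 0.1
    return score
```

Because atan() approaches pi/2, the combined activity bonus from hits and replies stays below 0.1 * 1.5 * pi/2, no matter how popular a thread becomes.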
FAQ
Why does error 2112 occur?
Every field that your fine sort formula passes to text_relevance() must be searched by your query clause. For example, if the query clause is query=default:'keyword' (where the default index covers the title and body fields) but the formula is text_relevance(title)+text_relevance(author), error 2112 is reported because author is not in the query clause.
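You can catch this mismatch on the client side before sending a request. The sketch below assumes a hypothetical mapping from index names to fields (here, default covering title and body) and simple regular expressions for the query and formula syntax; it is not OpenSearch's own validation logic.

```python
import re

# Hypothetical mapping from index names to the fields each index covers.
INDEX_FIELDS = {"default": {"title", "body"}}


def fields_in_query(query: str) -> set[str]:
    # Find index names in clauses such as default:'keyword' or title:"Apple".
    fields: set[str] = set()
    for name in re.findall(r"(\w+):['\"]", query):
        # Treat an unknown name as a single field of the same name.
        fields |= INDEX_FIELDS.get(name, {name})
    return fields


def missing_fields(query: str, formula: str) -> set[str]:
    # Fields passed to text_relevance() that the query clause never searches.
    used = set(re.findall(r"text_relevance\((\w+)\)", formula))
    return used - fields_in_query(query)


print(missing_fields("query=default:'keyword'",
                     "text_relevance(title)+text_relevance(author)"))  # → {'author'}
```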
Why can't `text_relevance()` find a field?
text_relevance() only supports fields of type TEXT and SHORT_TEXT. Fields of other types cannot be used with this function.
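A similar client-side check can flag fields whose type text_relevance() does not support. The schema dictionary below is hypothetical; substitute the field types from your own application schema.

```python
# Hypothetical application schema: field name -> field type.
SCHEMA = {"title": "TEXT", "body": "TEXT", "author": "SHORT_TEXT", "hits": "INT"}

# Field types that text_relevance() accepts.
TEXT_RELEVANCE_TYPES = {"TEXT", "SHORT_TEXT"}


def unsupported_fields(fields: list[str], schema: dict[str, str] = SCHEMA) -> list[str]:
    # Unknown fields (schema.get returns None) are reported as unsupported too.
    return sorted(f for f in fields if schema.get(f) not in TEXT_RELEVANCE_TYPES)


print(unsupported_fields(["title", "author", "hits", "tags"]))  # → ['hits', 'tags']
```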
Improve scoring performance
Pre-compute expensive scores offline
Scores that do not depend on the search query — such as seller ratings, popularity, or quality scores — can be computed offline and stored in a dedicated field (commonly named general_score). Reference this field directly in sort expressions instead of recomputing it at query time. This reduces computation per query and improves latency.
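The offline computation can be as simple as a batch job that writes one number per document. The formula, weights, and input fields below (seller_rating, sales_30d, quality) are hypothetical and only illustrate the pattern; choose signals that fit your own data.

```python
def compute_general_score(seller_rating: float, sales_30d: int, quality: float) -> float:
    # Hypothetical formula: normalize each signal to [0, 1], then take a weighted sum.
    rating_part = seller_rating / 5.0               # seller rating on a 0-5 scale
    popularity_part = min(sales_30d / 1000, 1.0)    # cap so bestsellers cannot dominate
    return round(0.4 * rating_part + 0.4 * popularity_part + 0.2 * quality, 4)


# Offline batch job: compute once per document and store the result in general_score.
items = [
    {"id": 1, "seller_rating": 4.5, "sales_30d": 500, "quality": 0.8},
    {"id": 2, "seller_rating": 3.0, "sales_30d": 5000, "quality": 0.5},
]
for item in items:
    item["general_score"] = compute_general_score(
        item["seller_rating"], item["sales_30d"], item["quality"])

print([item["general_score"] for item in items])  # → [0.72, 0.74]
```

At query time, a sort expression such as general_score*2 then reads the stored value instead of recomputing it for every request.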
Use `tag_match` for multi-dimensional feature scoring
tag_match(ctr_query_value, doc_value, mul, sum, false, true) applies a set of operations across query-level and document-level feature vectors. It is widely used in e-commerce to incorporate CTR estimation and personalized signals into sort expressions.
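The mul and sum arguments suggest how per-tag results are combined. The following sketch assumes that, for every tag present in both the query-side and document-side vectors, tag_match multiplies the two weights and then sums the products; the two trailing boolean arguments are not modeled. Check the fine sort function reference before relying on these semantics.

```python
def tag_match_mul_sum(query_tags: dict[str, float], doc_tags: dict[str, float]) -> float:
    # Assumed semantics of tag_match(..., mul, sum, ...): for each tag present in both
    # vectors, multiply the query weight by the document weight (mul), then add up
    # the products (sum). Tags present on only one side contribute nothing.
    return sum(weight * doc_tags[tag] for tag, weight in query_tags.items() if tag in doc_tags)


# Hypothetical CTR-style features: query-side and document-side tag weights.
query_tags = {"phone": 0.6, "apple": 0.4}
doc_tags = {"phone": 0.5, "case": 0.9}
print(tag_match_mul_sum(query_tags, doc_tags))  # → 0.3
```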
Tune factor weights iteratively
Relevance is determined by multiple factors. Start with the expressions that match your vertical (see the table above), then adjust the weight of each factor based on observed search quality. Small changes in weights can produce large differences in result ordering.
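The sensitivity to weights is easy to demonstrate with two documents and an expression of the form w_text*text_relevance + w_general*general_score. The documents and weights below are hypothetical; the point is that raising one weight from 2.5 to 3.5 reverses the order.

```python
def score(doc: dict, w_text: float, w_general: float) -> float:
    # Weighted sum of two factors, in the style of text_relevance(title)*3+general_score*2.
    return w_text * doc["text_relevance"] + w_general * doc["general_score"]


docs = [
    {"id": "A", "text_relevance": 0.9, "general_score": 0.2},
    {"id": "B", "text_relevance": 0.6, "general_score": 0.5},
]


def order(w_text: float, w_general: float) -> list[str]:
    ranked = sorted(docs, key=lambda d: score(d, w_text, w_general), reverse=True)
    return [d["id"] for d in ranked]


print(order(3, 2.5))  # → ['A', 'B']
print(order(3, 3.5))  # → ['B', 'A']
```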