Relevance-based searches and examples

OpenSearch delivers results through a three-stage pipeline: query analysis, document matching, and relevance scoring. Understanding each stage helps you diagnose unexpected results and tune search quality.

How it works

When a search request arrives, OpenSearch:

Analyzes the search query into terms using the configured analyzer.
Matches documents against those terms using the logical operators in your query clause.
Scores matched documents using a rough sort expression, then re-ranks the top results using a fine sort expression.
Returns results based on the start and hit parameters.

For details on analyzers, see Text analyzers.
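
The steps above can be sketched as a toy in-memory search. This is an illustration only, not how OpenSearch is implemented: `analyze`, `search`, the whitespace analyzer, and the count-based score are hypothetical stand-ins, and the rough and fine sort stages are collapsed into a single score for brevity.

```python
def analyze(text):
    # Stand-in analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def search(docs, query_text, start=0, hit=10):
    terms = analyze(query_text)

    # Matching with default AND semantics: keep documents whose title
    # contains every analyzed term.
    matched = [d for d in docs if all(t in analyze(d["title"]) for t in terms)]

    # Stand-in relevance score (assumption): total occurrences of the terms.
    def score(doc):
        tokens = analyze(doc["title"])
        return sum(tokens.count(t) for t in terms)

    ranked = sorted(matched, key=score, reverse=True)

    # Paging: skip `start` results, then return at most `hit` results.
    return ranked[start:start + hit]


docs = [
    {"id": 1, "title": "Apple phone"},
    {"id": 2, "title": "Apple apple phone case"},
    {"id": 3, "title": "Banana phone"},
]
first_page = search(docs, "apple phone", start=0, hit=1)
second_page = search(docs, "apple phone", start=1, hit=1)
```

Here `start` and `hit` only slice the ranked list; they do not change which documents match or how they are scored.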

Control what gets matched

Logical operators determine which documents are retrieved. OpenSearch uses AND by default, so only documents containing all analyzed terms are returned. To broaden or narrow retrieval, combine conditions with the explicit operators below in your query clause.

Operators and their effects

Operator	Example	Effect
(default)	`query=title:"Apple Mobile phone"`	Retrieves documents where the title contains both "Apple" and "Mobile phone".
AND	`query=title:'Apple' AND cate:'Mobile phone'`	Returns the intersection: documents where the title contains "Apple" and the `cate` field contains "Mobile phone".
OR	`query=title:'Apple' OR cate:'Mobile phone'`	Returns the union: documents where the title contains "Apple" or the `cate` field contains "Mobile phone".
RANK	`query=title:'Apple' RANK cate:'Mobile phone'`	Retrieves documents where the title contains "Apple". Documents where `cate` also contains "Mobile phone" receive a higher relevance score.
ANDNOT	`query=title:'Apple' ANDNOT cate:'Mobile phone'`	Retrieves documents where the title contains "Apple" and the `cate` field does not contain "Mobile phone".

Operator precedence, from highest to lowest: (), ANDNOT, AND, OR, RANK.
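
One way to picture the operators is as set operations over the IDs of documents that satisfy each condition. The sketch below is a simplified model under that assumption; `retrieve` and its return shape are hypothetical names for illustration, not an OpenSearch API.

```python
def retrieve(left, right, operator):
    """Return (matched_ids, boosted_ids) for two sets of document IDs.

    `left` holds documents satisfying the first condition and `right`
    those satisfying the second. Only RANK produces boosted IDs.
    """
    if operator == "AND":
        return left & right, set()       # intersection
    if operator == "OR":
        return left | right, set()       # union
    if operator == "ANDNOT":
        return left - right, set()       # left minus right
    if operator == "RANK":
        # Retrieval depends on `left` only; documents also in `right`
        # are marked for a higher relevance score.
        return set(left), left & right
    raise ValueError(f"unknown operator: {operator}")


title_apple = {1, 2, 3}   # documents whose title contains "Apple"
cate_phone = {2, 3, 4}    # documents whose cate contains "Mobile phone"
```

Note that RANK never adds document 4 to the result: the right-hand condition affects scoring, not retrieval.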

FAQ

Can I find documents where a specific term appears at the start of a field?

No. OpenSearch does not support position-based retrieval. It is not possible to restrict results to documents where a term such as "KFC" appears at the beginning of a field.

Control how results are scored and sorted

After retrieval, OpenSearch runs a two-stage scoring pipeline to rank documents by relevance.

The two-stage pipeline

Stage 1 — Rough sort

OpenSearch scores up to rank_size documents (default: one million) using a rough sort expression. This stage determines which top-N documents advance to fine sort. Because rough sort touches the most documents, keep its expression cheap to compute to limit latency, and make it discriminating enough that high-quality documents reach the fine sort stage.

Typical rough sort expressions use one or more of:

Simple forward index fields
static_bm25() — scores documents by text relevance
timeliness() — scores documents by recency

Stage 2 — Fine sort

The top-N documents from rough sort are re-scored using a fine sort expression, which supports mathematical and logical operations. Because only hundreds of documents reach this stage, fine sort expressions can afford to combine text relevance, recency, business metrics, and human-intervention signals. After fine sort, results are returned in order of fine sort score.

If the number of documents to return exceeds N, the remaining documents are ranked by their rough sort scores.
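
The two stages can be sketched as follows. This is a minimal model, not the engine's implementation: `two_stage_rank`, the `rough` and `fine` fields, and the choice to take the first `rank_size` matched documents are all assumptions made for illustration.

```python
def two_stage_rank(docs, rough_score, fine_score, rank_size, top_n):
    # Stage 1: score at most rank_size documents with the cheap expression.
    # Which documents are picked when more than rank_size match is an
    # assumption here (simply the first rank_size).
    candidates = docs[:rank_size]
    by_rough = sorted(candidates, key=rough_score, reverse=True)

    # Stage 2: re-score only the top N with the expensive expression.
    head = sorted(by_rough[:top_n], key=fine_score, reverse=True)

    # Documents beyond N keep their rough sort order and follow the head.
    tail = by_rough[top_n:]
    return head + tail


docs = [
    {"id": "a", "rough": 3, "fine": 1},
    {"id": "b", "rough": 2, "fine": 5},
    {"id": "c", "rough": 1, "fine": 9},
    {"id": "d", "rough": 0, "fine": 7},
]
ranked = two_stage_rank(
    docs,
    rough_score=lambda d: d["rough"],
    fine_score=lambda d: d["fine"],
    rank_size=3,
    top_n=2,
)
```

Document `c` has the highest fine score but never reaches fine sort because its rough score leaves it outside the top N, which is why the rough sort expression needs to track quality reasonably well.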

What `sort=-RANK` means

If no sort clause is specified, OpenSearch uses sort=-RANK by default — sorting documents from highest to lowest relevance score. To add a tiebreaker, append a secondary sort field:

sort=-RANK;+bonus

This sorts by relevance descending, then by bonus ascending for documents with equal scores. See Rough sort functions and Fine sort functions for the full function reference.
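
The effect of a multi-key sort clause can be modeled in a few lines. The helper below is a hypothetical sketch: `apply_sort` is not an OpenSearch function, and it assumes each document carries its relevance score under a `RANK` key.

```python
def apply_sort(docs, sort_clause):
    ranked = list(docs)
    # Apply keys from last to first. Python's sort is stable, so the
    # first key in the clause ends up as the primary ordering.
    for key in reversed(sort_clause.split(";")):
        descending = key.startswith("-")
        field = key.lstrip("+-")
        ranked.sort(key=lambda d, f=field: d[f], reverse=descending)
    return ranked


docs = [
    {"id": 1, "RANK": 0.9, "bonus": 5},
    {"id": 2, "RANK": 0.9, "bonus": 2},
    {"id": 3, "RANK": 1.5, "bonus": 9},
]
ordered = apply_sort(docs, "-RANK;+bonus")
```

Documents 1 and 2 tie on relevance, so the ascending `bonus` key places document 2 first among them.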

OpenSearch also provides built-in application schemas and sort expressions that you can use as starting points for common scenarios.

Sort expression examples by vertical

The following examples show how rough sort and fine sort work together for different use cases.

Scenario	Stage	Expression	What it scores
Forum	Rough sort	`static_bm25()`	Text relevance
Forum	Fine sort	`text_relevance(title)*3+text_relevance(body)+if(text_relevance(title)>0.07,timeliness(create_timestamp),timeliness(create_timestamp)*0.5)+(topped+special+atan(hits)*0.5+atan(replies))*0.1`	Text relevance weighted by field, timeliness, and thread activity (views, replies)
O2O (online to offline)	Rough sort	`sold_score+general_score*2`	Sales volume and offline store quality score
O2O (online to offline)	Fine sort	`2*sold_score+0.5*reward-10*distance(lon,lat,u_posx,u_posy)+if((flags&2)=2,2,0)+if(is_open=5,10,0)+special_score`	Sales volume, delivery performance, distance, store status, and business rules
Fiction	Rough sort	`static_bm25()*0.7+hh_hot*0.00003`	Text relevance and popularity
Fiction	Fine sort	`pow(min(0.5,max(text_relevance(category),max(text_relevance(title),text_relevance(author)))),2)+general_score*2+1.5*(1/(1+pow(2.718281,-((log10(hh_hot)-2)*2-5))))`	Category, title, and author relevance; novel quality; popularity
E-commerce	Rough sort	`static_bm25()+general_score*2+timeliness(end_time)`	Text relevance, comprehensive item score, and listing expiration
E-commerce	Fine sort	`text_relevance(title)*3+text_relevance(category)+general_score*2+boughtScore*2+tag_match(ctr_query_value,doc_value,mul,sum,false,true)+..`	Text relevance, category relevance, popularity, seller ratings, and click-through rate (CTR) estimation
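
To see how the terms of a fine sort expression interact, the Forum fine sort row can be transcribed into plain Python. This sketch assumes each numeric weight multiplies the term next to it, and it takes the outputs of `text_relevance()` and `timeliness()` as precomputed inputs, since those functions run inside OpenSearch. The function and parameter names are hypothetical.

```python
import math


def forum_fine_score(title_rel, body_rel, time_score,
                     topped, special, hits, replies):
    # text_relevance(title)*3 + text_relevance(body)
    text_part = title_rel * 3 + body_rel

    # if(text_relevance(title) > 0.07, timeliness, timeliness*0.5):
    # recency counts in full only when the title is relevant enough.
    time_part = time_score if title_rel > 0.07 else time_score * 0.5

    # (topped + special + atan(hits)*0.5 + atan(replies)) * 0.1:
    # atan dampens large view and reply counts.
    activity_part = (
        topped + special + math.atan(hits) * 0.5 + math.atan(replies)
    ) * 0.1

    return text_part + time_part + activity_part


relevant_title = forum_fine_score(0.1, 0.2, 0.8, 0, 0, 0, 0)
weak_title = forum_fine_score(0.05, 0.2, 0.8, 0, 0, 0, 0)
```

With no thread activity, the first call sums to 0.3 + 0.2 + 0.8 and the second to 0.15 + 0.2 + 0.4, which shows the recency term being halved once title relevance drops to 0.07 or below.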

FAQ

Why does error 2112 occur?

The fields in your query clause and your fine sort formula must match. For example, if the query clause is query=default:'keyword' (which expands to title and body), but the formula is text_relevance(title)+text_relevance(author), error 2112 is reported because author is not in the query clause.
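
A quick client-side check can catch this mismatch before a request is sent. The helper below is a hypothetical sketch: it assumes you already know which fields the queried index expands to, and it only inspects `text_relevance()` calls with a single bare field name.

```python
import re


def fields_missing_from_query(formula, query_fields):
    # Collect every field passed to text_relevance(...) in the formula.
    used = re.findall(r"text_relevance\(\s*(\w+)\s*\)", formula)
    # Report fields the formula scores but the query clause does not cover.
    return sorted(set(used) - set(query_fields))


# query=default:'keyword', where default expands to title and body
default_fields = {"title", "body"}
missing = fields_missing_from_query(
    "text_relevance(title)+text_relevance(author)", default_fields
)
```

A non-empty result, here `author`, points at the field to remove from the formula or add to the queried index.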

Why can't `text_relevance()` find a field?

text_relevance() only supports fields of type TEXT and SHORT_TEXT. Fields of other types cannot be used with this function.

Improve scoring performance

Pre-compute expensive scores offline

Scores that do not depend on the search query — such as seller ratings, popularity, or quality scores — can be computed offline and stored in a dedicated field (commonly named general_score). Reference this field directly in sort expressions instead of recomputing it at query time. This reduces computation per query and improves latency.
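
As an illustration, an offline job might fold several query-independent signals into one value before documents are pushed to the index. Everything in this sketch is an assumption: the input fields (`seller_rating`, `monthly_sales`, `quality`), the weights, and the function names are hypothetical, and only the target field name `general_score` comes from the text above.

```python
import math


def general_score(seller_rating, monthly_sales, quality):
    # Dampen raw sales with a logarithm so bestsellers do not dominate.
    popularity = math.log10(1 + monthly_sales)
    # Hypothetical weights; tune them for your own data.
    return 0.5 * seller_rating + 0.3 * popularity + 0.2 * quality


def build_documents(items):
    # Attach the precomputed score so sort expressions can read it
    # as a plain field at query time.
    return [
        {
            **item,
            "general_score": round(
                general_score(
                    item["seller_rating"],
                    item["monthly_sales"],
                    item["quality"],
                ),
                4,
            ),
        }
        for item in items
    ]


items = [{"id": 1, "seller_rating": 4.0, "monthly_sales": 99, "quality": 3.0}]
indexed = build_documents(items)
```

The sort expression then only needs a term such as `general_score*2`, and the blend can be changed offline without touching the query-time formula.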

Use `tag_match` for multi-dimensional feature scoring

tag_match(ctr_query_value, doc_value, mul, sum, false, true) applies a set of operations across query-level and document-level feature vectors. It is widely used in e-commerce to incorporate CTR estimation and personalized signals into sort expressions.
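
The `mul` and `sum` arguments can be read as "multiply the values of matching keys, then add up the products". The sketch below models only that reading with plain dictionaries. It is a simplified assumption about the semantics: it does not model the trailing `false, true` arguments, and `tag_match_mul_sum` is a hypothetical name, not the built-in function.

```python
def tag_match_mul_sum(query_kv, doc_kv):
    # For every key present on both sides, multiply the query-side value
    # by the document-side value (mul), then add the products (sum).
    return sum(
        q_value * doc_kv[key]
        for key, q_value in query_kv.items()
        if key in doc_kv
    )


# Hypothetical feature vectors: per-query weights and per-document values.
query_features = {"brand_a": 0.5, "brand_b": 2.0}
doc_features = {"brand_a": 4.0, "brand_c": 1.0}
score = tag_match_mul_sum(query_features, doc_features)
```

Only `brand_a` appears on both sides, so it is the only key that contributes to the score.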

Tune factor weights iteratively

Relevance is determined by multiple factors. Start with the expressions that match your vertical (see the table above), then adjust the weight of each factor based on observed search quality. Small changes in weights can produce large differences in result ordering.
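
A small experiment shows how sensitive ordering can be to weights. The two documents, their `text` and `time` scores, and the `rank` helper are made-up values for illustration.

```python
def rank(docs, w_text, w_time):
    # Weighted sum of a text relevance score and a recency score.
    def score(doc):
        return w_text * doc["text"] + w_time * doc["time"]

    return [d["id"] for d in sorted(docs, key=score, reverse=True)]


docs = [
    {"id": "a", "text": 0.9, "time": 0.2},  # highly relevant, older
    {"id": "b", "text": 0.6, "time": 0.8},  # less relevant, recent
]
before = rank(docs, w_text=1.0, w_time=0.4)
after = rank(docs, w_text=1.0, w_time=0.6)
```

Raising the recency weight from 0.4 to 0.6 swaps the two documents, so it is worth changing one weight at a time and comparing result lists after each change.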