Explanation of key concepts and technical terms

Instance management

Term	Description
instance	The top-level container for a search service. An instance holds all data configurations — data source schema, index schema, and data attributes — and serves as a single search service endpoint. An instance is analogous to a database in a relational database system.
document	The basic unit of searchable data, analogous to a row in a relational database table. A document contains one or more fields and must have a primary key field. OpenSearch uses the primary key to uniquely identify each document. If you push a new document with the same primary key as an existing one, the existing document is overwritten.
field	A single name–value pair within a document. Fields are the building blocks of a document and determine what data is stored and how it can be searched or filtered.
plugins	Built-in data processing plugins that OpenSearch provides to transform your data during import. Select plugins when you define the schema or configure a data source.
source data	The raw data you push to OpenSearch before any processing. Source data contains one or more source fields.
source field	The smallest unit of source data — a single name–value pair. For supported data types, see Application schema and index schema.
index	A data structure that accelerates retrieval. An instance can have multiple indexes. OpenSearch uses two types of indexes internally: inverted indexes and forward indexes.
composite index	An index built across multiple fields of TEXT or SHORT_TEXT type. For example, a forum search service might use a `title_search` index for title-only searches and a default composite index across both titles and bodies for comprehensive searches.
index field	A field defined to participate in query clauses. Defining index fields is required for high-performance full-text retrieval.
attribute field	A field used in FILTER, SORT, AGGREGATE, and DISTINCT clauses. Attribute fields enable filtering, sorting, and statistics on search results without participating in full-text retrieval.
default display field	The set of fields returned in search results by default. Override these defaults per request using the `fetch_fields` API parameter. When `fetch_fields` is set, the default display field configuration is ignored and only the specified fields are returned.
tokenization	The process of splitting text field values into individual terms for indexing. How text is split depends on the field type: TEXT fields are split into meaningful word-level terms, while SHORT_TEXT fields are split character by character. For example, the phrase "Zhejiang University" becomes "Zhejiang" and "University" for a TEXT field, but is split into individual characters for a SHORT_TEXT field. Without tokenization, only exact-string matches would work — tokenization is what makes full-text search possible.
term	A single token produced by tokenization. Terms are used to build the inverted index.
index building	The process of constructing indexes from terms after tokenization. OpenSearch builds two types of indexes: inverted indexes (used for retrieval) and forward indexes (used for filtering).
inverted index	A data structure that maps each term to the documents containing it. Inverted indexes power query clause searches. For example, given two documents — "quick brown fox" and "quick fox jumps" — the inverted index maps: `quick → doc1, doc2` / `brown → doc1` / `fox → doc1, doc2` / `jumps → doc2`.
forward index	A data structure that maps each document to its field values. Forward indexes power FILTER clause operations. They are less efficient for retrieval than inverted indexes but are necessary for operations that read field values per document — for example: `doc1 → id, type, create_time`.
retrieval	The process of finding documents that match a search request. OpenSearch converts query keywords into terms, then looks up the inverted index to find all matching documents.
retrieval amount	The number of documents that retrieval finds for a search request.
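
The inverted index and forward index entries above can be illustrated with a small in-memory sketch. It reuses the two example documents from the inverted index row. The whitespace tokenizer, the `body` and `type` field names, and the helper functions are assumptions made for illustration; they are not how OpenSearch implements indexing.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms. Whitespace splitting is a simplification."""
    return text.lower().split()


def build_indexes(docs, text_field):
    """Build an inverted index (term -> doc ids) and a forward index (doc id -> other field values)."""
    inverted = defaultdict(list)
    forward = {}
    for doc_id, fields in docs.items():
        # Forward index: per-document values, read when filtering.
        forward[doc_id] = {k: v for k, v in fields.items() if k != text_field}
        # Inverted index: one posting list per term, used for retrieval.
        for term in tokenize(fields[text_field]):
            if doc_id not in inverted[term]:
                inverted[term].append(doc_id)
    return dict(inverted), forward


def retrieve(inverted, query):
    """Return the ids of documents that contain every term of the query."""
    postings = [set(inverted.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical documents; "type" stands in for an attribute value.
docs = {
    "doc1": {"body": "quick brown fox", "type": "a"},
    "doc2": {"body": "quick fox jumps", "type": "b"},
}
inverted, forward = build_indexes(docs, text_field="body")

matches = retrieve(inverted, "quick fox")
# Filtering reads each matching document's values from the forward index.
filtered = [doc_id for doc_id in matches if forward[doc_id]["type"] == "b"]
```

Retrieval touches only the posting lists of the query terms, while the filter step has to read a value for each matching document, which is why the two structures are kept separately.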

Index fields, attribute fields, source fields, and default display fields serve distinct purposes. Index fields are used for full-text retrieval. Attribute fields are used for filtering, sorting, and aggregation. Source fields are the raw input fields from your data source. Default display fields control what is returned in search results. Understanding this distinction helps you design your schema correctly.
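
To make the distinction concrete, the following is a minimal sketch of a toy in-memory search service. The `SCHEMA` layout, the `push` and `search` helpers, and all field names are hypothetical and do not reflect the OpenSearch schema format or API; only the `fetch_fields` name is taken from the table above. The sketch also shows a document being overwritten when a second document arrives with the same primary key.

```python
# Hypothetical schema: each list assigns a role to fields of the pushed documents.
SCHEMA = {
    "primary_key": "id",
    "index_fields": ["title", "body"],                # searched by keyword
    "attribute_fields": ["category", "create_time"],  # used for filtering
    "default_display_fields": ["id", "title"],        # returned by default
}

store = {}


def push(doc):
    """Store a document under its primary key; a repeated key overwrites the old document."""
    store[doc[SCHEMA["primary_key"]]] = doc


def search(keyword, filter_fn=None, fetch_fields=None):
    """Match on index fields, filter on attribute fields, return display fields."""
    keyword = keyword.lower()
    # When fetch_fields is given, the default display fields are ignored.
    fields = fetch_fields or SCHEMA["default_display_fields"]
    results = []
    for doc in store.values():
        matched = any(
            keyword in str(doc.get(name, "")).lower().split()
            for name in SCHEMA["index_fields"]
        )
        if not matched:
            continue
        attributes = {name: doc.get(name) for name in SCHEMA["attribute_fields"]}
        if filter_fn is not None and not filter_fn(attributes):
            continue
        results.append({name: doc.get(name) for name in fields})
    return results


push({"id": 1, "title": "Quick fox", "body": "a quick brown fox",
      "category": "animals", "create_time": 100})
push({"id": 2, "title": "Lazy dog", "body": "the fox jumps over the dog",
      "category": "animals", "create_time": 200})
# Same primary key as the first document, so the first document is overwritten.
push({"id": 1, "title": "Quick fox revised", "body": "a quick red fox",
      "category": "animals", "create_time": 300})

default_view = search("fox")
recent_only = search("fox", filter_fn=lambda attrs: attrs["create_time"] > 250)
custom_view = search("dog", fetch_fields=["id", "category"])
```

In this sketch a field can carry more than one role, for example `title` is both an index field and a default display field, which mirrors how the roles in the paragraph above are independent of one another.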

Data synchronization

Term	Description
data source	The external system from which data is pushed into OpenSearch. Supported sources are ApsaraDB for RDS, MaxCompute, and PolarDB.
reindexing	The process of rebuilding all indexes from scratch. Reindexing is required after you configure or modify the application schema or a data source.

Quota management

Term	Description
document capacity	The cumulative storage size of all documents in an instance, calculated by converting each field value to a string and summing the sizes.
QPS	Queries per second (QPS) — the number of search requests an instance processes per second.
LCU	Logical computing unit (LCU) — the unit used to measure the computing power of a search service. One LCU indicates the computing power of 10 millicores in a search cluster. A millicore is one thousandth of a CPU core.
scaling	Adjusting the compute and capacity configuration of an instance. Small specification changes take effect immediately. Changes that involve switching instance types — for example, from a shared instance to an exclusive instance — take effect only after approval.
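
The document capacity and LCU definitions above reduce to simple arithmetic, sketched below. The function names are hypothetical, and measuring string size in UTF-8 bytes is an assumption; the table does not state which unit or encoding OpenSearch uses.

```python
MILLICORES_PER_LCU = 10     # one LCU is 10 millicores, per the LCU definition
MILLICORES_PER_CORE = 1000  # a millicore is one thousandth of a CPU core


def lcu_to_millicores(lcus):
    """Convert a number of LCUs to millicores."""
    return lcus * MILLICORES_PER_LCU


def lcu_to_cores(lcus):
    """Convert a number of LCUs to whole CPU cores (as a float)."""
    return lcu_to_millicores(lcus) / MILLICORES_PER_CORE


def document_capacity(docs):
    """Sum the size of every field value after converting it to a string.

    Assumption: size is counted in UTF-8 bytes.
    """
    return sum(
        len(str(value).encode("utf-8"))
        for doc in docs
        for value in doc.values()
    )


# Hypothetical documents used only to exercise the calculation.
sample_docs = [
    {"id": 1, "title": "fox"},
    {"id": 22, "title": "dog"},
]
capacity = document_capacity(sample_docs)
cores_for_100_lcus = lcu_to_cores(100)
```

Under these definitions, 100 LCUs correspond to one full CPU core.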

Search

Term	Description
sort expression	A user-defined expression that controls the ranking of search results. Sort expressions support basic mathematical operations, mathematical functions, and built-in functions.
rough sort expression	A first-pass ranking expression. OpenSearch calculates a matching score for each retrieved document using this expression and sorts results by score. The top N results are then passed to the fine sort stage.
fine sort expression	A second-pass ranking expression applied to the top N results from rough sort. Fine sort expressions apply more precise scoring — at higher computational cost — to refine the final ranking.
search result summary	A short excerpt of a document's text content displayed alongside each search result, helping users judge relevance without reading the full document.
query analysis	A set of pre-retrieval features applied to the raw search query. Supported features include synonym expansion, spelling correction, stop word filtering, and term weight adjustment. These features improve search quality by interpreting user intent rather than matching keywords literally.
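
The rough sort and fine sort stages described above can be sketched as a two-phase ranking function. The scoring functions, the `text_match` and `clicks` fields, and the choice to keep documents outside the top N in rough-sort order are all assumptions for illustration; they are not OpenSearch's built-in expressions or documented behavior.

```python
import math


def two_phase_rank(docs, rough_score, fine_score, top_n):
    """Rank all docs with a cheap score, then re-rank only the top N with a costlier score."""
    rough_ranked = sorted(docs, key=rough_score, reverse=True)
    head, tail = rough_ranked[:top_n], rough_ranked[top_n:]
    # Assumption: documents outside the top N keep their rough-sort order.
    return sorted(head, key=fine_score, reverse=True) + tail


def rough(doc):
    """Cheap first-pass score: a precomputed text match value."""
    return doc["text_match"]


def fine(doc):
    """More expensive second-pass score: text match plus a popularity signal."""
    return doc["text_match"] + math.log10(1 + doc["clicks"])


# Hypothetical retrieved documents.
retrieved = [
    {"id": "a", "text_match": 0.9, "clicks": 5},
    {"id": "b", "text_match": 0.8, "clicks": 50},
    {"id": "c", "text_match": 0.7, "clicks": 500},
    {"id": "d", "text_match": 0.1, "clicks": 9000},
]

ranked_ids = [doc["id"] for doc in two_phase_rank(retrieved, rough, fine, top_n=3)]
```

Document `d` has the highest popularity but never reaches the fine sort stage because its rough score places it outside the top 3, which shows why the rough sort expression should already capture the most important relevance signal.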