OpenSearch: Terms

Last Updated: Feb 23, 2023

Instance management

Term

Description

instance

An instance is a set of data configurations, such as data source schema, index schema, and data attributes. An instance serves as a search service.

document

A document is a search unit of structured data. A document can contain one or more fields and must have a primary key field. High-performance Search Edition instances identify a unique document based on the value of the primary key field. If an added document has the same primary key value as an existing document, the existing document is overwritten by the added document.
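The overwrite rule can be pictured with a small in-memory sketch. This is only an illustration of the behavior described above, not the OpenSearch implementation; the `add_document` helper and the primary key field name `id` are assumptions made for the example.

```python
# Toy in-memory store keyed by primary key value.
# The field name "id" is an assumed primary key, chosen for illustration.
def add_document(store, doc, primary_key="id"):
    """Add doc to store, overwriting any document with the same primary key."""
    if primary_key not in doc:
        raise ValueError(f"document is missing primary key field {primary_key!r}")
    store[doc[primary_key]] = doc
    return store

store = {}
add_document(store, {"id": 1, "title": "first version"})
add_document(store, {"id": 1, "title": "second version"})  # same key: overwrites
add_document(store, {"id": 2, "title": "another document"})
```

After these three calls the store holds two documents, and the document with primary key 1 carries the second title.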

field

A component of a document. A field consists of a field name and a field value.

plug-in

To help you process data during data import, High-performance Search Edition provides various built-in data processing plug-ins. You can use these plug-ins when you define the schema or configure a data source for an application.

source data

The original data to be pushed to OpenSearch. It contains one or more source fields.

source field

A source field is the smallest unit of the source data. A source field consists of a field name and a field value. For more information about supported data types, see Application schema.

index

A data structure that is used to accelerate retrieval. You can create multiple indexes for one instance.

combined index

You can create a combined index on multiple fields of text types, such as TEXT or SHORT_TEXT. For example, to build a forum search service that supports both title-only searches and comprehensive searches across titles and bodies, you can create the title_search index on titles and the default combined index on both titles and bodies. Title-only searches then use the title_search index, and comprehensive searches use the default combined index.

index field

Index fields can be used in query clauses. To implement high-performance data retrieval, you must define index fields.

attribute field

Attribute fields can be used in the filter, sort, aggregate, and distinct clauses of queries to implement features such as filtering and statistics. For more information about these clauses, see filter clause, sort clause, aggregate clause, and distinct clause.

default display field

Default display fields are the fields returned in search results by default. To choose the fields returned for an individual search request, use the fetch_fields API parameter. If a request specifies fetch_fields, the default display field configuration is ignored and the fields listed in fetch_fields are returned. If a request does not specify fetch_fields, the default display fields are returned.
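The precedence between fetch_fields and the default display fields can be sketched as follows. The function names and the sample document are hypothetical; only the fetch_fields parameter name comes from the text above, and the sketch does not call any OpenSearch SDK.

```python
def fields_to_return(default_display_fields, fetch_fields=None):
    """Pick the fields shown in results for one request."""
    # A request that specifies fetch_fields ignores the default display fields.
    if fetch_fields is not None:
        return list(fetch_fields)
    return list(default_display_fields)

def render_hit(doc, default_display_fields, fetch_fields=None):
    """Project a matched document onto the selected fields."""
    names = fields_to_return(default_display_fields, fetch_fields)
    return {name: doc[name] for name in names if name in doc}

# Hypothetical document and default display field configuration.
doc = {"id": 7, "title": "Hello", "body": "Long text", "author": "sam"}
defaults = ["id", "title"]
```

Calling `render_hit(doc, defaults)` returns the `id` and `title` fields, while `render_hit(doc, defaults, fetch_fields=["author"])` returns only `author`.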

tokenization

This feature tokenizes the sentences in documents into terms. If the data type of a field is TEXT, the system tokenizes the sentences into meaningful terms. If the data type of a field is SHORT_TEXT, the system tokenizes the sentences into single words. Take "video game industry" as an example. If the data type is TEXT, "video game industry" is tokenized into two elements: "video game" and "industry". If the data type is SHORT_TEXT, "video game industry" is tokenized into three elements: "video", "game", and "industry".
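The difference between the two data types can be imitated with a toy tokenizer. This is a rough sketch only: the real analyzers are not described here, and the `KNOWN_PHRASES` set is a hypothetical stand-in for whatever vocabulary lets the TEXT analyzer keep "video game" together.

```python
# Hypothetical phrase list standing in for the real analyzer's vocabulary.
KNOWN_PHRASES = {("video", "game")}

def tokenize_short_text(text):
    """SHORT_TEXT-style: split the sentence into single words."""
    return text.split()

def tokenize_text(text, phrases=KNOWN_PHRASES):
    """TEXT-style: merge adjacent words that form a known two-word phrase."""
    words = text.split()
    terms = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in phrases:
            terms.append(words[i] + " " + words[i + 1])
            i += 2  # both words were consumed by the phrase
        else:
            terms.append(words[i])
            i += 1
    return terms
```

With this sketch, `tokenize_text("video game industry")` yields "video game" and "industry", and `tokenize_short_text("video game industry")` yields "video", "game", and "industry", matching the example above.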

term

A term is a text element that is generated by tokenization.

index building

After tokenization, indexes are built based on terms. This allows OpenSearch to quickly locate specific documents in response to search requests. Search engines can build two types of linked lists: inverted indexes and forward indexes.

inverted index

An inverted index is a linked list that maps terms to their locations in a set of documents. Inverted indexes are used in query clauses. Example: term1->doc1,doc2,doc3 and term2->doc1,doc2.

forward index

A forward index is a linked list that maps documents to fields. Forward indexes are used in filter clauses. Forward indexes are less efficient than inverted indexes. Example: doc1->id,type,create_time.
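The two index types can be sketched with plain Python dictionaries. This is a simplification for illustration: it uses dictionaries and lists rather than the linked lists mentioned above, splits on whitespace instead of using a real tokenizer, and all document contents and helper names are made up.

```python
from collections import defaultdict

# Hypothetical documents; create_time values are arbitrary Unix timestamps.
docs = {
    "doc1": {"id": 1, "type": "post", "create_time": 1677110400,
             "title": "video game industry"},
    "doc2": {"id": 2, "type": "post", "create_time": 1677196800,
             "title": "video editing"},
}

def build_inverted_index(docs, field):
    """Map each term to the documents that contain it: term -> [doc, ...]."""
    inverted = defaultdict(list)
    for doc_id, doc in docs.items():
        # dict.fromkeys de-duplicates terms while keeping their order.
        for term in dict.fromkeys(doc[field].split()):
            inverted[term].append(doc_id)
    return dict(inverted)

def build_forward_index(docs, attribute_fields):
    """Map each document to its attribute field values: doc -> {field: value}."""
    return {
        doc_id: {name: doc[name] for name in attribute_fields}
        for doc_id, doc in docs.items()
    }

inverted = build_inverted_index(docs, "title")
forward = build_forward_index(docs, ["id", "type", "create_time"])

# Query step uses the inverted index; filter step uses the forward index.
hits = [d for d in inverted.get("video", [])
        if forward[d]["create_time"] > 1677110400]
```

Here the term "video" maps to both documents in the inverted index, and the filter on create_time, evaluated through the forward index, keeps only doc2.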

retrieval

After documents are pushed to OpenSearch, the field values in the documents are tokenized into individual terms, and inverted indexes are built on those terms. When a search request arrives, OpenSearch looks up the query keywords in the inverted indexes to find matching documents.

retrieval amount

The number of documents that are retrieved.

Data synchronization

Term

Description

data source

The source of data to be pushed. High-performance Search Edition supports data synchronization from ApsaraDB for RDS, MaxCompute, and PolarDB.

reindexing

This feature rebuilds the indexes on your data. Reindexing is required after you configure or modify the application schema and a data source.

Quota management

Term

Description

document capacity

The cumulative size of all documents in the tables of an instance. The cumulative size is calculated based on field values, with each field value converted to a string before it is measured.
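A rough way to estimate this figure locally is to convert every field value to a string and add up the sizes. The sketch below assumes the size of a string is its UTF-8 byte length, which the definition above does not state; the table contents and function names are hypothetical, and the real metering may differ.

```python
def document_size(doc):
    """Sum the sizes of all field values after converting each to a string."""
    # Assumption: string size is measured as UTF-8 byte length.
    return sum(len(str(value).encode("utf-8")) for value in doc.values())

def document_capacity(tables):
    """Cumulative size of all documents across all tables of an instance."""
    return sum(document_size(doc) for table in tables.values() for doc in table)

# Hypothetical tables of one instance.
tables = {
    "posts": [{"id": 1, "title": "hello"}, {"id": 22, "title": "hi"}],
    "users": [{"id": 3, "name": "sam"}],
}
```

For the sample data, the three documents contribute 6, 4, and 4 bytes, so `document_capacity(tables)` is 14.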

QPS

The number of queries per second.

LCU

A logical computing unit (LCU) is the unit that is used to measure the computing power of a search service. An LCU indicates the computing power of 10 millicores in a search cluster. Millicore is the unit of CPU resources. Each millicore is one-thousandth of one core.
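The unit conversion follows directly from the two ratios in the definition: 1 LCU is 10 millicores, and 1 core is 1,000 millicores, so 1 core corresponds to 100 LCUs. A small sketch of the arithmetic, with helper names chosen for illustration:

```python
MILLICORES_PER_CORE = 1000  # from the definition: 1 millicore = 1/1000 core
MILLICORES_PER_LCU = 10     # from the definition: 1 LCU = 10 millicores

def lcus_to_cores(lcus):
    """Convert a number of LCUs to CPU cores."""
    return lcus * MILLICORES_PER_LCU / MILLICORES_PER_CORE

def cores_to_lcus(cores):
    """Convert a number of CPU cores to LCUs."""
    return cores * MILLICORES_PER_CORE / MILLICORES_PER_LCU
```

For example, `lcus_to_cores(100)` gives 1.0 core, and `cores_to_lcus(0.5)` gives 50.0 LCUs.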

Searches

Term

Description

sort expression

A sort expression is an expression that you write to control how search results are sorted. A sort expression can use basic mathematical operations, mathematical functions, and built-in functions.

rough sort expression

The search results are first sorted by using a rough sort expression. The system calculates the matching scores of the documents based on a rough sort expression and sorts the documents based on the calculated scores.

fine sort expression

The system takes the top N results of the rough sort and recalculates their matching scores more precisely by using a fine sort expression. The system then sorts these results based on the recalculated scores.
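The two-phase flow of rough sort followed by fine sort can be sketched as below. The scoring functions, field names, and sample documents are invented for the example and are not OpenSearch built-in functions. Keeping the remaining documents after the re-ranked top N is also an assumption, because the description above only covers the top N results.

```python
def two_phase_sort(docs, rough_score, fine_score, top_n):
    """Rank all docs with a cheap score, then re-rank the top N precisely."""
    rough_sorted = sorted(docs, key=rough_score, reverse=True)
    head, tail = rough_sorted[:top_n], rough_sorted[top_n:]
    # Only the top N candidates pay for the more expensive fine score.
    head = sorted(head, key=fine_score, reverse=True)
    # Assumption: documents outside the top N keep their rough-sort order.
    return head + tail

# Hypothetical documents and scoring expressions.
docs = [
    {"id": "a", "term_hits": 3, "freshness": 0.1},
    {"id": "b", "term_hits": 2, "freshness": 0.9},
    {"id": "c", "term_hits": 1, "freshness": 1.0},
]
rough = lambda d: d["term_hits"]
fine = lambda d: d["term_hits"] * d["freshness"]

ranked = two_phase_sort(docs, rough, fine, top_n=2)
```

In this run the rough sort orders the documents a, b, c. The fine sort then re-scores only a and b and swaps them, so the final order is b, a, c.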

search result summary

Text content is usually long. To help users grasp the main content of a document, search results display only part of the document content.