OpenSearch: Terms

Last Updated: Feb 27, 2024

Data-related terms

Term

Description

MaxCompute data source

The data source from which full data is obtained. The raw data is stored in MaxCompute by partition.

API data source

The data source from which incremental data is obtained. Data is updated by calling API operations.

document

The search unit of structured data. A document can contain one or more fields and must have a primary key field. Retrieval Engine Edition identifies a unique document based on the value of the primary key field. If a new document has the same primary key value as an existing document, the existing document is overwritten by the new document.

field

The component of a document. A field consists of a field name and a field value.

multi-value field

The field that contains multiple independent values.

primary key

The field that uniquely identifies a document.
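The overwrite behavior described for documents and primary keys can be illustrated with a minimal sketch. The `DocumentStore` class, the `upsert` method, and the field names `id`, `title`, and `tags` are hypothetical and exist only for illustration; they are not part of any OpenSearch API.

```python
from typing import Any, Dict, Optional


class DocumentStore:
    """Hypothetical in-memory stand-in for an index table (not an OpenSearch API)."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self._docs: Dict[Any, Dict[str, Any]] = {}

    def upsert(self, doc: Dict[str, Any]) -> None:
        # Every document must carry the primary key field.
        if self.primary_key not in doc:
            raise ValueError(f"missing primary key field: {self.primary_key}")
        # A document with an existing primary key value replaces the old one.
        self._docs[doc[self.primary_key]] = dict(doc)

    def get(self, key: Any) -> Optional[Dict[str, Any]]:
        return self._docs.get(key)

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore(primary_key="id")
# "tags" plays the role of a multi-value field: it holds several independent values.
store.upsert({"id": 1, "title": "first version", "tags": ["search", "index"]})
store.upsert({"id": 1, "title": "second version", "tags": ["search"]})
```

After both calls the store holds a single document whose `title` is `second version`, because the second document has the same primary key value as the first.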

Retrieval Engine Edition

Term

Description

Query Result Searcher (QRS) worker

The role used in online search. QRS workers parse query requests and merge the results returned by Searcher workers.

Searcher worker

The role used in online search. Searcher workers load index data and provide search services.

cluster

A search service that consists of a set of QRS workers and Searcher workers.

Processor

A role used in offline indexing for parsing users' raw data.

Builder

A role used in offline indexing for building indexes from raw data.

Merger

A role used in offline indexing for merging and sorting indexes.

full indexing

The process of building indexes on the full data in a MaxCompute data source. The indexes that are generated during this process are full indexes, and the index versions are full index versions.

incremental indexing

The process in which, as data is updated in real time, the offline indexing process generates incremental indexes and applies them to online clusters.

real-time indexing

The process in which data that is pushed by calling API operations takes effect in real time. Real-time indexes are generated in the memory of Searcher workers.

inverted index

An inverted index maps each term to the list of documents in which the term appears. Inverted indexes are used in query clauses to make queries efficient. Example: term1->doc1,doc2,doc3; term2->doc1,doc2.

forward index

A forward index maps each document to the values of its fields. Forward indexes are used in FILTER clauses and are less efficient than inverted indexes. Example: doc1->id,type,create_time…

summary index

A summary index collects and stores the information that you want the system to display in summaries of search results. You can query information that is contained in a search result summary by specifying the primary key or document ID. Retrieval Engine Edition displays the search results by page.

tokenization

The process of splitting the sentences in documents into terms. If the data type of a field is TEXT, the system tokenizes the sentences in the field into meaningful terms. For example, "浙江大学" (Zhejiang University) in a TEXT field is tokenized into two terms: "浙江" and "大学".

term

A term is a token or a set of tokens after tokenization.
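The relationship between tokenization, inverted indexes, and forward indexes can be sketched in a few lines. This is a toy model under stated assumptions, not how Retrieval Engine Edition is implemented: the two-word vocabulary, the forward-maximum-matching tokenizer, the sample documents, and the `search` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set

# Hypothetical toy dictionary; a real analyzer uses a far larger vocabulary.
VOCAB = {"浙江", "大学"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def tokenize(text: str) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical sample documents with a TEXT field ("title") and two attributes.
docs = {
    "doc1": {"title": "浙江大学", "type": "school", "create_time": 2020},
    "doc2": {"title": "浙江", "type": "region", "create_time": 2022},
}

inverted: Dict[str, Set[str]] = defaultdict(set)   # term -> documents
forward: Dict[str, Dict[str, Any]] = {}            # document -> field values

for doc_id, fields in docs.items():
    for term in tokenize(fields["title"]):
        inverted[term].add(doc_id)
    forward[doc_id] = {"type": fields["type"], "create_time": fields["create_time"]}


def search(term: str, min_create_time: Optional[int] = None) -> List[str]:
    """Look up candidates in the inverted index, then filter with the forward index."""
    candidates = sorted(inverted.get(term, set()))
    if min_create_time is None:
        return candidates
    return [d for d in candidates if forward[d]["create_time"] >= min_create_time]
```

Here `search("浙江")` finds its candidates through a single inverted-index lookup, whereas the optional `min_create_time` filter has to read the forward index once per candidate document, which is why filtering is the more expensive step.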

Data changes triggered and implemented by FSM

Change type

Whether to allow recurring events

Description

Service discovery

Yes

Points the domain name to the IP address of a Retrieval Engine Edition instance so that you can call the service. For the same cluster, all historical changes are terminated before the latest change runs.

ha3_biz_apend

No

Adds biz. This operation can be performed once on each instance. The system automatically triggers this change. The change may continue to run until the index table is added to the instance and the index is built.

update_biz_depend_index_fsm

No

Updates the index on which biz depends. This operation can be performed once on each instance. The system automatically triggers this change. The change may continue to run until the index table is added to the instance and the index is built.

Online deployment

Yes

For the same cluster, all historical changes are terminated before the latest change runs.

multi_biz_activate

No

Initializes a Retrieval Engine Edition instance. This operation can be performed once on each instance. The change may continue to run until the index table is added to the instance and the index is built.

Index creation

Yes

For the same index, all historical changes are terminated before the latest change runs.

Automatically triggered full indexing

Yes

The system automatically triggers this change after new data partitions are identified. The latest change and historical changes can concurrently run.

Manually triggered full indexing

Yes

The latest change and historical changes can concurrently run.

Configuration push

Yes

All historical changes are terminated before the latest change runs.

Online resources

Yes

For the same zone, all historical changes are terminated before the latest change runs.

Index rollback

Yes

The latest change and historical changes can concurrently run.

Note
  • FSM: finite-state machine. An FSM is a mathematical model that represents a finite number of states and the transitions between these states.

  • Whether to allow recurring events: specifies whether a change of this type can be triggered more than once. Change types marked No can be performed only once on each instance.
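The rules in the table above (recurring or not, and whether historical changes are terminated or run concurrently) can be sketched as a small scheduler built on a finite-state machine. Everything in this sketch is an assumption made for illustration: the state names, the `ChangeScheduler` class, and the scope strings such as `cluster-1` do not come from the product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    # Hypothetical states; the actual FSM states are not documented here.
    RUNNING = "running"
    TERMINATED = "terminated"
    FINISHED = "finished"


# Allowed transitions of the hypothetical state machine.
TRANSITIONS: Dict[State, Set[State]] = {
    State.RUNNING: {State.TERMINATED, State.FINISHED},
    State.TERMINATED: set(),
    State.FINISHED: set(),
}


@dataclass
class Change:
    change_type: str
    scope: str                      # for example a cluster, index, zone, or instance
    state: State = State.RUNNING

    def move_to(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


class ChangeScheduler:
    def __init__(self, recurring: Dict[str, bool], supersede: Set[str]) -> None:
        self.recurring = recurring      # change type -> may it be triggered again?
        self.supersede = supersede      # change types that terminate historical changes
        self.history: List[Change] = []

    def trigger(self, change_type: str, scope: str) -> Change:
        same = [c for c in self.history
                if c.change_type == change_type and c.scope == scope]
        if same and not self.recurring[change_type]:
            raise RuntimeError(f"{change_type} can run only once on {scope}")
        if change_type in self.supersede:
            # Terminate historical changes before the latest change runs.
            for old in same:
                if old.state is State.RUNNING:
                    old.move_to(State.TERMINATED)
        change = Change(change_type, scope)
        self.history.append(change)
        return change


scheduler = ChangeScheduler(
    recurring={"Online deployment": True, "Index rollback": True,
               "multi_biz_activate": False},
    supersede={"Online deployment"},
)
first_deploy = scheduler.trigger("Online deployment", "cluster-1")
second_deploy = scheduler.trigger("Online deployment", "cluster-1")
first_rollback = scheduler.trigger("Index rollback", "index-1")
second_rollback = scheduler.trigger("Index rollback", "index-1")
scheduler.trigger("multi_biz_activate", "instance-1")
```

In this sketch the second online deployment terminates the first, both index rollbacks keep running side by side, and a second `multi_biz_activate` on `instance-1` would raise an error because that change type is not recurring.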