Terms - Elasticsearch - Alibaba Cloud Documentation Center

This topic introduces the terms related to Alibaba Cloud Elasticsearch.

cluster

An Elasticsearch cluster consists of one or more Elasticsearch nodes that work together to store data. Each cluster is identified by a unique name. If two clusters in the same environment share a name, nodes may join the wrong cluster and unexpected errors may occur.

node

A node is a single Elasticsearch instance that runs on a server and belongs to a cluster. Nodes store data and handle the indexing and query activities of the cluster. A cluster consists of one or more nodes, and the nodes can play the following roles:

Data nodes are used to store indexes. You can use data nodes to add, remove, and modify documents and search for and aggregate data in documents.
Dedicated master nodes are used to perform cluster-level operations, such as creating or deleting indexes, tracking nodes, and allocating shards. The stability of dedicated master nodes is important to the health of clusters. By default, every node in a cluster is eligible to be elected as the master node.
Client nodes are used to share the CPU overheads of data nodes. Client nodes can improve the computing performance and service stability of a cluster.
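
The three roles above can be illustrated with a minimal sketch. It assumes a self-managed open source cluster that uses the legacy boolean settings node.master, node.data, and node.ingest (Elasticsearch versions earlier than 7.9; later versions use node.roles instead). A managed service may expose roles differently, for example in its console. The ROLE_SETTINGS table and the render_yaml helper are hypothetical names used only for illustration.

```python
# Legacy boolean role settings (Elasticsearch earlier than 7.9).
# Assumption: a self-managed cluster configured through elasticsearch.yml.
ROLE_SETTINGS = {
    # Stores index data and serves document and search operations.
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    # Only manages the cluster: index creation, node tracking, shard allocation.
    "dedicated_master": {"node.master": True, "node.data": False, "node.ingest": False},
    # Holds no data and is not master-eligible; it only coordinates requests.
    "client": {"node.master": False, "node.data": False, "node.ingest": False},
}


def render_yaml(role: str) -> str:
    """Render the settings for one role as elasticsearch.yml-style lines."""
    settings = ROLE_SETTINGS[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


if __name__ == "__main__":
    for role in ROLE_SETTINGS:
        print(f"# {role}")
        print(render_yaml(role))
```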

index

An index is a set of documents that have similar features. An index is similar to a database in a relational database system. For example, you can create three indexes to store customer data, commodity catalog data, and order data, respectively. An index is identified by a name, which must be in lowercase. When you index, query, update, or delete a document, you must specify the name of the index to which the document belongs.

type

A type is a logical class or partition of an index. A type is similar to a table in a relational database. In Elasticsearch V5.X, an index can store different types of documents, such as the user type and blog type. In Elasticsearch V6.X, an index can store only one type of document. In Elasticsearch V7.X, the type of an index can only be _doc, and the type concept is no longer emphasized. For more information, see Open source Elasticsearch documentation.

document

A document is a basic information unit that can be indexed. A document is similar to a row in a table of a relational database. For example, you can create a document for a customer or commodity. Each document is expressed as a JSON object. The number of documents that are stored in an index is not limited. Each document must be indexed into an index.
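
As an illustration, the following sketch builds a customer document as a JSON object and describes the request that would index it. The field names, the index name customers, and the to_index_request helper are assumptions made for this example. The path follows the PUT /<index>/_doc/<id> form of the open source document API in V6.X and V7.X. No network call is made.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
customer = {
    "customer_id": 1001,
    "name": "Alice",
    "city": "Hangzhou",
    "tags": ["vip", "newsletter"],
}


def to_index_request(index: str, doc_id: str, doc: dict) -> dict:
    """Describe an index request as method, path, and JSON body."""
    return {
        "method": "PUT",
        "path": f"/{index}/_doc/{doc_id}",
        "body": json.dumps(doc),
    }


request = to_index_request("customers", "1001", customer)
print(request["method"], request["path"])
print(request["body"])
```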

field

A field is the smallest unit that is contained in a document. A field is similar to a column in a table of a relational database.

mapping

A mapping defines how a document and the fields that the document contains are stored and indexed. For example, you can use mappings to define field names, field types, and the tokenizer that you want to use. A mapping is similar to the schema of a table in a relational database.
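
A mapping can be written as a JSON body. The sketch below assumes the typeless format used in Elasticsearch V7.X and a hypothetical blog index. The field names are invented for illustration, and the tokenizer choice is expressed through the analyzer setting of a text field (here the built-in standard analyzer).

```python
import json

# Hypothetical mapping for a blog index (typeless V7.X-style body).
blog_mapping = {
    "mappings": {
        "properties": {
            # Full-text field, analyzed with the built-in standard analyzer.
            "title": {"type": "text", "analyzer": "standard"},
            # Exact-value field, not analyzed.
            "author": {"type": "keyword"},
            "published_at": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}


def field_types(mapping: dict) -> dict:
    """Return a {field name: field type} view of the mapping."""
    properties = mapping["mappings"]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


print(json.dumps(blog_mapping, indent=2))
print(field_types(blog_mapping))
```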

The following table lists how terms in Elasticsearch correspond to terms in relational databases.

Elasticsearch	Relational database
index	database
type	table
document	row
field	column
mapping	schema

shard and replica shard

An index can be divided into multiple shards. These shards can be distributed among different nodes to support distributed searches. Shards are classified into primary shards and replica shards. When you create an index, you can specify the numbers of primary shards and replica shards for the index. If you do not specify them, the default values are used. After the index is created, you cannot change the number of primary shards.
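
The reason the number of primary shards is fixed can be sketched as follows. Open source Elasticsearch routes each document to a primary shard by hashing a routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch uses the simplified form of that formula, and zlib.crc32 is only a stand-in for the Murmur3-based hash that Elasticsearch actually uses. The shard_for function is a hypothetical name.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % num_primary_shards.

    zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Shard assignment with 5 primary shards, then with 6.
before = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents would map to a different shard after the count changes, existing data could no longer be located by the same formula, which is why the number is fixed when the index is created.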

A replica shard is a copy of a primary shard for an index. You can configure multiple replica shards for a primary shard. When a cluster receives a request for a write operation, the cluster performs the operation on the related primary shard. After the operation is complete, the cluster copies the data involved in the operation to the replica shards of the primary shard. You can query primary or replica shards for data. Replica shards can improve the high availability of a cluster and the concurrency performance of a cluster during searches. However, if a large number of replica shards are configured for an index in a cluster, the data synchronization load of the cluster increases during write operations.
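
The write and read paths described above can be modeled with a toy in-memory sketch. It is not how Elasticsearch is implemented. The ShardGroup class is hypothetical and only shows that a write lands on the primary shard first, is then copied to every replica shard, and that a read can be served by any copy.

```python
import random


class ShardGroup:
    """One primary shard plus its replica shards, modeled as in-memory dicts."""

    def __init__(self, num_replicas: int) -> None:
        self.primary: dict = {}
        self.replicas: list = [{} for _ in range(num_replicas)]

    def write(self, doc_id: str, doc: dict) -> int:
        """Apply the write on the primary, then copy it to every replica.

        Returns the number of shard copies that were written, which grows
        with the number of replicas (the synchronization load).
        """
        self.primary[doc_id] = doc
        for replica in self.replicas:
            replica[doc_id] = dict(doc)
        return 1 + len(self.replicas)

    def read(self, doc_id: str) -> dict:
        """Serve the read from any copy, primary or replica."""
        copy = random.choice([self.primary, *self.replicas])
        return copy[doc_id]


group = ShardGroup(num_replicas=2)
copies_written = group.write("1", {"title": "hello"})
print(copies_written)
print(group.read("1"))
```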

In versions earlier than Elasticsearch V7.0, each index is configured with five primary shards and each primary shard is configured with one replica shard by default. In Elasticsearch V7.0 and later, each index is configured with one primary shard and one replica shard by default. The following table describes the differences between primary shards and replica shards.

Shard type	Supported request types	Can the number of shards be changed?	Remarks
Primary shard	Query and indexing requests	No. The number of primary shards is specified when the index is created and cannot be changed afterward. For more information, see Step 3: Create an index.	Each document in an index belongs to exactly one primary shard. The number of primary shards determines the maximum volume of data that an index can store. Important: The more primary shards an index has, the higher the performance overhead on the Elasticsearch cluster.
Replica shard	Query requests	Yes. The number of replica shards can be changed at any time. For more information, see Index templates.	Replica shards are important to search performance and provide the following benefits: (1) Improved fault tolerance: if a primary shard on a node is damaged or lost, it can be restored from one of its replica shards. (2) Improved search efficiency: Elasticsearch automatically balances the query load among replica shards.
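
The shard settings can be sketched as request bodies. The setting names number_of_shards and number_of_replicas are the ones used by open source Elasticsearch, and number_of_replicas counts replicas per primary shard. The helper functions and the example values are assumptions made for illustration, and no request is sent.

```python
import json


def default_shard_settings(major_version: int) -> dict:
    """Defaults as described in this topic: 5 primaries before V7.0, 1 from V7.0."""
    primaries = 5 if major_version < 7 else 1
    return {"number_of_shards": primaries, "number_of_replicas": 1}


def total_shard_copies(settings: dict) -> int:
    """Primary shards plus all of their replica shards."""
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


# Body for creating an index (PUT /<index>): both values can be set here.
create_body = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

# Body for a later settings update (PUT /<index>/_settings): only the
# replica count can be changed after the index exists.
update_body = {"index": {"number_of_replicas": 2}}

print(json.dumps(create_body))
print(json.dumps(update_body))
print(total_shard_copies(default_shard_settings(6)))
print(total_shard_copies(default_shard_settings(7)))
```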

Important: Both the number of shards and the size of each shard affect the stability and performance of an Elasticsearch cluster. Plan shards appropriately for all indexes in the cluster so that an excessive number of shards does not degrade cluster performance, especially when business scenarios are difficult to define in advance. For more information, see Evaluate specifications and storage capacity.

gateway

A gateway is used to store snapshots of indexes. By default, a node stores all its indexes in memory. When the node memory is full, the node stores the indexes on local disks. When an Elasticsearch cluster is rebooted, its indexes are restored from the snapshots that are stored on the gateway. Restoring indexes from snapshots is faster than reading indexes from local disks. Elasticsearch supports multiple types of gateways, including local file systems, distributed file systems, Hadoop Distributed File System (HDFS), and Alibaba Cloud Object Storage Service (OSS). By default, local file systems are used as gateways.

discovery.zen

discovery.zen is an automatic node discovery mechanism. Elasticsearch is a peer-to-peer (P2P) system in which nodes send broadcasts to discover each other. Nodes communicate with each other by using multicast and P2P technologies.

transport

Transport is the method that an Elasticsearch cluster or the nodes in the cluster use to communicate with clients. By default, TCP is used for the communication. You can integrate plug-ins into your Elasticsearch cluster to use other protocols, such as JSON over HTTP, Thrift, Memcached, and ZeroMQ.