This topic describes the terms commonly used in Alibaba Cloud Elasticsearch.

cluster

An Elasticsearch cluster consists of one or more nodes that together store your data and provide indexing and search capabilities across all of those nodes. Each cluster is identified by a unique name. The default cluster name is elasticsearch. A node can join a cluster only if it is configured with the name of that cluster.

You must make sure that clusters in different environments use different names. Otherwise, you may add nodes to the wrong cluster.

node

A node runs on a server in an Elasticsearch cluster. Nodes are used to store data and support indexing and query activities in the cluster. Like a cluster, each node has a unique name. By default, a random universally unique identifier (UUID) is assigned to a node as its name when the node is started. You can also assign a custom name to the node. Node names are important for management because they allow you to identify which node runs on a specific server.

You can add a node to a cluster by specifying the name of the cluster. By default, nodes are added to the cluster named elasticsearch. If you start several nodes with this default setting and the nodes can discover each other in the network, they automatically form a cluster named elasticsearch.

The number of nodes that a cluster can contain is not limited. If no Elasticsearch nodes are running in your network, after you start a node, a single-node cluster named elasticsearch is created.
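
In open-source Elasticsearch, the cluster name and node name correspond to the cluster.name and node.name settings in the elasticsearch.yml file. The following Python sketch renders these two settings for a few nodes. The cluster name es-prod, the node names, and the helper render_node_settings are hypothetical and serve only as an illustration.

```python
def render_node_settings(cluster_name: str, node_name: str) -> str:
    """Return elasticsearch.yml lines that set the cluster and node names."""
    if not cluster_name or not node_name:
        raise ValueError("cluster_name and node_name must be non-empty")
    return f"cluster.name: {cluster_name}\nnode.name: {node_name}\n"


# Hypothetical names: every node that should join the same cluster
# shares one cluster.name but gets its own node.name.
settings = [render_node_settings("es-prod", f"node-{i}") for i in (1, 2, 3)]
print(settings[0])
```

Using a distinct cluster name for each environment (for example, one for testing and one for production) keeps a node from joining the wrong cluster.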

index

An index is a collection of documents that have similar characteristics. It is similar to a database in a relational database system. For example, you can create three indexes to store customer data, commodity catalog data, and order data, respectively. An index is identified by its name, which must be in lowercase. When you index, query, update, or delete a document, you must specify the name of the index to which the document belongs.
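
The naming rule and the role of the index name in document operations can be sketched in Python as follows. The check covers only part of the naming rules of open-source Elasticsearch, the /{index}/_doc/{id} path assumes Elasticsearch 7.0 or later, and the helper names and the index name customers are hypothetical.

```python
# Characters that open-source Elasticsearch rejects in index names
# (a partial list, used here only for illustration).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_plausible_index_name(name: str) -> bool:
    """Partial check of index naming rules; not the full server-side validation."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():  # index names must be in lowercase
        return False
    if name[0] in "-_+":  # disallowed leading characters
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in name)


def document_path(index: str, doc_id: str) -> str:
    """Build the REST path of a document; the index name is always part of it."""
    if not is_plausible_index_name(index):
        raise ValueError(f"invalid index name: {index!r}")
    return f"/{index}/_doc/{doc_id}"


print(document_path("customers", "1"))
print(is_plausible_index_name("Orders"))
```

The first call prints /customers/_doc/1. The second prints False because the name contains an uppercase letter.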

Table 1. Mappings between terms in Elasticsearch and relational databases
Elasticsearch | Relational database
index | database
type | table
document | row
field | column
mapping | schema

type

A type is a logical category or partition of an index. It is similar to a table in a relational database. In early versions of Elasticsearch, an index can store different types of documents, such as the user type and blog type. In Elasticsearch 6.0 and later, you are not allowed to create multiple types in an index. The concept is deprecated in Elasticsearch 7.0 and removed in later versions. For more information, see Open-source Elasticsearch documentation.

document

A document is a basic information unit that can be indexed. It is similar to a row in a table of a relational database. For example, you can create a document for a customer or commodity. A document is a JSON object. The number of documents that are stored in an index is not limited and these documents must be indexed.
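
As a minimal illustration, the following Python sketch builds a customer document and serializes it to the JSON text that would be sent for indexing. All field names and values are hypothetical.

```python
import json

# A hypothetical customer document: a JSON object made up of fields.
customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age": 32,
    "tags": ["vip", "newsletter"],
}

body = json.dumps(customer)  # the JSON text that would be sent for indexing
print(body)
```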

field

A field is the smallest unit that makes up a document. It is similar to a column in a table of a relational database.

mapping

A mapping is used to define how a document and the fields that the document contains are stored and indexed. For example, you can use mappings to define field names, field types, and the tokenizer that you want to use. A mapping is similar to a schema in a relational database.
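
The following Python sketch shows what a mapping for a hypothetical customer index might look like, assuming the typeless mapping format of Elasticsearch 7.0 or later. The field types text, keyword, and integer and the standard analyzer exist in open-source Elasticsearch. The field names and the helper field_types are assumptions made for this example. The analyzer setting selects an analyzer, which in turn determines the tokenizer that is applied to the field.

```python
import json

# Hypothetical mapping for a customer index.
customer_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "standard"},
            "email": {"type": "keyword"},
            "age": {"type": "integer"},
        }
    }
}


def field_types(mapping: dict) -> dict:
    """Return {field name: field type} for the top-level fields of a mapping."""
    props = mapping["mappings"]["properties"]
    return {field: spec["type"] for field, spec in props.items()}


print(json.dumps(customer_mapping, indent=2))
print(field_types(customer_mapping))
```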

shard

An index can be divided into multiple shards. These shards can be distributed among different nodes to support distributed searches. When you create an index, you specify the number of primary shards for the index or accept the default value. After the index is created, you cannot change the number of its primary shards.
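
One reason for this restriction is that, in simplified form, open-source Elasticsearch selects the shard of a document by hashing a routing value (by default, the document ID) and taking the result modulo the number of primary shards. The following Python sketch illustrates the idea. zlib.crc32 is only a stand-in for the hash function that Elasticsearch actually uses, and pick_shard is a hypothetical helper.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard."""
    # zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: pick_shard(d, 5) for d in doc_ids}
with_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Count the documents whose shard would differ if the shard count changed.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

If the number of primary shards changed, many existing documents would map to a different shard and could no longer be found where they were stored.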

A shard can be a primary or replica shard. In versions earlier than Elasticsearch 7.0, each index is configured with five primary shards and one replica shard for each primary shard by default. In Elasticsearch 7.0 and later, each index is configured with one primary shard and one replica shard by default. The following table describes the differences between primary and replica shards.

Shard type | Supported request type | Whether the number of shards can be changed | Remarks
Primary shard | Query and indexing requests | The number of primary shards in an index cannot be changed. This number is specified when the index is created. | Each document in an index belongs to a single primary shard. Therefore, the number and sizes of primary shards determine the maximum volume of data that an index can store. Notice: The more primary shards, the more performance overheads your Elasticsearch cluster incurs.
Replica shard | Query requests | The number of replica shards can be changed at any time. | Replica shards are important to search performance and provide the following benefits:
  • Improved fault tolerance: If a primary shard on a node is damaged or lost, you can restore the shard from replica shards.
  • Improved search efficiency: Elasticsearch automatically balances the load of queries among replica shards.
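
The relationship between primary and replica shards can be sketched in Python as follows. number_of_shards and number_of_replicas are index settings in open-source Elasticsearch. The helper total_shards and the values 3, 1, and 2 are assumptions made for this example.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies of an index: each primary shard plus its replicas."""
    return primaries * (1 + replicas_per_primary)


# Defaults described above.
print(total_shards(5, 1))  # versions earlier than 7.0: 10 shards
print(total_shards(1, 1))  # 7.0 and later: 2 shards

# Settings body used when an index is created; the primary count is fixed afterwards.
create_body = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

# Settings body for a later update: only the replica count is changed.
update_body = {"index": {"number_of_replicas": 2}}

print(total_shards(create_body["settings"]["number_of_shards"],
                   update_body["index"]["number_of_replicas"]))  # 9 shards
```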

recovery

Data recovery (or data redistribution) is the process of redistributing shards among nodes when a node joins or leaves a cluster, or when a node recovers from a failure. This ensures the integrity of data.
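
The following Python sketch is a toy model of redistribution after a node leaves a cluster. It is not the allocation algorithm of Elasticsearch: it only moves the shard names of the lost node to the remaining nodes, least-loaded node first, and ignores replicas and data copying. The node names, shard names, and the helper reassign_after_node_loss are hypothetical.

```python
def reassign_after_node_loss(assignment: dict, lost_node: str) -> dict:
    """Move the shards of lost_node to the remaining nodes, least-loaded first."""
    remaining = {n: list(s) for n, s in assignment.items() if n != lost_node}
    if not remaining:
        raise ValueError("no nodes left to host the shards")
    for shard in assignment.get(lost_node, []):
        # Pick the node with the fewest shards; break ties by node name.
        target = min(remaining, key=lambda n: (len(remaining[n]), n))
        remaining[target].append(shard)
    return remaining


cluster = {"node-1": ["s0", "s1"], "node-2": ["s2"], "node-3": ["s3", "s4"]}
print(reassign_after_node_loss(cluster, "node-3"))
```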

gateway

A gateway is used to store snapshots of indexes. By default, a node stores all the indexes in its memory. When the node memory is full, the node stores the indexes in local disks. When an Elasticsearch cluster is rebooted, its indexes are restored from the snapshots that are stored on the gateway. Restoring indexes from snapshots is faster than reading indexes from local disks. Elasticsearch supports multiple types of gateways, including the local file system (default), distributed file system, Hadoop Distributed File System (HDFS), and Alibaba Cloud Object Storage Service (OSS).

discovery.zen

discovery.zen is an automatic node discovery mechanism. Elasticsearch is a peer-to-peer (P2P) system that sends broadcasts to discover nodes. Nodes communicate with each other by using multicast and P2P technologies.

transport

Transport refers to the method that is used by an Elasticsearch cluster or the nodes in the cluster to communicate with clients. By default, TCP is used. You can integrate plug-ins into Elasticsearch to use other protocols, such as JSON over HTTP, Thrift, Servlet, Memcached, and ZeroMQ.
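
As an illustration of JSON over HTTP, the following Python sketch builds a search request for the REST API without sending it. It assumes that the HTTP endpoint listens on port 9200, which is the default in open-source Elasticsearch. The host, the index name customers, the query, and the helper build_search_request are hypothetical, and authentication is omitted.

```python
import json
import urllib.request


def build_search_request(host: str, index: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON search request for the REST API."""
    return urllib.request.Request(
        url=f"http://{host}:9200/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("localhost", "customers", {"match": {"name": "jane"}})
print(req.get_method(), req.full_url)
```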