This page defines the core terms used in Alibaba Cloud Elasticsearch.
cluster
An Elasticsearch cluster consists of one or more nodes. All nodes in a cluster work together to store data. Each cluster must have a unique name. A node joins a cluster by its cluster name, so if two clusters in the same environment share a name, nodes can join the wrong cluster and unexpected errors may occur.
node
A node runs on a server within a cluster. Nodes store data and support indexing and query operations. Nodes can take on different roles:
- Data nodes store indexes. Use data nodes to add, remove, and modify documents, and to search and aggregate data.
- Dedicated master nodes manage cluster-level operations: creating and deleting indexes, tracking nodes, and allocating shards. The stability of dedicated master nodes is critical to cluster health. By default, any node in a cluster is master-eligible and can be elected master. A dedicated master node is a node that takes on only the master role.
- Client nodes offload CPU overhead from data nodes, improving compute performance and cluster stability.
index
An index is a collection of documents with similar characteristics, analogous to a database in a relational system. For example, you might create three separate indexes to store customer data, product catalog data, and order data.
Each index is identified by a lowercase name. When indexing, querying, updating, or deleting a document, specify the name of the index the document belongs to.
type
A type is a logical partition within an index, analogous to a table in a relational database. An index can contain documents of different types, such as a user type and a blog type.
Type support has been progressively removed:
- Elasticsearch 5.x: an index can contain multiple types of documents.
- Elasticsearch 6.x: an index can contain only one type of document. The type concept is deprecated.
- Elasticsearch 7.x and later: the type of an index is fixed as _doc.
For details, see the Elasticsearch documentation on removal of types.
document
A document is the basic unit of information that can be indexed, analogous to a row in a relational database table. For example, a document might represent a single customer or a single product. Each document is a JSON object. An index can contain an unlimited number of documents.
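As a minimal illustration of "each document is a JSON object", the sketch below builds a customer document in Python and serializes it to JSON. The field names and values are hypothetical and are not taken from any Alibaba Cloud Elasticsearch example.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
customer = {
    "customer_id": 1001,
    "name": "Jane Doe",
    "email": "jane@example.com",
    "signup_date": "2023-04-01",
}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(customer)

# Round-trip to confirm that the document is a valid JSON object.
restored = json.loads(body)
print(restored["name"])
```

Each key in the object corresponds to a field, which is why a document is compared to a row and a field to a column.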
field
A field is the smallest unit within a document, analogous to a column in a relational database table.
mapping
A mapping defines how a document and its fields are stored and indexed — including field names, field types, and the tokenizer to use. A mapping is analogous to the schema of a relational database table.
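The sketch below shows what a mapping might look like when written as a request body for index creation. It assumes the typeless mapping format used in Elasticsearch 7.x and later, and the field names are hypothetical. In a mapping, the tokenizer is normally selected indirectly through the `analyzer` setting of a `text` field.

```python
import json

# Sketch of a create-index request body with an explicit mapping.
# Assumes the typeless format of Elasticsearch 7.x and later.
# Field names are hypothetical.
create_index_body = {
    "mappings": {
        "properties": {
            "customer_id": {"type": "integer"},
            # "text" fields are analyzed (tokenized) for full-text search.
            "name": {"type": "text", "analyzer": "standard"},
            # "keyword" fields are stored as-is for exact matching.
            "email": {"type": "keyword"},
            "signup_date": {"type": "date"},
        }
    }
}


def field_types(body):
    """Return a {field name: field type} dict from a mapping body."""
    props = body["mappings"]["properties"]
    return {name: spec["type"] for name, spec in props.items()}


print(json.dumps(create_index_body, indent=2))
```

The helper `field_types` is only there to make the structure easy to inspect. It is not part of any Elasticsearch client.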
The table below shows how Elasticsearch concepts map to relational database concepts.
| Elasticsearch | Relational database |
|---|---|
| index | database |
| type | table |
| document | row |
| field | column |
| mapping | schema |
Shard and replica shard
An index can be split into multiple shards, which are distributed across nodes to support distributed search. Shards come in two types: primary shards and replica shards.
When creating an index, specify the number of primary shards and replica shards. The number of primary shards cannot be changed after the index is created.
Default shard configuration:
- Elasticsearch earlier than 7.0: 5 primary shards and 1 replica shard per primary shard, per index.
- Elasticsearch 7.0 and later: 1 primary shard and 1 replica shard per primary shard, per index.
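To make the defaults above concrete, the following sketch builds the index settings body and computes the total number of shard copies an index produces. `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings. The helper function names are hypothetical.

```python
def index_settings(primaries, replicas_per_primary):
    """Build a settings body for index creation."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas_per_primary,
            }
        }
    }


def total_shards(primaries, replicas_per_primary):
    """Each primary shard plus its replicas counts toward the total."""
    return primaries * (1 + replicas_per_primary)


# Defaults before 7.0: 5 primaries, 1 replica each -> 10 shards in total.
print(total_shards(5, 1))
# Defaults in 7.0 and later: 1 primary, 1 replica -> 2 shards in total.
print(total_shards(1, 1))
```

Because the replica count is applied per primary shard, raising either number multiplies the total number of shards the cluster has to host.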
The table below summarizes the differences between primary shards and replica shards.
| Shard type | Supported requests | Number changeable after creation | Notes |
|---|---|---|---|
| Primary shard | Query and indexing | No — set at index creation. See Step 3: Create an index. | Each document belongs to exactly one primary shard. The number of primary shards determines the maximum data volume an index can hold. More primary shards increase cluster performance overhead. |
| Replica shard | Query only | Yes — change at any time. See Index templates. | Replica shards improve fault tolerance: if a primary shard is lost, it can be restored from a replica. They also improve search throughput by distributing query load. |
Query requests can be served by either primary shards or replica shards.
Write operations: when a cluster receives a write request, it applies the operation to the relevant primary shard, then replicates the data to that shard's replicas. A large number of replica shards increases data synchronization load during writes.
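The write path described above can be sketched as a toy in-memory model. It is a simplification and not how Elasticsearch is implemented. It assumes a document is routed to a primary shard by hashing its ID modulo the number of primary shards, and it uses CRC32 purely as a stand-in for the hash function Elasticsearch actually uses. The class and function names are hypothetical.

```python
import zlib


def primary_shard_for(doc_id, num_primary_shards):
    """Pick a primary shard for a document ID.

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


class ToyIndex:
    """In-memory model of primary shards and their replicas."""

    def __init__(self, num_primary_shards, num_replicas):
        self.num_primary_shards = num_primary_shards
        self.primaries = [{} for _ in range(num_primary_shards)]
        # replicas[i] holds the replica copies of primary shard i.
        self.replicas = [
            [{} for _ in range(num_replicas)]
            for _ in range(num_primary_shards)
        ]

    def write(self, doc_id, doc):
        """Apply the write to the primary, then copy it to each replica."""
        shard = primary_shard_for(doc_id, self.num_primary_shards)
        self.primaries[shard][doc_id] = doc
        for replica in self.replicas[shard]:
            replica[doc_id] = doc
        return shard


index = ToyIndex(num_primary_shards=3, num_replicas=2)
shard = index.write("order-1", {"total": 42})
print(shard, len(index.replicas[shard]))
```

Two things are visible in the model: every write costs one extra copy per replica, and the target shard depends on the primary shard count, which is one reason that count is fixed once the index exists.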
Both the number of shards and the size of each shard affect cluster stability and performance. Plan shards for all indexes before deployment to avoid performance degradation at scale. For sizing guidance, see Evaluate specifications and storage capacity.
gateway
A gateway stores snapshots of indexes. By default, a node keeps all indexes in memory. When node memory is full, indexes spill to local disk. When a cluster restarts, it restores indexes from the gateway snapshots rather than reading from local disk — which is significantly faster.
Supported gateway types: local file system (default), distributed file system, Hadoop Distributed File System (HDFS), and Alibaba Cloud Object Storage Service (OSS).
discovery.zen
discovery.zen is the automatic node discovery mechanism used in Elasticsearch. Elasticsearch is a peer-to-peer (P2P) system that discovers nodes by sending broadcasts. Nodes communicate using multicast and P2P protocols.
transport
Transport is the communication layer between an Elasticsearch cluster (or its nodes) and clients. TCP is used by default. Plug-ins can add support for additional protocols, including JSON over HTTP, Thrift, Memcached, and ZeroMQ.