This topic introduces the terms related to Alibaba Cloud Elasticsearch.
A cluster consists of one or more nodes. All nodes in a cluster work together to store data. Each cluster has a unique name. If two clusters in an environment have the same name, an unknown exception may occur.
- Data nodes are used to store indexes. You can use data nodes to add, remove, and modify documents and search for and aggregate data in documents.
- Dedicated master nodes perform cluster-level operations, such as creating or deleting indexes, tracking nodes, and allocating shards. The stability of dedicated master nodes is important to the health of clusters. By default, each node in a cluster is eligible to be elected as the master node.
- Client nodes are used to share the CPU overheads of data nodes. Client nodes can improve the computing performance and service stability of a cluster.
An index is a set of documents that have similar features. An index is similar to a database in a relational database system. For example, you can create three indexes to store customer data, commodity catalog data, and order data, respectively. An index is identified by its name, which must be in lowercase. When you index, query, update, or delete a document, you must specify the name of the index to which the document belongs.
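As a minimal sketch of how the index name appears in every document operation, the following Python snippet builds the HTTP method and path for each operation. It assumes the REST paths of open source Elasticsearch V7.X; the helper name `document_request` and the `customer` index are hypothetical and used only for illustration.

```python
def document_request(operation, index, doc_id):
    """Return the (HTTP method, path) pair for a single-document operation.

    Hypothetical helper for illustration; paths assume the Elasticsearch V7.X REST API.
    """
    # Index names must be lowercase, so reject anything else early.
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    routes = {
        "index": ("PUT", f"/{index}/_doc/{doc_id}"),
        "get": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    return routes[operation]


# Every operation names the index ("customer") in its path.
for op in ("index", "get", "update", "delete"):
    print(op, *document_request(op, "customer", 1))
```

The sketch only builds the request line. It does not contact a cluster.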
A type is a logical class or partition of an index. A type is similar to a table in a relational database. Support for types depends on the Elasticsearch version. In Elasticsearch V5.X, an index can store different types of documents, such as the user type and blog type. In Elasticsearch V6.X, an index can store only one type of document. In Elasticsearch V7.X, the type of an index can only be _doc. Because an index can contain only one type in Elasticsearch V6.X and later, the type concept is no longer mentioned for these versions. For more information, see Open source Elasticsearch documentation.
A document is a basic information unit that can be indexed. A document is similar to a row in a table of a relational database. For example, you can create a document for a customer or commodity. Each document is a JSON object. The number of documents that are stored in an index is not limited. Documents must be indexed.
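For illustration, a customer document could look like the following when it is built in Python and serialized to JSON. The field names and values are hypothetical. Elasticsearch only requires that the document is a JSON object.

```python
import json

# Hypothetical customer document; field names and values are illustrative only.
customer = {
    "name": "Alice",
    "email": "alice@example.com",
    "order_count": 3,
}

# Serialize the document to the JSON text that would form the body
# of an indexing request.
body = json.dumps(customer, sort_keys=True)
print(body)
```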
A field is the smallest unit that is contained in a document. A field is similar to a column in a table of a relational database.
A mapping defines how a document and the fields that the document contains are stored and indexed. For example, you can use mappings to define field names, field types, and the tokenizer that you want to use. A mapping is similar to the schema of a table in a relational database.
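The following sketch shows what a mapping might look like as the body of a create-index request. It assumes the typeless body format of open source Elasticsearch V7.X. The field names are hypothetical, and the tokenization behavior is selected through the `analyzer` parameter of a `text` field.

```python
import json

# Hypothetical mapping for a customer index (Elasticsearch V7.X body format,
# in which mappings have no type level). Field names are illustrative only.
create_index_body = {
    "mappings": {
        "properties": {
            # Full-text field that is analyzed with the built-in standard analyzer.
            "name": {"type": "text", "analyzer": "standard"},
            # Exact-value field that is not analyzed.
            "email": {"type": "keyword"},
            "signup_date": {"type": "date"},
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```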
An index can be divided into multiple shards. These shards can be distributed among different nodes to support distributed searches. Shards are classified into primary shards and replica shards. When you create an index, you specify the numbers of primary shards and replica shards for the index. If you omit them, default values are used. After the index is created, you cannot change the number of primary shards.
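One way to see why the number of primary shards is fixed is to look at how a document is routed to a shard. The sketch below uses a simplified form of the routing rule described in the open source Elasticsearch documentation: the shard is the hash of the routing value (by default, the document ID) modulo the number of primary shards. The `zlib.crc32` hash is a stand-in chosen for determinism, not the hash function that Elasticsearch actually uses, and the `pick_shard` helper and document IDs are hypothetical.

```python
import zlib

# Settings portion of a create-index request body (Elasticsearch setting names).
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}


def pick_shard(routing_key, num_primary_shards):
    """Simplified routing: hash(routing key) modulo the number of primary shards.

    crc32 is a stand-in hash used only for this illustration.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs.
doc_ids = [f"order-{i}" for i in range(1000)]

# Count how many documents would land on a different shard if the
# number of primary shards changed from 5 to 6.
moved = sum(pick_shard(d, 5) != pick_shard(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents would map to a different shard after the change, existing data could no longer be located by the routing rule without reindexing.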
A replica shard is a copy of a primary shard for an index. You can configure multiple replica shards for a primary shard. When a cluster receives a request for a write operation, the cluster performs the operation on the related primary shard. After the operation is complete, the cluster copies the data involved in the operation to the replica shards of the primary shard. You can query primary or replica shards for data. Replica shards can improve the high availability of a cluster and the concurrency performance of a cluster during searches. However, if a large number of replica shards are configured for an index in a cluster, the data synchronization load of the cluster increases during write operations.
|Shard type||Supported request type||Whether the number of shards can be changed||Remarks|
|Primary shard||Query and indexing requests||No. The number of primary shards is specified when the index is created and cannot be changed afterward.||Each document in an index belongs to one primary shard. The number of primary shards determines the maximum volume of data that an index can store. Notice: The more primary shards an index has, the higher the performance overhead of the Elasticsearch cluster.|
|Replica shard||Query requests||Yes. The number of replica shards can be changed at any time.||Replica shards are important to search performance. They improve the high availability of a cluster and its concurrency performance during searches.|
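To make the write-time cost of replica shards concrete, the following sketch counts the shard copies that a cluster maintains for one index. It also shows the body that would be sent to the update index settings API of open source Elasticsearch to change the number of replica shards. The helper name `total_shard_copies` and the example numbers are hypothetical.

```python
def total_shard_copies(primaries, replicas_per_primary):
    """Total shards stored for an index: each primary plus all of its replicas."""
    return primaries * (1 + replicas_per_primary)


# Body for a request such as PUT /<index>/_settings that raises the
# number of replica shards for an existing index to 2.
update_replicas_body = {"index": {"number_of_replicas": 2}}

# With 5 primary shards, going from 1 to 2 replicas per primary adds
# 5 more shard copies that every write must be synchronized to.
print(total_shard_copies(5, 1))
print(total_shard_copies(5, 2))
```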
A gateway is used to store snapshots of indexes. By default, a node stores all the indexes in its memory. When the node memory is full, the node stores the indexes in local disks. When an Elasticsearch cluster is rebooted, its indexes are restored from the snapshots that are stored on the gateway. Restoring indexes from snapshots is faster than reading indexes from local disks. Elasticsearch supports multiple types of gateways, including local file systems, distributed file systems, Hadoop Distributed File System (HDFS), and Alibaba Cloud Object Storage Service (OSS). By default, local file systems are used as gateways.
discovery.zen is an automatic node discovery mechanism. Elasticsearch is a peer-to-peer (P2P) system that sends broadcasts to discover nodes. Nodes communicate with each other by using multicast and P2P technologies.
Transport is the method that an Elasticsearch cluster or the nodes in the cluster use to communicate with clients. By default, TCP is used for the communication. You can integrate plug-ins into your Elasticsearch cluster to use other protocols, such as JSON over HTTP, Thrift, Servlet, Memcached, and ZeroMQ.