This topic introduces basic concepts and terms used in Alibaba Cloud Elasticsearch.
An Elasticsearch cluster consists of one or more nodes. The cluster provides indexing and search capabilities that span all of its nodes, and the nodes together store the data of the cluster. Each cluster is identified by a unique name. The default cluster name is elasticsearch. A node can join a cluster only if the node is configured with the name of that cluster.
Make sure that clusters in different environments use different names. Otherwise, a node may join the wrong cluster.
A node is a server that runs as part of an Elasticsearch cluster. Nodes store data and take part in the indexing and query activities of the cluster. Like a cluster, each node is identified by a name. By default, a random universally unique identifier (UUID) is assigned to a node as its name when the node starts. You can also assign a custom name to the node. Node names are important for management because they let you determine which server in your network corresponds to which node in the cluster.
A node joins a cluster by specifying the name of the cluster. By default, nodes join the cluster named elasticsearch. If you start several nodes on a network where they can discover each other, they automatically form a single cluster named elasticsearch.
A cluster can contain any number of nodes. If no other Elasticsearch nodes are running on your network, starting a node creates a single-node cluster named elasticsearch.
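The grouping behavior described above can be illustrated with a toy model. This is only a sketch, not how Elasticsearch discovery is implemented: the function form_clusters and the node and cluster names other than elasticsearch are hypothetical, and the model assumes that every node can reach every other node.

```python
from collections import defaultdict

DEFAULT_CLUSTER_NAME = "elasticsearch"


def form_clusters(nodes):
    """Group node names by the cluster name each node is configured with.

    `nodes` is a list of (node_name, cluster_name) pairs. A cluster_name of
    None means that the node keeps the default cluster name.
    """
    clusters = defaultdict(list)
    for node_name, cluster_name in nodes:
        clusters[cluster_name or DEFAULT_CLUSTER_NAME].append(node_name)
    return dict(clusters)


# Hypothetical nodes: two keep the default name, one uses a custom name.
nodes = [
    ("node-1", None),
    ("node-2", None),
    ("node-3", "es-staging"),
]
clusters = form_clusters(nodes)
print(clusters)
```

In this model, node-1 and node-2 end up in the same cluster only because they share a cluster name, which is why reusing one name across environments can attach a node to the wrong cluster.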
An index is a collection of documents that have similar characteristics. It is comparable to a database in a relational database system. For example, you can create three indexes to store customer data, commodity catalog data, and order data, respectively. An index is identified by a name, which must be in lowercase. You must specify this name when you index, search, update, or delete the documents in the index.
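As a minimal sketch of the naming rule, the following check covers only what is stated here: a name must be present and in lowercase. Open-source Elasticsearch enforces additional naming restrictions that this sketch does not model. The function is_valid_index_name and the three index names are hypothetical.

```python
def is_valid_index_name(name: str) -> bool:
    """Check only the rule stated in this topic: non-empty and lowercase."""
    return bool(name) and name == name.lower()


# Hypothetical names for the customer, commodity catalog, and order indexes.
index_names = ["customers", "commodity_catalog", "orders"]

for name in index_names + ["Orders"]:
    print(name, is_valid_index_name(name))
```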
A type is a logical category or partition of an index. It is comparable to a table in a relational database. Earlier versions of Elasticsearch allowed an index to store documents of different types, such as a user type and a blog type. Currently, you cannot create multiple types in an index, and the concept of types will be removed in later versions of Elasticsearch. For more information, see the open-source Elasticsearch documentation.
A document is the basic unit of information that can be indexed. It is comparable to a row in a table of a relational database. For example, you can create a document for a customer or a commodity. A document is expressed as a JSON object. An index can store any number of documents. To be stored and searched, a document must be indexed.
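The following sketch shows what a customer document might look like as a JSON object, together with the relational analogies used in this topic. The field names and values are hypothetical, and the code only serializes the document locally. It does not send anything to a cluster.

```python
import json

# Relational analogies drawn in this topic.
ANALOGY = {"index": "database", "type": "table", "document": "row"}

# A hypothetical customer document. The field names are illustrative only.
customer = {"name": "Jane Doe", "city": "Hangzhou"}

# A document travels as JSON text and parses back into the same object.
body = json.dumps(customer, sort_keys=True)
restored = json.loads(body)
print(body)
```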
An index can be divided into multiple shards. Dividing a large index into shards and distributing the shards among nodes lets the cluster store and search the index in a distributed manner. The number of primary shards for an index is specified when the index is created. After the index is created, this number cannot be changed.
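One way to see why the shard count is fixed is to look at how a document is mapped to a shard. The sketch below uses a simplified form of the routing rule described in the open-source Elasticsearch documentation, in which a hash of the routing value is taken modulo the number of primary shards. It makes two assumptions: zlib.crc32 stands in for the Murmur3 hash that Elasticsearch uses, and the function pick_shard and the document IDs are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number (simplified routing rule)."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is a stand-in hash, not the hash that Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]

# Count the documents that would land on a different shard if the
# number of primary shards changed from 5 to 6.
moved = sum(1 for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6))
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the shard count, changing the count would make existing documents unreachable at their computed location unless the data is reindexed.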
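A primary shard can have one or more replica shards, which are copies of the primary shard. Replica shards provide the following benefits:
- Improved fault tolerance: When a primary shard on a node is damaged or lost, the shard can be restored from its replica shards.
- Improved search efficiency: Elasticsearch automatically balances the query load among replica shards.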
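The two benefits above can be sketched with a small model. The settings number_of_shards and number_of_replicas are real index settings in open-source Elasticsearch, but everything else is an assumption for illustration: the function total_shard_copies is hypothetical, and plain round-robin selection stands in for whatever copy-selection strategy the cluster actually uses.

```python
import itertools

# Index settings body with 3 primary shards and 1 replica per primary.
index_body = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}


def total_shard_copies(body):
    """Return the number of shard copies: primaries plus all their replicas."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


print(total_shard_copies(index_body))

# Simplified load balancing: queries for one shard alternate between its
# copies. Round-robin is an assumption, not the actual selection algorithm.
copies = ["primary", "replica-1"]
chooser = itertools.cycle(copies)
served_by = [next(chooser) for _ in range(4)]
print(served_by)
```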
Data recovery, also called data redistribution, is the process of reallocating shards among nodes when a node joins or leaves a cluster, or when a node recovers from a failure. This process preserves the integrity of the data.
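The following toy model illustrates redistribution when a node leaves a cluster. It is a sketch under strong assumptions: the function allocate is hypothetical and simply spreads shards round-robin, whereas the real allocator weighs more factors and tries to avoid unnecessary shard movement.

```python
def allocate(shards, nodes):
    """Spread shards across nodes round-robin (toy model only)."""
    if not nodes:
        raise ValueError("at least one node is required")
    allocation = {node: [] for node in nodes}
    for i, shard in enumerate(shards):
        allocation[nodes[i % len(nodes)]].append(shard)
    return allocation


shards = [f"shard-{i}" for i in range(6)]

# Three nodes hold two shards each.
before = allocate(shards, ["node-1", "node-2", "node-3"])

# node-3 leaves, so the same six shards are spread over two nodes.
after = allocate(shards, ["node-1", "node-2"])

print(before)
print(after)
```

In both states every shard is assigned to exactly one node, which is the property that redistribution is meant to keep.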
A gateway stores snapshots of indexes. By default, an Elasticsearch node keeps all indexes in memory. When the node memory is full, the node writes the indexes to local disks. When an Elasticsearch cluster is restarted, its indexes are restored from the snapshots stored in the gateway. Restoring indexes from snapshots is faster than reading indexes from local disks. Elasticsearch supports multiple types of gateways, including the local file system (default), distributed file systems, Hadoop Distributed File System (HDFS), and Alibaba Cloud Object Storage Service (OSS).
discovery.zen is the automatic node discovery mechanism of Elasticsearch. Elasticsearch is a peer-to-peer (P2P) system in which nodes discover each other by broadcasting. Nodes communicate with each other by using multicast and P2P technologies.
Transport refers to the method that an Elasticsearch cluster, or the nodes in the cluster, use to communicate with clients. By default, TCP is used. You can integrate plug-ins into Elasticsearch to use other protocols, such as JSON over HTTP, Thrift, Servlet, memcached, and ZeroMQ.