The apack plug-in is developed by the Alibaba Cloud Elasticsearch team and provides the physical replication and vector retrieval features. This topic describes only the physical replication feature. Physical replication significantly reduces CPU overhead and improves write performance in scenarios such as logging and time series analysis, in which indexes have replica shards, large volumes of data are written, and data visibility is not latency-sensitive.
Prerequisites
- An Alibaba Cloud Elasticsearch cluster of V6.7.0 is created. The kernel version of the cluster is 1.2.0 or later. For more information, see Create an Alibaba Cloud Elasticsearch cluster.
- The apack plug-in is installed.
Only Alibaba Cloud Elasticsearch V6.7.0 clusters support this plug-in. If the kernel version of your cluster is earlier than 1.2.0, you must upgrade the kernel before you can use the plug-in. For more information, see Upgrade the version of a cluster. If the kernel version of your cluster is 1.2.0 or later, the plug-in is already installed on your cluster and cannot be removed. You can check whether the plug-in is installed on the Plug-ins page.
Note: After the apack plug-in is installed, you can use both the physical replication and vector retrieval features. For more information about the vector retrieval feature, see Use the aliyun-knn plug-in for vector searches.
Background information
Basic principle of the physical replication feature:
When the feature is disabled, the write path is the same as in open source Elasticsearch: the node that hosts a primary shard receives the write request and writes the data to the primary shard, then forwards the request to the nodes that host the replica shards, which write the same data to the replica shards. The data is therefore indexed on the primary shard and on every replica shard, and it is also written to the translog of each shard.
When the feature is enabled, the data is written only to the primary shard, the translog of the primary shard, and the translogs of its replica shards. Writing to the replica translogs preserves data reliability and consistency. Each time the primary shard is refreshed, the incremental index data is copied to the replica shards over the network. As a result, new data takes longer to become visible, but write performance improves significantly because the replica shards no longer index each document themselves.
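The difference between the two write paths can be illustrated with a simplified cost model. The Python sketch below is not the plug-in's implementation: the `write_cost` function and the `WriteCost` type are hypothetical, and the model only counts how many times documents are indexed into Lucene, appended to a translog, or shipped as new segments at refresh time, assuming one segment copy per replica per refresh.

```python
from dataclasses import dataclass


@dataclass
class WriteCost:
    """Work done for one primary shard and its replicas (simplified model)."""
    lucene_index_ops: int  # documents analyzed and indexed into Lucene (CPU-heavy)
    translog_appends: int  # translog entries written across all shard copies
    segment_copies: int    # refresh-time copies of new segments to replicas


def write_cost(docs: int, replicas: int, physical_replication: bool,
               refreshes: int = 1) -> WriteCost:
    copies = 1 + replicas  # the primary plus its replicas
    if physical_replication:
        # Only the primary indexes documents; every copy still gets the
        # translog entry, and new segments are copied to each replica per refresh.
        return WriteCost(
            lucene_index_ops=docs,
            translog_appends=docs * copies,
            segment_copies=refreshes * replicas,
        )
    # Document replication: the primary and every replica index each document.
    return WriteCost(
        lucene_index_ops=docs * copies,
        translog_appends=docs * copies,
        segment_copies=0,
    )


if __name__ == "__main__":
    for enabled in (False, True):
        print(enabled, write_cost(docs=10_000, replicas=1, physical_replication=enabled))
```

With one replica, the model shows the CPU-heavy indexing work halving while translog writes stay the same. Actual savings also depend on analysis cost, refresh frequency, and network bandwidth.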
- Test environment
  - Node configuration: five data nodes (each with 8 vCPUs and 32 GiB of memory) and one 2-TiB standard SSD
  - Dataset: the 74-GiB nyc_taxis track of Rally, the benchmarking tool provided by open source Elasticsearch
  - Index configuration: five primary shards, each with one replica shard (the default configuration)
- Test result (write speed, in documents/s)
  - Open source Elasticsearch 6.7.0: 127,305
  - Alibaba Cloud Elasticsearch V6.7.0 with the physical replication feature enabled: 184,592
- Test conclusion
  With the physical replication feature enabled, Alibaba Cloud Elasticsearch achieves about 45% higher write throughput than open source Elasticsearch (184,592 vs. 127,305 documents/s).
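The 45% figure follows directly from the two measured throughput values. A minimal Python check, in which the constant and function names are illustrative:

```python
# Write throughput from the benchmark results, in documents per second.
OPEN_SOURCE_DOCS_PER_SEC = 127_305
PHYSICAL_REPLICATION_DOCS_PER_SEC = 184_592


def improvement_percent(baseline: float, candidate: float) -> float:
    """Relative throughput gain of `candidate` over `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100


gain = improvement_percent(OPEN_SOURCE_DOCS_PER_SEC, PHYSICAL_REPLICATION_DOCS_PER_SEC)
print(f"{gain:.1f}%")  # prints 45.0%
```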
Precautions
- The physical replication feature of the apack plug-in is configured per index. By default, the feature is disabled for indexes created before the plug-in was installed and enabled for indexes created after it was installed. To use the feature on an index created before the plug-in was installed, you must first enable it for that index.
- You can disable the physical replication feature for an index, but you must close the index first.
- Before you enable the physical replication feature for an index, close the index and set its number of replica shards to 0.
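The precautions for an existing index translate into a sequence of REST calls. The following Python sketch only builds that sequence and does not send it. The helper name `enable_physical_replication_plan` is hypothetical, and two details are assumptions rather than documented steps: replicas are dropped before the index is closed, and a replica count is restored after the index is reopened. The `_settings`, `_close`, and `_open` endpoints are standard Elasticsearch APIs, and `index.replication.type` is the setting used in the example in this topic.

```python
import json
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


def enable_physical_replication_plan(index: str, replicas_after: int = 1) -> List[Request]:
    """Hypothetical helper: ordered REST calls to enable physical replication
    on an existing index. Step order and the final replica restore are assumed."""
    return [
        # Set replicas to 0 while the index is still open (dynamic setting).
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": 0}}),
        # Close the index before changing the replication type.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index.replication.type": "segment"}),
        ("POST", f"/{index}/_open", None),
        # Assumed step: add replicas back once the index is open again.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas_after}}),
    ]


if __name__ == "__main__":
    for method, path, body in enable_physical_replication_plan("index-1"):
        print(method, path, json.dumps(body) if body is not None else "")
```

Building the plan as data keeps the ordering easy to review before it is sent with any HTTP client.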
Enable the physical replication feature for a new index
```
PUT index-1
{
  "settings": {
    "index.replication.type": "segment"
  }
}
```
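The same create-index request can be sent from application code. The Python sketch below uses only the standard library to build the request. The endpoint `http://localhost:9200`, the `elastic` user, the password, and the function name `build_create_index_request` are placeholders to replace with your own cluster's values, and HTTP basic authentication is assumed.

```python
import base64
import json
import urllib.request


def build_create_index_request(endpoint: str, index: str,
                               user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates `index`
    with physical replication enabled."""
    body = json.dumps({"settings": {"index.replication.type": "segment"}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # assumed basic authentication
        },
    )


if __name__ == "__main__":
    req = build_create_index_request("http://localhost:9200", "index-1", "elastic", "your-password")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen()` sends the request; adjust the headers if your cluster uses a different authentication method.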