Elasticsearch: Use the aliyun-codec plug-in

Last Updated: Mar 26, 2026

Log and time-series workloads accumulate large indexes that drive up storage costs. The aliyun-codec plug-in addresses this by compressing indexes at the Lucene storage layer — row-oriented (_source), column-oriented (doc_values), and inverted (postings) data — without affecting write throughput. The plug-in also provides source_reuse_doc_values, which eliminates redundant _source storage by reconstructing field values from doc_values at read time.

Performance benchmarks

The following results are based on a test index containing 1.2 TiB of cluster log data across 22 primary shards.

Index compression (zstd algorithm, all three storage layers enabled; compared to a cluster where the aliyun-codec plug-in is used but index compression is not enabled):

| Metric | Result |
| --- | --- |
| Write throughput | No change |
| Index size | 40% smaller |
| I/O-intensive query latency | 50% lower |

source_reuse_doc_values (compared to the same cluster without the feature enabled):

| Metric | Result |
| --- | --- |
| Write throughput | No change |
| Index size | Up to 40% smaller (depends on the proportion of applicable fields) |
| I/O-intensive query latency | Varies by proportion of applicable fields and node disk type |

Prerequisites

Before you begin, make sure you have:

  • An Alibaba Cloud Elasticsearch V7.10.0 cluster. See Create an Alibaba Cloud Elasticsearch cluster.

  • The required kernel version for the features you plan to use (to upgrade the kernel version, see Upgrade the version of a cluster):

    • V1.5.0 or later: index compression only

    • V1.6.0 or later: both index compression and source_reuse_doc_values

  • The aliyun-codec plug-in installed on your cluster. The plug-in is installed by default on Elasticsearch V7.10.0 clusters. To verify or install it, go to the Plug-ins page in the Elasticsearch console. See Install and remove a built-in plug-in.
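
You can also confirm the installation from the Kibana console itself by listing installed plug-ins with the _cat API; the output should include an aliyun-codec entry for each node.

```
GET _cat/plugins?v
```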

Limitations

  • Index compression requires Elasticsearch V7.10.0 with kernel V1.5.0 or later. For Elasticsearch V6.7.0 clusters, use the codec-compression plug-in instead. See Use the codec-compression plug-in of the beta version.

  • source_reuse_doc_values requires kernel V1.6.0 or later. On clusters that meet this requirement, index compression is enabled by default in aliyun_default_index_template (index.codec is set to ali).

  • source_reuse_doc_values can only be enabled at index creation time and cannot be disabled after it is enabled.

Enable index compression

The following steps use an Elasticsearch V7.10.0 cluster. Steps may differ for other versions.
  1. Log on to the Kibana console of your cluster. See Log on to the Kibana console.

  2. In the upper-right corner, click Dev tools.

  3. On the Console tab, run a command to enable index compression on your index. The following command enables compression on an existing index named test. By default, the plug-in applies the zstd algorithm to all three storage layers.

    PUT test/_settings
    {
      "index.codec": "ali"
    }

    To use different algorithms for specific storage layers, specify them individually. Set a parameter to "" to disable compression for that storage layer. The following example uses zstd for _source and doc_values, and leaves postings uncompressed.

    PUT test/_settings
    {
      "index.codec": "ali",
      "index.doc_value.compression.default": "zstd",
      "index.postings.compression": "",
      "index.source.compression": "zstd"
    }
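
To confirm which codec settings took effect, you can read them back from the same test index; the response includes the index.codec value and any per-layer compression settings you configured.

```
GET test/_settings
```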

Index compression parameters

| Parameter | Values | Description |
| --- | --- | --- |
| index.codec | "ali" | Enables the aliyun-codec plug-in for the index. |
| index.doc_value.compression.default | lz4, zstd | Compression algorithm for doc_values (column-oriented data). Applies only to fields of the number, date, keyword, and ip types. |
| index.postings.compression | zstd, "" | Compression algorithm for postings (inverted data). Set to "" to disable. |
| index.source.compression | zstd, zstd_1024, zstd_dict, best_compression, default | Compression algorithm for _source (row-oriented data). See the table below. |
| index.postings.pfor.enabled | true, false | Optimizes encoding for postings. Reduces storage for keyword, match_only_text, and text fields by 14.4% and overall disk size by 3.5%. Backported from open source Elasticsearch 8.0; also available on earlier Alibaba Cloud Elasticsearch kernel versions. |

`index.source.compression` options:

| Value | Block size | Notes |
| --- | --- | --- |
| zstd | 128 KB | Standard zstd compression. |
| zstd_1024 | 1,024 KB | zstd with a larger block size. |
| zstd_dict | N/A | zstd with dictionary-based compression. Higher compression ratio, but lower read and write performance than zstd. |
| best_compression | N/A | The best_compression codec from open source Elasticsearch. |
| default | N/A | The default codec from open source Elasticsearch. |

Enable source_reuse_doc_values

Open source Elasticsearch stores multiple copies of field data: in _source, postings, and doc_values. source_reuse_doc_values reduces index size by pruning the JSON data stored in _source — for applicable fields, Elasticsearch reconstructs source data from doc_values at read time instead of maintaining a duplicate copy.
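
As an illustration, consider a hypothetical index named logs-demo with the mapping below. Field types such as keyword, date, and numeric types store their values in doc_values by default, so they are the kind of fields the feature can reconstruct at read time; full-text text fields do not have doc_values and keep their values in _source. (The index and field names here are examples, not part of the plug-in.)

```
PUT logs-demo
{
  "mappings": {
    "properties": {
      "status":    { "type": "keyword" },
      "timestamp": { "type": "date" },
      "latency":   { "type": "long" },
      "message":   { "type": "text" }
    }
  }
}
```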

Important

source_reuse_doc_values can only be enabled at index creation time and cannot be disabled after it is enabled.

Enable at index creation

Run the following command when creating your index:

PUT test
{
  "settings": {
    "index": {
      "ali_codec_service": {
        "source_reuse_doc_values": {
          "enabled": true
        }
      }
    }
  }
}
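
Index compression and source_reuse_doc_values are independent settings, so they can be combined in a single create request. The following is a sketch, assuming kernel V1.6.0 or later; the index name logs-demo is an example.

```
PUT logs-demo
{
  "settings": {
    "index": {
      "codec": "ali",
      "ali_codec_service": {
        "source_reuse_doc_values": {
          "enabled": true
        }
      }
    }
  }
}
```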

Configure source_reuse_doc_values

After enabling the feature, adjust the following settings based on your workload.

Set the maximum number of applicable fields

If the number of fields on which source_reuse_doc_values takes effect exceeds the threshold, Elasticsearch either reports an error or disables the feature (controlled by strict_max_fields). The default threshold is 50.

PUT _cluster/settings
{
  "persistent": {
    "apack.ali_codec_service.source_reuse_doc_values.max_fields": 100
  }
}
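
To verify that the setting was applied, read the cluster settings back; the flat_settings parameter returns them as dotted keys matching the names used above.

```
GET _cluster/settings?flat_settings=true
```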

Control behavior when the threshold is exceeded

PUT _cluster/settings
{
  "persistent": {
    "apack.ali_codec_service.source_reuse_doc_values.strict_max_fields": true
  }
}
  • true: Elasticsearch reports an error if the number of applicable fields exceeds the threshold.

  • false: Elasticsearch silently disables source_reuse_doc_values if the threshold is exceeded.

Adjust concurrent read threads per index

When reading a document, the system uses concurrent threads to fetch field values from doc_values and merge them. The default is 5 threads. Adjust this value to tune fetch latency.

PUT test/_settings
{
  "index": {
    "ali_codec_service": {
      "source_reuse_doc_values": {
        "fetch_slice": 2
      }
    }
  }
}

Adjust the thread pool and queue size (YAML configuration)

The thread pool size defaults to the total number of vCPUs on data nodes in the cluster. The queue size defaults to 1,000. These settings can only be changed in the cluster's YAML configuration file. See Configure the YML file.

Add the following to your YAML configuration file:

apack.doc_values_fetch:
    size: 8
    queue_size: 1000

Parameter reference

The following table summarizes all parameters for both features.

| Parameter | Default | Scope | Description |
| --- | --- | --- | --- |
| index.codec | | Index | Set to "ali" to enable the plug-in. |
| index.doc_value.compression.default | | Index | Compression algorithm for doc_values. |
| index.postings.compression | | Index | Compression algorithm for postings. |
| index.source.compression | | Index | Compression algorithm for _source. |
| index.postings.pfor.enabled | | Index | Enables optimized encoding for postings. |
| apack.ali_codec_service.source_reuse_doc_values.max_fields | 50 | Cluster | Maximum number of fields on which source_reuse_doc_values takes effect. |
| apack.ali_codec_service.source_reuse_doc_values.strict_max_fields | | Cluster | Behavior when the field limit is exceeded: true = error, false = silent disable. |
| index.ali_codec_service.source_reuse_doc_values.fetch_slice | 5 | Index | Number of concurrent threads for reading field values. |
| apack.doc_values_fetch.size | Total vCPUs of data nodes | Cluster (YAML only) | Thread pool size for doc_values reads. |
| apack.doc_values_fetch.queue_size | 1,000 | Cluster (YAML only) | Queue size for doc_values reads. |