Elasticsearch on Alibaba Cloud: Index and Search Architecture

This article examines index lifecycle, shard allocation, and search request processing on Alibaba Cloud Elasticsearch and the design decisions that determine cluster performance at scale.

Search and log analytics workloads behave differently from transactional ones. Data accumulates continuously, query patterns shift from recent to historical records as data ages, and storage cost grows linearly while query value rarely does. An architecture that performs well on a one-month index often degrades at twelve months, not because the engine is slower, but because shard counts, segment sizes, and resource allocation were not designed for the volume the index eventually carries.

Alibaba Cloud Elasticsearch is a managed service on the open-source engine, integrated with the platform's storage, networking, and observability stack. The managed layer handles node provisioning, version patching, and snapshot scheduling; index design, shard sizing, lifecycle policy, and query construction remain with the engineer. Four areas determine cluster behaviour under sustained load: cluster topology, index lifecycle management, shard allocation, and search request processing.

Figure 1: Alibaba Cloud Elasticsearch architecture across hot, warm, and cold tiers.

Cluster Topology and Node Roles

A cluster is composed of nodes assigned specific roles (master, data, coordinating, and Kibana), each scaled independently. Master nodes maintain cluster state: index metadata, shard placement, and node membership. Production clusters run three dedicated master-eligible nodes so that quorum survives the failure of any single node. Co-locating master responsibilities on data nodes is supported but creates availability risk under sustained query load, because cluster-state updates then compete with shard work for the same CPU and heap.
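The reasoning behind three master-eligible nodes is simple majority arithmetic. The sketch below is illustrative only; the function names are hypothetical and not part of any Elasticsearch API.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can fail while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible: quorum={quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Two master-eligible nodes tolerate no more failures than one, which is why three is the smallest count that survives the loss of a single node.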

Data nodes hold shards and execute reads and writes against them. Hot, warm, and cold tiers differ by storage media and CPU-to-storage ratio. Hot nodes use SSDs tuned for high write throughput and low-latency response. Warm nodes use higher-capacity SSDs with reduced CPU, serving older indices where query frequency falls but retention is still required. Cold nodes use object-storage-backed retrieval, suited to compliance archives where queries are infrequent and latency tolerance is measured in seconds.

Coordinating nodes are optional in small clusters but become important once query concurrency exceeds the capacity of data nodes to handle both shard execution and result aggregation. A coordinating node receives the request, fans out sub-queries to the relevant shards, merges partial results, and returns the response. Isolating this role prevents aggregation overhead from competing with shard execution for the same CPU and heap.

Index Lifecycle Management

Index Lifecycle Management (ILM) automates transitions between phases based on age, size, or document count. A typical lifecycle for log or telemetry data uses four phases: hot, warm, cold, and delete. The hot phase accepts writes against a rollover alias; rollover triggers at a configured age (commonly one day) or size threshold (commonly 50 GB), at which point a new index is created, and writes are redirected automatically.
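The rollover decision can be sketched as a check that fires when any configured threshold is met. This is a simplified model rather than the ILM implementation; `RolloverPolicy` and `should_roll_over` are hypothetical names, and the defaults mirror the one-day and 50 GB figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloverPolicy:
    max_age_hours: float = 24.0  # "commonly one day"
    max_size_gb: float = 50.0    # "commonly 50 GB"


def should_roll_over(age_hours: float, size_gb: float,
                     policy: RolloverPolicy = RolloverPolicy()) -> bool:
    """Roll over as soon as either the age or the size threshold is reached."""
    return age_hours >= policy.max_age_hours or size_gb >= policy.max_size_gb


if __name__ == "__main__":
    print(should_roll_over(age_hours=6, size_gb=12))   # neither threshold met
    print(should_roll_over(age_hours=6, size_gb=50))   # size threshold met
    print(should_roll_over(age_hours=24, size_gb=3))   # age threshold met
```

Because the thresholds combine with OR, a quiet day still produces a new index on schedule and a traffic spike produces one early, which keeps index sizes bounded in both cases.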

On exit from the hot phase, the warm phase relocates the index to warm-tier nodes via allocation filtering, runs a forcemerge to consolidate segments, and optionally drops replicas from one to zero where high-availability reads are no longer required. The cold phase moves shards to cold-tier nodes and can apply a searchable snapshot conversion, where the primary copy resides on object storage and only metadata stays on the node. The delete phase removes the index once retention expires.
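The four phases can be expressed as the body of an ILM policy request (`PUT _ilm/policy/<name>`), written here as a Python dictionary. This is a sketch under stated assumptions: the node attribute `box_type`, the repository name `oss-archive`, and the 30-day and 90-day ages are illustrative values, not taken from the article or from Alibaba Cloud defaults.

```python
import json

ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over at one day or 50 GB, whichever comes first.
                    "rollover": {"max_age": "1d", "max_size": "50gb"}
                }
            },
            "warm": {
                "min_age": "1d",
                "actions": {
                    # Allocation filtering; "box_type" is an assumed node attribute.
                    "allocate": {
                        "number_of_replicas": 0,
                        "require": {"box_type": "warm"},
                    },
                    "forcemerge": {"max_num_segments": 1},
                },
            },
            "cold": {
                "min_age": "30d",  # assumed retention boundary
                "actions": {
                    # "oss-archive" is a hypothetical snapshot repository name.
                    "searchable_snapshot": {"snapshot_repository": "oss-archive"}
                },
            },
            "delete": {
                "min_age": "90d",  # assumed retention period
                "actions": {"delete": {}},
            },
        }
    }
}


def phase_order(policy: dict) -> list:
    """Return the configured phases in the order ILM walks through them."""
    canonical = ["hot", "warm", "cold", "delete"]
    return [p for p in canonical if p in policy["policy"]["phases"]]


if __name__ == "__main__":
    print(phase_order(ilm_policy))
    print(json.dumps(ilm_policy, indent=2))
```

Dropping replicas to zero in the warm phase is only reasonable where snapshots cover recovery; keep `number_of_replicas` at 1 otherwise.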

ILM policies attach to an index template, so every new index inherits the lifecycle automatically. Policy edits apply to all governed indices without per-index reconfiguration, replacing the cron-driven curator scripts earlier deployments relied on for retention.
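Attaching a policy through a template might look like the following request bodies, again as Python dictionaries. The pattern `logs-*`, the policy name `logs-policy`, the alias `logs-write`, and the shard count are hypothetical; `index.lifecycle.name` and `index.lifecycle.rollover_alias` are the index settings that bind an index to a policy and its rollover alias.

```python
# Body for PUT _index_template/<name> (composable index template).
index_template = {
    "index_patterns": ["logs-*"],          # hypothetical pattern
    "template": {
        "settings": {
            "number_of_shards": 5,         # assumed; see shard sizing
            "number_of_replicas": 1,
            "index.lifecycle.name": "logs-policy",          # hypothetical policy
            "index.lifecycle.rollover_alias": "logs-write", # hypothetical alias
        }
    },
}

# Body for PUT logs-000001: the first index behind the write alias.
bootstrap_index = {
    "aliases": {"logs-write": {"is_write_index": True}}
}


def governed_by(template: dict) -> str:
    """Name of the ILM policy that indices created from this template inherit."""
    return template["template"]["settings"]["index.lifecycle.name"]


if __name__ == "__main__":
    print(governed_by(index_template))
```

Every index that rollover creates matches the pattern, so it picks up the same settings without any per-index configuration.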

Shard Allocation and Sizing

Shard sizing most directly determines cluster performance and is the decision most often made incorrectly. Each shard is a self-contained Lucene index with its own segments, query thread, and heap overhead. Too few large shards limit parallelism on a single query; too many small shards exhaust the heap on metadata before storage capacity is reached.

The operating range for log and search workloads is 10 GB to 50 GB per shard. Below 10 GB, metadata cost outweighs data held; above 50 GB, recovery after node failure takes longer and forcemerge slows. Primary shard count is fixed at index creation and cannot be changed in place: altering it means reindexing or using the split or shrink APIs, each of which writes a new index. That makes it the most consequential decision in the template. For an index expected to hold 200 GB over its hot phase, five to eight primary shards keep each shard at 25 GB to 40 GB, inside the range, while allowing parallel writes across nodes.
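The sizing arithmetic can be checked with a few lines. This is a minimal sketch; `primary_shard_range` is a hypothetical helper, and the 25 GB to 40 GB target band is simply what a five-to-eight shard count implies for a 200 GB index.

```python
import math


def primary_shard_range(index_gb: float, min_shard_gb: float,
                        max_shard_gb: float) -> tuple:
    """Fewest and most primaries that keep every shard within the size band."""
    fewest = max(1, math.ceil(index_gb / max_shard_gb))
    most = max(fewest, math.floor(index_gb / min_shard_gb))
    return fewest, most


if __name__ == "__main__":
    # Full operating range of 10 GB to 50 GB per shard.
    print(primary_shard_range(200, 10, 50))
    # Narrower 25 GB to 40 GB band.
    print(primary_shard_range(200, 25, 40))
```

The full operating range admits anywhere from 4 to 20 primaries for 200 GB; aiming at the middle of the range leaves headroom if the index grows past its forecast.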

Replica count is mutable per phase. Hot indices typically run with one replica for write-time availability. Warm indices with declining query load can drop to zero where the snapshot policy provides recovery assurance. Allocation awareness, which assigns shards to nodes by zone or rack metadata, distributes the primary and replica copies of each shard across failure domains so that a zone failure never removes every copy of a shard. Cluster zones map to availability zones in the deployed region; allocation awareness should use the zone attribute on every production cluster.
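The property that allocation awareness protects can be stated as a check: after any one zone fails, every shard still has at least one copy. The sketch below models that check on a toy layout; the node names, zone names, and `shards_lost_if_zone_fails` are hypothetical. In a real cluster the behaviour is enabled with the `cluster.routing.allocation.awareness.attributes` setting.

```python
from collections import defaultdict


def shards_lost_if_zone_fails(copies, node_zone, failed_zone):
    """Return shard ids with no surviving copy once `failed_zone` is down.

    copies: iterable of (shard_id, node_name) pairs, one per primary or replica.
    node_zone: mapping of node_name -> zone attribute.
    """
    surviving = defaultdict(int)
    all_shards = set()
    for shard_id, node in copies:
        all_shards.add(shard_id)
        if node_zone[node] != failed_zone:
            surviving[shard_id] += 1
    return sorted(s for s in all_shards if surviving[s] == 0)


if __name__ == "__main__":
    node_zone = {"n1": "zone-a", "n2": "zone-a", "n3": "zone-b"}
    # Shard 0 has both copies in zone-a; shard 1 spans both zones.
    unaware = [(0, "n1"), (0, "n2"), (1, "n2"), (1, "n3")]
    aware = [(0, "n1"), (0, "n3"), (1, "n2"), (1, "n3")]
    print(shards_lost_if_zone_fails(unaware, node_zone, "zone-a"))
    print(shards_lost_if_zone_fails(aware, node_zone, "zone-a"))
```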

Search Request Processing

A search request passes through two phases: query and fetch. In the query phase, the coordinating node forwards the request to one copy of each shard in the target index pattern. Each shard executes against its local segments and returns document identifiers and sort values. The coordinating node merges the partial results, applies global sorting and pagination, and determines which documents form the final set.

In the fetch phase, the coordinating node requests the full source documents from the shards holding them, assembles the response, and returns it. Query latency is dominated by the slowest shard in the query phase. One overloaded shard delays the entire response, making balanced allocation essential for a predictable response time.
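The two phases can be modelled as a small scatter-gather routine. This is a toy simulation under stated assumptions, not Elasticsearch code: shards are plain dictionaries, relevance scores are precomputed, and all function names are hypothetical.

```python
import heapq


def query_phase(shards, size):
    """Each shard returns its top `size` (score, doc_id); the coordinator merges."""
    partials = []
    for shard_id, docs in shards.items():
        top = heapq.nlargest(
            size, ((score, doc_id) for doc_id, (score, _src) in docs.items())
        )
        partials.extend((score, shard_id, doc_id) for score, doc_id in top)
    # Global sort and pagination happen on identifiers and sort values only.
    return heapq.nlargest(size, partials)


def fetch_phase(shards, hits):
    """Retrieve full source documents only for the globally selected hits."""
    return [shards[shard_id][doc_id][1] for _score, shard_id, doc_id in hits]


def query_latency_ms(shard_latencies_ms):
    """The coordinator waits for every shard, so the slowest one sets the pace."""
    return max(shard_latencies_ms)


if __name__ == "__main__":
    shards = {
        0: {"a": (1.2, {"msg": "a"}), "b": (0.4, {"msg": "b"})},
        1: {"c": (2.0, {"msg": "c"}), "d": (0.9, {"msg": "d"})},
    }
    hits = query_phase(shards, size=2)
    print(hits)
    print(fetch_phase(shards, hits))
    print(query_latency_ms([12, 15, 240]))
```

Only identifiers and sort values cross the network in the first phase, which is why deep pagination is costly: every shard must still return `from + size` candidates for the coordinator to merge.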

Filter context clauses inside a bool query's filter array are cacheable in the node query cache and skip relevance scoring. Moving date ranges, keyword term filters, and exists checks into the filter context substantially reduces CPU cost on repeated queries. Analysed text queries and script-based scoring cannot be cached and should run after filters reduce the candidate set. Aggregation memory scales with field cardinality; high-cardinality terms aggregations should use a composite aggregation pattern to paginate results rather than materialise the full bucket set in one response.
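Both recommendations can be illustrated as request bodies built in Python. The field names (`message`, `service`, `@timestamp`, `trace_id`) and the helper names are hypothetical; the `bool`/`filter` and `composite` structures follow the standard Elasticsearch query DSL.

```python
def filtered_search(text, service, since):
    """Scored text match in `must`; cacheable, unscored clauses in `filter`."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": since}}},
                    {"exists": {"field": "trace_id"}},
                ],
            }
        }
    }


def composite_page(field, page_size, after_key=None):
    """One page of a composite aggregation over a high-cardinality field.

    Pass the `after_key` from the previous response to request the next page.
    """
    composite = {
        "size": page_size,
        "sources": [{"by_value": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        composite["after"] = after_key
    return {"size": 0, "aggs": {"pages": {"composite": composite}}}


if __name__ == "__main__":
    print(filtered_search("timeout", "checkout", "now-1h"))
    print(composite_page("user_id", 1000))
    print(composite_page("user_id", 1000, after_key={"by_value": "u-04217"}))
```

Setting the top-level `size` to 0 in the aggregation request skips the fetch phase for hits entirely, since only the buckets are needed.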

Operational Considerations

Three factors determine reliable cluster performance under production load.

JVM heap sizing: Heap should be no more than 50% of available memory and must not exceed 31 GB, the compressed object pointer threshold. Sustained pressure above 75% signals undersized nodes or excessive shards per node, both of which degrade garbage collection and inflate query latency variance.
Snapshot and restore: Snapshots integrate with Object Storage Service. A daily snapshot of all production indices provides recovery independent of cluster state. Duration scales with new segments since the previous snapshot; scheduling during off-peak windows reduces I/O contention with active writes.
Access control and network isolation: Restrict cluster access to the deployed VPC and disable public endpoints unless an external integration requires them. RAM policies should scope cluster-level operations separately from index-level read and write permissions; superuser credentials are for administration, not application configuration.
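The heap guidance in the first item reduces to two small calculations. The sketch below encodes the 50%, 31 GB, and 75% figures stated above; the function names are hypothetical.

```python
def recommended_heap_gb(node_ram_gb: float) -> float:
    """Half of node memory, capped at the 31 GB compressed-pointer threshold."""
    return min(node_ram_gb * 0.5, 31.0)


def heap_under_pressure(used_gb: float, heap_gb: float,
                        threshold: float = 0.75) -> bool:
    """True when heap usage exceeds the fraction treated as sustained pressure."""
    return used_gb / heap_gb > threshold


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(f"{ram} GB node -> {recommended_heap_gb(ram)} GB heap")
    print(heap_under_pressure(used_gb=25, heap_gb=31))
```

Beyond 64 GB of node memory the heap stays at 31 GB; the remainder is not wasted, because the operating system uses it as file-system cache for Lucene segments.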

Conclusion

Index management and search architecture on Alibaba Cloud Elasticsearch are governed by the same trade-offs as any Elasticsearch deployment, including shard sizing, lifecycle policy, replica count, and query construction. The managed service replaces node provisioning, patching, and snapshot infrastructure with declarative configuration, but the decisions that determine production behaviour remain with the engineer.

Three extension patterns are worth evaluating against the workload. Searchable snapshots reduce cold-tier storage cost where latency tolerance permits, by holding the primary copy on object storage and serving queries from snapshot-backed indices. Cross-cluster search consolidates access across regionally distributed clusters without physical replication, suiting deployments where data residency prevents centralisation. Where aggregation cost on high-cardinality time-series data becomes the bottleneck, a downsampled rollup index or a columnar engine alongside Elasticsearch (text search on one, aggregation on the other) separates workloads onto formats appropriate to each.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
