E-MapReduce:Cache

Last updated: Mar 24, 2026

This topic describes the cache management architecture of StarRocks and the scenarios each cache type is suited to, to help you choose the right caching solution for your business needs.

Features

StarRocks provides multiple caching mechanisms that improve query performance by caching hot data in the memory or on the local disks of BE and CN nodes. This reduces repeated reads from remote storage such as HDFS and object storage.
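The effect described above can be illustrated with a minimal read-through cache. This is a conceptual Python sketch, not StarRocks code: `RemoteStorage` is a hypothetical stand-in for HDFS or object storage, `LocalCache` is a hypothetical LRU cache standing in for BE/CN memory or disk, and capacity is counted in blocks rather than bytes.

```python
from collections import OrderedDict


class RemoteStorage:
    """Hypothetical stand-in for HDFS or object storage; counts every read."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.blocks[key]


class LocalCache:
    """Hypothetical read-through LRU cache in front of remote storage.

    Capacity is measured in blocks for simplicity; a real cache
    budgets memory and disk space in bytes.
    """

    def __init__(self, remote, capacity):
        self.remote = remote
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Hit: serve locally and mark the block as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Miss: fetch from remote storage and keep a local copy.
        self.misses += 1
        value = self.remote.read(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used block.
            self.entries.popitem(last=False)
        return value


remote = RemoteStorage({f"block-{i}": f"data-{i}" for i in range(4)})
cache = LocalCache(remote, capacity=2)

# Query the same hot blocks repeatedly: only the first access goes remote.
for _ in range(3):
    cache.get("block-0")
    cache.get("block-1")

print(remote.reads, cache.hits, cache.misses)  # 2 4 2
```

Six lookups against two hot blocks cost only two remote reads; the other four are served locally, which is the kind of saving the caches in this topic aim for.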

Cache types

| Cache type | Use cases | Default state | Available since |
|---|---|---|---|
| shared-data Data Cache | Accelerates queries on internal tables in shared-data (serverless) instances. | Enabled by default | v3.1.7 / v3.2.3 |
| data lake Data Cache | Accelerates queries on external tables from an External Catalog (such as Hive, Iceberg, and Hudi). | Enabled by default since v3.3.0 | v2.5 |
| Index Cache | Caches indexes for shared-data instances, ideal for scenarios where disk capacity is insufficient to cache the full dataset. | Enabled by default | v3.3.13 |

Note

Since v3.4.0, queries on internal tables in shared-data instances and queries on data lakes share the same Data Cache instance, eliminating the need for separate configurations.
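To make the version rules in the cache type table and the note easier to apply, they can be encoded as data. The following is a minimal Python sketch; the function names are hypothetical, and two readings of the table are assumptions: that "v3.1.7 / v3.2.3" means 3.1.x from 3.1.7 plus every release from 3.2.3 onward, and that a feature listed for a patch release (such as v3.3.13) is present in all later releases. Version strings are assumed to be plain numeric, for example `v3.3.0`.

```python
# Version rules transcribed from the cache type table and the note above.
# Versions are (major, minor, patch) tuples.

# Assumption: for shared-data Data Cache, "v3.1.7 / v3.2.3" means 3.1.x from
# 3.1.7 and every release from 3.2.3 onward. Earlier entries in each list
# apply only within their own major.minor line; the last entry is a global floor.
AVAILABLE_SINCE = {
    "shared-data Data Cache": [(3, 1, 7), (3, 2, 3)],
    "data lake Data Cache": [(2, 5, 0)],
    # Assumption: every release at or after 3.3.13 includes the Index Cache.
    "Index Cache": [(3, 3, 13)],
}

# None means "enabled by default as soon as the feature is available".
DEFAULT_ON_SINCE = {
    "shared-data Data Cache": None,
    "data lake Data Cache": (3, 3, 0),
    "Index Cache": None,
}

# From the note: internal-table and data lake queries share one Data Cache.
UNIFIED_DATA_CACHE_SINCE = (3, 4, 0)


def parse_version(text):
    """Turn 'v3.3.0' or '3.3' into a (major, minor, patch) tuple."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def is_available(cache_type, version):
    v = parse_version(version)
    *line_floors, global_floor = AVAILABLE_SINCE[cache_type]
    # A release-line floor only applies within the same major.minor line.
    if any(v[:2] == floor[:2] and v >= floor for floor in line_floors):
        return True
    return v >= global_floor


def enabled_by_default(cache_type, version):
    if not is_available(cache_type, version):
        return False
    since = DEFAULT_ON_SINCE[cache_type]
    return since is None or parse_version(version) >= since


def uses_unified_data_cache(version):
    return parse_version(version) >= UNIFIED_DATA_CACHE_SINCE


print(enabled_by_default("data lake Data Cache", "v3.2.0"))  # False
print(enabled_by_default("data lake Data Cache", "v3.3.0"))  # True
print(is_available("shared-data Data Cache", "v3.2.2"))      # False
print(uses_unified_data_cache("v3.4.0"))                     # True
```

Keeping the rules in dictionaries rather than in branching code makes it simple to update the sketch if the table changes in a later release.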

Recommendations

  • Shared-data instance: Use the shared-data Data Cache. It automatically loads data on demand from remote storage into the local cache and requires no extra configuration.

  • Data lake external tables: Use the data lake Data Cache. It supports caching remote files in formats like Parquet and ORC, making it ideal for scenarios that involve repeated scans of large tables, such as ad-hoc analytics and report queries.

  • Insufficient disk capacity for the full dataset: Enable the Index Cache. It caches only indexes, significantly improving query performance with low disk overhead.

  • Preloading hot data: Use Data Cache preheating (CACHE SELECT) to load specific data into the cache in advance and avoid the performance impact of a cold start.
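The recommendations above can be condensed into a small decision helper, together with a function that composes a preheating statement. This is a hypothetical Python sketch: `recommend_caches`, `build_cache_select`, and their parameters are illustrative and not part of StarRocks. The generated string assumes the general `CACHE SELECT <columns> FROM [<catalog>.][<db>.]<table> [WHERE <predicate>]` shape from the StarRocks Data Cache warm-up documentation; check the exact syntax supported by your instance version before running it.

```python
def recommend_caches(shared_data, queries_external_tables,
                     disk_fits_full_dataset, needs_warm_start):
    """Map the scenarios in the recommendations list to cache features.

    Hypothetical helper: the boolean inputs describe the workload.
    """
    picks = []
    if shared_data:
        picks.append("shared-data Data Cache")
        # The Index Cache targets shared-data instances whose disks
        # cannot hold the full dataset.
        if not disk_fits_full_dataset:
            picks.append("Index Cache")
    if queries_external_tables:
        picks.append("data lake Data Cache")
    if needs_warm_start:
        picks.append("Data Cache preheating (CACHE SELECT)")
    return picks


def build_cache_select(table, columns=("*",), where=None):
    """Compose a CACHE SELECT statement that preloads data into the cache.

    `table` may be qualified as catalog.db.table. Inputs are inserted
    verbatim without escaping, so pass only trusted identifiers and predicates.
    """
    sql = f"CACHE SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql + ";"


print(recommend_caches(shared_data=True, queries_external_tables=False,
                       disk_fits_full_dataset=False, needs_warm_start=True))
# ['shared-data Data Cache', 'Index Cache', 'Data Cache preheating (CACHE SELECT)']

# hive_catalog.sales_db.orders and the dt column are hypothetical names.
print(build_cache_select("hive_catalog.sales_db.orders",
                         where="dt >= '2024-01-01'"))
# CACHE SELECT * FROM hive_catalog.sales_db.orders WHERE dt >= '2024-01-01';
```

Keeping the WHERE clause narrow, for example to recent partitions, limits preheating to the data that is actually hot instead of the whole table.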