This topic describes the architecture of StarRocks' cache management and the applicable scenarios for each cache type, to help you select the right caching solution for your business needs.
Features
StarRocks provides multiple caching mechanisms that significantly improve query performance by caching hot data to the memory or disk of local BE and CN nodes. This reduces repeated access to remote storage, such as HDFS and object storage.
Cache types
| Cache type | Use cases | Default state | Available since |
| --- | --- | --- | --- |
| Shared-data Data Cache | Accelerates queries on internal tables in shared-data (serverless) instances. | Enabled by default | v3.1.7 / v3.2.3 |
| Data lake Data Cache | Accelerates queries on external tables from an External Catalog (such as Hive, Iceberg, and Hudi). | Enabled by default since v3.3.0 | v2.5 |
| Index Cache | Caches indexes for shared-data instances, ideal for scenarios where disk capacity is insufficient to cache the full dataset. | Enabled by default | v3.3.13 |
Since v3.4.0, queries on internal tables in shared-data instances and queries on data lakes share the same Data Cache instance, eliminating the need for separate configurations.
Recommendations
Shared-data instance: Use the shared-data Data Cache. It automatically loads data on demand from remote storage to the local cache, requiring no extra configuration.
Data lake external tables: Use the data lake Data Cache. It supports caching remote files in formats like Parquet and ORC, making it ideal for scenarios that involve repeated scans of large tables, such as ad-hoc analytics and report queries.
Insufficient disk capacity for the full dataset: Enable the Index Cache. It caches only indexes, significantly improving query performance with low disk overhead.
Preloading hot data: Use Data Cache preheating (CACHE SELECT) to load specific data into the cache in advance and avoid the performance impact of a cold start.
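
As a rough illustration of preheating, the statements below sketch how CACHE SELECT might be used on an external table. The catalog, database, table, and column names (hive_catalog, sales_db, orders, dt, and so on) are hypothetical placeholders, and the supported clauses and properties can differ between versions, so check the CACHE SELECT reference for your release before relying on this.

```sql
-- Hypothetical names: hive_catalog.sales_db.orders and its columns are placeholders.
-- Warm up only the columns and the date range that the hot queries touch.
CACHE SELECT order_id, customer_id, amount
FROM hive_catalog.sales_db.orders
WHERE dt >= '2024-01-01';

-- Warm up every column of the table and ask for more detailed output.
-- Assumes the "verbose" property is available in your version.
CACHE SELECT *
FROM hive_catalog.sales_db.orders
PROPERTIES("verbose" = "true");
```

Restricting the column list and the WHERE range generally keeps the warmed data set small, which matters when local cache capacity is limited.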