When multiple analysts run queries against the same OSS dataset simultaneously, every query fetches data directly from OSS — resulting in high latency and congested bandwidth. Lake cache eliminates repeated OSS fetches by caching frequently accessed objects on dedicated NVMe SSDs, delivering millisecond read latency and bandwidth that scales linearly with cache size.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster
How it works
When a Spark job reads from OSS with lake cache enabled:
The lake cache client forwards the read request to the master node to fetch object metadata.
The master node returns the metadata to the client.
The client uses the metadata to request the actual objects from worker nodes.
If the objects are already cached on the worker nodes, they are returned immediately.
If not, the objects are fetched from OSS, returned to the client, and stored in the cache for future reads.
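The read path above can be sketched as a read-through cache. This is a minimal illustration of the miss-then-populate behavior; the class and function names are hypothetical and do not correspond to the actual lake cache client API:

```python
class WorkerNode:
    """Hypothetical worker node holding a local NVMe cache."""

    def __init__(self, oss_fetch):
        self.cache = {}            # object key -> object bytes
        self.oss_fetch = oss_fetch # fallback fetch from OSS

    def read(self, key):
        # Cache hit: return the object immediately from the local SSD.
        if key in self.cache:
            return self.cache[key], "hit"
        # Cache miss: fetch from OSS, store for future reads, then return.
        data = self.oss_fetch(key)
        self.cache[key] = data
        return data, "miss"

# Simulated OSS backend.
oss = {"part-0001.parquet": b"columnar data"}
worker = WorkerNode(oss_fetch=lambda k: oss[k])

# The first read misses and populates the cache; the second read hits.
_, first = worker.read("part-0001.parquet")
_, second = worker.read("part-0001.parquet")
print(first, second)  # miss hit
```

This is also why first-time reads see no speedup: the object must travel from OSS once before subsequent reads can be served from the SSD cache.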
Data consistency is maintained automatically: when OSS objects are updated, lake cache detects the change and refreshes the cache. Queries always read the latest data.
Performance
Bandwidth and latency
| Metric | Details |
|---|---|
| Read latency | Millisecond-level (NVMe SSD) |
| Cache bandwidth | 5 Gbit/s per TB of cache size |
| Maximum burst throughput | Hundreds of Gbit/s |
| Cache size range | 10 GB–200,000 GB |
Bandwidth example: With a 10 TB cache, the read bandwidth is 5 × 10 = 50 Gbit/s (approximately 6.25 GB/s). With a 20 TB cache, the read bandwidth doubles to 100 Gbit/s. Bandwidth scales linearly as you increase the cache size, unconstrained by standard OSS bandwidth limits.
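The scaling rule can be expressed directly as arithmetic, using the 5 Gbit/s per TB figure from the table above (a sketch; the function names are illustrative):

```python
GBIT_PER_TB = 5  # cache bandwidth per TB of cache size (from the table above)

def cache_bandwidth_gbit(cache_tb):
    """Read bandwidth in Gbit/s for a given cache size in TB."""
    return GBIT_PER_TB * cache_tb

def gbit_to_gbyte(gbit):
    """Convert Gbit/s to GB/s (8 bits per byte)."""
    return gbit / 8

print(cache_bandwidth_gbit(10))                 # 50  (Gbit/s for a 10 TB cache)
print(gbit_to_gbyte(cache_bandwidth_gbit(10)))  # 6.25 (GB/s)
print(cache_bandwidth_gbit(20))                 # 100 (Gbit/s for a 20 TB cache)
```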
High throughput density: Lake cache can deliver high throughput even for small datasets, which meets burst read requirements for small amounts of hot data.
If you need a cache size larger than 200,000 GB, submit a ticket.
Cache eviction policy
When the cache reaches its size limit, lake cache uses the Least Recently Used (LRU) eviction policy: infrequently accessed objects are removed first, and frequently accessed objects are retained. To prevent eviction of objects you want to keep, increase the cache size.
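The LRU behavior can be illustrated with a minimal sketch. This is not the actual lake cache implementation; for simplicity, capacity is counted in objects here rather than bytes:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key at capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> object, least recently used first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Evict the least recently used object.
            self.items.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now most recently used
cache.put("c", 3)  # evicts "b", the least recently used object
print(sorted(cache.items))  # ['a', 'c']
```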
TPC-H benchmark results
The following test uses TPC-H queries to measure the impact of lake cache on OSS read performance. With lake cache enabled, query execution was 2.7 times faster than with direct OSS access.
| Configuration | Cache size | Dataset size | Spark resource specs | Execution time |
|---|---|---|---|---|
| Lake cache enabled | 12 TB | 10 TB | 2 cores, 8 GB (medium) | 7,219s |
| Direct OSS access | None | 10 TB | 2 cores, 8 GB (medium) | 19,578s |
Billing
After you enable lake cache, you are charged for the used cache space on a pay-as-you-go basis. For pricing details, see Pricing for Enterprise Edition and Basic Edition and Pricing for Data Lakehouse Edition.
Limitations
Lake cache is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Singapore, US (Virginia), and Indonesia (Jakarta). To use lake cache in other regions, submit a ticket.
If a hardware fault occurs on the cache nodes, queries continue to run but may slow down while data is re-fetched from OSS. Performance recovers automatically after the cache is repopulated.
When the cache reaches its size limit, infrequently accessed objects are replaced by more frequently accessed ones. Increase the cache size to prevent replacement.
Enable lake cache
Log on to the AnalyticDB for MySQL console. In the upper-left corner, select a region. In the left-side navigation pane, click Clusters.
On the Enterprise Edition, Basic Edition, or Data Lakehouse Edition tab, find the cluster and click the cluster ID.
On the Cluster Information page, go to the Configuration Information section and click Configure next to Lake Cache.
In the Lake Cache dialog box, turn on Lake Cache and specify a cache size. If an error occurs when specifying the cache size, submit a ticket.
Click OK.
After enabling lake cache, open the Lake Cache dialog box again to confirm the configured cache size.
Use lake cache in Spark jobs
After enabling lake cache, set the spark.adb.lakecache.enabled parameter to true in your Spark job to activate OSS read acceleration.
Spark SQL
```sql
-- Enable lake cache for this session
SET spark.adb.lakecache.enabled=true;
-- Run your queries
SHOW databases;
```
Spark JAR
Pass the parameter in the job configuration:
```json
{
  "comments": [
    "Enable lake cache for OSS read acceleration."
  ],
  "args": ["oss://testBucketName/data/readme.txt"],
  "name": "spark-oss-test",
  "file": "oss://testBucketName/data/example.py",
  "conf": {
    "spark.adb.lakecache.enabled": "true"
  }
}
```
To use lake cache with the XIHE engine, submit a ticket.
When lake cache does not accelerate a query
Lake cache does not speed up a query when any of the following conditions apply:
The Spark job does not have spark.adb.lakecache.enabled=true set.
The cluster is in a region where lake cache is not supported.
The data is being read for the first time — it has not yet been cached. Performance improves on subsequent reads.
A hardware fault has occurred on cache nodes and data is being re-fetched from OSS (temporary; recovers automatically).
The cache space is full and the objects needed by the query have been evicted. Increase the cache size to retain more hot data.
Monitor lake cache
After enabling lake cache, check whether your Spark jobs are using the cache and review usage metrics in CloudMonitor.
Log on to the CloudMonitor console.
In the left-side navigation pane, choose Cloud Resource Monitoring > Cloud Service Monitoring.
Hover over the AnalyticDB for MySQL card and click AnalyticDB for MySQL 3.0 - Data Lakehouse Edition.
Find the cluster and click Monitoring Charts in the Actions column.
Click the LakeCache Metrics tab to view cache details.
The following metrics are available:
| Metric | Description |
|---|---|
| LakeCache Cache Hit Ratio(%) | Percentage of read requests served from cache. Formula: reads from cache / total reads. A higher ratio means more OSS traffic is being avoided. |
| LakeCache Cache Usage(B) | Amount of cache space currently in use, in bytes. |
| Total Amount of Historical Cumulative Read Data of LakeCache(B) | Total data read from the cache since it was enabled, in bytes. |
If the cache hit ratio is low despite repeated queries over the same data, consider increasing the cache size to retain more objects between reads.
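The hit-ratio formula from the table can be computed from raw read counters like so (a sketch; the function name is illustrative, not a CloudMonitor API):

```python
def cache_hit_ratio(reads_from_cache, total_reads):
    """Percentage of read requests served from cache: reads from cache / total reads."""
    if total_reads == 0:
        return 0.0
    return 100.0 * reads_from_cache / total_reads

# Example: 850 of 1,000 reads were served from the cache.
print(cache_hit_ratio(850, 1000))  # 85.0
```

A higher ratio means more reads avoided OSS; a persistently low ratio under a repetitive workload suggests objects are being evicted between reads, which a larger cache size can address.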