The Compaction Service is a new feature in EMR Serverless StarRocks 3.5, currently in Beta. It runs compaction tasks on a dedicated service, separate from your business compute groups, providing workload isolation, elastic scaling, and better performance. This topic describes the features of the Compaction Service and how to use it.
Overview
Core Value
| Capability | Description |
| --- | --- |
| Workload isolation | Compaction runs on a separate service. This prevents resource contention with business tasks such as queries and imports, which ensures business stability. |
| Elastic scaling | The Compaction Service supports setting a minimum (Min CU) and maximum (Max CU) number of Compute Units (CUs). It automatically scales based on the compaction load. This ensures timely compaction while reducing costs. |
| Out-of-the-box | A Compaction Service is automatically created in version 3.5. Enable it with one click in the console. No extra purchases or configurations are needed. |
Performance optimizations
The Compaction Service includes the following performance optimizations in addition to isolation:
Peer Cache reads: When a compaction task runs, it reads data directly from the cache on the nodes of the business compute group (Peer Cache) instead of fetching it from Object Storage Service (remote I/O). This significantly improves compaction read performance.
Cache push: After compaction is complete, the Compaction Service asynchronously pushes the merged data files to the nodes of the business compute group. Queries then find the merged files in the cache instead of hitting a cache miss and reading from Object Storage Service, so query performance is not affected.
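The two optimizations above can be pictured with a small toy model. This is only an illustrative sketch, not the service's implementation: the class and function names (`CompactionReader`, `compact_and_push`) are hypothetical, plain dictionaries stand in for the business compute group cache and for Object Storage Service, and the cache push is done synchronously here for simplicity, whereas the service pushes asynchronously.

```python
from dataclasses import dataclass


@dataclass
class ReadStats:
    peer_hits: int = 0      # reads served from the peer cache
    remote_reads: int = 0   # reads that fell back to object storage


class CompactionReader:
    """Toy read path: try the peer cache first, then fall back to object storage."""

    def __init__(self, peer_cache: dict, object_storage: dict):
        self.peer_cache = peer_cache          # stands in for the business compute group cache
        self.object_storage = object_storage  # stands in for Object Storage Service
        self.stats = ReadStats()

    def read(self, file_id: str) -> bytes:
        data = self.peer_cache.get(file_id)
        if data is not None:
            self.stats.peer_hits += 1
            return data
        self.stats.remote_reads += 1
        return self.object_storage[file_id]


def compact_and_push(reader: CompactionReader, input_ids: list, output_id: str) -> bytes:
    """Merge the input files, persist the result, then push it to the peer cache."""
    merged = b"".join(reader.read(file_id) for file_id in input_ids)
    reader.object_storage[output_id] = merged  # durable copy of the merged file
    reader.peer_cache[output_id] = merged      # cache push (synchronous in this sketch)
    return merged
```

In this model, a query that later asks for the merged file finds it in `peer_cache` and never touches `object_storage`, which is the effect the cache push is meant to achieve.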
Prerequisites
EMR Serverless StarRocks version 3.5 or later.
A cluster in shared-data mode.
Enable the Compaction Service
Enable the Compaction Service in the EMR Serverless StarRocks console:
Go to the EMR Serverless StarRocks instance list page.
Log on to the E-MapReduce console.
In the navigation pane on the left, go to the EMR Serverless StarRocks instance list.
In the top menu bar, select a region as needed.
Click the ID of the target instance.
Click the Compaction Service tab. On the Basic Information page, click Start Service.
In the pane that appears on the right, set Min CU and Max CU.
After you complete the configuration, click Start Service.
Disable the Compaction Service
Click Shutdown Service in the console. In the pane that appears, select the Confirm Risk checkbox, and then click Confirm Shutdown. After the service is disabled, compaction reverts to running on the business compute groups. In-progress tasks will complete normally.
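The routing behavior described above (new tasks revert to the business compute groups, in-progress tasks finish where they started) can be sketched as follows. The `CompactionRouter` class and the `"compaction-service"` label are hypothetical names used only for illustration; the actual scheduler is internal to the service.

```python
class CompactionRouter:
    """Toy router: decides where a new compaction task runs."""

    SERVICE = "compaction-service"  # hypothetical label for the dedicated service

    def __init__(self):
        self.service_enabled = False
        self.running = {}  # task_id -> target on which the task was started

    def enable_service(self):
        self.service_enabled = True

    def disable_service(self):
        # Only the routing of future tasks changes; running tasks are left alone.
        self.service_enabled = False

    def submit(self, task_id: str, compute_group: str) -> str:
        target = self.SERVICE if self.service_enabled else compute_group
        self.running[task_id] = target
        return target
```

A task submitted while the service is enabled keeps its `"compaction-service"` target after `disable_service()` is called, while any task submitted afterwards is routed to its own compute group.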
Elastic scaling
CU configuration
The Compaction Service lets you set a scaling range:
| Parameter | Description | Recommendation |
| --- | --- | --- |
| Min CU | The minimum number of CUs. The service scales in to this value when idle. | Set this to the minimum value that meets your basic compaction needs. |
| Max CU | The maximum number of CUs. The service scales out to this value during peak hours. | Set this value based on your peak write throughput and Compaction Score. |
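As a quick illustration of how the two parameters bound the service, the following sketch models the scaling range as a small value object. The `CuRange` name and the validation rules (a positive Min CU, and Max CU no smaller than Min CU) are assumptions made for this example; check the console for the values it actually accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuRange:
    """Hypothetical model of the Min CU / Max CU scaling range."""

    min_cu: int
    max_cu: int

    def __post_init__(self):
        # Assumed constraints; the console may enforce different ones.
        if self.min_cu <= 0:
            raise ValueError("Min CU must be positive")
        if self.max_cu < self.min_cu:
            raise ValueError("Max CU must be greater than or equal to Min CU")

    def clamp(self, requested_cu: int) -> int:
        """Keep a requested capacity inside the configured range."""
        return max(self.min_cu, min(self.max_cu, requested_cu))
```

For example, with `CuRange(min_cu=8, max_cu=64)`, a request for 4 CUs is raised to 8 and a request for 100 CUs is capped at 64.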
Scaling policy
The Compaction Service automatically scales based on the following metrics:
Compaction Score: Reflects the accumulation of data versions. A higher score indicates greater compaction pressure.
Task load: The ratio of the current number of compaction tasks to available resources.
The system automatically scales out when the Compaction Score continues to rise or when tasks are queued. When the load decreases, the system gradually scales in to the Min CU value.
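The policy above can be approximated with a simple decision function. This is a minimal sketch under stated assumptions, not the service's actual algorithm: the three-sample window used to detect a rising Compaction Score, the fixed `step` of 8 CUs, and the function name `next_cu` are all hypothetical.

```python
def next_cu(current_cu, min_cu, max_cu, score_history, queued_tasks, step=8):
    """Return the target CU count for the next scaling interval.

    score_history: recent Compaction Score samples, oldest first.
    queued_tasks:  number of compaction tasks waiting for resources.
    step:          assumed scaling increment in CUs (hypothetical).
    """
    recent = score_history[-3:]
    # "Continues to rise" is modeled as three strictly increasing samples.
    score_rising = len(recent) == 3 and recent[0] < recent[1] < recent[2]

    if score_rising or queued_tasks > 0:
        target = current_cu + step   # scale out under pressure
    else:
        target = current_cu - step   # scale in gradually when load drops

    # Never leave the configured [Min CU, Max CU] range.
    return max(min_cu, min(max_cu, target))
```

Calling the function once per interval produces the gradual scale-in described above: with no queued tasks and a flat or falling score, capacity drops one step at a time until it reaches Min CU.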
Best practices
Use the Compaction Service in the following scenarios:
High write throughput scenarios: Continuous, high-frequency writes cause the Compaction Score to rise, which affects query performance.
Query-sensitive scenarios: Your business is sensitive to query latency and you do not want compaction to compete for query resources.
Cost optimization scenarios: You want to use compaction resources on demand through elastic scaling to reduce standing costs.
Notes
The Compaction Service is only for clusters in shared-data mode.
After you enable the Compaction Service, compaction tasks for all tables are scheduled to run on the Compaction Service.
The CU resources for the Compaction Service are billed separately based on usage. Set Min CU and Max CU to match your actual compaction load so that you do not pay for capacity you do not need.
If you disable the Compaction Service, compaction automatically reverts to running on each business compute group. In-progress tasks will complete normally.
When you enable the Compaction Service for the first time, do so during off-peak hours so that you can monitor its impact on the system.