By Guo Xiaobing and Zhang Haoran
AnalyticDB for MySQL is an analytical database service developed by Alibaba Cloud. It provides online and offline analysis services with high performance and at a low cost. The storage layer of AnalyticDB for MySQL uses the XUANWU analytical storage engine to implement high-throughput real-time data writes and high-performance real-time queries. In AnalyticDB for MySQL V3.2.0 and later, the next-generation storage engine XUANWU_V2 is used. The XUANWU_V2 engine stores data in Object Storage Service (OSS) and uses disks as an intermediate cache to balance cost and performance. This further improves the cost-effectiveness of AnalyticDB for MySQL. In addition, XUANWU_V2 provides an independent component called Compaction Service to perform compaction operations during real-time computing. This improves the stability of highly concurrent writes and queries and provides higher throughput and more flexible resource scheduling capabilities. This topic describes the background and implementation of Compaction Service.
The architecture of the AnalyticDB for MySQL storage engine is similar to that of a log-structured merge (LSM) tree. Business data is written to an AnalyticDB for MySQL instance as real-time data based on the Append-Only File (AOF) mechanism and deleted based on the mark-and-sweep mechanism, which optimizes write performance. The storage engine then converts the real-time data into historical data through compactions, which optimizes query performance. During a compaction, data is reorganized to improve query efficiency: for example, data is partitioned and sorted, and indexes are built. In addition, data management operations such as historical partition deletion, hot and cold data separation, and asynchronous DDL statement execution are performed.
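As a rough illustration, the conversion from append-only real-time data to sorted historical data can be sketched as follows. The record layout and the `compact` function are hypothetical and greatly simplified, not the engine's actual format:

```python
# Minimal sketch of an LSM-style compaction: merge append-only real-time
# records (including delete marks) into sorted, deduplicated historical data.
# Record layout and field names are illustrative only.

def compact(realtime_log, historical_rows):
    """Merge an append-only log into sorted historical rows.

    realtime_log: list of (key, value, is_delete) tuples in write order.
    historical_rows: list of (key, value) pairs already sorted by key.
    """
    latest = dict(historical_rows)
    for key, value, is_delete in realtime_log:
        if is_delete:
            latest.pop(key, None)   # mark-and-sweep: drop deleted keys
        else:
            latest[key] = value     # later writes overwrite earlier ones
    # Re-sort by key so that range scans and index builds are efficient.
    return sorted(latest.items())

log = [(3, "c", False), (1, "a", False), (3, "c2", False), (1, None, True)]
print(compact(log, [(2, "b")]))  # [(2, 'b'), (3, 'c2')]
```

The append-only log stays cheap to write because nothing is sorted or rewritten at write time; the cost of ordering and deduplication is paid once, in the background, during compaction.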
As the preceding description shows, compaction in AnalyticDB for MySQL is an I/O- and CPU-intensive operation. In addition to performing regular compactions in the background, the storage nodes of AnalyticDB for MySQL instances must also process real-time data writes, high-performance queries, and high-throughput data imports. In such an architecture, where compactions are performed on storage nodes, the following pain points exist:
• Low concurrency: Compaction tasks are executed in the background at low concurrency and with a small amount of storage node resources so that they do not affect other tasks on storage nodes.
• Weak resource isolation: Even if sufficient resource isolation measures are taken, the execution of compaction tasks can still affect queries and writes during peak hours.
• Low scalability: Resources cannot be scaled out specifically for compaction tasks. Compaction resources can be scaled out only together with storage nodes.
To address the preceding pain points, AnalyticDB for MySQL provides the Compaction Service feature. This feature decouples the compaction load in the system background from storage nodes and processes the compaction load in Compaction Service, which is supported by regional resource pools that automatically scale.
• Elastic scaling within seconds: When a large amount of data is written to your AnalyticDB for MySQL instance during business traffic spikes, compaction resources can be scaled out within seconds to increase task concurrency by several times. This way, real-time data can be quickly converted to historical data. Even with intensive write operations, the query performance is not affected.
• Strong isolation: Compactions in the background are performed in Compaction Service instead of on storage nodes. This significantly reduces the impacts of compactions on read and write operations.
• Pay-as-you-go billing method: You are charged based on the volume of data that is processed. You do not need to reserve resident resources for compaction tasks. You are not charged if no compaction tasks are executed.
Compared with performing compactions on storage nodes, performing compactions in Compaction Service significantly reduces resource consumption on storage nodes and improves concurrency and elasticity. On average, Compaction Service reduces storage resource consumption by 50% and the total task execution time by 40%. You can modify configurations online to implement linear scaling within seconds.
The following section describes the overall implementation of Compaction Service from a technical perspective.
Compaction Service uses a shared elastic resource pool to serve all compaction tasks of the AnalyticDB for MySQL instances in a region. Compaction Service contains a controller node and multiple executor nodes. The controller node is responsible for scheduling and distributing tasks, whereas the executor nodes are mainly responsible for executing tasks.
Compaction Service can serve the instances of multiple tenants by using a resource isolation and scheduling mechanism and the Compaction Service cluster can be automatically scaled out or in within seconds by using a load-aware auto scaling mechanism.
• Shared elastic resource pools: Regional shared resource pools ensure that resident resources are available for efficiently executing compaction tasks. This prevents inefficiencies caused by frequent resource creation and release. In addition, resource sharing and auto scaling help reduce costs.
• Isolation and scheduling for the tasks of multiple tenants: Tasks are scheduled by using a fair mechanism to prevent tasks from waiting an extended period of time for execution.
• Auto scaling within seconds: The Compaction Service cluster is automatically scaled out or in within seconds by using a load-aware auto scaling mechanism. When tasks begin to accumulate, the cluster is scaled out to reduce wait time. When the number of tasks is small, the cluster is scaled in to reduce costs.
Compaction tasks are scheduled to Compaction Service instead of storage nodes for execution to reduce the impacts on read and write operations. The following features are implemented to achieve this:
▶ Execution: A storage SDK is implemented on the storage engine of AnalyticDB for MySQL. With this storage SDK, compaction tasks can be executed on serverless Compaction Service clusters instead of storage nodes. The storage SDK core contains the following two modules:
• Read module: allows you to read storage and index files in formats specific to AnalyticDB for MySQL from storage nodes or OSS.
• Write module: allows you to process the data that is read, such as sorting, layout building, and index building, and write the results to specific storage media.
▶ Scheduling: The following layers of scheduling are involved:
• Layer 1: Compaction Scheduler on storage nodes
• Layer 2: Controller node in Compaction Service
As mentioned in the preceding text, the tasks of multiple users whose instances reside in the same region are submitted to the same Compaction Service. In this case, Compaction Service must support multi-tenancy. The following challenges must be overcome to isolate and schedule the tasks of multiple tenants:
• Resource allocation and priority rules. Each instance must be allocated an appropriate amount of resources for task execution and tasks must be scheduled in a fair manner so that they do not wait an extended period of time for execution.
• Task isolation for tenants. The execution contexts of the tasks of different tenants must not affect each other, and task execution must not be affected by resource contention among tasks.
Compaction Service isolates and schedules the tasks of multiple tenants by using the following solution:
• The controller node maintains the global status and is responsible for task throttling and scheduling.
• An executor node maintains and reports its status and is responsible for executing and isolating tasks.
Compaction Service throttles the tasks of each instance for self-protection. Without throttling, an instance that submits an excessive number of tasks could exhaust the resources of Compaction Service. The default throttling settings ensure that the tasks of each instance can be executed as expected under normal use.
▶ In Compaction Service, the minimum resource allocation unit is one slot. One executor node supports eight slots, and one task occupies one slot.
▶ The maximum number of slots allocated to an instance, which is the maximum number of tasks that can be concurrently executed for the instance, is determined based on the following strategy:
• Principle: Slots are allocated based on the number of AnalyticDB compute units (ACUs) of each instance.
• Strategy: By default, the maximum number of tasks that can be concurrently executed for an instance is calculated by using the following formula: Number of ACUs × 0.375. This default setting can ensure that the tasks of each instance can be executed as expected under normal use.
• Action: Compaction Service controls the number of tasks submitted by each instance.
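The default concurrency cap can be expressed as a small helper. The 0.375 factor comes from the strategy above; rounding up and the one-slot minimum are assumptions of this sketch:

```python
import math

def max_concurrent_tasks(acu_count: int, factor: float = 0.375) -> int:
    """Default cap on concurrently executed compaction tasks per instance.

    The 0.375 factor is the documented default. Rounding up and the
    minimum of one slot are assumptions for illustration.
    """
    return max(1, math.ceil(acu_count * factor))

print(max_concurrent_tasks(16))  # 16 ACUs -> 6 concurrent tasks
print(max_concurrent_tasks(32))  # 32 ACUs -> 12 concurrent tasks
```

Because one task occupies one slot and one executor node provides eight slots, a 32-ACU instance at its default cap of 12 concurrent tasks consumes 1.5 executor nodes' worth of slots from the shared pool.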
Compaction Service determines the priorities of tasks based on the number of ACUs and the number of running tasks of each instance to ensure fair task scheduling. This prevents tasks from waiting an extended period of time for execution. Each time the controller node schedules a task, it assigns the highest-priority task to the idlest executor node.
▶ Select the highest-priority task
• Among the instances of multiple tenants, the tasks of the highest-priority instance are preferentially scheduled.
a. Principle: For each instance, keep the ratio of the cluster resources consumed by the instance to the total cluster resource consumption as close as possible to the ratio of the instance's ACU count to the total ACU count.
b. Strategy: The tasks of the instance that has the highest priority score are preferentially scheduled. If multiple instances share the highest score, tasks are scheduled in the order in which they were submitted. Priority score of an instance = (Ratio of instance ACU count to total ACU count)/(Ratio of instance-specific cluster resource consumption to total cluster resource consumption).
• Within each instance, tasks are scheduled in the order in which they are submitted by the Compaction Scheduler. The Compaction Scheduler submits tasks in order of priority.
▶ Select the idlest executor node
• The executor node that has the most idle slots is the idlest. If no idle slots are available, tasks wait for the running tasks to complete or the cluster to be scaled out.
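Putting the two selection steps together, a simplified scheduler might look like the following. The data structures, field names, and the infinite score for idle instances are illustrative assumptions, not the actual implementation:

```python
# Sketch of the two scheduling decisions described above: pick a task from
# the highest-priority instance, then place it on the idlest executor node.

SLOTS_PER_NODE = 8

def priority_score(instance, total_acus, total_used_slots):
    """Higher when an instance uses less than its fair share of resources."""
    acu_share = instance["acus"] / total_acus
    used_share = (instance["used_slots"] / total_used_slots
                  if total_used_slots else 0)
    # An instance with no running tasks gets the highest possible priority.
    return float("inf") if used_share == 0 else acu_share / used_share

def schedule(instances, executors):
    total_acus = sum(i["acus"] for i in instances)
    total_used = sum(i["used_slots"] for i in instances)
    # Highest score wins; earlier submission breaks ties.
    best = max(
        (i for i in instances if i["queued_tasks"]),
        key=lambda i: (priority_score(i, total_acus, total_used),
                       -i["submit_seq"]),
    )
    # Idlest executor node: the one with the most free slots.
    node = max(executors, key=lambda e: SLOTS_PER_NODE - e["running"])
    return best["name"], node["name"]

instances = [
    {"name": "A", "acus": 32, "used_slots": 2, "queued_tasks": 1, "submit_seq": 1},
    {"name": "B", "acus": 16, "used_slots": 6, "queued_tasks": 1, "submit_seq": 2},
]
executors = [{"name": "e1", "running": 5}, {"name": "e2", "running": 3}]
print(schedule(instances, executors))  # ('A', 'e2')
```

In this example, instance A holds 2 of 8 used slots (25%) but owns 32 of 48 ACUs (67%), so its score of 2.67 beats B's 0.44 and its task is scheduled first, onto the executor with more free slots.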
To prevent the tasks of an instance from being affected by the tasks of other instances, Compaction Service isolates the execution environment and limits resource consumption for each task.
▶ Isolate the execution context of each task. The execution context includes the following information:
• Task metadata, such as instance metadata and table metadata.
• Task execution information, such as whether to use SSDs as a temporary cache.
• Temporary information during task execution, such as execution statistics.
▶ Limit the resources used by each task. Each executor node can execute eight tasks, and each task can use 1/8 of the resources of the executor node.
• By default, a task can use a maximum of 1 CPU core and 1/8 of the memory of an executor node.
• By default, SSD usage is not limited. When SSD usage reaches 90%, SSD stops being used as the cache to ensure that tasks can be executed as expected.
• By default, I/O usage is not limited. When I/O usage approaches 100%, the I/O usage of each task is automatically reduced. The minimum I/O usage is 1/8 of the I/O resources of an executor node.
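The per-task limits above can be sketched as a small policy function. The thresholds come from the text; the node specification passed in and the single-step I/O backoff (the actual reduction is gradual) are simplifications:

```python
# Sketch of the per-task resource limits described above.

SLOTS_PER_NODE = 8

def task_limits(node_memory_gb, ssd_usage_pct, io_usage_pct):
    limits = {
        "cpu_cores": 1,                                # 1 core per task by default
        "memory_gb": node_memory_gb / SLOTS_PER_NODE,  # 1/8 of node memory
        "use_ssd_cache": ssd_usage_pct < 90,           # stop caching at 90% SSD usage
    }
    if io_usage_pct >= 100:
        # Back off to the per-task floor: 1/8 of the node's I/O resources.
        limits["io_share"] = 1 / SLOTS_PER_NODE
    else:
        limits["io_share"] = None  # unrestricted by default
    return limits

print(task_limits(64, ssd_usage_pct=45, io_usage_pct=60))
```

The design intent is that CPU and memory are hard-partitioned per slot, while SSD and I/O are shared opportunistically and clamped only under pressure.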
The load-aware auto scaling feature is the key to improving stability and reducing costs. When tasks begin to accumulate, Compaction Service clusters are scaled out to improve stability. When the number of tasks drops, Compaction Service clusters are scaled in to prevent resource underuse and reduce costs.
Load-aware auto scaling of Compaction Service clusters is implemented by using the HorizontalPodAutoscaler of Kubernetes. Compaction Service supports resource load metrics, such as CPU and I/O utilization, and business load metrics, such as the number of queued tasks and the expected number of replicas. The Kubernetes HorizontalPodAutoscaler automatically performs scale-outs and scale-ins based on the metrics.
To improve execution efficiency, Compaction Service can add up to 40 executor nodes within 30 seconds during a scale-out; the average time to add one node is less than 1 second. To prevent scale-ins from causing task accumulation, Compaction Service adopts a conservative scale-in policy: when a scale-in is triggered, an executor node is removed only if no tasks have been assigned to it for 5 minutes and all running tasks on the node are complete. Removing an executor node while tasks are still running on it would waste the resources already spent on those tasks.
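The scaling decision follows the standard HorizontalPodAutoscaler formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A sketch using queued tasks per executor as an illustrative metric (the actual metrics and targets used by Compaction Service are not specified here):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=40):
    """Kubernetes HPA scaling formula, clamped to a replica range."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 10 executors, 4 queued tasks per executor observed, target of 1 per executor:
print(desired_replicas(10, current_metric=4, target_metric=1))    # 40 -> scale out
# Load drops to 0.2 queued tasks per executor:
print(desired_replicas(10, current_metric=0.2, target_metric=1))  # 2 -> scale in
```

Because the formula is proportional to the metric ratio, a sudden backlog quadruples the replica count in one reconciliation, which is what makes seconds-level scale-outs possible when tasks accumulate.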
Compaction Service resolves the pain points, such as weak resource isolation and low concurrency, in compaction task execution. It significantly improves user experience in terms of resource isolation, concurrency level, elasticity, and billing method.
Compaction Service reduces storage consumption by 50% and reduces the total execution time by 40% on average.
Compaction Service is billed on a pay-as-you-go basis. You are charged based on the volume of data that is processed. You do not need to reserve resident resources for compaction tasks.
The Compaction Service feature of AnalyticDB for MySQL has been released. You can enable the Compaction Service feature by referring to the AnalyticDB for MySQL documentation.
Alibaba Cloud Community - November 26, 2024