Alibaba Cloud launched the Basic Edition of AnalyticDB for MySQL to simplify the process of building a data warehouse. AnalyticDB for MySQL Basic Edition features high compatibility with MySQL, exceptional low costs, and superior performance, allowing small- and medium-sized enterprises (SMEs) to easily build a real-time data warehouse and mine data online for better value.
AnalyticDB for MySQL comes in Basic Edition and Cluster Edition. The Basic Edition provides services through a single node based on a minimalist architecture that significantly reduces costs. The Basic Edition provides robust analysis performance by using a storage-computing separation architecture, row-column hybrid store, lightweight indexing, and distributed hybrid computing engine. Using the Basic Edition, an enterprise can build a real-time data warehouse at an annual cost of less than RMB 10,000, without having to establish a dedicated big data team, which instead costs millions of RMB.
Let's take a look at the technical architecture of AnalyticDB for MySQL Basic Edition, which consists of a coordinator and a worker.
Coordinator, the frontend node of AnalyticDB for MySQL Basic Edition executes the following key functions:
Worker, the storage and compute node of AnalyticDB for MySQL Basic Edition includes the following components:
a) Computing Module: It uses the distributed hybrid computing engine and optimizer integrating massively parallel processing (MPP) and directed acyclic graph (DAG) to improve complex computing and hybrid load management capabilities. The computing module schedules resources in a flexible and elastic manner on the Alibaba Cloud Computing Platform. Start the Worker Compute Node independently and scale-out as needed in a matter of minutes or even seconds to maximize resource utilization.
b) Storage Module: It's more lightweight and supports real-time reading and writing with higher throughput. The write performance is about 50% higher as compared to the earlier version with the same specifications. Data is written in several milliseconds to support real-time analysis. The storage node supports backup and restoration in full and incremental modes. It synchronizes the periodic snapshots and logs of cloud disks to Object Storage Service (OSS) in real-time for storage. This improves the security of your data and helps you recover data when a database error occurs.
c) Worker Groups: The worker nodes with storage modules are allocated to worker groups. The Cluster Edition provides triplicate storage and uses Raft, a consensus distributed protocol, to allow the worker nodes to work as a whole. There is no disruption in service continuity, even when some worker nodes are faulty. The Basic Edition provides single-replica storage.
The optimizer processes the parser-generated syntax tree and uses the optimization algorithm to generate an optimal-cost plan for the computing engine. The plan cost directly affects query performance. Hence, the optimizer is one of the core modules in the database. The Basic Edition provides an optimizer - as powerful as that of the Cluster Edition. The optimizer implements optimization based on rules, costs, and patterns.
Complex analytical queries often involve joining multiple tables, and the table join order directly affects query performance. AnalyticDB for MySQL provides an optimizer that uses an algorithm to optimize the join order based on the estimated cost and real-time sample information. The algorithm perceives data distribution in the underlying storage. AnalyticDB for MySQL provides the full-index feature and the optimizer uses this feature to improve the accuracy of filter factor estimation. For complex join operations, the optimizer dynamically adjusts the join order based on the data distribution. It evaluates the cost of data reshuffling to select the optimal execution plan based on the global cost estimates.
Unlike a typical rule-based optimizer, the AnalyticDB for MySQL optimizer also provides functions for cost estimation and iteration optimization and integrates the Cascades cost-based optimizer (CBO) framework. The CBO search framework calls the Property Enforcement module to generate distributed execution plans and calls the cost estimation module to evaluate the cost of each candidate plan and select the optimal distributed execution plan. The AnalyticDB for MySQL optimizer implements history-based optimization, SQL pattern-based dynamic optimization, and data-driven intelligence to improve the join order further and optimize the join efficiency. For example, the Auto Analyze module automatically collects statistics to provide accurate data that allows the optimizer to search for the optimal execution plan.
To improve query performance, the AnalyticDB for MySQL optimizer optimizes combined filter conditions, aggregate operators, and joined subqueries in complex queries. For example, the pushdown optimization technology pushes the filter conditions and aggregate operators in an execution plan to the underlying module for execution. It improves the efficiency of underlying operators, reduces the amount of data processed by upstream operators, and eventually improves the overall query performance. As for joined subquery statements, the optimizer converts joined subqueries into an unjoined plan with the same semantic meaning to enhance the pipelining efficiency of the computing engine.
The AnalyticDB for MySQL computing engine uses the MPP+DAG architecture and in-memory pipelining mode to ensure high concurrency and low latency. To accelerate the evaluation of complex expressions and optimize the execution performance, the computing engine generates JVM bytecode at runtime by using Runtime CodeGen and dynamically loads the generated object instances. This reduces virtual function calls during execution and improves the efficiency of CPU-intensive tasks. The computing engine uses a vectorized execution model to evaluate expressions and uses a CPU SIMD instruction set to accelerate the evaluation.
As shown in the diagram below, the AnalyticDB for MySQL storage engine uses a row-column hybrid store architecture.
For every k rows of data (known as row group) in a table, the data of each column is continuously stored in a separate data block, and the column blocks of each row group are continuously stored in a disk. It's easy to sort the data in the column blocks of a row group based on a specified column to significantly reduce the number of random disk-based I/O operations during a query regarding this column. The row-column hybrid store combines the advantages of row store (suitable for point queries in online transaction processing [OLTP]) and column store (suitable for multidimensional analysis in online analytical processing [OLAP]) to meet the requirements of different types of workloads.
The AnalyticDB for MySQL storage engine uses intelligent full indexing to create an inverted index relative to the row number for the data of each column. In the query process, the AnalyticDB for MySQL storage engine converts the AND and OR operators of multiple SQL conditional expressions into Boolean queries and indexes these queries. It can search for the result sets that satisfy the WHERE clause in milliseconds and identify the numbers of the rows that store these result sets. It also merges multiple result sets.
The Basic Edition significantly simplifies the process of building a data warehouse and is more cost-effective than other data warehouse building methods, such as big data (Hadoop, Spark, and Elastic MapReduce [EMR]) and OLTP.
a) Ease of Use: The Basic Edition is billed at a minimum price of RMB 1.75/hour or RMB 860/month, which is about two-thirds of the minimum price of the Cluster Edition. Storage space is billed at only RMB 0.6/GB. A Basic Edition cluster supports up to 4 TB of storage space. Users may expand storage space as needed. This makes it easier for small- and medium-sized enterprises to build real-time warehouses and perform sophisticated analytics.
b) High Performance: In terms of data query, a Basic Edition cluster is 10 times faster than a MySQL database that has the same configuration.
c) Wide Range of Specifications: The Basic Edition supports the multiple specifications, including T8, T16, T32, and T52. It's easy to select and change the specifications based on business requirements.
d) Transparent Ecosystem: Upstream and downstream resources are compatible with the Cluster Edition and are transparent to users.
The Basic Edition is suitable for different customer groups with varied requirements.
To learn more, visit the AnalyticDB for MySQL page!
ApsaraDB - January 9, 2023
ApsaraDB - October 21, 2020
ApsaraDB - July 2, 2020
ApsaraDB - November 17, 2020
Alibaba Cloud New Products - August 10, 2020
ApsaraDB - July 4, 2022
Alibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.Learn More
Alibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.Learn More
AnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.Learn More
ApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.Learn More
More Posts by ApsaraDB