AnalyticDB for MySQL is a real-time data warehousing service developed by Alibaba Cloud. It can process petabytes of data and has been proven in ultra-large-scale core business workloads.

Overview

Since its initial release within Alibaba Group in 2012, AnalyticDB for MySQL has gone through nearly 100 versions and has supported real-time analysis for a variety of Alibaba Group business sectors, such as e-commerce, advertising, logistics, entertainment, tourism, and risk control. In 2014, AnalyticDB for MySQL was officially released to the public. It has since served traditional large and medium-sized enterprises, public service sectors, and Internet enterprises in more than a dozen industries.

AnalyticDB for MySQL is a cloud-native data warehousing service that integrates database and big data capabilities.

Architecture

AnalyticDB for MySQL adopts a cloud-native architecture that separates computing from storage and hot data from cold data. It supports high-throughput real-time writes with strong data consistency, high-concurrency queries, and high-throughput batch processing.

AnalyticDB for MySQL Data Warehouse Edition (V3.0) is suitable for high-performance real-time analysis. As the data volume increases and more data formats are supported, data must be preprocessed before extract-transform-load (ETL) operations. To address this need, AnalyticDB for MySQL Data Lakehouse Edition (V3.0) was released. It provides high-throughput batch processing capabilities to meet both batch processing and real-time analysis requirements.

Data Warehouse Edition (V3.0)

Access layer

The access layer consists of linearly scalable coordinator nodes. It is used for protocol layer access, SQL parsing and optimization, real-time sharding of written data, data scheduling, and query scheduling.
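As a rough illustration of the real-time sharding of written data mentioned above, the following Python sketch routes incoming rows to shards by hashing a distribution key. The hash function (CRC32), the function names, and the column name are assumptions made for illustration. The source does not describe how coordinator nodes actually shard data.

```python
import zlib


def shard_for_key(distribution_key: str, shard_count: int) -> int:
    """Map a distribution key to a shard index by using a stable hash.

    Hypothetical helper: CRC32 is an assumption, not the service's algorithm.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # zlib.crc32 is deterministic across processes, unlike the built-in hash().
    return zlib.crc32(distribution_key.encode("utf-8")) % shard_count


def route_rows(rows, key_column, shard_count):
    """Group incoming rows into per-shard batches (hypothetical helper)."""
    batches = {}
    for row in rows:
        shard = shard_for_key(str(row[key_column]), shard_count)
        batches.setdefault(shard, []).append(row)
    return batches


if __name__ == "__main__":
    incoming = [{"user_id": i, "amount": i * 10} for i in range(8)]
    for shard, batch in sorted(route_rows(incoming, "user_id", 4).items()):
        print(shard, [r["user_id"] for r in batch])
```

Because the same key always maps to the same shard, rows that share a distribution key land together, which is what allows shards to be written and queried in parallel.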

Compute engine

The compute engine integrates distributed massively parallel processing (MPP) and directed acyclic graph (DAG) capabilities. It leverages an intelligent optimizer to support high-concurrency and complex SQL queries. The cloud-native infrastructure allows compute nodes to be scaled within seconds, so that resources are used efficiently.

Storage engine

The storage engine supports real-time writes with strong consistency and high availability based on the Raft consensus protocol. It uses data sharding and Multi-Raft to support parallel processing, tiered storage of hot and cold data to reduce costs, and hybrid row-column storage with intelligent indexing to improve query performance.
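To make the Raft-based consistency claim more concrete, the following Python sketch shows the majority rule that Raft uses to decide when a write is committed. It is a generic illustration of the Raft protocol, not the XUANWU storage engine's implementation. The function names are hypothetical, and the sketch omits Raft's additional requirement that the entry belong to the leader's current term.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def is_committed(acks: int, replica_count: int = 3) -> bool:
    """A write is committed once a majority of replicas have acknowledged it."""
    return acks >= majority(replica_count)


def commit_index(match_indexes):
    """Highest log index that is stored on a majority of replicas.

    match_indexes holds, for each replica (leader included), the index of
    the last log entry known to be replicated on it.
    """
    ordered = sorted(match_indexes, reverse=True)
    return ordered[majority(len(ordered)) - 1]


if __name__ == "__main__":
    # Three replicas: one lagging follower does not block the commit.
    print(is_committed(acks=2))
    print(commit_index([7, 5, 3]))
```

With three replicas, two acknowledgments are enough, so a single failed or slow replica does not block writes. In a Multi-Raft design, each shard runs its own independent Raft group, so this rule is applied per shard.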

This three-layer architecture supports failover within seconds, cross-zone deployment, automatic fault detection, and replica deletion and recreation. It supports three-replica data storage and full and incremental backups, providing the level of data reliability required in the finance industry. AnalyticDB for MySQL also provides tools to migrate, synchronize, manage, integrate, and protect your data, so that you can focus on business development.

Data Lakehouse Edition (V3.0)

Compared with Data Warehouse Edition (V3.0), Data Lakehouse Edition (V3.0) can implement low-cost batch processing and high-performance real-time analysis. Data Lakehouse Edition (V3.0) significantly improves the data processing capabilities in collection, storage, computing, management, and application.

Data source

AnalyticDB Pipeline Service (APS) provides low-cost data ingestion from sources such as databases, logs, and big data platforms.

Storage layer and compute layer

Data Lakehouse Edition provides two in-house engines: XIHE compute engine and XUANWU storage engine. It also supports the open source Spark compute engine and Hudi storage engine. Data Lakehouse Edition is suitable for a variety of data analysis scenarios and supports access between the in-house and open source engines to implement centralized data management.

  • Storage layer: One copy of full data can be used for both batch processing and real-time analysis.

    In batch processing scenarios, data needs to be stored on low-cost storage media to reduce costs. In real-time analysis scenarios, data needs to be stored on fast storage media to improve performance. To meet the requirements for batch processing, Data Lakehouse Edition stores one copy of full data on low-cost, high-throughput storage media. This reduces data storage and I/O costs and ensures high throughput. To meet the requirements for real-time analysis within 100 milliseconds, Data Lakehouse Edition stores real-time data on individual elastic I/O units (EIUs). This helps meet the timeliness requirements for row data queries, full indexing, and cache acceleration.

  • Compute layer: The system automatically selects an appropriate compute mode for the XIHE compute engine. The open source Spark compute engine is suitable for various scenarios.

    The XIHE compute engine provides two compute modes: MPP and BSP. The MPP mode uses stream computing, which is not suitable for low-cost and high-throughput batch processing scenarios. The BSP mode divides tasks in DAGs and computes data for each task. This way, large amounts of data can be computed by using limited resources, and data can be stored on disks. If the MPP mode fails to process data within a specified period of time, the XIHE compute engine can automatically switch data computing to the BSP mode.

    The open source Spark compute engine is suitable for more complex batch processing and machine learning scenarios. The compute layer and storage layer are separated but interconnected, which allows you to create and configure Spark resource groups with ease.
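The automatic switch from MPP mode to BSP mode described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: all names are hypothetical, the time limit check is simulated with an estimated runtime rather than a real timer, and writing a stage's output to disk is represented by appending to a list. It does not reflect the XIHE compute engine's actual interfaces.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class MppTimeout(Exception):
    """Raised when the simulated MPP run exceeds its time limit."""


def run_mpp(query, time_limit_s, estimated_runtime_s):
    # MPP streams data between operators; nothing is materialized on disk.
    if estimated_runtime_s > time_limit_s:
        raise MppTimeout(f"{query!r} exceeded {time_limit_s}s in MPP mode")
    return {"mode": "MPP", "stages_on_disk": []}


def run_bsp(query, stage_deps):
    # BSP runs the DAG one stage at a time in dependency order and
    # materializes each stage's output, so limited resources suffice.
    stages_on_disk = []
    for stage in TopologicalSorter(stage_deps).static_order():
        stages_on_disk.append(stage)  # stands in for a write to disk
    return {"mode": "BSP", "stages_on_disk": stages_on_disk}


def execute(query, stage_deps, time_limit_s, estimated_runtime_s):
    """Try MPP first, then fall back to BSP if the time limit is exceeded."""
    try:
        return run_mpp(query, time_limit_s, estimated_runtime_s)
    except MppTimeout:
        return run_bsp(query, stage_deps)


if __name__ == "__main__":
    # Each stage maps to the set of stages it depends on.
    deps = {"join": {"scan_a", "scan_b"}, "aggregate": {"join"}}
    print(execute("daily_report", deps, time_limit_s=30, estimated_runtime_s=5))
    print(execute("daily_report", deps, time_limit_s=30, estimated_runtime_s=600))
```

The design point the sketch tries to capture is that the two modes trade latency for resource usage: MPP keeps everything in flight for fast answers, while BSP accepts slower stage-by-stage execution in exchange for bounded resource consumption.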

Access layer

The access layer leverages unified billing units, metadata and permissions, development languages, and transmission links to improve development efficiency.

AnalyticDB for MySQL combines the advantages of distributed architecture, elastic computing, and cloud computing to significantly improve its scalability, usability, reliability, and security. This helps meet the requirements for data warehousing in different scenarios. AnalyticDB for MySQL supports concurrent access on a larger scale, provides faster read and write performance, and implements smarter management of hybrid query loads. It lets you use resources at a finer granularity and at a lower cost, so that you can focus more on business development and data value.