By Baotiao

In today's global AI wave, some believe that the relational database is fading. They think MySQL is losing its vitality. Is this true? AliSQL is a major branch of MySQL. Since its release in 2010, it has quietly supported the high concurrency and high availability needs of the core business of Alibaba Group. It never disappeared. It just stayed quiet for a long time.
In 2026, AliSQL community developers started to inject innovation into AliSQL. This is their first article. It systematically explains the major technical practice of integrating MySQL with DuckDB in depth. This is a breakthrough response to the industry consensus that MySQL is only good at transactional processing (TP). It is also an innovation that combines engineering courage and architectural vision. It adds Online Analytical Processing (OLAP) capabilities to MySQL in a lightweight, efficient, and zero-intrusion way. It remains fully compatible with the MySQL protocol, syntax, and O&M system.
One of the core innovations of MySQL is its pluggable storage engine architecture. This architecture lets MySQL expand its capabilities through different storage engines to support more business scenarios. The pluggable architecture of MySQL is shown in the following figure:

The pluggable storage engine architecture of MySQL is divided into four main parts:
• Runtime layer: handles tasks related to running MySQL, such as communication, access control, System Configuration, and monitoring.
• Binary logging (Binlog) layer: handles generating, copying, and applying Binlog.
• SQL layer: handles parsing, optimizing, and executing SQL.
• Storage engine layer: handles data storage and access.
MySQL designs a standard Data Access Control API (Plugable Engine Interface) between SQL compute and data storage. The SQL layer uses this standard API to update, query, and manage data. The storage engine can be integrated as an independent widget with hot-swapping.
Common storage engines in MySQL include the following:
• MyISAM: the earliest engine used by MySQL. It was replaced by InnoDB because it does not support transactions. However, it remained the storage engine for the system table until MySQL-5.7.
• InnoDB: the default engine of MySQL. It gradually replaced MyISAM to become the most widely used engine in MySQL because of its support for transactions and excellent performance.
• CSV: the CSV file engine. It is the storage engine for the MySQL slow query log and general log.
• Memory: the memory table storage engine. It can also serve as the storage engine for the internal temporary table when you execute SQL.
• TempTable: the engine introduced in MySQL-8.0. It is used to store the internal temporary table.
Introducing InnoDB as an engine to MySQL is a major milestone for the pluggable engine architecture of MySQL. In the early days of the Internet, MyISAM won the favor of Internet businesses due to its simple and efficient access. Together with Linux, Apache, and PHP, it was known as the LAMP architecture. With the rise of E-commerce and the social Internet, the shortcomings of MyISAM became more obvious. InnoDB greatly expanded the capabilities of MySQL because of its support for atomicity, consistency, isolation, and durability (ACID) transactions. It also has advantages in concurrent access and performance. With InnoDB, MySQL became the most popular open source online transaction processing (OLTP) database.
With the widespread use of MySQL, we see more analytical queries based on TP data. The InnoDB architecture is naturally designed for OLTP. It delivers excellent performance in TP business scenarios. However, the query efficiency of InnoDB is very low in analytical business scenarios. This greatly limits the scenarios of MySQL. To this day, MySQL lacks an analytical query engine. The emergence of DuckDB shows us a possibility.
DuckDB is designed for open source OLAP and data analytics workloads. It is lightweight, high-performance, zero-configuration, and easy to integrate. It is quickly becoming a popular choice in data science, business intelligence (BI) tools, and embedded analytics scenarios. DuckDB has the following main features:
• Excellent query performance: The performance of a single-node DuckDB is much higher than InnoDB. It even performs better than ClickHouse and SelectDB.
• High compression ratio: DuckDB uses columnar storage. It automatically selects the appropriate compression algorithm based on the data type. This provides a very high compression ratio.
• Embedded design: DuckDB is an embedded database system. It naturally fits into MySQL.
• Plugin-based design: DuckDB uses a plugin-based design. This makes third-party development and feature extension very easy.
• Friendly license: The DuckDB license permits any use of the DuckDB source code. This includes commercial use.
Because of these reasons, we believe DuckDB is highly suitable as the analytical processing (AP) storage engine for MySQL. Therefore, we integrated DuckDB into AliSQL.

The DuckDB engine provides lightweight, single-node analysis capabilities. DuckDB-based analytical read-only instances are now available for ApsaraDB RDS for MySQL. We welcome you to try it. In the future, we will release the DuckDB-based analytical primary instances with high availability. Users use tools such as Data Transmission Service (DTS) to aggregate heterogeneous data into the DuckDB-based analytical instances. This enables data analysis and queries.

The DuckDB-based analytical read-only instance uses a read/write splitting architecture. Analytical workloads and primary database workloads are separated. They do not affect each other. Similar to standard read-only instances, it copies data from the primary database using the binary logging (binlog) replication mechanism. The DuckDB-based analytical read-only instances offer the following advantages:
• High-performance analytical queries: Based on the query capabilities of DuckDB, the performance of analytical queries increases by up to 200 times compared to InnoDB. For more information, see the Performance section.
• Low storage cost: Because of the high compression ratio of DuckDB, the storage space of a DuckDB-based analytical read-only instance is typically only 20% of the primary database.
• 100% compatible with MySQL syntax with zero learning cost. DuckDB is integrated into MySQL as an engine. Therefore, users still use MySQL syntax for queries. This eliminates learning costs.
• No additional management costs: A DuckDB-based analytical read-only instance is still an RDS for MySQL instance. It adds only a few MySQL parameters compared to a standard read-only instance. Therefore, you manage, operate, and monitor DuckDB exactly like a standard RDS for MySQL instance. Monitoring information, slow query logs, audit logs, and RDS APIs remain identical.
• One-click creation of DuckDB-based analytical read-only instances. Data automatically transforms from InnoDB to DuckDB. No extra operations are required.

The usage of a DuckDB-based analytical read-only instance involves the query path and the binlog replication task. The query path accepts user query requests and executes data queries. The binlog replication task connects to the primary instance to replicate binlogs. The following sections describe the technical principles of these two aspects.

The query execution flow is shown in the preceding figure. InnoDB only stores metadata and system info, such as accounts and configurations. All user data resides in the DuckDB engine. InnoDB only stores metadata and system info, such as accounts and configurations.
Users connect to the instance using a MySQL client. After a query arrives, MySQL first parses and processes it. Then, MySQL sends the SQL to the DuckDB engine for execution. After DuckDB finishes execution, it returns the result to the server layer. The server layer transforms the result set into a MySQL result set and returns it to the customer.
The most important work in the query path is compatibility. The data types of DuckDB are basically compatible with MySQL. However, DuckDB has major differences from MySQL in syntax and function support. Because of this, we extended the DuckDB syntax resolver to make it compatible with MySQL-specific syntax. We rewrote many DuckDB functions and added many MySQL functions. This ensures that common MySQL functions run accurately. The automated compatibility test platform ran about 170,000 SQL tests. The results show a compatibility rate of 99%.

Because DuckDB does not support the two-phase commit protocol, the two-phase commit protocol does not ensure consistency between binlog Global Transaction Identifiers (GTIDs) and data. It also does not ensure consistency between InnoDB metadata and DuckDB during Data Definition Language (DDL) operations. Therefore, we modified the transaction commit procedure and the binlog replay procedure. This ensures data consistency after instance failures, breakdowns, and restarts.
The DuckDB implementation favors large transactions. Frequent small transactions execute inefficiently. This causes critical replication delay. Therefore, we optimized binary logging replay. We use batching to replay transactions. This optimization achieves a replay rate of 300,000 rows/s. In Sysbench stress tests, this method eliminates replication delay. It also delivers higher replay performance than InnoDB.

DuckDB does not support a few MySQL DDL operations, such as modifying column order. To ensure normal replication, we implemented the copy DDL mechanism. DuckDB executes natively supported DDL operations in inplace or instant mode. If DuckDB does not support a DDL operation, the system uses copy DDL to create a new table to replace the original table.

Copy DDL uses multi-threaded parallel execution. This reduces the running time by a factor of seven.

Test environment
ECS instance with 32 CPUs, 128 GB of memory, and a 500 GB PL1 ESSD
Test types
TPC-H SF100

We deeply integrated DuckDB into AliSQL to build a MySQL analytical instance with high performance and compatibility. This innovation resolves the long-standing limitations of MySQL in online analytical processing (OLAP) scenarios. It also creates a new path for lightweight hybrid transactional and analytical processing (HTAP). You can achieve powerful real-time analysis without a complex distributed architecture.
The DuckDB engine boosts analytical query performance by up to 200 times. Users do not need to change their existing application architecture. More importantly, users can use MySQL protocols and familiar SQL syntax. They do not need to learn new tools or modify applications. One-click creation, automatic synchronization, and seamless switching truly deliver "analytical capabilities as a service."
Letting the Executor "See" Data Locations: Lizard Physical Addressing Optimization
AliSQL DuckDB: Data Compression and Archiving Analysis Practices
ApsaraDB - May 13, 2026
ApsaraDB - November 18, 2025
ApsaraDB - January 15, 2026
ApsaraDB - March 20, 2026
ApsaraDB - September 10, 2025
ApsaraDB - April 20, 2026
Container Compute Service (ACS)
A cloud computing service that provides container compute resources that comply with the container specifications of Kubernetes
Learn More
Container Service for Kubernetes
Alibaba Cloud Container Service for Kubernetes is a fully managed cloud container management service that supports native Kubernetes and integrates with other Alibaba Cloud products.
Learn More
PolarDB for MySQL
Alibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn More
Bastionhost
A unified, efficient, and secure platform that provides cloud-based O&M, access control, and operation audit.
Learn MoreMore Posts by ApsaraDB