ApsaraDB RDS for MySQL DuckDB-based analytical instances use columnar storage and vectorized execution as their core technologies to improve the performance of complex analytical queries by a hundredfold, while columnar compression significantly reduces storage costs. These instances provide businesses with cost-effective, real-time analytics for large-scale data scenarios and strengthen data-driven decision-making.
Features
DuckDB-based analytical instances include the following types:
DuckDB-based analytical primary instance
Definition: A standalone database instance that is similar to a regular RDS for MySQL primary instance. This type is available for RDS Cluster Edition and is deployed with multiple standby nodes that can handle read requests.
Core features:
It retains all basic features of a regular RDS for MySQL instance and is compatible with related tools and ecosystems. For more information, see Overview of a DuckDB-based analytical primary instance.
It deeply integrates the DuckDB analytical engine into the MySQL kernel. It uses columnar storage and vectorization as its core technologies to provide both strong support for transactions and high-performance complex analytical processing capabilities.
DuckDB-based analytical read-only instance
Definition: A read-only instance that is attached to a regular RDS for MySQL instance running High-availability Edition.
Core features:
Seamless data flow: It uses native replication based on binary logging (binlog) to automatically synchronize data and transform table schemas. This eliminates the need to maintain a separate data synchronization link.
Integrated HTAP: Analytical requests can be sent through a direct connection to the DuckDB-based analytical read-only instance or automatically routed to it through a database proxy. This architecture provides integrated hybrid transactional and analytical processing (HTAP) capabilities, preserves the transaction processing (TP) performance of the primary instance, and improves the efficiency of data analytics.
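The following sketch shows the two access paths from client code. The endpoint names, credentials, and the orders table are hypothetical placeholders; the real endpoints are shown in the RDS console.

```python
# A minimal sketch of the two access paths, assuming hypothetical endpoints and
# credentials; replace them with the values from your RDS console.
import pymysql

ANALYTICAL_RO_HOST = "rm-xxxx-analytic.mysql.rds.aliyuncs.com"  # hypothetical direct endpoint
PROXY_HOST = "rm-xxxx-proxy.mysql.rds.aliyuncs.com"             # hypothetical database proxy endpoint

def run(host: str, sql: str):
    """Execute a query against the given endpoint and return all rows."""
    conn = pymysql.connect(host=host, port=3306, user="app_user",
                           password="***", database="sales")
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()

report_sql = "SELECT region, SUM(amount) FROM orders GROUP BY region"

# Path 1: connect to the DuckDB-based analytical read-only instance directly.
rows_direct = run(ANALYTICAL_RO_HOST, report_sql)

# Path 2: connect to the database proxy endpoint; the proxy can route this
# read-only analytical query to the analytical read-only instance automatically.
rows_via_proxy = run(PROXY_HOST, report_sql)
```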
| Comparison item | DuckDB-based analytical primary instance | DuckDB-based analytical read-only instance | RDS for MySQL read-only instance | OLAP database |
| --- | --- | --- | --- | --- |
| Scenario | Complex analytical queries | Complex analytical queries | Transaction processing | Complex analytical queries |
| Analytical query performance | Strong | Strong | Weak | Strong |
| Data synchronization method | DTS data synchronization link | Native replication based on binary logging | Native replication based on binary logging | DTS data synchronization link |
| MySQL compatibility: data types | Fully compatible | Fully compatible | Fully compatible | Incompatible (requires field mapping) |
| MySQL compatibility: SQL syntax | Highly compatible (> 99.9%) | Highly compatible (> 99.9%) | Fully compatible | Incompatible (requires SQL rewrite) |
| MySQL compatibility: DDL | Highly compatible | Highly compatible (supports automatic rebuilding) | Fully compatible | Partially compatible |
| O&M costs | Low | Low | Low | High |
| Deployment model | Standalone deployment | Attached to a regular RDS for MySQL instance | Attached to a regular RDS for MySQL instance | Standalone deployment |
| Supported edition | Cluster Edition | High-availability Edition | Basic Edition, High-availability Edition | - |
Scenarios
Aggregate analysis: DuckDB-based analytical instances provide efficient aggregate queries for real-time aggregate analysis, such as analyzing log data.
Multi-table join queries: For query services that involve multi-table JOIN operations, these instances can significantly improve the analytical performance of MySQL.
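The following sketch illustrates both scenarios through a standard MySQL client library. The connection settings and the access_log, orders, and customers tables are hypothetical placeholders.

```python
# Illustrative queries for the two scenarios above; table names, columns, and the
# endpoint are made up for this sketch.
import pymysql

conn = pymysql.connect(host="rm-xxxx.mysql.rds.aliyuncs.com", user="app_user",
                       password="***", database="demo")

AGGREGATE_SQL = """
    SELECT status_code, COUNT(*) AS hits, AVG(latency_ms) AS avg_latency
    FROM access_log
    WHERE log_time >= NOW() - INTERVAL 1 HOUR
    GROUP BY status_code
"""

JOIN_SQL = """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
"""

with conn.cursor() as cur:
    cur.execute(AGGREGATE_SQL)   # real-time aggregate analysis over log data
    print(cur.fetchall())
    cur.execute(JOIN_SQL)        # multi-table JOIN accelerated by the analytical engine
    print(cur.fetchall())
conn.close()
```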
Technical principles
What is DuckDB?
DuckDB is a standalone online analytical processing (OLAP) database designed for embedded scenarios. Its core architecture balances high-performance analytics with transaction processing:
High-performance analytics: Columnar storage significantly accelerates aggregate analysis queries, and the vectorized execution engine efficiently processes data in batches.
Strong transaction support: It provides full atomicity, consistency, isolation, and durability (ACID) transaction capabilities. It uses a Multi-Version Concurrency Control (MVCC) mechanism to enable efficient concurrent read and write operations in a standalone environment.
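The following minimal sketch uses the open source duckdb Python package to show the embedded, columnar, and transactional behavior described above. The table and values are made up for illustration.

```python
import duckdb

con = duckdb.connect()  # in-memory database; pass a file path for persistence

con.execute("CREATE TABLE events (user_id INTEGER, amount DECIMAL(10, 2))")
con.execute("INSERT INTO events VALUES (1, 9.99), (1, 20.00), (2, 5.50)")

# Vectorized, columnar aggregate query.
print(con.execute("SELECT user_id, SUM(amount) FROM events GROUP BY user_id").fetchall())

# ACID transaction: the rollback discards the insert.
con.execute("BEGIN TRANSACTION")
con.execute("INSERT INTO events VALUES (3, 100.00)")
con.execute("ROLLBACK")
print(con.execute("SELECT COUNT(*) FROM events").fetchall())  # still 3 rows
```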
Technical optimizations
ApsaraDB RDS deeply integrates the DuckDB engine to combine DuckDB's high-performance analytics with the MySQL ecosystem. This approach balances analytical performance with transaction processing, meets enterprises' need to analyze large-scale data efficiently, and ensures enterprise-grade data reliability and consistency.
General technical optimizations (for both primary and read-only analytical instances)
Storage engine optimization
It encapsulates DuckDB's columnar storage as a transactional storage engine within MySQL and uses the standard MySQL data dictionary to manage metadata. This ensures full compatibility with the transaction semantics of the primary instance.
It enhances the native synchronization capability based on binary logging and uses DuckDB's write-ahead logging (WAL) for transaction persistence. This ensures data consistency between the ApsaraDB RDS for MySQL instance and the DuckDB-based analytic instance.
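The following is a conceptual sketch of this idea rather than the ApsaraDB RDS implementation: decoded row events are applied to a local DuckDB file inside a single transaction, so DuckDB's WAL persists the batch atomically. The event format is invented for illustration.

```python
import duckdb

con = duckdb.connect("replica.duckdb")  # file-backed database with a write-ahead log
con.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount DECIMAL(10, 2))")

# Pretend these row events were decoded from the primary instance's binlog.
binlog_events = [
    ("INSERT", (1, 19.90)),
    ("INSERT", (2, 5.00)),
    ("DELETE", (1, None)),
]

con.execute("BEGIN TRANSACTION")
try:
    for op, row in binlog_events:
        if op == "INSERT":
            con.execute("INSERT INTO orders VALUES (?, ?)", row)
        elif op == "DELETE":
            con.execute("DELETE FROM orders WHERE id = ?", [row[0]])
    con.execute("COMMIT")    # the whole batch becomes durable through the WAL
except Exception:
    con.execute("ROLLBACK")  # on failure, nothing from the batch is applied
    raise
```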
Compute engine enhancement
It deeply integrates core DuckDB components, including the optimizer, vectorized execution engine, and compiler, to support just-in-time (JIT) compilation and vectorized execution. This improves complex query performance by two orders of magnitude compared with InnoDB.
It adapts the SQL parser to support 99.9% of MySQL syntax and functions. This ensures that existing query statements can be executed directly without modification.
The Result Translator component automatically converts DuckDB's computation results into a MySQL protocol-compatible format. This allows for seamless client integration without requiring adjustments to existing application logic.
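A minimal sketch of this client-side transparency, assuming a hypothetical analytical instance endpoint and orders table: the MySQL-dialect statement that an application already uses is sent as-is, and the results arrive as ordinary MySQL protocol rows.

```python
import pymysql

conn = pymysql.connect(host="rm-xxxx-analytic.mysql.rds.aliyuncs.com",
                       user="app_user", password="***", database="sales")

# An existing application query, written in MySQL syntax and functions, sent unchanged.
sql = """
    SELECT YEAR(order_time) AS y, MONTH(order_time) AS m,
           IFNULL(SUM(amount), 0) AS revenue
    FROM orders
    GROUP BY y, m
    ORDER BY y, m
"""

with conn.cursor() as cur:
    cur.execute(sql)
    for year, month, revenue in cur.fetchall():
        # The results arrive as plain MySQL protocol values, so the client sees
        # ordinary integers and decimals rather than engine-specific types.
        print(year, month, revenue)
conn.close()
```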
Query acceleration
Columnar storage: Data is stored locally in DuckDB's native columnar format. This improves aggregate query performance by more than 100 times. For example, the response time for a SUM query on the same amount of data is reduced to 1/100th of the time required by InnoDB.
Automatic caching of hot data: The system uses DuckDB's Buffer Pool mechanism to automatically cache frequently queried data. This provides stable support for high-concurrency business scenarios.
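The following sketch shows one way to compare the same aggregate query on a regular instance and on a DuckDB-based analytical instance. The endpoints are hypothetical, and the measured ratio depends entirely on the data volume, schema, and hardware.

```python
import time
import pymysql

SQL = "SELECT SUM(amount) FROM orders"

def time_query(host: str) -> float:
    """Run the aggregate query against one endpoint and return the elapsed seconds."""
    conn = pymysql.connect(host=host, user="app_user", password="***", database="sales")
    try:
        with conn.cursor() as cur:
            start = time.perf_counter()
            cur.execute(SQL)
            cur.fetchall()
            return time.perf_counter() - start
    finally:
        conn.close()

t_row = time_query("rm-xxxx.mysql.rds.aliyuncs.com")           # InnoDB (row store)
t_col = time_query("rm-xxxx-analytic.mysql.rds.aliyuncs.com")  # DuckDB (column store)
print(f"row store: {t_row:.3f}s, column store: {t_col:.3f}s, speedup: {t_row / t_col:.1f}x")
```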
Additional technical optimizations for DuckDB-based analytical read-only instances
Data synchronization performance improvement
It introduces a batching mechanism that combines high-frequency small transactions into large transactions. This significantly reduces write latency.
Idempotent replay: The system performs idempotence checks on binary log events to ensure strong consistency in data synchronization.
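The following is a conceptual sketch of these two ideas, not the actual replication code: small source transactions are grouped into one larger apply batch, and a GTID check skips events that are already recorded as applied, which makes replay idempotent. The event format and the applied_gtids bookkeeping table are invented for illustration.

```python
import duckdb

con = duckdb.connect("replica.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount DECIMAL(10, 2))")
con.execute("CREATE TABLE IF NOT EXISTS applied_gtids (gtid VARCHAR PRIMARY KEY)")

def flush(buffer):
    """Apply one batch of events inside a single DuckDB transaction."""
    con.execute("BEGIN TRANSACTION")
    for gtid, sql, params in buffer:
        # Idempotence check: skip transactions that are already recorded as applied.
        already = con.execute(
            "SELECT COUNT(*) FROM applied_gtids WHERE gtid = ?", [gtid]
        ).fetchone()[0]
        if already:
            continue
        con.execute(sql, params)
        con.execute("INSERT INTO applied_gtids VALUES (?)", [gtid])
    con.execute("COMMIT")  # many small source transactions become one columnar write

def apply_batch(events, batch_size=1000):
    """events: iterable of (gtid, sql, params) tuples decoded from the binlog (invented format)."""
    buffer = []
    for event in events:
        buffer.append(event)
        if len(buffer) >= batch_size:
            flush(buffer)
            buffer = []
    if buffer:
        flush(buffer)

apply_batch([
    ("gtid-1", "INSERT INTO orders VALUES (?, ?)", [1, 19.90]),
    ("gtid-2", "INSERT INTO orders VALUES (?, ?)", [2, 5.00]),
])
```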
DDL synchronization optimization
The system automatically identifies Data Definition Language (DDL) operations that are natively supported by DuckDB, such as creating tables and adding or removing fields. These operations are routed directly to the DuckDB engine for execution without conversion. This ensures data synchronization performance.
For DDL operations not supported by DuckDB, the system automatically triggers a table rebuild before execution. This prevents execution errors from interrupting replication and ensures the stability of the synchronization link.
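The routing rule can be pictured as follows. This is a simplified placeholder rather than the product's actual classification logic, and both paths are represented only by descriptive return values.

```python
import re

# Simplified approximation of "DDL that the DuckDB engine can execute directly".
NATIVE_DDL = re.compile(
    r"^\s*(CREATE\s+TABLE|DROP\s+TABLE|ALTER\s+TABLE\s+\w+\s+(ADD|DROP)\s+COLUMN)",
    re.IGNORECASE,
)

def apply_ddl(statement: str) -> str:
    if NATIVE_DDL.match(statement):
        return "fast path: execute directly on the DuckDB engine"
    # Anything else triggers a table rebuild so that replication is not interrupted.
    return "slow path: rebuild the table, then resume replication"

print(apply_ddl("ALTER TABLE orders ADD COLUMN discount DECIMAL(10, 2)"))
print(apply_ddl("ALTER TABLE orders MODIFY COLUMN amount BIGINT"))
```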
Resource isolation: The DuckDB-based analytical read-only instance is isolated from the ApsaraDB RDS for MySQL primary instance. Therefore, analytical queries do not affect online transaction processing, ensuring the stability of core business operations.
Workflow of a DuckDB-based analytical instance
SQL parsing: A user's SQL query is parsed by the standard MySQL parser.
SQL execution: The parsed query is routed to the DuckDB compute engine for execution.
Data retrieval: Columnar data is retrieved from the DuckDB storage engine.
Format conversion: The computation result is converted by the Result Translator component into the MySQL protocol format and then sent to the client.
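The following toy model mirrors these four steps, using the open source duckdb package for steps 2 and 3 and a trivial stand-in for the Result Translator. The real components run inside the MySQL server process; this only illustrates where each step sits.

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE orders (id INTEGER, paid BOOLEAN)")
con.execute("INSERT INTO orders VALUES (1, TRUE), (2, FALSE)")

def handle_query(sql: str):
    # 1. SQL parsing is normally done by the MySQL parser; here the string passes through.
    # 2-3. Execution and columnar data retrieval by the DuckDB engine.
    rows = con.execute(sql).fetchall()
    # 4. Result translation: map engine-native values to MySQL protocol conventions,
    #    for example booleans become 0/1 integers.
    return [tuple(int(v) if isinstance(v, bool) else v for v in row) for row in rows]

print(handle_query("SELECT id, paid FROM orders ORDER BY id"))  # [(1, 1), (2, 0)]
```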