By Yanxian and Wuha, Database Technology Experts at Alibaba Cloud
X-Engine is a database storage engine developed by Alibaba Cloud. It is integrated into the MySQL ecosystem through plug-ins and supports core functions in OLTP scenarios, such as row locks, transactions, and multi-version concurrency control (MVCC).
X-Engine is cost-effective and especially suitable for historical database scenarios. It is used to solve the cost problems of businesses, such as the core transaction history database of Alibaba (originally HBase), the DingTalk message history database (originally InnoDB), and Taobao image space.
X-Engine also empowers Alibaba Cloud database services. As the cloud-based storage engine for ApsaraDB RDS for MySQL, X-Engine is available to external users, allowing more users to enjoy the benefits of new technologies. For more information about X-Engine, see the monthly database kernel report for October 2019. This article introduces a core function of the X-Engine engine: online data definition language (DDL). Online DDL is an important function of the MySQL ecosystem. Before MySQL 5.6, when a database administrator (DBA) performed DDL change operations, primary-secondary failover was the only way to ensure 24/7 service.
Two important features that distinguish a database storage engine from a NoSQL engine are support for SQL statements and the availability of a schema (data dictionary). In addition to a schema, you also need to make flexible changes online to adapt to rapid business changes. Of all the storage engines in the MySQL ecosystem, only InnoDB supports online DDL. As a new member of the MySQL ecosystem, X-Engine uses a storage architecture completely different from that of InnoDB but provides the same user experience.
X-Engine uses a hierarchical architecture similar to the log-structured merge-tree (LSM tree) structure. Data is divided into multiple layers according to the time sequence logic. New data is placed at a higher layer, while the oldest historical data is at the lowest layer. For X-Engine, each primary table and secondary index is a hierarchical LSM tree structure, which is internally called a subtable. Each subtable consists of four layers in the specific sequence: Memtable, L0, L1, and L2. Data is migrated to lower layers based on its generation time. The Memtable layer is in the memory, while the other layers are placed on different storage media as needed. Online DDL makes full use of the data structure of X-Engine.
In the data structure, new build data is divided into baseline data and incremental data. Baseline data is the data that is traversable by using the snapshots before the change starts, while incremental data is the data written after the change starts. Writing is disabled for a short time during snapshotting because we strongly rely on this consistency point to ensure the integrity of baseline and incremental data.
Online DDL includes the following four phases:
1) Prepare: In this phase, the data dictionary is prepared, and the subtable for storing data is built at the underlying layer for subsequent incremental data writing.
2) Inplace-build: This is the core phase of online DDL. Here, baseline data is obtained through snapshots, and incremental data is maintained in real-time. By using the append-only feature of X-Engine's data organization, baseline data and incremental data are merged, building new data for online DDL. The logic of this process will be described in detail in the next section.
3) Commit: In this phase, the modification takes effect at the online DDL engine layer. If no exception or error occurs during the online DDL process, the new data dictionary and the data will be applied.
4) Post-DDL: In this phase, online DDL takes effect. A success message is returned only after this phase is complete. This phase is introduced mainly because MySQL is extended through a two-layer structure (server+engine), and each layer has its own data dictionary. The commit phase guarantees the completeness of the data and the data dictionary at the engine layer. To ensure the atomicity of DDL changes (by ensuring the data dictionaries of the server layer and the engine layer are consistent), the post-DDL phase is introduced for subsequent cleanup. The atomicity of DDL statements will be described in detail in the following sections.
The core logic of online DDL is for ensuring that the data manipulation language (DML) and SELECT operations on tables are not blocked upon DDL changes. By using the append-only feature of data organization, X-Engine maintains incremental data at the Memtable, L0, and L1 layers, and baseline data at the L2 layer. In addition, X-Engine writes incremental data to old tables and the new table concurrently. Similar to InnoDB, DDL changes are classified into inplace-rebuild and inplace-norebuild based on record format changes; and are essentially the same for X-Engine but differ in the number of indexes maintained. Inplace-rebuild DDL changes need to maintain all indexes in the new table simultaneously, while inplace-norebuild DDL changes need only to maintain the new indexes in the new table.
The inplace-build phase contains three key nodes arranged in chronological order. A snapshot is taken at time t0, baseline data is built at time t1, and the uniqueness check is completed at time t2. The main logic of the two phases is as follows:
From t0 to t1, the baseline of the new table is built. The data of the old tables integrate with the data dictionaries of the new table to generate new records and write the records to the L2 layer of the new table. The incremental data generated when the baseline of the new table is built is written to the Memtable, L0, and L1 layers of the new table.
During the DDL process, you must control the compaction tasks in the backend to ensure that the compaction tasks merged to L2 are not executed. From t1 to t2, uniqueness is checked to ensure the uniqueness of the new primary key or unique index. At t2, data at the Memtable, L0, L1, and L2 layers are integrated to get the full data of the new table. The following figure shows the conversion process.
The DDL transaction represents the DDL thread, which scans the baseline and generates baseline data for the new table. The DML transaction represents the concurrent DML transactions during the DDL process. They simultaneously maintain incremental data in the new table and the old tables in dual-channel mode.
Although the online DDLs of X-Engine and InnoDB are implemented in the form of baseline data and incremental data, their logic is different. InnoDB updates data in place and maintains incremental data by using the row-log mechanism. X-Engine is an append-only storage engine that natively supports multi-version data storage, allowing you to maintain incremental data in real-time. After the baseline is built, you only need to merge the baseline data and incremental data. Even if the baseline data is modified when the incremental data is added, the incremental data version is later than the baseline data version and overwrites the baseline data of the earlier version during merging. The following figure shows the online DDL process of the InnoDB engine.
Note that the online DDL of InnoDB also contains three key time points. In contrast to X-Engine, InnoDB enables the row-log process from t1 to t2. However, X-Engine does not require a uniqueness check from t1 to t2 because DDL changes might not involve a unique index operation.
Similar to InnoDB of MySQL 8.0, X-Engine also supports instant-DDL. Of all the available online DDL operations, if you perform DDL operations only to modify table properties or add columns, you do not need to change the record format or add indexes. These online DDL operations can be optimized to instant-DDL operations, which can be completed quickly and are imperceptible to users.
When an instant-DDL operation is performed, the engine data is not modified. Therefore, you need to add some control metadata to the engine record format to ensure the correctness of the subsequent query results and DDL operations. A byte can be added to indicate whether an instant-DDL operation has been performed when the record is generated. When a record is generated, it also needs to record the number of existing columns and the number of null columns. When you parse the record, you can see how many columns need to be supplemented with the instant column information according to the dictionary information. In this way, the query result correctness is returned after performing the instant-DDL operation.
The last phase of the online DDL process is post-DDL. Before MySQL 8.0, metadata at the server layer was stored in files such as FRM files, PAR files, and TRG files. Therefore, a DDL change involved file modification, engine data modification, and engine dictionary modification. If these operations could not be completed in a single transaction, the entire DDL operation was not atomic. If an exception occurred in the DDL process, data could be inconsistent between the server layer and the engine layer, and the residual garbage would not be cleaned up. MySQL 8.0 stores all the dictionary information of the server layer in DataDictionary (DD) through the InnoDB engine. Therefore, in the DDL process, we only need to modify the data dictionary at the server layer and encapsulate the data dictionary at the engine layer into a transaction.
The InnoDB engine allows storing DD and InnoDB data dictionary operations in the InnoDB engine. The InnoDB transaction characteristics guarantee atomicity. The X-Engine engine allows storing DD and X-Engine engine data dictionary operations in the InnoDB engine and X-Engine engine respectively. The InnoDB and X-Engine transaction characteristics and the internal 2-phase commit (2PC) protocol guarantee atomicity. If binlog is enabled for MySQL, binlog, X-Engine, and InnoDB together guarantee the atomicity of transactions through the 2PC protocol. Cleanup work is completed in the post-DDL phase. If the entire transaction is committed, old table data is cleaned up in this phase. If the entire transaction is rolled back, temporarily generated new table data is cleaned up in this phase to ensure that the database status is consistent before and after the DDL change.
As a new MySQL engine, X-Engine uses the same syntax as InnoDB. The algorithm_option specifies the online type, and the lock_option specifies whether to allow other concurrent DML and SELECT operations during the DDL process. In general, the default values of these two options can be used. MySQL will use the instant and inplace types internally. For DDL operations that do not support online, the copy type is selected. X-Engine also provides the same features as InnoDB. Currently, X-Engine does not support full-text indexing, virtual columns, and foreign keys. Therefore, DDL operations related to these features are not supported, while other DDL operations are the same as those for InnoDB. The following figure classifies common DDL operations.
As a new database storage engine, X-Engine has demonstrated its value in recent business scenarios of the Alibaba Group. We hope that more users benefit from this new technology in ApsaraDB for RDS scenarios. At present, X-Engine still has some shortcomings compared to the traditional and mature products, InnoDB and Oracle. Therefore, we need to improve X-Engine's stability and performance and work on it to support more features, such as foreign keys, full-text indexing, and virtual columns. In addition to RDS in the public cloud, the X-Engine-based integrated distributed database PolarDB-X can be used in private clouds to serve more users with strong demand for distributed databases.
FASTER: How Does Microsoft KV Store Achieve 160 Million OPS?
A Small Victory – Acceleration of BITFIELD Commands for ApsaraDB for Redis
ApsaraDB - November 16, 2020
ApsaraDB - March 3, 2020
ApsaraDB - February 19, 2020
ApsaraDB - November 17, 2020
ApsaraDB - October 20, 2020
ApsaraDB - May 14, 2020
An on-demand database hosting service for MySQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
An on-demand database hosting service for MySQL, SQL Server and PostgreSQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
ApsaraDB Dedicated Cluster provided by Alibaba Cloud is a dedicated service for managing databases on the cloud.Learn More
More Posts by ApsaraDB