Comparison of lock-free schema change solutions - Data Management

Table locking caused by scheme changes is a danger to your business. To resolve this issue, Database Management (DMS) provides the lock-free DDL feature that adopts a triggerless design. This feature allows you to change the schemas of large tables without the need to lock tables. This topic compares several lock-free schema change solutions.

We recommend that you do not directly perform DDL operations on tables in an online environment. You can use the following solutions to prevent impacts of DDL operations on your business:

Alter tables during off-peak hours.
Specify an appropriate and long enough period for altering the tables during off-peak hours. However, if the alter operation is not complete within the specified period on an excessively large table, your business is still affected.
Alter tables in the secondary database and perform a switchover between the primary and secondary databases.
The primary and secondary databases must be created before you can switch the databases in an appropriate period.
Online schema change tools:
- pt-online-schema-change: If you want to alter Table A, this tool creates a temporary table named Table A_gst and performs DDL operations on Table A_gst. In addition, this tool creates a DML trigger on Table A and copies data from Table A to Table A_gst. In the process of copying data from Table A to Table A_gst, the DML trigger captures incremental changes to Table A and synchronously applies the changes to Table A_gst. After the data is copied from Table A to Table A_gst, this tool renames Table A_gst after Table A.
- Online Schema Change (OSC): This tool basically works in the same way as the pt-online-schema-change tool. The difference is that this tool adopts an asynchronous design. A log table is created based on Table A_gst. Changes to Table A are captured by the DML trigger and recorded in the log table. Then, backend processes apply the recorded changes in the log table to Table A_gst. The entire process is asynchronous. Therefore, you can manage the process of applying the changes to Table A_gst.
- gh-ost: This tool basically works in the same way as the preceding two tools. However, this tool does not use triggers. Instead, this tool uses binary logs to capture incremental table changes. This tool reads the changes to Table A that are recorded in the binary logs. Then, this tool analyzes and applies the changes to Table A_gst. The binary logs contain changes only to Table A. Therefore, these changes can be obtained and applied to Table A_gst with ease.
  Note
  - For more information about triggers, see the Comparison between the trigger-based and triggerless designs section of this topic.
  - For more information about the comparison between the lock-free schema change feature of DMS and the gh-ost tool, see the Comparison between the lock-free schema change feature of DMS and the gh-ost tool section of this topic.

Comparison between the trigger-based and triggerless designs

Trigger-based design
Trigger-based tools have simple code logic. In most cases, you can use triggers to process data. For example, you can use triggers to process data in an implicit way and convert data types. This simplifies the complex process of migrating table data in real time.
Triggerless design
The key benefit of a triggerless design is database workload decoupling. A trigger-based design synchronizes each DML operation from the original table to the temporary table in a synchronous or an asynchronous manner. In the triggerless design, the process of writing data to the temporary table is decoupled from that of writing data to the original table.

Item	Trigger-based design	Triggerless design
Database overhead	A trigger is a stored procedure. As business develops, more DML operations are required. This increases the overheads that are generated by using triggers, particularly in peak hours.	No trigger-based overhead is generated. A triggerless tool imposes as a replicating server that reads the binary logs of primary and secondary databases and applies the events of the original table to the temporary table. This process is not related to the changes on the original table and does not require stored procedure in the database.
Lock contention	A trigger groups the operations that are performed on the original table and the temporary table in the same transaction. In this case, concurrent operations on the two tables may acquire the lock on the same object. This intensifies lock contention. In addition, trigger-based tools must concurrently copy data and change data. This further intensifies lock contention.	The process of writing data to the temporary table is separated from that of writing data to the original table. This prevents lock contention. In addition, to logically prevent and reduce lock contention, DMS copies and updates data in the temporary table at different time periods. This affects the efficiency in altering the table but significantly reduces the database load.
Exception handling	In a trigger-based design, triggers must keep running and cannot be suspended. Despite peak hours, exceptions, or latency in synchronization between primary and secondary databases, triggers cannot be canceled at any time during a schema change. Forced cancellation interrupts the schema change or causes data loss. This affects data accuracy in Table A_gst.	The decoupling feature allows you to suspend or slow down the threads that are used to acquire binary logs at any time. During peak hours or when the latency in synchronization between the primary and secondary databases is large, throttling can be enabled for the applications that are running. This helps prevent further issues.
Reliability verification	To verify a solution, you may want to know the duration of a task. If a trigger-based solution executes SQL statements to replicate data, you can create and use triggers to simulate data replication on a secondary database. However, when data is replicated based on rows, no trigger is required on the secondary database. This is because the triggers are running only on the primary database, and the secondary database directly applies the changes. In addition, even though SQL statements are executed to replicate data, you cannot simulate concurrent replication processes on the primary database. This is because a MySQL database uses a single thread to replay the replication process. Therefore, concurrency and locking issues cannot be verified or tested.	No difference exists between binary log-based operations and online operations on the primary and secondary databases. In addition, when you simulate the data change process in the secondary database, the original table and temporary table are not switched. This way, you can continue to verify the reliability by checking the data in both the original table and temporary table.
Code complexity	In trigger-based design, most tasks are performed by the database, and triggers are responsible for synchronization. Therefore, the tools play a relatively small role.	The triggerless design uses binary logs to synchronize data. This method is flexible but complex. A triggerless tool must be registered as a replicating server, obtain the events of the original table, convert the events to SQL statements, and then execute the SQL statements to apply the events to the temporary table. The code of the triggerless tool incorporates handling methods for exceptions such as connection failures, replication latency, data type mismatch, inappropriate load on applications, and uncontrollable exceptions. Therefore, the triggerless tool has a larger codebase and more complex logic for concurrency control.
Network traffic	In a trigger-based design, data is processed within a database.	A triggerless design requires that tools capture changes to the source table and apply the recorded changes to the destination table based on the binary logs. This generates traffic between hosts and occupies MySQL processes. In addition, to ensure robust and stable code logic, the code must be rigorously developed, and bountiful tests must be carried out. Despite the preceding challenges, the triggerless design provides substantial benefits. For example, you can specify when to exchange the names of the temporary and original tables. You can manage the process of data replication and implement throttling.

Comparison between the lock-free schema change feature of DMS and the gh-ost tool

The following table describes the comparison between the lock-free schema change feature of DMS and the gh-ost tool.

Item		Lock-free schema change feature in DMS	gh-ost
Historical data copy	Copy throttling	Y: manual or automatic	Y: manual
	Data consistency verification	Y	N
	Fault tolerance	Y	Y: partially supported
	Adaptive copy processing	Y	N
Incremental change replay	Replay throttling	Y: manual or automatic	Y: manual
	Multi-threading replay	Y	N
	Data consistency verification	Y	N
	Fault tolerance	Y	Y: partially supported
	Adaptive replay processing	Y	N
	Secondary database subscription	N	Y
Table switching	Atomicity of switching	Y	Y
	Delayed deletion of table replicas	Y	N
	Settings of window switching	Y	Y
	Lock protection mechanism	Y	N
Functionality	Secondary database changes	N	Y
	Seamless integration with Data Transmission Service (DTS)	Y	N
	Settings of policy changes based on DDL algorithm recognition	Y	N
	RocksDB engine	Y	N
	TokuDB engine	Y	N
	InnoDB engine	Y	Y
	Virtual column changes	Y	N
	JSON column changes	Y	Y
	Multi-valued indexes and function indexes	Y	-
	Tablespace optimization	Y	N
	Delayed deletion of table replicas	Y	N
	Installation and deployment	Y: installation free	Y: installation on the database server
	Visualized O&M	Y	N
	Visualized execution progress	Y	N

Y: indicates that the feature is supported.
N: indicates that the feature is not supported.
-: indicates that whether the feature is supported is unclear.