By Wuzhe
This article begins with the global secondary index.
When we build a global secondary index, a single logical data insertion becomes two physical insertions: one into the primary table and the other into the index table. To ensure data consistency between the primary table and the index table, we often need to enable distributed transactions and then insert the two pieces of data in parallel. But what if one data insertion fails? For example, a unique key conflict error occurs on the index, while the data has already been inserted into the primary table. Of course, we can simply roll back the entire transaction to ensure data consistency.
However, sometimes we've already executed a large number of operations within a transaction. Rolling back the entire transaction just because one data insertion failed can be too costly. For standalone MySQL, if a unique key conflict occurs during a UK insertion, the statement that caused the conflict will be automatically rolled back. Then, the business side decides whether to ignore the error and continue executing the transaction or to roll back the entire transaction. As a fully MySQL-compatible distributed database, PolarDB-X should also have this feature.
In fact, this applies not only to global secondary indexes but also to other scenarios, such as batch insert/delete/update and DML statements for broadcast tables.
When we need to roll back only a single or multiple statements instead of the entire transaction, naturally, we think of using the savepoint feature. In a transaction, we can set a savepoint at any time and roll back to this savepoint so that all operations after the savepoint can be rolled back. How does MySQL implement the savepoint feature?
In the server layer, MySQL maintains a linked list of savepoints for each transaction object, recording the savepoint objects that have been set for this transaction. Each savepoint object records the savepoint name to identify different savepoint objects. When a savepoint is set, a new savepoint object is inserted at the end of the linked list. When a savepoint is released, the system traverses the linked list based on the savepoint name to find the corresponding savepoint object and then deletes it and all subsequent savepoints. When a savepoint is rolled back, the system finds the corresponding savepoint object and performs a rollback based on the stored information. Subsequently, all subsequent savepoints (excluding itself) are implicitly released.
As we can see, each savepoint object needs to store certain information to inform binlog and InnoDB of the rollback position. For binlog records, the offset of the binlog cache at the time the savepoint was set is stored. For InnoDB, the undo number of the undo log at the time the savepoint was set is stored. These two simple pieces of information are enough for binlog and InnoDB to complete the rollback operation.
In fact, InnoDB also maintains a linked list of savepoints for transactions, but essentially, it is not much different from the linked list mentioned above, so we will not discuss it in detail.
How can PolarDB-X use the savepoint of DN to solve the problem of the global secondary index mentioned at the beginning?
In fact, the method is also very simple. We only need to add a savepoint before any physical statement is executed, and roll back or release the savepoint as appropriate after all physical statements are executed. This behavior is referred to as auto-savepoint. InnoDB uses a similar method. Before each statement is executed (actually after the previous statement is executed), InnoDB will update an anonymous savepoint object last_sql_stat_start which saves the undo number after the previous statement is executed. When an error occurs in the execution of the current statement, the undo number is used to roll back the operation of this statement. Those who are familiar with PolarDB-X know that it executes physical SQL through physical connections (connections between compute nodes and data nodes). A logical SQL statement to update GSI may need to use 2 physical connections and execute 3 physical SQL statements (one to update the primary table, one to delete the GSI table, and one to insert the GSI table).
Sample code:
Physical connection 0 (physical database shards 0):
update primary_tb; insert gsi_tb;
Physical connection 1 (physical database shards 1):
delete gsi_tb;
The key to setting an auto-savepoint is to set a savepoint at the right time. In this example, if any physical connection encounters an error, other connections will be notified to interrupt the ongoing operations. Suppose an error occurs while the delete gsi_tb is executed on physical connection 1. We do not have specific details about the execution status on physical connection 0. We don't know which statements have been successfully executed, which statements have failed, and which statements have not yet been executed. In this case, we can use the savepoint capability to roll back a single logical SQL statement to a state where all operations have not yet been performed, regardless of the specific execution status.
Therefore, the behavior of the savepoint we automatically set is as follows:
Physical connection 0 (physical database shards 0):
savepoint `s0`; update primary_tb; insert gsi_tb; rollback to savepoint `s0`;
Physical connection 1 (physical database shards 1):
savepoint `s0`; delete gsi_tb (ERROR); rollback to savepoint `s0`;
Of course, the design here ensures that all physical connections involved in a logical SQL statement set savepoints correctly. This ensures that the setting and rollback of savepoints are handled properly to prevent data inconsistency issues.
We use the savepoint capability of DN to roll back a single statement at the CN level. Although the cost of setting and releasing savepoints is relatively low, it is still important to minimize this cost in implementation.
First, we should avoid setting savepoints as much as possible and only set them automatically when GSI or other logic execution involves DML. Since inconsistencies between shards only occur during logical execution, auto-savepoint is necessary to ensure the atomicity of logical statements.
Furthermore, we set and release savepoints using multi-statements to send the savepoint SQL along with the physical SQL generated by the business, thereby avoiding additional round-trip times (RTT).
Finally, we also use the private protocol to bypass the parsing process of savepoint SQL and directly call the code to set and release savepoints on DN.
Try out database products for free:
How Does PolarDB Overcome the Classic Challenges of Multi-Master Architecture?
Apache Flink Community China - September 27, 2019
Apache Flink Community China - January 9, 2020
Apache Flink Community China - August 8, 2022
Apache Flink Community China - September 27, 2019
Alibaba Clouder - November 12, 2018
Apache Flink Community China - January 9, 2020
Alibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreTSDB is a stable, reliable, and cost-effective online high-performance time series database service.
Learn MoreMore Posts by ApsaraDB