This topic describes the definition and implementation of strongly consistent distributed transactions.
Distributed ACID transactions
Distributed transactions in PolarDB-X support MVCC. TSO is used to generate monotonically increasing timestamps for distributed version numbers at the global level. Versions are generated based on global timestamps to provide consistent reads within transactions.
Global Meta Service (GMS) provided by PolarDB-X is a Paxos-based service that uses a three-node architecture to provide high availability. The compute node of PolarDB-X uses a remote procedure call (RPC) interface to communicate with GMS and obtain global timestamps.
2PC for transactions
BEGIN; UPDATE account SET balance = balance - 20 WHERE name = 'Alice'; UPDATE account SET balance = balance + 20 WHERE name = 'Bob'; COMMIT;
If a transaction writes data into multiple partitions, the compute node of PolarDB-X uses 2PC to commit the transaction. 2PC provides transaction recovery mechanism to ensure the atomicity of the transaction in case of problems such as a node downtime during the transaction commit process.
SELECT SUM(balance) FROM account;
The data that you want to query is stored in multiple partitions. PolarDB-X obtains a global timestamp as the version of data to be read. During the read process, PolarDB-X checks whether each version of every row of data is visible. This can ensure that PolarDB-X reads only the data that is written by transactions that are committed before the global timestamp.
PolarDB-X uses multiple methods to ensure consistent reads. For example, PolarDB-X commits a money transfer transaction at different points in time on different data nodes. For committed transaction branches, PolarDB-X marks the data that fails the visibility check as invisible. For transaction branches that are being committed, PolarDB-X marks all data as invisible.
Support of downstream features for distributed transactions
Consistency in read/writing splitting mode
In most cases, distributed transactional databases use read/write splitting to improve read performance. If read/write splitting is enabled for a database, distributed transactions are required to ensure that data is consistent among data nodes within primary and secondary databases.
A PolarDB-X instance consists of a primary instance and several read-only instances. The primary instance runs in the leader/follower model of Paxos and consists of a leader and several followers. A read-only instance consists of one or more Paxos learners. PolarDB-X stores transaction information in the primary replica of the leader data node. PolarDB-X also synchronizes the multi-version information to the replicas of the learners. This ensures consistent reads from multiple partitions on read-only instances. For more information, see HTAP.
For example, a row of data is written to the primary database, and the version number of the data is 100. Next time the data is requested, PolarDB-X returns the data together with the global timestamp. The version number of the returned data is 101. When the query request is routed to a learner on a read-only instance, PolarDB-X can block the read request to ensure read consistency even if a replication delay exists on the learner.
Global data change logging
Distributed transactional databases can process highly concurrent write requests from online applications. In most cases, distributed transactional databases are also used to synchronize online data to downstream applications for disaster recovery, service aggregation, or data warehousing. The downstream applications have high requirements for consistent transaction logs. For example, transactions must be run in sequence, the atomicity of transactions must be ensured, and DDL operations can be synchronized.
A MySQL binary log is a binary log file that is provided by MySQL to record changes in data. A MySQL binary log can be regarded as a message queue. The queue stores detailed incremental change information in MySQL in chronological order. Downstream systems or tools consume the change entries in the queue to synchronize data from MySQL in real time. This mechanism is also referred to as change data capture (CDC).
Data nodes in PolarDB-X can store information about distributed transactions in change logs. PolarDB-X uses the CDC component to manage log streams from the data nodes in the distributed system. PolarDB-X collects, reorganizes, and sorts the log streams and flushes the log streams into disks. This way, PolarDB-X databases can provide binary logs that comply with the consistency semantics of distributed transactions and are fully compatible with the binary log protocol and ecosystems of MySQL. For more information, see Global data change logging.
Consistent backup and restoration
Most of the traditional relational databases use full backups to ensure data security. In case of risks such as data anomalies and intentional database deletions, backup sets are used to restore the data in databases. For data restoration, transactional consistency must be ensured. In most cases, distributed databases contain large volumes of data. This poses great challenges for consistent backup and restoration.
PolarDB-X stores information about global timestamps of distributed transactions in the data and change logs of data nodes. When PolarDB-X restores data to a point in time, PolarDB-X can convert a timestamp into a distributed global timestamp and restore data based on the visibility of each data version. PolarDB-X also uses a distributed architecture that consists of multiple nodes to provide parallel execution. This makes backup and restoration more efficient. For more information, see Data backup and restoration.