
Breaking the Limits of Relational Databases: An Analysis of Cloud-Native Database Middleware (2)

This article provides an in-depth insight into cloud-native database technology, focusing on the core functions and implementation principles of transparent sharding middleware.

In Part 1 of this article, we covered the concept and implementation of sharding in NewSQL. In Part 2, we will discuss distributed transactions and database governance in further detail.

Distributed Transactions

As mentioned in the previous article, database transactions must meet the Atomicity, Consistency, Isolation, and Durability (ACID) standard:

  1. Atomicity means that a transaction is executed as a whole: either all of its operations are executed, or none of them are.
  2. Consistency means that transactions must ensure that data changes from one consistent state to another consistent state.
  3. Isolation means that when multiple transactions are executed concurrently, the execution of one transaction must not be affected by the others.
  4. Durability means that once a transaction is committed, its data modifications are stored permanently.
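Atomicity in particular can be observed directly in a local transaction. The following is a minimal sketch using Python's built-in sqlite3 module; the account table and the simulated failure are illustrative, not part of any middleware API:

```python
import sqlite3

# In-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    with conn:  # atomic block: commits on success, rolls back on any error
        conn.execute("UPDATE account SET balance = balance - 100 WHERE id = 'A'")
        # Simulated crash between the debit and the matching credit to B:
        raise RuntimeError("crash mid-transfer")
except RuntimeError:
    pass

# The partial debit was rolled back: A still holds 100, B still holds 0.
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
```

Because the debit and the (never executed) credit share one transaction, the failure leaves the data in its previous consistent state instead of a half-finished transfer.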

On a single data node, a transaction that accesses only a single database resource is called a local transaction. However, in an SOA-based distributed application environment, more and more applications require that the same transaction access multiple database resources and services. Distributed transactions emerged to meet this requirement.

Although relational databases provide perfect native ACID support for local transactions, this support inhibits system performance in distributed scenarios. The top priority for distributed transactions is to either enable databases to meet the ACID standard in distributed scenarios or find an alternative solution.

1. XA Protocol

The earliest distributed transaction model was X/Open Distributed Transaction Processing (DTP), or the XA protocol for short.

In the DTP model, a global transaction manager (TM) is used to interact with multiple resource managers (RMs). The global TM manages the global transaction status and the resources involved in the transactions. The RM is responsible for specific resource operations. The following shows the relationship between the DTP model and the application:

DTP model

The XA protocol uses two-phase commit to ensure the atomicity of distributed transactions. The commit process is divided into the prepare phase and the commit phase.

  1. In the prepare phase, the TM sends a prepare message to each RM to confirm whether its local transaction can be committed successfully.
  2. In the commit phase, if the TM receives a success message from each RM, the TM sends a commit message to each RM; otherwise, the TM sends a rollback message to each RM. Based on the received message, each RM performs the commit or rollback operation on the local transaction.

The following figure shows the transaction process of the XA protocol:

XA transaction process

Two-phase commit is the standard implementation of the XA protocol. It divides the commit process of a distributed transaction into two phases: prepare and commit/rollback.

After an XA global transaction is started, all transaction branches lock the resources based on the local default isolation level and record the undo and redo logs. Then, the TM initiates a prepare vote to query all the transaction branches whether the transaction can be committed. If all the transaction branches reply "yes", the TM initiates commit again. If a reply is "no", TM initiates rollback. If all replies in the prepare phase are "yes" but an exception (such as system downtime) occurs in the commit process, after the node service is restarted, the transaction is committed again based on XA_recover to ensure data consistency.
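The prepare/commit flow above can be sketched in a few lines. The RMs here are plain in-memory objects with hypothetical prepare/commit/rollback methods, not a real XA driver:

```python
class ResourceManager:
    """Toy RM: votes in the prepare phase, then commits or rolls back."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):   # phase 1: vote on whether the branch can commit
        return self.can_commit

    def commit(self):    # phase 2a: make the branch permanent
        self.state = "committed"

    def rollback(self):  # phase 2b: undo the branch
        self.state = "rolled_back"

def two_phase_commit(rms):
    # Phase 1: the TM collects a vote from every RM.
    if all(rm.prepare() for rm in rms):
        for rm in rms:   # Phase 2: unanimous "yes" -> commit everywhere
            rm.commit()
        return "committed"
    for rm in rms:       # any "no" -> roll back everywhere
        rm.rollback()
    return "rolled_back"

print(two_phase_commit([ResourceManager("db1"), ResourceManager("db2")]))
# A single negative vote forces a global rollback:
print(two_phase_commit([ResourceManager("db1"),
                        ResourceManager("db2", can_commit=False)]))
```

Note how the outcome is all-or-nothing: the global decision is made only after every participant has voted, which is exactly what makes 2PC block while resources stay locked.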

The distributed transactions implemented based on the XA protocol are not intrusive for the business. Its greatest advantage is the transparency to users. Users can use XA-based distributed transactions the same way they use local transactions. The XA protocol can rigidly implement the ACID standard of transactions.

However, the rigid implementation of the ACID standard of transactions is a double-edged sword.

During transaction execution, all the required resources must be locked. Therefore, the XA protocol is more suitable for short transactions with a fixed execution time. In a long transaction, data must be used exclusively, resulting in a significant decline in the concurrent performance of the business systems that rely on hotspot data. Because of this, in scenarios where high-concurrency performance is the top concern, XA-based distributed transactions are not the best choice.

2. Soft Transactions

If the transaction that implements the ACID standard is a rigid transaction, then the BASE-based transaction is a soft transaction. BASE refers to the combination of Basic Availability, Soft State, and Eventual Consistency.

  1. The Basic Availability feature ensures that distributed transaction participants do not all have to be online at the same time.
  2. The Soft State feature allows a certain latency in system status updates, which may be imperceptible to users.
  3. The Eventual Consistency feature means that the final consistency of the system is ensured by message reachability.

ACID transactions impose rigid isolation requirements. During transaction execution, all the resources must be locked. The idea of the soft transaction is to shift the mutex operation from the resource level to the business level through the business logic. The soft transaction can improve the system throughput by lowering the strong consistency requirements.

Given that a timeout retry may occur in a distributed system, operations in a soft transaction must be idempotent to prevent the problems caused by multiple requests. Models for implementing soft transactions include Best Efforts Delivery (BED), Saga, and Try-Confirm-Cancel (TCC).


Best Efforts Delivery (BED)

BED is the simplest type of soft transaction. It is applicable to scenarios where database operations will eventually succeed. NewSQL automatically records failed SQL statements and reruns them until they are successfully executed. The rollback function is unavailable to BED soft transactions.

The implementation of BED soft transactions is very simple but imposes rigid scenario requirements. On one hand, BED locks no resources and causes minimal performance loss. On the other hand, its disadvantage is that the transaction cannot be rolled back when multiple commit attempts fail, so BED is solely applicable to business scenarios where transactions will eventually succeed. BED improves performance at the cost of the transaction rollback function.
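A BED executor reduces to a bounded retry loop. Because a statement may be replayed after a timeout, the operation must be idempotent; the flaky statement and retry count below are illustrative:

```python
import time

def best_efforts_delivery(execute_sql, max_retries=5, backoff_seconds=0.0):
    """Re-run a failed statement until it succeeds; no rollback is possible."""
    for attempt in range(1, max_retries + 1):
        try:
            return execute_sql()
        except Exception:
            if attempt == max_retries:
                raise  # give up: manual intervention is required
            time.sleep(backoff_seconds)

# Simulated idempotent statement that succeeds on the third attempt.
attempts = {"n": 0}
def flaky_update():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(best_efforts_delivery(flaky_update))
```

If the statement keeps failing past the retry budget, the exception propagates and there is nothing left to roll back, which is precisely the trade-off described above.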


Saga

The saga model was derived from a paper published in 1987 by Hector Garcia-Molina and Kenneth Salem.

Saga transactions are better suited to scenarios with long-running transactions. The saga model splits a distributed transaction into multiple local transactions, each of which has its own transaction module and compensation module. When any of these local transactions fails, the corresponding compensation methods are called to restore the original state, ensuring the final consistency of the transactions.

Assume that all saga transaction branches (T1, T2, ..., Tn) have their corresponding compensations (C1, C2, ..., Cn-1). Then, the saga system can ensure the following:

  1. The transaction branch sequence (T1, T2, ..., Tn) is completed. This is the optimal situation for a transaction, where no rollback is required.
  2. The sequence (T1, T2, ..., Tx, Cx, ..., C2, C1), where "x" is less than "n", is completed. When a rollback occurs, this ensures that the compensation operations are performed in the reverse order of the forward operations.

The saga model supports both forward recovery and backward recovery. Forward recovery retries the transaction branch that currently fails; it can be used only on the precondition that every transaction branch will eventually succeed. In contrast, backward recovery compensates all the completed transaction branches when any transaction branch fails.

Obviously, transactions that rely on forward recovery need no compensation. If the transaction branches in the business will eventually succeed, forward recovery lowers the complexity of the saga model. Forward recovery is also a good choice when compensation transactions are difficult to implement.

In theory, compensation transactions never fail. However, in a distributed world, servers may crash, networks may fail, and IDCs may experience power failures. Therefore, it is necessary to provide a rollback mechanism upon fault recovery, such as manual intervention.

The saga model removes the prepare phase that exists in the XA protocol, so transactions are not isolated. As a result, when two saga transactions operate on the same resource simultaneously, problems such as lost updates and dirty reads may occur. In this case, an application that uses saga transactions must add resource-locking logic at the application level.
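The backward-recovery behavior can be sketched as a loop over (action, compensation) pairs; the transaction branches below are stand-in functions, not a real saga framework:

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, compensate in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Backward recovery: undo finished branches in reverse order.
            for comp in reversed(completed):
                comp()
            return "compensated"
    return "completed"

log = []
def t3_fails():
    raise RuntimeError("T3 failed")

steps = [
    (lambda: log.append("T1"), lambda: log.append("C1")),
    (lambda: log.append("T2"), lambda: log.append("C2")),
    (t3_fails,                 lambda: log.append("C3")),
]
result = run_saga(steps)
print(result, log)  # T1 and T2 ran, then C2 and C1 compensated them
```

This reproduces the (T1, T2, ..., Tx, Cx, ..., C2, C1) sequence from the guarantee above: the failed branch T3 is never compensated, only the branches that actually completed.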


Try-Confirm-Cancel (TCC)

TCC implements distributed transactions by breaking down the business logic. As the name implies, the TCC transaction model requires the business system to provide the following three sections of business logic:

  1. Try. It completes the business check and reserves the resources required for the business. The try operation is the key component of the entire TCC. It allows you to flexibly choose the granularity of the business resource lock.
  2. Confirm. It executes the business logic and directly uses the business resources reserved in the try phase without performing any further business checks.
  3. Cancel. It releases the business resources reserved in the Try phase.

The TCC model only provides a two-phase atomic commit protocol to ensure the atomicity of distributed transactions. The isolation of transactions is implemented by the business logic. In the TCC model, isolation is to shift locks from the database resource level to the business level by transforming the business. This releases underlying database resources, lowers the requirements of the distributed transaction lock protocol, and improves system concurrency.

Although the TCC model is the most ideal for implementing soft transactions, the application must provide three interfaces that can be called by the TM to implement the Try, Confirm, and Cancel operations. Therefore, the business transformation is relatively costly.

Assume that account A transfers 100 dollars to account B. The following figure shows the transformation of the business to support TCC:

TCC process

The Try, Confirm, and Cancel interfaces must be implemented separately for the remittance and collection services. Meanwhile, they must be injected into the TCC TM in the service initialization phase.

Remittance Service


Try

  1. Check the validity of account A, namely, check whether the status of account A is "Transferring" or "Frozen".
  2. Check whether the balance of account A is sufficient.
  3. Deduct 100 dollars from account A and update the status to "Transferring".
  4. Reserve the deduction resource and store the event of transferring 100 dollars from account A to account B into a message or log.


Confirm

  1. Perform no operations.


Cancel

  1. Add 100 dollars to account A.
  2. Release the deduction resource from the message or log.

Collection Service


Try

  1. Check whether account B is valid.


Confirm

  1. Read the message or log, and add 100 dollars to account B.
  2. Release the deduction resource from the message or log.


Cancel

  1. Perform no operations.

The preceding description indicates that the TCC model is intrusive for the business and the transformation difficulty is high.
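The remittance side of this transfer can be sketched as a Try/Confirm/Cancel triple. The in-memory account store, status values, and reservation log below are illustrative assumptions, not part of any TCC framework:

```python
accounts = {"A": {"balance": 100, "status": "normal"}}
reserved = []  # message/log holding the reserved deduction events

def remit_try(amount):
    a = accounts["A"]
    # Business check: reject if mid-transfer, frozen, or underfunded.
    if a["status"] in ("transferring", "frozen") or a["balance"] < amount:
        return False
    a["balance"] -= amount          # deduct and reserve the resource
    a["status"] = "transferring"
    reserved.append(("A->B", amount))
    return True

def remit_confirm():
    pass  # nothing to do: the deduction already happened in the try phase

def remit_cancel(amount):
    accounts["A"]["balance"] += amount   # give the reserved money back
    accounts["A"]["status"] = "normal"
    reserved.pop()

if remit_try(100):
    remit_confirm()
print(accounts["A"]["balance"], reserved)
```

Even this small sketch shows why TCC is intrusive: every business operation must be rewritten as three coordinated interfaces, and the locking granularity (here, the whole account via its status flag) is a business decision.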

Message-Driven Transaction Model

The message consistency scheme ensures the consistency of upstream and downstream application data operations through message middleware. The basic idea is to place the local operation and the message send in one local transaction. The downstream application then subscribes to the message from the messaging system and performs the corresponding operation after receiving it. This essentially relies on the message retry mechanism to achieve final consistency. The following figure shows the message-driven transaction model:

Message-driven transaction model

The disadvantage of this model is its high degree of coupling: message middleware must be introduced into the business system, which increases the complexity of the system.
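The "local operation plus message in one local transaction" idea is commonly realized with an outbox table. A minimal sqlite3 sketch, where the table names and relay function are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, sent INTEGER DEFAULT 0);
""")

# Upstream: the business write and the message write share one
# local transaction, so they succeed or fail together.
with conn:
    conn.execute("INSERT INTO orders (id, item) VALUES (1, 'book')")
    conn.execute("INSERT INTO outbox (payload) VALUES ('order-1-created')")

def deliver(send):
    # A relay later pushes unsent messages downstream, retrying until
    # acknowledged -- this retry loop is what yields eventual consistency.
    rows = conn.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for msg_id, payload in rows:
        send(payload)  # downstream consumption must be idempotent
        conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
    conn.commit()

delivered = []
deliver(delivered.append)
print(delivered)
```

Because the message row commits atomically with the order row, the downstream can never observe a message for an order that was rolled back, and a crashed relay simply re-reads the unsent rows on restart.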

In general, neither ACID-based strong-consistency transactions nor BASE-based final-consistency transactions are "silver bullets". Therefore, for either to be optimally useful, they must be used in their most appropriate scenarios. The following table provides a comparison of these models to help developers choose the most suitable model. Due to a high degree of coupling between message-driven transactions and business systems, this model is excluded from the table.

Comparison of transaction models

The pursuit of strong consistency does not necessarily lead to the most rational solution. For distributed systems, we recommend the "soft outside, hard inside" design scheme. "Soft outside" means using soft transactions across data shards to ensure the final consistency of data in exchange for optimal performance. "Hard inside" means using local transactions within the same data shard to achieve the ACID standard.

Database Governance

1. Basic Governance

As described in the previous article, service governance is also applicable to the basic governance of the database. Basic governance includes the configuration center, registry, rate limiter, circuit breaker, failover, and tracker.

  1. The configuration center is used for centralized configuration, dynamic configuration updates, and notification delivery.
  2. The registry is used for service discovery, where the service refers to the database middle-layer instance itself. Through the database middle-layer instance, status monitoring and automatic notification delivery can be implemented, providing the database middleware with high availability and a self-healing capability.
  3. The rate limiter is used for traffic overload protection, which is divided into traffic overload protection for the database middleware itself and that for the database.
  4. The circuit breaker is another protection measure against traffic overload. It blocks all of a client's access attempts to the database so that the database can continue to provide services for other systems with normal traffic. The circuit breaker mode mentioned in the previous article can be used to implement automatic circuit breaking.
  5. Failover applies when multiple copies of fully consistent data exist across data nodes. When a node becomes unavailable, the failover mechanism instructs the database middleware to access another active data node.
  6. The tracker is used to visualize the indicators related to database access, such as the called links, performance, and topological relationships.
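The circuit breaker in item 4 can be sketched as a small wrapper around database calls; the threshold and reset timing below are illustrative defaults:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `reset_after`."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, func):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open state: fail fast without touching the database.
                raise RuntimeError("circuit open: database access blocked")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

While the breaker is open, clients fail immediately instead of piling load onto an already overloaded database, which is the protection the list item describes.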

2. Auto Scaling

The major difference between database governance and service governance is that the database is stateful and each data node has its own persistent data. This makes it difficult to achieve auto scaling as a service does.

When the access traffic and data volume of the system exceed the previously evaluated expectations, in most cases, the database needs to be resharded. With policies such as date-based sharding, you can expand capacity directly without migrating legacy data. In most scenarios, however, the legacy data in a database cannot be directly mapped to a new sharding policy, so modifying the sharding policy requires migrating the data.

For traditional systems, the most feasible scheme is to stop services, migrate the data, and then restart the services. However, adopting this scheme imposes extremely high data migration costs on the business side, and engineers of the business side must accurately estimate the amount of data.

In Internet scenarios, the system availability requirement is quite demanding, and the possibility of explosive traffic growth is higher than in traditional industries. In the cloud-native service architecture, auto scaling is a common requirement and can be easily implemented. This makes auto scaling of data, equivalent to that of services, an important capability for cloud-native databases.

In addition to system pre-sharding, another implementation of auto scaling is online data migration. Online data migration is often described as "changing the engines of an airplane in mid-flight". Its biggest challenge is ensuring that the migration process does not affect running services. Online data migration is performed after the sharding policy of the database is modified, for example, changing from splitting the database into 4 smaller databases by ID mod 4 to splitting it into 16 smaller databases by ID mod 16. Meanwhile, a series of system operations ensures that the data is correctly migrated to the new data nodes and that the services depending on the database are completely unaware of the migration process.

Online data migration can be implemented in four steps:

  1. Simultaneous writing on two nodes. During this step, data is written concurrently to the original data node, which follows the old sharding policy, and to the new data node, which follows the new sharding policy. Consistency algorithms such as Paxos or Raft can be used to ensure that the data written to both nodes is consistent.
  2. Historical data migration. During this step, historical data is migrated offline from the original data node to the new data node. This can be done based on SQL queries or binlogs.
  3. Data source switching. During this step, read and write requests are switched to the new data source, and no more data is written to the original data node.
  4. Cleanup of redundant data. During this step, the data that has been migrated to the new data node is cleared from the original data node.
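The dual-write and switch steps can be sketched with the mod 4 to mod 16 example from above; the in-memory shard dictionaries and the `switched` flag are illustrative stand-ins for real data nodes and routing configuration:

```python
OLD_SHARDS = {i: {} for i in range(4)}    # layout before resharding: id mod 4
NEW_SHARDS = {i: {} for i in range(16)}   # layout after resharding: id mod 16

def write(record_id, value):
    # Step 1: dual write -- every new write lands on both layouts, so the
    # new shards stay current while historical data is migrated offline.
    OLD_SHARDS[record_id % 4][record_id] = value
    NEW_SHARDS[record_id % 16][record_id] = value

def read(record_id, switched=False):
    # Step 3 flips `switched` to route reads to the new data source;
    # after that, writes to the old layout stop and step 4 cleans it up.
    shards = NEW_SHARDS if switched else OLD_SHARDS
    mod = 16 if switched else 4
    return shards[record_id % mod].get(record_id)

write(21, "row-21")  # routed to old shard 21 % 4 = 1 and new shard 21 % 16 = 5
print(read(21), read(21, switched=True))
```

Because both layouts answer identically for any record written during the dual-write window, the switch in step 3 is invisible to services, which is the whole point of online migration.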

Online data migration can be used not only to expand data capacity but also to support online DDL operations. Database-native DDL operations do not support transactions, and tables are locked for a long time when DDL operations are performed on large tables. Therefore, online data migration can be used to implement online DDL. An online DDL operation follows the same process as data migration: simply create an empty table with the modified DDL and then complete the preceding four steps.
