Learning about Distributed Systems - Part 8: Improve Availability with Replications

Part 8 of this series discusses one of the core problems of distributed systems: availability.

By Qinxia

Disclaimer: This is a translated work of Qinxia's 漫谈分布式系统. All rights reserved to the original author.

The Second Core Issue

In the previous articles, we focused on one of the core issues of distributed systems: scalability. Only once this problem is solved can a system be truly distributed and keep growing.

Scalability allows data to be stored and computing to run. It is the core problem to be solved by distributed systems. After solving this problem, we should consider how to make the service run stably, so we can continue to benefit from the distributed capability.

This is often called availability. Only a system that stays highly available and meets a demanding SLA can be trusted.

The next few articles explore this second core issue of distributed systems: availability.

The Only Choice for High Availability

There is only one way for a system in need of high availability: replication.

The reason is simple: physical failure is inevitable.

At the software level, no matter how advanced your design is and how complete your implementation is, you cannot withstand the sudden downtime of the server, the sudden power failure of the computer room, and the sudden disconnection of the network cable due to construction.

Moreover, the duration of a physical failure is unpredictable. Network jitter may clear up in milliseconds. A crashed server may be back after a restart that takes a few minutes, or it may have to be returned to the factory for repair. A damaged hard disk may never be repaired.

The only option is to keep an extra replica ready at all times. When a failure occurs, the system switches to the backup immediately so that the service is not interrupted.

Besides replicating data, it is often necessary to replicate services as well.

In other words, we trade money for availability.

The replication strategy is not unique to distributed systems. It has long been a mature practice in traditional fields. For example, RAID keeps redundant copies of disk data, and running multiple instances of a microservice provides backups of the service.

The scope of a physical failure can be estimated. Server downtime only affects that machine, and a power failure in one computer room does not affect other computer rooms. Knowing the scope helps us choose a suitable replication strategy for each kind of failure.

Master-Slave in Replication

The simplest form of replication is to keep a new replica on standby. Normally, the original replica provides the service and the standby replica stays out of sight. Only when the original replica fails does a failover take place, after which the standby replica provides the service.

Naturally, there is a master and a slave. (Different systems use different names: leader and follower, master and slave, active and standby, and so on.) A failover swaps the master and slave roles.
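
The switch described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Replica` and `FailoverPair` classes are hypothetical, and failure detection is reduced to catching an exception, whereas real systems rely on heartbeats and a coordination service.

```python
class Replica:
    """A hypothetical service replica that can be up or down."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverPair:
    """Routes every request to the leader; promotes the standby on failure."""

    def __init__(self, leader, standby):
        self.leader = leader
        self.standby = standby

    def handle(self, request):
        try:
            return self.leader.handle(request)
        except ConnectionError:
            # Failover: the two roles swap and the request is retried once.
            self.leader, self.standby = self.standby, self.leader
            return self.leader.handle(request)


pair = FailoverPair(Replica("node-a"), Replica("node-b"))
first = pair.handle("req-1")   # served by node-a
pair.leader.alive = False      # simulate a crash of the current leader
second = pair.handle("req-2")  # triggers the failover, served by node-b
```

Note that the sketch retries only once: if the standby is also down, the error reaches the client, which is why the number of replicas matters.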

The master-slave relationship in replication can be arranged in several ways.

Single Leader Replication

The architecture mentioned above is straightforward. There will not be too many hidden dangers.

Many distributed systems use this approach, including HDFS (NameNode) and YARN (Resource Manager) mentioned in our previous articles.

However, many systems do not adopt this method. For example, HDFS DataNodes keep multiple replicas of the data, but none of them is a master or a slave. All replicas are equal.

If the single-leader approach is so good, why do some systems not use it?

  • On the one hand, it takes time for a failover to take effect, even if it is only a few seconds, during which client requests are blocked or dropped.
  • On the other hand, since all the extra money is spent on availability, why not put the replicas to work, such as serving read operations to relieve the performance pressure on the system?

Multi-Leader Replication

Therefore, with the multi-leader replication strategy, every replica can provide external services.

This method has long been practiced in the database field, where it is known as master-master mode.

If the external services are subdivided into read and write, the roles in the multi-leader replication can be subdivided into read-only and read-write.

  • Fully peer-to-peer multi-leader: for example, MySQL combined with Tungsten.
  • Read-write leader plus read-only leaders: for example, the Observer NameNode in HDFS is a read-only leader.

However, multiple masters, especially peer-to-peer read-write masters, can easily lead to conflict and confusion, or a more professional term we will mention later: consistency issues.

Imagine that in the case of concurrency, the same row of data in the database is changed to two different values by requests received by two different leaders at the same time. Which value should it accept at this time?

There is no uniform answer to this question, since each request was applied successfully by the leader that received it. (You may suggest using timestamps, so that the later write overwrites the earlier one. As we will see, things are not so simple.)
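
To see one reason why timestamps are not a complete answer, consider the sketch below. It assumes a hypothetical last-write-wins rule (`last_write_wins`) and an assumed clock skew of five seconds between two leaders; the names and numbers are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    leader: str
    local_timestamp: float  # taken from the accepting leader's own clock


def last_write_wins(a, b):
    """Resolve a conflict by keeping the write with the larger timestamp."""
    return a if a.local_timestamp >= b.local_timestamp else b


# Assumed scenario: leader B's clock runs 5 seconds behind leader A's.
CLOCK_SKEW_B = -5.0

# In real time, A accepts its write at t=100.0 and B accepts one later, at t=102.0.
write_a = Write("blue", "A", 100.0)
write_b = Write("green", "B", 102.0 + CLOCK_SKEW_B)  # B records 97.0

winner = last_write_wins(write_a, write_b)
# winner.value is "blue", although "green" was written later in real time.
```

Because each leader stamps writes with its own clock, the rule silently discards the write that actually happened last.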

Even with this potentially serious problem, the multi-master replication architecture is still valuable.

A typical example is a database deployed across multiple IDCs (Internet data centers).

For disaster recovery, capacity, or response latency, the same data is often kept in data centers that are physically far away from each other. Each data center has its own master-slave structure and is relatively independent: data replication within a data center uses the single-leader mode, while data replication between data centers uses the multi-leader mode.

Leaderless Replication

The two replication methods mentioned earlier (single-leader and multi-leader) may involve failover, during which the service may time out or be interrupted for a short time. They may also run into data consistency issues.

In essence, as long as there is a leader and each request is sent to only one replica, at least one of these two problems will occur.

Therefore, there is leaderless replication. As the name implies, there is no leader. In this mode, requests are no longer sent to a leader but to many nodes.

For example, with three replicas, a write request can be sent to all three nodes, or to two of them, at the same time.

This is equivalent to having the client actively replicate the data. In the first two modes, data replication is performed in the background on the server side.

When reading data, we have to allow for writes that failed on some nodes. Since there is no leader, it is not certain which node holds the latest data. So, as with writes, a read has to be sent to multiple nodes at the same time.

For example, if the number of replicas is n, a write must succeed on w replicas, and a read queries r replicas, then the read is guaranteed to see the latest data only when w + r > n.
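
The condition w + r > n works because any set of w nodes and any set of r nodes must then share at least one node. The sketch below checks this by brute force and performs a toy quorum read. It assumes each replica stores a version number next to its value so that the reader can tell which copy is newest; the function names are hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """Check by brute force that every write set of size w shares at least
    one node with every read set of size r among n replicas."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def quorum_read(replicas, read_set):
    """Return the value with the highest version among the contacted replicas."""
    return max((replicas[i] for i in read_set), key=lambda item: item[0])[1]


# Three replicas holding (version, value); the latest write reached nodes 0 and 1 only.
replicas = [(2, "new"), (2, "new"), (1, "old")]

latest = quorum_read(replicas, read_set=(1, 2))  # r = 2, overlaps the write set
stale = quorum_read(replicas, read_set=(2,))     # r = 1, can miss the write
```

With n = 3 and w = 2, reading from two nodes always reaches at least one node that has the latest write, while reading from a single node may return the old value.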

Amazon's Dynamo and Cassandra, an open-source system that borrows Dynamo's replication design, use this method of data replication.

Timeliness of Replication

Multiple replicas provide high availability, and the data has to be copied between them. Therefore, the speed of data replication (the timeliness of replication) directly affects the level of availability.

Synchronous Replication

If an unrecoverable failure occurs before the data has been copied to another replica, the data is lost for good.

Therefore, the simplest way is to make sure the data has been copied to all replicas before returning OK to the client. This is called synchronous replication.

The drawbacks of this approach are also clear. The performance is bad.

If the network jitters at any point, or a follower processes data slightly slowly, the overall performance is compromised.

If a network switch fails or a follower machine goes down, all requests are blocked outright.
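
A minimal sketch of this behavior, using hypothetical `Follower` and `SyncLeader` classes. Blocking is modeled as a `TimeoutError` so that the example finishes instead of hanging.

```python
class Follower:
    def __init__(self):
        self.log = []
        self.reachable = True

    def replicate(self, record):
        if not self.reachable:
            raise TimeoutError("follower did not acknowledge")
        self.log.append(record)


class SyncLeader:
    def __init__(self, followers):
        self.log = []
        self.followers = followers

    def write(self, record):
        """Acknowledge only after every follower has stored the record."""
        self.log.append(record)
        for follower in self.followers:
            follower.replicate(record)  # raises if the follower is down
        return "OK"


f1, f2 = Follower(), Follower()
leader = SyncLeader([f1, f2])
result = leader.write("x")  # both followers store "x" before the reply

f2.reachable = False        # one follower goes down
try:
    leader.write("y")
    blocked = False
except TimeoutError:
    blocked = True          # the client never receives OK for "y"
```

A single unreachable follower is enough to stop the leader from acknowledging any write, which is exactly the weakness described above.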

Asynchronous Replication

However high the availability is, a system without acceptable performance is impractical.

It is natural to have the idea of asynchronous replication to solve the performance problem.

After the leader receives the data, it immediately returns OK to the client and continues to process other client requests. Data replication is left to another thread to handle asynchronously.

As such, performance is naturally maximized.

However, the shortcoming is just as clear. If the leader goes down before it has copied the data to a follower, that data may be lost.
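
The window in which data can be lost is easy to reproduce in a sketch. The `AsyncLeader` class below is hypothetical, and the background thread is replaced by an explicit `replicate_pending()` call so that the timing is deterministic.

```python
from collections import deque


class AsyncLeader:
    def __init__(self):
        self.log = []
        self.follower_log = []
        self.pending = deque()  # records not yet shipped to the follower

    def write(self, record):
        """Acknowledge immediately; replication happens later."""
        self.log.append(record)
        self.pending.append(record)
        return "OK"

    def replicate_pending(self):
        """Stand-in for the background replication thread."""
        while self.pending:
            self.follower_log.append(self.pending.popleft())


leader = AsyncLeader()
leader.write("a")
leader.replicate_pending()  # "a" reaches the follower
ack = leader.write("b")     # the client already sees OK for "b"

# If the leader crashes here, before the next replication round,
# the follower is promoted with only the records it has received.
lost = [r for r in leader.log if r not in leader.follower_log]
```

The client was told that "b" was written, yet "b" exists only on the failed leader.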

Semi-Synchronous Replication

It seems we cannot achieve both the performance and availability metrics in the data replication scenario.

Since it is unacceptable to lose either, we can only compromise.

So, there is the semi-synchronous way.

For example, with three replicas, the leader persists the client's data locally as soon as it receives it, copies it synchronously to one of the followers, and then returns OK to the client. The remaining follower receives the data asynchronously.
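
The three-replica example can be sketched as follows. The `SemiSyncLeader` class is hypothetical, and the asynchronous part is again modeled as an explicit call rather than a real background thread.

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []


class SemiSyncLeader:
    def __init__(self, sync_follower, async_follower):
        self.log = []
        self.sync_follower = sync_follower
        self.async_follower = async_follower
        self.backlog = []  # records still owed to the asynchronous follower

    def write(self, record):
        self.log.append(record)                # 1. persist locally
        self.sync_follower.log.append(record)  # 2. wait for one follower
        self.backlog.append(record)            # 3. defer the other follower
        return "OK"

    def drain_backlog(self):
        """Stand-in for background replication to the remaining follower."""
        self.async_follower.log.extend(self.backlog)
        self.backlog.clear()


f_sync, f_async = Follower("f1"), Follower("f2")
leader = SemiSyncLeader(f_sync, f_async)
ack = leader.write("a")
copies_at_ack = sum("a" in node.log for node in (leader, f_sync, f_async))
leader.drain_backlog()
copies_after_drain = sum("a" in node.log for node in (leader, f_sync, f_async))
```

When the client receives OK, two copies already exist, so losing the leader alone no longer loses the data, and the client waits for only one follower instead of two.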

Although this compromise is not perfect, it is probably the most practical solution for most scenarios.

For a distributed system, it is better to leave this choice to the user. Kafka is a good example: it lets users choose how data is synchronized according to their usage scenario.
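
As a rough illustration, the three modes can be approximated with two Kafka settings: the producer setting `acks` and the topic or broker setting `min.insync.replicas`. The setting names are Kafka's own, but the mapping below is our own approximation for a topic assumed to have a replication factor of 3, not an official equivalence.

```python
# Assumption: a topic with three replicas. The mapping from the three
# replication modes to Kafka settings is an approximation for illustration.
REPLICATION_FACTOR = 3

KAFKA_SETTINGS = {
    # The leader acknowledges once it has written locally; followers catch up later.
    "asynchronous": {"acks": "1"},
    # The leader waits for the in-sync replicas, and writes are rejected
    # unless at least two replicas (leader included) are in sync.
    "semi-synchronous": {"acks": "all", "min.insync.replicas": 2},
    # Writes are rejected unless every replica is in sync.
    "synchronous": {"acks": "all", "min.insync.replicas": REPLICATION_FACTOR},
}


def settings_for(mode):
    """Look up the illustrative settings for a replication mode."""
    return KAFKA_SETTINGS[mode]
```

Note that `acks` is set on the producer while `min.insync.replicas` is set on the topic or broker, so the two values live in different configuration files in practice.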


This article gives a brief analysis of another core problem faced by distributed systems: availability.

  • The only way to high availability is replication. Physical failure is inevitable, and we can only spend money on backups.
  • Master-slave and timeliness are two important issues to consider in replication.
  • There are three main modes of master-slave: single master, multi-master, and masterless.
  • There are three main modes of timeliness: synchronous, asynchronous, and semi-synchronous.

The preceding content has inevitably touched on timeliness, network jitter, and other issues. These problems come with the replication mechanism that we introduced to solve the high availability problem. They are unavoidable and can lead to serious consequences.

Let's discuss the price of replication in the next article.

This is a carefully conceived series of 20-30 articles. I hope to give everyone a core grasp of the distributed system in a storytelling way. Stay tuned for the next one!
