Single-Step Implementation of Two-Phase Membership Change

Introduction

Nodes often fail while a distributed system is running, so nodes must be added, removed, or replaced on demand.

Membership change is an important topic in distributed systems, especially in consensus-based systems. It improves O&M capabilities and service availability.

Joint Consensus, the two-phase membership change method proposed in Raft, is the mainstream membership change method in the industry and has substantially advanced the engineering application of membership change. However, Joint Consensus proceeds in two phases, so two logs must be proposed for each change, which is inconvenient in some systems. Raft also proposed a single-step membership change method, but it can add or remove only one member at a time, which is highly restrictive and error-prone. Therefore, single-step membership change is generally not recommended.

It is natural to ask whether Joint Consensus membership change can be implemented in a single step. This article discusses that question.

Membership Change

Membership change refers to changing the set of nodes that participate in the consensus protocol while the cluster is running, for example by adding, removing, or replacing nodes. The membership change process should not affect system availability.

Membership change is itself a consensus problem: all nodes must agree on the member configuration. However, membership change is special, because the set of members that participate in voting changes during the membership change itself.

Figure 1: At some point during membership change, two disjoint majorities may exist simultaneously in Cold and Cnew

If membership change is treated as an ordinary consensus problem, the nodes switch from Cold to Cnew at different times. At some moment, two disjoint majorities, one in Cold and one in Cnew, may exist simultaneously. This is the double Quorum problem, and it breaks consensus: for example, each majority could elect its own Leader for the same term.
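A small Python sketch makes the double Quorum problem concrete. It brute-forces a majority of Cold and a majority of Cnew that share no node. The function names, member IDs, and the 3-node to 5-node change used below are hypothetical examples, not taken from the figure.

```python
from itertools import combinations


def majorities(config):
    """Yield every minimal majority (smallest possible Quorum) of a configuration."""
    members = sorted(config)
    size = len(members) // 2 + 1
    for combo in combinations(members, size):
        yield frozenset(combo)


def find_double_quorum(c_old, c_new):
    """Return a (Quorum of Cold, Quorum of Cnew) pair with no common node, or None."""
    for q_old in majorities(c_old):
        for q_new in majorities(c_new):
            if not (q_old & q_new):
                return q_old, q_new
    return None


if __name__ == "__main__":
    print(find_double_quorum({1, 2, 3}, {1, 2, 3, 4, 5}))
    print(find_double_quorum({1, 2, 3}, {1, 2, 3, 4}))  # None
```

For Cold = {1, 2, 3} and Cnew = {1, 2, 3, 4, 5}, the search finds {1, 2} and {3, 4, 5}: in the worst case, nodes 1 and 2 could elect one Leader under Cold while nodes 3, 4, and 5 elect another under Cnew. Checking only minimal majorities is enough, because any two disjoint majorities contain disjoint minimal ones.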

To solve this problem, Raft uses Joint Consensus, a two-phase method for membership change.

Joint Consensus Membership Change

To avoid the double Quorum problem, Joint Consensus introduces a joint member configuration, Cold,new, as a transition configuration. Cold,new is the combination of Cold and Cnew: a Quorum of Cold,new must contain a majority of Cold and a majority of Cnew. Therefore, every Quorum of Cold,new intersects every Quorum of Cold, and every Quorum of Cold,new intersects every Quorum of Cnew. Membership change starts by switching from Cold to Cold,new. After Cold,new is committed, the cluster switches to Cnew. This process guarantees that Cold and Cnew are never used independently at the same time, so double Quorum is avoided and safety is preserved.
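The intersection property of Cold,new can be checked exhaustively on a small example. In this Python sketch, a Quorum of Cold,new is modelled as a set that contains a majority of Cold and a majority of Cnew. The helper names and the hypothetical case of replacing all three members {1, 2, 3} with {4, 5, 6} are illustrative assumptions.

```python
from itertools import combinations


def is_majority(acks, config):
    """True if `acks` contains more than half of the members of `config`."""
    return len(set(acks) & set(config)) * 2 > len(config)


def is_joint_quorum(acks, c_old, c_new):
    """A Quorum of Cold,new needs a majority of Cold AND a majority of Cnew."""
    return is_majority(acks, c_old) and is_majority(acks, c_new)


def all_quorums(universe, accepts):
    """Every subset of `universe` that the predicate `accepts` treats as a Quorum."""
    nodes = sorted(universe)
    return [frozenset(combo)
            for r in range(len(nodes) + 1)
            for combo in combinations(nodes, r)
            if accepts(combo)]


def always_intersect(quorums_a, quorums_b):
    """True if every Quorum in the first list shares a node with every Quorum in the second."""
    return all(a & b for a in quorums_a for b in quorums_b)


if __name__ == "__main__":
    c_old, c_new = {1, 2, 3}, {4, 5, 6}  # hypothetical: replace every member
    universe = c_old | c_new
    old_qs = all_quorums(universe, lambda s: is_majority(s, c_old))
    new_qs = all_quorums(universe, lambda s: is_majority(s, c_new))
    joint_qs = all_quorums(universe, lambda s: is_joint_quorum(s, c_old, c_new))
    print(always_intersect(joint_qs, old_qs), always_intersect(joint_qs, new_qs))  # True True
    print(always_intersect(old_qs, new_qs))  # False
```

Even when Cold and Cnew share no member, every Quorum of Cold,new overlaps every Quorum of Cold and every Quorum of Cnew, while Quorums of Cold and Cnew on their own can be disjoint.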

Figure 2: The relationship between the Quorums of Cold, Cold,new, and Cnew

Joint Consensus completes a membership change with two logs. After receiving a change request, the Leader replicates a Cold,new log to the members of Cold and Cnew; from then on, every log must be confirmed by a majority of Cold and a majority of Cnew. The Cold,new log can be committed only after majorities of both Cold and Cnew have accepted it. The Leader then replicates a log containing only Cnew to the members of Cold and Cnew. From then on, logs only need confirmation from a majority of Cnew, and the Cnew log is committed once a majority of Cnew has accepted it. At this point, the membership change is complete, and members not included in Cnew are disabled automatically.
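The following Python sketch models only the Quorum rule a Leader applies at each stage of this process. The class and method names are hypothetical; log replication, terms, and elections are left out. A configuration takes effect as soon as its log is appended, which is how Raft handles configuration entries.

```python
def is_majority(acks, config):
    """True if `acks` contains more than half of the members of `config`."""
    return len(set(acks) & set(config)) * 2 > len(config)


class JointConsensusLeader:
    """Toy model of the Quorum a Leader requires at each stage of Joint Consensus.

    Follows Raft's rule that a node starts using a configuration as soon as the
    entry is appended to its log, without waiting for it to be committed.
    """

    def __init__(self, c_old):
        self.config = ("Cold", frozenset(c_old), None)

    def can_commit(self, acks):
        """Would these acknowledgments commit a log under the current configuration?"""
        kind, first, second = self.config
        if kind == "Cold,new":
            return is_majority(acks, first) and is_majority(acks, second)
        return is_majority(acks, first)

    def append_joint(self, c_new):
        """Phase 1: append the Cold,new log; later logs need both majorities."""
        kind, c_old, _ = self.config
        if kind == "Cold,new":
            raise RuntimeError("a membership change is already in progress")
        self.config = ("Cold,new", c_old, frozenset(c_new))

    def append_new(self, acks_for_joint):
        """Phase 2: append the Cnew log, allowed only once Cold,new is committed."""
        kind, _, c_new = self.config
        if kind != "Cold,new" or not self.can_commit(acks_for_joint):
            raise RuntimeError("the Cold,new log must be committed before Cnew")
        self.config = ("Cnew", c_new, None)
```

For example, after append_joint({3, 4, 5}) on a Leader whose Cold is {1, 2, 3}, acknowledgments from {1, 2} no longer commit anything, while {1, 3, 4} does, because it holds a majority of both configurations.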

Figure 3: Membership change process of Joint Consensus

If a Failover occurs during membership change because the old Leader goes down, any node in Cold,new may become the new Leader. If the new Leader does not have the Cold,new log, it continues to use Cold: any Cold,new log on a Follower is truncated by the new Leader, the Follower rolls back to Cold, and the membership change fails. If the new Leader has the Cold,new log, it continues the unfinished membership change.
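The rollback case can be sketched with a simplified version of Raft's log conflict rule. This Python model assumes that the new Leader, elected in term 2, has appended a no-op log of its own term (a common Raft practice) and replicates its whole log from index 0. The (term, payload) entry format and the function names are hypothetical.

```python
def apply_append_entries(follower_log, leader_entries):
    """Simplified Raft AppendEntries replaying the Leader's log from index 0.

    Each entry is a (term, payload) tuple. An existing entry that conflicts with the
    Leader's (same index, different term) is deleted together with everything after it;
    entries the follower does not have yet are appended.
    """
    log = list(follower_log)
    for index, entry in enumerate(leader_entries):
        if index < len(log) and log[index][0] != entry[0]:
            del log[index:]
        if index >= len(log):
            log.append(entry)
    return log


def latest_config(log):
    """A Raft node uses the most recent configuration entry in its log, committed or not."""
    for _term, payload in reversed(log):
        if payload.startswith("config:"):
            return payload.split(":", 1)[1]
    raise ValueError("log contains no configuration entry")


if __name__ == "__main__":
    leader_log = [(1, "config:Cold"), (1, "op1"), (2, "no-op")]
    follower_log = [(1, "config:Cold"), (1, "op1"), (1, "config:Cold,new")]
    print(latest_config(follower_log))  # Cold,new
    print(latest_config(apply_append_entries(follower_log, leader_log)))  # Cold
```

Because the Follower's Cold,new log conflicts with the new Leader's term-2 no-op at the same index, it is truncated, and the Follower's latest configuration falls back to Cold.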

Single-Step Membership Change

Joint Consensus membership change needs two phases because it makes no assumption about the relationship between Cold and Cnew. The two-phase scheme exists to prevent the double Quorum problem that arises when a Quorum of Cold and a Quorum of Cnew are disjoint.

If membership change is restricted further, so that every Quorum of Cold intersects every Quorum of Cnew, the double Quorum problem cannot occur. In that case, membership change can be simplified to a single phase.

The key to implementing single-step membership change is to restrict Cold and Cnew so that every Quorum of Cold intersects every Quorum of Cnew. How can this be done? By adding or removing only one member per membership change.

Figure 4: Quorums of Cold and Cnew when adding or removing one member

The case of adding or removing one member, shown in Figure 4, can be proved rigorously: as long as only one member is added or removed at a time, a Quorum of Cold and a Quorum of Cnew can never be disjoint. Therefore, Cold can be switched to Cnew directly, and no transition configuration is needed for single-step membership change.
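The proof is a counting argument. When one member is added to an n-member Cold, a majority of Cold has at least ⌊n/2⌋ + 1 nodes and a majority of Cnew has at least ⌊(n+1)/2⌋ + 1 nodes. Together that is n + 2 nodes, but Cold and Cnew contain only n + 1 distinct nodes, so the two majorities must share at least one node. Removing a member is the same argument with the roles of Cold and Cnew swapped. The Python sketch below checks this exhaustively for small clusters (the function names are illustrative) and shows that adding two members at once can break the property.

```python
from itertools import combinations


def min_majorities(config):
    """All smallest majorities of a configuration (larger Quorums are supersets of these)."""
    members = sorted(config)
    return [frozenset(c) for c in combinations(members, len(members) // 2 + 1)]


def quorums_always_intersect(c_old, c_new):
    """True if no majority of c_old is disjoint from a majority of c_new."""
    return all(a & b for a in min_majorities(c_old) for b in min_majorities(c_new))


def check_single_member_changes(max_size):
    """Check every one-member addition or removal for clusters of 1..max_size nodes."""
    for n in range(1, max_size + 1):
        c_old = frozenset(range(n))
        if not quorums_always_intersect(c_old, c_old | {n}):  # add node n
            return False
        for removed in c_old:  # remove each existing node in turn
            c_new = c_old - {removed}
            if c_new and not quorums_always_intersect(c_old, c_new):
                return False
    return True


if __name__ == "__main__":
    print(check_single_member_changes(7))  # True
    print(quorums_always_intersect({1, 2, 3}, {1, 2, 3, 4, 5}))  # False: two members added
```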

Single-step membership change can change only one member at a time. To add, remove, or replace multiple members, perform several single-step membership changes in sequence.
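One way to plan such a sequence is sketched below in Python. The planner is hypothetical: it adds new members before removing old ones, which is a design choice rather than something the single-step method requires.

```python
def single_step_plan(c_old, c_new):
    """Hypothetical planner: turn Cold -> Cnew into one-member-at-a-time changes.

    New members are added before old ones are removed, so the cluster never
    shrinks below min(|Cold|, |Cnew|) members during the change.
    """
    current = set(c_old)
    steps = []
    for node in sorted(set(c_new) - current):
        current.add(node)
        steps.append(("add", node, frozenset(current)))
    for node in sorted(current - set(c_new)):
        current.discard(node)
        steps.append(("remove", node, frozenset(current)))
    return steps


if __name__ == "__main__":
    for op, node, config in single_step_plan({1, 2, 3}, {1, 4, 5}):
        print(op, node, sorted(config))
```

Replacing {2, 3} with {4, 5} in {1, 2, 3} yields four single-member changes, each differing from the previous configuration by exactly one node. Adding first keeps the cluster at least as large as the smaller of Cold and Cnew throughout, at the cost of temporarily larger Quorums.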

Although single-step membership change is simple in theory, it is much harder in practice and causes many problems. A previous article, Raft Engineering Practices and the Cluster Membership Change, describes these problems in detail.

Single-Step Implementation of Two-Phase Membership Change

Joint Consensus membership change is more common, but it involves two phases: each membership change requires two logs to be committed. Single-step membership change requires only one committed log per change, but it can change only one member at a time. Can the advantages of both be combined? Can Joint Consensus membership change be implemented in a single step?

In Joint Consensus, once the Cold,new log is committed, all nodes have already reached consensus on the Cnew configuration. What, then, is the role of the Cnew log? Can the switch from Cold,new to Cnew happen as soon as the Cold,new log is committed? If so, does the Cnew log become unnecessary, giving a single-step implementation?

Consider the function of the Cnew log in Joint Consensus membership change. After the Cold,new log is committed, the Leader proposes the Cnew log. A node switches from the Cold,new configuration to the Cnew configuration once it receives and persists the Cnew log, and members not in Cnew are disabled after the Cnew log is committed. Based on this process, the Cnew log has three functions:

  1. Notify each node to switch from the Cold,new configuration to the Cnew configuration once it receives and persists the Cnew log.
  2. Notify the nodes not in the Cnew configuration to shut down after the Cnew log is committed.
  3. If a Failover occurs during membership change, give nodes whose local log contains the Cnew log priority in the election.

If the work of the Cnew log can be done without the Cnew log itself, two-phase Joint Consensus membership change can effectively be achieved in a single step. ZooKeeper has explored this approach systematically.

ZooKeeper Membership Change

ZooKeeper has supported Zab-based membership change since version 3.5.0. ZooKeeper guarantees the Primary Order property, which Joint Consensus membership change with its two logs cannot guarantee. To support general membership changes without losing Primary Order, the ZooKeeper team proposed a membership change method in the paper Dynamic Reconfiguration of Primary/Backup Clusters and implemented it in ZooKeeper. This work predates Raft.

Figure 5 shows the ZooKeeper membership change protocol, using the replacement of nodes B1 and B2 with nodes B3 and B4 as an example. In the figure, S denotes the old member configuration, S' denotes the new member configuration, and P is the Leader node:

Figure 5: ZooKeeper membership change protocol

  • Initialization: So that the new nodes obtain the latest data, nodes B3 and B4 in the new member configuration S' first connect to the current primary node P, which transmits its current state to them as their initial state. In the Zab protocol, this transfer happens automatically when a secondary node connects to the primary node, and the secondary node then keeps receiving all subsequent operation logs (such as Op1 and Op2 in the figure) from P. During this process, B3 and B4 do not participate in voting.

  • Step 1: The primary node P sends a membership change log, COP, to all secondary nodes connected to it (the nodes in S ∪ S'). The COP log carries both the old member configuration S and the new member configuration S', and P waits for confirmation from the nodes in S. Once a majority of S has confirmed the COP log, consensus on S' has been reached.
  • Step 2: Logs generated before the COP log only need to be confirmed by a majority of S, and they can be committed in S ∪ S'. Logs generated after the COP log and before the ACTIVATE message of S' must be confirmed by a majority of S and a majority of S', and they can only be committed in S'. Logs generated after the ACTIVATE message of S' only need to be confirmed by a majority of S' and are committed in S'.
  • Step 3: The primary node P waits for a majority of S' to confirm the COP log and all logs before it.
  • Step 4: Once majorities of both S and S' have confirmed the COP log, P commits the COP log and broadcasts an ACTIVATE message that activates S' and completes the membership change. Like log replication messages, the ACTIVATE message carries the Epoch of P, and an ACTIVATE message that carries an outdated Epoch is ignored.
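The confirmation rule in Step 2 can be summarised as a function of where an operation log falls relative to the COP log and the ACTIVATE message. This Python sketch assumes, for illustration only, the hypothetical configurations S = {P, B1, B2} and S' = {P, B3, B4}. It models which majorities must confirm a log, not which nodes the log is committed on.

```python
from enum import Enum


class Stage(Enum):
    BEFORE_COP = 1       # log generated before the COP log
    BETWEEN = 2          # after the COP log, before the ACTIVATE message of S'
    AFTER_ACTIVATE = 3   # after the ACTIVATE message of S'


def is_majority(acks, config):
    """True if `acks` contains more than half of the members of `config`."""
    return len(set(acks) & set(config)) * 2 > len(config)


def acknowledged(stage, acks, s_old, s_new):
    """Whether an operation log has the confirmations Step 2 requires."""
    if stage is Stage.BEFORE_COP:
        return is_majority(acks, s_old)
    if stage is Stage.BETWEEN:
        return is_majority(acks, s_old) and is_majority(acks, s_new)
    return is_majority(acks, s_new)


if __name__ == "__main__":
    S, S_new = {"P", "B1", "B2"}, {"P", "B3", "B4"}  # hypothetical membership
    print(acknowledged(Stage.BETWEEN, {"P", "B1"}, S, S_new))        # False
    print(acknowledged(Stage.BETWEEN, {"P", "B1", "B3"}, S, S_new))  # True
```

The middle stage behaves like the Cold,new stage of Joint Consensus. The difference is that the switch to S'-only confirmation is triggered by the ACTIVATE message rather than by a second log.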

If a Failover occurs during membership changes, the following situations may occur:

  • If the Failover occurs before the COP log is sent, the membership change fails, and the cluster continues to work after a new Leader is elected from the old member configuration.
  • If the Failover occurs after the COP log is sent and before ACTIVATE, any node in the old or new member configuration may become the new Leader. If the new Leader does not have the COP log, the membership change fails. If the new Leader has the COP log, the unfinished membership change process resumes.
  • If the Failover occurs after ACTIVATE, the membership change has completed, but the new Leader is not guaranteed to be in the new member configuration, so nodes not in the new member configuration cannot be disabled yet. Therefore, a no-op log must be committed in the new member configuration after the ACTIVATE message is sent. Once the no-op log is committed, the new Leader is guaranteed to be in the new member configuration, and nodes not in it can be disabled safely.
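The Failover cases can be condensed into a decision table. The Python sketch below is illustrative only; the enum, the function, and the outcome strings are hypothetical labels for the cases described above.

```python
from enum import Enum


class FailoverPoint(Enum):
    BEFORE_COP = "before the COP log is sent"
    BEFORE_ACTIVATE = "after the COP log is sent, before ACTIVATE"
    AFTER_ACTIVATE = "after ACTIVATE"


def failover_outcome(point, new_leader_has_cop, noop_committed_in_s_new=False):
    """Summarise what happens after a Failover during ZooKeeper membership change."""
    if point is FailoverPoint.BEFORE_COP:
        return "change fails; continue with S"
    if point is FailoverPoint.BEFORE_ACTIVATE:
        return "resume change" if new_leader_has_cop else "change fails"
    # After ACTIVATE the change is complete, but the new Leader may be outside S'
    # until a no-op log has been committed in S'.
    if noop_committed_in_s_new:
        return "change complete; nodes outside S' may be disabled"
    return "change complete; keep nodes outside S' running"
```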

ZooKeeper uses an asynchronous commit message, ACTIVATE, to notify nodes to switch from the old member configuration to the new one, and an asynchronous no-op log to ensure that nodes not in the new member configuration can be disabled safely. Together, the ACTIVATE message and the asynchronous no-op log serve as the Cnew log of Joint Consensus membership change.

Improved Single-Step Implementation

The ZooKeeper membership change protocol is not as concise as Joint Consensus, which uses two phases to guarantee the safety of membership changes without imposing many restrictions. Can the ZooKeeper membership change protocol be improved?

In the ZooKeeper protocol, the asynchronous ACTIVATE message and no-op log perform the functions of the Cnew log. Following the same idea, the Cnew log of Joint Consensus membership change can be made asynchronous: the membership change is considered complete as soon as the Cold,new log is committed, and the Cnew log is committed asynchronously afterward. This is safe because, once the Cold,new log is committed, the new member configuration has been agreed on and will never roll back to the old member configuration; the remaining process will eventually complete, and the Cnew log will be committed.
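A minimal Python sketch of this idea follows. The class is hypothetical and models only the bookkeeping: commits are reported by the caller through on_commit(), and replication, persistence, and elections are not modelled.

```python
class AsyncCnewLeader:
    """Hypothetical Leader where a membership change completes when Cold,new commits.

    The Cnew log is still proposed, but asynchronously: no client waits for it.
    """

    def __init__(self, c_old):
        self.c_old = frozenset(c_old)
        self.log = []               # appended logs, newest last
        self.completed = []         # changes already reported to clients as finished
        self.retired = frozenset()  # members that may now shut down

    def request_change(self, c_new):
        self.log.append(("Cold,new", self.c_old, frozenset(c_new)))

    def on_commit(self, entry):
        kind, c_old, c_new = entry
        if kind == "Cold,new":
            # Consensus on Cnew is reached and can never roll back to Cold,
            # so the change is reported as complete right away.
            self.completed.append(c_new)
            # The Cnew log is still needed eventually; propose it in the background.
            self.log.append(("Cnew", c_old, c_new))
        elif kind == "Cnew":
            # Members outside Cnew may shut down once Cnew is committed.
            self.c_old = c_new
            self.retired = c_old - c_new
```

The client's request completes at the first commit; the second commit only retires the members outside Cnew.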

Another improvement keeps the ACTIVATE message but drops the no-op log. How, then, can nodes that have switched to the new member configuration be given priority in elections? Election safety already requires that the node with the most up-to-date log be preferred. On top of that, when candidates' logs are equally up to date, voters preferentially vote for candidates that have switched to the new member configuration. This gives such nodes priority in elections. Once most nodes have switched to the new member configuration, nodes not in the new member configuration can be disabled safely.
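The voting preference can be sketched as a ranking rule. This Python sketch is a simplification under stated assumptions: the activated_new_config flag is hypothetical, and the voter is assumed to see all candidates of a round at once, whereas a real Raft voter grants at most one vote per term, usually to the first eligible candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    last_log_term: int
    last_log_index: int
    activated_new_config: bool  # hypothetical flag: has this node switched to the new configuration?


def log_key(node):
    # Raft's "at least as up-to-date" order: compare last log term, then last log index.
    return (node.last_log_term, node.last_log_index)


def choose_vote(voter, candidates):
    """Return the id of the candidate `voter` prefers this round, or None."""
    eligible = [c for c in candidates if log_key(c) >= log_key(voter)]
    if not eligible:
        return None
    # The most up-to-date log wins; among equal logs, prefer nodes already on the new configuration.
    best = max(eligible, key=lambda c: (log_key(c), c.activated_new_config))
    return best.node_id
```

A more up-to-date log still wins outright; the new-configuration preference only breaks ties, so election safety is not weakened.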

Conclusion

Joint Consensus has greatly advanced the engineering application of membership change. It is simple and general, but it uses two phases, so two logs must be committed for each change. This article discusses single-step implementations of two-phase Joint Consensus membership change and proposes several improvements, providing more options for the engineering application of membership change.

Reflection

  1. Why will the Cnew log eventually be committed once the Cold,new log has been committed?
  2. How can the ACTIVATE message be used to ensure that a node that restarts after switching to the new member configuration continues to use the new member configuration?
  3. Are there any other methods for single-step implementation of two-phase membership change?

Xiangguang