Automatic failover and manual failover

A PolarDB cluster can automatically fail over services from the primary node to a read-only node when a system failure occurs. You can also perform a manual failover by specifying a read-only node as the new primary node.

Precautions

During an automatic or manual failover, the database service may be interrupted for approximately 20 to 30 seconds if hot standby is disabled for read-only nodes. In this case, your connection to the database may be interrupted. Make sure that your application can automatically reconnect to the cluster. If hot standby is enabled for read-only nodes, the failover can be completed within 5 to 10 seconds. For information about how to enable hot standby for read-only nodes, see Configure hot replica nodes.

Important

In specific extreme cases, the database interruption time during the failover may increase but does not exceed 3 minutes.
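The reconnect requirement above can be sketched as a retry loop with exponential backoff. This is a minimal illustration, not PolarDB client code: `connect_with_retry` and its parameters are hypothetical names, `connect` stands in for whatever call your database driver uses to open a connection, and the 180-second default simply mirrors the 3-minute upper bound stated above.

```python
import time


def connect_with_retry(connect, max_wait_seconds=180.0, initial_delay=0.5,
                       max_delay=5.0, retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds or max_wait_seconds has elapsed.

    connect   -- zero-argument callable that opens a connection (assumption:
                 supplied by the caller, for example a wrapper around a driver).
    retry_on  -- exception types treated as transient. Many drivers raise
                 their own error classes, so adjust this tuple as needed.
    """
    deadline = time.monotonic() + max_wait_seconds
    delay = initial_delay
    while True:
        try:
            return connect()
        except retry_on as exc:
            # Give up if the next wait would pass the deadline.
            if time.monotonic() + delay > deadline:
                raise TimeoutError("cluster did not accept connections in time") from exc
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

With the defaults, the waits between attempts are 0.5 s, 1 s, 2 s, 4 s, and then 5 s until the deadline. Capping the delay keeps the application from sleeping long past the moment the new primary node becomes available.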

Automatic failover

A PolarDB cluster of Cluster Edition uses an active-active high availability architecture. When the primary node fails, the system automatically elects a new primary node from the read-only nodes and fails over services from the original primary node to the new primary node.

A failover priority is assigned to each node in a cluster. The priorities determine which node can be elected as the primary node during a failover. If multiple nodes have the same priority, they have the same probability of being elected as the primary node.

The system performs the following steps to elect the primary node:

Find all available read-only nodes that can be elected as the primary node.
Select the read-only nodes that have the highest failover priority.
If the failover to the first read-only node fails due to network issues, abnormal replication status, or other reasons, the system attempts to fail over services to the next read-only node until the failover is successful.
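The election steps above can be sketched as follows. This is an illustrative model only, not the actual PolarDB implementation: the node fields (`id`, `role`, `available`, `priority`), the `try_failover` callback, and the assumption that a larger number means a higher priority are all hypothetical. The sketch also assumes that "the next read-only node" means the next candidate in descending priority order.

```python
import random


def elect_primary(nodes, try_failover, rng=random):
    """Return the id of the node promoted to primary, or None if all attempts fail.

    nodes         -- list of dicts with hypothetical keys: id, role, available, priority.
    try_failover  -- callable(node) -> bool; True if the failover to node succeeded.
    """
    # Step 1: find all available read-only nodes.
    candidates = [n for n in nodes if n["role"] == "read-only" and n["available"]]

    # Step 2: order by priority, highest first. Shuffling before a stable sort
    # gives nodes with the same priority an equal chance of being tried first.
    rng.shuffle(candidates)
    candidates.sort(key=lambda n: n["priority"], reverse=True)

    # Step 3: try each candidate in turn until one failover succeeds.
    for node in candidates:
        if try_failover(node):
            return node["id"]
    return None
```

For example, given an available node `b` with priority 5 and an available node `a` with priority 1, `b` is tried first, and `a` is tried only if the failover to `b` fails.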

You can view and configure the failover priority of each node in the cluster in the Basic Information section of the Database Nodes page of the cluster.

[Figure: Priority]

Note

If hot standby is disabled for the read-only node that is elected as the new primary node, the database service may be interrupted for approximately 20 to 30 seconds during the failover. In this case, your connection to the database may be interrupted. Make sure that your application can automatically reconnect to the cluster.
If hot standby is enabled for the read-only node that is elected as the new primary node, the failover can be completed within 5 to 10 seconds.
In specific extreme cases, the database interruption time during the failover may increase but does not exceed 3 minutes.

Manual failover

You can also perform a manual failover by specifying a read-only node as the new primary node. Manual failovers are suitable when you want to test the high availability of a cluster or promote a particular read-only node to primary node.

Log on to the PolarDB console.
In the upper-left corner, select the region where the cluster is deployed.
Find the cluster and click its ID.
In the Database Nodes section of the Basic Information page, click the icon in the upper-right corner of the section to switch views.
Click Fail Over.
In the dialog box that appears, configure the New Primary Node parameter and click OK.
Note
- If hot standby is disabled for the read-only node that is specified as the new primary node, the database service may be interrupted for approximately 20 to 30 seconds during the failover. In this case, your connection to the database may be interrupted. Make sure that your application can automatically reconnect to the cluster.
- If hot standby is enabled for the read-only node that is specified as the new primary node, the failover can be completed within 5 to 10 seconds.
- In specific extreme cases, the database interruption time during the failover may increase but does not exceed 3 minutes.

FAQ

The status of my cluster does not return to Running 10 minutes after a failover is complete. What are the possible causes? How do I handle the issue?
If your application holds persistent connections to the cluster, it may not detect that a connection is broken when an anomaly triggers a failover. If no socket timeout is configured, the application keeps waiting on the broken connection for the database to return results. In most cases, the application is disconnected only after hundreds of seconds. During this period, some connections to the database are abnormal, and a large number of SQL statements fail to be executed.
To avoid invalid connections, we recommend that you specify the connectTimeout and socketTimeout parameters to prevent your application from waiting for an extended period of time due to network errors. This reduces the amount of time required to recover from failures.
You must specify these parameters based on your workloads and usage modes. Recommended values for online transaction scenarios:
- connectTimeout: We recommend that you set this parameter to 1 to 2 seconds.
- socketTimeout: For an internal network environment, we recommend that you set this parameter to 10 to 15 seconds. For a public network environment, we recommend that you set this parameter to 60 to 90 seconds.
Note
The preceding values are only for reference.
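The difference between the two timeouts can be illustrated at the socket level. The sketch below uses only the Python standard library. The function name `open_with_timeouts` and its parameters are hypothetical, and it does not speak any database protocol. In a real application you would set the equivalent options of your database driver instead, and check the driver documentation for the expected unit, because some drivers take milliseconds rather than seconds.

```python
import socket


def open_with_timeouts(host, port, connect_timeout=2.0, socket_timeout=15.0):
    """Open a TCP connection with separate connect and read/write timeouts.

    connect_timeout -- seconds to wait for the TCP connection to be established.
    socket_timeout  -- seconds to wait for each later send or receive call.
    The defaults follow the internal-network values suggested above.
    """
    # The timeout passed here bounds only the connection attempt.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # From now on, a blocked send or receive raises socket.timeout after
    # socket_timeout seconds instead of waiting indefinitely.
    sock.settimeout(socket_timeout)
    return sock
```

With a socket timeout in place, a read on a connection that was silently broken by a failover fails after a bounded wait, so the application can discard the connection and reconnect.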

Related API operations

Operation	Description
FailoverDBCluster	Performs a manual failover in a PolarDB cluster by specifying a read-only node as the new primary node.
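
As a rough illustration of how a manual failover request might be assembled, the sketch below builds the request parameters for the operation. Only the operation name FailoverDBCluster comes from this topic. The helper name `build_failover_request` is hypothetical, and the parameter names `DBClusterId` and `TargetDBNodeId` are assumptions that must be verified against the API reference. Request signing and transport are omitted.

```python
def build_failover_request(cluster_id, target_node_id=None):
    """Return a parameter dict for a manual failover request.

    The parameter names are assumptions, not confirmed by this topic.
    If target_node_id is omitted, no target node is included in the request.
    """
    params = {
        "Action": "FailoverDBCluster",
        "DBClusterId": cluster_id,  # assumed parameter name
    }
    if target_node_id is not None:
        params["TargetDBNodeId"] = target_node_id  # assumed parameter name
    return params
```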