When a system failure occurs, a PolarDB cluster can automatically fail over from the primary node to a read-only node. You can also perform a manual failover by specifying a read-only node as the new primary node.
Precautions
If the failover with hot standby feature is not enabled for read-only nodes, your applications may be interrupted for about 20 to 30 seconds when failover occurs. Therefore, make sure that your applications can be automatically reconnected to the cluster. If the failover with hot standby feature is enabled for read-only nodes, failover can be completed within 5 to 10 seconds. For more information about how to enable the failover with hot standby feature for read-only nodes, see Configure hot standby nodes.
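The automatic reconnection recommended above could be sketched as follows. This is a minimal illustration, not a prescribed implementation: `connect_with_retry`, its parameters, and the default 60-second budget are hypothetical, and `ConnectionError` stands in for whatever exception your database driver raises when a connection attempt fails.

```python
import time


def connect_with_retry(connect, max_wait_s=60.0, initial_delay_s=0.5,
                       max_delay_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or the retry budget is used up.

    connect is any zero-argument callable that returns a connection.
    max_wait_s is an assumed budget chosen to exceed the 20 to 30 second
    interruption described above; tune it for your workload.
    """
    deadline = clock() + max_wait_s
    delay = initial_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            # Assumption: replace ConnectionError with your driver's error type.
            if clock() + delay > deadline:
                raise  # Budget exhausted: surface the last error to the caller.
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # Exponential backoff with a cap.
```

Capping the backoff keeps the application probing every few seconds, so it reconnects soon after the new primary node becomes available instead of waiting out a long final delay.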
Automatic failover
A PolarDB cluster of the Cluster Edition uses an active-active architecture that ensures high availability. If the primary node that supports reads and writes is faulty, applications are automatically failed over to the read-only node that is elected by the system as the new primary node.
A failover priority is assigned by the system to each node in a cluster. During a failover, a node is elected as the primary node based on the probability that is determined by this priority. Nodes that are assigned the same failover priority have the same probability of being elected as the primary node. The system performs the following steps to elect the primary node:
- Find all the available read-only nodes that can be elected as the primary node.
- Select the read-only nodes that are assigned the highest failover priority.
- If the failover to the first read-only node fails due to network issues, abnormal replication status, or other reasons, the system attempts to fail over your applications to another read-only node until the failover succeeds.
In the Database Nodes section of the Overview page for the cluster, you can view and configure the failover priority of each node in the cluster.
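The election procedure described in this section can be illustrated with a short sketch. It is only a model of the documented behavior, not PolarDB's actual implementation. The node dictionaries, the `try_failover` callback, and the function names are hypothetical. The sketch also assumes that a larger number means a higher priority and that the system falls back to lower-priority nodes after every node in the highest tier has failed; the text above does not state either point.

```python
import random
from itertools import groupby


def failover_order(nodes, rng=random):
    """Return available read-only nodes in the order they would be tried.

    Each node is a dict with the hypothetical keys 'id', 'priority',
    and 'available'. Nodes with equal priority are shuffled so that each
    has the same chance of being tried first.
    """
    candidates = [n for n in nodes if n["available"]]
    # Assumption: a larger number means a higher failover priority.
    candidates.sort(key=lambda n: n["priority"], reverse=True)
    order = []
    for _, group in groupby(candidates, key=lambda n: n["priority"]):
        tier = list(group)
        rng.shuffle(tier)
        order.extend(tier)
    return order


def elect_primary(nodes, try_failover, rng=random):
    """Try candidates in order until one failover succeeds.

    try_failover(node) is a hypothetical callback that returns True when
    the node was promoted and False when promotion failed, for example
    because of network issues or abnormal replication status.
    """
    for node in failover_order(nodes, rng):
        if try_failover(node):
            return node["id"]
    return None  # No available read-only node could be promoted.
```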

Manual failover
You can specify a read-only node as the new primary node to fail over from the primary node to the read-only node. Manual failovers are suitable for scenarios in which you need to test the high availability of a cluster or specify a read-only node as the primary node of a cluster.
FAQ
- The cluster does not run normally 10 minutes after the failover is complete. What are the possible causes? How do I handle the issue?
- If an exception on your cluster triggers a failover to ensure high availability, your application may fail to identify and respond to the changes to the connections. If no timeout periods are specified for socket connections, your application waits for the database to return the results. In most cases, your application is disconnected after hundreds of seconds. During this period, some connections to the database are abnormal, and a large number of SQL statements fail to be executed.
To avoid invalid connections, we recommend that you configure the connectTimeout and socketTimeout parameters to prevent your application from waiting for a long period of time due to network errors. This reduces the time required to recover from failures.
You must configure these parameters based on your workloads and usage modes. Recommended values for online transaction scenarios:
- connectTimeout: We recommend that you set this parameter to 1 to 2 seconds.
- socketTimeout: For an internal network environment, we recommend that you set this parameter to 10 to 15 seconds. For an Internet environment, we recommend that you set this parameter to 60 to 90 seconds.
Note: The preceding values are for reference only.
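As an illustration of how these two parameters might be applied, the following sketch builds a JDBC connection URL. It assumes a MySQL-compatible cluster accessed through MySQL Connector/J, where `connectTimeout` and `socketTimeout` are expressed in milliseconds; other drivers may use different units, so check your driver's documentation. The helper name, host name, and database name are hypothetical.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, port, database, connect_timeout_s, socket_timeout_s):
    """Build a JDBC URL that carries both timeout settings.

    Assumption: the target driver is MySQL Connector/J, which expects
    connectTimeout and socketTimeout in milliseconds.
    """
    params = {
        "connectTimeout": int(connect_timeout_s * 1000),
        "socketTimeout": int(socket_timeout_s * 1000),
    }
    return f"jdbc:mysql://{host}:{port}/{database}?{urlencode(params)}"


# Example for an internal network: 2 s connect timeout, 15 s socket timeout.
url = build_jdbc_url("polardb.example.internal", 3306, "appdb", 2, 15)
```

With these settings, a connection that stalls during a failover is abandoned after at most 15 seconds instead of waiting for the operating system's default timeout.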
Related API operations
Operation | Description |
---|---|
FailoverDBCluster | Manually fails over from the primary node to a read-only node of a specified PolarDB cluster. You can specify a read-only node as the new primary node. |