
PolarDB: Automatic failover and manual failover

Last Updated: Oct 12, 2023

When a system failure occurs, a PolarDB for MySQL cluster can automatically fail over from the primary node to a read-only node. You can also trigger a failover manually by specifying a read-only node as the new primary node.

Precautions

If the failover with hot standby feature is not enabled for read-only nodes, a failover may interrupt your applications for about 20 to 30 seconds. Therefore, make sure that your applications can automatically reconnect to the cluster. If the feature is enabled, a failover can be completed within 5 to 10 seconds. For more information about how to enable the failover with hot standby feature for read-only nodes, see Configure hot standby nodes.
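
The reconnection requirement above can be met in many ways. The following is a minimal sketch, not a PolarDB-specific implementation: connect_with_retry is a hypothetical helper that retries a caller-supplied connect function with exponential backoff. The 60-second default wait is an assumption chosen to cover the 20 to 30 second interruption described above, and the retried exception types depend on the database driver that you use.

```python
import time


def connect_with_retry(connect, max_wait_seconds=60.0, initial_delay=0.5,
                       max_delay=5.0, retry_on=(OSError,),
                       sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or max_wait_seconds has elapsed.

    connect  -- hypothetical zero-argument callable that opens a connection
    retry_on -- exception types treated as transient; ConnectionError is a
                subclass of OSError. Pass your driver's error type if needed.
    """
    deadline = clock() + max_wait_seconds
    delay = initial_delay
    while True:
        try:
            return connect()
        except retry_on:
            # Give up if the next wait would run past the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            # Double the wait after each failure, up to max_delay.
            delay = min(delay * 2, max_delay)
```

An application would pass its own connection factory, for example a function that opens a connection to the cluster endpoint, and use the returned connection as usual.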

Automatic failover

A PolarDB for MySQL cluster of the Cluster Edition uses an active-active architecture that ensures high availability. If the primary node, which handles reads and writes, fails, the system elects a read-only node as the new primary node and automatically fails your applications over to it.

The system assigns a failover priority to each node in a cluster. During a failover, this priority determines the probability that a node is elected as the primary node. Nodes that are assigned the same failover priority have the same probability of being elected.

The system performs the following steps to elect the primary node:

  1. Find all the available read-only nodes that can be elected as the primary node.

  2. Select the read-only nodes that are assigned the highest failover priority.

  3. Fail over to one of the selected read-only nodes. If this failover fails due to network issues, abnormal replication status, or other reasons, the system attempts to fail over your applications to another read-only node until the failover succeeds.
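
The election steps above can be illustrated with a short simulation. This is only a sketch under stated assumptions and does not show how PolarDB implements the election internally: the Node class, elect_primary, and the try_promote callback are hypothetical names, a larger priority number is assumed to mean a higher priority, and the order in which the remaining nodes are retried is also an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    priority: int          # assumption: larger number = higher failover priority
    available: bool = True


def elect_primary(read_only_nodes, try_promote, rng=random):
    """Return the node promoted to primary, or None if every attempt fails.

    try_promote -- hypothetical callback that returns True when the failover
                   to the given node succeeds.
    """
    # Step 1: keep only the read-only nodes that are available.
    candidates = [n for n in read_only_nodes if n.available]
    # Step 2: highest priority first. Shuffling before a stable sort gives
    # nodes with the same priority the same chance of being tried first.
    rng.shuffle(candidates)
    candidates.sort(key=lambda n: n.priority, reverse=True)
    # Step 3: try each candidate in turn until one failover succeeds.
    for node in candidates:
        if try_promote(node):
            return node
    return None
```

For example, with two available nodes of priority 2 and one of priority 1, either priority-2 node may be elected, and the priority-1 node is tried only if both of those failovers fail.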

In the Database Nodes section of the Basic Information page for the cluster, you can view and configure the failover priority of each node in the cluster.

Figure: Failover priorities of nodes

Manual failover

You can manually fail over a cluster by specifying a read-only node as the new primary node. Manual failovers are suitable for scenarios in which you need to test the high availability of a cluster or want a specific read-only node to serve as the primary node.

  1. Log on to the PolarDB console.
  2. In the upper-left corner of the console, select the region in which the cluster that you want to manage is deployed.
  3. Find the cluster and click the cluster ID.
  4. In the Database Nodes section of the Basic Information page, click Failover in the upper-right corner of the section to switch views.

  5. Click Switch Primary Node.

  6. In the dialog box that appears, specify New Primary Node and click OK.

    Note
    • If the failover with hot standby feature is not enabled for read-only nodes, your applications may be interrupted for about 20 to 30 seconds when failover occurs. Therefore, make sure that your applications can be automatically reconnected to the cluster.

    • If the failover with hot standby feature is enabled for read-only nodes, failover can be completed within 5 to 10 seconds.

FAQ

  • The cluster does not run normally 10 minutes after the failover is complete. What are the possible causes? How do I handle the issue?

    If an exception on your cluster triggers a failover to ensure high availability, your application may fail to identify and respond to the changes to the connections. If no timeout periods are specified for socket connections, your application waits for the database to return the results. In most cases, your application is disconnected after hundreds of seconds. During this period, some connections to the database are abnormal, and a large number of SQL statements fail to be executed.

    To avoid invalid connections, we recommend that you configure the connectTimeout and socketTimeout parameters to prevent your application from waiting for a long period of time due to network errors. This reduces the time required to recover from failures.

    You must configure these parameters based on your workloads and usage modes. Recommended values for online transaction scenarios:

    • connectTimeout: We recommend that you set this parameter to 1 to 2 seconds.

    • socketTimeout: For an internal network environment, we recommend that you set this parameter to 10 to 15 seconds. For an Internet environment, we recommend that you set this parameter to 60 to 90 seconds.

    Note

    The preceding values are for reference only.
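
    As an illustration of the values above, the following sketch builds a JDBC-style connection URL that carries both timeouts. It assumes that connectTimeout and socketTimeout are the MySQL Connector/J connection properties of the same names, which are specified in milliseconds. The build_jdbc_url helper and the endpoint name in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, port, database, connect_timeout_s=2, socket_timeout_s=15):
    """Build a JDBC URL with timeouts converted from seconds to milliseconds.

    Defaults follow the internal-network recommendations for online
    transaction scenarios; adjust them for your own workload.
    """
    params = {
        # Assumption: Connector/J expects both values in milliseconds.
        "connectTimeout": int(connect_timeout_s * 1000),
        "socketTimeout": int(socket_timeout_s * 1000),
    }
    return f"jdbc:mysql://{host}:{port}/{database}?{urlencode(params)}"
```

    Calling build_jdbc_url("example-cluster-endpoint", 3306, "app") yields a URL whose query string sets connectTimeout=2000 and socketTimeout=15000. For access over the Internet, a larger value such as socket_timeout_s=60 matches the range recommended above.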

Related API operations

Operation: FailoverDBCluster

Description: Manually fails over from the primary node to a read-only node of a specified PolarDB for MySQL cluster. You can specify a read-only node as the new primary node.
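
The following sketch shows how the request parameters for this operation might be assembled before they are passed to an SDK or a signed HTTP call. It does not send a request. The parameter names DBClusterId and TargetDBNodeId are assumptions that you should verify against the API reference, and build_failover_request and the IDs in the usage note are hypothetical.

```python
def build_failover_request(cluster_id, target_node_id=None):
    """Assemble parameters for a FailoverDBCluster call (names are assumed)."""
    params = {
        "Action": "FailoverDBCluster",
        "DBClusterId": cluster_id,  # assumed name of the cluster ID parameter
    }
    if target_node_id is not None:
        # Assumed name of the parameter that selects the new primary node.
        # If it is omitted, the sketch leaves the choice to the system.
        params["TargetDBNodeId"] = target_node_id
    return params
```

For example, build_failover_request("pc-example", "pi-example") returns a dictionary that names the cluster and the read-only node to promote.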