PolarDB provides the persistent connection feature to prevent network interruptions and temporary failures to create new connections. These issues can be caused by operations and maintenance (O&M) operations, such as primary/secondary switchovers and version upgrades, or by other failures, such as the server that hosts a node becoming unavailable. The feature further improves the high availability of PolarDB.

Prerequisites

  • The cluster runs ApsaraDB PolarDB MySQL-compatible edition 5.6, 5.7, or 8.0, and the edition is Cluster Edition.
  • The cluster was created on March 16, 2021 or later.
    Note To enable this feature for a cluster that was created before March 16, 2021, submit a ticket.

Background

PolarDB uses high-availability components to perform a primary/secondary switchover when a node becomes unavailable. This keeps the service highly available. However, the switchover can adversely affect your service and cause issues such as network interruptions or temporary failures to create new connections. The service may be temporarily unavailable during O&M operations, such as primary/secondary switchovers and version upgrades, and when a failure occurs, such as when the server that hosts a node becomes unavailable.

In most cases, you can restart the application or implement a reconnection mechanism in the application to fix these issues. However, because development schedules are limited, these issues are often not taken into account in the early stage of development. As a result, a large number of exceptions may occur and the service may even become unavailable. The persistent connection feature of PolarDB prevents network interruptions and temporary failures to create new connections, whether they are caused by O&M operations or by other failures.
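The reconnection mechanism mentioned above can be as simple as a retry loop with backoff. The following is a minimal sketch, not a PolarDB API: `connect_with_retry`, `TransientConnectionError`, and the `connect` callable are hypothetical names, and the attempt count and delays are assumptions chosen for illustration.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical error raised when a connection attempt fails temporarily."""


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is exhausted.

    `connect` is any zero-argument callable that returns a connection object
    and raises TransientConnectionError on a temporary failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except TransientConnectionError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts: 0.5 s, 1 s, 2 s, ...
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed; surface the last error to the caller.
    raise last_error
```

In a real application, `connect` would wrap the connection call of your database driver and translate its temporary errors into the exception that the loop catches.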

How it works

Each PolarDB session consists of a frontend connection (between the application and the proxy) and a backend connection (between the proxy and the backend database). After the persistent connection feature is enabled, when the proxy is disconnected from the previous primary node (the primary node before the high-availability switchover), the frontend connection between the proxy and the application remains unchanged. The proxy creates a connection to the new primary node (the primary node after the high-availability switchover) and restores the previous session on it. As a result, the application is not aware of the switchover.

Typically, a MySQL session includes information such as system variables, user variables, temporary tables, character set encoding, transaction status, and PREPARE statement status. This topic uses the character set encoding as an example to describe how the session state is preserved when the persistent connection feature is enabled.

Assume that a connection is established between the application and the proxy, and the application runs the set names utf8; statement. The session is then in the names=utf8 state. When the proxy switches from the previous primary node to the new primary node, this state must remain unchanged. Otherwise, characters are garbled. Therefore, to keep a connection alive, the proxy must make sure that the session state is the same after the switching as before it.
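The idea of restoring session state on a new backend connection can be sketched as follows. This is an illustrative model only, not the PolarDB proxy implementation: `ProxySession` and `FakeBackend` are hypothetical names, and the sketch tracks only the character set, whereas a real session also carries variables, PREPARE statements, and other state.

```python
class FakeBackend:
    """Stand-in for a database connection; records the statements it receives."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)


class ProxySession:
    """One frontend session that can be moved to a new backend connection."""

    def __init__(self, backend):
        self.backend = backend
        self.state = {}  # session state to replay, keyed by setting name

    def execute(self, statement):
        parts = statement.strip().rstrip(";").split()
        # Track only `SET NAMES <charset>` in this sketch.
        if len(parts) == 3 and parts[0].lower() == "set" and parts[1].lower() == "names":
            self.state["names"] = parts[2]
        self.backend.execute(statement)

    def switch_backend(self, new_backend):
        # Replay the recorded state on the new backend before routing requests to it.
        if "names" in self.state:
            new_backend.execute(f"SET NAMES {self.state['names']};")
        self.backend = new_backend
```

After `switch_backend` is called, the frontend side of the session is untouched, and the new backend has received the same `SET NAMES` setting that the application applied earlier.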

Note When the proxy switches the connection from the previous primary node to the new one, both the previous and the new database may be unavailable for read and write requests for a short period of time. The length of this period depends on the database load. Therefore, the proxy temporarily stops routing requests from the application to the backend database during the switching. What the proxy does next depends on how long the new database takes to recover its read and write capability:
  • If the new database recovers within 60 seconds, the proxy routes requests to the new database.
  • If the new database fails to recover within 60 seconds, the proxy disconnects from the application, and the application must create a new connection. This is the same behavior as when the persistent connection feature is disabled.
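The 60-second rule in the note above amounts to a wait with a deadline. The sketch below is an assumption-laden illustration: `wait_for_recovery`, the `is_writable` probe, and the polling interval are hypothetical, and only the 60-second limit comes from this topic.

```python
import time

RECOVERY_TIMEOUT_SECONDS = 60  # limit stated in the note above


def wait_for_recovery(is_writable, timeout=RECOVERY_TIMEOUT_SECONDS,
                      poll_interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Return "route" if the new database recovers in time, else "disconnect".

    `is_writable` is a hypothetical zero-argument probe that returns True once
    the new database accepts read and write requests again.
    """
    deadline = clock() + timeout
    while True:
        if is_writable():
            return "route"        # resume routing requests to the new database
        if clock() >= deadline:
            return "disconnect"   # the application must create a new connection
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so that the logic can be exercised without actually waiting 60 seconds.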

How to enable the feature

  • For ApsaraDB PolarDB MySQL-compatible edition clusters that were created on or after March 16, 2021, this feature is enabled by default after you purchase the cluster. You do not need to enable it manually.
  • For ApsaraDB PolarDB MySQL-compatible edition clusters that were created before March 16, 2021, submit a ticket to enable the feature.

Considerations

Connections cannot be kept alive in the following scenarios:
  • When the connection switching starts, temporary tables exist in the session.
  • When the connection switching starts, the proxy is receiving result messages from the database but only receives part of the messages. For example, after you execute a SELECT statement, a result message that contains 100 MB of data is returned from the database. However, the proxy only receives 10 MB of data when the switching starts.
  • When the connection switching starts, transactions that are in progress, such as begin;insert into;, exist in the session.
Note For the last two scenarios, if the connection switching is caused by a switchover, the proxy first retains the connection for up to 2 seconds. If the remaining messages are returned or the transaction is completed within these 2 seconds, the proxy switches the connection after the wait. This way, connections are more likely to be kept alive.
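The rules above can be summarized as a small decision function. This is a simplified sketch rather than the proxy's actual logic: `SessionSnapshot` and `can_keep_alive` are hypothetical names, and the sketch assumes that the time needed to finish a partial result or an open transaction is known in advance, whereas a real proxy would simply wait and observe.

```python
from dataclasses import dataclass

GRACE_PERIOD_SECONDS = 2  # waiting time stated in the note above


@dataclass
class SessionSnapshot:
    """Hypothetical view of a session at the moment the switching starts."""
    has_temporary_tables: bool = False
    seconds_until_result_complete: float = 0.0   # 0 means no partial result is pending
    seconds_until_transaction_ends: float = 0.0  # 0 means no transaction is in progress


def can_keep_alive(session, caused_by_switchover):
    """Return True if the connection can be kept alive under the rules above."""
    if session.has_temporary_tables:
        return False
    pending = max(session.seconds_until_result_complete,
                  session.seconds_until_transaction_ends)
    if pending == 0:
        return True
    # Partial results and open transactions get a grace period only on a switchover.
    return bool(caused_by_switchover) and pending <= GRACE_PERIOD_SECONDS
```

For example, a session with a transaction that finishes in 1.5 seconds is kept alive during a switchover, but the same session is disconnected if the switching has a different cause.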

Performance testing (appendix)

  • Test environment
    • The following cluster is used:
      • An ApsaraDB PolarDB MySQL-compatible edition 8.0 cluster that contains one primary node and one read-only node.
      • The node specification is 4 cores and 16 GB of memory (polar.mysql.x4.large).
    • Test tool: SysBench.
    • Test data:
      • 20 tables are used in the test. Each table contains 10,000 rows.
      • The concurrency is 20.
  • Test method

    In different O&M scenarios, test the ratio of persistent connections for the PolarDB cluster. This ratio is the proportion of the connections that exist before an O&M operation that are still alive after the operation.

  • Test result

    In the following O&M scenarios, the ratio of persistent connections for the PolarDB cluster can reach 100%.

    Note
    • The ratio of persistent connections can reach 100% only when you upgrade specifications level by level. If you upgrade the specifications of the cluster from 4 cores to 16 cores or more, a network interruption may occur.
    • If a database proxy node is scaled in when a read-only node is deleted, network interruptions may occur for some connections.
    • The kernel minor version upgrade scenario covers only minor version upgrades of the database kernel engine. A minor version upgrade of the database proxy may cause a network interruption.
    Scenario                                   Ratio of persistent connections
    Primary/secondary switchover               100%
    Upgrade the minor version of the kernel    100%
    Upgrade cluster specifications             100%
    Add or remove nodes                        100%
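
The ratio of persistent connections described under Test method can be computed from the set of connections observed before and after an O&M operation. The following sketch assumes that each connection has a stable identifier; `persistent_connection_ratio` is a hypothetical helper and is not part of SysBench or PolarDB.

```python
def persistent_connection_ratio(before_ids, after_ids):
    """Percentage of connections open before the operation that are still open after it."""
    before = set(before_ids)
    if not before:
        raise ValueError("no connections were open before the operation")
    # Connections created after the operation do not count as kept alive.
    kept = before & set(after_ids)
    return 100.0 * len(kept) / len(before)
```

With a concurrency of 20, a ratio of 100% means that all 20 connections that were open before the operation are still open after it.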