Zero-downtime database major version upgrade best practices - ApsaraDB RDS

This topic describes how to perform a major version upgrade for an RDS PostgreSQL database with zero downtime, helping you avoid service interruptions during the upgrade process.

Upgrade process

Pre-upgrade check.
Create a zero-downtime upgrade task.
Switch to the higher version instance.

Upgrade impacts

During the zero-downtime upgrade process, the following phases may affect instance usage:

DDL restriction period: From the start of the upgrade task until the instance switch is completed, all DDL operations are prohibited.
WAL log accumulation period: Before the major version upgrade is performed, a logical replication slot is created on the source instance so that changes made during the upgrade can be replicated to the destination afterward and the data stays consistent. From the time the slot is created until logical replication is established with the upgraded destination instance, WAL logs accumulate on the source.
Note
You can determine the WAL log accumulation period from the upgrade logs in the upgrade history. The logs record when the replication slots and publications are created on the source. WAL logs are retained from that point and accumulate until the subscriber on the destination starts and establishes the logical replication relationship. After that, the retained WAL logs are consumed and no longer accumulate.
Logical replication synchronization period: The entire period from establishing the logical replication relationship until the instance switch is completed. Logical replication places additional resource load on the instance. The load depends mainly on the number of databases and the amount of traffic.
Switch period: The instance is in read-only mode. The read-only duration depends on the number of sequences.
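
To get a rough view of how much WAL a logical replication slot is holding back during the accumulation period, you can compare the slot's restart position with the current write position on the source. The following is a minimal sketch, assuming that you can run SQL on the source instance with a client of your choice. The query uses the standard PostgreSQL catalog view pg_replication_slots and the function pg_current_wal_lsn(). The helper names lsn_to_int and retained_wal_bytes are illustrative and are not part of any RDS tool.

```python
# Sketch: estimate how much WAL a logical replication slot retains.
# SLOT_QUERY is meant to be run on the source instance with any SQL client.
SLOT_QUERY = """
SELECT slot_name,
       pg_current_wal_lsn() AS current_lsn,
       restart_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position.

    A pg_lsn is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
```

PostgreSQL can also compute the same difference on the server side with pg_wal_lsn_diff(). The client-side version above is shown only to make the arithmetic explicit.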

Step 1: Pre-upgrade check

A zero-downtime major version upgrade relies on logical replication, and DDL operations are prohibited throughout the process. The pre-upgrade check therefore verifies that the instance does not run into the limitations of logical replication and that no plugins that might trigger DDL operations are in use.

Log on to the ApsaraDB RDS console and go to the Instances page. In the top navigation bar, select the region in which the RDS instance resides. Then, find the RDS instance and click the ID of the instance.
(Optional) If the instance to be upgraded has read-only instances, change the connection address configured in your application from the read-only instance to the primary instance, and then delete the read-only instances.
Note
For service stability purposes, we recommend that you modify the endpoint configuration on your application during off-peak hours.
In the left-side navigation pane, click Major Version Upgrade.
Note
If you do not see Major Version Upgrade in the console, check the version and series configuration of your RDS PostgreSQL instance. For more information, see Upgrade the major engine version.
On the Upgrade Check tab, click Create Upgrade Check Report.
Set Select The Upgrade Version to the target version, set Upgrade Mode to Zero Downtime, and then click OK.
You can view the upgrade check results in the Upgrade Check Log section.
Important
- Proceed with the subsequent upgrade steps only after the Check Result of the upgrade check report shows that the check succeeded. If the check fails, click View Information to view the details of the report. For common errors and their causes, see Interpreting the RDS PostgreSQL major version upgrade check report.
- After the upgrade check is successful, if plugins have been created on the destination instance, you need to perform the check again.
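
The upgrade check report is the authoritative result. To illustrate one well-known limitation of PostgreSQL logical replication, the following sketch looks for tables that have no replica identity, because UPDATE and DELETE operations on such tables cannot be published. It is a hedged example and not a description of what the console check does. The query uses the standard catalogs pg_class, pg_namespace, and pg_index, and the function names are hypothetical.

```python
# Sketch: find tables whose UPDATE/DELETE changes cannot be published
# by logical replication. REPLICA_IDENTITY_QUERY is meant to be run in
# each database of the source instance with any SQL client.
REPLICA_IDENTITY_QUERY = """
SELECT n.nspname,
       c.relname,
       c.relreplident,
       EXISTS (SELECT 1
               FROM pg_index i
               WHERE i.indrelid = c.oid AND i.indisprimary) AS has_primary_key
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
"""


def lacks_replica_identity(relreplident: str, has_primary_key: bool) -> bool:
    """Return True if the table has no usable replica identity.

    relreplident values: 'd' = DEFAULT (primary key, if any),
    'n' = NOTHING, 'f' = FULL, 'i' = USING INDEX.
    """
    if relreplident == "n":
        return True
    if relreplident == "d":
        return not has_primary_key
    return False


def tables_needing_attention(rows):
    """Filter rows shaped like the output of REPLICA_IDENTITY_QUERY."""
    return [
        f"{schema}.{table}"
        for schema, table, relreplident, has_pk in rows
        if lacks_replica_identity(relreplident, has_pk)
    ]
```

A table reported by this sketch can usually be fixed by adding a primary key or by setting REPLICA IDENTITY FULL. Either change is a DDL operation, so make it before you create the upgrade task.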

Step 2: Upgrade the major version

Click the Upgrade Instance tab, read the warning, set Select The Upgrade Version to the target version, and then click Create Upgrade Task.
In the dialog box that appears, read the prompt and click OK.
In the Create Major Version Upgrade Task section, select Zero-downtime for Upgrade Mode.
Click Create Now.
When the instance status changes to Migrating, the upgrade task has started.
You can view the upgrade results on the Upgrade History tab.

Monitor instance load

During the upgrade process, you can monitor the instance load and disk usage to understand the performance status of the instance.

During the WAL log accumulation period, the disk usage of the instance will temporarily increase. After logical replication is established, disk usage will decrease.
During the logical replication phase, traffic and the number of databases will affect the load on the source instance. You can evaluate the impact of logical replication by observing the resource usage of wal_sender under various resource categories.
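
Because disk usage rises during the WAL log accumulation period, it can help to project how long the current growth rate can continue before the disk reaches a level that you consider unsafe. The following is a minimal sketch under simple assumptions: growth is linear between two disk usage samples that you take from the monitoring page, and the 90% threshold is an arbitrary illustrative default. The function name is hypothetical.

```python
# Sketch: linear projection of disk usage during WAL accumulation.
from typing import Optional


def seconds_until_threshold(
    used_start_gb: float,
    used_end_gb: float,
    interval_s: float,
    capacity_gb: float,
    threshold: float = 0.9,
) -> Optional[float]:
    """Estimate the seconds left until disk usage reaches the threshold.

    used_start_gb and used_end_gb are two usage samples taken
    interval_s seconds apart. Returns None if usage is not growing,
    and 0.0 if the threshold has already been reached.
    """
    growth_per_s = (used_end_gb - used_start_gb) / interval_s
    if growth_per_s <= 0:
        return None
    remaining_gb = capacity_gb * threshold - used_end_gb
    return max(remaining_gb, 0.0) / growth_per_s
```

For example, two samples of 50 GB and 51 GB taken 60 seconds apart on a 100 GB disk leave 39 GB of headroom below the 90% threshold at a growth rate of 1 GB per minute.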

Step 3: Switch to the higher version

Verify the higher version instance.
When the instance status changes from Migrating to Migrating Data, logical replication has been established and the creation of the upgrade task is complete. You can now verify the data in the higher version instance.
Go to the Upgrade History tab and use the Higher Version Verification Address of the target upgrade record to connect to the higher version instance to verify the upgraded data.
Note
The higher version instance is in Read-Only mode and cannot be written to.
Switch to the higher version instance.
After you confirm that the data in the higher version instance meets your expectations and the Upgrade Result is Synchronizing, click Change in the Upgrade Log column to switch your business to the higher version instance.
Note
- If the Upgrade Result shows another status, see Upgrade result description for handling instructions.
- If you decide to abandon this upgrade, click Cancel in the Upgrade Log column. This deletes the logical replication slot, removes the impact of logical replication on the source instance, and allows DDL operations on the source instance again.
Set the Tolerable Write Suspension Time (in seconds), and click OK.
Logical replication does not synchronize sequences, so sequences are synchronized in a separate phase during the switch. Set the Tolerable Write Suspension Time to make sure that the switch either completes within an acceptable time or is abandoned. During this phase, the Upgrade Result changes to Read Only. If the specified time is exceeded, the system returns to the Synchronizing state and removes the read-only restriction.
Check the switch result.
- When the instance status is Migrating, the switch is in progress. To cancel the switch, go to the Upgrade History tab of Major Version Upgrade and click Break in the Upgrade Log column.
- When the instance status changes to The Instance Is Running, the switch is successful. On the Basic Information page of the instance, you can view the current version of the instance.
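
To see why the read-only duration during the switch depends on the number of sequences, consider how sequence values could be carried over by hand: each sequence needs its own statement on the destination. The following sketch generates such statements from rows of the standard pg_sequences view. It only illustrates the idea. It is an assumption that the service works in a similar way, and the function names are hypothetical.

```python
# Sketch: build one setval() statement per sequence.
# SEQUENCE_QUERY is meant to be run on the source instance.
SEQUENCE_QUERY = "SELECT schemaname, sequencename, last_value FROM pg_sequences;"


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def setval_statements(sequences):
    """Return setval() statements for rows of (schema, name, last_value).

    last_value is NULL (None) for a sequence that has never been used,
    so such sequences are skipped.
    """
    statements = []
    for schema, name, last_value in sequences:
        if last_value is None:
            continue
        qualified = f"{quote_ident(schema)}.{quote_ident(name)}"
        # The qualified name is passed as a string literal (cast to regclass),
        # so single quotes inside it must be doubled.
        literal = qualified.replace("'", "''")
        statements.append(f"SELECT setval('{literal}', {int(last_value)}, true);")
    return statements
```

The number of generated statements grows linearly with the number of sequences, which is consistent with the read-only period being longer on instances that have many sequences.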

Rollback

To roll back to the lower version after the switch, clone a new instance from the last backup of the lower version instance that was taken before the switch. Then, change the connection address of the cloned instance to the connection address of the old instance.

Note

Before switching, the system will back up the lower version instance.

Upgrade result description

The upgrade records on the Upgrade History tab mainly include the following statuses for Upgrade Result:

Status	Meaning	Available Actions
Running	The upgrade task is running.	None.
Synchronizing	Logical replication status is normal.	Change: Switch to the higher version instance. Cancel: Abandon this upgrade.
Replication Interrupted	Logical replication status is abnormal.	View the upgrade log to determine the cause of the replication exception. Cancel: Abandon this upgrade.
Read Only	Switching is in progress, the instance is in read-only mode, and Sequence tables are being synchronized.	Break: Cancel this switch operation.
Switch	Sequence table synchronization is complete, and final tasks are in progress.	None.
Canceled	The upgrade task is canceled.	None.
Success	The upgrade task is successful.	None.
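
If you track upgrade tasks in your own runbooks or scripts, the table above can be captured as a simple lookup. The following sketch encodes the statuses and available actions exactly as listed in the table. The names AVAILABLE_ACTIONS, can_switch, and is_finished are hypothetical and do not correspond to any RDS API.

```python
# Sketch: the Upgrade Result statuses and their available actions,
# transcribed from the table in this topic.
AVAILABLE_ACTIONS = {
    "Running": [],
    "Synchronizing": ["Change", "Cancel"],
    "Replication Interrupted": ["Cancel"],
    "Read Only": ["Break"],
    "Switch": [],
    "Canceled": [],
    "Success": [],
}


def can_switch(status: str) -> bool:
    """Return True only if the Change action is available for the status."""
    return "Change" in AVAILABLE_ACTIONS.get(status, [])


def is_finished(status: str) -> bool:
    """Return True if the upgrade task has reached a final status."""
    return status in ("Canceled", "Success")
```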