In a two-node HA architecture across data centers, how can we ensure an RPO of 0 (reliability) while keeping the RTO under control (availability)?
The semi-synchronous feature is a good choice.
1) When only one node fails, RPO is 0. Examples:
Primary -> Standby (failed)
Primary (failed) -> Standby
2) One node fails and, before it recovers and synchronous mode is restored, the (current) primary node also fails. In this case, RPO is greater than 0: although the standby database is OPEN at this time, synchronous mode has not yet been restored. Example:
Primary (failed) -> Standby (OPEN, but previously failed and synchronization not yet restored)
RPO > 0, similar to the case where both nodes fail at the same time.
3) How can the RTO be kept controllable?
In synchronous mode, before a commit returns, the transaction must wait for feedback from the sync standby's WAL replication, ensuring the transaction's WAL is persisted in multiple replicas before the client gets the result. The sequence is: first, the WAL is persisted on the primary; then it is streamed to the sync standby; finally, the primary waits for the standby to report the synchronized WAL position. When the standby fails, this wait is open-ended. So plain two-node synchronous replication cannot guarantee availability (RTO).
Therefore, what is the solution?
The wait event status in pg_stat_activity can be monitored. If a synchronous transaction has been waiting longer than a certain threshold (the RTO threshold), degrade to asynchronous mode.
Database restart is not required for degradation.
3.1) Modify the configuration.
3.2) Reload. This setting takes immediate effect for both existing and new connections.
3.3) Send a cancel signal to the processes currently waiting on synchronous replication.
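The three degrade steps above can be sketched as an external monitoring script. This is a minimal sketch, not the author's actual tooling: it assumes `psql` is on PATH and connects to the primary with default settings, and the 10-second threshold and function names (`should_degrade`, `max_syncrep_wait`) are illustrative.

```shell
#!/bin/sh
# Hypothetical degrade sketch: monitor SyncRep waits and degrade to
# asynchronous mode when the RTO threshold is exceeded.
# Assumes psql is on PATH and connects to the primary by default.
RTO_THRESHOLD=10   # seconds (illustrative)

# Decide whether to degrade, given the longest SyncRep wait in whole
# seconds. An empty value means no transaction is waiting (the NULL
# case), so we must not degrade.
should_degrade() {
    wait_secs="$1"
    [ -n "$wait_secs" ] && [ "$wait_secs" -gt "$RTO_THRESHOLD" ]
}

# Longest SyncRep wait in whole seconds; empty when nothing is waiting.
max_syncrep_wait() {
    psql -At -c "select coalesce(max(extract(epoch from now()-query_start))::int::text,'')
                 from pg_stat_activity where wait_event='SyncRep';"
}

if should_degrade "$(max_syncrep_wait)"; then
    psql -c "alter system set synchronous_commit=local;"   # 3.1 modify
    psql -c "select pg_reload_conf();"                     # 3.2 reload
    psql -c "select pg_cancel_backend(pid)
             from pg_stat_activity where wait_event='SyncRep';"  # 3.3 cancel waiters
fi
```

Keeping the threshold decision in its own function makes the NULL case (no waiting transaction) explicit, so an idle system is never degraded by mistake.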
4) After degrading, how can the mode be upgraded back to synchronous?
Similarly, pg_stat_replication can be monitored. When the sync standby is back in streaming state, the mode can be switched back to synchronous.
Database restart is not required for the upgrade.
4.1) Modify configuration.
4.2) Reload. The setting takes effect immediately for both existing and new connections.
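The upgrade can be automated the same way. A minimal sketch, assuming `psql` is on PATH and connects to the primary by default; the function name `sync_standby_streaming` is illustrative:

```shell
#!/bin/sh
# Hypothetical upgrade sketch: switch back to synchronous mode once the
# sync standby is streaming again.
# Assumes psql is on PATH and connects to the primary by default.

# Succeeds when a sync standby is in streaming state, i.e. it has
# caught up with the primary's WAL.
sync_standby_streaming() {
    n=$(psql -At -c "select count(*) from pg_stat_replication
                     where sync_state='sync' and state='streaming';")
    [ "${n:-0}" -ge 1 ]
}

if sync_standby_streaming; then
    psql -c "alter system set synchronous_commit=remote_write;"  # 4.1 modify
    psql -c "select pg_reload_conf();"                           # 4.2 reload
fi
```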
1) Transaction commit parameter.
synchronous_commit: on, remote_apply, remote_write, local
2) Synchronous configuration parameters.
synchronous_standby_names:
[FIRST] num_sync ( standby_name [, ...] )
ANY num_sync ( standby_name [, ...] )
standby_name [, ...]
Examples: ANY 3 (s1, s2, s3, s4), FIRST 3 (s1, s2, s3, s4). * means all nodes.
3) Active sessions: view the wait event status when a transaction commits (pg_stat_activity).
4) Streaming status: pg_stat_replication.
sync_state = 'sync'
state (text): current WAL sender state. Possible values:
startup: this WAL sender is starting up.
catchup: this WAL sender's connected standby is catching up with the primary.
streaming: this WAL sender is streaming changes after its connected standby has caught up with the primary.
backup: this WAL sender is sending a backup.
stopping: this WAL sender is stopping.
postgresql.conf
synchronous_commit = remote_write
wal_level = replica
max_wal_senders = 8
synchronous_standby_names = '*'
recovery.conf
restore_command = 'cp /data01/digoal/wal/%f %p'
primary_conninfo = 'host=localhost port=8001 user=postgres'
Stop the standby to simulate a standby database failure and observe how the semi-synchronous behavior is achieved.
Then recover the standby to simulate the upgrade back to synchronous mode.
1) Monitor pg_stat_activity. If a transaction's commit wait exceeds a certain threshold (the RTO threshold), degrade.
select max(now()-query_start) from pg_stat_activity where wait_event='SyncRep';
2) Check the wait time returned by the preceding query.
When it exceeds the threshold (the RTO threshold), start the degradation.
Note that a NULL result means no transaction is currently in the SyncRep wait state.
3) Degrade step 1: modify the synchronous_commit parameter. Switch to local WAL persistence with asynchronous streaming replication.
alter system set synchronous_commit=local;
4) Degrade step 2: reload so the parameter takes effect.
5) Degrade step 3: clear the current wait queue. After a process waiting on SyncRep receives the CANCEL signal, it leaves the queue and notifies the client that the current transaction's WAL has been persisted locally. The transaction ends normally.
select pg_cancel_backend(pid) from pg_stat_activity where wait_event='SyncRep';
6) A client whose backend received the cancel signal gets a normal result and sees the transaction committed.
postgres=# end;
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already been committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:264
COMMIT
The redo information of the transaction has been persisted in the local WAL, and the commit status is normal.
Subsequent requests in the current session use asynchronous streaming replication, that is, local WAL persistence (synchronous_commit=local).
The upgrade process is as follows:
7) Upgrade step 1: monitor the standby state. If the standby with sync_state='sync' is in streaming state, the standby's WAL has fully caught up with the primary's.
select * from pg_stat_replication where sync_state='sync' and state='streaming';
A returned row indicates that the standby has caught up with the primary's WAL, so synchronous mode can be restored.
8) Upgrade step 2: switch the transaction commit mode back to synchronous with synchronous_commit=remote_write. A committing transaction then waits until the sync standby has received and written the WAL.
alter system set synchronous_commit=remote_write;
9) Upgrade step 3: reload so the parameter takes effect. synchronous_commit=remote_write then applies to all sessions, both existing and new connections.
1) Without modifying the PostgreSQL kernel, a two-node semi-synchronous mode can be implemented through external monitoring and control, for example with a 5-second monitoring interval. As long as the two-node (or remaining single-node) structure runs normally, RPO is 0 and RTO is controllable, for example by degrading when the maximum wait_event='SyncRep' wait time exceeds 10 seconds.
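Such an external monitor can be written as a single watchdog tick, run every 5 seconds. This is a self-contained, hypothetical sketch, assuming `psql` is on PATH and connects to the primary by default; the function name and the printed action strings are illustrative:

```shell
#!/bin/sh
# Hypothetical watchdog combining the degrade and upgrade paths.
# Run one tick every 5 seconds, e.g.:
#   while true; do watchdog_tick; sleep 5; done
RTO_THRESHOLD=10   # seconds (the example threshold from the text)

# One monitoring iteration; prints the action taken: degrade/upgrade/noop.
watchdog_tick() {
    mode=$(psql -At -c "show synchronous_commit;")
    if [ "$mode" = "remote_write" ]; then
        # Synchronous: degrade if a commit has waited on SyncRep too long.
        w=$(psql -At -c "select coalesce(max(extract(epoch from now()-query_start))::int,0)
                         from pg_stat_activity where wait_event='SyncRep';")
        if [ "${w:-0}" -gt "$RTO_THRESHOLD" ]; then
            psql -c "alter system set synchronous_commit=local;" >/dev/null
            psql -c "select pg_reload_conf();" >/dev/null
            psql -c "select pg_cancel_backend(pid) from pg_stat_activity
                     where wait_event='SyncRep';" >/dev/null
            echo degrade; return
        fi
    else
        # Degraded: upgrade once the sync standby is streaming again.
        n=$(psql -At -c "select count(*) from pg_stat_replication
                         where sync_state='sync' and state='streaming';")
        if [ "${n:-0}" -ge 1 ]; then
            psql -c "alter system set synchronous_commit=remote_write;" >/dev/null
            psql -c "select pg_reload_conf();" >/dev/null
            echo upgrade; return
        fi
    fi
    echo noop
}
```

Branching on the current synchronous_commit value keeps the tick idempotent: a healthy synchronous system and a degraded system each take at most one action per interval.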
2) Kernel modification suggestions.