×
Community Blog Physical Stream Replication of Standby Database in PostgreSQL

Physical Stream Replication of Standby Database in PostgreSQL

This article discusses the conflicts and playback control parameters related to stream replication.

By digoal

Background

Conflicts and playback control parameters related to stream replication:

  • hot_standby_feedback: The standby database feeds snapshots back. Garbage is generated when the oldest snapshot is not recycled during the upstream vacuum.
  • vacuum_defer_cleanup_age: The active database does not recycle garbage younger than this one.
  • recovery_min_apply_delay: The active database does not replay WAL newer than this time.
  • max_standby_streaming_delay: How long does the startup wait for conflict response when the startup replays the stream WAL record?
  • max_standby_archive_delay: How long does startup wait for conflict response when startup replays the archive (restore command) WAL record?

Some Questions:

  1. How many times do conflicts occur?
  2. What are the reasons?
  3. Is there any conflict that affects the redo playback operation in the current database?
  4. Who conflicts with redo playback?

Number of Conflicts

db1=# select * from pg_stat_database_conflicts ;  
 datid |  datname  | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock   
-------+-----------+------------------+------------+----------------+-----------------+----------------  
 14187 | postgres  |                0 |          0 |              0 |               0 |              0  
 16385 | db_video  |                0 |          0 |              0 |               0 |              0  
     1 | template1 |                0 |          0 |              0 |               0 |              0  
 14186 | template0 |                0 |          0 |              0 |               0 |              0  
 16387 | db1       |                0 |          0 |              4 |               0 |              0  
 17527 | db123     |                0 |          0 |              0 |               0 |              0  
(6 rows)  

Current Conflict Status

If the wait event for the startup process is empty, it means the process is blocked.

postgres=# select * from pg_stat_activity where backend_type ='startup' where wait_event is null;  
-[ RECORD 1 ]----+---------------------------------  
datid            |   
datname          |   
pid              | 21060  
usesysid         |   
usename          |   
application_name |   
client_addr      |   
client_hostname  |   
client_port      |   
backend_start    | 2020-02-29 00:26:28.478013+08  
xact_start       |   
query_start      |   
state_change     |   
wait_event_type  |   
wait_event       |   
state            |   
backend_xid      |   
backend_xmin     |   
query            |   
backend_type     | startup  

Current Startup Wait Event

When startup replays WAL, what event is it waiting for? Usually, there are also I/O and other operations. This is not conflict blocking.

postgres=# select * from pg_stat_activity where backend_type ='startup';  
-[ RECORD 1 ]----+------------------------------  
datid            |   
datname          |   
pid              | 21060  
usesysid         |   
usename          |   
application_name |   
client_addr      |   
client_hostname  |   
client_port      |   
backend_start    | 2020-02-29 00:26:28.478013+08  
xact_start       |   
query_start      |   
state_change     |   
wait_event_type  | IO  
wait_event       | DataFileExtend  
state            |   
backend_xid      |   
backend_xmin     |   
query            |   
backend_type     | startup  

RecoveryWalAll indicates that the startup process is waiting for WAL, but the standby database is usually not in the delay state.

postgres=# select * from pg_stat_activity where backend_type ='startup';  
-[ RECORD 1 ]----+------------------------------  
datid            |   
datname          |   
pid              | 21060  
usesysid         |   
usename          |   
application_name |   
client_addr      |   
client_hostname  |   
client_port      |   
backend_start    | 2020-02-29 00:26:28.478013+08  
xact_start       |   
query_start      |   
state_change     |   
wait_event_type  | Activity  
wait_event       | RecoveryWalAll  
state            |   
backend_xid      |   
backend_xmin     |   
query            |   
backend_type     | startup  

Some Query May Have Blocked Startup

Generally, the earlier the block occurs, the more likely it is because some query blocks startup.

It is the same when values of xmin and xid are smaller.

db1=# select *,xact_start,query_start,state,user,query from pg_stat_activity where datname=current_database() and state<>'idle' order by xact_start limit 5;  
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------------------------------  
datid            | 16387  
datname          | db1  
pid              | 29015  
usesysid         | 10  
usename          | postgres  
application_name | psql  
client_addr      |   
client_hostname  |   
client_port      | -1  
backend_start    | 2020-03-10 19:01:22.577305+08  
xact_start       | 2020-03-10 19:01:42.257888+08  
query_start      | 2020-03-10 19:01:43.750416+08  
state_change     | 2020-03-10 19:01:43.750577+08  
wait_event_type  | Client  
wait_event       | ClientRead  
state            | idle in transaction  
backend_xid      |   
backend_xmin     | 4556  
query            | select * from abc limit 1;  
backend_type     | client backend  
xact_start       | 2020-03-10 19:01:42.257888+08  
query_start      | 2020-03-10 19:01:43.750416+08  
state            | idle in transaction  
user             | postgres  
query            | select * from abc limit 1;  

You can use the following queries:

1.  The longest time

select a.* from   
(  
select *,row_number() over (partition by state order by xact_start) as rn   
from pg_stat_activity   
where datname=current_database()   
and pid<>pg_backend_pid()   
and state<>'idle'   
) a,  
(  
select * from pg_stat_activity where backend_type ='startup' and wait_event is null   
)b  
where a.rn <= 1   
order by a.xact_start;   

2.  Or (the earliest transaction number)

select a.* from   
(  
select *,row_number() over (partition by state order by least(backend_xid::text::int8,backend_xmin::text::int8)) as rn   
from pg_stat_activity   
where datname=current_database()   
and pid<>pg_backend_pid()   
and state<>'idle'   
) a,  
(  
select * from pg_stat_activity where backend_type ='startup' and wait_event is null   
)b  
where a.rn <= 1   
order by least(a.backend_xid::text::int8,a.backend_xmin::text::int8);   

The query that may block WAL replay is listed below:

-[ RECORD 1 ]----+------------------------------  
datid            | 16387  
datname          | db1  
pid              | 30448  
usesysid         | 10  
usename          | postgres  
application_name | psql  
client_addr      |   
client_hostname  |   
client_port      | -1  
backend_start    | 2020-03-10 19:13:36.670184+08  
xact_start       | 2020-03-10 19:13:38.696822+08  
query_start      | 2020-03-10 19:13:40.856399+08  
state_change     | 2020-03-10 19:13:40.85716+08  
wait_event_type  | Client  
wait_event       | ClientRead  
state            | idle in transaction  
backend_xid      |   
backend_xmin     | 4561  
query            | select * from abc limit 1;  
backend_type     | client backend  
rn               | 1  

Note: Only the query that seems to be blocking startup replay can be found, which is inaccurate.

Perhaps in the future, the kernel will support such interfaces, which can accurately find queries that block WAL replay.

How Many WALs Are Blocked and Not Replayed?

db1=# select pg_is_wal_replay_paused(),  
pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn(),  
pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(),pg_last_wal_replay_lsn()));  
 pg_is_wal_replay_paused | pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_size_pretty   
-------------------------+-------------------------+------------------------+----------------  
 f                       | 4/BCF4A338              | 4/BCF49EE8             | 1104 bytes  
(1 row)  
0 0 0
Share on

digoal

276 posts | 24 followers

You may also like

Comments