Elizabeth
Engineer
Engineer
  • UID625
  • Fans4
  • Follows1
  • Posts68
Reads:3489Replies:0

[PostgreSQL Development]PostgreSQL source code analysis Standby query conflicts - User was holding shared buffer pin for too

Created#
More Posted time:Sep 2, 2016 11:17 AM
Background
Stream replication-based physical standby database of PostgreSQL relies on redo physical blocks for replication, only read-only features are allowed. But please note that since the master database may constantly generate redo logs, these redo logs may be in conflict with the query of the standby database.
When will queries be congested or in conflict with the recovery?
When the redo logs generated by the following operations are copied to the standby database, and the standby database prepares to use these redo logs for recovery:
• Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
Access to exclusive locks on the master database is in conflict with the lock in the standby database.
For example, the master database truncates table a, and the standby database queries table a.
But such conflicts are within a narrow range.
• Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
Tablespace of the master database is deleted while the tablespace is being used by the standby database for temporary work files.  For example, the TBS tablespace is deleted on the master database, and a major query in the standby database needs to write a temporary file that is written to the tablespace.
This case is rare and can be easily avoided. You can create a new tablespace and do not delete it.
• Dropping a database on the primary conflicts with sessions connected to that database on the standby.
A database on the master database is deleted and the standby database has a session connected to that database.
This case is also rare.
• Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.
The master database collects dead tuple REDO logs, and at the same time, the current query snapshot of the standby database needs to see the logs.
This issue can be solved by arguments, namely prioritizing recovery or prioritizing query.  The time window can be configured.
The probability of such conflicts is low, unless users use repeatable read on the standby database for a sizable transaction.
Usually users use read committed.
• Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
The same with the above case. But when the queried page is just the page to be cleaned up, conflict still arises.
This is the only difference between physical replication and logic replication. But for actual scenarios, the probability of such cases is also low.
Cases
I recently received a conflict case like this: query of some functions in the standby database receives such errors:
< 2016-08-09 22:03:39.534 CST >STATEMENT:  SELECTxxx('32980770','-1','0','0',20,100,10)
< 2016-08-09 22:03:39.534 CST >ERROR:  canceling statement due to conflict with recovery
< 2016-08-09 22:03:39.534 CST >DETAIL:  User was holding shared buffer pin for too long.
< 2016-08-09 22:03:39.534 CST >CONTEXT:  PL/pgSQL function xxx(integer,bigint[],integer,integer) line 7 at RETURN QUERY
  PL/pgSQL function xxx(integer,integer,bigint,bigint,integer,integer,integer) line 15 at RETURN QUERY


What’s the cause?
The types of conflicts between apply and query to PostgreSQL standby are as follows:
src/include/storage/procsignal.h
/* Recovery conflict reasons */
        PROCSIG_RECOVERY_CONFLICT_DATABASE,
        PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
        PROCSIG_RECOVERY_CONFLICT_LOCK,
        PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
        PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
        PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,


The conflict in the error of the example belongs to PROCSIG_RECOVERY_CONFLICT_BUFFERPIN
It comes from the function
/*
 * errdetail_recovery_conflict
 *
 * Add an errdetail() line showing conflict source.
 */
static int
errdetail_recovery_conflict(void)
{
        switch (RecoveryConflictReason)
        {
                case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
                        errdetail("User was holding shared buffer pin for too long.");
                        break;


When the standby applies redo, the conflict arises. The calls are as follows:


Source code
When the standby applies redo, if a vacuum operation appears, it will trigger the following:
The vacuum operation must get the block’s exclusive lock to continue.
void
LockBufferForCleanup(Buffer buffer)
{
...
    for (;;)
    {
        /* Try to acquire lock */
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        LockBufHdr(bufHdr);
...
        /* Wait to be signaled by UnpinBuffer() */
        if (InHotStandby)
        {
            /* Publish the bufid that Startup process waits on */
            SetStartupBufferPinWaitBufId(buffer - 1);
            /* Set alarm and then wait to be signaled by UnpinBuffer() */
            ResolveRecoveryConflictWithBufferPin();
            /* Reset the published bufid */
            SetStartupBufferPinWaitBufId(-1);
        }


If it is standby, the command will judge timeout. If there is no timeout, vacuum redo will continue to wait.
void
ResolveRecoveryConflictWithBufferPin(void)
{
        TimestampTz ltime;

        Assert(InHotStandby);

        ltime = GetStandbyLimitTime();

        if (ltime == 0)
        {
                /*
                 * We're willing to wait forever for conflicts, so set timeout for
                 * deadlock check only
                 */
                enable_timeout_after(STANDBY_DEADLOCK_TIMEOUT, DeadlockTimeout);
        }
        else if (GetCurrentTimestamp() >= ltime)
        {
                /*
                 * We're already behind, so clear a path as quickly as possible.
                 */
                SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
        }


Get the timeout time. If timeout occurs, SendRecoveryConflictWithBufferPin is triggered, leading to case error.
/*
 * Determine the cutoff time at which we want to start canceling conflicting
 * transactions.  Returns zero (a time safely in the past) if we are willing
 * to wait forever.
 */
static TimestampTz
GetStandbyLimitTime(void)
{
        TimestampTz rtime;
        bool            fromStream;

        /*
         * The cutoff time is the last WAL data receipt time plus the appropriate
         * delay variable.  Delay of -1 means wait forever.
         */
        GetXLogReceiptTime(&rtime, &fromStream);
        if (fromStream)
        {
                if (max_standby_streaming_delay < 0)
                        return 0;                       /* wait forever */
                return TimestampTzPlusMilliseconds(rtime, max_standby_streaming_delay);
        }
        else
        {
                if (max_standby_archive_delay < 0)
                        return 0;                       /* wait forever */
                return TimestampTzPlusMilliseconds(rtime, max_standby_archive_delay);
        }
}


How to avoid or reduce conflicts
PostgreSQL offers three solutions for conflicts between queries and recoveries on the standby database.
• Configure vacuum_defer_cleanup_age on the master database to solve the last two kinds of conflicts.
vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

• Configure recovery delay on the standby database to solve all the conflicts above, and set an execution window for the standby database query.
max_standby_archive_delay = 30s        # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
#
max_standby_streaming_delay = 30s      # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay


• Configure hot_standby_feedback on the standby database, and the standby database will send query feedback to the master database (maybe snapshot instead of the query itself) (in a cascading environment, the feedback will be sent to the top layer master database), so that the master database knows what the standby database is processing and takes its tasks into consideration during cleanup dead tuple to avoid conflicts.
hot_standby_feedback = off             # send info from standby to prevent query conflicts
#
wal_retrieve_retry_interval = 1s
Guest