×
Community Blog Three Key Details and Principles of PostgreSQL Online Backup & Recovery

Three Key Details and Principles of PostgreSQL Online Backup & Recovery

This article discusses the process of online backup restoration and addresses three key questions regarding the recovery of database consistency.

By digoal

This article answers 3 questions:

1.  From which WAL/ LSN does online backup data restoration begin?

Executing pg_start_backup triggers a checkpoint, and the StartLSN of this checkpoint is recorded in the backup_label file. This is because any necessary full page writes (FPWs) start from this position.

2.  To what WAL LSN must online backup data be restored for the database to be consistent? ((Why are there inconsistencies? Some disk operations during the backup process, such as partially copying a block, can result in the backup file containing half old and half new data for that block. This issue does not occur in file systems that support CopyOnWrite, such as ZFS, hence FPW can be closed.)

Without using block checksums, we cannot identify which data blocks in the backup file may be inconsistent.

Similarly, we cannot determine where the FPW corresponding to a block is located in the backup file unless the WAL is sequentially scanned from the startpoint until the corresponding FPW is encountered.

The FPW generated when a block is first modified after the checkpoint may exist in the WAL position of the stop point in the copied backup file, so restoring to this specific WAL LSN position ensures the consistency of the database.

A stop point refers to the LSN of the mark written to the WAL when pg_stop_backup is called.

Demo:

t1 pg_start_backup  
   Execute a checkpoint   
   Write backup_label  (RedoStartLSN of the checkpoint is written to backup_label)    
  
t2 copy start // The online backup copy starts.  
  
t3 copy stop // The online backup copy ends.   
  
t4 pg_stop_backup 

Where is the consistency point? The real answer is t3, but the database recognizes it as t4.

This is because the database cannot determine the WAL LSN that corresponds to t3. Only after invoking pg_stop_backup will an end mark (and the corresponding start backup LSN) be recorded in the WAL.

3.  What are the differences between performing an online backup on the primary database and on the standby database?

The first difference involves the stop backup process. When online backup is executed on the primary database, pg_stop_backup will lead to recording an end mark (along with the corresponding start backup LSN) in the WAL.

For an online backup on the standby, the end position is determined by ControlFile->minRecoveryPoint, marking the point to which the recovering database must advance to ensure consistency.

The second difference relates to the start backup. The primary database supports both exclusive and non-exclusive online backups (for instance, using pg_basebackup).

        /*  
         * Currently only non-exclusive backup can be taken during recovery.  
         */  
        if (backup_started_in_recovery && exclusive)  
                ereport(ERROR,  
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
                                 errmsg("recovery is in progress"),  
                                 errhint("WAL control functions cannot be executed during recovery.")));

Sample Code

src/backend/access/transam/xlog.c

/*  
 * do_pg_start_backup  
 *  
 * Utility function called at the start of an online backup. It creates the  
 * necessary starting checkpoint and constructs the backup label file.  
 *  
 * There are two kind of backups: exclusive and non-exclusive. An exclusive  
 * backup is started with pg_start_backup(), and there can be only one active  
 * at a time. The backup and tablespace map files of an exclusive backup are  
 * written to $PGDATA/backup_label and $PGDATA/tablespace_map, and they are  
 * removed by pg_stop_backup().  
 *  
 * A non-exclusive backup is used for the streaming base backups (see  
 * src/backend/replication/basebackup.c). The difference to exclusive backups  
 * is that the backup label and tablespace map files are not written to disk.  
 * Instead, their would-be contents are returned in *labelfile and *tblspcmapfile,  
 * and the caller is responsible for including them in the backup archive as  
 * 'backup_label' and 'tablespace_map'. There can be many non-exclusive backups  
 * active at the same time, and they don't conflict with an exclusive backup  
 * either.  
 *  
 * labelfile and tblspcmapfile must be passed as NULL when starting an  
 * exclusive backup, and as initially-empty StringInfos for a non-exclusive  
 * backup.  
 *  
 * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs  
 * describing the cluster's tablespaces.  
 *  
 * tblspcmapfile is required mainly for tar format in windows as native windows  
 * utilities are not able to create symlinks while extracting files from tar.  
 * However for consistency, the same is used for all platforms.  
 *  
 * Returns the minimum WAL location that must be present to restore from this  
 * backup, and the corresponding timeline ID in *starttli_p.  
 *  
 * Every successfully started non-exclusive backup must be stopped by calling  
 * do_pg_stop_backup() or do_pg_abort_backup().  
 *  
 * It is the responsibility of the caller of this function to verify the  
 * permissions of the calling user!  
 */  
XLogRecPtr  
do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,  
                                   StringInfo labelfile, List **tablespaces,  
                                   StringInfo tblspcmapfile)  
{  
...  
  
                        /*  
                         * Now we need to fetch the checkpoint record location, and also  
                         * its REDO pointer.  The oldest point in WAL that would be needed  
                         * to restore starting from the checkpoint is precisely the REDO  
                         * pointer.  
                         */  
                        LWLockAcquire(ControlFileLock, LW_SHARED);  
                        checkpointloc = ControlFile->checkPoint;  
                        startpoint = ControlFile->checkPointCopy.redo;  
                        starttli = ControlFile->checkPointCopy.ThisTimeLineID;  
                        checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;  
                        LWLockRelease(ControlFileLock);  
  
                        if (backup_started_in_recovery)  
                        {  
                                XLogRecPtr      recptr;  
  
                                /*  
                                 * Check to see if all WAL replayed during online backup  
                                 * (i.e., since last restartpoint used as backup starting  
                                 * checkpoint) contain full-page writes.  
                                 */  
                                SpinLockAcquire(&XLogCtl->info_lck);  
                                recptr = XLogCtl->lastFpwDisableRecPtr;  
                                SpinLockRelease(&XLogCtl->info_lck);  
  
                                if (!checkpointfpw || startpoint <= recptr)  
                                        ereport(ERROR,  
                                                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
                                                         errmsg("WAL generated with full_page_writes=off was replayed "  
                                                                        "since last restartpoint"),  
                                                         errhint("This means that the backup being taken on the standby "  
                                                                         "is corrupt and should not be used. "  
                                                                         "Enable full_page_writes and run CHECKPOINT on the primary, "  
                                                                         "and then try an online backup again.")));  
  
...  
  
  
                /* Use the log timezone here, not the session timezone */  
                stamp_time = (pg_time_t) time(NULL);  
                pg_strftime(strfbuf, sizeof(strfbuf),  
                                        "%Y-%m-%d %H:%M:%S %Z",  
                                        pg_localtime(&stamp_time, log_timezone));  
                appendStringInfo(labelfile, "START WAL LOCATION: %X/%X (file %s)\n",  
                                                 LSN_FORMAT_ARGS(startpoint), xlogfilename);  
                appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",  
                                                 LSN_FORMAT_ARGS(checkpointloc));  
                appendStringInfo(labelfile, "BACKUP METHOD: %s\n",  
                                                 exclusive ? "pg_start_backup" : "streamed");  
                appendStringInfo(labelfile, "BACKUP FROM: %s\n",  
                                                 backup_started_in_recovery ? "standby" : "primary");  
                appendStringInfo(labelfile, "START TIME: %s\n", strfbuf);  
                appendStringInfo(labelfile, "LABEL: %s\n", backupidstr);  
                appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli);  
  
  
  
/*  
 * do_pg_stop_backup  
 *  
 * Utility function called at the end of an online backup. It cleans up the  
 * backup state and can optionally wait for WAL segments to be archived.  
 *  
 * If labelfile is NULL, this stops an exclusive backup. Otherwise this stops  
 * the non-exclusive backup specified by 'labelfile'.  
 *  
 * Returns the last WAL location that must be present to restore from this  
 * backup, and the corresponding timeline ID in *stoptli_p.  
 *  
 * It is the responsibility of the caller of this function to verify the  
 * permissions of the calling user!  
 */  
XLogRecPtr  
do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)  
{  
  
  
...  
  
        /*  
         * During recovery, we don't write an end-of-backup record. We assume that  
         * pg_control was backed up last and its minimum recovery point can be  
         * available as the backup end location. Since we don't have an  
         * end-of-backup record, we use the pg_control value to check whether  
         * we've reached the end of backup when starting recovery from this  
         * backup. We have no way of checking if pg_control wasn't backed up last  
         * however.  
         *  
         * We don't force a switch to new WAL file but it is still possible to  
         * wait for all the required files to be archived if waitforarchive is  
         * true. This is okay if we use the backup to start a standby and fetch  
         * the missing WAL using streaming replication. But in the case of an  
         * archive recovery, a user should set waitforarchive to true and wait for  
         * them to be archived to ensure that all the required files are  
         * available.  
         *  
         * We return the current minimum recovery point as the backup end  
         * location. Note that it can be greater than the exact backup end  
         * location if the minimum recovery point is updated after the backup of  
         * pg_control. This is harmless for current uses.  
         *  
         * XXX currently a backup history file is for informational and debug  
         * purposes only. It's not essential for an online backup. Furthermore,  
         * even if it's created, it will not be archived during recovery because  
         * an archiver is not invoked. So it doesn't seem worthwhile to write a  
         * backup history file during recovery.  
         */  
        if (backup_started_in_recovery)  
        {  
  
                stoppoint = ControlFile->minRecoveryPoint;  
  
...   
        else  
        {  
                /*  
                 * Write the backup-end xlog record  
                 */  
                XLogBeginInsert();  
                XLogRegisterData((char *) (&startpoint), sizeof(startpoint));  
                stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);   
...    
/*  
 * Advance minRecoveryPoint in control file.  
 *  
 * If we crash during recovery, we must reach this point again before the  
 * database is consistent.  
 *  
 * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint  
 * is only updated if it's not already greater than or equal to 'lsn'.  
 */  
  
 ...  

What is the minimum WAL required for recovery?

• t0 write backup_label (the actual time may be slightly less than t0, which refers to the time when the startpoint LSN in backup_label is created)

• t1 write blockX FPW to WAL (after the checkpoint, FPW is a must for the first modification of the block)

• t2 write blockX to disk (before the first modification writing to the disk, the database will ensure that the FPW generated by this modification has been written to WAL in advance, otherwise this write to disk will not be executed)

• t2 copy blockX for backup (assuming that the operation is performed at the same time and partial write occurs)

• t3 write backup stoppoint to WAL

WAL between t0 ~ t3 is the minimum requirement to ensure that the database can be restored to the consistent state.

0 1 0
Share on

digoal

281 posts | 24 followers

You may also like

Comments