This topic describes the flashback technology of the Zettabyte File System (ZFS) that is supported by ApsaraDB RDS for PostgreSQL. After you configure an ECS instance as a standby for your RDS instance, the ECS instance synchronizes data from your RDS instance, and the cron tool creates ZFS snapshots at fixed intervals. If your RDS instance becomes faulty, you can use the snapshots on the ECS instance to restore your RDS instance in seconds.

Background information

ZFS is a dynamic file management system that offers features and benefits not found in other file systems. ZFS uses the concept of storage pools to manage physical storage devices: it aggregates the devices into a storage pool, which describes their physical characteristics, such as device layout and data redundancy, and acts as an arbitrary data store from which file systems can be created. File systems are no longer constrained to individual physical devices and can share the disk space of all devices in the pool.

ZFS uses the copy-on-write technique to manage data. When new data is written, the blocks that contain the original data are retained. This allows ZFS to keep all of the data that is required to create a snapshot, so snapshots can be created in seconds. In addition, unchanged data is shared between a file system and its snapshots, which minimizes the storage that snapshots consume.

These features allow you to quickly locate the most recent snapshot and use it to restore data in the event of faults or disasters.

Before you begin

  • Create a suitable ECS instance. For more information, see ECS instance creation overview. We recommend that your ECS instance run the 64-bit CentOS 7 operating system, have the same specifications as your RDS instance, and reside in the same zone, Virtual Private Cloud (VPC), and vSwitch. This ensures the fastest communication between your ECS and RDS instances over an internal network. If you require cross-region disaster recovery, create a Cloud Enterprise Network instance. For more information, see Tutorial overview.
  • Configure a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure a whitelist for an RDS PostgreSQL instance.

Create a snapshot

  1. Connect to your ECS instance. For more information, see Overview.
  2. Install ZFS.
    wget http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm  
    
    rpm -ivh zfs-release.el7_7.noarch.rpm   
    
    yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm  
    
    yum install -y "kernel-devel-uname-r == $(uname -r)" zfs  
  3. Modify the rc.local file.
    vi /etc/rc.local  
    /sbin/modprobe zfs  
    
    chmod +x /etc/rc.local  
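After this step, /etc/rc.local should end with the modprobe line so that the ZFS kernel module is loaded at every startup. A minimal sketch of the resulting file (the header comment is illustrative; keep whatever content your system image already places in this file):

```shell
#!/bin/bash
# /etc/rc.local -- executed at the end of the boot process on CentOS 7.
# Load the ZFS kernel module so that zpool and zfs commands work after a reboot.
/sbin/modprobe zfs
```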
  4. Test ZFS. After you check that ZFS is properly running, restart your ECS instance.
    # modprobe zfs   
    
    # zpool list  
    no pools available  
    
    # reboot  
  5. Install PostgreSQL 12 on your ECS instance.
    yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm  
    
    yum install postgresql12*  
  6. Partition the new data disk of your ECS instance and create the ZFS storage pool:
    parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%   // Create an aligned partition. Replace /dev/vdb with the device name of the new data disk.
    
    zpool create zp1 -f -o ashift=13 vdb1    // Create the ZFS storage pool on the new partition.
    
    zfs set canmount=off zp1     // Prevent the root dataset of the pool from being mounted directly.
  7. Create a directory that is used to store data files.
    zfs create -o mountpoint=/zpdata01 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none zp1/zpdata01
  8. Create a directory that is used to store archived write-ahead logging (WAL) files.
    zfs create -o mountpoint=/zpdata02 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none -o compression=on zp1/zpdata02    
    
    
    # zfs list  
    NAME           USED  AVAIL     REFER  MOUNTPOINT  
    zp1           1.29M  1.42T      192K  /zp1  
    zp1/zpdata01   192K  1.42T      192K  /zpdata01  
    zp1/zpdata02   192K  1.42T      192K  /zpdata02  
  9. Create a directory that is used to store the synchronized data from your RDS instance.
    mkdir /zpdata01/pg12_1921_data  
    
    mkdir /zpdata02/pg12_1921_wal  
    
    chown -R postgres:postgres /zpdata01/pg12_1921_data  
    chown -R postgres:postgres /zpdata02/pg12_1921_wal 
  10. Synchronize data from your RDS instance to your ECS instance.
    su - postgres  
    
    export PGPASSWORD=<The password of the privileged account for your RDS instance>      
    nohup pg_basebackup -D /zpdata01/pg12_1921_data -F p -R -c fast -X stream -h <The endpoint that is used to connect to your RDS instance> -p <The port that is used to connect to your RDS instance> -U <The username of the privileged account for your RDS instance>  >./bak.log 2>&1 &   
    Note Make sure that you have configured a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure a whitelist for an RDS PostgreSQL instance.
  11. Comment out the following parameters in the /zpdata01/pg12_1921_data/postgresql.conf file:
    #Fri Mar 13 09:55:03 CST 2020  
    #ssl_key_file='server.key'  
    #huge_pages=try  
    #auto_explain.sample_rate=1  
    #zhparser.multi_zall=off  
    #shared_preload_libraries='pg_stat_statements,auth_delay,auto_explain,zhparser,timescaledb,pg_pathman'  
    #promote_trigger_file='/data/postgresql.trigger'  
    #ssl=off  
    #rds_max_log_files=20  
    #pg_pathman.enable_auto_partition=on  
    #shared_buffers=32768MB  
    #zhparser.punctuation_ignore=off  
    #pg_pathman.override_copy=on  
    #port=1922  
    #pg_stat_statements.max=5000  
    #auth_delay.milliseconds=3s  
    #auto_explain.log_nested_statements=off  
    #track_io_timing=on  
    #zhparser.multi_zmain=off  
    #auto_explain.log_analyze=off  
    #archive_mode=on  
    #ssl_cert_file='server.crt'  
    #zhparser.multi_short=off  
    #zhparser.dict_in_memory=off  
    #auto_explain.log_format=text  
    #auto_explain.log_min_duration=-1  
    #rds.rds_max_non_super_conns=12800  
    #pg_pathman.enable=on  
    #archive_command='/bin/date'  
    #auto_explain.log_verbose=off  
    #log_line_prefix='\1\n\t%p\t%r\t%u\t%d\t%t\t%e\t%T\t%S\t%U\t%E\t\t'  
    #pg_pathman.enable_runtimemergeappend=on  
    #zhparser.extra_dicts='dict_extra.xdb'  
    #auto_explain.log_buffers=off  
    #pg_stat_statements.track=top  
    #jit_provider='llvmjit'  
    #pg_pathman.enable_partitionrouter=off  
    #pg_stat_statements.track_utility=off  
    #pg_stat_statements.save=off  
    #zhparser.dicts_type='EXTRA'  
    #auto_explain.log_timing=on  
    #pg_pathman.enable_runtimeappend=on  
    #zhparser.seg_with_duality=off  
    #rds.rds_max_super_conns=100  
    #pg_pathman.enable_partitionfilter=on  
    #log_destination='stderr,csvlog'  
    #zhparser.multi_duality=off  
    #pg_pathman.insert_into_fdw='postgres'  
    #pg_pathman.enable_bounds_cache=on  
    #rds.rds_max_non_super_wal_snd=32  
    #auto_explain.log_triggers=off  
    #rds_sync_replication_timeout=0  
  12. Modify the postgresql.auto.conf file.
    vi /zpdata01/pg12_1921_data/postgresql.auto.conf
    
    primary_conninfo = 'user=<The username of the privileged account for your RDS instance> password=''<The password of the privileged account for your RDS instance>'' host=''<The endpoint that is used to connect to your RDS instance>'' port=<The port that is used to connect to your RDS instance> application_name=hello_rds_pg12'  
    port=1922  
    shared_buffers=32GB  
    log_destination='csvlog'  
    archive_mode=always  
    archive_command='test ! -f /zpdata02/pg12_1921_wal/%f && cp %p /zpdata02/pg12_1921_wal/%f'  
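The archive_command above copies each completed WAL segment into /zpdata02/pg12_1921_wal only if no file with that name exists yet, so an archived segment is never overwritten. The following sketch reproduces that test-and-copy logic with ordinary files in temporary directories (the directory and segment names are hypothetical; PostgreSQL is not required):

```shell
#!/bin/bash
# Simulate archive_command: PostgreSQL substitutes %p (path of the WAL
# segment) and %f (its file name) before running the command.
wal_src=$(mktemp -d)   # stands in for the pg_wal directory
archive=$(mktemp -d)   # stands in for /zpdata02/pg12_1921_wal
seg=000000010000000000000001

echo "segment-v1" > "$wal_src/$seg"

# First attempt: the segment is absent from the archive, so the copy runs.
test ! -f "$archive/$seg" && cp "$wal_src/$seg" "$archive/$seg"

# Second attempt with changed contents: the test fails, the copy is skipped,
# and the originally archived segment is preserved.
echo "segment-v2" > "$wal_src/$seg"
test ! -f "$archive/$seg" && cp "$wal_src/$seg" "$archive/$seg"

cat "$archive/$seg"   # prints "segment-v1"
rm -rf "$wal_src" "$archive"
```

PostgreSQL reruns archive_command until it exits with 0, so this guard protects an already-archived segment rather than silently overwriting history.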
  13. Modify the permissions on the created directories.
    chmod 700 /zpdata02/pg12_1921_wal  
    chmod 700 /zpdata01/pg12_1921_data  
  14. Start the PostgreSQL 12 database on your ECS instance.
    su - postgres  
    /usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data  
    Note If the following error is reported because your ECS instance does not have sufficient memory, we recommend that you upgrade your ECS instance:
    HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages.
  15. Configure the PostgreSQL 12 database on your ECS instance to automatically start.
    vi /etc/rc.local  
    
    su - postgres -c "/usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data"  
  16. Configure the PostgreSQL 12 database on your ECS instance to automatically create snapshots of the directory that is used to store data files. You do not need to create snapshots of the directory that is used to store archived WAL files.
    1. Create a script to configure the permission that is used to create snapshots.
      vi /etc/snap.sh  
      
      #!/bin/bash  
      STIME=$(date +%F%T)  
      /usr/sbin/zfs snapshot "zp1/zpdata01@$STIME"   
      
      chmod 500 /etc/snap.sh  
    2. Check whether snapshots can be properly created.
      /etc/snap.sh  
      
      # zfs list -t snapshot  
      NAME                              USED  AVAIL     REFER  MOUNTPOINT  
      zp1/zpdata01@2020-03-2117:06:47   144K      -      770M  -  
    3. Configure the crond daemon process to automatically start.
      # systemctl start crond  
      
      # systemctl enable crond  
      
      # systemctl status crond  
      ● crond.service - Command Scheduler  
         Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)  
         Active: active (running) since Sat 2020-03-21 16:16:08 CST; 53min ago  
       Main PID: 2526 (crond)  
         CGroup: /system.slice/crond.service  
                 └─2526 /usr/sbin/crond -n  
    4. Configure the crontab file.
      # crontab -e  
      1 1 * * * /etc/snap.sh  
      
      # crontab -l -u root  
      1 1 * * * /etc/snap.sh  
    Note You can configure automatic deletion of snapshots or archives based on your disk usage and business requirements. To manually delete snapshots or archives, follow these steps:
    Manually delete snapshots:
    # zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:06:47   144K      -      770M  -  
    zp1/zpdata01@2020-03-2117:17:01     0B      -      786M  -  
    
    # zfs destroy zp1/zpdata01@2020-03-2117:06:47  
    
    # zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:17:01     0B      -      786M  -  
    
    
    Manually delete archives:
    find /zpdata02/pg12_1921_wal/ -type f -mtime +7 -exec rm -f {} \;  
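The manual deletion above can be automated because snap.sh embeds a fixed-width date +%F%T timestamp in every snapshot name, so names compare chronologically as plain strings. A hedged sketch of the selection step (the select_old function and the 7-day window are illustrative; in production you would feed it the output of zfs list -H -t snapshot -o name and pass each selected name to zfs destroy, ideally after a zfs destroy -nv dry run):

```shell
#!/bin/bash
# Select snapshot names whose embedded timestamp is older than a cutoff.
# date +%F%T produces a fixed-width string (for example 2020-03-2117:06:47),
# so plain lexicographic comparison orders the names chronologically.
select_old() {
  awk -F'@' -v cutoff="$1" '$2 < cutoff { print }'
}

cutoff=$(date -d '7 days ago' +%F%T)   # GNU date, as shipped with CentOS 7

# Sample names; real input: zfs list -H -t snapshot -o name
printf '%s\n' \
  'zp1/zpdata01@2020-03-2117:06:47' \
  'zp1/zpdata01@2020-03-2117:17:01' | select_old "$cutoff"
```

Both sample timestamps predate the cutoff, so both names are printed and would be eligible for zfs destroy.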
  17. Check the validity of backup sets. For more information, visit GitHub.
  18. Obtain the latency of data synchronization between your RDS and ECS instances.
    postgres=> select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),sent_lsn)) as sent_delay,  
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),replay_lsn)) as replay_delay,*   
    from pg_stat_replication ;  
    -[ RECORD 1 ]----+------------------------------  
    sent_delay       | 0 bytes  
    replay_delay     | 0 bytes  
    pid              | 84098  
    usesysid         | 886185  
    usename          | rep  
    application_name | hello_rds_pg12  
    client_addr      | 192.168.0.173  
    client_hostname  |   
    client_port      | 60402  
    backend_start    | 2020-03-21 16:59:01.890775+08  
    backend_xmin     |   
    state            | streaming  
    sent_lsn         | 11D/97002068  
    write_lsn        | 11D/97002068  
    flush_lsn        | 11D/97002068  
    replay_lsn       | 11D/97002068  
    write_lag        |   
    flush_lag        |   
    replay_lag       |   
    sync_priority    | 0  
    sync_state       | async  
    reply_time       | 2020-03-21 17:01:17.198139+08  

After you complete the preceding steps, snapshots will be created at fixed intervals on your ECS instance. You can use the created snapshots to restore your RDS instance in seconds for real-time disaster recovery.

Restore your RDS instance in seconds

  1. Clone the ZFS file system based on a suitable snapshot.
    # zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:17:01   312K      -      786M  -  
    
    # zfs clone -o mountpoint=/test_recovery zp1/zpdata01@2020-03-2117:17:01 zp1/zpdata_test  
    [root@iZbp135pwcjjoxqgfpw9k1Z pg12_1921_wal]# cd /test_recovery  
    [root@iZbp135pwcjjoxqgfpw9k1Z test_recovery]# ll  
    total 17  
    drwx------ 20 postgres postgres 35 Mar 21 16:59 pg12_1921_data  
  2. Configure the parameters that are used to restore the data of your RDS instance.
    Note If your ECS instance does not have sufficient memory, you can configure a smaller value for the shared_buffers parameter.
    vi /test_recovery/pg12_1921_data/postgresql.auto.conf
    
    port=1923  
    shared_buffers=32GB  
    log_destination='csvlog'  
    restore_command = 'cp /zpdata02/pg12_1921_wal/%f %p'  
    recovery_target_time = '2020-03-21 17:28:37.670338+08'  
    recovery_target_timeline = 'latest'  
    recovery_target_action = 'pause'   
  3. Delete the socket and pid files from the cloned ZFS file system.
    rm -f /test_recovery/pg12_1921_data/.s.*  
    rm /test_recovery/pg12_1921_data/postmaster.pid  
  4. Restore the data from the cloned ZFS file system.
    su - postgres
    /usr/pgsql-12/bin/pg_ctl start -D /test_recovery/pg12_1921_data 
  5. View the restored data.
    psql -h /test_recovery/pg12_1921_data -p 1923 -U <The username of the privileged account for your RDS instance> postgres  
    psql (12.1)  
    Type "help" for help.  
    
    postgres=> \dt  
    Note After you check that the restored data is correct, you can use the pg_dump tool to copy the restored data to your RDS instance.

Delete restored data from your ECS instance

After the restoration is complete, run the following commands to delete the restored data from the test_recovery directory on your ECS instance:

su - postgres
/usr/pgsql-12/bin/pg_ctl stop -m fast -D /test_recovery/pg12_1921_data  

sudo zfs destroy zp1/zpdata_test