
ApsaraDB RDS:Flashback within seconds for real-time disaster recovery

Last Updated: Jan 25, 2024

This topic describes the flashback technology of the Zettabyte File System (ZFS) that is supported by ApsaraDB RDS for PostgreSQL. After you set up an Elastic Compute Service (ECS) instance as a backup of your ApsaraDB RDS for PostgreSQL instance, ZFS allows the ECS instance to synchronize data from your RDS instance and use the cron tool to create snapshots at fixed intervals. If your RDS instance becomes faulty, you can use the snapshots on the ECS instance to restore data within seconds.

Background information

ZFS is a file management system that offers features and benefits not found in other file systems available today. ZFS uses the concept of storage pools to manage physical storage: physical storage devices are aggregated into a storage pool, which describes the characteristics of the devices, such as device layout and data redundancy. The storage pool acts as an arbitrary data store from which file systems can be created. File systems are no longer constrained to individual physical devices, and all file systems in a pool share the storage that the pool provides.

ZFS uses the copy-on-write technique to manage data. When new data is written, the blocks that contain the original data are retained. As a result, a snapshot only needs to record references to existing blocks and can be created within seconds. In addition, modifications to a file are shared between the file system and its snapshots, which keeps the storage usage of ZFS snapshots low.

These features allow you to take frequent snapshots and use the latest snapshot to restore data in the event of a fault or disaster.
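
The following sketch illustrates this behavior on a scratch pool. The pool name testpool and the backing file /tmp/zfs_demo.img are hypothetical names used only for this demonstration; they are not part of the setup described in this topic.

    # Create a file-backed scratch pool to observe copy-on-write snapshots.
    truncate -s 1G /tmp/zfs_demo.img
    sudo zpool create testpool /tmp/zfs_demo.img
    
    # Write a file, snapshot it, and then overwrite the file.
    echo "original" | sudo tee /testpool/demo.txt
    sudo zfs snapshot testpool@before
    echo "modified" | sudo tee /testpool/demo.txt
    
    # The snapshot still references the original blocks, so it was created
    # instantly and consumes almost no additional space.
    sudo cat /testpool/.zfs/snapshot/before/demo.txt
    zfs list -t snapshot
    
    # Remove the demonstration pool.
    sudo zpool destroy testpool
    rm -f /tmp/zfs_demo.img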

Preparations

  • Create a suitable ECS instance. For more information, see Creation methods. We recommend that you deploy your ECS instance in the same zone, virtual private cloud (VPC), and vSwitch as your RDS instance. This ensures fast communication between the instances over the internal network. If you require cross-region disaster recovery, create a Cloud Enterprise Network (CEN) instance. For more information, see Use CEN to enable intra-region network communication.

  • Configure a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure an IP address whitelist.

Create a snapshot

  1. Connect to your ECS instance. For more information, see Connection method overview.

  2. Install ZFS.

    wget http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm  
    
    sudo rpm -ivh zfs-release.el7_7.noarch.rpm   
    
    sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm  
    
    sudo yum install -y "kernel-devel-uname-r == $(uname -r)" zfs  
  3. Modify the rc.local file.

    vi /etc/rc.local  
    # Add the following line so that the ZFS kernel module is loaded at startup:
    /sbin/modprobe zfs  
    
    sudo chmod +x /etc/rc.local  
  4. Test ZFS. After you verify that ZFS is running as expected, restart your ECS instance.

    modprobe zfs   
    
    zpool list  
    no pools available  
    
    reboot  
  5. Install PostgreSQL 12 on your ECS instance.

    sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm  
    
    sudo yum install postgresql12*  
  6. Configure the following parameters on the disks of your ECS instance:

    # Align partitions. In this example, /dev/vdb is the newly attached data disk.
    # This command creates the /dev/vdb1 partition.
    parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%
    
    # Create the ZFS storage pool. ashift=13 sets the sector size to 2^13 = 8 KiB,
    # which matches the 8 KB page size of PostgreSQL.
    zpool create zp1 -f -o ashift=13 vdb1
    
    # Set canmount=off so that the root dataset of the pool is not mounted.
    zfs set canmount=off zp1
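
    If you want to confirm the sector shift that the pool was created with, you can inspect the pool configuration. This check is optional, and the exact output format depends on your ZFS version.

    zdb -C zp1 | grep ashift
                    ashift: 13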
  7. Create a directory that is used to store data files.

    # recordsize=8K matches the PostgreSQL page size, and atime=off avoids
    # unnecessary access-time writes.
    zfs create -o mountpoint=/zpdata01 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none zp1/zpdata01
  8. Create a directory that is used to store archived write-ahead logging (WAL) files.

    # compression=on is set at creation time, so the archived WAL files are compressed.
    zfs create -o mountpoint=/zpdata02 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none -o compression=on zp1/zpdata02
    
    
    zfs list  
    NAME           USED  AVAIL     REFER  MOUNTPOINT  
    zp1           1.29M  1.42T      192K  /zp1  
    zp1/zpdata01   192K  1.42T      192K  /zpdata01  
    zp1/zpdata02   192K  1.42T      192K  /zpdata02  
  9. Create a directory that is used to store the synchronized data from your RDS instance.

    mkdir /zpdata01/pg12_1921_data  
    
    mkdir /zpdata02/pg12_1921_wal  
    
    sudo chown -R postgres:postgres /zpdata01/pg12_1921_data  
    sudo chown -R postgres:postgres /zpdata02/pg12_1921_wal 
  10. Synchronize data from your RDS instance to your ECS instance.

    su - postgres  
    
    export PGPASSWORD=<The password of the privileged account for your RDS instance>      
    nohup pg_basebackup -D /zpdata01/pg12_1921_data -F p -R -c fast -X stream -h <Endpoint that is used to connect to your RDS instance> -p <Port that is used to connect to your RDS instance> -U <Username of the privileged account for your RDS instance>  >./bak.log 2>&1 &   
    Note

    Make sure that you have configured a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure an IP address whitelist.


  11. Comment out the following parameters in the /zpdata01/pg12_1921_data/postgresql.conf file:

    #Fri Mar 13 09:55:03 CST 2020  
    #ssl_key_file='server.key'  
    #huge_pages=try  
    #auto_explain.sample_rate=1  
    #zhparser.multi_zall=off  
    #shared_preload_libraries='pg_stat_statements,auth_delay,auto_explain,zhparser,timescaledb,pg_pathman'  
    #promote_trigger_file='/data/postgresql.trigger'  
    #ssl=off  
    #rds_max_log_files=20  
    #pg_pathman.enable_auto_partition=on  
    #shared_buffers=32768MB  
    #zhparser.punctuation_ignore=off  
    #pg_pathman.override_copy=on  
    #port=1922  
    #pg_stat_statements.max=5000  
    #auth_delay.milliseconds=3s  
    #auto_explain.log_nested_statements=off  
    #track_io_timing=on  
    #zhparser.multi_zmain=off  
    #auto_explain.log_analyze=off  
    #archive_mode=on  
    #ssl_cert_file='server.crt'  
    #zhparser.multi_short=off  
    #zhparser.dict_in_memory=off  
    #auto_explain.log_format=text  
    #auto_explain.log_min_duration=-1  
    #rds.rds_max_non_super_conns=12800  
    #pg_pathman.enable=on  
    #archive_command='/bin/date'  
    #auto_explain.log_verbose=off  
    #log_line_prefix='\1\n\t%p\t%r\t%u\t%d\t%t\t%e\t%T\t%S\t%U\t%E\t\t'  
    #pg_pathman.enable_runtimemergeappend=on  
    #zhparser.extra_dicts='dict_extra.xdb'  
    #auto_explain.log_buffers=off  
    #pg_stat_statements.track=top  
    #jit_provider='llvmjit'  
    #pg_pathman.enable_partitionrouter=off  
    #pg_stat_statements.track_utility=off  
    #pg_stat_statements.save=off  
    #zhparser.dicts_type='EXTRA'  
    #auto_explain.log_timing=on  
    #pg_pathman.enable_runtimeappend=on  
    #zhparser.seg_with_duality=off  
    #rds.rds_max_super_conns=100  
    #pg_pathman.enable_partitionfilter=on  
    #log_destination='stderr,csvlog'  
    #zhparser.multi_duality=off  
    #pg_pathman.insert_into_fdw='postgres'  
    #pg_pathman.enable_bounds_cache=on  
    #rds.rds_max_non_super_wal_snd=32  
    #auto_explain.log_triggers=off  
    #rds_sync_replication_timeout=0  
  12. Modify the configuration file.

    vi /zpdata01/pg12_1921_data/postgresql.auto.conf
    
    primary_conninfo = 'user=<The username of the privileged account for your RDS instance> password=<The password of the privileged account for your RDS instance> host=<The endpoint that is used to connect to your RDS instance> port=<The port that is used to connect to your RDS instance> application_name=hello_rds_pg12'
    port=1922
    shared_buffers=32GB
    log_destination='csvlog'
    # Archive each completed WAL file to /zpdata02/pg12_1921_wal if it is not already there.
    archive_mode=always
    archive_command='test ! -f /zpdata02/pg12_1921_wal/%f && cp %p /zpdata02/pg12_1921_wal/%f'
  13. Modify the permissions on the created directories.

    sudo chmod 700 /zpdata02/pg12_1921_wal  
    sudo chmod 700 /zpdata01/pg12_1921_data  
  14. Start the PostgreSQL database on your ECS instance.

    su - postgres  
    /usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data  
    Note

    If the following error is reported because your ECS instance does not have sufficient memory, we recommend that you upgrade the instance:

    HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages.
  15. Configure the PostgreSQL database on your ECS instance to automatically start.

    vi /etc/rc.local  
    
    su - postgres -c "/usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data"  
  16. Configure the PostgreSQL database on your ECS instance to automatically create snapshots of the directory that is used to store data files. You do not need to create snapshots of the directory that is used to store archived WAL files.

    1. Create a script that creates a snapshot, and grant the execute permission that is required to run the script.

      vi /etc/snap.sh  
      
      #!/bin/bash
      # Name the snapshot after the current date and time.
      STIME=`date +%F%T`  
      /usr/sbin/zfs snapshot zp1/zpdata01@$STIME   
      
      chmod 500 /etc/snap.sh  
    2. Check whether snapshots can be created as expected.

      /etc/snap.sh  
      
      # zfs list -t snapshot  
      NAME                              USED  AVAIL     REFER  MOUNTPOINT  
      zp1/zpdata01@2020-03-2117:06:47   144K      -      770M  -  
    3. Configure the crond daemon process to automatically start.

      systemctl start crond  
      
      systemctl enable crond  
      
      systemctl status crond  
      ● crond.service - Command Scheduler  
         Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)  
         Active: active (running) since Sat 2020-03-21 16:16:08 CST; 53min ago  
       Main PID: 2526 (crond)  
         CGroup: /system.slice/crond.service  
                 └─2526 /usr/sbin/crond -n  
    4. Configure the crontab file.

      crontab -e  
      # Run the snapshot script at 01:01 every day.
      1 1 * * * /etc/snap.sh  
      
      crontab -l  
      1 1 * * * /etc/snap.sh  
    Note

    You can configure automatic deletion of snapshots or archives based on your disk usage and business requirements. A sample cleanup script is provided at the end of this note. To manually delete snapshots or archives, run the following commands:

    Manually delete snapshots:
    zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:06:47   144K      -      770M  -  
    zp1/zpdata01@2020-03-2117:17:01     0B      -      786M  -  
    
    zfs destroy zp1/zpdata01@2020-03-2117:06:47  
    
    zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:17:01     0B      -      786M  -  
    
    
    Manually delete archives:
    find /zpdata02/pg12_1921_wal/ -type f -mtime +7 -exec rm -f {} \;  
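
    The following sketch shows one way to automate this cleanup. The seven-day retention window and the script name /etc/snap_clean.sh are assumptions for illustration; adjust them to your business requirements and schedule the script with cron in the same way as /etc/snap.sh.
    
    vi /etc/snap_clean.sh
    
    #!/bin/bash
    # Destroy snapshots of zp1/zpdata01 that are older than seven days.
    # With -p, zfs list prints the creation time as a Unix timestamp.
    CUTOFF=$(date -d '7 days ago' +%s)
    zfs list -H -p -t snapshot -o name,creation -r zp1/zpdata01 | \
    while read NAME CREATION; do
        if [ "$CREATION" -lt "$CUTOFF" ]; then
            /usr/sbin/zfs destroy "$NAME"
        fi
    done
    
    # Prune archived WAL files that are older than seven days.
    find /zpdata02/pg12_1921_wal/ -type f -mtime +7 -exec rm -f {} \;
    
    chmod 500 /etc/snap_clean.sh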
  17. Check the validity of backup sets. For more information, visit GitHub.
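    As a quick sanity check (a minimal sketch, not a full backup verification), you can confirm that the standby on your ECS instance is running and still in recovery. If password authentication is required, set PGPASSWORD as in the synchronization step.

    psql -h /zpdata01/pg12_1921_data -p 1922 -U <The username of the privileged account for your RDS instance> postgres -c 'select pg_is_in_recovery();'
     pg_is_in_recovery
    -------------------
     t
    (1 row)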

  18. Obtain the latency of data synchronization between your RDS and ECS instances.

    postgres=> select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),sent_lsn)) as sent_delay,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),replay_lsn)) as replay_delay,*
    from pg_stat_replication;
    -[ RECORD 1 ]----+------------------------------  
    sent_delay       | 0 bytes  
    replay_delay     | 0 bytes  
    pid              | 84098  
    usesysid         | 886185  
    usename          | rep  
    application_name | hello_rds_pg12  
    client_addr      | 192.168.0.173  
    client_hostname  |   
    client_port      | 60402  
    backend_start    | 2020-03-21 16:59:01.890775+08  
    backend_xmin     |   
    state            | streaming  
    sent_lsn         | 11D/97002068  
    write_lsn        | 11D/97002068  
    flush_lsn        | 11D/97002068  
    replay_lsn       | 11D/97002068  
    write_lag        |   
    flush_lag        |   
    replay_lag       |   
    sync_priority    | 0  
    sync_state       | async  
    reply_time       | 2020-03-21 17:01:17.198139+08  

After you complete the preceding steps, snapshots are created at fixed intervals on your ECS instance. You can use the created snapshots to restore your RDS instance in seconds for real-time disaster recovery.

Restore your RDS instance in seconds

  1. Clone the ZFS file system based on a suitable snapshot.

    zfs list -t snapshot  
    NAME                              USED  AVAIL     REFER  MOUNTPOINT  
    zp1/zpdata01@2020-03-2117:17:01   312K      -      786M  -  
    
    zfs clone -o mountpoint=/test_recovery zp1/zpdata01@2020-03-2117:17:01 zp1/zpdata_test  
    cd /test_recovery  
    ll  
    total 17  
    drwx------ 20 postgres postgres 35 Mar 21 16:59 pg12_1921_data  
  2. Configure the parameters that are used to restore the data of your RDS instance.

    Note

    If your ECS instance does not have sufficient memory, set shared_buffers to a smaller value.

    vi /test_recovery/pg12_1921_data/postgresql.auto.conf
    
    port=1923  
    shared_buffers=32GB  
    log_destination='csvlog'  
    recovery_end_command = 'cp /zpdata02/pg12_1921_wal/%f %p'  
    recovery_target_time = '2020-03-21 17:28:37.670338+08'  
    recovery_target_timeline = 'latest'  
    recovery_target_action = 'pause'   
  3. Delete the socket and pid files from the cloned ZFS file system.

    rm -f /test_recovery/pg12_1921_data/.s.*  
    rm /test_recovery/pg12_1921_data/postmaster.pid  
  4. Restore the data from the cloned ZFS file system.

    su - postgres
    /usr/pgsql-12/bin/pg_ctl start -D /test_recovery/pg12_1921_data 
  5. View the restored data.

    psql -h /test_recovery/pg12_1921_data -p 1923 -U <The username of the privileged account for your RDS instance> postgres  
    psql (12.1)  
    Type "help" for help.  
    
    postgres=> \dt  

    The preceding output indicates that the restoration is successful.

    Note

    After you verify that the restored data is correct, you can use the pg_dump tool to copy the restored data to your RDS instance. An example is shown after this note.
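
    The following sketch pipes a single database from the restored instance into your RDS instance. The database name <mydb> is a hypothetical placeholder; replace the connection parameters with the values for your environment.

    pg_dump -h /test_recovery/pg12_1921_data -p 1923 -U <The username of the privileged account for your RDS instance> <mydb> \
      | psql -h <The endpoint that is used to connect to your RDS instance> -p <The port that is used to connect to your RDS instance> -U <The username of the privileged account for your RDS instance> <mydb>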

Delete restored data from your ECS instance

After the restoration is complete, run the following commands to delete the restored data from the test_recovery directory on your ECS instance:

su - postgres
/usr/pgsql-12/bin/pg_ctl stop -m fast -D /test_recovery/pg12_1921_data  

sudo zfs destroy zp1/zpdata_test