This topic describes the flashback technology based on the Zettabyte File System (ZFS) that is supported by ApsaraDB RDS for PostgreSQL. After you configure your Elastic Compute Service (ECS) instance as a standby for your ApsaraDB RDS for PostgreSQL instance, ZFS allows your ECS instance to synchronize data from your RDS instance and use the cron tool to create snapshots at fixed intervals. If your RDS instance becomes faulty, you can use the snapshots on the ECS instance to restore your RDS instance within seconds.
Background information
ZFS is a file system that offers features and benefits not found in many other file systems. ZFS uses the concept of storage pools to manage physical storage. Physical storage devices are aggregated into a storage pool, which describes their characteristics, such as device layout and data redundancy. The storage pool also acts as a store from which file systems can be created. File systems are no longer constrained to individual physical devices, and all file systems in a pool share the space of the devices in that pool.
ZFS uses the copy-on-write technique to manage data. When new data is written, the blocks that contain the original data are retained. As a result, all of the data that a snapshot requires already resides on disk, and a snapshot can be created in seconds. In addition, unchanged blocks are shared between a file system and its snapshots, which keeps the storage usage of ZFS snapshots low.
These features of ZFS allow you to find the latest snapshot. You can use the latest snapshot to restore data in the event of faults or disasters.
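As an illustration, snapshot names that embed a sortable timestamp (the naming convention used later in this topic) let you find the latest snapshot with a plain lexical sort. The snapshot names below are sample values for the sketch; in production you would list real names with `zfs list -H -t snapshot -o name`.

```shell
# Sample snapshot names that follow the zp1/zpdata01@$(date +%F%T)
# convention used later in this topic.
snapshots='zp1/zpdata01@2020-03-2017:06:47
zp1/zpdata01@2020-03-2117:17:01
zp1/zpdata01@2020-03-1909:00:00'
# %F%T timestamps sort chronologically as plain strings, so the lexical
# maximum is the most recent snapshot.
latest=$(printf '%s\n' "$snapshots" | sort | tail -n 1)
echo "$latest"
```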
Preparations
Create a suitable ECS instance. For more information, see Creation methods. We recommend that your ECS instance reside in the same zone, virtual private cloud (VPC), and vSwitch as your RDS instance. This ensures fast communication between the instances over an internal network. If you require cross-region disaster recovery, create a Cloud Enterprise Network (CEN) instance. For more information, see Use CEN to enable intra-region network communication.
Configure a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure an IP address whitelist.
Create a snapshot
Connect to your ECS instance. For more information, see Connection method overview.
Install ZFS.
wget http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm
sudo rpm -ivh zfs-release.el7_7.noarch.rpm
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install -y "kernel-devel-uname-r == $(uname -r)" zfs
Modify the rc.local file so that the ZFS kernel module is loaded at startup.
vi /etc/rc.local
# Add the following line to the file:
/sbin/modprobe zfs
# Make the file executable:
sudo chmod +x /etc/rc.local
Test ZFS. After you verify that ZFS is running as expected, restart your ECS instance.
modprobe zfs
zpool list
no pools available
reboot
Install PostgreSQL 12 on your ECS instance.
sudo yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install postgresql12*
Partition the data disk of your ECS instance and create the ZFS storage pool:
# Align partitions. In this example, the new data disk is /dev/vdb, and the command creates the /dev/vdb1 partition.
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%
# Create the ZFS storage pool on the new partition.
zpool create zp1 -f -o ashift=13 vdb1
# Prevent the root dataset of the pool from being mounted.
zfs set canmount=off zp1
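The ashift value is the base-2 logarithm of the sector size that ZFS assumes for the pool. The following quick check, shown only for illustration, confirms that ashift=13 corresponds to 8 KiB sectors, which matches the recordsize=8K used for the data file systems below and PostgreSQL's 8 KB page size.

```shell
# ashift=13 means sectors of 2^13 bytes.
sector_size=$((1 << 13))
echo "$sector_size"   # 8192 bytes = 8 KiB
```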
Create a directory that is used to store data files.
zfs create -o mountpoint=/zpdata01 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none zp1/zpdata01
Create a directory that is used to store archived write-ahead logging (WAL) files.
zfs create -o mountpoint=/zpdata02 -o recordsize=8K -o atime=off -o primarycache=metadata -o logbias=throughput -o secondarycache=none -o compression=on zp1/zpdata02
zfs set compression=on zp1/zpdata02
zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
zp1           1.29M  1.42T   192K  /zp1
zp1/zpdata01   192K  1.42T   192K  /zpdata01
zp1/zpdata02   192K  1.42T   192K  /zpdata02
Create a directory that is used to store the synchronized data from your RDS instance.
mkdir /zpdata01/pg12_1921_data
mkdir /zpdata02/pg12_1921_wal
sudo chown -R postgres:postgres /zpdata01/pg12_1921_data
sudo chown -R postgres:postgres /zpdata02/pg12_1921_wal
Synchronize data from your RDS instance to your ECS instance.
su - postgres
export PGPASSWORD=<The password of the privileged account for your RDS instance>
nohup pg_basebackup -D /zpdata01/pg12_1921_data -F p -R -c fast -X stream -h <Endpoint that is used to connect to your RDS instance> -p <Port that is used to connect to your RDS instance> -U <Username of the privileged account for your RDS instance> >./bak.log 2>&1 &
Note: Make sure that you have configured a whitelist to grant your ECS instance access to your RDS instance. For more information, see Configure an IP address whitelist.
Comment out the following parameters in the /zpdata01/pg12_1921_data/postgresql.conf file:
#Fri Mar 13 09:55:03 CST 2020
#ssl_key_file='server.key'
#huge_pages=try
#auto_explain.sample_rate=1
#zhparser.multi_zall=off
#shared_preload_libraries='pg_stat_statements,auth_delay,auto_explain,zhparser,timescaledb,pg_pathman'
#promote_trigger_file='/data/postgresql.trigger'
#ssl=off
#rds_max_log_files=20
#pg_pathman.enable_auto_partition=on
#shared_buffers=32768MB
#zhparser.punctuation_ignore=off
#pg_pathman.override_copy=on
#port=1922
#pg_stat_statements.max=5000
#auth_delay.milliseconds=3s
#auto_explain.log_nested_statements=off
#track_io_timing=on
#zhparser.multi_zmain=off
#auto_explain.log_analyze=off
#archive_mode=on
#ssl_cert_file='server.crt'
#zhparser.multi_short=off
#zhparser.dict_in_memory=off
#auto_explain.log_format=text
#auto_explain.log_min_duration=-1
#rds.rds_max_non_super_conns=12800
#pg_pathman.enable=on
#archive_command='/bin/date'
#auto_explain.log_verbose=off
#log_line_prefix='\1\n\t%p\t%r\t%u\t%d\t%t\t%e\t%T\t%S\t%U\t%E\t\t'
#pg_pathman.enable_runtimemergeappend=on
#zhparser.extra_dicts='dict_extra.xdb'
#auto_explain.log_buffers=off
#pg_stat_statements.track=top
#jit_provider='llvmjit'
#pg_pathman.enable_partitionrouter=off
#pg_stat_statements.track_utility=off
#pg_stat_statements.save=off
#zhparser.dicts_type='EXTRA'
#auto_explain.log_timing=on
#pg_pathman.enable_runtimeappend=on
#zhparser.seg_with_duality=off
#rds.rds_max_super_conns=100
#pg_pathman.enable_partitionfilter=on
#log_destination='stderr,csvlog'
#zhparser.multi_duality=off
#pg_pathman.insert_into_fdw='postgres'
#pg_pathman.enable_bounds_cache=on
#rds.rds_max_non_super_wal_snd=32
#auto_explain.log_triggers=off
#rds_sync_replication_timeout=0
Modify the configuration file.
vi /zpdata01/pg12_1921_data/postgresql.auto.conf
primary_conninfo = 'user=<The username of the privileged account for your RDS instance> password=''<The password of the privileged account for your RDS instance>'' host=''<The endpoint that is used to connect to your RDS instance>'' port=<The port that is used to connect to your RDS instance> application_name=hello_rds_pg12'
port=1922
shared_buffers=32GB
log_destination='csvlog'
archive_mode=always
archive_command='test ! -f /zpdata02/pg12_1921_wal/%f && cp %p /zpdata02/pg12_1921_wal/%f'
Modify the permissions on the created directories.
sudo chmod 700 /zpdata02/pg12_1921_wal
sudo chmod 700 /zpdata01/pg12_1921_data
Start the PostgreSQL database on your ECS instance.
su - postgres
/usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data
Note: If the following error is reported because your ECS instance does not have sufficient memory, we recommend that you upgrade your ECS instance:
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages.
Configure the PostgreSQL database on your ECS instance to automatically start.
vi /etc/rc.local
# Add the following line to the file:
su - postgres -c "/usr/pgsql-12/bin/pg_ctl start -D /zpdata01/pg12_1921_data"
Configure the PostgreSQL database on your ECS instance to automatically create snapshots of the directory that is used to store data files. You do not need to create snapshots of the directory that is used to store archived WAL files.
Create a script that creates a snapshot, and grant execute permissions on the script.
vi /etc/snap.sh
# Script content:
STIME=`date +%F%T`
/usr/sbin/zfs snapshot zp1/zpdata01@$STIME
# Grant execute permissions:
chmod 500 /etc/snap.sh
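The snap.sh script names each snapshot after the current time. Note that `date +%F%T` concatenates the date and the time with no separator, which explains why the snapshot names shown later in this topic look like zp1/zpdata01@2020-03-2117:06:47. A minimal sketch of the naming logic:

```shell
# Build the snapshot name the same way snap.sh does.
STIME=$(date +%F%T)           # e.g. 2020-03-2117:06:47 (date and time concatenated)
SNAP="zp1/zpdata01@$STIME"
echo "$SNAP"
```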
Check whether snapshots can be created as expected.
/etc/snap.sh
zfs list -t snapshot
NAME                              USED  AVAIL  REFER  MOUNTPOINT
zp1/zpdata01@2020-03-2117:06:47   144K      -   770M  -
Configure the crond daemon process to automatically start.
systemctl start crond
systemctl enable crond
systemctl status crond
● crond.service - Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-03-21 16:16:08 CST; 53min ago
 Main PID: 2526 (crond)
   CGroup: /system.slice/crond.service
           └─2526 /usr/sbin/crond -n
Configure the crontab file.
crontab -e
# Add the following entry to run /etc/snap.sh at 01:01 every day:
1 1 * * * /etc/snap.sh
# Verify the configuration:
crontab -l -u username
1 1 * * * /etc/snap.sh
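The `1 1 * * *` entry runs the snapshot script once a day at 01:01. If your business requires a smaller recovery point objective, you can schedule snapshots more frequently. The following crontab entry is a sketch that assumes the /etc/snap.sh script created above:

```
# minute hour day month weekday
1 * * * * /etc/snap.sh
```

This variant runs the script once an hour, at minute 1. More frequent snapshots consume more pool space, so pair a shorter interval with a stricter deletion policy.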
Note: You can configure automatic deletion of snapshots or archives based on your disk usage and business requirements. To manually delete snapshots or archives, perform the following steps:
Manually delete snapshots:
zfs list -t snapshot
NAME                              USED  AVAIL  REFER  MOUNTPOINT
zp1/zpdata01@2020-03-2117:06:47   144K      -   770M  -
zp1/zpdata01@2020-03-2117:17:01     0B      -   786M  -
zfs destroy zp1/zpdata01@2020-03-2117:06:47
zfs list -t snapshot
NAME                              USED  AVAIL  REFER  MOUNTPOINT
zp1/zpdata01@2020-03-2117:17:01     0B      -   786M  -
Manually delete archives:
find /zpdata02/pg12_1921_wal/ -type f -mtime +7 -exec rm -f {} \;
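A deletion policy can also be automated by parsing the timestamp that snap.sh embeds in each snapshot name. The following is a hypothetical retention sketch that selects snapshots older than seven days; the sample names and the fixed reference date are illustrative, and a real script would list names with `zfs list -H -t snapshot -o name` and use the current time.

```shell
# Sample snapshot names in the zp1/zpdata01@$(date +%F%T) format.
samples='zp1/zpdata01@2020-03-1017:06:47
zp1/zpdata01@2020-03-2117:17:01'
# Fixed reference date so the example is reproducible; a real script
# would use: date -d '7 days ago' +%s
cutoff=$(date -d '2020-03-21 7 days ago' +%s)
to_destroy=$(printf '%s\n' "$samples" | while read -r snap; do
  day=$(echo "${snap#*@}" | cut -c1-10)   # date part, e.g. 2020-03-10
  if [ "$(date -d "$day" +%s)" -lt "$cutoff" ]; then
    echo "$snap"                          # a real script would run: zfs destroy "$snap"
  fi
done)
echo "$to_destroy"
```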
Check the validity of backup sets. For more information, visit GitHub.
Obtain the latency of data synchronization between your RDS and ECS instances.
postgres=> select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),sent_lsn)) as sent_delay, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_flush_lsn(),replay_lsn)) as replay_delay, * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
sent_delay       | 0 bytes
replay_delay     | 0 bytes
pid              | 84098
usesysid         | 886185
usename          | rep
application_name | hello_rds_pg12
client_addr      | 192.168.0.173
client_hostname  |
client_port      | 60402
backend_start    | 2020-03-21 16:59:01.890775+08
backend_xmin     |
state            | streaming
sent_lsn         | 11D/97002068
write_lsn        | 11D/97002068
flush_lsn        | 11D/97002068
replay_lsn       | 11D/97002068
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2020-03-21 17:01:17.198139+08
After you complete the preceding steps, snapshots are created at fixed intervals on your ECS instance. You can use the created snapshots to restore your RDS instance in seconds for real-time disaster recovery.
Restore your RDS instance in seconds
Clone the ZFS file system based on a suitable snapshot.
zfs list -t snapshot
NAME                              USED  AVAIL  REFER  MOUNTPOINT
zp1/zpdata01@2020-03-2117:17:01   312K      -   786M  -
zfs clone -o mountpoint=/test_recovery zp1/zpdata01@2020-03-2117:17:01 zp1/zpdata_test
cd /test_recovery
ll
total 17
drwx------ 20 postgres postgres 35 Mar 21 16:59 pg12_1921_data
Configure the parameters that are used to restore the data of your RDS instance.
Note: If your ECS instance does not have sufficient memory, you can set shared_buffers to a smaller value.
vi /test_recovery/pg12_1921_data/postgresql.auto.conf
port=1923
shared_buffers=32GB
log_destination='csvlog'
recovery_end_command = 'cp /zpdata02/pg12_1921_wal/%f %p'
recovery_target_time = '2020-03-21 17:28:37.670338+08'
recovery_target_timeline = 'latest'
recovery_target_action = 'pause'
Delete the socket and pid files from the cloned ZFS file system.
rm -f /test_recovery/pg12_1921_data/.s.*
rm /test_recovery/pg12_1921_data/postmaster.pid
Restore the data from the cloned ZFS file system.
su - postgres
/usr/pgsql-12/bin/pg_ctl start -D /test_recovery/pg12_1921_data
View the restored data.
psql -h /test_recovery/pg12_1921_data -p 1923 -U <The username of the privileged account for your RDS instance> postgres
psql (12.1)
Type "help" for help.
postgres=> \dt
Note: After you verify that the restored data is correct, you can use the pg_dump tool to copy the restored data to your RDS instance.
Delete restored data from your ECS instance
After the restoration is complete, run the following commands to delete the restored data from the test_recovery directory on your ECS instance:
su - postgres
/usr/pgsql-12/bin/pg_ctl stop -m fast -D /test_recovery/pg12_1921_data
sudo zfs destroy zp1/zpdata_test