This topic describes how the cold data backup and restoration feature works, how it is billed, and how to configure it.
Data is a core asset of an enterprise. As an enterprise's business grows, its data volume grows exponentially, and business applications must process that data online and in real time. Database O&M personnel face great challenges in protecting this core data, because accidental deletion, system vulnerabilities, ransomware, hardware failures, and natural disasters can all cause data loss. Therefore, data backup and restoration is a key database feature.
PolarDB-X allows you to back up the data and logs of databases that use the InnoDB storage engine. For more information, see Overview. When you back up data or logs, you can select Auto Backup or Manual Backup as the backup mode, and By Point in Time or By Backup Set as the restoration mode. Both the cold data archived to Object Storage Service (OSS) and the hot data stored in InnoDB belong to the instance, so backup and restoration must be performed at the instance level. To meet different backup and restoration requirements, PolarDB-X provides a set of solutions for backing up and restoring cold data archived to OSS.
Working mechanism
For instances that contain cold data, PolarDB-X provides data restoration capabilities at different granularities, including the consistent backup and restoration feature for instances, the recycle bin feature for tables, and the SQL flashback feature for SQL statements.
Flashback query of cold data
A table that uses OSS as its storage engine consists of metadata and data files. The metadata is stored in the Global Meta Service (GMS) metadata nodes of PolarDB-X, and the data files are stored in OSS in the open source Optimized Row Columnar (ORC) format. Each data file in OSS has a corresponding metadata record in GMS that contains two timestamp fields, commit_ts and remove_ts. These fields are used for version control and achieve the same effect as multiversion concurrency control (MVCC).
Flashback query allows you to specify, for each table in an SQL statement, the point in time at which to query a snapshot of that table's data.
In the preceding figure, the current time is specified as the query time. For File 1 and File 2, the read timestamp is greater than the remove timestamp, so the data in File 1 and File 2 is invisible. File 3 has only a commit timestamp, and the read timestamp is greater than that commit timestamp, so the data in File 3 is visible. Based on this capability, PolarDB-X provides the AS OF TIMESTAMP 'Specified time' syntax to specify the point in time for a snapshot query. Sample statement:
SELECT * FROM oss_orders AS OF TIMESTAMP '2022-07-05 11:11:11'
WHERE userId = 100;
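The visibility rule illustrated by File 1 to File 3 can be sketched in Python. This is a minimal model, not PolarDB-X code: the FileMeta class is a hypothetical stand-in for a GMS metadata record, and treating a file as visible when commit_ts <= read_ts < remove_ts (remove_ts absent for files that have not been removed) is an assumption about how the boundaries are handled. The timestamps are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileMeta:
    """Hypothetical stand-in for the GMS metadata record of one ORC file in OSS."""
    name: str
    commit_ts: int
    remove_ts: Optional[int] = None  # None: the file has not been removed


def is_visible(f: FileMeta, read_ts: int) -> bool:
    """Assumed rule: committed at or before read_ts and not yet removed at read_ts."""
    if f.commit_ts > read_ts:
        return False
    return f.remove_ts is None or read_ts < f.remove_ts


def snapshot(files: List[FileMeta], read_ts: int) -> List[str]:
    """Names of the data files that a query AS OF read_ts would scan."""
    return [f.name for f in files if is_visible(f, read_ts)]


# The three files from the figure, with hypothetical timestamps.
files = [
    FileMeta("file1.orc", commit_ts=10, remove_ts=20),
    FileMeta("file2.orc", commit_ts=15, remove_ts=30),
    FileMeta("file3.orc", commit_ts=25),
]
print(snapshot(files, read_ts=100))  # ['file3.orc']
print(snapshot(files, read_ts=18))   # ['file1.orc', 'file2.orc']
```

Because visibility is decided per file rather than per row, an older snapshot only needs the files whose timestamp interval covers the read time.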
PolarDB-X also provides the Alter FileStorage OSS As Of Timestamp 'Specified time for data restoration' statement, which crops the cold data files in OSS to restore the data to a specific point in time.
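The cropping performed by Alter FileStorage OSS As Of Timestamp can be modeled as a filter over file metadata. The following Python sketch assumes these semantics, which are not taken from PolarDB-X source code: files committed after the target time are dropped, files already removed at the target time are dropped, and files removed after the target time have their removal undone. FileMeta and crop_to are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical stand-in for the GMS metadata record of one ORC file in OSS."""
    name: str
    commit_ts: int
    remove_ts: Optional[int] = None


def crop_to(files: List[FileMeta], target_ts: int) -> List[FileMeta]:
    """Sketch: crop file metadata so that the live file set matches target_ts.

    Assumed semantics (not the actual statement implementation):
    - committed after target_ts: the file did not exist yet, so drop it;
    - removed at or before target_ts: the file was already gone, so drop it;
    - removed after target_ts: the file was still live, so clear remove_ts.
    """
    cropped = []
    for f in files:
        if f.commit_ts > target_ts:
            continue
        if f.remove_ts is not None and f.remove_ts <= target_ts:
            continue
        cropped.append(replace(f, remove_ts=None))
    return cropped


files = [
    FileMeta("file1.orc", commit_ts=10, remove_ts=20),
    FileMeta("file2.orc", commit_ts=15, remove_ts=30),
    FileMeta("file3.orc", commit_ts=25),
]
print([f.name for f in crop_to(files, target_ts=22)])  # ['file2.orc']
```

Under these assumptions, the files that survive cropping are exactly the files a flashback query at the target time would see.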
Backup and restoration by backup set
When you restore a PolarDB-X instance, the hot data stored in InnoDB and the cold data archived to OSS are restored together: you specify a point in time, and the data of the original instance as of that time is restored to a destination instance. Because the amount of cold data in OSS can be large, applying the same backup and restoration policies as for online databases would make backup and storage costly. To reduce these costs, PolarDB-X handles hot data in InnoDB and cold data in OSS separately and uses different backup and restoration logic for each. This way, you can back up hot data and cold data at the same time or separately. When you back up or restore data, take note of the following points:
You can configure a scheduled task or manually trigger the API operation for full backup. The backup of instance data can be triggered globally. Hot data in InnoDB is split into full data and incremental data, which use different backup and restoration logic.
PolarDB-X provides an API operation that restores data to a specified point in time. During instance restoration, the point-in-time restoration mechanism for hot data in InnoDB is called first to restore the corresponding GMS and data nodes. Then, the cold data in OSS is restored based on the GMS metadata.
Backup process
Perform a full backup of OSS data files at a fixed interval.
The database management platform executes the Alter FileStorage OSS Backup statement on the compute node to persist the OSS data file metadata maintained by GMS to an OSS file named files_meta.txt. The metadata includes the data file names and their corresponding timestamp versions.
The database management platform reads the files_meta.txt file in OSS to obtain the metadata of all files.
The database management platform copies the data files one by one to the OSS bucket that is used for backup.
OSS schema metadata is stored in GMS, so it can be backed up in the same way as MySQL databases, by using full backups and binary logs.
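The full backup steps above can be sketched in Python with dictionaries standing in for OSS buckets. This is an illustrative model only: the tab-separated layout of files_meta.txt, the helper names, and copying files_meta.txt itself into the backup bucket are all assumptions, because the real file format and platform logic are not described here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical metadata row: (file name, commit_ts, remove_ts or None).
Meta = Tuple[str, int, Optional[int]]


def export_files_meta(gms_records: List[Meta]) -> str:
    """Serialize GMS file metadata; the tab-separated layout is an assumption."""
    lines = []
    for name, commit_ts, remove_ts in gms_records:
        lines.append(f"{name}\t{commit_ts}\t{'' if remove_ts is None else remove_ts}")
    return "\n".join(lines)


def parse_files_meta(text: str) -> List[Meta]:
    """Read the exported metadata back, as the platform does before copying."""
    records = []
    for line in text.splitlines():
        name, commit_ts, remove_ts = line.split("\t")
        records.append((name, int(commit_ts), int(remove_ts) if remove_ts else None))
    return records


def full_backup(gms_records: List[Meta], source: Dict[str, bytes],
                backup_bucket: Dict[str, bytes]) -> None:
    """Persist metadata to files_meta.txt, then copy data files one by one."""
    source["files_meta.txt"] = export_files_meta(gms_records).encode()
    for name, _, _ in parse_files_meta(source["files_meta.txt"].decode()):
        backup_bucket[name] = source[name]
    # Assumed: the metadata file travels with the backup set.
    backup_bucket["files_meta.txt"] = source["files_meta.txt"]


records = [("file1.orc", 10, 20), ("file3.orc", 25, None)]
src = {"file1.orc": b"a", "file3.orc": b"c"}
bak: Dict[str, bytes] = {}
full_backup(records, src, bak)
print(sorted(bak))  # ['file1.orc', 'file3.orc', 'files_meta.txt']
```

Keeping the commit_ts and remove_ts values alongside the copied files is what later allows a backup set to be cropped back to an earlier point in time.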
At a fixed interval, the Alter FileStorage OSS Purge Before Timestamp statement is executed to purge expired data of old versions. This keeps the size of the backup set from growing indefinitely. Note that the purge interval must be longer than the backup interval.
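The purge step and the interval rule can be sketched as follows. The purge semantics here are an assumption, not the documented behavior of Alter FileStorage OSS Purge Before Timestamp: a file removed before the purge timestamp is invisible to every snapshot at or after that timestamp, so it is treated as an expired old version. The helper names are hypothetical. The check only encodes the stated rule that the purge interval must be longer than the backup interval.

```python
from typing import List, Optional, Tuple

Meta = Tuple[str, int, Optional[int]]  # hypothetical: (file name, commit_ts, remove_ts)


def purge_before(files: List[Meta], purge_ts: int) -> Tuple[List[Meta], List[str]]:
    """Sketch of purging old versions; returns (files kept, names of purged files).

    Assumed semantics: a file removed before purge_ts cannot be seen by any
    snapshot at or after purge_ts, so it is deleted.
    """
    kept, purged = [], []
    for name, commit_ts, remove_ts in files:
        if remove_ts is not None and remove_ts < purge_ts:
            purged.append(name)
        else:
            kept.append((name, commit_ts, remove_ts))
    return kept, purged


def check_intervals(purge_interval_days: int, backup_interval_days: int) -> None:
    """Enforce the rule that the purge interval is longer than the backup interval."""
    if purge_interval_days <= backup_interval_days:
        raise ValueError("purge interval must be longer than backup interval")


check_intervals(purge_interval_days=30, backup_interval_days=7)
kept, purged = purge_before([("a.orc", 1, 5), ("b.orc", 2, 50), ("c.orc", 3, None)], purge_ts=10)
print(purged)  # ['a.orc']
```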
Restoration process
Restore the hot data in InnoDB to a new PolarDB-X instance. In this process, GMS and data nodes are restored to the specified point in time based on the full backup and incremental binary logs. Then, the corresponding OSS metadata is restored to the specified point in time based on GMS.
Select an OSS data backup set. The system checks whether the historical backup sets include one whose backup time is later than the restoration time. If so, the backup set that is later than and closest to the restoration time is selected. If not, the restoration time is close to the current time. In this case, all data files currently stored in OSS are selected, and the Alter FileStorage OSS Backup statement is executed to push the metadata maintained in GMS to OSS.
Copy the OSS backup set selected in the preceding step to the new PolarDB-X instance and update the method that is used to connect the instance to the OSS bucket.
Execute the Alter FileStorage OSS As Of Timestamp 'Specified time for data restoration' statement on the compute node of the instance to crop the cold data files in OSS and restore the data to the specified point in time.
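Because cropping can only roll file metadata back in time, the OSS backup set used for restoration must have been taken no earlier than the restoration time. The selection step can be sketched as below. BackupSet and choose_oss_backup_set are hypothetical names, and treating a backup taken exactly at the restoration time as eligible is an assumption about the boundary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupSet:
    """Hypothetical record of one OSS cold data backup set."""
    set_id: str
    backup_time: int  # for example, a Unix timestamp


def choose_oss_backup_set(sets: List[BackupSet], restore_time: int) -> Optional[BackupSet]:
    """Pick the backup set taken at or after restore_time that is closest to it.

    None means that no such set exists, which happens when the restoration time
    is close to the current time. In that case the live OSS files are used,
    after Alter FileStorage OSS Backup pushes the GMS metadata to OSS.
    """
    candidates = [s for s in sets if s.backup_time >= restore_time]
    return min(candidates, key=lambda s: s.backup_time, default=None)


sets = [BackupSet("w1", 100), BackupSet("w2", 200), BackupSet("w3", 300)]
print(choose_oss_backup_set(sets, 150).set_id)  # w2
print(choose_oss_backup_set(sets, 350))         # None
```

Choosing the closest later set keeps the amount of metadata to crop small, since fewer file versions lie between the backup time and the restoration time.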
Unlike hot data, whose versions are maintained per row, cold data metadata is maintained at the granularity of a file. This leads to the differences between hot data and cold data that are described in the following table.
| Item | Hot data in InnoDB | Cold data in OSS |
| --- | --- | --- |
| Data backup | Full backup + incremental binary logs | Reuse of the InnoDB backup capability for metadata + full copy of data files |
| Data restoration | Full data + replay of incremental data | Reuse of the InnoDB restoration capability for metadata + cropping of data files by timestamp |
| Backup cycle | Daily or weekly | Weekly or monthly |
The preceding figure shows how backup and restoration differ between data in InnoDB and data in OSS. To restore data in InnoDB, the system finds the most recent full backup set whose backup time is earlier than the restoration time and then replays the incremental data to restore the data to the specified point in time. To restore data in OSS, the system finds the backup set whose backup time is later than and closest to the restoration time, and then crops the incremental data contained in that backup set to restore the data to the specified point in time.
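The InnoDB side of this comparison rolls forward from an older full backup, while the OSS side, as described above, rolls back from a newer backup set. A minimal sketch of the InnoDB planning step follows. InnoDBRestorePlan and plan_innodb_restore are hypothetical names, and the plan only records which full backup to start from and which binary log window to replay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InnoDBRestorePlan:
    """Hypothetical plan for point-in-time restoration of hot data."""
    full_backup_time: int
    replay_from: int  # replay binary logs written after this time...
    replay_to: int    # ...up to the restoration time


def plan_innodb_restore(full_backup_times: List[int],
                        restore_time: int) -> Optional[InnoDBRestorePlan]:
    """Pick the most recent full backup taken at or before restore_time.

    The full backup is restored first, then the incremental binary logs
    between the backup time and restore_time are replayed.
    """
    earlier = [t for t in full_backup_times if t <= restore_time]
    if not earlier:
        return None  # no full backup old enough to roll forward from
    base = max(earlier)
    return InnoDBRestorePlan(full_backup_time=base, replay_from=base, replay_to=restore_time)


plan = plan_innodb_restore([100, 200, 300], restore_time=250)
print(plan)  # InnoDBRestorePlan(full_backup_time=200, replay_from=200, replay_to=250)
```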
Billing
The backup of data in OSS is free of charge.
Configure a backup policy
Log on to the PolarDB for Xscale console.
In the top navigation bar, select the region where the target instance is located.
On the Instances page, click the PolarDB-X 2.0 tab. Find the PolarDB-X instance that you want to manage and click the instance ID.
In the left-side navigation pane, choose .
In the Back Up and Archive Data section of the Archived Data Details page, click Settings.
In the Backup and Archiving Settings dialog box, configure the following parameters:
Back Up and Archive Data: The switch is automatically turned on.
Backup Interval: the number of days between two consecutive backups. Valid values: 1 to 59. Unit: days.
Backup Retention Policy: specifies whether to retain the backup data for a specified period of time or permanently.
Backup Retention Period: the retention period of backup files. Valid values: 30 to 730. Unit: days.
Click OK.
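The valid ranges in the Backup and Archiving Settings dialog box can be captured in a small validation sketch, for example if you generate these settings from your own tooling. The class and field names are hypothetical and do not correspond to a PolarDB-X API. Applying the retention period only when permanent retention is not selected is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColdBackupSettings:
    """Hypothetical model of the Backup and Archiving Settings dialog box.

    Field names are illustrative; only the valid ranges come from the console.
    """
    backup_interval_days: int
    retain_permanently: bool = False
    retention_period_days: Optional[int] = None  # assumed to apply only when not permanent

    def validate(self) -> None:
        if not 1 <= self.backup_interval_days <= 59:
            raise ValueError("Backup Interval must be 1 to 59 days")
        if self.retain_permanently:
            return
        if self.retention_period_days is None or not 30 <= self.retention_period_days <= 730:
            raise ValueError("Backup Retention Period must be 30 to 730 days")


ColdBackupSettings(backup_interval_days=7, retention_period_days=365).validate()
ColdBackupSettings(backup_interval_days=30, retain_permanently=True).validate()
```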