Database Backup (DBS) allows you to query data in backup sets without the need to restore the backup data to a database. Backup sets that are generated by using DBS are stored in Alibaba Cloud Object Storage Service (OSS).

Background information

To query backup data, you can use the following conventional methods:
  • Import the backup data to a database and query the backup data in the database. This method takes a long time and poses the risk of changing the backup data.
  • Use Hive to query backup data. This method has high costs and supports only backup data in specific file formats.

DBS allows you to run SQL statements directly against backup data that is stored in OSS, so the backup data is never changed. Query costs are low because you are charged based on the volume of data scanned rather than the total volume of the backup data. DBS also allows you to query data in multiple backup sets at a time, so you can compare and analyze multiple versions of historical backup data to extract more value from it. For more information, see Query data in a single backup set and Query data in multiple backup sets at a time.

Billing

To query data in backup sets, DBS calls the Data Lake Analytics (DLA) API. DBS itself does not charge you for these queries. However, DLA charges you based on the volume of data scanned. For more information, see Billing methods.
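The following sketch illustrates why billing by scanned volume can cost less than billing by total backup size. It is a rough estimate only: the function name, the per-TB price, and the data volumes are hypothetical placeholders, not actual DLA rates. See Billing methods for real pricing.

```python
# Illustrative estimate of a scan-based charge.
# ASSUMPTION: the price and data volumes below are placeholders, not DLA rates.

GB = 1024 ** 3
TB = 1024 ** 4


def estimate_query_cost(scanned_bytes: int, price_per_tb: float) -> float:
    """Return the estimated charge for a query that scans `scanned_bytes`."""
    if scanned_bytes < 0 or price_per_tb < 0:
        raise ValueError("scanned_bytes and price_per_tb must be non-negative")
    return scanned_bytes / TB * price_per_tb


if __name__ == "__main__":
    hypothetical_price_per_tb = 5.0   # placeholder value, not a real rate
    backup_set_size = 2 * TB          # total size of the backup set
    scanned_by_query = 50 * GB        # data actually read by one query

    print("Charge for the query:",
          estimate_query_cost(scanned_by_query, hypothetical_price_per_tb))
    print("Charge if the whole set were scanned:",
          estimate_query_cost(backup_set_size, hypothetical_price_per_tb))
```

In this example the query reads only a small fraction of the backup set, so the estimated charge is proportionally smaller than the charge for scanning the entire set.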

Scenarios

  • Efficient query of backup data: You can use SQL statements to query backup data without the need to restore the backup data to a database.
  • Offline data warehouses: You can store backup data that is generated by DBS in data lakes and build offline data warehouses to extract more value from data.
  • Audit: DBS allows you to back up all your business data. If you need to audit specific historical data, you can obtain it by querying the data in the backup sets.
  • Quick data identification: When you query data in multiple backup sets, DLA creates a database based on the schema of the multiple backup sets. DLA also adds the dbs_dla_partition field to the schema. Each value in this field indicates the version of the backup set from which the row of data comes. This way, you can identify data that is backed up at different points in time. For more information, see Query data in multiple backup sets at a time.
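The role of the dbs_dla_partition field can be illustrated with a small local simulation. The sketch below does not call DBS or DLA. It uses an in-memory SQLite database to mimic a table that combines rows from two backup sets. The table name orders, its columns, and the partition values backup_v1 and backup_v2 are hypothetical. The actual format of the values that DLA writes to dbs_dla_partition may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table that mimics rows merged from two backup sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, status TEXT, dbs_dla_partition TEXT)"
    )
    rows = [
        # (id, status, backup set version) -- all values are hypothetical
        (1, "pending", "backup_v1"),
        (2, "pending", "backup_v1"),
        (1, "shipped", "backup_v2"),
        (2, "pending", "backup_v2"),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def changed_rows(conn: sqlite3.Connection, old: str, new: str) -> list:
    """Return (id, old_status, new_status) for rows that differ between versions."""
    sql = """
        SELECT o.id, o.status, n.status
        FROM orders AS o
        JOIN orders AS n ON o.id = n.id
        WHERE o.dbs_dla_partition = ?
          AND n.dbs_dla_partition = ?
          AND o.status <> n.status
        ORDER BY o.id
    """
    return conn.execute(sql, (old, new)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in changed_rows(conn, "backup_v1", "backup_v2"):
        print(row)
```

Filtering on the partition column selects one backup version, and joining the table to itself on the primary key shows which rows changed between two points in time.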