Database Backup (DBS) allows you to query data from multiple backup sets at a time. The backup sets involved must be generated based on the same backup schedule. This topic describes how to query data from multiple backup sets at a time.

Prerequisites

Multiple backup sets are generated based on a database that runs the MySQL, SQL Server, Oracle, PostgreSQL, Postgres Plus Advanced Server (PPAS), or PolarDB-X engine.
Note The database can be a self-managed database, an ApsaraDB RDS database, or a PolarDB database, but cannot be a PolarDB O Edition database.

Create a data lake

  1. Log on to the DBS console.
  2. In the left-side navigation pane, click Backup Data Query.
  3. From the Schedule ID/Name drop-down list, select the backup schedule where the backup sets that you want to query are generated.
    Note To go to the Backup Data Query page, you can also click Backup Data Query in the upper-right corner on the details page of the backup schedule.
    Select a backup schedule
  4. Click Open the data lake in the upper-right corner. In the Open the data lake dialog box, specify the parameters as described in the following table.
    Open the data lake dialog box
    Parameter Description
    Optional backup set time range The time range during which the backup sets that you can select are generated. This parameter is set by the system.
    Select the start time of the backup set The earliest date on which the backup sets that you want to query are generated.
    Select the end time of the backup set The latest date on which the backup sets that you want to query are generated.
    Enable automatic incremental backup Specifies whether to enable automatic incremental backup. If you set this parameter to Yes, DBS adds the backup sets that are generated after the selected backup sets to the data lake that is created for the backup schedule.
    Note We recommend that you set this parameter to Yes only when you select the latest backup set that is generated based on the backup schedule. If you set this parameter to Yes and do not select the latest backup set that is generated based on the backup schedule, DBS adds all the backup sets, including the latest backup set, that are generated after the selected backup sets to the data lake immediately after the data lake is created.
    Backup Set ID The backup sets that you want to query. A data lake is created for the backup schedule based on the selected backup sets. By default, all the backup sets that are generated based on the backup schedule are selected.
  5. Click OK.
    Note After you click OK, Data Lake Analytics (DLA) starts to create a data lake based on the selected backup sets. The time required to create the data lake varies with the volume of data in the selected backup sets. You can click Data Lake State to view the creation progress of the data lake.

Query data in the data lake

  1. Log on to the DBS console.
  2. In the left-side navigation pane, click Backup Data Query.
  3. From the Schedule ID/Name drop-down list, select the backup schedule where the backup sets that you want to query are generated.
    Note To go to the Backup Data Query page, you can also click Backup Data Query in the upper-right corner on the details page of the backup schedule.
  4. Log on to the DLA console. In the left-side navigation pane, choose Serverless Presto > Execute. After you click OK in Step 5 of the previous section, DLA creates a database based on the schema of the selected backup sets and displays the database in the navigation tree of the Execute page. Double-click the database name, enter an SQL statement in the code editor, and then click Sync Execute.
    Note
    • The SQL syntax of DLA is based on Presto. For more information, click Syntax Manuals in the upper-right corner of the Execute page or see Common SQL statements.
    • When you query data in multiple backup sets, DLA adds the dbs_dla_partition field to the database that is created based on the backup sets. Each value in this field represents the version of the backup set from which the row of data comes. The version of a backup set is the point in time when the backup set is generated. For each row of data, the value in a field may vary in different backup sets. The following SQL statement is used to query values in the value field of the data row whose ID is 3:
      SELECT value,dbs_dla_partition FROM `database`.`table`WHERE ID = 3;

Manage data in a data lake

  • After you create a data lake for a backup schedule, you can add backup sets to the data lake.
  • You can create only one data lake for each backup schedule. To create a data lake for a backup schedule for which a data lake is already created, you must delete the existing data lake.
  1. Log on to the DBS console.
  2. In the left-side navigation pane, click Backup Data Query.
  3. From the Schedule ID/Name drop-down list, select the backup schedule where the backup sets that you want to query are generated.
  4. Manage data in the data lake.
    • Add one or more backup sets to the data lake.
      1. Click Add backup set in the upper-right corner of the page.
      2. In the Open the data lake dialog box, select one or more backup sets that you want to add to the data lake. Then, click OK.

        You can go to the Execute page of the DLA console to query data in the backup sets that you added.

    • Delete the data lake.
      1. Click Close the data lake in the upper-right corner of the page.
      2. In the message that appears, click OK.

        The data lake is deleted.