This topic describes how to use Data Management (DMS) to create a data warehouse in Data Lake Analytics (DLA) for an ApsaraDB RDS for MySQL instance. This topic also describes how to use the task orchestration feature of DMS.

Background information

DLA is a serverless service on Alibaba Cloud that allows you to perform interactive query and analysis. You interact with DLA by using SQL statements. DLA supports standard SQL and provides a variety of built-in functions. You can connect multiple heterogeneous data sources to DLA at the same time and analyze them together. For example, you can analyze data from Object Storage Service (OSS) and Tablestore, which are both provided by Alibaba Cloud, or run association analysis across the two sources to correlate different types of data, as the sample query after the following list illustrates. DLA integrates the massively parallel processing (MPP) and directed acyclic graph (DAG) technologies, which enhance its horizontal scaling and analysis capabilities, optimize vectorized execution and pipeline operators, and enable efficient resource isolation and priority-based scheduling. DMS allows you to create and schedule a DLA analysis task flow to periodically analyze data in DLA. This provides the following benefits:
  • Tasks in the task flow are run automatically and periodically as scheduled. This significantly reduces labor costs.
  • The scheduled task flow generates analysis results in real time and provides quick responses for queries from business applications.
  • Analysis results of the scheduled task flow can be reused by other data analysis tasks in DLA. This maximizes the resource utilization in DLA.
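
For example, after data sources are mapped to schemas in DLA, a single standard SQL statement can join data across sources. The following query is a minimal sketch; the schema, table, and column names (oss_schema.orders, ots_schema.devices, and so on) are hypothetical:

  -- Hypothetical association analysis: order data stored in OSS joined
  -- with device records stored in Tablestore, both mapped into DLA.
  SELECT o.order_id, o.amount, d.device_status
  FROM oss_schema.orders o
  JOIN ots_schema.devices d ON o.device_id = d.device_id
  WHERE o.amount > 100;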

Procedure

  1. Log on to the DMS console.
  2. In the top navigation bar, move the pointer over the More icon and choose Data Factory > Task Orchestration.
  3. In the Free orchestration tasks section of the Task Orchestration tab, click New task flow.
  4. In the dialog box that appears, set the Task Flow Name and Description parameters and click OK.
  5. Click the blank area on the canvas. In the right-side pane, click the Variables tab and configure the following variables.
    After you configure the variables, you can use ${Variable name} in SQL statements to specify a point in time or time period. For more information about the rules and usage of variables, see Task.

    In this example, create a variable named weekday, set the Time Format parameter to e, and then set the offset to +1 Day.

    The value of this variable indicates the day of the week. With the offset set to +1 Day, Sunday is counted as the first day of the week. If you set the offset to 0, Monday is counted as the first day of the week.
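
    Before the task flow runs, DMS replaces each ${Variable name} placeholder in the SQL statements with the computed value. The following statement is a minimal sketch of how the weekday variable can be used; the orders table and the day_of_week column are hypothetical:

      -- Hypothetical usage: ${weekday} is replaced with the day-of-week
      -- value before the statement is submitted to DLA.
      SELECT COUNT(*) AS order_cnt
      FROM orders
      WHERE day_of_week = '${weekday}';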

  6. Drag the DLA one-click DW task node under Archiving the warehouse to the blank area on the canvas.
  7. Configure the DLA one-click DW task node.
    This task node stores the data of a database as files in an OSS data warehouse, which helps you create a data warehouse with ease.
    1. Click the DLA one-click DW task node on the canvas. In the right-side pane, click the Content tab.
    2. On the Content tab, set the following parameters and click Save.
      • Source of data: The database instance that you want to connect to DLA as the data source. In this example, select an ApsaraDB RDS for MySQL instance.
      • Target database (Schema) name: The name of the schema to be created in DLA.
      • OSS path: The OSS path in which you want to store the data from the data source.
      • Advanced options: The custom configurations of the schema to be created in DLA, such as filtering by field or filtering by table. For more information, see Advanced options. Example: include-tables=adjust_source.
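      For example, to synchronize only specific tables, the tables could be listed in a comma-separated value. This sketch assumes that the option accepts multiple table names, and the second table name adjust_detail is hypothetical; see Advanced options for the exact syntax:

        include-tables=adjust_source,adjust_detail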

      Note For more information about how to create a data warehouse in DLA, see Create a data warehouse by synchronizing full data with one click in T+1 days.
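    After this node runs successfully, you can inspect the schema that it creates directly in DLA. The following check is a minimal sketch that uses the schema name rds_mysql_0723 from the SQL example in the next step and assumes a MySQL-compatible client that is connected to DLA:

      -- List the tables that the DLA one-click DW node created.
      SHOW TABLES FROM rds_mysql_0723;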
  8. Drag the DLA-SQL task node under Archiving the warehouse to the blank area on the canvas.
  9. Configure the DLA-SQL task node.
    This task node can be used to query, process, and analyze the data in the data warehouse.
    1. Right-click the DLA-SQL task node on the canvas and select Rename. Modify the name of the task node and press the Enter key. In this example, set the name to DLA-SQL-day.
    2. Click the DLA-SQL task node on the canvas. In the right-side pane, click the Content tab. Then, enter SQL statements for generating an analysis report.
      You can use the following sample SQL statements:
      -- Create an external table that stores its data as Parquet files in
      -- the specified OSS path. ${weekday} is replaced at run time.
      CREATE EXTERNAL TABLE IF NOT EXISTS `rds_mysql_0723`.`adjust_day` (
        `id` DOUBLE COMMENT '',
        `auditon` TIMESTAMP COMMENT '',
        `returnmemo` STRING COMMENT '',
        `issystem` DOUBLE COMMENT ''
      )
      STORED AS PARQUET
      LOCATION 'oss://xxx/DLA/DLA-Formation-rds_mysql_0723/${weekday}/adjust_day/'
      TBLPROPERTIES (
        'auto.create.location' = 'true'
      );
      
      -- Overwrite the table with the latest data from the source table.
      INSERT OVERWRITE adjust_day
      SELECT
        id,
        auditon,
        returnmemo,
        issystem
      FROM adjust_source;
      Note After you enter the SQL statements and configure the variables, you can click SQL Preview to check whether the configurations meet your expectations. After you confirm that they do, click Save.
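      After the task flow runs, you can verify that this node wrote data to the table. A minimal sketch:

        -- Optional check: count the rows that were written to the new table.
        SELECT COUNT(*) FROM `rds_mysql_0723`.`adjust_day`;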
  10. On the canvas, draw a line from the DLA one-click DW task node to the DLA-SQL-day task node.
  11. After you connect the task nodes, turn on the Turn on/off switch on the Scheduling tab in the right-side pane. Set the scheduling parameters and click Save.
    Note You can schedule a task flow to be run by month, week, or day based on your needs. In this example, the DLA analysis task flow is scheduled to be run at 05:00 every business day.
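    If the scheduling settings accept a CRON expression (an assumption; the exact fields in the Scheduling tab may differ), a Quartz-style expression for 05:00 on Monday through Friday, which approximates business days, would look like this:

      0 0 5 ? * MON-FRI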

What to do next

  • You can grant a RAM user the permission to query the DLA schema that is created by the DLA analysis task flow. For more information, see Permission management.
  • You can use the data protection feature of DMS to ensure data security. For more information, see Data Protection.