This topic shows you how to use Data Management (DMS) to build a data warehouse in Data Lake Analytics (DLA) for an ApsaraDB RDS for MySQL instance and use the task orchestration feature of DMS.

Background information

DLA is an interactive query and analytics service that is based on a serverless architecture on Alibaba Cloud. DLA interacts with servers by using SQL statements. Standard SQL is supported. In addition, DLA provides various built-in functions. You can connect multiple heterogeneous data sources to DLA at the same time and perform various analyses on the data sources. For example, you can perform analysis, or association analysis, on data from Object Storage Service (OSS) and Tablestore, both of which are provided by Alibaba Cloud. The association analysis meets your need to analyze different types of data. Based on full integration with the massively parallel processing (MPP) and directed acyclic graph (DAG) technologies, DLA provides enhanced horizontal analysis and extension capabilities, optimized vectorized execution and pipeline operators, efficient resource isolation, and priority-based scheduling. DMS allows you to create and schedule a DLA analysis task flow to implement periodic data analysis in DLA. This has the following benefits:
  • The task flow is scheduled so that tasks in the task flow can be run in an automatic and periodic manner. This significantly reduces the cost of labor.
  • The scheduled task flow generates analysis results in real time and provides quick responses for queries from business applications.
  • Analysis results of the scheduled task flow can be reused by other data analysis tasks in DLA. This maximizes the resource utilization in DLA.

Add an ApsaraDB RDS for MySQL instance

  1. Log on to the DMS console.
  2. In the upper-left corner, move the pointer over Add instance / Batch entry and select Add instance.
  3. Click the Cloud tab.
  4. On the Cloud tab, click MySQL.
  5. In the Add instance dialog box, set the parameters that are described in the following table.
    Section Parameter Description
    Basic Information Data source The source of the database instance. In this example, select Cloud.
    Database type The type of the database instance.
    Instance Area The region where the database instance resides.
    Entry mode The method that you can use to log on to the database instance. Valid values: Instance ID and Connection string address.
    Instance ID The ID of the database instance. This parameter is displayed only when you set the Entry mode parameter to Instance ID.
    Connection string address The endpoint of the database instance. This parameter is displayed only when you set the Entry mode parameter to Connection string address.
    Database account The username that you can use to log on to the database.
    Database password The password that you can use to log on to the database.
    Control Mode The control mode that is used to manage the database instance in DMS. For more information, see Control modes.
    Advanced information Environment type The environment of the database instance.
    Instance Name The name that you specify for the database instance.
    Enable DSQL Specifies whether to enable the cross-database query feature for the database instance. For more information, see Cross-database query.
    OnlineDDL Specifies whether to allow the database instance to change schemas without the need to lock tables. For more information, see Change schemas without locking tables.
    DBA The database administrator (DBA) of the database instance. The DBA can grant permissions to users.
    query timeout(s) The timeout period for the execution of an SQL query statement. If the execution of an SQL query statement lasts longer than the specified timeout period, the execution of the statement is terminated to protect the database.
    export timeout(s) The timeout period for the execution of an SQL export statement. If the execution of an SQL export statement lasts longer than the specified timeout period, the execution of the statement is terminated to protect the database.
  6. In the Basic Information section, click Test connection in the lower-left corner. Wait until the connectivity test is passed.
    Note If the test fails, check the parameter values that you specified based on the error message.
  7. Click Submit.

Add a DLA instance

  1. Log on to the DMS console.
  2. In the upper-left corner, move the pointer over Add instance / Batch entry and select Add instance.
  3. Click the Cloud tab.
  4. On the Cloud tab, click DLA-Data Lake Analytics.
  5. The subsequent steps are the same as those of adding an ApsaraDB RDS for MySQL instance. For more information, see Add an ApsaraDB RDS for MySQL instance.

Create a task flow

  1. Log on to the DMS console.
  2. In the top navigation bar, move the pointer over the More icon and choose Data Factory > Task Orchestration.
  3. In the Free orchestration tasks section of the Task Orchestration tab, click New task flow.
  4. In the New Task Flow dialog box, set the Task Flow Name and Description parameters and click OK.
  5. In the navigation tree, find the DLA one-click DW task node and drag the task node to the canvas.
  6. Configure the DLA one-click DW task node.
    1. Click the DLA one-click DW task node on the canvas. The Content tab appears on the right.
    2. On the Content tab, set the parameters that are described in the following table and click Save.
      Parameter Description
      Source of data The database instance that you want to connect to DLA as the data source. In this example, select the ApsaraDB RDS for MySQL instance.
      Target database (Schema) name The name of the schema to be created in DLA.
      OSS path The OSS path to which you want to store the data in the data source.
      Advanced options The custom configurations of the schema to be created in DLA, such as filtering by field or filtering by table. For more information, see Advanced options.
      Note For more information about how to create a data warehouse in DLA, see Use one-click data warehousing.
    3. Click the Variable Setting tab. Create one or more variables as needed and click Save.
      Note For information about supported time formats, see Task.
  7. In the navigation tree, find the DLA-SQL task node and drag the task node to the canvas.
  8. Configure the DLA-SQL task node.
    1. Double-click the DLA-SQL task node on the canvas, or right-click the task node and select Rename, modify the name of the task node, and then press the Enter key. In this example, set the name to DLA-SQL-day.
    2. Click the DLA-SQL task node on the canvas. On the Content tab, enter SQL statements for generating an analysis report.
    3. Click the Variable Setting tab. Create one or more variables as shown in Step 6 and click Save.
      Note After you enter SQL statements and configure variables, you can click SQLPreview to check whether the configurations meet your expectation. After you confirm that the current configurations meet your expectation, click Save in the lower-left corner.
  9. On the canvas, draw a line from the DLA one-click DW task node to the DLA-SQL task node.
  10. Turn on the switch on the Scheduling tab that appears on the right, complete the configurations, and then click Save.
    Note When you configure scheduling properties for a task flow, you can set the Scheduling cycle parameter as needed. Valid values are Hour, Day, Week, and Month. In this example, the DLA analysis task flow is run at 05:00 every weekday.

What to do next

  • You can grant a RAM user the permission to query the DLA schema that is created by the DLA analysis task flow. For more information, see Permission management.
  • You can use the data protection feature of DMS to ensure data security. For more information, see Data Protection.