Use the task orchestration feature to orchestrate and schedule tasks of DLA

Last Updated: May 29, 2020

Data Lake Analytics (DLA) is a serverless service for interactive query and analysis in the cloud. It can query and analyze data stored in different data sources. This topic describes how to use the task orchestration feature of Data Management Service (DMS) to orchestrate and schedule DLA computing tasks.

Before you begin

Prepare a DLA instance

Prepare a DLA instance on Alibaba Cloud. If no DLA instance is available, create one. Then create a schema with two tables on the DLA instance. You can use the following statements to create the schema and the two tables:

    -- Create a schema.
    CREATE DATABASE IF NOT EXISTS `demo_schema`
    WITH DBPROPERTIES (
      LOCATION = 'oss://dbs-backup-1673826650152166-cn-beijing/dla_demo/',
      catalog = 'oss',
      tags = 'sourceType=OSS_JSON'
    );

    -- Create a table named demo1 and a table named demo2.
    -- Each table points to its own subdirectory (the demo1/ and demo2/ names
    -- are illustrative) so that the two tables do not share data files.
    CREATE EXTERNAL TABLE IF NOT EXISTS demo1
    (
      id int,
      v int
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION 'oss://dbs-backup-1673826650152166-cn-beijing/dla_demo/demo1/';

    CREATE EXTERNAL TABLE IF NOT EXISTS demo2
    (
      id int,
      v int
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION 'oss://dbs-backup-1673826650152166-cn-beijing/dla_demo/demo2/';

You can also create a schema and tables on the codeless UI of the DLA console. For more information, see Wizard for creating OSS tables.
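The table definitions above declare ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' and STORED AS TEXTFILE, which means each row is one line of text with the id and v values separated by a vertical bar. The following Python sketch only illustrates that layout. The sample file contents are hypothetical, and the parsing is a local stand-in, not how DLA reads OSS objects.

```python
import csv
import io

# Hypothetical contents of one text file under a table's OSS location.
sample_file = "1|1\n1|19\n2|12\n"

# Split each line on '|' and map the two fields to the id and v columns.
rows = [
    {"id": int(id_), "v": int(v)}
    for id_, v in csv.reader(io.StringIO(sample_file), delimiter="|")
]
print(rows)  # [{'id': 1, 'v': 1}, {'id': 1, 'v': 19}, {'id': 2, 'v': 12}]
```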

Activate DMS

If DMS is already activated, go to the DMS console.

Otherwise, activate DMS by following these steps:

  1. Log on to the DMS console. On the page that appears, click Buy Now.

  2. Specify the billing method, the control mode, and the number of instances that you want to create, and then click Buy Now. For more information, see Control modes.

Procedure

To connect DMS to DLA, register a DLA schema as an instance in DMS and orchestrate tasks in a task flow.

Register a DLA schema as an instance

  1. Log on to the DMS console. Move the pointer over the plus sign (+) in the upper-left corner and select Add instance.

  2. In the Add instance dialog box that appears, click the Cloud tab. On the Cloud tab, select DLA-Data Lake Analytics.

  3. In the Basic Information/Advanced information step, set the following parameters:
    • Instance Area: The region where the DLA instance is deployed.
    • Connection string address: The connection string of the DLA instance.
    • Database account and Database password: The username and password that you use to log on to the DLA instance.
    • Control Mode: The control mode that is used to manage the instance in DMS. In this example, select the Secure Collaboration mode.
    • Security Rules: The security rule that specifies the approval process. In this example, select dla default from the drop-down list.
  4. In the Basic Information section, click Test connection in the lower-left corner. Wait until the connectivity test passes. If the test fails, check the parameter values that you specified.

  5. In the Advanced information section, enter a name for the instance in the Instance Name field, for example, DLA-test. Then, click Submit.

Orchestrate tasks in a task flow and run the task flow

After a DLA schema is registered as an instance, you can orchestrate tasks in a task flow and set the scheduling cycle for the task flow.

Orchestrate tasks in a task flow

In the top navigation bar, choose Data Factory > Task orchestration to go to the Task orchestration page.
In the left-side navigation submenu of the Task orchestration page, click Develop Space.

In the upper-left corner, click the plus sign (+) next to Task orchestration. In the New Task Flow dialog box that appears, enter a name for the task flow in the Task Flow Name field, for example, dla_demo_taskflow. Then, click OK.

Create SQL tasks

On the task orchestration page that appears, drag DLA-SQL to the canvas once for each SQL task that you want to create. Click an SQL task. In the right-side pane, click the Content tab, select demo_schema (the schema that you created on the DLA instance) from the Database drop-down list, and enter the SQL statements.

In this example, two SQL tasks are created. One is used to prepare data and the other is used to aggregate data. Enter the following SQL statements for the SQL task used to prepare data on the Content tab:

    INSERT INTO demo1 VALUES (1, 1);
    INSERT INTO demo1 VALUES (1, 19);
    INSERT INTO demo1 VALUES (2, 12);
    INSERT INTO demo1 VALUES (3, 2);
    INSERT INTO demo1 VALUES (3, 4);
    INSERT INTO demo1 VALUES (4, 67);

Enter the following SQL statements for the SQL task used to aggregate data on the Content tab:

    INSERT INTO demo2
    SELECT id, MAX(v) FROM demo1 GROUP BY id;
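To see what the two tasks produce together, the following Python sketch replays the same statements against an in-memory SQLite database. SQLite is only an assumed local stand-in for the DLA schema so that the example runs anywhere. It is not the DLA engine, and the table definitions are simplified.

```python
import sqlite3

# In-memory SQLite stands in for demo_schema (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo1 (id INT, v INT)")
conn.execute("CREATE TABLE demo2 (id INT, v INT)")

# Preparation task: the same six rows as in the statements above.
conn.executemany(
    "INSERT INTO demo1 VALUES (?, ?)",
    [(1, 1), (1, 19), (2, 12), (3, 2), (3, 4), (4, 67)],
)

# Aggregation task: keep the largest v for each id.
conn.execute("INSERT INTO demo2 SELECT id, MAX(v) FROM demo1 GROUP BY id")

result = conn.execute("SELECT id, v FROM demo2 ORDER BY id").fetchall()
print(result)  # [(1, 19), (2, 12), (3, 4), (4, 67)]
```

In this simulation, running both statements a second time appends the same rows again, because both tasks use plain INSERT. Keep this in mind when you decide how often a task flow should run.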

Click the Node Name tab for each SQL task. On the Node Name tab, set the name of the SQL task used to prepare data to Preparation and that of the SQL task used to aggregate data to Aggregation.

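In the directed acyclic graph (DAG), click the mark on the border of the Preparation task. Then, draw a line with an arrow from the Preparation task to the Aggregation task. The arrow makes the Aggregation task depend on the Preparation task, so the Aggregation task runs after the Preparation task.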
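The effect of the arrow can be illustrated with a topological sort. The following Python sketch uses the standard graphlib module (Python 3.9 or later) to derive a run order from the dependency between the two tasks. It is a conceptual model only and makes no claim about how DMS schedules nodes internally.

```python
from graphlib import TopologicalSorter

# Each key maps a task to the set of tasks that it depends on.
# The arrow from Preparation to Aggregation becomes one dependency.
dependencies = {
    "Preparation": set(),
    "Aggregation": {"Preparation"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['Preparation', 'Aggregation']
```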

Run the task flow

After tasks are orchestrated, you can directly run the task flow or configure the scheduling policy to run the task flow at the specified time.

Directly run the task flow

In the upper-left corner, click Try Run to run the task flow.

View the running result

On the left-side navigation submenu, click Operation Center. On the page that appears, you can view the running result of the task flow.

Run the task flow at the specified time

On the left-side navigation submenu, click Develop Space to go back to the task orchestration page. Click a blank area on the canvas. On the Scheduling tab that appears in the right-side pane, turn on the Turn on/off switch and set the Trigger type parameter to Cyclic scheduling. Then, select a scheduling cycle and set the time to run the task flow.

The scheduling policy is now configured for the task flow, and the task flow runs automatically at the specified time. You can view the running result on the Operation Center page.
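As a rough illustration of what a daily scheduling cycle means, the following Python sketch computes the next few run times for a task flow that runs once a day at a fixed time. The function name next_daily_runs and the 02:00 run time are hypothetical, and the logic is a simplified model that does not reflect how DMS computes its schedule.

```python
from datetime import datetime, time, timedelta

def next_daily_runs(now, run_at, count):
    """Return the next `count` daily run times strictly after `now`."""
    first = datetime.combine(now.date(), run_at)
    if first <= now:
        # Today's run time has already passed, so start tomorrow.
        first += timedelta(days=1)
    return [first + timedelta(days=i) for i in range(count)]

# Hypothetical example: the flow runs daily at 02:00 and it is now 10:30.
runs = next_daily_runs(datetime(2020, 5, 29, 10, 30), time(2, 0), 3)
for run in runs:
    print(run.isoformat())
# 2020-05-30T02:00:00
# 2020-05-31T02:00:00
# 2020-06-01T02:00:00
```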

Summary

This topic describes how to connect DMS to DLA through task orchestration. The procedure includes activating DMS, registering a DLA schema as an instance, creating and orchestrating tasks in a task flow, and running and scheduling the task flow. Using task flows in DMS to perform periodic data analysis in DLA has the following benefits:

  • You need to define a task flow only once. The tasks in the task flow then run automatically and periodically, so you do not need to run the statements manually for each cycle.
  • Tasks are run periodically and analysis results are generated in advance. Business applications can query the results without waiting for the analysis to run.
  • The results generated by task flow runs can be reused by other data analysis tasks, which avoids recomputing them and makes better use of DLA resources.

For more information about the features of DMS, see the DMS documentation.