DataWorks uses MaxCompute as a computing and storage engine to support visualized workflow development and managed scheduling and O&M. This platform provides offline processing and analysis of large amounts of data, and supports the managed scheduling of all tasks based on time and dependency. You can schedule and manage AnalyticDB for MySQL tasks by using DataWorks features such as managed scheduling, task dependency management, and task O&M.

Purpose

An important function of task scheduling is dependencies between tasks. This section uses two AnalyticDB for MySQL tasks created in DataWorks to demonstrate this function. The following figure shows the relationship between tables and tasks.

Purpose
  • Task 1

    AnalyticDB for MySQL executes the O_ORDERSTATUS='F' statement to filter finished orders in the orders table and then writes the orders into the finished_orders table.

  • Task 2

    AnalyticDB for MySQL executes the O_TOTALPRICE > 10000 statement to filter orders whose total price is greater than 10,000 from the finished_orders table, and then writes the orders into the high_value_finished_orders table.

Prerequisites

  1. An AnalyticDB for MySQL cluster is created and a whitelist is configured. For more information, see Limits.
    Note AnalyticDB for MySQL is in the same region as DataWorks.
    • If you want to access AnalyticDB for MySQL from DataWorks by using a public endpoint, you must apply for a public endpoint, and then add the to the AnalyticDB for MySQL whitelist.
    • If you want to access AnalyticDB for MySQL from DataWorks by using a VPC, you must add the exclusive resource group in DataWorks. The VPC of the exclusive resource group is the same as that of AnalyticDB for MySQL. For more information, see Exclusive resource groups.
    • This section uses the ORDERS table in the TPC-H dataset as test data. After you access an AnalyticDB for MySQL database, you must execute the following SQL statements to create a test table:
      CREATE TABLE orders(
            o_orderkey bigint NOT NULL COMMENT'',
            o_custkey int NOT NULL COMMENT'',
            o_orderstatus varchar NOT NULL COMMENT'',
            o_totalprice decimal(15,2) NOT NULL COMMENT'',
            o_orderdate date NOT NULL COMMENT'',
            o_orderpriority varchar NOT NULL COMMENT'',
            o_clerkvar char NOT NULL COMMENT'',
            o_shippriority int NOT NULL COMMENT'',
            o_comment varchar NOT NULL COMMENT'',
            dummy varchar
        )DISTRIBUTED BY HASH(o_orderkey);
      CREATE TABLE finished_orders (
             o_orderkey bigint NOT NULL,
             o_totalprice decimal(15,2) NOT NULL,
             primary key (o_orderkey)
         )DISTRIBUTE BY HASH(o_orderkey);
      
      CREATE TABLE high_value_finished_orders (
            o_orderkey bigint NOT NULL,
            o_totalprice decimal(15,2) NOT NULL,
            PRIMARY KEY (`o_orderkey`)
        )
        DISTRIBUTED BY HASH(o_orderkey);
  2. A workspace is created in DataWorks. For more information, see Create a workspace.
    Note Select standard mode for the workspace.

Step 1: Create an AnalyticDB for MySQL connection in DataWorks

For more information, see Configure the AnalyticDB for MySQL connection.

add adb source

DataWorks implements security control on connected services. You must add endpoints and ports of AnalyticDB for MySQL databases to the whitelist of DataWorks.

Configure whitelist 1Configure whitelist 2

Step 2: Create an AnalyticDB for MySQL task in DataWorks

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. In the Actions column corresponding to the created workspace, click Data Analytics.
  2. Right-click Business Flow and select Create Workflow to create a workflow named adb_task.
    Create a workflow
  3. On the adb_task tab, choose Custom > AnalyticDB for MySQL from the navigation tree to create a data analytics node with the name finished_orders in the workflow.
    Create a node
  4. Click finished_orders, set Select a connection to adb_source, and execute the following SQL statements.

    Task 1: AnalyticDB for MySQL uses O_ORDERSTATUS='F' to filter finished orders in the orders table and then writes the orders into the finished_orders table.

     use adb_demo;
     insert into finished_orders
     select O_ORDERKEY, O_TOTALPRICE
     from orders 
     where O_ORDERSTATUS = 'F';
    finished_orders
  5. Follow the preceding steps to create a data analytics node whose name is high_value_finished_orders.

    Task 2: AnalyticDB for MySQL uses O_TOTALPRICE > 10000 to filter orders whose total price is greater than 10,000 from the finished_orders table and writes the orders into the high_value_finished_orders table.

     use adb_demo;
     insert into high_value_finished_orders
     select * from finished_orders
     where O_TOTALPRICE > 10000;
    high_value_finished_orders

Step 3: Configure task dependencies

In DataWorks, tasks can be triggered by scheduling a configuration based on time or dependencies. Multiple tasks are executed at a specified time based on specified dependencies. For more information, see Scheduling properties and Dependencies.

For example, you can execute the finished_orders task at 00:17 every day.

时间依赖

Execute the high_value_finished_orders task after the finished_orders task is executed.

依赖2

Step 4: Publish a task

After you configure a task, you can publish and maintain the task. For more information, see Deploy a node.

Task O&M

After you publish a task, you can view tasks to be published.

Figure 1: View tasks to be published Figure 2 : View tasks to be published

After a task is published, you can view and maintain the task on the task O&M page.

Task O&M