All Products
Search
Document Center

AnalyticDB for MySQL:Use DataWorks to schedule XIHE SQL tasks

Last Updated:Sep 07, 2023

DataWorks uses MaxCompute as a computing and storage engine to process and analyze large amounts of data in batch mode. DataWorks offers fully hosted services for visualized workflow development, scheduling, and O&M. In DataWorks, tasks can be managed and scheduled by time or dependency. You can schedule and manage AnalyticDB for MySQL tasks by using DataWorks features such as managed scheduling, task dependency management, and task O&M.

Purposes

Task scheduling requires dependencies between tasks. In this topic, two AnalyticDB for MySQL tasks are created in DataWorks. The following figure shows the relationship between tables and tasks.

目的
  • Task 1

    Execute a statement that contains the o_orderstatus = 'F' condition to filter the finished orders from the orders table, and then write the finished orders to the finished_orders table.

  • Task 2

    Execute a statement that contains the o_totalprice > 10000 condition to filter the orders whose total price is greater than 10,000 from the finished_orders table, and then write the orders to the high_value_finished_orders table.

Prerequisites

  1. An AnalyticDB for MySQL cluster is created, and a whitelist is configured for the cluster. For more information, see Use AnalyticDB for MySQL Data Warehouse Edition (V3.0).

    Note

    Make sure that the AnalyticDB for MySQL cluster resides in the same region as DataWorks.

    • If you want to connect to AnalyticDB for MySQL from DataWorks by using a public endpoint, you must apply for a public endpoint, and then add the IP addresses in the DataWorks whitelist to the AnalyticDB for MySQL whitelist.

    • If you want to connect to AnalyticDB for MySQL from DataWorks by using a Virtual Private Cloud (VPC) endpoint, you must create an exclusive resource group in DataWorks. The VPC of the exclusive resource group must be the same as that of the AnalyticDB for MySQL cluster.

    • In this topic, the orders table is used as test data. After you connect to an AnalyticDB for MySQL database, you can execute the following statements to create test tables:

      CREATE TABLE orders(
            o_orderkey bigint NOT NULL COMMENT'',
            o_custkey int NOT NULL COMMENT'',
            o_orderstatus varchar NOT NULL COMMENT'',
            o_totalprice decimal(15,2) NOT NULL COMMENT'',
            o_orderdate date NOT NULL COMMENT'',
            o_orderpriority varchar NOT NULL COMMENT'',
            o_clerkvar char NOT NULL COMMENT'',
            o_shippriority int NOT NULL COMMENT'',
            o_comment varchar NOT NULL COMMENT'',
            dummy varchar
        )DISTRIBUTED BY HASH(o_orderkey);
      CREATE TABLE finished_orders (
             o_orderkey bigint NOT NULL,
             o_totalprice decimal(15,2) NOT NULL,
             primary key (o_orderkey)
         )DISTRIBUTE BY HASH(o_orderkey);
      
      CREATE TABLE high_value_finished_orders (
            o_orderkey bigint NOT NULL,
            o_totalprice decimal(15,2) NOT NULL,
            PRIMARY KEY (`o_orderkey`)
        )
        DISTRIBUTED BY HASH(o_orderkey);
  2. A workspace is created in DataWorks. For more information, see Create a workspace.

    Note

    Select standard mode for the workspace.

Step 1. Add an AnalyticDB for MySQL data source in DataWorks

For more information, see AnalyticDB for MySQL 3.0 data source.

DataWorks implements security control on connected services. You must add endpoints and ports of AnalyticDB for MySQL databases to the whitelist of DataWorks.

设置白名单2

Step 2. Create an AnalyticDB for MySQL node in DataWorks

For more information, see Create and use an AnalyticDB for MySQL node.

Step 3: Configure task dependencies

In DataWorks, tasks can be triggered by scheduling a configuration based on time or dependencies. Multiple tasks are executed at the specified point in time based on specified dependencies. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.

For example, you can configure the finished_orders task to be executed at 02:00:00 every day. After the finished_orders task is executed, the high_value_finished_orders task is executed.

Step 4. Publish tasks

After you configure tasks, you can publish and maintain these tasks. For more information, see Deploy nodes.

After you publish a task, you can view the tasks to be published.

After a task is published, you can view and maintain the task on the task O&M page.