All Products
Search
Document Center

DataWorks:Build an enterprise data warehouse based on AnalyticDB for MySQL

Last Updated:Mar 27, 2026

This tutorial shows you how to set up an enterprise data warehouse on AnalyticDB for MySQL using DataWorks. By the end, you will have:

  • Added an AnalyticDB for MySQL data source and verified network connectivity

  • Configured an IP address whitelist to allow DataWorks to access your cluster

  • Created a workflow with a batch synchronization node and a data development node

  • Deployed the pipeline and monitored scheduled runs in Operation Center

  • Managed metadata in Data Catalog

Prerequisites

Before you begin, make sure you have:

  • A DataWorks workspace. See Create a workspace.

  • An AnalyticDB for MySQL cluster deployed in a virtual private cloud (VPC). If your cluster uses V3.0, create the database username and password in the AnalyticDB for MySQL console before proceeding.

  • A serverless resource group (recommended) or an old-version exclusive resource group for scheduling. Other resource group types cannot connect to VPC-deployed AnalyticDB for MySQL clusters and will time out.

Add an AnalyticDB for MySQL data source

Connect your AnalyticDB for MySQL cluster to DataWorks so nodes can read from and write to it.

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose More > Management Center, select your workspace from the drop-down list, and click Go to Management Center.

  2. In the left-side navigation pane of the SettingCenter page, click Data Sources.

  3. In the Add Data Source dialog box, click AnalyticDB for MySQL (V3.0).

  4. Configure the parameters. For details, see Add an AnalyticDB for MySQL 3.0 data source. Keep the following authentication requirements in mind:

    • V2.0: Use the AccessKey pair of your Alibaba Cloud account or a RAM user.

    • V3.0: Use the username and password of an AnalyticDB for MySQL V3.0 database. Create these credentials in the AnalyticDB for MySQL console before adding the data source.

  5. Find the resource group row and click Test Network Connectivity.

  6. After the connectivity test passes, click Complete Creation.

    Use a serverless resource group or an old-version exclusive resource group to run AnalyticDB for MySQL nodes that access a VPC-deployed cluster. For instructions, see Use serverless resource groups.

Configure the IP address whitelist

AnalyticDB for MySQL V3.0 uses an IP address whitelist to control inbound connections. Add your DataWorks resource group's IP address or CIDR block to the whitelist so the cluster accepts requests.

Choose the procedure that matches your network environment.

Internal network environment

During the network connectivity test, use the network connectivity diagnostic tool to:

  1. Associate the serverless resource group with the VPC where the AnalyticDB for MySQL V3.0 cluster is deployed.

  2. Add the CIDR block of the vSwitch associated with the resource group to the cluster's IP address whitelist.

For details, see Network connectivity solutions.

Public network environment

If you added a V2.0 data source, skip this section.

First, configure a NAT gateway and an elastic IP address (EIP) so DataWorks can reach the cluster over the internet:

  1. Configure an Internet NAT gateway and an EIP for the VPC associated with the serverless resource group. For details, see Overview of network connectivity solutions.

Then, add the EIP to the cluster's whitelist:

  1. Log on to the AnalyticDB for MySQL console. In the left-side navigation pane, click Clusters. Click the cluster name in the Cluster ID/Description column.

  2. In the left-side navigation pane, choose Cluster Management > Cluster Information. In the Data Security - Whitelist Settings section, click Create Whitelist.

  3. In the Create Whitelist panel, enter a name in Whitelist Name and paste the EIP into the IP Addresses field. Click OK.

Create a workflow

A workflow in DataWorks groups related nodes together and manages their dependencies. Create one before adding any nodes.

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O\&M > Data Development, select your workspace, and click Go to Data Development.

  2. On the DataStudio page, move the pointer over the Create icon in the top navigation bar and select Create Workflow.

  3. In the Create workflow dialog box, fill in Workflow Name and Description, then click Create.

Create a batch synchronization node

A batch synchronization node moves data from your source systems into the AnalyticDB for MySQL cluster on a schedule. Create one inside the workflow you just created.

  1. Click the workflow, right-click Data Integration, and choose Create Node > Offline synchronization.

  2. In the Create Node dialog box, enter a Name and click Confirm.

  3. On the configuration tab, set the source and destination for the synchronization.

  4. In the Field Mapping section, map source fields to destination fields. Click Add a row to add a mapping, or hover over a row and click the Remove icon to delete one.

  5. Configure the channel control settings:

    Parameter Description
    Task Expected Maximum Concurrency Maximum number of parallel threads for reading from the source or writing to the destination
    Synchronization rate Enable throttling and set a maximum transmission rate to reduce read pressure on the source
    Policy for Dirty Data Records Maximum number of dirty data records allowed
    Distributed Execution Enable to distribute the workload across nodes; use a serverless resource group for best performance. See Use serverless resource groups.
  6. Click the Properties

  7. Click the 保存 icon to save, then click the 提交 icon to commit the node.

Create a data development node

A data development node lets you run SQL statements against your AnalyticDB for MySQL cluster — for example, to transform or aggregate data after it has been synced.

  1. Click the workflow, right-click UserDefined, and choose Create Node > AnalyticDB for MySQL.

  2. In the Create Node dialog box, enter a Name and click Confirm.

  3. On the configuration tab, select the data source and write your SQL statements. Both data manipulation language (DML) and data definition language (DDL) statements are supported.

  4. Click the Properties tab in the right-side panel and configure the scheduling properties.

  5. Click the Save icon to save the node, then click the Run icon to test your SQL statements.

Monitor scheduled runs

After you commit and deploy a node, it runs automatically based on its scheduling properties. To check execution status:

  1. Open the node's configuration tab and click Operation Center in the upper-right corner.

  2. In Operation Center, view the run history, logs, and scheduling status of the node.

For details, see Manage scheduled tasks.

Manage metadata

After data flows into your AnalyticDB for MySQL cluster, you can browse and manage table metadata in Data Catalog.

Log on to the DataWorks console. In the left-side navigation pane, choose Data Catalog > Data Map.

For details, see Overview.