All Products
Search
Document Center

DataWorks:Build a data warehouse for an enterprise based on AnalyticDB for MySQL

Last Updated:Jan 15, 2024

This topic describes how to build a data warehouse for an enterprise based on AnalyticDB for MySQL, and use the data warehouse to perform O&M operations and manage metadata.

Before you perform operations in this topic, create a workspace. For more information, see Create a workspace.

Add an AnalyticDB for MySQL 3.0 data source

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the desired workspace and choose Shortcuts > Data Integration in the Actions column.

  2. In the left-side navigation pane, click Data Source. On the Data Sources page, click Create Data Source.

  3. In the Add data source dialog box, click AnalyticDB for MySQL 3.0.

  4. In the Add AnalyticDB for MySQL 3.0 data source dialog box, configure the parameters. For more information, see Add an AnalyticDB for MySQL 3.0 data source.

    Note
    • Nodes that are used to synchronize data from AnalyticDB for MySQL instances support only exclusive resource groups for scheduling. Nodes that are run on other types of resource groups cannot access AnalyticDB for MySQL instances that are deployed in a virtual private cloud (VPC). If you configure an AnalyticDB for MySQL node that is run on other types of resource groups to synchronize data from an AnalyticDB for MySQL instance that is deployed in a VPC, connections between the node and AnalyticDB for MySQL instance may time out. For information about exclusive resource groups for scheduling, see Create and use an exclusive resource group for scheduling.

    • If you add an AnalyticDB for MySQL 2.0 data source, you must enter the AccessKey ID and AccessKey secret of the current Alibaba Cloud account or a RAM user for identity authentication.

    • If you add an AnalyticDB for MySQL 3.0 data source, you must enter the username and password that are used to connect to an AnalyticDB for MySQL 3.0 database for identity authentication. After you create an AnalyticDB for MySQL 3.0 database, you must log on to the AnalyticDB for MySQL console and create the username and password for connecting to the database first.

  5. Find the desired resource group and click Test connectivity in the Actions column.

  6. After the data source passes the network connectivity test, click Complete.

Add information about your exclusive resource group for scheduling to the whitelist of the AnalyticDB for MySQL 3.0 database

An AnalyticDB for MySQL 3.0 database uses a whitelist to control client access. It allows access requests from a client only when the client information is included in the whitelist.

  1. Obtain information about the exclusive resource group for scheduling.

    To guarantee that an exclusive resource group for scheduling can properly access an AnalyticDB for MySQL 3.0 database, you must add information, such as the Elastic Network Interface (ENI) IP address, about the exclusive resource group to the whitelist of the AnalyticDB for MySQL 3.0 database. For more information, see Create and use an exclusive resource group for scheduling. Skip the operations in this section if you use an AnalyticDB for MySQL 2.0 database.

  2. Add information about your exclusive resource group for scheduling to the whitelist of the AnalyticDB for MySQL 3.0 database

    1. Log on to the AnalyticDB for MySQL console. In the left-side navigation pane, click Clusters. On the page that appears, find the desired cluster on the V3.0 Clusters tab and click the cluster ID. The details page of the cluster appears. Then, click Data Security.

    2. On the page that appears, click Create Whitelist in the upper-right corner. In the dialog box that appears, add information about the exclusive resource group to the whitelist.

Create a workflow

  1. In the DataWorks console, click the more icon in the upper-left corner and choose All Products > Data Development And Task Operation > DataStudio.

  2. In the Create Workflow dialog box, configure the Workflow Name and Description parameters.

  3. Click Create.

Create a batch synchronization node

  1. Click the created workflow, right-click Data Integration, and then choose Create Node > Offline synchronization.

  2. In the Create Node dialog box, configure the Name parameter and click Confirm.

  3. Configure a source and a destination for the batch synchronization node.

  4. In the Field Mapping area, configure field mappings between the source and destination.

    Fields in the source on the left have a one-to-one mapping with fields in the destination on the right. You can click Add a row to add a field. To remove an added field, move the pointer over the field and click the Remove icon.

  5. Configure channel control policies.

    Configure the maximum transmission rate and dirty data check rules.

    Parameter

    Description

    Expected Maximum Concurrency

    The maximum number of parallel threads that the synchronization node can use to read data from the source or write data to the destination. You can configure the parallelism for the data synchronization node on the codeless UI.

    Synchronization Rate

    Specifies whether to enable throttling. You can enable throttling and specify a maximum transmission rate to prevent heavy read workloads on the source. We recommend that you enable throttling and set the maximum transmission rate to an appropriate value based on the configurations of the source.

    Dirty Data Records Allowed

    The maximum number of dirty data records allowed.

    Exclusive Resource Group for Data Integration

    The exclusive resource group for Data Integration that you want to use to run the batch synchronization node. If you need to run a large number of nodes on the shared resource group for Data Integration and you find that the nodes are waiting for resources in the resource group, you can use an exclusive resource group for Data Integration or a custom resource group for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration and Create and use a custom resource group for Data Integration.

  6. Click the Properties tab in the right-side navigation pane to configure scheduling properties for the batch synchronization node.

  7. After the configuration is complete, click the Save and Submit icons in sequence.

Create a node for data development

  1. Click the created workflow, right-click UserDefined, and then choose Create Node > AnalyticDB for MySQL.

  2. In the Create Node dialog box, configure the Name parameter and click Confirm.

  3. On the configuration tab that appears, select the related data source and enter SQL statements based on the syntax supported by AnalyticDB for MySQL. You can write data manipulation language (DML) or data definition language (DDL) statements.

  4. Click the Properties tab in the right-side navigation pane to configure scheduling properties for the AnalyticDB for MySQL node.

  5. After the configuration is complete, click the Save icon to save the node settings. Then, click the Run icon to run the SQL statements you wrote.

Perform O&M operations on data

After you commit and deploy the created nodes, click the more icon in the upper-left corner and choose All Products > Data Development And Task Operation > Operation Center. You can perform O&M operations on the nodes on the page that appears. For more information, see Operation Center.

Manage metadata

Click the more icon in the upper-left corner and choose All Products > Data Governance > DataMap. You can manage metadata on the page that appears. For more information, see Data Map.