
Data Transmission Service: Configure an ETL task in DAG mode

Last Updated: Nov 03, 2023

Data Transmission Service (DTS) provides the extract, transform, and load (ETL) feature. The ETL feature allows you to cleanse and transform streaming data. This way, you can obtain the data that you need in an accurate and efficient manner. This topic describes how to configure an ETL task in directed acyclic graph (DAG) mode.

Background information

  • Before you configure an ETL task, take note of the following information:

    • Input/Dimension Table indicates the source database of the ETL task.

    • Output indicates the destination database of the ETL task.

  • DTS provides the streaming ETL feature for the data synchronization process. You can add a variety of transformation components between the source and destination databases to transform data and write the processed data to the destination database in real time. For example, you can join two stream tables into a large table and write the data of the large table to the destination database. You can also add a field to the source table and configure a function to assign values to the field. Then, you can write the field to the destination database.
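
In SQL terms, the example above can be sketched as follows. All table and field names are hypothetical, and the statements that DTS actually generates during Flink SQL validation may differ; the sketch only shows what joining two stream tables and assigning a computed field amounts to:

    -- Hypothetical example: join two stream tables into a wide table
    -- and add a computed field. All names are placeholders.
    INSERT INTO dwd_orders_wide                    -- destination table (Output node)
    SELECT
      o.order_id,
      o.buyer_id,
      p.product_name,
      p.unit_price,
      o.quantity,
      o.quantity * p.unit_price AS total_amount    -- field assigned by a function
    FROM orders AS o                               -- stream table 1
    JOIN products AS p                             -- stream table 2
      ON o.product_id = p.product_id;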

Prerequisites

  • An ETL task is created in one of the following regions: China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Guangzhou), and China (Hong Kong).

  • The source database belongs to one of the following types: MySQL, PolarDB for MySQL, Oracle, PostgreSQL, iSeries DB2 (AS/400), Db2 for LUW, PolarDB-X 1.0, PolarDB for PostgreSQL, MariaDB, PolarDB for Oracle, SQL Server, and PolarDB-X 2.0.

  • The destination database belongs to one of the following types: MySQL, PolarDB for MySQL, Oracle, AnalyticDB for MySQL V3.0, PolarDB for PostgreSQL, PostgreSQL, Db2 for LUW, iSeries DB2 (AS/400), AnalyticDB for PostgreSQL, SQL Server, MariaDB, PolarDB-X 1.0, PolarDB for Oracle, and Tablestore.

  • The schemas of tables in the destination database are created. This is because the ETL feature does not support schema migration. For example, Table A contains Field 1, Field 2, and Field 3, and Table B contains Field 2, Field 3, and Field 4. If you want to join Table A and Table B into a table that contains Field 2 and Field 3, you must create Table C that contains Field 2 and Field 3 in the destination database. A sample CREATE TABLE statement is shown after this list.

  • The ETL feature does not support full data synchronization. Therefore, you can transform only incremental data in real time.
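
For example, the destination table described in the schema prerequisite above (Table C, which contains Field 2 and Field 3) must already exist before the ETL task starts. A minimal sketch of the DDL, with placeholder names and data types, is shown below:

    -- Placeholder DDL for the destination table described above.
    -- table_c, field_2, and field_3 stand in for your actual names and types.
    CREATE TABLE table_c (
      field_2 BIGINT NOT NULL,
      field_3 VARCHAR(255),
      PRIMARY KEY (field_2)
    );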

Precautions

  • The source and destination databases must reside in the same region.

  • All stream tables must belong to the same instance.

  • The names of all databases and tables involved in the ETL task must be unique.

  • The transformation components have the following limits:

    • Table Join: You can perform join operations only between two tables.

    • Field Calculator and Table Record Filter: Only a single table schema is supported.

Flowchart

(Figure: flowchart for configuring an ETL task)

To create an ETL task, you must configure one or more source databases, one or more transformation components, and a destination database. The following steps describe how to create an ETL task.

Step 1: Configure the source databases

You can select one or more source databases in the Input/Dimension Table section and configure the parameters for each source database.

Note: You can select the same type of source database multiple times.

  1. Configure the parameters on the Node Settings tab.

  2. Configure the parameters on the Output Fields tab.

  3. If you set the Node Type parameter to Stream Table, you must also configure the time attributes.

References: Configure a source database

Step 2: Configure the transformation components

You can select one or more transformation components and configure the parameters for each transformation component. Sample snippets that illustrate the logic of each component appear after these steps.

Note: You can select the same type of transformation component multiple times.

  • Table Join

    1. Configure the parameters on the Node Settings tab.

    2. Configure the parameters on the Output Fields tab.

  • Field Calculator

    1. Configure the Conversion Name parameter.

    2. Configure the calculator.

  • Table Record Filter

    1. Configure the Conversion Name parameter.

    2. Specify a WHERE condition.

References: Configure transformation components

Step 3: Configure the destination database

You can select a destination database type for the Output node and configure the following parameters for the destination database:

  1. Configure the parameters on the Node Settings tab.

  2. Configure the parameters on the Field Mapping tab.

References: Configure the destination database
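
The following snippets illustrate the kind of logic that each transformation component expresses. All table, field, and value names are hypothetical, and you do not write these statements yourself; you configure the components on the canvas and DTS generates the corresponding Flink SQL:

    -- Table Join: an equi-join between exactly two tables (the component's limit).
    SELECT o.order_id, o.buyer_id, u.user_name
    FROM orders AS o
    JOIN users AS u ON o.buyer_id = u.user_id;

    -- Field Calculator: assign a value to an added field by using a function.
    SELECT order_id, amount, CONCAT(first_name, ' ', last_name) AS buyer_full_name
    FROM orders;

    -- Table Record Filter: keep only the records that match a WHERE condition.
    SELECT * FROM orders WHERE order_status = 'paid' AND amount > 100;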

Procedure

  1. Go to the Streaming ETL page.

    1. Log on to the Data Management (DMS) console.

    2. In the top navigation bar, move your pointer over DTS.

    3. Choose Data integration > Streaming ETL.

  2. In the upper-left corner of the Streaming ETL page, click Create Data Flow. In the Create Data Flow dialog box, specify an ETL task name in the Data Flow Name field and set the Development Method parameter to DAG.

  3. Click OK.

  4. Configure the parameters for the ETL task.

    1. Create an ETL task.

      Note
      • In this example, an Input/Dimension Table MySQL node, a Field Calculator node, and an Output MySQL node are configured.

      • You can select one or more source databases. You can select the same type of source database multiple times.

      • You can select one or more transformation components. You can select the same transformation component multiple times.

      • You can select only a single destination database.

      1. On the left side of the canvas, drag an Input/Dimension Table MySQL node to the blank area of the canvas.

      2. Click Input/Dimension Table MySQL-1 on the canvas. Configure the parameters on the Node Settings, Output Fields, and Time Attribute tabs. For more information about how to configure the required parameters, see Configure a source database.

      3. On the left side of the canvas, drag a Field Calculator node to the blank area of the canvas.

      4. Move the pointer over the Input/Dimension Table MySQL-1 node, and click the dot to draw a line between the Input/Dimension Table MySQL-1 node and the Field Calculator-1 node.

      5. Click the Field Calculator-1 node on the canvas and configure the parameters on the Node Settings tab. For more information about how to configure the required parameters, see Configure transformation components.

      6. On the left side of the canvas, drag an Output MySQL node to the blank area of the canvas.

      7. Move the pointer over the Field Calculator-1 node, and click the dot to draw a line between the Field Calculator-1 node and the Output MySQL-1 node.

      8. Click the Output MySQL-1 node and configure the parameters on the Node Settings and Field Mapping tabs. For more information about how to configure the required parameters, see Configure the destination database.

      9. After you complete the preceding configurations, click Generate Flink SQL Validation or Publish based on your business requirements.

        • Generate Flink SQL Validation: If the Flink SQL validation succeeds, the Flink SQL Validation Succeeded message is displayed. If the validation fails, you can click View ETL Validation Details next to Generate Flink SQL Validation. After you troubleshoot the issues based on the error message, you can perform a Flink SQL validation again.

        • Publish: The system runs a Flink SQL validation first. The ETL task can be prechecked only after the Flink SQL validation is successful. You can click View ETL Validation Details to view the details of the validation.

    2. After the Flink SQL validation is complete, run a precheck. DTS can start the ETL task only after the task passes the precheck. If the task fails to pass the precheck, click View Details next to each failed item. After you troubleshoot the issues based on the error message, you can run a precheck again.

    3. After the precheck is passed, click Next: Purchase Instance in the lower part of the page.

    4. On the Purchase Instance page, configure the Instance Class and Compute Units (CUs) parameters. Then, read and select Data Transmission Service (Pay-as-you-go) Service Terms and Service Terms for Public Preview.

    5. Click Buy and Start to start the ETL task.

      You can return to the Streaming ETL page to view the state of the ETL task in the Status column.
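
For reference, the pipeline configured in this example (an Input/Dimension Table MySQL node, a Field Calculator node, and an Output MySQL node) is conceptually similar to the following Flink SQL. This is only a sketch that uses the open-source mysql-cdc and jdbc connectors with placeholder connection values and table names; the SQL that DTS generates and validates on your behalf may differ:

    -- Source table: incremental changes from the source MySQL database
    -- (Input/Dimension Table node). All values are placeholders.
    CREATE TABLE src_orders (
      order_id BIGINT,
      amount   DECIMAL(10, 2),
      PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
      'connector'     = 'mysql-cdc',
      'hostname'      = '<source-host>',
      'port'          = '3306',
      'username'      = '<user>',
      'password'      = '<password>',
      'database-name' = 'demo',
      'table-name'    = 'orders'
    );

    -- Destination table (Output node). The table schema must already exist
    -- in the destination database.
    CREATE TABLE dst_orders (
      order_id   BIGINT,
      amount     DECIMAL(10, 2),
      amount_usd DECIMAL(10, 2),
      PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
      'connector'  = 'jdbc',
      'url'        = 'jdbc:mysql://<destination-host>:3306/demo',
      'table-name' = 'dst_orders',
      'username'   = '<user>',
      'password'   = '<password>'
    );

    -- Field Calculator: assign a value to the added field while the data
    -- is written to the destination in real time.
    INSERT INTO dst_orders
    SELECT order_id, amount, amount * 0.14 AS amount_usd
    FROM src_orders;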