Data Transmission Service (DTS) provides the extract, transform, and load (ETL) feature. You can use this feature to clean and transform streaming data, and to obtain the data you need in an accurate and efficient manner. This topic describes how to configure an ETL task.
- Before you configure an ETL task, take note of the following information:
- Input/Dimension Table indicates the source database of the ETL task.
- Output indicates the destination database of the ETL task.
- DTS provides the streaming ETL feature for the data synchronization process. You can add transformation components between the source and destination databases to transform data and write the processed data to the destination database in real time. For example, you can join two stream tables into a wide table and write the data of the wide table to the destination database. You can also add a field to the source table, configure a function to assign values to the field, and then write the field to the destination database.
- An ETL task is created in the China (Hangzhou) or China (Beijing) region.
- The source database is a self-managed MySQL database, an ApsaraDB RDS for MySQL instance, or a PolarDB for MySQL cluster. The destination database is a self-managed MySQL database, an ApsaraDB RDS for MySQL instance, or an AnalyticDB for MySQL cluster.
- The schemas of tables in the destination database are created. This is required because the ETL feature does not support schema migration. For example, assume that Table A contains Field 1, Field 2, and Field 3, and Table B contains Field 2, Field 3, and Field 4. If you want to join Table A and Table B into a table that contains Field 2 and Field 3, you must create Table C, which contains Field 2 and Field 3, in the destination database in advance.
- The ETL feature does not support full data synchronization. Therefore, you can transform only incremental data in real time.
- The source and destination databases cannot use the same connection template. You must create different connection templates for the source and destination databases. For more information, see Create a connection template.
- The source and destination databases must reside in the same region.
- All stream tables must belong to the same instance.
- All database names and table names must be unique.
- The transformation components have the following limits:
- JOIN: You can perform JOIN operations only between two tables.
- Field Calculator and Table Record Filter: Only a single table schema is supported.
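The schema prerequisite and the two-table JOIN limit above can be illustrated with a sketch. Assuming hypothetical stream tables `table_a` (Field 1, Field 2, and Field 3) and `table_b` (Field 2, Field 3, and Field 4) from the earlier example, the pre-created destination table and the join might look like the following Flink-SQL-style statements. All table names, field names, and types here are illustrative assumptions, not output generated by DTS:

```sql
-- The destination table must be created in advance because the
-- ETL feature does not migrate schemas (hypothetical schema).
CREATE TABLE table_c (
  field_2 VARCHAR(64),
  field_3 VARCHAR(64)
);

-- JOIN operations are limited to two tables: join table_a and
-- table_b on a shared key and keep only field_2 and field_3.
INSERT INTO table_c
SELECT a.field_2, a.field_3
FROM table_a AS a
JOIN table_b AS b
  ON a.field_2 = b.field_2;
```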
|Procedure|Description|References|
|---------|-----------|----------|
|Configure the source databases|You can select one or more source databases in the Input/Dimension Table section and set parameters for each source database. Note: You can select the same type of source database multiple times.|Configure a source database|
|Configure the transformation components|You can select one or more transformation components in the Transform section and set parameters for each transformation component. Note: You can select the same transformation component multiple times.|Configure the transformation components|
|Configure the destination database|You can select a destination database type in the Output section and set parameters for the destination database.|Configure the destination database|
- Go to the Data Migration page of the new DTS console.
- In the upper-left corner of the page, select the region in which you want to create the ETL task.
Note You can create an ETL task only in the China (Hangzhou) or China (Beijing) region.
- In the left-side navigation pane, click ETL.
- On the ETL page, click Create Task (Pay-as-you-go).
- On the Create Task page, set parameters for the ETL task.
Note:
- In this example, MySQL is selected in the Input/Dimension Table section, Field Calculator is selected in the Transform section, and MySQL is selected in the Output section.
- You can select one or more source databases. You can select the same type of source database multiple times.
- You can select one or more transformation components. You can select the same transformation component multiple times.
- You can select only one destination database.
- In the Input/Dimension Table section on the left side of the page, select MySQL and drag it to the canvas on the right side of the page.
- Click MySQL on the canvas. In the Input/Dimension Table: MySQL section, configure the node information, field information, and time attributes. For more information, see Configure a source database.
- In the Transform section on the left side of the page, select Field Calculator and drag it to the canvas on the right side of the page.
- Move the pointer over the source MySQL database, and click the dot to draw a line between the source MySQL database and the Field Calculator component.
- In the Transform: Field Calculator section, configure the node information and the field information. For more information, see Configure the transformation components.
- In the Output section on the left side of the page, select MySQL and drag it to the canvas on the right side of the page.
- Move the pointer over Field Calculator, and click the dot to draw a line between the Field Calculator component and the destination MySQL database.
- In the Output: MySQL section, configure the node information and field mappings. For more information, see Configure the destination database.
- After you complete the preceding configurations, click Save and Return, Generate Flink SQL Validator, or Next: Save Task Settings and Precheck.
- Save and Return: You can save the settings of the ETL task as a template to improve configuration efficiency.
- Generate Flink SQL Validator: If Flink SQL validation succeeds, the Flink SQL Validation Succeeded message appears. If the validation fails, click View ETL Validation Details next to Generate Flink SQL Validator, troubleshoot the issue based on the error message, and then run Flink SQL validation again.
- Next: Save Task Settings and Precheck: The system runs Flink SQL validation. The ETL task can be prechecked only after Flink SQL validation is successful. You can click View ETL Validation Details to view details.
- After Flink SQL validation is completed, run a precheck. DTS can start the ETL task only after the task passes the precheck. If the task fails to pass the precheck, click View Details next to each failed item. Troubleshoot the issues based on the causes and run a precheck again.
- After the precheck is completed, click Next: Purchase Instance in the lower part of the page.
- On the Purchase Instance page, specify the Instance Class and Compute Units (CUs) parameters. Then, read and select Data Transmission Service (Pay-as-you-go) Service Terms and Service Terms for Public Preview.
- Click Buy and Start to start the ETL task.
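As a rough mental model of what the Flink SQL validation step checks, the pipeline configured in this example (MySQL source, Field Calculator, MySQL destination) corresponds to a single INSERT ... SELECT statement in which the calculated field appears in the projection. The statement below is a sketch with hypothetical table and field names, not the SQL that DTS actually generates:

```sql
-- Field Calculator example (hypothetical names): total_price is
-- derived from unit_price * quantity and written to the
-- destination table along with the original fields.
INSERT INTO orders_out
SELECT id,
       unit_price,
       quantity,
       unit_price * quantity AS total_price
FROM orders_in;
```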