Flink SQL is a SQL-based language provided by Alibaba Cloud to simplify the computing model of extract, transform, load (ETL) tasks and to lower the skill requirements for users. Flink SQL is compatible with standard SQL syntax. Compared with the directed acyclic graph (DAG) mode, Flink SQL provides more advanced capabilities: in the Flink SQL script editor, you can enter statements that are not supported in the DAG mode. This topic describes how to configure an ETL task in Flink SQL mode.
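For example, you can enter a statement like the following in the script editor. This is a minimal sketch; the table and column names (orders, amount) are hypothetical:

```sql
-- Minimal Flink SQL sketch: a conditional expression entered in the script editor.
-- Table and column names are hypothetical examples.
SELECT
  order_id,
  CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS order_size
FROM orders;
```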
Background information
Note: The ETL feature is in public preview. You can apply for a free trial of this feature. If you have questions during the free trial, join the DingTalk group 32326646 for technical support.
- Before you configure an ETL task, take note of the following information:
  - Input/Dimension Table indicates the source database of the ETL task.
  - Output indicates the destination database of the ETL task.
- DTS provides the streaming ETL feature for the data synchronization process. You can add transformation components between the source and destination databases to transform data and write the processed data to the destination database in real time. For example, you can join two stream tables into one wide table and write the data of the wide table to the destination database. You can also add a field to the source table, configure a function to assign values to the field, and then write the field to the destination database, as shown in the sketch after this list.
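The following Flink SQL sketch illustrates both transformations described above: joining two stream tables and computing a new field with a function before writing to the destination. All table and column names (orders, customers, dest_orders, and so on) are hypothetical:

```sql
-- Hypothetical example: join two stream tables and add a computed field.
INSERT INTO dest_orders          -- destination table (schema created in advance)
SELECT
  o.order_id,
  o.amount,
  c.customer_name,
  -- A new field whose value is assigned by a function:
  CONCAT(c.customer_name, '-', CAST(o.order_id AS STRING)) AS order_label
FROM orders AS o
JOIN customers AS c
  ON o.customer_id = c.customer_id;
```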
Prerequisites
- An ETL task is created in the China (Hangzhou), China (Shanghai), China (Qingdao), China (Beijing), China (Zhangjiakou), China (Shenzhen), or China (Guangzhou) region.
- The source database belongs to one of the following types: self-managed MySQL databases, ApsaraDB RDS for MySQL, PolarDB for MySQL, PolarDB-X V1.0 (formerly DRDS), self-managed Oracle databases, self-managed PostgreSQL databases, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, and PolarDB for PostgreSQL.
- The destination database belongs to one of the following types: self-managed MySQL databases, ApsaraDB RDS for MySQL, PolarDB for MySQL, AnalyticDB for MySQL V3.0, self-managed Oracle databases, self-managed PostgreSQL databases, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, and PolarDB for PostgreSQL.
- The schemas of the tables in the destination database are created, because the ETL feature does not support schema migration. For example, if Table A contains Field 1, Field 2, and Field 3, Table B contains Field 2, Field 3, and Field 4, and you want to join Table A and Table B into a table that contains Field 2 and Field 3, you must create Table C that contains Field 2 and Field 3 in the destination database in advance. See the sketch after this list.
- The ETL feature does not support full data synchronization. Therefore, you can transform only incremental data in real time.
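For the Table A and Table B example above, the destination table must exist before the task starts. The following is a minimal sketch of the DDL you might run in the destination database; the table name, column names, and data types are assumptions for illustration:

```sql
-- Hypothetical DDL: create Table C in the destination database in advance,
-- because the ETL feature does not migrate schemas.
CREATE TABLE table_c (
  field_2 VARCHAR(64),   -- data types are assumed for illustration
  field_3 VARCHAR(64)
);
```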
Precautions
- When you configure an ETL task in the DTS console, you must create and use different connection templates for the source and destination databases. For more information, see Create a connection template.
- The source and destination databases must reside in the same region.
- All stream tables must belong to the same instance.
- All database names and table names must be unique.