Data Transmission Service (DTS) provides an extract, transform, and load (ETL) feature that processes streaming data in real time. You can configure ETL tasks by dragging and dropping components or by writing Flink SQL statements. The ETL feature is integrated with the data replication capabilities of DTS, so a single task can extract streaming data, transform and process it, and load it into a destination. This improves efficiency, lowers the development threshold, and reduces the impact on business systems. The ETL feature also broadens the scenarios for real-time data processing and computing, and supports digital transformation.
- You can configure ETL tasks in directed acyclic graph (DAG) mode or Flink SQL mode.
  - DAG mode:
    - Visualized operations: The ETL feature provides three components: Input/Dimension Table, Transform, and Output. You can drag and drop components to build stream processing tasks.
    - Diverse development components:
      - In the Input/Dimension Table component, you can specify the following types of source databases: self-managed MySQL databases, ApsaraDB RDS for MySQL, PolarDB for MySQL, PolarDB-X V1.0 (formerly DRDS), self-managed Oracle databases, self-managed PostgreSQL databases, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, and PolarDB for PostgreSQL.
      - In the Transform component, you can join tables, apply functions, and filter fields. More than 90 function-based computing scenarios are supported.
      - In the Output component, you can specify the following types of destination databases: self-managed MySQL databases, ApsaraDB RDS for MySQL, PolarDB for MySQL, AnalyticDB for MySQL V3.0, self-managed Oracle databases, self-managed PostgreSQL databases, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, and PolarDB for PostgreSQL.
  - Flink SQL mode: You can execute Flink SQL statements to configure ETL tasks. Flink SQL is compatible with standard SQL syntax.
- Industry-leading computing effectiveness: The ETL feature is integrated with the data collection capabilities of DTS. The ETL feature ensures data accuracy and has industry-leading computing effectiveness.
- Flexible task monitoring and management: You can monitor and manage ETL tasks in the DTS console. For example, you can start a task, stop a task, and view task details.
- Centralized management of multi-region or heterogeneous data in real time: To facilitate centralized and efficient management and decision-making, you can store heterogeneous data or data from different regions in the same database in real time.
- Real-time data integration: The data processing capabilities of ETL greatly improve the efficiency of data integration. The low-code development mode reduces the difficulty and cost of data integration.
- Real-time data warehousing: The ETL feature provides industry-leading streaming data processing capabilities to help you quickly build real-time data warehouses.
- Acceleration of offline data warehouses: In streaming data processing, pre-processed data is shipped to data warehouses for in-depth mining. The data warehouses can provide services without affecting your business systems.
- Real-time reporting: To improve the efficiency of reporting and facilitate digital transformation, you can build a real-time reporting system. The system is suitable for various real-time analysis scenarios.
- Real-time computing: You can clean the streaming data generated on the business side in real time to extract feature values and tags. Typical scenarios include online business computing models (such as profiling, risk control, and recommendations) and real-time big screens.
- ETL configuration
- Best practices
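
The three DAG-mode components described above (Input/Dimension Table, Transform, and Output) can be pictured as stages chained over a stream of records. The following is a conceptual Python sketch only. It does not use DTS or Flink, and every function name, field name, and sample record in it is hypothetical. It assumes the dimension table fits in memory as a dictionary and that the join is an inner join, so records without a matching dimension row are dropped.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def input_stream(rows: Iterable[Row]) -> Iterator[Row]:
    """Input stage: yield source records one at a time, like a change stream."""
    for row in rows:
        yield dict(row)  # copy so later stages never mutate the source data


def join_dimension(stream: Iterable[Row], dim_table: Dict[object, Row], key: str) -> Iterator[Row]:
    """Transform stage: inner-join each record with a dimension row on `key`."""
    for row in stream:
        dim = dim_table.get(row[key])
        if dim is not None:  # records without a match are dropped
            yield {**row, **dim}


def filter_rows(stream: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Transform stage: keep only the records that satisfy the predicate."""
    for row in stream:
        if predicate(row):
            yield row


def output_sink(stream: Iterable[Row], sink: List[Row]) -> int:
    """Output stage: append each record to the destination and count them."""
    count = 0
    for row in stream:
        sink.append(row)
        count += 1
    return count


# Hypothetical sample data: a stream of orders and a customer dimension table.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 99, "amount": 900.0},  # no matching customer
]
customers = {10: {"region": "eu"}, 11: {"region": "us"}}

# Wire the stages together: input -> join -> filter -> output.
destination: List[Row] = []
pipeline = filter_rows(
    join_dimension(input_stream(orders), customers, "customer_id"),
    lambda r: r["amount"] >= 100,
)
loaded = output_sink(pipeline, destination)
```

Because each stage is a generator, a record flows through the join and the filter before the next record is read. This loosely mirrors how a streaming task processes records continuously instead of in batches.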