The extract, transform, and load (ETL) feature in Data Transmission Service (DTS) lets you build real-time stream processing pipelines without writing custom pipeline infrastructure. It integrates with DTS data replication to handle extraction from source databases, transformation logic, and loading into destination systems — all in a single task. ETL improves efficiency, lowers the barrier to entry for pipeline development, and reduces the impact on business systems.
ETL tasks support two authoring modes:
DAG mode — drag-and-drop canvas with three built-in components: Input/Dimension Table, Transform, and Output.
Flink SQL mode — write Flink SQL statements directly; compatible with standard SQL syntax.
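A minimal sketch of what a Flink SQL mode task might look like. All table and column names here are hypothetical; in an actual ETL task, the source and destination tables are configured in the DTS console rather than created with DDL:

```sql
-- Hypothetical example: read from a source stream, transform, and load.
-- `orders_src` and `orders_dst` stand in for tables configured in the task.
INSERT INTO orders_dst
SELECT
  order_id,
  UPPER(customer_name) AS customer_name,  -- built-in function compute
  amount
FROM orders_src
WHERE amount > 0;                         -- filter rows before loading
```

Because this mode is compatible with standard SQL syntax, teams can reuse existing SQL skills for custom transformation logic.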
When to use each mode
| | DAG mode | Flink SQL mode |
|---|---|---|
| Best for | Teams who prefer visual pipelines or want to avoid writing SQL | Teams comfortable with SQL who need custom logic |
| Authoring | Drag-and-drop components | Write Flink SQL statements |
| Transform flexibility | 90+ built-in function compute scenarios | Full Flink SQL expressiveness |
Supported databases
| Component | Supported databases |
|---|---|
| Input/Dimension Table (source) | Self-managed MySQL, ApsaraDB RDS for MySQL, PolarDB for MySQL, PolarDB-X 1.0 (formerly DRDS), self-managed Oracle, self-managed PostgreSQL, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, PolarDB for PostgreSQL |
| Output (destination) | Self-managed MySQL, ApsaraDB RDS for MySQL, PolarDB for MySQL, AnalyticDB for MySQL V3.0, self-managed Oracle, self-managed PostgreSQL, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, PolarDB for PostgreSQL |
Transform capabilities
The Transform component supports three classes of operations:
| Operation | Description |
|---|---|
| Join | Combine rows from multiple source tables or dimension tables |
| Function compute | Apply transformations across 90+ built-in compute scenarios |
| Filter | Select or drop fields before loading to the destination |
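The three operation classes above can be combined in a single transformation. A hedged Flink SQL sketch (all table and column names are hypothetical, and the dimension-table join uses Flink's lookup-join syntax as an illustration):

```sql
-- Hypothetical example: join an order stream with a customer
-- dimension table, apply a built-in function, and filter fields.
INSERT INTO enriched_orders
SELECT
  o.order_id,                              -- field selection (filter)
  c.region,                                -- from the dimension table (join)
  CONCAT(c.name, '-', o.order_id) AS tag   -- function compute
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.id
WHERE o.amount > 100;                      -- row filter
```

In DAG mode, the same pipeline would be expressed by chaining Input/Dimension Table, Transform, and Output components on the canvas.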
Key benefits
Industry-leading computing effectiveness — ETL builds on the data collection capabilities of DTS, ensuring data accuracy and high computing performance.
Flexible task monitoring and management — Monitor and manage ETL tasks directly in the DTS console. You can start and stop tasks and view their details.
Use cases
Consolidate data from heterogeneous sources — pull data from databases of different types or in different regions into a single destination in real time, enabling centralized reporting and operations.
Low-code data integration — build data integration pipelines through a visual interface rather than custom code, reducing both development time and operational complexity.
Real-time data warehousing — stream pre-processed data directly into data warehouses as events occur, instead of waiting for scheduled batch loads.
Accelerate offline data warehouses — pre-process streaming data before it reaches a data warehouse. The warehouse can serve queries without being affected by raw data ingestion.
Real-time reporting — power dashboards and reporting systems from a continuously updated data stream, suitable for scenarios like operations monitoring and business analytics.
Real-time computing — clean and enrich streaming data to extract feature values and tags. Typical scenarios include user profiling, risk control, and recommendation systems.
Billing
The ETL feature is currently in public preview. Each account can create up to two ETL instances for free during the preview period.
After the public preview ends, running instances will be billed. Alibaba Cloud will notify users in advance through announcements and SMS messages.