The extract, transform, and load (ETL) feature in Data Transmission Service (DTS) lets you build real-time stream processing pipelines without writing custom pipeline infrastructure. It integrates with DTS data replication to handle extraction from source databases, transformation logic, and loading into destination systems — all in a single task. ETL improves efficiency, lowers the barrier to entry for pipeline development, and reduces the impact on business systems.
ETL tasks support two authoring modes:
DAG mode — drag-and-drop canvas with three built-in components: Input/Dimension Table, Transform, and Output.
Flink SQL mode — write Flink SQL statements directly; compatible with standard SQL syntax.
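A minimal sketch of what a Flink SQL mode task might look like. All table and column names here are hypothetical; in an actual ETL task, the source and destination tables are configured in the DTS console rather than created with DDL:

```sql
-- Hypothetical example: read from a source stream, transform, and load.
-- `orders_src` and `orders_dst` stand in for tables configured in the task.
INSERT INTO orders_dst
SELECT
  order_id,
  UPPER(customer_name) AS customer_name,  -- built-in function compute
  amount
FROM orders_src
WHERE amount > 0;                         -- filter rows before loading
```

Because this mode is compatible with standard SQL syntax, teams can reuse existing SQL skills for custom transformation logic.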
When to use each mode
| | DAG mode | Flink SQL mode |
|---|---|---|
| Best for | Teams who prefer visual pipelines or want to avoid writing SQL | Teams comfortable with SQL who need custom logic |
| Authoring | Drag-and-drop components | Write Flink SQL statements |
| Transform flexibility | 90+ built-in function compute scenarios | Full Flink SQL expressiveness |
Supported databases
| Component | Supported databases |
|---|---|
| Input/Dimension Table (source) | Self-managed MySQL, ApsaraDB RDS for MySQL, PolarDB for MySQL, PolarDB-X 1.0 (formerly DRDS), self-managed Oracle, self-managed PostgreSQL, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, PolarDB for PostgreSQL |
| Output (destination) | Self-managed MySQL, ApsaraDB RDS for MySQL, PolarDB for MySQL, AnalyticDB for MySQL V3.0, self-managed Oracle, self-managed PostgreSQL, ApsaraDB RDS for PostgreSQL, Db2 for LUW, Db2 for i, PolarDB for PostgreSQL |
Transform capabilities
The Transform component supports three classes of operations:
| Operation | Description |
|---|---|
| Join | Combine rows from multiple source tables or dimension tables |
| Function compute | Apply transformations across 90+ built-in compute scenarios |
| Filter | Select or drop fields before loading to the destination |
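The three operation classes above can be combined in a single transformation. A hedged Flink SQL sketch (all table and column names are hypothetical, and the dimension-table join uses Flink's lookup-join syntax as an illustration):

```sql
-- Hypothetical example: join an order stream with a customer
-- dimension table, apply a built-in function, and filter fields.
INSERT INTO enriched_orders
SELECT
  o.order_id,                              -- field selection (filter)
  c.region,                                -- from the dimension table (join)
  CONCAT(c.name, '-', o.order_id) AS tag   -- function compute
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.id
WHERE o.amount > 100;                      -- row filter
```

In DAG mode, the same pipeline would be expressed by chaining Input/Dimension Table, Transform, and Output components on the canvas.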
Key benefits
Industry-leading computing effectiveness — ETL builds on the data collection capabilities of DTS, ensuring data accuracy and high computing performance.
Flexible task monitoring and management — Monitor and manage ETL tasks directly in the DTS console. You can start and stop tasks and view their details.
Use cases
Consolidate data from heterogeneous sources — pull data from databases of different types or in different regions into a single destination in real time, enabling centralized reporting and operations.
Low-code data integration — build data integration pipelines through a visual interface rather than custom code, reducing both development time and operational complexity.
Real-time data warehousing — stream pre-processed data directly into data warehouses as events occur, instead of waiting for scheduled batch loads.
Accelerate offline data warehouses — pre-process streaming data before it reaches a data warehouse. The warehouse can serve queries without being affected by raw data ingestion.
Real-time reporting — power dashboards and reporting systems from a continuously updated data stream, suitable for scenarios like operations monitoring and business analytics.
Real-time computing — clean and enrich streaming data to extract feature values and tags. Typical scenarios include user profiling, risk control, and recommendation systems.
Billing
The ETL feature is currently in public preview. Each account can create up to two ETL instances for free during the preview period.
After the public preview ends, running instances will be billed. Alibaba Cloud will notify users in advance through announcements and SMS messages.