This topic provides a guide to developing jobs in Realtime Compute for Apache Flink.
Understand upstream and downstream systems
Upstream (Source): The source system from which data is read.
Examples include Kafka, MySQL CDC, Hologres, and Simple Log Service (SLS).
Downstream (Sink): The destination system to which the processed results are written.
Examples include databases (MySQL, PostgreSQL), data warehouses (ClickHouse, Doris, StarRocks), message queues, and data lakes (Paimon, OSS).
Realtime Compute for Apache Flink supports over 30 upstream and downstream connectors, including databases, message queues, and data lakes. This enables fast data pipeline development. For more information, see Supported connectors.
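As an illustration, the following is a minimal Flink SQL sketch of a pipeline that reads from an upstream Kafka topic and writes to a downstream MySQL table over JDBC. The table names, topic, addresses, and credentials are placeholders, not values from this topic; they use standard open-source Flink connector options.

```sql
-- Hypothetical upstream: read JSON events from a Kafka topic.
CREATE TEMPORARY TABLE orders_source (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  ts       TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',                               -- placeholder topic
  'properties.bootstrap.servers' = 'broker:9092',   -- placeholder address
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Hypothetical downstream: write results to a MySQL table over JDBC.
CREATE TEMPORARY TABLE orders_sink (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://db-host:3306/demo',         -- placeholder URL
  'table-name' = 'orders_copy',
  'username' = 'flink_user',                        -- placeholder credentials
  'password' = '********'
);

-- The pipeline: read from the upstream source, write to the downstream sink.
INSERT INTO orders_sink
SELECT order_id, amount FROM orders_source;
```

Swapping the upstream or downstream system usually means changing only the WITH options of the corresponding table, while the INSERT INTO logic stays the same.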
Define job types according to your use cases
| Job type | Use cases |
| --- | --- |
| Flink SQL | Real-time extract, transform, and load (ETL), real-time metric computation, multi-stream joins, and streaming warehousing and lakehousing. |
| Data ingestion with Flink CDC | Real-time database synchronization, data migration, and automatic table synchronization. |
| DataStream API | Complex event processing (CEP), high-frequency external calls, complex window logic, and custom sources or sinks. |
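To make the database synchronization use case concrete, the following is a hedged sketch that uses the open-source mysql-cdc source connector in Flink SQL to replicate a table in real time. All connection settings are placeholders, and the JDBC sink stands in for whichever warehouse you target. Note that YAML-based data ingestion jobs in Flink CDC use their own pipeline syntax, which this sketch does not cover.

```sql
-- Hypothetical CDC source: stream row changes from a MySQL table.
CREATE TEMPORARY TABLE products_cdc (
  id    INT,
  name  STRING,
  price DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'db-host',        -- placeholder
  'port' = '3306',
  'username' = 'flink_user',     -- placeholder
  'password' = '********',
  'database-name' = 'demo',
  'table-name' = 'products'
);

-- Hypothetical sink; a Hologres, StarRocks, or Paimon sink would be
-- configured similarly with its own connector options.
CREATE TEMPORARY TABLE products_sink (
  id    INT,
  name  STRING,
  price DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://warehouse-host:3306/demo',  -- placeholder
  'table-name' = 'products_sync'
);

-- Continuously replicate inserts, updates, and deletes; the primary key
-- lets the sink apply changes as upserts.
INSERT INTO products_sink SELECT * FROM products_cdc;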
Job development
- Flink SQL: ETL, data aggregations, and lookup joins (a lookup join sketch follows this list).
- Data ingestion with Flink CDC: real-time database synchronization and batch table ingestion.
- DataStream API: CEP, custom states, and complex job logic.
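The following is a minimal sketch of a lookup join, one of the Flink SQL scenarios above. The orders stream and customers dimension table are hypothetical; the query assumes orders declares a processing-time attribute (for example, proc_time AS PROCTIME()) and that customers is backed by a lookup-capable connector such as JDBC.

```sql
-- Hypothetical enrichment: join each streaming order against the current
-- row of a dimension table at processing time (a lookup join).
SELECT
  o.order_id,
  o.amount,
  c.customer_name
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.customer_id;
```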
Other topics in this guide cover typical scenarios, query and test, advanced usage, ecosystem integration, O&M and optimization, and troubleshooting.