Realtime Compute for Apache Flink: Flink CDC job structure

Last Updated: Jul 25, 2025

This topic introduces the basic structure and important parameters of data ingestion jobs based on Flink CDC.

Job structure example

A typical Flink CDC data ingestion job consists of the following modules:

- Source: defines the data source that change data is read from.
- Sink: defines the destination that the data is written to.
- Transform (optional): projects, computes, and filters data from the source tables.
- Route (optional): maps source tables to destination tables, for example to merge sharded tables.
- Pipeline: configures global parameters of the job.

For example, the structure of a job that writes from MySQL to Paimon is as follows:

# MySQL Source module
source:
  type: mysql
  name: MySQL Source
  hostname: localhost
  port: 3306
  username: admin
  password: <yourPassword>
  tables: adb.*, bdb.user_table_[0-9]+, [app|web]_order_.*, mydb.\\.*

# Paimon Sink module
sink:
  type: paimon
  name: Paimon Sink
  catalog.properties.metastore: filesystem
  catalog.properties.warehouse: /path/warehouse

# Transform module
transform:
  - source-table: mydb.app_order_.*
    projection: id, order_id, TO_UPPER(product_name)
    filter: id > 10 AND order_id > 100
    primary-keys: id
    partition-keys: product_name
    table-options: comment=app order
    description: project fields from source table
    converter-after-transform: SOFT_DELETE
  - source-table: mydb.web_order_.*
    projection: CONCAT(id, order_id) as uniq_id, *
    filter: uniq_id > 10
    description: add new uniq_id for each row

# Route module
route:
  - source-table: mydb.default.app_order_.*
    sink-table: odsdb.default.app_order
    description: sync all table shards to one
  - source-table: mydb.default.web_order
    sink-table: odsdb.default.ods_web_order
    description: sync table to a destination with the prefix ods_

# Pipeline module
pipeline:
  name: source-database-sync-pipe
  schema.change.behavior: evolve

Source module

The Source module defines the data source for Flink CDC data ingestion jobs. Currently supported connectors include ApsaraMQ for Kafka, MySQL, MongoDB, and Simple Log Service (SLS).

Syntax structure

source:
  type: mysql
  name: mysql source
  xxx: ...

For specific configurations supported by each connector, see the documentation for the corresponding connector.
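
For example, a MySQL source can be configured much like the job structure example above. The following is an illustrative sketch: the server-id and server-time-zone options follow the open-source Flink CDC MySQL pipeline connector and should be verified against the MySQL connector documentation for your version.

# A minimal MySQL source sketch. hostname, port, username, password, and
# tables match the job structure example above; server-id and
# server-time-zone follow the open-source Flink CDC MySQL pipeline
# connector and should be checked in the MySQL connector documentation.
source:
  type: mysql
  name: MySQL Source
  hostname: localhost
  port: 3306
  username: admin
  password: <yourPassword>
  tables: bdb.user_table_[0-9]+
  server-id: 5400-5404          # unique server ID range for the binlog reader
  server-time-zone: UTC         # session time zone used to convert TIMESTAMP values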

Sink module

The Sink module defines the destination for Flink CDC data ingestion jobs. Currently supported connectors include ApsaraMQ for Kafka, Upsert Kafka, Hologres, Paimon, StarRocks, MaxCompute, and Print.

Syntax structure

sink:
  type: hologres
  name: hologres sink
  xxx: ...

For specific configurations supported by each connector, see the documentation for the corresponding connector.
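
For example, during development you can temporarily point a job at the Print sink, which writes the ingested records to standard output, without changing the other modules. The sketch below assumes that type and name are sufficient; see the Print connector documentation for any additional options.

# A minimal sketch that swaps in the Print sink for debugging. Only type
# and name are set here; any further options are connector specific and
# should be taken from the Print connector documentation.
sink:
  type: print
  name: Print Sink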

Transform module

You can define multiple rules in the Transform module of a Flink CDC data ingestion job to implement projection, calculation, and filtering of data from source tables.

Syntax structure

transform:
  - source-table: db.tbl1
    projection: ...
    filter: ...
  - source-table: db.tbl2
    projection: ...
    filter: ...

For specific expression syntax, see the Flink CDC Transform module documentation.
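
For example, a rule can keep a fixed set of columns and filter rows at the same time. The sketch below uses a hypothetical table mydb.orders with columns id, order_id, and status, and relies only on the projection and filter keys from the syntax structure above.

# A minimal transform sketch for a hypothetical table mydb.orders:
# forward only the id, order_id, and status columns, and drop rows
# whose status is not 'PAID'.
transform:
  - source-table: mydb.orders
    projection: id, order_id, status
    filter: status = 'PAID'
    description: keep only paid orders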

Route module

You can define multiple rules in the Route module of a Flink CDC data ingestion job to map source tables to destination tables, for example to merge sharded upstream tables into one destination table or to broadcast a source table to multiple destination tables.

Syntax structure

route:
  - source-table: db.tbl1
    sink-table: sinkdb.tbl1
  - source-table: db.tbl2
    sink-table: sinkdb.tbl2

For specific expression syntax, see the Flink CDC Route module documentation.
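
Besides one-to-one mappings, a single rule can route a whole group of tables by using a placeholder in the destination name. The sketch below uses the replace-symbol option as documented for open-source Flink CDC routes and the hypothetical databases source_db and sink_db; confirm the option in the Route module documentation for your version.

# A sketch of pattern-based routing: every table in source_db is written
# to a table with the same name in sink_db. The <> placeholder declared
# by replace-symbol is substituted with the matched source table name.
route:
  - source-table: source_db.\.*
    sink-table: sink_db.<>
    replace-symbol: <>
    description: route all tables in source_db to same-named tables in sink_db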

Pipeline module

You can configure global parameters in the Pipeline module of a Flink CDC data ingestion job.

Syntax structure

pipeline:
  name: CDC YAML job
  schema.change.behavior: LENIENT

For configurable options, see the Flink CDC Pipeline module documentation.
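
For example, the schema.change.behavior option controls how upstream schema changes are applied to the destination. In open-source Flink CDC the accepted values are exception, evolve, try_evolve, lenient, and ignore; the sketch below assumes these values, so confirm them in the Pipeline module documentation for your version.

# A sketch of a Pipeline module configuration. schema.change.behavior is
# assumed to accept exception, evolve, try_evolve, lenient, or ignore,
# following open-source Flink CDC; verify against the Pipeline module
# documentation.
pipeline:
  name: mysql-to-paimon-sync-pipe
  schema.change.behavior: lenient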