This topic introduces the basic structure and important parameters of data ingestion jobs based on Flink CDC.
Job structure example
A typical Flink CDC data ingestion job consists of the following modules: Source, Sink, Transform, Route, and Pipeline.
For example, the structure of a job that writes data from MySQL to Paimon is as follows:
# MySQL Source module
source:
  type: mysql
  name: MySQL Source
  hostname: localhost
  port: 3306
  username: admin
  password: <yourPassword>
  tables: adb.*, bdb.user_table_[0-9]+, [app|web]_order_.*, mydb.\\.*

# Paimon Sink module
sink:
  type: paimon
  name: Paimon Sink
  catalog.properties.metastore: filesystem
  catalog.properties.warehouse: /path/warehouse

# Transform module
transform:
  - source-table: mydb.app_order_.*
    projection: id, order_id, TO_UPPER(product_name)
    filter: id > 10 AND order_id > 100
    primary-keys: id
    partition-keys: product_name
    table-options: comment=app order
    description: project fields from source table
    converter-after-transform: SOFT_DELETE
  - source-table: mydb.web_order_.*
    projection: CONCAT(id, order_id) as uniq_id, *
    filter: uniq_id > 10
    description: add new uniq_id for each row

# Route module
route:
  - source-table: mydb.default.app_order_.*
    sink-table: odsdb.default.app_order
    description: sync all table shards into one table
  - source-table: mydb.default.web_order
    sink-table: odsdb.default.ods_web_order
    description: sync the table to one with the ods_ prefix

# Pipeline module
pipeline:
  name: source-database-sync-pipe
  schema.change.behavior: evolve
Source module
The Source module defines the data source of a Flink CDC data ingestion job. Currently supported connectors include ApsaraMQ for Kafka, MySQL, MongoDB, and Simple Log Service (SLS).
Syntax structure
source:
  type: mysql
  name: mysql source
  xxx: ...
For the specific configuration options supported by each connector, see the documentation for the corresponding connector.
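For illustration, a minimal MySQL source might look like the following sketch. The host, credentials, and server-id range are placeholders, and the exact set of supported options depends on your connector version:
source:
  type: mysql
  name: MySQL Source
  hostname: mysql.example.com   # placeholder host
  port: 3306
  username: flink_user          # placeholder credentials
  password: <yourPassword>
  tables: app_db.\.*            # capture all tables in app_db
  server-id: 5400-5404          # unique server ID range for the CDC reader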
Sink module
The Sink module defines the destination of a Flink CDC data ingestion job. Currently supported connectors include ApsaraMQ for Kafka, Upsert Kafka, Hologres, Paimon, StarRocks, MaxCompute, and Print.
Syntax structure
sink:
  type: hologres
  name: hologres sink
  xxx: ...
For the specific configuration options supported by each connector, see the documentation for the corresponding connector.
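For illustration, a Hologres sink sketch might look like the following. The endpoint, database name, and credentials are placeholders, and the option names are assumptions to be checked against the Hologres connector documentation:
sink:
  type: hologres
  name: Hologres Sink
  endpoint: <yourEndpoint>   # placeholder Hologres endpoint
  dbname: <yourDatabase>     # placeholder target database (assumed option name)
  username: <yourUsername>   # placeholder credentials
  password: <yourPassword>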
Transform module
You can define multiple rules in the Transform module of a Flink CDC data ingestion job to implement projection, calculation, and filtering of data from source tables.
Syntax structure
transform:
  - source-table: db.tbl1
    projection: ...
    filter: ...
  - source-table: db.tbl2
    projection: ...
    filter: ...
For the specific expression syntax, see the Flink CDC Transform module documentation.
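As a sketch, the following rule keeps two columns of a hypothetical mydb.orders table, derives a computed column from them, and filters out invalid rows:
transform:
  - source-table: mydb.orders   # hypothetical source table
    projection: id, price, price * quantity AS total_amount   # computed column
    filter: price > 0           # keep only rows with a positive price
    description: derive total_amount and drop invalid rows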
Route module
You can define multiple rules in the Route module of a Flink CDC data ingestion job to merge upstream table shards into one destination table, broadcast a source table to multiple destinations, and perform similar source-to-destination mappings.
Syntax structure
route:
  - source-table: db.tbl1
    sink-table: sinkdb.tbl1
  - source-table: db.tbl2
    sink-table: sinkdb.tbl2
For the specific expression syntax, see the Flink CDC Route module documentation.
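For example, since every matching rule is applied, listing the same source table in several rules broadcasts its changes to multiple destination tables. A sketch with hypothetical table names:
route:
  - source-table: mydb.default.orders        # hypothetical source table
    sink-table: odsdb.default.orders_hot
  - source-table: mydb.default.orders        # same source table again
    sink-table: odsdb.default.orders_archive
    description: broadcast one source table to two destinations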
Pipeline module
You can configure global parameters in the Pipeline module of a Flink CDC data ingestion job.
Syntax structure
pipeline:
  name: CDC YAML job
  schema.change.behavior: LENIENT
For the configurable options, see the Flink CDC Pipeline module documentation.
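For illustration, a pipeline block commonly sets the job name and the schema evolution behavior; assuming your version also supports the parallelism option, it can set the job parallelism as well:
pipeline:
  name: mysql-to-paimon-sync       # display name of the job
  schema.change.behavior: EVOLVE   # apply upstream schema changes to the sink
  parallelism: 4                   # job parallelism (assumed to be supported)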