Principle Analysis of Apache Flink CDC Batch and Stream Integration

This article is reprinted from the Good Future Technology official account. Using Flink SQL as a case study, it introduces how to use Flink CDC 2.0 and explains the core design of CDC. The main contents are listed below:

  1. Cases
  2. Core Design
  3. Code Details

In August 2021, Flink CDC released version 2.0.0. Compared with version 1.0, it supports distributed reading and checkpointing in the full read phase, and it guarantees data consistency during full + incremental reading without locking tables.

The data reading logic of Flink CDC 2.0 is not complicated. What makes it appear complicated is the FLIP-27 (Refactor Source Interface) design and unfamiliarity with the Debezium API. This article focuses on the processing logic of Flink CDC and does not explain the FLIP-27 design or the Debezium API calls.

Based on CDC version 2.0.0, this article shows how to use Flink CDC 2.0 through Flink SQL cases. It then introduces the core design of CDC, including split division, split reading, and incremental reading. Finally, it walks through the code that calls and implements the flink-mysql-cdc interfaces involved in data processing.

1. Cases

Read full and incremental data from a MySQL table and write it to Kafka in the changelog-json format. Then observe the RowKind of each record and the number of affected records:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// the class name is illustrative
public class MySqlCdcToKafkaDemo {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // note: checkpointing must be enabled for incremental synchronization
        // (the interval is illustrative; the original value was not preserved)
        env.enableCheckpointing(3000);
        EnvironmentSettings envSettings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env, envSettings);

        tableEnvironment.executeSql("CREATE TABLE demoOrders (\n" +
                "  `order_id` INTEGER,\n" +
                "  `order_date` DATE,\n" +
                "  `order_time` TIMESTAMP(3),\n" +
                "  `quantity` INT,\n" +
                "  `product_id` INT,\n" +
                "  `purchaser` STRING,\n" +
                "  PRIMARY KEY (order_id) NOT ENFORCED\n" +
                ") WITH (\n" +
                "  'connector' = 'mysql-cdc',\n" +
                "  'hostname' = 'localhost',\n" +
                "  'port' = '3306',\n" +
                "  'username' = 'cdc',\n" +
                "  'password' = '123456',\n" +
                "  'database-name' = 'test',\n" +
                "  'table-name' = 'demo_orders',\n" +
                // full data + incremental data synchronization
                "  'scan.startup.mode' = 'initial'\n" +
                ")");

        tableEnvironment.executeSql("CREATE TABLE sink (\n" +
                "  `order_id` INTEGER,\n" +
                "  `order_date` DATE,\n" +
                "  `order_time` TIMESTAMP(3),\n" +
                "  `quantity` INT,\n" +
                "  `product_id` INT,\n" +
                "  `purchaser` STRING,\n" +
                "  PRIMARY KEY (order_id) NOT ENFORCED\n" +
                ") WITH (\n" +
                "  'connector' = 'kafka',\n" +
                "  'properties.bootstrap.servers' = 'localhost:9092',\n" +
                "  'topic' = 'mqTest02',\n" +
                "  'format' = 'changelog-json'\n" +
                ")");

        tableEnvironment.executeSql("INSERT INTO sink SELECT * FROM demoOrders");
    }
}

Full data output:

{"data":{"order_id":1010,"order_date":"2021-09-17","order_time":"2021-09-22 10:52:12.189","quantity":53,"product_id":502,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1009,"order_date":"2021-09-17","order_time":"2021-09-22 10:52:09.709","quantity":31,"product_id":500,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1008,"order_date":"2021-09-17","order_time":"2021-09-22 10:52:06.637","quantity":69,"product_id":503,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1007,"order_date":"2021-09-17","order_time":"2021-09-22 10:52:03.535","quantity":52,"product_id":502,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1002,"order_date":"2021-09-17","order_time":"2021-09-22 10:51:51.347","quantity":69,"product_id":503,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1001,"order_date":"2021-09-17","order_time":"2021-09-22 10:51:48.783","quantity":50,"product_id":502,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1000,"order_date":"2021-09-17","order_time":"2021-09-17 17:40:32.354","quantity":30,"product_id":500,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1006,"order_date":"2021-09-17","order_time":"2021-09-22 10:52:01.249","quantity":31,"product_id":500,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1005,"order_date":"2021-09-17","order_time":"2021-09-22 10:51:58.813","quantity":69,"product_id":503,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1004,"order_date":"2021-09-17","order_time":"2021-09-22 10:51:56.153","quantity":50,"product_id":502,"purchaser":"flink"},"op":"+I"}
{"data":{"order_id":1003,"order_date":"2021-09-17","order_time":"2021-09-22 10:51:53.727","quantity":30,"product_id":500,"purchaser":"flink"},"op":"+I"}

Modify table data and capture incremental data:

## Update the row with order_id 1005 (quantity changes from 69 to 80)
{"data":{"order_id":1005,"order_date":"2021-09-17","order_time":"2021-09-22 02:51:58.813","quantity":69,"product_id":503,"purchaser":"flink"},"op":"-U"}
{"data":{"order_id":1005,"order_date":"2021-09-17","order_time":"2021-09-22 02:55:43.627","quantity":80,"product_id":503,"purchaser":"flink"},"op":"+U"}

## Delete the row with order_id 1000
{"data":{"order_id":1000,"order_date":"2021-09-17","order_time":"2021-09-17 09:40:32.354","quantity":30,"product_id":500,"purchaser":"flink"},"op":"-D"}
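You can check the RowKind sequence above by replaying it. The following minimal sketch is an illustration and not part of Flink CDC. It materializes the changelog into an in-memory table keyed by order_id: +I and +U upsert a row, while -U and -D retract it. The ChangeRecord type is hypothetical and keeps only order_id and quantity instead of parsing the JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch: replays changelog records into a table keyed by order_id. */
final class ChangelogReplaySketch {

    /** Hypothetical, simplified stand-in for one changelog-json line. */
    static final class ChangeRecord {
        final String op;    // "+I", "-U", "+U" or "-D"
        final int orderId;  // primary key
        final int quantity; // one payload column is enough for the illustration

        ChangeRecord(String op, int orderId, int quantity) {
            this.op = op;
            this.orderId = orderId;
            this.quantity = quantity;
        }
    }

    /** Applies the records in order and returns order_id -> quantity. */
    static Map<Integer, Integer> replay(List<ChangeRecord> records) {
        Map<Integer, Integer> table = new TreeMap<>();
        for (ChangeRecord r : records) {
            switch (r.op) {
                case "+I":
                case "+U":
                    table.put(r.orderId, r.quantity); // insert, or after-image of an update
                    break;
                case "-U":
                case "-D":
                    table.remove(r.orderId); // before-image of an update, or a delete
                    break;
                default:
                    throw new IllegalArgumentException("Unknown op: " + r.op);
            }
        }
        return table;
    }

    /** The records shown above: 11 snapshot inserts, one update of 1005, one delete of 1000. */
    static List<ChangeRecord> exampleLog() {
        int[][] snapshot = {
            {1010, 53}, {1009, 31}, {1008, 69}, {1007, 52}, {1002, 69}, {1001, 50},
            {1000, 30}, {1006, 31}, {1005, 69}, {1004, 50}, {1003, 30}
        };
        List<ChangeRecord> log = new ArrayList<>();
        for (int[] row : snapshot) {
            log.add(new ChangeRecord("+I", row[0], row[1]));
        }
        log.add(new ChangeRecord("-U", 1005, 69));
        log.add(new ChangeRecord("+U", 1005, 80));
        log.add(new ChangeRecord("-D", 1000, 30));
        return log;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> table = replay(exampleLog());
        System.out.println(table.size() + " rows, order 1005 quantity = " + table.get(1005));
    }
}
```

Running main prints `10 rows, order 1005 quantity = 80`: the 11 snapshot rows, minus the deleted order 1000, with order 1005 updated.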

2. Core Design

2.1 Split Division

In the full phase, data is read in a distributed manner. The table is first divided into multiple chunks by primary key, and subtasks then read the data within each chunk's range. The table is divided into evenly sized or unevenly sized chunks, depending on whether the primary key column is an auto-increment integer.

2.1.1 Uniform Distribution

The primary key column is auto-incrementing and of an integer type (INT, BIGINT, or DECIMAL). Query the minimum and maximum values of the primary key column, then divide the range evenly by chunkSize. Because the key is numeric, the end position of each chunk is its start position plus chunkSize.

Note: In later versions, uniform division no longer depends on whether the primary key column is auto-incrementing. Instead, the primary key column must be of an integer type, and a distribution factor is calculated as (max(id) - min(id) + 1) / rowCount. The table is divided evenly only if this factor is less than or equal to the configured factor (evenly-distribution.factor, 1000.0 by default).

// Calculate the value range of the primary key column.
SELECT MIN(`order_id`), MAX(`order_id`) FROM demo_orders;

// Divide the range into chunks of chunkSize keys each (null means unbounded).
chunk-0:    [min, min + chunkSize)
chunk-1:    [min + chunkSize, min + 2 * chunkSize)
...
chunk-last: [min + n * chunkSize, null)
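The following minimal sketch shows uniform division. It assumes a numeric (long) key and follows the splitEvenlySizedChunks logic in section 3.3. Ranges are half-open, and null marks an unbounded side, so the first and last chunks also cover keys outside [min, max]. The distribution factor check reflects the newer trigger condition from the note above. The class and method names are illustrative, and the whole sketch is a simplification, not the connector's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of evenly sized chunking over a numeric split key. */
final class EvenChunkSketch {

    /** Half-open key range [start, end); null means unbounded on that side. */
    static final class ChunkRange {
        final Long start;
        final Long end;

        ChunkRange(Long start, Long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Distribution factor (max - min + 1) / rowCount; 1.0 means the keys are dense. */
    static double distributionFactor(long min, long max, long rowCount) {
        if (rowCount == 0) {
            return Double.MAX_VALUE;
        }
        return (double) (max - min + 1) / rowCount;
    }

    /** Newer trigger condition: divide evenly only if the factor does not exceed the bound. */
    static boolean useEvenChunks(long min, long max, long rowCount, double upperBound) {
        return distributionFactor(min, max, rowCount) <= upperBound;
    }

    /** Tumbles from min in chunkSize steps; the last chunk is open-ended. */
    static List<ChunkRange> splitEvenly(long min, long max, long chunkSize) {
        List<ChunkRange> chunks = new ArrayList<>();
        if (min + chunkSize > max) {
            chunks.add(new ChunkRange(null, null)); // one chunk: scan the whole table
            return chunks;
        }
        Long chunkStart = null;
        long chunkEnd = min + chunkSize;
        while (chunkEnd <= max) {
            chunks.add(new ChunkRange(chunkStart, chunkEnd));
            chunkStart = chunkEnd;
            chunkEnd += chunkSize;
        }
        chunks.add(new ChunkRange(chunkStart, null)); // ending chunk
        return chunks;
    }

    public static void main(String[] args) {
        // demo_orders: order_id 1000..1010, 11 rows
        System.out.println("factor = " + distributionFactor(1000, 1010, 11));
        System.out.println("even = " + useEvenChunks(1000, 1010, 11, 1000.0));
        System.out.println("chunks = " + splitEvenly(1000, 1010, 4));
    }
}
```

For the sample table, main reports a factor of 1.0 and the chunks [null, 1004), [1004, 1008), and [1008, null).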

2.1.2 Non-Uniform Distribution

The primary key column is not auto-incrementing, or it is not of an integer type (for example, a string key). Such keys cannot be divided arithmetically. Instead, each division sorts the not-yet-divided rows in ascending primary key order and takes the first chunkSize rows. The largest key among those rows becomes the end position of the current chunk.

Note: In later versions, non-uniform division is used when the primary key column is of a non-integer type, or when the calculated distribution factor (distributionFactor) is larger than the configured factor (evenly-distribution.factor).

// Sort the undivided rows by primary key, take the first chunkSize rows,
// and use their maximum key as the end position of the current split.
chunkEnd = SELECT MAX(`order_id`) FROM (
        SELECT `order_id` FROM `demo_orders`
        WHERE `order_id` >= [end position of the previous split]
        ORDER BY `order_id` ASC
        LIMIT [chunkSize]
    ) AS T
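The following sketch imitates the query above on an in-memory sorted key set. It assumes string keys, and queryNextChunkMax is a hypothetical stand-in for the SQL. The equal-boundary rule follows nextChunkEnd in section 3.3: if the query returns the previous end again, the next larger key is used. This is an illustration under those assumptions, not the connector's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative sketch of unevenly sized chunking over a non-numeric split key. */
final class UnevenChunkSketch {

    /** Half-open key range [start, end); null means unbounded on that side. */
    static final class ChunkRange {
        final String start;
        final String end;

        ChunkRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Stand-in for: SELECT MAX(k) FROM (SELECT k ... WHERE k >= from ORDER BY k LIMIT n) T. */
    static String queryNextChunkMax(NavigableSet<String> keys, String from, int chunkSize) {
        String last = null;
        Iterator<String> it = keys.tailSet(from, true).iterator();
        for (int i = 0; i < chunkSize && it.hasNext(); i++) {
            last = it.next();
        }
        return last;
    }

    static String nextChunkEnd(NavigableSet<String> keys, String previousEnd, String max, int chunkSize) {
        String chunkEnd = queryNextChunkMax(keys, previousEnd, chunkSize);
        if (previousEnd.equals(chunkEnd)) {
            chunkEnd = keys.higher(chunkEnd); // start and end must differ: take the next larger key
        }
        if (chunkEnd == null || chunkEnd.compareTo(max) >= 0) {
            return null; // remaining rows belong to the open-ended last chunk
        }
        return chunkEnd;
    }

    static List<ChunkRange> splitUnevenly(NavigableSet<String> keys, int chunkSize) {
        List<ChunkRange> chunks = new ArrayList<>();
        if (keys.isEmpty()) {
            chunks.add(new ChunkRange(null, null));
            return chunks;
        }
        String max = keys.last();
        String chunkStart = null;
        String chunkEnd = nextChunkEnd(keys, keys.first(), max, chunkSize);
        while (chunkEnd != null) {
            chunks.add(new ChunkRange(chunkStart, chunkEnd));
            chunkStart = chunkEnd;
            chunkEnd = nextChunkEnd(keys, chunkEnd, max, chunkSize);
        }
        chunks.add(new ChunkRange(chunkStart, null)); // ending chunk
        return chunks;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e", "f", "g"));
        System.out.println(splitUnevenly(keys, 3));
    }
}
```

With chunkSize = 3, the keys a to g produce [null, c), [c, e), and [e, null). Each boundary key is exclusive in its range, so it starts the next chunk.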

2.2 Full Split Data Reading

Flink divides the table data into multiple chunks, and subtasks read the chunks concurrently without locking the table. No lock is held while a split is read, so other transactions may modify rows within the split's range, and the snapshot records alone cannot guarantee data consistency. In the full phase, Flink therefore combines snapshot reading with binlog-based correction to ensure consistency.

2.2.1 Snapshot Reading

Execute a SQL query over JDBC to read the records within the split range. The condition below is equivalent to chunkStart <= order_id < chunkEnd:

## Read SQL for snapshot records
SELECT * FROM `test`.`demo_orders` 
WHERE order_id >= [chunkStart] 
AND NOT (order_id = [chunkEnd]) 
AND order_id <= [chunkEnd]

2.2.2 Data Correction

Run SHOW MASTER STATUS immediately before and after the snapshot read to record the current binlog offset. These two offsets are the low watermark and the high watermark. After the snapshot has been read, read the binlog events between the two offsets and use them to correct the snapshot records.
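To make "offset" concrete: SHOW MASTER STATUS returns the current binlog File and Position. The following minimal sketch orders such offsets. It assumes the usual `name.NNNNNN` file naming, where the numeric suffix increases as the binlog rotates. The class is hypothetical, and the connector's own offset representation may carry more information.

```java
/** Illustrative binlog position: a binlog file name plus a byte position in that file. */
final class BinlogPosition implements Comparable<BinlogPosition> {
    final String filename; // e.g. "mysql-bin.000003", as reported in the File column
    final long position;   // as reported in the Position column

    BinlogPosition(String filename, long position) {
        this.filename = filename;
        this.position = position;
    }

    /** Numeric suffix of the file name, which grows each time the binlog rotates. */
    long fileIndex() {
        return Long.parseLong(filename.substring(filename.lastIndexOf('.') + 1));
    }

    @Override
    public int compareTo(BinlogPosition other) {
        int byFile = Long.compare(fileIndex(), other.fileIndex());
        return byFile != 0 ? byFile : Long.compare(position, other.position);
    }

    @Override
    public String toString() {
        return filename + ":" + position;
    }

    public static void main(String[] args) {
        BinlogPosition low = new BinlogPosition("mysql-bin.000003", 4567);
        BinlogPosition high = new BinlogPosition("mysql-bin.000004", 120);
        System.out.println(low + " before " + high + ": " + (low.compareTo(high) < 0));
    }
}
```

Comparing by file index first means a small position in a later file still counts as a later offset.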

The data organization structure during snapshot reading and Binlog data reading is shown below:


Rules for correcting SnapshotEvents with BinlogEvents:

  • If no binlog data is read, no other transaction changed the data during the SELECT phase, and all snapshot records are emitted directly.
  • If binlog data is read but the changed records do not belong to the current split, the snapshot records are emitted as they are.
  • If binlog data is read and a changed record belongs to the current split, the change is applied to the in-memory snapshot. A delete removes the record, an insert adds the new record, and an update adds the changed record. Eventually, the records before and after the update are output downstream.
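The following sketch applies these three rules. It assumes a split covering the half-open key range [chunkStart, chunkEnd), a hypothetical BinlogEvent type, and string row images. It models only the in-memory correction step: an update writes its after-image under the row's key. It is an illustration, not the connector's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch: correcting one split's snapshot with binlog events. */
final class SnapshotCorrectionSketch {

    /** Hypothetical simplified binlog event: 'c' insert, 'u' update, 'd' delete. */
    static final class BinlogEvent {
        final char op;
        final long key;
        final String after; // row image after the change; null for deletes

        BinlogEvent(char op, long key, String after) {
            this.op = op;
            this.key = key;
            this.after = after;
        }
    }

    /** Applies events read between the low and high watermark to a split [chunkStart, chunkEnd). */
    static Map<Long, String> correct(
            Map<Long, String> snapshot, List<BinlogEvent> events, long chunkStart, long chunkEnd) {
        Map<Long, String> corrected = new TreeMap<>(snapshot); // in-memory copy, sorted by key
        for (BinlogEvent e : events) {
            if (e.key < chunkStart || e.key >= chunkEnd) {
                continue; // the change belongs to another split
            }
            switch (e.op) {
                case 'c':
                case 'u':
                    corrected.put(e.key, e.after); // add or replace the row image
                    break;
                case 'd':
                    corrected.remove(e.key);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown op: " + e.op);
            }
        }
        return corrected;
    }

    static Map<Long, String> example() {
        Map<Long, String> snapshot = new TreeMap<>();
        snapshot.put(1L, "r1");
        snapshot.put(2L, "r2");
        snapshot.put(4L, "r4");
        snapshot.put(5L, "r5");
        List<BinlogEvent> events = new ArrayList<>();
        events.add(new BinlogEvent('c', 3L, "r3"));
        events.add(new BinlogEvent('u', 2L, "r2'"));
        events.add(new BinlogEvent('d', 4L, null));
        events.add(new BinlogEvent('u', 20L, "r20'")); // outside [1, 11): ignored
        return correct(snapshot, events, 1L, 11L);
    }

    public static void main(String[] args) {
        System.out.println(example()); // with no events, correct() returns the snapshot unchanged
    }
}
```

main prints `{1=r1, 2=r2', 3=r3, 5=r5}`. The out-of-range update to key 20 is left for the split that owns it.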

Revised data organization structure:


A split covering the range [1, 11] is used as an example of how split data is processed. c, d, and u denote the insert (create), delete, and update operations captured by Debezium.

Data and structure before revision:


Revised data and structure:


After a single split has been processed, two pieces of information are reported to the SplitEnumerator: the range of the completed split (chunkStart, chunkEnd) and the maximum binlog offset read for it (the high watermark). The high watermark is later used to determine the start offset for incremental reading.

2.3 Incremental Split Data Reading

After the full data reading phase, the SplitEnumerator issues a BinlogSplit for incremental data reading. The most important attribute of a BinlogSplit is its starting offset:

  • If the starting offset is too small, duplicate data may be sent downstream.
  • If it is too large, some changes may be missed and the downstream data may be stale (dirty).

Flink CDC therefore starts incremental reading from the smallest high watermark among all completed snapshot splits. It sends data downstream only if the data meets the following condition:

  • The offset of the captured binlog event is greater than the high watermark of the split to which the changed row belongs.

For example, the completed split information retained by the SplitEnumerator is listed below:

Split Index | Chunk Data Range | Maximum Binlog Offset Read by Split (High Watermark)
0           | [1, 100]         | 1000
1           | [101, 200]       | 800
2           | [201, 300]       | 1500

During incremental reading, binlog data is read starting from offset 800, the smallest of the three high watermarks. Suppose the event <data: 123, offset: 1500> is captured. Flink first finds the snapshot split that key 123 belongs to (split 1), then that split's high watermark, 800. If the event's offset is greater than the split's high watermark, the data is sent; otherwise, it is discarded. Here 1500 > 800, so the event is sent.
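The following sketch models the starting offset and the delivery condition, using the split table above. Offsets are simplified to plain long values, and split ranges are closed, as in the table. The class is hypothetical and is not the connector's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the high-watermark filter applied during incremental reading. */
final class BinlogFilterSketch {

    /** A finished snapshot split: closed key range [start, end] and its high watermark. */
    static final class FinishedSplit {
        final long start;
        final long end;
        final long highWatermark;

        FinishedSplit(long start, long end, long highWatermark) {
            this.start = start;
            this.end = end;
            this.highWatermark = highWatermark;
        }
    }

    private final List<FinishedSplit> splits;

    BinlogFilterSketch(List<FinishedSplit> splits) {
        this.splits = new ArrayList<>(splits);
    }

    /** Incremental reading starts at the smallest high watermark among finished splits. */
    long startingOffset() {
        long min = Long.MAX_VALUE;
        for (FinishedSplit s : splits) {
            min = Math.min(min, s.highWatermark);
        }
        return min;
    }

    /** Sends an event only if it is newer than the snapshot of the split owning its key. */
    boolean shouldEmit(long key, long offset) {
        for (FinishedSplit s : splits) {
            if (key >= s.start && key <= s.end) {
                return offset > s.highWatermark;
            }
        }
        throw new IllegalArgumentException("No finished split covers key " + key);
    }

    /** The three splits from the table above. */
    static BinlogFilterSketch example() {
        return new BinlogFilterSketch(Arrays.asList(
                new FinishedSplit(1, 100, 1000),
                new FinishedSplit(101, 200, 800),
                new FinishedSplit(201, 300, 1500)));
    }

    public static void main(String[] args) {
        BinlogFilterSketch filter = example();
        System.out.println("start offset = " + filter.startingOffset());
        System.out.println("emit <123, 1500>? " + filter.shouldEmit(123, 1500));
        System.out.println("emit <50, 900>? " + filter.shouldEmit(50, 900));
    }
}
```

The event <123, 1500> is sent. A hypothetical event <50, 900> is discarded, because split 0 already reflects changes up to offset 1000.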

3. Code Details

The design of FLIP-27 (Refactor Source Interface) is not described in detail here. This article focuses on how the flink-mysql-cdc interfaces are called and implemented.

3.1 MySqlSourceEnumerator Initialization

SourceCoordinator, the OperatorCoordinator implementation for sources, runs on the master node (JobManager). At startup, it calls MySqlParallelSource#createEnumerator to create a MySqlSourceEnumerator. It then calls the enumerator's start method to perform initialization.


1) Create a MySqlSourceEnumerator. It uses MySqlHybridSplitAssigner to assign full + incremental splits and MySqlValidator to verify the MySQL version and configuration.

2) MySqlValidator verification:

  1. The MySQL version must be 5.7 or later.
  2. The binlog_format parameter must be set to ROW.
  3. The binlog_row_image configuration must be FULL.

3) MySqlSplitAssigner initialization:

  1. Create a ChunkSplitter for dividing splits
  2. Filter the name of the table to be read

4) Start a periodic scheduling task. It asks the SourceReaders to report to the SourceEnumerator any splits they have finished but whose completion has not yet been acknowledged (ACKed).

private void syncWithReaders(int[] subtaskIds, Throwable t) {
    if (t != null) {
        throw new FlinkRuntimeException("Failed to list obtain registered readers due to:", t);
    }
    // when the SourceEnumerator restores or the communication failed between
    // SourceEnumerator and SourceReader, it may missed some notification event.
    // tell all SourceReader(s) to report there finished but unacked splits.
    if (splitAssigner.waitingForFinishedSplits()) {
        for (int subtaskId : subtaskIds) {
            // note: send a FinishedSnapshotSplitsRequestEvent to each reader
            context.sendEventToSourceReader(
                    subtaskId, new FinishedSnapshotSplitsRequestEvent());
        }
    }
}
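The three MySqlValidator checks listed in step 2) can be sketched in plain Java. The inputs are values that a client would typically obtain with SELECT VERSION() and SHOW VARIABLES LIKE 'binlog_format' (or 'binlog_row_image'). The class and its inputs are hypothetical, and the connector's actual queries and error messages may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of the three MySqlValidator checks on already-fetched server values. */
final class MySqlConfigCheckSketch {

    /** True if a version string such as "5.7.33-log" or "8.0.26" is 5.7 or later. */
    static boolean isAtLeast57(String version) {
        String[] parts = version.split("[.-]");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 5 || (major == 5 && minor >= 7);
    }

    /** Throws IllegalStateException if any requirement is not met. */
    static void validate(String version, Map<String, String> variables) {
        if (!isAtLeast57(version)) {
            throw new IllegalStateException("MySQL 5.7 or later is required, found " + version);
        }
        String format = variables.get("binlog_format");
        if (!"ROW".equalsIgnoreCase(format)) {
            throw new IllegalStateException("binlog_format must be ROW, found " + format);
        }
        String rowImage = variables.get("binlog_row_image");
        if (!"FULL".equalsIgnoreCase(rowImage)) {
            throw new IllegalStateException("binlog_row_image must be FULL, found " + rowImage);
        }
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("binlog_format", "ROW");
        variables.put("binlog_row_image", "FULL");
        validate("8.0.26", variables); // passes
        variables.put("binlog_format", "STATEMENT");
        try {
            validate("8.0.26", variables);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```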

3.2 MySqlSourceReader Initialization

SourceOperator wraps the SourceReader and interacts with the SourceCoordinator through the OperatorEventGateway.


1.  During initialization, SourceOperator calls MySqlParallelSource to create a MySqlSourceReader. The MySqlSourceReader uses a SingleThreadFetcherManager to create a fetcher that pulls split data. The data is written to elementsQueue in the MySqlRecords format.


public SourceReader<T, MySqlSplit> createReader(SourceReaderContext readerContext) throws Exception {
    // note: data storage queue
    FutureCompletingBlockingQueue<RecordsWithSplitIds<SourceRecord>> elementsQueue =
            new FutureCompletingBlockingQueue<>();
    final Configuration readerConfiguration = getReaderConfig(readerContext);

    // note: split reader factory
    Supplier<MySqlSplitReader> splitReaderSupplier =
            () -> new MySqlSplitReader(readerConfiguration, readerContext.getIndexOfSubtask());

    return new MySqlSourceReader<>(
            elementsQueue,
            splitReaderSupplier,
            new MySqlRecordEmitter<>(deserializationSchema),
            readerConfiguration,
            readerContext);
}

2.  The created MySqlSourceReader registers with the SourceCoordinator by sending a registration event. When the SourceCoordinator receives the event, it saves the reader's location and subtask index.

// note: SourceCoordinator handles the reader registration event
private void handleReaderRegistrationEvent(ReaderRegistrationEvent event) {
    context.registerSourceReader(new ReaderInfo(event.subtaskId(), event.location()));
    // ...
}

3.  After it starts, the MySqlSourceReader sends a split request event to the MySqlSourceEnumerator to obtain the splits assigned to it.

4.  After the SourceOperator is initialized, it calls emitNext. In emitNext, SourceReaderBase takes the record batches from elementsQueue and passes them to the MySqlRecordEmitter. Interface call diagram:


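The hand-off in steps 1 and 4 is a producer-consumer pattern: a fetcher thread fills elementsQueue, and the task thread drains it into the emitter. The following minimal sketch assumes a plain ArrayBlockingQueue in place of FutureCompletingBlockingQueue, and its consumer simply blocks on take(). The class, the batch contents, and the end marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Illustrative producer-consumer sketch of fetcher thread -> elementsQueue -> emitter. */
final class FetcherQueueSketch {

    /** Unique end-of-input marker, compared by identity. */
    private static final List<String> END = new ArrayList<>();

    static List<String> run(List<List<String>> batches) {
        BlockingQueue<List<String>> elementsQueue = new ArrayBlockingQueue<>(2);
        Thread fetcher = new Thread(() -> {
            try {
                for (List<String> batch : batches) {
                    elementsQueue.put(batch); // blocks while the queue is full
                }
                elementsQueue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "split-fetcher");
        fetcher.start();

        List<String> emitted = new ArrayList<>();
        Consumer<String> emitter = emitted::add; // stands in for the record emitter
        try {
            while (true) {
                List<String> batch = elementsQueue.take();
                if (batch == END) {
                    break;
                }
                batch.forEach(emitter);
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(Arrays.asList("r1", "r2"), Arrays.asList("r3"))));
    }
}
```

The bounded queue gives back-pressure: the fetcher pauses when the consumer falls behind.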
3.3 MySqlSourceEnumerator Processing Split Requests

When the MySqlSourceReader starts, it sends a RequestSplitEvent to the MySqlSourceEnumerator. It then reads the data within the range of the split it receives. In the full data read phase, the MySqlSourceEnumerator's split request handling eventually returns a MySqlSnapshotSplit.


1.  Handle the split request event: assign a split to the requesting reader and deliver the MySqlSplit by sending an AddSplitEvent. The MySqlSplit is a MySqlSnapshotSplit in the full phase and a MySqlBinlogSplit in the incremental phase.

public void handleSplitRequest(int subtaskId, @Nullable String requesterHostname) {
    if (!context.registeredReaders().containsKey(subtaskId)) {
        // reader failed between sending the request and now. skip this request.
        return;
    }
    // note: stores the reader's subtaskId in a TreeSet, so task-0 is served first when the binlog split is assigned
    readersAwaitingSplit.add(subtaskId);
    assignSplits();
}

// note: assign a split
private void assignSplits() {
    final Iterator<Integer> awaitingReader = readersAwaitingSplit.iterator();
    while (awaitingReader.hasNext()) {
        int nextAwaiting = awaitingReader.next();
        // if the reader that requested another split has failed in the meantime, remove
        // it from the list of waiting readers
        if (!context.registeredReaders().containsKey(nextAwaiting)) {
            awaitingReader.remove();
            continue;
        }

        // note: the split is assigned by the MySqlSplitAssigner
        Optional<MySqlSplit> split = splitAssigner.getNext();
        if (split.isPresent()) {
            final MySqlSplit mySqlSplit = split.get();
            // note: send an AddSplitEvent that returns the split information to the reader
            context.assignSplit(mySqlSplit, nextAwaiting);
            awaitingReader.remove();
            LOG.info("Assign split {} to subtask {}", mySqlSplit, nextAwaiting);
        } else {
            // there is no available splits by now, skip assigning
            break;
        }
    }
}

2.  The logic of MySqlHybridSplitAssigner for processing full data splits and incremental data splits.

  1. When the task starts, remainingTables is not empty, so noMoreSplits returns false and a SnapshotSplit is created.
  2. After all full-phase splits have been assigned, noMoreSplits returns true. Once those splits have also been read, a BinlogSplit is created.

public Optional<MySqlSplit> getNext() {
    if (snapshotSplitAssigner.noMoreSplits()) {
        // binlog split assigning
        if (isBinlogSplitAssigned) {
            // no more splits for the assigner
            return Optional.empty();
        } else if (snapshotSplitAssigner.isFinished()) {
            // we need to wait snapshot-assigner to be finished before
            // assigning the binlog split. Otherwise, records emitted from binlog split
            // might be out-of-order in terms of same primary key with snapshot splits.
            isBinlogSplitAssigned = true;

            // note: after the snapshot splits are finished, create a BinlogSplit
            return Optional.of(createBinlogSplit());
        } else {
            // binlog split is not ready by now
            return Optional.empty();
        }
    } else {
        // note: SnapshotSplit is created by the MySqlSnapshotSplitAssigner
        // snapshot assigner still have remaining splits, assign split from it
        return snapshotSplitAssigner.getNext();
    }
}

3.  MySqlSnapshotSplitAssigner handles the full-phase split logic. Splits are generated by ChunkSplitter and kept in remainingSplits. They are handed out one at a time through an iterator.

public Optional<MySqlSplit> getNext() {
    if (!remainingSplits.isEmpty()) {
        // return remaining splits firstly
        Iterator<MySqlSnapshotSplit> iterator = remainingSplits.iterator();
        MySqlSnapshotSplit split = iterator.next();
        iterator.remove();
        // note: assigned splits are stored in the assignedSplits collection
        assignedSplits.put(split.splitId(), split);

        return Optional.of(split);
    } else {
        // note: remainingTables stores the names of the tables to read, collected during initialization
        TableId nextTable = remainingTables.pollFirst();
        if (nextTable != null) {
            // split the given table into chunks (snapshot splits)
            // note: ChunkSplitter is created during initialization; generateSplits divides the table
            Collection<MySqlSnapshotSplit> splits = chunkSplitter.generateSplits(nextTable);
            // note: keep all generated split information
            remainingSplits.addAll(splits);
            // note: record the table as already split
            alreadyProcessedTables.add(nextTable);
            // note: call this method recursively
            return getNext();
        } else {
            return Optional.empty();
        }
    }
}

4.  ChunkSplitter divides the table into evenly or unevenly sized splits. The table being read must have a physical primary key.

public Collection<MySqlSnapshotSplit> generateSplits(TableId tableId) {

    Table schema = mySqlSchema.getTableSchema(tableId).getTable();
    List<Column> primaryKeys = schema.primaryKeyColumns();
    // note: a primary key is required
    if (primaryKeys.isEmpty()) {
        throw new ValidationException(
                String.format(
                        "Incremental snapshot for tables requires primary key,"
                                + " but table %s doesn't have primary key.",
                        tableId));
    }
    // use first field in primary key as the split key
    Column splitColumn = primaryKeys.get(0);

    final List<ChunkRange> chunks;
    try {
        // note: divide the data into multiple splits by the primary key column
        chunks = splitTableIntoChunks(tableId, splitColumn);
    } catch (SQLException e) {
        throw new FlinkRuntimeException("Failed to split chunks for table " + tableId, e);
    }
    // note: convert the primary key type and wrap each ChunkRange into a MySqlSnapshotSplit
    // convert chunks into splits
    List<MySqlSnapshotSplit> splits = new ArrayList<>();
    RowType splitType = splitType(splitColumn);
    for (int i = 0; i < chunks.size(); i++) {
        ChunkRange chunk = chunks.get(i);
        MySqlSnapshotSplit split =
                createSnapshotSplit(
                        tableId, i, splitType, chunk.getChunkStart(), chunk.getChunkEnd());
        splits.add(split);
    }
    return splits;
}

5.  splitTableIntoChunks divides splits based on physical primary keys.

private List<ChunkRange> splitTableIntoChunks(TableId tableId, Column splitColumn)
        throws SQLException {
    final String splitColumnName = splitColumn.name();
    // select min, max
    final Object[] minMaxOfSplitColumn = queryMinMax(jdbc, tableId, splitColumnName);
    final Object min = minMaxOfSplitColumn[0];
    final Object max = minMaxOfSplitColumn[1];
    if (min == null || max == null || min.equals(max)) {
        // empty table, or only one row, return full table scan as a chunk
        return Collections.singletonList(ChunkRange.all());
    }

    final List<ChunkRange> chunks;
    if (splitColumnEvenlyDistributed(splitColumn)) {
        // use evenly-sized chunks which is much efficient
        // note: divide evenly by primary key
        chunks = splitEvenlySizedChunks(min, max);
    } else {
        // note: divide unevenly by primary key
        // use unevenly-sized chunks which will request many queries and is not efficient.
        chunks = splitUnevenlySizedChunks(tableId, splitColumnName, min, max);
    }

    return chunks;
}

/** Checks whether split column is evenly distributed across its range. */
private static boolean splitColumnEvenlyDistributed(Column splitColumn) {
    // only column is auto-incremental are recognized as evenly distributed.
    // TODO: we may use MAX,MIN,COUNT to calculate the distribution in the future.
    if (splitColumn.isAutoIncremented()) {
        DataType flinkType = MySqlTypeUtils.fromDbzColumn(splitColumn);
        LogicalTypeRoot typeRoot = flinkType.getLogicalType().getTypeRoot();
        // currently, we only support split column with type BIGINT, INT, DECIMAL
        return typeRoot == LogicalTypeRoot.BIGINT
                || typeRoot == LogicalTypeRoot.INTEGER
                || typeRoot == LogicalTypeRoot.DECIMAL;
    } else {
        return false;
    }
}

/**
 * Split table into evenly sized chunks based on the numeric min and max value of split column,
 * and tumble chunks in {@link #chunkSize} step size.
 */
private List<ChunkRange> splitEvenlySizedChunks(Object min, Object max) {
    if (ObjectUtils.compare(ObjectUtils.plus(min, chunkSize), max) > 0) {
        // there is no more than one chunk, return full table as a chunk
        return Collections.singletonList(ChunkRange.all());
    }

    final List<ChunkRange> splits = new ArrayList<>();
    Object chunkStart = null;
    Object chunkEnd = ObjectUtils.plus(min, chunkSize);
    // chunkEnd <= max
    while (ObjectUtils.compare(chunkEnd, max) <= 0) {
        splits.add(ChunkRange.of(chunkStart, chunkEnd));
        chunkStart = chunkEnd;
        chunkEnd = ObjectUtils.plus(chunkEnd, chunkSize);
    }
    // add the ending split
    splits.add(ChunkRange.of(chunkStart, null));
    return splits;
}

/**
 * Split table into unevenly sized chunks by continuously calculating next chunk max value.
 */
private List<ChunkRange> splitUnevenlySizedChunks(
        TableId tableId, String splitColumnName, Object min, Object max) throws SQLException {
    final List<ChunkRange> splits = new ArrayList<>();
    Object chunkStart = null;

    Object chunkEnd = nextChunkEnd(min, tableId, splitColumnName, max);
    int count = 0;
    while (chunkEnd != null && ObjectUtils.compare(chunkEnd, max) <= 0) {
        // we start from [null, min + chunk_size) and avoid [null, min)
        splits.add(ChunkRange.of(chunkStart, chunkEnd));
        // may sleep a while to avoid DDOS on MySQL server
        maySleep(count++, tableId);
        chunkStart = chunkEnd;
        chunkEnd = nextChunkEnd(chunkEnd, tableId, splitColumnName, max);
    }
    // add the ending split
    splits.add(ChunkRange.of(chunkStart, null));
    return splits;
}

private Object nextChunkEnd(
        Object previousChunkEnd, TableId tableId, String splitColumnName, Object max)
        throws SQLException {
    // chunk end might be null when max values are removed
    Object chunkEnd =
            queryNextChunkMax(jdbc, tableId, splitColumnName, chunkSize, previousChunkEnd);
    if (Objects.equals(previousChunkEnd, chunkEnd)) {
        // we don't allow equal chunk start and end,
        // should query the next one larger than chunkEnd
        chunkEnd = queryMin(jdbc, tableId, splitColumnName, chunkEnd);
    }
    if (ObjectUtils.compare(chunkEnd, max) >= 0) {
        return null;
    } else {
        return chunkEnd;
    }
}
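Steps 2 and 3 above (MySqlHybridSplitAssigner and MySqlSnapshotSplitAssigner) together guarantee an ordering. Each snapshot split is handed out once, because it is removed from remainingSplits. The binlog split is handed out only once, and only after every assigned snapshot split has finished. That ordering can be sketched with plain collections. The class, the string split IDs, and the onSplitFinished callback are hypothetical; the callback stands in for the finished-split reports.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative sketch of the hybrid (snapshot first, then binlog) assignment order. */
final class HybridAssignerSketch {

    private final Deque<String> remainingSnapshotSplits;
    private final Set<String> assigned = new HashSet<>();
    private final Set<String> finished = new HashSet<>();
    private boolean binlogSplitAssigned;

    HybridAssignerSketch(List<String> snapshotSplitIds) {
        this.remainingSnapshotSplits = new ArrayDeque<>(snapshotSplitIds);
    }

    Optional<String> getNext() {
        if (!remainingSnapshotSplits.isEmpty()) {
            String split = remainingSnapshotSplits.poll(); // removed, so it is handed out once
            assigned.add(split);
            return Optional.of(split);
        }
        if (binlogSplitAssigned) {
            return Optional.empty(); // nothing left to assign
        }
        if (finished.containsAll(assigned)) {
            binlogSplitAssigned = true;
            return Optional.of("binlog-split");
        }
        return Optional.empty(); // snapshot splits are still being read
    }

    void onSplitFinished(String splitId) {
        finished.add(splitId);
    }

    public static void main(String[] args) {
        HybridAssignerSketch assigner = new HybridAssignerSketch(Arrays.asList("split-0", "split-1"));
        System.out.println(assigner.getNext()); // Optional[split-0]
        System.out.println(assigner.getNext()); // Optional[split-1]
        System.out.println(assigner.getNext()); // Optional.empty: snapshot splits not finished
        assigner.onSplitFinished("split-0");
        assigner.onSplitFinished("split-1");
        System.out.println(assigner.getNext()); // Optional[binlog-split]
    }
}
```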

3.4 MySqlSourceReader Processing Split Allocation Requests


After the MySqlSourceReader receives a split assignment, it creates a SplitFetcher thread and adds an AddSplitsTask to the fetcher's taskQueue, which the thread then runs. Next, the thread runs FetchTask, which reads data through the Debezium API. The data read is stored in elementsQueue, from which SourceReaderBase takes it and sends it to the MySqlRecordEmitter.

1.  When handling the split assignment event, create a SplitFetcher and add an AddSplitsTask to its taskQueue:

// SingleThreadFetcherManager
public void addSplits(List<SplitT> splitsToAdd) {
    SplitFetcher<E, SplitT> fetcher = getRunningFetcher();
    if (fetcher == null) {
        fetcher = createSplitFetcher();
        // Add the splits to the fetchers.
        fetcher.addSplits(splitsToAdd);
        startFetcher(fetcher);
    } else {
        fetcher.addSplits(splitsToAdd);
    }
}

// SplitFetcherManager: create a SplitFetcher
protected synchronized SplitFetcher<E, SplitT> createSplitFetcher() {
    if (closed) {
        throw new IllegalStateException("The split fetcher manager has closed.");
    }
    // Create SplitReader.
    SplitReader<E, SplitT> splitReader = splitReaderFactory.get();

    int fetcherId = fetcherIdGenerator.getAndIncrement();
    SplitFetcher<E, SplitT> splitFetcher =
            new SplitFetcher<>(
                    fetcherId,
                    elementsQueue,
                    splitReader,
                    errorHandler,
                    () -> {
                        fetchers.remove(fetcherId);
                        elementsQueue.notifyAvailable();
                    });
    fetchers.put(fetcherId, splitFetcher);
    return splitFetcher;
}

// SplitFetcher
public void addSplits(List<SplitT> splitsToAdd) {
    enqueueTask(new AddSplitsTask<>(splitReader, splitsToAdd, assignedSplits));
    // ...
}

2.  The SplitFetcher thread runs. Its first iteration executes AddSplitsTask to add the splits, and later iterations execute FetchTask to pull data.

void runOnce() {
    try {
        if (shouldRunFetchTask()) {
            runningTask = fetchTask;
        } else {
            runningTask = taskQueue.take();
        }
        if (!wakeUp.get() && runningTask.run()) {
            LOG.debug("Finished running task {}", runningTask);
            runningTask = null;
            // ...
        }
    } catch (Exception e) {
        throw new RuntimeException(
                String.format(
                        "SplitFetcher thread %d received unexpected exception while polling the records",
                        id),
                e);
    }
    // ...

    synchronized (wakeUp) {
        // Set the running task to null. It is necessary for the shutdown method to avoid
        // unnecessarily interrupt the running task.
        runningTask = null;
        // Set the wakeUp flag to false.
        wakeUp.set(false);
        LOG.debug("Cleaned wakeup flag.");
    }
}

3.  AddSplitsTask calls MySqlSplitReader#handleSplitsChanges to add the assigned splits to the split queue. The next fetch() call takes a split from the queue and reads its data.

// AddSplitsTask
public boolean run() {
    for (SplitT s : splitsToAdd) {
        assignedSplits.put(s.splitId(), s);
    }
    splitReader.handleSplitsChanges(new SplitsAddition<>(splitsToAdd));
    return true;
}

// MySqlSplitReader
public void handleSplitsChanges(SplitsChange<MySqlSplit> splitsChanges) {
    if (!(splitsChanges instanceof SplitsAddition)) {
        throw new UnsupportedOperationException(
                String.format(
                        "The SplitChange type of %s is not supported.",
                        splitsChanges.getClass()));
    }
    // note: add the splits to the queue
    splits.addAll(splitsChanges.splits());
}

4.  MySqlSplitReader executes fetch(). The DebeziumReader reads data into its event queue. After the data has been corrected, it is returned in the MySqlRecords format.

public RecordsWithSplitIds<SourceRecord> fetch() throws IOException {
    // note: creates a reader if needed and starts reading
    checkSplitOrStartNext();
    Iterator<SourceRecord> dataIt = null;
    try {
        // note: corrects the read data
        dataIt = currentReader.pollSplitRecords();
    } catch (InterruptedException e) {
        LOG.warn("fetch data failed.", e);
        throw new IOException(e);
    }

    // note: the returned data is wrapped in MySqlRecords for transmission
    return dataIt == null
            ? finishedSnapshotSplit()
            : MySqlRecords.forRecords(currentSplitId, dataIt);
}

private void checkSplitOrStartNext() throws IOException {
    // the binlog reader should keep alive
    if (currentReader instanceof BinlogSplitReader) {
        return;
    }

    if (canAssignNextSplit()) {
        // note: takes the next MySqlSplit from the split queue
        final MySqlSplit nextSplit = splits.poll();
        if (nextSplit == null) {
            throw new IOException("Cannot fetch from another split - no split remaining");
        }

        currentSplitId = nextSplit.splitId();
        // note: distinguishes between full split reading and incremental split reading
        if (nextSplit.isSnapshotSplit()) {
            if (currentReader == null) {
                final MySqlConnection jdbcConnection = getConnection(config);
                final BinaryLogClient binaryLogClient = getBinaryClient(config);

                final StatefulTaskContext statefulTaskContext =
                        new StatefulTaskContext(config, binaryLogClient, jdbcConnection);
                // note: creates a SnapshotSplitReader, which uses the Debezium API to read the
                // split data and the binlog within its range
                currentReader = new SnapshotSplitReader(statefulTaskContext, subtaskId);
            }
        } else {
            // point from snapshot split to binlog split
            if (currentReader != null) {
                LOG.info("It's turn to read binlog split, close current snapshot reader");
                currentReader.close();
            }

            final MySqlConnection jdbcConnection = getConnection(config);
            final BinaryLogClient binaryLogClient = getBinaryClient(config);
            final StatefulTaskContext statefulTaskContext =
                    new StatefulTaskContext(config, binaryLogClient, jdbcConnection);
            LOG.info("Create binlog reader");
            // note: creates a BinlogSplitReader, which uses the Debezium API for incremental reading
            currentReader = new BinlogSplitReader(statefulTaskContext, subtaskId);
        }
        // note: the reader starts reading the split
        currentReader.submitSplit(nextSplit);
    }
}

3.5 DebeziumReader Data Processing

DebeziumReader has two implementations, one for full (snapshot) split reading and one for incremental (binlog) split reading. The data they read is placed in the ChangeEventQueue and corrected when pollSplitRecords is called.
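
The hand-off between the Debezium reading task and pollSplitRecords is a producer/consumer queue: a background task writes events, and the reader polls them in batches (queue.poll() returns a List&lt;DataChangeEvent&gt; in the excerpts below). The sketch below imitates that pattern with java.util.concurrent.BlockingQueue as an assumed stand-in; it is not Debezium's ChangeEventQueue API, and events are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class EventQueueSketch {
    // Bounded buffer between the reading task (producer) and pollSplitRecords (consumer).
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    /** Producer side: blocks when the queue is full, which throttles the reading task. */
    void enqueue(String event) throws InterruptedException {
        queue.put(event);
    }

    /** Consumer side: waits up to 100 ms for one event, then drains everything already queued. */
    List<String> pollBatch() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = queue.poll(100, TimeUnit.MILLISECONDS);
        if (first != null) {
            batch.add(first);
            queue.drainTo(batch);
        }
        return batch;
    }

    /** Runs a producer thread that writes a watermark-framed snapshot, then polls one batch. */
    static List<String> runDemo() {
        EventQueueSketch q = new EventQueueSketch();
        Thread producer = new Thread(() -> {
            try {
                for (String e : List.of("LOW_WATERMARK", "row-1", "row-2", "HIGH_WATERMARK")) {
                    q.enqueue(e);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            producer.start();
            producer.join();
            return q.pollBatch();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return List.of();
        }
    }
}
```

Because the producer finishes before the poll in this demo, the single batch contains all four events in write order.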

1.  SnapshotSplitReader full split reading. In the full phase, the reader queries the table data within the split's range by executing a SELECT statement. SHOW MASTER STATUS is executed before and after the query, and the binlog offsets it returns are written to the queue as the low and high watermark events around the snapshot data.

public void submitSplit(MySqlSplit mySqlSplit) {
    // ... (split setup omitted)
    executor.submit(
            () -> {
                try {
                    currentTaskRunning = true;
                    // note: records the current binlog offset before and after the data is read
                    // 1. execute snapshot read task.
                    final SnapshotSplitChangeEventSourceContextImpl sourceContext =
                            new SnapshotSplitChangeEventSourceContextImpl();
                    SnapshotResult snapshotResult = /* ... */;

                    // note: prepares for incremental reading, including the start offset
                    final MySqlBinlogSplit appendBinlogSplit = createBinlogSplit(sourceContext);
                    final MySqlOffsetContext mySqlOffsetContext = /* ... */;

                    // note: reads from the start offset
                    // 2. execute binlog read task
                    if (snapshotResult.isCompletedOrSkipped()) {
                        // we should only capture events for the current table,
                        Configuration dezConf = /* ... */;

                        // task to read binlog for current split
                        MySqlBinlogSplitReadTask splitBinlogReadTask =
                                new MySqlBinlogSplitReadTask(
                                        new MySqlConnectorConfig(dezConf),
                                        /* ... */);
                        splitBinlogReadTask.execute(
                                new SnapshotBinlogSplitChangeEventSourceContextImpl());
                    } else {
                        readException =
                                new IllegalStateException(
                                        String.format(
                                                "Read snapshot for mysql split %s fail",
                                                mySqlSplit));
                    }
                } catch (Exception e) {
                    currentTaskRunning = false;
                    LOG.error(
                            String.format(
                                    "Execute snapshot read task for mysql split %s fail",
                                    mySqlSplit),
                            e);
                    readException = e;
                }
            });
}

2.  SnapshotSplitReader incremental split reading. The key question in this phase is when the BinlogSplitReadTask should stop: it stops once it reaches the offset recorded when the snapshot read of the split ended, that is, the split's high watermark.

protected void handleEvent(Event event) {
    // note: delivers the event to the queue
    super.handleEvent(event);
    // note: in the full read phase, the binlog reading must be terminated
    // check do we need to stop for read binlog for snapshot split.
    if (isBoundedRead()) {
        final BinlogOffset currentBinlogOffset =
                new BinlogOffset(/* ... */);
        // note: stop reading once currentBinlogOffset reaches the HW
        // reach the high watermark, the binlog reader should finished
        if (currentBinlogOffset.isAtOrBefore(binlogSplit.getEndingOffset())) {
            // send binlog end event
            try {
                // ... (dispatches the binlog-end signal event)
            } catch (InterruptedException e) {
                logger.error("Send signal event error.", e);
                errorHandler.setProducerThrowable(
                        new DebeziumException("Error processing binlog signal event", e));
            }
            // note: terminates binlog reading
            // tell reader the binlog task finished
            ((SnapshotBinlogSplitChangeEventSourceContextImpl) context).finished();
        }
    }
}
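
The stop condition compares binlog offsets, which are ordered first by binlog file name and then by position within the file. Below is a minimal sketch under simplifying assumptions: BinlogPos is a hypothetical, simplified stand-in for BinlogOffset, and file names are assumed to sort lexicographically, as zero-padded MySQL names such as mysql-bin.000001 do. Like handleEvent() above, each event is delivered first and the stop check runs afterwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified binlog offset: binlog file name plus position in that file.
record BinlogPos(String file, long pos) implements Comparable<BinlogPos> {
    private static final Comparator<BinlogPos> ORDER =
            Comparator.comparing(BinlogPos::file).thenComparingLong(BinlogPos::pos);

    @Override
    public int compareTo(BinlogPos other) {
        return ORDER.compare(this, other);
    }
}

class BoundedBinlogReadSketch {
    /** Delivers events in binlog order and stops once an offset has reached the high watermark. */
    static List<BinlogPos> readUntilHighWatermark(List<BinlogPos> binlog, BinlogPos highWatermark) {
        List<BinlogPos> delivered = new ArrayList<>();
        for (BinlogPos current : binlog) {
            // Deliver first (the role of super.handleEvent in the excerpt)...
            delivered.add(current);
            // ...then check the bound; this is where a binlog-end event would be sent.
            if (current.compareTo(highWatermark) >= 0) {
                break;
            }
        }
        return delivered;
    }
}
```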

3.  When SnapshotSplitReader executes pollSplitRecords, it corrects (normalizes) the raw data in the queue. See RecordUtils#normalizedSplitRecords for the details of the processing logic.

public Iterator<SourceRecord> pollSplitRecords() throws InterruptedException {
    if (hasNextElement.get()) {
        // data input: [low watermark event][snapshot events][high watermark event][binlog events][binlog-end event]
        // data output: [low watermark event][normalized events][high watermark event]
        boolean reachBinlogEnd = false;
        final List<SourceRecord> sourceRecords = new ArrayList<>();
        while (!reachBinlogEnd) {
            // note: handles the DataChangeEvent events written to the queue
            List<DataChangeEvent> batch = queue.poll();
            for (DataChangeEvent event : batch) {
                sourceRecords.add(event.getRecord());
                if (RecordUtils.isEndWatermarkEvent(event.getRecord())) {
                    reachBinlogEnd = true;
                    break;
                }
            }
        }
        // snapshot split return its data once
        hasNextElement.set(false);
        //  ************   Correct data  ***********
        return normalizedSplitRecords(currentSnapshotSplit, sourceRecords, nameAdjuster)
                .iterator();
    }
    // the data has been polled, no more data
    reachEnd.compareAndSet(false, true);
    return null;
}
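
The idea behind this correction can be sketched without Flink or Debezium. Because the SELECT runs between the low and high watermarks without locking the table, replaying the binlog changes captured in that window on top of the snapshot rows yields the split's contents as of the high watermark. The sketch below illustrates that idea only and is not the code of RecordUtils#normalizedSplitRecords; ChangeOp, BackfillChange, and SplitNormalizerSketch are hypothetical names, and rows are keyed by an integer primary key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

enum ChangeOp { INSERT, UPDATE, DELETE }

// Hypothetical binlog change read between the split's low and high watermarks.
record BackfillChange(ChangeOp op, int key, String row) {}

class SplitNormalizerSketch {
    /**
     * Applies the backfill changes to the rows the SELECT returned for the split [start, end),
     * producing the split's contents as of the high watermark, ordered by key.
     */
    static Map<Integer, String> normalize(
            Map<Integer, String> snapshotRows, List<BackfillChange> backfill, int start, int end) {
        Map<Integer, String> result = new TreeMap<>(snapshotRows);
        for (BackfillChange c : backfill) {
            // Changes to keys outside this split's range are left to other splits.
            if (c.key() < start || c.key() >= end) {
                continue;
            }
            switch (c.op()) {
                case INSERT, UPDATE -> result.put(c.key(), c.row());
                case DELETE -> result.remove(c.key());
            }
        }
        return result;
    }
}
```

For example, applying an update of key 2, a delete of key 3, and an insert of key 4 to the rows {1, 2, 3} of split [0, 1024) yields {1, 2 (updated), 4}, while an insert of key 2000 is ignored because it belongs to another split.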

4.  BinlogSplitReader data reading. The read logic is simple; the key point is the starting offset, which is the smallest high watermark (HW) among all finished snapshot splits.

5.  When BinlogSplitReader executes pollSplitRecords, it filters the raw data in the queue to ensure data consistency. Binlog reading in the incremental phase is unbounded and every event is sent to the event queue, so BinlogSplitReader uses shouldEmit() to decide whether each event should be emitted.

public Iterator<SourceRecord> pollSplitRecords() throws InterruptedException {
    final List<SourceRecord> sourceRecords = new ArrayList<>();
    if (currentTaskRunning) {
        List<DataChangeEvent> batch = queue.poll();
        for (DataChangeEvent event : batch) {
            if (shouldEmit(event.getRecord())) {
                sourceRecords.add(event.getRecord());
            }
        }
    }
    return sourceRecords.iterator();
}

Event delivery conditions:

  1. The offset of the newly received event is greater than the maximum high watermark (maxwm) of its table.
  2. The event's key belongs to a finished snapshot split and its offset is greater than that split's HWM.
/**
 * Returns the record should emit or not.
 * <p>The watermark signal algorithm is the binlog split reader only sends the binlog event that
 * belongs to its finished snapshot splits. For each snapshot split, the binlog event is valid
 * since the offset is after its high watermark.
 * <pre> E.g: the data input is :
 *    snapshot-split-0 info : [0,    1024) highWatermark0
 *    snapshot-split-1 info : [1024, 2048) highWatermark1
 *  the data output is:
 *  only the binlog event belong to [0,    1024) and offset is after highWatermark0 should send,
 *  only the binlog event belong to [1024, 2048) and offset is after highWatermark1 should send.
 * </pre>
 */
private boolean shouldEmit(SourceRecord sourceRecord) {
    if (isDataChangeRecord(sourceRecord)) {
        TableId tableId = getTableId(sourceRecord);
        BinlogOffset position = getBinlogPosition(sourceRecord);
        // aligned, all snapshot splits of the table has reached max highWatermark
        // note: send when the offset of the newly received event is greater than maxwm
        if (position.isAtOrBefore(maxSplitHighWatermarkMap.get(tableId))) {
            return true;
        }
        Object[] key = /* ... */;

        for (FinishedSnapshotSplitInfo splitInfo : finishedSplitsInfo.get(tableId)) {
            // note: send when the key belongs to a finished snapshot split and the offset is greater than its HWM
            if (RecordUtils.splitKeyRangeContains(
                            key, splitInfo.getSplitStart(), splitInfo.getSplitEnd())
                    && position.isAtOrBefore(splitInfo.getHighWatermark())) {
                return true;
            }
        }
        // not in the monitored splits scope, do not emit
        return false;
    }

    // always send the schema change event and signal event
    // we need record them to state of Flink
    return true;
}
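
The two delivery conditions can be checked with a small model. The sketch below is hypothetical and simplified: offsets are plain long values rather than binlog file and position pairs, split keys are integers, and FinishedSplit is an invented stand-in for FinishedSnapshotSplitInfo. It follows the two conditions listed above and the two-split example from the comment on shouldEmit.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FinishedSnapshotSplitInfo: key range [start, end) and high watermark.
record FinishedSplit(int start, int end, long highWatermark) {}

class EmitFilterSketch {
    private final Map<String, List<FinishedSplit>> finishedSplits;
    private final Map<String, Long> maxHighWatermark;

    EmitFilterSketch(
            Map<String, List<FinishedSplit>> finishedSplits, Map<String, Long> maxHighWatermark) {
        this.finishedSplits = finishedSplits;
        this.maxHighWatermark = maxHighWatermark;
    }

    /** Returns true if a binlog data change on (table, key) at the given offset is emitted. */
    boolean shouldEmit(String table, int key, long offset) {
        // Condition 1: beyond the table's largest high watermark, every split has caught up.
        if (offset > maxHighWatermark.get(table)) {
            return true;
        }
        // Condition 2: the key lies in a finished split and the offset is beyond that split's HW.
        for (FinishedSplit split : finishedSplits.get(table)) {
            if (key >= split.start() && key < split.end() && offset > split.highWatermark()) {
                return true;
            }
        }
        // Otherwise the change is already reflected in a split's snapshot output,
        // or it is outside the monitored splits, so it is not emitted.
        return false;
    }
}
```

For example, with split [0, 1024) finished at offset 100 and split [1024, 2048) finished at offset 200, a change to key 10 at offset 150 is emitted, while a change to key 1500 at offset 150 is not, because the second split's snapshot output already reflects it.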

3.6 MySqlRecordEmitter Data Distribution

SourceReaderBase takes from a queue the collection of DataChangeEvent records read for a split, and the records are converted from Debezium's DataChangeEvent type to Flink's RowData type.

1.  SourceReaderBase processes split data:

public InputStatus pollNext(ReaderOutput<T> output) throws Exception {
    // make sure we have a fetch we are working on, or move to the next
    RecordsWithSplitIds<E> recordsWithSplitId = this.currentFetch;
    if (recordsWithSplitId == null) {
        recordsWithSplitId = getNextFetch(output);
        if (recordsWithSplitId == null) {
            return trace(finishedOrAvailableLater());
        }
    }

    // we need to loop here, because we may have to go across splits
    while (true) {
        // Process one record.
        // note: reads a single record from the iterator through MySqlRecords
        final E record = recordsWithSplitId.nextRecordFromSplit();
        if (record != null) {
            // emit the record.
            recordEmitter.emitRecord(record, currentSplitOutput, currentSplitContext.state);
            LOG.trace("Emitted record: {}", record);

            // We always emit MORE_AVAILABLE here, even though we do not strictly know whether
            // more is available. If nothing more is available, the next invocation will find
            // this out and return the correct status.
            // That means we emit the occasional 'false positive' for availability, but this
            // saves us doing checks for every record. Ultimately, this is cheaper.
            return trace(InputStatus.MORE_AVAILABLE);
        } else if (!moveToNextSplit(recordsWithSplitId, output)) {
            // The fetch is done and we just discovered that and have not emitted anything, yet.
            // We need to move to the next fetch. As a shortcut, we call pollNext() here again,
            // rather than emitting nothing and waiting for the caller to call us again.
            return pollNext(output);
        }
        // else fall through the loop
    }
}

private RecordsWithSplitIds<E> getNextFetch(final ReaderOutput<T> output) {
    LOG.trace("Getting next source data batch from queue");
    // note: obtains data from elementsQueue
    final RecordsWithSplitIds<E> recordsWithSplitId = elementsQueue.poll();
    if (recordsWithSplitId == null || !moveToNextSplit(recordsWithSplitId, output)) {
        return null;
    }

    currentFetch = recordsWithSplitId;
    return recordsWithSplitId;
}

2.  MySqlRecords returns the records of the current split one at a time:

public SourceRecord nextRecordFromSplit() {
    final Iterator<SourceRecord> recordsForSplit = this.recordsForCurrentSplit;
    if (recordsForSplit != null) {
        if (recordsForSplit.hasNext()) {
            return recordsForSplit.next();
        } else {
            return null;
        }
    } else {
        throw new IllegalStateException();
    }
}

3.  MySqlRecordEmitter converts data to RowData through RowDataDebeziumDeserializeSchema:

public void emitRecord(SourceRecord element, SourceOutput<T> output, MySqlSplitState splitState)
        throws Exception {
    if (isWatermarkEvent(element)) {
        BinlogOffset watermark = getWatermark(element);
        if (isHighWatermarkEvent(element) && splitState.isSnapshotSplitState()) {
            // ... (records the high watermark in the snapshot split state)
        }
    } else if (isSchemaChangeEvent(element) && splitState.isBinlogSplitState()) {
        HistoryRecord historyRecord = getHistoryRecord(element);
        Array tableChanges = /* ... */;
        TableChanges changes = TABLE_CHANGE_SERIALIZER.deserialize(tableChanges, true);
        for (TableChanges.TableChange tableChange : changes) {
            splitState.asBinlogSplitState().recordSchema(tableChange.getId(), tableChange);
        }
    } else if (isDataChangeRecord(element)) {
        // note: data processing
        if (splitState.isBinlogSplitState()) {
            BinlogOffset position = getBinlogPosition(element);
            // ... (records the position in the binlog split state)
        }
        debeziumDeserializationSchema.deserialize(
                element,
                new Collector<T>() {
                    @Override
                    public void collect(final T t) {
                        output.collect(t);
                    }

                    @Override
                    public void close() {
                        // do nothing
                    }
                });
    } else {
        // unknown element
        LOG.info("Meet unknown element {}, just skip.", element);
    }
}

4.  RowDataDebeziumDeserializeSchema deserialization process:

public void deserialize(SourceRecord record, Collector<RowData> out) throws Exception {
    Envelope.Operation op = Envelope.operationFor(record);
    Struct value = (Struct) record.value();
    Schema valueSchema = record.valueSchema();
    if (op == Envelope.Operation.CREATE || op == Envelope.Operation.READ) {
        GenericRowData insert = extractAfterRow(value, valueSchema);
        validator.validate(insert, RowKind.INSERT);
        insert.setRowKind(RowKind.INSERT);
        out.collect(insert);
    } else if (op == Envelope.Operation.DELETE) {
        GenericRowData delete = extractBeforeRow(value, valueSchema);
        validator.validate(delete, RowKind.DELETE);
        delete.setRowKind(RowKind.DELETE);
        out.collect(delete);
    } else {
        GenericRowData before = extractBeforeRow(value, valueSchema);
        validator.validate(before, RowKind.UPDATE_BEFORE);
        before.setRowKind(RowKind.UPDATE_BEFORE);
        out.collect(before);

        GenericRowData after = extractAfterRow(value, valueSchema);
        validator.validate(after, RowKind.UPDATE_AFTER);
        after.setRowKind(RowKind.UPDATE_AFTER);
        out.collect(after);
    }
}
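
These branches determine which RowKind values appear downstream: a snapshot read or an insert yields one INSERT row, a delete yields one DELETE row, and an update yields an UPDATE_BEFORE row followed by an UPDATE_AFTER row. The dependency-free sketch below models only that mapping; DebeziumOp, Kind, and KindedRow are hypothetical local stand-ins for Envelope.Operation, RowKind, and GenericRowData, and rows are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for Debezium's Envelope.Operation and Flink's RowKind (not the real types).
enum DebeziumOp { CREATE, READ, UPDATE, DELETE }

enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

record KindedRow(Kind kind, String row) {}

class RowKindMappingSketch {
    /** Converts one change event (before and after images) into the changelog rows it produces. */
    static List<KindedRow> toRows(DebeziumOp op, String before, String after) {
        List<KindedRow> out = new ArrayList<>();
        switch (op) {
            // Snapshot reads (READ) and inserts (CREATE) only need the after image.
            case CREATE, READ -> out.add(new KindedRow(Kind.INSERT, after));
            // Deletes only need the before image.
            case DELETE -> out.add(new KindedRow(Kind.DELETE, before));
            // An update becomes a retraction of the old row followed by the new row.
            case UPDATE -> {
                out.add(new KindedRow(Kind.UPDATE_BEFORE, before));
                out.add(new KindedRow(Kind.UPDATE_AFTER, after));
            }
        }
        return out;
    }
}
```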

3.7 MySqlSourceReader Reports Split Read Completion Events

After MySqlSourceReader finishes reading a full (snapshot) split, it sends the finished split's information, namely the split ID and its high watermark, to MySqlSourceEnumerator, and then requests the next split.

protected void onSplitFinished(Map<String, MySqlSplitState> finishedSplitIds) {
    for (MySqlSplitState mySqlSplitState : finishedSplitIds.values()) {
        MySqlSplit mySqlSplit = mySqlSplitState.toMySqlSplit();

        finishedUnackedSplits.put(mySqlSplit.splitId(), mySqlSplit.asSnapshotSplit());
    }
    // note: send the split read completion event
    reportFinishedSnapshotSplitsIfNeed();
    // note: continue to request a split after the previous split is finished
    context.sendSplitRequest();
}

private void reportFinishedSnapshotSplitsIfNeed() {
    if (!finishedUnackedSplits.isEmpty()) {
        final Map<String, BinlogOffset> finishedOffsets = new HashMap<>();
        for (MySqlSnapshotSplit split : finishedUnackedSplits.values()) {
            // note: send the split ID and its high watermark (the maximum offset) in finishedOffsets
            finishedOffsets.put(split.splitId(), split.getHighWatermark());
        }
        FinishedSnapshotSplitsReportEvent reportEvent =
                new FinishedSnapshotSplitsReportEvent(finishedOffsets);
        context.sendSourceEventToCoordinator(reportEvent);
        // ... logs "The subtask {} reports offsets of finished snapshot splits {}."
    }
}

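The bookkeeping in reportFinishedSnapshotSplitsIfNeed() can be modeled as a map from split ID to high watermark that is reported whenever it is non-empty. The sketch below is hypothetical: FinishedSplitsReport and SplitReportSketch are invented names, offsets are plain longs, and the onAck step is an assumption suggested by the field name finishedUnackedSplits (entries stay in the map until the enumerator acknowledges them).

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical report message: finished split ID -> its high watermark.
record FinishedSplitsReport(Map<String, Long> finishedOffsets) {}

class SplitReportSketch {
    private final Map<String, Long> finishedUnacked = new LinkedHashMap<>();

    /** Called when a snapshot split has been fully read; its high watermark is remembered. */
    void onSplitFinished(String splitId, long highWatermark) {
        finishedUnacked.put(splitId, highWatermark);
    }

    /** Builds a report of every finished split that is not yet acknowledged, if there is any. */
    Optional<FinishedSplitsReport> reportIfNeeded() {
        if (finishedUnacked.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new FinishedSplitsReport(new HashMap<>(finishedUnacked)));
    }

    /** Assumed acknowledgement step: acknowledged splits are no longer reported. */
    void onAck(Collection<String> ackedSplitIds) {
        ackedSplitIds.forEach(finishedUnacked::remove);
    }
}
```

One plausible benefit of keeping unacknowledged entries is that a report can be sent again until the enumerator confirms it.
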
3.8 MySqlSourceEnumerator Allocates Incremental Splits

After all splits in the full phase have been read, MySqlHybridSplitAssigner creates a BinlogSplit for the subsequent incremental reading. When the BinlogSplit is created, the smallest BinlogOffset among all finished full splits is selected as its starting offset. Note: in the 2.0.0 branch, the minimum offset computed by createBinlogSplit always starts from 0; this bug has been fixed in the latest master branch.

private MySqlBinlogSplit createBinlogSplit() {
    final List<MySqlSnapshotSplit> assignedSnapshotSplit = /* ... */;

    Map<String, BinlogOffset> splitFinishedOffsets = /* ... */;
    final List<FinishedSnapshotSplitInfo> finishedSnapshotSplitInfos = new ArrayList<>();
    final Map<TableId, TableChanges.TableChange> tableSchemas = new HashMap<>();

    BinlogOffset minBinlogOffset = null;
    // note: selects the minimum offset from all assignedSnapshotSplit
    for (MySqlSnapshotSplit split : assignedSnapshotSplit) {
        // find the min binlog offset
        BinlogOffset binlogOffset = splitFinishedOffsets.get(split.splitId());
        if (minBinlogOffset == null || binlogOffset.compareTo(minBinlogOffset) < 0) {
            minBinlogOffset = binlogOffset;
        }
        finishedSnapshotSplitInfos.add(
                new FinishedSnapshotSplitInfo(/* ... */));
        // ...
    }

    final MySqlSnapshotSplit lastSnapshotSplit =
            assignedSnapshotSplit.get(assignedSnapshotSplit.size() - 1).asSnapshotSplit();
    return new MySqlBinlogSplit(
            /* ... */
            minBinlogOffset == null ? BinlogOffset.INITIAL_OFFSET : minBinlogOffset,
            /* ... */);
}
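
Selecting the binlog split's starting offset amounts to taking the minimum high watermark over the finished snapshot splits and falling back to an initial offset when there are none. The sketch below assumes a hypothetical StartPos record (binlog file name plus position, ordered by file and then position) in place of BinlogOffset, with StartPos.INITIAL standing in for BinlogOffset.INITIAL_OFFSET.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical binlog position; INITIAL stands for "read the binlog from the beginning".
record StartPos(String file, long position) {
    static final StartPos INITIAL = new StartPos("", 0L);
    static final Comparator<StartPos> ORDER =
            Comparator.comparing(StartPos::file).thenComparingLong(StartPos::position);
}

class BinlogStartSketch {
    /** Smallest high watermark among the finished snapshot splits, or INITIAL if there are none. */
    static StartPos startingOffset(Map<String, StartPos> splitFinishedOffsets) {
        StartPos min = null;
        for (StartPos hw : splitFinishedOffsets.values()) {
            if (min == null || StartPos.ORDER.compare(hw, min) < 0) {
                min = hw;
            }
        }
        return min == null ? StartPos.INITIAL : min;
    }
}
```

Starting from the minimum means no split can miss a change; for splits whose high watermark is larger, the changes they already reflect are dropped by shouldEmit().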

Apache Flink Community
