
Realtime Compute for Apache Flink: FAQ and solutions for data ingestion

Last Updated: Feb 27, 2026

This topic answers frequently asked questions about Data Ingestion jobs powered by Flink CDC.

Quick reference

Symptom | Phase | Severity | Link
JobManager OOM with high SnapshotSplits metric values | Snapshot | Critical | FAQ 1
TaskManager OOM when few shards remain | Snapshot | Critical | FAQ 3
JobManager OOM on state restoration during incremental reading | Incremental | Critical | FAQ 2
No new data after a lock-free schema change with pt-osc | Schema change | High | FAQ 4
Transform column type mismatch after a lock-free schema change | Schema change | High | FAQ 5
Job fails to restore from a pre-schema-change savepoint | State recovery | High | FAQ 6

Snapshot phase

FAQ 1: JobManager OOM during the snapshot phase

Severity: Critical | Phase: Snapshot | Affected versions: All VVR engine versions

Symptom

  • The job restarts repeatedly during the snapshot phase.

  • The JobManager logs contain an OutOfMemoryError (OOM) stack trace.

  • On the Alarm tab, the Num of remaining SnapshotSplits and Num of processed SnapshotSplits metrics show exceptionally high values.

Alarm tab showing high SnapshotSplits metrics

Cause

During the snapshot phase, the MySQL source persists the metadata of every table shard (snapshot split) in the Flink job's state. If the job reads a large volume of data or uses a very small shard size, the JobManager creates an excessive number of shards. Their metadata consumes too much memory, and the JobManager runs out of memory.

Solution

  1. Increase the memory resources allocated to the JobManager.

  2. Adjust the following parameters to increase the JobManager's heap and off-heap memory:

    • jobmanager.memory.heap.size

    • jobmanager.memory.off-heap.size
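A minimal configuration sketch follows. The option names come from the list above; the sizes `4g` and `1g` are placeholder assumptions, not recommended values. Flink also derives JobManager memory components from the total memory allocated to the JobManager, so if you set these options explicitly, check that they still fit within that total; otherwise the job may fail to start with a memory configuration conflict.

```yaml
# Flink configuration fragment. The values are illustrative placeholders,
# not recommendations: size them to your workload and keep them within
# the total memory allocated to the JobManager.
jobmanager.memory.heap.size: 4g
jobmanager.memory.off-heap.size: 1g
```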


FAQ 3: TaskManager OOM near the end of the snapshot phase

Severity: Critical | Phase: Snapshot | Affected versions: All VVR engine versions

Symptom

  • The TaskManager runs out of memory late in the snapshot phase, typically when only a small number of shards remain.

  • Searching the TaskManager logs for the keyword "using select statement" shows that the query for the final, unbounded shard reads a very large volume of data.

Cause

The final shard covers an open-ended key range. When data reading in the snapshot phase takes a long time, rows written to the table in the meantime (for example, rows with increasing auto-increment keys) accumulate in this final shard or shards. When the TaskManager reads this oversized shard, it runs out of memory.

Solution

  1. Set the following option:

       scan.incremental.snapshot.unbounded-chunk-first.enabled: true
  2. Re-run the snapshot.
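In a YAML data ingestion job, MySQL source options such as this one are typically set in the `source` block. The sketch below assumes a MySQL source; the host, credentials, and `tables` pattern are placeholders, and the sink and pipeline blocks are omitted. As the option name indicates, it makes the source read the unbounded final shard first, before rows written during the snapshot phase can pile up in it.

```yaml
source:
  type: mysql
  # Placeholder connection settings; replace them with your own.
  hostname: <your-mysql-host>
  port: 3306
  username: <your-username>
  password: <your-password>
  tables: app_db.\.*
  # Read the unbounded (final) shard first so that it does not grow
  # while the bounded shards are being read.
  scan.incremental.snapshot.unbounded-chunk-first.enabled: true
```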


Incremental phase

FAQ 2: JobManager OOM during state restoration in the incremental phase

Severity: Critical | Phase: Incremental | Affected versions: VVR 11.1 or earlier

Symptom

  • The job enters the incremental phase but fails during state restoration.

  • The JobManager logs show an OOM.

Cause

VVR 11.1 and earlier versions may not properly clean up persisted table schema information from the job's state after transitioning from the snapshot phase to the incremental phase. This leftover schema information accumulates, causing an OOM when the job restores its state from a checkpoint.

Solution

  1. Upgrade to VVR 11.2 or later.


Schema change

FAQ 4: No new data after a lock-free schema change with pt-osc

Severity: High | Phase: Schema change | Affected versions: VVR 11.1 or earlier

Symptom

  • The job continues running without restarting after a lock-free table schema change.

  • The CurrentFetchTimeLag metric progresses as expected, indicating that data is being fetched.

  • The MySQL source stops producing new data and the CurrentEmitTimeLag metric stops updating.

Cause

VVR 11.1 and earlier versions cannot correctly handle DDL change events generated by lock-free schema change tools such as pt-osc. This causes the data pipeline to stall after the schema change.

Solution

  1. Upgrade to VVR 11.2 or later.

  2. Set the following option:

       scan.parse.online.schema.changes.enabled: true
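This option belongs to the MySQL source, so in a YAML data ingestion job it goes in the `source` block. A minimal sketch, assuming a MySQL source with placeholder connection settings and the sink and pipeline blocks omitted:

```yaml
source:
  type: mysql
  # Placeholder connection settings; replace them with your own.
  hostname: <your-mysql-host>
  port: 3306
  username: <your-username>
  password: <your-password>
  tables: app_db.\.*
  # Parse schema changes made by lock-free (online) schema change tools
  # such as pt-osc.
  scan.parse.online.schema.changes.enabled: true
```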

FAQ 5: Transform column type mismatch after a lock-free schema change

Severity: High | Phase: Schema change | Affected versions: VVR 11.1 or earlier

Symptom

  • The job unexpectedly restarts following a lock-free table schema change (for example, using pt-osc).

  • The Transform operator logs indicate a column type mismatch error.

Cause

This is a known issue in VVR 11.1 and earlier versions. If a large volume of data is inserted into a table while a lock-free schema change is in progress, the engine may generate an event that cannot be parsed.

Solution

  1. Upgrade to VVR 11.2 or later.

  2. Perform a stateful restart from a savepoint that was created before the lock-free schema change.


State recovery

FAQ 6: Job fails to restore from a pre-schema-change savepoint

Severity: High | Phase: State recovery | Affected versions: VVR 11.1 or earlier

Symptom

  • A stateful restart from a savepoint created before a table schema change fails.

  • The error message indicates a table schema mismatch exception while consuming binary logs.

Cause

VVR 11.1 and earlier versions do not support stateful restarts from savepoints that contain an incompatible table schema.

Solution

  1. Upgrade to VVR 11.2 or later.

  2. After the upgrade, restart the job from a pre-schema-change savepoint.