This topic answers frequently asked questions about Data Ingestion jobs powered by Flink CDC.
Quick reference
| Symptom | Phase | Severity | Link |
|---|---|---|---|
| JobManager OOM with high SnapshotSplits metric values | Snapshot | Critical | FAQ 1 |
| TaskManager OOM when few shards remain | Snapshot | Critical | FAQ 3 |
| JobManager OOM on state restoration during incremental reading | Incremental | Critical | FAQ 2 |
| No new data after a lock-free schema change with pt-osc | Schema change | High | FAQ 4 |
| Transform column type mismatch after a lock-free schema change | Schema change | High | FAQ 5 |
| Job fails to restore from a pre-schema-change savepoint | State recovery | High | FAQ 6 |
Snapshot phase
FAQ 1: JobManager OOM during the snapshot phase
Severity: Critical | Phase: Snapshot | Affected versions: All VVR engine versions
Symptom
The job restarts repeatedly during the snapshot phase.
The JobManager logs contain an OutOfMemoryError (OOM) stack trace.
On the Alarm tab, the Num of remaining SnapshotSplits and Num of processed SnapshotSplits metrics show exceptionally high values.

Cause
During the snapshot phase, the MySQL source persists all table shard metadata to the Flink job's state. If the job handles a large volume of data or uses very small shard sizes, the JobManager creates an excessive number of shards. This consumes too much memory and causes the JobManager to run out of memory.
Solution
Increase the memory resources allocated to the JobManager.
Adjust the following parameters to increase the JobManager's heap and off-heap memory:
jobmanager.memory.heap.size
jobmanager.memory.off-heap.size
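As a minimal sketch, the two JobManager memory parameters could be set in the job's Flink configuration as shown here. The sizes are illustrative assumptions, not recommendations; choose values based on the number of shards the job creates and the memory available to the JobManager.

```yaml
# Illustrative sizes only (assumptions); tune them to your workload.
jobmanager.memory.heap.size: 4g         # JVM heap of the JobManager
jobmanager.memory.off-heap.size: 512m   # off-heap memory of the JobManager
```

If the OOM persists after the increase, raise the values further and watch whether the SnapshotSplits metrics keep growing.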
FAQ 3: TaskManager OOM near the end of the snapshot phase
Severity: Critical | Phase: Snapshot | Affected versions: All VVR engine versions
Symptom
The TaskManager runs out of memory late in the snapshot phase, typically when only a small number of shards remain.
Searching the TaskManager logs for "using select statement" reveals that the last unbounded query involves a very large volume of data.
Cause
Prolonged data reading during the snapshot phase causes a significant amount of incremental data to accumulate for the final shard or shards. When the TaskManager processes this large accumulated shard, it runs out of memory.
Solution
Set the following option:
scan.incremental.snapshot.unbounded-chunk-first.enabled: true
Re-run the snapshot.
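The following sketch shows where this option might sit, assuming the job is defined as a Flink CDC YAML pipeline with a MySQL source. The surrounding block structure is an assumption for illustration; only the option itself comes from this FAQ.

```yaml
source:
  type: mysql
  # Connection options (hostname, port, username, password, tables)
  # are omitted here; keep the ones your job already uses.

  # Handle the unbounded shard early in the snapshot instead of last,
  # so that less change data accumulates for it.
  scan.incremental.snapshot.unbounded-chunk-first.enabled: true
```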
Incremental phase
FAQ 2: JobManager OOM during state restoration in the incremental phase
Severity: Critical | Phase: Incremental | Affected versions: VVR 11.1 or earlier
Symptom
The job enters the incremental phase but fails during state restoration.
The JobManager logs show an OOM.
Cause
VVR 11.1 and earlier versions may not properly clean up persisted table schema information from the job's state after transitioning from the snapshot phase to the incremental phase. This leftover schema information accumulates, causing an OOM when the job restores its state from a checkpoint.
Solution
Upgrade to VVR 11.2 or later.
Schema change
FAQ 4: No new data after a lock-free schema change with pt-osc
Severity: High | Phase: Schema change | Affected versions: VVR 11.1 or earlier
Symptom
The job continues running without restarting after a lock-free table schema change.
The CurrentFetchTimeLag metric progresses as expected, indicating that data is being fetched.
The MySQL source stops producing new data and the CurrentEmitTimeLag metric stops updating.
Cause
VVR 11.1 and earlier versions cannot correctly handle DDL change events generated by lock-free schema change tools such as pt-osc. This causes the data pipeline to stall after the schema change.
Solution
Upgrade to VVR 11.2 or later.
Set the following option:
scan.parse.online.schema.changes.enabled: true
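A hedged sketch of the option in context, again assuming a Flink CDC YAML pipeline with a MySQL source. The block layout is an assumption, and the option takes effect only on an engine version that supports it (VVR 11.2 or later, per this FAQ).

```yaml
source:
  type: mysql
  # Connection options (hostname, port, username, password, tables)
  # are omitted here; keep the ones your job already uses.

  # Parse the DDL events produced by lock-free schema change tools
  # such as pt-osc.
  scan.parse.online.schema.changes.enabled: true
```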
FAQ 5: Transform column type mismatch after a lock-free schema change
Severity: High | Phase: Schema change | Affected versions: VVR 11.1 or earlier
Symptom
The job unexpectedly restarts following a lock-free table schema change (for example, using pt-osc).
The Transform operator logs indicate a column type mismatch error.
Cause
This is a known issue in VVR 11.1 and earlier versions. If a significant volume of data is inserted into a table during a lock-free schema change operation, the engine may generate an event that it cannot parse.
Solution
Upgrade to VVR 11.2 or later.
Perform a stateful restart from a savepoint that was created before the lock-free schema change.
State recovery
FAQ 6: Job fails to restore from a pre-schema-change savepoint
Severity: High | Phase: State recovery | Affected versions: VVR 11.1 or earlier
Symptom
A stateful restart from a savepoint created before a table schema change fails.
The error message indicates a table schema mismatch exception while consuming binary logs.
Cause
VVR 11.1 and earlier versions do not support stateful restarts from savepoints that contain an incompatible table schema.
Solution
Upgrade to VVR 11.2 or later.
After the upgrade, restart the job from a pre-schema-change savepoint.