Real-time synchronization FAQ

This topic answers common questions about real-time synchronization.

Overview

Category	Related topics
Configuration notes for real-time synchronization tasks	What data sources does real-time synchronization support? Why is the latency of my real-time synchronization task high? Solutions for high latency in real-time synchronization tasks Why is using the public internet not recommended for real-time synchronization tasks? Column format in real-time synchronization How does real-time synchronization handle TRUNCATE operations? How can I improve the speed and performance of real-time synchronization? Can I run real-time synchronization tasks in the console? Why does real-time synchronization slow down when using a MySQL data source? Why does memory usage differ between selecting a single database and multiple databases in real-time synchronization? What DDL policies does real-time synchronization support? Considerations for source DDL and DML operations when writing to a destination data source If a source table has default values, are they and other attributes like NOT NULL retained in the destination table created by Data Integration? Why is the latency high after a failover of a real-time synchronization task that reads data from PostgreSQL? How do I run a full synchronization for an existing task?
FAQ for synchronizing MySQL data in real time	When synchronizing data from a MySQL data source in real time, the task reads data at first but then stops. What do I do?
FAQ for synchronizing Oracle, PolarDB, and MySQL data in real time	A real-time synchronization task for Oracle, PolarDB, or MySQL fails repeatedly
Error messages and solutions	Error messages and solutions Apache Kafka real-time synchronization error: `Startup mode for the consumer set to timestampOffset, but no begin timestamp was specified.` MySQL real-time synchronization errors: `Cannot replicate because the master purged required binary logs.` `MySQLBinlogReaderException` `show master status' has an error!` `parse.exception.PositionNotFoundException: can't find start position forxxx` Hologres real-time synchronization error: `permission denied for database xxx` Error when synchronizing data to MaxCompute in real time: `ODPS-0410051:invalid credentials-accessKeyid not found` Error when synchronizing data to Oracle in real time: `logminer doesn't init, send HeartbeatRecord`

Configuration notes for real-time synchronization

Supported data sources

For a list of supported data sources for real-time synchronization, see Supported data sources for real-time synchronization.

High latency in real-time synchronization

If you query a destination table and find that new data is missing, the real-time synchronization task may have high latency. You can go to the real-time synchronization tasks page in Operation Center to check if the latency of the task is too high. For more information, see Solutions for high latency in real-time synchronization tasks.

High latency can be caused by the following reasons:

Symptom	Direct cause	Solution
High read-side latency	Excessive data changes at the source. A sudden increase in latency indicates a surge in data volume at the source at a specific time.	If frequent updates and large data volumes at the source cause high synchronization latency, take the following actions: Modify the task configuration: You can adjust the real-time synchronization concurrency based on the number of databases or tables to be synchronized, within the maximum number of connections allowed by the source database. Note The maximum concurrency is limited by the resource group. The specific limits on concurrency and task numbers vary by resource group specification. For more information, see Overview of resource groups for DataWorks. When you set the concurrency, you can evaluate the setting based on the maximum number of connections allowed by the database if the source is an RDS instance. If the source is LogHub, you can set the concurrency based on the number of shards. Upgrade the resource group specification: If the data volume at the source increases, or if you modify a task to synchronize multiple databases and tables instead of a single one, the current resource group may become insufficient. In this case, you can upgrade the resource group specification. For more information, see Change resource group specifications.
High read-side latency	The starting offset is set too early.	If the starting offset is too early, the real-time synchronization task needs some time to catch up with the data.
High write-side latency	Performance or load issues with the destination database.	If the database load is high, simply adjusting the synchronization task concurrency does not solve the problem. You need to contact your database administrator for help.
High read-side and write-side latency	Synchronization over the public internet causes network issues that lead to task latency.	Synchronization over the public internet cannot guarantee real-time timeliness. Establish a network connection and synchronize data over the internal network. Note Synchronization over the public internet has the following risks: network instability and frequent packet loss affect synchronization performance and security is low.
High read-side and write-side latency	A large performance difference between the source and destination databases or high database load can also cause high latency.	If the database load is high, simply adjusting the synchronization task concurrency does not solve the problem. You need to contact your database administrator for help.

Using the public network

Using the public internet for real-time synchronization tasks poses the following risks:

The network may be unstable, and frequent packet loss can affect synchronization performance.
Security is low.

Column format

When Data Integration synchronizes data from MySQL, Oracle, LogHub, and PolarDB to DataHub or Apache Kafka in real time, it adds five extra columns to the destination to support metadata management, record sorting, and deduplication. For more information, see Column format in real-time synchronization.

Handling TRUNCATE operations

Real-time synchronization supports TRUNCATE operations, which take effect during incremental and full merges. If you choose to ignore TRUNCATE, extraneous data may appear in the destination table.

Performance improvement

If the write speed is slow, you can increase the write-side concurrency and tune the JVM parameters. JVM parameter settings depend on the frequency of data changes, not the number of synchronized databases. If your resource group permits, allocating more memory can reduce the frequency of full GC and improve the performance of real-time synchronization. Set Concurrency to 3. In the Basic configuration panel of the task, set JVM Parameters to -Xms8192m -Xmx8192m.

Running tasks in the console

No, you cannot run real-time synchronization tasks directly from the DataWorks console. After you configure a real-time synchronization task, you must submit and deploy the real-time synchronization node, and then run the node in the production environment. For more information, see Operations and tuning for single-table real-time synchronization.

Slowdown with MySQL sources

One reason is an increase in the number of binlogs to be synchronized. Because binlogs are generated at the instance level, even if a synchronization task is configured to synchronize only Table A, changes to other tables in the instance, such as Table B and Table C, also generate binlogs. These additional binlogs slow down the synchronization process.

Memory usage for single vs. multiple databases

When you select two or more databases, the task enters full-instance real-time synchronization mode. This mode consumes more resources than two separate real-time tasks that each synchronize a single database.

Supported DDL policies

Handling methods:

Processing	Ignore	Alert	Error
The DDL message is forwarded to the destination data source for processing. The processing policy may vary depending on the destination data source.	The DDL message is discarded, and the destination data source takes no action.	The DDL message is discarded, and an alert is sent. Note If no alert rule is configured for the real-time task, you will not receive the corresponding alert.	The task terminates with an error. Note If an alert rule corresponding to the task status is configured for the real-time task, you can receive the corresponding alert.

DDL types:

Create table	Real-time tasks that write to Apache Hudi support Processing. Note If a new table's name matches the task's filter rules, Data Integration creates a corresponding table at the destination. For sharded database and table real-time tasks, when a sub-table that matches the rules is created, its data is synchronized to the destination table. A new destination table is not created. Other task types do not support Processing. You can only choose Ignore, Alert, or Error.
Drop table	Sharded database and table real-time tasks support Processing. When a sub-table that matches the rules is dropped, its data is no longer synchronized, but the destination table is not dropped. Other task types do not support Processing. You can only choose Ignore, Alert, or Error.
Add column	Real-time tasks that write to MaxCompute, Hologres, MySQL, Oracle, and AnalyticDB for MySQL support Processing. Note When you add columns at the source, append new columns to the end of the field list. Do not insert them in the middle, as this can cause data synchronization anomalies. Other task types do not support Processing. You can only choose Ignore, Alert, or Error.
Drop column	Processing is not supported. You can only choose Ignore, Alert, or Error.
Rename table	Processing is not supported. You can only choose Ignore, Alert, or Error.
Rename column	Processing is not supported. You can only choose Ignore, Alert, or Error.
Alter column type	Real-time tasks that write to Apache Hudi support Processing. For more information, see Schema Evolution. Other task types do not support Processing. You can only choose Ignore, Alert, or Error.
Truncate table	Real-time tasks that write to MaxCompute, Hologres, MySQL, Oracle, and AnalyticDB for MySQL support Processing and clear the data in the destination table. For sharded database and table real-time tasks, when a sub-table is truncated, the task deletes that sub-table's data from the destination table. Other task types do not support Processing. You can only choose Ignore, Alert, or Error.

Source DDL and DML considerations

After a new column is added at the source and processed at the destination, the following limitations apply:
- When a column with a DEFAULT VALUE is added, the new column in the destination table remains NULL. Subsequently, when new data is added to this column at the source, the task synchronizes it to the destination.
- When a VIRTUAL column is added, the new column in the destination table remains NULL. Subsequently, when new data is added to this column at the source, the task synchronizes it to the destination.
For real-time synchronization from MySQL and PolarDB for MySQL sources, append new columns to the end of the table instead of inserting them in the middle. If inserting a column in the middle is unavoidable at the source, the following constraints apply:
- In a full and incremental solution, do not add a column in the middle during the full synchronization phase. This can cause data anomalies during the real-time synchronization phase.
- During the real-time synchronization phase, the synchronization offset reset time must be set after the DDL event that adds the column in the middle. Otherwise, data anomalies will occur during subsequent synchronization.
If data anomalies occur, you can restore the data by re-initializing the affected table. You only need to re-initialize the single table where the column was added, not all tables in the task.

If the source table has default values, are the default values and NOT NULL constraints retained in the destination table created by Data Integration?

When creating the destination table, DataWorks only retains the column names, data types, and comments from the source table. Default values, constraints (including NOT NULL constraints, indexes, etc.) are not retained.

High latency after PostgreSQL failover

This is a behavior of the PostgreSQL database itself. If the latency is unacceptable, you can stop the task and restart it to perform a full and incremental data synchronization.

Running a full synchronization

In Data Integration, find your task in the synchronization task list. Then, in the Actions column, click More > Rerun.

MySQL real-time synchronization FAQ

Task stops reading from MySQL

Run the following command in the database to identify the current binlog file.
```
show master status 
```
Compare this with the binlog file read by the task. Search the logs for journalName=mysql-bin.000001,position=50 to confirm if data is being written to the database.
If data is being written but the binlog is not advancing, contact your DBA for assistance.

Oracle, PolarDB, and MySQL FAQ

Recurring task failures for Oracle, PolarDB, and MySQL

Symptom: The real-time synchronization task fails repeatedly.

When the source of a real-time synchronization is an Oracle, PolarDB, or MySQL data source, processing DDL messages from the source is not supported by default. When DDL changes other than creating a new table occur, the real-time task fails and exits. With breakpoint resume, the synchronization may still fail even if no DDL messages are generated at the source.

Note
To avoid data loss or corruption, do not use the Rename command to swap the names of two columns. For example, do not swap the names of Column A and Column B.
Cause: Real-time synchronization supports breakpoint resume. To ensure no data is lost, the task may backtrack to an earlier offset when it restarts. This can cause the task to read previous DDL messages again, triggering another failure.
Solution:
1. When a DDL change occurs at the source, manually perform the corresponding DDL change on the destination database.
2. Start the real-time synchronization task and change the DDL policy from Error to Ignore.
  
  Note
  Because breakpoint resume may still subscribe to this DDL event, temporarily change the policy to Ignore to prevent the task from failing again.
3. Stop the real-time synchronization task, change the DDL policy from Ignore back to Error, and then restart the task.

Error messages and solutions

Kafka error: `Startup mode for the consumer set to timestampOffset, but no begin timestamp was specified.`

Reset the starting offset. In the Start dialog box for the task, find the Reset offset option and select the Reset offset checkbox to enable this feature.

The Reset offset function in Data Integration lets you change the starting point for data synchronization. Use it to restart a task from a specific time or data position, for example, to recover from an error or resynchronize specific data. This ensures data consistency and integrity.

MySQL error: `Cannot replicate because the master purged required binary logs.`

If a MySQL real-time synchronization task fails with the message Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup., it may be because MySQL does not have the binlog record at the consumed offset. Check the binlog retention period in your MySQL settings and ensure that the offset is configured within this time range when the synchronization task starts.

Note

If you cannot subscribe to the binlog, try resetting the offset to the current time.

MySQL error: `MySQLBinlogReaderException`

If a MySQL real-time synchronization task fails with the message MySQLBinlogReaderException: The database you are currently syncing is the standby database, but the current value of log_slave_updates is OFF, you need to enable the binlog log update of the standby database first. , it may be because binlog is not enabled on the standby database. If you want to synchronize a standby database, you must enable binlog for cascading standby databases. You can ask your DBA for help.

For more information about how to enable binlog, see Step 3: Enable binlog for MySQL.

MySQL error: `show master status' has an error!`

If a MySQL real-time synchronization task fails with the message show master status' has an error! and the error details are Caused by: java.io.IOException: message=Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation, with command: show master status, it may be because the data source account lacks the required database permissions.

The account configured for the data source must have the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions for the database. For more information about how to grant permissions to the data source account, see Step 2: Create an account and grant permissions.

MySQL error: `parse.exception.PositionNotFoundException: can't find start position forxxx`

The synchronization process could not find the offset. Please reset the offset.

PolarDB error: `The mysql server does not enable the binlog write function. Please enable the mysql binlog write function first`

Possible cause: The loose_polar_log_bin parameter is not enabled for the source PolarDB data source.
Solution: You need to enable the loose_polar_log_bin parameter. For more information, see Configure a PolarDB data source.

Hologres error: `permission denied for database xxx`

When you synchronize data to Hologres in real time, you need to grant the current user admin permissions for the Hologres instance. The user must have permission to create schemas. For more information, see Hologres permission model.

MaxCompute error: `ODPS-0410051:invalid credentials-accessKeyid not found`

When you synchronize data to a MaxCompute data source in real time by using a temporary AccessKey, the temporary AccessKey automatically expires after 7 days, which causes the task to fail. The platform automatically restarts the task when it detects a failure caused by an expired temporary AccessKey. If you have configured monitoring alerts for this type of failure, you will receive an alert.

Oracle error: `logminer doesn't init, send HeartbeatRecord`

When an Oracle real-time synchronization task initializes, it loads a previous archive log to find a suitable synchronization offset. If the archive log is large, it may take 3 to 5 minutes to complete the initialization phase.