What types of data sources support real-time synchronization?

For more information about the types of data sources that support real-time synchronization, see Data sources that support real-time synchronization.

Why is the Internet not recommended for real-time synchronization?

Real-time synchronization over the Internet has the following disadvantages:
  • The network connection may be unstable, which can cause packet loss and degrade synchronization performance.
  • Data transmitted over the Internet is less secure.

What operation does DataWorks perform on the data records that are synchronized in real time?

When Data Integration synchronizes data from a data source such as MySQL, Oracle, LogHub, or PolarDB to DataHub or Kafka in real time, Data Integration adds five fields to the data records synchronized to the destination. These fields are used for operations such as metadata management, sorting, and deduplication. For more information, see Fields used for real-time synchronization.

Why does my real-time synchronization node have high latency?

The high latency may be caused by the following reasons:
  • The amount of incremental data in the source is small or excessively large.
  • The network connection is poor. We recommend that you do not use the Internet for real-time synchronization.
  • The offset from which data starts to be synchronized is earlier than the current time. As a result, it takes a period of time to read the historical data before data can be read in real time.
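As a rough illustration of the last reason, the time a node needs to catch up can be estimated from the size of the backlog and the difference between the read rate and the rate at which the source keeps producing data. The function name and all three inputs below are hypothetical values that you would measure yourself. They are not DataWorks metrics or APIs.

```python
def catch_up_seconds(backlog_records: int, read_rate: float, write_rate: float) -> float:
    """Estimate how long a node needs to reach the current time.

    backlog_records: records written between the start offset and now (assumed known).
    read_rate:       records per second the node can read (assumed measured).
    write_rate:      records per second the source keeps producing (assumed measured).
    """
    if backlog_records <= 0:
        return 0.0
    net_rate = read_rate - write_rate
    if net_rate <= 0:
        # The node reads no faster than the source writes, so it never catches up.
        return float("inf")
    return backlog_records / net_rate


# A backlog of 3,600,000 records, read at 5,000 records/s while the source
# still writes 1,000 records/s, drains at a net 4,000 records/s.
print(catch_up_seconds(3_600_000, 5000, 1000))  # 900.0
```

If the estimate is infinite, the latency keeps growing until the read rate exceeds the write rate.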

When I run a node to synchronize data from Kafka in real time, the following error message appears: Startup mode for the consumer set to timestampOffset, but no begin timestamp was specified. What do I do?

Reset the offset for the node and specify the point in time from which you want to synchronize data.

When I run a node to synchronize data from MySQL in real time, the following error message appears: Cannot replicate because the master purged required binary logs. What do I do?

Data Integration cannot find the binary logs generated for the offset from which you want to synchronize data. You must check the retention duration of the binary logs of your MySQL data source and specify an offset within the retention duration when you start your synchronization node.
Note: If Data Integration cannot find the binary logs, you can reset the offset to the current time.
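The check described above can be sketched as follows. This is a minimal sketch that assumes binary logs older than the retention duration have been purged. The function names are hypothetical, and the retention duration must be taken from your own MySQL configuration.

```python
from datetime import datetime, timedelta, timezone


def offset_within_retention(offset: datetime, now: datetime, retention: timedelta) -> bool:
    """Return True if binary logs for the offset should still exist.

    Assumes logs older than (now - retention) have been purged.
    """
    oldest_available = now - retention
    return oldest_available <= offset <= now


def choose_start_offset(offset: datetime, now: datetime, retention: timedelta) -> datetime:
    """Keep the requested offset if it is usable. Otherwise fall back to the current time."""
    return offset if offset_within_retention(offset, now, retention) else now


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
retention = timedelta(days=7)  # assumed retention duration, for illustration only

print(offset_within_retention(datetime(2024, 1, 5, tzinfo=timezone.utc), now, retention))
print(choose_start_offset(datetime(2024, 1, 1, tzinfo=timezone.utc), now, retention))
```

Falling back to the current time skips the data written between the requested offset and now, so use it only if that gap is acceptable.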

When I run a node to synchronize data from MySQL, the following error message appears: MysqlBinlogReaderException. What do I do?

The binary logging feature is disabled for the secondary MySQL database. If you want to synchronize data from the secondary MySQL database, you must enable this feature for the secondary database. To enable the feature, consult the administrator of the database.

For more information, see Enable the binary logging feature for the MySQL database.

When I run a node to synchronize data from MySQL, the following error message appears: show master status' has an error! What do I do?

If the error details include Caused by: java.io.IOException: message=Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation, with command: show master status, the account that you use does not have the required permissions on the source database.

The account that you use to access the source must have the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions on the MySQL database. For more information about how to grant the required permissions on the database to an account, see Create an account and grant the required permissions to the account.
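To confirm which permissions are missing, you can compare the permissions granted to the account with the three required ones. The helper below is a hypothetical sketch. It assumes you have already collected the granted permission names for the account, for example from the output of SHOW GRANTS, and it does not connect to the database.

```python
REQUIRED_PRIVILEGES = frozenset({"SELECT", "REPLICATION SLAVE", "REPLICATION CLIENT"})


def missing_privileges(granted):
    """Return the required privileges that are absent from `granted`, sorted by name.

    `granted` is an iterable of privilege names. Case and extra whitespace are ignored.
    """
    normalized = {" ".join(name.upper().split()) for name in granted}
    return sorted(REQUIRED_PRIVILEGES - normalized)


# The account has SELECT and REPLICATION SLAVE but not REPLICATION CLIENT.
print(missing_privileges(["select", "Replication  Slave"]))  # ['REPLICATION CLIENT']
```

An empty result means that the account has all three required permissions.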

When I run a node to synchronize data from MySQL in real time, the following error message appears: parse.exception.PositionNotFoundException: can't find start position for xxx. What do I do?

Data Integration cannot find the binary logs generated for the offset from which you want to synchronize data. Reset the offset for the node.

When I run a node to synchronize data from MySQL in real time, data can be read at the beginning but cannot be read after a period of time. What do I do?

  1. Run the following command on the desired MySQL database to view the binary log files that record the data write operation in the database:
    show master status 
  2. Search for journalName=mysql-bin.xx,position=xx in the binary log files of the MySQL database to check whether the binary log files contain data records about the offset specified by the position parameter. For example, you can search for journalName=mysql-bin.000001,position=50.
  3. Contact the database administrator if data is being written to the MySQL database but no data write operations are recorded in binary logs.
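The search in step 2 can be sketched with a regular expression that extracts every journalName and position pair from a block of text. The function name and the sample text are hypothetical. Only the journalName=...,position=... pattern comes from the steps above.

```python
import re

# Matches entries such as: journalName=mysql-bin.000001,position=50
_OFFSET_PATTERN = re.compile(r"journalName=(?P<file>[^,\s]+),position=(?P<pos>\d+)")


def find_binlog_offsets(text: str):
    """Return (binary log file name, position) pairs found in `text`, in order of appearance."""
    return [(m.group("file"), int(m.group("pos"))) for m in _OFFSET_PATTERN.finditer(text)]


sample = "reader offset: journalName=mysql-bin.000001,position=50 (hypothetical line)"
print(find_binlog_offsets(sample))  # [('mysql-bin.000001', 50)]
```

You can then compare the extracted file name and position with the values returned by show master status to see whether the position is still advancing.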

How do I deal with the TRUNCATE statement during real-time data synchronization?

Real-time synchronization supports the TRUNCATE statement. The TRUNCATE statement takes effect when full and incremental data is merged. If you do not execute the TRUNCATE statement, excessive data may be generated during data synchronization.

How do I improve the speed and performance of real-time synchronization?

If data is written to the destination slowly, increase the number of parallel threads and adjust the Java Virtual Machine (JVM) parameters. The JVM parameters affect only the frequency of full garbage collection (full GC): a larger JVM heap reduces the frequency of full GC, which improves the performance of real-time synchronization.

When I run a node to synchronize data from Hologres in real time, the following error message appears: permission denied for database xxx. What do I do?

Before you run a node to synchronize data from Hologres in real time, you must obtain the permissions of the <db>_admin user group in the Hologres console for your account. For more information, see Overview.

Can I directly run a real-time synchronization node on the codeless user interface (UI)?

You cannot directly run a real-time synchronization node on the codeless UI. You must commit and deploy the real-time synchronization node and run the node in the production environment. For more information, see Create, configure, commit, and manage real-time sync nodes.