This topic provides answers to some frequently asked questions about data synchronization nodes of Data Integration.
Network connectivity and operations on resource groups
- What do I need to know about DataWorks and its network capabilities before I configure a data synchronization node?
- How do I ensure network connectivity between a resource group in DataWorks and a self-managed data source that is hosted on an Elastic Compute Service (ECS) instance when I synchronize data from the data source?
- How do I ensure network connectivity between a resource group in DataWorks and a data source that is deployed in a different region from the resource group when I synchronize data from the data source?
- When I synchronize data from a data source, the account that I use to access the data source is different from the account that I use to access DataWorks. How do I ensure network connectivity between DataWorks and the data source?
- What do I do if the network connectivity test for a data source in a VPC fails?
- The data source connectivity test is sometimes successful and sometimes fails. What do I do?
- I cannot find the exclusive resource group for Data Integration that I purchased when I test network connectivity for a data source or run a data synchronization node. What do I do?
- How do I determine, from a log, the type of resource group on which a data synchronization node is run?
- How do I change the type of the resource group on which a data synchronization node is run?
- How do I troubleshoot the issue that a custom resource group for scheduling waits for gateway resources?
- How do I view the elastic IP address (EIP) of a resource group and add the EIP to the IP address whitelist of the data source from which I want to synchronize data?
- Why is a message indicating that a node cannot be run due to insufficient resources in a resource group displayed when the resource group still has resources?
Batch synchronization
O&M of batch synchronization nodes
- Why is the network connectivity test of a data source successful, but the batch synchronization node that uses the data source fails to be run?
- How do I change the resource group that is used to run a batch synchronization node of Data Integration?
- How do I locate and handle dirty data?
- How do I view dirty data?
- If the number of dirty data records generated during data synchronization exceeds the specified upper limit, is all data left unsynchronized?
- What do I do if a batch synchronization node runs for an extended period of time?
Errors caused by configurations for batch synchronization nodes
- How do I handle a dirty data error that is caused by encoding format configuration issues or garbled characters?
- What do I do if a server-side request forgery (SSRF) attack is detected in a batch synchronization node and the error message Task have SSRF attacts is returned?
- What do I do if a batch synchronization node occasionally fails to be run?
- What do I do if a field is added to or updated in the source table of a batch synchronization node?
- What do I do if a batch synchronization node fails to be run because the name of a column in the source table is a keyword?
Errors for specific plug-ins
- What do I do if an error occurs when I use the root user to add a MongoDB data source?
- How do I convert the values of the variables in the query parameter into values of the TIMESTAMP data type when I synchronize incremental data from a table of a MongoDB database?
- After data is synchronized from a MongoDB data source to a destination, the time zone of the data is 8 hours ahead of the original time zone of the data. What do I do?
- What do I do if a batch synchronization node fails to synchronize data changes in a MongoDB data source to a destination?
- Is the number of OSS objects from which OSS Reader can read data limited?
- What do I do if data fails to be written to DataHub because the amount of data that I want to write to DataHub at a time exceeds the upper limit?
- Is historical data replaced each time data is written to Lindorm in the bulk mode that Lindorm provides?
- How do I query all fields in an index of an Elasticsearch cluster?
Batch synchronization scenarios
- How do I specify table names in the configurations of a batch synchronization node?
- What do I do if the table that I want to select is not displayed when I configure a batch synchronization node by using the codeless UI?
- What are the items that I must take note of when I use the Add feature to configure a batch synchronization node that synchronizes data from a MaxCompute table?
- How do I enable a batch synchronization node to synchronize data from a partition key column in a MaxCompute table?
- How do I enable a batch synchronization node to synchronize data from multiple partitions in a MaxCompute table?
- What do I do if a batch synchronization node fails to be run because the name of a column in the source table is a keyword?
- Why is no data obtained when I run a batch synchronization node to synchronize data from a LogHub data source whose fields contain values?
- Why is some data missed when I run a batch synchronization node to read data from a LogHub data source?
- What do I do if the LogHub fields that are read based on the field mappings configured for a batch synchronization node are not the expected fields?
- I configured the endDateTime parameter to specify the end time for reading data from a Kafka data source, but some data that is read is generated at a point in time later than the specified end time. What do I do?
- What do I do if a batch synchronization node used to synchronize data from a Kafka data source does not read data or runs for a long period of time even if only a small amount of data is stored in the Kafka data source?
- How do I remove the random strings that appear in the data I write to OSS?
- How do I configure a batch synchronization node to synchronize data from tables in sharded MySQL databases to the same MaxCompute table?
- What do I do if a full scan for a MaxCompute table slows down data synchronization because no index is added in the WHERE clause?
- What do I do if the Chinese characters that are synchronized to a MySQL table contain garbled characters because the encoding format of the related MySQL data source is utf8mb4?
- Can I use a function supported by a source to aggregate fields when I synchronize data by using an API operation? For example, can I use a function supported by a MaxCompute data source to aggregate Fields a and b in a MaxCompute table as a primary key for synchronizing data to Lindorm?
- Can I use only the ALTER TABLE statement to modify the time to live (TTL) of a table from which data needs to be synchronized?
- How do I configure Elasticsearch Reader to synchronize the properties of object fields or nested fields, such as object.field1?
- What do I do if data of a string type in a MaxCompute data source is enclosed in double quotation marks (") after the data is synchronized to an Elasticsearch data source? How do I configure the JSON strings read from a MaxCompute data source to be written to nested fields in an Elasticsearch data source?
- How do I configure a batch synchronization node to synchronize data such as string "[1,2,3,4,5]" from a data source to an Elasticsearch data source as an array?
- The property type of a field in a self-managed Elasticsearch index is keyword, but the type of the child property of the field is changed to keyword after the related batch synchronization node is run with the cleanup=true setting configured. Why does this happen?
- Each time data is written to Elasticsearch, an unauthorized request is sent, and the request fails because the username verification fails. As a result, a large number of audit logs are generated every day because all the requests are logged. What do I do?
- Why do the settings that are configured for Elasticsearch Writer not take effect during the creation of an index?
- How do I configure a batch synchronization node to synchronize data to fields of a date data type in an Elasticsearch data source?
- What do I do if a write error occurs when the type of a field is set to version in the configuration of Elasticsearch Writer?
Error messages
Real-time synchronization
Precautions for configuring real-time synchronization nodes
- What types of data sources support real-time synchronization?
- Why does my real-time synchronization node have high latency?
- Solutions to latency on a real-time synchronization node
- Why is the Internet not recommended for real-time synchronization?
- What operation does DataWorks perform on the data records that are synchronized in real time?
- How do I deal with the TRUNCATE statement during real-time data synchronization?
- How do I improve the speed and performance of real-time synchronization?
- Can I directly run a real-time synchronization node on the codeless UI?
- Why does my real-time synchronization node that is used to synchronize data from MySQL slow down?
- Why do differences exist between the amount of resources that are consumed when I synchronize data from a single database and the amount of resources that are consumed when I synchronize data from multiple databases?
- Which types of DDL processing policies do real-time synchronization nodes support?
Errors for real-time synchronization of data from MySQL
Errors for real-time synchronization of data from Oracle, PolarDB, and MySQL
Error messages
Solution-based synchronization
- Why is decimal(7,4) converted to numeric(38,18) when I run a data synchronization solution to synchronize data from MySQL to Hologres?
- Can I run a one-click real-time synchronization solution to synchronize data from tables in sharded databases to the same MaxCompute table?
- How do I prevent an error from being reported after fields in a source table specified in a one-click real-time synchronization solution are changed?