This topic describes how to view the run logs that are generated for a batch synchronization node.

Go to the log details page

You can view the run logs that are generated for a batch synchronization node in Operation Center or DataStudio.
  • Operation Center: Go to the Cycle Instance, Test Instance, or Patch Data page in Operation Center, specify filter conditions to search for the instance that is generated for the batch synchronization node, and then go to the log details page of the instance. For more information, see View auto triggered node instances, View and manage data backfill instances, or Execute the test and view the test instance.
  • DataStudio: In the Operating history pane of the DataStudio page, you can view the run logs that are generated for the batch synchronization node within the last three days. For more information, see View operating history.

View run logs generated for a batch synchronization node

The following figure shows the run logs generated for a batch synchronization node in different stages. You can click the link provided in the area marked with 5 in the following figure to view the details of the batch synchronization node in each stage.
The stages, their log keywords, and their descriptions are as follows:
Commit the node (area marked with 1) SUBMIT: The batch synchronization node is rendered, and the scheduling system issues the node to the resource group for Data Integration for running. You can view the resource group for Data Integration in the area marked with 1. The information that is printed in the run logs varies based on the type of the resource group that you use.
  • If the batch synchronization node is run on the shared resource group for Data Integration, the run logs contain the following information:

    running in Pipeline[basecommon_ group_xxxxxxxxx]

  • If the batch synchronization node is run on an exclusive resource group for Data Integration, the run logs contain the following information:

    running in Pipeline[basecommon_S_res_group_xxx]
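
As a rough illustration, the two sample log lines above can be told apart programmatically. The following Python sketch assumes that the placeholder parts (xxxxxxxxx and xxx) stand for arbitrary identifiers and that real pipeline names always start with the prefixes shown in the samples. The function name resource_group_type is hypothetical and is not part of DataWorks.

```python
import re

# Matches the "running in Pipeline[...]" fragment shown in the sample log lines.
_PIPELINE = re.compile(r"running in Pipeline\[(?P<name>[^\]]+)\]")


def resource_group_type(log_line: str) -> str:
    """Guess the resource group type from a 'running in Pipeline[...]' line.

    The prefixes are copied from the sample lines; whether every real
    pipeline name follows them is an assumption of this sketch.
    """
    match = _PIPELINE.search(log_line)
    if match is None:
        return "unknown"
    name = match.group("name")
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    # Tolerate optional whitespace after the underscore, as in the sample line.
    if re.match(r"basecommon_\s*group_", name):
        return "shared"
    return "unknown"


if __name__ == "__main__":
    print(resource_group_type("running in Pipeline[basecommon_S_res_group_xxx]"))
    print(resource_group_type("running in Pipeline[basecommon_ group_xxxxxxxxx]"))
```

Checking the exclusive prefix first keeps the two cases unambiguous even if the naming scheme for shared resource groups turns out to be looser than the sample suggests.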

Wait for resources (area marked with 2) WAIT: The batch synchronization node is waiting for resources in the resource group for Data Integration. If the batch synchronization node waits for resources in the resource group for Data Integration for a long period of time, other nodes may be running on the resource group and the idle resources in the resource group are insufficient for the current node. In this case, you can use one of the following solutions to resolve the issue:
  • Start the batch synchronization node after the nodes that are running on the resource group for Data Integration finish running. After the nodes finish running, the resources in the resource group for Data Integration are released. For more information about how to find the nodes that occupy resources, see Scenarios of slow data synchronization.
  • Find the nodes that compete for resources with the batch synchronization node, contact the owners of the nodes, and then ask the owners to reduce the parallel threads for the nodes.
  • Reduce the parallel threads that you specified for the batch synchronization node. Then, commit and deploy the node again.
  • Scale out the resource group for Data Integration. For more information, see Scale out or in a resource group.
Run the node (area marked with 3) RUN: The batch synchronization node is running. A batch synchronization node runs in the following stages:
  1. Execute the specified SQL statement before data synchronization
    If you specify an SQL statement to execute before data synchronization for the batch synchronization node, the system issues the SQL statement to the related database and executes it there. If you do not specify such an SQL statement, this stage is skipped.
    • For example, if you specify the SQL statement that you want to execute before data synchronization for a batch synchronization node that uses MySQL Writer, the statement is issued to and executed on the related database in this stage.
    • If you specify a condition that is used for refined data filtering or a WHERE clause for a batch synchronization node that uses MySQL Reader, the system issues the related SQL statement to the related database and executes the SQL statement on the database in this stage.
    • If you set the write mode to deleting existing data before data write for a batch synchronization node that uses MaxCompute Writer, the system executes the related SQL statement to delete existing data from the destination table before new data is written to the table.
    Note: We recommend that you use an indexed field for data filtering. Otherwise, the specified SQL statement may take a long time to execute on the related database. As a result, the batch synchronization node may require a long period of time to run, or may fail because the execution of the SQL statement times out.
  2. Shard data in the source
    In this stage, data in the source is split into multiple shards. This way, the batch synchronization node can run parallel threads to read the data in batches. Data in the source is sharded based on the following rules:
    • If data is read from a relational database, the data is sharded based on the shard key that you specified and the batch synchronization node runs parallel threads to read the data in batches. If no shard key is specified, the batch synchronization node runs a single thread to read the data.
    • If data is read from a LogHub, DataHub, or MongoDB data source, the data is sharded based on the number of shards in the related data source. The maximum number of parallel threads that the batch synchronization node uses to read the data cannot exceed the number of shards.
    • If data is read from a semi-structured data source, the data is sharded based on the number of files or the data volume. For example, if data is read from an Object Storage Service (OSS) data source, the data is sharded based on the number of objects in the related OSS bucket. The maximum number of parallel threads that the batch synchronization node uses to read the data cannot exceed the number of the objects.
  3. Synchronize data
    In this stage, the batch synchronization node runs the specified number of parallel threads to read the sharded source data in batches. If data is read from a relational database, the system generates multiple SQL statements based on the shard key that you specified. The SQL statements are used to read the data from the database in batches.
    Note
    • During data synchronization, the number of parallel threads that are actually run by the batch synchronization node may not be the same as the number of parallel threads that you specified.
    • If an inappropriate shard key is specified, the following situations may occur:
      • The batch synchronization node requires a long period of time to run because the SQL statements that are generated based on the shard key to read data from the source are executed on the related database for an extended period of time.
      • The batch synchronization node fails because the execution of the SQL statements on the related database times out.
    • If the loads on the related database are high, the batch synchronization node may require an extended period of time to run.
  4. Execute the specified SQL statement after data synchronization
    If you specify an SQL statement to execute after data synchronization for the batch synchronization node, the system issues the SQL statement to the related database and executes it there after data synchronization. If you do not specify such an SQL statement, this stage is skipped.
    • For example, if you specify the SQL statement that you want to execute after data synchronization for a batch synchronization node that uses MySQL Writer, the statement is issued to and executed on the related database in this stage.
    • The time that is required by the batch synchronization node to run is also affected by the execution time of the SQL statement that you want to execute after data synchronization.
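
The sharding rules in stage 2 can be summarized as an upper bound on the number of reader threads. The following Python sketch is only an illustration of those rules, not the logic that Data Integration actually uses. The function name max_parallel_threads and the source_kind labels ("relational", "shard_based", "file_based") are hypothetical. As noted in stage 3, the number of threads that actually run may still differ from this bound.

```python
from typing import Optional


def max_parallel_threads(
    configured: int,
    source_kind: str,
    shard_count: Optional[int] = None,
    has_shard_key: bool = False,
) -> int:
    """Upper bound on reader threads implied by the sharding rules.

    source_kind is a hypothetical label:
      "relational"  - a relational database
      "shard_based" - LogHub, DataHub, or MongoDB
      "file_based"  - a semi-structured source such as OSS
    shard_count is the number of shards or objects in the source.
    """
    if configured < 1:
        raise ValueError("configured must be at least 1")
    if source_kind == "relational":
        # Without a shard key, a single thread reads the data.
        return configured if has_shard_key else 1
    if source_kind in ("shard_based", "file_based"):
        if shard_count is None or shard_count < 1:
            raise ValueError("shard_count is required for this source kind")
        # Threads cannot exceed the number of shards or objects.
        return min(configured, shard_count)
    raise ValueError(f"unknown source kind: {source_kind}")


if __name__ == "__main__":
    print(max_parallel_threads(8, "relational"))                  # no shard key
    print(max_parallel_threads(8, "shard_based", shard_count=3))  # capped by shards
    print(max_parallel_threads(4, "file_based", shard_count=10))  # below the cap
```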
Finish running (area marked with 4) After the batch synchronization node finishes running, one of the following keywords is printed in the run logs:
  • FAIL: The batch synchronization node fails. The key error message is printed in the run logs. You can click the link provided in the area marked with 5 to view the details of the batch synchronization node in each stage.
  • SUCCESS: The batch synchronization node is successfully run. The following information is printed in the run logs: the total number of synchronized data records, the total volume of synchronized data, and the average data synchronization speed.
Note
  • If dirty data is generated during data synchronization, Dirty data: xxR is printed in the run logs and the dirty data is not written to the destination.
  • If a large amount of dirty data is generated during data synchronization, the data synchronization speed is affected. If you require a high data synchronization speed, we recommend that you resolve the dirty data issue as soon as dirty data is generated. For more information about dirty data, see Configurations for a batch synchronization node.
  • You can specify the maximum number of dirty data records that are allowed during data synchronization to control the impacts of dirty data on your batch synchronization node. By default, batch synchronization nodes allow the generation of dirty data. You can modify the setting related to dirty data on the configuration tab of your batch synchronization node. For more information about how to configure a batch synchronization node by using the codeless user interface (UI), see Configure a batch synchronization node by using the codeless UI. For more information about how to configure a batch synchronization node by using the code editor, see Configure a batch synchronization node by using the code editor.
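
If you save a run log as plain text, the final keyword and the dirty data count described above can be extracted with a short script. The following Python sketch makes several assumptions: the keywords FAIL and SUCCESS appear as standalone uppercase words, the xx in Dirty data: xxR is an integer record count, and the last keyword in the log reflects the final state. The function name summarize_run_log and the sample log text are made up for illustration and do not reproduce a real log.

```python
import re
from typing import Dict, Optional

# "Dirty data: xxR" is taken from the description above; treating xx as an
# integer record count is an assumption of this sketch.
_DIRTY = re.compile(r"Dirty data:\s*(\d+)R")
_FINAL = re.compile(r"\b(SUCCESS|FAIL)\b")


def summarize_run_log(log_text: str) -> Dict[str, Optional[object]]:
    """Return the last final keyword and the dirty data count, if present."""
    status = None
    for match in _FINAL.finditer(log_text):
        status = match.group(1)  # keep the last keyword found
    dirty = _DIRTY.search(log_text)
    return {
        "status": status,
        "dirty_records": int(dirty.group(1)) if dirty else None,
    }


if __name__ == "__main__":
    # Hypothetical log text; real run logs contain far more detail.
    sample = "SUBMIT\nWAIT\nRUN\nDirty data: 12R\nSUCCESS\n"
    print(summarize_run_log(sample))
```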

Configuration of a shard key for a relational database

  • We recommend that you set the shard key to the name of the primary key column of the source table. This way, data can be evenly distributed across shards based on the primary key column, instead of being concentrated in a few shards.
  • A shard key can be used to shard only data of integer data types. If you use a field of an unsupported data type as the shard key, the batch synchronization node ignores the shard key that you specified and uses a single thread to read data.
  • If no shard key is specified, the batch synchronization node uses a single thread to read data.
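
To make the effect of a shard key more concrete, the following Python sketch splits an integer key range into contiguous ranges and builds one SQL statement per range. It falls back to a single statement when no shard key is specified or the key is not of an integer type. This is a simplified model under stated assumptions, not the algorithm that Data Integration uses: the function name plan_reads, the INTEGER_TYPES set, and the shape of the generated SQL statements are all hypothetical.

```python
from typing import List, Optional

# Assumed set of integer type names; the actual supported types may differ.
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}


def plan_reads(
    table: str,
    shard_key: Optional[str],
    key_type: str,
    min_key: int,
    max_key: int,
    threads: int,
) -> List[str]:
    """Build one SELECT per shard, or a single SELECT if sharding is not possible."""
    if shard_key is None or key_type.lower() not in INTEGER_TYPES or threads <= 1:
        # No usable shard key: a single thread reads all the data.
        return [f"SELECT * FROM {table}"]
    total = max_key - min_key + 1
    if total < 1:
        raise ValueError("max_key must not be less than min_key")
    parts = min(threads, total)
    base, extra = divmod(total, parts)
    queries = []
    start = min_key
    for i in range(parts):
        # Spread the remainder over the first `extra` ranges.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {shard_key} >= {start} AND {shard_key} <= {end}"
        )
        start = end + 1
    return queries


if __name__ == "__main__":
    for sql in plan_reads("orders", "id", "bigint", 1, 100, 4):
        print(sql)
    print(plan_reads("orders", "name", "varchar", 1, 100, 4))
```

Equal-width ranges balance the shards only when key values are spread evenly between the minimum and the maximum, which is one reason a primary key column is a reasonable choice for the shard key.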