
Data Transmission Service: Configure data verification

Last Updated: Jul 01, 2024

This topic describes how to configure a data verification task in Data Transmission Service (DTS). The data verification feature helps you check whether data is consistent between the source and destination databases. In the Advanced Settings step of configuring a data synchronization or migration instance, you can configure Data Verification Mode to detect data inconsistency at the earliest opportunity.

Usage notes

  • Incremental data verification is not supported for database instances that are deployed on the classic network.

  • Data verification is not supported for DTS instances for which the multi-table merging feature is enabled.

  • Data verification is not supported for DTS instances whose Status is Completed.

  • If a table to be verified contains neither a primary key nor a unique index and has more than 10,000 rows, DTS does not perform data verification on the table.

  • If the data to be verified has neither PRIMARY KEY nor UNIQUE constraints, DTS does not perform incremental data verification.

  • If you pause and restart a data synchronization or migration instance, full data verification configured for the instance is also restarted to verify data again.

  • Before you perform data verification, evaluate its impact on the performance of the source and destination databases. We recommend that you perform data verification during off-peak hours, for example, when the CPU loads of the source and destination databases are less than 30%. During data verification, DTS uses the read resources of the source and destination databases, which may increase the loads of the database servers. If you perform data verification when database performance is poor, database specifications are low, or business volume is large, the verification may put heavy pressure on the databases and even render the database services unavailable.

  • DTS performs incremental data verification only on the data that is incrementally synchronized or migrated to the destination database. Incremental data verification is not performed on the data that is manually changed in the destination database.

  • If you select Verify hash values based on the sampling ratio when you configure data verification for a MongoDB instance, the sampling percentage is fixed to 100%.

  • If you select Verify hash values based on the sampling ratio for Full Data Verification, we recommend that you do not use the extract, transform, and load (ETL) feature. Otherwise, the data modified by using the ETL feature is detected as inconsistent data.
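DTS does not publish its internal verification algorithm, but the idea behind Verify hash values based on the sampling ratio can be sketched as follows. This is a simplified illustration, not DTS's implementation; `row_hash` and `sample_and_compare` are hypothetical names:

```python
import hashlib
import random

def row_hash(row):
    # Hash all fields of a row; any differing field changes the digest.
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def sample_and_compare(source_rows, dest_rows, sampling_percent, seed=0):
    """Compare hashes of rows sampled by primary key at the given percentage.

    source_rows / dest_rows: dict mapping a primary key to a tuple of fields.
    Returns the primary keys of sampled rows that differ or are missing.
    """
    rng = random.Random(seed)
    sampled = [pk for pk in source_rows if rng.random() * 100 < sampling_percent]
    return [pk for pk in sampled
            if pk not in dest_rows
            or row_hash(source_rows[pk]) != row_hash(dest_rows[pk])]
```

With a sampling percentage of 100, every row is compared, which matches the fixed 100% sampling for MongoDB instances noted above.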

Supported regions

Data verification is supported in all regions and cross-region scenarios. For more information, see Supported regions.

Supported scenarios

Note
  • Data verification is supported in two-way synchronization scenarios between the following source and destination databases.

  • The source and destination database types listed below can be Alibaba Cloud database instances or self-managed database instances.

Full data verification

  • Source: MySQL. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, PostgreSQL, AnalyticDB for PostgreSQL, and Oracle.

  • Source: PolarDB for MySQL. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, AnalyticDB for PostgreSQL, and Oracle.

  • Source: PostgreSQL. Destinations: MySQL, PostgreSQL, AnalyticDB for PostgreSQL, PolarDB for PostgreSQL, Oracle, and PolarDB for Oracle.

  • Source: PolarDB for PostgreSQL. Destinations: MySQL, PostgreSQL, AnalyticDB for PostgreSQL, PolarDB for PostgreSQL, and Oracle.

  • Source: Standalone MongoDB databases. Destinations: Standalone MongoDB databases.

  • Source: MongoDB databases that use the replica set architecture. Destinations: MongoDB databases that use the replica set architecture.

  • Source: MongoDB databases that use the sharded cluster architecture. Destinations: MongoDB databases that use the sharded cluster architecture.

  • Source: Tair and Redis databases that use the master-replica architecture. Destinations: Tair and Redis databases that use the master-replica architecture.

  • Source: Tair and Redis databases that use the cluster architecture. Destinations: Tair and Redis databases that use the cluster architecture.

  • Source: SQL Server. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, SQL Server, and AnalyticDB for PostgreSQL.

  • Source: Oracle. Destinations: AnalyticDB for MySQL, AnalyticDB for PostgreSQL, MySQL, PolarDB for MySQL, PolarDB for PostgreSQL, PolarDB for Oracle, PostgreSQL, and Oracle.

  • Source: PolarDB for Oracle. Destinations: PolarDB for Oracle, AnalyticDB for MySQL, and Oracle.

  • Source: Db2 for LUW. Destinations: MySQL, PolarDB for MySQL, AnalyticDB for MySQL, and AnalyticDB for PostgreSQL.

  • Source: Db2 for i (AS/400). Destinations: MySQL.

Incremental data verification

  • Source: MySQL. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, PostgreSQL, AnalyticDB for PostgreSQL, and PolarDB for PostgreSQL.

  • Source: PolarDB for MySQL cluster. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, and AnalyticDB for PostgreSQL.

  • Source: PostgreSQL. Destinations: MySQL, PostgreSQL, AnalyticDB for PostgreSQL, and PolarDB for PostgreSQL.

  • Source: PolarDB for PostgreSQL. Destinations: PostgreSQL, AnalyticDB for PostgreSQL, and PolarDB for PostgreSQL.

  • Source: SQL Server. Destinations: MySQL, AnalyticDB for MySQL, PolarDB for MySQL, SQL Server, and AnalyticDB for PostgreSQL.

  • Source: Oracle. Destinations: AnalyticDB for MySQL, MySQL, PolarDB for MySQL, and PolarDB for Oracle.

Schema verification

  • Source: MySQL. Destinations: MySQL and PolarDB for MySQL.

  • Source: PolarDB for MySQL. Destinations: MySQL and PolarDB for MySQL.

  • Source: PostgreSQL. Destinations: PostgreSQL and PolarDB for PostgreSQL.

  • Source: PolarDB for PostgreSQL. Destinations: PostgreSQL and PolarDB for PostgreSQL.

Configure a data verification task from the Data Verification page

  1. Go to the Data Verification page.

    1. Log on to the Data Management (DMS) console.

    2. In the top navigation bar, move the pointer over DTS.

    3. Choose DTS (DTS) > Data Verification.

  2. Go to the configuration page of a data verification task in one of the following two ways:

    • If you have purchased a data verification instance, click Configure Task in the Actions column of the instance.

      For more information about how to purchase a data verification instance, see Purchase a data verification instance.

    • If you have not purchased a data verification instance, click Create Task.

      Important

      This feature is in canary release. Only specific users can purchase a data verification instance from the Data Verification page.

  3. Configure the source and destination databases. For more information, see Overview of data synchronization scenarios or Overview of data migration scenarios.

    Note

    You cannot change the value of the Instance Region parameter in this step if you purchased the instance before you configured the task.

  4. In the lower part of the page, click Test Connectivity and Proceed.

    • If the source or destination database is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, DTS automatically adds the CIDR blocks of DTS servers to the whitelist of the instance.

    • If the source or destination database is a self-managed database hosted on an Elastic Compute Service (ECS) instance, DTS automatically adds the CIDR blocks of DTS servers to the security group rules of the ECS instance, and you must make sure that the ECS instance can access the database. If the database is deployed on multiple ECS instances, you must manually add the CIDR blocks of DTS servers to the security group rules of each ECS instance.

    • If the source or destination database is a self-managed database that is deployed in a data center or provided by a third-party cloud service provider, you must manually add the CIDR blocks of DTS servers to the whitelist of the database to allow DTS to access the database.

    For more information, see Add the CIDR blocks of DTS servers.

    Warning

    If the CIDR blocks of DTS servers are automatically or manually added to the whitelist of the database or instance, or to the ECS security group rules, security risks may arise. Before you use DTS to synchronize data, you must understand and acknowledge the potential risks and take preventive measures, including but not limited to the following:

    • Enhance the security of your username and password.

    • Limit the ports that are exposed.

    • Authenticate API calls.

    • Regularly check the whitelist or ECS security group rules and forbid unauthorized CIDR blocks.

    • Connect the database to DTS by using Express Connect, VPN Gateway, or Smart Access Gateway.

  5. Configure Data Verification Mode.

    Select one or more data verification modes based on your business requirements. If you select Full Data Verification, you must also configure the parameters that are described in the following table.

    Note
    • Full Data Verification: verifies data in the full data synchronization or migration stage of the DTS task.

    • Schema Verification: verifies the schemas of objects in the data synchronization or migration task.

    • Full data verification and incremental data verification start after schema verification is complete.

    • You cannot select Incremental Data Verification if you configure a data verification task from the Data Verification page.

    Parameter

    Description

    Full Data Verification

    • Verify hash values based on the sampling ratio: verifies all fields of the data sampled based on the specified sampling percentage. You can specify an integer that ranges from 10 to 100 in the field as the sampling percentage.

    • Verify based on the number of table rows: verifies the number of table rows in the full data synchronization or migration stage of the DTS task. DTS does not verify the data content.

    Note

    If the full data verification mode is Verify based on the number of table rows, you are not charged. If the full data verification mode is Verify hash values based on the sampling ratio, you are charged based on the actual amount of data to be verified.

    Full Data Verification Time Rule

    You can set this parameter only to Start Now.

    Timeout Settings for Full Data Verification

    • No: does not specify a timeout period for full data verification. The system does not forcibly stop full data verification if the verification times out.

    • Yes: specifies a timeout period for full data verification. The system starts counting when full data verification starts. If full data verification is not complete within the specified timeout period, the system forcibly stops the verification. The value must be an integer that ranges from 1 to 72. Unit: hours.

    Select Verification Benchmark

    • Default: checks data consistency based on both the source and destination databases.

    • Source Database: checks data consistency based on the source database. Objects that exist in the destination database but do not exist in the source database are not checked.

    • Destination Database: checks data consistency based on the destination database. Objects that exist in the source database but do not exist in the destination database are not checked.

    Maximum Number of Rows Can be Read per Second

    Maximum Data Volume Can be Read per Second (Byte/s)

    The following description applies to both throttling parameters. During full data verification, DTS uses the read resources of the source and destination databases. You can enable throttling for full data verification based on your business requirements. To configure throttling, specify the maximum number of rows and the maximum number of bytes that can be read per second. This reduces the loads of the database servers.

    Note

    A value of 0 specifies that the speed is not limited for that parameter. If you set both the Maximum Number of Rows Can be Read per Second and Maximum Data Volume Can be Read per Second (Byte/s) parameters to 0, throttling is disabled.
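How the two throttling caps interact can be sketched as a reader that spreads each batch over enough time to satisfy the stricter cap. This is an illustration of the parameter semantics described above, not DTS internals; `batch_delay` is a hypothetical helper:

```python
def batch_delay(rows, nbytes, max_rows_per_sec, max_bytes_per_sec):
    """Seconds over which one batch must be spread so neither cap is exceeded.

    A cap of 0 disables that limit; if both caps are 0, throttling is off
    and the delay is 0, mirroring the parameter semantics described above.
    """
    delays = []
    if max_rows_per_sec > 0:
        delays.append(rows / max_rows_per_sec)
    if max_bytes_per_sec > 0:
        delays.append(nbytes / max_bytes_per_sec)
    # The stricter cap wins: the longer delay satisfies both limits.
    return max(delays, default=0.0)
```

For example, a batch of 1,000 rows totaling 2 MB under caps of 500 rows/s and 1 MB/s must be spread over 2 seconds, because either cap alone already requires that.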

  6. Specify Verification Objects.

    1. Select the objects to be verified in the source database.

      In the Source Objects section, select the objects that you want to verify and click the rightwards arrow icon to move the objects to the Selected Objects section.

      Note

      You can specify Table Name and Column Name for a destination table only if you select tables as the objects to be verified.

    2. Select the objects to be verified in the destination database.

      1. In the Selected Objects section, right-click a schema moved from the Source Objects section.

      2. In the Edit Schema dialog box, specify Schema Name for the destination database and click OK.

      3. In the Selected Objects section, right-click a table moved from the Source Objects section.

      4. In the Edit Table dialog box, specify Table Name of the destination table.

      5. Optional. In the Column Name column of the Column section, specify the column name of the destination table.

        Note

        You can also clear Synchronize All Tables and manually select the columns for data verification.

      6. Click OK.

    3. Optional. Configure verification alerts.

      If you select Full Data Verification for Data Verification Mode, you can configure Full Data Verification Alert based on your business requirements.

      Parameter

      Description

      Full Data Verification Alert

      • No: disables alerting.

      • Yes: enables alerting. You must also select and configure alert rules. The following information describes the alert rules:

        • An alert is triggered if full data verification fails.

        • An alert is triggered if the volume of inconsistent data detected by full data verification is larger than or equal to the specified threshold.

      Important

      If you enable alerting for data verification and want to receive a notification when an alert is triggered, you must subscribe to alert messages in CloudMonitor. For more information, see Configure alert rules for DTS tasks in the CloudMonitor console.

  7. Click Next: Advanced Settings to configure advanced settings.

    Parameter

    Description

    Dedicated Cluster for Task Scheduling

    By default, DTS schedules the task to a shared cluster. You do not need to configure this parameter. You can also purchase a dedicated cluster of the specified specifications to run DTS tasks. For more information, see What is a DTS dedicated cluster.

    Retry Time for Failed Connections

    The retry time range for failed connections. If the source or destination database fails to be connected after the data synchronization task is started, DTS immediately retries a connection within the time range. Valid values: 10 to 1440. Unit: minutes. Default value: 720. We recommend that you set this parameter to a value greater than 30. If DTS reconnects to the source and destination databases within the specified time range, DTS resumes the data synchronization task. Otherwise, the data synchronization task fails.

    Note
    • If you specify different retry time ranges for multiple data synchronization tasks that have the same source or destination database, the shortest retry time range takes precedence.

    • When DTS retries a connection, you are charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements. You can also release the DTS instance at your earliest opportunity after the source and destination instances are released.

    Retry Time for Other Issues

    The retry time range for other issues. For example, if the DDL or DML operations fail to be performed after the data synchronization task is started, DTS immediately retries the operations within the time range. Valid values: 1 to 1440. Unit: minutes. Default value: 10. We recommend that you set this parameter to a value greater than 10. If the failed operations are successfully performed within the specified time range, DTS resumes the data synchronization task. Otherwise, the data synchronization task fails.

    Important

    The value of the Retry Time for Other Issues parameter must be smaller than the value of the Retry Time for Failed Connections parameter.

    Environment Tag

    Optional. You can select an environment tag that is used to identify the data verification instance.
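The constraints on the two retry-time parameters above can be collected into a small validation sketch. This mirrors the documented rules, not DTS's own validation logic; `validate_retry_settings` is a hypothetical helper:

```python
def validate_retry_settings(conn_retry_minutes, other_retry_minutes):
    """Check the documented retry-time constraints; return an error message
    describing the first violated rule, or None if the settings are valid."""
    if not 10 <= conn_retry_minutes <= 1440:
        return "Retry Time for Failed Connections must be 10 to 1440 minutes"
    if not 1 <= other_retry_minutes <= 1440:
        return "Retry Time for Other Issues must be 1 to 1440 minutes"
    if other_retry_minutes >= conn_retry_minutes:
        return ("Retry Time for Other Issues must be smaller than "
                "Retry Time for Failed Connections")
    return None  # settings satisfy all documented rules
```

The defaults (720 and 10 minutes) pass this check.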

  8. Save the task settings and run a precheck.

    • To view the parameters to be specified when you call the relevant API operation to configure the DTS task, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.

    • If you do not need to view or have viewed the parameters, click Next: Save Task Settings and Precheck in the lower part of the page.

    Note
    • DTS performs a precheck before a data verification task starts. You can start the data verification task only after the task passes the precheck.

    • If the task fails to pass the precheck, click View Details next to each failed item. After you analyze the causes based on the check results, troubleshoot the issues. Then, run a precheck again.

    • If an alert is generated for an item during the precheck, perform the following operations based on the scenario:

      • If the alert item cannot be ignored, click View Details next to the failed item and troubleshoot the issues. Then, run a precheck again.

      • If the alert item can be ignored, click Confirm Alert Details. In the View Details dialog box, click Ignore. In the message that appears, click OK. Then, click Precheck Again to run a precheck again. If you ignore the alert item, data inconsistency may occur and your business may be exposed to potential risks.

  9. View the progress of the data verification task.

    • If you have purchased a data verification instance, click Back when Success Rate reaches 100%.

    • If you have not purchased a data verification instance, perform the following steps when Success Rate reaches 100%: Click Next: Purchase Instance, read and select Data Transmission Service (Pay-as-you-go) Service Terms, click Buy and Start, and then click OK.

Configure the data verification feature for a data synchronization or migration task in DTS

  1. Go to the Data Verification page.

    Note

    If you use a previous version of the configuration page, go to the Advanced Settings step.

    • For information about how to go to the Advanced Settings step when you create a DTS task, see the topics listed in Overview of data synchronization scenarios or Overview of data migration scenarios.

    • For an existing DTS task, perform the following steps:

      1. Go to the Data Synchronization Tasks or Data Migration Tasks page in the DTS console.

      2. Find the DTS task that you want to manage and click the task ID.

      3. On the Basic Information or Task Management page, click Create Verification Task.

  2. Configure Data Verification Mode.

    Select one or more data verification modes based on your business requirements. If you select Full Data Verification, you must also configure the parameters that are described in the following table.

    Note
    • Full Data Verification: verifies data in the full data synchronization or migration stage of the DTS task. If the task does not contain the incremental data synchronization or migration stage, data verification starts after the task is created and full data is written to the destination database. If the task contains the incremental data synchronization or migration stage, data verification starts after the task is created and incremental data is written to the destination database without latency for the first time.

    • Incremental Data Verification: verifies data that is incrementally synchronized or migrated. Data verification starts after the task is created and incremental data is written to the destination database without latency for the first time.

    • Schema Verification: verifies the schemas of objects in the data synchronization or migration task. If the task does not contain the incremental data synchronization or migration stage, data verification starts after the task is created and schemas and full data are written to the destination database. If the task contains the incremental data synchronization or migration stage, data verification starts after the task is created and incremental data is written to the destination database without latency for the first time.

    • Full data verification and incremental data verification start after schema verification is complete.

    Parameter

    Description

    Full Data Verification

    • Verify hash values based on the sampling ratio: verifies all fields of the data sampled based on the specified sampling percentage. You can specify an integer that ranges from 10 to 100 in the field as the sampling percentage.

    • Verify based on the number of table rows: verifies the number of table rows in the full data synchronization or migration stage of the DTS task. DTS does not verify the data content.

    Note

    If the full data verification mode is Verify based on the number of table rows, you are not charged. If the full data verification mode is Verify hash values based on the sampling ratio, you are charged based on the actual amount of data to be verified.

    Full Data Verification Time Rule

    You can set this parameter only to Start Now.

    Timeout Settings for Full Data Verification

    • No: does not specify a timeout period for full data verification. The system does not forcibly stop full data verification if the verification times out.

    • Yes: specifies a timeout period for full data verification. The system starts counting when full data verification starts. If full data verification is not complete within the specified timeout period, the system forcibly stops the verification. The value must be an integer that ranges from 1 to 72. Unit: hours.

    Select Verification Benchmark

    • Default: checks data consistency based on both the source and destination databases.

    • Source Database: checks data consistency based on the source database. Objects that exist in the destination database but do not exist in the source database are not checked.

    • Destination Database: checks data consistency based on the destination database. Objects that exist in the source database but do not exist in the destination database are not checked.

    Maximum Number of Rows Can be Read per Second

    Maximum Data Volume Can be Read per Second (Byte/s)

    The following description applies to both throttling parameters. During full data verification, DTS uses the read resources of the source and destination databases. You can enable throttling for full data verification based on your business requirements. To configure throttling, specify the maximum number of rows and the maximum number of bytes that can be read per second. This reduces the loads of the database servers.

    Note

    A value of 0 specifies that the speed is not limited for that parameter. If you set both the Maximum Number of Rows Can be Read per Second and Maximum Data Volume Can be Read per Second (Byte/s) parameters to 0, throttling is disabled.

  3. Specify Verification Objects.

    In the Selected Objects section, you can select the objects that do not require data verification and click the Remove icon to remove them.

    Note

    By default, the objects to be synchronized or migrated are added to the Selected Objects section.

  4. Configure alerting for data verification.

    Configure alerting for data verification based on your business requirements. The following table describes the parameters.

    Parameter

    Description

    Full Data Verification Alert

    • No: disables alerting.

    • Yes: enables alerting. You must also select and configure alert rules. The following information describes the alert rules:

      • An alert is triggered if full data verification fails.

      • An alert is triggered if the volume of inconsistent data detected by full data verification is larger than or equal to the specified threshold.

    Incremental Data Verification Alert

    • No: disables alerting.

    • Yes: enables alerting. You must also select and configure alert rules. The following information describes the alert rules:

      • An alert is triggered if incremental data verification fails.

      • An alert is triggered if the volume of inconsistent data detected by incremental data verification within the specified consecutive periods is larger than or equal to the specified threshold. You can specify the number of consecutive periods, the statistical period, and the threshold of the inconsistent data volume.

      • An alert is triggered if the latency of data migration or synchronization detected by incremental data verification within the specified consecutive periods is greater than or equal to the specified threshold. You can specify the number of consecutive periods, the statistical period, and the latency of data migration or synchronization.

    Important

    If you enable alerting for data verification and want to receive a notification when an alert is triggered, you must subscribe to alert messages in CloudMonitor. For more information, see Configure alert rules for DTS tasks in the CloudMonitor console.
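The consecutive-period alert rules above can be sketched as a simple evaluation function. This illustrates the rule semantics only; it is not the CloudMonitor implementation:

```python
def incremental_alert(inconsistent_counts, threshold, consecutive_periods):
    """Evaluate a consecutive-period alert rule.

    inconsistent_counts: inconsistent-row counts, one per statistical period.
    Returns True once the count stays at or above `threshold` for
    `consecutive_periods` periods in a row.
    """
    run = 0
    for count in inconsistent_counts:
        run = run + 1 if count >= threshold else 0
        if run >= consecutive_periods:
            return True
    return False
```

The same shape applies to the latency-based rule, with latency measurements in place of inconsistent-row counts.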

  5. Complete the subsequent steps as prompted.

Related API operations

API operation

Description

ConfigureDtsJob

Configures a data verification task.
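The console settings above map to request parameters of ConfigureDtsJob. The sketch below assembles a DataCheckConfigure payload in Python; the field names (`fullCheckModel`, `fullCheckRatio`, and so on) are assumptions for illustration, so confirm them against the ConfigureDtsJob API reference before calling the API:

```python
import json

def build_data_check_configure(full_check=True, incremental_check=False,
                               sampling_percent=100,
                               max_rows_per_sec=0, max_bytes_per_sec=0):
    """Assemble an assumed DataCheckConfigure JSON string for ConfigureDtsJob."""
    configure = {
        "fullCheckModel": 1 if full_check else 0,          # assumed flag name
        "incrementalCheckModel": 1 if incremental_check else 0,
        "fullCheckRatio": sampling_percent,                # sampling percentage, 10-100
        "fullCheckMaxReadRps": max_rows_per_sec,           # 0 = no row limit
        "fullCheckMaxReadBps": max_bytes_per_sec,          # 0 = no byte limit
    }
    return json.dumps(configure)
```

The resulting string would be passed as the DataCheckConfigure request parameter, for example through the Alibaba Cloud SDK.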

FAQ

  • Does DTS detect the database names, table names, and column names that are modified by using the name mapping feature as inconsistent data?

    The data verification feature is compatible with the name mapping feature for databases and tables. If you use the name mapping feature to modify database names, table names, and column names, only the modified column names are detected as inconsistent data.

  • Why am I unable to select a data verification mode for a DTS instance?

    The DTS instance may not support the selected data verification mode, or the option of the Synchronization Types or Migration Types parameter that corresponds to the data verification mode is not selected in the Configure Objects and Advanced Settings step.

  • Does DTS determine that the data in a destination table is inconsistent with the source table if the data is filtered by using filter conditions before data synchronization or migration?

    • During full data verification, DTS determines that the data is consistent if the data is filtered by using Filter Conditions before data synchronization or migration.

    • During incremental data verification, DTS determines that the data is inconsistent if the data is filtered by using Filter Conditions before data synchronization or migration.
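The difference between the two answers above can be illustrated with a small simulation. The helper functions are hypothetical and assume that full verification applies the task's filter conditions to the source before comparing, while incremental verification compares every captured change against the destination:

```python
def full_check(source_rows, dest_rows, keep):
    # Full data verification applies the task's filter to the source first,
    # so rows excluded by the filter are never compared.
    expected = {pk: row for pk, row in source_rows.items() if keep(row)}
    return expected == dest_rows

def incremental_check(changed_rows, dest_rows):
    # Incremental verification compares every captured change against the
    # destination; filtered-out rows are absent there and look inconsistent.
    return [pk for pk, row in changed_rows.items() if dest_rows.get(pk) != row]

source = {1: ("eu", 10), 2: ("us", 20)}
keep = lambda row: row[0] == "eu"   # hypothetical filter: region = 'eu'
dest = {1: ("eu", 10)}              # row 2 was filtered out by the task

full_ok = full_check(source, dest, keep)     # consistent under full verification
incr_diff = incremental_check(source, dest)  # row 2 flagged as inconsistent
```

Row 2 never reaches the destination because of the filter, so full verification ignores it, while incremental verification reports it as inconsistent.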