
ApsaraDB RDS:Synchronize data from a MySQL instance to a DuckDB-based analytical primary instance

Last Updated: Dec 10, 2025

A DuckDB-based analytical primary instance supports features such as columnar storage, just-in-time (JIT) compilation, vectorized execution, efficient memory management, and parallel processing. Compared with the InnoDB engine, it can improve the performance of complex queries by up to a hundred times. You can use Data Transmission Service (DTS) to synchronize the database schema and data from an RDS for MySQL instance to a DuckDB-based analytical primary instance. This lets you quickly build enterprise business intelligence (BI), interactive query, and real-time reporting systems.

Prerequisites

  • A destination DuckDB-based analytical primary instance is created. For more information, see Create a DuckDB-based analytical primary instance.

  • The available storage space of the destination DuckDB-based analytical primary instance is larger than the storage space used by the source database.

Precautions

  • Tables to be synchronized in the RDS for MySQL instance must have primary keys.

  • When DTS performs initial full data synchronization, it consumes read and write resources on the source and destination databases, which may increase the database load. We recommend that you synchronize data during off-peak hours, for example, when the CPU load of both databases is below 30%.

  • Do not perform online DDL operations on synchronization objects in the source database using tools such as gh-ost or pt-online-schema-change during data synchronization. Otherwise, the synchronization fails.

  • Because of the limits of a DuckDB-based analytical primary instance, the cluster is locked if the disk space usage of a node exceeds 80%. Estimate the space required for the objects to be synchronized in advance and ensure that the destination cluster has sufficient storage space.

  • Prefix indexes are not supported. If the source database has prefix indexes, the data synchronization may fail.

  • If the destination DuckDB-based analytical primary instance is being backed up while the DTS task is running, the task fails.

  • A DuckDB-based analytical primary instance cannot be used as a data source for data synchronization.

  • If a DML statement contains LIMIT, a non-deterministic function such as UUID(), or an XA transaction, it is considered unsafe for replication and may cause data inconsistency. Avoid these statements when you synchronize data to a DuckDB-based analytical primary instance.
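
As a quick sanity check against the first two precautions, you can run queries such as the following on the source MySQL database. This is a sketch: `your_db` is a placeholder for the name of the database to be synchronized.

```sql
-- Find base tables that lack a primary key; these must be fixed
-- before you configure the synchronization task.
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'your_db'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;

-- Estimate the storage used by the objects to be synchronized,
-- to size the destination cluster against the 80% lock threshold.
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS used_mb
FROM information_schema.tables
WHERE table_schema = 'your_db'
GROUP BY table_schema;
```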

Billing

  • Schema synchronization and full data synchronization: free of charge.

  • Incremental data synchronization: charged. For more information, see Billing overview.

Supported source instance types

The following instance types are supported for the source MySQL database in a data synchronization task:

  • RDS for MySQL

  • A self-managed database on an ECS instance

  • An on-premises self-managed database

This topic uses RDS for MySQL as an example to describe the configuration procedure. The procedure is similar if the source database is a self-managed MySQL database.

Note

If your source database is a self-managed MySQL database, you must make additional preparations. For more information, see Overview of preparations.

Supported SQL operations

  • DDL operations: CREATE TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE, ADD COLUMN, DROP COLUMN, and MODIFY COLUMN

  • DML operations: INSERT, UPDATE, and DELETE

Note
  • You cannot specify ORDER BY or LIMIT in UPDATE or DELETE statements.

  • If you change the field type of a source table during data synchronization, the synchronization job reports an error and stops. You can manually fix the issue. For more information, see Fix synchronization failures caused by changing field types.
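
For example, the following shows an UPDATE/DELETE pattern that the synchronization link does not support, together with a rewrite that avoids ORDER BY and LIMIT. The `orders` table and `created_at` column are hypothetical names for illustration.

```sql
-- Not supported for synchronization (uses ORDER BY and LIMIT):
DELETE FROM orders ORDER BY created_at LIMIT 100;

-- Supported alternative: delete by a deterministic predicate instead.
DELETE FROM orders WHERE created_at < '2025-01-01';
```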

Database account permissions

  • RDS MySQL: We recommend that you use a privileged account. If you use a standard account, it must have the REPLICATION CLIENT, REPLICATION SLAVE, SHOW VIEW, and SELECT permissions on all synchronization objects.

  • DuckDB-based analytical primary instance: We recommend that you use a privileged account. If you use a standard account, it must have the READ and WRITE permissions.
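
If you use a standard account on the source instance, the required grants can be sketched as follows. `dts_user` and `your_db` are placeholders; note that REPLICATION CLIENT and REPLICATION SLAVE are global privileges in MySQL and must be granted ON *.*.

```sql
-- Replication privileges are global and cannot be scoped to a database.
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'dts_user'@'%';

-- Read access to the objects to be synchronized.
GRANT SELECT, SHOW VIEW ON your_db.* TO 'dts_user'@'%';
```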

Procedure

  1. Log on to the Data Transmission Service console. In the navigation pane on the left, choose Data Synchronization. In the top navigation bar, select a region.

  2. On the Data Synchronization page, click Create Task to create a synchronization task.

  3. On the Create Synchronization Task > Source & Destination DB page, configure the source and destination databases for the synchronization task.

    Source instance information

    Configuration

    Description

    Database Type

    Relational Database > MySQL

    Connection Type

    Cloud Instance

    Instance Region

    The region of the source instance that you selected when you purchased the data synchronization instance. This parameter cannot be changed.

    Cross-account

    No cross-account access

    Instance ID

    Select the ID of the source RDS instance.

    Database Account

    Enter the database account of the source RDS instance. For information about the required permissions, see Database account permissions.

    Database Password

    Enter the password that corresponds to the database account.

    Connection Method

    Select Unencrypted Connection or SSL Secure Connection as needed. If you select SSL Secure Connection, you must enable Secure Sockets Layer (SSL) encryption for the RDS instance in advance. For more information, see Use a cloud certificate to quickly enable SSL encryption.

    Important

    The SSL Secure Connection option is available only for regions in the Chinese mainland and the China (Hong Kong) region.

    Destination instance information

    Configuration

    Description

    Database Type

    Relational Database > MySQL

    Connection Type

    Cloud Instance

    Instance Region

    The region of the destination instance that you selected when you purchased the data synchronization instance. This parameter cannot be changed.

    Cross-account

    Single account

    Note

    DTS synchronization tasks support cross-region data synchronization and migration. This topic uses data synchronization between instances in the same region and VPC as an example. For more information about data synchronization scenarios, see Overview of data synchronization scenarios.

    RDS Instance ID

    Select the ID of the destination DuckDB-based analytical primary instance.

    Database Account

    Enter the database account of the DuckDB-based analytical primary instance. For information about the required permissions, see Database account permissions.

    Database Password

    Enter the password that corresponds to the database account.

    Connection Method

    Select Unencrypted Connection or SSL Secure Connection as needed. If you select SSL Secure Connection, you must enable SSL encryption for the RDS instance in advance. For more information, see Use a cloud certificate to quickly enable SSL encryption.

    Important

    The SSL Secure Connection option is available only for regions in the Chinese mainland and the China (Hong Kong) region.

  4. Click Test Connection To Proceed.

  5. In the Authorize DTS Servers To Access DB dialog box, click Test Connection.

  6. Configure the task objects.

    1. On the Object Configuration page, configure the objects to be synchronized.

      Configuration

      Description

      Synchronization Types

      • If you only need to perform full synchronization, we recommend that you select both Schema Synchronization and Full Synchronization.

      • To perform synchronization without downtime, we recommend that you select Schema Synchronization, Full Synchronization, and Incremental Synchronization.

      Note
      • If you do not select Schema Synchronization, make sure that the destination database contains the databases and tables that will receive the data. You can then use the table and column name mapping feature in the Selected Objects box as needed.

      • If you do not select Incremental Synchronization, do not write new data to the source instance during data synchronization to ensure data consistency.

      Trigger Migration Method For Source Database

      Select a method to migrate triggers as needed. If the objects to be migrated do not involve triggers, you do not need to configure this parameter. For more information, see Configure a method to synchronize or migrate triggers.

      Note

      You can configure this parameter only when you select Schema Synchronization for Synchronization Types.

      Enable Migration Evaluation

      This feature evaluates whether the schemas of the source and destination databases, such as index length, stored procedures, and dependent tables, meet the requirements. You can select Yes or No as needed.

      Note
      • You can configure this parameter only when you select Schema Synchronization for Synchronization Types.

      • If you select Yes, the precheck may take longer. You can view the Evaluation Result during the precheck phase. The evaluation result does not affect the precheck result.

      Synchronization Topology

      Select One-way Synchronization.

      Processing Mode For Tables With The Same Name In The Destination Database

      • Precheck And Report An Error: Checks whether a table with the same name exists in the destination database. If no table with the same name exists, the check item passes. If a table with the same name exists, an error is reported during the precheck phase and the data synchronization task does not start.

        Note

        If you cannot delete or rename the table with the same name in the destination database, you can change the name of the table in the destination database. For more information, see Map table and column names.

      • Ignore And Proceed: Skips the check for tables with the same name in the destination database.

        Warning

        If you select Ignore And Proceed, data inconsistency may occur, which may pose risks to your business. For example:

        • If the table schemas are consistent and a record in the destination database has the same primary key value as a record in the source database:

          • During full synchronization, DTS retains the record in the destination cluster. The record from the source database is not migrated to the destination database.

          • During incremental synchronization, DTS does not retain the record in the destination cluster. The record from the source database overwrites the record in the destination database.

        • If the table schemas are inconsistent, only data from some columns may be synchronized, or the synchronization may fail. Proceed with caution.

      Migrate Events

      Select whether to migrate events from the source database as needed. If you select Yes, you must also follow the relevant requirements and perform subsequent operations. For more information, see Synchronize or migrate events.

      Case-sensitivity For Destination Object Names

      You can configure the case-sensitivity policy for the names of synchronized databases, tables, and columns in the destination instance. By default, DTS Default Policy is selected. You can also choose to keep the policy consistent with the source or destination database. For more information, see Case-sensitivity for destination object names.

      Source Objects

      In the Source Objects box, select the objects to be synchronized and click the right arrow icon to move them to the Selected Objects box.

      Note

      You can select objects at the database, table, and column levels. If you select tables or columns, other objects such as views, triggers, and stored procedures are not synchronized to the destination database.

      Selected Objects

      • To change the name of a single synchronized object in the destination instance, right-click the object in the Selected Objects box. For information about how to set the name, see Map table and column names.

      • To change the names of multiple synchronized objects in the destination instance in a batch, click Batch Edit in the upper-right corner of the Selected Objects box. For information about how to set the names, see Map table and column names.

      Note
      • To select SQL operations to be synchronized at the database or table level, right-click the object in the Selected Objects box and select the desired SQL operations in the dialog box that appears.

      • To set a WHERE clause to filter data, right-click the table in the Selected Objects box and set the filter condition in the dialog box that appears. For information about how to set the filter condition, see Set a filter condition.

    2. Click Next: Advanced Configuration to configure advanced parameters.

      In this example, the default configurations are used. You can also configure the parameters as needed.

      Configuration description

      Configuration

      Description

      Select A Dedicated Cluster To Schedule The Task

      By default, DTS schedules tasks on a shared cluster, so you do not need to select a cluster. If you want more stable tasks, you can purchase a dedicated cluster to run your DTS synchronization tasks. For more information, see What is a DTS dedicated cluster?.

      Replicate Temporary Tables Generated By Online DDL Tools To The Destination Database

      If you use Data Management (DMS) or gh-ost to perform online DDL changes on the source database, you can select whether to synchronize the temporary table data generated by the online DDL changes.

      Important

      DTS tasks do not support online DDL changes made by tools such as pt-online-schema-change. If you perform such changes during synchronization, the DTS task fails.

      • Yes: Synchronizes the temporary table data generated by online DDL changes.

        Note

        If the temporary table data generated by online DDL changes is too large, the synchronization task may experience latency.

      • No, Adapt To DMS Online DDL: Does not synchronize the temporary table data generated by online DDL changes. Only the original DDL statements executed by DMS on the source database are synchronized.

        Note

        If you select this option, tables in the destination database may be locked.

      • No, Adapt To Gh-ost: Does not synchronize the temporary table data generated by online DDL changes. Only the original DDL statements executed by gh-ost on the source database are synchronized. You can also use the default regular expressions for gh-ost shadow tables and useless tables or configure your own.

        Note

        If you select this option, tables in the destination database may be locked.

      Migrate Accounts

      Select whether to synchronize the account information of the source database as needed. If you select Yes, you must also select the accounts to be synchronized and confirm their permissions. For information about authorization methods, see Migrate database accounts.

      Retry Duration After Connection Failure

      If the connection to the source or destination database fails after the synchronization task starts, DTS reports an error and immediately starts retrying the connection. The default retry duration is 720 minutes, but you can set a custom duration from 10 to 1,440 minutes. We recommend setting the duration to more than 30 minutes. If DTS reconnects to the database within the specified retry duration, the task automatically resumes. Otherwise, the task fails.

      Note
      • For multiple DTS instances with the same source or destination, such as DTS instance A and DTS instance B, if you set the network retry duration to 30 minutes for A and 60 minutes for B, the shorter duration of 30 minutes applies.

      • Because DTS charges for task runtime during connection retries, we recommend that you set a custom retry duration based on your business needs or release the DTS instance as soon as possible after the source and destination database instances are released.

      Retry Duration For Other Errors

      If a non-connection error, such as a DDL or DML execution error, occurs in the source or destination database after the task starts, DTS reports an error and immediately retries the operation. The default retry duration is 10 minutes, but you can set a custom duration from 1 to 1,440 minutes. We recommend setting the duration to more than 10 minutes. If the operation succeeds within the specified retry duration, the task automatically resumes. Otherwise, the task fails.

      Important

      The value of Retry Duration For Other Errors must be less than the value of Retry Duration After Connection Failure.

      Limit Full Migration Speed

      During the full synchronization phase, DTS consumes read and write resources on the source and destination databases, which may increase the database load. To reduce the pressure on the destination database, you can limit the full synchronization speed by setting QPS Limit For Source DB, RPS Limit For Full Migration, and BPS Limit For Full Migration (MB).

      Note

      You can configure this parameter only when you select Full Synchronization for Synchronization Types.

      Limit Incremental Synchronization Speed

      To reduce the pressure on the destination database, you can also limit the incremental synchronization speed by setting RPS Limit For Incremental Sync and BPS Limit For Incremental Sync (MB).

      Remove Heartbeat Table SQL For Two-way Tasks

      Select whether to write heartbeat SQL information to the source database while the DTS instance is running.

      • Yes: Does not write heartbeat SQL information to the source database. The DTS instance may show a delay.

      • No: Writes heartbeat SQL information to the source database. This may affect features such as physical backup and cloning of the source database.

      Environment Tag

      You can select an environment tag to identify the instance as needed.

      Scenario Tag

      You can select a scenario tag to identify the use case of the instance as needed.

      Configure ETL

      Select whether to configure the extract, transform, and load (ETL) feature. For more information about ETL, see What is ETL?.

      Sensitive Data Scan And Desensitization

      You can enable the sensitive data scan and desensitization feature to identify and process sensitive information in the transmitted data.

    3. Click Next: Data Validation to configure a data validation task.

      To use the data validation feature, select the data validation methods as needed. You can keep the default configurations for other parameters or configure them as needed. For information about how to configure the parameters, see Configure data validation.

      Note

      If you select schema validation, the validation process may report a "data inconsistency" error even if the schema has been synchronized. This is because of historical compatibility issues. This alert does not indicate a critical issue and does not affect the actual synchronization result. To avoid false positives, we recommend that you do not enable schema validation.

      Configuration description

      Parameter

      Description

      Full Validation Mode

      • Full-field Validation On Sampled Rows: Configure the sampling percentage to perform full-field validation on the sampled data. The value must be an integer from 10 to 100.

      • Row Count Validation: Validates the number of rows in the full data task. The specific data content is not validated.

      Note

      Row Count Validation is free of charge. Full-field Validation On Sampled Rows is charged based on the actual amount of data validated.

      Full Validation Time Rule

      Only Start Now is supported.

      Full Validation Timeout

      • Not Set: The full validation task is not forcibly ended if it times out.

      • Set: Sets the delay time for ending the full validation task. The timer starts after the full validation task begins. If the validation task is not completed within the specified time, it is forcibly ended. The value must be an integer from 1 to 72.

      Full Validation Baseline

      • Default: Uses the union of the source and destination databases as the baseline to validate data consistency.

      • Source: Uses the source database as the baseline to validate data consistency with the destination database. Data that exists only in the destination database is not validated.

      • Destination: Uses the destination database as the baseline to validate data consistency with the source database. Data that exists only in the source database is not validated.

      Maximum RPS For Full Validation

      Full data validation consumes read resources from the database. You can set a speed limit for the full validation task by specifying the number of rows and the amount of data read per second to reduce the pressure on the database.

      Note

      A value of 0 indicates no limit. If both Maximum RPS For Full Validation and Maximum BPS For Full Validation are set to 0, no speed limit is applied.

      Maximum BPS For Full Validation

      Incremental Validation Baseline

      You can filter the DML operations to be validated as needed.

      All objects are validated by default. You can remove unnecessary objects from the right side.

      In the Selected Objects box, select the objects that you do not want to validate and click Remove to remove them.

      Note

      By default, DTS moves the objects to be synchronized or migrated to the Selected Objects box.

      Full Validation Alert

      • Not Set: Does not set an alert.

      • Set: Sets an alert. You must also select and configure alert rules. The alert rules are as follows:

        • An alert is triggered when the full validation task fails.

        • Set a threshold for data inconsistency. An alert is triggered when the amount of inconsistent data in the full validation task is greater than or equal to the threshold.

      Incremental Validation Alert

      • Not Set: Does not set an alert.

      • Set: Sets an alert. You must also select and configure alert rules. The alert rules are as follows:

        • An alert is triggered when the incremental validation task fails.

        • Set the number of statistical periods, the statistical period, and the inconsistency threshold. An alert is triggered when the number of inconsistent records in the incremental validation task is greater than or equal to the threshold for the specified number of consecutive periods.

        • Set the number of statistical periods, the statistical period, and the latency threshold. An alert is triggered when the latency of the incremental validation task is greater than or equal to the threshold for the specified number of consecutive periods.

    4. Click Next: Save Task And Precheck to perform a precheck on the synchronization task.

  7. After the precheck passes with a score of 100%, click Next: Purchase.

    Note
    • A precheck is performed before the synchronization job starts. The job can start only after it passes the precheck.

    • If the precheck fails, click the tip icon next to the failed check item to view the details.

      • You can fix the issue based on the cause and run the precheck again.

      • If you do not need to fix the alert items, you can click Confirm To Shield and then Ignore Alert Items And Rerun Precheck to skip the alert items and run the precheck again.

  8. On the Purchase page, select the billing method and link specification for the data synchronization instance. For more information, see the following table.

    Category

    Parameter

    Description

    Configuration

    Billing Method

    • Subscription: You pay when you create the instance. This is suitable for long-term needs and is more cost-effective than pay-as-you-go. The longer the subscription duration, the higher the discount.

    • Pay-as-you-go: You are charged on an hourly basis. This is suitable for short-term needs. You can release the instance immediately after use to save costs.

    Resource Group

    The resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?.

    Link Specification

    DTS provides synchronization specifications with different performance levels. The link specification affects the synchronization speed. You can select a specification based on your business scenario. For more information, see Specifications of data synchronization links.

    Subscription Duration

    In subscription mode, select the duration and quantity for the subscription instance. You can select a monthly subscription from 1 to 9 months, or a yearly subscription of 1, 2, 3, or 5 years.

    Note

    This option is available only when the billing method is Subscription.

  9. After the configuration is complete, read and select Data Transmission Service (Pay-As-You-Go) Terms Of Service. Click Purchase And Start, and in the Confirm dialog box, click OK.

  10. After the purchase is successful, you are returned to the Data Synchronization page. Click the ID of the task that you created to go to the task details page. In the navigation pane on the left, click Task Management to view the progress. When the task status changes to Synchronizing, the historical data has been migrated and incremental data is being synchronized in real time.

Fix synchronization failures caused by changing field types

If you change the field type of a source table during data synchronization, the synchronization job reports an error and stops. You can follow these steps to manually fix the issue.

  1. In the destination instance, create a new table B (for example, customer_new) based on the failed table A (for example, customer). Adjust the field types of the new table B as needed.

  2. Run the INSERT INTO SELECT command to copy the data from table A to the new table B. Ensure that the data in both tables is consistent.

  3. Rename or delete the failed table A, and then rename table B to the original table name (customer in this example).

  4. In the DTS console, restart the data synchronization job.
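
The steps above can be sketched in SQL as follows. The customer and customer_new names come from the example in the steps; the modified column (`balance`) and its new type are hypothetical and should be replaced with your actual column and type.

```sql
-- Step 1: create table B with the same schema as table A and
-- adjust the field type as needed (hypothetical column and type).
CREATE TABLE customer_new LIKE customer;
ALTER TABLE customer_new MODIFY COLUMN balance DECIMAL(20, 4);

-- Step 2: copy the data from table A to table B.
INSERT INTO customer_new SELECT * FROM customer;

-- Step 3: rename table A away and give table B the original name
-- in a single atomic statement.
RENAME TABLE customer TO customer_old, customer_new TO customer;
```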

References