This topic describes the database types supported by the database migration feature, the restrictions on using the feature to migrate databases to MaxCompute, and the rules for generating sync nodes.

The database migration feature allows you to upload all tables in a database to MaxCompute in an efficient and cost-effective manner. This saves you from creating sync nodes one by one when you migrate your initial data to the cloud.

For example, if your database contains 100 tables, the conventional method requires you to configure 100 sync nodes. With database migration, you can migrate all 100 tables in one operation. However, some tables may fail to be migrated because of the restrictions imposed by the table design specifications of databases.

Restrictions on database migration

Due to the table design specifications of databases, the following restrictions apply when you use the database migration feature:
  • Currently, you can migrate only the following types of databases to MaxCompute: MySQL, PostgreSQL, SQL Server, Distributed Relational Database Service (DRDS), POLARDB, AnalyticDB for PostgreSQL, HybridDB for MySQL, AnalyticDB for MySQL 3.0, Oracle, and Dameng (DM).
  • You can upload only incremental or full data on a daily basis.
    You cannot use the database migration feature to migrate all historical data at a time. To migrate historical data, we recommend that you perform the following operations:
    • Configure a scheduled node to migrate data on a daily basis, and create retroactive node instances to transmit the historical data. This eliminates the need to execute temporary SQL statements to split the data into partitions after all the historical data is migrated.
    • If you need to migrate all the historical data at a time, configure a sync node in DataStudio and click the Run icon to run the node. In this case, both daily incremental migration and daily full migration run only once. After the migration is complete, you must execute SQL statements to convert the data, for example, to split it into partitions (see the partition-splitting sketch after this list).
    If your daily incremental migration involves special business logic in which incremental data cannot be identified by a date field alone, you cannot use the database migration feature. In this case, we recommend that you perform the following operations:
    • Add a field that indicates the data change timestamp to your database. Generally, incremental data is identified based on the binary logs provided by Data Transmission Service (DTS) or a date field that indicates the data change timestamp.

      Currently, Data Integration identifies incremental data based on a date field that indicates the data change timestamp. Therefore, your database must contain such a field. Data Integration uses this field to find the data that was modified on the day indicated by the data timestamp and synchronizes all the modified data.

    • To facilitate incremental upload, verify that your tables contain the gmt_create and gmt_modify fields (see the incremental filter sketch after this list). In addition, we recommend that you add the ID field as the primary key to improve upload efficiency.
  • Database migration supports uploading all tables at a time and uploading tables in batches. When you upload tables in batches, a portion of the tables is uploaded at a specified interval each time. Currently, Data Integration does not protect database connection pools.
    • We recommend that you upload tables in batches at a specified interval to prevent your database from being overloaded and your business from being affected.
      • If you have primary and secondary databases, we recommend that you upload data from the secondary database.
      • During data upload in batches, each table requires a database connection with a maximum data transmission rate of 1 Mbit/s. For example, if you upload 100 tables at a time, 100 database connections are established, which places a load of up to 100 Mbit/s on your database. We recommend that you determine the number of concurrent threads based on your business requirements.
    • The maximum data transmission rate of each sync node is 1 Mbit/s, which may fail to meet your requirements for data transmission efficiency.
  • Database migration supports only the mapping of table names, field names, and data types between the source and destination tables.

    During database migration, Data Integration automatically creates MaxCompute tables. The partition field of the tables is pt, which is of the string type and is in the yyyymmdd format.

    Note: After you select the tables to be uploaded, all fields in the tables are uploaded, and you cannot edit the fields.
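The following incremental filter sketch illustrates how a date field supports daily incremental identification. It is a minimal example: the orders table, its business fields, and the ${bizdate} scheduling parameter are hypothetical; only the gmt_create and gmt_modify field names come from the recommendations above.

```sql
-- Hypothetical MySQL source table that meets the design specifications:
-- gmt_create and gmt_modify record when a row is created and last changed,
-- and id serves as the primary key.
CREATE TABLE orders (
  id         BIGINT PRIMARY KEY,
  buyer_name VARCHAR(64),
  amount     DECIMAL(16, 2),
  gmt_create DATETIME,
  gmt_modify DATETIME
);

-- A daily incremental sync could select only the rows whose change
-- timestamp falls on the data timestamp. ${bizdate} is assumed to be
-- a yyyy-mm-dd scheduling parameter.
SELECT id, buyer_name, amount, gmt_create, gmt_modify
FROM orders
WHERE gmt_modify >= '${bizdate}'
  AND gmt_modify <  DATE_ADD('${bizdate}', INTERVAL 1 DAY);
```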
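If you migrate all the historical data at a time, the partition-splitting sketch below shows the kind of SQL statement you can then execute in MaxCompute to convert the one-time load into daily partitions. The table names are hypothetical assumptions; TO_CHAR formats the change timestamp into the yyyymmdd value of the pt partition field.

```sql
-- Hypothetical tables: ods_orders_full holds the one-time historical load,
-- and ods_orders is the partitioned destination table.
INSERT OVERWRITE TABLE ods_orders PARTITION (pt)
SELECT id, buyer_name, amount, gmt_create, gmt_modify,
       TO_CHAR(gmt_modify, 'yyyymmdd') AS pt
FROM ods_orders_full;
```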

Rules for generating sync nodes

After you set parameters for database migration, Data Integration creates MaxCompute tables and generates corresponding sync nodes in sequence based on the tables you select.

The names, field names, and data types of the MaxCompute tables are determined by the advanced settings. If you do not configure the advanced settings, they are the same as those of the source tables. The partition field of the tables is pt, which is of the string type and is in the yyyymmdd format, as shown in the following sketch.
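For reference, the following sketch approximates a table that Data Integration creates automatically, together with a query that reads one day of data. The table and field names are hypothetical; only the pt partition field of the string type is defined by the feature.

```sql
-- Hypothetical MaxCompute table generated from a source table named orders.
CREATE TABLE IF NOT EXISTS orders (
  id         BIGINT,
  buyer_name STRING,
  amount     DECIMAL,
  gmt_create DATETIME,
  gmt_modify DATETIME
)
PARTITIONED BY (pt STRING);

-- Each daily sync node writes to the partition that matches its data
-- timestamp, so one day of data can be read by filtering on pt.
SELECT * FROM orders WHERE pt = '20250612';
```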

The generated sync nodes are scheduled by day and automatically start to run in the early morning of the next day, each with a maximum data transmission rate of 1 Mbit/s. The generated nodes may vary based on the synchronization method and the concurrency settings. To view or modify a node, expand the created workflow, such as clone_database, in DataStudio, find the target node, such as mysql2odps_Table name, under Data Integration, and then edit the node as required.