edit-icon download-icon

Full-database migration overview

Last Updated: Mar 21, 2018

Full-database Migration is a convenient tool that can improve user efficiency and reduce user cost. It can quickly upload all the tables in a MySQL database to MaxCompute all at the same time, saving on time that is involved in creating batch tasks for initial data migration to cloud.

For example, if a database contains 100 tables, you are supposed to configure 100 data synchronization tasks in a traditional way. With the full-database migration, you can upload all the tables at the same time. However, because of the design normalization of database tables, this tool cannot guarantee to complete the synchronization of all tables at the same time as per your business demands. In other words, it has limits too. This article describes the full-database migration feature in terms of its functions and limits.

Task generation rules

After the configuration is completed, MaxCompute tables are created and data synchronization tasks are generated based on the selected tables to be synchronized.

The table names, field names, and field types of the MaxCompute tables are generated according to the advanced settings. If no advanced settings are set, the structure of MaxCompute tables are identical to that of MySQL tables. The partition of these tables is pt, and its format is yyyymmdd.

The generated data synchronization tasks are cyclic tasks scheduled on a daily basis. They run automatically in the morning on the next day with a transfer rate of 1 MB/s. The actual performance of synchronization tasks varies with the selected synchronization mode and concurrency setting. You can locate the generated tasks by clicking clone_database > Data source name > mysql2odps_table name in the directory tree of synchronization tasks to customize them as needed.

Note:

We recommend that you perform a smoke test on the data synchronization tasks on the same day. You can find all the synchronization tasks generated by a data source in project_etl_start > Full-database Migration > Data Source Name under O&M Center > Task Management, and then right-click to test corresponding task nodes.

Limits

Full-database migration is subject to certain limits because of the design normalization of database tables. The limits include:

  • Currently, only the full-database migration from the Mysql data source to MaxCompute is supported.

    The migration feature for Hadoop/Hive and Oracle data sources will be available in the future.

  • Only the daily incremental and daily full upload modes are available.

  • If you want to synchronize historical data at a time, this feature cannot meet your needs. The following are a few suggestions for you to consider:

    • You can configure daily tasks instead of synchronizing historical data at the same time. You can trace the historical data with the provided data completing, which eliminates the need to run temporary SQL tasks to split data after the historical data is fully synchronized.
    • You can configure a task on the task development page and click Run. After that, convert data using SQL statements. They are both one-time operations.
  • If your daily incremental upload is subject to a special business logic and cannot be identified by a date field, this feature cannot meet your needs, and we provide the following suggestions:

    • The incremental upload of data can be achieved through binlog (available in the DTS product) or the date field for data change provided by databases. Currently, Data Integration supports the latter method, and thus your database must contain the date field for data change. The system can determine if your data is changed on the same day as the business date using this field. If yes, all the changed data is synchronized.

    • To facilitate incremental upload, we recommend that you include the gmt_create and gmt_modify fields when creating any database tables. Meanwhile, you can set the id field as the primary key to improve efficiency.

  • Full-database migration supports batch upload and full upload.

    Batch upload is configured with time intervals. Currently, the connection pool protection feature for data sources is not provided, which will be available soon.

  • To prevent the database from being overloaded, the full-database migration provides the batch upload mode, which enables you to split tables in batches by a time interval and prevents compromised service functionality because of the database overload. We have two suggestions:

    • If you have master and slave databases, we recommend that you synchronize the data of the slave database.

    • In batch tasks, each table pertains to a database connection with the maximum speed of 1 Mbit/s. If you run the synchronization tasks for 100 tables at the same time, 100 database connections are established. For this reason, make sure to select an appropriate concurrency based on your business conditions.

  • If you are looking for a specific task transfer rate, this feature cannot meet your demand. The maximum speed of any generated tasks is 1 Mbit/s.

  • Only the mapping of all table names, field names, and field types are supported.

    During the full-database migration, MaxCompute tables are created automatically, where the partition field is pt, the field type is string, and the format is yyyymmdd.

Note:

When you select tables for synchronization, all fields must be synchronized and none of these fields can be edited.

Thank you! We've received your feedback.