Use ossimport to migrate data

Last Updated: Jun 07, 2024

ossimport allows you to migrate data from local storage, third-party storage, or Object Storage Service (OSS) buckets in any region to OSS buckets in any region. This topic describes how to use ossimport to migrate data from third-party storage to OSS.

Example scenario

You have 500 TB of data stored in Tencent Cloud Object Storage (COS) in the Guangzhou region. You want to use ossimport to migrate the data to an OSS bucket in the China (Hangzhou) region within one week. Business continuity must be ensured during the migration process.

ossimport can be deployed in standalone mode or distributed mode.

  • Standalone mode is suitable for migrating less than 30 TB of data.

  • Distributed mode is suitable for migrating over 30 TB of data.

In this scenario, to migrate 500 TB of data, you must deploy ossimport in distributed mode.

Note

You can also use Data Online Migration to migrate data in an easier manner. For more information, see Background information.

Preparations

  • Activate OSS and create a bucket in the China (Hangzhou) region.

    • For more information about how to activate OSS, see Activate OSS.

    • For more information about how to create a bucket, see Create buckets.

  • Create a Resource Access Management (RAM) user and grant the RAM user the permissions to access OSS.

    Create a RAM user in the RAM console, grant the RAM user the permissions to access OSS, and then save the AccessKey ID and AccessKey Secret. For more information, see Preparations.

  • Optional. Purchase Elastic Compute Service (ECS) instances.

    Purchase ECS instances in the same region as the OSS bucket. For more information about ECS instance types, see General-purpose instance families. If you want to release the ECS instances after data migration, we recommend that you purchase ECS instances based on your business requirements.

    Note

    If you want to deploy ossimport on a small number of machines, you can use on-premises machines. If you want to deploy ossimport on a large number of machines, we recommend that you use ECS instances. In this example, ossimport is deployed on ECS instances.

    The number of required ECS instances is calculated by using the following formula: Number of required ECS instances = X/Y/(Z/100). In the formula, X indicates the amount of data that you want to migrate, in TB. Y indicates the number of days planned for the migration. Z indicates the migration speed of a single ECS instance, in Mbit/s. An instance can migrate about Z/100 TB of data each day. For example, if the migration speed of an ECS instance is 200 Mbit/s, the instance can migrate about 2 TB of data each day. In the example scenario, 500/7/(200/100) ≈ 35.7, so you need to purchase 36 ECS instances.

  • Configure ossimport.

    To meet the large-scale migration requirements in this example, you must deploy ossimport in distributed mode on ECS instances. For more information about the configurations of ossimport in distributed mode, such as the conf/job.cfg and conf/sys.properties configuration files and concurrency control settings, see Overview. For more information about how to deploy ossimport in distributed mode, such as downloading ossimport and troubleshooting common errors during configuration, see Distributed deployment.
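The sizing formula in the preceding note can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative and are not part of ossimport, and rounding up to a whole instance is an assumption made so that the planned migration window is met. The first function also shows where the Z/100 rule of thumb comes from, assuming a speed of Z Mbit/s sustained for 24 hours and decimal units.

```python
import math


def daily_tb_exact(speed_mbit_s: float) -> float:
    """TB moved in 24 hours at a sustained speed (decimal units)."""
    bytes_per_day = speed_mbit_s * 1_000_000 / 8 * 86_400
    return bytes_per_day / 1e12


def required_instances(data_tb: float, days: float, speed_mbit_s: float) -> int:
    """Number of ECS instances = X / Y / (Z / 100), rounded up."""
    # Rule of thumb from the text: one instance moves about Z/100 TB per day.
    daily_tb_per_instance = speed_mbit_s / 100
    return math.ceil(data_tb / days / daily_tb_per_instance)


if __name__ == "__main__":
    print(round(daily_tb_exact(200), 2))    # 2.16, which the text rounds to 2
    print(required_instances(500, 7, 200))  # 36
```

Because the rule of thumb slightly underestimates the daily throughput (2 TB instead of about 2.16 TB at 200 Mbit/s), the formula leaves a small safety margin.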

Migration solution

The following figure shows the process of migrating data from third-party storage to OSS in distributed mode.

Note

After you deploy ossimport in distributed mode on the ECS instances, we recommend the following data path: ossimport downloads data from COS in the Guangzhou region to the ECS instances in the China (Hangzhou) region over the Internet, and then uploads the data from the ECS instances to the OSS bucket in the same region over the internal network.

[Figure: Process of migrating data from third-party storage to OSS in distributed mode]

The migration generates the following fees: fees for accessing the migration source and the destination bucket, outbound traffic fees for the migration source, ECS instance fees, and data storage fees. If more than 1 TB of data is to be migrated, the storage cost is proportional to the time required for the migration. ECS instance fees are small compared with the outbound traffic fees and storage fees, so you can use more ECS instances to shorten the migration.
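To see how the number of ECS instances affects the migration time, the sizing rule of thumb from the Preparations section (about Z/100 TB per instance per day) can be inverted. The following is a rough sketch, not an ossimport feature: the function name is illustrative, the count of 72 instances is a hypothetical comparison point, and the estimate assumes that throughput scales linearly with the number of instances and that the migration source is not the bottleneck.

```python
def migration_days(data_tb: float, instances: int, speed_mbit_s: float) -> float:
    """Estimated days = X / (N * Z / 100), assuming linear scaling."""
    per_instance_tb_per_day = speed_mbit_s / 100
    return data_tb / (instances * per_instance_tb_per_day)


if __name__ == "__main__":
    # 500 TB at 200 Mbit/s per instance, with 36 and (hypothetically) 72 instances.
    for n in (36, 72):
        print(n, round(migration_days(500, n, 200), 1))
    # prints:
    # 36 6.9
    # 72 3.5
```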

Migration procedure

  1. Migrate all data last modified before T1.

    For more information, see the Migration section of the "Distributed deployment" topic.

    Important

    T1 is a UNIX timestamp representing the number of seconds that have elapsed since January 1, 1970, 00:00:00 UTC. Run date +%s to obtain T1.

  2. Configure mirroring-based back-to-origin rules.

    The origin keeps generating new data during the migration. To ensure business continuity and a seamless switchover, you need to configure mirroring-based back-to-origin rules. After you configure the rules, when objects that are requested by users do not exist in OSS, OSS retrieves these objects from the origin and returns the objects to the users. For more information, see Overview.

  3. In the job.cfg configuration file, set the importSince parameter to T1 and restart the migration task to migrate the incremental data generated between T1 and T2. T2 is the time of the switchover in Step 4.

  4. Switch the read and write operations of the business system to OSS, and record the time of the switchover as T2.

    Note

    • After Step 4 is complete, all read and write operations on your business system are switched to OSS. Data stored in third-party storage is only a copy of historical data, which can be retained or deleted based on your business requirements.

    • ossimport only migrates and verifies data, and does not delete data.
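Steps 1 and 3 depend on recording T1 and later writing it to job.cfg. The following sketch shows one way to do both in Python. It assumes that job.cfg consists of plain key=value lines; the helper names and the sample configuration text are hypothetical and are not part of ossimport.

```python
import re
import time


def current_unix_timestamp() -> int:
    """Seconds since 1970-01-01 00:00:00 UTC, the same value as `date +%s`."""
    return int(time.time())


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return job.cfg text with importSince set to t1.

    Replaces the first existing importSince line, or appends one if the
    parameter is missing. Commented-out lines are left untouched.
    """
    new_line = f"importSince={t1}"
    pattern = re.compile(r"^[ \t]*importSince[ \t]*=.*$", flags=re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(new_line, cfg_text, count=1)
    separator = "" if cfg_text.endswith("\n") or not cfg_text else "\n"
    return f"{cfg_text}{separator}{new_line}\n"


if __name__ == "__main__":
    t1 = current_unix_timestamp()  # record T1 before the full migration starts
    sample = "jobName=demo\nimportSince=0\n"  # illustrative config text only
    print(set_import_since(sample, t1))
```

In practice, record T1 before you start the full migration in Step 1, and apply the updated value to job.cfg only when you reach Step 3, before you restart the migration task.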

References

For more information about ossimport, see the following topics:

Deploy ossimport in distributed mode

Overview

FAQ