You can use ossimport to migrate data from local storage, third-party storage, or Object Storage Service (OSS) buckets in a region to OSS buckets in different regions. This topic describes how to use ossimport to migrate data from a third-party storage service to OSS.

Background information

Assume that a user has 500 TB of data stored in the Guangzhou region of Tencent Cloud Object Storage (COS). The user wants to use ossimport to migrate the data to a bucket in the China (Hangzhou) region of OSS within a week. During the migration process, business operations must continue to run normally.

ossimport can be deployed in standalone mode or distributed mode:
  • The standalone mode is applicable when you migrate data volumes smaller than 30 TB.
  • The distributed mode is suitable for migrating data volumes larger than 30 TB.

To migrate large amounts of data, deploy ossimport in distributed mode.

Note You can also use data online migration to easily migrate data. For more information, see Background information about Data Online Migration.

Preparations

  • Activate OSS. Create a bucket in the China (Hangzhou) region.
    • For more information about how to activate OSS, see Activate OSS.
    • For more information about how to create a bucket, see Create buckets.
  • Configure a Resource Access Management (RAM) user and grant OSS access permissions to the RAM user.

    Create a RAM user in the RAM console. Authorize the RAM user to access OSS, and then save the AccessKey ID and the AccessKey secret. For more information, see Create and authorize a RAM user.

  • (Optional) Purchase an Elastic Compute Service (ECS) instance.
    Purchase the ECS instance in the same region as the OSS bucket, which is China (Hangzhou) in this example. For more information about ECS instances, see General-purpose instance families. We recommend that you purchase a pay-as-you-go instance if you want to release the ECS instance after the data is migrated.
    Note If you want to deploy ossimport on a small number of machines, you can deploy it on local machines. If you want to deploy ossimport on a large number of machines, we recommend that you deploy it on ECS instances. ECS instances are used in this example to show how to perform a migration task.

    The number of required ECS instances is calculated based on the following formula: Number of required ECS instances = X/Y/(Z/100). In the formula, X indicates the amount of data to be migrated in TB, Y indicates the required migration duration in days, and Z indicates the migration speed of a single ECS instance in Mbit/s (at Z Mbit/s, an instance migrates about Z/100 TB of data each day). In the preceding example, if the migration speed of an ECS instance reaches 200 Mbit/s (about 2 TB of data is migrated each day), you need to purchase 36 ECS instances (500/7/2 ≈ 35.7, rounded up).

  • Configure ossimport properly.

    To meet the large-scale migration requirements in this example, you must deploy ossimport in distributed mode on ECS. For more information about the configuration and definition of distributed deployment, such as conf/job.cfg, conf/sys.properties, and concurrency control, see Architectures and configurations. For more information about operations on distributed deployment such as downloading ossimport and troubleshooting common errors during configurations, see Distributed deployment.
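
The instance-count formula in the preceding list, X/Y/(Z/100), can be sketched in a few lines of Python. This is only an illustration: the function name and parameter names are hypothetical, and the result is rounded up because a fractional instance cannot be purchased.

```python
import math


def required_ecs_instances(data_tb, days, speed_mbit_s):
    """Return X / Y / (Z / 100), rounded up to a whole instance.

    data_tb      -- X, the amount of data to migrate, in TB
    days         -- Y, the required migration duration, in days
    speed_mbit_s -- Z, the migration speed of one instance, in Mbit/s
                    (roughly Z / 100 TB per day, as stated in this topic)
    """
    if data_tb <= 0 or days <= 0 or speed_mbit_s <= 0:
        raise ValueError("all inputs must be positive")
    tb_per_day_per_instance = speed_mbit_s / 100
    return math.ceil(data_tb / days / tb_per_day_per_instance)


# The example in this topic: 500 TB in 7 days at 200 Mbit/s per instance.
print(required_ecs_instances(500, 7, 200))  # 36
```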

Migration solutions

The following process describes how to migrate data from a third-party storage service to OSS in distributed mode.

Note After you deploy ossimport in distributed mode on the ECS instances, ossimport downloads data from the Guangzhou region of Tencent Cloud Object Storage (COS) to the ECS instances located in the China (Hangzhou) region. We recommend that you download the data over the Internet. ossimport then uploads the data from the ECS instances to the OSS bucket in the China (Hangzhou) region. We recommend that you use the internal network for the upload.

Costs involved in the migration process include the fees incurred when the source and destination buckets are accessed, outbound traffic fees for the source bucket, ECS instance fees, and data storage fees. If more than 1 TB of data is to be migrated, the storage cost and the migration period increase in proportion to the amount of data. ECS fees are low compared with the data transfer and storage fees, and using more ECS instances shortens the migration period.
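
To see how the number of ECS instances affects the migration period, the rule of thumb used in this topic (an instance running at Z Mbit/s migrates about Z/100 TB each day) can be turned around. The sketch below assumes that throughput scales linearly with the number of instances, which is a simplification. The function name is hypothetical.

```python
def estimated_days(data_tb, instances, speed_mbit_s):
    """Estimate the migration period in days.

    Assumes each instance migrates about speed_mbit_s / 100 TB per day
    and that total throughput grows linearly with the instance count.
    """
    if data_tb <= 0 or instances <= 0 or speed_mbit_s <= 0:
        raise ValueError("all inputs must be positive")
    tb_per_day = instances * speed_mbit_s / 100
    return data_tb / tb_per_day


# Compare a few fleet sizes for 500 TB at 200 Mbit/s per instance.
for n in (18, 36, 72):
    print(n, "instances:", round(estimated_days(500, n, 200), 1), "days")
```

Under these assumptions, doubling the number of instances halves the estimated period. Actual throughput also depends on the source bandwidth and on the object sizes.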

Implementation

  1. Migrate all data last modified before T1.
    For more information, see the Running section in Distributed deployment.
    Notice T1 is a Unix timestamp, which is the number of seconds that have elapsed since the epoch time January 1, 1970, 00:00:00 UTC. You can run the date +%s command to obtain the value.
  2. Configure a mirroring-based back-to-origin rule.
    The origin keeps generating new data during the migration. To ensure business continuity and a seamless switchover, you need to configure a back-to-origin rule. When objects that are requested by users do not exist in OSS, OSS retrieves these objects from the origin and returns the objects to the users. For more information, see Overview.
  3. Switch the read/write operations on the business system to OSS. Record the time of the switchover as T2.
  4. Open the job.cfg configuration file and set importSince to the value of T1 (importSince=T1). Rerun the migration task to migrate the incremental data last modified between T1 and T2.
    Note
    • After Step 4 is complete, all read and write operations on your business system are switched to OSS. Data stored in the third-party storage service is only a copy of historical data, which can be retained or deleted.
    • ossimport only migrates and verifies data, but does not delete data.
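
As a minimal sketch of the timestamp handling in the preceding steps, the following Python code captures a Unix timestamp in whole seconds, which is the same kind of value that the date +%s command prints, and formats the importSince line for job.cfg. The helper names are hypothetical, and the code only prints the line. Edit job.cfg yourself as described in Distributed deployment.

```python
import time


def unix_seconds(now=None):
    """Return a Unix timestamp in whole seconds, like `date +%s`.

    `now` lets a caller pass a saved float timestamp instead of the clock.
    """
    return int(time.time() if now is None else now)


def import_since_line(t1):
    """Format the job.cfg entry used in Step 4 to rerun the task from T1."""
    return f"importSince={t1}"


# Record T1 when the full migration starts, and keep it for Step 4.
t1 = unix_seconds()
print(import_since_line(t1))
```

T2 can be recorded in the same way when the business system is switched to OSS.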

References

For more information about ossimport, see the following topics:

Distributed deployment

Architectures and configurations

Troubleshooting