Object Storage Service: Use ossimport to migrate data

Last Updated: Nov 14, 2023

ossimport allows you to migrate data from local storage, third-party storage, or OSS buckets in any region to OSS buckets in any region. This topic describes how to use ossimport to migrate data from third-party storage to OSS.

Background information

A user has 500 TB of data stored in the Guangzhou region of Tencent Cloud Object Storage (COS). The user wants to use ossimport to migrate the data to an OSS bucket in the China (Hangzhou) region within one week. The business must continue to run normally during the migration.

ossimport can be deployed in standalone mode or distributed mode:

  • Standalone mode is suitable for migrating data whose size is smaller than 30 TB.

  • Distributed mode is suitable for migrating data whose size is larger than 30 TB.

To migrate 500 TB of data, deploy ossimport in distributed mode.

Note

You can also use Data Online Migration to easily migrate data. For more information, see Background information.

Before you begin

  • Activate OSS and create a bucket in the China (Hangzhou) region.

    • For more information about how to activate OSS, see Activate OSS.

    • For more information about how to create a bucket, see Create buckets.

  • Create a RAM user and grant the RAM user the permissions to access OSS.

    Create a RAM user in the RAM console, grant the RAM user the permissions to access OSS, and then save the AccessKey ID and AccessKey Secret. For more information, see Before you begin.

  • Purchase an ECS instance. This step is optional.

    Purchase an Elastic Compute Service (ECS) instance in the same region as the OSS bucket. For more information about ECS instance types, see General-purpose instance families. If you want to release an ECS instance after data migration, we recommend that you purchase an ECS instance based on your business requirements.

    Note

    If you want to deploy ossimport to a small number of machines, you can deploy it on local machines. If you want to deploy ossimport to a large number of machines, we recommend that you deploy it on ECS instances. In this example, ECS instances are used to show how to perform a data migration task.

    The number of required ECS instances is calculated by using the following formula: Number of required ECS instances = X/Y/(Z/100), rounded up to the nearest integer. In the formula, X indicates the amount of data that you want to migrate, in TB. Y indicates the time allowed for the migration, in days. Z indicates the migration speed of a single ECS instance, in Mbit/s (about Z/100 TB of data migrated each day). For example, if each ECS instance migrates data at 200 Mbit/s (about 2 TB of data each day), the preceding example requires 36 ECS instances (500/7/2 ≈ 35.7, rounded up).

  • Configure ossimport

    To meet the large-scale migration requirements in this example, you must deploy ossimport in distributed mode on ECS instances. For more information about how to configure the parameters required to deploy ossimport in distributed mode, such as conf/job.cfg, conf/sys.properties, and concurrency control, see Overview. For more information about operations on distributed deployment such as downloading ossimport and troubleshooting common errors during configurations, see Distributed deployment.
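The sizing formula from the ECS step above can be checked with a few lines of Python. This is a minimal sketch rather than part of ossimport: the function names `tb_per_day` and `required_ecs_instances` are hypothetical, and the Z/100 rule of thumb is taken from this topic as-is.

```python
import math


def tb_per_day(speed_mbit_s: float) -> float:
    """Approximate TB migrated per day by one instance (rule of thumb: Z/100)."""
    return speed_mbit_s / 100


def required_ecs_instances(data_tb: float, days: float, speed_mbit_s: float) -> int:
    """Compute X / Y / (Z / 100) and round up to a whole number of instances."""
    return math.ceil(data_tb / days / tb_per_day(speed_mbit_s))


if __name__ == "__main__":
    # Scenario from this topic: 500 TB, 7 days, 200 Mbit/s per instance.
    print(required_ecs_instances(500, 7, 200))
```

With the values in this topic (500 TB, 7 days, 200 Mbit/s), the function returns 36. Rounding up matters because a fractional instance cannot be purchased, and rounding down would miss the one-week deadline.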

Migration solution

The following figure shows the process of migrating data from third-party storage to OSS in distributed mode.

Note

After you deploy ossimport in distributed mode on the ECS instances, ossimport downloads data from the Guangzhou region of COS to the ECS instances located in the China (Hangzhou) region and then uploads the data to the OSS bucket. We recommend that you download data from COS over the Internet and upload data from the ECS instances to the OSS bucket in the China (Hangzhou) region over the internal network.

[Figure: Migrating data from third-party storage to OSS by using ossimport in distributed mode]

The following fees are generated in the migration process: fees for accessing the migration source and the destination bucket, outbound traffic fees for the migration source, ECS instance fees, and data storage fees. If more than 1 TB of data is to be migrated, the storage cost is proportional to the time required for the migration. ECS instance fees are small compared with the outbound traffic fees and storage fees, and using more ECS instances shortens the time required for the migration.

Migrate data

  1. Migrate all data last modified before T1.

    For more information, see Running.

    Important

    T1 is a UNIX timestamp, which is the number of seconds that have elapsed since 00:00:00 UTC on Thursday, January 1, 1970. Run the date +%s command to obtain the current timestamp and use it as T1.

  2. Configure a mirroring-based back-to-origin rule.

    The origin keeps generating new data during the migration. To ensure business continuity and a seamless switchover, you need to configure a mirroring-based back-to-origin rule. After you configure the rule, when objects that are requested by users do not exist in OSS, OSS retrieves these objects from the origin and returns the objects to the users. For more information, see Overview.

  3. Open the job.cfg configuration file, set importSince=T1, and then run the migration task again to migrate the incremental data generated between T1 and T2. T2 is the switchover time recorded in Step 4.

  4. Switch the read and write operations on the business system to OSS. Record the time of the switchover as T2.

    Note

    • After Step 4 is complete, all read and write operations on your business system are switched to OSS. Data stored in third-party storage is only a copy of historical data, which can be retained or deleted based on your business requirements.

    • ossimport only migrates and verifies data, and does not delete any data.
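The timestamp and importSince steps above can be sketched in Python. This is a minimal sketch and not part of ossimport: `current_unix_time` and `set_import_since` are hypothetical helper names, and the sketch assumes that job.cfg is a plain key=value text file in which importSince appears at most once on an uncommented line.

```python
import re
import time


def current_unix_time() -> int:
    """Seconds since 00:00:00 UTC on January 1, 1970 (same value as `date +%s`)."""
    return int(time.time())


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return cfg_text with importSince set to t1, appending the key if it is absent."""
    line = f"importSince={t1}"
    pattern = re.compile(r"^[ \t]*importSince[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # Replace the existing importSince line and leave every other line untouched.
        return pattern.sub(line, cfg_text, count=1)
    # Key not found: append it, adding a newline first if the text lacks one.
    sep = "" if cfg_text.endswith("\n") or not cfg_text else "\n"
    return f"{cfg_text}{sep}{line}\n"


if __name__ == "__main__":
    t1 = current_unix_time()  # record this before the full migration in Step 1
    sample = "importSince=0\n"  # illustrative content only, not a complete job.cfg
    print(set_import_since(sample, t1), end="")
```

Record T1 once, before the full migration starts, and reuse that same value in Step 3. Reading the clock again at Step 3 would skip objects that were modified while the full migration was running.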

References

For more information about ossimport, see the following topics:

Distributed deployment

Overview

FAQ