ossimport migrates data from local storage, third-party storage, or Object Storage Service (OSS) buckets in any region to OSS buckets in any region. This topic focuses on large-scale migration from third-party cloud storage to OSS by using distributed-mode ossimport.
Example scenario
You have 500 TB of data in Tencent Cloud Object Storage (COS) in the Guangzhou region and want to migrate it to an OSS bucket in the China (Hangzhou) region within one week, while maintaining business continuity.
Deployment modes
ossimport supports two deployment modes:
Mode | Data volume | Use case |
Standalone mode | Less than 30 TB | Small-scale migrations |
Distributed mode | More than 30 TB | Large-scale migrations |
Because this scenario involves 500 TB, you must deploy ossimport in distributed mode.
Data Online Migration provides a simpler migration workflow. See Background information.
Prerequisites
Before you begin, ensure that you have:
Activated OSS and created a bucket in the China (Hangzhou) region. To activate OSS, see Activate OSS. To create a bucket, see Create buckets.
Created a Resource Access Management (RAM) user with permissions to access OSS, and saved the AccessKey ID and AccessKey Secret. See Preparations.
(Optional) Purchased Elastic Compute Service (ECS) instances in the same region as the OSS bucket. If the migration needs only a few machines, you can use on-premises machines. If it needs many machines, deploy ossimport on ECS instances. This example uses ECS instances. For ECS instance types, see General-purpose instance families (g series). If you plan to release ECS instances after migration, purchase them based on your business requirements.
Configured ossimport in distributed mode on the ECS instances. In distributed mode, configure the conf/job.cfg and conf/sys.properties configuration files and concurrency control settings. For details, see Overview (discontinued). For deployment instructions, including downloading ossimport and troubleshooting common errors, see Distributed deployment.
Calculate the number of ECS instances
Use the following formula to determine how many ECS instances you need:
Number of ECS instances = X / Y / (Z / 100)
Variable | Description |
X | Amount of data to migrate, in TB |
Y | Number of days planned for the migration |
Z | Migration speed per ECS instance, in Mbit/s |
Each ECS instance can migrate approximately Z/100 TB of data per day. For example, an ECS instance that migrates data at 200 Mbit/s transfers about 2 TB each day.
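The sizing rule can be sketched in Python. This is a minimal illustration, not part of ossimport: the function names are hypothetical, and the result is rounded up because you cannot deploy a fraction of an instance. The second helper shows where the Z/100 rule of thumb comes from, assuming decimal units (1 TB = 1,000,000 MB) and a link that stays saturated 24 hours a day.

```python
import math


def daily_tb_rule_of_thumb(z_mbit_s: float) -> float:
    """Rule of thumb from this topic: about Z/100 TB per instance per day."""
    return z_mbit_s / 100


def daily_tb_exact(z_mbit_s: float) -> float:
    """Throughput per day under the stated assumptions (decimal units, full load)."""
    mb_per_second = z_mbit_s / 8           # 8 bits per byte
    mb_per_day = mb_per_second * 86_400    # seconds in a day
    return mb_per_day / 1_000_000          # MB -> TB (decimal)


def ecs_instances_needed(x_tb: float, y_days: float, z_mbit_s: float) -> int:
    """X / Y / (Z / 100), rounded up to a whole number of instances."""
    return math.ceil(x_tb / y_days / daily_tb_rule_of_thumb(z_mbit_s))


print(ecs_instances_needed(500, 7, 200))  # 36
print(daily_tb_exact(200))                # slightly above the 2 TB rule of thumb
```

The rule of thumb is a little conservative compared with the saturated-link figure, which leaves some margin for throughput that falls short of the nominal speed.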
Example calculation: To migrate 500 TB in 7 days at 200 Mbit/s per instance:
500 / 7 / (200 / 100) ≈ 35.7, which rounds up to 36 ECS instances
Migration network path
After you deploy ossimport in distributed mode on the ECS instances, the data flows as follows:
ossimport downloads data from COS in the Guangzhou region to ECS instances in the China (Hangzhou) region over the Internet.
ossimport uploads data from the ECS instances to the OSS bucket within the same region over the internal network.
Fees
The following fees are incurred during migration:
Fee type | Description |
Source access fees | Fees for accessing the migration source |
Destination access fees | Fees for accessing the destination bucket |
Outbound traffic fees | Outbound traffic fees charged by the migration source |
ECS instance fees | Fees for the ECS instances used during migration |
Data storage fees | Proportional to migration duration for data volumes greater than 1 TB |
ECS instance fees are lower than outbound traffic fees and storage fees. Using more ECS instances reduces migration time and can lower overall costs.
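One way to see why more instances can lower the total: under the sizing rule used in this topic, the number of instance-days needed stays roughly constant, while duration-dependent fees shrink as the migration gets shorter. The Python sketch below illustrates this under simplifying assumptions that are not from this topic: both prices are made-up placeholders rather than real rates, throughput scales linearly with the instance count, the full data volume is charged for storage on every day of the migration, and traffic and access fees are left out because they do not depend on the number of instances.

```python
def migration_days(x_tb: float, n_instances: int, z_mbit_s: float) -> float:
    """Days needed when each instance moves about Z/100 TB per day."""
    return x_tb / (n_instances * z_mbit_s / 100)


def ecs_cost(x_tb, n_instances, z_mbit_s, ecs_price_per_instance_day):
    """Instance fees: instances x days x placeholder daily price."""
    days = migration_days(x_tb, n_instances, z_mbit_s)
    return n_instances * days * ecs_price_per_instance_day


def storage_cost(x_tb, n_instances, z_mbit_s, storage_price_per_tb_day):
    """Duration-dependent storage fees (assumed: full volume, every day)."""
    days = migration_days(x_tb, n_instances, z_mbit_s)
    return x_tb * days * storage_price_per_tb_day


# Placeholder prices, chosen only to show the trend.
ECS_PRICE = 1.0       # hypothetical cost per instance per day
STORAGE_PRICE = 0.1   # hypothetical cost per TB per day

for n in (36, 72):
    total = (ecs_cost(500, n, 200, ECS_PRICE)
             + storage_cost(500, n, 200, STORAGE_PRICE))
    print(n, round(migration_days(500, n, 200), 2), round(total, 2))
```

In this model, doubling the instance count halves the duration and the storage component while leaving the ECS component unchanged. Real pricing may behave differently.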
Procedure
Step 1: Migrate historical data
Migrate all data last modified before time T1.
See Migration.
T1 is a UNIX timestamp representing the number of seconds that have elapsed since January 1, 1970, 00:00:00 UTC. Run the following command to obtain T1:
date +%s
Step 2: Configure mirroring-based back-to-origin rules
The origin continues to generate new data during the migration. To ensure business continuity and a seamless switchover, configure mirroring-based back-to-origin rules on the destination OSS bucket. After you configure these rules, if a requested object does not exist in OSS, OSS retrieves the object from the origin and returns it to the requester.
See Overview.
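The read path that a mirroring-based back-to-origin rule provides can be modeled with a short Python sketch. This is a conceptual illustration only: real rules are configured on the bucket, not written in application code, and the dictionaries and the function name here are hypothetical stand-ins for the OSS bucket and the origin. Saving the fetched object into the bucket is an assumption that this topic does not state.

```python
from typing import Optional


def get_object(key: str, oss_bucket: dict, origin: dict) -> Optional[bytes]:
    """Return the object from OSS, falling back to the origin on a miss."""
    if key in oss_bucket:
        return oss_bucket[key]      # already migrated: served from OSS
    if key in origin:
        data = origin[key]          # miss: retrieved from the origin
        oss_bucket[key] = data      # assumption: the copy is kept in OSS
        return data
    return None                     # object exists in neither location


# Toy data: one migrated object, one created at the origin during migration.
oss = {"old.txt": b"migrated"}
cos = {"old.txt": b"migrated", "new.txt": b"created during migration"}

print(get_object("new.txt", oss, cos))
print("new.txt" in oss)
```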
Step 3: Migrate incremental data
In the job.cfg configuration file, set the importSince parameter to T1 and restart the migration task. This migrates the incremental data generated between T1 and T2, where T2 is the time at which you switch your business system to OSS (see Step 4).
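As an illustration of this edit, the following Python sketch rewrites the importSince line in the text of a job.cfg file. It assumes that the file uses plain key=value lines. The helper name set_import_since, the sample file content, and the sample timestamp are hypothetical, and the sketch does not restart the migration task.

```python
import re

# Matches an uncommented "importSince=..." line up to the end of that line.
_IMPORT_SINCE = re.compile(r"^importSince[ \t]*=[^\r\n]*", flags=re.MULTILINE)


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return cfg_text with importSince set to the UNIX timestamp t1."""
    new_line = f"importSince={t1}"
    if _IMPORT_SINCE.search(cfg_text):
        return _IMPORT_SINCE.sub(new_line, cfg_text)
    # Key not present: append it, keeping the file newline-terminated.
    separator = "" if cfg_text == "" or cfg_text.endswith("\n") else "\n"
    return cfg_text + separator + new_line + "\n"


# Hypothetical file content and timestamp, for illustration only.
sample = "jobName=example_job\nimportSince=0\n"
print(set_import_since(sample, 1700000000))
```

To apply the change to a real file, read conf/job.cfg into a string, pass it through the helper together with the T1 value recorded in Step 1, and write the result back before you restart the task.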
Step 4: Switch your business system to OSS
Switch all read and write operations in your business system to OSS. Record this time as T2.
After this step is complete:
All read and write operations are performed on OSS.
Data stored in third-party storage is only a copy of historical data. Retain or delete it based on your business requirements.
ossimport only migrates and verifies data. It does not delete data.