You can use ossimport to migrate data from local storage, third-party storage, or Object Storage Service (OSS) buckets in a region to OSS buckets in different regions. This topic describes how to use ossimport to migrate data from a third-party storage service to OSS.
Assume that a user has 500 TB of data stored in the Guangzhou region of Tencent Cloud Object Storage (COS). The user wants to use ossimport to migrate the data to a bucket in the China (Hangzhou) region of OSS within a week. During the migration process, business operations must continue to run normally.
- The standalone mode is applicable when you migrate data volumes smaller than 30 TB.
- The distributed mode is suitable for migrating data volumes larger than 30 TB.
To migrate large amounts of data, deploy ossimport in distributed mode.
- Activate OSS. Create a bucket in the China (Hangzhou) region.
- Configure a Resource Access Management (RAM) user and grant OSS access permissions to the RAM user.
Create a RAM user in the RAM console. Authorize the RAM user to access OSS, and then save the AccessKey ID and the AccessKey secret. For more information, see Create a Resource Access Management (RAM) user and grant OSS access permissions to the RAM user.
- (Optional) Purchase an Elastic Compute Service (ECS) instance in the same region as the OSS bucket, which is China (Hangzhou) in this example. For more information about ECS instances, see General-purpose instance families. We recommend that you purchase a pay-as-you-go instance if you want to release the ECS instance after the data is migrated.
Note: If you want to deploy ossimport to a small number of machines, you can deploy it locally. If you want to deploy ossimport to a large number of machines, we recommend that you deploy it on ECS instances. ECS instances are used in this example to show how to perform a migration task.
The number of required ECS instances is calculated based on the following formula: Number of required ECS instances = X/Y/(Z/100). In the formula, X is the amount of data to be migrated in TB, Y is the required duration in days, and Z is the migration speed of a single ECS instance in Mbit/s. An instance that migrates data at Z Mbit/s transfers about Z/100 TB of data each day. In the preceding example, if the migration speed of an ECS instance reaches 200 Mbit/s (about 2 TB of data each day), 500/7/2 is approximately 35.7, so you need to purchase 36 ECS instances.
- Configure ossimport properly.
To meet the large-scale migration requirements in this example, you must deploy ossimport in distributed mode on ECS. For more information about the configuration and definition of distributed deployment, such as conf/sys.properties and concurrency control, see Architectures and configurations. For more information about operations on distributed deployment, such as downloading ossimport and troubleshooting common configuration errors, see Distributed deployment.
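The instance-count formula in the prerequisites can be checked with a short calculation. The following is a minimal sketch, not part of ossimport: the function name `required_ecs_instances` and its parameter names are hypothetical, and rounding up to a whole instance is an assumption based on the example rounding 35.7 up to 36.

```python
import math


def required_ecs_instances(data_tb: float, days: float, speed_mbit_s: float) -> int:
    """Estimate the ECS instance count as X / Y / (Z / 100), rounded up.

    data_tb      -- X, amount of data to migrate, in TB
    days         -- Y, required migration duration, in days
    speed_mbit_s -- Z, migration speed of one instance, in Mbit/s
    """
    if data_tb < 0 or days <= 0 or speed_mbit_s <= 0:
        raise ValueError("data_tb must be >= 0; days and speed_mbit_s must be > 0")
    # Rule of thumb from the text: Z Mbit/s is about Z/100 TB per day.
    tb_per_day_per_instance = speed_mbit_s / 100
    # Assumption: a fractional result is rounded up to a whole instance.
    return math.ceil(data_tb / days / tb_per_day_per_instance)


if __name__ == "__main__":
    # 500 TB in 7 days at 200 Mbit/s per instance, as in the example.
    print(required_ecs_instances(500, 7, 200))
```

With the values from this example (500 TB, 7 days, 200 Mbit/s), the function returns 36, which matches the figure given above.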
The following process describes how to migrate data from a third-party storage service to OSS in distributed mode.
Costs involved in the migration process include the fees incurred when the source and destination buckets are accessed, outbound traffic fees for the source bucket, ECS instance fees, and data storage fees. If you migrate more than 1 TB of data, the storage cost and the migration period increase in proportion to the data volume. ECS fees are low compared with the data transfer and storage fees, and using more ECS instances shortens the migration period.
- Migrate all data last modified before T1. For more information, see the Running section in Distributed deployment.
Important: T1 is a Unix timestamp, which is the number of seconds that have elapsed since January 1, 1970, 00:00:00 UTC. You can run the date +%s command to obtain the value.
- Configure a mirroring-based back-to-origin rule. The origin keeps generating new data during the migration. To ensure business continuity and a seamless switchover, you need to configure a back-to-origin rule. When objects that are requested by users do not exist in OSS, OSS retrieves these objects from the origin and returns the objects to the users. For more information, see Overview.
- Switch the read/write operations on the business system to OSS. Record the time of the switchover as T2.
- Open the job.cfg configuration file and specify importSince=T1. Then restart the migration task to migrate the incremental data last modified between T1 and T2.
Note:
- After Step 4 is complete, all read and write operations on your business system are switched to OSS. Data stored in the third-party storage service is only a copy of historical data, which can be retained or deleted.
- ossimport only migrates and verifies data. It does not delete data.
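The T1 bookkeeping in the preceding steps can be scripted. The following is a minimal sketch under stated assumptions: it treats job.cfg as plain key=value text, and the helper names `current_unix_seconds` and `set_import_since` are hypothetical, not part of ossimport. Only the importSince key is taken from this topic. The timestamp is in whole seconds, the same unit that date +%s prints.

```python
import re
import time


def current_unix_seconds() -> int:
    """Return the current Unix time in whole seconds, like `date +%s`."""
    return int(time.time())


def set_import_since(cfg_text: str, t1: int) -> str:
    """Return cfg_text with importSince set to t1.

    Assumes one key=value pair per line. An existing importSince line is
    replaced; otherwise a new line is appended.
    """
    line = f"importSince={t1}"
    pattern = re.compile(r"^importSince[ \t]*=.*$", flags=re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(line, cfg_text, count=1)
    # Add a separating newline only if the text does not already end with one.
    separator = "" if (not cfg_text or cfg_text.endswith("\n")) else "\n"
    return cfg_text + separator + line + "\n"


if __name__ == "__main__":
    t1 = current_unix_seconds()  # record T1 when the full migration starts
    sample = "someKey=someValue\nimportSince=0\n"  # hypothetical file content
    print(set_import_since(sample, t1))
```

Record T1 once, when the full migration in Step 1 starts, and reuse the same value when you edit job.cfg in Step 4. Taking a fresh timestamp at Step 4 would skip the objects modified in between.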
For more information about ossimport, see the following topics: