This topic describes how to migrate data to Object Storage Service (OSS) or OSS-HDFS.
Migrate data to OSS
You can migrate data from local devices, third-party storage devices, or a source OSS bucket to a destination OSS bucket. The following table describes the methods that you can use to migrate data to OSS.
| Migration method | Description | References |
| --- | --- | --- |
| Data Online Migration | Migrate data from third-party storage services to OSS, or between OSS buckets across accounts, across regions, or within the same region. You do not need to set up an environment for migration tasks: you submit a migration job online and monitor the migration progress. | |
| ossimport | Migrate historical data to OSS in batches from various sources, including local storage devices, Qiniu Cloud Object Storage (KODO), Baidu Object Storage (BOS), Amazon Simple Storage Service (Amazon S3), Azure Blob Storage, UPYUN Storage Service (USS), Tencent Cloud Object Storage (COS), Kingsoft Standard Storage Service (KS3), HTTP sources, and other OSS buckets. The supported sources can be extended based on your business requirements. | |
| ossutil | Migrate large amounts of historical data from various sources to OSS in batches. | |
| Mirroring-based back-to-origin | Seamlessly migrate data from an origin, such as a self-managed server or another cloud service, to OSS without interrupting your business. After you use a tool such as ossimport to migrate historical data and switch your business to OSS, a request for an object that has not yet been migrated triggers mirroring-based back-to-origin: OSS retrieves the object from the origin, returns it to the requester, and stores it in the bucket. This ensures business continuity during the migration. | |
| Cross-region replication (CRR) | Replicate objects asynchronously from an OSS bucket in one region to a bucket in a different region. | |
| Data Transport | Migrate terabytes to petabytes of data from a local data center to OSS. | |
| OSS API or OSS SDK | Call the OSS API or use an OSS SDK to programmatically migrate data to OSS. This method is well suited to developers who need custom migration logic. | |
| OSS external tables (gpossext) | Use the OSS external table (gpossext) feature of AnalyticDB for PostgreSQL to import data from OSS into AnalyticDB for PostgreSQL tables or export table data to OSS. | |
| Jindo DistCp | Copy files within or between large-scale clusters. Jindo DistCp uses MapReduce to distribute files, handle errors, and recover from failures. Lists of files and directories are used as the input of the MapReduce tasks, and each task copies some of the files and directories in its input list. | |
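As an illustration of the SDK-based method, the sketch below uses the OSS Python SDK (`oss2`) to walk a local directory and upload each file to a destination bucket. The helper names (`object_key_for`, `migrate_directory`), the directory path, the endpoint, and the bucket name are illustrative assumptions, not values from this topic; `put_object_from_file` is the SDK call that uploads a local file as an object.

```python
import os
import posixpath

def object_key_for(local_root, file_path, prefix=""):
    """Map a local file path to an OSS object key, optionally under a prefix.

    OSS object keys always use forward slashes, regardless of the local OS.
    """
    rel_parts = os.path.relpath(file_path, local_root).split(os.sep)
    return posixpath.join(prefix, *rel_parts) if prefix else "/".join(rel_parts)

def migrate_directory(local_root, bucket, prefix=""):
    """Upload every file under local_root to an oss2.Bucket instance."""
    for dirpath, _, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # put_object_from_file uploads one local file as one object.
            bucket.put_object_from_file(object_key_for(local_root, path, prefix), path)

# Example wiring (placeholder credentials, endpoint, and bucket name):
#   import oss2
#   auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
#   bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "examplebucket")
#   migrate_directory("/data/to-migrate", bucket, prefix="migrated/")
```

For large objects or unreliable networks, the SDK's multipart upload or resumable upload interfaces are a better fit than a single `put_object_from_file` call per file.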
Migrate data to OSS-HDFS
OSS-HDFS (JindoFS) is a cloud-native data lake storage service. It provides centralized metadata management and is fully compatible with the Hadoop Distributed File System (HDFS) API. OSS-HDFS also supports the Portable Operating System Interface (POSIX). You can use OSS-HDFS to manage data in data lake-based computing scenarios in the big data and AI fields. You can migrate data to OSS-HDFS or between buckets for which OSS-HDFS is enabled. The following table describes the methods that you can use to migrate data to OSS-HDFS.
| Migration method | Description | References |
| --- | --- | --- |
| Jindo DistCp | Copy files within or between large-scale clusters. Jindo DistCp uses MapReduce to distribute files, handle errors, and recover from failures. Lists of files and directories are used as the input of the MapReduce tasks, and each task copies some of the files and directories in its input list. | |
| JindoDistJob | Migrate the full or incremental metadata of files from a semi-hosted JindoFS cluster to OSS-HDFS without copying the underlying data blocks. | |
| MoveTo command of JindoTable | Copy the underlying data and then automatically update the metadata, so that a table or specific partitions are fully migrated to the destination path. To migrate a large number of partitions at a time, you can specify filter conditions in the MoveTo command. JindoTable also provides safeguards that ensure data integrity and security when the MoveTo command is used to migrate data. | Use the JindoTable MoveTo command to migrate Hive tables and partitions to OSS-HDFS |