This topic describes how to migrate data from Hadoop Distributed File System (HDFS) or Object Storage Service (OSS) to JindoFileSystem (JindoFS), which stores its data in OSS.

Migrate data

  • Use Hadoop FS shell commands

    You can run File System (FS) shell commands to migrate a small amount of data:

    • hadoop fs -cp hdfs://emr-cluster/README.md jfs://emr-jfs/
    • hadoop fs -cp oss://oss_bucket/README.md jfs://emr-jfs/
  • Use Hadoop DistCp

    You can use DistCp, a built-in tool of Hadoop, to migrate a large amount of data:

    • hadoop distcp hdfs://emr-cluster/files jfs://emr-jfs/output/
    • hadoop distcp oss://oss_bucket/files jfs://emr-jfs/output/
    Note: For more information about DistCp parameters, see DistCp Version2 Guide.
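
The DistCp migration can be wrapped in a small script that is safe to re-run and that checks the result. The following is a minimal sketch, not a definitive procedure. The SRC, DST, MAPS, and DRY_RUN variables and the run helper are hypothetical names introduced for illustration, and the value 20 for -m is an arbitrary example. The script assumes that hadoop fs -count reports comparable totals for the source and for the jfs:// destination. Note that with -update, DistCp copies the contents of the source directory into the destination instead of creating a files subdirectory under it.

```shell
#!/usr/bin/env bash
# Sketch: migrate a directory with DistCp, then print counts for both sides.
# SRC, DST, and MAPS are illustrative defaults, not values from a real cluster.
set -euo pipefail

SRC="${SRC:-hdfs://emr-cluster/files}"
DST="${DST:-jfs://emr-jfs/output/}"
MAPS="${MAPS:-20}"        # -m: maximum number of simultaneous copies
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, do not run them

run() {
  # Print the command; execute it only when DRY_RUN is not 1.
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# -update copies only files that are missing or differ at the destination,
# so the job can be re-run after a partial failure.
run hadoop distcp -update -m "$MAPS" "$SRC" "$DST"

# Print directory count, file count, and total bytes for each side
# so that the two lines can be compared by eye.
run hadoop fs -count "$SRC"
run hadoop fs -count "$DST"
```

By default the script only prints the commands. Set DRY_RUN=0 to run them on a cluster node where the hadoop command is available.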

Use the cache mode

In cache mode, JindoFS stores data files as objects in OSS and leaves their data and metadata unchanged. When you access these objects, JindoFS can cache their data and metadata in the local cluster, which speeds up subsequent access. For more information, see Use the cache mode.