Lindorm: Migrate data from OSS

Last Updated: Mar 30, 2026

Use Hadoop DistCp to copy data from an Object Storage Service (OSS) bucket to a Lindorm file database in a single MapReduce job.

Prerequisites

Before you begin, ensure that you have:

Install the JindoFS SDK

Install the JindoFS SDK on every node in your Hadoop cluster before running the migration.

  1. Download jindofs-sdk.jar from the JindoFS SDK repository, then copy it to the Hadoop library directory:

    cp ./jindofs-sdk-*.jar ${HADOOP_HOME}/share/hadoop/hdfs/lib/
  2. Add the following environment variable to /etc/profile on each node:

    export B2SDK_CONF_DIR=/etc/jindofs-sdk-conf
  3. Create the JindoFS SDK configuration file at /etc/jindofs-sdk-conf/bigboot.cfg:

    [bigboot]
    logger.dir=/tmp/bigboot-log
    
    [bigboot-client]
    client.oss.retry=5
    client.oss.upload.threads=4
    client.oss.upload.queue.size=5
    client.oss.upload.max.parallelism=16
    client.oss.timeout.millisecond=30000
    client.oss.connection.timeout.millisecond=4000
  4. Load the environment variable:

    source /etc/profile
  5. Verify that your OSS bucket is accessible from the Hadoop cluster:

    ${HADOOP_HOME}/bin/hadoop fs -ls oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/

    If the command returns the bucket contents without errors, the SDK is configured correctly.
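Because steps 2 and 3 must be repeated on every node, it can help to script them. The following is a minimal sketch, assuming the paths and configuration values shown above. The function names `write_bigboot_cfg` and `add_conf_dir_export` are illustrative and are not part of the JindoFS SDK.

```shell
# Write bigboot.cfg into the given directory (default: /etc/jindofs-sdk-conf).
# The values are the ones listed in step 3 of this guide.
write_bigboot_cfg() {
  local conf_dir="${1:-/etc/jindofs-sdk-conf}"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/bigboot.cfg" <<'EOF'
[bigboot]
logger.dir=/tmp/bigboot-log

[bigboot-client]
client.oss.retry=5
client.oss.upload.threads=4
client.oss.upload.queue.size=5
client.oss.upload.max.parallelism=16
client.oss.timeout.millisecond=30000
client.oss.connection.timeout.millisecond=4000
EOF
}

# Append the B2SDK_CONF_DIR export to a profile file (default: /etc/profile),
# but only if the exact line is not already present, so reruns are harmless.
add_conf_dir_export() {
  local profile="${1:-/etc/profile}"
  local conf_dir="${2:-/etc/jindofs-sdk-conf}"
  local line="export B2SDK_CONF_DIR=${conf_dir}"
  grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}

# Example, run with root privileges on each node:
#   write_bigboot_cfg
#   add_conf_dir_export
```

Both functions accept alternative paths as arguments, which makes it possible to try them against a scratch directory before touching /etc.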

Migrate data from the OSS bucket

  1. Check the size of the data to migrate:

    ${HADOOP_HOME}/bin/hadoop fs -du -h oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data
  2. Run DistCp to start a MapReduce job that copies the data to the Lindorm file database:

    ${HADOOP_HOME}/bin/hadoop distcp \
      oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data.txt \
      hdfs://<instance-id>/

    Replace <instance-id> with your Lindorm instance ID.

    The following table describes the parameters:

    | Parameter | Description | Required |
    | accessKeyId | The AccessKey ID used to authenticate OSS API calls. To get your AccessKey pair, see Create an AccessKey pair. | Yes |
    | accessKeySecret | The AccessKey Secret used to authenticate OSS API calls. | Yes |
    | bucket-name.endpoint | The OSS bucket access address, consisting of the bucket name and the endpoint for the region where the bucket is deployed. | Yes |
  3. Check the job output. The migration is complete when the output shows all of the following:

    • map 100% reduce 0% (DistCp runs as a map-only job, so reduce stays at 0%)

    • Job job_xxx completed successfully

    • BYTESCOPIED equals BYTESEXPECTED

    Example output:

    20/09/29 12:23:59 INFO mapreduce.Job:  map 100% reduce 0%
    20/09/29 12:23:59 INFO mapreduce.Job: Job job_1601195105349_0015 completed successfully
    20/09/29 12:23:59 INFO mapreduce.Job: Counters: 38
     File System Counters
      FILE: Number of bytes read=0
      FILE: Number of bytes written=122343
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=470
      HDFS: Number of bytes written=47047709
      HDFS: Number of read operations=15
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=4
      OSS: Number of bytes read=0
      OSS: Number of bytes written=0
      OSS: Number of read operations=0
      OSS: Number of large read operations=0
      OSS: Number of write operations=0
     Job Counters
      Launched map tasks=1
      Other local map tasks=1
      Total time spent by all maps in occupied slots (ms)=5194
      Total time spent by all reduces in occupied slots (ms)=0
      Total time spent by all map tasks (ms)=5194
      Total vcore-milliseconds taken by all map tasks=5194
      Total megabyte-milliseconds taken by all map tasks=5318656
     Map-Reduce Framework
      Map input records=1
      Map output records=0
      Input split bytes=132
      Spilled Records=0
      Failed Shuffles=0
      Merged Map outputs=0
      GC time elapsed (ms)=64
      CPU time spent (ms)=2210
      Physical memory (bytes) snapshot=222294016
      Virtual memory (bytes) snapshot=2672074752
      Total committed heap usage (bytes)=110100480
     File Input Format Counters
      Bytes Read=338
     File Output Format Counters
      Bytes Written=0
     org.apache.hadoop.tools.mapred.CopyMapper$Counter
      BYTESCOPIED=47047709
      BYTESEXPECTED=47047709
      COPY=1
    20/09/29 12:23:59 INFO common.AbstractJindoFileSystem: Read total statistics: oss read average -1 us, cache read average -1 us, read oss percent 0%
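The checks in step 3 can also be automated. The following is a minimal sketch, assuming the DistCp output was saved to a file (for example by appending `2>&1 | tee distcp.log` to the command in step 2) and that the counters appear in the format shown in the example output. The function name `distcp_log_ok` and the file name `distcp.log` are illustrative.

```shell
# Return 0 if the saved DistCp output reports a successful job and the
# BYTESCOPIED counter equals BYTESEXPECTED; return 1 otherwise.
distcp_log_ok() {
  local log="$1" copied expected
  # The job must have reported success.
  grep -q 'completed successfully' "$log" || return 1
  # Pull the numeric value of each counter; keep the last occurrence.
  copied="$(sed -n 's/^[[:space:]]*BYTESCOPIED=\([0-9][0-9]*\).*$/\1/p' "$log" | tail -n 1)"
  expected="$(sed -n 's/^[[:space:]]*BYTESEXPECTED=\([0-9][0-9]*\).*$/\1/p' "$log" | tail -n 1)"
  # Both counters must be present and identical.
  [ -n "$copied" ] && [ "$copied" = "$expected" ]
}

# Example:
#   distcp_log_ok distcp.log && echo "DistCp output looks complete"
```

The function treats a missing counter as a failure, so a truncated log is not mistaken for a successful run.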

Verify the migration

Check the size of the data migrated to the Lindorm file database, and compare it with the size of the source data that you checked before the migration:

    ${HADOOP_HOME}/bin/hadoop fs -du -s -h hdfs://<instance-id>/
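To compare source and destination sizes exactly rather than by eye, the byte counts can be extracted and compared in a script. The following is a minimal sketch, assuming that `hadoop fs -du -s` without `-h` prints the size in bytes as the first field of its output. The helper names `du_bytes` and `sizes_match` are illustrative, and the destination path in the example is an assumption about where DistCp placed the data.

```shell
# Read `hadoop fs -du -s` output on stdin and print the first field
# of the first line, which is assumed to be the size in bytes.
du_bytes() {
  awk 'NR == 1 { print $1 }'
}

# Compare two byte counts; print the result and return non-zero on mismatch
# or when the counts are empty.
sizes_match() {
  local src="$1" dst="$2"
  if [ -n "$src" ] && [ "$src" = "$dst" ]; then
    echo "OK: ${src} bytes on both sides"
  else
    echo "MISMATCH: source=${src} destination=${dst}"
    return 1
  fi
}

# Example (placeholders as in the commands above; adjust the destination path
# to wherever the data landed in your instance):
#   src="$("${HADOOP_HOME}/bin/hadoop" fs -du -s oss://<accessKeyId>:<accessKeySecret>@<bucket-name>.<endpoint>/test_data | du_bytes)"
#   dst="$("${HADOOP_HOME}/bin/hadoop" fs -du -s hdfs://<instance-id>/test_data | du_bytes)"
#   sizes_match "$src" "$dst"
```

Comparing a specific destination path instead of the root directory avoids counting data that was already in the instance before the migration.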