E-MapReduce: FAQ about Jindo DistCp

Last Updated: Jun 01, 2023

This topic provides answers to some frequently asked questions about Jindo DistCp.

What do I do if objects are listed at a low speed?

  • Problem description

    When I use Jindo DistCp, objects are listed at a low speed, and the following message is returned:

    Successfully list objects with prefix xxx/yyy/ in bucket xxx recursive 0 result 315 dur 100036.615031MS

    In the message, dur 100036.615031MS is the time taken to list the objects, in milliseconds, and result 315 is the number of objects listed. Under normal conditions, about 1,000 Object Storage Service (OSS) objects can be listed per second. Compare the reported time against this rate to determine whether listing a directory is abnormally slow. In the preceding message, listing 315 objects in a directory took about 100 seconds, which is abnormal.

  • Solution

    Run the following command to increase the memory available to your client. The -Xmx4096m option sets the maximum JVM heap size to 4,096 MB:

    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx4096m"
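
To judge whether a listing time like the one in the message above is abnormal, the listing rate can be computed from the log line and compared with the normal rate of about 1,000 objects per second. The following is a minimal sketch, not part of Jindo DistCp. The list_rate function name is hypothetical, and the sketch assumes that the result and dur fields appear exactly as in the sample message.

```shell
#!/bin/sh
# Hypothetical helper (not a Jindo DistCp command): prints the number of
# objects listed per second for one "Successfully list objects" log line.
# Assumes the fields "result <count>" and "dur <milliseconds>MS" appear
# as in the sample message.
list_rate() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "result") count = $(i + 1) + 0
      if ($i == "dur") {
        ms = $(i + 1)
        sub(/MS$/, "", ms)   # strip the unit suffix
        ms += 0              # force a numeric value
      }
    }
    if (ms > 0) printf "%.2f\n", count / (ms / 1000)
  }'
}

list_rate "Successfully list objects with prefix xxx/yyy/ in bucket xxx recursive 0 result 315 dur 100036.615031MS"
```

For the sample message, the sketch prints 3.15, which is far below the normal rate.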

What do I do if a checksum-related error occurs?

  • Problem description

    The following error message is reported when Jindo DistCp is used:

    Failed to get checksum store.

  • Solution

    By default, OSS-HDFS uses the COMPOSITE_CRC checksum algorithm. If the dfs.checksum.combine.mode parameter of HDFS is set to MD5MD5CRC, set the fs.oss.checksum.combine.mode parameter to MD5MD5CRC as well so that both sides use the same algorithm. Sample command:

    hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ --hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC
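
Whether the extra option is needed depends on the checksum mode configured for HDFS. The following sketch shows one way to derive the option from that mode. The checksum_args function name is hypothetical. The usage lines assume that a Hadoop client is installed and that the dfs.checksum.combine.mode key is set in its configuration; on clusters where the key is not defined, the hdfs getconf call may return an error instead of a value.

```shell
#!/bin/sh
# Hypothetical helper (not a Jindo DistCp command): maps an HDFS checksum
# combine mode to the extra Jindo DistCp arguments described above.
checksum_args() {
  case "$1" in
    MD5MD5CRC)
      printf '%s\n' "--hadoopConf fs.oss.checksum.combine.mode=MD5MD5CRC"
      ;;
    *)
      # COMPOSITE_CRC already matches the OSS-HDFS default: print nothing.
      :
      ;;
  esac
}

# Usage sketch (requires a Hadoop client; left commented out on purpose):
# mode=$(hdfs getconf -confKey dfs.checksum.combine.mode)
# hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ $(checksum_args "$mode")
```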

What do I do if an error occurs when I copy an Object Storage Service (OSS) object to OSS-HDFS?

  • Problem description

    The following error message is returned when Jindo DistCp is used to copy an OSS object to OSS-HDFS:

    Exception raised while copying data file, verify checksum failed

  • Solution

    If the source objects in OSS were not originally migrated from HDFS to OSS by using Jindo DistCp, you must add the --disableChecksum parameter to disable the checksum feature. Sample command:

    hadoop jar jindo-distcp-${version}.jar --src oss://ossBucket/ --dest oss://dlsBucket/ --disableChecksum

How do I check whether Jindo DistCp is successfully run?

If you run Jindo DistCp without the --ignore parameter and an exception occurs during the copy process, the system reports an error and stops the copy operation. If you run Jindo DistCp with the --ignore parameter, view the Jindo DistCp counters, such as COPY_FAILED and CHECKSUM_DIFF, to check whether the copied data is complete. For more information, see Jindo DistCp counters in the Use Jindo DistCp topic.
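
When the --ignore parameter is used, the counter check can be scripted. The following sketch scans the job output for the two counters named above and fails if either is greater than zero. The check_counters function name is hypothetical, and the sketch assumes that counters are printed as NAME=value lines. Verify this against the output of your Jindo DistCp version before you rely on it.

```shell
#!/bin/sh
# Hypothetical helper (not a Jindo DistCp command): reads job output on
# stdin and returns a non-zero status if COPY_FAILED or CHECKSUM_DIFF is
# reported with a value greater than zero.
# Assumption: counters are printed as NAME=value lines.
check_counters() {
  awk -F= '
    { gsub(/^[ \t]+|[ \t]+$/, "", $1) }   # trim whitespace around the name
    ($1 == "COPY_FAILED" || $1 == "CHECKSUM_DIFF") && ($2 + 0) > 0 {
      print $1 "=" $2                     # report the offending counter
      bad = 1
    }
    END { exit bad + 0 }
  '
}

# Usage sketch (left commented out on purpose):
# hadoop jar jindo-distcp-${version}.jar --src /data --dest oss://destBucket/ --ignore 2>&1 \
#   | tee distcp.log | check_counters || echo "copy incomplete, see distcp.log"
```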