In an E-MapReduce (EMR) cluster, you can run hadoop fs commands to perform operations on files in the Hadoop Distributed File System (HDFS). This topic describes common HDFS commands.

Prerequisites

Background information

The following table describes common HDFS commands.

Command    Description
mkdir      Creates a directory in HDFS.
touchz     Creates an empty file in HDFS.
ls         Views the information about a file or a directory in a specified path. Specify an absolute path where possible; a relative path is resolved against your home directory in HDFS, which is /user/<username> by default.
put        Uploads a local file to a specified directory of HDFS.
du         Displays the size of a file or the size of each file stored in a directory.
cat        Views the content of a file in HDFS.
cp         Copies files or directories from the source to the destination in HDFS. The source files or directories remain unchanged. You can specify multiple source paths, in which case the destination must be a directory.
mv         Moves files or directories from the source to the destination in HDFS. The files or directories no longer exist in the source after the move. You can specify multiple source paths, in which case the destination must be a directory.
get        Downloads files that are stored in a specified directory of HDFS to a local directory.
rm         Deletes specified files from HDFS.
rmr        Recursively deletes specified directories in HDFS.

For more information about Apache Hadoop, visit the Apache Hadoop official website.

mkdir

Creates a directory in HDFS.

  • Syntax:
    hadoop fs -mkdir <path1> [path2] ... [pathn]
  • Examples:
    • Create a directory named dir in the root directory of HDFS.
      hadoop fs -mkdir /dir
      You can run the ls command to view the created directory.
    • Create a subdirectory named sub-dir in the /dir directory.
      hadoop fs -mkdir /dir/sub-dir
      You can run the ls command to view the created subdirectory.
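
Without further options, mkdir expects the parent directory to exist already. The -p option of hadoop fs -mkdir creates any missing parent directories, much like mkdir -p on Linux. The following is a minimal sketch that assumes the hadoop command is on the PATH of the node; the helper name hdfs_mkdir_p is hypothetical and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): create one or more HDFS
# directories, including any missing parent directories (-p).
hdfs_mkdir_p() {
  if [ "$#" -eq 0 ]; then
    echo "usage: hdfs_mkdir_p <path> [path ...]" >&2
    return 1
  fi
  # Assumes the hadoop command is available on PATH.
  hadoop fs -mkdir -p "$@"
}
```

For example, hdfs_mkdir_p /dir/sub-dir creates /dir first if it does not exist, and then creates /dir/sub-dir.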

touchz

Creates an empty file in HDFS.

  • Syntax:
    hadoop fs -touchz URI [URI ...]
  • Example: Create a file named emptyfile.txt in the /dir/ directory of HDFS.
    hadoop fs -touchz /dir/emptyfile.txt
    You can run the ls command to view the created file.

ls

Views the information about a file or a directory in a specified path. Specify an absolute path where possible; a relative path such as hello.txt is resolved against your home directory in HDFS, which is /user/<username> by default.

Note: This command only lists information. You cannot use it to switch to a directory, because hadoop fs has no equivalent of the cd command.
  • Syntax:
    hadoop fs -ls <path>
  • Examples:
    • View the information about the hello.txt file.
      hadoop fs -ls hello.txt
    • View the information about the /dir/sub-dir directory.
      hadoop fs -ls /dir/sub-dir
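
If a script needs to tell a missing path apart from other failures, it can check for the path before listing it. The sketch below uses hadoop fs -test -e, which exits with status 0 when the path exists. It assumes the hadoop command is on the PATH; the helper name hdfs_ls_if_exists is hypothetical.

```shell
# Hypothetical helper (not part of Hadoop): list a path only if it
# exists in HDFS. `hadoop fs -test -e <path>` exits with status 0
# when the path exists and with a non-zero status otherwise.
hdfs_ls_if_exists() {
  if hadoop fs -test -e "$1"; then
    hadoop fs -ls "$1"
  else
    echo "not found in HDFS: $1" >&2
    return 1
  fi
}
```

For example, hdfs_ls_if_exists /dir/sub-dir lists the directory, or prints a message to standard error and returns 1 if the directory does not exist.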

put

Uploads a local file to a specified directory of HDFS.

  • Syntax:
    hadoop fs -put <path1> <path2>
  • Example: Upload the hello.txt file to the /dir/sub-dir directory of HDFS.
    hadoop fs -put hello.txt /dir/sub-dir
    You can run the ls command to view the status of the upload operation.
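
By default, put fails if the destination file already exists. The -f option overwrites the existing file. The following sketch wraps that option and first checks that the local file is present. It assumes the hadoop command is on the PATH; the helper name hdfs_put_overwrite is hypothetical.

```shell
# Hypothetical helper (not part of Hadoop): upload a local file and
# replace any existing copy. The -f option of put overwrites the
# destination if it already exists.
hdfs_put_overwrite() {
  src=$1
  dest=$2
  # Check the local file before calling Hadoop.
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  hadoop fs -put -f "$src" "$dest"
}
```

For example, hdfs_put_overwrite hello.txt /dir/sub-dir uploads hello.txt again even if /dir/sub-dir/hello.txt already exists.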

du

Displays the size of a file or the size of each file stored in a directory.

  • Syntax:
    hadoop fs -du <path>
  • Examples:
    • View the size of the hello.txt file.
      hadoop fs -du hello.txt
    • View the size of each file stored in the /dir directory.
      hadoop fs -du /dir
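
When only the total size of a directory matters, the -s option of du prints one aggregate line instead of one line per file, and the -h option prints sizes with units such as K, M, and G. A minimal sketch, assuming the hadoop command is on the PATH; the helper name hdfs_du_total is hypothetical.

```shell
# Hypothetical helper (not part of Hadoop): print a single,
# human-readable total size for a file or directory in HDFS.
#   -s  aggregate the sizes under the path into one summary line
#   -h  print sizes with units instead of raw byte counts
hdfs_du_total() {
  hadoop fs -du -s -h "$1"
}
```

For example, hdfs_du_total /dir prints one summary line for the /dir directory.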

cat

Views the content of a file in HDFS.

  • Syntax:
    hadoop fs -cat <path>
  • Examples:
    • View the content of the hello.txt file.
      hadoop fs -cat hello.txt
    • View the content of the hello_world.txt file that is stored in the /dir/sub-dir/ directory.
      hadoop fs -cat /dir/sub-dir/hello_world.txt
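
The cat command prints the whole file, which can be inconvenient for large files. One common approach is to pipe the output to the local head command so that only the first lines are shown. The sketch below assumes the hadoop command is on the PATH; the helper name hdfs_head and its default of 10 lines are hypothetical choices for illustration.

```shell
# Hypothetical helper (not part of Hadoop): show the first lines of a
# file in HDFS by piping `hadoop fs -cat` to the local head command.
#   $1  path of the file in HDFS
#   $2  number of lines to show (optional, defaults to 10)
hdfs_head() {
  hadoop fs -cat "$1" | head -n "${2:-10}"
}
```

For example, hdfs_head /dir/sub-dir/hello_world.txt 5 prints the first five lines of the file.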

cp

Copies files or directories from the source to the destination in HDFS. The source files or directories remain unchanged. You can specify multiple source paths, in which case the destination must be a directory.

Note: You can also run this command to create a copy under a new name. This works like a rename that retains the original file or directory.
  • Syntax:
    hadoop fs -cp <path1> <path2>
  • Example: Copy the hello_world.txt file from the /dir/sub-dir/ directory to the /tmp directory.
    hadoop fs -cp /dir/sub-dir/hello_world.txt /tmp
    You can run the ls command to view the status of the copy operation.

mv

Moves files or directories from the source to the destination in HDFS. The files or directories no longer exist in the source after the move. You can specify multiple source paths, in which case the destination must be a directory.

  • Syntax:
    hadoop fs -mv <path1> <path2>
  • Examples:
    • Move the hello_world2.txt file from the /tmp/ directory to the /dir/sub-dir/ directory.
      hadoop fs -mv /tmp/hello_world2.txt /dir/sub-dir/
      You can run the ls command to view the status of the move operation.
    • Move the test directory in the /tmp/ directory to the /dir/sub-dir/ directory.
      hadoop fs -mv /tmp/test /dir/sub-dir/
      You can run the ls command to view the status of the move operation.

get

Downloads files that are stored in a specified directory of HDFS to a local directory.

  • Syntax:
    hadoop fs -get <path1> <path2>
  • Example: Download the hello_world2.txt file that is stored in the /dir/sub-dir/ directory of HDFS to the local /emr directory.
    hadoop fs -get /dir/sub-dir/hello_world2.txt /emr
    You can run the ls command on the local file system, for example ls /emr, to view the status of the download operation.

rm

Deletes specified files from HDFS.

  • Syntax:
    hadoop fs -rm <path>
  • Example: Delete the hello_world2.txt file from the /dir/sub-dir/ directory of HDFS.
    hadoop fs -rm /dir/sub-dir/hello_world2.txt
    You can run the ls command to view the status of the delete operation.

rmr

Recursively deletes specified directories in HDFS. In current Apache Hadoop releases, rmr is deprecated in favor of rm -r.

  • Syntax:
    hadoop fs -rmr <path>
  • Example: Recursively delete the sub-dir directory in the /dir/ directory of HDFS.
    hadoop fs -rmr /dir/sub-dir/
    You can run the ls command to view the status of the recursive delete operation.
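
A recursive delete can also be written as hadoop fs -rm -r <path>. Because a recursive delete is hard to undo, a script may want a basic guard before it runs. The sketch below refuses an empty argument and the root path. It assumes the hadoop command is on the PATH; the helper name hdfs_rm_recursive and its guard rules are hypothetical.

```shell
# Hypothetical helper (not part of Hadoop): recursively delete an HDFS
# directory with `hadoop fs -rm -r`, refusing an empty argument or the
# root path as a basic safety check.
hdfs_rm_recursive() {
  case "${1:-}" in
    ""|"/")
      echo "refusing to delete: '${1:-}'" >&2
      return 1
      ;;
  esac
  hadoop fs -rm -r "$1"
}
```

For example, hdfs_rm_recursive /dir/sub-dir deletes the directory and everything in it, whereas hdfs_rm_recursive / returns 1 without calling Hadoop.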