In an E-MapReduce (EMR) cluster, you can run hadoop fs commands to perform operations on files in Hadoop Distributed File System (HDFS). This topic describes the common commands of HDFS.
Prerequisites
Before you run any commands, make sure that the following conditions are met:
Cluster access: You have logged on to a node in the cluster, typically the master node, using a method such as Secure Shell (SSH).
User permissions: The account that you use, such as the default hadoop user, must have read and write permissions on the destination HDFS path. For clusters that have Kerberos authentication enabled, you must complete identity authentication first.
Command versions
Hadoop provides two equivalent command formats for file system operations:
hdfs dfs <args>: This command is specific to HDFS.
hadoop fs <args>: This is a generic file system command. It can operate on multiple Hadoop-compatible file systems, such as HDFS and the local file system (file:///).
All examples in this topic use hadoop fs.
Command cheat sheet
The following table provides a quick reference for the most common HDFS commands.
| Command | Description | Common syntax |
| --- | --- | --- |
| mkdir | Creates a new directory in HDFS. | hadoop fs -mkdir [-p] <paths> |
| touchz | Creates an empty file of 0 bytes in HDFS. | hadoop fs -touchz URI [URI ...] |
| ls | Lists the files and directories in a specified path and their basic information. | hadoop fs -ls [-h] [-R] [-t] <args> |
| put | Copies one or more files from the local file system (on the EMR node where the command is run) to HDFS. | hadoop fs -put [-f] [-p] <localsrc> <dst> |
| get | Copies files or directories from HDFS to the local file system (on the EMR node where the command is run). | hadoop fs -get [-f] [-p] <src> <localdst> |
| cp | Copies files or directories within HDFS. | hadoop fs -cp [-f] URI [URI ...] <dest> |
| mv | Moves or renames files or directories within HDFS. | hadoop fs -mv URI [URI ...] <dest> |
| rm | Deletes files or directories in HDFS. | hadoop fs -rm [-f] [-r] [-skipTrash] URI [URI ...] |
| cat | Displays the content of files in HDFS. | hadoop fs -cat URI [URI ...] |
| du | Displays the size of a file or the total size of all files in a directory. | hadoop fs -du [-s] [-h] URI [URI ...] |
For more information about HDFS commands, see the Apache Hadoop official website.
Directory and file management
mkdir: Create a directory
Creates a new directory in HDFS.
Syntax
hadoop fs -mkdir [-p] <paths>
Parameter description
-p: Creates all parent directories in the path if they do not exist. This is similar to the mkdir -p command in Linux. This parameter is commonly used in production environments to prevent errors when a parent directory does not exist.
Example: Create the /dir directory in the HDFS file system.
hadoop fs -mkdir /dir
touchz: Create an empty file
Creates an empty file of 0 bytes in HDFS.
Syntax
hadoop fs -touchz URI [URI ...]
Scenarios
To serve as a marker file that indicates a task is complete.
To create an empty output file before data processing.
Example: Create the emptyfile.txt file in the /dir/ directory of the HDFS file system.
hadoop fs -touchz /dir/emptyfile.txt
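The marker-file scenario can be sketched as follows. The job output path and marker name are hypothetical; the pattern only assumes that downstream consumers check for the marker with hadoop fs -test -e.

```shell
# Sketch, assuming a hypothetical job that writes results to /data/output.
# The `command -v` guard lets the sketch run even where no hadoop client exists.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /data/output
  # After the job succeeds, drop a 0-byte marker that consumers can poll for.
  hadoop fs -touchz /data/output/_SUCCESS
  # A downstream consumer checks for the marker (exit code 0 means it exists):
  hadoop fs -test -e /data/output/_SUCCESS && echo "job complete"
  touchz_demo="ran"
else
  touchz_demo="skipped"   # no hadoop client available on this machine
fi
```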
ls: List files and directories
Lists the files and directories in a specified path and their basic information.
Syntax
hadoop fs -ls [-h] [-R] [-t] <args>
Parameter descriptions
-h: Displays file sizes in a human-readable format, such as 1K, 234M, or 2G.
-R: Recursively lists the content of all subdirectories.
-t: Sorts by modification time, with the newest files or directories first.
Example: View the created /dir directory.
hadoop fs -ls /dir
File transfer
put: Upload files to HDFS
Copies one or more files from the local file system (on the EMR node where the command is run) to HDFS.
Syntax
hadoop fs -put [-f] [-p] <localsrc> <dst>
Parameter descriptions
-f: Forcibly overwrites an existing file at the destination path.
-p: Preserves file access and modification times, ownership, and permissions.
Example: Upload the local file hello.txt to the /dir/sub-dir path in HDFS.
hadoop fs -put hello.txt /dir/sub-dir
get: Download files from HDFS
Copies files or directories from HDFS to the local file system (on the EMR node where the command is run).
Syntax
hadoop fs -get [-f] [-p] <src> <localdst>
Parameter descriptions
-f: Forcibly overwrites an existing file at the destination path.
-p: Preserves file access and modification times, ownership, and permissions.
Example: Copy the HDFS file /dir/emptyfile.txt to the local / path.
hadoop fs -get /dir/emptyfile.txt /
File operations
cp: Copy files or directories
Copies files or directories within HDFS.
Syntax
hadoop fs -cp [-f] URI [URI ...] <dest>
Parameter description
-f: Forcibly overwrites an existing file at the destination path.
Example: Copy the hello_world.txt file from the /dir/sub-dir/ directory to the /tmp directory.
hadoop fs -cp /dir/sub-dir/hello_world.txt /tmp
mv: Move or rename files or directories
Moves or renames files or directories within HDFS. This is an atomic operation. When you move files within the same file system, data blocks are not moved. Only the metadata is updated, which makes the operation very fast.
Syntax
hadoop fs -mv URI [URI ...] <dest>
Examples
Move the hello_world2.txt file from the /tmp/ directory to the /dir/sub-dir/ directory.
hadoop fs -mv /tmp/hello_world2.txt /dir/sub-dir/
Move the test directory from the /tmp/ path to the /dir/sub-dir/ directory.
hadoop fs -mv /tmp/test /dir/sub-dir/
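A rename uses the same command as a move: keep the parent directory and change only the name. The file names below are hypothetical.

```shell
# Hypothetical rename: same parent directory, new name; only metadata changes.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mv /dir/sub-dir/hello_world.txt /dir/sub-dir/hello_world_v2.txt
  mv_demo="ran"
else
  mv_demo="skipped"   # no hadoop client available on this machine
fi
```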
rm: Delete files or directories
Deletes files or directories in HDFS.
Syntax
hadoop fs -rm [-f] [-r] [-skipTrash] URI [URI ...]
Parameter descriptions
-r: Recursively deletes a directory and all of its contents. This parameter is required to delete a directory.
-f: Forces the deletion. The command does not report an error if the file does not exist.
-skipTrash: Permanently deletes the file or directory, skipping the recycle bin. Use this option with extreme caution. By default, deleted items are moved to the current user's recycle bin directory, /user/<username>/.Trash/.
Example: Delete the hello_world2.txt file from the /dir/sub-dir/ directory in HDFS.
hadoop fs -rm /dir/sub-dir/hello_world2.txt
The hadoop fs -rmr command is deprecated. Use hadoop fs -rm -r instead to recursively delete directories.
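Because default deletions go to the recycle bin, a file removed by mistake can usually be moved back. The user and paths below are hypothetical; the layout (a Current subdirectory under .Trash that mirrors the original absolute path) is the standard HDFS trash convention.

```shell
# Sketch: restore a file that was deleted without -skipTrash (hypothetical paths).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -rm /dir/sub-dir/hello_world.txt          # goes to the recycle bin
  # Move it back from the trash, which mirrors the original path:
  hadoop fs -mv /user/hadoop/.Trash/Current/dir/sub-dir/hello_world.txt \
               /dir/sub-dir/hello_world.txt
  rm_demo="ran"
else
  rm_demo="skipped"   # no hadoop client available on this machine
fi
```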
File viewing
cat: View file content
Displays the content of files in HDFS.
Syntax
hadoop fs -cat URI [URI ...]
Examples
View the content of the hello.txt file.
hadoop fs -cat /hello.txt
View the content of the hello_world.txt file in the /dir/sub-dir/ directory.
hadoop fs -cat /dir/sub-dir/hello_world.txt
du: Display file size
Displays the size of a file or the total size of all files in a directory.
Syntax
hadoop fs -du [-s] [-h] URI [URI ...]
Parameter descriptions
-s: Displays only a summarized total size instead of the size of each file or directory.
-h: Displays sizes in a human-readable format.
Examples
View the size of a file.
hadoop fs -du /hello.txt
View the total size of all files in a directory.
hadoop fs -du /dir
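The two parameters are commonly combined to get a single human-readable total for a directory tree:

```shell
# One summarized, human-readable total for everything under /dir.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -du -s -h /dir
  du_demo="ran"
else
  du_demo="skipped"   # no hadoop client available on this machine
fi
```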
FAQ
Permission denied
Cause: The current user does not have the required read, write, or execute permission on the destination file or directory.
Solution:
Run the hdfs dfs -ls <parent_dir> command to check the permissions of the file or its parent directory.
Contact an administrator to grant permissions by using the chmod or chown command.
If Kerberos is enabled in the environment, confirm that you have obtained a ticket by using the kinit command.
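The checks for a "Permission denied" error can be run in sequence as a quick diagnostic. The path /dir/sub-dir is hypothetical; klist and kinit are the standard Kerberos ticket tools.

```shell
# Quick diagnostic for "Permission denied" on a hypothetical path /dir/sub-dir.
if command -v hadoop >/dev/null 2>&1; then
  whoami                          # which account is issuing the command?
  hadoop fs -ls -d /dir/sub-dir   # -d shows the directory entry itself, with its mode bits
  hadoop fs -ls /dir              # permissions of the parent directory
  if command -v klist >/dev/null 2>&1; then
    klist                         # on Kerberos clusters: is there a valid ticket?
  fi
  perm_demo="ran"
else
  perm_demo="skipped"   # no hadoop client available on this machine
fi
```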
SafeModeException: NameNode is in safe mode
Cause: The NameNode enters safe mode during startup and does not accept write operations during this period.
Solution: Wait a few minutes for the NameNode to automatically exit safe mode. You can run the hdfs dfsadmin -safemode get command to check the status. Do not manually force an exit from safe mode unless it is an emergency.
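Instead of forcing an exit, a script can poll until safe mode clears. A minimal sketch, assuming hdfs dfsadmin -safemode get prints a line containing ON while safe mode is active; the polling interval and timeout are arbitrary choices.

```shell
# Poll the NameNode until it leaves safe mode (at most ~5 minutes in this sketch).
if command -v hdfs >/dev/null 2>&1; then
  for i in $(seq 1 30); do
    hdfs dfsadmin -safemode get | grep -q "ON" || break   # stop once safe mode is OFF
    sleep 10
  done
  safemode_demo="ran"
else
  safemode_demo="skipped"   # no hdfs client available on this machine
fi
```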
No such file or directory
Cause: The specified path does not exist.
Solution: Check the path for spelling errors. If you are writing a file, make sure that its parent directory exists. You can also use the -p parameter with the mkdir command to create parent directories.
StandbyException: Operation category READ is not supported in state standby
Cause: In a high availability (HA) configuration, a read or write request was sent to a NameNode that is in the standby state.
Solution: Check the Hadoop configuration file (core-site.xml) to make sure that fs.defaultFS points to the HA NameService name, such as hdfs://mycluster, instead of a specific NameNode hostname.
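You can confirm the effective client-side value without opening core-site.xml by using hdfs getconf:

```shell
# Print the effective fs.defaultFS; in an HA setup this should be the
# NameService name (for example hdfs://mycluster), not a single NameNode hostname.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.defaultFS
  getconf_demo="ran"
else
  getconf_demo="skipped"   # no hdfs client available on this machine
fi
```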
References
If high availability (HA) is enabled for your cluster, see HDFS High Availability (HA) Commands (HaAdmin).
To migrate data between Hadoop clusters, or between HDFS and Object Storage Service (OSS), see Hadoop DistCp.