In an E-MapReduce (EMR) cluster, you can run hadoop fs commands to perform operations on files in Hadoop Distributed File System (HDFS). This topic describes the common commands of HDFS.
Prerequisites
Before you run any commands, make sure that the following conditions are met:
Cluster access: You have logged on to a node in the cluster, typically the master node, using a method such as Secure Shell (SSH).
User permissions: The account that you use, such as the default hadoop user, must have read and write permissions on the destination HDFS path. For clusters that have Kerberos authentication enabled, you must complete identity authentication first.
Command versions
Hadoop provides two equivalent command formats for file system operations:
hdfs dfs <args>: This command is specific to HDFS.
hadoop fs <args>: This is a generic file system command. It can operate on multiple Hadoop-compatible file systems, such as HDFS and the local file system (file:///).
All examples in this topic use hadoop fs.
Command cheat sheet
The following table provides a quick reference for the most common HDFS commands.
| Command | Description | Common syntax |
| --- | --- | --- |
| mkdir | Creates a new directory in HDFS. | hadoop fs -mkdir [-p] <paths> |
| touchz | Creates an empty file of 0 bytes in HDFS. | hadoop fs -touchz URI [URI ...] |
| ls | Lists the files and directories in a specified path and their basic information. | hadoop fs -ls [-h] [-R] [-t] <args> |
| put | Copies one or more files from the local file system (on the EMR node where the command is run) to HDFS. | hadoop fs -put [-f] [-p] <localsrc> <dst> |
| get | Copies files or directories from HDFS to the local file system (on the EMR node where the command is run). | hadoop fs -get [-f] [-p] <src> <localdst> |
| cp | Copies files or directories within HDFS. | hadoop fs -cp [-f] URI [URI ...] <dest> |
| mv | Moves or renames files or directories within HDFS. | hadoop fs -mv URI [URI ...] <dest> |
| rm | Deletes files or directories in HDFS. | hadoop fs -rm [-f] [-r] [-skipTrash] URI [URI ...] |
| cat | Displays the content of files in HDFS. | hadoop fs -cat URI [URI ...] |
| du | Displays the size of a file or the total size of all files in a directory. | hadoop fs -du [-s] [-h] URI [URI ...] |
For more information about HDFS commands, see the Apache Hadoop official website.
Directory and file management
mkdir: Create a directory
Creates a new directory in HDFS.
Syntax
hadoop fs -mkdir [-p] <paths>
Parameter description
-p: Creates all parent directories in the path if they do not exist. This is similar to the mkdir -p command in Linux. This parameter is commonly used in production environments to prevent errors when a parent directory does not exist.
Example: Create the /dir directory in the HDFS file system.
hadoop fs -mkdir /dir
touchz: Create an empty file
Creates an empty file of 0 bytes in HDFS.
Syntax
hadoop fs -touchz URI [URI ...]
Scenarios
To serve as a marker file that indicates a task is complete.
To create an empty output file before data processing.
Example: Create the emptyfile.txt file in the /dir/ directory of the HDFS file system.
hadoop fs -touchz /dir/emptyfile.txt
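The marker-file scenario can be sketched as follows. The job output path and marker name are hypothetical; the pattern only assumes that downstream consumers check for the marker with hadoop fs -test -e.

```shell
# Sketch, assuming a hypothetical job that writes results to /data/output.
# The `command -v` guard lets the sketch run even where no hadoop client exists.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /data/output
  # After the job succeeds, drop a 0-byte marker that consumers can poll for.
  hadoop fs -touchz /data/output/_SUCCESS
  # A downstream consumer checks for the marker (exit code 0 means it exists):
  hadoop fs -test -e /data/output/_SUCCESS && echo "job complete"
  touchz_demo="ran"
else
  touchz_demo="skipped"   # no hadoop client available on this machine
fi
```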
ls: List files and directories
Lists the files and directories in a specified path and their basic information.
Syntax
hadoop fs -ls [-h] [-R] [-t] <args>
Parameter descriptions
-h: Displays file sizes in a human-readable format, such as 1K, 234M, or 2G.
-R: Recursively lists the content of all subdirectories.
-t: Sorts by modification time, with the newest files or directories first.
Example: View the created /dir directory.
hadoop fs -ls /dir
File transfer
put: Upload files to HDFS
Copies one or more files from the local file system (on the EMR node where the command is run) to HDFS.
Syntax
hadoop fs -put [-f] [-p] <localsrc> <dst>
Parameter descriptions
-f: Forcibly overwrites an existing file at the destination path.
-p: Preserves file access and modification times, ownership, and permissions.
Example: Upload the local file hello.txt to the /dir/sub-dir path in HDFS.
hadoop fs -put hello.txt /dir/sub-dir
get: Download files from HDFS
Copies files or directories from HDFS to the local file system (on the EMR node where the command is run).
Syntax
hadoop fs -get [-f] [-p] <src> <localdst>
Parameter descriptions
-f: Forcibly overwrites an existing file at the destination path.
-p: Preserves file access and modification times, ownership, and permissions.
Example: Copy the HDFS file /dir/emptyfile.txt to the local / path.
hadoop fs -get /dir/emptyfile.txt /
File operations
cp: Copy files or directories
Copies files or directories within HDFS.
Syntax
hadoop fs -cp [-f] URI [URI ...] <dest>
Parameter description
-f: Forcibly overwrites an existing file at the destination path.
Example: Copy the hello_world.txt file from the /dir/sub-dir/ directory to the /tmp directory.
hadoop fs -cp /dir/sub-dir/hello_world.txt /tmp
mv: Move or rename files or directories
Moves or renames files or directories within HDFS. This is an atomic operation. When you move files within the same file system, data blocks are not moved. Only the metadata is updated, which makes the operation very fast.
Syntax
hadoop fs -mv URI [URI ...] <dest>
Examples
Move the hello_world2.txt file from the /tmp/ directory to the /dir/sub-dir/ directory.
hadoop fs -mv /tmp/hello_world2.txt /dir/sub-dir/
Move the test directory from the /tmp/ path to the /dir/sub-dir/ directory.
hadoop fs -mv /tmp/test /dir/sub-dir/
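A rename uses the same command as a move: keep the parent directory and change only the name. The file names below are hypothetical.

```shell
# Hypothetical rename: same parent directory, new name; only metadata changes.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mv /dir/sub-dir/hello_world.txt /dir/sub-dir/hello_world_v2.txt
  mv_demo="ran"
else
  mv_demo="skipped"   # no hadoop client available on this machine
fi
```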
rm: Delete files or directories
Deletes files or directories in HDFS.
Syntax
hadoop fs -rm [-f] [-r] [-skipTrash] URI [URI ...]
Parameter descriptions
-r: Recursively deletes a directory and all of its contents. This parameter is required to delete a directory.
-f: Forces the deletion. The command does not report an error if the file does not exist.
-skipTrash: Permanently deletes the file or directory, skipping the recycle bin. Use this option with extreme caution. By default, deleted items are moved to the current user's recycle bin directory, /user/<username>/.Trash/.
Example: Delete the hello_world2.txt file from the /dir/sub-dir/ directory in HDFS.
hadoop fs -rm /dir/sub-dir/hello_world2.txt
The hadoop fs -rmr command is deprecated. Use hadoop fs -rm -r instead to recursively delete directories.
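Because default deletions go to the recycle bin, a file removed by mistake can usually be moved back. The user and paths below are hypothetical; the layout (a Current subdirectory under .Trash that mirrors the original absolute path) is the standard HDFS trash convention.

```shell
# Sketch: restore a file that was deleted without -skipTrash (hypothetical paths).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -rm /dir/sub-dir/hello_world.txt          # goes to the recycle bin
  # Move it back from the trash, which mirrors the original path:
  hadoop fs -mv /user/hadoop/.Trash/Current/dir/sub-dir/hello_world.txt \
               /dir/sub-dir/hello_world.txt
  rm_demo="ran"
else
  rm_demo="skipped"   # no hadoop client available on this machine
fi
```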
File viewing
cat: View file content
Displays the content of files in HDFS.
Syntax
hadoop fs -cat URI [URI ...]
Examples
View the content of the hello.txt file.
hadoop fs -cat /hello.txt
View the content of the hello_world.txt file in the /dir/sub-dir/ directory.
hadoop fs -cat /dir/sub-dir/hello_world.txt
du: Display file size
Displays the size of a file or the total size of all files in a directory.
Syntax
hadoop fs -du [-s] [-h] URI [URI ...]
Parameter descriptions
-s: Displays only a summarized total size instead of the size of each file or directory.
-h: Displays sizes in a human-readable format.
Examples
View the size of a file.
hadoop fs -du /hello.txt
View the total size of all files in a directory.
hadoop fs -du /dir
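The two parameters are commonly combined to get a single human-readable total for a directory tree:

```shell
# One summarized, human-readable total for everything under /dir.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -du -s -h /dir
  du_demo="ran"
else
  du_demo="skipped"   # no hadoop client available on this machine
fi
```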
FAQ
Permission denied
Cause: The current user does not have the required read, write, or execute permission on the destination file or directory.
Solution:
Run the hdfs dfs -ls <parent_dir> command to check the permissions of the file or its parent directory.
Contact an administrator to grant permissions by using the chmod or chown command.
If Kerberos is enabled in the environment, confirm that you have obtained a ticket by using the kinit command.
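The checks for a "Permission denied" error can be run in sequence as a quick diagnostic. The path /dir/sub-dir is hypothetical; klist and kinit are the standard Kerberos ticket tools.

```shell
# Quick diagnostic for "Permission denied" on a hypothetical path /dir/sub-dir.
if command -v hadoop >/dev/null 2>&1; then
  whoami                          # which account is issuing the command?
  hadoop fs -ls -d /dir/sub-dir   # -d shows the directory entry itself, with its mode bits
  hadoop fs -ls /dir              # permissions of the parent directory
  if command -v klist >/dev/null 2>&1; then
    klist                         # on Kerberos clusters: is there a valid ticket?
  fi
  perm_demo="ran"
else
  perm_demo="skipped"   # no hadoop client available on this machine
fi
```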
SafeModeException: NameNode is in safe mode
Cause: The NameNode enters safe mode during startup and does not accept write operations during this period.
Solution: Wait a few minutes for the NameNode to automatically exit safe mode. You can run the hdfs dfsadmin -safemode get command to check the status. Do not manually force an exit from safe mode unless it is an emergency.
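Instead of forcing an exit, a script can poll until safe mode clears. A minimal sketch, assuming hdfs dfsadmin -safemode get prints a line containing ON while safe mode is active; the polling interval and timeout are arbitrary choices.

```shell
# Poll the NameNode until it leaves safe mode (at most ~5 minutes in this sketch).
if command -v hdfs >/dev/null 2>&1; then
  for i in $(seq 1 30); do
    hdfs dfsadmin -safemode get | grep -q "ON" || break   # stop once safe mode is OFF
    sleep 10
  done
  safemode_demo="ran"
else
  safemode_demo="skipped"   # no hdfs client available on this machine
fi
```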
No such file or directory
Cause: The specified path does not exist.
Solution: Check the path for spelling errors. If you are writing a file, make sure that its parent directory exists. You can also use the -p parameter with the mkdir command to create parent directories.
StandbyException: Operation category READ is not supported in state standby
Cause: In a high availability (HA) configuration, a read or write request was sent to a NameNode that is in the standby state.
Solution: Check the Hadoop configuration file (core-site.xml) to make sure that fs.defaultFS points to the HA NameService name, such as hdfs://mycluster, instead of a specific NameNode hostname.
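You can confirm the effective client-side value without opening core-site.xml by using hdfs getconf:

```shell
# Print the effective fs.defaultFS; in an HA setup this should be the
# NameService name (for example hdfs://mycluster), not a single NameNode hostname.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.defaultFS
  getconf_demo="ran"
else
  getconf_demo="skipped"   # no hdfs client available on this machine
fi
```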
References
If high availability (HA) is enabled for your cluster, see HDFS High Availability (HA) Commands (HaAdmin).
To migrate data between Hadoop clusters, or between HDFS and Object Storage Service (OSS), see Hadoop DistCp.