Hadoop Distributed File System (HDFS) components store critical data on the local disks of each node. This page lists the directories for both high availability (HA) and non-HA EMR clusters so you can protect or exclude them during disk maintenance.
Warning
Do not delete a directory that a component uses. Deleting one can cause service exceptions and data loss.
Non-HA cluster
In a non-HA cluster, a single active NameNode manages the namespace. The SecondaryNameNode handles periodic checkpointing: it merges the EditLog into a new FsImage so that the EditLog does not grow indefinitely.
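The checkpoint merge can be pictured with a toy model. The Python sketch below is illustrative only and is not how HDFS is implemented: it assumes the FsImage is a plain dictionary of paths and the EditLog is a list of `(operation, path)` pairs, and the function name `checkpoint` is hypothetical.

```python
def checkpoint(fsimage, edit_log):
    """Replay edit_log on a copy of fsimage; return (new_fsimage, empty_log).

    Toy model only: real FsImage and EditLog files are on-disk formats
    managed by HDFS, not Python objects.
    """
    merged = dict(fsimage)  # leave the previous image untouched
    for op, path in edit_log:
        if op == "create":
            merged[path] = {}
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    # After the merge, the replayed edits are no longer needed.
    return merged, []


image = {"/data/a": {}}
log = [("create", "/data/b"), ("delete", "/data/a")]
new_image, new_log = checkpoint(image, log)
print(new_image, new_log)
```

The point of the sketch is that the new image already reflects every logged change, which is why the log can start over instead of growing without bound.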
| Component | Directory | Stores |
|---|---|---|
| NameNode | /mnt/disk1/hdfs/name | FsImage files of the NameNode in the non-HA cluster. |
| NameNode | /mnt/disk1/hdfs/edit/ | EditLog files of the NameNode in the non-HA cluster. |
| SecondaryNameNode | /mnt/disk1/hdfs/secondary/ | Checkpoint data of the SecondaryNameNode in the non-HA cluster. The SecondaryNameNode merges EditLog files into a new FsImage. |
| DataNode | /mnt/disk1/hdfs through /mnt/diskN/hdfs | Data blocks, one directory per data disk. N depends on the number of data disks on a DataNode. For example, a two-disk DataNode uses /mnt/disk1/hdfs and /mnt/disk2/hdfs. |
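The DataNode row follows a simple numbering pattern, so the directory list for a node can be generated from its data disk count. The following Python sketch is illustrative only: the function names `datanode_dirs` and `non_ha_dirs` are hypothetical, and the paths are copied from the table above. Confirm the actual paths on your nodes before relying on the output.

```python
def datanode_dirs(disk_count):
    """Return /mnt/disk1/hdfs through /mnt/diskN/hdfs for N = disk_count."""
    if disk_count < 1:
        raise ValueError("a DataNode needs at least one data disk")
    return [f"/mnt/disk{i}/hdfs" for i in range(1, disk_count + 1)]


def non_ha_dirs(disk_count):
    """Group the non-HA directories from the table by component."""
    return {
        "NameNode": ["/mnt/disk1/hdfs/name", "/mnt/disk1/hdfs/edit/"],
        "SecondaryNameNode": ["/mnt/disk1/hdfs/secondary/"],
        "DataNode": datanode_dirs(disk_count),
    }


# Print the directories for a two-disk DataNode, grouped by component.
for component, dirs in non_ha_dirs(2).items():
    print(component, dirs)
```

Grouping by component keeps the output usable on any node type, because a given node usually runs only some of these components.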
HA cluster
| Component | Directory | Stores |
|---|---|---|
| ZKFailoverController (ZKFC) | None | ZKFC does not use a local disk directory. |
| NameNode | /mnt/disk1/hdfs/name | FsImage files of a NameNode in the HA cluster. |
| NameNode | /mnt/disk1/hdfs/edit/ | EditLog files of a NameNode in the HA cluster. |
| JournalNode | /mnt/disk1/hdfs/journal/ | EditLog files of a JournalNode in the HA cluster. |
| DataNode | /mnt/disk1/hdfs through /mnt/diskN/hdfs | Data blocks, one directory per data disk. N depends on the number of data disks on a DataNode. For example, a two-disk DataNode uses /mnt/disk1/hdfs and /mnt/disk2/hdfs. |
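Before removing a path during disk maintenance, it can help to check it against the directories in the table above. The sketch below is one possible approach, not an official tool: the names `ha_dirs` and `is_protected` are hypothetical, the paths are copied from the HA table, and the check is purely textual (it assumes absolute paths and does not resolve symbolic links). A path counts as protected if it equals a listed directory, lies inside one, or contains one.

```python
from pathlib import PurePosixPath


def ha_dirs(disk_count):
    """Directories from the HA table for a node with disk_count data disks."""
    fixed = [
        "/mnt/disk1/hdfs/name",
        "/mnt/disk1/hdfs/edit/",
        "/mnt/disk1/hdfs/journal/",
    ]
    data = [f"/mnt/disk{i}/hdfs" for i in range(1, disk_count + 1)]
    return [PurePosixPath(p) for p in fixed + data]


def is_protected(candidate, protected):
    """True if deleting candidate would touch any protected directory."""
    path = PurePosixPath(candidate)
    for directory in protected:
        inside = directory in path.parents    # candidate is under a listed dir
        contains = path in directory.parents  # candidate is a parent of one
        if path == directory or inside or contains:
            return True
    return False


protected = ha_dirs(2)
for candidate in ["/mnt/disk1/hdfs/journal/x", "/mnt/disk1/log", "/mnt"]:
    print(candidate, is_protected(candidate, protected))
```

Treating parent directories as protected is a deliberate choice here: deleting `/mnt` would also delete every HDFS directory beneath it.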