This topic describes the deployment topology of Hadoop Distributed File System (HDFS) components in non-high-availability (non-HA) clusters and in high-availability (HA) clusters.
Non-HA cluster
| Node | Component | Description |
|---|---|---|
| master-1-1, or emr-header-1 in some versions | NameNode | The NameNode provides external read and write services. |
| | Secondary NameNode | The Secondary NameNode merges the EditLog files of the NameNode into the FsImage file to accelerate the restart of the NameNode. |
| core-1-1 or emr-worker-x | DataNode | The DataNode manages and stores the data blocks of HDFS on the data disks of its node. |
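
The merge that the Secondary NameNode performs can be pictured as replaying logged operations onto a snapshot. The following is a minimal sketch of that idea only: the dictionary namespace, the tuple-based edit records, and the function names `apply_edit` and `checkpoint` are hypothetical and do not reflect the actual on-disk formats or APIs of HDFS.

```python
# Toy model of a checkpoint: replay EditLog entries onto an FsImage snapshot.
# The dict and tuple formats are hypothetical, not the real HDFS formats.

def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> size)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, editlog):
    """Return a new snapshot with every EditLog entry merged in."""
    merged = dict(fsimage)  # copy, so the old snapshot stays untouched
    for edit in editlog:
        apply_edit(merged, edit)
    return merged


fsimage = {"/data/a.txt": 128}
editlog = [
    ("create", "/data/b.txt", 64),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]
new_fsimage = checkpoint(fsimage, editlog)
print(new_fsimage)  # {'/data/c.txt': 128}
```

After a checkpoint, a restart only needs to load the merged snapshot instead of replaying the full log, which is why the merge shortens the restart of the NameNode.
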
HA cluster
| Node | Component | Description |
|---|---|---|
| master-1-1, or emr-header-1 in some versions | ZKFailoverController (ZKFC) | The ZKFC is an independent process that performs primary/secondary election and switchover based on the status of the NameNode. |
| | NameNode | Within a group of NameNodes, only the primary NameNode, which is in the Active state, provides external read and write services. The other (secondary) NameNodes are in the Standby state. |
| | JournalNode | The JournalNode stores the EditLog files of the NameNode. In most cases, three JournalNodes are used as a group. The NameNode can provide services only when at least two of the three JournalNodes are healthy and can accept writes. |
| | ZooKeeper | The ZooKeeper service of the E-MapReduce (EMR) cluster. The ZKFC relies on ZooKeeper to implement elections. The HA status of other components also depends on ZooKeeper. |
| master-1-2, or emr-header-2 in some versions | ZKFC | The ZKFC is an independent process that performs primary/secondary election and switchover based on the status of the NameNode. |
| | NameNode | Within a group of NameNodes, only the primary NameNode, which is in the Active state, provides external read and write services. The other (secondary) NameNodes are in the Standby state. |
| | JournalNode | The JournalNode stores the EditLog files of the NameNode. In most cases, three JournalNodes are used as a group. The NameNode can provide services only when at least two of the three JournalNodes are healthy and can accept writes. |
| | ZooKeeper | The ZooKeeper service of the EMR cluster. The ZKFC relies on ZooKeeper to implement elections. The HA status of other components also depends on ZooKeeper. |
| master-1-3, or emr-header-3 or emr-worker-1 in some versions | *ZKFC | The ZKFC is an independent process that performs primary/secondary election and switchover based on the status of the NameNode. Note: In an HA cluster that uses Hadoop 3.x in EMR V5.8.0 or later, three groups of ZKFC and NameNode are deployed by default, one of which runs on master-1-3. |
| | *NameNode | Within a group of NameNodes, only the primary NameNode, which is in the Active state, provides external read and write services. The other (secondary) NameNodes are in the Standby state. Note: In an HA cluster that uses Hadoop 3.x in EMR V5.8.0 or later, three groups of ZKFC and NameNode are deployed by default, one of which runs on master-1-3. |
| | JournalNode | The JournalNode stores the EditLog files of the NameNode. In most cases, three JournalNodes are used as a group. The NameNode can provide services only when at least two of the three JournalNodes are healthy and can accept writes. |
| | ZooKeeper | The ZooKeeper service of the EMR cluster. The ZKFC relies on ZooKeeper to implement elections. The HA status of other components also depends on ZooKeeper. |
| core-1-1 or emr-worker-x | DataNode | The DataNode manages and stores the data blocks of HDFS on the data disks of its node. |
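
The JournalNode requirement in the table (two healthy nodes out of a group of three) is an instance of a majority quorum. The following is a minimal sketch of that arithmetic, assuming the general rule is a strict majority of the group. The function names `majority` and `namenode_can_write` are hypothetical and are not part of any Hadoop API.

```python
# Majority-quorum arithmetic for a JournalNode group.
# Assumption: writes need a strict majority of the group to succeed.

def majority(total_journalnodes):
    """Smallest number of nodes that forms a strict majority."""
    return total_journalnodes // 2 + 1


def namenode_can_write(healthy, total=3):
    """True if enough JournalNodes are healthy to accept EditLog writes."""
    if not 0 <= healthy <= total:
        raise ValueError("healthy must be between 0 and total")
    return healthy >= majority(total)


for healthy in range(4):
    print(healthy, namenode_can_write(healthy))
# 0 False
# 1 False
# 2 True
# 3 True
```

With three JournalNodes, the group tolerates the loss of one node. Losing a second node leaves no majority, so the NameNode can no longer provide services.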