An E-MapReduce (EMR) cluster consists of three categories of nodes: master, core, and task.
- Master node: runs the NameNode process of Hadoop HDFS and the ResourceManager process of Hadoop YARN.
- Core node: runs the DataNode process of Hadoop HDFS and the NodeManager process of Hadoop YARN.
- Task node: runs the NodeManager process of Hadoop YARN and performs only computing tasks; it does not store HDFS data.
The master node hosts the management components of cluster services, such as the ResourceManager of Hadoop YARN.
You can access the web UIs of services to view their running status in a cluster. To test or run a job, you can connect to the master node and submit the job from the command line. For more information about how to connect to a master node, see Connect to the master node of an EMR cluster in SSH mode.
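The steps above can be sketched as the following commands. The IP address is a placeholder, and the example jar paths assume the default layout of an EMR Hadoop cluster; the exact paths may differ by EMR version.

```shell
# Connect to the master node over SSH (replace the placeholder with the
# master node's public IP address, shown in the EMR console).
ssh root@<master-node-public-ip>

# On the master node, submit a sample Hadoop MapReduce job
# (jar path is an assumption and may vary by EMR version):
hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100

# Or submit a sample Spark job to YARN:
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /usr/lib/spark-current/examples/jars/spark-examples_*.jar 100
```

Both jobs are scheduled by the ResourceManager on the master node and executed on the core and task nodes of the cluster.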
Core nodes in a cluster are managed by the master node. Core nodes run the DataNode process of Hadoop HDFS to store all of the cluster's data. They also run computing service processes, such as NodeManager of Hadoop YARN, to run computing tasks.
To cope with growing data storage and computing workloads, you can scale out core nodes at any time without affecting the running of the cluster. Core nodes can use different storage media to store data. For more information, see Local disks and Block Storage overview.
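After scaling out core nodes, you can verify from the master node that the new DataNodes have joined HDFS. This is a sketch using standard Hadoop commands; it assumes you have already connected to the master node.

```shell
# List all DataNodes with their capacity and usage; newly added
# core nodes should appear in this report:
hdfs dfsadmin -report

# Show overall HDFS capacity, used space, and remaining space:
hdfs dfs -df -h /
```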
Task nodes run only computing tasks and cannot be used to store HDFS data. If the core nodes of a cluster offer sufficient computing capabilities, task nodes are not required. If the computing capabilities of the core nodes become insufficient, you can add task nodes to the cluster at any time. You can run Hadoop MapReduce tasks and Spark executors on these task nodes to gain extra compute capacity.
Task nodes can be added to or removed from a cluster at any time without any impact on the running of a cluster.
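You can confirm that task nodes have been added or removed by listing the NodeManagers registered with YARN. This is a sketch using standard YARN commands, run on the master node.

```shell
# List all NodeManagers, including unhealthy or decommissioned ones;
# newly added task nodes should appear as RUNNING, and removed nodes
# should no longer be listed as active:
yarn node -list -all
```

Because task nodes hold no HDFS data, removing them never triggers block re-replication, which is why they can be scaled in and out freely.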