Alibaba Cloud Elastic MapReduce (or E-MapReduce) is a big data processing solution that facilitates the processing and analysis of massive amounts of data.
- For more information about Hadoop, see the Apache Hadoop official website.
- For more information about Spark, see the Apache Spark official website.
- For more information about Hive, see the Apache Hive official website.
- For more information about Pig, see the Apache Pig official website.
- For more information about HBase, see the Apache HBase official website.
In general, to use a distributed processing system such as Hadoop or Spark, follow these steps:
1. Evaluate the business characteristics.
2. Select a machine type.
3. Purchase the machines.
4. Prepare the hardware environment.
5. Install an operating system.
6. Deploy applications (such as Hadoop and Spark).
7. Start the cluster.
8. Write applications.
9. Run a job.
10. Obtain data or perform other operations.
Steps 1-7 are preliminary tasks and may take significant time to complete, whereas steps 8-10 concern your application logic. E-MapReduce provides an integrated set of cluster management tools, including tools used to build, configure, run, and manage clusters; configure and run jobs; and select hosts, deploy environments, and monitor performance.
With E-MapReduce, processes such as procurement, preparation, operation, and maintenance are all managed for you, allowing you to focus on the processing logic of your applications. E-MapReduce also provides flexible combination modes, allowing you to select different cluster services according to your needs. For example, if you only need daily statistics or simple batch operations, you can run only the Hadoop services in E-MapReduce. If you later want to implement stream-oriented and real-time computing, you can add Spark.
Structure of E-MapReduce
Clusters are the core component of E-MapReduce. An E-MapReduce cluster is essentially a Spark or Hadoop cluster that consists of multiple Alibaba Cloud ECS instances. For example, in Hadoop, the daemons that typically run on each ECS instance (such as NameNode, DataNode, ResourceManager, and NodeManager) form a Hadoop cluster. The nodes that run NameNode and ResourceManager are known as master nodes, while those that run DataNode and NodeManager are called slave nodes.
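As a rough illustration of the daemon-to-node mapping described above, the sketch below models how node roles follow from the Hadoop daemons each ECS instance runs. The host names and helper function are hypothetical, not part of any E-MapReduce API; the daemon names mirror the standard Hadoop ones.

```python
# Hypothetical sketch: classify nodes by the standard Hadoop daemons they run.
# Host names and the node_role() helper are illustrative only.

MASTER_DAEMONS = {"NameNode", "ResourceManager"}
SLAVE_DAEMONS = {"DataNode", "NodeManager"}

def node_role(daemons):
    """Classify an ECS instance by the Hadoop daemons running on it."""
    if daemons & MASTER_DAEMONS:
        return "master"
    if daemons & SLAVE_DAEMONS:
        return "slave"
    return "unknown"

# A one-master, three-slave cluster of four ECS instances.
cluster = {
    "ecs-1": {"NameNode", "ResourceManager"},
    "ecs-2": {"DataNode", "NodeManager"},
    "ecs-3": {"DataNode", "NodeManager"},
    "ecs-4": {"DataNode", "NodeManager"},
}

roles = {host: node_role(daemons) for host, daemons in cluster.items()}
print(roles)
```

In a real deployment, larger clusters simply run more slave nodes with the same DataNode and NodeManager daemons, while the master daemons stay on the master node(s).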
The following figure shows an E-MapReduce cluster that consists of one master node and three slave nodes: