
Big Data Made Simpler with E-MapReduce – Part 1

Part 1 of this 2-part series discusses how E-MapReduce provides a simple and highly effective big data practice.

By Shantanu Kaushik

Big data plays a major role in building strategy and extracting value from the raw data and metrics generated by observability and monitoring systems. Alibaba Cloud has gone through several major product cycles in big data and data analytics, and its portfolio now combines these with next-generation artificial intelligence and machine learning in a range of data analysis solutions. Some of the major products in the Alibaba Cloud data analytics portfolio include:

  1. E-MapReduce
  2. Alibaba Cloud DataWorks for Big Data
  3. DataV for Visualizations
  4. Quick BI for Business Intelligence
  5. MaxCompute for Data Warehousing

We have discussed all of these products previously except for E-MapReduce (EMR). In this article, we will discuss Alibaba Cloud EMR and how it provides a simple and highly effective big data practice.

What Is E-MapReduce?

Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution built on the industry-leading Alibaba Cloud Elastic Compute Service (ECS) instances and based on open-source Apache Hadoop and Apache Spark.
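As the name suggests, E-MapReduce builds on the MapReduce programming model that Hadoop popularized. The following minimal pure-Python sketch shows the three phases of that model (map, shuffle, reduce) on a word count. It runs on a single machine and only illustrates the idea; it is not how EMR or Hadoop actually distribute the work across ECS nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))  # {'big': 2, 'data': 1, 'clusters': 1}
```

On a real cluster, the map and reduce functions run in parallel on many nodes, and the framework performs the shuffle over the network.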

Alibaba Cloud E-MapReduce enables you to use the Hadoop and Spark ecosystem components, such as:

  • Apache Hive
  • Apache Kafka
  • Flink
  • Druid
  • TensorFlow

You can use any of these components for data analysis and processing. Alibaba Cloud EMR can also interact with different classes of storage within the Alibaba Cloud ecosystem.

Let’s take a look at the architecture and main components of Alibaba Cloud E-MapReduce in the chart below:


The chart above depicts EMR clusters built from Hadoop ecosystem components. Alibaba Cloud E-MapReduce clusters can exchange data seamlessly with Object Storage Service (OSS) and ApsaraDB Relational Database Service (RDS), which lets you work with data across the systems hosted on the Alibaba Cloud platform.

Benefits and Features

Alibaba Cloud E-MapReduce (EMR) offers an integrated solution for managing clusters, which removes much of the complexity of operating EMR clusters yourself. Let’s take a look at the benefits and features of Alibaba Cloud EMR:

1.  Ecosystem Support

Alibaba Cloud EMR supports the Hadoop Distributed File System (HDFS) and can also use Object Storage Service (OSS) as Hadoop-compatible storage. If you need to index or search data with Elasticsearch, you can use the built-in ES-Hadoop plugin. Alibaba Cloud MaxCompute and E-MapReduce are deeply integrated, so each can read data from and write data to the other.

Alibaba Cloud MaxCompute is a fully managed, large-scale data warehousing platform that can process exabytes of data. It supports a distributed computing model, multiple data import solutions, and multi-tenancy. MaxCompute helps you reduce costs, keep data secure, and query large datasets.

For logging, Alibaba Cloud EMR integrates with Log Service (SLS) for real-time data ingestion. EMR can also read data from and write data to Alibaba Cloud Message Service through an SDK.

2.  Workflow Scheduling

Alibaba Cloud EMR supports job and dependency scheduling, so you can schedule jobs and orchestrate them as workflows. E-MapReduce can orchestrate different types of jobs and also lets you edit and manage jobs that are already scheduled.
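Dependency scheduling boils down to running each job only after the jobs it depends on have finished. The sketch below assumes a hypothetical workflow described as a Python dictionary (the job names are illustrative, not EMR identifiers) and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to work out which jobs can run together in each stage. It does not call any EMR API.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "ingest_logs": set(),
    "ingest_orders": set(),
    "clean": {"ingest_logs", "ingest_orders"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def execution_stages(workflow: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into stages; every job's dependencies sit in earlier stages.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)  # assume every job in the stage succeeded
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(WORKFLOW), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Jobs within the same stage have no dependencies on each other, so a scheduler is free to run them in parallel.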

Alibaba Cloud EMR's scheduling system is designed to be fault-tolerant. When a job fails, EMR sends an alert to the administrator and automatically re-executes the job. You can also use E-MapReduce to start a temporary cluster and run jobs on it at a specified schedule.
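The alert-and-retry behavior described above can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `run_with_retry`, its `max_attempts` and `backoff_seconds` parameters, and the `alert` callback are hypothetical names, and in EMR the retry and alerting behavior is configured in the service rather than written in user code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    job: Callable[[], T],
    max_attempts: int = 3,
    alert: Callable[[str], None] = print,
    backoff_seconds: float = 0.0,
) -> T:
    """Run `job`; after each failure send an alert and retry, up to `max_attempts` runs.

    The last exception is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            alert(f"job failed on attempt {attempt}/{max_attempts}: {exc}")
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # linear backoff between retries
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_job() -> str:
        # Hypothetical job that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("executor lost")
        return "succeeded"

    print(run_with_retry(flaky_job))
```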

3.  Cluster Deployment and Expansion

Alibaba Cloud E-MapReduce lets you quickly deploy clusters and expand them as needed; this elasticity is the core of the whole solution. You can expand clusters from the EMR web console without managing the underlying hardware or software.

You can quickly deploy clusters based on Hadoop, Kafka, Druid, and ZooKeeper, and you can add different types of nodes to existing, running clusters. If you have scheduled multiple jobs, you can have clusters created on a schedule to handle the additional workload and released after the jobs have finished, which makes Alibaba Cloud EMR a highly elastic and flexible big data solution.
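The create, run, and release lifecycle of a temporary cluster maps naturally onto a Python context manager. In the sketch below, `TransientCluster` and `temporary_cluster` are hypothetical stand-ins that only track state locally; a real implementation would call the EMR console, API, or SDK to create and release the cluster. The `finally` clause ensures the cluster is released even when a job fails, so idle nodes are not left running.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class TransientCluster:
    """Hypothetical stand-in for an EMR cluster; it only tracks state locally."""

    def __init__(self, name: str, nodes: int) -> None:
        self.name = name
        self.nodes = nodes
        self.running = False
        self.completed: List[str] = []

    def run(self, job_name: str, job: Callable[[], None]) -> None:
        if not self.running:
            raise RuntimeError(f"cluster {self.name} is not running")
        job()
        self.completed.append(job_name)


@contextmanager
def temporary_cluster(name: str, nodes: int) -> Iterator[TransientCluster]:
    """Create a cluster, hand it to the caller, and always release it afterwards."""
    cluster = TransientCluster(name, nodes)
    cluster.running = True  # a real version would create the cluster here
    try:
        yield cluster
    finally:
        cluster.running = False  # ...and release it here, even if a job raised


if __name__ == "__main__":
    with temporary_cluster("nightly-etl", nodes=4) as cluster:
        cluster.run("aggregate_sales", lambda: None)
    print(cluster.completed, cluster.running)  # ['aggregate_sales'] False
```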

You are free to add, maintain, or configure components based on your system's requirements. Alibaba Cloud EMR also helps you optimize costs: you can reduce the total cost of ownership by scaling compute resources (ECS instances) in or out at specific times.
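Scaling at specific times amounts to a schedule that maps times of day to a node count. The sketch below assumes a hypothetical daily schedule (the times and node counts are illustrative, not EMR defaults), looks up the entry in effect at a given moment, and describes the resulting scale-in or scale-out action.

```python
from bisect import bisect_right
from datetime import datetime, time
from typing import List, Tuple

# Hypothetical daily schedule of (start time, node count), sorted by start time.
SCHEDULE: List[Tuple[time, int]] = [
    (time(0, 0), 10),   # overnight batch window: scale out
    (time(6, 0), 2),    # daytime: scale in
    (time(20, 0), 6),   # evening ingestion peak
]


def nodes_for(moment: datetime, schedule: List[Tuple[time, int]] = SCHEDULE) -> int:
    """Return the node count of the last entry that starts at or before `moment`."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, moment.time()) - 1
    # index == -1 means `moment` is before the first entry; schedule[-1] then
    # selects the last entry, i.e. the one still in effect from the previous day.
    return schedule[index][1]


def scaling_action(current: int, desired: int) -> str:
    """Describe the change needed to move from `current` to `desired` nodes."""
    if desired > current:
        return f"scale out by {desired - current}"
    if desired < current:
        return f"scale in by {current - desired}"
    return "no change"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 7, 30)
    print(scaling_action(current=10, desired=nodes_for(now)))  # scale in by 8
```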

4.  General Observations

Alibaba Cloud EMR is easy to use and frees you from configuring the underlying hardware and software resources. O&M operations can be handled from the interactive web console, and many O&M processes in the cluster environment are automated. Alibaba Cloud also offers multiple online support options for the service to help keep it running smoothly.

Security is a prime concern when valuable data drives enterprise-level strategic decisions. Alibaba Cloud EMR supports Kerberos authentication and data encryption to secure both raw and processed data. Because EMR is integrated with the rest of the Alibaba Cloud ecosystem, you can use Resource Access Management (RAM) for identity and user management and for authorizing service permissions.
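RAM permissions are expressed as JSON policy documents made of `Allow` and `Deny` statements. The sketch below evaluates such a policy under the rule that requests are denied by default and an explicit `Deny` overrides any `Allow`. The action names and the simplified resource strings (`prod/*`, `dev/c-1`) are hypothetical placeholders rather than exact EMR action names or RAM resource identifiers, and wildcard matching uses Python's `fnmatch`, which only approximates RAM's own matching rules.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Union

Policy = Dict[str, Any]

# Hypothetical RAM-style policy: full EMR access, except releasing production clusters.
# Action names and resource strings are illustrative placeholders.
OPERATOR_POLICY: Policy = {
    "Version": "1",
    "Statement": [
        {"Effect": "Allow", "Action": "emr:*", "Resource": "*"},
        {"Effect": "Deny", "Action": ["emr:ReleaseCluster"], "Resource": "prod/*"},
    ],
}


def _as_list(value: Union[str, List[str]]) -> List[str]:
    return [value] if isinstance(value, str) else list(value)


def _statement_matches(statement: Dict[str, Any], action: str, resource: str) -> bool:
    actions = _as_list(statement["Action"])
    resources = _as_list(statement["Resource"])
    return any(fnmatchcase(action, p) for p in actions) and any(
        fnmatchcase(resource, p) for p in resources
    )


def is_allowed(policies: List[Policy], action: str, resource: str) -> bool:
    """Deny by default; an explicit Deny in any matching statement overrides every Allow."""
    allowed = False
    for policy in policies:
        for statement in policy.get("Statement", []):
            if not _statement_matches(statement, action, resource):
                continue
            if statement["Effect"] == "Deny":
                return False
            if statement["Effect"] == "Allow":
                allowed = True
    return allowed


if __name__ == "__main__":
    print(is_allowed([OPERATOR_POLICY], "emr:ListClusters", "prod/c-1"))    # True
    print(is_allowed([OPERATOR_POLICY], "emr:ReleaseCluster", "prod/c-1"))  # False
```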

The entire EMR system can scale in and out seamlessly based on the size of the workload. This elasticity makes data processing cost-effective while keeping the architecture stable.
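Load-based scaling can be illustrated with a simple target-tracking rule: resize the cluster so that the current load would run at a chosen target utilization, clamped to a minimum and maximum size. This is a hedged sketch; the function name, the 60% target, and the 2 to 20 node bounds are assumptions for illustration, not EMR's auto-scaling algorithm or defaults. Utilization is passed as an integer percentage so the arithmetic stays exact.

```python
def desired_nodes(
    current_nodes: int,
    cpu_percent: int,
    target_percent: int = 60,
    min_nodes: int = 2,
    max_nodes: int = 20,
) -> int:
    """Target-tracking rule: size the cluster so the current load runs at ~target_percent.

    `cpu_percent` is the average CPU utilization across the current nodes (0-100).
    """
    if current_nodes < 1 or not 0 < target_percent <= 100:
        raise ValueError("need at least one node and a target in (0, 100]")
    load = current_nodes * cpu_percent       # total load, in "percent of one node"
    wanted = -(-load // target_percent)      # ceiling division using integers only
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    print(desired_nodes(current_nodes=4, cpu_percent=90))   # 6: scale out
    print(desired_nodes(current_nodes=10, cpu_percent=12))  # 2: scale in
```

Clamping to `min_nodes` and `max_nodes` keeps a sudden spike or an idle period from resizing the cluster beyond what the workload or budget allows.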

Continued in Part 2

In Part 2 of this 2-part series, we will discuss E-MapReduce cluster management and explain how it works in real-world scenarios. We will also discuss the various usage scenarios related to Alibaba Cloud EMR and the primary benefits of using EMR over the open-source big-data ecosystem.

Upcoming Articles

  1. Big Data Made Simpler with E-MapReduce – Part 2

Alibaba Clouder
