

Last Updated: May 13, 2024

This topic describes the architecture of E-MapReduce (EMR).

The following figure shows the architecture of EMR.

[Figure: Architecture]

EMR consists of four types of services:

  • Open source services

    Apache big data services, such as Hadoop, Hive, and HBase, are integrated into EMR. The versions of these open source services are updated along with EMR versions. For more information, see the release notes in Overview.

    You cannot update the version of a service in an existing EMR cluster.

  • Open source services enhanced by EMR

    EMR enhances the performance and features of some open source services. For example, the Z-ordering and Data Skipping features are added to Delta Lake. For more information, see Overview.

  • Self-developed services of EMR

    EMR provides the following self-developed services, which help open source components and services run better on the Alibaba Cloud infrastructure:

    • Shuffle Service is an extended component of EMR. It is used to optimize the shuffle operations of computing engines. For more information, see ESS overview.

    • SmartData optimizes storage, caching, and computing for various EMR computing engines in a centralized manner and extends storage features. For more information, see SmartData.

  • Alibaba Cloud services

    EMR connects to both the open source big data ecosystem and the Alibaba Cloud ecosystem. You can deploy EMR clusters on Alibaba Cloud Elastic Compute Service (ECS) instances or Container Service for Kubernetes (ACK) clusters and store data in Alibaba Cloud Object Storage Service (OSS). You can use Machine Learning Platform for AI (PAI) in an EMR Data Science cluster. EMR is also integrated into DataWorks, where you can use EMR as a job computing engine or data storage engine.
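
The Z-ordering and Data Skipping features mentioned for Delta Lake can be illustrated with a small, generic sketch. It does not reflect how EMR or Delta Lake implements these features. It assumes two integer columns, a bit-interleaved (Morton) sort key, and per-file min/max statistics; the names `z_value`, `FileStats`, `write_files`, and `files_to_scan` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


@dataclass
class FileStats:
    """Min/max statistics of two columns for one data file (hypothetical layout)."""
    name: str
    x_min: int
    x_max: int
    y_min: int
    y_max: int


def write_files(rows, rows_per_file):
    """Sort rows by Z-value, cut them into files, and record min/max per file."""
    ordered = sorted(rows, key=lambda r: z_value(r[0], r[1]))
    files = []
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        files.append(
            FileStats(f"part-{len(files)}", min(xs), max(xs), min(ys), max(ys))
        )
    return files


def files_to_scan(files, x=None, y=None):
    """Data skipping: keep only files whose min/max range can contain the value."""
    keep = []
    for f in files:
        if x is not None and not (f.x_min <= x <= f.x_max):
            continue
        if y is not None and not (f.y_min <= y <= f.y_max):
            continue
        keep.append(f.name)
    return keep


# An 8 x 8 grid of (x, y) rows written as 4 files of 16 rows each.
rows = [(x, y) for x in range(8) for y in range(8)]
files = write_files(rows, rows_per_file=16)

print(files_to_scan(files, x=5))  # ['part-1', 'part-3']
print(files_to_scan(files, y=5))  # ['part-2', 'part-3']
```

In this toy data set, a filter on either column skips half of the files. If the rows were sorted by `x` alone, a filter on `x` would skip more files, but a filter on `y` would have to scan all of them. Z-ordering trades some selectivity on one column for usable min/max ranges on several columns.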