Alibaba Cloud E-MapReduce (EMR) provides convenient, controllable, enterprise-grade open source big data services. You can use EMR to easily deploy open source big data services such as Hadoop, Spark, Flink, Kafka, and HBase.

Stable and reliable open source components

  • EMR is built on open source components, and each EMR version provides the latest versions of those components. For more information about the mappings between EMR versions and open source component versions, see Overview.
  • EMR is fully adapted to the open source components and resolves version compatibility issues among them.
  • EMR provides an enhanced deployment environment in Alibaba Cloud for open source components, which delivers much higher performance than the community versions.

Cost effectiveness

  • Compared with traditional HDFS-based clusters that use fixed configurations, EMR clusters use auto scaling and tiered storage mechanisms, which help reduce costs by more than 50%.
  • You can create preemptible instances. Compared with the pay-as-you-go billing method, preemptible instances help reduce fees by 50% to 80%. For more information, see Overview.
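
The savings range quoted above for preemptible instances can be turned into a rough estimate. The following is a minimal sketch, not an official pricing calculator: the function name `preemptible_cost_range` and the sample bill of 1000 are hypothetical, and the only figures taken from this page are the 50% and 80% savings bounds. Actual fees depend on current instance prices.

```python
def preemptible_cost_range(pay_as_you_go_cost, min_saving=0.50, max_saving=0.80):
    """Estimate the cost range after applying the quoted savings range.

    pay_as_you_go_cost: what the same workload would cost with pay-as-you-go
    billing (hypothetical input, in any currency unit).
    min_saving / max_saving: the 50% to 80% reduction quoted in this document.
    Returns (lowest_cost, highest_cost).
    """
    if pay_as_you_go_cost < 0:
        raise ValueError("cost must be non-negative")
    lowest_cost = pay_as_you_go_cost * (1 - max_saving)   # best case: largest saving
    highest_cost = pay_as_you_go_cost * (1 - min_saving)  # worst case: smallest saving
    return lowest_cost, highest_cost


# Hypothetical pay-as-you-go bill of 1000 units.
low, high = preemptible_cost_range(1000.0)
print(f"Estimated preemptible cost: {low:.2f} to {high:.2f}")
```

Under these assumptions, a workload billed at 1000 units with pay-as-you-go would cost roughly 200 to 500 units on preemptible instances.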

Ease of use

  • You can create or scale out a cluster within minutes. You do not need to manually deploy or start services.
  • EMR provides a comprehensive monitoring and alerting system that covers both cluster hardware and Hadoop services. You can configure alert templates. For more information, see Overview.

Scalability

  • Compute-storage separation: Computing and storage are decoupled to support the elastic use of resources.
  • Custom cluster environment: You can use bootstrap actions and cluster scripts to flexibly configure a cluster environment and deploy third-party optimization or cluster management tools in EMR. For more information, see Bootstrap actions and Cluster scripts.
  • Self-managed maintenance: You can log on to the master node of a cluster, view the logs and deployment environment of the cluster, and optimize the configurations. For more information, see Common file paths.
  • Auto scaling: EMR can automatically scale clusters in or out based on your business requirements.
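
To illustrate the idea behind auto scaling, the following sketch shows a generic threshold-based scaling decision. It is an assumption for illustration only and does not describe how EMR actually evaluates scaling rules: the function `scaling_decision`, the utilization metric, and every threshold and limit are hypothetical.

```python
def scaling_decision(current_nodes, utilization, min_nodes=2, max_nodes=10,
                     scale_out_above=0.80, scale_in_below=0.30, step=1):
    """Return the target node count for a simple threshold-based rule.

    All parameters are hypothetical:
    utilization is a load metric between 0 and 1,
    scale_out_above / scale_in_below are the trigger thresholds,
    step is how many nodes to add or remove per decision.
    """
    if utilization > scale_out_above:
        target = current_nodes + step      # load is high: scale out
    elif utilization < scale_in_below:
        target = current_nodes - step      # load is low: scale in
    else:
        target = current_nodes             # load is in the normal band: no change
    # Keep the result within the allowed cluster size.
    return max(min_nodes, min(max_nodes, target))


print(scaling_decision(current_nodes=4, utilization=0.90))  # scale out
print(scaling_decision(current_nodes=4, utilization=0.10))  # scale in
print(scaling_decision(current_nodes=4, utilization=0.50))  # no change
```

Clamping the result to a minimum and maximum node count keeps a rule like this from shrinking a cluster to nothing or growing it without bound.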

Deep integration

  • You can deploy EMR clusters on Alibaba Cloud Elastic Compute Service (ECS) and Container Service for Kubernetes (ACK). Various ECS instance types are supported, so you can choose the instance types that fit your business requirements. For more information, see ECS instances.
  • EMR is integrated into DataWorks. You can use EMR as a job computing and data storage engine in DataWorks.
  • Data Lake Formation (DLF) is integrated into EMR. In data lake scenarios, EMR allows you to manage metadata for multiple engines in a centralized manner.