E-MapReduce: What is EMR on ECS?

Last Updated: Aug 29, 2023

EMR on ECS allows you to deploy E-MapReduce (EMR) clusters on Elastic Compute Service (ECS) instances. It combines the big data processing capabilities of EMR with the elastic compute resources of ECS, so you can flexibly configure and manage EMR clusters for complex data processing and analytics scenarios. You can use EMR on ECS to quickly create, manage, and maintain EMR clusters and make efficient use of computing and storage resources.

Benefits

EMR allows you to easily deploy enterprise-level open source big data services, such as Hadoop, Spark, Flink, Kafka, and HBase.

  • All components in EMR are open source. EMR adapts and optimizes these components so that they deliver higher performance than their upstream open source versions.

  • You can use preemptible instances together with time-based auto scaling to reduce costs.

  • Computing and storage are decoupled, so you can scale each type of resource independently.

  • You can create or scale out a cluster within minutes. You do not need to manually deploy or start services.
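To illustrate how time-based auto scaling and preemptible instances can work together, the following minimal sketch computes how many task nodes a cluster should run at a given hour. It is not the EMR auto scaling API: the `TimeWindowRule` class, the `plan_task_nodes` function, and the policy of placing only burst capacity on preemptible instances are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TimeWindowRule:
    """Hypothetical rule: run `nodes` task nodes from start_hour (inclusive) to end_hour (exclusive)."""
    start_hour: int
    end_hour: int
    nodes: int


def plan_task_nodes(hour: int, rules: Sequence[TimeWindowRule], baseline: int) -> dict:
    """Split the desired task-node count for `hour` into regular and preemptible nodes."""
    # The largest matching window wins; outside every window, fall back to the baseline.
    desired = max(
        (r.nodes for r in rules if r.start_hour <= hour < r.end_hour),
        default=baseline,
    )
    # Assumed policy: baseline capacity stays on regular instances, and only the
    # burst above it uses cheaper preemptible instances, which can be reclaimed.
    return {
        "regular": min(desired, baseline),
        "preemptible": max(desired - baseline, 0),
    }


if __name__ == "__main__":
    # Hypothetical nightly batch window from 01:00 to 06:00.
    nightly_batch = [TimeWindowRule(start_hour=1, end_hour=6, nodes=20)]
    for hour in (0, 2, 7):
        print(hour, plan_task_nodes(hour, nightly_batch, baseline=2))
```

In this sketch, the cluster keeps 2 regular task nodes all day and adds 18 preemptible nodes only during the batch window, which is where the cost savings would come from.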

Billing

EMR on ECS supports the following billing methods:

  • Subscription: You pay upfront for a specified subscription duration before you can use the resources.

  • Pay-as-you-go: You use resources first and pay for them afterward. You can purchase and release resources as your business requirements change.

For more information about the billing rules, see Billing overview.
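As a rough way to choose between the two billing methods, you can compare a monthly subscription price with the pay-as-you-go cost for the hours you actually expect to use. The following minimal sketch assumes a single flat hourly rate and a single monthly subscription price; the function names and all prices are hypothetical, and the sketch ignores discounts, separate EMR service fees, and the other rules described in Billing overview.

```python
def payg_monthly_cost(hourly_price: float, hours_used: float) -> float:
    """Pay-as-you-go: you pay only for the hours the resources exist."""
    return hourly_price * hours_used


def break_even_hours(monthly_subscription: float, hourly_price: float) -> float:
    """Hours per month above which a subscription costs less than pay-as-you-go."""
    return monthly_subscription / hourly_price


def cheaper_billing_method(monthly_subscription: float, hourly_price: float, hours_used: float) -> str:
    """Return the cheaper method for the expected monthly usage (flat-rate assumption)."""
    # Ties go to pay-as-you-go, which also lets you release resources when they are idle.
    if payg_monthly_cost(hourly_price, hours_used) > monthly_subscription:
        return "subscription"
    return "pay-as-you-go"


if __name__ == "__main__":
    # Hypothetical prices in arbitrary currency units, not real EMR or ECS rates.
    sub, hourly = 730.0, 2.0
    print(break_even_hours(sub, hourly))             # 365.0
    print(cheaper_billing_method(sub, hourly, 200))  # pay-as-you-go
    print(cheaper_billing_method(sub, hourly, 720))  # subscription
```

Under these assumptions, a cluster that runs only a few hours a day, such as a temporary cluster created for a job, tends to favor pay-as-you-go, while a cluster that runs around the clock tends to favor a subscription.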

Comparison between Alibaba Cloud EMR clusters and self-managed Hadoop clusters

The following list compares Alibaba Cloud EMR clusters and self-managed Hadoop clusters item by item.

Cost

  • EMR cluster: You pay for resources on a subscription or pay-as-you-go basis. You can adjust cluster resources flexibly and store data in tiers, which keeps resource utilization high. No additional software license fees apply.

  • Self-managed Hadoop cluster: Resources must be estimated in advance and remain relatively fixed, so resource utilization is low. A commercial Hadoop distribution is used, which incurs additional license fees.

Performance

  • EMR cluster: Performance is significantly improved over the open source versions of the components.

  • Self-managed Hadoop cluster: Open source component versions are used. You must tune performance yourself based on your business requirements.

Ease of use

  • EMR cluster: EMR Hadoop clusters can be started within minutes to respond quickly to business requirements.

  • Self-managed Hadoop cluster: You must purchase servers and deploy Hadoop components. Creating a self-managed cluster can take several weeks.

Elasticity

  • EMR cluster: You can create clusters temporarily for specific jobs and delete them afterward. Cluster resources can be adjusted dynamically based on cluster load or on a schedule. JindoFS uses an architecture that separates compute from storage, so you can scale computing and storage resources independently.

  • Self-managed Hadoop cluster: Compute and storage are coupled in the same architecture. Resources are relatively fixed and cannot be adjusted flexibly.

Security

  • EMR cluster: EMR clusters provide multi-tenancy, which lets enterprises manage resources, control permissions at the table, column, and row levels, and audit logs. Data encryption is supported.

  • Self-managed Hadoop cluster: You must configure multi-tenancy yourself, and it requires further optimization before it can meet enterprise requirements.

Reliability

  • EMR cluster: EMR clusters are proven in large-scale enterprise environments. They are continuously upgraded to track open source releases and pass professional compatibility tests, so they provide a more reliable experience than self-managed clusters.

  • Self-managed Hadoop cluster: You must upgrade open source components, verify version compatibility across components, and fix bugs yourself.

Service support

  • EMR cluster: Experienced big data teams provide professional after-sales support.

  • Self-managed Hadoop cluster: No service support is included, and the Hadoop distribution that you use incurs additional license and service fees.