E-MapReduce (EMR) is a cloud-native open source big data platform that provides easy-to-integrate open source big data computing and storage engines, such as Hadoop, Hive, Spark, StarRocks, Flink, Presto, and ClickHouse. EMR computing resources can be flexibly scaled. You can deploy EMR clusters on top of Alibaba Cloud Elastic Compute Service (ECS), Container Service for Kubernetes (ACK), or a serverless architecture.
Benefits

- Full Compatibility with Open Source Components
  EMR is built 100% on open source components and keeps pace with new versions of those components.

- High Security and Reliability
  EMR allows you to create a big data computing environment within minutes. Features such as intelligent diagnostics and analysis, Kerberos authentication, and data encryption are supported.

- Cost-effectiveness
  Computing resources are used on demand, hot and cold data is stored in different tiers, and preemptible Alibaba Cloud instances are supported.

- Elastic Resources
  Cluster resources can be dynamically adjusted based on cluster workload or within a specified period of time. Auto scaling for clusters can be completed within minutes, and multiple elastic resource types are supported.
Features
Ease of Use
Environment Building
EMR allows you to create an EMR cluster within minutes, either in the EMR console or by calling an API operation. You can deploy open source big data frameworks without managing the underlying hardware and software deployment.
Resource Scaling
EMR allows you to increase or decrease the number of nodes in an EMR cluster in the EMR console or by calling API operations. You can easily configure managed auto scaling rules to enable EMR to automatically manage computing resources to meet your usage and performance requirements. This helps improve cluster utilization and reduce costs.
Service Configuration
EMR allows you to quickly add services provided by EMR, monitor the status of the services, configure the services, and perform O&M operations on the services and their components. You can modify the configurations of services running on an EMR cluster, such as Apache Hadoop, Apache Spark, Apache Hive, and Hue, without the need to restart the cluster or release the cluster and create another one. EMR then applies the new configurations and restarts the reconfigured services.
Convenient Integration
EMR allows you to apply specific configurations in the EMR console or by using SDKs or a CLI.
Development and Scheduling
EMR Notebook is a serverless platform for interactive data analysis and exploration. It meets the data processing requirements of big data and AI, and provides a visualized development environment for data engineers, data analysts, and data scientists. EMR Notebook allows you to write, debug, and execute code by using multiple languages, such as SQL, Python, and Markdown.
EMR Workflow is a fully managed service that is fully compatible with open source Apache DolphinScheduler and can be used to schedule workflows and jobs. EMR Workflow provides easy-to-use scheduling services. You can manage workflows and jobs through a visualized operation interface and efficiently build data warehouses. This helps production jobs run stably.
EMR can connect to DataWorks. In DataWorks, you can create nodes such as Hive, Spark SQL, Presto, and MapReduce nodes based on an EMR compute engine. You can also configure a workflow, schedule nodes in the workflow on a regular basis, manage metadata, and configure monitoring rules to monitor data quality. This way, you can develop and govern data lakes in a centralized manner.
Scalability and Elasticity
Serverless
The serverless architecture provides extreme resource elasticity and stability, and supports load-based auto scaling of resources and per-second billing. EMR serverless instances do not use fixed specifications. The computing resources of an instance are automatically scaled within the range that you specify based on your workloads. This prevents waste of resources and reduces O&M costs.
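The idea of scaling within a range that you specify can be illustrated with a minimal Python sketch. This is not the EMR implementation: the function name `target_capacity`, the notion of abstract "units", and the rounding rule are all assumptions made for illustration.

```python
import math


def target_capacity(demand_units: float, min_units: int, max_units: int) -> int:
    """Return the capacity for the current demand, kept inside [min_units, max_units].

    Hypothetical helper: "units" stands in for whatever resource measure
    a serverless instance is sized in.
    """
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    # Round demand up so a fractional unit of work still gets capacity.
    needed = math.ceil(demand_units)
    # Clamp to the user-specified range.
    return max(min_units, min(max_units, needed))
```

Under these assumptions, demand below the lower bound still reserves `min_units`, and demand above the upper bound is capped at `max_units`, which is what keeps spend predictable.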
Auto Scaling
EMR on ECS supports multiple types of auto scaling rules. EMR can automatically scale cluster computing resources out or in within minutes, based on time or load.
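To make the difference between load-based and time-based rules concrete, here is a minimal Python sketch. It is an illustration only and does not reflect the actual EMR rule schema or API: the classes `LoadRule` and `TimeRule`, the metric name `cpu_utilization`, and the first-match evaluation order are all assumptions.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class LoadRule:
    metric: str       # hypothetical metric name, e.g. "cpu_utilization"
    threshold: float
    above: bool       # True: fire when value > threshold; False: when value < threshold
    node_delta: int   # positive scales out, negative scales in


@dataclass(frozen=True)
class TimeRule:
    start: time       # inclusive start of the window
    end: time         # exclusive end of the window
    node_count: int   # node count to hold during the window


def apply_load_rules(current_nodes, metrics, rules, min_nodes, max_nodes):
    """Apply the first load rule that fires and clamp the result to the node limits."""
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported; skip this rule
        fired = value > rule.threshold if rule.above else value < rule.threshold
        if fired:
            return max(min_nodes, min(max_nodes, current_nodes + rule.node_delta))
    return current_nodes


def scheduled_nodes(now, rules, default_nodes):
    """Return the node count of the first time window containing `now`."""
    for rule in rules:
        if rule.start <= now < rule.end:
            return rule.node_count
    return default_nodes
```

For example, with a scale-out rule of `LoadRule("cpu_utilization", 80.0, True, 2)`, a four-node cluster capped at five nodes grows to five, not six, when utilization reaches 90. A `TimeRule(time(9), time(18), 10)` holds ten nodes during working hours and falls back to the default outside them.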
Cost Optimization
More Billing Methods
EMR provides multiple billing methods, including subscription, pay-as-you-go, and preemptible instances. For short-term use, we recommend that you use the pay-as-you-go billing method. For long-term use, we recommend that you use the subscription billing method. Alibaba Cloud provides lower prices for longer subscription durations.
YiTian ARM Architecture
EMR on ECS supports the YiTian ARM architecture. Self-developed YiTian 710 chips help implement collaboration between software and hardware and improve cost-effectiveness by more than 40%.
Monitoring and Diagnostics
Cluster Monitoring
EMR provides a range of service and host monitoring metrics and visualizes them so that you can quickly locate service and host exceptions.
Event Center
EMR provides various types of events, such as service events, console-related events, and host events. This helps you quickly pinpoint cluster issues and trace their causes.
Diagnostic Analysis
EMR analyzes hot and cold data and small files in HDFS to help you optimize service performance.