E-MapReduce

E-MapReduce: An Open Source Big Data Platform Featuring Auto Scaling, High Stability, and Compute-Storage Separation in the AI Era

E-MapReduce (EMR) is a cloud-native open source big data platform that provides easy-to-integrate open source big data computing and storage engines, such as Hadoop, Hive, Spark, StarRocks, Flink, Presto, and ClickHouse. EMR computing resources can be flexibly scaled. You can deploy EMR clusters on top of Alibaba Cloud Elastic Compute Service (ECS), Container Service for Kubernetes (ACK), or a serverless architecture.

Benefits

Full Compatibility with Open Source Components
EMR is built entirely on open source components and keeps pace with new releases of those components.
High Security and Reliability
EMR allows you to create a big data computing environment within minutes. Features such as intelligent diagnostics and analysis, Kerberos authentication, and data encryption are supported.
Cost-effectiveness
Computing resources are used on demand, hot and cold data are stored in separate tiers, and preemptible Alibaba Cloud instances are supported.
Elastic Resources
Cluster resources can be adjusted dynamically based on cluster workload or on a specified schedule. Auto scaling completes within minutes, and multiple elastic resource types are supported.

Features

Ease of Use

Environment Building

EMR allows you to create an EMR cluster within minutes in the EMR console or by calling an API operation. You can deploy open source big data frameworks without managing the underlying hardware and software.

Resource Scaling

EMR allows you to increase or decrease the number of nodes in an EMR cluster in the EMR console or by calling API operations. You can easily configure managed auto scaling rules to enable EMR to automatically manage computing resources to meet your usage and performance requirements. This helps improve cluster utilization and reduce costs.

Service Configuration

EMR allows you to quickly add services provided by EMR, monitor their status, configure them, and perform O&M operations on the services and their components. You can modify the configurations of services running on an EMR cluster, such as Apache Hadoop, Apache Spark, Apache Hive, and Hue, without restarting the cluster or releasing it and creating another one. EMR then applies the new configurations and restarts the reconfigured services.
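As a rough illustration of the idea that only reconfigured services need a restart, the following minimal sketch compares two configuration snapshots and lists the services whose settings changed. It is not the EMR API. The function name `services_to_restart` and the dictionary layout are assumptions made for this example, and the configuration keys are used only as sample data.

```python
def services_to_restart(old_config, new_config):
    """Return the sorted names of services whose settings differ.

    Both arguments map a service name to a dict of its settings.
    This layout is an assumption for the sketch, not an EMR format.
    """
    return sorted(
        service
        for service, settings in new_config.items()
        if settings != old_config.get(service)
    )


# Sample snapshots: only the Spark setting changes.
before = {
    "spark": {"spark.executor.memory": "4g"},
    "hive": {"hive.exec.parallel": "false"},
}
after = {
    "spark": {"spark.executor.memory": "8g"},
    "hive": {"hive.exec.parallel": "false"},
}

changed = services_to_restart(before, after)
print(changed)
```

In this sketch the cluster itself is never touched. Only the services in the returned list would be candidates for a restart.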

Convenient Integration

EMR allows you to apply specific configurations in the EMR console or by using SDKs or a CLI.

Development and Scheduling

EMR Notebook is a serverless platform for interactive data analysis and exploration. It meets the data processing requirements of big data and AI, and provides a visualized development environment for data engineers, data analysts, and data scientists. EMR Notebook allows you to write, debug, and execute code in multiple languages, such as SQL, Python, and Markdown.

EMR Workflow is a fully managed service that is fully compatible with open source Apache DolphinScheduler and can be used to schedule workflows and jobs. It provides easy-to-use scheduling services: you can manage workflows and jobs through a visualized operation interface and efficiently build data warehouses, which helps production jobs run stably.

EMR can also connect to DataWorks. In DataWorks, you can create nodes such as Hive, Spark SQL, Presto, and MapReduce nodes based on an EMR compute engine. You can also configure a workflow, schedule the nodes in the workflow on a regular basis, manage metadata, and configure monitoring rules to monitor data quality. This way, you can develop and govern data lakes in a centralized manner.
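Scheduling the nodes of a workflow comes down to running each node only after the nodes it depends on have finished. The following minimal sketch shows that ordering with Python's standard `graphlib` module. It does not represent how Apache DolphinScheduler or DataWorks works internally, and the node names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order for a workflow.

    `dependencies` maps each node name to the set of nodes that must
    finish before it starts. A cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical three-node workflow: load, then clean, then report.
workflow = {
    "hive_load": set(),
    "spark_sql_clean": {"hive_load"},
    "presto_report": {"spark_sql_clean"},
}

order = execution_order(workflow)
print(order)
```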

Scalability and Elasticity

Serverless

The serverless architecture provides high resource elasticity and stability, and supports load-based auto scaling and per-second billing. EMR serverless instances do not use fixed specifications. Instead, the computing resources of an instance are automatically scaled, based on your workloads, within the range that you specify. This prevents wasted resources and reduces O&M costs.
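The two ideas in this paragraph, scaling within a specified range and billing by the second, can be sketched in a few lines. This is a minimal illustration only. The notion of a "compute unit", the function names, and the price are assumptions for the example and do not reflect actual EMR specifications or pricing.

```python
def allocate_units(demand_units, min_units, max_units):
    """Clamp the requested compute units to the configured range."""
    return max(min_units, min(max_units, demand_units))


def per_second_cost(usage, price_per_unit_hour):
    """Compute the charge for a list of (seconds, units) usage samples.

    Billing by the second means the charge is proportional to the
    unit-seconds actually consumed, not to whole hours.
    """
    unit_seconds = sum(seconds * units for seconds, units in usage)
    return unit_seconds * price_per_unit_hour / 3600


# Hypothetical instance limited to between 8 and 32 units.
peak = allocate_units(50, min_units=8, max_units=32)
idle = allocate_units(2, min_units=8, max_units=32)

# 30 minutes at 8 units, then 30 minutes at 16 units, at a made-up price.
cost = per_second_cost([(1800, 8), (1800, 16)], price_per_unit_hour=0.5)
print(peak, idle, cost)
```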

Auto Scaling

EMR on ECS supports multiple types of auto scaling rules. EMR can automatically scale cluster computing resources out or in within minutes, based on time or load.
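To make the difference between time-based and load-based rules concrete, the following minimal sketch evaluates both kinds of rule and returns a target node count. It is not the EMR rule format or API. The class names, the metric name `yarn_memory_used_pct`, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class TimeRule:
    """Hypothetical rule: hold a node count during a daily window."""
    start: time
    end: time
    target_nodes: int


@dataclass
class LoadRule:
    """Hypothetical rule: add or remove nodes when a metric crosses a threshold."""
    metric: str
    threshold: float
    scale_out: bool  # True: fire at or above threshold; False: at or below
    step: int


def desired_nodes(current, now, metrics, time_rules, load_rules,
                  min_nodes, max_nodes):
    """Return the node count the rules ask for, kept within the limits."""
    target = current
    # Time-based rules set a baseline for the current window.
    for rule in time_rules:
        if rule.start <= now < rule.end:
            target = rule.target_nodes
    # Load-based rules then adjust the baseline.
    for rule in load_rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue
        if rule.scale_out and value >= rule.threshold:
            target += rule.step
        elif not rule.scale_out and value <= rule.threshold:
            target -= rule.step
    return max(min_nodes, min(max_nodes, target))


time_rules = [TimeRule(time(9, 0), time(18, 0), target_nodes=6)]
load_rules = [
    LoadRule("yarn_memory_used_pct", 80, scale_out=True, step=2),
    LoadRule("yarn_memory_used_pct", 20, scale_out=False, step=2),
]

busy = desired_nodes(4, time(10, 0), {"yarn_memory_used_pct": 90},
                     time_rules, load_rules, min_nodes=2, max_nodes=7)
quiet = desired_nodes(4, time(23, 0), {"yarn_memory_used_pct": 10},
                      time_rules, load_rules, min_nodes=2, max_nodes=7)
print(busy, quiet)
```

In the busy case the time rule raises the baseline to 6, the load rule adds 2, and the upper limit caps the result at 7. In the quiet case no time rule applies and the scale-in rule reduces 4 nodes to the lower limit of 2.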

Cost Optimization

More Billing Methods

EMR provides multiple billing methods, including subscription, pay-as-you-go, and preemptible instances. For short-term use, we recommend that you use the pay-as-you-go billing method. For long-term use, we recommend that you use the subscription billing method. Alibaba Cloud provides lower prices for longer subscription durations.
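The short-term versus long-term recommendation follows from a simple break-even calculation, sketched below. The prices are made-up numbers for illustration and are not Alibaba Cloud prices. The function names are likewise hypothetical.

```python
def break_even_hours(hourly_rate, monthly_subscription):
    """Hours of use per month at which both billing methods cost the same."""
    return monthly_subscription / hourly_rate


def cheaper_billing_method(hours_per_month, hourly_rate, monthly_subscription):
    """Return the cheaper method and its monthly cost for the given usage."""
    pay_as_you_go = hours_per_month * hourly_rate
    if pay_as_you_go < monthly_subscription:
        return "pay-as-you-go", pay_as_you_go
    return "subscription", monthly_subscription


# Made-up prices: 1.0 per hour on demand, 400.0 per month by subscription.
threshold = break_even_hours(1.0, 400.0)
short_term = cheaper_billing_method(100, 1.0, 400.0)
long_term = cheaper_billing_method(720, 1.0, 400.0)
print(threshold, short_term, long_term)
```

Below the break-even point, paying by the hour costs less. A cluster that runs around the clock is well above it, so the subscription costs less.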

YiTian ARM Architecture

EMR on ECS supports the YiTian ARM architecture. Self-developed YiTian 710 chips help implement collaboration between software and hardware and improve cost-effectiveness by more than 40%.

Monitoring and Diagnostics

Cluster Monitoring

EMR provides a range of service and host monitoring metrics and presents them visually so that you can quickly locate service and host exceptions.

Event Center

EMR records various types of events, such as service events, console-related events, and host events. This helps you quickly pinpoint cluster issues and trace their causes.

Diagnostic Analysis

EMR analyzes hot and cold data and small files in HDFS to help you optimize service performance.
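As an illustration of what a cold-data and small-file analysis looks for, the following minimal sketch classifies file records by size and by time since last access. It does not reproduce the EMR diagnostic feature. The `FileStat` record, the 16 MiB size cutoff, and the 90-day cutoff are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Assumed cutoffs for this sketch only.
SMALL_FILE_BYTES = 16 * 1024 * 1024
COLD_AFTER_DAYS = 90


@dataclass
class FileStat:
    """Hypothetical record of one file's size and last access age."""
    path: str
    size_bytes: int
    days_since_access: int


def summarize(files):
    """Return the paths of small files and of cold files."""
    small = [f.path for f in files if f.size_bytes < SMALL_FILE_BYTES]
    cold = [f.path for f in files if f.days_since_access >= COLD_AFTER_DAYS]
    return {"small": small, "cold": cold}


sample = [
    FileStat("/warehouse/logs/part-0001", 4 * 1024, 200),
    FileStat("/warehouse/sales/part-0001", 256 * 1024 * 1024, 1),
    FileStat("/warehouse/archive/part-0001", 512 * 1024 * 1024, 400),
]

report = summarize(sample)
print(report)
```

Small files are candidates for merging, and cold files are candidates for a cheaper storage tier.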

Scenarios
