E-MapReduce (EMR) clusters are suitable for all the scenarios supported by the Hadoop ecosystem and Spark.

EMR is a cluster service built on Hadoop and Spark. EMR clusters are deployed on Alibaba Cloud Elastic Compute Service (ECS) instances, which you can use as if they were your own dedicated physical machines. The following examples show typical EMR scenarios:

Batch processing of data

You can synchronize large volumes of logs to the data nodes of an EMR cluster. Then, you can use tools such as Hue and mainstream computing frameworks such as Hive, Spark, or Presto to quickly gain insight into the data. You can also use tools such as Sqoop to load data from ApsaraDB RDS or other storage engines into EMR, analyze the data, and synchronize the results back to ApsaraDB RDS or other storage engines, where they can be used for data visualization.
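
The load, analyze, and export flow described above can be sketched in plain Python. This is only an illustration of the shape of a batch job, not EMR, Hive, Spark, or Sqoop code: the log format, the function names (`parse_line`, `aggregate`, `export_results`, `run_batch`), and the `request_counts` table are all hypothetical, and an in-memory SQLite database stands in for ApsaraDB RDS.

```python
import sqlite3
from collections import Counter

# Hypothetical access-log lines, standing in for logs synchronized to the cluster.
LOG_LINES = [
    "2024-01-01T00:00:01 GET /index.html 200",
    "2024-01-01T00:00:02 GET /cart 500",
    "2024-01-01T00:00:03 POST /cart 200",
    "2024-01-01T00:00:04 GET /index.html 200",
    "malformed line",
]


def parse_line(line):
    """Return (path, status) for a well-formed line, or None otherwise."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return None
    return parts[2], int(parts[3])


def aggregate(lines):
    """Count requests per (path, status), skipping lines that fail to parse."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


def export_results(counts, conn):
    """Write the aggregated counts to a results table.

    SQLite is an assumption here; it stands in for the target database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_counts "
        "(path TEXT, status INTEGER, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO request_counts VALUES (?, ?, ?)",
        [(path, status, hits) for (path, status), hits in sorted(counts.items())],
    )
    conn.commit()


def run_batch(lines=LOG_LINES):
    """Run the whole batch: parse, aggregate, export, then read the results back."""
    conn = sqlite3.connect(":memory:")
    export_results(aggregate(lines), conn)
    rows = conn.execute(
        "SELECT path, status, hits FROM request_counts ORDER BY path, status"
    ).fetchall()
    conn.close()
    return rows
```

In a real cluster, the aggregation step would typically be a Hive or Spark job and the export step a Sqoop job. The sketch only shows how the three stages hand data to one another.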

Ad hoc queries for data analysis

EMR allows you to import massive amounts of data, either directly or by using external tables, into online analytical processing (OLAP) engines such as ClickHouse, Presto, and Impala for efficient, real-time, and flexible data analysis. This feature is suitable for business scenarios such as user persona analysis, audience selection, BI reporting, and business analysis.

Online analysis of large amounts of data

EMR analyzes petabytes of structured, semi-structured, or unstructured data generated by web apps and mobile apps. This allows web apps or data visualization services to visualize data in real time based on the analysis results obtained from EMR.

Processing of streaming data

EMR allows you to consume and process real-time streaming data from services such as Log Service, Message Queue, Message Service (MNS), and Apache Kafka by using Spark Streaming and Storm.

EMR analyzes streaming data in fault-tolerant mode and writes analysis results to Object Storage Service (OSS) or HDFS.
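
To illustrate what fault-tolerant stream processing can look like, the following plain-Python sketch consumes events in micro-batches and saves a checkpoint after each batch, so that a restarted job resumes where it left off instead of recounting events. This is not Spark Streaming or Storm code. The function names, the JSON checkpoint format, and the `fail_at_batch` switch used to simulate a crash are all hypothetical, and a local file stands in for OSS or HDFS.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return (offset, counts) from the checkpoint file, or a fresh state."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        state = json.load(f)
    return state["offset"], state["counts"]


def save_checkpoint(path, offset, counts):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename means a crash never leaves a half-written checkpoint.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset, "counts": counts}, f)
    os.replace(tmp, path)


def process_stream(events, checkpoint_path, batch_size=2, fail_at_batch=None):
    """Count events in micro-batches, resuming from the last checkpoint."""
    offset, counts = load_checkpoint(checkpoint_path)
    batch_no = 0
    while offset < len(events):
        if fail_at_batch is not None and batch_no == fail_at_batch:
            raise RuntimeError("simulated worker failure")
        batch = events[offset:offset + batch_size]
        for key in batch:
            counts[key] = counts.get(key, 0) + 1
        offset += len(batch)
        # Commit the offset and the state together after each batch.
        save_checkpoint(checkpoint_path, offset, counts)
        batch_no += 1
    return counts


def demo():
    """Fail once after the first batch, then restart and finish the stream."""
    events = ["login", "click", "click", "logout", "click"]
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "checkpoint.json")
        try:
            process_stream(events, ckpt, fail_at_batch=1)
        except RuntimeError:
            pass  # The first run stops after one committed batch.
        return process_stream(events, ckpt)
```

Because the offset and the counts are saved in the same checkpoint, the restarted run in `demo()` counts each event exactly once. Production frameworks provide their own checkpointing mechanisms, and this sketch only conveys the general idea.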
