E-MapReduce (EMR) clusters are suitable for all the scenarios supported by the Hadoop ecosystem and Spark.
Batch processing of data
You can synchronize large volumes of logs to the data nodes of an EMR cluster, and then use tools such as Hue with a mainstream computing framework such as Hive, Spark, or Presto to gain quick insight into the data. You can also use tools such as Sqoop to load data from ApsaraDB RDS or other storage engines into EMR, analyze the data, and synchronize the results back to ApsaraDB RDS or other storage engines, where they can be used for data visualization.
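The load, analyze, and write-back flow described above can be sketched locally. The following is a minimal illustration only, not EMR, Sqoop, or Hive code: it assumes a hypothetical log format, uses plain Python for the aggregation step, and uses an in-memory SQLite database as a stand-in for the results store (for example, ApsaraDB RDS). All function and table names are invented for the example.

```python
import sqlite3
from collections import Counter

# Hypothetical access-log lines; a real job would read them from cluster storage.
RAW_LOGS = [
    "2024-01-01T00:00:01 GET /index.html 200",
    "2024-01-01T00:00:02 GET /missing 404",
    "2024-01-01T00:00:03 POST /api/items 200",
    "2024-01-01T00:00:04 GET /index.html 500",
]


def parse_line(line):
    """Split one log line into (timestamp, method, path, status)."""
    timestamp, method, path, status = line.split()
    return timestamp, method, path, int(status)


def count_by_status(lines):
    """Batch step: aggregate the parsed logs into a count per HTTP status."""
    return Counter(parse_line(line)[3] for line in lines)


def write_results(conn, counts):
    """Write-back step: store the aggregates in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status_counts "
        "(status INTEGER PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO status_counts (status, hits) VALUES (?, ?)",
        sorted(counts.items()),
    )
    conn.commit()


def run_batch(lines):
    """Run the whole flow and return the rows a dashboard would read."""
    conn = sqlite3.connect(":memory:")  # stand-in for the results database
    try:
        write_results(conn, count_by_status(lines))
        return conn.execute(
            "SELECT status, hits FROM status_counts ORDER BY status"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_batch(RAW_LOGS))
```

On a real cluster, the aggregation would typically run as a Hive, Spark, or Presto job and the write-back would go through a tool such as Sqoop. The sketch only shows the shape of the pipeline.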
Ad hoc queries for data analysis
Online analysis of large amounts of data
EMR can analyze petabytes of structured, semi-structured, or unstructured data generated by web apps and mobile apps. Web apps and data visualization services can then present the analysis results from EMR in real time.
Processing of streaming data
EMR allows you to consume and process real-time streaming data from services such as Log Service, Message Queue, Message Service (MNS), and Apache Kafka by using Spark Streaming or Storm.
EMR analyzes streaming data in fault-tolerant mode and writes analysis results to Object Storage Service (OSS) or HDFS.
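One common way to make stream processing fault tolerant is to process events in small batches and durably record progress after each batch, so that a restarted job resumes where it left off. The following is a minimal local sketch of that idea under stated assumptions. It does not use the Spark Streaming or Storm APIs, the event list stands in for a source such as Kafka, and the JSON checkpoint file stands in for durable storage such as OSS or HDFS. All names are hypothetical.

```python
import json
import os
from collections import Counter


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"offset": 0, "counts": {}}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic replace: readers never see a partial file


def process_stream(events, checkpoint_path, batch_size=2, max_batches=None):
    """Count events in micro-batches, checkpointing after each batch.

    max_batches lets a caller stop early, which simulates a crash
    between batches in this sketch.
    """
    state = load_checkpoint(checkpoint_path)
    counts = Counter(state["counts"])
    offset = state["offset"]
    batches = 0
    while offset < len(events):
        if max_batches is not None and batches >= max_batches:
            break
        batch = events[offset:offset + batch_size]
        counts.update(batch)
        offset += len(batch)
        save_checkpoint(checkpoint_path, {"offset": offset, "counts": dict(counts)})
        batches += 1
    return dict(counts)
```

Calling `process_stream` once with `max_batches=1` and then again without it on the same checkpoint path resumes from the saved offset, so no event is counted twice. In this sketch the running counts and the offset are saved together in one file, which is what keeps them consistent after a restart.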