This topic describes the types of clusters that are supported by E-MapReduce (EMR) and the important operations that you can perform in the cluster of each type.

Overview

EMR supports the following cluster types.

Hadoop
  • Provides Hadoop, Hive, and Spark components, which run as semi-hosted services for offline storage and computing of large-scale distributed data.
  • Provides Presto and Impala components for interactive queries.
  • Provides other Hadoop ecosystem components, such as Oozie.

Data Science
Data Science clusters are commonly used in big data and AI scenarios. They support offline extract, transform, load (ETL) of big data based on Hive and Spark, as well as TensorFlow model training. You can choose a CPU+GPU heterogeneous computing framework and deep learning algorithms that run on NVIDIA GPUs to run computing jobs more efficiently.

Dataflow
Dataflow clusters provide an end-to-end (E2E) real-time computing solution. They incorporate Kafka, a distributed messaging system with high throughput and scalability, and the commercial Flink kernel provided by Ververica, which is powered by Apache Flink. Dataflow clusters address a wide range of E2E real-time computing needs and are widely used for real-time data ETL and for log collection and analysis. You can use either component on its own or both together.

Druid
Druid clusters provide a semi-hosted, real-time, interactive analytics service. They can query big data within milliseconds and support multiple data ingestion methods. You can combine Druid clusters with services such as EMR Hadoop, EMR Spark, Object Storage Service (OSS), and ApsaraDB RDS to build a flexible and stable real-time query system.
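
Hadoop cluster components
The following components are available in Hadoop clusters. The reference topic for each component is listed after its name.
  • HDFS: Overview
  • YARN: Overview
  • Hive: Overview
  • Spark: Overview
  • Knox: Overview
  • Tez: Overview
  • Sqoop: Overview
  • SmartData: Overview
  • OpenLDAP: Overview
  • Hudi: Overview
  • Hue: Overview
  • HBase: Overview
  • ZooKeeper: Overview
  • Presto: Overview
  • Impala: Overview
  • Zeppelin: Overview
  • Flume: Overview
  • Livy: Overview
  • Ranger: Overview
  • Phoenix: Overview
  • ESS: Overview
  • Alluxio: Overview
  • Kudu: Overview
  • Oozie: Overview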
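
Druid cluster components
The following components are available in Druid clusters. The reference topic for each component is listed after its name.
  • Druid: Overview
  • Superset: Overview
  • ZooKeeper: Overview
  • Knox: Overview
  • OpenLDAP: Overview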
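
Dataflow cluster components
The following components are available in Dataflow clusters, grouped by cluster mode. The reference topic for each component is listed after its name.
Flink mode
  • HDFS: Overview
  • YARN: Overview
  • ZooKeeper: Overview
  • Knox: Overview
  • Flink: Overview
  • OpenLDAP: Overview
Kafka mode
  • ZooKeeper: Overview
  • Ganglia: None (no reference topic)
  • Kafka: Overview
  • Kafka-Manager: Overview
  • OpenLDAP: Overview
  • Knox: Overview
  • Ranger: Overview
  • Hue: Overview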