This topic describes the types of clusters that E-MapReduce (EMR) supports and the important operations that you can perform on each type of cluster.
Overview
Cluster type | Description | Important operation |
---|---|---|
Hadoop | | |
Data Science | Data Science clusters are commonly used in big data and AI scenarios. They support offline extract, transform, and load (ETL) of big data based on Hive and Spark, as well as TensorFlow model training. You can choose the CPU+GPU heterogeneous computing framework and deep learning algorithms supported by NVIDIA GPUs to run computing jobs more efficiently. | |
Dataflow | Dataflow clusters provide an end-to-end (E2E) real-time computing solution. The clusters incorporate Kafka, a distributed messaging system with high throughput and scalability, and the commercial Flink kernel provided by Ververica, which is powered by Apache Flink. The clusters address a range of E2E real-time computing needs and are widely used for real-time data ETL and for log collection and analysis. You can use either component or both. | |
Druid | Druid clusters provide a semi-hosted, real-time, and interactive analytic service. These clusters can query big data within milliseconds and ingest data in multiple ways. You can use Druid clusters with services such as EMR Hadoop, EMR Spark, Object Storage Service (OSS), and ApsaraDB RDS to build a flexible and stable system for real-time queries. | |
Component | References |
---|---|
HDFS | Overview |
YARN | Overview |
Hive | Overview |
Spark | Overview |
Knox | Overview |
Tez | Overview |
Sqoop | Overview |
SmartData | Overview |
OpenLDAP | Overview |
Hudi | Overview |
Hue | Overview |
HBase | Overview |
ZooKeeper | Overview |
Presto | Overview |
Impala | Overview |
Zeppelin | Overview |
Flume | Overview |
Livy | Overview |
Ranger | Overview |
Phoenix | Overview |
ESS | Overview |
Alluxio | Overview |
Kudu | Overview |
Oozie | Overview |
Component | References |
---|---|
Faiss-Server | Faiss-Server |
GKS | GKS |
Kubeflow | Use Kubeflow for model training |