
E-MapReduce: Choose a business scenario

Last Updated: Mar 26, 2026

E-MapReduce (EMR) provides four predefined cluster types — Data Lake, Data Analytics, Real-time Data Streaming, and Data Service — each pre-configured for a specific workload. If none of these fits your requirements, use a Custom Cluster to deploy any combination of services.

Choose a cluster type

Match your workload to a cluster type using the descriptions below.

Data Lake (DataLake cluster)

Included services:

  • Computing: Spark, Hive, Tez, Trino, Kyuubi, Presto
  • Storage: Hadoop Distributed File System (HDFS), OSS-HDFS, Celeborn, JindoCache
  • Data integration: Flume, Sqoop
  • Data lake formats: Hudi, Iceberg, Paimon
  • Resource management: YARN
  • Coordination: ZooKeeper
  • Security: OpenLDAP, Ranger, DLF-Auth, Knox

Core capabilities: Unified storage, multiple compatible compute engines, and support for the Hudi, Iceberg, and Paimon formats.

Typical workloads: Offline extract, transform, and load (ETL), such as data warehouse ETL and ad hoc analysis.
Data Analytics (OLAP cluster)

Included services:

  • Online Analytical Processing (OLAP): StarRocks, ClickHouse, Doris
  • Coordination: ZooKeeper

Core capabilities: Subsecond query response, column-oriented storage optimization, and federated queries.

Typical workloads: Complex aggregation analysis, such as user profile analysis, user group identification, and business intelligence (BI).
Real-time Data Streaming (Dataflow cluster)

Included services:

  • Stream computing: Flink
  • Storage: HDFS, OSS-HDFS
  • Data lake format: Paimon
  • Resource management: YARN
  • Coordination: ZooKeeper
  • Security: OpenLDAP, Knox

Core capabilities: Unified batch and stream processing, low latency, and state consistency guarantees.

Typical workloads: Real-time ETL, such as streaming warehouse ETL.
Data Service (DataServing cluster)

Included services:

  • Computing: Phoenix
  • Column-oriented storage: HBase
  • Storage: HDFS, OSS-HDFS, JindoCache
  • Coordination: ZooKeeper
  • Security: OpenLDAP, Ranger, Knox

Core capabilities: Millisecond point queries, SQL interface optimization, and read/write splitting.

Typical workloads: High-concurrency queries, such as behavior analysis and precision marketing.
Custom Cluster

Included services:

  • Computing: Spark, Hive, Tez, Trino, Kyuubi, Presto, Flink, Phoenix
  • OLAP: StarRocks
  • Column-oriented storage: HBase
  • Storage: HDFS, OSS-HDFS, Celeborn, JindoCache
  • Data integration: Flume, Sqoop
  • Data lake formats: Hudi, Iceberg, Paimon
  • Resource management: YARN
  • Coordination: ZooKeeper
  • Security: OpenLDAP, Ranger, DLF-Auth, Knox

Core capabilities: Flexible service deployment and support for mixed workloads (real-time, offline, and analytical).

Typical workloads: Offline ETL, real-time ETL, complex aggregation analysis, and high-concurrency queries.
Note

Service versions available in a cluster depend on the EMR version. Use the latest EMR version for the newest features, improved performance, and security fixes. For a full list of available versions, see Release versions.
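As a quick illustration, the workload-to-cluster mapping described above can be sketched in a few lines of Python. This is not part of any EMR tooling; the function and category names are hypothetical, and the rule that multiple workload categories suggest a Custom Cluster follows the guidance in this topic.

```python
# Hypothetical sketch: map workload categories from this topic to the
# recommended EMR cluster type. Not an EMR API; names are illustrative.
WORKLOAD_TO_CLUSTER = {
    "offline_etl": "Data Lake (DataLake)",
    "complex_aggregation_analysis": "Data Analytics (OLAP)",
    "realtime_etl": "Real-time Data Streaming (Dataflow)",
    "high_concurrency_queries": "Data Service (DataServing)",
}

def recommend_cluster(workloads):
    """Recommend a cluster type for a list of workload categories.

    If the workloads span more than one predefined type, suggest a
    Custom Cluster, per the guidance in this topic.
    """
    types = {WORKLOAD_TO_CLUSTER[w] for w in workloads}
    if len(types) > 1:
        return "Custom Cluster"
    return types.pop()

print(recommend_cluster(["offline_etl"]))                  # Data Lake (DataLake)
print(recommend_cluster(["offline_etl", "realtime_etl"]))  # Custom Cluster
```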

When to use a Custom Cluster

A Custom Cluster gives you full control over which services to deploy. Use it when your workload spans multiple cluster types — for example, running Spark, Flink, and HBase together on a single cluster.

Use a Custom Cluster if:

  • Your workload combines offline ETL, real-time processing, and analytical queries

  • No predefined cluster type covers all the services you need

Use separate dedicated clusters instead if:

  • Your offline and real-time workloads have different latency or resource requirements — mixing them on one cluster can cause interference

If a Custom Cluster still cannot fully meet your requirements, deploy additional services manually after evaluating their compatibility and security.
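Another way to frame the decision is by service coverage: if no single predefined cluster type includes every service you need, a Custom Cluster is the fallback. The hypothetical sketch below encodes the service lists from the table in this topic; the function name and the check itself are illustrative, not part of EMR.

```python
# Hypothetical sketch: pick the first predefined cluster type whose service
# list (from the table in this topic) covers every required service;
# otherwise fall back to a Custom Cluster.
PREDEFINED = {
    "DataLake": {"Spark", "Hive", "Tez", "Trino", "Kyuubi", "Presto", "HDFS",
                 "OSS-HDFS", "Celeborn", "JindoCache", "Flume", "Sqoop",
                 "Hudi", "Iceberg", "Paimon", "YARN", "ZooKeeper", "OpenLDAP",
                 "Ranger", "DLF-Auth", "Knox"},
    "OLAP": {"StarRocks", "ClickHouse", "Doris", "ZooKeeper"},
    "Dataflow": {"Flink", "HDFS", "OSS-HDFS", "Paimon", "YARN", "ZooKeeper",
                 "OpenLDAP", "Knox"},
    "DataServing": {"Phoenix", "HBase", "HDFS", "OSS-HDFS", "JindoCache",
                    "ZooKeeper", "OpenLDAP", "Ranger", "Knox"},
}

def pick_cluster_type(required):
    """Return a predefined type covering all required services, else Custom."""
    for name, services in PREDEFINED.items():
        if required <= services:  # required is a subset of the type's services
            return name
    return "Custom"

print(pick_cluster_type({"Spark", "Hive"}))            # DataLake
print(pick_cluster_type({"Spark", "Flink", "HBase"}))  # Custom
```

The second call mirrors the example in this topic: running Spark, Flink, and HBase together is not covered by any single predefined type, so a Custom Cluster is suggested.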

What's next

After selecting a cluster type, plan the remaining cluster configuration.
