
E-MapReduce:Recommendations for evaluating cluster resource specifications

Last Updated: Mar 26, 2026

Accurate resource sizing before cluster creation prevents both under-provisioning (which degrades performance under load) and over-provisioning (which incurs unnecessary cost). This topic provides sizing formulas and recommended specifications for each node group in an E-MapReduce (EMR) cluster running Kafka. After the initial estimate, validate with kafka-producer-perf-test and kafka-consumer-perf-test, then use the scale-out feature to adjust configurations as your workload changes.

Kafka cluster sizing depends on peak message traffic, average message size, partition count, replication factor, and client count. Collect actual business metrics before applying the formulas below.

Node group overview

The following table summarizes recommended specifications for each node group. Detailed sizing guidance and formulas follow in the sections below.

| Node group | Role | Nodes | CPU | Memory | System disk | Data disk |
| --- | --- | --- | --- | --- | --- | --- |
| Master | ZooKeeper + ecosystem components | 3 | 4 cores | 8 GiB | 80 GiB | 120 GiB cloud disk |
| Core | Kafka broker | See Number of brokers | 16 cores | 64 GiB | 80 GiB | 4 x cloud disks (size varies) |
| Task (optional) | Kafka Connect | ≥ 2 | ≥ 8 cores | Based on connector | N/A | ≥ 80 GiB cloud disk |

Master node group (ZooKeeper)

The master node group runs ZooKeeper and the Kafka ecosystem components: Kafka Manager, Schema Registry, and REST Proxy.

Configure three master nodes with the following specifications:

| Resource | Recommended value |
| --- | --- |
| Nodes | 3 |
| CPU | 4 cores |
| Memory | 8 GiB |
| CPU-to-memory ratio | 1:2 |
| System disk | 80 GiB |
| Data disk | 120 GiB cloud disk |

Core node group (Kafka broker)

Business parameters

Collect the following business parameters before calculating broker count and disk size:

| Parameter | Description | Default |
| --- | --- | --- |
| Fan-out factor | Number of times downstream nodes consume business data, excluding in-cluster replication | None |
| Peak inbound traffic | Peak business data throughput (MB/s) | None |
| Average inbound traffic | Average business data throughput (MB/s) | None |
| Data retention period | How long data is retained (days) | 7 days |
| Partition replication factor | Number of replicas per partition | 3 |

Note: Peak traffic is typically one order of magnitude higher than average traffic. Set the peak inbound traffic value accordingly, and retain adequate redundant capacity so the cluster can sustain service under extreme load.

Use these parameters to derive the following cluster-level metrics:

| Metric | Formula |
| --- | --- |
| Total peak write traffic | Peak inbound traffic x Partition replication factor |
| Total peak read traffic | Peak inbound traffic x (Fan-out factor + Partition replication factor - 1) |
| Total storage capacity | Average inbound traffic x Data retention period x Partition replication factor |

When calculating total storage capacity, convert the data retention period from days to seconds so the units are consistent with the MB/s traffic figures.
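The following Python sketch applies these formulas end to end. The business parameter values are illustrative placeholders, not recommendations; substitute your own metrics. Note the day-to-second conversion in the storage calculation.

```python
# Cluster-metrics sketch based on the formulas above.
# All parameter values below are illustrative placeholders.

SECONDS_PER_DAY = 24 * 60 * 60

# Business parameters (example values only)
fan_out_factor = 3         # downstream consumption count, excluding replication
peak_inbound_mb_s = 100.0  # peak business traffic (MB/s)
avg_inbound_mb_s = 10.0    # average business traffic (MB/s)
retention_days = 7         # data retention period (days)
replication_factor = 3     # replicas per partition

# Cluster-level metrics
total_peak_write_mb_s = peak_inbound_mb_s * replication_factor
total_peak_read_mb_s = peak_inbound_mb_s * (fan_out_factor + replication_factor - 1)
total_storage_mb = (avg_inbound_mb_s * retention_days * SECONDS_PER_DAY
                    * replication_factor)

print(f"Total peak write traffic: {total_peak_write_mb_s:.0f} MB/s")  # 300 MB/s
print(f"Total peak read traffic:  {total_peak_read_mb_s:.0f} MB/s")   # 500 MB/s
print(f"Total storage capacity:   {total_storage_mb / 1e6:.1f} TB")   # ~18.1 TB
```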

Recommended node specifications

Configure core nodes with the following specifications:

| Resource | Recommended value |
| --- | --- |
| CPU | 16 cores |
| Memory | 64 GiB |
| CPU-to-memory ratio | 1:4 |
| System disk | 80 GiB |
| Data disks | 4 x cloud disks (size calculated below) |

Disk type. Use cloud disks as data disks to avoid the operations and maintenance (O&M) burden of physical disk failures. This improves service availability and lowers O&M labor costs.

After selecting disk type and count, calculate the total disk I/O throughput of the node. Select a network interface card (NIC) with bandwidth greater than or equal to the total disk I/O throughput.

Number of brokers

In ideal conditions, a Kafka broker's throughput ceiling is its disk I/O throughput or its NIC bandwidth, whichever is lower. Use the following steps to calculate the required broker count.

Step 1: Calculate disk throughput per node.

Disk throughput per node = Throughput per disk x Number of data disks

For reference: the maximum throughput of a PL1 Enterprise SSD (ESSD) is 350 MB/s. For local disks, use half the theoretical value as the effective throughput (typically about 50 MB/s).

For detailed disk performance values, see Block storage performance.
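As a worked example of Step 1, assuming the four data disks recommended above are PL1 ESSDs, the per-node disk throughput and the matching minimum NIC bandwidth come out as follows:

```python
# Step 1 worked example: disk throughput per node, plus the minimum
# NIC bandwidth needed to keep the network from becoming the bottleneck.
# Assumes PL1 ESSDs (350 MB/s each); adjust for your disk type.

throughput_per_disk_mb_s = 350  # PL1 ESSD maximum throughput
data_disks_per_node = 4

disk_throughput_per_node_mb_s = throughput_per_disk_mb_s * data_disks_per_node
# 4 x 350 MB/s = 1,400 MB/s

# 1 MB/s = 8 Mbit/s, so 1,400 MB/s requires about 11.2 Gbit/s of NIC bandwidth.
min_nic_bandwidth_gbit_s = disk_throughput_per_node_mb_s * 8 / 1000

print(f"Disk throughput per node: {disk_throughput_per_node_mb_s} MB/s")
print(f"Minimum NIC bandwidth:    {min_nic_bandwidth_gbit_s:.1f} Gbit/s")
```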

Step 2: Calculate the number of brokers based on traffic.

With a replication factor of 3, use at least 4 brokers so that a partition with 3 replicas can still be created if one broker is temporarily unavailable. Maintain 50% redundant capacity:

Number of brokers = Max(4, (Total peak read traffic + Total peak write traffic) / Disk throughput per node / 50%)
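A sketch of Step 2, reusing the example traffic figures and the 1,400 MB/s per-node disk throughput from the earlier sketches (all values illustrative):

```python
import math

# Step 2: broker count from traffic, keeping 50% of capacity in reserve.
total_peak_write_mb_s = 300           # example value from the metrics sketch
total_peak_read_mb_s = 500            # example value from the metrics sketch
disk_throughput_per_node_mb_s = 1400  # example value from the Step 1 sketch

brokers_by_traffic = max(
    4,  # floor: lets a 3-replica partition be created with one broker down
    math.ceil((total_peak_read_mb_s + total_peak_write_mb_s)
              / disk_throughput_per_node_mb_s / 0.5),
)
print(f"Brokers required by traffic: {brokers_by_traffic}")  # 4 in this example
```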

Step 3: Check against partition replica limits.

If the total number of partition replicas is large, cross-check using the partition-based formula:

Number of brokers = Max(4, Total number of partitions x Partition replication factor / 2,000)

Partition replica limits:

| Limit | Value |
| --- | --- |
| Recommended max replicas per broker | 2,000 |
| Hard max replicas per broker | 4,000 |
| Hard max replicas per cluster | 200,000 |
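A sketch of the Step 3 cross-check, assuming a hypothetical workload of 4,000 partitions:

```python
import math

# Step 3: broker count from partition replica count. The divisor is the
# recommended maximum of 2,000 partition replicas per broker (table above).
total_partitions = 4000          # hypothetical workload
replication_factor = 3
max_replicas_per_broker = 2000

brokers_by_replicas = max(
    4,
    math.ceil(total_partitions * replication_factor / max_replicas_per_broker),
)
print(f"Brokers required by replicas: {brokers_by_replicas}")  # 6 in this example

# Provision the larger of the traffic-based (Step 2) and replica-based results.
```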

Step 4: Calculate the size of each data disk.

Size per data disk = Total storage capacity / Number of brokers / Number of data disks per node / 50%
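Finishing the example with Step 4, using the ~18.1 TB storage figure from the metrics sketch and the six brokers from the Step 3 cross-check (all values illustrative):

```python
# Step 4: size of each data disk, keeping 50% of the space in reserve.
total_storage_tb = 18.1    # example value from the metrics sketch
number_of_brokers = 6      # larger of the Step 2 and Step 3 results
data_disks_per_node = 4

size_per_disk_tb = (total_storage_tb / number_of_brokers
                    / data_disks_per_node / 0.5)
print(f"Size per data disk: {size_per_disk_tb:.2f} TB")  # ~1.51 TB
# Round up to the next available cloud disk size.
```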

Scaling after cluster creation

The 50% redundancy reserve built into the sizing formulas keeps the cluster below the load threshold where throttling begins. After cluster creation, monitor resource utilization and use the scale-out feature to adjust configurations based on actual resource usage.

Task node group (Kafka Connect) (optional)

The task node group is optional and runs Kafka Connect. It can be resized at any time after cluster creation based on actual resource usage.

Configure task nodes with the following specifications:

| Resource | Recommended value |
| --- | --- |
| Nodes | ≥ 2 (for high availability) |
| CPU | ≥ 8 cores per node; increase based on connector CPU utilization |
| Memory | Based on connector type and memory usage |
| CPU-to-memory ratio | 1:2 or 1:4 |
| Data disk | ≥ 80 GiB cloud disk |