E-MapReduce: Create a Dataflow Kafka cluster

Last Updated: Mar 26, 2026

A Dataflow Kafka cluster is an E-MapReduce (EMR) cluster deployed with Kafka in the Real-time Data Streaming scenario. This guide covers instance selection, software and network configuration, and the steps to complete the cluster creation wizard.

Important

Kafka is no longer supported in EMR V5.18.0, EMR V3.52.0, or any minor version later than those releases. Use ApsaraMQ for Kafka or install Kafka manually instead.

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud account with the permissions to create EMR clusters

  • A virtual private cloud (VPC) and a vSwitch in your target region and zone

  • A security group (do not use an advanced security group created in the Elastic Compute Service (ECS) console)

Create a Dataflow Kafka cluster

Important

After a cluster is created, you cannot modify any parameters except the cluster name. Verify all settings before clicking Confirm.

Step 1: Go to the cluster creation page

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. (Optional) In the top navigation bar, select a region and a resource group.

    • The region cannot be changed after the cluster is created.

    • All resource groups in your account are displayed by default.

  3. Click Create Cluster.

Step 2: Configure software settings

| Parameter | Description |
| --- | --- |
| Region | The region where the cluster is created. Cannot be changed after creation. Example: China (Hangzhou). |
| Business scenario | Select Real-time Data Streaming for a Kafka cluster. |
| Product version | The EMR version determines the version of each bundled service. For example, EMR-3.43.1 includes Kafka 2.12_2.4.1, where 2.12 is the Scala version and 2.4.1 is the open-source Kafka version. |
| High Service Availability | Off by default. Turn this on to deploy three ZooKeeper nodes in the master node group. Because Kafka availability depends on ZooKeeper availability, we recommend turning this on when you create a cluster. If the master node group is used only for ZooKeeper, configure a single data disk for that group. For sizing guidance, see Suggestions for evaluating cluster resources. |
| Optional Services (Select One At Least) | Select Kafka. Add other services based on your requirements. Selected services are started automatically. |
| Collect Service Operational Logs | On by default. Keep this on, because turning it off limits EMR cluster health checks and service-related technical support. After cluster creation, you can change this setting on the Basic Information tab. For details, see How do I stop collection of service operational logs? |
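
The Kafka version label in the Product version row combines two versions in one string. As a minimal sketch, assuming the label always has the form `<scala>_<kafka>` as in the `2.12_2.4.1` example, the two parts can be separated like this. The names `KafkaBuild` and `parse_kafka_build` are illustrative and are not part of any EMR tooling.

```python
from typing import NamedTuple


class KafkaBuild(NamedTuple):
    """Illustrative container for the two halves of the version label."""
    scala_version: str
    kafka_version: str


def parse_kafka_build(label: str) -> KafkaBuild:
    """Split a '<scala>_<kafka>' label such as '2.12_2.4.1'."""
    scala, sep, kafka = label.partition("_")
    if not sep or not scala or not kafka:
        raise ValueError(f"expected '<scala>_<kafka>', got {label!r}")
    return KafkaBuild(scala_version=scala, kafka_version=kafka)


build = parse_kafka_build("2.12_2.4.1")
print("Scala:", build.scala_version)
print("Kafka:", build.kafka_version)
```

Knowing the open-source Kafka version separately from the Scala version helps when you pick a matching client library.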

Step 3: Configure hardware settings

| Parameter | Description |
| --- | --- |
| Billing method | Subscription (default) or pay-as-you-go. Use pay-as-you-go for short-term tests or dynamically scheduled jobs; use Subscription for stable production workloads. |
| Zone | The zone where the cluster is deployed. Clusters within the same region communicate over the internal network. |
| VPC | The VPC for the cluster. An existing VPC is selected by default. To create one, see Create and manage a VPC. |
| vSwitch | The vSwitch in the selected zone. If no vSwitch is available, see Create and manage a vSwitch. |
| Default Security Group | The security group for the cluster. An existing group is selected by default. To create one, click create a new security group. Do not use an advanced security group created in the ECS console. For an overview of security groups, see Overview. |
| Node Group | Configure each node group as described in the node group settings that follow this table. |

Node group settings:

For Kafka brokers, use ECS instances with a CPU-to-memory ratio of 1:4. Balance the I/O throughput of cloud disks against the network interface controller (NIC) bandwidth when sizing nodes.

  • Instance type: Select instance types based on your workload or refer to Suggestions for evaluating cluster resources.

  • Add to Deployment Set: If High Service Availability is on, master nodes are added to a deployment set by default. See Add nodes to the deployment set.

  • System Disk: Select a disk type. Minimum recommended size: 120 GiB. Valid range: 80–500 GiB.

  • Data Disk: Use cloud disks. Minimum recommended size: 80 GiB. Valid range: 40–32768 GiB.

  • Instances: Three master nodes and three core nodes are deployed by default.

  • Additional Security Group: Associate up to two additional security groups per node group.

  • Assign Public Network IP: Off by default. Turn on to associate an elastic IP address (EIP) with the cluster. See What is an Elastic IP Address?
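
The sizing limits in the node group settings can be checked before you open the wizard. The following is a rough sketch, not an EMR API: `NodeGroupPlan` and `check_node_group` are hypothetical names, and the code assumes that a 1:4 CPU-to-memory ratio means 4 GiB of memory per vCPU. Values outside a valid range are reported as errors, and values below a recommended minimum are reported as warnings.

```python
from dataclasses import dataclass

# Limits quoted in the node group settings (all sizes in GiB).
SYSTEM_DISK_RANGE = (80, 500)
SYSTEM_DISK_RECOMMENDED_MIN = 120
DATA_DISK_RANGE = (40, 32768)
DATA_DISK_RECOMMENDED_MIN = 80
MAX_ADDITIONAL_SECURITY_GROUPS = 2


@dataclass
class NodeGroupPlan:
    """Hypothetical planning record; not an EMR API object."""
    vcpus: int
    memory_gib: int
    system_disk_gib: int
    data_disk_gib: int
    additional_security_groups: int = 0


def _check_disk(label, size, valid, recommended_min, errors, warnings):
    low, high = valid
    if not low <= size <= high:
        errors.append(f"{label} {size} GiB is outside {low}-{high} GiB")
    elif size < recommended_min:
        warnings.append(
            f"{label} {size} GiB is below the recommended {recommended_min} GiB"
        )


def check_node_group(plan):
    """Return (errors, warnings) for a planned node group."""
    errors, warnings = [], []
    _check_disk("system disk", plan.system_disk_gib, SYSTEM_DISK_RANGE,
                SYSTEM_DISK_RECOMMENDED_MIN, errors, warnings)
    _check_disk("data disk", plan.data_disk_gib, DATA_DISK_RANGE,
                DATA_DISK_RECOMMENDED_MIN, errors, warnings)
    if plan.additional_security_groups > MAX_ADDITIONAL_SECURITY_GROUPS:
        errors.append("at most two additional security groups per node group")
    # Assumption: "1:4" means 1 vCPU per 4 GiB of memory.
    if plan.memory_gib != 4 * plan.vcpus:
        warnings.append("CPU-to-memory ratio is not 1:4")
    return errors, warnings


plan = NodeGroupPlan(vcpus=8, memory_gib=32, system_disk_gib=120, data_disk_gib=500)
errors, warnings = check_node_group(plan)
print("errors:", errors)
print("warnings:", warnings)
```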

Step 4: Configure basic settings

Configure the parameters in the Basic Information step.

Important

The parameters in the Advanced Settings section are not supported. Do not configure them.

| Parameter | Description |
| --- | --- |
| Cluster Name | 1–64 characters. Letters, digits, hyphens (-), and underscores (_) only. Example: Emr-Kafka. |
| Identity Credentials | Key Pair (default): Access the Linux instance using an SSH key pair. See SSH key pair overview. Password: Set a password for the master node. Must be 8–30 characters and include uppercase letters, lowercase letters, digits, and at least one of: ! @ # $ % ^ & * |
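
Because the cluster settings cannot be changed after creation, it can help to check the name and password against the rules in this table beforehand. This is a sketch under two assumptions: "letters" means ASCII letters, and the password check only verifies the required character classes without rejecting other characters. The function names are illustrative.

```python
import re
import string

# 1-64 characters: ASCII letters, digits, hyphens, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")
SPECIAL_CHARS = set("!@#$%^&*")


def is_valid_cluster_name(name: str) -> bool:
    """Check the cluster name against the stated length and character rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def password_problems(password: str) -> list:
    """Return a list of rule violations; an empty list means the password passes."""
    problems = []
    if not 8 <= len(password) <= 30:
        problems.append("length must be 8-30 characters")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("missing an uppercase letter")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("missing a lowercase letter")
    if not any(c in string.digits for c in password):
        problems.append("missing a digit")
    if not any(c in SPECIAL_CHARS for c in password):
        problems.append("missing one of ! @ # $ % ^ & *")
    return problems


print(is_valid_cluster_name("Emr-Kafka"))
print(password_problems("kafka2024"))
```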

Step 5: Confirm and create

  1. In the Confirm step, read the terms of service and select the checkbox.

  2. Click Confirm.

Refresh the EMR on ECS page to monitor progress. When Status shows Running, the cluster is ready.
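
If you prefer to script the wait instead of refreshing the page, the check reduces to a polling loop. The sketch below is deliberately independent of any SDK: `fetch_status` is a hypothetical callable that you would supply, for example a wrapper around whatever API or CLI call you use to read the cluster status. Only the `Running` value comes from this guide. The `NotReady` value in the demo is a placeholder, not a real EMR status.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status() until it returns 'Running' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Demo with a fake status source; "NotReady" is a placeholder value.
fake_statuses = iter(["NotReady", "NotReady", "Running"])
ready = wait_until_running(lambda: next(fake_statuses), interval_s=0, timeout_s=5)
print("cluster ready:", ready)
```

The loop stops only on `Running` or on timeout, so a cluster that fails to start is reported as not ready once the timeout expires.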

What's next

After the cluster is running, configure security features: