E-MapReduce: Create a StarRocks cluster

Last Updated: Mar 26, 2026

Use the EMR console wizard to create a StarRocks cluster on EMR on ECS. The wizard walks you through three configuration steps — software, hardware, and basic settings — before provisioning the cluster.

Important

After a cluster is created, you cannot modify any parameters except the cluster name. Review all settings carefully before clicking Confirm.

Prerequisites

Before you begin, ensure that you have:

Create a StarRocks cluster

Steps overview:

  1. Go to the cluster creation page.

  2. Configure software parameters.

  3. Configure hardware parameters.

  4. Configure basic parameters.

  5. (Optional) Save as a cluster template.

  6. Confirm and verify.

Step 1: Go to the cluster creation page

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. (Optional) In the top navigation bar, select the target region and a resource group.

    The region cannot be changed after the cluster is created. All resource groups in your account are displayed by default.

  3. On the EMR on ECS page, click Create Cluster.

Step 2: Configure software parameters

| Parameter | Required | Description |
| --- | --- | --- |
| Region | Yes | The region where the cluster is created. Cannot be changed after creation. |
| Business scenario | Yes | Select Data Analytics. |
| Product version | Yes | The EMR version. The latest version (for example, EMR-5.19.0) is selected by default. |
| High Service Availability | No | Off by default. When enabled, three master nodes are deployed to ensure ResourceManager and NameNode availability. You can also modify the number of master nodes. |
| Optional services | No | Additional services to include. Select STARROCKS3 to deploy StarRocks. |
| Collect Service Operational Logs | No | On by default. Collects service logs used exclusively for cluster diagnostics. Disabling this limits EMR health checks and service-related support. To change the setting after creation, modify the Collection Status of Service Operational Logs parameter on the Basic Information tab. For details, see How do I stop collection of service operational logs?. |
| StarRocks architecture | No | Available only when STARROCKS3 is selected. Choose based on your workload. Shared-nothing (default): integrates compute and storage on local disks of compute nodes (CNs). Best for online analytical processing (OLAP), real-time analytics, and business intelligence (BI) reports. Shared-data: decouples compute from storage. CNs run query tasks while data is stored in an external distributed system, which improves flexibility and reliability. Best for scenarios that require large-scale data storage and elastic computing. |
| DLF Unified Metadata | No | Selected by default. Stores metadata in Data Lake Formation (DLF) using your account ID as the DLF catalog ID. To associate this cluster with a different catalog, click Create Catalog, enter a catalog ID, click OK, then select the new catalog from the DLF Catalog drop-down list. |
| Advanced settings | No | Off by default. Enable Custom Software Configuration to customize component parameters (Hadoop, Spark, Hive) using a JSON file. |
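
As a rough illustration of what a Custom Software Configuration file might contain, the Python sketch below builds a JSON array of parameter overrides. Everything specific in it is an assumption rather than something stated on this page: the field names (ApplicationName, ConfigFileName, ConfigItemKey, ConfigItemValue), the file name fe.conf, and the example parameter and value. Check the format that the console expects before using a file like this.

```python
import json


def build_override(application, config_file, key, value):
    """Return one override entry.

    The field names are assumed for illustration; this page does not
    document the JSON schema.
    """
    return {
        "ApplicationName": application,
        "ConfigFileName": config_file,
        "ConfigItemKey": key,
        "ConfigItemValue": str(value),  # values are written as strings
    }


# Hypothetical override: file name, key, and value are placeholders.
overrides = [build_override("STARROCKS3", "fe.conf", "qe_max_connection", 2048)]

custom_config_json = json.dumps(overrides, indent=2)
print(custom_config_json)
```

Generating the file from code rather than writing it by hand makes it easier to keep values quoted consistently and to review overrides before the cluster is created, which matters because most settings cannot be changed afterward.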

Step 3: Configure hardware parameters

| Parameter | Required | Description |
| --- | --- | --- |
| Billing method | Yes | Subscription is selected by default. Use Pay-as-you-go for short-term tests or dynamically scheduled jobs: charges are based on actual hours used, billed at the top of each hour. Use Subscription (pay before use) for production workloads. |
| Zone | Yes | The zone within the selected region. Zones in the same region are connected via an internal network. The default selection works in most cases. |
| VPC | Yes | An existing VPC is selected by default. To use a different VPC, create one in the VPC console. |
| vSwitch | Yes | Select a vSwitch in the target zone. If none is available, create one in the VPC console. |
| Default security group | Yes | An existing security group is selected by default. To create a new one, click create a new security group to open the Elastic Compute Service (ECS) console. For details, see Create a security group and Overview. Important: Do not use an advanced security group created in the ECS console. |
| Node group | Yes | Configure the node groups for the cluster. See Node group settings below. |

Node group settings

EMR clusters support three node group types:

  • Master node group: Runs control processes (ResourceManager, NameNode). One master node is configured by default. When High Service Availability is enabled, multiple master nodes can be configured, and they are automatically added to a deployment set to distribute ECS instances across physical servers.

  • Core node group: Stores all cluster data. Two core nodes are configured by default. Add more core nodes after creation based on your workload.

  • Task node group: Provides additional compute capacity with no local data storage. Not configured by default. Supports Pay-as-you-go, Preemptible Instance, and Subscription billing.

For each node group, configure the following:

| Setting | Options | Notes |
| --- | --- | --- |
| System disk | Standard SSD, enhanced SSD, ultra disk | Enhanced SSDs support performance levels PL0, PL1, and PL2. |
| Data disk | Standard SSD, enhanced SSD, ultra disk | Enhanced SSDs support performance levels PL0, PL1, PL2, and PL3. Default performance level: PL1. |
| Additional security group | Up to 2 security groups | Allows interactions with external resources and applications. |
| Assign Public Network IP | Off by default | Assigns an Elastic IP address (EIP) to the cluster. Available for DataLake cluster node groups only. If not enabled and you later need internet access, apply for an EIP on ECS. See Apply for EIPs. |

For guidance on choosing instance types, see Instance families.

Step 4: Configure basic parameters

Configure parameters in the Basic Configuration step.

| Parameter | Required | Description |
| --- | --- | --- |
| Cluster name | Yes | 1–64 characters. Accepts letters, digits, hyphens (-), and underscores (_). This is the only parameter you can modify after cluster creation. |
| Identity credentials | Yes | Key Pair (default): SSH key pairs for logging on to Linux instances. See Overview. Password: password for logging on to the master node. Must be 8–30 characters and include uppercase letters, lowercase letters, digits, and special characters (! @ # $ % ^ & *). |
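
The naming and password rules above can be checked before you open the wizard. The following Python sketch is one possible reading of those rules, not the console's own validation logic. It assumes that "letters" means ASCII letters and that a password may contain characters beyond the required classes; the function names are hypothetical.

```python
import re
import string

# 1-64 characters: ASCII letters, digits, hyphens, underscores (assumed reading).
CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
SPECIAL_CHARS = "!@#$%^&*"


def is_valid_cluster_name(name: str) -> bool:
    """Check the cluster name length and character set."""
    return CLUSTER_NAME_RE.fullmatch(name) is not None


def is_valid_password(password: str) -> bool:
    """Check length (8-30) and that all four character classes are present."""
    if not 8 <= len(password) <= 30:
        return False
    return (
        any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.digits for c in password)
        and any(c in SPECIAL_CHARS for c in password)
    )


print(is_valid_cluster_name("starrocks-prod_01"))
print(is_valid_password("Abcdef1!"))
```

The same name check applies to cluster template names, which follow the same length and character rules.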

(Optional) Advanced settings:

| Parameter | Description |
| --- | --- |
| ECS Application Role | Assigns an application role to the cluster. EMR uses this role to request temporary AccessKey credentials when accessing other Alibaba Cloud services (such as OSS), so you do not need to enter credentials manually. |
| Bootstrap actions | Runs custom scripts before the cluster starts. Use bootstrap actions to install software or modify the runtime environment. See Use bootstrap actions to execute scripts. |
| Release protection | Prevents accidental release of pay-as-you-go clusters. Disable release protection before releasing the cluster. See Enable and disable release protection. |
| Tags | Labels for identifying and managing cluster resources. Tags can also be added on the Basic Information tab after creation. See Manage and use tags. |
| Resource group | Groups resources by usage, permissions, or ownership. See Use resource groups. |
| Data Disk Encryption | Available only at cluster creation time. Encrypts both data in transit and data at rest on the disk. See Enable data disk encryption. |
| System Disk Encryption | Available only at cluster creation time. Encrypts the operating system, program files, and system data on the system disk. See Enable system disk encryption. |
| Remarks | Free-text notes about the cluster. Editable on the Basic Information tab after creation. |

Step 5: (Optional) Save as a cluster template

This option is available only when Key Pair is selected as the identity credential.

  1. Click Save as Cluster Template.

  2. In the dialog box, fill in the following:

    | Parameter | Description |
    | --- | --- |
    | Cluster template name | 1–64 characters. Accepts letters, digits, hyphens (-), and underscores (_). |
    | Cluster template resource group | Select a resource group to organize templates. To create a new resource group, click Create Resource Group. See Create a resource group. |

  3. Click OK.

The template appears in the Manage Cluster Templates panel. For details on working with templates, see Create a cluster template.

Step 6: Confirm and verify

  1. Click Confirm.

  2. Refresh the page to monitor progress. The cluster is ready when Status shows Running.
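
If you prefer to script the wait instead of refreshing the page, the check can be sketched as a simple polling loop. The example below is a generic pattern, not an EMR API call: get_status is a hypothetical callable that you would supply (for example, a wrapper around whatever SDK or CLI you use), and the only status value taken from this page is Running. The "Starting" value in the demo is a placeholder.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Running" or the timeout elapses.

    Returns True when the cluster reports Running, False on timeout.
    """
    waited = 0.0
    while True:
        if get_status() == "Running":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demo with a fake status source and a no-op sleep so it finishes instantly.
fake_statuses = iter(["Starting", "Starting", "Running"])
ready = wait_until_running(
    lambda: next(fake_statuses), interval_s=1, timeout_s=10, sleep=lambda s: None
)
print(ready)
```

Passing the status source and the sleep function as parameters keeps the loop independent of any particular client library and makes it easy to test.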

FAQ

How are Frontend (FE) and Backend (BE) nodes distributed across master and core nodes?

FE nodes run on master nodes. With the default single master node, one FE is deployed. When High Service Availability is enabled, three master nodes are deployed by default, each running one FE, which provides fault tolerance and load balancing.

BE nodes run on core nodes, one BE per core node by default. The number of BEs scales with the number of core nodes you configure.
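
The default placement described in this answer can be summarized as a small calculation. The Python sketch below encodes only the defaults stated on this page (one master node, or three with High Service Availability; one FE per master node; one BE per core node; two core nodes by default). The function name and return shape are hypothetical, and it does not model a customized number of master nodes.

```python
def starrocks_node_layout(high_availability: bool, core_nodes: int = 2) -> dict:
    """Return default FE/BE counts for a given cluster shape."""
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    # Default master count: 3 with High Service Availability, otherwise 1.
    master_nodes = 3 if high_availability else 1
    return {
        "master_nodes": master_nodes,
        "fe_nodes": master_nodes,  # one FE per master node
        "be_nodes": core_nodes,    # one BE per core node
    }


print(starrocks_node_layout(high_availability=False))
print(starrocks_node_layout(high_availability=True, core_nodes=5))
```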