Use the EMR console wizard to create a StarRocks cluster on EMR on ECS. The wizard walks you through three configuration steps — software, hardware, and basic settings — before provisioning the cluster.
After a cluster is created, you cannot modify any parameters except the cluster name. Review all settings carefully before clicking Confirm.
Prerequisites
Before you begin, ensure that you have:
A virtual private cloud (VPC) in the target region. For setup instructions, see Create and manage a VPC.
A vSwitch in the target zone within that VPC. For setup instructions, see Create and manage vSwitches.
Create a StarRocks cluster
Steps overview:
Go to the cluster creation page.
Configure software parameters.
Configure hardware parameters.
Configure basic parameters.
(Optional) Save as a cluster template.
Confirm and verify.
Step 1: Go to the cluster creation page
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
(Optional) In the top navigation bar, select the target region and a resource group.
The region cannot be changed after the cluster is created. All resource groups in your account are displayed by default.
On the EMR on ECS page, click Create Cluster.
Step 2: Configure software parameters
| Parameter | Required | Description |
|---|---|---|
| Region | Yes | The region where the cluster is created. Cannot be changed after creation. |
| Business scenario | Yes | Select Data Analytics. |
| Product version | Yes | The EMR version. The latest version (for example, EMR-5.19.0) is selected by default. |
| High Service Availability | No | Off by default. When enabled, three master nodes are deployed to ensure ResourceManager and NameNode availability. You can also modify the number of master nodes. |
| Optional services | No | Additional services to include. Select STARROCKS3 to deploy StarRocks. |
| Collect Service Operational Logs | No | On by default. Collects service logs used exclusively for cluster diagnostics. Disabling this limits EMR health checks and service-related support. To change this setting after creation, modify the Collection Status of Service Operational Logs parameter on the Basic Information tab. For details, see How do I stop collection of service operational logs?. |
| StarRocks architecture | No | Available only when STARROCKS3 is selected. Choose based on your workload. Shared-nothing (default): compute and storage are integrated, and data is stored on the local disks of backend nodes (BEs). Best for online analytical processing (OLAP), real-time analytics, and business intelligence (BI) reports. Shared-data: compute is decoupled from storage. Compute nodes (CNs) run query tasks while data is stored in an external distributed storage system, which improves flexibility and reliability. Best for scenarios that require large-scale data storage and elastic computing. |
| DLF Unified Metadata | No | Selected by default. Stores metadata in Data Lake Formation (DLF) using your account ID as the DLF catalog ID. To associate this cluster with a different catalog, click Create Catalog, enter a catalog ID, click OK, then select the new catalog from the DLF Catalog drop-down list. |
| Advanced settings | No | Off by default. Enable Custom Software Configuration to customize component parameters (Hadoop, Spark, Hive) using a JSON file. |
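The JSON file for Custom Software Configuration can be generated with a script instead of being written by hand. The following is a minimal sketch only: the field names (`ApplicationName`, `ConfigFileName`, `ConfigItemKey`, `ConfigItemValue`) and the sample services, files, and keys are assumptions for illustration, so check them against the format the console expects before use.

```python
import json

# Hypothetical overrides: (service, config file, key, value).
# The services, files, and keys below are illustrative assumptions.
overrides = [
    ("YARN", "yarn-site", "yarn.nodemanager.resource.memory-mb", "8192"),
    ("HDFS", "hdfs-site", "dfs.replication", "2"),
]


def build_custom_config(items):
    """Return a JSON string with one object per overridden parameter."""
    entries = [
        {
            # Assumed field names; verify against the console's schema.
            "ApplicationName": app,
            "ConfigFileName": file_name,
            "ConfigItemKey": key,
            "ConfigItemValue": value,
        }
        for app, file_name, key, value in items
    ]
    return json.dumps(entries, indent=2)


config_json = build_custom_config(overrides)
print(config_json)
```

Keeping the overrides in a plain list makes it easy to review them before pasting the generated JSON into the wizard.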
Step 3: Configure hardware parameters
| Parameter | Required | Description |
|---|---|---|
| Billing method | Yes | Subscription is selected by default. Use Pay-as-you-go for short-term tests or dynamically scheduled jobs — charges are based on actual hours used, billed at the top of each hour. Use Subscription (pay before use) for production workloads. |
| Zone | Yes | The zone within the selected region. Zones in the same region are connected via an internal network. The default selection works in most cases. |
| VPC | Yes | An existing VPC is selected by default. To use a different VPC, create one in the VPC console. |
| vSwitch | Yes | Select a vSwitch in the target zone. If none is available, create one in the VPC console. |
| Default security group | Yes | An existing security group is selected by default. To create a new one, click create a new security group to open the Elastic Compute Service (ECS) console. For details, see Create a security group and Overview. Important: Do not use an advanced security group created in the ECS console. |
| Node group | Yes | Configure the node groups for the cluster. See Node group settings below. |
Node group settings
EMR clusters support three node group types:
Master node group: Runs control processes (ResourceManager, NameNode). One master node is configured by default. When High Service Availability is enabled, multiple master nodes can be configured, and they are automatically added to a deployment set to distribute ECS instances across physical servers.
Core node group: Stores all cluster data. Two core nodes are configured by default. Add more core nodes after creation based on your workload.
Task node group: Provides additional compute capacity with no local data storage. Not configured by default. Supports Pay-as-you-go, Preemptible Instance, and Subscription billing.
For each node group, configure the following:
| Setting | Options | Notes |
|---|---|---|
| System disk | Standard SSD, enhanced SSD, ultra disk | Enhanced SSDs support performance levels PL0, PL1, and PL2. |
| Data disk | Standard SSD, enhanced SSD, ultra disk | Enhanced SSDs support performance levels PL0, PL1, PL2, and PL3. Default performance level: PL1. |
| Additional security group | Up to 2 security groups | Allows interactions with external resources and applications. |
| Assign Public Network IP | Off by default | Assigns an Elastic IP address (EIP) to the cluster. Available for DataLake cluster node groups only. If not enabled and you later need internet access, apply for an EIP on ECS. See Apply for EIPs. |
For guidance on choosing instance types, see Instance families.
Step 4: Configure basic parameters
Configure parameters in the Basic Configuration step.
| Parameter | Required | Description |
|---|---|---|
| Cluster name | Yes | 1–64 characters. Accepts letters, digits, hyphens (-), and underscores (_). This is the only parameter you can modify after cluster creation. |
| Identity credentials | Yes | Key Pair (default): SSH key pairs for logging on to Linux instances. See Overview. Password: password for logging on to the master node. Must be 8–30 characters and include uppercase letters, lowercase letters, digits, and special characters (! @ # $ % ^ & *). |
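The naming and password rules in the table above can be checked locally before you fill in the wizard. This is a rough sketch, not the console's own validation: it assumes "letters" means ASCII letters, and it only checks that a password contains each required character class, without rejecting other characters. The function names are hypothetical.

```python
import re
import string

# Assumption: "letters" means ASCII letters; the console may accept more.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
SPECIALS = "!@#$%^&*"


def is_valid_cluster_name(name: str) -> bool:
    """1-64 characters: letters, digits, hyphens, underscores."""
    return NAME_RE.fullmatch(name) is not None


def is_valid_password(password: str) -> bool:
    """8-30 characters with upper, lower, digit, and special characters."""
    if not 8 <= len(password) <= 30:
        return False
    return (
        any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.digits for c in password)
        and any(c in SPECIALS for c in password)
    )


print(is_valid_cluster_name("starrocks-prod_01"))
print(is_valid_password("Str0ng!Pass"))
```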
(Optional) Advanced settings:
| Parameter | Description |
|---|---|
| ECS Application Role | Assigns an application role to the cluster. EMR uses this role to request temporary AccessKey credentials when accessing other Alibaba Cloud services (such as OSS), so you do not need to enter credentials manually. |
| Bootstrap actions | Runs custom scripts before the cluster starts. Use bootstrap actions to install software or modify the runtime environment. See Use bootstrap actions to execute scripts. |
| Release protection | Prevents accidental release of pay-as-you-go clusters. Disable release protection before releasing the cluster. See Enable and disable release protection. |
| Tags | Labels for identifying and managing cluster resources. Tags can also be added on the Basic Information tab after creation. See Manage and use tags. |
| Resource group | Groups resources by usage, permissions, or ownership. See Use resource groups. |
| Data Disk Encryption | Available only at cluster creation time. Encrypts both data in transit and data at rest on the disk. See Enable data disk encryption. |
| System Disk Encryption | Available only at cluster creation time. Encrypts the operating system, program files, and system data on the system disk. See Enable system disk encryption. |
| Remarks | Free-text notes about the cluster. Editable on the Basic Information tab after creation. |
Step 5: (Optional) Save as a cluster template
This option is available only when Key Pair is selected as the identity credential.
Click Save as Cluster Template.
In the dialog box, fill in the following:
| Parameter | Description |
|---|---|
| Cluster template name | 1–64 characters. Accepts letters, digits, hyphens (-), and underscores (_). |
| Cluster template resource group | Select a resource group to organize templates. To create a new resource group, click Create Resource Group. See Create a resource group. |
Click OK.
The template appears in the Manage Cluster Templates panel. For details on working with templates, see Create a cluster template.
Step 6: Confirm and verify
Click Confirm.
Refresh the page to monitor progress. The cluster is ready when Status shows Running.
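If you track cluster creation from a script rather than by refreshing the console, the same check can be expressed as a polling loop. The sketch below is generic and makes no API calls itself: `get_status` is a hypothetical callable that you would supply (for example, a wrapper around whatever SDK or CLI you use), and every status string other than `Running` is an assumption.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Running" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage with a stand-in status source ("Starting" is an assumed value).
statuses = iter(["Starting", "Starting", "Running"])
ready = wait_until_running(lambda: next(statuses), interval_s=0, timeout_s=5)
print(ready)
```

Injecting `get_status` and `sleep` keeps the loop testable without a real cluster.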
FAQ
How are Frontend (FE) and Backend (BE) nodes distributed across master and core nodes?
FE nodes run on master nodes. With the default single master node, one FE is deployed. When High Service Availability is enabled, three master nodes are deployed by default — each running one FE — providing fault tolerance and load balancing.
BE nodes run on core nodes, one BE per core node by default. The number of BEs scales with the number of core nodes you configure.
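The distribution described above amounts to simple arithmetic, sketched here for planning purposes. The function name and parameters are hypothetical. The defaults (one master node, three with High Service Availability, two core nodes) are taken from this page, and the sketch assumes one FE per master node and one BE per core node.

```python
from typing import Dict, Optional


def starrocks_layout(
    core_nodes: int = 2,
    high_availability: bool = False,
    master_nodes: Optional[int] = None,
) -> Dict[str, int]:
    """Return the expected FE and BE counts for a cluster layout."""
    if master_nodes is None:
        # Defaults from this page: 1 master node, or 3 with HA enabled.
        master_nodes = 3 if high_availability else 1
    if master_nodes < 1 or core_nodes < 0:
        raise ValueError("node counts must be positive")
    # One FE per master node, one BE per core node.
    return {"fe": master_nodes, "be": core_nodes}


print(starrocks_layout())
print(starrocks_layout(core_nodes=5, high_availability=True))
```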