
E-MapReduce:Call the CreateCluster operation to create a cluster

Last Updated: Mar 17, 2025

You can call the CreateCluster operation to create an E-MapReduce (EMR) cluster. The operation requires a large number of parameters, of which Applications and ApplicationConfigs are the most complex and important. This topic describes how to configure the core parameters when you create an EMR cluster by calling this operation.

RegionId

The region ID. The following lists map the region names displayed in the EMR console to their region IDs.

  • Regions in China

    China (Hangzhou): cn-hangzhou
    China (Shanghai): cn-shanghai
    China (Qingdao): cn-qingdao
    China (Beijing): cn-beijing
    China (Zhangjiakou): cn-zhangjiakou
    China (Hohhot): cn-huhehaote
    China (Ulanqab): cn-wulanchabu
    China (Shenzhen): cn-shenzhen
    China (Chengdu): cn-chengdu
    China (Hong Kong): cn-hongkong
    China North 2 Ali Gov 1: cn-north-2-gov-1

  • Regions outside China

    Japan (Tokyo): ap-northeast-1
    Singapore: ap-southeast-1
    Malaysia (Kuala Lumpur): ap-southeast-3
    Indonesia (Jakarta): ap-southeast-5
    Germany (Frankfurt): eu-central-1
    UK (London): eu-west-1
    US (Silicon Valley): us-west-1
    US (Virginia): us-east-1
    UAE (Dubai): me-east-1
    SAU (Riyadh): me-central-1

Example: cn-hangzhou.
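For quick reference, the lists above can be collapsed into lookup dictionaries. A minimal sketch (the names and IDs are taken verbatim from the lists; the helper function is illustrative and not part of the API):

```python
# Map console region names to region IDs, as listed above.
CHINA_REGIONS = {
    "China (Hangzhou)": "cn-hangzhou",
    "China (Shanghai)": "cn-shanghai",
    "China (Qingdao)": "cn-qingdao",
    "China (Beijing)": "cn-beijing",
    "China (Zhangjiakou)": "cn-zhangjiakou",
    "China (Hohhot)": "cn-huhehaote",
    "China (Ulanqab)": "cn-wulanchabu",
    "China (Shenzhen)": "cn-shenzhen",
    "China (Chengdu)": "cn-chengdu",
    "China (Hong Kong)": "cn-hongkong",
    "China North 2 Ali Gov 1": "cn-north-2-gov-1",
}

INTERNATIONAL_REGIONS = {
    "Japan (Tokyo)": "ap-northeast-1",
    "Singapore": "ap-southeast-1",
    "Malaysia (Kuala Lumpur)": "ap-southeast-3",
    "Indonesia (Jakarta)": "ap-southeast-5",
    "Germany (Frankfurt)": "eu-central-1",
    "UK (London)": "eu-west-1",
    "US (Silicon Valley)": "us-west-1",
    "US (Virginia)": "us-east-1",
    "UAE (Dubai)": "me-east-1",
    "SAU (Riyadh)": "me-central-1",
}

def region_id(name: str) -> str:
    """Return the RegionId for a console region name; raise KeyError if unknown."""
    return {**CHINA_REGIONS, **INTERNATIONAL_REGIONS}[name]
```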

ResourceGroupId

Optional. The resource group ID. Example: rg-acfmzabjyop****.

PaymentType

The billing method of the cluster. Valid values:

  • PayAsYouGo

  • Subscription

The following figure shows the corresponding configuration in the EMR console.


Example: PayAsYouGo.

SubscriptionConfig

This parameter takes effect only if you set the PaymentType parameter to Subscription. For more information, see SubscriptionConfig.

The following figure shows the corresponding configuration in the EMR console.


ClusterType

The cluster type. Valid values:

  • DATALAKE: DataLake cluster.

  • OLAP: online analytical processing (OLAP) cluster.

  • DATAFLOW: Dataflow cluster.

  • DATASERVING: DataServing cluster.

  • CUSTOM: custom cluster.

  • HADOOP: Hadoop cluster in the original data lake scenario. We recommend that you set this parameter to DATALAKE rather than HADOOP.

The following figure shows the corresponding configuration in the EMR console.


Example: DATALAKE.

ReleaseVersion

The EMR version. You can query available EMR versions in the EMR console or by calling the ListReleaseVersions operation.

The following figure shows the corresponding configuration in the EMR console.

Example: EMR-5.16.0.

ClusterName

The cluster name. The name must be 1 to 128 characters in length, must start with a letter, and cannot start with http:// or https://. It can contain letters, digits, colons (:), underscores (_), periods (.), and hyphens (-).

Example: emrtest.
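The naming rules can be checked client-side before the call. The following sketch encodes the rules above as a regular expression (the regex is an illustration, not an official validator; note that a name starting with a letter can never begin with "http://" or "https://" anyway, because slashes are not allowed characters):

```python
import re

# Rules from this section: 1-128 characters, starts with a letter, and
# contains only letters, digits, colons, underscores, periods, and hyphens.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9:_.\-]{0,127}$")

def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the ClusterName rules above."""
    return bool(_NAME_RE.fullmatch(name))
```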

DeployMode

The deployment mode of master nodes in the cluster. Valid values:

  • NORMAL: regular mode. A cluster that contains only one master node is created.

  • HA: high availability (HA) mode. A cluster that contains three master nodes is created.

The following figure shows the configuration for enabling HA mode in the EMR console.

Example: NORMAL.

SecurityMode

The security mode of the cluster. Valid values:

  • NORMAL: disables Kerberos authentication for the cluster.

  • KERBEROS: enables Kerberos authentication for the cluster.

The following figure shows the configuration for enabling Kerberos authentication in the EMR console.

Example: NORMAL.

Applications

The following figure shows the services that can be deployed in a DataLake cluster.


In EMR, some services are mutually dependent or mutually exclusive.

  • Service dependency: Service A depends on Service B. In this case, if you want to deploy Service A in your cluster, you must also deploy Service B. For example, Hive depends on YARN. If you want to deploy Hive, you must also deploy YARN.

  • Service exclusion: Service A and Service B are mutually exclusive. If you want to deploy Service A, you cannot deploy Service B. For example, Spark 2 and Spark 3 are mutually exclusive. If you want to deploy Spark 2, you cannot deploy Spark 3.

Service deployment of HA clusters

  • HDFS: depends on Hadoop-Common and ZooKeeper; mutually exclusive with OSS-HDFS.

  • OSS-HDFS: depends on Hadoop-Common; mutually exclusive with HDFS.

  • Hive: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Spark 2: depends on Hadoop-Common, YARN, Hive, and ZooKeeper; mutually exclusive with Spark 3.

  • Spark 3: depends on Hadoop-Common, YARN, Hive, ZooKeeper, and HDFS or OSS-HDFS; mutually exclusive with Spark 2.

  • Tez: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Trino: depends on Hadoop-Common.

  • Flume: depends on Hadoop-Common.

  • Kyuubi: depends on Hadoop-Common, YARN, Hive, Spark 3, ZooKeeper, and HDFS or OSS-HDFS.

  • YARN: depends on Hadoop-Common, ZooKeeper, and HDFS or OSS-HDFS.

  • Impala: depends on Hadoop-Common, YARN, Hive, ZooKeeper, and HDFS or OSS-HDFS.

  • Ranger: depends on Hadoop-Common and Ranger-plugin.

  • Presto: depends on Hadoop-Common.

  • Sqoop: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Knox: depends on OpenLDAP.

  • StarRocks 2: no dependencies; mutually exclusive with StarRocks 3.

  • StarRocks 3: no dependencies; mutually exclusive with StarRocks 2.

  • ClickHouse: depends on ZooKeeper.

  • Flink: depends on Hadoop-Common, YARN, OpenLDAP, ZooKeeper, and HDFS or OSS-HDFS.

  • HBase: depends on Hadoop-Common, HDFS or OSS-HDFS, and ZooKeeper.

  • Phoenix: depends on Hadoop-Common, HDFS or OSS-HDFS, ZooKeeper, and HBase.
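The dependency and exclusion rules can be checked client-side before you submit the Applications list. The sketch below encodes a few rows of the HA-cluster rules above (a subset, for illustration only; the real validation is performed by the API):

```python
# A subset of the HA-cluster rules, encoded as
# service -> (dependencies, mutually exclusive services).
RULES = {
    "HDFS": ({"Hadoop-Common", "ZooKeeper"}, {"OSS-HDFS"}),
    "OSS-HDFS": ({"Hadoop-Common"}, {"HDFS"}),
    "YARN": ({"Hadoop-Common", "ZooKeeper"}, set()),
    "Hive": ({"Hadoop-Common", "YARN", "ZooKeeper"}, set()),
    "Spark 2": ({"Hadoop-Common", "YARN", "Hive", "ZooKeeper"}, {"Spark 3"}),
    "Spark 3": ({"Hadoop-Common", "YARN", "Hive", "ZooKeeper"}, {"Spark 2"}),
}
# Services whose row says "HDFS or OSS-HDFS" in the dependency column:
# either storage service satisfies the dependency.
NEEDS_STORAGE = {"YARN", "Hive", "Spark 3"}

def check_services(selected):
    """Return human-readable rule violations; an empty list means OK."""
    selected = set(selected)
    problems = []
    for svc in selected & RULES.keys():
        deps, exclusive = RULES[svc]
        for missing in sorted(deps - selected):
            problems.append(f"{svc} requires {missing}")
        for conflict in sorted(exclusive & selected):
            problems.append(f"{svc} conflicts with {conflict}")
    if selected & NEEDS_STORAGE and not selected & {"HDFS", "OSS-HDFS"}:
        problems.append("deploy HDFS or OSS-HDFS")
    return problems
```

For example, selecting both Spark 2 and Spark 3 reports a conflict, and selecting Hive without YARN reports a missing dependency.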

Service deployment of non-HA clusters

If you set the DeployMode parameter to NORMAL, the cluster is a non-HA cluster. The following list describes the dependencies and exclusions between services.

  • HDFS: depends on Hadoop-Common; mutually exclusive with OSS-HDFS.

  • OSS-HDFS: depends on Hadoop-Common; mutually exclusive with HDFS.

  • Hive: depends on Hadoop-Common and YARN.

  • Spark 2: depends on Hadoop-Common, YARN, and Hive; mutually exclusive with Spark 3.

  • Spark 3: depends on Hadoop-Common, YARN, and Hive; mutually exclusive with Spark 2.

  • Tez: depends on Hadoop-Common, YARN, and HDFS or OSS-HDFS.

  • Trino: depends on Hadoop-Common.

  • Flume: depends on Hadoop-Common.

  • Kyuubi: depends on Hadoop-Common, YARN, Hive, Spark 3, and ZooKeeper.

  • YARN: depends on Hadoop-Common.

  • Impala: depends on Hadoop-Common, YARN, and Hive.

  • Ranger: depends on Hadoop-Common and Ranger-plugin.

  • Presto: depends on Hadoop-Common.

  • Sqoop: depends on Hadoop-Common and YARN.

  • Knox: depends on OpenLDAP.

  • StarRocks 2: no dependencies; mutually exclusive with StarRocks 3.

  • StarRocks 3: no dependencies; mutually exclusive with StarRocks 2.

  • ClickHouse: depends on ZooKeeper.

  • Flink: depends on Hadoop-Common, YARN, and OpenLDAP.

  • HBase: depends on Hadoop-Common, HDFS or OSS-HDFS, and ZooKeeper.

  • Phoenix: depends on Hadoop-Common, HDFS or OSS-HDFS, ZooKeeper, and HBase.

ApplicationConfigs

The following sections describe the required ApplicationConfigs settings for different types of clusters. In each scenario, the JSON snippet shows the required configuration and is followed by a description of its values.

Note

Replace ${Parameter name} with the actual value based on your business requirements.

DataLake cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: Data Lake Formation (DLF) is used to store metadata.

In EMR V3.43.0 and later V3.X.X versions, and in EMR V5.9.0 and later V5.X.X versions:

[ {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]

In EMR V3.42.0 and earlier V3.X.X versions, and in EMR V5.8.0 and earlier V5.X.X versions:

[ {
"ApplicationName":"HIVE",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]

  • hive.metastore.type: the metadata storage type. The value of this parameter is DLF, which corresponds to DLF Unified Metadata of the Metadata parameter in the EMR console.


  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.


Scenario: ApsaraDB RDS is used to store metadata.

[{
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"USER_RDS",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"USER_RDS",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionURL",
"ConfigItemValue":"${dbURL}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionUserName",
"ConfigItemValue":"${dbUser}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionPassword",
"ConfigItemValue":"${dbPassword}",
"ConfigScope":"CLUSTER"
}]

  • hive.metastore.type: the metadata storage type. The value of this parameter is USER_RDS, which corresponds to Self-managed RDS of the Metadata parameter in the EMR console.


  • ${dbURL}: the ApsaraDB RDS address. Example: jdbc:mysql://rm-bp1qg11xjszt3x3****.mysql.rds.aliyuncs.com/hivemeta.

  • ${dbUser}: the username of the ApsaraDB RDS user.

  • ${dbPassword}: the password of the ApsaraDB RDS user.

Scenario: Built-in MySQL is used to store metadata.

[{
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"LOCAL",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"LOCAL",
"ConfigScope":"CLUSTER"
}]

hive.metastore.type: the metadata storage type. The value of this parameter is LOCAL, which corresponds to Built-in MySQL of the Metadata parameter in the EMR console.

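Assembling these entries by hand is error-prone. The following sketch generates the DLF configuration in the EMR V3.43.0+/V5.9.0+ layout shown above; the helper name is illustrative, and the catalog ID is left as the placeholder from this topic:

```python
import json

def dlf_application_configs(catalog_id):
    """Build the ApplicationConfigs entries for DLF metadata storage,
    following the EMR V3.43.0+/V5.9.0+ layout shown above."""
    entries = []
    # HIVE reads hivemetastore-site.xml; SPARK3 reads hive-site.xml.
    for app, config_file in (("HIVE", "hivemetastore-site.xml"),
                             ("SPARK3", "hive-site.xml")):
        for key, value in (("hive.metastore.type", "DLF"),
                           ("dlf.catalog.id", catalog_id)):
            entries.append({
                "ApplicationName": app,
                "ConfigFileName": config_file,
                "ConfigItemKey": key,
                "ConfigItemValue": value,
                "ConfigScope": "CLUSTER",
            })
    return entries

# The ApplicationConfigs parameter is passed as a JSON array.
payload = json.dumps(dlf_application_configs("${dlf.catalog.id}"))
```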

OLAP cluster


Scenario: ClickHouse is deployed.

[
    {
        "ApplicationName": "CLICKHOUSE",
        "ConfigFileName": "cluster-info",
        "ConfigItemKey": "replica",
        "ConfigItemValue": "3",
        "ConfigScope": "CLUSTER"
    },
    {
        "ApplicationName": "CLICKHOUSE",
        "ConfigFileName": "cluster-info",
        "ConfigItemKey": "shard",
        "ConfigItemValue": "2",
        "ConfigScope": "CLUSTER"
    }
]

The replica and shard values correspond to the ClickHouse configuration on the cluster creation page in the EMR console.

Important

Make sure that the following condition is met: Number of replicas × Number of shards = Number of core nodes.
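This constraint can be asserted before the call. A trivial sketch:

```python
def clickhouse_layout_ok(replicas, shards, core_nodes):
    """EMR requirement: number of replicas x number of shards
    must equal the number of core nodes."""
    return replicas * shards == core_nodes

# Example: 3 replicas x 2 shards requires exactly 6 core nodes.
```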

Scenario: StarRocks 2 is deployed and connected to Data Lake Formation (DLF).

[{
"ApplicationName":"Starrocks2",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},
{
"ApplicationName":"Starrocks2",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to DLF Unified Metadata of the Metadata parameter in the EMR console. StarRocks 2 is automatically connected to DLF.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

Scenario: StarRocks 3 is deployed and connected to DLF.

[{
"ApplicationName":"Starrocks3",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},
{
"ApplicationName":"Starrocks3",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to the selection of the STARROCKS3 Automatically Connect with DLF check box in the EMR console.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

Dataflow cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: Flink is deployed and connected to DLF.

[{
"ApplicationName":"FLINK",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"FLINK",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to the selection of the FLINK Automatically Connect with DLF check box in the EMR console.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

DataServing cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: OSS-HDFS and HBase are deployed, and HDFS is used to store HBase HLogs.

[{
"ApplicationName":"HBASE",
"ConfigFileName":"hbase-site.xml",
"ConfigItemKey":"hbase.wal.mode",
"ConfigItemValue":"HDFS",
"ConfigScope":"CLUSTER"
}]

The value of the hbase.wal.mode parameter is HDFS, which corresponds to the selection of the Use HDFS as HBase HLog Storage check box in the EMR console.


Custom cluster

The required configurations of the ApplicationConfigs parameter for custom clusters are the same as those for other types of clusters.

NodeAttributes

Required. The node attributes, which specify the Elastic Compute Service (ECS)-related parameters of the cluster. For more information, see NodeAttributes.

  • VpcId: the virtual private cloud (VPC) ID of the cluster. Example: vpc-bp1tgey2p0ytxmdo5****.

  • ZoneId: the zone ID. Example: cn-hangzhou-h.

  • SecurityGroupId: the security group ID. Only basic security groups are supported. Example: sg-hp3abbae8lb6lmb1****.

  • RamRole: the RAM role that you assign to EMR so that ECS instances can access other Alibaba Cloud resources. Default value: AliyunECSInstanceForEMRRole.

  • KeyPairName: the name of the key pair that is used to log on to ECS instances over SSH.

  • MasterRootPassword: the initial password of the root user on the master node. The password is set when the ECS instance is created. You can use it the first time you log on and verify the identity of the root user.

NodeGroups

Required. The node groups of the cluster. For more information, see NodeGroupConfig.

BootstrapScripts

Optional. The bootstrap actions of the cluster. For more information, see Script.

Tags

Optional. The tags that you want to add to the cluster. For more information, see Tag.

ClientToken

Optional. The client token that is used to ensure the idempotence of the request. If you call the CreateCluster operation multiple times with the same ClientToken value, the responses are identical and at most one cluster is created.
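A ClientToken is typically a fresh UUID that you generate once per logical creation request and reuse on retries. The sketch below also assembles a skeleton request body from the parameters discussed in this topic; the shape of the Applications element and the application names are illustrative assumptions, so take the authoritative schema from the CreateCluster API reference:

```python
import json
import uuid

# Generate one token per logical "create this cluster" request and reuse it
# on retries, so that a retried call cannot create a second cluster.
client_token = str(uuid.uuid4())

# Skeleton request built from the parameters covered in this topic.
# The Applications shape and names below are assumptions for illustration.
request = {
    "RegionId": "cn-hangzhou",
    "PaymentType": "PayAsYouGo",
    "ClusterType": "DATALAKE",
    "ReleaseVersion": "EMR-5.16.0",
    "ClusterName": "emrtest",
    "DeployMode": "NORMAL",
    "SecurityMode": "NORMAL",
    "Applications": [{"ApplicationName": name}
                     for name in ("HADOOP-COMMON", "HDFS", "YARN", "HIVE")],
    "ClientToken": client_token,
}

print(json.dumps(request, indent=2))
```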

References

For more information about the CreateCluster operation, see CreateCluster.