
E-MapReduce:Call the CreateCluster operation to create a cluster

Last Updated: Mar 17, 2025

You can call the CreateCluster operation to create an E-MapReduce (EMR) cluster. The operation requires a large number of parameters, of which Applications and ApplicationConfigs are the most complex and important. This topic describes how to configure the core parameters when you create an EMR cluster by calling this operation.

RegionId

The region ID. The following lists map the region names displayed in the EMR console to their region IDs.

  • Regions in China

    China (Hangzhou): cn-hangzhou
    China (Shanghai): cn-shanghai
    China (Qingdao): cn-qingdao
    China (Beijing): cn-beijing
    China (Zhangjiakou): cn-zhangjiakou
    China (Hohhot): cn-huhehaote
    China (Ulanqab): cn-wulanchabu
    China (Shenzhen): cn-shenzhen
    China (Chengdu): cn-chengdu
    China (Hong Kong): cn-hongkong
    China North 2 Ali Gov 1: cn-north-2-gov-1

  • Regions outside China

    Japan (Tokyo): ap-northeast-1
    Singapore: ap-southeast-1
    Malaysia (Kuala Lumpur): ap-southeast-3
    Indonesia (Jakarta): ap-southeast-5
    Germany (Frankfurt): eu-central-1
    UK (London): eu-west-1
    US (Silicon Valley): us-west-1
    US (Virginia): us-east-1
    UAE (Dubai): me-east-1
    SAU (Riyadh): me-central-1

Example: cn-hangzhou.
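For quick reference, the lists above can be collapsed into lookup dictionaries. A minimal sketch (the names and IDs are taken verbatim from the lists; the helper function is illustrative and not part of the API):

```python
# Map console region names to region IDs, as listed above.
CHINA_REGIONS = {
    "China (Hangzhou)": "cn-hangzhou",
    "China (Shanghai)": "cn-shanghai",
    "China (Qingdao)": "cn-qingdao",
    "China (Beijing)": "cn-beijing",
    "China (Zhangjiakou)": "cn-zhangjiakou",
    "China (Hohhot)": "cn-huhehaote",
    "China (Ulanqab)": "cn-wulanchabu",
    "China (Shenzhen)": "cn-shenzhen",
    "China (Chengdu)": "cn-chengdu",
    "China (Hong Kong)": "cn-hongkong",
    "China North 2 Ali Gov 1": "cn-north-2-gov-1",
}

INTERNATIONAL_REGIONS = {
    "Japan (Tokyo)": "ap-northeast-1",
    "Singapore": "ap-southeast-1",
    "Malaysia (Kuala Lumpur)": "ap-southeast-3",
    "Indonesia (Jakarta)": "ap-southeast-5",
    "Germany (Frankfurt)": "eu-central-1",
    "UK (London)": "eu-west-1",
    "US (Silicon Valley)": "us-west-1",
    "US (Virginia)": "us-east-1",
    "UAE (Dubai)": "me-east-1",
    "SAU (Riyadh)": "me-central-1",
}

def region_id(name: str) -> str:
    """Return the RegionId for a console region name; raise KeyError if unknown."""
    return {**CHINA_REGIONS, **INTERNATIONAL_REGIONS}[name]
```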

ResourceGroupId

Optional. The resource group ID. Example: rg-acfmzabjyop****.

PaymentType

The billing method of the cluster. Valid values:

  • PayAsYouGo

  • Subscription

The following figure shows the corresponding configuration in the EMR console.


Example: PayAsYouGo.

SubscriptionConfig

This parameter takes effect only if you set the PaymentType parameter to Subscription. For more information, see SubscriptionConfig.

The following figure shows the corresponding configuration in the EMR console.


ClusterType

The cluster type. Valid values:

  • DATALAKE: DataLake cluster.

  • OLAP: online analytical processing (OLAP) cluster.

  • DATAFLOW: Dataflow cluster.

  • DATASERVING: DataServing cluster.

  • CUSTOM: custom cluster.

  • HADOOP: Hadoop cluster in the original data lake scenario. We recommend that you set this parameter to DATALAKE rather than HADOOP.

The following figure shows the corresponding configuration in the EMR console.


Example: DATALAKE.

ReleaseVersion

The EMR version. You can query available EMR versions in the EMR console or by calling the ListReleaseVersions operation.

The following figure shows the corresponding configuration in the EMR console.

Example: EMR-5.16.0.

ClusterName

The cluster name. The name must be 1 to 128 characters in length, must start with a letter, and cannot start with http:// or https://. It can contain letters, digits, colons (:), underscores (_), periods (.), and hyphens (-).

Example: emrtest.
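The naming rules can be checked client-side before the call. The following sketch encodes the rules above as a regular expression (the regex is an illustration, not an official validator; note that a name starting with a letter can never begin with "http://" or "https://" anyway, because slashes are not allowed characters):

```python
import re

# Rules from this section: 1-128 characters, starts with a letter, and
# contains only letters, digits, colons, underscores, periods, and hyphens.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9:_.\-]{0,127}$")

def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the ClusterName rules above."""
    return bool(_NAME_RE.fullmatch(name))
```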

DeployMode

The deployment mode of master nodes in the cluster. Valid values:

  • NORMAL: regular mode. A cluster that contains only one master node is created.

  • HA: high availability (HA) mode. A cluster that contains three master nodes is created.

The following figure shows the configuration for enabling HA mode in the EMR console.

Example: NORMAL.

SecurityMode

The security mode of the cluster. Valid values:

  • NORMAL: disables Kerberos authentication for the cluster.

  • KERBEROS: enables Kerberos authentication for the cluster.

The following figure shows the configuration for enabling Kerberos authentication in the EMR console.

Example: NORMAL.

Applications

The following figure shows the services that can be deployed in a DataLake cluster.


In EMR, some services are mutually dependent or mutually exclusive.

  • Service dependency: Service A depends on Service B. In this case, if you want to deploy Service A in your cluster, you must also deploy Service B. For example, Hive depends on YARN. If you want to deploy Hive, you must also deploy YARN.

  • Service exclusion: Service A and Service B are mutually exclusive. If you want to deploy Service A, you cannot deploy Service B. For example, Spark 2 and Spark 3 are mutually exclusive. If you want to deploy Spark 2, you cannot deploy Spark 3.

Service deployment of HA clusters

  • HDFS: depends on Hadoop-Common and ZooKeeper; mutually exclusive with OSS-HDFS.

  • OSS-HDFS: depends on Hadoop-Common; mutually exclusive with HDFS.

  • Hive: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Spark 2: depends on Hadoop-Common, YARN, Hive, and ZooKeeper; mutually exclusive with Spark 3.

  • Spark 3: depends on Hadoop-Common, YARN, Hive, ZooKeeper, and HDFS or OSS-HDFS; mutually exclusive with Spark 2.

  • Tez: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Trino: depends on Hadoop-Common.

  • Flume: depends on Hadoop-Common.

  • Kyuubi: depends on Hadoop-Common, YARN, Hive, Spark 3, ZooKeeper, and HDFS or OSS-HDFS.

  • YARN: depends on Hadoop-Common, ZooKeeper, and HDFS or OSS-HDFS.

  • Impala: depends on Hadoop-Common, YARN, Hive, ZooKeeper, and HDFS or OSS-HDFS.

  • Ranger: depends on Hadoop-Common and Ranger-plugin.

  • Presto: depends on Hadoop-Common.

  • Sqoop: depends on Hadoop-Common, YARN, ZooKeeper, and HDFS or OSS-HDFS.

  • Knox: depends on OpenLDAP.

  • StarRocks 2: no dependencies; mutually exclusive with StarRocks 3.

  • StarRocks 3: no dependencies; mutually exclusive with StarRocks 2.

  • ClickHouse: depends on ZooKeeper.

  • Flink: depends on Hadoop-Common, YARN, OpenLDAP, ZooKeeper, and HDFS or OSS-HDFS.

  • HBase: depends on Hadoop-Common, HDFS or OSS-HDFS, and ZooKeeper.

  • Phoenix: depends on Hadoop-Common, HDFS or OSS-HDFS, ZooKeeper, and HBase.
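The dependency and exclusion rules can be checked client-side before you submit the Applications list. The sketch below encodes a few rows of the HA-cluster rules above (a subset, for illustration only; the real validation is performed by the API):

```python
# A subset of the HA-cluster rules, encoded as
# service -> (dependencies, mutually exclusive services).
RULES = {
    "HDFS": ({"Hadoop-Common", "ZooKeeper"}, {"OSS-HDFS"}),
    "OSS-HDFS": ({"Hadoop-Common"}, {"HDFS"}),
    "YARN": ({"Hadoop-Common", "ZooKeeper"}, set()),
    "Hive": ({"Hadoop-Common", "YARN", "ZooKeeper"}, set()),
    "Spark 2": ({"Hadoop-Common", "YARN", "Hive", "ZooKeeper"}, {"Spark 3"}),
    "Spark 3": ({"Hadoop-Common", "YARN", "Hive", "ZooKeeper"}, {"Spark 2"}),
}
# Services whose row says "HDFS or OSS-HDFS" in the dependency column:
# either storage service satisfies the dependency.
NEEDS_STORAGE = {"YARN", "Hive", "Spark 3"}

def check_services(selected):
    """Return human-readable rule violations; an empty list means OK."""
    selected = set(selected)
    problems = []
    for svc in selected & RULES.keys():
        deps, exclusive = RULES[svc]
        for missing in sorted(deps - selected):
            problems.append(f"{svc} requires {missing}")
        for conflict in sorted(exclusive & selected):
            problems.append(f"{svc} conflicts with {conflict}")
    if selected & NEEDS_STORAGE and not selected & {"HDFS", "OSS-HDFS"}:
        problems.append("deploy HDFS or OSS-HDFS")
    return problems
```

For example, selecting both Spark 2 and Spark 3 reports a conflict, and selecting Hive without YARN reports a missing dependency.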

Service deployment of non-HA clusters

If you set the DeployMode parameter to NORMAL, the cluster is a non-HA cluster. The following list describes the dependencies and exclusions between services.

  • HDFS: depends on Hadoop-Common; mutually exclusive with OSS-HDFS.

  • OSS-HDFS: depends on Hadoop-Common; mutually exclusive with HDFS.

  • Hive: depends on Hadoop-Common and YARN.

  • Spark 2: depends on Hadoop-Common, YARN, and Hive; mutually exclusive with Spark 3.

  • Spark 3: depends on Hadoop-Common, YARN, and Hive; mutually exclusive with Spark 2.

  • Tez: depends on Hadoop-Common, YARN, and HDFS or OSS-HDFS.

  • Trino: depends on Hadoop-Common.

  • Flume: depends on Hadoop-Common.

  • Kyuubi: depends on Hadoop-Common, YARN, Hive, Spark 3, and ZooKeeper.

  • YARN: depends on Hadoop-Common.

  • Impala: depends on Hadoop-Common, YARN, and Hive.

  • Ranger: depends on Hadoop-Common and Ranger-plugin.

  • Presto: depends on Hadoop-Common.

  • Sqoop: depends on Hadoop-Common and YARN.

  • Knox: depends on OpenLDAP.

  • StarRocks 2: no dependencies; mutually exclusive with StarRocks 3.

  • StarRocks 3: no dependencies; mutually exclusive with StarRocks 2.

  • ClickHouse: depends on ZooKeeper.

  • Flink: depends on Hadoop-Common, YARN, and OpenLDAP.

  • HBase: depends on Hadoop-Common, HDFS or OSS-HDFS, and ZooKeeper.

  • Phoenix: depends on Hadoop-Common, HDFS or OSS-HDFS, ZooKeeper, and HBase.

ApplicationConfigs

The following sections describe the required ApplicationConfigs settings for different types of clusters. In each scenario, the JSON snippet shows the required configuration and is followed by a description of its values.

Note

Replace ${Parameter name} with the actual value based on your business requirements.

DataLake cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: Data Lake Formation (DLF) is used to store metadata.

In EMR V3.43.0 and later V3.X.X versions, and in EMR V5.9.0 and later V5.X.X versions:

[ {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]

In EMR V3.42.0 and earlier V3.X.X versions, and in EMR V5.8.0 and earlier V5.X.X versions:

[ {
"ApplicationName":"HIVE",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]

  • hive.metastore.type: the metadata storage type. The value of this parameter is DLF, which corresponds to DLF Unified Metadata of the Metadata parameter in the EMR console.


  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.


Scenario: ApsaraDB RDS is used to store metadata.

[{
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"USER_RDS",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"USER_RDS",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionURL",
"ConfigItemValue":"${dbURL}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionUserName",
"ConfigItemValue":"${dbUser}",
"ConfigScope":"CLUSTER"
}, {
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"javax.jdo.option.ConnectionPassword",
"ConfigItemValue":"${dbPassword}",
"ConfigScope":"CLUSTER"
}]

  • hive.metastore.type: the metadata storage type. The value of this parameter is USER_RDS, which corresponds to Self-managed RDS of the Metadata parameter in the EMR console.


  • ${dbURL}: the ApsaraDB RDS address. Example: jdbc:mysql://rm-bp1qg11xjszt3x3****.mysql.rds.aliyuncs.com/hivemeta.

  • ${dbUser}: the username of the ApsaraDB RDS user.

  • ${dbPassword}: the password of the ApsaraDB RDS user.

Scenario: Built-in MySQL is used to store metadata.

[{
"ApplicationName":"HIVE",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"LOCAL",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"SPARK3",
"ConfigFileName":"hive-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"LOCAL",
"ConfigScope":"CLUSTER"
}]

hive.metastore.type: the metadata storage type. The value of this parameter is LOCAL, which corresponds to Built-in MySQL of the Metadata parameter in the EMR console.

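Assembling these entries by hand is error-prone. The following sketch generates the DLF configuration in the EMR V3.43.0+/V5.9.0+ layout shown above; the helper name is illustrative, and the catalog ID is left as the placeholder from this topic:

```python
import json

def dlf_application_configs(catalog_id):
    """Build the ApplicationConfigs entries for DLF metadata storage,
    following the EMR V3.43.0+/V5.9.0+ layout shown above."""
    entries = []
    # HIVE reads hivemetastore-site.xml; SPARK3 reads hive-site.xml.
    for app, config_file in (("HIVE", "hivemetastore-site.xml"),
                             ("SPARK3", "hive-site.xml")):
        for key, value in (("hive.metastore.type", "DLF"),
                           ("dlf.catalog.id", catalog_id)):
            entries.append({
                "ApplicationName": app,
                "ConfigFileName": config_file,
                "ConfigItemKey": key,
                "ConfigItemValue": value,
                "ConfigScope": "CLUSTER",
            })
    return entries

# The ApplicationConfigs parameter is passed as a JSON array.
payload = json.dumps(dlf_application_configs("${dlf.catalog.id}"))
```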

OLAP cluster


Scenario: ClickHouse is deployed.

[
    {
        "ApplicationName": "CLICKHOUSE",
        "ConfigFileName": "cluster-info",
        "ConfigItemKey": "replica",
        "ConfigItemValue": "3",
        "ConfigScope": "CLUSTER"
    },
    {
        "ApplicationName": "CLICKHOUSE",
        "ConfigFileName": "cluster-info",
        "ConfigItemKey": "shard",
        "ConfigItemValue": "2",
        "ConfigScope": "CLUSTER"
    }
]

The replica and shard values correspond to the ClickHouse configuration on the cluster creation page in the EMR console.

Important

Make sure that the following condition is met: Number of replicas × Number of shards = Number of core nodes.
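This constraint can be asserted before the call. A trivial sketch:

```python
def clickhouse_layout_ok(replicas, shards, core_nodes):
    """EMR requirement: number of replicas x number of shards
    must equal the number of core nodes."""
    return replicas * shards == core_nodes

# Example: 3 replicas x 2 shards requires exactly 6 core nodes.
```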

Scenario: StarRocks 2 is deployed and connected to Data Lake Formation (DLF).

[{
"ApplicationName":"Starrocks2",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},
{
"ApplicationName":"Starrocks2",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to DLF Unified Metadata of the Metadata parameter in the EMR console. StarRocks 2 is automatically connected to DLF.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

Scenario: StarRocks 3 is deployed and connected to DLF.

[{
"ApplicationName":"Starrocks3",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},
{
"ApplicationName":"Starrocks3",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to the selection of the STARROCKS3 Automatically Connect with DLF check box in the EMR console.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

Dataflow cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: Flink is deployed and connected to DLF.

[{
"ApplicationName":"FLINK",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"hive.metastore.type",
"ConfigItemValue":"DLF",
"ConfigScope":"CLUSTER"
},{
"ApplicationName":"FLINK",
"ConfigFileName":"hivemetastore-site.xml",
"ConfigItemKey":"dlf.catalog.id",
"ConfigItemValue":"${dlf.catalog.id}",
"ConfigScope":"CLUSTER"
}]
  • The value of the hive.metastore.type parameter is DLF, which corresponds to the selection of the FLINK Automatically Connect with DLF check box in the EMR console.

  • ${dlf.catalog.id} corresponds to the DLF Catalog parameter in the EMR console.

DataServing cluster


Scenario: OSS-HDFS is deployed.

[{
"ApplicationName":"OSS-HDFS",
"ConfigFileName":"common.conf",
"ConfigItemKey":"OSS_ROOT_URI",
"ConfigItemValue":"${OSS_ROOT_URI}",
"ConfigScope":"CLUSTER"
}]

${OSS_ROOT_URI} corresponds to the Root Storage Directory of Cluster parameter in the EMR console. Example: oss://emr-apitest****.cn-hangzhou.oss-dls.aliyuncs.com/. The following figure shows the corresponding configuration in the EMR console.


Scenario: OSS-HDFS and HBase are deployed, and HDFS is used to store HBase HLogs.

[{
"ApplicationName":"HBASE",
"ConfigFileName":"hbase-site.xml",
"ConfigItemKey":"hbase.wal.mode",
"ConfigItemValue":"HDFS",
"ConfigScope":"CLUSTER"
}]

The value of the hbase.wal.mode parameter is HDFS, which corresponds to the selection of the Use HDFS as HBase HLog Storage check box in the EMR console.


Custom cluster

The required configurations of the ApplicationConfigs parameter for custom clusters are the same as those for other types of clusters.

NodeAttributes

Required. The node attributes, which specify the Elastic Compute Service (ECS)-related parameters of the cluster. For more information, see NodeAttributes.

  • VpcId: the virtual private cloud (VPC) ID of the cluster. Example: vpc-bp1tgey2p0ytxmdo5****.

  • ZoneId: the zone ID. Example: cn-hangzhou-h.

  • SecurityGroupId: the security group ID. Only basic security groups are supported. Example: sg-hp3abbae8lb6lmb1****.

  • RamRole: the RAM role that you assign to EMR so that ECS instances can access other Alibaba Cloud resources. Default value: AliyunECSInstanceForEMRRole.

  • KeyPairName: the name of the key pair that is used to log on to ECS instances over SSH.

  • MasterRootPassword: the initial password of the root user on the master node. The password is set when the ECS instance is created. You can use it the first time you log on and verify the identity of the root user.

NodeGroups

Required. The node groups of the cluster. For more information, see NodeGroupConfig.

BootstrapScripts

Optional. The bootstrap actions of the cluster. For more information, see Script.

Tags

Optional. The tags that you want to add to the cluster. For more information, see Tag.

ClientToken

Optional. The client token that is used to ensure the idempotence of the request. If you call the CreateCluster operation multiple times with the same ClientToken value, the responses are identical and at most one cluster is created.
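A ClientToken is typically a fresh UUID that you generate once per logical creation request and reuse on retries. The sketch below also assembles a skeleton request body from the parameters discussed in this topic; the shape of the Applications element and the application names are illustrative assumptions, so take the authoritative schema from the CreateCluster API reference:

```python
import json
import uuid

# Generate one token per logical "create this cluster" request and reuse it
# on retries, so that a retried call cannot create a second cluster.
client_token = str(uuid.uuid4())

# Skeleton request built from the parameters covered in this topic.
# The Applications shape and names below are assumptions for illustration.
request = {
    "RegionId": "cn-hangzhou",
    "PaymentType": "PayAsYouGo",
    "ClusterType": "DATALAKE",
    "ReleaseVersion": "EMR-5.16.0",
    "ClusterName": "emrtest",
    "DeployMode": "NORMAL",
    "SecurityMode": "NORMAL",
    "Applications": [{"ApplicationName": name}
                     for name in ("HADOOP-COMMON", "HDFS", "YARN", "HIVE")],
    "ClientToken": client_token,
}

print(json.dumps(request, indent=2))
```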

References

For more information about the CreateCluster operation, see CreateCluster.