Create and manage Amazon EMR clusters

Permissions

Super administrators, system administrators, and users with custom global roles that include the Amazon EMR Cluster - Management permission can create and manage Amazon EMR clusters. These users can also specify which users can reference the cluster when an Amazon EMR compute source is created and assign cluster administrators.
Cluster administrators can manage the clusters for which they are responsible.
Users with a custom global role that includes the Compute Source Management - Create permission can select the Amazon EMR clusters that they are authorized to use when they create an Amazon EMR compute source.

Create an Amazon EMR cluster

In the top menu bar of the Dataphin home page, select Planning > Compute Source.
On the Compute Source page, click Manage Amazon EMR Cluster.
In the Manage Amazon EMR Cluster dialog box, click +Create Amazon EMR Cluster.

On the Create Amazon EMR Cluster page, configure the following parameters.

Basic information

Parameter	Description
Cluster Name	Enter a name for the cluster. The name can contain Chinese characters, letters, digits, underscores (_), and hyphens (-), and cannot exceed 128 characters.
Cluster Administrator	Select one or more tenant members as cluster administrators. Cluster administrators can edit the cluster, view historical versions, and delete the cluster.
Description (optional)	Enter a brief description for the cluster. The description cannot exceed 128 characters.
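
The naming rules above can be checked locally before you fill in the form. The following is a minimal sketch, not Dataphin's own validation: the function names are hypothetical, and "Chinese characters" is approximated by the CJK Unified Ideographs block, which is an assumption.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF). Dataphin's actual check may differ.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,128}")
MAX_DESCRIPTION_LENGTH = 128


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and has 1 to 128 of them."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the optional description has at most 128 characters."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```

For example, `is_valid_cluster_name("emr_prod-01")` passes, while a name that contains a space is rejected.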

Cluster security control

Available members: Specify which users can reference the cluster when creating a compute source. Select Roles With "Create Compute Source" Permission or Specified Users.
- Roles With "Create Compute Source" Permission: Selected by default.
- Specified Users: Select one or more personal accounts and user groups.

Cluster configuration

Parameter	Description
Primary Node Public DNS	The public DNS is used to retrieve the VPC private DNS. Both Hive and Spark connect through the private DNS. The format is `ec2-{public_ip}.{region}.compute.amazonaws.com`.
Key File (*.pem)	The key pair for accessing the primary EC2 node. Use the key pair that was configured when the EMR cluster was created.
core-site.xml, yarn-site.xml, hive-site.xml, hdfs-site.xml	Upload the cluster configuration files, or click Get Cluster Configuration to download them from the primary node. Before you download the configuration files, you must enter the public DNS and upload the key file.
Cluster Storage	Currently, only HDFS is supported.
Metadata Retrieval Method	Select HMS or Amazon Glue. HMS is selected by default. If you select Amazon Glue, you must also configure the following parameters. Glue Region Code: the region code for Amazon Glue, such as ap-northeast-3, us-east-1, or us-west-1. Glue AccessKey ID and Glue AccessKey Secret: the AccessKey ID and AccessKey secret used to access Amazon Glue.
Hive JDBC URL	Enter the JDBC connection address for Hive, or click Auto-retrieve to obtain it automatically. Auto-retrieval requires the public DNS and key file. The Hive JDBC URL format is `jdbc:hive2://host1:port1,host2:port2/`. Do not enter a database name.
Spark SQL	Select Enable or Disable. If you select Enable, you must also configure the Spark JDBC URL.
Spark JDBC URL	Enter the JDBC connection address for Spark. The format is `jdbc:hive2://host1:port1/` or `jdbc:kyuubi://host1:port1/`. Do not enter a database name. Note: This parameter is required only if Spark SQL is enabled.
Username	The username for Hive or Spark, used as the `username` in JDBC connections.
Spark Local Client	Select Enable or Disable. If you select Enable, upload the Spark client file. Note: Download the matching version of the Spark client from the official Spark website, or provide your own client. The client must follow the same directory structure as the community version and include the Hadoop client. Upload the complete compressed package in .tgz or .zip format. Dataphin uses the uploaded client to submit jobs through the scheduling cluster for full lifecycle management.
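
The address formats in the table can be checked before you paste them into the form. The sketch below is illustrative only: the function names are hypothetical, the public IP is assumed to appear with hyphens in the DNS name (for example `ec2-203-0-113-25`), and the `compute-1.amazonaws.com` branch reflects the naming AWS uses for us-east-1 rather than anything stated on this page. The trailing slash is treated as optional, which is also an assumption.

```python
import re

# Assumption: the public IP is hyphenated in the DNS name. us-east-1 names end
# in compute-1.amazonaws.com instead of {region}.compute.amazonaws.com.
_PUBLIC_DNS = re.compile(
    r"ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\."
    r"(?:([a-z0-9-]+)\.compute|compute-1)\.amazonaws\.com"
)

_HOST_PORT = r"[A-Za-z0-9.-]+:\d{1,5}"
# Hive: one or more host:port pairs, no database name after the slash.
_HIVE_URL = re.compile(rf"jdbc:hive2://{_HOST_PORT}(?:,{_HOST_PORT})*/?")
# Spark: a single host:port pair, as in the documented format.
_SPARK_URL = re.compile(rf"jdbc:(?:hive2|kyuubi)://{_HOST_PORT}/?")


def parse_public_dns(dns: str):
    """Return (public_ip, region) if the name matches the expected pattern, else None."""
    match = _PUBLIC_DNS.fullmatch(dns.strip())
    if match is None:
        return None
    octets = match.groups()[:4]
    if any(int(octet) > 255 for octet in octets):
        return None
    region = match.group(5) or "us-east-1"
    return ".".join(octets), region


def is_valid_hive_jdbc_url(url: str) -> bool:
    """Return True if the URL matches the Hive format and carries no database name."""
    return _HIVE_URL.fullmatch(url) is not None


def is_valid_spark_jdbc_url(url: str) -> bool:
    """Return True if the URL matches the Spark (hive2 or kyuubi) format."""
    return _SPARK_URL.fullmatch(url) is not None
```

A URL such as `jdbc:hive2://host1:10000/default` is rejected because it ends with a database name, and `jdbc:hive2//host1:10000/` is rejected because the colon after `hive2` is missing.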

Click Test Connection. After the test is successful, click Save to create the Amazon EMR cluster.
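
If you enable Spark Local Client, a quick local check of the archive can catch packaging mistakes before you run the connection test. The following is a heuristic sketch with hypothetical function names. It assumes the community layout with a single top-level directory that contains `bin/spark-submit`, and it treats any `hadoop-*.jar` under `jars/` as a sign that the Hadoop client is bundled. Dataphin may apply different checks.

```python
import tarfile
import zipfile


def list_members(path: str) -> list[str]:
    """Return the member paths of a .tgz or .zip archive."""
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if path.endswith(".tgz"):
        with tarfile.open(path, mode="r:gz") as archive:
            return archive.getnames()
    raise ValueError("expected a .tgz or .zip archive")


def check_spark_client(members: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the heuristic checks passed."""
    relative = []
    for member in members:
        if member.startswith("./"):
            member = member[2:]
        # Drop the top-level directory so paths start at bin/, jars/, and so on.
        if "/" in member:
            relative.append(member.split("/", 1)[1])
    problems = []
    if "bin/spark-submit" not in relative:
        problems.append("missing bin/spark-submit")
    has_hadoop_jar = any(
        name.startswith("jars/hadoop-") and name.endswith(".jar") for name in relative
    )
    if not has_hadoop_jar:
        problems.append("no hadoop-*.jar under jars/")
    return problems
```

Calling `check_spark_client(list_members(path))` on a local archive returns an empty list when both checks pass.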

Manage Amazon EMR clusters

In the top menu bar of the Dataphin home page, select Planning > Compute Source.
On the Compute Source page, click Manage Amazon EMR Cluster.
In the Manage Amazon EMR Cluster dialog box, view the list of Amazon EMR clusters, including cluster name, cluster administrator, associated compute sources, creation information, and modification information.
- Associated Compute Sources: Shows the total number of associated compute sources. Click the icon to view the list. Click a compute source name to open its details page.
- Creation Information: Shows who created the cluster and when.
- Modification Information: Shows who last modified the cluster and when.
Note: Compute tasks can run in only one cluster. Data from different Amazon EMR clusters cannot be joined.
(Optional) In the search box, enter a cluster name to perform a fuzzy search.

In the Actions column, perform the following operations on a cluster.

Operation	Description
View	In the Actions column of the target cluster, click the icon to view details of the current cluster version. Users with the Amazon EMR Cluster - Management permission can download the cluster configuration files.
Edit	In the Actions column of the target cluster, click the icon to open the Edit Amazon EMR Cluster page. On this page, modify the configuration. After you finish, click Save. In the dialog box that appears, enter a Change Description and click OK.
Clone	In the Actions column of the target cluster, click the icon. The system clones all data from the cluster and opens the Create Amazon EMR Cluster page, where you can modify the cloned configuration.
Historical versions	In the Actions column of the target cluster, click the icon and select History. A dialog box displays each cluster version with the version name, modifier, and change description. You can View and Compare historical versions. View: In the Actions column of the target version, click the icon to view details of the selected cluster version. Users with the Amazon EMR Cluster - Management permission can download the cluster configuration files. Compare: In the Actions column of the target version, click the icon to open the Version Comparison page. Select a different version from the drop-down list to compare. By default, the current version is compared with the selected version.
Delete	In the Actions column of the target cluster, click the icon and select Delete. In the dialog box that appears, click OK. Note: A cluster can be deleted only if it has no associated compute sources. A deleted cluster cannot be restored.