Dataphin: Create and manage Amazon EMR clusters

Last Updated: Jun 17, 2026

Register Amazon EMR clusters in Dataphin and manage cluster configurations, security controls, and lifecycle operations.

Permissions

  • Super administrators, system administrators, and users with custom global roles that include the Amazon EMR Cluster - Management permission can create and manage Amazon EMR clusters. These users can also assign cluster administrators and specify which users can reference a cluster when creating an Amazon EMR compute source.

  • Cluster administrators can manage the clusters for which they are responsible.

  • Users with a custom global role that includes the Compute Source Management - Create permission can select the Amazon EMR clusters that they are authorized to use when they create an Amazon EMR compute source.

Create an Amazon EMR cluster

  1. In the top menu bar of the Dataphin home page, select Planning > Compute Source.

  2. On the Compute Source page, click Manage Amazon EMR Cluster.

  3. In the Manage Amazon EMR Cluster dialog box, click +Create Amazon EMR Cluster.

  4. On the Create Amazon EMR Cluster page, configure the following parameters.

    • Basic information

      Parameter

      Description

      Cluster Name

      Enter a name for the cluster. The name can contain Chinese characters, letters, digits, underscores (_), and hyphens (-), and must be 128 characters or fewer.

      Cluster Administrator

      Select one or more tenant members as cluster administrators. Cluster administrators can edit the cluster, view historical versions, and delete the cluster.

      Description (optional)

      Enter a brief description for the cluster. The description must be 128 characters or fewer.

    • Cluster security control

      Available members: Specify which users can reference the cluster when creating a compute source. Select Roles With "Create Compute Source" Permission or Specified Users.

      • Roles With "Create Compute Source" Permission: Selected by default.

      • Specified Users: Select one or more personal accounts and user groups.

    • Cluster configuration

      Parameter

      Description

      Primary Node Public DNS

      Enter the public DNS of the primary node. Dataphin uses it to retrieve the VPC private DNS, and both Hive and Spark connect through the private DNS. The format is ec2-{public_ip}.{region}.compute.amazonaws.com.

      Key File (*.pem)

      The key pair for accessing the primary EC2 node. Use the key pair that was configured when the EMR cluster was created.

      core-site.xml, yarn-site.xml, hive-site.xml, hdfs-site.xml

      Upload each of these cluster configuration files, or click Get Cluster Configuration to download them from the primary node. Before you download the configuration files, enter the primary node public DNS and upload the key file.

      Cluster Storage

      Currently, only HDFS is supported.

      Metadata Retrieval Method

      Select HMS or Amazon Glue.

      • HMS: Selected by default.

      • Amazon Glue: If you select Amazon Glue, you must also configure the Glue Region Code, Glue AccessKey ID, and Glue AccessKey Secret.

        • Glue Region Code: Enter the Region Code for Amazon Glue, such as ap-northeast-3, us-east-1, or us-west-1.

        • Glue AccessKey ID, Glue AccessKey Secret: Enter the AccessKey ID and AccessKey secret for accessing Amazon Glue.

      Hive JDBC URL

      Enter the JDBC connection address for Hive, or click Auto-retrieve to obtain it automatically. Auto-retrieval requires the public DNS and key file. The Hive JDBC URL format is jdbc:hive2://host1:port1,host2:port2/. Do not enter a database name.

      Spark SQL

      Select Enable or Disable. If you select Enable, you must also configure the Spark JDBC URL.

      Spark JDBC URL

      Enter the JDBC connection address for Spark. The format is jdbc:hive2://host1:port1/ or jdbc:kyuubi://host1:port1/. Do not enter a database name.

      Note

      This parameter is required only if Spark SQL is enabled.

      Username

      The username for Hive or Spark, used as the username in JDBC connections.

      Spark Local Client

      Select Enable or Disable. If you select Enable, upload the Spark client file.

      Note

      Download the matching version of the Spark client from the Spark official website or provide your own client. The client must follow the same directory structure as the community version and include the Hadoop client. Upload the complete compressed package in .tgz or .zip format. Dataphin uses the uploaded client to submit jobs through the scheduling cluster for full lifecycle management.

  5. Click Test Connection. After the test is successful, click Save to create the Amazon EMR cluster.
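
The formats stated for the Primary Node Public DNS, Hive JDBC URL, and Spark JDBC URL can be checked locally before you click Test Connection. The following Python sketch is not part of Dataphin: the function name `check_cluster_inputs` and the regular expressions are hypothetical, and they encode only the formats described above. It assumes that the `{public_ip}` part of the DNS name is written with hyphens between the four octets, and it rejects any URL that carries a database name or other suffix after the trailing slash. Public DNS names in some regions may not follow the stated pattern, so treat a reported problem as a prompt to double-check the value rather than as proof that it is wrong.

```python
import re

# Assumption: {public_ip} appears as four hyphen-separated octets.
PUBLIC_DNS_RE = re.compile(
    r"ec2-\d{1,3}(?:-\d{1,3}){3}\.[a-z0-9-]+\.compute\.amazonaws\.com"
)

# host:port pair as used in the JDBC URL formats above.
HOST_PORT = r"[A-Za-z0-9.-]+:\d{1,5}"

# Hive: one or more comma-separated host:port pairs, trailing slash, no database.
HIVE_URL_RE = re.compile(rf"jdbc:hive2://{HOST_PORT}(?:,{HOST_PORT})*/")

# Spark: hive2 or kyuubi scheme, a single host:port, trailing slash, no database.
SPARK_URL_RE = re.compile(rf"jdbc:(?:hive2|kyuubi)://{HOST_PORT}/")


def check_cluster_inputs(public_dns, hive_url, spark_url=None):
    """Return a list of format problems; an empty list means all checks passed.

    spark_url may be None when Spark SQL is disabled.
    """
    problems = []
    if not PUBLIC_DNS_RE.fullmatch(public_dns):
        problems.append(f"unexpected public DNS format: {public_dns}")
    if not HIVE_URL_RE.fullmatch(hive_url):
        problems.append(f"unexpected Hive JDBC URL format: {hive_url}")
    if spark_url is not None and not SPARK_URL_RE.fullmatch(spark_url):
        problems.append(f"unexpected Spark JDBC URL format: {spark_url}")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for problem in check_cluster_inputs(
        "ec2-203-0-113-25.us-west-2.compute.amazonaws.com",
        "jdbc:hive2://host1:10000,host2:10000/",
        "jdbc:kyuubi://host1:10009/",
    ):
        print(problem)
```

A passing check confirms only the shape of the values. The Test Connection step remains the actual verification that the cluster is reachable.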

Manage Amazon EMR clusters

  1. In the top menu bar of the Dataphin home page, select Planning > Compute Source.

  2. On the Compute Source page, click Manage Amazon EMR Cluster.

  3. In the Manage Amazon EMR Cluster dialog box, view the list of Amazon EMR clusters, including cluster name, cluster administrator, associated compute sources, creation information, and modification information.

    • Associated Compute Sources: Shows the total number of associated compute sources. Click the image icon to view the list. Click a compute source name to open its details page.

    • Creation Information: Shows who created the cluster and when.

    • Modification Information: Shows who last modified the cluster and when.

    Note

    Compute tasks can run in only one cluster. Data from different Amazon EMR clusters cannot be joined.

  4. (Optional) In the search box, enter a cluster name to perform a fuzzy search.

  5. In the Actions column, perform the following operations on a cluster.

    Operation

    Description

    View

    In the Actions column of the target cluster, click the View icon to view details of the current cluster version. Users with the Amazon EMR Cluster - Management permission can download the cluster configuration files.

    Edit

    In the Actions column of the target cluster, click the Edit icon to open the Edit Amazon EMR Cluster page. Modify the configuration, and then click Save. In the dialog box that appears, enter a Change Description and click OK.

    Clone

    In the Actions column of the target cluster, click the image icon. The system clones all data from the cluster and opens the Create Amazon EMR Cluster page, where you can modify the cloned configuration.

    Historical versions

    In the Actions column of the target cluster, click the image icon and select History. A dialog box lists each cluster version with its version name, modifier, and change description. From this dialog box, you can view and compare historical versions.

    • View: In the Actions column of the target version, click the image icon to view details of the selected cluster version. Users with the Amazon EMR Cluster - Management permission can download the cluster configuration files.

    • Compare: In the Actions column of the target version, click the image icon to open the Version Comparison page. Select a different version from the drop-down list to compare. By default, the current version is compared with the selected version.

    Delete

    Note

    • The cluster can be deleted only if it has no associated compute sources.

    • A deleted cluster cannot be restored.

    In the Actions column of the target cluster, click the image icon and select Delete. In the dialog box that appears, click OK.
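
Configuration files downloaded through the View operation can also be compared outside Dataphin. The following Python sketch is a local alternative to the built-in version comparison, not a description of how Dataphin implements it. It assumes that the files use the standard Hadoop site-file layout, with a `<configuration>` root that contains `<property>` elements, each holding a `<name>` and a `<value>`. The function names `read_properties` and `diff_properties` are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style site XML text into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(old, new):
    """Return (added, removed, changed) between two property dictionaries.

    changed maps each property name to an (old_value, new_value) tuple.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Inline sample documents stand in for two downloaded versions of a file.
    version_1 = """<configuration>
      <property><name>example.key.a</name><value>1</value></property>
      <property><name>example.key.b</name><value>2</value></property>
    </configuration>"""
    version_2 = """<configuration>
      <property><name>example.key.a</name><value>3</value></property>
      <property><name>example.key.c</name><value>4</value></property>
    </configuration>"""

    added, removed, changed = diff_properties(
        read_properties(version_1), read_properties(version_2)
    )
    print("added:", added)
    print("removed:", removed)
    print("changed:", changed)
```

To compare files on disk, read each file's text and pass it to `read_properties` in place of the inline samples.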