Dataphin: Create an Amazon EMR compute source

Last Updated: May 28, 2025

An Amazon EMR compute source provides computing resources for processing compute tasks in Dataphin projects. If the compute engine of the Dataphin system is set to Amazon EMR, projects can use features such as compute tasks, ad hoc queries, and general scripts only after an Amazon EMR compute source is added to the project. This topic describes how to create an Amazon EMR compute source.

Prerequisites

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Planning > Compute Source.

  2. On the Compute Source page, click New Compute Source and select Amazon EMR Compute Source.

  3. In the Create Amazon EMR Compute Source dialog box, configure the required parameters.

    Parameter: Description

    Basic Information

    Compute Type: Select Amazon EMR.

    Compute Source Name: The name can contain Chinese characters, letters, digits, underscores (_), and hyphens (-), and cannot exceed 64 characters in length.

    Configuration Method: Only Reference Specified Cluster is currently supported. Enter keywords to search for a cluster. After you select a cluster, click View to open the View Amazon EMR Cluster page and check the cluster information.

    Description (optional): Enter a brief description of the compute source. The description cannot exceed 128 characters in length.

    Compute Configuration

    Primary Node Public DNS: The system automatically obtains this value from the selected Amazon EMR cluster. It cannot be modified.

    Database: Enter the database name of the Amazon EMR compute engine.

    Spark SQL: Select Enable or Disable. The default value is Enable.
    Note: This parameter can be configured only when Spark SQL is enabled on the referenced cluster.

    Spark Local Client: Select Enable or Disable. The default value is Enable.
    Note: This parameter can be configured only when both Spark SQL and Spark Local Client are enabled on the referenced cluster.

    Default Queue For Production Tasks (optional): Enter a YARN resource queue. Manual and scheduled tasks in the production environment use this queue.

    Queue For Other Tasks (optional): Enter a YARN resource queue. Other tasks, such as ad hoc queries, data previews, and JDBC Driver access, use this queue.

    Queue For Priority Tasks: Select Use Default Queue For Production Tasks or Custom. If you select Custom, you can specify a YARN resource queue for each priority level.
    Note: When Dataphin schedules Hive SQL tasks, it sends each task to the queue that corresponds to the task priority. If the Hive execution engine is set to Tez or Spark, you must configure different priority queues; otherwise, the task priority settings do not take effect.

  4. Click Submit.

    After you create an Amazon EMR compute source, you can attach it to a project. For more information, see Create a general project.
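
The naming limits and queue rules described in step 3 can be sketched in code. The following Python example is only an illustration under stated assumptions, not a Dataphin API: the class `EmrComputeSourceForm`, its field names, and the priority labels such as `"high"` are hypothetical. It also assumes that "letters" means ASCII letters, that Chinese characters fall in the range U+4E00 to U+9FFF, and that the name and database fields must not be empty.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: ASCII letters, digits, underscores, hyphens, and
# Chinese characters approximated by the range U+4E00 to U+9FFF.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\-\u4e00-\u9fff]+")


@dataclass
class EmrComputeSourceForm:
    """Hypothetical model of the dialog box fields (not a Dataphin API)."""

    name: str
    database: str
    description: str = ""
    production_queue: Optional[str] = None  # Default Queue For Production Tasks
    other_queue: Optional[str] = None       # Queue For Other Tasks
    # Empty dict models "Use Default Queue For Production Tasks";
    # a non-empty dict models "Custom" (priority label -> YARN queue).
    priority_queues: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form passes."""
        errors = []
        if not self.name or len(self.name) > 64:
            errors.append("name must be 1 to 64 characters long")
        elif not _NAME_PATTERN.fullmatch(self.name):
            errors.append("name contains unsupported characters")
        if len(self.description) > 128:
            errors.append("description must not exceed 128 characters")
        if not self.database:
            errors.append("database name is required")  # assumption
        return errors

    def queue_for(self, task_kind: str, priority: Optional[str] = None) -> Optional[str]:
        """Pick the YARN queue for a task.

        task_kind is "production" for manual and scheduled production tasks,
        and anything else for ad hoc queries, data previews, and similar tasks.
        """
        if task_kind != "production":
            return self.other_queue
        # A custom priority queue wins; otherwise use the production default.
        return self.priority_queues.get(priority, self.production_queue)
```

For example, a form with `production_queue="prod"` and `priority_queues={"high": "prod_high"}` would route a high-priority production task to `prod_high` and every other production task to `prod`.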