
DataWorks:Associate an EMR Serverless Spark computing resource

Last Updated: Mar 26, 2026

To develop and manage EMR Serverless Spark tasks in DataWorks, associate an EMR Serverless Spark workspace with DataWorks as a computing resource.

Prerequisites

Before you begin, ensure that you have created an EMR Serverless Spark workspace.

Limitations

Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

Permissions:

  • Alibaba Cloud account: No additional DataWorks or EMR Serverless Spark permissions are required.

  • RAM user or RAM role: In DataWorks, the O&M or Workspace Administrator role, or the AliyunDataWorksFullAccess policy (see Grant the Workspace Administrator role to a user). In EMR Serverless Spark, the AliyunEMRServerlessSparkFullAccess policy and the Owner permission for the EMR Serverless Spark workspace (see Manage users and roles).

Open the computing resource page

  1. Log on to the DataWorks console.

  2. Switch to the destination region.

  3. In the left-side navigation pane, choose More > Management Center.

  4. Select your workspace and click Go To Management Center.

  5. In the left-side navigation pane, click Computing Resource.

Associate a Serverless Spark computing resource

On the Computing Resources page, configure and associate a Serverless Spark computing resource.

  1. Click Associate Computing Resource.

  2. On the Associate Computing Resource page, select Serverless Spark as the resource type. The Associate Serverless Spark Computing Resource configuration page opens.

  3. Configure the following parameters.

    • Spark Workspace: Select the Spark workspace to associate. To create a new one, click Create in the drop-down list.

    • Default Engine Version: Select the engine version to use when creating EMR Spark tasks in Data Studio. To use a different version for a specific task, override this setting in the task's advanced settings.

    • Default Resource Queue: Select the resource queue to use when creating EMR Spark tasks in Data Studio. To add a new queue, click Create in the drop-down list. To use a different queue for a specific task, override this setting in the task's advanced settings.

    • Default Kyuubi Gateway: (Optional) Configure a Kyuubi Gateway to control how SQL and Kyuubi tasks run. If configured, the gateway runs all related tasks, including EMR Spark SQL/Kyuubi and Serverless Spark SQL/Kyuubi. If not configured, DataWorks runs EMR Spark SQL and Serverless Spark SQL tasks using spark-submit, and EMR Kyuubi and Serverless Kyuubi tasks fail. For setup instructions, see Configure the Kyuubi Gateway.

    • Default Access Identity: The identity that DataWorks uses to access the Spark workspace. In the development environment, only the Executor identity is supported. In the production environment, Alibaba Cloud account, RAM user, and Task Owner are supported.

    • Computing Resource Instance Name: A name that identifies this computing resource. At runtime, this name is used to select the computing resource for a task.
  4. Click OK.

Configure the Kyuubi Gateway

To configure a Kyuubi Gateway, go to EMR Serverless Spark Console > Operation Center > Gateway > Kyuubi Gateway, then create a Kyuubi Gateway and a token.

  • If Kerberos is not enabled: Click the gateway name to get the JDBC URL and token, then combine them to form the connection string:

    jdbc:hive2://kyuubi-cn-hangzhou-internal.spark.emr.aliyuncs.com:80/;transportMode=http;httpPath=cliservice/token/<token>
  • If Kerberos is enabled: Get the Beeline connection string based on your Kerberos configuration. For more information, see Use Kerberos with Kyuubi Gateway.

    jdbc:hive2://ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com:10009/;principal=kyuubi/_HOST@EMR.C-DFD43*****7C204.COM
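For the non-Kerberos case, the connection string is the JDBC URL joined with the token path. A minimal Python sketch of that assembly (the host and token here are placeholders; copy the real values from the gateway details page, and note this helper is illustrative, not part of any DataWorks or EMR SDK):

```python
def kyuubi_jdbc_url(host: str, token: str, port: int = 80) -> str:
    """Build the HTTP-mode Kyuubi Gateway JDBC URL from a gateway host and token."""
    return (
        f"jdbc:hive2://{host}:{port}/;"
        f"transportMode=http;httpPath=cliservice/token/{token}"
    )

# Placeholder values; substitute the JDBC URL and token from your gateway.
print(kyuubi_jdbc_url("kyuubi-cn-hangzhou-internal.spark.emr.aliyuncs.com", "<token>"))
```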

Configure global Spark parameters

Set Spark parameters at the workspace level. These global parameters apply to all tasks by default. Override them with node-specific settings in Data Studio when needed.

  • Global: Configure on the Management Center > Computing Resource > Spark Parameters page. Applies to Data Studio, Operation Center, and Data Analysis.

  • Node-specific: Configure on the node editing page in Data Studio. Supported for EMR Spark, EMR Kyuubi, EMR Spark SQL, EMR Spark Streaming, Serverless Spark batch, Serverless Spark SQL, and Serverless Kyuubi nodes. Not supported in other modules.

Permissions

Only the following users can configure global Spark parameters:

  • An Alibaba Cloud account

  • A RAM user or RAM role with the AliyunDataWorksFullAccess policy

  • A RAM user with the Workspace Administrator role

Set global Spark parameters

  1. Go to the Computing Resources page and find the Serverless Spark computing resource.

  2. Click Spark Parameters to open the configuration pane.

  3. In the upper-right corner, click Edit Spark Parameters. Configure the following settings.

    This is a workspace-level configuration. Make sure you have selected the correct workspace before proceeding.
    • Spark Property: Add Spark properties for running Serverless Spark tasks. Click Add, then enter the property name and value. For supported properties, see Spark Configuration and Custom Spark Conf parameters.

    • Global Settings Take Precedence: If selected, global Spark parameters override any settings configured within individual product modules.
  4. Click OK.
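The effect of Global Settings Take Precedence can be sketched as a dictionary merge. This models only the override rule described above, not DataWorks' actual implementation, and the property values are examples:

```python
def effective_spark_conf(global_conf: dict, node_conf: dict,
                         global_takes_precedence: bool) -> dict:
    """Merge workspace-level and node-level Spark properties.

    When global settings take precedence, global values win on conflicts;
    otherwise the node-level value overrides the global default.
    """
    if global_takes_precedence:
        return {**node_conf, **global_conf}
    return {**global_conf, **node_conf}

workspace_level = {"spark.executor.memory": "4g", "spark.executor.cores": "2"}
node_level = {"spark.executor.memory": "8g"}

# With the checkbox selected, the workspace value for
# spark.executor.memory ("4g") overrides the node-level "8g".
print(effective_spark_conf(workspace_level, node_level, True))
```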

Configure cluster account mapping

Map Alibaba Cloud accounts of DataWorks members to specific EMR cluster identities, so members can run tasks using their mapped identities.

Important

This feature is available only for Serverless resource groups. If you purchased a Serverless resource group before August 15, 2025 and want to use this feature, submit a ticket to upgrade the resource group.

  1. Go to the Computing Resources page and find the Serverless Spark computing resource.

  2. Click Account Mappings to open the configuration pane.

  3. Click Edit Account Mapping and configure the mapping based on the selected type.

    • System account mapping: Runs EMR Spark, EMR Spark SQL, EMR Kyuubi, and Notebook nodes as the cluster account with the same name as the Default Access Identity. Same-name mapping is used by default; to use a different account, configure the mapping manually.

    • OpenLDAP account mapping: Runs EMR Spark and EMR Spark SQL nodes as the Default Access Identity, and EMR Kyuubi and Notebook nodes as the mapped OpenLDAP account. This type is required if you have configured LDAP authentication for the Kyuubi Gateway. Map each Alibaba Cloud account to an LDAP Account and LDAP Password. If an account is not in the mapping list, tasks for that account fail.

    • Kerberos account mapping: Runs EMR Spark and EMR Spark SQL nodes as the Default Access Identity, and EMR Kyuubi nodes as the mapped Kerberos account. To configure it: 1. Upload the krb5.conf file for the Kerberos service configured on the EMR Serverless Spark cluster. 2. For the account specified as the default access identity, configure the principal and keytab for Kerberos authentication.
  4. Click OK.
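The OpenLDAP mapping rule above (tasks fail for unmapped accounts) amounts to a simple lookup. The sketch below is purely illustrative; the function and account names are hypothetical and do not reflect DataWorks internals:

```python
def resolve_ldap_identity(aliyun_account: str,
                          ldap_mapping: dict) -> tuple:
    """Return the (LDAP account, LDAP password) pair mapped to an Alibaba Cloud account.

    Mirrors the documented behavior: an account missing from the mapping
    list cannot run EMR Kyuubi or Notebook tasks.
    """
    try:
        return ldap_mapping[aliyun_account]
    except KeyError:
        raise RuntimeError(
            f"{aliyun_account} has no LDAP mapping; Kyuubi/Notebook tasks fail"
        ) from None

# Hypothetical mapping entry for illustration only.
mapping = {"dev_user@example.aliyunid.com": ("dev_user", "********")}
print(resolve_ldap_identity("dev_user@example.aliyunid.com", mapping)[0])
```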

What's next

After associating the Serverless Spark computing resource, use it to develop tasks in Data Studio: