
DataWorks: Associate an EMR Serverless Spark computing resource

Last Updated: Nov 26, 2025

To develop and manage EMR Serverless Spark tasks in DataWorks, you must first associate your EMR Serverless Spark workspace as a DataWorks Serverless Spark computing resource. After the resource is associated, you can use it for data development in DataWorks.

Prerequisites

  • An EMR Serverless Spark workspace is created.

  • A DataWorks workspace is created. The RAM user who performs the operations is added to the workspace and assigned the Workspace Administrator role.

    Important

    Only workspaces set to Use Data Studio (New Version) are supported.

  • A Serverless resource group is available. The resource group must be associated with the target DataWorks workspace.

Limits

  • Region limits: This feature is supported only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), Germany (Frankfurt), and US (Virginia).

  • Permission limits:

    • Alibaba Cloud account: No extra permissions are required.

    • RAM user or RAM role:

      • DataWorks management permissions: Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess permission, can create computing resources. For more information about how to grant permissions, see Grant the Workspace Administrator permissions to a user.

      • EMR Serverless Spark service permissions:

        • The AliyunEMRServerlessSparkFullAccess policy.

        • The Owner permission on the EMR Serverless Spark workspace. For more information, see Manage users and roles.

Go to the computing resource list page

  1. Log on to the DataWorks console and switch to the target region. In the navigation pane on the left, choose More > Management Center. Then, select your workspace from the drop-down list and click Go To Management Center.

  2. In the navigation pane on the left, click Computing Resources.

Associate a Serverless Spark computing resource

On the computing resources page, you can configure the parameters to associate a Serverless Spark computing resource.

  1. Select a computing resource type to associate.

    1. Click Associate Computing Resource to open the Associate Computing Resource page.

    2. On the Associate Computing Resource page, set the computing resource type to Serverless Spark to open the Associate Serverless Spark Computing Resource configuration page.

  2. Configure the Serverless Spark computing resource.

    On the Associate Serverless Spark Computing Resource page, configure the following parameters.

    • Spark Workspace: Select the Spark workspace that you want to associate. You can also click Create in the drop-down list to create a Spark workspace.

    • Role Authorization: To allow DataWorks to obtain information about the EMR Serverless Spark workspace, click Add Service-linked Role As Workspace Administrator the first time you select a Spark workspace.

      Important

      After you create the service-linked role, do not remove the administrator role of the DataWorks service-linked roles AliyunServiceRoleForDataWorksOnEmr and AliyunServiceRoleForDataworksEngine from the EMR Serverless Spark workspace.

    • Default Engine Version: Select the engine version to use.

      • When you create an EMR Spark task in Data Studio, this engine version is used by default.

      • To use different engine versions for different tasks, set them in the advanced settings of the Spark task editing window.

    • Default Resource Queue: Select the resource queue to use. You can also click Create in the drop-down list to add a queue.

      • When you create an EMR Spark task in Data Studio, this resource queue is used by default.

      • To use different resource queues for different tasks, set them in the advanced settings of the Spark task editing window.

    • Default SQL Compute: Optional. The default SQL Compute used by EMR Spark SQL node tasks. You can click Create in the drop-down list to create an SQL session.

      • SQL sessions let you configure runtime resources for each session, which provides task-level resource isolation and flexible scheduling. Assigning different tasks to different SQL sessions improves cluster resource utilization, prevents resource contention, and meets the needs of different tasks.

      • To use different SQL Compute resources for different tasks, set them in the advanced settings of the Spark task editing window.

    • Default Access Identity: The identity used to access the Spark workspace from the current DataWorks workspace.

      • Development environment: Only the Executor identity is supported.

      • Production environment: The Alibaba Cloud Account, RAM User, and Task Owner identities are supported.

    • Computing Resource Instance Name: Identifies the computing resource. At runtime, the instance name is used to select the computing resource for a task.

  3. Click OK to complete the Serverless Spark computing resource configuration.

Configure global Spark parameters

In DataWorks, you can specify Spark parameters for each module at the workspace level. You can also set whether global parameters have a higher priority than local parameters within a specific module, such as Data Studio. After you complete the configuration, the specified Spark parameters are used by default to run tasks. The settings are configured as follows:

  • Global configuration: You can configure global Spark parameters at the workspace level for the DataWorks modules that run EMR tasks. You can also define whether these global Spark parameters take precedence over the Spark parameters configured within a specific module. For more information, see Configure global Spark parameters. An illustrative set of property names and values is shown after this list.

  • Single node: In Data Studio, you can set Spark properties for a single node task on the node editing page. Other product modules do not support setting Spark properties separately.
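
The following is a minimal, illustrative sketch of the kind of Spark properties you might enter, either globally or for a single node. The property names are standard Spark settings; the values are placeholders rather than DataWorks defaults, so adjust them to your workload and to the resources of your queue.

    # Illustrative only: Spark property names and example values that you might
    # enter as global Spark parameters or in a node's Spark Parameters section.
    # The values are placeholders, not defaults recommended by DataWorks.
    example_spark_properties = {
        "spark.driver.memory": "2g",
        "spark.executor.memory": "4g",
        "spark.executor.cores": "2",
        "spark.sql.shuffle.partitions": "200",
    }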

Access control

Only the following roles can configure global Spark parameters:

  • Alibaba Cloud account.

  • A RAM user or RAM role with the AliyunDataWorksFullAccess permission.

  • A RAM user with the Workspace Administrator role.

View global Spark parameters

  1. Go to the computing resources page and find the Serverless Spark computing resource that you associated.

  2. Click Spark-related Parameter to open the Spark parameter configuration pane and view the global parameter settings.

Configure global Spark parameters

Configure global Spark parameters as follows. For more information about how to configure the Spark parameters of a Serverless Spark computing resource, see Job configuration instructions.

  1. Go to the computing resources page and find the Serverless Spark computing resource that you associated.

  2. Click Spark-related Parameter to open the Spark parameter configuration pane and view the global Spark parameter settings.

  3. Set global Spark parameters.

    In the upper-right corner of the Spark-related Parameter page, click Edit Spark-related Parameter to configure global Spark parameters and set their priorities for each module.

    Note

    These are global settings for the workspace. Before you configure the parameters, confirm that you have selected the correct workspace.

    • Spark Property Name: The Spark properties to use when you run Serverless Spark tasks.

    • Global Settings Take Precedence: If you select this option, the global configuration takes precedence over the configurations within product modules, and tasks run based on the global Spark properties. A conceptual sketch of this precedence logic follows this procedure.

      • Global configuration: The Spark properties configured for the corresponding Serverless Spark computing resource on the Spark-related Parameter page in Management Center > Computing Resources. Currently, you can set global Spark parameters only for the Data Studio and Operation Center modules.

      • Configuration within a product module:

        • Data Studio: For EMR Spark, EMR Spark SQL, Serverless Spark Batch, Serverless Spark SQL, and Serverless Kyuubi nodes, you can set Spark properties for a single node task in the Spark Parameters section of the Debugging Configurations or Scheduling tab on the node editing page.

        • Other product modules: Setting Spark properties within these modules is not supported.

  4. Click OK to save the global Spark parameters.
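
The precedence rule can be summarized with a small sketch. This is a conceptual illustration only, not DataWorks' actual implementation: the function name and the dictionaries are hypothetical and exist only to show which properties win when Global Settings Take Precedence is selected.

    # Conceptual sketch only; not DataWorks' actual implementation.
    # Shows which Spark properties a task effectively runs with, depending on
    # whether "Global Settings Take Precedence" is selected for a module.
    def effective_spark_properties(global_props, node_props, global_takes_precedence):
        if global_takes_precedence:
            # Node-level values are overridden by the global configuration.
            return {**node_props, **global_props}
        # Otherwise, node-level values override the global configuration.
        return {**global_props, **node_props}

    # Example: the node sets 8g, but the workspace-level global value is 4g.
    merged = effective_spark_properties(
        {"spark.executor.memory": "4g"},
        {"spark.executor.memory": "8g"},
        global_takes_precedence=True,
    )
    # merged == {"spark.executor.memory": "4g"}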

Configure cluster account mapping

You can manually configure the mapping between the Alibaba Cloud accounts of DataWorks tenant members and the specified identity accounts of the EMR cluster. This allows DataWorks tenant members to run tasks in EMR Serverless Spark using the mapped cluster identities.

Important

This feature is available only for Serverless resource groups. If you purchased a Serverless resource group before August 15, 2025, and want to use this feature, you must submit a ticket to upgrade the resource group.

  1. Go to the computing resources page and find the Serverless Spark computing resource that you associated.

  2. Click Account Mappings to open the Account Mappings configuration pane.

  3. Click Edit Account Mappings and configure the parameters based on the selected Mapping Type.

    • System account mapping:

      • Task execution: Uses the cluster account that has the same name as the Default Access Identity in the basic information of the computing resource to run EMR Spark, EMR Spark SQL, EMR Kyuubi, and Notebook node tasks.

      • Configuration: Same-name mapping is used by default. To use a different account mapping, manually configure a different account.

    • OpenLDAP account mapping:

      • Task execution: Uses the Default Access Identity in the basic information of the computing resource to run EMR Spark and EMR Spark SQL tasks, and uses the OpenLDAP account that is mapped to that identity to run EMR Kyuubi and Notebook node tasks.

      • Configuration: If you have configured and enabled LDAP authentication for the Kyuubi Gateway, you must configure the mapping between the Alibaba Cloud account and the OpenLDAP account (LDAP Account and LDAP Password) to run these tasks.

    Important

    If the Alibaba Cloud account required to run DataWorks tasks is not in the account mapping list, the tasks may fail to run.

  4. Click Confirm to complete the cluster account mapping configuration.

Configure a Kyuubi connection

To run tasks related to EMR Kyuubi nodes on an EMR Serverless Spark computing resource, you must configure the Kyuubi connection as follows.

Important

This feature is available only for Serverless resource groups. If you purchased a Serverless resource group before August 15, 2025, and want to use this feature, you must submit a ticket to upgrade the resource group.

  • Prerequisite: You have created a Kyuubi Gateway and a token for the EMR Serverless Spark cluster.

  • Procedure:

    1. Go to the computing resources page and find the Serverless Spark computing resource that you associated.

    2. Click Kyuubi Configuration to open the Kyuubi Configuration pane.

    3. In the upper-right corner of the Kyuubi Configuration page, click Edit Kyuubi Configuration to configure the cluster's Kyuubi connection.

      1. Obtain the token that you created. For more information, see Manage a Kyuubi Gateway.

      2. Append the token to the end of the JDBC URL parameter: .../;transportMode=http;httpPath=cliservice/token/. A hedged example of the resulting URL format is shown after this procedure.

        If no JDBC URL that contains .../;transportMode=http;httpPath=cliservice/token/ is available, follow the on-screen prompts to create a Kyuubi Gateway.

      3. Click OK to complete the configuration.
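
    The following is a minimal sketch of how the token is appended to form the final connection string. The jdbc:hive2 prefix, host, and port shown here are assumptions written as placeholders; copy the actual JDBC URL from your Kyuubi Gateway and substitute the token that you created.

      # Minimal sketch: building the final Kyuubi JDBC URL by appending the token.
      # The prefix, host, and port below are hypothetical placeholders; use the
      # JDBC URL provided by your own Kyuubi Gateway instead.
      gateway_jdbc_url = (
          "jdbc:hive2://<kyuubi-gateway-endpoint>:<port>/"
          ";transportMode=http;httpPath=cliservice/token/"
      )
      token = "<your-kyuubi-token>"  # the token created for the Kyuubi Gateway
      kyuubi_jdbc_url = gateway_jdbc_url + token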

What to do next

After you configure the Serverless Spark computing resource, you can use it to develop node tasks in Data Studio. For more information, see EMR Spark node, EMR Spark SQL node, Serverless Spark Batch node, Serverless Spark SQL node, and Serverless Kyuubi node.