To develop and manage EMR Serverless Spark tasks in DataWorks, you must first associate your EMR Serverless Spark workspace as a DataWorks Serverless Spark computing resource. After the resource is associated, you can use it for data development in DataWorks.
Prerequisites
An EMR Serverless Spark workspace is created.
A DataWorks workspace is created. The RAM user who performs the operations is added to the workspace and assigned the Workspace Administrator role.
Important: Only workspaces that are set to use Data Studio (New Version) are supported.
A Serverless resource group is available and is associated with the target DataWorks workspace.
Limits
Region limits: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), Germany (Frankfurt), and US (Virginia).
Permission limits:
Operator
Required permissions
Alibaba Cloud account
No extra permissions are required.
RAM user/RAM role
DataWorks management permissions: Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess permission, can create computing resources. For more information about how to grant permissions, see Grant the Workspace Administrator permissions to a user.
EMR Serverless Spark service permissions:
The AliyunEMRServerlessSparkFullAccess policy.
The Owner permission on the EMR Serverless Spark workspace. For more information, see Manage users and roles.
Go to the computing resource list page
Log on to the DataWorks console and switch to the destination region. In the navigation pane on the left, choose Management Center. From the drop-down list, select your workspace and click Go To Management Center.
In the navigation pane on the left, click Computing Resources.
Associate a Serverless Spark computing resource
On the computing resources page, you can configure the parameters to associate a Serverless Spark computing resource.
Select a computing resource type to associate.
Click Associate Computing Resource to open the Associate Computing Resource page.
On the Associate Computing Resource page, set the computing resource type to Serverless Spark to open the Associate Serverless Spark Computing Resource configuration page.
Configure the Serverless Spark computing resource.
On the Associate Serverless Spark Computing Resource page, configure the parameters as described in the following table.
Parameter
Description
Spark Workspace
Select the Spark workspace that you want to associate. You can also click Create in the drop-down list to create a Spark workspace.
Role Authorization
To allow DataWorks to obtain information about the EMR Serverless Spark cluster, click Add Service-linked Role As Workspace Administrator the first time you select a Spark workspace.
Important: After you create the service-linked role, do not remove the administrator role of the DataWorks service-linked roles AliyunServiceRoleForDataWorksOnEmr and AliyunServiceRoleForDataworksEngine from the EMR Serverless Spark workspace.
Default Engine Version
Select the engine version to use.
When you create an EMR Spark task in Data Studio, this engine version is used by default.
To set a different engine version for a specific task, define it in the advanced settings of the Spark task editing window.
Default Resource Queue
Select the resource queue to use. You can also click Create in the drop-down list to add a queue.
When you create an EMR Spark task in Data Studio, this resource queue is used by default.
To set different resource queues for different tasks, define them in the advanced settings of the Spark task editing window.
Default SQL Compute
Optional. The default SQL Compute used in EMR Spark SQL node tasks. You can click Create in the drop-down list to create an SQL session.
SQL sessions let you configure runtime resources for each session, which provides task-level resource isolation and flexible scheduling. Assigning different tasks to different SQL sessions improves cluster resource utilization, prevents resource contention, and lets you match resources to the requirements of each task.
To set different SQL Compute resources for different tasks, define them in the advanced settings of the Spark task editing window.
Default Access Identity
Defines the identity used to access the Spark workspace from the current DataWorks workspace.
Development environment: Only the Executor identity is supported.
Production environment: The Alibaba Cloud Account, RAM User, and Task Owner identities are supported.
Computing Resource Instance Name
Identifies the computing resource. At runtime, the instance name is used to select the computing resource for a task.
Click OK to complete the Serverless Spark computing resource configuration.
Configure global Spark parameters
In DataWorks, you can specify Spark parameters for each module at the workspace level. You can also set whether global parameters have a higher priority than local parameters within a specific module, such as Data Studio. After you complete the configuration, the specified Spark parameters are used by default to run tasks. The settings are configured as follows:
Parameter Scope
Configuration Method
Global configuration
You can configure global SPARK parameters for a DataWorks module at the workspace level to run EMR tasks. You can also define whether these global SPARK parameters have a higher priority than the SPARK parameters configured within a specific module. For more information, see Configure global SPARK parameters.
Single node
In Data Studio, you can set specific SPARK properties for a single node task on the node editing page. Other product modules do not support setting SPARK properties separately.
Access control
Only the following roles can configure global Spark parameters:
Alibaba Cloud account.
A RAM user or RAM role with the AliyunDataWorksFullAccess permission.
A RAM user with the Workspace Administrator role.
View global SPARK parameters
Go to the computing resources page and find the Serverless Spark computing resource that you associated.
Click Spark-related Parameter to open the SPARK parameter configuration pane and view the global parameter settings.
Configure global SPARK parameters
Configure global Spark parameters as follows. For more information about how to configure the Spark parameters of a Serverless Spark computing resource, see Job configuration instructions.
Go to the computing resources page and find the Serverless Spark computing resource that you associated.
Click Spark-related Parameter to open the Spark parameter configuration pane and view the global Spark parameter settings.
Set global Spark parameters.
In the upper-right corner of the Spark-related Parameter page, click Edit Spark-related Parameter to configure global SPARK parameters and set their priorities for each module.
Note: These are global settings for the workspace. Before you configure the parameters, confirm that you have selected the correct workspace.
Parameter
Instructions
Spark Property Name
Configure the Spark properties to use when you run Serverless Spark tasks.
Click Add below, enter the Spark Property Name and the corresponding Spark Property Value to set the Spark properties.
For information about the supported Spark properties, see Spark Configuration and List of custom Spark Conf parameters. An illustrative configuration is shown after this procedure.
Global Settings Take Precedence
If you select this option, the global configuration takes precedence over the configurations within product modules, and tasks are run based on the global SPARK properties. For example, if the global configuration sets spark.executor.memory to 4g and a node sets it to 8g, the task runs with 4g when this option is selected.
Global Configuration: The Spark properties configured for the corresponding Serverless Spark computing resource on the Spark-related Parameter page.
Currently, you can set global SPARK parameters only for the Data Studio and Operation Center modules.
Configuration within a product module:
Data Studio: For EMR Spark, EMR Spark SQL, Serverless Spark Batch, Serverless Spark SQL, and Serverless Kyuubi nodes, you can set SPARK properties for a single node task in the Spark Parameters section of the Debugging Configurations or Scheduling tab on the node editing page.
Other product modules: Setting SPARK properties within these modules is not supported.
Click OK to save the global SPARK parameters.
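For reference, a minimal global configuration might look like the following. The property keys are standard Spark settings and the values are illustrative only; confirm the supported keys and suitable values for your workloads in Spark Configuration and List of custom Spark Conf parameters.
spark.executor.memory 4g
spark.executor.cores 2
spark.driver.memory 4g
spark.sql.shuffle.partitions 200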
Configure cluster account mapping
You can manually configure the mapping between the Alibaba Cloud accounts of DataWorks tenant members and the specified identity accounts of the EMR cluster. This allows DataWorks tenant members to run tasks in EMR Serverless Spark using the mapped cluster identities.
This feature is available only for Serverless resource groups. If you purchased a Serverless resource group before August 15, 2025, and want to use this feature, you must submit a ticket to upgrade the resource group.
Go to the computing resources page and find the Serverless Spark computing resource that you associated.
Click Account Mappings to open the Account Mappings configuration pane.
Click Edit Account Mappings and configure the parameters based on the selected Mapping Type.
Account Mapping Type
Task execution description
Configuration description
System account mapping
Uses the cluster account that has the same name as the Default Access Identity in the basic information of the computing resource to run EMR Spark, EMR Spark SQL, EMR Kyuubi, and Notebook node tasks.
By default, same-name mapping is used. To use a different account mapping, you can manually configure a different account.
OpenLDAP account mapping
Uses the Default Access Identity in the basic information of the computing resource to run EMR Spark and EMR Spark SQL tasks.
Uses the OpenLDAP account that is mapped to the default access identity in the basic information of the computing resource to run EMR Kyuubi and Notebook node tasks.
If you have configured and enabled LDAP authentication for Kyuubi Gateway, you must configure the mapping between the Alibaba Cloud Account and the OpenLDAP account (LDAP Account and LDAP Password) to run the tasks.
Important: If the Alibaba Cloud account required to run DataWorks tasks is not in the account mapping list, the tasks may fail to run.
Click Confirm to complete the cluster account mapping configuration.
Configure a Kyuubi connection
To run tasks related to EMR Kyuubi nodes on an EMR Serverless Spark computing resource, you must configure the Kyuubi connection as follows.
This feature is available only for Serverless resource groups. If you purchased a Serverless resource group before August 15, 2025, and want to use this feature, you must submit a ticket to upgrade the resource group.
Prerequisite: You have created a Kyuubi Gateway and a token for the EMR Serverless Spark cluster.
Procedure:
Go to the computing resources page and find the Serverless Spark computing resource that you associated.
Click Kyuubi Configuration to open the Kyuubi Configuration pane.
In the upper-right corner of the Kyuubi Configuration page, click Edit Kyuubi Configuration to configure the cluster's Kyuubi connection.
Obtain the token that you created. For more information, see Manage a Kyuubi Gateway.
Append the token to the end of the JDBC URL parameter: .../;transportMode=http;httpPath=cliservice/token/<token>. If the .../;transportMode=http;httpPath=cliservice/token/ segment does not exist, follow the on-screen prompts to create a Kyuubi Gateway. An illustrative URL is shown after this procedure.
Click OK to complete the configuration.
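For reference, assuming a Kyuubi Gateway that exposes an HTTP endpoint, a complete JDBC URL with the token appended might look like the following. The endpoint, port, and token are placeholders that you must replace with the values from your own Kyuubi Gateway:
jdbc:hive2://<kyuubi-gateway-endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<your-token>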
What to do next
After you configure the Serverless Spark computing resource, you can use it to develop node tasks in Data Studio. For more information, see EMR Spark nodes, EMR Spark SQL node, Serverless Spark Batch node, Serverless Spark SQL nodes, and Serverless Kyuubi nodes.