To develop and manage EMR Serverless Spark tasks in DataWorks, you must associate your EMR Serverless Spark workspace with DataWorks as a serverless Spark computing resource. Once associated, you can use this computing resource for data development in DataWorks.
Prerequisites
You have created a DataWorks workspace. Your RAM user must have the Workspace Administrator role in the workspace.
Important: Only workspaces where Use Data Studio (New Version) is selected are supported.
You have a serverless resource group that is associated with the target DataWorks workspace.
Limitations
Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).
Permissions:
User/role | Required permissions |
Alibaba Cloud account | No additional permissions are required. |
RAM user or RAM role | DataWorks management permissions: Only workspace members with the O&M or Workspace Administrator role, or members with the AliyunDataWorksFullAccess policy, can create computing resources. For more information about authorization, see Grant a user the Workspace Administrator role. EMR Serverless Spark service permissions: the AliyunEMRServerlessSparkFullAccess policy; the Owner permission for the EMR Serverless Spark workspace. For more information, see Manage users and roles. |
Go to the computing resource list page
- Log on to the DataWorks console. In the navigation pane on the left, switch to the target region and click . Select your workspace from the drop-down list and click Go to Management Center.
- In the navigation pane on the left, click Computing Resources to open the computing resource list page.
Associate a serverless Spark resource
On the Computing Resources page, configure and associate a serverless Spark computing resource.
Select the type of computing resource to associate.
Click Associate Computing Resources to open the Associate Computing Resources page.
On the Associate Computing Resources page, select Serverless Spark as the computing resource type. This opens the Associate EMR Serverless Spark Computing Resource configuration page.
Configure the serverless Spark computing resource.
On the Associate EMR Serverless Spark Computing Resource configuration page, configure the parameters as described in the following table.
Parameter
Description
EMR Serverless Spark Workspace
Select the Spark workspace that you want to associate. You can also click Create in the drop-down list to create a Spark workspace.
Default Engine Version
Select the engine version that you want to use.
When you create an EMR Spark task in Data Studio, this engine version is used by default.
To use different engine versions for specific tasks, you can configure them in the advanced settings of the Spark task editor.
Default Resource Queue
Select the resource queue that you want to use. You can also click Create in the drop-down list to add a queue.
When you create an EMR Spark task in Data Studio, this resource queue is used by default.
To use different resource queues for specific tasks, you can configure them in the advanced settings of the Spark task editor.
Default Kyuubi Gateway
Optional. The Kyuubi Gateway configuration affects how the following tasks run:
If you configure a Kyuubi Gateway:
All related tasks (EMR Spark SQL/Kyuubi, Serverless Spark SQL/Kyuubi) run through the Kyuubi Gateway.
If you do not configure a Kyuubi Gateway:
EMR Spark SQL and Serverless Spark SQL tasks run using spark-submit.
EMR Kyuubi and Serverless Kyuubi tasks fail to run.
To configure a gateway, go to to create a Kyuubi Gateway and a token.
If Kerberos is not enabled: Click the name of the Kyuubi Gateway to obtain the JDBC URL and token. Then, concatenate them to form the complete URL.
If Kerberos is enabled: Obtain the Beeline URL based on the configured Kerberos information. For more information, see Use Kerberos with Kyuubi Gateway.
# Example of a regular URL
jdbc:hive2://kyuubi-cn-hangzhou-internal.spark.emr.aliyuncs.com:80/;transportMode=http;httpPath=cliservice/token/<token>
# Example of a URL for a Kerberos-enabled cluster (make sure to include the principal of the Kyuubi service)
jdbc:hive2://ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com:10009/;principal=kyuubi/_HOST@EMR.C-DFD43*****7C204.COM
Default Access Identity
Specifies the identity used to access the EMR Serverless Spark workspace.
Development environment: Only the Executor identity is supported.
Production environment: You can use Alibaba Cloud Account, Alibaba Cloud RAM Sub-account, or Task Owner.
Computing Resource Instance Name
Identifies the computing resource. At runtime, the instance name is used to select the computing resource for a task.
Click Confirm to complete the configuration of the serverless Spark computing resource.
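The Default Kyuubi Gateway rules above can be summarized in a short Python sketch. It is illustrative only and not part of DataWorks: the function names `execution_path` and `build_kyuubi_url` are hypothetical, and the URL helper assumes that the JDBC URL copied from the Kyuubi Gateway page ends in `httpPath=cliservice/token/`, as the regular URL example suggests.

```python
# Hypothetical helpers that mirror the Default Kyuubi Gateway description.
# Task type names are taken from this topic; nothing here calls a real API.

SPARK_SQL_TASKS = {"EMR Spark SQL", "Serverless Spark SQL"}
KYUUBI_TASKS = {"EMR Kyuubi", "Serverless Kyuubi"}


def execution_path(task_type: str, gateway_configured: bool) -> str:
    """Return how a task runs: 'kyuubi-gateway', 'spark-submit', or 'fail'."""
    if task_type not in SPARK_SQL_TASKS | KYUUBI_TASKS:
        raise ValueError(f"unsupported task type: {task_type}")
    if gateway_configured:
        # With a gateway, all related tasks run through the Kyuubi Gateway.
        return "kyuubi-gateway"
    if task_type in SPARK_SQL_TASKS:
        # Without a gateway, Spark SQL tasks fall back to spark-submit.
        return "spark-submit"
    # Without a gateway, Kyuubi tasks fail to run.
    return "fail"


def build_kyuubi_url(jdbc_url: str, token: str) -> str:
    """Concatenate the gateway JDBC URL and the token (Kerberos not enabled).

    Assumption: the JDBC URL ends with the token path, with or without a
    trailing slash, so the token is appended as the last path segment.
    """
    if not jdbc_url.startswith("jdbc:hive2://"):
        raise ValueError("expected a jdbc:hive2:// URL")
    if not token:
        raise ValueError("token must not be empty")
    return jdbc_url.rstrip("/") + "/" + token
```

For example, `execution_path("Serverless Kyuubi", False)` returns `"fail"`, which matches the rule that Kyuubi tasks cannot run without a gateway.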
Configure global Spark parameters
In DataWorks, you can specify Spark parameters for each module at the workspace level and configure whether global parameters take precedence over local parameters within a specific module, such as Data Studio. After the configuration, tasks use the corresponding Spark parameters by default. The following table describes how to configure the parameters.
Scope | Configuration method |
Global | You can configure global Spark parameters for a DataWorks functional module at the workspace level when running EMR tasks. You can also define whether these global Spark parameters take precedence over the Spark parameters configured within a specific module. For more information, see Configure global Spark parameters. |
Node-specific | In Data Studio, you can set specific Spark properties for a single node task on the node editing page. Other product modules do not support setting Spark properties separately within the module. |
Permissions
Only the following roles can configure global Spark parameters:
An Alibaba Cloud account.
A RAM user or RAM role with the AliyunDataWorksFullAccess policy.
A RAM user with the Workspace Administrator role.
Configure global Spark parameters
You can configure global Spark parameters by following these steps. For more information about the Spark parameters that you can configure for a serverless Spark computing resource, see Job configuration.
Go to the Computing Resources page and find the serverless Spark computing resource that you have associated.
Click Spark parameters to open the Spark parameters configuration pane where you can view the global Spark parameter configurations.
Set global Spark parameters.
In the upper-right corner of the Spark parameters page, click Edit Spark parameters to configure global Spark parameters and their priorities for each module.
Note: This is a workspace-level configuration. Before you proceed, make sure that you have selected the correct workspace.
Parameter
Procedure
Spark property
Configure the Spark properties used when running Serverless Spark tasks.
Click Add, and then enter a Spark Property Name and its corresponding Spark Property Value to set the Spark property.
For more information about the supported Spark properties, see Spark Configuration and List of custom Spark Conf parameters.
Global Settings Take Precedence
If you select this option, global configurations override those in product modules. Tasks are then run based on the globally configured Spark properties.
Global configuration: The Spark properties configured on the Spark parameters page for the corresponding serverless Spark computing resource in .
Currently, you can set global Spark parameters only for Data Studio, Operation Center, and Data Analysis modules.
Configuration within a product module:
Data Studio: For EMR Spark, EMR Kyuubi, EMR Spark SQL, EMR Spark Streaming, Serverless Spark Batch, Serverless Spark SQL, and Serverless Kyuubi nodes, you can set Spark properties for a single node task in the Spark Parameters section of the Run Configuration or Scheduling Settings tab on the node editing page.
Other product modules: Setting Spark properties separately within these modules is not supported.
Click Confirm to save the configured global Spark parameters.
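The effect of Global Settings Take Precedence can be pictured as a merge of two property maps. The following Python sketch is an illustration under stated assumptions, not the DataWorks implementation: `effective_spark_conf` is a hypothetical name, the merge is assumed to happen property by property, and node-level values are assumed to win when the option is not selected (this topic only states the behavior when the option is selected).

```python
from typing import Dict


def effective_spark_conf(
    global_conf: Dict[str, str],
    node_conf: Dict[str, str],
    global_takes_precedence: bool,
) -> Dict[str, str]:
    """Merge global and node-level Spark properties into one map."""
    if global_takes_precedence:
        # Global values override values set within the product module.
        return {**node_conf, **global_conf}
    # Assumption: otherwise the node-level values override global ones.
    return {**global_conf, **node_conf}


# Example with standard Spark property names.
global_conf = {"spark.executor.memory": "4g", "spark.executor.cores": "2"}
node_conf = {"spark.executor.memory": "8g"}

with_global_first = effective_spark_conf(global_conf, node_conf, True)
with_node_first = effective_spark_conf(global_conf, node_conf, False)
```

In this example, `with_global_first` keeps `spark.executor.memory` at `4g`, while `with_node_first` uses the node-level value `8g`. Both results keep `spark.executor.cores` at `2` because only the global configuration sets it.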
Configure cluster account mapping
Manually map the cloud accounts of DataWorks members to specified identities in an EMR cluster. This allows members to run EMR Serverless Spark tasks using the mapped cluster identities.
This feature is available only in a serverless resource group. If you purchased a serverless resource group before August 15, 2025 and want to use this feature, you need to submit a ticket to upgrade the resource group.
Go to the Computing Resources page and find the serverless Spark computing resource that you have associated.
Click Account Mappings to go to the Account Mappings parameter configuration pane.
Click Edit Account Mapping to configure the cluster account mapping. You can configure the parameters based on the selected Mapping Type.
Mapping Type | Runtime description | Configuration details |
System account mapping | Runs EMR Spark, EMR Spark SQL, EMR Kyuubi, and Notebook node tasks by using the cluster account with the same name as the Default Access Identity in the basic information of the computing resource. | By default, same-name mapping is used. If you need to use a different account mapping, you can manually configure one. |
OpenLDAP account mapping | Runs EMR Spark and EMR Spark SQL tasks by using the Default Access Identity in the basic information of the computing resource. Runs EMR Kyuubi and Notebook node tasks by using the OpenLDAP account mapped to the Default Access Identity in the basic information of the computing resource. | If you have configured and enabled LDAP authentication for Kyuubi Gateway, you must configure mappings between Account and OpenLDAP accounts (LDAP Account, LDAP Password) to run the corresponding tasks. Important: If the cloud account required to run a DataWorks task is not in the account mapping list, the task may fail. |
Kerberos account mapping | Runs EMR Spark and EMR Spark SQL tasks by using the Default Access Identity in the basic information of the computing resource. Runs EMR Kyuubi node tasks by using the Kerberos account mapped to the Default Access Identity in the basic information of the computing resource. | You must upload the krb5.conf file of the Kerberos service configured for the EMR Serverless Spark cluster. Configure the principal and keytab required for Kerberos authentication for the cloud account specified as the Default Access Identity. |
Click OK to complete the cluster account mapping configuration.
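As a rough mental model of the lookup behind these mapping types, consider the following Python sketch. It is hypothetical: the function `resolve_cluster_account`, the mapping type strings, and the account names are invented for illustration, and the real resolution logic in DataWorks may differ.

```python
from typing import Dict, Optional


def resolve_cluster_account(
    mapping_type: str,
    cloud_account: str,
    mappings: Dict[str, str],
) -> Optional[str]:
    """Return the cluster account for a cloud account, or None if unmapped.

    mapping_type is one of 'system', 'openldap', or 'kerberos'
    (hypothetical identifiers for the three mapping types in this topic).
    """
    if mapping_type not in {"system", "openldap", "kerberos"}:
        raise ValueError(f"unknown mapping type: {mapping_type}")
    mapped = mappings.get(cloud_account)
    if mapped is not None:
        # A manually configured mapping is used when present.
        return mapped
    if mapping_type == "system":
        # System account mapping defaults to the same-name cluster account.
        return cloud_account
    # OpenLDAP or Kerberos without a mapping entry: the task may fail.
    return None


# Hypothetical account names.
ldap_mappings = {"dataworks_dev": "ldap_dev"}
same_name = resolve_cluster_account("system", "dataworks_dev", {})
ldap_account = resolve_cluster_account("openldap", "dataworks_dev", ldap_mappings)
missing = resolve_cluster_account("openldap", "dataworks_ops", ldap_mappings)
```

Here `missing` is `None`, which corresponds to the case where the cloud account is not in the account mapping list and the task may fail.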
Next steps
After you configure the serverless Spark computing resource, you can use this computing resource to develop node tasks in Data Studio. For more information, see EMR Spark node, EMR Spark SQL node, EMR Spark Streaming node, EMR Kyuubi node, Serverless Spark Batch node, Serverless Spark SQL node, and Serverless Kyuubi node.