
DataWorks:Serverless Spark batch node

Last Updated:Mar 27, 2026

Serverless Spark Batch nodes let you develop and schedule Spark batch jobs on E-MapReduce (EMR) Serverless Spark clusters directly from DataWorks. You compile your Spark code into a JAR package, reference it in the node, and DataWorks handles the periodic scheduling.
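For reference, the JAR package you schedule is an ordinary compiled Spark application. The following is a minimal sketch of such a main class, modeled on the org.apache.spark.examples.SparkPi example used later in this topic. The package and object names are placeholders, and the sketch assumes Spark 2.4 with Scala 2.11 to match the example JAR name spark-examples_2.11-2.4.0.jar:

package com.example.spark

import scala.math.random

import org.apache.spark.sql.SparkSession

// Placeholder main class: a Monte Carlo estimate of pi, in the spirit of
// org.apache.spark.examples.SparkPi. Compile and package it into a JAR,
// then pass its fully qualified name to --class.
object SparkPiSketch {
  def main(args: Array[String]): Unit = {
    // Do not set a master URL here: EMR Serverless Spark runs the job in
    // cluster mode and supplies the cluster configuration itself.
    val spark = SparkSession.builder().appName("SparkPiSketch").getOrCreate()
    val slices = if (args.nonEmpty) args(0).toInt else 2  // e.g. the trailing "100"
    val n = 100000L * slices
    val hits = spark.sparkContext.parallelize(1L to n, slices).map { _ =>
      val x = random * 2 - 1  // random point in the unit square
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0  // inside the unit circle?
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * hits / n}")
    spark.stop()
  }
}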

Prerequisites

Before you begin, make sure you have:

  • A DataWorks workspace to which an EMR Serverless Spark compute resource is bound.

  • A resource group bound to the workspace for running the node.

Note: The Workspace Administrator role has extensive permissions. Grant it with caution.

Create a node

See Create a node.

Develop a node

Choose how to provide the JAR package based on its size and storage location:

  • JAR package smaller than 500 MB and stored locally: use Option 1, upload it as an EMR JAR resource.

  • JAR package 500 MB or larger, or already stored in OSS: use Option 2, reference it directly from OSS.

Option 1: Upload and reference an EMR JAR resource

Upload a JAR package from your local machine to DataWorks as an EMR JAR resource, then reference it in the node.

Step 1: Create an EMR JAR resource

  1. In the navigation pane, click the Resource Management icon to open the Resource Management page.

  2. Select Create Resource > EMR JAR and enter the name spark-examples_2.11-2.4.0.jar.

  3. Click Upload to upload spark-examples_2.11-2.4.0.jar.

  4. Select a Storage Path, Data Source, and Resource Group.

    Important

    For Data Source, select the bound Serverless Spark cluster.

  5. Click Save.

Step 2: Reference the EMR JAR resource

  1. Open the code editor for the Serverless Spark Batch node.

  2. In the navigation pane, expand Resource Management. Right-click the resource and select Reference Resource. A reference statement is automatically added to the code editor:

    ##@resource_reference{"spark-examples_2.11-2.4.0.jar"}
    spark-examples_2.11-2.4.0.jar
  3. Replace the default code with a spark-submit command:

    ##@resource_reference{"spark-examples_2.11-2.4.0.jar"}
    spark-submit --class org.apache.spark.examples.SparkPi spark-examples_2.11-2.4.0.jar 100

    Parameter description:

    • --class: The main class of the task in the JAR package. In this example: org.apache.spark.examples.SparkPi.

    For all supported parameters, see Submit a task using spark-submit.

Important
  • The code editor does not support comments. Do not add comment statements; they cause a runtime error.

  • Do not specify the deploy-mode parameter. EMR Serverless Spark supports cluster mode only.

Option 2: Reference directly from OSS

Reference a JAR package stored in OSS. DataWorks downloads it automatically when the node runs. Use this method when the JAR has dependencies or when your Spark task depends on scripts.

Step 1: Upload the JAR to OSS

  1. Log on to the OSS console and click Buckets in the navigation pane.

  2. Click the target bucket to open the file management page.

  3. Click Create Directory to create a folder for the JAR package.

  4. Go to the folder and upload the JAR package. This topic uses SparkWorkOSS-1.0-SNAPSHOT-jar-with-dependencies.jar as an example.

Step 2: Reference the JAR in the node

In the code editor for the Serverless Spark Batch node, add a spark-submit command that points to the OSS path:

spark-submit --class com.aliyun.emr.example.spark.SparkMaxComputeDemo oss://mybucket/emr/SparkWorkOSS-1.0-SNAPSHOT-jar-with-dependencies.jar
Important

Replace mybucket and emr with your actual bucket name and folder path.

Parameter description:

  • --class: The full name of the main class to run.

  • OSS file path: In the format oss://{bucket}/{object}, where bucket is the globally unique name of a storage container in OSS (view all buckets in the OSS console) and object is the file path within the bucket.

For all supported parameters, see Submit a task using spark-submit.
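If the job itself also reads data from OSS, the main class can address input with the same oss://{bucket}/{object} scheme. The following is a minimal sketch, not the SparkMaxComputeDemo class from the command above; all names are hypothetical, and it assumes the Serverless Spark workspace already has access to the bucket:

package com.example.spark

import org.apache.spark.sql.SparkSession

// Hypothetical word-count job that takes its input path (for example an
// oss://... URI) as the first application argument.
object OssWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OssWordCount").getOrCreate()
    val input = args(0)  // e.g. oss://{bucket}/{object}
    val counts = spark.read.textFile(input)  // Dataset[String] with column "value"
      .selectExpr("explode(split(value, ' ')) AS word")
      .groupBy("word")
      .count()
    counts.show(20, truncate = false)
    spark.stop()
  }
}

You would submit it with a command of the same shape as the one above, for example spark-submit --class com.example.spark.OssWordCount oss://mybucket/emr/oss-wordcount.jar oss://mybucket/emr/input.txt (paths hypothetical).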

Debug the node

  1. In the Run Configuration section, configure the following parameters:

    Note: Global Spark parameters configured at the workspace level apply across all modules and can be set to take precedence over module-level settings. See Configure global Spark parameters.

    • Computing Resource: Select a bound EMR Serverless Spark compute resource. If none is available, select Create Computing Resource from the drop-down list.

    • Resource Group: Select a resource group bound to the workspace.

    • Script Parameters: Define variables in ${ParameterName} format in the node code, then specify the Parameter Name and Parameter Value here. DataWorks replaces the variables with actual values at runtime. See Sources and expressions of scheduling parameters.

    • ServerlessSpark Node Parameters: Runtime parameters for the Spark program. Two categories are supported: DataWorks custom parameters (see Appendix: DataWorks parameters) and Spark built-in properties (open-source Spark properties and custom Spark Conf parameters). Format: spark.eventLog.enabled : false. DataWorks passes them to Serverless Spark as --conf key=value. A sketch after this procedure shows how both kinds of parameters surface inside the program.
  2. On the toolbar, click Run.

Important

Before publishing, sync ServerlessSpark Node Parameters from Run Configuration to the ServerlessSpark Node Parameters field under Scheduling.
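To see how the two kinds of parameters reach the program: Script Parameters are substituted into the node code as plain text before submission, while ServerlessSpark Node Parameters arrive as --conf key=value and are readable through the Spark runtime configuration. A minimal sketch with hypothetical names (the DailyJob class and the bizdate parameter are not part of this topic's examples):

package com.example.spark

import org.apache.spark.sql.SparkSession

// Assumes node code such as:
//   spark-submit --class com.example.spark.DailyJob my-job.jar ${bizdate}
// where ${bizdate} is a Script Parameter that DataWorks replaces at runtime.
object DailyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DailyJob").getOrCreate()
    val bizdate = args(0)  // the substituted Script Parameter value
    // A node parameter such as "spark.eventLog.enabled : false" is passed as
    // --conf spark.eventLog.enabled=false and is visible in the runtime config.
    val eventLog = spark.conf.get("spark.eventLog.enabled", "unset")
    println(s"bizdate=$bizdate, spark.eventLog.enabled=$eventLog")
    spark.stop()
  }
}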

What's next

After development, the typical workflow is: configure scheduling → publish to production → monitor in Operation Center.

  • Schedule the node: Set scheduling policies and configure scheduling properties in the Scheduling section on the node page. See Schedule a node.

  • Publish the node: Click the publish icon to publish the node. A node runs on a schedule only after it is published to the production environment. See Publish a node.

  • Monitor the node: After publishing, track auto-triggered runs in Operation Center. See Get started with Operation Center.

References

Appendix: DataWorks parameters

Parameter description:

  • SERVERLESS_QUEUE_NAME: The resource queue to which the task is submitted. By default, tasks go to the Default Resource Queue configured in Cluster Management under Management Center. Add queues for resource isolation; see Manage resource queues. Set this parameter in the node parameters or in the global Spark parameters.