
DataWorks:Serverless Spark SQL node

Last Updated:Feb 26, 2026

A Serverless Spark SQL node provides a distributed SQL query engine that runs on an EMR Serverless Spark compute resource. You can use this node to process structured data and improve job execution efficiency.

Prerequisites

  • Compute resource requirements: Only an EMR Serverless Spark compute resource is supported. Ensure network connectivity between the resource group and the compute resource.

  • Resource group: Only Serverless resource groups can be used to run this type of task.

  • (Optional) If you are a Resource Access Management (RAM) user, ensure that you have been added to the workspace for task development and have been assigned the Developer or Workspace Administrator role. The Workspace Administrator role has extensive permissions. Grant this role with caution. For more information about adding members, see Add members to a workspace.

    If you use an Alibaba Cloud account, you can skip this step.

Create a node

For more information, see Create a node.

Develop the node

Write your SQL code in the editor. The catalog.database.tablename syntax is supported. If you omit the catalog, the cluster's default catalog is used. If you omit both the catalog and the database, the default database of the default catalog is used.

For more information about catalogs, see Manage data catalogs in EMR Serverless Spark.
-- Replace <catalog.database.tablename> with your actual values
SELECT * FROM <catalog.database.tablename> 
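The three name-resolution forms described above can be sketched with hypothetical catalog, database, and table names (my_catalog, sales_db, and orders are placeholders, not values from your environment):

```sql
-- Fully qualified: the specified catalog and database are used.
SELECT * FROM my_catalog.sales_db.orders;

-- Catalog omitted: the cluster's default catalog is used.
SELECT * FROM sales_db.orders;

-- Catalog and database omitted: the default database of the default catalog is used.
SELECT * FROM orders;
```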

Define variables in your code with the ${variable_name} format and assign their values in the Scheduling Parameters section of the Scheduling Configurations pane. This lets you dynamically pass parameters to scheduled tasks. For more information about how to use scheduling parameters, see Sources and expressions of scheduling parameters. The following code provides an example.

SHOW TABLES; 
-- Define a variable named var by using ${var}. If you assign the value ${yyyymmdd} to this variable, you can create a table with a business date suffix when the scheduled task runs.
CREATE TABLE IF NOT EXISTS userinfo_new_${var} (
  ip STRING COMMENT 'IP address',
  uid STRING COMMENT 'User ID'
) PARTITIONED BY (
  dt STRING
); -- This can be used with scheduling parameters.
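A downstream statement can reuse the same variable to write into the business-date partition. The following sketch is illustrative only; source_userinfo is a hypothetical upstream table:

```sql
-- Write into the partition for the current business date.
-- source_userinfo is a placeholder table name.
INSERT INTO userinfo_new_${var} PARTITION (dt = '${var}')
SELECT ip, uid FROM source_userinfo;
```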
Note

The maximum SQL statement size is 130 KB.

Debug the node

  1. In the Run Configuration pane, configure the following parameters:

    • Compute resource: Select a bound EMR Serverless Spark compute resource. If no compute resource is available, select Create Compute Resource from the drop-down list.

    • Resource group: Select a resource group that is bound to the workspace.

    • Script parameter: If you define variables in the ${parameter_name} format in the node content, you must specify the Parameter Name and Parameter Value in the Script Parameter section. The variables are dynamically replaced with their actual values at runtime. For more information, see Sources and expressions of scheduling parameters.

    • Serverless Spark node parameters: The runtime parameters for the Spark application. For the supported parameters, see Appendix: DataWorks parameters.

  2. In the toolbar at the top of the node editor, click Run to run the SQL task.

    Important

    Before you deploy the node, you must copy the settings from the Run Configuration pane to the Serverless Spark Node Parameters section in the Scheduling Configurations pane.

Next steps

  • Schedule a node: If a node in the project folder needs to run periodically, you can set the Scheduling Policies and configure scheduling properties in the Scheduling section on the right side of the node page.

  • Publish a node: If the task needs to run in the production environment, click the Publish icon to publish the task. A node in the project folder runs on a schedule only after it is published to the production environment.

  • Node O&M: After you publish the task, you can view the status of the auto triggered task in the Operation Center. For more information, see Get started with Operation Center.

Appendix: DataWorks parameters


FLOW_SKIP_SQL_ANALYZE

Specifies how SQL statements are executed. Valid values:

  • true: Executes multiple SQL statements at a time.

  • false (default): Executes one SQL statement at a time.

Note

This parameter is applicable only to test runs in the Data Development environment.
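As an illustration, assume a node contains the following script (the table names are placeholders). With FLOW_SKip_SQL_ANALYZE set to true, the three statements are submitted as one batch; with the default value false, they are executed one at a time:

```sql
-- Placeholder tables for illustration only.
CREATE TABLE IF NOT EXISTS daily_stats (cnt BIGINT);
INSERT INTO daily_stats SELECT COUNT(*) FROM userinfo_new_${var};
SELECT cnt FROM daily_stats;
```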

DATAWORKS_SESSION_DISABLE

Specifies the job submission method. When you run a job in Data Development, the job is submitted to SQL Compute by default. You can use this parameter to specify whether to submit the job to SQL Compute or a resource queue.

  • true: The job is submitted to a resource queue. By default, the job uses the default queue that was specified when the compute resource was bound. If you set DATAWORKS_SESSION_DISABLE to true, you can also configure the SERVERLESS_QUEUE_NAME parameter to specify a different queue for job submission during development and execution.

  • false (default): The job is submitted to SQL Compute.

    Note

    This parameter takes effect only when you run jobs in Data Development. It does not take effect for scheduled runs.
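For example, to submit a Data Development test run to a specific resource queue instead of SQL Compute, you could set both parameters together in the Serverless Spark node parameters section. This is only a sketch; dev_queue is a hypothetical queue name:

```
DATAWORKS_SESSION_DISABLE=true
SERVERLESS_QUEUE_NAME=dev_queue
```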

SERVERLESS_RELEASE_VERSION

Specifies the Spark engine version. By default, the job uses the default engine version configured for the cluster in the Compute Engines section of the Management Center. Use this parameter to specify a different engine version for a specific job.

Note

The SERVERLESS_RELEASE_VERSION parameter in the advanced settings takes effect only when the SQL Compute (session) specified for the registered cluster is not started in the EMR Serverless Spark console.

SERVERLESS_QUEUE_NAME

Specifies the resource queue for job submission. By default, jobs are sent to the default resource queue configured for the cluster in the Cluster Management section of the Management Center. If you have resource isolation and management requirements, you can add queues and use this parameter to select a different queue. For more information, see Manage resource queues.

Note

The SERVERLESS_QUEUE_NAME parameter in the advanced settings takes effect only when the SQL Compute (session) specified for the registered cluster is not started in the EMR Serverless Spark console.

Configuration methods:

  • When running a job in Data Development: You must set DATAWORKS_SESSION_DISABLE to true to submit the job to a queue. The SERVERLESS_QUEUE_NAME parameter takes effect only in this scenario.

  • When running a scheduled job from the Operation Center: The job is always submitted to a queue and cannot be submitted to SQL Compute.

SERVERLESS_SQL_COMPUTE

Specifies the SQL Compute (SQL session). By default, the default SQL Compute instance configured for the cluster in the Compute Engines section of the Management Center is used. If you need to set different SQL sessions for different jobs, you can configure this parameter. For more information about how to create and manage SQL sessions, see Manage SQL sessions.

Others

Custom Spark Configuration parameters. You can add Spark-specific properties.

Use the following format: spark.eventLog.enabled : false. DataWorks automatically appends the parameters to the code submitted to the EMR cluster in the --conf key=value format.
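For example, if you enter the following properties in the Others section (the values are illustrative; the lines starting with # are annotations, not part of the input), DataWorks appends each one to the submitted job as a --conf option:

```
# Entered in the Others section (illustrative values):
spark.eventLog.enabled : false
spark.sql.shuffle.partitions : 200

# Appended by DataWorks to the submitted code as:
--conf spark.eventLog.enabled=false
--conf spark.sql.shuffle.partitions=200
```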

Note

DataWorks allows you to set global Spark parameters at the workspace level. These parameters are applied to all DataWorks modules. You can specify whether these global parameters take priority over module-specific Spark parameters. For more information about how to set global Spark parameters, see Configure global Spark parameters.