
DataWorks: ADB Spark SQL node

Last Updated: Apr 21, 2026

The ADB Spark SQL node in DataWorks lets you develop AnalyticDB Spark SQL tasks, schedule them to run periodically, and integrate them with other tasks. This topic describes how to develop, debug, and schedule a task by using an ADB Spark SQL node.

Background information

Data developers can submit Spark SQL statements for data analysis directly in the AnalyticDB for MySQL console. Alternatively, after you add an AnalyticDB for MySQL Serverless Spark instance to DataWorks as a computing resource, you can use the ADB Spark SQL node to run Spark SQL tasks in DataWorks. For more information, see Introduction to Spark SQL development.

Prerequisites

Prerequisites for AnalyticDB for MySQL:

  • You have created a Basic Edition AnalyticDB for MySQL cluster in the same region as your DataWorks workspace. For more information, see Create a cluster.

  • You have created an interactive resource group whose engine type is Spark in your AnalyticDB for MySQL cluster. DataWorks runs Spark SQL tasks on this resource group. For more information, see Create an interactive resource group.

  • If the ADB Spark SQL node accesses Object Storage Service (OSS), the OSS bucket must be in the same region as your AnalyticDB for MySQL cluster.
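The AnalyticDB for MySQL prerequisites are mostly region and engine-type constraints. The following Python sketch encodes them as a preflight checklist that you could run against your own inventory of resources. It is not a DataWorks or AnalyticDB API: the `ClusterInfo` and `ResourceGroupInfo` classes, their fields, the `bucket_regions` mapping, and the region IDs in the example are hypothetical stand-ins for values you would look up in the consoles. The bucket name is taken from an `oss://bucket/prefix` path, such as the location used when you create the external database.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ResourceGroupInfo:
    # Hypothetical description of an AnalyticDB for MySQL resource group.
    name: str
    group_type: str  # for example "interactive"
    engine: str      # for example "Spark"


@dataclass
class ClusterInfo:
    # Hypothetical description of an AnalyticDB for MySQL cluster.
    region: str
    resource_groups: list = field(default_factory=list)


def oss_bucket(path: str) -> str:
    """Return the bucket name from an oss://bucket/prefix path."""
    parsed = urlparse(path)
    if parsed.scheme != "oss" or not parsed.netloc:
        raise ValueError(f"not an OSS path: {path!r}")
    return parsed.netloc


def check_adb_prerequisites(cluster, workspace_region, oss_paths, bucket_regions):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The cluster must be in the same region as the DataWorks workspace.
    if cluster.region != workspace_region:
        problems.append(
            f"cluster region {cluster.region} differs from workspace region {workspace_region}"
        )
    # At least one interactive resource group must use the Spark engine.
    if not any(
        rg.group_type == "interactive" and rg.engine == "Spark"
        for rg in cluster.resource_groups
    ):
        problems.append("no interactive resource group with engine type Spark")
    # Every OSS bucket that the node uses must be in the cluster's region.
    for path in oss_paths:
        bucket = oss_bucket(path)
        region = bucket_regions.get(bucket)
        if region != cluster.region:
            problems.append(f"bucket {bucket} is in region {region}, not {cluster.region}")
    return problems


if __name__ == "__main__":
    cluster = ClusterInfo("cn-hangzhou", [ResourceGroupInfo("spark_rg", "interactive", "Spark")])
    print(check_adb_prerequisites(
        cluster, "cn-hangzhou", ["oss://dw-1127/db_home"], {"dw-1127": "cn-hangzhou"}
    ))
    # prints []
```

Returning a list of problems instead of stopping at the first failure lets you fix every mismatch in one pass.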

Prerequisites for DataWorks:

  • You have a workspace with Use Data Studio (New Version) enabled and a resource group bound to it. For more information, see Create a workspace.

  • The resource group is deployed in the same virtual private cloud (VPC) as the AnalyticDB for MySQL cluster, and you have added the IP addresses of the resource group to the whitelist of the AnalyticDB for MySQL cluster. For more information, see Configure a whitelist.

  • You have added the AnalyticDB for MySQL cluster to DataWorks as an AnalyticDB for Spark computing resource and tested its connectivity with the resource group. For more information, see Bind a computing resource.

  • You have created an ADB Spark SQL node. For more information, see Create a node for a scheduling workflow.
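The whitelist prerequisite can be checked locally: every IP address of the DataWorks resource group must be covered by some entry in the AnalyticDB for MySQL whitelist. The following Python sketch uses the standard `ipaddress` module and assumes that each whitelist entry is either a single IP address or a CIDR block. The function name and the sample addresses are made up for illustration; substitute the IP addresses of your own resource group and the entries configured on your cluster.

```python
import ipaddress


def uncovered_ips(resource_group_ips, whitelist):
    """Return the resource group IPs that no whitelist entry covers.

    Assumes each whitelist entry is an IP address or a CIDR block,
    for example "10.0.5.17" or "192.168.0.0/24".
    """
    # strict=False accepts entries such as "192.168.0.1/24" that have host bits set;
    # a bare address becomes a single-address network (/32 for IPv4).
    networks = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for ip in resource_group_ips:
        addr = ipaddress.ip_address(ip)
        # An address is covered if it lies inside any whitelisted network.
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    whitelist = ["192.168.0.0/24", "10.0.5.17"]
    print(uncovered_ips(["192.168.0.42", "10.0.5.17", "10.0.6.1"], whitelist))
    # prints ['10.0.6.1']
```

Any address that this check reports must be added to the whitelist before the connectivity test can succeed.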

Step 1: Develop the ADB Spark SQL node

  1. Create an external database.

    In the node editor, write your SQL code. The following example creates an external database whose data is stored in an OSS path. To create an internal table instead, see Create an internal table by using Spark SQL.

    -- Create an external database whose data is stored under the specified OSS path.
    CREATE DATABASE IF NOT EXISTS `adb_spark_db` LOCATION 'oss://dw-1127/db_home';
  2. Write the task code.

    In the SQL editor, write the task code. You can define variables in the ${variable_name} format and assign values to them in the Scheduling Parameters section of the Scheduling Settings pane on the right. At each scheduled run, the variables are replaced with their assigned values, so the same code can process different data on each run. For more information about scheduling parameters, see Sources and expressions of scheduling parameters. The following code provides an example.

    -- Source table. ${var} is replaced with the value assigned in Scheduling Parameters.
    CREATE TABLE IF NOT EXISTS adb_spark_db.tb_order_${var} (id INT, name STRING, age INT)
    USING parquet
    LOCATION 'oss://dw-1127/db_home/tb1'
    TBLPROPERTIES ('parquet.compression'='SNAPPY');
    
    -- Result table with the same schema.
    CREATE TABLE IF NOT EXISTS adb_spark_db.tb_order_result_${var} (id INT, name STRING, age INT)
    USING parquet
    LOCATION 'oss://dw-1127/db_home/tb2'
    TBLPROPERTIES ('parquet.compression'='SNAPPY');
    
    -- Copy all rows from the source table into the result table.
    INSERT INTO adb_spark_db.tb_order_result_${var} SELECT * FROM adb_spark_db.tb_order_${var};
    Note

    For example, you can assign $[yyyymmdd] to var. ${var} then resolves to the date of the scheduled run in yyyymmdd format, which lets you process daily incremental data in batches.
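To see what the scheduled SQL looks like after substitution, the following Python sketch mimics how ${var} is replaced when var is assigned $[yyyymmdd]. It is an illustration, not DataWorks code. It assumes that $[yyyymmdd] resolves to the scheduled run date, supports only that expression plus an optional day offset such as $[yyyymmdd-1], and treats any other assigned value as a literal string. The function names are hypothetical; see Sources and expressions of scheduling parameters for the expressions that DataWorks actually supports.

```python
import re
from datetime import datetime, timedelta

# $[yyyymmdd] or $[yyyymmdd-N], where N is a number of days to subtract.
_DATE_EXPR = re.compile(r"\$\[yyyymmdd(?:-(\d+))?\]")
# A ${name} reference in the SQL text.
_VAR_REF = re.compile(r"\$\{(\w+)\}")


def resolve_value(value: str, scheduled_time: datetime) -> str:
    """Resolve one assigned value; anything that is not a date expression is kept literally."""
    match = _DATE_EXPR.fullmatch(value)
    if match is None:
        return value
    offset_days = int(match.group(1) or 0)
    return (scheduled_time - timedelta(days=offset_days)).strftime("%Y%m%d")


def render_sql(sql: str, params: dict, scheduled_time: datetime) -> str:
    """Replace every ${name} in sql with its resolved value; fail on unassigned names."""
    values = {name: resolve_value(v, scheduled_time) for name, v in params.items()}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value assigned to ${{{name}}}")
        return values[name]

    return _VAR_REF.sub(substitute, sql)


if __name__ == "__main__":
    sql = "INSERT INTO adb_spark_db.tb_order_result_${var} SELECT * FROM adb_spark_db.tb_order_${var};"
    print(render_sql(sql, {"var": "$[yyyymmdd]"}, datetime(2026, 4, 21, 0, 30)))
    # prints INSERT INTO adb_spark_db.tb_order_result_20260421 SELECT * FROM adb_spark_db.tb_order_20260421;
```

Failing on an unassigned ${name}, instead of leaving it in place, surfaces a missing scheduling parameter before any SQL reaches the engine.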

Step 2: Debug the ADB Spark SQL node

  1. Configure run properties for the ADB Spark SQL node.

    In the Run Configuration pane on the right, configure the following parameters.

    Compute Resource parameters:

      • Compute Resource: Select the AnalyticDB for Spark computing resource that you bound to the workspace.

      • ADB Computing Resource Group: Select the interactive resource group that you created in the AnalyticDB for MySQL cluster. The engine type of the interactive resource group must be Spark. For more information, see Create an interactive resource group.

    Resource Group parameters:

      • Resource Group: Select the resource group that passed the connectivity test when you bound the AnalyticDB for Spark computing resource.

      • Compute CU: The node uses the default number of compute units (CUs). You do not need to change this value.

  2. Debug and run the ADB Spark SQL node.

    To run the task, click Save and then Run.
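The run configuration ties together the resources from the prerequisites. As a hedged illustration, the following Python sketch models the four Run Configuration fields and checks the rules that the parameter descriptions state: the computing resource must be bound to the workspace, the ADB computing resource group must be an interactive group with the Spark engine, and the resource group must be one that passed the connectivity test. The `RunConfig` class, its field names, and the resource names in the example are hypothetical; this is not a DataWorks API, and in practice you make these selections in the pane.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfig:
    # Hypothetical mirror of the Run Configuration pane.
    compute_resource: str
    adb_resource_group: str
    resource_group: str
    compute_cu: Optional[float] = None  # None keeps the default CUs, as recommended


def validate_run_config(cfg, bound_compute_resources, spark_interactive_groups,
                        connected_resource_groups):
    """Return a list of problems with cfg; an empty list means the selections are consistent."""
    errors = []
    if cfg.compute_resource not in bound_compute_resources:
        errors.append(f"compute resource {cfg.compute_resource!r} is not bound to the workspace")
    if cfg.adb_resource_group not in spark_interactive_groups:
        errors.append(
            f"{cfg.adb_resource_group!r} is not an interactive resource group with the Spark engine"
        )
    if cfg.resource_group not in connected_resource_groups:
        errors.append(f"resource group {cfg.resource_group!r} did not pass the connectivity test")
    return errors


if __name__ == "__main__":
    cfg = RunConfig("adb_spark_compute", "spark_interactive_rg", "serverless_rg")
    print(validate_run_config(cfg, {"adb_spark_compute"}, {"spark_interactive_rg"}, {"serverless_rg"}))
    # prints []
```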

Step 3: Schedule the ADB Spark SQL node

  1. Configure scheduling properties for the ADB Spark SQL node.

    To run the node periodically, go to the Scheduling Settings pane on the right. In the Scheduling Policy section, configure the following parameters based on your business needs. For more information about other parameters, see Configure scheduling for a node.

      • Compute Resource: Select the AnalyticDB for Spark computing resource that you bound to the workspace.

      • ADB Computing Resource Group: Select the interactive resource group that you created in the AnalyticDB for MySQL cluster. The engine type of the interactive resource group must be Spark. For more information, see Create an interactive resource group.

      • Resource Group: Select the resource group that passed the connectivity test when you bound the AnalyticDB for Spark computing resource.

      • Compute CU: The node uses the default number of compute units (CUs). You do not need to change this value.

  2. Deploy the ADB Spark SQL node.

    After you configure the node, deploy it so that it can run on schedule. For more information, see Deploy a node or workflow.
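When the node runs daily with var assigned $[yyyymmdd], each run refers to its own pair of table names. The following Python sketch lists the scheduled run times of a daily schedule at a fixed time of day and the table names that each run would use. The daily schedule, the 00:30 run time, the start date, and the helper names are assumptions chosen for illustration; the real run times come from the Scheduling Policy that you configure, and the sketch assumes that $[yyyymmdd] resolves to the date of each scheduled run.

```python
from datetime import date, datetime, time, timedelta


def daily_runs(start: date, days: int, run_at: time):
    """Yield the scheduled run times of a daily schedule, one per day from start."""
    for offset in range(days):
        yield datetime.combine(start + timedelta(days=offset), run_at)


def tables_for_run(scheduled_time: datetime):
    """Return the (source, result) tables a run uses, assuming var = $[yyyymmdd] = run date."""
    var = scheduled_time.strftime("%Y%m%d")
    return (f"adb_spark_db.tb_order_{var}", f"adb_spark_db.tb_order_result_{var}")


if __name__ == "__main__":
    for run in daily_runs(date(2026, 4, 30), 3, time(0, 30)):
        source, result = tables_for_run(run)
        print(run.isoformat(), source, "->", result)
    # prints:
    # 2026-04-30T00:30:00 adb_spark_db.tb_order_20260430 -> adb_spark_db.tb_order_result_20260430
    # 2026-05-01T00:30:00 adb_spark_db.tb_order_20260501 -> adb_spark_db.tb_order_result_20260501
    # 2026-05-02T00:30:00 adb_spark_db.tb_order_20260502 -> adb_spark_db.tb_order_result_20260502
```

Listing the runs this way makes month boundaries visible, which is where hand-written date logic most often goes wrong.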

Next steps

After a task is deployed, you can view the running status of the periodic task in the Operation Center. For more information, see Getting started with the Operation Center.