This topic describes how to use a Lindorm Spark SQL node in DataWorks to develop and periodically schedule Lindorm Spark SQL tasks.
Overview
Lindorm is a distributed computing service built on a cloud-native architecture. It supports community-edition compute models, is compatible with Spark interfaces, and is deeply integrated with the Lindorm storage engine. Lindorm leverages the features and indexing capabilities of its underlying data storage to handle large-scale data processing, interactive analytics, machine learning, and graph computing.
A Lindorm Spark SQL node is the DataWorks development object that encapsulates your SQL logic. After you develop and debug a node, you publish it to make it available for periodic scheduling.
The typical workflow is: Create a node → Develop the SQL logic → Debug the node → Configure scheduling → Publish to enable periodic runs.
Prerequisites
Before you begin, make sure that you have:
-
A Lindorm instance created and bound to the DataWorks workspace. For details, see Associate a Lindorm computing resource.
-
(Optional) If you use a Resource Access Management (RAM) user: the RAM user must be added to the workspace as a member and assigned the Developer or Workspace Administrator role. For details, see Add members to a workspace. This step is not required if you use an Alibaba Cloud account.
Create a Lindorm Spark SQL node
Develop a Lindorm Spark SQL node
In the SQL editor, define variables by using the ${variable_name} syntax. Assign values to these variables in the Run Configuration or Scheduling Configuration panel on the right side of the node editor.
The following example creates a partitioned Parquet table and inserts daily incremental data into a specific partition. The variable ${var} is a scheduling parameter that controls which partition receives data at runtime — for example, setting it to 2025-04-25 inserts data into the 2025-04-25 partition of lindorm_table_job.
CREATE TABLE IF NOT EXISTS lindorm_table_job (
id INT,
name STRING,
data STRING
)
USING parquet
PARTITIONED BY (partition_date DATE);
INSERT OVERWRITE TABLE lindorm_table_job PARTITION (partition_date='${var}')
VALUES (1, 'Alice', 'Sample data 1'), (2, 'Bob', 'Sample data 2');
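After a run completes, you can sanity-check the result with a query such as the following. The table name and partition value come from the example above; adjust them to match your actual run.

```sql
-- Check the rows written for the 2025-04-25 scheduling run.
SELECT id, name, data
FROM lindorm_table_job
WHERE partition_date = DATE '2025-04-25';
```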
For more information about scheduling parameters, see Sources and expressions of scheduling parameters.
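As an illustration, a common pattern is to assign the variable a date expression so that each scheduled instance writes to the partition for its own data timestamp. The expression below assumes the standard DataWorks scheduling parameter syntax; verify the exact format against the scheduling parameter documentation for your environment.

```
# Illustrative assignment in the scheduling parameters section (not SQL):
# on an instance with a scheduled date of 2025-04-26, ${var} resolves to 2025-04-25
var=$[yyyy-mm-dd-1]
```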
For more Lindorm Spark SQL operations, see SQL reference.
Debug a Lindorm Spark SQL node
-
In the Run Configuration panel on the right, configure the runtime properties.
The parameters are described as follows:
- Compute Resource: Select the Lindorm compute resource bound to this workspace.
- Lindorm Resource Group: Select the Lindorm resource group that was specified when you bound the Lindorm compute resource.
- Resource Group: Select the resource group that passed the connectivity test when you bound the Lindorm Spark compute resource.
- Script Parameter: Provide a value for each variable defined with the ${variable_name} syntax in the node code. For details, see Sources and expressions of scheduling parameters.
- Spark Parameter: Set runtime parameters for the Spark program. For more information about Spark configurations, see Configure parameters for jobs.
-
Click Save, then click Run to execute the node.
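For reference, the Spark Parameter field accepts standard Spark configuration keys. The values below are placeholders for illustration only, not recommendations; appropriate settings depend on your workload and Lindorm resource group.

```
# Illustrative Spark runtime parameters (standard Spark configuration keys)
spark.executor.memory=4g
spark.executor.cores=2
spark.sql.shuffle.partitions=200
```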
What's next
-
Node scheduling configuration: To run a node periodically, configure the Scheduling Policy and related settings in the Properties panel on the right.
-
Publish a node: Click the publish icon in the toolbar to start the publishing workflow. Nodes run on a schedule only after they are published.
-
Data Map (for Lindorm table data): Go to Data Map to collect metadata from Lindorm.