The Lindorm Spark SQL node in DataWorks lets you develop and periodically schedule Lindorm Spark SQL tasks. This topic describes the main process of using a Lindorm Spark SQL node for task development.
Background
Lindorm is a distributed computing service built on a cloud-native architecture. It supports community-edition computing models, is compatible with Spark interfaces, and is deeply integrated with the Lindorm storage engine. By leveraging underlying data storage features and indexing capabilities, it efficiently completes distributed jobs. It is ideal for scenarios such as massive data processing, interactive analysis, machine learning, and graph computing.
Prerequisites
- (Required only for RAM users) The RAM user that performs task development must be added to the target workspace and granted the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions, so assign it with caution. For details about adding members, see Add members to a workspace.
  Note: If you use an Alibaba Cloud account, ignore this step.
- A Lindorm instance has been created and associated with the DataWorks workspace. For details, see Associate a Lindorm computing resource.
Create a Lindorm Spark SQL node
For instructions on how to create a node, see Create a Lindorm Spark SQL node.
Develop a Lindorm Spark SQL node
When you write task code in the SQL editor, you can define variables in the format ${variable_name} and assign values to them in the Run Configuration or Scheduling Settings pane on the right side of the node editing page. The following is an example.
CREATE TABLE IF NOT EXISTS lindorm_table_job (
    id INT,
    name STRING,
    data STRING
)
USING parquet
PARTITIONED BY (partition_date DATE);

INSERT OVERWRITE TABLE lindorm_table_job PARTITION (partition_date='${var}')
VALUES (1, 'Alice', 'Sample data 1'), (2, 'Bob', 'Sample data 2');
In the example, the variable parameter ${var} can be set to 2025-04-25. Setting this parameter inserts data into a specific partition of the lindorm_table_job table, enabling dynamic parameter passing in scheduling scenarios. For details about scheduling parameters, see Scheduling parameter sources and expressions.
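To make the substitution concrete, the following sketch shows how the INSERT statement from the example would read once ${var} has been replaced. It assumes that var is assigned the literal value 2025-04-25 and that the placeholder is replaced textually before the statement is submitted. The exact replacement mechanics are not described in this topic.

```sql
-- Illustration only: the example statement after ${var} is replaced
-- with the assumed value 2025-04-25.
INSERT OVERWRITE TABLE lindorm_table_job PARTITION (partition_date='2025-04-25')
VALUES (1, 'Alice', 'Sample data 1'), (2, 'Bob', 'Sample data 2');
```

Because the statement uses INSERT OVERWRITE with a static partition value, rerunning the node with the same value replaces the contents of that partition instead of appending to it.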
For more information about Lindorm Spark SQL operations, see SQL reference.
Debug a Lindorm Spark SQL node
- Configure debugging properties.
  In the Run Configuration pane on the right side of the node, configure the following parameters.
  - Compute Resource: Select the Lindorm compute resource that you associated.
  - Lindorm Resource Group: Select the Lindorm resource group that you specified when you associated the Lindorm compute resource.
  - Resource Group: Select a resource group that passed the connectivity test when you associated the Lindorm compute resource.
  - Script Parameters: When you configure node content, define variables in the format ${parameter_name}. In Script Parameters, configure the Parameter Name and Parameter Value. When the task runs, the variables are dynamically replaced with their actual values. For details, see Scheduling parameter sources and expressions.
  - Spark Parameter: Runtime parameters for the Spark program. For more Spark property configurations, see Configure parameters for jobs.
- Debug and run the node.
  Click Save, and then click Run to execute the node task.
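After a debug run of the example above, a follow-up query can confirm that the partition was written. The statements below are a minimal sketch. They assume that the node accepts the standard Spark SQL statements SHOW PARTITIONS and SELECT against lindorm_table_job, and that var is set in Script Parameters to the same value that was used for the insert.

```sql
-- List the partitions that currently exist in the example table.
SHOW PARTITIONS lindorm_table_job;

-- Read back the rows written for the partition selected by ${var}.
SELECT id, name, data
FROM lindorm_table_job
WHERE partition_date = '${var}';
```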
What to do next
- Configure node scheduling: If nodes in the project directory need to run on a periodic schedule, configure the scheduling properties, such as the Scheduling Policy, in the Scheduling Settings pane on the right side of the node.
- Deploy a node: If the task needs to run in the production environment, click the deployment icon on the page to start the deployment process. Nodes in the project directory are scheduled periodically only after they are deployed to the production environment.
- Data Map (Lindorm table data): Go to Data Map to collect metadata for Lindorm.