This topic describes how to use a Lindorm Spark SQL node in DataWorks to develop and periodically schedule Lindorm Spark SQL tasks.
Background
Lindorm's compute engine is a distributed computing service built on a cloud-native architecture. It supports community-edition compute models, is compatible with Spark interfaces, and is deeply integrated with the Lindorm storage engine. The engine leverages the features and indexing capabilities of the underlying data storage to run distributed jobs efficiently, which makes it well suited to scenarios such as large-scale data processing, interactive analytics, machine learning, and graph computing.
Prerequisites
Optional: If you are a Resource Access Management (RAM) user, ensure you have been added to the relevant Workspace and assigned the Developer or Workspace Administrator role. For details about how to add members, see Add members to a workspace.
Note: If you are using an Alibaba Cloud account, you can skip this step.
A Lindorm Instance has been created and bound to the DataWorks Workspace. For details, see Associate a Lindorm computing resource.
Create a Lindorm Spark SQL node
To create a node, see Create a Lindorm Spark SQL node.
Develop a Lindorm Spark SQL node
In the SQL editor, define variables by using the ${variable_name} syntax. You can then assign values to these variables in the Run Configuration or Properties panel on the right side of the node editor page.
-- Create a partitioned Parquet table if it does not already exist.
CREATE TABLE IF NOT EXISTS lindorm_table_job (
  id INT,
  name STRING,
  data STRING
)
USING parquet
PARTITIONED BY (partition_date DATE);

-- Overwrite the target partition with sample rows; the partition value
-- comes from the ${var} scheduling variable.
INSERT OVERWRITE TABLE lindorm_table_job PARTITION (partition_date='${var}')
VALUES (1, 'Alice', 'Sample data 1'), (2, 'Bob', 'Sample data 2');

In this example, the variable ${var} can be set to 2025-04-25, which inserts the data into that specific partition of the `lindorm_table_job` table. This enables dynamic parameter passing for scheduled runs. For more information about scheduling parameters, see Sources and expressions of scheduling parameters.
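For scheduled runs, you can assign the variable a scheduling parameter expression instead of a fixed value. As a sketch, assuming you want each instance to process the previous day's partition, the assignment in the scheduling configuration could look like this:

var=$[yyyy-mm-dd-1]

With this expression, an instance whose scheduled date is 2025-04-26 resolves ${var} to 2025-04-25.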
For more Lindorm Spark SQL operations, see SQL reference.
Debug a Lindorm Spark SQL node
Configure runtime properties.
In the Run Configuration panel on the right, configure the Compute Resource, Lindorm Resource Group, and Resource Group, along with any script and Spark parameters. These parameters are described below.

Compute Resource: Select the Lindorm compute resource that you have bound.

Lindorm Resource Group: Select the Lindorm Resource Group that you specified when you bound the Lindorm compute resource.

Resource Group: Select the Resource Group that passed the connectivity test when you bound the Lindorm compute resource.

Script Parameter: If you define variables in the node code by using the ${Parameter Name} syntax, you must provide a value for each variable. For details, see Sources and expressions of scheduling parameters.

Spark Parameter: Runtime parameters for the Spark program. For more information about Spark configurations, see Configure parameters for jobs.
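As an illustration, Spark parameters are specified as standard Spark configuration key-value pairs. The property names below are standard Spark settings, but the values are placeholder assumptions, not tuning recommendations:

spark.executor.memory=4g
spark.executor.cores=2
spark.sql.shuffle.partitions=200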
Debug the node.
To run the task, click Save and then Run.
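After the run succeeds, you can verify the written data with a query such as the following, a minimal sketch that assumes the table and partition value from the example above:

-- Check the rows written to the target partition.
SELECT id, name, data
FROM lindorm_table_job
WHERE partition_date = DATE '2025-04-25';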
Next steps
Node scheduling configuration: If a node must run periodically, configure the Scheduling Policy and other related settings in the Properties panel on the right.
Publish a node: To deploy the task, click the publish icon to start the publishing workflow. Nodes are scheduled for periodic runs only after they are published.
Data Map (for Lindorm table data): Go to Data Map to collect metadata from Lindorm.