DataWorks allows you to use Lindorm Spark SQL nodes to develop and schedule Lindorm Spark SQL tasks. This topic describes how to use a Lindorm Spark SQL node to develop a task.
Background information
The Lindorm compute engine is a distributed computing service built on a cloud-native architecture. It supports open source (community edition) computing models, is compatible with Spark interfaces, and is deeply integrated with the characteristics of the Lindorm storage engines. By leveraging the data storage characteristics and indexing capabilities at the underlying layer, you can run distributed jobs and tasks in an efficient manner. The compute engine is suitable for scenarios such as massive data processing, interactive analysis, machine learning, and graph computing.
Prerequisites
(Required if you use a RAM user) The RAM user that you want to use to develop tasks is added to the required workspace and is assigned the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions, and we recommend that you assign this role to the RAM user only when necessary. For more information about how to add a member to a workspace and grant permissions to the member, see Add workspace members and assign roles to them.
Note: If you use an Alibaba Cloud account, ignore this prerequisite.
A Lindorm instance is created and associated with the required workspace as a computing resource. For more information, see Add a Lindorm computing resource.
Create a Lindorm Spark SQL node
For information about how to create a Lindorm Spark SQL node, see Create a Lindorm Spark SQL node.
Configure the Lindorm Spark SQL node
When you write code for the Lindorm Spark SQL node in the code editor, you can define variables in the ${variable name} format. Then, you can assign values to the variables on the Debugging Configurations or Properties tab. Sample code for creating a Lindorm table and inserting data into the table:
CREATE TABLE IF NOT EXISTS lindorm_table_job (
id INT,
name STRING,
data STRING
)
USING parquet
PARTITIONED BY (partition_date DATE);
INSERT OVERWRITE TABLE lindorm_table_job PARTITION (partition_date='${var}')
VALUES (1, 'Alice', 'Sample data 1'), (2, 'Bob', 'Sample data 2');

For example, you can set ${var} in the preceding code to 2025-04-25 to insert the data into that partition of the lindorm_table_job table. When the node is run, the variable is dynamically replaced with the value that you assign. For more information about how to use scheduling parameters, see Supported formats of scheduling parameters.
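After the node runs, a simple query can confirm that the data was written to the expected partition. The following statement is a minimal sketch that assumes ${var} was set to 2025-04-25:

-- Query the partition that the sample INSERT statement wrote to.
SELECT id, name, data
FROM lindorm_table_job
WHERE partition_date = DATE '2025-04-25';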
For more information about Lindorm Spark SQL operations, see SQL reference.
Debug the Lindorm Spark SQL node
Configure properties for debugging the Lindorm Spark SQL node.
In the right-side navigation pane of the configuration tab of the Lindorm Spark SQL node, click Debugging Configurations. On the Debugging Configurations tab, configure the following parameters:

Computing Resource: Select the Lindorm computing resource that is associated with the workspace.

Lindorm Resource Group: Select the Lindorm resource group that you specified when you associated the Lindorm computing resource with the workspace.

Resource Group: Select the resource group that passed the connectivity test when the Lindorm computing resource was associated with the workspace.

Script Parameters: If you define variables in the ${variable name} format in the code of the node, configure the Parameter Name and Parameter Value parameters in this section. When the node is run, the variables are dynamically replaced with the values that you assign. For more information, see Supported formats of scheduling parameters.

Spark Parameters: The Spark configuration parameters of the job. For more information about parameter settings, see Configure parameters for jobs.
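For example, for the sample code in this topic, the debug value from the earlier example could be entered in the Script Parameters section as follows (the parameter name must match the variable name in the code; the date is the sample value used above):

Parameter Name: var
Parameter Value: 2025-04-25

In the Spark Parameters section, standard Spark properties can be supplied if the defaults do not fit the job. The following values are illustrative only; whether a given property takes effect depends on the configuration of your Lindorm compute engine:

spark.executor.cores=2
spark.executor.memory=8g
spark.sql.shuffle.partitions=200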
Debug and run the Lindorm Spark SQL node.
In the top toolbar of the configuration tab of the Lindorm Spark SQL node, click Save and Run to save and run the node.
What to do next
Node scheduling: If you want the system to periodically schedule a node in a workspace directory, you can click Properties in the right-side navigation pane of the configuration tab of the node and configure scheduling properties for the node in the Scheduling Policies section. An example of a scheduling parameter assignment is provided after this list.
Node deployment: If you want to deploy a node to the production environment for running, you can click the deployment icon in the top toolbar of the configuration tab of the node to initiate a deployment process. Nodes in a workspace directory can be periodically scheduled only after they are deployed to the production environment.
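For example, instead of the fixed date used during debugging, the ${var} variable in the sample code could be assigned a dynamic expression on the Properties tab. The following assignment is a sketch that uses the DataWorks scheduling parameter format, in which $[yyyy-mm-dd-1] resolves to the day before the scheduling time at each run (see Supported formats of scheduling parameters for the full syntax):

var=$[yyyy-mm-dd-1]

With this assignment, each daily run of the INSERT OVERWRITE statement writes the data to the partition for the previous day.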