
DataWorks:Lindorm Spark node

Last Updated: Mar 27, 2026

DataWorks lets you create Lindorm Spark nodes to develop and schedule Spark tasks written in Java, Scala, or Python. This topic walks you through uploading your code to LindormDFS, configuring the node, and running a test execution.

Background information

Lindorm is a distributed computing service built on a cloud-native architecture. It supports Community Edition computing models and Apache Spark, and is deeply integrated with the features of the Lindorm storage engine. Lindorm meets computing requirements in a variety of scenarios, such as massive data processing, interactive analytics, machine learning, and graph computing.

Prerequisites

Before you begin, make sure all prerequisites are met.

Note

The Workspace Administrator role grants extensive permissions. Assign it only when necessary.

Create a Lindorm Spark node

For steps on creating the node, see Create a Lindorm Spark node.

Upload your code to LindormDFS

Before configuring the node, upload your JAR package or Python file to LindormDFS so the node can reference it. The upload steps are the same for all languages.

  1. Log on to the Lindorm console. In the top navigation bar, select the target region and find your Lindorm instance on the Instances page.

  2. Click the instance name to open the instance details page.

  3. In the left-side navigation pane, click Compute Engine.

  4. On the Job Management tab, click Upload Resource.

  5. In the upload dialog box, click the dotted area and select the file to upload:

    • Java or Scala: Upload a JAR package. For a quick test, download spark-examples_2.12-3.3.0.jar.

    • Python: Upload a .py file. For a quick test, save the following script as pi.py:

      import sys
      from random import random
      from operator import add
      
      from pyspark.sql import SparkSession
      
      if __name__ == "__main__":
          """
              Usage: pi [partitions]
          """
          spark = SparkSession\
              .builder\
              .appName("PythonPi")\
              .getOrCreate()
      
          partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
          n = 100000 * partitions
      
          def f(_: int) -> float:
              x = random() * 2 - 1
              y = random() * 2 - 1
              return 1 if x ** 2 + y ** 2 <= 1 else 0
      
          count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
          print("Pi is roughly %f" % (4.0 * count / n))
      
          spark.stop()
      
  6. Click Upload.

  7. After the upload completes, find the file under Upload Resource on the Job Management tab. Click the copy icon to the left of the file to copy its storage path in LindormDFS. You need this path when configuring the node.
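The sample pi.py estimates pi by Monte Carlo sampling: it scatters random points in the unit square and counts how many land inside the unit circle. The same calculation can be sketched in plain Python without Spark (an illustration only, not part of the job):

```python
from random import random, seed

def estimate_pi(n: int, rng_seed: int = 42) -> float:
    """Estimate pi by sampling n random points in the unit square
    and counting the fraction that land inside the unit circle."""
    seed(rng_seed)
    inside = sum(
        1
        for _ in range(n)
        if (random() * 2 - 1) ** 2 + (random() * 2 - 1) ** 2 <= 1
    )
    # Circle area / square area = pi/4, so scale the inside fraction by 4.
    return 4.0 * inside / n

if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(200_000))
```

In the Spark version, parallelize distributes this loop across the requested number of partitions and reduce(add) sums the per-partition counts.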

Configure the Lindorm Spark node

On the configuration tab of the node, set the parameters for your language.

Java or Scala

  • Main JAR Resource: The LindormDFS storage path you copied in the previous step.

  • Main Class: The fully qualified name of the main class in the JAR. For the sample JAR, use org.apache.spark.examples.SparkPi.

  • Parameters: Runtime parameters passed to the program. Use the ${var} format for dynamic parameters.

  • Configuration Items: Spark runtime properties. For available properties, see Job configuration instructions. To set global Spark properties shared across jobs, configure them when you associate the Lindorm computing resource.

Python

  • Main Package: The LindormDFS storage path of the .py file you copied in the previous step.

  • Parameters: Runtime parameters passed to the script. Use the ${var} format for dynamic parameters.

  • Configuration Items: Spark runtime properties. For available properties, see Job configuration instructions.
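The value you enter in Parameters reaches the script as a positional command-line argument: pi.py reads its partition count from sys.argv[1]. A slightly hardened version of that parsing logic (a sketch, not part of the sample script) could look like:

```python
import sys

def read_partitions(argv: list[str], default: int = 2) -> int:
    """Read the optional partition count passed via the node's Parameters field.

    argv[0] is the script name, so the first parameter is argv[1]."""
    if len(argv) > 1:
        try:
            return int(argv[1])
        except ValueError:
            # Fail fast with a clear message instead of a traceback.
            raise SystemExit(f"partitions must be an integer, got {argv[1]!r}")
    return default

if __name__ == "__main__":
    print(read_partitions(sys.argv))
```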

Debug the Lindorm Spark node

  1. In the right-side navigation pane, click the Run Configuration tab and set the following parameters:

    • Computing Resource: Select the Lindorm computing resource associated with your workspace.

    • Lindorm Resource Group: Select the Lindorm resource group you specified when associating the computing resource.

    • Resource Group: Select the resource group that passed the connectivity test during computing resource association.

    • Script Parameters: If you defined variables in the ${Parameter name} format, enter the Parameter Name and Parameter Value here. These values are substituted at runtime. For more information, see Sources and expressions of scheduling parameters.
  2. Save and run the node.
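Conceptually, DataWorks replaces each ${var} token in Parameters with the value you assign under Script Parameters before the job starts. The token format can be sketched as follows (an illustration only; the actual substitution is performed by DataWorks at run time):

```python
import re

def substitute(params: str, values: dict[str, str]) -> str:
    """Replace ${name} tokens with configured values; leave unknown tokens as-is."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        params,
    )

if __name__ == "__main__":
    # Hypothetical parameter string and values, for illustration.
    print(substitute("${partitions} --date ${bizdate}",
                     {"partitions": "4", "bizdate": "20260327"}))
    # → 4 --date 20260327
```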

What's next

  • Node scheduling configuration: Click Properties in the right-side navigation pane and configure scheduling properties under Scheduling Policies to have DataWorks run the node on a schedule.

  • Node deployment: Click the deploy icon in the top toolbar to deploy the node to the production environment. Nodes can be scheduled periodically only after they are deployed to production.