
E-MapReduce:Submit jobs to Serverless Spark using VS Code and Lingma

Last Updated: Oct 25, 2025

You can use VS Code, Lingma, and the Serverless Spark spark-submit tool to quickly generate and submit Spark jobs. This topic describes how to submit a Serverless Spark job using these tools.


Step 1: Prepare the environment

Install VS Code and the Lingma extension

  1. Download and install Visual Studio Code (version 1.68 or later is recommended).

  2. Open VS Code. In the Extensions view, search for and install the following extensions. For detailed instructions, see Install Lingma in Visual Studio Code.

    • Python (from Microsoft) for syntax highlighting and debugging

    • Tongyi Lingma, the official AI programming extension from Alibaba Cloud

  3. After the installation is complete, log on to Lingma with your Alibaba Cloud account.

  4. In the status bar, click the Lingma icon, and then switch the mode to Agent Mode in the dialog box that appears.

  5. Configure a supported shell for VS Code.

    1. Press Cmd + Shift + P (macOS) or Ctrl + Shift + P (Windows/Linux).

    2. Enter Terminal: Select Default Profile and select the command.

    3. Select a supported shell.

      • Linux/macOS: bash, fish, pwsh, zsh

      • Windows: Git Bash, pwsh

    4. Completely exit and then reopen VS Code.

Install ossutil and the spark-submit tool

  1. Download and install ossutil. For detailed instructions, see Install ossutil.

  2. Click emr-serverless-spark-tool-0.9.0-SNAPSHOT-bin.zip to download the spark-submit tool, and then unzip the file.

  3. Go to the conf/ directory of the extracted tool. Open the connection.properties file in VS Code and configure the following parameters.

    accessKeyId=<ALIBABA_CLOUD_ACCESS_KEY_ID>
    accessKeySecret=<ALIBABA_CLOUD_ACCESS_KEY_SECRET>
    regionId=cn-hangzhou
    endpoint=emr-serverless-spark.cn-hangzhou.aliyuncs.com
    workspaceId=w-xxxxxxxxxxxx
    Important

    The Resource Access Management (RAM) user or RAM role that is associated with the AccessKey pair must be granted the required RAM permissions and must be added to the Serverless Spark workspace.

    The parameters are described in the following list.

    • accessKeyId (required): The AccessKey ID of the Alibaba Cloud account or RAM user that runs the Spark job.

    • accessKeySecret (required): The AccessKey secret of the Alibaba Cloud account or RAM user that runs the Spark job.

      Important

      When you configure the accessKeyId and accessKeySecret parameters, make sure the user corresponding to the AccessKey has read and write permissions on the OSS bucket attached to the workspace. To view the attached OSS bucket, go to the Spark page and click Details in the Actions column of the workspace.

    • regionId (required): The region ID. This topic uses the China (Hangzhou) region as an example.

    • endpoint (required): The endpoint of EMR Serverless Spark. For more information about endpoints, see Service endpoints. This topic uses the public endpoint in the China (Hangzhou) region as an example: emr-serverless-spark.cn-hangzhou.aliyuncs.com.

      Note

      If the ECS instance does not have public network access, use the VPC endpoint.

    • workspaceId (required): The ID of the EMR Serverless Spark workspace.

Step 2: Generate sample data and job code

Generate sample data

  1. Create a new project folder on your local machine, such as spark-with-lingma, and open the folder in VS Code.

  2. Open the Lingma input box and enter the following prompt:

    Create a new file named employees.csv. Generate 20 rows of data in CSV format. The file should include columns for employee name, department name, and salary. Use common English names. The departments are Engineering, Marketing, Sales, HR, and Finance. The salary should be between 5000 and 30000.
  3. Lingma automatically generates the following content. Click Accept to save it as employees.csv in the current project folder.
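    The content is similar to the following illustrative sample (the names and numbers that Lingma generates will differ):

    name,department,salary
    James Smith,Engineering,18500
    Mary Johnson,Marketing,12000
    Robert Brown,Sales,9800
    Patricia Davis,HR,7600
    John Miller,Finance,21000
    ... (15 more rows)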

  4. In the Lingma input box, enter the following prompt:

    Upload the employees.csv file to oss://spark-demo
  5. Lingma uploads employees.csv to the oss://spark-demo path.
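    If you prefer to upload the file manually instead of through Lingma, the equivalent command with the ossutil tool installed in Step 1 is:

    ossutil cp employees.csv oss://spark-demo/employees.csv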

Generate the job code

Generate PySpark code to calculate the average salary for each department.

  1. In the Lingma input box, enter the following natural language instruction:

    Generate an avg_salary_by_dept.py file with the following content:
    1. Use PySpark to read the CSV file from the OSS path oss://spark-demo/employees.csv. The file has a header and the data types should be inferred. Define a clear data structure. (The data content is the same as the employees.csv file you just generated).
    2. Show some sample data.
    3. Calculate the average salary for each department. When calculating the average salary, exclude any header rows (rows whose department value is the literal string "department").
    4. Print the aggregated results.
    5. Add the necessary import statements.
  2. Lingma generates code similar to the following. Click Accept to save the code as avg_salary_by_dept.py in the current project folder.
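    A representative sketch of the generated script is shown below. The exact code that Lingma produces varies; this sketch assumes the column headers name, department, and salary from the sample file.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Create the Spark session. On EMR Serverless Spark, OSS access is preconfigured.
    spark = SparkSession.builder.appName("AvgSalaryByDept").getOrCreate()

    # Read the CSV file with a header row and inferred column types.
    df = spark.read.csv(
        "oss://spark-demo/employees.csv",
        header=True,
        inferSchema=True,
    )

    # Show some sample data.
    df.show(5)

    # Exclude any stray header rows, then average the salary per department.
    result = (
        df.filter(F.col("department") != "department")
        .groupBy("department")
        .agg(F.avg("salary").alias("avg_salary"))
    )

    # Print the aggregated results.
    result.show()

    spark.stop()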

  3. Use Lingma to upload avg_salary_by_dept.py to the oss://spark-demo path.

Step 3: Submit the job using spark-submit

Use the spark-submit tool to submit the job to the Serverless Spark cluster.

Build the submission command

Use Lingma to build the spark-submit command.

  1. Enter the following prompt in Lingma:

    The following is a spark-submit example. Use it as a reference to give me a command to submit a job: 
    ./bin/spark-submit --name SparkPi \
    --queue dev_queue \
    --num-executors 5 \
    --driver-memory 1g \
    --executor-cores 2 \
    --executor-memory 2g \
    --class org.apache.spark.examples.SparkPi \
    oss://<yourBucket>/path/to/spark-examples_2.12-3.3.1.jar \
    10000
    
    My tool is in /yourPath. The job name is AvgSalaryJob. Use the root_queue queue. The job file path is oss://spark-demo/avg_salary_by_dept.py.
  2. Lingma provides the following output:
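    Based on the example and the values in the prompt, the generated command should be similar to the following; it matches the command that is submitted in the next step:

    /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit \
    --name AvgSalaryJob \
    --queue root_queue \
    --num-executors 5 \
    --driver-memory 1g \
    --executor-cores 2 \
    --executor-memory 2g \
    oss://spark-demo/avg_salary_by_dept.py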

Submit the job

Use Lingma to run the job submission command.

  1. Enter the following prompt in Lingma.

    Run the command /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit \
    --name AvgSalaryJob \
    --queue root_queue \
    --num-executors 5 \
    --driver-memory 1g \
    --executor-cores 2 \
    --executor-memory 2g \
    oss://spark-demo/avg_salary_by_dept.py
  2. Lingma displays the command that will be executed. Click Run.

  3. View the execution result.

Step 4: Monitor the job execution status

You can check the job status by using the spark-submit tool and Lingma. In the following commands, replace <jr-xxxxxxxxxxxx> with the job run ID that is returned when you submit the job.

  • Check the job status: /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit --status <jr-xxxxxxxxxxxx>

  • View job details: /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit --detail <jr-xxxxxxxxxxxx>

  • (Optional) Stop the job: /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit --kill <jr-xxxxxxxxxxxx>
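
For example, if the submission step returned a (hypothetical) job run ID of jr-8598aa9f459d, you can check its status as follows:

    /yourPath/emr-serverless-spark-tool-0.9.0-SNAPSHOT/bin/spark-submit --status jr-8598aa9f459d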

You can also log on to the Serverless Spark workspace to view detailed logs and the job status in the job history list.

References

  • For more information about how to use Lingma Agent Mode, see Agent.

  • For answers to frequently asked questions about using Lingma, see the FAQ.

  • For more information about how to use the spark-submit tool, see Submit a job using spark-submit.