This topic describes how to run and debug a Spark job locally during development.
Prerequisites
The public endpoint of the Lindorm compute engine is enabled.
The local IP address is added to the whitelist of the Lindorm instance. For more information, see Configure whitelists.
A Spark job project is prepared.
Run a Spark job locally
Download the latest environment installation package of the Lindorm compute engine.
Decompress the downloaded package. You can specify a custom path for decompression.
Configure the environment variables of the compute engine on your local machine: set the SPARK_HOME environment variable to the decompression path.
To configure the environment variables of the compute engine in Windows, perform the following steps:
Open the System Properties page on your local machine and click Environment Variables.
In the Environment Variables window, click New in the System Variables section.
In the New System Variable window, enter the following parameters.
Variable Name: Enter SPARK_HOME.
Variable Value: Enter the path where the package is decompressed.
Click OK.
Click Apply.
To configure the environment variables of the compute engine on Linux, run the export SPARK_HOME="<path where the package is decompressed>" command and add this command to ~/.bashrc.
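As a sketch of the Linux setup, assuming the package was decompressed to /opt/lindorm-spark (a hypothetical path; replace it with your own):

```shell
# Hypothetical decompression path; replace it with your own.
export SPARK_HOME="/opt/lindorm-spark"
# Persist the variable for new shell sessions.
echo 'export SPARK_HOME="/opt/lindorm-spark"' >> ~/.bashrc
# Verify the setting.
echo "$SPARK_HOME"
```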
Package the Spark job project and submit the job by using $SPARK_HOME/bin/spark-submit. The following steps use the Spark job example as an example: download and decompress the project.
Configure the following parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| spark.sql.catalog.lindorm_table.url | ld-bp1z3506imz2f****-proxy-lindorm-pub.lindorm.rds.aliyuncs.com:30020 | The public endpoint for accessing LindormTable. Enter the public endpoint that is used to access LindormTable of the Lindorm instance through the HBase Java API. Only LindormTable of the same Lindorm instance is supported. |
| spark.sql.catalog.lindorm_table.username | The default username is root. | The username used to access LindormTable. |
| spark.sql.catalog.lindorm_table.password | The default password is root. | The password used to access LindormTable. |
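One way to supply these parameters is to pass them as --conf options when you submit the job. The following is a sketch: the endpoint and credentials are placeholders taken from the table above, and the example project may instead read these values from its own configuration.

```shell
$SPARK_HOME/bin/spark-submit \
  --conf spark.sql.catalog.lindorm_table.url=ld-bp1z3506imz2f****-proxy-lindorm-pub.lindorm.rds.aliyuncs.com:30020 \
  --conf spark.sql.catalog.lindorm_table.username=root \
  --conf spark.sql.catalog.lindorm_table.password=root \
  --class com.aliyun.lindorm.ldspark.examples.LindormSparkSQLExample \
  lindorm-spark-examples/target/lindorm-spark-examples-1.0-SNAPSHOT.jar
```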
```shell
# You can add job dependency JAR files by using --jars. For more parameters, run spark-submit -h.
$SPARK_HOME/bin/spark-submit \
--class com.aliyun.lindorm.ldspark.examples.LindormSparkSQLExample \
lindorm-spark-examples/target/lindorm-spark-examples-1.0-SNAPSHOT.jar
```
Note: If you do not specify the running mode when you submit a Spark job, the job runs in local mode by default. You can also explicitly specify the spark.master=local[*] parameter.
Create the corresponding database table structure based on the schema involved in the SQL code.
Use mvn clean package to package the job.
After you complete the local development of the job, you can submit the Spark job to run in the cloud as a JAR job. For more information, see Step 1: Configure dependencies. Change the connection address used in the Spark job to the private endpoint of the Lindorm compute engine.
Debug a Spark job
The following steps use the Spark job example as an example to debug the Spark job in IntelliJ IDEA. You can download IntelliJ IDEA from the official IntelliJ IDEA website.
Open IntelliJ IDEA and set the Spark-related dependencies in the pom.xml file to <scope>provided</scope>.
Add $SPARK_HOME/jars to the project dependencies.
In the top menu bar of IntelliJ IDEA, choose File > Project Structure.
In the navigation pane on the left, choose Libraries, and click + to add a Java class library.
Select $SPARK_HOME/jars.
Click OK.
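The pom.xml change in step 1 might look like the following sketch. The artifact and version shown here are assumptions; apply the provided scope to whichever Spark dependencies your project actually declares.

```xml
<!-- Spark dependencies are supplied by $SPARK_HOME/jars at run time,
     so mark them as provided to keep them out of the job JAR.
     The artifactId and version below are assumptions. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
```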
Run the Spark job program. During job execution, you can view the SparkUI using the link address specified in the logs.
2022/01/14 15:27:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://30.25.XX.XX:4040