This topic describes how to run and debug a Spark job locally during development.
Prerequisites
The public endpoint of the Lindorm compute engine is enabled.
The local IP address is added to the whitelist of the Lindorm instance. For more information, see Configure whitelists.
A Spark job project is prepared.
Run a Spark job locally
Download the latest environment installation package of the Lindorm compute engine.
Decompress the downloaded package. You can specify a custom path for decompression.
Configure the environment variables of the compute engine on your local machine: set the SPARK_HOME environment variable to the decompression path.
To configure the environment variables of the compute engine in Windows, perform the following steps:
Open the System Properties page on your local machine and click Environment Variables.
In the Environment Variables window, click New in the System Variables section.
In the New System Variable window, enter the following parameters.
Variable Name: Enter SPARK_HOME.
Variable Value: Enter the path where the package is decompressed.
Click OK.
Click Apply.
To configure the environment variables of the compute engine on Linux, run the export SPARK_HOME="<path where the package is decompressed>" command and add this command to ~/.bashrc.
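As a sketch of the Linux setup, assuming the package was decompressed to /opt/lindorm-spark (a hypothetical path; replace it with your own):

```shell
# Hypothetical decompression path; replace it with your own.
export SPARK_HOME="/opt/lindorm-spark"
# Persist the variable for new shell sessions.
echo 'export SPARK_HOME="/opt/lindorm-spark"' >> ~/.bashrc
# Verify the setting.
echo "$SPARK_HOME"
```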
Package the Spark job project and submit the job by using $SPARK_HOME/bin/spark-submit. The following steps use the Spark job example as an example: download and decompress the project.
Configure the following parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| spark.sql.catalog.lindorm_table.url | ld-bp1z3506imz2f****-proxy-lindorm-pub.lindorm.rds.aliyuncs.com:30020 | The public endpoint for accessing LindormTable. Enter the public endpoint that is used to access LindormTable of the Lindorm instance through the HBase Java API. Only LindormTable of the same Lindorm instance is supported. |
| spark.sql.catalog.lindorm_table.username | The default username is root. | The username used to access LindormTable. |
| spark.sql.catalog.lindorm_table.password | The default password is root. | The password used to access LindormTable. |
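One way to supply these parameters is to pass them as --conf options when you submit the job. The following is a sketch: the endpoint and credentials are placeholders taken from the table above, and the example project may instead read these values from its own configuration.

```shell
$SPARK_HOME/bin/spark-submit \
  --conf spark.sql.catalog.lindorm_table.url=ld-bp1z3506imz2f****-proxy-lindorm-pub.lindorm.rds.aliyuncs.com:30020 \
  --conf spark.sql.catalog.lindorm_table.username=root \
  --conf spark.sql.catalog.lindorm_table.password=root \
  --class com.aliyun.lindorm.ldspark.examples.LindormSparkSQLExample \
  lindorm-spark-examples/target/lindorm-spark-examples-1.0-SNAPSHOT.jar
```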
```shell
# You can add job dependency JAR files by using --jars. For more parameters, run spark-submit -h.
$SPARK_HOME/bin/spark-submit \
--class com.aliyun.lindorm.ldspark.examples.LindormSparkSQLExample \
lindorm-spark-examples/target/lindorm-spark-examples-1.0-SNAPSHOT.jar
```
Note: If you do not specify the running mode when you submit a Spark job, the job runs in local mode by default. You can also explicitly specify the spark.master=local[*] parameter.
Create the corresponding database table structure based on the schema involved in the SQL code.
Use mvn clean package to package the job.
After you complete the local development of the job, you can submit the Spark job to run in the cloud as a JAR job. For more information, see Step 1: Configure dependencies. Change the connection address used in the Spark job to the private endpoint of the Lindorm compute engine.
Debug a Spark job
The following steps use the Spark job example as an example to debug the Spark job in IntelliJ IDEA. You can download IntelliJ IDEA from the official IntelliJ IDEA website.
Open IntelliJ IDEA and set the Spark-related dependencies in the pom.xml file to <scope>provided</scope>.
Add $SPARK_HOME/jars to the project dependencies.
In the top menu bar of IntelliJ IDEA, choose File > Project Structure.
In the navigation pane on the left, choose Libraries, and click + to add a Java class library.
Select $SPARK_HOME/jars.
Click OK.
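The pom.xml change in step 1 might look like the following sketch. The artifact and version shown here are assumptions; apply the provided scope to whichever Spark dependencies your project actually declares.

```xml
<!-- Spark dependencies are supplied by $SPARK_HOME/jars at run time,
     so mark them as provided to keep them out of the job JAR.
     The artifactId and version below are assumptions. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.1.2</version>
  <scope>provided</scope>
</dependency>
```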
Run the Spark job program. During job execution, you can view the SparkUI using the link address specified in the logs.
2022/01/14 15:27:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://30.25.XX.XX:4040