After you submit a Spark job to Lindorm Distributed Processing System (LDPS), you can view the running status of the Spark job by using the Spark web UI or the Apache Hadoop Distributed File System (HDFS) shell.
Prerequisites
LDPS is activated for the Lindorm instance. For more information, see Activate LDPS and modify the configurations.
The IP address of your client is added to the whitelist of the Lindorm instance. For more information, see Configure whitelists.
View the running status of a Spark job on the Spark web UI
You can view the running status of a Spark job on the Spark web UI.
Note: Only Spark jobs that are in the running state are displayed on the Spark web UI. To view the information about a completed Spark job, you must enable History Server. For more information about the Spark web UI, see View the information about Spark jobs.
Log on to the Lindorm console.
On the Instances page, click the ID of the Lindorm instance.
In the left-side navigation pane, choose .
Click the address of the Spark web UI of the Spark job that you want to view, and log on to the Spark web UI by using the username and password of LindormTable.
Note: You can obtain the default username and password of LindormTable on the Wide Table Engine tab of the Database Connections page.
In the top navigation bar of the page that appears, click Executors to view the information about the running Spark job and all executors that are used to run the Spark job.
In the Executors list, click stdout or stderr in the Logs column to view the corresponding logs. You can also click Thread Dump in the Thread Dump column to view the thread stack information, or click System Status Dump in the corresponding column to view the system status of an executor.
Note: Click stdout to view standard output logs. Click stderr to view error logs.
Use the Apache HDFS shell to view logs of Spark jobs
Lindorm synchronizes the logs of Spark jobs in LDPS to the Lindorm file engine service (LindormDFS). If you want to view the information about Spark jobs that are interrupted, you can activate LindormDFS for the Lindorm instance and use the Apache HDFS shell to view the logs of the jobs.
We recommend that you initialize the SparkSession object before the main logic of your job runs when you submit a Spark job as a JAR file. This ensures that the logs of abnormal jobs are also synchronized to LindormDFS.
If a large number of Spark jobs run on LDPS, the logs of the Spark jobs may place a heavy load on LindormDFS. You can specify spark.dfsLog.executor.enabled=false when you start a Spark job so that the logs of executors are not synchronized to LindormDFS. This way, only the logs of the driver of the Spark job are synchronized to LindormDFS.
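The configuration above can be passed when the job is submitted. The following is a minimal sketch that assumes you submit the job with spark-submit; the main class, JAR path, and any other submission options are placeholders for your own job.

```shell
# Sketch: start a Spark job with executor log synchronization to
# LindormDFS disabled, so that only the driver logs are synchronized.
# com.example.MyJob and my-job.jar are placeholders.
DFS_LOG_CONF="spark.dfsLog.executor.enabled=false"

spark-submit \
  --conf "${DFS_LOG_CONF}" \
  --class com.example.MyJob \
  my-job.jar
```

With this setting, stdout and stderr of the executors remain available on the Spark web UI while the job runs, but are not persisted to LindormDFS.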
Log on to the Lindorm console.
On the Instances page, click the ID of the Lindorm instance.
In the left-side navigation pane, choose .
On the Job tab, obtain the ID of the Spark job whose logs you want to view from the JobId column.
Use the Apache HDFS shell to view logs. For information about how to configure the Apache HDFS shell, see Use the HDFS shell to connect to and use LindormDFS.
The logs of a Spark job are stored in the /ldspark/ldspark-logs/${JobId} directory. The logs of the driver are stored in the __driver_logs__ subdirectory, and the logs of each executor that is used to run the Spark job are stored in the __executor_logs__/${EXECUTOR_ID} subdirectory. You can run the following command to query the stderr logs of the driver of a Spark job:

$HADOOP_HOME/bin/hadoop fs -cat /ldspark/ldspark-logs/${JobId}/__driver_logs__/stderr | less
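The same directory layout can be explored with other HDFS shell commands. The sketch below lists the log directories of a job and reads an executor log; the job ID and executor ID are placeholders that you replace with the values obtained from the console.

```shell
# Placeholder job ID; replace with the JobId from the Job tab.
JOB_ID="jobid-example"
LOG_ROOT="/ldspark/ldspark-logs/${JOB_ID}"

# List the log directories of the job (__driver_logs__ and __executor_logs__).
$HADOOP_HOME/bin/hadoop fs -ls "${LOG_ROOT}"

# List the per-executor subdirectories.
$HADOOP_HOME/bin/hadoop fs -ls "${LOG_ROOT}/__executor_logs__"

# View the stderr log of executor 1 (placeholder executor ID).
$HADOOP_HOME/bin/hadoop fs -cat "${LOG_ROOT}/__executor_logs__/1/stderr" | less
```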
Note: You can also use the HDFS FUSE client to mount the directory that stores the logs of Spark jobs in LindormDFS to an Elastic Compute Service (ECS) instance. For more information, see Use HDFS FUSE to connect to and use LindormDFS.
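After LindormDFS is mounted through HDFS FUSE, the log directory can be browsed with ordinary file commands. The sketch below assumes the mount point /mnt/lindormdfs and a placeholder job ID; both are assumptions that you replace with your own values.

```shell
# Assumed mount point of LindormDFS on the ECS instance (placeholder).
MOUNT_POINT="/mnt/lindormdfs"
# Placeholder job ID; replace with the JobId from the console.
JOB_ID="jobid-example"

# Browse the driver log directory as a local path.
ls "${MOUNT_POINT}/ldspark/ldspark-logs/${JOB_ID}/__driver_logs__"

# Read the driver stderr log with a local pager.
less "${MOUNT_POINT}/ldspark/ldspark-logs/${JOB_ID}/__driver_logs__/stderr"
```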