After you submit a Spark job to Lindorm Distributed Processing System (LDPS), you can view the logs of the Spark job by using the Spark web user interface (UI) or the Apache Hadoop Distributed File System (HDFS) shell.
Prerequisites
- LDPS is activated for your Lindorm instance. For more information, see Activate LDPS and modify the configurations.
- The IP address of your client is added to the whitelist of the Lindorm instance. For more information, see Configure a whitelist.
View the logs of a Spark job on the Spark web UI
You can view the logs of a running or completed Spark job on the Spark web UI.
To view the logs of a completed Spark job, you must first enable History Server.
Note
- By default, only the logs of Spark jobs that are in the running state can be viewed on the Spark web UI. The logs of completed jobs are available only after History Server is enabled.
- For more information about the Spark web UI, see Spark web UI.
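As background, in open source Spark the History Server rebuilds job UIs from event logs that each job writes while it runs. The following is a minimal sketch of the standard properties involved, assuming the open source mechanism; the log directory is a placeholder, and LDPS may preconfigure or manage these settings for you:

```
# Standard open source Spark event-log settings that History Server reads.
# The directory below is a placeholder; LDPS may manage these values itself.
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///spark-eventlogs
```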
Use the Apache HDFS shell to view the logs of Spark jobs
Lindorm synchronizes the logs of Spark jobs in LDPS to the Lindorm file engine service (LindormDFS). To view the logs of Spark jobs that were interrupted, activate LindormDFS for your Lindorm instance and use the Apache HDFS shell to view the logs of the jobs.
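For example, once LindormDFS is activated, the synchronized log files can be listed and read with standard HDFS shell commands. The endpoint and directory below are placeholders; obtain the actual LindormDFS connection address and log path from the Lindorm console or documentation:

```
# List the synchronized Spark job logs (endpoint and path are placeholders).
hdfs dfs -ls hdfs://<lindorm-dfs-endpoint>/<spark-log-dir>/

# Print the log files of a specific job (job directory name is a placeholder).
hdfs dfs -cat hdfs://<lindorm-dfs-endpoint>/<spark-log-dir>/<job-id>/*
```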
Important
- We recommend that you initialize the SparkSession object at the beginning of the main function when you submit a Spark job as a JAR file. This helps ensure that the logs of abnormal jobs can also be synchronized to LindormDFS (see the Java sketch after this note).
- If a large number of Spark jobs run on LDPS, their logs may place a heavy load on LindormDFS. You can set spark.dfsLog.executor.enabled=false when you submit a Spark job so that executor logs are not synchronized to LindormDFS. In this case, only the logs of the driver of the Spark job are synchronized to LindormDFS (see the spark-submit example after this note).
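The following Java sketch illustrates the first recommendation above: the SparkSession is created as the first statement of main, before any logic that might fail, so that log synchronization to LindormDFS is in place even for jobs that exit abnormally. The class name and application name are illustrative:

```java
import org.apache.spark.sql.SparkSession;

public final class MyLdpsJob {  // illustrative class name
    public static void main(String[] args) {
        // Create the SparkSession first, before any other logic, so that
        // the logs of a job that fails early are still synchronized.
        SparkSession spark = SparkSession.builder()
                .appName("my-ldps-job")  // illustrative application name
                .getOrCreate();
        try {
            // ... job logic goes here ...
        } finally {
            spark.stop();
        }
    }
}
```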
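The second recommendation maps to a single --conf option at submission time. The class name and JAR file below are placeholders; add whatever other options you normally use to submit jobs to LDPS:

```
# Disable synchronization of executor logs; only driver logs go to LindormDFS.
# The class name and JAR file are placeholders.
spark-submit \
  --class com.example.MyLdpsJob \
  --conf spark.dfsLog.executor.enabled=false \
  my-ldps-job.jar
```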