After you submit a Spark job to Lindorm Distributed Processing System (LDPS), you can view the logs of the Spark job by using the Spark web user interface (UI) or the Apache Hadoop Distributed File System (HDFS) shell.

Prerequisites

LDPS is activated for your Lindorm instance, and a Spark job is submitted to LDPS.

View the logs of a Spark job on the Spark web UI page

You can view the logs of a running Spark job on the Spark web UI. To view the logs of a completed Spark job, you must enable the History Server.
Note
  • Only the logs of Spark jobs that are in the running state can be viewed on the Spark web UI.
  • For more information about the Spark web UI, see Spark web UI.
  1. Log on to the Lindorm console.
  2. On the Instances page, click the ID of the Lindorm instance.
  3. In the left-side navigation pane, choose Compute Engine > Jobs.
  4. Click the Spark web UI address of the Spark job whose logs you want to view.
  5. In the top navigation bar of the page that appears, click Executors to view information about the Spark job and all executors that are used to run it.
  6. In the Executors list, click stdout or stderr in the Logs column to view the corresponding logs.
    Note
    • Click stdout to view the standard output logs.
    • Click stderr to view the standard error logs.

Use the Apache HDFS shell to view the logs of Spark jobs

Lindorm synchronizes the logs of Spark jobs in LDPS to the Lindorm file engine service (LindormDFS). To view information about Spark jobs that were interrupted, you can activate LindormDFS for your Lindorm instance and use the Apache HDFS shell to view the logs of the jobs.
Important
  • We recommend that you initialize the SparkSession object at the beginning of the main function when you submit a Spark job as a JAR file. This helps ensure that the logs of abnormal jobs can also be synchronized to LindormDFS.
  • If a large number of Spark jobs run on LDPS, their logs may place a heavy load on LindormDFS. To avoid this, specify spark.dfsLog.executor.enabled=false when you start a Spark job so that the logs of executors are not synchronized to LindormDFS. This way, only the logs of the driver of the Spark job are synchronized to LindormDFS. For an example, see the sketch after this list.
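The following sketch shows one way this configuration might be passed when a job is submitted. The spark-submit command, class name, and JAR path below are placeholders; use the submission method and parameters that apply to your environment.
    # Hypothetical example: disable synchronization of executor logs to LindormDFS
    spark-submit \
      --class com.example.SparkJobMain \
      --conf spark.dfsLog.executor.enabled=false \
      /path/to/your-spark-job.jar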
  1. Log on to the Lindorm console.
  2. On the Instances page, click the ID of the Lindorm instance.
  3. In the left-side navigation pane, choose Compute Engine > Jobs. On the Job tab, find the Spark job whose logs you want to view and obtain its ID from the JobId column.
  4. Use the Apache HDFS shell to view the logs. For information about how to configure the Apache HDFS shell, see Use the HDFS shell to connect to and use LindormDFS.
  5. The logs of the Spark job are stored in the /ldspark/ldspark-logs/${JobId} directory. The logs of the driver of the Spark job are stored in the __driver_logs__ directory, and the logs of each executor that is used to run the Spark job are stored in the __executor_logs__/${EXECUTOR_ID} directory. For example, you can run the following command to query the stderr logs of the driver of a Spark job. Additional example commands for executor logs are provided after this procedure.
    $HADOOP_HOME/bin/hadoop fs -cat /ldspark/ldspark-logs/${JobId}/__driver_logs__/stderr | less
    Note You can also use the HDFS FUSE client to mount the directory that stores logs of Spark jobs in LindormDFS to an Elastic Compute Service (ECS) instance. For more information, see Use HDFS FUSE to connect to and use LindormDFS.
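The following commands are a sketch based on the directory layout described above. They assume that each executor log directory contains stdout and stderr files in the same way as the driver log directory, and that 1 is a valid executor ID for the job.
    # List the log directories of all executors that were used to run the job
    $HADOOP_HOME/bin/hadoop fs -ls /ldspark/ldspark-logs/${JobId}/__executor_logs__
    # View the stderr logs of the executor whose ID is 1 (assumed layout)
    $HADOOP_HOME/bin/hadoop fs -cat /ldspark/ldspark-logs/${JobId}/__executor_logs__/1/stderr | less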