Data Lake Analytics (DLA) allows you to use a pre-defined alert template to perform monitoring and alerting for a single job or all jobs. This topic describes how to perform monitoring and alerting for a specific job.

Prerequisites

  • A virtual cluster of DLA is purchased.
  • The AliyunARMSFullAccess policy is attached to the RAM user that you use. This prerequisite applies only if you use the credentials of a RAM user to view the metrics of virtual clusters.
  • A Spark job is created. For more information about how to create a Spark job, see Create and run Spark jobs.

Configure a job delay alert for a specific job

By default, the template for job delay alerts sends an alert every time any job is delayed. To monitor a specific job in a specific virtual cluster, select Spark Structure Streaming Job Delay Longer Than 10s from the Alarm template drop-down list on the Create Alert panel. Then, change the value of Alarm expression (PromQL) based on the following syntax:
spark_structured_streaming_driver_latency{vcName="$(vcName)",app_id=~"$(job_id).*"} / 1000 > $(latency_sec)
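The following table describes the parameters in Alarm expression (PromQL). Replace each $(...) placeholder, including the $( and ) delimiters, with an actual value.

Parameter      Description
vcName         The name of the virtual cluster in which the job runs.
job_id         The ID of the job.
latency_sec    The delay threshold, in seconds. An alert is triggered when the processing delay of the job exceeds this value.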
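
As an illustration of how the placeholders are filled in, the following Python sketch builds the job delay expression from three values. The function name build_delay_alert_expr and the sample values are hypothetical and are not part of DLA. Replace them with the name of your virtual cluster, your job ID, and your own threshold.

```python
def build_delay_alert_expr(vc_name: str, job_id: str, latency_sec: int) -> str:
    """Fill the placeholders of the job delay alert expression."""
    return (
        "spark_structured_streaming_driver_latency"
        # Doubled braces produce the literal { and } of the PromQL selector.
        f'{{vcName="{vc_name}",app_id=~"{job_id}.*"}}'
        f" / 1000 > {latency_sec}"
    )


# Hypothetical sample values, for illustration only.
expr = build_delay_alert_expr("my-vc", "j2021example", 30)
print(expr)
```

The printed string can be pasted into Alarm expression (PromQL). Because app_id is matched with =~, the job ID acts as a prefix: any application ID that starts with the job ID is covered.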

Configure a job stop alert for a specific job

By default, the template for job stop alerts sends an alert every time any job is stopped. To monitor a specific job in a specific virtual cluster, select Spark Job Stop from the Alarm template drop-down list on the Create Alert panel. Then, change the value of Alarm expression (PromQL) based on the following syntax:
sum by (parent_job) (label_replace(up{pod_name=~"${job_id}.*-driver"}, "parent_job", "$1", "pod_name", "(.*?)-(.*)")) < 1
The following table describes the parameter in Alarm expression (PromQL). Replace the ${job_id} placeholder, including the ${ and } delimiters, with an actual value.

Parameter      Description
job_id         The ID of the job.
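
To show how the label_replace call in this expression derives the parent_job label, the following Python sketch applies the same regular expression to a pod name and then builds the full expression. Prometheus anchors the regular expression to the whole label value, which re.fullmatch emulates here. The function names, the job ID, and the pod name are hypothetical samples. The actual naming of driver pods may differ.

```python
import re


def parent_job(pod_name: str) -> str:
    """Emulate label_replace(..., "parent_job", "$1", "pod_name", "(.*?)-(.*)")."""
    match = re.fullmatch(r"(.*?)-(.*)", pod_name)
    # If the pattern does not match, label_replace leaves the series
    # unchanged, so no parent_job label is set (modeled here as "").
    return match.group(1) if match else ""


def build_stop_alert_expr(job_id: str) -> str:
    """Fill the job_id placeholder of the job stop alert expression."""
    return (
        "sum by (parent_job) (label_replace("
        f'up{{pod_name=~"{job_id}.*-driver"}}, '
        '"parent_job", "$1", "pod_name", "(.*?)-(.*)")) < 1'
    )


# Hypothetical sample values, for illustration only.
label = parent_job("j2021example-0001-driver")
expr = build_stop_alert_expr("j2021example")
print(label)
print(expr)
```

The non-greedy first group captures everything before the first hyphen, so all driver pods of one job are summed under the same parent_job value. The comparison < 1 then holds when none of the matched driver pods reports up as 1.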