Data Lake Analytics (DLA) allows you to configure alerts for virtual clusters and Spark jobs. After you configure alert rules, the system notifies all contacts in the alert contact group when metric data meets the conditions that are specified in the alert rules.

Background information

You can use Alibaba Cloud Prometheus Service to perform monitoring and alerting for DLA. Prometheus Service allows you to view dashboards and configure metrics. If metric data meets the conditions that are specified in the alert rules, Prometheus Service can notify all contacts in the alert contact group by using emails, DingTalk, SMS messages, or phone calls. You can maintain the alert contact groups that correspond to metrics. This ensures that the related contacts can receive notifications immediately when an alert is triggered.

Prerequisites

  • A virtual cluster of DLA is purchased.
  • If you use the credentials of a RAM user to view the metrics of virtual clusters, the AliyunARMSFullAccess policy is attached to that RAM user.

Add alerts

  1. Log on to the DLA console.
  2. In the left-side navigation pane, click Virtual Cluster management.
  3. Find the virtual cluster whose metrics you want to view and click Details in the Actions column.
  4. In the left-side navigation pane, click Monitoring alarm. On the page that appears, click the Alarm tab.
  5. On the right side of the page, click Create Alert.
  6. In the Create Alert panel, perform the following operations:
    1. Select a template from the Alert template drop-down list.

      DLA supports the following templates:
      • Presto Cluster CPU Utilization Exceeding 90%
      • Presto Cluster Memory Usage Exceeding 90%
      • Spark Cluster CPU/Memory Quota Usage Exceeding 90%
      • Spark Structured Streaming Job Processing Latency Exceeding 10s
      • Spark Streaming Job Batch Processing Duration Exceeding 10s
      • Full GC Per Minute on Spark Job Node Exceeding 10s
      • Memory Usage on Spark Job Node Exceeding 90%

    2. In the Rule Name field, enter a rule name, such as Spark Structured Streaming Job Processing Latency Exceeding 10s.
    3. In the Alarm expression (PromQL) field, enter an alert expression. For example, if you select Spark Structured Streaming Job Processing Latency Exceeding 10s from the Alert template drop-down list, the default expression is spark_structured_streaming_driver_latency/1000 > 10.
      Note: For more information about how to configure alerts for a specified job, see Configure alerts for a specific job.
    4. In the duration field, enter a duration, such as 1 minute. With this value, an alert is triggered if the condition in the alert expression is continuously met for 1 minute.
    5. In the Alarm message (message) field, enter the alert information.
    6. Optional: In the Labels section of Advanced Configuration, click Create Tag to add a tag. The tag can be used to configure alert rules.
    7. Optional: In the Annotations section of Advanced Configuration, click Create Annotation to create an annotation. Then, enter message in the Key field, and enter {{Variable name}}Alert information in the Value field. The resulting annotation has the format message:{{Variable name}}Alert information. Example: message:{{$labels.pod_name}}Restart.
      You can customize a variable name or select an existing tag as the variable name. Existing tags include:
      • The tags that are included in an alert rule expression.
      • The tags that are created based on an alert rule.
      • The default tags that are provided by Application Real-Time Monitoring Service (ARMS). The following table describes the default tags.
        Tag                               Description
        alertname                         The name of an alert. The name is in the format of <Alert name>_<Cluster name>.
        _aliyun_arms_alert_level          The level of an alert.
        _aliyun_arms_alert_type           The type of an alert.
        _aliyun_arms_alert_rule_id        The ID of an alert rule.
        _aliyun_arms_region_id            The ID of a region.
        _aliyun_arms_userid               The ID of a user.
        _aliyun_arms_involvedObject_type  The subtype of an associated object, such as ManagedKubernetes or ServerlessKubernetes.
        _aliyun_arms_involvedObject_kind  The type of an associated object, such as app or cluster.
        _aliyun_arms_involvedObject_id    The ID of an associated object.
        _aliyun_arms_involvedObject_name  The name of an associated object.
    8. Click OK.
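
The following Python sketch illustrates how the settings in the preceding steps might interact. It is not DLA or Prometheus Service code. It is a simplified local model that assumes the alert is triggered only after the expression spark_structured_streaming_driver_latency/1000 > 10 has held without interruption for the configured duration, and that {{$labels.<name>}} in an annotation value is replaced with the value of the matching tag. The function names, the sample data, and the pod name driver-1 are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

THRESHOLD_SECONDS = 10.0  # from the default expression: latency/1000 > 10
DURATION_SECONDS = 60.0   # the example duration of 1 minute


def condition_met(latency_ms: float) -> bool:
    """Mirror spark_structured_streaming_driver_latency/1000 > 10."""
    return latency_ms / 1000.0 > THRESHOLD_SECONDS


def first_firing_time(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Return the timestamp at which the alert would be triggered, or None.

    samples holds (timestamp_seconds, latency_ms) pairs in ascending time
    order. Assumption: the alert is triggered once the condition has held
    without interruption for DURATION_SECONDS.
    """
    pending_since: Optional[float] = None
    for timestamp, latency_ms in samples:
        if condition_met(latency_ms):
            if pending_since is None:
                pending_since = timestamp
            if timestamp - pending_since >= DURATION_SECONDS:
                return timestamp
        else:
            # The condition was interrupted, so the waiting period restarts.
            pending_since = None
    return None


def render_annotation(template: str, labels: Dict[str, str]) -> str:
    """Replace {{$labels.<name>}} placeholders with tag values."""
    def substitute(match: "re.Match[str]") -> str:
        # Unknown tags render as an empty string in this sketch.
        return labels.get(match.group(1), "")

    return re.sub(r"\{\{\s*\$labels\.(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    steady = [(0.0, 12000.0), (30.0, 15000.0), (60.0, 11000.0)]
    print(first_firing_time(steady))
    print(render_annotation("{{$labels.pod_name}}Restart", {"pod_name": "driver-1"}))
```

In this model, a latency spike that drops back below 10 seconds before the duration elapses does not trigger an alert, which is why a longer duration reduces notifications caused by short-lived spikes.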

Manage alert rules

  1. Log on to the DLA console.
  2. In the left-side navigation pane, click Virtual Cluster management.
  3. Find the virtual cluster whose alert rules you want to manage and click Details in the Actions column.
  4. In the left-side navigation pane, click Monitoring alarm. On the page that appears, click the Alarm tab.
  5. Find the alert rule that you want to manage, and click the required option in the Actions column.
    • To modify the alert rule, click Editing. In the Edit alarm dialog box, modify the alert rule and click OK.
    • To enable the alert rule, click Enable. After you enable the alert rule, the alert status is changed to Enabled.
    • To disable the alert rule, click Closed. After you disable the alert rule, the alert status is changed to Disabled.