Microservices Engine: Configure monitoring and alerting

Last Updated: Apr 23, 2025

Alibaba Cloud XXL-JOB allows you to configure monitoring and alerting at the job, instance, and application levels. You can configure monitoring and alerting while a job is running, which helps you understand the execution status of jobs and handle issues at the earliest opportunity.

Configure monitoring and alerting at the job level

Procedure

  1. Log on to the MSE console, and select a region in the top navigation bar.

  2. Go to the XXL-JOB Version page, find the instance that you want to manage, and then click its ID.

  3. In the left-side navigation pane, click Task Management.

  4. On the Task Management page, find the job that you want to manage and click Edit in the Operation column.

  5. In the Notification Configuration step, configure monitoring and alerting.

The following table describes the parameters.

| Parameter | Description | Default value |
| --- | --- | --- |
| Timeout alarm | Specifies whether to send an alert if the job times out. | Open |
| Timeout | The timeout period. Unit: seconds. If the job runs longer than the specified period, a timeout alert is sent. | 7200 |
| Timeout termination | Specifies whether to automatically stop the job if the job times out. This prevents the job timeout from affecting the next scheduling. | Close |
| Notification of success | Specifies whether to notify the contacts if the job is successfully run. We recommend that you enable this feature for important jobs. | Close |
| Failure alarm | Specifies whether to send an alert if the job fails. | Open |
| Number of consecutive failures | The number of consecutive job failures before an alert can be sent. | 1 |
| No Machine Alarm Available | Specifies whether to send an alert if no executor is available when the job is scheduled to run. | Open |
| Notification Method | The notification methods. Valid values: SMS, webhook, Mail, and Telephone. If you select webhook, see the Important note below this table. | No default value |
| Notification Object | The contacts. The contacts added in CloudMonitor are displayed. You must configure alert contacts in the CloudMonitor console first. Make sure that the contact information of the alert contacts is verified. | No default value |

Important

If you select webhook for the Notification Method parameter, you must perform the following operations.

  1. If you use webhooks, such as DingTalk, WeCom, or Lark webhooks, alerts are sent over the Internet. Therefore, you must associate an Internet NAT gateway with the virtual private cloud (VPC) in which your XXL-JOB instance resides.

  2. If security settings are required, add a specific keyword to the message whitelist of the corresponding chatbot. For example, if you use a DingTalk chatbot, you must add the keyword SchedulerX (case-sensitive) in the security settings. Otherwise, the alert information cannot be received.
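To make the interaction between the job-level parameters easier to follow, the sketch below models them in Python. It is an illustration only, not the service's implementation: the class `JobAlertConfig`, the function `events_for_run`, and the event names are hypothetical, and the sketch assumes that timeout handling and failure handling are evaluated independently for each run. The default values are taken from the table above.

```python
from dataclasses import dataclass


@dataclass
class JobAlertConfig:
    """Hypothetical mirror of the job-level parameters and their documented defaults."""
    timeout_alarm: bool = True          # Timeout alarm: Open
    timeout_seconds: int = 7200         # Timeout: 7200
    timeout_termination: bool = False   # Timeout termination: Close
    notify_on_success: bool = False     # Notification of success: Close
    failure_alarm: bool = True          # Failure alarm: Open
    consecutive_failures: int = 1       # Number of consecutive failures: 1


def events_for_run(cfg, elapsed_seconds, succeeded, prior_consecutive_failures):
    """Return the (hypothetical) events that one job run would trigger under cfg."""
    events = []

    # A run that exceeds the timeout period may raise an alert and/or be stopped.
    if elapsed_seconds > cfg.timeout_seconds:
        if cfg.timeout_alarm:
            events.append("timeout_alert")
        if cfg.timeout_termination:
            events.append("terminate_job")

    if succeeded:
        if cfg.notify_on_success:
            events.append("success_notification")
    else:
        # The failure alert fires only once the consecutive-failure threshold is reached.
        failures = prior_consecutive_failures + 1
        if cfg.failure_alarm and failures >= cfg.consecutive_failures:
            events.append("failure_alert")

    return events
```

With the defaults, a single failed run already produces a failure alert. Raising `consecutive_failures` to 3 suppresses the alert until the third failure in a row, which can reduce noise for jobs that occasionally fail and recover on retry.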

Configure monitoring and alerting at the instance and application levels

Alibaba Cloud XXL-JOB allows you to configure monitoring and alerting for an instance in the CloudMonitor console based on job statistics. Common scenarios:

  • Instance level: Send an alert if the total number of jobs scheduled for an instance decreases by 30%.

  • Application level: Send an alert if a job of an application fails 5 times within 3 minutes.
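The two scenarios above correspond to two common kinds of alert condition: a percentage drop between two samples and a count within a time window. The following Python sketch shows one way to express each condition. It is not how CloudMonitor evaluates rules; the names `dropped_by` and `FailureWindow` are hypothetical, and the sketch assumes that "by 30%" means a drop of at least 30% and that the window is a sliding one.

```python
from collections import deque


def dropped_by(previous_total, current_total, threshold_pct=30):
    """Return True if current_total is at least threshold_pct percent below previous_total."""
    if previous_total <= 0:
        return False  # No baseline to compare against.
    # Compare without division so integer inputs are evaluated exactly.
    return (previous_total - current_total) * 100 >= threshold_pct * previous_total


class FailureWindow:
    """Counts job failures inside a sliding time window (timestamps in seconds)."""

    def __init__(self, max_failures=5, window_seconds=180):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()

    def record_failure(self, timestamp):
        """Record one failure and return True if the alert condition is met."""
        self._failures.append(timestamp)
        # Discard failures that are no longer inside the window.
        while self._failures and timestamp - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

For example, `dropped_by(100, 70)` is true, and five failures recorded 30 seconds apart trigger the window condition on the fifth failure, whereas failures spaced 60 seconds apart never do.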

Procedure

  1. Log on to the MSE console, and select a region in the top navigation bar. Go to the XXL-JOB Version page, find the instance that you want to manage, and then click its ID. In the left-side navigation pane, click Basic information.

  2. In the Scheduling Statistics section of the Basic information page, click the image icon and select Configure Alert Rules or simply click the image icon to go to the Alert Rules page.

  3. On the Alert Rules page, click Create Alert Rule. In the Create Alert Rule panel, select schedulerx3 from the Product drop-down list. Set the Resource Range parameter to Instances, and click Add Instance to add one or more instances. Then, configure an alert rule.

    Important

    Mute Period indicates the period during which an alert is not sent again after it is sent once. After the period elapses, the alert is sent again if the specified conditions are still met.

  4. Click Add Rule and select a metric, which can be an instance-level metric or an application-level metric, based on your business requirements.

    • Instance-level metric: Collects statistics on the execution of all jobs of the instance.

    • Application-level metric: You must select an application name. This way, the specified metric collects statistics on the execution of jobs that correspond to the specified application of the instance.

      • If the specified metric of the application is never reported, an option cannot be automatically selected from the appName drop-down list. You can manually enter an application name.

      • If appName is left empty, all applications follow this alert rule by default.

  5. Select a contact group from the Alert Contact Group drop-down list and click Confirm.

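The Mute Period behavior described in step 3 can be sketched as a small suppression check. This is a simplified model under stated assumptions, not CloudMonitor's actual logic: the class `MutedAlert` and its method are hypothetical, time is passed in as plain seconds, and the sketch assumes the mute period starts when an alert is sent and is not reset when the condition clears.

```python
class MutedAlert:
    """Suppresses repeat alerts for mute_seconds after an alert is sent (hypothetical model)."""

    def __init__(self, mute_seconds):
        self.mute_seconds = mute_seconds
        self._last_sent = None  # Time of the most recent alert, or None if none was sent.

    def should_send(self, condition_met, now):
        """Return True if an alert should be sent at time now."""
        if not condition_met:
            return False
        # Inside the mute period, the alert is not sent again.
        if self._last_sent is not None and now - self._last_sent < self.mute_seconds:
            return False
        self._last_sent = now
        return True
```

With a 600-second mute period, a condition that stays true produces an alert at 0 seconds, nothing at 300 seconds, and a second alert at 600 seconds.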