Simple Log Service: Enable monitoring for data transformation jobs

Last Updated: Mar 04, 2024

After you enable monitoring for data transformation jobs, Simple Log Service sends alert notifications if exceptions occur during data transformation. This helps you handle exceptions at the earliest opportunity. This topic describes how to enable monitoring for data transformation jobs.

Prerequisites

A data transformation job is created. For more information, see Create a data transformation job.

Background information

  • After you create a data transformation job, Simple Log Service automatically creates a dashboard named Data Transformation Troubleshooting for the data transformation job. We recommend that you take note of the following metrics on the Data Transformation Troubleshooting dashboard:

    • System metrics: the data consumption delay and relevant exceptions.

    • Application metrics: the number of received logs and number of delivered logs.

    For more information, see Data transformation dashboard.

  • Simple Log Service provides built-in alert rules, a built-in action policy, and built-in alert templates for data transformation. You can use these built-in resources as follows:

    • You can enable the alert instances of built-in alert rules without the need to write SQL statements. For example, you can enable the rule that triggers an alert when delay, exceptions, or failures occur during data transformation. For more information, see Monitoring rules for data transformation.

    • You can specify notification methods and alert templates in the built-in action policy for data transformation.

    • You can specify the content of alert notifications in a built-in alert template for data transformation.

Step 1: Configure an action policy

By default, built-in alert rules for data transformation are associated with a built-in action policy whose ID is sls.app.etl.builtin. Before you enable the alert instances of built-in alert rules for data transformation, you must specify one or more notification methods in the action policy.

  1. Log on to the Simple Log Service console.

  2. Go to the Action Policy tab.

    1. In the Projects section, click the project that you want to manage.

    2. In the left-side navigation pane, click Alerts.

    3. On the Alert Center page, choose Notification Policy > Action Policy.

  3. On the Action Policy tab, find the built-in action policy whose ID is sls.app.etl.builtin and click Edit in the Actions column.

  4. In the Edit Action Policy dialog box, click the Primary Action Policy tab. On the Primary Action Policy tab, set the Request URL parameter in the DingTalk-Custom section to the webhook URL of your DingTalk chatbot. Use the default settings for other parameters and click OK.

    For information about how to obtain the webhook URL of a DingTalk chatbot, see DingTalk-Custom. You can use other alert notification methods based on your business requirements. For more information, see Notification methods.
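
Before you save the webhook URL in the action policy, it can help to confirm that the chatbot accepts messages. The following is a minimal sketch, assuming the standard text message format of DingTalk custom chatbots (`msgtype` set to `text`). The function names and the placeholder URL are illustrative and are not part of Simple Log Service. If your chatbot has security settings such as keywords or signatures enabled, the request may need to be adjusted.

```python
import json
import urllib.request


def build_text_message(content):
    """Return the JSON body of a DingTalk custom chatbot text message."""
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_test_message(webhook_url, content):
    """POST a text message to the webhook URL and return the raw response."""
    request = urllib.request.Request(
        webhook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Example call (placeholder URL, replace it with your own webhook URL):
# send_test_message(
#     "https://oapi.dingtalk.com/robot/send?access_token=<your-token>",
#     "SLS data transformation alert test",
# )
```

If the test message arrives in the DingTalk group, the same URL can be used as the value of the Request URL parameter.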

Step 2: Enable an alert instance

Simple Log Service provides built-in alert rules. You can enable the alert instances of the related alert rules based on your business requirements.

  1. On the Alert Center page, click Alert Rule, expand Create Alert, and then click Create from Template.

  2. On the Create from Template panel, click SLS Data Transformation and select a required template.

  3. In the Create Alert panel, configure the parameters and click OK to enable the alert instance.

    After you enable an alert instance, Simple Log Service monitors all data transformation jobs of the project in real time.

    If you want to monitor only a specific data transformation job, click the Query Statistics field in the Create Alert panel. On the Query Statistics page, click the Advanced Settings tab. In the Query editor, specify the ID of the data transformation job that you want to monitor.
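
The Job ID parameter of the built-in rules defaults to .* and accepts multiple job IDs separated by vertical bars (|). The following sketch shows how such a value can be assembled and checked locally. It assumes that the value is evaluated as a regular expression against the whole job ID, which is an assumption rather than documented behavior. The helper names and the second job ID are hypothetical.

```python
import re


def build_job_id_pattern(job_ids):
    """Join job IDs with '|'. An empty list falls back to the default '.*'."""
    if not job_ids:
        return ".*"
    return "|".join(re.escape(job_id) for job_id in job_ids)


def is_monitored(pattern, job_id):
    """Assumption: the pattern must match the entire job ID."""
    return re.fullmatch(pattern, job_id) is not None


pattern = build_job_id_pattern([
    "dd2de8e7e23f3e42ffbb32fe05710372",  # example ID from this topic
    "0123456789abcdef0123456789abcdef",  # hypothetical second job ID
])
print(pattern)
print(is_monitored(pattern, "dd2de8e7e23f3e42ffbb32fe05710372"))
```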

Related operations

| Operation | Description |
| --- | --- |
| Configure whitelists | You can configure whitelists for specific monitoring rules. This way, alerts are not triggered by specific data transformation jobs. |
| Add an alert instance | You can add an alert instance for an alert rule. You can also configure the alert instance to monitor specific data transformation jobs. |
| Disable an alert instance | If you disable an alert instance, the value in the Status column of the alert instance changes to Not Enabled, and no more alerts are triggered based on the alert instance. The configurations of the alert instance are not deleted. If you want to re-enable the alert instance to monitor data, you do not need to reconfigure the parameters of the alert instance. |
| Pause an alert instance | If you pause an alert instance, no alerts are triggered based on the alert instance within a specified period of time. |
| Resume an alert instance | You can resume a paused alert instance based on your business requirements. |
| Delete an alert instance | If you delete an alert instance, the value in the Status column of the alert instance changes to Not Created. The configurations of the alert instance, such as the specified IDs of data transformation jobs, are deleted. If you want to recreate the alert instance to monitor data, you must reconfigure the parameters of the alert instance. |
| Modify an alert instance | You can modify the parameters of an alert instance, such as the alert name, the IDs of data transformation jobs that you want to monitor, threshold, action policy, and severity. |

Monitoring rules for data transformation

Simple Log Service provides the following built-in monitoring rules for data transformation. For information about how to manage alert rules, see Related operations.

The following tables describe the functionalities, parameters, and associated dashboard metrics of the built-in monitoring rules that are provided by Simple Log Service for data transformation. The tables also describe the handling methods that are used to clear alerts.

  • Data Transformation Delay Monitor rule

    Item

    Description

    Rule name

    Data Transformation Delay Monitor

    Functionality

    This rule monitors the latency that occurs when data is consumed from shards in data transformation jobs. If the latency during data transformation exceeds the value of the Threshold parameter, an alert is triggered.

    Parameters

    • Job ID: the ID of the data transformation job that you want to monitor. Example: dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation jobs are monitored. Separate multiple job IDs with vertical bars (|).

    • Threshold: If the latency of a data transformation job exceeds the value of this parameter, an alert is triggered. Default value: 300. Unit: seconds.

    • Action Policy: the action policy that is associated with the current alert rule. Simple Log Service sends alert notifications to the specified users based on the action policy. Default value: sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.

    • Severity: the severity of an alert.

    • Repeat Interval: the minimum interval between alert notifications for repeated alerts. Simple Log Service sends at most one alert notification within each interval, even if the alert is triggered repeatedly. For example, if you set the Repeat Interval parameter to 1d, 2h, or 3m, Simple Log Service sends only one alert notification within 1 day, 2 hours, or 3 minutes, respectively.

    Associated dashboard

    Data Transformation Troubleshooting > shard consumption delay (seconds)

    Handling method

    You can clear triggered alerts based on the following rules:

    1. If the data volume in the source Logstore significantly increases, perform the following operations based on your business requirements:

      • If the value of the Transform speed (lines/s) metric increases and the value of the shard consumption delay (seconds) metric decreases, the data transformation job is automatically scaling up resources due to the increasing data volume in the source Logstore. In this case, wait for 5 minutes and then check whether the latency is less than the specified threshold. If not, proceed to the next step.

      • If the value of the Transform speed (lines/s) metric does not increase or the value of the shard consumption delay (seconds) metric continues to increase, the number of shards in the source Logstore may be insufficient, which limits how far the resources for data transformation can scale out. In this case, you must split the shards in the source Logstore. For more information, see Split a shard. After you split the shards, wait for 5 minutes and then check whether the latency is less than the specified threshold. If not, proceed to the next step.

    2. If alerts are triggered based on the Data Transformation Error Monitor rule, you must clear the alerts first. After you clear the alerts, wait for 5 minutes and then check whether the latency is less than the specified threshold. If not, proceed to the next step.

    3. If the alerts persist, prepare the information about the related project, Logstore, and data transformation job ID, and then submit a ticket to contact Alibaba Cloud technical support.
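
The effect of the Repeat Interval parameter described above can be illustrated with a short sketch. It assumes that a notification is sent for the first alert and that later alerts are suppressed until the interval since the last notification has elapsed. The exact suppression logic of Simple Log Service is not described in this topic, and the function names are illustrative only.

```python
import re

# Seconds per unit for the interval suffixes used in this topic.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60}


def parse_interval(value):
    """Convert a value such as '1d', '2h', or '3m' to seconds."""
    match = re.fullmatch(r"(\d+)([dhm])", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def notifications_sent(alert_times, repeat_interval):
    """Return the alert times (in seconds) that result in a notification.

    Assumption: an alert is notified only if at least one full interval
    has passed since the previous notification.
    """
    interval = parse_interval(repeat_interval)
    sent = []
    for t in sorted(alert_times):
        if not sent or t - sent[-1] >= interval:
            sent.append(t)
    return sent


# Alerts fire at 0 s, 60 s, 120 s, 180 s, and 200 s with a 3-minute interval.
print(notifications_sent([0, 60, 120, 180, 200], "3m"))  # [0, 180]
```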

  • Data Transformation Error Monitor rule

    Item

    Description

    Rule name

    Data Transformation Error Monitor

    Functionality

    This rule monitors exceptions in data transformation jobs. If an exception occurs during data transformation, an alert is triggered.

    Parameters

    • Job ID: the ID of the data transformation job that you want to monitor. Example: dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation jobs are monitored. Separate multiple job IDs with vertical bars (|).

    • Action Policy: the action policy that is associated with the current alert rule. Simple Log Service sends alert notifications to the specified users based on the action policy. Default value: sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.

    • Severity: the severity of an alert.

    • Repeat Interval: the minimum interval between alert notifications for repeated alerts. Simple Log Service sends at most one alert notification within each interval, even if the alert is triggered repeatedly. For example, if you set the Repeat Interval parameter to 1d, 2h, or 3m, Simple Log Service sends only one alert notification within 1 day, 2 hours, or 3 minutes, respectively.

    Associated dashboard

    Data Transformation Troubleshooting > Exception detail

    Handling method

    Fix exceptions based on the related error messages.

    • If the error message contains Unauthorized, InvalidAccessKeyId, or SignatureNotMatch, the data transformation job does not have the required permissions to read data from the source Logstore or write data to the destination Logstore. For more information, see Authorization overview.

    • If the error message contains ProjectNotExist or LogStoreNotExist, the related project or Logstore of the data transformation job does not exist. In this case, log on to the Simple Log Service console to identify and fix the error.

    • If the error message contains SettingError, the configurations of the data transformation job are invalid. For example, if the specified parameter in a function is invalid or the configuration of an external Alibaba Cloud resource such as an Object Storage Service (OSS) bucket or ApsaraDB RDS for MySQL instance is invalid, an error occurs. For more information, see Function overview.

    • If the error message contains TransformError, the raw data in the source Logstore does not meet the logic of the current data transformation job. This error may occur when new types of data are imported to the source Logstore. In this case, locate the raw data based on the error message, update the data transformation job, and then try again. For more information, see Manage a data transformation job.
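
The keyword-based triage above can be expressed as a small lookup. The following sketch maps the keywords listed in this topic to the corresponding handling hints. The function name and the sample error messages are hypothetical and do not reproduce actual Simple Log Service output.

```python
# Keyword groups and hints, taken from the handling method described above.
HANDLING_HINTS = [
    (("Unauthorized", "InvalidAccessKeyId", "SignatureNotMatch"),
     "Check the permissions to read the source Logstore and write to the "
     "destination Logstore."),
    (("ProjectNotExist", "LogStoreNotExist"),
     "Check in the console whether the project and Logstore exist."),
    (("SettingError",),
     "Check the job configuration, function parameters, and external "
     "resources such as OSS buckets or RDS instances."),
    (("TransformError",),
     "Locate the raw data named in the message and update the job logic."),
]


def suggest_handling(error_message):
    """Return the first hint whose keyword appears in the error message."""
    for keywords, hint in HANDLING_HINTS:
        if any(keyword in error_message for keyword in keywords):
            return hint
    return "No known keyword found. Inspect the full error message."


# Hypothetical error messages used only for illustration.
print(suggest_handling("InvalidAccessKeyId: the AccessKey ID is invalid"))
print(suggest_handling("TransformError: field 'status' is missing"))
```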

  • Data Transformation Flow (Absolute Value) Monitor rule

    Item

    Description

    Rule name

    Data Transformation Flow (Absolute Value) Monitor

    Functionality

    This rule monitors the average number of logs that are transformed by data transformation jobs within 5 minutes. If the average number of transformed logs is less than the value of the Threshold parameter, an alert is triggered.

    Parameters

    • Job ID: the ID of the data transformation job that you want to monitor. Example: dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation jobs are monitored. Separate multiple job IDs with vertical bars (|).

    • Threshold: If the average number of transformed logs is less than the value of this parameter, an alert is triggered. Default value: 40000. Unit: lines/s.

    • Action Policy: the action policy that is associated with the current alert rule. Simple Log Service sends alert notifications to the specified users based on the action policy. Default value: sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.

    • Severity: the severity of an alert.

    • Repeat Interval: the minimum interval between alert notifications for repeated alerts. Simple Log Service sends at most one alert notification within each interval, even if the alert is triggered repeatedly. For example, if you set the Repeat Interval parameter to 1d, 2h, or 3m, Simple Log Service sends only one alert notification within 1 day, 2 hours, or 3 minutes, respectively.

    Associated dashboard

    Data Transformation Troubleshooting > Transform speed (lines/s)

    Handling method

    You can clear triggered alerts based on the following rules:

    1. If the Transform speed (lines/s) metric rises and falls in line with the data volume in the source Logstore, the number of transformed logs is limited by the data volume in the source Logstore. If not, proceed to the next step.

    2. If alerts are triggered based on the Data Transformation Delay Monitor rule, you must clear the alerts first. After you clear the alerts, wait for 15 minutes. If the latency is less than 1 minute but the trend in the amount of the transformed data is inconsistent with the increase or decrease trend in the data volume in the source Logstore, proceed to the next step.

    3. If the alerts persist, prepare the information about the related project, Logstore, and data transformation job ID, and then submit a ticket to contact Alibaba Cloud technical support.

  • Data Transformation Flow (Daily Compare) Monitor rule

    Item

    Description

    Rule name

    Data Transformation Flow (Daily Compare) Monitor

    Functionality

    This rule monitors the increase rate and decrease rate of the transformed data in data transformation jobs within 5 minutes compared with the same period of the previous day. If the increase rate is greater than the value of the Asc Threshold parameter or the decrease rate is greater than the value of the Desc Threshold parameter, an alert is triggered.

    Parameters

    • Job ID: the ID of the data transformation job that you want to monitor. Example: dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation jobs are monitored. Separate multiple job IDs with vertical bars (|).

    • Asc Threshold: If the daily increase rate of transformed data is greater than the value of this parameter, an alert is triggered. Default value: 40%.

    • Desc Threshold: If the daily decrease rate of transformed data is greater than the value of this parameter, an alert is triggered. Default value: 20%.

    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. Default value: sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.

    • Severity: the severity of an alert.

    • Repeat Interval: the minimum interval between alert notifications for repeated alerts. Simple Log Service sends at most one alert notification within each interval, even if the alert is triggered repeatedly. For example, if you set the Repeat Interval parameter to 1d, 2h, or 3m, Simple Log Service sends only one alert notification within 1 day, 2 hours, or 3 minutes, respectively.

    Associated dashboard

    Data Transformation Troubleshooting > Transform speed (lines/s)

    Handling method

    You can clear triggered alerts based on the following rules:

    1. If the Transform speed (lines/s) metric rises and falls in line with the data volume in the source Logstore, the number of transformed logs is limited by the data volume in the source Logstore. If not, proceed to the next step.

    2. If alerts are triggered based on the Data Transformation Delay Monitor rule, you must clear the alerts first. After you clear the alerts, wait for 15 minutes. If the latency is less than 1 minute but the trend in the amount of the transformed data is inconsistent with the increase or decrease trend in the data volume in the source Logstore, proceed to the next step.

    3. If the alerts persist, prepare the information about the related project, Logstore, and data transformation job ID, and then submit a ticket to contact Alibaba Cloud technical support.
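
The trigger condition of the Data Transformation Flow (Daily Compare) Monitor rule can be illustrated with a worked calculation. The sketch below assumes that the rate is the relative change compared with the same period of the previous day, (current - previous) / previous. This topic does not state the exact formula, so treat this as an illustration. The default thresholds of 40% and 20% are taken from the parameter descriptions above, and the function names are hypothetical.

```python
def daily_change_rate(current, previous):
    """Signed relative change versus the same period of the previous day."""
    if previous <= 0:
        raise ValueError("previous-day volume must be positive")
    return (current - previous) / previous


def should_alert(current, previous, asc_threshold=0.40, desc_threshold=0.20):
    """Return True if the increase or decrease rate exceeds its threshold."""
    rate = daily_change_rate(current, previous)
    increase_rate = max(rate, 0.0)
    decrease_rate = max(-rate, 0.0)
    return increase_rate > asc_threshold or decrease_rate > desc_threshold


# 150,000 lines today versus 100,000 yesterday: +50% exceeds Asc Threshold.
print(should_alert(150_000, 100_000))  # True
# 90,000 lines today versus 100,000 yesterday: -10% is within Desc Threshold.
print(should_alert(90_000, 100_000))   # False
```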

  • Data Transformation Failed Lines Monitor rule

    Item

    Description

    Rule name

    Data Transformation Failed Lines Monitor

    Functionality

    This rule monitors the number of logs that fail to be transformed by data transformation jobs within 15 minutes. If the number of logs that fail to be transformed during data transformation exceeds the value of the Threshold parameter, an alert is triggered.

    Parameters

    • Job ID: the ID of the data transformation job that you want to monitor. Example: dd2de8e7e23f3e42ffbb32fe05710372.

      Default value: .*. This value indicates that all data transformation jobs are monitored. Separate multiple job IDs with vertical bars (|).

    • Threshold: If the number of logs that fail to be transformed exceeds the value of this parameter, an alert is triggered. Default value: 10.

    • Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. Default value: sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.

    • Severity: the severity of an alert.

    • Repeat Interval: the minimum interval between alert notifications for repeated alerts. Simple Log Service sends at most one alert notification within each interval, even if the alert is triggered repeatedly. For example, if you set the Repeat Interval parameter to 1d, 2h, or 3m, Simple Log Service sends only one alert notification within 1 day, 2 hours, or 3 minutes, respectively.

    Associated dashboard

    Data Transformation Troubleshooting > Total logs failed

    Handling method

    You can clear triggered alerts based on the following rules:

    1. Clear the alerts by using the method that is provided by the Data Transformation Error Monitor rule. If no error message is reported, proceed to the next step.

    2. If the alerts persist, prepare the information about the related project, Logstore, and data transformation job ID, and then submit a ticket to contact Alibaba Cloud technical support.