After you enable monitoring for data transformation tasks, Log Service sends alert notifications when exceptions occur during data transformation. This helps you handle exceptions in a timely manner. This topic describes how to enable monitoring for data transformation tasks.
Prerequisites
Background information
- After you create a data transformation rule, Log Service automatically creates a dashboard named Data Transformation Troubleshooting for the data transformation task. We recommend that you take note of the following metrics on the Data Transformation Troubleshooting dashboard:
- System metrics: the data consumption delay and relevant exceptions.
- Application metrics: the number of received log entries and number of delivered log entries.
- Log Service provides built-in monitoring rules, an action policy, and alert templates for data transformation. You can use these built-in resources as follows:
- Alert Center provides built-in monitoring rules for data transformation. To configure alerts, you only need to enable the alert instance of a monitoring rule; you do not need to write SQL statements. For example, you can enable the alert instances of the rules that trigger alerts when delays, exceptions, or failures occur during data transformation. For more information, see Monitoring rules for data transformation.
- You can specify notification methods and alert templates in the built-in action policy for data transformation.
- You can specify the content of alert notifications in a built-in alert template for data transformation.
Configuration process
- Use built-in resources.
To configure alerts in an efficient manner, perform the following operations:
- Create a DingTalk chatbot.
Configure a DingTalk chatbot to receive alert notifications.
- Configure an action policy.
Specify the webhook URL of the preceding DingTalk chatbot for the built-in action policy for data transformation. Log Service sends alert notifications by using the webhook URL.
- Enable alert instances.
- Use custom resources.
To create custom resources and use them to configure recipients, alert templates, and notification methods based on your business requirements, perform the following operations:
- Create users.
Configure users or user groups to receive alert notifications. For more information, see Create users and user groups.
- Create an alert template.
Configure the content of alert notifications. For more information, see Create an alert template.
- Create an action policy.
Configure notification methods, such as Voice Call, SMS Message, and Email. For more information, see Create an action policy.
- Enable alert instances.
The built-in resources that are provided by Log Service can be applied to most alerting scenarios. You can use built-in resources or custom resources based on your business requirements. In this example, built-in resources are used to configure alerts.
Step 1: Create a DingTalk chatbot
By default, the built-in action policy for data transformation uses DingTalk-Custom as the notification method to send alert notifications. Before you enable monitoring, you must create a DingTalk chatbot. After an alert is triggered, Log Service sends an alert notification to the specified DingTalk group by using the webhook URL of the DingTalk chatbot.
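Before you paste the webhook URL into the action policy, you may want to confirm that the chatbot can post to the group. The following Python sketch uses only the standard library to send a plain-text message in the JSON format that DingTalk custom chatbots accept (msgtype set to text). The placeholder URL and the helper names build_test_request and send_test_message are hypothetical. If the chatbot uses a custom keyword as its security setting, the message text must contain that keyword; if it uses signing, this sketch does not add the required signature parameters.

```python
import json
import urllib.request

# Hypothetical placeholder: replace it with the webhook URL of your chatbot.
WEBHOOK_URL = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN"


def build_test_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request that carries a plain-text DingTalk chatbot message."""
    payload = {"msgtype": "text", "text": {"content": text}}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str, text: str, timeout: float = 10.0) -> dict:
    """Send the message and return the JSON body of DingTalk's response."""
    request = build_test_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, calling send_test_message(WEBHOOK_URL, "alert test") should post the message to the DingTalk group; a response whose errcode is 0 indicates that DingTalk accepted the message.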
Step 2: Configure an action policy
Modify the request URL of the DingTalk-Custom notification method for the built-in action policy for data transformation. This way, Log Service sends alert notifications by using the specified webhook URL of the DingTalk chatbot.
- Log on to the Log Service console.
- Go to the Action Policy tab.
- In the Projects section, click the project in which you created the data transformation task.
- In the left-side navigation pane, click Alerts.
- Click Open Alert Center and choose .
- On the Action Policy tab, click Modify in the Actions column of the built-in action policy whose ID is sls.app.etl.builtin.
- In the Edit Action Policy dialog box, click the Primary Action Policy tab and set the Request URL parameter under DingTalk-Custom to the webhook URL that is obtained in Step 1: Create a DingTalk chatbot. Then, click OK.
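A mistyped Request URL prevents alert notifications from reaching the DingTalk group. As an optional check before you click OK, the following sketch tests whether a string has the usual shape of a DingTalk custom chatbot webhook. The expected scheme, host, path, and access_token query parameter reflect the format that DingTalk commonly issues and are assumptions here; the function name is hypothetical, and passing this check does not prove that the token itself is valid.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_dingtalk_webhook(url: str) -> bool:
    """Heuristic shape check for a DingTalk custom chatbot webhook URL.

    Assumption: the webhook uses HTTPS, the host oapi.dingtalk.com,
    the path /robot/send, and a non-empty access_token query parameter.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.hostname != "oapi.dingtalk.com":
        return False
    if parsed.path != "/robot/send":
        return False
    # parse_qs drops parameters with blank values, so "access_token=" fails here.
    token = parse_qs(parsed.query).get("access_token", [""])[0]
    return bool(token)
```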
Step 3: Enable alert instances
Related operations
Operation | Description |
---|---|
Configure whitelists | You can configure whitelists for specific monitoring rules. This way, the specified data transformation tasks do not trigger alerts. |
Add alert instances | You can add an alert instance of a monitoring rule and configure its settings to monitor specific data transformation tasks. |
Disable alert instances | If you disable an alert instance, the status in the Status column of the alert instance changes to Not Enabled, and no more alerts are triggered based on the alert instance.
The configurations of the alert instance are not deleted. If you want to re-enable the alert instance to monitor data transformation tasks, you do not need to reconfigure the parameters of the alert instance. |
Pause alert instances | If you pause an alert instance, no alerts are triggered within a specified period of time based on the alert instance. |
Resume alert instances | You can resume paused alert instances. |
Delete alert instances | If you delete an alert instance, the status in the Status column of the alert instance changes to Not Created.
The configurations of the alert instance are deleted, such as the settings of data transformation tasks. If you want to re-enable the alert instance to monitor data transformation tasks, you must set the parameters of the alert instance again. |
Modify alert instances | You can modify the parameters of an alert instance, such as the alert name, the IDs of data transformation tasks that you want to monitor, monitoring threshold, action policy, and severity. |
Monitoring rules for data transformation
The following tables describe the functionality, parameters, and associated dashboard metrics of each built-in monitoring rule that Log Service provides for data transformation. The tables also describe how to handle and clear triggered alerts.
- Delay monitoring rule during data transformation
Item | Description |
---|---|
Rule name | Delay Monitoring during Data Transformation |
Functionality | This rule monitors the delay that may occur when data is consumed from shards in data transformation tasks. If the delay during data transformation exceeds the value of the Monitoring Threshold parameter, an alert is triggered. |
Parameters | - Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372. Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (\|).
- Monitoring Threshold: If the delay during data consumption exceeds this value, an alert is triggered. Default value: 300. Unit: seconds.
- Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
- Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High. |
Associated dashboard | Data Transformation Troubleshooting > shard consumption delay (seconds) |
Handling method | You can clear triggered alerts based on the following rules:
- If the amount of data in the source Logstore significantly increases, perform the following operations as needed:
- If the value of the Transform speed (lines/s) metric increases and the value of the shard consumption delay (seconds) metric decreases at the same time, the data transformation task is automatically scaling up resources to handle the increasing data volume in the source Logstore. In this case, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
- If the value of the Transform speed (lines/s) metric does not increase or the value of the shard consumption delay (seconds) metric continues to increase, the source Logstore may not have enough shards, which limits how far the data transformation task can scale out. In this case, split the shards in the source Logstore. For more information, see Split a shard. After you split the shards, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
- If alerts are triggered based on the Exception Monitoring during Data Transformation rule, clear those alerts first. After you clear them, wait for 5 minutes and then check whether the delay is lower than the specified threshold. If not, proceed to the next step.
- If you cannot clear the alerts, prepare the information about the related project, Logstore, and data transformation task ID, and then submit a ticket to Alibaba Cloud technical support. |
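The alert condition of the delay rule can be sketched in Python as follows. This is an illustration, not the query that the built-in rule runs. It assumes that the Data Transformation ID parameter is treated as a regular expression that must match the whole task ID (so the default .* matches every task and id1|id2 matches either ID), and that the shard consumption delay is compared with the Monitoring Threshold in seconds. The function names are hypothetical.

```python
import re


def task_is_monitored(task_id: str, id_pattern: str = ".*") -> bool:
    """Return True if task_id matches the Data Transformation ID parameter.

    Assumption: the parameter is a regular expression that must match the
    whole task ID; "id1|id2" therefore matches exactly id1 or id2.
    """
    return re.fullmatch(id_pattern, task_id) is not None


def delay_alert(task_id: str, delay_seconds: float,
                id_pattern: str = ".*", threshold_seconds: float = 300) -> bool:
    """Alert when a monitored task's consumption delay exceeds the threshold."""
    return task_is_monitored(task_id, id_pattern) and delay_seconds > threshold_seconds
```

Because the rule triggers only when the delay exceeds the threshold, a delay of exactly 300 seconds does not trigger an alert with the default settings.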
- Exception monitoring rule during data transformation
Item | Description |
---|---|
Rule name | Exception Monitoring during Data Transformation |
Functionality | This rule monitors exceptions in data transformation tasks. If an exception occurs during data transformation, an alert is triggered. |
Parameters | - Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372. Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (\|).
- Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
- Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High. |
Associated dashboard | Data Transformation Troubleshooting > Exception detail |
Handling method | Fix exceptions based on the related error messages:
- If the error message contains Unauthorized, InvalidAccessKeyId, or SignatureNotMatch, the data transformation task does not have the required permissions to read data from the source Logstore or write data to the destination Logstore. For more information, see Authorization overview.
- If the error message contains ProjectNotExist or LogStoreNotExist, the related project or Logstore of the data transformation task does not exist. In this case, log on to the Log Service console to identify and fix the error.
- If the error message contains SettingError, the configuration of the data transformation rule is invalid. For example, an error occurs if the parameters specified in a function are invalid or if the configuration of an external resource such as Object Storage Service (OSS) or ApsaraDB RDS for MySQL is invalid. For more information, see Function overview.
- If the error message contains TransformError, the raw data in the source Logstore does not match the logic of the current data transformation rule. This error may occur when new types of data are imported to the source Logstore. In this case, identify the raw data from the error message, update the data transformation task, and then try again. For more information, see Manage a data transformation task. |
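The handling method for this rule amounts to a keyword lookup on the error message. The following sketch encodes that lookup, for example to sort error messages copied from the Exception detail chart. The keywords come from the handling method above; the helper name triage and the shortened hint texts are hypothetical, and messages are matched by substring in the order listed.

```python
# Keywords are taken from the handling method; hint texts are shortened paraphrases.
ERROR_HINTS = [
    (("Unauthorized", "InvalidAccessKeyId", "SignatureNotMatch"),
     "Permissions: check authorization for the source and destination Logstores."),
    (("ProjectNotExist", "LogStoreNotExist"),
     "Missing resource: check that the project and Logstores exist."),
    (("SettingError",),
     "Invalid rule configuration: check function parameters and external resources."),
    (("TransformError",),
     "Data does not fit the rule: inspect the raw data and update the task."),
]


def triage(error_message: str) -> str:
    """Return the hint for the first keyword group found in the error message."""
    for keywords, hint in ERROR_HINTS:
        if any(keyword in error_message for keyword in keywords):
            return hint
    return "No known keyword found: read the full error message."
```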
- Traffic monitoring rule during data transformation (absolute value)
Item | Description |
---|---|
Rule name | Traffic Monitoring during Data Transformation (Absolute Value) |
Functionality | This rule monitors the average number of log entries that data transformation tasks transform per second over a 5-minute period. If this average is lower than the value of the Monitoring Threshold parameter, an alert is triggered. |
Parameters | - Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372. Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (\|).
- Monitoring Threshold: If the average number of transformed log entries is lower than this value, an alert is triggered. Default value: 40000. Unit: lines/s.
- Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
- Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High. |
Associated dashboard | Data Transformation Troubleshooting > Transform speed (lines/s) |
Handling method | You can clear triggered alerts based on the following rules:
- If the Transform speed (lines/s) metric rises and falls in line with the data volume in the source Logstore, the number of transformed log entries is limited by the data volume in the source Logstore. If not, proceed to the next step.
- If alerts are triggered based on the Delay Monitoring during Data Transformation rule, clear those alerts first. After you clear them, wait for 15 minutes. If the delay is less than 1 minute but the amount of transformed data still does not follow the increase or decrease trend of the data volume in the source Logstore, proceed to the next step.
- If you cannot clear the alerts, prepare the information about the related project, Logstore, and data transformation task ID, and then submit a ticket to Alibaba Cloud technical support. |
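Because the threshold of the absolute-value traffic rule is expressed in lines/s, the check can be read as dividing the number of log entries transformed in the 5-minute window by 300 seconds. For example, 9,000,000 entries in 5 minutes is an average of 30,000 lines/s, which is below the default threshold of 40000 and triggers an alert. The following sketch assumes this reading of the rule; the function names are hypothetical.

```python
WINDOW_SECONDS = 5 * 60  # the rule evaluates a 5-minute window


def average_speed(lines_in_window: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Average transform speed, in lines/s, over the window."""
    return lines_in_window / window_seconds


def low_traffic_alert(lines_in_window: int, threshold_lines_per_s: float = 40000) -> bool:
    """Alert when the average speed is lower than the Monitoring Threshold."""
    return average_speed(lines_in_window) < threshold_lines_per_s
```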
- Traffic monitoring rule during data transformation (compared with previous day)
Item | Description |
---|---|
Rule name | Traffic Monitoring during Data Transformation (Compared with Previous Day) |
Functionality | This rule compares the amount of data transformed by data transformation tasks with the amount transformed in the same period of the previous day. If the increase rate exceeds the value of the Daily Increase Threshold parameter or the decrease rate exceeds the value of the Daily Decrease Threshold parameter, an alert is triggered. |
Parameters | - Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372. Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (\|).
- Daily Increase Threshold: If the daily increase rate of transformed data is greater than this value, an alert is triggered. Default value: 40%.
- Daily Decrease Threshold: If the daily decrease rate of transformed data is greater than this value, an alert is triggered. Default value: 20%.
- Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
- Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High. |
Associated dashboard | Data Transformation Troubleshooting > Transform speed (lines/s) |
Handling method | You can clear triggered alerts based on the following rules:
- If the Transform speed (lines/s) metric rises and falls in line with the data volume in the source Logstore, the number of transformed log entries is limited by the data volume in the source Logstore. If not, proceed to the next step.
- If alerts are triggered based on the Delay Monitoring during Data Transformation rule, clear those alerts first. After you clear them, wait for 15 minutes. If the delay is less than 1 minute but the amount of transformed data still does not follow the increase or decrease trend of the data volume in the source Logstore, proceed to the next step.
- If you cannot clear the alerts, prepare the information about the related project, Logstore, and data transformation task ID, and then submit a ticket to Alibaba Cloud technical support. |
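The rates in the previous-day traffic rule can be read as the relative change against the same period of the previous day, that is, (current - previous) / previous. With the default thresholds, a change from 1,000,000 to 1,500,000 transformed entries (+50%) triggers an alert, whereas a change to 1,300,000 (+30%) does not; a drop to 700,000 (-30%) triggers an alert, whereas a drop to 850,000 (-15%) does not. The following sketch assumes this definition and a positive previous-day value; the function names are hypothetical.

```python
def day_over_day_change(current: float, previous_day: float) -> float:
    """Relative change versus the same period of the previous day."""
    if previous_day <= 0:
        raise ValueError("previous_day must be positive to compute a rate")
    return (current - previous_day) / previous_day


def day_over_day_alert(current: float, previous_day: float,
                       increase_threshold: float = 0.40,
                       decrease_threshold: float = 0.20) -> bool:
    """Alert when the increase or decrease rate exceeds its threshold."""
    change = day_over_day_change(current, previous_day)
    if change > 0:
        return change > increase_threshold
    return -change > decrease_threshold
```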
- Monitoring rule for the number of log entries that fail to be transformed during data transformation
Item | Description |
---|---|
Rule name | Failure Monitoring during Data Transformation |
Functionality | This rule monitors failures of data transformation tasks over a 15-minute period. If the number of log entries that fail to be transformed exceeds the value of the Monitoring Threshold parameter, an alert is triggered. |
Parameters | - Data Transformation ID: the ID of the data transformation task that you want to monitor, for example, dd2de8e7e23f3e42ffbb32fe05710372. Default value: .*. This value indicates that all data transformation tasks are monitored. Separate multiple task IDs with vertical bars (\|).
- Monitoring Threshold: If the number of log entries that fail to be transformed exceeds this value, an alert is triggered. Default value: 10.
- Action Policy: the action policy that is used to send alert notifications. The action policy contains notification methods and alert templates. The default value is the built-in action policy whose ID is sls.app.etl.builtin. This value indicates that alert notifications are sent by using the webhook URL of a DingTalk chatbot.
- Severity: the severity of the alert. Valid values: Critical, High, Medium, Low, and Report. Default value: High. |
Associated dashboard | Data Transformation Troubleshooting > Total logs failed |
Handling method | You can clear triggered alerts based on the following rules:
- Clear the alerts by using the handling method of the Exception Monitoring during Data Transformation rule. If no error message is reported, proceed to the next step.
- If you cannot clear the alerts, prepare the information about the related project, Logstore, and data transformation task ID, and then submit a ticket to Alibaba Cloud technical support. |
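The failure condition can be illustrated as follows: sum the counts of failed log entries reported within the last 15 minutes and compare the total with the Monitoring Threshold. This sketch assumes that failures are available as (timestamp, count) samples, similar to the points of the Total logs failed chart; the sample format and function names are hypothetical. Because the rule requires the count to exceed the threshold, exactly 10 failures do not trigger an alert with the default setting.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

WINDOW = timedelta(minutes=15)  # the rule evaluates a 15-minute window


def failed_in_window(samples: Iterable[Tuple[datetime, int]], now: datetime,
                     window: timedelta = WINDOW) -> int:
    """Sum failed-entry counts whose timestamps fall in (now - window, now]."""
    start = now - window
    return sum(count for ts, count in samples if start < ts <= now)


def failure_alert(samples: Iterable[Tuple[datetime, int]], now: datetime,
                  threshold: int = 10, window: timedelta = WINDOW) -> bool:
    """Alert when more than `threshold` entries failed within the window."""
    return failed_in_window(samples, now, window) > threshold
```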