By Huolang, from Alibaba Cloud Storage
Terraform is an open-source tool from HashiCorp for the automated orchestration of IT infrastructure: you write, plan, and create infrastructure as code. The Terraform command line interface (CLI) provides a simple mechanism for deploying configuration files to Alibaba Cloud or any other supported cloud and for placing them under version control.
SLS Alert provides a comprehensive intelligent O&M platform for alert monitoring, noise reduction, incident management, and notification dispatch. It includes modules such as log and time series storage, alert monitoring, alert management, and notification management. Powerful features also call for automated configuration. This article describes how to use Terraform to automate a simple alert configuration without going through the console.
For the installation and configuration of Terraform, refer to the official Alibaba Cloud Terraform documentation. The Terraform command line is already integrated into Cloud Shell.
SLS Alert mainly involves three operations:
Initialize Alert Resources
Initialize Alert Resources of Project
Alert monitoring rules define how data sources (such as time series and logs) are monitored. Their parameters include collaborative monitoring, group evaluation, trigger conditions, severity, no-data alerts, alert recovery, and other conditions.
In SLS Alert, when a monitoring rule is triggered, the resulting alert is matched against the preset alert policy. The alert policy applies noise reduction, such as merging, silencing, and suppression. After noise reduction, the alert is sent to the specified action policy, which can be thought of simply as a notification channel.
Notification channels include text messages, voice messages, emails, webhooks, DingTalk, WeChat, Feishu, Function Compute, and EventBridge. Managing alert resource data involves the management of users, user groups, and webhooks.
The preceding alert policy, action policy, users, user groups, and webhooks are collectively referred to as alert resource data in SLS.
export ALICLOUD_ACCESS_KEY="LTAIUrZCw3********"
export ALICLOUD_SECRET_KEY="zfwwWAMWIAiooj14GQ2*************"
export ALICLOUD_REGION="cn-heyuan"
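With the credentials exported as above, the Terraform configuration itself does not need to contain any secrets. A minimal sketch of the provider setup, assuming the alicloud provider is installed from the `aliyun/alicloud` registry source and picks up the three environment variables shown:

```hcl
terraform {
  required_providers {
    alicloud = {
      source = "aliyun/alicloud"
    }
  }
}

# Left empty on purpose: the provider falls back to ALICLOUD_ACCESS_KEY,
# ALICLOUD_SECRET_KEY, and ALICLOUD_REGION from the environment.
provider "alicloud" {}
```

Keeping the access key out of the .tf files means the configuration can be committed to version control without exposing credentials.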
The following configuration creates resources under the ALICLOUD_REGION:
data "alicloud_log_alert_resource" "example" {
  type = "user"
  lang = "cn"
}
The following configuration creates resources in the test-project:
data "alicloud_log_alert_resource" "example" {
  type    = "project"
  project = "test-project"
}
The following configuration creates an alert monitoring rule:
resource "alicloud_log_alert" "example" {
  version = "2.0"
  type = "default"
  project_name = "test-project"
  alert_name = "tf-test-alert-2"
  alert_displayname = "tf-test-alert-displayname-2"
  dashboard = "tf-test-dashboard"
  mute_until = "1632486684"
  no_data_fire = "false"
  no_data_severity = 8
  send_resolved = true
  schedule_interval = "5m"
  schedule_type = "FixedRate"
  query_list {
    store = "tf-test-logstore"
    store_type = "log"
    project = "test-project"
    region = "cn-heyuan"
    chart_title = "chart_title"
    start = "-60s"
    end = "20s"
    query = "* AND aliyun | select count(1) as cnt"
    time_span_type = "Custom"
  }
  query_list {
    store = "tf-test-logstore-5"
    store_type = "log"
    project = "test-project"
    region = "cn-heyuan"
    chart_title = "chart_title"
    start = "-60s"
    end = "20s"
    query = "error | select count(1) as error_cnt"
    time_span_type = "Custom"
  }
  join_configurations {
    type = "cross_join"
    condition = ""
  }
  labels {
    key = "env"
    value = "test"
  }
  labels {
    key = "env1"
    value = "test1"
  }
  annotations {
    key = "title"
    value = "alert title-1"
  }
  annotations {
    key = "desc"
    value = "alert desc"
  }
  annotations {
    key = "test_key"
    value = "test value"
  }
  group_configuration {
    type = "custom"
    fields = ["a", "b", "d"]
  }
  severity_configurations {
    severity = 8
    eval_condition = {
      condition = "cnt > 3"
      count_condition = "__count__ > 3"
    }
  }
  severity_configurations {
    severity = 6
    eval_condition = {
      condition = ""
      count_condition = "__count__ > 0"
    }
  }
  severity_configurations {
    severity = 2
    eval_condition = {
      condition = ""
      count_condition = ""
    }
  }
  policy_configuration {
    alert_policy_id = "sls.builtin.dynamic"
    action_policy_id = "sls_test_action"
    repeat_interval = "1m"
  }
}
Alert resources mainly include users, user groups, on-duty groups, webhook integrations, alert policies, action policies, content templates, calendars, and channel quotas. This article takes user creation as an example to introduce the Terraform format. The list of related resources and their value structures follows.
resource "alicloud_log_resource_record" "user" {
  resource_name = "sls.common.user"
  record_id = "test_tf_user"
  tag = "test tf user"
  value = "{\n\t\"user_name\": \"test tf user\", \n\t\"sms_enabled\": true, \n\t\"phone\": \"18888888889\", \n\t\"voice_enabled\": false, \n\t\"email\": [\n\t\t\"test@qq.com\"\n\t], \n\t\"enabled\": true, \n\t\"user_id\": \"test_tf_user\", \n\t\"country_code\": \"86\"\n}"
}
resource_name: sls.common.user
record_id: The value is the same as user_id.
Tag: The value is the same as user_name.
Example of value structure:
{
"user_id": "xiaoming",
"user_name": "Xiaoming",
"email": [
"xiaoming@example.com"
],
"country_code": "86",
"phone": "13334567890",
"enabled": true,
"sms_enabled": true,
"voice_enabled": true
}
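A hand-escaped JSON string in `value` is hard to read and easy to break. As a minimal sketch, the same kind of user record can be built with Terraform's built-in `jsonencode` function. The resource label `user_jsonencode` and the local names are illustrative, and the field values are copied from the example structure above:

```hcl
locals {
  # Illustrative values taken from the example value structure.
  user_id   = "xiaoming"
  user_name = "Xiaoming"
}

resource "alicloud_log_resource_record" "user_jsonencode" {
  resource_name = "sls.common.user"
  record_id     = local.user_id   # kept identical to user_id in value
  tag           = local.user_name # kept identical to user_name in value

  # jsonencode produces the JSON string, so no manual escaping is needed.
  value = jsonencode({
    user_id       = local.user_id
    user_name     = local.user_name
    email         = ["xiaoming@example.com"]
    country_code  = "86"
    phone         = "13334567890"
    enabled       = true
    sms_enabled   = true
    voice_enabled = true
  })
}
```

Driving `record_id`, `tag`, and the JSON fields from the same locals keeps the three values from drifting apart.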
resource_name: sls.common.user_group
record_id: The value is the same as user_group_id.
Tag: The value is the same as user_group_name.
Example of value structure:
{
"user_group_id": "group-xiaoming",
"user_group_name": "Group-Xiaoming",
"enabled": true,
"members": [
"xiaoming"
]
}
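A user group record could be sketched as follows. This assumes the resource name takes the dotted form `sls.common.user_group` (by analogy with `sls.common.user`) and that the `alicloud_log_resource_record.user` resource from the user example earlier in this article is declared in the same module. The group ID and name are hypothetical:

```hcl
resource "alicloud_log_resource_record" "user_group" {
  resource_name = "sls.common.user_group" # assumed dotted form
  record_id     = "tf-test-group"         # hypothetical group ID
  tag           = "tf test group"         # hypothetical group name

  value = jsonencode({
    user_group_id   = "tf-test-group"
    user_group_name = "tf test group"
    enabled         = true
    # Referencing the user resource makes Terraform create the user first.
    members = [alicloud_log_resource_record.user.record_id]
  })
}
```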
resource_name: sls.alert.oncall_group
record_id: The value is the same as oncall_id.
Tag: The value is the same as oncall_name.
Example of value structure:
{
"oncall_id": "default_oncall",
"oncall_name": "default oncall",
"enabled": true,
"overrides": [],
"rotations": [
{
"targets": [
{
"type": "user",
"target_id": "jizhi"
},
{
"type": "user_group",
"target_id": "alert-dev"
}
],
"end_time": 0,
"shift_day": "",
"shift_time": "12:00",
"shift_type": "day",
"start_time": 1633017600,
"shift_minute": 0,
"end_time_type": "none",
"shift_interval": 1,
"shift_week_custom": null,
"restriction_date_type": "workday",
"restriction_time_type": "allday",
"restriction_week_range": null,
"restriction_time_custom_range": null
}
],
"calendar_id": "default_calendar"
}
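For a structure as large as the on-duty group, it may be easier to keep the JSON in its own file. A sketch under that assumption, where `oncall_group.json` is a hypothetical file next to the configuration that contains the structure shown above:

```hcl
locals {
  # Hypothetical file holding the on-duty group JSON.
  oncall_json_path = "${path.module}/oncall_group.json"

  # jsondecode fails at plan time if the file is not valid JSON.
  oncall = jsondecode(file(local.oncall_json_path))
}

resource "alicloud_log_resource_record" "oncall_group" {
  resource_name = "sls.alert.oncall_group"
  record_id     = local.oncall.oncall_id   # same as oncall_id
  tag           = local.oncall.oncall_name # same as oncall_name
  value         = jsonencode(local.oncall)
}
```

Decoding and re-encoding the file lets `record_id` and `tag` be read from the JSON itself instead of being repeated by hand.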
resource_name: sls.alert.action_webhook
record_id: The value is the same as the id.
Tag: The value is the same as the name.
Example of value structure:
{
"id": "custom-test",
"name": "customized webhook test",
"type": "custom",
"url": "http://localhost:9099/data/webhook",
"method": "POST",
"headers": [
{
"key": "Content-Type",
"value": "application/json"
},
{
"key": "Foo",
"value": "bar"
}
]
}
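The webhook structure maps to a resource record in the same way. In the sketch below, the variable `webhook_url` and the resource label `webhook` are illustrative names, and the values come from the example structure:

```hcl
variable "webhook_url" {
  description = "Endpoint that receives alert notifications"
  type        = string
  default     = "http://localhost:9099/data/webhook"
}

resource "alicloud_log_resource_record" "webhook" {
  resource_name = "sls.alert.action_webhook"
  record_id     = "custom-test"             # same as id
  tag           = "customized webhook test" # same as name

  value = jsonencode({
    id     = "custom-test"
    name   = "customized webhook test"
    type   = "custom"
    url    = var.webhook_url
    method = "POST"
    headers = [
      { key = "Content-Type", value = "application/json" },
    ]
  })
}
```

Exposing the URL as a variable allows the same configuration to point at different endpoints per environment.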
resource_name: sls.alert.alert_policy
record_id: The value is the same as policy_id.
Tag: The value is the same as policy_name.
Example of value structure:
{
"policy_id": "sls.builtin",
"policy_name": "built-in alert policy",
"parent_id": "sls.root",
"is_default": false,
"group_script": "fire(action_policy=\"sls.builtin\", group={\"project\": \"__a__\", \"uid\": alert.aliuid}, group_wait=\"5s\", group_interval=\"2m\", repeat_interval=\"2m\")\nstop()\nfire(action_policy=\"sls.builtin\", group={\"alert_id\": alert.alert_id}, group_wait=\"5s\", group_interval=\"10s\", repeat_interval=\"2m\")\nif alert.labels.name ~= \"^\\\\w+s$\":\n\tfire(action_policy=\"sls.builtin\", group={\"product\": \"xxs\"}, group_wait=\"5s\", group_interval=\"10s\", repeat_interval=\"2m\")\n\tstop()\nstop()\nfire(action_policy=\"sls.builtin\", group={\"label_name\": alert.labels.name}, group_wait=\"10s\", group_interval=\"10s\", repeat_interval=\"2m\")",
"inhibit_script": "if alert.severity >= 8:\n silence alert.severity < 6",
"silence_script": ""
}
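The policy scripts are multi-line DSL text, which is awkward to escape inside a JSON string. One possible approach, sketched here with a hypothetical policy ID and a shortened `group_script`, is to write the scripts as Terraform heredocs and let `jsonencode` handle quoting and newlines:

```hcl
locals {
  # Heredocs keep the DSL readable; trimspace drops the trailing newline.
  group_script = trimspace(<<-EOT
    fire(action_policy="sls.builtin", group={"alert_id": alert.alert_id}, group_wait="5s", group_interval="10s", repeat_interval="2m")
  EOT
  )

  # Indentation inside the script is an assumption; adjust to the DSL's rules.
  inhibit_script = trimspace(<<-EOT
    if alert.severity >= 8:
      silence alert.severity < 6
  EOT
  )
}

resource "alicloud_log_resource_record" "alert_policy" {
  resource_name = "sls.alert.alert_policy"
  record_id     = "tf-test-alert-policy" # hypothetical policy_id
  tag           = "tf test alert policy" # hypothetical policy_name

  value = jsonencode({
    policy_id      = "tf-test-alert-policy"
    policy_name    = "tf test alert policy"
    parent_id      = "sls.root"
    is_default     = false
    group_script   = local.group_script
    inhibit_script = local.inhibit_script
    silence_script = ""
  })
}
```

Note that `jsonencode` writes characters such as `<` and `>` as Unicode escapes; the result is still valid JSON and decodes to the same script text.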
Remarks: primary_policy_script and secondary_policy_script contain only DSL script information and no UI configuration information, so nothing is displayed graphically on the console.
resource_name: sls.alert.action_policy
record_id: The value is the same as action_policy_id.
Tag: The value is the same as action_policy_name.
Example of value structure:
{
"action_policy_id": "sls.builtin",
"action_policy_name": "default action policy",
"labels": {},
"is_default": false,
"primary_policy_script": "fire(type=\"webhook_integration\", integration_type=\"dingtalk\", webhook_id=\"dingtalk-test\", template_id=\"default-template\", period=\"any\")",
"secondary_policy_script": "fire(type=\"voice\", users=[\"jizhi\"], groups=[\"group-jizhi\"], template_id=\"default-template\")",
"escalation_start_enabled": false,
"escalation_start_timeout": "10s",
"escalation_inprogress_enabled": false,
"escalation_inprogress_timeout": "10s",
"escalation_enabled": false,
"escalation_timeout": "4h0m0s"
}
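The alert rule earlier in this article refers to `action_policy_id = "sls_test_action"`. A sketch of an action policy record with that ID might look like the following. The webhook and template IDs are taken from the example structure and are assumed to exist already, and the policy name is hypothetical:

```hcl
locals {
  # IDs assumed to exist in SLS already; taken from the example structure.
  webhook_id  = "dingtalk-test"
  template_id = "default-template"
}

resource "alicloud_log_resource_record" "action_policy" {
  resource_name = "sls.alert.action_policy"
  record_id     = "sls_test_action" # same as action_policy_id
  tag           = "tf test action"  # hypothetical action_policy_name

  value = jsonencode({
    action_policy_id   = "sls_test_action"
    action_policy_name = "tf test action"
    labels             = {}
    is_default         = false
    # String interpolation inserts the IDs into the DSL script.
    primary_policy_script         = "fire(type=\"webhook_integration\", integration_type=\"dingtalk\", webhook_id=\"${local.webhook_id}\", template_id=\"${local.template_id}\", period=\"any\")"
    secondary_policy_script       = ""
    escalation_start_enabled      = false
    escalation_start_timeout      = "10s"
    escalation_inprogress_enabled = false
    escalation_inprogress_timeout = "10s"
    escalation_enabled            = false
    escalation_timeout            = "4h0m0s"
  })
}
```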
resource_name: sls.alert.content_template
record_id: The value is the same as template_id.
Tag: The value is the same as template_name.
Example of value structure:
{
"template_id": "default-template",
"template_name": "default template",
"is_default": false,
"templates": {
"fc": {
"limit": 0,
"locale": "zh-CN",
"content": "",
"send_type": "merged"
},
"sms": {
"locale": "zh-CN",
"content": ""
},
"lark": {
"title": "Alerthub alert test ${alert_name}",
"locale": "zh-CN",
"content": ""
},
"email": {
"locale": "zh-CN",
"content": "",
"subject": "SLS alert test -jizhi-test"
},
"slack": {
"title": "Alerthub alert test ${alert_name}",
"locale": "zh-CN",
"content": ""
},
"voice": {
"locale": "zh-CN",
"content": ""
},
"wechat": {
"title": "Alerthub alert test ${alert_name}",
"locale": "zh-CN",
"content": ""
},
"webhook": {
"limit": 0,
"locale": "zh-CN",
"content": "",
"send_type": "merged"
},
"dingtalk": {
"title": "Alerthub alert test ${alert_name}",
"locale": "zh-CN",
"content": ""
},
"event_bridge": {
"locale": "zh-CN",
"content": "",
"subject": "wkb-test"
},
"message_center": {
"locale": "zh-CN",
"content": ""
}
}
}
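Template titles such as `${alert_name}` collide with Terraform's own interpolation syntax. In a Terraform string, a literal `${` has to be written as `$${`. The sketch below shows this for two channels only; the template ID and name are hypothetical, and whether SLS accepts a template that defines only some channels is not verified here:

```hcl
resource "alicloud_log_resource_record" "content_template" {
  resource_name = "sls.alert.content_template"
  record_id     = "tf-test-template" # hypothetical template_id
  tag           = "tf test template" # hypothetical template_name

  value = jsonencode({
    template_id   = "tf-test-template"
    template_name = "tf test template"
    is_default    = false
    templates = {
      dingtalk = {
        # "$${" yields a literal "${" so that SLS, not Terraform,
        # substitutes alert_name.
        title   = "Alerthub alert test $${alert_name}"
        locale  = "zh-CN"
        content = ""
      }
      email = {
        locale  = "zh-CN"
        content = ""
        subject = "SLS alert test"
      }
    }
  })
}
```

Without the extra `$`, Terraform would try to resolve `alert_name` as one of its own references and fail during validation.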
resource_name: sls.common.calendar
record_id: The value is the same as calendar_id.
Tag: The value is the same as calendar_name.
Example of value structure:
{
"calendar_id": "default_calendar",
"calendar_name": "default calendar",
"timezone": "Asia/Shanghai",
"workdays": [
1,
2,
3,
4,
5
],
"worktime": [
{
"end_time": "21:00",
"start_time": "09:00"
}
],
"reset_days": [],
"holiday_sync": "china"
}
resource_name: sls.alert.channel_quota
record_id: The value is the same as the id.
Tag: The value is empty.
Example of value structure:
{
"id": "default",
"quota_script": "if user in [\"jizhi\"]:\n set_limit(sms=5, voice=5, email=5)\nset_limit(sms=100, voice=100, email=100)"
}