Container Service for Kubernetes (ACK) allows you to configure alerts to centrally manage exceptions in clusters and provides various metrics for different scenarios. You can deploy Custom Resource Definitions (CRDs) in clusters to configure and manage alert rules. This topic describes how to set up alerting and configure alert rules for a registered cluster.
Prerequisites
An external Kubernetes cluster is registered in the ACK console. For more information, see Create a registered cluster.
A kubectl client is connected to the registered cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Configure alicloud-monitor-controller in the registered cluster
Step 1: Grant RAM permissions to alicloud-monitor-controller
Use onectl
Install onectl on your on-premises machine. For more information, see Use onectl to manage registered clusters.
Run the following command to grant Resource Access Management (RAM) permissions to alicloud-monitor-controller:
onectl ram-user grant --addon alicloud-monitor-controller
Expected output:
Ram policy ack-one-registered-cluster-policy-alicloud-monitor-controller granted to ram user ack-one-user-ce313528c3 successfully.
Use the console
Before you install a component in a registered cluster, you must set the AccessKey pair to grant the registered cluster the permissions to access Alibaba Cloud resources. Before you set the AccessKey pair, create a RAM user and grant the RAM user the permissions to access Alibaba Cloud resources.
Example:
{
  "Action": [
    "log:*",
    "arms:*",
    "cms:*",
    "cs:UpdateContactGroup"
  ],
  "Resource": [
    "*"
  ],
  "Effect": "Allow"
}
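The statement above is a single entry of a policy, not a full policy document. As a minimal sketch, assuming the standard RAM policy layout (a Version field plus a Statement array), the following snippet writes a complete document to a local file whose content you can paste into the RAM console. The file name ack-monitor-policy.json is a hypothetical choice.

```shell
# Write a complete RAM policy document that wraps the example statement.
# Assumption: the file name is arbitrary and used only for illustration.
POLICY_FILE="${POLICY_FILE:-ack-monitor-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "log:*",
        "arms:*",
        "cms:*",
        "cs:UpdateContactGroup"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
EOF

echo "Policy document written to $POLICY_FILE"
```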
Create an AccessKey pair for the RAM user.
Warning: We recommend that you configure a network access control policy for the AccessKey pair so that it can be called only from trusted network environments. This enhances the security of the AccessKey pair.
Use the AccessKey pair to create a Secret named alibaba-addon-secret in the registered cluster.
The system automatically uses the AccessKey pair to access cloud resources when you install alicloud-monitor-controller.
kubectl -n kube-system create secret generic alibaba-addon-secret --from-literal='access-key-id=<your access key id>' --from-literal='access-key-secret=<your access key secret>'
Note: Replace <your access key id> and <your access key secret> with the AccessKey pair that you obtained in the previous step.
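If you prefer not to type the AccessKey pair on the command line, where it can end up in shell history, the same Secret can be described declaratively. The following is a minimal sketch and not the documented procedure: render_addon_secret is a hypothetical helper, and ACCESS_KEY_ID and ACCESS_KEY_SECRET are assumed environment variable names. It relies on the standard stringData field of a Kubernetes Secret and assumes that the values contain no characters that require YAML escaping.

```shell
# Hypothetical helper: render the alibaba-addon-secret manifest from
# environment variables instead of command-line literals.
render_addon_secret() {
  # Abort with a message if either variable is unset or empty.
  : "${ACCESS_KEY_ID:?set ACCESS_KEY_ID first}"
  : "${ACCESS_KEY_SECRET:?set ACCESS_KEY_SECRET first}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: alibaba-addon-secret
  namespace: kube-system
type: Opaque
stringData:
  access-key-id: "${ACCESS_KEY_ID}"
  access-key-secret: "${ACCESS_KEY_SECRET}"
EOF
}

# Usage (requires kubectl access to the registered cluster):
#   render_addon_secret | kubectl apply -f -
```

The key names access-key-id and access-key-secret match the literals used in the kubectl create secret command above.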
Step 2: Install and update alicloud-monitor-controller
Use onectl
Run the following command to install alicloud-monitor-controller:
onectl addon install alicloud-monitor-controller
Expected output:
Addon alicloud-monitor-controller, version **** installed.
Use the console
The console automatically checks whether the alerting configuration meets the requirements and guides you to activate, install, or update alicloud-monitor-controller.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Alert Configuration page, click Start Installation. The console automatically checks the prerequisites, and installs and upgrades the components.
After the installation and upgrade are complete, configure alerts on the Alert Configuration page.
Tab | Description |
Alert Rule Management | Turn on Enabled to enable the corresponding alert rule set. Click Edit Notification Object to set the associated notification object. |
Alert History | View the latest 100 alert records sent within the last day. Click a link in the Alert Rule column to go to the corresponding monitoring system and view the detailed rule configuration. Click Troubleshoot to locate the resource page where the exception occurred (event or metric exception). |
Contact Management | Create, edit, or delete alert contacts. Supported contact methods are text message, email, and robot. You must first verify the contact methods in the CloudMonitor console to receive alert messages. Contact synchronization is also supported. If the verification information expires, delete the corresponding contact in CloudMonitor and refresh the contacts page. For information about how to set up a robot as a notification object, see DingTalk Robot, WeCom Robot, and Lark Robot. |
Contact Group Management | Create, edit, or delete alert contact groups. If no alert contact group exists, the ACK console automatically creates a default alert contact group based on the information that you provided during registration. |
Set up alerting
Step 1: Enable default alert rules
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Alert Rules tab, enable the alert rule set.
Step 2: Configure alert rules
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Alert Rules tab, click Modify Contacts to specify the contact groups to which the alerts are sent. You can turn on or turn off Status to enable or disable the alert rule set.
Feature | Description |
Alert Rules | By default, ACK provides an alert rule template that you can use to generate alerts based on exceptions and metrics. Alert rules are classified into several alert rule sets. You can enable an alert rule set, disable an alert rule set, and configure multiple alert contact groups for an alert rule set. An alert rule set contains multiple alert rules. Each alert rule corresponds to an alert item. You can create a YAML file to configure multiple alert rule sets in a cluster. You can also modify the YAML file to update alert rules. For more information about how to configure alert rules by using a YAML file, see Configure alert rules by using CRDs. For more information about default alert templates, see Default alert rule templates. |
Alert History | You can view up to 100 historical alerts. You can select an alert and click the link in the Alert Rule column to view rule details in the monitoring system. You can click Details to go to the resource page where the alert is triggered. The alert may be triggered by an exception or an abnormal metric. |
Alert Contacts | You can create, edit, or delete alert contacts. |
Alert Contact Groups | You can create, edit, or delete alert contact groups. If no alert contact group exists, the ACK console automatically creates a default alert contact group based on the information that you provided during registration. |
Configure alert rules by using CRDs
When the alerting feature is enabled, the system automatically creates an AckAlertRule in the kube-system namespace. The AckAlertRule contains default alert rule templates. You can use the AckAlertRule to configure alert rule sets.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose .
On the Alert Rule Management tab, click Edit Alert Configuration in the upper-right corner. Then, click Actions > YAML in the row of the target rule to view the AckAlertRule resource configuration of the current cluster.
Refer to the description of the default alert rule template and modify the sample YAML file.
Example:
apiVersion: alert.alibabacloud.com/v1beta1
kind: AckAlertRule
metadata:
  name: default
spec:
  groups:
    # The following is a sample configuration of a cluster event alert rule.
    - name: pod-exceptions                  # The name of the alert rule group, which corresponds to the Group_Name field in the alert template.
      rules:
        - name: pod-oom                     # The name of the alert rule.
          type: event                       # The type of the alert rule (Rule_Type). Valid values: event and metric-cms.
          expression: sls.app.ack.pod.oom   # The expression of the alert rule. When the rule type is event, the value of the expression is the value of Rule_Expression_Id in the default alert rule template described in this topic.
          enable: enable                    # The status of the alert rule. Valid values: enable and disable.
        - name: pod-failed
          type: event
          expression: sls.app.ack.pod.failed
          enable: enable
    # The following is a sample configuration of a cluster basic resource alert rule.
    - name: res-exceptions                  # The name of the alert rule group, which corresponds to the Group_Name field in the alert template.
      rules:
        - name: node_cpu_util_high          # The name of the alert rule.
          type: metric-cms                  # The type of the alert rule (Rule_Type). Valid values: event and metric-cms.
          expression: cms.host.cpu.utilization  # The expression of the alert rule. When the rule type is metric-cms, the value of the expression is the value of Rule_Expression_Id in the default alert rule template described in this topic.
          contactGroups:                    # The alert contact group configuration that is mapped to the alert rule. The configuration is generated by the ACK console. The same contact is used for the same account. The contact can be reused in multiple clusters.
          enable: enable                    # The status of the alert rule. Valid values: enable and disable.
          thresholds:                       # The threshold of the alert rule. For more information, see the section about how to change the threshold of an alert rule.
            - key: CMS_ESCALATIONS_CRITICAL_Threshold
              unit: percent
              value: '1'
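Before you apply a modified file, it can help to check which rules the file enables. The following is a rough sketch, assuming the manifest is saved as a multi-line YAML file in which each rule lists name before type and enable, as in the sample. list_alert_rules is a hypothetical helper and is not part of ACK or kubectl. It prints one tab-separated line per rule: group, rule name, type, and status.

```shell
# Hypothetical helper: summarize the rules in an AckAlertRule manifest.
# Reads the file given as the first argument, or standard input if none.
list_alert_rules() {
  awk '
    { sub(/[[:space:]]*#.*$/, "") }                 # drop comments
    /^[[:space:]]*- name:/ { name = $3 }            # a group or a rule name
    /^[[:space:]]*rules:/  { group = name }         # the last name was a group
    /^[[:space:]]*type:/   { type = $2 }            # rule type
    /^[[:space:]]*enable:/ { print group "\t" name "\t" type "\t" $2 }
  ' "$@"
}

# Usage, with a hypothetical file name:
#   list_alert_rules ack-alert-rule.yaml
```

This is a line-based scan rather than a YAML parser, so it only suits files that keep the field order of the sample. Output that a tool reorders, for example with keys sorted alphabetically, would need a real YAML parser.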
Default alert rule templates
ACK creates default alert rules in registered clusters based on the following conditions:
Default alert rules are enabled.
You go to the Alert Rules tab for the first time and default alert rules are not enabled.
The following table describes default alert rules.
Alert item | Rule description | Alert source | Rule_Type | ACK_CR_Rule_Name | SLS_Event_ID |
Anomalies detected in cluster inspection | An alert is triggered when the automatic inspection mechanism detects potential anomalies. You need to analyze the specific issue and daily maintenance strategy. Submit a ticket to contact the ACK team. | Simple Log Service | event | cis-sched-failed | sls.app.ack.cis.schedule_task_failed |
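To relate the table to the AckAlertRule format, the sketch below renders one rule entry from a table row. It assumes that ACK_CR_Rule_Name maps to name, Rule_Type to type, and SLS_Event_ID to expression. This mapping matches the sample event rules in this topic but is not stated explicitly, so treat it as an assumption. render_alert_rule is a hypothetical helper.

```shell
# Hypothetical helper: print one entry for the rules list of a rule group.
# Arguments: rule name, rule type, expression, status (defaults to enable).
render_alert_rule() {
  printf -- '- name: %s\n  type: %s\n  expression: %s\n  enable: %s\n' \
    "$1" "$2" "$3" "${4:-enable}"
}

# The row "Anomalies detected in cluster inspection" from the table:
render_alert_rule cis-sched-failed event sls.app.ack.cis.schedule_task_failed
```

The printed entry still has to be indented and placed under the rules key of a group in the AckAlertRule resource. The table does not show the name of the group that contains this rule.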