Container Service for Kubernetes: Set up alerting for a registered cluster

Last Updated: Apr 26, 2025

Container Service for Kubernetes (ACK) allows you to configure alerts to centrally manage exceptions in clusters and provides various metrics for different scenarios. You can deploy Custom Resource Definitions (CRDs) in clusters to configure and manage alert rules. This topic describes how to set up alerting and configure alert rules for a registered cluster.

Prerequisites

Configure alicloud-monitor-controller in the registered cluster

Step 1: Grant RAM permissions to alicloud-monitor-controller

Use onectl

  1. Install onectl on your on-premises machine. For more information, see Use onectl to manage registered clusters.

  2. Run the following command to grant Resource Access Management (RAM) permissions to alicloud-monitor-controller:

    onectl ram-user grant --addon alicloud-monitor-controller

    Expected output:

    Ram policy ack-one-registered-cluster-policy-alicloud-monitor-controller granted to ram user ack-one-user-ce313528c3 successfully.

Use the console

Before you install a component in a registered cluster, you must provide an AccessKey pair that grants the registered cluster the permissions to access Alibaba Cloud resources. To obtain the AccessKey pair, first create a RAM user and grant the RAM user the required permissions.

  1. Create a RAM user.

  2. Create a custom policy.

    Example:

    {
        "Version": "1",
        "Statement": [
            {
                "Action": [
                    "log:*",
                    "arms:*",
                    "cms:*",
                    "cs:UpdateContactGroup"
                ],
                "Resource": [
                    "*"
                ],
                "Effect": "Allow"
            }
        ]
    }

  3. Attach the custom policy to the RAM user.

  4. Create an AccessKey pair for the RAM user.

    Warning

    We recommend that you configure a network access control policy for the AccessKey pair so that it can be used only from trusted network environments. This improves the security of the AccessKey pair.

  5. Use the AccessKey pair to create a Secret named alibaba-addon-secret in the registered cluster.

    The system automatically uses the AccessKey pair to access cloud resources when you install alicloud-monitor-controller.

    kubectl -n kube-system create secret generic alibaba-addon-secret --from-literal='access-key-id=<your access key id>' --from-literal='access-key-secret=<your access key secret>'

    Note

    Replace <your access key id> and <your access key secret> with the AccessKey pair that you obtained in the previous step.
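
After you create the Secret, it can help to confirm that both keys are present before you install the component. The following is a minimal sketch, not an official procedure: the function name check_addon_secret is hypothetical, and it assumes that kubectl is configured for the registered cluster. It reads each field with a go-template and never prints the key values.

```shell
# Hypothetical helper (not part of ACK or onectl): checks that the Secret
# created in the previous step holds both AccessKey fields.
# Assumption: kubectl points at the registered cluster.
check_addon_secret() {
  secret_ns="kube-system"
  secret_name="alibaba-addon-secret"
  for key in access-key-id access-key-secret; do
    # Read the base64-encoded field into a variable without echoing it.
    value=$(kubectl -n "$secret_ns" get secret "$secret_name" \
      -o go-template="{{index .data \"$key\"}}") || return 1
    # kubectl prints "<no value>" when the key is absent from .data.
    if [ -z "$value" ] || [ "$value" = "<no value>" ]; then
      echo "Secret $secret_name is missing key: $key" >&2
      return 1
    fi
  done
  echo "Secret $secret_name contains access-key-id and access-key-secret"
}
```

Run check_addon_secret after the kubectl create secret command. A non-zero exit status means that the Secret is absent or incomplete.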

Step 2: Install and update alicloud-monitor-controller

Use onectl

Run the following command to install alicloud-monitor-controller:

onectl addon install alicloud-monitor-controller

Expected output:

Addon alicloud-monitor-controller, version **** installed.
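
To confirm from the command line that the component is running after the installation, you can list its pods. This is a sketch under assumptions: this topic does not state the workload or pod names, so the filter below assumes that the pod names contain the add-on name alicloud-monitor-controller and that the component runs in the kube-system namespace. The function name is hypothetical.

```shell
# Hypothetical helper: lists pods whose names contain the add-on name.
# Assumptions: the component runs in kube-system and its pod names
# include "alicloud-monitor-controller".
list_monitor_controller_pods() {
  # --no-headers keeps only pod rows; grep exits non-zero if none match.
  kubectl -n kube-system get pods --no-headers | grep "alicloud-monitor-controller"
}
```

If the function prints nothing and returns a non-zero status, no matching pod was found, and the naming assumption or the installation should be checked.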

Use the console

The console automatically checks whether the alerting configuration meets the requirements and guides you to activate, install, or update alicloud-monitor-controller.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Alerts.

  3. On the Alert Configuration page, click Start Installation. The console automatically checks the prerequisites, and installs and upgrades the components.

  4. After the installation and upgrade are complete, configure alerts on the Alert Configuration page.

    The Alert Configuration page contains the following tabs:

    • Alert Rule Management: Turn on Enabled to enable the corresponding alert rule set. Click Edit Notification Object to set the associated notification object.

    • Alert History: View the latest 100 alert records sent within the last day. Click a link in the Alert Rule column to go to the corresponding monitoring system and view the detailed rule configuration. Click Troubleshoot to locate the resource page where the exception (an event or a metric exception) occurred.

    • Contact Management: Create, edit, or delete alert contacts. Supported contact methods are text message, email, and robot. Before a contact can receive alert messages, authenticate the contact method in the CloudMonitor console under Alert Service > Alert Contact. Contact synchronization is also supported. If the authentication information expires, delete the corresponding contact in CloudMonitor and refresh the contacts page. For robot-type notification objects, see DingTalk Robot, WeCom Robot, and Lark Robot.

    • Contact Group Management: Create, edit, or delete alert contact groups. If no alert contact group exists, the ACK console automatically creates a default alert contact group based on the information that you provided during registration.

Set up alerting

Step 1: Enable default alert rules

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Alerts.

  3. On the Alert Rules tab, enable the alert rule set.

Step 2: Configure alert rules

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Alerts.

  3. On the Alert Rules tab, click Modify Contacts to specify the contact groups to which the alerts are sent. You can turn on or turn off Status to enable or disable the alert rule set.

    The following list describes the available features:

    • Alert Rules:

        • By default, ACK provides an alert rule template that you can use to generate alerts based on exceptions and metrics.

        • Alert rules are classified into several alert rule sets. You can enable an alert rule set, disable an alert rule set, and configure multiple alert contact groups for an alert rule set.

        • An alert rule set contains multiple alert rules. Each alert rule corresponds to an alert item. You can create a YAML file to configure multiple alert rule sets in a cluster. You can also modify the YAML file to update alert rules.

        • For more information about how to configure alert rules by using a YAML file, see Configure alert rules by using CRDs.

        • For more information about default alert templates, see Default alert rule templates.

    • Alert History: You can view up to 100 historical alerts. Select an alert and click the link in the Alert Rule column to view rule details in the monitoring system. Click Details to go to the resource page where the alert was triggered. The alert may be triggered by an exception or an abnormal metric.

    • Alert Contacts: You can create, edit, or delete alert contacts.

    • Alert Contact Groups: You can create, edit, or delete alert contact groups. If no alert contact group exists, the ACK console automatically creates a default alert contact group based on the information that you provided during registration.

Configure alert rules by using CRDs

When the alerting feature is enabled, the system automatically creates an AckAlertRule in the kube-system namespace. The AckAlertRule contains default alert rule templates. You can use the AckAlertRule to configure alert rule sets.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Alerts.

  3. On the Alert Rule Management tab, click Edit Alert Configuration in the upper-right corner. Then, click Actions > YAML in the row of the target rule to view the AckAlertRule resource configuration of the current cluster.

  4. Refer to the description of the default alert rule template and modify the sample YAML file.

    Example:

    apiVersion: alert.alibabacloud.com/v1beta1
    kind: AckAlertRule
    metadata:
      name: default
    spec:
      groups:
        # The following is a sample configuration of a cluster event alert rule.
        - name: pod-exceptions                             # The name of the alert rule group, which corresponds to the Group_Name field in the alert template.
          rules:
            - name: pod-oom                                # The name of the alert rule.
              type: event                                  # The type of the alert rule (Rule_Type). Valid values: event and metric-cms.
              expression: sls.app.ack.pod.oom              # The expression of the alert rule. When the rule type is event, the value of the expression is the value of Rule_Expression_Id in the default alert rule template described in this topic.
              enable: enable                               # The status of the alert rule. Valid values: enable and disable.
            - name: pod-failed
              type: event
              expression: sls.app.ack.pod.failed
              enable: enable
        # The following is a sample configuration of a cluster basic resource alert rule.
        - name: res-exceptions                              # The name of the alert rule group, which corresponds to the Group_Name field in the alert template.
          rules:
            - name: node_cpu_util_high                      # The name of the alert rule.
              type: metric-cms                              # The type of the alert rule (Rule_Type). Valid values: event and metric-cms.
              expression: cms.host.cpu.utilization          # The expression of the alert rule. When the rule type is metric-cms, the value of the expression is the value of Rule_Expression_Id in the default alert rule template described in this topic.
              contactGroups:                                # The alert contact group configuration that is mapped to the alert rule. The configuration is generated by the ACK console. The same contact is used for the same account. The contact can be reused in multiple clusters.
              enable: enable                                # The status of the alert rule. Valid values: enable and disable.
              thresholds:                                   # The threshold of the alert rule. For more information, see the section about how to change the threshold of an alert rule.
                - key: CMS_ESCALATIONS_CRITICAL_Threshold
                  unit: percent
                  value: '1'
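
You can also inspect the resource from the command line before you edit it. The following sketch lists each rule group with its rules and their enable status. It assumes that kubectl is configured for the registered cluster and that the plural resource name of the CRD is ackalertrules (derived from the kind and API group in the example above, not stated in this topic). The function name is hypothetical.

```shell
# Hypothetical helper: prints each alert rule group, followed by its rules
# and their enable status, from the AckAlertRule named "default".
# Assumption: the CRD's plural resource name is "ackalertrules".
list_alert_rules() {
  kubectl -n kube-system get ackalertrules.alert.alibabacloud.com default \
    -o jsonpath='{range .spec.groups[*]}{.name}{"\n"}{range .rules[*]}{"  "}{.name}{"\t"}{.enable}{"\n"}{end}{end}'
}
```

The command is read-only, so it is safe to run before you change any rule in the console or in the YAML file.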

Default alert rule templates

ACK creates default alert rules in registered clusters in either of the following cases:

  • Default alert rules are enabled.

  • You go to the Alert Rules tab for the first time and default alert rules are not enabled.

The following table describes default alert rules.

Alert item: Anomalies detected in cluster inspection

  • Rule description: An alert is triggered when the automatic inspection mechanism detects potential anomalies. Analyze the specific issue and your routine maintenance strategy, and submit a ticket to contact the ACK team.

  • Alert source: Simple Log Service

  • Rule_Type: event

  • ACK_CR_Rule_Name: cis-sched-failed

  • SLS_Event_ID: sls.app.ack.cis.schedule_task_failed
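
To show how an entry of this table maps onto the AckAlertRule format shown earlier, the following sketch prints a rule entry built from the ACK_CR_Rule_Name, Rule_Type, and SLS_Event_ID values. The function name render_alert_rule is hypothetical, and the output is only a fragment: it belongs under the rules list of a group in spec.groups, and this topic does not state which group the rule belongs to.

```shell
# Hypothetical helper: renders one rule entry in the AckAlertRule layout.
# $1: ACK_CR_Rule_Name, $2: Rule_Type, $3: expression (SLS_Event_ID for event rules)
render_alert_rule() {
  printf '%s\n' \
    "- name: $1" \
    "  type: $2" \
    "  expression: $3" \
    "  enable: enable"
}

# Values taken from the default alert rule described above.
render_alert_rule cis-sched-failed event sls.app.ack.cis.schedule_task_failed
```

The field names (name, type, expression, enable) follow the sample YAML in the section about configuring alert rules by using CRDs.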