
Container Service for Kubernetes: Override multi-cluster alert configurations

Last Updated: May 12, 2025

The multi-cluster alert management feature lets you create or modify alert rules centrally on a Fleet instance and propagate them to the associated clusters. By default, the propagated alert rules are identical on every associated cluster. If some clusters need different alert rules, you can override the rules so that those clusters use their own alert configurations.

Prerequisites

Background information

Alert rules are differentiated in the same way as application configurations: the open-source KubeVela project is used to define override policies on the Fleet instance and to propagate the results. You define one unified set of alert rules on the Fleet instance and then create override policies that change the configuration for specific clusters, for example to enable GPU-related alerts, set different alert thresholds, or notify different contacts. The alert rules, with the override policies applied, are then propagated to the targeted associated clusters.

The following figure shows how alert configurations are differentiated across clusters. An override policy is created on the Fleet instance. The alert configuration with the override policy applied is delivered to ACK Cluster 2, while ACK Cluster 1 keeps the original alert configuration.

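[Figure: An override policy on the Fleet instance changes the alert configuration propagated to ACK Cluster 2; ACK Cluster 1 keeps the original configuration.]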
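The override idea can be illustrated with a small model. The following Python sketch is a simplified, hypothetical illustration, not the actual merge logic of KubeVela or ACK: it patches alert rules by group name and rule name, uses the override values from the override-gpu policy in Step 2, and applies the override only to the GPU cluster. The base rule values (the "disable" states and the 85% threshold) are invented for illustration.

```python
from copy import deepcopy
from typing import Any

Groups = list[dict[str, Any]]

# Hypothetical base alert rules defined once on the Fleet instance.
# The "disable" states and the "85" threshold are made-up example values.
BASE_GROUPS: Groups = [
    {"name": "res-exceptions", "rules": [
        {"name": "node_cpu_util_high", "enable": "disable",
         "thresholds": [{"key": "CMS_ESCALATIONS_CRITICAL_Threshold",
                         "unit": "percent", "value": "85"}]},
    ]},
    {"name": "cluster-error", "rules": [
        {"name": "gpu-xid-error", "enable": "disable"},
    ]},
]

# Override values taken from the override-gpu policy in Step 2.
OVERRIDE_GPU: Groups = [
    {"name": "res-exceptions", "rules": [
        {"name": "node_cpu_util_high", "enable": "enable",
         "contactGroups": [{"arms_contact_group_id": "12345",
                            "cms_contact_group_name": "ack_Default Contact Group",
                            "id": "1234"}],
         "thresholds": [{"key": "CMS_ESCALATIONS_CRITICAL_Threshold",
                         "unit": "percent", "value": "60"}]},
    ]},
    {"name": "cluster-error", "rules": [
        {"name": "gpu-xid-error", "enable": "enable"},
    ]},
]


def apply_override(base: Groups, override: Groups) -> Groups:
    """Return a copy of base with override fields merged in by group and rule name.

    Simplified model: matching rule fields are replaced wholesale (lists included),
    and override entries that match no existing group or rule are ignored.
    """
    result = deepcopy(base)
    groups = {g["name"]: g for g in result}
    for o_group in override:
        group = groups.get(o_group["name"])
        if group is None:
            continue
        rules = {r["name"]: r for r in group["rules"]}
        for o_rule in o_group.get("rules", []):
            if o_rule["name"] in rules:
                rules[o_rule["name"]].update(deepcopy(o_rule))
    return result


# Per-cluster result: only the GPU cluster receives the override.
EFFECTIVE = {
    "ack-cluster-1": deepcopy(BASE_GROUPS),
    "ack-cluster-2": apply_override(BASE_GROUPS, OVERRIDE_GPU),
}

if __name__ == "__main__":
    for cluster, groups in EFFECTIVE.items():
        for group in groups:
            for rule in group["rules"]:
                print(cluster, group["name"], rule["name"], rule["enable"])
```

In this model ack-cluster-1 keeps the base rules unchanged, while ack-cluster-2 gets the enabled rules, the 60% threshold, and the replacement contact group.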

Step 1: Create a contact group and alert rules

  1. Create a contact and a contact group.

  2. Obtain the contact group ID.

  3. Create alert rules.

Step 2: Propagate differentiated alert rules

Alert rule differentiation is implemented through KubeVela, where override policies are defined and propagated at the Fleet level.

  1. Run the following command to query the IDs of the clusters to which you want to propagate the alert rules:

    kubectl get managedcluster 

    Expected output:

    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
    c565e4****      true                                  True     True        12d
    cbaa12****      true                                  True     True        12d
    Note

    You can also select clusters by cluster label instead of cluster ID. For more information, see the "Method 2: Specify a label in the cluster selector" section of the "Select a cluster to distribute applications" topic.

  2. Create a file named ackalertrule-app-override.yaml with the following content. The file defines the target clusters, the override policy, the propagation workflow, and the KubeVela application:

    In this example, ack-cluster-1 is a CPU cluster and ack-cluster-2 is a GPU-accelerated cluster. The override policy changes only the alert rules of ack-cluster-2: it enables the gpu-xid-error alert rule and, for the node_cpu_util_high rule, enables the rule, sets the critical threshold to 60%, and changes the contact group.

    apiVersion: core.oam.dev/v1alpha1  # Topology policy: select the target cluster by cluster ID.
    kind: Policy
    metadata:
      name: cluster-cpu
      namespace: kube-system
    type: topology
    properties:
      clusters: ["<ack-cluster-1>"] # Replace <ack-cluster-1> with the cluster ID of ack cluster 1. 
    ---
    apiVersion: core.oam.dev/v1alpha1 # Topology policy: select the target cluster by cluster ID.
    kind: Policy
    metadata:
      name: cluster-gpu
      namespace: kube-system
    type: topology
    properties:
      clusters: ["<ack-cluster-2>"] # Replace <ack-cluster-2> with the cluster ID of ack cluster 2. 
    ---
    apiVersion: core.oam.dev/v1alpha1 # Define an override policy. 
    kind: Policy
    metadata:
      name: override-gpu
      namespace: kube-system
    type: override
    properties:
      components:
      - name: ackalertrules  # The component name in the associated application. 
        traits:
        - type: alert-rule   # The alert-rule trait modifies the alert rules.
          properties:
            groups:           # The override configurations, whose structure is the same as that of the alert rules. You can define multiple groups and alert rules to be overridden. 
            - name: res-exceptions      # Specify the name of the alert group to be overridden. 
              rules:
              - contactGroups:           # Override the contact group. Replace the example values with those of the contact group created in Step 1.
                - arms_contact_group_id: "12345"
                  cms_contact_group_name: ack_Default Contact Group
                  id: "1234"
                enable: enable           # Change the value to enable. 
                name: node_cpu_util_high # Specify the name of the alert rule to be overridden.
                thresholds:              # Modify the threshold. 
                - key: CMS_ESCALATIONS_CRITICAL_Threshold
                  unit: percent
                  value: "60"
            - name: cluster-error    # Specify the name of the alert group to override. 
              rules:
              - enable: enable       # Change the value to enable. 
                name: gpu-xid-error  # Specify the name of the alert rule to override. 
    ---
    apiVersion: core.oam.dev/v1alpha1  # Define a KubeVela workflow. 
    kind: Workflow
    metadata:
      name: deploy-ackalertrules
      namespace: kube-system
    steps:
      - type: deploy
        name: deploy-cpu
        properties:
          policies: ["cluster-cpu"]   # Propagate the original alert rules to the cluster selected by the cluster-cpu policy.
      - type: deploy
        name: deploy-gpu
        properties:
          policies: ["override-gpu", "cluster-gpu"]  # Apply the override policy to override the alert rules of cluster-gpu. 
    ---
    apiVersion: core.oam.dev/v1beta1   # Define a KubeVela application. 
    kind: Application
    metadata:
      name: alertrules
      namespace: kube-system
      annotations:
        app.oam.dev/publishVersion: version1  # To repropagate the alert rules after they are updated, change the value of publishVersion.
    spec:
      components:
        - name: ackalertrules
          type: ref-objects
          properties:
            objects:
              - resource: ackalertrules    # Reference the alert rules created in Step 1.
                name: default
      workflow:
        ref: deploy-ackalertrules  # Propagate the alert rules by using the steps defined in the deploy-ackalertrules workflow.
  3. Run the following command to apply the override policy and override the alert rules:

    kubectl apply -f ackalertrule-app-override.yaml
  4. Run the following command to view the propagation progress of the alert rules:

    kubectl amc appstatus alertrules -n kube-system --tree --detail

    Expected output:

    CLUSTER                       NAMESPACE       RESOURCE             STATUS    APPLY_TIME          DETAIL
    c565e4**** (ack-cluster-1)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
    cbaa12**** (ack-cluster-2)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
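To check the propagation result from a script instead of by eye, you can compare the two command outputs shown in this topic. The following Python sketch is hypothetical: it parses saved text copies of the kubectl get managedcluster and kubectl amc appstatus outputs, assumes their column layout matches the expected outputs above, and treats updated as the success status. It lists target clusters whose AckAlertRule/default is not yet updated.

```python
import re

# Sample outputs copied from this topic; cluster IDs are masked as in the original.
MANAGED_CLUSTERS_OUTPUT = """\
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
c565e4****      true                                  True     True        12d
cbaa12****      true                                  True     True        12d
"""

APPSTATUS_OUTPUT = """\
CLUSTER                       NAMESPACE       RESOURCE             STATUS    APPLY_TIME          DETAIL
c565e4**** (ack-cluster-1)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
cbaa12**** (ack-cluster-2)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
"""

# One appstatus row: cluster ID, "(alias)" plus tree glyphs, namespace plus glyphs,
# resource, status. "\u2500" is the box-drawing character used by the tree view.
ROW = re.compile(
    r"^(?P<cluster>\S+)\s+\((?P<alias>[^)]*)\)\S*\s+"
    r"(?P<namespace>[^\s\u2500]+)\u2500*\s+(?P<resource>\S+)\s+(?P<status>\S+)"
)


def available_clusters(managed_output: str) -> set[str]:
    """Return cluster IDs whose AVAILABLE column (second to last) is True."""
    ids = set()
    for line in managed_output.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if len(tokens) >= 2 and tokens[-2] == "True":
            ids.add(tokens[0])
    return ids


def rule_statuses(appstatus_output: str,
                  resource: str = "AckAlertRule/default") -> dict[str, str]:
    """Map cluster ID to STATUS for the rows of the given resource."""
    statuses = {}
    for line in appstatus_output.splitlines():
        match = ROW.match(line)
        if match and match.group("resource") == resource:
            statuses[match.group("cluster")] = match.group("status")
    return statuses


def pending_clusters(appstatus_output: str, targets: set[str]) -> list[str]:
    """Return target clusters that do not (yet) report the rules as updated."""
    statuses = rule_statuses(appstatus_output)
    return sorted(c for c in targets if statuses.get(c) != "updated")


if __name__ == "__main__":
    targets = available_clusters(MANAGED_CLUSTERS_OUTPUT)
    print(pending_clusters(APPSTATUS_OUTPUT, targets))  # [] when every target is updated
```

If you propagate the alert rules to only some of the managed clusters, pass those cluster IDs as targets instead of every available cluster.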
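The comment on app.oam.dev/publishVersion in ackalertrule-app-override.yaml notes that its value must change before updated alert rules are repropagated. If you keep the manifest in a file, a small helper can bump the value for you. The following Python sketch is hypothetical: it assumes values of the form version<N>, as in version1 in the example, and rewrites only that annotation.

```python
import re
from pathlib import Path

# Matches the annotation, e.g. "app.oam.dev/publishVersion: version1".
# Assumption: the value follows the hypothetical "version<N>" naming scheme.
PUBLISH_VERSION = re.compile(r"(app\.oam\.dev/publishVersion:\s*)version(\d+)")


def bump_publish_version(manifest: str) -> str:
    """Return the manifest with version<N> replaced by version<N+1>."""
    new_text, count = PUBLISH_VERSION.subn(
        lambda m: f"{m.group(1)}version{int(m.group(2)) + 1}", manifest
    )
    if count != 1:
        raise ValueError(f"expected one publishVersion annotation, found {count}")
    return new_text


def bump_file(path: str) -> None:
    """Rewrite a manifest file in place, e.g. ackalertrule-app-override.yaml."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    file.write_text(bump_publish_version(text), encoding="utf-8")


if __name__ == "__main__":
    sample = "  annotations:\n    app.oam.dev/publishVersion: version1\n"
    print(bump_publish_version(sample), end="")
```

After bumping the value, run kubectl apply -f ackalertrule-app-override.yaml again to repropagate the alert rules.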