Container Service for Kubernetes: Override alerting configurations for multi-cluster management

Last Updated: Dec 05, 2023

The multi-cluster alert management feature allows you to create or modify alert rules on a Fleet instance. By default, the Fleet instance propagates the same alert rules to every cluster that is associated with it. If your clusters need different alert rules to meet business requirements, you can override the alerting configurations for specific clusters. This topic describes how to do so.

Prerequisites

Background information

The multi-cluster management feature allows you to create KubeVela override policies on a Fleet instance to override alerting configurations or application configurations. You can create alert rules on a Fleet instance and then create an override policy to override the alert rules of specific clusters. For example, you can create an override policy to enable GPU alerting, set different alert thresholds, and specify different contacts. After you complete the configuration, you can use the Fleet instance to propagate the alert rules to the associated clusters and then apply the override policy.

The following figure shows how alerting configurations are overridden for specific clusters. An override policy is created on the Fleet instance and applied to ACK Cluster 2 to override its alerting configurations. ACK Cluster 1 still uses the original alerting configurations.

Override alerting configurations

Step 1: Create a contact and a contact group

Create a contact and a contact group. For more information, see Step 1: Create a contact and a contact group.

Step 2: Obtain the contact group ID

Obtain the contact group ID. For more information, see Step 2: Obtain the contact group ID.

Step 3: Create alert rules

Create alert rules. For more information, see Step 3: Create an alert rule.

Step 4: Create an override policy and apply the policy to override the alert rules

Use KubeVela to create an override policy on the Fleet instance, and then apply the policy from the Fleet instance to override the alert rules. Perform the following steps:

  1. Run the following command to query the IDs of the clusters to which you want to propagate the alert rules:

    kubectl get managedcluster 

    Expected output:

    NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
    c565e4****      true                                  True     True        12d
    cbaa12****      true                                  True     True        12d

    Note: You can also select clusters by specifying cluster labels. For more information, see Method 2: Specify a label as the cluster ID.

  2. Create a file named ackalertrule-app-override.yaml based on the following content to define the configurations to override:

    In this example, ack-cluster-1 is a CPU cluster and ack-cluster-2 is a GPU-accelerated cluster. The example overrides the alert rules of ack-cluster-2 only: the override policy enables GPU alerting, modifies the alert thresholds, and changes the contacts.

    apiVersion: core.oam.dev/v1alpha1  # Specify the cluster to which the alert rules are propagated by cluster ID. 
    kind: Policy
    metadata:
      name: cluster-cpu
      namespace: kube-system
    type: topology
    properties:
      clusters: ["<ack-cluster-1>"] # Replace <ack-cluster-1> with the cluster ID of ack-cluster-1.
    ---
    apiVersion: core.oam.dev/v1alpha1 # Specify the cluster to which the alert rules are propagated by cluster ID. 
    kind: Policy
    metadata:
      name: cluster-gpu
      namespace: kube-system
    type: topology
    properties:
      clusters: ["<ack-cluster-2>"] # Replace <ack-cluster-2> with the cluster ID of ack-cluster-2.
    ---
    apiVersion: core.oam.dev/v1alpha1 # Define an override policy. 
    kind: Policy
    metadata:
      name: override-gpu
      namespace: kube-system
    type: override
    properties:
      components:
      - name: ackalertrules  # The component name in the associated application. 
        traits:
        - type: alert-rule   # The alert-rule trait modifies the alert rules.
          properties:
            groups:           # The configurations to override, whose structure is the same as that of the alert rules. You can define multiple groups and alert rules to override. 
            - name: res-exceptions      # Specify the alert group to override. 
              rules:
              - contactGroups:           # Override the contact group. 
                - arms_contact_group_id: "12345"
                  cms_contact_group_name: ack_Default Contact Group
                  id: "1234"
                enable: enable           # Change the value to enable. 
                name: node_cpu_util_high # Specify the name of the alert to override.
                thresholds:              # Modify the threshold. 
                - key: CMS_ESCALATIONS_CRITICAL_Threshold
                  unit: percent
                  value: "60"
            - name: cluster-error    # Specify the alert group to override. 
              rules:
              - enable: enable       # Change the value to enable. 
                name: gpu-xid-error  # Specify the name of the alert to override. 
    ---
    apiVersion: core.oam.dev/v1alpha1  # Define a KubeVela workflow. 
    kind: Workflow
    metadata:
      name: deploy-ackalertrules
      namespace: kube-system
    steps:
      - type: deploy
        name: deploy-cpu
        properties:
          policies: ["cluster-cpu"]   # Deploy the alert rules to cluster-cpu. 
      - type: deploy
        name: deploy-gpu
        properties:
          policies: ["override-gpu", "cluster-gpu"]  # Apply the override policy to override the alert rules of cluster-gpu. 
    ---
    apiVersion: core.oam.dev/v1beta1   # Define a KubeVela application. 
    kind: Application
    metadata:
      name: alertrules
      namespace: kube-system
      annotations:
        app.oam.dev/publishVersion: version1  # To repropagate the alert rules after resources are updated, change the value of publishVersion.
    spec:
      components:
        - name: ackalertrules
          type: ref-objects
          properties:
            objects:
              - resource: ackalertrules    # Reference the alert rules created in Step 3. 
                name: default
      workflow:
        ref: deploy-ackalertrules  # Use the steps defined in the workflow to propagate the alert rules.
  3. Run the following command to apply the override policy and override the alert rules:

    kubectl apply -f ackalertrule-app-override.yaml
  4. Run the following command to view the propagation progress of the alert rules:

    kubectl amc appstatus alertrules -n kube-system --tree --detail

    Expected output:

    CLUSTER                       NAMESPACE       RESOURCE             STATUS    APPLY_TIME          DETAIL
    c565e4**** (ack-cluster-1)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
    cbaa12**** (ack-cluster-2)─── kube-system─── AckAlertRule/default updated   2022-**-** **:**:** Age: **
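
The lookups in steps 1 and 4 can be scripted. The following is a minimal sketch and is not part of the product. The helper names cluster_ids and all_updated are hypothetical, and the parsing assumes the column layout shown in the expected outputs above, which may differ in your environment. Both helpers only read text from standard input, so they do not call kubectl themselves.

```shell
# Hypothetical helper: print the NAME column (the cluster IDs) from the
# table that `kubectl get managedcluster` writes, read from stdin.
cluster_ids() {
  awk 'NR > 1 && NF > 0 { print $1 }'   # skip the header row and blank lines
}

# Hypothetical helper: succeed only if every data row of the
# `kubectl amc appstatus ... --tree --detail` table (read from stdin)
# contains the status "updated". Rows without it are echoed to stderr.
all_updated() {
  awk '
    NR == 1 || NF == 0 { next }         # skip the header row and blank lines
    {
      rows++
      ok = 0
      for (i = 1; i <= NF; i++) if ($i == "updated") ok = 1
      if (!ok) { print "not updated: " $0 > "/dev/stderr"; bad++ }
    }
    END { exit ((rows == 0 || bad > 0) ? 1 : 0) }   # fail on no rows or any non-updated row
  '
}

# Example usage on a Fleet instance (assumes kubectl and the amc plugin are set up):
#   kubectl get managedcluster | cluster_ids
#   kubectl amc appstatus alertrules -n kube-system --tree --detail | all_updated \
#     && echo "alert rules propagated to all clusters"
```

The status check matches the word "updated" in any column rather than a fixed column position, because the tree-style output joins some columns with line-drawing characters.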