Manage alert rules centrally on a Fleet instance and distribute them to all associated clusters automatically — so every cluster uses the same rules without manual per-cluster configuration.
Prerequisites
Before you begin, make sure you have:
- Fleet management enabled
- Two clusters associated with the Fleet instance (a service provider cluster and a service consumer cluster)
- The latest version of Alibaba Cloud CLI installed and configured
How it works
The Fleet instance acts as the central control plane for alert rules. You create an AckAlertRule Custom Resource Definition (CRD) on the Fleet instance, then create a distribution rule (backed by KubeVela) to push the rule to the clusters you select. Any cluster newly associated with the Fleet instance can receive the same rules through the same distribution mechanism.
Step 1: Create a contact and contact group
Contacts and contact groups are shared across all ACK clusters under your Alibaba Cloud account.
1. Log on to the ACK console and click Clusters in the left navigation pane.
2. Click the name of any cluster. In the left pane, choose Operations > Alerts.
3. On the Alert Configuration page, click Start Installation. The console checks prerequisites and installs and upgrades the required components automatically.
4. On the Alerts page, create a contact:
   - Click the Alert Contacts tab, then click Create.
   - In the Create Alert Contact panel, fill in Name, Phone Number, and Email, then click OK. The system sends an activation message or email to the contact. Activate the contact as prompted.
5. Create a contact group:
   - Click the Alert Contact Groups tab, then click Create.
   - In the Create Alert Contact Group panel, set Group Name, select contacts in the Contacts section, and click OK. You can add contacts to or remove contacts from the Selected Contacts column.
Step 2: Get the contact group ID
Run the following command to query your contact groups:
aliyun cs GET /alert/contact_groups
Expected output:
{
  "contact_groups": [
    {
      "ali_uid": 14783****,
      "binding_info": "{\"sls_id\":\"ack_14783****_***\",\"cms_contact_group_name\":\"ack_Default Contact Group\",\"arms_id\":\"1****\"}",
      "contacts": null,
      "created": "2021-07-21T12:18:34+08:00",
      "group_contact_ids": [
        2***
      ],
      "group_name": "Default Contact Group",
      "id": 3***,
      "updated": "2022-09-19T19:23:57+08:00"
    }
  ],
  "page_info": {
    "page_number": 1,
    "page_size": 100,
    "total_count": 1
  }
}
Map the output fields to the `contactGroups` parameters you will use in the alert rule:
contactGroups:
  - arms_contact_group_id: "1****" # contact_groups.binding_info.arms_id
    cms_contact_group_name: ack_Default Contact Group # contact_groups.binding_info.cms_contact_group_name
    id: "3***" # contact_groups.id
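Note that `binding_info` is itself a JSON-encoded string, so extracting `arms_id` and `cms_contact_group_name` takes a second parsing pass. The following minimal Python sketch shows the mapping; the values are masked placeholders (the numeric `id` here is illustrative, since the real value is masked in the sample output):

```python
import json

# Sample response from `aliyun cs GET /alert/contact_groups`
# (values are placeholders; the real id is masked in the sample above)
response = """
{
  "contact_groups": [
    {
      "binding_info": "{\\"sls_id\\":\\"ack_14783****_***\\",\\"cms_contact_group_name\\":\\"ack_Default Contact Group\\",\\"arms_id\\":\\"1****\\"}",
      "group_name": "Default Contact Group",
      "id": 3100
    }
  ]
}
"""

def to_contact_groups(raw: str) -> list[dict]:
    """Map the API response to the contactGroups parameters of the alert rule."""
    groups = []
    for g in json.loads(raw)["contact_groups"]:
        binding = json.loads(g["binding_info"])  # binding_info is a nested JSON string
        groups.append({
            "arms_contact_group_id": binding["arms_id"],
            "cms_contact_group_name": binding["cms_contact_group_name"],
            "id": str(g["id"]),  # the alert rule expects the ID as a string
        })
    return groups

print(to_contact_groups(response))
```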
Step 3: Create an alert rule
The AckAlertRule CRD groups all supported alert rules under a single resource. The following constraints apply: the alert rule name must be `default` and the namespace must be `kube-system`. For the full list of supported rules, see the Configure alert rules by using CRDs section in the Alert management topic.
Choose which rule groups to enable
The template includes 11 rule groups. Enable only the groups relevant to your cluster configuration:
| Rule group | What it monitors | Enable when |
|---|---|---|
| `error-events` | Cluster error events (SLS-based) | Always recommended |
| `warn-events` | Cluster warning events (SLS-based) | High-noise environments |
| `cluster-core-error` | API server, etcd, Scheduler, kube-controller-manager, cloud-controller-manager, CoreDNS, Ingress health | Core component monitoring is required |
| `cluster-error` | Node failures, GPU errors, image pull failures, node pool (NLC) errors | Node-level fault detection is needed |
| `res-exceptions` | CPU, memory, disk, network, inode, SLB utilization (default threshold: 85%) | Resource saturation alerting is needed |
| `cluster-scale` | Cluster Autoscaler scale-up, scale-down, and timeout events | Autoscaling is enabled |
| `workload-exceptions` | Job failures, Deployment replica errors, DaemonSet scheduling errors | Workload health monitoring is needed |
| `pod-exceptions` | Pod OOM, pod start failures, pod crash loops | Pod-level fault detection is needed |
| `cluster-storage-err` | CSI disk errors, PersistentVolume (PV) failures | Persistent storage is used |
| `cluster-network-err` | SLB sync failures, route errors, Terway allocation errors, Ingress reload errors | Terway CNI or SLB-backed services are used |
| `security-err` | Config audit high-risk findings | Security auditing is enabled |
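The "Enable when" column can be read as a simple feature-to-group lookup. The following Python sketch illustrates that selection; the feature names are hypothetical labels for this example, not ACK parameters:

```python
# Hypothetical helper: pick rule groups to enable based on cluster traits,
# following the "Enable when" column of the table above.
ALWAYS = {"error-events"}

FEATURE_GROUPS = {
    "core_monitoring": "cluster-core-error",
    "node_fault_detection": "cluster-error",
    "resource_saturation": "res-exceptions",
    "autoscaling": "cluster-scale",
    "workload_health": "workload-exceptions",
    "pod_fault_detection": "pod-exceptions",
    "persistent_storage": "cluster-storage-err",
    "terway_or_slb": "cluster-network-err",
    "security_audit": "security-err",
}

def groups_to_enable(features: set[str]) -> set[str]:
    """Return the rule groups to enable for a cluster with the given traits."""
    enabled = set(ALWAYS)
    enabled |= {grp for feat, grp in FEATURE_GROUPS.items() if feat in features}
    return enabled

print(sorted(groups_to_enable({"autoscaling", "persistent_storage"})))
```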
Apply the alert rule
1. Set `rules.enable` to `enable` for the rule groups you want to activate. In the example below, `error-events` is enabled.
2. Add the `contactGroups` block from Step 2.
3. Save the file as `ackalertrule.yaml` and apply it:

   kubectl apply -f ackalertrule.yaml
The following is a complete example with error-events enabled:
apiVersion: alert.alibabacloud.com/v1beta1
kind: AckAlertRule
metadata:
name: default
namespace: kube-system
spec:
groups:
- name: error-events
rules:
- enable: enable
contactGroups:
- arms_contact_group_id: "1****"
cms_contact_group_name: ack_Default Contact Group
id: "3***"
expression: sls.app.ack.error
name: error-event
notification:
message: kubernetes cluster error event.
type: event
- name: warn-events
rules:
- enable: disable
expression: sls.app.ack.warn
name: warn-event
notification:
message: kubernetes cluster warn event.
type: event
- name: cluster-core-error
rules:
- enable: disable
expression: prom.apiserver.notHealthy.down
name: apiserver-unhealthy
notification:
message: "Cluster APIServer not healthy. \nPromQL: ((sum(up{job=\"apiserver\"})
<= 0) or (absent(sum(up{job=\"apiserver\"})))) > 0"
type: metric-prometheus
- enable: disable
expression: prom.etcd.notHealthy.down
name: etcd-unhealthy
notification:
message: "Cluster ETCD not healthy. \nPromQL: ((sum(up{job=\"etcd\"}) <= 0)
or (absent(sum(up{job=\"etcd\"})))) > 0"
type: metric-prometheus
- enable: disable
expression: prom.scheduler.notHealthy.down
name: scheduler-unhealthy
notification:
message: "Cluster Scheduler not healthy. \nPromQL: ((sum(up{job=\"ack-scheduler\"})
<= 0) or (absent(sum(up{job=\"ack-scheduler\"})))) > 0"
type: metric-prometheus
- enable: disable
expression: prom.kcm.notHealthy.down
name: kcm-unhealthy
notification:
message: "Cluster kube-controller-manager not healthy. \nPromQL: ((sum(up{job=\"ack-kube-controller-manager\"})
<= 0) or (absent(sum(up{job=\"ack-kube-controller-manager\"})))) > 0"
type: metric-prometheus
- enable: disable
expression: prom.ccm.notHealthy.down
name: ccm-unhealthy
notification:
message: "Cluster cloud-controller-manager not healthy. \nPromQL: ((sum(up{job=\"ack-cloud-controller-manager\"})
<= 0) or (absent(sum(up{job=\"ack-cloud-controller-manager\"})))) > 0"
type: metric-prometheus
- enable: disable
expression: prom.coredns.notHealthy.requestdown
name: coredns-unhealthy-requestdown
notification:
message: "Cluster CoreDNS not healthy, continuously request down. \nPromQL:
(sum(rate(coredns_dns_request_count_total{}[1m]))by(server,zone)<=0) or
(sum(rate(coredns_dns_requests_total{}[1m]))by(server,zone)<=0)"
type: metric-prometheus
- enable: disable
expression: prom.coredns.notHealthy.panic
name: coredns-unhealthy-panic
notification:
message: "Cluster CoreDNS not healthy, continuously panic. \nPromQL: sum(rate(coredns_panic_count_total{}[3m]))
> 0"
type: metric-prometheus
- enable: disable
expression: prom.ingress.request.errorRateHigh
name: ingress-err-request
notification:
message: Cluster Ingress Controller request error rate high (default error
rate is 85%).
type: metric-prometheus
- enable: disable
expression: prom.ingress.ssl.expire
name: ingress-ssl-expire
notification:
message: "Cluster Ingress Controller SSL will expire in a few days (default
14 days). \nPromQL: ((nginx_ingress_controller_ssl_expire_time_seconds -
time()) / 24 / 3600) < 14"
type: metric-prometheus
- name: cluster-error
rules:
- enable: disable
expression: sls.app.ack.docker.hang
name: docker-hang
notification:
message: kubernetes node docker hang.
type: event
- enable: disable
expression: sls.app.ack.eviction
name: eviction-event
notification:
message: kubernetes eviction event.
type: event
- enable: disable
expression: sls.app.ack.gpu.xid_error
name: gpu-xid-error
notification:
message: kubernetes gpu xid error event.
type: event
- enable: disable
expression: sls.app.ack.image.pull_back_off
name: image-pull-back-off
notification:
message: kubernetes image pull back off event.
type: event
- enable: disable
expression: sls.app.ack.node.down
name: node-down
notification:
message: kubernetes node down event.
type: event
- enable: disable
expression: sls.app.ack.node.restart
name: node-restart
notification:
message: kubernetes node restart event.
type: event
- enable: disable
expression: sls.app.ack.ntp.down
name: node-ntp-down
notification:
message: kubernetes node ntp down.
type: event
- enable: disable
expression: sls.app.ack.node.pleg_error
name: node-pleg-error
notification:
message: kubernetes node pleg error event.
type: event
- enable: disable
expression: sls.app.ack.ps.hang
name: ps-hang
notification:
message: kubernetes ps hang event.
type: event
- enable: disable
expression: sls.app.ack.node.fd_pressure
name: node-fd-pressure
notification:
message: kubernetes node fd pressure event.
type: event
- enable: disable
expression: sls.app.ack.node.pid_pressure
name: node-pid-pressure
notification:
message: kubernetes node pid pressure event.
type: event
- enable: disable
expression: sls.app.ack.ccm.del_node_failed
name: node-del-err
notification:
message: kubernetes delete node failed.
type: event
- enable: disable
expression: sls.app.ack.ccm.add_node_failed
name: node-add-err
notification:
message: kubernetes add node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.run_command_fail
name: nlc-run-cmd-err
notification:
message: kubernetes node pool nlc run command failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.empty_task_cmd
name: nlc-empty-cmd
notification:
message: kubernetes node pool nlc delete node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.url_mode_unimpl
name: nlc-url-m-unimp
notification:
message: kubernetes node pool nlc delete node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.op_not_found
name: nlc-opt-no-found
notification:
message: kubernetes node pool nlc delete node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.destroy_node_fail
name: nlc-des-node-err
notification:
message: kubernetes node pool nlc destroy node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.drain_node_fail
name: nlc-drain-node-err
notification:
message: kubernetes node pool nlc drain node failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.restart_ecs_wait_fail
name: nlc-restart-ecs-wait
notification:
message: kubernetes node pool nlc restart ecs wait timeout.
type: event
- enable: disable
expression: sls.app.ack.nlc.restart_ecs_fail
name: nlc-restart-ecs-err
notification:
message: kubernetes node pool nlc restart ecs failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.reset_ecs_fail
name: nlc-reset-ecs-err
notification:
message: kubernetes node pool nlc reset ecs failed.
type: event
- enable: disable
expression: sls.app.ack.nlc.repair_fail
name: nlc-sel-repair-err
notification:
message: kubernetes node pool nlc self repair failed.
type: event
- name: res-exceptions
rules:
- enable: disable
expression: cms.host.cpu.utilization
name: node_cpu_util_high
notification:
message: kubernetes cluster node cpu utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.host.memory.utilization
name: node_mem_util_high
notification:
message: kubernetes cluster node memory utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.host.disk.utilization
name: node_disk_util_high
notification:
message: kubernetes cluster node disk utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.host.public.network.utilization
name: node_public_net_util_high
notification:
message: kubernetes cluster node public network utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.host.fs.inode.utilization
name: node_fs_inode_util_high
notification:
message: kubernetes cluster node file system inode utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.slb.qps.utilization
name: slb_qps_util_high
notification:
message: kubernetes cluster slb qps utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.slb.traffic.tx.utilization
name: slb_traff_tx_util_high
notification:
message: kubernetes cluster slb traffic utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.slb.max.connection.utilization
name: slb_max_con_util_high
notification:
message: kubernetes cluster max connection utilization too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: percent
value: "85"
type: metric-cms
- enable: disable
expression: cms.slb.drop.connection
name: slb_drop_con_high
notification:
message: kubernetes cluster drop connection count per second too high.
thresholds:
- key: CMS_ESCALATIONS_CRITICAL_Threshold
unit: count
value: "1"
type: metric-cms
- enable: disable
expression: sls.app.ack.node.disk_pressure
name: node-disk-pressure
notification:
message: kubernetes node disk pressure event.
type: event
- enable: disable
expression: sls.app.ack.resource.insufficient
name: node-res-insufficient
notification:
message: kubernetes node resource insufficient.
type: event
- enable: disable
expression: sls.app.ack.ip.not_enough
name: node-ip-pressure
notification:
message: kubernetes ip not enough event.
type: event
- enable: disable
expression: sls.app.ack.csi.no_enough_disk_space
name: disk_space_press
notification:
message: kubernetes csi not enough disk space.
type: event
- name: cluster-scale
rules:
- enable: disable
expression: sls.app.ack.autoscaler.scaleup_group
name: autoscaler-scaleup
notification:
message: kubernetes autoscaler scale up.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.scaledown
name: autoscaler-scaledown
notification:
message: kubernetes autoscaler scale down.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.scaleup_timeout
name: autoscaler-scaleup-timeout
notification:
message: kubernetes autoscaler scale up timeout.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.scaledown_empty
name: autoscaler-scaledown-empty
notification:
message: kubernetes autoscaler scale down empty node.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.scaleup_group_failed
name: autoscaler-up-group-failed
notification:
message: kubernetes autoscaler scale up failed.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.cluster_unhealthy
name: autoscaler-cluster-unhealthy
notification:
message: kubernetes autoscaler error, cluster not healthy.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.delete_started_timeout
name: autoscaler-del-started
notification:
message: kubernetes autoscaler delete node started long ago.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.delete_unregistered
name: autoscaler-del-unregistered
notification:
message: kubernetes autoscaler delete unregistered node.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.scaledown_failed
name: autoscaler-scale-down-failed
notification:
message: kubernetes autoscaler scale down failed.
type: event
- enable: disable
expression: sls.app.ack.autoscaler.instance_expired
name: autoscaler-instance-expired
notification:
message: kubernetes autoscaler scale down instance expired.
type: event
- name: workload-exceptions
rules:
- enable: disable
expression: prom.job.failed
name: job-failed
notification:
message: "Cluster Job failed. \nPromQL: kube_job_status_failed{job=\"_kube-state-metrics\"}
> 0"
type: metric-prometheus
- enable: disable
expression: prom.deployment.replicaError
name: deployment-rep-err
notification:
message: "Cluster Deployment replication status error. \nPromQL: kube_deployment_spec_replicas{job=\"_kube-state-metrics\"}
!= kube_deployment_status_replicas_available{job=\"_kube-state-metrics\"}"
type: metric-prometheus
- enable: disable
expression: prom.daemonset.scheduledError
name: daemonset-status-err
notification:
message: "Cluster Daemonset pod status or scheduled error. \nPromQL: ((100
- kube_daemonset_status_number_ready{} / kube_daemonset_status_desired_number_scheduled{}
* 100) or (kube_daemonset_status_desired_number_scheduled{} - kube_daemonset_status_current_number_scheduled{}))
> 0"
type: metric-prometheus
- enable: disable
expression: prom.daemonset.misscheduled
name: daemonset-misscheduled
notification:
message: "Cluster Daemonset misscheduled. \nPromQL: kube_daemonset_status_number_misscheduled{job=\"_kube-state-metrics\"}
\ > 0"
type: metric-prometheus
- name: pod-exceptions
rules:
- enable: disable
expression: sls.app.ack.pod.oom
name: pod-oom
notification:
message: kubernetes pod oom event.
type: event
- enable: disable
expression: sls.app.ack.pod.failed
name: pod-failed
notification:
message: kubernetes pod start failed event.
type: event
- enable: disable
expression: prom.pod.status.notHealthy
name: pod-status-err
notification:
message: 'Pod status exception. \nPromQL: min_over_time(sum by (namespace,
pod, phase) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed", job="_kube-state-metrics"})[${mins}m:1m])
> 0'
type: metric-prometheus
- enable: disable
expression: prom.pod.status.crashLooping
name: pod-crashloop
notification:
message: 'Pod status exception. \nPromQL: sum_over_time(increase(kube_pod_container_status_restarts_total{job="_kube-state-metrics"}[1m])[${mins}m:1m])
> 3'
type: metric-prometheus
- name: cluster-storage-err
rules:
- enable: disable
expression: sls.app.ack.csi.invalid_disk_size
name: csi_invalid_size
notification:
message: kubernetes csi invalid disk size.
type: event
- enable: disable
expression: sls.app.ack.csi.disk_not_portable
name: csi_not_portable
notification:
message: kubernetes csi not portable.
type: event
- enable: disable
expression: sls.app.ack.csi.deivce_busy
name: csi_device_busy
notification:
message: kubernetes csi disk device busy.
type: event
- enable: disable
expression: sls.app.ack.csi.no_ava_disk
name: csi_no_ava_disk
notification:
message: kubernetes csi no available disk.
type: event
- enable: disable
expression: sls.app.ack.csi.disk_iohang
name: csi_disk_iohang
notification:
message: kubernetes csi ioHang.
type: event
- enable: disable
expression: sls.app.ack.csi.latency_too_high
name: csi_latency_high
notification:
message: kubernetes csi pvc latency load too high.
type: event
- enable: disable
expression: prom.pv.failed
name: pv-failed
notification:
message: 'Cluster PersistentVolume failed. \nPromQL: kube_persistentvolume_status_phase{phase=~"Failed|Pending",
job="_kube-state-metrics"} > 0'
type: metric-prometheus
- name: cluster-network-err
rules:
- enable: disable
expression: sls.app.ack.ccm.no_ava_slb
name: slb-no-ava
notification:
message: kubernetes slb not available.
type: event
- enable: disable
expression: sls.app.ack.ccm.sync_slb_failed
name: slb-sync-err
notification:
message: kubernetes slb sync failed.
type: event
- enable: disable
expression: sls.app.ack.ccm.del_slb_failed
name: slb-del-err
notification:
message: kubernetes slb delete failed.
type: event
- enable: disable
expression: sls.app.ack.ccm.create_route_failed
name: route-create-err
notification:
message: kubernetes create route failed.
type: event
- enable: disable
expression: sls.app.ack.ccm.sync_route_failed
name: route-sync-err
notification:
message: kubernetes sync route failed.
type: event
- enable: disable
expression: sls.app.ack.terway.invalid_resource
name: terway-invalid-res
notification:
message: kubernetes terway have invalid resource.
type: event
- enable: disable
expression: sls.app.ack.terway.alloc_ip_fail
name: terway-alloc-ip-err
notification:
message: kubernetes terway allocate ip error.
type: event
- enable: disable
expression: sls.app.ack.terway.parse_fail
name: terway-parse-err
notification:
message: kubernetes terway parse k8s.aliyun.com/ingress-bandwidth annotation
error.
type: event
- enable: disable
expression: sls.app.ack.terway.allocate_failure
name: terway-alloc-res-err
notification:
message: kubernetes parse resource error.
type: event
- enable: disable
expression: sls.app.ack.terway.dispose_failure
name: terway-dispose-err
notification:
message: kubernetes dispose resource error.
type: event
- enable: disable
expression: sls.app.ack.terway.virtual_mode_change
name: terway-virt-mod-err
notification:
message: kubernetes virtual mode changed.
type: event
- enable: disable
expression: sls.app.ack.terway.config_check
name: terway-ip-check
notification:
message: kubernetes terway execute pod ip config check.
type: event
- enable: disable
expression: sls.app.ack.ingress.err_reload_nginx
name: ingress-reload-err
notification:
message: kubernetes ingress reload config error.
type: event
- name: security-err
rules:
- enable: disable
expression: sls.app.ack.si.config_audit_high_risk
name: si-c-a-risk
notification:
message: kubernetes high risks have been found after running config audit.
type: event
ruleVersion: v1.0.9
The alert rule is created on the Fleet instance but does not take effect on any cluster until you create a distribution rule in Step 4.
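Before applying, it can help to sanity-check the manifest against the constraints above (resource name `default`, namespace `kube-system`, `enable` set to `enable` or `disable`). The following Python sketch operates on an already-parsed manifest; the pared-down dict is illustrative, and in practice you would first load `ackalertrule.yaml` with a YAML parser:

```python
# Minimal structural check for an AckAlertRule manifest, mirroring the
# constraints stated above. Illustrative sketch, not an official validator.
def validate_ack_alert_rule(manifest: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means the checks pass."""
    errors = []
    meta = manifest.get("metadata", {})
    if meta.get("name") != "default":
        errors.append("metadata.name must be 'default'")
    if meta.get("namespace") != "kube-system":
        errors.append("metadata.namespace must be 'kube-system'")
    for group in manifest.get("spec", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("enable") not in ("enable", "disable"):
                errors.append(f"rule {rule.get('name')!r}: enable must be 'enable' or 'disable'")
    return errors

manifest = {
    "apiVersion": "alert.alibabacloud.com/v1beta1",
    "kind": "AckAlertRule",
    "metadata": {"name": "default", "namespace": "kube-system"},
    "spec": {"groups": [{"name": "error-events",
                         "rules": [{"enable": "enable", "name": "error-event"}]}]},
}
print(validate_ack_alert_rule(manifest))  # [] means the constraints hold
```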
Step 4: Distribute the alert rule to clusters
Distribution rules use KubeVela to push Kubernetes resources from the Fleet instance to associated clusters. For more information about application distribution, see Application distribution.
Choose a distribution method based on how you want to target clusters:
| Method | Use when |
|---|---|
| By label | You want to target a dynamic set of clusters (for example, all production clusters) |
| By cluster ID | You want to target specific, fixed clusters |
Method 1: Distribute by label
1. Query associated cluster IDs and add a label to the clusters you want to target:

   kubectl get managedclusters
   kubectl label managedclusters <clusterid> production=true

2. Create `ackalertrule-app.yaml` with the following content:

   apiVersion: core.oam.dev/v1beta1
   kind: Application
   metadata:
     name: alertrules
     namespace: kube-system
     annotations:
       app.oam.dev/publishVersion: version1
   spec:
     components:
       - name: alertrules
         type: ref-objects
         properties:
           objects:
             - resource: ackalertrules
               name: default
     policies:
       - type: topology
         name: prod-clusters
         properties:
           clusterSelector:
             production: "true" # Selects clusters with this label
Method 2: Distribute by cluster ID
Create `ackalertrule-app.yaml` with the target cluster IDs:
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: alertrules
  namespace: kube-system
  annotations:
    app.oam.dev/publishVersion: version1
spec:
  components:
    - name: alertrules
      type: ref-objects
      properties:
        objects:
          - resource: ackalertrules
            name: default
  policies:
    - type: topology
      name: prod-clusters
      properties:
        clusters: ["<clusterid1>", "<clusterid2>"] # Replace with actual cluster IDs
Apply and verify the distribution rule
1. Apply the distribution rule:

   kubectl apply -f ackalertrule-app.yaml

2. Check the distribution status:

   kubectl amc appstatus alertrules -n kube-system --tree --detail

   If the distribution succeeds, the output shows `updated` for each cluster:

   CLUSTER                NAMESPACE      RESOURCE              STATUS   APPLY_TIME           DETAIL
   c565e4**** (cluster1)─── kube-system─── AckAlertRule/default updated  2022-**-** **:**:**  Age: **
   cbaa12**** (cluster2)─── kube-system─── AckAlertRule/default updated  2022-**-** **:**:**  Age: **

   If a cluster shows a status other than `updated`, verify that the cluster is still associated with the Fleet instance and that the `AckAlertRule` resource was created successfully on the Fleet instance. For alert management details, see Alert management.
Update alert rules
To change an alert rule after distribution:
1. Edit `ackalertrule.yaml` and apply the changes:

   kubectl apply -f ackalertrule.yaml

2. Increment the `app.oam.dev/publishVersion` annotation value in `ackalertrule-app.yaml` (for example, change `version1` to `version2`), then apply:

   kubectl apply -f ackalertrule-app.yaml

   Updating the annotation triggers KubeVela to redistribute the modified rule to all targeted clusters.
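If you update rules often, the version bump can be scripted. The following sketch rewrites the annotation in the file text, assuming the `versionN` naming used in the examples above:

```python
import re

def bump_publish_version(text: str) -> str:
    """Increment the numeric suffix of app.oam.dev/publishVersion (versionN -> versionN+1)."""
    def bump(match: re.Match) -> str:
        return f"{match.group(1)}version{int(match.group(2)) + 1}"
    return re.sub(r"(app\.oam\.dev/publishVersion:\s*)version(\d+)", bump, text)

snippet = "  annotations:\n    app.oam.dev/publishVersion: version1\n"
print(bump_publish_version(snippet))
```

After rewriting the file, run `kubectl apply -f ackalertrule-app.yaml` as usual to trigger redistribution.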
What's next
- Alert management — configure alert rules directly on individual clusters
- Configure alert rules by using CRDs — full reference for `AckAlertRule` fields and supported expressions
- Application distribution — distribute other Kubernetes resources across clusters using the same mechanism