Manage alert rules centrally on the Fleet instance and distribute them automatically to all associated clusters, so that every cluster uses the same rules without per-cluster manual configuration.
Prerequisites
Before you begin, make sure that:
- Fleet management is enabled.
- Two clusters are associated with the Fleet instance (one service provider cluster and one service consumer cluster).
- The latest version of the Alibaba Cloud CLI is installed and configured.
How it works
The Fleet instance acts as a centralized control layer for alert rules. You create an AckAlertRule custom resource on the Fleet instance, then create a distribution rule (powered by KubeVela) to push that resource to the clusters you select. Any new cluster that is associated with the Fleet instance can receive the same rules through this distribution mechanism.
Step 1: Create contacts and contact groups
Contacts and contact groups are shared across all ACK clusters within your Alibaba Cloud account.
1. Log on to the ACK console and click Clusters in the left-side navigation pane.
2. Click the name of one of your clusters. In the left-side pane, choose Operations > Alerts.
3. On the Alert Configuration page, click Start Installation. The console checks the prerequisites and automatically installs and upgrades the required components.
4. On the Alerts page, create a contact:
   - Click the Alert Contacts tab, then click Create.
   - In the Create Alert Contact panel, enter the Name, Phone Number, and Email, then click OK. The system sends an activation message or email to the contact. Activate the contact as instructed.
5. Create a contact group:
   - Click the Alert Contact Groups tab, then click Create.
   - In the Create Alert Contact Group panel, set the Group Name, select contacts in the Contacts section, then click OK. You can add or remove contacts in the Selected Contacts column.
Step 2: Obtain the contact group ID
Run the following command to query your contact groups:

```shell
aliyun cs GET /alert/contact_groups
```

Expected output:

```
{
  "contact_groups": [
    {
      "ali_uid": 14783****,
      "binding_info": "{\"sls_id\":\"ack_14783****_***\",\"cms_contact_group_name\":\"ack_Default Contact Group\",\"arms_id\":\"1****\"}",
      "contacts": null,
      "created": "2021-07-21T12:18:34+08:00",
      "group_contact_ids": [
        2***
      ],
      "group_name": "Default Contact Group",
      "id": 3***,
      "updated": "2022-09-19T19:23:57+08:00"
    }
  ],
  "page_info": {
    "page_number": 1,
    "page_size": 100,
    "total_count": 1
  }
}
```

Map the output fields to the contactGroups parameters that you will use in the alert rules:

```yaml
contactGroups:
- arms_contact_group_id: "1****"                     # contact_groups.binding_info.arms_id
  cms_contact_group_name: ack_Default Contact Group  # contact_groups.binding_info.cms_contact_group_name
  id: "3***"                                         # contact_groups.id
```

Step 3: Create alert rules
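The field mapping above can also be automated. The following Python sketch is a hypothetical helper (not part of the product tooling): it parses the JSON returned by `aliyun cs GET /alert/contact_groups` and prints the three values needed for the contactGroups block. Note that `binding_info` is itself a JSON-encoded string, so it must be decoded a second time. The sample below uses made-up placeholder values, not real IDs.

```python
import json

def extract_contact_group(cli_output: str) -> dict:
    """Extract the fields needed for an AckAlertRule contactGroups entry
    from the output of `aliyun cs GET /alert/contact_groups`."""
    data = json.loads(cli_output)
    group = data["contact_groups"][0]            # first contact group
    binding = json.loads(group["binding_info"])  # binding_info is a JSON string
    return {
        "arms_contact_group_id": binding["arms_id"],
        "cms_contact_group_name": binding["cms_contact_group_name"],
        "id": str(group["id"]),
    }

# Hypothetical sample output with placeholder values
sample = '''{
  "contact_groups": [{
    "ali_uid": 1478300000,
    "binding_info": "{\\"sls_id\\":\\"ack_14783_x\\",\\"cms_contact_group_name\\":\\"ack_Default Contact Group\\",\\"arms_id\\":\\"10000\\"}",
    "group_name": "Default Contact Group",
    "id": 3000
  }]
}'''
print(extract_contact_group(sample))
```

Feed the real CLI output into the function instead of the sample to obtain the values for your own account.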
The AckAlertRule CRD groups all supported alert rules in a single resource. The following constraints apply:
- The alert rule resource must be named default and reside in the kube-system namespace. For the full list of supported rules, see the "Configure alert rules using CRDs" section of the Alert Management topic.
Select the rule groups to enable
The template includes 11 rule groups. Enable only the groups that are relevant to your cluster configuration:
| Rule group | What it monitors | Enable when |
|---|---|---|
| error-events | Cluster error events (SLS-based) | Always recommended |
| warn-events | Cluster warning events (SLS-based) | You also want warning-level visibility (can be noisy) |
| cluster-core-error | Health of the API server, etcd, Scheduler, kube-controller-manager, cloud-controller-manager, CoreDNS, and Ingress | Core component monitoring is required |
| cluster-error | Node failures, GPU errors, image pull failures, node pool (NLC) errors | Node-level fault detection is required |
| res-exceptions | CPU, memory, disk, network, inode, and SLB utilization (default threshold: 85%) | Resource saturation alerts are required |
| cluster-scale | Cluster Autoscaler scale-up, scale-down, and timeout events | Autoscaling is enabled |
| workload-exceptions | Job failures, Deployment replica errors, DaemonSet scheduling errors | Workload health monitoring is required |
| pod-exceptions | Pod OOM, pod start failures, pod crash loops | Pod-level fault detection is required |
| cluster-storage-err | CSI disk errors, PersistentVolume (PV) failures | Persistent storage is used |
| cluster-network-err | SLB synchronization failures, route errors, Terway allocation errors, Ingress reload errors | Terway CNI or SLB-backed Services are used |
| security-err | High-risk findings from configuration audits | Security auditing is enabled |
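If you keep the full template in version control, flipping the right `enable` flags by hand is error-prone. The sketch below is an illustrative helper, not an official tool: it rewrites `enable: disable` to `enable: enable` for a chosen set of rule groups in the template text, relying only on the fact that each rule group in the template starts with a `- name: <group>` line.

```python
def enable_groups(template: str, groups: set[str]) -> str:
    """Flip `enable: disable` to `enable: enable` for the named rule groups.

    Relies on the template's line structure: each rule group begins with a
    `- name: <group>` line (rule-level `name:` entries carry no dash).
    """
    out, current_group = [], None
    for line in template.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name: "):
            current_group = stripped[len("- name: "):]
        if stripped in ("enable: disable", "- enable: disable") and current_group in groups:
            line = line.replace("enable: disable", "enable: enable")
        out.append(line)
    return "\n".join(out)
```

For example, `enable_groups(template_text, {"error-events", "pod-exceptions"})` returns the template with only those two groups switched on.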
Apply the alert rules
1. Set rules.enable to enable for each rule group that you want to activate. In the following example, error-events is enabled.
2. Add the contactGroups block from Step 2.
3. Save the file as ackalertrule.yaml and apply it:

```shell
kubectl apply -f ackalertrule.yaml
```
The following is a complete example with error-events enabled:
```yaml
apiVersion: alert.alibabacloud.com/v1beta1
kind: AckAlertRule
metadata:
  name: default
  namespace: kube-system
spec:
  groups:
  - name: error-events
    rules:
    - enable: enable
      contactGroups:
      - arms_contact_group_id: "1****"
        cms_contact_group_name: ack_Default Contact Group
        id: "3***"
      expression: sls.app.ack.error
      name: error-event
      notification:
        message: kubernetes cluster error event.
      type: event
  - name: warn-events
    rules:
    - enable: disable
      expression: sls.app.ack.warn
      name: warn-event
      notification:
        message: kubernetes cluster warn event.
      type: event
  - name: cluster-core-error
    rules:
    - enable: disable
      expression: prom.apiserver.notHealthy.down
      name: apiserver-unhealthy
      notification:
        message: "Cluster APIServer not healthy. \nPromQL: ((sum(up{job=\"apiserver\"}) <= 0) or (absent(sum(up{job=\"apiserver\"})))) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.etcd.notHealthy.down
      name: etcd-unhealthy
      notification:
        message: "Cluster ETCD not healthy. \nPromQL: ((sum(up{job=\"etcd\"}) <= 0) or (absent(sum(up{job=\"etcd\"})))) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.scheduler.notHealthy.down
      name: scheduler-unhealthy
      notification:
        message: "Cluster Scheduler not healthy. \nPromQL: ((sum(up{job=\"ack-scheduler\"}) <= 0) or (absent(sum(up{job=\"ack-scheduler\"})))) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.kcm.notHealthy.down
      name: kcm-unhealthy
      notification:
        message: "Cluster kube-controller-manager not healthy. \nPromQL: ((sum(up{job=\"ack-kube-controller-manager\"}) <= 0) or (absent(sum(up{job=\"ack-kube-controller-manager\"})))) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.ccm.notHealthy.down
      name: ccm-unhealthy
      notification:
        message: "Cluster cloud-controller-manager not healthy. \nPromQL: ((sum(up{job=\"ack-cloud-controller-manager\"}) <= 0) or (absent(sum(up{job=\"ack-cloud-controller-manager\"})))) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.coredns.notHealthy.requestdown
      name: coredns-unhealthy-requestdown
      notification:
        message: "Cluster CoreDNS not healthy, continuously request down. \nPromQL: (sum(rate(coredns_dns_request_count_total{}[1m]))by(server,zone)<=0) or (sum(rate(coredns_dns_requests_total{}[1m]))by(server,zone)<=0)"
      type: metric-prometheus
    - enable: disable
      expression: prom.coredns.notHealthy.panic
      name: coredns-unhealthy-panic
      notification:
        message: "Cluster CoreDNS not healthy, continuously panic. \nPromQL: sum(rate(coredns_panic_count_total{}[3m])) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.ingress.request.errorRateHigh
      name: ingress-err-request
      notification:
        message: Cluster Ingress Controller request error rate high (default error rate is 85%).
      type: metric-prometheus
    - enable: disable
      expression: prom.ingress.ssl.expire
      name: ingress-ssl-expire
      notification:
        message: "Cluster Ingress Controller SSL will expire in a few days (default 14 days). \nPromQL: ((nginx_ingress_controller_ssl_expire_time_seconds - time()) / 24 / 3600) < 14"
      type: metric-prometheus
  - name: cluster-error
    rules:
    - enable: disable
      expression: sls.app.ack.docker.hang
      name: docker-hang
      notification:
        message: kubernetes node docker hang.
      type: event
    - enable: disable
      expression: sls.app.ack.eviction
      name: eviction-event
      notification:
        message: kubernetes eviction event.
      type: event
    - enable: disable
      expression: sls.app.ack.gpu.xid_error
      name: gpu-xid-error
      notification:
        message: kubernetes gpu xid error event.
      type: event
    - enable: disable
      expression: sls.app.ack.image.pull_back_off
      name: image-pull-back-off
      notification:
        message: kubernetes image pull back off event.
      type: event
    - enable: disable
      expression: sls.app.ack.node.down
      name: node-down
      notification:
        message: kubernetes node down event.
      type: event
    - enable: disable
      expression: sls.app.ack.node.restart
      name: node-restart
      notification:
        message: kubernetes node restart event.
      type: event
    - enable: disable
      expression: sls.app.ack.ntp.down
      name: node-ntp-down
      notification:
        message: kubernetes node ntp down.
      type: event
    - enable: disable
      expression: sls.app.ack.node.pleg_error
      name: node-pleg-error
      notification:
        message: kubernetes node pleg error event.
      type: event
    - enable: disable
      expression: sls.app.ack.ps.hang
      name: ps-hang
      notification:
        message: kubernetes ps hang event.
      type: event
    - enable: disable
      expression: sls.app.ack.node.fd_pressure
      name: node-fd-pressure
      notification:
        message: kubernetes node fd pressure event.
      type: event
    - enable: disable
      expression: sls.app.ack.node.pid_pressure
      name: node-pid-pressure
      notification:
        message: kubernetes node pid pressure event.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.del_node_failed
      name: node-del-err
      notification:
        message: kubernetes delete node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.add_node_failed
      name: node-add-err
      notification:
        message: kubernetes add node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.run_command_fail
      name: nlc-run-cmd-err
      notification:
        message: kubernetes node pool nlc run command failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.empty_task_cmd
      name: nlc-empty-cmd
      notification:
        message: kubernetes node pool nlc delete node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.url_mode_unimpl
      name: nlc-url-m-unimp
      notification:
        message: kubernetes node pool nlc delete node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.op_not_found
      name: nlc-opt-no-found
      notification:
        message: kubernetes node pool nlc delete node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.destroy_node_fail
      name: nlc-des-node-err
      notification:
        message: kubernetes node pool nlc destroy node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.drain_node_fail
      name: nlc-drain-node-err
      notification:
        message: kubernetes node pool nlc drain node failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.restart_ecs_wait_fail
      name: nlc-restart-ecs-wait
      notification:
        message: kubernetes node pool nlc restart ecs wait timeout.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.restart_ecs_fail
      name: nlc-restart-ecs-err
      notification:
        message: kubernetes node pool nlc restart ecs failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.reset_ecs_fail
      name: nlc-reset-ecs-err
      notification:
        message: kubernetes node pool nlc reset ecs failed.
      type: event
    - enable: disable
      expression: sls.app.ack.nlc.repair_fail
      name: nlc-sel-repair-err
      notification:
        message: kubernetes node pool nlc self repair failed.
      type: event
  - name: res-exceptions
    rules:
    - enable: disable
      expression: cms.host.cpu.utilization
      name: node_cpu_util_high
      notification:
        message: kubernetes cluster node cpu utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.host.memory.utilization
      name: node_mem_util_high
      notification:
        message: kubernetes cluster node memory utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.host.disk.utilization
      name: node_disk_util_high
      notification:
        message: kubernetes cluster node disk utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.host.public.network.utilization
      name: node_public_net_util_high
      notification:
        message: kubernetes cluster node public network utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.host.fs.inode.utilization
      name: node_fs_inode_util_high
      notification:
        message: kubernetes cluster node file system inode utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.slb.qps.utilization
      name: slb_qps_util_high
      notification:
        message: kubernetes cluster slb qps utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.slb.traffic.tx.utilization
      name: slb_traff_tx_util_high
      notification:
        message: kubernetes cluster slb traffic utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.slb.max.connection.utilization
      name: slb_max_con_util_high
      notification:
        message: kubernetes cluster max connection utilization too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: percent
        value: "85"
      type: metric-cms
    - enable: disable
      expression: cms.slb.drop.connection
      name: slb_drop_con_high
      notification:
        message: kubernetes cluster drop connection count per second too high.
      thresholds:
      - key: CMS_ESCALATIONS_CRITICAL_Threshold
        unit: count
        value: "1"
      type: metric-cms
    - enable: disable
      expression: sls.app.ack.node.disk_pressure
      name: node-disk-pressure
      notification:
        message: kubernetes node disk pressure event.
      type: event
    - enable: disable
      expression: sls.app.ack.resource.insufficient
      name: node-res-insufficient
      notification:
        message: kubernetes node resource insufficient.
      type: event
    - enable: disable
      expression: sls.app.ack.ip.not_enough
      name: node-ip-pressure
      notification:
        message: kubernetes ip not enough event.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.no_enough_disk_space
      name: disk_space_press
      notification:
        message: kubernetes csi not enough disk space.
      type: event
  - name: cluster-scale
    rules:
    - enable: disable
      expression: sls.app.ack.autoscaler.scaleup_group
      name: autoscaler-scaleup
      notification:
        message: kubernetes autoscaler scale up.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.scaledown
      name: autoscaler-scaledown
      notification:
        message: kubernetes autoscaler scale down.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.scaleup_timeout
      name: autoscaler-scaleup-timeout
      notification:
        message: kubernetes autoscaler scale up timeout.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.scaledown_empty
      name: autoscaler-scaledown-empty
      notification:
        message: kubernetes autoscaler scale down empty node.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.scaleup_group_failed
      name: autoscaler-up-group-failed
      notification:
        message: kubernetes autoscaler scale up failed.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.cluster_unhealthy
      name: autoscaler-cluster-unhealthy
      notification:
        message: kubernetes autoscaler error, cluster not healthy.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.delete_started_timeout
      name: autoscaler-del-started
      notification:
        message: kubernetes autoscaler delete node started long ago.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.delete_unregistered
      name: autoscaler-del-unregistered
      notification:
        message: kubernetes autoscaler delete unregistered node.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.scaledown_failed
      name: autoscaler-scale-down-failed
      notification:
        message: kubernetes autoscaler scale down failed.
      type: event
    - enable: disable
      expression: sls.app.ack.autoscaler.instance_expired
      name: autoscaler-instance-expired
      notification:
        message: kubernetes autoscaler scale down instance expired.
      type: event
  - name: workload-exceptions
    rules:
    - enable: disable
      expression: prom.job.failed
      name: job-failed
      notification:
        message: "Cluster Job failed. \nPromQL: kube_job_status_failed{job=\"_kube-state-metrics\"} > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.deployment.replicaError
      name: deployment-rep-err
      notification:
        message: "Cluster Deployment replication status error. \nPromQL: kube_deployment_spec_replicas{job=\"_kube-state-metrics\"} != kube_deployment_status_replicas_available{job=\"_kube-state-metrics\"}"
      type: metric-prometheus
    - enable: disable
      expression: prom.daemonset.scheduledError
      name: daemonset-status-err
      notification:
        message: "Cluster Daemonset pod status or scheduled error. \nPromQL: ((100 - kube_daemonset_status_number_ready{} / kube_daemonset_status_desired_number_scheduled{} * 100) or (kube_daemonset_status_desired_number_scheduled{} - kube_daemonset_status_current_number_scheduled{})) > 0"
      type: metric-prometheus
    - enable: disable
      expression: prom.daemonset.misscheduled
      name: daemonset-misscheduled
      notification:
        message: "Cluster Daemonset misscheduled. \nPromQL: kube_daemonset_status_number_misscheduled{job=\"_kube-state-metrics\"} > 0"
      type: metric-prometheus
  - name: pod-exceptions
    rules:
    - enable: disable
      expression: sls.app.ack.pod.oom
      name: pod-oom
      notification:
        message: kubernetes pod oom event.
      type: event
    - enable: disable
      expression: sls.app.ack.pod.failed
      name: pod-failed
      notification:
        message: kubernetes pod start failed event.
      type: event
    - enable: disable
      expression: prom.pod.status.notHealthy
      name: pod-status-err
      notification:
        message: 'Pod status exception. \nPromQL: min_over_time(sum by (namespace, pod, phase) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed", job="_kube-state-metrics"})[${mins}m:1m]) > 0'
      type: metric-prometheus
    - enable: disable
      expression: prom.pod.status.crashLooping
      name: pod-crashloop
      notification:
        message: 'Pod status exception. \nPromQL: sum_over_time(increase(kube_pod_container_status_restarts_total{job="_kube-state-metrics"}[1m])[${mins}m:1m]) > 3'
      type: metric-prometheus
  - name: cluster-storage-err
    rules:
    - enable: disable
      expression: sls.app.ack.csi.invalid_disk_size
      name: csi_invalid_size
      notification:
        message: kubernetes csi invalid disk size.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.disk_not_portable
      name: csi_not_portable
      notification:
        message: kubernetes csi not portable.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.deivce_busy
      name: csi_device_busy
      notification:
        message: kubernetes csi disk device busy.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.no_ava_disk
      name: csi_no_ava_disk
      notification:
        message: kubernetes csi no available disk.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.disk_iohang
      name: csi_disk_iohang
      notification:
        message: kubernetes csi ioHang.
      type: event
    - enable: disable
      expression: sls.app.ack.csi.latency_too_high
      name: csi_latency_high
      notification:
        message: kubernetes csi pvc latency load too high.
      type: event
    - enable: disable
      expression: prom.pv.failed
      name: pv-failed
      notification:
        message: 'Cluster PersistentVolume failed. \nPromQL: kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="_kube-state-metrics"} > 0'
      type: metric-prometheus
  - name: cluster-network-err
    rules:
    - enable: disable
      expression: sls.app.ack.ccm.no_ava_slb
      name: slb-no-ava
      notification:
        message: kubernetes slb not available.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.sync_slb_failed
      name: slb-sync-err
      notification:
        message: kubernetes slb sync failed.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.del_slb_failed
      name: slb-del-err
      notification:
        message: kubernetes slb delete failed.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.create_route_failed
      name: route-create-err
      notification:
        message: kubernetes create route failed.
      type: event
    - enable: disable
      expression: sls.app.ack.ccm.sync_route_failed
      name: route-sync-err
      notification:
        message: kubernetes sync route failed.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.invalid_resource
      name: terway-invalid-res
      notification:
        message: kubernetes terway have invalid resource.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.alloc_ip_fail
      name: terway-alloc-ip-err
      notification:
        message: kubernetes terway allocate ip error.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.parse_fail
      name: terway-parse-err
      notification:
        message: kubernetes terway parse k8s.aliyun.com/ingress-bandwidth annotation error.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.allocate_failure
      name: terway-alloc-res-err
      notification:
        message: kubernetes parse resource error.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.dispose_failure
      name: terway-dispose-err
      notification:
        message: kubernetes dispose resource error.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.virtual_mode_change
      name: terway-virt-mod-err
      notification:
        message: kubernetes virtual mode changed.
      type: event
    - enable: disable
      expression: sls.app.ack.terway.config_check
      name: terway-ip-check
      notification:
        message: kubernetes terway execute pod ip config check.
      type: event
    - enable: disable
      expression: sls.app.ack.ingress.err_reload_nginx
      name: ingress-reload-err
      notification:
        message: kubernetes ingress reload config error.
      type: event
  - name: security-err
    rules:
    - enable: disable
      expression: sls.app.ack.si.config_audit_high_risk
      name: si-c-a-risk
      notification:
        message: kubernetes high risks have been found after running config audit.
      type: event
  ruleVersion: v1.0.9
```

The alert rules are now created on the Fleet instance, but they do not take effect in any cluster until you create a distribution rule in Step 4.
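Because the resource must be named default and live in kube-system (see the constraints above), a quick local sanity check before you apply the manifest can catch mistakes early. The following Python sketch is illustrative only: it uses plain substring checks rather than a YAML parser, so it assumes the manifest layout shown in the example above.

```python
def validate_ackalertrule(manifest: str) -> list[str]:
    """Return a list of problems found in an AckAlertRule manifest.

    String-based checks only; assumes the layout of the example manifest.
    """
    problems = []
    if "kind: AckAlertRule" not in manifest:
        problems.append("kind must be AckAlertRule")
    if "name: default" not in manifest:
        problems.append("metadata.name must be 'default'")
    if "namespace: kube-system" not in manifest:
        problems.append("metadata.namespace must be 'kube-system'")
    if "enable: enable" not in manifest:
        problems.append("no rule group is enabled")
    return problems
```

Run it on the contents of ackalertrule.yaml and apply the file only when the returned list is empty.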
Step 4: Distribute the alert rules to clusters
Distribution rules use KubeVela to push Kubernetes resources from the Fleet instance to associated clusters. For more information about application distribution, see Application distribution.
Choose a distribution method based on how you want to target clusters:

| Method | Use when |
|---|---|
| By label | You want to target a dynamic set of clusters (for example, all production clusters) |
| By cluster ID | You want to target a fixed set of specific clusters |
Method 1: Distribute by label
1. Query the IDs of the associated clusters and add a label to the clusters that you want to target:

```shell
kubectl get managedclusters
kubectl label managedclusters <clusterid> production=true
```

2. Create ackalertrule-app.yaml with the following content:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: alertrules
  namespace: kube-system
  annotations:
    app.oam.dev/publishVersion: version1
spec:
  components:
  - name: alertrules
    type: ref-objects
    properties:
      objects:
      - resource: ackalertrules
        name: default
  policies:
  - type: topology
    name: prod-clusters
    properties:
      clusterSelector:
        production: "true" # Selects clusters with this label
```
Method 2: Distribute by cluster ID
Create ackalertrule-app.yaml with the target cluster IDs:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: alertrules
  namespace: kube-system
  annotations:
    app.oam.dev/publishVersion: version1
spec:
  components:
  - name: alertrules
    type: ref-objects
    properties:
      objects:
      - resource: ackalertrules
        name: default
  policies:
  - type: topology
    name: prod-clusters
    properties:
      clusters: ["<clusterid1>", "<clusterid2>"] # Replace with the actual cluster IDs
```

Apply and verify the distribution rule
1. Apply the distribution rule:

```shell
kubectl apply -f ackalertrule-app.yaml
```

2. Check the distribution status:

```shell
kubectl amc appstatus alertrules -n kube-system --tree --detail
```

If the distribution succeeds, the output shows updated for each cluster:

```
CLUSTER                  NAMESPACE      RESOURCE               STATUS    APPLY_TIME             DETAIL
c565e4****  (cluster1)─── kube-system─── AckAlertRule/default  updated   2022-**-** **:**:**    Age: **
cbaa12****  (cluster2)─── kube-system─── AckAlertRule/default  updated   2022-**-** **:**:**    Age: **
```

If a cluster shows a status other than updated, verify that the cluster is still associated with the Fleet instance and that the AckAlertRule resource was successfully created on the Fleet instance. For details about alert management, see Alert Management.
Update the alert rules
To modify alert rules after they have been distributed:
1. Edit ackalertrule.yaml and apply the changes:

```shell
kubectl apply -f ackalertrule.yaml
```

2. Increment the value of the app.oam.dev/publishVersion annotation in ackalertrule-app.yaml (for example, change version1 to version2), then apply it:

```shell
kubectl apply -f ackalertrule-app.yaml
```

Updating the annotation triggers KubeVela to redistribute the modified rules to all target clusters.
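The version bump above can be scripted. This Python sketch is illustrative and assumes the `version<N>` naming used in the examples: it increments the publishVersion annotation in the Application manifest text.

```python
import re

def bump_publish_version(manifest: str) -> str:
    """Increment app.oam.dev/publishVersion, e.g. version1 -> version2."""
    def repl(m: re.Match) -> str:
        return m.group(1) + "version" + str(int(m.group(2)) + 1)
    return re.sub(r"(app\.oam\.dev/publishVersion:\s*)version(\d+)", repl, manifest)
```

Read ackalertrule-app.yaml, pass its contents through this function, write it back, and then run kubectl apply as shown above.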
Next steps
- Alert Management — configure alert rules directly on individual clusters.
- Configure alert rules using CRDs — complete reference for AckAlertRule fields and supported expressions.
- Application distribution — distribute other Kubernetes resources across clusters using the same mechanism.