Usage notes
When you use this feature, ensure that the self-managed Prometheus instance can access the API server of the ACK managed Pro cluster and has the /metrics
read permissions.
The self-managed Prometheus instance can be deployed inside or outside the ACK managed Pro cluster.
ACK managed Pro clusters allow you to sink monitoring metrics of key control plane components to external systems. These control plane components include kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager. Before you use this feature, we recommend that you review the following topics to learn about the metrics and their descriptions:
You can also use Managed Service for Prometheus in a cluster. Managed Service for Prometheus monitors and automatically collects data, and provides real-time Grafana dashboards. You can also create alerts for monitoring jobs and receive alerts in real time through emails, text messages, and DingTalk.
Configure the collection file for Managed Service for Prometheus
To use a self-managed Prometheus instance to collect metrics of key control plane components in a cluster, add metric collection Jobs to the configuration file prometheus.yaml of the Prometheus instance. In the configuration file, each Job corresponds to a key component. For more information, refer to the metrics supported by each component.
For more information about how to configure prometheus.yaml for open source Prometheus, see Configuration.
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
# Attach these labels to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
external_labels:
monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: ack-api-server
......
- job_name: ack-etcd
......
- job_name: ack-scheduler
......
For more information about open source Prometheus Operator and the ack-prometheus-operator component provided by the marketplace of ACK, see Open source Prometheus monitoring. For more information about how to configure custom metric collection configurations, see Prometheus Operator.
Configure alert rules in Managed Service for Prometheus
For more information about how to configure alerts for open source Prometheus, see Alerting_rules.
Monitoring inside the cluster
If Managed Service for Prometheus is deployed in a cluster, you can monitor and collect data from the key components of the cluster. For more information, see the following topics.
kube-apiserver
For more information about the metrics collected by the kube-apiserver component, see Metrics of kube-apiserver.
For clusters created after February 2023 with version 1.20 or later, the service access path for the kubernetes service in the default namespace has been upgraded from Classic Load Balancer (CLB) forwarding to Elastic Network Interface (ENI) direct connection architecture. For more information, see Kube API Server. After this change, all kube-apiserver replicas are visible to the data plane. You can configure monitoring collection tasks to directly collect kube-apiserver metrics, making the collection path more direct and the metric coverage more comprehensive.
You can run the command kubectl get endpoints kubernetes
to determine the backend link type of the kubernetes Service in your cluster.
Expand to view expected output
ENI direct connection architecture: The expected output shows 2 or more IP addresses (such as a.b.c.d:6443,w.x.y.z:6443
).
NAME ENDPOINTS AGE
kubernetes a.b.c.d:6443,w.x.y.z:6443 27h
CLB forwarding architecture: The expected output shows only 1 IP address (such as a.b.c.d:6443
), which is the internal IP address of the CLB.
NAME ENDPOINTS AGE
kubernetes a.b.c.d:6443 27h
Choose the Prometheus collection configuration and alert rules based on the backend link type of the Kubernetes service in your cluster.
etcd
For more information about the metrics collected by the etcd component, see Metrics of etcd.
Prometheus collection configuration
- job_name: ack-etcd
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["etcd"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus alert rules
- alert: AckETCDWarning
annotations:
message: Etcd cluster has no leader in last 5 minutes, please check whether the cluster is overloaded and contact ACK team.
expr: |
sum_over_time(etcd_server_has_leader[5m]) == 0
for: 5m
labels:
severity: critical
- alert: AckETCDWarning
annotations:
message: Etcd is not available in last 5 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-etcd",pod!=""}) or (count(up{job="ack-etcd",pod!=""}) <= 2)) == 1
for: 5m
labels:
severity: critical
kube-scheduler
For more information about the metrics collected by the kube-scheduler component, see Metrics of kube-scheduler.
Prometheus Collection Configuration
- job_name: ack-scheduler
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-scheduler"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus Alert Rules
- alert: AckSchedulerWarning
annotations:
message: Scheduler is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
for: 3m
labels:
severity: critical
kube-controller-manager
For more information about the metrics collected by the kube-controller-manager component, see Metrics of kube-controller-manager.
Prometheus Collection Configuration
- job_name: ack-kcm
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-kube-controller-manager"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus Alert Rules
- alert: AckKCMWarning
annotations:
message: KCM is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-kcm",pod!=""})or(count(up{job="ack-kcm",pod!=""})<=0))>=1
for: 3m
labels:
severity: critical
cloud-controller-manager
For more information about the metrics collected by the cloud-controller-manager component, see Metrics of cloud-controller-manager.
Prometheus Collection Configuration
- job_name: ack-cloud-controller-manager
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-cloud-controller-manager"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config: {ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, server_name: kubernetes,
insecure_skip_verify: false}
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus Alert Rules
- alert: AckCCMWarning
annotations:
message: CCM is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-cloud-controller-manager",pod!=""}) or (count(up{job="ack-cloud-controller-manager",pod!=""}) <= 0)) == 1
for: 3m
labels:
severity: critical
Monitoring outside the cluster
If your Prometheus instance is deployed outside the cluster to be monitored, see Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster to complete the monitoring and data collection of the key components in the cluster. The following section displays the parameters.
- job_name: 'out-of-k8s-scrape-job'
scheme: https
tls_config:
ca_file: /etc/prometheus/kubernetes-ca.crt
bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
kubernetes_sd_configs:
- api_server: 'https://<KUBERNETES URL>'
role: node
tls_config:
ca_file: /etc/prometheus/kubernetes-ca.crt
bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
Verify results
Log on to the console of the self-managed Prometheus system and go to the Graph page.
Enter up to check whether all data from the control plane components is displayed as expected.
up
Expected output:

up{instance="XX.XX.XX.XX:6443", job="ack-api-server"}
: Agent Endpoint status. XX.XX.XX.XX
is the IP address of the kubernetes Service in the default namespace of the cluster. The IP address varies by cluster.
up{instance="controlplane-xyz", job="ack-api-server", pod="controlplane-xyz"}
: Status of the control plane pod. The up
metric can be used to detect activity on pods that run control plane components.
Enter the following metric and check whether the metric value is displayed as normal:
apiserver_request_total{job="ack-api-server"}
Expected output:

If the metric and value are displayed as normal, the self-managed Prometheus instance can collect the metrics of the key components as expected.