Prerequisites
Your self-managed Prometheus system must be able to access the ACK managed cluster Pro API Server and have read permissions on /metrics.
You can deploy your self-managed Prometheus system in or outside the cluster.
ACK managed cluster Pro exposes metrics for managed components, such as kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager. Before you use this feature, we recommend that you review the following documents to learn about the metrics exposed by these components and their descriptions:
Alternatively, you can use Alibaba Cloud Prometheus monitoring in your cluster. Alibaba Cloud Prometheus automatically monitors and collects data, provides real-time Grafana dashboards, and lets you create alerts for monitoring jobs. You can receive alerts in real time through channels such as email, text messages, and DingTalk.
Configure the Prometheus collection file
Before you can use a self-managed Prometheus system to collect metrics from control plane components, you must configure the corresponding metric collection jobs in the Prometheus prometheus.yaml configuration file. In the sample configuration file, each core component corresponds to a job configuration. For more information about the configurations, see the metric description document for the corresponding core component.
For more information about how to configure prometheus.yaml for community Prometheus, see Configuration.
global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
# Attach these labels to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
external_labels:
monitor: 'codelab-monitor'
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: ack-api-server
......
- job_name: ack-etcd
......
- job_name: ack-scheduler
......
For more information about the community Prometheus Operator solution and the ack-prometheus-operator component in the ACK marketplace, see Open source Prometheus monitoring. For instructions on how to configure custom data collection, see the Prometheus Operator documentation from the official Prometheus Operator community.
Configure Prometheus alert rules
For more information about how to configure alerts for open source Prometheus, see Alerting_rules.
Intra-cluster monitoring
If your Prometheus system is deployed inside the cluster that you want to monitor, refer to the following sections to configure monitoring and data collection for the core components of the cluster.
Kube-apiserver
For a list of metrics that can be collected, see Metrics of the kube-apiserver component.
For clusters that are version 1.20 or later and were created in or after February 2023, the access path for the kubernetes service in the default namespace is upgraded from Classic Load Balancer (CLB) forwarding to a direct connection architecture that uses elastic network interfaces (ENIs). For more information, see Kube API Server. After this change, all kube-apiserver replicas are visible to the data plane. You can configure a monitoring job to directly collect kube-apiserver metrics. This provides a more direct collection link and more comprehensive metric coverage.
You can run the kubectl get endpoints kubernetes command to determine the backend link type of the kubernetes service in the cluster.
Click to view the expected output
ENI direct connection architecture: The output displays two or more IP addresses, such as a.b.c.d:6443,w.x.y.z:6443.
NAME ENDPOINTS AGE
kubernetes a.b.c.d:6443,w.x.y.z:6443 27h
CLB forwarding architecture: The output displays only one IP address, such as a.b.c.d:6443. This IP address is the internal IP address of the CLB instance.
NAME ENDPOINTS AGE
kubernetes a.b.c.d:6443 27h
Select the Prometheus collection configuration and alert rule based on the backend link type of the Kubernetes service in the cluster.
Etcd
For a list of metrics that can be collected, see Metrics of the etcd component.
Prometheus collection configuration
- job_name: ack-etcd
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["etcd"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus alert rule
- alert: AckETCDWarning
annotations:
message: Etcd cluster has no leader in last 5 minutes, please check whether the cluster is overloaded and contact ACK team.
expr: |
sum_over_time(etcd_server_has_leader[5m]) == 0
for: 5m
labels:
severity: critical
- alert: AckETCDWarning
annotations:
message: Etcd is not available in last 5 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-etcd",pod!=""}) or (count(up{job="ack-etcd",pod!=""}) <= 2)) == 1
for: 5m
labels:
severity: critical
Kube-scheduler
For a list of metrics that can be collected, see Metrics of the kube-scheduler component.
Prometheus collection configuration
- job_name: ack-scheduler
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-scheduler"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus alert rule
- alert: AckSchedulerWarning
annotations:
message: Scheduler is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
for: 3m
labels:
severity: critical
Kube-controller-manager
For a list of metrics that can be collected, see Metrics of the kube-controller-manager component.
Prometheus collection configuration
- job_name: ack-kcm
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-kube-controller-manager"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
insecure_skip_verify: false
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server_name: kubernetes
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus alert rule
- alert: AckKCMWarning
annotations:
message: KCM is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-kcm",pod!=""})or(count(up{job="ack-kcm",pod!=""})<=0))>=1
for: 3m
labels:
severity: critical
Cloud-controller-manager
For a list of metrics that can be collected, see Metrics of the cloud-controller-manager component.
Prometheus collection configuration
- job_name: ack-cloud-controller-manager
scrape_interval: 30s
scrape_timeout: 30s
metrics_path: /metrics
scheme: https
# scheme: https
honor_labels: true
honor_timestamps: true
params:
hosting: ["true"]
job: ["ack-cloud-controller-manager"]
kubernetes_sd_configs:
- role: endpoints
namespaces:
names: [default]
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config: {ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, server_name: kubernetes,
insecure_skip_verify: false}
relabel_configs:
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: apiserver
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_label_provider]
separator: ;
regex: kubernetes
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
separator: ;
regex: https
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Node;(.*)
target_label: node
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
separator: ;
regex: Pod;(.*)
target_label: pod
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: job
replacement: ${1}
action: replace
- source_labels: [__meta_kubernetes_service_label_component]
separator: ;
regex: (.+)
target_label: job
replacement: ${1}
action: replace
- separator: ;
regex: (.*)
target_label: endpoint
replacement: https
action: replace
Prometheus alert rule
- alert: AckCCMWarning
annotations:
message: CCM is not available in last 3 minutes. Please check the prometheus job and target status.
expr: |
(absent(up{job="ack-cloud-controller-manager",pod!=""}) or (count(up{job="ack-cloud-controller-manager",pod!=""}) <= 0)) == 1
for: 3m
labels:
severity: critical
External cluster monitoring
If your Prometheus system is deployed outside the cluster that you want to monitor, refer to Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster to configure monitoring and data collection for the core components of the cluster. The main configurations are as follows:
- job_name: 'out-of-k8s-scrape-job'
scheme: https
tls_config:
ca_file: /etc/prometheus/kubernetes-ca.crt
bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
kubernetes_sd_configs:
- api_server: 'https://<KUBERNETES URL>'
role: node
tls_config:
ca_file: /etc/prometheus/kubernetes-ca.crt
bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
Verify the results
Log in to the console of your self-managed Prometheus system and go to the Graph page.
Enter up and check whether the data for all control plane components is displayed as expected.
up
Expected output:

up{instance="XX.XX.XX.XX:6443", job="ack-api-server"}: The status of the proxy Endpoint. XX.XX.XX.XX is the IP address of the kubernetes service in the default namespace of the cluster. This IP address varies based on the cluster.
up{instance="controlplane-xyz", job="ack-api-server", pod="controlplane-xyz"}: The status of the control plane pod. This up metric can be used for liveness probes for the control plane pod.
Enter the following metric and check whether it is displayed as expected.
apiserver_request_total{job="ack-api-server"}
Expected output:

If the queried metric and data are displayed on the page, your self-managed Prometheus system can collect the metrics of core components.