Container Service for Kubernetes: Use a self-managed Prometheus instance to collect metrics of control plane components and configure alerts

Last Updated: Jul 16, 2025

ACK managed Pro clusters provide monitoring metrics of control plane components and a dashboard. This topic describes how to collect metrics of key components in a self-managed Prometheus instance and configure monitoring alerts based on these metrics to integrate with your self-managed monitoring system.

Usage notes

  • Ensure that the self-managed Prometheus instance can access the API server of the ACK managed Pro cluster and has read permissions on the /metrics endpoint.

  • The self-managed Prometheus instance can be deployed inside or outside the ACK managed Pro cluster.

  • ACK managed Pro clusters allow you to export monitoring metrics of key control plane components to external systems. These control plane components include kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager. Before you use this feature, we recommend that you review the metrics topic for each component, referenced in the component sections of this topic, to learn about the metrics and their descriptions.

    You can also use Managed Service for Prometheus in a cluster. Managed Service for Prometheus monitors and automatically collects data, and provides real-time Grafana dashboards. You can also create alerts for monitoring jobs and receive alerts in real time through emails, text messages, and DingTalk.
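
The read permission on /metrics mentioned in the usage notes can be granted through Kubernetes RBAC. The following is a minimal sketch, not an official ACK manifest. It assumes Prometheus authenticates with a ServiceAccount token; the namespace monitoring, the ServiceAccount prometheus, and the role name prometheus-control-plane-reader are hypothetical placeholders.

```yaml
# Hypothetical names throughout: the "monitoring" namespace, the "prometheus"
# ServiceAccount, and "prometheus-control-plane-reader" are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-control-plane-reader
rules:
  # Allows HTTP GET on the API server's /metrics path (a non-resource URL).
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
  # Allows endpoints-based service discovery (kubernetes_sd_configs with role: endpoints).
  - apiGroups: [""]
    resources: ["endpoints", "services", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-control-plane-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-control-plane-reader
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

If Prometheus runs outside the cluster, a token issued for this ServiceAccount would be supplied to Prometheus instead of the token mounted in the pod.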

Configure the collection file for the self-managed Prometheus instance

To use a self-managed Prometheus instance to collect metrics of key control plane components in a cluster, add metric collection Jobs to the configuration file prometheus.yaml of the Prometheus instance. In the configuration file, each Job corresponds to a key component. For more information, refer to the metrics supported by each component.

For more information about how to configure prometheus.yaml for open source Prometheus, see Configuration.
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Scrape configurations: one job per control plane component.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: ack-api-server
    ......

  - job_name: ack-etcd
    ......

  - job_name: ack-scheduler
    ......

            

For more information about open source Prometheus Operator and the ack-prometheus-operator component provided by the marketplace of ACK, see Open source Prometheus monitoring. For more information about how to configure custom metric collection configurations, see Prometheus Operator.
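
With Prometheus Operator, raw scrape jobs such as the ones in this topic are usually not written into prometheus.yaml directly. One common approach, sketched here under the assumption that your Operator version supports the upstream additionalScrapeConfigs field of the Prometheus custom resource, is to store the jobs in a Secret and reference that Secret. The Secret name, key, namespace, and the Prometheus resource name k8s are hypothetical.

```yaml
# Hypothetical names: the Secret "ack-control-plane-scrape", its key, and the namespace.
apiVersion: v1
kind: Secret
metadata:
  name: ack-control-plane-scrape
  namespace: monitoring
stringData:
  ack-scrape-configs.yaml: |
    # Paste the job definitions from this topic here, as a YAML list.
    - job_name: ack-api-server
      metrics_path: /metrics
      scheme: https
      # Remaining fields as in the kube-apiserver section of this topic.
---
# Fragment of an existing Prometheus custom resource, not a complete manifest.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: ack-control-plane-scrape   # Secret that holds the scrape jobs
    key: ack-scrape-configs.yaml     # key inside that Secret
```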

Configure alert rules in the self-managed Prometheus instance

For more information about how to configure alerts for open source Prometheus, see Alerting_rules.
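
The alert rules in this topic are shown as individual list entries. Open source Prometheus loads rules from rule files, in which each rule belongs to a named group. The following sketch wraps the kube-scheduler rule from this topic in such a group. The file name, path, and group name are hypothetical.

```yaml
# ack-control-plane-rules.yaml (hypothetical file name)
# Reference it from prometheus.yaml with:
#   rule_files:
#     - /etc/prometheus/rules/ack-control-plane-rules.yaml   # hypothetical path
groups:
  - name: ack-control-plane   # hypothetical group name
    rules:
      - alert: AckSchedulerWarning
        annotations:
          message: Scheduler is not available in last 3 minutes. Please check the prometheus job and target status.
        expr: |
          (absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
        for: 3m
        labels:
          severity: critical
```

The other rules in this topic can be appended to the same rules list.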

Monitoring inside the cluster

If the self-managed Prometheus instance is deployed in the cluster, you can collect metrics from the key components of the cluster and monitor them. For more information, see the following sections.

kube-apiserver

For more information about the metrics collected by the kube-apiserver component, see Metrics of kube-apiserver.

For clusters created after February 2023 with version 1.20 or later, the service access path for the kubernetes service in the default namespace has been upgraded from Classic Load Balancer (CLB) forwarding to Elastic Network Interface (ENI) direct connection architecture. For more information, see Kube API Server. After this change, all kube-apiserver replicas are visible to the data plane. You can configure monitoring collection tasks to directly collect kube-apiserver metrics, making the collection path more direct and the metric coverage more comprehensive.

You can run the command kubectl get endpoints kubernetes to determine the backend link type of the kubernetes Service in your cluster.

Expected output:

  • ENI direct connection architecture: The expected output shows 2 or more IP addresses (such as a.b.c.d:6443,w.x.y.z:6443).

    NAME         ENDPOINTS                               AGE
    kubernetes   a.b.c.d:6443,w.x.y.z:6443               27h
  • CLB forwarding architecture: The expected output shows only 1 IP address (such as a.b.c.d:6443), which is the internal IP address of the CLB.

    NAME         ENDPOINTS                               AGE
    kubernetes   a.b.c.d:6443                            27h

Choose the Prometheus collection configuration and alert rules based on the backend link type of the kubernetes Service in your cluster.

  • Prometheus collection configuration

    • ENI direct connection architecture

      - job_name: ack-api-server  
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /metrics
        scheme: https
        #  scheme: https
        honor_labels: true
        honor_timestamps: true
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names: [default]
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          insecure_skip_verify: false
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          server_name: kubernetes
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: apiserver
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_service_label_provider]
          separator: ;
          regex: kubernetes
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: https
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Node;(.*)
          target_label: node
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Pod;(.*)
          target_label: pod
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: service
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: job
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: (.+)
          target_label: job
          replacement: ${1}
          action: replace
        - separator: ;
          regex: (.*)
          target_label: endpoint
          replacement: https
          action: replace
    • CLB forwarding architecture

      - job_name: ack-api-server  
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /metrics
        scheme: https
        #  scheme: https
        honor_labels: true
        honor_timestamps: true
        params:
          hosting: ["true"]
          job: ["apiserver"]
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names: [default]
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          insecure_skip_verify: false
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          server_name: kubernetes
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: apiserver
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_service_label_provider]
          separator: ;
          regex: kubernetes
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: https
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Node;(.*)
          target_label: node
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Pod;(.*)
          target_label: pod
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: service
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: job
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: (.+)
          target_label: job
          replacement: ${1}
          action: replace
        - separator: ;
          regex: (.*)
          target_label: endpoint
          replacement: https
          action: replace
  • Prometheus alert rules

    - alert: AckApiServerWarning
      annotations:
        message:  APIServer is not available in last 5 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-api-server",pod!=""}) or (count(up{job="ack-api-server",pod!=""}) <= 1)) == 1
      for: 5m
      labels:
        severity: critical
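
The rule above only checks whether the scrape targets are up. Alerts can also be built on the collected request metrics. The following is an illustrative sketch, not a rule recommended by ACK: the alert name AckApiServerHighErrorRate, the 5% threshold, and the 10-minute duration are assumptions to adjust for your environment. It relies on the code label of apiserver_request_total.

```yaml
- alert: AckApiServerHighErrorRate   # hypothetical alert name
  annotations:
    message: More than 5% of API server requests returned 5xx responses over the last 5 minutes.
  expr: |
    sum(rate(apiserver_request_total{job="ack-api-server",code=~"5.."}[5m]))
      /
    sum(rate(apiserver_request_total{job="ack-api-server"}[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
```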

etcd

For more information about the metrics collected by the etcd component, see Metrics of etcd.
  • Prometheus collection configuration

    - job_name: ack-etcd 
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["etcd"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rules

    - alert: AckETCDWarning
      annotations:
        message: Etcd cluster has no leader in last 5 minutes, please check whether the cluster is overloaded and contact ACK team.
      expr: |
        sum_over_time(etcd_server_has_leader[5m]) == 0
      for: 5m
      labels:
        severity: critical
    
    
    - alert: AckETCDWarning
      annotations:
        message: Etcd is not available in last 5 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-etcd",pod!=""}) or (count(up{job="ack-etcd",pod!=""}) <= 2)) == 1
      for: 5m
      labels:
        severity: critical

kube-scheduler

For more information about the metrics collected by the kube-scheduler component, see Metrics of kube-scheduler.
  • Prometheus collection configuration

    - job_name: ack-scheduler
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-scheduler"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rules

    - alert: AckSchedulerWarning
      annotations:
        message: Scheduler is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
      for: 3m
      labels:
        severity: critical

kube-controller-manager

For more information about the metrics collected by the kube-controller-manager component, see Metrics of kube-controller-manager.
  • Prometheus collection configuration

    - job_name: ack-kcm
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-kube-controller-manager"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rules

    - alert: AckKCMWarning
      annotations:
        message: KCM is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-kcm",pod!=""}) or (count(up{job="ack-kcm",pod!=""}) <= 0)) >= 1
      for: 3m
      labels:
        severity: critical

cloud-controller-manager

For more information about the metrics collected by the cloud-controller-manager component, see Metrics of cloud-controller-manager.
  • Prometheus collection configuration

    - job_name: ack-cloud-controller-manager
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-cloud-controller-manager"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rules

    - alert: AckCCMWarning
      annotations:
        message: CCM is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-cloud-controller-manager",pod!=""}) or (count(up{job="ack-cloud-controller-manager",pod!=""}) <= 0)) == 1
      for: 3m
      labels:
        severity: critical

Monitoring outside the cluster

If your Prometheus instance is deployed outside the cluster to be monitored, see Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster for how to collect metrics from the key components of the cluster. The following example shows the key parameters.

  - job_name: 'out-of-k8s-scrape-job'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/kubernetes-ca.crt
    bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'

    kubernetes_sd_configs:
      - api_server: 'https://<KUBERNETES URL>'
        role: node
        tls_config:
          ca_file: /etc/prometheus/kubernetes-ca.crt
        bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
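
Placing the token inline stores a credential in the configuration file itself. As a hedged variant of the example above, the token can instead be read from a file through the authorization block, the same mechanism the in-cluster jobs in this topic use. The token path /etc/prometheus/kubernetes-token is a hypothetical location; the other values are unchanged from the example above.

```yaml
- job_name: 'out-of-k8s-scrape-job'
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/kubernetes-ca.crt
  authorization:
    # Hypothetical path to a file that contains the ServiceAccount token.
    credentials_file: /etc/prometheus/kubernetes-token

  kubernetes_sd_configs:
    - api_server: 'https://<KUBERNETES URL>'
      role: node
      tls_config:
        ca_file: /etc/prometheus/kubernetes-ca.crt
      authorization:
        credentials_file: /etc/prometheus/kubernetes-token
```

Prometheus re-reads the credentials file, so a rotated token can be picked up without editing the configuration.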
            

Verify results

  1. Log on to the console of the self-managed Prometheus system and go to the Graph page.

  2. Enter the up query to check whether all data from the control plane components is displayed as expected.

    up

    Expected output:

    [Screenshot: result of the up query]

    • up{instance="XX.XX.XX.XX:6443", job="ack-api-server"}: Agent Endpoint status. XX.XX.XX.XX is the IP address of the kubernetes Service in the default namespace of the cluster. The IP address varies by cluster.

    • up{instance="controlplane-xyz", job="ack-api-server", pod="controlplane-xyz"}: Status of the control plane pod. The up metric can be used to detect activity on pods that run control plane components.

  3. Enter the following metric and check whether the metric value is displayed as expected:

    apiserver_request_total{job="ack-api-server"}

    Expected output:

    [Screenshot: result of the apiserver_request_total query]

    If the metric and its value are displayed as expected, the self-managed Prometheus instance is collecting the metrics of the key components.