All Products
Search
Document Center

Container Service for Kubernetes:Use a self-managed Prometheus system to collect metrics of control plane components and configure alerts

Last Updated:Aug 05, 2025

ACK managed cluster Pro exposes metrics of control plane components and provides component dashboards. This topic describes how to collect metrics of core cluster components in your self-managed Prometheus system and configure monitoring and alerts based on these metrics to integrate with your self-managed monitoring system.

Prerequisites

Configure the Prometheus collection file

Before you can use a self-managed Prometheus system to collect metrics from control plane components, you must configure the corresponding metric collection jobs in the Prometheus prometheus.yaml configuration file. In the sample configuration file, each core component corresponds to a job configuration. For more information about the configurations, see the metric description document for the corresponding core component.

For more information about how to configure prometheus.yaml for community Prometheus, see Configuration.
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: ack-api-server
    ......

  - job_name: ack-etcd
    ......

  - job_name: ack-scheduler
    ......

            

For more information about the community Prometheus Operator solution and the ack-prometheus-operator component in the ACK marketplace, see Open source Prometheus monitoring. For instructions on how to configure custom data collection, see the Prometheus Operator documentation from the official Prometheus Operator community.

Configure Prometheus alert rules

For more information about how to configure alerts for open source Prometheus, see Alerting_rules.

Intra-cluster monitoring

If your Prometheus system is deployed inside the cluster that you want to monitor, refer to the following sections to configure monitoring and data collection for the core components of the cluster.

Kube-apiserver

For a list of metrics that can be collected, see Metrics of the kube-apiserver component.

For clusters that are version 1.20 or later and were created in or after February 2023, the access path for the kubernetes service in the default namespace is upgraded from Classic Load Balancer (CLB) forwarding to a direct connection architecture that uses elastic network interfaces (ENIs). For more information, see Kube API Server. After this change, all kube-apiserver replicas are visible to the data plane. You can configure a monitoring job to directly collect kube-apiserver metrics. This provides a more direct collection link and more comprehensive metric coverage.

You can run the kubectl get endpoints kubernetes command to determine the backend link type of the kubernetes service in the cluster.

Click to view the expected output

  • ENI direct connection architecture: The output displays two or more IP addresses, such as a.b.c.d:6443,w.x.y.z:6443.

    NAME         ENDPOINTS                               AGE
    kubernetes   a.b.c.d:6443,w.x.y.z:6443               27h
  • CLB forwarding architecture: The output displays only one IP address, such as a.b.c.d:6443. This IP address is the internal IP address of the CLB instance.

    NAME         ENDPOINTS                               AGE
    kubernetes   a.b.c.d:6443                            27h

Select the Prometheus collection configuration and alert rule based on the backend link type of the Kubernetes service in the cluster.

  • Prometheus collection configuration

    • ENI direct connection architecture

      - job_name: ack-api-server  
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /metrics
        scheme: https
        #  scheme: https
        honor_labels: true
        honor_timestamps: true
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names: [default]
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          insecure_skip_verify: false
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          server_name: kubernetes
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: apiserver
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_service_label_provider]
          separator: ;
          regex: kubernetes
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: https
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Node;(.*)
          target_label: node
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Pod;(.*)
          target_label: pod
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: service
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: job
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: (.+)
          target_label: job
          replacement: ${1}
          action: replace
        - separator: ;
          regex: (.*)
          target_label: endpoint
          replacement: https
          action: replace
    • CLB forwarding architecture

      - job_name: ack-api-server  
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /metrics
        scheme: https
        #  scheme: https
        honor_labels: true
        honor_timestamps: true
        params:
          hosting: ["true"]
          job: ["apiserver"]
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names: [default]
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        tls_config:
          insecure_skip_verify: false
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          server_name: kubernetes
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: apiserver
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_service_label_provider]
          separator: ;
          regex: kubernetes
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_endpoint_port_name]
          separator: ;
          regex: https
          replacement: $1
          action: keep
        - source_labels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          target_label: namespace
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Node;(.*)
          target_label: node
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
          separator: ;
          regex: Pod;(.*)
          target_label: pod
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: service
          replacement: $1
          action: replace
        - source_labels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          target_label: job
          replacement: ${1}
          action: replace
        - source_labels: [__meta_kubernetes_service_label_component]
          separator: ;
          regex: (.+)
          target_label: job
          replacement: ${1}
          action: replace
        - separator: ;
          regex: (.*)
          target_label: endpoint
          replacement: https
          action: replace
  • Prometheus alert rule

    - alert: AckApiServerWarning
      annotations:
        message:  APIServer is not available in last 5 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-api-server",pod!=""}) or (count(up{job="ack-api-server",pod!=""}) <= 1)) == 1
      for: 5m
      labels:
        severity: critical

Etcd

For a list of metrics that can be collected, see Metrics of the etcd component.
  • Prometheus collection configuration

    - job_name: ack-etcd 
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["etcd"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rule

    - alert: AckETCDWarning
      annotations:
        message: Etcd cluster has no leader in last 5 minutes, please check whether the cluster is overloaded and contact ACK team.
      expr: |
        sum_over_time(etcd_server_has_leader[5m]) == 0
      for: 5m
      labels:
        severity: critical
    
    
    - alert: AckETCDWarning
      annotations:
        message: Etcd is not available in last 5 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-etcd",pod!=""}) or (count(up{job="ack-etcd",pod!=""}) <= 2)) == 1
      for: 5m
      labels:
        severity: critical

Kube-scheduler

For a list of metrics that can be collected, see Metrics of the kube-scheduler component.
  • Prometheus collection configuration

    - job_name: ack-scheduler
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-scheduler"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rule

    - alert: AckSchedulerWarning
      annotations:
        message: Scheduler is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
      for: 3m
      labels:
        severity: critical

Kube-controller-manager

For a list of metrics that can be collected, see Metrics of the kube-controller-manager component.
  • Prometheus collection configuration

    - job_name: ack-kcm
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-kube-controller-manager"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        insecure_skip_verify: false
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server_name: kubernetes
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rule

    - alert: AckKCMWarning
      annotations:
        message: KCM is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-kcm",pod!=""})or(count(up{job="ack-kcm",pod!=""})<=0))>=1
      for: 3m
      labels:
        severity: critical

Cloud-controller-manager

For a list of metrics that can be collected, see Metrics of the cloud-controller-manager component.
  • Prometheus collection configuration

    - job_name: ack-cloud-controller-manager
      scrape_interval: 30s
      scrape_timeout: 30s
      metrics_path: /metrics
      scheme: https
      #  scheme: https
      honor_labels: true
      honor_timestamps: true
      params:
        hosting: ["true"]
        job: ["ack-cloud-controller-manager"]
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config: {ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, server_name: kubernetes,
                   insecure_skip_verify: false}
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: apiserver
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_provider]
        separator: ;
        regex: kubernetes
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_component]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https
        action: replace
  • Prometheus alert rule

    - alert: AckCCMWarning
      annotations:
        message: CCM is not available in last 3 minutes. Please check the prometheus job and target status.
      expr: |
        (absent(up{job="ack-cloud-controller-manager",pod!=""}) or (count(up{job="ack-cloud-controller-manager",pod!=""}) <= 0)) == 1
      for: 3m
      labels:
        severity: critical

External cluster monitoring

If your Prometheus system is deployed outside the cluster that you want to monitor, refer to Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster to configure monitoring and data collection for the core components of the cluster. The main configurations are as follows:

  - job_name: 'out-of-k8s-scrape-job'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/kubernetes-ca.crt
    bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'

    kubernetes_sd_configs:
      - api_server: 'https://<KUBERNETES URL>'
        role: node
        tls_config:
          ca_file: /etc/prometheus/kubernetes-ca.crt
        bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
            

Verify the results

  1. Log in to the console of your self-managed Prometheus system and go to the Graph page.

  2. Enter up and check whether the data for all control plane components is displayed as expected.

    up

    Expected output:

    自定义

    • up{instance="XX.XX.XX.XX:6443", job="ack-api-server"}: The status of the proxy Endpoint. XX.XX.XX.XX is the IP address of the kubernetes service in the default namespace of the cluster. This IP address varies based on the cluster.

    • up{instance="controlplane-xyz", job="ack-api-server", pod="controlplane-xyz"}: The status of the control plane pod. This up metric can be used for liveness probes for the control plane pod.

  3. Enter the following metric and check whether it is displayed as expected.

    apiserver_request_total{job="ack-api-server"}

    Expected output:

    显示2

    If the queried metric and data are displayed on the page, your self-managed Prometheus system can collect the metrics of core components.