Container Service for Kubernetes:Monitor ACK Pro control plane components using a self-managed Prometheus

Last Updated: Dec 31, 2025

ACK Pro managed clusters expose metrics for their control plane components and provide component dashboards. This topic describes how to use a self-managed Prometheus instance to collect metrics from control plane components. You can then use these metrics to configure alerts and integrate them with your own monitoring system.

Before you begin

  • Ensure that your self-managed Prometheus instance can access the API Server of the ACK Pro managed cluster and has read permissions for the /metrics path.

  • You can deploy the self-managed Prometheus instance either inside or outside the cluster.

  • ACK Pro managed clusters expose metrics for control plane components, including kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager. Review the metrics documentation for each component to understand the exposed metrics and their descriptions.

    You can also use Alibaba Cloud Managed Service for Prometheus for monitoring within your cluster. Managed Service for Prometheus automatically monitors and collects data, provides real-time Grafana dashboards, and lets you create alerts delivered through channels such as email, SMS, and DingTalk.
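    The read permission on the /metrics path mentioned above can be granted with standard Kubernetes RBAC. The following is a minimal sketch; the ClusterRole, ServiceAccount, and namespace names are illustrative and should match your own Prometheus deployment:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-metrics-reader   # illustrative name
    rules:
      # Non-resource URLs cover the raw /metrics endpoints served by the API server.
      - nonResourceURLs: ["/metrics"]
        verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus-metrics-reader
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-metrics-reader
    subjects:
      - kind: ServiceAccount
        name: prometheus               # the ServiceAccount your Prometheus pods run as
        namespace: monitoring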

Configure the Prometheus scrape configuration

To collect metrics from the control plane components with a self-managed Prometheus instance, you must configure corresponding scrape jobs in the prometheus.yaml file. In the provided example, each component corresponds to a job_name. For component-specific configuration details, refer to the respective metrics documentation.

For information about how to configure prometheus.yaml for a standard Prometheus instance, see the official Prometheus documentation.
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: ack-api-server
    # ... component-specific scrape settings ...

  - job_name: ack-etcd
    # ... component-specific scrape settings ...

  - job_name: ack-scheduler
    # ... component-specific scrape settings ...

Configure Prometheus alert rules

To learn how to configure alert rules for open source Prometheus, see Alerting rules.
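As a minimal sketch, alert rules are typically kept in a separate file that prometheus.yaml references through rule_files. The file path and group name below are illustrative; the rule itself is the API server availability rule recommended later in this topic:

# In prometheus.yaml:
rule_files:
  - /etc/prometheus/rules/ack-control-plane.rules.yaml

# In /etc/prometheus/rules/ack-control-plane.rules.yaml:
groups:
  - name: ack-control-plane
    rules:
      - alert: AckApiServerWarning
        expr: (absent(up{job="ack-api-server",pod!=""}) or (count(up{job="ack-api-server",pod!=""}) <= 1)) == 1
        for: 5m
        labels:
          severity: critical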

In-cluster monitoring

If your self-managed Prometheus is deployed within the cluster, use the following configurations to collect metrics via the API Server proxy.

Step 1: Verify the API server architecture

Run the following command to determine your cluster's networking architecture:

kubectl get endpoints kubernetes

  • ENI Direct Connection: The output shows two or more IP addresses (such as 10.0.0.1:6443, 10.0.0.2:6443).

  • CLB Forwarding: The output shows only one IP address (the internal IP address of the CLB).

Step 2: Component-specific configurations

kube-apiserver

The API server is the gateway for all metrics. Based on your architecture (ENI versus CLB), ensure you are targeting the correct endpoints.

Scrape configuration:

- job_name: ack-api-server
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [default]
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_component]
      action: keep
      regex: apiserver
    - source_labels: [__meta_kubernetes_service_label_provider]
      action: keep
      regex: kubernetes
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      action: keep
      regex: https

Recommended alerting rules:

- alert: AckApiServerWarning
  expr: (absent(up{job="ack-api-server",pod!=""}) or (count(up{job="ack-api-server",pod!=""}) <= 1)) == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    message: "APIServer is not available. Please check the Prometheus job and target status."
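In addition to the availability rule above, you can alert on API server request latency. The following sketch uses the standard apiserver_request_duration_seconds histogram; the 1-second threshold and 10-minute duration are illustrative and should be tuned for your cluster:

- alert: AckApiServerHighLatency
  # P99 latency of non-streaming requests over the last 5 minutes, per verb.
  expr: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="ack-api-server",verb!~"WATCH|CONNECT"}[5m])) by (le, verb)) > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    message: "P99 latency of {{ $labels.verb }} requests to the API server exceeds 1s."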

etcd

The etcd job uses the API server as a proxy to fetch metrics from the distributed key-value store.

Scrape configuration:

- job_name: ack-etcd
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: https
  honor_labels: true
  params:
    hosting: ["true"]
    job: ["etcd"]
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [default]
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: false
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_component]
      action: keep
      regex: apiserver
    - source_labels: [__meta_kubernetes_service_label_provider]
      action: keep
      regex: kubernetes
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      action: keep
      regex: https
    - source_labels: [__meta_kubernetes_service_label_component]
      action: replace
      target_label: job
      replacement: ${1}

Recommended alerting rules:

- alert: AckETCDLeaderMissing
  expr: sum_over_time(etcd_server_has_leader[5m]) == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    message: "Etcd cluster has no leader in the last 5 minutes. Check if the cluster is overloaded."

- alert: AckETCDDown
  expr: (absent(up{job="ack-etcd",pod!=""}) or (count(up{job="ack-etcd",pod!=""}) <= 2)) == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    message: "Etcd is unavailable. Check the Prometheus job and target status."
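You can also watch the etcd database size, because etcd rejects writes once its space quota is exhausted. The following sketch uses the standard etcd_mvcc_db_total_size_in_bytes metric; the 6 GiB threshold is illustrative and should be set below your cluster's actual etcd quota:

- alert: AckETCDDBSizeHigh
  # Database size in bytes; 6 GiB expressed as scalar arithmetic.
  expr: etcd_mvcc_db_total_size_in_bytes > 6 * 1024 * 1024 * 1024
  for: 15m
  labels:
    severity: warning
  annotations:
    message: "The etcd database is larger than 6 GiB. Consider compaction and defragmentation before the quota is reached."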

kube-scheduler

This job monitors the health and scheduling latency of the Kubernetes scheduler.

Scrape configuration:

- job_name: ack-scheduler
  scrape_interval: 30s
  scheme: https
  params:
    hosting: ["true"]
    job: ["ack-scheduler"]
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [default]
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: false
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_component]
      action: keep
      regex: apiserver
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      action: keep
      regex: https
    - source_labels: [__meta_kubernetes_service_label_component]
      action: replace
      target_label: job
      replacement: ${1}

Recommended alerting rules:

- alert: AckSchedulerWarning
  expr: (absent(up{job="ack-scheduler",pod!=""}) or (count(up{job="ack-scheduler",pod!=""}) <= 0)) == 1
  for: 3m
  labels:
    severity: critical
  annotations:
    message: "Scheduler is unavailable. Check the Prometheus job and target status."
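Beyond availability, a sustained backlog of unschedulable pods often signals scheduler or capacity problems. The following sketch uses the standard scheduler_pending_pods metric; the duration and severity are illustrative:

- alert: AckSchedulerPendingPods
  # Pods that the scheduler has marked unschedulable.
  expr: sum(scheduler_pending_pods{queue="unschedulable"}) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    message: "Pods have stayed unschedulable for 15 minutes. Check node capacity and scheduling constraints."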

kube-controller-manager (KCM)

Monitor the controllers that manage core Kubernetes objects like nodes and namespaces.

Scrape configuration:

- job_name: ack-kcm
  scrape_interval: 30s
  scheme: https
  params:
    hosting: ["true"]
    job: ["ack-kube-controller-manager"]
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [default]
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_component]
      action: keep
      regex: apiserver
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      action: keep
      regex: https
    - source_labels: [__meta_kubernetes_service_label_component]
      action: replace
      target_label: job
      replacement: ${1}

Recommended alerting rules:

- alert: AckKCMWarning
  expr: (absent(up{job="ack-kcm",pod!=""}) or (count(up{job="ack-kcm",pod!=""}) <= 0)) == 1
  for: 3m
  labels:
    severity: critical
  annotations:
    message: "KCM is unavailable. Check the Prometheus job and target status."
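Controller work backing up is another useful signal. The following sketch alerts on sustained workqueue depth using the standard workqueue_depth metric exposed by kube-controller-manager; the threshold of 100 items is illustrative:

- alert: AckKCMWorkqueueBacklog
  # Depth of each controller workqueue, aggregated by queue name.
  expr: sum(workqueue_depth) by (name) > 100
  for: 10m
  labels:
    severity: warning
  annotations:
    message: "The {{ $labels.name }} workqueue has been deeper than 100 items for 10 minutes."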

cloud-controller-manager (CCM)

This job monitors the cloud-controller-manager (CCM), which integrates the cluster with Alibaba Cloud infrastructure. The recommended alert rule for ACK Pro clusters targets only valid control plane pods (pod!="") and uses a for duration to avoid flapping (false alarms during brief network blips).

Scrape configuration:

- job_name: ack-cloud-controller-manager
  scrape_interval: 30s
  scheme: https
  params:
    hosting: ["true"]
    job: ["ack-cloud-controller-manager"]
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [default]
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_component]
      action: keep
      regex: apiserver
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      action: keep
      regex: https

Recommended alerting rules:

- alert: AckCCMWarning
  expr: (absent(up{job="ack-cloud-controller-manager",pod!=""}) or (count(up{job="ack-cloud-controller-manager",pod!=""}) <= 0)) == 1
  for: 3m
  labels:
    severity: critical
  annotations:
    message: "CCM is unavailable. Check the Prometheus job and target status."

Verify the results

  1. Open the Prometheus web UI and navigate to the Graph page.

  2. Run the query up. Verify that up{job="ack-api-server"} and the other component jobs return a value of 1.

  3. Verify specific metrics, such as apiserver_request_total, to ensure time-series data is populating correctly.