This topic explains how to adjust Prometheus's configuration to collect metrics from a specific virtual node.
Feature introduction
Multiple virtual nodes in a cluster can share a single node IP address. As a result, a query sent to one virtual node returns the data of all virtual nodes that share that IP address. Because Prometheus typically collects metrics from every node through the kubelet service, this can cause duplicate metrics. To address this issue, Container Service for Kubernetes (ACK) offers a feature that collects metrics from a specified virtual node. In addition to the original collection endpoint <nodeIP>:10250/metrics/cadvisor, an endpoint that targets a virtual node by name is provided: <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>. When you specify the name of a virtual node, only the monitoring data of the pods managed by that virtual node is returned.
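As a minimal sketch of the two endpoint forms described above, the following Python snippet builds both URLs. The IP address and virtual node name are hypothetical placeholders, the helper name cadvisor_url is an illustration rather than part of any ACK API, and the https scheme is an assumption based on the scrape configurations in this topic.

```python
from typing import Optional
from urllib.parse import urlencode

KUBELET_PORT = 10250  # port used by the endpoints described above


def cadvisor_url(node_ip: str, node_name: Optional[str] = None) -> str:
    """Build the cAdvisor metrics URL, optionally scoped to one virtual node."""
    url = f"https://{node_ip}:{KUBELET_PORT}/metrics/cadvisor"
    if node_name:
        # The nodeName query parameter restricts the response to one virtual node.
        url += "?" + urlencode({"nodeName": node_name})
    return url


# Hypothetical values, for illustration only.
shared = cadvisor_url("10.0.0.8")
scoped = cadvisor_url("10.0.0.8", "virtual-kubelet-cn-hangzhou-k")
print(shared)
print(scoped)
```

The first URL returns data for every virtual node behind the shared IP address, while the second is limited to the pods of the named virtual node.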
Prerequisites
The ACK Virtual Node component is installed, and its version is v2.11.0 or later. For more information, see ACK Virtual Node.
Modify the configuration of Prometheus
You can update the Prometheus configuration to collect metrics from a specific virtual node. The procedure depends on which of the following three Prometheus setups you use: Alibaba Cloud Managed Service for Prometheus, the Community Edition Prometheus Operator (including ack-prometheus-operator from the ACK application marketplace), or open-source Prometheus.
Alibaba Cloud Managed Service for Prometheus
Default support is provided. No additional action is required.
Community Edition Prometheus Operator
If you use the Community Edition Prometheus Operator or ack-prometheus-operator from the ACK application marketplace, add the following ServiceMonitor custom resource (CR).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virtual-kubelet
  namespace: monitoring
  labels:
    k8s-app: kubelet
    # Add this label so that prometheus-operator manages this ServiceMonitor automatically.
    release: prometheus-operator
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: https-metrics
      interval: 15s
      scheme: https
      path: /metrics/cadvisor
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecureSkipVerify: true
      relabelings:
        # Retain only the virtual node endpoints.
        - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
          regex: (^virtual-kubelet.*)
          action: keep
        # Pass the virtual node name as the nodeName query parameter.
        - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
          regex: (^virtual-kubelet.*)
          targetLabel: __param_nodeName
          replacement: ${1}
          action: replace
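The two relabeling rules above can be sketched in Python to show their effect on discovered endpoints. This is an illustrative simulation and not how Prometheus is implemented. The endpoint names are hypothetical, and the helper relabel is introduced only for this example. It relies on two documented Prometheus behaviors: relabeling regexes are anchored at both ends, and a label named __param_<name> is sent as the URL query parameter <name>.

```python
import re
from typing import Dict, Optional

VIRTUAL_NODE_REGEX = r"(^virtual-kubelet.*)"
TARGET_NAME = "__meta_kubernetes_endpoint_address_target_name"


def relabel(labels: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Apply the keep rule, then the replace rule; return None if the target is discarded."""
    # Prometheus anchors relabeling regexes at both ends, so fullmatch is used.
    match = re.fullmatch(VIRTUAL_NODE_REGEX, labels.get(TARGET_NAME, ""))
    if match is None:
        return None  # action: keep discards targets that do not match
    result = dict(labels)
    # action: replace copies capture group 1 into __param_nodeName,
    # which is then sent as the nodeName query parameter.
    result["__param_nodeName"] = match.group(1)
    return result


# Hypothetical discovered endpoints, for illustration only.
discovered = [
    {TARGET_NAME: "cn-hangzhou.192.168.0.10"},
    {TARGET_NAME: "virtual-kubelet-cn-hangzhou-k"},
]
kept = [r for r in (relabel(t) for t in discovered) if r is not None]
for target in kept:
    print(target["__param_nodeName"])
```

Only the endpoint whose target name starts with virtual-kubelet survives, and its name becomes the value of nodeName in the scrape request.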
If the cluster already collects cAdvisor metrics through service discovery based on the kubelet service, add the following relabeling rule to that existing ServiceMonitor so that it no longer scrapes <Virtual Node IP>:10250/metrics/cadvisor. This prevents duplicate data collection.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
    - path: /metrics/cadvisor
      port: https-metrics
      ...
      relabelings:
        # The relabeling rule discards the endpoints of all targets whose names start with virtual-kubelet.
        - action: drop
          regex: (^virtual-kubelet.*)
          sourceLabels:
            - __meta_kubernetes_endpoint_address_target_name
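To see why the drop rule matters, the following Python sketch counts how many jobs would scrape an endpoint with and without it. It is an illustration under stated assumptions: the endpoint names are hypothetical, and scrape_count is a helper invented for this example that models only the keep rule of the virtual-kubelet job and the drop rule of the existing kubelet job.

```python
import re

VIRTUAL_NODE_REGEX = re.compile(r"(^virtual-kubelet.*)")


def matches(target_name: str) -> bool:
    # Prometheus anchors relabeling regexes at both ends, hence fullmatch.
    return VIRTUAL_NODE_REGEX.fullmatch(target_name) is not None


def scrape_count(target_name: str, kubelet_job_has_drop_rule: bool) -> int:
    """Count how many of the two jobs would scrape the given endpoint."""
    # virtual-kubelet job: action keep retains only matching endpoints.
    virtual_job = matches(target_name)
    # kubelet job: action drop, when present, discards matching endpoints.
    kubelet_job = not (kubelet_job_has_drop_rule and matches(target_name))
    return int(virtual_job) + int(kubelet_job)


# Hypothetical endpoint target names, for illustration only.
names = ["cn-hangzhou.192.168.0.10", "virtual-kubelet-cn-hangzhou-k"]
for name in names:
    print(name, scrape_count(name, False), scrape_count(name, True))
```

Without the drop rule, the virtual node endpoint is scraped by both jobs. With it, every endpoint is scraped by exactly one job.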
Open-source Prometheus
For open-source Prometheus, locate the Prometheus configuration file, typically found at /etc/prometheus/prometheus.yml or within your custom configuration folder, and add the following collection configuration.
scrape_configs:
  # ...Other job configuration.
  - job_name: monitoring/virtual-kubelet/0
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics/cadvisor
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: kubelet
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_container_name]
        separator: ;
        regex: (.*)
        target_label: container
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https-metrics
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        target_label: __param_nodeName
        replacement: ${1}
        action: replace
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kube-system
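The configuration above contains two rules that write to the job label, which can be confusing at first glance. The following Python sketch illustrates how they interact: the second rule uses the regex (.+), so it overrides the first only when the k8s_app service label is non-empty. The helpers replace and job_label and the service name are hypothetical and exist only for this illustration. In the job above, the first keep rule already requires k8s_app to equal kubelet, so the job label resolves to kubelet in practice.

```python
import re
from typing import Dict


def replace(labels: Dict[str, str], source: str, regex: str, target: str) -> Dict[str, str]:
    """Sketch of action: replace with replacement ${1} and a single source label."""
    # Prometheus anchors relabeling regexes at both ends, so fullmatch is used.
    match = re.fullmatch(regex, labels.get(source, ""))
    if match is None:
        return labels  # no match: the target label is left unchanged
    result = dict(labels)
    result[target] = match.group(1)
    return result


def job_label(meta: Dict[str, str]) -> str:
    """Apply the two job rules in the order they appear in the configuration."""
    labels = replace(meta, "__meta_kubernetes_service_name", r"(.*)", "job")
    labels = replace(labels, "__meta_kubernetes_service_label_k8s_app", r"(.+)", "job")
    return labels["job"]


# Hypothetical service metadata, for illustration only.
with_app_label = {
    "__meta_kubernetes_service_name": "example-kubelet-service",
    "__meta_kubernetes_service_label_k8s_app": "kubelet",
}
without_app_label = {"__meta_kubernetes_service_name": "example-kubelet-service"}
print(job_label(with_app_label))
print(job_label(without_app_label))
```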
If the cluster already collects cAdvisor metrics through service discovery based on the kubelet service, add the following relabeling rule to that existing job so that it no longer scrapes <Virtual Node IP>:10250/metrics/cadvisor. This prevents duplicate data collection.
scrape_configs:
  # ...Other job configuration.
  - job_name: monitoring/ack-prometheus-operator-kubelet/0
    honor_labels: true
    honor_timestamps: true
    ...
    relabel_configs:
      ...
      # Discard the endpoints used to collect the /metrics/cadvisor metrics of virtual nodes.
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        replacement: $1
        action: drop