This topic provides answers to some frequently asked questions (FAQ) about Prometheus Service (Prometheus).

Overview

Why am I unable to view the metrics for Kubelet and API server on the Grafana dashboard I created?

This is because Prometheus Service collects only a specific set of free basic metrics. The following list shows the metrics that are collected:

	  kube_pod_container_status_last_terminated_reason
	  rest_client_requests_total
	  scheduler_pod_scheduling_attempts_bucket
	  scheduler_pending_pods
	  scheduler_scheduler_cache_size
	  kube_node_spec_taint
	  kube_node_status_capacity_pods
	  kubelet_pleg_relist_duration_seconds_bucket
	  kubelet_node_name
	  kubelet_pod_worker_duration_seconds_bucket
	  kubelet_certificate_manager_client_ttl_seconds
	  kubelet_certificate_manager_server_ttl_seconds
	  kubelet_certificate_manager_client_expiration_renew_errors
	  kubelet_server_expiration_renew_errors
	  apiserver_client_certificate_expiration_seconds_count
	  apiserver_client_certificate_expiration_seconds_bucket
	  aggregator_unavailable_apiservice_count
	  aggregator_unavailable_apiservice
	  kubernetes_build_info
	  node_filesystem_readonly
	  node_network_receive_errs_total
	  node_network_transmit_errs_total
	  node_timex_offset_seconds
	  node_timex_sync_status
	  node_network_up
	  kube_resourcequota
	  kube_daemonset_status_current_number_scheduled
	  kube_daemonset_status_desired_number_scheduled
	  kube_daemonset_status_number_misscheduled
	  kube_daemonset_updated_number_scheduled
	  kube_daemonset_status_number_available
	  kube_job_spec_completions
	  kube_job_status_succeeded
	  kube_job_failed
	  kube_persistentvolume_status_phase
	  apiserver_dropped_requests_total
	  container_fs_inodes_total
	  container_fs_inodes_free
	  nvidia_gpu_num_devices
	  etcd_request_duration_seconds_count
	  etcd_request_duration_seconds_sum
	  etcd_object_counts
	  etcd_debugging_mvcc_keys_total
	  etcd_disk_backend_commit_duration_seconds_bucket
	  etcd_server_leader_changes_seen_total
	  etcd_server_has_leader
	  etcd_debugging_mvcc_db_total_size_in_bytes
	  nvidia_gpu_temperature_celsius
	  nvidia_gpu_memory_used_bytes
	  nvidia_gpu_memory_total_bytes
	  nvidia_gpu_power_usage_milliwatts
	  nvidia_gpu_duty_cycle
	  apiserver_request_latencies_summary
	  kube_hpa_status_condition
	  kube_hpa_labels
	  kube_hpa_metadata_generation
	  kube_resourcequota
	  kube_pod_container_status_waiting_reason
	  container_cpu_load_average_10s
	  container_network_receive_errors_total
	  container_network_receive_packets_dropped_total
	  container_network_transmit_errors_total
	  container_network_transmit_packets_dropped_total
	  container_memory_max_usage_bytes
	  container_memory_cache
	  container_memory_swap
	  container_memory_failcnt
	  kube_pod_labels
	  kube_deployment_labels
	  kube_node_labels
	  kube_pod_status_ready
	  kube_node_status_capacity
	  kube_node_status_condition
	  kube_pod_container_resource_limits
	  kubelet_volume_stats_used_bytes
	  kubelet_volume_stats_inodes_used
	  kubelet_volume_stats_inodes_free
	  kubelet_volume_stats_inodes
	  kubelet_volume_stats_capacity_bytes
	  kubelet_volume_stats_available_bytes
	  kube_pod_container_resource_limits
	  node_filesystem_usage
	  container_fs_writes_bytes_total
	  container_fs_reads_bytes_total
	  container_memory_usage_bytes
	  kube_node_labels
	  kube_deployment_status_replicas_unavailable
	  kube_job_status_failed
	  kube_job_status_active
	  kube_job_status_succeeded
	  kube_pod_container_status_restarts
	  kube_pod_container_status_terminated
	  kube_pod_container_status_waiting
	  kube_pod_container_status_running
	  kube_node_spec_unschedulable
	  kube_node_status_condition
	  kube_node_info
	  kube_node_status_allocatable_pods
	  node_filesystem_size
	  kube_deployment_status_replicas_unavailable
	  node_filesystem_free
	  node_uname_info
	  go_gc_duration_seconds
	  go_goroutines
	  container_network_transmit_bytes_total
	  container_network_receive_bytes_total
	  process_resident_memory_bytes
	  process_cpu_seconds_total
	  apiserver_current_inflight_requests
	  apiserver_longrunning_gauge
	  container_memory_rss
	  container_spec_memory_limit_bytes
	  container_network_transmit_bytes_total
	  apiserver_request_total
	  apiserver_request_count
	  apiserver_request_duration_seconds_bucket
	  kube_pod_container_resource_limits_memory_bytes
	  kube_pod_container_resource_limits_cpu_cores
	  container_accelerator_memory_total_bytes
	  container_accelerator_memory_used_bytes
	  container_accelerator_duty_cycle
	  kube_service_info
	  kube_pod_status_phase
	  node_cpu_seconds_total
	  kube_pod_labels
	  kube_deployment_spec_strategy_rollingupdate_max_unavailable
	  kube_deployment_metadata_generation
	  kube_deployment_status_observed_generation
	  kube_deployment_spec_replicas
	  kube_deployment_status_replicas_available
	  kube_deployment_spec_replicas
	  kube_deployment_status_replicas_updated
	  kube_deployment_created
	  kube_node_status_allocatable_memory_bytes
	  kube_node_status_allocatable_cpu_cores
	  kube_node_status_capacity_memory_bytes
	  kube_node_status_capacity_cpu_cores
	  kube_node_status_condition
	  container_cpu_cfs_throttled_periods_total
	  container_cpu_cfs_periods_total
	  container_cpu_cfs_throttled_seconds_total
	  kube_pod_container_resource_requests_memory_bytes
	  kube_pod_container_resource_requests_cpu_cores
	  kube_hpa_spec_max_replicas
	  kube_hpa_spec_min_replicas
	  kube_hpa_status_desired_replicas
	  kube_hpa_status_current_replicas
	  kube_pod_container_status_restarts_total
	  container_network_receive_bytes_total
	  container_memory_working_set_bytes
	  machine_memory_bytes
	  container_cpu_usage_seconds_total
	  machine_cpu_cores
	  kube_pod_info
	  kube_pod_container_info
	  container_fs_usage_bytes
	  container_fs_limit_bytes
	  kube_daemonset_created
	  kube_statefulset_created
	  kube_deployment_created
	  kube_deployment_status_replicas
	  kube_statefulset_replicas
	  kube_daemonset_status_desired_number_scheduled
	  kube_deployment_status_replicas_available
	  kube_statefulset_status_replicas
	  kube_daemonset_status_number_ready
	  kube_pod_container_resource_requests_cpu_cores
	  kube_pod_container_resource_requests_memory_bytes
	  node_boot_time_seconds
	  node_memory_MemAvailable_bytes
	  node_memory_MemTotal_bytes
	  node_memory_MemFree_bytes
	  node_memory_Buffers_bytes
	  node_memory_Cached_bytes
	  node_filefd_allocated
	  node_filesystem_avail_bytes
	  node_filesystem_size_bytes
	  node_filesystem_free_bytes
	  node_load15
	  node_load1
	  node_load5
	  node_disk_io_time_seconds_total
	  node_disk_read_time_seconds_total
	  node_disk_write_time_seconds_total
	  node_disk_reads_completed_total
	  node_disk_writes_completed_total
	  node_disk_io_now
	  node_disk_read_bytes_total
	  node_disk_written_bytes_total
	  node_disk_io_time_weighted_seconds_total
	  node_network_receive_bytes_total
	  node_network_transmit_bytes_total
	  node_netstat_Tcp_CurrEstab
	  node_sockstat_TCP_tw
	  node_netstat_Tcp_ActiveOpens
	  node_netstat_Tcp_PassiveOpens
	  node_sockstat_TCP_alloc
	  node_sockstat_TCP_inuse
	  node_exporter_build_info
	  http_request_duration_microseconds
	  http_response_size_bytes
	  http_requests_total
	  http_request_size_bytes
	  rest_client_requests_total
	  container_spec_cpu_quota
	  container_network_transmit_packets_total
	  container_fs_write_seconds_total
	  container_fs_read_seconds_total
	  kube_pod_owner
	  kube_deployment_metadata_generation
	  kube_pod_deletion_timestamp

How can I view the collected data?

Method 1:

  1. Go to the homepage of Grafana dashboards.
  2. In the left-side navigation pane, click the Explore icon.
  3. In the upper part of the Explore page, select the monitored Kubernetes cluster from the drop-down list. Enter a PromQL statement in the Metrics field, such as node_load1 or kube_node_info, and then click Run Query in the upper-right corner to debug the query.
    Prometheus data debugging

Method 2:

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, click Prometheus Monitoring.
  3. In the top navigation bar of the Prometheus Monitoring page, select the region where the monitored Kubernetes cluster resides. Then, click the name of the Kubernetes cluster.
  4. In the left-side navigation pane, click Settings.
  5. On the page that appears, click the Targets (beta) tab.
  6. Click the show icon before a service name. Then, click the link in the Endpoint column of the table below the service name to view the collected metrics.
    Settings: Targets (beta) tab

What can I do if I am unable to obtain the metrics of a CSI component?

Check whether the version of the Container Service for Kubernetes (ACK) Container Storage Interface (CSI) component is V1.18.8.45 or later. If the component version is outdated, update the component. For more information, see Use csi-plugin to monitor the storage resources of an ACK cluster.

What can I do if I am unable to view metrics in the dashboard of an exporter?

  1. Check whether the exporter exports data as expected. For more information, see How can I view the collected data?.
    If the exporter does not export data as expected, perform the following steps to troubleshoot the issue:
  2. Log on to the ARMS console.
  3. In the left-side navigation pane, click Prometheus Monitoring.
  4. In the top navigation bar of the Prometheus Monitoring page, select the region where the monitored Kubernetes cluster resides. Then, click the name of the Kubernetes cluster.
  5. In the left-side navigation pane, click Exporters.
  6. Find the exporter for which you want to troubleshoot the error and click Log in the Actions column.
    Check whether error logs exist.
    • If error logs exist, check whether the GitHub project of the open source exporter provides a solution to the error.
    • If no error logs exist, submit a ticket or contact the DingTalk account arms160804.

How can I prevent Prometheus from automatically creating default alert rules?

Prometheus automatically creates default alert rules for monitored applications. For more information about the default alert rules, see Description of alert rules.

Prometheus provides the defaultAlert parameter, which specifies whether Prometheus automatically creates default alert rules. If this parameter is set to true, the default alert rules are automatically created. If it is set to false, they are not.

To prevent Prometheus from automatically creating default alert rules for a monitored application, you can set the defaultAlert parameter of the Prometheus agent to false. After you disable automatic creation of default alert rules, we recommend that you delete the default alert rules that have been created by Prometheus for your application.

The following example shows how to configure the Prometheus agent for an ACK cluster.

  1. Set the defaultAlert parameter of the Prometheus agent to false.
    1. Log on to the ACK console.
    2. In the left-side navigation pane of the ACK console, click Clusters.
    3. On the Clusters page, find the cluster that you want to manage and click Applications in the Actions column.
    4. On the Deployments page, set Namespace to arms-prom, find the Deployment whose name starts with arms-prom, such as arms-prom-ack-arms-prometheus, and then click Edit in the Actions column.
    5. On the Edit page, find the Start parameter in the Lifecycle section. In the Parameter field, enter -defaultAlert=false. Then, click Update in the upper-right corner of the page. A sketch of the resulting Deployment configuration is provided after this list.
      After the Prometheus agent is updated, wait 3 to 5 minutes. Prometheus no longer creates default alert rules for the cluster.
  2. Optional: Delete the default alert rules.
    1. Log on to the ARMS console.
    2. In the left-side navigation pane, click Prometheus Monitoring.
    3. On the Prometheus Monitoring page, click the name of the ACK cluster that you want to manage.
    4. In the left-side navigation pane, click Settings.
    5. On the page that appears, click the Rule tab and delete the alert rules.
      Rules
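
After step 1 is complete, the start parameter becomes part of the container arguments of the arms-prom Deployment. The following excerpt is only a sketch of what the Deployment looks like after the change: the container name and image are placeholders, and only the -defaultAlert=false argument is taken from the steps above.

    # Excerpt of the arms-prom Deployment after the change (container name and image are placeholders).
    spec:
      template:
        spec:
          containers:
            - name: arms-prometheus-agent        # placeholder container name
              image: <prometheus-agent-image>    # placeholder image
              args:
                - -defaultAlert=false            # prevents automatic creation of default alert rules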

How can I allow Prometheus to automatically enable alert rules?

By default, alert rules are disabled.

Prometheus provides the alert parameter, which specifies whether Prometheus automatically enables alert rules. If this parameter is set to true, the alert rules are automatically enabled. If it is set to false, they are not.

To allow Prometheus to automatically enable alert rules that are created for a monitored application, you can set the alert parameter of the Prometheus agent to true.

The following example shows how to configure the Prometheus agent for an ACK cluster.

  1. Log on to the ACK console.
  2. In the left-side navigation pane of the ACK console, click Clusters.
  3. On the Clusters page, find the cluster that you want to manage and click Applications in the Actions column.
  4. On the Deployments page, set Namespace to arms-prom, find the Deployment whose name starts with arms-prom, such as arms-prom-ack-arms-prometheus, and then click Edit in the Actions column.
  5. On the Edit page, find the Start parameter in the Lifecycle section. In the Parameter field, enter -alert=true. Then, click Update in the upper-right corner of the page. A sketch of the resulting Deployment configuration is provided after this list.
    After the Prometheus agent is updated, wait 3 to 5 minutes. All alert rules are enabled for the cluster.
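
After the update, the Deployment carries the start parameter in its container arguments, in the same way as in the defaultAlert example above. The following excerpt is only a sketch; the container name is a placeholder.

    # Excerpt of the arms-prom Deployment after the change (container name is a placeholder).
    spec:
      template:
        spec:
          containers:
            - name: arms-prometheus-agent   # placeholder container name
              args:
                - -alert=true               # enables the alert rules created for the cluster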

How can I integrate Prometheus with third-party systems such as Grafana, Istio, and HPA?

To integrate Prometheus with third-party systems such as Grafana, Istio, and Horizontal Pod Autoscaler (HPA), you must obtain the API URL of Prometheus. Perform the following steps to obtain the API URL:

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, click Prometheus Monitoring.
  3. In the upper-left corner of the Prometheus Monitoring page, select the region where the ACK cluster resides. Find the cluster and click Settings in the Actions column.
  4. On the Settings page, click the Agent Settings tab.
  5. On the Agent Settings tab, copy the URL next to Step 2: API URL.

After you obtain the API URL, add it to third-party systems such as Grafana, Istio, and HPA. For more information about how to integrate Prometheus with Grafana, see Import data from Prometheus Monitoring to a local Grafana system.
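
For example, a local Grafana system can reference the API URL through a data source provisioning file. The following sketch assumes a standard Grafana provisioning setup; the file path, data source name, and URL placeholder are examples, and you must replace the url value with the API URL that you copied in the previous step.

    # Example path: /etc/grafana/provisioning/datasources/arms-prometheus.yaml
    apiVersion: 1
    datasources:
      - name: ARMS-Prometheus              # example data source name
        type: prometheus                   # built-in Grafana Prometheus data source type
        access: proxy                      # the Grafana server sends queries to the API URL
        url: <API URL copied in step 5>    # placeholder: replace with the copied API URL
        isDefault: false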

What can I do if the Prometheus agent is not deleted after the monitored ACK cluster is deleted?

Problem description

After an ACK cluster is deleted, the following issues occur in the region where the deleted ACK cluster resides:

  • On the Prometheus Monitoring page of the Prometheus console, several clusters are dimmed.
  • You cannot create an ACK cluster by using the name of the deleted ACK cluster.

Solution

Log on to the Prometheus console and uninstall the Prometheus agent. For more information, see Uninstall the Prometheus agent.