ACK Net Exporter is a component that enhances the observability of cluster networks. You can deploy ACK Net Exporter in your cluster to collect various metrics of container networks. This allows you to identify and troubleshoot network issues at the earliest opportunity. This topic describes how to use ACK Net Exporter to troubleshoot container network issues.

Prerequisites

A Container Service for Kubernetes (ACK) managed cluster is created. For more information, see Create an ACK managed cluster.

Background information

ACK Net Exporter runs in a daemon pod on each node. ACK Net Exporter uses the extended Berkeley Packet Filter (eBPF) technology to collect network information from the node and aggregates the information by pod. ACK Net Exporter provides a standard interface that allows you to monitor high-level network information. The following figure shows the architecture of ACK Net Exporter.

Install and configure ACK Net Exporter

Install ACK Net Exporter

  1. Log on to the ACK console and choose Marketplace > Marketplace in the left-side navigation pane.
  2. Find and click ack-net-exporter on the Marketplace page.
  3. On the ack-net-exporter page, click Deploy in the upper-right corner.
  4. In the Basic Information step, specify Cluster and Namespace, and then click Next.
  5. In the Parameters step, configure the following parameters and click OK. (A sample values sketch follows the table.)
    Parameter | Description | Default value
    name | The name of the ACK Net Exporter component. | ack-net-exporter-default
    namespace | The namespace to which ACK Net Exporter belongs. | kube-system
    config.enableEventServer | Specifies whether to enable event tracing. Valid values: true (enable) and false (disable). | false
    config.enableMetricServer | Specifies whether to enable metric collection. Valid values: true (enable) and false (disable). | true
    config.enableLegacyVersion | Specifies whether to enable the compatibility mode. Valid values: true (enable) and false (disable). If you enable this mode, ACK Net Exporter supports more operating systems, but you cannot use the new features provided by ACK Net Exporter. | true
    config.remoteLokiAddress | The Grafana Loki service address to which events are pushed. | Empty
    config.metricLabelVerbose | Specifies whether to enable verbose metric labels. Valid values: true (enable) and false (disable). If you enable this feature, pod IP addresses and pod labels are saved as label information of the metrics. | false
    config.metricServerPort | The port that the metric service uses to provide HTTP services. | 9102
    config.eventServerPort | The port that the event service uses to provide gRPC streaming services. | 19102
    config.metricProbes | The metric probes that you want to enable. For more information, see ACK Net Exporter metrics. If this parameter is left empty, only the required metric probes are enabled. | Empty
    config.eventProbes | The event probes that you want to enable. For more information, see ACK Net Exporter events. If this parameter is left empty, only the required event probes are enabled. | Empty
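
If you prefer to review the configuration as a file, the parameters in the preceding table map to a Helm-style values structure. The following is a minimal, hypothetical sketch based only on the parameter names listed above; the actual values file layout used by the ack-net-exporter chart may differ:
# Hypothetical values sketch; field names are taken from the parameter table above.
name: ack-net-exporter-default
namespace: kube-system
config:
  enableEventServer: true       # default is false; enable it if you plan to collect events
  enableMetricServer: true      # enable metric collection
  enableLegacyVersion: false    # disable compatibility mode to use the newer features
  remoteLokiAddress: ""         # optional: Grafana Loki address for event push
  metricLabelVerbose: false
  metricServerPort: 9102
  eventServerPort: 19102
  metricProbes: []              # empty: only the required metric probes are enabled
  eventProbes: []               # empty: only the required event probes are enabled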

Configure ACK Net Exporter

  • You can run the following command to modify the ConfigMap of ACK Net Exporter:
    kubectl edit cm inspector-config -n kube-system
  • You can also configure ACK Net Exporter in the ACK console.
    1. Log on to the ACK console and click Clusters in the left-side navigation pane.
    2. On the Clusters page, click the name of a cluster and choose Configurations > ConfigMaps in the left-side navigation pane.
    3. On the ConfigMap page, set Namespace to kube-system, search for inspector-config, and then click Edit in the Actions column of the inspector-config row.
    4. In the Edit panel, configure the parameters and click OK.

      The following table describes the parameters supported by ACK Net Exporter.

      Parameter | Description | Default value
      debugmode | Specifies whether to enable the debugging mode. Valid values: true (enable) and false (disable). If you enable this mode, debug-level logs, debugging interfaces, Go pprof, and gops are supported. | false
      event_config.loki_enable | Specifies whether to push events to Grafana Loki. For more information, see Use Grafana Loki to collect and visualize events. Valid values: true (enable) and false (disable). | false
      event_config.loki_address | The Grafana Loki service address. By default, the system automatically discovers a service named grafana-loki in the specified namespace. | Empty
      event_config.probes | The event probes that you want to enable. For more information, see ACK Net Exporter events. If this parameter is left empty, only the required event probes are enabled. | Empty
      event_config.port | The port that the event service uses to provide gRPC streaming services. | 19102
      metric_config.verbose | Specifies whether to enable verbose metric labels. Valid values: true (enable) and false (disable). If you enable this feature, pod IP addresses and pod labels are saved as label information of the metrics. | false
      metric_config.port | The port that the metric service uses to provide HTTP services. | 9102
      metric_config.probes | The metric probes that you want to enable. For more information, see ACK Net Exporter metrics. If this parameter is left empty, only the required metric probes are enabled. | Empty
      metric_config.interval | The interval at which metrics are collected. Because metric collection affects performance, ACK Net Exporter caches the metrics that are collected in each interval in memory. | 5
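
The parameters described above are stored in the inspector-config ConfigMap. The following is a hypothetical sketch that only illustrates how the parameters fit together; the actual data key and nesting used by your ACK Net Exporter version may differ, so run kubectl edit cm inspector-config -n kube-system to check the real layout before you change anything.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inspector-config
  namespace: kube-system
data:
  # Hypothetical key and structure; verify against the ConfigMap in your cluster.
  config: |
    debugmode: false
    event_config:
      loki_enable: false
      loki_address: ""
      probes: []
      port: 19102
    metric_config:
      verbose: false
      port: 9102
      probes: []
      interval: 5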

In ACK Net Exporter versions earlier than 0.2.3, the modified configuration takes effect only after you recreate all ACK Net Exporter containers. ACK Net Exporter 0.2.3 and later support hot configuration updates, so you no longer need to perform this operation.
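
If you run a version earlier than 0.2.3, you can recreate the containers with a command similar to the following. This is a sketch that assumes the component is deployed as a DaemonSet named net-exporter in the kube-system namespace; adjust the name and namespace to match your installation.
# Restart the ACK Net Exporter DaemonSet so that the new configuration is loaded.
kubectl -n kube-system rollout restart daemonset net-exporter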

Usage notes for ACK Net Exporter

Use ACK Net Exporter in operating systems other than Alinux

Some key features of ACK Net Exporter rely on eBPF programs to collect information. To adapt to different operating system kernels, ACK Net Exporter uses CO-RE (Compile Once, Run Everywhere) to distribute eBPF programs. When ACK Net Exporter starts, it loads the BTF file that matches the kernel of the operating system. The BTF file stores the metadata of the kernel debug information. If no matching BTF file can be loaded, these key features become unavailable. Most recent operating system releases provide built-in BTF files. For more information about which operating systems provide BTF, see BPF Type Format.

To run ACK Net Exporter on nodes that run operating systems other than Alinux2 and Alinux3, make sure that the following requirements are met:
  • The kernel version of the operating system must be later than 4.10.
  • One of the following files is installed:
    • The kernel-debuginfo file, which stores the kernel debug information.
    • The vmlinux file, which is the uncompressed kernel image that is generated when the operating system kernel is compiled and contains the debug information.
    • The BTF file provided by the operating system.
  • ACK Net Exporter is updated to 0.2.9 or later, and config.enableLegacyVersion is set to false when you install ACK Net Exporter.
If the preceding requirements are met, you can perform the following steps to use the advanced features provided by ACK Net Exporter:
  1. Store the BTF file in the /boot/ path of the node.
    • If you installed a complete vmlinux file, you can store the vmlinux file in the /boot/ path of the operating system.
    • If you installed the kernel-debuginfo package, find the vmlinux file in the /usr/lib/debug/lib/modules/ path of the node and copy it to the /boot/ path.
  2. Run the following command to check whether valid BTF information is loaded and ACK Net Exporter can run as expected:
    # You can run commands such as docker, podman, and ctr to perform the test.
    nerdctl run -it -v /boot:/boot registry.cn-hangzhou.aliyuncs.com/acs/btfhack:latest  -- btfhack discover

    If the path of a BTF file is returned, the configuration is complete. You can then recreate the containers of ACK Net Exporter, wait a moment, and view the collected metrics and events.
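
You can also use the following commands to quickly check whether a node already ships built-in BTF information before you copy any files. This is a sketch; paths may vary by distribution.
# Check the kernel version. The features described above require a kernel later than 4.10.
uname -r

# If this file exists, the kernel ships built-in BTF information and no extra
# kernel-debuginfo or vmlinux file needs to be copied to /boot/.
ls /sys/kernel/btf/vmlinux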

Metrics and metric format supported by ACK Net Exporter

ACK Net Exporter supports Prometheus metrics. After you install ACK Net Exporter, you can access the service port of a pod that is created for ACK Net Exporter to query metrics.
  1. If you install ACK Net Exporter from the Marketplace page of the ACK console, you can run the following command to query all ACK Net Exporter pods:
    kubectl get pod -l app=net-exporter -n kube-system -o wide
    Expected output:
    NAME      READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE       READINESS GATES  
    anp-***   1/1     Running   0          32s   10.1.XX.XX   cn-***     <none>               <none>
  2. Run the following command to query metrics. Replace 10.1.XX.XX with the IP address of the ACK Net Exporter pod that you obtained in the preceding step.
    curl http://10.1.XX.XX:9102/metrics
ACK Net Exporter returns metric data in the following format:
inspector_pod_udprcvbuferrors{namespace="elastic-system",netns="ns402653****",node="iZbp179u0bgzhofjupc****",pod="elastic-operator-0"} 0 1654487977826
The preceding format includes the following fields:
  • inspector_pod_udprcvbuferrors: the metric name. The inspector prefix indicates that the metric is provided by ACK Net Exporter, and pod indicates that this is a pod-level metric (node-level metrics are also collected). udprcvbuferrors indicates the number of UDP receive buffer errors that occur because the receive queue of a socket in the pod is full.
  • namespace, pod, node, and netns: the labels of the metric. You can use PromQL statements to filter by these labels. The pod label indicates the pod that the metric describes, the namespace label indicates the namespace to which the pod belongs, and the node label indicates the name of the node that hosts the pod. By default, the node name is the hostname specified in the /etc/hostname file. The netns label indicates the ID of the network namespace of a container in the pod.
  • 0 and 1654487977826: the value of the metric and the point in time when the metric value was collected. The point in time is a UNIX timestamp in milliseconds.
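
For example, the following PromQL query filters this metric by the labels described above. The namespace and pod values are taken from the sample line; replace them with your own.
inspector_pod_udprcvbuferrors{namespace="elastic-system", pod="elastic-operator-0"}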

Events and event format supported by ACK Net Exporter

ACK Net Exporter can collect events of network exceptions that occur on nodes. This section describes the network exceptions that you may encounter. These exceptions occasionally occur and are difficult to reproduce. Currently, no efficient methods can be used to troubleshoot these exceptions.

  • Connection failures and request timeouts caused by data packet loss.
  • Performance issues caused by time-consuming data processing.
  • Business interruptions that occur due to anomalies in stateful connection mechanisms, such as TCP or connection tracking.

ACK Net Exporter provides eBPF-based context observability for operating system kernels to help you troubleshoot the preceding issues. ACK Net Exporter can capture the status of the operating system in real time when an exception occurs and then generates an event log. For more information about the events and event probes supported by ACK Net Exporter, see ACK Net Exporter events.

You can check the relevant information in the event log. Take the tcp_reset probe as an example. When a pod receives a legitimate packet that is destined for an unknown port, ACK Net Exporter captures the following event:
type=TCPRESET_NOSOCK pod=storage-monitor-5775dfdc77-fj767 namespace=kube-system protocol=TCP saddr=100.103.42.233 sport=443 daddr=10.1.17.188 dport=33488
  • type=TCPRESET_NOSOCK: indicates the TCPRESET_NOSOCK event. This type of event is captured by the tcp_reset probe. The event indicates that a reset packet is returned for a packet that is destined for an unknown port because no matching socket can be found. This event usually occurs when NAT fails. For example, this event occurs when an IPVS timeout occurs.
  • pod/namespace: the metadata of the pod that is associated with the event. ACK Net Exporter matches the event to a pod based on the IP address and the network interface index of the network namespace in which the packet is processed.
  • saddr/sport/daddr/dport: the packet information that ACK Net Exporter obtains from the kernel. The packet information varies based on the event. For example, an event captured by the net_softirq probe does not contain IP addresses. Instead, it contains the number of the CPU on which the software interrupt occurs and the latency.

For events that require valid kernel stack information, ACK Net Exporter captures the kernel stack trace at the time the event occurs, as in the following example:

type=PACKETLOSS pod=hostNetwork namespace=hostNetwork protocol=TCP saddr=10.1.17.172 sport=6443 daddr=10.1.17.176 dport=43018  stacktrace:skb_release_data+0xA3 __kfree_skb+0xE tcp_recvmsg+0x61D inet_recvmsg+0x58 sock_read_iter+0x92 new_sync_read+0xE8 vfs_read+0x89 ksys_read+0x5A

ACK Net Exporter allows you to view events by using multiple methods. For more information, see the Collect monitoring data from ACK Net Exporter section of this topic.

Collect monitoring data from ACK Net Exporter

Scenario 1: Export monitoring data to Prometheus or Grafana and visualize the data

ACK Net Exporter can export monitoring data to a Prometheus server. If you use a self-managed Prometheus server, you can add the following scrape_config to enable the Prometheus server to collect monitoring data from ACK Net Exporter:
# In the following example, only one endpoint is specified for data collection. 
scrape_configs:
# The job=<job_name> label is added to each time series that is collected based on this configuration. In this example, the job name is set to net-exporter_sample. 
- job_name: "net-exporter_sample"
  static_configs:
  - targets: ["{kubernetes pod ip}:9102"]
                
If your Prometheus server runs in an ACK cluster, you can use the service discovery feature of Prometheus to automatically discover all ACK Net Exporter pods that are running normally. To do this, add the following configuration to the Prometheus server:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: kube-system
data:
  prometheus.yml: |-
      # Add the following configuration to the Prometheus server: 
      - job_name: 'net-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'net-exporter'
          action: keep

      - job_name: 'kubernetes-pods'

        kubernetes_sd_configs:
        - role: pod

        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name

After you add the configuration, the Status > Targets page of the Prometheus server shows the ACK Net Exporter pods that are running normally. You can also enter inspector into the search box on the Graph page of the Prometheus server to view the ACK Net Exporter metrics.


You can configure Grafana to visualize the monitoring data that is collected to Prometheus:

  1. In the left-side navigation pane of the Grafana page, choose + (Create) > Dashboard.
  2. On the New dashboard page, click Add an empty panel.
  3. In the lower part of the Edit Panel page, select Prometheus as the data source and enter the address of the Prometheus server.
  4. Click Metric browser and enter inspector. Grafana then lists all available ACK Net Exporter metrics. Click Save in the upper-right corner. In the dialog box that appears, click Save. Grafana then displays the visualized data.
  5. You can adjust how the metrics are displayed on the Grafana dashboard. For example, the inspector_pod_tcppassiveopens metric indicates the total number of sockets that are created for TCP connections that are passively established (that is, in response to handshake requests from clients) within a network namespace after the system starts or the container is created. To display the increase of this metric and label the series, use the following configurations:
    // Use the rate() function provided by PromQL to calculate the per-second increase of the metric. 
    rate(inspector_pod_tcppassiveopens[1m])

    // Use the labels provided by net-exporter to configure a legend for the displayed series. 
    {{namespace}}/{{pod}}/{{node}}

Scenario 2: Export monitoring data to ARMS and visualize the data

To export monitoring data from ACK Net Exporter to Application Real-Time Monitoring Service (ARMS) and visualize the data, perform the following steps.

  1. Enable Prometheus Service.
  2. Configure custom ACK Net Exporter metrics.
    1. Log on to the ARMS console. In the left-side navigation pane, choose Prometheus Service > Prometheus Instances.
    2. In the upper-left corner of the Prometheus Monitoring page, select the region in which your ACK cluster is deployed and click the Prometheus instance that you want to manage. Then, you are redirected to the instance details page. In most cases, the name of the Prometheus instance is the same as the name of your cluster.
    3. In the left-side navigation pane, click Service Discovery and then click the Targets tab. In the lower part of the tab, click kubernetes-pods. If kubernetes-pods is displayed, the custom ACK Net Exporter metrics are being collected.
      If kubernetes-pods is not displayed, click the Configurations tab and turn on Default Service Discovery.
    4. In the left-side navigation pane, click Dashboards. Click the corresponding dashboard to log on to Grafana. Click Add panel, select Graph, and then enable the data sources related to your cluster in the Data source section.
    5. Click Metric browser and enter inspector. Grafana then automatically lists all available ACK Net Exporter metrics. In the upper-right corner, click Save. In the dialog box that appears, click Save. Grafana then displays the visualized data.

Scenario 3: Export monitoring data to Grafana Loki and visualize the data

You can push anomaly events collected by ACK Net Exporter to your pre-configured Grafana Loki service in real time. This helps you manage these events in a centralized manner. To export monitoring data from ACK Net Exporter to Grafana Loki, perform the following steps.

  1. Set up Grafana Loki.
    Note Deploy Grafana Loki in a network that is accessible to the ACK Net Exporter pods. ACK Net Exporter can automatically push event logs to Grafana Loki.
  2. On the configuration page of ACK Net Exporter, set config.enableEventServer to true and set the Grafana Loki service address (config.remoteLokiAddress) to the IP address or domain name of the Grafana Loki service.
  3. Run the following command to access the service address and check whether Grafana Loki is ready:
    curl http://[Address of Grafana Loki]:3100/ready
  4. When Grafana Loki is ready, add Grafana Loki as a Grafana data source.
    Open Grafana. In the left-side navigation pane, choose Data source > Loki, enter the address of the Grafana Loki service, and then click Save & test.
  5. In the left-side navigation pane, click Explore. At the top of the page, set the data source to Loki and view the events that are pushed to Grafana Loki.
    You can view the events of a node by selecting the node from the Label filters drop-down list, or specify keywords in the Line contains field to search for specific events.

    You can click Add to dashboard at the top of the page to add a configured event panel to the dashboard.

    The content of the events provided by ACK Net Exporter varies based on the event type. You can check the event details to view the relevant content.

    For more information about the LogQL query language supported by Grafana Loki, see LogQL.
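
    For example, a LogQL query similar to the following can narrow the stream down to TCP reset related events. The job label name used in the stream selector is an assumption; use the labels that actually appear in the Label filters list of your Loki data source.
    # Select the net-exporter stream (label name assumed) and keep only lines that mention TCP resets.
    {job="net-exporter"} |= "TCPRESET"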

Scenario 4: Use the ACK Net Exporter CLI to collect events

The ACK Net Exporter CLI (inspector-cli) is a scenario-specific troubleshooting and analysis tool developed by the ACK team based on ACK Net Exporter. You can use inspector-cli to collect kernel exception events in real time. inspector-cli can help quickly identify the cause of common exceptions in cloud-native scenarios.

You can run inspector-cli by launching a container on an on-premises machine.
# Launch a temporary container to run inspector-cli. You can replace the image with a later version to update inspector-cli. 
docker run -it --name=inspector-cli --network=host registry.cn-hangzhou.aliyuncs.com/acs/inspector:v0.0.1-12-gff0558c-aliyun

# /bin/inspector is the working path of inspector-cli. You can directly run inspector in the container. 
which inspector
The following example shows how to use inspector-cli to collect the events of a node captured by ACK Net Exporter.
# Set '-e' to specify the address of the event service of ACK Net Exporter. 
inspector watch -e 10.1.16.255

# Expected output: 
 INFO  TCP_RCV_RST_ESTAB Namespace=kube-system Pod=kube-proxy-worker-tbv5s Node=iZbp1jesgumdx66l8ym8j8Z Netns=4026531993 10.1.16.255:43186 -> 100.100.27.15:3128
...

You can also log on to the inspector container of ACK Net Exporter to troubleshoot issues.

# When you run the following command, set the -n parameter to the namespace of net-exporter and specify the net-exporter pod that you want to access. 
kubectl exec -it -n kube-system -c inspector net-exporter-2rvfh -- sh

# Run the following command to view the distribution of network entities on the current node. 
inspector list entity

# Run the following command to listen for network exception events and other relevant information in the local network. 
inspector watch -d -v
#{"time":"2023-02-03T09:01:03.402118044Z","level":"INFO","source":"/go/src/net-exporter/cmd/watch.go:63","msg":"TCPRESET_PROCESS","meta":"hostNetwork/hostNetwork node=izbp1dnsn1bwv9oyu2gaupz netns=ns0 ","event":"protocol=TCP saddr=10.1.17.113 sport=6443 daddr=10.1.17.113 dport=44226  state:TCP_OTHER "}

# You can also specify multiple ACK Net Exporter nodes to view the time when the event occurs on these nodes. 
inspector watch -s 10.1.17.113 -s 10.1.18.14 -d -v
            

How to use ACK Net Exporter to troubleshoot occasional container network issues

This section describes how to troubleshoot occasional network issues in cloud-native scenarios. With the help of ACK Net Exporter, you can quickly obtain information that is required for fixing these issues.

DNS timeout issues

DNS timeout issues in cloud-native environments can cause service access failures. The following reasons can cause DNS timeout issues:
  • The DNS server fails to reply before the DNS query times out.
  • The DNS client cannot deliver the DNS query promptly or fails to deliver the DNS query.
  • The DNS server responds to the DNS query. However, the response is lost due to a DNS client issue, such as insufficient memory.
You can use the following metrics to help you troubleshoot DNS timeout issues.
Metric | Description
inspector_pod_udpsndbuferrors | The number of UDP packet send errors.
inspector_pod_udpincsumerrors | The number of UDP packet checksum errors.
inspector_pod_udpnoports | The number of times that the __udp4_lib_rcv function fails to find the corresponding socket when it is invoked to receive UDP packets.
inspector_pod_udpinerrors | The number of UDP packet receive errors.
inspector_pod_udpoutdatagrams | The number of UDP packets that are successfully sent.
inspector_pod_udprcvbuferrors | The number of times that the UDP protocol layer fails to copy packet data into the socket receive queue because the queue is full.

A large number of services in cloud-native environments rely on the DNS resolution service provided by CoreDNS. If a DNS issue related to CoreDNS occurs, you also need to check the metrics of the CoreDNS pods.
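
For example, the following PromQL query tracks receive queue overflows for the CoreDNS pods. The namespace and pod name pattern are assumptions based on a typical CoreDNS deployment; adjust them to your cluster.
// Per-second rate of UDP receive buffer errors on CoreDNS pods over the last minute.
rate(inspector_pod_udprcvbuferrors{namespace="kube-system", pod=~"coredns.*"}[1m])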

Nginx Ingress 499, 502, 503, and 504 issues

In cloud-native environments, exceptions often occur on an Ingress gateway or another service that functions as a proxy or broker. The following 499, 502, 503, and 504 errors commonly occur on an NGINX Ingress or other proxy services that are built on NGINX:
  • 499: This error is returned if the client closes the TCP connection before the NGINX server returns a response. Common reasons:
    • The client does not send the request immediately after the TCP connection is created. As a result, the client times out before the NGINX server replies. This issue commonly occurs with asynchronous requests sent by Android clients.
    • The NGINX server takes a long time to handle the request on the TCP connection. In this scenario, you need to check all possible causes of the slow processing.
    • The NGINX server is waiting for the response from the upstream backend.
  • 502: This error is usually caused by connection issues between the NGINX server and upstream backend, such as connection failures or unexpected connection disruptions. Common reasons:
    • A DNS resolution failure occurs to the backend. This issue commonly occurs when a Kubernetes Service is specified as the backend.
    • The NGINX server fails to connect to the upstream backend.
    • Business interaction is interrupted because the size of the upstream request or response is too large or no memory can be allocated.
  • 503: This error is returned to the client when all upstream backends are unavailable. Common reasons in cloud-native environments:
    • No backend is available. This issue occurs only occasionally.
    • The Ingress triggers rate limiting due to the heavy traffic.
  • 504: This error is returned when packets exchanged between the NGINX server and upstream backend time out. One of the common reasons is that the response from the upstream backend fails to reach the NGINX server before the timeout period ends.
When the preceding errors are returned, you need to collect the following information to narrow down the scope for further troubleshooting:
  • The access_log information provided by NGINX, including request_time, upstream_connect_time, and upstream_response_time.
  • The error_log information provided by NGINX. You need to check whether error messages are returned when the issue occurs.
  • If you have configured liveness probing or readiness probing, you need to check the health check information.
You need to pay attention to the following metrics if the possible cause is connection failure.
Metric | Description
inspector_pod_tcpextlistenoverflows | The number of times that the SYN queue is full when the socket in the LISTEN state accepts connections.
inspector_pod_tcpextlistendrops | The number of times that the socket in the LISTEN state fails to create a socket in the SYN_RECV state.
inspector_pod_netdevtxdropped | The number of packet drops due to NIC send errors.
inspector_pod_netdevrxdropped | The number of packet drops due to NIC receive errors.
inspector_pod_tcpactiveopens | The number of times that TCP SYN succeeds within a pod, excluding SYN retransmissions. The value of this metric also increases when connection failures occur.
inspector_pod_tcppassiveopens | The number of times that TCP handshake succeeds and a socket is allocated within a pod. In most cases, this metric indicates the number of new connections.
inspector_pod_tcpretranssegs | The total number of packets that are retransmitted within a pod. TCP segments generated by TCP segmentation offload (TSO) are already counted.
inspector_pod_tcpestabresets | The number of TCP connections that are exceptionally closed within a pod. The value is calculated only based on results.
inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod.
inspector_pod_conntrackinvalid | The number of times that connection tracking fails to create connections but does not drop the packets.
inspector_pod_conntrackdrop | The number of times that connection tracking drops packets due to connection failures.
You need to pay attention to the following metrics if the NGINX server responds slowly. For example, the request processing time (request_time) is short but the request times out.
Metric | Description
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state.
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state.
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes.
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes.
inspector_pod_tcpexttcpretransfail | The number of errors other than EBUSY that are returned after a retransmission. The errors indicate that the retransmission fails.

You can check the changes of the preceding metrics at the point in time when the issue occurs to narrow down the scope. If you still cannot locate the cause, Submit a ticket and include the preceding information in your ticket to request technical support.
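
For example, the following PromQL query shows how many listen-queue drops occurred on the ingress controller pods in each 5-minute window, which you can compare against the time of the 5xx errors. The namespace and pod name pattern are assumptions; adjust them to match your ingress deployment.
// 5-minute increase of listen-queue drops on the ingress controller pods.
increase(inspector_pod_tcpextlistendrops{namespace="kube-system", pod=~"nginx-ingress-controller.*"}[5m])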

TCP reset issues

A host returns a TCP reset packet when it receives an unexpected TCP packet. TCP reset has the following impacts on your applications:
  • connection reset by peer: This error usually occurs in applications that rely on the C library, such as NGINX.
  • Broken pipe: This error usually occurs in Java or Python applications that encapsulate TCP connections.
TCP reset is common in cloud-native environments. The cause of TCP reset varies and is hard to identify. The following section lists some common reasons for TCP reset:
  • The server cannot provide services normally. For example, the memory allocated to TCP is insufficient. In this scenario, TCP proactively sends reset packets.
  • Requests are forwarded to an unexpected backend due to a stateful mechanism error, such as an endpoint or Conntrack error, when Services or load balancers are used.
  • Connections are released due to security reasons.
  • Protection Against Wrapped Sequence numbers (PAWS) or sequence number wrapping issues occur in NAT or high-concurrency scenarios.
  • Connections remain idle for a long period of time when TCP keepalive is used.
You can use the following approach to quickly distinguish among the preceding causes.
  1. Analyze the network topology between the client and server when TCP reset packets are generated.
  2. Pay attention to the following metrics.
    Metric | Description
    inspector_pod_tcpexttcpabortontimeout | The number of times that TCP reset packets are sent to close connections because the upper limit of keepalive, window probe, and retransmission calls is reached.
    inspector_pod_tcpexttcpabortonlinger | The number of times that TCP reset packets are sent to close FIN_WAIT2 connections when the TCP Linger_2 option is enabled.
    inspector_pod_tcpexttcpabortonclose | The number of times that TCP reset packets are sent to close TCP connections when data reception is still in progress due to a reason other than the state machine.
    inspector_pod_tcpexttcpabortonmemory | The number of times that TCP reset packets are sent to close connections because tcp_check_oom triggers an out of memory error during memory allocation to tw_sock or tcp_sock.
    inspector_pod_tcpexttcpabortondata* | The number of times that TCP reset packets are sent to close connections because the Linger or Linger2 option is enabled.
    inspector_pod_tcpexttcpackskippedsynrecv | The number of times that the socket in the SYN_RECV state does not respond to ACK.
    inspector_pod_tcpexttcpackskippedpaws | The number of times that ACK packets are limited by the Out-of-Window (OOW) rate limiting mechanism because PAWS is triggered.
    inspector_pod_tcpestabresets | The number of TCP connections that are exceptionally closed within a pod. The value is calculated only based on results.
    inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod.
  3. If TCP reset occurs in a specific pattern, you can enable the events feature of ACK Net Exporter to collect the corresponding events, as shown in the example after the following table.
    Event | Event description
    TCP_SEND_RST | This event is generated when a TCP reset packet is sent to close a connection, unless the more specific TCP_SEND_RST_NOSock or TCP_SEND_RST_ACTIVE event applies.
    TCP_SEND_RST_NOSock | This event is generated when a TCP reset packet is sent because no local socket is found.
    TCP_SEND_RST_ACTIVE | This event is generated when a TCP reset packet is sent due to a resource issue or because the connection is proactively closed in user mode.
    TCP_RCV_RST_SYN | This event is generated when a TCP reset packet is received during the three-way handshake phase.
    TCP_RCV_RST_ESTAB | This event is generated when a TCP reset packet is received after the connection is established.
    TCP_RCV_RST_TW | This event is generated when a TCP reset packet is received during the four-way handshake (connection teardown) phase.
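
    If you want to watch for these events interactively, you can stream them from the event service of a node and filter for the reset-related types, for example as follows. The event service address is a placeholder; the output format is the same as in the inspector-cli examples shown earlier.
    # Stream events from the event service of a node and keep only TCP reset related lines.
    inspector watch -e 10.1.16.255 | grep -E "TCP_SEND_RST|TCP_RCV_RST"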

Occasional network latency and jitter issues

Network latency and network jitter issues in cloud-native environments are hard to troubleshoot. The cause of these issues varies. In addition, the network latency issue may further cause the preceding three types of issues. In container networks, network latency issues in nodes usually occur due to the following reasons:
  • A real-time process managed by the RT scheduler runs for a long period of time. As a result, user processes or kernel network threads queue up or run slowly.
  • An external call made by the user process occasionally requires a long period of time to complete. For example, requests are processed slowly because the disk responds slowly or the round-trip time of an RDS instance increases.
  • Some CPUs or NUMA nodes are overloaded due to improper node configuration. As a result, occasional system stuttering occurs.
  • The stateful mechanism of the kernel causes the increased latency. For example, due to the confirm operation performed by connection tracking, a large number of orphan sockets adversely affect socket search.
In most cases, these network issues are caused by operating system issues. You can pay attention to the following metrics to narrow down the scope.
Metric | Description
inspector_node_netsoftirqshed | The duration from the time when a software interrupt is initiated to the time when the ksoftirqd process starts to perform the software interrupt.
inspector_node_netsoftirq | The duration from the time when the ksoftirqd process starts to perform the software interrupt to the time when the ksoftirqd process changes to the offcpu state.
inspector_pod_ioioreadsyscall | The number of read operations performed by the process, such as the number of reads or preads.
inspector_pod_ioiowritesyscall | The number of write operations performed by the process, such as the number of writes or pwrites.
inspector_pod_ioioreadbytes | The number of bytes that the process reads from a file system (a block device in most cases).
inspector_pod_ioiowritebyres | The number of bytes that the process writes into a file system.
inspector_node_virtsendcmdlat | The duration of virtual calls for NIC operations.
inspector_pod_tcpexttcptimeouts | The number of times that SYN packets are retransmitted because the SYN packets are not answered while the status of TCP_CA is not recovery, loss, or disorder.
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state.
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state.
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes.
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes.
inspector_pod_softnetprocessed | The number of backlog packets that all CPUs receive from the NIC within a pod.
inspector_pod_softnettimesqueeze | The number of times that all CPUs fail to receive the complete packet or the receive operation times out within a pod.
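
For example, the following PromQL query counts how often the scheduling delay of network software interrupts exceeded 100 milliseconds during the last 5 minutes, which is a quick way to correlate latency spikes with softirq scheduling pressure. It assumes that the net_softirq probe is enabled.
// 5-minute increase of softirq scheduling delays that exceed 100 ms.
increase(inspector_pod_net_softirq_schedslow100ms[5m])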

Case study

The following cases show how to use ACK Net Exporter to help troubleshoot container network issues.

Case 1: Occasional DNS resolution timeout

Symptom

Customer A submitted a ticket to request technical support to handle DNS resolution timeouts that occasionally occur. The application of Customer A is written in PHP. CoreDNS is configured to perform DNS resolution.

Troubleshooting

  1. Obtain DNS metrics from the monitoring system of Customer A.
  2. The following situations exist based on the obtained metrics:
    • Each time a DNS resolution timeout occurs, the value of inspector_pod_udpnoports increases by 1. The value of this metric is small.
    • The number of __udp4_lib_rcv packet drops indicated by the inspector_pod_packetloss metric increases by 1. However, the change in the number of packet drops is minor.
  3. Customer A confirmed that the IP address of the DNS server is a public IP address provided by an Internet service provider (ISP). Based on the obtained metrics, the DNS timeouts occurred because responses took too long to reach the client: the responses arrived only after the DNS queries had already timed out in user mode.

Case 2: Occasional Java application connection failure

Symptom

Customer B submitted a ticket to request technical support to resolve the following issue: Tomcat occasionally becomes unavailable and the issue lasts 5 to 10 seconds each time.

Troubleshooting

  1. The log analysis result shows that the Java runtime was performing a garbage collection operation when the issue occurred.
  2. Customer B deployed ACK Net Exporter and analyzed the monitoring data. Customer B found that the value of the inspector_pod_tcpextlistendrops metric increased significantly at the time when the issue occurred.
  3. The analysis result shows that request processing slowed down while the Java runtime performed the garbage collection operation. However, new incoming requests were not throttled. As a result, a large number of connections piled up and the backlog of the LISTEN socket overflowed, which caused the value of the inspector_pod_tcpextlistendrops metric to increase.
  4. The TCP connection pileup lasted only a short period of time and was not caused by insufficient request processing capability. In this scenario, Customer B resolved the issue by modifying the Tomcat parameters as recommended.

Case 3: Occasional network jitter

Symptom

Customer C submitted a ticket to request technical support to resolve the following issue: The round-trip time between the Redis instance and application significantly increases. As a result, timeout errors occur. The issue cannot be reproduced.

Troubleshooting

  1. After Customer C analyzed the log, Customer C identified that the response time of Redis requests occasionally exceeds 300 milliseconds.
  2. Customer C also identified that the value of the inspector_node_virtsendcmdlat metric increased at the time when the issue occurred. The latency levels reported in Prometheus Service were 15 and 18. After conversion, Customer C identified two virtual calls with long response times: the call at level 15 took more than 36 milliseconds, and the call at level 18 took more than 200 milliseconds.
  3. The kernel occupies the CPU when processing virtual calls. In this case, CPU resources cannot be preempted by other operations. As a result, the execution of virtual calls is slowed down when pods are added or deleted in batches, which further causes the response time to increase.

Case 4: NGINX Ingress occasionally fails to pass health checks

Symptom

Customer D submitted a ticket to request technical support to resolve the following issue: The NGINX Ingress occasionally fails to pass health checks. As a result, request failures occur.

Troubleshooting

  1. After Customer D deployed ACK Net Exporter, Customer D identified that the following metrics are abnormal:
    1. The values of the inspector_pod_tcpsummarytcprxqueue and inspector_pod_tcpsummarytcptxqueue metrics increased.
    2. The value of the inspector_pod_tcpexttcptimeouts metric increased.
    3. The value of the inspector_pod_tcpsummarytcptimewaitconn metric decreased and the value of the inspector_pod_tcpsummarytcpestablishedconn metric increased.
  2. The analysis result shows that the kernel was running normally when the issue occurred and connections were established normally. However, exceptions occurred when the user process read packets from the receive socket and sent packets. In this scenario, the health check failure may be caused by a scheduling or rate limiting issue.
  3. Customer D checked the monitoring data of the cgroups as recommended and identified CPU throttling at the point in time when the health check failed. This indicates that the user process occasionally failed to be scheduled onto a CPU due to cgroup CPU throttling.
  4. To resolve this issue, refer to CPU Burst and configure CPU Burst for the NGINX Ingress.

References

ACK Net Exporter metrics

The metrics supported by ACK Net Exporter are constantly updated. For more information, see the instructions on the Marketplace page of the ACK console. All metrics and events provide pod-specific information, except net_softirq and virtcmdlat, which are not related to pods.

Metric | Description | Probe name
inspector_pod_netdevrxbytes | The number of bytes received by the NIC. | netdev
inspector_pod_netdevtxbytes | The number of bytes sent by the NIC. | netdev
inspector_pod_netdevtxerrors | The number of NIC send errors. | netdev
inspector_pod_netdevrxerrors | The number of NIC receive errors. | netdev
inspector_pod_netdevtxdropped | The number of packet drops due to NIC send errors. | netdev
inspector_pod_netdevrxdropped | The number of packet drops due to NIC receive errors. | netdev
inspector_pod_netdevtxpackets | The number of packets that are successfully sent by the NIC. | netdev
inspector_pod_netdevrxpackets | The number of packets that are successfully received by the NIC. | netdev
inspector_pod_softnetprocessed | The number of backlog packets that all CPUs receive from the NIC within a pod. | softnet
inspector_pod_softnetdropped | The number of backlog packets that are dropped by all CPUs after the CPUs receive the packets from the NIC within a pod. | softnet
inspector_pod_softnettimesqueeze | The number of times that all CPUs fail to receive the complete packet or the receive operation times out within a pod. | softnet
inspector_pod_tcpactiveopens | The number of times that TCP SYN succeeds within a pod, excluding SYN retransmissions. The value of this metric also increases when connection failures occur. | tcp
inspector_pod_tcppassiveopens | The number of times that TCP handshake succeeds and a socket is allocated within a pod. In most cases, this metric indicates the number of new connections. | tcp
inspector_pod_tcpretranssegs | The total number of packets that are retransmitted within a pod. TCP segments generated by TSO are already counted. | tcp
inspector_pod_tcpestabresets | The number of TCP connections that are exceptionally closed within a pod. The value is calculated only based on results. | tcp
inspector_pod_tcpoutrsts | The number of TCP reset packets sent within a pod. | tcp
inspector_pod_tcpcurrestab | The number of active TCP connections within a pod. | tcp
inspector_pod_tcpexttcpabortontimeout | The number of times that TCP reset packets are sent to close connections because the upper limit of keepalive, window probe, and retransmission calls is reached. | tcpext
inspector_pod_tcpexttcpabortonlinger | The number of times that TCP reset packets are sent to close FIN_WAIT2 connections when the TCP Linger_2 option is enabled. | tcpext
inspector_pod_tcpexttcpabortonclose | The number of times that TCP reset packets are sent to close TCP connections when data reception is still in progress due to a reason other than the state machine. | tcpext
inspector_pod_tcpexttcpabortonmemory | The number of times that TCP reset packets are sent to close connections because tcp_check_oom triggers an out of memory error during memory allocation to tw_sock or tcp_sock. | tcpext
inspector_pod_tcpexttcpabortondata* | The number of times that TCP reset packets are sent to close connections because the Linger or Linger2 option is enabled. | tcpext
inspector_pod_tcpextlistenoverflows | The number of times that the SYN queue is full when the socket in the LISTEN state accepts connections. | tcpext
inspector_pod_tcpextlistendrops | The number of times that the socket in the LISTEN state fails to create a socket in the SYN_RECV state. | tcpext
inspector_pod_tcpexttcpackskippedsynrecv | The number of times that the socket in the SYN_RECV state does not respond to ACK. | tcpext
inspector_pod_tcpexttcpackskippedpaws | The number of times that ACK packets are limited by the Out-of-Window (OOW) rate limiting mechanism because PAWS is triggered. | tcpext
inspector_pod_tcpexttcpackskippedseq | The number of times that ACK packets are limited by the OOW rate limiting mechanism because sequence numbers are out of window. | tcpext
inspector_pod_tcpexttcpackskippedchallenge | The number of times that challenge ACK packets are limited by the OOW rate limiting mechanism. These packets are usually sent to confirm TCP reset packets. | tcpext
inspector_pod_tcpexttcpackskippedtimewait | The number of times that ACK packets are ignored by the OOW rate limiting mechanism in the TIME_WAIT state. | tcpext
inspector_pod_tcpexttcpackskippedfinwait2 | The number of times that ACK packets are ignored by the OOW rate limiting mechanism in the FIN_WAIT2 state. | tcpext
inspector_pod_tcpextpawsestabrejected* | The number of times that TCP inbound packets are dropped because PAWS is triggered. | tcpext
inspector_pod_tcpexttcprcvqdrop | The value of this metric increases when memory allocation fails and the TCP receive queue is full. | tcpext
inspector_pod_tcpexttcpretransfail | The number of errors other than EBUSY that are returned after a retransmission. The errors indicate that the retransmission fails. | tcpext
inspector_pod_tcpexttcpsynretrans | The number of SYN packets that are retransmitted. | tcpext
inspector_pod_tcpexttcpfastretrans | The number of times that retransmission is triggered when the status of TCP_CA is not Loss. | tcpext
inspector_pod_tcpexttcptimeouts | The number of times that SYN packets are retransmitted because the SYN packets are not answered while the status of TCP_CA is not recovery, loss, or disorder. | tcpext
inspector_pod_tcpsummarytcpestablishedconn | The number of TCP connections in the ESTABLISHED state. | tcpsummary
inspector_pod_tcpsummarytcptimewaitconn | The number of TCP connections in the TIMEWAIT state. | tcpsummary
inspector_pod_tcpsummarytcptxqueue | The size of data packets in the send queue of TCP connections in the ESTABLISHED state. Unit: bytes. | tcpsummary
inspector_pod_tcpsummarytcprxqueue | The size of data packets in the receive queue of TCP connections in the ESTABLISHED state. Unit: bytes. | tcpsummary
inspector_pod_udpindatagrams | The number of UDP packets that are successfully received. | udp
inspector_pod_udpsndbuferrors | The number of UDP packet send errors. | udp
inspector_pod_udpincsumerrors | The number of UDP packet checksum errors. | udp
inspector_pod_udpignoredmulti | The number of multicast packets that are ignored by UDP. | udp
inspector_pod_udpnoports | The number of times that the corresponding socket cannot be found when the network layer invokes __udp4_lib_rcv to receive packets. | udp
inspector_pod_udpinerrors | The number of UDP packet receive errors. | udp
inspector_pod_udpoutdatagrams | The number of UDP packets that are successfully sent. | udp
inspector_pod_udprcvbuferrors | The number of times that the UDP protocol layer fails to copy packet data into the socket receive queue because the queue is full. | udp
inspector_pod_conntrackentries* | The number of existing conntrack entries. | conntrack
inspector_pod_conntrackfound | The number of times that connection tracking records are found. | conntrack
inspector_pod_conntrackinsert | This metric is not in use. | conntrack
inspector_pod_conntrackinvalid | The number of times that connection tracking fails to create connections but does not drop the packets. | conntrack
inspector_pod_conntrackignore | The number of times that connection tracking is skipped because connections are already created or connection tracking is not required. | conntrack
inspector_pod_conntrackinsertfailed | This metric is not in use. | conntrack
inspector_pod_conntrackdrop | The number of times that connection tracking drops packets due to connection failures. | conntrack
inspector_pod_conntrackearlydrop | This metric is not in use. | conntrack
inspector_pod_conntracksearchrestart | The number of attempts to retry a search during connection tracking. | conntrack
inspector_pod_fdopenfd | The number of file descriptors of all processes within a pod. | fd
inspector_pod_fdopensocket | The number of file descriptors of the socket type within a pod. | fd
inspector_pod_slabtcpslabobjperslab | The number of objects included in a single page of a TCP slab. | slab
inspector_pod_slabtcpslabpagesperslab | The number of pages in a TCP slab. | slab
inspector_pod_slabtcpslabobjactive | The number of active objects in a TCP slab. | slab
inspector_pod_slabtcpslabobjnum | The number of objects in a TCP slab. | slab
inspector_pod_slabtcpslabobjsize | The size of each object in a TCP slab. The size varies based on the kernel version. | slab
inspector_pod_ioioreadsyscall | The number of read operations performed by the process, such as the number of reads or preads. | io
inspector_pod_ioiowritesyscall | The number of write operations performed by the process, such as the number of writes or pwrites. | io
inspector_pod_ioioreadbytes | The number of bytes that the process reads from a file system (a block device in most cases). | io
inspector_pod_ioiowritebyres | The number of bytes that the process writes into a file system. | io
inspector_pod_net_softirq_schedslow100ms | The number of times that the amount of time to wait for scheduling exceeds 100 milliseconds when a network software interrupt occurs. | net_softirq
inspector_pod_net_softirq_excuteslow100ms | The number of times that a network software interrupt lasts more than 100 milliseconds. | net_softirq
inspector_pod_abnormalloss (inspector_pod_packetloss_abnormal) | The number of times that packets are dropped by the kernel due to errors other than packet issues, such as packet integrity issues or packet checksum errors. | packetloss
inspector_pod_totalloss (inspector_pod_packetloss_total) | The total number of packets dropped by the kernel. | packetloss
inspector_pod_virtcmdlatency100ms | The number of times that virtualized communication performed by the NIC lasts more than 100 milliseconds. | virtcmdlat
inspector_pod_socketlatencyread100ms | The number of times that the user program requires more than 100 milliseconds to read content from a network socket file. | socketlatency
inspector_pod_socketlatencywrite100ms | The number of times that the user program requires more than 100 milliseconds to write content to a network socket file. | socketlatency
kernellatency_rxslow100ms | The number of times that the operating system kernel requires more than 100 milliseconds to receive a packet. | kernellatency
kernellatency_txslow100ms | The number of times that the operating system kernel requires more than 100 milliseconds to send a packet. | kernellatency

ACK Net Exporter events

The following table describes the operating system network-related events that can be captured by using the latest ACK Net Exporter version.

Probe name | Description
netiftxlat | The queuing disciplines (qdisc) of traffic control need to wait a long period of time before they can send the data packets in the queue.
packetloss | Normal data packets are dropped by the operating system kernel.
net_softirq | The scheduling of a NET_RX or NET_TX kernel software interrupt is delayed, or the processing of packets within the software interrupt is severely delayed.
socketlatency | Processes in a pod require a long period of time to complete socket-related read and write operations.
kernellatency | The kernel requires a long period of time to process packets at the network layer.
virtcmdlatency | Communication between virtio-net and the host requires a long period of time.
tcpreset | TCP reset packets are received or sent.
tcptwrcv | TCP receives and processes packets while in the TIMEWAIT state.

Recommended Grafana configuration file