This topic describes the commands and procedures that are commonly used to diagnose DNS resolution issues in Container Service for Kubernetes (ACK) clusters.

Diagnose the DNS configurations of application pods

  • Commands
    # Run the following command to query the YAML file of the foo pod. Then, check whether the dnsPolicy field in the YAML file is set to a proper value. 
    kubectl get pod foo -o yaml
    
    # If the dnsPolicy field is set to a proper value, check the DNS configuration file of the pod. 
    
    # Run the following command to open a bash shell in a container of the foo pod. If bash is not available in the container, use sh instead. 
    kubectl exec -it foo -- bash
    
    # Run the following command to query the DNS configuration file. Then, check the DNS server addresses in the nameserver entries. 
    cat /etc/resolv.conf
  • DNS policy settings

    The following sample code provides a pod template that is configured with DNS policy settings:

    apiVersion: v1
    kind: Pod
    metadata:
      name: <pod-name>
      namespace: <pod-namespace>
    spec:
      containers:
      - image: <container-image>
        name: <container-name>
    
      # The default value of dnsPolicy is ClusterFirst. 
      dnsPolicy: ClusterFirst
      # The following settings are applied when automatic DNSConfig injection for NodeLocal DNSCache is used. 
      dnsPolicy: None
      dnsConfig:
        nameservers:
        - 169.254.20.10
        - 172.21.0.10
        options:
        - name: ndots
          value: "3"
        - name: timeout
          value: "1"
        - name: attempts
          value: "2"
        searches:
        - default.svc.cluster.local
        - svc.cluster.local
        - cluster.local
    
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
    The following list describes the valid values of dnsPolicy:
      • Default: The pod uses the DNS servers that are specified in the /etc/resolv.conf file of the ECS instance. Use this value if the pod does not need to access Services within the cluster.
      • ClusterFirst: This is the default value. The IP address of the kube-dns Service is used as the address of the DNS server for the pod. For pods that use the host network, ClusterFirst has the same effect as Default.
      • ClusterFirstWithHostNet: For pods that use the host network, ClusterFirstWithHostNet has the same effect as ClusterFirst for pods that do not use the host network.
      • None: You can configure self-managed DNS servers and custom parameters in the dnsConfig section. If you enable automatic DNSConfig injection for NodeLocal DNSCache, the IP address of the local DNS cache and the IP address of the kube-dns Service are set as the DNS server addresses.
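To check which resolvers a pod actually uses, you can inspect its /etc/resolv.conf. The following sketch works offline on a sample file whose contents mirror the NodeLocal DNSCache dnsConfig above; in a real diagnosis, you would read /etc/resolv.conf inside the pod instead. The file path and sample values are assumptions for illustration.

```shell
# Hypothetical sample of a pod's /etc/resolv.conf, mirroring the
# NodeLocal DNSCache dnsConfig shown above.
cat <<'EOF' > /tmp/resolv.conf.sample
nameserver 169.254.20.10
nameserver 172.21.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:3 timeout:1 attempts:2
EOF

# List the DNS server addresses that the pod would query.
awk '/^nameserver/ {print $2}' /tmp/resolv.conf.sample
```

If NodeLocal DNSCache is enabled, the output should include the local cache address (169.254.20.10 in this sample) in addition to the kube-dns Service IP address.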

Diagnose the status of the CoreDNS pod

Commands
  • Run the following command to query information about the CoreDNS pod:
    kubectl -n kube-system get pod -o wide -l k8s-app=kube-dns
    Expected output:
    NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE
    coredns-xxxxxxxxx-xxxxx   1/1     Running   0          25h   172.20.6.53   cn-hangzhou.192.168.0.198
  • Run the following command to query the real-time resource usage of the CoreDNS pod:
    kubectl -n kube-system top pod -l k8s-app=kube-dns
    Expected output:
    NAME                      CPU(cores)   MEMORY(bytes)
    coredns-xxxxxxxxx-xxxxx   3m           18Mi
  • If the CoreDNS pod is not in the Running state, run the kubectl -n kube-system describe pod <CoreDNS pod name> command to identify the cause.
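As a convenience, the status check above can be scripted. The following sketch defines a filter that flags CoreDNS pods that are not in the Running state; the sample output and pod names are hypothetical, and in practice you would pipe in the real `kubectl get pod` output as shown in the comment.

```shell
# Flag pods that are not in the Running state, given `kubectl get pod`
# output on stdin. In practice, pipe in the real command output:
#   kubectl -n kube-system get pod -o wide -l k8s-app=kube-dns | not_running
not_running() {
  awk 'NR > 1 && $3 != "Running" {print $1}'
}

# Hypothetical sample output with one healthy and one failing replica.
sample='NAME                      READY   STATUS             RESTARTS   AGE
coredns-aaaa-bbbbb        1/1     Running            0          25h
coredns-aaaa-ccccc        0/1     CrashLoopBackOff   12         25h'

printf '%s\n' "$sample" | not_running
```

Any pod name that the filter prints is a candidate for the `kubectl describe pod` step above.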

Diagnose the CoreDNS log

Commands

Run the following command to query the log of CoreDNS:
kubectl -n kube-system logs -f --tail=500 --timestamps coredns-xxxxxxxxx-xxxxx
The following list describes the parameters in the command:
  • -f: Streams the log output.
  • --tail=500: Queries only the last 500 lines of the log.
  • --timestamps: Includes a timestamp at the beginning of each line of the log output.
  • coredns-xxxxxxxxx-xxxxx: The name of the CoreDNS pod.
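When the log is long, it helps to filter it for common failure keywords first. The following sketch defines such a filter; the sample log lines are hypothetical, and in practice you would pipe in the real `kubectl logs` output as shown in the comment.

```shell
# Filter log lines for common failure keywords. In practice, pipe in the
# real log output:
#   kubectl -n kube-system logs --tail=500 coredns-xxxxxxxxx-xxxxx | dns_errors
dns_errors() {
  grep -iE 'error|timeout|refused|unreachable'
}

# Hypothetical sample log with one informational and one error line.
sample_log='[INFO] plugin/reload: Running configuration MD5 = 4e235fc
[ERROR] plugin/errors: 2 example.internal. A: read udp 172.20.6.53:48888->100.100.2.136:53: i/o timeout'

printf '%s\n' "$sample_log" | dns_errors
```

Lines that mention timeouts toward the upstream DNS servers point at connectivity issues, which the later sections of this topic cover.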

Diagnose the DNS query log of CoreDNS

Commands

The DNS query log of CoreDNS is generated only when the log plug-in of CoreDNS is enabled. For more information about how to enable the log plug-in, see Introduce and configure the DNS service in ACK clusters.

Run the command that you use to query the log of CoreDNS. For more information, see Diagnose the CoreDNS log.
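When the log plug-in is enabled, each DNS query produces a log line that includes the response code. The following sketch summarizes response codes across the query log; the sample lines are hypothetical but follow the plug-in's general line format, and in practice you would pipe in the real `kubectl logs` output.

```shell
# Count DNS response codes in CoreDNS query log lines on stdin.
# The query text is quoted in each line, so split on double quotes and
# take the first token after the closing quote (the response code).
rcode_summary() {
  awk -F'"' '/\[INFO\]/ && NF >= 3 {split($3, a, " "); print a[1]}' | sort | uniq -c
}

# Hypothetical sample query log lines.
sample='[INFO] 172.20.1.5:48281 - 3 "A IN foo.default.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 93 0.0002s
[INFO] 172.20.1.5:48282 - 4 "A IN bar.example.com. udp 33 false 512" NXDOMAIN qr,rd,ra 108 0.02s'

printf '%s\n' "$sample" | rcode_summary
```

A large share of NXDOMAIN or SERVFAIL responses narrows the diagnosis to the queried domain names or to the upstream DNS servers, respectively.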

Diagnose the network connectivity of the CoreDNS pod

Procedure

  1. Log on to the node on which the CoreDNS pod runs.
  2. Run the ps aux | grep coredns command to query the ID of the CoreDNS process.
  3. Run the nsenter -t <pid> -n bash command to enter the network namespace to which CoreDNS belongs. Replace pid with the process ID that you obtained in the previous step.
  4. Test the network connectivity.
    1. Run the telnet <apiserver_slb_ip> 443 command to test the connectivity to the Kubernetes API server of the cluster.

      Replace apiserver_slb_ip with the IP address of the Service that is used to expose the Kubernetes API server of the cluster.

    2. Run the dig <domain> @<upstream_dns_server_ip> command to test the connectivity between the CoreDNS pod and the upstream DNS servers.

      Replace domain with the test domain name and upstream_dns_server_ip with the IP addresses of the upstream DNS servers, which are 100.100.2.136 and 100.100.2.138 by default.
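The outcome of the dig test can be judged from the status field in the response header. The following sketch extracts that field; the header line below is a hypothetical excerpt of dig output, and in practice you would pipe in the real output as shown in the comment.

```shell
# Extract the response status from dig output on stdin. In practice:
#   dig <domain> @100.100.2.136 | dig_status
dig_status() {
  grep -o 'status: [A-Z]*' | awk '{print $2}'
}

# Hypothetical excerpt of a dig response header.
sample=';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345'
printf '%s\n' "$sample" | dig_status
```

NOERROR indicates that the upstream server answered; no output at all usually means the query timed out, which indicates a connectivity issue.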

Troubleshoot common issues

  • Symptom: CoreDNS cannot connect to the Kubernetes API server of the cluster.
    Cause: Errors occur on the Kubernetes API server of the cluster, the node is overloaded, or kube-proxy does not run as normal.
    Solution: Submit a ticket for troubleshooting.
  • Symptom: CoreDNS cannot connect to the upstream DNS servers.
    Cause: The node is overloaded, the CoreDNS configurations are wrong, or the routing configurations of the Express Connect circuit are incorrect.
    Solution: Submit a ticket for troubleshooting.

Diagnose the network connectivity between application pods and the CoreDNS pod

Procedure

  1. Use one of the following methods to connect to the container network of the application pods.
    • Method 1: Run the kubectl exec command.
    • Method 2:
      1. Log on to the node on which the application pods run.
      2. Run the ps aux | grep <application process name> command to query the ID of the application process.
      3. Run the nsenter -t <pid> -n bash command to enter the network namespace to which the application pods belong.

        Replace pid with the process ID that you obtained in the previous step.

    • Method 3: If the application pods frequently restart, perform the following steps:
      1. Log on to the node on which the application pods run.
      2. Run the docker ps -a | grep <application container name> command and find the sandbox containers whose names start with k8s_POD_. Record the sandbox container IDs that are returned.
      3. Run the docker inspect <sandbox container ID> | grep netns command to query the path of the network namespace to which the container belongs, for example, /var/run/docker/netns/xxxx.
      4. Run the nsenter -n<netns path> bash command to enter the network namespace.

        Replace netns path with the path that you obtained in the previous step.

        Note Do not add spaces between -n and <netns path>.
  2. Test the network connectivity.
    1. Run the dig <domain> @<kube_dns_svc_ip> command to test the connectivity between the application pods and the kube-dns Service.

      Replace <domain> with the test domain name and <kube_dns_svc_ip> with the IP address of the kube-dns Service in the kube-system namespace.

    2. Run the ping <coredns_pod_ip> command to test the connectivity between the application pods and the CoreDNS pod.

      Replace <coredns_pod_ip> with the IP address of the CoreDNS pod in the kube-system namespace.

    3. Run the dig <domain> @<coredns_pod_ip> command to test the connectivity between the application pods and the CoreDNS pod.

      Replace <domain> with the test domain name and <coredns_pod_ip> with the IP address of the CoreDNS pod in the kube-system namespace.
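The dig tests in steps 1 and 3 differ only in the resolver address. The following sketch generates the command line for each resolver so the same test domain is queried against the kube-dns Service IP address and the CoreDNS pod IP address in turn; the IP addresses shown are hypothetical placeholders.

```shell
# Generate a dig command per resolver for the same test domain.
build_dig_cmds() {
  local domain=$1; shift
  local server
  for server in "$@"; do
    printf 'dig %s @%s\n' "$domain" "$server"
  done
}

# Hypothetical kube-dns Service IP and CoreDNS pod IP.
build_dig_cmds aliyun.com 172.21.0.10 172.20.6.53
```

If the query against the CoreDNS pod IP address succeeds while the query against the kube-dns Service IP address fails, the problem is likely in the Service path (for example, kube-proxy or security group rules) rather than in CoreDNS itself.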

Troubleshoot common issues

  • Symptom: The application pods cannot connect to the kube-dns Service.
    Cause: The node is overloaded, kube-proxy does not run as normal, or the security group rules block UDP port 53.
    Solution: Check whether the security group rules open UDP port 53. If UDP port 53 is open, submit a ticket for troubleshooting.
  • Symptom: The application pods cannot ping the CoreDNS pod.
    Cause: Errors related to the container network occur or the security group rules block Internet Control Message Protocol (ICMP).
    Solution: Diagnose the container network.
  • Symptom: DNS queries sent from the application pods directly to the CoreDNS pod fail.
    Cause: The node is overloaded or the security group rules block UDP port 53.
    Solution: Check whether the security group rules open UDP port 53. If UDP port 53 is open, submit a ticket for troubleshooting.

Diagnose the container network

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
  3. In the left-side navigation pane of the cluster details page, choose Operations > Cluster Check. The Container Intelligence Service page appears.
  4. In the left-side navigation pane of the Container Intelligence Service page, choose Cluster Check > Diagnosis.
  5. On the Diagnosis page, click the Network Diagnosis tab.
  6. Set Source address to the IP address of an application pod, Destination address to the IP address of the kube-dns Service, and Destination port to 53. Select Enable packet tracing and I know and agree. Then, click Create diagnosis.
  7. In the list of diagnostics, find the diagnostic that you created and click Diagnosis details in the Actions column.
    You can view the diagnostic results that are displayed in the Diagnosis result, Packet paths, and All possible paths sections. The causes of errors are also provided. For more information, see Use the cluster diagnosis feature to troubleshoot cluster issues.

Capture packets

If you cannot identify the cause of DNS resolution errors, capture and diagnose packets.

  1. Log on to the nodes on which the application pods and CoreDNS pod run.
  2. Run the following command on each ECS instance to capture all recent packets received on port 53:
    tcpdump -i any port 53 -C 20 -W 200 -w /tmp/client_dns.pcap
  3. Diagnose the packets that are transferred during the time period in which DNS resolution errors occurred. You can obtain the time period from the application log.
    Note
    • Packet capture does not affect your service. It causes only a slight increase in CPU usage and disk I/O.
    • The preceding command rotates the capture files and generates at most 200 .pcap files, each of which is 20 MB in size.
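The -C and -W flags in the preceding tcpdump command bound the disk usage of the capture. The following sketch computes the worst-case size; the rotated file name in the comment is an assumption based on tcpdump's default rotation naming.

```shell
# Worst-case disk usage of the rotated capture: 200 files of ~20 MB each.
files=200
mb_per_file=20
total_mb=$((files * mb_per_file))
echo "worst-case capture size: ${total_mb} MB"

# To inspect a rotated capture later (file name assumed), run a command
# such as: tcpdump -nn -r /tmp/client_dns.pcap0 port 53
```

In other words, the capture is bounded at roughly 4 GB, so it can run for an extended period while you wait for the DNS resolution error to recur.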