This topic describes how to diagnose and troubleshoot Domain Name System (DNS) resolution errors in Kubernetes clusters and lists the common error messages that are returned when these errors occur.

Terms

  • Internal domain name: CoreDNS exposes services deployed in a cluster through an internal domain name that ends with .cluster.local. DNS queries for the internal domain name are resolved based on the DNS cache of CoreDNS instead of the upstream DNS servers.
  • External domain name: Domain names other than the internal domain name of a cluster. DNS queries for external domain names can be resolved by CoreDNS or by the upstream DNS servers that are specified in DNSConfig. By default, 100.100.2.136 and 100.100.2.138 are specified as the upstream DNS servers. The default upstream DNS servers are deployed in a virtual private cloud (VPC). You can also specify self-managed DNS servers.
  • Application pod: Pods other than the pods of system components in a Kubernetes cluster.
  • Application pods that use CoreDNS for DNS resolutions: Application pods that use CoreDNS to process DNS queries.
  • Application pods that use NodeLocal DNSCache for DNS resolutions: After you install NodeLocal DNSCache in your cluster, you can configure DNS settings by injecting DNSConfig into application pods. DNS queries from these pods are first sent to NodeLocal DNSCache. If NodeLocal DNSCache fails to process the queries, the queries are sent to the kube-dns Service of CoreDNS.
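
The distinction between internal and external domain names can be sketched in a few lines of Python. This is a minimal illustration, not part of any Kubernetes or CoreDNS API: the function name is_internal_domain is hypothetical, and the default suffix cluster.local is taken from the definition above (a cluster may be configured with a different suffix). The check only applies to fully qualified names; a short name such as a bare Service name is expanded by the pod's DNS search domains before it is resolved, which this sketch does not model.

```python
def is_internal_domain(name: str, cluster_domain: str = "cluster.local") -> bool:
    """Return True if `name` falls under the cluster's internal domain suffix.

    `cluster_domain` defaults to the suffix named in this topic; pass a
    different value if your cluster uses another suffix (assumption).
    """
    # DNS names are case-insensitive and may end with a trailing root dot.
    normalized = name.strip().rstrip(".").lower()
    suffix = cluster_domain.strip(".").lower()
    # Match on a label boundary so "notcluster.local" is not treated as internal.
    return normalized == suffix or normalized.endswith("." + suffix)
```

For example, is_internal_domain("kubernetes.default.svc.cluster.local") evaluates to True, while is_internal_domain("www.example.com") evaluates to False.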

Troubleshooting procedure

[Figure: Troubleshooting flowchart]
  1. Check the domain name and DNS server. For more information, see Common error messages.
    • If the error message indicates that the domain name does not exist, refer to Check the domain name in the Troubleshooting procedure section.
    • If the error message indicates that connections to the DNS server cannot be established, refer to Check the frequency of errors in the Troubleshooting procedure section.
  2. If the error still exists, perform the following checks:
  3. If the error still exists, submit a ticket.

Common error messages

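| Client (or command) | Error message | Possible cause |
| --- | --- | --- |
| ping | ping: xxx.yyy.zzz: Name or service not known | The domain name does not exist or the DNS server is inaccessible. If the resolution latency is more than 5 seconds, a possible cause is that the DNS server is inaccessible. |
| curl | curl: (6) Could not resolve host: xxx.yyy.zzz | The domain name does not exist or the DNS server is inaccessible. If the resolution latency is more than 5 seconds, a possible cause is that the DNS server is inaccessible. |
| PHP HTTP client | php_network_getaddresses: getaddrinfo failed: Name or service not known in xxx.php on line yyy | The domain name does not exist or the DNS server is inaccessible. If the resolution latency is more than 5 seconds, a possible cause is that the DNS server is inaccessible. |
| Golang HTTP client | dial tcp: lookup xxx.yyy.zzz on 100.100.2.136:53: no such host | The domain name does not exist. |
| dig | ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: xxxxx | The domain name does not exist. |
| Golang HTTP client | dial tcp: lookup xxx.yyy.zzz on 100.100.2.139:53: read udp 192.168.0.100:42922->100.100.2.139:53: i/o timeout | The DNS server cannot be accessed. |
| dig | ;; connection timed out; no servers could be reached | The DNS server cannot be accessed. |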
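
The mapping from error message to possible cause can be automated when you scan application logs. The following Python sketch is one way to do it, under stated assumptions: the function name possible_cause and the rule list are hypothetical, and the patterns are copied from the sample messages in the table above. Real clients and versions may word their errors differently, so treat the patterns as a starting point.

```python
import re

# Each rule pairs a pattern from the sample messages with the cause listed
# for it. Order matters: the most specific messages are checked first.
_RULES = [
    (re.compile(r"i/o timeout|connection timed out; no servers could be reached"),
     "The DNS server cannot be accessed."),
    (re.compile(r"no such host|status: NXDOMAIN"),
     "The domain name does not exist."),
    (re.compile(r"Name or service not known|Could not resolve host"),
     "The domain name does not exist or the DNS server is inaccessible."),
]


def possible_cause(message: str) -> str:
    """Return the possible cause for a DNS error message, or "Unknown"."""
    for pattern, cause in _RULES:
        if pattern.search(message):
            return cause
    return "Unknown"
```

Messages that fall into the third group are ambiguous by themselves. As noted in the table, a resolution latency of more than 5 seconds suggests that the DNS server is inaccessible rather than that the domain name does not exist.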

Troubleshooting procedure

| Procedure | Symptom | References for fixes |
| --- | --- | --- |
| Check the domain name | Resolution errors occur on the internal domain name and external domain name. | |
| Check the domain name | Resolution errors occur only on the external domain name. | What do I do if the external domain name of my cluster cannot be resolved? |
| Check the domain name | Resolution errors occur only on domain names that are added to Alibaba Cloud DNS PrivateZone and domain names that contain vpc-proxy. | What do I do if domain names that are added to Alibaba Cloud DNS PrivateZone cannot be resolved? |
| Check the domain name | Resolution errors occur only on the domain names of headless Services. | |
| Check the frequency of errors | Resolution errors occur every time. | |
| Check the frequency of errors | Resolution errors occur only during peak hours. | |
| Check the frequency of errors | Resolution errors occur at a high frequency. | |
| Check the frequency of errors | Resolution errors occur at a low frequency. | |
| Check the frequency of errors | Resolution errors occur only during node scaling events or CoreDNS scaling events. | What do I do if DNS resolutions fail due to IP Virtual Server (IPVS) errors? |
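
To decide which frequency symptom applies, it helps to resolve the same domain name repeatedly from inside an application pod and count the failures. The following Python sketch shows one way to collect that data. The function name sample_resolution and its parameters are hypothetical. The 5-second threshold for a slow lookup is borrowed from the note in the Common error messages section. The sketch reports a raw failure rate only; this topic does not define numeric cutoffs for "high" or "low" frequency, so none are assumed here.

```python
import socket
import time


def sample_resolution(name, attempts=10, resolver=socket.getaddrinfo,
                      slow_threshold=5.0):
    """Resolve `name` several times and summarize failures and slow lookups.

    `resolver` is injectable so the function can be exercised without
    network access; by default it uses the system resolver.
    """
    failures = 0
    slow = 0
    for _ in range(attempts):
        started = time.monotonic()
        try:
            resolver(name, None)
        except socket.gaierror:
            # Raised by getaddrinfo when the name cannot be resolved.
            failures += 1
        if time.monotonic() - started > slow_threshold:
            slow += 1
    return {
        "attempts": attempts,
        "failures": failures,
        "slow": slow,
        "failure_rate": failures / attempts if attempts else 0.0,
    }
```

Calling sample_resolution("xxx.yyy.zzz", attempts=20) from a pod returns a dictionary with the number of failed and slow lookups. A failure rate of 1.0 corresponds to errors that occur every time. Running the same sample at different times of day, or during a scaling event, helps separate the remaining symptoms.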