How to Optimize Your Network in the Mobile Internet Era – Domain Name System Resolution

By Lingming, from Alibaba Cloud Mobile Service

A domain name identifies a computer or a group of computers on the Internet and consists of a string of names separated by dots. Domain names exist so that people can access Internet services more simply and conveniently. In practice, the Domain Name System (DNS) converts a domain name into the server's IP address so that machines can address and communicate with each other over IP. This conversion is called domain name resolution.

Domain name resolution is a critical part of network communication. In traditional browser-based website access, it is implemented by the browser kernel, so website developers do not need to think about its details. However, there are two sides to every coin. Once an exception occurs in the resolution process, developers are helpless in the face of such a black-box architecture. A typical example is domain name hijacking, which we will introduce in detail later.

In the era of mobile Internet, a large number of applications are built on a C/S architecture. Compared with traditional browser-oriented web apps, applications with a C/S architecture give us much more freedom for software customization. Developers can even control the underlying network implementation of the entire application, which makes it possible to optimize domain name resolution. This article looks at the problems in traditional domain name resolution, their causes, and possible optimization schemes.

Basic Concepts about Domain Name Resolution

There are several terms we need to know before looking at the traditional domain name resolution process:

Root Domain, Top-Level Domain, and Second-Level Domain

DNS is generally organized in a tree structure. Let's take ru.wikipedia.org as an example: org is the top-level domain name, wikipedia is the second-level domain name, and ru is the third-level domain name. The following figure shows the tree structure of ru.wikipedia.org:
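The hierarchy can also be read directly off the name by walking its labels from right to left. The following is a minimal sketch; the class and method names are illustrative only and are not part of any DNS library.

```java
import java.util.ArrayList;
import java.util.List;

final class DomainLevels {
    /** Returns the domain at each level, from the top-level domain down to the full name. */
    static List<String> levels(String domain) {
        String[] labels = domain.split("\\.");
        List<String> result = new ArrayList<>();
        String suffix = "";
        // Walk the labels right to left, growing the suffix one label at a time.
        for (int i = labels.length - 1; i >= 0; i--) {
            suffix = suffix.isEmpty() ? labels[i] : labels[i] + "." + suffix;
            result.add(suffix);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [org, wikipedia.org, ru.wikipedia.org]
        System.out.println(levels("ru.wikipedia.org"));
    }
}
```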

Authoritative DNS

Authoritative DNS is the server that ultimately determines the resolution result of a domain name. Developers can configure, change, and delete the corresponding resolution result of a specific domain name on authoritative DNS. Alibaba Cloud DNS is an authoritative DNS service provider.

Recursive DNS

Recursive DNS is also called Local DNS. It cannot determine the domain name resolution result itself. Instead, it obtains the result from authoritative DNS on behalf of the client. Recursive DNS has a cache module. Each cached resolution result has a Time to Live (TTL). If the target domain name has a cached result whose TTL has not expired, recursive DNS returns the cached result. Otherwise, recursive DNS queries the authoritative DNS of the domain name at each level until it obtains the final, complete resolution result. The following section describes the specific process of domain name resolution.
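The cache behavior described above can be sketched as a small TTL-aware map. This is only an illustration, not how any particular recursive DNS server is implemented. The class name is hypothetical, and the current time is passed in explicitly so that the expiry logic is easy to follow and test.

```java
import java.util.HashMap;
import java.util.Map;

final class TtlDnsCache {
    private static final class Entry {
        final String ip;
        final long expiresAtMillis;

        Entry(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Stores a resolution result together with the moment its TTL runs out. */
    void put(String host, String ip, long ttlSeconds, long nowMillis) {
        entries.put(host, new Entry(ip, nowMillis + ttlSeconds * 1000));
    }

    /** Returns the cached IP, or null when there is no entry or its TTL has expired. */
    String get(String host, long nowMillis) {
        Entry e = entries.get(host);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            entries.remove(host);
            return null; // the caller has to query authoritative DNS again
        }
        return e.ip;
    }
}
```

A null return corresponds to the "otherwise" branch in the text: the recursive DNS starts querying the authoritative servers level by level.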

Public DNS

Public DNS is a special case of recursive DNS. It is a recursive DNS service open to the entire network, whereas traditional recursive DNS addresses are generally distributed to users by operators. A typical public DNS is Google's 8.8.8.8. We can resolve domain names through public DNS instead of the operator's Local DNS by configuring it in the operating system's configuration file.

In practice, we usually do not need to specify our Local DNS address manually. The operator assigns Local DNS addresses to our computers during the system network initialization phase through the Dynamic Host Configuration Protocol (DHCP). When we want to use a public DNS service, we must specify its address manually. On Linux, for example, we can change the recursive DNS address by adding a nameserver entry to /etc/resolv.conf.

After understanding the common terms related to domain name resolution above, let's take a closer look at how domain name resolution occurs:

As shown in the preceding figure, when accessing www.taobao.com, the complete domain name resolution process includes the following:

The terminal initiates a domain name resolution request to Local DNS.
After obtaining the domain name resolution request, Local DNS first obtains the address of the root domain name server from Root hints. (Root hints contain the address information of the DNS root server.)
After obtaining the address of the root domain name server, Local DNS initiates a DNS resolution request to the root domain name server, and the root domain name server returns the address of com (top-level domain name) server.
Local DNS initiates a resolution request to the com server and obtains the address of taobao.com (second-level domain name) server.
Local DNS initiates a resolution request to the taobao.com server and finally obtains the IP address information of www.taobao.com.
Local DNS caches the IP address obtained through recursive queries and returns it to the client.
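The steps above can be sketched as a walk over an in-memory referral table. This is a toy simulation under stated assumptions: no real DNS messages are sent, the server names are invented, and 203.0.113.10 is a documentation-range address standing in for the real answer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IterativeResolutionDemo {
    // server -> (queried name -> next server, or the final IP). All data is made up.
    static final Map<String, Map<String, String>> SERVERS = new HashMap<>();
    static {
        SERVERS.put("root", Collections.singletonMap("com", "com-server"));
        SERVERS.put("com-server", Collections.singletonMap("taobao.com", "taobao-server"));
        SERVERS.put("taobao-server", Collections.singletonMap("www.taobao.com", "203.0.113.10"));
    }

    /** Walks from the root server down one level per label and records every hop in trace. */
    static String resolve(String name, List<String> trace) {
        String[] labels = name.split("\\.");
        String server = "root"; // in a real resolver this address comes from the root hints
        String suffix = "";
        for (int i = labels.length - 1; i >= 0; i--) {
            suffix = suffix.isEmpty() ? labels[i] : labels[i] + "." + suffix;
            Map<String, String> zone = SERVERS.get(server);
            String answer = zone == null ? null : zone.get(suffix);
            if (answer == null) {
                throw new IllegalStateException(server + " has no record for " + suffix);
            }
            trace.add(server + " -> " + answer);
            server = answer; // either the next server to ask or the final IP
        }
        return server;
    }
}
```

Calling resolve("www.taobao.com", trace) visits the root, com, and taobao.com servers in turn, which mirrors the referral steps in the list.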

The Local DNS server contains a cache module. In the actual domain name resolution process, the Local DNS server will query the cached result first and directly return it if the cached result hits and the TTL of the resolution result has not expired. Otherwise, the recursive query will be started.

Problems Faced by Traditional Domain Name Resolution

After understanding the basic concepts and overall process of domain name resolution, let's explore a series of problems in traditional domain name resolution.

Domain Name Hijacking

Domain name hijacking has long plagued developers. In a hijacking, the DNS resolution result IP1 that should be returned for domain name A is maliciously replaced with IP2, so that access to A fails or the user is sent to an unsafe website. Let's look at several common domain name hijacking scenarios:

In one type of domain name hijacking, attackers break into the broadband router, tamper with the Local DNS setting of end users, and replace it with a forged Local DNS. They then hijack domain names by making the forged Local DNS return wrong IP information. In addition, since DNS resolution is mainly based on UDP, attackers can listen for domain name resolution requests from end users and send forged DNS responses to the end user before Local DNS returns the correct results, thus controlling which addresses end users access.

The impact of the attacks above is relatively limited. Another common form of domain name hijacking is cache pollution. When receiving a domain name resolution request, Local DNS queries the cache first. If the cache hits, it returns the cached result directly without a recursive query. If Local DNS changes the cached results of some domain names, for example by pointing them to the advertisement pages of a third party, the user's access requests are directed to those advertisement pages.

Compared with the first type of attack, cache pollution often brings more evident group damage. For example, the user group of an operator in a province may access services abnormally due to the cache pollution of Local DNS in the region. This kind of cache pollution often occurs intermittently and locally, and there is no obvious rule, which makes it difficult for developers to quantify, evaluate, and prevent it.

Some may ask, “If I use HTTPS, can I avoid domain name hijacking?” The answer is no. The domain name resolution takes place before the encrypted network request interaction. Just imagine, if the client does not have the exact address of the server, how do we know with whom to conduct encrypted negotiation and communication?

Imprecise Scheduling

In addition to domain name hijacking, domain name resolution based on traditional Local DNS also brings the problem of imprecise domain name scheduling. The demand for precise scheduling is very strong for scenarios (such as Content Delivery Network (CDN) domain name access) that require intelligent resolution and scheduling by region and operator.

Regarding imprecise scheduling, we can mainly explore it from two aspects. The first is resolution forwarding.

Some Local DNS providers forward the domain name resolution requests received by their own nodes to the Local DNS nodes of other providers to reduce operating costs, as shown in the preceding figure. Suppose a user requests to resolve the CDN domain name cdn.aliyun.com. Local DNS A, to which the user is assigned, forwards the request to Local DNS B of another operator to save costs. When resolving the domain name, the authoritative DNS performs intelligent scheduling according to the IP address of the Local DNS that queries it. It therefore schedules according to the IP 78.29.29.1 of Local DNS B and assigns the CDN node 78.29.29.2, which belongs to the same operator as 78.29.29.1 and is geographically closest to it. However, this CDN node is not the optimal CDN node for terminal 135.35.35.1, because Local DNS A and Local DNS B belong to different operators and may be geographically far away from each other. This type of resolution forwarding seriously reduces the accuracy of domain name resolution and increases user access latency.

In addition to the negative impact of resolution forwarding on scheduling accuracy, the way Local DNS is deployed also affects the accuracy of intelligent domain name resolution.

Latency in Resolution Effectiveness

In some business scenarios, developers are very sensitive to how long it takes for changes to domain name resolution results to take effect. (Developers make these changes on authoritative DNS.) For example, when a business server is attacked, we need to switch the business IP to another group of clusters as quickly as possible. Such demands cannot be satisfied under the traditional domain name resolution system.

Local DNS is independently deployed by each operator in each region, so the service quality of Local DNS nodes is uneven. Each independent node also handles the domain name resolution cache differently. For example, some nodes ignore the TTL of domain name resolution results to save expenses. As a result, a resolution result changed on authoritative DNS can take a long time to take effect for users across the whole network. (The longest delay we know of is 48 hours.) This type of latency may directly cause exceptions in user access.

High Latency

For the first DNS query, or for a query made after the cache expires, multiple DNS servers must be traversed recursively to obtain the final resolution result, which increases the latency of network requests. Network quality is especially uneven in the mobile Internet scenario, and the RTT in a weak network environment may be as high as hundreds of milliseconds. Such latency is a heavy burden for an ordinary service request. In addition, resolution timeouts and resolution failures are common in weak network environments. Optimizing DNS resolution sensibly is therefore crucial to improving overall network access quality.

HTTPDNS

After the introduction above, some of you may have noticed that the root cause of many of the problems faced by traditional domain name resolution is the uncontrollable service quality of Local DNS. If a more secure, stable, and efficient recursive DNS service performed domain name resolution instead, the problems above could be solved.

HTTPDNS was created in this context. Let's look at the basic concepts of HTTPDNS and how it solves the problems faced by traditional DNS resolution.

Domain Name Hijacking Prevention

HTTPDNS uses HTTP for domain name resolution instead of the existing UDP-based DNS protocol. Domain name resolution requests are sent directly to the HTTPDNS server, thus bypassing Local DNS, as shown in the following figure:

HTTPDNS replaces the traditional Local DNS to complete the recursive resolution. Because it is based on HTTP, it can be used in almost all network environments, and it can be extended with stronger security measures (such as authentication and HTTPS) to avoid malicious attacks and hijacking. In addition, the cache management of a commercial HTTPDNS service is guaranteed by an SLA, which avoids problems such as the cache pollution seen in Local DNS.
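To make the idea of resolving over HTTP concrete, here is a minimal sketch of the client side. Everything service-specific in it is an assumption made for illustration: the endpoint httpdns.example.com, the host query parameter, and the JSON shape with an ips array are hypothetical, so consult your HTTPDNS provider's documentation for the real interface. The sketch only builds the request URL and extracts IPs from a response body. It does not perform the network call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HttpDnsRequestSketch {
    // Hypothetical endpoint; a real service documents its own address and path.
    static final String ENDPOINT = "http://httpdns.example.com/resolve";

    // Matches the contents of an assumed "ips": [...] array in the response body.
    private static final Pattern IPS_ARRAY = Pattern.compile("\"ips\"\\s*:\\s*\\[([^\\]]*)\\]");
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    /** Builds the resolution request URL for the given domain name. */
    static String buildUrl(String host) {
        try {
            return ENDPOINT + "?host=" + URLEncoder.encode(host, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Extracts the IP strings from a response such as {"host":"...","ips":["..."],"ttl":60}. */
    static List<String> parseIps(String json) {
        List<String> ips = new ArrayList<>();
        Matcher array = IPS_ARRAY.matcher(json);
        if (array.find()) {
            Matcher item = QUOTED.matcher(array.group(1));
            while (item.find()) {
                ips.add(item.group(1));
            }
        }
        return ips;
    }
}
```

In a real client, a proper JSON parser would be a better choice than regular expressions. They are used here only to keep the sketch free of dependencies.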

Precise Scheduling

The scheduling accuracy problem of traditional domain name resolution is rooted in the deployment and allocation mechanism of Local DNS. Because Local DNS is managed in a fragmented way, it is difficult to guarantee service quality in these links. HTTPDNS optimizes the interaction with authoritative DNS in its recursive resolution implementation and delivers the IP information of end users directly to authoritative DNS through the edns-client-subnet extension of the DNS protocol. This way, the authoritative DNS can ignore the IP address of the recursive DNS and perform precise scheduling according to the IP information of end users, so the location of Local DNS no longer interferes with scheduling. (The premise of this precise scheduling scheme is that authoritative DNS supports edns-client-subnet. Fortunately, the current mainstream authoritative DNS services already support it.) The following example shows the process of precise scheduling.
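The difference between scheduling by resolver address and scheduling by client subnet can be sketched from the authoritative DNS side. This is a toy model under stated assumptions: matching on the first three octets stands in for a real operator and geography database, the CDN node 135.35.35.2 and the HTTPDNS server address 203.0.113.53 are made up, and only 78.29.29.1, 78.29.29.2, and 135.35.35.1 come from the earlier example.

```java
import java.util.HashMap;
import java.util.Map;

final class SchedulingSketch {
    // Toy routing table: first three octets -> CDN node serving that network.
    static final Map<String, String> CDN_BY_PREFIX = new HashMap<>();
    static {
        CDN_BY_PREFIX.put("78.29.29", "78.29.29.2");   // node near Local DNS B
        CDN_BY_PREFIX.put("135.35.35", "135.35.35.2"); // made-up node near the terminal
    }

    static String prefix24(String ipv4) {
        return ipv4.substring(0, ipv4.lastIndexOf('.'));
    }

    /**
     * Picks a CDN node. If the query carries a client subnet, schedule on it;
     * otherwise fall back to the address of the recursive DNS that sent the query.
     */
    static String schedule(String resolverIp, String clientSubnetIpOrNull) {
        String basis = clientSubnetIpOrNull != null ? clientSubnetIpOrNull : resolverIp;
        return CDN_BY_PREFIX.getOrDefault(prefix24(basis), "default-node");
    }
}
```

With only the resolver address available, schedule("78.29.29.1", null) picks the node near Local DNS B. When the query carries the terminal's address, schedule("203.0.113.53", "135.35.35.1") picks the node near the terminal, regardless of where the recursive server sits.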

Real-Time Effectiveness

HTTPDNS also has capabilities that traditional domain name resolution systems lack when it comes to how quickly resolution changes take effect. As mentioned earlier, Local DNS in each region is independently maintained, its service quality is uneven, and its cache implementations differ, which delays resolution changes from taking effect across the whole network. This problem does not occur in commercial HTTPDNS services, which strictly follow the DNS TTL for cache updates. On the other hand, even if Local DNS strictly follows the TTL for cache management (here, we assume the TTL configured by the developer is five minutes), when the developer's business is attacked and the IP information needs to be switched quickly, Local DNS will still return the old IP information until the TTL expires, which can take up to five minutes. For medium and large enterprises, the business impact of those five minutes is a big loss. (A five-minute access anomaly may lead to the loss of millions of transactions for a large e-commerce enterprise.) Alibaba Cloud HTTPDNS, for example, has a proprietary solution for making changes take effect quickly. When it is used with Alibaba Cloud DNS, changes to resolution results in authoritative DNS are quickly synchronized to HTTPDNS, overwriting the original cache records and helping users complete the domain name resolution switch within seconds.

In terms of DNS resolution latency, HTTPDNS is based on HTTP, and HTTP is based on TCP, so it involves more handshake round trips than traditional UDP transmission. In principle, therefore, the overhead of network requests is not reduced. In practice, however, we can achieve zero-latency DNS resolution through client-side policies. Next, let's look at the best practices of HTTPDNS services on mobile. The following figure shows the process of real-time effectiveness:

Best Practices for Domain Name Resolution

Through the HTTPDNS service, we can implement domain name hijacking prevention, precise scheduling, and real-time resolution effectiveness. However, the client also needs to do its part to optimize the DNS resolution overhead.

Pre-Resolving

The vast majority of apps have a startup period during the application initialization phase, and we can do some preparatory work in this period. For the hot domain names of the business, we can initiate asynchronous HTTPDNS resolution requests in the background during initialization. The pre-resolution results can be used directly in subsequent service requests, which eliminates the DNS resolution overhead of the first service request and improves the loading speed of the app's home page.
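Pre-resolving can be sketched as a set of background lookups whose results land in a shared cache. This is a minimal sketch, not the SDK's implementation. The resolver function is a hypothetical stand-in for whatever HTTPDNS lookup the app uses, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PreResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical HTTPDNS lookup: host -> IP, or null when resolution fails.
    private final Function<String, String> resolver;

    PreResolver(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Starts one background lookup per hot domain; the future completes when all are done. */
    CompletableFuture<Void> preResolve(List<String> hotHosts) {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (String host : hotHosts) {
            tasks.add(CompletableFuture.runAsync(() -> {
                String ip = resolver.apply(host);
                if (ip != null) {
                    cache.put(host, ip);
                }
            }));
        }
        return CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]));
    }

    /** Returns the pre-resolved IP, or null if no result is available yet. */
    String cachedIp(String host) {
        return cache.get(host);
    }
}
```

At startup the app would call preResolve with its hot domain names and continue initializing without waiting. Later requests read cachedIp and fall back to normal resolution when it returns null.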

There is one point everyone needs to pay attention to during the process of the client using HTTPDNS. A standard Web server (Nginx as an example) generally processes the value of the Host header in the HTTP request header as the domain name information of the HTTP request. (It depends on the server configuration, but this is generally the case.) For example, when we access the address www.aliyun.com/index.html through the standard network library, the network request is listed below:

> GET /index.html HTTP/1.1
> Host: www.aliyun.com
> User-Agent: curl/7.43.0
> Accept: */*

When using HTTPDNS, we need to replace the host in the URL of the HTTP request with the IP obtained from HTTPDNS resolution. (Note: The host here refers to the host component of the URL, not the Host header in the HTTP request.) Since the standard network library copies the host in the URL into the Host header of the HTTP request, the network request issued is listed below:

> GET /index.html HTTP/1.1
> Host: 140.205.63.8
> User-Agent: curl/7.43.0
> Accept: */*

The Host information above will cause the server to handle the request incorrectly. (The server is configured with domain name information instead of IP information. Imagine that our server serves two domain names, www.a.com and www.b.com, and then receives the request 140.205.63.8/index.html. How does it judge which home page it should return, a or b?) To solve this problem, we need to explicitly set the value of the Host header of the HTTP request. Let's take Android's official network library HttpURLConnection as an example:

import java.net.HttpURLConnection;
import java.net.URL;

// httpdns is an initialized HTTPDNS service instance
String originalUrl = "http://www.aliyun.com/index.html";
URL url = new URL(originalUrl);
String originalHost = url.getHost();
// Synchronously obtain the IP address
String ip = httpdns.getIpByHost(originalHost);
HttpURLConnection conn;
if (ip != null) {
    // If the IP address is successfully obtained through HTTPDNS, replace the host in the URL and set the Host header
    url = new URL(originalUrl.replaceFirst(originalHost, ip));
    conn = (HttpURLConnection) url.openConnection();
    // Set the Host request header to the original domain name
    conn.setRequestProperty("Host", originalHost);
} else {
    // No IP available: fall back to the original URL
    conn = (HttpURLConnection) url.openConnection();
}

After actively setting the Host header, the network request initiated is the same as the network request without the URL replaced.
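One detail worth noting is that String.replaceFirst treats its first argument as a regular expression and replaces the first match anywhere in the string. As a possible alternative, the following sketch rebuilds the URL from its parsed components so that only the host changes. The class and method names are illustrative and are not part of any SDK.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class IpUrlRewriter {
    /** Returns the URL with its host replaced by ip; protocol, port, path, and query are kept. */
    static String withHost(String originalUrl, String ip) {
        try {
            URL u = new URL(originalUrl);
            // getPort() is -1 when the URL has no explicit port, which the constructor accepts.
            // getFile() is the path plus the query string, if any.
            return new URL(u.getProtocol(), ip, u.getPort(), u.getFile()).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + originalUrl, e);
        }
    }
}
```

The Host header still has to be set to the original domain name on the resulting connection, exactly as in the example above.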

Intelligent Caching

The IP obtained through pre-resolving has a TTL, so we need to cache and manage it sensibly. The DNS cache granularity of the operating system is relatively coarse, and we can apply more fine-grained cache management on the client side to improve resolution efficiency. For example, the resolution results of CDN domain names differ between network operator environments. Caching resolution results separately for each operator lets us make network requests quickly after a network switch and reduces the additional overhead caused by DNS resolution. What's more, we can persist the cache locally and read it directly for network access the next time the app starts, which improves the loading speed of the first screen.
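Per-operator caching can be sketched by including a network identifier in the cache key. This is a minimal sketch under stated assumptions: the networkId strings are hypothetical labels the app would derive from its current connection, and TTL handling and persistence are left out for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NetworkAwareDnsCache {
    private final Map<String, String> ipByKey = new ConcurrentHashMap<>();

    // The same host is cached once per network, so results never leak across operators.
    private static String key(String networkId, String host) {
        return networkId + "|" + host;
    }

    void put(String networkId, String host, String ip) {
        ipByKey.put(key(networkId, host), ip);
    }

    /** Returns the IP cached for this host on this network, or null if there is none. */
    String get(String networkId, String host) {
        return ipByKey.get(key(networkId, host));
    }
}
```

When the device switches from one network to another and back, the entry cached for the earlier network is still available and can be used without a new resolution.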

Lazy Loading

The implementation of the lazy loading strategy allows us to achieve zero-latency DNS resolution. The core implementation of the lazy loading strategy is listed below:

Domain name resolution requests at the business layer only interact with the cache, and the network resolution requests do not occur. If there is a cache record, the cache record in the business layer is returned directly, regardless of whether it expires or not.
If the cache record has expired, the background will initiate an asynchronous network request for HTTPDNS resolution.
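The two rules above can be sketched as a cache that never blocks on the network. This is an illustration under stated assumptions, not the SDK's implementation: the resolver function is a hypothetical HTTPDNS lookup, a single fixed TTL is used for every record, a cache miss also triggers a background refresh, and the current time is passed in explicitly to keep the expiry logic easy to follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

final class LazyDnsCache {
    private static final class Entry {
        final String ip;
        final long expiresAtMillis;

        Entry(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Map<String, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Function<String, String> resolver; // hypothetical HTTPDNS lookup
    private final long ttlMillis;
    private final Executor executor;

    LazyDnsCache(Function<String, String> resolver, long ttlMillis, Executor executor) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
        this.executor = executor;
    }

    /** Returns whatever is cached, even if expired, and refreshes stale or missing records in the background. */
    String getIp(String host, long nowMillis) {
        Entry e = entries.get(host);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            refreshAsync(host, nowMillis);
        }
        return e == null ? null : e.ip;
    }

    private void refreshAsync(String host, long nowMillis) {
        if (refreshing.putIfAbsent(host, Boolean.TRUE) != null) {
            return; // a refresh for this host is already in flight
        }
        executor.execute(() -> {
            try {
                String ip = resolver.apply(host);
                if (ip != null) {
                    entries.put(host, new Entry(ip, nowMillis + ttlMillis));
                }
            } finally {
                refreshing.remove(host);
            }
        });
    }
}
```

The caller gets an answer immediately in every case. A null result means nothing has been cached yet, and the request can fall back to normal resolution while the background refresh completes.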

Some may have doubts. Isn't returning an expired IP against the original design intention of TTL? The behavior above indeed does not meet the standard specifications, but when we re-examine our business characteristics, this workaround turns out to be very meaningful. In most business scenarios, our backend IP addresses are several fixed nodes. Therefore, consecutive resolution results are likely to be identical in the same environment, which ensures the feasibility of lazy loading to a certain extent. Moreover, even if returning an expired IP address causes an access exception, the background quickly performs asynchronous resolution and updates the cache with the new IP addresses, so the business can retry and recover quickly. The impact of the behavior above is therefore very small. Furthermore, in most scenarios, IP addresses whose TTL has expired continue to provide service in a predictable way, so the business risks that lazy loading may cause are controllable. It is still very cost-effective to exchange a momentary access risk in 0.1% of scenarios for a better user experience in 99.9% of scenarios. (Whether to use lazy loading depends on the business scenario. If IP addresses change frequently in your business scenario and IP addresses whose TTL has expired are no longer reachable, we do not recommend the lazy loading strategy.)

The following figure depicts the implementation framework of pre-resolving + lazy loading:

In summary, achieving zero-latency resolution still requires a lot of work on the client. Alibaba Cloud HTTPDNS provides client SDKs to make it easier for developers to integrate and use the service on the terminal. We recommend trying it.
