How do CDNs work?

A CDN (content delivery network) is a technology widely used on the Internet. You often hear people say "our website uses a CDN", yet many of them know little about it beyond the fact that the site loads faster afterwards.

In fact, the principle behind a CDN is simple. When the browser requests a resource, the first step is DNS resolution, which is like looking up a number in an address book by name: the browser sends the domain name, and the DNS server returns an IP address. The browser then connects to the server at that IP address and fetches the resource. (DNS servers have many layers of caching, but that is beyond the scope of this article.)

For small sites or personal blogs, one domain name usually maps to one IP address, while a large site may map a single domain name to multiple IP addresses.
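To see what DNS returns for a name, you can ask the operating system's resolver directly. This is a minimal sketch using Python's standard `socket.getaddrinfo`; the helper name `resolve_ipv4` is hypothetical, and it only collects IPv4 addresses. Pointing it at a large site's domain may return several addresses, and which ones you get can depend on where you resolve from.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addresses: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr) and sockaddr is
    # (ip, port). The same IP repeats once per socket type, so deduplicate.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # localhost avoids needing network access; try a real domain as well.
    print(resolve_ipv4("localhost"))
```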

Distance affects connection speed when requesting a resource such as a web page, which is why accessing foreign websites from China is slower. To address this, some large companies deploy servers all over the world and keep their data synchronized. This network of servers is called a CDN, and the servers closest to users are called "edge servers".

DNS resolution

When a browser resolves a domain name served through a CDN, the process differs from that of a single-IP website: the DNS server looks for the most suitable server to handle the request, which in the simplest case is the edge server closest to where the request comes from. As shown in the figure below, if I send a request from Virginia to a site whose origin server is in the central United States, I get the address of an edge server on the east coast. If I send the same request from California, I get the address of an edge server on the west coast.

In other words, the first step in handling a request is finding the server closest to the requester. Some companies refine this further: for example, if the nearest server is at full capacity, subsequent requests are routed to another, less busy server. Either way, the CDN tries to find the most suitable server for each request.
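The selection step can be sketched as "sort edge servers by distance, then skip any that are overloaded". Everything here is an assumption for illustration: the `EdgeServer` type, the server names and approximate coordinates, the `load` metric, and the 0.9 threshold are hypothetical, and real CDNs typically infer the client's location from the resolver's IP address and weigh richer signals than straight-line distance.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees
    load: float  # hypothetical utilisation metric, 0.0 (idle) to 1.0 (full)


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_edge(client_lat: float, client_lon: float,
              edges: list[EdgeServer], max_load: float = 0.9) -> EdgeServer:
    """Return the nearest edge server whose load is below max_load.

    If every server is busy, fall back to the nearest one anyway.
    """
    by_distance = sorted(
        edges, key=lambda e: distance_km(client_lat, client_lon, e.lat, e.lon)
    )
    for edge in by_distance:
        if edge.load < max_load:
            return edge
    return by_distance[0]


if __name__ == "__main__":
    edges = [
        EdgeServer("east", 39.0, -77.5, load=0.3),
        EdgeServer("central", 41.9, -87.6, load=0.3),
        EdgeServer("west", 37.3, -121.9, load=0.3),
    ]
    print(pick_edge(37.5, -77.4, edges).name)   # client in Virginia
    print(pick_edge(34.0, -118.2, edges).name)  # client in California
```

With these made-up servers the Virginia client gets "east" and the California client gets "west"; raising the east server's load to 0.95 sends the Virginia client to "central" instead.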

Getting content
An edge server is a proxy cache, similar to a browser cache. When a request arrives at the edge server, it first checks whether it holds a fresh cached copy of the content. The cache key is the full URL, just as in a browser cache. If the content is cached and has not expired, the edge server returns the cached copy directly.

If the content is not cached, or the cached copy has expired, the edge server requests it from the origin server, caches the response, and returns it to the user.

Apache Traffic Server is an open source caching proxy used to manage these interactions between edge servers and origin servers. If you want to know more about how proxy caching works, I recommend reading this project's documentation.
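The hit/miss logic of an edge server fits in a few lines. This is a toy, in-memory sketch, not how Apache Traffic Server is implemented: the `EdgeCache` class and its single fixed `ttl` are assumptions (real proxies usually take freshness information from the origin's HTTP headers), and the origin server is just a function passed in. The cache key is the full URL, so `app.js` and `app.js?v=2` are cached separately.

```python
import time
from typing import Callable


class EdgeCache:
    """Toy proxy cache keyed by full URL; entries expire after ttl seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl
        self._clock = clock
        self._store: dict[str, tuple[float, bytes]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: fresh copy, the origin is not contacted
        body = self._fetch(url)  # miss or expired: ask the origin server
        self._store[url] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    origin_calls: list[str] = []

    def origin(url: str) -> bytes:
        origin_calls.append(url)
        return b"content of " + url.encode()

    fake_now = [0.0]
    cache = EdgeCache(origin, ttl=60, clock=lambda: fake_now[0])
    cache.get("https://cdn.example.com/app.js")      # miss
    cache.get("https://cdn.example.com/app.js")      # hit
    cache.get("https://cdn.example.com/app.js?v=2")  # different URL: miss
    fake_now[0] = 61.0
    cache.get("https://cdn.example.com/app.js")      # expired: miss
    print(len(origin_calls))
```

The injectable `clock` is a design choice that makes expiry testable without sleeping; the usage above reaches the origin three times for four requests.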

Example

One tool used in this kind of CDN service is a "combo handler", which combines requests for multiple files into a single request-response exchange.

The domain yui.yahooapis.com is part of the CDN service, so a request to it is routed to the edge server closest to you. A single combo request can ask for two files, yui-base-min.js and array-extras-min.js, and both arrive in one response. The combining logic does not run on the edge server; it runs only on the origin server.
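Since the combining happens on the origin, the handler itself is ordinary server code. The sketch below assumes a hypothetical URL format in which the query string simply lists file names separated by `&`; the `/combo` path, the `cdn.example.com` host, and the in-memory `FILES` table are made up for illustration, and a real handler would also validate paths and set content types.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the files stored on the origin server.
FILES = {
    "yui-base-min.js": "/* yui-base */\n",
    "array-extras-min.js": "/* array-extras */\n",
}


def combo_handler(url: str, files: dict[str, str] = FILES) -> str:
    """Concatenate every file named in the query string into one response body."""
    query = urlsplit(url).query
    names = [name for name in query.split("&") if name]
    unknown = [name for name in names if name not in files]
    if unknown:
        # A real handler would answer with an HTTP 404 here.
        raise KeyError(f"unknown files: {unknown}")
    return "".join(files[name] for name in names)


if __name__ == "__main__":
    body = combo_handler(
        "https://cdn.example.com/combo?yui-base-min.js&array-extras-min.js"
    )
    print(body)
```

The edge server needs no knowledge of this logic: it caches the combined response under the full combo URL, like any other response.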

What does static mean? When should you use a CDN?
Whenever I describe a system like the combo handler above, I often get confused looks. CDNs are sometimes confused with FTP servers, because in both cases static resources are uploaded for others to download. I hope the description above makes it clear that the two are not the same thing. The edge server is a proxy, and the origin server tells it what to return. The origin server can run Java, Ruby, Node.js, .NET, or anything else, so it can implement any logic. The edge server does nothing but forward requests and return content.

If a CDN is so efficient, why not serve everything on a website through it? Because the essence of a CDN is caching. A dynamic page changes on every request, so caching it is pointless: each request still needs a round trip between the edge server and the origin server.

This is why JavaScript, CSS, images, Flash, audio, video, and similar files are particularly well suited to a CDN: they do not change from request to request, and every user receives the same content. Once the CDN caches them, all users benefit.
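One way to act on this distinction is for the origin server to choose a caching policy per file type when it builds each response. This is a sketch under assumptions: the extension list only mirrors the file types named above, `cache_control_for` is a hypothetical helper, and the one-year `max-age` is an illustrative choice. `public`, `max-age`, and `no-store` are standard `Cache-Control` directives.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# File types described above as good CDN candidates (illustrative, incomplete).
STATIC_EXTENSIONS = {
    ".js", ".css",                   # scripts and stylesheets
    ".png", ".jpg", ".gif", ".svg",  # images
    ".swf",                          # Flash
    ".mp3", ".mp4",                  # audio and video
}
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_control_for(url: str) -> str:
    """Choose a Cache-Control header value from the URL's file extension."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Same bytes for every user: browsers and the CDN may keep it a long time.
        return f"public, max-age={ONE_YEAR_SECONDS}"
    # Dynamic pages change per request, so tell caches not to store them.
    return "no-store"


if __name__ == "__main__":
    print(cache_control_for("https://cdn.example.com/app.js"))
    print(cache_control_for("https://www.example.com/account"))
```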

Cache expiration

Performance guidelines recommend serving static resources with a cache expiration header in the HTTP response (such as Expires or Cache-Control). This has two effects: the browser caches the resource, and the CDN also caches it for a period of time. It also means you cannot reuse a filename for changed content: the old version is cached in at least two places, so users might never receive the latest version of the file.

There are several ways to solve this problem. The YUI library, for example, puts each version of the library in its own directory. It is also common to add an identifier to the filename, such as an MD5 hash of the content or a revision number from version control. Either way, each new version gets a new URL, so users still get the latest file even though older versions are cached with long expiration times.
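Content hashing is easy to automate in a build step. In this minimal sketch the helper name `fingerprinted_name` and the 8-character truncation are assumptions; the digest is inserted just before the extension (so the file still ends in `.js`) rather than after it, a common variant of adding an identifier to the filename. MD5 serves here only as a change detector, not for security.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short MD5 digest of the content before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(fingerprinted_name("static/app.js", b"console.log('v1');"))
    print(fingerprinted_name("static/app.js", b"console.log('v2');"))
```

Any change to the content produces a new name, and therefore a new URL and a new cache key, so the old and new versions never collide in the browser or CDN cache.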

Epilogue

CDNs are already an important part of the Internet, and they will only become more important over time. Some companies are already working to move more functionality onto edge servers to give users a faster experience. One example is Edge Side Includes (ESI), a technique that lets edge servers cache parts of a page separately and assemble the page from them.
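To make the idea concrete, here is a toy assembler for the `<esi:include src="..."/>` tag defined by ESI. It is a sketch, not a conforming ESI processor: it handles only the `src` attribute of self-closing include tags, and `fetch_fragment` is a hypothetical callback (on a real edge server each fragment could come from its own cache entry with its own expiration time). The page template can be cached for a long time while a frequently changing fragment is fetched separately.

```python
import re
from typing import Callable

# Matches self-closing include tags such as <esi:include src="/fragments/header"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace each esi:include tag with the fragment fetched for its src URL."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), template)


if __name__ == "__main__":
    page = '<html><esi:include src="/fragments/greeting"/><p>Static body</p></html>'
    fragments = {"/fragments/greeting": "<p>Hello, Alice</p>"}
    print(assemble(page, fragments.__getitem__))
```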

Understanding how CDNs work is the key to getting the best performance out of them.
