CDN architecture and principle analysis
1. CDN Overview
CDN stands for Content Delivery Network. Its purpose is to add a new cache layer to the existing Internet and publish a website's content to the nodes at the network "edge" closest to users, so that users can fetch the content they need from a nearby location and the site responds faster. Technically, it addresses problems such as limited network bandwidth, heavy user traffic, and unevenly distributed points of presence, all of which slow down access to a website.
The cache layer relieves the congestion that peak traffic causes on node devices. Because the cache server stores copies of content, most web page objects, such as page files (html, htm, php), image files (gif, tif, png, bmp), and files in other formats, do not need to be retransmitted from the origin site on repeated visits while they are within their validity period (TTL). A simple freshness validation, which exchanges a header of only a few dozen bytes, is enough, and the local copy can then be sent directly to the visitor. Since the cache server is usually deployed close to the client, response times approach those of a local area network and wide-area bandwidth consumption drops. Caching therefore improves response speed, saves bandwidth, and significantly reduces the load on the origin server.
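The caching behaviour described above can be illustrated with a minimal sketch. It is not taken from any real cache server: the Origin and CacheServer classes, the ETag-style version string, and the injectable clock are all assumptions made for illustration. A fresh object is served locally, an expired object is revalidated with a header-only check, and the full body is transferred only on a miss or when the origin copy has changed.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: bytes
    etag: str          # version tag used for freshness validation
    expires_at: float  # time after which the copy must be revalidated


class Origin:
    """Hypothetical stand-in for the origin web server."""

    def __init__(self):
        self.objects = {"/logo.png": (b"PNG-DATA", "v1")}
        self.full_transfers = 0  # counts transfers of the full file body

    def fetch(self, path):
        self.full_transfers += 1
        return self.objects[path]

    def validate(self, path, etag):
        # Header-only exchange: True means the cached copy is still current.
        return self.objects[path][1] == etag


class CacheServer:
    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None:
            if now < entry.expires_at:
                return entry.body  # within TTL: serve the local copy
            if self.origin.validate(path, entry.etag):
                entry.expires_at = now + self.ttl  # still valid: renew TTL
                return entry.body
        # Miss, or the origin copy changed: transfer the full body again.
        body, etag = self.origin.fetch(path)
        self.store[path] = CachedObject(body, etag, now + self.ttl)
        return body
```

In this sketch the counter full_transfers only increases on the first request and after the origin object changes, which is the bandwidth saving the paragraph describes.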
Depending on what is being accelerated, caching is divided into client acceleration and server acceleration:
Client acceleration: the cache is deployed at the network egress and stores frequently accessed content locally, improving response speed and saving bandwidth.
Server acceleration: the cache is deployed in front of the web server as a caching proxy (reverse proxy), offloading the web server and speeding up access.
When multiple cache servers are distributed across different regions, the cache network must be managed by an effective mechanism that directs users to a nearby server (for example, through DNS) and balances traffic globally. This is the basic idea behind a CDN.
A CDN optimizes the network mainly in the following ways:
- Solving the "first mile" problem on the server side
- Alleviating or even eliminating the impact of interconnection bottlenecks between different operators
- Reducing the egress bandwidth pressure of each province
- Relieving pressure on the backbone network
- Optimizing the distribution of hot content on the Internet
2. How CDNs work
2.1. Traditional access process (no cache acceleration)
Let's first look at how a site is accessed without a cache service, so that the difference from CDN-cached access is clear:
As the figure above shows, a user accesses a website that does not use a CDN cache as follows:
The user enters the domain name, and the operating system asks LocalDns (the local DNS resolver) for the domain's IP address.
LocalDns asks the root DNS for the domain's authoritative server (assuming the LocalDns cache entry has expired).
The root DNS returns the domain's authoritative DNS record to LocalDns.
With that record, LocalDns asks the domain's authoritative DNS for the domain's IP address.
The authoritative DNS looks up the record and returns it to LocalDns.
LocalDns returns the IP address to the client.
With the IP address in hand, the user's client sends its request to the site server.
The site server handles the request and returns the content to the client.
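The resolution steps above can be sketched as a toy model. Nothing here talks to a real DNS server: the tables ROOT_REFERRALS and AUTHORITATIVE_RECORDS, the domain www.example.com, and the address 203.0.113.10 are placeholders assumed for illustration, and the root is treated as referring directly to the authoritative server, as in the simplified description.

```python
# Hypothetical data: the authoritative server the root refers each zone to,
# and the address records each authoritative server holds.
ROOT_REFERRALS = {"example.com": "ns1.example.com"}
AUTHORITATIVE_RECORDS = {
    "ns1.example.com": {"www.example.com": "203.0.113.10"},
}


class LocalDns:
    """Toy local resolver that caches answers and logs each query it makes."""

    def __init__(self):
        self.cache = {}
        self.log = []

    def resolve(self, name):
        if name in self.cache:
            self.log.append("cache hit")
            return self.cache[name]
        # Ask the root which server is authoritative for the zone.
        zone = ".".join(name.split(".")[-2:])
        self.log.append(f"ask root for {zone}")
        name_server = ROOT_REFERRALS[zone]
        # Ask the authoritative server for the address itself.
        self.log.append(f"ask {name_server} for {name}")
        ip = AUTHORITATIVE_RECORDS[name_server][name]
        self.cache[name] = ip
        return ip
```

Calling resolve twice for the same name shows why the walkthrough assumes an expired cache: the second call is answered from the resolver's cache without contacting the root or the authoritative server.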
2.2. CDN access process (using cache service)
A CDN adds a cache layer between the user and the server. It works mainly by taking over DNS resolution so that user requests are directed to a cache, which serves data obtained from the origin server.
Now let's look at how a website behind a CDN cache is accessed:
As the figure above shows, with a CDN cache the access process becomes:
The user enters the domain name, and the operating system asks LocalDns for the domain's IP address.
LocalDns asks the root DNS for the domain's authoritative server (assuming the LocalDns cache entry has expired).
The root DNS returns the domain's authoritative DNS record to LocalDns.
With that record, LocalDns asks the domain's authoritative DNS for the domain's IP address.
The authoritative DNS looks up the record (usually a CNAME) and returns it to LocalDns.
With that record, LocalDns asks the intelligent scheduling DNS for the domain's IP address.
The intelligent scheduling DNS applies its algorithms and policies (such as static topology or node capacity) and returns the IP address of the most suitable CDN node to LocalDns.
LocalDns returns the IP address to the client.
With the IP address in hand, the user's client sends its request to the CDN node at that address.
The CDN node server handles the request and returns the content to the client. (If the content had to be fetched from the origin, the cache server stores it locally for later use and also returns it to the client, completing the request.)
From this analysis it follows that, to keep the service transparent to ordinary users (the client needs no configuration changes once the cache is in use), DNS must be used to direct users to the cache server. Since domain name resolution is the first step in visiting a website, changing the DNS answer is the simplest and most effective way to steer users.
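The DNS redirection can be sketched as follows. This is a simplified model rather than any provider's implementation: the domain names, the node addresses, and the idea of passing the resolver's operator as a plain string are assumptions for illustration. The site's authoritative DNS answers with a CNAME, and the scheduling DNS turns that CNAME into the address of a CDN node.

```python
# Hypothetical record held by the website's own authoritative DNS:
# the accelerated name is an alias for a name in the CDN provider's zone.
AUTHORITATIVE = {
    "img.example.com": ("CNAME", "img.example.com.cdn-provider.example"),
}

# Hypothetical CDN node addresses, one per operator.
CDN_NODES = {"telecom": "198.51.100.1", "netcom": "198.51.100.2"}


def scheduling_dns(cdn_name, resolver_operator):
    """Return an A record for the CDN name, chosen by the resolver's operator."""
    address = CDN_NODES.get(resolver_operator, CDN_NODES["telecom"])
    return ("A", address)


def resolve(name, resolver_operator):
    """Follow the CNAME from the site's DNS to the CDN's scheduling DNS."""
    record_type, value = AUTHORITATIVE[name]
    hops = [name]
    while record_type == "CNAME":
        hops.append(value)
        record_type, value = scheduling_dns(value, resolver_operator)
    return value, hops
```

The client only ever asks for img.example.com; the alias and the choice of node happen inside DNS, which is why no client-side configuration is needed.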
2.3. Elements of a CDN network
To an ordinary Internet user, each CDN node behaves like a copy of the website's server placed nearby.
By taking over DNS, the CDN transparently directs the user's request to the nearest node, and the CDN server in that node answers the request just as the site's origin server would.
Because the node is closer to the user, the response is usually faster.
The part enclosed by the dotted circle in the figure above is the CDN layer, which sits between the client and the site server.
Intelligent scheduling DNS (such as F5's 3DNS)
The intelligent scheduling DNS is a key system in a CDN service. When a user visits a website that uses the CDN service, the domain name resolution request is ultimately handled by the intelligent scheduling DNS.
Using a set of predefined policies, it gives the user the address of the node closest to them at that moment, so that the user is served quickly.
It also has to stay in contact with the CDN nodes in each location, tracking their health and capacity, so that user requests are assigned to the nearest available node.
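One way to picture "nearest available node" is the sketch below. It is an assumed policy, not the algorithm of 3DNS or any other product: the Node fields, the numeric distance, and the rule of skipping unhealthy or full nodes are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    address: str
    distance: int        # hypothetical cost from the requester; lower is closer
    healthy: bool = True
    capacity: int = 100  # maximum concurrent load the node accepts
    load: int = 0        # current load reported by the node


def pick_node(nodes):
    """Return the closest node that is healthy and below capacity, or None."""
    candidates = [n for n in nodes if n.healthy and n.load < n.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.distance)
```

For example, if the closest node reports itself unhealthy or is already at capacity, pick_node falls back to the next closest one, and returns None only when no node can take the request.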
Cache service
Load balancing equipment (such as LVS or F5's BIG-IP)
Content cache servers (such as Squid)
Shared storage (needed or not depending on the volume of cached data)
3. Example analysis of a CDN's intelligent scheduling DNS
Analyze the img.alibaba.com domain name
Running the dig command on the system gives the following output:
Summary: when a website uses a CDN service, it usually points the domain to be accelerated at a domain of the CDN provider via a CNAME record.
Both the caching service and the scheduling are then handled by the provider.
4. Simplified implementation of a CDN's intelligent scheduling DNS
4.1. Scheduling Policy Description
When a user requests resolution of a domain name, the smart DNS examines the IP address of the user's LocalDns, matches it against the IP range table held by the DNS server to determine whether the user is a Telecom or Netcom user, and returns the corresponding IP address.
This is the static topology method, which only looks at the IP address of LocalDns. For more sophisticated scheduling algorithms, consider a commercial product such as F5's 3DNS.
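The static topology policy amounts to a range lookup, which can be sketched with Python's standard ipaddress module. The operator ranges and node addresses below are documentation placeholders, not real Telecom or Netcom allocations, and the "default" fallback for unmatched resolvers is an assumption added for completeness.

```python
import ipaddress

# Hypothetical IP ranges of each operator's LocalDns servers.
OPERATOR_RANGES = {
    "telecom": [ipaddress.ip_network("192.0.2.0/24")],
    "netcom": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical CDN node address returned for each operator.
NODE_ADDRESS = {
    "telecom": "203.0.113.1",
    "netcom": "203.0.113.2",
    "default": "203.0.113.3",
}


def operator_for(local_dns_ip):
    """Classify a LocalDns address by the operator range that contains it."""
    ip = ipaddress.ip_address(local_dns_ip)
    for operator, networks in OPERATOR_RANGES.items():
        if any(ip in network for network in networks):
            return operator
    return "default"


def answer_for(local_dns_ip):
    """Return the node address the smart DNS would hand to this LocalDns."""
    return NODE_ADDRESS[operator_for(local_dns_ip)]
```

Note that the decision depends only on the resolver's address, never on the end user's own address, which is the source of the limitations of this scheme.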
4.2. Hypothetical CDN node planning
Here we use BIND's view feature to distinguish operators. Suppose we have one CDN node in each operator's data center, as listed below:
Domain Name Operator (view) Service Address
The following is an excerpt from the named.conf configuration file showing only the view-related part; for other details, consult online references.
The authoritative DNS determines the IP address of the LocalDns that the user is using and matches it against the IP ranges configured above. If the address falls within the Netcom range, it returns the Netcom node's IP address (192.168.0.1) to LocalDns, and likewise for the other operators.
LocalDns returns the resolved IP address to the client, which completes the domain name resolution.
Note: for brevity, this walkthrough omits the CNAME step between the primary DNS and the smart DNS.
The static topology method (matching by IP range) used here is also known as the regionalization method; it considers only the IP address of LocalDns.
Problems with this simplified scheme
If the user has configured the wrong DNS server, access may become slower than before (for example, a Netcom user who uses a China Telecom DNS server).
The health and capacity of the CDN node servers cannot be assessed, so users may be directed to unavailable CDN nodes.
Because the topology is static, the CDN node a user reaches may not be the best or fastest one.
Other unexpected problems may also arise.
5. Summary
When building a CDN, the most critical component is the intelligent scheduling DNS, which coordinates the entire CDN. With an efficient scheduling algorithm, users get the best access experience.
The second is the management of CDN nodes, including content synchronization and configuration file updates, all of which need a reliable set of mechanisms behind them.
Of course, large websites should also weigh the cost of building a CDN system against the return on investment.