CDN (Content Distribution Network) Technical Principles
1. Introduction
The rapid development of the Internet has brought great convenience to work and life, and expectations for Internet service quality and access speed keep rising. Although bandwidth keeps growing, the number of users grows as well, and because of Web server load, transmission distance, and similar factors, slow response times remain a common source of complaints. The solution is to apply caching technology to network transmission so that Web data can be served from a nearby location. This is a very effective technique for optimizing network data delivery and provides both a fast experience and quality assurance.
The purpose of network caching technology is to reduce the repeated transmission of identical data across the network as much as possible, converting wide-area transfers into local or nearby access. Most of the content transmitted on the Internet is repeated Web/FTP data, so cache servers and network devices that use caching can greatly improve data-link performance and relieve the congestion that peak traffic causes at node devices. Because a cache server keeps copies of content, most Web page objects, such as page files (html, htm, php), image files (gif, tif, png, bmp), and files in other formats, do not need to be retransmitted from the origin site for repeated visits within their validity period (TTL). A simple freshness validation, an exchange of headers only a few tens of bytes long, is enough, after which the local copy is delivered directly to the visitor. Since the cache server is usually deployed close to the client, it achieves response times close to those of a local area network while greatly reducing wide-area bandwidth consumption. According to statistics, more than 80% of Internet users repeatedly visit 20% of the information resources, which is the precondition that makes caching worthwhile. A cache server's architecture also differs from a Web server's, and a cache server can achieve higher performance than a Web server. Caching therefore not only improves response speed and saves bandwidth, it also accelerates the Web server itself by effectively reducing the load on the origin server.
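As an illustration of the freshness validation mentioned above, the revalidation exchange is an ordinary conditional HTTP request. The host name, path, and dates below are hypothetical placeholders:

    GET /images/logo.gif HTTP/1.1
    Host: www.example.com
    If-Modified-Since: Mon, 07 Apr 2003 08:00:00 GMT

    HTTP/1.1 304 Not Modified
    Date: Mon, 14 Apr 2003 10:00:00 GMT

Only a few tens of bytes of headers cross the wide-area link; the cache then serves its local copy of logo.gif to the visitor.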
A cache server is a purpose-built server with tightly integrated software and hardware. It mainly provides cache acceleration services and is generally deployed at the edge of the network. Depending on what is being accelerated, there are two modes: client-side acceleration and server-side acceleration. In client-side acceleration, the Cache is deployed at the network egress and stores frequently accessed content locally, improving response speed and saving bandwidth. In server-side acceleration, the Cache is deployed in front of the Web servers as a front-end processor, improving Web server performance and access speed. When multiple Cache servers are distributed across different regions, an effective mechanism is needed to manage the Cache network, direct users to the nearest node, and load-balance traffic globally. This is the basic idea of the CDN content delivery network.
2. What are CDNs?
CDN stands for Content Delivery Network, that is, a content distribution network. Its purpose is to add a new layer of network architecture on top of the existing Internet that publishes a website's content to the network "edge" closest to the user, so that users can obtain the content they need nearby. This relieves Internet congestion and improves the responsiveness of access to the website. Technically, it addresses the root causes of slow website access: limited network bandwidth, large visitor volumes, and the uneven distribution of points of presence.
In a narrow sense, a CDN is a new way of building networks: an overlay layer on top of the traditional IP network that is specially optimized for distributing broadband rich media. In a broad sense, a CDN represents a network service model based on quality and order. Simply put, a CDN is a strategically deployed overall system that includes distributed storage, load balancing, network request redirection, and content management. Content management and global traffic management are the core of a CDN. By weighing user proximity and server load, the CDN ensures that content serves user requests in an extremely efficient manner. In general, the content service is based on cache servers, also known as surrogates (proxy caches), located at the edge of the network, only a "single hop" away from the user. At the same time, the proxy cache is a transparent mirror of the content provider's origin server (usually located in the data center of the CDN service provider). Such an architecture allows the CDN provider, on behalf of its customers, the content providers, to deliver the best possible experience to end users who cannot tolerate delays in response time. According to statistics, CDN technology can handle 70% to 95% of a site's page content requests, reducing the pressure on the origin server and improving the performance and scalability of the website.
Compared with the prevailing content distribution model, the CDN emphasizes the role of the network itself in content delivery. By introducing an active content management layer and global load balancing, the CDN differs fundamentally from the traditional model. In the traditional model, content distribution is handled entirely by the ICP's application servers, while the network is only a transparent data transmission channel that cannot differentiate quality of service by content object. Moreover, because the IP network is "best effort", quality assurance can only be achieved by provisioning end-to-end bandwidth between the user and the application server that is far larger than actually needed. In such a model, not only is a large amount of valuable backbone bandwidth consumed, but the load on the ICP application servers also becomes very heavy and unpredictable. When hotspot events and traffic surges occur, a local hotspot effect arises and the application servers become overloaded and stop serving. Another defect of this central-server model is the lack of personalized service and the distortion of the broadband service value chain: content providers end up running content distribution services that they should not, and cannot, do well.
Looking at the whole value chain of broadband services, content providers and users sit at its two ends, with network service providers connecting them in the middle. As the Internet industry matures and business models evolve, the roles in this chain become more and more specialized: content/application operators, hosting service providers, backbone network service providers, access service providers, and so on. Each role must cooperate with the others and fulfill its own duties in order to provide customers with good service and achieve a win-win outcome. From the point of view of how content and network are combined, content publishing has passed through two stages: the ICP's own content (application) servers, and the IDC. The IDC boom also spawned the role of the managed hosting provider. However, the IDC does not solve the problem of effective content distribution: with content still located at the center of the network, neither the consumption of backbone bandwidth nor the establishment of traffic order on the IP network can be addressed. Pushing content to the edge of the network and serving users from nearby locations is therefore the obvious choice for guaranteeing quality of service and access order across the whole network, and this is exactly the content delivery network (CDN) service model. The CDN resolves the "centralize or decentralize" dilemma that troubles content operators and is undoubtedly a valuable, indispensable part of a healthy Internet value chain.
3. CDN New Applications and Clients
Current CDN services are mainly used in securities, finance and insurance, ISPs, ICPs, online transactions, portal websites, large and medium-sized companies, and online teaching. They can also be used in industry private networks and the Internet, and even for optimizing local area networks. With a CDN, these websites do not need to invest in many expensive servers or set up sub-sites. This is especially significant for streaming media, distance-learning courseware, and other bandwidth-hungry content: replicating such content to the edge of the network minimizes the distance between the point where content is requested and the point where it is delivered, and thereby improves the performance of the Web site. CDN networks are built in several ways: CDNs built by enterprises to serve the enterprise itself; IDC CDNs that mainly serve the IDC and its value-added services; CDNs built by network operators, mainly to provide content push services; and CDNs built by specialized CDN providers as a service. In the last case, users cooperate with the CDN organization: the CDN is responsible for delivering the information, ensuring that it is transmitted normally, and maintaining the delivery network, while the website only has to maintain its content and no longer needs to worry about traffic.
A CDN can thus provide guarantees for the speed, security, stability, and scalability of the network.
An IDC can establish its own CDN network. IDC operators generally have multiple IDC centers in different locations, and the service targets are the customers hosted in those centers. Using existing network resources, the investment is small and the network is easy to build. For example, if an IDC has 10 server rooms across the country, then joining the IDC's CDN network with a web server hosted at one node is equivalent to having 10 mirror servers for customers to access nearby. In a broadband metropolitan area network, speed within the metro network is very high, while bandwidth leaving the city is usually the bottleneck. To preserve the high-speed experience of the metro network, the solution is to cache Internet content locally and deploy a Cache at each POP of the metropolitan area network. This forms an efficient, orderly network in which users can reach most content within a single hop. This is also a CDN application that accelerates all websites.
4. How CDNs work
Before describing how a CDN is implemented, let us first look at the access process of a traditional, uncached service, in order to understand the difference between CDN-cached access and uncached access:
The process for a user to access a website that does not use a CDN cache is:
1) The user provides the browser with the domain name to be accessed;
2) The browser calls the domain name resolution library to resolve the domain name and obtain the IP address corresponding to it;
3) The browser sends a data access request to the service host at the obtained IP address;
4) The browser displays the content of the web page according to the data returned by that host.
Through these four steps, the browser completes the entire process from receiving the domain name the user wants to visit to obtaining data from that domain's service host. A CDN adds a Cache layer between the user and the server; guiding the user's request to the Cache, which in turn fetches data from the origin server, is achieved mainly by taking over DNS. Let us look at the access process after the CDN cache is introduced:
With the CDN cache in place, the access process becomes:
1) The user provides the browser with the domain name to be accessed;
2) The browser calls the domain name resolution library to resolve the domain name. Because the CDN has adjusted the resolution process, the library generally obtains a CNAME record corresponding to the domain name; to get an actual IP address, the browser must then resolve that CNAME domain name. In this step the global load-balancing DNS is used, which, for example, chooses the IP address according to geographic information so that the user is served by a nearby node (a resolution sketch follows this list);
3) This resolution yields the IP address of a CDN cache server, and the browser sends its access request to that cache server;
4) Based on the domain name supplied by the browser, the cache server obtains the real IP address of the domain through the CDN's internal dedicated DNS and then submits an access request to that address;
5) After the cache server obtains the content from the real address, it saves a local copy for later use and, at the same time, returns the data to the client, completing the data service;
6) The client displays the data returned by the cache server, completing the entire browsing request.
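A minimal sketch of the resolution chain in step 2, reusing the example names from the configuration section below; the node addresses are hypothetical:

    www.linuxaid.com.cn.   IN  CNAME  cache.cdn.com.
    ; the global load-balancing DNS then answers for cache.cdn.com
    ; with the address of the node nearest to the requesting client:
    cache.cdn.com.         IN  A      10.1.1.10    ; returned to clients near node 1
    cache.cdn.com.         IN  A      10.2.2.10    ; returned to clients near node 2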
From this analysis we can see that, in order to stay transparent to ordinary users (that is, after the cache is added, the client needs no configuration and can still access the accelerated website by its original domain name), to provide acceleration for the designated website, and to minimize the impact on the ICP, only the domain name resolution part of the access process needs to be modified to achieve a transparent acceleration service. The specific steps of a CDN deployment are as follows.
1) The ICP only needs to hand over authority for resolving its domain name to the CDN operator and does not need to change anything else. In practice, the ICP modifies the resolution record of its own domain name, generally using a CNAME that points to a cache server address in the CDN network.
2) The CDN operator first has to provide public resolution for the ICP's domain name; in order to apply the sortlist later, it generally points the result of resolving the ICP domain name at a CNAME record.
3) Where a sortlist is required, the CDN operator applies special handling to the resolution of the domain name the CNAME points to, so that on receiving a client request the DNS server returns different IP addresses for the same domain name depending on the client's IP address.
4) Because the request that reaches the Cache carries only the hostname being visited and not the origin's address, the Cache must still learn the IP address of the source server; the CDN operator therefore maintains an internal DNS server that resolves the real IP address of the domain name the user is visiting.
5) Alongside the internal DNS server, an authorization mechanism must also be maintained to control which domain names may be cached and which may not, so as to avoid becoming an open proxy (a Squid-style sketch of such a restriction follows this list).
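As a concrete illustration of point 5, one simple way to express such a restriction on the Cache itself is a Squid-style access list; the second domain below is a hypothetical customer:

    # only serve requests for domains that have joined the CDN; refuse everything else
    acl cdn_sites dstdomain www.linuxaid.com.cn www.customer2.example
    http_access allow cdn_sites
    http_access deny all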
5. Technical means of CDN
The main technical means of implementing a CDN are cache servers and mirror servers. A CDN can work through DNS resolution or HTTP redirection, and it delivers and synchronizes content through Cache servers or remote mirror sites. The accuracy of judging a user's location is better than 85% with the DNS method and better than 99% with the HTTP method. In general, the ratio of the data volume flowing into each cache server group from user accesses to the data volume the cache servers fetch from the original website is between 2:1 and 3:1; that is, the caches absorb 50% to 70% of the repeated-access traffic to the origin (mainly images, streaming media files, and the like). With mirroring, apart from the data-synchronization traffic, everything is served locally without contacting the origin server.
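For the HTTP-redirection method mentioned above, the redirect itself is just an ordinary 302 response that points the client at a chosen node; the host names below are hypothetical:

    GET /video/demo.rm HTTP/1.1
    Host: www.example.com

    HTTP/1.1 302 Found
    Location: http://node1.cache.example.net/video/demo.rm

Because the server issuing the redirect sees the client's own IP address rather than that of the client's DNS resolver, this method locates users more accurately than the DNS method.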
The mirror site server is the approach we see most often. It allows content to be distributed directly and is suitable for synchronizing static and quasi-dynamic data. However, purchasing and maintaining new servers is relatively expensive, mirror servers must be set up in every region, and professional staff must be assigned to manage and maintain them. For a large website that updates its servers everywhere at any time, the bandwidth demand also rises sharply, so typical Internet companies do not build very many mirror servers.
Caching is cheaper and suits static content. Internet statistics show that more than 80% of users repeatedly visit 20% of a website's content. Under this rule, cache servers can handle most of the clients' static requests, while the original WWW server only has to handle the remaining roughly 20% of requests, the non-cacheable and dynamic ones. This greatly speeds up response times for client requests and reduces the load on the original WWW server. According to a survey by IDC Corporation of the United States, the cache market, an important indicator for CDNs, was growing at nearly 100% per year, with global turnover expected to reach 4.5 billion US dollars by 2004. The growth of online streaming media will further stimulate demand in this market.
6. CDN network architecture
The CDN network architecture consists of two parts: the center and the edge. The center refers to the CDN network management center and the DNS redirection and resolution center, which is responsible for global load balancing; its equipment is installed in the management center's server room. The edge mainly refers to the remote nodes, the actual carriers of CDN delivery, which consist primarily of Caches and load balancers.
When a user visits a website that has joined the CDN service, the domain name resolution request is ultimately handed to the global load-balancing DNS for processing. Using a set of pre-defined policies, the global load-balancing DNS gives the user the address of the node currently closest to them, so that the user gets fast service. At the same time, it keeps in contact with all the CDN nodes distributed around the world, collects each node's status, and ensures that user requests are not assigned to unavailable CDN nodes. In effect, it performs global load balancing through DNS.
To an ordinary Internet user, each CDN node is equivalent to a Web server placed nearby. Under the control of the global load-balancing DNS, the user's request is transparently directed to the nearest node, and the CDN server at that node responds to the request just as the website's origin server would. Because it is closer to the user, the response time is necessarily shorter.
Each CDN node consists of two parts: a load balancing device and cache servers.
The load balancing device is responsible for balancing load across the Caches within the node to keep the node working efficiently; it also collects information about the node and its surroundings and stays in contact with the global load-balancing DNS, enabling load balancing across the whole system.
The cache server is responsible for storing a large amount of the customer website's content and, like a web server close to the user, responds to local users' access requests.
The CDN management system guarantees the normal operation of the whole system. It monitors every subsystem and device in real time and raises alarms for all kinds of faults; it also tracks the total traffic of the system and the traffic of each node in real time and stores the data in the system database, so that network administrators can easily carry out further analysis. Through the management system, administrators can also modify the system configuration.
In theory, the simplest CDN only needs one DNS server responsible for global load balancing and one Cache per node to run: the DNS resolves different IPs according to the user's source IP address so that users are served nearby. To ensure high availability and the like, the traffic and health of each node must be monitored. When a single Cache cannot handle a node's load, multiple Caches are required, and when multiple Caches work together a load balancer is needed so that the Cache group can act as one.
7. CDN example
A commercial CDN is run as a service and has very high requirements for availability; professional products and complete CDN solutions exist for this. This article mainly explains the CDN implementation process from a theoretical standpoint, and then uses an existing network environment and open-source software to perform an actual configuration, in order to gain a deeper understanding of how a CDN really works.
Linux is a free, open-source operating system that has been used successfully in many critical areas. Bind is the best-known DNS server on Unix-like platforms such as Unix, FreeBSD, and Linux; more than 60% of the DNS servers on the Internet run Bind. The latest version of Bind is 9.x, although 8.x is still more commonly deployed. Bind 9 has many new features, one of which is the ability to return different IP addresses for the same domain name according to the client's source address. With this feature, access to the same domain name can be directed to servers at different geographic nodes. Squid is a well-known cache engine on Linux and other operating systems. Its performance is lower than that of commercial cache engines, but its basic functions and working principles are the same as those of commercial Cache products, and for testing purposes it is very easy to configure and run. The CDN configuration process is briefly introduced below.
1. For a website to join the CDN service, a domain name (for example www.linuxaid.com.cn, address 202.99.11.120) must hand its resolution rights to the CDN operator. In Linuxaid's own resolution records, all that is needed is to change the A record of the www host into a CNAME pointing to cache.cdn.com, where cache.cdn.com is the cache server identifier defined by the CDN network. The change is made in the zone file /var/named/linuxaid.com.cn.
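A minimal sketch of the change in /var/named/linuxaid.com.cn; the record values come from the example above, and the rest of the zone file is omitted:

    ; before joining the CDN
    www     IN  A      202.99.11.120
    ; after joining the CDN
    www     IN  CNAME  cache.cdn.com.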
2. After the CDN operator takes over resolution of the domain name, it sees that the domain's CNAME record points to a cache server hostname inside the CDN network, such as cache.cdn.com. The CDN's global load-balancing DNS then has to resolve that CNAME record according to policy, generally returning the address of the Cache that gives the nearest access.
Bind 9 can natively return different IPs according to the source IP address range, achieving region-based, nearest-access load balancing. In general, the sortlist option of Bind 9 can be used to return the nearest node's IP address according to the client's IP address. The process is as follows:
1) Set multiple A records for cache.cdn.com in the zone file /var/named/cdn.com.
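A minimal sketch of such records; the node addresses are hypothetical placeholders, one per CDN node:

    cache   IN  A   10.1.1.10    ; node 1
    cache   IN  A   10.2.2.10    ; node 2
    cache   IN  A   10.3.3.10    ; node 3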
2) Configure the sortlist option in /etc/named.conf so that the answers for cache.cdn.com are ordered according to the client's address.
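A minimal sketch of the relevant options block, with hypothetical client networks matched to the hypothetical node addresses above. Note that sortlist does not hide the other addresses; it only places the preferred one first in the answer, which most clients then use:

    options {
        directory "/var/named";
        sortlist {
            { 10.1.0.0/16; { 10.1.1.10; }; };   // clients in 10.1/16 see node 1 first
            { 10.2.0.0/16; { 10.2.2.10; }; };   // clients in 10.2/16 see node 2 first
            { 10.3.0.0/16; { 10.3.3.10; }; };   // clients in 10.3/16 see node 3 first
        };
    };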
3. If the Cache works in server-acceleration mode in the CDN network, the URL of the accelerated origin server is already written in its configuration, so the Cache matches the user's request directly, fetches the content from the source server, and caches it for subsequent use. If the Cache works in client-acceleration mode, it needs to discover the IP address of the source server, so the CDN network maintains and operates a DNS server that the Cache uses to resolve the real IP address of the domain, such as 202.99.11.120; the resolution records of each domain name there are the same as before the domain joined the CDN.
4. Working inside the CDN network, the cache server must operate transparently. For Squid, the relevant parameters need to be set.
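A minimal sketch using the httpd_accel_* directives of the Squid 2.x versions contemporary with this article (newer Squid releases express the same thing with http_port ... accel or intercept options instead):

    # listen on the web port and act as an accelerator for any virtual host
    http_port 80
    httpd_accel_host virtual
    httpd_accel_port 80
    httpd_accel_with_proxy on
    httpd_accel_uses_host_header on

With httpd_accel_host set to virtual and httpd_accel_uses_host_header enabled, Squid takes the origin hostname from the request's Host header, which is exactly the behaviour the CDN cache described above relies on.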