CDN (Content Delivery Network)
1. What is a CDN?
CDN stands for Content Delivery Network (also called a content distribution network). It adds a new layer of network architecture on top of the existing Internet and publishes a website's content to the network "edge" (edge servers) closest to users, so that users can fetch the content they need from a nearby location. This relieves Internet congestion and improves response times for users visiting the website. Technically, it addresses slow responses caused by limited network bandwidth, heavy user traffic, and unevenly distributed points of presence.
2. Principle of CDN technology
CDN technology, which first emerged in the United States and developed rapidly, is an effective means of addressing poor Internet performance. The basic idea is to avoid, as far as possible, the bottlenecks and links on the Internet that may degrade the speed and stability of data transmission, so that content is delivered faster and more reliably. By placing node servers throughout the network to form a layer of intelligent virtual network on top of the existing Internet, the CDN system can redirect each user request in real time to the service node closest to the user, based on combined information such as network traffic, each node's connectivity and load, its distance from the user, and its response time.
The CDN network architecture consists of two parts: the center and the edge. The center comprises the CDN network management center and the DNS redirection and resolution center, which is responsible for global load balancing; its equipment is installed in the management center's machine room. The edge refers to the remote nodes that carry CDN distribution, each consisting mainly of caches and a load balancer.
In effect, a CDN is a new way of building networks: an overlay network specially optimized for delivering broadband rich media over traditional IP networks. More broadly, a CDN represents a Web service model based on quality and order. Simply put, a CDN is a strategically deployed system as a whole, made up of four elements: distributed storage, load balancing, network request redirection, and content management.
2.1. Distributed storage
The CDN distributes storage resources across geographical locations and network segments. As an integral part of the CDN system, the storage system holds the files and database table records that the CDN distributes, so that service is continuous. The storage system uses a three-level architecture: core storage, the distributed caches on CDN service nodes, and the local cache on the terminal. A storage crash or failure at any single point does not affect the availability of the service.
For example, the CDN system has CDN nodes in five major operators (China Telecom, China Netcom, China Railcom, China Mobile, China Unicom) and two dedicated networks (China Education and Research Network, China Science and Technology Network). This removes the impact of interconnection bottlenecks between operators, achieves cross-operator acceleration, and helps ensure good access quality for users on different networks.
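The claim that a failure at any single storage point does not affect availability can be illustrated with a read that falls back across replicas. This is a minimal sketch in Python, not any particular CDN's implementation: the names `StorageNode` and `read_with_failover`, the node names, and the sample content are all hypothetical, and in-memory dictionaries stand in for real storage.

```python
class StorageNode:
    """One storage location holding a replica of the distributed content."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]  # raises KeyError if this replica lacks the key


def read_with_failover(nodes, key):
    """Try each replica in order and return the first successful read."""
    for node in nodes:
        try:
            return node.read(key)
        except (ConnectionError, KeyError):
            continue  # this replica failed or lacks the key; try the next one
    raise LookupError(f"{key!r} is unavailable on every replica")


replicas = [
    StorageNode("edge-a", {"/logo.png": b"image-bytes"}, up=False),  # simulated crash
    StorageNode("edge-b", {"/logo.png": b"image-bytes"}),
]
content = read_with_failover(replicas, "/logo.png")  # served by edge-b
```

The read only fails when every replica is down or missing the key, which is the property the paragraph above describes.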
2.2. Content Management
Content management and global network traffic management (Traffic Management) are the core of a CDN. By judging user proximity and server load, the CDN ensures that content is served in response to user requests as efficiently as possible. In general, content is served from a cache server, also known as a proxy cache (Surrogate), which sits at the edge of the network, only "one hop" (Single Hop) away from the user. At the same time, the proxy cache is a transparent mirror of the content provider's origin server (usually located in the data center of the CDN service provider). This architecture lets CDN service providers deliver the best possible experience on behalf of their customers, the content providers, to end users who will not tolerate delays in response time. According to statistics, a CDN can serve 70% to 95% of the content requests for a website's pages, reducing pressure on the origin server and improving the website's performance and scalability.
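To make the 70% to 95% figure concrete, the short calculation below shows how many requests would still reach the origin server for a given share served at the edge. It is only a worked example: the function name `origin_requests` and the figure of one million requests are assumptions chosen for illustration.

```python
def origin_requests(total_requests: int, edge_hit_percent: int) -> int:
    """Requests that still reach the origin when the edge serves edge_hit_percent% of them."""
    if not 0 <= edge_hit_percent <= 100:
        raise ValueError("edge_hit_percent must be between 0 and 100")
    return total_requests * (100 - edge_hit_percent) // 100


# Hypothetical site receiving one million requests.
low = origin_requests(1_000_000, 70)   # 300000 requests reach the origin
high = origin_requests(1_000_000, 95)  # 50000 requests reach the origin
```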
2.3. Load balancing
The CDN load balancing system implements the CDN's content routing function: it directs each user request to the best node in the entire CDN network. The best node can be chosen by various strategies, such as shortest distance or lightest node load. The load balancing system is the core of the entire CDN, and its accuracy and efficiency directly determine the efficiency and performance of the CDN as a whole. Load balancing is generally divided into two levels: global load balancing (GSLB) and local load balancing (SLB).
- Global load balancing (GSLB) mainly aims to direct user requests to the nearest node (or region) across the entire network. Proximity judgment is therefore its main function.
- Local load balancing (SLB) is generally confined to one region, and its goal is to find the most suitable node within that region to provide service. The running status of the CDN nodes, such as health, load, and supported media formats, is therefore the main basis for its decisions.
Load balancing can be implemented in several ways, chiefly DNS, application-layer redirection, and transport-layer redirection. For global load balancing, proximity judgment generally uses one of two methods. The first is static configuration: for example, IP addresses are mapped to CDN nodes according to a static IP address configuration table. The second is dynamic detection: for example, CDN nodes measure their distance to the target IP in real time (using RTT or hop count as the metric), and the measurements are compared to make the load balancing decision. Static and dynamic methods can also be combined.
For local load balancing, effective decisions require the running status of the cache devices in real time. There are generally two ways to obtain it: active detection and protocol interaction. Active detection is used when the SLB device and the cache devices have no protocol interface between them: the SLB device probes the caches with commands such as ping and infers their status from the results. With protocol interaction, the SLB and the caches exchange running status information in real time over a predefined protocol. Protocol interaction is more accurate and reliable than active detection, but there is currently no standard protocol; each vendor generally implements its own private protocol, which makes interoperability difficult.
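The two proximity methods for global load balancing can be sketched as follows. This is an illustration under stated assumptions rather than a real GSLB: the table contents, node names, and RTT values are hypothetical, and preferring the static mapping with measurements as a fallback is just one possible way to combine the two.

```python
import ipaddress

# Hypothetical static table: client network -> CDN node name.
STATIC_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "node-east",
    ipaddress.ip_network("198.51.100.0/24"): "node-west",
}


def static_lookup(client_ip):
    """Static configuration: return the node mapped to the client's network, or None."""
    addr = ipaddress.ip_address(client_ip)
    for network, node in STATIC_TABLE.items():
        if addr in network:
            return node
    return None


def dynamic_lookup(rtt_ms):
    """Dynamic detection: return the node with the lowest measured RTT to the client."""
    return min(rtt_ms, key=rtt_ms.get)


def choose_node(client_ip, rtt_ms):
    """Combined (assumed policy): use the static mapping, else fall back to measurements."""
    return static_lookup(client_ip) or dynamic_lookup(rtt_ms)


# RTTs in milliseconds as each node might have measured them (made-up values).
measured = {"node-east": 40.0, "node-west": 12.0}
mapped = choose_node("203.0.113.7", measured)    # found in the static table
unmapped = choose_node("192.0.2.1", measured)    # not in the table, lowest RTT wins
```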
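Active detection followed by a local balancing decision might look like the sketch below. Note the substitution: it probes with a TCP connection attempt instead of ping, because sending ICMP usually requires elevated privileges. The names `CacheStatus`, `tcp_probe`, and `pick_node`, and the least-loaded selection rule, are assumptions for illustration.

```python
import socket
from dataclasses import dataclass


@dataclass
class CacheStatus:
    name: str
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


def tcp_probe(host, port, timeout=1.0):
    """Active detection: True if a TCP connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_node(statuses):
    """Return the name of the least-loaded healthy cache, or None if none is healthy."""
    healthy = [s for s in statuses if s.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s.load).name


# Status values as they might look after a round of probes (made-up values).
statuses = [
    CacheStatus("cache-1", healthy=True, load=0.80),
    CacheStatus("cache-2", healthy=True, load=0.35),
    CacheStatus("cache-3", healthy=False, load=0.10),
]
selected = pick_node(statuses)
```

In a real deployment the `healthy` flag would come from something like `tcp_probe(host, 80)`, while the load figure would have to come from protocol interaction, since a probe alone cannot report it.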
2.4. Redirection of network requests
When a user accesses a resource served through a CDN, the website's DNS server uses a CNAME record to hand the domain name resolution request over to the intelligent DNS load balancing system in the CDN. Using a set of predefined policies (such as content type, geographical area, and network load), that system returns the address of the node that can respond to the user fastest at that moment, so that the user gets fast service.
At the same time, the intelligent DNS load balancing system stays in communication with all CDN nodes distributed across different locations and collects the health status of each node, to ensure that user requests are not assigned to a node that is no longer available.
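The CNAME hand-off and policy-based answer can be simulated with in-memory records. This is a simplified sketch, not a DNS implementation: the domain names, IP addresses, and regions are hypothetical documentation values, and the policy shown (a healthy node in the client's region, else any healthy node) is only one of the policies the text mentions.

```python
# Hypothetical zone data for the website: the name is aliased to the CDN.
SITE_DNS = {"www.example.com": ("CNAME", "www.example.com.cdn.example.net")}

# Hypothetical node list kept by the CDN's intelligent DNS, with health status.
CDN_NODES = {
    "www.example.com.cdn.example.net": [
        {"ip": "192.0.2.10", "region": "north", "healthy": True},
        {"ip": "192.0.2.20", "region": "south", "healthy": True},
        {"ip": "192.0.2.30", "region": "south", "healthy": False},
    ]
}


def cdn_dns_answer(cdn_name, client_region):
    """Intelligent DNS: prefer a healthy node in the client's region."""
    healthy = [n for n in CDN_NODES[cdn_name] if n["healthy"]]
    if not healthy:
        raise LookupError(f"no healthy node for {cdn_name}")
    same_region = [n for n in healthy if n["region"] == client_region]
    return (same_region or healthy)[0]["ip"]


def resolve(name, client_region):
    """Follow the site's record; a CNAME hands resolution over to the CDN."""
    record_type, value = SITE_DNS[name]
    if record_type == "CNAME":
        return cdn_dns_answer(value, client_region)
    return value


south_ip = resolve("www.example.com", "south")  # unhealthy 192.0.2.30 is skipped
other_ip = resolve("www.example.com", "west")   # no regional node, any healthy one
```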
3. CDN resource access process
With the CDN service in place, a user's access process is as shown in the following figure:
The user enters the domain name of the website in the browser, and the domain name resolution request is sent to the website's DNS server;
Because the website's DNS server has a CNAME record set for this domain name, the request is directed to the intelligent DNS load balancing system in the CDN network;
The intelligent DNS load balancing system resolves the domain name and returns to the user the IP address of the node with the fastest response;
Once the browser has the IP address of the fastest node, it sends its access request to that CDN node;
Because this is the first visit, the CDN node goes back to the origin site to fetch the requested data and sends it to the user;
When other users later request the same content, the CDN node returns the data to the client directly, completing the request/service process.
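The last two steps above (a first request that goes back to the origin, and later requests served directly from the node) can be sketched as a small cache in front of an origin fetch. The class name `EdgeNode`, the `origin` function, and the URL are hypothetical; a real node would also apply expiry rules, which are left out here.

```python
class EdgeNode:
    """A CDN node that fetches from the origin on a miss and serves from cache on a hit."""

    def __init__(self, fetch_from_origin):
        self.cache = {}
        self.fetch_from_origin = fetch_from_origin
        self.origin_fetches = 0  # how many times the node went back to the origin

    def handle(self, url):
        if url not in self.cache:
            # First visit: go back to the origin and keep a copy.
            self.cache[url] = self.fetch_from_origin(url)
            self.origin_fetches += 1
        return self.cache[url]


def origin(url):
    """Stand-in for the origin site; returns made-up content for the URL."""
    return f"content of {url}".encode()


edge = EdgeNode(origin)
first = edge.handle("/index.html")   # miss: fetched from the origin
second = edge.handle("/index.html")  # hit: served from the node's cache
```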
4. Questions about CDN
4.1. How do you obtain the client's real IP after adopting a CDN?
Once a CDN is in use, the client IP address seen by the origin server changes: requests to the origin server now come from CDN nodes, so the connection no longer reveals the client's original IP. In general, the CDN node passes the client's IP to the origin server in some way. For example, some Wangsu CDNs add it to an HTTP header named "Cdn-Src-Ip", which the origin application (written in C#, for instance) can read from the request. The header name may vary between CDN providers.
Alternatively, obtain it from the "HTTP_X_FORWARDED_FOR" field, which is how the X-Forwarded-For request header is exposed as a server variable. Its value has the form "original real IP, layer 1 proxy IP, layer 2 proxy IP, ...". Check whether HTTP_X_FORWARDED_FOR is empty (it will be if no CDN is used or the proxy did not set it); if it is not empty, take the first address in the list as the client IP. This is a simple and effective method, although it does not handle more complicated situations such as multiple layers of proxies, and because the header can be set by the client it should only be trusted on requests that arrive from a known proxy.
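The two approaches above can be combined into one helper. The sketch below is written in Python rather than C# and makes several assumptions: the function name `client_ip` is hypothetical, headers arrive as a plain dictionary, the provider-specific header is checked first, and the request is assumed to have come through a trusted CDN node.

```python
def client_ip(headers, remote_addr):
    """Best-effort client IP behind a CDN.

    Assumes the request arrived via a trusted CDN node; on direct requests
    both headers can be forged by the client.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # Provider-specific header (the name varies between CDN providers).
    cdn_ip = lowered.get("cdn-src-ip", "").strip()
    if cdn_ip:
        return cdn_ip

    # X-Forwarded-For: "client, proxy1, proxy2, ..."; the first entry is the client.
    first = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    if first:
        return first

    # No CDN or proxy header present: use the address of the TCP peer.
    return remote_addr


via_header = client_ip({"Cdn-Src-Ip": "203.0.113.5"}, "192.0.2.1")
via_xff = client_ip({"X-Forwarded-For": "203.0.113.5, 198.51.100.7"}, "192.0.2.1")
direct = client_ip({}, "192.0.2.1")
```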
4.2. How is content kept updated and synchronized after adopting a CDN service?
In a CDN service, content on the CDN nodes is kept in sync with the source website mainly through a refresh time policy, with different refresh times set for different content. For content that is updated infrequently, a longer refresh time reduces the access pressure on the origin; for frequently updated content, the refresh time can be set to 10 minutes or less to keep the content in sync. The refresh time can be set by directory, by specific URL, or by a key field, which makes the policy very flexible.
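A refresh time policy of this kind can be sketched as a lookup from path to lifetime. The rules below are hypothetical (the paths, the lifetimes in seconds, and the precedence of exact URL over longest directory prefix over default are all assumptions), and matching by a key field is not shown.

```python
# Hypothetical policy, refresh times in seconds.
EXACT_TTL = {"/index.html": 60}                            # by specific URL
DIRECTORY_TTL = {"/static/": 86400, "/news/": 600}         # by directory
DEFAULT_TTL = 3600


def ttl_for(path):
    """Refresh time for a path: exact URL first, then longest matching directory."""
    if path in EXACT_TTL:
        return EXACT_TTL[path]
    matches = [prefix for prefix in DIRECTORY_TTL if path.startswith(prefix)]
    if matches:
        return DIRECTORY_TTL[max(matches, key=len)]
    return DEFAULT_TTL


def is_fresh(path, cached_at, now):
    """True while the cached copy is younger than its refresh time."""
    return now - cached_at < ttl_for(path)


news_ttl = ttl_for("/news/today.html")     # frequently updated: 10 minutes
static_ttl = ttl_for("/static/site.css")   # rarely updated: one day
```

When `is_fresh` returns False, the node would go back to the origin for a new copy before serving the request.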