CDN | Principles: Caching and Security

"My Views on CDN" consists of three chapters: principles, detailed explanations, and pitfalls. This chapter is suitable for readers who have never worked with a CDN, or who know only some CDN terminology and want to get a feel for what a CDN is. In this second installment of "My Views on CDN", Mr. Platinum explains what a cache is, how it works, and how a CDN handles security challenges.

First of all, let's talk about the CDN's caching system.

The cache is the most important part of a CDN. Static content can be cached locally, from video-on-demand and file downloads to HTML pages, JPG/GIF/PNG images, and CSS/JS files. When end users visit, their requests do not have to go back to the origin server for data, which saves time and reduces the origin server's bandwidth cost and load.

For a CDN, the daily request volume of each cache machine is enormous, and so is the amount of content stored on disk. When a request arrives, it is essential to locate the file on disk quickly, read it, and return it to the end user.

How to quickly retrieve data?

Generally, efficient data structures or algorithms are used, such as hash tables.

Simply put, a hash table takes a hash of the key, computes the remainder modulo the number of buckets, and uses that remainder as the index of the bucket where the data is stored. If several items produce the same remainder, they are stored in that bucket as a singly linked list.

In theory, in the best case, a hash lookup takes constant time, O(1).

In the worst case (every key produces the same remainder after hashing), performance degrades to that of a singly linked list, and the lookup time complexity is O(n).
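The bucket-and-chain idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular CDN cache indexes its disk: the class name `ChainedHashTable` and the default bucket count are assumptions, and a Python list stands in for the singly linked list.

```python
class ChainedHashTable:
    """Toy hash table: bucket index = hash(key) % bucket_count, collisions are chained."""

    def __init__(self, bucket_count=8):
        # One chain per bucket; a Python list stands in for a singly linked list.
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # The remainder selects which chain the key belongs to.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Walk only the one chain selected by the remainder.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default
```

With few collisions each chain holds about one entry, so `get` is effectively O(1). If every key landed in the same bucket, `get` would walk a single chain of length n.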

What if you run out of disk space?

Use cache eviction algorithms based on access popularity, such as FIFO, LRU, LFU, SLRU, and LIRS.
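As one example from that list, LRU (least recently used) evicts the entry that has gone longest without being accessed. The sketch below is a minimal in-memory version built on Python's `collections.OrderedDict`. The class name `LRUCache` and its `capacity` parameter are illustrative assumptions; a real CDN cache would track byte sizes on disk rather than a count of entries.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when over capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest access first, newest access last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
```

For instance, with `capacity=2`, storing `a` and `b`, reading `a`, and then storing `c` evicts `b`, because `a` was touched more recently.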

Although a SATA disk has a large storage capacity, its IOPS is often low because of seek-time limits, which lengthens response time. An SSD responds much faster, but because of cost and manufacturing constraints, SSD capacity is much smaller and more expensive per unit than SATA. Combining the two gives a tiered hybrid storage model: SSD + SATA. A popularity ("heat") algorithm places the hottest content in memory, the next hottest on SSD, and colder content on SATA. The coldest content is not stored at all because disk space is limited, and is fetched directly from upstream or from the origin.
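The tiering rule above can be illustrated with a small sketch. It assumes that "heat" is simply a hit count per URL and that each tier holds a fixed number of objects. Both are simplifications, and the function name `assign_tiers` and its parameters are hypothetical, not part of any real CDN system.

```python
def assign_tiers(hit_counts, memory_slots, ssd_slots, sata_slots):
    """Place the hottest objects in memory, then SSD, then SATA; the rest are not cached."""
    # Rank URLs from hottest to coldest by hit count.
    ranked = sorted(hit_counts, key=hit_counts.get, reverse=True)
    placement = {}
    for rank, url in enumerate(ranked):
        if rank < memory_slots:
            placement[url] = "memory"
        elif rank < memory_slots + ssd_slots:
            placement[url] = "ssd"
        elif rank < memory_slots + ssd_slots + sata_slots:
            placement[url] = "sata"
        else:
            placement[url] = "not_cached"  # served from upstream or the origin
    return placement
```

A production system would recompute placement continuously and weigh object size and recency, not just a raw hit count.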

When a user sends a request to a cache server, the server uses a hash lookup to check whether the data is cached locally. If it is, the server reads it, assembles the HTTP response, and returns it to the user.

If there is no cached content locally, the cache server fetches the data from its upstream and returns it to the user. It then decides whether and how to cache the content of this request based on the content's metadata, such as whether it is cacheable and for how long.
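The hit and miss paths described above can be sketched as follows. This is a simplified model under stated assumptions: the cacheability metadata is taken to be the HTTP `Cache-Control` response header, only the `no-store`, `private`, and `max-age` directives are considered, and `fetch_upstream` is a hypothetical callable that returns `(headers, body)` for a URL. Real caches also honor `Expires`, `Vary`, revalidation, and more.

```python
import re
import time


def cacheable_seconds(headers):
    """Return how long a response may be cached, or 0 if it should not be stored."""
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return 0
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else 0


class EdgeCache:
    def __init__(self, fetch_upstream, clock=time.time):
        self.fetch_upstream = fetch_upstream  # hypothetical: url -> (headers, body)
        self.clock = clock
        self.store = {}  # url -> (expires_at, body); a dict is itself a hash table

    def handle(self, url):
        entry = self.store.get(url)
        if entry and entry[0] > self.clock():
            return "HIT", entry[1]  # fresh local copy: no upstream request
        headers, body = self.fetch_upstream(url)  # miss: go upstream
        ttl = cacheable_seconds(headers)
        if ttl > 0:
            self.store[url] = (self.clock() + ttl, body)
        return "MISS", body
```

The `clock` parameter exists only so that expiry can be exercised without waiting for real time to pass.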

The key technologies of a CDN are scheduling and caching. Plenty of open source software can meet the basic needs of a CDN, but its performance falls far short of commercial requirements. Building a truly high-performance cache system takes more than going deep into data structures and algorithms. It also requires research across hardware, operating systems, file systems, and underlying principles (for example, Alibaba Cloud's self-developed CDN cache system uses raw disk technology).
From the above, one truth is roughly clear: a CDN is easy to get started with but difficult to do well!

The last part of this principles chapter briefly introduces CDN security.

Attacks generally fall into two types: brute-force attacks and skill-based attacks.
Brute-force attacks, such as SYN floods, reflection attacks, and bandwidth saturation attacks, use massive numbers of requests (some of them entirely useless) to exhaust the system's bandwidth and computing resources, so that the target can no longer provide normal service. That is how the attacker achieves the goal.

This type of attack is characterized by:

* The attack sources are widely distributed and the attack is persistent, so it is difficult to trace accurately
* The method is violent and the traffic is large, and the main purpose is to knock the service out for a short time
* The technical threshold is low, but the attack cost is high because many resources are needed

Skill-based attacks, such as XSS (cross-site scripting), SQL injection, CSRF, and vulnerability exploitation, all work through flaws in what the website runs on: the system (for example, an old operating system version with a protocol stack crash vulnerability), components (for example, remote execution vulnerabilities in bash, OpenSSL, or Struts2), or software logic (for example, SQL injection, where flawed input validation lets user input be assembled into SQL statements).

This type of attack is characterized by:

* The attack traffic is small
* The attack is relatively stealthy and difficult to detect
* The main purpose is to steal content
* The attack causes real damage, some of it irreversible
* It does not need many attack resources, so the attack cost is low, but the technical skill required is high
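The SQL injection case mentioned above, where user input is assembled into an SQL statement, can be demonstrated with Python's built-in `sqlite3` module. The table, the sample rows, and the function names below are made up for illustration. The point is only the contrast between string concatenation and a parameterized query.

```python
import sqlite3

# In-memory database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_unsafe(name):
    # Vulnerable: the input is pasted into the SQL text and can change its logic.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the input is passed as data and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
leaked = find_user_unsafe(payload)  # the OR clause matches every row
blocked = find_user_safe(payload)   # no user has this literal name
```

A single small request like this is enough to dump the table, which matches the low-traffic, low-cost profile of skill-based attacks.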

Can security risks be avoided after connecting to a CDN?

Yes, at least to a certain extent: a CDN can resolve or reduce many security risks.

1. Hide origin site information

Once a website is connected to a CDN, its origin site information is isolated, and it becomes difficult for outsiders to find the real origin address. (Difficult, but not impossible: it mainly depends on the security awareness of the origin site's operators.) At the very least, the attacker cannot attack the origin site directly.

2. Distributed architecture

If the origin site cannot be attacked directly, the attacker's actual target is no longer the origin site but the distributed architecture of the CDN vendor. In that case, unless every server in every node of the CDN vendor is compromised, some servers can still provide service.

3. Security protection

For skill-based attacks, the CDN vendor's front end can analyze and intercept potential security risks by placing application-layer filters in front of the origin. (For example, Alibaba Cloud's SCDN is a new product that integrates security and CDN.)

As the final conclusion of this second installment, I would like to emphasize that technology alone is not enough. A CDN also needs back-end support systems and after-sales, sales, and other teams working together to deliver a CDN product that is safe, efficient, and stable and that offers an excellent user experience.
