Technical Know-how: Five ways to improve website performance with static caching
Posted: Apr 17, 2017 14:47
I am a geek who likes to share experience and know-how. In this article, I mainly describe and share some of my experience with static caching. Any advice on shortcomings in this article is welcome.
When it comes to static caching, CDN is the classic example. Static caching covers a broad range of technologies, including open-source software such as Apache, Lighttpd, nginx, Varnish, and Squid.
Static caching generally refers to caching a web app's images, JS, CSS, video, HTML, and other static files or resources on disk or in memory, in order to speed up resource responses and reduce server load and resource consumption.
In this article, static caching is discussed in five aspects: browser cache, disk cache, memory cache, nginx memory cache, and CDN.
I. Browser cache
Browser cache, also called client cache, is the most common and direct form of static caching, yet it is frequently neglected.
The following cache configuration can be found in the nginx configuration file:
In JSP pages, you may also notice "expires" in the HTML meta tags that carry HTTP header information:
In both Case 1 and Case 2 (an expires value set in nginx takes priority over one set in the code), expires sets an expiration time for a resource, meaning that the browser decides by itself whether the resource has expired, without verifying with the server. As a result, no extra traffic arises. This method is well suited to resources that rarely change; if files change frequently, caching by expires is not recommended.
For a typical website, for instance, CSS stylesheets and JS scripts rarely change once deployed, so the most suitable approach is to cache such content in the visitor's browser via expires.
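To make the expires semantics concrete, here is a minimal Python sketch (not nginx's code) of the decision the browser makes: the server stamps an absolute expiry time on the response, and the client reuses its copy without contacting the server until that time has passed. The 30-day lifetime and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_expires_header(now: datetime, lifetime: timedelta) -> str:
    """Server side: build an Expires value (an absolute HTTP-date in GMT)."""
    return format_datetime(now + lifetime, usegmt=True)


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Client side: the cached copy may be used, with no request to the
    server, as long as the current time is before the Expires time."""
    return now < parsedate_to_datetime(expires_header)


if __name__ == "__main__":
    served_at = datetime(2017, 4, 17, 14, 47, tzinfo=timezone.utc)
    expires = make_expires_header(served_at, timedelta(days=30))  # assumed 30-day lifetime
    print(expires)  # Wed, 17 May 2017 14:47:00 GMT
    print(is_fresh(expires, served_at + timedelta(days=1)))   # True: use the browser cache
    print(is_fresh(expires, served_at + timedelta(days=31)))  # False: go back to the server
```

Note that nginx's expires directive also emits a Cache-Control max-age header, and browsers that see both honor max-age in preference to Expires.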
Access an image on the server in Chrome and press F12 to open the developer tools:
On the first access, the response has status 200. On the second and subsequent accesses, it becomes 304: the client uses the content in its browser cache instead of downloading the requested content from the server again. In other words, the expires setting in nginx has taken effect. After the client cache expires, the browser requests the content from the server again to update its local cache. (Strictly speaking, a 304 means the browser revalidated its copy with the server, as happens on a page refresh; a fresh copy used without any request is listed in Chrome with status 200, marked as served from the memory or disk cache.)
Incidentally, an interesting requirement occurred to me: suppose that, whenever a static file is accessed, the data must be fetched from the server every time instead of from the client cache. This can be implemented with the "Last-Modified" header. The browser stores the file's last-modified time and sends it back with later requests, and the server uses it to decide whether the file must be sent again.
The core configuration of nginx is as follows:
By altering the Last-Modified value (the file's modification time) that the server sends back to the client, the time stored on the client differs every time from the time the server returns. As a result, the client "mistakenly judges" every time that the static file on the server has been updated, and fetches "the so-called latest data" from the server. The returned HTTP status is therefore always 200, and 304 no longer appears no matter how many times the page is accessed.
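The mechanism can be simulated in a few lines. The Python sketch below is a simplified model, not nginx internals: the server answers 304 only when the browser's If-Modified-Since value exactly equals the Last-Modified value it would send now (loosely modeled on nginx's default "if_modified_since exact" behavior), and a list of "reported" modification times stands in for whatever the server sends back on each request. The names respond and browse are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import List, Optional, Tuple


def respond(reported_mtime: datetime, body: bytes,
            if_modified_since: Optional[str]) -> Tuple[int, str, bytes]:
    """Server side: answer a (possibly conditional) GET."""
    last_modified = format_datetime(reported_mtime, usegmt=True)
    # Exact-match comparison: 304 only if the client's stored time is identical.
    if if_modified_since == last_modified:
        return 304, last_modified, b""
    return 200, last_modified, body


def browse(reported_mtimes: List[datetime], body: bytes) -> List[int]:
    """Client side: remember Last-Modified and send it back as If-Modified-Since."""
    statuses: List[int] = []
    stored: Optional[str] = None
    for mtime in reported_mtimes:
        status, last_modified, _ = respond(mtime, body, stored)
        statuses.append(status)
        stored = last_modified
    return statuses


if __name__ == "__main__":
    t0 = datetime(2017, 4, 17, 14, 47, tzinfo=timezone.utc)
    # Unchanged Last-Modified: the usual 200, then 304 pattern.
    print(browse([t0, t0, t0], b"static file"))  # [200, 304, 304]
    # Last-Modified altered on every response: always 200.
    print(browse([t0 + timedelta(seconds=i) for i in range(3)], b"static file"))  # [200, 200, 200]
```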
Misunderstanding: setting expires in nginx sets how long content is cached in the client browser; it does not cache static content in nginx itself. This is a common misunderstanding.
II. Disk cache
Besides the static cache kept on the client (the browser cache), server-side static caching falls mainly into disk cache and memory cache. Middleware such as nginx, Squid, and Varnish performs very well at processing static data. For nginx, the key is its epoll-based network model, in contrast to Apache's select-based model. Apache is therefore good at compute-intensive work and is highly stable, whereas nginx is geared toward static content, reverse proxying, and high concurrency. For instance, in my experience Apache+PHP is more stable than nginx+PHP and clearly performs better as well.
The above concerns only the capability of processing static data on disk; so-called disk cache is a separate technique for caching static files. Taking the nginx configuration as an example:
As shown, nginx implements web caching mainly through proxy_cache. Readers familiar with nginx will easily see that, with this location configuration, both static and dynamic files are cached (elaborated in the following section). To test it, prepare a test.html file for access. Its source code is as follows:
Two new cache files then appear in the cache directory on the server:
Interestingly, the two files contain the following (viewed with the less command):
This shows that nginx caches the HTML content and the image, in binary form, on the local disk. On the next user request for test.html, nginx returns the files cached on local disk directly. Especially when Tomcat or IIS is deployed at the back end, nginx's powerful static caching can effectively reduce server load.
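The sketch below illustrates the disk-cache idea in Python; it is not nginx's implementation. It borrows the file layout that the nginx documentation describes for proxy_cache_path: each cached response is stored in a file named after the MD5 hash of its cache key, inside subdirectories cut from the end of the hash (levels=1:2 gives paths like c/29/<hash>). The cache key string, the origin function, and the HIT/MISS labels are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import Callable, Sequence, Tuple


def cache_path(cache_dir: str, key: str, levels: Sequence[int] = (1, 2)) -> str:
    """Map a cache key to a file path: MD5 of the key as the file name,
    with directory names cut from the end of the hash (like levels=1:2)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    dirs = []
    end = len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])
        end -= width
    return os.path.join(cache_dir, *dirs, digest)


def fetch_with_disk_cache(cache_dir: str, key: str,
                          fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return cached bytes from disk if present; otherwise ask the back end
    (e.g. Tomcat or IIS in the text) and store the result on disk."""
    path = cache_path(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), "HIT"
    body = fetch_origin(key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(body)
    return body, "MISS"


if __name__ == "__main__":
    backend_calls = []

    def origin(key: str) -> bytes:
        backend_calls.append(key)  # record every trip to the back end
        return b"<html>test</html>"

    with tempfile.TemporaryDirectory() as cache_dir:
        key = "http://backend/test.html"  # hypothetical cache key
        print(fetch_with_disk_cache(cache_dir, key, origin)[1])  # MISS
        print(fetch_with_disk_cache(cache_dir, key, origin)[1])  # HIT
        print(len(backend_calls))  # 1
```

The second request is answered from the file on disk, so the back end is contacted only once per key until the cache file is removed.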
III. Memory cache
Following the disk caching approach above, memory cache, as its name implies, caches static files in server memory. When a request hits the cache, the data is returned directly from memory, which is much faster than reading cached data from disk. Taking Varnish as an example, its core configuration is as follows:
The core configuration of default.vcl is as follows:
Varnish caches URLs ending in .gif, .jpg, .jpeg, and .png for 1 hour. After completing the Varnish settings, check whether requests hit the cache by viewing the response headers from the command line:
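As a rough illustration of the Varnish policy described above (cache only image URLs, keep each entry for one hour, report whether a request hit the cache), here is a minimal in-process memory cache in Python. It is not how Varnish works internally; the class, the HIT/MISS/PASS labels, and the injectable clock (used so that expiry can be tested) are assumptions for the sketch.

```python
import time
from typing import Callable, Dict, Tuple

CACHEABLE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
TTL_SECONDS = 3600  # 1 hour, as in the Varnish policy described above


class MemoryCache:
    def __init__(self, fetch_origin: Callable[[str], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_origin
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> Tuple[bytes, str]:
        # Non-image URLs bypass the cache entirely.
        if not url.endswith(CACHEABLE_SUFFIXES):
            return self._fetch(url), "PASS"
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1], "HIT"  # served straight from memory
        body = self._fetch(url)
        self._store[url] = (now + TTL_SECONDS, body)
        return body, "MISS"
```

Calling get("/image/test.jpg") twice within an hour yields MISS and then HIT, while a request for "/index.html" is always passed through to the back end.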
Last but not least, the varnishadm command can be used to clear the cache, and the varnishstat command to view cache statistics in Varnish.
IV. nginx memory cache
The previous section took Varnish as an example of caching static resources in memory. nginx can also serve a memory cache; compared with Squid and Varnish, it is implemented through code. It is configured as follows:
memcached_pass specifies the memcached server address; the $memcached_key variable supplies the lookup key, and the corresponding value is queried in MemCache.
For example, if we visit http://***.***.***.***/image/test.jpg, nginx queries MemCache for the value with the key "test.jpg" and returns the result. If there is no corresponding value, the 404 error page is returned. The key point is that the static files must first be written into MemCache by application code. I will not go into examples of writing static resource data into MemCache with PHP, Java, or other languages.
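The lookup that nginx performs, and the application code that must fill the cache first, can be sketched in Python as follows. A plain dict stands in for the MemCache server, and deriving the key from the file name follows the test.jpg example above; key_for, publish, and serve are hypothetical names, and a real setup would use a memcached client library from PHP, Java, or another language.

```python
import posixpath
from typing import Dict, Tuple

# Stand-in for the MemCache server (assumption: a dict keyed by string).
Cache = Dict[str, bytes]


def key_for(uri: str) -> str:
    """Derive the lookup key from the request URI: /image/test.jpg -> test.jpg."""
    return posixpath.basename(uri)


def publish(cache: Cache, uri: str, data: bytes) -> None:
    """Application side (PHP/Java in the text): write the static file into the cache."""
    cache[key_for(uri)] = data


def serve(cache: Cache, uri: str) -> Tuple[int, bytes]:
    """nginx side: return the cached value, or 404 when the key is absent."""
    value = cache.get(key_for(uri))
    if value is None:
        return 404, b"Not Found"
    return 200, value


if __name__ == "__main__":
    mc: Cache = {}
    print(serve(mc, "/image/test.jpg")[0])  # 404: nothing written yet
    publish(mc, "/image/test.jpg", b"...jpeg bytes...")
    print(serve(mc, "/image/test.jpg")[0])  # 200
```

The split mirrors the division of labor in the text: nginx only reads, and whatever code calls publish decides what gets cached and when.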
Because it is implemented through code, nginx's memory cache is particularly flexible: tailored to the characteristics of your own business system, it can ensure both the flexibility and the efficiency of static caching. The main drawback is the maintenance and management burden that comes with the code-based implementation. In an e-commerce system I worked on previously, customers' order photos were written into MemCache by PHP code, and customers retrieved the requested photos from MemCache with very high speed and efficiency. As a versatile, lightweight, high-performance Layer 7 middleware, nginx can retrieve data directly from MemCache to achieve a static caching effect, something other software finds hard to rival.
V. CDN
CDN is familiar to most of us; it is the most typical representative of static cache acceleration. CDN is not a new technology: it is static cache acceleration built on traditional web caching technologies such as nginx, Squid, and Varnish, combined with smart DNS resolution. Note that it does not accelerate access to dynamic links. Its architecture is as follows:
The core of CDN static caching therefore lies in two points:
Node cache: for websites that need acceleration, the corresponding static resources are cached on the node servers using memory cache plus disk cache.
Accurate dispatch: requests are resolved and dispatched intelligently based on visitor IP addresses, so that visitors reach the nearest cache node. In the figure above, for instance, a Beijing user accesses www.a.com. During DNS resolution, the user's IP address is identified as belonging to Beijing, so DNS returns the IP address of a Beijing cache node. The user then accesses the cached data for www.a.com on the Beijing server by default, which implements the nearby-access policy and greatly improves access speed.
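The accurate-dispatch step boils down to mapping the client's address to a region and answering the DNS query for www.a.com with that region's cache node. The Python sketch below shows only that decision, not a DNS server; the client networks (IETF documentation ranges), the node addresses, the Shanghai entry, and the fallback node are all made up for illustration.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical client networks per region (documentation address ranges).
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "beijing",
    ipaddress.ip_network("198.51.100.0/24"): "shanghai",
}

# Hypothetical cache nodes per accelerated host and region.
NODES: Dict[str, Dict[str, str]] = {
    "www.a.com": {
        "beijing": "203.0.113.10",
        "shanghai": "203.0.113.20",
        "default": "203.0.113.1",  # fallback when the client's region is unknown
    },
}


def region_of(client_ip: str) -> Optional[str]:
    """Return the region whose network contains client_ip, if any."""
    addr = ipaddress.ip_address(client_ip)
    for network, region in REGION_BY_NETWORK.items():
        if addr in network:
            return region
    return None


def resolve(hostname: str, client_ip: str) -> str:
    """Answer a DNS query for hostname with the node nearest to the client."""
    nodes = NODES[hostname]
    return nodes.get(region_of(client_ip) or "default", nodes["default"])


if __name__ == "__main__":
    print(resolve("www.a.com", "192.0.2.55"))  # 203.0.113.10 (Beijing node)
    print(resolve("www.a.com", "8.8.8.8"))     # 203.0.113.1 (fallback)
```

In practice the authoritative DNS server usually sees the address of the user's recursive resolver rather than the user's own address, which is one reason such region mapping is approximate.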