Adolph
Engineer

Essential CDN skill kits for server engineers in the cloud era

Posted time: Feb 21, 2017 14:21
In the cloud era, everyone is doing their utmost to speed up the loading of static resources, which has driven the steady domestic adoption of CDNs in recent years. As a company whose core business is a picture-sharing community, we make heavy use of image CDN services. Below I sum up our experience. These practices were worked out largely on our own, so please forgive any errors.
The content mainly covers the following:
1. Background of CDN and distributed storage of images
2. Overview of CDN principles
3. Steps and notes for bulk addition and switching CDN
4. Analysis of CDN access faults

Background of CDN and distributed storage of images
The CDN scenarios described below are all based on our current image storage setup. To make what follows clearer, I would like to first describe the current image storage structure.

Overview of CDN principles
I drew a simple diagram to illustrate the principle.
 
In the fifth step, the back-to-source step, we do not let every edge node send its cache misses directly to our source site. Instead, we negotiate with the CDN service provider to route all back-to-source requests through a unified proxy. That is to say, each resource is fetched from the source site only once. Afterwards, if a request misses the cache of another edge node, it is directed to the provider's own proxy rather than to our source site.
In other words, their CDN has a multi-level cache.
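The effect of a unified back-to-source proxy can be illustrated with a toy model. The sketch below is only an assumption about how such a two-level cache behaves, not the provider's implementation; the names fetch, EDGE_CACHE, PARENT_CACHE and ORIGIN_FETCHES, and the edge names tj and sx in the usage comments, are made up for illustration.

```shell
# Toy model of a two-level CDN cache: edge nodes -> shared proxy -> source site.
# Caches are kept as space-separated key lists so the sketch runs in plain sh.
EDGE_CACHE=""      # entries of the form "edge|url"
PARENT_CACHE=""    # entries of the form "url"
ORIGIN_FETCHES=0   # how many times the source site was actually hit

# fetch EDGE URL: sets RESULT to edge-hit, parent-hit or origin.
fetch() {
  edge=$1
  url=$2
  case " $EDGE_CACHE " in
    *" $edge|$url "*) RESULT=edge-hit; return 0 ;;
  esac
  case " $PARENT_CACHE " in
    *" $url "*) RESULT=parent-hit ;;
    *)
      # Only the shared proxy ever goes back to the source site.
      ORIGIN_FETCHES=$((ORIGIN_FETCHES + 1))
      PARENT_CACHE="$PARENT_CACHE $url"
      RESULT=origin
      ;;
  esac
  # The edge node keeps a copy for later requests.
  EDGE_CACHE="$EDGE_CACHE $edge|$url"
}

# Usage:
#   fetch tj /a.jpg; echo "$RESULT"   # first request goes back to the source
#   fetch sx /a.jpg; echo "$RESULT"   # another edge is served by the proxy
#   echo "$ORIGIN_FETCHES"            # the source was contacted once
```

However many edge nodes miss, the counter for the source site grows only on the first request for a resource, which is the behavior we ask the provider to guarantee.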

Steps and notes for bulk addition of CDN sources
Business requirement: Now we need to switch traffic accessing images on a domain name (a.mengkang.net) to the CDN.
Steps:
1. First, analyze the access logs of the original domain name to collect the addresses of frequently accessed images (for example, 200,000 addresses), and hand the addresses over to the CDN service provider.
2. Have the CDN service provider fetch the resources at those 200,000 addresses as a warm-up.
3. After the warm-up, we change part of the image URLs from a.mengkang.net to b.mengkang.net, and point b.mengkang.net by CNAME at the domain name provided by the CDN service provider (such as b.mengkang.ccgslb.com.cn).
4. Use the wget tool to test whether the images under b.mengkang.net are cached by the CDN.
5. If the cache tests fine, we switch part of the traffic from a.mengkang.net to b.mengkang.net, while my O&M colleagues monitor the back-to-source traffic. We then adjust the share of traffic sent to the CDN based on the back-to-source traffic.
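Step 1 can be done with standard text tools. The following is a minimal sketch, assuming the access log uses the common "combined" format (request path in field 7, status code in field 9); the function name top_images and the list of image extensions are my own choices, so adapt them to your log format.

```shell
# top_images N HOST: read an access log on stdin and print the N most
# requested image URLs with their hit counts, most frequent first.
# Assumes the combined log format: field 7 = request path, field 9 = status.
top_images() {
  awk '$9 == 200 {
         path = $7
         sub(/[?].*/, "", path)                 # drop any query string
         if (path ~ /\.(jpg|jpeg|png|gif)$/) print path
       }' |
    sort | uniq -c |                            # count identical paths
    sort -k1,1nr -k2 |                          # highest count first
    head -n "$1" |
    awk -v host="$2" '{ print $1, "http://" host $2 }'
}

# Usage:
#   top_images 200000 a.mengkang.net < access.log > warmup_list.txt
```

Only successful (200) requests are counted, so broken links do not end up in the warm-up list handed to the provider.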

Locate CDN resource access faults
Case 1: Images failed to load across large areas. The same image address was intermittently inaccessible, and whenever it was, the request was redirected to the homepage of a gaming website.
I contacted the CDN customer service, and their reply was that the hijacking had happened at the network operator and that their own service had no problem (a rather slack working attitude).
Take the following image as an example: http://f4.topit.me/4/2d/d1/1133196716aead12d4s.jpg
1. First we make sure that the resource on our source site is accessible, which rules out a problem with the CDN back-to-source function.
We bind the host with the wget command by sending the domain name in the Host header to the source site's IP address. Supposing the source site IP address is 111.1.23.214, this bypasses the CDN and visits our source site directly.
wget -S -O /dev/null --header="Host: f4.topit.me" http://111.1.23.214/4/2d/d1/1133196716aead12d4s.jpg
We confirmed the image was accessible.
2. Then we printed the detailed HTTP header information through wget -S.
wget -S  http://f4.topit.me/4/2d/d1/1133196716aead12d4s.jpg

--2014-11-08 21:47:34--  http://f4.topit.me/4/2d/d1/1133196716aead12d4s.jpg
Resolving f4.topit.me... 123.150.50.14, 123.150.50.13
Connecting to f4.topit.me|123.150.50.14|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Moved Temporarily
  Server: nginx/1.7.3
  Date: Sat, 08 Nov 2014 13:45:31 GMT
  Content-Type: text/html; charset=iso-8859-1
  Content-Length: 218
  Location: http://www.aiaigame.com/index.html
  Cache-Control: max-age=300
  Expires: Sat, 08 Nov 2014 13:50:31 GMT
  Powered-By-ChinaCache: MISS from CHN-SX-3-3gC.2
  Age: 125
  Powered-By-ChinaCache: HIT from CHN-TJ-7-3V2.6
  Connection: close
Location: http://www.aiaigame.com/index.html [following]
--2014-11-08 21:47:36--  http://www.aiaigame.com/index.html
Resolving www.aiaigame.com... 119.90.14.54, 119.90.14.59, 220.181.64.153, ...
Connecting to www.aiaigame.com|119.90.14.54|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Sat, 08 Nov 2014 13:42:50 GMT
  Server: Apache/2.2.10 (Unix) DAV/2 PHP/5.2.6 mod_ssl/2.2.10 OpenSSL/0.9.8e-fips-rhel5
  Last-Modified: Fri, 07 Nov 2014 09:14:50 GMT
  ETag: "31a8087-132ee-507413eb6f680"
  Accept-Ranges: bytes
  Content-Length: 78574
  Cache-Control: max-age=300
  Expires: Sat, 08 Nov 2014 13:47:50 GMT
  Vary: Accept-Encoding,User-Agent
  Content-Type: text/html
  Powered-By-ChinaCache: HIT from 01011623g3.3
  Age: 288
  Powered-By-ChinaCache: HIT from 01001743SJ
  Connection: keep-alive
Length: 78574 (77K) [text/html]
Saving to: “index.html.4”
  
100%[=====================================================================================================================================================>] 78,574      --.-K/s   in 0.005s  
  
2014-11-08 21:47:38 (16.3 MB/s) - “index.html.4” saved [78574/78574]


Through this request, we can clearly see that the request first reached 123.150.50.14:80 and received a 302 redirect there. The header Powered-By-ChinaCache: HIT from CHN-TJ-7-3V2.6 shows that the redirect came out of a CDN node's cache, indicating a problem with the CDN. In addition, the website that the request was redirected to is also a ChinaCache customer.
With the problem located, the CDN service provider had no excuse to shirk its responsibility and had to look into the issue.
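The check we did by eye on the wget -S output can be scripted. This is a rough sketch rather than a finished tool: it assumes the output was captured with 2>&1 (wget prints headers to stderr) and that header lines are indented as in the log above. The function names first_response, offsite_redirect and served_by are hypothetical.

```shell
# first_response: read saved `wget -S` output on stdin and print only the
# header block of the first HTTP response (wget indents header lines).
first_response() {
  awk '/^ +HTTP\// { if (started) exit; started = 1 }
       started && /^ / { print; next }
       started { exit }'
}

# offsite_redirect HOST: if the first response redirects to a host other
# than HOST, print that host and return 0; otherwise return 1.
offsite_redirect() {
  target=$(first_response |
    sed -n 's|^ *Location: *[a-z]*://\([^/ ]*\).*|\1|p' | head -n 1)
  if [ -n "$target" ] && [ "$target" != "$1" ]; then
    printf '%s\n' "$target"
    return 0
  fi
  return 1
}

# served_by HEADER: print the values of a cache header in the first response.
served_by() {
  first_response | sed -n "s|^ *$1: *||p"
}

# Usage:
#   wget -S -O /dev/null http://f4.topit.me/4/2d/d1/1133196716aead12d4s.jpg 2>&1 > out.txt
#   offsite_redirect f4.topit.me < out.txt
#   served_by Powered-By-ChinaCache < out.txt
```

An off-site redirect combined with a HIT value in the provider's cache header is the kind of evidence that points at the CDN rather than at the source site.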

Case 2: None of the images referenced in the CSS could be loaded when we visited a webpage, but each of them was accessible when opened directly by its address. Using wget --referer, we diagnosed the issue as a misconfigured anti-leech (hotlink protection) setting.
Fault screenshot (ugly-looking)
 
I reported this problem to the customer service and got a reply saying that they imposed no restrictions and that the problem was with our source site. It was time to let the evidence talk.
1. First I confirmed that the source site was functioning normally, simulating browser access with a Referer header.
wget -S -O /dev/null --header="Host: img.topit.me" --referer="http://static.topitme.com/s/css/main21.css" http://211.155.84.132/img/bar/next.png
--2015-05-07 13:52:50--  http://211.155.84.132/img/bar/next.png
Connecting to 211.155.84.132:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: nginx
  Date: Thu, 07 May 2015 05:52:50 GMT
  Content-Type: image/png
  Content-Length: 3022
  Connection: keep-alive
  Last-Modified: Wed, 04 Jan 2012 14:44:07 GMT
  Expires: Sun, 04 May 2025 05:52:50 GMT
  Cache-Control: max-age=315360000
  Accept-Ranges: bytes
Length: 3022 (3.0K) [image/png]


I also tried another approach to host binding, using wget -e http_proxy.
wget -SO /dev/null --referer="http://static.topitme.com/s/css/main21.css" http://img.topit.me/img/style/icon_heart.png -e http_proxy=211.155.84.137

2. Then I directly issued the request, without host binding.
wget -S  -O /dev/null  --referer="http://static.topitme.com/s/css/main21.css"  http://img.topit.me/img/bar/next.png
--2015-05-07 11:29:21--  http://img.topit.me/img/bar/next.png
Resolving img.topit.me... 111.202.7.252, 125.39.78.164
Connecting to img.topit.me|111.202.7.252|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 403 Forbidden
  Server: nginx
  Date: Thu, 07 May 2015 03:29:21 GMT
  Content-Type: text/html
  Content-Length: 162
  Connection: keep-alive
2015-05-07 11:29:21 ERROR 403: Forbidden.


We can clearly see the domain name resolution process. According to its pre-defined policy, the CDN's DNS returned the optimal IP address 111.202.7.252, and that node then returned 403. It was not until I provided screenshots comparing the two situations that the CDN customer service personnel started to address the problem.
Never count on the customer service to solve your problems. They will never admit that it is their fault until you locate the problems on your own and leave them no alternatives.
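The side-by-side comparison in Case 2 can be reduced to two status codes. The sketch below assumes the same wget -S output format shown above; http_status and blame are hypothetical helper names, and the wording of the verdicts is mine.

```shell
# http_status: read saved `wget -S` output on stdin and print the status
# code of the first HTTP response.
http_status() {
  awk '/^ +HTTP\// { print $2; exit }'
}

# blame ORIGIN_STATUS CDN_STATUS: say which side the evidence points to.
blame() {
  if [ "$1" = "200" ] && [ "$2" != "200" ]; then
    echo "origin OK ($1), CDN returned $2: fault is on the CDN side"
  elif [ "$1" != "200" ]; then
    echo "origin returned $1: check the source site first"
  else
    echo "origin and CDN both returned 200"
  fi
}

# Usage, with the same requests as in Case 2:
#   ref="http://static.topitme.com/s/css/main21.css"
#   origin=$(wget -S -O /dev/null --header="Host: img.topit.me" \
#     --referer="$ref" http://211.155.84.132/img/bar/next.png 2>&1 | http_status)
#   cdn=$(wget -S -O /dev/null --referer="$ref" \
#     http://img.topit.me/img/bar/next.png 2>&1 | http_status)
#   blame "$origin" "$cdn"
```

Sending the same Referer to both the source site and the CDN is what isolates the anti-leech setting as the only difference between the two requests.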
The new favorite of the cloud era: Alibaba Cloud OSS
The problems mentioned above have left our development engineers doing too much work that should be the responsibility of O&M engineers. Now that OSS is available, these problems no longer affect you. There are no back-to-source problems, because our data is stored directly in the cloud. It is that easy.