Alibaba Cloud Content Delivery Network (CDN) is a distributed network that is built over the bearer network. It contains edge nodes that are deployed in different regions. CDN reduces the workload of the origin server and prevents network congestion. You can use CDN to accelerate content delivery in different regions and scenarios. After you transfer CDN logs to OSS, or transfer CDN logs to Log Service in real time, you can ship CDN logs from Log Service to OSS, and then use DLA to analyze CDN logs in OSS without further data transfer. This analysis helps you identify problems, and solve these problems to improve the service quality of CDN.

Prerequisites

The following operations are performed:

Step 1: Transfer CDN log files to OSS

For more information, see Log storage.

Step 2: Create an OSS schema

In this topic, the cdn_log.log file is stored in the OSS bucket.

On the DMS page, edit and execute SQL statements to create an OSS schema. You can also use a MySQL client or MySQL command line tool to connect to DLA, and then edit and execute these SQL statements to create an OSS schema.

​CREATE SCHEMA cdn_log_schema with DBPROPERTIES(
   catalog='oss',
  location = 'oss://bucket-name/cdn/'
  );​

Step 3: Create a table for the cdn_log.log file

The cdn_log.log file contains the following information of the created table:

[9/Jun/2015:01:58:09 +0800] 188.165.15.75 - 1542 "-" "GET http://www.aliyun.com/index.html" 200 191 2830 MISS "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" "text/html"

The following table describes the fields in the preceding information.

Field Description
[9/Jun/2015:01:58:09 +0800] The time.
188.165.15.75 The IP address.
- The IP address of the proxy server.
1542 The response time. Unit: milliseconds.
"-" The referer.
"GET http://www.aliyun.com/index.html" The request.
200 The HTTP code.
191 The request size. Unit: bytes.
2830 The response size. Unit: bytes.
MISS The cache hit status.
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" The User-Agent header in the request.
"text/html" The file type.

When you create a table in DLA, you must use regular expressions to parse the CDN log. The following table describes the fields and their regular expressions.

Field Regular Expression
[9/Jun/2015:01:58:09 +0800] (-|\\[[^\\]]*\\])
188.165.15.75 ([^ ]*)
- ([^ ]*)
1542 ([^ ]*)
"-" ([^ \"]*|\"[^\"]*\")
"GET http://www.aliyun.com/index.html" ([^ \"]*|\"[^\"]*\")
200 (-|[0-9]*)
191 (-|[0-9]*)
2830 (-|[0-9]*)
MISS ([^ ]*)
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" ([^ \"]*|\"[^\"]*\")
"text/html" ([^ \"]*|\"[^\"]*\")

The complete regular expression is (-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\").

You can execute the following SQL statement to create a table for the cdn_log.log file in DLA:

​CREATE EXTERNAL TABLE cdn_log (
  log_timestamp           VARCHAR,
  access_ip               VARCHAR,
  proxy_ip                VARCHAR,
  response_time           VARCHAR,
  referer                 VARCHAR,
  request                 VARCHAR,
  httpcode                SMALLINT,
  request_size            BIGINT,
  response_size           BIGINT,
  cache_hit_status        VARCHAR,
  ua_header               VARCHAR,
  file_type               VARCHAR
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")"
)
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/cdn/'
TBLPROPERTIES ('recursive.directories' = 'true');​

Step 4: Query and analyze the cdn_log.log file

​SELECT * FROM cdn_log;
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| log_timestamp                | access_ip     | proxy_ip | response_time | referer | request                          | httpcode | request_size | response_size | cache_hit_status | ua_header                                                   | file_type                 |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| [18/Jun/2019:05:08:33 +0800] | 47.92.115.203 | -        | 777           | "-"     | "GET http://www.kalabandha.com/" |      200 |          201 |          7159 | MISS             | "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))" | "text/html;charset=UTF-8" |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+​

Use the following DLA system functions to analyze data in the cdn_log.log file:

​SELECT date_parse(substring(log_timestamp, 2, length(log_timestamp) - 8), '%d/%b/%Y:%H:%i:%s') as log_timestamp,
       access_ip,
       proxy_ip,
       response_time,
       substring(referer, 2, length(referer) - 2) as referer,
       substring(request, 2, length(request) - 2) as request,
       httpcode,
       request_size,
       response_size,
       cache_hit_status,
       substring(ua_header, 2, length(ua_header) - 2) as ua_header,
       substring(file_type, 2, length(file_type) - 2) as file_type 
FROM cdn_log;
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| log_timestamp           | access_ip     | proxy_ip | response_time | referer | request                        | httpcode | request_size | response_size | cache_hit_status | ua_header                                                 | file_type               |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| 2019-06-18 05:08:33.000 | 47.92.115.203 | -        | 777           | -       | GET http://www.kalabandha.com/ |      200 |          201 |          7159 | MISS             | Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)) | text/html;charset=UTF-8 |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+​