Alibaba Cloud Content Delivery Network (CDN) is a distributed network that is built over the bearer network. It contains edge nodes that are deployed in different regions. CDN reduces the workload of the origin server and prevents network congestion. You can use CDN to accelerate content delivery in different regions and scenarios. After you transfer CDN logs to OSS, or transfer CDN logs to Log Service in real time, you can ship CDN logs from Log Service to OSS, and then use DLA to analyze CDN logs in OSS without further data transfer. This analysis helps you identify problems, and solve these problems to improve the service quality of CDN.
Prerequisites
- Function Compute
- OSS
-
Note You can determine whether to create a folder to store CDN log files based on your business needs.
- DLA
Step 1: Transfer CDN log files to OSS
For more information, see Log storage.
Step 2: Create an OSS schema
In this topic, the cdn_log.log file is stored in the OSS bucket.
On the DMS page, edit and execute SQL statements to create an OSS schema. You can also use a MySQL client or MySQL command line tool to connect to DLA, and then edit and execute these SQL statements to create an OSS schema.
CREATE SCHEMA cdn_log_schema with DBPROPERTIES(
catalog='oss',
location = 'oss://bucket-name/cdn/'
);
Step 3: Create a table for the cdn_log.log file
The cdn_log.log file contains the following information of the created table:
[9/Jun/2015:01:58:09 +0800] 188.165.15.75 - 1542 "-" "GET http://www.aliyun.com/index.html" 200 191 2830 MISS "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" "text/html"
The following table describes the fields in the preceding information.
Field | Description |
[9/Jun/2015:01:58:09 +0800] |
The time. |
188.165.15.75 |
The IP address. |
- |
The IP address of the proxy server. |
1542 |
The response time. Unit: milliseconds. |
"-" |
The referer. |
"GET http://www.aliyun.com/index.html" |
The request. |
200 |
The HTTP code. |
191 |
The request size. Unit: bytes. |
2830 |
The response size. Unit: bytes. |
MISS |
The cache hit status. |
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" |
The User-Agent header in the request. |
"text/html" |
The file type. |
When you create a table in DLA, you must use regular expressions to parse the CDN log. The following table describes the fields and their regular expressions.
Field | Regular Expression |
[9/Jun/2015:01:58:09 +0800] |
(-|\\[[^\\]]*\\]) |
188.165.15.75 |
([^ ]*) |
- |
([^ ]*) |
1542 |
([^ ]*) |
"-" |
([^ \"]*|\"[^\"]*\") |
"GET http://www.aliyun.com/index.html" |
([^ \"]*|\"[^\"]*\") |
200 |
(-|[0-9]*) |
191 |
(-|[0-9]*) |
2830 |
(-|[0-9]*) |
MISS |
([^ ]*) |
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" |
([^ \"]*|\"[^\"]*\") |
"text/html" |
([^ \"]*|\"[^\"]*\") |
The complete regular expression is (-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")
(-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")
.
You can execute the following SQL statement to create a table for the cdn_log.log file in DLA:
CREATE EXTERNAL TABLE cdn_log (
log_timestamp VARCHAR,
access_ip VARCHAR,
proxy_ip VARCHAR,
response_time VARCHAR,
referer VARCHAR,
request VARCHAR,
httpcode SMALLINT,
request_size BIGINT,
response_size BIGINT,
cache_hit_status VARCHAR,
ua_header VARCHAR,
file_type VARCHAR
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")"
)
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/cdn/'
TBLPROPERTIES ('recursive.directories' = 'true');
Step 4: Query and analyze the cdn_log.log file
SELECT * FROM cdn_log;
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| log_timestamp | access_ip | proxy_ip | response_time | referer | request | httpcode | request_size | response_size | cache_hit_status | ua_header | file_type |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| [18/Jun/2019:05:08:33 +0800] | 47.92.115.203 | - | 777 | "-" | "GET http://www.kalabandha.com/" | 200 | 201 | 7159 | MISS | "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))" | "text/html;charset=UTF-8" |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
Use the following DLA system functions to analyze data in the cdn_log.log file:
SELECT date_parse(substring(log_timestamp, 2, length(log_timestamp) - 8), '%d/%b/%Y:%H:%i:%s') as log_timestamp,
access_ip,
proxy_ip,
response_time,
substring(referer, 2, length(referer) - 2) as referer,
substring(request, 2, length(request) - 2) as request,
httpcode,
request_size,
response_size,
cache_hit_status,
substring(ua_header, 2, length(ua_header) - 2) as ua_header,
substring(file_type, 2, length(file_type) - 2) as file_type
FROM cdn_log;
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| log_timestamp | access_ip | proxy_ip | response_time | referer | request | httpcode | request_size | response_size | cache_hit_status | ua_header | file_type |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| 2019-06-18 05:08:33.000 | 47.92.115.203 | - | 777 | - | GET http://www.kalabandha.com/ | 200 | 201 | 7159 | MISS | Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)) | text/html;charset=UTF-8 |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+