Alibaba Cloud Content Delivery Network (CDN) is a distributed network built over the bearer network and contains edge nodes deployed in different regions. CDN offloads network traffic workloads from origin servers to prevent network congestion. You can use CDN to accelerate content delivery in different regions and scenarios. This improves the user experience of website visitors. Before you analyze log data, you can transfer CDN logs to Object Storage Service (OSS) or transfer these logs to Log Service in real time. This way, you can use DLA to directly analyze data of CDN logs that are stored in OSS. This helps you identify issues in a timely manner and resolve these issues to improve the service quality of CDN.
Prerequisites
- Function Compute
- OSS
- Get started with OSS
- Create buckets
- Upload objectsNote You can determine whether to create a folder to store CDN logs based on your business needs.
- DLA
Step 1: Transfer CDN logs to OSS
For more information, see Use Function Compute to deliver logs.
Step 2: Create an OSS schema
In this topic, the cdn_log.log file is stored in the OSS bucket.
To create an OSS schema and a table for log files and read data from log files, you can edit SQL statements in the DMS console or use a MySQL client or MySQL command-line tool to connect to DLA and edit SQL statements.
CREATE SCHEMA cdn_log_schema with DBPROPERTIES(
catalog='oss',
location = 'oss://bucket-name/cdn/'
);
Step 3: Create a table for the cdn_log.log file
The cdn_log.log file contains the following information:
[9/Jun/2015:01:58:09 +0800] 188.165.XX.XX - 1542 "-" "GET http://www.aliyun.com/index.html" 200 191 2830 MISS "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" "text/html"
The following table describes the fields in the preceding information.
Field | Description |
[9/Jun/2015:01:58:09 +0800] |
The time at which the log was generated. |
188.165.XX.XX |
The IP address. |
- |
The proxy IP address. |
1542 |
The response time. Unit: milliseconds. |
"-" |
The referer. |
"GET http://www.aliyun.com/index.html" |
The request. |
200 |
The HTTP code. |
191 |
The request size. Unit: byte. |
2830 |
The response size. Unit: byte. |
MISS |
The cache hit status. |
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" |
The user-agent header in the request. |
"text/html" |
The file type. |
When you create a table in DLA, you must use regular expressions to parse CDN log files. The following table describes the fields and the related regular expressions.
Field | Regular expression |
[9/Jun/2015:01:58:09 +0800] |
(-|\\[[^\\]]*\\]) |
188.165.XX.XX |
([^ ]*) |
- |
([^ ]*) |
1542 |
([^ ]*) |
"-" |
([^ \"]*|\"[^\"]*\") |
"GET http://www.aliyun.com/index.html" |
([^ \"]*|\"[^\"]*\") |
200 |
(-|[0-9]*) |
191 |
(-|[0-9]*) |
2830 |
(-|[0-9]*) |
MISS |
([^ ]*) |
"Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" |
([^ \"]*|\"[^\"]*\") |
"text/html" |
([^ \"]*|\"[^\"]*\") |
A complete regular expression is (-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")
(-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")
.
You can execute the following SQL statement to create a table for the cdn_log.log file in DLA:
CREATE EXTERNAL TABLE cdn_log (
log_timestamp VARCHAR,
access_ip VARCHAR,
proxy_ip VARCHAR,
response_time VARCHAR,
referer VARCHAR,
request VARCHAR,
httpcode SMALLINT,
request_size BIGINT,
response_size BIGINT,
cache_hit_status VARCHAR,
ua_header VARCHAR,
file_type VARCHAR
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(-|\\[[^\\]]*\\]) ([^ ]*) ([^ ]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (-|[0-9]*) ([^ ]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")"
)
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/cdn/'
TBLPROPERTIES ('recursive.directories' = 'true');
Step 4: Query and analyze data of the cdn_log.log file
SELECT * FROM cdn_log;
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| log_timestamp | access_ip | proxy_ip | response_time | referer | request | httpcode | request_size | response_size | cache_hit_status | ua_header | file_type |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
| [18/Jun/2019:05:08:33 +0800] | 47.92.XX.XX | - | 777 | "-" | | 200 | 201 | 7159 | MISS | "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))" | "text/html;charset=UTF-8" |
+------------------------------+---------------+----------+---------------+---------+----------------------------------+----------+--------------+---------------+------------------+-------------------------------------------------------------+---------------------------+
Use the following functions of DLA to analyze data in the cdn_log.log file:
SELECT date_parse(substring(log_timestamp, 2, length(log_timestamp) - 8), '%d/%b/%Y:%H:%i:%s') as log_timestamp,
access_ip,
proxy_ip,
response_time,
substring(referer, 2, length(referer) - 2) as referer,
substring(request, 2, length(request) - 2) as request,
httpcode,
request_size,
response_size,
cache_hit_status,
substring(ua_header, 2, length(ua_header) - 2) as ua_header,
substring(file_type, 2, length(file_type) - 2) as file_type
FROM cdn_log;
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| log_timestamp | access_ip | proxy_ip | response_time | referer | request | httpcode | request_size | response_size | cache_hit_status | ua_header | file_type |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+
| 2019-06-18 05:08:33.000 | 47.92.XX.XX | - | 777 | - | | 200 | 201 | 7159 | MISS | Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D)) | text/html;charset=UTF-8 |
+-------------------------+---------------+----------+---------------+---------+--------------------------------+----------+--------------+---------------+------------------+-----------------------------------------------------------+-------------------------+