NGINX access logs record detailed information about user access. Parsing these logs is crucial for business operations and maintenance (O&M). This topic describes how to use regular expression functions to parse NGINX access logs.
Parse standard NGINX logs
Simple Log Service lets you parse NGINX logs by using regular expressions in Structured Process Language (SPL). The following example uses regular expressions to parse an NGINX access log entry for a successful request (status code 200).
Raw log
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
Parsing requirements
Requirement 1: Extract the `code`, `ip`, `datetime`, `protocol`, `request`, `sendbytes`, `referer` (named `refere` in the SPL output), `useragent`, and `verb` fields from the NGINX log.
Requirement 2: Further parse the `request` field to extract the `uri_proto`, `uri_domain`, and `uri_param` fields.
Requirement 3: Further parse the extracted `uri_param` field to extract the `uri_path` and `uri_query` fields.
SLS SPL orchestration
Complete orchestration
* | parse-regexp content, '(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] \"([A-Z]+) ([\S]*) ([\S]+)["] (\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]' as ip, datetime, verb, request, protocol, code, sendbytes, refere, useragent
| parse-regexp request, '^(\w+):\/\/([^\/]+)(\/.*)$' as uri_proto, uri_domain, uri_param
| parse-regexp uri_param, '([^?]*)\?(.*)' as uri_path, uri_query
Orchestration breakdown and corresponding results
The SPL orchestration for Requirement 1 is as follows.
* | parse-regexp content, '(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] \"([A-Z]+) ([\S]*) ([\S]+)["] (\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]' as ip, datetime, verb, request, protocol, code, sendbytes, refere, useragent
Corresponding result:
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
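Before deploying a pattern like the one for Requirement 1, it can help to check it locally against a sample line. The following is a minimal Python sketch, not SPL. It assumes that Python's `re` module treats this particular pattern the same way the SPL regular expression engine does, and that a search anywhere in the field approximates how `parse-regexp` matches. The names `parse_content`, `NGINX_PATTERN`, and `FIELD_NAMES` are hypothetical helpers introduced for illustration.

```python
import re

# Same pattern as the SPL step for Requirement 1, written as a Python raw string.
# Assumption: Python's `re` and the SPL regex engine agree on this pattern.
NGINX_PATTERN = re.compile(
    r'(\d+\.\d+\.\d+\.\d+) - - \[([\s\S]+)\] "([A-Z]+) ([\S]*) ([\S]+)["] '
    r'(\d+) (\d+) ["]([\S]*)["] ["]([\S\s]+)["]'
)

# Field names in capture-group order, as listed after `as` in the SPL step.
FIELD_NAMES = ["ip", "datetime", "verb", "request", "protocol",
               "code", "sendbytes", "refere", "useragent"]


def parse_content(content):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = NGINX_PATTERN.search(content)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


# The `content` value from the raw log in this topic.
sample = (
    '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
    '"GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" '
    '200 273932 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'
)

fields = parse_content(sample)
for name in FIELD_NAMES:
    print(f"{name}: {fields[name]}")
```

All extracted values are strings in this sketch, including `code` and `sendbytes`.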
The SPL orchestration for Requirement 2 is as follows.
* | parse-regexp request, '^(\w+):\/\/([^\/]+)(\/.*)$' as uri_proto, uri_domain, uri_param
Corresponding result:
uri_param: /_astats?application=&inf.name=eth0
uri_domain: example.aliyundoc.com
uri_proto: http
The SPL orchestration for Requirement 3 is as follows.
* | parse-regexp uri_param, '([^?]*)\?(.*)' as uri_path, uri_query
Corresponding result:
uri_path: /_astats
uri_query: application=&inf.name=eth0
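The two URL-splitting steps for Requirements 2 and 3 can be tried out locally in the same way. The following Python sketch is an illustration only. It assumes Python's `re` module handles both patterns the same way the SPL engine does; `split_request`, `URI_PATTERN`, and `PARAM_PATTERN` are hypothetical names that are not part of SPL.

```python
import re

# Patterns copied from the SPL steps for Requirement 2 and Requirement 3.
URI_PATTERN = re.compile(r'^(\w+):\/\/([^\/]+)(\/.*)$')
PARAM_PATTERN = re.compile(r'([^?]*)\?(.*)')


def split_request(request):
    """Split a request URL into the fields named in Requirements 2 and 3.

    Fields whose pattern does not match are left out of the result.
    """
    result = {}
    uri_match = URI_PATTERN.search(request)
    if uri_match is None:
        return result
    (result["uri_proto"],
     result["uri_domain"],
     result["uri_param"]) = uri_match.groups()

    # The second step runs on the field produced by the first step.
    param_match = PARAM_PATTERN.search(result["uri_param"])
    if param_match is not None:
        result["uri_path"], result["uri_query"] = param_match.groups()
    return result


parts = split_request(
    "http://example.aliyundoc.com/_astats?application=&inf.name=eth0"
)
for name, value in parts.items():
    print(f"{name}: {value}")
```

In this sketch, a request without a scheme, such as `/index.html`, does not match the first pattern, and a URL without a `?` does not match the second, so `uri_path` and `uri_query` are not produced for it. How SPL itself reports non-matching logs is not covered here.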
Final SPL processing result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
uri_domain: example.aliyundoc.com
uri_proto: http
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_query: application=&inf.name=eth0
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
Parse non-standard NGINX logs
Use case 1: Extract keywords from the middle of a log
You can use a regular expression with the parse-regexp function to extract the `Time`, `Level`, `Server`, and `Info` fields from the middle of the `message` field.
Example
Raw log
{"message": "[2024-10-11 10:30:34.917962]\t[info]\t[SingleWorldService]\t[ResourceManager:testOut for 2, srvClusterId=1009]\t[[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]"}
SPL orchestration
* | parse-regexp message, '\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]' as Time, Level, Server, Info
Processing result
Time: 2024-10-11 10:30:34.917962
Level: info
Server: SingleWorldService
Info: ResourceManager:testOut for 2, srvClusterId=1009
message: [2024-10-11 10:30:34.917962] [info] [SingleWorldService] [ResourceManager:testOut for 2, srvClusterId=1009] [[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]
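The bracket pattern can also be checked locally. The following Python sketch is an illustration only. It assumes that the `message` field holds real tab characters after the JSON log is decoded, which is why `\s+` matches between the bracketed groups, and that Python's `re` module treats the pattern the same way the SPL engine does. `parse_message`, `BRACKET_PATTERN`, and `FIELD_NAMES` are hypothetical names.

```python
import json
import re

# Pattern copied from the SPL step: four bracketed groups separated by whitespace.
# [^[\]]+ matches one or more characters that are neither "[" nor "]".
BRACKET_PATTERN = re.compile(
    r'\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]\s+\[([^[\]]+)\]'
)
FIELD_NAMES = ["Time", "Level", "Server", "Info"]


def parse_message(message):
    """Return the first four bracketed values as a dict, or None if no match."""
    match = BRACKET_PATTERN.search(message)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


# The raw log from this topic. The raw string keeps "\t" as a JSON escape,
# which json.loads then decodes into a real tab character.
raw_log = (
    r'{"message": "[2024-10-11 10:30:34.917962]\t[info]\t[SingleWorldService]'
    r'\t[ResourceManager:testOut for 2, srvClusterId=1009]'
    r'\t[[] ...ewEntities/ResourceServiceComponent/ResourceManager.out:190]"}'
)

message = json.loads(raw_log)["message"]
fields = parse_message(message)
for name in FIELD_NAMES:
    print(f"{name}: {fields[name]}")
```

Because the search takes the first run of four bracketed groups, the trailing fifth group in the sample message is ignored.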