NGINX access logs record detailed information about user requests. You can parse these logs to monitor and analyze your business. This topic describes how to use regular expressions or the Grok function to parse NGINX access logs.

Parsing methods

Log Service allows you to parse NGINX logs by using regular expressions or the Grok function.
  • Use regular expressions.

    If you are unfamiliar with regular expressions, parsing logs with them can be difficult, inefficient, and time-consuming. In that case, we recommend that you use the Grok function instead. For more information about regular expressions, see Regular expressions.

  • (Recommended) Use the Grok function.

    The Grok function is easier to learn than regular expressions. You can use it to parse logs once you are familiar with the field types in the Grok patterns. Compared with regular expressions, the Grok function also offers better flexibility, efficiency, and cost-effectiveness. Log Service supports 400 Grok patterns for data transformation. For more information about Grok patterns, see Grok patterns.

Note
  • You can combine regular expressions and the Grok function to parse logs.
  • You can customize regular expressions or the Grok function to parse NGINX logs that are in a custom format.

Use regular expressions to parse NGINX access logs that contain a success status code

The following example shows how to use regular expressions to parse NGINX access logs that contain a success status code.
  • Raw log entry
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.254.254
    __tag__:__receive_time__:  1563443076
    content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  • Requirements
    • Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, refere, useragent, and verb fields from the NGINX logs.
    • Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from the request field.
    • Requirement 3: Extract the uri_path and uri_query fields from the uri_param field.
  • DSL orchestration
    • General orchestration
      """Step 1: Parse the NGINX logs."""
      e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
      """Step 2: Parse the request field obtained in Step 1."""
      e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
      """Step 3: Parse the uri_param field obtained in Step 2."""
      e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)')
    • Specific orchestration and the transformation results
      • Orchestration specific to Requirement 1:
        e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
        Sub-result
        __source__:  192.168.0.1
        __tag__:__receive_time__:  1563443076
        code:  200
        content:  192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        datetime:  04/Jan/2019:16:06:38 +0800
        ip:  192.168.0.2
        protocol:  HTTP/1.1
        refere:  -
        request:  http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0
        sendbytes:  273932
        useragent:  Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
        verb:  GET
      • Orchestration specific to Requirement 2 (Parse the request field).
        e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
        Sub-result
        uri_param: /_astats?application=&inf.name=eth0
        uri_domain: cdn1cdedge0001.coxlab.net
        uri_proto: http
      • Orchestration specific to Requirement 3 (Parse the uri_param field).
        e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)')
        Sub-result
        uri_path: /_astats
        uri_query: application=&inf.name=eth0
  • Result
    __source__:  192.168.0.1
    __tag__:__receive_time__:  1563443076
    code:  200
    content:  192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    datetime:  04/Jan/2019:16:06:38 +0800
    ip:  192.168.0.2
    protocol:  HTTP/1.1
    refere:  -
    request:  http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0
    sendbytes:  273932
    uri_domain:  cdn1cdedge0001.coxlab.net
    uri_proto:  http
    uri_param: /_astats?application=&inf.name=eth0
    uri_path: /_astats
    uri_query: application=&inf.name=eth0
    useragent:  Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    verb:  GET
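
The three extraction steps above can be checked outside Log Service with Python's standard re module. The following is a minimal sketch, not the Log Service implementation: the extract helper is a hypothetical stand-in for e_regex that assumes the function adds each named capture group to the log event as a field.

```python
import re

# The content field of the raw log entry shown above.
content = (
    '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
    '"GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" '
    '200 273932 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# The same three expressions that are passed to e_regex in Steps 1 to 3.
CONTENT_RE = (
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] '
    r'\"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] '
    r'(?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] '
    r'["](?P<useragent>[\S\s]+)["]'
)
REQUEST_RE = r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
URI_PARAM_RE = r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)'


def extract(event, field, pattern):
    """Hypothetical stand-in for e_regex: copy named groups into the event."""
    match = re.search(pattern, event.get(field, ""))
    if match:
        event.update(match.groupdict())
    return event


event = {"content": content}
extract(event, "content", CONTENT_RE)      # Step 1: parse the log line
extract(event, "request", REQUEST_RE)      # Step 2: parse the request field
extract(event, "uri_param", URI_PARAM_RE)  # Step 3: parse the uri_param field

for name in sorted(event):
    print(f"{name}: {event[name]}")
```

Each step reads a field produced by the previous step, so the order of the three calls matters.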

Use the Grok function to parse NGINX access logs that contain a success status code

The following example shows how to use the Grok function to parse NGINX access logs that contain a success status code.
  • Raw log entry
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.254.254
    __tag__:__receive_time__:  1563443076
    content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  • Requirements
    • Requirement 1: Extract the clientip, bytes, agent, auth, verb, request, ident, timestamp, httpversion, response, and referrer fields from the NGINX logs.
    • Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from the request field.
    • Requirement 3: Extract the uri_path and uri_query fields from the uri_param field.
  • DSL orchestration
    • General orchestration
      """Step 1: Parse the NGINX logs."""
      e_regex('content',grok('%{COMBINEDAPACHELOG}'))
      """Step 2: Parse the request field obtained in Step 1."""
      e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
      """Step 3: Parse the uri_param field obtained in Step 2."""
      e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
      To use the Grok function to parse the NGINX logs, you only need to use the COMBINEDAPACHELOG pattern.
      • Pattern: COMMONAPACHELOG
        Rule: %{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
        Description: Parses the clientip, ident, auth, timestamp, verb, request, httpversion, response, and bytes fields.
      • Pattern: COMBINEDAPACHELOG
        Rule: %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
        Description: Parses all the fields in the COMMONAPACHELOG pattern, and parses the referrer and agent fields.
    • Specific orchestration and the transformation results
      • Orchestration specific to Requirement 1:
        e_regex('content',grok('%{COMBINEDAPACHELOG}'))
        Sub-result
        __source__:  192.168.0.1
        __tag__:__receive_time__:  1563443076
        agent:  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        auth:  -
        bytes:  273932
        clientip:  192.168.0.2
        content:  192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        httpversion:  1.1
        ident:  -
        referrer:  "-"
        request:  http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0
        response:  200
        timestamp:  04/Jan/2019:16:06:38 +0800
        verb:  GET
      • Orchestration specific to Requirement 2 (Parse the request field).
        e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
        Sub-result
        uri_proto: http
        uri_domain: cdn1cdedge0001.coxlab.net
        uri_param: /_astats?application=&inf.name=eth0
        You can use the Grok patterns to parse the request field. The following table describes the patterns.
        • Pattern: URIPROTO
          Rule: [A-Za-z]+(\+[A-Za-z+]+)?
          Description: Matches URI schemes. For example, in http://hostname.domain.tld/_astats?application=&inf.name=eth0, the matched content is http.
        • Pattern: USER
          Rule: [a-zA-Z0-9._-]+
          Description: Matches content that contains letters, digits, periods (.), underscores (_), and hyphens (-).
        • Pattern: URIHOST
          Rule: %{IPORHOST}(?::%
          Description: Matches IP addresses, hostnames, or positive integers.
        • Pattern: URIPATHPARAM
          Rule: %{URIPATH}(?:%{URIPARAM})?
          Description: Matches the uri_param field.
      • Orchestration specific to Requirement 3 (Parse the uri_param field).
        e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
        Sub-result
        uri_path: /_astats
        uri_query: application=&inf.name=eth0
        The following table describes the Grok pattern that is used to parse the uri_param field.
        • Pattern: GREEDYDATA
          Rule: .*
          Description: Matches zero or more characters that are not line breaks.
  • Result
    __source__:  192.168.0.1
    __tag__:__receive_time__:  1563443076
    agent:  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    auth:  -
    bytes:  273932
    clientip:  192.168.0.2
    content:  192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    httpversion:  1.1
    ident:  -
    referrer:  "-"
    request:  http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0
    response:  200
    timestamp:  04/Jan/2019:16:06:38 +0800
    uri_domain:  cdn1cdedge0001.coxlab.net
    uri_param:  /_astats?application=&inf.name=eth0
    uri_path:  /_astats
    uri_proto:  http
    uri_query:  application=&inf.name=eth0
    verb:  GET
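
To make the relationship between Grok patterns and regular expressions concrete, the following sketch expands %{PATTERN:field} references into named capture groups by using Python's re module. It is an illustration only and does not reflect how the grok function in Log Service is implemented. The expand helper and the rest field are hypothetical, and the pattern table contains only the URIPROTO and GREEDYDATA rules listed above.

```python
import re

# Two pattern rules copied from the tables above; a real Grok library has many more.
PATTERNS = {
    "GREEDYDATA": r".*",
    "URIPROTO": r"[A-Za-z]+(\+[A-Za-z+]+)?",
}

# Matches %{NAME} or %{NAME:field}.
_REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression):
    """Hypothetical helper: replace Grok references with regex groups."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])  # a rule may reference other rules
        if field:
            return f"(?P<{field}>{body})"  # named group becomes a field
        return f"(?:{body})"               # no field name: match only
    return _REFERENCE.sub(replace, expression)


# Step 3 of the orchestration above.
uri_param_re = expand(r"%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}")
print(uri_param_re)

fields = re.match(uri_param_re, "/_astats?application=&inf.name=eth0").groupdict()
print(fields)

# URIPROTO applied to the request field; "rest" is a made-up field name.
request_re = expand(r"%{URIPROTO:uri_proto}://%{GREEDYDATA:rest}")
proto = re.match(
    request_re,
    "http://cdn1cdedge0001.coxlab.net/_astats?application=&inf.name=eth0",
).group("uri_proto")
print(proto)
```

Because GREEDYDATA is greedy, the first group extends to the last question mark in the input. The sample uri_param value contains only one question mark, so the split lands between the path and the query string.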

Use the Grok function to parse NGINX access logs that contain an error status code

The following example shows how to use the Grok function to parse NGINX access logs that contain an error status code.
  • Raw log entry
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.254.254
    __tag__:__receive_time__:  1563443076
    content: 2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"
  • Requirement:

    Parse the host, http_version, log_level, pid, referrer, request, request_time, server, and verb fields from the content field.

  • DSL orchestration:
    e_regex('content',grok('%{DATESTAMP:request_time} \[%{LOGLEVEL:log_level}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: "%{WORD:verb} %{NOTSPACE:request}( HTTP/%{NUMBER:http_version})")(?:, host: "%{HOSTNAME:host}")?(?:, referrer: "%{NOTSPACE:referrer}")?'))
  • Result
    __source__:  192.168.0.1
    __tag__:__client_ip__:  192.168.254.254
    __tag__:__receive_time__:  1563443076
    content:  2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"
    host: sls.aliyun.com
    http_version: 1.1
    log_level: error
    pid: 1234
    referrer: https://sls.aliyun.com/question/answer/123.html?from=singlemessage
    request: /favicon.ico
    request_time:  19/08/07 16:05:17
    server: sls.aliyun.com
    verb: GET
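
For comparison, the same error log entry can be parsed with a hand-written regular expression in plain Python. This is a rough sketch under simplifying assumptions: each sub-expression is a loose approximation chosen for this sample, not the definition of the corresponding Grok pattern. In particular, this expression captures the timestamp with a four-digit year, so its request_time value differs from the one in the result above.

```python
import re

# The content field of the raw log entry shown above.
content = (
    '2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a '
    'closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, '
    'server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", '
    'host: "sls.aliyun.com", '
    'referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"'
)

# Simplified approximation of the Grok expression; not equivalent to it.
ERROR_RE = re.compile(
    r'(?P<request_time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<log_level>\w+)\] '
    r'(?P<pid>\d+)#\d+: '
    r'(?P<errormessage>.*)'
    r', client: (?P<client>\S+)'
    r', server: (?P<server>\S+)'
    r', request: "(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<http_version>[\d.]+)"'
    r'(?:, host: "(?P<host>[^"]+)")?'          # host is optional
    r'(?:, referrer: "(?P<referrer>[^"]+)")?'  # referrer is optional
)

match = ERROR_RE.match(content)
error_fields = match.groupdict() if match else {}

for name in sorted(error_fields):
    print(f"{name}: {error_fields[name]}")
```

Writing the expression by hand requires spelling out every delimiter and character class, which is the effort that the predefined Grok patterns are meant to save.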