Simple Log Service: Parse CSV format logs

Last Updated: Dec 11, 2025

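This topic describes how to parse normal and abnormal CSV format logs whose fields are separated by pipe characters (|). In the abnormal example, a field value itself contains the delimiter.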

Normal CSV format logs

  • Raw log

    _program_:access
    _severity_:6
    _priority_:14
    _facility_:1
    topic:syslog-forwarder
    content:10.64.10.20|10/Jun/2019:11:32:16 +0800|m.zf.cn|GET /css/mip-base.css HTTP/1.1|200|0.077|6404|10.11.186.82:8001|200|0.060|https://yz.m.sm.cn/s?q=%25%24%23%40%21&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei|-|Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36|-|-
  • Requirements

    1. If the value of the _program_ field is access, parse the content field as pipe-separated values (PSV), and then delete the content field.

    2. Split the request field into the request_method, request, and http_version fields.

    3. URL-decode the http_referer field.

  • Solution

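    1. Use the where instruction to keep only logs whose _program_ field is access, use the parse-csv instruction with the pipe delimiter (-delim='|') to parse the content field, and then use the project-away instruction to delete the original content field. The statement is as follows: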

      * | where _program_='access' | parse-csv -delim='|' content as remote_addr,time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status,upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid | project-away content

      Returned log:

      __source__:  1.2.3.4
      __tag__:__client_ip__:  2.3.X.X
      __tag__:__receive_time__:  1562845168
      __topic__:
      _facility_:  1
      _priority_:  14
      _program_:  access
      _severity_:  6
      body_bytes_sent:  6404
      guid:  -
      host:  m.zf.cn
      http_referer:  https://yz.m.sm.cn/s?q=%25%24%23%40%21&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
      http_user_agent:  Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
      http_x_forwarded_for:  -
      remote_addr:  10.64.10.20
      request:  GET /css/mip-base.css HTTP/1.1
      request_time:  0.077
      session_id:  -
      status:  200
      time_local:  10/Jun/2019:11:32:16 +0800
      topic:  syslog-forwarder
      upstream_addr:  10.11.186.82:8001
      upstream_response_time:  0.060
      upstream_status:  200
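    2. Use parse-regexp to extract the request_method and http_version fields from the request field, and then overwrite the request field with the request path. Extract request last, because the first two patterns must match the full original value.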

      * | parse-regexp request, '(\S+)' as request_method |  parse-regexp request, '\S+\s+\S+\s+(\S+)' as http_version | parse-regexp request, '\S+\s+(\S+)' as request

      Returned log:

      request:  /css/mip-base.css
      request_method:  GET
      http_version:  HTTP/1.1
    3. URL-decode the http_referer field.

      * | extend http_referer=url_decode(http_referer)

      Returned log:

      http_referer:  https://yz.m.sm.cn/s?q=%$#@!&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
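To check the logic of steps 1 to 3 outside Simple Log Service, the following Python sketch emulates them on the sample log. It is a local approximation under stated assumptions, not how Simple Log Service implements these instructions: csv.reader with delimiter='|' stands in for parse-csv, re.search stands in for parse-regexp, and urllib.parse.unquote stands in for url_decode. The function name process is hypothetical. Like the where clause, it drops logs whose _program_ field is not access.

```python
import csv
import re
from typing import Optional
from urllib.parse import unquote

# Column names copied from the parse-csv statement in step 1.
FIELDS = [
    "remote_addr", "time_local", "host", "request", "status",
    "request_time", "body_bytes_sent", "upstream_addr", "upstream_status",
    "upstream_response_time", "http_referer", "http_x_forwarded_for",
    "http_user_agent", "session_id", "guid",
]


def process(log: dict) -> Optional[dict]:
    """Hypothetical local emulation of steps 1-3 (not the SLS implementation)."""
    # Step 1a: like `where _program_='access'`, drop every other log.
    if log.get("_program_") != "access":
        return None
    # Step 1b: split content on '|' and map the values to the column names.
    row = next(csv.reader([log["content"]], delimiter="|"))
    out = {k: v for k, v in log.items() if k != "content"}  # project-away content
    out.update(zip(FIELDS, row))
    # Step 2: mirror the SPL order; method and version are read from the
    # original value before request is overwritten with the path.
    req = out["request"]
    out["request_method"] = re.search(r"(\S+)", req).group(1)
    out["http_version"] = re.search(r"\S+\s+\S+\s+(\S+)", req).group(1)
    out["request"] = re.search(r"\S+\s+(\S+)", req).group(1)
    # Step 3: percent-decode the referer (%25 -> %, %24 -> $, and so on).
    out["http_referer"] = unquote(out["http_referer"])
    return out


raw = {
    "_program_": "access",
    "_severity_": "6",
    "_priority_": "14",
    "_facility_": "1",
    "topic": "syslog-forwarder",
    "content": (
        "10.64.10.20|10/Jun/2019:11:32:16 +0800|m.zf.cn|"
        "GET /css/mip-base.css HTTP/1.1|200|0.077|6404|10.11.186.82:8001|200|0.060|"
        "https://yz.m.sm.cn/s?q=%25%24%23%40%21&from=wy878378"
        "&uc_param_str=dnntnwvepffrgibijbprsvdsei|-|"
        "Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36"
        "|-|-"
    ),
}

result = process(raw)
print(result["request_method"], result["request"], result["http_version"])
# GET /css/mip-base.css HTTP/1.1
print(result["http_referer"])
# https://yz.m.sm.cn/s?q=%$#@!&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
```

Computing all three parts from the saved original value keeps the sketch independent of statement order; in the SPL statement itself, the order of the parse-regexp calls matters because request is overwritten in place.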

Abnormal CSV format logs

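The following raw log is abnormal: the request value contains pipe characters (|), which are also the field delimiter. If the content field is split on every pipe, the request value is broken into several fields and all subsequent fields are shifted.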

  • Raw log

    __source__:  1.2.3.4
    __tag__:__client_ip__:  2.3.X.X
    __tag__:__receive_time__:  1562840879
    __topic__:
    content: 101.132.xx.xx|07/Aug/2019:11:10:37 +0800|www.123.com|GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1|200|6.729|14559|1.2.3.4:8001|200|6.716|-|-|Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||
  • Requirement

    Parse the content field.

  • Solution

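    Use the replace function to wrap the request value GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1 in double quotation marks, so that it becomes "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1". Then, use the parse-csv instruction with the quote parameter set to a double quotation mark (-quote='"'), so that the pipes inside the quoted value are not treated as delimiters. Finally, use project-away to delete the original content field.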

    * | extend content=replace(content,'GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1','"GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1"') | parse-csv -delim='|' -quote='"' content as remote_addr,time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status,upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid | project-away content
  • Returned log

    __source__:  1.2.3.4
    __tag__:__client_ip__:  2.3.X.X
    __tag__:__receive_time__:  1562840879
    __topic__:
    body_bytes_sent:  14559
    host:  www.123.com
    http_referer:  -
    http_user_agent:  Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))
    http_x_forwarded_for:  -
    remote_addr:  101.132.xx.xx
    request:  GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1
    request_time:  6.729
    status:  200
    time_local:  07/Aug/2019:11:10:37 +0800
    upstream_addr:  1.2.3.4:8001
    upstream_response_time:  6.716
    upstream_status:  200
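The following Python sketch illustrates why the quote step matters. It uses Python's csv module as a stand-in for parse-csv, which is an assumption for illustration and not the Simple Log Service implementation. Splitting on every pipe yields 19 values for the 15 column names; after the request value is wrapped in double quotes, a quote-aware parser returns exactly 15.

```python
import csv

# Column names copied from the parse-csv statement above.
FIELDS = [
    "remote_addr", "time_local", "host", "request", "status",
    "request_time", "body_bytes_sent", "upstream_addr", "upstream_status",
    "upstream_response_time", "http_referer", "http_x_forwarded_for",
    "http_user_agent", "session_id", "guid",
]

# Request value that contains the delimiter; hard-coded, as in the SPL statement.
REQUEST = "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1"

content = (
    "101.132.xx.xx|07/Aug/2019:11:10:37 +0800|www.123.com|"
    "GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1|200|6.729|14559|"
    "1.2.3.4:8001|200|6.716|-|-|"
    "Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D))||"
)

# Naive split: the four pipes inside the request create extra columns.
naive = content.split("|")
print(len(naive))  # 19, not 15

# Wrap the request in double quotes (the replace step), then parse with '"'
# as the quote character (analogous to -quote='"').
quoted = content.replace(REQUEST, f'"{REQUEST}"')
row = next(csv.reader([quoted], delimiter="|", quotechar='"'))
parsed = dict(zip(FIELDS, row))

print(len(row))  # 15
print(parsed["request"])  # GET /alyun/htsw/?ad=5|8|6|11| HTTP/1.1
```

Like the SPL statement, the sketch hard-codes the request string, so the replacement only repairs this particular log. In the sketch, session_id and guid come back as empty strings because the raw log ends with two empty fields.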