
Simple Log Service: Ingest processor scenarios

Last Updated: Jan 03, 2025

You can use ingest processors to process logs before the logs are written to a Logstore. For example, you can use ingest processors to modify fields, parse fields, filter data, and mask data. This topic describes how to configure ingest processors and the scenarios in which they are used.

Prerequisites

A project and a Standard Logstore are created, and log collection settings are configured. For more information, see Create a project, Create a Logstore, and Data collection overview.

Scenarios

To extract the request_method, request_uri, and status fields from a raw log, perform the following operations.

Raw log

body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: GET
request_time: 675
request_uri: /index.html
status: 200
time_local: 2024-12-04T13:47:54+08:00

Procedure

  1. Create an ingest processor.

    1. Log on to the Simple Log Service console.

    2. In the Projects section, click the project that you want to manage.

    3. In the left-side navigation pane, choose Resources > Data Processor.

    4. On the Ingest Processor tab, click Create. In the Create Processor panel, configure the Processor Name, SPL, and Error Handling parameters and click OK. The following list describes the parameters.

      • Processor Name: The name of the ingest processor. Example: nginx-logs-text.

      • Description: The description of the ingest processor.

      • SPL: The Simple Log Service Processing Language (SPL) statement. Example:

        * | project request_method, request_uri, status

        For more information, see SPL instructions.

      • Error Handling: The action to perform when an SPL-based data processing failure occurs. Valid values:

        • Retain Raw Data

        • Discard Raw Data

      Note
      • In this topic, SPL-based data processing failures refer to execution failures of SPL statements, for example, failures caused by invalid input data. Failures caused by invalid SPL syntax are not included.

      • If data fails to be processed because of invalid SPL syntax, the raw data is retained by default.

  2. Associate the ingest processor with a Logstore.

    1. In the left-side navigation pane of the project that you want to manage, click Log Storage, move the pointer over the Logstore that you want to manage, and then choose Modify.

    2. In the upper-right corner of the Logstore Attributes page, click Modify. In edit mode, select the ingest processor that you want to associate with the Logstore from the Ingest Processor drop-down list and click Save.

      Note

      An associated ingest processor takes effect only for incremental logs. Approximately 1 minute is required for the ingest processor to take effect.

  3. On the query and analysis page of the Logstore, click Search & Analyze to query the collected logs.
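
    Based on the raw log and the sample SPL statement in this topic, a newly collected log should contain only the projected fields:

    request_method: GET
    request_uri: /index.html
    status: 200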

Other scenarios

  • Modify fields

    You can use the following SPL instructions to manage fields: project, project-away, project-rename, and extend. A combined example that chains these instructions follows the scenarios below. Raw log:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Scenario: Retain specific fields

    Requirement: Retain only the request_method, request_uri, and status fields.

    SPL statement:

    * | project request_method, request_uri, status

    Result:

    request_method: PUT
    request_uri: /request/path-1/file-6?query=123456
    status: 200

    Requirement: Retain only the following fields and rename the result fields:

    • Rename the request_method field to method.

    • Rename the request_uri field to uri.

    • Retain the status field without renaming it.

    SPL statement:

    * | project method=request_method, uri=request_uri, status

    Result:

    method: PUT
    uri: /request/path-1/file-6?query=123456
    status: 200

    Requirement: Retain all fields whose names start with request_.

    SPL statement:

    * | project -wildcard "request_*"

    Result:

    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456

    Scenario: Delete specific fields

    Requirement: Delete the http_protocol, referer, remote_addr, and remote_user fields.

    SPL statement:

    * | project-away http_protocol, referer, remote_addr, remote_user

    Result:

    body_bytes_sent: 22646
    host: www.example.com
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Requirement: Delete all fields whose names start with request_.

    SPL statement:

    * | project-away -wildcard "request_*"

    Result:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Scenario: Create fields

    Requirement: Create a field named app and set the value of the field to test-app.

    SPL statement:

    * | extend app='test-app'

    Result:

    app: test-app
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Requirement: Create a field named request_query and extract its value from the request_uri field.

    SPL statement:

    * | extend request_query=url_extract_query(request_uri)

    Result:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_query: query=123456
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Scenario: Modify field names

    Requirement: Rename the time_local field to time.

    SPL statement:

    * | project-rename time=time_local

    Result:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time: 2024-12-04T13:47:54+08:00

    Scenario: Modify field values

    Requirement: Retain the path in the request_uri field and delete the query parameter.

    SPL statement:

    * | extend request_uri=url_extract_path(request_uri)

    Or:

    * | extend request_uri=regexp_replace(request_uri, '\?.*', '')

    Result:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6
    status: 200
    time_local: 2024-12-04T13:47:54+08:00
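
    These instructions can also be chained in a single SPL statement. The following sketch, based on the same raw log, deletes the referer field, renames the time_local field to time, and creates an app field:

    * | project-away referer | project-rename time=time_local | extend app='test-app'
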
  • Parse fields

    You can use the following SPL instructions to parse and extract fields: parse-regexp, parse-json, and parse-csv. You can also use SQL processing functions such as regular expression functions and JSON functions. A combined example follows the scenarios below.

    Scenario: Data parsing in regex mode

    Raw data:

    content: 192.168.1.75 - David [2024-07-31T14:27:24+08:00] "PUT /request/path-0/file-8 HTTP/1.1" 819 21577 403 73895 www.example.com www.example.com "Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"

    Requirement: Extract fields from an NGINX access log by using a regular expression and discard the content field in the raw data.

    SPL statement:

    *
    | parse-regexp content, '(\S+)\s-\s(\S+)\s\[(\S+)\]\s"(\S+)\s(\S+)\s(\S+)"\s(\d+)\s(\d+)\s(\d+)\s(\d+)\s(\S+)\s(\S+)\s"(.*)"' as remote_addr, remote_user, time_local, request_method, request_uri, http_protocol, request_time, request_length, status, body_bytes_sent, host, referer, user_agent
    | project-away content

    Result:

    body_bytes_sent: 73895
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.1.75
    remote_user: David
    request_length: 21577
    request_method: PUT
    request_time: 819
    request_uri: /request/path-0/file-8
    status: 403
    time_local: 2024-07-31T14:27:24+08:00
    user_agent: Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1

    Raw data:

    request_method: PUT
    request_uri: /request/path-0/file-8
    status: 200

    Requirement: Extract the string file-8 from the request_uri field and store it in a new field named file.

    SPL statement:

    * | extend file=regexp_extract(request_uri, 'file-.*')

    Result:

    file: file-8
    request_method: PUT
    request_uri: /request/path-0/file-8
    status: 200

    Scenario: Data parsing in JSON mode

    Raw data:

    headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}

    Requirement: Parse the headers field in JSON mode and discard the headers field in the raw data.

    SPL statement:

    *
    | parse-json headers
    | project-away headers

    Result:

    Authorization: bearer xxxxx
    X-Request-ID: 29bbe977-9a62-4e4a-b2f4-5cf7b65d508f

    Requirement: Extract specific fields from the headers field. For example, extract the Authorization field and rename it to token.

    SPL statement:

    * | extend token=json_extract_scalar(headers, 'Authorization')

    Result:

    headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}
    token: bearer xxxxx

    Raw data:

    request: {"body": {"user_id": 12345, "user_name": "Alice"}}

    Requirement: Parse the body field in the request field in JSON mode.

    SPL statement:

    * | parse-json -path='$.body' request

    Result:

    request: {"body": {"user_id": 12345, "user_name": "Alice"}}
    user_id: 12345
    user_name: Alice

    Scenario: Data parsing in delimiter mode

    Raw data:

    content: 192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",www.example.com

    Requirement: Split the content field by using commas (,) and discard the content field in the raw data.

    SPL statement:

    *
    | parse-csv -quote='"' content as ip, time, host
    | project-away content

    Result:

    host: www.example.com
    ip: 192.168.0.100
    time: 10/Jun/2019:11:32:16,127 +0800

    Raw data:

    content: 192.168.0.100||10/Jun/2019:11:32:16,127 +0800||www.example.com

    Requirement: Split the content field by using the delimiter || and discard the content field in the raw data.

    SPL statement:

    *
    | parse-csv -delim='||' content as ip, time, host
    | project-away content

    Result:

    host: www.example.com
    ip: 192.168.0.100
    time: 10/Jun/2019:11:32:16,127 +0800
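
    The parse instructions can be combined with other instructions in the same way. The following sketch, based on the preceding delimiter-separated raw data, parses the content field and then retains only the ip field:

    *
    | parse-csv -delim='||' content as ip, time, host
    | project ip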

  • Filter data

    You can use the where instruction to filter data.

    Note that during SPL-based data processing, all field values in the raw data are treated as strings by default. Before you process numeric values, you must use data type conversion functions to convert the values of the required fields. For more information, see Data type conversion functions.
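
    For example, the following sketch retains only logs whose request_time value is greater than 500. It assumes that, as in the earlier NGINX logs, the request_time field holds an integer:

    * | where cast(request_time as bigint) > 500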

    Raw data:

    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200

    ---

    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500

    Requirement: Retain only logs whose status field value is 200.

    SPL statement:

    * | where status='200'

    Or:

    * | where cast(status as bigint)=200

    Result:

    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200
    Raw data:

    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200

    ---

    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500
    error: something wrong

    Requirement: Retain only logs that do not contain the error field.

    SPL statement:

    * | where error is null

    Result:

    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200

    Requirement: Retain only logs that contain the error field.

    SPL statement:

    * | where error is not null

    Result:

    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500
    error: something wrong
    Raw data:

    method: POST
    request_uri: /app/login

    ---

    method: GET
    request_uri: /user/1/profile
    status: 404

    ---

    method: GET
    request_uri: /user/2/profile
    status: 200

    Requirement: Retain only logs whose request_uri field value starts with /user/.

    SPL statement:

    * | where regexp_like(request_uri, '^\/user\/')

    Result:

    method: GET
    request_uri: /user/1/profile
    status: 404

    ---

    method: GET
    request_uri: /user/2/profile
    status: 200

    Requirement: Retain only logs whose request_uri field value starts with /user/ and whose status field value is 200.

    SPL statement:

    * | where regexp_like(request_uri, '^\/user\/') and status='200'

    Result:

    method: GET
    request_uri: /user/2/profile
    status: 200

  • Mask data

    You can use the extend instruction and SQL functions such as regular expression functions, string functions, and URL functions to mask data.

    When you use the regexp_replace function to replace field values, you can use capturing groups. You can use \1, \2, and \N to represent the values of the first, second, and Nth capturing groups.

    For example, the result of the regexp_replace('192.168.1.1', '(\d+)\.(\d+)\.\d+\.\d+', '\1.\2.*.*') function is 192.168.*.*.
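
    In an ingest processor, the same pattern can be applied with the extend instruction. The following sketch assumes the remote_addr field from the earlier NGINX raw log:

    * | extend remote_addr=regexp_replace(remote_addr, '(\d+)\.(\d+)\.\d+\.\d+', '\1.\2.*.*')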

    Raw data:

    request_uri: /api/v1/resources?user=123&ticket=abc
    status: 200

    Requirement: Remove sensitive information from the request_uri field.

    SPL statement:

    * | extend request_uri=url_extract_path(request_uri)

    Or:

    * | extend request_uri=regexp_replace(request_uri, '\?.*', '')

    Result:

    request_uri: /api/v1/resources
    status: 200
    Raw data:

    client_ip: 192.168.1.123
    latency: 100

    Requirement: Mask the middle two octets of an IP address with asterisks (*).

    SPL statement:

    * | extend client_ip=regexp_replace(client_ip, '(\d+)\.\d+\.\d+\.(\d+)', '\1.*.*.\2')

    Result:

    client_ip: 192.*.*.123
    latency: 100
    Raw data:

    sql: SELECT id, name, config FROM app_info WHERE name="test-app"
    result_size: 1024

    Requirement: The sql field contains sensitive information. Retain only the operation type and the table name.

    SPL statement:

    *
    | extend table=regexp_extract(sql, '\bFROM\s+([^\s;]+)|\bINTO\s+([^\s;]+)|\bUPDATE\s+([^\s;]+)', 1)
    | extend action=regexp_extract(sql, '\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b', 1)
    | project-away sql

    Result:

    action: SELECT
    table: app_info
    result_size: 1024