
Simple Log Service:Ingest processor use cases

Last Updated: Dec 07, 2025

Use ingest processors to process logs before they are written to a logstore, for example, to modify fields, parse fields, filter data, and mask data. This topic describes how to configure an ingest processor and walks through common use cases.

Prerequisites

A project and a Standard logstore are created, and log collection settings are configured. For more information, see Create a project, Create a logstore, and Data collection overview.

Use cases

To extract the request_method, request_uri, and status fields from a raw log, follow these steps.

Raw log

body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: GET
request_time: 675
request_uri: /index.html
status: 200
time_local: 2024-12-04T13:47:54+08:00

Procedure

  1. Create an ingest processor.

    1. Log on to the Simple Log Service console.

    2. In the Projects section, click the project you want.

    3. In the navigation pane on the left, choose Resources > Data Processor.

    4. On the Ingest Processor tab, click Create. Configure parameters and click Save. The following table describes the parameters.

      Parameter

      Description

      Processor Name

      Enter a name for the processor. Example: nginx-logs-text.

      SPL

      The Simple Log Service Processing Language (SPL) statement. Example:

      * | project request_method, request_uri, status

      For more information, see SPL instructions.

      Error Handling

      The action that is performed when an SPL-based data processing failure occurs. Valid values:

      • Retain Raw Data

      • Discard Raw Data

      Note
      • In this topic, SPL-based data processing failures refer to execution failures of SPL statements, for example, failures caused by invalid input data. Failures caused by invalid SPL syntax are not covered here.

      • If processing fails because the SPL syntax is invalid, the raw data is retained by default.

  2. Associate the ingest processor with a logstore.

    1. In the navigation pane on the left of the project that you want to manage, click Log Storage, move the pointer over the logstore that you want to manage, and then choose Modify.

    2. In the upper-right corner of the Logstore Attributes page, choose Ingest Processor, click Modify, and select the ingest processor that you want to associate with the logstore from the drop-down list. Then click Save.

      Note

      An associated ingest processor takes effect only for incremental logs. Approximately 1 minute is required for the ingest processor to take effect.
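The SPL statement in this example, * | project request_method, request_uri, status, keeps only three fields of each log entry. As a rough illustration (not the service's actual implementation), the same field projection can be sketched in Python:

```python
# Hypothetical sketch of SPL's `project` instruction applied to one log
# entry; field names and values are taken from the raw log above.
def project(entry, keep):
    """Keep only the fields listed in `keep`."""
    return {k: v for k, v in entry.items() if k in keep}

raw = {
    "body_bytes_sent": "22646",
    "host": "www.example.com",
    "request_method": "GET",
    "request_uri": "/index.html",
    "status": "200",
}

print(project(raw, {"request_method", "request_uri", "status"}))
# {'request_method': 'GET', 'request_uri': '/index.html', 'status': '200'}
```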

Other use cases

  • Modify fields

    Use SPL instructions such as project, project-away, project-rename, and extend to add, delete, and modify data fields. For example, consider the following raw log:

    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Use case

    Requirement description

    SPL statement

    Result

    Retain specific fields

    Retain only the following fields:

    • request_method

    • request_uri

    • status

    * | project request_method, request_uri, status
    request_method: PUT
    request_uri: /request/path-1/file-6?query=123456
    status: 200

    Retain only the following fields and rename the result fields:

    • Rename request_method to method.

    • Rename request_uri to uri.

    • Keep status unchanged.

    * | project method=request_method, uri=request_uri, status
    method: PUT
    uri: /request/path-1/file-6?query=123456
    status: 200

    Retain all fields that start with request_.

    * | project -wildcard "request_*"
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456

    Delete specific fields

    Delete the following fields:

    • http_protocol

    • referer

    • remote_addr

    • remote_user

    * | project-away http_protocol, referer, remote_addr, remote_user
    body_bytes_sent: 22646
    host: www.example.com
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Delete all fields that start with request_.

    * | project-away -wildcard "request_*"
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Create fields

    Add a field named app and set its value to test-app.

    * | extend app='test-app'
    app: test-app
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Add a field named request_query and extract its value from the request_uri field.

    * | extend request_query=url_extract_query(request_uri)
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_query: query=123456
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time_local: 2024-12-04T13:47:54+08:00

    Modify field names

    Rename the time_local field to time.

    * | project-rename time=time_local
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6?query=123456
    status: 200
    time: 2024-12-04T13:47:54+08:00

    Modify field values

    Process the request_uri field to retain only the path and discard the parameters.

    * | extend request_uri=url_extract_path(request_uri)

    Or

    * | extend request_uri=regexp_replace(request_uri, '\?.*', '')
    body_bytes_sent: 22646
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.31.1
    remote_user: Elisa
    request_length: 42450
    request_method: PUT
    request_time: 675
    request_uri: /request/path-1/file-6
    status: 200
    time_local: 2024-12-04T13:47:54+08:00
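    The extend examples above rely on URL helper functions. A minimal Python approximation of url_extract_query and url_extract_path (an illustration of the behavior, not the SPL implementation):

    ```python
    from urllib.parse import urlparse

    def url_extract_query(uri):
        # Approximates SPL's url_extract_query: the query string of a URI.
        return urlparse(uri).query

    def url_extract_path(uri):
        # Approximates SPL's url_extract_path: the path portion of a URI.
        return urlparse(uri).path

    uri = "/request/path-1/file-6?query=123456"
    print(url_extract_query(uri))  # query=123456
    print(url_extract_path(uri))   # /request/path-1/file-6
    ```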
  • Parse fields

    Use SPL instructions such as parse-regexp, parse-json, and parse-csv, together with SQL regular expression and JSON functions, to parse and extract data fields.

    Use case

    Raw data

    Requirement description

    SPL statement

    Result

    Data parsing in regex mode

    content: 192.168.1.75 - David [2024-07-31T14:27:24+08:00] "PUT /request/path-0/file-8 HTTP/1.1" 819 21577 403 73895 www.example.com www.example.com "Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"

    Use a regular expression to extract fields from an NGINX access log and discard the original content field.

    * 
    | parse-regexp content, '(\S+)\s-\s(\S+)\s\[(\S+)\]\s"(\S+)\s(\S+)\s(\S+)"\s(\d+)\s(\d+)\s(\d+)\s(\d+)\s(\S+)\s(\S+)\s"(.*)"' as remote_addr, remote_user, time_local, request_method, request_uri, http_protocol, request_time, request_length, status, body_bytes_sent, host, referer, user_agent
    | project-away content
    body_bytes_sent: 73895
    host: www.example.com
    http_protocol: HTTP/1.1
    referer: www.example.com
    remote_addr: 192.168.1.75
    remote_user: David
    request_length: 21577
    request_method: PUT
    request_time: 819
    request_uri: /request/path-0/file-8
    status: 403
    time_local: 2024-07-31T14:27:24+08:00
    user_agent: Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1
    request_method: PUT
    request_uri: /request/path-0/file-8
    status: 200

    Extract the file-8 part from the request_uri field and name the new field file.

    * | extend file=regexp_extract(request_uri, 'file-.*')

    file: file-8
    request_method: PUT
    request_uri: /request/path-0/file-8
    status: 200

    Data parsing in JSON mode

    headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}

    Parse the headers field as JSON and discard the original headers field.

    *
    | parse-json headers
    | project-away headers

    Authorization: bearer xxxxx
    X-Request-ID: 29bbe977-9a62-4e4a-b2f4-5cf7b65d508f

    Extract specific fields from the headers field. For example, extract the Authorization field and name the new field token.

    * | extend token=json_extract_scalar(headers, 'Authorization')

    headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}
    token: bearer xxxxx

    request: {"body": {"user_id": 12345, "user_name": "Alice"}}

    Parse the body subfield within the request field as JSON.

    * | parse-json -path='$.body' request
    request: {"body": {"user_id": 12345, "user_name": "Alice"}}
    user_id: 12345
    user_name: Alice

    Data parsing in delimiter mode

    content: 192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",www.example.com

    Split the content field on commas, with double quotation marks (") as the quote character, and discard the original content field.

    *
    | parse-csv -quote='"' content as ip, time, host
    | project-away content
    host: www.example.com
    ip: 192.168.0.100
    time: 10/Jun/2019:11:32:16,127 +0800
    content: 192.168.0.100||10/Jun/2019:11:32:16,127 +0800||www.example.com

    Use || as the separator to split the field and discard the original content field.

    *
    | parse-csv -delim='||' content as ip, time, host
    | project-away content
    host: www.example.com
    ip: 192.168.0.100
    time: 10/Jun/2019:11:32:16,127 +0800
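    The three parsing modes above have close Python analogues. The following sketch (illustrative only) mirrors a shortened version of the regex parse, the JSON expansion, and the quoted CSV split:

    ```python
    import csv
    import io
    import json
    import re

    # Regex mode: a shortened version of the NGINX pattern above.
    line = '192.168.1.75 - David [2024-07-31T14:27:24+08:00] "PUT /request/path-0/file-8 HTTP/1.1"'
    m = re.match(r'(\S+)\s-\s(\S+)\s\[(\S+)\]\s"(\S+)\s(\S+)\s(\S+)"', line)
    remote_addr, remote_user, time_local, method, uri, proto = m.groups()

    # JSON mode: each key of the object becomes its own field.
    headers = json.loads('{"Authorization": "bearer xxxxx"}')

    # Delimiter mode: commas with double quotation marks as the quote
    # character, so the comma inside the timestamp is not a separator.
    raw = '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",www.example.com'
    ip, time, host = next(csv.reader(io.StringIO(raw)))
    print(ip, host)  # 192.168.0.100 www.example.com
    ```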

  • Filter data

    Use the where instruction to filter data.

    Note that during SPL-based data processing, all field values in the raw data are treated as strings by default. Before you compare or compute numeric values, use data type conversion functions, such as cast, to convert the values of the required fields. For more information, see Data type conversion functions.

    Raw data

    Requirement description

    SPL statement

    Result

    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200
    
    ---
    
    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500

    Retain only data where the status is 200.

    * | where status='200'

    Or

    * | where cast(status as bigint)=200
    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200
    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200
    
    ---
    
    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500
    error: something wrong

    Retain only data that does not contain the error field.

    * | where error is null
    request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
    status: 200

    Retain only data that contains the error field.

    * | where error is not null
    request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
    status: 500
    error: something wrong
    method: POST
    request_uri: /app/login
    
    ---
    
    method: GET
    request_uri: /user/1/profile
    status: 404
    
    ---
    
    method: GET
    request_uri: /user/2/profile
    status: 200

    Retain only data where the request_uri starts with /user/.

    * | where regexp_like(request_uri, '^\/user\/')
    method: GET
    request_uri: /user/1/profile
    status: 404
    
    ---
    
    method: GET
    request_uri: /user/2/profile
    status: 200

    Retain data where the request_uri starts with /user/ and the status is 200.

    * | where regexp_like(request_uri, '^\/user\/') and status='200'
    method: GET
    request_uri: /user/2/profile
    status: 200
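    The where examples can be pictured as list filtering. A Python sketch of the last case, combining a regex check on request_uri with a numeric status comparison (the int cast mirrors cast(status as bigint)):

    ```python
    import re

    logs = [
        {"request_uri": "/app/login", "status": "200"},
        {"request_uri": "/user/1/profile", "status": "404"},
        {"request_uri": "/user/2/profile", "status": "200"},
    ]

    # * | where regexp_like(request_uri, '^\/user\/') and status='200'
    # Field values arrive as strings, so the status is cast before comparing.
    kept = [e for e in logs
            if re.search(r"^/user/", e["request_uri"]) and int(e["status"]) == 200]
    print(kept)  # [{'request_uri': '/user/2/profile', 'status': '200'}]
    ```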
  • Mask data

    • Using the mask function

      The mask function supports built-in rules and keyword matching to accurately mask structured and unstructured data.

      Raw data

      Requirement description

      SPL statement

      Result

      client_ip: 192.168.1.123
      latency: 100

      Mask the IP address.

      * | extend client_ip = mask(client_ip, '[
            {"mode":"buildin","types":["IP_ADDRESS"],"maskChar":"*","keepPrefix":3,"keepSuffix":3}
          ]')
      client_ip: 192*****123
      latency: 100
      2025-08-20 18:04:40,998 INFO  blockchain-event-poller-3 [10.0.1.20] [com.service.listener.TransactionStatusListener:65] [TransactionStatusListener#handleSuccessfulTransaction]{"message":"On-chain transaction successfully confirmed","confirmationDetails":{"transactionHash":"0x2baf892e9a164b1979","status":"success","blockNumber":45101239,"gasUsed":189543,"effectiveGasPrice":"58.2 Gwei","userProfileSnapshot":{"wallet":"0x71C7656EC7a5f6d8A7C4","sourceIp":"203.0.113.55","phone":"19901012345","address":"123 Main St, Anytown, USA","birthday":null}}}

      Mask sensitive fields in the log, such as wallet address, address information, source IP, phone number, and transaction hash.

      * | extend content = mask(content, '[
            {"mode":"keyword","keys":["wallet","address","sourceIp","phone","transactionHash"],"maskChar":"*","keepPrefix":3,"keepSuffix":3}
          ]')
      2025-08-20 18:04:40,998 INFO  blockchain-event-poller-3 [10.0.1.20] [com.service.listener.TransactionStatusListener:65] [TransactionStatusListener#handleSuccessfulTransaction]{"message":"On-chain transaction successfully confirmed","confirmationDetails":{"transactionHash":"0x2**************979","status":"success","blockNumber":45101239,"gasUsed":189543,"effectiveGasPrice":"58.2 Gwei","userProfileSnapshot":{"wallet":"0x7****************7C4","sourceIp":"203******.55","phone":"199*****345","address":"123*******************USA","birthday":null}}}
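      Judging from the sample output above, keepPrefix and keepSuffix keep the first and last N characters and replace each character in between with maskChar. A Python sketch of that inferred behavior (an assumption based on the samples, not the documented algorithm):

      ```python
      def mask_keep(value, mask_char="*", keep_prefix=3, keep_suffix=3):
          # Keep the first and last N characters; replace each character in
          # between with mask_char (behavior inferred from the samples above).
          middle = len(value) - keep_prefix - keep_suffix
          if middle <= 0:
              return value
          return value[:keep_prefix] + mask_char * middle + value[-keep_suffix:]

      print(mask_keep("19901012345"))  # 199*****345
      ```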

    • Using regular expressions

      Use the SPL extend instruction together with SQL functions for regular expressions, strings, and URLs to mask data.

      When you use the regexp_replace function to replace field values based on a regular expression, use capturing groups. When you specify the replacement value, use \1, \2, and so on, to represent the values of the first, second, and subsequent capturing groups, respectively.

      For example, the result of regexp_replace('192.168.1.1', '(\d+)\.(\d+)\.\d+\.\d+', '\1.\2.*.*') is 192.168.*.*.
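      For comparison, Python's re.sub uses the same backreference syntax in the replacement string:

      ```python
      import re

      # Python equivalent of the regexp_replace example above: capturing
      # groups in the pattern, \1 and \2 backreferences in the replacement.
      masked = re.sub(r"(\d+)\.(\d+)\.\d+\.\d+", r"\1.\2.*.*", "192.168.1.1")
      print(masked)  # 192.168.*.*
      ```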

      Raw data

      Requirement description

      SPL statement

      Result

      request_uri: /api/v1/resources?user=123&ticket=abc
      status: 200

      The query string of the request_uri field contains sensitive parameters. Remove the query string and retain only the path.

      * | extend request_uri=url_extract_path(request_uri)

      Or

      * | extend request_uri=regexp_replace(request_uri, '\?.*', '')
      request_uri: /api/v1/resources
      status: 200
      client_ip: 192.168.1.123
      latency: 100

      Mask the middle two octets of an IP address with asterisks (*).

      * | extend client_ip=regexp_replace(client_ip, '(\d+)\.\d+\.\d+\.(\d+)', '\1.*.*.\2')
      client_ip: 192.*.*.123
      latency: 100
      sql: SELECT id, name, config FROM app_info WHERE name="test-app"
      result_size: 1024

      The sql field may contain sensitive information. Therefore, you need to retain only the operation and the corresponding table name.

      *
      | extend table=regexp_extract(sql, '\bFROM\s+([^\s;]+)|\bINTO\s+([^\s;]+)|\bUPDATE\s+([^\s;]+)', 1)
      | extend action=regexp_extract(sql,'\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b', 1)
      | project-away sql
      action: SELECT
      table: app_info
      result_size: 1024
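      The last example's two regexp_extract calls translate directly to Python's re.search. This sketch (illustrative, not the service implementation) extracts the action and the table name from the same statement:

      ```python
      import re

      sql = 'SELECT id, name, config FROM app_info WHERE name="test-app"'

      # Table name: the first non-empty capturing group after FROM, INTO,
      # or UPDATE, matching the alternation used in the SPL statement.
      groups = re.search(
          r"\bFROM\s+([^\s;]+)|\bINTO\s+([^\s;]+)|\bUPDATE\s+([^\s;]+)", sql
      ).groups()
      table = next(g for g in groups if g)

      # Action: the leading SQL verb.
      action = re.search(
          r"\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b", sql
      ).group(1)
      print(action, table)  # SELECT app_info
      ```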