Use ingest processors to process logs before they are written to a logstore, for example, to modify fields, parse fields, filter data, and mask data. This topic describes how to configure an ingest processor and the common use cases for ingest processors.
Prerequisites
A project and a Standard logstore are created, and log collection settings are configured. For more information, see Create a project, Create a logstore, and Data collection overview.
Use cases
To extract the request_method, request_uri, and status fields from a raw log, follow these steps.
Raw log
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: GET
request_time: 675
request_uri: /index.html
status: 200
time_local: 2024-12-04T13:47:54+08:00
Procedure
Create an ingest processor.
Log on to the Simple Log Service console.
In the Projects section, click the project you want.
In the navigation pane on the left, choose .
On the Ingest Processor tab, click Create, configure the following parameters, and then click Save.

Processor Name
Enter a name for the processor. Example: nginx-logs-text.

SPL
The Simple Log Service Processing Language (SPL) statement. Example:
* | project request_method, request_uri, status
For more information, see SPL instructions.

Error Handling
The action that is performed when an SPL-based data processing failure occurs. Valid values:
Retain Raw Data
Discard Raw Data
Note: In this topic, SPL-based data processing failures refer to execution failures of SPL statements, for example, failures caused by invalid input data. Failures caused by invalid SPL syntax are not covered: if data fails to be parsed because the SPL syntax is invalid, the raw data is retained by default.
Associate the ingest processor with a logstore.
In the navigation pane on the left of the project that you want to manage, click Log Storage, move the pointer over the logstore that you want to manage, and then choose .
In the upper-right corner of the Logstore Attributes page, choose Ingest Processor, click Modify, and select the ingest processor that you want to associate with the logstore from the drop-down list. Then click Save.
Note: An associated ingest processor takes effect only for incremental logs. Approximately 1 minute is required for the ingest processor to take effect.
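To make the effect of the example configuration concrete, the following minimal Python sketch mimics what the example SPL statement (* | project request_method, request_uri, status) does to the sample raw log. It is an illustration only; the actual processing runs inside Simple Log Service as logs are ingested, and the dict-based representation of a log is an assumption of the sketch.

```python
# Minimal sketch: emulate "* | project request_method, request_uri, status"
# on one log represented as a Python dict. Illustrative only; the real
# processing runs inside Simple Log Service when logs are written.
raw_log = {
    "body_bytes_sent": "22646",
    "host": "www.example.com",
    "request_method": "GET",
    "request_uri": "/index.html",
    "status": "200",
    "time_local": "2024-12-04T13:47:54+08:00",
}

kept_fields = ("request_method", "request_uri", "status")
processed = {k: raw_log[k] for k in kept_fields if k in raw_log}
print(processed)
# {'request_method': 'GET', 'request_uri': '/index.html', 'status': '200'}
```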
Other use cases
Modify fields
Use SPL instructions such as project, project-away, project-rename, and extend to add, delete, and modify data fields. The examples in this section use the following raw log (see also the Python sketch after the examples):

body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6?query=123456
status: 200
time_local: 2024-12-04T13:47:54+08:00

Retain specific fields
Requirement description: Retain only the request_method, request_uri, and status fields.
SPL statement: * | project request_method, request_uri, status
Result:
request_method: PUT
request_uri: /request/path-1/file-6?query=123456
status: 200

Requirement description: Retain only the request_method, request_uri, and status fields. Rename request_method to method and request_uri to uri.
SPL statement: * | project method=request_method, uri=request_uri, status
Result:
method: PUT
uri: /request/path-1/file-6?query=123456
status: 200

Requirement description: Retain all fields that start with request_.
SPL statement: * | project -wildcard "request_*"
Result:
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6?query=123456

Delete specific fields
Requirement description: Delete the http_protocol, referer, remote_addr, and remote_user fields.
SPL statement: * | project-away http_protocol, referer, remote_addr, remote_user
Result:
body_bytes_sent: 22646
host: www.example.com
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6?query=123456
status: 200
time_local: 2024-12-04T13:47:54+08:00

Requirement description: Delete all fields that start with request_.
SPL statement: * | project-away -wildcard "request_*"
Result:
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
status: 200
time_local: 2024-12-04T13:47:54+08:00

Create fields
Requirement description: Add a field named app and set its value to test-app.
SPL statement: * | extend app='test-app'
Result:
app: test-app
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6?query=123456
status: 200
time_local: 2024-12-04T13:47:54+08:00

Requirement description: Add a field named request_query and extract its value from the request_uri field.
SPL statement: * | extend request_query=url_extract_query(request_uri)
Result:
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: PUT
request_query: query=123456
request_time: 675
request_uri: /request/path-1/file-6?query=123456
status: 200
time_local: 2024-12-04T13:47:54+08:00

Modify field names
Requirement description: Rename the time_local field to time.
SPL statement: * | project-rename time=time_local
Result:
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6?query=123456
status: 200
time: 2024-12-04T13:47:54+08:00

Modify field values
Requirement description: Process the request_uri field to retain only the path and discard the query parameters.
SPL statement: * | extend request_uri=url_extract_path(request_uri)
Or: * | extend request_uri=regexp_replace(request_uri, '\?.*', '')
Result:
body_bytes_sent: 22646
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.31.1
remote_user: Elisa
request_length: 42450
request_method: PUT
request_time: 675
request_uri: /request/path-1/file-6
status: 200
time_local: 2024-12-04T13:47:54+08:00
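The SPL instructions above map onto familiar dictionary operations. The following Python sketch mirrors a few of them (project-away, project-rename, and extend with the URL functions) on a dict-shaped log. It is illustrative only and uses the standard urllib.parse module rather than SPL.

```python
from urllib.parse import urlparse

# Illustrative sketch only: approximate project-away, project-rename, and
# extend (url_extract_query / url_extract_path) on a dict-shaped log.
log = {
    "remote_user": "Elisa",
    "request_method": "PUT",
    "request_uri": "/request/path-1/file-6?query=123456",
    "time_local": "2024-12-04T13:47:54+08:00",
}

log.pop("remote_user", None)              # project-away remote_user
log["time"] = log.pop("time_local")       # project-rename time=time_local

parsed = urlparse(log["request_uri"])
log["request_query"] = parsed.query       # extend request_query=url_extract_query(request_uri)
log["request_uri"] = parsed.path          # extend request_uri=url_extract_path(request_uri)

print(log)
# {'request_method': 'PUT', 'request_uri': '/request/path-1/file-6',
#  'time': '2024-12-04T13:47:54+08:00', 'request_query': 'query=123456'}
```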
Parse fields
Use SPL instructions, such as parse-regexp, parse-json, and parse-csv, together with SQL functions for regular expressions and JSON, to parse and extract data fields.

Data parsing in regex mode
Raw data:
content: 192.168.1.75 - David [2024-07-31T14:27:24+08:00] "PUT /request/path-0/file-8 HTTP/1.1" 819 21577 403 73895 www.example.com www.example.com "Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"
Requirement description: Use a regular expression to extract fields from an NGINX access log and discard the original content field.
SPL statement: * | parse-regexp content, '(\S+)\s-\s(\S+)\s\[(\S+)\]\s"(\S+)\s(\S+)\s(\S+)"\s(\d+)\s(\d+)\s(\d+)\s(\d+)\s(\S+)\s(\S+)\s"(.*)"' as remote_addr, remote_user, time_local, request_method, request_uri, http_protocol, request_time, request_length, status, body_bytes_sent, host, referer, user_agent | project-away content
Result:
body_bytes_sent: 73895
host: www.example.com
http_protocol: HTTP/1.1
referer: www.example.com
remote_addr: 192.168.1.75
remote_user: David
request_length: 21577
request_method: PUT
request_time: 819
request_uri: /request/path-0/file-8
status: 403
time_local: 2024-07-31T14:27:24+08:00
user_agent: Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1

Raw data:
request_method: PUT
request_uri: /request/path-0/file-8
status: 200
Requirement description: Extract the file-8 part from the request_uri field and name the new field file.
SPL statement: * | extend file=regexp_extract(request_uri, 'file-.*')
Result:
file: file-8
request_method: PUT
request_uri: /request/path-0/file-8
status: 200

Data parsing in JSON mode
Raw data:
headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}
Requirement description: Parse the headers field as JSON and discard the original headers field.
SPL statement: * | parse-json headers | project-away headers
Result:
Authorization: bearer xxxxx
X-Request-ID: 29bbe977-9a62-4e4a-b2f4-5cf7b65d508f

Requirement description: Extract specific fields from the headers field. For example, extract the Authorization field and name the new field token.
SPL statement: * | extend token=json_extract_scalar(headers, 'Authorization')
Result:
headers: {"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}
token: bearer xxxxx

Raw data:
request: {"body": {"user_id": 12345, "user_name": "Alice"}}
Requirement description: Parse the body subfield within the request field as JSON.
SPL statement: * | parse-json -path='$.body' request
Result:
request: {"body": {"user_id": 12345, "user_name": "Alice"}}
user_id: 12345
user_name: Alice

Data parsing in delimiter mode
Raw data:
content: 192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",www.example.com
Requirement description: Split the field by commas and discard the original content field.
SPL statement: * | parse-csv -quote='"' content as ip, time, host | project-away content
Result:
host: www.example.com
ip: 192.168.0.100
time: 10/Jun/2019:11:32:16,127 +0800

Raw data:
content: 192.168.0.100||10/Jun/2019:11:32:16,127 +0800||www.example.com
Requirement description: Use || as the separator to split the field and discard the original content field.
SPL statement: * | parse-csv -delim='||' content as ip, time, host | project-away content
Result:
host: www.example.com
ip: 192.168.0.100
time: 10/Jun/2019:11:32:16,127 +0800
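For readers who want to reason about the parsing modes outside SPL, the following Python sketch shows roughly equivalent logic with the standard re, json, and csv modules. It is an approximation for illustration, not the implementation used by parse-regexp, parse-json, or parse-csv, and it only extracts a subset of the fields.

```python
import csv
import io
import json
import re

# Regex mode: extract a few fields from a simplified NGINX-style line.
content = '192.168.1.75 - David [2024-07-31T14:27:24+08:00] "PUT /request/path-0/file-8 HTTP/1.1"'
match = re.match(r'(\S+)\s-\s(\S+)\s\[(\S+)\]\s"(\S+)\s(\S+)\s(\S+)"', content)
remote_addr, remote_user, time_local, request_method, request_uri, http_protocol = match.groups()

# JSON mode: expand a JSON-valued field into individual fields.
headers = '{"Authorization": "bearer xxxxx", "X-Request-ID": "29bbe977-9a62-4e4a-b2f4-5cf7b65d508f"}'
header_fields = json.loads(headers)

# Delimiter mode: split a quoted, comma-separated value.
line = '192.168.0.100,"10/Jun/2019:11:32:16,127 +0800",www.example.com'
ip, time_field, host = next(csv.reader(io.StringIO(line), quotechar='"'))

print(request_uri, header_fields["X-Request-ID"], host)
# /request/path-0/file-8 29bbe977-9a62-4e4a-b2f4-5cf7b65d508f www.example.com
```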
Filter data
Use the where instruction to filter data.
Note: During SPL-based data processing, all fields in the raw data are treated as text by default. Before you process numeric values, you must use data type conversion functions to convert the values of the required fields. For more information, see Data type conversion functions.

Raw data:
request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
status: 200
---
request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
status: 500
Requirement description: Retain only data whose status is 200.
SPL statement: * | where status='200'
Or: * | where cast(status as bigint)=200
Result:
request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
status: 200

Raw data:
request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
status: 200
---
request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
status: 500
error: something wrong
Requirement description: Retain only data that does not contain the error field.
SPL statement: * | where error is null
Result:
request_id: ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919
status: 200

Requirement description: Retain only data that contains the error field.
SPL statement: * | where error is not null
Result:
request_id: 7f9dad20-bc57-4aa7-af0e-436621f1f51d
status: 500
error: something wrong

Raw data:
method: POST
request_uri: /app/login
---
method: GET
request_uri: /user/1/profile
status: 404
---
method: GET
request_uri: /user/2/profile
status: 200
Requirement description: Retain only data whose request_uri starts with /user/.
SPL statement: * | where regexp_like(request_uri, '^\/user\/')
Result:
method: GET
request_uri: /user/1/profile
status: 404
---
method: GET
request_uri: /user/2/profile
status: 200

Requirement description: Retain only data whose request_uri starts with /user/ and whose status is 200.
SPL statement: * | where regexp_like(request_uri, '^\/user\/') and status='200'
Result:
method: GET
request_uri: /user/2/profile
status: 200
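As a rough mental model of the where instruction, the Python sketch below filters a list of dict-shaped logs. The string comparison mirrors the fact that field values are text by default, while the int() conversion plays the role of cast(status as bigint). The sketch is illustrative only.

```python
# Illustrative sketch only: approximate the where-based filters on
# dict-shaped logs. Field values are strings, as in SPL processing.
logs = [
    {"request_id": "ddbde824-7c3e-4ff1-a6d1-c3a53fd4a919", "status": "200"},
    {"request_id": "7f9dad20-bc57-4aa7-af0e-436621f1f51d", "status": "500",
     "error": "something wrong"},
]

status_200 = [log for log in logs if log.get("status") == "200"]               # where status='200'
status_200_cast = [log for log in logs if int(log.get("status", "0")) == 200]  # cast(status as bigint)=200
without_error = [log for log in logs if "error" not in log]                    # where error is null
with_error = [log for log in logs if "error" in log]                           # where error is not null

print(len(status_200), len(status_200_cast), len(without_error), len(with_error))
# 1 1 1 1
```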
Mask data
Using the mask function
The mask function supports built-in rules and keyword matching to accurately mask structured and unstructured data.

Raw data:
client_ip: 192.168.1.123
latency: 100
Requirement description: Mask the IP address.
SPL statement: * | extend client_ip = mask(client_ip,'[ {"mode":"buildin","types":["IP_ADDRESS"],"maskChar":"*","keepPrefix":3,"keepSuffix":3} ]')
Result:
client_ip: 192*****123
latency: 100

Raw data:
2025-08-20 18:04:40,998 INFO blockchain-event-poller-3 [10.0.1.20] [com.service.listener.TransactionStatusListener:65] [TransactionStatusListener#handleSuccessfulTransaction]{"message":"On-chain transaction successfully confirmed","confirmationDetails":{"transactionHash":"0x2baf892e9a164b1979","status":"success","blockNumber":45101239,"gasUsed":189543,"effectiveGasPrice":"58.2 Gwei","userProfileSnapshot":{"wallet":"0x71C7656EC7a5f6d8A7C4","sourceIp":"203.0.113.55","phone":"19901012345","address":"123 Main St, Anytown, USA","birthday":null}}}
Requirement description: Mask sensitive fields in the log, such as the wallet address, address information, source IP address, phone number, and transaction hash.
SPL statement: * | extend content = mask(content,'[ {"mode":"keyword","keys":["wallet","address","sourceIp","phone","transactionHash"], "maskChar":"*","keepPrefix":3,"keepSuffix":3} ]')
Result:
2025-08-20 18:04:40,998 INFO blockchain-event-poller-3 [10.0.1.20] [com.service.listener.TransactionStatusListener:65] [TransactionStatusListener#handleSuccessfulTransaction]{"message":"On-chain transaction successfully confirmed","confirmationDetails":{"transactionHash":"0x2**************979","status":"success","blockNumber":45101239,"gasUsed":189543,"effectiveGasPrice":"58.2 Gwei","userProfileSnapshot":{"wallet":"0x7****************7C4","sourceIp":"203******.55","phone":"199*****345","address":"123*******************USA","birthday":null}}}
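The authoritative masking behavior is defined by the built-in mask function. The Python sketch below only approximates the keepPrefix/keepSuffix idea and the keyword mode with a regular expression, so its output is not guaranteed to match mask() in every case; the phone and sourceIp keys come from the example above.

```python
import re

# Rough approximation of keepPrefix/keepSuffix masking; the built-in SPL
# mask() function defines the authoritative behavior.
def mask_value(value, keep_prefix=3, keep_suffix=3, mask_char="*"):
    middle = max(len(value) - keep_prefix - keep_suffix, 0)
    return value[:keep_prefix] + mask_char * middle + value[len(value) - keep_suffix:]

print(mask_value("19901012345"))  # 199*****345

# Keyword mode: mask the values of selected JSON keys inside a text field.
content = '{"phone":"19901012345","sourceIp":"203.0.113.55"}'
masked = re.sub(
    r'("(?:phone|sourceIp)"\s*:\s*")([^"]+)(")',
    lambda m: m.group(1) + mask_value(m.group(2)) + m.group(3),
    content,
)
print(masked)  # {"phone":"199*****345","sourceIp":"203******.55"}
```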
Using regular expressions
Use the SPL extend instruction together with SQL functions for regular expressions, strings, and URLs to mask data.
When you use the regexp_replace function to replace field values based on a regular expression, use capturing groups. In the replacement value, use \1, \2, and so on to represent the values of the first, second, and subsequent capturing groups. For example, the result of regexp_replace('192.168.1.1', '(\d+)\.(\d+)\.\d+\.\d+', '\1.\2.*.*') is 192.168.*.*.

Raw data:
request_uri: /api/v1/resources?user=123&ticket=abc
status: 200
Requirement description: The query parameters of request_uri contain sensitive information and must be removed.
SPL statement: * | extend request_uri=url_extract_path(request_uri)
Or: * | extend request_uri=regexp_replace(request_uri, '\?.*', '')
Result:
request_uri: /api/v1/resources
status: 200

Raw data:
client_ip: 192.168.1.123
latency: 100
Requirement description: Mask the middle two octets of the IP address with asterisks (*).
SPL statement: * | extend client_ip=regexp_replace(client_ip, '(\d+)\.\d+\.\d+\.(\d+)', '\1.*.*.\2')
Result:
client_ip: 192.*.*.123
latency: 100

Raw data:
sql: SELECT id, name, config FROM app_info WHERE name="test-app"
result_size: 1024
Requirement description: The sql field may contain sensitive information. Retain only the operation and the corresponding table name.
SPL statement: * | extend table=regexp_extract(sql, '\bFROM\s+([^\s;]+)|\bINTO\s+([^\s;]+)|\bUPDATE\s+([^\s;]+)', 1) | extend action=regexp_extract(sql,'\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b', 1) | project-away sql
Result:
action: SELECT
table: app_info
result_size: 1024
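The same capture-group logic can be checked locally with Python's re module before you put it into an SPL statement. The snippet below mirrors the regexp_replace and regexp_extract examples above and is for illustration only.

```python
import re

# Mirror of the regexp_replace example: keep the first and last octets and
# mask the middle two with asterisks.
client_ip = "192.168.1.123"
masked_ip = re.sub(r"(\d+)\.\d+\.\d+\.(\d+)", r"\1.*.*.\2", client_ip)
print(masked_ip)  # 192.*.*.123

# Mirror of the regexp_extract example: keep only the SQL verb and table name.
sql = 'SELECT id, name, config FROM app_info WHERE name="test-app"'
action = re.search(r"\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)\b", sql).group(1)
table_match = re.search(r"\bFROM\s+([^\s;]+)|\bINTO\s+([^\s;]+)|\bUPDATE\s+([^\s;]+)", sql)
table_name = next(g for g in table_match.groups() if g)
print(action, table_name)  # SELECT app_info
```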