Simple Log Service: Regular expressions

Last Updated: Aug 11, 2023

This topic describes the matching modes of regular expressions and the methods that can be used to escape special characters in regular expressions.

Full match

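A full match requires the regular expression to match the entire string. For example, \d+ fully matches 1234 but does not fully match abc123. A partial match only requires the regular expression to match part of the string, so \d+ partially matches abc123.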

Some functions perform partial matches by default. To perform a full match with these functions, enclose the regular expression in a caret (^) and a dollar sign ($), in the ^Regular expression$ format. For more information, see Regular expression operations.
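The difference between the two modes can be illustrated with Python's standard re module. This is a sketch only. It assumes that a partial match behaves like re.search and a full match behaves like re.fullmatch, and it does not call any Simple Log Service function. The helper names partial_match and full_match are hypothetical.

```python
import re

# Hypothetical helper: partial match, the pattern only needs to match a substring.
def partial_match(value: str, pattern: str) -> bool:
    return re.search(pattern, value) is not None

# Hypothetical helper: full match, the pattern must match the whole string.
def full_match(value: str, pattern: str) -> bool:
    return re.fullmatch(pattern, value) is not None

print(partial_match("1234", r"\d+"))      # True
print(full_match("1234", r"\d+"))         # True
print(partial_match("abc123", r"\d+"))    # True: the substring "123" matches
print(full_match("abc123", r"\d+"))       # False: "abc" is not matched by \d+
# Anchoring with ^...$ makes a partial-match search behave like a full match
# (for single-line values without a trailing newline).
print(partial_match("abc123", r"^\d+$"))  # False
```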

The following table describes the matching mode of each function.

Category                     Function            Matching mode
---------------------------  ------------------  -------------
Global processing functions  e_regex             Partial match
                             e_keep_fields       Full match
                             e_drop_fields       Full match
                             e_rename            Full match
                             e_kv                Partial match
                             e_search_dict_map   Partial match
                             e_search_table_map  Partial match
Expression functions         e_match             Full match by default (configurable by using a parameter)
                             e_search            Partial match
                             regex_select        Partial match
                             regex_findall       Partial match
                             regex_match         Partial match by default (configurable by using a parameter)
                             regex_replace       Partial match
                             regex_split         Partial match

The following examples show how the matching mode affects the result:

  • regex_match("abc123", r"\d+"): The string matches the regular expression. The default partial match mode is used, and the substring 123 matches \d+.

  • regex_match("abc123", r"\d+", full=True): The string does not match the regular expression. The full parameter sets the matching mode to full match, and abc is not matched by \d+.

  • regex_match("abc123", r"^\d+$"): The string does not match the regular expression. The ^ and $ anchors require the entire string to consist of digits, which is equivalent to a full match.

  • e_search(r'status~="\d+"'): The result depends on the value of the status field. Partial match is used, so any value that contains at least one digit matches.

  • e_search(r'status~="^\d+$"'): The result depends on the value of the status field. The ^ and $ anchors force a full match, so only values that consist entirely of digits match.
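The following sketch reproduces the outcomes of the examples with Python's re module. regex_match_sketch is a hypothetical stand-in. It assumes that regex_match behaves like re.search by default and like re.fullmatch when full=True. The status values are made-up samples that show how the two e_search expressions differ.

```python
import re

# Hypothetical stand-in for regex_match: partial match by default,
# full match when full=True.
def regex_match_sketch(value: str, pattern: str, full: bool = False) -> bool:
    matcher = re.fullmatch if full else re.search
    return matcher(pattern, value) is not None

print(regex_match_sketch("abc123", r"\d+"))             # True
print(regex_match_sketch("abc123", r"\d+", full=True))  # False
print(regex_match_sketch("abc123", r"^\d+$"))           # False

# Made-up status values: \d+ (partial) versus ^\d+$ (anchored, full).
for status in ["200", "HTTP 404", "ok"]:
    print(status, regex_match_sketch(status, r"\d+"), regex_match_sketch(status, r"^\d+$"))
# 200 True True
# HTTP 404 True False
# ok False False
```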

Character escape

Regular expressions may contain special characters. To match a special character literally, you must escape it. You can use the following methods to escape special characters:

  • Use backslashes (\).

    For more information, see Escape special characters.

  • Use the str_regex_escape function.

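    • Example 1: If you use e_drop_fields(str_regex_escape("abc.test")), only the abc.test field is discarded. The period (.) is escaped and matches only a literal period.

    • Example 2: If you use e_drop_fields("abc.test"), all fields whose names match the regular expression abc.test are discarded, such as abc.test, abcXtest, and abc_test. An unescaped period (.) matches any single character.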
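The two examples can be approximated in Python. This sketch makes three assumptions. First, e_drop_fields drops every field whose name fully matches the pattern, as the table above states. Second, str_regex_escape escapes special characters in the same way as Python's re.escape. Third, drop_fields_sketch and the sample event are hypothetical.

```python
import re

# Hypothetical stand-in for e_drop_fields: drop every field whose name
# fully matches the pattern.
def drop_fields_sketch(event: dict, pattern: str) -> dict:
    return {k: v for k, v in event.items() if re.fullmatch(pattern, k) is None}

event = {"abc.test": "1", "abcXtest": "2", "abc_test": "3", "other": "4"}

# Unescaped: "." matches any single character, so three fields are dropped.
print(drop_fields_sketch(event, "abc.test"))
# {'other': '4'}

# Escaped (re.escape turns "abc.test" into "abc\.test"): only abc.test is dropped.
print(drop_fields_sketch(event, re.escape("abc.test")))
# {'abcXtest': '2', 'abc_test': '3', 'other': '4'}
```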

Group

You can enclose a subexpression of a regular expression in parentheses () to create a group. A group is treated as a single unit: a quantifier can repeat it, and the content that it matches can be referenced later. The following example extracts the same IP address with and without a group:

"""
Log before processing:
SourceIP: 192.0.2.1
Log after processing:
SourceIP: 192.0.2.1
ip: 192.0.2.1
"""
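# Before a group is created: the \.\d{1,3} part is written out three times.
e_regex("SourceIP", r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "ip")
# After a group is created: the {3} quantifier repeats the group (\.\d{1,3}) three times.
# The period is escaped so that it matches only a literal period.
e_regex("SourceIP", r"\d{1,3}(\.\d{1,3}){3}", "ip")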
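The following sketch uses Python's re module to show that both patterns match the same text. It also shows a detail of quantified groups: a repeated group keeps only the text of its last repetition. The sketch assumes that the Simple Log Service engine behaves like Python's re here. It does not show how e_regex assigns the match to the ip field.

```python
import re

log = "SourceIP: 192.0.2.1"

# Without a group: the dot-separated part is written out three times.
flat = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", log)
# With a group: (\.\d{1,3}) is repeated three times by the {3} quantifier.
grouped = re.search(r"\d{1,3}(\.\d{1,3}){3}", log)

print(flat.group(0))     # 192.0.2.1
print(grouped.group(0))  # 192.0.2.1 (the whole match is identical)
print(grouped.group(1))  # .1 (a repeated group keeps only its last repetition)
```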

Capturing group

A capturing group stores the text that it matches so that the text can be referenced later. For example, a backreference such as \1 can reference the text in the same regular expression or in a replacement string. If the content in the parentheses () of a group does not start with ?:, the group is a capturing group.
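The following Python sketch shows two common uses of captured text: a backreference inside the pattern, and group references in a replacement string. It assumes that the Simple Log Service engine supports the same \1 syntax as Python's re module. Only standard library calls are used.

```python
import re

# \1 inside the pattern refers back to the text captured by group 1,
# so this finds a word that is immediately repeated.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeated.group(1))  # is

# Captured text can also be reused in a replacement string.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2023-08-11")
print(reordered)  # 11/08/2023
```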

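By default, capturing groups are numbered from left to right by the position of their opening parentheses. The first group is numbered 1, the second group is numbered 2, and so on. The following regular expression contains three capturing groups. The digits under the expression mark the opening and closing parentheses of each group:

(\d{4})-(\d{2}-(\d{2}))

1     1 2      3     32

Group 1 is (\d{4}), group 2 is (\d{2}-(\d{2})), and group 3 is the nested (\d{2}).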
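The numbering can be checked with Python's re module. This sketch assumes that Simple Log Service numbers unnamed groups in the same way. The date string is a sample value.

```python
import re

m = re.fullmatch(r"(\d{4})-(\d{2}-(\d{2}))", "2023-08-11")

# Groups are numbered by the position of their opening parenthesis.
print(m.group(1))  # 2023 (year)
print(m.group(2))  # 08-11 (outer group: month and day)
print(m.group(3))  # 11 (nested group: day)
print(m.groups())  # ('2023', '08-11', '11')
```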

If a regular expression contains both unnamed and named capturing groups, the named capturing groups are numbered after the unnamed capturing groups. Simple Log Service also lets you reference a named capturing group directly by its custom name in regular expressions or programs.
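The following sketch shows named groups in Python's syntax, (?P<name>...), and a backreference by name, (?P=name). It assumes that Simple Log Service accepts the same Python-style syntax, which this topic does not state. The group names year, month, day, and word are hypothetical. The sketch deliberately does not show how mixed named and unnamed groups are numbered.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "date=2023-08-11")
print(m.group("year"))  # 2023
print(m.groupdict())    # {'year': '2023', 'month': '08', 'day': '11'}

# A named group can be backreferenced by name inside the same pattern.
doubled = re.fullmatch(r"(?P<word>\w+)-(?P=word)", "abc-abc")
print(doubled is not None)  # True
```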

Non-capturing group

A non-capturing group does not store the text that it matches, so the text cannot be referenced by group number or by a backreference. If the content in the parentheses () of a group starts with ?:, the group is a non-capturing group.

For example, to match both program and project, you can use the regular expression pro(gram|ject). If you do not need to reference the matched suffix (gram or ject), use pro(?:gram|ject) instead.
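In Python's re module, the difference shows up in the captured groups. Both patterns match the same text, but only the capturing version records the suffix. This sketch assumes that the Simple Log Service engine behaves the same way.

```python
import re

capturing = re.search(r"pro(gram|ject)", "new project")
non_capturing = re.search(r"pro(?:gram|ject)", "new project")

print(capturing.group(0), capturing.groups())          # project ('ject',)
print(non_capturing.group(0), non_capturing.groups())  # project ()
```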

Note

(?:x) matches x but does not store the matched content. Use this format to group a subexpression so that an operator, such as a quantifier (+, *, or {n}) or alternation (|), applies to the entire subexpression.
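The following Python sketch shows how (?:x) changes the scope of an operator without creating a group. It assumes that Simple Log Service handles these operators like Python's re module. The sample strings are made up.

```python
import re

# Without a group, + applies only to "b": ab+ matches "abbb" but not "ababab".
print(re.fullmatch(r"ab+", "ababab") is not None)  # False

# (?:ab)+ repeats the whole subexpression "ab" and creates no group.
m = re.fullmatch(r"(?:ab)+", "ababab")
print(m is not None, m.groups())  # True ()

# Alternation limited to a subexpression: the "log" suffix is shared.
print(re.fullmatch(r"(?:sys|app)log", "applog") is not None)  # True
```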