This topic describes the matching modes of regular expressions, how to escape special characters in regular expressions, and how groups work in regular expressions.

Full match and partial match

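If a regular expression matches an entire string, the string fully matches the regular expression. If the regular expression matches only part of the string, the string partially matches the regular expression. For example, the 1234 string fully matches the \d+ regular expression, while the abc123 string only partially matches it, because \d+ matches the 123 substring but not the leading abc.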

Some functions use regular expressions in partial match mode. To force a full match in these functions, enclose the regular expression in a caret (^) and a dollar sign ($), in the format ^Regular expression$. For more information about regular expression syntax, see Regular expression operations.
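
The difference between the two modes can be sketched with Python's standard re module. This is only an illustration under the assumption that re.search behaves like partial match and re.fullmatch like full match; it does not call any of the functions listed in this topic.

```python
import re

pattern = r"\d+"

# Partial match: the pattern only needs to match somewhere in the string.
partial_abc123 = re.search(pattern, "abc123") is not None

# Full match: the pattern must consume the entire string.
full_abc123 = re.fullmatch(pattern, "abc123") is not None
full_1234 = re.fullmatch(pattern, "1234") is not None

# Anchoring with ^ and $ makes a partial-match search behave like a full match
# for these inputs.
anchored_abc123 = re.search(r"^\d+$", "abc123") is not None
anchored_1234 = re.search(r"^\d+$", "1234") is not None

print(partial_abc123, full_abc123, full_1234, anchored_abc123, anchored_1234)
# prints: True False True False True
```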

The following table lists the matching modes for different functions.
Category                      Function             Matching mode
Global processing functions   e_regex              Partial match
Global processing functions   e_keep_fields        Full match
Global processing functions   e_drop_fields        Full match
Global processing functions   e_rename             Full match
Global processing functions   e_kv                 Partial match
Global processing functions   e_search_dict_map    Partial match
Global processing functions   e_search_table_map   Partial match
Expression functions          e_match              Controlled by a parameter; full match by default
Expression functions          e_search             Partial match
Expression functions          regex_select         Partial match
Expression functions          regex_findall        Partial match
Expression functions          regex_match          Controlled by a parameter; partial match by default
Expression functions          regex_replace        Partial match
Expression functions          regex_split          Partial match

Matching mode examples:
  • regex_match("abc123", r"\d+"): The string matches the regular expression. The matching mode is partial match, which is the default mode.
  • regex_match("abc123", r"\d+", full=True): The string does not match the regular expression. The matching mode is full match.
  • regex_match("abc123", r"^\d+$"): The string does not match the regular expression. The matching mode is full match based on the syntax of the regular expression.
  • e_search(r'status~="\d+"'): Whether the value of the status field matches the regular expression depends on the actual value. The matching mode is partial match, which is the default mode.
  • e_search(r'status~="^\d+$"'): Whether the value of the status field matches the regular expression depends on the actual value. The matching mode is full match based on the syntax of the regular expression.
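
To make "depends on the actual value" concrete, the following sketch filters a few made-up events by their status value. It assumes Python's re.search as a stand-in for partial matching; the events and the filter_by_status helper are hypothetical and are not part of the functions described in this topic.

```python
import re

# Hypothetical events; the field name "status" follows the examples above.
events = [{"status": "200"}, {"status": "HTTP 404"}, {"status": "ok"}]

def filter_by_status(events, pattern):
    """Keep events whose status value contains a match for the pattern."""
    return [e for e in events if re.search(pattern, e["status"])]

# Partial match: any status value that contains digits is kept.
partial = [e["status"] for e in filter_by_status(events, r"\d+")]

# Anchored pattern: only status values made up entirely of digits are kept.
full = [e["status"] for e in filter_by_status(events, r"^\d+$")]

print(partial)  # ['200', 'HTTP 404']
print(full)     # ['200']
```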

Escape special characters

Regular expressions may contain special characters. To use these characters literally, you must escape them in one of the following ways:
  • Prefix each special character with a backslash (\).
  • Pass the string to the str_regex_escape function, which escapes the special characters for you.

    For example, e_drop_fields(str_regex_escape("abc.test")) drops the abc.test field. In contrast, e_drop_fields("abc.test") drops every field whose name matches abc?test, where the question mark (?) stands for any single character, because an unescaped period (.) in a regular expression matches any character.
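
The effect of escaping can be sketched with Python's standard re module. Here re.escape is used as an assumed analogue of str_regex_escape, and re.fullmatch mirrors the full match mode of e_drop_fields; the field names are made up for illustration.

```python
import re

# Hypothetical field names.
field_names = ["abc.test", "abc_test", "abcXtest", "abc.test.extra"]

unescaped = "abc.test"            # the period matches any single character
escaped = re.escape("abc.test")   # the period is escaped and matches only "."

matched_unescaped = [n for n in field_names if re.fullmatch(unescaped, n)]
matched_escaped = [n for n in field_names if re.fullmatch(escaped, n)]

print(escaped)            # abc\.test
print(matched_unescaped)  # ['abc.test', 'abc_test', 'abcXtest']
print(matched_escaped)    # ['abc.test']
```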

Group

You can use parentheses () to enclose a subexpression in a regular expression to create a group. The group can be referenced repeatedly. The following example shows a regular expression without a group and a regular expression with a group. The two regular expressions generate the same result.
"""
Raw log:
SourceIP: 1.1.1.1
Processing result:
SourceIP: 1.1.1.1
ip: 1.1.1.1
"""
# The regular expression without a group:
e_regex("SourceIP",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")
# The regular expression with a group:
e_regex("SourceIP", r"\d{1,3}(\.\d{1,3}){3}", "ip")
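
As a rough check outside the data transformation environment, the two patterns can be compared with Python's standard re module. This sketch only compares the overall match; how e_regex assigns values when a pattern contains groups is not modeled here.

```python
import re

value = "1.1.1.1"

# The same IPv4-like pattern written without and with a repeated group.
without_group = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", value)
with_group = re.search(r"\d{1,3}(\.\d{1,3}){3}", value)

# group(0) is the whole match and is identical for both patterns.
whole_a = without_group.group(0)
whole_b = with_group.group(0)

# A repeated capturing group keeps only its last repetition.
last_repetition = with_group.group(1)

print(whole_a, whole_b, last_repetition)  # 1.1.1.1 1.1.1.1 .1
```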

Capturing group

The text that matches a capturing group is cached in memory and can be reused through a backreference. A group is a capturing group if the subexpression in parentheses () does not start with a question mark followed by a colon (?:).

By default, all capturing groups are numbered. Capturing groups are numbered from left to right based on the opening parenthesis. The first group is numbered 1, the second group is numbered 2, and so on. Example:
(\d{4})-(\d{2}-(\d{2}))

1     1 2      3     32
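
The numbering above can be checked with Python's standard re module. The date string below is a made-up input.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"(\d{4})-(\d{2}-(\d{2}))", "2024-05-17")

group_1 = m.group(1)   # first opening parenthesis
group_2 = m.group(2)   # second opening parenthesis (contains group 3)
group_3 = m.group(3)   # third opening parenthesis, nested inside group 2

print(group_1, group_2, group_3)  # 2024 05-17 17
print(m.groups())                 # ('2024', '05-17', '17')
```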

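If a regular expression contains both common capturing groups and named capturing groups, the numbering depends on the regular expression engine. In Python regular expression syntax, a named capturing group, written as (?P<name>...), is numbered in the same left-to-right sequence as the common capturing groups, so it can be referenced by its number or by its name.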

You can reference a named capturing group directly by its custom name, both inside the regular expression and in programs.
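
The following sketch shows references by name using Python's standard re module, where a named group is written as (?P<name>...). The group names and sample strings are chosen only for illustration.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date_pattern.fullmatch("2024-05-17")

# Reference by name in a program.
year = m.group("year")
fields = m.groupdict()

# Reference by name in a replacement string.
reordered = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", "2024-05-17")

# Reference by name inside the regular expression itself (a backreference):
# (?P=word) must match the same text that the group named word matched.
repeated = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
repeated_word = repeated.group("word")

print(year, fields, reordered, repeated_word)
```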

Non-capturing group

The text that matches a non-capturing group is not cached in memory. A group is a non-capturing group if the subexpression in parentheses () starts with a question mark followed by a colon (?:).

For example, the regular expression pro(gram|ject) matches program and project. If you do not want to cache the matched content in memory, you can write the regular expression as pro(?:gram|ject). Non-capturing groups simplify the matching process and occupy less memory.
Note: (?:x) matches x but does not cache the matched content. You can define a subexpression in the (?:x) format and use it with operators in a regular expression.
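
A short sketch with Python's standard re module shows the difference: both forms match the same strings, but only the capturing form stores the text of the group.

```python
import re

capturing = re.fullmatch(r"pro(gram|ject)", "project")
non_capturing = re.fullmatch(r"pro(?:gram|ject)", "project")

# Both patterns match, but only the capturing group stores its text.
captured = capturing.groups()
not_captured = non_capturing.groups()

# A non-capturing group can still be combined with an operator such as +.
repeated_ok = re.fullmatch(r"(?:ab)+", "ababab") is not None

print(captured, not_captured, repeated_ok)  # ('ject',) () True
```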