This topic describes the matching modes of regular expressions, how to escape special characters in regular expressions, and the group concept of regular expressions.
Full match
If a regular expression matches an entire string, the string fully matches the regular expression. For example, the string 1234 fully matches the regular expression \d+, whereas the string abc123 only partially matches \d+.
Some functions use regular expressions in partial match mode. To force a full match in such a function, enclose the regular expression in a caret (^) and a dollar sign ($), in the format ^Regular expression$. For more information about the regular expression syntax, see Regular expression operations.
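The effect of the anchors can be sketched with Python's standard re module. This is only a stand-in: it assumes that re.search behaves like a function in partial match mode, and it does not call any of the DSL functions. The sample strings are the ones used above.

```python
import re

pattern = r"\d+"

# re.search scans the whole string for a hit, which mirrors partial match mode.
partial_hit = re.search(pattern, "abc123") is not None

# Wrapping the same pattern in ^ and $ forces the entire string to match.
anchored = "^" + pattern + "$"
anchored_on_mixed = re.search(anchored, "abc123") is not None
anchored_on_digits = re.search(anchored, "1234") is not None

print(partial_hit, anchored_on_mixed, anchored_on_digits)  # True False True
```

If the pattern contains a top-level alternation such as a|b, wrap it in a non-capturing group before you add the anchors, for example ^(?:a|b)$. Otherwise the caret applies only to the first alternative and the dollar sign only to the last.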
Type | Function | Matching mode |
---|---|---|
Global processing functions | e_regex | Partial match |
Global processing functions | e_keep_fields | Full match |
Global processing functions | e_drop_fields | Full match |
Global processing functions | e_rename | Full match |
Global processing functions | e_kv | Partial match |
Global processing functions | e_search_dict_map | Partial match |
Global processing functions | e_search_table_map | Partial match |
Expression functions | e_match | The matching mode is controlled by a parameter and is full match by default. |
Expression functions | e_search | Partial match |
Expression functions | regex_select | Partial match |
Expression functions | regex_findall | Partial match |
Expression functions | regex_match | The matching mode is controlled by a parameter and is partial match by default. |
Expression functions | regex_replace | Partial match |
Expression functions | regex_split | Partial match |
- regex_match("abc123", r"\d+"): The string matches the regular expression. The matching mode is partial match, which is the default mode.
- regex_match("abc123", r"\d+", full=True): The string does not match the regular expression. The matching mode is full match.
- regex_match("abc123", r"^\d+$"): The string does not match the regular expression. The matching mode is full match based on the syntax of the regular expression.
- e_search(r'status~="\d+"'): Whether the value of the status field matches the regular expression depends on the actual value. The matching mode is partial match based on the syntax of the regular expression.
- e_search(r'status~="^\d+$"'): Whether the value of the status field matches the regular expression depends on the actual value. The matching mode is full match based on the syntax of the regular expression.
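A parameter-controlled matching mode can be approximated in plain Python as follows. The matches helper and the sample event are hypothetical and exist only for illustration. The helper is not the DSL implementation. It only mimics a function that uses partial match by default and switches to full match on request.

```python
import re

def matches(value, pattern, full=False):
    """Hypothetical helper: partial match by default, full match if full=True."""
    finder = re.fullmatch if full else re.search
    return finder(pattern, value) is not None

event = {"status": "200 OK"}  # made-up sample event

print(matches("abc123", r"\d+"))             # True: partial match finds 123
print(matches("abc123", r"\d+", full=True))  # False: abc is not covered
print(matches("abc123", r"^\d+$"))           # False: anchors force a full match
print(matches(event["status"], r"\d+"))      # True: depends on the field value
print(matches(event["status"], r"^\d+$"))    # False: " OK" is not covered
```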
Escape special characters
- Escape special characters by using backslashes (\).
- Escape special characters by using the str_regex_escape function. For example, e_drop_fields(str_regex_escape("abc.test")) drops the abc.test field, whereas e_drop_fields("abc.test") drops every field whose name matches abc?test, where the question mark (?) stands for any single character. This is because an unescaped period (.) in a regular expression matches any character.
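The difference between an escaped and an unescaped period can be checked with Python's re module. The sketch assumes that re.escape plays a role comparable to str_regex_escape and that re.fullmatch stands in for the full match mode of e_drop_fields. The field names are made up.

```python
import re

field_names = ["abc.test", "abcXtest", "abc-test"]  # made-up field names

unescaped = "abc.test"           # the period matches any single character
escaped = re.escape("abc.test")  # the period is turned into a literal \.

hit_unescaped = [n for n in field_names if re.fullmatch(unescaped, n)]
hit_escaped = [n for n in field_names if re.fullmatch(escaped, n)]

print(escaped)        # abc\.test
print(hit_unescaped)  # ['abc.test', 'abcXtest', 'abc-test']
print(hit_escaped)    # ['abc.test']
```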
Group
Use parentheses () to enclose a subexpression in a regular expression to create a group. The group can be repeatedly referenced. The following example shows a regular expression without a group and a regular expression with a group:
"""
Raw log entry:
SourceIP: 1.1.1.1
Result:
SourceIP: 1.1.1.1
ip: 1.1.1.1
"""
# The regular expression without a group:
e_regex("SourceIP",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")
# The regular expression with a group:
e_regex("SourceIP", r"\d{1,3}(\.\d{1,3}){3}", "ip")
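As a rough check outside the DSL, the two patterns can be compared with Python's re module, with the period escaped in both. The sketch shows only how Python treats the group. How e_regex assigns captured groups to the ip field is not verified here.

```python
import re

source_ip = "1.1.1.1"

without_group = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
with_group = r"\d{1,3}(\.\d{1,3}){3}"  # the group is repeated three times

m1 = re.fullmatch(without_group, source_ip)
m2 = re.fullmatch(with_group, source_ip)

# Both patterns cover the whole address.
print(m1.group(0), m2.group(0))  # 1.1.1.1 1.1.1.1

# A repeated capturing group keeps only its last repetition.
print(m2.group(1))  # .1
```

Because a repeated capturing group retains only the text of its final repetition, read the whole match when the complete address is needed.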
Capturing group
The text content that matches a capturing group is cached in the memory. The cached text content can be reused by using a backreference. A group is a capturing group if the subexpression in parentheses () does not start with a question mark followed by a colon (?:).
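Capturing groups are numbered from left to right in the order of their opening parentheses. In the following example, each number marks the opening and the closing parenthesis of the group with that number:
(\d{4})-(\d{2}-(\d{2}))
1     1 2      3     32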
If a regular expression contains both common capturing groups and named capturing groups, the named capturing groups are numbered after the common capturing groups.
You can directly reference the custom name of a capturing group in regular expressions or programs.
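The numbering, a backreference, and a reference by name can be tried out with Python's re module. The sample values are made up, and the (?P<name>...) form is Python's syntax for named groups, assumed here for illustration only.

```python
import re

date = "2024-05-17"  # made-up sample value

# Groups are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"(\d{4})-(\d{2}-(\d{2}))", date)
print(m.group(1), m.group(2), m.group(3))  # 2024 05-17 17

# The backreference \1 reuses the text cached by group 1.
repeated = re.search(r"(\w+) \1", "hello hello world")
print(repeated.group(0))  # hello hello

# A named group can be referenced by its custom name.
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
print(named.group("year"), named.group("month"))  # 2024 05
```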
Non-capturing group
The text content that matches a non-capturing group is not cached in the memory. A group is a non-capturing group if the subexpression in parentheses () starts with a question mark followed by a colon (?:).
pro(gram|ject) matches program and project. If you do not want to cache the matched content in the memory, you can write the regular expression as pro(?:gram|ject). Non-capturing groups simplify the matching process and occupy less memory.
(?:x) matches x but does not cache the matched content. You can define a subexpression in the (?:x) format and use it with operators in a regular expression.
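The following sketch uses Python's re module to compare the two forms of the pattern. It is a plain Python illustration, not a DSL example, and the word list is made up.

```python
import re

capturing = re.compile(r"pro(gram|ject)")
non_capturing = re.compile(r"pro(?:gram|ject)")

words = ["program", "project", "protect"]  # made-up sample words

# Both forms accept exactly the same strings.
print([bool(capturing.fullmatch(w)) for w in words])      # [True, True, False]
print([bool(non_capturing.fullmatch(w)) for w in words])  # [True, True, False]

# Only the capturing form stores the text matched inside the parentheses.
print(capturing.groups, non_capturing.groups)       # 1 0
print(capturing.fullmatch("program").groups())      # ('gram',)
print(non_capturing.fullmatch("program").groups())  # ()

# A non-capturing group can still be combined with an operator such as +.
print(re.fullmatch(r"(?:ab)+", "ababab") is not None)  # True
```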