Simple Log Service: Getting started with regular expressions

Last Updated: Oct 24, 2023

Regular expressions are patterns that are used to match characters in text. This topic describes how to get started with regular expressions.

Important

This topic may contain information about third-party tools. The information is for reference only. Alibaba Cloud does not guarantee or make any commitments to the performance and reliability of the third-party tools, or the potential impacts of operations on these tools.

If you are not familiar with regular expressions, you can practice and debug them by using a tool such as regex101, which visualizes a regular expression and how it matches text. The examples in this topic use regex101.

Syntax

A regular expression consists of different types of characters, including literal characters, metacharacters, delimiters, and escape characters.

  • Literal character: matches a character as it is.

  • Metacharacter: matches a specific character or character set. For example, a period (.) can match any character, and \d can match any digit.

  • Delimiter: marks the start or end of a regular expression. In most cases, a forward slash (/) or a number sign (#) is used as a delimiter.

  • Escape character: restores the literal meaning of a character that has a special use, such as a metacharacter or a delimiter. A backslash (\) is used as the escape character. For example, \. matches a period.

By default, the regex101 tool uses a forward slash (/) as the delimiter for the regular expression a.\d\., as shown in the following figure. In this regular expression, a matches the letter a, . matches any single character, \d matches any digit, and \. matches a literal period.

[Figure: the regular expression a.\d\. in regex101]
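The pattern described above can also be tried outside regex101. The following is a minimal sketch in Python, which is used here only for illustration. The sample strings are made up. Python's re module takes the pattern as a string, so no / delimiters are written.

```python
import re

# a, then any one character, then one digit, then a literal period
pattern = re.compile(r"a.\d\.")

# Made-up sample strings for illustration
samples = ["ab1.", "a-7.", "ab1x", "a1."]

for text in samples:
    # search() reports whether the pattern occurs anywhere in the string
    print(text, bool(pattern.search(text)))
```

In this sketch, ab1. and a-7. match. ab1x does not match because it has no period after the digit, and a1. does not match because the unescaped period consumes the digit 1, which leaves no digit for \d.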

The following table describes commonly used characters.

Important

The supported characters and syntax vary by programming language and regular expression engine. Write your regular expressions for the programming language and regular expression engine that you use.

Character   Description
.           Matches any character except a line feed.
\d          Matches any digit. Equivalent to [0-9].
\D          Matches any non-digit character. Equivalent to [^0-9].
\w          Matches any letter, digit, or underscore (_). Equivalent to [A-Za-z0-9_].
\W          Matches any character except letters, digits, and underscores (_). Equivalent to [^A-Za-z0-9_].
\s          Matches any whitespace character, including spaces, tabs, and line feeds.
\S          Matches any non-whitespace character.
\b          Matches a word boundary, which is the position between a word character and a non-word character.
\B          Matches a position that is not a word boundary.
*           Matches the preceding element zero or more times.
+           Matches the preceding element one or more times.
?           Matches the preceding element zero or one time.
|           Matches either the expression before it or the expression after it (OR operator).
{n}         Matches the preceding element exactly n times.
{n,}        Matches the preceding element at least n times.
{n,m}       Matches the preceding element at least n times and at most m times.
[abc]       Matches any one character in the character set.
[^abc]      Matches any one character that is not in the character set.
^           Matches the start of a string.
$           Matches the end of a string.
()          Represents a group. The characters enclosed in the parentheses are treated as a single unit.
/           Marks the start and end of a regular expression. This character is commonly used as a delimiter.
\           Restores the literal meaning of a character that has a special use, such as a metacharacter or a delimiter. A backslash (\) is used as the escape character.
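A few of the characters listed above can be exercised with a short script. The following sketch uses Python's re module as an example engine. The sample text is made up. It also illustrates the engine differences noted earlier: in Python 3, \d applied to a text string also matches non-ASCII digits unless the re.ASCII flag is set, so it is not strictly equivalent to [0-9] there.

```python
import re

# Made-up sample text for illustration
text = "error_42 at 10:05, retry 3"

digits = re.findall(r"\d+", text)        # runs of one or more digits
words = re.findall(r"\b\w+\b", text)     # whole words between word boundaries
vowels = re.findall(r"[aeiou]", "retry") # any one character from the set
is_time = bool(re.fullmatch(r"\d{2}:\d{2}", "10:05"))  # exactly two digits, colon, two digits

print(digits)
print(words)
print(vowels)
print(is_time)

# Engine difference: Python's \d is Unicode-aware by default.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
print(bool(re.fullmatch(r"\d", arabic_three)))            # Unicode digits are accepted
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))  # restricted to 0-9
```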

Examples

Example 1: Match a string that contains a specified keyword

Filter for logs that contain the 05/Jan/2023 keyword.

  • Sample logs: Info 05/Jan/2023 Warning and Info 06/Jan/2023 Error

  • Regular expression: .*05\/Jan\/2023.*

    • .* matches zero or more of any character. In this example, a log is matched regardless of the characters that precede or follow 05/Jan/2023.

    • 05\/Jan\/2023 matches the 05/Jan/2023 keyword.

      Logtail supports regular expressions that use a forward slash (/) as the delimiter. In that case, you must add the escape character (\) before each forward slash (/) in the pattern so that it is treated as a literal character instead of a delimiter.
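Example 1 can be reproduced with a short script. This is a minimal sketch that uses Python's re module as the engine, which is an assumption made for illustration and not how Logtail evaluates the expression. Python patterns have no / delimiters, so the escaped and unescaped forms of the slash behave the same there.

```python
import re

logs = ["Info 05/Jan/2023 Warning", "Info 06/Jan/2023 Error"]

# Pattern as written in the example, with escaped forward slashes
escaped = re.compile(r".*05\/Jan\/2023.*")
# Same pattern without escapes; valid in Python because / is not a delimiter
unescaped = re.compile(r".*05/Jan/2023.*")

matched = [line for line in logs if escaped.fullmatch(line)]
matched_unescaped = [line for line in logs if unescaped.fullmatch(line)]

print(matched)
print(matched == matched_unescaped)
```

Only the first sample log is kept, because the second log contains 06/Jan/2023 instead of the keyword.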


Example 2: Match a mobile phone number

Filter for logs that contain 11-digit mobile phone numbers. The numbers start with 111 or 222.

  • Sample logs: 11144445555, 22266667777, and 33388889999

  • Regular expression: (111|222)\d{8}

    In a mobile phone number, the first three digits are the operator code, the middle four digits are the area code, and the last four digits are arbitrary. Assume that the operator code can only be 111 or 222 and that the area code can be any four digits.

    • (111|222) is a group that matches either of the supported operator codes: 111 or 222.

    • \d matches any digit.

    • {8} specifies that \d must match exactly eight times. As a result, the eight digits that follow the operator code are matched.
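The phone number pattern can be checked against the sample logs with a sketch like the following. Python's re module is used as an assumed engine for illustration. The sketch uses fullmatch() so that the whole string must be an 11-digit number. The 12-digit string is a made-up value that shows why this matters.

```python
import re

logs = ["11144445555", "22266667777", "33388889999"]

# Operator code 111 or 222, followed by exactly eight more digits
pattern = re.compile(r"(111|222)\d{8}")

# fullmatch() requires the entire string to match the pattern
matched = [number for number in logs if pattern.fullmatch(number)]
print(matched)

# A made-up 12-digit string: search() finds an 11-digit run inside it,
# while fullmatch() rejects it because of the extra leading digit.
longer = "911144445555"
print(bool(pattern.search(longer)), bool(pattern.fullmatch(longer)))
```

If numbers can appear inside longer digit runs, anchoring the pattern or matching the whole field avoids such partial matches.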


Example 3: Match a complete string

Filter for logs in the [Time] [Level] [Module] [Information] format. The time is in the yyyy-mm-dd hh:mm:ss format. The level can be DEBUG, INFO, WARN, or ERROR. The module and information are arbitrary strings.

  • Sample log: [2021-09-23 10:23:45] [INFO] [user login] [user login success]

  • Regular expression: ^\[\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}\] \[(DEBUG|INFO|WARN|ERROR)\] \[.+\] \[.+\]$

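    • \[ and \] match literal square brackets. Square brackets have a special meaning in regular expressions because they define a character set, so they require the escape character (\).

    • \[\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}\] matches the time in the yyyy-mm-dd hh:mm:ss format, enclosed in square brackets.

    • \[(DEBUG|INFO|WARN|ERROR)\] matches any of the four log levels.

    • \[.+\] \[.+\]$ matches the module and the information, each of which is a non-empty string enclosed in square brackets, up to the end of the string.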
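The complete-string pattern can be tested as follows. This is a sketch that assumes Python's re module as the engine. The second log line, with the level TRACE, is a made-up counterexample. The group (DEBUG|INFO|WARN|ERROR) is the only capturing group in the pattern, so it can be read back as group 1.

```python
import re

# The pattern from Example 3, split across two string literals for readability
pattern = re.compile(
    r"^\[\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}\] "
    r"\[(DEBUG|INFO|WARN|ERROR)\] \[.+\] \[.+\]$"
)

good = "[2021-09-23 10:23:45] [INFO] [user login] [user login success]"
# Made-up counterexample: TRACE is not one of the four allowed levels
bad = "[2021-09-23 10:23:45] [TRACE] [user login] [user login success]"

good_match = pattern.match(good)
bad_match = pattern.match(bad)

print(good_match.group(1))  # the captured log level
print(bad_match)            # no match object for the counterexample
```

Because .+ is greedy and can itself contain square brackets, the pattern validates the overall shape of the line but does not strictly limit it to four bracketed fields.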


Example 4: Match a string that does not start with a specified keyword

Filter for logs that do not start with DEBUG.

  • Sample logs: DEBUG: test debug and INFO: test info

  • Regular expression: ^(?!DEBUG).*

    • ^ matches the start of a string, so the check that follows is applied at the beginning of the log line.

    • (?!DEBUG) excludes the logs that start with DEBUG. It is a negative lookahead assertion in the (?!<pattern>) format, where <pattern> specifies the content that must not appear at the current position.

    • .* matches all remaining characters up to the end of the log line.
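The following sketch applies the pattern from Example 4 to the sample logs, with Python's re module assumed as the engine. The third string is a made-up log that contains DEBUG in the middle of the line. It is still matched, because the assertion is checked only at the start of the string.

```python
import re

logs = ["DEBUG: test debug", "INFO: test info"]

# Negative lookahead anchored at the start: the line must not begin with DEBUG
pattern = re.compile(r"^(?!DEBUG).*")

matched = [line for line in logs if pattern.match(line)]
print(matched)

# Made-up log line with DEBUG later in the line, not at the start
mid_line = "INFO: DEBUG mode enabled"
print(bool(pattern.match(mid_line)))
```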


Example 5: Match a string that does not contain a specified keyword

Filter for logs that do not contain INFO or DEBUG.

  • Sample logs: hello world, INFO, ERROR message, DEBUG, warning log, error INFO, debug detail, and info status

  • Regular expression: ^(?!.*(INFO|DEBUG)).*

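    • ^ matches the start of a string, so the check that follows is applied once from the beginning of the log line.

    • (?!.*(INFO|DEBUG)) excludes the logs that contain INFO or DEBUG. The .* inside the negative lookahead lets the check scan the whole line, so a keyword is rejected wherever it appears.

    • .* matches all characters up to the end of the log line.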
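The pattern from Example 5 can be run against the sample logs as sketched below, with Python's re module assumed as the engine. Matching is case-sensitive by default, so the lowercase samples debug detail and info status are kept. The re.IGNORECASE variant is an illustrative extension that is not part of the original example.

```python
import re

logs = [
    "hello world", "INFO", "ERROR message", "DEBUG",
    "warning log", "error INFO", "debug detail", "info status",
]

# The line must not contain INFO or DEBUG anywhere (case-sensitive)
pattern = re.compile(r"^(?!.*(INFO|DEBUG)).*")
kept = [line for line in logs if pattern.match(line)]
print(kept)

# Illustrative extension: also reject lowercase or mixed-case keywords
pattern_ci = re.compile(r"^(?!.*(INFO|DEBUG)).*", re.IGNORECASE)
kept_ci = [line for line in logs if pattern_ci.match(line)]
print(kept_ci)
```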

