Regular expressions in MaxCompute SQL use the Perl Compatible Regular Expressions (PCRE) standards to match strings by character.

RLIKE

The RLIKE operator can be used together with regular expressions to implement queries.
Metacharacter Description
^ Match the beginning of a string.
$ Match the end of a string.
. Match any character.
* Match zero or more instances of the preceding character or character pattern.
+ Match one or more instances of the preceding character or character pattern.
? Match zero or one instance of the preceding character or character pattern.
? Match a modifier. If this character follows one of other characters (*, +, ?, {n}, {n,}, or {n,m}), the match pattern is non-greedy. In the non-greedy pattern, as few characters as possible are matched with the searched character string. In the default greedy pattern, as many characters as possible are matched with the searched character string.
A|B Match A or B.
(abc)* Match zero or more instances of the abc sequence.
{n} or {m,n} The number of matches.
[ab] Match any character (a or b) in the brackets. Fuzzy match is used.
[a-d] Match any of the following characters: a, b, c, and d.
[^ab] Match a character that is not a or b. ^ indicates 'non'.
[::] For more information, see the following table.
\ The escape character.
\n n indicates a digit from 1 to 9 and is backward referenced.
\d Digits.
\D Non-digit characters.

Character group

Category Character group Description Valid value
POSIX [[:alnum:]] Letters and digits [a-zA-Z0-9]
[[:alpha:]] Letters [a-zA-Z]
[[:ascii:]] ASCII characters [\x00-\x7F]
[[:blank:]] Spaces and tab characters [ \t]
[[:cntrl:]] Control characters [\x00-\x1F\x7F]
[[:digit:]] Digits [0-9]
[[:graph:]] Characters other than whitespace characters [\x21-\x7E]
[[:lower:]] Lowercase letters [a-z]
[[:print:]] [:graph:] and whitespace characters [\x20-\x7E]
[[:punct:]] Punctuations [][!"#$%&'()*+,./:;<=>? @\^_`{|}~-]
[[:space:]] Whitespace characters [ \t\r\n\v\f]
[[:upper:]] Uppercase letters [A-Z]
[[:xdigit:]] Hexadecimal characters [A-Fa-f0-9]
Chinese N/A Chinese characters and punctuations [^\\x{00}-\\x{ff}]
N/A Chinese characters [\\x{4e00}-\\x{9fa5}]

Escape characters

The system uses a backslash (\) as the escape character. Backslashes (\) in regular expressions must also be escaped. For example, a regular expression is used to match the a+b string. The plus sign (+) is a special character in the expression, and must be escaped. In the regular expression engine, the string is expressed as a\\+b. The system must perform another escape on the expression. Therefore, the expression that can match the string is a\\\+b.

Assume that the test_dual table exists. Example:
select 'a+b' rlike 'a\\\+b' from test_dual;

+------+
| _c1  |
+------+
| true |
+------+
The \ character is a special character in the regular expression engine and must be expressed as \\ in the engine. Then, the system performs another escape on the expression. As a result, the \ character is expressed as \\\.
select 'a\\b', 'a\\b' rlike 'a\\\b' from test_dual;

+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | false |
+-----+------+
Note If a MaxCompute SQL statement includes a\\b, a\b is displayed in the output because MaxCompute escapes the expression.
If a string includes tab characters, the system stores \t as one character when the system reads this expression. Therefore, it is a common character in regular expressions.
select 'a\tb', 'a\tb' rlike 'a\tb' from test_dual;

+---------+------+
| _c0     | _c1  |
+---------+------+
| a     b | true |
+---------+------+