Regular expressions in MaxCompute SQL use the Perl Compatible Regular Expressions (PCRE) standards to match strings by character.

RLIKE

To perform queries, you can use the RLIKE operator together with regular expressions.
Metacharacter Description
^ Matches the beginning of a string.
$ Matches the end of a string.
. Matches any single character.
* Matches zero or more instances of the preceding character or character pattern.
+ Matches one or more instances of the preceding character or character pattern.
? Matches zero or one instance of the preceding character or character pattern.
? Matches a modifier. If this character follows one of other characters (*, +, ?, {n}, {n,}, or {n,m}), the match pattern is non-greedy. In the non-greedy algorithm, a character string matches as few characters as possible. In the greedy algorithm, a character string matches as many characters as possible. By default, the greedy algorithm is used.
A|B Matches A or B.
(abc)* Matches zero or more instances of the abc sequence.
{n} or {m,n} The number of matches.
[ab] Matches any character in the brackets. The character can be a or b. Fuzzy match is used.
[a-d] Matches one of the following characters: a, b, c, and d.
[^ab] Matches any character except those in the square brackets.
[::] For more information, see the following table.
\ The escape character.
\n n is a digit from 1 to 9 and is backward referenced.
\d Digits.
\D Non-digit characters.

Character group

Category Character group Description Valid value
POSIX [[:alnum:]] Letters and digits [a-zA-Z0-9]
[[:alpha:]] Letters [a-zA-Z]
[[:ascii:]] ASCII characters [\x00-\x7F]
[[:blank:]] Spaces and tab characters [ \t]
[[:cntrl:]] Control characters [\x00-\x1F\x7F]
[[:digit:]] Digits [0-9]
[[:graph:]] Characters other than whitespace characters [\x21-\x7E]
[[:lower:]] Lowercase letters [a-z]
[[:print:]] [:graph:] and whitespace characters [\x20-\x7E]
[[:punct:]] Punctuations [][!"#$%&'()*+,./:;<=>? @\^_`{|}~-]
[[:space:]] Whitespace characters [ \t\r\n\v\f]
[[:upper:]] Uppercase letters [A-Z]
[[:xdigit:]] Hexadecimal characters [A-Fa-f0-9]
Chinese N/A Double-byte characters such as Chinese characters. [^\\x{00}-\\x{ff}]
N/A Chinese characters [\\x{4e00}-\\x{9fa5}]
N/A Chinese punctuations Chinese punctuations do not have unified Unicode range. You can search for the Unicode of a Chinese punctuation in the search engine and use operators to match Chinese punctuations.

Escape characters

The system uses a backslash (\) as the escape character. Backslashes (\) in regular expressions must also be escaped. For example, a regular expression is used to match the a+b string. The plus sign (+) is a special character in the expression and must be escaped. In the regular expression engine, the string is expressed as a\\+b. The system must perform another escape in the expression. Therefore, the expression that can match the string is a\\\+b.

The following example shows how to use escape characters in a regular expression. The test_dual table is used.
select 'a+b' rlike 'a\\\+b';

+------+
| _c1  |
+------+
| true |
+------+
The \ character is a special character in the regular expression engine and must be expressed as \\ in the engine. Then, the system performs another escape in the expression. As a result, the \ character is expressed as \\\\.
select 'a\\b', 'a\\b' rlike 'a\\\b';
+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | false|
+-----+------+

select 'a\\b', 'a\\b' rlike 'a\\\\b';
+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | true |
+-----+------+
Note If a MaxCompute SQL statement contains a\\b, the system returns a\b because MaxCompute escapes the expression.
If a string contains tab characters, the system stores \t as one character when the system reads this expression. In this case, \t is a common character in regular expressions.
select 'a\tb', 'a\tb' rlike 'a\tb';

+---------+------+
| _c0     | _c1  |
+---------+------+
| a     b | true |
+---------+------+