Regular expressions in MaxCompute SQL use the Perl Compatible Regular Expressions (PCRE) standards to match strings by character.

RLIKE

To perform queries, you can use the RLIKE operator together with regular expressions.
MetacharacterDescription
^Matches the beginning of a string.
$Matches the end of a string.
.Matches any single character.
*Matches the preceding character or character pattern zero or more times.
+Matches the preceding character or character pattern one or more times.
?Matches the preceding character or character pattern zero or one time.
?Matches a modifier. If this character follows one of other characters (*, +, ?, {n}, {n,}, or {n,m}), the match pattern is non-greedy. In the non-greedy algorithm, a character string matches as few characters as possible. In the greedy algorithm, a character string matches as many characters as possible. By default, the greedy algorithm is used.
A|BMatches A or B.
(abc)*Matches the abc sequence zero or more times.
{n} or {m,n}The number of matches.
[ab]Matches any character in the brackets. The character can be a or b. Fuzzy match is used.
[a-d]Matches one of the following characters: a, b, c, and d.
[^ab]Matches any character except those in the square brackets.
[::]For more information, see the following table.
\The escape character.
\nn is a digit from 1 to 9 and is backward referenced.
\dDigits.
\DNon-digit characters.

Character group

CategoryCharacter groupDescriptionValid value
POSIX[[:alnum:]]Letters and digits[a-zA-Z0-9]
[[:alpha:]]Letters[a-zA-Z]
[[:ascii:]]ASCII characters[\x00-\x7F]
[[:blank:]]Spaces and tab characters[ \t]
[[:cntrl:]]Control characters[\x00-\x1F\x7F]
[[:digit:]]Digits[0-9]
[[:graph:]]Characters other than whitespace characters[\x21-\x7E]
[[:lower:]]Lowercase letters[a-z]
[[:print:]][:graph:] and whitespace characters[\x20-\x7E]
[[:punct:]]Punctuations[][!”#$%&’()*+,./:;<=>? @\^_`{|}~-]
[[:space:]]Whitespace characters[ \t\r\n\v\f]
[[:upper:]]Uppercase letters[A-Z]
[[:xdigit:]]Hexadecimal characters[A-Fa-f0-9]
ChineseN/ADouble-byte characters such as Chinese characters[^\\x{00}-\\x{ff}]
N/AChinese characters[\\x{4e00}-\\x{9fa5}]
N/AChinese punctuationsChinese punctuations do not have unified Unicode range. You can search for the Unicode of a Chinese punctuation in the search engine and use operators to match Chinese punctuations.
Note If you want to match Chinese single quotation marks (’), you can use unicode0027. The Chinese single quotation mark (’) is expressed as [\\x{0027}].

Escape characters

The system uses a backslash (\) as the escape character. Backslashes (\) in regular expressions must also be escaped. For example, a regular expression is used to match the a+b string. The plus sign (+) is a special character in the expression and must be escaped. In the regular expression engine, the string is expressed as a\\+b. The system must perform another escape in the expression. Therefore, the expression that can match the string is a\\\+b.

The following example shows how to use escape characters in a regular expression.
select 'a+b' rlike 'a\\\+b';

+------+
| _c1  |
+------+
| true |
+------+
The \ character is a special character in the regular expression engine and must be expressed as \\ in the engine. Then, the system performs another escape in the expression. As a result, the \ character is expressed as \\\\.
select 'a\\b', 'a\\b' rlike 'a\\\b';
+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | false |
+-----+------+

select 'a\\b', 'a\\b' rlike 'a\\\\b';
+-----+------+
| _c0 | _c1  |
+-----+------+
| a\b | true |
+-----+------+
Note If a MaxCompute SQL statement contains a\\b, the system returns a\b because MaxCompute escapes the expression.
If a string contains tab characters, the system stores \t as one character when the system reads this expression. In this case, \t is a common character in regular expressions.
select 'a\tb', 'a\tb' rlike 'a\tb';

+---------+------+
| _c0     | _c1  |
+---------+------+
| a     b | true |
+---------+------+