Regular Expression

Last Updated: May 09, 2018

The regular expressions in MaxCompute SQL use the PCRE standard, matched by characters. The metacharacter to be supported is as follows:

  1. Metacharacter Description
  2. ^ Top of line (TOL)
  3. $ End of line
  4. . Any character
  5. * Matches for zero or multiple times
  6. + Matches for once or multiple times
  7. ? Matches for zero time or once
  8. ? Matches modifier. When this character follows any other constraints (*, +,? {n}, {n, {n, m},}, the match mode is non greedy. Non greedy mode matches strings as little as possible, while the default greedy mode matches strings as more as possible.
  9. A|B A or B
  10. (abc)* Matches abc for zero or multiple times
  11. {n} or {m,n} Matching times
  12. [ab] Matches any character in the brackets. In the example, it is to match a or b.
  13. [a-d] Matches any character in a, b, c, and d.
  14. [^ab] ^ indicats non’, to match any character which is not a and b.
  15. [::] See POSIX character group in next table.
  16. \ Escape character
  17. \n N is a digit from 1 to 9 and is backward referenced.
  18. \d digits
  19. \D Non-number

POSIX character group:

  1. POSIX Character Group Description Range
  2. [[:alnum:]] letter and digit characters [a-zA-Z0-9]
  3. [[:alpha:]] letter [a-zA-Z]
  4. [[:ascii:]] ASCII character [\x00-\x7F]
  5. [[:blank:]] Space character and tabs [ \t]
  6. [[:cntrl:]] Control character [\x00-\x1F\x7F]
  7. [[:digit:]] Digit character [0-9]
  8. [[:graph:]] Characters except white space characters [\x21-\x7E]
  9. [[:lower:]] Lowercase characters [a-z]
  10. [[:print:]] [:graph:] and white space characters [\x20-\x7E]
  11. [[:punct:]] punctuation [][!”#$%&’()*+,./:;<=>? @\^_`{|}~-]
  12. [[:space:]] White space characters [ \t\r\n\v\f]
  13. [[:upper:]] Uppercase characters [A-Z]
  14. [[:xdigit:]] hexadecimal character [A-Fa-f0-9]

Because the system uses a backslash () as an escape character, all “\” which appear in the regular expression pattern perform two escapes. For example, the regular expression needs to match the string “a+b”. The “+” is a special character in regular expressions and must be expressed by escape. The expression in a regular engine is “a\+b”, because the system needs to explain a layer of escape, the expression which can match this string is “a\\+b”. Suppose that the table test_dual is:

  1. select 'a+b' rlike 'a\\\+b' from test_dual;
  2. +------+
  3. | _c1 |
  4. +------+
  5. | true |
  6. +------+

In extreme cases, to match the character “ \”, because “ \” is a special character in a regular engine, it needs to be expressed by “\”, while the system does an escape for it again, it is written as “\\”.

  1. select 'a\\b', 'a\\b' rlike 'a\\\\b' from test_dual;
  2. +-----+------+
  3. | _c0 | _c1 |
  4. +-----+------+
  5. | a\b | true |
  6. +-----+------+

Note:

To write ”a\\b” in MaxCompute SQL, and the output result is ’a\b’.

If TAB exists in a string, when the system reads these two characters ‘ \t’, they are already saved as one character by the system. Therefore, in regular expression, it is a general character.

  1. select 'a\tb', 'a\tb' rlike 'a\tb' from test_dual;
  2. +---------+------+
  3. | _c0 | _c1 |
  4. +---------+------+
  5. | a b | true |
  6. +---------+------+
Thank you! We've received your feedback.