AnalyticDB for MySQL supports the following regular expression functions for pattern matching, extraction, and replacement in SQL queries.
REGEXP_INSTR — returns the position of a match
REGEXP_MATCHES — returns an array of all matched substrings
REGEXP_REPLACE — replaces matched substrings
REGEXP_SUBSTR — returns a matched substring
Prerequisites
Before you begin, ensure that:
The minor engine version of the AnalyticDB for MySQL cluster is 3.1.5.10 or later
To check the minor engine version, see How can I view the version of an AnalyticDB for MySQL cluster?
REGEXP_INSTR
regexp_instr(source, pattern[, position[, occurrence[, option]]])Returns an integer indicating the start or end position of the first (or Nth) substring in source that matches pattern. Returns 0 if no match is found.
Parameters
Required:
| Parameter | Type | Description |
|---|---|---|
source | VARCHAR | The string to search. |
pattern | — | The regular expression to match against. |
Optional:
| Parameter | Type | Default | Description |
|---|---|---|---|
position | BIGINT | 1 | The character position in source where matching starts. |
occurrence | BIGINT | 1 | Which occurrence of the match to return. |
option | BIGINT | 0 | Controls whether to return the start position of the match (0) or the position immediately after the end of the match (1). |
Return value
Returns BIGINT. Returns 0 if no match is found.
Examples
Return the start position of the first match
SELECT REGEXP_INSTR('dog cat dog', 'dog') as res;+-----+
| res |
+-----+
| 1 |
+-----+Return the start position of the second match
SELECT REGEXP_INSTR('dog cat dog', 'dog', 1, 2) as res;+-----+
| res |
+-----+
| 9 |
+-----+Return the position after the end of the first match
SELECT REGEXP_INSTR('dog cat dog', 'dog', 1, 1, 1) as res;+-----+
| res |
+-----+
| 4 |
+-----+REGEXP_MATCHES
regexp_matches(source, pattern[, flag])Returns an ARRAY(ARRAY(VARCHAR)) of all substrings in source that match pattern. Returns an empty array if no match is found.
Without the
gflag: returns only the first match.With the
gflag: returns all matches.If
patterncontains capture groups, the matched substrings of each group are returned as nested arrays. Otherwise, the full match is returned.
To get a single matched string rather than an array, use REGEXP_SUBSTR instead.
Parameters
Required:
| Parameter | Type | Description |
|---|---|---|
source | VARCHAR | The string to search. |
pattern | — | The regular expression to match against. |
Optional:
| Parameter | Type | Description |
|---|---|---|
flag | VARCHAR | One or more characters that control matching behavior. Use g to return all matches instead of just the first. |
Return value
Returns ARRAY(ARRAY(VARCHAR)). Returns an empty array if no match is found.
Examples
Match with capture groups (first match only)
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)');+---------------------+
| regexp_matches |
+---------------------+
| [["bar","beque"]] |Match without capture groups
SELECT regexp_matches('foobarbequebaz', 'barbeque');+---------------------+
| regexp_matches |
+---------------------+
| [["barbeque"]] |Match all occurrences using the `g` flag
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');+------------------------------------------+
| regexp_matches |
+------------------------------------------+
| [["bar","beque"], ["bazil","barf"]] |REGEXP_REPLACE
regexp_replace(source, pattern, replacement[, position[, occurrence]])Replaces substrings in source that match pattern with replacement. By default, replaces all matches starting from the first character. Returns the original string if no match is found.
Parameters
Required:
| Parameter | Type | Description |
|---|---|---|
source | VARCHAR | The string to search. |
pattern | — | The regular expression to match against. |
replacement | VARCHAR | The string to substitute for each match. |
Optional:
| Parameter | Type | Default | Description |
|---|---|---|---|
position | BIGINT | 1 | The character position in source where matching starts. |
occurrence | BIGINT | 0 | Which occurrence to replace. 0 replaces all matches; a positive integer replaces only that occurrence. |
Return value
Returns VARCHAR. Returns the original string if no match is found.
Examples
Replace all matches
SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X') as res;+-------+
| res |
+-------+
| X X X |
+-------+Replace only the third match
SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3) as res;+-----------+
| res |
+-----------+
| abc def X |
+-----------+REGEXP_SUBSTR
regexp_substr(source, pattern[, position[, occurrence]])Returns the substring in source that matches pattern. Returns NULL if no match is found.
Parameters
Required:
| Parameter | Type | Description |
|---|---|---|
source | VARCHAR | The string to search. |
pattern | — | The regular expression to match against. |
Optional:
| Parameter | Type | Default | Description |
|---|---|---|---|
position | BIGINT | 1 | The character position in source where matching starts. |
occurrence | BIGINT | 1 | Which occurrence of the match to return. |
Return value
Returns VARCHAR. Returns NULL if no match is found.
Examples
Return the third match
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3) as res;+------+
| res |
+------+
| ghi |
+------+Return the first match (default behavior)
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+') as res;+------+
| res |
+------+
| abc |
+------+