REGEXP_EXTRACT_ALL extracts all substrings from a source string that match a regular expression pattern and returns them as an array.
Syntax
ARRAY<STRING> REGEXP_EXTRACT_ALL(STRING <source>, STRING <pattern>)
ARRAY<STRING> REGEXP_EXTRACT_ALL(STRING <source>, STRING <pattern>, BIGINT <group_id>)Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
source | Yes | STRING | The string to search. |
pattern | Yes | STRING | A STRING constant or a regular expression pattern. Must not be empty. In SQL string literals, use double backslashes to escape special characters — for example, '\\d+' matches one or more digits. For supported syntax, see Regular expression specifications. |
group_id | No | BIGINT | The index of the capturing group to return. Must be 0 or a positive integer. Defaults to 1 when omitted. Set to 0 to match the entire pattern as a single group, which works even when the pattern has no capturing groups. |
Return value
Returns ARRAY<STRING>.
Returns
NULLifsource,pattern, orgroup_idisNULL.Returns an error if
patternis an empty string.Returns an error if
patterncontains no capturing groups andgroup_idis not specified. To avoid this error, either add a capturing group to your pattern or setgroup_idto0. Alternatively, enable BigQuery compatibility mode (see the note below).Returns an error if
group_idis not a BIGINT value or is less than0.
RunSET odps.sql.bigquery.compatible=true;to enable BigQuery compatibility mode. In this mode, ifpatterncontains no capturing groups andgroup_idis not specified, the function returns an array of substrings that match the entire pattern instead of returning an error.
Examples
Return results for the first capturing group (default)
When group_id is omitted, the function returns matches for the first capturing group.
-- Returns ["100","300"].
SELECT REGEXP_EXTRACT_ALL('100-200, 300-400', '(\\d+)-(\\d+)');Return results for a specific capturing group
Set group_id to the index of the capturing group to return.
-- Returns ["200","400"].
SELECT REGEXP_EXTRACT_ALL('100-200, 300-400', '(\\d+)-(\\d+)', 2);Match the entire pattern with group_id = 0
Set group_id to 0 to treat the entire pattern as a single group, which works even when the pattern has no capturing groups.
-- Returns ["100-200","300-200"].
SELECT REGEXP_EXTRACT_ALL('100-200, 300-200', '\\d+-\\d+', 0);Use a pattern with no capturing groups (BigQuery compatibility mode)
Without group_id = 0, a pattern with no capturing groups returns an error by default. Enable BigQuery compatibility mode to return matches for the entire pattern instead.
-- Error (default behavior):
-- FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: SQL Runtime Unretryable Error: ODPS-0121065:Argument value out of range - Regex group count is 0, but the specified group index is 1
SELECT REGEXP_EXTRACT_ALL('100-200, 300-200', '\\d+-\\d+');
-- Enable BigQuery compatibility mode, then retry:
SET odps.sql.bigquery.compatible=true;
-- Returns ["100-200","300-200"].
SELECT REGEXP_EXTRACT_ALL('100-200, 300-200', '\\d+-\\d+');Related functions
REGEXP_EXTRACT_ALL is a string function. For a full list of string functions, see String functions.