PolarDB for PostgreSQL (Compatible with Oracle) supports three pattern matching methods: LIKE, SIMILAR TO, and POSIX regular expressions. Each method offers different levels of power and complexity.
Regular expressions can be crafted to consume arbitrary amounts of time and memory. When accepting patterns from untrusted sources, impose a statement timeout. SIMILAR TO carries the same security risk as POSIX regular expressions. LIKE is the safest option for untrusted pattern sources.
Tip: For pattern matching beyond what these built-in methods offer, write a user-defined function in Perl or Tcl.
Choose a method
| Method | Matches entire string | Wildcards | Regex features | Security risk |
|---|---|---|---|---|
LIKE | Yes | _, % | None | Low |
SIMILAR TO | Yes | _, % | Alternation, quantifiers | Medium |
| POSIX regex | No (unless anchored) | ., .* | Full | Medium |
LIKE
string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]LIKE returns true when the string matches the pattern. NOT LIKE returns true when there is no match.
Two wildcards are available:
_matches any single character.%matches any sequence of zero or more characters.
LIKE always matches against the entire string. To match a substring anywhere in the string, wrap the pattern in %.
Examples:
'abc' LIKE 'abc' -- true
'abc' LIKE 'a%' -- true
'abc' LIKE '_b_' -- true
'abc' LIKE 'c' -- falseEscape characters
The default escape character is the backslash (\). Precede a wildcard with the escape character to match it literally. To match the escape character itself, write it twice.
Use the ESCAPE clause to specify a different escape character:
'50%' LIKE '50!%' ESCAPE '!' -- trueDisable the escape mechanism entirely with ESCAPE ''. This makes it impossible to match a literal underscore or percent sign.
If standard_conforming_strings is off, double all backslashes in literal string constants.
The SQL standard specifies that omitting ESCAPE means there is no escape character (not that it defaults to a backslash), and prohibits zero-length ESCAPE values. PostgreSQL behavior differs slightly from the standard in this regard.
Case-insensitive matching
ILIKE is a case-insensitive variant of LIKE, based on the active locale. It is a PostgreSQL extension not in the SQL standard.
Operator aliases:
| Operator | Equivalent |
|---|---|
~~ | LIKE |
~~* | ILIKE |
!~~ | NOT LIKE |
!~~* | NOT ILIKE |
These operator names appear in EXPLAIN output because the parser translates LIKE and its variants into these operators internally. LIKE, ILIKE, NOT LIKE, and NOT ILIKE are treated as operators in PostgreSQL syntax and can appear in expression operator ANY (subquery) constructs, although an ESCAPE clause cannot be included in that context.
For simple prefix matching, the ^@ operator and the starts_with function are more direct alternatives.
SIMILAR TO regular expressions
string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]SIMILAR TO returns true when the pattern matches the entire string (unlike POSIX regular expressions, which can match anywhere). It combines LIKE notation with a subset of POSIX regex syntax.
SIMILAR TO uses _ and % as wildcards (equivalent to . and .* in POSIX). Note that . is not a metacharacter in SIMILAR TO.
Additional metacharacters borrowed from POSIX:
| Metacharacter | Meaning |
|---|---|
| | Alternation (either of two alternatives) |
* | Previous item zero or more times |
+ | Previous item one or more times |
? | Previous item zero or one time |
{m} | Previous item exactly m times |
{m,} | Previous item m or more times |
{m,n} | Previous item at least m and at most n times |
() | Group items into a single logical item |
[...] | Bracket expression (character class) |
Escape rules are the same as for LIKE: a backslash disables the special meaning of the following metacharacter. Use ESCAPE to set a different escape character, or ESCAPE '' to disable escaping. (PostgreSQL's behavior here is slightly nonstandard, for the same reasons as with LIKE.)
Examples:
'abc' SIMILAR TO 'abc' -- true
'abc' SIMILAR TO 'a' -- false
'abc' SIMILAR TO '%(b|d)%' -- true
'abc' SIMILAR TO '(b|c)%' -- false
'-abc-' SIMILAR TO '%\mabc\M%' -- true
'xabcy' SIMILAR TO '%\mabc\M%' -- falseExtract a substring with SIMILAR TO
The substring function extracts a portion of a string that matches a SQL regular expression pattern. Two calling conventions are supported:
substring(string from pattern for escape-character)
substring(string, pattern, escape-character)The pattern must match the entire string, or the function returns null. To specify which part of the match to return, include exactly two occurrences of the escape character followed by " in the pattern. The text between these separators is extracted.
The pattern is internally split into three independent regular expressions at the separator positions. The first and third sections match as little text as possible (non-greedy). PostgreSQL also allows one separator (the third regex is taken as empty) or no separators (both the first and third are taken as empty), as an extension to the SQL standard.
Example with #" as the separator:
substring('foobar' from '%#"o_b#"%' for '#') -- oob
substring('foobar' from '#"o_b#"%' for '#') -- NULLPOSIX regular expressions
POSIX regular expressions offer the most expressive pattern matching. Unlike LIKE and SIMILAR TO, a POSIX pattern matches anywhere within a string unless explicitly anchored to the beginning (^) or end ($).
Match operators
| Operator | Returns | Description | Example |
|---|---|---|---|
text ~ text | boolean | Matches, case-sensitive | 'thomas' ~ 't.*ma' → t |
text ~* text | boolean | Matches, case-insensitive | 'thomas' ~* 'T.*ma' → t |
text !~ text | boolean | Does not match, case-sensitive | 'thomas' !~ 't.*max' → t |
text !~* text | boolean | Does not match, case-insensitive | 'thomas' !~* 'T.*ma' → f |
Examples:
'abc' ~ 'abc' -- true
'abc' ~ '^a' -- true, ^ anchors to start of string
'abc' ~ '(b|d)' -- true, | means OR
'abc' ~ '^(b|c)' -- falseregexp_replace
regexp_replace replaces substrings that match a POSIX pattern:
regexp_replace(source, pattern, replacement [, flags])In the replacement string, \n (where n is 1-9) inserts the substring matched by the nth parenthesized subexpression. \& inserts the entire matched substring. Use \\ to insert a literal backslash.
Flags: i for case-insensitive matching, g to replace all matches instead of only the first.
regexp_replace('foobarbaz', 'b..', 'X') -- fooXbaz
regexp_replace('foobarbaz', 'b..', 'X', 'g') -- fooXX
regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g') -- fooXarYXazYregexp_match
regexp_match returns a text array of captured substrings from the first match:
regexp_match(string, pattern [, flags])No match: returns
NULL.Match with no parenthesized subexpressions: returns a single-element array containing the whole matched substring.
Match with parenthesized subexpressions: returns an array where each element is the substring matched by the corresponding subexpression (non-capturing parentheses are not counted).
SELECT regexp_match('foobarbequebaz', 'bar.*que');
-- {barbeque}
SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
-- {bar,beque}To get just the matched string (not an array), index the result:
SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
-- barbequeTip:regexp_matchis available in PostgreSQL version 10 and later. For earlier versions, useregexp_matchesin a sub-select to get equivalent behavior: ``sql SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;``
regexp_matches
regexp_matches has the same syntax as regexp_match but returns a set of text arrays—one row per match. Use the g flag to return all matches:
SELECT regexp_matches('foo', 'not there');
-- (0 rows)
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
-- {bar,beque}
-- {bazil,barf}Tip: When you only want the first match,regexp_matchis simpler and more efficient thanregexp_matches.
regexp_split_to_table and regexp_split_to_array
These functions split a string using a POSIX pattern as a delimiter:
regexp_split_to_table(string, pattern [, flags]) -- returns rows
regexp_split_to_array(string, pattern [, flags]) -- returns a text arrayIf there is no match to the pattern, the function returns the original string unchanged. Both functions ignore zero-length matches at the start or end of the string, or immediately following a previous match.
SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
-- the / quick / brown / fox / jumps / over / the / lazy / dog (9 rows)
SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
-- {the,quick,brown,fox,jumps,over,the,lazy,dog}Extract a substring with POSIX regex
substring with two parameters extracts the first portion of the string that matches a POSIX pattern:
substring(string from pattern)Returns null if there is no match. If the pattern contains parentheses, returns the portion matched by the first parenthesized subexpression:
substring('foobar' from 'o.b') -- oob
substring('foobar' from 'o(.)b') -- oRegular expression details
PostgreSQL's regular expression engine was written by Henry Spencer and implements the POSIX 1003.2 specification.
RE forms
PostgreSQL always assumes ARE (advanced regular expression) rules by default. ERE (extended regular expression, similar to egrep) or BRE (basic regular expression, similar to ed) rules can be chosen by prepending an embedded option to the pattern.
Structure
A regular expression consists of one or more branches separated by |. Each branch is a sequence of quantified atoms or constraints. An atom is matched as a unit; a constraint matches an empty string under specific conditions.
Atoms
| Atom | Description |
|---|---|
(re) | Matches re; the match is noted for reporting |
(?:re) | Matches re; the match is not noted (non-capturing); AREs only |
. | Any single character |
[chars] | A bracket expression matching any one of the listed characters |
\k (non-alphanumeric k) | That character literally; e.g., \\ matches a backslash |
\c (alphanumeric c) | An escape; AREs only (in EREs and BREs, matches c literally) |
{ (followed by non-digit) | A literal left brace |
x (any other single character) | That character literally |
An RE cannot end with a backslash.
If standard_conforming_strings is off, double all backslashes in literal string constants.
Quantifiers
| Quantifier | Matches |
|---|---|
* | 0 or more |
+ | 1 or more |
? | 0 or 1 |
{m} | Exactly m |
{m,} | m or more |
{m,n} | m through n inclusive; m cannot exceed n |
*? | Non-greedy * |
+? | Non-greedy + |
?? | Non-greedy ? |
{m}? | Non-greedy {m} |
{m,}? | Non-greedy {m,} |
{m,n}? | Non-greedy {m,n} |
The bound values m and n are unsigned decimal integers from 0 to 255 inclusive. Non-greedy quantifiers are available in AREs only.
A quantifier cannot immediately follow another quantifier, begin an expression or subexpression, or follow ^ or |.
Constraints
| Constraint | Description |
|---|---|
^ | Beginning of string |
$ | End of string |
(?=re) | Positive lookahead: matches where a substring matching re begins; AREs only |
(?!re) | Negative lookahead: matches where no substring matching re begins; AREs only |
(?<=re) | Positive lookbehind: matches where a substring matching re ends; AREs only |
(?<!re) | Negative lookbehind: matches where no substring matching re ends; AREs only |
Lookahead and lookbehind constraints cannot contain back references, and all parentheses within them are non-capturing.
Bracket expressions
A bracket expression [...] matches any single character listed within the brackets. If the list starts with ^, it matches any character not in the list. A range like [0-9] matches all characters between the two endpoints in the collating sequence.
Rules for including special characters:
To include
], make it the first character (after^if present).To include
-, make it the first or last character, or the second endpoint of a range.
Within a bracket expression:
A collating element enclosed in
[.and.]stands for its character sequence, treated as a single element.An equivalence class enclosed in
[=and=]stands for all characters equivalent to that element in the current collation.A character class enclosed in
[:and:]stands for all characters belonging to that POSIX class.
PostgreSQL does not currently support multi-character collating elements.
POSIX standard character classes: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. PostgreSQL adds word (letters, digits, and underscore) and ascii (the 7-bit ASCII set).
The bracket expressions [[:<:]] and [[:>:]] are word-boundary constraints, matching empty strings at the beginning and end of a word respectively.
Regular expression escapes
Escapes start with \ followed by an alphanumeric character. In AREs, an invalid escape is an error. In EREs, \ is not special outside bracket expressions.
Character-entry escapes
| Escape | Description |
|---|---|
\a | Alert (bell) |
\b | Backspace |
\B | Backslash (synonym, reduces need for doubling) |
\cX | Character with low-order 5 bits same as X |
\e | ESC character (octal 033) |
\f | Form feed |
\n | Newline |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\uwxyz | Unicode character 0xwxyz (exactly 4 hex digits) |
\Ustuvwxyz | Unicode character 0xstuvwxyz (exactly 8 hex digits) |
\xhhh | Character with hexadecimal value 0xhhh |
\0 | Null byte |
\xy | Character with octal value 0xy (exactly 2 octal digits, not a back reference) |
\xyz | Character with octal value 0xyz (exactly 3 octal digits, not a back reference) |
Escape values outside the ASCII range (0-127) are encoding-dependent. In UTF-8, they map to Unicode code points (for example, \u1234 means U+1234). An escape value that does not correspond to a valid character in the database encoding never matches any data and raises no error.
Class-shorthand escapes
| Escape | Equivalent |
|---|---|
\d | [[:digit:]] |
\s | [[:space:]] |
\w | [[:word:]] |
\D | [^[:digit:]] |
\S | [^[:space:]] |
\W | [^[:word:]] |
These shorthands also work inside bracket expressions; for example, [a-c\d] is equivalent to [a-c[:digit:]].
Constraint escapes
| Escape | Description |
|---|---|
\A | Beginning of string only |
\m | Beginning of a word |
\M | End of a word |
\y | Beginning or end of a word |
\Y | Not the beginning or end of a word |
\Z | End of string only |
Constraint escapes are not valid inside bracket expressions.
Back references
| Escape | Description |
|---|---|
\m | Back reference to the mth subexpression (nonzero digit) |
\mnn | Back reference to the mnnth subexpression, where mnn is within the valid range |
For example, ([bc])\1 matches bb or cc but not bc or cb.
Octal character-entry escapes and back references can be ambiguous. A leading zero always indicates an octal escape. A single non-zero digit not followed by another digit is always a back reference. A multi-digit sequence not starting with zero is treated as a back reference if the number is within the valid range, and as octal otherwise.
Regular expression metasyntax
Director prefixes
***:— the rest of the RE is an ARE.***=— the rest of the RE is a literal string; all characters are ordinary.
Embedded options
An ARE can start with (?xyz) where xyz is one or more option letters:
| Option | Description |
|---|---|
b | Rest of RE is a BRE |
c | Case-sensitive matching (overrides operator type) |
e | Rest of RE is an ERE |
i | Case-insensitive matching |
m | Synonym for n |
n | Newline-sensitive matching |
p | Partial newline-sensitive matching |
q | Rest of RE is a literal string |
s | Non-newline-sensitive matching (default) |
t | Tight syntax (default) |
w | Inverse partial newline-sensitive ("weird") matching |
x | Expanded syntax |
Embedded options take effect at the closing ) and can only appear at the start of an ARE (after the ***: director, if any).
Expanded syntax
With the x option, whitespace and #-to-newline comments in the RE are ignored, allowing you to break up and annotate complex expressions. Three exceptions: a \-preceded whitespace or # is retained; whitespace or # inside a bracket expression is retained; whitespace and comments cannot appear inside multi-character symbols like (?:.
Comments can also be written as (?#ttt) (deprecated; use expanded syntax instead).
Matching rules
When a pattern can match more than one substring, the RE matches the one starting earliest in the string. When multiple matches start at the same position, the match length depends on whether the RE is greedy or non-greedy.
Greediness rules:
Most atoms and all constraints have no greediness attribute.
Adding parentheses does not change greediness.
A quantified atom with
{m}or{m}?inherits the atom's greediness.A quantified atom with other normal quantifiers is greedy (prefers the longest match).
A quantified atom with non-greedy quantifiers (e.g.,
*?) is non-greedy (prefers the shortest match).A branch has the same greediness as its first quantified atom with a greediness attribute.
An RE with top-level
|is always greedy.
Greediness applies to the whole RE, not just individual atoms. Once the overall match length is determined, subexpression match lengths are resolved based on their own greediness, with earlier subexpressions taking priority.
Example:
SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
-- Result: 123 (Y* is greedy; matches Y123; captures 123)
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
-- Result: 1 (Y*? is non-greedy; matches Y1; captures 1)Use {1,1} to force greediness or {1,1}? to force non-greediness on a subexpression when the whole RE needs a different greediness from what its elements imply:
SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
-- Result: {abc0123,4,xyz} -- first .* eats too much
SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
-- Result: {abc,0,""} -- RE is non-greedy overall; ends too soon
SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
-- Result: {abc,01234,xyz} -- force whole RE to be greedyMatch lengths are measured in characters. An empty string is considered longer than no match at all.
Newline-sensitive matching: with the n option, . and ^-negated bracket expressions do not match newlines, and ^/$ also match at newline boundaries. \A and \Z continue to match only the start and end of the entire string. \D and \W match newlines regardless of this mode (unlike behavior before PostgreSQL 14).
Limits and compatibility
No length limit is imposed on regular expressions in this implementation. For portability, avoid REs longer than 256 bytes (POSIX-compliant implementations may refuse them).
The only true incompatibility between AREs and POSIX EREs is that \ retains its special significance inside bracket expressions in AREs. All other ARE features use syntax that is either illegal or undefined in POSIX EREs.
Incompatibilities introduced in version 7.4:
In AREs,
\followed by an alphanumeric character is either an escape or an error (previously it was just the alphanumeric character).In AREs,
\is special inside[]; a literal backslash inside a bracket expression must be written as\\.
Differences from XQuery (LIKE_REGEX)
SQL:2008 defines a LIKE_REGEX operator based on the XQuery regular expression standard. PostgreSQL does not yet implement LIKE_REGEX. The regexp_match function provides similar behavior, since XQuery regular expressions are close to the ARE syntax described here.
Key differences between POSIX-based and XQuery regular expressions:
XQuery character class subtraction (
[a-z-[aeiou]]) is not supported.XQuery class shorthands
\c,\C,\i,\Iare not supported.XQuery Unicode property classes (
\p{UnicodeProperty},\P{UnicodeProperty}) are not supported.POSIX interprets character classes (like
\w) by the active locale; XQuery uses Unicode properties.XQuery newline handling is broader: it treats
\r,\r\n, and certain Unicode characters (e.g., LINE SEPARATOR U+2028) as newlines in addition to\n. Notably,.and\scount\r\nas one character, not two, according to SQL.XQuery character-entry escapes: only
\n,\r, and\tare supported.XQuery does not support
[::]character class syntax in bracket expressions.XQuery has no lookahead, lookbehind, constraint escapes, or metasyntax forms.
XQuery flag letters differ from POSIX: while the
iandqoptions behave the same, others do not. XQuery'ss(dot matches newline) andm(^ and $ match at newlines) differ from POSIXsandm; XQuery'sx(ignore whitespace) differs from POSIX expanded mode in that POSIX also allows#comments and preserves backslash-escaped whitespace.