This topic describes how to use the string function REGEXP_EXTRACT in Realtime Compute.

Syntax

VARCHAR REGEXP_EXTRACT(VARCHAR str, VARCHAR pattern, INT index)       

Input parameters

Parameter Data type Description
str VARCHAR The source string.
pattern VARCHAR The regular expression pattern that is used to search for substrings.
index INT The index of the substring that you want to extract from the source string.
Notice The constants in the regular expression must comply with the Java code syntax. Codegen can convert SQL constant strings to Java code. If you want to describe a number \d, write it as '\d' in the regular expression. This is the same way as you write a regular expression in Java.

Description

Extracts the substring that has the specified index from the source string based on the specified regular expression pattern. The index number starts from 1. If any input parameter is null or the regular expression is invalid, null is returned.

Example

  • Test data
    str1 (VARCHAR) pattern1 (VARCHAR) index1 (INT)
    foothebar foo(. *?)( bar) 2
    100-200 (\\d+)-(\\d+) 1
    null foo(. *?)( bar) 2
    foothebar null 2
    foothebar null 2
    foothebar ( 2
  • Test statements
    SELECT  REGEXP_EXTRACT(str1, pattern1, index1) as result
    FROM T1;    
  • Test results
    result (VARCHAR)
    bar
    100
    null
    null
    null
    null