String functions and operators - PolarDB - Alibaba Cloud Documentation Center

This topic describes the functions and operators that are used to check and manipulate string values.

Strings in this context include values of the character, character varying, and text data types. Unless otherwise noted, these functions and operators are declared to accept and return values of the text data type. The functions and operators interchangeably accept values of the character varying data type. Before a function or operator is applied, values of the character data type are converted to the text data type. As a result, any trailing spaces in values of the character type are deleted.

SQL defines some string functions that use keywords other than commas (,) to separate parameters. For more information, see the "SQL string functions and operators" section of this topic. PolarDB for PostgreSQL(Compatible with Oracle) also provides versions of these functions that can be invoked by using the regular function invocation syntax. For more information, see the "Other string functions" section of this topic.

Note

Due to the implicit type conversion of values from other data types to the text data type, the functions in the earlier versions of PolarDB for PostgreSQL(Compatible with Oracle) accept values of non-string types. The implicit type conversions are disabled in the current version of PolarDB for PostgreSQL(Compatible with Oracle). This is because these type conversions always cause surprising behaviors. However, the string concatenation operator (||) still accepts input values of non-string types, as long as at least one of the input values is of the string type. For more information, see the "SQL string functions and operators" section of this topic. In other cases, if you want to retain the previous behaviors, you can use an explicit type conversion to convert values to the text data type.

SQL string functions and operators

text || text → text

Concatenates two strings.

'Post' || 'greSQL' → PostgreSQL

text || anynonarray → text

anynonarray || text → text

Converts non-string input to text and concatenates two strings. The non-string input cannot be of the array data type because this causes ambiguity with the array concatenation operator ||. If you want to concatenate an array of text equivalents, explicitly convert the array to the text data type.

'Value: ' || 42 → Value: 42

text IS [NOT] [form] NORMALIZED → boolean

Checks whether a string is in the specified Unicode normalization form. form: optional. The Unicode normalization form. Valid values: NFC (default), NFD, NFKC, and NFKD. This function can be used only if the server uses UTF8 encoding. You can use this function to check whether a string is normalized, which is often faster than normalizing possibly already normalized strings.

U&'\0061\0308bc' IS NFD NORMALIZED → t

bit_length ( text ) → integer

Returns the number of bits in a string, which is eight times the return value of octet_length.

bit_length('jose') → 32

char_length ( text ) → integer

character_length ( text ) → integer

Returns the number of characters in a string.

char_length('josé') → 4

lower ( text ) → text

Converts all letters in a string to lowercase letters based on the locale of the database.

lower('TOM') → tom

normalize ( text [, form ] ) → text

Converts a string to the specified Unicode normalization form. form: optional. The Unicode normalization form. Valid values: NFC (default), NFD, NFKC, and NFKD. This function can be used only if the server uses UTF8 encoding.

normalize(U&'\0061\0308bc', NFC) → U&'\00E4bc'

octet_length ( text ) → integer

Returns the number of bytes in a string.

octet_length('josé') → 5 (if server encoding is UTF8)

octet_length ( character ) → integer

Returns the number of bytes in a string. This version of the function accepts values of the character data type. Therefore, the trailing spaces in the values are not automatically stripped.

octet_length('abc '::character(4)) → 4

overlay ( string text PLACING newsubstring text FROM start integer [ FOR count integer ] ) → text

Replaces the substring from the character at the start position in the string and extends for count characters by using the newsubstring. If count is omitted, the length of the newsubstring is used by default.

overlay('Txxxxas' placing 'hom' from 2 for 4) → Thomas

position ( substring text IN string text ) → integer

Returns the start index of the specified substring in the string. If the substring does not exist, 0 is returned.

position('om' in 'Thomas') → 3

substring ( string text [ FROM start integer ] [ FOR count integer ] ) → text

Extracts the substring from the character at the start position in the string if start is specified, and stops extraction after count characters are extracted if count is specified. Specify at least one of start and count.

substring('Thomas' from 2 for 3) → hom

substring('Thomas' from 3) → omas

substring('Thomas' for 2) → Th

substring ( string text FROM pattern text ) → text

Extracts the first substring that matches a POSIX regular expression.

substring('Thomas' from '...$') → mas

substring ( string text FROM pattern text FOR escape text ) → text

Extracts the first substring that matches an SQL regular expression.

substring('Thomas' from '%#"o_a#"_' for '#') → oma

trim ( [ LEADING | TRAILING | BOTH ] [ characters text ] FROM string text ) → text

Removes the longest string that contains only the characters specified by characters, which is a space by default, from the start, end, or both sides of the string. By default, BOTH is used in this function, which indicates that the removal is performed from both sides of the string.

trim(both 'xyz' from 'yxTomxx') → Tom

trim ( [ LEADING | TRAILING | BOTH ] [ FROM ] string text [, characters text ] ) → text

This is the non-standard syntax of trim().

trim(both from 'yxTomxx', 'xyz') → Tom

upper ( text ) → text

Convert all letters in a string to uppercase letters based on the locale of the database.

upper('tom') → TOM

The following section describes other string functions that are available. Some of the functions are internally used to implement standard SQL string functions that are described in the previous table.

Other string functions

ascii ( text ) → integer

Returns the numeric code of the first character of the parameter. In UTF8 encoding, the Unicode code point of the character is returned. In other multi-byte encodings, the parameter must be an ASCII character.

ascii('x') → 120

btrim ( string text [, characters text ] ) → text

Removes the longest string that contains only the characters specified by characters, which is a space by default, from the start and end of the string.

btrim('xyxtrimyyx', 'xyz') → trim

chr ( integer ) → text

Returns the character with the specified code. In UTF8 encoding, the parameter is parsed as a Unicode code point. In other multi-byte encodings, the parameter must specify an ASCII character. chr(0) is not allowed because the character cannot be stored in the text data type.

chr(65) → A

concat ( val1 "any" [, val2 "any" [, ...] ] ) → text

Concatenates the text representations of all parameters. NULL parameters are ignored.

concat('abcde', 2, NULL, 22) → abcde222

concat_ws ( sep text, val1 "any" [, val2 "any" [, ...] ] ) → text

Concatenate all but the first parameter by using a delimiter. The first parameter specifies the delimiter and cannot be NULL. Other NULL parameters are ignored.

concat_ws(',', 'abcde', 2, NULL, 22) → abcde,2,22

format ( formatstr text [, formatarg "any" [, ...] ] ) → text

Formats parameters based on a format string. This function is similar to the C function sprintf.

format('Hello %s, %1$s', 'World') → Hello World, World

initcap ( text ) → text

Converts the first letter of each word to an uppercase letter and the rest letters to lowercase letters. Words are sequences of alphanumeric characters separated with non-alphanumeric characters.

initcap('hi THOMAS') → Hi Thomas

left ( string text, n integer ) → text

Returns the first n characters in the string, or returns all characters except for the last |n| characters if n is negative.

left('abcde', 2) → ab

length ( text ) → integer

Returns the number of characters in a string.

length('jose') → 4

lpad ( string text, length integer [, fill text ] ) → text

Extends the length of the string to length by prepending characters specified by fill, which is a space by default. If the string is already longer than length, the string is truncated on the right.

lpad('hi', 5, 'xy') → xyxhi

ltrim ( string text [, characters text ] ) → text

Removes the longest string that contains only the characters specified by characters, which is a space by default, from the start of the string.

ltrim('zzzytest', 'xyz') → test

md5 ( text ) → text

Calculates the MD5 hash value of the parameter. The result is returned in the hexadecimal format.

md5('abc') → 900150983cd24fb0d6963f7d28e17f72

parse_ident ( qualified_identifier text [, strict_mode boolean DEFAULT true ] ) → text[]

Splits the string specified by qualified_identifier into an array of identifiers, and removes all quotation marks of individual identifiers. By default, extra characters after the last identifier are considered an error. However, if the value of the second parameter is false, these extra characters are ignored. This behavior works on parsing names of objects such as functions. However, this function does not truncate identifiers of excess length. If you want to truncate identifiers, you can convert the result to name[].

parse_ident('"SomeSchema".someTable') → {SomeSchema,sometable}

pg_client_encoding ( ) → name

Returns the encoding name of the current client.

pg_client_encoding() → UTF8

quote_ident ( text ) → text

Returns a string properly quoted so that it can be used as an identifier in an SQL statement string. Quotation marks are added only if necessary. For example, a string is quoted if it contains non-identifier characters or is to be case folded. Embedded quotation marks are properly doubled.

quote_ident('Foo bar') → "Foo bar"

quote_literal ( text ) → text

Returns a string properly quoted so that it can be used as a string literal in an SQL statement string. Embedded single quotation marks (') and backslashes (/) are properly doubled. Note: quote_literal returns NULL for NULL input. If the parameter may be NULL, we recommend that you use quote_nullable.

quote_literal(E'O\'Reilly') → 'O''Reilly'

quote_literal ( anyelement ) → text

Converts the specified value to text and then quotes it as a string literal. Embedded single quotation marks (') and backslashes (/) are properly doubled.

quote_literal(42.5) → '42.5'

quote_nullable ( text ) → text

Returns a string properly quoted so that it can be used as a string literal in an SQL statement string. If the parameter is NULL, the string literal NULL is returned. Embedded single quotation marks (') and backslashes (/) are properly doubled.

quote_nullable(NULL) → NULL

quote_nullable ( anyelement ) → text

Converts the specified value to text and then quotes it as a string literal. If the parameter is NULL, the string literal NULL is returned. Embedded single quotation marks (') and backslashes (/) are properly doubled.

quote_nullable(42.5) → '42.5'

regexp_match ( string text, pattern text [, flags text ] ) → text[]

Returns the first substring in the string that matches a POSIX regular expression.

regexp_match('foobarbequebaz', '(bar)(beque)') → {bar,beque}

regexp_matches ( string text, pattern text [, flags text ] ) → setof text[]

Returns the substrings in the string that match a POSIX regular expression.

regexp_matches('foobarbequebaz', 'ba.', 'g') →

regexp_replace ( string text, pattern text, replacement text [, flags text ] ) → text

Replaces the substrings that match a POSIX regular expression.

regexp_replace('Thomas', '.[mN]a.', 'M') → ThM

regexp_split_to_array ( string text, pattern text [, flags text ] ) → text[]

Splits the string by using a POSIX regular expression as the delimiter.

regexp_split_to_array('hello world', '\s+') → {hello,world}

regexp_split_to_table ( string text, pattern text [, flags text ] ) → setof text

Splits the string by using a POSIX regular expression as the delimiter.

regexp_split_to_table('hello world', '\s+') →

repeat (string text, number integer) → text

Repeats the string for the specified number of times.

repeat('Pg', 4) → PgPgPgPg

replace ( string text, from text, to text ) → text

Replaces all occurrences of the from substring in the string with the to substring.

replace('abcdefabcdef', 'cd', 'XX') → abXXefabXXef

reverse ( text ) → text

Reverses the order of the characters in a string.

reverse('abcde') → edcba

right ( string text, n integer ) ) → text

Returns the last n characters in a string, or returns all characters except for the first |n| characters if n is negative.

right('abcde', 2) → de

rpad ( string text, length integer [, fill text ] ) ) → text

Extends the length of the string to length by appending characters specified by fill, which is a space by default. If the string is already longer than length, the string is truncated.

rpad('hi', 5, 'xy') → hixyx

rtrim ( string text [, characters text ] ) → text

Removes the longest string that contains only the characters specified by characters, which is a space by default, from the end of the string.

rtrim('testxxzx', 'xyz') → test

split_part ( string text, delimiter text, n integer ) → text

Splits the string when the delimiter appears, and returns the n th field, which is counted from one.

split_part('abc~@~def~@~ghi', '~@~', 2) → def

strpos ( string text, substring text ) → integer

Returns the start index of the specified substring in the string. If the substring does not exist, 0 is returned. This function is the same as the function position(substring in string , but the parameter order is in reverse.

strpos('high', 'ig') → 2

substr ( string text, start integer [, count integer ] ) → text

Extracts the substring from the character at the start position in the string and extends for count characters, if count is specified. This function is the same as the function substring( string from start for count).

substr('alphabet', 3) → phabet

substr('alphabet', 3, 2) → ph

starts_with ( string text, prefix text ) → boolean

Returns true if the string starts with the prefix.

starts_with('alphabet', 'alph') → t

to_ascii ( string text ) → text

to_ascii ( string text, encoding name ) → text

to_ascii ( string text, encoding integer ) → text

Converts the string to ASCII characters from another encoding, which can be identified by name or number. If encoding is not specified, the database encoding is used. This is the only useful case in practices. The conversion mainly consists of dropping accents. Only conversions from LATIN1, LATIN2, LATIN9, and WIN1250 to ASCII are supported. For information about more flexible solutions, see the unaccent module.

to_ascii('Karél') → Karel

to_hex ( integer ) → text

to_hex ( bigint ) → text

Converts a number to a string in the hexadecimal format.

to_hex(2147483647) → 7fffffff

translate ( string text, from text, to text ) → text

Replaces each character in the string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, the excess characters in from are deleted.

translate('12345', '143', 'ax') → a2x5

The concat, concat_ws, and format functions are mutable. Therefore, you can pass the values to be concatenated or formatted as an array by using the VARIADIC keyword. The elements in the array are considered separate parameters of the function. If the array is NULL, concat and concat_ws return NULL, but format considers NULL as an array with zero elements.

format

The format function produces formatted output based on a format string. This function works in a way similar to the C function sprintf.

    format(formatstr text [, formatarg "any" [, ...] ])

formatstr is a format string that specifies how the result is formatted. The text in the format string is directly copied into the result, except the place in which a format specifier is used. A format specifier acts as a placeholder in the format string. A format specifier defines how subsequent function parameters are formatted and inserted into the result. Each formatarg parameter is converted to text based on the regular output rules of the data type of the parameter value, and then formatted and inserted into the result string based on the format specifier.

A format specifier starts with a percent sign (%) and is in the following format:

    %[position][flags][width]type

Description of the fields in a format specifier:

position: optional. The string in the n''$ format, in which n is the index of the parameter to be displayed. Index 1 indicates the first parameter after formatstr. If position is not specified, the next parameter in the sequence is used by default.

flags: optional. The additional option that controls how the output of the format specifier is formatted. Only the minus sign (-) is supported. The - flag specifies that the output of the format specifier is left-justified. This field does not take effect unless the width field is also specified.

width: optional. The minimum number of characters used to display the output of the format specifier. The output is padded with spaces on the left or right based on whether the - flag is specified. A small width does not cause the output to be truncated, but is simply ignored. The width can be specified by using one of the following formats: a positive integer, an asterisk (*) indicating that the next function parameter is used as the width, and a string in the *n$ format indicating that the nth function parameter is used as the width.

If the width comes from a function parameter, the parameter is consumed before it is used by the value of the format specifier. If the width is negative, the result is left-justified by using a width of abs(width). This is the same as the effect when the - flag is specified.

type: required. The type of format conversion that is used to generate the output of the format specifier. Valid values:

s: formats the parameter value as a simple string. A NULL value is considered an empty string.
I: considers the parameter value as an SQL identifier and adds double quotation marks (") to the SQL identifier if necessary. If the parameter value is NULL, an error is returned, which is the same as the return of the quote_ident function.
L: quotes the parameter value as an SQL literal. A NULL value will be displayed as an unquoted string NULL, which is the same as the return of the quote_nullable function.

In addition to the preceding format specifiers, you can use the special sequence %% to display a literal % character.

The following examples show basic format conversions:

    SELECT format('Hello %s', 'World');
    Result: Hello World

    SELECT format('Testing %s, %s, %s, %%', 'one', 'two', 'three');
    Result: Testing one, two, three, %

    SELECT format('INSERT INTO %I VALUES(%L)', 'Foo bar', E'O\'Reilly');
    Result: INSERT INTO "Foo bar" VALUES('O''Reilly')

    SELECT format('INSERT INTO %I VALUES(%L)', 'locations', 'C:\Program Files');
    Result: INSERT INTO locations VALUES(E'C:\\Program Files')

The following examples show how the width field and the - flag are used:

    SELECT format('|%10s|', 'foo');
    Result: |       foo|

    SELECT format('|%-10s|', 'foo');
    Result: |foo       |

    SELECT format('|%*s|', 10, 'foo');
    Result: |       foo|

    SELECT format('|%*s|', -10, 'foo');
    Result: |foo       |

    SELECT format('|%-*s|', 10, 'foo');
    Result: |foo       |

    SELECT format('|%-*s|', -10, 'foo');
    Result: |foo       |

The following examples show how the position field is used:

    SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
    Result: Testing three, two, one

    SELECT format('|%*2$s|', 'foo', 10, 'bar');
    Result: |       bar|

    SELECT format('|%1$*2$s|', 'foo', 10, 'bar');
    Result: |       foo|

Different from the standard C function sprintf, the format function allows format specifiers with or without the position fields to be mixed in the same format string. A format specifier without a position field always uses the next parameter after the last consumed parameter. In addition, the format function does not require that all function parameters be used in the format string. Example:

    SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    Result: Testing three, two, one

The %I and %L format specifiers are especially useful to safely construct dynamic SQL statements.

NLS_UPPER

Description

This function converts all letters in a string to uppercase letters.

Syntax

NLS_UPPER(char [, 'nlsparam' ])

Parameters

Parameter	Description
char	The string to be converted. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.

Return value type

The return value is of the same data type as the value of the char parameter.

Example

select NLS_UPPER('aaa BBB CcC') FROM DUAL;
  nls_upper  
-------------
 AAA BBB CCC

NLSSORT

Description

This function returns a string that is sorted.

Syntax

NLSSORT(char [, 'nlsparam' ])

Parameters

Parameter	Description
char	The string to be sorted. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.
nlsparam	Optional. The method in which the string is sorted.

Return value type

The return value is of the same data type as the value of the char parameter.

Example

CREATE TABLE test (name VARCHAR2(15));
INSERT INTO test VALUES ('ccc');
INSERT INTO test VALUES ('bbb');
INSERT INTO test VALUES ('aaa');
SELECT * FROM test ORDER BY name;
   name    
-----------
 Gaardiner
 Gaasten
 Gaberd

SELECT * FROM test ORDER BY NLSSORT(name, 'NLS_SORT = Danish');
   name    
-----------
 Gaberd
 Gaardiner
 Gaasten

REGEXP_INSTR

Description

This function returns the position of the substring that matches a regular expression in a string.

Syntax

REGEXP_INSTR (source_char, pattern[, position[, occurrence[, return_opt[, match_param[, subexpr]]]]])

Parameters

Parameter	Description
source_char	The string to be searched. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.
pattern	The regular expression. In most cases, the value is a text literal, and the following character types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, and `NVARCHAR2`.
position	Optional. The positive integer that indicates the position from which the system starts to search for the substring that matches the regular expression. Default value: 1, which indicates that the search starts from the first character of the string specified by `source_char`.
occurrence	Optional. The positive integer that indicates the occurrence of the substring that matches the regular expression in the string specified by `source_char`. Default value: `1`. If the value of `occurrence` is greater than `1`, the system starts to search for the second occurrence from the first character after the first occurrence, and so forth.
return_opt	Optional. The option that specifies how the position of the occurrence is returned. Default value: `0`. If you set this parameter to `0`, the position of the first character in the occurrence is returned. If you set this parameter to `1`, the position of the first character after the occurrence is returned.
match_param	Optional. The default matching behavior of the function. The value is a character expression of the `VARCHAR2` or `CHAR` data type. `i` specifies case-insensitive matching. `c` specifies case-sensitive matching. `n` specifies that periods (.) can match line feeds. `m` specifies the multi-line mode. `x` specifies that space characters are ignored. By default, space characters match themselves.
subexpr	Optional. The subexpression that you want to use in all the subexpressions that are specified by `pattern`. The value is a non-negative integer that ranges from `0` to `9`. Default value: `0`, which indicates the first subexpression.

Return value type

The return value is of the NUMBER data type.

Example

SELECT regexp_instr('abcdefghi', 'd.f') FROM DUAL;
 regexp_instr 
--------------
            4

INSTR

Description

This function returns the position of a substring in a string. A multi-byte character, such as a full-width character, is counted as one character.

Syntax

INSTR(char1,char2[,i[,j]])

Parameters

Parameter	Description
char1	The string to be searched. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.
char2	The substring that you want to find. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.
i	The position of the character from which the search starts. Default value: `1`. If the value is less than `0`, the search starts from the opposite direction, but the function still returns the left-to-right position of the found substring.
j	Optional. The integer that indicates the occurrence of the substring that you want to find. Default value: `1`.

Return value type

The return value is of the NUMBER data type.

Example

SELECT INSTR('PolarDB-PG','P',1,2) "INSTR" FROM DUAL;
 INSTR 
-------
     9

TRIM

Description

This function removes the specified leading or trailing characters or both from a string.

Note

The values of trim_character and trim_source can be of the VARCHAR2 data type or of the data types that can be implicitly converted to the VARCHAR2 data type. If the string that is returned by the function is of the VARCHAR2 data type, the maximum length of the string is the same as that of the string specified by trim_source.

Syntax

TRIM([{{ LEADING | TRAILING | BOTH }[ trim_character ]|trim_character }FROM] trim_source)

Parameters

Parameter	Description
LEADING \| TRAILING \| BOTH	The position of the characters to be removed. A value of `LEADING` specifies the leading characters of the string. If you specify `LEADING` for this parameter, any leading characters that are the same as the character specified by `trim_character` are removed. A value of `TRAILING` specifies the trailing characters of the string. If you specify `TRAILING` for this parameter, any trailing characters that are the same as the character specified by `trim_character` are removed. A value of `BOTH` specifies both the leading and trailing characters of the string. If you specify `BOTH` for this parameter or do not specify this parameter, any leading and trailing characters that are the same as the character specified by `trim_character` are removed.
trim_character	The character to be removed. You can specify only a single character. If `trim_character` is not specified, a space is used by default.
trim_source	The string that you want to trim. If you specify only `trim_source`, the leading and trailing spaces are removed.

Return value type

If trim_source is set to values of the CHAR or VARCHAR2 data type, the function returns values of the VARCHAR2 data type.
If trim_source is set to values of the NCHAR or NVARCHAR2 data type, the function returns values of the NVARCHAR2 data type.
If trim_source is set to values of the CLOB data type, the function returns values of the CLOB data type.
If trim_source or trim_character is set to NULL, this function returns NULL.

Example

SELECT trim(both 'x' from 'x!@#$x') FROM DUAL;
 btrim 
-------
 !@#$

TREAT

Description

This function changes the declared type of an expression. You must have the EXECUTE object permission on the new type to use this function.

Note

The new type must be a supertype or subtype of the declared type of the expression. If the most specific type of the expression is the new type or a subtype of the new type, TREAT returns the expression. If the most specific type of the expression is not the new type or a subtype of the new type, TREAT returns NULL.
REF can be specified only if the declared type of the expression is a REF type.
If the declared type of the expression is a REF to the source type of the expression, the new type must be a subtype or supertype of the source type of the expression. If the most specific type of DEREF(expr) is the new type or a subtype of the new type, TREAT returns the expression. If the most specific type of DEREF(expr) is not the new type or a subtype of the new type, TREAT returns NULL.
This function does not directly support values of the CLOB data type. However, values of the CLOB data type can be passed in as parameters by using implicit type conversion.

Syntax

TREAT( expr AS [ REF ] [ schema. ]type)

Parameters

Parameter	Description
expr	The expression whose declared type you want to change. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, and `NVARCHAR2`.
REF	Optional. The REF. You can specify this parameter only if the declared type of the expression is a REF type.
schema	Optional. The schema of the new type.
type	The new type.

Return value type

The return value is of the same data type as the value of the expr parameter.

Example

SELECT name, TREAT(VALUE(p) AS employee_t).salary salary FROM persons p;
NAME                          SALARY
------------------------- ----------
Bob
Joe                           100000
Tim                             1000

SOUNDEX

Description

This function is used to return a string that contains the phonetic representation of the specified string. You can use this function to compare words in English that are spelled differently but sound alike.

Syntax

SOUNDEX(char)

Parameters

Parameter	Description
char	The string for which you want to obtain the phonetic representation. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, and `NVARCHAR2`.

Return value type

The return value is of the same data type as the value of the char parameter.

Example

create table test(c varchar(100));
insert into test values('Smith');

select c from test where SOUNDEX(c) = SOUNDEX('SMYTHE');
   c   
-------
 Smith

REGEEXP_SUBSTR

Description

This function returns the substring that matches a regular expression in a string, which extends the functionality of the SUBSTR function. This function works in a way similar to the REGEXP_INSTR function, but this function returns the substring instead of the position of the substring. This function is useful if you need to only obtain the content of a substring, but do not need to return its position in the source string.

Syntax

REGEXP_SUBSTR(source_char, pattern
               [, position [, occurrence [, match_param [, subexpr] ] ] ]
              )

Parameters

Parameter	Description
source_char	The string to be searched. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.
pattern	The regular expression. In most cases, the value is a text literal, and the following data types are supported: `CHAR`,`VARCHAR2`,`NCHAR`, and `NVARCHAR2`.
position	Optional. The positive integer that indicates the position from which the system starts to search for the substring that matches the regular expression. Default value: 1, which indicates that the search starts from the first character of the string specified by `source_char`.
occurrence	Optional. The positive integer that indicates the occurrence of the substring that matches the regular expression in the string specified by `source_char`. Default value: 1. If the value of `occurrence` is greater than 1, the system starts to search for the second occurrence from the first character after the first occurrence, and so forth.
match_param	Optional. The default matching behavior of the function. The value is a character expression of the `VARCHAR2` or `CHAR` data type. `i` specifies case-insensitive matching. `c` specifies case-sensitive matching. `n` specifies that periods (`.`) can match line feeds. `m` specifies the multi-line mode. `x` specifies that space characters are ignored. By default, space characters match themselves.
subexpr	Optional. The subexpression that you want to use in all the subexpressions that are specified by `pattern`. The value is a non-negative integer that ranges from 0 to 9. Default value: 0, which indicates the first subexpression.

Return value type

The return value is of the VARCHAR2 or CLOB data type in the same character set as source_char.

Example

SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA',',[^,]+,') "REGEXPR_SUBSTR" FROM DUAL;
  REGEXPR_SUBSTR   
-------------------
 , Redwood Shores,

SELECT REGEXP_SUBSTR('http://www.oracle.com/products','http://([[:alnum:]]+\.?){3,4}/?') "REGEXP_SUBSTR" FROM DUAL;
     REGEXP_SUBSTR      
------------------------
 http://www.oracle.com/

NLS_LOWER

Description

This function converts all letters in a string to lowercase letters.

Syntax

NLS_LOWER(char [, 'nlsparam' ])

Parameters

Parameter	Description
char	The string to be converted. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, `NVARCHAR2`, and `CLOB`.

Return value type

The return value is of the same data type as the value of the char parameter.

Example

select NLS_LOWER('aaa BBB CcC') FROM DUAL;
  nls_lower  
-------------
 aaa bbb ccc

NLS_INTCAP

Description

This function returns a string with the first letter of each word in uppercase and the rest letters in lowercase. Words are separated with spaces or non-alphanumeric characters.

Syntax

NLS_INITCAP(c)

Parameters

Parameter	Description
c	The string that you want to convert. The following data types are supported: `CHAR`, `VARCHAR2`, `NCHAR`, and `NVARCHAR2`.

Return value type

The return value is of the same data type as the value of the c parameter.

Example

select INITCAP('aaa BBB CcC') FROM DUAL;
   initcap   
-------------
 Aaa Bbb Ccc