This topic describes the string functions provided by MaxCompute, which can be used in SQL statements.
CHAR_MATCHCOUNT
- Syntax
BIGINT CHAR_MATCHCOUNT(STRING str1, STRING str2)
- Description
This function returns the number of characters that belong to str1 and appear in str2.
- Parameters
str1 and str2: the valid UTF-8 strings. The function returns a negative value if invalid characters are found during the comparison of the two strings.
- Return value
Returns a value of the BIGINT type. NULL is returned if an input parameter is NULL.
- Examples
CHAR_MATCHCOUNT('abd','aabc') = 2 -- The a and b characters in str1 appear in str2.
CHR
- Syntax
STRING CHR(BIGINT ascii)
- Description
This function converts a specified ASCII code to the corresponding character.
- Parameters
ascii: a value of the BIGINT type, which indicates the ASCII code. Value range: 0 to 255. An exception is thrown if the input value is out of this range. The input value is implicitly converted to a value of the BIGINT type before computing if it is of the STRING, DOUBLE, or DECIMAL type. If the input value is of another data type, an exception is thrown.
- Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
CONCAT
- Syntax
STRING CONCAT(STRING a, STRING b...)
- Description
This function concatenates all specified strings and returns the final string.
- Parameters
The values of all parameters are of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.
- Return value
Returns a value of the STRING type. NULL is returned if no input parameters are present or an input parameter is NULL.
- Examples
CONCAT('ab','c') = 'abc' CONCAT() = NULL CONCAT('a', null, 'b') = NULL
GET_JSON_OBJECT
- Syntax
STRING GET_JSON_OBJECT(STRING json,STRING path)
- Description
This function extracts the specified string from a standard JSON string based on
path
. The original data is read each time this function is called. Therefore, repeated calls may waste system resources and increase your cost. To avoid repeated calls, you can use theGET_JSON_OBJECT
function together with User-Defined Table-Generating Functions (UDTFs). For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs. - Parameters
json
: the string in the standardJSON
format.path
: the string that starts with the dollar sign ($
), which describes the path of a JSON object in the JSON string.For more information about
path
, see LanguageManual UDF.$
: indicates the root node..
: indicates a child node.[]
:[number]
indicates the array subscript. The array is represented in the format ofkey[sub1][sub2][sub3]……
.*
: indicates the wildcard for[]
, which returns the entire array. An asterisk (*
) cannot be escaped.
- Return value
- Returns
NULL
ifjson
is empty or invalid. - Returns the corresponding string if
json
is valid and the specifiedpath
exists. - A key cannot appear in an object twice. For example,
{a:1, a:0}
is not supported. Otherwise, the command may fail to be parsed. - Emoji expressions are not supported.
- Returns
- Example 1
+----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }
You can execute the following statements to extract data from the precedingJSON
string.odps> SELECT GET_JSON_OBJECT(src_json.json, '$.owner') FROM src_json; amy odps> SELECT GET_JSON_OBJECT(src_json.json, '$.store.fruit\[0]') FROM src_json; {"weight":8,"type":"apple"} odps> SELECT GET_JSON_OBJECT(src_json.json, '$.non_exist_key') FROM src_json; NULL
Example 2GET_JSON_OBJECT('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]')= "2222" GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]') = "["h0","h1","h2"]" GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]') = "h1"
INSTR
- Syntax
BIGINT INSTR(STRING str1, STRING str2[, BIGINT start_position[, BIGINT nth_appearance]])
- Description
This function returns the position of substring
str2
in stringstr1
. - Parameters
str1
: a value of the STRING type, which indicates the string to be searched. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.str2
: a value of the STRING type, which indicates the substring to be searched for. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.start_position
: a value of the BIGINT type. If it is of another data type, an exception is thrown. It indicates the character instr1
from which the search will start. The start position defaults to the first character, marked as 1.nth_appearance
: a value of the BIGINT type, which must be greater than 0. It indicates the nth time that the substring appears in the string, where n is specified bynth_appearance
. An exception is thrown if the value ofnth_appearance
is of another data type or is no greater than 0.
- Return value
- Returns a value of the BIGINT type.
- Returns 0 if
str2
is not found instr1
. - Returns NULL if an input parameter is NULL.
- The matching always succeeds if
str2
is empty. For example, 1 is returned forinstr(‘abc’, ")
.
- Examples
INSTR('Tech on the net', 'e') = 2 INSTR('Tech on the net', 'e', 1, 1) = 2 INSTR('Tech on the net', 'e', 1, 2) = 11 INSTR('Tech on the net', 'e', 1, 3) = 14
IS_ENCODING
- Syntax
BOOLEAN IS_ENCODING(STRING str, STRING from_encoding, STRING to_encoding)
- Description
This function determines whether the input string
str
can be converted from the character set specified byfrom_encoding
to the character set specified byto_encoding
. It can be used to determine whether the input string is garbled. Generally,from_encoding
is set to utf-8 andto_encoding
is set to gbk. - Parameters
str
: a value of the STRING type. An empty string can belong to any character set.from_encoding
andto_encoding
: the source and target character sets. The values are of the STRING type.- NULL is returned if an input parameter is NULL.
- Return value
Returns a value of the BOOLEAN type. If
str
can be converted, true is returned. Otherwise, false is returned.
KEYVALUE
- Syntax
KEYVALUE(STRING srcStr,STRING split1,STRING split2, STRING key) KEYVALUE(STRING srcStr,STRING key) //split1 = ";",split2 = ":"
- Description
This function splits the source string
srcStr
into key-value pairs bysplit1
, separates key-value pairs bysplit2
, and then returns the value of the specified key. - Parameters
srcStr
: the source string to be split.key
: the key corresponds to the value returned after the source string is split bysplit1
andsplit2
in sequence. This parameter is of the STRING type.split1
andsplit2
: the strings that are used as delimiters to split the source string. If you do not specify these two parameters,split1
defaults to;
andsplit2
defaults to:
. If a key-value pair that is obtained after the source string is split bysplit1
contains multiple delimiters specified bysplit2
, the returned result is undefined.
- Return value
- Returns a value of the STRING type.
- Returns NULL if
split1
orsplit2
is NULL. - Returns NULL if
scrStr
andkey
are NULL or if nokey
matches. - Returns the value corresponding to the first matched
key
if multiple key-value pairs match.
- Example 1
The source string isKEYVALUE('0:1\;1:2', 1) = '2'
"0:1\;1:2"
.split1
defaults to;
andsplit2
defaults to:
because they are not specified.After the source string is split bysplit1
, the key-value pairs are0:1\,1:2
. After the key-value pairs are separated bysplit2
, the key-value pairs change to the following form:
The value 2 corresponding to key0 1/ 1 2
1
is returned.Example 2
The source string isKEYVALUE("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf") = "21910" value:21910
"\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;"
. After the source string is split by\;
specified by split1, the following key-value pairs are generated:
After the key-value pairs are split bydecreaseStore:1,xcard:1,isB2C:1,tf:21910,cart:1,shipping:2,pf:0,market:shoes,instPayAmount:0
:
specified by split2, the key-value pairs change to the following form:
The value 21910 corresponding to keydecreaseStore 1 xcard 1 isB2C 1 tf 21910 cart 1 shipping 2 pf 0 market shoes instPayAmount 0
tf
is returned.
LENGTH
- Syntax
BIGINT LENGTH(STRING str)
- Description
This function returns the length of the string
str
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.- Returns a value of the BIGINT type. NULL is returned if the input parameter is NULL. -1 is returned if the input parameter is not encoded in UTF-8.
- Examples
LENGTH('hi! CN') = 6
LENGTHB
- Syntax
BIGINT LENGTHB(STRING str)
- Description
This function returns the length of the string
str
in bytes. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.- Returns a value of the BIGINT type. NULL is returned if the input parameter is NULL.
- Examples
LENGTHB('hi! China!') = 10
MD5
- Syntax
STRING MD5(STRING value)
- Description
This function returns the MD5 value of the input string
value
. - Parameters
value
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
REGEXP_EXTRACT
- Syntax
STRING REGEXP_EXTRACT(STRING source, STRING pattern[, BIGINT occurrence])
- Description
This function splits the
source
string based on a givenpattern
and returns the characters in thegroup
at the nth occurrence, where n is specified byoccurrence
. - Parameters
source
: the string to split and search.pattern
: a constant of the STRING type. An exception is thrown ifpattern
is an empty string or nogroup
is specified inpattern
.occurrence
: a constant of the BIGINT type, which must be no less than 0. Otherwise, an exception is thrown. If you do not specify this parameter, it defaults to 1, indicating that characters in the firstgroup
are returned. If you setoccurrence
to 0, all substrings that match the regular expression specified bypattern
are returned.
Note Data is stored in the UTF-8 format. Chinese characters can be represented in hexadecimal. They are encoded in the range of [\x{4e00}-\x{9fa5}]. - Return value
Returns a value of the STRING type. Returns NULL if an input parameter is NULL.
REGEXP_INSTR
- Syntax
BIGINT REGEXP_INSTR(STRING source, STRING pattern[, BIGINT start_position[, BIGINT nth_occurrence[, BIGINT return_option]]])
- Description
This function returns the start or end position of the substring that matches
pattern
at the nth occurrence, where n is specified bynth_occurrence
, in thesource
string from the start position specified bystart_position
. - Parameters
source
: a value of the STRING type, which indicates the string to be searched.pattern
: a constant of the STRING type. An exception is thrown ifpattern
is an empty string.start_position
: a constant of the BIGINT type, which indicates the start position of a search. If you do not specify this parameter, its value defaults to 1. An exception is thrown if its value is of another data type or is no greater than 0.nth_occurrence
: a constant of the BIGINT type. If you do not specify this parameter, its value defaults to 1, indicating the first occurrence of the matched substring. An exception is thrown if its value is of another data type or is no greater than 0.return_option
: a constant of the BIGINT type, which can be 0 or 1. An exception is thrown if its value is of another data type or is out of the value range. 0 indicates that the start position of the matched substring is returned, and 1 indicates that the end position of the matched substring is returned.
- Return value
Returns a value of the BIGINT type. It is the start or end position of the matched substring in the
source
string based on the type specified byreturn_option
. NULL is returned if an input parameter is NULL. - Examples
REGEXP_INSTR("i love www.taobao.com", "o[[:alpha:]]{1}", 3, 2) = 14
REGEXP_REPLACE
- Syntax
STRING REGEXP_REPLACE(STRING source, STRING pattern, STRING replace_string[, BIGINT occurrence])
- Description
This function substitutes the specified string
replace_string
for the substring that matches a givenpattern
at the nth occurrence, where n is specified byoccurrence
, in thesource
string, and returns the result. - Parameters
source
: the string to be replaced.pattern
: a constant of the STRING type, which indicates the pattern to be matched. An exception is thrown ifpattern
is an empty string.replace_string
: a value of the STRING type, which is a string that is substituted for the substring matching the givenpattern
.occurrence
: a constant of the BIGINT type, which must be no less than 0. It indicates thatreplace_string
is substituted for the matching substring at the nth occurrence. The value 0 indicates that all matching substrings are replaced. An exception is thrown if its value is of another type or is less than 0. 0 is assumed by default.
- Return value
- Returns a value of the STRING type. If the string to be replaced does not exist, the replacement does not occur.
- Returns NULL if an input parameter is NULL.
- Returns NULL if
replace_string
is NULL and certain substrings match the givenpattern
. - Returns the original string if
replace_string
is NULL but no substring matches the givenpattern
.
- Examples
REGEXP_REPLACE("123.456.7890", "([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})", "(\\1)\\2-\\3", 0) = "(123)456-7890" REGEXP_REPLACE("abcd", "(.)", "\\1 ", 0) = "a b c d " REGEXP_REPLACE("abcd", "(.)", "\\1 ", 1) = "a bcd" REGEXP_REPLACE("abcd", "(.)", "\\2", 1) = "abcd" -- Only one group is defined in the pattern and the referenced second group does not exist. -- Try to avoid this action. The result of referencing a nonexistent group is undefined. REGEXP_REPLACE("abcd", "(. *)(.)$", "\\2", 0) = "d" REGEXP_REPLACE("abcd", "a", "\\1", 0) = "bcd" -- No group is defined in the pattern, so "\1" references a nonexistent group. -- Try to avoid this action. The result of referencing a nonexistent group is undefined.
REGEXP_SUBSTR
- Syntax
STRING REGEXP_SUBSTR(STRING source, STRING pattern[, BIGINT start_position[, BIGINT nth_occurrence]])
- Description
This function returns the string that matches a given
pattern
at the nth occurrence, where n is specified bynth_occurrence
, in thesource
string from the start position specified bystart_position
. - Parameters
source
: a value of the STRING type, which indicates the string to be searched.pattern
: a constant of the STRING type, which indicates the pattern to be matched. An exception is thrown ifpattern
is an empty string.start_position
: a constant of the BIGINT type, which must be greater than 0. An exception is thrown if its value is of another data type or is no greater than 0. If you do not specify this parameter, its value defaults to 1, indicating that the match starts from the first character of thesource
.nth_occurrence
: a constant of the BIGINT type, which must be greater than 0. An exception is thrown if its value is of another data type or is no greater than 0. If you do not specify this parameter, its value defaults to 1, indicating that the first matched substring is returned.
- Return value
Returns a value of the STRING type. NULL is returned if an input parameter is NULL or no substrings match.
- Examples
REGEXP_SUBSTR("I love aliyun very much", "a[[:alpha:]]{5}") = "aliyun" REGEXP_SUBSTR('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1) = " have" REGEXP_SUBSTR('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2) = " 2"
REGEXP_COUNT
- Syntax
BIGINT REGEXP_COUNT(STRING source, STRING pattern[, BIGINT start_position])
- Description
This function returns the number of times a substring matches a given
pattern
in thesource
string from the start position specified bystart_position
. - Parameters
source
: a value of the STRING type, which indicates the string to be searched. An exception is thrown if its value is of another data type.pattern
: a constant of the STRING type, which indicates the pattern to be matched. An exception is thrown ifpattern
is an empty string or is of another data type.start_position
: a constant of the BIGINT type, which must be greater than 0. An exception is thrown if its value is of another data type or is no greater than 0. If you do not specify this parameter, its value defaults to 1, indicating that the match starts from the first character of thesource
string.
- Return value
Returns a value of the BIGINT type. 0 is returned if no substring is matched. Null is returned if an input parameter is NULL.
- Examples
REGEXP_COUNT('abababc', 'a.c') = 1 REGEXP_COUNT('abcde', '[[:alpha:]]{2}', 3) = 1
SPLIT_PART
- Syntax
STRING SPLIT_PART(STRING str, STRING separator, BIGINT start[, BIGINT end])
- Description
This function uses a delimiter specified by
separator
to split the stringstr
, and returns a substring that starts from the character specified bystart
and ends with the character specified byend
. - Parameters
str
: a value of the STRING type, which indicates a string to be split. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.separator
: a constant of the STRING type, which is a separator used for splitting. It can be a character or string. If it is of another data type, an exception is thrown.start
: a constant of the BIGINT type, which must be greater than 0. An exception is thrown if its value is not a constant or is of another data type. It indicates the start number of the segment to be returned, starting from 1. Ifend
is not specified, the segment specified bystart
is returned.end
: a constant of the BIGINT type, which must be no less thanstart
. It indicates the end number of the segment to be returned. An exception is thrown if its value is not a constant or is of another data type. If end is not specified, the last segment is returned.
- Return value
- Returns a value of the STRING type.
- Returns an empty string if
start
is set to a value greater than the number of segments, for example, the string has 6 segments but thestart
value is greater than 6. - Returns the entire string
str
ifseparator
is absent in the stringstr
andstart
is set to 1. An empty string is returned ifstr
is an empty string. - Returns the original string
str
ifseparator
is an empty string. - Returns the string between start and the last segment if
end
is set to a value greater than the number of segments. - Returns NULL if an input parameter is NULL.
- Examples
SPLIT_PART('a,b,c,d', ',', 1) = 'a' SPLIT_PART('a,b,c,d', ',', 1, 2) = 'a,b' SPLIT_PART('a,b,c,d', ',', 10) = ''
SUBSTR
- Syntax
STRING SUBSTR(STRING str, BIGINT start_position[, BIGINT length])
- Description
This function returns a substring that starts from
start_position
instr
and has a length specified bylength
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.start_position
: a value of the BIGINT type. 1 is assumed by default. An empty string is returned ifstart_position
is 0. Ifstart_position
is a negative value, the start position is counted backwards from the last character of the string. For example, -1 indicates the last character, -2 indicates the second-to-last character, -3 indicates the third-from-last character, and so on. If the value is of another data type, an exception is thrown.length
: a value of the BIGINT type, which indicates the length of the substring. Its value is greater than 0. An exception is thrown if its value of another data type or is no greater than 0.
- Return value
Returns a value of the STRING type. NULL is returned if an input parameter is NULL.
Note Iflength
is not specified, the substring from the start position to the end ofstr
is returned. - Examples
SUBSTR("abc", 2) = "bc" SUBSTR("abc", 2, 1) = "b" SUBSTR("abc",-2,2) = "bc" SUBSTR("abc",-3) = "abc"
SUBSTRING
- Syntax
STRING SUBSTRING(STRING|BINARY str, INT start_position[, INT length])
- Description
This function returns a substring that starts from
start_position
instr
and has a length specified bylength
. - Parameters
str
: a value of the STRING or BINARY type. NULL or an error is returned if the input value is of another data type.start_position
: a value of the INT type. 1 is assumed by default. An empty string is returned ifstart_position
is set to 0. Ifstart_position
is a negative value, the start position is counted backwards from the last character of the string. For example, -1 indicates the last character, -2 indicates the second-to-last character, -3 indicates the third-from-last character, and so on. If the value is of another data type, an exception is thrown.length
: a value of the BIGINT type, which indicates the length of the substring. Its value is greater than 0. An exception is thrown if its value is of another data type or is no greater than 0.
- Return value
Returns a value of the STRING type. NULL is returned if an input parameter is NULL.
Note Iflength
is not specified, the substring from the start position to the end ofstr
is returned. - Examples
SUBSTRING('abc', 2) = 'bc' SUBSTRING('abc', 2, 1) ='"b' SUBSTRING('abc',-2,2) = 'bc' SUBSTRING('abc',-3,2) = 'ab' SUBSTRING(BIN(2345),2,3) = '001'
TOLOWER
- Syntax
STRING TOLOWER(STRING source)
- Description
This function converts the string
source
to a lowercase string. - Parameters
source
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
tolower("aBcd") = "abcd" tolower("ABCd") = "abcd"
TOUPPER
- Syntax
STRING TOUPPER(STRING source)
- Description
This function converts the string
source
to an uppercase string. - Parameters
source
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
toupper("aBcd") = "ABCD" toupper("abCd") = "ABCD"
TO_CHAR
- Syntax
STRING TO_CHAR(BOOLEAN value) STRING TO_CHAR(BIGINT value) STRING TO_CHAR(DOUBLE value) STRING TO_CHAR(DECIMAL value)
- Description
This function converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type to the STRING type.
- Parameters
value
: The input value can be of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type. An exception is thrown if its value is of another data type. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
TO_CHAR(123) = '123' TO_CHAR(true) = 'TRUE' TO_CHAR(1.23) = '1.23' TO_CHAR(null) = NULL
TRIM
- Syntax
STRING TRIM(STRING str)
- Description
This function eliminates the spaces on the left and right sides of the string
str
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
LTRIM
- Syntax
STRING LTRIM(STRING str)
- Description
This function eliminates the spaces on the left side of the string
str
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
SELECT LTRIM(' abc '); -- Returned result: +-----+ | _c0 | +-----+ | abc | +-----+
RTRIM
- Syntax
STRING RTRIM(STRING str)
- Description
This function eliminates the spaces on the right side of the string
str
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
SELECT RTRIM('a abc '); -- Returned result: +-----+ | _c0 | +-----+ | a abc | +-----+
REVERSE
- Syntax
STRING REVERSE(STRING str)
- Description
This function returns a string in reverse order.
- Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
SELECT REVERSE('abcedfg');--Returned result: +-----+ | _c0 | +-----+ | gfdecba | +-----+
REPEAT
- Syntax
STRING REPEAT(STRING str, BIGINT n)
- Description
This function returns
n
duplicates of the stringstr
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.n
: a value of the BIGINT type. The returned string cannot exceed 2 MB in length. An exception is thrown if n is NULL.
- Return value
Returns a value of the STRING type.
- Examples
SELECT REPEAT('abc',5); -- Return abcabcabcabcabc.
ASCII
- Syntax
BIGINT ASCII(STRING str)
- Description
This function returns the ASCII code of the first character in the string
str
. - Parameters
str
: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown. - Return value
Returns a value of the BIGINT type.
- Examples
SELECT ASCII('abcde'); -- Return 97.
Additional functions of MaxCompute V2.0
set
statement before the SQL statement:set odps.sql.type.system.odps2=true;
set odps.sql.type.system.odps2=true;
before the SQL statement that uses the function, and commit and run it together with
the SQL statement so that the new data types can be used.
This section describes new string functions in MaxCompute V2.0.
CONCAT_WS
- Syntax
STRING CONCAT_WS(STRING SEP, STRING a, STRING b...) STRING CONCAT_WS(STRING SEP, ARRAY)
- Description
This function concatenates all input strings in an array by using a specified delimiter.
- Parameters
SEP
: the delimiter of the STRING type. An exception is thrown if you do not specify this parameter.
- Return value
Returns a value of the STRING type. NULL is returned if no input parameters are present or an input parameter is NULL.
- Examples
CONCAT_WS(':','name','hanmeimei')='name:hanmeimei' CONCAT_WS(':','avg',null,'34')=null
LPAD
- Syntax
STRING LPAD(STRING a, INT len, STRING b)
- Description
This function pads the left side of string
a
with stringb
until the new padded string haslen
characters. - Parameters
len
: a value of the INT type.- Parameters such as
a
andb
: values of the STRING type.
- Return value
Returns a value of the STRING type. If
len
is smaller than the number of characters ina
,a
is truncated from the left to obtain a string with the number of characters specified bylen
. Iflen
is 0, NULL is returned. - Examples
lpad('abcdefgh',10,'12')='12abcdefgh' lpad('abcdefgh',5,'12')='abcde' lpad('abcdefgh',0,'12') -- Return NULL.
- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
RPAD
- Syntax
STRING RPAD(STRING a, INT len, STRING b)
- Description
This function pads the right side of string
a
with stringb
until the new padded string haslen
characters. - Parameters
len
: a value of the INT type.- Parameters such as
a
andb
: values of the STRING type.
- Return value
Returns a value of the STRING type. If
len
is smaller than the number of characters ina
,a
is truncated from the left to obtain a string with the number of characters specified bylen
. Iflen
is 0, NULL is returned. - Examples
rpad('abcdefgh',10,'12')='abcdefgh12' rpad('abcdefgh',5,'12')='abcde' rpad('abcdefgh',0,'12') -- Return NULL.
- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
REPLACE
- Syntax
STRING REPLACE(STRING a, STRING OLD, STRING NEW)
- Description
This function substitutes the string
NEW
for the part of stringa
that is exactly the same as the stringOLD
, and returns the stringa
. - Parameters
The values of all parameters are of the STRING type.
- Return value
Returns a value of the STRING type. NULL is returned if an input parameter is NULL.
- Examples
REPLACE('ababab','abab','12')='12ab' REPLACE('ababab','cdf','123')='ababab' REPLACE('123abab456ab',null,'abab')=null
SOUNDEX
- Syntax
STRING SOUNDEX(STRING a)
- Description
This function converts a normal string to a string of the
soundex
type. - Parameters
a
: a value of the STRING type. - Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
SOUNDEX('hello')='H400'
SUBSTRING_INDEX
- Syntax
STRING SUBSTRING_INDEX(STRING a, STRING sep, INT count)
- Description
This function truncates the string
a
to a substring from the first character to the nth delimiter, where n is specified bycount
. Ifcount
is a positive value, the string is truncated from left to right. Otherwise, the string is truncated from right to left. - Parameters
a
andsep
are of the STRING type, whereascount
is of the INT type. - Return value
Returns a value of the STRING type. NULL is returned if an input parameter is NULL.
- Examples
SUBSTRING_INDEX('https://www.alibabacloud.com, '.', 2)='https://www.alibabacloud' SUBSTRING_INDEX('https://www.alibabacloud.com', '.', -2)='alibabacloud.com' SUBSTRING_INDEX('https://www.alibabacloud.com', null, 2)=null
- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Date types.
URL_ENCODE
- Syntax
STRING URL_ENCODE(STRING input[, STRING encoding])
- Description
This function encodes the input string in the
application/x-www-form-urlencoded MIME
format and returns the encoded string.- All letters remain unchanged.
- Dots (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
- Spaces are converted to plus signs (+).
- Other characters are converted to byte values based on the specified
encoding
. Each byte value is then represented in the format of%xy
, wherexy
is the hexadecimal representation of the character value.
- Parameters
input
: the string to be encoded.encoding
: the specified encoding format. GBK and UTF-8 are supported. If you do not specify this parameter, its value defaults to UTF-8.
- Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
URL_ENCODE('Example for URL_ENCODE:// dsf(fasfs)', 'GBK') = "Example+for+URL_ENCODE+%3A%2F%2F+dsf%28fasfs%29"
URL_DECODE
- Syntax
STRING URL_DECODE(STRING input[, STRING encoding])
- Description
This function converts an input string from the
application/x-www-form-urlencoded MIME
format to a normal string. This is the inverse function ofURL_ENCODE
.- All letters remain unchanged.
- Dots (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
- Each plus sign (+) is converted to a space.
- The
%xy
formatted sequence is converted to byte values. Consecutive byte values are decoded to the corresponding strings based on the inputencoding
. - Other characters remain unchanged.
- The final return value of the function is a UTF-8 string.
- Parameters
input
: the string to be decoded.encoding
: the specified encoding format. GBK and UTF-8 are supported. If you do not specify this parameter, its value defaults to UTF-8.
- Return value
Returns a value of the STRING type. NULL is returned if the input parameter is NULL.
- Examples
URL_DECODE('Example+for+URL_DECODE+%3A%2F%2F+dsf%28fasfs%29', 'GBK') = "Example for URL_DECODE:// dsf(fasfs)"
JSON_TUPLE
- Syntax
STRING JSON_TUPLE(STRING json,STRING key1,STRING key2,...)
- Description
This function extracts strings from a standard JSON string based on specified keys.
- Parameters
- json: a value of the STRING type, which indicates a standard JSON string.
- key: a value of the STRING type, which is used to describe the path of a JSON object
in the JSON string.
. The value cannot start with a dollar sign ($). You can enter multiple keys at a time.
- Return value
Returns a value of the STRING type.
Note- NULL is returned if json is empty or invalid.
- NULL is returned if key is empty, invalid, or does not exist in the JSON string.
- The corresponding string is returned if json is valid and key is also present.
- This function can parse JSON data that contains Chinese characters.
- This function can parse nested JSON data.
- This function can parse JSON data that contains nested arrays.
- The parsing action is equivalent to the execution of GET_JSON_OBJECT() along with
set odps.sql.udf.getjsonobj.new=true;
. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT() function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE() function allows you to enter multiple keys at a time and the JSON string is parsed only once. Compared with GET_JSON_OBJECT(), JSON_TUPLE() is more efficient. - JSON_TUPLE() is a User-Defined Table-Generating Function (UDTF). If you need to select other columns, use it along with LATERAL VIEW.