This topic describes the syntax, parameters, and examples of string functions supported by the data analysis feature.
Function | Description |
Converts an ASCII code into a character. | |
Concatenates all specified strings and returns the final string. | |
Extracts a specified substring from a JSON string. | |
Returns the position of a specified substring in a string. | |
Decodes Base64-encoded data. | |
Returns the length of a string. | |
Returns the length of a string in bytes. | |
Returns the MD5 value of a string. | |
Replaces a substring in a specified position of a string with a specified substring. | |
Splits a string on a specified delimiter and returns a substring that starts from a specified position to another specified position. | |
Returns a substring that has a specified length from a specified position of a string. | |
Replaces every uppercase letter in a string with a lowercase letter. | |
Replaces every lowercase letter in a string with an uppercase letter. | |
Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type to the STRING type. | |
Removes the spaces from the left and right sides of a string. | |
Removes the spaces from the left side of a string. | |
Removes the spaces from the right side of a string. | |
Returns the characters of a string in reverse order. | |
Returns the ASCII code of the first character in a string. | |
Pads the left side of a string specified by a with a string specified by b until the number of characters in the new padded string reaches a value specified by len. | |
Pads the right side of a string specified by a with a string specified by b until the number of characters in the new padded string reaches a value specified by len. | |
Replaces a specified substring in a string with a new substring. | |
Encodes the input string into the application/x-www-form-urlencoded MIME format. | |
Converts an input string from the application/x-www-form-urlencoded MIME format into a normal string. This is the inverse function of url_encoding. | |
Checks whether String 1 ends with String 2. | |
Converts binary data to a Base64-encoded string. |
CHR
Syntax
STRING CHR(BIGINT ascii)Description
Converts a specified ASCII code to the corresponding character.
Parameters
ascii: a value of the BIGINT type, which indicates the ASCII code. Value range: 0 to 255. An exception is returned if the input value is out of this range. The input value is implicitly converted to a value of the BIGINT type before computing if it is of the STRING, DOUBLE, or DECIMAL type. If the input value is of another data type, an exception is returned.
Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
CONCAT
Syntax
STRING CONCAT(STRING a, STRING b...)Description
Concatenates all specified strings and returns the final string.
Parameters
The values of all parameters are of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.
Return value type
A value of the STRING type is returned. NULL is returned if no input parameters are present or an input parameter is NULL.
Examples
CONCAT('ab','c') = 'abc' CONCAT() = NULL CONCAT('a', null, 'b') = NULL
GET_JSON_OBJECT
Syntax
STRING GET_JSON_OBJECT(STRING json,STRING path)Description
Extracts a specified string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may waste system resources and increase your cost. To prevent repeated calls, you can call the
GET_JSON_OBJECTfunction together with user-defined table-valued functions (UDTFs) to convert JSON log data.Parameters
json: the string in the standard
JSONformat.path: the string that starts with the dollar sign ($), which describes the path of a JSON object in the JSON string. For more information about path, seeLanguageManual UDF.$: indicates the root node..: indicates a child node.[]:[number]indicates the array subscript. The array is represented in the format ofkey[sub1][sub2][sub3]…….*: indicates the wildcard for[]. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.
Return value
Returns
NULLif json is empty or invalid.Returns the corresponding string if json is valid and the specified
pathexists.Duplicate keys cannot appear in an object. For example, {a:1, a:0} is not supported. Otherwise, the string may fail to be parsed.
Emoji expressions are not supported.
Examples
Example 1:
-- JSON data. {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }You can execute the following statements to extract data from the preceding JSON string.
-- The return value is amy. GET_JSON_OBJECT(json, '$.owner'); -- The return value is {"weight":8,"type":"apple"}. GET_JSON_OBJECT(json, '$.store.fruit\[0]'); -- The return value is null. GET_JSON_OBJECT(json, '$.non_exist_key');Example 2:
GET_JSON_OBJECT('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]')= "2222" GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]') = "["h0","h1","h2"]" GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]') = "h1"
INSTR
Syntax
BIGINT INSTR(STRING str1, STRING str2[, BIGINT start_position[, BIGINT nth_appearance]])Description
Returns the position of substring str2 in string
str1.Parameters
: a value of the STRING type, which indicates the string in which you search for the substring.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.: a value of the STRING type, which indicates the substring you want to search for.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.: a value of the BIGINT type. If it is of another data type, an exception is returned.It indicates which character in str1 the search will start with. The default start position is the first character, marked as 1.: a value of the BIGINT type, which must be greater than 0.It indicates the nth time that the substring appears in the string, where n is specified by nth_appearance. An exception is returned if the value of nth_appearance is of another data type or is no greater than 0.
Return value
A value of the BIGINT type is returned.
If
str1is not found instr2, 0 is returned.If an input parameter is set to NULL, NULL is returned.
The matching always succeeds if str2 is empty. For example, 1 is returned for
instr(‘abc’, ").
Examples
INSTR('Tech on the net', 'e') = 2 INSTR('Tech on the net', 'e', 1, 1) = 2 INSTR('Tech on the net', 'e', 1, 2) = 11 INSTR('Tech on the net', 'e', 1, 3) = 14
BASE64_DECODE
Syntax
base64_decode(STRING str)Description
Decodes a Base64-encoded string.
Parameters
str: a value of the STRING type, which indicates a string to be split. If the input value is of the BIGINT, DOUBLE, DECIMAL, TIMESTAMP, or DATE type, the value is implicitly converted into the STRING type before calculation. If the input value is not a Base64-encoded string, the original string is returned.
Return values
A value of the STRING type is returned.
LENGTH
Syntax
BIGINT LENGTH(STRING str)Description
Returns the length of the string str.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.A value of the BIGINT type is returned. NULL is returned if the input parameter is NULL. -1 is returned if the input parameter is not encoded in UTF-8.
Example
LENGTH('hi! China') = 6
LENGTHB
Syntax
BIGINT LENGTHB(STRING str)Description
Returns the length of the string str in bytes.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.A value of the BIGINT type is returned. If the input parameter is set to NULL, NULL is returned.
Example
LENGTHB('hi! China') = 10
MD5
Syntax
STRING MD5(STRING value)Description
Returns the MD5 value of the input string value.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.Return values
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
REGEXP_REPLACE
Syntax
STRING REGEXP_REPLACE(STRING source, STRING pattern, STRING replace_string[, BIGINT occurrence])Description
Uses a string specified by
replace_STRINGto replace the substring that matches a specifiedpatternat the nth occurrence specified byoccurrencein the source string and returns a result.Parameters
source: the string to be replaced.pattern: a constant of the STRING type, which indicates the pattern to be matched. An exception is returned if
patternis an empty string.replace_string: a value of the STRING type, which is a string that is substituted for the substring matching the given
pattern.occurrence: a constant of the BIGINT type, which must be no less than 0. It indicates thatreplace_stringis substituted for the matching substring at the nth occurrence. The value 0 indicates that all matching substrings are replaced. An exception is returned if its value is of another type or is less than 0. Default value: 0.
Return value
A value of the STRING type is returned. If the referenced group does not exist, no substring is replaced.
If an input parameter is set to NULL, NULL is returned.
If replace_string is set to null and a substring matches the given
pattern, NULL is returned.If replace_string is set to null but no substring matches the given
pattern, the original string is returned.
Examples
regexp_replace("123.456.7890", "([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})", "(\\1)\\2-\\3", 0) = "(123)456-7890" regexp_replace("abcd", "(.)", "\\1 ", 0) = "a b c d " regexp_replace("abcd", "(.)", "\\1 ", 1) = "a bcd" regexp_replace("abcd", "(.)", "\\2", 1) = "abcd" -- Only one group is defined in the pattern and the group that you want to reference does not exist. -- We recommend that you do not use this function in this manner. The result of referencing a nonexistent group is not defined. regexp_replace("abcd", "(.*)(.)$", "\\2", 0) = "d" regexp_replace("abcd", "a", "\\1", 0) = "bcd" -- No character class is defined in pattern. \1 references a non-existent character class. -- We recommend that you do not use this function in this manner. The result of referencing a nonexistent group is not defined.
SPLIT_PART
Syntax
STRING SPLIT_PART(STRING str, STRING separator, BIGINT start[, BIGINT end])Description
Splits a string specified by
stron a delimiter specified byseparatorand returns a substring that starts from a position specified bystartto another position specified byend.Parameters
: a value of the STRING type, which indicates a string to be split.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.separator: a constant of the STRING type, which is a delimiter used for splitting. It can be a character or string. If it is of another data type, an exception is returned.: a constant of the BIGINT type, which must be greater than 0.An exception is returned if its value is not a constant or is of another data type. This parameter specifies the start position of the substring to be returned. The position starts from 1. If end is not configured, the substring in the position specified bystartis returned.end: a constant of the BIGINT type, which must be no less than the value of
start. This parameter specifies the end position of the segment to be returned. If the value is not a constant or is of another data type, an exception is returned. If end is not configured, the last substring is returned.
Return value
A value of the STRING type is returned.
If you set
startto a value that is greater than the number of segments, an empty string is returned. For example, if a string has six segments but you setstartto a value that is greater than 6, an empty string is returned.If the value of
separatordoes not exist in a string specified bystrand you setstartto 1, the string specified bystris returned. If the value of str is an empty string, an empty string is returned.If the value of
separatoris an empty string, a string specified bystris returned.If you set end to a value that is greater than the number of segments, all segments are returned.
If an input parameter is set to NULL, NULL is returned.
Examples
SPLIT_PART('a,b,c,d', ',', 1) = 'a' SPLIT_PART('a,b,c,d', ',', 1, 2) = 'a,b' SPLIT_PART('a,b,c,d', ',', 10) = ''
SUBSTR
Syntax
STRING SUBSTR(STRING str, BIGINT start_position[, BIGINT length])Description
Returns a substring that starts from a position specified by
start_positionin str and has a length specified by length.Parameters
str: a value of the STRING type. The input value is implicitly converted to a value of the STRING type before calculation if it is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type. If the input value is of another data type, an exception is returned.start_position: a value of the BIGINT type. Default value: 1. If you setstart_positionto 0, an empty string is returned. If you setstart_positionto a negative value, the start position is counted backwards from the last character of the string. For example, -1 indicates the last character, -2 indicates the second-to-last character, and -3 indicates the third-from-last character. If the value is of another data type, an exception is returned.: a value of the BIGINT type, which indicates the length of the substring.Its value is greater than 0. An exception is returned if its value of another data type or is no greater than 0.
Return value
A value of the STRING type is returned. If an input parameter is set to NULL, NULL is returned.
NoteIf the length is not configured, the substring from a position specified by start_position to the end of a string specified by str is returned.
Examples
SUBSTR("abc", 2) = "bc" SUBSTR("abc", 2, 1) = "b" SUBSTR("abc",-2,2) = "bc" SUBSTR("abc",-3) = "abc"
TOLOWER
Syntax
STRING TOLOWER(STRING source)Description
Converts the string source to a lowercase string.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Examples
tolower("aBcd") = "abcd" tolower("HAHACd") = "hahacd"
TOUPPER
Syntax
STRING TOUPPER(STRING source)Description
Converts a string specified by source to an uppercase string.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is returned.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Examples
toupper("aBcd") = "ABCD" toupper("HahaCd") = "HAHACD"
TO_CHAR
Syntax
STRING TO_CHAR(BOOLEAN value) STRING TO_CHAR(BIGINT value) STRING TO_CHAR(DOUBLE value) STRING TO_CHAR(DECIMAL value)Description
Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type to the STRING type.
Parameters
: The input value can be of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type. An exception is returned if its value is of another data type.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Examples
TO_CHAR(123) = '123' TO_CHAR(true) = 'TRUE' TO_CHAR(1.23) = '1.23' TO_CHAR(null) = NULL
TRIM
Syntax
STRING TRIM(STRING str)Description
Removes the spaces from the left and right sides of a string specified by str.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Example
LTRIM(' abc ')="abc";
LTRIM
Syntax
STRING LTRIM(STRING str)Description
Removes the spaces from the left side of a string specified by str.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Example
LTRIM(' abc ')="abc ";
RTRIM
Syntax
STRING RTRIM(STRING str)Description
Removes the spaces from the right side of a string specified by str.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Example
RTRIM('a abc ')="a abc";
REVERSE
Syntax
STRING REVERSE(STRING str)Description
Returns a string in reverse order.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Example
REVERSE('abcedfg')="gfdecba"
ASCII
Syntax
BIGINT ASCII(STRING str)Description
Returns the ASCII code of the first character in a string specified by str.
Parameters
: a value of the STRING type.The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.Return value
A value of the BIGINT type is returned.
Example
-- The return value is 97. ASCII('abcde')
LPAD
Syntax
STRING LPAD(STRING a, INT len, STRING b)Description
Pads the left side of a string specified by
awith a string specified by b until the number of characters in the new padded string reachs a value specified by len.Parameters
: a value of the INT type.a and
b: values of the STRING type.
Return value
A value of the STRING type is returned. If the value of
lenis less than the number of characters in a string specified bya, a substring of a length specified bylenis truncated from the left of the string and is returned. If you set len to 0, NULL is returned.Examples
lpad('abcdefgh',10,'12')='12abcdefgh' lpad('abcdefgh',5,'12')='abcde' lpad('abcdefgh',0,'12') -- Return NULL.
RPAD
Syntax
STRING RPAD(STRING a, INT len, STRING b)Description
Pads the right side of string
awith string b until the number of characters in the new padded string reaches a value specified by len.Parameters
: a value of the INT type.Parameters such as a and
b: values of the STRING type.
Return values
A value of the STRING type is returned. If the value of
lenis less than the number of characters in a string specified bya, a substring of a length specified bylenis truncated from the left of the string and is returned.len. If the value oflenis 0, NULL is returned.Examples
rpad('abcdefgh',10,'12')='abcdefgh12' rpad('abcdefgh',5,'12')='abcde' rpad('abcdefgh',0,'12') -- Return NULL.
REPLACE
Syntax
STRING REPLACE(STRING a, STRING OLD, STRING NEW)Description
Purpose: It is used to replace the part of string a that is exactly the same as string OLD with string NEW, and return string a.
Parameters
The values of all parameters are of the STRING type.
Return value
A value of the STRING type is returned. If an input parameter is set to NULL, NULL is returned.
Examples
REPLACE('ababab','abab','12')='12ab' REPLACE('ababab','cdf','123')='ababab' REPLACE('123abab456ab',null,'abab')=null
URL_ENCODE
Syntax
STRING URL_ENCODE(STRING input[, STRING encoding])Description
Encodes the input string in the
application/x-www-form-urlencoded MIMEformat and returns the encoded string.All letters remain unchanged.
Periods (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
Spaces are converted into plus signs (+).
Other characters are converted into byte values based on the format specified by
encoding. Each byte value is then represented in the%xyformat, wherexyis the hexadecimal representation of the character value.
Parameters
: the string that you want to decode.: the encoding format. Valid values include GBK and UTF-8. If you do not configure this parameter, the default value UTF-8 is used.
Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Examples
URL_ENCODE('Example for URL_ENCODE:// (fdsf)') = "%E7%A4%BA%E4%BE%8Bfor+URL_ENCODE%3A%2F%2F+%28fdsf%29" URL_ENCODE('Example for URL_ENCODE:// dsf(fasfs)', 'GBK') = "Example+for+URL_ENCODE+%3A%2F%2F+dsf%28fasfs%29"
URL_DECODE
Syntax
STRING URL_DECODE(STRING input[, STRING encoding])Description
Converts an input string from the
application/x-www-form-urlencoded MIMEformat into a normal string. This is the inverse function ofURL_ENCODE.All letters remain unchanged.
Periods (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
Plus signs (+) are converted into spaces.
The
%xyformatted sequence is converted into byte values. Consecutive byte values are decoded to the related strings based on the value ofencoding.Other characters remain unchanged.
The return value of the function is a string encoded in UTF-8.
Parameters
: the string that you want to decode.: the encoding format. Valid values include GBK and UTF-8. If you do not specify this parameter, the default value UTF-8 is used.
Return value
A value of the STRING type is returned. If the input parameter is set to NULL, NULL is returned.
Examples
URL_DECODE('%E7%A4%BA%E4%BE%8Bfor+URL_DECODE%3A%2F%2F+%28fdsf%29')= "Example for URL_DECODE:// (fdsf)" URL_DECODE('Example+for+URL_DECODE+%3A%2F%2F+dsf%28fasfs%29', 'GBK') = "Example for URL_DECODE:// dsf(fasfs)"
ENDSWITH
Syntax
BOOLEAN ENDSWITH(STRING input, STRING suffix)Description
Checks whether the input string ends with a string specified by suffix.
Parameters
The values of all parameters are of the STRING type. The input value is implicitly converted to a value of the STRING type before computing if it is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type. If the input value is of another data type, an exception is thrown.
Return value
A value of the BOOLEAN type is returned.
TO_BASE64
Syntax
TO_BASE64(BINARY bin)Description
Converts binary data into a Base64-encoded string.
Parameters
The value of the input parameter is of the BINARY type.
Return value
A Base64-encoded string.