This topic describes the string functions provided by MaxCompute, such as CHAR_MATCHCOUNT, CHR, CONCAT, GET_JSON_OBJECT, INSTR, and IS_ENCODING.

CHAR_MATCHCOUNT

  • Syntax
    bigint char_matchcount(string str1, string str2)
  • Description

    This function returns the number of characters in str1 that appear in str2.

  • Parameters

    str1 and str2: valid UTF-8 character strings. The function returns a negative value if invalid characters are found during the comparison of the two strings.

  • Return value

    Returns a value of the bigint type. NULL is returned if any input parameter is NULL.

  • Examples
    char_matchcount('abd','aabc') = 2
    -- The a and b characters in str1 appear in str2.

CHR

  • Syntax
    string chr(bigint ascii)
  • Description

    This function converts the specified ASCII code into the corresponding character.

  • Parameters

    ascii: the ASCII value of the bigint type. The input value is implicitly converted into a value of the bigint type before calculation if it is of the string, double, or decimal type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. Input value range: 0 to 255. An exception is thrown if the input value is out of this range. NULL is returned if the input parameter is NULL.

CONCAT

  • Syntax
    string concat(string a, string b...)
  • Description

    This function concatenates all character strings and returns the resulting string.

  • Parameters

    The input value is of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if no input parameters are present or any input parameter is NULL.

  • Examples
    concat('ab','c') = 'abc'
    concat() = NULL
    concat('a', null, 'b') = NULL

GET_JSON_OBJECT

  • Syntax
    STRING GET_JSON_OBJECT(STRING json,STRING path)
  • Description

    This function extracts the specified character string from a standard JSON string according to path. The original data is read each time this function is called. Therefore, repeated calls may waste the system resources and increase your cost. To avoid repeated calls, you can use the GET_JSON_OBJECT function together with UDTFs to easily convert JSON log data. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters
    • json: a character string in standard json format.
    • path: a character string starting with a dollar sign ($), which describes the path in json. For more information about path, see LanguageManual UDF.
      • $: indicates the root node.
      • .: indicates a child node.
      • []: [number] indicates the array subscript. The array is represented in the format of key[sub1][sub2][sub3]…….
      • *: wildcard for [], which returns the entire array. An asterisk (*) does not support escapes.
  • Return value
    • Returns NULL if json is empty or its format is invalid.
    • Returns the corresponding string if the format of json is valid and path exists.
    • The same key cannot appear in an object twice, for example, {a:1, a:0}. Otherwise, the command may fail to be parsed.
    • Currently, Emoji expressions are not supported.
  • Example 1
    +----+
    json
    +----+
    {"store":
    {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    "bicycle":{"price":19.95,"color":"red"}
    },
    "email":"amy@only_for_json_udf_test.net",
    "owner":"amy"
    }
    You can extract information from the json object by performing the following query:
    odps> SELECT get_json_object(src_json.json, '$.owner') FROM src_json;
    amy
    odps> SELECT get_json_object(src_json.json, '$.store.fruit\[0]') FROM src_json;
    {"weight":8,"type":"apple"}
    odps> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
    NULL
    Example 2
    get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]')= "2222"
    get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]') = "["h0","h1","h2"]"
    get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]') = "h1"

INSTR

  • Syntax
    bigint instr(string str1, string str2[, bigint start_position[, bigint nth_appearance]])
  • Description

    This function computes the position of substring str2 in string str1.

  • Parameters
    • str1: a value of the string type, which is the character string to be searched. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • str2: a value of the string type, which is the substring to be searched for. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • start_position: a value of the bigint type. If it is of another data type, an exception is thrown. It indicates the character in str1 from which the search will start. The start position defaults to the first character, marked as 1.
    • nth_appearance: a value of the bigint type, which is greater than 0. It indicates the nth time that the substring appears in the character string, where n is specified by nth_appearance. An exception is thrown if the value of nth_appearance is of another data type or is less than or equal to 0.
  • Return value
    • Returns a value of the bigint type.
    • Returns 0 if str2 is not found in str1.
    • Returns NULL if any input parameter is NULL.
    • The matching will always be successful if str2 is empty. For example, 1 is returned for instr(‘abc’, ‘’).
  • Examples
    instr('Tech on the net', 'e') = 2
    instr('Tech on the net', 'e', 1, 1) = 2
    instr('Tech on the net', 'e', 1, 2) = 11
    instr('Tech on the net', 'e', 1, 3) = 14

IS_ENCODING

  • Syntax
    boolean is_encoding(string str, string from_encoding, string to_encoding)
  • Description

    This function determines whether an input string str can be converted from a specified character set (from_encoding) to another character set (to_encoding). It can be used to determine whether the input string is garbled. Generally, from_encoding is set to utf-8 and to_encoding is set to gbk.

  • Parameters
    • str: a value of the string type. An empty string can belong to any character set.
    • from_encoding and to_encoding: values of the string type, which indicate the source and target character sets.
    • NULL is returned if any input parameter is NULL.
  • Return value

    Returns a value of the boolean type. True is returned if str can be successfully converted. Otherwise, False is returned.

KEYVALUE

  • Syntax
    KEYVALUE(STRING srcStr,STRING split1,STRING split2, STRING key)
    KEYVALUE(STRING srcStr,STRING key) //split1 = ";",split2 = ":"
  • Description

    This function splits the source character string srcStr into key-value pairs by split1, separates key-value pairs by split2, and then returns the value of a specified key.

  • Parameters
    • srcStr: the source character string to be split.
    • key: a value of the string type. After the source string is split by split1 and split2 in sequence, the value of the specified key is returned.
    • split1 and split2: the strings that are used as separators to split the source string. If the function does not specify these two items, split1 defaults to a semicolon (;), and split2 defaults to a colon (:). The returned result is undefined if a string that has been split by split1 has multiple split2 values.
  • Return value
    • Returns a value of the string type.
    • Returns NULL if split1 or split2 is NULL.
    • Returns NULL if scrStr and key are NULL or if no key matches.
    • Returns the value corresponding to the first matched key if multiple key-value pairs match.
  • Example 1
    keyvalue('0:1\;1:2', 1) = '2'
    Note The source string is "0:1\;1:2". Because split1 and split2 are not specified, split1 defaults to a semicolon (;) and split2 defaults to a colon (:).
    After the source string is split by split1, the key-value pairs include 0:1\,1:2. After being further split by split2, the key-value pairs change to the following form:
    0 1/  
    1 2
    The value 2 corresponding to key 1 is returned.
    Example 2
    keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf") = "21910" value:21910.
    Note The source string is as follows:
    "\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;”
    After the source string is split by split1 "\;", the key-value pairs obtained are as follows:
    decreaseStore:1,xcard:1,isB2C:1,tf:21910,cart:1,shipping:2,pf:0,market:shoes,instPayAmount:0 
    After being further split by split2 ":", the key-value pairs change to the following form:
    decreaseStore 1  
    xcard 1  
    isB2C 1  
    tf 21910  
    cart 1  
    shipping 2  
    pf 0  
    market shoes  
    instPayAmount 0

    The return value is 21910 if the key is tf.

LENGTH

  • Syntax
    bigint length(string str)
  • Description

    This function returns the length of the string str.

  • Parameters
    • str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • The return value is of the bigint type. NULL is returned if the input parameter is NULL. -1 is returned if the input parameter is in a non-UTF-8 encoding format.
  • Examples
    length('hi! CN') = 6

LENGTHB

  • Syntax
    bigint lengthb(string str)
  • Description

    This function returns the length of the string str in bytes.

  • Parameters
    • str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • The return value is of the bigint type. NULL is returned if the input parameter is NULL.
  • Examples
    lengthb('hi! China!') = 10

MD5

  • Syntax
    string md5(string value)
  • Description

    This function computes the MD5 value of the input string value.

  • Parameters

    value: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

REGEXP_EXTRACT

  • Syntax
    string regexp_extract(string source, string pattern[, bigint occurrence])
  • Description

    This function splits the source string by the regular expression pattern and returns the group characters at the nth occurrence, where n is specified by occurrence.

  • Parameters
    • source: a value of the string type, which indicates the string to be searched.
    • pattern: a constant of the string type. An exception is thrown if pattern is an empty string or no group is specified in pattern.
    • occurrence: a constant of the bigint type, which must be greater than or equal to 0. An exception is thrown if its value is of another data type or less than 0. If it is not specified, its value defaults to 1, indicating that the first group is returned. All substrings matching pattern are returned if the value of occurrence is 0.
    Note Data is stored in the UTF-8 format. Chinese characters can be represented in hexadecimal. They are encoded in the range [\x{4e00}-\x{9fa5}].
  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL.

REGEXP_INSTR

  • Syntax
    bigint regexp_instr(string source, string pattern[,
    bigint start_position[, bigint nth_occurrence[, bigint return_option]]])
  • Description

    This function returns the start or end position of the substring that matches pattern at the nth occurrence, where n is specified by nth_occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: a value of the string type, which indicates the string to be searched.
    • pattern: a constant of the string type. An exception is thrown if pattern is an empty string.
    • start_position: a constant of the bigint type, which indicates the start position of a search. If it is not specified, its value defaults to 1. If its value is of another data type or less than or equal to 0, an exception is thrown.
    • nth_occurrence: a constant of the bigint type. If it is not specified, the default value is 1, which indicates the first occurrence of the matching result. An exception is thrown if its value is of another data type or is less than or equal to 0.
    • return_option: a constant of the bigint type, which can be 0 or 1. An exception is thrown if the value is of another data type or out of the value range. 0 indicates that the start position of the matched substring is returned, and 1 indicates that the end position of the matched substring is returned.
  • Return value

    Returns a value of the bigint type. It is the start or end position of the matching substring in the source string according to the type specified by return_option. NULL is returned if any input parameter is NULL.

  • Examples
    regexp_instr("i love www.taobao.com", "o[[:alpha:]]{1}", 3, 2) = 14

REGEXP_REPLACE

  • Syntax
    string regexp_replace(string source, string pattern, string replace_string[, bigint occurrence])
  • Description

    This function substitutes the specified string replace_string for the substring that matches a given pattern at the nth occurrence, where n is specified by occurrence, in the source string, and returns the result.

  • Parameters
    • source: a character string to be replaced.
    • pattern: a constant of the string type, which indicates the pattern to be matched. An exception is thrown if pattern is an empty string.
    • replace_string: a value of the string type, which is a string that is substituted for the substring matching the given pattern.
    • occurrence: a constant of the bigint type, which must be greater than or equal to 0. It indicates that replace_string is substituted for the matching substring at the nth occurrence. The value 0 indicates that all matching substrings are replaced. An exception is thrown if its value is of another data type or less than 0. 0 is assumed by default.
    Note In the absence of the referenced group, the action is not defined.
  • Return value
    • Returns a value of the string type. In the absence of the referenced group, the replacement does not occur.
    • Returns NULL if any input parameter is NULL.
    • Returns NULL if replace_string is NULL and certain substrings match the given pattern.
    • Returns the original string if replace_string is NULL but no substring matches the given pattern.
  • Examples
    regexp_replace("123.456.7890", "([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})",
    "(\\1)\\2-\\3", 0) = "(123)456-7890"
    regexp_replace("abcd", "(.)", "\\1 ", 0) = "a b c d "
    regexp_replace("abcd", "(.)", "\\1 ", 1) = "a bcd"
    regexp_replace("abcd", "(.)", "\\2", 1) = "abcd"
    -- Only one group is defined in the pattern and the referenced second group does not exist.
    -- Try to avoid this action. The result of referencing a nonexistent group is not defined.
    regexp_replace("abcd", "(. *)(.)$", "\\2", 0) = "d"
    regexp_replace("abcd", "a", "\\1", 0) = "bcd"
    -- No group is defined in the pattern, so "\1" references a nonexistent group.
    -- Try to avoid this action. The result of referencing a nonexistent group is not defined.

REGEXP_SUBSTR

  • Syntax
    string regexp_substr(string source, string pattern[, bigint start_position[, bigint nth_occurrence]])
  • Description

    This function returns the string that matches a given pattern at the nth occurrence, where n is specified by nth_occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: a value of the string type, which indicates the string to be searched.
    • pattern: a constant of the string type, which indicates the pattern to be matched. An exception is thrown if pattern is an empty string.
    • start_position: a constant of the bigint type, which must be greater than 0. An exception is thrown if it is of another data type or its value is less than or equal to 0. If not specified, it defaults to 1, indicating that the match starts from the first character of the source string.
    • nth_occurrence: a constant of the bigint type, which must be greater than 0. An exception is thrown if it is of another data type or its value is less than or equal to 0. If not specified, it defaults to 1, indicating that the first matched substring is returned.
  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL or no substrings match.

  • Examples
    regexp_substr ("I love aliyun very much", "a[[:alpha:]]{5}") = "aliyun"
    regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1) = " have"
    regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2) = " 2"

REGEXP_COUNT

  • Syntax
    bigint regexp_count(string source, string pattern[, bigint start_position])
  • Description

    This function returns the number of times a substring matches a given pattern in the source string from the start position specified by start_position.

  • Parameters
    • source: a value of the string type, which indicates the string to be searched. An exception is thrown if it is of another data type.
    • pattern: a constant of the string type, which indicates the pattern to be matched. An exception is thrown if pattern is an empty string or of another data type.
    • start_position: a constant of the bigint type, which must be greater than 0. An exception is thrown if it is of another data type or its value is less than or equal to 0. If not specified, it defaults to 1, indicating that the match starts from the first character of the source string.
  • Return value

    Returns a value of the bigint type. 0 is returned if there is no match. NULL is returned if any input parameter is NULL.

  • Examples
    regexp_count('abababc', 'a.c') = 1
    regexp_count('abcde', '[[:alpha:]]{2}', 3) = 1

SPLIT_PART

  • Syntax
    string split_part(string str, string separator, bigint start[, bigint end])
  • Description

    This function uses a separator to split the string str, and returns a substring that starts from the character specified by start and ends with the character specified by end.

  • Parameters
    • str1: a value of the string type, which indicates a string to be split. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • separator: a constant of the string type, which is a separator used for splitting. It can be a character or string. If it is of another data type, an exception is thrown.
    • start: a constant of the bigint type, which must be greater than 0. An exception is thrown if it is not a constant or of another data type. It indicates the start number of the segment to be returned, starting from 1. If end is not specified, the segment specified by start is returned.
    • end: a constant of the bigint type, which must be greater than or equal to start. It indicates the end number of the segment to be returned. An exception is thrown if it is not a constant or of another data type. If end is not specified, the last segment is returned.
  • Return value
    • Returns a value of the string type.
    • Returns an empty string if start is set to a value greater than the number of segments, for example, the string has 6 segments but the start value is greater than 6.
    • Returns the entire string str if separator is absent in the string str, and start is set to 1. An empty string is returned if str is an empty string.
    • Returns the original string str if separator is an empty string.
    • Returns the string between start and the last segment if end is set to a value greater than the number of segments.
    • Returns NULL if any input parameter is NULL.
  • Examples
    split_part('a,b,c,d', ',', 1) = 'a'
    split_part('a,b,c,d', ',', 1, 2) = 'a,b'
    split_part('a,b,c,d', ',', 10) = ''

SUBSTR

  • Syntax
    string substr(string str, bigint start_position[, bigint length])
  • Description

    This function returns a substring that starts from start_position in str and has a length specified by length.

  • Parameters
    • str: a value of string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • start_position: a value of the bigint type. 1 is assumed by default. An empty string is returned if start_position is 0. If start_position is a negative value, the start position is counted backwards from the last character of the string. That is, -1 indicates the last character, -2 indicates the second-to-last character, -3 indicates the third-from-last character, and so on. If the value is of another data type, an exception is thrown.
    • length: a value of the bigint type, which indicates the length of a substring. Its value is greater than 0. An exception is thrown if it is of another data type or its value is less than or equal to 0.
  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL.

    Note If length is not specified, the substring from the start position to the end of str is returned.
  • Examples
    substr("abc", 2) = "bc"
    substr("abc", 2, 1) = "b"
    substr("abc",-2,2) = "bc"
    substr("abc",-3) = "abc"

SUBSTRING

  • Syntax
    string substring(string|binary str, int start_position[, int length])
  • Description

    This function returns a substring that starts from start_position in str and has a length specified by length.

  • Parameters
    • str: a value of the string or binary type. NULL or an error is returned if the input value is of another data type.
    • start_position: a value of the int type. 1 is assumed by default. An empty string is returned if start_position is set to 0. If start_position is a negative value, the start position is counted backwards from the last character of the string. That is, -1 indicates the last character, -2 indicates the second-to-last character, -3 indicates the third-from-last character, and so on. If the value is of another data type, an exception is thrown.
    • length: a value of the bigint type, which indicates the length of a substring. Its value is greater than 0. An exception is thrown if it is of another data type or its value is less than or equal to 0.
  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL.

    Note If length is not specified, the substring from the start position to the end of str is returned.
  • Examples
    substring('abc', 2) = 'bc'
    substring('abc', 2, 1) ='"b'
    substring('abc',-2,2) = 'bc'
    substring('abc',-3,2) = 'ab'
    substring(BIN(2345),2,3) = '001'

TOLOWER

  • Syntax
    string tolower(string source)
  • Description

    This function converts the character string source into a lowercase string.

  • Parameters

    source: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    tolower("aBcd") = "abcd"
    tolower("ABCd") = "abcd"

TOUPPER

  • Syntax
    string toupper(string source)
  • Description

    This function converts the character string source into an uppercase string.

  • Parameters

    source: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    toupper("aBcd") = "ABCD"
    toupper("abCd") = "ABCD"

TO_CHAR

  • Syntax
    string to_char(boolean value)
    string to_char(bigint value)
    string to_char(double value)
    string to_char(DECIMAL value)
  • Description

    This function converts data of the boolean, bigint, decimal, or double type into the corresponding string type.

  • Parameters

    value: The input value can be of the boolean, bigint, decimal, or double type. An exception is thrown if it is of another data type. For more information about how to convert the data of the datetime type, see the description of TO_CHAR in Date functions.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    to_char(123) = '123'
    to_char(true) = 'TRUE'
    to_char(1.23) = '1.23'
    to_char(null) = NULL

TRIM

  • Syntax
    string trim(string str)
  • Description

    This function eliminates the spaces on the left and right sides of the character string str.

  • Parameters

    str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

LTRIM

  • Syntax
    string ltrim(string str)
  • Description

    This function eliminates the spaces on the left side of the character string str.

  • Parameters

    str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    select ltrim(' abc ') from dual; 
    -- Returned result:
    +-----+
    | _c0 |
    +-----+
    | abc |
    +-----+

RTRIM

  • Syntax
    string rtrim(string str)
  • Description

    This function eliminates the spaces on the right side of the character string str.

  • Parameters

    str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    select rtrim('a abc ') from dual; 
    -- Returned result: 
    +-----+
    | _c0 |
    +-----+
    | a abc |
    +-----+

REVERSE

  • Syntax
    STRING REVERSE(string str)
  • Description

    This function returns a reverse character string.

  • Parameters

    str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    select reverse('abcedfg') from dual;
    -- Returned result:
    +-----+
    | _c0 |
    +-----+
    | gfdecba |
    +-----+

REPEAT

  • Syntax
    STRING REPEAT(string str, bigint n)
  • Description

    This function returns n duplicates of the character string str.

  • Parameters
    • str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.
    • n: a value of the bigint type. The returned string cannot exceed 2 MB in length. An exception is thrown if n is null.
  • Return value

    Returns a value of the string type.

  • Examples
    select repeat('abc',5) from lxw_dual; -- Return abcabcabcabcabc.

ASCII

  • Syntax
    Bigint ASCII(string str)
  • Description

    This function returns the ASCII code of the first character in the string str.

  • Parameters

    str: a value of the string type. The input value is implicitly converted into a value of the string type before calculation if it is of the bigint, double, decimal, or datetime type. If the input value is of another data type, an exception is thrown.

  • Return value

    Returns a value of the bigint type.

  • Examples
    select ascii('abcde') from dual; -- Return 97.

Additional functions of MaxCompute V2.0

MaxCompute V2.0 extends some mathematical functions. If an additional function that you use in an SQL statement involves new data types, you must insert the following set statement before the SQL statement:
set odps.sql.type.system.odps2=true;
Note You need to add set odps.sql.type.system.odps2=true; before the SQL statement that uses the function, and commit and run it together with the SQL statement so that the new data types can be used.

This section describes the new string functions in MaxCompute V2.0.

CONCAT_WS

  • Syntax
    string concat_ws(string SEP, string a, string b...)
    string concat_ws(string SEP, array)
  • Description

    This function concatenates all input strings in an array by using a specified delimiter.

  • Parameters
    • SEP: a delimiter of the string type. An exception is thrown if this parameter is not specified.
  • Return value

    Returns a value of the string type. NULL is returned if no input parameters are present or any input parameter is NULL.

  • Examples
    concat_ws(':','name','hanmeimei')='name:hanmeimei'
    concat_ws(':','avg',null,'34')=null

LPAD

  • Syntax
    string lpad(string a, int len, string b)
  • Description

    This function pads the left side of string a with string b until the new padded string has len characters.

  • Parameters
    • len: a value of the int type.
    • Parameters such as a and b: values of the string type.
  • Return value

    Returns a value of the string type. If len is smaller than the number of characters in a, a is truncated from the left to obtain a string with the number of characters specified by len. If len is 0, null is returned.

  • Examples
    lpad('abcdefgh',10,'12')='12abcdefgh'
    lpad('abcdefgh',5,'12')='abcde'
    lpad('abcdefgh',0,'12') -- Return null.
Note When a new data type, such as tinyint, smallint, int, float, varchar, timestamp, or binary, is used in a MaxCompute SQL statement, you must insert a set statement before the SQL statement so that the new data type can be enabled.
  • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and run it along with the SQL statement.
  • Project level: You can enable a new data type at the project level. The project owner can run the following command to configure a project:
    setproject odps.sql.type.system.odps2=true;
    For more information about setproject, see Other operations. For more information about the precautions you must take when enabling data types at the project level, see Date types.

RPAD

  • Syntax
    string rpad(string a, int len, string b)
  • Description

    This function pads the right side of string a with string b until the new padded string has len characters.

  • Parameters
    • len: a value of the int type.
    • Parameters such as a and b: values of the string type.
  • Return value

    Returns a value of the string type. If len is smaller than the number of characters in a, a is truncated from the left to obtain a string with the number of characters specified by len. If len is 0, null is returned.

  • Examples
    rpad('abcdefgh',10,'12')='abcdefgh12'
    rpad('abcdefgh',5,'12')='abcde'
    rpad('abcdefgh',0,'12') -- Return null.
Note When a new data type, such as tinyint, smallint, int, float, varchar, timestamp, or binary, is used in a MaxCompute SQL statement, you must insert a set statement before the SQL statement so that the new data type can be enabled.
  • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and run it along with the SQL statement.
  • Project level: You can enable a new data type at the project level. The project owner can run the following command to configure a project:
    setproject odps.sql.type.system.odps2=true;
    For more information about setproject, see Other operations. For more information about the precautions you must take when enabling data types at the project level, see Date types.

REPLACE

  • Syntax
    string replace(string a, string OLD, string NEW)
  • Description

    This function substitutes the string NEW for part of string a that is exactly the same as the string OLD, and returns the string a.

  • Parameters

    The values of all parameters are of the string type.

  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL.

  • Examples
    replace('ababab','abab','12')='12ab'
    replace('ababab','cdf','123')='ababab'
    replace('123abab456ab',null,'abab')=null

SOUNDEX

  • Syntax
    string soundex(string a)
  • Description

    This function converts a common character string into a string of the soundex type.

  • Parameters

    a: a value of the string type.

  • Return value

    Returns a value of the string type. NULL is returned if the input parameter is NULL.

  • Examples
    soundex('hello')='H400'

SUBSTRING_INDEX

  • Syntax
    string substring_index(string a, string SEP, int count))
  • Description

    This function truncates the string a to a substring from the first character to the nth delimiter, where n is specified by count. If count is a positive value, the string is truncated from left to right. Otherwise, the string is truncated from right to left.

  • Parameters

    a and sep are of the string type, whereas count is of the int type.

  • Return value

    Returns a value of the string type. NULL is returned if any input parameter is NULL.

  • Examples
    substring_index('https://www.alibabacloud.com, '.', 2)='https://www.alibabacloud'
    substring_index('https://www.alibabacloud.com', '.', -2)='alibabacloud.com'
    substring_index('https://www.alibabacloud.com', null, 2)=null
Note When a new data type, such as tinyint, smallint, int, float, varchar, timestamp, or binary, is used in a MaxCompute SQL statement, you must insert a set statement before the SQL statement so that the new data type can be enabled.
  • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and run it along with the SQL statement.
  • Project level: You can enable a new data type at the project level. The project owner can run the following command to configure a project:
     setproject odps.sql.type.system.odps2=true;
    For more information about setproject, see Other operations. For more information about the precautions you must take when enabling data types at the project level, see Date types.

JSON_TUPLE

  • Syntax
    STRING JSON_TUPLE(STRING json,STRING key1,STRING key2,...)
  • Description

    This function extracts specified strings from a standard JSON string according to a set of input keys (key1, key2, ...).

  • Parameters
    • json: a value of the string type, which indicates a standard string in the JSON format.
    • key: a value of the string type, which is used to describe the path in JSON. You can enter multiple keys at a time, but they cannot start with a dollar sign ($).
  • Return value

    Returns a value of the string type.

    Note
    • NULL is returned if json is null or in an invalid JSON format.
    • NULL is returned if key is null or invalid (which does not exist in JSON).
    • The corresponding string is returned if json is valid and key is also present.
    • This function can parse JSON data that contains Chinese characters.
    • This function can parse nested JSON data.
    • This function can parse JSON data that contains nested arrays.
    • The parsing action is equivalent to the execution of get_json_object along with set odps.sql.udf.getjsonobj.new=true;. When the same JSON string needs to be parsed multiple times, you need to call get_json_object multiple times. However, json_tuple allows you to enter multiple keys at a time and needs to parse the JSON string only once. Compared with get_json_object, json_tuple is more efficient.
    • json_tuple is a User-Defined Table-Generating Function (UDTF). When selecting other columns, use it along with LATERAL VIEW.