You can use string functions to process specified strings in MaxCompute SQL. This topic describes the syntax, parameters, and examples of string functions supported by MaxCompute SQL. It guides you through data development by using string functions.

The following table describes the string functions supported by MaxCompute SQL.

Function Description
ASCII Returns the ASCII code of the first character in a specified string.
BASE64 Converts a binary value into a Base64-encoded string.
CHAR_MATCHCOUNT Calculates the number of characters of String A that appear in String B.
CHR Converts an ASCII code to a character.
CONCAT Concatenates all specified strings and returns the final string.
FROM_JSON Returns data of the ARRAY, MAP, or STRUCT type based on the specified JSON string and output format information.
GET_JSON_OBJECT Extracts the specified string from a standard JSON string based on a specified method.
INSTR Returns the position of String A in String B.
IS_ENCODING Determines whether a string can be converted from Character set A to Character set B.
KEYVALUE Splits a string into key-value pairs, separates the key-value pairs, and then returns the value that corresponds to a specified key.
LENGTH Calculates the length of a string.
LENGTHB Calculates the length of a string in bytes.
MD5 Returns the MD5 value of a string.
PARSE_URL Parses a URL to return the specified information.
REGEXP_COUNT Returns the number of times a substring matches a specified pattern in a string from a specified start position.
REGEXP_EXTRACT Splits a string into groups based on a specified pattern and returns the string in a specified group.
REGEXP_INSTR Returns the start or end position of a substring that starts at a specified position and matches a specified pattern for a specified number of times.
REGEXP_REPLACE Replaces a substring in a string that matches a specified pattern for a specified number of times with another string.
REGEXP_SUBSTR Returns a substring in a string that matches a specified pattern for a specified number of times from a specified position.
SPLIT_PART Uses a delimiter to split a string into substrings and returns a substring of the specified part of the string.
SUBSTR Returns a substring of a string that starts from a specified position and has a specified length. The string is of the STRING type.
SUBSTRING Returns a substring of a string that starts from a specified position and has a specified length. The string is of the STRING or BINARY type.
TO_CHAR Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type.
TO_JSON Converts data of a complex data type into a JSON string.
TOLOWER Converts uppercase letters in a string into lowercase letters.
TOUPPER Converts lowercase letters in a string into uppercase letters.
TRIM Removes the spaces from the left and right sides of a string.
LTRIM Removes the space from the left side of a string.
RTRIM Removes the space from the right side of a string.
REPEAT Returns a new string that repeats a string for a specified number of times.
REVERSE Returns the characters of a string in reverse order.
UNBASE64 Converts a Base64-encoded string into a binary value.
CONCAT_WS Concatenates all input strings in an array by using a specified delimiter.
JSON_TUPLE Extracts strings from a standard JSON string based on a set of input keys.
LPAD Left pads a string to a specified length.
RPAD Right pads a string to a specified length.
REPLACE Replaces a substring in String A that matches String B with another substring.
SOUNDEX Converts a normal string into a string of the SOUNDEX type.
SUBSTRING_INDEX Truncates a string before a specified delimiter.
URL_ENCODE Encodes the input string in the application/x-www-form-urlencoded MIME format and returns the encoded string.
URL_DECODE Converts an input string that is of the application/x-www-form-urlencoded MIME format into a normal string.

ASCII

  • Syntax
    bigint ascii(string <str>)
  • Description

    Returns the ASCII code of the first character in a string specified by str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the BIGINT type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return the ASCII code of the first character in the string abcde. Sample statement:
      -- The return value is 97.
      select ascii('abcde'); 
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select ascii(null);

BASE64

  • Syntax
    string base64(binary <value>)
  • Description

    Converts a binary value into a Base64-encoded string.

  • Parameter

    value: required. A value of the BINARY type, which is the value that you want to convert.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert the binary result of cast ('alibaba' as binary) into a Base64-encoded string. Sample statement:
      -- The return value is YWxpYmFiYQ==.
      select base64(cast ('alibaba' as binary));
    • Example 2: The input parameter is set to null. Sample statement:
      --The return value is null.
      select base64(null);

CHAR_MATCHCOUNT

  • Syntax
    bigint char_matchcount(string <str1>, string <str2>)
  • Description

    Returns the number of characters that belong to str1 and appear in str2.

  • Parameters

    str1 and str2: required. Values of the STRING type. The values must be valid UTF-8 strings. If invalid characters (non-Unicode-encoded characters) are found during the comparison of the two strings, this function returns a negative value.

  • Return value

    A value of the BIGINT type is returned. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Calculate the number of characters that belong to the string aabc and appear in the string abcde. Sample statement:
      -- The return value is 4.
      select char_matchcount('aabc','abcde');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select char_matchcount(null,'abcde');

CHR

  • Syntax
    string chr(bigint <ascii>)
  • Description

    Converts a specified ASCII code into characters.

  • Parameter

    ascii: required. A value of the BIGINT type, which indicates the ASCII code. Valid values: 0 to 128. If the input value is out of the value range, an error is returned. If the input value is of the STRING, DOUBLE, or DECIMAL type, it is implicitly converted into a value of the BIGINT type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert the ASCII code 100 into characters. Sample statement:
      -- The return value is d.
      select chr(100);
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select chr(null);
    • Example 3: The input value is a character of the STRING type. Sample statement:
      -- The input value is implicitly converted into a value of the BIGINT type before calculation. The return value is d.
      select chr('100');

CONCAT

  • Syntax
    string concat(string <str1>, string <str2>[,...])
  • Description

    Concatenates all specified strings and returns the final string.

  • Parameters

    str1 and str2: required. Values of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If no input parameters are present or an input parameter is set to null, null is returned.

  • Examples
    • Example 1: Concatenate strings aabc and abcde. Sample statement:
      -- The return value is aabcabcde.
      select concat('aabc','abcde');
    • Example 2: The input value is empty. Sample statement:
      -- The return value is null.
      select concat();
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select concat('aabc','abcde',null);

FROM_JSON

  • Syntax
    from_json(<jsonStr>, <schema>)
  • Description

    Returns a value of the ARRAY, MAP, or STRUCT data type based on jsonStr and schema of a specified JSON string.

  • Parameters
    • jsonStr: required. The JSON string that you entered.
    • schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as array<bigint>, map<string, array<string>>, or struct<a:int, b:double, `C`:map<string,string>>.
      Note Keys in values of the STRUCT data type are case-sensitive. You can also specify values of the STRUCT data type in the format of a bigint, b double, which is equivalent to struct<a:bigint, b:double>.

      The following table describes the mappings between JSON data types and MaxCompute data types.

      JSON data type MaxCompute data type
      OBJECT STRUCT, MAP, and STRING
      ARRAY ARRAY and STRING
      NUMBER TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING
      BOOLEAN BOOLEAN and STRING
      STRING STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME
      NULL All types
      Note The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type of MaxCompute.
  • Return value

    A value of the ARRAY, MAP, or STRUCT type is returned.

  • Examples
    • Example 1: Convert a specified JSON string into a value of a specified data type. Sample statements:
      -- The return value is {"a":1,"b":0.8}.
      select from_json('{"a":1, "b":0.8}', 'a int, b double');
      -- The return value is "time":"26/08/2015"}.
      select from_json('{"time":"26/08/2015"}', 'time string');
      -- The return value is {"a":1,"b":0.8}.
      select from_json('{"a":1, "b":0.8}', 'a int, b double, c string');
      -- The return value is [1,2,3].
      select from_json('[1, 2, 3, "a"]', 'array<bigint>');
      -- The return value is {"d":"v","a":"1","b":"[1,2,3]","c":"{}"}.
      select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
    • Example 2: Use the map_keys and from_json functions to obtain all keys in the JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:
      -- The return value is ["a","b"].
      select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));

GET_JSON_OBJECT

  • Syntax
    string get_json_object(string <json>,string <path>)
  • Description

    Extracts the specified string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To avoid repeated calls, you can use the GET_JSON_OBJECT function with user-defined table-valued functions (UDTFs). For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters
    • json: required. A value of the STRING type. It indicates a standard JSON object.
    • path: required. A value of the STRING type. It indicates the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Different characters have the following meanings:
      • $: indicates the root node.
      • .: indicates a child node.
      • []: [number] indicates the array subscript. The array is represented in the format of key[sub1][sub2][sub3]…….
      • *: indicates the wildcard for [], which returns the entire array. An asterisk (*) cannot be escaped.
  • Return value
    • If the json parameter is empty or invalid, null is returned.
    • If the format of json is valid and path exists, the related string is returned.
    • Duplicate keys cannot appear in an object. For example, {a:1, a:0} is not supported. Otherwise, the string may fail to be parsed.
    • Emoji expressions are not supported.
    You can specify flagodps.sql.udf.getjsonobj.new for a session to determine how this function returns a value.
    • If you run the set odps.sql.udf.getjsonobj.new=false; command, this function escapes JSON reserved characters when it returns a value.

      For example, JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as '\n' and '\"', respectively.

    • If you run the set odps.sql.udf.getjsonobj.new=true; command, this function retains the original strings when it returns a value.

      We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. In this configuration, the return value is still a JSON string, which can be parsed as JSON data, without the need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).

    Note For MaxCompute projects that are created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, the GET_JSON_OBJECT function escapes JSON reserved characters when it returns a value. The following example helps you determine how the GET_JSON_OBJECT function returns a value in a MaxCompute project.
    select get_json_object('{"a":"[\\"1\\"]"}', '$.a');
    -- The return value when this function escapes JSON reserved characters:
    [\"1\"]
    
    -- The return value when this function retains the original strings:
    ["1"]

    You can submit a ticket to request the MaxCompute technical support team to enable the GET_JSON_OBJECT function. This way, you do not need to frequently specify flagodps.sql.udf.getjsonobj.new for a session. This function retains the original strings when it returns a value.

  • Examples
    • Example 1: Extract information from the JSON object src_json.json. Sample statements:
      -- Extract the content of the JSON object src_json.json.
      +----+
      json
      +----+
      {"store":
      {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
      "bicycle":{"price":19.95,"color":"red"}
      },
      "email":"amy@only_for_json_udf_test.net",
      "owner":"amy"
      }
      -- Extract the information of the owner field and return amy.
      select get_json_object(src_json.json, '$.owner') from src_json;
      -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}.
      select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
      -- Extract the information of the non-existent field and return null.
      select get_json_object(src_json.json, '$.non_exist_key') from src_json;
    • Example 2: Extract the information of a JSON object of the ARRAY type. Sample statements:
      -- The return value is 2222.
      select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
      -- The return value is ["h0","h1","h2"].
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- The return value is h1.
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
    • Example 3: The JSON value is empty or invalid. Sample statements:
      -- The return value is null.
      select get_json_object('','$.array[1][1]');
      -- The return value is null.
      select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');

INSTR

  • Syntax
    bigint instr(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])
  • Description

    Returns the position of substring str2 in string str1.

  • Parameters
    • str1: required. A value of the STRING type. It indicates a string that contains the substring that you want to search for. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • str2: required. A value of the STRING type. It indicates the substring that you want to search for. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • start_position: optional. A value of the BIGINT type. If the input value is of another data type, an error is returned. It indicates the character in str1 from which the search starts. The default start position is the first character, marked as 1. If start_position is a negative value, the search starts from the end to the start of the string and the last character is –1.
    • nth_appearance: optional. A value of the BIGINT type, which is greater than 0. It indicates the nth time that the substring str2 appears in the string str1. If the value of nth_appearance is of another data type or is less than or equal to 0, an error is returned.
  • Return value
    • A value of the BIGINT type is returned.
    • If str2 is not found in str1, 0 is returned.
    • If any input parameter is set to null, null is returned.
    • If str2 is an empty string, the matching always succeeds. For example, 1 is returned for select instr('abc','');.
  • Examples
    • Example 1: Return the position of the substring e in the string Tech on the net. Sample statement:
      -- The return value is 2.
      select instr('Tech on the net', 'e');
    • Example 2: Return the position of the substring ec in the string Tech on the net. Sample statement:
      -- The return value is 6.
      select instr('Tech on the net', 'on');
    • Example 3: Return the position of the second occurrence in which the substring e appears in the string Tech on the net from the third character. Sample statement:
      -- The return value is 14.
      select instr('Tech on the net', 'e', 3, 2);
    • Example 4: An input value is null. Sample statement:
      -- The return value is null.
      select instr('Tech on the net', null);

IS_ENCODING

  • Syntax
    boolean is_encoding(string <str>, string <from_encoding>, string <to_encoding>)
  • Description

    Determines whether the input string str can be converted from the character set specified by from_encoding to the character set specified by to_encoding. It can be used to determine whether the input string is garbled. In normal cases, from_encoding is set to utf-8 and to_encoding is set to gbk.

  • Parameters
    • str: required. A value of the STRING type. An empty string can belong to any character set.
    • from_encoding and to_encoding: required. Values of the STRING type. from_encoding indicates the source character set, and to_encoding indicates the destination character set.
  • Return value

    A value of the BOOLEAN type is returned. If str can be converted, True is returned. Otherwise, False is returned. If any input parameter is set to null, null is returned.

KEYVALUE

  • Syntax
    keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)
    keyvalue(string <str>,string <key>) 
  • Description

    Splits the string str into key-value pairs by split1, separates the key-value pairs by split2, and then returns the value of the specified key.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to split.
    • split1 and split2: optional. Values of the STRING type. The strings that are used as delimiters to split the source string. If you do not specify the two parameters, the default value of split1 is a semicolon (;) and the default value of split2 is a colon (:). If a key-value pair that is obtained after the source string is split by split1 contains multiple delimiters specified by split2, the returned result is undefined.
    • key: required. A value of the STRING type. After the source string is split by split1 and split2 in sequence, the value that corresponds to the key is returned.
  • Return value
    • A value of the STRING type is returned.
    • If split1 or split2 is set to null, null is returned.
    • If str or key is set to null or if no key matches, null is returned.
    • If multiple key-value pairs match, the value that corresponds to the first matched key is returned.
  • Examples
    • Example 1: Split the string 0:1\;1:2 into key-value pairs and return the value that corresponds to key 1. Sample statement:
      -- The return value is 2.
      select keyvalue('0:1\;1:2', 1);
      The split1 and split2 parameters are not specified. The default value of split1 is a semicolon (;) and the default value of split2 is a colon (:).
      After the source string is split by split1, the key-value pairs are 0:1\,1:2. After the key-value pairs are split by split2, they change to the following form:
      0 1/  
      1 2
      The value 2 that corresponds to key 1 is returned.
    • Example 2: Split the string \;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\; into key-value pairs by \;, separate the values from keys by :, and then return the value that corresponds to key tf. Sample statement:
      -- The return value is 21910.
      select keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf");
      After the source string \;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\; is split by \;, the following key-value pairs are generated:
      decreaseStore:1, xcard:1, isB2C:1, tf:21910, cart:1, shipping:2, pf:0, market:shoes, instPayAmount:0 
      After the key-value pairs are separated by :, the following keys and values are generated:
      decreaseStore 1  
      xcard 1  
      isB2C 1  
      tf 21910  
      cart 1  
      shipping 2  
      pf 0  
      market shoes  
      instPayAmount 0
      The value 21910 that corresponds to key tf is returned.

LENGTH

  • Syntax
    bigint length(string <str>)
  • Description

    Returns the length of the string str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the BIGINT type. If the input parameter is set to null, null is returned. If the input parameter is not UTF-8 encoded, –1 is returned.

  • Examples
    • Example 1: Return the length of the string Tech on the net. Sample statement:
      -- The return value is 15.
      select length('Tech on the net');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select length(null);

LENGTHB

  • Syntax
    bigint lengthb(string <str>)
  • Description

    Returns the length of a string specified by str in bytes.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the BIGINT type. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return the length of the string Tech on the net in bytes. Sample statement:
      -- The return value is 15.
      select lengthb('Tech on the net');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select lengthb(null);

MD5

  • Syntax
    string md5(string <str>)
  • Description

    Returns the MD5 value of a string specified by str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return the MD5 value of the string Tech on the net. Sample statement:
      -- The return value is ddc4c4796880633333d77a60fcda9af6.
      select md5('Tech on the net');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select md5(null);

PARSE_URL

  • Syntax
    string parse_url(string <url>, string <part>[, string <key>])
  • Description

    Parses url and extracts information based on the value specified by part.

  • Parameters
    • url: required. A URL. If the URL is invalid, an error is returned.
    • pat: required. A value of the STRING type. Valid values: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. The value of this parameter is not case sensitive. If it is none of the preceding values, an error is returned.
    • key: optional. If part is set to QUERY, this function returns the value that corresponds to the key.
  • Return value

    A value of the STRING type is returned. If url or part is set to null, null is returned.

  • Examples
    -- The return value is example.com.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST');
    -- The return value is /over/there/index.dtb.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PATH');
    -- The return value is animal.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'QUERY', 'type');
    -- The return value is nose.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'REF');
    -- The return value is file.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PROTOCOL');
    -- The return value is username:password@example.com:8042.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'AUTHORITY');
    -- The return value is username:password.
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'USERINFO');

REGEXP_COUNT

  • Syntax
    bigint regexp_count(string <source>, string <pattern>[, bigint <start_position>])
  • Description

    Returns the number of times a substring matches a given pattern in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. It is a string that contains the substring that you want to search for. If the value is of another data type, an error is returned.
    • pattern: required. A constant of the STRING type. It indicates the pattern that a specified substring needs to match. If pattern is an empty string or is of another data type, an error is returned.
    • start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If the value is of another data type or less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1, which indicates that the search starts from the first character of the source string.
  • Return value

    A value of the BIGINT type is returned. If no substring is matched, 0 is returned. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Calculate the number of substrings in the abababc that match the specified rule from the specified position. Sample statements:
      -- The return value is 1.
      select regexp_count('abababc', 'a.c');
      -- The return value is 2.
      select regexp_count('abababc', '[[:alpha:]]{2}', 3);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select regexp_count('abababc', null);

REGEXP_EXTRACT

  • Syntax
    string regexp_extract(string <source>, string <pattern>[, bigint <occurrence>])
  • Description

    Splits the source string into groups based on a given pattern and returns the string in the nth group specified by occurrence.

  • Parameters
    • source: required. The string that you want to split, which is of the STRING type.
    • pattern: required. A constant of the STRING type. If pattern is an empty string or no group is specified in pattern, an error is returned.
    • occurrence: optional. A constant of the BIGINT type. The value of this parameter must be greater than or equal to 0. If it is of another data type or is less than 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the string in the first group is returned. If occurrence is set to 0, all substrings that match pattern are returned.
    Note Data is stored in the UTF-8 format. Chinese characters can be represented in hexadecimal. They are encoded in the range of [\x{4e00}-\x{9fa5}].
  • Return value

    A value of the STRING type. If any input parameter is set to null, null is returned.

REGEXP_INSTR

  • Syntax
    bigint regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])
  • Description

    Returns the start or end position of the substring that matches pattern at the nth occurrence specified by occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. The source string.
    • pattern: required. A constant of the STRING type. If pattern is an empty string, an error is returned.
    • start_position: optional. A constant of the BIGINT type. It indicates the position from which the search starts. If you do not specify this parameter, the default value of this parameter is 1. If the value is of another data type or is less than or equal to 0, an error is returned.
    • occurrence: optional. A constant of the BIGINT type. If you do not specify this parameter, the default value 1 is used, indicating the position where a substring matches pattern in the search for the first time. If the value is of another data type or less than or equal to 0, an error is returned.
    • return_option: optional. A constant of the BIGINT type. It indicates that the start or end position of the substring that matches a specified pattern is returned. Valid values: 0 and 1. Default value: 0. The value 0 indicates that the start position of the substring that matches a specified pattern is returned. 1 indicates that the end position of the substring that matches a specified pattern is returned. If it is of another data type or is an invalid value, an error is returned.
  • Return value

    A value of the BIGINT type is returned. It is the start or end position of the matched substring in the source string based on the type specified by return_option. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return the start position of the substring that matches o[[:alpha:]]{1} at the second occurrence in the i love www.taobao.com string from the third character. Sample statement:
      -- The return value is 14.
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2);
    • Example 2: Return the end position of the substring that matches o[[:alpha:]]{1} at the second occurrence in the i love www.taobao.com string from the third character. Sample statement:
      -- The return value is 16.
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2, 1);
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select regexp_instr('i love www.taobao.com', null, 3, 2);

REGEXP_REPLACE

  • Syntax
    string regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])
  • Description

    Replaces the substring that matches a given pattern at the nth occurrence specified by occurrence in the source string with a string specified by replace_string and returns a result.

  • Parameters
    • source: required. A value of the STRING type. It indicates the string that you want to split.
    • pattern: required. A constant of the STRING type. It indicates the pattern that a string matches. If pattern is an empty string, an error is returned.
    • replace_string: required. A value of the STRING type. The value is used to replace the string that matches the pattern.
    • occurrence: optional. A constant of the BIGINT type, which must be greater than or equal to 0. The value of this parameter indicates that the string that matches the specified pattern at the nth occurrence specified by occurrence is replaced with replace_string. If this parameter is set to 0, all the substrings that match the specified pattern are replaced. If it is of another data type or is less than 0, an error is returned. Default value: 0.
  • Return value
    • A value of the STRING type is returned. If the referenced group does not exist, no substring is replaced.
    • If any input parameter is set to null, null is returned.
    • If replace_string is set to null and a substring matches the given pattern, null is returned.
    • If replace_string is set to null but no substring matches the given pattern, the original string is returned.
  • Examples
    • Example 1: Replace all strings in the 123.456.7890 string that match the ([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4}) with (\\1)\\2-\\3. Sample statement:
      -- The return value is (123)456-7890.
      select regexp_replace('123.456.7890', '([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})',
      '(\\1)\\2-\\3', 0);
    • Example 2: Replace the substring that matches the specified pattern in the abcd string. Sample statement:
      -- The return value is a b c d.
      select regexp_replace('abcd', '(.)', '\\1 ', 0);
      -- The return value is a bcd.
      select regexp_replace('abcd', '(.)', '\\1 ', 1);
      -- The return value is abcd.
      select regexp_replace('abcd', '(.)', '\\2', 1);
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select regexp_replace('abcd', '(.)', null, 0);
    • Example 4: The referenced group does not exist. Sample statement:
      -- Only one group is defined in the pattern and the referenced group does not exist.
      -- Try to avoid this situation. The result of referencing a nonexistent group is not defined.
      regexp_replace("abcd", "(.*)(.)$", "\\2", 0) = "d"
      regexp_replace("abcd", "a", "\\1", 0) = "bcd"
      -- The referenced group is defined in the pattern. Therefore, "\1" references a nonexistent group.
      -- Try to avoid this situation. The result of referencing a nonexistent group is not defined. 

REGEXP_SUBSTR

  • Syntax
    string regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])
  • Description

    Returns a substring that matches a given pattern at the nth occurrence specified by occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. It indicates a string in which you want to search.
    • pattern: required. A constant of the STRING type. It indicates a pattern that is used for a substring match. If pattern is an empty string, an error is returned.
    • start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If it is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the search starts from the first character of the source string.
    • occurrence: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If it is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the first matched substring is returned.
  • Return value

    A value of the STRING type. If any input parameter is set to null, null is returned. If no substring matches the specified pattern, null is returned.

  • Examples
    • Example 1: Return the substring in the I love aliyun very much string that matches a specified pattern. Sample statement:
      -- The return value is aliyun.
      select regexp_substr('I love aliyun very much', 'a[[:alpha:]]{5}');
      -- The return value is have.
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1);
      -- The return value is 2.
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select regexp_substr('I love aliyun very much', null);

SPLIT_PART

  • Syntax
    string split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])
  • Description

    Uses a delimiter specified by separator to split a string specified by str, and returns a substring that starts from the character specified by start and ends with the character specified by end.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to split. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • separator: required. A constant of the STRING type. It indicates the delimiter that is used to split a string. It can be a character or a string. If it is neither a character nor a string, an error is returned.
    • str: required. A constant of the BIGINT type. The value of this parameter must be greater than 0. If it is of another data type or is not a constant, an error is returned. It indicates the start position of the substring to be returned. The position starts from 1. If end is not specified, the substring that starts from the character specified by start is returned.
    • end: a constant of the BIGINT type, which must be greater than or equal to the value of start. Otherwise, an error is returned. It indicates the end position of the substring to be returned. If the value is not a constant or is of another data type, an error is returned. If you do not specify this parameter, the last part is used.
  • Return value
    • A value of the STRING type is returned.
    • If start is set to a value greater than the number of substrings, for example, the string has 6 substrings but the start value is greater than 6, an empty string is returned.
    • If separator is absent from a string specified by str and start is set to 1, the entire string specified by str is returned. If str is an empty string, an empty string is returned.
    • If separator is an empty string, the original string specified by str is returned.
    • If the value of end is greater than the number of substrings, the substring starting from the character specified by start is returned.
    • If any input parameter except separator is set to null, null is returned.
  • Examples
    • Example 1: Use commas (,) as a delimiter to split the string a,b,c,d and return the substring that matches the specified rule. Sample statement:
      -- The return value is a.
      select split_part('a,b,c,d', ',', 1);
      -- The return value is a,b.
      select split_part('a,b,c,d', ',', 1, 2);
    • Example 2: The value of start is greater than the actual number of substrings after the specified string is split into substrings. Sample statement:
      -- The return value is an empty string.
      select split_part('a,b,c,d', ',', 10);
    • Example 3: separator does not exist in the string specified by str. Sample statements:
      -- The return value is a,b,c,d.
      select split_part('a,b,c,d', ':', 1);
      -- The return value is an empty string.
      select split_part('a,b,c,d', ':', 2);
    • Example 4: separator is an empty string. Sample statement:
      -- The return value is a,b,c,d.
      select split_part('a,b,c,d', '', 1);
    • Example 5: The value of end is greater than the actual number of substrings after the specified string is split into substrings. Sample statement:
      -- The return value is b,c,d.
      select split_part('a,b,c,d', ',', 2, 6);
    • Example 6: An input parameter except separator is set to null. Sample statement:
      -- The return value is null.
      select split_part('a,b,c,d', ',', null);

SUBSTR

  • Syntax
    string substr(string <str>, bigint <start_position>[, bigint <length>])
  • Description

    Returns a substring that starts from start_position in a string specified by str and has a length specified by length.

  • Parameters
    • str: required. A value of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • start_position: required. A value of the BIGINT type. Default value: 1.
      • Hive-compatible data type edition: If start_position is set to 0, the return value is the same as that when this parameter is set to 1.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If start_position is set to 0, null is returned.
    • length: optional. A value of the BIGINT type. It indicates the length of a substring. The value of this parameter must be greater than 0. If the input value is of another data type or is less than or equal to 0, an error is returned.
  • Return value

    A value of the STRING type. If any input parameter is set to null, null is returned.

    Note If length is not specified, the substring from the start position to the end of str is returned.
  • Examples
    • Example 1: Return a substring with the specified length starting from the specified position in the string abc. Sample statements:
      -- The return value is bc.
      select substr('abc', 2);
      -- The return value is b.
      select substr('abc', 2, 1);
      -- The return value is bc.
      select substr('abc',-2 , 2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select substr('abc', null);

SUBSTRING

  • Syntax
    string substring(string|binary <str>, int <start_position>[, int <length>])
  • Description

    Returns a substring that starts from start_position in a string specified by str and has a length specified by length.

  • Parameters
    • str: required. A value of the STRING or BINARY type. If the input value is of another data type, null or an error is returned.
    • start_position: required. A value of the INT type. The start position starts from 1. If start_position is set to 0, an empty string is returned. If start_position is a negative value, the counting starts from the end to the start of the string and the last character is –1. If the input value is of another data type, an error is returned.
    • length: optional. A value of the BIGINT type. It indicates the length of a substring. The value of this parameter must be greater than 0. If the input value is of another data type or is less than or equal to 0, an error is returned.
  • Return value

    A value of the STRING type. If any input parameter is set to null, null is returned.

    Note If length is not specified, the substring from the start position to the end of the string specified by str is returned.
  • Examples
    • Example 1: Return a substring with the specified length starting from the specified position in the string abc. Sample statements:
      -- The return value is bc.
      select substring('abc', 2);
      -- The return value is b.
      select substring('abc', 2, 1);
      -- The return value is bc.
      select substring('abc',-2,2);
      -- The return value is ab.
      select substring('abc',-3,2);
      -- The return value is 001.
      substring(bin(2345), 2, 3);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select substring('abc', null, null);

TO_CHAR

  • Syntax
    string to_char(boolean <value>)
    string to_char(bigint <value>)
    string to_char(double <value>)
    string to_char(decimal <value>)
  • Description

    Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type.

  • Parameter

    value: required. A value of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert values into the STRING type. Sample statements:
      -- The return value is 123.
      select to_char(123);
      -- The return value is TRUE.
      select to_char(true);
      -- The return value is 1.23.
      select to_char(1.23);
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select to_char(null);

TO_JSON

  • Syntax
    to_json(<expr>)
  • Description

    Converts an expression of a given complex data type into a JSON string

  • Parameters
    expr: required. The expression of the ARRAY, MAP, or STRUCT type.
    Note If the input expression is of the STRUCT type (struct<key1:value1, key2:value2>):
    • All keys are converted into lowercase letters when you convert the expression into a JSON string.
    • If value is set to null, the key-value pair to which value belongs is not included in the JSON string that is returned. For example, if value2 is set to null, key2:value2 is not included in the JSON string that is returned.
  • Examples
    • Example 1: Convert an expression of a given data type into a JSON string. Sample statements:
      -- The return value is {"a":1,"b":2}.
      select to_json(named_struct('a', 1, 'b', 2));
      -- The return value is {"time":"26/08/2015"}.
      select to_json(named_struct('time', "26/08/2015"));
      -- The return value is [{"a":1,"b":2}].
      select to_json(array(named_struct('a', 1, 'b', 2)));
      -- The return value is {"a":{"b":1}}.
      select to_json(map('a', named_struct('b', 1)));
      -- The return value is {"a":1}.
      select to_json(map('a', 1));
      -- The return value is [{"a":1}].
      select to_json(array((map('a', 1))));
    • Example 2: The input expression is of the STRUCT type. Sample statements:
      -- The return value is {"a":"B"}. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters.
      select to_json(named_struct("A", "B"));
      -- The return value is {"k2":"v2"}. The key-value pair to which null belongs is not included in the JSON string that is returned.
      select to_json(named_struct("k1", cast(null as string), "k2", "v2"));

TOLOWER

  • Syntax
    string tolower(string <source>)
  • Description

    Converts uppercase letters in a string specified by source into lowercase letters.

  • Parameter

    source: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert uppercase letters in a string into lowercase letters. Sample statements:
      -- The return value is abcd.
      select tolower('aBcd');
      -- The return value is china fighting.
      select tolower ('China Fighting');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select tolower(null);

TOUPPER

  • Syntax
    string toupper(string <source>)
  • Description

    Converts lowercase letters in a string specified by source into uppercase letters.

  • Parameter

    source: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert lowercase letters in a string into uppercase letters. Sample statements:
      -- The return value is ABCD.
      select toupper('aBcd');
      -- The return value is CHINA FIGHTING.
      select toupper('China Fighting');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select toupper(null);

TRIM

  • Syntax
    string trim(string <str>)
  • Description

    Removes the spaces from the left and right sides of a string specified by str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Remove the spaces from the left and right sides of the string I love aliyun very much . Sample statement:
      -- The return value is "I love aliyun very much".
      select trim(' I love aliyun very much ');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select trim(null);

LTRIM

  • Syntax
    string ltrim(string <str>)
  • Description

    Removes the space from the left side of a string specified by str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Remove the space from the left side of the string I love aliyun very much . Sample statement:
      -- The return value is "I love aliyun very much ".
      select ltrim(' I love aliyun very much ');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select ltrim(null);

RTRIM

  • Syntax
    string rtrim(string <str>)
  • Description

    Removes the space from the right side of a string specified by str.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Remove the space from the right side of the string I love aliyun very much . Sample statement:
      -- The return value is " I love aliyun very much".
      select rtrim(' I love aliyun very much ');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select rtrim(null);

REPEAT

  • Syntax
    string repeat(string <str>, bigint <n>)
  • Description

    Returns a new string with a specified number of copies of a string specified by str.

  • Parameters
    • str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • n: required. A value of the BIGINT type. The returned string cannot exceed 2 MB in length. If the input value is empty, an error is returned.
  • Return value

    A value of the STRING type is returned. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return a new string that repeats the string abc for 5 times. Sample statement:
      -- The return value is abcabcabcabcabc.
      select repeat('abc', 5); 
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select repeat('abc', null);

REVERSE

  • Syntax
    string reverse(string <str>)
  • Description

    Returns the characters of a string in reverse order.

  • Parameter

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Return a string whose characters are in reverse order of the string I love aliyun very much. Sample statement:
      -- The return value is hcum yrev nuyila evol I.
      select reverse('I love aliyun very much');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select reverse(null);

UNBASE64

  • Syntax
    binary unbase64(string <str>)
  • Description

    Converts a Base64-encoded string specified by str into a binary value.

  • Parameter

    str: required. A value of the STRING type. It is a Base64-encoded string that you want to convert.

  • Return value

    A value of the BINARY type is returned.

  • Examples
    • Example 1: Convert the string YWxpYmFiYQ== into a binary value. Sample statement:
      -- The return value is alibaba.
      select unbase64('YWxpYmFiYQ==');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select ascii(null);

Additional functions of MaxCompute V2.0

MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
  • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
  • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
    setproject odps.sql.type.system.odps2=true;
    For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.

CONCAT_WS

  • Syntax
    string concat_ws(string <separator>, string <str1>, string <str2>[,...])
    string concat_ws(string <separator>, array)
  • Description

    Concatenates all input strings in an array by using a specified delimiter. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • separator: required. A delimiter of the STRING type. If you do not specify this parameter, an error is returned.
    • str1 and str2: At least two strings must be specified. Values of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
  • Return value

    A value of the STRING type is returned. If no input parameters are present or any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Concatenate the strings name and hanmeimei by using a colon (:). Sample statement:
      -- The return value is name:hanmeimei.
      select concat_ws(':','name','hanmeimei');
      -
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select concat_ws(':','avg',null,'34');

JSON_TUPLE

  • Syntax
    string json_tuple(string <json>, string <key1>, string <key2>,...)
  • Description

    Extracts strings from a standard JSON string based on a set of input keys, such as key1 and key2. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • json: required. A value of the STRING type. It indicates a standard JSON string.
    • key: required. A value of the STRING type. It is used to describe the path of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time.
  • Return value

    A value of the STRING type is returned.

    Note
    • If json is empty or invalid, null is returned.
    • If key is empty, invalid, or does not exist in the JSON string, null is returned.
    • If json is valid and key exists, the related string is returned.
    • This function can parse JSON data that contains Chinese characters.
    • This function can parse nested JSON data.
    • This function can parse JSON data that contains nested arrays.
    • The parsing action is equivalent to the execution of GET_JSON_OBJECT() along with set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT() function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE() function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE() is more efficient than GET_JSON_OBJECT().
    • JSON_TUPLE() is a user-defined table-valued function (UDTF). If you want to select some columns from a table, use JSON_TUPLE() along with LATERAL VIEW.

LPAD

  • Syntax
    string lpad(string <str1>, int <length>, string <str2>)
  • Description

    Left pads str1 with str2 to a specified length. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to left pad.
    • length: required. A value of the INT type. It indicates the number of characters that are used for left padding.
    • str: required. It indicates the string that you use to left pad another string.
  • Return value

    A value of the STRING type is returned. If the value of length is less than the number of characters in str1, this function truncates str1 from the left to obtain a string with the number of characters specified by length. If length is set to 0, an empty string is returned.

    If no input parameters are present or any input parameter is null, null is returned.

  • Examples
    • Example 1: Left pad the string abcdefgh with the string12 to obtain a string with 10 characters in length. Sample statement:
      -- The return value is 12abcdefgh.
      select lpad('abcdefgh', 10, '12');
    • Example 2: Left pad the string abcdefgh with the string12 to obtain a string with 5 characters in length. Sample statement:
      -- The return value is abcde.
      select lpad('abcdefgh', 5, '12');
    • Example 3: The value of length is 0. Sample statement:
      -- The return value is an empty string.
      select lpad('abcdefgh' ,0, '12'); 
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select lpad(null ,0, '12');

RPAD

  • Syntax
    string rpad(string <str1>, int <length>, string <str2>)
  • Description

    Right pads str1 with str2 to a specified length. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to right pad.
    • length: required. A value of the INT type. It indicates the number of characters that are used for right padding.
    • str: required. It indicates the string that you use to right pad another string.
  • Return value

    A value of the STRING type is returned. If the value of length is less than the number of characters in str1, this function truncates str1 from the left to obtain a string with the number of characters specified by length. If length is set to 0, an empty string is returned.

    If no input parameters are present or any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Right pad the string abcdefgh with the string12 to obtain a string with 10 characters in length. Sample statement:
      -- The return value is abcdefgh12.
      select rpad('abcdefgh', 10, '12');
    • Example 2: Right pad the string abcdefgh with the string12 to obtain a string with 5 characters in length. Sample statement:
      -- The return value is abcde.
      select rpad('abcdefgh', 5, '12');
    • Example 3: The value of length is 0. Sample statement:
      -- The return value is an empty string.
      select rpad('abcdefgh' ,0, '12'); 
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select rpad(null ,0, '12');

REPLACE

  • Syntax
    string replace(string <str>, string <old>, string <new>)
  • Description

    Replaces the part of a string specified by str that is exactly the same as a string specified by old with a string specified by new, and returns the obtained string. If no part of the string specified by str is exactly the same as the string specified by old, the original string is returned. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to replace.
    • old: required. It indicates the string that you use for comparison.
    • new: required. It indicates the string that you use to replace the original string.
  • Return value

    A value of the STRING type. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Replace the part of the string ababab that is exactly the same as the string abab with the string12. Sample statement:
      -- The return value is 12ab.
      select replace('ababab','abab','12');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select replace('123abab456ab',null,'abab');

SOUNDEX

  • Syntax
    string soundex(string <str>)
  • Description

    Converts a normal string into a string of the SOUNDEX type.

  • Parameter

    str: required. A value of the STRING type. It indicates the string that you want to convert. This function is an additional function of MaxCompute V2.0.

  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    • Example 1: Convert the string hello into a string of the SOUNDEX type. Sample statement:
      -- The return value is H400.
      select soundex('hello');
    • Example 2: The input parameter is set to null. Sample statement:
      -- The return value is null.
      select soundex(null);

SUBSTRING_INDEX

  • Syntax
    string substring_index(string <str>, string <separator>, int <count>)
  • Description

    Truncates the string str to a substring from the first character to the nth delimiter. n is specified by count. If count is a positive value, the string is truncated from left to right. If count is a negative value, the string is truncated from right to left. This function is an additional function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. It indicates the string that you want to truncate.
    • separator: required. A delimiter of the STRING type.
    • count: required. A value of the INT type. It indicates the position of the delimiter.
  • Return value

    A value of the STRING type is returned. If any input parameter is set to null, null is returned.

  • Examples
    • Example 1: Truncate the string https://www.alibabacloud.com. Sample statements:
      -- The return value is https://www.alibabacloud.
      select substring_index('https://www.alibabacloud.com', '.', 2);
      -- The return value is alibabacloud.com.
      select substring_index('https://www.alibabacloud.com', '.', -2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null.
      select substring_index('https://www.alibabacloud.com', null, 2);

URL_ENCODE

  • Syntax
    string url_encode(string <input>[, string <encoding>])
  • Description
    Encodes the input string in the application/x-www-form-urlencoded MIME format and returns the encoded string. The encoding format must comply with the following rules:
    • All letters remain unchanged.
    • Dots (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
    • Spaces are converted into plus signs (+).
    • Other characters are converted into byte values based on the specified encoding. Each byte value is then represented in the format of %xy. xy is the hexadecimal representation of the character value.
    This function is an additional function of MaxCompute V2.0.
  • Parameters
    • input: required. A value of the STRING type. It indicates the string that you want to enter.
    • encoding: optional. It indicates the encoding format, which can be GBK or UTF-8. If you do not specify this parameter, the default value is UTF-8.
  • Return value

    A value of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    -- The return value is %E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2F+%28fdsf%29.
    select url_encode('Example for url_encode:// (fdsf)');
    -- The return value is Example+for+url_encode+%3A%2F%2F+dsf%28fasfs%29.
    select url_encode('Example for url_encode:// dsf(fasfs)', 'GBK');

URL_DECODE

  • Syntax
    string url_decode(string <input>[, string <encoding>])
  • Description
    Converts an input string from the application/x-www-form-urlencoded MIME format into a normal string. This is the inverse function of url_encoding. The encoding format must comply with the following rules:
    • All letters remain unchanged.
    • Dots (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
    • Plus signs (+) are converted into spaces.
    • The %xy formatted sequence is converted into byte values. Consecutive byte values are decoded to the related strings based on the value of encoding.
    • Other characters remain unchanged.
    This function is an additional function of MaxCompute V2.0.
  • Parameters
    • input: required. A value of the STRING type. It indicates the string that you want to enter.
    • encoding: optional. It indicates the encoding format, which can be GBK or UTF-8. If you do not specify this parameter, the default value is UTF-8.
  • Return value

    A UTF-8 encoded string of the STRING type is returned. If the input parameter is set to null, null is returned.

  • Examples
    -- The return value is for url_decode:// (fdsf).
    select url_decode('%E7%A4%BA%E4%BE%8Bfor+url_decode%3A%2F%2F+%28fdsf%29');
    -- The return value is Example for URL_DECODE:// dsf(fasfs).
    select url_decode('Example+for+url_decode+%3A%2F%2F+dsf%28fasfs%29', 'GBK');