You can use string functions to process specified strings in MaxCompute SQL. This topic describes the syntax and parameters of string functions that are supported by MaxCompute SQL, and provides examples on how to use string functions. This topic guides you through data development by using string functions.

The following table describes the string functions that are supported by MaxCompute SQL.

FunctionDescription
ASCIIReturns the ASCII code of the first character in a specified string.
CHAR_MATCHCOUNTCalculates the number of characters of String A that appear in String B.
CHRConverts an ASCII code into a character.
CONCATConcatenates all the specified strings and returns the final string.
CONCAT_WSConcatenates all input strings in an array by using a specified delimiter.
ENCODEEncodes a string in the specified encoding format.
FIND_IN_SETReturns the position of the specified string among multiple strings that are separated by commas (,).
FORMAT_NUMBERConverts a number into a string in the specified format.
FROM_JSONReturns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format.
GET_JSON_OBJECTExtracts a single string from a standard JSON string by using a specific method.
INSTRReturns the position of String A in String B.
IS_ENCODINGDetermines whether a string can be converted from one character set to another character set.
KEYVALUESplits a string into key-value pairs, separates the key-value pairs, and then returns the value that corresponds to a specified key.
KEYVALUE_TUPLESplits a string into key-value pairs, separates the key-value pairs, and then returns the values that correspond to the keys.
LENGTHCalculates the length of a string.
LENGTHBCalculates the length of a string in bytes.
LOCATEReturns the position of a specified string in another string.
LTRIMRemoves the characters from the left side of a string.
MASK_HASHReturns a hash value based on a string expression.
MD5Returns the MD5 value of a string.
PARSE_URLParses a URL and returns the specified part of the URL.
PARSE_URL_TUPLEParses a URL and returns multiple parts of the URL.
REGEXP_COUNTReturns the number of substrings that match a specified pattern from a specified position.
REGEXP_EXTRACTSplits a string into groups based on a specified pattern and returns the string in a specified group.
REGEXP_INSTRReturns the start or end position of a substring that starts at a specified position and matches a specified pattern for a specified number of times.
REGEXP_REPLACEUses a string to replace a substring of another string if the substring matches a specified pattern for a specified number of times.
REGEXP_SUBSTRReturns a substring in a string that matches a specified pattern for a specified number of times from a specified position.
REPEATReturns a string that repeats a specified string for a specified number of times.
REVERSEReturns the characters of a string in reverse order.
RTRIMRemoves the characters from the right side of a string.
SPACEGenerates a space string.
SPLIT_PARTUses a delimiter to split a string into substrings and returns a substring of the specified part of the string.
SUBSTRReturns a substring that has a specified length from a specified position of a string. The string is of the STRING type.
SUBSTRINGReturns a substring that has a specified length from a specified position of a string. The string is of the STRING or BINARY type.
TO_CHARConverts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type.
TO_JSONConverts data of a complex data type into a JSON string.
TOLOWERConverts uppercase letters in a string into lowercase letters.
TOUPPERConverts lowercase letters in a string into uppercase letters.
TRIMRemoves the characters from both of the left and right sides of a string.
URL_DECODEConverts an input string that is in the application/x-www-form-urlencoded MIME format into a standard string.
URL_ENCODEEncodes the input string in the application/x-www-form-urlencoded MIME format and returns the encoded string.
JSON_TUPLEExtracts strings from a standard JSON string based on a set of input keys.
LPADLeft pads a string to a specified length.
RPADRight pads a string to a specified length.
REPLACEReplaces a substring in String A that matches String B with another substring.
SOUNDEXConverts a string into a string of the SOUNDEX type.
SUBSTRING_INDEXTruncates a string from a specified delimiter.
TRANSLATEReplaces the part of String A that appears in String B with String C.
REGEXP_EXTRACT_ALLFinds all substrings that match the pattern of a regular expression in a string and returns the substrings as an array.

Usage notes

MaxCompute V2.0 provides extension date functions. If the functions that you use involve new data types, you must run one of the following SET commands to enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
  • Session level: To use the MaxCompute V2.0 data type edition, you must add set odps.sql.type.system.odps2=true; before the SQL statement that you want to execute, and commit and execute them together.
  • Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. Run the following command:
    setproject odps.sql.type.system.odps2=true;
    For more information about setproject, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.

ASCII

  • Syntax
    bigint ascii(string <str>)
  • Description

    Returns the ASCII code of the first character in a string specified by str.

  • Parameters

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str is null, null is returned.
  • Examples:
    • Example 1: Return the ASCII code of the first character in the string abcde. Sample statement:
      -- The return value is 97. 
      select ascii('abcde'); 
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select ascii(null);

CHAR_MATCHCOUNT

  • Syntax
    bigint char_matchcount(string <str1>, string <str2>)
  • Description

    Returns the number of characters that belong to str1 and appear in str2.

  • Parameters

    str1 and str2: required. Values of the STRING type. The values must be valid UTF-8 strings. If invalid characters (non-Unicode-encoded characters) are found during the comparison of the two strings, this function returns a negative value.

  • Return value

    A value of the BIGINT type is returned. If the value of str1 or str2 is null, null is returned.

  • Examples:
    • Example 1: Calculate the number of characters that belong to the string aabc and appear in the string abcde. Sample statement:
      -- The return value is 4. 
      select char_matchcount('aabc','abcde');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select char_matchcount(null,'abcde');

CHR

  • Syntax
    string chr(bigint <ascii>)
  • Description

    Converts a specified ASCII code into characters.

  • Parameters

    ascii: required. An ASCII code of the BIGINT type. Valid values: 0 to 128. If the input value is of the STRING, DOUBLE, or DECIMAL type, the value is implicitly converted into a value of the BIGINT type before calculation.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of ascii is not within the valid range, an error is returned.
    • If the value of ascii is not of the BIGINT, STRING, DOUBLE, or DECIMAL type, an error is returned.
    • If the value of ascii is null, null is returned.
  • Examples:
    • Example 1: Convert the ASCII code 100 into characters. Sample statement:
      -- The return value is d. 
      select chr(100);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select chr(null);
    • Example 3: The input value is a character of the STRING type. Sample statement:
      -- The input value is implicitly converted into a value of the BIGINT type before calculation. The return value is d. 
      select chr('100');

CONCAT

  • Syntax
    array<T> concat(array<T> <a>, array<T> <b>[,...])
    string concat(string <str1>, string <str2>[,...])
  • Description
    • Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.
    • Strings as inputs: Concatenates multiple strings and returns a new string.
  • Parameters
    • a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation.
    • str1 and str2: required. Values of the STRING type. If the input values are of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, they are implicitly converted into values of the STRING type before calculation. If the input values are of other data types, an error is returned.
  • Return value
    • A value of the ARRAY type is returned. If one of the input arrays is null, null is returned.
    • A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, null is returned.
  • Examples
    • Example 1: Concatenate all elements of array(10, 20) and array(20, -20). Sample statement:
      -- [10, 20, 20, -20] is returned. 
      select concat(array(10, 20), array(20, -20));
    • Example 2: One of the input arrays contains a null element. Sample statement:
      -- [10, null, 20, -20] is returned. 
      select concat(array(10, null), array(20, -20));
    • Example 3: One of the input arrays is null. Sample statement:
      -- The value null is returned. 
      select concat(array(10, 20), null);
    • Example 4: Concatenate strings aabc and abcde. Sample statement:
      -- aabcabcde is returned. 
      select concat('aabc','abcde');
    • Example 5: The input is empty. Sample statement:
      -- The value null is returned. 
      select concat();
    • Example 6: One of the input strings is null. Sample statement:
      -- The value null is returned. 
      select concat('aabc', 'abcde', null);

CONCAT_WS

  • Syntax
    string concat_ws(string <separator>, string <str1>, string <str2>[,...])
    string concat_ws(string <separator>, array<string> <a>)
  • Description

    Concatenates all strings in a parameter or elements in an array by using the specified delimiter and returns the result. This function is an extension function of MaxCompute V2.0.

  • Parameters
    • separator: required. A delimiter of the STRING type.
    • str1 and str2: At least two strings must be specified. Values of the STRING type. If the input values are of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, the values are implicitly converted into values of the STRING type before calculation.
    • a: required. This parameter specifies an array. The elements in the array are of the STRING type.
  • Return value
    A value of the STRING or STRUCT type is returned. The return value varies based on the following rules:
    • If the value of str1 or str2 is not of the STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME type, an error is returned.
    • If no input parameters are present or an input parameter is set to null, null is returned.
  • Examples:
    • Example 1: Concatenate the strings name and hanmeimei by using a colon (:). Sample statement:
      -- The return value is name:hanmeimei. 
      select concat_ws(':','name','hanmeimei');
      -
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select concat_ws(':','avg',null,'34');
    • Example 3: Concatenate the elements in the array('name', 'hanmeimei') array by using colons (:). Sample statement:
      -- The return value is name:hanmeimei. 
      select concat_ws(':',array('name', 'hanmeimei'));

ENCODE

  • Syntax
    binary encode(string <str>, string <charset>)
  • Description

    Encodes string str in the format specified by charset.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to encode.
    • charset: required. A value of the STRING type. This parameter specifies the encoding format. Valid values: UTF-8, UTF-16, UTF-16LE, UTF-16BE, ISO-8859-1, and US-ASCII.
  • Return value

    A value of the BINARY type is returned. If the value of str or charset is null, null is returned.

  • Examples:
    • Example 1: Encode string abc in the UTF-8 format. Sample statement:
      -- The return value is abc. 
      select encode("abc", "UTF-8");
    • Example 2: Encode string abc in the UTF-16BE format. Sample statement:
      -- The return value is =00a=00b=00c. 
      select encode("abc", "UTF-16BE");
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select encode("abc", null);

FIND_IN_SET

  • Syntax
    bigint find_in_set(string <str1>, string <str2>)
  • Description

    Returns the position of substring str1 in string str2. The substrings in string str2 are separated by commas (,). The first position is 1.

  • Parameters
    • str1: required. A value of the STRING type. This parameter specifies the substring whose position you want to obtain.
    • str2: required. A value of the STRING type. This parameter specifies a string in which substrings are separated by commas (,).
  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If str2 does not contain str1 or str1 contains commas (,), 0 is returned.
    • If the value of str1 or str2 is null, null is returned.
  • Examples:
    • Example 1: Return the position of substring ab in string abc,hello,ab,c. Sample statement:
      -- The return value is 3. 
      select find_in_set('ab', 'abc,hello,ab,c');
    • Example 2: Return the position of substring hi in abc,hello,ab,c. Sample statement:
      -- The return value is 0. 
      select find_in_set('hi', 'abc,hello,ab,c');
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select find_in_set(null, 'abc,hello,ab,c');

FORMAT_NUMBER

  • Syntax
    string format_number(float|double|decimal <expr1>, int <expr2>)
  • Description

    Converts the value of expr1 into a string that meets the format specified by expr2.

  • Parameters
    • expr1: required. A value of the FLOAT, DOUBLE, or DECIMAL type. This parameter specifies the expression that you want to format.
    • expr2: required. A value of the INT type. Valid values: 0 to 340. This parameter specifies the number of decimal places that you want to retain. It can also be expressed in a format similar to #,###,###.##. The number of decimal places returned varies based on the value of this parameter.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of expr2 is greater than 0 and is less than or equal to 340, a value that is rounded to the specified number of decimal places is returned.
    • If the value of expr2 is 0, the return value contains only the integral part and does not contain the decimal point or decimal part.
    • If the value of expr2 is less than 0 or greater than 340, an error is returned.
    • If expr1 or expr2 is empty or is set to null, null is returned.
  • Examples:
    • Example 1: Return a value in the specified format based on a given number. Sample statements:
      -- The return value is 5.230. 
      select format_number(5.230134523424545456,3);
      -- The return value is 12,332.123. 
      select format_number(12332.123456, '#,###,###,###.###');
    • Example 2: An input parameter is empty or set to null. Sample statements:
      -- The return value is null. 
      select format_number('',3);
      -- The return value is null. 
      select format_number(null,3);

FROM_JSON

  • Syntax
    from_json(<jsonStr>, <schema>)
  • Description

    Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.

  • Parameters
    • jsonStr: required. The JSON string that you entered.
    • schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as array<bigint>, map<string, array<string>>, or struct<a:int, b:double, `C`:map<string,string>>.
      Note Keys in a struct are case-sensitive. You can also specify a struct in the format of a BIGINT, b DOUBLE, which is equivalent to STRUCT<a:BIGINT, b:DOUBLE>.

      The following table describes the mappings between JSON data types and MaxCompute data types.

      JSON data typeMaxCompute data type
      OBJECTSTRUCT, MAP, and STRING
      ARRAYARRAY and STRING
      NUMBERTINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING
      BOOLEANBOOLEAN and STRING
      STRINGSTRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME
      NULLAll types
      Note The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.
  • Return value

    A value of the ARRAY, MAP, or STRUCT type is returned.

  • Examples
    • Example 1: Convert a specific JSON string into a value of a specific data type. Sample statement:
      -- {"a":1,"b":0.8} is returned. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double');
      -- {"time":"26/08/2015"} is returned. 
      select from_json('{"time":"26/08/2015"}', 'time string');
      -- {"a":1,"b":0.8} is returned. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double, c string');
      -- [1,2,3] is returned. 
      select from_json('[1, 2, 3, "a"]', 'array<bigint>');
      -- {"d":"v","a":"1","b":"[1,2,3]","c":"{}"} is returned. 
      select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
    • Example 2: Use the map_keys and from_json functions to obtain all keys in the JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:
      -- ["a","b"] is returned. 
      select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));

GET_JSON_OBJECT

  • Syntax
    string get_json_object(string <json>, string <path>)
  • Description

    Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters
    • json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of {Key:Value, Key:Value,...}. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.
    • path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Different characters have the following meanings:
      • $: indicates the root node.
      • . or ['']: indicates a child node. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.
      • [] ([number]): indicates an array subscript, which starts from 0.
      • *: indicates the wildcard for []. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.
  • Limits

    Only MaxCompute V2.0 allows you to extract data by using [''] in the path parameter. To use [''], you must add the set odps.sql.udf.getjsonobj.new=true; statement before the statement that you want to execute.

  • Return value
    • If the json parameter is empty or invalid, null is returned.
    • If the format of json is valid and path exists, the related string is returned.
    • To determine how this function returns a value, you can specify odps.sql.udf.getjsonobj.new for a session.
      • If you run the set odps.sql.udf.getjsonobj.new=true; command, this function retains the original strings when it returns a value.
        We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a MaxCompute project has jobs in which this function escapes JSON reserved characters, we recommend that you retain the original escaping operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:
        • The return value is still a JSON string, which can be parsed as JSON data. You do not need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).
        • Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.
          -- 1 is returned. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the GET_JSON_OBJECT function to process the data.
          -- An emoji is returned. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.
          -- {"b":"1","a":"2"} is returned. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      • If you run the set odps.sql.udf.getjsonobj.new=false; command, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:
        • JSON reserved characters, such as line feeds (\n) and double quotation marks (") are displayed as '\n' and '\"'.
        • Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed.
          -- The value null is returned. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis cannot be parsed.
          -- The value null is returned. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.
          -- {"a":"2","b":"1"} is returned. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      Note For MaxCompute projects that are created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings when it returns a value. For MaxCompute projects that were created before January 21, 2021, the GET_JSON_OBJECT function escapes JSON reserved characters when it returns a value. The following example helps you determine how the GET_JSON_OBJECT function returns a value in a MaxCompute project.
      select get_json_object('{"a":"[\\"1\\"]"}', '$.a');
      -- Return JSON reserved characters by using escape characters.
      [\"1\"]
      
      -- Return the original strings.
      ["1"]

      You can submit a ticket to request the MaxCompute technical support team to enable the GET_JSON_OBJECT function. This way, you do not need to frequently specify flagodps.sql.udf.getjsonobj.new for a session. This function retains the original strings when it returns a value.

  • Examples
    • Example 1: Extract information from the JSON object src_json.json. Sample statement:
      -- The JSON string src_json.json contains the following content: 
      +----+
      json
      +----+
      {"store":
      {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
      "bicycle":{"price":19.95,"color":"red"}
      },
      "email":"amy@only_for_json_udf_test.net",
      "owner":"amy"
      }
      -- Extract the information of the owner field and return amy. 
      select get_json_object(src_json.json, '$.owner') from src_json;
      -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. 
      select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
      -- Extract the information of the non-existent field and return null. 
      select get_json_object(src_json.json, '$.non_exist_key') from src_json;
    • Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:
      -- 2222 is returned. 
      select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
      -- ["h0","h1","h2"] is returned. 
      set odps.sql.udf.getjsonobj.new=true;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- ["h0","h1","h2"] is returned. 
      set odps.sql.udf.getjsonobj.new=false;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh');
      -- h1 is returned. 
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
    • Example 3: Extract information from a JSON object that contains a period (.). Sample statement:
      -- Create a table. 
      create table mf_json (id string, json string);
      -- Insert data into the table. The key in the data contains a period (.). 
      insert into table mf_json (id, json) values ("1", "{
      \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Insert data into the table. The key in the data does not contain a period (.). 
      insert into table mf_json (id, json) values ("2", "{
      \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. 
      select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1;
      -- -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: 
      select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2;
      select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
    • Example 4: The json parameter is empty or invalid. Sample statement:
      -- The value null is returned. 
      select get_json_object('','$.array[1][1]');
      -- The value null is returned. 
      select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
    • Example 5: Escape a JSON string. Sample statement:
      set odps.sql.udf.getjsonobj.new=true;
      -- "1" is returned. 
      select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); 
      -- '1' is returned. 
      select get_json_object('{"a":"\'1\'","b":"2"}', '$.a'); 

INSTR

  • Syntax
    bigint instr(string <str1>, string <str2>[, bigint <start_position>[, bigint <nth_appearance>]])
  • Description

    Returns the position of substring str2 in string str1.

  • Parameters
    • str1: required. A value of the STRING type. This parameter specifies the string that contains the substring you want to search for. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • str2: required. A value of the STRING type. This parameter specifies the substring that you want to search for. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
    • start_position: optional. A value of the BIGINT type. If the input value is of another data type, an error is returned. This parameter specifies the position of the character in str1 from which the search starts. The default start position is the first character, marked as 1. If start_position is a negative value, the search starts from the end to the start of the string and the last character is -1.
    • nth_appearance: optional. A value of the BIGINT type, which is greater than 0. This parameter specifies the position of substring str2 that appears in string str1 at the nth time. If the value of nth_appearance is of another data type or is less than or equal to 0, an error is returned.
  • Return value

    A value of the BIGINT type is returned. The return value varies based on the following rules:

    • If str2 is not found in str1, the value 0 is returned.
    • If str2 is an empty string, the matching always succeeds. For example, the value 1 is returned for select instr('abc','');.
    • If the value of str1, str2, start_position, or nth_appearance is null, null is returned.
  • Examples:
    • Example 1: Return the position of the substring e in the string Tech on the net. Sample statement:
      -- The return value is 2. 
      select instr('Tech on the net', 'e');
    • Example 2: Return the position of substring on in string Tech on the net. Sample statement:
      -- The return value is 6. 
      select instr('Tech on the net', 'on');
    • Example 3: Return the position of the second occurrence in which the substring e appears in the string Tech on the net from the third character. Sample statement:
      -- The return value is 14. 
      select instr('Tech on the net', 'e', 3, 2);
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select instr('Tech on the net', null);

IS_ENCODING

  • Syntax
    boolean is_encoding(string <str>, string <from_encoding>, string <to_encoding>)
  • Description

    Determines whether the input string str can be converted from the character set specified by from_encoding into the character set specified by to_encoding. This function can also be used to determine whether the input string is garbled. In most cases, from_encoding is set to UTF-8 and to_encoding is set to GBK.

  • Parameters
    • str: required. A value of the STRING type. An empty string can belong to any character set.
    • from_encoding and to_encoding: required. Values of the STRING type. from_encoding specifies the source character set and to_encoding specifies the destination character set.
  • Return value
    A value of the BOOLEAN type is returned. The return value varies based on the following rules:
    • If str can be converted, True is returned. Otherwise, False is returned.
    • If the value of str, from_encoding, or to_encoding is null, null is returned.

KEYVALUE

  • Syntax
    keyvalue(string <str>,[string <split1>,string <split2>,] string <key>)
    keyvalue(string <str>,string <key>) 
  • Description

    Splits the string str into key-value pairs by split1, separates the key-value pairs by split2, and then returns the value of the specified key.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to split.
    • split1 and split2: optional. Values of the STRING type. These parameters specify the strings that are used as delimiters to split the source string. If you do not specify the two parameters, the default value of split1 is a semicolon (;) and the default value of split2 is a colon (:). If a key-value pair that is obtained after the source string is split by split1 contains multiple delimiters specified by split2, the returned result is undefined.
    • key: required. A value of the STRING type. After the source string is split by split1 and split2 in sequence, the value that corresponds to the key is returned.
  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If the value of split1 or split2 is null, null is returned.
    • If the value of str or key is null or no key matches, null is returned.
    • If multiple key-value pairs match, the value that corresponds to the first matched key is returned.
  • Examples:
    • Example 1: Split the string 0:1\;1:2 into key-value pairs and return the value that corresponds to the key 1. Sample statement:
      -- The return value is 2. 
      select keyvalue('0:1\;1:2', 1);
      The split1 and split2 parameters are not specified. The default value of split1 is a semicolon (;) and the default value of split2 is a colon (:).
      After the source string is split by using split1, the key-value pairs 0:1\,1:2 are returned. After the key-value pairs are split by using split2, the following keys and values are returned:
      0 1/  
      1 2
      The value 2 that corresponds to key 1 is returned.
    • Example 2: Split the string \;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\; into key-value pairs by using a backslash and a semicolon (\;), separate the values from keys by using a colon (:), and then return the value that corresponds to the key tf. Sample statement:
      -- The return value is 21910. 
      select keyvalue("\;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\;","\;",":","tf");
      After the source string \;decreaseStore:1\;xcard:1\;isB2C:1\;tf:21910\;cart:1\;shipping:2\;pf:0\;market:shoes\;instPayAmount:0\; is split by using a backslash and a semicolon (\;), the following key-value pairs are generated:
      decreaseStore:1, xcard:1, isB2C:1, tf:21910, cart:1, shipping:2, pf:0, market:shoes, and instPayAmount:0. 
      After the key-value pairs are separated by using a colon (:), the following keys and values are generated:
      decreaseStore 1  
      xcard 1  
      isB2C 1  
      tf 21910  
      cart 1  
      shipping 2  
      pf 0  
      market shoes  
      instPayAmount 0
      The value 21910 that corresponds to the key tf is returned.

KEYVALUE_TUPLE

  • Syntax
    KEYVALUE_TUPLE(str, split1, split2, key1, key2, ..., keyN)
  • Description

    Splits the string str into key-value pairs by split1, separates the key-value pairs by split2, and then returns the values of the keys.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to split.
    • split1 and split2: required. Values of the STRING type. These parameters specify the strings that are used as delimiters to split the source string. If a key-value pair that is obtained after the source string is split by split1 contains multiple delimiters specified by split2, the returned result is undefined.
    • key: required. A value of the STRING type. After the source string is split by split1 and split2 in sequence, the value that corresponds to the key is returned.
  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If the value of split1 or split2 is null, null is returned.
    • If the value of str or key is null or no key matches, null is returned.
  • Examples:
    -- Create a table.
    create table mf_user (
    user_id string,
    user_info string
    );
    -- Insert data into the table.
    insert into mf_user values('1','age:18;genda:f;address:abc'),('2','age:20;genda:m;address:bcd');
    -- Query the data that you inserted.
    SELECT user_id,
    KEYVALUE(user_info,';',':','age') as age,
    KEYVALUE(user_info,';',':','genda') as genda,
    KEYVALUE(user_info,';',':','address') as address
    FROM mf_user;
    -- It is equivalent to the KEYVALUE clause.
    SELECT user_id,
    age,
    genda,
    address
    FROM mf_user LATERAL VIEW KEYVALUE_TUPLE(user_info,';', ':','age','genda','address') ui AS age,genda,address;
    The following result is returned:
    +------------+------------+------------+------------+
    | user_id    | age        | genda      | address    |
    +------------+------------+------------+------------+
    | 1          | 18         | f          | abc        |
    | 2          | 20         | m          | bcd        |
    +------------+------------+------------+------------+

LENGTH

  • Syntax
    bigint length(string <str>)
  • Description

    Returns the length of the string str.

  • Parameters

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str is null, null is returned.
    • If the value of str is not UTF-8 encoded, -1 is returned.
  • Examples:
    • Example 1: Return the length of the string Tech on the net. Sample statement:
      -- The return value is 15. 
      select length('Tech on the net');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select length(null);

LENGTHB

  • Syntax
    bigint lengthb(string <str>)
  • Description

    Returns the length of a string specified by str in bytes.

  • Parameters

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str is null, null is returned.
  • Examples:
    • Example 1: Return the length of string Tech on the net in bytes. Sample statement:
      -- The return value is 15. 
      select lengthb('Tech on the net');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select lengthb(null);

LOCATE

  • Syntax
    bigint locate(string <substr>, string <str>[, bigint <start_pos>]) 
  • Description

    Returns the position of substring substr in string str. You can use start_pos to specify the position from which the search starts. The value starts from 1.

  • Parameters
    • substr: required. A value of the STRING type. This parameter specifies the substring that you want to search for.
    • str: required. A value of the STRING type. This parameter specifies the string in which you want to search for the substring.
    • start_pos: optional. A value of the BIGINT type. This parameter specifies the position from which the search starts.
  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If substr cannot be found in str, 0 is returned.
    • If the value of str or substr is null, null is returned.
    • If the value of start_pos is null, 0 is returned.
  • Examples:
    • Example 1: Return the position of string ab in string abchelloabc. Sample statement:
      -- The return value is 1. 
      select locate('ab', 'abchelloabc');
    • Example 2: Return the position of string hi in string abchelloabc. Sample statement:
      -- The return value is 0. 
      select locate('hi', 'abc,hello,ab,c');
    • Example 3: The start_pos parameter is set to null. Sample statement:
      -- The return value is 0. 
      select locate('ab', 'abhelloabc', null);

LTRIM

  • Syntax
    string ltrim(string <str>[, <trimChars>])
    string trim(leading [<trimChars>] from <str>)
  • Description
    Removes the characters from the left side of a string that is specified by str.
    • If you do not specify trimChars, the spaces on the left side are removed by default.
    • If you specify trimChars, the substring that consists of one or more characters specified by trimChars is removed from the left side of the string that is specified by str.
  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string from the left side of which the characters are removed. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.
    • trimChars: optional. A value of the STRING type. This parameter specifies the characters to be removed.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str or trimChars is null, null is returned.
  • Examples:
    • Example 1: Remove the space from the left side of the string specified by yxTxyomxx . Sample statement:
      -- The return value is yxTxyomxx . 
      select ltrim(' yxTxyomxx ');
      -- The preceding statements are equivalent to the following statements: 
      select trim(leading from ' yxTxyomxx ');
    • Example 2: Remove the substring that consists of one or more characters in the xy collection from the left side of the string yxTxyomxx.
      -- The return value is Txyomxx. If x or y appears on the left side, it is removed. 
      select ltrim('yxTxyomxx', 'xy');
      -- The preceding statement is equivalent to the following statement: 
      select trim(leading 'xy' from 'yxTxyomxx');
    • Example 3: The input parameter is set to null. Sample statement:
      -- The return value is null. 
      select ltrim(null);
      select ltrim('yxTxyomxx', null);

MASK_HASH

  • Syntax
    mask_hash(<expr>)
  • Description

    Returns a hash value that is calculated by using a string expression that is specified by expr. If the values calculated by a string expression are the same, a consistent hash value is returned.

  • Parameters

    expr: required. A string expression that is used to calculate a hash value. The STRING, CHAR, VARCHAR, and BINARY types are supported.

  • Return value

    A hash value calculated by using a string expression is returned. The hash value is 64 bytes in length. In the Hive-compatible data type edition, the value NULL is returned if a non-string expression is used.

  • Examples:
    -- A hash value is returned if the string abc is used.
    select mask_hash("abc");
    -- The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad |
    +------------+
    
    -- The value NULL is returned if a non-string expression is used.
    select mask_hash(100);
    -- The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | NULL       |
    +------------+

MD5

  • Syntax
    string md5(string <str>)
  • Description

    Returns the MD5 value of a string specified by str.

  • Parameters

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str is null, null is returned.
  • Examples:
    • Example 1: Return the MD5 value of string Tech on the net. Sample statement:
      -- The return value is ddc4c4796880633333d77a60fcda9af6. 
      select md5('Tech on the net');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select md5(null);

PARSE_URL

  • Syntax
    string parse_url(string <url>, string <part>[, string <key>])
  • Description

    Parses url and extracts information based on the value specified by part.

  • Parameters
    • url: required. A value of the STRING type. This parameter specifies a URL. If the URL is invalid, an error is returned.
    • part: required. A value of the STRING type. Valid values: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. The value of this parameter is not case-sensitive.
    • key: optional. If part is set to QUERY, this function returns the value that corresponds to key.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of url, part, or key is null, null is returned.
    • If the value of part is invalid, an error is returned.
  • Examples:
    -- The return value is example.com. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST');
    -- The return value is /over/there/index.dtb. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PATH');
    -- The return value is animal. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'QUERY', 'type');
    -- The return value is nose. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'REF');
    -- The return value is file. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'PROTOCOL');
    -- The return value is username:password@example.com:8042. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'AUTHORITY');
    -- The return value is username:password. 
    select parse_url('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'USERINFO');

PARSE_URL_TUPLE

  • Syntax
    string parse_url_tuple(string <url>, string <key1>, string <key2>,...)
  • Description

    Parses url and extracts the strings that are specified by a group of input keys, such as key1 and key2. The PARSE_URL_TUPLE function is similar to the PARSE_URL function. However, the PARSE_URL_TUPLE function provides better performance because it can extract the strings that correspond to multiple keys at the same time.

  • Parameters
    • url: required. A value of the STRING type. This parameter specifies a URL. If the URL is invalid, an error is returned.
    • key1 and key2: required. Values of the STRING type. These parameters specify the keys that correspond to the strings you want to extract. Valid values:
      • HOST: indicates the host address, which can be a domain name or an IP address.
      • PATH: indicates the path of network resources in the server.
      • QUERY: indicates the string that you want to query.
      • REF: indicates the URL annotation, which is displayed when you move your pointer over the URL.
      • PROTOCOL: indicates the protocol type.
      • AUTHORITY: indicates the domain name or IP address, port number, and user authentication information, such as username and password, of the server.
      • FILE: indicates the path of network resources in the server and the content that you want to query. FILE consists of PATH and QUERY.
      • USERINFO: indicates user authentication information.
      • QUERY:<KEY>: indicates the value of the specified key in the query string.
      The values of these parameters are not case-sensitive. If you specify a value other than the preceding values, an error is returned.
  • Return value

    A value of the STRING type is returned. If the value of url or key is null, an error is returned.

  • Examples:
    Extract the strings that correspond to keys from file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose. Sample statement:
    select parse_url_tuple('file://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose', 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'AUTHORITY', 'FILE', 'USERINFO', 'QUERY:type', 'QUERY:name') as (item0, item1, item2, item3, item4, item5, item6, item7, item8, item9);
    The following result is returned:
    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    | item0      | item1      | item2      | item3      | item4      | item5      | item6      | item7      | item8      | item9      |
    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    | example.com | /over/there/index.dtb | type=animal&name=narwhal | nose       | file       | username:password@example.com:8042 | /over/there/index.dtb?type=animal&name=narwhal | username:password | animal     | narwhal    |
    +------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+

REGEXP_COUNT

  • Syntax
    bigint regexp_count(string <source>, string <pattern>[, bigint <start_position>])
  • Description

    Returns the number of substrings that match a specified pattern in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the string that contains the substring you want to search for. If the value is not a string, an error is returned.
    • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions. If pattern is an empty string or is of another data type, an error is returned.
    • start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If the value is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the search starts from the first character of the source string.
  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If no substring is matched, 0 is returned.
    • If the value of source, pattern, or start_position is null, null is returned.
  • Examples:
    • Example 1: Calculate the number of substrings in abababc that match a specified pattern from the specified position. Sample statements:
      -- The return value is 1. 
      select regexp_count('abababc', 'a.c');
      -- The return value is 2. 
      select regexp_count('abababc', '[[:alpha:]]{2}', 3);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select regexp_count('abababc', null);
    • Example 3: Calculate the number of colons (:) in the JSON string {"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}. Sample statement:
      -- The return value is 4. 
      select regexp_count('{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}',':');

REGEXP_EXTRACT

  • Syntax
    string regexp_extract(string <source>, string <pattern>[, bigint <groupid>])
  • Description

    Splits the source string into groups based on a given pattern and returns the string in the nth group specified by groupid.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the string that you want to split.
    • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions.
    • groupid: optional. A constant of the BIGINT type. The value of this parameter must be greater than or equal to 0.
    Note Data is stored in the UTF-8 format. Chinese characters can be represented in hexadecimal. They are encoded in the range of [\\x{4e00},\\x{9fa5}].
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If pattern is an empty string or no group is specified in pattern, an error is returned.
    • If the value of groupid is not of the BIGINT type or is less than 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the string in the first group is returned. If the value of groupid is 0, all substrings that match pattern are returned.
    • If the value of source, pattern, or groupid is null, null is returned.

REGEXP_INSTR

  • Syntax
    bigint regexp_instr(string <source>, string <pattern>[,bigint <start_position>[, bigint <occurrence>[, bigint <return_option>]]])
  • Description

    Returns the start or end position of the substring that matches pattern at the nth occurrence specified by occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the source string.
    • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions. If pattern is an empty string, an error is returned.
    • start_position: optional. A constant of the BIGINT type. This parameter specifies the position from which the search starts. If you do not specify this parameter, the default value 1 is used.
    • occurrence: optional. A constant of the BIGINT type. If you do not specify this parameter, the default value 1 is used. This value indicates the position where a substring matches the pattern in the search for the first time.
    • return_option: optional. A constant of the BIGINT type. This parameter specifies whether the start or end position of the substring that matches a specified pattern is returned. Valid values: 0 and 1. If this parameter is not specified, the default value 0 is used. If this parameter is set to an invalid number or a value of another data type, an error is returned. The value 0 indicates that the start position of the substring that matches a specified pattern is returned. The value 1 indicates that the end position of the substring that matches a specified pattern is returned.
  • Return value
    A value of the BIGINT type is returned. The return value is the start or end position specified by return_option of the matched substring in the source string. The return value varies based on the following rules:
    • If pattern is an empty string, an error is returned.
    • If the value of start_position or occurrence is not of the BIGINT type or is less than or equal to 0, an error is returned.
    • If the value of source, pattern, start_position, occurrence, or return_option is null, null is returned.
  • Examples:
    • Example 1: Return the start position of the substring that matches o[[:alpha:]]{1} at the second occurrence in the i love www.taobao.com string from the third character. Sample statement:
      -- The return value is 14. 
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2);
    • Example 2: Return the end position of the substring that matches o[[:alpha:]]{1} at the second occurrence in the i love www.taobao.com string from the third character. Sample statement:
      -- The return value is 16. 
      select regexp_instr('i love www.taobao.com', 'o[[:alpha:]]{1}', 3, 2, 1);
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select regexp_instr('i love www.taobao.com', null, 3, 2);

REGEXP_REPLACE

  • Syntax
    string regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])
  • Description

    Uses a string specified by replace_string to replace the substring that matches a given pattern at the nth occurrence specified by occurrence in the source string and returns a result.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the strings that you want to replace.
    • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions. If pattern is an empty string, an error is returned.
    • replace_string: required. A value of the STRING type. The value is used to replace the string that matches the pattern.
    • occurrence: optional. A constant of the BIGINT type, which must be greater than or equal to 0. The value of this parameter indicates that the string that matches the specified pattern at the nth occurrence specified by occurrence is replaced with replace_string. If this parameter is set to 0, all the substrings that match the specified pattern are replaced. If this parameter is of another data type or is less than 0, an error is returned. Default value: 0.
  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If the group that you want to reference does not exist, no substring is replaced.
    • If the value of replace_string is null and a substring matches the given pattern, null is returned.
    • If the value of replace_string is null but no substring matches the given pattern, the original string is returned.
    • If the value of source, pattern, or occurrence is null, null is returned.
  • Examples:
    • Example 1: Use (\\1)\\2-\\3 to replace all the substrings that match ([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4}) in the 123.456.7890 string. Sample statement:
      -- The return value is (123)456-7890. 
      select regexp_replace('123.456.7890', '([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})',
      '(\\1)\\2-\\3', 0);
    • Example 2: Replace the substring that matches the specified pattern in the abcd string. Sample statements:
      -- The return value is a b c d. 
      select regexp_replace('abcd', '(.)', '\\1 ', 0);
      -- The return value is a bcd. 
      select regexp_replace('abcd', '(.)', '\\1 ', 1);
      -- The return value is abcd. 
      select regexp_replace('abcd', '(.)', '\\2', 1);
    • Example 3: The data of the url column in the table url_set is in the www.simple@xxx.com format and xxx in each row is different. Replace all the content after www in the url column. Sample statement:
      -- The return value is wwwtest. 
      select regexp_replace(url,'(www)(.*)','wwwtest',0) from url_set;
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select regexp_replace('abcd', '(.)', null, 0);
    • Example 5: The group that you want to reference does not exist. Sample statements:
      -- Only one group is defined in the pattern and the group that you want to reference does not exist. 
      -- We recommend that you do not use this function in this way. The result of referencing a nonexistent group is not defined. 
      regexp_replace("abcd", "(.*)(.)$", "\\2", 0) = "d"
      -- No group is defined in the pattern. Therefore, "\1" references a nonexistent group. 
      -- We recommend that you do not use this function in this way. The result of referencing a nonexistent group is not defined. 
      regexp_replace("abcd", "a", "\\1", 0) = "bcd"

REGEXP_SUBSTR

  • Syntax
    string regexp_substr(string <source>, string <pattern>[, bigint <start_position>[, bigint <occurrence>]])
  • Description

    Returns a string that matches a given pattern at the nth occurrence specified by occurrence, in the source string from the start position specified by start_position.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the string in which the substring you want to search for.
    • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions.
    • start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If you do not specify this parameter, the default value is 1, which indicates that the search starts from the first character of the source string.
    • occurrence: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If you do not specify this parameter, the default value is 1, which indicates that the first matched substring is returned.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If pattern is an empty string, an error is returned.
    • If no substring matches the specified pattern, null is returned.
    • If the value of start_position or occurrence is not of the BIGINT type or is less than or equal to 0, an error is returned.
    • If the value of source, pattern, start_position, occurrence, or return_option is null, null is returned.
  • Examples:
    • Example 1: Return the substring in the I love aliyun very much string that matches a specified pattern. Sample statements:
      -- The return value is aliyun. 
      select regexp_substr('I love aliyun very much', 'a[[:alpha:]]{5}');
      -- The return value is have. 
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 1);
      -- The return value is 2. 
      select regexp_substr('I have 2 apples and 100 bucks!', '[[:blank:]][[:alnum:]]*', 1, 2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select regexp_substr('I love aliyun very much', null);

REPEAT

  • Syntax
    string repeat(string <str>, bigint <n>)
  • Description

    Returns a string that repeats the string specified by str for n times.

  • Parameters
    • str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.
    • n: required. The value of this parameter must be of the BIGINT type. The returned string cannot exceed 2 MB in length.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If n is left empty, an error is returned.
    • If the value of str or n is null, null is returned.
  • Examples:
    • Example 1: Return a string that repeats the string abc for 5 times. Sample statement:
      -- The return value is abcabcabcabcabc. 
      select repeat('abc', 5); 
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select repeat('abc', null);

REVERSE

  • Syntax
    string reverse(string <str>)
  • Description

    Returns a string in reverse order.

  • Parameters

    str: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str is null, null is returned.
  • Examples:
    • Example 1: Return a string whose characters are in reverse order of the string I love aliyun very much. Sample statement:
      -- The return value is hcum yrev nuyila evol I. 
      select reverse('I love aliyun very much');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select reverse(null);

RTRIM

  • Syntax
    string rtrim(string <str>[, <trimChars>])
    string trim(trailing [<trimChars>] from <str>)
  • Description
    Removes the characters from the right side of a string that is specified by str.
    • If you do not specify trimChars, the spaces on the right side are removed by default.
    • If you specify trimChars, the substring that consists of one or more characters specified by trimChars is removed from the right side of the string that is specified by str.
  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string from the right side of which the characters are removed. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.
    • trimChars: optional. A value of the STRING type. This parameter specifies the characters to be removed.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str or trimChars is null, null is returned.
  • Examples:
    • Example 1: Remove the space from the right side of string yxTxyomxx . Sample statement:
      -- The return value is yxTxyomxx. 
      select rtrim(' yxTxyomxx ');
      -- The preceding statement is equivalent to the following statement: 
      select trim(trailing from ' yxTxyomxx ');
    • Example 2: Remove the substring that consists of one or more characters in the xy collection from the right side of the string yxTxyomxx.
      -- The return value is yxTxyom. If x or y appears on the right side of a string, it is removed. 
      select rtrim('yxTxyomxx', 'xy');
      -- The preceding statement is equivalent to the following statement: 
      select trim(trailing 'xy' from 'yxTxyomxx');
    • Example 3: The input parameter is set to null. Sample statement:
      -- The return value is null. 
      select rtrim(null);
      select ltrim('yxTxyomxx', 'null');

SPACE

  • Syntax
    string space(bigint <n>)
  • Description

    Generates a space string with a length of n.

  • Parameters

    n: required. A value of the BIGINT type. The returned string cannot exceed 2 MB in length.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If n is left empty, an error is returned.
    • If the value of n is null, null is returned.
  • Examples:
    -- The return value is 10. 
    select length(space(10));

SPLIT_PART

  • Syntax
    string split_part(string <str>, string <separator>, bigint <start>[, bigint <end>])
  • Description

    Uses a delimiter specified by separator to split a string specified by str, and returns a substring that starts from the character specified by start and ends with the character specified by end.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to split. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.
    • separator: required. A constant of the STRING type. This parameter specifies the delimiter that is used to split the string. The delimiter can be a character or a string.
    • start: required. A constant of the BIGINT type. The value of this parameter must be greater than 0. This parameter specifies the start position of the substring to be returned. The position starts from 1.
    • end: a constant of the BIGINT type. The value of this parameter must be greater than or equal to start. This parameter specifies the end position of the substring to be returned. If this parameter is not specified, the value of this parameter is equal to start and the substring that starts from the character specified by start is returned.
  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If start is set to a value greater than the number of substrings, for example, the string has 6 substrings but the start value is greater than 6, an empty string is returned.
    • If separator is absent from a string specified by str and start is set to 1, the entire string specified by str is returned. If str is an empty string, an empty string is returned.
    • If separator is an empty string, the original string specified by str is returned.
    • If the value of end is greater than the number of substrings, the substring that starts from the character specified by start is returned.
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of separator is not a constant of the STRING type, an error is returned.
    • If the value of start or end is not a constant of the BIGINT type, an error is returned.
    • If the value of an input parameter except separator is null, null is returned.
  • Examples:
    • Example 1: Use commas (,) as a delimiter to split the string a,b,c,d and return the substring that matches the specified rule. Sample statements:
      -- The return value is a. 
      select split_part('a,b,c,d', ',', 1);
      -- The return value is a,b. 
      select split_part('a,b,c,d', ',', 1, 2);
    • Example 2: The value of start is greater than the number of substrings after the specified string is split into the substrings. Sample statement:
      -- The return value is an empty string. 
      select split_part('a,b,c,d', ',', 10);
    • Example 3: separator does not exist in the string specified by str. Sample statements:
      -- The return value is a,b,c,d. 
      select split_part('a,b,c,d', ':', 1);
      -- The return value is an empty string. 
      select split_part('a,b,c,d', ':', 2);
    • Example 4: separator is an empty string. Sample statement:
      -- The return value is a,b,c,d. 
      select split_part('a,b,c,d', '', 1);
    • Example 5: The value of end is greater than the number of substrings after the specified string is split into the substrings. Sample statement:
      -- The return value is b,c,d. 
      select split_part('a,b,c,d', ',', 2, 6);
    • Example 6: An input parameter except separator is set to null. Sample statement:
      -- The return value is null. 
      select split_part('a,b,c,d', ',', null);

SUBSTR

  • Syntax
    string substr(string <str>, bigint <start_position>[, bigint <length>])
  • Description

    Returns a substring that starts from start_position in a string specified by str and has a length specified by length.

  • Parameters
    • str: required. A value of the STRING type. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation.
    • start_position: required. A value of the BIGINT type. Default value: 1.
      • Hive-compatible data type edition: If start_position is set to 0, the return value is the same as that when this parameter is set to 1.
      • MaxCompute V1.0 and MaxCompute V2.0 data type editions: If start_position is set to 0, null is returned.
    • length: optional. A value of the BIGINT type. This parameter specifies the length of a substring. The value of this parameter must be greater than 0.
      Important
      • If setproject odps.function.strictmode is set to false and the value of the length parameter is less than 0, no substring is returned.
      • If setproject odps.function.strictmode is set to true and the value of the length parameter is less than 0, an error is returned.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DECIMAL, DOUBLE, or DATETIME type, an error is returned.
    • If the value of length is not of the BIGINT type or is less than or equal to 0, an error is returned.
    • If length is not specified, the substring from the start position to the end of str is returned.
    • If the value of str, start_position, or length is null, null is returned.
  • Examples:
    • Example 1: Return a substring with the specified length that starts from the specified position in the string abc. Sample statements:
      -- The return value is bc. 
      select substr('abc', 2);
      -- The return value is b. 
      select substr('abc', 2, 1);
      -- The return value is bc. 
      select substr('abc',-2 , 2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select substr('abc', null);

SUBSTRING

  • Syntax
    string substring(string|binary <str>, int <start_position>[, int <length>])
  • Description

    Returns a substring that starts from start_position in a string specified by str and has a length specified by length.

  • Parameters
    • str: required. A value of the STRING or BINARY type.
    • start_position: required. A value of the INT type. The start position starts from 1. If start_position is set to 0, an empty string is returned. If start_position is a negative value, the search starts from the end to the start of the string and the last character is -1.
    • length: optional. A value of the BIGINT type. This parameter specifies the length of a substring. The value of this parameter must be greater than 0.
  • Return value

    A value of the STRING type is returned. The return value varies based on the following rules:

    • If str is not of the STRING or BINARY type, an error is returned.
    • If the value of length is not of the BIGINT type or is less than or equal to 0, an error is returned.
    • If length is not specified, the substring from the start position to the end of str is returned.
    • If the value of str, start_position, or length is null, null is returned.
  • Examples:
    • Example 1: Return a substring with the specified length that starts from the specified position in the string abc. Sample statements:
      -- The return value is bc. 
      select substring('abc', 2);
      -- The return value is b. 
      select substring('abc', 2, 1);
      -- The return value is bc. 
      select substring('abc',-2,2);
      -- The return value is ab. 
      select substring('abc',-3,2);
      -- The return value is 001. 
      substring(bin(2345), 2, 3);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select substring('abc', null, null);

TO_CHAR

  • Syntax
    string to_char(boolean <value>)
    string to_char(bigint <value>)
    string to_char(double <value>)
    string to_char(decimal <value>)
  • Description

    Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type.

  • Parameters

    value: required. A value of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If value is not of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type, an error is returned.
    • If value is set to null, null is returned.
  • Examples
    • Example 1: Convert values into the STRING type. Sample statements:
      -- The return value is 123. 
      select to_char(123);
      -- The return value is TRUE. 
      select to_char(true);
      -- The return value is 1.23. 
      select to_char(1.23);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select to_char(null);

TO_JSON

  • Syntax
    to_json(<expr>)
  • Description

    Converts an expression specified by expr of a given complex data type into a JSON string.

  • Parameters
    expr: required. An expression of the ARRAY, MAP, or STRUCT type.
    Note If the input expression is of the STRUCT type (struct<key1:value1, key2:value2>), take note of the following points:
    • All keys are converted into lowercase letters when you convert the expression into a JSON string.
    • If a value is null, the key-value pair to which the value belongs is not included in the JSON string that is returned. For example, if value2 is null, key2:value2 is not included in the JSON string that is returned.
  • Examples
    • Example 1: Convert an expression of a given complex data type into a JSON string. Sample statement:
      -- {"a":1,"b":2} is returned. 
      select to_json(named_struct('a', 1, 'b', 2));
      -- {"time":"26/08/2015"} is returned. 
      select to_json(named_struct('time', "26/08/2015"));
      -- [{"a":1,"b":2}] is returned. 
      select to_json(array(named_struct('a', 1, 'b', 2)));
      -- {"a":{"b":1}} is returned. 
      select to_json(map('a', named_struct('b', 1)));
      -- {"a":1} is returned. 
      select to_json(map('a', 1));
      -- [{"a":1}] is returned. 
      select to_json(array((map('a', 1))));
    • Example 2: The input expression is of the STRUCT type. Sample statement:
      -- {"a":"B"} is returned. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. 
      select to_json(named_struct("A", "B"));
      -- {"k2":"v2"} is returned. The key-value pair to which null belongs is not included in the JSON string that is returned. 
      select to_json(named_struct("k1", cast(null as string), "k2", "v2"));

TOLOWER

  • Syntax
    string tolower(string <source>)
  • Description

    Converts uppercase letters in a string specified by source into lowercase letters.

  • Parameters

    source: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of source is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of source is null, null is returned.
  • Examples
    • Example 1: Convert uppercase letters in a string into lowercase letters. Sample statements:
      -- The return value is abcd. 
      select tolower('aBcd');
      -- The return value is china fighting. 
      Select tolower ('China Fighting');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select tolower(null);

TOUPPER

  • Syntax
    string toupper(string <source>)
  • Description

    Converts lowercase letters in a string specified by source into uppercase letters.

  • Parameters

    source: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.

  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of source is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of source is null, null is returned.
  • Examples
    • Example 1: Convert lowercase letters in a string into uppercase letters. Sample statements:
      -- The return value is ABCD. 
      select toupper('aBcd');
      -- The return value is CHINA FIGHTING. 
      select toupper('China Fighting');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select toupper(null);

TRIM

  • Syntax
    string trim(string <str>[,<trimChars>])
    string trim([BOTH] [<trimChars>] from <str>)
  • Description
    Removes the characters from both of the left and right sides of a string that is specified by str.
    • If you do not specify trimChars, the spaces on both of the left and right sides are removed by default.
    • If you specify trimChars, the substrings that consist of one or more characters specified by trimChars are removed from both of the left and right sides of the string that is specified by str.
  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string from both of the left and right sides of which the characters are removed. If the input value is of the BIGINT, DECIMAL, DOUBLE, or DATETIME type, the value is implicitly converted into a value of the STRING type before calculation.
    • trimChars: optional. A value of the STRING type. This parameter specifies the characters to be removed.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of str is not of the STRING, BIGINT, DOUBLE, DECIMAL, or DATETIME type, an error is returned.
    • If the value of str or trimChars is null, null is returned.
  • Examples
    • Example 1: Remove the space from both of the left and right sides of the string yxTxyomxx . Sample statement:
      -- The return value is yxTxyomxx. 
      select trim(' yxTxyomxx ');
      -- The preceding statement is equivalent to the following statement: 
      select trim(both from ' yxTxyomxx ');
      select trim(from ' yxTxyomxx ');
    • Example 2: Remove the substrings that consist of one or more characters in the xy collection from both of the left and right sides of the string yxTxyomxx.
      -- The return value is yxTxyom. If x or y appears on the left or right side, it is removed. 
      select trim('yxTxyomxx', 'xy');
      -- The preceding statement is equivalent to the following statement: 
      select trim(both 'xy' from 'yxTxyomxx');
      select trim('xy' from 'yxTxyomxx');
    • Example 3: The input parameter is set to null. Sample statement:
      -- The return value is null. 
      select trim(null);
      select trim('yxTxyomxx', null);

URL_DECODE

  • Syntax
    string url_decode(string <input>[, string <encoding>])
  • Description
    Converts an input string from the application/x-www-form-urlencoded MIME format into a normal string. This is the inverse function of url_encoding. The encoding format must comply with the following rules:
    • All letters remain unchanged.
    • Periods (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
    • Plus signs (+) are converted into spaces.
    • The %xy formatted sequence is converted into byte values. Consecutive byte values are decoded to the related strings based on the value of encoding.
    • Other characters remain unchanged.
  • Parameters
    • input: required. A value of the STRING type. This parameter specifies the string that you want to enter.
    • encoding: optional. This parameter specifies the encoding format, which can be GBK or UTF-8. If you do not specify this parameter, the default value is UTF-8.
  • Return value

    A UTF-8 encoded string of the STRING type is returned. If the value of input or encoding is null, null is returned.

  • Examples
    -- The return value is for url_decode:// (fdsf). 
    select url_decode('%E7%A4%BA%E4%BE%8Bfor+url_decode%3A%2F%2F+%28fdsf%29');
    -- The return value is Example for URL_DECODE:// dsf(fasfs). 
    select url_decode('Example+for+url_decode+%3A%2F%2F+dsf%28fasfs%29', 'GBK');

URL_ENCODE

  • Syntax
    string url_encode(string <input>[, string <encoding>])
  • Description
    Encodes the input string in the application/x-www-form-urlencoded MIME format and returns the encoded string. The encoding format must comply with the following rules:
    • All letters remain unchanged.
    • Periods (.), hyphens (-), asterisks (*), and underscores (_) remain unchanged.
    • Spaces are converted into plus signs (+).
    • Other characters are converted into byte values based on the specified encoding format. Each byte value is then represented in the %xy format, where xy is the hexadecimal representation of the character value.
  • Parameters
    • input: required. A value of the STRING type. This parameter specifies the string that you want to enter.
    • encoding: optional. This parameter specifies the encoding format, which can be GBK or UTF-8. If you do not specify this parameter, the default value is UTF-8.
  • Return value

    A value of the STRING type is returned. If the value of input or encoding is null, null is returned.

  • Examples
    -- The return value is %E7%A4%BA%E4%BE%8Bfor+url_encode%3A%2F%2F+%28fdsf%29. 
    select url_encode('Example for url_encode:// (fdsf)');
    -- The return value is Example+for+url_encode+%3A%2F%2F+dsf%28fasfs%29. 
    select url_encode('Example for url_encode:// dsf(fasfs)', 'GBK');

JSON_TUPLE

  • Syntax
    string json_tuple(string <json>, string <key1>, string <key2>,...)
  • Description

    Extracts strings from a standard JSON string based on a set of input keys, such as (key1,key2,...).

  • Parameters
    • json: required. A value of the STRING type. This parameter specifies a standard JSON string.
    • key: required. A value of the STRING type. This parameter is used to describe the path of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.
  • Return values

    A value of the STRING type is returned.

    Note
    • If json is empty or invalid, null is returned.
    • If key is empty, invalid, or does not exist in the JSON string, null is returned.
    • If json is valid and key exists, the related string is returned.
    • This function can parse JSON data that contains Chinese characters.
    • This function can parse nested JSON data.
    • This function can parse JSON data that contains nested arrays.
    • The parsing action is equivalent to the execution of GET_JSON_OBJECT along with set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT.
    • JSON_TUPLE is a UDTF and is used with Lateral View when other columns need to be selected.

LPAD

  • Syntax
    string lpad(string <str1>, int <length>, string <str2>)
  • Description

    Left pads str1 with str2 to a specified length. This function is an extension function of MaxCompute V2.0.

  • Parameters
    • str1: required. A value of the STRING type. This parameter specifies the string that you want to left pad.
    • length: required. A value of the INT type. This parameter specifies the number of characters that are used for left padding.
    • str2: required. This parameter specifies the string that you use to left pad another string.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of length is less than the number of characters in str1, this function truncates str1 from the left to obtain a string with the number of characters specified by length.
    • If length is set to 0, an empty string is returned.
    • If no input parameters are present or an input parameter is set to null, null is returned.
  • Examples
    • Example 1: Left pad the string abcdefgh with the string12 to obtain a string with 10 characters in length. Sample statement:
      -- The return value is 12abcdefgh. 
      select lpad('abcdefgh', 10, '12');
    • Example 2: Left pad the string abcdefgh with the string12 to obtain a string with 5 characters in length. Sample statement:
      -- The return value is abcde. 
      select lpad('abcdefgh', 5, '12');
    • Example 3: The value of length is 0. Sample statement:
      -- The return value is an empty string. 
      select lpad('abcdefgh' ,0, '12'); 
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select lpad(null ,0, '12');

RPAD

  • Syntax
    string rpad(string <str1>, int <length>, string <str2>)
  • Description

    Right pads str1 with str2 to a specified length. This function is an extension function of MaxCompute V2.0.

  • Parameters
    • str1: required. A value of the STRING type. This parameter specifies the string that you want to right pad.
    • length: required. A value of the INT type. This parameter specifies the number of characters that are used for right padding.
    • str2: required. This parameter specifies the string that you use to right pad another string.
  • Return value
    A value of the STRING type is returned. The return value varies based on the following rules:
    • If the value of length is less than the number of characters in str1, this function truncates str1 from the left to obtain a string with the number of characters specified by length.
    • If length is set to 0, an empty string is returned.
    • If no input parameters are present or an input parameter is set to null, null is returned.
  • Examples
    • Example 1: Right pad the string abcdefgh with the string 12 to obtain a string with 10 characters in length. Sample statement:
      -- The return value is abcdefgh12. 
      select rpad('abcdefgh', 10, '12');
    • Example 2: Right pad the string abcdefgh with the string 12 to obtain a string with 5 characters in length. Sample statement:
      -- The return value is abcde. 
      select rpad('abcdefgh', 5, '12');
    • Example 3: The value of length is 0. Sample statement:
      -- The return value is an empty string. 
      select rpad('abcdefgh' ,0, '12'); 
    • Example 4: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select rpad(null ,0, '12');

REPLACE

  • Syntax
    string replace(string <str>, string <old>, string <new>)
  • Description

    If a part of the string specified by str exactly matches the string specified by old, this part of the string is replaced by the string specified by new, and the complete string after the replacement is returned. If no part of the string specified by str matches the string specified by old, the original string specified by str is returned. This function is an extension function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to replace.
    • old: required. This parameter specifies the string that you use for comparison.
    • new: required. This parameter specifies the string that you use to replace the original string.
  • Return value

    A value of the STRING type is returned. If an input parameter is set to null, null is returned.

  • Examples
    • Example 1: Replace the part of the string ababab that is exactly the same as the string abab with the string12. Sample statement:
      -- The return value is 12ab. 
      select replace('ababab','abab','12');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select replace('123abab456ab',null,'abab');

SOUNDEX

  • Syntax
    string soundex(string <str>)
  • Description

    Converts a normal string into a string of the SOUNDEX type.

  • Parameters

    str: required. A value of the STRING type. This parameter specifies the string that you want to convert. This function is an extension function of MaxCompute V2.0.

  • Return value

    A value of the STRING type is returned. If the value of str is null, null is returned.

  • Examples
    • Example 1: Convert the string hello into a string of the SOUNDEX type. Sample statement:
      -- The return value is H400. 
      select soundex('hello');
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select soundex(null);

SUBSTRING_INDEX

  • Syntax
    string substring_index(string <str>, string <separator>, int <count>)
  • Description

    Truncates the string str to a substring from the first character to the nth delimiter. n is specified by count. If count is a positive value, the string is truncated from left to right. If count is a negative value, the string is truncated from right to left. This function is an extension function of MaxCompute V2.0.

  • Parameters
    • str: required. A value of the STRING type. This parameter specifies the string that you want to truncate.
    • separator: required. A delimiter of the STRING type.
    • count: required. A value of the INT type. This parameter specifies the position of the delimiter.
  • Return value

    A value of the STRING type is returned. If an input parameter is set to null, null is returned.

  • Examples
    • Example 1: Truncate the string https://www.alibabacloud.com. Sample statements:
      -- The return value is https://www.alibabacloud. 
      select substring_index('https://www.alibabacloud.com', '.', 2);
      -- The return value is alibabacloud.com. 
      select substring_index('https://www.alibabacloud.com', '.', -2);
    • Example 2: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select substring_index('https://www.alibabacloud.com', null, 2);

TRANSLATE

  • Syntax
    string translate(string|varchar <str1>, string|varchar <str2>, string|varchar <str3>)
  • Description

    Uses the specified characters in str3 to replace the characters that are included in str1 from str2. If characters that are included in str1 do not appear in str2, no character is replaced. This function is an extension function of MaxCompute V2.0.

  • Return value

    A value of the STRING type is returned. If an input parameter is set to null, null is returned.

  • Examples
    • Example 1: Use the specified characters in the string cd to replace the characters that are included in ababab from abab. Sample statement:
      -- The return value is cdcdcd. 
      select translate('ababab','abab','cd');
    • Example 2: Use the specified characters in the string cdefg to replace the characters that are included in ababab from abab. Sample statement:
      -- The return value is cdcdcd. 
      select translate('ababab','abab','cdefg');
    • Example 3: An input parameter is set to null. Sample statement:
      -- The return value is null. 
      select translate('ababab','cd',null);

REGEXP_EXTRACT_ALL

  • Syntax
    array<T> regexp_extract_all(string <source>, string <pattern>[,bigint <group_id>])
  • Description

    Finds all substrings that match the pattern of a regular expression in a string and returns the substrings as an array.

  • Parameters
    • source: required. A value of the STRING type. This parameter specifies the string that you want to analyze.
    • pattern: required. A value of the STRING type. This parameter specifies the pattern that you want substrings to match. This parameter can be a constant of the STRING type or a regular expression. For more information about regular expressions, see Regular expressions.
    • group_id: optional. A value of the BIGINT type. This parameter specifies the ID of the group that is used to match the pattern. The value of this parameter must be greater than or equal to 0. If you do not specify this parameter, the group with an ID of 1 is used to match the pattern. If you set this parameter to 0, all groups are used to match the pattern.
  • Return value

    A value of the ARRAY type is returned. If you specify group_id, an array that consists of all matching results for the group specified by group_id is returned. If you do not specify group_id, an array that consists of all matching results for the group with an ID of 1 is returned.

  • Examples
    • If you do not specify group_id, an array that consists of all matching results for the group with an ID of 1 is returned. Sample statement:
      SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)');
      The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | [100,300] |
      +------------+
    • If you set group_id to 2, an array that consists of all matching results for the group with an ID of 2 is returned. Sample statement:
      SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)',2);
      The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | [200,400] |
      +------------+