You can use complex type functions in MaxCompute SQL to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON. This topic describes the syntax and parameters of complex type functions that are supported by MaxCompute SQL, and provides examples on how to use complex type functions. This topic guides you through data development by using complex type functions.

The following table describes the complex type functions that are supported by MaxCompute SQL.

Function type Function Description
ARRAY functions ALL_MATCH Checks whether all the elements in an array meet a specific condition.
ANY_MATCH Checks whether an element in an array meets a specific condition.
ARRAY Creates an array based on given values.
ARRAY_CONTAINS Checks whether an array contains a given value.
ARRAY_DISTINCT Removes duplicate elements from an array.
ARRAY_EXCEPT Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates.
ARRAY_INTERSECT Calculates the intersection of two arrays.
ARRAY_JOIN Concatenates the elements in an array by using a delimiter.
ARRAY_MAX Returns the largest element in an array.
ARRAY_MIN Returns the smallest element in an array.
ARRAY_POSITION Returns the position of the first occurrence of a given element in an array.
ARRAY_REDUCE Aggregates the elements in an array.
ARRAY_REMOVE Removes a given element from an array.
ARRAY_REPEAT Returns a new array in which a given element is repeated several times.
ARRAY_SORT Sorts the elements in an array based on a comparator.
ARRAY_UNION Calculates the union of two arrays and returns the union as a new array without duplicates.
ARRAYS_OVERLAP Checks whether two arrays contain the same element.
ARRAYS_ZIP Merges multiple arrays.
CONCAT Concatenates multiple arrays or strings.
EXPLODE Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF).
FILTER Filters the elements in an array.
INDEX Returns the element at a specific position in an array.
POSEXPLODE Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.
SIZE Returns the number of elements in an array.
SLICE Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array.
SORT_ARRAY Sorts the elements in an array.
TRANSFORM Transforms the elements in an array.
ZIP_WITH Merges two arrays at the element level based on element positions and returns a new array.
MAP functions EXPLODE Transposes one row of data into multiple rows. This function is a UDTF.
INDEX Returns the value that meets a specific condition in a map.
MAP Creates a map based on given key-value pairs.
MAP_CONCAT Returns the union of multiple maps.
MAP_ENTRIES Converts key-value pairs in a map into a struct array.
MAP_FILTER Filters the elements in a map.
MAP_FROM_ARRAYS Creates a map based on given arrays.
MAP_FROM_ENTRIES Creates a map based on given struct arrays.
MAP_KEYS Returns all keys in a map as an array.
MAP_VALUES Returns all values in a map as an array.
MAP_ZIP_WITH Merges two given maps into a single map.
SIZE Returns the number of key-value pairs in a map.
TRANSFORM_KEYS Transforms the keys in a map by using a given function. The values in the map are not changed.
TRANSFORM_VALUES Transforms the values in a map by using a given function. The keys in the map are not changed.
STRUCT functions FIELD Obtains the value of a member variable in a struct.
INLINE Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
STRUCT Creates a struct based on a given value list.
NAMED_STRUCT Creates a struct based on given name-value pairs.
JSON functions FROM_JSON Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format.
GET_JSON_OBJECT Extracts a single string from a standard JSON string by using a specific method.
JSON_TUPLE Extracts strings from a standard JSON string based on a set of input keys.
TO_JSON Converts data of a complex data type into a JSON string.

ALL_MATCH

  • Syntax
    boolean all_match(array<T> <a>, function<T, boolean> <predicate>)
  • Description

    Checks whether all the elements in Array a meet the predicate condition.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all the elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
  • Return value
    A value of the BOOLEAN type is returned. The return value varies based on the following rules:
    • If all the elements in Array a meet the predicate condition or the array is empty, True is returned.
    • If one or more elements in Array a do not meet the predicate condition, False is returned.
    • If an element in Array a is null and other elements meet the predicate condition, NULL is returned.
  • Examples
    • Example 1: Check whether all the elements in array(4, 5, 6) meet the x-> x > 3 condition. This condition specifies that all elements are greater than 3. Sample statement:
      -- True is returned. 
      select all_match(array(4, 5, 6), x -> x>3);
    • Example 2: The array is empty. Sample statement:
      -- True is returned. 
      select all_match(array(), x -> x>3);
    • Example 3: Check whether all the elements in array(1, 2, -10, 100, -30) meet the x-> x > 3 condition. Sample statement:
      -- False is returned. 
      select all_match(array(1, 2, -10, 100, -30), x -> x>3);
    • Example 4: Check whether all the elements in array(10, 100, 30, null) meet the x-> x > 3 condition. Sample statement:
      -- NULL is returned. 
      select all_match(array(10, 100, 30, null), x -> x>3);

ANY_MATCH

  • Syntax
    boolean any_match(array<T> <a>, function<T, boolean> <predicate>)
  • Description

    Checks whether an element in Array a meets the predicate condition.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all the elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
  • Return value
    A value of the BOOLEAN type is returned. The return value varies based on the following rules:
    • If one or more elements in Array a meet the predicate condition, True is returned.
    • If no elements in Array a meet the predicate condition or the array is empty, False is returned.
    • If an element in Array a is null and other elements do not meet the predicate condition, NULL is returned.
  • Examples
    • Example 1: Check whether an element in array(1, 2, -10, 100, -30) meets the x-> x > 3 condition. Sample statement:
      -- True is returned. 
      select any_match(array(1, 2, -10, 100, -30), x-> x > 3);
    • Example 2: The array is empty. Sample statement:
      -- False is returned. 
      select any_match(array(), x-> x > 3);
    • Example 3: Check whether an element in array(1, 2, -10, -20, -30) meets the x-> x > 3 condition. Sample statement:
      -- False is returned. 
      select any_match(array(1, 2, -10, -20, -30), x-> x > 3);
    • Example 4: Check whether an element in array(1, 2, null, -10) meets the x-> x > 3 condition. Sample statement:
      -- NULL is returned. 
      select any_match(array(1, 2, null, -10), x-> x > 3);

ARRAY

  • Syntax
    array array(<value>,<value>[, ...])
  • Description

    Creates an array based on given values.

  • Parameters

    value: required. All data types are supported. All values must be of the same data type.

  • Return value

    A value of the ARRAY type is returned.

  • Examples
    The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) fields. Data in the table:
    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+
    Sample statement:
    -- Create an array based on the data in the c2, c4, c3, and c5 columns. 
    select array(c2,c4,c3,c5) from t_table;
    
    -- The following result is returned: 
    +------+
    | _c0  |
    +------+
    | [k11, 86, k21, 15] |
    | [k12, 97, k22, 2] |
    | [k13, 99, k23, 1] |
    +------+

ARRAY_CONTAINS

  • Syntax
    boolean array_contains(array<T> <a>, value <v>)
  • Description

    Checks whether Array a contains Element v.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • v: required. This parameter specifies the element that you want to check. The value of v must be of the same data type as the elements in Array a.
  • Return value

    A value of the BOOLEAN type is returned.

  • Examples
    The t_table_array table contains thec1 (BIGINT) and t_array (ARRAY<STRING>) fields. Data in the table:
    +------------+---------+
    | c1         | t_array |
    +------------+---------+
    | 1000       | [k11, 86, k21, 15] |
    | 1001       | [k12, 97, k22, 2] |
    | 1002       | [k13, 99, k23, 1] |
    +------------+---------+
    Sample statement:
    -- Check whether the t_array column contains the value 1. 
    select c1, array_contains(t_array,'1') from t_table_array;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | false |
    | 1001       | false |
    | 1002       | true |
    +------------+------+

ARRAY_DISTINCT

  • Syntax
    array<T> array_distinct(array<T> <a>)
  • Description

    Removes duplicate elements from Array a.

  • Parameters

    a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
    • If an element in Array a is null, the null element is involved in the operation.
    • If the input array is empty, an empty array is returned.
  • Examples
    • Example 1: Remove duplicate elements from array(10, 20, 30, 30, 20, 10). Sample statement:
      -- [10,20,30] is returned. 
      select array_distinct(array(10, 20, 30, 30, 20, 10));
    • Example 2: Remove duplicate elements from array(10, 20, 20, null, null, 30, 20, null). Sample statement:
      -- [10,20,null,30] is returned. 
      select array_distinct(array(10, 20, 20, null, null, 30, 20, null)); 
    • Example 3: The array is empty. Sample statement:
      -- [] is returned. 
      select array_distinct(array());

ARRAY_EXCEPT

  • Syntax
    array<T> array_except(array<T> <a>, array<T> <b>)
  • Description

    Finds the elements that exist in Array a but do not exist in Array b and returns the elements as a new array without duplicates.

  • Parameters

    a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
    • If an element in an array is null, the null element is involved in the operation.
    • If one of the input arrays is empty, a new array is returned based on the non-empty array without duplicates.
    • If the two input arrays are empty, an empty array is returned.
  • Examples
    • Example 1: Find the elements that exist in array(1, 1, 3, 3, 5, 5) but do not exist in array(1, 1, 2, 2, 3, 3) and return the elements as a new array without duplicates. Sample statement:
      -- [5] is returned. 
      select array_except(array(1, 1, 3, 3, 5, 5), array(1, 1, 2, 2, 3, 3));
    • Example 2: Find the elements that exist in array(1, 1, 3, 3, 5, 5, null, null) but do not exist in array(1, 1, 2, 2, 3, 3) and return the elements as a new array without duplicates. Sample statement:
      -- [5,null] is returned. 
      select array_except(array(1, 1, 3, 3, 5, 5, null, null), array(1, 1, 2, 2, 3, 3));
    • Example 3: One of the input arrays is empty. Sample statement:
      -- [2,1] is returned. 
      select array_except(array(2, 1, 1, 2), cast(array() as array<int>)); 
    • Example 4: The two input arrays are empty. Sample statement:
      -- [] is returned. 
      select array_except(cast(array() as array<int>), cast(array() as array<int>));

ARRAY_INTERSECT

  • Syntax
    array<T> array_intersect(array<T> <a>, array<T> <b>)  
  • Description

    Calculates the intersection of Array a and Array b and removes duplicate elements.

  • Parameters

    a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • If an element in an array is null, the null element is involved in the operation.
    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
    • If Array a or Array b is null, NULL is returned.
  • Examples
    • Example 1: Calculate the intersection of array(1, 2, 3) and array(1, 3, 5), and remove duplicate elements. Sample statement:
      -- [1,3] is returned. 
      select array_intersect(array(1, 2, 3), array(1, 3, 5));
    • Example 2: Calculate the intersection of array(10, 20, 20, 30, 30, null, null) and array(30, 30, 20, 20, 40, null, null), and remove duplicate elements. Sample statement:
      -- [20,30,null] is returned. 
      select array_intersect(array(10, 20, 20, 30, 30, null, null), array(30, 30, 20, 20, 40, null, null)); 

ARRAY_JOIN

  • Syntax
    array_join(array<T> <a>, <delimiter>[, <nullreplacement>]) 
  • Description

    Concatenates the elements in Array a by using a delimiter. If the array contains a null element, use nullreplacement to specify the string that you want to use to replace the null element in the results. If you do not configure nullreplacement, the null element is ignored.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array.
      Note If the elements in the array are not of the STRING type, MaxCompute converts the data type of the elements into STRING.
    • delimiter: required. A value of the STRING type. This parameter specifies the string that is used to separate the concatenated elements in Array a.
    • nullreplacement: optional. This parameter specifies the string that is used to replace null elements.
  • Return value

    A value of the STRING type is returned.

  • Examples
    -- 10,20,20,30 is returned. 
    select array_join(array(10, 20, 20, null, null, 30), ",");
    -- 10##20##20##null##null##30 is returned. 
    select array_join(array(10, 20, 20, null, null, 30), "##", "null");

ARRAY_MAX

  • Syntax
    T array_max(array<T> <a>)  
  • Description

    Returns the largest element in Array a.

  • Parameters

    a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array.

    The following data types are supported:
    • TINYINT, SMALLINT, INT, and BIGINT
    • FLOAT and DOUBLE
    • BOOLEAN
    • DECIMAL and DECIMALVAL
    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
    • STRING, BINARY, VARCHAR, and CHAR
    • ARRAY, STRUCT, and MAP
  • Return value
    The largest element in Array a is returned. The return value varies based on the following rules:
    • If Array a is null, NULL is returned.
    • If an element in Array a is null, the null element is not involved in the operation.
  • Examples
    -- 20 is returned. 
    select array_max(array(1, 20, null, 3));

ARRAY_MIN

  • Syntax
    T array_min(array<T> <a>)  
  • Description

    Returns the smallest element in Array a.

  • Parameters

    a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array.

    The following data types are supported:
    • TINYINT, SMALLINT, INT, and BIGINT
    • FLOAT and DOUBLE
    • BOOLEAN
    • DECIMAL and DECIMALVAL
    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
    • STRING, BINARY, VARCHAR, and CHAR
    • ARRAY, STRUCT, and MAP
  • Return value
    The smallest element in Array a is returned. The return value varies based on the following rules:
    • If Array a is null, NULL is returned.
    • If an element in Array a is null, the null element is not involved in the operation.
  • Examples
    -- 1 is returned. 
    select array_min(array(1, 20, null, 3));

ARRAY_POSITION

  • Syntax
    bigint array_position(array<T> <a>, T <element>) 
  • Description

    Returns the position of the first occurrence of a given element in Array a. The position numbers of elements are counted from left to right and start from 1.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The following data types are supported:
      • TINYINT, SMALLINT, INT, and BIGINT
      • FLOAT and DOUBLE
      • BOOLEAN
      • DECIMAL and DECIMALVAL
      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
      • STRING, BINARY, VARCHAR, and CHAR
      • ARRAY, STRUCT, and MAP
    • element: required. The element whose position you want to query. The data type of this parameter must be the same as the data type of the elements in Array a.
  • Return value
    A value of the BIGINT type is returned. The return value varies based on the following rules:
    • If Array a or element is null, NULL is returned.
    • If the specific element is not found, 0 is returned.
  • Examples
    • Example 1: Return the position of the first occurrence of 1 in array(3, 2, 1). Sample statement:
      -- 3 is returned. 
      select array_position(array(3, 2, 1), 1);
    • Example 2: element is null. Sample statement:
      -- NULL is returned. 
      select array_position(array(3, 1, null), null);

ARRAY_REMOVE

  • Syntax
    array<T> array_remove(array<T> <a>, T <element>) 
  • Description

    Removes a given element from Array a.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The following data types are supported:
      • TINYINT, SMALLINT, INT, and BIGINT
      • FLOAT and DOUBLE
      • BOOLEAN
      • DECIMAL and DECIMALVAL
      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
      • STRING, BINARY, VARCHAR, and CHAR
      • ARRAY, STRUCT, and MAP
    • element: required. The element that you want to remove. The data type of this parameter must be the same as the data type of the elements in Array a.
  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • If an element in Array a is null, the null element is not involved in the operation.
    • If Array a or element is null, NULL is returned.
    • If Array a does not contain the specific element, Array a is returned.
  • Examples
    • Example 1: Remove 1 from array(3, 2, 1). Sample statement:
      -- [3,2] is returned. 
      select array_remove(array(3, 2, 1), 1);
    • Example 2: element is null. Sample statement:
      -- NULL is returned. 
      select array_remove(array(3, 1, null), null);
    • Example 3: Remove 2 from array(3, 1, null). Sample statement:
      -- [3,1,null] is returned. 
      select array_remove(array(3, 1, null), 2);

ARRAY_REDUCE

  • Syntax
    R array_reduce(array<T> <a>, buf <init>, function<buf, T, buf> <merge>, function<buf, R> <final>)
  • Description

    Aggregates the elements in Array a.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • init: required. The initial value of the intermediate result that is used to aggregate elements.
    • merge: required. A built-in function, user-defined function, or expression that is used to perform an operation on each element in Array a and the intermediate result. This function or expression uses the elements of Array a and the init parameter as input parameters.
    • final: required. A built-in function, user-defined function, or expression that is used to convert the intermediate result into the final result. This function or expression uses the result of merge as the input parameter. R specifies the data type of the output.
  • Return value

    The data type of the return value is the same as the data type that is specified for final.

  • Examples
    -- 6 is returned. 
    select array_reduce(array(1, 2, 3), 0, (buf, e)->buf + e, buf->buf);
    -- 2.5 is returned. 
    select array_reduce(array(1, 2, 3, 4), named_struct('sum', 0, 'count', 0), (buf, e)->named_struct('sum', buf.sum + e, 'count', buf.count + 1), buf -> buf.sum / buf.count);

ARRAY_REPEAT

  • Syntax
    array<T> array_repeat(T <element>, int <count>) 
  • Description

    Returns a new array in which element t is repeated count times.

  • Parameters
    • t: required. This parameter specifies the element that you want to repeat. The following data types are supported:
      • TINYINT, SMALLINT, INT, and BIGINT
      • FLOAT and DOUBLE
      • BOOLEAN
      • DECIMAL and DECIMALVAL
      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
      • STRING, BINARY, VARCHAR, and CHAR
      • ARRAY, STRUCT, and MAP
    • count: required. This parameter specifies the number of repetitions. A value of the INT type is required. The value must be greater than or equal to 0.
  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • If the value of count is null, NULL is returned.
    • If the value of count is less than 0, an empty array is returned.
  • Examples
    • Example 1: Repeat 123 twice and return the new array. Sample statement:
      -- [123, 123] is returned. 
      select array_repeat('123', 2);
    • Example 2: The value of count is null. Sample statement:
      -- NULL is returned. 
      select array_repeat('123', null);
    • Example 3: The value of count is less than 0. Sample statement:
      -- [] is returned. 
      select array_repeat('123', -1);

ARRAY_SORT

  • Syntax
    array<T> array_sort(array<T> <a>, function<T, T, bigint> <comparator>) 
  • Description

    Sorts the elements in Array a based on a comparator.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • comparator: required. A built-in function, user-defined function, or expression that is used to compare two elements in the array.

      Processing logic of comparator(a, b): If a is equal to b, 0 is returned. If a is less than b, a negative integer is returned. If a is greater than b, a positive integer is returned. If comparator(a, b) returns NULL, an error is returned.

  • Return value

    A value of the ARRAY type is returned.

  • Examples
    -- [{"a":1,"b":10},{"a":2,"b":12},{"a":3,"b":11}] is returned. 
    select array_sort(a, (a,b)->case when a.a> b.a then 1L when a.a=b.a then 0L else -1L end)
    from values (
      array(named_struct('a', 1, 'b', 10),
            named_struct('a', 3, 'b', 11),
            named_struct('a', 2, 'b', 12)))
      as t(a);

ARRAY_UNION

  • Syntax
    array<T> array_union(array<T> <a>,  array<T> <b>)
  • Description

    Calculates the union of Array a and Array b, and removes duplicate elements.

  • Parameters

    a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

    The following data types are supported:
    • TINYINT, SMALLINT, INT, and BIGINT
    • FLOAT and DOUBLE
    • BOOLEAN
    • DECIMAL and DECIMALVAL
    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
    • STRING, BINARY, VARCHAR, and CHAR
    • ARRAY, STRUCT, and MAP
  • Return value

    A value of the ARRAY type is returned. If Array a or Array b is null, NULL is returned.

  • Examples
    • Example 1: Calculate the union of array(1, 2, 3) and array(1, 3, 5), and remove duplicate elements. Sample statement:
      -- [1,2,3,5] is returned. 
      select array_union(array(1, 2, 3), array(1, 3, 5));
    • Example 2: One of the arrays is null. Sample statement:
      -- NULL is returned. 
      select array_union(array(1, 2, 3), null);

ARRAYS_OVERLAP

  • Syntax
    boolean arrays_overlap(array<T> <a>,  array<T> <b>)
  • Description

    Checks whether Array a and Array b contain the same element.

  • Parameters

    a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

    The following data types are supported:
    • TINYINT, SMALLINT, INT, and BIGINT
    • FLOAT and DOUBLE
    • BOOLEAN
    • DECIMAL and DECIMALVAL
    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
    • STRING, BINARY, VARCHAR, and CHAR
    • ARRAY, STRUCT, and MAP
  • Return value

    A value of the BOOLEAN type is returned. The return value varies based on the following rules:

    • If Array a contains at least one element that is in Array b and is not null, True is returned.
    • If Array a and Array b do not contain the same element, both of the arrays are not empty, and one or both of the arrays contain a null element, NULL is returned.
    • If Array a and Array b do not contain the same element, and both of the arrays are not empty and do not contain a null element, False is returned.
  • Examples
    • Example 1: Check whether array(1, 2, 3) and array(3, 4, 5) contain the same element. Sample statement:
      -- True is returned. 
      select arrays_overlap(array(1, 2, 3), array(3, 4, 5));
    • Example 2: Check whether array(1, 2, 3) and array(6, 4, 5) contain the same element. Sample statement:
      -- False is returned. 
      select arrays_overlap(array(1, 2, 3), array(6, 4, 5));
    • Example 3: One of the arrays contains a null element. Sample statement:
      -- NULL is returned. 
      select arrays_overlap(array(1, 2, 3), array(5, 4, null));

ARRAYS_ZIP

  • Syntax
    array<struct<T, U, ...>> arrays_zip(array<T> <a>, array<U> <b>[, ...])
  • Description

    Merges multiple given arrays and returns a struct array, in which the Nth struct contains all the Nth elements of the input arrays.

  • Parameters

    a and b: required. These parameters specify arrays that T in array<T> and U in array<U> specify the data types of the elements in the arrays. The elements can be of any data type.

    The following data types are supported:
    • TINYINT, SMALLINT, INT, and BIGINT
    • FLOAT and DOUBLE
    • BOOLEAN
    • DECIMAL and DECIMALVAL
    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
    • STRING, BINARY, VARCHAR, and CHAR
    • ARRAY, STRUCT, and MAP
  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • The Nth struct in the generated struct array contains all the Nth elements of the input arrays. If an array contains less than N elements, null is used as the Nth element of the array.
    • If one or more input arrays are null, NULL is returned.
  • Examples
    • Example 1: Merge array(1, 2, 3) and array(2, 3, 4) into a struct array. Sample statement:
      -- [{0:1, 1:2}, {0:2, 1:3}, {0:3, 1:4}] is returned. 
      select arrays_zip(array(1, 2, 3), array(2, 3, 4));
    • Example 2: Merge array(1, 2, 3) and array(4, 5) into a struct array. Sample statement:
      -- [{0:1, 1:4}, {0:2, 1:5}, {0:3, 1:null}] is returned. 
      select arrays_zip(array(1, 2, 3), array(4, 5));

CONCAT

  • Syntax
    array<T> concat(array<T> <a>, array<T> <b>[,...])
    string concat(string <str1>, string <str2>[,...])
  • Description
    • Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.
    • Strings as inputs: Concatenates multiple strings and returns a new string.
  • Parameters
    • a and b: required. These parameters specify arrays that T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation.
    • str1 and str2: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
  • Return value
    • A value of the ARRAY type is returned. If one of the input arrays is null, NULL is returned.
    • A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, NULL is returned.
  • Examples
    • Example 1: Concatenate all elements of array(10, 20) and array(20, -20). Sample statement:
      -- [10, 20, 20, -20] is returned. 
      select concat(array(10, 20), array(20, -20));
    • Example 2: One of the input arrays contains a null element. Sample statement:
      -- [10, null, 20, -20] is returned. 
      select concat(array(10, null), array(20, -20));
    • Example 3: One of the input arrays is null. Sample statement:
      -- NULL is returned. 
      select concat(array(10, 20), null);
    • Example 4: Concatenate strings aabc and abcde. Sample statement:
      -- aabcabcde is returned. 
      select concat('aabc','abcde');
    • Example 5: The input is empty. Sample statement:
      -- NULL is returned. 
      select concat();
    • Example 6: One of the input strings is null. Sample statement:
      -- NULL is returned. 
      select concat('aabc', 'abcde', null);

EXPLODE

  • Limits
    • A SELECT statement can contain only one EXPLODE function, and no other columns of a table are allowed.
    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.
  • Syntax
    explode (<var>)
  • Description
    Transposes one row of data into multiple rows. This function is a UDTF.
    • If the parameter value is of the array<T> type, the array stored in the column is transposed into multiple rows.
    • If the parameter value is of the map<K, V> type, each key-value pair of the map stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
  • Parameters

    var: required. The value must be of the array<T> or map<K, V> type.

  • Return value

    Rows after transposition are returned.

  • Examples

    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) fields. Data in the table:

      +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+
    Sample statement:
    select explode(t_map) from t_table_map;
    -- The following result is returned: 
    +-----+------------+
    | key | value      |
    +-----+------------+
    | k11 | 86         |
    | k21 | 15         |
    | k12 | 97         |
    | k22 | 2          |
    | k13 | 99         |
    | k23 | 1          |
    +-----+------------+

FILTER

  • Syntax
    array<T> filter(array<T> <a>, function<T,boolean> <func>)
  • Description

    Filters the elements in Array a by using func and returns a new array.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in Array a. The value must be of the same data type as the elements in Array a. The output result of the function or expression is of the BOOLEAN type.
  • Return value

    A value of the ARRAY type is returned.

  • Examples
    -- [2, 3] is returned. 
    select filter(array(1, 2, 3), x -> x > 1);

INDEX

  • Syntax
    index(<var1>[<var2>])
  • Description
    • If var1 is of the map<K, V> type, this function obtains the element that is at position var2 in var1. The position numbers of elements are counted from left to right and start from 0.
    • If var1 is of the map<K, V> type, this function obtains the value whose key is var2 in var1.
    Note When you use this function, you must remove index and directly execute <var1>[<var2>]. Otherwise, an error is returned.
  • Parameters
    • var1: required. The value must be of the array<T> or map<K, V> type. T in array<T> specifies the data type of the elements in an array. The elements can be of any data type. K and V in map<K, V> specify the keys and values of a map.
    • var2: required.
      • If var1 is of the array<T> type, var2 must be of the BIGINT type and greater than or equal to 0.
      • If var1 is of the map<K, V> type, var2 must be of the same data type as K.
  • Return value
    • If var1 is of the array<T> type, a value of the data type that is specified by T is returned. The return value varies based on the following rules:
      • If the number of elements in var1 is less than var2, NULL is returned.
      • If var1 is null, NULL is returned.
    • If var1 is of the map<K, V> type, a value that is of the same data type as V is returned. The return value varies based on the following rules:
      • If map<K, V> does not contain a key whose value is var2, NULL is returned.
      • If var1 is null, NULL is returned.
  • Examples
    • Example 1: var1 is of the array<T> type. Sample statement:
      -- c is returned. 
      select array('a','b','c')[2];
    • Example 2: var1 is of the map<K, V> type. Sample statement:
      -- 1 is returned. 
      select str_to_map("test1=1,test2=2")["test1"];

POSEXPLODE

  • Syntax
    posexplode(array<T> <a>)
  • Description

    Converts Array a into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.

  • Parameters

    a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

  • Return value

    A table is returned.

  • Examples
    select posexplode(array('a','c','f','b'));
    -- The following result is returned: 
    +------------+------------+
    | pos        | val        |
    +------------+------------+
    | 0          | a          |
    | 1          | c          |
    | 2          | f          |
    | 3          | b          |
    +------------+------------+

SIZE

  • Syntax
    int size(array<T> <a>)
    int size(map<K, V> <b> )
  • Description
    • An array as the input: Calculates the number of elements in Array a.
    • A map as the input: Calculates the number of key-value pairs in Map b.
  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • b: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.
  • Return value

    A value of the INT type is returned.

  • Examples
    • Example 1: Calculate the number of elements in array('a','b'). Sample statement:
      -- 2 is returned. 
      select size(array('a','b'));
    • Example 2: Calculate the number of key-value pairs in map('a',123,'b',456).
      -- 2 is returned. 
      select size(map('a',123,'b',456)); 

SLICE

  • Syntax
    array<T> slice(array<T> <a>, <start>, <length>)
  • Description

    Copies the elements in an array from the start position based on a specific length and returns the elements as a new array.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • start: required. This parameter specifies the position at which the function starts to copy elements from left to right. The minimum positive value of this parameter is 1. You can also set the start parameter to a negative value. In this case, the start position is counted from the end of the array, but the elements are still copied from left to right.
    • length: required. The number of elements in the returned array. The value must be greater than or equal to 0. If the value is greater than the length of the input array, this function returns a new array that consists of the elements from the start position to the end of the input array.
  • Return value

    A value of the ARRAY type is returned.

  • Examples
    • Example 1: Copy the elements in array(10, 20, 20, null, null, 30) from position 1 based on a length of 3. Sample statement:
      -- [10, 20, 20] is returned. 
      select slice(array(10, 20, 20, null, null, 30), 1, 3);
    • Example 2: Copy the elements in array(10, 20, 20, null, null, 30) from position -2 based on a length of 2. Sample statement:
      -- [null, 30] is returned. 
      select slice(array(10, 20, 20, null, null, 30), -2, 2);  
    • Example 3: Copy the elements in array(10, 20, 20, null, null, 30) from position 3 based on a length of 10. Sample statement:
      -- [20, null, null, 30] is returned. 
      select slice(array(10, 20, 20, null, null, 30), 3, 10); 
    • Example 4: Copy the elements in array(10, 20, 20, null, null, 30) from position 3 based on a length of 0. Sample statement:
      -- [] is returned. 
      select slice(array(10, 20, 20, null, null, 30), 3, 0);

SORT_ARRAY

  • Syntax
    array<T> sort_array(array<T> <a>[, <isasc>])
  • Description

    Sorts the elements in an array.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • isasc: optional. This parameter specifies whether to sort elements in the array in ascending or descending order. Valid values: true and false. The value true indicates the ascending order and false indicates the descending order. If you do not specify this parameter, the elements are sorted in ascending order.
  • Return value

    A value of the ARRAY type is returned. NULL is interpreted as the minimum value.

  • Examples
    • Example 1: The t_array table contains the c1 ARRAY<STRING>, c2 ARRAY<INT>, and c3 ARRAY<STRING> fields. Data in the table:
      +------------+---------+--------------+
      | c1         | c2      | c3           |
      +------------+---------+--------------+
       [a, c, f, b]   [4, 5, 7, 2, 5, 8]    [You, Me, Him] 
      +------------+---------+--------------+
      Sort data in each column of the table. Sample statement:
      [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, You, Me] is returned. 
      select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
    • Example 2: Sort the elements in array(10, 20, 40, 30, 30, null, 50) in descending order. Sample statement:
      -- [50, 40, 30, 30, 20, 10, null] is returned. 
      select sort_array(array(10, 20, 40, 30, 30, null, 50), false);

TRANSFORM

  • Syntax
    array<R> transform(array<T> <a>, function<T, R> <func>)
  • Description

    Transforms the elements in Array a by using func and returns a new array.

  • Parameters
    • a: required. This parameter specifies an array that T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.
    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the elements in Array a. The value must be of the same data type as the elements in Array a. R specifies the data type of the output results.
  • Return value

    A value of the ARRAY type is returned.

  • Examples
    -- [2, 3, 4] is returned. 
    select transform(array(1, 2, 3), x -> x + 1);

ZIP_WITH

  • Syntax
    array<R> array_sort(array<T> <a>, array<S> <b>, function<T, S, R> <combiner>)
  • Description

    Merges Array a and Array b at the element level based on element positions and a combiner and returns a new array.

  • Parameters
    • a and b: required. These parameters specify arrays that T in array<T> and S in array<S> specify the data types of the elements in the arrays. The elements can be of any data type.
    • combiner: required. This parameter specifies the built-in function, user-defined function, or expression that is used to merge Array a and Array b at the element level. The combiner consists of two input parameters. One input parameter must be of the same data type as the elements in Array a. The other input parameter must be of the same data type as the elements in Array b.
  • Return value
    A value of the ARRAY type is returned. The return value varies based on the following rules:
    • The elements in the returned array are at the same positions as the elements in Array a and Array b.
    • If Array a and Array b have different lengths, the null elements are added to the shorter array before the arrays are merged.
  • Examples
    -- [2, 4, 6, null] is returned. 
    select zip_with(array(1,2,3), array(1,2,3,4), (x,y) -> x + y);

MAP

  • Syntax
    map(K, V) map(K <key1>, V <value1>, K <key2>, V <value2>[, ...])
  • Description

    Creates a map based on given key-value pairs.

  • Parameters
    • key: required. All keys must be of the same data type after implicit conversions. Only basic data types are supported.
    • value: required. All values must be of the same data type after implicit conversions. All data types are supported.
  • Return value
    Data of the MAP type is returned.
    Note You can specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. Valid values:
    • exception: An error is returned.
    • last_win: The latter key overwrites the former key.
    If you do not specify the parameter, the default value last_win is used.
  • Examples
    • Example 1: No duplicate keys exist. The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) fields. Data in the table:
      +------------+----+----+------------+------------+
      | c1         | c2 | c3 | c4         | c5         |
      +------------+----+----+------------+------------+
      | 1000       | k11 | k21 | 86         | 15         |
      | 1001       | k12 | k22 | 97         | 2          |
      | 1002       | k13 | k23 | 99         | 1          |
      +------------+----+----+------------+------------+
      Sample statement:
      -- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. 
      select map(c2,c4,c3,c5) from t_table;
      -- The following result is returned: 
      +------+
      | _c0  |
      +------+
      | {k11:86, k21:15} |
      | {k12:97, k22:2} |
      | {k13:99, k23:1} |
      +------+
    • Example 2: Duplicate keys exist. The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) fields. Data in the table:
      1000,'k11','k11',86,15
      1001,'k12','k22',97,2
      1002,'k13','k23',99,1
      1003,'k13','k24',100,1
      1004,'k12','k25',95,1
      Sample statement:
      -- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. 
      select map(c2,c4,c3,c5) from t_table;
      -- The following result is returned: 
      +------+
      | _c0  |
      +------+
      | {'k11':15} |
      | {'k12':97, 'k22':2} |
      | {'k13':99, 'k23':1} |
      | {'k13':100, 'k24':1} |
      | {'k12':95, 'k25':1} |
      +------+

MAP_CONCAT

  • Syntax
    map<K, V> map_concat([string <mapDupKeyPolicy>,] map<K, V> <a>, map<K, V> <b>[,...])
  • Description

    Calculates the union of multiple maps.

  • Parameters
    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
      • exception: An error is returned.
      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy prevails.
    • a and b: required. These parameters specify maps. The keys of the maps must be of the same data type, and the values of the maps must be of the same data type. K and V in map<K, V> specify the keys and values of a map.
  • Return value
    Data of the MAP type is returned. An error is returned in the following scenarios:
    • A map is null or the key of a map is null.
    • The data types of multiple maps are different.
  • Examples
    -- {1:a, 2:b, 3:c} is returned. 
    select map_concat(map(1, 'a', 2, 'b'), map(3, 'c'));
    -- {1:a, 2:d, 3:c} is returned. 
    select map_concat('last_win', map(1, 'a', 2, 'b'), map(3, 'c'), map(2, 'd'));

MAP_ENTRIES

  • Syntax
    array<struct<K, V>> map_entries(map<K, V> <a>): 
  • Description

    Converts key-value pairs in Map a into a struct array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A struct array is returned. If the input is null, NULL is returned.

  • Examples
    -- [{key:1, value:a}, {key:2, value:b}] is returned. 
    select map_entries(map(1,  'a',  2,  'b'));

MAP_FILTER

  • Syntax
    map<K, V> map_filter(map<K, V> <input>, function <K, V, boolean> <predicate>)
  • Description

    Filters the elements in Map input and retains only the elements that meet the predicate condition.

  • Parameters
    • input: required. A value of the MAP type. K and V in map<K, V> specify the keys and values of a map.
    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in the map. The predicate condition consists of two input parameters that correspond to the keys and values in input. The output result is of the BOOLEAN type.
  • Return value

    Data of the MAP type is returned.

  • Examples
    -- {-30:100, 20:50} is returned. 
    select map_filter(map(10, -20, 20, 50, -30, 100, 21, null), (k, v) -> (k+v) > 10);

MAP_FROM_ARRAYS

  • Syntax
    map<K, V> map_from_arrays([string <mapDupKeyPolicy>,] array<K> <a>, array<V> <b>))
  • Description

    Creates a map based on Array a and Array b.

  • Parameters
    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
      • exception: An error is returned.
      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy prevails.
    • a: required. This parameter specifies an array. This parameter corresponds to the key in the generated map. K in array<K> specifies the data type of the elements in the array. The elements can be of any data type.
    • b: required. This parameter specifies an array. This parameter corresponds to the value in the generated map. V in array<V> specifies the data type of the elements in the array. The elements can be of any data type.
  • Return value
    Data of the MAP type is returned. The return value varies based on the following rules:
    • If Array a or Array b is null, NULL is returned.
    • If Array a contains a null element or the two arrays are of different lengths, an error is returned.
  • Examples
    -- {1:2, 3:4} is returned. 
    select map_from_arrays(array(1.0, 3.0), array('2', '4'));
    -- {1:2, 3:6} is returned. 
    select map_from_arrays('last_win', array(1.0, 3.0, 3), array('2', '4', '6'));

MAP_FROM_ENTRIES

  • Syntax
    map<K, V> map_from_entries([string <mapDupKeyPolicy>,] array <struct<K, V> , struct<K, V>[,...]>)
  • Description

    Creates a map based on given struct arrays.

  • Parameters
    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
      • exception: An error is returned.
      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy prevails.
    • Data of the STRUCT type is required. K corresponds to the keys in the generated map. V corresponds to the values in the generated map. K and V in struct<K, V> specify the keys and values of a struct array.
  • Return value
    Data of the MAP type is returned. The return value varies based on the following rules:
    • If a struct array is null, NULL is returned.
    • If the number of fields in a struct array is not 2 or the key of a struct array is null, an error is returned.
  • Examples
    -- {1:a, 2:b} is returned. 
    select map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
    -- {1:a, 2:c} is returned. 
    select map_from_entries(array(struct(1, 'a'), struct(2, 'b'), struct(2, 'c')));

MAP_KEYS

  • Syntax
    array<K> map_keys(map<K, V> <a>)
  • Description

    Returns all keys in Map a as an array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the ARRAY type is returned. If the input map is null, NULL is returned.

  • Examples
    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) fields. Data in the table:
    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+
    Sample statement:
    -- Return keys in the t_map column as an array. 
    select c1, map_keys(t_map) from t_table_map;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [k11, k21] |
    | 1001       | [k12, k22] |
    | 1002       | [k13, k23] |
    +------------+------+

MAP_VALUES

  • Syntax
    array<V> map_values(map<K, V> <a>)
  • Description

    Returns all values in Map a as an array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the ARRAY type is returned. If the input map is null, NULL is returned.

  • Examples
    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) fields. Data in the table:
    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+
    Sample statement:
    -- Return keys in the t_map column as an array. 
    select c1,map_values(t_map) from t_table_map;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [86, 15] |
    | 1001       | [97, 2] |
    | 1002       | [99, 1] |
    +------------+------+

MAP_ZIP_WITH

  • Syntax
    <K, V1, V2, V3> map<K, V3> map_zip_wuth(map<K, V1> <input1>, map<K, V2> <input2>, function<K, V1, V2, V3> <func>)
  • Description

    Merges Map input1 and Map input2 into a single map. The keys of the new map are the union of the keys of the two input maps. The value of each key of the new map is calculated by using func.

  • Parameters
    • input1 and input2: required. These parameters specify maps. K and V in map<K, V> specify the keys and values of a map.
    • func: required. func consists of three input parameters, which correspond to a key, the value that corresponds to the key in input1, and the value that corresponds to the key in input2. If the key does not exist in input1 or input2, null is used to replace the value that corresponds to the key in func.
  • Return value

    Data of the data type defined by func is returned.

  • Examples
    -- {1:[1, 1, 4], 2:[2, 2, 5], 3:[3, null, null], 4:[4, null, 7]} is returned. 
    select map_zip_with(map(1, 1, 2, 2, 3, null), map(1, 4, 2, 5, 4, 7), (k, v1, v2) -> array(k, v1, v2));

TRANSFORM_KEYS

  • Syntax
    map<K2, V> transform_keys([string <mapDupKeyPolicy>,] map<K1, V> <input>, function<K1, V, K2> <func>)
  • Description

    Transforms the keys in Map input by using func. The values in the map are not changed.

  • Parameters
    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
      • exception: An error is returned.
      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy prevails.
    • input: required. This parameter specifies a map. K1 and V in map<K1, V> specify the keys and values of a map.
    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input. K2 specifies the data type of keys in the returned map.
  • Return value

    Data of the MAP type is returned. If one of the new keys is null, an error is returned.

  • Examples
    -- {-10:-20, 70:50, 71:101} is returned. 
    select transform_keys(map(10, -20, 20, 50, -30, 101), (k, v) -> k + v);
    -- No error is returned. The returned result depends on the order of the elements in the input map. 
    select transform_keys("last_win", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);
    -- An error is returned because duplicate keys exist. 
    select transform_keys("exception", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);

TRANSFORM_VALUES

  • Syntax
    map<K, V2> transform_values(map<K, V1> <input>, function<K, V1, V2> <func>)
  • Description

    Transforms the values in Map input by using func. The keys in the map are not changed.

  • Parameters
    • input: required. This parameter specifies a map. K and V1 in map<K, V1> specify the keys and values of the map.
    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input. V2 specifies the data type of values in the returned map.
  • Return value

    Data of the MAP type is returned.

  • Examples
    -- {-30:71, 10:-10, 20:null} is returned. 
    select transform_values(map(10, -20, 20, null, -30, 101), (k, v) -> k + v);

FIELD

  • Syntax
    T field(struct <s>, string <fieldName>)
  • Description

    Obtains the value of a member variable in a struct.

  • Parameters
    • s: required. This parameter specifies a struct. The struct is in the format of {f1:T1, f2:T2[, ...]}. f1 and f2 specify member variables, T1 specifies the value of f1, and T2 specifies the value of f2.
    • fieldName: required. A value of the STRING type. The value must be one of the member variables in the struct.
  • Return value

    The value of the specific member variable in the struct is returned.

  • Examples
    -- hello is returned. 
    select field(named_struct('f1', 'hello', 'f2', 3), 'f1');

INLINE

  • Syntax
    inline(array<struct<f1:T1, f2:T2[, ...]>>)
  • Description

    Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

  • Parameters

    f1:T1 and f2:T2: required. A value of any data type. f1 and f2 specify member variables, T1 specifies the value of f1, and T2 specifies the value of f2.

  • Return value

    The expanded data of the struct array is returned.

  • Examples
    The t_table table contains the t_struct (STRUCT<user_id:BIGINT,user_name:STRING,married:STRING,weight:DOUBLE>) field. Data in the table:
    +----------+
    | t_struct |
    +----------+
    | {user_id:10001, user_name:LiLei, married:N, weight:63.5} |
    | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} |
    +----------+
    Sample statement:
    -- Expand the t_struct column. 
    select inline(array(t_struct)) from t_table;
    -- The following result is returned: 
    +------------+-----------+---------+------------+
    | user_id    | user_name | married | weight     |
    +------------+-----------+---------+------------+
    | 10001      | LiLei     | N       | 63.5       |
    | 10002      | HanMeiMei | Y       | 43.5       |
    +------------+-----------+---------+------------+

STRUCT

  • Syntax
    struct struct(<value1>,<value2>[, ...])
  • Description

    Creates a struct based on a given value list.

  • Parameters

    value: required. Data in the array can be of any data type.

  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as col1, col2, ...

  • Examples
    -- {col1:a, col2:123, col3:ture, col4:56.9} is returned. 
    select struct('a',123,'ture',56.90);

NAMED_STRUCT

  • Syntax
    struct named_struct(string <name1>, T1 <value1>, string <name2>, T2 <value2>[, ...])
  • Description

    Creates a struct based on given name-value pairs.

  • Parameters
    • value: required. All data types are supported.
    • name: required. The column name of the STRING type. This parameter is a constant.
  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as name1, name2, ...

  • Examples
    -- {user_id:10001, user_name:LiLei, married:F, weight:63.5} is returned. 
    select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);

FROM_JSON

  • Syntax
    from_json(<jsonStr>, <schema>)
  • Description

    Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.

  • Parameters
    • jsonStr: required. The JSON string that you entered.
    • schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as array<bigint>, map<string, array<string>>, or struct<a:int, b:double, `C`:map<string,string>>.
      Note Keys in a struct are case-sensitive. You can also specify a struct in the format of a BIGINT, b DOUBLE, which is equivalent to STRUCT<a:BIGINT, b:DOUBLE>.

      The following table describes the mappings between JSON data types and MaxCompute data types.

      JSON data type MaxCompute data type
      OBJECT STRUCT, MAP, and STRING
      ARRAY ARRAY and STRING
      NUMBER TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING
      BOOLEAN BOOLEAN and STRING
      STRING STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME
      NULL All types
      Note The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.
  • Return value

    A value of the ARRAY, MAP, or STRUCT type is returned.

  • Examples
    • Example 1: Convert a specific JSON string into a value of a specific data type. Sample statements:
      -- {"a":1,"b":0.8} is returned. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double');
      -- {"time":"26/08/2015"} is returned. 
      select from_json('{"time":"26/08/2015"}', 'time string');
      -- {"a":1,"b":0.8} is returned. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double, c string');
      -- [1,2,3] is returned. 
      select from_json('[1, 2, 3, "a"]', 'array<bigint>');
      -- {"d":"v","a":"1","b":"[1,2,3]","c":"{}"} is returned. 
      select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
    • Example 2: Use the map_keys and from_json functions to obtain all keys in the JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:
      -- ["a","b"] is returned. 
      select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));

GET_JSON_OBJECT

  • Syntax
    string get_json_object(string <json>, string <path>)
  • Description

    Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters
    • json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of {Key:Value, Key:Value,...}. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.
    • path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Different characters have the following meanings:
      • $: indicates the root node.
      • . or ['']: indicates a child node. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.
      • [] ([number]): indicates an array subscript, which starts from 0.
      • *: indicates the wildcard for [], which returns an entire array. An asterisk (*) cannot be escaped.
  • Return value
    • If the json parameter is empty or invalid, NULL is returned.
    • If the format of json is valid and path exists, the related string is returned.
    • To determine how this function returns a value, you can specify odps.sql.udf.getjsonobj.new for a session.
      • If you run the set odps.sql.udf.getjsonobj.new=true; command, this function retains the original strings when it returns a value.
        We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a MaxCompute project has jobs in which this function escapes JSON reserved characters, we recommend that you retain the original escaping operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:
        • The return value is still a JSON string, which can be parsed as JSON data. You do not need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).
        • Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.
          -- 1 is returned. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the GET_JSON_OBJECT function to process the data.
          -- An emoji is returned. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.
          -- {"b":"1","a":"2"} is returned. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      • If you run the set odps.sql.udf.getjsonobj.new=false; command, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:
        • JSON reserved characters, such as line feeds (\n) and double quotation marks (") are displayed as '\n' and '\"'.
        • Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed.
          -- NULL is returned. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis cannot be parsed.
          -- NULL is returned. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.
          -- {"a":"2","b":"1"} is returned. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      Note For MaxCompute projects that are created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, the GET_JSON_OBJECT function escapes JSON reserved characters when it returns a value. The following example helps you determine how the GET_JSON_OBJECT function returns a value in a MaxCompute project.
      select get_json_object('{"a":"[\\"1\\"]"}', '$.a');
      -- Return JSON reserved characters by using escape characters.
      [\"1\"]
      
      -- Return the original strings.
      ["1"]

      You can submit a ticket to request the MaxCompute technical support team to enable the GET_JSON_OBJECT function. This way, you do not need to frequently specify flagodps.sql.udf.getjsonobj.new for a session. This function retains the original strings when it returns a value.

  • Examples
    • Example 1: Extract information from the JSON object src_json.json. Sample statement:
      -- The JSON string src_json.json contains the following content: 
      +----+
      json
      +----+
      {"store":
      {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
      "bicycle":{"price":19.95,"color":"red"}
      },
      "email":"amy@only_for_json_udf_test.net",
      "owner":"amy"
      }
      -- Extract the information of the owner field and return amy. 
      select get_json_object(src_json.json, '$.owner') from src_json;
      -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. 
      select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
      -- Extract the information of the non-existent field and return NULL. 
      select get_json_object(src_json.json, '$.non_exist_key') from src_json;
    • Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:
      -- 2222 is returned. 
      select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
      -- ["h0","h1","h2"] is returned. 
      set odps.sql.udf.getjsonobj.new=true;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- ["h0","h1","h2"] is returned. 
      set odps.sql.udf.getjsonobj.new=false;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh');
      -- h1 is returned. 
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
    • Example 3: Extract information from a JSON object that contains a period (.). Sample statement:
      -- Create a table. 
      create table mf_json (id string, json string);
      -- Insert data into the table. The key in the data contains a period (.). 
      insert into table mf_json (id, json) values ("1", "{
      \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Insert data into the table. The key in the data does not contain a period (.). 
      insert into table mf_json (id, json) values ("2", "{
      \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. 
      select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1;
      -- -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: 
      select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2;
      select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
    • Example 4: The json parameter is empty or invalid. Sample statement:
      -- NULL is returned. 
      select get_json_object('','$.array[1][1]');
      -- NULL is returned. 
      select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
    • Example 5: Escape a JSON string. Sample statement:
      set odps.sql.udf.getjsonobj.new=true;
      -- "1" is returned. 
      select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); 
      -- '1' is returned. 
      select get_json_object('{"a":"\'1\'","b":"2"}', '$.a'); 

JSON_TUPLE

  • Syntax
    string json_tuple(string <json>, string <key1>, string <key2>,...)
  • Description

    Extracts strings from a standard JSON string based on a set of input keys, such as (key1,key2,...).

  • Parameters
    • json: required. A value of the STRING type. This parameter specifies a standard JSON string.
    • key: required. A value of the STRING type. This parameter is used to describe the path of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.
  • Return value

    A value of the STRING type is returned.

    Note
    • If json is empty or invalid, NULL is returned.
    • If key is empty, invalid, or does not exist in the JSON string, NULL is returned.
    • If json is valid and key exists, the related string is returned.
    • This function can parse JSON data that contains Chinese characters.
    • This function can parse nested JSON data.
    • This function can parse JSON data that contains nested arrays.
    • The parsing action is equivalent to the execution of GET_JSON_OBJECT along with set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT.
    • JSON_TUPLE is a UDTF and is used with Lateral View when other columns need to be selected.
  • Examples
    -- Create a table named school. 
    create table school (id string, json string);
    -- Insert data into the table. 
    insert into table mf_json (id, json) values ("1", "{
    \"school\": \"湖畔大学\", \"地址\":\"杭州\", \"SchoolRank\": \"00\", 
    \"Class1\":{\"Student\":[{\"studentId\":1,\"scoreRankIn3Year\":[1,2,[3,2,6]]}, 
    {\"studentId\":2,\"scoreRankIn3Year\":[2,3,[4,3,1]]}]}}");
    • Example 1: Extract JSON object information. Sample statement:
      select json_tuple(school.json,"SchoolRank","Class1") as (item0, item1) from school;
      -- The preceding statement is equivalent to the following statement: 
      select get_json_object(school.json,"$.SchoolRank") item0,get_json_object(school.json,"$.Class1") item1 from school;
      
      -- The following result is returned: 
      +-------+-------+
      | item0 | item1 |
      +-------+-------+
      | 00    | {"Student":[{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}]} |
      +-------+-------+
    • Example 2: Extract JSON object information by using ['']. Sample statement:
      select json_tuple(school.json,"school","['Class1'].Student") as (item0, item2) from school where id=1;
      -- The following result is returned: 
      +-------+-------+
      | item0 | item2 |
      +-------+-------+
      | 湖畔大学 | [{"studentId":1,"scoreRankIn3Year":[1,2,[3,2,6]]},{"studentId":2,"scoreRankIn3Year":[2,3,[4,3,1]]}] |
    • Example 3: Parse JSON data that contains Chinese characters. Sample statement:
      select json_tuple(school.json,"校名","地址") as (item0,item1) from school;
      -- The following result is returned: 
      +-------+-------+
      | item0 | item1 |
      +-------+-------+
      | 湖畔大学 | 杭州 |
      +-------+-------+
    • Example 4: Parse nested JSON data. Sample statement:
      select sc.Id, q.item0, q.item1 
      from school sc lateral view json_tuple(sc.json,"Class1.Student[*].studentId","Class1.Student[0].scoreRankIn3Year") q as item0,item1;
      
      -- The following result is returned: 
      +------------+-------+-------+
      | id         | item0 | item1 |
      +------------+-------+-------+
      | 1          | [1,2] | [1,2,[3,2,6]] |
      +------------+-------+-------+
    • Example 5: Parse JSON data that contains nested arrays. Sample statement:
      select sc.Id, q.item0, q.item1
      from school sc lateral view json_tuple(sc.json,"Class1.Student[0].scoreRankIn3Year[2]","Class1.Student[0].scoreRankIn3Year[2][1]") q as item0,item1;
      -- The following result is returned: 
      +------------+-------+-------+
      | id         | item0 | item1 |
      +------------+-------+-------+
      | 1          | [3,2,6] | 2     |
      +------------+-------+-------+

TO_JSON

  • Syntax
    to_json(<expr>)
  • Description

    Converts an expression of a given complex data type into a JSON string.

  • Parameters
    expr: required. The expression of the ARRAY, MAP, or STRUCT type.
    Note If the input expression is of the STRUCT type (struct<key1:value1, key2:value2>), take note of the following points:
    • All keys are converted into lowercase letters when you convert the expression into a JSON string.
    • If a value is null, the key-value pair to which the value belongs is not included in the JSON string that is returned. For example, if value2 is null, key2:value2 is not included in the JSON string that is returned.
  • Examples
    • Example 1: Convert an expression of a given data type into a JSON string. Sample statement:
      -- {"a":1,"b":2} is returned. 
      select to_json(named_struct('a', 1, 'b', 2));
      -- {"time":"26/08/2015"} is returned. 
      select to_json(named_struct('time', "26/08/2015"));
      -- [{"a":1,"b":2}] is returned. 
      select to_json(array(named_struct('a', 1, 'b', 2)));
      -- {"a":{"b":1}} is returned. 
      select to_json(map('a', named_struct('b', 1)));
      -- {"a":1} is returned. 
      select to_json(map('a', 1));
      -- [{"a":1}] is returned. 
      select to_json(array((map('a', 1))));
    • Example 2: The input expression is of the STRUCT type. Sample statement:
      -- {"a":"B"} is returned. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. 
      select to_json(named_struct("A", "B"));
      -- {"k2":"v2"} is returned. The key-value pair to which null belongs is not included in the JSON string that is returned. 
      select to_json(named_struct("k1", cast(null as string), "k2", "v2"));