You can use complex type functions in MaxCompute SQL to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON. This topic describes the syntax and parameters of complex type functions that are supported by MaxCompute SQL, and provides examples on how to use complex type functions. This topic guides you through data development by using complex type functions.
The following table describes the complex type functions that are supported by MaxCompute SQL.
Function type | Function | Description |
---|---|---|
ARRAY functions | ALL_MATCH | Checks whether all the elements in an array meet a specific condition. |
ANY_MATCH | Checks whether an element in an array meets a specific condition. | |
ARRAY | Creates an array based on given values. | |
ARRAY_CONTAINS | Checks whether an array contains a given value. | |
ARRAY_DISTINCT | Removes duplicate elements from an array. | |
ARRAY_EXCEPT | Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates. | |
ARRAY_INTERSECT | Calculates the intersection of two arrays. | |
ARRAY_JOIN | Concatenates the elements in an array by using a delimiter. | |
ARRAY_MAX | Returns the largest element in an array. | |
ARRAY_MIN | Returns the smallest element in an array. | |
ARRAY_POSITION | Returns the position of the first occurrence of a given element in an array. | |
ARRAY_REDUCE | Aggregates the elements in an array. | |
ARRAY_REMOVE | Removes a given element from an array. | |
ARRAY_REPEAT | Returns a new array in which a given element is repeated several times. | |
ARRAY_SORT | Sorts the elements in an array based on a comparator. | |
ARRAY_UNION | Calculates the union of two arrays and returns the union as a new array without duplicates. | |
ARRAYS_OVERLAP | Checks whether two arrays contain the same element. | |
ARRAYS_ZIP | Merges multiple arrays. | |
CONCAT | Concatenates multiple arrays or strings. | |
EXPLODE | Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF). | |
FILTER | Filters the elements in an array. | |
INDEX | Returns the element at a specific position in an array. | |
POSEXPLODE | Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements. | |
SIZE | Returns the number of elements in an array. | |
SLICE | Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array. | |
SORT_ARRAY | Sorts the elements in an array. | |
TRANSFORM | Transforms the elements in an array. | |
ZIP_WITH | Merges two arrays at the element level based on element positions and returns a new array. | |
MAP functions | EXPLODE | Transposes one row of data into multiple rows. This function is a UDTF. |
INDEX | Returns the value that meets a specific condition in a map. | |
MAP | Creates a map based on given key-value pairs. | |
MAP_CONCAT | Returns the union of multiple maps. | |
MAP_ENTRIES | Converts key-value pairs in a map into a struct array. | |
MAP_FILTER | Filters the elements in a map. | |
MAP_FROM_ARRAYS | Creates a map based on given arrays. | |
MAP_FROM_ENTRIES | Creates a map based on given struct arrays. | |
MAP_KEYS | Returns all keys in a map as an array. | |
MAP_VALUES | Returns all values in a map as an array. | |
MAP_ZIP_WITH | Merges two given maps into a single map. | |
SIZE | Returns the number of key-value pairs in a map. | |
TRANSFORM_KEYS | Transforms the keys in a map by using a given function. The values in the map are not changed. | |
TRANSFORM_VALUES | Transforms the values in a map by using a given function. The keys in the map are not changed. | |
STRUCT functions | FIELD | Obtains the value of a member variable in a struct. |
INLINE | Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row. | |
STRUCT | Creates a struct based on a given value list. | |
NAMED_STRUCT | Creates a struct based on given name-value pairs. | |
JSON functions | FROM_JSON | Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format. |
GET_JSON_OBJECT | Extracts a single string from a standard JSON string by using a specific method. | |
JSON_TUPLE | Extracts strings from a standard JSON string based on a set of input keys. | |
TO_JSON | Converts data of a complex data type into a JSON string. |
ALL_MATCH
- Syntax
boolean all_match(array<T> <a>, function<T, boolean> <predicate>)
- Description
Checks whether all the elements in Array a meet the predicate condition.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all the elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
- a: required. This parameter specifies an array that
- Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
- If all the elements in Array a meet the predicate condition or the array is empty, True is returned.
- If one or more elements in Array a do not meet the predicate condition, False is returned.
- If an element in Array a is null and other elements meet the predicate condition, NULL is returned.
- Examples
- Example 1: Check whether all the elements in
array(4, 5, 6)
meet thex-> x > 3
condition. This condition specifies that all elements are greater than 3. Sample statement:-- True is returned. select all_match(array(4, 5, 6), x -> x>3);
- Example 2: The array is empty. Sample statement:
-- True is returned. select all_match(array(), x -> x>3);
- Example 3: Check whether all the elements in
array(1, 2, -10, 100, -30)
meet thex-> x > 3
condition. Sample statement:-- False is returned. select all_match(array(1, 2, -10, 100, -30), x -> x>3);
- Example 4: Check whether all the elements in
array(10, 100, 30, null)
meet thex-> x > 3
condition. Sample statement:-- NULL is returned. select all_match(array(10, 100, 30, null), x -> x>3);
- Example 1: Check whether all the elements in
ANY_MATCH
- Syntax
boolean any_match(array<T> <a>, function<T, boolean> <predicate>)
- Description
Checks whether an element in Array a meets the predicate condition.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all the elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
- a: required. This parameter specifies an array that
- Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
- If one or more elements in Array a meet the predicate condition, True is returned.
- If no elements in Array a meet the predicate condition or the array is empty, False is returned.
- If an element in Array a is null and other elements do not meet the predicate condition, NULL is returned.
- Examples
- Example 1: Check whether an element in
array(1, 2, -10, 100, -30)
meets thex-> x > 3
condition. Sample statement:-- True is returned. select any_match(array(1, 2, -10, 100, -30), x-> x > 3);
- Example 2: The array is empty. Sample statement:
-- False is returned. select any_match(array(), x-> x > 3);
- Example 3: Check whether an element in
array(1, 2, -10, -20, -30)
meets thex-> x > 3
condition. Sample statement:-- False is returned. select any_match(array(1, 2, -10, -20, -30), x-> x > 3);
- Example 4: Check whether an element in
array(1, 2, null, -10)
meets thex-> x > 3
condition. Sample statement:-- NULL is returned. select any_match(array(1, 2, null, -10), x-> x > 3);
- Example 1: Check whether an element in
ARRAY
- Syntax
array array(<value>,<value>[, ...])
- Description
Creates an array based on given values.
- Parameters
value: required. All data types are supported. All values must be of the same data type.
- Return value
A value of the ARRAY type is returned.
- Examples
The
t_table
table contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
fields. Data in the table:
Sample statement:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+
-- Create an array based on the data in the c2, c4, c3, and c5 columns. select array(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | [k11, 86, k21, 15] | | [k12, 97, k22, 2] | | [k13, 99, k23, 1] | +------+
ARRAY_CONTAINS
- Syntax
boolean array_contains(array<T> <a>, value <v>)
- Description
Checks whether Array a contains Element v.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - v: required. This parameter specifies the element that you want to check. The value of v must be of the same data type as the elements in Array a.
- a: required. This parameter specifies an array that
- Return value
A value of the BOOLEAN type is returned.
- Examples
The
t_table_array
table contains thec1 (BIGINT) and t_array (ARRAY<STRING>)
fields. Data in the table:
Sample statement:+------------+---------+ | c1 | t_array | +------------+---------+ | 1000 | [k11, 86, k21, 15] | | 1001 | [k12, 97, k22, 2] | | 1002 | [k13, 99, k23, 1] | +------------+---------+
-- Check whether the t_array column contains the value 1. select c1, array_contains(t_array,'1') from t_table_array; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | false | | 1001 | false | | 1002 | true | +------------+------+
ARRAY_DISTINCT
- Syntax
array<T> array_distinct(array<T> <a>)
- Description
Removes duplicate elements from Array a.
- Parameters
a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
- If an element in Array a is null, the null element is involved in the operation.
- If the input array is empty, an empty array is returned.
- Examples
- Example 1: Remove duplicate elements from
array(10, 20, 30, 30, 20, 10)
. Sample statement:-- [10,20,30] is returned. select array_distinct(array(10, 20, 30, 30, 20, 10));
- Example 2: Remove duplicate elements from
array(10, 20, 20, null, null, 30, 20, null)
. Sample statement:-- [10,20,null,30] is returned. select array_distinct(array(10, 20, 20, null, null, 30, 20, null));
- Example 3: The array is empty. Sample statement:
-- [] is returned. select array_distinct(array());
- Example 1: Remove duplicate elements from
ARRAY_EXCEPT
- Syntax
array<T> array_except(array<T> <a>, array<T> <b>)
- Description
Finds the elements that exist in Array a but do not exist in Array b and returns the elements as a new array without duplicates.
- Parameters
a and b: required. These parameters specify arrays that
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. - Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
- If an element in an array is null, the null element is involved in the operation.
- If one of the input arrays is empty, a new array is returned based on the non-empty array without duplicates.
- If the two input arrays are empty, an empty array is returned.
- Examples
- Example 1: Find the elements that exist in
array(1, 1, 3, 3, 5, 5)
but do not exist inarray(1, 1, 2, 2, 3, 3)
and return the elements as a new array without duplicates. Sample statement:-- [5] is returned. select array_except(array(1, 1, 3, 3, 5, 5), array(1, 1, 2, 2, 3, 3));
- Example 2: Find the elements that exist in
array(1, 1, 3, 3, 5, 5, null, null)
but do not exist inarray(1, 1, 2, 2, 3, 3)
and return the elements as a new array without duplicates. Sample statement:-- [5,null] is returned. select array_except(array(1, 1, 3, 3, 5, 5, null, null), array(1, 1, 2, 2, 3, 3));
- Example 3: One of the input arrays is empty. Sample statement:
-- [2,1] is returned. select array_except(array(2, 1, 1, 2), cast(array() as array<int>));
- Example 4: The two input arrays are empty. Sample statement:
-- [] is returned. select array_except(cast(array() as array<int>), cast(array() as array<int>));
- Example 1: Find the elements that exist in
ARRAY_INTERSECT
- Syntax
array<T> array_intersect(array<T> <a>, array<T> <b>)
- Description
Calculates the intersection of Array a and Array b and removes duplicate elements.
- Parameters
a and b: required. These parameters specify arrays that
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. - Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- If an element in an array is null, the null element is involved in the operation.
- The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
- If Array a or Array b is null, NULL is returned.
- Examples
- Example 1: Calculate the intersection of
array(1, 2, 3)
andarray(1, 3, 5)
, and remove duplicate elements. Sample statement:-- [1,3] is returned. select array_intersect(array(1, 2, 3), array(1, 3, 5));
- Example 2: Calculate the intersection of
array(10, 20, 20, 30, 30, null, null)
andarray(30, 30, 20, 20, 40, null, null)
, and remove duplicate elements. Sample statement:-- [20,30,null] is returned. select array_intersect(array(10, 20, 20, 30, 30, null, null), array(30, 30, 20, 20, 40, null, null));
- Example 1: Calculate the intersection of
ARRAY_JOIN
- Syntax
array_join(array<T> <a>, <delimiter>[, <nullreplacement>])
- Description
Concatenates the elements in Array a by using a delimiter. If the array contains a null element, use nullreplacement to specify the string that you want to use to replace the null element in the results. If you do not configure nullreplacement, the null element is ignored.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array.Note If the elements in the array are not of the STRING type, MaxCompute converts the data type of the elements into STRING. - delimiter: required. A value of the STRING type. This parameter specifies the string that is used to separate the concatenated elements in Array a.
- nullreplacement: optional. This parameter specifies the string that is used to replace null elements.
- a: required. This parameter specifies an array that
- Return value
A value of the STRING type is returned.
- Examples
-- 10,20,20,30 is returned. select array_join(array(10, 20, 20, null, null, 30), ","); -- 10##20##20##null##null##30 is returned. select array_join(array(10, 20, 20, null, null, 30), "##", "null");
ARRAY_MAX
- Syntax
T array_max(array<T> <a>)
- Description
Returns the largest element in Array a.
- Parameters
a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array.The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- Return value
The largest element in Array a is returned. The return value varies based on the following rules:
- If Array a is null, NULL is returned.
- If an element in Array a is null, the null element is not involved in the operation.
- Examples
-- 20 is returned. select array_max(array(1, 20, null, 3));
ARRAY_MIN
- Syntax
T array_min(array<T> <a>)
- Description
Returns the smallest element in Array a.
- Parameters
a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array.The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- Return value
The smallest element in Array a is returned. The return value varies based on the following rules:
- If Array a is null, NULL is returned.
- If an element in Array a is null, the null element is not involved in the operation.
- Examples
-- 1 is returned. select array_min(array(1, 20, null, 3));
ARRAY_POSITION
- Syntax
bigint array_position(array<T> <a>, T <element>)
- Description
Returns the position of the first occurrence of a given element in Array a. The position numbers of elements are counted from left to right and start from 1.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- element: required. The element whose position you want to query. The data type of this parameter must be the same as the data type of the elements in Array a.
- a: required. This parameter specifies an array that
- Return value
A value of the BIGINT type is returned. The return value varies based on the following rules:
- If Array a or element is null, NULL is returned.
- If the specific element is not found, 0 is returned.
- Examples
- Example 1: Return the position of the first occurrence of
1
inarray(3, 2, 1)
. Sample statement:-- 3 is returned. select array_position(array(3, 2, 1), 1);
- Example 2: element is null. Sample statement:
-- NULL is returned. select array_position(array(3, 1, null), null);
- Example 1: Return the position of the first occurrence of
ARRAY_REMOVE
- Syntax
array<T> array_remove(array<T> <a>, T <element>)
- Description
Removes a given element from Array a.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- element: required. The element that you want to remove. The data type of this parameter must be the same as the data type of the elements in Array a.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- If an element in Array a is null, the null element is not involved in the operation.
- If Array a or element is null, NULL is returned.
- If Array a does not contain the specific element, Array a is returned.
- Examples
- Example 1: Remove
1
fromarray(3, 2, 1)
. Sample statement:-- [3,2] is returned. select array_remove(array(3, 2, 1), 1);
- Example 2: element is null. Sample statement:
-- NULL is returned. select array_remove(array(3, 1, null), null);
- Example 3: Remove
2
fromarray(3, 1, null)
. Sample statement:-- [3,1,null] is returned. select array_remove(array(3, 1, null), 2);
- Example 1: Remove
ARRAY_REDUCE
- Syntax
R array_reduce(array<T> <a>, buf <init>, function<buf, T, buf> <merge>, function<buf, R> <final>)
- Description
Aggregates the elements in Array a.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - init: required. The initial value of the intermediate result that is used to aggregate elements.
- merge: required. A built-in function, user-defined function, or expression that is used to perform an operation on each element in Array a and the intermediate result. This function or expression uses the elements of Array a and the init parameter as input parameters.
- final: required. A built-in function, user-defined function, or expression that is used to convert the intermediate result into the final result. This function or expression uses the result of merge as the input parameter. R specifies the data type of the output.
- a: required. This parameter specifies an array that
- Return value
The data type of the return value is the same as the data type that is specified for final.
- Examples
-- 6 is returned. select array_reduce(array(1, 2, 3), 0, (buf, e)->buf + e, buf->buf); -- 2.5 is returned. select array_reduce(array(1, 2, 3, 4), named_struct('sum', 0, 'count', 0), (buf, e)->named_struct('sum', buf.sum + e, 'count', buf.count + 1), buf -> buf.sum / buf.count);
ARRAY_REPEAT
- Syntax
array<T> array_repeat(T <element>, int <count>)
- Description
Returns a new array in which element t is repeated count times.
- Parameters
- t: required. This parameter specifies the element that you want to repeat. The following
data types are supported:
- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- count: required. This parameter specifies the number of repetitions. A value of the INT type is required. The value must be greater than or equal to 0.
- t: required. This parameter specifies the element that you want to repeat. The following
data types are supported:
- Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- If the value of count is null, NULL is returned.
- If the value of count is less than 0, an empty array is returned.
- Examples
- Example 1: Repeat
123
twice
and return the new array. Sample statement:-- [123, 123] is returned. select array_repeat('123', 2);
- Example 2: The value of count is null. Sample statement:
-- NULL is returned. select array_repeat('123', null);
- Example 3: The value of count is less than 0. Sample statement:
-- [] is returned. select array_repeat('123', -1);
- Example 1: Repeat
ARRAY_SORT
- Syntax
array<T> array_sort(array<T> <a>, function<T, T, bigint> <comparator>)
- Description
Sorts the elements in Array a based on a comparator.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - comparator: required. A built-in function, user-defined function, or expression that is used
to compare two elements in the array.
Processing logic of
comparator(a, b)
: If a is equal to b, 0 is returned. If a is less than b, a negative integer is returned. If a is greater than b, a positive integer is returned. Ifcomparator(a, b)
returns NULL, an error is returned.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned.
- Examples
-- [{"a":1,"b":10},{"a":2,"b":12},{"a":3,"b":11}] is returned. select array_sort(a, (a,b)->case when a.a> b.a then 1L when a.a=b.a then 0L else -1L end) from values ( array(named_struct('a', 1, 'b', 10), named_struct('a', 3, 'b', 11), named_struct('a', 2, 'b', 12))) as t(a);
ARRAY_UNION
- Syntax
array<T> array_union(array<T> <a>, array<T> <b>)
- Description
Calculates the union of Array a and Array b, and removes duplicate elements.
- Parameters
a and b: required. These parameters specify arrays that
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- Return value
A value of the ARRAY type is returned. If Array a or Array b is null, NULL is returned.
- Examples
- Example 1: Calculate the union of
array(1, 2, 3)
andarray(1, 3, 5)
, and remove duplicate elements. Sample statement:-- [1,2,3,5] is returned. select array_union(array(1, 2, 3), array(1, 3, 5));
- Example 2: One of the arrays is null. Sample statement:
-- NULL is returned. select array_union(array(1, 2, 3), null);
- Example 1: Calculate the union of
ARRAYS_OVERLAP
- Syntax
boolean arrays_overlap(array<T> <a>, array<T> <b>)
- Description
Checks whether Array a and Array b contain the same element.
- Parameters
a and b: required. These parameters specify arrays that
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
- If Array a contains at least one element that is in Array b and is not null, True is returned.
- If Array a and Array b do not contain the same element, both of the arrays are not empty, and one or both of the arrays contain a null element, NULL is returned.
- If Array a and Array b do not contain the same element, and both of the arrays are not empty and do not contain a null element, False is returned.
- Examples
- Example 1: Check whether
array(1, 2, 3)
andarray(3, 4, 5)
contain the same element. Sample statement:-- True is returned. select arrays_overlap(array(1, 2, 3), array(3, 4, 5));
- Example 2: Check whether
array(1, 2, 3)
andarray(6, 4, 5)
contain the same element. Sample statement:-- False is returned. select arrays_overlap(array(1, 2, 3), array(6, 4, 5));
- Example 3: One of the arrays contains a null element. Sample statement:
-- NULL is returned. select arrays_overlap(array(1, 2, 3), array(5, 4, null));
- Example 1: Check whether
ARRAYS_ZIP
- Syntax
array<struct<T, U, ...>> arrays_zip(array<T> <a>, array<U> <b>[, ...])
- Description
Merges multiple given arrays and returns a struct array, in which the Nth struct contains all the Nth elements of the input arrays.
- Parameters
a and b: required. These parameters specify arrays that
T
inarray<T>
andU
inarray<U>
specify the data types of the elements in the arrays. The elements can be of any data type.The following data types are supported:- TINYINT, SMALLINT, INT, and BIGINT
- FLOAT and DOUBLE
- BOOLEAN
- DECIMAL and DECIMALVAL
- DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
- STRING, BINARY, VARCHAR, and CHAR
- ARRAY, STRUCT, and MAP
- Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- The Nth struct in the generated struct array contains all the Nth elements of the input arrays. If an array contains less than N elements, null is used as the Nth element of the array.
- If one or more input arrays are null, NULL is returned.
- Examples
- Example 1: Merge
array(1, 2, 3)
andarray(2, 3, 4)
into a struct array. Sample statement:-- [{0:1, 1:2}, {0:2, 1:3}, {0:3, 1:4}] is returned. select arrays_zip(array(1, 2, 3), array(2, 3, 4));
- Example 2: Merge
array(1, 2, 3)
andarray(4, 5)
into a struct array. Sample statement:-- [{0:1, 1:4}, {0:2, 1:5}, {0:3, 1:null}] is returned. select arrays_zip(array(1, 2, 3), array(4, 5));
- Example 1: Merge
CONCAT
- Syntax
array<T> concat(array<T> <a>, array<T> <b>[,...]) string concat(string <str1>, string <str2>[,...])
- Description
- Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.
- Strings as inputs: Concatenates multiple strings and returns a new string.
- Parameters
- a and b: required. These parameters specify arrays that
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation. - str1 and str2: required. A value of the STRING type. If the input value is of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, it is implicitly converted into a value of the STRING type before calculation. If the input value is of another data type, an error is returned.
- a and b: required. These parameters specify arrays that
- Return value
- A value of the ARRAY type is returned. If one of the input arrays is null, NULL is returned.
- A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, NULL is returned.
- Examples
- Example 1: Concatenate all elements of
array(10, 20)
andarray(20, -20)
. Sample statement:-- [10, 20, 20, -20] is returned. select concat(array(10, 20), array(20, -20));
- Example 2: One of the input arrays contains a null element. Sample statement:
-- [10, null, 20, -20] is returned. select concat(array(10, null), array(20, -20));
- Example 3: One of the input arrays is null. Sample statement:
-- NULL is returned. select concat(array(10, 20), null);
- Example 4: Concatenate strings
aabc
andabcde
. Sample statement:-- aabcabcde is returned. select concat('aabc','abcde');
- Example 5: The input is empty. Sample statement:
-- NULL is returned. select concat();
- Example 6: One of the input strings is null. Sample statement:
-- NULL is returned. select concat('aabc', 'abcde', null);
- Example 1: Concatenate all elements of
EXPLODE
- Limits
- A
SELECT
statement can contain only oneEXPLODE
function, and no other columns of a table are allowed. - This function cannot be used with the
GROUP BY
,CLUSTER BY
,DISTRIBUTE BY
, orSORT BY
clause.
- A
- Syntax
explode (<var>)
- Description
Transposes one row of data into multiple rows. This function is a UDTF.
- If the parameter value is of the
array<T>
type, the array stored in the column is transposed into multiple rows. - If the parameter value is of the
map<K, V>
type, each key-value pair of the map stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
- If the parameter value is of the
- Parameters
var: required. The value must be of the
array<T>
ormap<K, V>
type. - Return value
Rows after transposition are returned.
- Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
fields. Data in the table:
Sample statement:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
select explode(t_map) from t_table_map; -- The following result is returned: +-----+------------+ | key | value | +-----+------------+ | k11 | 86 | | k21 | 15 | | k12 | 97 | | k22 | 2 | | k13 | 99 | | k23 | 1 | +-----+------------+
FILTER
- Syntax
array<T> filter(array<T> <a>, function<T,boolean> <func>)
- Description
Filters the elements in Array a by using func and returns a new array.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in Array a. The value must be of the same data type as the elements in Array a. The output result of the function or expression is of the BOOLEAN type.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned.
- Examples
-- [2, 3] is returned. select filter(array(1, 2, 3), x -> x > 1);
INDEX
- Syntax
index(<var1>[<var2>])
- Description
- If var1 is of the
map<K, V>
type, this function obtains the element that is at position var2 in var1. The position numbers of elements are counted from left to right and start from 0. - If var1 is of the
map<K, V>
type, this function obtains the value whose key is var2 in var1.
Note When you use this function, you must removeindex
and directly execute<var1>[<var2>]
. Otherwise, an error is returned. - If var1 is of the
- Parameters
- var1: required. The value must be of the
array<T>
ormap<K, V>
type.T
inarray<T>
specifies the data type of the elements in an array. The elements can be of any data type.K
andV
inmap<K, V>
specify the keys and values of a map. - var2: required.
- If var1 is of the
array<T>
type, var2 must be of the BIGINT type and greater than or equal to 0. - If var1 is of the
map<K, V>
type, var2 must be of the same data type as K.
- If var1 is of the
- var1: required. The value must be of the
- Return value
- If var1 is of the
array<T>
type, a value of the data type that is specified by T is returned. The return value varies based on the following rules:- If the number of elements in var1 is less than var2, NULL is returned.
- If var1 is null, NULL is returned.
- If var1 is of the
map<K, V>
type, a value that is of the same data type as V is returned. The return value varies based on the following rules:- If
map<K, V>
does not contain a key whose value is var2, NULL is returned. - If var1 is null, NULL is returned.
- If
- If var1 is of the
- Examples
- Example 1: var1 is of the
array<T>
type. Sample statement:-- c is returned. select array('a','b','c')[2];
- Example 2: var1 is of the
map<K, V>
type. Sample statement:-- 1 is returned. select str_to_map("test1=1,test2=2")["test1"];
- Example 1: var1 is of the
POSEXPLODE
- Syntax
posexplode(array<T> <a>)
- Description
Converts Array a into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.
- Parameters
a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - Return value
A table is returned.
- Examples
select posexplode(array('a','c','f','b')); -- The following result is returned: +------------+------------+ | pos | val | +------------+------------+ | 0 | a | | 1 | c | | 2 | f | | 3 | b | +------------+------------+
SIZE
- Syntax
int size(array<T> <a>) int size(map<K, V> <b> )
- Description
- An array as the input: Calculates the number of elements in Array a.
- A map as the input: Calculates the number of key-value pairs in Map b.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - b: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.
- a: required. This parameter specifies an array that
- Return value
A value of the INT type is returned.
- Examples
- Example 1: Calculate the number of elements in
array('a','b')
. Sample statement:-- 2 is returned. select size(array('a','b'));
- Example 2: Calculate the number of key-value pairs in
map('a',123,'b',456)
.-- 2 is returned. select size(map('a',123,'b',456));
- Example 1: Calculate the number of elements in
SLICE
- Syntax
array<T> slice(array<T> <a>, <start>, <length>)
- Description
Copies the elements in an array from the start position based on a specific length and returns the elements as a new array.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - start: required. This parameter specifies the position at which the function starts to copy elements from left to right. The minimum positive value of this parameter is 1. You can also set the start parameter to a negative value. In this case, the start position is counted from the end of the array, but the elements are still copied from left to right.
- length: required. The number of elements in the returned array. The value must be greater than or equal to 0. If the value is greater than the length of the input array, this function returns a new array that consists of the elements from the start position to the end of the input array.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned.
- Examples
- Example 1: Copy the elements in
array(10, 20, 20, null, null, 30)
from position1
based on a length of3
. Sample statement:-- [10, 20, 20] is returned. select slice(array(10, 20, 20, null, null, 30), 1, 3);
- Example 2: Copy the elements in
array(10, 20, 20, null, null, 30)
from position-2
based on a length of2
. Sample statement:-- [null, 30] is returned. select slice(array(10, 20, 20, null, null, 30), -2, 2);
- Example 3: Copy the elements in
array(10, 20, 20, null, null, 30)
from position3
based on a length of10
. Sample statement:-- [20, null, null, 30] is returned. select slice(array(10, 20, 20, null, null, 30), 3, 10);
- Example 4: Copy the elements in
array(10, 20, 20, null, null, 30)
from position3
based on a length of0
. Sample statement:-- [] is returned. select slice(array(10, 20, 20, null, null, 30), 3, 0);
- Example 1: Copy the elements in
SORT_ARRAY
- Syntax
array<T> sort_array(array<T> <a>[, <isasc>])
- Description
Sorts the elements in an array.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - isasc: optional. This parameter specifies whether to sort elements in the array in ascending or descending order. Valid values: true and false. The value true indicates the ascending order and false indicates the descending order. If you do not specify this parameter, the elements are sorted in ascending order.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned. NULL is interpreted as the minimum value.
- Examples
- Example 1: The
t_array
table contains thec1 ARRAY<STRING>, c2 ARRAY<INT>, and c3 ARRAY<STRING>
fields. Data in the table:
Sort data in each column of the table. Sample statement:+------------+---------+--------------+ | c1 | c2 | c3 | +------------+---------+--------------+ [a, c, f, b] [4, 5, 7, 2, 5, 8] [You, Me, Him] +------------+---------+--------------+
[a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, You, Me] is returned. select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
- Example 2: Sort the elements in
array(10, 20, 40, 30, 30, null, 50)
in descending order. Sample statement:-- [50, 40, 30, 30, 20, 10, null] is returned. select sort_array(array(10, 20, 40, 30, 30, null, 50), false);
- Example 1: The
TRANSFORM
- Syntax
array<R> transform(array<T> <a>, function<T, R> <func>)
- Description
Transforms the elements in Array a by using func and returns a new array.
- Parameters
- a: required. This parameter specifies an array that
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type. - func: required. This parameter specifies the built-in function, user-defined function,
or expression that is used to transform the elements in Array a. The value must be of the same data type as the elements in Array a.
R
specifies the data type of the output results.
- a: required. This parameter specifies an array that
- Return value
A value of the ARRAY type is returned.
- Examples
-- [2, 3, 4] is returned. select transform(array(1, 2, 3), x -> x + 1);
ZIP_WITH
- Syntax
array<R> array_sort(array<T> <a>, array<S> <b>, function<T, S, R> <combiner>)
- Description
Merges Array a and Array b at the element level based on element positions and a combiner and returns a new array.
- Parameters
- a and b: required. These parameters specify arrays that
T
inarray<T>
andS
inarray<S>
specify the data types of the elements in the arrays. The elements can be of any data type. - combiner: required. This parameter specifies the built-in function, user-defined function, or expression that is used to merge Array a and Array b at the element level. The combiner consists of two input parameters. One input parameter must be of the same data type as the elements in Array a. The other input parameter must be of the same data type as the elements in Array b.
- a and b: required. These parameters specify arrays that
- Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
- The elements in the returned array are at the same positions as the elements in Array a and Array b.
- If Array a and Array b have different lengths, the null elements are added to the shorter array before the arrays are merged.
- Examples
-- [2, 4, 6, null] is returned. select zip_with(array(1,2,3), array(1,2,3,4), (x,y) -> x + y);
MAP
- Syntax
map(K, V) map(K <key1>, V <value1>, K <key2>, V <value2>[, ...])
- Description
Creates a map based on given key-value pairs.
- Parameters
- key: required. All keys must be of the same data type after implicit conversions. Only basic data types are supported.
- value: required. All values must be of the same data type after implicit conversions. All data types are supported.
- Return value
Data of the MAP type is returned.Note You can specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. Valid values:- exception: An error is returned.
- last_win: The latter key overwrites the former key.
- Examples
- Example 1: No duplicate keys exist. The
t_table
table contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
fields. Data in the table:
Sample statement:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+
-- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. select map(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | {k11:86, k21:15} | | {k12:97, k22:2} | | {k13:99, k23:1} | +------+
- Example 2: Duplicate keys exist. The
t_table
table contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
fields. Data in the table:
Sample statement:1000,'k11','k11',86,15 1001,'k12','k22',97,2 1002,'k13','k23',99,1 1003,'k13','k24',100,1 1004,'k12','k25',95,1
-- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. select map(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | {'k11':15} | | {'k12':97, 'k22':2} | | {'k13':99, 'k23':1} | | {'k13':100, 'k24':1} | | {'k12':95, 'k25':1} | +------+
- Example 1: No duplicate keys exist. The
MAP_CONCAT
- Syntax
map<K, V> map_concat([string <mapDupKeyPolicy>,] map<K, V> <a>, map<K, V> <b>[,...])
- Description
Calculates the union of multiple maps.
- Parameters
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- exception: An error is returned.
- last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value ofodps.sql.map.key.dedup.policy
prevails. - a and b: required. These parameters specify maps. The keys of the maps must be of the same
data type, and the values of the maps must be of the same data type.
K
andV
inmap<K, V>
specify the keys and values of a map.
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- Return value
Data of the MAP type is returned. An error is returned in the following scenarios:
- A map is null or the key of a map is null.
- The data types of multiple maps are different.
- Examples
-- {1:a, 2:b, 3:c} is returned. select map_concat(map(1, 'a', 2, 'b'), map(3, 'c')); -- {1:a, 2:d, 3:c} is returned. select map_concat('last_win', map(1, 'a', 2, 'b'), map(3, 'c'), map(2, 'd'));
MAP_ENTRIES
- Syntax
array<struct<K, V>> map_entries(map<K, V> <a>):
- Description
Converts key-value pairs in Map a into a struct array.
- Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map. - Return value
A struct array is returned. If the input is null, NULL is returned.
- Examples
-- [{key:1, value:a}, {key:2, value:b}] is returned. select map_entries(map(1, 'a', 2, 'b'));
MAP_FILTER
- Syntax
map<K, V> map_filter(map<K, V> <input>, function <K, V, boolean> <predicate>)
- Description
Filters the elements in Map input and retains only the elements that meet the predicate condition.
- Parameters
- input: required. A value of the MAP type.
K
andV
inmap<K, V>
specify the keys and values of a map. - predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in the map. The predicate condition consists of two input parameters that correspond to the keys and values in input. The output result is of the BOOLEAN type.
- input: required. A value of the MAP type.
- Return value
Data of the MAP type is returned.
- Examples
-- {-30:100, 20:50} is returned. select map_filter(map(10, -20, 20, 50, -30, 100, 21, null), (k, v) -> (k+v) > 10);
MAP_FROM_ARRAYS
- Syntax
map<K, V> map_from_arrays([string <mapDupKeyPolicy>,] array<K> <a>, array<V> <b>))
- Description
Creates a map based on Array a and Array b.
- Parameters
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- exception: An error is returned.
- last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value ofodps.sql.map.key.dedup.policy
prevails. - a: required. This parameter specifies an array. This parameter corresponds to the key
in the generated map.
K
inarray<K>
specifies the data type of the elements in the array. The elements can be of any data type. - b: required. This parameter specifies an array. This parameter corresponds to the value
in the generated map.
V
inarray<V>
specifies the data type of the elements in the array. The elements can be of any data type.
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- Return value
Data of the MAP type is returned. The return value varies based on the following rules:
- If Array a or Array b is null, NULL is returned.
- If Array a contains a null element or the two arrays are of different lengths, an error is returned.
- Examples
-- {1:2, 3:4} is returned. select map_from_arrays(array(1.0, 3.0), array('2', '4')); -- {1:2, 3:6} is returned. select map_from_arrays('last_win', array(1.0, 3.0, 3), array('2', '4', '6'));
MAP_FROM_ENTRIES
- Syntax
map<K, V> map_from_entries([string <mapDupKeyPolicy>,] array <struct<K, V> , struct<K, V>[,...]>)
- Description
Creates a map based on given struct arrays.
- Parameters
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- exception: An error is returned.
- last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value ofodps.sql.map.key.dedup.policy
prevails. - Data of the STRUCT type is required. K corresponds to the keys in the generated map. V corresponds to the values in the generated map.
K
andV
instruct<K, V>
specify the keys and values of a struct array.
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- Return value
Data of the MAP type is returned. The return value varies based on the following rules:
- If a struct array is null, NULL is returned.
- If the number of fields in a struct array is not 2 or the key of a struct array is null, an error is returned.
- Examples
-- {1:a, 2:b} is returned. select map_from_entries(array(struct(1, 'a'), struct(2, 'b'))); -- {1:a, 2:c} is returned. select map_from_entries(array(struct(1, 'a'), struct(2, 'b'), struct(2, 'c')));
MAP_KEYS
- Syntax
array<K> map_keys(map<K, V> <a>)
- Description
Returns all keys in Map a as an array.
- Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map. - Return value
A value of the ARRAY type is returned. If the input map is null, NULL is returned.
- Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
fields. Data in the table:
Sample statement:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
-- Return keys in the t_map column as an array. select c1, map_keys(t_map) from t_table_map; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | [k11, k21] | | 1001 | [k12, k22] | | 1002 | [k13, k23] | +------------+------+
MAP_VALUES
- Syntax
array<V> map_values(map<K, V> <a>)
- Description
Returns all values in Map a as an array.
- Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map. - Return value
A value of the ARRAY type is returned. If the input map is null, NULL is returned.
- Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
fields. Data in the table:
Sample statement:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
-- Return keys in the t_map column as an array. select c1,map_values(t_map) from t_table_map; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | [86, 15] | | 1001 | [97, 2] | | 1002 | [99, 1] | +------------+------+
MAP_ZIP_WITH
- Syntax
<K, V1, V2, V3> map<K, V3> map_zip_wuth(map<K, V1> <input1>, map<K, V2> <input2>, function<K, V1, V2, V3> <func>)
- Description
Merges Map input1 and Map input2 into a single map. The keys of the new map are the union of the keys of the two input maps. The value of each key of the new map is calculated by using func.
- Parameters
- input1 and input2: required. These parameters specify maps.
K
andV
inmap<K, V>
specify the keys and values of a map. - func: required. func consists of three input parameters, which correspond to a key, the value that corresponds to the key in input1, and the value that corresponds to the key in input2. If the key does not exist in input1 or input2, null is used to replace the value that corresponds to the key in func.
- input1 and input2: required. These parameters specify maps.
- Return value
Data of the data type defined by func is returned.
- Examples
-- {1:[1, 1, 4], 2:[2, 2, 5], 3:[3, null, null], 4:[4, null, 7]} is returned. select map_zip_with(map(1, 1, 2, 2, 3, null), map(1, 4, 2, 5, 4, 7), (k, v1, v2) -> array(k, v1, v2));
TRANSFORM_KEYS
- Syntax
map<K2, V> transform_keys([string <mapDupKeyPolicy>,] map<K1, V> <input>, function<K1, V, K2> <func>)
- Description
Transforms the keys in Map input by using func. The values in the map are not changed.
- Parameters
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- exception: An error is returned.
- last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.Note The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value ofodps.sql.map.key.dedup.policy
prevails. - input: required. This parameter specifies a map.
K1
andV
inmap<K1, V>
specify the keys and values of a map. - func: required. This parameter specifies the built-in function, user-defined function,
or expression that is used to transform the keys. The function or expression consists
of two input parameters that correspond to the keys and values in input.
K2
specifies the data type of keys in the returned map.
- mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is
used to process duplicate keys. Valid values:
- Return value
Data of the MAP type is returned. If one of the new keys is null, an error is returned.
- Examples
-- {-10:-20, 70:50, 71:101} is returned. select transform_keys(map(10, -20, 20, 50, -30, 101), (k, v) -> k + v); -- No error is returned. The returned result depends on the order of the elements in the input map. select transform_keys("last_win", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v); -- An error is returned because duplicate keys exist. select transform_keys("exception", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);
TRANSFORM_VALUES
- Syntax
map<K, V2> transform_values(map<K, V1> <input>, function<K, V1, V2> <func>)
- Description
Transforms the values in Map input by using func. The keys in the map are not changed.
- Parameters
- input: required. This parameter specifies a map.
K
andV1
inmap<K, V1>
specify the keys and values of the map. - func: required. This parameter specifies the built-in function, user-defined function,
or expression that is used to transform the keys. The function or expression consists
of two input parameters that correspond to the keys and values in input.
V2
specifies the data type of values in the returned map.
- input: required. This parameter specifies a map.
- Return value
Data of the MAP type is returned.
- Examples
-- {-30:71, 10:-10, 20:null} is returned. select transform_values(map(10, -20, 20, null, -30, 101), (k, v) -> k + v);
FIELD
- Syntax
T field(struct <s>, string <fieldName>)
- Description
Obtains the value of a member variable in a struct.
- Parameters
- s: required. This parameter specifies a struct. The struct is in the format of
{f1:T1, f2:T2[, ...]}
.f1
andf2
specify member variables,T1
specifies the value off1
, andT2
specifies the value off2
. - fieldName: required. A value of the STRING type. The value must be one of the member variables in the struct.
- s: required. This parameter specifies a struct. The struct is in the format of
- Return value
The value of the specific member variable in the struct is returned.
- Examples
-- hello is returned. select field(named_struct('f1', 'hello', 'f2', 3), 'f1');
INLINE
- Syntax
inline(array<struct<f1:T1, f2:T2[, ...]>>)
- Description
Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
- Parameters
f1:T1 and f2:T2: required. A value of any data type.
f1
andf2
specify member variables,T1
specifies the value off1
, andT2
specifies the value off2
. - Return value
The expanded data of the struct array is returned.
- Examples
The
t_table
table contains thet_struct (STRUCT<user_id:BIGINT,user_name:STRING,married:STRING,weight:DOUBLE>)
field. Data in the table:
Sample statement:+----------+ | t_struct | +----------+ | {user_id:10001, user_name:LiLei, married:N, weight:63.5} | | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} | +----------+
-- Expand the t_struct column. select inline(array(t_struct)) from t_table; -- The following result is returned: +------------+-----------+---------+------------+ | user_id | user_name | married | weight | +------------+-----------+---------+------------+ | 10001 | LiLei | N | 63.5 | | 10002 | HanMeiMei | Y | 43.5 | +------------+-----------+---------+------------+
STRUCT
- Syntax
struct struct(<value1>,<value2>[, ...])
- Description
Creates a struct based on a given value list.
- Parameters
value: required. Data in the array can be of any data type.
- Return value
A value of the STRUCT type is returned. Column names are sequentially named as
col1, col2, ...
- Examples
-- {col1:a, col2:123, col3:true, col4:56.9} is returned. select struct('a',123,'true',56.90);
NAMED_STRUCT
- Syntax
struct named_struct(string <name1>, T1 <value1>, string <name2>, T2 <value2>[, ...])
- Description
Creates a struct based on given name-value pairs.
- Parameters
- value: required. All data types are supported.
- name: required. The column name of the STRING type. This parameter is a constant.
- Return value
A value of the STRUCT type is returned. Column names are sequentially named as
name1, name2, ...
- Examples
-- {user_id:10001, user_name:LiLei, married:F, weight:63.5} is returned. select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);
FROM_JSON
- Syntax
from_json(<jsonStr>, <schema>)
- Description
Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.
- Parameters
- jsonStr: required. The JSON string that you entered.
- schema: required. The schema of the JSON string. The value of this parameter must be in
the same format as that in the statement for creating a table, such as
array<bigint>
,map<string, array<string>>
, orstruct<a:int, b:double, `C`:map<string,string>>
.Note Keys in a struct are case-sensitive. You can also specify a struct in the format ofa BIGINT, b DOUBLE
, which is equivalent toSTRUCT<a:BIGINT, b:DOUBLE>
.The following table describes the mappings between JSON data types and MaxCompute data types.
JSON data type MaxCompute data type OBJECT STRUCT, MAP, and STRING ARRAY ARRAY and STRING NUMBER TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING BOOLEAN BOOLEAN and STRING STRING STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME NULL All types Note The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.
- Return value
A value of the ARRAY, MAP, or STRUCT type is returned.
- Examples
- Example 1: Convert a specific JSON string into a value of a specific data type. Sample
statements:
-- {"a":1,"b":0.8} is returned. select from_json('{"a":1, "b":0.8}', 'a int, b double'); -- {"time":"26/08/2015"} is returned. select from_json('{"time":"26/08/2015"}', 'time string'); -- {"a":1,"b":0.8} is returned. select from_json('{"a":1, "b":0.8}', 'a int, b double, c string'); -- [1,2,3] is returned. select from_json('[1, 2, 3, "a"]', 'array<bigint>'); -- {"d":"v","a":"1","b":"[1,2,3]","c":"{}"} is returned. select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
- Example 2: Use the
map_keys
andfrom_json
functions to obtain all keys in the JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:-- ["a","b"] is returned. select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));
- Example 1: Convert a specific JSON string into a value of a specific data type. Sample
statements:
GET_JSON_OBJECT
- Syntax
string get_json_object(string <json>, string <path>)
- Description
Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the
GET_JSON_OBJECT
function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs. - Parameters
- json: required. A value of the STRING type. This parameter specifies a standard JSON object
in the format of
{Key:Value, Key:Value,...}
. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction. - path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with
$
. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Different characters have the following meanings:$
: indicates the root node..
or['']
: indicates a child node. MaxCompute parses JSON objects by using.
or['']
. If a key in a JSON object contains a period (.), [''] can be used.[]
([number]
): indicates an array subscript, which starts from 0.*
: indicates the wildcard for[]
, which returns an entire array. An asterisk (*
) cannot be escaped.
- json: required. A value of the STRING type. This parameter specifies a standard JSON object
in the format of
- Return value
- If the json parameter is empty or invalid, NULL is returned.
- If the format of json is valid and path exists, the related string is returned.
- To determine how this function returns a value, you can specify
odps.sql.udf.getjsonobj.new
for a session.- If you run the
set odps.sql.udf.getjsonobj.new=true;
command, this function retains the original strings when it returns a value.We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a MaxCompute project has jobs in which this function escapes JSON reserved characters, we recommend that you retain the original escaping operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:- The return value is still a JSON string, which can be parsed as JSON data. You do
not need to use the
REPLACE
orREGEXP_REPLACE
function to replace backslashes (\). - Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can
be parsed.
-- 1 is returned. select get_json_object('{"a":"1","a":"2"}', '$.a');
- The encoded strings that correspond to emojis are supported. However, DataWorks does
not allow you to enter emojis. DataWorks allows you to enter only the encoded strings
that correspond to emojis to MaxCompute by using a tool, such as Data Integration.
DataWorks uses the
GET_JSON_OBJECT
function to process the data.-- An emoji is returned. select get_json_object('{"a":"<Emoji>"}', '$.a');
- The output results are displayed in alphabetical order.
-- {"b":"1","a":"2"} is returned. select get_json_object('{"b":"1","a":"2"}', '$');
- The return value is still a JSON string, which can be parsed as JSON data. You do
not need to use the
- If you run the
set odps.sql.udf.getjsonobj.new=false;
command, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:- JSON reserved characters, such as line feeds (\n) and double quotation marks (") are
displayed as
'\n'
and'\"'
. - Each key in a JSON object must be unique. If duplicate keys exist, the data may fail
to be parsed.
-- NULL is returned. select get_json_object('{"a":"1","a":"2"}', '$.a');
- The encoded strings that correspond to emojis cannot be parsed.
-- NULL is returned. select get_json_object('{"a":"<Emoji>"}', '$.a');
- The output results are displayed in alphabetical order.
-- {"a":"2","b":"1"} is returned. select get_json_object('{"b":"1","a":"2"}', '$');
- JSON reserved characters, such as line feeds (\n) and double quotation marks (") are
displayed as
Note For MaxCompute projects that are created on or after January 21, 2021, theGET_JSON_OBJECT
function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, theGET_JSON_OBJECT
function escapes JSON reserved characters when it returns a value. The following example helps you determine how theGET_JSON_OBJECT
function returns a value in a MaxCompute project.select get_json_object('{"a":"[\\"1\\"]"}', '$.a'); -- Return JSON reserved characters by using escape characters. [\"1\"] -- Return the original strings. ["1"]
You can submit a ticket to request the MaxCompute technical support team to enable the
GET_JSON_OBJECT
function. This way, you do not need to frequently specify flagodps.sql.udf.getjsonobj.new for a session. This function retains the original strings when it returns a value. - If you run the
- Examples
- Example 1: Extract information from the JSON object
src_json.json
. Sample statement:-- The JSON string src_json.json contains the following content: +----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } -- Extract the information of the owner field and return amy. select get_json_object(src_json.json, '$.owner') from src_json; -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. select get_json_object(src_json.json, '$.store.fruit[0]') from src_json; -- Extract the information of the non-existent field and return NULL. select get_json_object(src_json.json, '$.non_exist_key') from src_json;
- Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:
-- 2222 is returned. select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]'); -- ["h0","h1","h2"] is returned. set odps.sql.udf.getjsonobj.new=true; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); -- ["h0","h1","h2"] is returned. set odps.sql.udf.getjsonobj.new=false; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh'); -- h1 is returned. select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
- Example 3: Extract information from a JSON object that contains a period
(.)
. Sample statement:-- Create a table. create table mf_json (id string, json string); -- Insert data into the table. The key in the data contains a period (
.
). insert into table mf_json (id, json) values ("1", "{ \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Insert data into the table. The key in the data does not contain a period (.
). insert into table mf_json (id, json) values ("2", "{ \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only['']
can be used to specify the key because the key contains a period(.)
. This way, MaxCompute can parse the key. select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1; -- -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2; select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2; - Example 4: The json parameter is empty or invalid. Sample statement:
-- NULL is returned. select get_json_object('','$.array[1][1]'); -- NULL is returned. select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
- Example 5: Escape a JSON string. Sample statement:
set odps.sql.udf.getjsonobj.new=true; -- "1" is returned. select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); -- '1' is returned. select get_json_object('{"a":"\'1\'","b":"2"}', '$.a');
- Example 1: Extract information from the JSON object
JSON_TUPLE
- Syntax
string json_tuple(string <json>, string <key1>, string <key2>,...)
- Description
Extracts strings from a standard JSON string based on a set of input keys, such as
(key1,key2,...)
. - Parameters
- json: required. A value of the STRING type. This parameter specifies a standard JSON string.
- key: required. A value of the STRING type. This parameter is used to describe the
path
of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using.
or['']
. If a key in a JSON object contains a period (.), [''] can be used.
- Return value
A value of the STRING type is returned.
Note- If json is empty or invalid, NULL is returned.
- If key is empty, invalid, or does not exist in the JSON string, NULL is returned.
- If json is valid and key exists, the related string is returned.
- This function can parse JSON data that contains Chinese characters.
- This function can parse nested JSON data.
- This function can parse JSON data that contains nested arrays.
- The parsing action is equivalent to the execution of GET_JSON_OBJECT along with
set odps.sql.udf.getjsonobj.new=true;
. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT. - JSON_TUPLE is a UDTF and is used with Lateral View when other columns need to be selected.
TO_JSON
- Syntax
to_json(<expr>)
- Description
Converts an expression of a given complex data type into a JSON string.
- Parameters
expr: required. The expression of the ARRAY, MAP, or STRUCT type.Note If the input expression is of the STRUCT type (
struct<key1:value1, key2:value2>
), take note of the following points:- All keys are converted into lowercase letters when you convert the expression into a JSON string.
- If a
value
is null, the key-value pair to which thevalue
belongs is not included in the JSON string that is returned. For example, ifvalue2
is null,key2:value2
is not included in the JSON string that is returned.
- Examples
- Example 1: Convert an expression of a given data type into a JSON string. Sample statement:
-- {"a":1,"b":2} is returned. select to_json(named_struct('a', 1, 'b', 2)); -- {"time":"26/08/2015"} is returned. select to_json(named_struct('time', "26/08/2015")); -- [{"a":1,"b":2}] is returned. select to_json(array(named_struct('a', 1, 'b', 2))); -- {"a":{"b":1}} is returned. select to_json(map('a', named_struct('b', 1))); -- {"a":1} is returned. select to_json(map('a', 1)); -- [{"a":1}] is returned. select to_json(array((map('a', 1))));
- Example 2: The input expression is of the STRUCT type. Sample statement:
-- {"a":"B"} is returned. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. select to_json(named_struct("A", "B")); -- {"k2":"v2"} is returned. The key-value pair to which null belongs is not included in the JSON string that is returned. select to_json(named_struct("k1", cast(null as string), "k2", "v2"));
- Example 1: Convert an expression of a given data type into a JSON string. Sample statement: