You can use complex type functions in MaxCompute SQL to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON. This topic describes the syntax and parameters of complex type functions that are supported by MaxCompute SQL, and provides examples on how to use complex type functions. This topic guides you through data development by using complex type functions.
The following table describes the complex type functions that are supported by MaxCompute SQL. For more information about the limits on JSON functions, see Limits.
Function type | Function | Description |
ARRAY functions | Checks whether all elements in an array meet a specific condition. | |
Checks whether an element in an array meets a specific condition. | ||
Creates an array based on given values. | ||
Checks whether an array contains a given value. | ||
Removes duplicate elements from an array. | ||
Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates. | ||
Calculates the intersection of two arrays. | ||
Concatenates the elements in an array by using a delimiter. | ||
Returns the largest element in an array. | ||
Returns the smallest element in an array. | ||
Returns an array whose elements are normalized based on the specified p norm. | ||
Returns the position of the first occurrence of a given element in an array. | ||
Aggregates the elements in an array. | ||
Removes a given element from an array. | ||
Returns a new array in which a given element is repeated several times. | ||
Sorts the elements in an array based on a comparator. | ||
Calculates the union of two arrays and returns the union as a new array without duplicates. | ||
Checks whether two arrays contain the same element. | ||
Merges multiple arrays. | ||
Returns an array of n-element subsets based on the elements in an input array. | ||
Concatenates multiple arrays or strings. | ||
Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF). | ||
Filters the elements in an array. | ||
Converts multiple arrays of the ARRAY data type into a single array. | ||
Returns the element at a specific position in an array. | ||
Returns an n-gram array of the elements of the specified array. | ||
Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements. | ||
Returns an array in reverse order of the elements of a specific array. | ||
Returns an array that contains the specified elements based on expressions. | ||
Returns elements of an array in random order. | ||
Returns the number of elements in an array. | ||
Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array. | ||
Sorts the elements in an array. | ||
Splits a string with a specified delimiter and returns an array. | ||
Transforms the elements in an array. | ||
Merges two arrays at the element level based on element positions and returns a new array. | ||
MAP functions | Transposes one row of data into multiple rows. This function is a UDTF. | |
Returns the value that meets a specific condition in a map. | ||
Creates a map based on given key-value pairs. | ||
Returns the union of multiple maps. | ||
Converts key-value pairs in a map into a struct array. | ||
Filters the elements in a map. | ||
Creates a map based on given arrays. | ||
Creates a map based on given struct arrays. | ||
Returns all keys in a map as an array. | ||
Returns all values in a map as an array. | ||
Merges two given maps into a single map. | ||
Returns a map that consists of the keys in a struct array and an array containing all values. | ||
Returns the number of key-value pairs in a map. | ||
Transforms the keys in a map by using a given function. The values in the map are not changed. | ||
Transforms the values in a map by using a given function. The keys in the map are not changed. | ||
STRUCT functions | Obtains the value of a member variable in a struct. | |
Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row. | ||
Creates a struct based on given name-value pairs. | ||
Creates a struct based on a given value list. | ||
JSON functions | Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format. | |
Extracts a single string from a standard JSON string by using a specific method. | ||
Extracts strings from a standard JSON string based on a set of input keys. | ||
Converts data of a complex data type into a JSON string. | ||
Returns a JSON object that contains key-value pairs. | ||
Evaluates a possibly empty list of values and returns a JSON array that contains these values. Evaluates a possibly empty list of values and returns a JSON array that contains these values. | ||
Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid. | ||
Determines whether the JSON value of json_path exists. | ||
Returns a JSON value in a format that is easy to read by adding line breaks and spaces. | ||
Returns the data type of a JSON value. | ||
Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified. | ||
Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned. | ||
Determines whether a string is in a valid JSON format. | ||
Supports conversion between basic data types and JSON data types. |
In the following examples, the combination of a hyphen and a closing angle bracket (->)
is used. For more information about how to use the combination of a hyphen and a closing angle bracket (->)
in Lambda functions, see Lambda functions.
ALL_MATCH
Syntax
boolean all_match(array<T> <a>, function<T, boolean> <predicate>)
Description
Checks whether all elements in Array a meet the predicate condition.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
If all elements in Array a meet the predicate condition or the array is empty, True is returned.
If one or more elements in Array a do not meet the predicate condition, false is returned.
If an element in Array a is null and other elements meet the predicate condition, null is returned.
Examples
Example 1: Check whether all elements in
array(4, 5, 6)
meet thex x > 3
condition. This condition specifies that all elements are greater than 3. Sample statement:-- The return value is true. select all_match(array(4, 5, 6), x -> x>3);
Example 2: The array is empty. Sample statement:
-- The return value is true. select all_match(array(), x -> x>3);
Example 3: Check whether all elements in
array(1, 2, -10, 100, -30)
meet thex-> x > 3
condition. Sample statement:-- The return value is false. select all_match(array(1, 2, -10, 100, -30), x -> x>3);
Example 4: Check whether all elements in
array(10, 100, 30, null)
meet thex-> x > 3
condition. Sample statement:-- The return value is null. select all_match(array(10, 100, 30, null), x -> x>3);
ANY_MATCH
Syntax
boolean any_match(array<T> <a>, function<T, boolean> <predicate>)
Description
Checks whether an element in Array a meets the predicate condition.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.
Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
If one or more elements in Array a meet the predicate condition, true is returned.
If no elements in Array a meet the predicate condition or the array is empty, false is returned.
If an element in Array a is null and other elements do not meet the predicate condition, null is returned.
Examples
Example 1: Check whether an element in
array(1, 2, -10, 100, -30)
meets thex-> x > 3
condition. Sample statement:-- The return value is true. select any_match(array(1, 2, -10, 100, -30), x-> x > 3);
Example 2: The array is empty. Sample statement:
-- The return value is false. select any_match(array(), x-> x > 3);
Example 3: Check whether an element in
array(1, 2, -10, -20, -30)
meets thex-> x > 3
condition. Sample statement:-- The return value is false. select any_match(array(1, 2, -10, -20, -30), x-> x > 3);
Example 4: Check whether an element in
array(1, 2, null, -10)
meets thex-> x > 3
condition. Sample statement:-- The return value is null. select any_match(array(1, 2, null, -10), x-> x > 3);
ARRAY
Syntax
array array(<value>,<value>[, ...])
Description
Creates an array based on given values.
Parameters
value: required. A value of any data type. All values must be of the same data type.
Return value
A value of the ARRAY type is returned.
Examples
Create an array based on the data in the
t_table
table that contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
columns. Data in the table:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+
Sample statement:
-- Create an array based on the data in the c2, c4, c3, and c5 columns. select array(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | [k11, 86, k21, 15] | | [k12, 97, k22, 2] | | [k13, 99, k23, 1] | +------+
ARRAY_CONTAINS
Syntax
boolean array_contains(array<T> <a>, value <v>)
Description
Checks whether Array a contains Element v.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.v: required. This parameter specifies the element that you want to check. The value of v must be of the same data type as the elements in Array a.
Return value
A value of the BOOLEAN type is returned.
Examples
The
t_table_array
table contains thec1 (BIGINT) and t_array (ARRAY<STRING>)
columns. Data in the table:+------------+---------+ | c1 | t_array | +------------+---------+ | 1000 | [k11, 86, k21, 15] | | 1001 | [k12, 97, k22, 2] | | 1002 | [k13, 99, k23, 1] | +------------+---------+
Sample statement:
-- Check whether the t_array column contains the value 1. select c1, array_contains(t_array,'1') from t_table_array; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | false | | 1001 | false | | 1002 | true | +------------+------+
ARRAY_DISTINCT
Syntax
array<T> array_distinct(array<T> <a>)
Description
Removes duplicate elements from Array a.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
If an element in Array a is null, the null element is involved in the operation.
If the input array is empty, an empty array is returned.
Examples
Example 1: Remove duplicate elements from
array(10, 20, 30, 30, 20, 10)
. Sample statement:-- The return value is [10,20,30]. select array_distinct(array(10, 20, 30, 30, 20, 10));
Example 2: Remove duplicate elements from
array(10, 20, 20, null, null, 30, 20, null)
. Sample statement:-- The return value is [10,20,null,30]. select array_distinct(array(10, 20, 20, null, null, 30, 20, null));
Example 3: Remove duplicate elements from an empty array. Sample statement:
-- The return value is []. select array_distinct(array());
ARRAY_EXCEPT
Syntax
array<T> array_except(array<T> <a>, array<T> <b>)
Description
Finds the elements that exist in Array a but do not exist in Array b and returns the elements as a new array without duplicates.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
If an element in an array is null, the null element is involved in the operation.
If one of the input arrays is empty, a new array is returned based on the non-empty array without duplicates.
If the two input arrays are empty, an empty array is returned.
Examples
Example 1: Find the elements that exist in
array(1, 1, 3, 3, 5, 5)
but do not exist inarray(1, 1, 2, 2, 3, 3)
and return the elements as a new array without duplicates. Sample statement:-- The return value is [5]. select array_except(array(1, 1, 3, 3, 5, 5), array(1, 1, 2, 2, 3, 3));
Example 2: Find the elements that exist in
array(1, 1, 3, 3, 5, 5, null, null)
but do not exist inarray(1, 1, 2, 2, 3, 3)
and return the elements as a new array without duplicates. Sample statement:-- The return value is [5,null]. select array_except(array(1, 1, 3, 3, 5, 5, null, null), array(1, 1, 2, 2, 3, 3));
Example 3: Find the elements that exist in an array but do not exist in another array and return the elements as a new array without duplicates. One of the input arrays is empty. Sample statement:
-- The return value is [2,1]. select array_except(array(2, 1, 1, 2), cast(array() as array<int>));
Example 4: Find the elements that exist in an array but do not exist in another array and return the elements as a new array without duplicates. The two input arrays are empty. Sample statement:
-- The return value is []. select array_except(cast(array() as array<int>), cast(array() as array<int>));
ARRAY_INTERSECT
Syntax
array<T> array_intersect(array<T> <a>, array<T> <b>)
Description
Calculates the intersection of Array a and Array b and removes duplicate elements.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
If an element in an array is null, the null element is involved in the operation.
The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.
If Array a or Array b is null, null is returned.
Examples
Example 1: Calculate the intersection of
array(1, 2, 3)
andarray(1, 3, 5)
, and remove duplicate elements. Sample statement:-- The return value is [1,3]. select array_intersect(array(1, 2, 3), array(1, 3, 5));
Example 2: Calculate the intersection of
array(10, 20, 20, 30, 30, null, null)
andarray(30, 30, 20, 20, 40, null, null)
, and remove duplicate elements. Sample statement:-- The return value is [20,30,null]. select array_intersect(array(10, 20, 20, 30, 30, null, null), array(30, 30, 20, 20, 40, null, null));
ARRAY_JOIN
Syntax
array_join(array<T> <a>, <delimiter>[, <nullreplacement>])
Description
Concatenates the elements in Array a by using a delimiter. If the array contains a null element, use nullreplacement to specify the string that you want to use to replace the null element in the results. If you do not configure nullreplacement, the null element is ignored.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array.NoteIf the elements in the array are not of the STRING type, MaxCompute converts the data type of the elements into STRING.
delimiter: required. A value of the STRING type. This parameter specifies the string that is used to separate the concatenated elements in Array a.
nullreplacement: optional. This parameter specifies the string that is used to replace null elements.
Return value
A value of the STRING type is returned.
Examples
-- The return value is 10,20,20,30. select array_join(array(10, 20, 20, null, null, 30), ","); -- The return value is 10##20##20##null##null##30. select array_join(array(10, 20, 20, null, null, 30), "##", "null");
ARRAY_MAX
Syntax
T array_max(array<T> <a>)
Description
Returns the largest element in Array a.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array.The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
Return value
The largest element in Array a is returned. The return value varies based on the following rules:
If Array a is null, null is returned.
If an element in Array a is null, the null element is not involved in the operation.
Examples
-- The return value is 20. select array_max(array(1, 20, null, 3));
ARRAY_MIN
Syntax
T array_min(array<T> <a>)
Description
Returns the smallest element in Array a.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array.The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
Return value
The smallest element in Array a is returned. The return value varies based on the following rules:
If Array a is null, null is returned.
If an element in Array a is null, the null element is not involved in the operation.
Examples
-- The return value is 1. select array_min(array(1, 20, null, 3));
ARRAY_NORMALIZE
Syntax
array_normalize(array, p)
Description
Returns an array whose elements are normalized based on the specified p norm.
This function is equivalent to
TRANSFORM(array, v -> v / REDUCE(array, 0, (a, v) -> a + POW(ABS(v), p), a -> POW(a, 1 / p))
. However, theREDUCE
function is called only once.Parameters
array: an input array. The elements in the array can be of only the FLOAT and DOUBLE data types.
p: the p norm of the array.
Return value
An array whose elements are normalized based on the specified p norm is returned.
If the array is null or contains null elements, null is returned.
If the
p=0
condition is met, the original array is returned. If thep<0
condition is met, an error is returned.
Examples
SELECT array_normalize(array(10.0, 20.0, 50.0), 1.0);
The following result is returned:
[0.125, 0.25, 0.625]
ARRAY_POSITION
Syntax
bigint array_position(array<T> <a>, T <element>)
Description
Returns the position of the first occurrence of a given element in Array a. The position numbers of elements are counted from left to right and start from 1.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The following data types are supported:TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
element: required. The element whose position you want to query. The data type of this parameter must be the same as the data type of the elements in Array a.
Return value
A value of the BIGINT type is returned. The return value varies based on the following rules:
If Array a or element is null, null is returned.
If the specific element is not found, 0 is returned.
Examples
Example 1: Return the position of the first occurrence of
1
inarray(3, 2, 1)
. Sample statement:-- The return value is 3. select array_position(array(3, 2, 1), 1);
Example 2: The specified element is null. Sample statement:
-- The return value is null. select array_position(array(3, 1, null), null);
ARRAY_REDUCE
Syntax
R array_reduce(array<T> <a>, buf <init>, function<buf, T, buf> <merge>, function<buf, R> <final>)
Description
Aggregates the elements in Array a.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.init: required. The initial value of the intermediate result that is used to aggregate elements.
merge: required. A built-in function, user-defined function, or expression that is used to perform an operation on each element in Array a and the intermediate result. This function or expression uses the elements of Array a and the init parameter as input parameters.
final: required. A built-in function, user-defined function, or expression that is used to convert the intermediate result into the final result. This function or expression uses the result of merge as the input parameter. R specifies the data type of the output.
Return value
The data type of the return value is the same as the data type that is specified for final.
Examples
-- The return value is 6. select array_reduce(array(1, 2, 3), 0, (buf, e)->buf + e, buf->buf); -- The return value is 2.5. select array_reduce(array(1, 2, 3, 4), named_struct('sum', 0, 'count', 0), (buf, e)->named_struct('sum', buf.sum + e, 'count', buf.count + 1), buf -> buf.sum / buf.count);
ARRAY_REMOVE
Syntax
array<T> array_remove(array<T> <a>, T <element>)
Description
Removes a given element from Array a.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The following data types are supported:TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
element: required. The element that you want to remove. The data type of this parameter must be the same as the data type of the elements in Array a.
Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
If an element in Array a is null, the null element is not involved in the operation.
If Array a or element is null, null is returned.
If Array a does not contain the specific element, Array a is returned.
Examples
Example 1: Remove
1
fromarray(3, 2, 1)
. Sample statement:-- The return value is [3,2]. select array_remove(array(3, 2, 1), 1);
Example 2: The specified element is null. Sample statement:
-- The return value is null. select array_remove(array(3, 1, null), null);
Example 3: Remove
2
fromarray(3, 1, null)
. Sample statement:-- The return value is [3,1,null]. select array_remove(array(3, 1, null), 2);
ARRAY_REPEAT
Syntax
array<T> array_repeat(T <element>, int <count>)
Description
Returns a new array in which element t is repeated count times.
Parameters
t: required. This parameter specifies the element that you want to repeat. The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
count: required. This parameter specifies the number of repetitions. A value of the INT type is required. The value must be greater than or equal to 0.
Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
If the value of count is null, null is returned.
If the value of count is less than 0, an empty array is returned.
Examples
Example 1: Repeat
123
twice
and return the new array. Sample statement:-- The return value is [123, 123]. select array_repeat('123', 2);
Example 2: The value of count is null. Sample statement:
-- The return value is null. select array_repeat('123', null);
Example 3: The value of count is less than 0. Sample statement:
-- The return value is []. select array_repeat('123', -1);
ARRAY_SORT
Syntax
array<T> array_sort(array<T> <a>, function<T, T, bigint> <comparator>)
Description
Sorts the elements in Array a based on a comparator.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.comparator: required. A built-in function, user-defined function (UDF), or expression that is used to compare two elements in the array.
Processing logic of
comparator(a, b)
: If a is equal to b, 0 is returned. If a is less than b, a negative integer is returned. If a is greater than b, a positive integer is returned. Ifcomparator(a, b)
returns null, an error is returned.ImportantComparison functions in ARRAY_SORT must be self-consistent. Examples:
If
compare(a, b) > 0
is met,compare(b, a) < 0
must be met.If
compare(a, b) = 0
is met,compare(b, a) = 0
must be met.If
compare(a, b) < 0
is met,compare(b, a) > 0
must be met.
Examples of functions that are not self-consistent:
(left, right) -> CASE WHEN left <= right THEN -1L ELSE 0L END
:If you configure
a = 1
andb = 1
, thecompare(a, b)
function returns -1, and thecompare(b, a)
function also returns -1. The comparison results conflict with each other, and the functions are not self-consistent.(left, right) -> CASE WHEN left < right THEN - 1L WHEN left = right THEN 0L ELSE 1L END
:If you configure
a = NULL
andb = 1
, thecompare(a, b)
function returns 1 and thecompare(b, a)
function also returns 1. The comparison results conflict with each other, and the functions are not self-consistent.
Return value
A value of the ARRAY type is returned.
Examples
Example 1: Sort the elements in
array(5,6,1)
.SELECT array_sort(array(5,6,1), (left,right) -> CASE WHEN left < right THEN -1L WHEN left > right THEN 1L ELSE 0L END); -- The following result is returned: +------------+ | _c0 | +------------+ | [1,5,6] | +------------+
Example 2:
-- The return value is [{"a":1,"b":10},{"a":2,"b":12},{"a":3,"b":11}]. select array_sort(a, (a,b)->case when a.a> b.a then 1L when a.a=b.a then 0L else -1L end) from values ( array(named_struct('a', 1, 'b', 10), named_struct('a', 3, 'b', 11), named_struct('a', 2, 'b', 12))) as t(a);
ARRAY_UNION
Syntax
array<T> array_union(array<T> <a>, array<T> <b>)
Description
Calculates the union of Array a and Array b, and removes duplicate elements.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
Return value
A value of the ARRAY type is returned. If Array a or Array b is null, null is returned.
Examples
Example 1: Calculate the union of
array(1, 2, 3)
andarray(1, 3, 5)
, and remove duplicate elements. Sample statement:-- The return value is [1,2,3,5]. select array_union(array(1, 2, 3), array(1, 3, 5));
Example 2: Calculate the union of two arrays and remove duplicate elements. One of the arrays is null. Sample statement:
-- The return value is null. select array_union(array(1, 2, 3), null);
ARRAYS_OVERLAP
Syntax
boolean arrays_overlap(array<T> <a>, array<T> <b>)
Description
Checks whether Array a and Array b contain the same element.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
Return value
A value of the BOOLEAN type is returned. The return value varies based on the following rules:
If Array a contains at least one element that is in Array b and is not null, true is returned.
If Array a and Array b do not contain the same element, both of the arrays are not empty, and one or both of the arrays contain a null element, null is returned.
If Array a and Array b do not contain the same element, and both of the arrays are not empty and do not contain a null element, false is returned.
Examples
Example 1: Check whether
array(1, 2, 3)
andarray(3, 4, 5)
contain the same element. Sample statement:-- The return value is true. select arrays_overlap(array(1, 2, 3), array(3, 4, 5));
Example 2: Check whether
array(1, 2, 3)
andarray(6, 4, 5)
contain the same element. Sample statement:-- The return value is false. select arrays_overlap(array(1, 2, 3), array(6, 4, 5));
Example 3: Check whether two arrays contain the same element. One of the arrays contains the null element. Sample statement:
-- The return value is null. select arrays_overlap(array(1, 2, 3), array(5, 4, null));
ARRAYS_ZIP
Syntax
array<struct<T, U, ...>> arrays_zip(array<T> <a>, array<U> <b>[, ...])
Description
Merges multiple given arrays and returns a struct array, in which the Nth struct contains all the Nth elements of the input arrays.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
andU
inarray<U>
specify the data types of the elements in the arrays. The elements can be of any data type.The following data types are supported:
TINYINT, SMALLINT, INT, and BIGINT
FLOAT and DOUBLE
BOOLEAN
DECIMAL and DECIMALVAL
DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth
STRING, BINARY, VARCHAR, and CHAR
ARRAY, STRUCT, and MAP
Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
The Nth struct in the generated struct array contains all the Nth elements of the input arrays. If an array contains less than N elements, null is used as the Nth element of the array.
If one or more input arrays are null, null is returned.
Examples
Example 1: Merge
array(1, 2, 3)
andarray(2, 3, 4)
into a struct array. Sample statement:-- The return value is [{0:1, 1:2}, {0:2, 1:3}, {0:3, 1:4}]. select arrays_zip(array(1, 2, 3), array(2, 3, 4));
Example 2: Merge
array(1, 2, 3)
andarray(4, 5)
into a struct array. Sample statement:-- The return value is [{0:1, 1:4}, {0:2, 1:5}, {0:3, 1:null}]. select arrays_zip(array(1, 2, 3), array(4, 5));
COMBINATIONS
Syntax
combinations(array(T), n)
Description
Returns an array of n-element subsets based on the elements in an input array.
Parameters
array: an input array.
n: the number of elements in each subset.
Return value
Returns an array of n-element subsets based on the elements in an input array.
If the input array has no duplicate elements, an array of n-element subsets is returned. The subsets are deterministic and listed in random order. Elements in a subset are deterministic and listed in random order.
By default, the maximum value of n is 5. You can specify the odps.sql.max.combination.length parameter to change the maximum value. By default, the total number of generated subsets cannot exceed 100,000. You can specify the odps.sql.max.combinations parameter to change the total number.
If the value of n is greater than the number of elements in the input array, an empty array is returned.
Examples
SELECT combinations(array('foo', 'bar', 'boo'),2);
The following result is returned:
[['foo', 'bar'], ['foo', 'boo']['bar', 'boo']]
SELECT combinations(array(1,2,3,4,5),3);
The following result is returned:
[[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4], [1, 2, 5], [1, 3, 5], [2, 3, 5], [1, 4, 5], [2, 4, 5], [3, 4, 5]]
SELECT combinations(array(1,2,2),2);
The following result is returned:
[[1,2], [1,2], [2,2]]
CONCAT
Syntax
array<T> concat(array<T> <a>, array<T> <b>[,...]) string concat(string <str1>, string <str2>[,...])
Description
Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.
Strings as inputs: Concatenates multiple strings and returns a new string.
Parameters
a and b: required. These parameters specify arrays.
T
inarray<T>
specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation.str1 and str2: required. Values of the STRING type. If the input values are of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, they are implicitly converted into values of the STRING type before calculation. If the input values are of other data types, an error is returned.
Return value
A value of the ARRAY type is returned. If one of the input arrays is null, null is returned.
A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, null is returned.
Examples
Example 1: Concatenate all elements of
array(10, 20)
andarray(20, -20)
. Sample statement:-- The return value is [10, 20, 20, -20]. select concat(array(10, 20), array(20, -20));
Example 2: One of the input arrays contains a null element. Sample statement:
-- The return value is [10, null, 20, -20]. select concat(array(10, null), array(20, -20));
Example 3: One of the input arrays is null. Sample statement:
-- The return value is null. select concat(array(10, 20), null);
Example 4: Concatenate strings
aabc
andabcde
. Sample statement:-- The return value is aabcabcde. select concat('aabc','abcde');
Example 5: The input is empty. Sample statement:
-- The return value is null. select concat();
Example 6: One of the input strings is null. Sample statement:
-- The return value is null. select concat('aabc', 'abcde', null);
EXPLODE
Limits
A
SELECT
statement can include only oneEXPLODE
function. Only data in a column can be calculated by using the EXPLODE function.This function cannot be used with the
GROUP BY
,CLUSTER BY
,DISTRIBUTE BY
, orSORT BY
clause.
Syntax
explode (<var>)
Description
Transposes one row of data into multiple rows. This function is a UDTF.
If the parameter value is of the
array<T>
type, the array stored in the column is transposed into multiple rows.If the parameter value is of the
map<K, V>
type, each key-value pair of the map stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
Parameters
var: required. The value must be of the
array<T>
ormap<K, V>
type.Return value
Rows after transposition are returned.
Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
columns. Data in the table:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
Sample statement:
select explode(t_map) from t_table_map; -- The following result is returned: +-----+------------+ | key | value | +-----+------------+ | k11 | 86 | | k21 | 15 | | k12 | 97 | | k22 | 2 | | k13 | 99 | | k23 | 1 | +-----+------------+
FIELD
Syntax
T field(struct <s>, string <fieldName>)
Description
Obtains the value of a member variable in a struct.
Parameters
s: required. This parameter specifies a struct. The struct is in the format of
{f1:T1, f2:T2[, ...]}
.f1
andf2
specify member variables,T1
specifies the value off1
, andT2
specifies the value off2
.fieldName: required. A value of the STRING type. This parameter specifies the member variable in the struct.
Return value
A value of the specific member variable in the struct is returned.
Examples
-- The return value is hello. select field(named_struct('f1', 'hello', 'f2', 3), 'f1');
FILTER
Syntax
array<T> filter(array<T> <a>, function<T,boolean> <func>)
Description
Filters the elements in Array a by using func and returns a new array.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in Array a. The value must be of the same data type as the elements in Array a. The output result of the function or expression is of the BOOLEAN type.
Return value
A value of the ARRAY type is returned.
Examples
-- The return value is [2, 3]. select filter(array(1, 2, 3), x -> x > 1);
FLATTEN
Syntax
flatten(arrayOfArray)
Description
Converts multiple arrays of the ARRAY data type into a single array.
Parameters
arrayOfArray: arrays of the ARRAY data type.
Return value
Multiple arrays of the ARRAY data type are converted into a single array in the order of the elements.
If the input value is
null
, null is returned.If the input parameters are not arrays of the ARRAY data type, an error is returned.
Examples
SELECT flatten(array(array(1, 2), array(3, 4)));
The following result is returned:
[1,2,3,4]
FROM_JSON
Syntax
from_json(<jsonStr>, <schema>)
Description
Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.
Parameters
jsonStr: required. The JSON string that you entered.
schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as
array<bigint>
,map<string, array<string>>
, orstruct<a:int, b:double, `C`:map<string,string>>
.NoteKeys in a struct are case-sensitive. You can also specify a struct in the format of
a BIGINT, b DOUBLE
, which is equivalent toSTRUCT<a:BIGINT, b:DOUBLE>
.The following table describes the mappings between JSON data types and MaxCompute data types.
JSON data type
MaxCompute data type
OBJECT
STRUCT, MAP, and STRING
ARRAY
ARRAY and STRING
NUMBER
TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING
BOOLEAN
BOOLEAN and STRING
STRING
STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME
NULL
All types
NoteThe JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.
Return value
A value of the ARRAY, MAP, or STRUCT type is returned.
Examples
Example 1: Convert a specific JSON string into a value of a specific data type. Sample statement:
-- The return value is {"a":1,"b":0.8}. select from_json('{"a":1, "b":0.8}', 'a int, b double'); -- The return value is {"time":"26/08/2015"}. select from_json('{"time":"26/08/2015"}', 'time string'); -- The return value is {"a":1,"b":0.8}. select from_json('{"a":1, "b":0.8}', 'a int, b double, c string'); -- The return value is [1,2,3]. select from_json('[1, 2, 3, "a"]', 'array<bigint>'); -- The return value is {"d":"v","a":"1","b":"[1,2,3]","c":"{}"}. select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
Example 2: Use the
map_keys
andfrom_json
functions to obtain all keys in a JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:-- The return value is ["a","b"]. select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));
GET_JSON_OBJECT
Usage notes
This function is used to extract a single string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:
JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.
STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.
The methods and usage notes for using this function vary based on input parameter data types. This topic describes how to use the GET_JSON_OBJECT function when input parameters are of the JSON and STRING types.
The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.
The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.
Input parameters of JSON data types
Syntax
string get_json_object(json <json>, string <json_path>)
Description
Extracts a single string from a standard JSON string based on JSON PATH.
Parameters
json: required. A JSON string from which you want to extract a single string.
json_path: required. A JSONPath expression based on which you want to extract a single string.
Return value
A value of a STRING type is returned.
Examples
Example 1: Extract the value that corresponds to the key a from a JSON string.
select get_json_object(json '{"a":1, "b":2}', '$.a');
The following result is returned:
+-----+ | _c0 | +-----+ | 1 | +-----+
Example 2: Extract the value that corresponds to the key c from a JSON string.
select get_json_object(json '{"a":1, "b":2}', '$.c');
The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Example 3: If an invalid JSON path is specified, the return value is NULL.
select get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');
The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Input parameter of the STRING data type
Syntax
string get_json_object(string <json>, string <path>)
Description
Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the
GET_JSON_OBJECT
function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.Parameters
json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of
{Key:Value, Key:Value,...}
. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with
$
. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Meanings of different characters:$
: indicates the root node..
or['']
: indicates a child node. MaxCompute parses JSON objects by using.
or['']
. If a key in a JSON object contains a period (.), [''] can be used.[]
([number]
): indicates an array subscript, which starts from 0.*
: indicates the wildcard for[]
. If this character is used in the path parameter, an entire array is returned. An asterisk (*
) cannot be escaped.
Limits
Only MaxCompute V2.0 allows you to extract data by using
['']
in the path parameter. To use [''], you must add theset odps.sql.udf.getjsonobj.new=true;
statement before the statement that you want to execute.Return value
If the json parameter is empty or invalid, null is returned.
If the format of json is valid and path exists, the related string is returned.
You can specify the
flagodps.sql.udf.getjsonobj.new
parameter for a session to determine how this function returns a value.If you execute the
set odps.sql.udf.getjsonobj.new=true;
statement, this function retains the original strings when it returns a value.We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a job in which this function escapes JSON reserved characters exists in a MaxCompute project, we recommend that you retain the original escape operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:
In this configuration, the return value is still a JSON string, which can be parsed as JSON data, without the need to use the
REPLACE
orREGEXP_REPLACE
function to replace backslashes (\).Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.
-- The return value is 1. select get_json_object('{"a":"1","a":"2"}', '$.a');
The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the
GET_JSON_OBJECT
function to process the data.-- The return value is an emoji. select get_json_object('{"a":"<Emoji>"}', '$.a');
The output results are displayed in alphabetical order.
-- The return value is {"b":"1","a":"2"}. select get_json_object('{"b":"1","a":"2"}', '$');
If you execute the
set odps.sql.udf.getjsonobj.new=false;
statement, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as
'\n'
and'\"'
.Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed. Sample statement:
-- The return value is null. select get_json_object('{"a":"1","a":"2"}', '$.a');
The encoded strings that correspond to emojis cannot be parsed. Sample statement:
-- The return value is null. select get_json_object('{"a":"<Emoji>"}', '$.a');
The output results are displayed in alphabetical order. Sample statement:
-- The return value is {"a":"2","b":"1"}. select get_json_object('{"b":"1","a":"2"}', '$');
NoteFor MaxCompute projects that were created on or after January 21, 2021, the
GET_JSON_OBJECT
function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, theGET_JSON_OBJECT
function escapes JSON reserved characters when it returns a value. The following example helps you determine how theGET_JSON_OBJECT
function returns a value in a MaxCompute project.select get_json_object('{"a":"[\\"1\\"]"}', '$.a'); -- Return JSON reserved characters by using escape characters. [\"1\"] -- Return the original strings. ["1"]
You can submit an application or search for the DingTalk group ID 11782920 to join the MaxCompute developer community DingTalk group and request MaxCompute technical support engineers to configure the
GET_JSON_OBJECT
function to retain original strings. This way, you do not need to frequently specify set odps.sql.udf.getjsonobj.new=false; for a session.
Examples
Example 1: Extract information from the JSON object
src_json.json
. Sample statement:-- The JSON string src_json.json contains the following content: +----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } -- Extract the information of the owner field and return amy. select get_json_object(src_json.json, '$.owner') from src_json; -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. select get_json_object(src_json.json, '$.store.fruit[0]') from src_json; -- Extract the information of the non-existent field and return null. select get_json_object(src_json.json, '$.non_exist_key') from src_json;
Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:
-- The return value is 2222. select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=true; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=false; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh'); -- The return value is h1. select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
Example 3: Extract information from a JSON object that includes a period
(.)
. Sample statement:-- Create a table. create table mf_json (id string, json string); -- Insert data into the table. The key in the data contains a period (.). insert into table mf_json (id, json) values ("1", "{ \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Insert data into the table. The key in the data does not contain a period (.). insert into table mf_json (id, json) values ("2", "{ \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1; -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2; select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
Example 4: The json parameter is empty or invalid. Sample statement:
-- The return value is null. select get_json_object('','$.array[1][1]'); -- The return value is null. select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
Example 5: Escape a JSON string. Sample statement:
set odps.sql.udf.getjsonobj.new=true; -- The return value is "1". select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); -- The return value is '1'. select get_json_object('{"a":"\'1\'","b":"2"}', '$.a');
INDEX
Syntax
index(<var1>[<var2>])
Description
If var1 is of the
map<K, V>
type, this function obtains the element that is at position var2 in var1. The position numbers of elements are counted from left to right and start from 0.If var1 is of the
map<K, V>
type, this function obtains the value whose key is var2 in var1.
NoteWhen you use this function, you must remove
index
and directly execute<var1>[<var2>]
. Otherwise, an error is returned.Parameters
var1: required. The value must be of the
array<T>
ormap<K, V>
type.T
inarray<T>
specifies the data type of the elements in an array. The elements can be of any data type.K
andV
inmap<K, V>
specify the keys and values of a map.var2: required.
If var1 is of the
array<T>
type, var2 must be of the BIGINT type and greater than or equal to 0.If var1 is of the
map<K, V>
type, var2 must be of the same data type as K.
Return value
If var1 is of the
array<T>
type, a value of the data type that is specified by T is returned. The return value varies based on the following rules:If the number of elements in var1 is less than var2, null is returned.
If var1 is null, null is returned.
If var1 is of the
map<K, V>
type, a value of the data type that is specified by V is returned. The return value varies based on the following rules:If
map<K, V>
does not contain a key whose value is var2, null is returned.If var1 is null, null is returned.
Examples
Example 1: var1 is of the
array<T>
type. Sample statement:-- The return value is c. select array('a','b','c')[2];
Example 2: var1 is of the
map<K, V>
type. Sample statement:-- The return value is 1. select str_to_map("test1=1,test2=2")["test1"];
INLINE
Syntax
inline(array<struct<f1:T1, f2:T2[, ...]>>)
Description
Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
Parameters
f1:T1 and f2:T2: required. Values of any data type.
f1
andf2
specify member variables,T1
specifies the value off1
, andT2
specifies the value off2
.Return value
The expanded data of the struct array is returned.
Examples
The
t_table
table contains thet_struct (STRUCT<user_id:BIGINT,user_name:STRING,married:STRING,weight:DOUBLE>)
fields. Data in the table:+----------+ | t_struct | +----------+ | {user_id:10001, user_name:LiLei, married:N, weight:63.5} | | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} | +----------+
Sample statement:
-- Expand the t_struct column. select inline(array(t_struct)) from t_table; -- The following result is returned: +------------+-----------+---------+------------+ | user_id | user_name | married | weight | +------------+-----------+---------+------------+ | 10001 | LiLei | N | 63.5 | | 10002 | HanMeiMei | Y | 43.5 | +------------+-----------+---------+------------+
JSON_TUPLE
Syntax
string json_tuple(string <json>, string <key1>, string <key2>,...)
Description
Extracts strings from a standard JSON string based on a set of input keys, such as
(key1,key2,...)
.Parameters
json: required. A value of the STRING type. This parameter specifies a standard JSON string.
key: required. A value of the STRING type. This parameter is used to describe the
path
of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using.
or['']
. If a key in a JSON object includes a period(.)
,['']
can be used.
Return value
A value of the STRING type is returned.
NoteIf json is empty or invalid, null is returned.
If key is empty, invalid, or does not exist in the JSON string, null is returned.
If json is valid and key exists, the related string is returned.
This function can parse JSON data that contains Chinese characters.
This function can parse nested JSON data.
This function can parse JSON data that contains nested arrays.
The parsing action is equivalent to the execution of GET_JSON_OBJECT along with
set odps.sql.udf.getjsonobj.new=true;
. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT.JSON_TUPLE is a user-defined table-valued function (UDTF). If you want to select some columns from a table, use JSON_TUPLE together with the LATERAL VIEW clause.
JSON_OBJECT
Syntax
json json_object(<key1>,<value1>[,<keyn>,<valuen>])
Description
Returns a JSON object that contains key-value pairs.
Parameters
key: You must specify at least one key. The key must be of the STRING type.
value: You must specify at least one value. The value must be of the STRING, BIGINT, INT, or BOOLEAN type.
Return value
The return value is of the JSON type.
Examples
Example 1: Generate a JSON object that contains only one key-value pair.
-- Generate a JSON object. select json_object('a', 123);
The following result is returned:
+-----+ | _c0 | +-----+ | {"a":123} | +-----+
Example 2: Generate a JSON object that contains multiple key-value pairs.
-- Generate a JSON object. select json_object('a', 123,'b','hello');
The following result is returned:
+-----+ | _c0 | +-----+ | {"a":123,"b":"hello"} | +-----+
JSON_ARRAY
Syntax
json json_array(<element>)
Description
Evaluates a possibly empty list of values and returns a JSON array that contains these values.
Parameters
element: required. The values in the list specified by this parameter must be of the STRING, BIGINT, BOOLEAN, or JSON type.
Return value
The return value is of the JSON type.
Examples
-- Generate a JSON array. select json_array('a', 45, true, 13, json '{"a":456}');
The following result is returned:
+-----+ | _c0 | +-----+ | ["a",45,true,13,{"a":456}] | +-----+
JSON_EXTRACT
Syntax
json json_extract(<json>, <json_path>)
Description
Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid.
Parameters
json: required. This parameter specifies the JSON expression that you want to process.
json_path: required. This parameter specifies the JSON path of the value that you want to be returned.
Return value
The return value is of the JSON type.
Examples
Example 1: Obtain the value of the key a from a JSON object.
select json_extract(json '{"a":1, "b":2}', '$.a');
The following result is returned:
+-----+ | _c0 | +-----+ | 1 | +-----+
Example 2: NULL is returned if the specified key does not exist in the JSON object.
select json_extract(json '{"a":1, "b":2}', 'strict $.c');
The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Example 3: The error message
Invalid argument - Param json path $invalid_json_path is invalid
is returned if the JSON path that you specified is invalid.select json_extract(json '{"a":1, "b":2}', '$a');
The following result is returned:
-- An error message is returned. Invalid argument - Param json path $invalid_json_path is invalid
JSON_EXISTS
Syntax
boolean json_exists(<json>, <json_path>)
Description
Determines whether the JSON value of json_path exists.
Parameters
json: required. This parameter specifies the JSON expression that you want to process.
json_path: required. This parameter specifies the JSON path in which a specific JSON value is checked.
Return value
The return value is true or false, which is of the BOOLEAN type.
Examples
Example 1: Check whether the value of the key a exists in a JSON object.
select json_exists(json '{"a":1, "b":2}', '$.a');
The following result is returned:
+------+ | _c0 | +------+ | true | +------+
Example 2: Check whether the value of the key c exists in a JSON object.
select json_exists(json '[1,2, {"a":34}]', '$[2].a');
The following result is returned:
+------+ | _c0 | +------+ | true | +------+
Example 3: Check whether values that are obtained based on a specific subscript exist.
select json_exists(json '{"a":1, "b":2}', 'strict $.c');
The following result is returned:
+------+ | _c0 | +------+ | false | +------+
JSON_PRETTY
Syntax
string json_pretty(<json>)
Description
Returns a JSON value in a format that is easy to read by adding line breaks and spaces.
Parameters
json: required. This parameter specifies the JSON expression that you want to process.
Return value
The return value is of the STRING type.
Examples
-- Beautify a JSON object. select json_pretty(json '{"a":1, "b":2}');
The following result is returned:
+-----+ | _c0 | +-----+ | { "a":1, "b":2 } | +-----+
JSON_TYPE
Syntax
string|number|boolean|null|object|array json_type(<json>)
Description
Returns the data type of a JSON value.
Parameters
json: required. The JSON expression that you want to process.
Return value
A value of the STRING type is returned.
Examples
Example 1: Return the ARRAY type for a JSON value.
select json_type(json '[{"a":1}, 23]');
The following result is returned:
+-----+ | _c0 | +-----+ | array | +-----+
Example 2: Return a numeric data type for a JSON value.
select json_type(json '123');
The following result is returned:
+-----+ | _c0 | +-----+ | number | +-----+
Example 3: Return the STRING type for a JSON value.
select json_type(json '"123"');
The following result is returned:
+-----+ | _c0 | +-----+ | string | +-----+
JSON_FORMAT
Syntax
string json_format(<json>)
Description
Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified.
Parameters
json: required. This parameter specifies the JSON data that you want to convert.
Return value
The return value is of the STRING type.
Examples
Example 1: Convert JSON NUMBER data into strings.
select json_format(json '123');
The following result is returned:
+-----+ | _c0 | +-----+ | 123 | +-----+
Example 2: Convert JSON STRING data into strings.
select json_format(json '"123"');
The following result is returned:
+-----+ | _c0 | +-----+ | "123" | +-----+
JSON_PARSE
Syntax
json json_parse(<string>)
Description
Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned.
Parameters
string: required. This parameter specifies the string that you want to process.
Return value
The return value is of the JSON type.
Examples
Example 1: Convert a string into data of the JSON type.
select json_parse('{"a":1, "b":2}');
The following result is returned:
+-----+ | _c0 | +-----+ | {"a":1,"b":2} | +-----+
Example 2: Convert a string into data of the JSON type.
select json_parse('"abc"');
The following result is returned:
+-----+ | _c0 | +-----+ | "abc" | +-----+
Example 3: Convert an invalid string into data of the JSON type. An error is reported.
select json_parse('abc');
The following result is returned:
Invalid input syntax for type json, detail:Token "abc" is invalid.
JSON_VALID
Syntax
boolean json_valid(<string>)
Description
Determines whether a string is in a valid JSON format.
Parameters
string: required. This parameter specifies the JSON string that you want to process.
Return value
The return value is true or false, which is of the BOOLEAN type.
Examples
Example 1: Check whether
"abc"
is a valid JSON string.select json_valid('"abc"');
The following result is returned:
+------+ | _c0 | +------+ | true | +------+
Example 2: Check whether
abc
is a valid JSON string.select json_valid('abc');
The following result is returned:
+------+ | _c0 | +------+ | false | +------+
CAST
Syntax
json/string/bigint/int/tinyint/smallint/double/float/boolean/sql-type cast(json as string/ string as json/ json as bigint/ bigint as json/ json as int/ int as json/ json as tinyint/ tinyint as json/ json as smallint/ smallint as json/ json as double/ double as json/ json as float/ float as json/ boolean as json/ json as boolean/ null as json/ json 'null' as ... )
Description
Supports conversion between basic data types and JSON data types.
Parameters
The following data types are supported:
JSON, STRING, BIGINT, INT, TINYINT, SMALLINT, DOUBLE, FLOAT, BOOLEAN, and SQL-TYPE
.To convert JSON data into data of the STRING data type, you must make sure the JSON data does not contain values of the ARRAY or OBJECT data type.
After you convert data of the STRING data type into JSON data, the output JSON data contains only values of the STRING data type. Take note of the following differences between the CAST function and the JSON_PARSE and JSON_FORMAT functions:
You can use the JSON_PARSE function to convert only valid JSON strings into JSON data, which can be a JSON object.
You can use the CAST function to convert any string into a JSON string that contains values of the STRING data type.
JSON 'null' values and common null values are converted into SQL null values.
Return value
JSON data or data of a specific basic data type is returned.
Examples
Example 1: Perform conversion between the STRING data type and the JSON data type.
-- Convert a JSON number into a value of the STRING type. select cast(json '123' as string); -- The following result is returned: +-----+ | _c0 | +-----+ | 123 | +-----+ -- Convert a JSON string into a value of the STRING type. select cast(json '"abc"' as string); -- The following result is returned: +-----+ | _c0 | +-----+ | abc | +-----+ -- Convert a JSON boolean into a value of the STRING type. select cast(json 'true' as string); -- The following result is returned: +-----+ | _c0 | +-----+ | TRUE | +-----+ -- Convert JSON 'null' into a null value of the STRING type. select cast(json 'null' as string); -- The following result is returned: +-----+ | _c0 | +-----+ | NULL | +-----+ -- Convert a string into a value of a JSON type. select cast('{"a":2}' as json); -- The following result is returned: +-----+ | _c0 | +-----+ | "{\"a\":2}" | +-----+ -- Incorrect sample statement. Convert JSON expressions of the ARRAY or OBJECT type into values of the STRING type. select cast(json '{"a":2}' as string); -- An error is returned. FAILED: ODPS-0123091:Illegal type cast - Unsupported cast from json array/object to string
Example 2: Perform conversion between the NUMBER data type and the JSON data type.
-- Convert a JSON number into a value of the BIGINT type. select cast(json '123' as bigint); -- The following result is returned: +------------+ | _c0 | +------------+ | 123 | +------------+ -- Convert a JSON number into a value of the FLOAT type. select cast(json '"1.23"' as float); -- The following result is returned: +------+ | _c0 | +------+ | 1.23 | +------+ -- Convert a JSON number into a value of the DOUBLE type. select cast(json '1.23' as double); -- The following result is returned: +------------+ | _c0 | +------------+ | 1.23 | +------------+ -- Convert a value of the INT type into a JSON number. select cast(123 as json); -- The following result is returned: +-----+ | _c0 | +-----+ | 123 | +-----+ -- Convert a value of the FLOAT type into a JSON number. select cast(1.23 as json); -- The following result is returned: +-----+ | _c0 | +-----+ | 1.23 | +-----+
Example 3: Perform conversion between the BOOLEAN data type and the JSON data type.
-- Convert a value of the BOOLEAN type into the BIGINT type. select cast(true as json); -- The following result is returned: +-----+ | _c0 | +-----+ | true | +-----+ -- Convert a JSON boolean into a value of the BOOLEAN type. select cast(json 'false' as boolean); -- The following result is returned: +------+ | _c0 | +------+ | false | +------+ -- Convert a JSON string into a value of the BOOLEAN type. select cast(json '"abc"' as boolean); -- The following result is returned: +------+ | _c0 | +------+ | true | +------+ -- Convert a JSON array or JSON object into the BOOLEAN type. select cast(json '[1,2]' as boolean); -- An error is returned. Unsupported cast from json array/object to boolean
Example 4: Perform conversion between a null value and a value of the JSON data type.
-- Convert a null value into a value of the STRING type. select json_type(cast(null as json)); -- The following result is returned: +-----+ | _c0 | +-----+ | NULL | +-----+
MAP
Syntax
map(K, V) map(K <key1>, V <value1>, K <key2>, V <value2>[, ...])
Description
Creates a map based on given key-value pairs.
Parameters
key: required. All keys must be of the same data type after implicit conversions. Only basic data types are supported.
value: required. All values must be of the same data type or must be of the same data type after implicit conversions. Data types other than DECIMAL are supported.
Return value
A value of the MAP type is returned.
NoteYou can configure the
odps.sql.map.key.dedup.policy
parameter at the session level to specify the method that is used to process duplicate keys. Valid values:exception: An error is returned.
last_win: The latter key overwrites the former key.
If you do not specify the parameter, the default value last_win is used.
Examples
Example 1: No duplicate keys exist. The
t_table
table contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
columns. Data in the table:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+
Sample statement:
-- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. select map(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | {k11:86, k21:15} | | {k12:97, k22:2} | | {k13:99, k23:1} | +------+
Example 2: Duplicate keys exist. The
t_table
table contains thec1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT)
columns. Data in the table:1000,'k11','k11',86,15 1001,'k12','k22',97,2 1002,'k13','k23',99,1 1003,'k13','k24',100,1 1004,'k12','k25',95,1
Sample statement:
-- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. select map(c2,c4,c3,c5) from t_table; -- The following result is returned: +------+ | _c0 | +------+ | {'k11':15} | | {'k12':97, 'k22':2} | | {'k13':99, 'k23':1} | | {'k13':100, 'k24':1} | | {'k12':95, 'k25':1} | +------+
MAP_CONCAT
Syntax
map<K, V> map_concat([string <mapDupKeyPolicy>,] map<K, V> <a>, map<K, V> <b>[,...])
Description
Calculates the union of multiple maps.
Parameters
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
exception: An error is returned.
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policy
is used.a and b: required. These parameters specify maps. The keys of the maps must be of the same data type, and the values of the maps must be of the same data type.
K
andV
inmap<K, V>
specify the keys and values of a map.
Return value
A value of the MAP type is returned. The return value varies based on the following rules:
A map is null or the key of a map is null.
The data types of multiple maps are different.
Examples
-- The return value is {1:a, 2:b, 3:c}. select map_concat(map(1, 'a', 2, 'b'), map(3, 'c')); -- The return value is {1:a, 2:d, 3:c}. select map_concat('last_win', map(1, 'a', 2, 'b'), map(3, 'c'), map(2, 'd'));
MAP_ENTRIES
Syntax
array<struct<K, V>> map_entries(map<K, V> <a>):
Description
Converts key-value pairs in Map a into a struct array.
Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.Return value
A struct array is returned. If the input is null, null is returned.
Examples
-- The return value is [{key:1, value:a}, {key:2, value:b}]. select map_entries(map(1, 'a', 2, 'b'));
MAP_FILTER
Syntax
map<K, V> map_filter(map<K, V> <input>, function <K, V, boolean> <predicate>)
Description
Filters the elements in Map input and retains only the elements that meet the predicate condition.
Parameters
input: required. A value of the MAP type.
K
andV
inmap<K, V>
specify the keys and values of a map.predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in the map. The predicate condition consists of two input parameters that correspond to the keys and values in input. The output result is of the BOOLEAN type.
Return value
A value of the MAP type is returned.
Examples
-- The return value is {-30:100, 20:50}. select map_filter(map(10, -20, 20, 50, -30, 100, 21, null), (k, v) -> (k+v) > 10);
MAP_FROM_ARRAYS
Syntax
map<K, V> map_from_arrays([string <mapDupKeyPolicy>,] array<K> <a>, array<V> <b>))
Description
Creates a map based on Array a and Array b.
Parameters
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
exception: An error is returned.
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policy
is used.a: required. This parameter specifies an array. This parameter corresponds to the key in the generated map.
K
inarray<K>
specifies the data type of the elements in the array. The elements can be of any data type.b: required. This parameter specifies an array. This parameter corresponds to the value in the generated map.
V
inarray<V>
specifies the data type of the elements in the array. The elements can be of any data type.
Return value
A value of the MAP type is returned. The return value varies based on the following rules:
If Array a or Array b is null, null is returned.
If Array a contains a null element or the two arrays are of different lengths, an error is returned.
Examples
-- The return value is {1:2, 3:4}. select map_from_arrays(array(1.0, 3.0), array('2', '4')); -- The return value is {1:2, 3:6}. select map_from_arrays('last_win', array(1.0, 3.0, 3), array('2', '4', '6'));
MAP_FROM_ENTRIES
Syntax
map<K, V> map_from_entries([string <mapDupKeyPolicy>,] array <struct<K, V> , struct<K, V>[,...]>)
Description
Creates a map based on given struct arrays.
Parameters
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
exception: An error is returned.
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policy
is used.Input parameters of this function are of the STRUCT type. K corresponds to the keys in the generated map. V corresponds to the values in the generated map.
K
andV
instruct<K, V>
specify the keys and values of a struct array.
Return value
A value of the MAP type is returned. The return value varies based on the following rules:
If a struct array is null, null is returned.
If the number of fields in a struct array is not 2 or the key of a struct array is null, an error is returned.
Examples
-- The return value is {1:a, 2:b}. select map_from_entries(array(struct(1, 'a'), struct(2, 'b'))); -- The return value is {1:a, 2:c}. select map_from_entries(array(struct(1, 'a'), struct(2, 'b'), struct(2, 'c')));
MAP_KEYS
Syntax
array<K> map_keys(map<K, V> <a>)
Description
Returns all keys in Map a as an array.
Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.Return value
A value of the ARRAY type is returned. If the input map is null, null is returned.
Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
columns. Data in the table:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
Sample statement:
-- Return keys in the t_map column as an array. select c1, map_keys(t_map) from t_table_map; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | [k11, k21] | | 1001 | [k12, k22] | | 1002 | [k13, k23] | +------------+------+
MAP_VALUES
Syntax
array<V> map_values(map<K, V> <a>)
Description
Returns all values in Map a as an array.
Parameters
a: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.Return value
A value of the ARRAY type is returned. If the input map is null, null is returned.
Examples
The
t_table_map
table contains thec1 (BIGINT) and t_map (MAP<STRING,BIGINT>)
columns. Data in the table:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+
Sample statement:
-- Return keys in the t_map column as an array. select c1,map_values(t_map) from t_table_map; -- The following result is returned: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | [86, 15] | | 1001 | [97, 2] | | 1002 | [99, 1] | +------------+------+
MAP_ZIP_WITH
Syntax
<K, V1, V2, V3> map<K, V3> map_zip_with(map<K, V1> <input1>, map<K, V2> <input2>, function<K, V1, V2, V3> <func>)
Description
Merges Map input1 and Map input2 into a new map. The keys of the new map are the union of the keys of the two input maps. The value of each key of the new map is calculated by using func.
Parameters
input1 and input2: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.func: required. func consists of three input parameters, which correspond to a key, the value that corresponds to the key in input1, and the value that corresponds to the key in input2. If the key does not exist in input1 or input2, null is used to replace the value that corresponds to the key in func.
Return value
Data of the data type defined by func is returned.
Examples
-- The return value is {1:[1, 1, 4], 2:[2, 2, 5], 3:[3, null, null], 4:[4, null, 7]}. select map_zip_with(map(1, 1, 2, 2, 3, null), map(1, 4, 2, 5, 4, 7), (k, v1, v2) -> array(k, v1, v2));
MULTIMAP_FROM_ENTRIES
Syntax
multimap_from_entries(array<struct<K, V>>)
Description
Returns a map that consists of the keys in a struct array and an array containing all values.
Parameters
array<struct<K, V>>: a struct array that consists of key-value pairs.
Return value
A map that consists of keys in a struct array and an array containing all values is returned. The map is in the
map<K, array<V>>
format.Each key in the returned map can be associated with multiple values, and the associated values are in the same array.
If the array is null, null is returned.
If the number of fields in the struct array is not 2 or the keys contain null values, an error is returned.
Examples
SELECT multimap_from_entries(array(struct(1, 'a'), struct(2, 'b'), struct(1, 'c')));
The following result is returned:
{1 : ['a', 'c'], 2: ['b']}
NAMED_STRUCT
Syntax
struct named_struct(string <name1>, T1 <value1>, string <name2>, T2 <value2>[, ...])
Description
Creates a struct based on given name-value pairs.
Parameters
value: required. A value of any data type.
name: required. The column name of the STRING type. This parameter is a constant.
Return value
A value of the STRUCT type is returned. Columns are sequentially named as
name1, name2, ...
.Examples
-- The return value is {user_id:10001, user_name:LiLei, married:F, weight:63.5}. select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);
NGRAMS
Syntax
ngrams(array(T), n)
Description
Returns an n-gram array of the elements of the specified array.
Parameters
array: an input array.
n: the number of elements in each subset.
Return value
Returns an n-gram array of the elements of the specified array.
If the
n<=0
condition is met, an error is returned.Examples
SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 2);
The following result is returned:
[['foo', 'bar'], ['bar', 'baz'], ['baz', 'foo']]
SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 3);
The following result is returned:
[['foo', 'bar', 'baz'], ['bar', 'baz', 'foo']]
SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 4);
The following result is returned:
[['foo', 'bar', 'baz', 'foo']]
SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 5);
The following result is returned:
[['foo', 'bar', 'baz', 'foo']]
SELECT ngrams(array(1, 2, 3, 4), 2);
The following result is returned:
[[1, 2], [2, 3], [3, 4]]
POSEXPLODE
Syntax
posexplode(array<T> <a>)
Description
Converts Array a into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.Return value
A table is returned.
Examples
select posexplode(array('a','c','f','b')); -- The following result is returned: +------------+------------+ | pos | val | +------------+------------+ | 0 | a | | 1 | c | | 2 | f | | 3 | b | +------------+------------+
REVERSE
Syntax
array reverse(array <value>)
Description
Returns an array in reverse order of the elements of a specific array.
Parameters
value: an input array.
Return value
An array in reverse order of the elements of an input array is returned. If the input value is null, null is returned.
Examples
-- The return value is [3, 4, 1, 2]. SELECT reverse(array(2, 1, 4, 3));
SEQUENCE
Syntax
sequence(start, stop, [step]) -> array
Description
Returns an array that contains the specified elements based on expressions.
Parameters
start: an expression that represents the beginning of a sequence of elements. The sequence contains start.
start and stop support the following integer types: TINYINT, SMALLINT, INT, and BIGINT. The preceding integer types correspond to the following types supported by step: TINYINT, SMALLINT, INT, and BIGINT.
start and stop support the following date and time types: DATE, DATETIME, and TIMESTAMP. The preceding date and time types correspond to the IntervalDayTime or IntervalYearMonth type supported by step.
stop: an expression that represents the end of a sequence of elements. The sequence contains stop.
step: optional. The step size of a sequence of elements.
By default, if the value of start is less than or equal to the value of stop, the value of step is 1. If the value of start is greater than the value of stop, the value of step is -1.
If the sequence of elements is of the time type, the default value of step is 1 or -1 in days. When you configure step, take note of the following points: If the value of start is greater than the value of stop, set step to a negative value. If the value of start is less than or equal to the value of stop, set step to a positive value. If the requirements are not met, an error is returned.
Return value
An array that contains the specified elements based on expressions is returned.
If the value of start is greater than the value of stop and the value of step is positive, an error is returned. If the value of start is less than or equal to the value of stop and the value of step is negative, an error is returned.
By default, the SEQUENCE function can generate a maximum of 10,000 elements. You can configure the
odps.sql.max.sequence.length
parameter to change the maximum number of elements that can be generated.
Examples
SELECT sequence(1, 5);
The following result is returned:
[1, 2, 3, 4, 5]
SELECT sequence(5, 1);
The following result is returned:
[5, 4, 3, 2, 1]
SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
The following result is returned:
[2018-01-01, 2018-02-01, 2018-03-01]
SHUFFLE
Syntax
shuffle(array)
Description
Parameters
array: an input array.
Return value
Returns elements of an array in random order.
If the input value is null, null is returned.
The returned result of this function is undefined.
Examples
SELECT shuffle(array(1, 20, 3, 5));
The following result is returned:
[3,1,5,20]
SELECT shuffle(array(1, 20, null, 3));
The following result is returned:
[20,null,3,1]
SIZE
Syntax
int size(array<T> <a>) int size(map<K, V> <b> )
Description
An array as the input: Calculates the number of elements in Array a.
A map as the input: Calculates the number of key-value pairs in Map b.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.b: required. This parameter specifies a map.
K
andV
inmap<K, V>
specify the keys and values of a map.
Return value
A value of the INT type is returned.
Examples
Example 1: Calculate the number of elements in
array('a','b')
. Sample statement:-- The return value is 2. select size(array('a','b'));
Example 2: Calculate the number of key-value pairs in
map('a',123,'b',456)
.-- The return value is 2. select size(map('a',123,'b',456));
SLICE
Syntax
array<T> slice(array<T> <a>, <start>, <length>)
Description
Copies the elements in an array from the start position based on a specified length and returns the elements as a new array.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.start: required. This parameter specifies the position at which the function starts to copy elements from left to right. The minimum positive value of this parameter is 1. You can also set the start parameter to a negative value. In this case, the start position is counted from the end of the array, but the elements are still copied from left to right.
length: required. The number of elements in the returned array. The value must be greater than or equal to 0. If the value is greater than the length of the input array, this function returns a new array that consists of the elements from the start position to the end of the input array.
Return value
A value of the ARRAY type is returned.
Examples
Example 1: Copy the elements in
array(10, 20, 20, null, null, 30)
from position1
based on the length of3
. Sample statement:-- The return value is [10, 20, 20]. select slice(array(10, 20, 20, null, null, 30), 1, 3);
Example 2: Copy the elements in
array(10, 20, 20, null, null, 30)
from position-2
based on the length of2
. Sample statement:-- The return value is [null, 30]. select slice(array(10, 20, 20, null, null, 30), -2, 2);
Example 3: Copy the elements in
array(10, 20, 20, null, null, 30)
from position3
based on the length of10
. Sample statement:-- The return value is [20, null, null, 30]. select slice(array(10, 20, 20, null, null, 30), 3, 10);
Example 4: Copy the elements in
array(10, 20, 20, null, null, 30)
from position3
based on the length of0
. Sample statement:-- The return value is []. select slice(array(10, 20, 20, null, null, 30), 3, 0);
SORT_ARRAY
Syntax
array<T> sort_array(array<T> <a>[, <isasc>])
Description
Sorts the elements in an array.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.isasc: optional. This parameter specifies the order in which elements in the array are sorted. Valid values: true and false. The value true indicates the ascending order and false indicates the descending order. If you do not specify this parameter, the elements are sorted in ascending order.
Return value
A value of the ARRAY type is returned. The value null is interpreted as the minimum value.
Examples
Example 1: Sort data in the
t_array
table that contains thec1 ARRAY<STRING>, c2 ARRAY<INT>, and c3 ARRAY<STRING>
columns. Data in the table:+------------+---------+--------------+ | c1 | c2 | c3 | +------------+---------+--------------+ | [a, c, f, b] | [4, 5, 7, 2, 5, 8] | [You, Me, Him] | +------------+---------+--------------+
Sort data in each column of the table. Sample statement:
-- The return value is [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, Me, You]. select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
Example 2: Sort the elements in
array(10, 20, 40, 30, 30, null, 50)
in descending order. Sample statement:-- The return value is [50, 40, 30, 30, 20, 10, null]. select sort_array(array(10, 20, 40, 30, 30, null, 50), false);
SPLIT
Syntax
split(<str>, <pat>)
Description
Returns an array after str is split with pat.
Parameters
str: required. A value of the STRING type. This parameter specifies the string that you want to split.
pat: required. A delimiter of the STRING type. Regular expressions are supported. For more information about regular expressions, see Regular expressions.
Return value
An array is returned. The elements in the array are of the STRING type.
Examples
-- The return value is [a, b, c]. select split("a, b, c", ",");
STRUCT
Syntax
struct struct(<value1>,<value2>[, ...])
Description
Creates a struct based on a given value list.
Parameters
value: required. A value of any data type.
Return value
A value of the STRUCT type is returned. Columns are sequentially named as
col1, col2, ...
.Examples
-- The return value is {col1:a, col2:123, col3:true, col4:56.9}. select struct('a',123,'true',56.90);
TO_JSON
Syntax
string to_json(<expr>)
Description
Converts an expression specified by expr of a given complex data type into a JSON string.
Parameters
expr: required. An expression of the ARRAY, MAP, or STRUCT type.
NoteIf the input expression is of the STRUCT type (
struct<key1:value1, key2:value2>
), take note of the following points:All keys are converted into lowercase letters when you convert the expression into a JSON string.
If a
value
is null, the key-value pair to which thevalue
belongs is not included in the JSON string that is returned. For example, ifvalue2
is null,key2:value2
is not included in the JSON string that is returned.
Return value
A JSON string is returned.
Examples
Example 1: Convert an expression of a given complex data type into a JSON string. Sample statement:
-- The return value is {"a":1,"b":2}. select to_json(named_struct('a', 1, 'b', 2)); -- The return value is {"time":"26/08/2015"}. select to_json(named_struct('time', "26/08/2015")); -- The return value is [{"a":1,"b":2}]. select to_json(array(named_struct('a', 1, 'b', 2))); -- The return value is {"a":{"b":1}}. select to_json(map('a', named_struct('b', 1))); -- The return value is {"a":1}. select to_json(map('a', 1)); -- The return value is [{"a":1}]. select to_json(array((map('a', 1))));
Example 2: The input expression is of the STRUCT type. Sample statement:
-- The return value is {"a":"B"}. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. select to_json(named_struct("A", "B")); -- The return value is {"k2":"v2"}. The key-value pair to which null belongs is not included in the JSON string that is returned. select to_json(named_struct("k1", cast(null as string), "k2", "v2"));
TRANSFORM
Syntax
array<R> transform(array<T> <a>, function<T, R> <func>)
Description
Transforms the elements in Array a by using func and returns a new array.
Parameters
a: required. This parameter specifies an array.
T
inarray<T>
specifies the data type of the elements in the array. The elements can be of any data type.func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the elements in Array a. The value must be of the same data type as the elements in Array a.
R
specifies the data type of the output results.
Return value
A value of the ARRAY type is returned.
Examples
-- The return value is [2, 3, 4]. select transform(array(1, 2, 3), x -> x + 1);
TRANSFORM_KEYS
Syntax
map<K2, V> transform_keys([string <mapDupKeyPolicy>,] map<K1, V> <input>, function<K1, V, K2> <func>)
Description
Transforms the keys in Map input by using func. The values in the map are not changed.
Parameters
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
exception: An error is returned.
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policy
parameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policy
to exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policy
is used.input: required. This parameter specifies a map.
K1
andV
inmap<K1, V>
specify the keys and values of a map.func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input.
K2
specifies the data type of keys in the returned map.
Return value
A value of the MAP type is returned. If one of the new keys is null, an error is returned.
Examples
-- The return value is {-10:-20, 70:50, 71:101}. select transform_keys(map(10, -20, 20, 50, -30, 101), (k, v) -> k + v); -- No error is returned. The returned result depends on the order of the elements in the input map. select transform_keys("last_win", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v); -- An error is returned because duplicate keys exist. select transform_keys("exception", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);
TRANSFORM_VALUES
Syntax
map<K, V2> transform_values(map<K, V1> <input>, function<K, V1, V2> <func>)
Description
Transforms the values in Map input by using func. The keys in the map are not changed.
Parameters
input: required. This parameter specifies a map.
K
andV1
inmap<K, V1>
specify the keys and values of the map.func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input.
V2
specifies the data type of values in the returned map.
Return value
A value of the MAP type is returned.
Examples
-- The return value is {-30:71, 10:-10, 20:NULL}. select transform_values(map(10, -20, 20, null, -30, 101), (k, v) -> k + v);
ZIP_WITH
Syntax
array<R> zip_with(array<T> <a>, array<S> <b>, function<T, S, R> <combiner>)
Description
Merges Array a and Array b at the element level based on element positions and a combiner and returns a new array.
Parameters
a and b: required. This parameter specifies an array.
T
inarray<T>
andS
inarray<S>
specify the data types of the elements in the arrays. The elements can be of any data type.combiner: required. This parameter specifies the built-in function, user-defined function, or expression that is used to merge Array a and Array b at the element level. The combiner consists of two input parameters. One input parameter must be of the same data type as the elements in Array a. The other input parameter must be of the same data type as the elements in Array b.
Return value
A value of the ARRAY type is returned. The return value varies based on the following rules:
The elements in the returned array are at the same positions as the elements in Array a and Array b.
If Array a and Array b have different lengths, the null elements are added to the shorter array before the arrays are merged.
Examples
-- The return value is [2, 4, 6, null]. select zip_with(array(1,2,3), array(1,2,3,4), (x,y) -> x + y);