MaxCompute SQL provides other functions commonly used in the development process. You can use appropriate functions based on your business requirements. This topic describes the command syntax, parameters, and examples of the functions, such as CAST, DECODE, LEAST, ARRAY, SPLIT, and MAP.
Function | Description |
---|---|
CAST | Converts the result of an expression to a specific data type. |
COALESCE | Returns the first non-NULL value in the list. |
DECODE | Implements the IF-THEN-ELSE conditional branching feature.
|
GET_IDCARD_AGE | Returns an age based on the ID card number. |
GET_IDCARD_BIRTHDAY | Returns the date of birth based on the ID card number. |
GET_IDCARD_SEX | Returns the gender based on the ID card number. |
GREATEST | Returns the maximum value of the input parameters. |
ORDINAL | Sorts the values of the input variables in ascending order and returns the value that is ranked at a specific position. |
PARTITION_EXISTS | Checks whether a specific partition exists in a table. |
LEAST | Returns the minimum value of the input parameters. |
MAX_PT | Returns the name of the largest level-1 partition in a partitioned table. |
UUID | Returns a random ID. |
SAMPLE | Samples all column values that are read and filters out the rows that do not meet sampling conditions. |
CASE WHEN expression | Returns different values based on the computing result of an expression. |
IF expression | Checks whether a specified condition evaluates to true. |
SPLIT | Splits a string with a specified delimiter and returns an array. |
STR_TO_MAP | Splits a string with a specified delimiter and returns a key-value pair. |
EXPLODE | Transposes one row of data into multiple rows. This function is a user-defined table function (UDTF). |
STACK | Splits a specified parameter group into a specified number of rows. |
MAP | Defines a MAP parameter based on a specified key-value pair. |
MAP_KEYS | Returns all keys in the MAP parameter as an array. |
MAP_VALUES | Returns all values in the MAP parameter as an array. |
ARRAY | Creates an array by using given input values. |
SIZE | Returns the number of key-value pairs in a specified MAP parameter or the number of elements in an array. |
ARRAY_CONTAINS | Checks whether a specified array contains a specified value. |
SORT_ARRAY | Sorts given arrays. |
POSEXPLODE | Converts an array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements. |
STRUCT | Creates a struct based on a given value list. |
NAMED_STRUCT | Creates a struct based on given name-value pairs. |
INLINE | Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row. |
TABLE_EXISTS | Checks whether a specified table exists. |
TRANS_ARRAY | Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows. |
TRANS_COLS | Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows. |
GET_USER_ID | Obtains the ID of the current account. |
CAST
- Syntax
cast(<expr> as <type>)
- Description
Converts the result of expr to a specific data type. Usage:
cast(double as bigint)
: converts a value of the DOUBLE type to the BIGINT type.cast(string as bigint)
: converts a value of the STRING type to the BIGINT type. If the string consists of numerals expressed in the INTEGER form, it is converted to the BIGINT type. If the string consists of numerals expressed in the FLOAT or EXPONENTIAL form, it is converted to the DOUBLE type and then to the BIGINT type.- The default date format,
yyyy-mm-dd hh:mi:ss
, is used forcast(string as datetime)
andcast(datetime as string)
.
- Parameters
- expr: required. The expression whose computing result you want to convert.
- type: required. The data type to which you want to convert the data.
- Return value
The return value is of the specified data type.
- Examples
- Example 1: common usage. Sample command:
-- 1 is returned. select cast('1' as bigint);
- Example 2: incorrect usage. If the conversion fails or an unsupported type conversion
occurs, NULL is returned. Sample command:
-- NULL is returned. select cast('abc' as bigint);
- Example 1: common usage. Sample command:
COALESCE
- Syntax
coalesce(<expr1>, <expr2>, ...)
- Description
Returns the first non-NULL value in
<expr1>, <expr2>, ...
. If all values in the list are NULL, NULL is returned. - Parameter
expr: required. A value that you want to check. Except the NULL values, all other values must be of the same data type. Otherwise, an error is returned. If all values are NULL, an error is returned.
- Return value
The data type of the return value is the same as the data type of the parameter.
- Examples
- Example 1: common usage. Sample command:
-- 1 is returned. select coalesce(null,null,1,null,3,5,7);
- Example 2: incorrect usage. The data types of parameter values are not the same, and
an error is returned. Sample command:
-- An error is returned, and the value abc cannot be identified. select coalesce(null,null,1,null,abc,5,7);
- Example 3: incorrect usage. Non-NULL values do not exist and an error is returned.
Sample command:
-- An error is returned because non-NULL values do not exist. select coalesce(null,null,null,null);
- Example 1: common usage. Sample command:
DECODE
- Syntax
decode(<expression>, <search>, <result>[, <search>, <result>]...[, <default>])
- Description
Implements the
IF-THEN-ELSE
conditional branching feature. - Parameters
- expression: required. The expression that you want to compare.
- search: required. The search item that you use to compare with the expression.
- result: required. The value returned when the value of search matches the value of expression.
- default: optional. If no search items match the expression, the value of default is returned. If this parameter is not specified, NULL is returned.
Note- You must specify at least three parameters.
- Except the NULL values, all other values of the result parameter must be of the same data type. Otherwise, an error is returned.
- The values of search and expression must be of the same data type. Otherwise, an error is returned.
- Return value
- If a search item matches the expression, result is returned.
- If no search item matches the expression, default is returned.
- If default is not specified, NULL is returned.
- If duplicate search items match the expression, the first value is returned.
- In most cases, NULL is returned when MaxCompute SQL calculates
NULL=NULL
. However, the DECODE function considers that the two NULL values are the same.
- Example
-- If the value of customer_id is 1, Taobao is returned. If the value is 2, Alipay is returned. If the value is 3, Aliyun is returned. If the value is NULL, N/A is returned. In other cases, Others is returned. select decode(customer_id, '1', 'Taobao', '2', 'Alipay', '3', 'Aliyun', Null, 'N/A', 'Others') as result from sale_detail; -- The preceding statement is equivalent to the following statement: if customer_id = 1 then result := 'Taobao'; elsif customer_id = 2 then result := 'Alipay'; elsif customer_id = 3 then result := 'Aliyun'; ... else result := 'Others'; end if;
GET_IDCARD_AGE
- Syntax
get_idcard_age(<idcardno>)
- Description
Returns the current age based on the ID card number. The current age is the current year minus the birth year in the ID card number.
- Parameter
idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.
- Return value
A value of the BIGINT type is returned. If the input value is NULL, NULL is returned.
GET_IDCARD_BIRTHDAY
- Syntax
get_idcard_birthday(<idcardno>)
- Description
Returns the date of birth based on the ID card number.
- Parameter
idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.
- Return value
A value of the DATETIME type is returned. If the input value is NULL, NULL is returned.
GET_IDCARD_SEX
- Syntax
get_idcard_sex(<idcardno>)
- Description
Returns the gender based on the ID card number. Valid values:
M
andF
. M indicates male and F indicates female. - Parameter
idcardno: required. The 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.
- Return value
A value of the STRING type is returned. If the input value is NULL, NULL is returned.
GREATEST
- Syntax
greatest(<var1>, <var2>, …)
- Description
Returns the maximum value of the input parameters.
- Parameter
var: required. The value can be of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.
- Return value
- The maximum value of the input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
- NULL is interpreted as the minimum value.
- If the input parameters are of different data types, those of the DOUBLE, BIGINT, DECIMAL, and STRING types need to be converted to the DOUBLE type for comparison, and those of the STRING and DATETIME types need to be converted to the DATETIME type for comparison. Implicit conversions of other data types are not allowed.
- If
odps.sql.hive.compatible
is set to true and the value of any input parameter is NULL, NULL is returned.
ORDINAL
- Syntax
ordinal(bigint <nth>, <var1>, <var2>, …)
- Description
Sorts the values of the input variables in ascending order and returns the value that is ranked nth.
- Parameters
- nth: required. The position of the value that you want to return. The value is of the BIGINT type. If it is set to NULL, NULL is returned.
- var: required. The value is of the BIGINT, DOUBLE, DATETIME, or STRING type.
- Return value
- The value that is ranked nth is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
- If a data type conversion is performed among the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. Implicit conversions of other data types are not allowed.
- NULL is interpreted as the minimum value.
- Example
-- 2 is returned. select ordinal(3, 1, 3, 2, 5, 2, 4, 6);
PARTITION_EXISTS
- Syntax
boolean partition_exists(string <table_name>, string... <partitions>)
- Description
Checks whether a specified partition exists in a table.
- Parameters
- table_name: required. The table name, which is of the STRING type. You can specify a project
name in the table name, for example,
my_proj.my_table
. If you do not specify a project name, the current project name is used. - partitions: required. The partition name, which is of the STRING type. In this parameter, you must specify the values of partition key columns in a table based on the sequence of these columns. The number of values must be the same as the number of partition key columns.
- table_name: required. The table name, which is of the STRING type. You can specify a project
name in the table name, for example,
- Return value
A value of the BOOLEAN type is returned. If the specified partition exists, True is returned. Otherwise, False is returned.
- Example
-- Create the foo partitioned table. create table foo (id bigint) partitioned by (ds string, hr string); -- Add partitions to the foo table. alter table foo add partition (ds='20190101', hr='1'); -- Check whether partitions 20190101 and 1 exist. True is returned. select partition_exists('foo', '20190101', '1');
LEAST
- Syntax
least(<var1>, <var2>, …)
- Description
Returns the minimum value of the input parameters.
- Parameter
var: required. The values of the input parameters, which are of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.
- Return value
- The minimum value of input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
- If a data type conversion is performed among the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. If a data type conversion is performed among the DECIMAL, DOUBLE, BIGINT, and STRING types, a value of the DECIMAL type is returned. Implicit conversions of other data types are not allowed.
- NULL is interpreted as the minimum value.
MAX_PT
- Syntax
max_pt(<table_full_name>)
- Description
Returns the name of the largest level-1 partition in a partitioned table and reads the data files of this partition. This function determines the largest partition by sorting partitions in alphabetical order.
- Parameter
table_full_name: required. The table name, which must be specified with the project name, for example, prj.src. You must have read permissions on the table.
- Return value
The name of the largest partition is returned.
NoteIf a partition is added by using the
ALTER TABLE
statement and the partition contains no data file, the name of this partition is not returned. - Example
For example, tbl is a partitioned table, and the partitions in the table are 20120901 and 20120902, both of which contain data files. If you execute the following statement, the return value of
max_pt
is'20120902'
, and the MaxCompute SQL statement reads data from the20120902
partition.select * from tbl where pt=max_pt('myproject.tbl');
UUID
- Syntax
string uuid()
- Description
Returns a random ID, which is in the format of
29347a88-1e57-41ae-bb68-a9edbdd9****
.Note The return value is a random global ID that has a low probability of duplication.
SAMPLE
- Syntax
boolean sample(<x>, <y>, [<column_name1>, <column_name2>,...])
- Description
Samples all values read from column_name based on x and y, and filters out the rows that do not meet sampling conditions.
- Parameters
- x and y: x is required. x and y are integer constants that are greater than 0. Their values
are of the BIGINT type. They indicate that the values fall into x portions based on the hash function and the yth portion is used.
If y is not specified, you do not need to specify column_name. In this case, the first portion is used.
An error is returned if x or y is of another data type, the value of x or y less than or equal to 0, or y is greater than x. If the value of x or y is NULL, NULL is returned.
- column_name: optional. The name of the destination column on which sampling is performed. This
parameter is optional. If it is not specified, random sampling is performed based
on the values of x and y. It can be of any data type, and the column value can be NULL. No implicit conversion
is performed. If column_name is constant NULL, an error is returned.
Note To avoid data skew due to the NULL value, uniform hashing is performed on the NULL values in column_name in x portions. If column_name is not specified and the amount of data is small, the output is not necessarily uniform. In this case, we recommend that you specify column_name to obtain a uniform output.
- x and y: x is required. x and y are integer constants that are greater than 0. Their values
are of the BIGINT type. They indicate that the values fall into x portions based on the hash function and the yth portion is used.
- Return value
A value of the BOOLEAN type is returned.
- Example
Assume that the
tbla
table exists and contains thecola
column.-- The values in the cola column fall into four portions based on the hash function, and the first portion is used. True is returned. select * from tbla where sample (4, 1 , cola); -- The values in each row are randomly hashed to four portions, and the second portion is used. True is returned. select * from tbla where sample (4, 2);
CASE WHEN expression
- Syntax
MaxCompute provides the following
CASE WHEN
syntax:-
case <value> when <value1> then <result1> when <value2> then <result2> ... else <resultn> end
-
case when (<_condition1>) then <result1> when (<_condition2>) then <result2> when (<_condition3>) then <result3> ... else <resultn> end
-
- Description
Returns the value of result based on the calculation result of value or _condition.
- Parameters
- value: required. The value used for comparison.
- _condition: required. The specified condition.
- result: required. The return value.
Note- If the data types of all result values are only BIGINT and DOUBLE, the data types are converted to the DOUBLE type and then values are returned.
- If result values of the STRING type exist, the data types are converted to the STRING type and then values are returned. If an unsupported conversion, such as a conversion from the BOOLEAN type to the STRING type, is involved, an error is returned.
- Conversions between other data types are not allowed.
- Example
select case when shop_name is null then 'default_region' when shop_name like 'hang%' then 'zj_region' end as region from sale_detail;
IF expression
- Syntax
if(<testCondition>, <valueTrue>, <valueFalseOrNull>)
- Description
Checks whether testCondition evalutes to true. If testCondition evalutes to true, the value of valueTrue is returned. Otherwise, the value of valueFalseOrNull is returned.
- Parameters
- testCondition: required. The expression that you want to evaluate. The value is of the BOOLEAN type.
- valueTrue: required. The value returned when testCondition evaluates to true
- valueFalseOrNull: the value returned when testCondition evaluates to false. You can set this parameter to NULL.
- Return value
The data type of the return value is the same as that of valueTrue or valueFalseOrNull.
- Example
-- 200 is returned. select if(1=2,100,200);
SPLIT
- Syntax
split(<str>, <pat>)
- Description
Returns an array after str is split with pat.
- Parameters
- str: required. The string that you want to split, which is of the STRING type.
- pat: required. The STRING-type delimiter, which can be a regular expression.
- Return value
array <string>
is returned. - Example
-- ["a","b","c"] is returned. select split("a,b,c",",");
STR_TO_MAP
- Syntax
str_to_map(<text> [, <delimiter1> [, <delimiter2>]])
- Description
Splits text into key-value pairs by using delimiter1 and then separates keys from values in the key-value pairs by using delimiter2.
- Parameters
- text: required. The string that you want to split, which is of the STRING type.
- delimiter1: required. The delimiter of the STRING type. If this parameter is not specified,
commas (
,
) are used. - delimiter2: required. The delimiter of the STRING type. If this parameter is not specified,
equal signs (
=
) are used.Note If the delimiter is a regular expression or special character, you must add two backslashes (\\) before the delimiter for escaping. The following special characters can be used as a delimiter: . ? + * :
- Return value
A value of the
map<string, string>
type is returned. The return value indicates that the string specified by text is split by using delimiter1 and delimiter2. - Example
-- {"test2":"2","test1":"1"} is returned. select str_to_map('test1&1-test2&2','-','&'); -- {"test2":"2","test1":"1"} is returned. select str_to_map("test1.1,test2.2", ",", "\\.") ;
EXPLODE
- Limits
- A
SELECT
statement can contain only oneEXPLODE
function, and no other columns of a table are allowed. - This function cannot be used with the
GROUP BY
,CLUSTER BY
,DISTRIBUTE BY
, orSORT BY
clause.
- A
- Syntax
explode (<var>)
- Description
Transposes one row of data into multiple rows. This function is a UDTF.
- If the parameter value is of the
array<T>
type, the array stored in the column is transposed into multiple rows. - If the parameter value is of the
map<K, V>
type, each key-value pair in the MAP parameter stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
- If the parameter value is of the
- Parameter
var: the data type, which can be
array<T>
ormap<K, V>
. - Return value
Rows after transposition are returned.
- Example
select explode(array(null, 'a', 'b', 'c')) col; -- Return result: +------------+ | col | +------------+ | NULL | | a | | b | | c | +------------+
STACK
- Syntax
stack(n, expr1, ..., exprk)
- Description
Splits
expr1, ..., exprk
into n rows. Unless otherwise specified, the output result uses the default column namescol0, col1...
. - Parameters
- n: required. The number of rows obtained after splitting.
- expr: required. The parameter that you want to split
expr1,... exprk
must be of the INTEGER type, and the number of parameters must be an integer multiple of n. The parameter must be able to be split into n complete rows. Otherwise, an error is returned.
- Return value
n rows with a specific number of columns are returned. The number of columns is equal to the number of parameters divided by n.
- Example
-- Split the parameter group of 1, 2, 3, 4, 5, 6 into three rows. select stack(3, 1, 2, 3, 4, 5, 6); -- Return result: +------+------+ | col0 | col1 | +------+------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | +------+------+ -- Split 'A',10,date '2015-01-01','B',20,date '2016-01-01' into two rows. select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2); -- Return result: +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+ -- Split the parameter group of a, b, c, and d into two rows. If the source table contains multiple rows, this function is called for each row. select stack(2,a,b,c,d) as (col,value) from values (1,1,2,3,4), (2,5,6,7,8), (3,9,10,11,12), (4,13,14,15,null) as t(key,a,b,c,d); -- Return result: +------+-------+ | col | value | +------+-------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | | 7 | 8 | | 9 | 10 | | 11 | 12 | | 13 | 14 | | 15 | NULL | +------+-------+ -- Use this function with the LATERAL VIEW clause. select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20, date '2016-01-01') tf as col0,col1,col2; -- Return result: +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+
MAP
- Syntax
MAP map(K <key1>, V <value1>, K <key2>, V <value2>, ...)
- Description
Defines a MAP parameter based on a specific key-value pair.
- Parameters
- key: required. All keys must be of the same data types, including the types after an implicit conversion. The data types must be basic types.
- value: required. All values must be of the same data type, including the data types after an implicit conversion. Random data types can be used.
MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Return value
The MAP parameter is returned.
- Example
Column information of the
t_table
table isc1 BIGINT, c2 STRING, c3 STRING, c4 BIGINT, c5 BIGINT
. The table contains the following data:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+ -- Execute the following SQL statement: select map(c2,c4,c3,c5) from t_table; -- Return result: +------+ | _c0 | +------+ | {k11:86, k21:15} | | {k12:97, k22:2} | | {k13:99, k23:1} | +------+
MAP_KEYS
- Syntax
ARRAY map_keys(map<K, V>)
- Description
Returns all keys in the MAP parameter as an array.
- Parameter
The parameter is of the MAP type.
- Return value
A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.
- Example
Column information of the
t_table_map
table isc1 BIGINT,t_map MAP<STRING,BIGINT>
. The table contains the following data:+------------+-------+ | c1 | t_map | +------------+-------+ | 1000 | {k11:86, k21:15} | | 1001 | {k12:97, k22:2} | | 1002 | {k13:99, k23:1} | +------------+-------+ -- Execute the following SQL statement: select c1,map_keys(t_map) from t_table_map; -- Return result: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | [k11, k21] | | 1001 | [k12, k22] | | 1002 | [k13, k23] | +------------+------+
MAP_VALUES
- Syntax
ARRAY map_values(map<K, V>)
- Description
Returns all values in the MAP parameter as an array.
- Parameter
The parameter is of the MAP type.
- Return value
A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.
- Example
select map_values(map('a',123,'b',456)); -- Return result: [123, 456]
ARRAY
- Syntax
ARRAY array(value1,value2, ...)
- Description
Creates an array by using given values.
- Parameter
The parameters can be of any type, but the types of all parameters must be the same.
MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Session level: To use a new data type, you must insert
- Return value
A value of the ARRAY type is returned.
- Example
Column information of the
t_table
table isc1 BIGINT, c2 STRING, c3 STRING, c4 BIGINT, c5 BIGINT
. The table contains the following data:+------------+----+----+------------+------------+ | c1 | c2 | c3 | c4 | c5 | +------------+----+----+------------+------------+ | 1000 | k11 | k21 | 86 | 15 | | 1001 | k12 | k22 | 97 | 2 | | 1002 | k13 | k23 | 99 | 1 | +------------+----+----+------------+------------+ -- Execute the following SQL statement: select array(c2,c4,c3,c5) from t_table; -- Return result: +------+ | _c0 | +------+ | [k11, 86, k21, 15] | | [k12, 97, k22, 2] | | [k13, 99, k23, 1] | +------+
SIZE
- Syntax
INT size(map) INT size(array)
- Description
Uses
size(map)
to return the number of key-value pairs in a specifiedMAP
parameter. Usessize(array)
to return the number of elements in a specifiedarray
. - Parameters
map
: the data of the MAP type.array
: the data of the ARRAY type.
- Example
-- 2 is returned. select size(map('a',123,'b',456)); -- 3 is returned. select size(map('a',123,'b',456,'c',789)); -- 2 is returned. select size(array('a','b')); -- 3 is returned. select size(array(123,456,789));
ARRAY_CONTAINS
- Syntax
boolean array_contains(ARRAY<T> a,value v)
- Description
Checks whether array
a
containsv
. - Parameters
a
: the data of the ARRAY type.v
: The value of v must be of the same data type as the data in the array.
- Example
Column information of the
t_table_array
table isc1 BIGINT, t_array ARRAY<STRING>
. The table contains the following data:+------------+---------+ | c1 | t_array | +------------+---------+ | 1000 | [k11, 86, k21, 15] | | 1001 | [k12, 97, k22, 2] | | 1002 | [k13, 99, k23, 1] | +------------+---------+ -- Execute the following SQL statement: select c1, array_contains(t_array,'1') from t_table_array; -- Return result: +------------+------+ | c1 | _c1 | +------------+------+ | 1000 | false | | 1001 | false | | 1002 | true | +------------+------+
SORT_ARRAY
- Syntax
ARRAY sort_array(ARRAY<T>)
- Description
Sorts given arrays.
- Parameter
ARRAY<T>
: the data of the ARRAY type. Data in the array can be of any data type.MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Session level: To use a new data type, you must insert
- Return value
A value of the ARRAY type is returned.
- Example
Column information of the
t_array
table isc1 ARRAY<STRING>,c2 ARRAY<INT> ,c3 ARRAY<STRING>
. The table contains the following data:+------------+---------+--------------+ | c1 | c2 | c3 | +------------+---------+--------------+ | [a, c, f, b] | [4, 5, 7, 2, 5, 8] | [You, Me, Him] | +------------+---------+--------------+ -- Execute the following SQL statement: select sort_array(c1),sort_array(c2),sort_array(c3) from t_array; -- Return result: [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, You, Me]
POSEXPLODE
- Syntax
posexplode(ARRAY<T>)
- Description
Converts a given array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements.
- Parameter
ARRAY<T>
: the data of the ARRAY type. Data in the array can be of any data type.MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Session level: To use a new data type, you must insert
- Return value
The generated table is returned.
- Example
select posexplode(array('a','c','f','b')); -- Return result: +------------+------------+ | pos | val | +------------+------------+ | 0 | a | | 1 | c | | 2 | f | | 3 | b | +------------+------------+
STRUCT
- Syntax
STRUCT struct(value1,value2, ...)
- Description
Creates a struct based on a given
value
list. - Parameter
value
: It can be of any data type.MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Session level: To use a new data type, you must insert
- Return value
A value of the STRUCT type is returned. Column names are sequentially named as
col1, col2, ...
. - Example
select struct('a',123,'ture',56.90); -- Return result: +------------+ | _c0 | +------------+ | {"col1":"a","col2":123,"col3":"ture","col4":56.9} | +------------+ 1 records (at most 10000 supported) fetched by instance tunnel.
NAMED_STRUCT
- Syntax
STRUCT named_struct(string name1, T1 value1, string name2, T2 value2, ...)
- Description
Creates a struct based on given
name-value
pairs. - Parameters
value
: It can be of any data type.name
: the column name of the STRING type. This parameter is a constant.
MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Return value
A value of the STRUCT type is returned. Column names are sequentially named as
name1, name2, ...
. - Example
select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50); -- Return result: +------------+ | _c0 | +------------+ | {"user_id":10001,"user_name":"LiLei","married":"F","weight":63.5} | +------------+ 1 records (at most 10000 supported) fetched by instance tunnel.
INLINE
- Syntax
inline(array<struct<f1:T1, f2:T2, ... >>)
- Description
Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
- Parameter
STRUCT
: The values in the array can be of any data type.MaxCompute V2.0 provides additional date functions. If the functions you are using involve new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, run the following SET statement to enable these data types:- Session level: To use a new data type, you must insert
SET odps.sql.type.system.odps2=true;
before the SQL statement, and commit and execute them together. - Project level: The project owner can set the project as needed. It takes 10 to 15
minutes for the settings to take effect. Run the following command:
For more information aboutsetproject odps.sql.type.system.odps2=true;
setproject
, see Project operations. For the precautions you must take when you enable data types at the project level, see Data types.
- Session level: To use a new data type, you must insert
- Return value
The function generated by the table is returned.
- Example
Column information of the
t_table
table ist_struct struct<user_id:bigint,user_name:string,married:string,weight:double>
. The table contains the following data:+----------+ | t_struct | +----------+ | {user_id:10001, user_name:LiLei, married:N, weight:63.5} | | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} | +----------+ -- Execute the following SQL statement: select inline(array(t_struct)) from t_table; -- Return result: +------------+-----------+---------+------------+ | user_id | user_name | married | weight | +------------+-----------+---------+------------+ | 10001 | LiLei | N | 63.5 | | 10002 | HanMeiMei | Y | 43.5 | +------------+-----------+---------+------------+
TABLE_EXISTS
- Syntax
boolean table_exists(string table_name)
- Description
Checks whether a specified table exists.
- Parameter
table_name: the name of the table that you want to query. It is of the STRING type. You can specify a project name in the table name, for example,
my_proj.my_table
. If you do not specify a project name, the current project name is used. - Return value
A value of the BOOLEAN type is returned. If the specified table exists, True is returned. Otherwise, False is returned.
- Example
-- Used for the list in the SELECT statement. select from(table_exists('abd'), col1, col2) FROM src;
TRANS_ARRAY
- Limits
- All columns that are used as
key
must be placed before the columns are to be transposed. - Only one UDTF is allowed in a
SELECT
statement. - This function cannot be used with the
GROUP BY
,CLUSTER BY
,DISTRIBUTE BY
, orSORT BY
clause.
- All columns that are used as
- Syntax
trans_array (num_keys, separator, key1,key2,…,col1, col2,col3) as (key1,key2,…,col1, col2)
- Description
Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.
- Parameters
num_keys
: the number of columns that can be used as key when you transpose one row into multiple rows. The value is a constant of the BIGINT type, which must be greater than or equal to 0.key
: duplicate columns in multiple rows after transposition.separator
: the delimiter used to split a string into multiple elements. It is a constant of the STRING type. If it is left blank, an error is returned.keys
: the columns that are used askey
during the transposition. The number of keys is specified bynum_keys
. Ifnum_keys
equals the total number of all columns (which means that all columns are used askey
), only one row is returned.cols
: the array that you want to be converted to rows. All columns that followkeys
are considered arrays to be transposed. The parameter value must be of the STRING type to store arrays in the STRING format, such asHangzhou;Beijing;shanghai
. The values in this array are separated by semicolons (;
).
- Return value
Transposed rows are returned. The new column name is specified by
as
. The data types of columns that are used askey
remain unchanged. All other columns are of the STRING type. The number of separated rows is based on the array with the maximum number of elements. If the number of rows is insufficient, NULL is added. - Example
The
t_table
table contains the following data:+----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangA | 192.168.0.1,192.168.0.2 | 20120101010000,20120102010000 | | wangwangB | 192.168.45.10,192.168.67.22,192,168.6.3 | 20120111010000,20120112010000,20120223080000 | +----------+----------+------------+ -- Execute the following SQL statement: select trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time) from t_table; Return result: +----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangB | 192.168.45.10 | 20120111010000 | | wangwangB | 192.168.67.22 | 20120112010000 | | wangwangB | 192.168.6.3 | 20120223080000 | | wangwangA | 192.168.0.1 | 20120101010000 | | wangwangA | 192.168.0.2 | 20120102010000 | +----------+----------+------------+ -- The table contains the following data: Login_id LOGIN_IP LOGIN_TIME wangwangA 192.168.0.1,192.168.0.2 20120101010000 -- NULL is added to supplement the array which has insufficient data. Login_id Login_ip Login_time wangwangA 192.168.0.1 20120101010000 wangwangA 192.168.0.2 NULL
TRANS_COLS
- Limits
- All columns that are used as
key
must be placed before the columns are to be transposed. - Only one UDTF is allowed in a
SELECT
statement.
- All columns that are used as
- Syntax
trans_cols (num_keys, key1,key2,…,col1, col2,col3) as (idx, key1,key2,…,col1, col2)
- Description
Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.
- Parameters
- num_keys: The value is a constant of the BIGINT type, which must be greater than or equal to 0. This parameter specifies the number of columns that can be used as key when you transpose one row into multiple rows.
- key: duplicate columns in multiple rows when you transpose one row to multiple rows.
- keys: the columns that are used as key during the transposition. The number of keys is specified by num_keys. If num_keys equals the total number of all columns (which means that all columns are used as key), only one row is returned.
- col: the column you want to convert into a row.
- Return value
Transposed rows are returned. The new column name is specified by
as
. The first output column is the subscripts of the transposition, which start with 1. The data types of the columns that are used as key remain unchanged, whereas the data types of other columns remain unchanged. - Example
The
t_table
table contains the following data:+----------+----------+------------+ | Login_id | Login_ip1 | Login_ip2 | +----------+----------+------------+ | wangwangA | 192.168.0.1 | 192.168.0.2 | +----------+----------+------------+ -- Execute the following SQL statement: select trans_cols(1, login_id, login_ip1, login_ip2) as (idx, login_id, login_ip) from t_table; -- Return result: idx login_id login_ip 1 wangwangA 192.168.0.1 2 wangwangA 192.168.0.2
GET_USER_ID
- Syntax
get_user_id()
- Description
Obtains the ID of the current account, which is the user ID (UID).
- Parameter
No input parameters are required, and an error is returned.
- Return value
The ID of the current account is returned.
- Example
select get_user_id(); -- Return result: +------------+ | _c0 | +------------+ | 1117xxxxxxxx8519 | +------------+