CAST

Command format:
cast(expr as <type>)

Usage:

Converts the result of expression to object type. For example, cast (‘1’ as bigint) converts string ‘1’ to bigint ‘1’. If the conversion is unsuccessful or not supported, an exception is thrown.
Note
  • cast (double as bigint): Converts double type value to bigint type value.
  • cast (string as bigint): While the string is converted to ‘bigint’ type, if the digits indicated by ‘int’ exist in this string, it is converted to ‘bigint’ type directly. If the digits indicated by ‘float’ or ‘exponent’ exist in this string, firstly they are converted to ‘double’ type and then converted to ‘bigint’ type.
  • For cast (string as datetime) or cast (datetime as string), it adopts the default datetime format yyyy-mm-dd hh: mi: ss.

COALESCE

Command format:
coalesce(expr1, expr2, ...)

Usage:

Returns the first value which is not NULL from the list. If all values in the list are NULL, return NULL.

Parameter description:

expr: Value to be tested. All these values have the same data type or be NULL, otherwise an exception is thrown.

Return value:

Returns value type is the same as parameter type.
Note At least one parameter must exist, otherwise an exception is thrown.

DECODE

Command format:
decode(expression, search, result[, search, result]...[, default])

Usage:

Implements the selection function of if-then-else branch.

Parameter description:
  • expression: An expression to be compared.
  • search: The search string; which is compared with the expression.
  • result: Return value when the values of search and expression match.
  • default: Optional. If all search items do not match the expression, return this default value. If it is not specified, return NULL.
Return value:
  • Returns a matched search.
  • If no matched record exists, return default.
  • If default is not specified, return NULL.
    Note
    1. Specify at least three parameters.
    2. All of the result types must be the same or NULL. Inconsistent data type throws an exception. All of the ‘search’ and ‘expression’ types must be consistent, otherwise an exception is thrown.
    3. If the option ‘search’ in ‘decode’ has duplicate or matched records, return the first value.
Example:
Select
decode(customer_id,
1, 'Taobao',
2, 'Alipay',
3, 'Aliyun',
Null, 'N/A',
'Others') as result
from sale_detail;
The decode function mentioned earlier implements the function in following if-then-else sentence:
if customer_id = 1 then
result := 'Taobao';
elsif customer_id = 2 then
result := 'Alipay';
elsif customer_id = 3 then
result := 'Aliyun';
...
else
result := 'Others';
end if;
Note
  • Calculating NULL= NULL by MaxCompute SQL, return NULL, while the values of NULL and NULL are equal in a decode function.
  • In the preceding example, if the value of customer_id is NULL, decode function returns N/A.

GET_IDCARD_AGE

Command format:
get_idcard_age(idcardno)

Usage

Returns the current age according to the ID information. The number is the difference of the current year and the birth year identified in the ID.

Parameter description:

idcardno: String type, ID number of 15-digit or 8-digit. While calculating, the validity of the ID is checked according to the province code and the last digit, and Null returns if the check fails.

Return value:

Returns the Bigint type. Input is Null, returns Null. Returns Null if the difference of the current year and the birth year is greater than 100.

GET_IDCARD_BIRTHDAY

Command format:
get_idcard_birthday(idcardno)

Usage:

Returns date of birth according to the ID information.

Parameter description:

idcardno: String type, ID number of 15-digit or 18-digit. While calculating, the validity of the ID is checked according to the province code and the last digit, and Null is returned if the check fails.

Return value:

Returns the Datetime type. Input is Null, returns Null.

GET_IDCARD_SEX

Command format:
get_idcard_sex(idcardno)

Usage

Returns the gender according to the ID information. The value is either M (male) or F (female)

Parameter description:

idcardno: String type, ID number of 15-digit or 18-digit. While calculating, the validity of the ID is checked according to the province code and the last digit, and Null is returned if the check fails.

Return value:

Returns the String type. Input is Null, returns Null.

GREATEST

Command format:
greatest(var1, var2, ...)

Usage

Returns the greatest input parameter.

Parameter description:

var1/var2: Its type can be Bigint, Double, Decimal, Datetime, or String type. If all values are NULL, return NULL.

Return value:

  • The greatest value in the input parameter. If the implicit conversion is not needed, return type is the same as input parameter type.
  • NULL is the least value.
    If the input parameter types are different, 
    • for Double, Bigint, Decimal and String type, convert them into Double type. 
    • for String and Datetime, convert them into Datetime type.
    • other implicit conversions are not allowed.

ORDINAL

Command format:
ordinal(bigint nth, var1, var2, ...)

Usage

Returns the location value specified by ‘nth’ after the input variables are sorted by small to large.

Parameter description:
  • nth: Bigint type, specify the location to return its value. If it is NULL, return NULL.
  • var1/var2: Its type can be Bigint, Double, Datetime, or String type.
Return value:
  • The value in nth bit. If the implicit conversion is not needed, return type is the same as input parameter type.
  • If implicit conversion is in input parameters,
    • For Double, Bigint and String type, convert them into Double type.
    • For String and Datetime type, convert them into Datetime type.
    • Other implicit conversions are not allowed.
  • NULL is the least value.
Example:
ordinal(3, 1, 3, 2, 5, 2, 4, 6) = 2

LEAST

Command format:
least(var1, var2, ...)

Usage

Returns the least value in the input parameter.

Parameter description:

var1/var2: Its type can be Bigint, Double, Decimal, Datetime, or String type. If all values are NULL, return NULL.

Return value:
  • The least value in the input parameter. If the implicit conversion is not needed, return type is the same as input parameter type.
  • If implicit conversion is in input parameters:
    • For Double, Bigint and String type, convert them into Double type.
    • For String and Datetime type, convert them into Datetime type.
    • Convert to Decimal type when Decimal type compares to Double, Bigint or String type.
    • Other implicit conversions are not allowed.
  • NULL is the least value.

MAX_PT

Command prompt:
max_pt(table_full_name)

Usage

For a partitioned table, this function returns the maximum value of the level-one partition of the partitioned table, which is sorted alphabetically, and a corresponding data file exists for the partition.

Parameter description:

table_full_name: String type, specifies the name of table. To use this funtion, specify the name of project, for example: prj.src. You must have read permission on this table.

Return value:

Returns the value of the largest level-one partition.

Example:

Example: Suppose that ‘tbl’ is a partitioned table, all partitions of the table with data files are as follows:
pt =‘20120901’
pt =‘20120902’
In the following statement, the return value of max_pt is ‘20120902’, and the MaxCompute SQL statement reads the data in the ‘20120902’ partition.
select * from tbl where pt=max_pt('myproject.tbl');
Note

If you only add a new partition using alter table, but no data file is in this partition, the partition will not return.

UUID

Command format:
string uuid()

Usage

Returns a random ID. For example: 29347a88-1e57-41ae-bb68-a9edbdd94212.

Note

Returns a random global ID with a low probability of duplication.

SAMPLE

Command format:
boolean sample(x, y, column_name)

Usage

Samples all values of column_name according to the setting of x and y and filter out the rows which do not meet the sampling conditions.

Parameter description:
  • x, y: Bigint type, indicates hash to x portions, take yth portions. y can be ignored.
    • If y is ignored, take the first portion. If y in the parameter is ignored, then column_name is ignored at the same time. 
    • x and y are Bigint constants and greater than 0. If it is other data type or less than or equal to 0, an exception is thrown. If y&gt, x exception is thrown. If any input of x and y is NULL, return NULL.
  • column_name: The destination column to be sampled.
    • column_name can be excluded, in this case, a random sample is taken according to the values of x and y.
    • It can be any data type and the column value can be NULL. Implicit type conversion is not needed.
    • If column_name is the constant NULL, an exception is thrown.

Return value:

Returns the Boolean type.

Note

To avoid data skew caused by NULL value, the NULL values in column_name are perform a uniform hash operations in x portions. If ‘column_name’ is not added, the output will not necessarily be uniform, since the data size is small. We recommend to add a ‘column_name’ for better output.

Example:

Suppose that the table 'tbla' already exists and a column 'cola' is in this table:
select * from tbla where sample (4, 1 , cola) = true;
-- The values are carried out Hash into 4 portions and take the first portion.
select * from tbla where sample (4, 2) = true;
-- The values do random Hash into 4 portions for each row of data and take the second portion.

CASE WHEN Expression

The following are the two case types that MaxCompute offers:
case value
when (_condition1) then result1
when (_condition2) then result2
...
else resultn
end
case
when (_condition1) then result1
when (_condition2) then result2
when (_condition3) then result3
...
else resultn
end

‘case when’ expression can return different values according to the computing result of expression values. 

The following sentences are used to get the region according to different shop_name:
select
case
when shop_name is null then 'default_region'
when shop_name like 'hang%' then 'zj_region'
end as region
from sale_detail;
Note
  • If the types of result include bigint and double, convert them to double type and then return the result.
  • If the types of result include string type, convert them into string type and then return the result. If the conversion is unsuccessful, the error is thrown, for example, boolean type.

The conversion between other types is not allowed.

IF Expression

Command format:
if(testCondition, valueTrue, valueFalseOrNull)

Usage

Determine if testCondition is true. If it is true, return valueTrue, otherwise return valueFalse or Null.

Parameter description:
  • testCondition: The expression to be Determined. Boolean type.
  • valueTrue: It returns when the expression testCondition is true.
  • valueFalseOrNull: It returns when the expression testCondition is not true and can also be null.

Return value:

The return type is the same as the valueTrue or valueFalseOrNul type.

Example:
select if(1=2,100,200) from dual; 
Return Value:
+------------+
| _c0 |
+------------+
| 200 |
+------------+

New extended other functions

SPLIT

Command format:
split(str, pat)

Usage

Separates 'str' using 'pat'.

Parameter description:
  • str: String type, specifies the string to be separated.
  • pat: String type, specifies the delimiter and supports regular expressions.

Return value:

array <string >, the result is the elements in 'str' separated by 'pat'.

Example:
select split("a,b,c",",") from dual;
Result:
+------+
| _c0 |
+------+
| [a, b, c] |
+------+
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

EXPLODE

Command format:
explode(var)

Usage

Converts one row of data into a multi-row UDTF.
  • If var is Array type, the array stored in the column is converted to multiple rows.
  • If var is Map type, each key-value pair of the map stored in the column is converted to a row with two columns, one column for the key and the other for the value.

Parameter description:

var: array<T> type or map<K, V> type.

Return value:

Rows after conversion are returned.
Note
The following limits apply when UDTF is used:
  • One select can only have one UDTF and no other columns can appear.
  • It cannot be used with group by, cluster by, distribute by, or sort by.
Example:
explode(array(null, 'a', 'b', 'c')) col
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP

Command format:
MAP map(K key1, V value1, K key2, V value2, ...)

Usage

Uses the given key or value pairs to create a map.

Parameter description:

key/value
  • All key types must be the same basic type. 
  • Any value types can be used, but all value types must be consistent.

Return value:

Returns the map<K:V> type.

Example:
select map('a',123,'b',456) from dual;
Result:
{a:123, b:456}
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP_KEYS

Command format:
ARRAY map_keys(map<K, V> )

Usage

Returns an array of all the keys in the map parameter.

Parameter description:

map<K, V>: Map-type data.

Return Value:

Returns the array<K> type. If the input is null, null is returned.

Example:
select map_keys(map('a',123,'b',456)) from dual;
Result:
[a, b]
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP_VALUES

Command format:
ARRAY map_values(map<K, V>)

Usage

Returns an array of all the values in the map parameter.

Parameter description:

map<K, V>: Map-type data.

Return Value:

Returns the array<V> type. If the input is null, null is returned.

Example:
select map_values(map('a',123,'b',456));
Result:
[123, 456]
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

ARRAY

Command format:
ARRAY array(value1,value2, ...)

Usage

Creates an array using the given values.

Parameter description:

value: This parameter can be of any type, but all the values must be of the same type.

Return value:

Returns the Array type.

Example:
select array(123,456,789) from dual;
Result:
[123, 456, 789]
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

SIZE

Command format:
INT size(map)
INT size(array)
Usage
  • size(map<K,V>) returns the number of K/V pairs in the given map.
  • size(array<T>) returns the number of elements in the given array.
Parameter description:
  • map<K, V>: Map-type data.
  • array<T>: Array-type data.

Return value:

Returns the Int type.

Example:
select size(map('a',123,'b',456)) from dual;--Returns 2
select size(map('a',123,'b',456,'c',789)) from dual;--Returns 3
select size(array('a','b')) from dual;--Returns 2
select size(array(123,456,789)) from dual;--Returns 3
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

ARRAY_CONTAINS 

Command format:
boolean array_contains(ARRAY<T> a,value v)

Usage

Checks if the given array a contains v.

Parameter description:
  • a: Array-type data.
  • v: The given v must be of the same type as the data in the array.

Return value:

Returns the Boolean type.

Example:
select array_contains(array('a','b'), 'a') from dual; --Returns true
select array_contains(array(456,789),123) from dual; -- Returns false
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

SORT_ARRAY

Command format:
ARRAY sort_array(ARRAY<T>)

Usage

Sorts the given array.

Parameter description:

ARRAY<T>: Array-type data, the data in the array can be of any type.

Return value:

Returns the array type.

Example:
select sort_array(array('a','c','f','b')),sort_array(array(4,5,7,2,5,8)),sort_array(array('You','Me','He')) from dual;
Result:
[a, b, c, f] [2, 4, 5, 5, 7, 8] [He, You, Me]
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

POSEXPLODE

Command format:
posexplode(ARRAY<T>)

Usage

Explodes the given array. Each value is given a row and each row has two columns corresponding to the subscript (starting from 0) and the array element.

Parameter description:

ARRAY<T>: Array-type data, the data in the array can be of any type.

Return value:

Functions generated by the table are returned.

Example:
select posexplode(array('a','c','f','b')) from dual;
Result:
+------------+-----+
| pos | val |
+------------+-----+
| 0 | a |
| 1 | c |
| 2 | f |
| 3 | b |
+------------+-----+
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

STRUCT

Function definition:
STRUCT struct(value1,value2, ...)

Usage:

Creates a struct using the given value list.

Parameter description:

value: Each value can be of any type.

Return value:

Returns the STRUCT<col1:T1, col2:T2, ... >Type. Field names are sequential: col1, col2, …

Example:
select struct('a',123,'ture',56.90) from dual;
Result:
{col1:a, col2:123, col3:ture, col4:56.9}
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

NAMED_STRUCT

Function definition:
STRUCT named_struct(string name1, T1 value1, string name2, T2 value2, ...)

Usage

Creates a struct using the given name/value list.

Parameter description:
  • value: A value can be of any type.
  • name: Specifies the name of a String-type field.

Return value:

Returns the STRUCT<name1:T1, name2:T2, ... >type. The field names of the generated struct are sequential: name1, name2, …

Example:
select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50) from dual;
Result:
{user_id:10001, user_name:LiLei, married:F, weight:63.5}
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

INLINE

Command format:
inline(array<struct<f1:T1, f2:T2, ... >>)

Usage

Explodes the given struct array. Each element is given one row and each struct element corresponds to one column in each row.

Parameter description:

STRUCT<f1:T1, f2:T2, ... >: The values in the array can be of any type.

Return value:

Returns the table generation function.

Example:
select inline(array(named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50))) from dual;
Result:
+------------+-----------+---------+------------+
| user_id | user_name | married | weight |
+------------+-----------+---------+------------+
| 10001 | LiLei | F | 63.5 |
+------------+-----------+---------+------------+
Note

Add set odps.sql.type.system.odps2=true;, opposite the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

TRANS_ARRAY

Command format:
trans_array (num_keys, separator, key1,key2,…,col1, col2,col3) as (key1,key2,…,col1, col2)

Usage

A UDTF that converts a single row of data into multiple rows, and converts an array separated by a fixed-separator format in a column into multiple rows.

Parameter description:
  1. num_keys: Bigint type constant, must be greater than or equal to 0. It is used as a number of columns to transpose key when converting into multiple rows.
  2. Key: Duplicate columns in multiple rows when converting one row into multiple rows.
  3. separator: String type constant, a separator is used to split a string into multiple elements. Exception is thrown when it is null.
  4. keys: As column of key when you transpose. It is specified by num_keys. If num_keys specifies that all columns are keys (that is, num_keys equals the number of all columns), only one row is returned.
  5. cols: An array to convert columns into rows. All columns after keys are considered as an array to be transposed. String type. The stored contents are arrays of string format, such as “Hangzhou; Beijing; Shanghai” and these arrays are separated by a semicolon “;”.

Return value:

Transposed rows, new column names are specified by as. The type of key column remains unchanged, and all other columns are String type. The number of rows to be split depends on the array that has a maximum number, no-value locales are complemented with NULL.

Note
The following limits apply when UDTF is used:
  • All columns that are considered as keys are placed in the front, and columns to be transposed are placed behind.
  • One select can only have one UDTF and no other columns can appear.
  • One select can only have one UDTF and no other columns can appear.

Example:

The table contains the following data:
Login_id LOGIN_IP LOGIN_TIME 
wangwangA 192.168.0.1,192.168.0.2 20120101010000,20120102010000
Use the trans_array function:
trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time)
Result:
Login_id Login_ip Login_time
wangwangA 192.168.0.1 20120101010000
wangwangA 192.168.0.2 20120102010000
If the table contains the following data:
Login_id LOGIN_IP LOGIN_TIME 
wangwangA 192.168.0.1,192.168.0.2 20120101010000
NULL is complemented to the no-value locales in the array:
Login_id Login_ip Login_time 
wangwangA 192.168.0.1 20120101010000
wangwangA 192.168.0.2 NULL