edit-icon download-icon

Other Functions

Last Updated: May 09, 2018

CAST

Function definition:

  1. cast(expr as <type>)

Use: convert the result of expression to object type. For example, cast (‘1’ as bigint) is to convert string ‘1’ to bingint ‘1’. If the conversion is unsuccessful or the conversion is not supported, an exception occurs.

Notes:

  • cast (double as bigint): convert double type value to bigint type value.
  • cast (string as bigint): while the string is converted to ‘bigint’ type, if the digits indicated by ‘int’ exist in this string, it is converted to ‘bigint’ type directly. If the digits indicated by ‘float’ or ‘exponent’ exist in this string, it is converted to ‘double’ type at first and then converted to ‘bigint’ type.
  • For cast (string as datetime) or cast (datetime as string), it adopts default datatime format yyyy-mm-dd hh: mi: ss.

COALESCE

Function definition:

  1. coalesce(expr1, expr2, ...)

Use: return the first value which is not NULL from the list. If all values in the list are NULL, return NULL.

Parameter description:

  • Expri: value to be tested. All these values have the same data type or be NULL, otherwise an expction occurs.

Return value: return value type is the same as parameter type.

Note:

There must be one parameter at least, otherwise an exception occurs.

DECODE

Function definition:

  1. decode(expression, search, result[, search, result]...[, default])

Use: Implement the selection function of If-Then-Else branch.

Parameter description:

  • expression: expression to be compared.
  • search: the search item; needs to be compared with ‘expression’.
  • result: return value when ‘search’ matches ‘expression’.
  • default: it is optional. If all search items do not match the expression, return this default value. If it is not specified, return NULL.

Return value: return matched ‘search’. If no matching record exists, return ‘default’. If ‘default’ is not specified, return NULL.

Notes:

  • You must specify at least three parameters.
  • All of the result types must be the same or NULL. Inconsistent data type causes an exception. All of the ‘search’ and ‘expression’ types must be consistent, otherwise an exception is reported.
  • If the option ‘search’ in ‘decode’ has repeated record and has been matched, return the first value.

Examples:

  1. select
  2. decode(customer_id,
  3. 1, 'Taobao',
  4. 2, 'Alipay',
  5. 3, 'Alibaba Cloud',
  6. NULL, 'N/A',
  7. 'Others') as result
  8. from sale_detail;

The ‘decode’ function mentioned previously implements the function in following ‘if-then-else’ sentence:

  1. if customer_id = 1 then
  2. result := 'Taobao';
  3. elsif customer_id = 2 then
  4. result := 'Alipay';
  5. elsif customer_id = 3 then
  6. result := 'Alibaba Cloud';
  7. ...
  8. else
  9. result := 'Others';
  10. end if;

Notes:

  • Calculating ‘NULL= NULL’ by MaxCompute SQL, return NULL, while the values of NULL and NULL are equal in decode function.
  • In the preceding example, if the value of ‘customer_id’ is NULL, ‘decode’ function returns ‘N/A’ as a result.

GET_IDCARD_AGE

Function definition:

  1. get_idcard_age(idcardno)

Use: Returns the current age based on the ID number which is the difference of the current year and the year of birth identified in the ID.

Parameter description:

  • idcardno: String type, ID number of 15-digit or 18-digit. In the calculation, the validity of the ID is checked according to the province code and the last digit, and Null is returned if the check fails.

Return value:

Returns bigint type. Input is Null, returns Null. Returns Null if the difference of the current year and the year of birth is larger than 100.

GET_IDCARD_BIRTHDAY

Function definition:

  1. get_idcard_birthday(idcardno)

Use: Returns date of birth based on the ID number.

Parameter description:

  • idcardno: String type, ID number of 15-digit or 18-digit. In the calculation, the validity of the ID is checked according to the province code and the last digit, and Null is returned if the check fails.

Return value:

Returns datetime type. Input is Null, returns Null.

GET_IDCARD_SEX

Function definition:

  1. get_idcard_sex(idcardno)

Use: Returns the gender based on the ID number and the value is either M (male) or F (female).

Parameter description:

  • idcardno: String type, ID number of 15-digit or 18-digit. In the calculation, the validity of the ID is checked according to the province code and the last digit, and Null is returned if the check fails.

Return value:

Returns string type. Input is Null, returns Null.

GREATEST

Function definition:

  1. greatest(var1, var2, …)

Use: Return the greatest input parameter.

Parameter description:

  • var1, var2 can be ‘bigint’ type or ‘double’ type or ‘datetime’ type or ‘string’ type. If all values are NULL, return NULL.

Return value:

  • The greatest value in input parameter. If the implicit conversion is not needed, return type is the same as input parameter type.
  • NULL is the least value.
  • If the input parameter types are different, for ‘double’ type, ‘bigint’ type and ‘string’ type, convert them to be ‘double’ type. For ‘string’ type and ‘datetime’ type, convert them to be ‘datetime’ type. Other implicit conversion is not allowed.

ORDINAL

Function definition:

  1. ordinal(bigint nth, var1, var2, …)

Use: Return the location value specified by ‘nth’ after the input variables are sorted by small to large.

Parameter description:

  • nth: Bigint type, specify the location to return its value. If it is NULL, return NULL.
  • var1, var2: its type can be ‘bigint’ or ‘double’ or ‘datetime’ or ‘string’.

Return Value:

  • The value in nth bit. If the implicit conversion is not needed, return type is the same as input parameter type.
  • If implicit conversion is in input parameters, for ‘double’ type, ‘bigint’ type and ‘string’ type, convert them to be ‘double’ type. For ‘string’ type and ‘datetime’ type, convert them to be ‘datetime’ type. Other implicit conversion is not allowed.
  • NULL is the least value.

Example:

  1. ordinal(3, 1, 3, 2, 5, 2, 4, 6) = 2

LEAST

Function definition:

  1. least(var1, var2, …)

Use: return the least value in input parameter.

Parameter description:

  • ‘var1’/‘var2’: Its type can be ‘bigint’ or ‘double’ or ‘datetime’ or ‘string’. If all values are NULL, return NULL.

Return value:

  • The least value in input parameter; If the implicit conversion is not needed, return type is the same as input parameter type.

  • If implicit conversion is in input parameters,

    • For ‘double’ type, ‘bigint’ type and ‘string’ type, convert them to be ‘double’ type.
    • For ‘string’ type and ‘datetime’ type, convert them to be ‘datetime’ type.
    • Converts to ‘decimal’ type when ‘decimal’ type compares to ‘double’ type, ‘bigint’ type or ‘string’ type.
    • Other implicit conversion is not allowed.
  • NULL is the least value.

MAX_PT

Function definition:

  1. max_pt(table_full_name)

Use: For a partitioned table, this function returns the maximum value of the level-one partition of the partitioned table, which is sorted alphabetically, and there is a corresponding data file for the partition.

Parameter description:

  • table_full_name: String type, specifys the name of table, which must be with the name of project, for example: prj.src). You must own read permission on this table.

Return value: Returns the value of the largest level-one partition.

Example: Suppose that ‘tbl’ is a partitioned table, all partitions of the table are as follows, and there are data files:

  1. pt =‘20120901
  2. pt =‘20120902

In the following statement, the return value of max_pt is ‘20120902’, and the MaxCompute SQL statement reads the data in the ‘20120902’ partition.

  1. select * from tbl where pt=max_pt('myproject.tbl');

Notes:

If you only added a new partition using alter table, but no data file is in this partition, this partition is not returned.

UUID

Function definition:

  1. string uuid()

Use: return a random ID. Example: “29347a88-1e57-41ae-bb68-a9edbdd94212”.

SAMPLE

Function definition:

  1. boolean sample(x, y, column_name)

Use: sample all values of ‘column_name’ according to the setting of x and y and filter out the rows which do not meet the sampling condition.

Parameter description:

  • ‘x’, ‘y’: Bigint type, indicates ‘hash’ to x portions, take yth portions. ‘y’ can be ignored.
    • If ‘y’ is ignored, take the first portion. If ‘y’ in parameter is ignored, then ‘column_name’ is ignored at the same time. ‘x’ and ‘y’ are bigint constants and greater than 0.
    • If it is other data type or less than or equal to 0, an exception is thrown. If y>x, exception is also thrown. If any input of ‘x’ and ‘y’ is NULL, return NULL.
  • column_name: object column of sampling.
    • ‘column_name’ can be ignored. If it is ignored, do random sampling according to x and y.
    • It can be any data type and the column value can be NULL. Do not need implicit type conversion.
    • If ‘column_name’ is the constant NULL, an exception is reported.

Return value: Boolean type.

Notes:

  • To avoid data skew brought by NULL value, so NULL values in column_name are carried out a uniform hash in x portions. If ‘column_name’ is not added, the output is not necessarily uniform since the data size is smaller. So ‘column_name’ is suggested to be added to get better output.

Example: Suppose that the table ‘tbla’ is existent and a column ‘cola’ is in this table:

  1. select * from tbla where sample (4, 1 , cola) = true;
  2. -- The values are carried out Hash into 4 portions and take the first portion.
  3. select * from tbla where sample (4, 2) = true;
  4. -- The values do random Hash into 4 portions for each row of data and take the second portion.

CASE WHEN Expression

ODPS provides two kinds of case when syntax formats, as follows:

  1. case value
  2. when (_condition1) then result1
  3. when (_condition2) then result2
  4. ...
  5. else resultn
  6. end
  7. case
  8. when (_condition1) then result1
  9. when (_condition2) then result2
  10. when (_condition3) then result3
  11. ...
  12. else resultn
  13. end

‘case when’ expression can return different values according to the computing result of expression values flexibly. The following sentences is used to get the region according to different shop_name:

  1. select
  2. case
  3. when shop_name is null then 'default_region'
  4. when shop_name like 'hang%' then 'zj_region'
  5. end as region
  6. from sale_detail;

Notes:

  • If the types of result include bigint and double, convert them to be double type and then return the result.
  • If the types of result include string type, convert them to be string type and then return the result. If the conversion is unsuccessfully, the error is reported. (such as boolean type).
  • Expect these, the conversion between other types is not allowed.

IF

Function declaration:

  1. if(testCondition, valueTrue, valueFalseOrNull)

Purpose: Judge if testCondition is true. If it is true, return valueTrue, otherwise return valueFalse or Null.

Parameter descriptions:

  • testCondition: The expression to be judged. Boolean type.

  • valueTrue: It returns when the expression testCondition is true.

  • valueFalseOrNull: It returns when the expression testCondition is not true and also can be null.

Return value: The return type is the same as the valueTrue or valueFalseOrNul type.

Example:

  1. select if(1=2,100,200) from dual;
  2. --Returned results:
  3. +------------+
  4. | _c0 |
  5. +------------+
  6. | 200 |
  7. +------------+

New extended other functions

SPLIT

Function declaration:

  1. split(str, pat)

Purpose: Separates str using pat.

Parameter descriptions:

  • str: String type, specifies the string to be separated.

  • pat: String type, specifies the delimiter, supports regular expressions.

Return value: Array , the result is the elements in str separated by pat.

Example:

  1. select split("a,b,c",",") from dual;

Returned results:

  1. +------+
  2. | _c0 |
  3. +------+
  4. | [a, b, c] |
  5. +------+

EXPLODE

Function declaration:

  1. explode (var)

Purpose: Converts one row of data into a multi-row UDTF. If var is Array type, the array stored in the column is converted to multiple rows. If var is Map type, each key-value pair of the map stored in the column is converted to a row with two columns, one column for the key and one for the value.

Parameter descriptions:

  • var: Array < T > type or Map < K,V > type.

Return value: Transposed rows.

Notes:

The following restrictions apply when using UDTF:

  • One select can only have one UDTF and no other columns can appear.
  • It cannot be used with group by, cluster by, distribute by, or sort by.

Example:

  1. explode(array(null, 'a', 'b', 'c')) col

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP

Function declaration:

  1. MAP map(K key1, V value1, K key2, V value2, ...)

Purpose: Uses the given key/value pairs to create a map.

Parameter descriptions:

  • key/value: All key types must be the same basic type. All value types must be consistent, but any type may be used.

Return value: Map type.

Example:

  1. select map('a',123,'b',456) from dual;

Results:

  1. {a:123, b:456}

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP_KEYS

Function declaration:

  1. ARRAY map_keys(map<K, V> )

Purpose: Returns an array of all the keys in the map parameter.

Parameter descriptions:

  • map: Map-type data.

Return value: Array type. If the input is null, null is returned.

Example:

  1. select map_keys(map('a',123,'b',456)) from dual;

Results:

  1. [a, b]

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

MAP_VALUES

Function declaration:

  1. ARRAY map_values(map<K, V>)

Purpose: Returns an array of all the values in the map parameter.

Parameter descriptions:

  • map: Map-type data.

Return value: Array type. If the input is null, null is returned.

Example:

  1. select map_keys(map('a',123,'b',456)) from dual;

Results:

  1. [123, 456]

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

ARRAY

Function declaration:

  1. ARRAY array(value1,value2, ...)

Purpose: Creates an array using the given values.

Parameter descriptions:

  • value: This parameter can be of any type, but all the values must be of the same type.

Return value: Array type.

Example:

  1. select array(123,456,789) from dual;

Results:

  1. [123, 456, 789]

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

SIZE

Function declaration:

  1. INT size(map<K, V>)
  2. INT size(array<T>)

Purpose: Size(map) returns the number of K/V pairs in the given map. Size(array) returns the number of elements in the given array.

Parameter descriptions:

  • map: Map-type data.
  • array: Array-type data.

Return value: Int type.

Example:

  1. select map_values(map('a',123,'b',456)) from dual;--Returns 2
  2. select size(map('a',123,'b',456,'c',789)) from dual;--Returns 3
  3. select size(array('a','b')) from dual;--Returns 2
  4. select size(array(123,456,789)) from dual;--Returns 3

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

ARRAY_CONTAINS

Function declaration:

  1. boolean array_contains(ARRAY<T> a, value v)

Purpose: Checks if the given array ‘a’ contains ‘v’.

Parameter descriptions:

  • a: Array-type data.
  • v: The given v must be of the same type as the data in the array.

Return value: Boolean type.

Example:

  1. select array_contains(array('a','b'), 'a') from dual; --Returns true
  2. select array_contains(array(456,789),123) from dual; -- Returns false

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

SORT_ARRAY

Function declaration:

  1. ARRAY sort_array(ARRAY<T>)

Purpose: Sorts the given array.

Parameter descriptions:

  • ARRAY: Array-type data, the data in the array can be of any type.

Return value: Array type.

Example:

  1. select sort_array(array('a','c','f','b')),sort_array(array(4,5,7,2,5,8)),sort_array(array('You','Me','He')) from dual;

Returned results:

  1. [a, b, c, f] [2, 4, 5, 5, 7, 8] [He, You, Me]

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

POSEXPLODE

Function declaration:

  1. posexplode(ARRAY<T>)

Purpose: Explodes the given array. Each value is given a row and each row has two columns corresponding to the subscript (starting from 0) and the array element.

Parameter descriptions:

  • ARRAY: Array-type data, the data in the array can be of any type.

Return value: Table generation function.

Example:

  1. select select posexplode(array('a','c','f','b')) from dual;

Returned results:

  1. +------------+-----+
  2. | pos | val |
  3. +------------+-----+
  4. | 0 | a |
  5. | 1 | c |
  6. | 2 | f |
  7. | 3 | b |
  8. +------------+-----+

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

STRUCT

Function declaration:

  1. STRUCT struct(value1,value2, ...)

Purpose: Creates a struct using the given value list.

Parameter descriptions:

  • value: Each value can be of any type.

Return value: STRUCT type, field names are sequential: col1, col2, …

Example:

  1. select struct('a',123,'ture',56.90) from dual;

Returned results:

  1. {col1:a, col2:123, col3:ture, col4:56.9}

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

NAMED_STRUCT

Function declaration:

  1. STRUCT named_struct(string name1, T1 value1, string name2, T2 value2, ...)

Purpose: Creates a struct using the given name/value list.

Parameter descriptions:

  • value: Each value can be of any type.
  • name: Specifies the name of a String-type field.

Returned value: STRUCT type. The field names of the generated struct are sequential: name1, name2, …

Example:

  1. select named_struct('user_id',1,0001,'user_name','LiLei','married','F','weight',63.50) from dual;

Returned results:

  1. {user_id:10001, user_name:LiLei, married:F, weight:63.5}

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

INLINE

Function declaration:

  1. inline(ARRAY<STRUCT<f1:T1, f2:T2, ...>>)

Purpose: Explodes the given struct array. Each element is given one row and each struct element corresponds to one column in each row.

Parameter descriptions:

  • STRUCT: The values in the array can be of any type.

Return value: Table generation function.

Example:

  1. select select inline(array(named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50))) from dual;

Returned results:

  1. +------------+-----------+---------+------------+
  2. | user_id | user_name | married | weight |
  3. +------------+-----------+---------+------------+
  4. | 10001 | LiLei | F | 63.5 |
  5. +------------+-----------+---------+------------+

Note:

Please add set odps.sql.type.system.odps2=true; in front of the SQL statement that uses this function, and submit it with SQL to use the new data type normally.

TRANS_ARRAY

Format definition:

  1. trans_array (num_keys, separator, key1,key2,…,col1, col2,col3) as (key1,key2,…,col1, col2)

Purpose: A UDTF that converts one row of data to multiple rows, and converts an array separated with fixed-separator format in column into multiple rows.

Parameter descriptions:

  • num_keys: Bigint type constant, must be larger than or equal to 0. It is used as the number of columns to transpose key when converting to multiple rows.
  • Key: Duplicate columns in multiple rows when converting one row to multiple rows.
  • separator: String type constant. It is a separator used to split a string into multiple elements. Exception is thrown when it is null.
  • keys: As column of key when you transpose. It is specified by num_keys. If num_keys specifies that all columns are keys (that is, num_keys equals the number of all columns), only one row is returned.
  • cols: An array to convert to rows. All columns after keys are considered as an array to be transposed. String type. The stored contents are arrays of string format, such as “Hangzhou; Beijing; shanghai”, they are arrays separated by “;”.

Return value: Transposed rows, new column names are specified by as. The type of column that is as key remains unchanged, and all other columns are string type. The number of rows to be split depends on the array that has maximum number, no-value locales are complemented with NULL.

Notes:

The following restrictions apply when using UDTF:

  • All columns that are considered as keys must be placed front, and columns to be transposed must be placed behind.
  • One select can only have one UDTF and no other columns can appear.
  • It cannot be used with group by, cluster by, distribute by, or sort by.

Example:

The table contains the following data:

  1. Login_id LOGIN_IP LOGIN_TIME
  2. wangwangA 192.168.0.1,192.168.0.2 20120101010000,20120102010000

Use the trans_array function:

  1. trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time)

Results:

  1. Login_id Login_ip Login_time
  2. wangwangA 192.168.0.1 20120101010000
  3. wangwangA 192.168.0.2 20120102010000

If the table contains the following data:

  1. Login_id LOGIN_IP LOGIN_TIME
  2. wangwangA 192.168.0.1,192.168.0.2 20120101010000

NULL is complemented to the no-value locales in the array:

  1. Login_id Login_ip Login_time
  2. wangwangA 192.168.0.1 20120101010000
  3. wangwangA 192.168.0.2 NULL
Thank you! We've received your feedback.