MaxCompute SQL provides other functions that are commonly used in the development process. You can use appropriate functions based on your business requirements. This topic describes the command syntax, parameters, and examples of the functions, such as CAST, DECODE, LEAST, ARRAY, SPLIT, and MAP.

Function Description
ARRAY Constructs an array based on given values.
ARRAY_CONTAINS Checks whether a specific value is contained in a specific array.
ARRAY_INTERSECT Calculates the intersection of two arrays.
CASE WHEN expression Returns different values based on the computing result of an expression.
CAST Converts the result of an expression into a specific data type.
COALESCE Returns the first non-NULL value in the parameter list.
DECODE Implements the IF-THEN-ELSE conditional branching feature.
EXPLODE Transposes one row of data into multiple rows. This function is a UDTF.
GET_IDCARD_AGE Returns an age based on the ID card number.
GET_IDCARD_BIRTHDAY Returns the date of birth based on the ID card number.
GET_IDCARD_SEX Returns the gender based on the ID card number.
GET_USER_ID Obtains the ID of the current account.
GREATEST Returns the maximum value of the input parameters.
IF expression Checks whether a specified condition evaluates to true.
INLINE Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.
LEAST Returns the minimum value of the input parameters.
MAP Defines a map based on specific key-value pairs.
MAP_KEYS Returns all keys in a map as an array.
MAP_VALUES Returns all values in a map as an array.
MAX_PT Returns the name of the largest level-1 partition in a partitioned table.
NAMED_STRUCT Creates a struct based on given name-value pairs.
ORDINAL Sorts the values of the input variables in ascending order and returns the value that is ranked at a specific position.
PARTITION_EXISTS Checks whether a specific partition exists in a table.
POSEXPLODE Converts an array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements.
SAMPLE Samples all column values that are read and filters out the rows that do not meet sampling conditions.
SIZE Returns the number of key-value pairs in a specific map or the number of elements in a specific array.
SORT_ARRAY Sorts elements in a specific array.
SPLIT Splits a string with a specified delimiter and returns an array.
STACK Splits a specified parameter group into a specified number of rows.
STR_TO_MAP Splits a string with a specified delimiter and returns a key-value pair.
STRUCT Creates a struct based on a given value list.
TABLE_EXISTS Checks whether a specified table exists.
TRANS_ARRAY Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.
TRANS_COLS Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.
UUID Returns a random ID.

ARRAY

  • Syntax
    array array(<value>,<value>, ...)
  • Description

    Constructs an array based on given values.

  • Parameters

    value: required. All data types are supported. All parameters must be of the same data type.

    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    A value of the ARRAY type is returned.

  • Example
    The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) fields. Data in the table:
    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+
    Sample command:
    -- Construct an array based on the data in the c2, c4, c3, and c5 columns. 
    select array(c2,c4,c3,c5) from t_table;
    
    -- The following result is returned: 
    +------+
    | _c0  |
    +------+
    | [k11, 86, k21, 15] |
    | [k12, 97, k22, 2] |
    | [k13, 99, k23, 1] |
    +------+

ARRAY_CONTAINS

  • Syntax
    boolean array_contains(array <a>, value <v>)
  • Description

    Checks whether the a array contains the v value.

  • Parameters
    • a: required. Only the ARRAY data type is supported. This parameter specifies the name of the array that you want to check.
    • v: required. The value of v must be of the same data type as the data in the array. This parameter specifies the value that you want to check.
  • Example
    The t_table_array table contains the c1 (BIGINT) and t_array (ARRAY<STRING>) fields. Data in the table:
    +------------+---------+
    | c1         | t_array |
    +------------+---------+
    | 1000       | [k11, 86, k21, 15] |
    | 1001       | [k12, 97, k22, 2] |
    | 1002       | [k13, 99, k23, 1] |
    +------------+---------+
    Sample command:
    -- Check whether the t_array column contains the value 1. 
    select c1, array_contains(t_array,'1') from t_table_array;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | false |
    | 1001       | false |
    | 1002       | true |
    +------------+------+

ARRAY_INTERSECT

  • Syntax
    array array_intersect(<array1>, <array2>)  
  • Description

    Calculates the intersection of array1 and array2.

  • Parameters

    array1 and array2: required. Only the ARRAY data type is supported. These parameters specify the arrays that you want to compare.

  • Return value

    A value of the ARRAY type is returned.

  • Example
    -- [20,30,null] is returned. 
    select array_intersect(array(10, 20, 20, 30, 30, null, null), array(30, 30, 20, 20, 40, null, null)); 

CASE WHEN expression

  • Syntax
    MaxCompute provides the following CASE WHEN syntax:
    • case <value>
      when <value1> then <result1>
      when <value2> then <result2>
      ...
      else <resultn>
      end
    • case
      when (<_condition1>) then <result1>
      when (<_condition2>) then <result2>
      when (<_condition3>) then <result3>
      ...
      else <resultn>
      end
  • Description

    Returns the value of result based on the calculation result of value or _condition.

  • Parameters
    • value: required. The value used for comparison.
    • _condition: required. The specified condition.
    • result: required. The return value.
    Note
    • If the data types of all result values are only BIGINT and DOUBLE, the data types are converted into the DOUBLE type and then values are returned.
    • If result values of the STRING type exist, the data types are converted into the STRING type and then values are returned. If an unsupported conversion, such as a conversion from the BOOLEAN type to the STRING type, is involved, an error is returned.
    • Conversions between other data types are not allowed.
  • Example
    The sale_detail table contains the shop_name (STRING), customer_id (STRING), and total_price (DOUBLE) fields. Data in the table:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
    Sample command:
    select 
    case  
    when region='china' then 'default_region'
    when region like 'shang%' then 'sh_region'
    end as region 
    from sale_detail;
    The following result is returned:
    +------------+
    | region     |
    +------------+
    | default_region |
    | default_region |
    | default_region |
    | sh_region  |
    | sh_region  |
    | sh_region  |
    +------------+

CAST

  • Syntax
    cast(<expr> as <type>)
  • Description
    Converts the result of expr to a specific data type. Usage:
    • cast(double as bigint): converts a value of the DOUBLE type into the BIGINT type.
    • cast(string as bigint): converts a value of the STRING type into the BIGINT type. If the string consists of numerals expressed in the INTEGER form, it is converted into the BIGINT type. If the string consists of numerals expressed in the FLOAT or EXPONENTIAL form, it is converted into the DOUBLE type and then into the BIGINT type.
    • The default date format, yyyy-mm-dd hh:mi:ss, is used for cast(string as datetime) and cast(datetime as string).
  • Parameters
    • expr: required. The expression whose computing result you want to convert.
    • type: required. The data type to which you want to convert the data.
  • Return value

    The return value is of the specified data type.

  • Examples
    • Example 1: common usage. Sample command:
      -- 1 is returned. 
      select cast('1' as bigint);
    • Example 2: incorrect usage. If the conversion fails or an unsupported type conversion occurs, NULL is returned. Sample command:
      -- NULL is returned. 
      select cast('abc' as bigint);

COALESCE

  • Syntax
    coalesce(<expr1>, <expr2>, ...)
  • Description

    Returns the first non-NULL value in <expr1>, <expr2>, ....

  • Parameters

    expr: required. A value that you want to check. Except the NULL values, all other values must be of the same data type. Otherwise, an error is returned. If all values are NULL, an error is returned.

  • Return value

    The data type of the return value is the same as the data type of the parameter.

  • Examples
    • Example 1: common usage. Sample command:
      -- 1 is returned. 
      select coalesce(null,null,1,null,3,5,7);
    • Example 2: incorrect usage. The data types of parameter values are not the same, and an error is returned. Sample command:
      -- An error is returned, and the value abc cannot be identified. 
      select coalesce(null,null,1,null,abc,5,7);
    • Example 3: incorrect usage. Non-NULL values do not exist and an error is returned. Sample command:
      -- An error is returned because non-NULL values do not exist. 
      select coalesce(null,null,null,null);

DECODE

  • Syntax
    decode(<expression>, <search>, <result>[, <search>, <result>]...[, <default>])
  • Description

    Implements the IF-THEN-ELSE conditional branching feature.

  • Parameters
    • expression: required. The expression that you want to compare.
    • search: required. The search item that you use to compare with the expression.
    • result: required. The value returned when the value of search matches the value of expression.
    • default: optional. If no search items match the expression, the value of default is returned. If this parameter is not specified, NULL is returned.
    Note
    • Except the NULL values, all other values of the result parameter must be of the same data type. If the values are of different data types, an error is returned.
    • The values of search and expression must be of the same data type. Otherwise, an error is returned.
  • Return value
    • If a search item matches the expression, result is returned.
    • If no search item matches the expression, default is returned.
    • If default is not specified, NULL is returned.
    • If duplicate search items match the expression, the first value is returned.
    • In most cases, NULL is returned when MaxCompute SQL calculates NULL=NULL. However, the DECODE function considers that the two NULL values are the same.
  • Example
    The sale_detail table contains the shop_name (STRING), customer_id (STRING), and total_price (DOUBLE) fields. Data in the table:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
    Sample command:
    -- If the value of customer_id is c1, Taobao is returned. If the value is c2, Alipay is returned. If the value is c3, Aliyun is returned. If the value is NULL, N/A is returned. In other cases, Others is returned. 
    select
    decode(customer_id,
    'c1', 'Taobao',
    'c2', 'Alipay',
    'c3', 'Aliyun',
    Null, 'N/A',
    'Others') as result
    from sale_detail;
    -- The preceding statement is equivalent to the following statement: 
    if customer_id = c1 then
    result := 'Taobao';
    elsif customer_id = c2 then
    result := 'Alipay';
    elsif customer_id = c3 then
    result := 'Aliyun';
    ...
    else
    result := 'Others';
    end if;
    The following result is returned:
    +------------+
    | result     |
    +------------+
    | Others     |
    | Others     |
    | Others     |
    | Taobao     |
    | Alipay     |
    | Aliyun     |
    +------------+

EXPLODE

  • Limits
    • A SELECT statement can contain only one EXPLODE function, and no other columns of a table are allowed.
    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.
  • Syntax
    explode (<var>)
  • Description
    Transposes one row of data into multiple rows. This function is a UDTF.
    • If the parameter value is of the array<T> type, the array stored in the column is transposed into multiple rows.
    • If the parameter value is of the map<K, V> type, each key-value pair of the map stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.
  • Parameters

    var: required. The value must be of the array<T> or map<K, V> data type.

  • Return value

    Rows after transposition are returned.

  • Example
    select explode(array(null, 'a', 'b', 'c')) col;
    -- The following result is returned: 
    +------------+
    | col        |
    +------------+
    | NULL       |
    | a          |
    | b          |
    | c          |
    +------------+

GET_IDCARD_AGE

  • Syntax
    get_idcard_age(<idcardno>)
  • Description

    Returns the current age based on the ID card number. The current age is the current year minus the birth year in the ID card number.

  • Parameters

    idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the BIGINT type is returned. If the input value is NULL, NULL is returned.

GET_IDCARD_BIRTHDAY

  • Syntax
    get_idcard_birthday(<idcardno>)
  • Description

    Returns the date of birth based on the ID card number.

  • Parameters

    idcardno: required. A 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the DATETIME type is returned. If the input value is NULL, NULL is returned.

GET_IDCARD_SEX

  • Syntax
    get_idcard_sex(<idcardno>)
  • Description

    Returns the gender based on the ID card number. Valid values: M and F. M indicates male and F indicates female.

  • Parameters

    idcardno: required. The 15-digit or 18-digit ID card number of the STRING type. During calculation, the validity of the ID card is checked based on the province code and the last digit of the ID card number. NULL is returned if the check fails.

  • Return value

    A value of the STRING type is returned. If the input value is NULL, NULL is returned.

GET_USER_ID

  • Syntax
    get_user_id()
  • Description

    Obtains the ID of the current account, which is the user ID (UID).

  • Parameters

    No input parameters are required. If you specify an input parameter, an error is returned.

  • Return value

    The ID of the current account is returned.

  • Example
    select get_user_id();
    -- The following result is returned: 
    +------------+
    | _c0        |
    +------------+
    | 1117xxxxxxxx8519 |
    +------------+

GREATEST

  • Syntax
    greatest(<var1>, <var2>, ...)
  • Description

    Returns the maximum value of the input parameters.

  • Parameters

    var: required. The value can be of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.

  • Return value
    • The maximum value of the input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • NULL is interpreted as the minimum value.
    • If the input parameters are of different data types, those of the DOUBLE, BIGINT, DECIMAL, and STRING types need to be converted into the DOUBLE type for comparison, and those of the STRING and DATETIME types need to be converted into the DATETIME type for comparison. Implicit conversions of other data types are not allowed.
    • If odps.sql.hive.compatible is set to true and the value of any input parameter is NULL, NULL is returned.

IF expression

  • Syntax
    if(<testCondition>, <valueTrue>, <valueFalseOrNull>)
  • Description

    Checks whether testCondition evaluates to true. If testCondition evaluates to true, the value of valueTrue is returned. Otherwise, the value of valueFalseOrNull is returned.

  • Parameters
    • testCondition: required. The expression that you want to evaluate. The value is of the BOOLEAN type.
    • valueTrue: required. The value returned when testCondition evaluates to true.
    • valueFalseOrNull: the value returned when testCondition evaluates to false. You can set this parameter to NULL.
  • Return value

    The data type of the return value is the same as that of valueTrue or valueFalseOrNull.

  • Example
    -- 200 is returned. 
    select if(1=2,100,200); 

INLINE

  • Syntax
    inline(array<struct<f1:T1, f2:T2, ...>>)
  • Description

    Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

  • Parameters

    f1:T1 and f2:T2: required. All data types are supported.

    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    The expanded data of the struct array is returned.

  • Example
    The t_table table contains the t_struct (STRUCT<user_id:BIGINT,user_name:STRING,married:STRING,weight:DOUBLE>) field. Data in the table:
    +----------+
    | t_struct |
    +----------+
    | {user_id:10001, user_name:LiLei, married:N, weight:63.5} |
    | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} |
    +----------+
    Sample command:
    -- Expand the t_struct column. 
    select inline(array(t_struct)) from t_table;
    -- The following result is returned: 
    +------------+-----------+---------+------------+
    | user_id    | user_name | married | weight     |
    +------------+-----------+---------+------------+
    | 10001      | LiLei     | N       | 63.5       |
    | 10002      | HanMeiMei | Y       | 43.5       |
    +------------+-----------+---------+------------+

LEAST

  • Syntax
    least(<var1>, <var2>, ...)
  • Description

    Returns the minimum value of the input parameters.

  • Parameters

    var: required. The values of the input parameters. The value can be of the BIGINT, DOUBLE, DECIMAL, DATETIME, or STRING type. If the values of all input parameters are NULL, NULL is returned.

  • Return value
    • The minimum value of input parameters is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • If a data type conversion is performed between the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. If a data type conversion is performed between the DECIMAL, DOUBLE, BIGINT, and STRING types, a value of the DECIMAL type is returned. Implicit conversions of other data types are not allowed.
    • NULL is interpreted as the minimum value.

MAP

  • Syntax
    map map(K <key1>, V <value1>, K <key2>, V <value2>, ...)
  • Description

    Defines a map based on specific key-value pairs.

  • Parameters
    • key: required. All keys must be of the same data types, including the types after an implicit conversion. The data types must be basic types.
    • value: required. All values must be of the same data type, including the data types after an implicit conversion. Random data types can be used.
    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    Data of the MAP type is returned.

  • Example
    The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) fields. Data in the table:
    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+
    Sample command:
    -- Define a map based on the key-value pairs between the c2 and c4 columns, and the c3 and c5 columns. 
    select map(c2,c4,c3,c5) from t_table;
    -- The following result is returned: 
    +------+
    | _c0  |
    +------+
    | {k11:86, k21:15} |
    | {k12:97, k22:2} |
    | {k13:99, k23:1} |
    +------+

MAP_KEYS

  • Syntax
    array map_keys(map<K, V>)
  • Description

    Returns all keys in a map as an array.

  • Parameters

    The parameter value is of the MAP type.

  • Return value

    A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.

  • Example
    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) fields. Data in the table:
    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+
    Sample command:
    -- Return keys in the t_map column as an array. 
    select c1, map_keys(t_map) from t_table_map;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [k11, k21] |
    | 1001       | [k12, k22] |
    | 1002       | [k13, k23] |
    +------------+------+

MAP_VALUES

  • Syntax
    array map_values(map<K, V>)
  • Description

    Returns all values in a map as an array.

  • Parameters

    The parameter value is of the MAP type.

  • Return value

    A value of the ARRAY type is returned. If the input value is NULL, NULL is returned.

  • Example
    select map_values(map('a',123,'b',456));
    -- The following result is returned: 
    [123, 456]

MAX_PT

  • Syntax
    max_pt(<table_full_name>)
  • Description

    Returns the name of the largest level-1 partition that contains data in a partitioned table and reads the data of this partition. This function determines the largest partition by sorting partitions in alphabetical order.

    You can also use a standard SQL statement instead of the statement where the MAX_PT function is used. For example, you can use select * from table where pt = (select max(pt) from table); instead of select * from table where pt = max_pt("table");.

    Note MaxCompute does not provide the MIN_PT function. If you need to obtain the smallest partition that has data in a partitioned table, you cannot use the SQL statement select * from table where pt = min_pt("table");. Instead, you can use the standard SQL statement select * from table where pt = (select min(pt) from table); to achieve the same effect.
  • Parameters

    table_full_name: required. The value is of the STRING type. This parameter specifies the name of the table. You must have read permissions on the table.

  • Return value

    The name of the largest partition is returned.

    Note

    If a partition is added by using the ALTER TABLE statement and the partition contains no data, the name of this partition is not returned.

  • Example
    • Example 1: The tbl table is a partitioned table. The partitions in the table are 20120901 and 20120902, both of which contain data. If you execute the following statement, the return value of MAX_PT is '20120902', and the MaxCompute SQL statement reads data from the 20120902 partition. Sample statements:
      select * from tbl where pt=max_pt('myproject.tbl');
      -- The preceding statement is equivalent to the following statement: 
      select * from tbl where pt = (select max(pt) from myproject.tbl);
    • Example 2: If a partitioned table has multiple levels of partitions, use the standard SQL statement to obtain data in the largest partition. Sample statement:
      select * from table where pt1 = (select max(pt1) from table) and pt2 = (select max(pt2) from table where pt1 = (select max(pt1) from table));

NAMED_STRUCT

  • Syntax
    struct named_struct(string <name1>, T1 <value1>, string <name2>, T2 <value2>, ...)
  • Description

    Creates a struct based on given name-value pairs.

  • Parameters
    • value: required. All data types are supported.
    • name: required. The column name of the STRING type. This parameter is a constant.
    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as name1, name2, ...

  • Example
    select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);
    -- The following result is returned: 
    +------------+
    | _c0        |
    +------------+
    | {"user_id":10001,"user_name":"LiLei","married":"F","weight":63.5} |
    +------------+

ORDINAL

  • Syntax
    ordinal(bigint <nth>, <var1>, <var2>, ...)
  • Description

    Sorts the values of the input variables in ascending order and returns the value that is ranked nth.

  • Parameters
    • nth: required. The value is of the BIGINT type. This parameter specifies the position of the value that you want to return. If the value is NULL, NULL is returned.
    • var: required. The value is of the BIGINT, DOUBLE, DATETIME, or STRING type.
  • Return value
    • The value that is ranked nth is returned. If no implicit conversion is required, the return value is of the same data type as the input parameter.
    • If a data type conversion is performed between the DOUBLE, BIGINT, and STRING types, a value of the DOUBLE type is returned. If a data type conversion is performed between the STRING and DATETIME types, a value of the DATETIME type is returned. Implicit conversions of other data types are not allowed.
    • NULL is interpreted as the minimum value.
  • Example
    -- 2 is returned. 
    select ordinal(3, 1, 3, 2, 5, 2, 4, 6); 

PARTITION_EXISTS

  • Syntax
    boolean partition_exists(string <table_name>, string... <partitions>)
  • Description

    Checks whether a specific partition exists in a table.

  • Parameters
    • table_name: required. The table name, which is of the STRING type. You can specify a project name in the table name, for example, my_proj.my_table. If you do not specify a project name, the current project name is used.
    • partitions: required. The partition name, which is of the STRING type. In this parameter, you must specify the values of partition key columns in a table based on the sequence of these columns. The number of values must be the same as the number of partition key columns.
  • Return value

    A value of the BOOLEAN type is returned. If the specified partition exists, True is returned. Otherwise, False is returned.

  • Example
    -- Create the foo partitioned table. 
    create table foo (id bigint) partitioned by (ds string, hr string);
    -- Add partitions to the foo table. 
    alter table foo add partition (ds='20190101', hr='1');
    -- Check whether partitions 20190101 and 1 exist. True is returned. 
    select partition_exists('foo', '20190101', '1');

POSEXPLODE

  • Syntax
    posexplode(array<T>)
  • Description

    Converts an array into a table that has two columns. The first column lists subscripts of each value in the array, starting from 0. The second column lists array elements.

  • Parameters

    array<T>: required. The value is of the ARRAY type. All data types are supported.

    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    The generated table is returned.

  • Example
    select posexplode(array('a','c','f','b'));
    -- The following result is returned: 
    +------------+------------+
    | pos        | val        |
    +------------+------------+
    | 0          | a          |
    | 1          | c          |
    | 2          | f          |
    | 3          | b          |
    +------------+------------+

SAMPLE

  • Syntax
    boolean sample(<x>, <y>, [<column_name1>, <column_name2>,...])
  • Description

    Samples all values read from column_name based on x and y, and filters out the rows that do not meet sampling conditions.

  • Parameters
    • x and y: x is required. x and y are integer constants that are greater than 0. Their values are of the BIGINT type. They indicate that the values fall into x portions based on the hash function and the yth portion is used.

      y is optional. If y is not specified, the first option is used by default, and you do not need to specify column_name.

      If x or y is of another data type, the value of x or y is less than or equal to 0, or y is greater than x, an error is returned. If the value of x or y is NULL, NULL is returned.

    • column_name: optional. The name of the destination column on which sampling is performed. This parameter is optional. If it is not specified, random sampling is performed based on the values of x and y. It can be of any data type, and the column value can be NULL. No implicit conversion is performed. If column_name is constant NULL, an error is returned.
      Note To avoid data skew due to the NULL value, uniform hashing is performed on the NULL values in column_name in x portions. If column_name is not specified and the amount of data is small, the output is not necessarily uniform. In this case, we recommend that you specify column_name to obtain a uniform output.
  • Return value

    A value of the BOOLEAN type is returned.

  • Example

    The tbla table contains the cola column.

    -- The values in the cola column fall into four portions based on the hash function, and the first portion is used. True is returned. 
    select * from tbla where sample (4, 1 , cola);
    -- The values in each row are randomly hashed to four portions, and the second portion is used. True is returned. 
    select * from tbla where sample (4, 2);

SIZE

  • Syntax
    int size(<map>)
    int size(<array>)
  • Description

    size(<map>): returns the number of key-value pairs in a specified MAP parameter. size(<array>): returns the number of elements in a specified array.

  • Parameters
    • map: required. The value is of the MAP type.
    • array: required. The value is of the ARRAY type.
  • Example
    -- 2 is returned. 
    select size(map('a',123,'b',456)); 
    -- 3 is returned. 
    select size(map('a',123,'b',456,'c',789)); 
    -- 2 is returned. 
    select size(array('a','b'));
    -- 3 is returned.  
    select size(array(123,456,789)); 

SORT_ARRAY

  • Syntax
    array sort_array(array<T>[, isasc])
  • Description

    Sorts elements in a specific array.

  • Parameters
    • array<T>: required. The value is of the ARRAY type. Data in the array can be of any data type.
      MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
      • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
      • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
        setproject odps.sql.type.system.odps2=true;
        For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
    • isasc: optional. This parameter specifies whether to sort elements in a specific array in ascending or descending order. Valid values: true and false. The value true indicates the ascending order and false indicates the descending order. If you do not specify this parameter, the elements are sorted in ascending order.
  • Return value

    An array with elements sorted is returned.

  • Examples
    • Example 1: The t_array table contains the c1 ARRAY<STRING>, c2 ARRAY<INT>, and c3 ARRAY<STRING> fields. Data in the table:
      +------------+---------+--------------+
      | c1         | c2      | c3           |
      +------------+---------+--------------+
      | [a, c, f, b]  | [4, 5, 7, 2, 5, 8]  |  [You, Me, Him] |
      +------------+---------+--------------+
      Sort data in each column of the table. Sample command:
      [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, You, Me] is returned. 
      select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
    • Example 2: Sort elements in an array. Sample command:
      -- [null,10,20,30,30,40,50] is returned. 
      select sort_array(array(10, 20, 40, 30, 30, null, 50));
      -- [null,10,20,30,30,40,50] is returned. 
      select sort_array(array(10, 20, 40, 30, 30, null, 50),true);
      -- [50,40,30,30,20,10,null] is returned. 
      select sort_array(array(10, 20, 40, 30, 30, null, 50),false);

SPLIT

  • Syntax
    split(<str>, <pat>)
  • Description

    Returns an array after str is split with pat.

  • Parameters
    • str: required. The string that you want to split, which is of the STRING type.
    • pat: required. The STRING-type delimiter, which can be a regular expression.
  • Return value

    Data of the ARRAY <STRING> type is returned.

  • Example
    -- ["a","b","c"] is returned. 
    select split("a,b,c",",");

STACK

  • Syntax
    stack(n, expr1, ..., exprk) 
  • Description

    Splits expr1, ..., exprk into n rows. Unless otherwise specified, the output result uses the default column names col0, col1....

  • Parameters
    • n: required. The number of rows obtained after splitting.
    • expr: required. The parameter that you want to split. expr1,... exprk must be of the INTEGER type, and the number of parameters must be an integer multiple of n. The parameter must be able to be split into n complete rows. Otherwise, an error is returned.
  • Return value

    n rows with a specific number of columns are returned. The number of columns is equal to the number of parameters divided by n.

  • Example
    -- Split the parameter group of 1, 2, 3, 4, 5, 6 into three rows. 
    select stack(3, 1, 2, 3, 4, 5, 6);
    -- The following result is returned: 
    +------+------+
    | col0 | col1 |
    +------+------+
    | 1    | 2    |
    | 3    | 4    |
    | 5    | 6    |
    +------+------+
    
    -- Split 'A',10,date '2015-01-01','B',20,date '2016-01-01' into two rows.
    select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
    -- The following result is returned: 
    +------+------+------+
    | col0 | col1 | col2 |
    +------+------+------+
    | A    | 10   | 2015-01-01 |
    | B    | 20   | 2016-01-01 |
    +------+------+------+
    
    -- Split the parameter group of a, b, c, and d into two rows. If the source table contains multiple rows, this function is called for each row. 
    select stack(2,a,b,c,d) as (col,value)
    from values 
        (1,1,2,3,4),
        (2,5,6,7,8),
        (3,9,10,11,12),
        (4,13,14,15,null)
    as t(key,a,b,c,d);
    -- The following result is returned: 
    +------+-------+
    | col  | value |
    +------+-------+
    | 1    | 2     |
    | 3    | 4     |
    | 5    | 6     |
    | 7    | 8     |
    | 9    | 10    |
    | 11   | 12    |
    | 13   | 14    |
    | 15   | NULL  |
    +------+-------+
    
    -- Use this function with the LATERAL VIEW clause. 
    select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20, date '2016-01-01') tf as col0,col1,col2;
    -- The following result is returned: 
    +------+------+------+
    | col0 | col1 | col2 |
    +------+------+------+
    | A    | 10   | 2015-01-01 |
    | B    | 20   | 2016-01-01 |
    +------+------+------+

STR_TO_MAP

  • Syntax
    str_to_map(<text> [, <delimiter1> [, <delimiter2>]])
  • Description

    Splits text into key-value pairs by using delimiter1 and then separates keys from values in the key-value pairs by using delimiter2.

  • Parameters
    • text: required. The string that you want to split, which is of the STRING type.
    • delimiter1: optional. The delimiter of the STRING type. If this parameter is not specified, commas (,) are used.
    • delimiter2: optional. The delimiter of the STRING type. If this parameter is not specified, equal signs (=) are used.
      Note If the delimiter is a regular expression or special character, you must add two backslashes (\\) before the delimiter for escaping. The following special characters can be used as a delimiter: . ? + * :
  • Return value

    A value of the MAP<STRING, STRING> type is returned. The return value indicates that the string specified by text is split by using delimiter1 and delimiter2.

  • Example
    -- {"test2":"2","test1":"1"} is returned. 
    select str_to_map('test1&1-test2&2','-','&');
    -- {"test2":"2","test1":"1"} is returned. 
    select str_to_map("test1.1,test2.2", ",", "\\.");

STRUCT

  • Syntax
    struct struct(<value1>,<value2>, ...)
  • Description

    Creates a struct based on a given value list.

  • Parameters

    value: required. All data types are supported.

    MaxCompute V2.0 provides additional functions. If the functions you use involve new data types, execute the following SET statement to enable these data types. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
    • Session level: To use a new data type, you must insert set odps.sql.type.system.odps2=true; before the SQL statement, and commit and execute them together.
    • Project level: The project owner can configure new data types for the project as required. It takes 10 to 15 minutes for the settings to take effect. Run the following command:
      setproject odps.sql.type.system.odps2=true;
      For more information about setproject, see Project operations. For the precautions you must take when you enable data types at the project level, see Data type editions.
  • Return value

    A value of the STRUCT type is returned. Column names are sequentially named as col1, col2, ...

  • Example
    select struct('a',123,'ture',56.90);
    -- The following result is returned: 
    +------------+
    | _c0        |
    +------------+
    | {"col1":"a","col2":123,"col3":"ture","col4":56.9} |
    +------------+

TABLE_EXISTS

  • Syntax
    boolean table_exists(string <table_name>)
  • Description

    Checks whether a specified table exists.

  • Parameters

    table_name: required. The name of the table that you want to query. The value is of the STRING type. You can specify a project name in the table name, for example, my_proj.my_table. If you do not specify a project name, the current project name is used.

  • Return value

    A value of the BOOLEAN type is returned. If the specified table exists, True is returned. Otherwise, False is returned.

  • Example
    -- Use this function for the list in the SELECT statement. 
    select if(table_exists('abd'), col1, col2) from src;

TRANS_ARRAY

  • Limits
    • All columns that are used as key must be placed before the columns that are to be transposed.
    • Only one UDTF is allowed in a SELECT statement.
    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.
  • Syntax
    trans_array (<num_keys>, <separator>, <key1>,<key2>,…,<col1>,<col2>,<col3>) as (<key1>,<key2>,…,<col1>, <col2>)
  • Description

    Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.

  • Parameters
    • num_keys: required. The value is a constant of the BIGINT type and must be greater than or equal to 0. This parameter specifies the number of columns that can be used as key when you transpose one row into multiple rows.
    • separator: required. The value is a constant of the STRING type. This parameter is used to split a string into multiple elements. If it is left empty, an error is returned.
    • keys: required. The columns that are used as key during the transposition. The number of keys is specified by num_keys. If num_keys equals the total number of all columns (which means that all columns are used as key), only one row is returned.
    • cols: required. The array that you want to be converted to rows. All columns that follow keys are considered arrays to be transposed. The parameter value must be of the STRING type to store arrays in the STRING format, such as Hangzhou;Beijing;shanghai. The values in this array are separated by semicolons (;).
  • Return value

    Transposed rows are returned. The new column name is specified by as. The data types of columns that are used as key remain unchanged. All other columns are of the STRING type. The number of separated rows is based on the array with the maximum number of elements. If the number of rows is insufficient, NULL is added.

  • Example
    The t_table table contains the following data:
    +----------+----------+------------+
    | login_id | login_ip | login_time |
    +----------+----------+------------+
    | wangwangA | 192.168.0.1,192.168.0.2 | 20120101010000,20120102010000 |
    | wangwangB | 192.168.45.10,192.168.67.22,192,168.6.3 | 20120111010000,20120112010000,20120223080000 |
    +----------+----------+------------+
    -- Execute the following SQL statement: 
    select trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time) from t_table;
    The following result is returned: 
    +----------+----------+------------+
    | login_id | login_ip | login_time |
    +----------+----------+------------+
    | wangwangB | 192.168.45.10 | 20120111010000 |
    | wangwangB | 192.168.67.22 | 20120112010000 |
    | wangwangB | 192.168.6.3 | 20120223080000 |
    | wangwangA | 192.168.0.1 | 20120101010000 |
    | wangwangA | 192.168.0.2 | 20120102010000 |
    +----------+----------+------------+
    
    -- The table contains the following data: 
    Login_id LOGIN_IP LOGIN_TIME 
    wangwangA 192.168.0.1,192.168.0.2 20120101010000
    -- NULL is added to supplement the array in which data is insufficient.  
    Login_id Login_ip Login_time 
    wangwangA 192.168.0.1 20120101010000
    wangwangA 192.168.0.2 NULL

TRANS_COLS

  • Limits
    • All columns that are used as key must be placed before the columns that are to be transposed.
    • Only one UDTF is allowed in a SELECT statement.
  • Syntax
    trans_cols (<num_keys>, <key1>,<key2>,…,<col1>, <col2>,<col3>) as (<idx>, <key1>,<key2>,…,<col1>, <col2>)
  • Description

    Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.

  • Parameters
    • num_keys: required. The value is a constant of the BIGINT type and must be greater than or equal to 0. This parameter specifies the number of columns that can be used as key when you transpose one row into multiple rows.
    • keys: required. The columns that are used as key during the transposition. The number of keys is specified by num_keys. If num_keys equals the total number of all columns (which means that all columns are used as key), only one row is returned.
    • idx: required. The ID of a row after the transposition.
    • cols: required. The columns that you want to transpose into rows.
  • Return value

    Transposed rows are returned. The new column name is specified by as. The first output column is the subscripts of the transposition, which start with 1. The data types of the columns that are used as key remain unchanged, whereas the data types of other columns remain unchanged.

  • Example
    The t_table table contains the following data:
    +----------+----------+------------+
    | Login_id | Login_ip1 | Login_ip2 |
    +----------+----------+------------+
    | wangwangA | 192.168.0.1 | 192.168.0.2 |
    +----------+----------+------------+
    -- Execute the following SQL statement: 
    select trans_cols(1, login_id, login_ip1, login_ip2) as (idx, login_id, login_ip) from t_table;
    -- The following result is returned: 
    idx    login_id    login_ip
    1    wangwangA    192.168.0.1
    2    wangwangA    192.168.0.2

UUID

  • Syntax
    string uuid()
  • Description

    Returns a random ID, which is in the format of 29347a88-1e57-41ae-bb68-a9edbdd9****.

    Note The return value is a random global ID that has a low probability of duplication.