MaxCompute SQL provides other common functions for development. You can select the appropriate function as needed. This topic describes the command format, parameters, and examples for functions such as CAST, FAILIF, and HASH that are supported by MaxCompute SQL.
Function | Features |
Filters data that meets the specified range condition. | |
Returns different values based on the result of an expression. | |
Converts the result of an expression to a target data type. | |
Returns the first non-NULL value in a parameter list. | |
Compresses a STRING or BINARY input parameter using the GZIP algorithm. | |
Calculates the cyclic redundancy check (CRC) value of a string or binary data. | |
Decompresses a BINARY input parameter using the GZIP algorithm. | |
Returns true or a custom error message based on the result of an expression. | |
Returns the current age based on a Chinese ID card number. | |
Returns the date of birth based on a Chinese ID card number. | |
Returns the sex based on a Chinese ID card number. | |
Obtains the ID of the current account. | |
Calculates a hash value based on input parameters. | |
Checks whether a specified condition is true. | |
Returns the maximum value of the level-1 partition in a partitioned table. | |
Determines whether the two input parameters are equal. | |
Specifies the return value for a parameter that is NULL. | |
Sorts input variables in ascending order and returns the value at a specified position. | |
Queries whether a specified partition exists. | |
Samples all read column values and filters out rows that do not meet the sampling conditions. | |
Calculates the SHA-1 hash value of a string or binary data. | |
Calculates the SHA-1 hash value of a string or binary data. | |
Calculates the SHA-2 hash value of a string or binary data. | |
Splits a specified group of parameters into a specified number of rows. | |
Splits a string into keys and values based on specified separators. | |
Queries whether a specified table exists. | |
A user-defined table-valued function (UDTF) that converts one row of data into multiple rows. It converts an array stored in a column with a fixed separator into multiple rows. | |
A UDTF that converts one row of data into multiple rows. It splits different columns into different rows. | |
Returns a random ID. This function is more efficient than the UUID function. | |
Returns a random ID. |
BETWEEN AND expression
Command format
<a> [NOT] BETWEEN <b> AND <c>Description
Filters data where the value of a is between b and c, or is not between b and c.
Parameters
a: Required. The field to be filtered.
b and c: Required. The specified range. The data types of b and c must be the same as the data type of a.
Return value
Returns the data that meets the condition.
If a, b, or c is null, the result is null.
Examples
The following data is in the
emptable.| empno | ename | job | mgr | hiredate| sal| comm | deptno | 7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20 7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30 7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30 7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20 7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30 7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30 7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10 7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20 7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10 7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30 7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20 7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30 7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20 7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10 7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10 7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10 7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10The following command queries for data where the value of
salis greater than or equal to 1000 and less than or equal to 1500.select * from emp where sal between 1000 and 1500;The following result is returned.
+-------+-------+-----+------------+------------+------------+------------+------------+ | empno | ename | job | mgr | hiredate | sal | comm | deptno | +-------+-------+-----+------------+------------+------------+------------+------------+ | 7521 | WARD | SALESMAN | 7698 | 1981-02-22 00:00:00 | 1250.0 | 500.0 | 30 | | 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 00:00:00 | 1250.0 | 1400.0 | 30 | | 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 00:00:00 | 1500.0 | 0.0 | 30 | | 7876 | ADAMS | CLERK | 7788 | 1987-05-23 00:00:00 | 1100.0 | NULL | 20 | | 7934 | MILLER | CLERK | 7782 | 1982-01-23 00:00:00 | 1300.0 | NULL | 10 | | 7956 | TEBAGE | CLERK | 7748 | 1982-12-30 00:00:00 | 1300.0 | NULL | 10 | +-------+-------+-----+------------+------------+------------+------------+------------+
CASE WHEN expression
Command format
MaxCompute provides the following two
CASE WHENformats:CASE <value> WHEN <value1> THEN <result1> WHEN <value2> THEN <result2> ... ELSE <resultn> ENDCASE WHEN (<_condition1>) THEN <result1> WHEN (<_condition2>) THEN <result2> WHEN (<_condition3>) THEN <result3> ... ELSE <resultn> END
Description
Returns different result values based on the result of value or _condition.
Parameters
value: Required. The value to compare.
_condition: Required. The condition to evaluate.
result: Required. The return value.
Return value
If result contains only BIGINT and DOUBLE types, all values are converted to DOUBLE before the result is returned.
If result includes the STRING type, all values are converted to the STRING type before the result is returned. If a data type conversion is not supported, an error is returned. For example, data of the BOOLEAN type cannot be converted to the STRING type.
Other type conversions are not allowed.
Examples
The
sale_detailtable has the fieldsshop_name string, customer_id string, total_price doubleand contains the following data.+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+The following is a command example.
select case when region='china' then 'default_region' when region like 'shang%' then 'sh_region' end as region from sale_detail;The following result is returned.
+------------+ | region | +------------+ | default_region | | default_region | | default_region | | sh_region | | sh_region | | sh_region | +------------+
CAST
Command format
CAST(<expr> AS <type>)Description
Converts the result of expr to the target data type type.
Parameters
expr: Required. The source data to convert.
type: Required. The target data type. The usage is as follows:
cast(double as bigint): Converts a value of the DOUBLE data type to the BIGINT data type.cast(string as bigint): When you convert a string to the BIGINT data type, if the string contains a number expressed as an integer, it is directly converted to the BIGINT type. If the string contains a number expressed in floating-point or exponential form, it is first converted to the DOUBLE data type and then to the BIGINT data type.cast(string as datetime)orcast(datetime as string): The default date formatyyyy-mm-dd hh:mi:ssis used.
Return value
The return value is of the target data type.
If you run
setproject odps.function.strictmode=false, the function returns the digits before the letters.If you run
setproject odps.function.strictmode=true, an error is returned.When you convert a value to the DECIMAL type, if you set
odps.sql.decimal.tostring.trimzero=true, trailing zeros after the decimal point are removed. If you setodps.sql.decimal.tostring.trimzero=false, trailing zeros after the decimal point are retained.ImportantThe
odps.sql.decimal.tostring.trimzeroparameter takes effect for both data retrieved from tables and static values.
Examples
Example 1: Common usage.
--Returns 1. select cast('1' as bigint);Example 2: Convert a STRING value to a BOOLEAN value. If the STRING is empty,
falseis returned. Otherwise,trueis returned.The STRING is empty.
select cast("" as boolean); --Returns +------+ | _c0 | +------+ | false | +------+The STRING is not empty.
select cast("false" as boolean); --Returns true +------+ | _c0 | +------+ | true | +------+
Example 3: Convert a string to a date.
--Convert a string to a date. select cast("2022-12-20" as date); --Returns +------------+ | _c0 | +------------+ | 2022-12-20 | +------------+ --Convert a date string with a time part to a date. select cast("2022-12-20 00:01:01" as date); --Returns +------------+ | _c0 | +------------+ | NULL | +------------+ --To ensure the value is displayed correctly, set the following parameter: set odps.sql.executionengine.enable.string.to.date.full.format= true; select cast("2022-12-20 00:01:01" as date); --Returns +------------+ | _c0 | +------------+ | 2022-12-20 | +------------+NoteBy default, the
odps.sql.executionengine.enable.string.to.date.full.formatparameter isfalse. To convert a date string that includes a time part, set this parameter totrue.Example 4 (Incorrect command example): Invalid usage. An exception is thrown if a conversion fails or is not supported. The following command is an example of incorrect usage.
select cast('abc' as bigint);Example 5: Example of a scenario in which
setproject odps.function.strictmode=falseis set.setprojectodps.function.strictmode=false; select cast('123abc'as bigint); --Returns +------------+ |_c0| +------------+ |123| +------------+Example 6: Example of a scenario in which
setproject odps.function.strictmode=trueis set.setprojectodps.function.strictmode=true; select cast('123abc' as bigint); --Returns FAILED:ODPS-0130071:[0,0]Semanticanalysisexception-physicalplangenerationfailed:java.lang.NumberFormatException:ODPS-0123091:Illegaltypecast-Infunctioncast,value'123abc'cannotbecastedfromStringtoBigint.Example 7: Example of a scenario in which
odps.sql.decimal.tostring.trimzerois set.--Create a table. create table mf_dot (dcm1 decimal(38,18), dcm2 decimal(38,18)); --Insert data. insert into table mf_dot values (12.45500BD,12.3400BD); --When the flag is true or not set. set odps.sql.decimal.tostring.trimzero=true; --Remove trailing zeros after the decimal point. select cast(round(dcm1,3) as string),cast(round(dcm2,3) as string) from mf_dot; --Return value +------------+------------+ | _c0 | _c1 | +------------+------------+ | 12.455 | 12.34 | +------------+------------+ --When the flag is false. set odps.sql.decimal.tostring.trimzero=false; --Keep trailing zeros after the decimal point. select cast(round(dcm1,3) as string),cast(round(dcm2,3) as string) from mf_dot; --Return value +------------+------------+ | _c0 | _c1 | +------------+------------+ | 12.455 | 12.340 | +------------+------------+ --This parameter also applies to static values. set odps.sql.decimal.tostring.trimzero=false; select cast(round(12345.120BD,3) as string); --Returns: +------------+ | _c0 | +------------+ | 12345.120 | +------------+
COALESCE
Command format
COALESCE(<expr1>, <expr2>, ...)Description
Returns the first non-NULL value in the list of
<expr1>, <expr2>, ....Parameters
expr: Required. The value to check.
Return value
The data type of the return value is the same as the data types of the parameters.
Examples
Example 1: Common usage. The following is a command example.
--Returns 1. select coalesce(null,null,1,null,3,5,7);Example 2: An error is returned if the data type of a parameter value is not defined.
Invalid command
-- The data type of the parameter abc is not defined. The system engine cannot recognize it and returns an error. select coalesce(null,null,1,null,abc,5,7);Valid command
select coalesce(null,null,1,null,'abc',5,7);
Example 3: If all parameter values are null when you do not read data from a table, an error is returned. The following is an invalid command.
-- An error is returned, indicating that at least one parameter value must be non-NULL. select coalesce(null,null,null,null);Example 4: If all parameter values are null when you read data from a table, NULL is returned.
Source table:
+-----------+-------------+------------+ | shop_name | customer_id | toal_price | +-----------+-------------+------------+ | ad | 10001 | 100.0 | | jk | 10002 | 300.0 | | ad | 10003 | 500.0 | | tt | NULL | NULL | +-----------+-------------+------------+As shown in the source table, all values for tt are null. After you run the following statement, NULL is returned.
select coalesce(customer_id,total_price) from sale_detail where shop_name='tt';
COMPRESS
Command format
BINARY COMPRESS(STRING <str>) BINARY COMPRESS(BINARY <bin>)Description
Compresses str or bin using the GZIP algorithm.
Parameters
str: Required. A value of the STRING type.
bin: Required. A value of the BINARY type.
Return value
Returns a value of the BINARY type. If the input parameter is NULL, NULL is returned.
Examples
-- The return value is =1F=8B=08=00=00=00=00=00=00=03=CBH=CD=C9=C9=07=00=86=A6=106=05=00=00=00. select compress('hello');Example 2: The input parameter is an empty string. The following is a command example.
-- The return value is =1F=8B=08=00=00=00=00=00=00=03=03=00=00=00=00=00=00=00=00=00. select compress('');Example 3: The input parameter is NULL. The following is a command example.
-- Returns NULL. select compress(null);
CRC32
Command format
BIGINT CRC32(STRING|BINARY <expr>)Description
Calculates the cyclic redundancy check (CRC) value of the STRING or BINARY expression expr.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the BIGINT type. The following rules apply:
If the input parameter is NULL, NULL is returned.
If the input parameter is an empty string, 0 is returned.
Examples
Example 1: Calculates the CRC value of the string
ABC. The following is a command example.-- Returns 2743272264. select crc32('ABC');Example 2: The input parameter is NULL. The following is a command example.
-- Returns NULL. select crc32(null);
DECOMPRESS
Command format
BINARY DECOMPRESS(BINARY <bin>)Description
Decompresses bin using the GZIP algorithm.
Parameters
bin: Required. A value of the BINARY type.
Return value
Returns a value of the BINARY type. If the input parameter is NULL, NULL is returned.
Examples
Example 1: Decompresses the compressed result of the string
hello, worldand converts it to the STRING format. The following is a command example.-- Returns hello, world. select cast(decompress(compress('hello, world')) as string);Example 2: The input parameter is NULL. The following is a command example.
-- Returns NULL. select decompress(null);
GET_IDCARD_AGE
Command format
get_idcard_age(<idcardno>)Description
Returns the current age based on a Chinese ID card number. The age is calculated by subtracting the birth year from the ID card number from the current year.
Parameters
idcardno: Required. A 15-digit or 18-digit Chinese ID card number of the STRING type. The function checks the validity of the ID card number based on the province code and the last check digit. If the check fails, NULL is returned.
Return value
Returns a value of the BIGINT type. If the input is NULL, NULL is returned.
GET_IDCARD_BIRTHDAY
Command format
get_idcard_birthday(<idcardno>)Description
Returns the date of birth based on a Chinese ID card number.
Parameters
idcardno: Required. A 15-digit or 18-digit Chinese ID card number of the STRING type. The function checks the validity of the ID card number based on the province code and the last check digit. If the check fails, NULL is returned.
Return value
Returns a value of the DATETIME type. If the input is NULL, NULL is returned.
GET_IDCARD_SEX
Command format
get_idcard_sex(<idcardno>)Description
Returns the sex based on a Chinese ID card number. The value is
M(male) orF(female).Parameters
idcardno: Required. A 15-digit or 18-digit Chinese ID card number of the STRING type. The function checks the validity of the ID card number based on the province code and the last check digit. If the check fails, NULL is returned.
Return value
Returns a value of the STRING type. If the input is NULL, NULL is returned.
GET_USER_ID
Command format
get_user_id()Description
Retrieves the ID of the current account. This is also known as the user ID or UID.
Parameters
No parameters are required.
Return value
Returns the ID of the current account.
Examples
select get_user_id(); -- The following result is returned. +------------+ | _c0 | +------------+ | 1117xxxxxxxx8519 | +------------+
HASH
Command format
If the MaxCompute project is in Hive-compatible mode, the command format is as follows.
INT HASH(<value1>, <value2>[, ...]);If the MaxCompute project is not in Hive-compatible mode, the command format is as follows.
BIGINT HASH(<value1>, <value2>[, ...]);
Description
Calculates the hash value of value1 and value2.
Parameters
value1 and value2: Required. The parameters for which to calculate the hash value. The parameters can be of different data types. The supported data types vary between Hive-compatible and non-Hive-compatible modes:
Hive-compatible mode: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, BOOLEAN, STRING, CHAR, VARCHAR, DATETIME, and DATE.
Non-Hive-compatible mode: BIGINT, DOUBLE, BOOLEAN, STRING, and DATETIME.
NoteFor the same inputs, the returned hash values are always the same. However, if two hash values are the same, it does not guarantee that the input values are the same. A hash collision may occur.
Return value
Returns a value of the INT or BIGINT type. If an input parameter is an empty string or NULL, 0 is returned.
Examples
Example 1: Calculate the hash value of input parameters of the same data type. The following is a command example.
-- Returns 66. SELECT HASH(0L, 2L, 4L);Example 2: Calculate the hash value of input parameters of different data types. The following is a command example.
-- Returns 97. SELECT HASH(0L, 'a');Example 3: An input parameter is an empty string or NULL. The following is a command example.
-- Returns 0. SELECT HASH(0L, null); -- Returns 0. SELECT HASH(0L, '');
IF
Command format
IF(<testCondition>, <valueTrue>, <valueFalseOrNull>)Description
Checks whether testCondition is true. If it is true, the function returns the value of valueTrue. Otherwise, it returns the value of valueFalseOrNull.
Parameters
testCondition: Required. The expression to evaluate. It must be of the BOOLEAN type.
valueTrue: Required. The value to return if the testCondition expression is true.
valueFalseOrNull: The value to return if the testCondition expression is false. This can be set to NULL.
Return value
The data type of the return value is the same as the data type of the valueTrue or valueFalseOrNull parameter.
Examples
-- Returns 200. select if(1=2, 100, 200);
MAX_PT
Command format
MAX_PT(<table_full_name>)Description
Returns the maximum value of a level-1 partition that contains data in a partitioned table. The values are sorted alphabetically. The function then reads the data from that partition.
Notes
The
MAX_PTfunction can also be implemented using standard SQL.SELECT * FROM table WHERE pt=MAX_PT("table");can be rewritten asSELECT * FROM table WHERE pt = (SELECT MAX(pt) FROM table);.NoteMaxCompute does not provide a
MIN_PTfunction. You cannot use the SQL statementSELECT * FROM table WHERE pt=MIN_PT("table");to achieve a similar function asMAX_PTto retrieve the minimum partition that contains data. However, you can use the standard SQL statementSELECT * FROM table WHERE pt= (SELECT MIN(pt) FROM table);to achieve the same effect.If all partitions in the table are empty, the
MAX_PTfunction fails. Make sure that at least one partition is not empty.OSS foreign tables also support the MAX_PT function. The behavior is the same as for internal tables.
Parameters
table_full_name: Required. A value of the STRING type. The name of the table. You must have read permissions on the table.
Return value
Returns the value of the largest level-1 partition.
NoteIf you create a partition only using
ALTER TABLEand the partition does not contain any data, the partition is not returned.Examples
Example 1: The table tbl is a partitioned table. Its partitions are 20120901 and 20120902, and both contain data. In the following statement,
MAX_PTreturns'20120902'. The MaxCompute SQL statement reads the data from thept='20120902'partition. The following is a command example.SELECT * FROM tbl WHERE pt= MAX_PT('tbl'); -- This is equivalent to the following statement. SELECT * FROM tbl WHERE pt= (SELECT MAX(pt) FROM tbl);Example 2: In a multi-level partitioning scenario, use standard SQL to get data from the largest partition. The following is a command example.
SELECT * FROM table WHERE pt1 = (SELECT MAX(pt1) FROM table) AND pt2 = (SELECT MAX(pt2) FROM table WHERE pt1= (SELECT MAX(pt1) FROM table));
NULLIF
Command format
T NULLIF(T <expr1>, T <expr2>)Description
Compares the values of expr1 and expr2. If they are equal, the function returns NULL. Otherwise, it returns expr1.
Parameters
expr1 and expr2: Required. Expressions of any type.
Tindicates the input data type. It can be any data type that MaxCompute supports.Return value
Returns NULL or expr1.
Examples
-- Returns 2. select nullif(2, 3); -- Returns NULL. select nullif(2, 2); -- Returns 3. select nullif(3, null);
NVL
Command format
nvl(T <value>, T <default_value>)Description
If the value of value is NULL, the function returns default_value. Otherwise, it returns value. The two parameters must be of the same data type.
Parameters
value: Required. The input parameter.
Tindicates the input data type. It can be any data type that MaxCompute supports.default_value: Required. The replacement value. It must be of the same data type as value.
Examples
The table
t_datahas three columns:c1 string,c2 bigint, andc3 datetime. The table contains the following data.+----+------------+------------+ | c1 | c2 | c3 | +----+------------+------------+ | NULL | 20 | 2017-11-13 05:00:00 | | ddd | 25 | NULL | | bbb | NULL | 2017-11-12 08:00:00 | | aaa | 23 | 2017-11-11 00:00:00 | +----+------------+------------+You can use the
nvlfunction to output 00000 for NULL values inc1, 0 for NULL values inc2, and-for NULL values inc3. The following is a command example.select nvl(c1,'00000'),nvl(c2,0),nvl(c3,'-') from nvl_test; -- The following result is returned. +-----+------------+-----+ | _c0 | _c1 | _c2 | +-----+------------+-----+ | 00000 | 20 | 2017-11-13 05:00:00 | | ddd | 25 | - | | bbb | 0 | 2017-11-12 08:00:00 | | aaa | 23 | 2017-11-11 00:00:00 | +-----+------------+-----+
ORDINAL
Command format
ORDINAL(BIGINT <nth>, <var1>, <var2>[,...])Description
Sorts the input variables in ascending order and returns the value at the nth position.
Parameters
nth: Required. The position number, starting from 1. A value of the BIGINT type. If the value at the specified position is NULL, NULL is returned.
var: Required. The values to be sorted. They can be of the BIGINT, DOUBLE, DATETIME, or STRING type.
Return value
The value at the nth position. If no implicit conversion occurs, the data type of the return value is the same as the data type of the input parameters.
If type conversion occurs, a conversion between DOUBLE, BIGINT, and STRING returns a DOUBLE type. A conversion between STRING and DATETIME returns a DATETIME type. Other implicit conversions are not allowed.
NULL is treated as the minimum value.
Examples
-- Returns 3. SELECT ORDINAL(CAST(3 AS BIGINT), CAST(1 AS BIGINT), cast(3 AS BIGINT), cast(7 AS BIGINT), cast(5 AS BIGINT), cast(2 AS BIGINT), cast(4 AS BIGINT), cast(6 AS BIGINT));
PARTITION_EXISTS
Command format
boolean partition_exists(string <table_name>, string... <partitions>)Description
Checks whether a specified partition exists.
Parameters
table_name: Required. The table name. A value of the STRING type. The table name can include the project name, such as
my_proj.my_table. If you do not specify a project name, the current project is used by default.partitions: Required. The partition name. A value of the STRING type. Specify the partition values in the order of the partition key columns. The number of partition values must be the same as the number of partition key columns.
Return value
Returns a value of the BOOLEAN type. The function returns True if the specified partition exists. Otherwise, it returns False.
Examples
-- Create a partitioned table named foo. create table foo (id bigint) partitioned by (ds string, hr string); -- Add a partition to the foo table. alter table foo add partition (ds='20190101', hr='1'); -- Query whether the partition ds='20190101' and hr='1' exists. The result is True. select partition_exists('foo', '20190101', '1');
SAMPLE
Command format
boolean sample(<x>, <y>, [<column_name1>, <column_name2>[,...]])Description
The system samples all read values of column_name based on the settings of x and y, and filters out rows that do not meet the sampling conditions.
Parameters
x and y: x is required. A BIGINT constant greater than 0. This indicates that the data is hashed into x parts, and the yth part is selected.
y is optional. If omitted, the first part is selected by default. If you omit the y parameter, you must also omit column_name.
If x or y is of another type or is less than or equal to 0, an exception is thrown. If y is greater than x, an exception is also returned. If either x or y is NULL, NULL is returned.
column_name: Optional. The target column for sampling. If this parameter is omitted, random sampling is performed based on the values of x and y. This parameter can be of any data type, and its value can be NULL. No implicit type conversion is performed. If column_name is a constant NULL, an error is returned.
NoteTo prevent data skew caused by NULL values, NULL values in column_name are evenly hashed into x parts. If you do not specify column_name, the output may not be uniform when the data volume is small. In this case, you can specify column_name to obtain a better output result.
Currently, random sampling is supported only for columns of the following data types: bigint, datetime, boolean, double, string, binary, char, and varchar.
Return value
Returns a value of the BOOLEAN type.
Examples
Assume that the table
tblacontains a column namedcola.-- The values are hashed into 4 parts based on the cola column, and the 1st part is taken. The return value is True. select * from tbla where sample (4, 1 , cola); -- Each row of data is randomly hashed into 4 parts, and the 2nd part is taken. The return value is True. select * from tbla where sample (4, 2);
SHA
Command format
STRING SHA(STRING|BINARY <expr>)Description
Calculates the SHA-1 hash value of the STRING or BINARY expression expr and returns it as a hexadecimal string.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the STRING type. If the input parameter is NULL, NULL is returned.
Examples
Example 1: Calculates the SHA hash value of the string
ABC. The following is a command example.-- Returns 3c01bdbb26f358bab27f267924aa2c9a03fcfdb8. select sha('ABC');Example 2: The input parameter is NULL. The following is a command example.
-- Returns NULL. select sha(null);
SHA1
Command format
string sha1(string|binary <expr>)Description
Calculates the SHA-1 hash value of the STRING or BINARY expression expr and returns it as a hexadecimal string.
Parameters
expr: Required. A value of the STRING or BINARY type.
Return value
Returns a value of the STRING type. If the input parameter is NULL, NULL is returned.
Examples
Example 1: Calculates the SHA-1 hash value of the string
ABC. The following is a command example.-- Returns 3c01bdbb26f358bab27f267924aa2c9a03fcfdb8. select sha1('ABC');Example 2: The input parameter is NULL. The following is a command example.
-- Returns NULL. select sha1(null);
SHA2
Command format
string sha2(string|binary <expr>, bigint <number>)Description
Calculates the SHA-2 hash value of the STRING or BINARY expression expr and returns it in the format specified by number.
Parameters
expr: Required. A value of the STRING or BINARY type.
number: Required. A value of the BIGINT type. The hash bit length. The value must be 224, 256, 384, 512, or 0 (which is the same as 256).
Return value
Returns a value of the STRING type. The following rules apply:
If any input parameter is NULL, NULL is returned.
If the value of number is not within the allowed range, NULL is returned.
Examples
Example 1: Calculates the SHA-2 hash value of the string
ABC. The following is a command example.-- Returns b5d4045c3f466fa91fe2cc6abe79232a1a57cdf104f7a26e716e0a1e2789df78. select sha2('ABC', 256);Example 2: An input parameter is NULL. The following is a command example.
-- Returns NULL. select sha2('ABC', null);
STACK
Command format
stack(n, expr1, ..., exprk)Description
Splits
expr1, ..., exprkinto n rows. Unless otherwise specified, the output uses the default column namescol0, col1, ....Parameters
n: Required. The number of rows to split into.
expr: Required. The parameters to be split,
expr1, ..., exprk, must be integers. The number of parameters must be an integer multiple of n to be split into n complete rows. Otherwise, an error is returned.
Return value
Returns a dataset with n rows. The number of columns is the quotient of the number of parameters divided by n.
Examples
-- Arrange 1, 2, 3, 4, 5, 6 into 3 rows. select stack(3, 1, 2, 3, 4, 5, 6); -- The following result is returned. +------+------+ | col0 | col1 | +------+------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | +------+------+ -- Arrange 'A',10,date '2015-01-01','B',20,date '2016-01-01' into two rows. select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2); -- The following result is returned. +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+ -- Arrange a, b, c, d into two rows. If the source table has multiple rows, the stack operation is performed row by row. select stack(2,a,b,c,d) as (col,value) from values (1,1,2,3,4), (2,5,6,7,8), (3,9,10,11,12), (4,13,14,15,null) as t(key,a,b,c,d); -- The following result is returned. +------+-------+ | col | value | +------+-------+ | 1 | 2 | | 3 | 4 | | 5 | 6 | | 7 | 8 | | 9 | 10 | | 11 | 12 | | 13 | 14 | | 15 | NULL | +------+-------+ -- Use with LATERAL VIEW. select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20, date '2016-01-01') tf as col0,col1,col2; -- The following result is returned. +------+------+------+ | col0 | col1 | col2 | +------+------+------+ | A | 10 | 2015-01-01 | | B | 20 | 2016-01-01 | +------+------+------+
STR_TO_MAP
Command format
STR_TO_MAP([STRING <mapDupKeyPolicy>,] <text> [, <delimiter1> [, <delimiter2>]])Description
Uses delimiter1 to split text into key-value pairs, and then uses delimiter2 to split each key-value pair into a key and a value.
Parameters
-
mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:
-
exception: An error is returned.
-
last_win: The latter key overwrites the former key.
You can also specify the
odps.sql.map.key.dedup.policyparameter at the session level to configure the method that is used to process duplicate keys. For example, you can setodps.sql.map.key.dedup.policyto exception. If you do not specify this parameter, the default value last_win is used.NoteThe behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of
odps.sql.map.key.dedup.policyis used. -
text: Required. A value of the STRING type. The string to be split.
delimiter1: Optional. A value of the STRING type. The separator. If not specified, the default value is a comma (
,).delimiter2: Optional. A value of the STRING type. The separator. If not specified, the default value is an equal sign (
=).NoteIf the separator is a regular expression or a special character, you must escape it with two backslashes (\\). Special characters include the colon (:), period (.), question mark (?), plus sign (+), and asterisk (*).
-
Return value
The return value is of the
map<string, string>type. The return value is the result of splitting text by delimiter1 and delimiter2.Examples
-- Returns {test1:1, test2:2}. select str_to_map('test1&1-test2&2','-','&'); -- Returns {test1:1, test2:2}. select str_to_map("test1.1,test2.2", ",", "\\."); -- Returns {test1:1, test2:3}. select str_to_map("test1.1,test2.2,test2.3", ",", "\\.");
TABLE_EXISTS
Command format
BOOLEAN TABLE_EXISTS(STRING <table_name>)Description
Checks whether a specified table exists.
Parameters
table_name: Required. The table name. A value of the STRING type. The table name can include the project name, such as
my_proj.my_table. If you do not specify a project name, the current project is used by default.Return value
Returns a value of the BOOLEAN type. The function returns True if the specified table exists. Otherwise, it returns False.
Examples
-- Use in a SELECT list. select if(table_exists('abd'), col1, col2) from src;
TRANS_ARRAY
Limits
All columns that are used as
keysmust be placed first, and the columns to be transposed must be placed after them.A
SELECTstatement can contain only one UDTF. No other columns can be included.Cannot be used with
GROUP BY,CLUSTER BY,DISTRIBUTE BY, orSORT BY.
Command format
TRANS_ARRAY (<num_keys>, <separator>, <key1>,<key2>,…,<col1>,<col2>,<col3>) AS (<key1>,<key2>,...,<col1>, <col2>)Description
A user-defined table-valued function (UDTF) that converts one row of data into multiple rows. It converts an array that is stored in a column and delimited by a fixed separator into multiple rows.
Parameters
num_keys: Required. A BIGINT constant. The value must be
>=0. The number of columns to be used as transposekeyswhen converting to multiple rows.separator: Required. A STRING constant. The separator used to split the string into multiple elements. If this parameter is empty, an error is returned.
keys: Required. The columns to be used as
keysduring transposition. The number of columns is specified by num_keys. If num_keys specifies that all columns are used askeys(that is, num_keys is equal to the total number of columns), only one row is returned.cols: Required. The arrays to be converted into rows. All columns after the
keysare considered arrays to be transposed. They must be of the STRING type and store arrays in string format, such asHangzhou;Beijing;Shanghai, which is an array delimited by semicolons (;).
Return value
Returns the transposed rows. The new column names are specified by
AS. The data types of the columns used askeysremain unchanged. All other columns are of the STRING type. The number of resulting rows is determined by the array with the most elements. Shorter arrays are padded with NULL.Examples
Example 1: The table
t_tablecontains the following data.+----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangA | 192.168.0.1,192.168.0.2 | 20120101010000,20120102010000 | | wangwangB | 192.168.45.10,192.168.67.22,192.168.6.3 | 20120111010000,20120112010000,20120223080000 | +----------+----------+------------+ -- Run the SQL statement. select trans_array(1, ",", login_id, login_ip, login_time) as (login_id,login_ip,login_time) from t_table; -- The following result is returned. +----------+----------+------------+ | login_id | login_ip | login_time | +----------+----------+------------+ | wangwangB | 192.168.45.10 | 20120111010000 | | wangwangB | 192.168.67.22 | 20120112010000 | | wangwangB | 192.168.6.3 | 20120223080000 | | wangwangA | 192.168.0.1 | 20120101010000 | | wangwangA | 192.168.0.2 | 20120102010000 | +----------+----------+------------+ -- If the table contains the following data. Login_id LOGIN_IP LOGIN_TIME wangwangA 192.168.0.1,192.168.0.2 20120101010000 -- The insufficient data in the array is padded with NULL. Login_id Login_ip Login_time wangwangA 192.168.0.1 20120101010000 wangwangA 192.168.0.2 NULLExample 2: The table mf_fun_array_test_t contains the following data.
+------------+------------+------------+------------+ | id | name | login_ip | login_time | +------------+------------+------------+------------+ | 1 | Tom | 192.168.100.1,192.168.100.2 | 20211101010101,20211101010102 | | 2 | Jerry | 192.168.100.3,192.168.100.4 | 20211101010103,20211101010104 | +------------+------------+------------+------------+ -- Use two keys, id and name, to convert to an array. Run the SQL statement. select trans_array(2, ",", Id,Name, login_ip, login_time) as (Id,Name,login_ip,login_time) from mf_fun_array_test_t; -- The following result is returned. The data is split and grouped by the keys id and name. +------------+------------+------------+------------+ | id | name | login_ip | login_time | +------------+------------+------------+------------+ | 1 | Tom | 192.168.100.1 | 20211101010101 | | 1 | Tom | 192.168.100.2 | 20211101010102 | | 2 | Jerry | 192.168.100.3 | 20211101010103 | | 2 | Jerry | 192.168.100.4 | 20211101010104 | +------------+------------+------------+------------+
TRANS_COLS
Limits
All columns that are used as
keysmust be placed first, and the columns to be transposed must be placed after them.A
SELECTstatement can contain only one UDTF. No other columns can be included.
Command format
TRANS_COLS (<num_keys>, <key1>,<key2>,…,<col1>, <col2>,<col3>) AS (<idx>, <key1>,<key2>,…,<col1>, <col2>)Description
A UDTF that converts one row of data into multiple rows. It splits different columns into different rows.
Parameters
num_keys: Required. A BIGINT constant. The value must be
>=0. The number of columns to be used as transpose keys when converting to multiple rows.keys: Required. The columns to be used as keys during transposition. The number of columns is specified by num_keys. If num_keys specifies that all columns are used as keys (that is, num_keys is equal to the total number of columns), only one row is returned.
idx: Required. The row number after conversion.
cols: Required. The columns to be converted into rows.
Return value
Returns the transposed rows. The new column names are specified by
AS. The first column of the output is the transpose index, which starts from 1. The data types of the columns used as keys remain unchanged. All other columns retain their original data types.Examples
The table
t_tablecontains the following data.+----------+----------+------------+ | Login_id | Login_ip1 | Login_ip2 | +----------+----------+------------+ | wangwangA | 192.168.0.1 | 192.168.0.2 | +----------+----------+------------+ -- Run the SQL statement. select trans_cols(1, login_id, login_ip1, login_ip2) as (idx, login_id, login_ip) from t_table; -- The following result is returned. idx login_id login_ip 1 wangwangA 192.168.0.1 2 wangwangA 192.168.0.2
UNIQUE_ID
Command format
string unique_id()Description
Returns a random unique ID, such as
29347a88-1e57-41ae-bb68-a9edbdd9****_1. This function is more efficient than the UUID function and returns a longer ID. Compared with a UUID, this ID includes an additional underscore (_) and a number, such as_1.
UUID
Command format
string uuid()Description
Returns a random ID, such as
29347a88-1e57-41ae-bb68-a9edbdd9****.NoteUUID returns a random global ID that has a very low probability of being repeated.