MaxCompute offers a variety of built-in functions to address data processing needs across numerous business scenarios. This topic describes the different types of built-in functions available in MaxCompute and provides guidance on their usage.
Function types
Function type | Description |
Date and time functions are called to process data of date types, such as DATE, DATETIME, and TIMESTAMP. You can call the functions to add and subtract date values, calculate date value differences, extract date fields, obtain the current time, and convert date formats. | |
Mathematical functions are called to process data of numeric types, such as BIGINT, DOUBLE, DECIMAL, and FLOAT. You can call the functions to perform base conversion, mathematical operations, rounding, and obtain random numbers. | |
Window functions are called to process the data of columns in a window. You can call the functions to calculate the sum, maximum value, minimum value, average value, and median value of column data. You can also call the functions to sort column data, obtain the data of columns at a specific offset, and sample column data. | |
Aggregate functions are called to aggregate multiple input records into a single output value. You can call the functions to calculate the sum, average value, maximum value, minimum value, parameter aggregation, and concatenate strings. | |
String functions are called to process data of the STRING type. You can call the functions to truncate strings, replace strings, search for strings, convert uppercase and lowercase letters, and convert string formats. | |
Complex type functions are called to process data of the MAP, ARRAY, STRUCT, and JSON type. You can call the functions to deduplicate, aggregate, sort, and merge elements. For more information, see The limits on the use of JSON functions. | |
Encryption functions are called to process table data of the STRING and BINARY types. You can call the functions to encrypt and decrypt data. | |
In addition to the preceding functions, other functions are provided to support other business scenarios. |
For more information about the correlation between each type of function and their counterparts in open-source platforms, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
Precautions
When utilizing built-in functions, consider the following:
Ensure that the types and number of input parameters, along with the function format, comply with the function's syntax requirements. Non-compliance may result in parsing errors or execution failures when the SQL statement containing the built-in function is run.
To use built-in functions with input parameters supported by the MaxCompute V2.0 data type edition, you must enable this edition. Supported data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY. Without this edition enabled, executing SQL statements with these functions may lead to errors. Activation can be done at the session or project level:
Session level: To apply a configuration for the current SQL statement, prepend it with
set odps.sql.type.system.odps2=true;
and then submit them together for execution.Project level: A project owner can enable the MaxCompute V2.0 data type edition based on project needs. The setting becomes effective after 10 to 15 minutes and applies to all subsequent SQL statements.
setproject odps.sql.type.system.odps2=true;
Enabling the MaxCompute V2.0 data type edition in a project disables some implicit type conversions, such as STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE and DECIMAL to BIGINT, which can lead to precision loss or errors. To address this, use the CAST function for explicit type conversions or disable the MaxCompute V2.0 data type edition.
If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol
::
before the built-in function. For example, you can useselect ::concat('ab', 'c');
.Differences in the global properties of a MaxCompute project can lead to inconsistent results from built-in functions. To view the project's global properties, you can execute the
setproject;
command.
The limits on the use of JSON functions
The supported development tools include the MaxCompute client (odpscmd), MaxCompute Studio, and DataWorks. External ecosystems such as Dataphin are not supported. If you need to use JSON data types supported by MaxCompute in an external ecosystem, you must check whether the external ecosystem supports the JSON data types. The following table describes the items that you must take note of when you use the MaxCompute client (odpscmd) or MaxCompute Studio as the development tool for data of a JSON type.
MaxCompute client (odpscmd)
MaxCompute Studio
You must upgrade the MaxCompute client to V0.46.5 or later. Otherwise, you cannot execute the
desc json_table
statement or download data of JSON types by using Tunnel.You must set the
use_instance_tunnel
parameter to false. This parameter is included in the odps_config.ini file in the conf folder in the installation path of the client. Otherwise, an error is reported when you perform a query.
MaxCompute Studio allows you to query JSON data. MaxCompute Studio does not allow you to upload or download JSON data.
If you want to use another engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.
Columns of a JSON type cannot be added to a MaxCompute table.
You are not allowed to compare data of a JSON type with data of other types, execute SQL statements that contain the
ORDER BY
orGROUP BY
clause on data of a JSON type, or use columns of a JSON type asjoin
keys.For data of the JSON NUMBER type, the integer part is stored by using the BIGINT type, and the decimal part is stored by using the DOUBLE type. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow occurs. When the decimal part is converted into the DOUBLE type, precision loss occurs.
Strings that are used to generate JSON data cannot contain
\u0000
, which is the null character in Unicode.Java UDFs and Python UDFs do not support JSON data types.
Clustered tables cannot store JSON data.
SDKs for Java in versions earlier than V0.44.0 and PyODPS in versions earlier than V0.11.4.1 do not support JSON data types
Delta tables do not support JSON data types.
The JSON data type can be used nested, supporting up to 20 levels of nesting.
Limits on string functions
The following functions support only English characters:
TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.
REVERSE: This function supports only English characters in the Hive-compatible data type edition.
SOUNDEX: This function supports only English characters.
TOLOWER: This function is used to convert English characters in a string into lowercase characters.
TOUPPER: This function is used to convert English characters in a string into uppercase characters.
INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.