MaxCompute provides various built-in functions for data processing. This topic describes the available function types and their usage.
Function types
Function type | Description |
Process date and time data types, such as DATE, DATETIME, and TIMESTAMP. These functions let you add or subtract dates, calculate date differences, extract date fields, get the current time, and convert date formats. | |
Process numeric data types, such as BIGINT, DOUBLE, DECIMAL, and FLOAT. These functions let you convert number bases, perform mathematical operations, round numbers, and get random numbers. | |
Perform operations within a specified window column. These functions let you calculate sums, find maximum or minimum values, calculate averages or medians, sort numbers, offset values, and perform sampling. | |
Aggregate multiple input records into a single output value. These functions let you calculate sums, averages, maximum or minimum values, aggregate parameters, and concatenate strings. | |
HyperLogLog++ functions | These are approximate aggregate functions. For large data volumes, they use a small amount of memory to quickly remove duplicates and accelerate queries. |
Process data of the STRING type. These functions let you truncate, replace, or find strings, change character case, and convert string formats. | |
Process data of the ARRAY type. These functions let you construct arrays, remove duplicate elements, aggregate elements, sort elements, and merge elements. | |
Process data of the MAP type. These functions let you extract key-value pairs, construct maps, and merge maps. | |
Process data of the STRUCT type. These functions let you expand STRUCT arrays and construct STRUCTs. | |
Process data of the JSON type. These functions let you extract JSON field values, generate JSON objects or arrays, insert or update JSON data, and handle complex data structures. For the limits on JSON functions, see Limits on JSON functions. | |
Process table data of STRING and BINARY types. These functions let you encrypt and decrypt data. | |
Process network-related data of STRING and BINARY types. These functions let you convert IP address formats, parse URLs, and get network masks. | |
Connect to unstructured data and its metadata stored in data warehouses or data lakes in various ways. | |
Other functions are also available for various business scenarios. |
For information about the mapping between these function types and open source functions, see Mapping between MaxCompute, Hive, MySQL, and Oracle built-in functions.
Notes
Note the following when you use built-in functions:
The input parameter types, number of input parameters, and format for a built-in function must follow the required syntax. Otherwise, MaxCompute cannot parse the function, and the SQL statement fails.
If a built-in function uses Data Type 2.0 data types such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY as input parameters, you must enable Data Type 2.0. If Data Type 2.0 is not enabled, an error occurs. To enable Data Type 2.0:
Session level: Add
SET odps.sql.type.system.odps2=true;before your SQL statement and submit them together. This configuration is valid only for the current SQL statement.Project level: A project owner can set this configuration for the MaxCompute project. The configuration takes effect in 10 to 15 minutes and is valid for all subsequent SQL statements.
SETPROJECT odps.sql.type.system.odps2=true;
When Data Type 2.0 is enabled for a MaxCompute project, some implicit type conversions are disabled. This includes conversions from STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. These conversions are disabled because they risk precision loss or can cause errors. To resolve this issue, you can use the CAST function to perform an explicit conversion or disable Data Type 2.0.
If a user-defined function (UDF) has the same name as a built-in function, the UDF overwrites the built-in function. For example, if a UDF named CONCAT exists in a MaxCompute project, MaxCompute calls the UDF by default instead of the built-in CONCAT function. If you want to call the built-in function, add
::before the function name. For example:SELECT ::CONCAT('ab', 'c');.The results of built-in functions may vary depending on the global properties of the MaxCompute project. You can run the
SETPROJECT;command to view the global properties of the project.
Limits on JSON functions
The supported development tools include the MaxCompute client (odpscmd), MaxCompute Studio, and DataWorks. External ecosystems such as Dataphin are not supported. If you need to use JSON data types supported by MaxCompute in an external ecosystem, you must check whether the external ecosystem supports the JSON data types. The following table describes the items that you must take note of when you use the MaxCompute client (odpscmd) or MaxCompute Studio as the development tool for data of a JSON type.
MaxCompute client (odpscmd)
MaxCompute Studio
You must upgrade the MaxCompute client to V0.46.5 or later. Otherwise, you cannot execute the
desc json_tablestatement or download data of JSON types by using Tunnel.You must set the
use_instance_tunnelparameter to false. This parameter is included in the odps_config.ini file in the conf folder in the installation path of the client. Otherwise, an error is reported when you perform a query.
MaxCompute Studio allows you to query JSON data. MaxCompute Studio does not allow you to upload or download JSON data.
If you want to use another engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.
Columns of a JSON type cannot be added to a MaxCompute table.
You are not allowed to compare data of a JSON type with data of other types, execute SQL statements that contain the
ORDER BYorGROUP BYclause on data of a JSON type, or use columns of a JSON type asjoinkeys.For data of the JSON NUMBER type, the integer part is stored by using the BIGINT type, and the decimal part is stored by using the DOUBLE type. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow occurs. When the decimal part is converted into the DOUBLE type, precision loss occurs.
Strings that are used to generate JSON data cannot contain
\u0000, which is the null character in Unicode.Java UDFs and Python UDFs do not support JSON data types.
Clustered tables cannot store JSON data.
SDKs for Java in versions earlier than V0.44.0 and PyODPS in versions earlier than V0.11.4.1 do not support JSON data types
Delta tables do not support JSON data types.
The JSON data type can be used nested, supporting up to 20 levels of nesting.
Limits on string functions
The following functions support only English characters:
TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.
REVERSE: This function supports only English characters in the Hive-compatible data type edition.
SOUNDEX: This function supports only English characters.
TOLOWER: This function is used to convert English characters in a string into lowercase characters.
TOUPPER: This function is used to convert English characters in a string into uppercase characters.
INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.