MaxCompute provides various built-in functions for most data processing needs. This topic describes the function types in MaxCompute and provides usage notes.
Function types
Function type | Description |
Process date and time data types, such as DATE, DATETIME, and TIMESTAMP. You can use these functions to perform operations such as adding or subtracting dates, calculating date differences, extracting date fields, getting the current time, and converting date formats. | |
Process numeric data types, such as BIGINT, DOUBLE, DECIMAL, and FLOAT. You can use these functions to perform operations such as base conversion, mathematical operations, rounding, and generating random numbers. | |
Perform operations within a specified window column. You can use these functions to calculate sums, find maximum or minimum values, calculate averages or medians, sort numbers, apply offsets, and perform sampling. | |
Aggregate multiple input records into a single output value. You can use these functions to calculate sums, averages, maximum or minimum values, aggregate parameters, and concatenate strings. | |
Process data of the STRING type. You can use these functions to perform operations such as truncating, replacing, and finding strings, changing character case, and converting string formats. | |
Process data of the ARRAY type. You can use these functions to perform operations such as creating arrays, removing duplicate elements, aggregating elements, sorting elements, and merging elements. | |
Process data of the MAP type. You can use these functions to perform operations such as extracting key-value pairs, creating maps, and merging maps. | |
Process data of the STRUCT type. You can use these functions to perform operations such as expanding STRUCT arrays and creating STRUCTs. | |
Process data of the JSON type. You can use these functions to extract JSON field values, generate JSON objects or arrays, insert or update JSON data, and handle complex data structures. For the limits on JSON functions, see Limits on JSON functions. | |
Process table data of the STRING and BINARY types. You can use these functions to perform encryption and decryption. | |
Network functions | Process network-related data of the STRING and BINARY types. You can use these functions to convert IP address formats, parse URLs, and get network masks. |
Unstructured data processing functions | Connect to unstructured data and its metadata stored in data warehouses or data lakes in various ways. |
Provides other functions for various business scenarios. |
For a comparison of built-in functions in MaxCompute with those in Hive, MySQL, and Oracle, see Comparison of built-in functions in MaxCompute, Hive, MySQL, and Oracle.
Notes
Note the following when you use built-in functions:
The data types, number of input parameters, and format of a built-in function must meet the syntax requirements. Otherwise, MaxCompute cannot parse the function, and the SQL statement fails to run.
If an input parameter of a built-in function has a Data Type 2.0 data type, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, you must enable Data Type 2.0. Otherwise, an error is reported. To enable Data Type 2.0:
Session level: Add
SET odps.sql.type.system.odps2=true;before the SQL statement and submit them together. This setting is valid only for the current SQL statement.Project level: A project owner can configure the MaxCompute project. The configuration takes effect in 10 to 15 minutes. This setting is valid for all subsequent SQL statements.
SETPROJECT odps.sql.type.system.odps2=true;
When Data Type 2.0 is enabled for a MaxCompute project, some implicit type conversions are disabled. These conversions include STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. These conversions may cause precision loss or errors. To resolve this issue, you can use the CAST function to perform explicit conversions, or you can disable Data Type 2.0.
If a user-defined function (UDF) has the same name as a built-in function, the UDF overwrites the built-in function. For example, if a UDF named CONCAT exists in MaxCompute, the system calls the UDF by default instead of the built-in CONCAT function. If you want to call the built-in function, add
::before the function name. For example,SELECT ::CONCAT('ab', 'c');.If MaxCompute projects have different global properties, the results of built-in functions may be inconsistent. You can run the
SETPROJECT;command to view the global properties of a project.
Limits on JSON functions
The supported development tools include the MaxCompute client (odpscmd), MaxCompute Studio, and DataWorks. External ecosystems such as Dataphin are not supported. If you need to use JSON data types supported by MaxCompute in an external ecosystem, you must check whether the external ecosystem supports the JSON data types. The following table describes the items that you must take note of when you use the MaxCompute client (odpscmd) or MaxCompute Studio as the development tool for data of a JSON type.
MaxCompute client (odpscmd)
MaxCompute Studio
You must upgrade the MaxCompute client to V0.46.5 or later. Otherwise, you cannot execute the
desc json_tablestatement or download data of JSON types by using Tunnel.You must set the
use_instance_tunnelparameter to false. This parameter is included in the odps_config.ini file in the conf folder in the installation path of the client. Otherwise, an error is reported when you perform a query.
MaxCompute Studio allows you to query JSON data. MaxCompute Studio does not allow you to upload or download JSON data.
If you want to use another engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.
Columns of a JSON type cannot be added to a MaxCompute table.
You are not allowed to compare data of a JSON type with data of other types, execute SQL statements that contain the
ORDER BYorGROUP BYclause on data of a JSON type, or use columns of a JSON type asjoinkeys.For data of the JSON NUMBER type, the integer part is stored by using the BIGINT type, and the decimal part is stored by using the DOUBLE type. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow occurs. When the decimal part is converted into the DOUBLE type, precision loss occurs.
Strings that are used to generate JSON data cannot contain
\u0000, which is the null character in Unicode.Java UDFs and Python UDFs do not support JSON data types.
Clustered tables cannot store JSON data.
SDKs for Java in versions earlier than V0.44.0 and PyODPS in versions earlier than V0.11.4.1 do not support JSON data types
Delta tables do not support JSON data types.
The JSON data type can be used nested, supporting up to 20 levels of nesting.
Limits on string functions
The following functions support only English characters:
TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.
REVERSE: This function supports only English characters in the Hive-compatible data type edition.
SOUNDEX: This function supports only English characters.
TOLOWER: This function is used to convert English characters in a string into lowercase characters.
TOUPPER: This function is used to convert English characters in a string into uppercase characters.
INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.