All Products
Search
Document Center

MaxCompute:Overview

Last Updated:Jul 28, 2023

MaxCompute provides a large number of built-in functions to meet data processing requirements in most business scenarios. This topic describes the types of built-in functions that are provided by MaxCompute. This topic also describes how to use the built-in functions.

Background information

For more information about the precautions you must take into account when you use built-in functions provided by MaxCompute, see Precautions.

The following table describes the types of built-in functions that are provided by MaxCompute.

Type

Description

Date functions

Used to process data of a date type, such as DATE, DATETIME, or TIMESTAMP. For example, you can use these functions to add and subtract date values, calculate date value differences, extract date fields, obtain the current time, and convert date formats.

Mathematical functions

Used to process data of a numeric type, such as BIGINT, DOUBLE, DECIMAL, or FLOAT. For example, you can use these functions to convert numeral systems, perform mathematical operations, round values, and obtain random numbers.

Window functions

Used to process the data of columns in a window. For example, you can use these functions to calculate the sum, maximum value, minimum value, average value, and median value of column data, sort column data, obtain the data of columns at a given offset, and sample column data.

Aggregate functions

Used to aggregate multiple input records into an output value. For example, you can use these functions to calculate the sum, maximum value, minimum value, and average value of data, aggregate parameters, and concatenate strings.

String functions

Used to process data of the STRING type. For example, you can use these functions to truncate strings, replace strings, search for strings, convert uppercase and lowercase letters, and convert string formats.

Complex type functions

Used to process data of the MAP, ARRAY, STRUCT, or JSON type. For example, you can use these functions to deduplicate, aggregate, sort, and merge elements.

Encryption and decryption functions

Used to encrypt and decrypt data of the STRING or BINARY type in a table.

Other functions

Used to process data in other business scenarios.

For more information about typical cases, error codes, and FAQs of built-in functions that are provided by MaxCompute, see Fix the precision issue of the ROUND function, Implement capabilities provided by the GROUP_CONCAT function, Common errors for built-in functions, and FAQ about built-in functions.

Precautions

When you use built-in functions that are provided by MaxCompute, take note of the following items:

  • For a built-in function, the types and number of input parameters and function format must meet the function syntax requirements. If the function syntax requirements are not met, MaxCompute cannot parse the built-in function and an error may occur when you execute the SQL statement in which the built-in function is called.

  • If the input parameters of a built-in function are of a type that is supported by the MaxCompute V2.0 data type edition, you must enable the MaxCompute V2.0 data type edition. The data types supported by the MaxCompute V2.0 data type edition include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY. If you do not enable the MaxCompute V2.0 data type edition, an error may occur when you execute the SQL statement in which the built-in function is called. You can enable the MaxCompute V2.0 data type edition at the session or project level.

    • Session level: Add set odps.sql.type.system.odps2=true; before the SQL statement in which a built-in function is called. Then, commit and execute them together. This configuration is valid for only the current SQL statement.

    • Project level: The owner of a project can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. This configuration is valid for all the subsequent SQL statements.

      setproject odps.sql.type.system.odps2=true;
  • If you enable the MaxCompute V2.0 data type edition for a project, some implicit conversions are disabled, such as the conversions from STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. This may cause a loss of precision or errors. In this case, you can use the CAST function to forcefully convert the data types to resolve these issues. You can also disable the MaxCompute V2.0 data type edition.

  • If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol :: before the built-in function. For example, you can use select ::concat('ab', 'c');.

  • If the settings of global properties of MaxCompute projects are different, the execution results of built-in functions may be different. You can run the setproject; command to configure the global properties of a MaxCompute project.

For more information about the mappings between the built-in functions of MaxCompute and the built-in functions of open source systems, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.

Date functions

The following table describes date functions provided by MaxCompute SQL. You can select an appropriate date function based on your business requirements to complete date calculations and conversions.

Function

Description

DATEADD

Changes a date value based on the time unit specified by datepart and the interval specified by delta.

DATE_ADD

Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_ADD function is the inverse of the DATE_SUB function.

DATE_FORMAT

Converts a date value into a string in a specified format.

DATE_SUB

Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_SUB function is the inverse of the DATE_ADD function.

DATEDIFF

Calculates the difference between two date values based on the time unit specified by datepart.

DATEPART

Returns a specified component of a date value based on the time unit specified by datepart.

DATETRUNC

Truncates a date value based on the time unit specified by datepart.

FROM_UNIXTIME

Converts a UNIX timestamp of the BIGINT type into a date value of the DATETIME type.

GETDATE

Returns the current system time as a date value.

ISDATE

Determines whether a date string can be converted into a date value in a specified format.

LASTDAY

Returns the last day of the month in which a date value falls.

TO_DATE

Converts a string into a date value in a specified format.

TO_CHAR

Converts a date value into a string in a specified format.

UNIX_TIMESTAMP

Converts a date value into a UNIX timestamp that is an integer.

WEEKDAY

Returns a number that represents the day of the week in which a date value falls.

WEEKOFYEAR

Returns a number that represents the week of the year in which a date value falls.

ADD_MONTHS

Returns a date value that is obtained after a number of months are added to a specified date.

CURRENT_TIMESTAMP

Returns the current timestamp.

CURRENT_TIMEZONE

Returns the time zone of the current system.

DAY

Returns the day in which a date value falls.

DAYOFMONTH

Returns the day component of a date value.

DAYOFWEEK

Returns the day of the week in which a date value falls.

DAYOFYEAR

Returns the sequence number of the day in the year.

EXTRACT

Returns a specified component of a timestamp.

FROM_UTC_TIMESTAMP

Converts a UTC timestamp into a timestamp for a specified time zone.

HOUR

Returns the hour component of a date value.

LAST_DAY

Returns the last day of the month in which a date value falls.

MINUTE

Returns the minute component of a date value.

MONTH

Returns the month in which a date value falls.

MONTHS_BETWEEN

Returns the number of months between specified date values.

NEXT_DAY

Returns the date of the first weekday that is later than a date value and matches the specified week.

QUARTER

Returns the quarter in which a date value falls.

SECOND

Returns the second component of a date value.

TO_MILLIS

Converts a date value into a UNIX timestamp that is accurate to the millisecond.

YEAR

Returns the year in which a date value falls.

Mathematical functions

The following table describes mathematical functions that are provided by MaxCompute SQL for you to use during development. You can select mathematical functions based on your business requirements to compute data or convert data types.

Note

For more information about operators, such as the operator that is used to calculate remainders, see Arithmetic operators.

Function

Description

ABS

Calculates the absolute value.

ACOS

Calculates the arccosine.

ASIN

Calculates the arcsine.

ATAN

Calculates the arctangent.

ATAN2

Calculates the arctangent of expr1/expr2.

CEIL

Rounds up a number and returns the nearest integer.

CONV

Converts a number from one number system to another.

COS

Calculates the cosine.

COSH

Calculates the hyperbolic cosine.

COT

Calculates the cotangent.

EXP

Calculates the exponential value.

FLOOR

Rounds down a number and returns the nearest integer.

ISNAN

Checks whether the value of an expression is NaN.

LN

Calculates the natural logarithm.

LOG

Calculates the logarithm.

NEGATIVE

Returns the negative value of an expression.

POSITIVE

Returns the value of an expression.

POW

Calculates the nth power of a value.

RAND

Returns a random number.

ROUND

Returns a value rounded to the specified decimal place.

SIN

Calculates the sine.

SINH

Calculates the hyperbolic sine.

SQRT

Calculates the square root.

TAN

Calculates the tangent.

TANH

Calculates the hyperbolic tangent.

TRUNC

Truncates the input value to the specified decimal place.

BIN

Calculates the binary code.

CBRT

Calculates the cube root.

CORR

Calculates the Pearson correlation coefficient.

DEGREES

Converts a radian value into a degree.

E

Calculates the value of e.

FACTORIAL

Calculates the factorial.

FORMAT_NUMBER

Converts a number into a string in the specified format.

HEX

Converts an integer or a string into a hexadecimal number.

LOG2

Calculates the logarithm of a number with the base number of 2.

LOG10

Calculates the logarithm of a number with the base number of 10.

PI

Calculates the value of π.

RADIANS

Converts a degree into a radian value.

SIGN

Returns the sign of the input value.

SHIFTLEFT

Shifts a value left by a specific number of places.

SHIFTRIGHT

Shifts a value right by a specific number of places.

SHIFTRIGHTUNSIGNED

Shifts an unsigned value right by a specific number of places.

UNHEX

Converts a hexadecimal string into a string.

WIDTH_BUCKET

Returns the ID of the bucket into which the value of a specific expression falls.

Window functions

The following table describes window functions that are provided by MaxCompute SQL for you to flexibly analyze and process data of specific columns in a window.

Function

Description

ROW_NUMBER

Calculates the sequence number of a row. The row number starts from 1.

RANK

Calculates the rank of a row in an ordered group of rows. The ranks may not be consecutive integers.

DENSE_RANK

Calculates the rank of a row in an ordered group of rows. The ranks are consecutive integers.

PERCENT_RANK

Calculates the percentile rank of a row in an ordered group of rows.

CUME_DIST

Calculates the cumulative distribution of data in a partition.

NTILE

Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N.

LAG

Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window.

LEAD

Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window.

FIRST_VALUE

Obtains the calculated result of the first row of data in the window to which the current row belongs.

LAST_VALUE

Obtains the calculated result of the last row of data in the window to which the current row belongs.

NTH_VALUE

Obtains the calculated result of the Nth row of data in a window to which the current row belongs.

CLUSTER_SAMPLE

Samples random rows of data. If true is returned, the specified row of data is sampled.

COUNT

Calculates the number of rows in a window.

MIN

Calculates the minimum value in a window.

MAX

Calculates the maximum value in a window.

AVG

Calculates the average value of data in a window.

SUM

Calculates the sum of data in a window.

MEDIAN

Calculates the median in a window.

STDDEV

Returns the population standard deviation of all input values. This function is also called STDDEV_POP.

STDDEV_SAMP

Returns the sample standard deviation of all input values.

  • Syntax

    Syntax of window functions:

    <function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
    <function_name>([distinct][<expression> [, ...]]) over <window_name>
    • function_name: the name of a built-in window function, aggregate function, or user-defined aggregate function (UDAF).

    • expression: the format of a window function. The format is subject to the function syntax.

    • windowing_definition: the definition of a window. For more information about the syntax, see windowing_definition.

    • window_name: the name of a window. You can use the window keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:

      window <window_name> as (<window_definition>)

      Position of named_window_def in an SQL statement:

      select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
  • windowing_definition

    Syntax

    --partition_clause:
    [partition by <expression> [, ...]]
    --orderby_clause:
    [order by <expression> [asc|desc][nulls {first|last}] [, ...]]
    [<frame_clause>]

    If you use a window function in a SELECT statement, data is partitioned and sorted based on PARTITION BY and ORDER BY in windowing_definition when the window function is executed. If the SELECT statement does not include PARTITION BY, only one partition exists and the partition contains all data. If the SELECT statement does not include ORDER BY, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based on frame_clause in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.

    • partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of PARTITION BY, see Table operations.

    • order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.

      Note

      If the values of the column that is specified in order by are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified in order by are unique.

    • frame_clause: optional. This parameter is used to determine the data boundaries of a window. The frame_clause section in this topic provides details about this parameter.

  • frame_clause

    Syntax

    -- Syntax 1 
    {ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
    -- Syntax 2 
    {ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]

    frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.

    • ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause. Take note of the following points:

      • ROWS: The data boundaries of a window are determined based on the number of rows.

      • RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in order by. In most cases, order by is specified in windowing_definition. If order by is not specified in windowing_definition, the values of the column that is specified in order by are the same for all rows in a partition. NULL values are considered equivalent.

      • GROUPS: In a partition, rows that have the same value of the column specified in order by form a group. If order by is not specified, all rows in the partition form a group. NULL values are considered equivalent.

    • frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.

      The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in order by of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.

      frame_clause type

      Value of frame_start or frame_end

      Description

      ROWS, RANGE, and GROUPS

      UNBOUNDED PRECEDING

      Indicates the first row of a partition. Rows are counted from 1.

      UNBOUNDED FOLLOWING

      Indicates the last row of a partition.

      ROWS

      CURRENT ROW

      Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function.

      offset PRECEDING

      Indicates the Nth row that precedes the current row at a given offset. For example, if 0 PRECEDING indicates the current row, 1 PRECEDING indicates the previous row. offset must be a non-negative integer.

      offset FOLLOWING

      Indicates the Nth row that follows the current row at a given offset. For example, if 0 FOLLOWING indicates the current row, 1 FOLLOWING indicates the next row. offset must be a non-negative integer.

      RANGE

      CURRENT ROW

      • If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the column specified in order by as the current row.

      • If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column specified in order by as the current row.

      offset PRECEDING

      The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:

      • order by is set to asc:

        • frame_start indicates the first row that meets the following requirement: Xc - Xi ≤ offset.

        • frame_end indicates the last row that meets the following requirement: Xc - Xi ≥ offset.

      • order by is set to desc:

        • frame_start indicates the first row that meets the following requirement: Xi - Xc ≤ offset.

        • frame_end indicates the last row that meets the following requirement: Xi - Xc ≥ offset.

      The column that is specified in order by can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.

      Syntax for offset of the DATE type:

      • N: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, it indicates N seconds. For an offset of the DATE type, it indicates N days.

      • interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}: indicates N years, months, days, hours, minutes, or seconds. For example, INTERVAL '3' YEAR indicates 3 years.

      • INTERVAL 'N-M' YEAR TO MONTH: indicates N years and M months. For example, INTERVAL '1-3' YEAR TO MONTH indicates 1 year and 3 months.

      • INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example, INTERVAL '1 2:3:4:5' DAY TO SECOND indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.

      offset FOLLOWING

      The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by order by. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:

      • order by is set to asc:

        • frame_start indicates the first row that meets the following requirement: Xi - Xc ≥ offset.

        • frame_end indicates the last row that meets the following requirement: Xi - Xc ≤ offset.

      • order by is set to desc:

        • frame_start indicates the first row that meets the following requirement: Xc - Xi >= offset.

        • frame_end indicates the last row that meets the following requirement: Xc - Xi <= offset.

      GROUPS

      CURRENT ROW

      • If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs.

      • If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.

      offset PRECEDING

      • If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes the group of the current row at a given offset.

      • If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes the group of the current row at a given offset.

      Note

      You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.

      offset FOLLOWING

      • If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows the group of the current row at a given offset.

      • If frame_end is set to offset FOLLOWING, it indicates the last row of the Nth group that follows the group of the current row at a given offset.

      Note

      You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.

    • frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:

      • EXCLUDE NO OTHERS: No rows are excluded from the window.

      • EXCLUDE CURRENT ROW: The current row is excluded from the window.

      • EXCLUDE GROUP: indicates that an entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in order by as the current row.

      • EXCLUDE TIES: An entire group of rows, except for the current row, are excluded from the window.

    Default frame_clause

    If you do not specify frame_clause, MaxCompute uses the default frame_clause to determine the data boundaries of a window. Values of the default frame_clause:

    • If odps.sql.hive.compatible is set to true, the following default frame_clause is used. This rule applies to most SQL systems.

      RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
    • If odps.sql.hive.compatible is set to false, order by is specified in windowing_definition, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STEDEV_POP, STDDEV_SAMP, and SUM.

      ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS

    Example of data boundaries of a window

    In this example, a table named tbl contains three columns that are of the BIGINT type: pid, oid, and rid. The tbl table contains the following data:

    +------------+------------+------------+
    | pid        | oid        | rid        |
    +------------+------------+------------+
    | 1          | NULL       | 1          |
    | 1          | NULL       | 2          |
    | 1          | 1          | 3          |
    | 1          | 1          | 4          |
    | 1          | 2          | 5          |
    | 1          | 4          | 6          |
    | 1          | 7          | 7          |
    | 1          | 11         | 8          |
    | 2          | NULL       | 9          |
    | 2          | NULL       | 10         |
    +------------+------------+------------+

    You can replace ellipses (...) in the following SQL statements with windowing_definition to display the data in the windows in which each row of data is included.

    Note

    If a value in the window column in the returned result is NULL, no data is contained in the window.

    • Windows in ROWS mode

      • windowing_definition 1

        partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1]    |
        | 1          | NULL       | 2          | [1, 2] |
        | 1          | 1          | 3          | [1, 2, 3] |
        | 1          | 1          | 4          | [1, 2, 3, 4] |
        | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 2          | NULL       | 9          | [9]    |
        | 2          | NULL       | 10         | [9, 10] |
        +------------+------------+------------+--------+
      • windowing_definition 2

        partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 1          | 3          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 1          | 4          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 2          | 5          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 2          | NULL       | 9          | [9, 10] |
        | 2          | NULL       | 10         | [9, 10] |
        +------------+------------+------------+--------+
      • windowing_definition 3

        partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [2, 3, 4] |
        | 1          | NULL       | 2          | [3, 4, 5] |
        | 1          | 1          | 3          | [4, 5, 6] |
        | 1          | 1          | 4          | [5, 6, 7] |
        | 1          | 2          | 5          | [6, 7, 8] |
        | 1          | 4          | 6          | [7, 8] |
        | 1          | 7          | 7          | [8]    |
        | 1          | 11         | 8          | NULL   |
        | 2          | NULL       | 9          | [10]   |
        | 2          | NULL       | 10         | NULL   |
        +------------+------------+------------+--------+
      • windowing_definition 4

        partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW
        -- Sample SQL statement: 
        select pid, 
        oid, 
        rid, 
        collect_list(rid) over(partition by pid order by 
        oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | NULL   |
        | 1          | NULL       | 2          | [1]    |
        | 1          | 1          | 3          | [1, 2] |
        | 1          | 1          | 4          | [1, 2, 3] |
        | 1          | 2          | 5          | [1, 2, 3, 4] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
        | 2          | NULL       | 9          | NULL   |
        | 2          | NULL       | 10         | [9]    |
        +------------+------------+------------+--------+
      • windowing_definition 5

        partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | NULL   |
        | 1          | NULL       | 2          | NULL   |
        | 1          | 1          | 3          | [1, 2] |
        | 1          | 1          | 4          | [1, 2] |
        | 1          | 2          | 5          | [1, 2, 3, 4] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7] |
        | 2          | NULL       | 9          | NULL   |
        | 2          | NULL       | 10         | NULL   |
        +------------+------------+------------+--------+
      • windowing_definition 6

        partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;                            

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1]    |
        | 1          | NULL       | 2          | [2]    |
        | 1          | 1          | 3          | [1, 2, 3] |
        | 1          | 1          | 4          | [1, 2, 4] |
        | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 2          | NULL       | 9          | [9]    |
        | 2          | NULL       | 10         | [10]   |
        +------------+------------+------------+--------+

        The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the window column values of rows with the rid column values of 2, 4, and 10 in windowing_definition 5 and windowing_definition 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the same pid column value in a partition are extracted when the rows have the same oid column value as the current row.

    • Windows in RANGE mode

      • windowing_definition 1

        partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1, 2] |
        | 1          | NULL       | 2          | [1, 2] |
        | 1          | 1          | 3          | [1, 2, 3, 4] |
        | 1          | 1          | 4          | [1, 2, 3, 4] |
        | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
        | 1          | 4          | 6          | [1, 2, 3, 4, 5, 6] |
        | 1          | 7          | 7          | [1, 2, 3, 4, 5, 6, 7] |
        | 1          | 11         | 8          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 2          | NULL       | 9          | [9, 10] |
        | 2          | NULL       | 10         | [9, 10] |
        +------------+------------+------------+--------+

        If frame_end is set to CURRENT ROW, the last row that has the same value of the oid column in order by as the current row is obtained. Therefore, the window column value of the row whose rid column value is 1 is [1, 2].

      • windowing_definition 2

        partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING
        -- Sample SQL statement: 
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | NULL       | 2          | [1, 2, 3, 4, 5, 6, 7, 8] |
        | 1          | 1          | 3          | [3, 4, 5, 6, 7, 8] |
        | 1          | 1          | 4          | [3, 4, 5, 6, 7, 8] |
        | 1          | 2          | 5          | [5, 6, 7, 8] |
        | 1          | 4          | 6          | [6, 7, 8] |
        | 1          | 7          | 7          | [7, 8] |
        | 1          | 11         | 8          | [8]    |
        | 2          | NULL       | 9          | [9, 10] |
        | 2          | NULL       | 10         | [9, 10] |
        +------------+------------+------------+--------+
      • windowing_definition 3

        partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING
        -- Sample SQL statement: 
        
        select pid, 
               oid, 
               rid, 
        collect_list(rid) over(partition by pid order by 
        oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;

        The following result is returned:

        +------------+------------+------------+--------+
        | pid        | oid        | rid        | window |
        +------------+------------+------------+--------+
        | 1          | NULL       | 1          | [1, 2] |
        | 1          | NULL       | 2          | [1, 2] |
        | 1          | 1          | 3          | NULL   |
        | 1          | 1          | 4          | NULL   |
        | 1          | 2          | 5          | [3, 4] |
        | 1          | 4          | 6          | [3, 4, 5] |
        | 1          | 7          | 7          | [6]    |
        | 1          | 11         | 8          | NULL   |
        | 2          | NULL       | 9          | [9, 10] |
        | 2          | NULL       | 10         | [9, 10] |
        +------------+------------+------------+--------+

        For the row whose value of oid in order by is NULL, if frame_start is set to offset PRECEDING or offset FOLLOWING, the row is the first row whose value of oid in order by is NULL only if the offset is not unbounded. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid in order by is NULL only if the offset is not unbounded.

    • Windows in GROUPS mode

      windowing_definition

      partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW
      -- Sample SQL statement: 
      select pid, 
             oid, 
             rid, 
      collect_list(rid) over(partition by pid order by 
      oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;

      The following result is returned:

      +------------+------------+------------+--------+
      | pid        | oid        | rid        | window |
      +------------+------------+------------+--------+
      | 1          | NULL       | 1          | [1, 2] |
      | 1          | NULL       | 2          | [1, 2] |
      | 1          | 1          | 3          | [1, 2, 3, 4] |
      | 1          | 1          | 4          | [1, 2, 3, 4] |
      | 1          | 2          | 5          | [1, 2, 3, 4, 5] |
      | 1          | 4          | 6          | [3, 4, 5, 6] |
      | 1          | 7          | 7          | [5, 6, 7] |
      | 1          | 11         | 8          | [6, 7, 8] |
      | 2          | NULL       | 9          | [9, 10] |
      | 2          | NULL       | 10         | [9, 10] |
      +------------+------------+------------+--------+

Aggregate functions

The following table describes the aggregate functions that are supported by MaxCompute SQL.

Function

Description

AVG

Returns the average value of a column.

COUNT

Returns the number of records that match the specified criteria.

COUNT_IF

Returns the number of records whose expr value is True.

MAX

Returns the maximum value of a column.

MIN

Returns the minimum value of a column.

MEDIAN

Returns the median value of a column.

STDDEV

Returns the population standard deviation of all input values.

STDDEV_SAMP

Returns the sample standard deviation of all input values.

SUM

Returns the sum of a column.

WM_CONCAT

Concatenates strings with a specified delimiter.

ANY_VALUE

Returns a random value from a specified column.

APPROX_DISTINCT

Returns the approximate number of distinct input values in a specified column.

ARG_MAX

Returns the column value of the row that corresponds to the maximum value of a specified column.

ARG_MIN

Returns the column value of the row that corresponds to the minimum value of a specified column.

MAX_BY

Returns the column value of the row that corresponds to the maximum value of a specified column.

MIN_BY

Returns the column value of the row that corresponds to the minimum value of a specified column.

COLLECT_LIST

Aggregates values from a specified column into an array.

COLLECT_SET

Aggregates only distinct values from a specified column into an array.

NUMERIC_HISTOGRAM

Returns the approximate histogram of a specified column.

PERCENTILE_APPROX

Returns approximate percentiles. This function applies to scenarios in which a large amount of data is calculated.

BITWISE_OR_AGG

Aggregates input values based on the bitwise OR operation.

BITWISE_AND_AGG

Aggregates input values based on the bitwise AND operation.

MAP_AGG

Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map.

MULTIMAP_AGG

Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map.

MAP_UNION

Returns a new map that is the union of all input maps.

MAP_UNION_SUM

Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps.

HISTOGRAM

Returns a map that contains the number of times each input value appears.

  • Syntax

    Syntax of an aggregate function:

    <aggregate_name>(<expression>[,..]) [within group (order by <col1>[,<col2>…])] [filter (where <where_condition>)]
    • <aggregate_name>(<expression>[,..]): a built-in aggregate function or a UDAF. The format of an aggregate function is based on its syntax.

    • within group (order by <col1>[,<col2>…]): If the syntax of an aggregate function includes this expression, the system automatically sorts the input data of <col1>[,<col2>…] in ascending order. To sort the input data in descending order, use the expression within group (order by <col1>[,<col2>…] [desc]).

      Before you use this expression, take note of the following points:

      • You can use this expression only for WM_CONCAT, COLLECT_LIST, COLLECT_SET, and UDAFs.

      • If multiple aggregate functions of a SELECT statement include the expression within group (order by <col1>[,<col2>…]), order by <col1>[,<col2>…] must be the same for these functions.

      • If the parameters of an aggregate function include the DISTINCT keyword, columns with distinct values must be specified in the expression order by <col1>[,<col2>…].

      Examples:

      -- Example 1: Sort the input data in ascending order and return the output data. 
      select 
        x,
        wm_concat(',', y) within group (order by y)
      from values('k', 1),('k', 3),('k', 2) as t(x, y)
      group by x;
      -- The following result is returned: 
      +------------+------------+
      | x          | _c1        |
      +------------+------------+
      | k          | 1,2,3      |
      +------------+------------+
      
      -- Example 2: Sort the input data in descending order and return the output data. 
      select 
        x,
        wm_concat(',', y) within group (order by y desc)
      from values('k', 1),('k', 3),('k', 2) as t(x, y)
      group by x;
      -- The following result is returned: 
      +------------+------------+
      | x          | _c1        |
      +------------+------------+
      | k          | 3,2,1      |
      +------------+------------+
    • [filter (where <where_condition>)]: If an aggregate function includes this expression, the aggregate function processes only the data that meets the condition specified by <where_condition>. For more information about <where_condition>, see WHERE clause (where_condition).

      Before you use this expression, take note of the following points:

      • Only built-in aggregate functions support this expression. UDAFs do not support this expression.

      • count(*) does not support the [filter (where <where_condition>)] expression. To add filter conditions to count(*), you can use the COUNT_IF function.

      • The COUNT_IF function does not support this expression.

      Examples:

      -- Example 1: Filter and aggregate data. 
      select
        sum(x),
        sum(x) filter (where y > 1),
        sum(x) filter (where y > 2)
        from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y);
      -- The following result is returned: 
      +------------+------------+------------+
      | _c0        | _c1        | _c2        |
      +------------+------------+------------+
      | 6          | 3          | 2          |
      +------------+------------+------------+
      
      -- Example 2: Use multiple aggregate functions to filter and aggregate data. 
      select
        count_if(x > 2),
        sum(x) filter (where y > 1),
        sum(x) filter (where y > 2)
        from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y);
      -- The following result is returned: 
      +------------+------------+------------+
      | _c0        | _c1        | _c2        |
      +------------+------------+------------+
      | 1          | 3          | 2          |
      +------------+------------+------------+
  • Filter expressions

    • Limits

      • Only built-in aggregate functions of MaxCompute support filter expressions. UDAFs do not support filter expressions.

      • COUNT(*) cannot be used with filter expressions. Use the COUNT_IF function with filter expressions.

    • Syntax

      <aggregate_name>(<expression>[,...]) [filter (where <where_condition>)]
    • Description

      All aggregate functions support filter expressions. If you specify a filter condition, only the row data that meets the filter condition can be passed to the related aggregate function for data processing.

    • Parameters

      • aggregate_name: required. The name of the aggregate function. Select an aggregate function that is described in this topic based on your business requirements.

      • expression: required. The parameters of the aggregate function that you select. Specify this parameter based on the description of the aggregate function that you select.

      • where_condition: optional. The filter condition. For more information about where_condition, see WHERE clause (where_condition).

    • Return value

      For more information, see the description of the return value for each aggregate function.

    • Sample statement:

      select sum(sal) filter (where deptno=10), sum(sal) filter (where deptno=20), sum(sal) filter (where deptno=30) from emp;

      The following result is returned:

      +------------+------------+------------+
      | _c0        | _c1        | _c2        |
      +------------+------------+------------+
      | 17500      | 10875      | 9400       |
      +------------+------------+------------+

String functions

The following table describes the string functions that are supported by MaxCompute SQL. For more information about the limits on string functions, see Limits on string functions.

Function

Description

ASCII

Returns the ASCII code of the first character in a specified string.

CHAR_MATCHCOUNT

Calculates the number of characters of String A that appear in String B.

CHR

Converts an ASCII code into a character.

CONCAT

Concatenates all the specified strings and returns the final string.

CONCAT_WS

Concatenates all input strings in an array by using a specified delimiter.

DECODE

Decodes a string in the specified encoding format.

ENCODE

Encodes a string in the specified encoding format.

FIND_IN_SET

Returns the position of the specified string among multiple strings that are separated by commas (,).

FORMAT_NUMBER

Converts a number into a string in the specified format.

FROM_JSON

Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format.

GET_JSON_OBJECT

Extracts a single string from a standard JSON string by using a specific method.

INITCAP

Converts a string into a string with words in title caps and separated by spaces. In title caps, the first letter of each word is capitalized, and the other letters are in lowercase.

INSTR

Returns the position of String A in String B.

IS_ENCODING

Determines whether a string can be converted from one character set to another character set.

KEYVALUE

Splits a string into key-value pairs, separates the key-value pairs, and then returns the value that corresponds to a specified key.

KEYVALUE_TUPLE

Splits a string into key-value pairs, separates the key-value pairs, and then returns the values that correspond to the keys.

LENGTH

Calculates the length of a string.

LENGTHB

Calculates the length of a string in bytes.

LOCATE

Returns the position of a specified string in another string.

LTRIM

Removes the characters from the left side of a string.

MD5

Returns the MD5 value of a string.

PARSE_URL

Parses a URL and returns the specified part of the URL.

PARSE_URL_TUPLE

Parses a URL and returns multiple parts of the URL.

REGEXP_COUNT

Returns the number of substrings that match a specified pattern from a specified position.

REGEXP_EXTRACT

Splits a string into groups based on a specified pattern and returns the string in a specified group.

REGEXP_INSTR

Returns the start or end position of a substring that starts at a specified position and matches a specified pattern for a specified number of times.

REGEXP_REPLACE

Uses a string to replace a substring of another string if the substring matches a specified pattern for a specified number of times.

REGEXP_SUBSTR

Returns a substring in a string that matches a specified pattern for a specified number of times from a specified position.

REPEAT

Returns a string that repeats a specified string for a specified number of times.

REVERSE

Returns a string in reverse order.

RTRIM

Removes the characters from the right side of a string.

SPACE

Generates a space string.

SPLIT_PART

Uses a delimiter to split a string into substrings and returns a substring of the specified part of the string.

SUBSTR

Returns a substring that has a specified length from a specified position of a string. The string is of the STRING type.

SUBSTRING

Returns a substring that has a specified length from a specified position of a string. The string is of the STRING or BINARY type.

TO_CHAR

Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type.

TO_JSON

Converts data of a complex data type into a JSON string.

TOLOWER

Converts uppercase letters in a string into lowercase letters.

TOUPPER

Converts lowercase letters in a string into uppercase letters.

TRIM

Removes the characters from both of the left and right sides of a string.

URL_DECODE

Converts an input string that is in the application/x-www-form-urlencoded MIME format into a standard string.

URL_ENCODE

Encodes the input string in the application/x-www-form-urlencoded MIME format and returns the encoded string.

JSON_TUPLE

Extracts strings from a standard JSON string based on a set of input keys.

LPAD

Left pads a string to a specified length.

RPAD

Right pads a string to a specified length.

REPLACE

Replaces a substring in String A that matches String B with another substring.

SOUNDEX

Converts a string into a string of the SOUNDEX type.

SUBSTRING_INDEX

Truncates a string from a specified delimiter.

TRANSLATE

Replaces the part of String A that appears in String B with String C.

REGEXP_EXTRACT_ALL

Finds all substrings that match the pattern of a regular expression in a string and returns the substrings as an array.

Complex type functions

The following table describes the complex type functions that are supported by MaxCompute SQL. For more information about the limits on JSON functions, see Limits on JSON functions.

Function type

Function

Description

ARRAY functions

ALL_MATCH

Checks whether all elements in an array meet a specific condition.

ANY_MATCH

Checks whether an element in an array meets a specific condition.

ARRAY

Creates an array based on given values.

ARRAY_CONTAINS

Checks whether an array contains a given value.

ARRAY_DISTINCT

Removes duplicate elements from an array.

ARRAY_EXCEPT

Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates.

ARRAY_INTERSECT

Calculates the intersection of two arrays.

ARRAY_JOIN

Concatenates the elements in an array by using a delimiter.

ARRAY_MAX

Returns the largest element in an array.

ARRAY_MIN

Returns the smallest element in an array.

ARRAY_POSITION

Returns the position of the first occurrence of a given element in an array.

ARRAY_REDUCE

Aggregates the elements in an array.

ARRAY_REMOVE

Removes a given element from an array.

ARRAY_REPEAT

Returns a new array in which a given element is repeated for a specified number of times.

ARRAY_SORT

Sorts the elements in an array based on a comparator.

ARRAY_UNION

Calculates the union of two arrays and returns the union as a new array without duplicates.

ARRAYS_OVERLAP

Checks whether two arrays contain the same element.

ARRAYS_ZIP

Merges multiple arrays.

CONCAT

Concatenates multiple arrays or strings.

EXPLODE

Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF).

FILTER

Filters the elements in an array.

INDEX

Returns the element at a specific position in an array.

POSEXPLODE

Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.

SIZE

Returns the number of elements in an array.

SLICE

Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array.

SORT_ARRAY

Sorts the elements in an array.

TRANSFORM

Transforms the elements in an array.

ZIP_WITH

Merges two arrays at the element level based on element positions and returns a new array.

MAP functions

EXPLODE

Transposes one row of data into multiple rows. This function is a UDTF.

INDEX

Returns the value that meets a specific condition in a map.

MAP

Creates a map based on given key-value pairs.

MAP_CONCAT

Returns the union of multiple maps.

MAP_ENTRIES

Converts key-value pairs in a map into a struct array.

MAP_FILTER

Filters the elements in a map.

MAP_FROM_ARRAYS

Creates a map based on given arrays.

MAP_FROM_ENTRIES

Creates a map based on given struct arrays.

MAP_KEYS

Returns all keys in a map as an array.

MAP_VALUES

Returns all values in a map as an array.

MAP_ZIP_WITH

Merges two given maps into a single map.

SIZE

Returns the number of key-value pairs in a map.

TRANSFORM_KEYS

Transforms the keys in a map by using a given function. The values in the map are not changed.

TRANSFORM_VALUES

Transforms the values in a map by using a given function. The keys in the map are not changed.

STRUCT functions

FIELD

Obtains the value of a member variable in a struct.

INLINE

Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

STRUCT

Creates a struct based on a given value list.

NAMED_STRUCT

Creates a struct based on given name-value pairs.

JSON functions

FROM_JSON

Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format.

GET_JSON_OBJECT

Extracts a single string from a standard JSON string by using a specific method.

JSON_TUPLE

Extracts strings from a standard JSON string based on a set of input keys.

TO_JSON

Converts data of a complex data type into a JSON string.

JSON_OBJECT

Returns a JSON object that contains key-value pairs.

JSON_ARRAY

Evaluates a possibly empty list of values and returns a JSON array that contains these values.

JSON_EXTRACT

Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid.

JSON_EXISTS

Determines whether the JSON value of json_path exists.

JSON_PRETTY

Returns a JSON value in a format that is easy to read by adding line breaks and spaces.

JSON_TYPE

Returns the data type of a JSON value.

JSON_FORMAT

Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified.

JSON_PARSE

Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned.

JSON_VALID

Determines whether a string is in a valid JSON format.

CAST

Supports conversion between basic data types and JSON data types.

Encryption and decryption functions

The following table describes the encryption function and decryption function that are supported by MaxCompute SQL.

Function

Description

SYM_ENCRYPT

Encrypts data in specified columns of a table by using a random key and returns the ciphertext of the BINARY type.

SYM_DECRYPT

Decrypts the encrypted data in the specified columns and returns the plaintext of the BINARY type.

Other functions

The following table describes other types of functions that are supported by MaxCompute SQL.

Function

Description

BASE64

Converts a binary value into a Base64-encoded string.

BETWEEN AND

Returns the values that fall in or fall out of the specified range.

CASE WHEN

Returns values based on the computing result of an expression.

CAST

Converts the result of an expression into the specified data type.

COALESCE

Returns the first non-null value in the parameter list.

COMPRESS

Uses the GZIP algorithm to compress input parameters of the STRING or BINARY type.

CRC32

Calculates the cyclic redundancy check (CRC) value of a value that is of the STRING or BINARY type.

DECODE

Implements the IF-THEN-ELSE logic.

DECOMPRESS

Uses the GZIP algorithm to decompress input parameters of the BINARY type.

GET_IDCARD_AGE

Returns an age in years based on the ID card number.

GET_IDCARD_BIRTHDAY

Returns the date of birth based on the ID card number.

GET_IDCARD_SEX

Returns the gender based on the ID card number.

GET_USER_ID

Obtains the ID of the current account.

GREATEST

Returns the maximum value of the input parameters.

HASH

Calculates a hash value based on the input parameters.

IF

Checks whether a specified condition is true.

LEAST

Returns the minimum value of the input parameters.

MAX_PT

Returns the name of the largest level-1 partition in a partitioned table.

NULLIF

Checks whether the values of two input parameters are the same.

NVL

Specifies the return values of the parameters whose values are null.

ORDINAL

Sorts the values of the input variables in ascending order and returns the value that is ranked at a specified position.

PARTITION_EXISTS

Checks whether a specified partition exists in a table.

SAMPLE

Samples all column values that are read and filters out the rows that do not meet sampling conditions.

SHA

Calculates the SHA-1 hash value of a value that is of the STRING or BINARY type.

SHA1

Calculates the SHA-1 hash value of a value that is of the STRING or BINARY type.

SHA2

Calculates the SHA-2 hash value of a value that is of the STRING or BINARY type.

SIGN

Determines the sign of a value. The sign indicates whether a value is positive or negative.

SPLIT

Splits a string with a specified delimiter and returns an array.

STACK

Splits a specified parameter group into a specified number of rows.

STR_TO_MAP

Splits a string with a specified delimiter and returns a key-value pair.

TABLE_EXISTS

Checks whether a specified table exists.

TRANS_ARRAY

Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows.

TRANS_COLS

Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows.

UNBASE64

Converts a Base64-encoded string into a binary value.

UNIQUE_ID

Returns a unique ID. This function is more efficient than the UUID function.

UUID

Returns a random ID.

Limits on JSON functions

  • The development tools that are supported for the JSON type include odpscmd and MaxCompute Studio. External ecosystems such as DataWorks and Dataphin are not supported. If you need to use the JSON type together with an external ecosystem, you must check whether the external ecosystem is supported for the JSON type. The following table describes the items that you must take note of when you use odpscmd and MaxCompute Studio as development tools for the JSON type.

    odpscmd

    MaxCompute Studio

    • You must upgrade the client to the latest version. Otherwise, the desc json_table command cannot be used.

    • You must set the se_instance_tunnel parameter in the conf\odps_config.ini file in the installation path of the client to false. Otherwise, an error is reported when you perform a query.

    MaxCompute Studio allows you only to query JSON data. It does not allow you to upload or download JSON data.

  • If you want to use another type of engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.

  • Columns of the JSON type cannot be added to a MaxCompute table.

  • You are not allowed to compare data of the JSON type with data of other types, execute SQL statements that contain the ORDER BY or GROUP BY clause on data of the JSON type, or use columns of the JSON type as join keys.

  • The integer part and decimal part of a value of the JSON NUMBER type are stored by using the BIGINT type and the DOUBLE type separately. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow error occurs. Conversion of the decimal part to the DOUBLE type leads to precision loss.

  • Strings that can be used to generate JSON data do not include \u0000, which is the Unicode character for NULL.

  • Java UDFs and Python UDFs do not support the JSON type.

Limits on string functions

The following functions support only English characters:

  • TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.

  • REVERSE: This function supports only English characters in the Hive-compatible data type edition.

  • SOUNDEX: This function supports only English characters.

  • TOLOWER: This function is used to convert English characters in a string into lowercase characters.

  • TOUPPER: This function is used to convert English characters in a string into uppercase characters.

  • INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.