SQL functions allow you to reference SQL user-defined functions (UDFs) in SQL scripts. This topic describes how to manage SQL functions.

Background

MaxCompute allows you to create UDFs by using SQL scripts in addition to the Java or Python code. SQL functions support more function type parameters and anonymous function parameters. SQL functions allow you to define business logic that suits your business requirements. SQL functions can be used to implement simple features and improve the code reuse rate. SQL functions provide the following features:
  • Allows you to reference and call SQL UDFs in SQL scripts.
  • Allows you to pass parameters of the function type to built-in functions, UDFs, or SQL functions when you call SQL functions.
  • Allows you to pass parameters of the anonymous function type to anonymous functions when you call SQL functions.
You can use SQL functions of MaxCompute to solve the following issues:
  • In most cases, a large amount of similar code exists, which is inconvenient to maintain and prone to errors. If you want to use UDFs, you must develop code, compile the code in Java, and then create resources and functions. The process is complex. In addition, UDFs cannot catch up with built-in functions in terms of performance. The following code provides an example:
    select
        nvl(str_to_map(get_json_object(col, '$.key1')), 'default') as key1,
        nvl(str_to_map(get_json_object(col, '$.key2')), 'default') as key2,
        ...
        nvl(str_to_map(get_json_object(col, '$.keyN')), 'default') as keyN
    from t;
  • One function can be passed to another function as a parameter by using code that is similar to lambda expressions in Java.

Remarks

When you call an SQL function, make sure that the type of the read data meets the requirements of SQL functions for the data type. Otherwise, the function may fail to be parsed.

When you create, query, call, and delete SQL UDFs, make sure that you have permissions to perform operations on the UDFs. For more information about function permissions and authorization, see Authorize users.

Create permanent SQL functions

After you create permanent SQL functions and store them in the metadata system, you can call these functions for all queries. For more information about calling these functions, see Call an SQL function. Command syntax:
create sql function <function_name>(@<parameter_in1> <datatype>[, @<parameter_in2> <datatype>...]) 
[returns @<parameter_out> <datatype>] 
as [begin] 
<function_expression> 
[end];
  • function_name: required. The name of the SQL function that you want to create. Each function name must be unique and can be registered only once. The SQL function names cannot be the same as those of built-in functions. You can run the LIST FUNCTIONS; command to view all the functions in a project.
  • parameter_in: required. The input parameters of the SQL function that you want to create. Function type parameters and anonymous function parameters are supported. For information about function type parameters, see Function type parameters. For information about anonymous function parameters, see Anonymous function parameters.
  • datatype: required. The data type of the parameters.
  • returns: required. The return value of the function. It is a variable. If you do not specify the returns parameter, the value of function_name is returned by default.
  • parameter_out: required. The response parameter of the function.
  • function_expression: required. The function expression.
The following statement provides an example:
create sql function my_add(@a BIGINT) as @a + 1;
In the preceding example, @a + 1 indicates the logic of the SQL function. You can write it as an expression. The expression can be a built-in operator, built-in function, or UDF.
If the function logic is complex, you can write multiple SQL statements and enclose them with BEGIN and END. returns specifies the return value of the function. If you do not specify this parameter, the value of the function_name parameter is returned by default. The following statement provides an example:
create sql function my_sum(@a BIGINT, @b BIGINT, @c BIGINT) returns @my_sum BIGINT
as begin 
    @temp := @a + @b;
    @my_sum := @temp + @c;
end;
Note To write multiple SQL statements, you must use SQL in script mode. For more information, see Run MaxCompute SQL in script mode.

Create temporary SQL functions

If you do not need to store SQL functions in the metadata system of MaxCompute, you can create temporary SQL functions. These functions apply only to the current SQL script. The following statement provides an example:
function <function_name>(@<parameter_in1> <datatype>[, @<parameter_in2> <datatype>...]) 
[returns @<parameter_out> <datatype>] 
as [begin] 
<function_expression> 
[end];
For more information about parameters in this statement, see Create permanent SQL functions.
Note To write multiple SQL statements, you must use SQL in script mode. For more information, see Run MaxCompute SQL in script mode.
The following statement provides an example:
function my_add(@a BIGINT) as @a + 1;

Query the information of an SQL function

You can query an SQL function in the same way that you query a Java or Python UDF. The following statement provides an example:
desc function <function_name>;
function_name: the name of the created SQL function.
The following statement provides an example:
desc function my_add;
The following result is returned:
Name                                    my_add
Owner                                   ALIYUN$santie_doctest@test.aliyunid.com
Created Time                            2021-05-08 11:26:02
SQL Definition Text                     CREATE SQL FUNCTION MY_ADD(@a BIGINT) AS @a + 1
Note If you want to query an SQL function on the MaxCompute client, make sure that the client version is later than 0.34.0. For more information about how to view the version of the MaxCompute client and perform operations on the client, see MaxCompute client.

Delete an SQL function

You can delete an SQL function in the same way that you delete a Java or Python UDF. Syntax:
drop function <function_name>;
function_name: the name of the created SQL function.
The following statement provides an example:
drop function my_add;

Call an SQL function

You can call an SQL function in the same way that you call a built-in function. The following statement provides an example:
select <function_name>(<column_name>[,...]) from <table_name>;
  • function_name: the name of the created SQL function.
  • column_name: the column name of the table from which you want to query data. The data type of the column must be the same as the data type defined by the SQL function.
  • table_name: the name of the table from which you want to query data.
The following statement provides an example:
-- Create a table named src. 
create table src (c bigint, d string);
insert into table src values (1,100.1),(2,100.2),(3,100.3);
-- Call the my_add function. 
select my_add(c) from src;
-- The following result is returned. 
+------------+
| _c0        |
+------------+
| 2          |
| 3          |
| 4          |
+------------+

Function type parameters

When you call an SQL function, you can pass in built-in functions, UDFs, or other SQL functions. The following statement provides an example:
function add(@a BIGINT) as @a + 1;
function op(@a BIGINT, @fun function (BIGINT) returns BIGINT) as @fun(@a);
select op(key, add), op(key, abs) from values (1),(2) as t (key);

-- The following result is returned. 
+------------+------------+
| _c0        | _c1        |
+------------+------------+
| 2          | 1          |
| 3          | 2          |
+------------+------------+
In the example, two input parameters are specified for the OP function. @a specifies a value of the BIGINT type. @fun specifies a function whose input and output parameters are of the BIGINT type. The OP function passes @a to the function that is specified by @fun and then to the ADD and ABS functions for processing@a. For more information about the ABS function, see Mathematical functions.

Anonymous function parameters

You can pass parameters of the anonymous function type to anonymous functions when you call SQL functions. The following statement provides an example:
function op(@a BIGINT, @fun function (BIGINT) returns BIGINT) as @fun(@a);
select op(key, function (@a) as @a + 1) from values (1),(2) as t (key);
In the example, function (@a) as @a + 1 is an anonymous function. You do not need to specify a data type for the @a parameter. The compiler infers the data type of @a based on the parameter definition of the OP function.

Examples

Scenario: Convert the format of a date from yyyy-mm-dd to yyyymmdd.

For example, the formats of the following dates need to be converted: 2020-11-21, 2020-1-01, 2019-5-1, and 19-12-1.

Solutions:
  • Solution 1: Use an SQL function. We recommend that you use this solution. The following statement shows an example:
    create sql function y_m_d2yyyymmdd(@y_m_d string) returns @yyyymmdd string
    as begin
        @yyyymmdd := concat(lpad(split_part(@y_m_d, '-', 1), 4, '0'), lpad(split_part(@y_m_d, '-', 2), 2, '0'), lpad(split_part(@y_m_d, '-', 3), 2, '0')) ;
    end;
    
    select y_m_d2yyyymmdd(d) from values('2020-11-21'),('2020-1-01'), ('2019-5-1'), ('19-12-1') t (d);
    The following result is returned:
    +------------+
    | _c0        |
    +------------+
    | 20201121   |
    | 20200101   |
    | 20190501   |
    | 00191201   |
    +------------+
  • Solution 2: Repeatedly call a function, which decreases the code reuse rate. Therefore, we recommend that you do not use this solution. The following statement shows an example:
    select concat(lpad(split_part(d, '-', 1), 4, '0'), lpad(split_part(d, '-', 2), 2, '0'), lpad(split_part(d, '-', 3), 2, '0')) from values('2020-11-21'),('2020-1-01'), ('2019-5-1'), ('19-12-1') t (d);
Note To write multiple SQL statements, you must use SQL in script mode. For more information, see Run MaxCompute SQL in script mode.