The offline computing functions feature manages the SQL functions used when developing code for offline computing tasks. These include the functions that the compute engine supports by default and user-defined functions. Default functions cannot be edited. This topic describes how to create a user-defined function.
Prerequisites
Create the resources that the function will reference. For more information, see upload resources and references.
Background information
Dataphin organizes functions into directories based on their types to help you better manage functions.
Different compute engines support different types of functions.
Compute Engine Type: Supported Functions
Offline Engine
  MaxCompute: MAXC functions
  Hologres: Custom functions are not supported
  Hadoop: Hadoop functions (Hive functions), Impala functions
  TDH Inceptor: Custom functions are not supported
  ADB for PostgreSQL: ADB functions
  SelectDB: Custom functions are not supported
  Doris: Custom functions are not supported
Real-time Engine
  Alibaba Blink: FLINK functions
  Ververica Flink: FLINK functions
  Open-source Flink: FLINK functions
External MaxCompute projects do not support creating custom functions.
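If you script a check before creating functions, the support table above can be kept as a simple lookup. The following is a minimal sketch: the engine names and function types are copied from the table, while the dictionary name and the helper function are hypothetical and not part of Dataphin.

```python
# Transcription of the engine support table above. An empty list means the
# table says "Custom functions are not supported" for that engine.
# Note: external MaxCompute projects are not modeled here; they do not
# support creating custom functions even though MaxCompute itself does.
SUPPORTED_FUNCTION_TYPES = {
    "MaxCompute": ["MAXC functions"],
    "Hologres": [],
    "Hadoop": ["Hadoop functions (Hive functions)", "Impala functions"],
    "TDH Inceptor": [],
    "ADB for PostgreSQL": ["ADB functions"],
    "SelectDB": [],
    "Doris": [],
    "Alibaba Blink": ["FLINK functions"],
    "Ververica Flink": ["FLINK functions"],
    "Open-source Flink": ["FLINK functions"],
}


def supports_custom_functions(engine: str) -> bool:
    """Return True if the table lists at least one function type for the engine."""
    if engine not in SUPPORTED_FUNCTION_TYPES:
        raise ValueError(f"Unknown compute engine: {engine}")
    return bool(SUPPORTED_FUNCTION_TYPES[engine])
```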
Procedure
In the top menu bar of the Dataphin homepage, choose Develop > Data Development.
In the top menu bar, select Project. In Dev-Prod mode, you also need to select Environment.
In the navigation pane on the left, choose Data Processing > Function.
In the function list on the right, click the icon and select the target function type.
In the Create Function dialog box, configure the parameters.
Parameter
Description
Name
Enter a function name. The name can contain letters, numbers, and underscores (_), and must start with a letter.
Note: Within the same project, Impala functions and Hive functions cannot have the same name as any custom function in any directory of the project.
Select Resource
Select the resource file. The drop-down list provides resource names that match the current project.
Note: Only JAR files are supported for function definitions.
If you select multiple resources, they must be of the same type.
If you have not created any resources yet, create them first. For more information, see upload resources and references.
Programming Language
Impala supports functions defined in C++ and Java. To define an Impala function, select the corresponding programming language based on your resource type.
Class Name
Enter the class name. For computing-type resources, use the class name contained in the resource, such as test_udf.UDFGETSrcId.
Type
Select the type. The drop-down list includes Window, Statistics, Numeric, String, Time, IP Address Related Functions, URL, Encoding, Business, and Others.
Register Function
If you are defining an Impala function and the programming language of the resource is C++, enter the statement that creates the Impala function. The statement must follow the syntax below. The backend supports substituting the resource file into the LOCATION clause.
Create a C++ scalar function:
CREATE FUNCTION [IF NOT EXISTS] [db_name.]function_name([arg_type[, arg_type...]]) RETURNS return_type SYMBOL='symbol_name'
Create a C++ aggregate function:
CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [db_name.]function_name([arg_type[, arg_type...]]) RETURNS return_type [INTERMEDIATE type_spec] [INIT_FN='function'] UPDATE_FN='function' MERGE_FN='function' [PREPARE_FN='function'] [CLOSE_FN='function'] [SERIALIZE_FN='function'] [FINALIZE_FN='function']
For more information, see User-Defined Functions (UDF).
Syntax
Enter the syntax, that is, the format used to call the function, such as:
bigint weekday(datetime date)
Usage Documentation
Enter a description of how to use the function, for example:
select get_week_date("20170810",0,2) --Query the date of Tuesday in the week of August 10.
from cndata.dual
Select Directory
The directory of the current function type is selected by default. You can change it only to a subdirectory under that function type directory.
For example, if you are creating a MAXC function, the MAXC function directory is selected automatically, and you can change it only to one of its subdirectories.
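The naming rule for the Name parameter and the C++ scalar template under Register Function can be checked locally before you fill in the dialog box. The following is a minimal sketch, not Dataphin's own validation: the helper names, the ASCII-only reading of "letters", and the sample values `my_lower` and `MyLower` are all assumptions made for illustration.

```python
import re

# Name rule from the parameter description: letters, numbers, and
# underscores, starting with a letter. Treating "letters" as ASCII
# letters only is an assumption.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_function_name(name: str) -> bool:
    """Check a candidate function name against the documented rule."""
    return _NAME_RE.fullmatch(name) is not None


def render_cpp_scalar_udf(name, arg_types, return_type, symbol,
                          db_name=None, if_not_exists=True):
    """Fill in the C++ scalar CREATE FUNCTION template shown above.

    No LOCATION clause is emitted, matching the template in this topic.
    """
    if not is_valid_function_name(name):
        raise ValueError(f"Invalid function name: {name!r}")
    qualified = f"{db_name}.{name}" if db_name else name
    guard = "IF NOT EXISTS " if if_not_exists else ""
    args = ", ".join(arg_types)
    return (f"CREATE FUNCTION {guard}{qualified}({args}) "
            f"RETURNS {return_type} SYMBOL='{symbol}'")


# Hypothetical function and symbol names, used only to show the output shape.
print(render_cpp_scalar_udf("my_lower", ["STRING"], "STRING", "MyLower"))
# CREATE FUNCTION IF NOT EXISTS my_lower(STRING) RETURNS STRING SYMBOL='MyLower'
```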
After completing the configuration, click Submit. In the dialog box that appears, enter Submission Notes, and then click Confirm And Submit.
Note: If the resources referenced by the custom function are updated, you need to resubmit the custom function so that the version registered with the compute engine is also updated.
After a successful submission, tasks that reference the function automatically use the new version, which may cause those tasks to become unavailable. Check them promptly.
To verify that the function behaves as expected, use Ad Hoc Query (see query and download data) to write SQL code that references the function. The following is an example of an SQL query statement:
select get_week_date("20170810",0,2) --Query the date of Tuesday in the week of August 10.
from cndata.dual
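When you verify a function this way, it helps to compute the expected value independently and compare it with the query result. The following is a minimal sketch for the example query above. It assumes that the second argument of `get_week_date` is a week offset and the third is the day of the week (1 = Monday through 7 = Sunday); the source does not define these arguments or the output format, so treat the Python helper `week_date` as hypothetical.

```python
from datetime import date, datetime, timedelta


def week_date(day: str, week_offset: int, weekday: int) -> date:
    """Return the given weekday (1 = Monday ... 7 = Sunday) of the week
    that contains `day` (format YYYYMMDD), shifted by `week_offset` weeks.

    The argument meanings are assumed, not taken from the function's source.
    """
    if not 1 <= weekday <= 7:
        raise ValueError("weekday must be between 1 and 7")
    d = datetime.strptime(day, "%Y%m%d").date()
    # Step back to the Monday of the week that contains `d`.
    monday = d - timedelta(days=d.isoweekday() - 1)
    return monday + timedelta(weeks=week_offset, days=weekday - 1)


# August 10, 2017 was a Thursday, so the Tuesday of that week is August 8.
print(week_date("20170810", 0, 2))  # 2017-08-08
```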
What to do next
If the project mode is Dev-Prod, you need to publish the resource to the production environment. For more information, see manage publishing tasks.
If your development mode is Basic mode, you can use the custom function for computing task development after successful submission. For more information, see data development overview.