Once the MaxCompute Java module has been created, you can develop UDFs.
- Expand the MaxCompute Java module directory that you created and select MaxCompute Java, as shown in the following figure.
- Set Name and Kind, and then click OK, as shown in the following figure.
Name: Specifies the name of the MaxCompute Java Class. If you have not created a package, you can enter packagename.classname to automatically create a package.
Kind: Specifies the class type. Supported types include user-defined functions (UDF/UDAF/UDTF), MapReduce (Driver/Mapper/Reducer), and unstructured data development (StorageHandler/Extractor).
- After the class is created, you can develop, modify, and test the Java program.
Note: The code template can be customized in IntelliJ IDEA. Go to Preferences > Editor > File and Code Templates, and then find and modify the template that corresponds to MaxCompute on the Code tab.
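As a rough illustration of what a class of kind UDF might contain after these steps, the following minimal sketch implements a lower-casing function. The class name `LowerUdf` is hypothetical. In a MaxCompute Java module the class would be declared public and extend `com.aliyun.odps.udf.UDF` from the MaxCompute SDK; that dependency is left out here as a simplifying assumption so that the snippet compiles with the JDK alone.

```java
import java.util.Locale;

// Hypothetical scalar UDF sketch. In a real module this class would be public
// and extend com.aliyun.odps.udf.UDF (omitted here so the snippet is self-contained).
final class LowerUdf {
    // A scalar UDF exposes an evaluate method that handles one input value.
    public String evaluate(String s) {
        if (s == null) {
            return null; // keep NULL input as NULL output
        }
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        LowerUdf udf = new LowerUdf();
        System.out.println(udf.evaluate("MaxCompute"));
    }
}
```

Handling `null` explicitly avoids a `NullPointerException` when a column contains NULL values.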
Debug the UDF program
After the UDF program is developed, you can test it with a unit test (UT) or by running it locally to check whether it behaves as expected.
Unit Testing
The examples directory contains various UT examples that you can refer to when writing your own UT.
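The shape of such a test can be sketched as follows. This is not one of the bundled examples: the `StringLength` class and the `check` helper are hypothetical, and plain JDK checks stand in for a test framework (a real project would likely use one) so that the snippet has no external dependencies.

```java
import java.util.Objects;

// Plain-JDK stand-in for a unit test of a UDF-shaped class.
final class StringLengthTest {
    // Hypothetical class under test: returns the length of a string, or NULL for NULL.
    static final class StringLength {
        Long evaluate(String s) {
            return s == null ? null : (long) s.length();
        }
    }

    // Fails with a descriptive error when the actual value differs from the expected one.
    static void check(Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        StringLength udf = new StringLength();
        check(6L, udf.evaluate("studio")); // typical value
        check(0L, udf.evaluate(""));       // empty string
        check(null, udf.evaluate(null));   // NULL input stays NULL
        System.out.println("all checks passed");
    }
}
```

Covering a typical value, an empty value, and NULL is a reasonable starting set, because those are the cases that most often differ between local data and real tables.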
To run the UDF program locally, you must specify the data source. You can set the test data source in either of the following ways:
MaxCompute Studio uses the Tunnel Service to automatically download table data of a specific project to the warehouse directory.
A mock project with table data is provided. You can refer to example_project in warehouse to set up your own.
- Right-click the UDF class and select Run UDF class.main(). The Run Configuration dialog box is displayed. Typically, a UDF/UDAF/UDTF operates on table columns in a SELECT clause, so the MaxCompute project, table, and columns must be configured. (The metadata comes from Project Explorer and the mock project under warehouse.) Debugging for complex types is also supported, as shown in the following figure:
- If the table data of the specified project has not been downloaded to warehouse, you need to download it first. By default, 100 records are downloaded. If more data is required, use the Tunnel command of the console or the table download function of Studio.
- If the mock project is used or the table data has already been downloaded, run the program directly.
- The UDF local run framework uses the data in the specified columns in warehouse as the UDF input and runs the UDF program locally. You can view the log output and the results on the console.
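The idea behind the local run, feeding the values of a selected column to the function row by row, can be sketched as follows. This is only an illustration and not the Studio framework itself: the `LocalRunSketch` class, the `runOnColumn` helper, and the in-memory rows are all hypothetical stand-ins for the table data stored in warehouse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration of applying a function to one column of local table data.
final class LocalRunSketch {
    // Applies a one-argument function to the named column of every row.
    static List<String> runOnColumn(List<String> columns, List<String[]> rows,
                                    String column, Function<String, String> udf) {
        int idx = columns.indexOf(column);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(udf.apply(row[idx]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for downloaded or mock table data.
        List<String> columns = Arrays.asList("id", "name");
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "BOB"});
        Function<String, String> lower =
                s -> s == null ? null : s.toLowerCase(Locale.ROOT);
        for (String result : runOnColumn(columns, rows, "name", lower)) {
            System.out.println(result); // one output line per input row
        }
    }
}
```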
The local warehouse directory stores the tables (including metadata and data) and resources used for running UDF or MapReduce programs locally. The following figure shows the warehouse directory.