This topic provides an example of how to use a Java user-defined table-valued function (UDTF) to read resources from MaxCompute, based on MaxCompute Studio.
Prerequisites
MaxCompute Studio is installed and connected to a MaxCompute project, and a MaxCompute Java Module is created. For more information, see Install MaxCompute Studio, Manage project connections, and Create a MaxCompute Java module.
IntelliJ IDEA 2024 and JDK 1.8 are installed.
UDTF code examples
The following table describes the input and output parameters of the sample Java UDTF. The sample code follows the table.
Parameter category | Parameter type | Description |
Input parameter | String | First input parameter. |
Input parameter | String | Second input parameter. |
Output parameter | String | Value of the first input parameter. |
Output parameter | Bigint | Length of the second input parameter string. |
Output parameter | String | Concatenated value of the line count from file_resource.txt, the row count from the table_resource1 table, and the row count from the table_resource2 table. |
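The mapping in the table can be sketched in plain Java without the MaxCompute SDK. This is a minimal illustration only: the class name OutputRowSketch, the method buildRow, and the counts passed in main are hypothetical and are not part of the sample UDTF.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of the MaxCompute SDK.
final class OutputRowSketch {

    // Builds the three output columns for one pair of input values.
    static Object[] buildRow(String first, String second,
                             long fileLines, long table1Rows, long table2Rows) {
        // Output column 2: length of the second input, or 0 when it is NULL.
        long secondLength = (second == null) ? 0L : second.length();
        // Output column 3: the three resource counts joined into one string.
        String counts = "fileResourceLineCount=" + fileLines
                + "|tableResource1RecordCount=" + table1Rows
                + "|tableResource2RecordCount=" + table2Rows;
        return new Object[] {first, secondLength, counts};
    }

    public static void main(String[] args) {
        // The counts 3, 4, and 3 are made-up values, not results from a real run.
        System.out.println(Arrays.toString(buildRow("10", "20", 3, 4, 3)));
    }
}
```

With the inputs "10" and "20", the first output column is "10" and the second is 2, because "20" has two characters. The third column depends entirely on the resources that are actually read.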
package com.aliyun.odps.examples.udf;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Iterator;

import com.aliyun.odps.udf.ExecutionContext;
import com.aliyun.odps.udf.UDFException;
import com.aliyun.odps.udf.UDTF;
import com.aliyun.odps.udf.annotation.Resolve;

/**
 * project: example_project
 * table: wc_in2
 * partitions: p1=2,p2=1
 * columns: cola,colc
 */
@Resolve("string,string->string,bigint,string")
public class UDTFResource extends UDTF {
    ExecutionContext ctx;
    long fileResourceLineCount;
    long tableResource1RecordCount;
    long tableResource2RecordCount;

    @Override
    public void setup(ExecutionContext ctx) throws UDFException {
        this.ctx = ctx;
        try {
            // Count the lines in the file resource.
            InputStream in = ctx.readResourceFileAsStream("file_resource.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            fileResourceLineCount = 0;
            while ((line = br.readLine()) != null) {
                fileResourceLineCount++;
            }
            br.close();

            // Count the records in the first table resource.
            Iterator<Object[]> iterator = ctx.readResourceTable("table_resource1").iterator();
            tableResource1RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource1RecordCount++;
                iterator.next();
            }

            // Count the records in the second table resource.
            iterator = ctx.readResourceTable("table_resource2").iterator();
            tableResource2RecordCount = 0;
            while (iterator.hasNext()) {
                tableResource2RecordCount++;
                iterator.next();
            }
        } catch (IOException e) {
            throw new UDFException(e);
        }
    }

    @Override
    public void process(Object[] args) throws UDFException {
        String a = (String) args[0];
        // A NULL second argument is treated as length 0.
        long b = args[1] == null ? 0 : ((String) args[1]).length();
        forward(a, b, "fileResourceLineCount=" + fileResourceLineCount + "|tableResource1RecordCount="
                + tableResource1RecordCount + "|tableResource2RecordCount=" + tableResource2RecordCount);
    }
}
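The counting done in setup can also be written as two small helpers. The following is a hedged sketch, not the documented implementation: the class ResourceCounters and its methods are hypothetical, the UTF-8 charset is an assumption (the sample code uses the platform default), and in-memory data stands in for the stream and iterator that ExecutionContext would return. The try-with-resources statement closes the reader even if readLine throws.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class for illustration; not part of the MaxCompute SDK.
final class ResourceCounters {

    // Counts the lines in a stream. try-with-resources closes the reader
    // (and the wrapped stream) even if readLine() throws.
    // Assumption: the file is UTF-8 encoded.
    static long countLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    // Counts the records produced by any iterator over table rows.
    static long countRecords(Iterator<Object[]> rows) {
        long count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for file_resource.txt and a table resource.
        InputStream file = new ByteArrayInputStream(
                "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        List<Object[]> table = Arrays.asList(
                new Object[] {"A1", "A2"}, new Object[] {"A1", "A2"});
        System.out.println(countLines(file));
        System.out.println(countRecords(table.iterator()));
    }
}
```

Inside a real UDTF, the same helpers could presumably be called with the stream and iterator obtained from ctx in setup, with IOException wrapped in UDFException as in the sample code.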
The following code shows the dependency that is required in the pom.xml file for local testing.
<dependency>
    <groupId>com.aliyun.odps</groupId>
    <artifactId>odps-udf-local</artifactId>
    <version>0.48.0-public</version>
</dependency>
Procedure
Local testing
Create a Java program of the UDTF type in MaxCompute Studio. For example, name the Java class UDTFResource and use the code from the UDTF code examples section.
Configure the run parameters based on the warehouse resources in the Java module.
Note: The input parameters are the values of the first and third columns (cola and colc) of each row in partition p1=2, p2=1 of the wc_in2 table in the local resource.
When the code runs, it reads data from the local file resource file_resource.txt, from table_resource1 (which corresponds to the wc_in1 table), and from table_resource2 (which corresponds to partition p1=2, p2=1 of the wc_in2 table).
Right-click the UDTFResource class and select Run to execute the program. The results are displayed.
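How the local run derives the two input values from each row of the wc_in2 partition can be sketched as a simple column projection. This is an illustration only: the class InputColumnSketch and its project method are hypothetical, MaxCompute Studio performs this selection itself, and the sample rows are assumed values that may differ from the data in your local warehouse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of MaxCompute Studio or the SDK.
final class InputColumnSketch {

    // Keeps only the columns at the given zero-based positions of each row.
    static List<String[]> project(List<String[]> rows, int... positions) {
        List<String[]> projected = new ArrayList<>();
        for (String[] row : rows) {
            String[] picked = new String[positions.length];
            for (int i = 0; i < positions.length; i++) {
                picked[i] = row[positions[i]];
            }
            projected.add(picked);
        }
        return projected;
    }

    public static void main(String[] args) {
        // Assumed rows in column order (cola, colb, colc); real local data may differ.
        List<String[]> partitionRows = Arrays.asList(
                new String[] {"three1", "three2", "three3"},
                new String[] {"three1", "three2", "three3"});
        // Positions 0 and 2 correspond to cola and colc.
        for (String[] input : project(partitionRows, 0, 2)) {
            System.out.println(Arrays.toString(input));
        }
    }
}
```

Each projected pair plays the role of args[0] and args[1] in the process method of the sample UDTF.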
Client testing
Click Project Explorer in the upper-left corner of IDEA, and select Add Resource.
Add the file_resource.txt file based on the MaxCompute instance information.
Create sample data tables wc_in1 and wc_in2 in the MaxCompute project and insert data.
CREATE TABLE wc_in1 (
    col1 STRING,
    col2 STRING,
    col3 STRING,
    col4 STRING
);
INSERT INTO wc_in1 VALUES
    ('A1','A2','A3','A4'),
    ('A1','A2','A3','A4'),
    ('A1','A2','A3','A4'),
    ('A1','A2','A3','A4');
CREATE TABLE wc_in2 (
    cola STRING,
    colb STRING,
    colc STRING
) PARTITIONED BY (p1 STRING, p2 STRING);
ALTER TABLE wc_in2 ADD PARTITION (p1='2',p2='1');
INSERT INTO wc_in2 PARTITION (p1='2',p2='1') VALUES
    ('three1','three2','three3'),
    ('three1','three2','three3'),
    ('three1','three2','three3');
Map the wc_in1 and wc_in2 tables that you created in MaxCompute to the table resources table_resource1 and table_resource2.
Add the wc_in1 resource.
Add the wc_in2 resource.
Package the UDTF into a JAR file, upload it to the MaxCompute project, and register the function. In this example, the function is named my_udtf.
Right-click the UDTFResource class and select Deploy to Server... to open the packaging and upload dialog box. In the Extra resources section, include the required resources: file_resource.txt, table_resource1, and table_resource2.
Click Project Explorer in the upper-left corner of IDEA, right-click the target MaxCompute project, and select Open Console to start the MaxCompute client. Run an SQL statement to call the new UDTF. The results are displayed.
Sample SQL command:
SELECT my_udtf("10","20") AS (a, b, fileResourceLineCount);