This topic describes how to dynamically load and delete user-defined functions (UDFs) for the Trino service of an E-MapReduce (EMR) cluster.
Background information
If you want to add a UDF to the Trino service of an EMR cluster of a minor version earlier than V3.39.1 or V5.5.0, you must upload the JAR file of the UDF to all nodes in your EMR cluster and restart the nodes. This procedure is complex. If your EMR cluster is deployed on a Container Service for Kubernetes (ACK) cluster, you also need to repackage an image when you add the UDF. This procedure is not user-friendly. In an EMR Trino cluster of V3.39.1, V5.5.0, or a minor version later than V3.39.1 or V5.5.0, UDFs can be dynamically loaded and deleted.
Limits
UDFs can be dynamically loaded and deleted only in an EMR Hadoop cluster or an EMR Trino cluster of V3.39.1, V5.5.0, or a minor version later than V3.39.1 or V5.5.0.
Usage notes
When you run the DROP command, if xxxxxx in the command specifies an existing connector such as hive or mysql, all content in the connector directory is deleted and cannot be recovered. Exercise caution when you run the DROP command.
After you scale out an EMR cluster, the new nodes do not contain the UDF JAR file that was uploaded to the existing nodes. To add the UDF JAR file to the new nodes, run the DROP command before you perform the scale-out operation and run the ADD command after the scale-out operation is complete.
If access to Hadoop Distributed File System (HDFS) or Object Storage Service (OSS) fails, run the hadoop fs -ls command to check whether each node in your EMR cluster can directly access the UDF JAR file. If a worker node cannot access the UDF JAR file, you can identify the cause based on the server.log file of that worker node.
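The access check described above can be sketched as a shell session to run on each node. This is only an illustration: the OSS path oss://my-bucket/udf/udfjar.jar is a placeholder, and the log path is the default EMR Trino log location mentioned later in this topic.

```shell
# Confirm that this node can read the UDF JAR file directly.
# Replace oss://my-bucket/udf/udfjar.jar with your actual path.
hadoop fs -ls oss://my-bucket/udf/udfjar.jar

# If a worker node fails the check, inspect its Trino server log
# (default path on EMR nodes) to identify the cause.
tail -n 100 /mnt/disk1/log/trino/var/log/server.log
```

These commands require a running EMR cluster and valid OSS credentials, so run them on the cluster nodes themselves.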
Procedure
Step 1: Make preparations
Upload the JAR file for the UDF to the file system that you use.
You can use one of the following methods to upload the JAR file for the UDF:
Method 1: Package all UDF content into a single JAR file, such as udfjar.jar. Then, upload the JAR file to the file system. If the OSS bucket belongs to a different Alibaba Cloud account from the Hadoop or Trino cluster, or if the cluster is deployed on an ACK cluster, make sure that public read access is enabled for the UDF JAR file.
Important:
The name of the JAR file cannot be the same as the name of an existing connector or UDF.
We recommend that the JAR file name contain only letters and digits. Otherwise, the name may not be recognized.
The Trino service cannot directly use a UDF of the Presto service. You must modify the name of the JAR file and recompile it. Otherwise, the UDF may fail to be added.
Method 2: If the UDF that you want to add depends on multiple JAR files and you do not want to package them into one JAR file, place all the JAR files in the same directory, such as udfdir. Then, upload the entire directory to the file system.
Important:
The name of the directory cannot be the same as the name of an existing connector or UDF.
Do not include irrelevant content in the directory.
Make sure that you are granted permissions to manage this directory.
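The two upload methods can be sketched with the hadoop fs client. The bucket path oss://my-bucket/path/ and the local names udfjar.jar and udfdir are placeholders for illustration, not values required by the Trino service.

```shell
# Method 1: upload a single packaged UDF JAR file to OSS.
# oss://my-bucket/path/ is a placeholder; substitute your own bucket path.
hadoop fs -put udfjar.jar oss://my-bucket/path/udfjar.jar

# Method 2: upload the whole directory that contains all dependent JAR files.
hadoop fs -put udfdir oss://my-bucket/path/udfdir
```

Run these commands from a node of the cluster so that the hadoop client is already configured with access to your OSS bucket or HDFS.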
Step 2: Add the UDF
Start the client, connect it to the Trino service, and then perform the following operations to add the JAR file of the UDF.
Log on to the cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to log on to the Trino console:
trino --server master-1-1:9090
Run the following command to add the UDF.
In this step, a folder is created in the plugin directory of the Trino installation path. The UDF JAR file is copied from the file system path that you specify in the ADD command to the created folder, and the function list is refreshed.
The syntax for adding a UDF is add jar "xxxxxx".
Note: If an error occurs when you run the command, you can check the /mnt/disk1/log/trino/var/log/server.log log file of a worker node to identify the cause.
If you use method 1 in Step 1: Make preparations, run the following command to add the UDF:
add jar "oss://path/udfjar.jar"
If you use method 2 in Step 1: Make preparations, run the following command to add the UDF:
add jar "oss://path/udfdir"
If you use method 2, Trino identifies the entire directory that needs to be uploaded and downloads all content from the directory to the cluster.
If the UDF JAR file is stored in HDFS, run the following command to add the UDF:
add jar "hdfs://xxxxxx";
If the UDF JAR file is stored in a local file system, run the following command to add the UDF:
add jar "file:///xxxxxx";
Important: If a local file system is used, you need to upload the UDF JAR file to the required paths of all nodes on which the Trino service runs. Three forward slashes (/) must be specified after file.
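Putting the steps above together, an end-to-end session might look like the following. The coordinator address master-1-1:9090 comes from this topic; the OSS path is a placeholder, and the SHOW FUNCTIONS check is a suggested way to confirm the load rather than a required step.

```shell
# Connect to the Trino coordinator of the EMR cluster.
trino --server master-1-1:9090

# Inside the Trino CLI, add the UDF from the placeholder OSS path,
# then list functions to confirm that the new UDF appears.
add jar "oss://my-bucket/path/udfjar.jar"
SHOW FUNCTIONS;
```

If the new function does not appear in the output, check the server.log file of a worker node as described in the note above.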
Step 3: Delete the UDF
When you delete a UDF, the entire directory that is named after the UDF JAR file is deleted from the plugin directory, and the function list is reloaded. Specify the name of the UDF JAR file in the DROP command to delete the UDF.
The syntax for deleting a UDF is drop jar xxxxxx. xxxxxx indicates the name of the uploaded UDF JAR file, which is also the name of the folder in the plugin directory that is read by the Trino service.
You do not need to enclose the name of a UDF JAR file in a pair of quotation marks (") in the DROP command. Regardless of whether the UDF JAR file was uploaded by using method 1 or method 2 in Step 1, do not include the .jar file name extension when you delete the UDF JAR file. Sample commands:
drop jar udfjar;
drop jar udfdir;