Use a Python user-defined function (UDF) to read data from a MaxCompute table resource at query time. This topic walks through the complete workflow: write the UDF, upload the resources, register the function, and call it in SQL.
Prerequisites
Before you begin, ensure that you have:
The MaxCompute client installed and configured. For setup instructions, see Install and configure the MaxCompute client
The target table registered as a resource in your MaxCompute project. For instructions, see Add resources
In this topic, the example uses the udf_test table as the registered table resource. The table contains the following data:
+------+------+
| col1 | col2 |
+------+------+
| 1 | a |
| 2 | b |
| 4 | c |
| 5 | d |
+------+------+How it works
get_cache_table() loads the table data into memory once when the UDF class is initialized. Every subsequent call to evaluate() reads from that in-memory snapshot — the table is not re-read for each row. This is why the load call belongs in __init__, not in evaluate().
Step 1: Write the UDF
The following Python UDF reads all rows from udf_test into memory on initialization, then returns one row per evaluate() call.
from odps.udf import annotate
from odps.distcache import get_cache_table
@annotate('->string')
class DistCacheTableExample(object):
def __init__(self):
self.records = list(get_cache_table('udf_test'))
self.counter = 0
self.ln = len(self.records)
def evaluate(self):
if self.counter > self.ln - 1:
return None
ret = self.records[self.counter]
self.counter += 1
return str(ret)Save this file as table.py in the bin directory of the MaxCompute client.
Step 2: Upload resources and register the UDF
Upload the Python script
Run the following command to add table.py as a resource:
add py table.py;Expected output:
OK: Resource 'table.py' have been created.For the full list of resource commands, see Add resources.
Register the UDF
Run the following command to create the UDF:
create function table_udf as 'table.DistCacheTableExample' using 'table.py,udf_test';| Parameter | Description |
|---|---|
table_udf | The name of the UDF. This is the name you call in SQL statements. |
table.DistCacheTableExample | table is the base name of the script file (table.py). DistCacheTableExample is the class defined in that script. |
Expected output:
Success: Function 'table_udf' have been created.For more details on UDF registration, see Create a UDF.
Step 3: Call the UDF
Create a test table, insert data, and call the UDF:
-- Create a test table
CREATE TABLE table_test (arg bigint);
-- Insert test data
INSERT INTO table_test VALUES (1), (4), (15), (123), (7995);
-- Call the UDF
SELECT table_udf() FROM table_test;The UDF is called once per row in table_test. Because table_test has 5 rows but udf_test has only 4, the fifth call returns NULL.
Expected output:
+----------+
| _c0 |
+----------+
| (4, 'c') |
| (5, 'd') |
| (1, 'a') |
| (2, 'b') |
| NULL |
+----------+What's next
To read MaxCompute table resources using a Java user-defined table-valued function (UDTF), see Use a Java UDTF to read resources from MaxCompute.