You can use MaxCompute Studio to develop Python user-defined functions (UDFs). This topic describes how to develop, test, and publish them.
Prerequisites
Before you start, complete the following:
Develop a Python UDF
- In the Project pane, under the MaxCompute Studio directory, right-click scripts and select .
- In the Create new MaxCompute python class dialog box, enter a class name in the Name field, set Kind to Python UDF, and then click OK.
- Write the UDF code in the editor.
    from odps.udf import annotate

    @annotate("bigint,bigint->bigint")
    class Hello(object):
        def evaluate(self, arg0, arg1):
            if None in (arg0, arg1):
                return None
            return arg0 + arg1
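Before running the UDF through MaxCompute Studio, its logic can be exercised as ordinary Python by instantiating the class and calling evaluate directly. The following is a minimal sketch of that idea. The fallback annotate function is a hypothetical stand-in, used only so that the snippet still runs on a machine where PyODPS is not installed; it is not part of PyODPS.

```python
try:
    from odps.udf import annotate  # provided by PyODPS
except ImportError:
    # Assumption: no-op stand-in, used only when PyODPS is not installed.
    def annotate(signature):
        def wrap(cls):
            return cls
        return wrap


@annotate("bigint,bigint->bigint")
class Hello(object):
    def evaluate(self, arg0, arg1):
        # NULL inputs arrive as None; return NULL instead of raising.
        if None in (arg0, arg1):
            return None
        return arg0 + arg1


udf = Hello()
print(udf.evaluate(1, 2))     # 3
print(udf.evaluate(1, None))  # None
```

A direct call like this only checks the Python logic. It does not validate the signature string passed to annotate against real table columns, which is what the local run in the next section is for.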
Test the UDF
After you develop a UDF, you must test the code to ensure it works as expected. MaxCompute Studio supports local testing, which allows you to download sample data from a table to run and debug the UDF locally.
- Right-click the Python UDF script and select RUN.
- On the Edit configuration page, configure the required parameters and click OK.
  - MaxCompute project: The MaxCompute project where the UDF runs. If you have already configured a MaxCompute project connection in Manage project connections, this field defaults to the connected project. You can also add other projects as prompted.
  - MaxCompute table: The source MaxCompute table for the UDF run. You can select a table from the drop-down list of tables in the selected MaxCompute project.
  - Table columns: The source table columns that the UDF uses.
  - Download Record limit: The maximum number of records to download. Default: 100.
Note:
- If data has already been downloaded, MaxCompute Studio does not download it again. To re-download the data, run the Tunnel command in the MaxCompute client.
- By default, 100 records are downloaded. To test with more data, download the data by using the Tunnel command in the MaxCompute client or the table download feature in MaxCompute Studio.
- After the download is complete, you can find the sample data in the table's data file, located in the warehouse directory.
- The local run framework retrieves data from the specified columns in the data file and runs the UDF locally.
Note: Local runs use the pyou script from PyODPS. The command is pyou hello.Hello < data. After installing PyODPS, you can run a command to verify that the script exists.
- If you are using Windows, run the ${python}/../Scripts/pyou command.
- If you are using macOS, run the ${python}/../pyou command.
- The following example shows the source code of a Python UDF. After you run the code, you can view the output in the console.
    from odps.udf import annotate

    @annotate("bigint,bigint->bigint")
    class Plus(object):
        def evaluate(self, arg0, arg1):
            if None in (arg0, arg1):
                return None
            return arg0 + arg1
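The local run described above (read the selected columns from a data file, then call the UDF once per record) can be pictured with the following rough sketch. It is not the pyou implementation and does not reproduce the real layout of the warehouse data file. The comma-separated sample, the column names, and the run_udf_on_columns helper are all hypothetical, and an empty field is assumed to stand for NULL.

```python
import csv
import io


class Plus(object):
    # Same logic as the Plus UDF above. The @annotate decorator is omitted
    # here so that the sketch runs without PyODPS installed.
    def evaluate(self, arg0, arg1):
        if None in (arg0, arg1):
            return None
        return arg0 + arg1


def run_udf_on_columns(udf, data_text, header, columns):
    """Hypothetical helper: apply udf.evaluate to the chosen columns of each row."""
    indexes = [header.index(name) for name in columns]
    results = []
    for row in csv.reader(io.StringIO(data_text)):
        # Assumption: an empty field represents NULL; other fields are bigint.
        args = [int(row[i]) if row[i] != "" else None for i in indexes]
        results.append(udf.evaluate(*args))
    return results


# Hypothetical sample data with columns id, score, tag.
SAMPLE = "1,10,a\n2,,b\n3,30,c\n"
results = run_udf_on_columns(Plus(), SAMPLE, ["id", "score", "tag"], ["id", "score"])
print(results)  # [11, None, 33]
```

The second record has an empty score, so the UDF returns None for it, which mirrors how a NULL input yields a NULL result.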
Publish a Python UDF
After the Python UDF passes the tests, you can publish it to a production environment. For more information, see Upload and register a function.