DataWorks: Use third-party packages and custom Python scripts in PyODPS nodes

Last Updated: Feb 08, 2026

When PyODPS built-in libraries don't meet your needs, you can extend functionality with third-party packages or custom Python scripts. This topic shows how to install third-party packages using custom images and O&M Assistant, and how to import a Python script.

Choose your integration approach

| Use case | Resource group type | Solution |
| --- | --- | --- |
| Using third-party packages | Serverless resource group | Install open source packages using a custom image |
| Using third-party packages | Exclusive resource group for scheduling | Install open source packages using O&M Assistant |
| Using custom .py script files | Serverless resource group or exclusive resource group for scheduling | Reference custom Python resources |

The following figure shows the main process for each solution.

[Figure: main process for each solution]

Preparations

Before you start, understand the following two key concepts to determine your configuration method.

  1. PyODPS 3 vs. PyODPS 2

    • PyODPS 3 (recommended): Based on the Python 3.7+ environment. Official support for Python 2 has ended. This topic uses PyODPS 3 as the primary example.

    • PyODPS 2: Based on the Python 2.7 environment.

  2. Resource groups: serverless resource group vs. exclusive resource group for scheduling

    • Serverless resource group (recommended): Elastic and maintenance-free. It relies on custom images to manage dependencies.

    • Exclusive resource group for scheduling: Requires you to maintain the resource group manually and installs dependencies through O&M Assistant, which has more limitations.

    Note

    To view the resource group type bound to your workspace, navigate to the Resource Group page in workspace details within the DataWorks console.

    • General-purpose Type indicates you are using a serverless resource group.

    • Data Scheduling indicates you are using an exclusive resource group for scheduling.
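To confirm which interpreter a node actually runs on, a quick check can be placed at the top of the node code. The sketch below uses only the standard library. The helper name node_type_for is hypothetical and simply maps the major Python version to the node types described above.

```python
import sys


def node_type_for(version_info):
    """Map a Python version tuple to the matching PyODPS node type.

    Hypothetical helper for illustration; not part of PyODPS or DataWorks.
    """
    major = version_info[0]
    return "PyODPS 3" if major >= 3 else "PyODPS 2"


# Written to work on both Python 2.7 and Python 3, so it can run in either node type.
print("Python version: %d.%d.%d" % tuple(sys.version_info[:3]))
print("Matching node type: %s" % node_type_for(sys.version_info))
```

If the reported version does not match the node type you expected, check the node type and image before installing any packages.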

Install third-party packages using a custom image

Note

This method applies to serverless resource groups.

This tutorial provides an end-to-end example that shows how to create a custom environment containing the pendulum package and call it in a PyODPS 3 node to get and format the current time in a specific time zone.

Step 1: Create a custom image that contains pendulum

  1. Log on to the DataWorks console and go to Image Management.

  2. Select Custom Images.

  3. Click Create Image and configure the following key parameters:

    | Parameter | Description |
    | --- | --- |
    | Image Name | Example: pyodps3_with_pendulum. |
    | Reference Type | Select DataWorks Official Image. |
    | Image Name/ID | Select dataworks_pyodps_task_pod. |
    | Supported Task Types | Select PyODPS 3. |
    | Installation Package | Select Python3 and then select the pendulum package from the drop-down list. For more information about installation commands, see Use third-party packages and custom Python scripts in PyODPS nodes. |

    Important

    To install packages from the Internet, ensure that the VPC bound to the serverless resource group has Internet access.

  4. Click OK.

  5. On the Custom Images page, test and publish the image.

  6. In the Actions column of the image, click the icon, choose Change Workspace, and then bind the custom image to the target workspace.

Step 2: Create and configure a PyODPS 3 node

  1. In the left navigation pane, click Data Development And O&M > Data Development. From the drop-down list, select the target workspace and click Go To Data Development.

  2. Click the icon to the right of Workspace Directories and create a PyODPS 3 node. For example, name the node pyodps3_pendulum_test.

  3. In the editor, enter the following Python 3 code:

    # pendulum is installed in the custom image, so it can be imported directly (Python 3 syntax).
    import pendulum

    print("Start testing the third-party package pendulum...")
    try:
        # Use pendulum to get the current time in the "Asia/Shanghai" time zone.
        shanghai_time = pendulum.now('Asia/Shanghai')

        # Print the formatted time and time zone information.
        print("Successfully imported the 'pendulum' package.")
        print(f"The current time in Shanghai is: {shanghai_time.to_datetime_string()}")
        print(f"The corresponding time zone is: {shanghai_time.timezone_name}")

        print("\nTest passed! The PyODPS node successfully called the third-party package.")
    except Exception as e:
        # Report any runtime failure from the pendulum calls above.
        print(f"Test failed. An error occurred: {e}")

Step 3: Test and verify the result

  1. On the right side of the editor, in the Run Configuration section, set Resource Group to the serverless resource group that you prepared. Change the image to pyodps3_with_pendulum.

    Important

    If you cannot find the target image, bind the image to the current workspace.

  2. Click the Run button.

  3. View the run logs at the bottom of the page. The following output indicates that the pendulum package was successfully called.

    Start testing the third-party package pendulum...
    Successfully imported the 'pendulum' package.
    The current time in Shanghai is: 2025-09-27 15:45:00
    The corresponding time zone is: Asia/Shanghai
    Test passed! The PyODPS node successfully called the third-party package.

Step 4: Publish the PyODPS 3 node

After completing the test, go to Scheduling > Scheduling Policies. Select the prepared serverless resource group and change the image to pyodps3_with_pendulum. Then, publish the node to Operation Center.

Install third-party packages using O&M Assistant

Important

This method is for exclusive resource groups for scheduling, which are no longer recommended.

  1. Log on to the DataWorks Workspaces page, switch the region at the top, find the target workspace, and click Details in the Actions column.

  2. In the left-side navigation pane, click Resource Group and find the bound exclusive resource group for scheduling. In the Actions column, click the icon and select O&M Assistant.

  3. Select Create Command in the upper-left corner.

  4. For Python 3 (PyODPS 3), keep the other default options. From the drop-down list, select the pendulum installation package.

  5. On the O&M Assistant page, click Run Command in the Actions column. Once the command completes, you can use import pendulum directly in the corresponding PyODPS node.

Reference custom Python resources

If you only want to call a function from another .py file that you wrote, follow these steps:

  1. Create a Python resource:

    1. On the Resource Management page, click the icon and choose Create Resource > MaxCompute Python.

    2. In the Create Resource or Function dialog box, enter a name for the resource (for example: my_utils.py) and click OK.

    3. Upload the file my_utils.py to File Content. For Data Source, select the computing resource that you bound.

    4. Save and Publish the resource.

  2. Create a PyODPS 3 node and reference the Python resource:

    • In the left navigation pane, click the icon to go to Workspace Directories on the Data Studio page.

    • Click the icon and choose Create Node > MaxCompute > PyODPS 3. Follow the instructions to create the node.

    • In the editor, reference the Python resource using ##@resource_reference{"my_utils.py"} as shown in the following code:

      ##@resource_reference{"my_utils.py"}
      import sys
      import os
      # Add the current directory where the resource is located to the Python interpreter's search path.
      sys.path.append(os.path.dirname(os.path.abspath('my_utils.py')))
      # Now you can import and use it like a normal module.
      import my_utils
      my_utils.say_hello("DataWorks")
  3. Run the node. You will see "Hello, DataWorks! This is from my_utils module." in the run logs.
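
The steps above assume that my_utils.py defines a say_hello function. This topic does not show the file itself, so the following is only a minimal sketch of what it might contain, written to match the log line in step 3. Returning the message in addition to printing it is an assumption made here so that the function is easy to test.

```python
# my_utils.py (illustrative content; the original file is not shown in this topic)


def say_hello(name):
    """Print a greeting and return it.

    The wording matches the expected run log. The return value is an
    assumption added for easier testing.
    """
    message = "Hello, {}! This is from my_utils module.".format(name)
    print(message)
    return message


if __name__ == "__main__":
    # Quick local check before uploading the file as a MaxCompute Python resource.
    say_hello("DataWorks")
```

Running the file locally before you upload it helps separate errors in the script from errors in how the node references the resource.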

FAQ

Q: Why does my custom image test hang during package installation?

A: This issue typically occurs due to network connectivity problems. Try these solutions:

  • If your task environment requires third-party packages from the Internet, ensure the VPC bound to your serverless resource group has Internet access. For more information, see Enable public network access.

  • Try switching to a different Python package source, such as https://mirrors.aliyun.com/pypi/simple/.
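
As a rough illustration of the second option, pip accepts an alternate package index through its --index-url option (short form -i). The sketch below only builds the command and does not run it. The helper name pip_install_command is hypothetical, and where you enter such a command depends on how your image or resource group is configured.

```python
import sys

# Mirror URL taken from the FAQ answer above.
ALIYUN_PYPI_MIRROR = "https://mirrors.aliyun.com/pypi/simple/"


def pip_install_command(package, index_url=ALIYUN_PYPI_MIRROR):
    """Build (but do not run) a pip command that uses an alternate index.

    Hypothetical helper for illustration only.
    """
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


command = pip_install_command("pendulum")
print(" ".join(command))
```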

Q: What should I do if importing a third-party package fails?

A: Troubleshoot as follows:

  1. Confirm that the custom image has been published successfully.

  2. Confirm that the task type supported by the image (PyODPS 2/3) matches the type of node you created.

  3. Confirm that the custom image is correctly selected in the Properties of the PyODPS node.

    You cannot select shared resource groups.
  4. Check if the installed package version is compatible with your Python version (for example, pendulum 2.0+ does not support Python 2).
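
To narrow down which of these checks is failing, it can help to print what the node's interpreter actually sees. The following is a minimal sketch for Python 3 (PyODPS 3) nodes that uses only the standard library. The helper name diagnose_import is hypothetical, and it expects a top-level module name such as pendulum.

```python
import importlib.util
import sys


def diagnose_import(module_name):
    """Report the interpreter version and whether a top-level module is importable.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "found": spec is not None,
        # Location of the module on disk, or None when the module is not found.
        "location": getattr(spec, "origin", None),
    }


print(diagnose_import("pendulum"))
```

If found is False, the node is most likely not running on the image that contains the package. If it is True but the import still fails, a version incompatibility is the more likely cause.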

References