
MaxCompute: Generate a third-party package for PyODPS

Last Updated: Mar 26, 2026

pyodps-pack is a command-line interface (CLI) tool that ships with PyODPS V0.11.3 and later. It packages Python dependencies (PyPI packages, custom code, Git repositories, and packages with binary dependencies) into a .tar.gz archive that MaxCompute and DataWorks PyODPS nodes can consume directly. Its usage is similar to that of pip.

Prerequisites

Before you begin, ensure that you have:

  • PyODPS V0.11.3 or later installed

  • Docker installed and running (required for Docker mode, which is the default)

  • Python 3 (recommended; Python 2 packaging may fail for new projects)
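To check these prerequisites before your first run, you can use a small shell sketch like the following. It assumes a POSIX shell, that PyODPS was installed with the pip of python3, and that sort supports -V; the function names version_ge and check_prereqs are illustrative and not part of PyODPS.

```shell
#!/bin/sh
# Sketch: report whether the pyodps-pack prerequisites look satisfied.
# Assumes PyODPS was installed with "python3 -m pip"; adjust if you use another interpreter.

# Succeeds when version $1 >= version $2 (sort -V orders version strings).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
    status=0

    # pyodps-pack ships with PyODPS V0.11.3 and later.
    pyodps_version=$(python3 -m pip show pyodps 2>/dev/null | awk '/^Version:/ {print $2}')
    if [ -z "$pyodps_version" ]; then
        echo "PyODPS: not installed"
        status=1
    elif version_ge "$pyodps_version" 0.11.3; then
        echo "PyODPS: $pyodps_version (OK)"
    else
        echo "PyODPS: $pyodps_version (upgrade to 0.11.3 or later)"
        status=1
    fi

    # Docker mode, the default, needs a reachable Docker daemon.
    if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
        echo "Docker: running"
    else
        echo "Docker: not available (only --without-docker will work)"
        status=1
    fi

    return $status
}

check_prereqs || echo "Some prerequisites are missing." >&2
```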

Packaging scenarios at a glance

Use this table to find the command for your scenario, then follow the detailed instructions in the sections below.

Scenario                                 Command
---------------------------------------- --------------------------------------------------------------------
Package a PyPI package (Docker mode)     pyodps-pack pandas
Package a PyPI package (non-Docker mode) pyodps-pack --without-docker pandas
Package a specific version               pyodps-pack pandas==1.2.5
Package for Python 2.7 in MaxCompute     pyodps-pack --mcpy27 pandas
Package for Python 2.7 in DataWorks      pyodps-pack --dwpy27 pandas
Package custom local code                pyodps-pack /<path_to_package>/test_package_root
Package from a Git repository            pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git
Package with binary dependencies         pyodps-pack --run-before install-gdal.sh gdal==3.6.0

How it works

pyodps-pack is installed automatically with PyODPS. On Linux and macOS, the executable is in the bin directory under the Python installation path; on Windows, it is in the Scripts directory. Run pyodps-pack commands from a local command line, such as the Windows command prompt, the macOS Terminal, or a Linux shell. Do not run them from the DataWorks console, the MaxCompute client (odpscmd), or the interactive Python interpreter.

If the bin or Scripts directory is already in your PATH environment variable (set during Python installation or virtual environment activation), no additional setup is needed. Otherwise, navigate to the directory manually or add it to PATH.
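On Linux or macOS, one way to find that directory is to ask Python itself. The sketch below assumes python3 is the interpreter PyODPS was installed into and uses the standard sysconfig module. For pip install --user installations, the command is placed in the user scripts directory instead, so treat this as a starting point rather than a universal recipe.

```shell
#!/bin/sh
# Sketch: put the interpreter's scripts directory (bin/ on Linux and macOS)
# on PATH so that the shell can find pyodps-pack.
# Assumes python3 is the interpreter that PyODPS was installed into.
scripts_dir=$(python3 -c 'import sysconfig; print(sysconfig.get_path("scripts"))')

case ":$PATH:" in
    *":$scripts_dir:"*)
        echo "$scripts_dir is already on PATH" ;;
    *)
        PATH="$scripts_dir:$PATH"
        export PATH
        echo "Added $scripts_dir to PATH" ;;
esac

# Confirm that the command now resolves (it will not if PyODPS is missing).
command -v pyodps-pack || echo "pyodps-pack not found in $scripts_dir" >&2
```

The change lasts only for the current shell session; to make it persistent, add an equivalent export PATH=... line to your shell profile.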

After packaging completes, pyodps-pack prints all resolved dependency versions and writes packages.tar.gz to the current directory.

Choose a packaging mode

pyodps-pack supports two modes:

  • Docker mode (default): Use in all cases where Docker is available; it produces the most compatible packages. To enable it, install Docker. No extra flag is needed.

  • Non-Docker mode: Use only when Docker is not available in your environment. To enable it, add --without-docker.
Warning

Packages built in non-Docker mode may not work in MaxCompute or DataWorks. Use Docker mode whenever possible.
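If the same script has to run on machines both with and without Docker, you could select the mode at run time. This is a minimal sketch: the wrapper name pack and the PACK_DRY_RUN variable are hypothetical, and the wrapper adds --without-docker only when docker info fails, in line with the recommendation to prefer Docker mode.

```shell
#!/bin/sh
# Sketch: run pyodps-pack in Docker mode when a Docker daemon is reachable,
# and fall back to non-Docker mode otherwise.
# "pack" and PACK_DRY_RUN are hypothetical names used only in this example.
pack() {
    if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
        set -- pyodps-pack "$@"
    else
        echo "Docker is unavailable; falling back to --without-docker" >&2
        set -- pyodps-pack --without-docker "$@"
    fi

    if [ -n "$PACK_DRY_RUN" ]; then
        echo "$@"    # print the command instead of running it
    else
        "$@"
    fi
}
```

For example, PACK_DRY_RUN=1 pack pandas==1.2.5 prints the command that would run, and pack pandas==1.2.5 runs it.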

Set up Docker

pyodps-pack calls Docker automatically and pulls the required images, so you do not need to manage images manually.

Linux: See Install Docker Engine.

macOS or Windows: Use Docker Desktop or Rancher Desktop.

Note

pyodps-pack is not tested in other Docker environments such as minikube. Availability in those environments is not guaranteed.

Windows-specific notes:

  • Windows Server may be required for the Docker service to start, but many enterprises disable Windows Server for security reasons. If Docker fails to start, switch to Linux or try enabling Windows Server.

  • On Windows 10 with Rancher Desktop, containerd may not work as the container engine. Use dockerd instead. See Container Engine for configuration steps.

Set up non-Docker mode

Note

Use non-Docker mode only when Docker is unavailable. If you encounter errors or the generated package does not work, switch to Docker mode.

Add --without-docker to any pyodps-pack command. Before using this mode:

  • Make sure pip is installed in your Python environment.

  • On Windows, install Git Bash (included in Git for Windows).

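For Apsara Stack (private cloud) environments: if your MaxCompute or DataWorks deployment runs on Arm64 machines, add the --arch aarch64 parameter to specify the target architecture. Docker Desktop and Rancher Desktop usually already include the binfmt components required for cross-platform packaging. You can also install the emulation environment with the following command: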

docker run --privileged --rm tonistiigi/binfmt --install arm64
Note

The preceding command requires a Linux kernel later than 4.8. For details, see binfmt.
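Before packaging with --arch aarch64 on a Linux host, you can check both requirements from the shell. This sketch assumes the emulator is registered under the name qemu-aarch64 in /proc/sys/fs/binfmt_misc, which is how tonistiigi/binfmt usually registers it. On Docker Desktop or Rancher Desktop, the registration lives inside their virtual machine, so the host-side check may not apply there.

```shell
#!/bin/sh
# Sketch: check the prerequisites for cross-building AArch64 packages on Linux.

# Succeeds when a kernel release string such as "5.15.0-91-generic" is later than 4.8.
kernel_newer_than_4_8() {
    major=$(echo "$1" | cut -d. -f1)
    minor=$(echo "$1" | cut -d. -f2 | sed 's/[^0-9].*//')
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -gt 8 ]; }
}

release=$(uname -r)
if kernel_newer_than_4_8 "$release"; then
    echo "Kernel $release: OK"
else
    echo "Kernel $release: 4.8 or earlier, binfmt emulation is not supported"
fi

# Assumed entry name: tonistiigi/binfmt normally registers qemu-aarch64.
if [ -e /proc/sys/fs/binfmt_misc/qemu-aarch64 ]; then
    echo "AArch64 emulation: registered"
else
    echo "AArch64 emulation: not found; run the tonistiigi/binfmt command first"
fi
```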

Package PyPI dependencies

Note

Some packages have optional dependencies that pyodps-pack does not include automatically. For example, pandas requires openpyxl when you use the to_excel method. Check the third-party package documentation and add any optional dependencies explicitly to the command.
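When a package needs optional dependencies, it can be easier to keep the package and its extras together in a requirements file and pass it with -r (see Parameters). A minimal sketch, assuming you need pandas 1.2.5 with Excel output; the file name requirements.txt is only an example:

```shell
#!/bin/sh
# Sketch: list the main package and its optional dependency in one requirements file.
# openpyxl is listed because pandas needs it for to_excel but does not install it by default.
cat > requirements.txt <<'EOF'
pandas==1.2.5
openpyxl
EOF

# Build packages.tar.gz from the file (requires pyodps-pack on PATH).
if command -v pyodps-pack >/dev/null 2>&1; then
    pyodps-pack -r requirements.txt
else
    echo "pyodps-pack not found; install PyODPS 0.11.3 or later" >&2
fi
```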

Usage notes:

  • Use Python 3 for new MaxCompute projects. Packaging with Python 2 may fail.

  • For existing Python 2 projects, migrate to Python 3 to simplify maintenance.

  • On Linux, prefix pyodps-pack commands with sudo to ensure Docker has the necessary permissions.

  • On macOS, do not use sudo with pyodps-pack; doing so may cause permission errors.

Docker mode:

pyodps-pack pandas

Non-Docker mode:

pyodps-pack --without-docker pandas

Specific version:

pyodps-pack pandas==1.2.5

After packaging, pyodps-pack prints the resolved versions of all included packages:

Package         Version
--------------- -------
numpy           1.21.6
pandas          1.2.5
python-dateutil 2.8.2
pytz            2022.6
six             1.16.0

The output archive packages.tar.gz is written to the current directory.
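To see what ended up in the archive before you upload it, standard tar is enough. The helper below is a hypothetical convenience function; it lists the distinct top-level entries without assuming any particular internal layout of packages.tar.gz.

```shell
#!/bin/sh
# Sketch: print the distinct top-level entries of a .tar.gz archive.
# list_top_level is a hypothetical helper name; it defaults to packages.tar.gz.
list_top_level() {
    tar -tzf "${1:-packages.tar.gz}" | cut -d/ -f1 | sort -u
}
```

For example, run list_top_level packages.tar.gz in the directory where pyodps-pack wrote the archive, or tar -tzf packages.tar.gz to see every file.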

Package for Python 2.7

If you need a Python 2.7 package, the command depends on where the package will run. See PyODPS DataFrame for context.

For MaxCompute:

pyodps-pack --mcpy27 pandas

For DataWorks:

pyodps-pack --dwpy27 pandas

Package custom code

pyodps-pack packages custom Python projects built with setup.py or pyproject.toml. See Build System Interface for details.

The following example packages a project based on pyproject.toml with this directory structure:

test_package_root
├── test_package
│   ├── __init__.py
│   ├── mod1.py
│   └── subpackage
│       ├── __init__.py
│       └── mod2.py
└── pyproject.toml

Example pyproject.toml:

[project]
name = "test_package"
description = "pyodps-pack example package"
version = "0.1.0"
dependencies = [
    "pandas>=1.0.5"
]

Run the following command, replacing <path_to_package> with the parent directory of test_package_root:

pyodps-pack /<path_to_package>/test_package_root

This compresses the project and all its dependencies into packages.tar.gz.
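To try the example end to end, the following sketch recreates the directory layout and pyproject.toml from this section under the current directory, then packages it. The module files are left empty; replace them with your own code.

```shell
#!/bin/sh
# Sketch: recreate the example project from this section, then package it.
set -e

mkdir -p test_package_root/test_package/subpackage
touch test_package_root/test_package/__init__.py \
      test_package_root/test_package/mod1.py \
      test_package_root/test_package/subpackage/__init__.py \
      test_package_root/test_package/subpackage/mod2.py

# Same content as the example pyproject.toml.
cat > test_package_root/pyproject.toml <<'EOF'
[project]
name = "test_package"
description = "pyodps-pack example package"
version = "0.1.0"
dependencies = [
    "pandas>=1.0.5"
]
EOF

# Point pyodps-pack at the absolute path of the project root.
if command -v pyodps-pack >/dev/null 2>&1; then
    pyodps-pack "$PWD/test_package_root"
else
    echo "pyodps-pack not found; skipped packaging" >&2
fi
```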

Package code from a Git repository

Pass the Git URL directly:

pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git

To package a specific branch or tag:

pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2

If the packaging process requires build-time dependencies (such as Cython), specify them with --install-requires. These dependencies are used during packaging but are not necessarily included in the output archive.

pyodps-pack \
    --install-requires cython \
    git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2

Alternatively, list build-time dependencies in a file that follows the requirements.txt format and pass the file with --install-requires-file. For example, create install-requires.txt with the following content:

cython>0.29

Then run:

pyodps-pack \
    --install-requires-file install-requires.txt \
    git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2

Package binary dependencies

Some packages depend on binary components, such as dynamic-link libraries, that must be compiled and installed before the package itself can be built. Use --run-before to specify a Bash script that installs these components before packaging starts.

The following example packages GDAL 3.6.0, which requires libgdal 3.6.0 or later and PROJ 6.0 or later. Both libraries are built with CMake.

  1. Write a script named install-gdal.sh that compiles and installs PROJ and GDAL:

    #!/bin/bash
    # Stop at the first failing command. curl -f makes HTTP errors fail too,
    # and -L follows redirects.
    set -e

    # Build PROJ first, because GDAL 3.6 requires PROJ 6.0 or later.
    cd /tmp
    curl -fL -o proj-6.3.2.tar.gz https://download.osgeo.org/proj/proj-6.3.2.tar.gz
    tar xzf proj-6.3.2.tar.gz
    cd proj-6.3.2
    mkdir build && cd build
    cmake ..
    cmake --build .
    cmake --build . --target install

    # Then build libgdal 3.6.0, which picks up the PROJ installed above.
    cd /tmp
    curl -fL -o gdal-3.6.0.tar.gz https://download.osgeo.org/gdal/3.6.0/gdal-3.6.0.tar.gz
    tar xzf gdal-3.6.0.tar.gz
    cd gdal-3.6.0
    mkdir build && cd build
    cmake ..
    cmake --build .
    cmake --build . --target install
  2. Run pyodps-pack with the script and the oldest-supported-numpy build dependency:

    pyodps-pack --install-requires oldest-supported-numpy --run-before install-gdal.sh gdal==3.6.0
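A syntax error in the --run-before script would otherwise show up only after pyodps-pack has started, so a cheap pre-flight step is to parse the script locally with bash -n, which reads the script without executing any commands. This catches syntax errors only, not missing build tools or unreachable download URLs; the function name check_run_before is hypothetical.

```shell
#!/bin/sh
# Sketch: validate a --run-before script locally before starting pyodps-pack.
# check_run_before is a hypothetical helper name.
check_run_before() {
    script=${1:-install-gdal.sh}
    if [ ! -f "$script" ]; then
        echo "$script: file not found" >&2
        return 1
    fi
    # bash -n parses the script without running any of its commands.
    bash -n "$script"
}
```

For example: check_run_before install-gdal.sh && pyodps-pack --install-requires oldest-supported-numpy --run-before install-gdal.sh gdal==3.6.0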

Parameters

  • -r, --requirement <file>: Dependency file required for packaging. Can be specified multiple times.

  • -o, --output <file>: Name of the output archive. Default: packages.tar.gz.

  • --install-requires <item>: PyPI dependency required at packaging time. It is not necessarily included in the output archive. Can be specified multiple times.

  • --install-requires-file <file>: File that lists PyPI dependencies required at packaging time, in requirements.txt format. Can be specified multiple times.

  • --run-before <script-file>: Bash script to run before packaging. Typically used to install binary dependencies.

  • -x, --exclude <dependency>: PyPI dependency to exclude from the output archive. Can be specified multiple times.

  • --no-deps: Exclude the dependencies of the specified packages from the output archive.

  • -i, --index-url <index-url>: PyPI index URL. Defaults to the global.index-url value reported by pip config list, which is configured in pip.conf.

  • --trusted-host <host>: HTTPS host whose certificate errors are ignored during packaging.

  • -l, --legacy-image: Use a CentOS 5 image for packaging. This ensures compatibility with older environments, such as earlier versions of Apsara Stack.

  • --mcpy27: Generate a package for Python 2.7 in MaxCompute. Enables --legacy-image by default.

  • --dwpy27: Generate a package for Python 2.7 in DataWorks. Enables --legacy-image by default.

  • --prefer-binary: Prefer older binary distributions on PyPI over newer source-only distributions.

  • --arch <architecture>: Hardware architecture of the target package. Only x86_64 and AArch64 (also called ARM64) are supported. Default: x86_64. You do not need to specify this parameter unless you use MaxCompute or DataWorks in Apsara Stack.

  • --python-version <version>: Python version of the target package. Use 3.6 or 36 to indicate Python 3.6. You do not need to specify this parameter unless you use MaxCompute or DataWorks in Apsara Stack.

  • --docker-args <args>: Additional arguments to pass to Docker. Enclose multiple arguments in double quotation marks, for example: --docker-args "--ip 192.168.1.10".

  • --without-docker: Run pyodps-pack in non-Docker mode. Packaging packages that have binary dependencies may fail or produce unusable output.

  • --without-merge: Keep the .whl files instead of merging them into a .tar.gz archive.

  • --debug: Print detailed command execution output for troubleshooting.

Troubleshoot packaging failures

  • Package generated in non-Docker mode does not work in MaxCompute or DataWorks. Likely cause: non-Docker mode does not build in an environment that matches the target OS. Resolution: switch to Docker mode by removing --without-docker.

  • Docker fails to start on Windows. Likely cause: your organization has disabled Windows Server. Resolution: switch to Linux, or try enabling Windows Server.

  • containerd error with Rancher Desktop on Windows 10. Likely cause: incompatible container engine. Resolution: switch to dockerd in the Rancher Desktop Container Engine settings.

  • Packaging fails with Python 2. Likely cause: Python 2 packaging may fail for newer packages. Resolution: use Python 3; if Python 2.7 is required, use --mcpy27 (MaxCompute) or --dwpy27 (DataWorks).

  • Optional dependency missing at runtime. Likely cause: pyodps-pack does not include optional dependencies automatically. Resolution: add the optional dependency to the command explicitly (for example, pyodps-pack pandas openpyxl).

  • Binary dependency compilation fails. Likely cause: missing system libraries or build tools. Resolution: use --run-before to install the required system libraries before packaging, and add --debug for detailed output.

What's next

After generating the package, upload it to MaxCompute as an archive resource and reference it in your PyODPS node. See Reference a third-party package in a PyODPS node.
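If you upload with the MaxCompute client, the ADD ARCHIVE command does it in one step. The sketch below assumes odpscmd is on PATH and already configured for your project; the -f flag overwrites an existing resource with the same name, and upload_archive is a hypothetical helper name. Only the upload uses odpscmd; pyodps-pack itself still runs from a local shell.

```shell
#!/bin/sh
# Sketch: upload the generated archive as a MaxCompute archive resource.
# Assumes odpscmd is on PATH and configured for the target project.
# upload_archive is a hypothetical helper name.
upload_archive() {
    archive=${1:-packages.tar.gz}
    cmd="add archive $archive -f;"
    if command -v odpscmd >/dev/null 2>&1; then
        odpscmd -e "$cmd"
    else
        echo "odpscmd not found; run this in the MaxCompute client: $cmd"
    fi
}
```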