pyodps-pack is a command-line interface (CLI) tool shipped with PyODPS V0.11.3 and later. It packages Python dependencies — PyPI packages, custom code, Git repositories, and packages with binary dependencies — into a .tar.gz archive that MaxCompute and DataWorks PyODPS nodes can consume directly. The operation is similar to using pip.
Prerequisites
Before you begin, ensure that you have:
PyODPS V0.11.3 or later installed
Docker installed and running (required for Docker mode, which is the default)
Python 3 (recommended; Python 2 packaging may fail for new projects)
Packaging scenarios at a glance
Use this table to find the command for your scenario, then follow the detailed instructions in the sections below.
| Scenario | Command |
|---|---|
| Package a PyPI package (Docker mode) | pyodps-pack pandas |
| Package a PyPI package (non-Docker mode) | pyodps-pack --without-docker pandas |
| Package a specific version | pyodps-pack pandas==1.2.5 |
| Package for Python 2.7 in MaxCompute | pyodps-pack --mcpy27 pandas |
| Package for Python 2.7 in DataWorks | pyodps-pack --dwpy27 pandas |
| Package custom local code | pyodps-pack /<path_to_package>/test_package_root |
| Package from a Git repository | pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git |
| Package with binary dependencies | pyodps-pack --run-before install-gdal.sh gdal==3.6.0 |
How it works
pyodps-pack is installed automatically with PyODPS. On Linux and macOS, the executable is in the bin directory under the Python installation path. On Windows, it is in the Scripts directory. Run pyodps-pack commands from the Windows CLI, macOS Terminal, or Linux Shell — not from the DataWorks console, the MaxCompute client (odpscmd), or the Python CLI.
If the bin or Scripts directory is already in your PATH environment variable (set during Python installation or virtual environment activation), no additional setup is needed. Otherwise, navigate to the directory manually or add it to PATH.
After packaging completes, pyodps-pack prints all resolved dependency versions and writes packages.tar.gz to the current directory.
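A quick way to confirm that the executable is reachable is to ask the shell where it resolves. The following is a minimal sketch for a POSIX shell. The function name check_pyodps_pack and the fallback message are illustrative and are not part of pyodps-pack.

```shell
# Report where pyodps-pack resolves on PATH, or say that it does not.
check_pyodps_pack() {
    if command -v pyodps-pack >/dev/null 2>&1; then
        echo "found: $(command -v pyodps-pack)"
    else
        echo "not found: add your Python bin (or Scripts) directory to PATH"
    fi
}

check_pyodps_pack
```

If the result is "not found", activate the virtual environment where PyODPS is installed, or add the bin or Scripts directory to PATH, then run the check again.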
Choose a packaging mode
pyodps-pack supports two modes:
| Mode | When to use | How to enable |
|---|---|---|
| Docker mode (default) | All cases where Docker is available. Produces the most compatible packages. | Install Docker; no extra flag needed |
| Non-Docker mode | Only when Docker is not available in your environment | Add --without-docker |
Packages built in non-Docker mode may not work in MaxCompute or DataWorks. Use Docker mode whenever possible.
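The mode choice above can be sketched as a small shell helper. This is an illustration only: suggest_pack_command is a hypothetical name, and treating a successful docker info call as "Docker is available" is an assumption. The helper prints a command and does not run pyodps-pack.

```shell
# Hypothetical helper: print the pyodps-pack command suited to this machine.
# It only prints the command; nothing is packaged.
suggest_pack_command() {
    if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
        # Docker is installed and the daemon answers: use the default mode.
        echo "pyodps-pack $*"
    else
        # No usable Docker: fall back to non-Docker mode.
        echo "pyodps-pack --without-docker $*"
    fi
}

suggest_pack_command pandas
```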
Set up Docker
pyodps-pack calls Docker automatically and pulls the required images — no manual image management needed.
Linux: See Install Docker Engine.
macOS or Windows:
Individual developers: use Docker Desktop.
Enterprise users who cannot use Docker Desktop: use Rancher Desktop (open source).
pyodps-pack is not tested in other Docker environments such as minikube. Availability in those environments is not guaranteed.
Windows-specific notes:
The Docker service may depend on the Windows Server service (the system service named Server, not the Windows Server operating system) to start, and many enterprises disable that service for security reasons. If Docker fails to start, switch to Linux or try enabling the Server service.
On Windows 10 with Rancher Desktop, containerd may not work as the container engine. Use dockerd instead. See Container Engine for configuration steps.
Set up non-Docker mode
Use non-Docker mode only when Docker is unavailable. If you encounter errors or the generated package does not work, switch to Docker mode.
Add --without-docker to any pyodps-pack command. Before using this mode:
Make sure pip is installed in your Python environment.
On Windows, install Git Bash (included in Git for Windows).
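For Apsara Stack environments in which MaxCompute or DataWorks is deployed on Arm64 machines, add the --arch aarch64 parameter to specify the target architecture for packaging. Docker Desktop and Rancher Desktop usually include the binfmt components required for cross-platform packaging. You can also install the required emulators with the following command:
docker run --privileged --rm tonistiigi/binfmt --install arm64
This command requires a Linux kernel version later than 4.8. For details, see binfmt.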
Package PyPI dependencies
Some packages have optional dependencies that pyodps-pack does not include automatically. For example, pandas requires openpyxl when you use the to_excel method. Check the third-party package documentation and add any optional dependencies explicitly to the command.
Usage notes:
Use Python 3 for new MaxCompute projects. Packaging with Python 2 may fail.
For existing Python 2 projects, migrate to Python 3 to simplify maintenance.
On Linux, prefix pyodps-pack commands with sudo to ensure Docker has the necessary permissions.
On macOS, do not use sudo with pyodps-pack, because doing so may cause permission errors.
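The platform notes and the optional-dependency advice above can be combined in one sketch. The helper name print_pack_command is hypothetical. It takes the OS name as its first argument so that the behavior can be tried for any platform, and it prints the command rather than running it.

```shell
# Hypothetical helper: print a pyodps-pack command with the prefix described
# above (sudo on Linux, no prefix on macOS and other systems).
# $1 is the OS name as printed by `uname -s`; the rest are package specs.
print_pack_command() {
    os=$1
    shift
    case "$os" in
        Linux) echo "sudo pyodps-pack $*" ;;
        *)     echo "pyodps-pack $*" ;;
    esac
}

# pandas plus openpyxl, the optional dependency needed by to_excel.
print_pack_command "$(uname -s)" pandas openpyxl
```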
Docker mode:
pyodps-pack pandas
Non-Docker mode:
pyodps-pack --without-docker pandas
Specific version:
pyodps-pack pandas==1.2.5
After packaging, pyodps-pack prints the resolved versions of all included packages:
Package Version
--------------- -------
numpy 1.21.6
pandas 1.2.5
python-dateutil 2.8.2
pytz 2022.6
six 1.16.0
The output archive packages.tar.gz is written to the current directory.
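Before uploading the archive, it can help to list what it contains. The sketch below uses standard tar options. The function name inspect_archive and the 20-line limit are arbitrary choices for illustration.

```shell
# List the first entries of a packaged archive, or explain that it is missing.
# $1 is the archive path; it defaults to packages.tar.gz in the current directory.
inspect_archive() {
    archive=${1:-packages.tar.gz}
    if [ -f "$archive" ]; then
        tar -tzf "$archive" | head -n 20
    else
        echo "no archive at $archive; run pyodps-pack first"
    fi
}

inspect_archive
```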
Package for Python 2.7
If you need a Python 2.7 package, the command depends on where the package will run. See PyODPS DataFrame for context.
For MaxCompute:
pyodps-pack --mcpy27 pandas
For DataWorks:
pyodps-pack --dwpy27 pandas
Package custom code
pyodps-pack packages custom Python projects built with setup.py or pyproject.toml. See Build System Interface for details.
The following example packages a project based on pyproject.toml with this directory structure:
test_package_root
├── test_package
│   ├── __init__.py
│   ├── mod1.py
│   └── subpackage
│       ├── __init__.py
│       └── mod2.py
└── pyproject.toml
Example pyproject.toml:
[project]
name = "test_package"
description = "pyodps-pack example package"
version = "0.1.0"
dependencies = [
"pandas>=1.0.5"
]
Run the following command, replacing <path_to_package> with the parent directory of test_package_root:
pyodps-pack /<path_to_package>/test_package_root
This compresses the project and all its dependencies into packages.tar.gz.
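To try the command without an existing project, the example layout can be recreated in a scratch directory. This is a sketch: scaffold_test_package is a hypothetical helper, the module files are left empty, and the scratch location comes from mktemp.

```shell
# Recreate the example project under the directory given as $1
# and print the path of the new test_package_root.
scaffold_test_package() {
    root=$1/test_package_root
    mkdir -p "$root/test_package/subpackage"
    : > "$root/test_package/__init__.py"
    : > "$root/test_package/mod1.py"
    : > "$root/test_package/subpackage/__init__.py"
    : > "$root/test_package/subpackage/mod2.py"
    cat > "$root/pyproject.toml" <<'EOF'
[project]
name = "test_package"
description = "pyodps-pack example package"
version = "0.1.0"
dependencies = [
    "pandas>=1.0.5"
]
EOF
    echo "$root"
}

workdir=$(mktemp -d)
pkg_root=$(scaffold_test_package "$workdir")
echo "now run: pyodps-pack $pkg_root"
```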
Package code from a Git repository
Pass the Git URL directly:
pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git
To package a specific branch or tag:
pyodps-pack git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2
If the packaging process requires build-time dependencies (such as Cython), specify them with --install-requires. These dependencies are used during packaging but are not necessarily included in the output archive.
pyodps-pack \
    --install-requires cython \
    git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2
Alternatively, list build-time dependencies in a file that follows the requirements.txt format and pass it with --install-requires-file. For example, create install-requires.txt with the following content:
cython>0.29
Then run:
pyodps-pack \
    --install-requires-file install-requires.txt \
    git+https://github.com/aliyun/aliyun-odps-python-sdk.git@v0.11.2.2
Package binary dependencies
Some packages include binary components — such as dynamic-link libraries — that must be compiled before packaging. Use --run-before to run a Bash script that installs these binary components before pyodps-pack runs.
The following example packages GDAL 3.6.0, which requires libgdal (version 3.6.0 or later) and PROJ (version 6.0 or later), both compiled with CMake.
Write a script named install-gdal.sh that compiles and installs PROJ and GDAL:
#!/bin/bash
# Stop at the first command that fails.
set -e
# Download, build, and install PROJ.
cd /tmp
curl -o proj-6.3.2.tar.gz https://download.osgeo.org/proj/proj-6.3.2.tar.gz
tar xzf proj-6.3.2.tar.gz
cd proj-6.3.2
mkdir build && cd build
cmake ..
cmake --build .
cmake --build . --target install
# Download, build, and install GDAL (libgdal).
cd /tmp
curl -o gdal-3.6.0.tar.gz http://download.osgeo.org/gdal/3.6.0/gdal-3.6.0.tar.gz
tar xzf gdal-3.6.0.tar.gz
cd gdal-3.6.0
mkdir build && cd build
cmake ..
cmake --build .
cmake --build . --target install
Run pyodps-pack with the script and the oldest-supported-numpy build dependency:
pyodps-pack --install-requires oldest-supported-numpy --run-before install-gdal.sh gdal==3.6.0
Parameters
| Parameter | Description |
|---|---|
| -r, --requirement <file> | Dependency file required for packaging. Can be specified multiple times. |
| -o, --output <file> | Name of the output archive. Default: packages.tar.gz. |
| --install-requires <item> | PyPI dependency required at packaging time (not necessarily included in the output archive). Can be specified multiple times. |
| --install-requires-file <file> | File listing PyPI dependencies required at packaging time, in requirements.txt format. Can be specified multiple times. |
| --run-before <script-file> | Bash script to run before packaging. Typically used to install binary dependencies. |
| -x, --exclude <dependency> | PyPI dependency to exclude from the output archive. Can be specified multiple times. |
| --no-deps | Exclude transitive dependencies from the output archive. |
| -i, --index-url <index-url> | PyPI index URL. Defaults to global.index-url from pip config list, which is configured in pip.conf. |
| --trusted-host <host> | HTTPS host whose certificate errors should be ignored during packaging. |
| -l, --legacy-image | Use a CentOS 5 image for packaging. Ensures compatibility with older environments such as earlier versions of Apsara Stack. |
| --mcpy27 | Generate a package for Python 2.7 in MaxCompute. Enables --legacy-image by default. |
| --dwpy27 | Generate a package for Python 2.7 in DataWorks. Enables --legacy-image by default. |
| --prefer-binary | Prefer older binary distributions on PyPI over newer source-only distributions. |
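| --arch <architecture> | Hardware architecture that the package targets. Only x86_64 and AArch64 (also called ARM64) are supported. Default: x86_64. You do not need this parameter unless you use MaxCompute or DataWorks in Apsara Stack. |
| --python-version <version> | Python version that the package targets. Use 3.6 or 36 to indicate Python 3.6. You do not need this parameter unless you use MaxCompute or DataWorks in Apsara Stack. |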
| --docker-args <args> | Additional arguments to pass to Docker. Enclose multiple arguments in double quotation marks, for example: --docker-args "--ip 192.168.1.10". |
| --without-docker | Run pyodps-pack in non-Docker mode. Packages with binary dependencies may fail or produce unusable output. |
| --without-merge | Keep .whl files instead of merging them into a .tar.gz archive. |
| --debug | Print detailed command execution output for troubleshooting. |
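Several of these parameters are typically combined in one call. The sketch below writes a small dependency file and prints a matching command. It assumes that the file passed to -r follows the requirements.txt format, as the --install-requires-file description suggests. The helper name prepare_requirements_pack, the file names, and the listed packages are placeholders.

```shell
# Hypothetical helper: write a dependency file and print the pyodps-pack
# command that would consume it. Nothing is packaged here.
# $1 is the dependency file to create; $2 is the output archive name.
prepare_requirements_pack() {
    req_file=$1
    output=$2
    printf '%s\n' 'pandas==1.2.5' 'openpyxl' > "$req_file"
    echo "pyodps-pack -r $req_file -o $output --prefer-binary --debug"
}

scratch=$(mktemp -d)
prepare_requirements_pack "$scratch/requirements.txt" pandas-deps.tar.gz
```

Keeping dependencies in a file rather than on the command line makes the packaging step easier to repeat and to review.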
Troubleshoot packaging failures
| Symptom | Likely cause | Resolution |
|---|---|---|
| Package generated in non-Docker mode does not work in MaxCompute or DataWorks | Non-Docker mode lacks the matching OS environment | Switch to Docker mode: remove --without-docker |
| Docker fails to start on Windows | The Windows Server service, which the Docker service may depend on, is disabled by your organization | Switch to Linux, or try enabling the Server service |
| containerd error with Rancher Desktop on Windows 10 | Incompatible container engine | Switch to dockerd in Rancher Desktop Container Engine settings |
| Packaging fails with Python 2 | Python 2 packaging may fail for newer packages | Use Python 3; if Python 2.7 is required, use --mcpy27 (MaxCompute) or --dwpy27 (DataWorks) |
| Optional dependency missing at runtime | pyodps-pack does not include optional dependencies automatically | Add the optional dependency explicitly to the command, for example: pyodps-pack pandas openpyxl |
| Binary dependency compilation fails | Missing system libraries or build tools | Use --run-before to install required system libraries before packaging; add --debug for detailed output |
What's next
After generating the package, upload it to MaxCompute as an archive resource and reference it in your PyODPS node. See Reference a third-party package in a PyODPS node.