Create a custom image when the default DataWorks runtime lacks required dependencies for your PyODPS or Shell tasks (e.g., Python libraries like pandas or jieba). Custom images package all dependencies into a reusable, standardized runtime environment, ensuring consistency and significantly improving development efficiency.
Usage notes
Edition requirements:
All editions: Create and use custom images.
Professional edition or higher: Build persistent images.
Resource group support: Custom images work only with serverless resource groups.
Legacy resource groups: Use O&M Assistant to install external dependencies.
Permissions required: You need the AliyunDataWorksFullAccess policy or the ModifyResourceGroup permission.
For more information, see Product and console access control: RAM Policy.
Quotas and limits
Image count limits:
Basic and Standard editions: 10.
Professional edition: 50.
Enterprise edition: 100.
Build concurrency: Maximum 2 concurrent builds per region.
ACR image requirements:
Instance edition: Enterprise edition.
Architecture: AMD64.
Image size: Maximum 5 GB per image.
Persistent builds: Only supported for images built from DataWorks official images (not ACR images).
Supported node types:
Build from official images: PyODPS 2, PyODPS 3, EMR Spark, EMR Spark SQL, EMR Shell, Shell, Python, Notebook, CDH, and Assignment nodes.
Build from ACR images: Shell, Python, and Notebook nodes.
Procedure
Create a custom image
Choose one of three methods to create your custom image:
Option 1: Build from DataWorks official images
Log in to the DataWorks console and click Image Management in the left navigation pane.
On the DataWorks Official Images tab, select your base image and click Create Custom Image in the Actions column.
Configure the following parameters:
Parameter
Description
Image Name/ID
The selected official image (you can switch if needed).
Visible Scope
Visible Only to Creator or Visible to all.
Module
Currently limited to DataStudio.
Supported Task Type
Select node types that can use this image.
Installation Package
Add third-party packages using one of these methods:
Quick install: Select Python2, Python3, or Yum from the dropdown and choose packages.
Script mode: Select Script and manually enter installation commands:
pip: pip install <package_name>
pip3: /home/tops/bin/pip3 install 'urllib3<2.0'
yum: yum install -y git
wget: wget git
For more information, see Appendix: Installation command reference.
Important: To install or depend on a third-party package from the Internet, the virtual private cloud (VPC) attached to the serverless resource group must have Internet access.
Click OK.
Option 2: Build from ACR Images
To create custom images from ACR, enable Container Registry first.
Log in to the DataWorks console and click Image Management in the left navigation pane.
On the Custom Images tab, click Create Image and configure:
Parameter
Description
Reference Type
Select Alibaba Cloud Container Registry Image.
Image Instance ID
Select your ACR enterprise instance.
Image Namespace
Select a namespace from the instance.
Image Repository
Select an image repository.
Image Version
Select the version to use.
VPC to Associate
Select the VPC bound to your ACR instance. For more information, see Configure VPC access.
Important: DataWorks supports selecting only one VPC per ACR instance.
Synchronize to MaxCompute
Defaults to No and is available only for ACR instances running Standard Edition or higher.
Yes: Generates a DataWorks custom image and synchronously builds it as a MaxCompute image during publishing.
For more information, see Create a MaxCompute image from a personal development environment.
No: Generates only a DataWorks custom image without MaxCompute synchronization.
Visible Scope
Visible Only to Creator or Visible to all.
Module
Currently limited to DataStudio.
Supported Task Type
ACR images use entrypoint: startup command + task_script_path.
Shell: The default command.
Python: Ensure your ACR base image includes a Python runtime.
Notebook: Use the DataWorks Notebook base image dataworks-public-registry.cn-shanghai.cr.aliyuncs.com/public/dataworks-notebook:py3.11-ubuntu22.04-20241202. Ensure your build environment has Internet access to pull this base image.
Click OK.
Option 3: Build from personal development environment
Data Studio supports creating images from personal development environments. For more information, see Create a DataWorks image from a personal development environment.
Test and publish the image
In the DataWorks console, go to Image Management.
Locate your image and click Publish in the Actions column.
If the test fails, update the image configuration and test again.
Notes:
Resource group: Select a serverless resource group.
VPC consistency: For ACR or personal environment images, ensure the Serverless resource group and ACR instance use the same VPC.
Internet access: If the test times out while fetching packages, verify your test resource group's VPC has Internet access.
Assign the image to workspaces
After publishing, assign the image to workspaces:
On the Custom Images tab, find your published image.
Click in the Actions column.
Use the image in tasks
New version of Data Studio:
Go to Data Studio: Go to the DataWorks Workspaces page, switch to your target region, find your workspace, and open it in Data Studio.
Configure the image: In your task node, click Scheduling in the right pane.
Resource Group: Select a serverless resource group.
If the target resource group is not displayed, go to the Resource Group page and click Associate Workspace.
ImportantEnsure this resource group matches the test resource group used when publishing the image.
Image: Select your published custom image.
Changes to the image require republishing the node to take effect in production.

Debug the node: In the Debugging Configurations pane, configure Computing Resource, Resource Group, CUs For Computing, Image, and Script Parameters, then click Run in the toolbar.
Publish the node: Click Publish in the toolbar to deploy to production.
Legacy version of DataStudio
Go to DataStudio: Log on to the DataWorks console, switch to your region, select your workspace, and click Go to Data Development.
Configure the image: In your task node, click Properties in the right pane.
Resource Group: Select a serverless resource group.
If the target resource group is not displayed, go to the Resource Group page and click Associate Workspace.
ImportantEnsure this resource group matches the test resource group used when publishing the image.
Image: Select your published custom image.
Changes to the image require republishing the node to take effect in production.

Debug the node: Click Run with Parameters, configure Resource Group Name, CUs for Running, and Image, then click Run.
Publish the node: Click Save and Submit to deploy to production.
Build a persistent image
We strongly recommend building persistent images after publishing and testing. This prevents runtime failures caused by upstream dependency changes or unspecified versions.
Regular custom images redeploy on every run, increasing execution time and compute costs. Persistent images build once and reuse indefinitely, improving efficiency and reducing costs.
Go to Image Management and locate your published image.
In the Actions column, start the persistent image build.
In the Resource Group for Which You Want to Create Image dialog, select a resource group and click Continue.
ImportantTo prevent network-related failures, ensure this resource group matches the test resource group used when publishing.
Building takes 5-10 minutes depending on image size. Upon success, the status changes to Published (Build Succeeded).
Billing
Image builds are charged as: CU count × Build duration. The system allocates 0.5 CUs by default. For more information about billing, see Billing for Serverless resource groups.
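As a hedged illustration of the formula above, using the documented 0.5 CU default and a hypothetical 10-minute build duration:

```python
# Illustrative image build billing: CU count x build duration.
# 0.5 CU is the documented default allocation; the 10-minute
# duration is a hypothetical figure for illustration only.
cu_count = 0.5          # CUs allocated to the build (default)
build_minutes = 10      # hypothetical build duration
cu_hours = cu_count * build_minutes / 60

print(f"Billable usage: {cu_hours:.4f} CU-hours")
# prints: Billable usage: 0.0833 CU-hours
```

Actual charges multiply this CU-hour usage by the serverless resource group unit price for your region.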
Production best practices
Follow these recommendations for stable, efficient, cost-effective use in production:
Persistent image: Build persistent images from published configurations with stable dependencies. This eliminates reinstallation on every run, reducing startup time, compute costs, and improving stability.
Environment consistency: Ensure VPCs and network configurations match across test, build, and production serverless resource groups, especially when accessing private ACR repositories or the internet.
Version locking: When installing dependencies via Script mode, always specify versions, such as pip install pandas==1.5.3. This prevents unexpected behavior from upstream library updates.
Rollback plan: If production tasks fail after updating an image:
Roll back via the task publishing history.
Revert to a previous stable image version in the scheduling configuration.
Example
This example demonstrates using a custom image with PyODPS to perform word segmentation on a MaxCompute table: segment text in a table column and store the results in another table for downstream scheduling.
Create test data.
Create a DataWorks workspace with attached MaxCompute resources. For more information, see Create a workspace, Add a data source or register a cluster to a workspace, and Associate a computing resource.
In Data Studio, create an ODPS node (legacy) or MaxCompute SQL node (new version).
Note: This example uses scheduling parameters. In the Scheduling pane, set the parameter name to bday and the value to $[yyyymmdd].
Save and publish the node.
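The SQL content of the test-data node is not shown above; as a hedged example, a node that prepares a small source table and the result table might look like the following (the table names source_tb and participle_tb and their columns are assumptions for illustration; ${bday} references the scheduling parameter configured above):

```sql
-- Hedged example: table and column names are assumptions for illustration.
CREATE TABLE IF NOT EXISTS source_tb (
    id BIGINT,
    content STRING
) PARTITIONED BY (ds STRING);

CREATE TABLE IF NOT EXISTS participle_tb (
    id BIGINT,
    words STRING
) PARTITIONED BY (ds STRING);

-- ${bday} is the scheduling parameter configured as bday=$[yyyymmdd].
INSERT INTO TABLE source_tb PARTITION (ds='${bday}')
VALUES (1, 'DataWorks custom images package third-party dependencies');
```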
Create a custom image.
Create a custom image with these key parameters:
Image Name/ID: Select dataworks_pyodps_task_pod (the DataWorks official PyODPS image).
Supported Task Type: PyODPS 2 and PyODPS 3.
Installation Package: Select Python 3 and add jieba.
Publish and assign images.
Publish the image and assign it to your workspace. For more information, see Test and publish the image and Assign the image to workspaces.
Create a PyODPS task.
In Data Studio, create a PyODPS 3 node.
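The node code is not shown above; the following is a hedged sketch of what the node body might look like. It runs only inside a DataWorks PyODPS 3 node, where the MaxCompute entry object `o` and the `args` parameter dictionary are injected by the runtime. The table names `source_tb` and `participle_tb` and their columns are assumptions for illustration.

```python
# Hedged sketch of a PyODPS 3 node body. Runs only inside DataWorks:
# `o` (MaxCompute entry) and `args` (scheduling parameters) are injected.
# Table and column names are assumptions for illustration.
import jieba

# Scheduling parameter configured on the node (bday=$[yyyymmdd]).
bday = args['bday']

# Read source rows and segment the text column with jieba.
records = []
with o.get_table('source_tb').open_reader(partition='ds=%s' % bday) as reader:
    for row in reader:
        words = jieba.lcut(row['content'])
        records.append([row['id'], ' '.join(words)])

# Write segmented text into the result table partition for downstream tasks.
t = o.get_table('participle_tb')
t.create_partition('ds=%s' % bday, if_not_exists=True)
with t.open_writer(partition='ds=%s' % bday) as writer:
    writer.write(records)
```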
Configure scheduling parameters in the right pane:
Scheduling Parameters: bday=$[yyyymmdd].
Resource Group: The same serverless resource group used for image testing.
Image: Your published custom image.
Debug the node.
Legacy version: Click Run with Parameters, configure settings, and click Run.
New version: Configure the Debugging Configurations pane and click Run in the toolbar.
(Optional) Verify the results with a SQL query:
-- Replace <partition_date> with the actual date
SELECT * FROM participle_tb WHERE ds=<partition_date>;
Publish the PyODPS node to production.
Note: Image changes in Data Studio don't sync to production automatically. You must publish the task for changes to take effect. For more information, see Deploy nodes or Node or workflow deployment.
Build a persistent image.
Build your custom image as a persistent image. For more information, see Build a persistent image.
References
Appendix: Installation command reference
When using Script mode to configure installation commands:
For PyODPS 2 dependencies: pip install <package_name>
Note: If prompted to upgrade pip, run pip install --upgrade pip.
For PyODPS 3 dependencies: /home/tops/bin/pip3 install <package_name>
Note: If prompted to upgrade pip, run /home/tops/bin/pip3 install --upgrade pip.
If you encounter the error /home/admin/usertools/tools/cmd-0.sh: line 3: /home/tops/bin/python3: No such file or directory, submit a ticket to request permissions.
Python mirror sources
Switch to these public mirrors as needed:
Alibaba Cloud (Aliyun): https://mirrors.aliyun.com/pypi/simple/
Important: No Internet access is required for the Alibaba Cloud mirror.
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
USTC: https://pypi.mirrors.ustc.edu.cn/simple/