Create a custom image when the default DataWorks runtime environment does not meet your task dependency requirements, such as PyODPS or Shell tasks needing Python libraries like pandas or jieba. Custom images pre-package dependencies into a reusable, standardized environment, ensuring consistency and improving efficiency.
Usage notes
Version limits:
All editions support creating and using custom images.
Only Professional Edition and higher support image building.
Resource group limits: This feature supports only serverless resource groups.
For old resource groups, please use O&M Assistant to install external dependencies.
Permission limits: You need the AliyunDataWorksFullAccess or ModifyResourceGroup policy.
For authorization details, see Manage product-level and console access with RAM policies.
Quotas and limits
Image quantity: Custom image limits vary by DataWorks edition.
Basic and Standard Editions: 10.
Professional Edition: 50.
Enterprise Edition: 100.
Build concurrency: You can build up to two images simultaneously per region.
ACR image requirements:
Instance edition: Only Enterprise Edition Alibaba Cloud ACR instances are supported.
Instance architecture: Only AMD64 architecture is supported.
Image size: A single image cannot exceed 5 GB.
Timezone configuration: Install the tzdata package to prevent container exceptions due to timezone inconsistencies.
Image build: Only custom images based on DataWorks official images support persistent builds. Custom images that reference Alibaba Cloud ACR images do not; they are pulled and deployed for every task run.
Supported node types and methods:
Node type
Official image build
ACR image build
PyODPS 2
PyODPS 3
EMR Spark
EMR Spark SQL
EMR Shell
Shell
Python
Notebook
CDH
Assignment Node
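The quotas listed above can be expressed as a small pre-flight check. The following is a minimal sketch, not a DataWorks API: the constant and function names are illustrative, and only the numeric limits come from this page.

```python
# Limits copied from the "Quotas and limits" list; all names here are illustrative.
IMAGE_QUOTA_BY_EDITION = {
    "Basic": 10,
    "Standard": 10,
    "Professional": 50,
    "Enterprise": 100,
}
MAX_CONCURRENT_BUILDS_PER_REGION = 2
MAX_ACR_IMAGE_SIZE_GB = 5


def can_create_image(edition, existing_images):
    """Return True if one more custom image fits within the edition's quota."""
    return existing_images < IMAGE_QUOTA_BY_EDITION[edition]


def can_start_build(builds_running_in_region):
    """Return True if another image build can start in the region."""
    return builds_running_in_region < MAX_CONCURRENT_BUILDS_PER_REGION


def acr_image_size_ok(size_gb):
    """Return True if an ACR image is within the single-image size limit."""
    return size_gb <= MAX_ACR_IMAGE_SIZE_GB
```

For example, can_create_image("Standard", 10) is False because the Standard Edition quota of 10 images is already used up.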
Procedure
1. Create a custom image
You can create custom images by referencing DataWorks Official Images or Alibaba Cloud Container Registry Image. The configuration parameters vary based on the selected reference type:
Create based on DataWorks official images
Log on to the DataWorks console and click Image Management in the left navigation pane.
On the DataWorks Official Images tab, select the target image as the base and click Create Custom Image in the Actions column. The system populates the target image information in the dialog box. Configure other parameters as follows.
Reference Type: Default is DataWorks Official Images. Image Namespace: Default is DataWorks Default. Image Repository: Default is DataWorks Default.
Parameter
Description
Image Name/ID
The target official image is selected by default. You can switch it as needed.
Visible Scope
Configure the visibility of the custom image: Visible Only to Creator or Visible to all.
Module
Custom images are currently only supported for DataStudio.
Supported Task Type
Select the task types to support based on the image type. When running matching task nodes in DataStudio, this image can be configured as the runtime image.
Installation Package
Add third-party packages as needed. You can select multiple modes and install multiple packages simultaneously. The following methods are supported:
Quick Install: Select Python2, Python3, or Yum from the package drop-down list to directly select the environment and resources to install. If the package is not listed, switch to Script mode for manual installation.
Manual Input: Select Script from the package drop-down list. You can manually enter installation commands in the script editor. Use the following example commands to download third-party packages.
pip example: pip install xx (for Python 2).
pip3 example: /home/tops/bin/pip3 install 'urllib3<2.0' (for Python 3).
yum example: yum install -y git.
wget example: wget git.
For more installation commands, see Installation commands.
Important: To install packages from the Internet, the VPC bound to the serverless resource group must have Internet access.
Click OK to complete the creation.
Create based on Alibaba Cloud Container Registry Image (ACR Image)
To create a custom image from an ACR image, activate Container Registry. Only Enterprise Edition ACR instances with AMD64 architecture are supported for creating DataWorks images.
Log on to the DataWorks console and click Image Management in the left navigation pane.
On the Custom Images tab, click Create Image. Configure the key parameters in the dialog box:
Parameter
Description
Reference Type
Select Alibaba Cloud Container Registry Image.
Image Instance ID
Select the Enterprise Edition instance created in Alibaba Cloud Container Registry.
Image Namespace
Select the namespace under the image instance.
Image Repository
Select the image repository under the image instance.
Image Version
Select the image version (tag) from the selected repository to create the custom image.
VPC to Associate
Select the VPC network bound to the image instance. For details on configuring VPC networks, see Configure a VPC ACL.
Important: You can configure only one VPC connection between DataWorks and the ACR instance.
Synchronize to MaxCompute
The default is No. This option depends on the selected Image Instance. It is selectable only for Standard Edition or higher ACR instances; otherwise, it is disabled.
Select Yes: A DataWorks custom image is generated by default, and a MaxCompute image is synchronously built when the DataWorks image is published.
For details, see Build a MaxCompute custom image in a personal development environment.
Select No: Only a DataWorks custom image is generated; it will not be synchronously built as a MaxCompute image.
Visible Scope
Configure the visibility of the custom image: Visible Only to Creator or Visible to all.
Module
Custom images are currently only supported for DataStudio.
Supported Task Type
ACR images are started using the method: start command + user task code file path. The following are the supported task types and their default start commands:
Shell
Python: To use a custom image created from an Alibaba Cloud ACR image for Python tasks, verify that your ACR image contains a Python environment; otherwise, Python tasks are not supported.
Notebook: To use a custom image created from an Alibaba Cloud ACR image for Notebook tasks, use the DataWorks Notebook base image as the base for your ACR image to provide the runtime environment.
DataWorks Notebook base image: dataworks-public-registry.cn-shanghai.cr.aliyuncs.com/public/dataworks-notebook:py3.11-ubuntu22.04-20241202
Ensure that the environment used to build the image has Internet access so that it can fetch the DataWorks Notebook base image.
Click OK to complete the creation.
Create based on personal development environment instances
The new DataStudio supports creating new images from personal development environments. For details, see Create a DataWorks image from a personal development environment.
2. Test and publish a custom image
On the Custom Images tab of the DataWorks console, test the target image and then publish it. You can publish only images that have passed testing. If testing fails, use the Actions column to modify the image configuration.
Note the following when testing and publishing:
Select a serverless resource group when testing custom images.
For images based on ACR or personal development environments, ensure the serverless resource group VPC matches the image container VPC.
If your custom image fetches third-party packages from the Internet and testing hangs or fails repeatedly, check whether the VPC bound to the test resource group has Internet access.
3. Associate the image with a workspace
After publishing, you can bind the image to workspaces.
On the Custom Images tab of the DataWorks console, find the published custom image.
In the Actions column, bind the custom image to a workspace.
4. Use the image in a task
Use image in new DataStudio
Enter DataStudio: Go to the DataWorks Workspace List page, switch to the target region in the top navigation bar, find the target workspace, and enter DataStudio from the Actions column.
Configure image: In DataStudio, find the task node to test with the custom image, click Scheduling on the right, and configure resource properties.
Resource Group: Select a serverless resource group.
If the target resource group is not displayed, check whether the resource group is bound to the current workspace. You can go to the Resource Group List page, find the target resource group, and click Associate Workspace in the Actions column to complete the binding.
Important: The resource group must match the test resource group used during image publication.
Image: Select the published Custom Image.
If you switch images, you must publish the node for the change to take effect in the production environment.

Debug node: In the Debugging Configuration panel on the right, configure Compute Resource, Resource Group, Compute CUs, Image, and Script Parameters, and then click Run in the top toolbar.
Deploy node: Click Deploy in the top toolbar to publish the node to the production environment.
Use image in old DataStudio
Enter DataStudio: Log on to the DataWorks console and switch to the target region. In the left navigation pane, open the data development entry, select the corresponding workspace from the drop-down list, and click Go to Data Development.
Configure image: In DataStudio, find the task node to test with the custom image, click Properties on the right, and configure resource properties.
Resource Group: Select a serverless resource group.
If the target resource group is not displayed, check whether the resource group is bound to the current workspace. You can go to the Resource Group List page, find the target resource group, and click Associate Workspace in the Actions column to complete the binding.
Important: The resource group must match the test resource group used during image publication.
Image: Select the published Custom Image.
If you switch images, you must publish the node for the change to take effect in the production environment.

Debug node: Click Run with Parameters in the top toolbar, configure Resource Group, CUs for Running, and Image, and then click Run.
Deploy node: Click Save and Submit in the top toolbar to publish the node to the production environment.
5. Build a persistent image
We recommend building a persistent image after verification. This prevents task failures caused by unexpected version changes or tampered dependencies.
Standard custom images redeploy for every run, increasing runtime and costs. Persistent images are built once and reused, improving efficiency and consistency while reducing costs. Building persistent images is only supported for custom images created based on official images.
On the Custom Images tab of the DataWorks console, find the published custom image.
In the Actions column, build the custom image into a persistent image.
In the Resource Group for Which You Want to Create Image dialog box, configure the resource group used to build the image, and then click Continue.
Important: To avoid build failures caused by network issues, ensure the resource group matches the test resource group selected when publishing the custom image.
Building the image takes approximately 5 to 10 minutes, depending on the image size. After a successful build, the status of the target image changes to Published (Created).
Billing
Building an image incurs computing fees based on CU quantity × Build duration. The system allocates 0.5 CUs by default. For billing details, see Serverless resource group billing standards.
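As a rough illustration of the CU quantity × build duration formula, the sketch below converts a build time into CU-hours and multiplies it by a unit price. Measuring the duration in hours is an assumption, and price_per_cu_hour is a hypothetical input; take the actual rate and billing unit from the billing standards page.

```python
DEFAULT_BUILD_CUS = 0.5  # default allocation stated in the Billing section


def build_cu_hours(build_minutes, cus=DEFAULT_BUILD_CUS):
    """CU quantity x build duration, with the duration expressed in hours (assumed unit)."""
    return cus * build_minutes / 60.0


def estimate_build_fee(build_minutes, price_per_cu_hour, cus=DEFAULT_BUILD_CUS):
    """Estimated fee; price_per_cu_hour is a hypothetical rate supplied by the caller."""
    return build_cu_hours(build_minutes, cus) * price_per_cu_hour
```

With the default 0.5 CUs, a 10-minute build consumes 0.5 × 10 / 60 CU-hours, which is about 0.083 CU-hours.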
Best practices for production
Follow these recommendations for stable and efficient custom images in production:
Persistent image: We recommend building persistent images for published and stable images. This avoids re-installing dependencies every time a task runs, shortening startup time, reducing computing costs, and improving stability.
Environment consistency: Ensure consistency in VPC binding and network configuration for the Serverless resource groups used for testing, building, and production scheduling, especially when accessing private ACR repositories or the Internet.
Version locking: When installing dependencies via Script, we strongly recommend explicitly specifying version numbers (for example, pip install pandas==1.5.3) to avoid unexpected behavior changes caused by upstream library updates.
Rollback plan: If a production task fails after an image update, you can roll back to the previous version via the task publication history or repoint the image to an older, stable version in the scheduling configuration.
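To support the version locking recommendation, a Script-mode installation script can be checked for unpinned pip requirements before the image is created. This is a minimal sketch with a hypothetical helper name; it only recognizes simple pip or pip3 install lines and treats anything without == as unpinned.

```python
import re
import shlex

# Matches "pip", "pip3", or a path ending in them, such as /home/tops/bin/pip3.
PIP_EXECUTABLE = re.compile(r"(^|/)pip3?$")


def unpinned_packages(script):
    """Return pip requirements in the script that lack an exact '==' version pin."""
    unpinned = []
    for line in script.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 3 or not PIP_EXECUTABLE.search(tokens[0]) or tokens[1] != "install":
            continue
        skip_next = False
        for token in tokens[2:]:
            if skip_next:  # value of -i / --index-url, not a package
                skip_next = False
            elif token in ("-i", "--index-url"):
                skip_next = True
            elif token.startswith("-"):
                continue  # other options such as --upgrade
            elif "==" not in token:
                unpinned.append(token)
    return unpinned
```

A range specifier such as 'urllib3<2.0' is reported as unpinned, because it still allows the installed version to change between builds.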
Use cases
This example shows how to use a custom image for word segmentation in a PyODPS node. You will process data in a MaxCompute table and store the results for downstream nodes. You can pre-install the jieba segmentation tool package in a custom image, then use this image in a PyODPS task to process the data and store the results in a new table, seamlessly integrating into the downstream scheduling flow.
Create test data.
Create a DataWorks workspace and bind MaxCompute computing resources. For details, see Create a workspace and Computing resource management.
In DataStudio, create an ODPS node (legacy DataStudio) or MaxCompute SQL node (new Data Studio), create a test table, and add test data.
Note: The following example uses scheduling parameters. In the Scheduling panel on the right, set the parameter name to bday and the parameter value to $[yyyymmdd].
Save and deploy.
Create a custom image.
See 1. Create a custom image. Key parameters are as follows:
Image Name/ID: Select dataworks_pyodps_task_pod, the DataWorks PyODPS node official image.
Supported Task Type: Select PyODPS 2 and PyODPS 3.
Installation Package: Select Python3 and jieba.
Publish the custom image and modify the owner workspace. For details, see Publish a custom image and Modify owner workspace.
Use the custom image in a scheduling task.
In DataStudio, create a PyODPS3 node and configure the following content:
Set the following key parameters in the scheduling configuration on the right:
Scheduling Parameter: Parameter name bday, parameter value $[yyyymmdd].
Resource Group: Select the serverless resource group, which must be the same as the test resource group selected when publishing the image.
Image: Select the published custom image bound to the current workspace.
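The node content for this step could look roughly like the following. This is a sketch under stated assumptions, not the original example code: the input table name custom_img_test_tb is hypothetical, both tables are assumed to be partitioned by ds with the text in the first column, and participle_tb is assumed to already exist with a single string column. In a DataWorks PyODPS node, the globals o (the MaxCompute entry object) and args (the scheduling parameters) are provided by the runtime.

```python
def segment_rows(texts, cut):
    """Tokenize each text with `cut` and join the non-blank tokens with spaces."""
    return [[" ".join(tok for tok in cut(text) if tok.strip())] for text in texts]


def run(o, args):
    """Read one partition, segment the text with jieba, and write the result."""
    import jieba  # installed in the custom image

    partition = "ds=%s" % args["bday"]  # bday is the scheduling parameter
    source = o.get_table("custom_img_test_tb")  # hypothetical input table
    with source.open_reader(partition=partition) as reader:
        texts = [record[0] for record in reader]
    # Assumes participle_tb exists with one string column and a ds partition.
    o.write_table(
        "participle_tb",
        segment_rows(texts, jieba.cut),
        partition=partition,
        create_partition=True,
    )


# `o` and `args` exist only inside a PyODPS node, so guard the call elsewhere.
if "o" in globals() and "args" in globals():
    run(o, args)
```

Passing the tokenizer as a parameter keeps segment_rows testable outside DataWorks, where jieba and the MaxCompute connection may not be available.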
Node debugging.
If using old DataStudio, click Run with Parameters in the top toolbar, configure Resource Group Name, CUs for Running, Image, and Custom Parameters, and then click Run.
If using new DataStudio, configure Compute Resource, Resource Group, Compute CUs, Image, and Script Parameters in the Debugging Configuration panel on the right, and then click Run in the top toolbar.
(Optional) Create a temporary query (legacy DataStudio) or create an SQL file in your personal directory (new Data Studio), and use the following SQL to query whether data is generated in the output table.
-- Replace <Partition Date> with the specific date.
SELECT * FROM participle_tb WHERE ds=<Partition Date>;
Deploy the PyODPS node to the production environment.
Note: Image modifications in DataStudio are not synchronized to the production environment. You must publish the task for the changes to take effect in production. For details, see Publish tasks (Old DataStudio) or Node/Workflow publication (New DataStudio).
Build the custom image into a persistent image. For details, see 5. Build a persistent image.
FAQ
Q: Python task error "urllib3 v2.0 only supports OpenSSL 1.1.1+".
A: urllib3 v2.0 only supports OpenSSL 1.1.1+. You can downgrade urllib3 to be compatible with OpenSSL. For example, force the urllib3 version when installing third-party packages: /home/tops/bin/pip3 install urllib3==1.26.16.
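To decide in advance whether the downgrade is needed, the OpenSSL version that Python is linked against can be read from the standard ssl module. The helper names below are illustrative, and the check only compares version numbers; it does not cover other TLS libraries.

```python
import ssl


def supports_urllib3_v2(version_info=None):
    """Return True if the linked OpenSSL version is 1.1.1 or later."""
    if version_info is None:
        version_info = ssl.OPENSSL_VERSION_INFO
    return tuple(version_info[:3]) >= (1, 1, 1)


def recommended_urllib3_spec(version_info=None):
    """Pick the requirement to install; the pinned version is the one from the FAQ."""
    if supports_urllib3_v2(version_info):
        return "urllib3"
    return "urllib3==1.26.16"


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION, "->", recommended_urllib3_spec())
```

Running the script inside the image prints the detected OpenSSL version and the requirement to pass to /home/tops/bin/pip3 install.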
References
Installation commands
If you use the Script method to configure installation commands for custom images, refer to the following commands:
If depending on a PyODPS 2 node, execute one of the following commands.
pip install <package_name> -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install <package_name>
Note: After executing the command, if prompted to upgrade the pip version, execute pip install --upgrade pip.
If depending on a PyODPS 3 node, execute one of the following commands.
/home/tops/bin/pip3 install <package_name> -i https://pypi.tuna.tsinghua.edu.cn/simple
/home/tops/bin/pip3 install <package_name>
Note: After executing the command, if prompted to upgrade the pip version, execute /home/tops/bin/pip3 install --upgrade pip.
If the error /home/admin/usertools/tools/cmd-0.sh: line 3: /home/tops/bin/python3: No such file or directory occurs, submit a ticket to request permission activation.
Refer to the following Python public mirror sources and switch as needed.
Organization
Mirror address
Alibaba Cloud (Aliyun)
https://mirrors.aliyun.com/pypi/simple/
Important: Obtaining Python packages from the Alibaba Cloud mirror does not require Internet access.
Tsinghua University (Tsinghua)
https://pypi.tuna.tsinghua.edu.cn/simple
University of Science and Technology of China (USTC)
https://pypi.mirrors.ustc.edu.cn/simple/
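The installation commands and mirror addresses above can be combined programmatically when generating Script-mode content. The following is a minimal sketch; the function name and the dictionary keys are illustrative, while the pip paths and mirror URLs are the ones listed in this section.

```python
import shlex

# pip executables per node type, as given in "Installation commands".
PIP_BY_NODE_TYPE = {
    "PyODPS 2": "pip",
    "PyODPS 3": "/home/tops/bin/pip3",
}

# Mirror addresses from the table above; the short keys are illustrative.
MIRRORS = {
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple",
    "ustc": "https://pypi.mirrors.ustc.edu.cn/simple/",
}


def pip_install_command(requirement, node_type="PyODPS 3", mirror=None):
    """Build a shell-safe pip install command, optionally using a mirror index."""
    parts = [PIP_BY_NODE_TYPE[node_type], "install", requirement]
    if mirror is not None:
        parts += ["-i", MIRRORS[mirror]]
    return " ".join(shlex.quote(part) for part in parts)
```

For example, pip_install_command("urllib3<2.0") quotes the requirement so that the shell does not interpret < as a redirection.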