DataWorks: Custom images

Last Updated: Jan 27, 2026

Create a custom image when the default DataWorks runtime environment does not meet your task dependency requirements, such as PyODPS or Shell tasks needing Python libraries like pandas or jieba. Custom images pre-package dependencies into a reusable, standardized environment, ensuring consistency and improving efficiency.

Usage notes

  • Version limits:

    • All editions support creating and using custom images.

    • Only Professional Edition and higher support image building.

  • Resource group limits: This feature supports only serverless resource groups.

    For old resource groups, please use O&M Assistant to install external dependencies.
  • Permission limits: You need the AliyunDataWorksFullAccess or ModifyResourceGroup policy.

    For authorization details, see Manage product-level and console access with RAM policies.

Quotas and limits

  • Image quantity: Custom image limits vary by DataWorks edition.

    • Basic and Standard Editions: 10.

    • Professional Edition: 50.

    • Enterprise Edition: 100.

  • Build concurrency: You can build up to two images simultaneously per region.

  • ACR image requirements:

    • Instance edition: Only Enterprise Edition Alibaba Cloud ACR instances are supported.

    • Instance architecture: Only AMD64 architecture is supported.

    • Image size: A single image cannot exceed 5 GB.

    • Timezone configuration: Install the tzdata package to prevent container exceptions due to timezone inconsistencies.

  • Image build: Only custom images based on DataWorks official images can be built into persistent images. Custom images that reference Alibaba Cloud ACR images cannot; they are pulled and deployed for every task run.

  • Supported node types and methods:

    Node type          Official image build    ACR image build
    PyODPS2            Supported               Unsupported
    PyODPS3            Supported               Unsupported
    EMR Spark          Supported               Unsupported
    EMR Spark SQL      Supported               Unsupported
    EMR SHELL          Supported               Unsupported
    Shell              Supported               Supported
    Python             Supported               Supported
    Notebook           Unsupported             Supported
    CDH                Supported               Unsupported
    Assignment Node    Supported               Unsupported
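
The limits above can be checked before you create an image. The following is a minimal sketch and not part of any DataWorks SDK: the function and dictionary names are hypothetical, the values are copied from this section, and "5 GB" is assumed to mean 5 × 1024³ bytes.

```python
# Hypothetical helper names; all values are copied from the limits listed above.
IMAGE_QUOTA = {"Basic": 10, "Standard": 10, "Professional": 50, "Enterprise": 100}

# Node type -> (official image build supported, ACR image build supported)
NODE_SUPPORT = {
    "PyODPS2": (True, False),
    "PyODPS3": (True, False),
    "EMR Spark": (True, False),
    "EMR Spark SQL": (True, False),
    "EMR SHELL": (True, False),
    "Shell": (True, True),
    "Python": (True, True),
    "Notebook": (False, True),
    "CDH": (True, False),
    "Assignment Node": (True, False),
}

MAX_IMAGE_BYTES = 5 * 1024**3  # assumption: "5 GB" is read as 5 GiB


def check_acr_image(acr_edition, architecture, size_bytes):
    """Return the list of ACR requirements that the image violates."""
    problems = []
    if acr_edition != "Enterprise":
        problems.append("ACR instance must be Enterprise Edition")
    if architecture.lower() != "amd64":
        problems.append("image architecture must be AMD64")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("image exceeds 5 GB")
    return problems


def supports(node_type, reference_type):
    """reference_type is 'official' or 'acr' (labels chosen for this sketch)."""
    official, acr = NODE_SUPPORT[node_type]
    return official if reference_type == "official" else acr


print(check_acr_image("Enterprise", "AMD64", 2 * 1024**3))
print(supports("Notebook", "acr"))
```

An empty list from check_acr_image means that none of the listed ACR requirements is violated. It does not guarantee that the image passes the DataWorks test step.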

Procedure

1. Create a custom image

You can create a custom image based on either of two reference types: DataWorks Official Images or Alibaba Cloud Container Registry Image. The configuration parameters depend on the reference type that you select:

Create based on DataWorks official images

  1. Log on to the DataWorks console and click Image Management in the left navigation pane.

  2. On the DataWorks Official Images tab, select the target image as the base and click Create Custom Image in the Actions column. The system populates the target image information in the dialog box. Configure other parameters as follows.

    Reference Type: Default is DataWorks Official Images. Image Namespace: Default is DataWorks Default. Image Repository: Default is DataWorks Default.

    Image Name/ID: The target official image is selected by default. You can switch it as needed.

    Visible Scope: Set the visibility of the custom image to Visible Only to Creator or Visible to all.

    Module: Custom images are currently only supported for DataStudio.

    Supported Task Type: Select the task types to support based on the image type. When running matching task nodes in DataStudio, this image can be configured as the runtime image.

    Installation Package: Add third-party packages as needed. You can select multiple modes and install multiple packages simultaneously. The following methods are supported:

    • Quick Install: Select Python2, Python3, or Yum from the package drop-down list to directly select the environment and resources to install.

      If the package is not listed, switch to Script mode for manual installation.
    • Manual Input: Select Script from the package drop-down list, and then enter installation commands in the script editor. The following example commands install third-party packages:

      • pip example: pip install <package_name> (for Python 2).

      • pip3 example: /home/tops/bin/pip3 install 'urllib3<2.0' (for Python 3).

      • yum example: yum install -y git.

      • wget example: wget <download_url>.

        For more installation commands, see Installation commands.
    Important

    To install packages from the Internet, the VPC bound to the Serverless resource group must have Internet access.

  3. Click OK to complete the creation.

Create based on Alibaba Cloud Container Registry Image (ACR Image)

To create a custom image from an ACR image, you must first activate Container Registry. Only Enterprise Edition ACR instances with the AMD64 architecture are supported for creating DataWorks images.

  1. Log on to the DataWorks console and click Image Management in the left navigation pane.

  2. On the Custom Images tab, click Create Image. Configure the key parameters in the dialog box:

    Reference Type: Select Alibaba Cloud Container Registry Image.

    Image Instance ID: Select the Enterprise Edition instance created in Alibaba Cloud Container Registry.

    Image Namespace: Select the namespace under the image instance.

    Image Repository: Select the image repository under the image instance.

    Image Version: Select the image version (tag) from the selected repository to create the custom image.

    VPC to Associate: Select the VPC bound to the image instance. For details on configuring VPC networks, see Configure a VPC ACL.

    Important

    You can configure only one VPC connection between DataWorks and the ACR instance.

    Synchronize to MaxCompute: The default is No. This option depends on the selected Image Instance. It is selectable only for Standard Edition or higher ACR instances; otherwise, it is disabled.

    • Select Yes: A DataWorks custom image is generated by default, and a MaxCompute image is synchronously built when the DataWorks image is published.

      For details, see Build a MaxCompute custom image in a personal development environment.
    • Select No: Only a DataWorks custom image is generated; it will not be synchronously built as a MaxCompute image.

    Visible Scope: Set the visibility of the custom image to Visible Only to Creator or Visible to all.

    Module: Custom images are currently only supported for DataStudio.

    Supported Task Type: ACR images are started using the method: Start command + user task code file path. The following are the supported task types and their default start commands:

    • Shell

    • Python: To use a custom image created from an Alibaba Cloud ACR image for Python tasks, verify that your ACR image instance contains a Python environment; otherwise, Python tasks are not supported.

    • Notebook

      • To use a custom image created from an Alibaba Cloud ACR image for Notebook tasks, use the DataWorks Notebook base image as the base for your ACR image to provide the runtime environment. DataWorks Notebook base image: dataworks-public-registry.cn-shanghai.cr.aliyuncs.com/public/dataworks-notebook:py3.11-ubuntu22.04-20241202.

      • Ensure that the environment used to build the image has Internet access so that it can pull the DataWorks Notebook base image.

  3. Click OK to complete the creation.
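
As a rough illustration of "Start command + user task code file path", the sketch below joins a start command and a task file path into one argument list. The start commands and the file path are placeholders chosen for illustration; the defaults that DataWorks actually uses are not stated in this section.

```python
import shlex

# Placeholder start commands for illustration only; not the DataWorks defaults.
EXAMPLE_START_COMMANDS = {"Shell": "/bin/bash", "Python": "python3"}


def build_argv(start_command, task_file_path):
    """Split the start command shell-style and append the task file path."""
    return shlex.split(start_command) + [task_file_path]


# Hypothetical task file path.
print(build_argv(EXAMPLE_START_COMMANDS["Python"], "/tmp/example_task.py"))
```

The practical consequence is that the start command must exist inside the ACR image. This is why a Python task requires a Python environment in the image.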

Create based on personal development environment instances

The new DataStudio supports creating new images from personal development environments. For details, see Create a DataWorks image from a personal development environment.

2. Test and publish a custom image

On the Image Management > Custom Images tab of the DataWorks console, test the target image and then publish it. Only images that pass the test can be published. If the test fails, click the menu icon > Modify in the Actions column to modify the image configuration.

Note the following when testing and publishing:

  • Select a serverless resource group when testing custom images.

  • For images based on ACR or personal development environments, ensure the serverless resource group VPC matches the image container VPC.

  • If your custom image fetches third-party packages from the Internet and the test does not finish or keeps failing, check whether the VPC bound to the test resource group has Internet access.

3. Associate the image with a workspace

After publishing, you can bind the image to workspaces.

  1. On the Image Management > Custom Images tab of the DataWorks console, find the published custom image.

  2. Click the menu icon > Change Workspace in the Actions column to bind the custom image to a workspace.

4. Use the image in a task

Use image in new DataStudio

  1. Enter DataStudio: Go to the DataWorks Workspace List page, switch to the target region in the top navigation bar, find the target workspace, and click Shortcuts > Data Studio in the Actions column.

  2. Configure image: In DataStudio, find the task node to test with the custom image, click Scheduling on the right, and configure resource properties.

    • Resource Group: Select a serverless resource group.

      If the target resource group is not displayed, check whether the resource group is bound to the current workspace. You can go to the Resource Group List page, find the target resource group, and click Associate Workspace in the Actions column to complete the binding.
      Important

      The resource group must match the test resource group used during image publication.

    • Image: Select the published Custom Image.

      If you switch images, you must publish the node for the change to take effect in the production environment.


  3. Debug node: In the Debugging Configuration panel on the right, configure Compute Resource, Resource Group, Compute CUs, Image, and Script Parameters, and then click Run in the top toolbar.

  4. Deploy node: Click Deploy in the top toolbar to publish the node to the production environment.

Use image in old DataStudio

  1. Enter DataStudio: Log on to the DataWorks console. After switching to the target region, click Data Development and O&M > Data Development in the left navigation pane. Select the corresponding workspace from the drop-down list and click Go to Data Development.

  2. Configure image: In DataStudio, find the task node to test with the custom image, click Properties on the right, and configure resource properties.

    • Resource Group: Select a serverless resource group.

      If the target resource group is not displayed, check whether the resource group is bound to the current workspace. You can go to the Resource Group List page, find the target resource group, and click Associate Workspace in the Actions column to complete the binding.
      Important

      The resource group must match the test resource group used during image publication.

    • Image: Select the published Custom Image.

      If you switch images, you must publish the node for the change to take effect in the production environment.


  3. Debug node: Click the Run with Parameters icon in the top toolbar, configure Resource Group, CUs for Running, and Image, and then click Run.

  4. Deploy node: Click Save and Submit in the top toolbar to publish the node to the production environment.

5. Build a persistent image

Important

We recommend building a persistent image after verification. This prevents task failures caused by unexpected version changes or tampered dependencies.

Standard custom images redeploy for every run, increasing runtime and costs. Persistent images are built once and reused, improving efficiency and consistency while reducing costs. Building persistent images is only supported for custom images created based on official images.

  1. On the Image Management > Custom Images tab of the DataWorks console, find the published custom image.

  2. Click the menu icon > Create in the Actions column to build the custom image into a persistent image.

  3. In the Resource Group for Which You Want to Create Image dialog box, configure the resource group used to build the image, and then click Continue.

    Important

    To avoid build failures caused by network issues, ensure the resource group matches the test resource group selected when publishing the custom image.

  4. Building the image takes approximately 5 to 10 minutes, depending on the image size. After a successful build, the status of the target image changes to Published (Created).

Billing

Building an image incurs computing fees based on CU quantity × Build duration. The system allocates 0.5 CUs by default. For billing details, see Serverless resource group billing standards.
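
The billing formula can be worked through as simple arithmetic. The sketch below computes the CU-hours that a build consumes. The function name is hypothetical, metering in CU-hours is an assumption, and the unit price is a placeholder rather than an actual DataWorks price.

```python
def build_cu_hours(build_minutes, cus=0.5):
    """CU quantity x build duration, expressed in CU-hours."""
    return cus * build_minutes / 60


# A build takes roughly 5 to 10 minutes at the default 0.5 CUs.
low, high = build_cu_hours(5), build_cu_hours(10)

# Placeholder unit price per CU-hour; see the billing standards for real prices.
EXAMPLE_PRICE_PER_CU_HOUR = 1.0

print(f"CU-hours: {low:.4f} to {high:.4f}")
print(f"Fee at the placeholder price: {low * EXAMPLE_PRICE_PER_CU_HOUR:.4f} "
      f"to {high * EXAMPLE_PRICE_PER_CU_HOUR:.4f}")
```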

Best practices for production

Follow these recommendations for stable and efficient custom images in production:

  • Persistent image: We recommend building persistent images for published and stable images. This avoids re-installing dependencies every time a task runs, shortening startup time, reducing computing costs, and improving stability.

  • Environment consistency: Ensure consistency in VPC binding and network configuration for the Serverless resource groups used for testing, building, and production scheduling, especially when accessing private ACR repositories or the Internet.

  • Version locking: When installing dependencies via Script, we strongly recommend explicitly specifying version numbers (e.g., pip install pandas==1.5.3) to avoid unexpected behavior changes caused by upstream library updates.

  • Rollback plan: If a production task fails after an image update, you can roll back to the previous version via the task publication history or repoint the image to an older, stable version in the scheduling configuration.
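
The version locking recommendation can be checked before an installation script is pasted into the Script editor. The sketch below is a hypothetical helper, not a DataWorks feature. It assumes one command per line and inspects only lines that start with pip or pip3 (with or without a path), reporting package arguments that lack an exact == pin.

```python
import re
import shlex

# An argument such as pandas==1.5.3 counts as pinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[^=<>!~\s]+$")
# pip options that take a value, so the following token is not a package.
OPTIONS_WITH_VALUE = {"-i", "--index-url", "--extra-index-url"}


def unpinned_packages(script):
    """Return package arguments of pip install lines that are not pinned with ==."""
    unpinned = []
    for line in script.splitlines():
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 2 or tokens[1] != "install":
            continue
        if not re.search(r"(^|/)pip3?$", tokens[0]):
            continue
        skip_next = False
        for token in tokens[2:]:
            if skip_next:
                skip_next = False
            elif token in OPTIONS_WITH_VALUE:
                skip_next = True
            elif token.startswith("-"):
                continue
            elif not PINNED.match(token):
                unpinned.append(token)
    return unpinned


example_script = """\
pip install pandas==1.5.3
/home/tops/bin/pip3 install 'urllib3<2.0' jieba -i https://mirrors.aliyun.com/pypi/simple/
yum install -y git
"""
print(unpinned_packages(example_script))
```

A range specifier such as 'urllib3<2.0' is reported because it still allows the installed version to change between builds.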

Use cases

This example uses a custom image to perform word segmentation in a PyODPS node. The jieba segmentation package is pre-installed in a custom image. A PyODPS task that runs on this image then segments text from a MaxCompute table and writes the results to a new table that downstream nodes in the scheduling flow can consume.

  1. Create test data.

    1. Create a DataWorks workspace and bind MaxCompute computing resources. For details, see Create a workspace and Computing resource management.

    2. In DataStudio, create an ODPS node (legacy DataStudio) or MaxCompute SQL node (new Data Studio), create a test table, and add test data.

      Note

      The following example uses scheduling parameters. Set the parameter name to bday and the parameter value to $[yyyymmdd] in the Scheduling panel on the right.

      Create test table

      -- Create test table
      CREATE TABLE IF NOT EXISTS custom_img_test_tb
      (
          c_customer_id BIGINT NOT NULL,
          c_customer_text STRING NOT NULL,
          PRIMARY KEY (c_customer_id)
      )
      COMMENT 'Test table for custom image demo'
      PARTITIONED BY (ds STRING COMMENT 'Partition')
      LIFECYCLE 90;

      -- Insert test data
      INSERT INTO custom_img_test_tb PARTITION (ds='${bday}') (c_customer_id, c_customer_text) VALUES
      (1, 'The sky is getting dark and it looks like it will snow. Would you like a cup of wine?'),
      (2, 'The moon sets, crows caw, and frost fills the sky. I lie awake, facing the river maples and fishing lights.'),
      (3, 'Mountains and rivers seem to block the way. But among shady willows and bright flowers, another village appears.'),
      (4, 'I sleep in spring, unaware of the dawn. Everywhere I hear the birds sing.'),
      (5, 'Thoughts on a quiet night. Moonlight shines before my bed. I mistake it for frost on the ground.'),
      (6, 'The bright moon rises over the sea. We share this moment, though we are far apart.'),
      (7, 'The swallows that once graced the halls of nobles now fly into the homes of common people.'),
      (8, 'A line of egrets flies up to the blue sky. My window frames the ancient snow on the Western Hills.'),
      (9, 'When life is good, enjoy it to the fullest. Do not let the golden goblet face the moon empty.'),
      (10, 'Heaven gave me talent, so it must be used. A thousand pieces of gold, once spent, can be earned again.');
    3. Save and deploy.

  2. Create a custom image.

    See 1. Create a custom image. Key parameters are as follows:

    • Image Name/ID: Select dataworks_pyodps_task_pod, the official DataWorks image for PyODPS nodes.

    • Supported Task Type: Select PyODPS2 and PyODPS3.

    • Installation Package: Select Python3, and then select jieba.

  3. Publish the custom image and bind it to the workspace. For details, see 2. Test and publish a custom image and 3. Associate the image with a workspace.

  4. Use the custom image in a scheduling task.

    1. In DataStudio, create a PyODPS3 node and configure the following content:

      Use custom image

      import jieba
      from odps.models import TableSchema as Schema, Column, Partition

      # o (the MaxCompute entry object) and args (the scheduling parameters)
      # are provided by the DataWorks PyODPS node at runtime.

      # Read the source partition
      table = o.get_table('custom_img_test_tb')
      partition_spec = f"ds={args['bday']}"
      with table.open_reader(partition=partition_spec) as reader:
          records = [record for record in reader]

      # Perform word segmentation on the extracted text
      participles = [' | '.join(jieba.cut(record['c_customer_text'])) for record in records]

      # Create the target table if it does not exist
      if not o.exist_table("participle_tb"):
          schema = Schema(
              columns=[Column(name='word_segment', type='string', comment='Segmentation Result')],
              partitions=[Partition(name='ds', type='string', comment='Partition Field')],
          )
          o.create_table("participle_tb", schema)

      # Write the segmentation results to the same partition of the target table.
      # create_partition=True creates the partition if it does not exist.
      output_table = o.get_table("participle_tb")
      with output_table.open_writer(partition=partition_spec, create_partition=True) as writer:
          for participle in participles:
              record = output_table.new_record()  # use a fresh record for each row
              record['word_segment'] = participle
              writer.write(record)
    2. Set the following key parameters in the scheduling configuration on the right:

      • Scheduling Parameter: Parameter name bday, parameter value $[yyyymmdd].

      • Resource Group: Select the Serverless resource group, which must be the same as the test resource group selected when publishing the image.

      • Image: Select the published custom image bound to the current workspace.

    3. Node debugging.

      • If using old DataStudio, click the Run with Parameters icon in the top toolbar, configure Resource Group Name, CUs for Running, Image, and Custom Parameters, and then click Run.

      • If using new DataStudio, configure Compute Resource, Resource Group, Compute CUs, Image, and Script Parameters in the Debugging Configuration panel on the right, and then click Run in the top toolbar.

    4. (Optional) Create a temporary query (legacy DataStudio) or create an SQL file in your personal directory (new Data Studio), and use the following SQL to query whether data is generated in the output table.

      -- Replace <Partition Date> with the specific date.
      SELECT * FROM participle_tb WHERE ds='<Partition Date>';
    5. Deploy the PyODPS node to the production environment.

      Note

      Image modifications in DataStudio are not synchronized to the production environment. You must publish the task for the changes to take effect in production. For details, see Publish tasks (Old DataStudio) or Node/Workflow publication (New DataStudio).

  5. Build the custom image into a persistent image. For details, see 5. Build a persistent image.
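
In this use case, the same bday value selects the partition in both the INSERT statement and the PyODPS node. The sketch below reproduces that mapping locally. It assumes that $[yyyymmdd] resolves to the date of the instance's scheduled time, and the function name is hypothetical.

```python
from datetime import datetime


def partition_spec(scheduled_time):
    """Format a scheduled time as the ds=yyyymmdd partition spec used above."""
    bday = scheduled_time.strftime("%Y%m%d")
    return f"ds={bday}"


print(partition_spec(datetime(2026, 1, 27, 0, 30)))  # ds=20260127
```

The value printed here is also the date to substitute for <Partition Date> in the verification query.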

FAQ

Q: A Python task fails with the error "urllib3 v2.0 only supports OpenSSL 1.1.1+".

A: urllib3 v2.0 requires OpenSSL 1.1.1 or later. Downgrade urllib3 to a version that is compatible with the OpenSSL version in the image. For example, pin the urllib3 version when you install third-party packages: /home/tops/bin/pip3 install urllib3==1.26.16.
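
To confirm that this is the cause, you can print the OpenSSL version that the Python interpreter in the image is linked against. The sketch below uses only the standard ssl module; the function name is hypothetical. The check is meaningful only for OpenSSL builds, because other TLS libraries report their own version numbers.

```python
import ssl


def openssl_meets_urllib3_v2_requirement():
    """Return True if the linked OpenSSL is version 1.1.1 or later."""
    return ssl.OPENSSL_VERSION_INFO[:3] >= (1, 1, 1)


print(ssl.OPENSSL_VERSION)
print(openssl_meets_urllib3_v2_requirement())
```

If the function returns False, keep urllib3 pinned to a 1.x release as described above.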

References

Installation commands

If you use the Script method to configure installation commands for custom images, refer to the following commands:

  • For a PyODPS 2 node, run one of the following commands. The first command installs from the Tsinghua mirror; the second uses the default package index configured for pip.

    pip install <package_name> -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install <package_name>
    Note

    If you are prompted to upgrade pip after you run the command, run pip install --upgrade pip.

  • For a PyODPS 3 node, run one of the following commands. The first command installs from the Tsinghua mirror; the second uses the default package index configured for pip.

    /home/tops/bin/pip3 install <package_name> -i https://pypi.tuna.tsinghua.edu.cn/simple
    /home/tops/bin/pip3 install <package_name>
    Note
    • If you are prompted to upgrade pip after you run the command, run /home/tops/bin/pip3 install --upgrade pip.

    • If the error /home/admin/usertools/tools/cmd-0.sh: line 3: /home/tops/bin/python3: No such file or directory occurs, submit a ticket to request permission activation.

    Refer to the following Python public mirror sources and switch as needed.

    Organization                                            Mirror address
    Alibaba Cloud (Aliyun)                                  https://mirrors.aliyun.com/pypi/simple/
    Tsinghua University (Tsinghua)                          https://pypi.tuna.tsinghua.edu.cn/simple
    University of Science and Technology of China (USTC)    https://pypi.mirrors.ustc.edu.cn/simple/

    Important

    Obtaining Python packages from Alibaba Cloud does not require Internet access.
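
The commands and mirrors above can be combined programmatically when you prepare a Script entry. The following is a minimal sketch under stated assumptions: the function and dictionary names are hypothetical, and the pip paths and mirror URL are the ones listed in this section.

```python
import shlex

# pip executables per node type, as listed in the installation commands above.
PIP_BY_NODE = {"PyODPS2": "pip", "PyODPS3": "/home/tops/bin/pip3"}


def install_command(node_type, package, version=None, index_url=None):
    """Build one pip install line, optionally pinned and pointed at a mirror."""
    spec = f"{package}=={version}" if version else package
    parts = [PIP_BY_NODE[node_type], "install", shlex.quote(spec)]
    if index_url:
        parts += ["-i", index_url]
    return " ".join(parts)


print(install_command("PyODPS2", "pandas", version="1.5.3"))
print(install_command("PyODPS3", "jieba",
                      index_url="https://mirrors.aliyun.com/pypi/simple/"))
```

Passing a version produces a pinned requirement, in line with the version locking recommendation in the best practices.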