DataWorks official images

Last Updated: Feb 05, 2026

DataWorks provides official images to support various task types in Data Development. Each image has a pre-configured runtime environment for specific nodes. You can use these official images directly or as a base to create your own custom images. This topic describes the official images available in DataWorks.

Image overview

In Data Development, if you do not specify a runtime environment image for a node, the system uses a default standard image. The default image provides only a basic runtime environment, which may not meet the requirements of specific tasks. Official images instead provide pre-configured, standardized environments for individual task types. You can use them directly, or use them as a base for custom images with additional configuration to cover a wider range of use cases.
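To see what a given image actually provides, a quick check from a Shell node can help. The following is a minimal sketch, not an official DataWorks utility: the helper name report_tool and the tools it probes (python3, java) are chosen here for illustration, and it assumes a POSIX shell is available in the image.

```shell
#!/bin/sh
# Report whether a tool is available in the current runtime image.
# Prints "<name>: available at <path>" or "<name>: not found".
# report_tool is a hypothetical helper, not part of DataWorks.
report_tool() {
    tool_name="$1"
    if tool_path=$(command -v "$tool_name" 2>/dev/null); then
        echo "$tool_name: available at $tool_path"
    else
        echo "$tool_name: not found"
    fi
}

# Tools probed for illustration only; adjust the list to what your task needs.
for tool in python3 java; do
    report_tool "$tool"
done
```

If a tool that your task depends on is reported as not found, that is a hint to switch to a different official image or to build a custom image on top of one.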

Available images

Important

For supported versions and regions, refer to the DataWorks console. Images may have multiple versions. The following table lists the capabilities of the latest image versions only.

DataWorks provides the following images:

Image name

Description

Task type

dataworks_pyodps_py311_task_pod

The official image for DataWorks PyODPS nodes. This image uses Python 3.11.

PyODPS 3

dataworks_pairec_task_pod

The official image for DataWorks PAI-Rec, used to run algorithms generated by PAI-Rec. The specific versions of the feature_store SDK and pyfg are specified in the console.

dataworks_pyodps_task_pod

The official image for DataWorks PyODPS nodes. This image uses Python 3.7.

PyODPS 2

PyODPS 3

dataworks_emr_base_task_pod

A base image for EMR clusters. It supports EMR Serverless Spark, EMR on ECS DataLake, and EMR on ECS Custom cluster types.

  • The image includes only the basic components required for DataWorks to submit EMR tasks. It does not contain the execution environment for EMR base components. For semi-managed clusters such as DataLake and Custom, you must install the components that correspond to the EMR cluster version by building a custom image. For more information, see Custom images.

  • When you use CUSTOM or DATALAKE cluster types, you must first initialize the EMR Gateway environment by specifying the cluster type and version number.

    sh /home/admin/init_emr_component.sh DATALAKE EMR-<Version>
    Note

    If the EMR Gateway environment fails to initialize, the usual cause is that the cluster version is not available in the image repository. In this case, submit a ticket to contact support.

dataworks_shell_jdk17_task_pod

The official image for DataWorks Shell nodes. This image uses JDK 17.

Shell

dataworks_shell_task_pod

The official image for DataWorks Shell nodes. This image uses JDK 7. If you need a custom runtime environment that supports Subprocess parameter passing, build a custom image based on this image. For more information, see Custom images.

dataworks_python_task_pod

The official image for DataWorks Python nodes. System information: py3.11-ubuntu22.04.

Python

dataworks_cdh_custom_task_pod

A base image for DataWorks CDH clusters. This image cannot be used directly. Before you use it in Data Development, you must first install the CDH parcel by following the instructions in Custom images.

CDH

dataworks_controller_task_pod

The official image for DataWorks assignment nodes. If you need a custom runtime environment and also need to pass parameters to downstream nodes by using assignment nodes or assignment parameters, build a custom image based on this image. For more information, see Custom images.

Assignment node

dataworks-mcp

Used for task development with DataWorks Agent for third-party clients. System information: py3.11-ubuntu22.04.

Personal development environment

dataworks-notebook

Used for Notebook development tasks. System information: py3.11-ubuntu22.04.

dataworks_notebook_task_pod

The official image for DataWorks Notebook nodes. System information: py3.11-ubuntu22.04. The Python environment is consistent with the dataworks-notebook and dataworks-mcp images in the personal development environment.

dataworks-maxcompute

Used to build a MaxCompute custom image in a personal development environment. System information: py3.11-ubuntu20.04.
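
For the dataworks_emr_base_task_pod image, the EMR Gateway initialization command listed in the table can be wrapped in a small function that checks its arguments first and reports a failure clearly. This is a sketch under stated assumptions: the function name init_emr_gateway and the INIT_SCRIPT override are hypothetical, and passing CUSTOM as the first argument is inferred from the table text rather than shown in the original command. The script path itself is the one given in the table.

```shell
#!/bin/sh
# Path of the initialization script, as listed for dataworks_emr_base_task_pod.
# The INIT_SCRIPT override is a hypothetical convenience, for example for testing.
INIT_SCRIPT="${INIT_SCRIPT:-/home/admin/init_emr_component.sh}"

# init_emr_gateway CLUSTER_TYPE EMR_VERSION
# Hypothetical wrapper: validates the arguments, then runs the init script.
# Returns 2 on invalid arguments and 1 if the init script itself fails.
init_emr_gateway() {
    cluster_type="$1"
    emr_version="$2"

    # Only the semi-managed cluster types need Gateway initialization.
    # Accepting CUSTOM here is an assumption based on the table text.
    case "$cluster_type" in
        DATALAKE|CUSTOM) ;;
        *)
            echo "unsupported cluster type: $cluster_type" >&2
            return 2
            ;;
    esac

    if [ -z "$emr_version" ]; then
        echo "missing EMR version" >&2
        return 2
    fi

    if ! sh "$INIT_SCRIPT" "$cluster_type" "$emr_version"; then
        echo "EMR Gateway initialization failed for $cluster_type $emr_version" >&2
        return 1
    fi
}

# Usage (replace the placeholder with your cluster's version):
#   init_emr_gateway DATALAKE "EMR-<Version>"
```

A non-zero return from the wrapper lets a calling script stop early. If the failure comes from the init script itself, the likely cause is the one described in the note for this image: the cluster version is not available in the image repository.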

Using images

In Data Development, you can use official images or custom images that are bound to your workspace.

  • Use an image in the new DataStudio: On the node development page, set the image in Run Configuration and Scheduling Configuration. These settings are in the Properties and Scheduling Configuration panels on the right side of the page.

  • Use an image in the old DataStudio: On the node development page, configure the Resource Group and Image for the node's Trial Run and Post-deployment Run. You can do this in the dialog box that appears after you click Run with Parameters, or on the Scheduling Configuration page on the right side.

  • Use an image in a personal development environment: When you create a personal development environment instance, select an official image in the Image Configuration section.

Note

Note the following when you configure a resource group and an image: