
DataWorks official images

Last updated: Mar 26, 2026

DataWorks provides pre-configured official images for each supported task type in Data Development. Use an official image directly, or build a custom image on top of one to add your own dependencies and configurations.


Choose an image type

Item | Official image | Custom image
Setup required | None; ready to use | Build from an official base image
Best for | Standard tasks with no extra dependencies | Tasks that need additional packages, scripts, or tooling
How to start | Select the image in the console | See Custom images

Use an official image when your task type has a matching image and requires no additional packages.

Build a custom image when your task needs packages not included in the official image, or a custom initialization script (for example, dataworks_emr_base_task_pod requires cluster-specific initialization before use).
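The choice above can be sketched as a small shell function. This is a minimal sketch, not a DataWorks tool: the function name choose_image_type and its yes/no argument are hypothetical, and the two base images treated as "needs initialization" (dataworks_emr_base_task_pod and dataworks_cdh_custom_task_pod) come from the Available images table on this page.

```shell
#!/bin/sh
# Hypothetical helper: decide between an official image and a custom image.
# Usage: choose_image_type <image-name> <needs-extra-packages: yes|no>

choose_image_type() {
  image="$1"
  extra_packages="$2"

  # Base images listed on this page that cannot be used directly
  # (they need cluster-specific setup first), so they always lead to a custom image.
  case "$image" in
    dataworks_emr_base_task_pod|dataworks_cdh_custom_task_pod)
      echo "custom"
      return 0
      ;;
  esac

  # Additional packages, scripts, or tooling also call for a custom image.
  if [ "$extra_packages" = "yes" ]; then
    echo "custom"
  else
    echo "official"
  fi
}
```

For example, choose_image_type dataworks_pyodps_py311_task_pod no prints official, while choose_image_type dataworks_emr_base_task_pod no prints custom because that image must be initialized before use.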

Available images

Important

For supported versions and regions, refer to the DataWorks console. The following table lists the capabilities of the latest image versions only.

Image name | Description | Supported task types
dataworks_pyodps_py311_task_pod | Official image for PyODPS nodes. Runtime: Python 3.11. | PyODPS 3
dataworks_pyodps_task_pod | Official image for PyODPS nodes. Runtime: Python 3.7. Supports both PyODPS 2 and PyODPS 3. | PyODPS 2, PyODPS 3
dataworks_pairec_task_pod | Official image for PAI-Rec. Runs algorithms generated by PAI-Rec. The specific versions of the feature_store SDK and pyfg are listed in the console. | Not listed
dataworks_emr_base_task_pod | Base image for EMR clusters. Supports the EMR Serverless Spark, EMR on ECS DataLake, and EMR on ECS Custom cluster types. Cannot be used directly; requires initialization first. See EMR base image setup. | EMR Spark, EMR Spark SQL, EMR Shell, Serverless Spark Batch, Serverless Spark SQL, Serverless Kyuubi
dataworks_shell_jdk17_task_pod | Official image for Shell nodes. Runtime: JDK 17. | Shell
dataworks_shell_task_pod | Official image for Shell nodes. Runtime: JDK 7. Use as a base image if your task requires Subprocess parameter passing. | Shell
dataworks_python_task_pod | Official image for Python nodes. System: py3.11-ubuntu22.04. | Python
dataworks_cdh_custom_task_pod | Base image for CDH clusters. Cannot be used directly; install the CDH parcel first, then use the image in Data Development. | CDH
dataworks_controller_task_pod | Official image for assignment nodes. Supports passing parameters to downstream nodes. Use as a base image if you need a custom runtime that supports assignment parameters. | Assignment node
dataworks-mcp | Image for the personal development environment (DataWorks Agent for third-party clients). System: py3.11-ubuntu22.04. | Personal development environment
dataworks-notebook | Image for Notebook instances in the personal development environment. System: py3.11-ubuntu22.04. Use when creating a personal development environment instance. | Personal development environment
dataworks_notebook_task_pod | Official image for Notebook nodes in scheduled task execution. System: py3.11-ubuntu22.04. The Python environment matches dataworks-notebook and dataworks-mcp. Use when running Notebook nodes as scheduled tasks. | Notebook
dataworks-maxcompute | Image for building a MaxCompute custom image in a personal development environment. System: py3.11-ubuntu20.04. | Not listed
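If you script image selection, the task-type mapping in the Available images table can be written as a lookup. This is a hypothetical sketch: the function name images_for_task is illustrative, the mapping reflects this page at the time of writing, and the DataWorks console remains authoritative for versions and regions. Images whose task types are not listed on this page are left out.

```shell
#!/bin/sh
# Hypothetical lookup: print the official images listed for a task type.
# Usage: images_for_task "<task type>"

images_for_task() {
  case "$1" in
    "PyODPS 3") echo "dataworks_pyodps_py311_task_pod dataworks_pyodps_task_pod" ;;
    "PyODPS 2") echo "dataworks_pyodps_task_pod" ;;
    "EMR Spark"|"EMR Spark SQL"|"EMR Shell"|"Serverless Spark Batch"|"Serverless Spark SQL"|"Serverless Kyuubi")
      # Base image only: it must be initialized before use.
      echo "dataworks_emr_base_task_pod" ;;
    Shell) echo "dataworks_shell_jdk17_task_pod dataworks_shell_task_pod" ;;
    Python) echo "dataworks_python_task_pod" ;;
    CDH) echo "dataworks_cdh_custom_task_pod" ;;
    "Assignment node") echo "dataworks_controller_task_pod" ;;
    Notebook) echo "dataworks_notebook_task_pod" ;;
    *)
      echo "no official image listed for task type: $1" >&2
      return 1 ;;
  esac
}
```

For example, for img in $(images_for_task "PyODPS 3"); do echo "$img"; done prints both PyODPS images, with the Python 3.11 image first.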

EMR base image setup

dataworks_emr_base_task_pod includes only the basic components that DataWorks needs to submit EMR tasks. It does not include the execution environment for the EMR base components.

For CUSTOM and DATALAKE cluster types, initialize the EMR Gateway environment before use:

sh /home/admin/init_emr_component.sh DATALAKE EMR-<Version>

Replace DATALAKE with your cluster type (DATALAKE or CUSTOM) and EMR-<Version> with your EMR cluster version.

For semi-managed clusters (DataLake and Custom), also install the components that match your EMR cluster version. See Custom images for instructions.

If initialization fails, the cluster version is likely not available in the image repository. Submit a ticket to contact support.
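When you build a custom image on top of dataworks_emr_base_task_pod, you may want to validate the arguments before calling the init script. The sketch below assumes the script path /home/admin/init_emr_component.sh from this page; the function names check_emr_args and init_emr_gateway are hypothetical, and the only cluster types accepted are the two named above.

```shell
#!/bin/sh
# Hypothetical wrapper around the EMR Gateway init script described on this page.
# Assumption: runs inside an image built from dataworks_emr_base_task_pod.

INIT_SCRIPT=/home/admin/init_emr_component.sh

# check_emr_args TYPE VERSION: accept only CUSTOM/DATALAKE and an EMR-<Version> string.
check_emr_args() {
  case "$1" in
    DATALAKE|CUSTOM) ;;
    *) echo "unsupported cluster type: $1 (expected DATALAKE or CUSTOM)" >&2; return 1 ;;
  esac
  case "$2" in
    EMR-?*) ;;
    *) echo "version must look like EMR-<Version>, got: $2" >&2; return 1 ;;
  esac
}

# init_emr_gateway TYPE VERSION: validate, then run the init script.
init_emr_gateway() {
  check_emr_args "$1" "$2" || return 1
  if ! sh "$INIT_SCRIPT" "$1" "$2"; then
    # Per this page, a failure usually means the version is not in the image repository.
    echo "initialization failed for $1 $2; the version may be unavailable, submit a ticket" >&2
    return 1
  fi
}
```

In a build step you would call, for example, init_emr_gateway DATALAKE EMR-<Version> with your real cluster version substituted.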

Use an image in Data Development

New DataStudio

Configure the image in the panels on the right side of the node development page:

  • Run Configuration: sets the image for trial runs.

  • Scheduling Configuration: sets the image for scheduled runs.

See Use image in new DataStudio for step-by-step instructions.

Old DataStudio

Configure the image on the node development page:

  • For Trial Run: click Run with Parameters, then set the Resource Group and Image in the dialog box.

  • For Post-deployment Run: open the Scheduling Configuration page and set the Resource Group and Image.

See Use image in old DataStudio for step-by-step instructions.

Personal development environment

When creating an instance for a personal development environment, select an image in the Image Configuration section.

See Use images in a personal development environment for step-by-step instructions.

Configuration notes

When configuring a resource group and image, note the following:

  • Scheduling Resource Group: Select a serverless resource group.

  • Image: Select an Official Image or a Published Custom Image.
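The two rules above can be checked before you save a node configuration. This is a hypothetical pre-flight sketch: the function name check_run_config and the labels serverless, official, and custom-published are illustrative values, not fields from the DataWorks console or API.

```shell
#!/bin/sh
# Hypothetical check mirroring the configuration notes on this page.
# Usage: check_run_config <resource-group-type> <image-source>
#   resource-group-type: "serverless" is the only accepted value
#   image-source: "official" or "custom-published" (for example, "custom-draft" is rejected)

check_run_config() {
  rg_type="$1"
  image_source="$2"
  status=0

  if [ "$rg_type" != "serverless" ]; then
    echo "select a serverless resource group (got: $rg_type)" >&2
    status=1
  fi

  case "$image_source" in
    official|custom-published) ;;
    *)
      echo "select an official image or a published custom image (got: $image_source)" >&2
      status=1 ;;
  esac

  return "$status"
}
```

The function reports every problem it finds rather than stopping at the first one, so a single run shows all settings that need changing.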