DataWorks provides pre-configured official images for each supported task type in Data Development. Use an official image directly, or build a custom image on top of one to add your own dependencies and configurations.
Choose an image type
| | Official image | Custom image |
|---|---|---|
| Setup required | None — ready to use | Build from an official base image |
| Best for | Standard tasks with no extra dependencies | Tasks that need additional packages, scripts, or tooling |
| How to start | Select in console | Follow Custom images |
Use an official image when your task type has a matching image and requires no additional packages.
Build a custom image when your task needs packages not included in the official image, or a custom initialization script (for example, dataworks_emr_base_task_pod requires cluster-specific initialization before use).
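To make the custom-image path concrete, the Dockerfile below sketches how an official image can be extended with extra Python packages. This is a minimal example, not a definitive recipe: the registry path, tag, and package names are placeholders, and the actual image address must be copied from the DataWorks console.

```dockerfile
# Hypothetical sketch: build a custom image on top of an official one.
# <registry>/<namespace> and <tag> are placeholders; copy the real
# image address of the official image from the DataWorks console.
FROM <registry>/<namespace>/dataworks_pyodps_py311_task_pod:<tag>

# Install the additional Python packages your task needs (examples only).
RUN pip install --no-cache-dir pandas requests
```

After building and pushing the image to your registry, register and publish it as a custom image in DataWorks before selecting it for a node.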
Available images
For supported versions and regions, refer to the DataWorks console. The following table lists the capabilities of the latest image versions only.
| Image name | Description | Supported task types |
|---|---|---|
| dataworks_pyodps_py311_task_pod | Official image for PyODPS nodes. Runtime: Python 3.11. | PyODPS 3 |
| dataworks_pyodps_task_pod | Official image for PyODPS nodes. Runtime: Python 3.7. Supports both PyODPS 2 and PyODPS 3. | PyODPS 2, PyODPS 3 |
| dataworks_pairec_task_pod | Official image for PAI-Rec. Runs algorithms generated by PAI-Rec. The versions of the feature_store SDK and pyfg are specified in the console. | — |
| dataworks_emr_base_task_pod | Base image for EMR clusters. Supports EMR Serverless Spark, EMR on ECS DataLake, and EMR on ECS Custom cluster types. Cannot be used directly — requires initialization before use. See EMR base image setup. | EMR Spark, EMR Spark SQL, EMR Shell, Serverless Spark Batch, Serverless Spark SQL, Serverless Kyuubi |
| dataworks_shell_jdk17_task_pod | Official image for Shell nodes. Runtime: JDK 17. | Shell |
| dataworks_shell_task_pod | Official image for Shell nodes. Runtime: JDK 7. Use as a base image if your task requires Subprocess parameter passing. | — |
| dataworks_python_task_pod | Official image for Python nodes. System: py3.11-ubuntu22.04. | Python |
| dataworks_cdh_custom_task_pod | Base image for CDH clusters. Cannot be used directly — install the CDH parcel first, then use the image in Data Development. | CDH |
| dataworks_controller_task_pod | Official image for assignment nodes. Supports passing parameters to downstream nodes. Use as a base image if you need a custom runtime that supports assignment parameters. | Assignment node |
| dataworks-mcp | Image for the personal development environment (DataWorks Agent for third-party clients). System: py3.11-ubuntu22.04. | Personal development environment |
| dataworks-notebook | Image for Notebook instances in the personal development environment. System: py3.11-ubuntu22.04. Use when creating a personal development environment instance. | — |
| dataworks_notebook_task_pod | Official image for Notebook nodes in scheduled task execution. System: py3.11-ubuntu22.04. The Python environment matches dataworks-notebook and dataworks-mcp. Use when running Notebook nodes as scheduled tasks. | — |
| dataworks-maxcompute | Image for building a MaxCompute custom image in a personal development environment. System: py3.11-ubuntu20.04. | — |
EMR base image setup
dataworks_emr_base_task_pod includes only the basic components DataWorks needs to submit EMR tasks — it does not include the EMR base component execution environment.
For CUSTOM and DATALAKE cluster types, initialize the EMR Gateway environment before use:

```shell
sh /home/admin/init_emr_component.sh DATALAKE EMR-<Version>
```

Replace DATALAKE with your cluster type (CUSTOM or DATALAKE) and EMR-<Version> with your cluster version number.
For semi-managed clusters (DataLake and Custom), also install the components that match your EMR cluster version. See Custom images for instructions.
If initialization fails, the cluster version is likely not available in the image repository. Submit a ticket to contact support.
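Because dataworks_emr_base_task_pod cannot be used directly, one option is to run the initialization during a custom image build, so that tasks start from an already-initialized environment. The sketch below assumes this approach; the registry path and tag are placeholders, and EMR-5.15.1 stands in for your actual cluster version.

```dockerfile
# Hypothetical sketch: bake EMR Gateway initialization into a custom image.
# <registry> and <tag> are placeholders; EMR-5.15.1 is an example version.
# Substitute your actual cluster type (DATALAKE or CUSTOM) and version;
# the build fails if that version is not in the image repository.
FROM <registry>/dataworks_emr_base_task_pod:<tag>

RUN sh /home/admin/init_emr_component.sh DATALAKE EMR-5.15.1
```

For semi-managed clusters, additional version-matched components may still need to be installed on top of this, as described in Custom images.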
Use an image in Data Development
New DataStudio
Configure the image in the Properties and Scheduling Configuration panels on the right side of the node development page:
- Run Configuration — sets the image for trial runs
- Scheduling Configuration — sets the image for scheduled runs
See Use image in new DataStudio for step-by-step instructions.
Old DataStudio
Configure the image on the node development page:
- For Trial Run: click Run with Parameters, then set the Resource Group and Image in the dialog box.
- For Post-deployment Run: open the Scheduling Configuration page and set the Resource Group and Image.
See Use image in old DataStudio for step-by-step instructions.
Personal development environment
When creating an instance for a personal development environment, select an image in the Image Configuration section.
See Use images in a personal development environment for step-by-step instructions.
Configuration notes
When configuring a resource group and image, note the following:
- Scheduling Resource Group: Select a serverless resource group.
- Image: Select an Official Image or a Published Custom Image.