DataWorks tasks run inside containers. Use image management to replace the default runtime with your own Docker image — install specific libraries, pin dependency versions, or enforce a consistent environment across your team.
By default, DataWorks uses built-in images maintained by Alibaba Cloud. Image management lets you register custom Docker images and attach them to resource groups, giving you full control over the execution environment.
When to use custom images
Custom images are useful when you need to:
Install specific libraries: include Python packages, system dependencies, or language runtimes not available in the default image.
Pin dependency versions: lock your environment to exact package versions to ensure reproducible results across runs and team members.
Enforce a standardized environment: distribute a single, validated image across multiple tasks and resource groups so every engineer runs the same stack.
Integrate with your CI/CD pipeline: build and push images as part of your existing container workflow, then attach them to the resource groups that run your DataWorks tasks.
If the built-in images meet your requirements, you do not need to use image management.
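As an illustration of what "pinned" means in practice, the following minimal sketch flags requirement lines that are not locked to an exact version. It assumes your Python dependencies live in a pip-style requirements file; the function name `unpinned_requirements` and the sample package list are hypothetical, and the check is deliberately strict (it treats anything other than `name==version` as unpinned, including lines with environment markers).

```python
import re

# Matches "name==version", with an optional extras suffix such as "pkg[extra]==1.0".
# Wildcards like "==1.*" are intentionally rejected because they are not exact pins.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[A-Za-z0-9][A-Za-z0-9.+!_-]*$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, --index-url)
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements content used only for illustration.
sample = """
numpy==1.26.4
pandas>=2.0      # a range, not a pin
requests
pyarrow==15.0.2
"""
print(unpinned_requirements(sample))  # ['pandas>=2.0', 'requests']
```

A check like this could run in a CI step before the image build, so that an unpinned dependency is caught before it reaches a resource group.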
Key concepts
The following terms are used throughout the image management workflow:
Dockerfile: a text file that defines the base image, installed packages, and environment configuration for your custom image.
Docker image: the built artifact produced from a Dockerfile. You must push this image to a container registry before DataWorks can use it.
DataWorks image: a record in DataWorks that references a Docker image stored in a container registry. Each DataWorks image tracks one or more versions.
Image version: an immutable snapshot of a Docker image at a point in time. Once created, a version cannot be modified — create a new version instead.
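The relationship between a DataWorks image and its versions can be pictured with a small conceptual model. This is only a sketch of the terms defined above, not the DataWorks API: the class names, fields, and the `registry.example.com` references are all hypothetical.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class ImageVersion:
    """Immutable snapshot: a Docker image reference captured when the version is created."""
    number: int
    image_ref: str  # hypothetical reference, e.g. "registry.example.com/team/etl:1.0.0"


@dataclass
class DataWorksImage:
    """A named record that tracks one or more versions."""
    name: str
    versions: list[ImageVersion] = field(default_factory=list)

    def add_version(self, image_ref: str) -> ImageVersion:
        """Append a new version; existing versions are never edited."""
        version = ImageVersion(number=len(self.versions) + 1, image_ref=image_ref)
        self.versions.append(version)
        return version

    @property
    def latest(self) -> ImageVersion:
        return self.versions[-1]


image = DataWorksImage("etl-runtime")
v1 = image.add_version("registry.example.com/team/etl:1.0.0")
try:
    v1.image_ref = "registry.example.com/team/etl:1.0.1"  # in-place edits are rejected
except FrozenInstanceError:
    v2 = image.add_version("registry.example.com/team/etl:1.0.1")  # create a new version instead

print([v.number for v in image.versions], image.latest.image_ref)
```

The frozen dataclass mirrors the rule that a version cannot be modified after creation: the only way to change the environment is to append another version.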
Image types
DataWorks images fall into two categories:
| Type | Owner | Description |
| --- | --- | --- |
| Built-in images | Alibaba Cloud | Pre-configured images maintained and updated by DataWorks. No setup required. |
| Custom images | You | Docker images you build and push to a container registry, then register in DataWorks. You are responsible for maintenance and updates. |
Choose built-in images when the default runtime satisfies your needs. Choose custom images when you need libraries, tools, or configurations not included in the defaults.
How it works
Write a Dockerfile that extends a compatible base image and installs your required dependencies.
Build the Docker image and push it to a supported container registry (Alibaba Cloud Container Registry, Docker Hub, or a private registry accessible from your DataWorks resource group).
Register the image in DataWorks by providing the registry URL and credentials.
Attach the image to a resource group. Tasks running on that resource group use your image as their execution environment.
To update the environment, build and push a new Docker image, then create a new image version in DataWorks.
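The Dockerfile and build-and-push steps above can be sketched as follows. This is a rough illustration under several assumptions: the base image ships with `pip`, the base image name and registry host are placeholders rather than values DataWorks documents, and the helper names are hypothetical. The commands are only printed, not executed. Registering the image and attaching it to a resource group happen in DataWorks and are not modeled here.

```python
import shlex


def render_dockerfile(base_image: str, pinned_packages: dict[str, str]) -> str:
    """Return Dockerfile text that extends base_image and installs exact package versions."""
    specs = " ".join(
        f"{name}=={version}" for name, version in sorted(pinned_packages.items())
    )
    # Assumes the base image provides pip; --no-cache-dir keeps the image layer smaller.
    return f"FROM {base_image}\nRUN pip install --no-cache-dir {specs}\n"


def build_and_push_commands(image_ref: str, context_dir: str = ".") -> list[list[str]]:
    """Return the docker CLI invocations for building and pushing (not executed here)."""
    return [
        ["docker", "build", "-t", image_ref, context_dir],
        ["docker", "push", image_ref],
    ]


# Placeholder values for illustration only.
dockerfile = render_dockerfile(
    "registry.example.com/base/python:3.11",
    {"pandas": "2.2.2", "numpy": "1.26.4"},
)
print(dockerfile)

for command in build_and_push_commands("registry.example.com/team/etl:1.0.0"):
    print(shlex.join(command))
```

Giving each build a new tag (for example `1.0.1` after `1.0.0`) lines up with the update flow described above, where every change to the environment becomes a new image version rather than an overwrite.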
Build environment changes into the image
Avoid modifying the runtime environment through init scripts when you can make the same change in the Dockerfile. Init scripts run every time a container starts, which adds latency to each start and makes the environment harder to reproduce. Reserve init scripts for work that must happen at container startup, such as starting a background daemon, and handle everything else at image build time.
Supported container registries
DataWorks can pull images from the following registries:
| Registry | Image URL format |
| --- | --- |
| Alibaba Cloud Container Registry (ACR) | |
| Docker Hub | |
| Private registry | |
For private registries, configure credentials when you register the image in DataWorks.
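Before registering an image, it can help to confirm which registry and tag a reference actually points to. The sketch below splits a reference using the general Docker naming convention (the first path component is treated as a registry host only if it contains `.` or `:` or is `localhost`; otherwise Docker Hub is implied, and a missing tag means `latest`). It does not describe the exact URL format DataWorks expects for each registry, it does not handle digest references (`@sha256:...`), and the function name and example hosts are hypothetical.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split a Docker image reference into registry, repository, and tag."""
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref  # Docker Hub is the implied default
    # A tag is whatever follows the last ":" in the final path component.
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    return ImageRef(registry, name, tag)


print(parse_image_ref("registry.example.com:5000/team/etl:1.2.0"))
print(parse_image_ref("team/etl"))
print(parse_image_ref("python:3.11"))
```

Checking the parsed registry and tag against what you pushed is a quick way to rule out a mistyped URL or a missing tag before looking at credentials.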
Troubleshoot image failures
When a custom image fails to attach or a task fails at startup, check the following:
Build log: the raw output from the container build process. Look for package installation errors or missing base image layers.
Image pull errors: confirm that the registry URL is correct, the image tag exists, and DataWorks has permission to pull from the registry.
Environment conflicts: if your Dockerfile installs packages in a way that conflicts with DataWorks requirements (for example, overwriting a required system path), the task will fail at startup rather than at build time.
What's next
[Create a custom image]()
[Attach an image to a resource group]()
[Update an image version]()