This topic describes how to use the custom image feature of MaxCompute.
Background
SQL or Python development often involves complex business logic and dependencies on multiple third-party packages. To simplify this process, MaxCompute provides a custom image management feature. You can use Docker images to build a complete development environment for MaxCompute SQL and Python (PyODPS or MaxFrame) development.
Usage notes
Image size: The maximum size of a single custom image is 10 GB.
Number of images: A single tenant can upload a maximum of 10 images.
ACR version requirements: Only Standard Edition or Premium Edition instances of Container Registry (ACR) Enterprise Edition are supported.
CPU architecture requirements: Images must be built on machines with x86_64 CPUs. Other CPU architectures, such as ARM (including Apple M-series chips), are not supported.
Library version requirements: The MaxCompute job runtime environment is based on CentOS 7. Therefore, when you build an image, you must use package versions that are compatible with CentOS 7. The yum repository in the base image is configured to use the Alibaba Cloud CentOS 7 mirror.
Restrictions on file directory operations within images:
When you use a package manager, such as pip or yum, do not place personal files in the /home/admin, /usr/local/lib, /usr/ali, or /apsara directories. When a container starts, MaxCompute mounts the runtime environment to these directories, which overwrites the original content of these directories in the image.
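The directory restriction above can be checked mechanically before you copy files into an image. The following is a minimal sketch, not a MaxCompute tool: the helper name is_overwritten_at_runtime is hypothetical, and it only compares absolute POSIX paths against the four directories named above without resolving symbolic links or ".." segments.

```python
from pathlib import PurePosixPath

# Directories that, per the restriction above, are mounted over when the container starts.
RESERVED_DIRS = ("/home/admin", "/usr/local/lib", "/usr/ali", "/apsara")


def is_overwritten_at_runtime(path: str) -> bool:
    """Return True if `path` is a reserved directory or lies inside one (hypothetical helper)."""
    candidate = PurePosixPath(path)
    for reserved in RESERVED_DIRS:
        root = PurePosixPath(reserved)
        if candidate == root or root in candidate.parents:
            return True
    return False


# A file under /home/admin would be hidden by the runtime mount.
print(is_overwritten_at_runtime("/home/admin/models/model.pkl"))
# A file under /opt is outside every reserved directory.
print(is_overwritten_at_runtime("/opt/myapp/model.pkl"))
```

A destination such as /opt is used here only as an illustration of a path outside the reserved directories.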
Procedure
Step 1: Install Docker
For Linux environments: Install Docker by following the instructions in the official Docker documentation.
For macOS or Windows environments:
Individual developers: Use Docker Desktop.
Enterprise users who have not purchased a license: Use the open-source Rancher Desktop.
Step 2: Grant permissions
Grant the required permissions to your account or RAM user. You need read permissions on RAM roles, operation permissions on ACR (Container Registry), and operation permissions on MaxCompute custom images. The details are as follows:
Authorization scenario | Account type | Permission requirements | Guidance |
Read permissions on RAM roles | Alibaba Cloud account (Recommended) | An Alibaba Cloud account has read permissions on RAM roles by default. No extra authorization is required. | Not applicable |
Read permissions on RAM roles | RAM user | Grant the AliyunRAMReadOnlyAccess permission. | |
Operation permissions on ACR | Alibaba Cloud account (Recommended) | An Alibaba Cloud account has all operation permissions on ACR by default. No extra authorization is required. | Not applicable |
Operation permissions on ACR | RAM user | If you use a RAM user to perform operations on ACR, grant the AliyunContainerRegistryReadOnlyAccess permission to the RAM user. | |
Operation permissions on MaxCompute custom images | Alibaba Cloud account (Recommended) | An Alibaba Cloud account has all permissions to view, add, and delete custom images in MaxCompute by default. No extra authorization is required. | Not applicable |
Operation permissions on MaxCompute custom images | RAM user | If you use a RAM user to perform operations on MaxCompute custom images, grant the required permissions to the RAM user. | |
Step 3: Build a custom image in Docker
Build a custom image from a MaxCompute base image using a Dockerfile.
The address of the MaxCompute CentOS base image is registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest. This base image provides basic environments, such as Python 3.7, Python 3.11, pip, and yum.
The address of the MaxCompute Ubuntu base image is registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/ubuntu_20.04:latest.
Create a Dockerfile to build the custom image from the MaxCompute base image by running the following command:
    vim DockerFile
Add the following content to the Dockerfile:
    # Use the MaxCompute CentOS base image
    FROM registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/base_image:latest
    # If you use the Ubuntu image, replace the image address with registry.cn-zhangjiakou.aliyuncs.com/maxcompute_image/ubuntu_20.04:latest
    # Install system dependencies
    RUN yum install vi -y
    # Install third-party libraries
    RUN /usr/ali/python3.7/bin/python3 -m pip install --no-cache-dir pandas
Build the image from the Dockerfile.
    sudo docker build -f DockerFile -t <image_name>:<tag> .
Parameters:
image_name: The name of the custom image.
tag: The version of the custom image.
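After the build, you can compare the image against the size and architecture limits in the usage notes. The following is a minimal sketch that reads the JSON printed by docker image inspect <image_name>:<tag> and looks at its Architecture and Size fields. The helper name check_image is hypothetical, and two points are assumptions: "10 GB" is interpreted as 10 * 1024**3 bytes, and the locally reported size is treated as the size the service measures, which may not be the case.

```python
import json
from typing import List

# Assumption: the 10 GB limit is read as 10 * 1024**3 bytes.
MAX_IMAGE_BYTES = 10 * 1024**3


def check_image(inspect_output: str) -> List[str]:
    """Return the problems found in `docker image inspect` JSON for one image (hypothetical helper)."""
    info = json.loads(inspect_output)[0]  # docker image inspect prints a JSON array
    problems = []
    # Docker reports the x86_64 architecture as "amd64".
    if info.get("Architecture") != "amd64":
        problems.append(f"unsupported architecture: {info.get('Architecture')}")
    if info.get("Size", 0) > MAX_IMAGE_BYTES:
        problems.append(f"image is larger than the limit: {info['Size']} bytes")
    return problems


# Illustrative input shaped like the inspect output of an image built on an ARM machine.
sample = json.dumps([{"Architecture": "arm64", "Size": 2_500_000_000}])
for problem in check_image(sample):
    print(problem)
```

In practice, the input string would come from saving the output of the inspect command to a file or capturing it in a script.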
Step 4: Upload the custom image to ACR
Log on to the Container Registry console and select a region in the upper-left corner.
In the navigation pane on the left, click Instances.
On the Instances page, click Create ACR EE. If you have already created an instance, skip this step.
Important: You can upload custom images only to Standard Edition or Premium Edition instances of ACR Enterprise Edition.
On the Instances page, find the target Enterprise instance and click Manage to open its overview page.
In the navigation pane on the left, choose Repository > Repositories.
On the Repository > Repositories page, click Create Repository. In the Create Repository dialog box, enter the following information and click Next.
Parameter | Required | Description |
Region | Required | The region where the current instance resides is automatically selected. |
Namespace | Required | The namespace of the image repository. This parameter cannot be modified after it is set. We recommend that you create a namespace that corresponds to a company, an organization, or an individual user, such as Aliyun. We do not recommend that you create a namespace that corresponds to an application or a system module, such as Tomcat or CentOS. |
Repository Name | Required | The name must be 2 to 120 characters in length and can contain lowercase letters, digits, and separators. The separators can be underscores (_), hyphens (-), periods (.), and forward slashes (/). A separator cannot be the first or last character. |
Repository Type | Optional | To pull a public image, you must log on to the Enterprise Edition instance. To allow anonymous pulls, enable anonymous pulls for the instance. To pull a private image from an Enterprise Edition instance, you must log on to the instance and have the pull permission. The default value is Private. |
Tags | Optional | If selected, all tags other than latest in the repository cannot be overwritten, which ensures the consistency of the container images. |
Accelerated Image | Optional | If selected, accelerated images with the _accelerated suffix are automatically generated for all images in the repository. Full Mode: Provides a significant acceleration effect. The size of an accelerated image is approximately 130% of the size of the original image. The system requires approximately 25 seconds to generate a 1 GB accelerated image. If an accelerated image layer has already been generated for an image layer, the system does not generate it again. Index-only Mode (Public Preview): Provides about 70% of the acceleration effect of Full Mode. The size of the accelerated image is about 3% of the original image size. It takes about 3 seconds to generate a 1 GB accelerated image. Image layers for which an index has already been generated are not processed again. |
Summary | Required | A maximum of 100 characters. |
Description | Optional | Markdown format is supported. |
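The Repository Name rule in the table above can be checked locally before you fill in the dialog box. The following is a minimal sketch that assumes the rule is exactly as stated (2 to 120 characters, lowercase letters, digits, and the four separators, with no separator in the first or last position). The helper name is_valid_repository_name is hypothetical and is not part of any ACR SDK. The console may apply additional checks.

```python
import re

# First and last characters: lowercase letter or digit.
# Middle (0 to 118 characters): lowercase letters, digits, "_", ".", "/", "-".
_REPOSITORY_NAME = re.compile(r"[a-z0-9][a-z0-9_./-]{0,118}[a-z0-9]")


def is_valid_repository_name(name: str) -> bool:
    """Check a name against the Repository Name rule as stated in the table (hypothetical helper)."""
    return _REPOSITORY_NAME.fullmatch(name) is not None


for candidate in ("team/pandas_env-1.0", "-pandas", "Pandas", "p"):
    print(candidate, is_valid_repository_name(candidate))
```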
Configure the code source:
Set Code Source to Local Repository and click Create Repository.
For more information, see Use an Enterprise Edition instance to build an image.
Upload the custom image that you built to ACR.
On the Repository > Repositories page, find the desired repository and click Manage in the Actions column.
In the navigation pane on the left, click Details.
Follow the instructions in the Instructions on Images section on the Details page to upload a custom image from your Docker environment to the ACR image repository.
(Optional) If your machine is in a VPC, perform the following steps:
Configure VPC access for the Enterprise Edition instance so that you can connect to it. For more information, see Configure access over a VPC.
When you perform operations on the ACR Enterprise Edition instance in your Docker environment, add -vpc to the instance name in the domain name. For example, in the following command, change acr-test-registry.cn-wulanchabu.cr.aliyuncs.com to acr-test-registry-vpc.cn-wulanchabu.cr.aliyuncs.com.
    $ docker login --username=***@test.aliyunid.com acr-test-registry.cn-wulanchabu.cr.aliyuncs.com
Note: If an error occurs when you log on to the instance, check whether public network access is enabled for the repository.
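The domain name change for VPC access can be scripted if you switch between public and VPC endpoints. The following is a minimal sketch based only on the single example above, in which -vpc is appended to the first label of the domain name. The helper name to_vpc_endpoint is hypothetical, and the pattern is an assumption drawn from that example rather than a documented rule for every instance.

```python
def to_vpc_endpoint(domain: str) -> str:
    """Append "-vpc" to the first label of a registry domain name (hypothetical helper)."""
    instance, dot, rest = domain.partition(".")
    if not dot or not instance or not rest:
        raise ValueError(f"not a fully qualified domain name: {domain!r}")
    if instance.endswith("-vpc"):
        return domain  # already a VPC endpoint; leave unchanged
    return f"{instance}-vpc.{rest}"


print(to_vpc_endpoint("acr-test-registry.cn-wulanchabu.cr.aliyuncs.com"))
```

The returned value is the domain name to pass to docker login, docker tag, and docker push when your machine is in a VPC.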
Step 5: Add the custom image to MaxCompute
Associate an existing image in ACR with MaxCompute to centrally manage your development images.
Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, go to the Images page.
On the Images page, click the Custom Image tab.
On the Custom Image tab, click Create Image. In the Add Image dialog box that appears, configure the following parameters:
Note: When you create an image for the first time, the MaxCompute Service Linked Role dialog box appears. Click OK to automatically create a service-linked role to access ACR resources.
Parameter | Description |
Image Name | Required. The name of the custom image. The image name must start with a lowercase letter and can contain only lowercase letters, digits, hyphens (-), and underscores (_). The name is used in subsequent MaxCompute SQL, PyODPS, and MaxFrame development. |
Image Type | Required. The type of the ACR image. Only ACR Enterprise Edition images are supported. |
Enterprise Edition Image Instance | Required. Select the Enterprise Edition image instance that you created in ACR. |
Image Namespace | Required. Select the Enterprise Edition image namespace that you created in ACR. |
Image Repository | Required. Select the Enterprise Edition image repository that you created in ACR. |
Image Version | Required. Select the image version that you uploaded to ACR. |
Image Description | Required. Add a description for the image. |
Click OK. The custom image is created and added to the custom image list.
Step 6: Use the custom image
You can use custom images for MaxCompute SQL user-defined functions (UDFs), PyODPS, and MaxFrame development.
Each development job can specify only one image. Otherwise, image conflicts may occur.
When you call a UDF: You can specify the required image and Python version at the SQL session level using flags. The command is as follows:
    set odps.sql.python.version=cp37;
    set odps.session.image = <image_name>;
In PyODPS development: You can specify an existing image using the image parameter of the execute or persist method. The parameter is as follows:
    image='<image_name>'
Note: To reference an image in PyODPS development, you must upgrade PyODPS to version 0.11.5 or later.
In MaxFrame development: You can specify an existing image for the current job. The relevant parameters are as follows:
    config.options.sql.settings = { "odps.session.image": "<image_name>" }
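The three settings above share one input, the image name registered in Step 5. The following is a minimal sketch that builds the SQL session flags and the MaxFrame settings dictionary from that name and rejects names that break the Image Name rule. The helper names session_flags and maxframe_sql_settings and the image name pandas_env are hypothetical, and the code does not call PyODPS or MaxFrame. It only produces the strings and dictionary shown in this step.

```python
import re
from typing import Dict

# Image Name rule from Step 5: starts with a lowercase letter, then
# lowercase letters, digits, hyphens, and underscores only.
_IMAGE_NAME = re.compile(r"[a-z][a-z0-9_-]*")


def _require_valid(image_name: str) -> str:
    if _IMAGE_NAME.fullmatch(image_name) is None:
        raise ValueError(f"invalid custom image name: {image_name!r}")
    return image_name


def session_flags(image_name: str, python_version: str = "cp37") -> str:
    """Build the SQL session flags to place before a UDF query (hypothetical helper)."""
    name = _require_valid(image_name)
    return (
        f"set odps.sql.python.version={python_version};\n"
        f"set odps.session.image = {name};"
    )


def maxframe_sql_settings(image_name: str) -> Dict[str, str]:
    """Build the dictionary to assign to config.options.sql.settings (hypothetical helper)."""
    return {"odps.session.image": _require_valid(image_name)}


print(session_flags("pandas_env"))
print(maxframe_sql_settings("pandas_env"))
```

Because each development job can specify only one image, building every setting from a single validated name helps keep the SQL, PyODPS, and MaxFrame configurations consistent.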