DataWorks lets you simultaneously generate a MaxCompute custom image when you create a custom image in a personal development environment. This simplifies the use of MaxCompute custom images in DataWorks nodes, such as PyODPS 3 and Notebook nodes. This topic describes how to build and use MaxCompute custom images in DataWorks.
Background information
The MaxCompute image management feature lets you create custom images. These images can be directly referenced in scenarios such as SQL UDF, PyODPS, and MaxFrame development, eliminating the need for complex resource packaging and uploading. In DataWorks, you can build a MaxCompute image at the same time you build a DataWorks image from a personal development environment.
Prerequisites
You have created a workspace that uses the new version of Data Studio and attached MaxCompute computing resources.
You have created a Serverless resource group and associated it with the workspace.
Create a MaxCompute custom image
Preparations
You have activated Alibaba Cloud Container Registry (ACR) and created an ACR instance of Standard Edition or higher. For more information, see Create an Enterprise instance, Create a namespace, and Create an image repository.
You have configured access control for the ACR instance over a virtual private cloud (VPC). For more information, see Configure access control for a VPC.
You have the required permissions to manage ACR and MaxCompute custom images. For more information, see Custom images.
Notes
When you create a MaxCompute custom image:
Image size: The maximum size of a single MaxCompute image is 10 GB.
Number of images: A single MaxCompute tenant can upload a maximum of 10 images.
When you use a MaxCompute image, note that DataWorks builds MaxCompute images based on a Python 3.11 environment. To run a MaxCompute image built by DataWorks, you must ensure that your Python environment is version 3.11.
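The limits in these notes can be captured in a small pre-check before you build an image. The following is a minimal sketch, not a DataWorks or MaxCompute API: the function names and the idea of passing in the image size and the current image count are assumptions made for illustration, and 1 GB is taken to be 1024^3 bytes, which the notes do not specify.

```python
import sys

# Limits stated in the notes above.
MAX_IMAGE_SIZE_GB = 10
MAX_IMAGES_PER_TENANT = 10
REQUIRED_PYTHON = (3, 11)

# Assumption: 1 GB is treated as 1024**3 bytes; the notes do not say which unit is meant.
BYTES_PER_GB = 1024 ** 3


def check_image_limits(image_size_bytes, existing_image_count):
    """Return a list of problems; an empty list means no stated limit is exceeded."""
    problems = []
    if image_size_bytes > MAX_IMAGE_SIZE_GB * BYTES_PER_GB:
        problems.append(f"image exceeds {MAX_IMAGE_SIZE_GB} GB")
    if existing_image_count >= MAX_IMAGES_PER_TENANT:
        problems.append(f"tenant already holds the maximum of {MAX_IMAGES_PER_TENANT} images")
    return problems


def is_required_python(version_info=None):
    """Check whether a version tuple (default: the running interpreter) is Python 3.11."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) == REQUIRED_PYTHON
```

For example, check_image_limits(3 * BYTES_PER_GB, 4) returns an empty list, and is_required_python() with no argument checks the interpreter that runs the call.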
Create a personal development environment instance
Go to Data Studio and create a personal development environment instance. You must use the dataworks-maxcompute:py3.11-ubuntu20.04 image to simultaneously create a MaxCompute custom image.
Go to Data Studio.
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose in the Actions column.
On the Data Studio page, click the icon in the navigation pane on the left to go to the Data Studio page.
Go to the personal development environment creation page. At the top of the page, click Personal development environment and create a personal development environment instance.
If you do not have a personal development environment instance, click New Instance to create one.
If you have a personal development environment instance, click Management Environment. Then, in the list of personal development environment instances, click New Instance.
Configure the personal development environment. When you create a MaxCompute custom image in DataWorks, you must configure the following parameters for the personal development environment. For information about other parameters, see Create a personal development environment instance.
Image Configuration: Select dataworks-maxcompute:py3.11-ubuntu20.04.
Note: You must select the dataworks-maxcompute:py3.11-ubuntu20.04 image to create a MaxCompute custom image.
A DataWorks custom image built from the dataworks-maxcompute:py3.11-ubuntu20.04 base image can be used to develop MaxFrame jobs in DataWorks Notebook, General Python, and Shell nodes.
Network Settings: Select the VPC that is configured for the ACR instance. This ensures that the personal development environment instance can push the image to the ACR instance.
Configure the image environment
In the terminal of your personal development environment instance, install the third-party dependencies required for MaxCompute development. This topic uses jieba as an example.
At the top of the Data Studio page, click Personal development environment and then click the personal development environment instance that you created in Create a personal development environment instance.
In the toolbar at the bottom of Data Studio, click the icon on the left to open the terminal.
In the terminal of the personal development environment, run the following commands to install the jieba third-party dependency and verify its installation.
## Install the third-party dependency.
pip install jieba;
## View the third-party dependency.
pip show jieba;
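The same verification can also be done from Python, for example in a Notebook cell of the personal development environment. This is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("jieba")
    if version is None:
        print("jieba is not installed; run `pip install jieba` in the terminal first.")
    else:
        print(f"jieba {version} is installed.")
```

Returning None instead of raising keeps the check usable in a cell that decides whether to continue with the image build.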
Save the custom image
Create a DataWorks image from your personal development environment and choose to create a MaxCompute image at the same time. The system automatically uploads the generated image to the ACR instance that is managed by the same account.
Go to the personal development environment instance management page.
At the top of the page, click the name of the personal development environment instance that you created, which is displayed in the Personal development environment section.
In the dialog box that appears, select Management Environment to go to the Personal Development Environment Instances page.
Go to the image creation page.
On the personal development environment instance page, find the personal development environment instance that you created.
In the Actions column of the instance, click Create Image.
Configure the image by using the following parameters. After you complete the configuration, click Confirm.
Image Name: The custom name of the DataWorks image. If the image is synced to MaxCompute, the name defined here is also used as the name of the MaxCompute image. Example: image_jieba.
Image Instance: Select a Standard Edition or higher ACR instance. For more information about how to create an ACR instance, see Create an Enterprise instance.
Note: Only Standard Edition or higher ACR instances can be used to build MaxCompute custom images.
Namespace: Select a namespace for the ACR instance. For more information about how to create a namespace, see Create a namespace.
Image Repository: Select an image repository for the ACR instance. For more information about how to create an image repository, see Create an image repository.
Image Version: The custom image version.
Sync To MaxCompute: In this example, select Yes. After you select this option, the image is built as a MaxCompute image when the DataWorks image is published.
Note: This option depends on the Image Instance that you select. Only ACR instances whose Instance Type is Standard Edition or higher can be selected; other instances cannot be selected by default.
Task Type: Select the task types for which the DataWorks image can be used: Notebook, Python, or Shell. In this example, select Notebook to use the image for Notebook development.
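The constraints in the parameters above can be expressed as a small validation routine. The sketch below is illustrative only: the ImageConfig class, its field names, and the edition ordering (Basic, then Standard, then Advanced) are assumptions and do not correspond to a DataWorks API. Namespace, repository, and version values passed to it are placeholders.

```python
from dataclasses import dataclass, field

# Assumption: ACR editions ordered from lowest to highest.
EDITION_RANK = {"Basic": 0, "Standard": 1, "Advanced": 2}
# Task types listed for the Task Type parameter.
ALLOWED_TASK_TYPES = {"Notebook", "Python", "Shell"}


@dataclass
class ImageConfig:
    """Hypothetical container for the values entered in the Create Image dialog."""
    image_name: str
    acr_edition: str
    namespace: str
    repository: str
    version: str
    sync_to_maxcompute: bool = True
    task_types: set = field(default_factory=lambda: {"Notebook"})

    def validate(self):
        """Return a list of configuration errors; an empty list means the values look valid."""
        errors = []
        if not self.image_name:
            errors.append("Image Name is required")
        unknown = self.task_types - ALLOWED_TASK_TYPES
        if unknown:
            errors.append(f"unsupported task types: {sorted(unknown)}")
        rank = EDITION_RANK.get(self.acr_edition)
        if rank is None:
            errors.append(f"unknown ACR edition: {self.acr_edition}")
        elif self.sync_to_maxcompute and rank < EDITION_RANK["Standard"]:
            errors.append("Sync To MaxCompute requires a Standard Edition or higher ACR instance")
        return errors
```

A configuration such as ImageConfig("image_jieba", "Standard", "ns", "repo", "v1") validates cleanly, while a Basic instance with Sync To MaxCompute enabled reports the edition requirement.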
Check the image save status.
On the list of instances, find the image column for your personal development environment to view the save status.
Click Confirm to create the image.
To the right of the personal development environment instance, click the icon and select the Image checkbox to display the column.
Wait for the image to be created. Hover the mouse over the icon to the right of Saved, and click Here in the pop-up window to go to the Image Management page.
Publish the custom image
After the image from the personal development environment instance is saved in Data Studio, publish the custom image. This operation syncs the image from the ACR instance to DataWorks and MaxCompute, which generates both a DataWorks custom image and a MaxCompute custom image.
Go to the DataWorks workspace list page and switch to the destination region in the top navigation bar.
In the navigation pane on the left, go to the tab. Test the destination image. After the test is successful, Publish the image.
Note: When you test a custom image, select a Serverless resource group for Test Resource Group.
The VPC that is attached to the Serverless resource group selected for testing and publishing must be the same as the VPC configured in ACR.
If your custom image obtains third-party packages from the Internet and the test fails, check whether the VPC that is attached to the Test Resource Group can access the Internet. To configure Internet access for a VPC, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
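The two network conditions in this note can be summarized as a pre-publish check. This is a hypothetical sketch: the function and its arguments are not part of any DataWorks SDK, and the VPC IDs you pass in are whatever identifiers you read from the console.

```python
def prepublish_network_issues(resource_group_vpc, acr_vpc, needs_internet, vpc_has_internet):
    """Return network problems that would likely make the test or publish step fail."""
    issues = []
    # The resource group used for testing and publishing must share the VPC configured in ACR.
    if resource_group_vpc != acr_vpc:
        issues.append("resource group VPC differs from the VPC configured in ACR")
    # Images that pull packages from the Internet need a VPC with Internet access.
    if needs_internet and not vpc_has_internet:
        issues.append("image fetches packages from the Internet but the VPC has no Internet access")
    return issues
```

An empty result means neither condition from the note is violated; it does not guarantee that the test succeeds.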
Refresh the page and confirm that the Publishing Status of the image in the image list changes to Published.
In the Actions column of the destination image, click to attach the custom image to a workspace.
Confirm the MaxCompute image status
Publishing a DataWorks image automatically creates a corresponding MaxCompute image. Once the image status on the tab in the DataWorks console changes to Published, you can go to the MaxCompute console. Follow the steps in Add a custom image to MaxCompute to view the new MaxCompute custom image.
Use a MaxCompute custom image
Notes
To use MaxFrame for development, the MaxFrame service must be included in the image.
To run a MaxCompute custom image in DataWorks, the image must be built in a Python 3.11 environment.
To use a MaxCompute custom image for MaxFrame job development in DataWorks, make sure that the task runs in a DataWorks image that has a MaxFrame runtime environment. The requirements are as follows:
Notebook node: Select the official image dataworks-notebook:py3.11-ubuntu22.04, or a DataWorks custom image built from this official image or the dataworks-maxcompute:py3.11-ubuntu20.04 image.
PyODPS 3 node: Select the official image dataworks_pyodps_py311_task_pod, or a DataWorks custom image built from this official image.
Python node: Create a personal development environment instance that has the MaxFrame service based on the dataworks-maxcompute:py3.11-ubuntu20.04 image, and save it as a DataWorks custom image that supports Python task types.
Other nodes: Make sure that the DataWorks custom image contains a MaxFrame runtime environment and is built in a Python 3.11 environment.
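The per-node requirements above amount to a lookup from node type to acceptable base images. The following sketch encodes only the image names given in this topic; the dictionary and the function name are hypothetical and are not a DataWorks interface.

```python
# Base images named in the requirements above, keyed by node type.
OFFICIAL_IMAGES = {
    "Notebook": [
        "dataworks-notebook:py3.11-ubuntu22.04",
        "dataworks-maxcompute:py3.11-ubuntu20.04",
    ],
    "PyODPS 3": ["dataworks_pyodps_py311_task_pod"],
    "Python": ["dataworks-maxcompute:py3.11-ubuntu20.04"],
}


def acceptable_base_images(node_type):
    """Return the base images a DataWorks image may be built from for a node type.

    An empty list means this topic names no specific base image for the node type;
    the custom image must still contain a MaxFrame runtime environment and be built
    in a Python 3.11 environment.
    """
    return OFFICIAL_IMAGES.get(node_type, [])
```

For a node type that is not listed, such as Shell, the function returns an empty list and the general "Other nodes" requirement applies.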
Go to Data Development
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose in the Actions column.
On the Data Studio page, click the icon in the navigation pane on the left to go to the Data Development page.
> Change Workspace
icon and choose
icon. In the dialog box that appears, select a