Before you use Pai-Megatron-Patch to accelerate model training, you must install a Pai-Megatron-Patch image. This topic describes the limits that apply and the procedure for installing the image in DLC and in DSW.
Limits
You can install a Pai-Megatron-Patch image only on GPU-accelerated instances.
The GPU driver version must be 460.32 or later.
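To check the driver requirement on a GPU-accelerated instance that you can open a terminal on, a minimal sketch such as the following may help. It assumes that the nvidia-smi command-line tool is available on the instance. The function names are hypothetical and are not part of Pai-Megatron-Patch or PAI.

```python
import subprocess

# Minimum driver version stated in the Limits section.
MIN_DRIVER = (460, 32)


def parse_version(text):
    """Turn a dotted version string such as '470.82.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(version, minimum=MIN_DRIVER):
    """Return True if the dotted driver version is at least `minimum`."""
    return parse_version(version) >= minimum


def installed_driver_versions():
    """Return the driver version that nvidia-smi reports for each visible GPU.

    Returns an empty list if nvidia-smi is missing or exits with an error.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("No GPU driver detected (nvidia-smi unavailable or failed).")
    for version in versions:
        status = "OK" if driver_meets_minimum(version) else "older than 460.32"
        print(f"driver {version}: {status}")
```

The comparison is done on integer tuples rather than on strings, so that a version such as 470.82.01 is correctly treated as later than 460.32.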
Procedure
Install a Pai-Megatron-Patch image in DLC
Deep Learning Containers (DLC) of Platform for AI (PAI) is a cloud-native all-in-one platform on which you can train deep learning models. DLC provides a flexible, stable, easy-to-use, and high-performance training environment. DLC supports various algorithms, including large-scale distributed deep learning algorithms and custom algorithm frameworks. This helps developers and enterprises reduce costs and improve efficiency.
DLC allows you to install custom images, including Pai-Megatron-Patch images. You only need to pass the URL of a Pai-Megatron-Patch image to DLC, and the system automatically installs the image. After the image is installed, you can use Pai-Megatron-Patch in DLC to perform ultra-large-scale distributed training across multiple multi-GPU servers.
Perform the following steps to install a Pai-Megatron-Patch image:
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the workspace page, choose Model Training > Deep Learning Containers (DLC). Click Create Job.
The following section describes the key parameters. You can configure other parameters based on your business requirements. For more information about the parameters, see Submit training jobs.
Basic Information: In the Image config section, select Image Address and enter the Pai-Megatron-Patch image address in the input field. Example: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm.
Resource Configuration: Select PyTorch for the Framework parameter. In the Node Configuration section, select an instance type on the GPU Instance tab based on your business requirements.


Click Submit.
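The image address entered in the Basic Information step follows the common registry/repository:tag layout. Before you paste an address into the console, you may want to confirm that it is complete and carries an explicit tag. The following is a minimal sketch under that assumption. The split_image_address helper is hypothetical and is not part of PAI, and the sketch does not handle addresses that reference an image by digest.

```python
def split_image_address(address):
    """Split 'registry/repository:tag' into (registry, repository, tag).

    Raises ValueError if the registry host or the tag is missing.
    """
    address = address.strip()
    # Everything before the first '/' is treated as the registry host.
    registry, _, remainder = address.partition("/")
    if not registry or not remainder:
        raise ValueError("expected a registry host followed by a repository path")
    # The tag follows the last ':' in the part after the registry host.
    repository, sep, tag = remainder.rpartition(":")
    if not sep or not repository or not tag or "/" in tag:
        raise ValueError("expected an explicit ':tag' at the end of the address")
    return registry, repository, tag


if __name__ == "__main__":
    # Example address taken from this topic.
    example = (
        "pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/"
        "pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm"
    )
    registry, repository, tag = split_image_address(example)
    print("registry:  ", registry)
    print("repository:", repository)
    print("tag:       ", tag)
```

When you copy the example address from this topic, leave out the period that ends the sentence. The period is not part of the address.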
Install a Pai-Megatron-Patch image in DSW
Data Science Workshop (DSW) is a cloud-based development environment for deep learning algorithm development. DSW integrates JupyterLab and supports plug-ins for custom development. You can open a notebook to write, debug, and run Python code without performing O&M configurations. DSW supports open source deep learning frameworks and provides an optimized TensorFlow framework that is developed by Alibaba Cloud. You can also use compilation optimization to improve training performance.
DSW also allows you to install custom images. You only need to pass the URL of a Pai-Megatron-Patch image to DSW, and the system automatically installs the image. After the image is installed, you can use Pai-Megatron-Patch in DSW to accelerate training.
Perform the following steps to install a Pai-Megatron-Patch image:
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the workspace page, choose Model Training > Data Science Workshop (DSW). Click Create Instance.
The following section describes the key parameters. You can configure other parameters based on your business requirements. For more information, see Create and manage DSW instances.
Resource Quota: Select an instance type on the GPU Specifications tab based on your business requirements.
Image config: Select Image URL and enter the Pai-Megatron-Patch image address in the input field. Example: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm.


Click Next, confirm the information about the DSW instance, and then click Create Instance.
Use Pai-Megatron-Patch
After you install a Pai-Megatron-Patch image, you can view and use the sample code in the examples folder of Pai-Megatron-Patch.