Develop Qwen models from training to deployment - Platform for AI

This guide helps large language model (LLM) developers get started with the Lingjun Intelligent Computing platform. You will learn the complete development workflow for Qwen LLMs, such as Qwen-7B, Qwen-14B, and Qwen-72B. The workflow includes efficient distributed training, three-stage instruction tuning, offline model inference, and online service deployment. This topic uses the Qwen-7B model to demonstrate the process.

Prerequisites

This guide uses the Qwen-7B v1.1.4 model as an example. Before you begin, complete the following preparations:

Activate Platform for AI (PAI), including Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS), and create a default workspace. For more information, see Activate PAI and create a default workspace.

Purchase Lingjun resources and create a resource quota. The following table lists the supported resource specifications for different model sizes. Select the appropriate resources based on the model size you use. For more information about the node specifications of Lingjun resources, see Lingjun Serverless pricing details. For more information about how to purchase resources and create a quota, see Create a resource group and purchase Lingjun resources and Create a resource quota.

Model size	Full-parameter training resources	Inference resources (minimum)	Megatron parallelism (TP, PP)
7B	8 × gu7xf GPUs or 8 × gu7ef GPUs	1 × V100 (32 GB VRAM) or 1 × A10 (24 GB VRAM)	TP: 1, PP: 1
14B	8 × gu7xf GPUs or 8 × gu7ef GPUs	2 × V100 (32 GB VRAM) or 2 × A10 (24 GB VRAM)	TP: 2, PP: 1
72B	4 nodes × 8 gu7xf GPUs or 4 nodes × 8 gu7ef GPUs	6 × V100 (32 GB VRAM) or 2 × gu7xf GPUs	TP: 8, PP: 2
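In Megatron, the total number of GPUs in a job equals TP × PP × DP, where DP is the data-parallel degree (the number of model replicas). The following Python sketch, with a hypothetical helper name, derives the DP degree implied by each row of the table above. It assumes that the 72B entry means 4 nodes with 8 GPUs each.

```python
def data_parallel_degree(num_gpus: int, tp: int, pp: int) -> int:
    """Return the data-parallel (DP) degree for a Megatron job.

    Megatron requires the GPU count to be divisible by TP * PP;
    the quotient is the number of data-parallel model replicas.
    """
    model_parallel = tp * pp
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be divided into TP={tp} x PP={pp} groups"
        )
    return num_gpus // model_parallel


# Training configurations from the table (72B assumed to be 4 nodes x 8 GPUs).
configs = {
    "7B": (8, 1, 1),
    "14B": (8, 2, 1),
    "72B": (4 * 8, 8, 2),
}

for size, (gpus, tp, pp) in configs.items():
    dp = data_parallel_degree(gpus, tp, pp)
    print(f"Qwen-{size}: {gpus} GPUs, TP={tp}, PP={pp} -> DP={dp}")
```

For Qwen-7B on one 8-GPU node this gives DP = 8, for Qwen-14B DP = 4, and for Qwen-72B DP = 2.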

Create a General-purpose NAS file system dataset to store training files and results. Configure the default mount path to /mnt/data/nas. For more information, see Create and manage datasets.
Create a DSW instance with the following key parameter settings. For more information, see Create a DSW instance.
- Resource Quota: Select the resource quota that you created for Lingjun resources.
- Instance Type: Configure the following resource specifications.
  - vCPUs: 90.
  - Memory (GiB): 1024.
  - Shared Memory (GiB): 1024.
  - GPUs: At least 8.
- Dataset Mounting: Click Custom Dataset, select the dataset that you created, and use the default mount path.
- Image Configuration: On the Image Address tab, set the runtime image to pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.
If you use a Resource Access Management (RAM) user to perform these operations, grant the RAM user the required permissions for DSW, DLC, or EAS. For more information, see Cloud product dependencies and permissions: DSW, Cloud product dependencies and permissions: DLC, or Cloud product dependencies and permissions: EAS.

Limits

This best practice is available only in the China (Ulanqab) region.

Step 1: Prepare the Qwen model

This topic provides two ways to download the model. Follow these steps:

Go to the PAI DSW development environment.
1. Log on to the PAI console.
2. In the upper-left corner of the page, select the region where you want to use the service: China (Ulanqab).
3. In the navigation pane on the left, click Workspaces. On the page that appears, click the name of the workspace that you want to manage.
4. In the navigation pane on the left, choose Model Training > Data Science Workshop (DSW).
5. In the Actions column of the target instance, click Open.
In the top menu bar, click Terminal. On the tab that appears, click create a terminal.

Download the Qwen model.

Download the model from the ModelScope community

In the Terminal, run the following command to install ModelScope.

pip install modelscope

The following output is returned. You can ignore the WARNING messages in the output.

Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
Collecting modelscope
  Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/ac/05/75b5d750608d7354dc3dd023dca7101e5f3b4645cb3e5b816536d472a058/modelscope-1.9.5-py3-none-any.whl (5.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 104.7 MB/s eta 0:00:00
Requirement already satisfied: pyyaml in /opt/*/lib/python3.8/site-packages (from modelscope) (5.4.1)
Requirement already satisfied: pandas in /opt/*/lib/python3.8/site-packages (from modelscope) (1.5.3)
Requirement already satisfied: addict in /opt/*/lib/python3.8/site-packages (from modelscope) (2.4.0)
Requirement already satisfied: numpy in /opt/*/lib/python3.8/site-packages (from modelscope) (1.22.2)
Collecting simplejson>=3.3.0
  Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/33/5f/b9506e323ea89737b34c97a6eda9d22ad6b771190df93f6eb72657a3b996/simplejson-3.19.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 136.6/136.6 kB 70.2 MB/s eta 0:00:00
Collecting gast>=0.2.2
  Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/fa/39/5aae571e5a5f4de9c3445dae08a530498e5c53b0e74410eeeb0991c79047/gast-0.5.4-py3-none-any.whl (19 kB)
Requirement already satisfied: Pillow>=6.2.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (9.3.0)
Requirement already satisfied: oss2 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.17.0)
Requirement already satisfied: filelock>=3.3.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (3.11.0)
Requirement already satisfied: urllib3>=1.26 in /opt/*/lib/python3.8/site-packages (from modelscope) (1.26.12)
Requirement already satisfied: datasets<=2.13.0,>=2.8.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.11.0)
Requirement already satisfied: attrs in /opt/*/lib/python3.8/site-packages (from modelscope) (22.2.0)
Requirement already satisfied: scipy in /opt/*/lib/python3.8/site-packages (from modelscope) (1.9.3)
Requirement already satisfied: yapf in /opt/*/lib/python3.8/site-packages (from modelscope) (0.32.0)
Requirement already satisfied: pyarrow!=9.0.0,>=6.0.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (11.0.0)
Requirement already satisfied: setuptools in /opt/*/lib/python3.8/site-packages (from modelscope) (65.5.0)
Requirement already satisfied: requests>=2.25 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.28.1)
Requirement already satisfied: einops in /opt/*/lib/python3.8/site-packages (from modelscope) (0.6.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.8.2)
Collecting sortedcontainers>=1.5.9
  Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/32/46/9cb0e58b2deb7f82b84065f37f3bffeb12413f947f9388e4cac22c4621ce/sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: tqdm>=4.64.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (4.65.0)
Requirement already satisfied: dill<0.3.7,>=0.3.0 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.3.6)
Requirement already satisfied: multiprocess in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.70.14)
Requirement already satisfied: aiohttp in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (3.8.4)
Requirement already satisfied: responses<0.19 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.18.0)
Requirement already satisfied: huggingface-hub<1.0.0,>=0.11.0 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.16.4)
Requirement already satisfied: fsspec[http]>=2021.11.1 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (2023.4.0)
Requirement already satisfied: packaging in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (21.3)
Requirement already satisfied: xxhash in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (3.2.0)
Requirement already satisfied: six>=1.5 in /opt/*/lib/python3.8/site-packages (from python-dateutil>=2.1->modelscope) (1.16.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (2022.9.24)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (3.4)
Requirement already satisfied: aliyun-python-sdk-kms>=2.4.1 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (2.16.0)
Requirement already satisfied: aliyun-python-sdk-core>=2.13.12 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (2.13.36)
Requirement already satisfied: crcmod>=1.7 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (1.7)
Requirement already satisfied: pycryptodome>=3.4.7 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (3.15.0)
Requirement already satisfied: pytz>=2020.1 in /opt/*/lib/python3.8/site-packages (from pandas->modelscope) (2022.7.1)
Requirement already satisfied: cryptography>=2.6.0 in /opt/*/lib/python3.8/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (38.0.3)
Requirement already satisfied: jmespath<1.0.0,>=0.9.3 in /opt/*/lib/python3.8/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (0.10.0)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.8.2)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.3.3)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (6.0.4)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.3.1)
Requirement already satisfied: typing-extensions>=3.7.*.* in /opt/*/lib/python3.8/site-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets<=2.13.0,>=2.8.0->modelscope) (4.4.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/*/lib/python3.8/site-packages (from packaging->datasets<=2.13.0,>=2.8.0->modelscope) (3.0.9)
Requirement already satisfied: cffi>=1.12 in /opt/*/lib/python3.8/site-packages (from cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (1.15.1)
Requirement already satisfied: pycparser in /opt/*/lib/python3.8/site-packages (from cffi>=1.12->cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (2.21)
Installing collected packages: sortedcontainers, simplejson, gast, modelscope
Successfully installed gast-0.5.4 modelscope-1.9.5 simplejson-3.19.2 sortedcontainers-2.4.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Run the following command to enter the Python environment.

python

The following code downloads the model files for the Qwen-7B model. To download the Qwen-14B or Qwen-72B model instead, use the corresponding commented line.

# Download the model files from ModelScope.
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('qwen/Qwen-7B', revision='v1.1.4')
# model_dir = snapshot_download('qwen/Qwen-14B', revision='v1.0.4')
# model_dir = snapshot_download('qwen/Qwen-72B')
# Print the local download path.
print(model_dir)
# /root/.cache/modelscope/hub/qwen/Qwen-7B

Press Ctrl+D to exit the Python environment.
Run the following command to move the downloaded Qwen model to the corresponding folder.

# mkdir -p /mnt/workspace/qwen-ckpts/${ckpt_folder_with_hf_suffix}
mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
# cp -r ${path_to_downloaded_model}/* /mnt/workspace/qwen-ckpts/${ckpt_folder_with_hf_suffix}
cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf
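If you prefer to copy the files from the Python session instead of the shell, the following sketch, with a hypothetical helper name, does roughly what the `mkdir -p` and `cp -r` commands above do by using `shutil.copytree(..., dirs_exist_ok=True)` (Python 3.8 or later). Adjust the paths if `print(model_dir)` returned a different location.

```python
import shutil
from pathlib import Path


def copy_checkpoint(src: str, dst: str) -> list:
    """Copy the contents of src into dst, similar to `cp -r src/* dst`.

    dirs_exist_ok=True (Python 3.8+) allows dst to exist already, matching
    the `mkdir -p` + `cp -r` commands. Unlike the shell glob `src/*`,
    copytree also copies hidden (dot) files.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"model directory not found: {src}")
    shutil.copytree(src_path, dst, dirs_exist_ok=True)
    # Return the top-level entries so the caller can inspect the result.
    return sorted(p.name for p in Path(dst).iterdir())


# Example, using the paths from the commands above:
# copy_checkpoint("/root/.cache/modelscope/hub/qwen/Qwen-7B",
#                 "/mnt/workspace/qwen-ckpts/qwen-7b-hf")
```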

Download the model from the HuggingFace community

In the DSW Terminal, run the following commands to download the model files. This guide uses the Qwen-7B model as an example and clones it into the qwen-7b-hf folder, which is the path that later steps use. To download the Qwen-14B or Qwen-72B model files, use the corresponding commented line instead.

mkdir -p /mnt/workspace/qwen-ckpts
cd /mnt/workspace/qwen-ckpts
git clone https://huggingface.co/Qwen/Qwen-7B qwen-7b-hf
# git clone https://huggingface.co/Qwen/Qwen-7B-Chat
# git clone https://huggingface.co/Qwen/Qwen-14B
# git clone https://huggingface.co/Qwen/Qwen-14B-Chat
# git clone https://huggingface.co/Qwen/Qwen-72B
# git clone https://huggingface.co/Qwen/Qwen-72B-Chat

Step 2: Prepare pre-training data

We recommend that you prepare the pre-training data in the DSW instance. This guide uses the WuDaoCorpora2.0 dataset, which is for research purposes only, as an example to show the data pre-processing workflow for Megatron training. You can download the small-scale sample data prepared by PAI or prepare the pre-training data yourself by following these steps.

Use the small-scale sample data processed by PAI

To help you try this solution, PAI provides processed small-scale sample data. You can run the following command in the DSW Terminal to download the sample data.

mkdir /mnt/workspace/qwen-datasets/
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
mkdir -p /mnt/workspace/qwen-datasets/wudao
cd /mnt/workspace/qwen-datasets/wudao
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.idx
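Before fine-tuning, you may want to check the downloaded JSON files. This guide does not describe their layout, so the following sketch, with a hypothetical helper name, accepts either a JSON array of objects or a JSON Lines file and reports the record count and the field names it finds.

```python
import json
from collections import Counter


def summarize_json_records(path: str) -> dict:
    """Count records and field names in a JSON array or JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
        records = data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    keys = Counter(k for r in records if isinstance(r, dict) for k in r)
    return {"records": len(records), "keys": dict(keys)}


# Example, using a path from the commands above:
# print(summarize_json_records("/mnt/workspace/qwen-datasets/alpaca_zh-qwen-train.json"))
```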

Process the data yourself

Download the open source WuDaoCorpora2.0 dataset to the /mnt/workspace/qwen-datasets working directory. In this guide, the decompressed folder is named wudao_200g.

PAI provides sample data for demonstration. You can run the following command in the DSW Terminal to download and decompress the dataset.
```
mkdir -p /mnt/workspace/qwen-datasets
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
tar zxvf WuDaoCorpus2.0_base_sample.tgz 
mv WuDaoCorpus2.0_base_sample wudao_200g
```

In the Terminal, run the following script to cleanse the WuDao data and convert its file format. The script generates a single merged file, merged_wudao_cleaned.json.

#! /bin/bash
set -ex
# Set the path where the raw data is stored.
data_dir=/mnt/workspace/qwen-datasets/wudao_200g

# Start the data cleansing process.
dataset_dir=$(dirname $data_dir)
mkdir -p ${dataset_dir}/cleaned_wudao_dataset
cd ${dataset_dir}/cleaned_wudao_dataset
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/llama2-codes/preprocess_wudao2.py
# Set the key parameter (-k) to text.
python preprocess_wudao2.py -i ${data_dir} -o ${dataset_dir}/cleaned_wudao_dataset -k text -p 32

# Merge the cleansed data.
mkdir -p ${dataset_dir}/wudao
cd ${dataset_dir}/wudao
find ${dataset_dir}/cleaned_wudao_dataset -name "*.json" -exec cat {} + > ${dataset_dir}/wudao/merged_wudao_cleaned.json
rm -rf ${dataset_dir}/cleaned_wudao_dataset

After the command is run, the file structure of the qwen-datasets directory is as follows. A new wudao folder is added.

qwen-datasets
├── wudao_200g 
└── wudao
    └── merged_wudao_cleaned.json
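The internals of preprocess_wudao2.py are not described in this guide. As a rough, hypothetical illustration of this kind of cleansing, the sketch below assumes that each raw WuDao file is a JSON array of objects that carry the document body in a content field. It drops empty documents and writes one {"text": ...} object per line, which matches the -k text setting and the line-oriented merge and split steps. It is not a replacement for the PAI script, and the function names are hypothetical.

```python
import json
from pathlib import Path


def clean_wudao_file(src: Path, dst: Path, key: str = "text") -> int:
    """Convert one raw WuDao JSON file into JSON Lines.

    Assumes the raw file is a JSON array of objects with a "content"
    field (a simplifying assumption). Returns the number of documents written.
    """
    with open(src, encoding="utf-8") as f:
        docs = json.load(f)
    written = 0
    with open(dst, "w", encoding="utf-8") as out:
        for doc in docs:
            content = (doc.get("content") or "").strip()
            if not content:
                continue  # Skip empty documents.
            # ensure_ascii=False keeps Chinese characters readable in the output.
            out.write(json.dumps({key: content}, ensure_ascii=False) + "\n")
            written += 1
    return written


def clean_wudao_dir(data_dir: str, out_dir: str, key: str = "text") -> int:
    """Apply clean_wudao_file to every *.json file in data_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    total = 0
    for src in sorted(Path(data_dir).glob("*.json")):
        total += clean_wudao_file(src, out / src.name, key)
    return total
```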

In the Terminal, run the following commands to split the merged_wudao_cleaned.json file into multiple chunks and compress each chunk. Splitting the data lets the subsequent steps process the chunks in parallel.

apt-get update
apt-get install -y zstd

# Same path as in the previous script. Redefine it if you run this in a new shell.
dataset_dir=/mnt/workspace/qwen-datasets

# The number of chunks is set to 10. You can increase this value if data processing is slow.
NUM_PIECE=10

# Process the merged_wudao_cleaned.json file.
mkdir -p ${dataset_dir}/cleaned_zst/
# Count the lines and compute the chunk size, rounding up so that at most NUM_PIECE chunks are created.
NUM=$(sed -n '$=' ${dataset_dir}/wudao/merged_wudao_cleaned.json)
echo "total lines in dataset: $NUM, data will be split into $NUM_PIECE pieces for processing"
NUM=$(( (NUM + NUM_PIECE - 1) / NUM_PIECE ))
echo "each piece contains $NUM lines"
split_dir=${dataset_dir}/split
mkdir -p $split_dir
split -l $NUM --numeric-suffixes --additional-suffix=.jsonl ${dataset_dir}/wudao/merged_wudao_cleaned.json $split_dir/

# Compress the chunks in parallel.
o_path=${dataset_dir}/cleaned_zst/
mkdir -p $o_path
for filename in $split_dir/*.jsonl
do
   f=$(basename $filename)
   zstd -z $filename -o $o_path/$f.zst &
done
# Wait for all background zstd jobs to finish before deleting their input files.
wait
rm -rf $split_dir
rm ${dataset_dir}/wudao/merged_wudao_cleaned.json

After the command is run, the file structure of the qwen-datasets directory is as follows. A new cleaned_zst folder is added, which contains 10 compressed files.

qwen-datasets
├── wudao_200g
├── wudao
└── cleaned_zst
    ├── 00.jsonl.zst
    ├── ...
    └── 09.jsonl.zst
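The split step divides the total line count by NUM_PIECE to obtain the chunk size that is passed to split -l. With floor division, split can create one extra, smaller chunk when the line count is not a multiple of NUM_PIECE; rounding up keeps the count at or below NUM_PIECE. The following sketch, with hypothetical function names, compares both calculations.

```python
def chunk_plan(total_lines: int, num_pieces: int) -> tuple:
    """Return (lines_per_chunk, number_of_chunks) for `split -l`.

    Ceiling division keeps the number of chunks at or below num_pieces.
    """
    if total_lines <= 0 or num_pieces <= 0:
        raise ValueError("total_lines and num_pieces must be positive")
    lines_per_chunk = -(-total_lines // num_pieces)  # ceiling division
    num_chunks = -(-total_lines // lines_per_chunk)
    return lines_per_chunk, num_chunks


def floor_plan(total_lines: int, num_pieces: int) -> tuple:
    """Same computation with floor division, for comparison."""
    # max(..., 1) avoids a zero chunk size, which split rejects.
    lines_per_chunk = max(total_lines // num_pieces, 1)
    num_chunks = -(-total_lines // lines_per_chunk)
    return lines_per_chunk, num_chunks


print(chunk_plan(1005, 10))  # (101, 10)
print(floor_plan(1005, 10))  # (100, 11)
```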

Create an MMAP-format pre-training dataset.

MMAP is a pre-tokenized data format that reduces the time spent waiting for data to be read during training and fine-tuning. This is especially advantageous when processing large-scale data. Follow these steps:

In the DSW Terminal, run one of the following commands to obtain the source code of PAI-Megatron-Patch, a Megatron-based model training toolkit, in the DSW working directory /mnt/workspace/.

cd /mnt/workspace/
# Method 1: Clone the training code from the open source repository.
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
# Method 2: Download the training code as an archive and decompress it.
# wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/models/Pai-Megatron-Patch.tgz
# tar zxvf Pai-Megatron-Patch.tgz

In the Terminal, run the following command to convert the data to the MMAP format. After the command is run, .bin and .idx files are generated in the /mnt/workspace/qwen-datasets/wudao directory.

# Install the tokenizer library required by Qwen.
pip install tiktoken
# Set the dataset path and working directory.
export dataset_dir=/mnt/workspace/qwen-datasets
export WORK_DIR=/mnt/workspace

# Generate MMAP-format pre-training datasets for the training and validation sets.
cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
bash run_make_pretraining_dataset.sh \
../../Megatron-LM-23.04 \
${WORK_DIR}/Pai-Megatron-Patch/ \
${dataset_dir}/cleaned_zst/ \
qwenbpe \
${dataset_dir}/wudao/ \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf
rm -rf ${dataset_dir}/cleaned_zst

The following table describes the six start parameters for running run_make_pretraining_dataset.sh.

Parameter	Description
MEGATRON_PATH=$1	The path of the open source Megatron code.
MEGATRON_PATCH_PATH=$2	The path of the Megatron Patch code.
input_data_dir=$3	The path of the folder that contains the compressed (.zst) WuDao data.
tokenizer=$4	The type of the tokenizer. Set this to qwenbpe.
output_data_dir=$5	The path to save the output `.bin` and `.idx` files.
load_dir=$6	The path of the HuggingFace model folder that contains the tokenizer files, such as tokenizer_config.json.

After the script is run, the file structure of the qwen-datasets directory is as follows.

qwen-datasets
├── wudao_200g
└── wudao
   ├── wudao_qwenbpe_content_document.bin
   └── wudao_qwenbpe_content_document.idx
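To illustrate why a pre-tokenized, memory-mapped file reduces read time, the following NumPy sketch stores token IDs as one flat binary array plus a file of document offsets, and reads a single document through `numpy.memmap` without loading the whole file. This is a simplified stand-in with hypothetical function names: the actual Megatron .bin/.idx layout (header, dtype code, and document index) differs, so do not use this code to read wudao_qwenbpe_content_document.bin. int32 is used because the Qwen vocabulary is larger than the uint16 range.

```python
import numpy as np


def write_token_file(docs, bin_path, idx_path, dtype=np.int32):
    """Store token-ID documents as one flat binary array plus document offsets."""
    sizes = np.array([len(d) for d in docs], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(sizes)]).astype(np.int64)
    np.concatenate([np.asarray(d, dtype=dtype) for d in docs]).tofile(bin_path)
    offsets.tofile(idx_path)


def read_document(bin_path, idx_path, i, dtype=np.int32):
    """Read document i through a memory map instead of loading the whole file."""
    offsets = np.fromfile(idx_path, dtype=np.int64)
    tokens = np.memmap(bin_path, dtype=dtype, mode="r")
    start, end = int(offsets[i]), int(offsets[i + 1])
    # Slicing the memmap touches only the file pages that back this slice;
    # np.array copies the slice into ordinary memory.
    return np.array(tokens[start:end])
```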

Step 3: Train the model with Megatron

You can train the model with Megatron by following these steps.

Convert the model format

Convert the HuggingFace model files to the Megatron format.

Download the converted Megatron model

To help you try this solution, PAI provides a model with the format already converted. You can run the following command in the Terminal to download the model.

cd /mnt/workspace/
mkdir -p qwen-ckpts
cd qwen-ckpts
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-ckpts/qwen-7b-hf-to-mg-tp1-pp1.tgz
tar -zxf qwen-7b-hf-to-mg-tp1-pp1.tgz
mv qwen-7b-hf-to-mg-tp1-pp1 qwen-7b-hf-to-megatron-tp1-pp1

Convert the HuggingFace model to the Megatron format

In the Terminal, run the following command to use the model conversion tool provided by PAI to convert the HuggingFace model files to the Megatron format:

# Convert the model.
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main        \
/mnt/workspace/qwen-ckpts/qwen-7b-hf         \
/mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1  \
1  \
1  \
qwen-7b \
0 \
false

The following table describes the parameters for running model_convertor.sh.

Parameter	Description
MEGATRON_PATH=$1	The path of the open source Megatron code.
SOURCE_CKPT_PATH=$2	The path of the HuggingFace model files.
TARGET_CKPT_PATH=$3	The path to save the converted Megatron model.
TP=$4	The number of tensor parallelism shards. This must be the same as the number used for training. The number of shards varies based on the model size. Modify this parameter as needed when converting the model: Qwen-7B: 1. Qwen-14B: 2. Qwen-72B: 8.
PP=$5	The number of pipeline parallelism shards. This must be the same as the number used for training. The number of shards varies based on the model size. Modify this parameter as needed when converting the model: Qwen-7B: 1. Qwen-14B: 1. Qwen-72B: 2.
MN=$6	The model name: qwen-7b, qwen-14b, or qwen-72b.
EXTRA_VOCAB_SIZE=$7	The extra vocabulary size.
mg2hf=$8	Specifies whether to convert a Megatron model to a HuggingFace model. Set this to false to convert a HuggingFace model to the Megatron format.
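The TP value determines how many shards each large weight matrix is split into. The following NumPy sketch illustrates the column-parallel split used in Megatron-style tensor parallelism: each TP rank holds a slice of the output columns, and concatenating the per-rank outputs reproduces the full layer output. It is a simplified illustration with a hypothetical helper name. The actual conversion tool also handles row-parallel layers, fused attention weights, and pipeline stages, none of which are shown.

```python
import numpy as np


def split_columns(weight: np.ndarray, tp: int) -> list:
    """Split an (in_features, out_features) weight into tp column shards."""
    if weight.shape[1] % tp != 0:
        raise ValueError("out_features must be divisible by TP")
    return np.split(weight, tp, axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))    # a batch of 4 activation vectors
w = rng.standard_normal((16, 32))   # full weight of one linear layer

shards = split_columns(w, tp=2)                     # what 2 TP ranks would each hold
partial_outputs = [x @ s for s in shards]           # each rank computes its slice
combined = np.concatenate(partial_outputs, axis=1)  # gather along the columns

print([s.shape for s in shards])  # [(16, 16), (16, 16)]
print(np.allclose(combined, x @ w))  # True
```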

Pre-train the model

You can pre-train the model in a single DSW instance or submit a distributed training task with multi-GPU servers in the DLC environment. The training process takes about two hours. After the task succeeds, the model files are saved to the /mnt/workspace/output_megatron_qwen/ directory.

Pre-train the model on a single DSW instance

The following code shows an example of how to run the command for the Qwen-7B model in the Terminal:

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh  \
dsw  \
${WORK_DIR}/Pai-Megatron-Patch  \
7B   \
1    \
8 \
1e-5   \
1e-6   \
2048  \
2048  \
85   \
fp16  \
1   \
1  \
sel  \
true   \
false  \
false   \
false  \
100000  \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document   \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
100000000   \
10000   \
${WORK_DIR}/output_megatron_qwen/

The following table describes the parameters for running run_pretrain_megatron_qwen.sh.

Parameter	Description
ENV=$1	The running environment: dsw or dlc.
MEGATRON_PATH=$2	The path of the Pai-Megatron-Patch code, which contains the open source Megatron code.
MODEL_SIZE=$3	The model size: 7B, 14B, or 72B.
BATCH_SIZE=$4	The number of samples per GPU in one training iteration (micro batch size), for example, 1, 4, or 8.
GLOBAL_BATCH_SIZE=$5	The total number of samples in one training iteration across all GPUs.
LR=$6	The learning rate: 1e-5 or 5e-5.
MIN_LR=$7	The minimum learning rate: 1e-6 or 5e-6.
SEQ_LEN=$8	The sequence length.
PAD_LEN=${9}	The padding length.
EXTRA_VOCAB_SIZE=${10}	The extra vocabulary size: Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213.
PR=${11}	The training precision: fp16 or bf16.
TP=${12}	The degree of tensor (model) parallelism.
PP=${13}	The degree of pipeline parallelism.
AC=${14}	The activation checkpointing mode: full or sel (selective).
DO=${15}	Specifies whether to use the Megatron version of the ZeRO-1 optimizer to reduce VRAM usage: true or false.
FL=${16}	Specifies whether to enable Flash Attention: true or false.
SP=${17}	Specifies whether to use sequence parallelism: true or false.
TE=${18}	Specifies whether to enable Transformer Engine acceleration: true or false. This feature requires gu8xf GPUs.
SAVE_INTERVAL=${19}	The interval for saving checkpoint files.
DATASET_PATH=${20}	The path of the training dataset.
PRETRAIN_CHECKPOINT_PATH=${21}	The path of the pre-trained model.
TRAIN_TOKENS=${22}	The total number of training tokens.
WARMUP_TOKENS=${23}	The number of warmup tokens.
OUTPUT_BASEPATH=${24}	The path to save the output model files.
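BATCH_SIZE, GLOBAL_BATCH_SIZE, and the number of GPUs are linked: in Megatron, the global batch size must be a multiple of the micro batch size × DP, and the quotient is the number of gradient-accumulation steps. The following sketch, with hypothetical function names, checks that constraint and estimates how many iterations a token budget corresponds to. How run_pretrain_megatron_qwen.sh converts TRAIN_TOKENS and WARMUP_TOKENS into iterations is not documented in this guide; dividing by GLOBAL_BATCH_SIZE × SEQ_LEN is an assumption.

```python
def grad_accum_steps(micro_batch: int, global_batch: int, num_gpus: int,
                     tp: int = 1, pp: int = 1) -> int:
    """Return the gradient-accumulation steps implied by Megatron batch settings."""
    dp = num_gpus // (tp * pp)
    per_step = micro_batch * dp
    if global_batch % per_step != 0:
        raise ValueError(
            f"GLOBAL_BATCH_SIZE={global_batch} is not a multiple of "
            f"BATCH_SIZE x DP = {per_step}"
        )
    return global_batch // per_step


def iterations_for_tokens(tokens: int, global_batch: int, seq_len: int) -> int:
    """Assumed conversion: each iteration consumes global_batch * seq_len tokens."""
    return tokens // (global_batch * seq_len)


# The DSW example: 1 node x 8 GPUs, BATCH_SIZE=1, GLOBAL_BATCH_SIZE=8, SEQ_LEN=2048.
print(grad_accum_steps(1, 8, 8))                    # 1
print(iterations_for_tokens(100_000_000, 8, 2048))  # 6103
```

For the DLC job that follows (2 nodes × 8 GPUs, so DP = 16 when TP = PP = 1), BATCH_SIZE=1 with GLOBAL_BATCH_SIZE=8 would not satisfy this constraint if the script passes the values to Megatron unchanged, so you may need to set GLOBAL_BATCH_SIZE to a multiple of 16.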

Pre-train the model with distributed training on DLC

After debugging on a single instance, you can configure a distributed task with multi-GPU servers in the DLC environment. Follow these steps:

Go to the Create Job page.
1. Log on to the PAI console. Select the target region and workspace at the top of the page, and then click Deep Learning Containers (DLC).
2. On the Deep Learning Containers (DLC) page, click Create Job.

On the Create Job page, configure the following key parameters and keep the default values for other parameters. For more information, see Create a training task.

Parameter		Description
Basic Information	Job Name	Enter a custom task name. This guide uses test_qwen_dlc.
Environment Information	Image Configuration	Select Image Address and enter `pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm` in the text box.
	Mount dataset	Click Custom Dataset and configure the following parameters: Custom Dataset: Select the NAS dataset that you created. Mount Path: Set this to `/mnt/workspace/`.
	Startup Command	Configure the following command. The start parameters for the run_pretrain_megatron_qwen.sh script are the same as those for pre-training the model on a single DSW instance, except that ENV is set to dlc. `export WORK_DIR=/mnt/workspace; cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen; sh run_pretrain_megatron_qwen.sh dlc ${WORK_DIR}/Pai-Megatron-Patch 7B 1 8 1e-5 1e-6 2048 2048 85 fp16 1 1 sel true false false false 100000 ${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 100000000 10000 ${WORK_DIR}/output_megatron_qwen/`
Resource Information	Resource Type	Select Lingjun Intelligent Computing.
	Source	Select Resource Quota.
	Resource Quota	Select the resource quota that you created for Lingjun resources.
	Framework	Select PyTorch.
	Job Resource	Configure the following parameters for the Worker node: Nodes: 2 (to perform multi-node training, set Nodes to the required number of machines); GPUs: 8; vCPUs: 90 (the number of CPU cores cannot exceed 96); Memory (GiB): 1024; Shared Memory (GiB): 1024.

Click OK. You are redirected to the Deep Learning Containers (DLC) page. When the Status of the job changes to Succeeded, the training is complete.

Fine-tune the model with supervised learning

You can fine-tune the model in a single DSW instance or submit a distributed task with multi-GPU servers in the DLC environment. The training process takes about two hours. After the task succeeds, the model files are saved to the /mnt/workspace/output_megatron_qwen/ directory.

Before you fine-tune the model, download the alpaca_zh-qwen-train.json and alpaca_zh-qwen-valid.json files as described in "Use the small-scale sample data processed by PAI" in Step 2: Prepare pre-training data.

Fine-tune the model.

Fine-tune the model on a single DSW instance

The following code shows an example of how to run the command for the Qwen-7B model in the Terminal:

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh  \
dsw  \
${WORK_DIR}/Pai-Megatron-Patch  \
7B     \
1      \
96 \
1e-5   \
1e-6   \
2048   \
2048     \
85      \
bf16   \
1      \
1      \
sel    \
true   \
false  \
false  \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json   \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json   \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
2000   \
10 \
${WORK_DIR}/output_megatron_qwen/

The following table describes the parameters for running run_finetune_megatron_qwen_withGA.sh.

Parameter	Description
ENV=$1	The running environment: dlc or dsw.
MEGATRON_PATH=$2	The path of the Pai-Megatron-Patch code, which contains the open source Megatron code.
MODEL_SIZE=$3	The model size: 7B, 14B, or 72B.
BATCH_SIZE=$4	The number of samples for one iteration of training on each GPU: 1, 2, 4, or 8.
GLOBAL_BATCH_SIZE=$5	The total number of samples for one iteration of fine-tuning: 64, 96, or 128.
LR=$6	The learning rate: 1e-5 or 5e-5.
MIN_LR=$7	The minimum learning rate: 1e-6 or 5e-6.
SEQ_LEN=$8	The sequence length.
PAD_LEN=$9	The padding sequence length.
EXTRA_VOCAB_SIZE=${10}	The extra vocabulary size: Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213.
PR=${11}	The training precision: fp16 or bf16.
TP=${12}	The degree of tensor (model) parallelism.
PP=${13}	The degree of pipeline parallelism.
AC=${14}	The activation checkpointing mode: full or sel (selective).
DO=${15}	Specifies whether to use the Megatron version of the ZeRO-1 optimizer to reduce VRAM usage: true or false.
FL=${16}	Specifies whether to enable Flash Attention: true or false.
SP=${17}	Specifies whether to use sequence parallelism: true or false.
TE=${18}	Specifies whether to enable Transformer Engine acceleration: true or false. This feature requires gu8xf GPUs.
SAVE_INTERVAL=${19}	The step interval for saving the model.
DATASET_PATH=${20}	The path of the training dataset.
VALID_DATASET_PATH=${21}	The path of the validation set.
PRETRAIN_CHECKPOINT_PATH=${22}	The path of the pre-trained model.
TRAIN_ITERS=${23}	The number of training iterations.
LR_WARMUP_ITERS=${24}	The number of learning rate warmup iterations, during which the learning rate increases to its peak value (LR).
OUTPUT_BASEPATH=${25}	The path to save the output model files.
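LR, MIN_LR, LR_WARMUP_ITERS, and TRAIN_ITERS together define the learning rate schedule. The following sketch, with a hypothetical function name, assumes a linear warmup followed by cosine decay to MIN_LR, which is one of the decay styles that Megatron supports. Which decay style run_finetune_megatron_qwen_withGA.sh actually selects is not stated in this guide.

```python
import math


def lr_at(step: int, lr: float, min_lr: float, warmup_iters: int, train_iters: int) -> float:
    """Linear warmup to lr, then (assumed) cosine decay to min_lr at train_iters."""
    if warmup_iters > 0 and step <= warmup_iters:
        return lr * step / warmup_iters
    if step >= train_iters:
        return min_lr
    progress = (step - warmup_iters) / (train_iters - warmup_iters)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (lr - min_lr) * cosine


# Values from the fine-tuning example: LR=1e-5, MIN_LR=1e-6, 10 warmup iterations, 2000 iterations.
for step in (0, 5, 10, 1005, 2000):
    print(step, f"{lr_at(step, 1e-5, 1e-6, 10, 2000):.2e}")
```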

Fine-tune the model with distributed training on DLC

After debugging in a single DSW instance, you can configure a distributed task with multi-GPU servers in the DLC environment. When you submit the DLC training task, configure the Startup Command as follows. For the other parameter settings, see the DLC distributed pre-training section earlier in Step 3.

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh  \
dlc  \
${WORK_DIR}/Pai-Megatron-Patch  \
7B     \
1      \
96 \
1e-5   \
1e-6   \
2048   \
2048     \
85      \
bf16   \
1      \
1      \
sel    \
true   \
false  \
false  \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json   \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json   \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
2000   \
10 \
${WORK_DIR}/output_megatron_qwen/

The parameters for running run_finetune_megatron_qwen_withGA.sh are the same as those for fine-tuning the model on a single DSW instance.

Step 4: Perform offline inference

After the model training is complete, you can use the Megatron inference pipeline to perform offline inference and evaluate the model's performance. Follow these steps:

Download the test sample pred_input.jsonl and upload it to the /mnt/workspace directory in DSW. For more information, see Upload and download files.

Note
The data structure for inference must be consistent with the data structure used for fine-tuning.
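Because the inference input must follow the same structure as the fine-tuning data, a quick check before you run inference can prevent a failed run. The following sketch, with a hypothetical helper name, reads the field names from the first fine-tuning record and reports any line of pred_input.jsonl that contains other fields. It assumes that the training file is a JSON array of objects and that the inference file holds one JSON object per line; the exact field names are not specified in this guide.

```python
import json


def check_inference_input(train_path: str, pred_path: str) -> list:
    """Return (line_number, unexpected_keys) for lines whose fields do not match."""
    with open(train_path, encoding="utf-8") as f:
        train_records = json.load(f)  # assumed: a JSON array of objects
    allowed = set(train_records[0])
    problems = []
    with open(pred_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Ignore blank lines.
            extra = set(json.loads(line)) - allowed
            if extra:
                problems.append((lineno, sorted(extra)))
    return problems
```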
Copy all JSON files and the tokenizer.model file from the pre-trained model path to the path of the generated model. The generated model path is the subdirectory of ${OUTPUT_BASEPATH}/checkpoint that contains the latest_checkpointed_iteration.txt file.

Note
Replace the paths in the command with your actual paths.
```
cd /mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1
cp *.json /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
cp tokenizer.model /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
```
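Because the checkpoint folder name (dswXXX in the commands above is a placeholder) differs from run to run, you may prefer to locate it programmatically. The following sketch, with a hypothetical helper name, finds each immediate subdirectory of ${OUTPUT_BASEPATH}/checkpoint that contains latest_checkpointed_iteration.txt and copies the *.json files and tokenizer.model from the converted model folder into it. It only automates the cp commands above.

```python
import shutil
from pathlib import Path


def copy_tokenizer_files(src_ckpt: str, output_basepath: str) -> list:
    """Copy *.json and tokenizer.model into each trained checkpoint directory."""
    src = Path(src_ckpt)
    files = sorted(src.glob("*.json"))
    if (src / "tokenizer.model").exists():  # Only copied if present.
        files.append(src / "tokenizer.model")
    checkpoint_root = Path(output_basepath, "checkpoint")
    targets = sorted(
        marker.parent
        for marker in checkpoint_root.glob("*/latest_checkpointed_iteration.txt")
    )
    for target in targets:
        for f in files:
            shutil.copy2(f, target / f.name)
    return targets


# Example, using the paths from this guide:
# copy_tokenizer_files("/mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1",
#                      "/mnt/workspace/output_megatron_qwen")
```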

In the Terminal, run the following command to perform offline inference. The inference results are saved to the /mnt/workspace/qwen_pred.txt file. You can evaluate the model's performance based on the results.

Note

Before you run the command, set the CUDA_VISIBLE_DEVICES variable to 0 and the GPUS_PER_NODE variable to 1 in the run_text_generation_megatron_qwen.sh script.

```
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
bash run_text_generation_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
/mnt/workspace/output_megatron_qwen/checkpoint/dswXXX \
7B \
1 \
1 \
1024 \
1024 \
85 \
fp16 \
10 \
512 \
512 \
${WORK_DIR}/pred_input.jsonl \
${WORK_DIR}/qwen_pred.txt \
0 \
1.0 \
1.2
```
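The last arguments of the command (TOP_K=10, TOP_P=0, temperature 1.0, and repetition penalty 1.2) control how the next token is sampled. The following pure-Python sketch shows the common formulation of these four controls: a CTRL-style repetition penalty, temperature scaling, and then either top-k or top-p filtering, similar to the logits processors in HuggingFace transformers. It illustrates the idea only and is not the Megatron implementation that the script runs. The function name is hypothetical.

```python
import math


def sample_distribution(logits, generated_ids, top_k=0, top_p=0.0,
                        temperature=1.0, repetition_penalty=1.0):
    """Return next-token probabilities after the usual sampling transforms.

    logits: one float per vocabulary token.
    generated_ids: token ids produced so far (used by the repetition penalty).
    """
    if top_k and top_p:
        raise ValueError("set only one of top_k and top_p")
    scores = list(logits)
    # Repetition penalty: push already generated tokens toward lower scores.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values above 1 flatten the distribution.
    scores = [s / temperature for s in scores]
    # Softmax, subtracting the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k:
        keep = set(order[:top_k])  # the k most likely tokens
    elif top_p:
        keep, cum = set(), 0.0
        for i in order:  # smallest set whose cumulative probability reaches top_p
            keep.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
    else:
        keep = set(order)
    # Renormalize over the kept tokens; all others get probability 0.
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(len(probs))]
```

Because top-k and top-p both truncate the same ranked list, enabling both at once would make the result depend on the order in which they are applied, which is one reason to set only one of them to a non-zero value.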

The following table describes the start parameters for the run_text_generation_megatron_qwen.sh script.

Parameter	Description
ENV=$1	The running environment: dlc or dsw.
MEGATRON_PATCH_PATH=$2	The path of the Megatron Patch code.
CHECKPOINT_PATH=$3	The path where the model was saved during the training phase. Important Replace this with your model path.
MODEL_SIZE=$4	The model size: 7B, 14B, or 72B.
TP=$5	The tensor parallelism degree. Important If this parameter is set to 1, you can use a single GPU for inference. If this parameter is set to a value greater than 1, you must use the corresponding number of GPUs for inference.
BS=$6	The number of samples for one iteration of inference on each GPU: 1, 4, or 8.
SEQ_LEN=$7	The sequence length: 256, 512, or 1024.
PAD_LEN=$8	The padding length: the length to which the input text is padded.
EXTRA_VOCAB_SIZE=${9}	The number of tokens added during model conversion: Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213.
PR=${10}	The precision used for inference: fp16 or bf16.
TOP_K=${11}	The number of highest-probability vocabulary tokens to keep for top-k filtering (0 to n): 0, 5, 10, or 20.
INPUT_SEQ_LEN=${12}	The input sequence length: 512.
OUTPUT_SEQ_LEN=${13}	The output sequence length: 256.
INPUT_FILE=${14}	The text file for inference: pred_input.jsonl. Each line is a sample.
OUTPUT_FILE=${15}	The output file for inference: qwen_pred.txt.
TOP_P=${16}	The cumulative probability for top-p filtering (0 to 1): 0, 0.85, or 0.95. Note Only one of TOP_K and TOP_P can be non-zero.
TEMPERATURE=${17}	The temperature penalty in the sampling policy: 1 to n.
REPETITION_PENALTY=${18}	The penalty for repetition. The value can be from 1 to 2. The default value is 1.2.
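Because the script takes 18 positional arguments, it is easy to shift one by mistake. The following minimal Python sketch assembles them in the order given in the table and enforces the constraints the table states: the environment is dlc or dsw, at most one of TOP_K and TOP_P is non-zero, and the repetition penalty is between 1 and 2. The function name is hypothetical, and the EXTRA_VOCAB_SIZE mapping is copied from the table row above.

```python
import shlex

# Tokens added during model conversion, as listed in the EXTRA_VOCAB_SIZE row.
EXTRA_VOCAB_SIZE = {"7B": 85, "14B": 213, "72B": 213}


def text_generation_args(env, patch_path, ckpt_path, model_size, tp, bs,
                         seq_len, pad_len, pr, top_k, input_seq_len,
                         output_seq_len, input_file, output_file, top_p,
                         temperature, repetition_penalty):
    """Return the positional arguments $1..$18 of run_text_generation_megatron_qwen.sh."""
    if env not in ("dlc", "dsw"):
        raise ValueError("env must be 'dlc' or 'dsw'")
    if top_k != 0 and top_p != 0:
        raise ValueError("only one of TOP_K and TOP_P can be non-zero")
    if not 1 <= repetition_penalty <= 2:
        raise ValueError("REPETITION_PENALTY must be between 1 and 2")
    return [env, patch_path, ckpt_path, model_size, tp, bs, seq_len, pad_len,
            EXTRA_VOCAB_SIZE[model_size], pr, top_k, input_seq_len,
            output_seq_len, input_file, output_file, top_p, temperature,
            repetition_penalty]


if __name__ == "__main__":
    args = text_generation_args(
        "dsw", "/mnt/workspace/Pai-Megatron-Patch",
        "/mnt/workspace/output_megatron_qwen/checkpoint/dswXXX",
        "7B", 1, 1, 1024, 1024, "fp16", 10, 512, 512,
        "/mnt/workspace/pred_input.jsonl", "/mnt/workspace/qwen_pred.txt",
        0, 1.0, 1.2)
    # Print a shell command with the arguments in the documented order.
    print(shlex.join(["bash", "run_text_generation_megatron_qwen.sh", *map(str, args)]))
```

Passing values by name and letting the function place them avoids the most common failure mode, which is a command that runs but silently treats one parameter as another.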

Step 5: Convert the model format

After you complete offline inference, if the model performance meets your expectations, you can convert the trained Megatron model to the HuggingFace format. You can then use the converted HuggingFace model for online service deployment. Follow these steps:

In the Terminal, run the following command to convert the trained Megatron model to the HuggingFace format.

```
export WORK_DIR=/mnt/workspace
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main \
${WORK_DIR}/output_megatron_qwen/checkpoint/${path}/iter_******* \
/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/ \
1 \
1 \
qwen-7b \
0 \
true
```

The following table describes the parameters for running the model_convertor.sh script.

Parameter	Description
MEGATRON_PATH=$1	The path of the open source Megatron code.
SOURCE_CKPT_PATH=$2	The path of the Megatron model obtained from training. The path must point to a specific `iter_` directory, for example: `${WORK_DIR}/output_megatron_qwen/checkpoint/dsw-pretrain-megatron-qwen-7B-lr-1e-5-bs-1-seqlen-2048-pr-bf16-tp-1-pp-1-ac-sel-do-true-sp-false-tt--wt-/iter_****`. Important Replace this with your model path. If you use a pre-trained model for conversion, delete all distrib_optim.pt files from the model path.
TARGET_CKPT_PATH=$3	The path to save the converted HuggingFace model.
TP=$4	The number of tensor parallelism shards. This must be the same as the number used for training.
PP=$5	The number of pipeline parallelism shards. This must be the same as the number used for training.
MN=$6	The model name: qwen-7b, qwen-14b, or qwen-72b.
EXTRA_VOCAB_SIZE=$7	The extra vocabulary size.
mg2hf=$8	Specifies whether to convert a Megatron model to a HuggingFace model. Set this to true for Megatron-to-HuggingFace conversion.
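SOURCE_CKPT_PATH must name a concrete `iter_` directory. Megatron records the last saved iteration in latest_checkpointed_iteration.txt and names the checkpoint directory `iter_` followed by the iteration number zero-padded to seven digits (or `release`). The following minimal Python sketch assumes that standard Megatron layout. The helper names are hypothetical, and the deletion helper applies only to the pre-trained-model case described in the SOURCE_CKPT_PATH row.

```python
import glob
import os


def latest_iter_dir(save_dir: str) -> str:
    """Return the checkpoint directory recorded in latest_checkpointed_iteration.txt."""
    with open(os.path.join(save_dir, "latest_checkpointed_iteration.txt")) as f:
        iteration = f.read().strip()
    if iteration == "release":
        path = os.path.join(save_dir, "release")
    else:
        # Megatron pads the iteration to seven digits, e.g. iter_0002000.
        path = os.path.join(save_dir, f"iter_{int(iteration):07d}")
    if not os.path.isdir(path):
        raise FileNotFoundError(path)
    return path


def remove_distrib_optim(iter_dir: str) -> list[str]:
    """Delete every distrib_optim.pt file under iter_dir and return the deleted paths.

    Use only when converting a pre-trained model, as the table notes.
    """
    removed = sorted(glob.glob(os.path.join(iter_dir, "**", "distrib_optim.pt"),
                               recursive=True))
    for p in removed:
        os.remove(p)
    return removed
```

For example, `latest_iter_dir("/mnt/workspace/output_megatron_qwen/checkpoint/dswXXX")` returns the path to pass as SOURCE_CKPT_PATH, with `dswXXX` replaced by your run directory.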

Copy the .json, .py, and .tiktoken files from the open source HuggingFace model folder /mnt/workspace/qwen-ckpts/qwen-7b-hf to the /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory so that the converted model can be loaded.

Important
You do not need to copy the pytorch_model.bin.index.json file.
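The copy rule for this step can be expressed as a filter on file names. The following minimal Python sketch assumes that both directories already exist. The helper names are hypothetical.

```python
import os
import shutil

COPY_SUFFIXES = (".json", ".py", ".tiktoken")
SKIP_NAMES = {"pytorch_model.bin.index.json"}


def files_to_copy(hf_dir: str) -> list[str]:
    """Names of the .json, .py, and .tiktoken files in hf_dir, minus the skipped index file."""
    return sorted(
        name for name in os.listdir(hf_dir)
        if name.endswith(COPY_SUFFIXES)
        and name not in SKIP_NAMES
        and os.path.isfile(os.path.join(hf_dir, name))
    )


def copy_hf_support_files(hf_dir: str, target_dir: str) -> list[str]:
    """Copy the selected files into target_dir and return their names."""
    names = files_to_copy(hf_dir)
    for name in names:
        shutil.copy2(os.path.join(hf_dir, name), os.path.join(target_dir, name))
    return names


if __name__ == "__main__":
    copy_hf_support_files("/mnt/workspace/qwen-ckpts/qwen-7b-hf",
                          "/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1")
```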

Perform offline inference on the HuggingFace model

You can use the HuggingFace & DeepSpeed inference pipeline to perform offline inference on the converted HuggingFace model files. Using the Qwen-7B model as an example, create an infer.py file with the following content in any directory, and then run it in the Terminal. Evaluate the model's performance based on the output.

```
#!/usr/bin/env python
# encoding=utf-8
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory that holds the converted HuggingFace model files.
checkpoint = '/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1'
print(checkpoint)

# Qwen ships its own model and tokenizer code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
# device_map="auto" (requires the accelerate package) places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

prompt = 'Write a quicksort algorithm'
p = f"Human:{prompt}"
print(p)
# Tokenize the prompt and move it to the device that holds the model.
inputs = tokenizer.encode(p, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))
```

Replace checkpoint with the path where the converted HuggingFace model files are stored. This guide uses /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1.

Step 6: Deploy and call the model service

After you complete offline inference and evaluate the model's performance, you can deploy the converted HuggingFace model as an online service and call it in a production environment for inference. Follow these steps:

Deploy the model service

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

On the Custom Deployment page, configure the following key parameters and keep the default values for other parameters.

Parameter		Description
Basic Information	Service Name	Enter a custom model service name that is unique within the region. This guide uses test_qwen.
Environment Information	Deployment Method	Select Image-based Deployment and select Enable Web App.
	Image Configuration	Select Image Address and enter the following registry address in the text box: `eas-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-vllm`.
	Directly Mount	Select NAS as the mount type. Click Standard NAS and configure the following parameters: Select File System: Select the NAS file system that you used to create the dataset. Mount Target: Select the mount target that you used to create the dataset. File System Path: Set this to the path of the converted HuggingFace model stored in NAS. This guide uses `/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1`. Mount Path: Specify the path after mounting. This guide uses `/qwen-7b`.
	Command	Set this to `python webui/webui_server.py --port=8000 --model-path=/qwen-7b --tensor-parallel-size 1 --backend=vllm`, where --model-path must match the Mount Path in the Directly Mount configuration, and --tensor-parallel-size is the number of tensor parallelism shards for the model, which must be adjusted based on the number of GPUs. Set it to 1 for a 7B model and 8 for a 72B model (requires an eight-GPU instance).
	Port Number	Set this to 8000.
Resource Information	Resource Type	Select Resource Quota.
	Resource Quota	Select the resource quota that you created for Lingjun resources.
	Instance Count	Configure this based on the model and the selected resources. For a 7B model, set Instance Count to 1.
	Deployment	For a 7B model, configure the resources used by each instance as follows: vCPUs: 16. Memory (GB): 64. GPUs: 1.
Service Access	VPC	After you configure the NAS mount target, the system automatically matches the VPC, vSwitch, and security group with the preset NAS file system.
	vSwitch
	Security Group Name
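The Command field must stay consistent with the mount path, the port number, and the GPUs per instance. The following minimal Python sketch builds the command string from those values. The function name is hypothetical, and its only check (the tensor parallel size must not exceed the GPUs per instance) follows from the note that this value depends on the number of GPUs.

```python
import shlex


def eas_webui_command(mount_path: str, tensor_parallel_size: int,
                      gpus_per_instance: int, port: int = 8000) -> str:
    """Build the Command field for the chat-llm-webui image."""
    if tensor_parallel_size > gpus_per_instance:
        raise ValueError("--tensor-parallel-size cannot exceed the GPUs per instance")
    return shlex.join([
        "python", "webui/webui_server.py",
        f"--port={port}",                 # must equal the Port Number field
        f"--model-path={mount_path}",     # must equal the NAS Mount Path
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--backend=vllm",
    ])
```

For the 7B example in this guide, `eas_webui_command("/qwen-7b", 1, 1)` produces the same command string as the Command row.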

Click Deploy.

When the Service Status changes to Running, the service is deployed.

Call the service

After the service is deployed, you can call it for inference. Follow these steps:

In the service list, click the name of the target service. In the upper-right corner of the page, click View Web App.
On the WebUI page, perform model inference.