This guide helps large language model (LLM) developers get started with the Lingjun Intelligent Computing platform. You will learn the complete development workflow for Qwen LLMs, such as Qwen-7B, Qwen-14B, and Qwen-72B. The workflow includes efficient distributed training, three-stage instruction tuning, offline model inference, and online service deployment. This topic uses the Qwen-7B model to demonstrate the process.
Prerequisites
This guide uses the Qwen-7B v1.1.4 model as an example. Before you begin, complete the following preparations:
- Activate Platform for AI (PAI), including Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS), and create a default workspace. For more information, see Activate PAI and create a default workspace.
- Purchase Lingjun resources and create a resource quota. The following table lists the supported resource specifications for different model sizes. Select the appropriate resources based on the model size you use. For more information about the node specifications of Lingjun resources, see Lingjun Serverless pricing details. For more information about how to purchase resources and create a quota, see Create a resource group and purchase Lingjun resources and Create a resource quota.

| Model size | Full-parameter training resources | Inference resources (minimum) | Megatron model parallelism (TP/PP) |
| --- | --- | --- | --- |
| 7B | 8 × gu7xf GPUs, 8 × gu7ef GPUs | 1 × V100 (32 GB VRAM), 1 × A10 (24 GB VRAM) | TP: 1, PP: 1 |
| 14B | 8 × gu7xf GPUs, 8 × gu7ef GPUs | 2 × V100 (32 GB VRAM), 2 × A10 (24 GB VRAM) | TP: 2, PP: 1 |
| 72B | 4 × 8 gu7xf GPUs, 4 × 8 gu7ef GPUs | 6 × V100 (32 GB VRAM), 2 × gu7xf GPUs | TP: 8, PP: 2 |

- Create a General-purpose NAS file system dataset to store training files and results. Set the default mount path to /mnt/data/nas. For more information, see Create and manage datasets.
- Create a DSW instance with the following key parameter settings. For more information, see Create a DSW instance.
  - Resource Quota: Select the resource quota that you created for Lingjun resources.
  - Instance Type: Configure the following resource specifications.
    - vCPUs: 90.
    - Memory (GiB): 1024.
    - Shared Memory (GiB): 1024.
    - GPUs: At least 8.
  - Dataset Mounting: Click Custom Dataset, select the dataset that you created, and use the default mount path.
  - Image Configuration: On the Image Address tab, set the runtime image to pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.
- If you use a Resource Access Management (RAM) user to perform these operations, grant the RAM user the required permissions for DSW, DLC, or EAS. For more information, see Cloud product dependencies and permissions: DSW, Cloud product dependencies and permissions: DLC, or Cloud product dependencies and permissions: EAS.
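The TP and PP values in the resource table determine how many GPUs hold one full copy of the model. The following Python sketch restates that table as a lookup. The function name model_parallel_gpus is hypothetical and used only for illustration; it is not part of PAI or Megatron.

```python
# TP/PP values copied from the resource table in the prerequisites.
PARALLELISM = {
    "7B": {"tp": 1, "pp": 1},
    "14B": {"tp": 2, "pp": 1},
    "72B": {"tp": 8, "pp": 2},
}


def model_parallel_gpus(model_size: str) -> int:
    """Return the number of GPUs that hold one model replica (TP x PP)."""
    cfg = PARALLELISM[model_size]
    return cfg["tp"] * cfg["pp"]


if __name__ == "__main__":
    for size in PARALLELISM:
        print(size, "needs", model_parallel_gpus(size), "GPU(s) per model replica")
```

Any GPUs beyond this count are used for data parallelism, which is why the total GPU count should be a multiple of TP × PP.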
Limits
This best practice is available only in the China (Ulanqab) region.
Step 1: Prepare the Qwen model
This topic describes how to download the model from the ModelScope community or the HuggingFace community. Follow these steps:
- Go to the PAI DSW development environment.
  - Log on to the PAI console.
  - In the upper-left corner of the page, select the region where you want to use the service: China (Ulanqab).
  - In the navigation pane on the left, click Workspaces. On the page that appears, click the name of the workspace that you want to manage.
  - In the navigation pane on the left, choose .
  - In the Actions column of the target instance, click Open.
- In the top menu bar, click Terminal. On the tab that appears, click create a terminal.
- Download the Qwen model.
Download the model from the ModelScope community
- In the Terminal, run the following command to install ModelScope.
pip install modelscope
- Run the following command to enter the Python environment.
python
- The following code shows how to download the model files for the Qwen-7B model.
# ### Loading Model and Tokenizer
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('qwen/Qwen-7B', 'v1.1.4')
# model_dir = snapshot_download('qwen/Qwen-14B', 'v1.0.4')
# model_dir = snapshot_download('qwen/Qwen-72B')
# Get the download path.
print(model_dir)
# /root/.cache/modelscope/hub/qwen/Qwen-7B
- Press Ctrl+D to exit the Python environment.
- Run the following command to move the downloaded Qwen model to the corresponding folder.
# mkdir -p /mnt/workspace/qwen-ckpts/${ckpt_folder_with_hf_suffix}
mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
# cp -r ${path_to_downloaded_model}/* /mnt/workspace/qwen-ckpts/${ckpt_folder_with_hf_suffix}
cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf
Download the model from the HuggingFace community
In the DSW Terminal, run the following command to download the model files. This guide uses the Qwen-7B model as an example. To download the Qwen-14B or Qwen-72B model files, modify the following code as needed.
mkdir /mnt/workspace/qwen-ckpts
cd /mnt/workspace/qwen-ckpts
git clone https://huggingface.co/Qwen/Qwen-7B
# git clone https://huggingface.co/Qwen/Qwen-7B-Chat
# git clone https://huggingface.co/Qwen/Qwen-14B
# git clone https://huggingface.co/Qwen/Qwen-14B-Chat
# git clone https://huggingface.co/Qwen/Qwen-72B
# git clone https://huggingface.co/Qwen/Qwen-72B-Chat
Step 2: Prepare pre-training data
We recommend that you prepare the pre-training data in the DSW instance. This guide uses the WuDaoCorpora2.0 dataset, which is for research purposes only, as an example to show the data pre-processing workflow for Megatron training. You can download the small-scale sample data prepared by PAI or prepare the pre-training data yourself by following these steps.
Use the small-scale sample data processed by PAI
To help you try this solution, PAI provides processed small-scale sample data. You can run the following command in the DSW Terminal to download the sample data.
mkdir -p /mnt/workspace/qwen-datasets/
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
mkdir -p /mnt/workspace/qwen-datasets/wudao
cd /mnt/workspace/qwen-datasets/wudao
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.idx
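The training commands later in this guide pass the dataset as a path prefix without a file extension, so both the .bin and the .idx file must exist next to each other. The following Python sketch checks for that pair before you start a long training run. The helper name missing_mmap_files is hypothetical and used only for illustration.

```python
from pathlib import Path
from typing import List


def missing_mmap_files(prefix: str) -> List[str]:
    """Return the expected dataset files that do not exist for the given prefix."""
    expected = [prefix + ".bin", prefix + ".idx"]
    return [path for path in expected if not Path(path).is_file()]


if __name__ == "__main__":
    # Prefix used by the training commands in this guide.
    prefix = "/mnt/workspace/qwen-datasets/wudao/wudao_qwenbpe_content_document"
    missing = missing_mmap_files(prefix)
    print("missing files:", missing if missing else "none")
```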
Process the data yourself
- Download the open source WuDaoCorpora2.0 dataset to the /mnt/workspace/qwen-datasets working directory. In this guide, the decompressed folder is named wudao_200g. PAI provides sample data for demonstration. You can run the following command in the DSW Terminal to download and decompress the dataset.
mkdir /mnt/workspace/qwen-datasets
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
tar zxvf WuDaoCorpus2.0_base_sample.tgz
mv WuDaoCorpus2.0_base_sample wudao_200g
- In the Terminal, run the following command to perform data cleansing and file format conversion on the WuDao data. This generates a merged merged_wudao_cleaned.json file.
#!/bin/bash
set -ex
# Set the path where the raw data is stored.
data_dir=/mnt/workspace/qwen-datasets/wudao_200g
# Start the data cleansing process.
dataset_dir=$(dirname $data_dir)
mkdir -p ${dataset_dir}/cleaned_wudao_dataset
cd ${dataset_dir}/cleaned_wudao_dataset
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/llama2-codes/preprocess_wudao2.py
# Unlike the previous section, the key parameter is added and set to text.
python preprocess_wudao2.py -i ${data_dir} -o ${dataset_dir}/cleaned_wudao_dataset -k text -p 32
# Merge the cleansed data.
mkdir ${dataset_dir}/wudao
cd ${dataset_dir}/wudao
find ${dataset_dir}/cleaned_wudao_dataset -name "*.json" -exec cat {} + > ${dataset_dir}/wudao/merged_wudao_cleaned.json
rm -rf ${dataset_dir}/cleaned_wudao_dataset
After the command is run, the file structure of the qwen-datasets directory is as follows. A new wudao folder is added.
qwen-datasets
├── wudao_200g
└── wudao
    └── merged_wudao_cleaned.json
- In the Terminal, run the following command to split the merged_wudao_cleaned.json file into multiple groups and compress them. This facilitates multithreading in subsequent steps.
apt-get update
apt-get install zstd
# Set the dataset directory again in case this step runs in a new shell.
dataset_dir=/mnt/workspace/qwen-datasets
# The number of chunks is set to 10. You can increase this value if data processing is slow.
NUM_PIECE=10
# Process the merged_wudao_cleaned.json file.
mkdir -p ${dataset_dir}/cleaned_zst/
# Query the total data length and split the data.
NUM=$(sed -n '$=' ${dataset_dir}/wudao/merged_wudao_cleaned.json)
echo "total line of dataset is $NUM, data will be split into $NUM_PIECE pieces for processing"
NUM=`expr $NUM / $NUM_PIECE`
echo "each group is processing $NUM sample"
split_dir=${dataset_dir}/split
mkdir -p $split_dir
split -l $NUM --numeric-suffixes --additional-suffix=.jsonl ${dataset_dir}/wudao/merged_wudao_cleaned.json $split_dir/
# Compress the data.
o_path=${dataset_dir}/cleaned_zst/
mkdir -p $o_path
files=$(ls $split_dir/*.jsonl)
for filename in $files
do
  f=$(basename $filename)
  zstd -z $filename -o $o_path/$f.zst &
done
# Wait for the background compression jobs to finish before deleting their input files.
wait
rm -rf $split_dir
rm ${dataset_dir}/wudao/merged_wudao_cleaned.json
After the command is run, the file structure of the qwen-datasets directory is as follows. A new cleaned_zst folder is added, which contains 10 compressed files.
qwen-datasets
├── wudao_200g
├── wudao
└── cleaned_zst
    ├── 00.jsonl.zst
    │   ...
    └── 09.jsonl.zst
- Create an MMAP-format pre-training dataset.
MMAP is a pre-tokenized data format that reduces the time spent waiting for data to be read during training and fine-tuning. This is especially advantageous when processing large-scale data. Follow these steps:
- In the DSW Terminal, run the following command to copy the PAI-Megatron-Patch source code, which is a model training tool for Megatron, to the DSW working directory /mnt/workspace/.
cd /mnt/workspace/
# Method 1: Obtain the training code from the open source website.
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
# Method 2: Obtain the training code using wget. You need to run tar zxvf Pai-Megatron-Patch.tgz to decompress the file.
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/models/Pai-Megatron-Patch.tgz
- In the Terminal, run the following command to convert the data to the MMAP format.
# Install the tokenizer library required by Qwen.
pip install tiktoken
# Set the dataset path and working directory.
export dataset_dir=/mnt/workspace/qwen-datasets
export WORK_DIR=/mnt/workspace
# Generate MMAP-format pre-training datasets for the training and validation sets.
cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
bash run_make_pretraining_dataset.sh \
../../Megatron-LM-23.04 \
${WORK_DIR}/Pai-Megatron-Patch/ \
${dataset_dir}/cleaned_zst/ \
qwenbpe \
${dataset_dir}/wudao/ \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf
rm -rf ${dataset_dir}/cleaned_zst
After the command is run, .bin and .idx files are generated in the /mnt/workspace/qwen-datasets/wudao directory.
The following table describes the six start parameters for running run_make_pretraining_dataset.sh.

| Parameter | Description |
| --- | --- |
| MEGATRON_PATH=$1 | The path of the open source Megatron code. |
| MEGATRON_PATCH_PATH=$2 | The path of the Megatron Patch code. |
| input_data_dir=$3 | The path of the folder that contains the packaged WuDao dataset. |
| tokenizer=$4 | The type of the tokenizer. Set this to qwenbpe. |
| output_data_dir=$5 | The path to save the output .bin and .idx files. |
| load_dir=$6 | The path of the generated tokenizer_config.json file. |

After the script is run, the file structure of the qwen-datasets directory is as follows.
qwen-datasets
├── wudao_200g
└── wudao
    ├── wudao_qwenbpe_content_document.bin
    └── wudao_qwenbpe_content_document.idx
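To illustrate why a pre-tokenized, memory-mapped format reduces read time, the following Python sketch stores token IDs in a flat binary file and document offsets in a separate index, then reads a single document without loading the whole file. This is a simplified illustration only: the layout (32-bit tokens, 64-bit offsets) and the function names are assumptions of this sketch, not the actual on-disk format produced by run_make_pretraining_dataset.sh.

```python
import mmap
import os
import struct
import tempfile

TOKEN_BYTES = 4  # assumption of this sketch: each token ID is an unsigned 32-bit integer


def write_dataset(prefix, documents):
    """Write token IDs to <prefix>.bin and cumulative token offsets to <prefix>.idx."""
    offsets = [0]
    with open(prefix + ".bin", "wb") as bin_file:
        for doc in documents:
            bin_file.write(struct.pack("<%dI" % len(doc), *doc))
            offsets.append(offsets[-1] + len(doc))
    with open(prefix + ".idx", "wb") as idx_file:
        idx_file.write(struct.pack("<%dQ" % len(offsets), *offsets))


def read_document(prefix, doc_id):
    """Read one document's token IDs by memory-mapping the .bin file."""
    with open(prefix + ".idx", "rb") as idx_file:
        raw = idx_file.read()
    offsets = struct.unpack("<%dQ" % (len(raw) // 8), raw)
    start, end = offsets[doc_id], offsets[doc_id + 1]
    with open(prefix + ".bin", "rb") as bin_file:
        with mmap.mmap(bin_file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the requested byte range is copied out of the mapping.
            chunk = mapped[start * TOKEN_BYTES:end * TOKEN_BYTES]
    return list(struct.unpack("<%dI" % (end - start), chunk))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "toy_document")
        write_dataset(prefix, [[1, 2, 3], [40, 50], [6]])
        print(read_document(prefix, 1))
```

Because tokenization happens once during preprocessing, the training loop only performs offset lookups and byte-range reads like the one in read_document.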
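The chunking step in this procedure computes the lines per chunk with integer division and then calls split -l. The following Python sketch mirrors that arithmetic so you can predict how many chunk files to expect. The function name split_line_counts is hypothetical; the sketch models the behavior of split -l and does not call it.

```python
def split_line_counts(total_lines: int, num_pieces: int) -> list:
    """Return the line count of each file that `split -l` would produce."""
    # Same integer division as `expr $NUM / $NUM_PIECE` in the shell script.
    lines_per_file = total_lines // num_pieces
    if lines_per_file == 0:
        raise ValueError("fewer lines than pieces: the per-file line count would be 0")
    counts = [lines_per_file] * (total_lines // lines_per_file)
    remainder = total_lines % lines_per_file
    if remainder:
        # Leftover lines go into one additional, shorter file.
        counts.append(remainder)
    return counts


if __name__ == "__main__":
    print(len(split_line_counts(100, 10)), "files for 100 lines")
    print(len(split_line_counts(105, 10)), "files for 105 lines")
```

When the total line count is not an exact multiple of NUM_PIECE, the leftover lines form an extra file, so the cleaned_zst folder can contain more files than NUM_PIECE.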
Step 3: Train the model with Megatron
You can train the model with Megatron by following these steps.
Convert the model format
Convert the HuggingFace model files to the Megatron format.
Download the converted Megatron model
To help you try this solution, PAI provides a model with the format already converted. You can run the following command in the Terminal to download the model.
cd /mnt/workspace/
mkdir -p qwen-ckpts
cd qwen-ckpts
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-ckpts/qwen-7b-hf-to-mg-tp1-pp1.tgz
tar -zxf qwen-7b-hf-to-mg-tp1-pp1.tgz
mv qwen-7b-hf-to-mg-tp1-pp1 qwen-7b-hf-to-megatron-tp1-pp1
Convert the HuggingFace model to the Megatron format
In the Terminal, run the following command to use the model conversion tool provided by PAI to convert the HuggingFace model files to the Megatron format:
# Convert the model.
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main \
/mnt/workspace/qwen-ckpts/qwen-7b-hf \
/mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
1 \
1 \
qwen-7b \
0 \
false
The following table describes the parameters for running model_convertor.sh.
| Parameter | Description |
| --- | --- |
| MEGATRON_PATH=$1 | The path of the open source Megatron code. |
| SOURCE_CKPT_PATH=$2 | The path of the HuggingFace model files. |
| TARGET_CKPT_PATH=$3 | The path to save the converted Megatron model. |
| TP=$4 | The number of tensor parallelism shards. This must be the same as the number used for training. The number of shards varies based on the model size, as listed in the resource table in Prerequisites: Qwen-7B: 1, Qwen-14B: 2, Qwen-72B: 8. |
| PP=$5 | The number of pipeline parallelism shards. This must be the same as the number used for training. The number of shards varies based on the model size, as listed in the resource table in Prerequisites: Qwen-7B: 1, Qwen-14B: 1, Qwen-72B: 2. |
| MN=$6 | The model name: qwen-7b, qwen-14b, or qwen-72b. |
| EXTRA_VOCAB_SIZE=$7 | The extra vocabulary size. |
| mg2hf=$8 | Specifies whether to convert a Megatron model to a HuggingFace model. Set this to false to convert from HuggingFace to Megatron, as in this step. |
Pre-train the model
You can pre-train the model in a single DSW instance or submit a distributed training task with multi-GPU servers in the DLC environment. The training process takes about two hours. After the task succeeds, the model files are saved to the /mnt/workspace/output_megatron_qwen/ directory.
Pre-train the model on a single DSW instance
The following code shows an example of how to run the command for the Qwen-7B model in the Terminal:
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
8 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
fp16 \
1 \
1 \
sel \
true \
false \
false \
false \
100000 \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
100000000 \
10000 \
${WORK_DIR}/output_megatron_qwen/
The following table describes the parameters for running run_pretrain_megatron_qwen.sh.
| Parameter | Description |
| --- | --- |
| ENV=$1 | The running environment: dlc or dsw. |
| MEGATRON_PATH=$2 | The path of the open source Megatron code. |
| MODEL_SIZE=$3 | The model size: 7B, 14B, or 72B. |
| BATCH_SIZE=$4 | The number of samples for one iteration of training on each GPU: 4 or 8. |
| GLOBAL_BATCH_SIZE=$5 | The total number of samples for one iteration of training across all GPUs. |
| LR=$6 | The learning rate: 1e-5 or 5e-5. |
| MIN_LR=$7 | The minimum learning rate: 1e-6 or 5e-6. |
| SEQ_LEN=$8 | The sequence length. |
| PAD_LEN=${9} | The padding length. |
| EXTRA_VOCAB_SIZE=${10} | The extra vocabulary size: 85 for Qwen-7B, 213 for Qwen-14B, and 213 for Qwen-72B. |
| PR=${11} | The training precision: fp16 or bf16. |
| TP=${12} | The degree of tensor parallelism. |
| PP=${13} | The degree of pipeline parallelism. |
| AC=${14} | The activation checkpointing mode: full or sel. |
| DO=${15} | Specifies whether to use the Megatron version of the Zero-1 optimizer to reduce VRAM usage: true or false. |
| FL=${16} | Specifies whether to enable Flash Attention: true or false. |
| SP=${17} | Specifies whether to use sequence parallelism: true or false. |
| TE=${18} | Specifies whether to enable Transformer-engine acceleration. This feature requires gu8xf GPUs. |
| SAVE_INTERVAL=${19} | The interval for saving checkpoint files. |
| DATASET_PATH=${20} | The path of the training dataset. |
| PRETRAIN_CHECKPOINT_PATH=${21} | The path of the pre-trained model. |
| TRAIN_TOKENS=${22} | The number of training tokens. |
| WARMUP_TOKENS=${23} | The number of warmup tokens. |
| OUTPUT_BASEPATH=${24} | The path to save the output model files. |
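Because run_pretrain_megatron_qwen.sh takes 24 positional arguments, a misplaced value is easy to miss. The following Python sketch assembles the argument list from named values in the order documented in the parameter table and rejects missing or unknown names. The function name build_pretrain_command is hypothetical and used only for illustration; the sketch only builds the command and does not run it.

```python
import shlex

# Parameter names in the order documented in the parameter table.
PRETRAIN_PARAM_ORDER = [
    "ENV", "MEGATRON_PATH", "MODEL_SIZE", "BATCH_SIZE", "GLOBAL_BATCH_SIZE",
    "LR", "MIN_LR", "SEQ_LEN", "PAD_LEN", "EXTRA_VOCAB_SIZE", "PR", "TP", "PP",
    "AC", "DO", "FL", "SP", "TE", "SAVE_INTERVAL", "DATASET_PATH",
    "PRETRAIN_CHECKPOINT_PATH", "TRAIN_TOKENS", "WARMUP_TOKENS", "OUTPUT_BASEPATH",
]


def build_pretrain_command(values: dict) -> list:
    """Return the script invocation as a list, with arguments in documented order."""
    missing = [name for name in PRETRAIN_PARAM_ORDER if name not in values]
    unknown = [name for name in values if name not in PRETRAIN_PARAM_ORDER]
    if missing or unknown:
        raise ValueError("missing: %s, unknown: %s" % (missing, unknown))
    return ["sh", "run_pretrain_megatron_qwen.sh"] + [
        str(values[name]) for name in PRETRAIN_PARAM_ORDER
    ]


# Values taken from the single-instance example command in this guide.
WORK_DIR = "/mnt/workspace"
EXAMPLE_VALUES = {
    "ENV": "dsw", "MEGATRON_PATH": WORK_DIR + "/Pai-Megatron-Patch",
    "MODEL_SIZE": "7B", "BATCH_SIZE": "1", "GLOBAL_BATCH_SIZE": "8",
    "LR": "1e-5", "MIN_LR": "1e-6", "SEQ_LEN": "2048", "PAD_LEN": "2048",
    "EXTRA_VOCAB_SIZE": "85", "PR": "fp16", "TP": "1", "PP": "1", "AC": "sel",
    "DO": "true", "FL": "false", "SP": "false", "TE": "false",
    "SAVE_INTERVAL": "100000",
    "DATASET_PATH": WORK_DIR + "/qwen-datasets/wudao/wudao_qwenbpe_content_document",
    "PRETRAIN_CHECKPOINT_PATH": WORK_DIR + "/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1",
    "TRAIN_TOKENS": "100000000", "WARMUP_TOKENS": "10000",
    "OUTPUT_BASEPATH": WORK_DIR + "/output_megatron_qwen/",
}

if __name__ == "__main__":
    print(shlex.join(build_pretrain_command(EXAMPLE_VALUES)))
```

Numeric values are kept as strings so that notation such as 1e-5 is passed to the script exactly as written.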
Pre-train the model with distributed training on DLC
After debugging on a single instance, you can configure a distributed task with multi-GPU servers in the DLC environment. Follow these steps:
- Go to the Create Job page.
  - Log on to the PAI console. Select the target region and workspace at the top of the page, and then click Deep Learning Containers (DLC).
  - On the Deep Learning Containers (DLC) page, click Create Job.
- On the Create Job page, configure the following key parameters and keep the default values for other parameters. For more information, see Create a training task.
  Basic Information
  - Job Name: Enter a custom task name. This guide uses test_qwen_dlc.
  Environment Information
  - Image Configuration: Select Image Address and enter pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm in the text box.
  - Mount dataset: Click Custom Dataset and configure the following parameters:
    - Custom Dataset: Select the NAS dataset that you created.
    - Mount Path: Set this to /mnt/workspace/.
  - Startup Command: Configure the following command. The start parameters for the run_pretrain_megatron_qwen.sh script are the same as those for pre-training the model on a single DSW instance.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
dlc \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
8 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
fp16 \
1 \
1 \
sel \
true \
false \
false \
false \
100000 \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
100000000 \
10000 \
${WORK_DIR}/output_megatron_qwen/
  Resource Information
  - Resource Type: Select Lingjun Intelligent Computing.
  - Source: Select Resource Quota.
  - Resource Quota: Select the resource quota that you created for Lingjun resources.
  - Framework: Select PyTorch.
  - Job Resource: Configure the following parameters for the Worker node:
    - Nodes: 2. To perform multi-node training, set Nodes to the required number of machines.
    - GPUs: 8
    - vCPUs: 90
      Note: The number of CPU cores cannot exceed 96.
    - Memory (GiB): 1024
    - Shared Memory (GiB): 1024
- Click OK. You are redirected to the Deep Learning Containers (DLC) page. When the Status changes to Succeeded, the training task is successful.
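When you scale from one node to several, the data-parallel size changes, and Megatron-style training generally expects the global batch size to be a multiple of the per-GPU batch size times the data-parallel size. The following Python sketch checks a planned configuration under that assumption. The function names are hypothetical, and whether the PAI script adjusts batch sizes automatically is not stated in this guide, so treat the result as a planning aid only.

```python
def data_parallel_size(nodes: int, gpus_per_node: int, tp: int, pp: int) -> int:
    """Return the number of model replicas for the given cluster and TP/PP settings."""
    world_size = nodes * gpus_per_node
    if world_size % (tp * pp) != 0:
        raise ValueError("total GPU count must be a multiple of TP x PP")
    return world_size // (tp * pp)


def batch_config_is_valid(global_batch: int, micro_batch: int, dp_size: int) -> bool:
    """Check that the global batch divides evenly across micro-batches and replicas."""
    return global_batch % (micro_batch * dp_size) == 0


if __name__ == "__main__":
    for nodes in (1, 2):
        dp = data_parallel_size(nodes=nodes, gpus_per_node=8, tp=1, pp=1)
        ok = batch_config_is_valid(global_batch=8, micro_batch=1, dp_size=dp)
        print("nodes=%d data_parallel=%d global_batch_8_valid=%s" % (nodes, dp, ok))
```

Under this assumption, two nodes with 8 GPUs each and TP = PP = 1 give a data-parallel size of 16, so a GLOBAL_BATCH_SIZE of 8 with a BATCH_SIZE of 1 would not pass the check. If the job fails at startup with a batch size error, consider raising GLOBAL_BATCH_SIZE to a multiple of 16.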
Fine-tune the model with supervised learning
You can fine-tune the model in a single DSW instance or submit a distributed task with multi-GPU servers in the DLC environment. The training process takes about two hours. After the task succeeds, the model files are saved to the /mnt/workspace/output_megatron_qwen/ directory.
- Before you fine-tune the model, go to the Step 2: Prepare pre-training data section. On the Use the small-scale sample data processed by PAI tab, download the JSON file using the provided code.
- Fine-tune the model.
Fine-tune the model on a single DSW instance
The following code shows an example of how to run the command for the Qwen-7B model in the Terminal:
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
96 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
bf16 \
1 \
1 \
sel \
true \
false \
false \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
2000 \
10 \
${WORK_DIR}/output_megatron_qwen/
The following table describes the parameters for running run_finetune_megatron_qwen_withGA.sh.

| Parameter | Description |
| --- | --- |
| ENV=$1 | The running environment: dlc or dsw. |
| MEGATRON_PATH=$2 | The path of the open source Megatron code. |
| MODEL_SIZE=$3 | The model size: 7B, 14B, or 72B. |
| BATCH_SIZE=$4 | The number of samples for one iteration of training on each GPU: 1, 2, 4, or 8. |
| GLOBAL_BATCH_SIZE=$5 | The total number of samples for one iteration of fine-tuning: 64, 96, or 128. |
| LR=$6 | The learning rate: 1e-5 or 5e-5. |
| MIN_LR=$7 | The minimum learning rate: 1e-6 or 5e-6. |
| SEQ_LEN=$8 | The sequence length. |
| PAD_LEN=$9 | The padding sequence length. |
| EXTRA_VOCAB_SIZE=${10} | The extra vocabulary size: 85 for Qwen-7B, 213 for Qwen-14B, and 213 for Qwen-72B. |
| PR=${11} | The training precision: fp16 or bf16. |
| TP=${12} | The degree of tensor parallelism. |
| PP=${13} | The degree of pipeline parallelism. |
| AC=${14} | The activation checkpointing mode: full or sel. |
| DO=${15} | Specifies whether to use the Megatron version of the Zero-1 optimizer to reduce VRAM usage: true or false. |
| FL=${16} | Specifies whether to enable Flash Attention: true or false. |
| SP=${17} | Specifies whether to use sequence parallelism: true or false. |
| TE=${18} | Specifies whether to enable Transformer-engine acceleration. This feature requires gu8xf GPUs. |
| SAVE_INTERVAL=${19} | The step interval for saving the model. |
| DATASET_PATH=${20} | The path of the training dataset. |
| VALID_DATASET_PATH=${21} | The path of the validation set. |
| PRETRAIN_CHECKPOINT_PATH=${22} | The path of the pre-trained model. |
| TRAIN_ITERS=${23} | The number of training iterations. |
| LR_WARMUP_ITERS=${24} | The number of iterations over which the learning rate warms up. |
| OUTPUT_BASEPATH=${25} | The path to save the output model files. |

Fine-tune the model with distributed training on DLC
After debugging in a single DSW instance, you can configure a distributed task with multi-GPU servers in the DLC environment. When you submit the DLC training task, configure the Startup Command as follows. For information about other parameter settings, see the DLC distributed pre-training procedure in Step 3: Train the model with Megatron.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
dlc \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
96 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
bf16 \
1 \
1 \
sel \
true \
false \
false \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
2000 \
10 \
${WORK_DIR}/output_megatron_qwen/
The parameters for running run_finetune_megatron_qwen_withGA.sh are the same as those for fine-tuning the model on a single DSW instance.
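The fine-tuning examples use a per-GPU BATCH_SIZE of 1 with a GLOBAL_BATCH_SIZE of 96, which implies that gradients are accumulated over several micro-batches before each optimizer step. The following Python sketch computes that number using the standard relationship between global batch, micro-batch, and data-parallel size. The function name accumulation_steps is hypothetical, and reading the withGA suffix in the script name as "with gradient accumulation" is an inference, not something this guide confirms.

```python
def accumulation_steps(global_batch_size: int, micro_batch_size: int,
                       data_parallel_size: int) -> int:
    """Return how many micro-batches each GPU processes per optimizer step."""
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError("global batch must be a multiple of micro batch x data-parallel size")
    return global_batch_size // samples_per_pass


if __name__ == "__main__":
    # Single DSW instance from this guide: 8 GPUs, TP = PP = 1, so 8 model replicas.
    print(accumulation_steps(global_batch_size=96, micro_batch_size=1, data_parallel_size=8))
```

For the single-instance example this yields 12 accumulation steps. Increasing BATCH_SIZE lowers the step count at the cost of more VRAM per GPU.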
Step 4: Perform offline inference
After the model training is complete, you can use the Megatron inference pipeline to perform offline inference and evaluate the model's performance. Follow these steps:
- Download the test sample pred_input.jsonl and upload it to the /mnt/workspace directory in DSW. For more information, see Upload and download files.
  Note: The data structure for inference must be consistent with the data structure used for fine-tuning.
- Copy all JSON files and the tokenizer.model file from the pre-training model path to the path of the generated model. The generated model path is the subdirectory under ${OUTPUT_BASEPATH}/checkpoint and is at the same level as latest_checkpointed_iteration.txt.
  Note: Replace the paths in the command with your actual paths.
cd /mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1
cp *.json /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
cp tokenizer.model /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
- In the Terminal, run the following command to perform offline inference. The inference results are saved to the /mnt/workspace/qwen_pred.txt file. You can evaluate the model's performance based on the results.
  Note: Before you run the command, you must set the CUDA_VISIBLE_DEVICES parameter to 0 and the GPUS_PER_NODE parameter to 1 in the run_text_generation_megatron_qwen.sh script.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
bash run_text_generation_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
/mnt/workspace/output_megatron_qwen/checkpoint/dswXXX \
7B \
1 \
1 \
1024 \
1024 \
85 \
fp16 \
10 \
512 \
512 \
${WORK_DIR}/pred_input.jsonl \
${WORK_DIR}/qwen_pred.txt \
0 \
1.0 \
1.2
The following table describes the start parameters for the run_text_generation_megatron_qwen.sh script.

| Parameter | Description |
| --- | --- |
| ENV=$1 | The running environment: dlc or dsw. |
| MEGATRON_PATCH_PATH=$2 | The path of the Megatron Patch code. |
| CHECKPOINT_PATH=$3 | The path where the model was saved during the training phase. Important: Replace this with your model path. |
| MODEL_SIZE=$4 | The model size: 7B, 14B, or 72B. |
| TP=$5 | The degree of tensor parallelism. Important: If this parameter is set to 1, you can use a single GPU for inference. If this parameter is set to a value greater than 1, you must use the corresponding number of GPUs for inference. |
| BS=$6 | The number of samples for one iteration of inference on each GPU: 1, 4, or 8. |
| SEQ_LEN=$7 | The sequence length: 256, 512, or 1024. |
| PAD_LEN=$8 | The length to which the text is padded. |
| EXTRA_VOCAB_SIZE=${9} | The number of tokens added during model conversion: 85 for Qwen-7B, 213 for Qwen-14B, and 213 for Qwen-72B. |
| PR=${10} | The precision used for inference: fp16 or bf16. |
| TOP_K=${11} | The number of highest-probability vocabulary tokens to keep for top-k filtering (0 to n): 0, 5, 10, or 20. |
| INPUT_SEQ_LEN=${12} | The input sequence length: 512. |
| OUTPUT_SEQ_LEN=${13} | The output sequence length: 256. |
| INPUT_FILE=${14} | The text file for inference: pred_input.jsonl. Each line is a sample. |
| OUTPUT_FILE=${15} | The output file for inference: qwen_pred.txt. |
| TOP_P=${16} | The cumulative probability for top-p filtering (0 to 1): 0, 0.85, or 0.95. Note: Only one of TOP_K and TOP_P can be non-zero. |
| TEMPERATURE=${17} | The sampling temperature: 1 to n. |
| REPETITION_PENALTY=${18} | The penalty for repetition. The value can be from 1 to 2. The default value is 1.2. |

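To make the sampling parameters more concrete, the following Python sketch shows one common way that a repetition penalty, top-k filtering, and temperature are combined when choosing the next token. It is a generic illustration under stated assumptions, not the implementation used by run_text_generation_megatron_qwen.sh: the penalty follows the widely used formulation that divides positive logits and multiplies negative ones, and all function names are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the score of tokens that were already generated."""
    out = list(logits)
    for token in set(generated_ids):
        out[token] = out[token] / penalty if out[token] > 0 else out[token] * penalty
    return out


def top_k_filter(logits, k):
    """Keep the k highest logits and mask the rest (k <= 0 disables the filter)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    # Ties at the threshold are kept, so slightly more than k tokens can survive.
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, generated_ids, top_k, temperature, repetition_penalty, rng):
    """Apply penalty, top-k filter, and temperature, then draw one token index."""
    logits = apply_repetition_penalty(logits, generated_ids, repetition_penalty)
    logits = top_k_filter(logits, top_k)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 2.0, 0.5, 1.5]  # toy vocabulary of four tokens
    rng = random.Random(0)
    print(sample_next_token(toy_logits, [1], top_k=2, temperature=1.0,
                            repetition_penalty=1.2, rng=rng))
```

With a top-k value of 1 the sketch always returns the highest-scoring token, which corresponds to greedy decoding.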
Step 5: Convert the model format
After you complete offline inference, if the model performance meets your expectations, you can convert the trained Megatron model to the HuggingFace format. You can then use the converted HuggingFace model for online service deployment. Follow these steps:
- In the Terminal, run the following command to convert the trained Megatron model to the HuggingFace format.
export WORK_DIR=/mnt/workspace
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main \
${WORK_DIR}/output_megatron_qwen/checkpoint/${path}/iter_******* \
/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/ \
1 \
1 \
qwen-7b \
0 \
true
The following table describes the parameters for running the model_convertor.sh script.

| Parameter | Description |
| --- | --- |
| MEGATRON_PATH=$1 | The path of the open source Megatron code. |
| SOURCE_CKPT_PATH=$2 | The path of the Megatron model obtained from training. The path must be specific to the iter_* directory. For example: ${WORK_DIR}/output_megatron_qwen/checkpoint/dsw-pretrain-megatron-qwen-7B-lr-1e-5-bs-1-seqlen-2048-pr-bf16-tp-1-pp-1-ac-sel-do-true-sp-false-tt--wt-/iter_*******. Important: Replace this with your model path. If you use a pre-trained model for conversion, delete all distrib_optim.pt files from the model path. |
| TARGET_CKPT_PATH=$3 | The path to save the converted HuggingFace model. |
| TP=$4 | The number of tensor parallelism shards. This must be the same as the number used for training. |
| PP=$5 | The number of pipeline parallelism shards. This must be the same as the number used for training. |
| MN=$6 | The model name: qwen-7b, qwen-14b, or qwen-72b. |
| EXTRA_VOCAB_SIZE=$7 | The extra vocabulary size. |
| mg2hf=$8 | Specifies whether to convert a Megatron model to a HuggingFace model. |

- Copy the .json, .py, and .tiktoken files from the open source HuggingFace model folder /mnt/workspace/qwen-ckpts/qwen-7b-hf to the /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory to ensure that the model can be used.
  Important: You do not need to copy the pytorch_model.bin.index.json file.
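The copy step above can be scripted so that the excluded index file is never copied by accident. The following Python sketch copies only files with the .json, .py, and .tiktoken suffixes and skips pytorch_model.bin.index.json. The function name copy_support_files is hypothetical; the paths in the example are the ones used in this guide.

```python
import shutil
from pathlib import Path

COPY_SUFFIXES = {".json", ".py", ".tiktoken"}
SKIP_NAMES = {"pytorch_model.bin.index.json"}


def copy_support_files(src_dir: str, dst_dir: str) -> list:
    """Copy tokenizer, config, and code files; return the names that were copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix in COPY_SUFFIXES and path.name not in SKIP_NAMES:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied


if __name__ == "__main__":
    source = "/mnt/workspace/qwen-ckpts/qwen-7b-hf"
    target = "/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1"
    if Path(source).is_dir():
        print("copied:", copy_support_files(source, target))
    else:
        print("source folder not found:", source)
```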
Step 6: Deploy and call the model service
After you complete offline inference and evaluate the model's performance, you can deploy the converted HuggingFace model as an online service and call it in a production environment for inference. Follow these steps:
Deploy the model service
- Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
- Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
- On the Custom Deployment page, configure the following key parameters and keep the default values for other parameters.
  Basic Information
  - Service Name: Enter a custom model service name that is unique within the region. This guide uses test_qwen.
  Environment Information
  - Deployment Method: Select Image-based Deployment and select Enable Web App.
  - Image Configuration: Select Image Address and enter the following registry address in the text box: eas-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4-vllm.
  - Directly Mount: Select NAS as the mount type. Click Standard NAS and configure the following parameters:
    - Select File System: Select the NAS file system that you used to create the dataset.
    - Mount Target: Select the mount target that you used to create the dataset.
    - File System Path: Set this to the path of the converted HuggingFace model stored in NAS. This guide uses /qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1.
    - Mount Path: Specify the path after mounting. This guide uses /qwen-7b.
  - Command: Set this to python webui/webui_server.py --port=8000 --model-path=/qwen-7b --tensor-parallel-size 1 --backend=vllm. Where:
    - --model-path: Must be the same as the mount path in the model configuration.
    - --tensor-parallel-size: The number of tensor parallelism shards for the model. This needs to be adjusted based on the number of GPUs. Set this to 1 for a 7B model and 8 for a 72B model (requires an eight-GPU instance).
  - Port Number: Set this to 8000.
  Resource Information
  - Resource Type: Select Resource Quota.
  - Resource Quota: Select the resource quota that you created for Lingjun resources.
  - Instance Count: Configure this based on the model and the selected resources. For a 7B model, set Instance Count to 1.
  - Deployment: For a 7B model, configure the resources used by each instance as follows:
    - vCPUs: 16.
    - Memory (GB): 64.
    - GPUs: 1.
  Service Access
  - VPC, vSwitch, and Security Group Name: After you configure the NAS mount target, the system automatically matches the VPC, vSwitch, and security group with the preset NAS file system.
- Click Deploy.
  When the Service Status changes to Running, the service is deployed.
Call the service
After the service is deployed, you can call it for inference. Follow these steps:
- In the service list, click the name of the target service. In the upper-right corner of the page, click View Web App.
- On the WebUI page, perform model inference.