DeepSeek-R1 is the first-generation reasoning model developed by DeepSeek, excelling in mathematical, coding, and reasoning tasks. DeepSeek has open-sourced the DeepSeek-R1 model and six dense models distilled from DeepSeek-R1 based on Llama and Qwen, all of which have shown impressive performance in various benchmarks. This topic takes DeepSeek-R1-Distill-Qwen-7B as an example to describe how to fine-tune these models in Model Gallery of Platform for AI (PAI).
Supported models
PAI-Model Gallery supports LoRA supervised fine-tuning (SFT) for the six distilled models. The following table lists the recommended minimum configurations with the default hyperparameters and datasets:
| Distill model | Base model | Training method | Minimum configuration |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | LoRA supervised fine-tuning | 1 x GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | LoRA supervised fine-tuning | 2 x GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | LoRA supervised fine-tuning | 8 x GU100 (80 GB video memory) |
Train the model
Go to the Model Gallery page.
Log on to the PAI console.
In the upper-left corner, select a region based on your business requirements.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to use.
In the left-side navigation pane, choose QuickStart > Model Gallery.
On the Model Gallery page, click the DeepSeek-R1-Distill-Qwen-7B model card to go to the details page.
This page provides detailed information on model deployment and training, such as the SFT data format and the invocation method.
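For reference, the following minimal sketch prepares a training file and uploads it to OSS with the oss2 SDK. The instruction/output field names and the file layout are assumptions based on typical SFT datasets; confirm the exact data format on this details page before training.

```python
import json

import oss2  # Alibaba Cloud OSS Python SDK: pip install oss2

# Hypothetical SFT samples. The "instruction"/"output" field names are an
# assumption -- check the SFT data format described on the model details page.
samples = [
    {
        "instruction": "Solve: what is 17 * 23?",
        "output": "17 * 23 = 391.",
    },
    {
        "instruction": "Explain binary search in one sentence.",
        "output": "Binary search repeatedly halves a sorted range until the target is found.",
    },
]

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)

# Upload the file to your OSS bucket (replace the placeholders with your values).
auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<your-bucket>")
bucket.put_object_from_file("sft-data/train.json", "train.json")
```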

Click Train in the upper-right corner and configure the following key parameters:
Dataset Configuration: After you prepare the data, upload it to an Object Storage Service (OSS) bucket; the sketch above shows one way to do this.
Computing Resources: Choose suitable resources. The minimum configurations required under the default setup are listed in Supported models. If you increase hyperparameters such as per_device_train_batch_size or max_length, more video memory may be required.
Hyperparameters: The following table describes the hyperparameters supported by LoRA SFT. Adjust them based on your data and computing resources. For more information, see Guide to fine-tuning LLMs. A sketch after the table illustrates how these hyperparameters interact.
| Hyperparameter | Type | Default value (7B model as an example) | Description |
| --- | --- | --- | --- |
| learning_rate | float | 5e-6 | The learning rate, which controls the magnitude of model weight adjustments. |
| num_train_epochs | int | 6 | The number of times the training dataset is reused. |
| per_device_train_batch_size | int | 2 | The number of samples processed by each GPU in one training iteration. A higher value results in higher training efficiency and higher memory usage. |
| gradient_accumulation_steps | int | 2 | The number of gradient accumulation steps. Gradients are accumulated over this many iterations before each weight update. |
| max_length | int | 1024 | The maximum token length of the input data processed by the model for one training sample. |
| lora_rank | int | 8 | The rank (dimension) of the LoRA adapter matrices. |
| lora_alpha | int | 32 | The LoRA scaling factor. |
| lora_dropout | float | 0 | The LoRA dropout rate. Randomly dropping neurons during training helps prevent overfitting. |
| lorap_lr_ratio | float | 16 | The learning rate ratio in LoRA+, defined as λ = ηB/ηA, where ηA and ηB are the learning rates for adapter matrices A and B, respectively. Compared to standard LoRA, LoRA+ uses different learning rates for critical parts of the process, leading to better performance and faster fine-tuning without increasing computational demands. When lorap_lr_ratio is set to 0, standard LoRA is used instead of LoRA+. |
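Model Gallery runs the training algorithm for you, so you never write this code yourself. As a rough illustration only, and assuming these hyperparameters behave like their counterparts in the open source PEFT library, the sketch below shows how the batch-size parameters combine and what the LoRA parameters configure:

```python
from peft import LoraConfig  # pip install peft

# Effective batch size per weight update with the 7B defaults:
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 1
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)  # 2 * 2 * 1 = 4 samples per optimizer step

# Conceptual mapping of the LoRA hyperparameters above onto a standard PEFT
# config (an assumption for illustration, not PAI's actual training code):
lora_config = LoraConfig(
    r=8,            # lora_rank: rank of the low-rank adapter matrices
    lora_alpha=32,  # lora_alpha: scaling factor; adapter updates scale by alpha / r
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```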
Click Train. You will be redirected to the model training page, and the training will begin. Here, you can view the status and logs of the training job.

If the training is successful, the model will be automatically registered in AI Asset Management - Models, where you can view or deploy it. For more information, see Register and manage models.
If the training fails, click the icon next to Status to view the cause, or go to the Task log tab for more information. For common training errors and solutions, see Usage notes and FAQ about Model Gallery.
The Metric Curve section at the bottom of the training page displays the loss progression during training.

After successful training, click Deploy in the upper-right corner to deploy the trained model as an EAS service. The invocation method for the deployed model is the same as that of the original distill model. You can refer to the model detail page or One-click deployment of DeepSeek-V3 and DeepSeek-R1.
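For a hedged illustration of invocation: if your service is deployed with an OpenAI-compatible API (as with the accelerated deployments in One-click deployment of DeepSeek-V3 and DeepSeek-R1), a call might look like the sketch below. The endpoint path and payload depend on your deployment configuration; confirm them, along with the service URL and token, on the EAS service details page.

```python
import requests

# Placeholders: copy the real values from your EAS service details page.
SERVICE_URL = "<your-eas-service-url>"
TOKEN = "<your-eas-service-token>"

# Assumes an OpenAI-compatible chat endpoint; verify the path for your deployment.
response = requests.post(
    f"{SERVICE_URL}/v1/chat/completions",
    headers={"Authorization": TOKEN, "Content-Type": "application/json"},
    json={
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [{"role": "user", "content": "How many primes are below 20?"}],
    },
)
print(response.json())
```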

Billing
Training in Model Gallery uses the computing resources of Deep Learning Containers (DLC), which bills based on the duration of training jobs. After your training job ends, resource consumption stops automatically; you do not need to stop anything manually. For more information, see Billing of DLC.
Usage notes
Troubleshooting job failure
When training, set an appropriate max_length (a hyperparameter in the training configuration). The training algorithm deletes any samples that exceed max_length and reports the deletion in the task log. Excessive data deletion may leave the training or validation dataset empty, causing the training job to fail.
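To estimate how much of your data would be deleted before you submit the job, you can count token lengths locally. The sketch below uses the model's tokenizer from Hugging Face Transformers and assumes the instruction/output data format shown earlier; the managed algorithm may add template tokens, so treat the counts as an approximation.

```python
import json

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
max_length = 1024  # keep in sync with the max_length training hyperparameter

with open("train.json", encoding="utf-8") as f:
    samples = json.load(f)

# Approximate per-sample token count (assumed "instruction"/"output" fields).
too_long = [
    s for s in samples
    if len(tokenizer(s["instruction"] + s["output"])["input_ids"]) > max_length
]
print(f"{len(too_long)} of {len(samples)} samples exceed max_length={max_length}")
```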
You may encounter the following error log: failed to compose dlc job specs, resource limiting triggered, you are trying to use more GPU resources than the threshold. This indicates that your training jobs are limited to using 2 GPUs at the same time, and exceeding this limit triggers the resource restriction. Wait for the ongoing job to complete before starting a new one, or submit a ticket to request a quota increase.
You may encounter the following error log: the specified vswitch vsw-**** cannot create the required resource ecs.gn7i-c32g1.8xlarge, zone not match. This indicates that the specification is out of stock in the current zone. You can try the following solutions:
Do not select a vSwitch. DLC will automatically choose a vSwitch based on inventory.
Use other specifications.
How do I download the trained model?
When creating the training job, you can set the model output path to an OSS path. After training, you can download the trained model from that OSS path.
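A minimal sketch of downloading the output with the oss2 SDK; the bucket name, endpoint, and output prefix are placeholders for the values you configured:

```python
import os

import oss2  # Alibaba Cloud OSS Python SDK: pip install oss2

auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<your-bucket>")

# Download every object under the model output path you set for the job.
prefix = "<your-model-output-path>/"
for obj in oss2.ObjectIterator(bucket, prefix=prefix):
    if obj.key.endswith("/"):
        continue  # skip directory placeholder objects
    local_path = obj.key  # mirror the OSS layout locally
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    bucket.get_object_to_file(obj.key, local_path)
```

Alternatively, you can download the files in the OSS console or with the ossutil command-line tool.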
