Platform For AI: Fine-tune DeepSeek-R1 distill models

Last Updated: Mar 30, 2026

DeepSeek-R1 is a model developed by DeepSeek that excels in math, coding, and reasoning tasks. This topic uses the DeepSeek-R1-Distill-Qwen-7B distill model as an example to explain how to fine-tune models in this series.

Supported models

Model Gallery supports LoRA supervised fine-tuning (SFT) for six distill models. The following table lists the recommended minimum computing resource configurations when you use the default hyperparameters and the provided dataset.

| Distill Model | Base Model | Supported Training Method | Minimum Configuration |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | LoRA supervised fine-tuning | 1 × A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | LoRA supervised fine-tuning | 1 × A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | LoRA supervised fine-tuning | 1 × A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | LoRA supervised fine-tuning | 1 × GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | LoRA supervised fine-tuning | 2 × GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | LoRA supervised fine-tuning | 8 × GU100 (80 GB video memory) |

Quick start

  1. Go to the Model Gallery page.

    1. Log on to the PAI console. In the left-side navigation pane, select your target Workspace.

    2. In the left-side navigation pane, choose QuickStart > Model Gallery.


  2. On the Model Gallery page, search for and click the DeepSeek-R1-Distill-Qwen-7B model card to open the model details page. This page provides details about model training and deployment, including the required data format for SFT and model invocation methods.


  3. Click Train in the upper-right corner. Configure the following key parameters:

    • Dataset configuration: This example uses the default dataset. You can also prepare a custom dataset according to the format requirements on the model details page and upload it to an Object Storage Service (OSS) bucket. For an illustrative dataset and upload sketch, see the example after these steps.

    • Model output path: Select an OSS path to store the fine-tuned model.

    • Computing Resources: For Source, select public resource. For Instance type, select ecs.gn7i-c16g1.4xlarge.

    • Hyperparameters: The following table describes the hyperparameters supported for LoRA supervised fine-tuning. You can adjust them as needed. For more information, see Fine-tuning guide for large language models.

      | Parameter | Type | Default (for 7B model) | Description |
      | --- | --- | --- | --- |
      | learning_rate | float | 5e-6 | The learning rate, which controls the magnitude of model weight adjustments. |
      | num_train_epochs | int | 6 | The number of times the training dataset is iterated over. |
      | per_device_train_batch_size | int | 2 | The number of samples processed by each GPU in a single training iteration. A larger batch size can improve efficiency but also increases video memory requirements. |
      | gradient_accumulation_steps | int | 2 | The number of gradient accumulation steps. |
      | max_length | int | 1024 | The maximum token length of the input data that the model processes in a single training iteration. |
      | lora_rank | int | 8 | The LoRA dimension. |
      | lora_alpha | int | 16 | The LoRA scaling factor. |
      | lora_dropout | float | 0 | The dropout rate for LoRA training. It helps prevent overfitting by randomly dropping neurons during training. |
      | lorap_lr_ratio | float | 16 | The LoRA+ learning rate ratio (λ = ηB/ηA), where ηA and ηB are the learning rates for adapter matrices A and B, respectively. Compared with LoRA, LoRA+ applies different learning rates to the adapter matrices A and B to achieve better performance and faster fine-tuning without increasing computational requirements. Set lorap_lr_ratio to 0 to use standard LoRA instead of LoRA+. |

  4. Click Train. PAI automatically redirects you to the training job page where you can monitor the job status and view logs.


    When the training job succeeds, the system automatically registers the fine-tuned model in AI Asset Management - Models. You can then view or deploy the model. For details, see Register and manage models.

  5. After the training is complete, click Deploy in the upper-right corner to deploy the fine-tuned model as an EAS service. The invocation method is the same as that for the original distill model. For more information, see the model details page or Deploy DeepSeek-V3 and DeepSeek-R1 models. For an illustrative invocation request, see the sketch after these steps.

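The authoritative dataset format is the one shown on the model details page. As an illustration only, the following Python sketch builds a small training file in an assumed instruction/output JSON layout and uploads it to OSS with the oss2 SDK. The field names, the use of <think> reasoning traces in outputs, the credentials, the OSS endpoint, the bucket name, and the object path are all assumptions or placeholders, not values taken from this topic.

```python
import json

import oss2  # Alibaba Cloud OSS Python SDK: pip install oss2

# Assumed instruction/output layout. Confirm the exact field names and whether
# outputs should include <think> reasoning traces on the model details page.
samples = [
    {
        "instruction": "What is the sum of the first 100 positive integers?",
        "output": "<think>Use the formula n(n+1)/2 with n=100.</think> The sum is 5050.",
    },
    {
        "instruction": "Write a Python function that reverses a string.",
        "output": "<think>Slicing with a step of -1 reverses a string.</think> def reverse(s): return s[::-1]",
    },
]

# Write the dataset as a JSON array; the model details page specifies the exact file layout.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)

# Placeholder credentials, endpoint, bucket name, and object key.
auth = oss2.Auth("<ACCESS_KEY_ID>", "<ACCESS_KEY_SECRET>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<your-bucket>")
bucket.put_object_from_file("datasets/deepseek-r1-distill/train.json", "train.json")
```
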
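After deployment, the EAS service details page shows the endpoint and token to use. The request below is only a sketch that assumes a chat-style JSON API; the endpoint URL, token, and payload fields are placeholders. Follow the invocation instructions on the model details page or in Deploy DeepSeek-V3 and DeepSeek-R1 models for the authoritative format.

```python
import requests

# Placeholders: copy the real values from the EAS service's invocation information.
EAS_ENDPOINT = "http://<your-service>.<region>.pai-eas.aliyuncs.com/api/predict/<service-name>"
EAS_TOKEN = "<YOUR_EAS_TOKEN>"

headers = {
    "Authorization": EAS_TOKEN,
    "Content-Type": "application/json",
}

# Assumed chat-style payload; adjust the fields to match the API shown
# on the service details page.
payload = {
    "messages": [{"role": "user", "content": "How many r's are in the word strawberry?"}],
    "max_tokens": 1024,
}

response = requests.post(EAS_ENDPOINT, headers=headers, json=payload, timeout=120)
response.raise_for_status()
print(response.json())
```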

Billing

Model training in Model Gallery runs on DLC, which is billed based on the duration of the training job. For more information, see Billing for DLC.

FAQ

Q: How do I troubleshoot a failed training job?

  • Set an appropriate max_length in the training configuration. The training algorithm discards samples that exceed max_length and records this in the task log. If too much data is discarded, the training or validation dataset might become empty, which causes the training job to fail. To estimate how many of your samples exceed max_length, see the token-length sketch after this list.

  • The error log "failed to compose dlc job specs, resource limiting triggered, you are trying to use more GPU resources than the threshold" indicates that the job has triggered a resource limit. By default, a maximum of 2 GPUs can run simultaneously for training jobs. Wait for the running jobs to complete before starting a new one, or submit a ticket to request a quota increase.

  • The error log "the specified vswitch vsw-**** cannot create the required resource ecs.gn7i-c32g1.8xlarge, zone not match" indicates that the specified instance type is out of stock in the availability zone where the VSwitch is located. You can try the following solutions: 1. Do not specify a VSwitch. DLC then automatically selects a VSwitch in an availability zone with sufficient inventory. 2. Switch to a different instance type.
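
If you suspect that max_length is discarding too many samples, you can estimate sample lengths locally before training. The sketch below assumes the illustrative instruction/output dataset from the Quick start section and uses the Hugging Face transformers tokenizer for DeepSeek-R1-Distill-Qwen-7B; the counts are approximate because the training job applies its own prompt template and special tokens.

```python
import json

from transformers import AutoTokenizer  # pip install transformers

MAX_LENGTH = 1024  # keep in sync with the max_length hyperparameter

# Tokenizer of the distill model; download it or point to a local copy.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

with open("train.json", encoding="utf-8") as f:
    samples = json.load(f)

too_long = 0
for sample in samples:
    # Assumed instruction/output fields; adjust to your dataset layout.
    text = sample["instruction"] + sample["output"]
    n_tokens = len(tokenizer(text)["input_ids"])
    if n_tokens > MAX_LENGTH:
        too_long += 1

print(f"{too_long} of {len(samples)} samples exceed max_length={MAX_LENGTH}")
```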

Q: Can I download the model after training?

Yes. When you create a training job, you can set the model output path to an OSS directory. After the job completes, you can download the model from the specified OSS path.
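
For example, one way to pull the fine-tuned model files from the OSS output path to a local machine is the oss2 SDK, as sketched below; the credentials, endpoint, bucket name, and prefix are placeholders that you replace with the model output path you configured. The ossutil command-line tool is an equivalent alternative.

```python
import os

import oss2  # pip install oss2

# Placeholders: use your own credentials and the bucket/path you set as the model output path.
auth = oss2.Auth("<ACCESS_KEY_ID>", "<ACCESS_KEY_SECRET>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<your-bucket>")
prefix = "models/deepseek-r1-distill-qwen-7b-sft/"  # assumed model output directory

for obj in oss2.ObjectIterator(bucket, prefix=prefix):
    if obj.key.endswith("/"):
        continue  # skip directory placeholder objects
    local_path = os.path.join("fine_tuned_model", os.path.relpath(obj.key, prefix))
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    bucket.get_object_to_file(obj.key, local_path)
    print(f"Downloaded {obj.key} -> {local_path}")
```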


Q: What should I do if the model performance is poor?

Consider the following solutions:

  1. Use a model with better baseline performance, such as a model from the DeepSeek or Qwen3 series with a higher parameter count.

  2. Refine your prompts.

  3. Increase the max_tokens value.

  4. Break down complex tasks into smaller subtasks for the model to handle separately.
