Fine-Tune and Deploy Qwen2.5 - Coder with SFT & DPO on PAI

Qwen2.5-Coder capabilities

Qwen2.5-Coder is Alibaba Cloud's coding model series with six sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B). It supports up to 128K tokens context length and 92 programming languages. Qwen2.5-Coder-Instruct is the instruction-tuned version with improved task performance.

Multilingual coding support

Supports over 40 programming languages including niche ones, validated on McEval benchmark.
Code reasoning

Strong reasoning ability on CRUXEval benchmark. Code reasoning improvements enhance complex instruction execution.
Mathematical ability

Excels at both math and code tasks, demonstrating solid scientific and technical competence.
Core capabilities

Retains Qwen2.5's general capabilities, confirming broad applicability and stability.

Prerequisites

Available regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Ulanqab), and Singapore.

GPU resource requirements:

Model size	Resource requirements
Qwen2.5-Coder-0.5B/1.5B	Training: GPUs with ≥16 GB VRAM (T4, P100, or V100) Deployment: Minimum single P4; recommended single GU30, A10, V100, or T4
Qwen2.5-Coder-3B/7B	Training: GPUs with ≥24 GB VRAM (A10 or T4) Deployment: Minimum single P100, T4, or V100 (gn6v); recommended single GU30 or A10
Qwen2.5-Coder-14B	Training: GPUs with ≥32 GB VRAM (V100) Deployment: Minimum single L20, single GU60, or dual GU30; recommended dual GU60 or dual L20
Qwen2.5-Coder-32B	Training: GPUs with ≥80 GB VRAM (A800 or H800) Deployment: Minimum two GU60, two L20, or four A10; recommended four GU60, four L20, or eight V100-32G

Deploy, train, evaluate, and compress models

Deploy and invoke the model

Go to Model Gallery.
1. Log on to the PAI console.
2. In the top-left corner, select your region.
3. In the left-side navigation pane, click Workspaces, then click your workspace name.
4. In the left-side navigation pane, click Getting Started > Model Gallery.
Click the Qwen2.5-Coder-32B-Instruct model card to open the model details page.
Click Deploy in the upper-right corner. Configure deployment method, inference service name, and resource settings. Set vLLM accelerated deployment as the deployment method.
Use the inference service.

After deployment completes, use the inference method shown on the model details page to call the model service.

Fine-tune the model

Model Gallery includes built-in supervised fine-tuning (SFT) and direct preference optimization (DPO) algorithms for Qwen2.5-Coder-32B-Instruct.

SFT supervised fine-tuning

SFT training accepts JSON-formatted input. Each sample contains an instruction and an output, represented by "instruction" and "output" fields. Example:

[
  {
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "output": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum"
  },
  {
    "instruction": "Generate a Python code for crawling a website for a specific type of data.",
    "output": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))"
  }
]

DPO direct preference optimization

DPO training accepts JSON-formatted input. Each sample contains a prompt, preferred response, and rejected response, represented by "prompt", "chosen", and "rejected" fields. Example:

[
  {
    "prompt": "Create a function to calculate the sum of a sequence of integers.",
    "chosen": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum",
    "rejected": "[x*x for x in [1, 2, 3, 5, 8, 13]]"
  },
  {
    "prompt": "Generate a Python code for crawling a website for a specific type of data.",
    "chosen": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))",
    "rejected": "def remove_duplicates(string): \n    result = \"\" \n    prev = '' \n\n    for char in string:\n        if char != prev: \n            result += char\n            prev = char\n    return result\n\nresult = remove_duplicates(\"AAABBCCCD\")\nprint(result)"
  }
]

Click the Qwen2.5-Coder-32B-Instruct model card to open the model details page.

Click Train in the upper-right corner. Configure the following:

Dataset: Upload to OSS bucket, select from NAS or CPFS, or use PAI public datasets for testing.
Compute resources: Requires GPUs with ≥80 GB VRAM. Ensure sufficient resource quota. For other model sizes, see Prerequisites.

Hyperparameters: See supported hyperparameters below. Adjust based on your dataset and resources, or use defaults.

Hyperparameter	Type	Default value	Required	Description
training_strategy	string	sft	Yes	Training algorithm. Valid values: sft, dpo.
learning_rate	float	5e-5	Yes	Learning rate. Controls weight adjustments during training.
num_train_epochs	int	1	Yes	Number of iterations over the training dataset.
per_device_train_batch_size	int	1	Yes	Samples processed per GPU in one training iteration. Larger batches improve efficiency but increase VRAM usage.
seq_length	int	128	Yes	Maximum tokens processed in one training step.
lora_dim	int	32	No	LoRA dimension. When lora_dim > 0, uses LoRA or QLoRA lightweight training.
lora_alpha	int	32	No	LoRA weight. Takes effect when lora_dim > 0.
dpo_beta	float	0.1	No	Weight of preference signals during training.
load_in_4bit	bool	false	No	Load model in 4-bit precision. When lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, uses 4-bit QLoRA lightweight training.
load_in_8bit	bool	false	No	Load model in 8-bit precision. When lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, uses 8-bit QLoRA lightweight training.
gradient_accumulation_steps	int	8	No	Steps to accumulate gradients before updating weights.
apply_chat_template	bool	true	No	Apply model’s default chat template to training data. For Qwen2-series models: Problem: `<\|im_end\|>\n<\|im_start\|>user\n + instruction + <\|im_end\|>\n` Response: `<\|im_start\|>assistant\n + output + <\|im_end\|>\n`
system_prompt	string	You are a helpful assistant	No	System prompt used during model training.

Click Train. Model Gallery opens the task details page and starts training. Monitor training job status and logs.

Trained models register automatically in AI Assets > Model Management. View or deploy them from there. For details, see Register and manage models.

Evaluate the model

Model evaluation helps measure and compare model performance, guiding precise model selection and optimization to accelerate AI innovation and adoption.

Model Gallery includes built-in evaluation algorithms for Qwen2.5-Coder-32B-Instruct. Evaluate the base model or fine-tuned versions. For full instructions, see Model evaluation and Large Language Model Evaluation Best Practices.

Compress the model

Before deployment, quantize and compress trained models to reduce storage and compute resource usage. For full instructions, see Model compression.

Platform For AI:Qwen2.5-Coder models