
Platform For AI: Qwen2.5-Coder models

Last Updated: Mar 12, 2026

Deploy, fine-tune, evaluate, and compress Qwen2.5-Coder models (0.5B to 32B) using Model Gallery—optimized for code generation, completion, and repair across 92 programming languages.

Qwen2.5-Coder capabilities

Qwen2.5-Coder is Alibaba Cloud's coding model series, available in six sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B). It supports a context length of up to 128K tokens and 92 programming languages. Qwen2.5-Coder-Instruct is the instruction-tuned version with improved task performance.

  • Multilingual coding support

    Supports 92 programming languages, including niche ones, with multilingual performance validated on the McEval benchmark.

  • Code reasoning

    Strong code reasoning ability, as measured on the CRUXEval benchmark. These reasoning improvements also help the model execute complex instructions.

  • Mathematical ability

    Excels at both math and code tasks, demonstrating solid scientific and technical competence.

  • Core capabilities

    Retains Qwen2.5's general capabilities, confirming broad applicability and stability.

Prerequisites

  • Available regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Ulanqab), and Singapore.

  • GPU resource requirements:

    Qwen2.5-Coder-0.5B/1.5B

    • Training: GPUs with ≥16 GB VRAM (T4, P100, or V100)

    • Deployment: minimum single P4; recommended single GU30, A10, V100, or T4

    Qwen2.5-Coder-3B/7B

    • Training: GPUs with ≥24 GB VRAM (A10 or T4)

    • Deployment: minimum single P100, T4, or V100 (gn6v); recommended single GU30 or A10

    Qwen2.5-Coder-14B

    • Training: GPUs with ≥32 GB VRAM (V100)

    • Deployment: minimum single L20, single GU60, or dual GU30; recommended dual GU60 or dual L20

    Qwen2.5-Coder-32B

    • Training: GPUs with ≥80 GB VRAM (A800 or H800)

    • Deployment: minimum two GU60, two L20, or four A10; recommended four GU60, four L20, or eight V100-32G
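The training requirements above can be captured in a small helper for scripting pre-flight resource checks. The mapping below is transcribed directly from the table; the function name is illustrative, not part of any PAI SDK:

```python
# Minimum training VRAM (GB) per model size, transcribed from the table above.
MIN_TRAIN_VRAM_GB = {
    "0.5B": 16, "1.5B": 16,
    "3B": 24, "7B": 24,
    "14B": 32,
    "32B": 80,
}

def min_training_vram(model_size: str) -> int:
    """Return the minimum GPU VRAM (in GB) needed to fine-tune a Qwen2.5-Coder size."""
    try:
        return MIN_TRAIN_VRAM_GB[model_size]
    except KeyError:
        raise ValueError(f"Unknown model size: {model_size!r}") from None

print(min_training_vram("7B"))   # 24
print(min_training_vram("32B"))  # 80
```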

Deploy, train, evaluate, and compress models

Deploy and invoke the model

  1. Go to Model Gallery.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the left-side navigation pane, click Workspaces, then click your workspace name.

    4. In the left-side navigation pane, click Getting Started > Model Gallery.

  2. Click the Qwen2.5-Coder-32B-Instruct model card to open the model details page.

  3. Click Deploy in the upper-right corner. Configure the deployment method, inference service name, and resource settings. Select vLLM accelerated deployment as the deployment method.

  4. Use the inference service.

    After deployment completes, use the invocation method shown on the model details page to call the model service.
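As a rough sketch of step 4, a vLLM accelerated deployment typically exposes an OpenAI-compatible chat completions endpoint. `SERVICE_URL` and `TOKEN` below are placeholders for the endpoint and token shown on your service's invocation page, and the model name is assumed to match the deployed card:

```python
import json
import urllib.request

# Placeholders: copy the real values from the EAS service invocation page.
SERVICE_URL = "http://<your-service-endpoint>/v1/chat/completions"
TOKEN = "<your-service-token>"

def build_request(prompt: str, model: str = "Qwen2.5-Coder-32B-Instruct") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the deployed service."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        SERVICE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": TOKEN,
        },
    )

if __name__ == "__main__":
    req = build_request("Write a Python function that reverses a string.")
    # Uncomment once SERVICE_URL and TOKEN are filled in:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
    print(req.full_url)
```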

Fine-tune the model

Model Gallery includes built-in supervised fine-tuning (SFT) and direct preference optimization (DPO) algorithms for Qwen2.5-Coder-32B-Instruct.

SFT supervised fine-tuning

SFT training accepts JSON-formatted input. Each sample contains an instruction and an output, represented by "instruction" and "output" fields. Example:

[
  {
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "output": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum"
  },
  {
    "instruction": "Generate a Python code for crawling a website for a specific type of data.",
    "output": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))"
  }
]
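A quick way to sanity-check an SFT file before uploading it to OSS is to verify that every record carries the two required fields. The helper below is an illustrative sketch, not part of the training algorithm:

```python
import json

def validate_sft_records(records: list[dict]) -> None:
    """Raise ValueError if any sample is missing the required SFT fields."""
    for i, rec in enumerate(records):
        missing = {"instruction", "output"} - rec.keys()
        if missing:
            raise ValueError(f"sample {i} is missing fields: {sorted(missing)}")

samples = [
    {
        "instruction": "Create a function to calculate the sum of a sequence of integers.",
        "output": "def sum_sequence(sequence):\n    return sum(sequence)",
    }
]
validate_sft_records(samples)

# Write the dataset in the JSON format shown above.
with open("sft_train.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```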

DPO direct preference optimization

DPO training accepts JSON-formatted input. Each sample contains a prompt, preferred response, and rejected response, represented by "prompt", "chosen", and "rejected" fields. Example:

[
  {
    "prompt": "Create a function to calculate the sum of a sequence of integers.",
    "chosen": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum",
    "rejected": "[x*x for x in [1, 2, 3, 5, 8, 13]]"
  },
  {
    "prompt": "Generate a Python code for crawling a website for a specific type of data.",
    "chosen": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))",
    "rejected": "def remove_duplicates(string): \n    result = \"\" \n    prev = '' \n\n    for char in string:\n        if char != prev: \n            result += char\n            prev = char\n    return result\n\nresult = remove_duplicates(\"AAABBCCCD\")\nprint(result)"
  }
]
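The same kind of pre-upload check applies to DPO data, where every record needs all three fields. Again, the function below is an illustrative sketch:

```python
import json

REQUIRED_DPO_FIELDS = {"prompt", "chosen", "rejected"}

def validate_dpo_file(path: str) -> int:
    """Validate a DPO training file and return the number of samples it contains."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        missing = REQUIRED_DPO_FIELDS - rec.keys()
        if missing:
            raise ValueError(f"sample {i} is missing fields: {sorted(missing)}")
    return len(records)
```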

  1. Click the Qwen2.5-Coder-32B-Instruct model card to open the model details page.

  2. Click Train in the upper-right corner. Configure the following:

    • Dataset: Upload to OSS bucket, select from NAS or CPFS, or use PAI public datasets for testing.

    • Compute resources: Requires GPUs with ≥80 GB VRAM. Ensure sufficient resource quota. For other model sizes, see Prerequisites.

    • Hyperparameters: See supported hyperparameters below. Adjust based on your dataset and resources, or use defaults.

      • training_strategy (string, default: sft, required): Training algorithm. Valid values: sft, dpo.

      • learning_rate (float, default: 5e-5, required): Learning rate, which controls the size of weight updates during training.

      • num_train_epochs (int, default: 1, required): Number of passes over the training dataset.

      • per_device_train_batch_size (int, default: 1, required): Samples processed per GPU in one training iteration. Larger batches improve throughput but increase VRAM usage.

      • seq_length (int, default: 128, required): Maximum sequence length, in tokens, per training sample.

      • lora_dim (int, default: 32, optional): LoRA dimension (rank). When lora_dim > 0, LoRA or QLoRA lightweight training is used.

      • lora_alpha (int, default: 32, optional): LoRA alpha (scaling factor). Takes effect when lora_dim > 0.

      • dpo_beta (float, default: 0.1, optional): Weight of the preference signal during DPO training.

      • load_in_4bit (bool, default: false, optional): Load the model in 4-bit precision. When lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, 4-bit QLoRA lightweight training is used.

      • load_in_8bit (bool, default: false, optional): Load the model in 8-bit precision. When lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, 8-bit QLoRA lightweight training is used.

      • gradient_accumulation_steps (int, default: 8, optional): Number of steps over which gradients are accumulated before weights are updated.

      • apply_chat_template (bool, default: true, optional): Apply the model's default chat template to the training data. For Qwen2-series models:

        • Question: <|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n

        • Response: <|im_start|>assistant\n + output + <|im_end|>\n

      • system_prompt (string, default: "You are a helpful assistant", optional): System prompt used during model training.

  3. Click Train. Model Gallery opens the task details page and starts training. Monitor training job status and logs.


    Trained models register automatically in AI Assets > Model Management. View or deploy them from there. For details, see Register and manage models.
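When tuning per_device_train_batch_size and gradient_accumulation_steps from the hyperparameter list above, the quantity that actually matters for optimization is the effective batch size. A small helper makes the arithmetic explicit; num_gpus here stands for however many GPUs your resource quota provides:

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Effective batch size = per-GPU batch x accumulation steps x GPU count."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# With the defaults above (batch size 1, 8 accumulation steps) on a single GPU:
print(effective_batch_size(1, 8, 1))  # 8
```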

Evaluate the model

Model evaluation helps you measure and compare model performance so that you can select and optimize models with confidence.

Model Gallery includes built-in evaluation algorithms for Qwen2.5-Coder-32B-Instruct. Evaluate the base model or fine-tuned versions. For full instructions, see Model evaluation and Large Language Model Evaluation Best Practices.

Compress the model

Before deployment, quantize and compress trained models to reduce storage and compute resource usage. For full instructions, see Model compression.

Related information