Tongyi Qianwen 2.5-Coder (Qwen2.5-Coder), also known as CodeQwen, is the latest code-focused large language model (LLM) series from Alibaba Cloud. The series offers six model sizes (0.5B, 1.5B, 3B, 7B, 14B, and 32B) to meet the diverse needs of developers. Trained on massive amounts of code data, Qwen2.5-Coder delivers significantly improved performance in code-related scenarios while retaining strong mathematical and general reasoning capabilities. PAI provides full support for this model series. This topic uses the Qwen2.5-Coder-32B-Instruct model as an example to describe how to deploy, fine-tune, evaluate, and compress models in this series in Model Gallery.
Overview
Qwen2.5-Coder is an Alibaba Cloud model with powerful programming capabilities. It supports a context of up to 128K tokens and covers 92 programming languages. The model excels at multiple code-related tasks, including multi-language code generation, code completion, and code repair. Through instruction fine-tuning on top of Qwen2.5-Coder, Alibaba Cloud launched Qwen2.5-Coder-Instruct, which further improves performance across a range of tasks and demonstrates excellent generalization capabilities.
Multi-language programming capabilities
Qwen2.5-Coder-Instruct demonstrates excellent multi-language programming capabilities. The model was extensively tested using the McEval benchmark, which covers more than 40 programming languages, including some niche ones. The results show that the model performs well in multi-language tasks.
Code reasoning
Qwen2.5-Coder-Instruct performs well on code reasoning tasks. When evaluated on the CRUXEval benchmark, the model demonstrates strong reasoning capabilities. Notably, as code reasoning improves, the model's ability to execute complex instructions also improves, which offers a new perspective on how code capabilities affect general reasoning.
Mathematical capabilities
Qwen2.5-Coder-Instruct performs well in both mathematics and code tasks. Mathematics is foundational to programming, and the model's strong performance in both areas reflects its comprehensive scientific capabilities.
Basic capabilities
In general capability evaluations, Qwen2.5-Coder-Instruct maintains the advantages of Qwen2.5, proving its applicability and stability across a wide range of tasks.
With these attributes, the Qwen2.5-Coder model series provides strong technical support for multi-language programming and complex task processing.
Environment requirements
This example can be run in Model Gallery in regions such as China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Ulanqab), and Singapore.
Resource configuration requirements:
Qwen2.5-Coder-0.5B/1.5B
Training stage: Use GPU types with 16 GB or more of memory, such as T4, P100, or V100.
Deployment stage: The minimum configuration is a single P4 GPU. We recommend that you deploy the model on a single GU30, A10, V100, or T4 GPU.
Qwen2.5-Coder-3B/7B
Training stage: Use GPU types with 24 GB or more of memory, such as A10.
Deployment stage: The minimum configuration is a single P100, T4, or V100 (gn6v) GPU. We recommend that you deploy the model on a single GU30 or A10 GPU.
Qwen2.5-Coder-14B
Training stage: Use GPU types with 32 GB or more of memory, such as V100.
Deployment stage: The minimum configuration is a single L20 GPU, a single GU60 GPU, or two GU30 GPUs. We recommend that you deploy the model on two GU60 GPUs or two L20 GPUs.
Qwen2.5-Coder-32B
Training stage: Use GPU types with 80 GB or more of memory, such as A800 or H800.
Deployment stage: The minimum configuration is two GU60 GPUs, two L20 GPUs, or four A10 GPUs. We recommend that you deploy the model on four GU60 GPUs, four L20 GPUs, or eight V100-32G GPUs.
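As a rough rule of thumb behind these requirements (an approximation, not an official sizing formula): BF16/FP16 weights take about 2 bytes per parameter, so model weights alone set a memory floor before activations, KV cache, and optimizer state are counted.

```python
def min_weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough GPU memory needed just to hold model weights (BF16/FP16 = 2 bytes/param).

    This ignores activations, KV cache, and optimizer state, so real
    requirements are higher -- especially during training.
    """
    return params_billions * bytes_per_param

for size in (0.5, 1.5, 3, 7, 14, 32):
    print(f"Qwen2.5-Coder-{size}B: ~{min_weight_memory_gb(size):.0f} GB for weights")
```

For example, the 32B model needs roughly 64 GB just for BF16 weights, which is why its minimum deployment configuration spans multiple GPUs.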
Use a model in PAI-Model Gallery
Deploy and call a model
Go to the Model Gallery page.
Log on to the PAI console.
In the upper-left corner, select the required region.
In the navigation pane on the left, click Workspaces. Then, click the name of the workspace that you want to manage to open its details page.
In the left navigation pane, choose QuickStart > Model Gallery.
On the Model Gallery page, in the model list on the right, click the Qwen2.5-Coder-32B-Instruct model card to go to the product page.
In the upper-right corner, click Deploy. Configure the Deployment Method, Service Name, and Deployment Resources, and then deploy the model to the Elastic Algorithm Service (EAS) inference platform.

Use the inference service.
After the service is deployed, you can call the model service to verify its performance. For more information about the inference methods, see the model details page.
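A minimal calling sketch, assuming a chat-style JSON payload. The service endpoint, token, and exact request schema come from your service's invocation information page in EAS and differ per deployment; the values below are placeholders:

```python
import json

# Placeholders: copy the real endpoint and token from the EAS service's
# invocation information page after deployment.
EAS_ENDPOINT = "http://<service-name>.<region>.pai-eas.aliyuncs.com/api/predict/<service-name>"
EAS_TOKEN = "<your-service-token>"

def build_chat_request(prompt: str, max_tokens: int = 512):
    """Build the headers and JSON body for a chat-style inference call."""
    headers = {
        "Authorization": EAS_TOKEN,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })
    return headers, body

headers, body = build_chat_request("Write a binary search function in Python.")
# Send with, for example:
#   import requests
#   resp = requests.post(EAS_ENDPOINT, headers=headers, data=body)
```

Check the model details page for the exact request format your deployment expects, because different deployment methods expose different APIs.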

Fine-tune a model
Model Gallery provides two fine-tuning algorithms for the Qwen2.5-Coder-32B-Instruct model: supervised fine-tuning (SFT) and direct preference optimization (DPO). You can use these algorithms out of the box to fine-tune the model.
Supervised fine-tuning (SFT)
The SFT algorithm supports input in JSON format. Each data entry consists of a question and an answer, which are represented by the "instruction" and "output" fields, respectively. For example:
[
    {
        "instruction": "Create a function to calculate the sum of a sequence of integers.",
        "output": "# Python code\ndef sum_sequence(sequence):\n    total = 0\n    for num in sequence:\n        total += num\n    return total"
    },
    {
        "instruction": "Generate a Python code for crawling a website for a specific type of data.",
        "output": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n\nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))"
    }
]
Direct preference optimization (DPO)
The DPO algorithm supports input in JSON format. Each data entry consists of a question, a preferred answer, and a rejected answer. These are represented by the "prompt", "chosen", and "rejected" fields, respectively. For example:
[
    {
        "prompt": "Create a function to calculate the sum of a sequence of integers.",
        "chosen": "# Python code\ndef sum_sequence(sequence):\n    total = 0\n    for num in sequence:\n        total += num\n    return total",
        "rejected": "[x*x for x in [1, 2, 3, 5, 8, 13]]"
    },
    {
        "prompt": "Generate a Python code for crawling a website for a specific type of data.",
        "chosen": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n\nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))",
        "rejected": "def remove_duplicates(string):\n    result = \"\"\n    prev = ''\n\n    for char in string:\n        if char != prev:\n            result += char\n            prev = char\n    return result\n\nresult = remove_duplicates(\"AAABBCCCD\")\nprint(result)"
    }
]
On the Model Gallery page, in the model list on the right, click the Qwen2.5-Coder-32B-Instruct model card to open the model details page.
On the model details page, click Train in the upper-right corner. The key configurations are as follows:
Dataset configuration: After you prepare the data, upload it to an Object Storage Service (OSS) bucket, or select a dataset stored on NAS or CPFS by specifying a dataset object. You can also submit a test job using the public dataset preset in PAI to try out the algorithm.
Computing resource configuration: The algorithm requires GPU resources with 80 GB or more of video memory. Make sure that your resource quota has sufficient computing resources. For more information about the resource specifications required for other model sizes, see Environment requirements.
Hyperparameter configuration: The following table describes the hyperparameters supported by the training algorithm. You can adjust the hyperparameters based on your data and computing resources, or you can use the default hyperparameters of the algorithm.
Hyperparameter
Type
Default value
Required
Description
training_strategy
string
sft
Yes
The training algorithm. Valid values: SFT and DPO.
learning_rate
float
5e-5
Yes
The learning rate, which controls the magnitude of weight updates during training.
num_train_epochs
int
1
Yes
The number of times the training dataset is repeatedly used.
per_device_train_batch_size
int
1
Yes
The number of samples processed by each GPU in a training iteration. A larger batch size improves throughput but increases GPU memory usage.
seq_length
int
128
Yes
The sequence length, which refers to the maximum length of the input data that the model processes in a single training iteration.
lora_dim
int
32
No
The LoRA dimension. When lora_dim > 0, LoRA/QLoRA lightweight training is used.
lora_alpha
int
32
No
The LoRA weight. This parameter takes effect when lora_dim > 0 and LoRA/QLoRA lightweight training is used.
dpo_beta
float
0.1
No
Controls how strongly the model relies on preference information during DPO training.
load_in_4bit
bool
false
No
Specifies whether to load the model in 4-bit.
4-bit QLoRA lightweight training is used if lora_dim is greater than 0, load_in_4bit is set to true, and load_in_8bit is set to false.
load_in_8bit
bool
false
No
Specifies whether to load the model in 8-bit.
8-bit QLoRA lightweight training is used if lora_dim is greater than 0, load_in_4bit is set to false, and load_in_8bit is set to true.
gradient_accumulation_steps
int
8
No
The number of gradient accumulation steps.
apply_chat_template
bool
true
No
Specifies whether the algorithm adds the model's default chat template to the training data. For example, for the Qwen2 series models, the format is:
Question: <|im_start|>user\n + instruction + <|im_end|>\n
Answer: <|im_start|>assistant\n + output + <|im_end|>\n
system_prompt
string
You are a helpful assistant
No
The system prompt used for model training.
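The effect of apply_chat_template and system_prompt can be illustrated with a short sketch. The training algorithm applies the model's real template internally; this approximation only shows the resulting text layout, with the system prompt prepended:

```python
def apply_qwen2_chat_template(instruction: str, output: str,
                              system_prompt: str = "You are a helpful assistant") -> str:
    """Approximate the Qwen2-style chat template described above.

    Illustrative only: the actual formatting is handled by the
    training algorithm when apply_chat_template is true.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )

print(apply_qwen2_chat_template("Write hello world in Python.", "print('hello world')"))
```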
Click Train. Model Gallery automatically redirects you to the job details page, where you can view the status and logs as the training begins.

The trained model is automatically registered in AI Asset - Model Management, where you can view or deploy it. For more information, see Register and manage models.
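Before you upload a dataset and submit a training job, it can help to sanity-check that every entry carries the fields the SFT and DPO algorithms expect. A minimal sketch, with field names taken from the format examples above:

```python
import json

SFT_FIELDS = {"instruction", "output"}
DPO_FIELDS = {"prompt", "chosen", "rejected"}

def validate_dataset(path_or_data, strategy: str = "sft") -> list[str]:
    """Return a list of problems found in a fine-tuning dataset.

    Accepts either a path to a JSON file or an already-loaded list.
    An empty return value means the dataset passed the check.
    """
    required = SFT_FIELDS if strategy == "sft" else DPO_FIELDS
    if isinstance(path_or_data, str):
        with open(path_or_data, encoding="utf-8") as f:
            data = json.load(f)
    else:
        data = path_or_data
    if not isinstance(data, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a JSON object")
            continue
        missing = required - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing fields {sorted(missing)}")
    return problems
```

For example, `validate_dataset("train.json", strategy="dpo")` flags any entry that lacks a prompt, chosen, or rejected field before you spend GPU time on a failed job.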
Evaluate a model
Scientific and efficient model evaluation helps developers measure and compare the performance of different models. It also guides them in selecting and optimizing models, which accelerates AI innovation and application implementation.
Model Gallery provides an evaluation algorithm for the Qwen2.5-Coder-32B-Instruct model. You can use this algorithm out of the box to evaluate the original or fine-tuned model. For more information about model evaluation, see Model evaluation and Best practices for LLM evaluation.
Compress a model
After a model is trained, you can quantize and compress it before deployment to reduce storage and computing resource usage. For more information about model compression, see Model compression.