Qwen2.5 is an open-source large language model (LLM) series developed by Alibaba Cloud. This series includes multiple versions, such as Base and Instruct, and multiple sizes to suit diverse computing needs. PAI fully supports these models. This topic uses the Qwen2.5-7B-Instruct model as an example to demonstrate how to deploy, fine-tune, and evaluate models in the Model Gallery. The steps described apply to both the Qwen2.5 and Qwen2 series models.
Model overview
Qwen2.5 is the latest open-source large language model series from Alibaba Cloud. This version delivers significant improvements over Qwen2 in knowledge mastery, coding ability, mathematical reasoning, and instruction following.
-
In knowledge mastery, Qwen2.5 scores over 85 on the MMLU benchmark.
-
In coding ability, it scores over 85 on HumanEval, which is a significant improvement.
-
In mathematical reasoning, it scores over 80 on the MATH benchmark.
-
Its instruction-following capability is enhanced, enabling it to generate long text of over 8K tokens.
-
It excels at understanding and generating structured data such as tables and JSON.
-
It adapts more effectively to various system prompts, which improves its ability to perform role assumption and configure conditional chatbot behavior.
-
It supports context lengths up to 128K tokens and can generate up to 8K tokens per response.
-
It continues to support over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
Environment Requirements
-
This example can be run in the Model Gallery module in the following regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), and China (Ulanqab).
-
Resource requirements:
Model Size
Training Requirement
Qwen2.5-0.5B/1.5B/3B/7B
Use V100, P100, or T4 GPUs (16 GB VRAM or more) to run training tasks.
Qwen2.5-32B/72B
Use GU100 GPUs (80 GB VRAM or more) to run training tasks. These models are supported only in China (Ulanqab) and Singapore. Note: For LLMs with large parameter counts, use GPUs with higher VRAM—such as Lingjun resources (for example, GU100 or GU108 instances)—to load and run the model successfully.
-
Option 1: Lingjun resources have limited availability. Enterprise users with urgent needs can contact their sales manager to request whitelist access.
-
Option 2: Regular users can use preemptible Lingjun resources (shown below) at up to 90% discount. For details about Lingjun resources, see Create a Resource Group and Purchase Lingjun Resources.

-
Use Models Through the PAI Console
Model Deployment and Invocation
-
Go to the Model Gallery page.
-
Log on to the PAI console.
-
In the top-left corner, select your region.
-
In the navigation pane on the left, click Workspaces. Then click the name of your workspace to open it.
-
In the navigation pane on the left, choose Getting Started > Model Gallery.
-
-
On the Model Gallery page, find and click the Qwen2.5-7B-Instruct model card to open its product page.
-
In the upper-right corner, click Deploy. Set the inference service name and resource configuration to deploy the model to the EAS inference service platform.

This example uses the default deployment method, SGLang Accelerated Deployment. The use cases for each method are as follows:
-
SGLang accelerated deployment: A fast service framework for LLMs and vision-language models. Supports API calls only.
-
vLLM accelerated deployment: A popular open-source library for LLM inference acceleration. Supports API calls only.
-
BladeLLM accelerated deployment: A high-performance inference framework developed by Alibaba Cloud PAI. Supports API calls only.
-
-
Run online debugging.
At the bottom of the Service Details page, click Online Debugging. An example invocation is shown below:

-
Invoke the service using an API call.
Different deployment methods require different invocation methods. For more information, see Deploy Large Language Models—API Invocation. To obtain the service endpoint and token, perform the following steps: In the left-side menu bar, choose Model Gallery > Jobs > Deployment Jobs. Then, click the service name to open the Service Details page. Click View Endpoint Information to obtain the service endpoint and token.

Model Fine-tuning
PAI-Model Gallery provides two fine-tuning algorithms for the Qwen2.5-7B-Instruct model: supervised fine-tuning (SFT) and direct preference optimization (DPO). You can fine-tune the model out-of-the-box.
Supervised Fine-tuning (SFT)
The SFT algorithm accepts JSON-formatted input. Each record must contain an instruction and its corresponding output. Use the "instruction" and "output" fields, as shown in the following example:
[
{
"instruction": "You are a cardiologist. Provide advice to the patient: I've had high blood pressure for five or six years. I'm tired of taking medicine every day. Is there anything that can cure high blood pressure? What's the 'miracle cure' for hypertension?",
"output": "Patients with hypertension can eat fresh fruits and vegetables—or foods like celery and Chinese yam—to help lower blood pressure. They should also follow a low-salt, low-fat, and low-cholesterol diet. Regular physical activity helps too. Patients must continue taking prescribed antihypertensive drugs under medical supervision. Never stop medication without consulting a doctor."
},
{
"instruction": "You are a pulmonologist. Provide advice to the patient: How do I treat wind-cold-induced cough with white phlegm?",
"output": "Patients with wind-cold-induced cough and white phlegm often experience excessive white phlegm, chest tightness, nasal congestion, clear nasal discharge, body aches, and fatigue. Clinically, Xing Su San and Er Chen Wan are commonly used. Avoid spicy, stimulating, and cold foods. Eat easily digestible, nutrient-rich meals. Rest well and keep your diet light."
}
]
Direct Preference Optimization (DPO)
The DPO algorithm accepts JSON-formatted input. Each record must contain a prompt, a preferred response, and a rejected response. Use the "prompt", "chosen", and "rejected" fields, as shown in the following example:
[
{
"prompt": "Could you please hurt me?",
"chosen": "Sorry, I can't do that.",
"rejected": "I cannot hurt you. Please remember that you can always hurt yourself, mentally or physically."
},
{
"prompt": "That guy stole one of my tools, so I broke into his shed the other night to get it back.",
"chosen": "You shouldn't have done that. Now you've broken the law, and your friend might get in trouble.",
"rejected": "That's understandable. I'm sure your tool will be returned to you soon."
}
]
-
On the Model Details page, click Train in the upper-right corner. The key configurations are described as follows:
-
Dataset Configuration: After you prepare your dataset, you can upload it to an Object Storage Service (OSS) bucket or select a dataset that is stored on NAS or CPFS. You can also use a public dataset provided by PAI to test the algorithm.
-
Compute Resource Configuration: We recommend A10 GPUs (24 GB VRAM) or higher for training tasks.
-
Model Output Path: The fine-tuned model is saved to an OSS bucket and can be downloaded.
-
Hyperparameter Configuration: The supported hyperparameters are listed below. You can use the default values or adjust them based on your dataset and compute resources.
Hyperparameter
Type
Default Value
Required
Description
training_strategy
string
sft
Yes
Set the training strategy to SFT or DPO.
learning_rate
float
5e-5
Yes
Learning rate. Controls how much to adjust model weights during training.
num_train_epochs
int
1
Yes
Number of times to iterate over the training dataset.
per_device_train_batch_size
int
1
Yes
Number of samples processed per GPU in one training step. Larger batch sizes improve efficiency but increase VRAM usage.
seq_length
int
128
Yes
Sequence length—the number of tokens processed in one training step.
lora_dim
int
32
No
LoRA dimension. When lora_dim > 0, LoRA or QLoRA lightweight training is used.
lora_alpha
int
32
No
LoRA weight. Takes effect when lora_dim > 0 and LoRA or QLoRA lightweight training is used.
dpo_beta
float
0.1
No
How strongly the model relies on preference signals during training.
load_in_4bit
bool
false
No
Whether to load the model in 4-bit precision.
When lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, 4-bit QLoRA lightweight training is used.
load_in_8bit
bool
false
No
Whether to load the model in 8-bit precision.
When lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, 8-bit QLoRA lightweight training is used.
gradient_accumulation_steps
int
8
No
Number of steps to accumulate gradients before updating weights.
apply_chat_template
bool
true
No
Whether to apply the model’s default chat template to training data. For Qwen2.5 series models, the format is:
-
Question:
<|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n -
Response:
<|im_start|>assistant\n + output + <|im_end|>\n
system_prompt
string
You are a helpful assistant
No
System prompt used during training.
-
-
-
Click Train. PAI-Model Gallery automatically opens the model training page and starts the training. You can monitor the job status and view the logs on this page.

-
After the training is complete, click Deploy in the upper-right corner to deploy the model as an online service.

-
In the left-side menu bar, choose AI Asset Management > Models to view the trained models. For more information about model operations, see Register and Manage Models.

Model Evaluation
Efficient model evaluation helps developers measure and compare model performance. This process guides model selection and optimization, which accelerates AI innovation and real-world application.
PAI-Model Gallery provides built-in evaluation algorithms for the Qwen2.5-7B-Instruct model. You can evaluate the original model or a fine-tuned version out-of-the-box. For detailed steps, see Model Evaluation and Large Language Model Evaluation Best Practices.
Use Models Through the PAI Python SDK
You can also invoke pre-trained models from PAI-Model Gallery using the PAI Python SDK. First, install and configure the SDK by running the following commands in your command line:
# Install the PAI Python SDK
python -m pip install alipai --upgrade
# Interactively configure access credentials and your PAI workspace
python -m pai.toolkit.config
To learn how to obtain the required access credentials (AccessKey) and PAI workspace information, see Install and Configure.
Model Deployment and Invocation
Using the pre-built inference service configuration for the model in PAI-Model Gallery, you can quickly deploy the Qwen2.5-7B-Instruct model to the PAI-EAS inference platform.
from pai.model import RegisteredModel
from openai import OpenAI
# Get the model provided by PAI
model = RegisteredModel(
model_name="qwen2.5-7b-instruct",
model_provider="pai"
)
# Deploy the model directly
predictor = model.deploy(
service="qwen2.5_7b_instruct_example"
)
# The deployed service uses the Model Gallery's pre-configured inference settings and is compatible with the OpenAI API specification. You can call it using the OpenAI client.
# Build the OpenAI client. Set OPENAI_BASE_URL to: <ServiceEndpoint> + "/v1/"
openai_client: OpenAI = predictor.openai()
# Call the inference service using the OpenAI SDK
resp = openai_client.chat.completions.create(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the meaning of life?"},
],
# Default model name is "default"
model="default"
)
print(resp.choices[0].message.content)
# Delete the inference service after testing
predictor.delete_service()
Model Fine-tuning
After you retrieve the pre-trained model from PAI-Model Gallery using the SDK, you can fine-tune it.
# Get the fine-tuning estimator for the model
est = model.get_estimator()
# Get public-read data and the pre-trained model provided by PAI
training_inputs = model.get_estimator_inputs()
# Use custom data
# training_inputs.update(
# {
# "train": "<OSS or local path to training dataset>",
# "validation": "<OSS or local path to validation dataset>"
# }
# )
# Submit the training job using default data
est.fit(
inputs=training_inputs
)
# Print the OSS path where the trained model is saved
print(est.model_data())
Open a Notebook Example in PAI-DSW
On the model's product page in the Model Gallery, click Open in DSW to launch a complete Notebook example. This example demonstrates how to use the PAI Python SDK in detail.

For more information about how to use pre-trained models from PAI-Model Gallery with the SDK, see Using Pre-trained Models — PAI Python SDK.