Model overview
On March 6, 2025, Alibaba Cloud released its open-source QwQ-32B reasoning model, which achieved a breakthrough in mathematical, coding, and general capabilities through scaling reinforcement learning. The overall performance of QwQ-32B is comparable to DeepSeek-R1, but with significantly lower deployment and usage costs.
On AIME24 (mathematical reasoning) and LiveCodeBench (coding), QwQ-32B performs on par with DeepSeek-R1 and far surpasses o1-mini and DeepSeek-R1 distilled models of the same size.
On LiveBench, IFEval, and BFCL, QwQ-32B surpasses DeepSeek-R1.
QwQ-32B also integrates agent-related capabilities, enabling it to think critically while using tools and to adjust its reasoning process based on environmental feedback.
PAI-Model Gallery fully supports one-click deployment, fine-tuning, and evaluation of QwQ-32B, as well as its quantized versions. Deploying QwQ-32B requires 96 GB of video memory, while the quantized QwQ-32B-GGUF and QwQ-32B-AWQ models can be deployed on lower-cost GPUs such as a single A10.
Deploy the model
Go to the Model Gallery page.
Log on to the PAI console. Select a region in the upper left corner. You can switch regions to find suitable computing resources.
In the left-side navigation pane, choose Workspace and click the name of the desired workspace.
In the left-side navigation pane, choose QuickStart > Model Gallery.
On the Model Gallery page, find the QwQ-32B model card and click it to go to the details page.
Click Deploy in the upper right corner.
Select a Deployment Method, and configure the service name and resource information.
Deployment methods include SGLang accelerated deployment, vLLM accelerated deployment, and BladeLLM accelerated deployment.
Then, click Deploy to deploy the model in Elastic Algorithm Service (EAS) of PAI.
Call the inference service. After deployment, click View Call Information on the service page to obtain the Endpoint and Token. You can click the link next to Pre-trained model to return to the model details page, where you can find more information about how to call the model.
You can also test the deployed QwQ-32B service online in EAS.
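For reference, the following is a minimal sketch of calling the service programmatically with the OpenAI Python SDK. It assumes an accelerated deployment (for example, vLLM or SGLang) that exposes an OpenAI-compatible API; the endpoint, token, and model name are placeholders that you must replace with the values from View Call Information.

```python
# Minimal sketch: call the deployed QwQ-32B service through an
# OpenAI-compatible endpoint. Replace the placeholders with the
# Endpoint and Token obtained from "View Call Information".
from openai import OpenAI

client = OpenAI(
    base_url="<EAS_ENDPOINT>/v1",  # placeholder: service endpoint plus /v1
    api_key="<EAS_TOKEN>",         # placeholder: service token
)

response = client.chat.completions.create(
    model="QwQ-32B",  # assumption: the model name configured at deployment time
    messages=[{"role": "user", "content": "How many r's are in the word strawberry?"}],
)
print(response.choices[0].message.content)
```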
Fine-tune the model
PAI-Model Gallery supports two supervised fine-tuning (SFT) methods for QwQ-32B: LoRA fine-tuning and full-parameter fine-tuning. You can fine-tune the model out of the box.
Prepare the training data. The SFT algorithm supports training datasets in JSON and JSONL formats. Example:
{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Who are you?"}, {"role": "assistant", "content": "I am Xiaopai, an AI assistant trained by PAI. My goal is to provide users with useful, accurate, and timely information and to help users communicate effectively in various ways. Please let me know how I can assist you."}]} {"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "Who are you!"}, {"role": "assistant", "content": "Hello! I am an AI language model developed by PAI, named Xiaopai. I can answer your questions, provide information, engage in conversation, and help solve problems. If you have any questions or need assistance, please feel free to let me know!"}]}
Configure training parameters. After preparing the data, upload it to an Object Storage Service (OSS) bucket. Because the 32B model is large, the training algorithm requires GPU resources with at least 96 GB of video memory. Make sure that the resource quota you use has sufficient resources.
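If you prefer to upload the prepared file from code rather than the console, the following is a minimal sketch that uses the oss2 Python SDK; the credentials, endpoint, bucket name, and object path are placeholders.

```python
# Minimal sketch: upload the training file to an OSS bucket with the oss2 SDK.
import oss2

auth = oss2.Auth("<ACCESS_KEY_ID>", "<ACCESS_KEY_SECRET>")      # placeholder credentials
bucket = oss2.Bucket(auth, "<OSS_ENDPOINT>", "<your-bucket>")   # placeholder endpoint and bucket

# Upload train.jsonl to oss://<your-bucket>/qwq32b-sft/train.jsonl (illustrative path).
bucket.put_object_from_file("qwq32b-sft/train.jsonl", "train.jsonl")
```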
The algorithm supports the following hyperparameters. You can adjust them based on your computing resources or use the default settings.
learning_rate: The learning rate, which controls the adjustment range of model weights. If the learning rate is too large, training may be unstable: the loss can fluctuate sharply and fail to converge to a small value. If it is too small, the loss decreases slowly and takes a long time to converge. An appropriate learning rate allows the model to converge quickly and stably.
num_train_epochs: The number of times the training dataset is reused. A smaller value may lead to underfitting, and a larger value may lead to overfitting. If the sample size is small, you can increase the number of epochs to avoid underfitting. A smaller learning rate usually requires more epochs.
per_device_train_batch_size: The number of samples processed by each GPU in one training iteration. A larger batch size can improve training speed but also increases the demand for video memory. The ideal batch size is usually the largest value that does not cause a video memory overflow. You can view the GPU memory usage on the monitoring tab of the job details page.
gradient_accumulation_steps: The number of gradient accumulation steps. A smaller batch size increases the variance of the gradient estimate, which slows convergence. With gradient accumulation, the model is updated once every gradient_accumulation_steps batches (see the effective batch size sketch after this list). The value must be a multiple of the number of GPUs.
max_length: The maximum token length of the input data that the model processes in a single training sample. The training data is converted into a token sequence by the tokenizer. You can use a token estimation tool to estimate the token length of the text in your training data.
lora_rank: The inner dimension of the low-rank matrices to train.
lora_alpha: The scaling factor for the low-rank matrices. This is generally set to lora_rank × 2.
lora_dropout: The dropout probability. Randomly dropping neurons helps prevent the neural network from overfitting.
lorap_lr_ratio: The LoRA+ learning rate ratio (λ = ηB/ηA), where ηA and ηB are the learning rates of adapter matrices A and B, respectively. Compared with LoRA, LoRA+ applies different learning rates to the two adapter matrices to achieve better performance and faster fine-tuning without increasing computational requirements. When lorap_lr_ratio is set to 0, standard LoRA is used instead of LoRA+.
advanced_settings: Custom parameters in addition to the preceding ones. Configure them in this field in the format --key1 value1 --key2 value2, or leave the field blank if not needed. Supported options include:
save_strategy: The checkpoint saving strategy. Valid values: steps, epoch, and no. Default value: steps.
save_steps: The number of training steps between checkpoints. Default value: 500.
save_total_limit: The maximum number of checkpoints to keep; older checkpoints are deleted. Default value: 2. If this parameter is not set, all checkpoints are kept.
warmup_ratio: The proportion of training spent in the learning rate warm-up phase, during which the learning rate gradually increases from a small value to the configured initial learning rate. Default value: 0.
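As a quick illustration of how per_device_train_batch_size, gradient_accumulation_steps, and the number of GPUs combine, the following sketch computes the effective batch size per optimizer update; the values are illustrative assumptions, not defaults.

```python
# Minimal sketch: effective batch size per optimizer update.
per_device_train_batch_size = 1   # illustrative value: samples per GPU per step
gradient_accumulation_steps = 8   # illustrative value: batches accumulated per update
num_gpus = 4                      # illustrative value: GPUs used for training

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 32 samples contribute to each model update
```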
Click Train to start training. You can view the training status and logs. After training completes, you can deploy the fine-tuned model as an online service.
Evaluate the model
PAI-Model Gallery provides built-in evaluation algorithms so that you can conveniently evaluate pre-trained and fine-tuned models. It also supports comparing multiple models to help you choose the most suitable one.
Model evaluation entries: you can evaluate the pre-trained model directly, or evaluate the fine-tuned model from the training details page.
We support evaluation based on custom datasets or public datasets.
Custom dataset
Model evaluation supports commonly used text matching metrics for NLP tasks, such as BLEU/ROUGE, and judge model evaluation (only supported in professional mode). You can evaluate whether the selected model is suitable for your unique scenario and data.
Evaluation requires a JSONL file as the evaluation set, where each line is a JSON object. Example file: evaluation_test.jsonl.
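Before submitting the evaluation task, you may want to confirm that every line of your evaluation set is valid JSON. The following is a minimal sketch for that check; the file name matches the example above, and the required fields depend on the evaluation mode.

```python
# Minimal sketch: check that each line of the evaluation set is valid JSON.
import json

with open("evaluation_test.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            raise ValueError(f"Line {i} is not valid JSON: {e}") from e
```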
Public dataset
Use open-source evaluation datasets of various domains to evaluate the comprehensive capabilities of your model. PAI currently maintains datasets such as CMMLU, GSM8K, TriviaQA, MMLU, C-Eval, TruthfulQA, and HellaSwag, covering mathematics, knowledge, and reasoning. Other public datasets will be available in the future. Note: Evaluations on GSM8K, TriviaQA, and HellaSwag datasets may take a long time.
Select the output path for the evaluation results and choose the appropriate resources based on system recommendations. Then, submit the evaluation task.
Wait for the task to complete and view the evaluation results. If you selected multiple datasets, the model is evaluated on them one by one, which increases the waiting time. You can check the progress in the logs.
View the evaluation report. The report presents results for both custom datasets and public datasets.
Contact us
The platform continuously publishes state-of-the-art (SOTA) models. If you have any requirements, contact us in the Model Gallery user group (DingTalk group number 79680024618).