
Alibaba Cloud Model Studio: Model tuning

Last Updated: Jun 03, 2026

Use model tuning in Model Studio when optimization methods like prompt engineering and plugin calling fail to meet your performance expectations. As a core strategy to improve model performance, model tuning enhances performance for specific industries and business scenarios, aligns output with human preference, and reduces output latency. Model tuning includes three training methods: supervised fine-tuning (SFT), continued pre-training (CPT), and direct preference optimization (DPO).

Model tuning

To optimize model performance, model tuning can:

  • Improve model performance for specific industries or scenarios

  • Reduce model output latency

  • Mitigate model hallucination

  • Align with human values or preferences

  • Replace larger models with fine-tuned, lightweight models

During fine-tuning, a model learns from your training data, adopting characteristics specific to your business or scenario, such as domain knowledge, tone, phrasing, and persona. After fine-tuning on many specific examples, the model often performs better with one-shot or zero-shot prompts than it previously did with few-shot prompts. This lets you use shorter prompts, which saves a significant number of input tokens and reduces model output latency.

[Figure: Model fine-tuning]

For details, see:

Supported models

Singapore

Text generation

| Model name | Model code | SFT full training (sft) | SFT efficient training (efficient_sft) |
| --- | --- | --- | --- |
| Qwen3-14B | qwen3-14b | × | Supported |

Visual understanding (Qwen-VL)

| Model name | Model code | SFT full training (sft) | SFT efficient training (efficient_sft) |
| --- | --- | --- | --- |
| - | - | - | - |

China (Beijing)

Text generation

| Model service | Model code | CPT full training (cpt) | SFT full training (sft) | SFT efficient training (efficient_sft) | DPO full training (dpo_full) | DPO efficient training (dpo_lora) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3.6-Flash-2026-04-16 | qwen3.6-flash-2026-04-16 | × | Supported | × | × | × |
| Qwen3.5-27B | qwen3.5-27b | × | Supported | Supported | × | × |
| Qwen3.5-9B | qwen3.5-9b | × | Supported | Supported | × | × |
| Qwen3.5-Flash-2026-02-23 | qwen3.5-flash-2026-02-23 | × | Supported | × | × | × |
| Qwen3-32B | qwen3-32b | Supported | Supported | Supported | Supported | Supported |
| Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | Supported | Supported | Supported | × | × |
| Qwen3-14B | qwen3-14b | × | Supported | Supported | Supported | Supported |
| Qwen3-8B | qwen3-8b | × | Supported | Supported | Supported | Supported |
| Qwen3-1.7B | qwen3-1.7b | Supported | Supported | Supported | Supported | Supported |
| Qwen3-0.6B | qwen3-0.6b | Supported | Supported | Supported | Supported | Supported |
| Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | Supported | Supported | Supported | Supported | Supported |
| Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | Supported | Supported | Supported | Supported | Supported |
| Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | Supported | Supported | Supported | Supported | Supported |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | Supported | Supported | Supported | Supported | Supported |
| Qwen-Plus-Character-2025-11-06 | qwen-plus-character-2025-11-06 | × | Supported | Supported | Supported | Supported |

Visual understanding (Qwen-VL)

| Model service | Model code | CPT full training (cpt) | SFT full training (sft) | SFT efficient training (efficient_sft) | DPO full training (dpo_full) | DPO efficient training (dpo_lora) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | × | Supported | Supported | × | × |
| Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | × | Supported | Supported | × | × |
| Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | × | Supported | Supported | × | × |
| Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | × | Supported | Supported | × | × |
| Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | × | Supported | Supported | × | × |
| Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | × | Supported | Supported | × | × |

Optimization method comparison

| Feature | CPT | SFT | DPO |
| --- | --- | --- | --- |
| One-line summary | Injects domain-specific knowledge | Teaches the model to follow instructions | Aligns outputs with human preferences |
| Input data | 10 million+ tokens of unlabeled domain text | 1,000+ high-quality prompt-response pairs | 100+ sets of chosen/rejected response pairs for the same prompt |
| Core objective | Domain adaptation, learning specialized vocabulary and facts | Develops conversational structure and task-execution skills | Refines alignment to match human values and preferences |
| Learning method | Self-supervised learning (predicts the next token) | Supervised learning (mimics ideal responses) | Direct preference learning (rewards chosen responses over rejected ones) |
| Model stage | Typically before SFT | After CPT, before DPO | The final alignment step, typically after SFT |

| Training modes | Full-parameter fine-tuning | Parameter-efficient fine-tuning (LoRA, recommended) |
| --- | --- | --- |
| Scenarios | To teach the model new capabilities; to achieve the best overall performance | To optimize model performance in specific scenarios; for time- and cost-sensitive scenarios |
| Training time | Longer, with slower convergence. | Shorter, with faster convergence. |

Billing

Billing method

Billing is based on the volume of training data.

Billing formula

model training fee = (total tokens in training data + total tokens in mixed training data) × epochs × training unit price (minimum billing unit: 1 token)

You can view the estimated training fee at the bottom of the model fine-tuning console and click Computing Details to view the total number of training tokens, epochs, and the training unit price.
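The billing formula can be checked by hand before you start a job. The following is a minimal sketch, not an official calculator: the function name and the sample figures (2 million training tokens, no mixed training data, 3 epochs) are hypothetical, and the unit price is the qwen3-8b price listed for China (Beijing) in the tables below. The estimate shown in the console remains the authoritative value.

```python
from decimal import Decimal


def estimate_training_fee(training_tokens: int, mixed_tokens: int,
                          epochs: int, price_per_1k_tokens: str) -> Decimal:
    """Apply: (training tokens + mixed training tokens) x epochs x unit price.

    The unit price is quoted per 1,000 tokens, so the billed token count is
    divided by 1,000. Decimal avoids binary floating-point rounding.
    """
    if training_tokens < 0 or mixed_tokens < 0 or epochs < 1:
        raise ValueError("token counts must be >= 0 and epochs must be >= 1")
    billed_tokens = (training_tokens + mixed_tokens) * epochs
    return Decimal(billed_tokens) * Decimal(price_per_1k_tokens) / Decimal(1000)


# Hypothetical job: 2,000,000 training tokens, mixed training disabled, 3 epochs,
# at USD 0.000825 per 1,000 tokens.
fee = estimate_training_fee(2_000_000, 0, 3, "0.000825")
print(f"Estimated training fee: USD {fee:.2f}")
```

Because the fee scales linearly with both the token count and the number of epochs, halving either one halves the estimate.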

Training unit price

The following tables list the training unit prices for pre-trained models. The training unit price for a custom model is the same as that of its corresponding pre-trained model.

Asia Pacific SE 1 (Singapore)

Qwen

| Model service | Model identifier | Price |
| --- | --- | --- |
| Qwen3-14B | qwen3-14b | USD 0.0016 per 1,000 tokens |

Qwen-VL

| Model service | Model identifier | Price |
| --- | --- | --- |
| - | - | - |

China (Beijing)

Qwen

| Model service | Model identifier | Price |
| --- | --- | --- |
| Qwen3.5-27B | qwen3.5-27b | USD 0.006876 per 1,000 tokens |
| Qwen3.5-9B | qwen3.5-9b | USD 0.00275 per 1,000 tokens |
| Qwen3-32B | qwen3-32b | USD 0.005501 per 1,000 tokens |
| Qwen3-30B-A3B-Instruct-2507 | qwen3-30b-a3b-instruct-2507 | USD 0.004126 per 1,000 tokens |
| Qwen3-14B | qwen3-14b | USD 0.004126 per 1,000 tokens |
| Qwen3-8B | qwen3-8b | USD 0.000825 per 1,000 tokens |
| Qwen3-1.7B | qwen3-1.7b | USD 0.000619 per 1,000 tokens |
| Qwen3-0.6B | qwen3-0.6b | USD 0.000413 per 1,000 tokens |
| Qwen2.5-72B-Instruct | qwen2.5-72b-instruct | USD 0.020628 per 1,000 tokens |
| Qwen2.5-32B-Instruct | qwen2.5-32b-instruct | USD 0.004126 per 1,000 tokens |
| Qwen2.5-14B-Instruct | qwen2.5-14b-instruct | USD 0.004126 per 1,000 tokens |
| Qwen2.5-7B-Instruct | qwen2.5-7b-instruct | USD 0.000825 per 1,000 tokens |
| Qwen-Plus-Character-2025-11-06 | qwen-plus-character-2025-11-06 | USD 0.020628 per 1,000 tokens |

Qwen-VL

| Model service | Model identifier | Price |
| --- | --- | --- |
| Qwen3-VL-8B-Instruct | qwen3-vl-8b-instruct | USD 0.00165 per 1,000 tokens |
| Qwen3-VL-8B-Thinking | qwen3-vl-8b-thinking | USD 0.00165 per 1,000 tokens |
| Qwen3-VL-4B-Instruct | qwen3-vl-4b-instruct | USD 0.000825 per 1,000 tokens |
| Qwen2.5-VL-72B-Instruct | qwen2.5-vl-72b-instruct | USD 0.006876 per 1,000 tokens |
| Qwen2.5-VL-32B-Instruct | qwen2.5-vl-32b-instruct | USD 0.00275 per 1,000 tokens |
| Qwen2.5-VL-7B-Instruct | qwen2.5-vl-7b-instruct | USD 0.001375 per 1,000 tokens |

Prerequisites

  • While model fine-tuning for text generation can deliver excellent results for specific business scenarios, it has the following limitations:

    • Time-consuming: You need a large-scale CPT dataset (at least 50 million tokens), an effective SFT dataset (1,000+ examples), and enough bad cases (100+) to build an effective DPO dataset. Model optimization and iteration are also slow.

    • Expensive: A fine-tuned model must be deployed before use, and model deployment fees are high.

  • Before considering model fine-tuning, Model Studio recommends that you first try to customize your application by using prompt engineering or function calling. Model fine-tuning is often considered a "last resort" to improve model performance for the following reasons:

    1. For many tasks, a model may not perform well initially. However, you can often improve its performance by using the right prompting techniques, which can eliminate the need for model fine-tuning.

    2. Iteratively optimizing prompts and plugins is more agile and cost-effective than model fine-tuning. Fine-tuning iterations can require resource-intensive steps, such as re-collecting, cleaning, and optimizing data; gathering bad cases; and conducting user surveys.

    3. Even if you ultimately decide that model fine-tuning is necessary, your initial work on prompt engineering and plugin optimization is not wasted. You can fully reuse this preliminary work to build fine-tuning datasets, as it provides their required inputs.

Quick start

Fine-tuning on the console

Fine-tuning steps


Step 1: On the Model Fine-tuning page, click Create Training Task.


Step 2: Training configuration

  • Training Method: Supervised Fine-tuning (SFT)

  • Select Model: Qwen3-8B

  • Training Method: Efficient Training

  • Configure Parameters: Keep the default settings. Model Studio provides recommended hyperparameters.

This configuration shortens training time and lowers data requirements.

Step 3: Data configuration

  • Training Set: Select an uploaded fine-tuning dataset.

    Sample data: SFT-ChatML_format_example.jsonl

  • Mixed Training: Disabled

  • Validation Set: Select Automatic Splitting to use 10% of the data as a validation set.


Step 4: Configure checkpoint settings

  • Model Name: Keep the default value.

  • Maximum Exports: Keep the default value.

  • Checkpoint save interval: Keep the default value.

Note

In Model Studio, you can export a checkpoint after a fine-tuning job completes. You must export a checkpoint before deploying that model version.

Model Studio stores exported checkpoints in cloud storage. Accessing or downloading them is not currently supported.


Step 5: Click Start Training and wait for the training job to complete.

Step 6: Use Model Deployment in Alibaba Cloud Model Studio to deploy the trained custom model. After deployment, you can evaluate the fine-tuned model. For more information about model deployment, see Model Deployment Overview.

Tuning workflow

Model Studio offers three complementary tuning methods that work together in a progressive workflow.

CPT (Optional) → SFT → DPO (Optional)

  1. CPT (continued pre-training) - Injects domain-specific knowledge. This method provides the depth and precision that a general model's broad knowledge may lack for specialized domains.

    • For a finance model: Learn financial terminology

    • For a medical model: Learn about medications and pathologies

    • For a legal model: Understand legal statutes and case law

  2. SFT (supervised fine-tuning) - Teaches the model to follow instructions.

    • For a customer service bot: Learn customer service workflows

    • For a code assistant: Learn programming paradigms

    • For tool calling (agent): Learn to use MCP

  3. DPO (direct preference optimization) - Refines model responses to align with human preferences.

    • Safety and responsibility: Refuse harmful suggestions

    • Conciseness and effectiveness: Provide clear and direct answers

    • Objectivity and neutrality: Make fair and objective evaluations

Tuning data format

SFT dataset

The SFT training set uses the ChatML (Chat Markup Language) format, which supports multi-turn conversations and multiple role settings.

The name and weight parameters from OpenAI's chat format are not supported. All assistant outputs are used for training.

A single training example is one JSON object. Expanded for readability, it has the following structure:
{"messages": [
  {"role": "system", "content": "System Input 1"}, 
  {"role": "user", "content": "User Input 1"}, 
  {"role": "assistant", "content": "Expected Model Output 1"}, 
  {"role": "user", "content": "User Input 2"}, 
  {"role": "assistant", "content": "Expected Model Output 2"}
  ...
]}

For the differences between the system, user, and assistant roles, see Overview. Sample datasets: SFT-ChatML_format_example.jsonl, SFT-ChatML_format_example.xlsx (XLS and XLSX formats support only single-turn conversations).
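Before uploading a JSONL training set, it can help to check each line locally against the structure described above. The following is a minimal sketch under stated assumptions: the function name and the specific checks are illustrative and are not the validation that Model Studio itself performs, and the string-content check assumes text-only conversations.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_line(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ['missing or empty "messages" list']

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not a JSON object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content must be a string")
        # The text above states that these OpenAI parameters are not supported.
        if "name" in msg or "weight" in msg:
            problems.append(f"message {i}: 'name' and 'weight' are not supported")

    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to train on")
    return problems


example = {"messages": [
    {"role": "system", "content": "System Input 1"},
    {"role": "user", "content": "User Input 1"},
    {"role": "assistant", "content": "Expected Model Output 1"},
]}
# In a JSONL file, each training example occupies exactly one line.
line = json.dumps(example, ensure_ascii=False)
print(validate_sft_line(line))
```

Running the check over every line of a file and printing only the non-empty results gives a quick list of examples to repair before upload.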

The "loss_weight" parameter is supported for all assistant entries in a single training example. This parameter sets the relative importance of the entry during training. The value can range from 0.0 to 1.0, where a higher value indicates greater importance.

This parameter is in private preview. To use it, please contact your business manager.
 {"role": "assistant", "content": "Expected Model Output 1", "loss_weight": 1.0}, 
 {"role": "assistant", "content": "Expected Model Output 2", "loss_weight": 0.5}
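As an illustration of how such entries might be prepared, the sketch below attaches one "loss_weight" to each assistant message and rejects values outside the documented 0.0 to 1.0 range. The helper name is hypothetical, and requiring exactly one weight per assistant message is a choice made for this example rather than a documented rule. Because the parameter is in private preview, its behavior may differ from what is assumed here.

```python
def with_loss_weights(messages: list[dict], weights: list[float]) -> dict:
    """Return a training example whose assistant messages carry loss_weight.

    Weights are applied to assistant messages in order of appearance.
    """
    assistant_count = sum(1 for m in messages if m["role"] == "assistant")
    if assistant_count != len(weights):
        raise ValueError("provide exactly one weight per assistant message")

    remaining = iter(weights)
    weighted = []
    for message in messages:
        message = dict(message)  # copy so the caller's data is not modified
        if message["role"] == "assistant":
            weight = next(remaining)
            if not 0.0 <= weight <= 1.0:
                raise ValueError(f"loss_weight {weight} is outside 0.0 to 1.0")
            message["loss_weight"] = weight
        weighted.append(message)
    return {"messages": weighted}


record = with_loss_weights(
    [
        {"role": "user", "content": "User Input 1"},
        {"role": "assistant", "content": "Expected Model Output 1"},
        {"role": "user", "content": "User Input 2"},
        {"role": "assistant", "content": "Expected Model Output 2"},
    ],
    [1.0, 0.5],
)
print(record)
```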

Dataset building tips

Dataset size requirements

CPT requires at least 50 million high-quality pre-training tokens. SFT requires at least 1,000 high-quality tuning examples. DPO typically requires at least 100 human preference examples. If model evaluation results are unsatisfactory after tuning, collect more training data to improve performance.
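The minimum sizes above can be encoded as a quick pre-flight check. This is a small sketch with hypothetical names. Note that the units differ by method: tokens for CPT, and examples for SFT and DPO.

```python
# Minimum dataset sizes stated above: (unit, minimum).
MINIMUMS = {
    "cpt": ("tokens", 50_000_000),
    "sft": ("examples", 1_000),
    "dpo": ("examples", 100),
}


def shortfall(method: str, size: int) -> int:
    """Return how many more tokens or examples are needed (0 if enough)."""
    _unit, minimum = MINIMUMS[method]
    return max(0, minimum - size)


def meets_minimum(method: str, size: int) -> bool:
    """True when the dataset reaches the stated minimum for the method."""
    return shortfall(method, size) == 0


for method, size in [("cpt", 10_000_000), ("sft", 1_200), ("dpo", 60)]:
    unit = MINIMUMS[method][0]
    print(f"{method}: {size} {unit}, short by {shortfall(method, size)}")
```

Meeting the minimum is only a starting point. Evaluation results after tuning remain the signal for whether more data is needed.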

If you lack sufficient data, consider building an agent application and using a knowledge base to enhance the model's capabilities. For complex business scenarios, a hybrid approach that combines model tuning with knowledge base retrieval is also effective.

For example, in a customer service scenario, you can use model tuning to align the model's tone, phrasing, and persona. Meanwhile, you can use a knowledge base to dynamically introduce domain-specific knowledge into the model's context.

Model Studio recommends that you first build and pilot a RAG application. After collecting sufficient application data, you can then use model tuning to improve the model's performance.

You can also use the following strategies to expand your dataset:

  1. Use a large model to generate content for your business or scenario to create more tuning data. (For this task, we recommend selecting a larger, high-performing model.)

  2. Manually collect more data from various sources, such as real-world application usage, web crawlers, social media and online forums, public datasets, partners and industry resources, and user contributions.

Data diversity and balance

Model tuning approaches vary by scenario. Specific business scenarios require more domain expertise, while question-answering scenarios require more generalization. Design data use cases to match the model's intended business module or usage scenario. Consequently, strong training performance depends less on sheer data volume and more on the data's scenario-specific relevance and diversity.

The following table lists example use cases, by industry, that a professional and diverse conversational dataset should cover:

| Industry | Use cases |
| --- | --- |
| E-commerce customer service | Campaign push notifications, pre-sales inquiries, sales assistance, after-sales service, post-sale follow-ups, and complaint handling. |
| Financial services | Loan consultation, investment and financial advisory, credit card services, and bank account management. |
| Online healthcare | Symptom consultation, appointment scheduling, pre-visit instructions, drug information lookup, and general health tips. |
| AI assistant | IT, administrative, and HR information; employee benefits Q&A; and company calendar lookups. |
| Travel assistant | Trip planning, immigration and customs guidance, travel insurance consultation, and information on local culture and customs. |
| Corporate legal advisor | Contract review, intellectual property protection, compliance checks, labor law Q&A, cross-border transaction consultation, and case-specific legal analysis. |

Balance the data for each scenario and business according to real-world proportions. Otherwise, the model can become biased toward the features of an over-represented data type, which impairs its generalization ability.
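One way to check balance is to compare each scenario's share of the dataset with its expected share in real traffic. The following is a rough sketch: it assumes each example has been tagged with a scenario label, and the labels, target proportions, and 5-point tolerance are all hypothetical values chosen for illustration.

```python
from collections import Counter


def scenario_shares(labels: list[str]) -> dict[str, float]:
    """Return each scenario's fraction of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {scenario: count / total for scenario, count in counts.items()}


def imbalanced(labels: list[str], target: dict[str, float],
               tolerance: float = 0.05) -> dict[str, float]:
    """Return scenarios whose share deviates from the target by more than
    the tolerance. Values are (dataset share - target share)."""
    shares = scenario_shares(labels)
    deviations = {s: shares.get(s, 0.0) - t for s, t in target.items()}
    return {s: d for s, d in deviations.items() if abs(d) > tolerance}


# Hypothetical dataset: pre-sales examples are over-represented.
labels = ["pre-sales"] * 70 + ["after-sales"] * 20 + ["complaints"] * 10
target = {"pre-sales": 0.5, "after-sales": 0.3, "complaints": 0.2}
for scenario, deviation in imbalanced(labels, target).items():
    print(f"{scenario}: {deviation:+.0%} relative to target")
```

A positive deviation suggests down-sampling that scenario, and a negative one suggests collecting or generating more examples for it.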

Splitting training and validation sets

For model tuning, the console supports the following:

  • Automatically split a training dataset and use a random sample to create a validation set.

  • Select a separate dataset as the validation set.

During training, the console displays real-time validation set loss and token accuracy.
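If you prefer to prepare a separate validation set yourself, a random split can be done offline. The sketch below uses a 10% ratio and a fixed seed for reproducibility. Both are assumptions made for this example and do not describe how the console samples internally.

```python
import random


def split_dataset(records: list, validation_ratio: float = 0.1,
                  seed: int = 42) -> tuple[list, list]:
    """Randomly split records into (training, validation) lists."""
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to split")

    shuffled = records[:]  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in records; in practice each item would be one training example.
training, validation = split_dataset(list(range(1000)))
print(len(training), len(validation))
```

Keeping the seed fixed means the same examples land in the validation set on every run, which makes loss curves comparable across training jobs.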


Frequently asked questions

Fine-tuning a custom model

You cannot upload your own model to Model Studio for fine-tuning, and you cannot download a fine-tuned model or export it outside Model Studio.