Use model tuning in Model Studio when optimization methods like prompt engineering and plugin calling fail to meet your performance expectations. As a core strategy to improve model performance, model tuning enhances performance for specific industries and business scenarios, aligns output with human preference, and reduces output latency. Model tuning includes three training methods: supervised fine-tuning (SFT), continued pre-training (CPT), and direct preference optimization (DPO).
Model tuning
To optimize model performance, model tuning can:
Improve model performance for specific industries or scenarios
Reduce model output latency
Mitigate model hallucination
Align with human values or preferences
Replace larger models with fine-tuned, lightweight models
During fine-tuning, a model learns from your training data, adopting characteristics specific to your business or scenario, such as domain knowledge, tone, phrasing, and persona. After fine-tuning on many specific examples, the model often performs better with one-shot or zero-shot prompts than it previously did with few-shot prompts. This lets you use shorter prompts, which saves a significant number of input tokens and reduces model output latency.
Model fine-tuning
For details, see:
Supported models
Billing
Item | Description |
Billing method | Billing is based on the volume of training data. |
Billing formula | Model training fee = (total tokens in training data + total tokens in mixed training data) × epochs × training unit price. The minimum billing unit is 1 token. You can view the estimated training fee at the bottom of the model fine-tuning console, and click Computing Details to view the total number of training tokens, the number of epochs, and the training unit price. |
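A few lines of arithmetic are enough to check the billing formula. The sketch below is illustrative only. The function name is hypothetical, the unit price is assumed to be quoted per 1,000 tokens (check Computing Details in the console for the actual unit), and the sample inputs are made up, not real Model Studio prices.

```python
from decimal import Decimal


def estimate_training_fee(train_tokens: int, mixed_tokens: int, epochs: int,
                          price_per_1k_tokens: Decimal) -> Decimal:
    """Estimate fee = (training tokens + mixed training tokens) x epochs x unit price.

    Assumption: the unit price is quoted per 1,000 tokens. Tokens are billed
    individually (minimum billing unit: 1 token), so nothing is rounded up.
    """
    if train_tokens < 0 or mixed_tokens < 0 or epochs < 1:
        raise ValueError("token counts must be >= 0 and epochs must be >= 1")
    billed_tokens = (train_tokens + mixed_tokens) * epochs
    # Decimal avoids binary floating-point rounding in money calculations.
    return Decimal(billed_tokens) * price_per_1k_tokens / Decimal(1000)


# Hypothetical inputs for illustration; not actual Model Studio prices.
fee = estimate_training_fee(train_tokens=2_000_000, mixed_tokens=500_000,
                            epochs=3, price_per_1k_tokens=Decimal("0.01"))
print(f"Estimated training fee: {fee}")
```

With these inputs, 2,500,000 tokens over 3 epochs gives 7,500,000 billed tokens, so the estimate is 75 in whatever currency the unit price uses.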
Prerequisites
While model fine-tuning for text generation can deliver excellent results for specific business scenarios, it has the following limitations:
Time-consuming: You need a large-scale CPT dataset (at least 50 million tokens), an effective SFT dataset (1,000+ examples), and enough bad cases (100+) to build an effective DPO dataset. Model optimization and iteration are also slow.
Expensive: A fine-tuned model must be deployed before use, and model deployment fees are high.
Before considering model fine-tuning, Model Studio recommends that you first try to customize your application by using prompt engineering or function calling. Model fine-tuning is often considered a "last resort" to improve model performance for the following reasons:
For many tasks, a model may not perform well initially. However, you can often improve its performance by using the right prompting techniques, which can eliminate the need for model fine-tuning.
Iteratively optimizing prompts and plugins is more agile and cost-effective than model fine-tuning. Fine-tuning iterations can require resource-intensive steps, such as re-collecting, cleaning, and optimizing data; gathering bad cases; and conducting user surveys.
Even if you ultimately decide that model fine-tuning is necessary, your initial work on prompt engineering and plugin optimization is not wasted. You can fully reuse this preliminary work to build fine-tuning datasets, as it provides their required inputs.
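The following sketch shows one way to reuse that work. It assumes you kept the system prompt you refined and a set of user inputs paired with outputs you reviewed and approved, and it converts them into single-turn training examples in the ChatML format described under SFT dataset. The function name and sample data are hypothetical.

```python
import json


def build_sft_examples(system_prompt: str,
                       reviewed_pairs: list[tuple[str, str]]) -> list[dict]:
    """Convert a tuned system prompt and reviewed (user input, approved output)
    pairs into single-turn ChatML training examples."""
    return [
        {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": approved_output},
        ]}
        for user_input, approved_output in reviewed_pairs
    ]


# Hypothetical data collected while iterating on prompts.
examples = build_sft_examples(
    "You are a polite support agent for an online store.",
    [("Where is my order?",
      "Could you share your order number so I can check its status?")],
)
# JSON Lines output: one compact JSON object per line.
print("\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples))
```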
Quick start
Fine-tuning on the console
Step 1: On the Model Fine-tuning page, click Create Training Task.
Step 2: Training configuration. This configuration shortens training time and lowers data requirements.
Step 3: Data configuration.
Step 4: Configure checkpoint settings.
Note: In Model Studio, you can export a checkpoint after a fine-tuning job completes. You must export a checkpoint before you can deploy that model version. Model Studio stores exported checkpoints in cloud storage. You cannot currently access or download them.
Step 5: Click Start Training and wait for the training job to complete.
Step 6: Use Model Deployment in Alibaba Cloud Model Studio to deploy the trained custom model. After deployment, you can evaluate the fine-tuned model. For more information about model deployment, see Model Deployment Overview.
Tuning workflow
Model Studio offers three complementary tuning methods that work together in a progressive workflow.
CPT (Optional) → SFT → DPO (Optional)
CPT (continued pre-training) - Injects domain-specific knowledge. This method provides the depth and precision that a general model's broad knowledge may lack in specialized domains.
For a finance model: learn financial terminology.
For a medical model: learn about medications and pathologies.
For a legal model: understand legal statutes and case law.
SFT (supervised fine-tuning) - Teaches the model to follow instructions.
For a customer service bot: learn customer service workflows.
For a code assistant: learn programming paradigms.
For tool calling (agent): learn to use MCP.
DPO (direct preference optimization) - Refines model responses to align with human preferences.
Safety and responsibility: refuse harmful suggestions.
Conciseness and effectiveness: provide clear and direct answers.
Objectivity and neutrality: make fair and objective evaluations.
Tuning data format
SFT dataset
The SFT training set uses the ChatML (Chat Markup Language) format, which supports multi-turn conversations and multiple role settings.
OpenAI's "name" and "weight" parameters are not supported. All assistant outputs are used for training.
In a JSONL dataset file, each training example occupies a single line. When expanded, a single example has the following structure. Additional user and assistant turns can follow in the same "messages" array.
{"messages": [
  {"role": "system", "content": "System Input 1"},
  {"role": "user", "content": "User Input 1"},
  {"role": "assistant", "content": "Expected Model Output 1"},
  {"role": "user", "content": "User Input 2"},
  {"role": "assistant", "content": "Expected Model Output 2"}
]}
For the differences between the system, user, and assistant roles, see Overview. Sample datasets: SFT-ChatML_format_example.jsonl and SFT-ChatML_format_example.xlsx. The XLS and XLSX formats support only single-turn conversations.
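A small script can catch format problems before you upload a dataset. The sketch below is not a Model Studio tool, and its function names are hypothetical. It assumes one example per line, allows only the system, user, and assistant roles, flags OpenAI's "name" and "weight" fields, and checks the optional "loss_weight" field described next (assistant entries only, 0.0 to 1.0).

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}
# OpenAI's "name" and "weight" message fields are not supported.
UNSUPPORTED_KEYS = {"name", "weight"}


def validate_example(example) -> list[str]:
    """Return the problems found in one ChatML training example (empty if none)."""
    if not isinstance(example, dict):
        return ["example must be a JSON object"]
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: must be a JSON object")
            continue
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
        bad_keys = msg.keys() & UNSUPPORTED_KEYS
        if bad_keys:
            problems.append(f"message {i}: unsupported keys {sorted(bad_keys)}")
        if "loss_weight" in msg:
            weight = msg["loss_weight"]
            if role != "assistant":
                problems.append(f"message {i}: loss_weight applies only to assistant entries")
            elif not isinstance(weight, (int, float)) or not 0.0 <= weight <= 1.0:
                problems.append(f"message {i}: loss_weight must be between 0.0 and 1.0")
    return problems


def validate_jsonl(text: str) -> dict[int, list[str]]:
    """Validate a JSONL dataset; map 1-based line numbers to their problems."""
    report = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            problems = validate_example(json.loads(line))
        except json.JSONDecodeError as err:
            problems = [f"invalid JSON: {err.msg}"]
        if problems:
            report[line_no] = problems
    return report
```

Usage: read your .jsonl file into a string, call validate_jsonl on it, and fix every line number it reports.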
The "loss_weight" parameter is supported for all assistant entries in a single training example. This parameter sets the relative importance of the entry during training. The value can range from 0.0 to 1.0, where a higher value indicates greater importance.
This parameter is in private preview. To use it, please contact your business manager.
For example:
{"role": "assistant", "content": "Expected Model Output 1", "loss_weight": 1.0},
{"role": "assistant", "content": "Expected Model Output 2", "loss_weight": 0.5}
Dataset building tips
Dataset size requirements
CPT requires at least 50 million high-quality pre-training tokens. SFT requires at least 1,000 high-quality tuning examples. DPO typically requires at least 100 human preference examples. If model evaluation results are unsatisfactory after tuning, collect more training data to improve performance.
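The minimum sizes above can be encoded as a quick pre-flight check. In this sketch the table and function name are hypothetical, and the counts are assumed to be measured by you. CPT tokens in particular depend on the model's tokenizer, which the sketch does not compute.

```python
# Minimum dataset sizes from this guide (the DPO figure is a typical minimum).
MINIMUM_SIZE = {
    "CPT": 50_000_000,  # high-quality pre-training tokens
    "SFT": 1_000,       # high-quality tuning examples
    "DPO": 100,         # human preference examples
}


def size_shortfall(method: str, amount: int) -> int:
    """Return how many more tokens (CPT) or examples (SFT, DPO) are needed; 0 if enough."""
    if method not in MINIMUM_SIZE:
        raise ValueError(f"unknown tuning method: {method!r}")
    return max(0, MINIMUM_SIZE[method] - amount)


# Hypothetical counts; substitute measurements of your own datasets.
for method, amount in [("CPT", 12_000_000), ("SFT", 1_250), ("DPO", 60)]:
    missing = size_shortfall(method, amount)
    status = "OK" if missing == 0 else f"collect {missing:,} more"
    print(f"{method}: {amount:,} -> {status}")
```

Meeting the minimum is only a starting point. If evaluation results are still unsatisfactory, the guidance above still applies.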
If you lack sufficient data, consider building an agent application and using a knowledge base to enhance the model's capabilities. For complex business scenarios, a hybrid approach that combines model tuning with knowledge base retrieval is also effective.
For example, in a customer service scenario, you can use model tuning to align the model's tone, phrasing, and persona. Meanwhile, you can use a knowledge base to dynamically introduce domain-specific knowledge into the model's context.
Model Studio recommends that you first build and pilot a RAG application. After collecting sufficient application data, you can then use model tuning to improve the model's performance.
You can also use the following strategies to expand your dataset:
Use a large model to generate content for your business or scenario to create more tuning data. (For this task, we recommend selecting a larger, high-performing model.)
Manually collect more data from various sources, such as real-world application usage, web crawlers, social media and online forums, public datasets, partners and industry resources, and user contributions.
Data diversity and balance
Model tuning approaches vary by scenario: specialized business scenarios require more domain expertise, while question-answering scenarios require more generalization. Design data use cases to match the business module or usage scenario the model is intended for. Strong training performance depends less on sheer data volume than on how relevant and diverse the data is for that scenario.
The following table lists AI conversation scenarios by industry, with the types of use cases that a professional, diverse dataset should cover:
Industry | Use cases |
E-commerce customer service | Campaign push notifications, pre-sales inquiries, sales assistance, after-sales service, post-sale follow-ups, and complaint handling. |
Financial services | Loan consultation, investment and financial advisory, credit card services, and bank account management. |
Online healthcare | Symptom consultation, appointment scheduling, pre-visit instructions, drug information lookup, and general health tips. |
AI assistant | IT, administrative, and HR information; employee benefits Q&A; and company calendar lookups. |
Travel assistant | Trip planning, immigration and customs guidance, travel insurance consultation, and information on local culture and customs. |
Corporate legal advisor | Contract review, intellectual property protection, compliance checks, labor law Q&A, cross-border transaction consultation, and case-specific legal analysis. |
Ensure the data for each scenario and business is balanced according to real-world proportions. This prevents the model from becoming biased toward the features of a particular data type, which impairs its generalization ability.
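One way to check balance is to compare each scenario's share of the dataset with its real-world share. In this sketch, each example is assumed to carry a scenario label kept in your own bookkeeping, because the ChatML format has no field for it. The target proportions and the 5% tolerance are hypothetical choices, not Model Studio requirements.

```python
from collections import Counter


def balance_deviations(labels: list[str], target_share: dict[str, float],
                       tolerance: float = 0.05) -> dict[str, float]:
    """Return {scenario: actual share - target share} for scenarios off by more than tolerance.

    labels: one scenario label per training example (hypothetical bookkeeping).
    target_share: the real-world proportion of each scenario (should sum to 1.0).
    """
    if not labels:
        raise ValueError("no labels to check")
    counts = Counter(labels)
    deviations = {}
    # Include scenarios that appear only in the data or only in the targets.
    for scenario in set(counts) | set(target_share):
        actual = counts[scenario] / len(labels)
        diff = actual - target_share.get(scenario, 0.0)
        if abs(diff) > tolerance:
            deviations[scenario] = round(diff, 4)
    return deviations


# Hypothetical e-commerce dataset that over-represents pre-sales inquiries.
labels = (["pre-sales inquiries"] * 70 + ["after-sales service"] * 20
          + ["complaint handling"] * 10)
targets = {"pre-sales inquiries": 0.5, "after-sales service": 0.3,
           "complaint handling": 0.2}
print(balance_deviations(labels, targets))
```

A positive value means the scenario is over-represented and a negative value means it is under-represented. You can rebalance by collecting more data for the under-represented scenarios or by down-sampling the over-represented ones.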
Splitting training and validation sets
For model tuning, the console supports the following:
Automatically split a training dataset and use a random sample to create a validation set.
Select a separate dataset.
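If you choose to select a separate dataset, you can prepare the validation set yourself with a random sample. The sketch below mirrors the idea of a random split. The function name, the 10% ratio, and the fixed seed are assumptions, not Model Studio defaults.

```python
import random


def split_train_validation(examples: list, validation_ratio: float = 0.1,
                           seed: int = 42) -> tuple[list, list]:
    """Randomly sample a validation set; return (training set, validation set).

    The fixed seed makes the split reproducible across runs.
    """
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    # Keep at least one example on each side of the split.
    n_validation = min(len(shuffled) - 1,
                       max(1, round(len(shuffled) * validation_ratio)))
    return shuffled[n_validation:], shuffled[:n_validation]
```

Using a seeded random.Random instance, rather than the module-level generator, keeps the split stable even if other code also draws random numbers.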
During training, the console displays real-time validation set loss and token accuracy.
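This guide does not define how these two metrics are computed. The sketch below uses their common definitions, so Model Studio's exact computation may differ. Loss is the mean negative log-probability of each target token, and token accuracy is the fraction of target tokens the model ranks first. The ignore_index value of -100 is borrowed from the PyTorch convention and marks positions excluded from scoring, such as system and user tokens, since only assistant outputs are used for training.

```python
import math


def validation_metrics(position_probs: list[list[float]], labels: list[int],
                       ignore_index: int = -100) -> tuple[float, float]:
    """Return (mean cross-entropy loss, token accuracy) over scored positions.

    position_probs[i] is the model's next-token probability distribution at
    position i; labels[i] is the target token ID there. Positions labeled
    ignore_index are skipped.
    """
    total_loss, correct, scored = 0.0, 0, 0
    for probs, label in zip(position_probs, labels):
        if label == ignore_index:
            continue
        total_loss += -math.log(probs[label])
        # The predicted token is the one with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
        scored += 1
    if scored == 0:
        raise ValueError("no positions to score")
    return total_loss / scored, correct / scored


# Toy vocabulary of 3 tokens; the last position is excluded from scoring.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(validation_metrics(probs, [0, 2, -100]))
```

Falling validation loss together with rising token accuracy generally indicates that training is helping. Validation loss that rises while training continues is a common sign of overfitting.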

Frequently asked questions
Fine-tuning a custom model
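Model Studio does not support fine-tuning a custom model that you provide. You also cannot upload your own models to Model Studio, and you cannot export or download models from it.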