
Platform for AI: Data augmentation and model distillation for LLMs

Last Updated: Mar 18, 2026

Transfer knowledge from large teacher models to smaller student models using model distillation while preserving performance.

Model distillation reduces LLM size and compute requirements while maintaining performance. This guide covers data augmentation and distillation using Qwen2 models.

Workflow

  1. Prepare instruction data

    Prepare the training dataset according to the required data format and preparation strategies.

  2. Optional: Augment instructions

    Use Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp to generate semantically similar instructions. Augmentation improves generalization during distillation.

  3. Optional: Optimize instructions

    Use Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine to enrich instructions—including augmented ones—to improve text generation quality.

  4. Generate responses with teacher model

    Use Qwen2-72B-Instruct to generate responses for instructions in your training dataset, transferring teacher knowledge.

  5. Train student model

    Use the completed instruction-response dataset to train a smaller student model that is suitable for production.

Prerequisites

Complete these prerequisites:

  • Activate Deep Learning Containers (DLC) and EAS of PAI on a pay-as-you-go basis and create a default workspace. For more information, see Activate PAI and create a default workspace.

  • Create an OSS bucket to store training data and model files. For more information, see Quick Start.

Prepare instruction data

Prepare instruction data following the data preparation strategy and data format requirements:

Data preparation strategy

To improve model distillation effectiveness and stability, prepare data using these strategies:

  • Prepare at least several hundred data points. More data improves model performance.

  • Ensure broad and balanced distribution: diverse task scenarios, varied input and output lengths (both short and long examples), and balanced language distribution for multilingual datasets.

  • Clean abnormal data. Even a small amount of abnormal data can degrade fine-tuning results. Use rule-based methods to clean the data and filter out invalid entries.
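The rule-based cleaning described above can be sketched as follows. The length thresholds and the exact-duplicate filter are illustrative assumptions, not PAI requirements; adapt the rules to your own data.

```python
import json

def clean_instructions(records, min_len=5, max_len=2000):
    """Drop empty, too-short, too-long, and exact-duplicate instructions."""
    seen = set()
    cleaned = []
    for r in records:
        text = str(r.get("instruction", "")).strip()
        if not (min_len <= len(text) <= max_len):  # length filter (illustrative thresholds)
            continue
        if text in seen:  # exact-duplicate filter
            continue
        seen.add(text)
        cleaned.append({"instruction": text})
    return cleaned

records = [
    {"instruction": "Explain gradient descent."},
    {"instruction": "Explain gradient descent."},  # duplicate
    {"instruction": ""},                           # empty
]
print(len(clean_instructions(records)))  # 1
```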

Data format requirements

The training dataset must be a JSON file in which each entry contains a single field, instruction, that holds the input instruction. Example:

[
    {
        "instruction": "What major measures did governments take to stabilize financial markets during the 2008 financial crisis?"
    },
    {
        "instruction": "What important actions have governments taken to promote sustainable development amid worsening climate change?"
    },
    {
        "instruction": "What major measures did governments take to support economic recovery during the 2001 tech bubble burst?"
    }
]
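Before uploading, you can check that a dataset file matches this format. The following is a minimal validation sketch; the function name is illustrative.

```python
import json

def validate_dataset(path):
    """Verify the file is a JSON array of objects with a string 'instruction' field."""
    with open(path, encoding="utf-8") as fp:
        data = json.load(fp)
    if not isinstance(data, list):
        raise ValueError("top level must be a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict) or not isinstance(item.get("instruction"), str):
            raise ValueError(f"entry {i} must be an object with a string 'instruction' field")
    return len(data)  # number of instructions found

# validate_dataset("input.json")
```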

Optional: Augment instructions

Instruction augmentation expands user-provided instruction datasets to increase diversity and coverage.

  • For example, given this input:

    How do I cook fish-flavored shredded pork?
    How do I prepare for the GRE exam?
    What should I do if a friend misunderstands me?
  • The model outputs something like this:

    Teach me how to cook mapo tofu.
    Provide a detailed guide for preparing for the TOEFL exam.
    If you face setbacks at work, how do you adjust your mindset?

Instruction diversity affects LLM generalization. Augmenting instructions improves student model performance. PAI provides two proprietary instruction augmentation models based on Qwen2: Qwen2-1.5B-Instruct-Exp and Qwen2-7B-Instruct-Exp. Deploy either as an EAS online service:

Deploy model service

Deploy the instruction augmentation model as an EAS online service.

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the left-side navigation pane, choose Workspaces, then click your workspace name.

    4. In the left-side navigation pane, choose Getting Started > Model Gallery.

  2. On the Model Gallery page, search for Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp, then click Deploy.

  3. In the Deploy panel, review default values for Model Service Information and Resource Deployment Information. Modify as needed, then click Deploy.

  4. In the Billing Notice dialog, click OK.

    The system opens the Deployment Task page. When the status changes to Running, the service is deployed.

Call model service

After deployment, use the API to run inference. See Deploy large language models. Example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog, find the endpoint and token. Save them locally.

  2. Create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "eos_token_id": 151645
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=50)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=2048)
        parser.add_argument("--temperature", type=float, default=1)
        parser.add_argument("--prompt", type=str, default="Sing me a song.")
    
        args = parser.parse_args()
        prompt = args.prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)

    Parameters:

    • host: Service endpoint.

    • authorization: Service token.

Batch augmentation

Use the EAS service to batch-process instructions. This example reads a custom JSON dataset and calls the model API to augment instructions. Create and run this Python script:

import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename
with open(input_file_path) as fp:
    data = json.load(fp)

total_size = 10  # Target total number of samples after expansion
pbar = tqdm(total=total_size)

while len(data) < total_size:
    prompt = random.sample(data, 1)[0]["instruction"]
    system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json" # Output filename
with open(output_file_path, 'w') as f:
    json.dump(data, f, ensure_ascii=False)

Parameters:

  • host: Service endpoint.

  • authorization: Service token.

  • input_file_path and output_file_path: Replace with the local paths of your input and output dataset files.

  • The post_http_request and get_response functions match those defined in the Call model service Python script.
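Long batch runs can hit transient network errors or non-200 responses from the service. A simple retry wrapper around the service call can make the loop more robust; this is an illustrative sketch, not part of the PAI client API.

```python
import time
import requests

def call_with_retries(fn, *args, retries=3, backoff=2.0, **kwargs):
    """Call a service function, retrying on network errors or non-200 responses."""
    for attempt in range(retries):
        try:
            response = fn(*args, **kwargs)
            if response.status_code == 200:
                return response
        except requests.RequestException:
            pass  # transient network error; retry
        time.sleep(backoff * (attempt + 1))  # linear backoff between attempts
    raise RuntimeError(f"service call failed after {retries} attempts")

# Example: response = call_with_retries(post_http_request, prompt, system_prompt,
#                                       host, authorization,
#                                       max_new_tokens, temperature, top_k, top_p)
```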

Alternatively, use the LLM-Instruction Expansion (DLC) component in PAI-Designer to augment instructions without writing code. For more information, see Custom pipelines.

Optional: Optimize instructions

Instruction optimization refines user-provided instruction datasets to generate more detailed, structured instructions, leading to richer LLM responses.

  • For example, given this input to the instruction optimization model:

    How do I cook fish-flavored shredded pork?
    How do I prepare for the GRE exam?
    What should I do if a friend misunderstands me?
  • The model outputs something like this:

    Provide a detailed Sichuan-style recipe for fish-flavored shredded pork. Include a specific ingredient list—vegetables, pork, and seasonings—along with step-by-step cooking instructions. Also recommend suitable side dishes and staple foods to serve with it.
    Provide a comprehensive guide covering GRE registration, required documents, study strategies, and recommended review materials. Also suggest effective practice questions and mock exams to help me prepare.
    Provide a detailed guide on staying calm and rational when misunderstood by a friend—and communicating effectively to resolve it. Include practical advice—for example, how to express your thoughts and feelings, how to avoid escalating misunderstandings, and specific dialogue scenarios and situations for practice.

Instruction detail affects LLM output quality. Optimizing instructions improves student model performance. PAI provides two proprietary instruction optimization models based on Qwen2: Qwen2-1.5B-Instruct-Refine and Qwen2-7B-Instruct-Refine. Deploy either as an EAS online service:

Deploy model service

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the left-side navigation pane, choose Workspaces, then click your workspace name.

    4. In the left-side navigation pane, choose Getting Started > Model Gallery.

  2. On the Model Gallery page, search for Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine, then click Deploy.

  3. In the Deploy panel, review default values for Model Service Information and Resource Deployment Information. Modify as needed, then click Deploy.

  4. In the Billing Notice dialog, click OK.

    The system opens the Deployment Task page. When the status changes to Running, the service is deployed.

Call model service

After deployment, use the API to run inference. See Deploy large language models. Example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog, find the endpoint and token. Save them locally.

  2. Create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "eos_token_id": 151645
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=2)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=256)
        parser.add_argument("--temperature", type=float, default=0.5)
        parser.add_argument("--prompt", type=str, default="Sing me a song.")
    
        args = parser.parse_args()
        prompt = args.prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        system_prompt = "Optimize this instruction to make it more detailed and specific."
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)
    

    Parameters:

    • host: Service endpoint.

    • authorization: Service token.

Batch optimization

Use the EAS service to batch-process instructions. This example reads a custom JSON dataset and calls the model API to optimize instructions. Create and run this Python script:

import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename

with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []

for d in data:
    prompt = d["instruction"]
    system_prompt = "Optimize the following instruction."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json"  # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)

Parameters:

  • host: Service endpoint.

  • authorization: Service token.

  • input_file_path and output_file_path: Replace with the local paths of your input and output dataset files.

  • The post_http_request and get_response functions match those defined in the Call model service Python script.

Alternatively, use the LLM-Instruction Optimization (DLC) component in PAI-Designer to optimize instructions without writing code. For more information, see Custom pipelines.

Generate responses with teacher model

Deploy model service

After optimizing your instruction dataset, deploy a teacher LLM to generate responses:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the left-side navigation pane, choose Workspaces, then click your workspace name.

    4. In the left-side navigation pane, choose Getting Started > Model Gallery.

  2. On the Model Gallery page, search for Qwen2-72B-Instruct, then click Deploy.

  3. In the Deploy panel, review default values for Model Service Information and Resource Deployment Information. Modify as needed, then click Deploy.

  4. In the Billing Notice dialog, click OK.

    The system opens the Deployment Task page. When the status changes to Running, the service is deployed.

Call model service

After deployment, use the API to run inference. See Deploy large language models. Example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog, find the endpoint and token. Save them locally.

  2. Create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=50)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=2048)
        parser.add_argument("--temperature", type=float, default=0.5)
        parser.add_argument("--prompt", type=str)
        parser.add_argument("--system_prompt", type=str)
    
        args = parser.parse_args()
        prompt = args.prompt
        system_prompt = args.system_prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)
    

    Parameters:

    • host: Service endpoint.

    • authorization: Service token.

Batch teacher model instruction annotation

This example reads a custom JSON dataset and calls the model API to annotate instructions. Create and run this Python script:

import json 
from tqdm import tqdm
import requests
from typing import List

input_file_path = "input.json"  # Input filename

with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []

for d in data:
    system_prompt = "You are a helpful assistant."
    prompt = d["instruction"]
    print(prompt)
    top_k = 50
    top_p = 0.95
    temperature = 0.5
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": prompt,
        "output": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json" # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)

Parameters:

  • host: Service endpoint.

  • authorization: Service token.

  • input_file_path and output_file_path: Replace with the local paths of your input and output dataset files.

  • The post_http_request and get_response functions match those defined in the Call model service script.
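Before training, it is worth filtering the annotated pairs, in line with the abnormal-data advice earlier: a teacher response that is empty or malformed would otherwise end up in the training set. This sketch and its function name are illustrative.

```python
import json

def filter_annotated(path_in, path_out, min_output_len=1):
    """Keep only pairs whose teacher response is a non-empty string."""
    with open(path_in, encoding="utf-8") as fp:
        data = json.load(fp)
    kept = [d for d in data
            if isinstance(d.get("output"), str)
            and len(d["output"].strip()) >= min_output_len]
    with open(path_out, "w", encoding="utf-8") as fp:
        json.dump(kept, fp, ensure_ascii=False)
    return len(kept)  # number of pairs retained

# filter_annotated("output.json", "train.json")
```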

Train student model

Train the model

After obtaining teacher model responses, train a student model in Model Gallery. This solution uses Qwen2-7B-Instruct:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the left-side navigation pane, choose Workspaces, then click your workspace name.

    4. In the left-side navigation pane, choose Getting Started > Model Gallery.

  2. On the Model Gallery page, search for Qwen2-7B-Instruct and click the model card to open its details page.

  3. On the model details page, click Fine-tune in the upper-right corner.

  4. In the Fine-tune panel, configure the following key parameters and leave the others at their default values.

    Dataset configuration

    • Training dataset: Select OSS file or directory from the drop-down list, then set the OSS path of your dataset: select your OSS bucket, click Upload file to upload your dataset file to the OSS directory, and click OK. No default value.

    Training output configuration

    • model: Select the OSS directory in which to store the trained model. No default value.

    • tensorboard: Select the OSS directory in which to store TensorBoard output. No default value.

    Compute resource configuration

    • Job resources: Select a resource specification. The system recommends suitable options. No default value.

    Hyperparameter configuration

    • learning_rate: The learning rate. Type: FLOAT. Default: 5e-5.

    • num_train_epochs: The number of training epochs. Type: INT. Default: 1.

    • per_device_train_batch_size: The number of training samples per GPU in each iteration. Type: INT. Default: 1.

    • seq_length: The text sequence length. Type: INT. Default: 128.

    • lora_dim: The LoRA dimension. Type: INT. If lora_dim > 0, LoRA or QLoRA lightweight training is used. Default: 32.

    • lora_alpha: The LoRA alpha value. Type: INT. Takes effect only if lora_dim > 0 and LoRA or QLoRA lightweight training is used. Default: 32.

    • load_in_4bit: Whether to load the model in 4-bit mode. Type: BOOL. Valid values: true and false. If lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, 4-bit QLoRA lightweight training is used. Default: true.

    • load_in_8bit: Whether to load the model in 8-bit mode. Type: BOOL. Valid values: true and false. If lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, 8-bit QLoRA lightweight training is used. Default: false.

    • gradient_accumulation_steps: The number of gradient accumulation steps. Type: INT. Default: 8.

    • apply_chat_template: Whether to combine the training data with the default chat template. Type: BOOL. Valid values: true and false. For Qwen2 series models, the format is:

      • Question: <|im_start|>user\n + instruction + <|im_end|>\n

      • Answer: <|im_start|>assistant\n + output + <|im_end|>\n

      Default: true.

    • system_prompt: The system prompt used for training. Type: STRING. Default: You are a helpful assistant.

  5. After setting parameters, click Train.

  6. In the Billing Notice dialog, click OK.

    The system opens the training task page.
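When apply_chat_template is true, each instruction-response pair is wrapped in the Qwen2 (ChatML) chat format before training. The following sketch illustrates the resulting layout; it is not PAI's internal implementation, and the system turn shown reflects the default system_prompt.

```python
def to_chat_template(instruction, output, system_prompt="You are a helpful assistant"):
    """Render one training pair in the Qwen2 (ChatML) chat format."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )

print(to_chat_template("What is 2 + 2?", "4"))
```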

Deploy model service

After training, deploy the model as an EAS service.

  1. On the training task page, click Deploy on the right side.

  2. In the deployment panel, the system provides default values for Model Service Information and Resource Deployment Information. Modify them as needed, then click Deploy.

  3. In the Billing Notice dialog, click OK.

    The system opens the Deployment Task page. When the status changes to Running, the service is deployed.

Call model service

After deployment, use the API for inference. See Deploy large language models.
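The deployed student service can be called with the same client pattern used throughout this guide. The sketch below wraps that pattern in a single function; "EAS HOST" and "EAS TOKEN" are placeholders for your service endpoint and token, and the sampling defaults are illustrative.

```python
import requests

def query_student(prompt, host, token,
                  system_prompt="You are a helpful assistant.",
                  max_new_tokens=2048, temperature=0.5, top_k=50, top_p=0.95):
    """Send one inference request to the deployed EAS service and return the text."""
    payload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
    }
    resp = requests.post(host, headers={"Authorization": token}, json=payload)
    return resp.json()["response"]

# print(query_student("Hello!", "EAS HOST", "EAS TOKEN"))
```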

References