
Platform For AI: Data augmentation and model distillation for LLMs

Last Updated: Mar 02, 2026

Training and inference for large language models (LLMs) consume significant energy and result in long response times. These limitations hinder LLM deployment in resource-constrained environments. To address them, PAI provides model distillation—a technique that transfers knowledge from a large teacher model to a smaller student model. This preserves most of the original performance while substantially reducing model size and compute resource requirements, enabling broader real-world adoption. This topic walks you through the end-to-end development workflow for LLM data augmentation and model distillation using the Qwen2 LLM.

Workflow

Follow this complete development workflow:

  1. Prepare instruction data

    Prepare your training dataset according to the required data format and recommended data preparation strategies.

  2. Optional: Use an instruction augmentation model

    In the PAI-Model Gallery, use a prebuilt instruction augmentation model—Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp. These models automatically generate additional instructions semantically similar to those in your dataset. Instruction augmentation improves generalization during LLM distillation training.

  3. Optional: Use an instruction optimization model

    In the PAI-Model Gallery, use a prebuilt instruction optimization model—Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine. These models refine and enrich instructions from your dataset—including augmented ones—to improve LLM text generation quality.

  4. Deploy a teacher LLM to generate responses

    In the PAI-Model Gallery, use a prebuilt teacher LLM—Qwen2-72B-Instruct—to generate responses for the instructions in your training dataset. This step distills knowledge from the teacher model into your dataset.

  5. Distill-train a smaller student model

    In the PAI-Model Gallery, use your completed instruction-response dataset to distill-train a smaller student model suitable for production deployment.

Prerequisites

Before you begin, confirm you have completed these steps:

  • You have activated the pay-as-you-go billing method for PAI (including DLC and EAS) and created a default workspace. For more information, see Activate PAI and create a default workspace.

  • You have created an Object Storage Service (OSS) bucket to store training data and resulting model files. For more information about how to create a bucket, see Quick Start.

Prepare instruction data

Prepare your instruction data according to the following data preparation strategies and data format requirements:

Data preparation strategy

To improve the effectiveness and stability of model distillation, you can use the following strategies to prepare your data:

  • Prepare at least several hundred samples. More data generally improves model performance.

  • The seed dataset should have a broad and balanced distribution. For example, task scenarios should be diverse, data inputs and outputs should include both short and long examples, and if the dataset contains multiple languages—such as Chinese and English—the language distribution should be relatively balanced.

  • Clean abnormal data. Even a small amount of abnormal data can significantly degrade fine-tuning results. Use rule-based methods to clean the data and filter out invalid entries.
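The rule-based cleaning mentioned above can be sketched as follows. The `instruction` field matches the dataset format described later in this topic; the specific length thresholds and filters are illustrative assumptions, not PAI requirements:

```python
def clean_instructions(samples, min_len=5, max_len=2000):
    """Rule-based cleaning: drop empty, too-short, too-long, and duplicate instructions."""
    seen = set()
    cleaned = []
    for s in samples:
        text = s.get("instruction", "").strip()
        if not (min_len <= len(text) <= max_len):
            continue  # filter out invalid or extreme-length entries
        if text in seen:
            continue  # drop exact duplicates
        seen.add(text)
        cleaned.append({"instruction": text})
    return cleaned

samples = [
    {"instruction": "How do I cook fish-flavored shredded pork?"},
    {"instruction": ""},  # invalid: empty
    {"instruction": "How do I cook fish-flavored shredded pork?"},  # duplicate
]
print(len(clean_instructions(samples)))  # 1
```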

Data format requirements

Your training dataset must be a JSON file containing an array of objects, each with a single field: instruction. This field holds the input instruction. Example:

[
    {
        "instruction": "What major measures did governments take to stabilize financial markets during the 2008 financial crisis?"
    },
    {
        "instruction": "What important actions have governments taken to promote sustainable development amid worsening climate change?"
    },
    {
        "instruction": "What major measures did governments take to support economic recovery during the 2001 tech bubble burst?"
    }
]
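A quick sanity check that a file conforms to this format might look like the following sketch (`validate_dataset` is a hypothetical helper, not part of PAI; the filename is a placeholder):

```python
import json

def validate_dataset(path):
    """Verify the file is a JSON array of objects with a non-empty 'instruction' field."""
    with open(path) as fp:
        data = json.load(fp)
    assert isinstance(data, list), "dataset must be a JSON array"
    for i, item in enumerate(data):
        assert isinstance(item, dict) and "instruction" in item, f"entry {i} is missing 'instruction'"
        assert isinstance(item["instruction"], str) and item["instruction"].strip(), f"entry {i} has an empty instruction"
    return len(data)

# Example usage: n = validate_dataset("input.json")
```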

Optional: Use an instruction augmentation model

Instruction augmentation is a common prompt engineering technique for LLMs. It automatically expands user-provided instruction datasets to increase diversity and coverage.

  • For example, given this input:

    How do I cook fish-flavored shredded pork?
    How do I prepare for the GRE exam?
    What should I do if a friend misunderstands me?
  • The model outputs something like this:

    Teach me how to cook mapo tofu.
    Provide a detailed guide for preparing for the TOEFL exam.
    If you face setbacks at work, how do you adjust your mindset?

Instruction diversity directly affects LLM generalization. Augmenting instructions improves final student model performance. Based on the Qwen2 base model, PAI provides two proprietary instruction augmentation models: Qwen2-1.5B-Instruct-Exp and Qwen2-7B-Instruct-Exp. You can deploy either as an EAS online service with one click. Follow these steps:

Deploy the model service

Deploy the instruction augmentation model as an EAS online service:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.

    4. In the navigation pane on the left, choose Getting Started > Model Gallery.

  2. In the model list on the right side of the Model Gallery page, search for Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp. Click Deploy on the corresponding card.

  3. In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.

  4. In the Billing Notice dialog box, click OK.

    The system opens the Deployment Task page. When the Status shows Running, deployment succeeds.

Call the model service

After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog box, find the endpoint and token. Save them locally.

  2. In your terminal, create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "eos_token_id": 151645
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=50)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=2048)
        parser.add_argument("--temperature", type=float, default=1)
        parser.add_argument("--prompt", type=str, default="Sing me a song.")
    
        args = parser.parse_args()
        prompt = args.prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)

    Where:

    • host: Set to your service endpoint.

    • authorization: Set to your service token.

Batch instruction augmentation

You can use the EAS online service above to batch-process instructions. The following example reads a custom JSON dataset and calls the model API to augment instructions. Create and run this Python script in your terminal:

import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename
with open(input_file_path) as fp:
    data = json.load(fp)

total_size = 10  # Target total number of samples after expansion
pbar = tqdm(total=total_size)

while len(data) < total_size:
    prompt = random.sample(data, 1)[0]["instruction"]
    system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json" # Output filename
with open(output_file_path, 'w') as f:
    json.dump(data, f, ensure_ascii=False)

Where:

  • host: Set to your service endpoint.

  • authorization: Set to your service token.

  • input_file_path and output_file_path: Replace with the local paths to your input dataset file and the augmented output file.

  • The post_http_request and get_response functions match those defined in the Call the model service Python script.
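Because the augmentation loop samples seed instructions at random, it can emit near-duplicate instructions. A simple post-processing pass to drop them is sketched below; the whitespace and case normalization rule is an illustrative assumption. In practice you would load the output.json produced above, deduplicate, and write it back:

```python
def dedup_instructions(samples):
    """Drop samples whose instruction duplicates an earlier one after whitespace/case normalization."""
    seen = set()
    unique = []
    for s in samples:
        key = " ".join(s["instruction"].split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

samples = [
    {"instruction": "Teach me how to cook mapo tofu."},
    {"instruction": "teach me how to  cook mapo tofu."},  # near-duplicate
]
print(len(dedup_instructions(samples)))  # 1
```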

You can also use the LLM-Instruction Expansion (DLC) component in PAI-Designer to achieve this without code. For details, see Custom pipelines.

Optional: Use an instruction optimization model

Instruction optimization is another common prompt engineering technique for LLMs. It automatically refines user-provided instruction datasets to generate more detailed, structured instructions—leading to richer LLM responses.

  • For example, given this input to the instruction optimization model:

    How do I cook fish-flavored shredded pork?
    How do I prepare for the GRE exam?
    What should I do if a friend misunderstands me?
  • The model outputs something like this:

    Provide a detailed Sichuan-style recipe for fish-flavored shredded pork. Include a specific ingredient list—vegetables, pork, and seasonings—along with step-by-step cooking instructions. Also recommend suitable side dishes and staple foods to serve with it.
    Provide a comprehensive guide covering GRE registration, required documents, study strategies, and recommended review materials. Also suggest effective practice questions and mock exams to help me prepare.
    Provide a detailed guide on staying calm and rational when misunderstood by a friend—and communicating effectively to resolve it. Include practical advice—for example, how to express your thoughts and feelings, how to avoid escalating misunderstandings, and specific dialogue scenarios and situations for practice.

Instruction detail directly affects LLM output quality. Optimizing instructions improves final student model performance. Based on the Qwen2 base model, PAI provides two proprietary instruction optimization models: Qwen2-1.5B-Instruct-Refine and Qwen2-7B-Instruct-Refine. You can deploy either as an EAS online service with one click. Follow these steps:

Deploy the model service

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.

    4. In the navigation pane on the left, choose Getting Started > Model Gallery.

  2. In the model list on the right side of the Model Gallery page, search for Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine. Click Deploy on the corresponding card.

  3. In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.

  4. In the Billing Notice dialog box, click OK.

    The system opens the Deployment Task page. When the Status shows Running, deployment succeeds.

Call the model service

After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog box, find the endpoint and token. Save them locally.

  2. In your terminal, create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "eos_token_id": 151645
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=2)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=256)
        parser.add_argument("--temperature", type=float, default=0.5)
        parser.add_argument("--prompt", type=str, default="Sing me a song.")
    
        args = parser.parse_args()
        prompt = args.prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        system_prompt = "Optimize this instruction to make it more detailed and specific."
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)
    

    Where:

    • host: Set to your service endpoint.

    • authorization: Set to your service token.

Batch instruction optimization

You can use the EAS online service above to batch-process instructions. The following example reads a custom JSON dataset and calls the model API to optimize instructions. Create and run this Python script in your terminal:

import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename

with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []

for d in data:
    prompt = d["instruction"]
    system_prompt = "Optimize the following instruction."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json"  # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)

Where:

  • host: Set to your service endpoint.

  • authorization: Set to your service token.

  • input_file_path and output_file_path: Replace with the local paths to your input dataset file and the optimized output file.

  • The post_http_request and get_response functions match those defined in the Call the model service Python script.
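For long batch runs, a transient network or service error will abort the loop partway through. A defensive wrapper around the HTTP call, sketched below with an assumed retry policy (this is not part of the PAI API), can make the batch scripts in this topic more robust:

```python
import time
import requests

def post_with_retry(host, headers, payload, retries=3, backoff=2.0):
    """POST with simple exponential backoff; re-raises after the final failed attempt."""
    for attempt in range(retries):
        try:
            response = requests.post(host, headers=headers, json=payload, timeout=60)
            response.raise_for_status()  # surface HTTP 4xx/5xx responses as exceptions
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))  # wait 2s, 4s, ... between attempts
```

You could then call post_with_retry in place of the plain requests.post call inside post_http_request.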

You can also use the LLM-Instruction Optimization (DLC) component in PAI-Designer to achieve this without code. For details, see Custom pipelines.

Deploy a teacher LLM to generate responses

Deploy the model service

After optimizing your instruction dataset, deploy a teacher LLM to generate responses. Follow these steps:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.

    4. In the navigation pane on the left, choose Getting Started > Model Gallery.

  2. In the model list on the right side of the Model Gallery page, search for Qwen2-72B-Instruct. Click Deploy on the corresponding card.

  3. In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.

  4. In the Billing Notice dialog box, click OK.

    The system opens the Deployment Task page. When the Status shows Running, deployment succeeds.

Call the model service

After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:

  1. Get the service endpoint and token.

    1. On the Service Details page, click Basic Information, then click View Endpoint Information.

    2. In the Endpoint Information dialog box, find the endpoint and token. Save them locally.

  2. In your terminal, create and run this Python script:

    import argparse
    import json
    import requests
    from typing import List
    
    
    def post_http_request(prompt: str,
                          system_prompt: str,
                          host: str,
                          authorization: str,
                          max_new_tokens: int,
                          temperature: float,
                          top_k: int,
                          top_p: float) -> requests.Response:
        headers = {
            "User-Agent": "Test Client",
            "Authorization": f"{authorization}"
        }
        pload = {
            "prompt": prompt,
            "system_prompt": system_prompt,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
        }
        response = requests.post(host, headers=headers, json=pload)
        return response
    
    
    def get_response(response: requests.Response) -> List[str]:
        data = json.loads(response.content)
        output = data["response"]
        return output
    
    
    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--top-k", type=int, default=50)
        parser.add_argument("--top-p", type=float, default=0.95)
        parser.add_argument("--max-new-tokens", type=int, default=2048)
        parser.add_argument("--temperature", type=float, default=0.5)
        parser.add_argument("--prompt", type=str)
        parser.add_argument("--system_prompt", type=str)
    
        args = parser.parse_args()
        prompt = args.prompt
        system_prompt = args.system_prompt
        top_k = args.top_k
        top_p = args.top_p
        temperature = args.temperature
        max_new_tokens = args.max_new_tokens
    
        host = "EAS HOST"
        authorization = "EAS TOKEN"
    
        print(f" --- input: {prompt}\n", flush=True)
        response = post_http_request(
            prompt, system_prompt,
            host, authorization,
            max_new_tokens, temperature, top_k, top_p)
        output = get_response(response)
        print(f" --- output: {output}\n", flush=True)
    

    Where:

    • host: Set to your service endpoint.

    • authorization: Set to your service token.

Batch teacher model instruction annotation

The following example reads a custom JSON dataset and calls the model API to annotate instructions using the teacher model. Create and run this Python script in your terminal:

import json 
from tqdm import tqdm
import requests
from typing import List

input_file_path = "input.json"  # Input filename

with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []

for d in data:
    system_prompt = "You are a helpful assistant."
    prompt = d["instruction"]
    print(prompt)
    top_k = 50
    top_p = 0.95
    temperature = 0.5
    max_new_tokens = 2048

    host = "EAS HOST"
    authorization = "EAS TOKEN"

    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": prompt,
        "output": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json" # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)

Where:

  • host: Set to your service endpoint.

  • authorization: Set to your service token.

  • input_file_path and output_file_path: Replace with the local paths to your input dataset file and the annotated output file.

  • The post_http_request and get_response functions match those defined in the Call the model service script.
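Before training on the annotated dataset, it is worth filtering out pairs where the teacher returned an empty or trivially short response. A sketch, with an assumed minimum-length threshold:

```python
def filter_pairs(samples, min_output_chars=10):
    """Keep only instruction-output pairs with a non-trivial teacher response."""
    return [
        s for s in samples
        if s.get("output", "").strip() and len(s["output"].strip()) >= min_output_chars
    ]

samples = [
    {"instruction": "Explain LoRA.", "output": "LoRA injects low-rank adapter matrices into the model's layers."},
    {"instruction": "Explain QLoRA.", "output": ""},  # dropped: empty teacher response
]
print(len(filter_pairs(samples)))  # 1
```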

Distill-train a smaller student model

Train the model

After obtaining teacher model responses, train a student model in the PAI-Model Gallery—no coding required. This greatly simplifies model development. This solution uses the Qwen2-7B-Instruct model as an example. Follow these steps to train a model using your prepared dataset in the PAI-Model Gallery:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the top-left corner, select your region.

    3. In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.

    4. In the navigation pane on the left, choose Getting Started > Model Gallery.

  2. In the model list on the right side of the Model Gallery page, search for and click the Qwen2-7B-Instruct model card to open its details page.

  3. On the model details page, click Fine-tune in the upper-right corner.

  4. In the Fine-tune configuration panel, set these key parameters. Leave others at default.

    Dataset configuration

    • Training dataset: Select OSS file or directory from the dropdown, then select your dataset's OSS storage path:

      1. Click the path selector icon and select your OSS bucket.

      2. Click Upload file and upload your dataset file to the OSS directory by following the console guidance.

      3. Click OK.

      Default value: none.

    Training output configuration

    • model: Click the path selector icon, then select an OSS storage directory. Default value: none.

    • tensorboard: Click the path selector icon, then select an OSS storage directory. Default value: none.

    Compute resource configuration

    • Job resources: Select a resource specification. The system recommends suitable options. Default value: none.

    Hyperparameter configuration

    • learning_rate: Learning rate for model training. Type: Float. Default value: 5e-5.

    • num_train_epochs: Number of training epochs. Type: Int. Default value: 1.

    • per_device_train_batch_size: Number of training samples per GPU per iteration. Type: Int. Default value: 1.

    • seq_length: Text sequence length. Type: Int. Default value: 128.

    • lora_dim: LoRA dimension. Type: Int. If lora_dim > 0, LoRA or QLoRA lightweight training is used. Default value: 32.

    • lora_alpha: LoRA weight. Type: Int. Takes effect only if lora_dim > 0 and LoRA or QLoRA lightweight training is used. Default value: 32.

    • load_in_4bit: Whether to load the model in 4-bit mode. Type: Bool. Valid values: true, false. If lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, 4-bit QLoRA lightweight training is used. Default value: true.

    • load_in_8bit: Whether to load the model in 8-bit mode. Type: Bool. Valid values: true, false. If lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, 8-bit QLoRA lightweight training is used. Default value: false.

    • gradient_accumulation_steps: Number of gradient accumulation steps. Type: Int. Default value: 8.

    • apply_chat_template: Whether to combine training data with the default chat template to optimize model output. Type: Bool. Valid values: true, false. For Qwen2 series models, the format is:

      • Question: <|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n

      • Answer: <|im_start|>assistant\n + output + <|im_end|>\n

      Default value: true.

    • system_prompt: System prompt used during model training. Type: String. Default value: You are a helpful assistant.

  5. After setting parameters, click Train.

  6. In the Billing Notice dialog box, click OK.

    The system opens the training task page.
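Note that with the hyperparameters above, the effective global batch size is per_device_train_batch_size × number of GPUs × gradient_accumulation_steps. A quick check using the default values, assuming a single-GPU job:

```python
per_device_train_batch_size = 1  # default value
gradient_accumulation_steps = 8  # default value
num_gpus = 1  # assumption: single-GPU job

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 8
```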

Deploy the model service

After training completes, deploy the model as an EAS online service:

  1. On the training task page, click Deploy on the right side.

  2. In the deployment configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.

  3. In the Billing Notice dialog box, click OK.

    The system opens the Deployment Task page. When the Status shows Running, deployment succeeds.

Call the model service

After successful deployment, use the API to run inference. For full usage, see Deploy large language models.

References

  • For more information about EAS, see EAS overview.

  • Using PAI-Model Gallery, you can easily deploy and fine-tune models for many scenarios—including Llama-3, Qwen1.5, and Stable Diffusion V1.5. For details, see Model Gallery use case collection.