Training and inference for large language models (LLMs) consume significant energy and result in long response times. These limitations hinder LLM deployment in resource-constrained environments. To address them, PAI provides model distillation—a technique that transfers knowledge from a large teacher model to a smaller student model. This preserves most of the original performance while substantially reducing model size and compute resource requirements, enabling broader real-world adoption. This topic walks you through the end-to-end development workflow for LLM data augmentation and model distillation using the Qwen2 LLM.
Workflow
Follow this complete development workflow:
- Prepare your training dataset according to the required data format and recommended data preparation strategies.
- Optional: Use an instruction augmentation model. In the PAI-Model Gallery, use a prebuilt instruction augmentation model—Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp. These models automatically generate additional instructions semantically similar to those in your dataset. Instruction augmentation improves generalization during LLM distillation training.
- Optional: Use an instruction optimization model. In the PAI-Model Gallery, use a prebuilt instruction optimization model—Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine. These models refine and enrich instructions from your dataset—including augmented ones—to improve LLM text generation quality.
- Deploy a teacher LLM to generate responses. In the PAI-Model Gallery, use a prebuilt teacher LLM—Qwen2-72B-Instruct—to generate responses for the instructions in your training dataset. This step distills knowledge from the teacher model into your dataset.
- Distill-train a smaller student model. In the PAI-Model Gallery, use your completed instruction-response dataset to distill-train a smaller student model suitable for production deployment.
Prerequisites
Before you begin, confirm you have completed these steps:
- You have activated the pay-as-you-go billing method for PAI (including DLC and EAS) and created a default workspace. For more information, see Activate PAI and create a default workspace.
- You have created an Object Storage Service (OSS) bucket to store training data and resulting model files. For more information about how to create a bucket, see Quick Start.
Prepare instruction data
Prepare instruction data according to the following data preparation strategy and data format requirements:
Data preparation strategy
To improve the effectiveness and stability of model distillation, you can use the following strategies to prepare your data:
- Prepare at least several hundred data points. Preparing more data improves model performance.
- The seed dataset should have a broad and balanced distribution. For example, task scenarios should be diverse, data inputs and outputs should include both short and long examples, and if the dataset contains multiple languages—such as Chinese and English—the language distribution should be relatively balanced.
- Process abnormal data. Even small amounts can significantly impact fine-tuning results. Use rule-based methods to clean the data and filter out invalid entries.
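The rule-based cleaning mentioned above can be automated. The following is a minimal sketch, not a PAI utility; the length thresholds are illustrative assumptions that you should tune for your own data:

```python
import json


def clean_instructions(records, min_len=5, max_len=2000):
    """Rule-based cleaning: drop empty, too-short, too-long, and duplicate instructions."""
    seen = set()
    cleaned = []
    for r in records:
        text = str(r.get("instruction", "")).strip()
        if not (min_len <= len(text) <= max_len):
            continue  # filter out empty, too-short, or too-long entries
        if text in seen:
            continue  # drop exact duplicates
        seen.add(text)
        cleaned.append({"instruction": text})
    return cleaned


raw = [
    {"instruction": "What is model distillation?"},
    {"instruction": ""},                              # invalid: empty
    {"instruction": "What is model distillation?"},   # duplicate
]
print(json.dumps(clean_instructions(raw), ensure_ascii=False))
# → [{"instruction": "What is model distillation?"}]
```

Stronger cleaning—language detection, toxicity filtering, near-duplicate removal—follows the same pattern of filtering the list before writing it back to JSON.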
Data format requirements
Your training dataset must be a JSON file containing a list of objects, each with a single field, instruction, which holds the input instruction. Example:
```json
[
    {
        "instruction": "What major measures did governments take to stabilize financial markets during the 2008 financial crisis?"
    },
    {
        "instruction": "What important actions have governments taken to promote sustainable development amid worsening climate change?"
    },
    {
        "instruction": "What major measures did governments take to support economic recovery during the 2001 tech bubble burst?"
    }
]
```
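Before uploading, you can sanity-check that your file matches this format. A minimal validation sketch; the file name is an assumption:

```python
import json


def validate_dataset(path):
    """Check that the file is a JSON list of objects, each with a non-empty 'instruction' string."""
    with open(path) as fp:
        data = json.load(fp)
    assert isinstance(data, list) and data, "dataset must be a non-empty JSON list"
    for i, item in enumerate(data):
        assert isinstance(item, dict), f"entry {i} is not an object"
        assert isinstance(item.get("instruction"), str) and item["instruction"].strip(), \
            f"entry {i} is missing a non-empty 'instruction' field"
    return len(data)


# Example: print(validate_dataset("input.json"))
```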
Optional: Use an instruction augmentation model
Instruction augmentation is a common prompt engineering technique for LLMs. It automatically expands user-provided instruction datasets to increase diversity and coverage.
- For example, given this input:
  - How do I cook fish-flavored shredded pork?
  - How do I prepare for the GRE exam?
  - What should I do if a friend misunderstands me?
- The model outputs something like this:
  - Teach me how to cook mapo tofu.
  - Provide a detailed guide for preparing for the TOEFL exam.
  - If you face setbacks at work, how do you adjust your mindset?
Instruction diversity directly affects LLM generalization. Augmenting instructions improves final student model performance. Based on the Qwen2 base model, PAI provides two proprietary instruction augmentation models: Qwen2-1.5B-Instruct-Exp and Qwen2-7B-Instruct-Exp. You can deploy either as an EAS online service with one click. Follow these steps:
Deploy the model service
Deploy the instruction augmentation model as an EAS online service:
- Go to the Model Gallery page:
  - Log on to the PAI console.
  - In the top-left corner, select your region.
  - In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.
  - In the navigation pane on the left, choose Model Gallery.
- In the model list on the right side of the Model Gallery page, search for Qwen2-1.5B-Instruct-Exp or Qwen2-7B-Instruct-Exp. Click Deploy on the corresponding card.
- In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.
- In the Billing Notice dialog box, click OK.
  The system opens the Deployment Task page. When the Status shows Running, the deployment has succeeded.
Call the model service
After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:
- Get the service endpoint and token:
  - On the Service Details page, click Basic Information, then click View Endpoint Information.
  - In the Endpoint Information dialog box, find the endpoint and token. Save them locally.
- In your terminal, create and run this Python script:
```python
import argparse
import json
import requests
from typing import List


def post_http_request(prompt: str, system_prompt: str, host: str,
                      authorization: str, max_new_tokens: int,
                      temperature: float, top_k: int,
                      top_p: float) -> requests.Response:
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "eos_token_id": 151645
    }
    response = requests.post(host, headers=headers, json=pload)
    return response


def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    output = data["response"]
    return output


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=50)
    parser.add_argument("--top-p", type=float, default=0.95)
    parser.add_argument("--max-new-tokens", type=int, default=2048)
    parser.add_argument("--temperature", type=float, default=1)
    parser.add_argument("--prompt", type=str, default="Sing me a song.")
    args = parser.parse_args()

    prompt = args.prompt
    top_k = args.top_k
    top_p = args.top_p
    temperature = args.temperature
    max_new_tokens = args.max_new_tokens
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    print(f" --- input: {prompt}\n", flush=True)
    system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
    response = post_http_request(
        prompt, system_prompt, host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    print(f" --- output: {output}\n", flush=True)
```

Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
Batch instruction augmentation
You can use the EAS online service above to batch-process instructions. The following example reads a custom JSON dataset and calls the model API to augment instructions. Create and run this Python script in your terminal:
```python
import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename
with open(input_file_path) as fp:
    data = json.load(fp)

total_size = 10  # Target total number of samples after expansion
pbar = tqdm(total=total_size)
while len(data) < total_size:
    prompt = random.sample(data, 1)[0]["instruction"]
    system_prompt = "You are an instruction creator. Your goal is to create a new instruction inspired by the [given instruction]."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json"  # Output filename
with open(output_file_path, 'w') as f:
    json.dump(data, f, ensure_ascii=False)
```
Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
- input_file_path and output_file_path: Replace with the local paths to your input and output dataset files.
- The post_http_request and get_response functions match those defined in the Call the model service Python script.
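Long batch runs can hit transient network errors. A small retry wrapper around the post_http_request function from the Call the model service script can make them more robust. This is a sketch, not part of the PAI scripts; post_with_retry is a helper name introduced here:

```python
import time


def post_with_retry(post_fn, *args, retries=3, backoff=2.0):
    """Retry a request function on exceptions or non-200 responses, with exponential backoff."""
    for attempt in range(retries):
        try:
            resp = post_fn(*args)
            if resp.status_code == 200:
                return resp
        except Exception:
            pass  # transient failure; fall through to the backoff sleep
        time.sleep(backoff * (2 ** attempt))  # wait longer after each failed attempt
    raise RuntimeError(f"request failed after {retries} attempts")


# Usage (same positional arguments as post_http_request):
# response = post_with_retry(post_http_request, prompt, system_prompt,
#                            host, authorization, max_new_tokens,
#                            temperature, top_k, top_p)
```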
You can also use the LLM-Instruction Expansion (DLC) component in PAI-Designer to achieve this without code. For details, see Custom pipelines. 
Optional: Use an instruction optimization model
Instruction optimization is another common prompt engineering technique for LLMs. It automatically refines user-provided instruction datasets to generate more detailed, structured instructions—leading to richer LLM responses.
- For example, given this input to the instruction optimization model:
  - How do I cook fish-flavored shredded pork?
  - How do I prepare for the GRE exam?
  - What should I do if a friend misunderstands me?
- The model outputs something like this:
  - Provide a detailed Sichuan-style recipe for fish-flavored shredded pork. Include a specific ingredient list—vegetables, pork, and seasonings—along with step-by-step cooking instructions. Also recommend suitable side dishes and staple foods to serve with it.
  - Provide a comprehensive guide covering GRE registration, required documents, study strategies, and recommended review materials. Also suggest effective practice questions and mock exams to help me prepare.
  - Provide a detailed guide on staying calm and rational when misunderstood by a friend—and communicating effectively to resolve it. Include practical advice—for example, how to express your thoughts and feelings, how to avoid escalating misunderstandings, and specific dialogue scenarios and situations for practice.
Instruction detail directly affects LLM output quality. Optimizing instructions improves final student model performance. Based on the Qwen2 base model, PAI provides two proprietary instruction optimization models: Qwen2-1.5B-Instruct-Refine and Qwen2-7B-Instruct-Refine. You can deploy either as an EAS online service with one click. Follow these steps:
Deploy the model service
- Go to the Model Gallery page:
  - Log on to the PAI console.
  - In the top-left corner, select your region.
  - In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.
  - In the navigation pane on the left, choose Model Gallery.
- In the model list on the right side of the Model Gallery page, search for Qwen2-1.5B-Instruct-Refine or Qwen2-7B-Instruct-Refine. Click Deploy on the corresponding card.
- In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.
- In the Billing Notice dialog box, click OK.
  The system opens the Deployment Task page. When the Status shows Running, the deployment has succeeded.
Call the model service
After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:
- Get the service endpoint and token:
  - On the Service Details page, click Basic Information, then click View Endpoint Information.
  - In the Endpoint Information dialog box, find the endpoint and token. Save them locally.
- In your terminal, create and run this Python script:
```python
import argparse
import json
import requests
from typing import List


def post_http_request(prompt: str, system_prompt: str, host: str,
                      authorization: str, max_new_tokens: int,
                      temperature: float, top_k: int,
                      top_p: float) -> requests.Response:
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "eos_token_id": 151645
    }
    response = requests.post(host, headers=headers, json=pload)
    return response


def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    output = data["response"]
    return output


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=2)
    parser.add_argument("--top-p", type=float, default=0.95)
    parser.add_argument("--max-new-tokens", type=int, default=256)
    parser.add_argument("--temperature", type=float, default=0.5)
    parser.add_argument("--prompt", type=str, default="Sing me a song.")
    args = parser.parse_args()

    prompt = args.prompt
    top_k = args.top_k
    top_p = args.top_p
    temperature = args.temperature
    max_new_tokens = args.max_new_tokens
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    print(f" --- input: {prompt}\n", flush=True)
    system_prompt = "Optimize this instruction to make it more detailed and specific."
    response = post_http_request(
        prompt, system_prompt, host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    print(f" --- output: {output}\n", flush=True)
```

Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
Batch instruction optimization
You can use the EAS online service above to batch-process instructions. The following example reads a custom JSON dataset and calls the model API to optimize instructions. Create and run this Python script in your terminal:
```python
import requests
import json
import random
from tqdm import tqdm
from typing import List

input_file_path = "input.json"  # Input filename
with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []
for d in data:
    prompt = d["instruction"]
    system_prompt = "Optimize the following instruction."
    top_k = 50
    top_p = 0.95
    temperature = 1
    max_new_tokens = 2048
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json"  # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)
```
Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
- input_file_path and output_file_path: Replace with the local paths to your input and output dataset files.
- The post_http_request and get_response functions match those defined in the Call the model service Python script.
You can also use the LLM-Instruction Optimization (DLC) component in PAI-Designer to achieve this without code. For details, see Custom pipelines. 
Deploy a teacher LLM to generate responses
Deploy the model service
After optimizing your instruction dataset, deploy a teacher LLM to generate responses. Follow these steps:
- Go to the Model Gallery page:
  - Log on to the PAI console.
  - In the top-left corner, select your region.
  - In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.
  - In the navigation pane on the left, choose Model Gallery.
- In the model list on the right side of the Model Gallery page, search for Qwen2-72B-Instruct. Click Deploy on the corresponding card.
- In the Deploy configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.
- In the Billing Notice dialog box, click OK.
  The system opens the Deployment Task page. When the Status shows Running, the deployment has succeeded.
Call the model service
After successful deployment, use the API to run inference. For full usage, see Deploy large language models. Below is an example client request:
- Get the service endpoint and token:
  - On the Service Details page, click Basic Information, then click View Endpoint Information.
  - In the Endpoint Information dialog box, find the endpoint and token. Save them locally.
- In your terminal, create and run this Python script:
```python
import argparse
import json
import requests
from typing import List


def post_http_request(prompt: str, system_prompt: str, host: str,
                      authorization: str, max_new_tokens: int,
                      temperature: float, top_k: int,
                      top_p: float) -> requests.Response:
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
    }
    response = requests.post(host, headers=headers, json=pload)
    return response


def get_response(response: requests.Response) -> List[str]:
    data = json.loads(response.content)
    output = data["response"]
    return output


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=50)
    parser.add_argument("--top-p", type=float, default=0.95)
    parser.add_argument("--max-new-tokens", type=int, default=2048)
    parser.add_argument("--temperature", type=float, default=0.5)
    parser.add_argument("--prompt", type=str)
    parser.add_argument("--system_prompt", type=str)
    args = parser.parse_args()

    prompt = args.prompt
    system_prompt = args.system_prompt
    top_k = args.top_k
    top_p = args.top_p
    temperature = args.temperature
    max_new_tokens = args.max_new_tokens
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    print(f" --- input: {prompt}\n", flush=True)
    response = post_http_request(
        prompt, system_prompt, host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    print(f" --- output: {output}\n", flush=True)
```

Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
Batch teacher model instruction annotation
The following example reads a custom JSON dataset and calls the model API to annotate instructions using the teacher model. Create and run this Python script in your terminal:
```python
import json
from tqdm import tqdm
import requests
from typing import List

input_file_path = "input.json"  # Input filename
with open(input_file_path) as fp:
    data = json.load(fp)

pbar = tqdm(total=len(data))
new_data = []
for d in data:
    system_prompt = "You are a helpful assistant."
    prompt = d["instruction"]
    print(prompt)
    top_k = 50
    top_p = 0.95
    temperature = 0.5
    max_new_tokens = 2048
    host = "EAS HOST"
    authorization = "EAS TOKEN"
    response = post_http_request(
        prompt, system_prompt,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p)
    output = get_response(response)
    temp = {
        "instruction": prompt,
        "output": output
    }
    new_data.append(temp)
    pbar.update(1)
pbar.close()

output_file_path = "output.json"  # Output filename
with open(output_file_path, 'w') as f:
    json.dump(new_data, f, ensure_ascii=False)
```
Where:
- host: Set to your service endpoint.
- authorization: Set to your service token.
- input_file_path and output_file_path: Replace with the local paths to your input and output dataset files.
- The post_http_request and get_response functions match those defined in the Call the model service script.
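After batch annotation, every record should contain both the instruction and the teacher's response before you start distillation training. A quick check you can run on the annotated file; the function name and file path are illustrative, and the field names follow the batch script above:

```python
import json


def check_pairs(path):
    """Return (total record count, indices of records missing a non-empty
    'instruction' or 'output' field) for an annotated JSON dataset."""
    with open(path) as fp:
        data = json.load(fp)
    bad = [i for i, d in enumerate(data)
           if not (str(d.get("instruction", "")).strip()
                   and str(d.get("output", "")).strip())]
    return len(data), bad


# Example: total, bad = check_pairs("output.json")
# Records listed in `bad` should be re-annotated or removed before training.
```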
Distill-train a smaller student model
Train the model
After obtaining teacher model responses, train a student model in the PAI-Model Gallery—no coding required. This greatly simplifies model development. This solution uses the Qwen2-7B-Instruct model as an example. Follow these steps to train a model using your prepared dataset in the PAI-Model Gallery:
- Go to the Model Gallery page:
  - Log on to the PAI console.
  - In the top-left corner, select your region.
  - In the navigation pane on the left, choose Workspaces, then click your workspace name to open it.
  - In the navigation pane on the left, choose Model Gallery.
- In the model list on the right side of the Model Gallery page, search for and click the Qwen2-7B-Instruct model card to open its details page.
- On the model details page, click Fine-tune in the upper-right corner.
- In the Fine-tune configuration panel, set these key parameters. Leave others at default.
| Parameter | Description | Default value |
| --- | --- | --- |
| **Dataset configuration** | | |
| Training dataset | Select OSS file or directory from the dropdown, then select your dataset's OSS storage path: click the icon and select your OSS bucket, click Upload file to upload your dataset file to the OSS directory using the console guidance, and click OK. | None |
| **Training output configuration** | | |
| model | Click the icon, then select your OSS storage directory. | None |
| tensorboard | Click the icon, then select your OSS storage directory. | None |
| **Compute resource configuration** | | |
| Job resources | Select a resource specification. The system recommends suitable options. | None |
| **Hyperparameter configuration** | | |
| learning_rate | Learning rate for model training. Type: Float. | 5e-5 |
| num_train_epochs | Number of training epochs. Type: Int. | 1 |
| per_device_train_batch_size | Number of training samples per GPU per iteration. Type: Int. | 1 |
| seq_length | Text sequence length. Type: Int. | 128 |
| lora_dim | LoRA dimension. Type: Int. If lora_dim > 0, LoRA or QLoRA lightweight training is used. | 32 |
| lora_alpha | LoRA weight. Type: Int. Takes effect only if lora_dim > 0 and LoRA or QLoRA lightweight training is used. | 32 |
| load_in_4bit | Whether to load the model in 4-bit mode. Type: Bool. Valid values: true, false. If lora_dim > 0, load_in_4bit is true, and load_in_8bit is false, 4-bit QLoRA lightweight training is used. | true |
| load_in_8bit | Whether to load the model in 8-bit mode. Type: Bool. Valid values: true, false. If lora_dim > 0, load_in_4bit is false, and load_in_8bit is true, 8-bit QLoRA lightweight training is used. | false |
| gradient_accumulation_steps | Number of gradient accumulation steps. Type: Int. | 8 |
| apply_chat_template | Whether to combine training data with the default chat template to optimize model output. Type: Bool. Valid values: true, false. See the note below for the Qwen2 format. | true |
| system_prompt | System prompt used during model training. Type: String. | You are a helpful assistant |

For Qwen2 series models, apply_chat_template formats the data as follows:
- Question: <|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n
- Answer: <|im_start|>assistant\n + output + <|im_end|>\n
After setting parameters, click Train.
-
In the Billing Notice dialog box, click OK.
The system opens the training task page.
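A note on the hyperparameters above: the per-GPU batch size and the gradient accumulation steps jointly determine the effective batch size per optimizer update. A small arithmetic sketch with the default values; num_gpus is an assumption that depends on the job resources you selected:

```python
# Effective batch size = samples per GPU per step x accumulation steps x GPU count.
per_device_train_batch_size = 1   # default from the table above
gradient_accumulation_steps = 8   # default from the table above
num_gpus = 1                      # assumption: single-GPU job

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # → 8
```

If you raise per_device_train_batch_size, you can lower gradient_accumulation_steps proportionally to keep the same effective batch size while using more GPU memory per step.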
Deploy the model service
After training completes, deploy the model as an EAS online service:
- On the training task page, click Deploy on the right side.
- In the deployment configuration panel, the system sets default values for Model Service Information and Resource Deployment Information. Modify them as needed. Then click Deploy.
- In the Billing Notice dialog box, click OK.
  The system opens the Deployment Task page. When the Status shows Running, the deployment has succeeded.
Call the model service
After successful deployment, use the API to run inference. For full usage, see Deploy large language models.
References
- For more information about EAS, see EAS overview.
- Using PAI-Model Gallery, you can easily deploy and fine-tune models for many scenarios—including Llama-3, Qwen1.5, and Stable Diffusion V1.5. For details, see Model Gallery use case collection.