
Platform For AI: Deploy and fine-tune a model from Model Gallery

Last Updated: May 27, 2026

Model Gallery lets you deploy and fine-tune open-source LLMs without writing code. This guide walks through the full workflow — deployment, invocation, fine-tuning, and evaluation — using Qwen3-0.6B as an example.

Prerequisites

Activate PAI and create a workspace with your Alibaba Cloud account. Log in to the PAI console, select a region, and complete the one-click authorization.

Billing

The examples in this guide use pay-as-you-go public resources to create PAI-DLC tasks and PAI-EAS services. For billing rules, see PAI-DLC billing and PAI-EAS billing.

Model deployment

Deploy a model

  1. Log in to the PAI console. In the left-side navigation pane, click Model Gallery, find Qwen3-0.6B, and click Deploy.

  2. The deployment page is pre-populated with default parameters. Click Deploy > Confirm. Deployment takes about 5 minutes and is complete when the status changes to In operation.

    By default, the model service uses public resources and the pay-as-you-go billing method.

Invoke the model

  1. On the service details page, click View Call Information to get the Internet Endpoint and Token.

    To view deployment job details later, go to Model Gallery > Job Management > Deployment Jobs in the navigation pane, and then click the Service name.

  2. Test the model service by using one of the following methods:

    Online debugging

    Go to the Online Debugging tab. The LLM service supports Conversation Debugging and API Debugging.

    Cherry Studio client

    Cherry Studio is a popular LLM client with built-in MCP support.

    Connect to the Qwen3 model deployed on PAI

    Python SDK

    import os
    import sys

    from openai import OpenAI

    # If an environment variable is not set, replace the next line with your EAS service Token: token = '<YOUR_EAS_SERVICE_TOKEN>'
    token = os.environ.get("Token")

    # Check the token before creating the client so that a missing token fails with a clear message.
    if token is None:
        print("Please set the Token environment variable, or assign the Token value directly to the 'token' variable.")
        sys.exit(1)

    # Do not remove "/v1" from the end of your Internet Endpoint.
    client = OpenAI(
        api_key=token,
        base_url='<YOUR_INTERNET_ENDPOINT>/v1',
    )

    query = 'Hello, who are you?'
    messages = [{'role': 'user', 'content': query}]

    # temperature=0 keeps the output as deterministic as the service allows.
    resp = client.chat.completions.create(model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0)
    response = resp.choices[0].message.content
    print(f'query: {query}')
    print(f'response: {response}')
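
Because the client expects the base URL to end in "/v1", it can help to normalize the endpoint you copied from View Call Information before passing it to the SDK. The following is a minimal sketch: the helper name build_base_url and the example URL are illustrative assumptions, not part of PAI or the OpenAI SDK.

```python
def build_base_url(endpoint: str) -> str:
    """Return the endpoint with exactly one trailing '/v1' path segment."""
    # Drop surrounding whitespace and any trailing slashes first.
    base = endpoint.strip().rstrip("/")
    # Append '/v1' only when it is not already present.
    if not base.endswith("/v1"):
        base += "/v1"
    return base


# Hypothetical endpoint, used only to show the normalization.
print(build_base_url("http://example.invalid/api/predict/my_service/"))
```

Pass the returned string as base_url when you create the client.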

Clean up resources

The model service uses pay-as-you-go public resources. Stop or delete the service when you no longer need it to avoid further charges.

Model fine-tuning

Fine-tuning adapts a model to a specific domain using a domain-specific dataset. The following example demonstrates a typical fine-tuning workflow.

Use case

In logistics, extracting structured data (recipient names, addresses, phone numbers) from free text is common. Large models like Qwen3-235B-A22B excel at this but are costly. A practical approach is to label data with the large model, then fine-tune a smaller model (Qwen3-0.6B) to match its performance at a fraction of the cost. This is known as model distillation.

On this task, the original Qwen3-0.6B achieves 50% accuracy. After fine-tuning, accuracy exceeds 90%.

Example recipient address information:

Amina Patel - Phone number (474) 598-1543 - 1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104

Example structured information:

{
    "state": "Pennsylvania",
    "city": "Allentown",
    "zip_code": "18104",
    "street_address": "1425 S 5th St, Apt 3B",
    "name": "Amina Patel",
    "phone": "(474) 598-1543"
}

Data preparation

To distill knowledge from the teacher model (Qwen3-235B-A22B) to Qwen3-0.6B, use the teacher model's API to extract recipient addresses into structured JSON. Because generating this data is time-consuming, this guide provides a sample training dataset, train.json, and a validation set, eval.json.

The data used in this guide is synthetically generated and contains no sensitive user information.

Going live

To apply this solution to your business, prepare data using one of the following methods:

Real business scenarios (recommended)

Real business data reflects your scenarios more closely, so a model fine-tuned on it adapts better to your business. After you obtain the business data, convert it programmatically into a JSON file in the following format.

[
    {
        "instruction": "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone.  Name: Isabella Rivera Cruz | 182 Calle Luis Lloréns Torres, Apt 3B, Mayagüez, Puerto Rico 00680 | MOBILE: (640) 486-5927",
        "output": "{\"name\": \"Isabella Rivera Cruz\", \"street_address\": \"182 Calle Luis Lloréns Torres, Apt 3B\", \"city\": \"Mayagüez\", \"state\": \"Puerto Rico\", \"zip_code\": \"00680\", \"phone\": \"(640) 486-5927\"}"
    },
    {
        "instruction": "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone.  1245 Broadwater Avenue, Apt 3B, Bozeman, Montana 59715Receiver: Aisha PatelP: (429) 763-9742",
        "output": "{\"name\": \"Aisha Patel\", \"street_address\": \"1245 Broadwater Avenue, Apt 3B\", \"city\": \"Bozeman\", \"state\": \"Montana\", \"zip_code\": \"59715\", \"phone\": \"(429) 763-9742\"}"
    }
]

The JSON file contains multiple training samples. Each sample includes two fields: instruction and output.

  • instruction: Contains the prompt that guides the behavior of the large model, along with the input data.

  • output: The expected standard answer, usually generated by human experts or larger models such as qwen3-235b-a22b.
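
The conversion described above can be sketched as follows. This is a minimal illustration, assuming that each business record is available as a raw text string plus a dictionary of labeled fields. The names to_sft_sample and records are hypothetical; the prompt text and the two-space separator follow the sample file shown above.

```python
import json

# Prompt text taken from the sample training data in this guide.
SYSTEM_PROMPT = (
    "You are an expert assistant for extracting structured JSON from US shipping "
    "information. The JSON keys are name, street_address, city, state, zip_code, and phone."
)
FIELDS = ["name", "street_address", "city", "state", "zip_code", "phone"]


def to_sft_sample(raw_text: str, label: dict) -> dict:
    """Build one training sample with the 'instruction' and 'output' fields."""
    missing = [key for key in FIELDS if key not in label]
    if missing:
        raise ValueError(f"label is missing fields: {missing}")
    # Keep a fixed key order so that every sample is serialized the same way.
    ordered = {key: label[key] for key in FIELDS}
    return {
        "instruction": SYSTEM_PROMPT + "  " + raw_text,
        "output": json.dumps(ordered, ensure_ascii=False),
    }


# Hypothetical in-memory records; replace them with your own data source.
records = [
    (
        "Amina Patel - Phone number (474) 598-1543 - 1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104",
        {
            "name": "Amina Patel",
            "street_address": "1425 S 5th St, Apt 3B",
            "city": "Allentown",
            "state": "Pennsylvania",
            "zip_code": "18104",
            "phone": "(474) 598-1543",
        },
    ),
]

samples = [to_sft_sample(text, label) for text, label in records]
print(json.dumps(samples, ensure_ascii=False, indent=4))
```

Writing the samples list to a file with json.dump produces a file in the same shape as the sample above.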

Model generation

When business data is insufficient, consider using a model for data augmentation, which can improve the diversity and coverage of the data. To avoid exposing private user information, this solution uses a model to generate a batch of fictitious address data. The following generation code is provided for reference.

Code for simulating business data generation

To run the following code, you need to create an Alibaba Cloud Model Studio API key. The code uses qwen-plus-latest to generate business data and qwen3-235b-a22b for labeling.

# -*- coding: utf-8 -*-
import os
import asyncio
import random
import json
import sys
from typing import List
import platform
from openai import AsyncOpenAI

# Create an asynchronous client instance.
# NOTE: This script uses the DashScope-compatible API endpoint.
# If you are using a different OpenAI-compatible service, change the base_url.
client = AsyncOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# List of US States and Territories.
us_states = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware",
    "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
    "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
    "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
    "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming", "District of Columbia",
    "Puerto Rico", "Guam", "American Samoa", "U.S. Virgin Islands", "Northern Mariana Islands"
]

# Recipient templates.
recipient_templates = [
    "To: {name}", "Recipient: {name}", "Deliver to {name}", "For: {name}",
    "ATTN: {name}", "{name}", "Name: {name}", "Contact: {name}", "Receiver: {name}"
]

# Phone number templates.
phone_templates = [
    "Tel: {phone}", "Tel. {phone}", "Mobile: {phone}", "Phone: {phone}",
    "Contact number: {phone}", "Phone number {phone}", "TEL: {phone}", "MOBILE: {phone}",
    "Contact: {phone}", "P: {phone}", "{phone}", "Call: {phone}",
]


# Generate a plausible US-style phone number.
def generate_us_phone():
    """Generates a random 10-digit US phone number in (XXX) XXX-XXXX format."""
    area_code = random.randint(201, 999)  # Avoid 0xx, 1xx area codes.
    exchange = random.randint(200, 999)
    line = random.randint(1000, 9999)
    return f"({area_code}) {exchange}-{line}"


# Use LLM to generate recipient and address information.
async def generate_recipient_and_address_by_llm(state: str):
    """Uses LLM to generate a recipient's name and address details for a given state."""
    prompt = f"""Please generate recipient information for a location in {state}, USA, including the following:
1. A realistic full English name. Aim for diversity.
2. A real city name within that state.
3. A specific street address, such as street number and name, and apartment number. It should be realistic.
4. A corresponding 5-digit ZIP code for that city or area.

Please return only the JSON object in the following format:
{{"name": "Recipient Name", "city": "City Name", "street_address": "Specific Street Address", "zip_code": "ZIP Code"}}

Do not include any other text, just the JSON. Ensure names are diverse, not just John Doe.
"""

    try:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="qwen-plus-latest",
            temperature=1.5,  # Increase temperature for more diverse names and addresses.
        )

        result = response.choices[0].message.content.strip()
        # Clean up potential markdown code block markers.
        if result.startswith('```'):
            result = result.split('\n', 1)[1]
        if result.endswith('```'):
            result = result.rsplit('\n', 1)[0]

        # Try to parse JSON.
        info = json.loads(result)
        print(info)
        return info
    except Exception as e:
        print(f"Failed to generate recipient and address: {e}, using fallback.")
        # Fallback mechanism.
        backup_names = ["Michael Johnson", "Emily Williams", "David Brown", "Jessica Jones", "Christopher Davis",
                        "Sarah Miller"]
        return {
            "name": random.choice(backup_names),
            "city": "Anytown",
            "street_address": f"{random.randint(100, 9999)} Main St",
            "zip_code": f"{random.randint(10000, 99999)}"
        }


# Generate a single raw data record.
async def generate_record():
    """Generates one messy, combined string of US address information."""
    # Randomly select a state.
    state = random.choice(us_states)

    # Use LLM to generate recipient and address info.
    info = await generate_recipient_and_address_by_llm(state)

    # Format recipient name.
    recipient = random.choice(recipient_templates).format(name=info['name'])

    # Generate a phone number.
    phone = generate_us_phone()
    phone_info = random.choice(phone_templates).format(phone=phone)

    # Assemble the full address line.
    full_address = f"{info['street_address']}, {info['city']}, {state} {info['zip_code']}"

    # Combine all components.
    components = [recipient, phone_info, full_address]

    # Randomize the order of components.
    random.shuffle(components)

    # Choose a random separator.
    separators = [' ', ', ', '; ', ' | ', '\t', ' - ', ' // ', '', '  ']
    separator = random.choice(separators)

    # Join the components.
    combined_data = separator.join(components)
    return combined_data.strip()


# Generate a batch of data.
async def generate_batch_data(count: int) -> List[str]:
    """Generates a specified number of data records."""
    print(f"Starting to generate {count} records...")

    # Use a semaphore to control concurrency, for example, up to 20 concurrent requests.
    semaphore = asyncio.Semaphore(20)

    async def generate_single_record(index):
        async with semaphore:
            try:
                record = await generate_record()
                print(f"Generated record #{index + 1}: {record}")
                return record
            except Exception as e:
                print(f"Failed to generate record #{index + 1}: {e}")
                return None

    # Concurrently generate data.
    tasks = [generate_single_record(i) for i in range(count)]

    data = await asyncio.gather(*tasks)

    successful_data = [record for record in data if record is not None]

    return successful_data


# Save data to a file.
def save_data(data: List[str], filename: str = "us_recipient_data.json"):
    """Saves the generated data to a JSON file."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Data has been saved to {filename}")


# Phase 1: Data Production.
async def produce_data_phase():
    """Handles the generation of raw recipient data."""
    print("=== Phase 1: Starting Raw Recipient Data Generation ===")

    # Generate 2,000 records.
    batch_size = 2000
    data = await generate_batch_data(batch_size)

    # Save the data.
    save_data(data, "us_recipient_data.json")

    print(f"\nTotal records generated: {len(data)}")
    print("\nSample Data:")
    for i, record in enumerate(data[:3]):  # Show first 3 as examples.
        print(f"{i + 1}. Raw Data: {record}\n")

    print("=== Phase 1 Complete ===\n")
    return True


# Define the system prompt for the extraction model.
def get_system_prompt_for_extraction():
    """Returns the system prompt for the information extraction task."""
    return """You are a professional information extraction assistant specializing in parsing US shipping addresses from unstructured text.

## Task Description
Based on the given input text, accurately extract and generate a JSON object containing the following six fields:
- name: The full name of the recipient.
- street_address: The complete street address, including number, street name, and any apartment or suite number.
- city: The city name.
- state: The full state name (e.g., "California", not "CA").
- zip_code: The 5 or 9-digit ZIP code.
- phone: The complete contact phone number.

## Extraction Rules
1.  **Address Handling**:
    -   Accurately identify the components: street, city, state, and ZIP code.
    -   The `state` field must be the full official name (e.g., "New York", not "NY").
    -   The `street_address` should contain all details before the city, such as "123 Apple Lane, Apt 4B".
2.  **Name Identification**:
    -   Extract the full recipient name.
3.  **Phone Number Handling**:
    -   Extract the complete phone number, preserving its original format.
4.  **ZIP Code**:
    -   Extract the 5-digit or 9-digit (ZIP+4) code.

## Output Format
Strictly adhere to the following JSON format. Do not add any explanatory text or markdown.
{
  "name": "Recipient's Full Name",
  "street_address": "Complete Street Address",
  "city": "City Name",
  "state": "Full State Name",
  "zip_code": "ZIP Code",
  "phone": "Contact Phone Number"
}
"""


# Use LLM to predict structured data from raw text.
async def predict_structured_data(raw_data: str):
    """Uses an LLM to predict structured data from a raw string."""
    system_prompt = get_system_prompt_for_extraction()

    try:
        response = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw_data}
            ],
            model="qwen3-235b-a22b",  # A powerful model is recommended for this task.
            temperature=0.0,  # Lower temperature for higher accuracy in extraction.
            response_format={"type": "json_object"},
            extra_body={"enable_thinking": False}
        )

        result = response.choices[0].message.content.strip()

        # Clean up potential markdown code block markers.
        if result.startswith('```'):
            lines = result.split('\n')
            for i, line in enumerate(lines):
                if line.strip().startswith('{'):
                    result = '\n'.join(lines[i:])
                    break
        if result.endswith('```'):
            result = result.rsplit('\n```', 1)[0]

        structured_data = json.loads(result)
        return structured_data

    except Exception as e:
        print(f"Failed to predict structured data: {e}, Raw data: {raw_data}")
        # Return an empty structure on failure.
        return {
            "name": "",
            "street_address": "",
            "city": "",
            "state": "",
            "zip_code": "",
            "phone": ""
        }


# Phase 2: Data Conversion.
async def convert_data_phase():
    """Reads raw data, predicts structured format, and saves as SFT data."""
    print("=== Phase 2: Starting Data Conversion to SFT Format ===")

    try:
        print("Reading us_recipient_data.json file...")
        with open('us_recipient_data.json', 'r', encoding='utf-8') as f:
            raw_data_list = json.load(f)

        print(f"Successfully read {len(raw_data_list)} records.")
        print("Starting to predict structured data using the extraction model...")

        # A simple and clear system message can improve training and inference speed.
        system_prompt = "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone."
        output_file = 'us_recipient_sft_data.json'

        # Use a semaphore to control concurrency.
        semaphore = asyncio.Semaphore(10)

        async def process_single_item(index, raw_data):
            async with semaphore:
                structured_data = await predict_structured_data(raw_data)
                print(f"Processing record #{index + 1}: {raw_data}")

                conversation = {
                    "instruction": system_prompt + '  ' + raw_data,
                    "output": json.dumps(structured_data, ensure_ascii=False)
                }

                return conversation

        print(f"Starting conversion to {output_file}...")

        tasks = [process_single_item(i, raw_data) for i, raw_data in enumerate(raw_data_list)]
        conversations = await asyncio.gather(*tasks)

        with open(output_file, 'w', encoding='utf-8') as outfile:
            json.dump(conversations, outfile, ensure_ascii=False, indent=4)

        print(f"Conversion complete! Processed {len(raw_data_list)} records.")
        print(f"Output file: {output_file}")
        print("=== Phase 2 Complete ===")

    except FileNotFoundError:
        print("Error: us_recipient_data.json not found.")
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"JSON decoding error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"An error occurred during conversion: {e}")
        sys.exit(1)


# Main function.
async def main():
    print("Starting the data processing pipeline...")
    print("This program will execute two phases in sequence:")
    print("1. Generate raw US recipient data.")
    print("2. Predict structured data and convert it to SFT format.")
    print("-" * 50)

    # Phase 1: Generate data.
    success = await produce_data_phase()

    if success:
        # Phase 2: Convert data.
        await convert_data_phase()

        print("\n" + "=" * 50)
        print("All processes completed successfully!")
        print("Generated files:")
        print("- us_recipient_data.json: Raw, unstructured data list.")
        print("- us_recipient_sft_data.json: SFT-formatted training data.")
        print("=" * 50)
    else:
        print("Data generation phase failed. Terminating.")


if __name__ == '__main__':
    # Set event loop policy for Windows if needed.
    if platform.system() == 'Windows':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    # Run the main coroutine.
    asyncio.run(main(), debug=False)
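
The script above writes an empty label whenever the labeling call fails, and it produces a single output file. Before training, you may want to drop incomplete samples and hold out a validation set. The following is a minimal sketch under those assumptions: the helper names, the 10% validation ratio, and the fixed seed are illustrative choices, not values prescribed by this guide.

```python
import json
import random

FIELDS = ["name", "street_address", "city", "state", "zip_code", "phone"]


def is_complete(sample: dict) -> bool:
    """Return True if the sample's output is a JSON object with all six fields non-empty."""
    try:
        label = json.loads(sample["output"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    if not isinstance(label, dict):
        return False
    return all(str(label.get(key, "")).strip() for key in FIELDS)


def filter_and_split(samples, eval_ratio=0.1, seed=42):
    """Drop incomplete samples, shuffle deterministically, and return (train, validation)."""
    kept = [sample for sample in samples if is_complete(sample)]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_eval = int(len(kept) * eval_ratio)
    return kept[n_eval:], kept[:n_eval]


# In-memory demonstration; in practice, load the list from us_recipient_sft_data.json.
demo = [
    {"instruction": f"sample {i}", "output": json.dumps({key: "x" for key in FIELDS})}
    for i in range(10)
]
demo.append({"instruction": "failed label", "output": json.dumps({key: "" for key in FIELDS})})

train_set, eval_set = filter_and_split(demo, eval_ratio=0.2)
print(len(train_set), len(eval_set))
```

To apply this to the generated file, load us_recipient_sft_data.json with json.load, call filter_and_split, and write the two lists out with json.dump.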

Fine-tune the model

  1. In the navigation pane, click Model Gallery, find Qwen3-0.6B, and click Fine-tune.

  2. Configure the training parameters. Only the following key parameters need to be set; leave the rest at their defaults.

    • Training Mode: The default is SFT (Supervised Fine-Tuning) using the LoRA method.

      LoRA is an efficient fine-tuning technique that saves resources by updating only a small subset of model parameters.
    • Training dataset: Download the sample training dataset train.json. On the configuration page, select OSS file or directory, click the image icon to select a bucket, click Upload File to upload the dataset to OSS, and select the file.

    • Validate dataset: Download the sample validation dataset eval.json. Click Add validation dataset and repeat the upload process.

      The validation dataset evaluates model performance on unseen data during training.
    • Model output path: The fine-tuned model is saved to an OSS path by default. If the folder does not exist, click Create folder to create one.

    • Resource Group Type: Select Public Resource Group. This job requires about 5 GB of GPU memory. The console filters available instance types accordingly. Select an instance type, such as ecs.gn7i-c16g1.4xlarge.

    • Hyperparameters:

      • learning_rate: Set to 0.0005

      • num_train_epochs: Set to 4

      • per_device_train_batch_size: Set to 8

      • seq_length: Set to 512

      Then, click Train > OK. The training job status changes from Creating to In operation, which starts the model fine-tuning.

  3. Wait for the fine-tuning to complete (about 10 minutes). The task details page shows logs and metric curves during training. After the job completes, the fine-tuned model is saved to the specified OSS folder.

    To view job details later, go to Model Gallery > Job Management > Training Jobs and click the job name.

    (Optional) Adjust hyperparameters based on loss curves to improve model performance

    On the task details page, view the train_loss curve (training set) and the eval_loss curve (validation set).

    Use the loss trends to evaluate training effectiveness:

    • Underfitting: Both train_loss and eval_loss are still decreasing when training ends.

      Increase num_train_epochs (training depth) or lora_rank (capacity for complex patterns, but increases overfitting risk), and retrain.

    • Overfitting: train_loss keeps decreasing, but eval_loss starts increasing before training ends.

      Decrease num_train_epochs or lora_rank and retrain.

    • Good fit: Both train_loss and eval_loss have stabilized before training ends.

      Proceed to the next steps.
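
The three cases above can be expressed as a rough heuristic over logged loss values. The following is only a sketch: the function name diagnose_fit, the window of 3 points, and the tolerance of 0.01 are assumptions chosen for illustration, and reading the curves yourself remains the more reliable check.

```python
def diagnose_fit(train_loss, eval_loss, window=3, tol=0.01):
    """Classify the end of training from the change over the last `window` logged points."""
    if len(train_loss) <= window or len(eval_loss) <= window:
        raise ValueError("need more than `window` points for each curve")

    def recent_change(values):
        # Negative means the loss is still falling, positive means it is rising.
        return values[-1] - values[-1 - window]

    d_train = recent_change(train_loss)
    d_eval = recent_change(eval_loss)

    if d_train < -tol and d_eval > tol:
        return "overfitting"      # train_loss falls while eval_loss rises
    if d_train < -tol and d_eval < -tol:
        return "underfitting"     # both curves are still decreasing
    if abs(d_train) <= tol and abs(d_eval) <= tol:
        return "good fit"         # both curves have stabilized
    return "inconclusive"


# Hypothetical loss values, one per logging step.
print(diagnose_fit([2.0, 1.5, 1.2, 1.0, 0.8], [2.1, 1.7, 1.4, 1.2, 1.0]))
```

If the result is "inconclusive", inspect the curves directly before changing num_train_epochs or lora_rank.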

Deploy the fine-tuned model

On the training job details page, click Deploy. For Resource Type, select Public Resources. The 0.6B model requires about 5 GB of GPU memory. Under Instance Type, only qualifying specifications are listed. Select an option such as ecs.gn7i-c8g1.2xlarge, keep other parameters at defaults, and click Deploy > OK.

Deployment takes about 5 minutes and is complete when the status changes to Running.

To view training job details later, go to Model Gallery > Job Management > Training Jobs and click the job name.

If Deploy is disabled after the training job completes, the output model is still being registered. Wait about one minute.

To invoke the model, follow the steps in Invoke the model.

Evaluate the fine-tuned model

Evaluate the fine-tuned model's performance before deploying to production. This ensures the model is stable and accurate.

Prepare test data

Prepare a test dataset with no overlap with the training data. The accuracy test code below automatically downloads the required test set.

Test data must not contain training samples. This ensures accurate evaluation of the model's generalization ability and prevents inflated scores from memorization.
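
One way to check for overlap is to compare the raw input texts of the two sets after light normalization. The following is a minimal sketch: the function names are hypothetical, and it assumes you have already extracted the raw address strings from your training and test files.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivially different copies still match."""
    return " ".join(text.split()).casefold()


def find_overlap(train_inputs, test_inputs):
    """Return the test inputs that also appear in the training inputs."""
    seen = {normalize(text) for text in train_inputs}
    return [text for text in test_inputs if normalize(text) in seen]


# Hypothetical inputs: the second test entry repeats a training entry with different spacing.
train_inputs = ["To: Aisha Patel | 1245 Broadwater Avenue, Bozeman, Montana 59715"]
test_inputs = [
    "Recipient: Amina Patel; 1425 S 5th St, Allentown, Pennsylvania 18104",
    "to: aisha patel |  1245 Broadwater Avenue, Bozeman, Montana 59715",
]

leaked = find_overlap(train_inputs, test_inputs)
print(f"{len(leaked)} overlapping test sample(s)")
```

Remove any overlapping samples from the test set before you measure accuracy.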

Design evaluation metrics

Evaluation criteria must align with your business goals. For this use case, in addition to validating JSON output format, confirm that key-value pairs are correct.

Define evaluation metrics programmatically. For an implementation example, see the compare_address_info method in the code below.
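
Exact-match accuracy tells you whether a whole record is correct but not which field failed. As a complement, the following sketch reports accuracy per field. The function names are hypothetical, and it treats a prediction that is not a valid JSON object as wrong for every field, which is one possible policy rather than a rule from this guide.

```python
import json

FIELDS = ["name", "street_address", "city", "state", "zip_code", "phone"]


def field_matches(expected_json: str, predicted_json: str) -> dict:
    """Compare one prediction with its reference, field by field."""
    expected = json.loads(expected_json)
    try:
        predicted = json.loads(predicted_json)
    except (TypeError, json.JSONDecodeError):
        predicted = None
    # Anything that is not a JSON object counts as a miss on all fields.
    if not isinstance(predicted, dict):
        return {key: False for key in FIELDS}
    return {key: expected.get(key) == predicted.get(key) for key in FIELDS}


def per_field_accuracy(pairs) -> dict:
    """Return the fraction of correct values per field over (expected, predicted) pairs."""
    totals = {key: 0 for key in FIELDS}
    count = 0
    for expected_json, predicted_json in pairs:
        count += 1
        for key, ok in field_matches(expected_json, predicted_json).items():
            totals[key] += int(ok)
    if count == 0:
        return {}
    return {key: totals[key] / count for key in FIELDS}


# Hypothetical reference and prediction that differ only in the city field.
reference = json.dumps({"name": "Aisha Patel", "street_address": "1245 Broadwater Avenue, Apt 3B",
                        "city": "Bozeman", "state": "Montana", "zip_code": "59715",
                        "phone": "(429) 763-9742"})
prediction = reference.replace("Bozeman", "Billings")

print(per_field_accuracy([(reference, reference), (reference, prediction)]))
```

A low score on a single field, such as street_address, points to the kind of training samples worth adding.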

Evaluate model performance

Run the following code to calculate model accuracy on the test set.

Example code for testing model accuracy

Note: Replace the placeholder token and endpoint with the actual values you obtained.

# pip3 install openai requests
from openai import AsyncOpenAI
import requests
import json
import asyncio
import os

# If the 'Token' environment variable is not set, replace the following line with your token from the EAS service: token = 'YTA1NTEzMzY3ZTY4Z******************'
token = os.environ.get("Token")

# Check the token before creating the client so that a missing token fails with a clear message.
if token is None:
    print("Please set the 'Token' environment variable, or assign your token directly to the 'token' variable.")
    raise SystemExit(1)

# Use the asynchronous client so that the requests in each batch run concurrently.
# Do not remove the "/v1" suffix after the service URL.
client = AsyncOpenAI(
    api_key=token,
    base_url='<YOUR_ENDPOINT>/v1',
)

system_prompt = """You are a professional information extraction assistant specializing in parsing US shipping addresses from unstructured text.

## Task Description
Based on the given input text, accurately extract and generate a JSON object containing the following six fields:
- name: The full name of the recipient.
- street_address: The complete street address, including number, street name, and any apartment or suite number.
- city: The city name.
- state: The full state name (e.g., "California", not "CA").
- zip_code: The 5 or 9-digit ZIP code.
- phone: The complete contact phone number.

## Extraction Rules
1.  **Address Handling**:
    -   Accurately identify the components: street, city, state, and ZIP code.
    -   The `state` field must be the full official name (e.g., "New York", not "NY").
    -   The `street_address` should contain all details before the city, such as "123 Apple Lane, Apt 4B".
2.  **Name Identification**:
    -   Extract the full recipient name.
3.  **Phone Number Handling**:
    -   Extract the complete phone number, preserving its original format.
4.  **ZIP Code**:
    -   Extract the 5-digit or 9-digit (ZIP+4) code.

## Output Format
Strictly adhere to the following JSON format. Do not add any explanatory text or markdown.
{
  "name": "Recipient's Full Name",
  "street_address": "Complete Street Address",
  "city": "City Name",
  "state": "Full State Name",
  "zip_code": "ZIP Code",
  "phone": "Contact Phone Number"
}
"""


def compare_address_info(actual_address_str, predicted_address_str):
    """Compares two JSON strings representing address information to determine if they are identical."""
    try:
        # Parse the actual address information
        if actual_address_str:
            actual_address_json = json.loads(actual_address_str)
        else:
            actual_address_json = {}

        # Parse the predicted address information
        if predicted_address_str:
            predicted_address_json = json.loads(predicted_address_str)
        else:
            predicted_address_json = {}

        # Directly compare if the two JSON objects are identical
        is_same = actual_address_json == predicted_address_json

        return {
            "is_same": is_same,
            "actual_address_parsed": actual_address_json,
            "predicted_address_parsed": predicted_address_json,
            "comparison_error": None
        }

    except json.JSONDecodeError as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"JSON parsing error: {str(e)}"
        }
    except Exception as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"Comparison error: {str(e)}"
        }


async def predict_single_conversation(conversation_data):
    """Predicts the label for a single conversation."""
    try:
        # Extract user content (excluding assistant message)
        messages = conversation_data.get("messages", [])
        user_content = None

        for message in messages:
            if message.get("role") == "user":
                user_content = message.get("content", "")
                break

        if not user_content:
            return {"error": "User message not found"}

        # The asynchronous client returns a coroutine, so the call must be awaited.
        response = await client.chat.completions.create(
            model="Qwen3-0.6B",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content}
            ],
            response_format={"type": "json_object"},
            extra_body={
                "enable_thinking": False
            }
        )

        predicted_labels = response.choices[0].message.content.strip()
        return {"prediction": predicted_labels}

    except Exception as e:
        return {"error": f"Prediction failed: {str(e)}"}


async def process_batch(batch_data, batch_id):
    """Processes a batch of data."""
    print(f"Processing batch {batch_id}, containing {len(batch_data)} conversations...")

    tasks = []
    for i, conversation in enumerate(batch_data):
        task = predict_single_conversation(conversation)
        tasks.append(task)

    results = await asyncio.gather(*tasks, return_exceptions=True)

    batch_results = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            batch_results.append({"error": f"Exception: {str(result)}"})
        else:
            batch_results.append(result)

    return batch_results


async def main():
    output_file = "predicted_labels.jsonl"
    batch_size = 20  # Number of items to process per batch

    # Read test data
    url = 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251015/yghxco/test.jsonl'
    conversations = []

    try:
        response = requests.get(url)
        response.raise_for_status()  # Check if the request was successful
        for line_num, line in enumerate(response.text.splitlines(), 1):
            try:
                data = json.loads(line.strip())
                conversations.append(data)
            except json.JSONDecodeError as e:
                print(f"JSON parsing error on line {line_num}: {e}")
                continue
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return

    print(f"Successfully read {len(conversations)} conversations")

    # Process in batches
    all_results = []
    total_batches = (len(conversations) + batch_size - 1) // batch_size

    for batch_id in range(total_batches):
        start_idx = batch_id * batch_size
        end_idx = min((batch_id + 1) * batch_size, len(conversations))
        batch_data = conversations[start_idx:end_idx]

        batch_results = await process_batch(batch_data, batch_id + 1)
        all_results.extend(batch_results)

        print(f"Batch {batch_id + 1}/{total_batches} completed")

        # Add a small delay to avoid making requests too quickly
        if batch_id < total_batches - 1:
            await asyncio.sleep(1)

    # Save results
    same_count = 0
    different_count = 0
    error_count = 0

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, (original_data, prediction_result) in enumerate(zip(conversations, all_results)):
            result_entry = {
                "index": i,
                "original_user_content": None,
                "actual_address": None,
                "predicted_address": None,
                "prediction_error": None,
                "address_comparison": None
            }

            # Extract original user content
            messages = original_data.get("messages", [])
            for message in messages:
                if message.get("role") == "user":
                    result_entry["original_user_content"] = message.get("content", "")
                    break

            # Extract actual address information (if assistant message exists)
            for message in messages:
                if message.get("role") == "assistant":
                    result_entry["actual_address"] = message.get("content", "")
                    break

            # Save prediction result
            if "error" in prediction_result:
                result_entry["prediction_error"] = prediction_result["error"]
                error_count += 1
            else:
                result_entry["predicted_address"] = prediction_result.get("prediction", "")

                # Compare address information
                comparison_result = compare_address_info(
                    result_entry["actual_address"],
                    result_entry["predicted_address"]
                )
                result_entry["address_comparison"] = comparison_result

                # Tally comparison results
                if comparison_result["comparison_error"]:
                    error_count += 1
                elif comparison_result["is_same"]:
                    same_count += 1
                else:
                    different_count += 1

            f.write(json.dumps(result_entry, ensure_ascii=False) + '\n')

    print(f"All predictions complete! Results have been saved to {output_file}")

    # Statistics
    success_count = sum(1 for result in all_results if "error" not in result)
    print(f"Number of samples: {success_count}")
    print(f"Correct responses: {same_count}")
    print(f"Incorrect responses: {different_count}")
    # Guard against division by zero when every prediction failed.
    if success_count:
        print(f"Accuracy: {same_count * 100 / success_count} %")
    else:
        print("Accuracy: n/a (no successful predictions)")


if __name__ == "__main__":
    asyncio.run(main())

Output:

All predictions complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 382
Incorrect responses: 18
Accuracy: 95.5 %

Your accuracy may differ due to the random seed used during fine-tuning and the stochastic nature of LLM output.

The model achieves 95.5% accuracy, up from 50% before fine-tuning. This demonstrates that fine-tuning substantially improves structured information extraction for logistics data entry.

This guide uses only 4 training epochs to reduce training time. Increasing the number of epochs may further improve accuracy, as long as eval_loss does not start to rise.

Clean up resources

The model service uses pay-as-you-go public resources. Stop or delete the service when you no longer need it to avoid further charges.

References

  • Model Gallery features (evaluation, compression, and more): Model Gallery.

  • EAS features (Auto Scaling, stress testing, monitoring and alerting): EAS overview.