Model Gallery quick start

Model Gallery encapsulates PAI-DLC and PAI-EAS to support zero-code deployment and training of open-source large language models. This topic demonstrates how to deploy, fine-tune, and evaluate the Qwen3-0.6B model.

1. Prerequisites

To activate PAI and create a workspace, log in to the PAI console with your Alibaba Cloud account, select a region in the top-left corner, and activate the service with one-click authorization.

2. Billing

The examples in this topic use public resources to create PAI-DLC tasks and PAI-EAS services, which are billed on a pay-as-you-go basis. For details, see PAI-DLC billing and PAI-EAS billing.

3. Model deployment

3.1 Deploy the model

Log on to the PAI console. In the left-side navigation pane, click Model Gallery, search for the Qwen3-0.6B card, and then click Deploy.
The configuration page is pre-filled with default parameters. Click Deploy > Confirm. The deployment takes about five minutes. The deployment is successful when the status changes to In operation.
By default, the model is deployed using public resources and is billed on a pay-as-you-go basis.
The default deployment resource specification is ecs.gn7i-c8g1.2xlarge (8 vCPU, 30 GiB, NVIDIA A10 × 1), which costs approximately CNY 10.5/hour. After reviewing the configuration, click Deploy at the bottom of the panel.

3.2 Invoke the model

View the invocation information. On the service details page, click View Call Information to get the Internet Endpoint and Token.
To view the deployment job details later, in the left-side navigation pane, click Model Gallery > Job Management > Deployment Jobs. Then, click the name of the target service.
In the displayed invocation information dialog box, view the Internet Endpoint and VPC endpoint on the Shared Gateway and VPC High-Speed Direct Connection tabs, respectively.
Invoke the model by using one of the following methods.
Online debugging
Switch to the Online Debugging page. The large language model service supports Conversation Debugging and API Debugging.
Cherry Studio client
Cherry Studio is a popular client for interacting with large language models. It also integrates the MCP feature and makes it easy to chat with models.
Connect to the Qwen3 model deployed on PAI
1. Install the client
  Download and install the client from Cherry Studio.
  You can also go to https://github.com/CherryHQ/cherry-studio/releases to download the client.
2. Add a provider.
  1. Click the Settings button in the lower-left corner. In the Model Provider section, click Add.
  2. In the Provider Name field, enter a custom name, such as Platform for AI (PAI), and set the provider type to OpenAI.
  3. Click OK.
3. In the API Key field, enter your Token. In the API Host field, enter your endpoint.
4. Click Add. In the model ID field, enter Qwen3-0.6B (case-sensitive) to add the model.
5. You can click Check next to API Key to verify connectivity.
6. Click the icon to return to the chat page. At the top of the window, switch to your newly added Qwen3-0.6B model to start the conversation.
Python SDK
```
import os

from openai import OpenAI

# If you have not set the environment variable, you can assign your service Token directly. For example: token = 'YTA1NTEzMzY3ZTY4Z******************'
token = os.environ.get("Token")

# Check the token before creating the client, because the client cannot be created without an API key.
if token is None:
    raise SystemExit("Please configure the Token environment variable, or assign the token value directly to the token variable.")

# Do not remove "/v1" from the end of the endpoint.
client = OpenAI(
    api_key=token,
    base_url='<your_endpoint>/v1',
)

query = 'Hello, who are you?'
messages = [{'role': 'user', 'content': query}]
resp = client.chat.completions.create(model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
```
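The SDK example depends on the endpoint ending in "/v1". As a minimal sketch, the hypothetical helper below (normalize_base_url is not part of PAI or the OpenAI SDK) builds the base URL from a copied endpoint, whether or not the suffix or a trailing slash is already present. The example host is a placeholder, not a real endpoint.

```python
def normalize_base_url(endpoint: str) -> str:
    """Return the endpoint with no trailing slash and exactly one '/v1' suffix."""
    cleaned = endpoint.strip().rstrip("/")
    if not cleaned.endswith("/v1"):
        cleaned += "/v1"
    return cleaned


# Placeholder endpoint for illustration only.
print(normalize_base_url("http://example.invalid/api/predict/demo/"))
```

You could pass the result as base_url when you create the OpenAI client.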

3.3 Important reminder

This model service uses public resources and is billed on a pay-as-you-go basis. To avoid incurring unnecessary charges, stop or delete the service when you no longer need it.

You can do this on the Job Management > Deployment Jobs tab, in the Actions column of the target service.
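To get a rough sense of what an idle service can cost, the sketch below multiplies running hours by the approximate hourly price quoted for the default instance type (CNY 10.5/hour). The function name and the 24-hour figure are illustrative assumptions. Actual charges depend on the billing rules described in PAI-EAS billing.

```python
def estimate_cost_cny(hours: float, hourly_price_cny: float = 10.5) -> float:
    """Rough pay-as-you-go estimate: running hours times the hourly price."""
    return round(hours * hourly_price_cny, 2)


# A service left running for one full day at the approximate default price.
print(estimate_cost_cny(24))
```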

4. Model fine-tuning

To improve a model's performance in a specific domain, you can fine-tune it on a domain-specific dataset. This section presents a scenario to demonstrate the purpose and steps of model fine-tuning.

4.1 Use case

In the logistics industry, you often need to extract structured information (such as recipient, address, and phone number) from natural language. Large-parameter models, such as Qwen3-235B-A22B, perform well on this task but are costly and have high latency. To balance performance and cost, you can first use a large-parameter model to label data, and then use that data to fine-tune a small-parameter model, such as Qwen3-0.6B, to deliver similar performance on the task. This process is also known as model distillation.

On this task, the original Qwen3-0.6B model has an accuracy of 50%. After fine-tuning, its accuracy can exceed 90%.

Example recipient address information:

Amina Patel - Phone number (474) 598-1543 - 1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104

Example structured information:

{
    "state": "Pennsylvania",
    "city": "Allentown",
    "zip_code": "18104",
    "street_address": "1425 S 5th St, Apt 3B",
    "name": "Amina Patel",
    "phone": "(474) 598-1543"
}
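As a sketch of how you might check that a model response has this structure, the hypothetical helper below (is_valid_record is an illustrative name, not part of this topic's code) verifies that a string parses as JSON and contains exactly the six expected keys with non-empty string values.

```python
import json

EXPECTED_KEYS = {"name", "street_address", "city", "state", "zip_code", "phone"}


def is_valid_record(raw: str) -> bool:
    """Return True if raw is a JSON object with exactly the six expected string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return False
    # Every value must be a non-empty string.
    return all(isinstance(v, str) and bool(v.strip()) for v in data.values())


sample = json.dumps({
    "state": "Pennsylvania",
    "city": "Allentown",
    "zip_code": "18104",
    "street_address": "1425 S 5th St, Apt 3B",
    "name": "Amina Patel",
    "phone": "(474) 598-1543",
})
print(is_valid_record(sample))
```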

4.2 Data preparation

This task distills the teacher model (Qwen3-235B-A22B) into the Qwen3-0.6B model. First, you must use the teacher model's API to extract recipient address information into structured JSON data. Because generating this JSON data can be time-consuming, this topic provides a sample training dataset, train.json, and a validation dataset, eval.json, that you can download and use directly.

In model distillation, the model with more parameters is called the teacher model. The data used in this topic is synthetically generated by a large model and does not contain any sensitive user information.

Going live

To apply this solution to your business, we recommend that you prepare data using the following methods:

Real business scenarios (recommended)

Real business data better reflects your business scenarios, and the fine-tuned model can be better adapted to your business. After you obtain the business data, you need to programmatically convert it into a JSON file in the following format.

[
    {
        "instruction": "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone.  Name: Isabella Rivera Cruz | 182 Calle Luis Lloréns Torres, Apt 3B, Mayagüez, Puerto Rico 00680 | MOBILE: (640) 486-5927",
        "output": "{\"name\": \"Isabella Rivera Cruz\", \"street_address\": \"182 Calle Luis Lloréns Torres, Apt 3B\", \"city\": \"Mayagüez\", \"state\": \"Puerto Rico\", \"zip_code\": \"00680\", \"phone\": \"(640) 486-5927\"}"
    },
    {
        "instruction": "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone.  1245 Broadwater Avenue, Apt 3B, Bozeman, Montana 59715Receiver: Aisha PatelP: (429) 763-9742",
        "output": "{\"name\": \"Aisha Patel\", \"street_address\": \"1245 Broadwater Avenue, Apt 3B\", \"city\": \"Bozeman\", \"state\": \"Montana\", \"zip_code\": \"59715\", \"phone\": \"(429) 763-9742\"}"
    }
]

The JSON file contains multiple training samples. Each sample includes two fields: instruction and output.

instruction: Contains the prompt that guides the behavior of the large model, along with the input data.
output: The expected standard answer, usually generated by human experts or larger models such as qwen3-235b-a22b.
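The conversion into this format can be sketched as follows. The helper name build_sft_samples and the input shape (pairs of raw text and a labeled dictionary) are assumptions for illustration. The prompt text and the two-space separator follow the sample records above.

```python
import json

# Prompt text copied from the sample records above.
SYSTEM_PROMPT = (
    "You are an expert assistant for extracting structured JSON from US shipping "
    "information. The JSON keys are name, street_address, city, state, zip_code, and phone."
)


def build_sft_samples(records):
    """Convert (raw_text, label_dict) pairs into instruction/output training samples."""
    samples = []
    for raw_text, label in records:
        samples.append({
            "instruction": SYSTEM_PROMPT + "  " + raw_text,
            # The output field is a JSON string, not a nested object.
            "output": json.dumps(label, ensure_ascii=False),
        })
    return samples


records = [(
    "Receiver: Aisha Patel | 1245 Broadwater Avenue, Apt 3B, Bozeman, Montana 59715 | P: (429) 763-9742",
    {
        "name": "Aisha Patel",
        "street_address": "1245 Broadwater Avenue, Apt 3B",
        "city": "Bozeman",
        "state": "Montana",
        "zip_code": "59715",
        "phone": "(429) 763-9742",
    },
)]
samples = build_sft_samples(records)
print(json.dumps(samples, ensure_ascii=False, indent=4))
```

Writing the printed list to a file would give you a dataset in the same shape as the sample shown above.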

Model generation

When business data is insufficient, consider using a model for data augmentation. This can improve the diversity and coverage of the data. To avoid leaking user privacy, this solution uses a model to generate a batch of virtual address data. The following generation code is for your reference.

Code for simulating business data generation

To run the following code, you need to create an Alibaba Cloud Model Studio API key. The code uses qwen-plus-latest to generate business data and qwen3-235b-a22b for labeling.

# -*- coding: utf-8 -*-
import os
import asyncio
import random
import json
import sys
from typing import List
import platform
from openai import AsyncOpenAI

# Create an asynchronous client instance.
# NOTE: This script uses the DashScope-compatible API endpoint.
# If you are using a different OpenAI-compatible service, change the base_url.
client = AsyncOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

# List of US States and Territories.
us_states = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware",
    "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
    "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
    "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
    "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming", "District of Columbia",
    "Puerto Rico", "Guam", "American Samoa", "U.S. Virgin Islands", "Northern Mariana Islands"
]

# Recipient templates.
recipient_templates = [
    "To: {name}", "Recipient: {name}", "Deliver to {name}", "For: {name}",
    "ATTN: {name}", "{name}", "Name: {name}", "Contact: {name}", "Receiver: {name}"
]

# Phone number templates.
phone_templates = [
    "Tel: {phone}", "Tel. {phone}", "Mobile: {phone}", "Phone: {phone}",
    "Contact number: {phone}", "Phone number {phone}", "TEL: {phone}", "MOBILE: {phone}",
    "Contact: {phone}", "P: {phone}", "{phone}", "Call: {phone}",
]


# Generate a plausible US-style phone number.
def generate_us_phone():
    """Generates a random 10-digit US phone number in (XXX) XXX-XXXX format."""
    area_code = random.randint(201, 999)  # Avoid 0xx, 1xx area codes.
    exchange = random.randint(200, 999)
    line = random.randint(1000, 9999)
    return f"({area_code}) {exchange}-{line}"


# Use LLM to generate recipient and address information.
async def generate_recipient_and_address_by_llm(state: str):
    """Uses LLM to generate a recipient's name and address details for a given state."""
    prompt = f"""Please generate recipient information for a location in {state}, USA, including the following:
1. A realistic full English name. Aim for diversity.
2. A real city name within that state.
3. A specific street address, such as street number and name, and apartment number. It should be realistic.
4. A corresponding 5-digit ZIP code for that city or area.

Please return only the JSON object in the following format:
{{"name": "Recipient Name", "city": "City Name", "street_address": "Specific Street Address", "zip_code": "ZIP Code"}}

Do not include any other text, just the JSON. Ensure names are diverse, not just John Doe.
"""

    try:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="qwen-plus-latest",
            temperature=1.5,  # Increase temperature for more diverse names and addresses.
        )

        result = response.choices[0].message.content.strip()
        # Clean up potential markdown code block markers.
        if result.startswith('```'):
            result = result.split('\n', 1)[1]
        if result.endswith('```'):
            result = result.rsplit('\n', 1)[0]

        # Try to parse JSON.
        info = json.loads(result)
        print(info)
        return info
    except Exception as e:
        print(f"Failed to generate recipient and address: {e}, using fallback.")
        # Fallback mechanism.
        backup_names = ["Michael Johnson", "Emily Williams", "David Brown", "Jessica Jones", "Christopher Davis",
                        "Sarah Miller"]
        return {
            "name": random.choice(backup_names),
            "city": "Anytown",
            "street_address": f"{random.randint(100, 9999)} Main St",
            "zip_code": f"{random.randint(10000, 99999)}"
        }


# Generate a single raw data record.
async def generate_record():
    """Generates one messy, combined string of US address information."""
    # Randomly select a state.
    state = random.choice(us_states)

    # Use LLM to generate recipient and address info.
    info = await generate_recipient_and_address_by_llm(state)

    # Format recipient name.
    recipient = random.choice(recipient_templates).format(name=info['name'])

    # Generate a phone number.
    phone = generate_us_phone()
    phone_info = random.choice(phone_templates).format(phone=phone)

    # Assemble the full address line.
    full_address = f"{info['street_address']}, {info['city']}, {state} {info['zip_code']}"

    # Combine all components.
    components = [recipient, phone_info, full_address]

    # Randomize the order of components.
    random.shuffle(components)

    # Choose a random separator.
    separators = [' ', ', ', '; ', ' | ', '\t', ' - ', ' // ', '', '  ']
    separator = random.choice(separators)

    # Join the components.
    combined_data = separator.join(components)
    return combined_data.strip()


# Generate a batch of data.
async def generate_batch_data(count: int) -> List[str]:
    """Generates a specified number of data records."""
    print(f"Starting to generate {count} records...")

    # Use a semaphore to control concurrency, for example, up to 20 concurrent requests.
    semaphore = asyncio.Semaphore(20)

    async def generate_single_record(index):
        async with semaphore:
            try:
                record = await generate_record()
                print(f"Generated record #{index + 1}: {record}")
                return record
            except Exception as e:
                print(f"Failed to generate record #{index + 1}: {e}")
                return None

    # Concurrently generate data.
    tasks = [generate_single_record(i) for i in range(count)]

    data = await asyncio.gather(*tasks)

    successful_data = [record for record in data if record is not None]

    return successful_data


# Save data to a file.
def save_data(data: List[str], filename: str = "us_recipient_data.json"):
    """Saves the generated data to a JSON file."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Data has been saved to {filename}")


# Phase 1: Data Production.
async def produce_data_phase():
    """Handles the generation of raw recipient data."""
    print("=== Phase 1: Starting Raw Recipient Data Generation ===")

    # Generate 2,000 records.
    batch_size = 2000
    data = await generate_batch_data(batch_size)

    # Save the data.
    save_data(data, "us_recipient_data.json")

    print(f"\nTotal records generated: {len(data)}")
    print("\nSample Data:")
    for i, record in enumerate(data[:3]):  # Show first 3 as examples.
        print(f"{i + 1}. Raw Data: {record}\n")

    print("=== Phase 1 Complete ===\n")
    return True


# Define the system prompt for the extraction model.
def get_system_prompt_for_extraction():
    """Returns the system prompt for the information extraction task."""
    return """You are a professional information extraction assistant specializing in parsing US shipping addresses from unstructured text.

## Task Description
Based on the given input text, accurately extract and generate a JSON object containing the following six fields:
- name: The full name of the recipient.
- street_address: The complete street address, including number, street name, and any apartment or suite number.
- city: The city name.
- state: The full state name (e.g., "California", not "CA").
- zip_code: The 5 or 9-digit ZIP code.
- phone: The complete contact phone number.

## Extraction Rules
1.  **Address Handling**:
    -   Accurately identify the components: street, city, state, and ZIP code.
    -   The `state` field must be the full official name (e.g., "New York", not "NY").
    -   The `street_address` should contain all details before the city, such as "123 Apple Lane, Apt 4B".
2.  **Name Identification**:
    -   Extract the full recipient name.
3.  **Phone Number Handling**:
    -   Extract the complete phone number, preserving its original format.
4.  **ZIP Code**:
    -   Extract the 5-digit or 9-digit (ZIP+4) code.

## Output Format
Strictly adhere to the following JSON format. Do not add any explanatory text or markdown.
{
  "name": "Recipient's Full Name",
  "street_address": "Complete Street Address",
  "city": "City Name",
  "state": "Full State Name",
  "zip_code": "ZIP Code",
  "phone": "Contact Phone Number"
}
"""


# Use LLM to predict structured data from raw text.
async def predict_structured_data(raw_data: str):
    """Uses an LLM to predict structured data from a raw string."""
    system_prompt = get_system_prompt_for_extraction()

    try:
        response = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw_data}
            ],
            model="qwen3-235b-a22b",  # A powerful model is recommended for this task.
            temperature=0.0,  # Lower temperature for higher accuracy in extraction.
            response_format={"type": "json_object"},
            extra_body={"enable_thinking": False}
        )

        result = response.choices[0].message.content.strip()

        # Clean up potential markdown code block markers.
        if result.startswith('```'):
            lines = result.split('\n')
            for i, line in enumerate(lines):
                if line.strip().startswith('{'):
                    result = '\n'.join(lines[i:])
                    break
        if result.endswith('```'):
            result = result.rsplit('\n```', 1)[0]

        structured_data = json.loads(result)
        return structured_data

    except Exception as e:
        print(f"Failed to predict structured data: {e}, Raw data: {raw_data}")
        # Return an empty structure on failure.
        return {
            "name": "",
            "street_address": "",
            "city": "",
            "state": "",
            "zip_code": "",
            "phone": ""
        }


# Phase 2: Data Conversion.
async def convert_data_phase():
    """Reads raw data, predicts structured format, and saves as SFT data."""
    print("=== Phase 2: Starting Data Conversion to SFT Format ===")

    try:
        print("Reading us_recipient_data.json file...")
        with open('us_recipient_data.json', 'r', encoding='utf-8') as f:
            raw_data_list = json.load(f)

        print(f"Successfully read {len(raw_data_list)} records.")
        print("Starting to predict structured data using the extraction model...")

        # A simple and clear system message can improve training and inference speed.
        system_prompt = "You are an expert assistant for extracting structured JSON from US shipping information. The JSON keys are name, street_address, city, state, zip_code, and phone."
        output_file = 'us_recipient_sft_data.json'

        # Use a semaphore to control concurrency.
        semaphore = asyncio.Semaphore(10)

        async def process_single_item(index, raw_data):
            async with semaphore:
                structured_data = await predict_structured_data(raw_data)
                print(f"Processing record #{index + 1}: {raw_data}")

                conversation = {
                    "instruction": system_prompt + '  ' + raw_data,
                    "output": json.dumps(structured_data, ensure_ascii=False)
                }

                return conversation

        print(f"Starting conversion to {output_file}...")

        tasks = [process_single_item(i, raw_data) for i, raw_data in enumerate(raw_data_list)]
        conversations = await asyncio.gather(*tasks)

        with open(output_file, 'w', encoding='utf-8') as outfile:
            json.dump(conversations, outfile, ensure_ascii=False, indent=4)

        print(f"Conversion complete! Processed {len(raw_data_list)} records.")
        print(f"Output file: {output_file}")
        print("=== Phase 2 Complete ===")

    except FileNotFoundError:
        print("Error: us_recipient_data.json not found.")
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"JSON decoding error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"An error occurred during conversion: {e}")
        sys.exit(1)


# Main function.
async def main():
    print("Starting the data processing pipeline...")
    print("This program will execute two phases in sequence:")
    print("1. Generate raw US recipient data.")
    print("2. Predict structured data and convert it to SFT format.")
    print("-" * 50)

    # Phase 1: Generate data.
    success = await produce_data_phase()

    if success:
        # Phase 2: Convert data.
        await convert_data_phase()

        print("\n" + "=" * 50)
        print("All processes completed successfully!")
        print("Generated files:")
        print("- us_recipient_data.json: Raw, unstructured data list.")
        print("- us_recipient_sft_data.json: SFT-formatted training data.")
        print("=" * 50)
    else:
        print("Data generation phase failed. Terminating.")


if __name__ == '__main__':
    # Set event loop policy for Windows if needed.
    if platform.system() == 'Windows':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    # Run the main coroutine.
    asyncio.run(main(), debug=False)
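The script above writes a record with empty fields whenever labeling fails, so you may want to drop those records before training and hold out part of the data for validation. The following is a minimal sketch under those assumptions. The function names, the 10% hold-out ratio, and the fixed seed are illustrative choices, not values prescribed by this topic.

```python
import json
import random


def is_complete(sample):
    """Return True if every field in the labeled output is non-empty."""
    fields = json.loads(sample["output"])
    return bool(fields) and all(str(v).strip() for v in fields.values())


def filter_and_split(samples, eval_ratio=0.1, seed=42):
    """Drop incomplete samples, shuffle, and split into (train, eval) lists."""
    usable = [s for s in samples if is_complete(s)]
    rng = random.Random(seed)  # Fixed seed keeps the split reproducible.
    rng.shuffle(usable)
    n_eval = max(1, int(len(usable) * eval_ratio)) if usable else 0
    return usable[n_eval:], usable[:n_eval]


# Tiny in-memory demo: ten labeled samples plus one failed (empty) sample.
demo = [
    {"instruction": f"text {i}", "output": json.dumps({"name": "A", "phone": "1"})}
    for i in range(10)
]
demo.append({"instruction": "failed", "output": json.dumps({"name": "", "phone": ""})})
train_set, eval_set = filter_and_split(demo)
print(len(train_set), len(eval_set))
```

In practice you would load us_recipient_sft_data.json with json.load, call filter_and_split, and write the two lists to separate files.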

4.3 Fine-tune the model

In the left-side navigation pane, click Model Gallery. Search for the Qwen3-0.6B card and click Fine-tune.
Configure the parameters for the training job. Configure only the following key parameters and keep the default values for the others.
- Training Mode: The default selection is SFT (Supervised Fine-Tuning) using the LoRA method.
  LoRA is an efficient fine-tuning technique that saves training resources by modifying only a subset of the model parameters.
- Training dataset: First, download the sample training dataset train.json. Then, on the configuration page, select OSS file or directory, click the icon to select a bucket, click Upload File to upload the downloaded dataset to Object Storage Service (OSS), and then select the file.
- Validate dataset: First, download the validation dataset eval.json. Then, click Add validation dataset and follow the same procedure as for the training dataset to upload and select the file.
  The validation dataset evaluates the model's performance on unseen data during training.
- Model output path: By default, the system saves the fine-tuned model to OSS. If the target OSS directory is empty, click Create folder and select the newly created directory.
- Resource Group Type: Select Public Resource Group. This fine-tuning task requires approximately 5 GB of GPU memory. The console has already filtered the instance types that meet this requirement. Select an instance type, such as ecs.gn7i-c16g1.4xlarge.
- Hyperparameters:
  - learning_rate: Set to 0.0005
  - num_train_epochs: Set to 4
  - per_device_train_batch_size: Set to 8
  - seq_length: Set to 512
  Then, click Train > OK. The training job enters the Creating state. When the status changes to In operation, model fine-tuning starts.
Monitor the training job until it completes. The fine-tuning process takes about 10 minutes. During this time, the job details page displays logs and metric curves. After the training job completes, the system saves the fine-tuned model to the specified OSS directory.
To view the training job details later, in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.
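To see how per_device_train_batch_size and num_train_epochs translate into training steps, the sketch below does the arithmetic. It assumes a single GPU, no gradient accumulation, and a partial last batch that is kept. The dataset size of 2,000 is a hypothetical figure, not the actual size of train.json.

```python
import math


def estimate_steps(num_samples, per_device_batch_size=8, num_epochs=4,
                   num_devices=1, grad_accum=1):
    """Return (steps_per_epoch, total_steps) for the given settings."""
    effective_batch = per_device_batch_size * num_devices * grad_accum
    # ceil assumes the last partial batch is kept rather than dropped.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch, steps_per_epoch * num_epochs


# Hypothetical dataset of 2,000 samples with the hyperparameters used in this topic.
print(estimate_steps(2000))
```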
(Optional) Adjust hyperparameters using loss curves
On the job details page, you can view the train_loss curve (training set loss) and the eval_loss curve (validation set loss).
You can use the trend of the loss values to assess the model's training effectiveness:
- Underfitting: Both the train_loss and eval_loss curves are still decreasing when training ends.
  You can increase the num_train_epochs parameter (the number of training epochs, which is positively correlated with training depth) or the lora_rank value (the rank of the low-rank matrix; a larger rank allows the model to handle more complex tasks but increases the risk of overfitting). Then, retrain the model to better fit the training data.
- Overfitting: The train_loss continues to decrease while the eval_loss starts to increase before training ends.
  You can decrease the num_train_epochs parameter or the lora_rank value, and then retrain the model to prevent overfitting.
- Good fit: Both the train_loss and eval_loss curves stabilize before the training ends.
  When the model reaches this state, you can proceed to the next steps.
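The three cases above can be sketched as a simple heuristic over the final points of each curve. The function name, the 2% tolerance, and the use of only the last two points are illustrative assumptions. In practice you would judge the trend over the whole curve, so treat the result as a rough hint.

```python
def diagnose_fit(train_loss, eval_loss, tol=0.02):
    """Classify the end of training from the last two points of each loss curve."""
    if len(train_loss) < 2 or len(eval_loss) < 2:
        raise ValueError("Need at least two loss values per curve.")

    def rel_change(values):
        # Relative change from the second-to-last point to the last point.
        prev, last = values[-2], values[-1]
        return (last - prev) / prev

    train_delta = rel_change(train_loss)
    eval_delta = rel_change(eval_loss)

    if train_delta < 0 and eval_delta > tol:
        return "overfitting"   # train_loss still falling, eval_loss rising
    if train_delta < -tol and eval_delta < -tol:
        return "underfitting"  # both curves still falling
    if abs(train_delta) <= tol and abs(eval_delta) <= tol:
        return "good fit"      # both curves have flattened
    return "inconclusive"


print(diagnose_fit([2.0, 1.5, 1.0], [2.1, 1.6, 1.2]))
print(diagnose_fit([1.0, 0.5, 0.3], [1.0, 0.8, 0.9]))
print(diagnose_fit([1.0, 0.3, 0.3], [1.0, 0.4, 0.4]))
```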

4.4 Deploy the fine-tuned model

On the training job details page, click Deploy to open the deployment configuration page. For Resource Type, select Public Resources. Deploying the 0.6B model requires about 5 GB of GPU memory. The list under Instance Type automatically displays compatible specifications. Select one, such as ecs.gn7i-c8g1.2xlarge. Keep the other parameters at their default values, and then click Deploy > OK.

Deployment takes about 5 minutes and is complete when the status changes to Running.

To open the training job details page, in the left-side navigation pane, click Model Gallery > Job Management > Training Jobs, and then click the job name.

If the Deploy button is disabled after the training job succeeds, it means the output model is still being registered. Wait about one minute for the button to be enabled.

The steps to invoke the model are the same as described in 3.2 Invoke the model.

4.5 Evaluate the fine-tuned model

Before deploying the fine-tuned model to a production environment, evaluate its performance to ensure it is stable and accurate. This evaluation helps prevent unexpected issues after deployment.

Prepare test data

Prepare a test dataset that does not overlap with your training data to evaluate the model's performance. The accuracy test code below automatically downloads a test set for this purpose.

Using a test dataset that is separate from the training data ensures an unbiased assessment of the model's generalization ability on unseen data. This practice prevents inflated scores that result from evaluating the model on data it has already seen.

Design evaluation metrics

Evaluation metrics should align closely with your business objectives. For this solution's use case, in addition to checking that the generated output is valid JSON, you must also verify that the key-value pairs are correct.

Define the evaluation metrics programmatically. For the implementation in this example, refer to the compare_address_info method in the accuracy test code below.
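An exact-match comparison tells you whether a whole record is correct but not which field failed. As a complementary sketch (the function name and the per-field metric are illustrative additions, not part of this topic's test code), the following computes the accuracy for each key separately. A prediction that is not valid JSON counts as wrong for every field.

```python
import json

FIELDS = ("name", "street_address", "city", "state", "zip_code", "phone")


def per_field_accuracy(pairs):
    """Given (expected_json_str, predicted_json_str) pairs, return accuracy per field."""
    correct = {field: 0 for field in FIELDS}
    for expected_str, predicted_str in pairs:
        expected = json.loads(expected_str)
        try:
            predicted = json.loads(predicted_str)
        except json.JSONDecodeError:
            predicted = {}
        if not isinstance(predicted, dict):
            predicted = {}
        for field in FIELDS:
            if field in predicted and expected.get(field) == predicted[field]:
                correct[field] += 1
    total = len(pairs)
    return {field: correct[field] / total for field in FIELDS} if total else {}


gold = {"name": "Amina Patel", "street_address": "1425 S 5th St, Apt 3B",
        "city": "Allentown", "state": "Pennsylvania", "zip_code": "18104",
        "phone": "(474) 598-1543"}
wrong_city = dict(gold, city="Bethlehem")
pairs = [
    (json.dumps(gold), json.dumps(gold)),
    (json.dumps(gold), json.dumps(wrong_city)),
]
print(per_field_accuracy(pairs))
```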

Validate the fine-tuned model

Run the following test code to output the model's accuracy on the test set.

Test model accuracy

Note: Replace the Token and endpoint with the invocation details you obtained earlier.

# pip3 install openai requests
from openai import AsyncOpenAI
import requests
import json
import asyncio
import os

# If the 'Token' environment variable is not set, replace the following line with your token from the Elastic Algorithm Service (EAS): token = 'YTA1NTEzMzY3ZTY4Z******************'
token = os.environ.get("Token")

# Check the token before creating the client, because the client cannot be created without an API key.
if token is None:
    raise SystemExit("Please set the 'Token' environment variable, or assign your token directly to the 'token' variable.")

# Do not remove the "/v1" suffix from the endpoint.
client = AsyncOpenAI(
    api_key=token,
    base_url='YOUR_ENDPOINT/v1',
)

system_prompt = """You are a professional information extraction assistant specializing in parsing US shipping addresses from unstructured text.

## Task Description
Based on the given input text, accurately extract and generate a JSON object containing the following six fields:
- name: The full name of the recipient.
- street_address: The complete street address, including number, street name, and any apartment or suite number.
- city: The city name.
- state: The full state name (e.g., "California", not "CA").
- zip_code: The 5 or 9-digit ZIP code.
- phone: The complete contact phone number.

## Extraction Rules
1.  **Address Handling**:
    -   Accurately identify the components: street, city, state, and ZIP code.
    -   The `state` field must be the full official name (e.g., "New York", not "NY").
    -   The `street_address` should contain all details before the city, such as "123 Apple Lane, Apt 4B".
2.  **Name Identification**:
    -   Extract the full recipient name.
3.  **Phone Number Handling**:
    -   Extract the complete phone number, preserving its original format.
4.  **ZIP Code**:
    -   Extract the 5-digit or 9-digit (ZIP+4) code.

## Output Format
Strictly adhere to the following JSON format. Do not add any explanatory text or markdown.
{
  "name": "Recipient's Full Name",
  "street_address": "Complete Street Address",
  "city": "City Name",
  "state": "Full State Name",
  "zip_code": "ZIP Code",
  "phone": "Contact Phone Number"
}
"""

def compare_address_info(actual_address_str, predicted_address_str):
    """Compares two JSON strings representing address information to see if they are identical."""
    try:
        # Parse the actual address information
        if actual_address_str:
            actual_address_json = json.loads(actual_address_str)
        else:
            actual_address_json = {}

        # Parse the predicted address information
        if predicted_address_str:
            predicted_address_json = json.loads(predicted_address_str)
        else:
            predicted_address_json = {}

        # Directly compare if the two JSON objects are identical
        is_same = actual_address_json == predicted_address_json

        return {
            "is_same": is_same,
            "actual_address_parsed": actual_address_json,
            "predicted_address_parsed": predicted_address_json,
            "comparison_error": None
        }

    except json.JSONDecodeError as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"JSON parsing error: {str(e)}"
        }
    except Exception as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"Comparison error: {str(e)}"
        }

async def predict_single_conversation(conversation_data):
    """Predicts the label for a single conversation."""
    try:
        # Extract user content (excluding assistant message)
        messages = conversation_data.get("messages", [])
        user_content = None

        for message in messages:
            if message.get("role") == "user":
                user_content = message.get("content", "")
                break

        if not user_content:
            return {"error": "User message not found"}

        # AsyncOpenAI methods return coroutines, so the call must be awaited.
        response = await client.chat.completions.create(
            model="Qwen3-0.6B",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content}
            ],
            response_format={"type": "json_object"},
            extra_body={
                "enable_thinking": False
            }
        )

        predicted_labels = response.choices[0].message.content.strip()
        return {"prediction": predicted_labels}

    except Exception as e:
        return {"error": f"Prediction failed: {str(e)}"}

async def process_batch(batch_data, batch_id):
    """Processes a batch of data."""
    print(f"Processing batch {batch_id}, containing {len(batch_data)} items...")

    # Create one coroutine per conversation in the batch
    tasks = [predict_single_conversation(conversation) for conversation in batch_data]

    results = await asyncio.gather(*tasks, return_exceptions=True)

    batch_results = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            batch_results.append({"error": f"Exception: {str(result)}"})
        else:
            batch_results.append(result)

    return batch_results

async def main():
    output_file = "predicted_labels.jsonl"
    batch_size = 20  # Number of items to process per batch

    # Read test data
    url = 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20251015/yghxco/test.jsonl'
    conversations = []

    try:
        # Set a timeout so that a stalled download cannot hang the script indefinitely
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Check if the request was successful
        for line_num, line in enumerate(response.text.splitlines(), 1):
            try:
                data = json.loads(line.strip())
                conversations.append(data)
            except json.JSONDecodeError as e:
                print(f"JSON parsing error on line {line_num}: {e}")
                continue
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return

    print(f"Successfully read {len(conversations)} conversations")

    # Process in batches
    all_results = []
    total_batches = (len(conversations) + batch_size - 1) // batch_size

    for batch_id in range(total_batches):
        start_idx = batch_id * batch_size
        end_idx = min((batch_id + 1) * batch_size, len(conversations))
        batch_data = conversations[start_idx:end_idx]

        batch_results = await process_batch(batch_data, batch_id + 1)
        all_results.extend(batch_results)

        print(f"Batch {batch_id + 1}/{total_batches} completed")

        # Add a small delay to avoid making requests too quickly
        if batch_id < total_batches - 1:
            await asyncio.sleep(1)

    # Save results
    same_count = 0
    different_count = 0
    error_count = 0

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, (original_data, prediction_result) in enumerate(zip(conversations, all_results)):
            result_entry = {
                "index": i,
                "original_user_content": None,
                "actual_address": None,
                "predicted_address": None,
                "prediction_error": None,
                "address_comparison": None
            }

            # Extract original user content
            messages = original_data.get("messages", [])
            for message in messages:
                if message.get("role") == "user":
                    result_entry["original_user_content"] = message.get("content", "")
                    break

            # Extract actual address information (if assistant message exists)
            for message in messages:
                if message.get("role") == "assistant":
                    result_entry["actual_address"] = message.get("content", "")
                    break

            # Save prediction result
            if "error" in prediction_result:
                result_entry["prediction_error"] = prediction_result["error"]
                error_count += 1
            else:
                result_entry["predicted_address"] = prediction_result.get("prediction", "")

                # Compare address information
                comparison_result = compare_address_info(
                    result_entry["actual_address"],
                    result_entry["predicted_address"]
                )
                result_entry["address_comparison"] = comparison_result

                # Tally comparison results
                if comparison_result["comparison_error"]:
                    error_count += 1
                elif comparison_result["is_same"]:
                    same_count += 1
                else:
                    different_count += 1

            f.write(json.dumps(result_entry, ensure_ascii=False) + '\n')

    print(f"All predictions complete! Results have been saved to {output_file}")

    # Statistics
    success_count = sum(1 for result in all_results if "error" not in result)
    print(f"Number of samples: {success_count}")
    print(f"Correct responses: {same_count}")
    print(f"Incorrect responses: {different_count}")
    if success_count:
        print(f"Accuracy: {same_count * 100 / success_count} %")
    else:
        # Avoid ZeroDivisionError when every prediction failed
        print("Accuracy: N/A (no successful predictions)")

if __name__ == "__main__":
    asyncio.run(main())

Output:

All predictions complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 382
Incorrect responses: 18
Accuracy: 95.5 %

Because fine-tuning depends on a random seed and the output of a large language model is stochastic, the accuracy you achieve may differ from the result shown in this topic. This variance is normal.

The accuracy is 95.5%, a significant improvement over the original Qwen3-0.6B model's 50% accuracy. This demonstrates that fine-tuning substantially improved its performance on structured information extraction in the logistics domain.

To keep training time short, this guide trains for only 4 epochs, which is already enough to reach 95.5% accuracy. Increasing the number of training epochs may improve accuracy further.
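
The evaluation script writes one JSON record per sample to predicted_labels.jsonl, so the remaining mismatches can be inspected afterwards without calling the model again. The following is a minimal sketch and not part of the original script: the helper name summarize_results is hypothetical, and it relies only on the index, prediction_error, and address_comparison fields that the script above writes. As an assumption of this sketch, records with a prediction or comparison error are excluded from the accuracy denominator, whereas the script divides by the number of successful predictions.

```python
import json


def summarize_results(path):
    """Recompute tallies from the JSONL file written by the evaluation script.

    Hypothetical helper: returns (counts, accuracy_percent, mismatched_entries).
    """
    counts = {"same": 0, "different": 0, "error": 0}
    mismatches = []

    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Ignore blank lines
            entry = json.loads(line)
            comparison = entry.get("address_comparison")

            # A record is an error if the prediction failed or if one of the
            # two address strings could not be parsed as JSON.
            if (
                entry.get("prediction_error")
                or comparison is None
                or comparison.get("comparison_error")
            ):
                counts["error"] += 1
            elif comparison.get("is_same"):
                counts["same"] += 1
            else:
                counts["different"] += 1
                mismatches.append(entry)

    # Assumption of this sketch: only records that were actually compared
    # count towards the accuracy denominator.
    scored = counts["same"] + counts["different"]
    accuracy = counts["same"] * 100 / scored if scored else 0.0
    return counts, accuracy, mismatches
```

For example, `counts, accuracy, mismatches = summarize_results("predicted_labels.jsonl")` returns the tallies. Printing the actual_address and predicted_address fields of the first few entries in mismatches shows where the predicted JSON differs from the label.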

4.6 Important note

The model service in this topic uses public resources and is pay-as-you-go. When you no longer need the service, stop or delete it to avoid further charges.
