Model Gallery encapsulates Platform for AI (PAI) Deep Learning Containers (DLC) and Elastic Algorithm Service (EAS), providing a zero-code solution for efficiently training and deploying open-source large language models (LLMs). This guide uses the Qwen3-0.6B model to demonstrate the process. The same steps apply to other models.
Prerequisites
Use your Alibaba Cloud account to activate Platform for AI (PAI) and create a workspace. To do this, log on to the PAI console, select a region in the upper-left corner, and complete the one-click authorization to activate the service.
Billing
The examples in this guide use public resources to create PAI-DLC tasks and PAI-EAS services. These resources are billed on a pay-as-you-go basis. For more information about billing rules, see DLC billing and EAS billing.
Model deployment
Deploy the model
Log on to the PAI console. In the navigation pane on the left, click Model Gallery. Search for Qwen3-0.6B and click Deploy.

Configure the deployment parameters. The deployment page includes default parameters. Click Deploy > OK. The deployment process takes about 5 minutes. The deployment is successful when the status changes to Running.
By default, this service uses public resources and is billed on a pay-as-you-go basis.

Invoke the model
View invocation information. On the service details page, click View Call Information to obtain the Internet Endpoint and Token.
To view the deployment task details later, navigate to Model Gallery > Job Management > Deployment Jobs, and then click the Service name.

Test the model service. You can invoke the model using one of the following methods:
Online debugging
Switch to the Online Debugging page. In the Body field, enter a question, such as Hello, who are you?. Then, click Send Request. The LLM's response is displayed on the right.
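The request that online debugging sends can also be constructed programmatically. The following sketch builds the same kind of request with Python. It assumes the service exposes an OpenAI-compatible /v1/chat/completions path and accepts the EAS token directly in the Authorization header; verify both against your service's call information. The endpoint and token placeholders are hypothetical.

```python
def build_chat_request(endpoint: str, token: str, question: str):
    """Build the URL, headers, and JSON body for an OpenAI-compatible chat call."""
    url = endpoint.rstrip("/") + "/v1/chat/completions"
    headers = {
        # Assumption: EAS accepts the service token as-is in this header.
        "Authorization": token,
        "Content-Type": "application/json",
    }
    body = {
        "model": "Qwen3-0.6B",
        "messages": [{"role": "user", "content": question}],
    }
    return url, headers, body

if __name__ == "__main__":
    # Replace the placeholders with the values from View Call Information.
    url, headers, body = build_chat_request(
        "https://<your-endpoint>", "<your-token>", "Hello, who are you?"
    )
    # To actually send it:
    # import requests; print(requests.post(url, headers=headers, json=body).json())
```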
Use the Cherry Studio client
Cherry Studio is a popular LLM chat client with built-in Model Context Protocol (MCP) support that makes it easy to chat with large models.
Connect to the Qwen3 model that is deployed on PAI
Install the client. Go to Cherry Studio to download and install the client. You can also download it from https://github.com/CherryHQ/cherry-studio/releases.
Add a provider. Click the Settings button in the upper-right corner. Then, in the Model Provider section, click Add. For Provider Name, enter a custom name, such as Platform for AI. The provider type is OpenAI. Click OK.
Configure the connection. Enter your token in the API Key field and your endpoint in the API Host field.
Add a model. Click Add. In the Model ID field, enter Qwen3-0.6B (case-sensitive). You can click Check next to the API Key field to test the connection.
Start chatting. Click the icon to return to the chat page. At the top of the window, switch to the Qwen3-0.6B model that you added to start chatting.
Use the Python SDK
from openai import OpenAI
import os

# If you have not configured the Token environment variable, assign the token
# of your EAS service directly instead, for example:
# token = 'YTA1NTEzMzY3ZTY4Z******************'
token = os.environ.get("Token")
if token is None:
    print("Please configure the Token environment variable or assign the token directly to the token variable.")
    exit()

client = OpenAI(
    api_key=token,
    # Do not remove "/v1" at the end of the service URL.
    base_url='Your service URL/v1',
)

query = 'Hello, who are you?'
messages = [{'role': 'user', 'content': query}]
resp = client.chat.completions.create(
    model='Qwen3-0.6B',
    messages=messages,
    max_tokens=512,
    temperature=0,
)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
Important reminder
The model service in this guide was created using public resources, which are billed on a pay-as-you-go basis. To avoid additional charges, stop or delete the service when you are finished.

Model fine-tuning
To improve a model's performance in a specific domain, you can fine-tune it on a dataset from that domain. This section uses the following scenario to demonstrate the purpose and steps of model fine-tuning.
Use case
In the logistics industry, you often need to extract structured information (such as recipient, address, and phone number) from natural language. Large models, such as Qwen3-235B-A22B, perform well on this task but are costly and have high latency. To balance performance and cost, you can first use a large-parameter model to label data and then use that data to fine-tune a smaller model, such as Qwen3-0.6B. This process is also known as model distillation.
For the same structured information extraction task, the original Qwen3-0.6B model achieves an accuracy of 50%. After fine-tuning, its accuracy can exceed 90%.
| Example recipient address information | Example structured information |
| --- | --- |
| Amina Patel - Phone number (474) 598-1543 - 1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104 | {"recipient": "Amina Patel", "phone": "(474) 598-1543", "address": "1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104"} |
Prepare the data
To distill the knowledge from the teacher model (Qwen3-235B-A22B) to the Qwen3-0.6B model for this task, you first need to use the teacher model's API to extract recipient address information into structured JSON data. Generating this data can be time-consuming. Therefore, this topic provides a sample training dataset train.json and a validation set eval.json that you can download and use directly.
In model distillation, the larger model is known as the teacher model. The data used in this guide is synthetically generated by a large model and does not contain any sensitive user information.
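To illustrate what labeled training data for this task could look like, the following sketch writes a sample record to a file. The instruction/output schema and field names shown here are assumptions for illustration; the actual train.json provided with this guide may use a different format.

```python
import json

# Hypothetical example record. The actual train.json from this guide may use a
# different schema; treat this instruction/output format as an assumption.
records = [
    {
        "instruction": "Extract the recipient, phone, and address as JSON: "
                       "Amina Patel - Phone number (474) 598-1543 - "
                       "1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104",
        "output": json.dumps({
            "recipient": "Amina Patel",
            "phone": "(474) 598-1543",
            "address": "1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104",
        }),
    },
]

# Write the records to a local file for inspection.
with open("train_sample.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

In practice, the teacher model (Qwen3-235B-A22B) generates the output field for each raw address string, and the resulting pairs become the fine-tuning dataset.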
Procedure
Fine-tune the model
In the navigation pane on the left, click Model Gallery. Search for the Qwen3-0.6B model and click Train.

Configure the training task parameters. Configure the following key parameters and leave the others at their default values.
Training Mode: The default is SFT (Supervised Fine-Tuning), which uses the LoRA fine-tuning method.
LoRA is an efficient fine-tuning technique that saves training resources by modifying only a subset of the model's parameters.
Training dataset: First, download the sample training dataset train.json. Then, on the configuration page, select OSS file Or directory and click the icon to select a bucket. Click Upload file to upload the dataset to Object Storage Service (OSS). Finally, select the file.
Validation dataset: First, download the validation dataset eval.json. Then, click Add validation dataset and follow the same procedure as for the training dataset to upload and select the file.
The validation dataset is used during training to evaluate the model's performance on unseen data.
Model Output Path: By default, the fine-tuned model is saved to OSS. If the OSS directory is empty, click Create folder and specify a directory.
Resource Type: Select Public Resources. This fine-tuning task requires about 5 GB of GPU memory. The console has already filtered the available resource specifications to meet this requirement. Select a specification such as ecs.gn7i-c16g1.4xlarge.
Hyperparameters:
learning_rate: Set to 0.0005
num_train_epochs: Set to 4
per_device_train_batch_size: Set to 8
seq_length: Set to 512
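The LoRA method mentioned above keeps the pretrained weights frozen and trains only a low-rank update. The following NumPy sketch shows the core idea; the matrix sizes and rank are illustrative values, not the actual configuration used by Model Gallery.

```python
import numpy as np

d, k, r = 768, 768, 8  # weight shape (d x k) and LoRA rank; illustrative values

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))         # frozen pretrained weight: never updated
A = rng.standard_normal((r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
                                        # so training starts from the base model

def lora_forward(x):
    """Forward pass with the low-rank update: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

# Only d*r + r*k parameters are trained instead of d*k.
print(f"trainable: {d * r + r * k}, full fine-tuning: {d * k}")
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training gradually learns the task-specific update with far fewer trainable parameters.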
Click Train > OK. The status of the training task changes to Creating. When the status changes to Running, the model fine-tuning process begins.
View the training task and wait for it to complete. The model fine-tuning process takes about 10 minutes. During fine-tuning, the task details page displays logs and metric curves. After the training is complete, the fine-tuned model is saved to the specified OSS directory.
To view the training task details later, click Model Gallery > Task Management > Training Jobs in the navigation pane on the left, and then click the task name.

Deploy the fine-tuned model
On the training job details page, click the Deploy button to open the deployment configuration page. Set Resource Type to Public Resources. Deploying the 0.6B model requires approximately 5 GB of GPU memory. The Resource Type dropdown has already been filtered to show specifications that meet this requirement. Select a specification such as ecs.gn7i-c8g1.2xlarge. Keep the other parameters at their default settings, and then click Deploy > OK.
The deployment process takes about 5 minutes. When the status changes to Running, the deployment is successful.

If the Deploy button is disabled after the training task succeeds, the output model is still being registered. Wait about 1 minute and try again.

The subsequent steps for invoking the model are the same as those described in the Invoke the model section.
Verify the performance of the fine-tuned model
Before deploying the fine-tuned model to a production environment, systematically evaluate its performance to ensure stability and accuracy and to avoid unexpected issues after deployment.
Prepare test data
Prepare a test dataset that does not overlap with the training data to test the model's performance. This topic provides a test set that is automatically downloaded when you run the accuracy test code below.
The test data should not overlap with the training data. This ensures a more accurate reflection of the model's generalization ability on new data and avoids inflated scores due to sample memorization.
Design evaluation metrics
The evaluation criteria should align closely with your actual business goals. In this solution's example, in addition to checking whether the generated JSON string is valid, you should also check whether the corresponding key-value pairs are correct.
You need to define the evaluation metrics programmatically. For the implementation of the evaluation metrics in this example, see the compare_address_info method in the accuracy test code below.
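As an illustration of such a metric, here is a minimal sketch of what a compare_address_info-style check could look like. It counts a prediction as correct only if it is valid JSON and every expected key-value pair matches exactly. The actual compare_address_info in the accuracy test code may differ.

```python
import json

def compare_address_info(predicted: str, expected: dict) -> bool:
    """Sketch of the metric: valid JSON with all expected key-value pairs."""
    try:
        pred = json.loads(predicted)
    except (json.JSONDecodeError, TypeError):
        return False  # invalid JSON always counts as incorrect
    return all(pred.get(key) == value for key, value in expected.items())

label = {"recipient": "Amina Patel",
         "phone": "(474) 598-1543",
         "address": "1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104"}
print(compare_address_info(json.dumps(label), label))  # True
print(compare_address_info("not json", label))         # False
```

Checking key-value pairs rather than raw string equality makes the metric tolerant of differences in key ordering and whitespace in the generated JSON.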
Run the following test code, which will output the model's accuracy on the test set.
Output:
All predictions complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 382
Incorrect responses: 18
Accuracy: 95.5%

Due to the random seed in model fine-tuning and the stochastic nature of the large model's output, the accuracy you achieve may differ from the results in this topic. This is normal.
The accuracy is 95.5%, a significant improvement over the original Qwen3-0.6B model's 50% accuracy. This demonstrates that the fine-tuned model has significantly enhanced its ability to extract structured information in the logistics domain.
To reduce training time, this guide uses only 4 training epochs, which already increased the accuracy to over 90%. You can further improve accuracy by increasing the number of training epochs.
Important reminder
The model service in this guide was created using public resources, which are billed on a pay-as-you-go basis. To avoid additional charges, stop or delete the service when you are finished.

References
For more information about Model Gallery features such as evaluation and compression, see Model Gallery.
For more information about EAS features such as Auto Scaling, stress testing, and monitoring and alerting, see EAS overview.

