Model Gallery lets you deploy and fine-tune open-source LLMs without writing code. This guide walks through the full workflow — deployment, invocation, fine-tuning, and evaluation — using Qwen3-0.6B as an example.
Prerequisites
Activate PAI and create a workspace with your Alibaba Cloud account. Log in to the PAI console, select a region, and complete the one-click authorization.
Billing
The examples in this guide use pay-as-you-go public resources to create PAI-DLC tasks and PAI-EAS services. Billing rules: PAI-DLC billing, PAI-EAS billing.
Model deployment
Deploy a model
-
Log in to the PAI console. In the left-side navigation pane, click Model Gallery, find Qwen3-0.6B, and click Deploy.

-
The deployment page is pre-populated with default parameters. Click Deploy > Confirm. Deployment takes about 5 minutes and is complete when the status changes to In operation.
By default, the model service uses public resources and the pay-as-you-go billing method.

Invoke the model
-
On the service details page, click View Call Information to get the Internet Endpoint and Token.
To view deployment job details later, go to Model Gallery > Job Management > Deployment Jobs in the navigation pane, and then click the Service name.

-
Test the model service by using one of the following methods:
Online debugging
Go to the Online Debugging tab. The LLM service supports Conversation Debugging and API Debugging.


Cherry Studio client
Cherry Studio is a popular LLM client with built-in MCP support.
Connect to the Qwen3 model deployed on PAI
-
Install the client
Download and install the client from Cherry Studio.
You can also download it from
https://github.com/CherryHQ/cherry-studio/releases. -
Add a provider.
-
Click the Settings icon
in the lower-left corner. In the Model Provider section, click Add. -
In the Provider Name field, enter a custom name, such as Platform for AI. For Provider Type, select OpenAI.
-
Click OK.
-
-
Enter the Token in the API Key field and the Internet Endpoint in the API Host field.
-
Click Add. In the Model ID field, enter
Qwen3-0.6B(case-sensitive). -
You can click Check next to the API Key field to test the connectivity.
-
Click the
icon to return to the chat page. At the top of the window, switch to the Qwen3-0.6B model you added and start the conversation.
Python SDK
from openai import OpenAI import os # If an environment variable is not set, replace the next line with your EAS service Token: token = '<YOUR_EAS_SERVICE_TOKEN>' token = os.environ.get("Token") # Do not remove "/v1" from the end of your Internet Endpoint. client = OpenAI( api_key=token, base_url=f'<YOUR_INTERNET_ENDPOINT>/v1', ) if token is None: print("Please set the Token environment variable, or assign the Token value directly to the 'token' variable.") exit() query = 'Hello, who are you?' messages = [{'role': 'user', 'content': query}] resp = client.chat.completions.create(model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0) query = messages[0]['content'] response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') -
Clean up resources
The model service uses pay-as-you-go public resources. Stop or delete the service when you no longer need it to avoid further charges.

Model fine-tuning
Fine-tuning adapts a model to a specific domain using a domain-specific dataset. The following example demonstrates a typical fine-tuning workflow.
Use case
In logistics, extracting structured data (recipient names, addresses, phone numbers) from free text is common. Large models like Qwen3-235B-A22B excel at this but are costly. A practical approach is to label data with the large model, then fine-tune a smaller model (Qwen3-0.6B) to match its performance at a fraction of the cost. This is known as model distillation.
On this task, the original Qwen3-0.6B achieves 50% accuracy. After fine-tuning, accuracy exceeds 90%.
Example recipient address information | Example structured information |
Amina Patel - Phone number (474) 598-1543 - 1425 S 5th St, Apt 3B, Allentown, Pennsylvania 18104 | |
Data preparation
To distill knowledge from the teacher model (Qwen3-235B-A22B) to Qwen3-0.6B, use the teacher model's API to extract recipient addresses into structured JSON. Because generating this data is time-consuming, this guide provides a sample training datasettrain.json and validation seteval.json.
The data used in this guide is synthetically generated and contains no sensitive user information.
Going live
Fine-tune the model
-
In the navigation pane, click Model Gallery, find Qwen3-0.6B, and click Fine-tune.

-
Configure the training parameters. Only the following key parameters need to be set; leave the rest at their defaults.
-
Training Mode: The default is SFT (Supervised Fine-Tuning) using the LoRA method.
LoRA is an efficient fine-tuning technique that saves resources by updating only a small subset of model parameters.
-
Training dataset: Download the sample training dataset train.json. On the configuration page, select OSS file or directory, click the
icon to select a bucket, click Upload File to upload the dataset to OSS, and select the file.
-
Validate dataset: Download the sample validation dataset eval.json. Click Add validation dataset and repeat the upload process.
The validation dataset evaluates model performance on unseen data during training.
-
Model output path: The fine-tuned model is saved to an OSS path by default. If the folder does not exist, click Create folder to create one.
-
Resource Group Type: Select Public Resource Group. This job requires about 5 GB of GPU memory. The console filters available instance types accordingly. Select an instance type, such as
ecs.gn7i-c16g1.4xlarge. -
Hyperparameters:
-
learning_rate: Set to 0.0005
-
num_train_epochs: Set to 4
-
per_device_train_batch_size: Set to 8
-
seq_length: Set to 512
Then, click Train > OK. The training job status changes from Creating to In operation, which starts the model fine-tuning.
-
-
-
Wait for the fine-tuning to complete (about 10 minutes). The task details page shows logs and metric curves during training. After the job completes, the fine-tuned model is saved to the specified OSS folder.
To view job details later, go to Model Gallery > Job Management > Training Jobs and click the job name.

Deploy the fine-tuned model
On the training job details page, click Deploy. For Resource Type, select Public Resources. The 0.6B model requires about 5 GB of GPU memory. Under Instance Type, only qualifying specifications are listed. Select an option such as ecs.gn7i-c8g1.2xlarge, keep other parameters at defaults, and click Deploy > OK.
Deployment takes about 5 minutes and is complete when the status changes to Running.
To view training job details later, go to Model Gallery > Job Management > Training Jobs and click the job name.

If Deploy is disabled after the training job completes, the output model is still being registered. Wait about one minute.

To invoke the model, follow the steps in Invoke the model.
Evaluate the fine-tuned model
Evaluate the fine-tuned model's performance before deploying to production. This ensures the model is stable and accurate.
Prepare test data
Prepare a test dataset with no overlap with the training data. The accuracy test code below automatically downloads the required test set.
Test data must not contain training samples. This ensures accurate evaluation of the model's generalization ability and prevents inflated scores from memorization.
Design evaluation metrics
Evaluation criteria must align with your business goals. For this use case, in addition to validating JSON output format, confirm that key-value pairs are correct.
Define evaluation metrics programmatically. For an implementation example, see the compare_address_info method in the code below.
Evaluate model performance
Run the following code to calculate model accuracy on the test set.
Output:
All predictions complete! Results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 382
Incorrect responses: 18
Accuracy: 95.5 %
Your accuracy may differ due to the random seed used during fine-tuning and the stochastic nature of LLM output.
The model achieves 95.5% accuracy, up from 50% before fine-tuning. This demonstrates that fine-tuning substantially improves structured information extraction for logistics data entry.
This guide uses only 4 training epochs to reduce training time. You can further improve accuracy by increasing the number of epochs.
Clean up resources
The model service uses pay-as-you-go public resources. Stop or delete the service when you no longer need it to avoid further charges.

References
-
Model Gallery features (evaluation, compression, and more): Model Gallery.
-
EAS features (Auto Scaling, stress testing, monitoring and alerting): EAS overview.

