This solution leverages large language models (LLMs), which learn complex language patterns and user behavior patterns from massive volumes of data, to recognize user intents more accurately and provide a smoother, more natural interaction experience. This topic describes the complete development process of an LLM-based intent recognition solution that uses the Qwen1.5 model.
Background information
What is intent recognition?
Intent recognition enables AI agents to interpret user requirements described in natural language and then perform appropriate operations or provide relevant information. Such agents have become essential components in intelligent interaction systems, and LLM-based intent recognition technology has gained significant attention in the industry and is widely applied.
Typical scenarios of intent recognition technology
In intelligent voice assistant scenarios, users interact with voice assistants using simple voice commands. For example, when a user says "I want to listen to music" to a voice assistant, the system must accurately recognize that the user requirement is to play music, and then perform the related operation.
In intelligent customer service scenarios, the challenge lies in handling various customer service requests and quickly classifying them into different processes, such as returns, exchanges, and complaints. For example, a user may say "I received a defective item and I want to return it" to the customer service system of an e-commerce platform. In this case, the LLM-based intent recognition system must quickly capture that the user's intent is to "return an item" and trigger the return process to guide the user through subsequent operations.
Working process
The following figure shows the working process of the LLM-based intent recognition solution.
Prepare training data
Prepare a training dataset for your business scenario based on the data format requirements and data preparation strategies. Alternatively, prepare business data based on the data preparation strategies and use iTAG to label the raw data. Then, export the labeling results and convert them into a data format supported by QuickStart of Platform for AI (PAI) for subsequent model training.
Train and perform an offline evaluation on a model
In QuickStart, you can train the Qwen1.5-1.8B-Chat model. After training the model, you need to perform an offline evaluation.
Deploy and call a model service
If the model evaluation results meet your expectations, you can use QuickStart to deploy the trained model to Elastic Algorithm Service (EAS) as an online service.
Prerequisites
Before performing the operations described in this topic, make sure you have completed the following preparations:
Deep Learning Containers (DLC) and EAS of PAI are activated on a pay-as-you-go basis and a default workspace is created. For more information, see Activate PAI and create a default workspace.
An Object Storage Service (OSS) bucket is created to store training data and the model file obtained from model training. For information about how to create a bucket, see Quick Start.
Prepare training data
You can prepare training data using one of the following methods:
Method 1: Build a training dataset based on data preparation strategies and data format requirements.
Method 2: Use iTAG to label data based on data preparation strategies. This method is suitable for large-scale data scenarios and can significantly improve labeling efficiency.
Data preparation strategies
To improve the effectiveness and stability of training, you can prepare data based on the following strategies:
In single-intent recognition scenarios, label at least 50 to 100 data records for each type of intent. If the model performance after fine-tuning does not meet your expectations, increase the number of labeled data records. In addition, keep the number of labeled data records balanced across intents. A balance-check sketch is provided after this list.
In multi-intent recognition or multi-round chat scenarios, we recommend that the number of labeled data records be more than 20% of the number used in single-intent recognition scenarios, and that every intent involved in multi-intent recognition or multi-round chats also appears in the single-intent data.
Intent descriptions need to cover as many phrasings and scenarios as possible.
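The following Python sketch is one way to check whether the labeled records are balanced across intents. It assumes that the training data is stored in a file named train.json in the JSON format described in the next section, and that the intent is the function name before the first parenthesis in the output field; both are illustrative assumptions.
import json
from collections import Counter

# Assumption: train.json holds a JSON list of {"instruction": ..., "output": ...} records,
# and the intent is the function name before the first "(" in the output field.
with open("train.json", encoding="utf-8") as f:
    records = json.load(f)

intent_counts = Counter(record["output"].split("(")[0] for record in records)

# Print the number of labeled records per intent to check class balance.
for intent, count in intent_counts.most_common():
    print(f"{intent}: {count}")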
Data format requirements
The training data must be saved in a JSON file, which contains the instruction and output fields. The output field corresponds to the intent predicted by a model and related parameters. The following sample code provides examples of training data in different intent recognition scenarios.
In single-intent recognition scenarios, you need to prepare business data for a specific business scenario to fine-tune an LLM. The following sample code provides an example of single-round chats for the smart home scenario.
[ { "instruction": "I want to listen to music", "output": "play_music()" }, { "instruction": "Too loud, turn the sound down", "output": "volume_down()" }, { "instruction": "I do not want to listen to this, turn it off", "output": "music_exit()" }, { "instruction": "I want to visit Hangzhou. Check the weather forecast for me", "output": "weather_search(Hangzhou)" }, ]In multi-intent recognition or multi-round chat scenarios, the user's intents may be expressed across multiple rounds in a chat. In this case, you can prepare multiple rounds of chat data and label the multi-round inputs. The following sample code provides an example of multi-round chats for a voice assistant:
User: I want to listen to music.
Assistant: What kind of music?
User: Play *** music.
Assistant: play_music(***)
The training data for multi-round chats is in the following format:
[ { "instruction": "I want to listen to music. Play *** music.", "output": "play_music(***)" } ]
The sequence length for model training in multi-round chats is significantly longer, and only a limited number of intent recognition scenarios require multi-round chats. We recommend that you use the multi-round chat mode for model training only if the single-round chat mode cannot meet your business requirements. The following sections use the single-round chat mode to illustrate the complete process.
Use iTAG to label data
You can label data in iTAG of PAI to generate a training dataset that meets specific requirements by performing the following steps:
Register the data used for iTAG labeling to a PAI dataset.
Prepare a data file in the manifest format. For more information, see data preparation strategies. Example:
{"data":{"instruction": "I want to listen to music"}} {"data":{"instruction": "Too loud, turn the sound down"}} {"data":{"instruction": "I do not want to listen to this, turn it off"}} {"data":{"instruction": "I want to visit Hangzhou. Check the weather forecast for me"}}Go to the AI Asset Management > Datasets page, select the target workspace, and click Enter Datasets.
Click Create Dataset and configure the key parameters described in the following table. For information about other parameters, see Create and manage datasets.
Storage Type: Select OSS.
Import Format: Select File.
OSS Path: Select the created OSS directory and upload the manifest file that you prepared by performing the following steps:
Click the button, and in the Select OSS File dialog box, click Upload File.
Click View Local Files or Drag And Drop Files to upload the manifest file as prompted.
Go to the Data Preparation > iTAG page, click Go To Management Page, and switch to the Template Management tab.
Click Create Template, select , and click Edit. After you configure the parameters, click Save Template. The following table describes the key parameters. For information about other parameters, see Manage templates.
Basic Template Canvas: Select Text and click Generate Content Card. Click the text area. In the Import Data dialog box, select an existing dataset. In the Configuration For Basic Template section, select Dataset Field > instruction.
Basic Template Answers: Select Input Field and click Generate Title Card. Then change Title to output.
In the navigation pane on the left, choose . On the Task Management tab, click Create Task. On the Create Labeling Job page, configure the parameters and click Create. The following table describes the key parameters. For information about other parameters, see Create a labeling job.
Input Dataset: Select the dataset that you created in Step 1. Note: The data must match the template.
Template Type: Select Custom Template and select an existing template from the drop-down list.
After you create the labeling job, label the data. For more information, see Process labeling jobs.

After you label the data, export the labeling results to an OSS directory. For more information, see Export labeling results.
The following sample code shows an example of the exported manifest file. For information about the data format, see Overview.
{"data":{"instruction":"I want to listen to music","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554911772672","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"play_music()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236185165","MarkTime":"Wed Jul 23 10:03:05 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554911772672","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"play_music()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236185165","MarkTime":"Wed Jul 23 10:03:05 CST 2025","UserMarkResultId":"1947839763671740416","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null} {"data":{"instruction":"Too loud, turn the sound down","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554891464704","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"volume_down()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236198979","MarkTime":"Wed Jul 23 10:03:19 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554891464704","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"volume_down()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236198979","MarkTime":"Wed Jul 23 10:03:19 CST 2025","UserMarkResultId":"1947839868520656896","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null} {"data":{"instruction":"I do not want to listen to this, turn it off","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554992373760","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"music_exit()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236212152","MarkTime":"Wed Jul 23 10:03:32 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554992373760","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"music_exit()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236212152","MarkTime":"Wed Jul 23 10:03:32 CST 2025","UserMarkResultId":"1947839936657285120","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null} {"data":{"instruction":"I want to visit Hangzhou. 
Check the weather forecast for me","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554971426816","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"weather_search(Hangzhou)\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236218730","MarkTime":"Wed Jul 23 10:03:39 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554971426816","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"weather_search(Hangzhou)\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236218730","MarkTime":"Wed Jul 23 10:03:39 CST 2025","UserMarkResultId":"1947839975890939904","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null}In the terminal, use the following Python script to convert the manifest-formatted labeling result file into a training data format suitable for QuickStart.
import json

# Input file path and output file path
input_file_path = 'test_json.manifest'
output_file_path = 'train.json'

converted_data = []

with open(input_file_path, 'r', encoding='utf-8') as file:
    for line in file:
        try:
            # Parse JSON data for each line
            data = json.loads(line)
            # Extract instruction
            instruction = data['data']['instruction']
            # Iterate through all keys starting with "label-"
            for key in data.keys():
                if key.startswith('label-'):
                    # Extract MarkResult and parse its content
                    mark_result_str = data[key]['results'][0]['MarkResult']
                    mark_result = json.loads(mark_result_str)  # Parse MarkResult string as JSON
                    # Extract labels["output"] from annotations
                    output = mark_result['annotations'][0]['labels']['output']
                    # Build new data structure
                    converted_data.append({
                        'instruction': instruction,
                        'output': output
                    })
                    break
        except Exception as e:
            print(f"Error processing line: {line.strip()}. Error: {e}")

# Write converted data to output file
with open(output_file_path, 'w', encoding='utf-8') as outfile:
    json.dump(converted_data, outfile, ensure_ascii=False, indent=4)

print(f"Conversion completed. Output saved to {output_file_path}")
The output is a JSON file.
Train and perform an offline evaluation on a model
Train a model
QuickStart integrates high-quality pre-trained models from open source Artificial Intelligence (AI) communities. It lets you implement the complete process from model training and deployment to inference without writing code. This greatly simplifies the model development process.
In this example, the Qwen1.5-1.8B-Chat model is used to illustrate how to use the prepared training data to train a model in QuickStart. To train a model, perform the following steps:
Go to the Model Gallery page.
Log on to the PAI console.
In the upper-left corner, select a region.
In the left-side navigation pane, choose Workspaces, and click the name of the target workspace to enter it.
In the left-side navigation pane, choose QuickStart > Model Gallery.
In the model list of the Model Gallery page, search for and click the Qwen1.5-1.8B-Chat model.
In the upper-right corner of the model details page, click Train. In the Train panel, configure the key parameters described in the following table. Use the default settings of other parameters.
Training Mode:
Full-Parameter Fine-Tuning: This mode requires more resources and has a longer training time, but delivers better training results.
Note: Models with a small number of parameters support full-parameter fine-tuning. Select Full-Parameter Fine-Tuning as needed.
QLoRA: This is a lightweight fine-tuning mode. Compared with full-parameter fine-tuning, Quantized Low-Rank Adaptation (QLoRA) requires fewer resources and has a shorter training time, but its training results are not as good.
LoRA: This mode is similar to QLoRA.
Dataset Configuration > Training dataset: To select a prepared training dataset, perform the following steps:
Select OSS file or directory in the drop-down list.
Click the button to select an OSS directory. In the Select OSS file dialog box, click Upload file, drag the prepared training dataset file to the blank area, and then click OK.
Output Configuration > Model Output Path: Select an OSS directory to store the output configuration file and model file.
Output Configuration > Tensorboard Output Path: Select an OSS directory to store the Tensorboard output.
Hyperparameters: For more information about hyperparameters, see Table 1. Full hyperparameters.
We recommend that you configure the hyperparameters based on the following configuration strategies. For information about recommended hyperparameter configurations, see Table 2. Recommended hyperparameter configurations.
Configure hyperparameters based on different training methods.
Global batch size = Number of GPUs × per_device_train_batch_size × gradient_accumulation_steps
To maximize training performance, increase the number of GPUs and set per_device_train_batch_size to a higher value first. A worked example is provided after this list.
In most cases, the global batch size ranges from 64 to 256. If a small amount of training data is involved, you can appropriately reduce the global batch size.
You need to configure the seq_length parameter as needed. For example, if the maximum length of a text sequence in a dataset is 50, you can set this parameter to 64 (a power of 2).
If the training loss decreases too slowly or does not converge, we recommend that you increase the learning rate specified by the learning_rate parameter. You also need to confirm whether the quality of the training data is guaranteed.
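As a minimal illustration of the strategies above, the following Python sketch computes the global batch size for a hypothetical configuration and rounds the longest sequence in a dataset up to the next power of 2 for seq_length. The specific numbers are assumptions for illustration, not recommended values.
# Assumption: 4 GPUs, per_device_train_batch_size=8, gradient_accumulation_steps=2.
num_gpus = 4
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

# Global batch size = Number of GPUs x per_device_train_batch_size x gradient_accumulation_steps
global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(global_batch_size)  # 64, which falls within the typical 64-256 range

# Round the longest tokenized sequence in the dataset up to the next power of 2 for seq_length.
max_sequence_length = 50  # Assumption: longest sample in the dataset
seq_length = 1
while seq_length < max_sequence_length:
    seq_length *= 2
print(seq_length)  # 64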
Click Train. In the Billing Notification message, click OK.
The system automatically navigates to the training job details page. After the training job runs, you can view the status and training logs of the training job.

Evaluate a model offline
After you train a model, you can use a Python script to evaluate the model in the terminal.
Prepare the evaluation data file testdata.json. Sample content:
[ { "instruction": "Who sings the song Ten Years?", "output": "music_query_player(Ten Years)" }, { "instruction": "What is the weather like in Hangzhou today?", "output": "weather_search(Hangzhou)" } ]In the terminal, use the following Python script to evaluate the model offline.
#encoding=utf-8
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
from tqdm import tqdm

device = "cuda"  # the device to load the model onto

# Modify the path of the model.
model_name = '/mnt/workspace/model/qwen14b-lora-3e4-256-train/'
print(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

count = 0
ecount = 0

# Modify the path in which the training data is stored.
test_data = json.load(open('/mnt/workspace/data/testdata.json'))

system_prompt = 'You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding function invocation and parameters.'

for i in tqdm(test_data[:]):
    prompt = '<|im_start|>system\n' + system_prompt + '<|im_end|>\n<|im_start|>user\n' + i['instruction'] + '<|im_end|>\n<|im_start|>assistant\n'
    gold = i['output']
    gold = gold.split(';')[0] if ';' in gold else gold
    model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=False
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    pred = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    if gold.split('(')[0] == pred.split('(')[0]:
        count += 1
        gold_list = set(gold.strip()[:-1].split('(')[1].split(','))
        pred_list = set(pred.strip()[:-1].split('(')[1].split(','))
        if gold_list == pred_list:
            ecount += 1
    else:
        pass

print("Intent recognition accuracy:", count/len(test_data))
print("Parameter recognition accuracy:", ecount/len(test_data))
Note: If the code execution returns the message "Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate", run pip install accelerate as prompted to install the dependency library.
Deploy and call a model service
Deploy a model service
After you train a model, you can deploy the model as an online service in EAS by performing the following steps:
In the upper-right corner of the Task Details page, click Deploy. The system automatically configures the basic information and resource information. For Deployment Method, select VLLM Accelerated Deployment. You can modify the parameters as needed. After you configure the parameters, click Deploy.
In the Billing Notification message, click OK.
The system automatically navigates to the deployment task page. When the Status is Running, the service is deployed successfully.
Call a model service
The following example shows how to call the API using the client:
Obtain the endpoint and token of the model service.
In the Basic Information section of the Service Details page, click View Call Information.

In the Call Information dialog box, view and save the endpoint and token of the model service to your on-premises machine.
The following example shows how to call the service using the vLLM accelerated deployment method. You can run this code in the terminal to call the service.
from openai import OpenAI

##### API Configuration #####
openai_api_key = "<EAS_SERVICE_TOKEN>"
openai_api_base = "<EAS_SERVICE_URL>/v1/"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id
print(model)


def main():
    stream = True
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters.",
                    }
                ],
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "I want to listen to music",
                    }
                ],
            },
        ],
        model=model,
        max_completion_tokens=2048,
        stream=stream,
    )

    if stream:
        for chunk in chat_completion:
            print(chunk.choices[0].delta.content, end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)


if __name__ == "__main__":
    main()
Where:
<EAS_SERVICE_URL>: the endpoint of your model service.
<EAS_SERVICE_TOKEN>: the token of your model service.
References
For more information about how to use iTAG and the format requirements for data labeling, see iTAG.
For more information about EAS, see Elastic Algorithm Service.
You can use QuickStart of PAI to train and deploy models in different scenarios, including Llama-3, Qwen1.5, and Stable Diffusion V1.5 models. For more information, see Scenario-specific practices.