Platform For AI: Develop an LLM-based intent recognition solution

Last Updated: Mar 11, 2026

Build an intent recognition solution with a Qwen1.5 LLM: prepare training data, fine-tune the model, evaluate its performance, and deploy it as a service.

Background

Intent recognition overview

AI agents interpret user requirements in natural language to perform operations or provide information. LLM-based intent recognition technology powers intelligent interaction systems across industries.

Use cases

  • Voice assistants: Users interact through voice commands. When a user says "I want to listen to music", the system recognizes the intent to play music and executes the action.

  • Customer service: Systems classify requests into processes like returns, exchanges, and complaints. When a user says "I received a defective item and I want to return it", the system recognizes the "return" intent and triggers the return workflow.

Workflow

The following diagram shows the LLM-based intent recognition solution workflow.

[Figure: LLM-based intent recognition solution workflow]
  1. Prepare training data

    Prepare training datasets for specific business scenarios based on data format requirements and preparation strategies. Use iTAG to label raw business data, export labeling results, and convert them into formats supported by PAI QuickStart for model training.

  2. Train and evaluate a model

    Train the Qwen1.5-1.8B-Chat model in QuickStart and perform an offline evaluation.

  3. Deploy and call the model service

    Deploy the trained model to Elastic Algorithm Service (EAS) as an online service if evaluation results meet expectations.

Prerequisites

Complete these preparations before starting:

  • Activate Deep Learning Containers (DLC) and EAS of PAI on a pay-as-you-go basis and create a default workspace. For more information, see Activate PAI and create a default workspace.

  • Create an Object Storage Service (OSS) bucket to store training data and model files. For more information, see Quick Start.

Prepare training data

Prepare training data using one of the following methods:

Data preparation strategies

To improve training effectiveness and stability, prepare data based on the following strategies:

  • For single-intent recognition scenarios, label at least 50 to 100 data records for each intent type. Ensure the quantity of labeled data records for each intent type is balanced. If model performance after fine-tuning does not meet expectations, increase the number of labeled records.

  • For multi-intent recognition or multi-round chat scenarios, label at least 20% as many data records as you label for single-intent scenarios. Every intent that appears in the multi-intent or multi-round data must also appear in the single-intent data.

  • Cover as many phrasings and scenarios as possible in intent descriptions.
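The balance and minimum-count guidance above can be checked mechanically before training. The following is a minimal sketch, assuming the training records use the instruction and output fields described in this topic. The helper names and the max_ratio threshold are hypothetical and not part of PAI.

```python
from collections import Counter


def intent_name(output):
    """Return the intent name, that is, the text before the first '('."""
    return output.split("(", 1)[0].strip()


def check_balance(records, min_per_intent=50, max_ratio=2.0):
    """Count records per intent and flag intents that look under-represented.

    min_per_intent follows the 50-record lower bound suggested above.
    max_ratio is a hypothetical limit on largest class size / smallest class size.
    """
    counts = Counter(intent_name(r["output"]) for r in records)
    if not counts:
        return counts, [], True
    too_few = sorted(name for name, n in counts.items() if n < min_per_intent)
    balanced = max(counts.values()) <= max_ratio * min(counts.values())
    return counts, too_few, balanced


# Tiny in-memory sample; in practice, load the records from your training file.
sample = [
    {"instruction": "I want to listen to music", "output": "play_music()"},
    {"instruction": "Play something relaxing", "output": "play_music()"},
    {"instruction": "Too loud, turn the sound down", "output": "volume_down()"},
]
counts, too_few, balanced = check_balance(sample, min_per_intent=2)
print(dict(counts))
print("Intents below the minimum:", too_few)
print("Balanced:", balanced)
```

Intents reported as below the minimum are candidates for additional labeling.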

Data format requirements

Save training data in a JSON file containing the instruction and output fields. The output field corresponds to the intent predicted by the model and related parameters. The following examples show training data for different intent recognition scenarios.

  • For single-intent recognition scenarios, prepare business data for a specific scenario to fine-tune the LLM. The following example shows single-round chats for a smart home scenario.

    [
        {
            "instruction": "I want to listen to music",
            "output": "play_music()"
        },
        {
            "instruction": "Too loud, turn the sound down",
            "output": "volume_down()"
        },
        {
            "instruction": "I do not want to listen to this, turn it off",
            "output": "music_exit()"
        },
        {
            "instruction": "I want to visit Hangzhou. Check the weather forecast for me",
            "output": "weather_search(Hangzhou)"
        }
    ]
  • For multi-intent recognition or multi-round chat scenarios, user intents may be expressed across multiple rounds in a chat. Prepare multiple rounds of chat data and label the multi-round inputs. The following example shows multi-round chats for a voice assistant:

    User: I want to listen to music.
    Assistant: What kind of music?
    User: Play *** music.
    Assistant: play_music(***)

    The training data for multi-round chats is in the following format:

    [
        {
            "instruction": "I want to listen to music. Play *** music.",
            "output": "play_music(***)"
        }
    ]
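The conversion from a labeled multi-round chat to a single training record can be sketched as follows. This is an illustration only: it assumes each chat is available as a list of (role, text) pairs, and flatten_chat is a hypothetical helper name.

```python
def flatten_chat(turns):
    """Convert one labeled multi-round chat into a single training record.

    turns is a list of (role, text) pairs. The user turns are joined into the
    instruction, and the last assistant turn (the labeled intent) becomes the
    output. Intermediate assistant questions are dropped, as in the example above.
    """
    user_texts = [text.strip() for role, text in turns if role == "user"]
    assistant_texts = [text.strip() for role, text in turns if role == "assistant"]
    if not user_texts or not assistant_texts:
        raise ValueError("a chat needs at least one user turn and one assistant turn")
    return {"instruction": " ".join(user_texts), "output": assistant_texts[-1]}


chat = [
    ("user", "I want to listen to music."),
    ("assistant", "What kind of music?"),
    ("user", "Play *** music."),
    ("assistant", "play_music(***)"),
]
record = flatten_chat(chat)
print(record)
```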

Multi-round chat training requires significantly longer sequence lengths, and scenarios using multi-round chats are limited. Use the multi-round chat mode only if single-round chat cannot meet business requirements. The following section uses single-round chat to illustrate the complete process.

Use iTAG to label data

Label data in iTAG of PAI to generate a training dataset that meets specific requirements:

  1. Register the data used for iTAG labeling to a PAI dataset.

    1. Prepare a data file in the manifest format. For more information, see data preparation strategies. Example:

      {"data":{"instruction": "I want to listen to music"}}
      {"data":{"instruction": "Too loud, turn the sound down"}}
      {"data":{"instruction": "I do not want to listen to this, turn it off"}}
      {"data":{"instruction": "I want to visit Hangzhou. Check the weather forecast for me"}}
    2. Go to the AI Asset Management > Datasets page, select the target workspace, and click Enter Datasets.

    3. Click Create Dataset and configure the key parameters described in the following table. For information about other parameters, see Create and manage datasets.

      Parameter       Description
      Storage Type    Select Alibaba Cloud Object Storage Service (OSS).
      Import Format   Select File.
      OSS Path        Select the created OSS directory and upload the manifest file that you prepared by performing the following steps:
                      1. Click the image button, and in the Select OSS file dialog box, click Upload File.
                      2. Click Browse Local Files or Drag and Drop File to Upload to upload the manifest file as prompted.

  2. Go to the Data Preparation > iTAG page, click Go to Management Page, and switch to the Template Management tab.

  3. Click Create Template, select Custom Template > Basic Templates, and click Edit. After you configure the parameters, click Save Template Name. The following table describes the key parameters. For information about other parameters, see Manage templates.

    Configuration            Description
    Basic Template Canvas    1. Select Text and click Generate Content Card.
                             2. Click the text area. In the Import Dataset dialog box, select an existing dataset. In the Configuration For Basic Template section, select Dataset Field Name > instruction.
    Basic Template Answers   Select Input Field and click Generate Title Card. Then change Title to output.

  4. In the navigation pane on the left, choose Management Center > Task Management. On the Task Management tab, click Create Task. On the Create Labeling Job page, configure the parameters and click Create. The following table describes the key parameters. For information about other parameters, see Create a labeling job.

    Parameter        Description
    Input data set   Select the dataset that you created in Step 1.
                     Note: The data must match the template.
    Template Type    Select Custom Template and select an existing template from the drop-down list.

  5. After you create the labeling job, label the data. For more information, see Process labeling jobs.

  6. After you label the data, export the labeling results to an OSS directory. For more information, see Export labeling results.

    The following sample code shows an example of the exported manifest file. For information about the data format, see Overview.

    {"data":{"instruction":"I want to listen to music","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554911772672","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"play_music()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236185165","MarkTime":"Wed Jul 23 10:03:05 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554911772672","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"play_music()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236185165","MarkTime":"Wed Jul 23 10:03:05 CST 2025","UserMarkResultId":"1947839763671740416","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null}
    {"data":{"instruction":"Too loud, turn the sound down","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554891464704","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"volume_down()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236198979","MarkTime":"Wed Jul 23 10:03:19 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554891464704","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"volume_down()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236198979","MarkTime":"Wed Jul 23 10:03:19 CST 2025","UserMarkResultId":"1947839868520656896","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null}
    {"data":{"instruction":"I do not want to listen to this, turn it off","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554992373760","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"music_exit()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236212152","MarkTime":"Wed Jul 23 10:03:32 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554992373760","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"music_exit()\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236212152","MarkTime":"Wed Jul 23 10:03:32 CST 2025","UserMarkResultId":"1947839936657285120","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null}
    {"data":{"instruction":"I want to visit Hangzhou. Check the weather forecast for me","_itag_index":""},"label-1947839552568066048-system":{"fixedFlag":0,"results":[{"MarkResultId":"1947839554971426816","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"weather_search(Hangzhou)\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236218730","MarkTime":"Wed Jul 23 10:03:39 CST 2025","UserMarkResultId":null,"IsNeedVoteJudge":false}],"abandonFlag":0},"label-1947839552568066048":{"results":[{"MarkResultId":"1947839554971426816","MarkTitle":"Basic Template","MarkResult":"{\"tabId\":\"CommonExtensions\",\"annotations\":[{\"id\":null,\"labels\":{\"output\":\"weather_search(Hangzhou)\"},\"exif\":null}],\"type\":\"CommonExtensions\",\"version\":\"v2\"}","QuestionId":"CommonExtensions","ResultType":"OPEN_GROUP","Progress":null,"Version":"1753236218730","MarkTime":"Wed Jul 23 10:03:39 CST 2025","UserMarkResultId":"1947839975890939904","IsNeedVoteJudge":false}]},"abandonFlag":0,"abandonRemark":null}
  7. In the terminal, use the following Python script to convert the manifest-formatted labeling result file into a training data format suitable for QuickStart.

    import json
    
    # Input file path and output file path
    input_file_path = 'test_json.manifest'
    output_file_path = 'train.json'
    
    converted_data = []
    
    with open(input_file_path, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                # Parse JSON data for each line
                data = json.loads(line)
             
                # Extract instruction
                instruction = data['data']['instruction']
              
                # Iterate through all keys starting with "label-"
                for key in data.keys():
                    if key.startswith('label-'):
                        # Extract MarkResult and parse its content
                        mark_result_str = data[key]['results'][0]['MarkResult']
                        mark_result = json.loads(mark_result_str)  # Parse MarkResult string as JSON
                   
                        # Extract labels["output"] from annotations
                        output = mark_result['annotations'][0]['labels']['output']
                     
                        # Build new data structure
                        converted_data.append({
                            'instruction': instruction,
                            'output': output
                        })
                        break
              
            except Exception as e:
                print(f"Error processing line: {line.strip()}. Error: {e}")
    
    # Write converted data to output file
    with open(output_file_path, 'w', encoding='utf-8') as outfile:
        json.dump(converted_data, outfile, ensure_ascii=False, indent=4)
    
    print(f"Conversion completed. Output saved to {output_file_path}")
    

    The output is a JSON file.
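Before uploading the converted file, a quick structural check can catch records that lack one of the two required fields. The following is a minimal sketch under the format described in Data format requirements. validate_records is a hypothetical helper, and the inline string stands in for the contents of train.json.

```python
import json


def validate_records(records):
    """Return a list of human-readable problems found in the training records."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for field in ("instruction", "output"):
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{field}'")
    return problems


# Inline sample standing in for the contents of train.json.
text = (
    '[{"instruction": "I want to listen to music", "output": "play_music()"},'
    ' {"instruction": "Turn it off"}]'
)
problems = validate_records(json.loads(text))
print(problems)
```

An empty list means every record has a non-empty instruction and output.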

Train and evaluate a model

Train a model

QuickStart integrates high-quality pre-trained models from open source AI communities and implements the complete process from model training and deployment to inference without code, simplifying model development.

This example uses the Qwen1.5-1.8B-Chat model to illustrate how to train a model in QuickStart using the prepared training data:

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the upper-left corner, select a region.

    3. In the left-side navigation pane, choose Workspaces, and click the target workspace name to enter it.

    4. In the left-side navigation pane, choose QuickStart > Model Gallery.

  2. In the model list of the Model Gallery page, search for and click the Qwen1.5-1.8B-Chat model.

  3. In the upper-right corner of the model details page, click Train. In the Train panel, configure the key parameters described in the following table. Use default settings for other parameters.

    Parameter

    Description

    Training Mode

    • Full-Parameter Fine-Tuning: Requires more resources and longer training time but delivers better results.

      Note

      Models with few parameters support full-parameter fine-tuning. Select this mode as needed.

    • QLoRA: A lightweight fine-tuning mode. Compared with full-parameter fine-tuning, Quantized Low-Rank Adaptation (QLoRA) requires fewer resources and shorter training time, but delivers inferior results.

    • LoRA: This mode is similar to QLoRA.

    Dataset configuration

    Training dataset

    To select a prepared training dataset:

    1. Select OSS file or directory in the drop-down list.

    2. Click the image button to select an OSS directory.

    3. In the Select OSS File dialog box, click Upload File, drag the prepared training dataset file to the blank area, and then click OK.

    Output Configuration

    Model output path

    Select an OSS directory to store the output configuration file and model file.

    Tensorboard Output Path

    Hyperparameter Configuration

    For more information about hyperparameters, see Table 1. Full hyperparameters.

    We recommend that you configure the hyperparameters based on the following configuration strategies. For information about recommended hyperparameter configurations, see Table 2. Recommended hyperparameter configurations.

    • Configure hyperparameters based on different training methods.

    • Global batch size = Number of GPUs × per_device_train_batch_size × gradient_accumulation_steps

      • To maximize training performance, increase the number of GPUs and set per_device_train_batch_size to a higher value first.

      • In most cases, the global batch size ranges from 64 to 256. For small training datasets, reduce the global batch size appropriately.

    • Configure the seq_length parameter based on your dataset. For example, if the maximum text sequence length in a dataset is 50, set this parameter to 64 (a power of 2).

    • If the training loss decreases too slowly or does not converge, increase the learning rate specified by the learning_rate parameter. Also confirm whether the training data quality is adequate.
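The batch-size relationship and the seq_length rounding rule above reduce to simple arithmetic. The following sketch is illustrative only: the function names are hypothetical, and it assumes that sequence lengths are measured in tokens.

```python
def accumulation_steps(global_batch_size, num_gpus, per_device_train_batch_size):
    """Solve the global batch size formula for gradient_accumulation_steps."""
    per_step = num_gpus * per_device_train_batch_size
    if per_step <= 0 or global_batch_size % per_step != 0:
        raise ValueError("global batch size must be a positive multiple of "
                         "num_gpus * per_device_train_batch_size")
    return global_batch_size // per_step


def next_power_of_two(n):
    """Return the smallest power of 2 that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


# Example: target a global batch size of 64 on 2 GPUs with 4 samples per GPU.
steps = accumulation_steps(64, num_gpus=2, per_device_train_batch_size=4)
print("gradient_accumulation_steps:", steps)

# Example from the text: a maximum sequence length of 50 rounds up to 64.
seq_length = next_power_of_two(50)
print("seq_length:", seq_length)
```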

    Table 1. Full hyperparameters

    Hyperparameter: learning_rate
    Type: FLOAT
    Default value: 5e-5
    Description: The learning rate of model training.

    Hyperparameter: num_train_epochs
    Type: INT
    Default value: 1
    Description: The number of epochs.

    Hyperparameter: per_device_train_batch_size
    Type: INT
    Default value: 1
    Description: The amount of data processed by each GPU in one training iteration.

    Hyperparameter: seq_length
    Type: INT
    Default value: 128
    Description: The length of the text sequence.

    Hyperparameter: lora_dim
    Type: INT
    Default value: 32
    Description: The inner dimensions of the low-rank matrices that are used in Low-Rank Adaptation (LoRA) or QLoRA training. Set this parameter to a value greater than 0.

    Hyperparameter: lora_alpha
    Type: INT
    Default value: 32
    Description: The LoRA or QLoRA weights. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0.

    Hyperparameter: load_in_4bit
    Type: BOOL
    Default value: false
    Description: Specifies whether to load the model in 4-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0, the load_in_4bit parameter to true, and the load_in_8bit parameter to false.

    Hyperparameter: load_in_8bit
    Type: BOOL
    Default value: false
    Description: Specifies whether to load the model in 8-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0, the load_in_4bit parameter to false, and the load_in_8bit parameter to true.

    Hyperparameter: gradient_accumulation_steps
    Type: INT
    Default value: 8
    Description: The number of gradient accumulation steps.

    Hyperparameter: apply_chat_template
    Type: BOOL
    Default value: true
    Description: Specifies whether the algorithm combines the training data with the default chat template. Qwen1.5 models use the following format:

    • Question: <|im_start|>user\n + instruction + <|im_end|>\n

    • Answer: <|im_start|>assistant\n + output + <|im_end|>\n

    Hyperparameter: system_prompt
    Type: STRING
    Default value: You are a helpful assistant
    Description: The default system prompt for model training. This parameter takes effect only if you set the apply_chat_template parameter to true. You can configure a custom system prompt during the training of a Qwen1.5 model to allow the LLM to assume a specific role. The algorithm automatically expands the training data. You do not need to pay attention to the execution details. For example, you set system_prompt to "You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters." In this case, the following training sample is provided:

    [
        {
            "instruction": "I want to listen to music",
            "output": "play_music()"
        }
    ]

    The training data is in the following format:

    <|im_start|>system\nYou are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters<|im_end|>\n<|im_start|>user\nI want to listen to music<|im_end|>\n<|im_start|>assistant\nplay_music()<|im_end|>\n
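The expansion that apply_chat_template and system_prompt describe can be pictured as plain string concatenation. The sketch below only illustrates the resulting text layout. build_training_text is a hypothetical helper, not the function that the training algorithm actually calls.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_training_text(system_prompt, instruction, output):
    """Join one training sample with the Qwen1.5 chat markers shown above."""
    return "".join([
        f"{IM_START}system\n{system_prompt}{IM_END}\n",
        f"{IM_START}user\n{instruction}{IM_END}\n",
        f"{IM_START}assistant\n{output}{IM_END}\n",
    ])


text = build_training_text(
    "You are a helpful assistant",
    "I want to listen to music",
    "play_music()",
)
# repr() shows the newline characters explicitly.
print(repr(text))
```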

    Table 2. Recommended hyperparameter configurations

    Parameter           Full-parameter fine-tuning   LoRA/QLoRA
    learning_rate       5e-6 and 5e-5                3e-4
    Global batch size   256                          256
    seq_length          256                          256
    num_train_epochs    3                            5
    lora_dim            0                            64
    lora_alpha          0                            16
    load_in_4bit        False                        True/False
    load_in_8bit        False                        True/False
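The interplay between lora_dim, load_in_4bit, and load_in_8bit described in Table 1 can be summarized in a small consistency check. This is a sketch under assumptions: describe_mode and its return labels are hypothetical, and rejecting contradictory settings with an error is a choice made here, not documented platform behavior.

```python
def describe_mode(hparams):
    """Classify a hyperparameter set by the LoRA and quantization settings it implies."""
    lora_dim = hparams.get("lora_dim", 0)
    load_4bit = hparams.get("load_in_4bit", False)
    load_8bit = hparams.get("load_in_8bit", False)
    if load_4bit and load_8bit:
        # Table 1: each flag takes effect only when the other one is false.
        raise ValueError("set at most one of load_in_4bit and load_in_8bit")
    if lora_dim <= 0:
        if load_4bit or load_8bit:
            # Table 1: quantized loading takes effect only when lora_dim > 0.
            raise ValueError("quantized loading requires lora_dim > 0")
        return "full"
    if load_4bit:
        return "lora-4bit"
    if load_8bit:
        return "lora-8bit"
    return "lora"


# Values taken from Table 2.
full_mode = describe_mode({"lora_dim": 0, "lora_alpha": 0})
lora_mode = describe_mode({"lora_dim": 64, "lora_alpha": 16})
quant_mode = describe_mode({"lora_dim": 64, "lora_alpha": 16, "load_in_4bit": True})
print(full_mode, lora_mode, quant_mode)
```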

  4. Click Fine-tune. In the Billing Notification message, click OK.

    The system automatically navigates to the training job details page. After the training job runs, view the status and training logs.

Evaluate a model

After training a model, use a Python script to evaluate the model in the terminal.

  1. Prepare the evaluation data file testdata.json. Sample content:

    [
        {
            "instruction": "Who sings the song Ten Years?",
            "output": "music_query_player(Ten Years)"
        },
        {
            "instruction": "What is the weather like in Hangzhou today?",
            "output": "weather_search(Hangzhou)"
        }
    ]
  2. In the terminal, use the following Python script to evaluate the model.

    # encoding=utf-8
    import json
    
    from tqdm import tqdm
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Modify the path of the model.
    model_name = '/mnt/workspace/model/qwen14b-lora-3e4-256-train/'
    print(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    
    def parse_call(text):
        """Split 'name(a,b)' into the intent name and the set of parameters."""
        name, _, rest = text.strip().partition('(')
        params = rest[:-1] if rest.endswith(')') else rest
        return name.strip(), {p.strip() for p in params.split(',')}
    
    
    count = 0   # Samples whose intent (function name) is correct
    ecount = 0  # Samples whose intent and parameters are both correct
    
    # Modify the path in which the evaluation data is stored.
    with open('/mnt/workspace/data/testdata.json', encoding='utf-8') as f:
        test_data = json.load(f)
    # Use the same system prompt that was used for training.
    system_prompt = 'You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding function invocation and parameters.'
    
    for i in tqdm(test_data):
        prompt = '<|im_start|>system\n' + system_prompt + '<|im_end|>\n<|im_start|>user\n' + i['instruction'] + '<|im_end|>\n<|im_start|>assistant\n'
        # If a label contains several intents separated by ';', only the first one is scored.
        gold = i['output'].split(';')[0]
    
        model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
        generated_ids = model.generate(
            **model_inputs,
            max_new_tokens=64,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            do_sample=False
        )
        # Keep only the newly generated tokens (drop the prompt tokens).
        generated_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        pred = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    
        # A prediction without parentheses no longer raises an IndexError.
        gold_name, gold_params = parse_call(gold)
        pred_name, pred_params = parse_call(pred)
        if gold_name == pred_name:
            count += 1
            if gold_params == pred_params:
                ecount += 1
    
    print("Intent recognition accuracy:", count/len(test_data))
    print("Parameter recognition accuracy:", ecount/len(test_data))
    Note

    If the code execution returns the message Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate, follow the prompt to run pip install accelerate to install the dependency library.
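The scoring rule used in the evaluation can be tested on its own, without a GPU or a model. An intent counts as correct when the function name matches, and the parameters count as correct only when the intent is correct and the parameter sets are equal. The following sketch isolates that rule. The helper names are hypothetical and the predictions are made up for illustration.

```python
def parse_call(text):
    """Split 'name(a,b)' into the intent name and the set of parameters."""
    name, _, rest = text.strip().partition("(")
    params = rest[:-1] if rest.endswith(")") else rest
    return name.strip(), {p.strip() for p in params.split(",")}


def score(pairs):
    """Return (intent accuracy, parameter accuracy) for (gold, pred) pairs."""
    pairs = list(pairs)
    intent_hits = 0
    param_hits = 0
    for gold, pred in pairs:
        # Only the first intent of a ';'-separated label is scored.
        gold_name, gold_params = parse_call(gold.split(";")[0])
        pred_name, pred_params = parse_call(pred)
        if gold_name == pred_name:
            intent_hits += 1
            if gold_params == pred_params:
                param_hits += 1
    return intent_hits / len(pairs), param_hits / len(pairs)


# Made-up predictions for illustration.
pairs = [
    ("weather_search(Hangzhou)", "weather_search(Hangzhou)"),        # intent and parameters match
    ("music_query_player(Ten Years)", "music_query_player(Ten)"),    # intent matches, parameters differ
    ("play_music()", "volume_down()"),                               # wrong intent
    ("play_music()", "play_music"),                                  # missing parentheses, still parsed
]
intent_acc, param_acc = score(pairs)
print("Intent recognition accuracy:", intent_acc)
print("Parameter recognition accuracy:", param_acc)
```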

Deploy and call the model service

Deploy the model service

After training a model, deploy it as an online service in EAS:

  1. In the upper-right corner of the Task details page, click Deploy. The system automatically configures basic information and resource information. For Deployment Method, select VLLM Accelerated Deployment. Modify parameters as needed and click Deploy.

  2. In the Billing Notification message, click OK.

    The system automatically navigates to the deployment task page. When Status is Running, the service is deployed successfully.

Call the model service

The following example shows how to call the API using a client:

  1. Obtain the endpoint and token of the model service.

    1. In the Service details section of the Basic Information page, click View Endpoint Information.

    2. In the Invocation Information dialog box, view and save the endpoint and token of the model service to your on-premises machine.

  2. Call the service. The following example shows how to call a service that was deployed with vLLM accelerated deployment. Run this code in the terminal.

    from openai import OpenAI
    
    ##### API Configuration #####
    openai_api_key = "<EAS_SERVICE_TOKEN>"
    openai_api_base = "<EAS_SERVICE_URL>/v1/"
    
    client = OpenAI(
        api_key=openai_api_key,
        base_url=openai_api_base,
    )
    
    models = client.models.list()
    model = models.data[0].id
    print(model)
    
    
    def main():
        stream = True
        chat_completion = client.chat.completions.create(
            messages=[
                 {
                    "role": "system",
                    "content": [
                        {
                            "type": "text",
                            "text": "You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters.",
                        }
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": "I want to listen to music",
                        }
                    ],
                }
            ],
            model=model,
            max_completion_tokens=2048,
            stream=stream,
        )
    
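        if stream:
            for chunk in chat_completion:
                # Some chunks (for example, the first or the last) carry no text. Skip them.
                if chunk.choices and chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="")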
        else:
            result = chat_completion.choices[0].message.content
            print(result)
    
    
    if __name__ == "__main__":
        main()
    

    Where:

    • <EAS_SERVICE_URL>: the endpoint of your model service.

    • <EAS_SERVICE_TOKEN>: the token of your model service.
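The service returns the intent as text such as play_music() or weather_search(Hangzhou), so the calling application still has to map that text to an action. The following is a minimal sketch of such a mapping. The handler functions, the HANDLERS table, and the returned messages are hypothetical placeholders for your own business logic.

```python
def parse_intent(reply):
    """Split a reply such as 'weather_search(Hangzhou)' into a name and an argument list."""
    name, _, rest = reply.strip().partition("(")
    args_text = rest[:-1] if rest.endswith(")") else rest
    args = [a.strip() for a in args_text.split(",") if a.strip()]
    return name.strip(), args


def play_music(*args):
    return f"playing {args[0] if args else 'default playlist'}"


def weather_search(city):
    return f"looking up weather for {city}"


# Hypothetical table that maps intent names to application handlers.
HANDLERS = {"play_music": play_music, "weather_search": weather_search}


def dispatch(reply):
    """Route a model reply to the matching handler, or report why it cannot be routed."""
    name, args = parse_intent(reply)
    handler = HANDLERS.get(name)
    if handler is None:
        return f"unrecognized intent: {name}"
    try:
        return handler(*args)
    except TypeError:
        # The model returned the wrong number of parameters for this intent.
        return f"bad arguments for {name}: {args}"


print(dispatch("play_music()"))
print(dispatch("weather_search(Hangzhou)"))
print(dispatch("order_pizza()"))
```

Treating unknown intents and malformed arguments as ordinary return values keeps a single bad model reply from interrupting the calling application.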

References

  • For more information about how to use iTAG and the format requirements for data labeling, see iTAG.

  • For more information about EAS, see Elastic Algorithm Service.

  • You can use QuickStart of PAI to train and deploy models in different scenarios, including Llama-3, Qwen1.5, and Stable Diffusion V1.5 models. For more information, see Scenario-specific practices.