All Products
Search
Document Center

Alibaba Cloud Model Studio:Agent application

Last Updated:Jun 24, 2026

Alibaba Cloud Model Studio agent applications let you connect an LLM to external tools and knowledge bases without code, extending the model's capabilities beyond its built-in limits.

Important

How it works

An agent uses prompts to orchestrate external tools and complete complex tasks. When the agent receives a request, the LLM identifies the user's intent, plans the tasks, calls the necessary external tools, and integrates the results to generate a final response.

Use an agent when you want the model to autonomously decide which tools to use for a task, rather than following a manually designed workflow.

Model Studio agents support the following core capabilities:

  1. Knowledge base (RAG): Connects the agent to an external knowledge base to answer questions based on your private data. This is useful for Q&A scenarios in specialized domains not covered by the model's training data.

  2. Plugin: Calls pre-built platform tools for tasks such as code execution, image generation, or weather queries. This is useful when the agent must perform actions rather than just converse.

Quick start

Create a basic agent

  1. Go to the Application Management page on the Alibaba Cloud Model Studio console. Click + Create Application. On the Agent Application tab, click Create Now.

  2. On the application configuration page, select a model from the model drop-down list, such as Qwen-Plus. You can keep the default settings for other parameters.

  3. After the application is created, enter Hello in the conversation box on the left to test it.

Agent capabilities

You can extend an agent's capabilities by selecting a model, optimizing the system prompt, adding a knowledge base (RAG), and calling plugins.

Model

The model is the core component that drives an agent's reasoning and decision-making. Model Studio agents support Qwen series models and custom-deployed models.

  1. Select a model

    From the model drop-down list, select a model such as Qwen-Plus. Click More Models to choose from other available models.

  2. Configure parameters

    Click the settings icon image to the right of the model drop-down list to configure the following parameters:

    1. Maximum response length: The maximum length of the model's generated response, excluding the prompt. The maximum value varies by model.

    2. Context turns: The maximum number of historical conversation turns to include in the model's input. A higher number helps maintain better conversational context.

    3. temperature: Controls the randomness of the generated content. A higher value increases diversity, while a lower value increases consistency. The value must be in the range of [0, 2).

    4. enable_thinking: Toggles the model's thinking mode. This parameter is only available for supported models.

      When thinking mode is enabled, the model performs more in-depth reasoning before generating a response, which increases token consumption.

System prompt

A system prompt defines an agent's role, behavior, and constraints to ensure that its responses are consistent and task-oriented. To write an effective prompt, consider the following points:

  • Define a persona: Specify the role the model should adopt and the expertise it should have.

  • Specify an output format: Describe the desired structure, length, or style of the response.

  • Set constraints: Instruct the model on what content to avoid or which rules to follow.

  • Guide tool usage: Explicitly mention tool names and explain when they should be used.

    1. Configure a prompt

      Set the system prompt to Please answer my questions in the style of 'One Hundred Years of Solitude'. The following is a comparison of the results:

      • Without the system prompt: When the user asks "Who are you?", the model provides its default introduction: "I am Qwen, a large language model developed by the Tongyi Lab of Alibaba Group. I can help you answer questions, provide information, create content, write code, and perform logical reasoning."

      • With the system prompt: In the text conversation interface, when the user asks "Who are you?", the AI introduces itself in the literary style of "One Hundred Years of Solitude," referencing elements from the novel such as Macondo and the Buendía family. This confirms the system prompt has taken effect. The interaction statistics show 234 words, 245 input tokens, and 177 output tokens.

Knowledge base (RAG)

Retrieval-augmented generation (RAG) lets an agent query an external knowledge base and use the retrieved content as grounding for answers. For private or domain-specific Q&A, RAG can significantly improve accuracy and reduce hallucinations. For more information, see Create and use a knowledge base.

Note: Text retrieved from a knowledge base consumes the model's context window. You may need to adjust your retrieval strategy and the length of the retrieved text to use the context window efficiently and avoid exceeding its limit.

Plugins

Agents call plugins to perform specific tasks, such as code execution, web searches, and text-to-image generation. Plugins are useful when an agent needs to perform operations or generate content beyond its native capabilities. Model Studio provides a variety of official plugins and also supports custom plugins. For more information, see Plugin overview.

Agent interaction

Text conversation

Text conversation is the primary method for interacting with an agent and supports multi-turn conversations.

Text conversation supports two input methods:

  1. Text input: Enter text to chat with the agent.

  2. File upload: Upload files, such as documents, images, videos, or audio clips, as attachments.

Publish and call

You can call Model Studio agents through an API. You can also publish them to third-party platforms with one click or integrate them into your business workflows as components.

Publish an application

Important

You must publish an application before you can call or integrate it.

In the upper-right corner of the agent application management page, click Publish and then click Confirm Publish.

When republishing the application, a dialog box appears and displays the changes made since the last publication.
Note

If the application was created by a RAM user, make sure you have the ram:CreateServiceLinkedRole permission before publishing. For more information, see service-linked role.

API call

On the Publish Channel tab of your agent application, click View API next to API Call to view the API methods.

Replace YOUR_API_KEY with your actual Model Studio API key before you invoke the API.

Agent management

Copy and delete

On the My Applications page, find the application card. From the More menu, you can copy, delete, or rename the agent.

Common scenarios for copying an application include:

  • Creating test versions that use different prompts or models.

  • Customizing an agent for different audiences or use cases.

  • Creating a backup before making major configuration changes.

Version management

Version management lets you edit historical version descriptions or revert to a previously published version.

  1. On the Configure tab of the agent application, click Version Management in the upper-right corner of the top navigation bar.

  2. In the list of historical versions, select a target version. The Version History panel opens and displays a timeline that includes the current draft, the online version (marked with a Latest tag), and other historical version entries. Each entry lists the version ID, publication time, publisher, and version information.

    • To edit the version information, hover over the edit icon image and click it. In the Edit Version Description dialog box, make your changes and then click OK.

    • To use this historical version, click Overwrite Current Draft, and then click Confirm in the confirmation dialog box.

      This action overwrites the current draft with the content of the selected historical version.

Billing

Agent billing includes the following:

  1. Model calls

    Agents incur model call fees, which depend on the model type and token usage.

    For billing details, see the Model Studio console.

  2. Knowledge base

    • The knowledge base feature is free of charge for a limited time.

    • Text chunks retrieved from the knowledge base increase the number of input tokens, which can increase model call fees.

  3. MCPs

    • Some official Model Customization Platforms (MCPs) are billed based on model calls, such as MCPs for text-to-image, text-to-video, and speech synthesis.

    • Some MCP services involve third-party API calls, which may incur fees charged by the third-party service provider. Model Studio does not charge additional fees for these calls.

  4. Long-term memory

    • Data storage for long-term memory is free of charge.

    • During a Q&A session, the system merges content from memory into the prompt and passes it to the large language model, which increases token consumption. Tokens consumed by content from memory are currently not charged.

Supported models

Note

This list may not be up-to-date. For the latest list of supported models, refer to the agent application interface.

  • Qwen-Plus

  • Qwen-Max

  • QwenVL-Max

  • QwenVL-Plus

  • Qwen-Turbo

FAQ

How are Model Studio applications billed?

Creating an application is free. When you call an application for a Q&A session, you are charged for model calls based on the model type used.

I've configured a knowledge base, but the agent's answers are unrelated to its content. How can I fix this?

First, run a knowledge base hit test to check the similarity score between the query and the knowledge base content. If the score is low, adjust the retrieval configuration to ensure that the model prioritizes results from the knowledge base.

In the system prompt, explicitly instruct the model to answer questions based only on the knowledge base content and not use its own built-in knowledge. If the issue persists, try using a different model for more stable results.

Is there a timeout limit for custom plugins?

Yes, the timeout limit is 5 seconds.

Can I create agent applications by using an API?

You can use the Assistant API to create large language model applications similar to agent applications. However, applications created with the Assistant API cannot be managed on the console. For more information, see the Assistant API documentation.