
Platform For AI:Deploy an application flow

Last Updated:Jul 10, 2025

After you develop an application flow, you can deploy it as an Elastic Algorithm Service (EAS) service. EAS provides features such as auto scaling and comprehensive O&M monitoring, which ensure that your application can flexibly respond to business changes and growth, improve system stability and performance, and better meet production environment requirements.

Prerequisites

You have created and debugged an application flow. For more information, see Develop an application flow.

Deploy an application flow

Go to LangStudio and select a workspace. On the Application Flow tab, click the application flow that you debugged, and then click Deploy in the upper-right corner. Before you deploy, make sure that the runtime is started. The key parameters are described below.


Resource Information

• Resource Type: Select public resources or a dedicated resource group that you have created.

• Instances: Configure the number of service instances. In the production stage, configure multiple instances to reduce the risk of single points of failure.

• Deployment Resources: If you use the application flow only for business flow scheduling, you can select appropriate CPU resources based on the complexity of the business flow. Compared with GPU resources, CPU resources are usually more cost-effective. After deployment, you are charged for the resources. For more information, see Billing of EAS.

VPC: The application flow is deployed as an EAS service. To ensure that the client can access the service, select a virtual private cloud (VPC). By default, EAS services cannot access the Internet. To enable Internet access, configure a VPC that has Internet access. For more information, see Configure network connectivity.

Note

If an application flow includes a vector database connection (such as Milvus), ensure that the configured VPC is the one where the vector database instance resides, or that the two VPCs are connected.

History

• Enable History: This parameter applies only to chat-type application flows. When enabled, the system can store and transmit multi-round chat history. This feature must be used together with the request header parameters. For more information, see Appendix: Chat history.

• History Storage: Local storage does not support multi-instance deployment. If you deploy services for production use, use external storage instead, such as ApsaraDB RDS. For more information, see Appendix: Chat history.

Important

If you use local storage, multi-instance deployment is not supported, and scaling out from a single instance to multiple instances is also not supported. Otherwise, the chat history feature may not work properly.

Enable Tracing: When enabled, you can view trace records to evaluate the effect of the application flow after deployment.

Roles and Permissions: In the application flow, if you use a Faiss vector database (select a Faiss or Milvus vector database when creating a knowledge base) or "Alibaba Cloud IQS Search" (required by the IQS web search-based chatbot template), you must select an appropriate role.

For more information about parameter configurations, see Parameters for custom deployment in the console.


Online debugging

After successful deployment, you are redirected to the PAI-EAS page. On the Online Debugging tab, configure and send a request. The key in the request body must be the same as the value of the Chat Input parameter in the Start node of the application flow. In this topic, the default field question is used.


Make API calls

  1. On the Overview tab, obtain the endpoint and token.


  2. Send an API request.

    You can call the service in simple mode or complete mode. The following table describes the differences between the two modes.

    | Property | Simple Mode | Complete Mode |
    | --- | --- | --- |
    | Request path | <Endpoint>/ | <Endpoint>/run |
    | Feature description | Directly returns the output results of the application flow. | Returns a complex structure, including the node status, error messages, and output messages of the application flow. |
    | Scenario | You need only the final output results of the application flow and do not care about the internal processing or status of the flow. Suitable for simple queries or operations to quickly obtain results. | You need to understand the execution process of the application flow in detail, including the status of each node and possible error messages. Suitable for debugging, monitoring, or analyzing the execution of the application flow. |
    | Advantages | Simple to use; no need to parse complex structures. | Provides comprehensive information to help you understand the execution process of the application flow in depth; helps troubleshoot and optimize the performance of the application flow. |

    Simple mode

    cURL command

    The deployed EAS application flow service supports streaming or non-streaming calls using cURL commands. The following table provides request and response examples.

    Streaming sample request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -H "Accept: text/event-stream" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Non-streaming sample request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Streaming sample response:

    event: Message
    data: {"answer": ""}
    
    event: Message
    data: {"answer": "The"}
    
    event: Message
    data: {"answer": " capital"}
    
    event: Message
    data: {"answer": " of"}
    
    event: Message
    data: {"answer": " France"}
    
    event: Message
    data: {"answer": " is"}
    
    event: Message
    data: {"answer": " Paris"}
    
    event: Message
    data: {"answer": "."}
    
    event: Message
    data: {"answer": ""}

    Non-streaming sample response:

    {"answer":"The capital of France is Paris."}

    The request parameters are described below.

    • -H "Authorization: Bearer <your_token>": The HTTP header. Replace <your_token> with the token that you obtained in Step 1.

    • -H "Accept: text/event-stream": Indicates that the client accepts server-sent events (SSE) and that the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

    • -d '{"question": "Where is the capital of France?"}': The request body, a JSON object that contains a key-value pair for the question string. The key must match the Chat Input parameter field in the Start node of the application flow. In this case, the default field question is used.

    • "<your_endpoint>": The destination URL of the request. Replace <your_endpoint> with the endpoint that you obtained in Step 1.

    Python code

    The following examples show how to use the requests library to send a POST request to an application flow service, with streaming or non-streaming calls. Make sure that you have installed the library. If not, run pip install requests to install it.

    Streaming sample request:

    import requests
    
    url = "http://<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}
    
    # Specify a request header, including the token of your application flow service.
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json"
    }
    
    if __name__ == '__main__':
        with requests.post(url, json=data, headers=headers, stream=True) as r:
            for line in r.iter_lines(chunk_size=1024, decode_unicode=True):
                if line:
                    print(line)

    Non-streaming sample request:

    import requests
    
    url = "http://<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}
    
    # Specify a request header, including the token of your application flow service.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=data, headers=headers)
    
    if response.status_code == 200:
        print("Request is successful, results:")
        print(response.text)
    else:
        print(f"Request fails, status code: {response.status_code}")
    

    Streaming sample response:

    event: Message
    data: {"answer": ""}
    
    event: Message
    data: {"answer": "The"}
    
    event: Message
    data: {"answer": " capital"}
    
    event: Message
    data: {"answer": " of"}
    
    event: Message
    data: {"answer": " France"}
    
    event: Message
    data: {"answer": " is"}
    
    event: Message
    data: {"answer": " Paris"}
    
    event: Message
    data: {"answer": "."}
    
    event: Message
    data: {"answer": ""}

    Non-streaming sample response:

    {"answer":"The capital of France is Paris."}

    The request parameters are described below.

    • url: The destination URL of the request. Replace <your-endpoint-here> with the endpoint that you obtained in Step 1.

    • token: The token used in the Authorization header. Replace <your-token-here> with the token that you obtained in Step 1.

    • data: The request body, a JSON object that contains a key-value pair for the question string. The key must match the Chat Input parameter field in the Start node of the application flow. In this case, the default field question is used.

    • "Accept": "text/event-stream": Indicates that the client accepts server-sent events (SSE) and that the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

    Complete mode

    LangStudio supports Server-Sent Events (SSE), which can output the status, error messages, and output messages of each node while the application flow is executed. You can also customize the content of node_run_infos in the events. The following example uses online debugging: append /run to the call address and then edit the request body.


    The following table describes the request body parameters.

    | Field Name | Type | Default Value | Description |
    | --- | --- | --- | --- |
    | inputs | Mapping[str, Any] | None | The input data dictionary. Keys should match the input field names defined in the application flow. If the flow has no inputs, this field is ignored. |
    | stream | bool | True | Controls the response format. True: responds with SSE streaming; the Content-Type in the response header is text/event-stream, and the data is returned in DataOnly format, divided into the RunStarted, NodeUpdated, RunOutput, and RunTerminated events (see the tables below). False: responds with a single JSON body; the Content-Type in the response header is application/json. You can refer to the response information in Online debugging. |
    | response_config | Dict[str, Any] | - | Controls the detailed node information included in the streaming response (when stream=True). |
    | ∟ include_node_description | bool | False | (Within response_config) Whether to include node descriptions in the SSE event stream. |
    | ∟ include_node_display_name | bool | False | (Within response_config) Whether to include node display names in the SSE event stream. |
    | ∟ include_node_output | bool | False | (Within response_config) Whether to include node outputs in the SSE event stream. |
    | ∟ exclude_nodes | List[str] | [] | (Within response_config) List of node names to exclude from the SSE event stream. |
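For illustration, a complete-mode request body that combines these fields might look as follows. The node name custom_python is a hypothetical example; use the node names from your own flow.

```json
{
    "inputs": {"question": "Where is the capital of France?"},
    "stream": true,
    "response_config": {
        "include_node_display_name": true,
        "include_node_output": true,
        "exclude_nodes": ["custom_python"]
    }
}
```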

    The returned data is divided into different events: RunStarted, NodeUpdated, RunOutput, and RunTerminated:

    RunStarted event

    • Definition: The RunStarted event marks the beginning of a run. It is typically sent as the first event in the SSE stream of a run.

    • Payload example:

      data: {"event": "RunStarted", "run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "timestamp": "2025-06-12T08:15:07.223611Z", "flow_run_info": {"run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "status": "Running", "error": null, "otel_trace_id": ""}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunStarted. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | flow_run_info | object | Contains the status information for the run. |
      | ∟ run_id | string | (Within flow_run_info) Unique identifier for the run (same as the outer run_id). |
      | ∟ status | string | (Within flow_run_info) Initial status of the run, fixed as Running. |
      | ∟ error | object or null | (Within flow_run_info) If the run fails, contains an error message object; otherwise null. |
      | ∟ otel_trace_id | string | (Within flow_run_info) OpenTelemetry trace ID associated with this run (may be empty or a zero value). |

    NodeUpdated event

    • Definition: The NodeUpdated event indicates that the status or output of one or more nodes in the Flow has changed. During a run, this event is typically sent when nodes start running (Running) or complete execution (Completed/Failed). If response_config is set, this event can also include node descriptions, display names, and outputs. Note: If you specify exclude_nodes in the response_config of the request, NodeUpdated events for those nodes will not be returned.

    • Payload example:

      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.208601Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Running", "error": null, "duration": 0.0, "description": null, "display_name": null, "output": null}]}
      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.209621Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Completed", "error": null, "duration": 0.001246, "description": null, "display_name": null, "output": {"text": "echo:hello", "input_length": 2}}]}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as NodeUpdated. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | node_run_infos | array[object] | Array containing one or more node run information objects. Each object represents a node whose status or output has changed. |
      | ∟ node_name | string | (Within node_run_infos) Name of the node (legacy field, same as the node field). |
      | ∟ node | string | (Within node_run_infos) Name of the node. |
      | ∟ status | string | (Within node_run_infos) Current running status of the node, such as Running, Completed, or Failed. |
      | ∟ error | object or null | (Within node_run_infos) If the node execution fails, contains an error message object; otherwise null. |
      | ∟ duration | float | (Within node_run_infos) Time spent on node execution, in seconds. For the Running status, this is typically 0.0. |
      | ∟ description | string or null | (Within node_run_infos) Description of the node. Included only when response_config.include_node_description is true in the request; otherwise null. |
      | ∟ display_name | string or null | (Within node_run_infos) Display name of the node. Included only when response_config.include_node_display_name is true in the request; otherwise null. |
      | ∟ output | object or null | (Within node_run_infos) Output data of the node. Included only when the node status is Completed and response_config.include_node_output is true in the request; otherwise null. |

    RunOutput event

    • Definition: The RunOutput event indicates that the run has generated its final output. It typically occurs before the RunTerminated event at the end of the Flow run.

    • Payload example:

      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.745130Z", "outputs": {"answer": "What can"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.829133Z", "outputs": {"answer": " I help you with?"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.950055Z", "outputs": {"answer": ""}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.954983Z", "outputs": {}, "output_metadata": {"answer": {"is_stream": true, "status": "Finished"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_python_oHG7_1c9fb0ac-0f45-4dbc-bf97-8e4175fd991c", "timestamp": "2025-04-30T11:55:24.957091Z", "outputs": {"python_output": "Hello: Hello! What can I help you with?"}, "output_metadata": {"python_output": {"is_stream": false, "status": "Finished"}}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunOutput. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | outputs | object | Dictionary containing the final output results of the Flow. Its specific structure depends on the outputs defined when designing the Flow. |
      | output_metadata | object | Dictionary containing Flow output metadata. The keys are output names (corresponding to the keys in outputs), and the values are objects containing metadata for that output (such as is_stream and status). |

    RunTerminated event

    • Definition: The RunTerminated event marks the end of a run. It is typically sent as the last event in the SSE stream of a run.

    • Payload example:

      data: {"event": "RunTerminated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.212791Z", "flow_run_info": {"run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "status": "Completed", "error": null, "otel_trace_id": "0x00000000000000000000000000000000"}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunTerminated. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | flow_run_info | object | Contains the final status information of the entire run. |
      | ∟ run_id | string | (Within flow_run_info) Unique identifier for the run (same as the outer run_id). |
      | ∟ status | string | (Within flow_run_info) Final status of the run, such as Completed, Failed, or Canceled. |
      | ∟ error | object or null | (Within flow_run_info) If the run failed, contains an error message object; otherwise null. |
      | ∟ otel_trace_id | string | (Within flow_run_info) OpenTelemetry trace ID associated with this run (may be empty or a zero value). |
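As a minimal sketch of consuming the complete-mode SSE stream described above (assuming the requests library; the endpoint and token placeholders are the values from Step 1), parse_sse_data extracts the JSON payload of each data: line, and run_flow prints the RunOutput chunks and the final run status:

```python
import json

import requests  # pip install requests


def parse_sse_data(line):
    """Parse one SSE line; return the JSON payload of a 'data:' line, else None."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def run_flow(endpoint, token, question):
    """Call the complete-mode endpoint (<endpoint>/run) and handle each event."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    }
    body = {"inputs": {"question": question}, "stream": True}
    with requests.post(f"{endpoint}/run", json=body, headers=headers, stream=True) as r:
        r.raise_for_status()
        for raw in r.iter_lines():
            event = parse_sse_data(raw)
            if event is None:
                continue
            if event["event"] == "RunOutput":
                print(event["outputs"])
            elif event["event"] == "RunTerminated":
                print("final status:", event["flow_run_info"]["status"])


# Example (requires a deployed service; replace with your endpoint and token):
# run_flow("<your_endpoint>", "<your_token>", "Where is the capital of France?")
```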

OpenAI-compatible calling method

Deployed chat-type application flows support OpenAI-compatible calls and can be used by clients that support the OpenAI API.

OpenAI API-based method

This example demonstrates streaming calls using cURL commands. Here are the request and response examples:

Sample request:

curl --location '<Endpoint>/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "default",  
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

The request parameters are described below.

• --location '<Endpoint>/v1/chat/completions': The destination URL of the request. Replace <Endpoint> with the endpoint that you obtained in Step 1.

• --header "Authorization: Bearer $DASHSCOPE_API_KEY": The HTTP header. Replace $DASHSCOPE_API_KEY with the token that you obtained in Step 1.

• "model": "default": The model name, which is fixed as default.

• "stream": true: Specifies whether the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

Sample response:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]
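Equivalent to the cURL call above, you can use the OpenAI Python SDK. This is a minimal sketch under the assumption that the SDK is installed (pip install openai); the endpoint and token placeholders are the values that you obtained in Step 1:

```python
def make_client(endpoint, token):
    """Create an OpenAI client that targets the deployed EAS service."""
    from openai import OpenAI  # pip install openai; imported lazily
    # The base URL is the service endpoint with the /v1 suffix.
    return OpenAI(base_url=f"{endpoint}/v1", api_key=token)


def build_messages(question):
    """Assemble an OpenAI-style message list for a single-turn chat."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]


def chat(endpoint, token, question):
    """Send a streaming chat request and return the concatenated answer."""
    client = make_client(endpoint, token)
    stream = client.chat.completions.create(
        model="default",  # the model name is fixed as "default"
        messages=build_messages(question),
        stream=True,
    )
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)


# Example (requires a deployed service; replace with your endpoint and token):
# print(chat("<Endpoint>", "<your_token>", "Who are you?"))
```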

Integration with other clients

This example demonstrates integration with Chatbox v1.13.4 on Windows.

  1. Download and install Chatbox.

  2. Open Chatbox and configure the model provider name, such as LangStudio.


  3. Select the configured model provider and configure the service request parameters.


    The key parameters are described below.

    • API Mode: Fixed as OpenAI API Compatible.

    • API Key: Set to the token of the deployed service. For more information, see Obtain the endpoint and token on the Overview tab.

    • API Host: Set to the endpoint of the deployed service (see Obtain the endpoint and token on the Overview tab) and add the /v1 suffix. This example uses an Internet endpoint, so the API host is http://langstudio-20250319153409-xdcp.115770327099****.cn-hangzhou.pai-eas.aliyuncs.com/v1.

    • API Path: Fixed as /chat/completions.

    • Model: Click New and enter a custom model ID, such as qwen3-8b.

  4. Call the deployed service in the chat dialog box.


View trace records

After you call a service, the system automatically generates a trace record. On the Tracing Analysis tab, find the trace record that you want to manage and click View Trace in the Actions column.


The trace data allows you to view the input and output information of each node in the application flow, such as the recall results of the vector database or the input and output information of the LLM node.

Appendix: Chat history

For chat-based application flows, LangStudio provides a feature to store the history of multi-round conversations. You can choose to use local storage or external storage to save the chat history.

Storage types

  • Local storage: The service uses the local disk of the EAS instance where the application flow is deployed to automatically create an SQLite database named chat_history.db that saves the chat history. The default storage path is /langstudio/flow/. Note that local storage does not support multi-instance deployment. Regularly check the usage of the local disk. You can also view or delete the chat history by using the API provided below. If an EAS instance is removed, the related chat history is also cleared.

  • External storage: Supports ApsaraDB RDS for MySQL. To use external storage, you must configure an RDS MySQL connection for storing the chat history when you deploy a service. For more information, see Service connection configuration - Database. The service automatically creates tables suffixed with the service name in the RDS MySQL database that you configure. For example, the service creates the langstudio_chat_session_<Service name> table to store the chat session and the langstudio_chat_history_<Service name> table to store the chat history.

Session or user support

Each chat request to an application flow is stateless. If you want multiple requests to be treated as the same conversation, you need to manually configure the request header. For information about how to make calls, see Make API calls.

| Request header | Data type | Description | Note |
| --- | --- | --- | --- |
| Chat-Session-Id | String | The session ID. For each service request, the system automatically assigns a unique identifier to the session to distinguish between different sessions, and returns it through the Chat-Session-Id field in the response header. | Custom session IDs are supported. To ensure uniqueness, a session ID must be 32 to 255 characters in length and can contain letters, digits, underscores (_), hyphens (-), and colons (:). |
| Chat-User-Id | String | The user ID, which identifies the user to whom the chat belongs. The system does not automatically assign a user ID. Custom user IDs are supported. | - |
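A minimal sketch (assuming the requests library) of how the headers above keep multiple requests in the same conversation: the first call lets the service assign a session ID, which is read from the Chat-Session-Id response header and passed back on subsequent calls:

```python
import requests  # pip install requests


def chat_headers(token, session_id=None, user_id=None):
    """Build request headers, optionally pinning the session and user."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if session_id:
        headers["Chat-Session-Id"] = session_id
    if user_id:
        headers["Chat-User-Id"] = user_id
    return headers


def ask(endpoint, token, question, session_id=None):
    """Send one chat turn; return the answer and the session ID to reuse."""
    response = requests.post(
        endpoint,
        json={"question": question},
        headers=chat_headers(token, session_id=session_id),
    )
    response.raise_for_status()
    # The service returns the session ID in the Chat-Session-Id response header.
    return response.json(), response.headers.get("Chat-Session-Id")


# Example (requires a deployed service with history enabled):
# answer, sid = ask("<your_endpoint>", "<your_token>", "Where is the capital of France?")
# answer, _ = ask("<your_endpoint>", "<your_token>", "What about Germany?", session_id=sid)
```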

Chat history API

The application flow service also provides chat history data management API operations, which allow you to easily view and delete these data. You can obtain the complete API schema by sending a GET request to {Endpoint}/openapi.json. This schema is built based on the Swagger standard. For a more intuitive understanding and exploration of these API operations, we recommend that you use Swagger UI to perform visualization operations, making operations simpler and clearer.