
Platform For AI:Deploy an application flow

Last Updated:Jul 10, 2025

After you develop an application flow, you can deploy it as an Elastic Algorithm Service (EAS) service. EAS provides features such as auto scaling and comprehensive O&M monitoring, which ensure that your application can flexibly respond to business changes and growth, improve system stability and performance, and better meet production environment requirements.

Prerequisites

You have created and debugged an application flow. For more information, see Develop an application flow.

Deploy an application flow

Go to LangStudio and select a workspace. On the Application Flow tab, click the application flow that you debugged, and then click Deploy in the upper-right corner. Before you deploy, make sure that the runtime is started. The key parameters are described below.


Resource Information

• Resource Type: Select public resources or a dedicated resource group that you have created.

• Instances: Configure the number of service instances. In the production stage, configure multiple instances to reduce the risk of single points of failure.

• Deployment Resources: If you use the application flow only for business flow scheduling, you can select appropriate CPU resources based on the complexity of the business flow. Compared with GPU resources, CPU resources are usually more cost-effective. After deployment, you are charged for the resources. For more information, see Billing of EAS.

VPC: The application flow is deployed as an EAS service. To ensure that the client can access the service, select a virtual private cloud (VPC). By default, EAS services cannot access the Internet. To enable Internet access, configure a VPC that has Internet access. For more information, see Configure network connectivity.

Note

If an application flow includes a vector database connection (such as Milvus), ensure that the configured VPC is the one where the vector database instance resides, or that the two VPCs are connected.

History

• Enable History: This parameter applies only to chat-type application flows. When enabled, the system can store and transmit multi-round chat history. This feature must be used together with the request header parameters. For more information, see Appendix: Chat history.

• History Storage: Local storage does not support multi-instance deployment. If you deploy services for production use, use external storage instead, such as ApsaraDB RDS. For more information, see Appendix: Chat history.

Important

If you use local storage, multi-instance deployment is not supported, and scaling out from a single instance to multiple instances is also not supported. Otherwise, the chat history feature may not work properly.

Enable Tracing: When enabled, you can view trace records to evaluate the effect of the application flow after deployment.

Roles and Permissions: In the application flow, if you use a Faiss vector database (select a Faiss or Milvus vector database when creating a knowledge base) or "Alibaba Cloud IQS Search" (required by the IQS web search-based chatbot template), you must select an appropriate role.

For more information about parameter configurations, see Parameters for custom deployment in the console.


Online debugging

After successful deployment, you are redirected to the PAI-EAS page. On the Online Debugging tab, configure and send a request. The key in the request body must be the same as the value of the Chat Input parameter in the Start node of the application flow. In this topic, the default field question is used.


Make API calls

  1. On the Overview tab, obtain the endpoint and token.


  2. Send an API request.

    You can call the service in simple mode or complete mode. The following table describes the differences between the two modes.

    | Property | Simple Mode | Complete Mode |
    | --- | --- | --- |
    | Request path | <Endpoint>/ | <Endpoint>/run |
    | Feature description | Directly returns the output results of the application flow. | Returns a complex structure, including the node status, error messages, and output messages of the application flow. |
    | Scenario | You need only the final output results of the application flow and do not care about the internal processing or status of the flow. Suitable for simple queries or operations to quickly obtain results. | You need to understand the execution process of the application flow in detail, including the status of each node and possible error messages. Suitable for debugging, monitoring, or analyzing the execution of the application flow. |
    | Advantages | Simple to use; no need to parse complex structures. | Provides comprehensive information to help you understand the execution process of the application flow in depth; helps troubleshoot and optimize the performance of the application flow. |

    Simple mode

    cURL command

    The deployed EAS application flow service supports streaming or non-streaming calls using cURL commands. The following table provides request and response examples.

    Streaming sample request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -H "Accept: text/event-stream" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Non-streaming sample request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Streaming sample response:

    event: Message
    data: {"answer": ""}
    
    event: Message
    data: {"answer": "The"}
    
    event: Message
    data: {"answer": " capital"}
    
    event: Message
    data: {"answer": " of"}
    
    event: Message
    data: {"answer": " France"}
    
    event: Message
    data: {"answer": " is"}
    
    event: Message
    data: {"answer": " Paris"}
    
    event: Message
    data: {"answer": "."}
    
    event: Message
    data: {"answer": ""}

    Non-streaming sample response:

    {"answer":"The capital of France is Paris."}

    The request parameters are described below.

    • -H "Authorization: Bearer <your_token>": The HTTP header. Replace <your_token> with the token that you obtained in Step 1.

    • -H "Accept: text/event-stream": Indicates that the client accepts server-sent events (SSE) and that the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

    • -d '{"question": "Where is the capital of France?"}': The request body, a JSON object that contains a key-value pair for the question string. The key must match the Chat Input parameter field in the Start node of the application flow. In this case, the default field question is used.

    • "<your_endpoint>": The destination URL of the request. Replace <your_endpoint> with the endpoint that you obtained in Step 1.

    Python code

    The following examples show how to use the requests library to send a POST request to an application flow service, with streaming or non-streaming calls. Make sure that you have installed the library. If not, run pip install requests to install it.

    Streaming sample request:

    import requests
    
    url = "http://<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}
    
    # Specify a request header, including the token of your application flow service.
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json"
    }
    
    if __name__ == '__main__':
        with requests.post(url, json=data, headers=headers, stream=True) as r:
            for line in r.iter_lines(chunk_size=1024, decode_unicode=True):
                if line:
                    print(line)

    Non-streaming sample request:

    import requests
    
    url = "http://<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}
    
    # Specify a request header, including the token of your application flow service.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=data, headers=headers)
    
    if response.status_code == 200:
        print("Request is successful, results:")
        print(response.text)
    else:
        print(f"Request fails, status code: {response.status_code}")
    

    Streaming sample response:

    event: Message
    data: {"answer": ""}
    
    event: Message
    data: {"answer": "The"}
    
    event: Message
    data: {"answer": " capital"}
    
    event: Message
    data: {"answer": " of"}
    
    event: Message
    data: {"answer": " France"}
    
    event: Message
    data: {"answer": " is"}
    
    event: Message
    data: {"answer": " Paris"}
    
    event: Message
    data: {"answer": "."}
    
    event: Message
    data: {"answer": ""}

    Non-streaming sample response:

    {"answer":"The capital of France is Paris."}

    The request parameters are described below.

    • url: The destination URL of the request. Replace <your-endpoint-here> with the endpoint that you obtained in Step 1.

    • token: The token used in the Authorization header. Replace <your-token-here> with the token that you obtained in Step 1.

    • data: The request body, a JSON object that contains a key-value pair for the question string. The key must match the Chat Input parameter field in the Start node of the application flow. In this case, the default field question is used.

    • "Accept": "text/event-stream": Indicates that the client accepts server-sent events (SSE) and that the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

    Complete mode

    LangStudio supports Server-Sent Events (SSE), which can output the status, error messages, and output messages of each node while the application flow is executed. You can also customize the content of node_run_infos in the events. The following example uses online debugging: append /run to the call address and then edit the request body.


    The following table describes the request body parameters.

    | Field Name | Type | Default Value | Description |
    | --- | --- | --- | --- |
    | inputs | Mapping[str, Any] | None | The input data dictionary. Keys should match the input field names defined in the application flow. If the flow has no inputs, this field is ignored. |
    | stream | bool | True | Controls the response format. True: responds with SSE streaming; the Content-Type in the response header is text/event-stream, and the data is returned in DataOnly format, divided into the RunStarted, NodeUpdated, RunOutput, and RunTerminated events (see the tables below). False: responds with a single JSON body; the Content-Type in the response header is application/json. You can refer to the response information in Online debugging. |
    | response_config | Dict[str, Any] | - | Controls the detailed node information included in the streaming response (when stream=True). |
    | ∟ include_node_description | bool | False | (Within response_config) Whether to include node descriptions in the SSE event stream. |
    | ∟ include_node_display_name | bool | False | (Within response_config) Whether to include node display names in the SSE event stream. |
    | ∟ include_node_output | bool | False | (Within response_config) Whether to include node outputs in the SSE event stream. |
    | ∟ exclude_nodes | List[str] | [] | (Within response_config) List of node names to exclude from the SSE event stream. |
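For illustration, a complete-mode request body that combines these fields might look as follows. The node name custom_python is a hypothetical example; use the node names from your own flow.

```json
{
    "inputs": {"question": "Where is the capital of France?"},
    "stream": true,
    "response_config": {
        "include_node_display_name": true,
        "include_node_output": true,
        "exclude_nodes": ["custom_python"]
    }
}
```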

    The returned data is divided into different events: RunStarted, NodeUpdated, RunOutput, and RunTerminated:

    RunStarted event

    • Definition: The RunStarted event marks the beginning of a run. It is typically sent as the first event in the SSE stream of a run.

    • Payload example:

      data: {"event": "RunStarted", "run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "timestamp": "2025-06-12T08:15:07.223611Z", "flow_run_info": {"run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "status": "Running", "error": null, "otel_trace_id": ""}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunStarted. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | flow_run_info | object | Contains the status information for the run. |
      | ∟ run_id | string | (Within flow_run_info) Unique identifier for the run (same as the outer run_id). |
      | ∟ status | string | (Within flow_run_info) Initial status of the run, fixed as Running. |
      | ∟ error | object or null | (Within flow_run_info) If the run fails, contains an error message object; otherwise null. |
      | ∟ otel_trace_id | string | (Within flow_run_info) OpenTelemetry trace ID associated with this run (may be empty or a zero value). |

    NodeUpdated event

    • Definition: The NodeUpdated event indicates that the status or output of one or more nodes in the Flow has changed. During a run, this event is typically sent when nodes start running (Running) or complete execution (Completed/Failed). If response_config is set, this event can also include node descriptions, display names, and outputs. Note: If you specify exclude_nodes in the response_config of the request, NodeUpdated events for those nodes will not be returned.

    • Payload example:

      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.208601Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Running", "error": null, "duration": 0.0, "description": null, "display_name": null, "output": null}]}
      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.209621Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Completed", "error": null, "duration": 0.001246, "description": null, "display_name": null, "output": {"text": "echo:hello", "input_length": 2}}]}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as NodeUpdated. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | node_run_infos | array[object] | Array containing one or more node run information objects. Each object represents a node whose status or output has changed. |
      | ∟ node_name | string | (Within node_run_infos) Name of the node (legacy field, same as the node field). |
      | ∟ node | string | (Within node_run_infos) Name of the node. |
      | ∟ status | string | (Within node_run_infos) Current running status of the node, such as Running, Completed, or Failed. |
      | ∟ error | object or null | (Within node_run_infos) If the node execution fails, contains an error message object; otherwise null. |
      | ∟ duration | float | (Within node_run_infos) Time spent on node execution, in seconds. For the Running status, this is typically 0.0. |
      | ∟ description | string or null | (Within node_run_infos) Description of the node. Included only when response_config.include_node_description is true in the request; otherwise null. |
      | ∟ display_name | string or null | (Within node_run_infos) Display name of the node. Included only when response_config.include_node_display_name is true in the request; otherwise null. |
      | ∟ output | object or null | (Within node_run_infos) Output data of the node. Included only when the node status is Completed and response_config.include_node_output is true in the request; otherwise null. |

    RunOutput event

    • Definition: The RunOutput event indicates that the run has generated its final output. It typically occurs before the RunTerminated event at the end of the Flow run.

    • Payload example:

      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.745130Z", "outputs": {"answer": "What can"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.829133Z", "outputs": {"answer": " I help you with?"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.950055Z", "outputs": {"answer": ""}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.954983Z", "outputs": {}, "output_metadata": {"answer": {"is_stream": true, "status": "Finished"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_python_oHG7_1c9fb0ac-0f45-4dbc-bf97-8e4175fd991c", "timestamp": "2025-04-30T11:55:24.957091Z", "outputs": {"python_output": "Hello: Hello! What can I help you with?"}, "output_metadata": {"python_output": {"is_stream": false, "status": "Finished"}}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunOutput. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | outputs | object | Dictionary containing the final output results of the Flow. Its specific structure depends on the outputs defined when designing the Flow. |
      | output_metadata | object | Dictionary containing Flow output metadata. The keys are output names (corresponding to the keys in outputs), and the values are objects containing metadata for that output (such as is_stream and status). |

    RunTerminated event

    • Definition: The RunTerminated event marks the end of a run. It is typically sent as the last event in the SSE stream of a run.

    • Payload example:

      data: {"event": "RunTerminated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.212791Z", "flow_run_info": {"run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "status": "Completed", "error": null, "otel_trace_id": "0x00000000000000000000000000000000"}}
    • Field descriptions:

      | Field Name | Type | Description |
      | --- | --- | --- |
      | event | string | Event type, fixed as RunTerminated. |
      | run_id | string | Unique identifier for the current run. |
      | timestamp | string | Timestamp of the event occurrence, following the ISO 8601 standard. |
      | flow_run_info | object | Contains the final status information of the entire run. |
      | ∟ run_id | string | (Within flow_run_info) Unique identifier for the run (same as the outer run_id). |
      | ∟ status | string | (Within flow_run_info) Final status of the run, such as Completed, Failed, or Canceled. |
      | ∟ error | object or null | (Within flow_run_info) If the run failed, contains an error message object; otherwise null. |
      | ∟ otel_trace_id | string | (Within flow_run_info) OpenTelemetry trace ID associated with this run (may be empty or a zero value). |
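As a minimal sketch of consuming the complete-mode SSE stream described above (assuming the requests library; the endpoint and token placeholders are the values from Step 1), parse_sse_data extracts the JSON payload of each data: line, and run_flow prints the RunOutput chunks and the final run status:

```python
import json

import requests  # pip install requests


def parse_sse_data(line):
    """Parse one SSE line; return the JSON payload of a 'data:' line, else None."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def run_flow(endpoint, token, question):
    """Call the complete-mode endpoint (<endpoint>/run) and handle each event."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    }
    body = {"inputs": {"question": question}, "stream": True}
    with requests.post(f"{endpoint}/run", json=body, headers=headers, stream=True) as r:
        r.raise_for_status()
        for raw in r.iter_lines():
            event = parse_sse_data(raw)
            if event is None:
                continue
            if event["event"] == "RunOutput":
                print(event["outputs"])
            elif event["event"] == "RunTerminated":
                print("final status:", event["flow_run_info"]["status"])


# Example (requires a deployed service; replace with your endpoint and token):
# run_flow("<your_endpoint>", "<your_token>", "Where is the capital of France?")
```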

OpenAI-compatible calling method

Deployed chat-type application flows support OpenAI-compatible calls and can be used by clients that support the OpenAI API.

OpenAI API-based method

This example demonstrates streaming calls using cURL commands. Here are the request and response examples:

Sample request:

curl --location '<Endpoint>/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "default",  
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

The request parameters are described below.

• --location '<Endpoint>/v1/chat/completions': The destination URL of the request. Replace <Endpoint> with the endpoint that you obtained in Step 1.

• --header "Authorization: Bearer $DASHSCOPE_API_KEY": The HTTP header. Replace $DASHSCOPE_API_KEY with the token that you obtained in Step 1.

• "model": "default": The model name, which is fixed as default.

• "stream": true: Specifies whether the response is streamed. Note: Streaming is supported only when an LLM node is the output node of the application flow (that is, an LLM node is the direct input to the End node).

Sample response:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":"created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]
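Equivalent to the cURL call above, you can use the OpenAI Python SDK. This is a minimal sketch under the assumption that the SDK is installed (pip install openai); the endpoint and token placeholders are the values that you obtained in Step 1:

```python
def make_client(endpoint, token):
    """Create an OpenAI client that targets the deployed EAS service."""
    from openai import OpenAI  # pip install openai; imported lazily
    # The base URL is the service endpoint with the /v1 suffix.
    return OpenAI(base_url=f"{endpoint}/v1", api_key=token)


def build_messages(question):
    """Assemble an OpenAI-style message list for a single-turn chat."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question},
    ]


def chat(endpoint, token, question):
    """Send a streaming chat request and return the concatenated answer."""
    client = make_client(endpoint, token)
    stream = client.chat.completions.create(
        model="default",  # the model name is fixed as "default"
        messages=build_messages(question),
        stream=True,
    )
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)


# Example (requires a deployed service; replace with your endpoint and token):
# print(chat("<Endpoint>", "<your_token>", "Who are you?"))
```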

Integration with other clients

This example demonstrates integration with Chatbox v1.13.4 on Windows.

  1. Download and install Chatbox.

  2. Open Chatbox and configure the model provider name, such as LangStudio.


  3. Select the configured model provider and configure the service request parameters.


    The key parameters are described below.

    • API Mode: Fixed as OpenAI API Compatible.

    • API Key: Set to the token of the deployed service. For more information, see Obtain the endpoint and token on the Overview tab.

    • API Host: Set to the endpoint of the deployed service (see Obtain the endpoint and token on the Overview tab) and add the /v1 suffix. This example uses an Internet endpoint, so the API host is http://langstudio-20250319153409-xdcp.115770327099****.cn-hangzhou.pai-eas.aliyuncs.com/v1.

    • API Path: Fixed as /chat/completions.

    • Model: Click New and enter a custom model ID, such as qwen3-8b.

  4. Call the deployed service in the chat dialog box.


View trace records

After you call a service, the system automatically generates a trace record. On the Tracing Analysis tab, find the trace record that you want to manage and click View Trace in the Actions column.


The trace data allows you to view the input and output information of each node in the application flow, such as the recall results of the vector database or the input and output information of the LLM node.

Appendix: Chat history

For chat-based application flows, LangStudio provides a feature to store the history of multi-round conversations. You can choose to use local storage or external storage to save the chat history.

Storage types

  • Local storage: The service uses the local disk of the EAS instance where the application flow is deployed to automatically create an SQLite database named chat_history.db that saves the chat history. The default storage path is /langstudio/flow/. Note that local storage does not support multi-instance deployment. Regularly check the usage of the local disk. You can also view or delete the chat history by using the API provided below. If an EAS instance is removed, the related chat history is also cleared.

  • External storage: Supports ApsaraDB RDS for MySQL. To use external storage, you must configure an RDS MySQL connection for storing the chat history when you deploy a service. For more information, see Service connection configuration - Database. The service automatically creates tables suffixed with the service name in the RDS MySQL database that you configure. For example, the service creates the langstudio_chat_session_<Service name> table to store the chat session and the langstudio_chat_history_<Service name> table to store the chat history.

Session or user support

Each chat request to an application flow is stateless. If you want multiple requests to be treated as the same conversation, you need to manually configure the request header. For information about how to make calls, see Make API calls.

| Request header | Data type | Description | Note |
| --- | --- | --- | --- |
| Chat-Session-Id | String | The session ID. For each service request, the system automatically assigns a unique identifier to the session to distinguish between different sessions, and returns it through the Chat-Session-Id field in the response header. | Custom session IDs are supported. To ensure uniqueness, a session ID must be 32 to 255 characters in length and can contain letters, digits, underscores (_), hyphens (-), and colons (:). |
| Chat-User-Id | String | The user ID, which identifies the user to whom the chat belongs. The system does not automatically assign a user ID. Custom user IDs are supported. | - |
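A minimal sketch (assuming the requests library) of how the headers above keep multiple requests in the same conversation: the first call lets the service assign a session ID, which is read from the Chat-Session-Id response header and passed back on subsequent calls:

```python
import requests  # pip install requests


def chat_headers(token, session_id=None, user_id=None):
    """Build request headers, optionally pinning the session and user."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if session_id:
        headers["Chat-Session-Id"] = session_id
    if user_id:
        headers["Chat-User-Id"] = user_id
    return headers


def ask(endpoint, token, question, session_id=None):
    """Send one chat turn; return the answer and the session ID to reuse."""
    response = requests.post(
        endpoint,
        json={"question": question},
        headers=chat_headers(token, session_id=session_id),
    )
    response.raise_for_status()
    # The service returns the session ID in the Chat-Session-Id response header.
    return response.json(), response.headers.get("Chat-Session-Id")


# Example (requires a deployed service with history enabled):
# answer, sid = ask("<your_endpoint>", "<your_token>", "Where is the capital of France?")
# answer, _ = ask("<your_endpoint>", "<your_token>", "What about Germany?", session_id=sid)
```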

Chat history API

The application flow service also provides chat history data management API operations, which allow you to easily view and delete these data. You can obtain the complete API schema by sending a GET request to {Endpoint}/openapi.json. This schema is built based on the Swagger standard. For a more intuitive understanding and exploration of these API operations, we recommend that you use Swagger UI to perform visualization operations, making operations simpler and clearer.