This topic describes how to enable the tracing feature when you deploy a service.
Background information
As large language model (LLM) technology becomes more widely used, businesses face many challenges when building LLM-based applications. These challenges include unpredictable outputs, complex call chains, difficulty in identifying performance bottlenecks, and a lack of fine-grained observability. To address these challenges, Elastic Algorithm Service (EAS) supports tracing.
The main goal of tracing is to improve application observability and help you better evaluate your LLM applications. When you enable tracing, EAS automatically integrates with Alibaba Cloud Application Real-Time Monitoring Service (ARMS) to provide the following features:
Call chain visualization: Provides clear call chain logs to help you visualize the full path of a request.
Performance monitoring: Tracks key performance indicators, such as response time, token consumption, and error counts, to help you promptly identify performance bottlenecks.
Issue identification and root cause analysis: Uses a Trace ID to quickly locate issues and perform root cause analysis with contextual information.
Evaluation tools: Provides evaluation tools based on call chain data to verify the accuracy and reliability of LLM application outputs.
Terms
Trace
A trace is the complete execution path of a transaction or request in a distributed system. It records how a request flows through different services or modules. A trace consists of multiple spans. Traces help you visualize request flows and quickly identify performance bottlenecks or the source of errors. The TraceID is the unique identifier for a trace. You can use the TraceID to query the details and logs of a specific call.
Span
A span is a basic unit in a trace. It represents a specific operation and records details about that operation, including its name, start time, and end time.
Python probe
A Python probe is a tool that automatically collects call chain data and performance metrics from Python applications. When you deploy an EAS service, you can install a Python probe to enable tracing.
Evaluation
Evaluation is the process of assessing the answers generated by an LLM application in response to user questions across multiple dimensions. To confirm the specific evaluation dimension names, contact your business manager.
Limits
This feature supports only LLM applications developed with LangChain, Llama-index, or Dashscope.
Prerequisites
You have activated ARMS Application Monitoring. For more information, see Activate ARMS.
You have activated LangStudio. For more information, see Authorize the PAI service account.
If you use a Resource Access Management (RAM) user or RAM role, you must grant the AliyunPAILLMTraceFullAccess permission to the user or role before you use this feature. For more information, see Grant permissions to a RAM role and Grant permissions to a RAM user.
Step 1: Preparations
To provide a complete walkthrough from service deployment and invocation to trace viewing, this topic uses a simple prediction service as an example.
The sample is a simple prediction service built on the Dashscope API. It uses the Flask framework to build a web service and calls the Generation.call method of Dashscope to perform text generation. Before you use Dashscope for the first time, you must complete the activation process and obtain an API key. For more information, see Call the Tongyi Qianwen API for the first time. When you deploy the service, you must set DASHSCOPE_API_KEY as an environment variable so that the service can access the Dashscope API. The following code shows an example of the app.py file:
```python
import os
import json

import flask
import dashscope

app = flask.Flask(__name__)


def run_query(query):
    """Send the user query to the qwen-plus model and return the raw response."""
    response = dashscope.Generation.call(
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model="qwen-plus",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': query}
        ],
        result_format='message'
    )
    return response


@app.route('/api/trace_demo', methods=['POST'])
def query():
    """
    Post data example:
    {
        "query": "capital of china"
    }
    """
    data = flask.request.get_data(as_text=True)
    query = json.loads(data).get('query', '')
    response = run_query(query)
    return response.output.choices[0].message.content


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```
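If you want to test the script locally before deployment, start it with python app.py and send a request to the /api/trace_demo route, for example with the requests library. This is a minimal sketch that assumes the service is running locally and DASHSCOPE_API_KEY is set in your environment.

```python
# Local test for the sample service above (assumes it is running on 127.0.0.1:8000
# and DASHSCOPE_API_KEY is set in the environment).
import requests

resp = requests.post(
    "http://127.0.0.1:8000/api/trace_demo",
    json={"query": "capital of china"},
)
print(resp.text)  # the generated answer returned by the service
```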
Step 2: Enable tracing
When you deploy an EAS service, you can turn on the Tracing switch in the Service Features section to enable tracing. Based on the prompts, determine whether the image you are using has a built-in tracing component. If needed, configure the commands to install the probe and start the application with the ARMS Python probe. For more information about the configuration, see Manually install a Python probe.
Use an image with a built-in tracing component: Turn on the Tracing switch to enable tracing with a single click. No additional parameter configuration is required.
Use an image without a built-in tracing component: Turn on the Tracing switch and configure the third-party libraries and startup command based on the prompts.
| Parameter | Description |
| --- | --- |
| Startup Command | Add `aliyun-bootstrap -a install && aliyun-instrument python app.py`. This command installs the probe and starts the application with the ARMS Python probe. `app.py` is the main file configured in the image to provide the prediction service. |
| Third-party Library Configuration | Add `aliyun-bootstrap` to download the probe installer from the Python Package Index (PyPI) repository. |

This topic uses an image without a built-in tracing component and the sample code as an example. The following table describes the key parameters for deploying a custom EAS service. For specific instructions, see Deploy a service in the console. After the service is deployed, you can perform the following operations:
View the service deployment status on the Elastic Algorithm Service (EAS) page.
View the registered application on the Application List page in the ARMS console. The application name is the same as the EAS service name.
| Section | Parameter | Description |
| --- | --- | --- |
| Environment Context | Deployment Method | Select Image Deployment. |
|  | Image Configuration | This topic uses the default image. You can also enter a prepared custom image on the Image URL tab. |
|  | Direct Mount | Because the sample code file is not integrated into the image, you must mount it to the service instance. Taking an Object Storage Service (OSS) mount as an example, click OSS and configure the mount parameters. If you use a custom image and have already configured the main file for the prediction service in the image, you can skip this configuration. |
|  | Startup Command | This topic sets the command to install the probe and start the mounted app.py file with the ARMS Python probe, as described in the preceding table. |
|  | Environment Variables | Because the sample code calls the Dashscope API, click Add and set the DASHSCOPE_API_KEY environment variable to your Dashscope API key. |
|  | Third-party Library Configuration | Set the third-party libraries to include `aliyun-bootstrap`, as described in the preceding table. |
| Service Registration | VPC, VSwitch, and Security Group Name | To use the tracing feature, you must configure a virtual private cloud (VPC). Select a VPC, vSwitch, and security group in the current region. By default, EAS services cannot access the Internet. Because the sample code calls the Dashscope API, you must configure a VPC with Internet access so that the service can reach the Internet. For specific instructions, see Scenario 1: Allow an EAS service to access the Internet. |
| Service Features | Tracing | Turn on the Tracing switch and configure the third-party libraries and startup command in the Environment Context section. |
Step 3: View traces
An evaluation tool based on traces helps developers verify the accuracy and reliability of the output from LLM applications. Perform the following steps:
Call the EAS service
This topic uses online debugging as an example. You can also call the EAS service by using an API. For more information, see API calls. A sample request is shown after the figure below.
On the Elastic Algorithm Service (EAS) page, find the target service and click > Online Debug in the Actions column.
On the Body tab, send request data to the specified address based on your defined prediction service.
This topic uses the service interface defined in the sample app.py file. The following figure shows an example result:

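As a reference for API-based calls, the following minimal sketch sends the same request to the deployed service with the requests library. The endpoint and token are placeholders that you must replace with the values from your service's invocation information in the console; see API calls for the exact URL format.

```python
# Sketch of calling the deployed EAS service over HTTP.
# <service_endpoint> and <service_token> are placeholders; obtain the real values from
# the service's invocation information in the EAS console (see API calls).
import requests

endpoint = "<service_endpoint>/api/trace_demo"  # service endpoint plus the route defined in app.py
token = "<service_token>"

resp = requests.post(
    endpoint,
    headers={"Authorization": token},
    json={"query": "capital of china"},
)
print(resp.status_code, resp.text)
```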
View trace information
By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.
Switch to the Trace Query tab on the Tracing tab to view trace information.

Find the target trace and click View Trace in the Actions column to go to the Trace Details page.
The trace data on this page lets you view the service's input, output, and related log information.

Step 4: Evaluate application performance
EAS provides evaluation tools based on traces to verify the accuracy and reliability of LLM application outputs. The following two evaluation methods are supported:
Method 1: Evaluate a single trace: Manually select a trace from the EAS service for evaluation. This method is suitable for the development or testing phase to debug a specific trace and ensure that its logic is correct and its performance meets expectations.
Method 2: Evaluate traces in batches online: Periodically evaluate sampled traces that the running EAS service generates. This method is suitable for large-scale performance testing or feature validation scenarios and helps you understand the overall status and effectiveness of the system.
By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.
Method 1: Evaluate a single trace
On the Trace Query tab of the Tracing tab, find the target trace and click Evaluate in the Actions column. Then, in the Evaluate configuration panel, configure the following parameters.

Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions.
Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.
Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.
Retrieval Relevance: Determines whether the retrieved results are relevant to the input question. It includes the following four metrics:
nDCG: Normalized Discounted Cumulative Gain
Hit Rate
Precision@K
MRR: Mean Reciprocal Rank
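For reference, the following minimal Python sketch (an illustration only, not the EAS evaluation code) shows how these four retrieval metrics are commonly computed for a single query. The relevant and ranked document IDs are hypothetical.

```python
# Illustrative computation of the retrieval-relevance metrics for one query.
import math

def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def hit_rate(relevant, ranked, k):
    """1 if at least one relevant document appears in the top-k results, else 0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def mrr(relevant, ranked):
    """Reciprocal rank of the first relevant document (0 if none is retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(relevant, ranked, k):
    """Normalized Discounted Cumulative Gain with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

relevant = {"doc_2", "doc_5"}                  # hypothetical ground-truth documents
ranked = ["doc_1", "doc_2", "doc_3", "doc_5"]  # hypothetical retrieval order
print(precision_at_k(relevant, ranked, 4))     # 0.5
print(hit_rate(relevant, ranked, 4))           # 1.0
print(mrr(relevant, ranked))                   # 0.5
print(ndcg_at_k(relevant, ranked, 4))          # about 0.65
```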
Model Configuration: The large language model (LLM) used to evaluate the trace. After the initial setup, this configuration is automatically backfilled for subsequent evaluations.
Model Selection: The following two models are supported:
PAI Judge Model
qwen-max (Model Studio model)
Note: To use a Model Studio model, you need to configure an Internet connection for EAS. Calls to Model Studio models are billed separately. For more information, see Billing.
Model Token: Enter the token for the selected model:
Judge model: Go to the Judge Model page, activate the PAI judge model, and obtain a token.
qwen-max: To learn how to obtain a token for the Model Studio qwen-max model, see Call the Tongyi Qianwen API for the first time.
Extraction Configuration: In the Query Extraction Configuration, Answer Extraction Configuration, and Context Extraction Configuration sections, configure the parameters in the following table to extract the corresponding content:
Query Extraction Configuration: Extracts the user query content (input).
Answer Extraction Configuration: Extracts the system-generated answer (output).
Context Extraction Configuration: Extracts the text or background information provided to the system (documents).
| Parameter | Description |
| --- | --- |
| SpanName | Finds the span whose name matches this value. |
| JsonPathInSpan | The format is a.b.c. This parameter cannot be empty. It extracts a value from the specified element of the matched span. |
| JsonPathInSpanValue | The format is a.b.c. This parameter can be empty. If the element found by JsonPathInSpan contains a JSON string, JsonPathInSpanValue is used to extract the corresponding value from that string. |
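To make the path parameters concrete, the following minimal Python sketch (an illustration, not the extraction code that EAS actually runs) resolves dot paths against a hypothetical span whose attributes are shaped like the examples that follow.

```python
# Illustration only: a simplified resolver for dot paths such as attributes.input.value.
# The span content below is hypothetical; the real extraction is performed by the
# EAS evaluation tool.
def resolve_path(obj, path):
    """Walk a dot-separated path (a.b.c) through nested dictionaries."""
    current = obj
    for part in path.split('.'):
        current = current[part]
    return current

span = {
    "name": "LLM",
    "attributes": {
        "input": {"value": "capital of china"},
        "output": {"value": "The capital of China is Beijing."},
    },
}

print(resolve_path(span, "attributes.input.value"))   # what Query Extraction would pick up
print(resolve_path(span, "attributes.output.value"))  # what Answer Extraction would pick up
# If the extracted element were itself a JSON string, JsonPathInSpanValue would then be
# applied to the parsed string to pick out the final value (for example, text[0]).
```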
You can click View Trace in the Actions column to obtain the configuration content from the Trace Details page. The following examples show how to configure each section:
Query Extraction Configuration
This topic provides an example where JsonPathInSpanValue has no value and an example where JsonPathInSpanValue has a value, as shown in the following figures.
JsonPathInSpanValue has no value:
SpanName: LLM
JsonPathInSpan: attributes.input.value
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.
JsonPathInSpanValue has a value:
SpanName: LLM
JsonPathInSpan: attributes.input.value
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is a JSON string, enter text[0] here.
Answer Extraction Configuration
SpanName: LLM
JsonPathInSpan: attributes.output.value
JsonPathInSpanValue: This parameter is empty.
Context Extraction Configuration
The sample service in this topic does not include a context extraction configuration. For an example of a context extraction configuration, see the following figure.
SpanName: retrieve
JsonPathInSpan: attributes.retrieval.documents[*].document.content
Important: Only the context configuration supports using an asterisk (*).
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.
After you configure the parameters, click OK.
When a result appears in the Evaluation Result column as shown in the following figure, the evaluation is successful. You can click the evaluation result to view its details.

Method 2: Evaluate traces in batches online
On the Online Evaluation tab of the Tracing tab, click Create Evaluation.
On the Create Evaluation Task page, configure the following parameters and then click OK.
Basic Configuration
Task Name: Enter a custom task name based on the prompts on the interface.
Evaluation Configuration
Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions:
Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.
Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.
Retrieval Relevance: Determines whether the retrieved content is relevant to the input question. It includes the following four metrics:
nDCG: Normalized Discounted Cumulative Gain
Hit Rate
Precision@K
MRR: Mean Reciprocal Rank
Model Selection: The following two models are supported:
PAI Judge Model
qwen-max (Model Studio model)
Note: To use a Model Studio model, you need to configure an Internet connection for EAS. Calls to Model Studio models are billed separately. For more information, see Billing.
Model Token: Enter the token for the selected model:
Judge model: Go to the Judge Model page, activate the PAI judge model, and obtain a token.
qwen-max: To learn how to obtain a token for the Model Studio qwen-max model, see Call the Tongyi Qianwen API for the first time.
Sampling Start and End Time: Select the start and end dates for sampling.
Sampling Policy: The following two sampling policies are supported:
Sample by time window: Samples once every X minutes.
Sample by probability: Randomly and uniformly samples a certain percentage of traces.
QCA Extraction Configuration
Trace data is a JSON-formatted string. The QCA extraction configuration specifies the path of the query, context, and answer (QCA) in the JSON string; the value at that path is the specific content of the QCA. Configure the SpanName, JsonPathInSpan, and JsonPathInSpanValue parameters for each of the following sections. For more information about how to configure these parameters, see Extraction Configuration in Method 1.
Query Extraction Configuration: Extracts the user query content (input).
Answer Extraction Configuration: Extracts the system-generated answer (output).
Context Extraction Configuration: Extracts the text or background information provided to the system (documents).
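For intuition only, the following minimal sketch (not EAS code) illustrates the difference between the two sampling policies on a list of trace timestamps. The window length and sampling rate are hypothetical values.

```python
# Illustration of the two sampling policies (hypothetical values, not EAS internals).
import random

trace_times = list(range(60))         # one trace per minute over an hour (minutes 0..59)

# Sample by time window: keep one trace per 10-minute window.
window_minutes = 10
by_window = {}
for t in trace_times:
    by_window.setdefault(t // window_minutes, t)   # first trace seen in each window
window_samples = sorted(by_window.values())

# Sample by probability: keep each trace independently with probability 0.2.
rate = 0.2
probability_samples = [t for t in trace_times if random.random() < rate]

print(window_samples)             # e.g. [0, 10, 20, 30, 40, 50]
print(len(probability_samples))   # roughly 12 of the 60 traces
```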
When the evaluation task Status is Completed, all sampling evaluation operations have finished, and the task will not generate any new evaluation results.
After the evaluation is complete, you can view the results in the Evaluation Result column of the evaluation task. You can also click the task name to view its details.
View evaluation results: The system dynamically calculates and displays the average value of the evaluation results based on successful traces. A value closer to 1 indicates stronger relevance.

View evaluation details:

You can then perform management operations on the evaluation task, such as updating, stopping, deleting, and cloning it. Cloning only copies the task configuration to create a new evaluation task.