This topic describes how to enable the tracing feature when you deploy a service.
Background information
As large language model (LLM) technology becomes more widely used, businesses face many challenges when building LLM-based applications. These challenges include unpredictable outputs, complex call chains, difficulty in identifying performance bottlenecks, and a lack of fine-grained observability. To address these challenges, Elastic Algorithm Service (EAS) supports tracing.
The main goal of tracing is to improve application observability and help you better evaluate your LLM applications. When you enable tracing, EAS automatically integrates with Alibaba Cloud Application Real-Time Monitoring Service (ARMS) to provide the following features:
Call chain visualization: Provides clear call chain logs to help you visualize the full path of a request.
Performance monitoring: Tracks key performance indicators, such as response time, token consumption, and error counts, to help you promptly identify performance bottlenecks.
Issue identification and root cause analysis: Uses a Trace ID to quickly locate issues and perform root cause analysis with contextual information.
Evaluation tools: Provides evaluation tools based on call chain data to verify the accuracy and reliability of LLM application outputs.
Terms
Trace
A trace is the complete execution path of a transaction or request in a distributed system. It records how a request flows through different services or modules. A trace consists of multiple spans. Traces help you visualize request flows and quickly identify performance bottlenecks or the source of errors. The TraceID is the unique identifier for a trace. You can use the TraceID to query the details and logs of a specific call.
Span
A span is a basic unit in a trace. It represents a specific operation and records details about that operation, including its name, start time, and end time.
Python probe
A Python probe is a tool that automatically collects call chain data and performance metrics from Python applications. When you deploy an EAS service, you can install a Python probe to enable tracing.
Evaluation
Evaluation is the process of assessing the answers generated by an LLM application in response to user questions across multiple dimensions. To confirm the specific evaluation dimension names, contact your business manager.
Limits
This feature supports only LLM applications developed with LangChain, Llama-index, or Dashscope.
Prerequisites
You have activated ARMS Application Monitoring. For more information, see Activate ARMS.
You have activated LangStudio. For more information, see Authorize the PAI service account.
If you use a Resource Access Management (RAM) user or RAM role, you must grant the AliyunPAILLMTraceFullAccess permission to the user or role before you use this feature. For more information, see Grant permissions to a RAM role and Grant permissions to a RAM user.
Step 1: Preparations
To provide a complete walkthrough from service deployment and invocation to trace viewing, this topic uses a simple prediction service as an example.
The sample is a simple prediction service built on the Dashscope API. It uses the Flask framework to build a web service and calls the Generation.call method of Dashscope to perform text generation. Before you use Dashscope for the first time, you must complete the activation process and obtain an API key. For more information, see Call the Tongyi Qianwen API for the first time. When you deploy the service, you must set DASHSCOPE_API_KEY as an environment variable so that the service can access the Dashscope API. The following code shows an example of the app.py file:
```python
import os
import json

import flask
import dashscope

app = flask.Flask(__name__)


def run_query(query):
    """Send the user query to the qwen-plus model and return the raw response."""
    response = dashscope.Generation.call(
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model="qwen-plus",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': query}
        ],
        result_format='message'
    )
    return response


@app.route('/api/trace_demo', methods=['POST'])
def query():
    """
    Post data example:
    {
        "query": "capital of china"
    }
    """
    data = flask.request.get_data(as_text=True)
    query = json.loads(data).get('query', '')
    response = run_query(query)
    return response.output.choices[0].message.content


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```
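If you want to test the script locally before deployment, start it with python app.py and send a request to the /api/trace_demo route, for example with the requests library. This is a minimal sketch that assumes the service is running locally and DASHSCOPE_API_KEY is set in your environment.

```python
# Local test for the sample service above (assumes it is running on 127.0.0.1:8000
# and DASHSCOPE_API_KEY is set in the environment).
import requests

resp = requests.post(
    "http://127.0.0.1:8000/api/trace_demo",
    json={"query": "capital of china"},
)
print(resp.text)  # the generated answer returned by the service
```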
Step 2: Enable tracing
When you deploy an EAS service, you can turn on the Tracing switch in the Service Features section to enable tracing. Based on the prompts, determine whether the image you are using has a built-in tracing component. If needed, configure the commands to install the probe and start the application with the ARMS Python probe. For more information about the configuration, see Manually install a Python probe.
Use an image with a built-in tracing component: Turn on the Tracing switch to enable tracing with a single click. No additional parameter configuration is required.
Use an image without a built-in tracing component: Turn on the Tracing switch and configure the third-party libraries and startup command based on the prompts.
| Parameter | Description |
| --- | --- |
| Startup Command | Add `aliyun-bootstrap -a install && aliyun-instrument python app.py`. This command installs the probe and starts the application with the ARMS Python probe. `app.py` is the main file configured in the image to provide the prediction service. |
| Third-party Library Configuration | Add `aliyun-bootstrap` to download the probe installer from the Python Package Index (PyPI) repository. |

This topic uses an image without a built-in tracing component and the sample code as an example. The following table describes the key parameters for deploying a custom EAS service. For specific instructions, see Deploy a service in the console. After the service is deployed, you can perform the following operations:
View the service deployment status on the Elastic Algorithm Service (EAS) page.
View the registered application on the Application List page in the ARMS console. The application name is the same as the EAS service name.
| Section | Parameter | Description |
| --- | --- | --- |
| Environment Context | Deployment Method | Select Image Deployment. |
|  | Image Configuration | This topic uses the default image. You can also enter a prepared custom image on the Image URL tab. |
|  | Direct Mount | Because the sample code file is not integrated into the image, you must mount it to the service instance. Taking an Object Storage Service (OSS) mount as an example, click OSS and configure the mount parameters. If you use a custom image and have already configured the main file for the prediction service in the image, you can skip this configuration. |
|  | Startup Command | This topic sets the command to install the probe and start the mounted app.py file with the ARMS Python probe, as described in the preceding table. |
|  | Environment Variables | Because the sample code calls the Dashscope API, click Add and set the DASHSCOPE_API_KEY environment variable to your Dashscope API key. |
|  | Third-party Library Configuration | Set the third-party libraries to include `aliyun-bootstrap`, as described in the preceding table. |
| Service Registration | VPC, VSwitch, and Security Group Name | To use the tracing feature, you must configure a virtual private cloud (VPC). Select a VPC, vSwitch, and security group in the current region. By default, EAS services cannot access the Internet. Because the sample code calls the Dashscope API, you must configure a VPC with Internet access so that the service can reach the Internet. For specific instructions, see Scenario 1: Allow an EAS service to access the Internet. |
| Service Features | Tracing | Turn on the Tracing switch and configure the third-party libraries and startup command in the Environment Context section. |
Step 3: View traces
An evaluation tool based on traces helps developers verify the accuracy and reliability of the output from LLM applications. Perform the following steps:
Call the EAS service
This topic uses online debugging as an example. You can also call the EAS service by using an API. For more information, see API calls. A sample request is shown after the figure below.
On the Elastic Algorithm Service (EAS) page, find the target service and click > Online Debug in the Actions column.
On the Body tab, send request data to the specified address based on your defined prediction service.
This topic uses the service interface defined in the sample app.py file. The following figure shows an example result:

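As a reference for API-based calls, the following minimal sketch sends the same request to the deployed service with the requests library. The endpoint and token are placeholders that you must replace with the values from your service's invocation information in the console; see API calls for the exact URL format.

```python
# Sketch of calling the deployed EAS service over HTTP.
# <service_endpoint> and <service_token> are placeholders; obtain the real values from
# the service's invocation information in the EAS console (see API calls).
import requests

endpoint = "<service_endpoint>/api/trace_demo"  # service endpoint plus the route defined in app.py
token = "<service_token>"

resp = requests.post(
    endpoint,
    headers={"Authorization": token},
    json={"query": "capital of china"},
)
print(resp.status_code, resp.text)
```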
View trace information
By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.
Switch to the Trace Query tab on the Tracing tab to view trace information.

Find the target trace and click View Trace in the Actions column to go to the Trace Details page.
The trace data on this page lets you view the service's input, output, and related log information.

Step 4: Evaluate application performance
EAS provides evaluation tools based on traces to verify the accuracy and reliability of LLM application outputs. The following two evaluation methods are supported:
Method 1: Evaluate a single trace: Manually select a trace from the EAS service for evaluation. This method is suitable for the development or testing phase to debug a specific trace and ensure that its logic is correct and its performance meets expectations.
Method 2: Evaluate traces in batches online: Periodically evaluate sampled traces that the running EAS service generates. This method is suitable for large-scale performance testing or feature validation scenarios and helps you understand the overall status and effectiveness of the system.
By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.
Method 1: Evaluate a single trace
On the Trace Query tab of the Tracing tab, find the target trace and click Evaluate in the Actions column. Then, in the Evaluate configuration panel, configure the following parameters.

Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions.
Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.
Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.
Retrieval Relevance: Determines whether the retrieved results are relevant to the input question. It includes the following four metrics:
nDCG: Normalized Discounted Cumulative Gain
Hit Rate
Precision@K
MRR: Mean Reciprocal Rank
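For reference, the following minimal Python sketch (an illustration only, not the EAS evaluation code) shows how these four retrieval metrics are commonly computed for a single query. The relevant and ranked document IDs are hypothetical.

```python
# Illustrative computation of the retrieval-relevance metrics for one query.
import math

def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def hit_rate(relevant, ranked, k):
    """1 if at least one relevant document appears in the top-k results, else 0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def mrr(relevant, ranked):
    """Reciprocal rank of the first relevant document (0 if none is retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(relevant, ranked, k):
    """Normalized Discounted Cumulative Gain with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

relevant = {"doc_2", "doc_5"}                  # hypothetical ground-truth documents
ranked = ["doc_1", "doc_2", "doc_3", "doc_5"]  # hypothetical retrieval order
print(precision_at_k(relevant, ranked, 4))     # 0.5
print(hit_rate(relevant, ranked, 4))           # 1.0
print(mrr(relevant, ranked))                   # 0.5
print(ndcg_at_k(relevant, ranked, 4))          # about 0.65
```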
Model Configuration: The large language model (LLM) used to evaluate the trace. After the initial setup, this configuration is automatically backfilled for subsequent evaluations.
Model Selection: The following two models are supported:
PAI Judge Model
qwen-max (Model Studio model)
Note: To use a Model Studio model, you need to configure an Internet connection for EAS. Calls to Model Studio models are billed separately. For more information, see Billing.
Model Token: Enter the token for the selected model:
Judge model: Go to the Judge Model page, activate the PAI judge model, and obtain a token.
qwen-max: To learn how to obtain a token for the Model Studio qwen-max model, see Call the Tongyi Qianwen API for the first time.
Extraction Configuration: In the Query Extraction Configuration, Answer Extraction Configuration, and Context Extraction Configuration sections, configure the parameters in the following table to extract the corresponding content:
Query Extraction Configuration: Extracts the user query content (input).
Answer Extraction Configuration: Extracts the system-generated answer (output).
Context Extraction Configuration: Extracts the text or background information provided to the system (documents).
| Parameter | Description |
| --- | --- |
| SpanName | Finds the span whose name matches this value. |
| JsonPathInSpan | The format is a.b.c. This parameter cannot be empty. It extracts a value from the specified element of the matched span. |
| JsonPathInSpanValue | The format is a.b.c. This parameter can be empty. If the element found by JsonPathInSpan contains a JSON string, JsonPathInSpanValue is used to extract the corresponding value from that string. |
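To make the path parameters concrete, the following minimal Python sketch (an illustration, not the extraction code that EAS actually runs) resolves dot paths against a hypothetical span whose attributes are shaped like the examples that follow.

```python
# Illustration only: a simplified resolver for dot paths such as attributes.input.value.
# The span content below is hypothetical; the real extraction is performed by the
# EAS evaluation tool.
def resolve_path(obj, path):
    """Walk a dot-separated path (a.b.c) through nested dictionaries."""
    current = obj
    for part in path.split('.'):
        current = current[part]
    return current

span = {
    "name": "LLM",
    "attributes": {
        "input": {"value": "capital of china"},
        "output": {"value": "The capital of China is Beijing."},
    },
}

print(resolve_path(span, "attributes.input.value"))   # what Query Extraction would pick up
print(resolve_path(span, "attributes.output.value"))  # what Answer Extraction would pick up
# If the extracted element were itself a JSON string, JsonPathInSpanValue would then be
# applied to the parsed string to pick out the final value (for example, text[0]).
```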
You can click View Trace in the Actions column to obtain the configuration content from the Trace Details page. The following examples show how to configure each section:
Query Extraction Configuration
This topic provides an example where JsonPathInSpanValue has no value and an example where JsonPathInSpanValue has a value, as shown in the following figures.
JsonPathInSpanValue has no value:
SpanName: LLM
JsonPathInSpan: attributes.input.value
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.
JsonPathInSpanValue has a value:
SpanName: LLM
JsonPathInSpan: attributes.input.value
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is a JSON string, enter text[0] here.
Answer Extraction Configuration
SpanName: LLM
JsonPathInSpan: attributes.output.value
JsonPathInSpanValue: This parameter is empty.
Context Extraction Configuration
The sample service in this topic does not include a context extraction configuration. For an example of a context extraction configuration, see the following figure.
SpanName: retrieve
JsonPathInSpan: attributes.retrieval.documents[*].document.content
Important: Only the context configuration supports using an asterisk (*).
JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.
After you configure the parameters, click OK.
When a result appears in the Evaluation Result column as shown in the following figure, the evaluation is successful. You can click the evaluation result to view its details.

Method 2: Evaluate traces in batches online
On the Online Evaluation tab of the Tracing tab, click Create Evaluation.
On the Create Evaluation Task page, configure the following parameters and then click OK.
Basic Configuration
Task Name: Enter a custom task name based on the prompts on the interface.
Evaluation Configuration
Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions:
Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.
Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.
Retrieval Relevance: Determines whether the retrieved content is relevant to the input question. It includes the following four metrics:
nDCG: Normalized Discounted Cumulative Gain
Hit Rate
Precision@K
MRR: Mean Reciprocal Rank
Model Selection: The following two models are supported:
PAI Judge Model
qwen-max (Model Studio model)
Note: To use a Model Studio model, you need to configure an Internet connection for EAS. Calls to Model Studio models are billed separately. For more information, see Billing.
Model Token: Enter the token for the selected model:
Judge model: Go to the Judge Model page, activate the PAI judge model, and obtain a token.
qwen-max: To learn how to obtain a token for the Model Studio qwen-max model, see Call the Tongyi Qianwen API for the first time.
Sampling Start and End Time: Select the start and end dates for sampling.
Sampling Policy: The following two sampling policies are supported:
Sample by time window: Samples once every X minutes.
Sample by probability: Randomly and uniformly samples a certain percentage of traces.
QCA Extraction Configuration
Trace data is a JSON-formatted string. The QCA extraction configuration specifies the path of the query, context, and answer (QCA) in the JSON string; the value at that path is the specific content of the QCA. Configure the SpanName, JsonPathInSpan, and JsonPathInSpanValue parameters for each of the following sections. For more information about how to configure these parameters, see Extraction Configuration in Method 1.
Query Extraction Configuration: Extracts the user query content (input).
Answer Extraction Configuration: Extracts the system-generated answer (output).
Context Extraction Configuration: Extracts the text or background information provided to the system (documents).
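For intuition only, the following minimal sketch (not EAS code) illustrates the difference between the two sampling policies on a list of trace timestamps. The window length and sampling rate are hypothetical values.

```python
# Illustration of the two sampling policies (hypothetical values, not EAS internals).
import random

trace_times = list(range(60))         # one trace per minute over an hour (minutes 0..59)

# Sample by time window: keep one trace per 10-minute window.
window_minutes = 10
by_window = {}
for t in trace_times:
    by_window.setdefault(t // window_minutes, t)   # first trace seen in each window
window_samples = sorted(by_window.values())

# Sample by probability: keep each trace independently with probability 0.2.
rate = 0.2
probability_samples = [t for t in trace_times if random.random() < rate]

print(window_samples)             # e.g. [0, 10, 20, 30, 40, 50]
print(len(probability_samples))   # roughly 12 of the 60 traces
```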
When the evaluation task Status is Completed, all sampling evaluation operations have finished, and the task will not generate any new evaluation results.
After the evaluation is complete, you can view the results in the Evaluation Result column of the evaluation task. You can also click the task name to view its details.
View evaluation results: The system dynamically calculates and displays the average value of the evaluation results based on successful traces. A value closer to 1 indicates stronger relevance.

View evaluation details:

You can then perform management operations on the evaluation task, such as updating, stopping, deleting, and cloning it. Cloning only copies the task configuration to create a new evaluation task.