Platform For AI: Enable tracing for LLM-based applications in EAS

Last Updated: Sep 16, 2025

This topic describes how to enable the tracing feature when you deploy a service.

Background information

As large language model (LLM) technology becomes more widely used, businesses face many challenges when building LLM-based applications. These challenges include unpredictable outputs, complex call chains, difficulty in identifying performance bottlenecks, and a lack of fine-grained observability. To address these challenges, Elastic Algorithm Service (EAS) supports tracing.

The main goal of tracing is to improve application observability and help you better evaluate your LLM applications. When you enable tracing, EAS automatically integrates with Alibaba Cloud Application Real-Time Monitoring Service (ARMS) to provide the following features:

  • Call chain visualization: Provides clear call chain logs to help you visualize the full path of a request.

  • Performance monitoring: Tracks key performance indicators, such as response time, token consumption, and error counts, to help you promptly identify performance bottlenecks.

  • Issue identification and root cause analysis: Uses a Trace ID to quickly locate issues and perform root cause analysis with contextual information.

  • Evaluation tools: Provides evaluation tools based on call chain data to verify the accuracy and reliability of LLM application outputs.

Terms

  • Trace

    A trace is the complete execution path of a transaction or request in a distributed system. It records how a request flows through different services or modules. A trace consists of multiple spans. Traces help you visualize request flows and quickly identify performance bottlenecks or the source of errors. The TraceID is the unique identifier for a trace. You can use the TraceID to query the details and logs of a specific call.

  • Span

    A span is a basic unit in a trace. It represents a specific operation and records details about that operation, including its name, start time, and end time.

  • Python probe

    A Python probe is a tool that automatically collects call chain data and performance metrics from Python applications. When you deploy an EAS service, you can install a Python probe to enable tracing.

  • Evaluation

    Evaluation is the process of assessing the answers generated by an LLM application in response to user questions across multiple dimensions. To confirm the specific evaluation dimension names, contact your business manager.

Limits

This feature supports only LLM applications developed with LangChain, Llama-index, or Dashscope.

Prerequisites

Step 1: Preparations

To provide a complete walkthrough from service deployment and invocation to trace viewing, this topic uses a simple prediction service as an example.

The code implements a simple prediction service based on the Dashscope API. It uses the Flask framework to build a web service and calls the Generation.call method of Dashscope to perform text generation. Before you use Dashscope for the first time, you must complete the activation process and obtain an API key. For more information, see Call the Tongyi Qianwen API for the first time. When you deploy the service, you must set DASHSCOPE_API_KEY as an environment variable so that the service can access the Dashscope API. The following code shows an example of the app.py file:

import os
import json
import flask
import dashscope

app = flask.Flask(__name__)

def run_query(query):
    """Call the qwen-plus model through the Dashscope Generation API."""

    response = dashscope.Generation.call(
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model="qwen-plus",
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': query}
        ],
        result_format='message'
    )
    return response


@app.route('/api/trace_demo', methods=['POST'])
def query():
    """
    Post data example:
    {
        "query": "capital of china"
    }
    """
    data = flask.request.get_data(as_text=True)  # read the raw request body as text
    query = json.loads(data).get('query', '')
    response = run_query(query)
    return response.output.choices[0].message.content


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)  
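
If you want to sanity-check app.py locally before deployment, you can start the service with python app.py and send a test request. The following snippet is only a local test sketch; it assumes that flask, dashscope, and requests are installed in your environment and that DASHSCOPE_API_KEY is set.

import requests

# Local smoke test for the sample service. app.py listens on port 8000,
# so send the example request body from the docstring and print the answer.
response = requests.post(
    "http://127.0.0.1:8000/api/trace_demo",
    json={"query": "capital of china"},
)
print(response.status_code, response.text)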

Step 2: Enable tracing

When you deploy an EAS service, turn on the Tracing switch in the Service Features section to enable tracing. The console prompts indicate whether the image that you are using has a built-in tracing component. If it does not, configure the commands that install the probe and start the application with the ARMS Python probe. For more information about the configuration, see Manually install a Python probe.

  • Use an image with a built-in tracing component: Turn on the Tracing switch to enable tracing with a single click. No additional parameter configuration is required.

  • Use an image without a built-in tracing component: Turn on the Tracing switch and configure the third-party libraries and startup command based on the prompts.

    • Startup Command: Add aliyun-bootstrap -a install && aliyun-instrument python app.py. This command installs the probe and then starts the application with the ARMS Python probe. app.py is the main file configured in the image to provide the prediction service.

    • Third-party Library Configuration: Add aliyun-bootstrap so that the probe installer is downloaded from the Python Package Index (PyPI) repository.


This topic uses an image without a built-in tracing component and the sample code as an example. The following list describes the key parameters for deploying a custom EAS service. For specific instructions, see Deploy a service in the console.

Environment Context

  • Deployment Method: Select Image Deployment.

  • Image Configuration: This topic uses the default image: Official Image > python-inference:3.9-ubuntu2004. You can also enter a prepared custom image on the Image URL tab.

  • Direct Mount: Because the sample code file is not integrated into the image, you must mount it to the service instance. Take mounting from Object Storage Service (OSS) as an example. Click OSS and configure the following parameters:

    • Uri: Select the OSS folder where the sample code file is located. To learn how to upload a code file to an OSS folder, see Quick Start in the console.

    • Mount Path: Set this to /mnt/data/. The code file is read from this path.

    If you use a custom image and have already configured the main file for the prediction service in the image, you can skip this configuration.

  • Startup Command: This topic sets the command to aliyun-bootstrap -a install && aliyun-instrument python /mnt/data/app.py, where /mnt/data/app.py is the mounted sample code file.

  • Environment Variables: Because the sample code calls the Dashscope API, click Add and set the DASHSCOPE_API_KEY environment variable to the API key that you obtained.

  • Third-party Library Configuration: Set the third-party libraries to aliyun-bootstrap flask dashscope.

Service Registration

  • Virtual Private Cloud (VPC), VSwitch, and Security Group Name: To use the tracing feature, you must configure a virtual private cloud (VPC). Select a VPC, vSwitch, and security group in the region.

    By default, EAS services cannot access the Internet. Because the sample code needs to call the Dashscope API, you must configure a VPC with Internet access for the EAS service. For specific instructions, see Scenario 1: Allow an EAS service to access the Internet.

Service Features

  • Tracing: Turn on the Tracing switch and configure the third-party libraries and startup command in the Environment Context section.

After the service is deployed, you can perform the operations in the following steps.

Step 3: View traces

After the service is deployed, call it to generate trace data and then view the traces in the console. Perform the following steps:

Call the EAS service

This topic uses online debugging as an example. You can also call the EAS service using an API. For more information, see API calls.
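
For reference, the following Python sketch shows what an API call to the deployed service might look like. The endpoint and token below are placeholders that you copy from the service's invocation information in the console, and the request path must match the route defined in app.py; confirm the exact URL format and authentication details in API calls.

import requests

# Placeholders: replace with the endpoint and token shown on the service's
# invocation information page. /api/trace_demo is the route defined in app.py.
EAS_SERVICE_URL = "<service-endpoint>/api/trace_demo"
EAS_TOKEN = "<service-token>"

response = requests.post(
    EAS_SERVICE_URL,
    headers={"Authorization": EAS_TOKEN},  # EAS service token
    json={"query": "capital of china"},
)
print(response.status_code, response.text)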

  1. On the Elastic Algorithm Service (EAS) page, find the target service and click Online Debug in the Actions column.

  2. On the Body tab, send request data to the specified address based on your defined prediction service.

    This topic uses the /api/trace_demo interface defined in the sample app.py file. For example, send {"query": "capital of china"} as the request body; the service returns the generated answer.

View trace information

Note

By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.

  1. Switch to the Trace Query tab on the Tracing tab to view trace information.

  2. Find the target trace and click View Trace in the Actions column to go to the Trace Details page.

    The trace data on this page lets you view the service's input, output, and related log information.

Step 4: Evaluate application performance

EAS provides evaluation tools based on traces to verify the accuracy and reliability of LLM application outputs. The following two evaluation methods are supported:

  • Method 1: Evaluate a single trace: Manually select a trace from the EAS service for evaluation. This method is suitable for the development or testing phase to debug a specific trace and ensure that its logic is correct and its performance meets expectations.

  • Method 2: Evaluate traces in batches online: Periodically evaluate sampled traces generated by the running EAS service. This method is suitable for large-scale performance testing or feature validation scenarios and helps you understand the overall system status and how well the entire call chain performs.

Note

By default, trace data is stored for 30 days from the time it is generated. To extend the storage period, contact the ARMS team for custom configuration.
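
Both methods score each trace on Correctness, Faithfulness, and Retrieval Relevance. For reference, the following Python sketch shows the standard textbook definitions of the four retrieval relevance metrics used below (nDCG, Hit Rate, Precision@K, and MRR). It is only an illustration of the formulas, not the implementation used by the evaluation service.

import math

# Each entry of relevances is 1 if the retrieved document at that rank is
# relevant to the query, and 0 otherwise. All metrics fall between 0 and 1,
# and values closer to 1 indicate stronger relevance.

def hit_rate(relevances):
    """1.0 if at least one retrieved document is relevant, else 0.0."""
    return 1.0 if any(relevances) else 0.0

def precision_at_k(relevances, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(relevances[:k]) / k

def mrr(relevances):
    """Reciprocal rank of the first relevant document, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(relevances):
    """Normalized discounted cumulative gain of the retrieved list."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

relevances = [0, 1, 1, 0]
print(hit_rate(relevances), precision_at_k(relevances, 3), mrr(relevances), ndcg(relevances))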

Method 1: Evaluate a single trace

  1. On the Trace Query tab of the Tracing tab, find the target trace and click Evaluate in the Actions column. Then, in the Evaluate configuration panel, configure the following parameters.

    • Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions:

      • Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.

      • Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.

      • Retrieval Relevance: Determines whether the retrieved results are relevant to the input question. It includes the following four metrics:

        • nDCG: Normalized Discounted Cumulative Gain

        • Hit Rate

        • Precision@K

        • MRR: Mean Reciprocal Rank

    • Model Configuration: The large language model (LLM) used to evaluate the trace. After the initial setup, this configuration is automatically backfilled for subsequent evaluations.

      • Model Selection: The following two models are supported:

      • Model Token: Enter the token for the selected model:

    • Extraction Configuration: In the Query Extraction Configuration, Answer Extraction Configuration, and Context Extraction Configuration sections, configure the parameters described below to extract the corresponding content. A Python sketch that illustrates how these paths are resolved follows these steps.

      • Query Extraction Configuration: Extracts the user query content (input).

      • Answer Extraction Configuration: Extracts the system-generated answer (output).

      • Context Extraction Configuration: Extracts the text or background information provided to the system (documents).

      Each section provides the following parameters:

      • SpanName: Finds a span whose name matches the SpanName.

      • JsonPathInSpan: The format is a.b.c. This parameter cannot be empty. It extracts a value from the specified element of the matched span.

      • JsonPathInSpanValue: The format is a.b.c. This parameter can be empty. After the element specified by JsonPathInSpan is found, if that element's content is a JSON string, JsonPathInSpanValue is used to extract the corresponding value from it.

      You can click View Trace in the Actions column to obtain the configuration content from the Trace Details page. The following examples show how to configure each section:

      • Query Extraction Configuration

        • Example where JsonPathInSpanValue has no value:

          • SpanName: LLM

          • JsonPathInSpan: attributes.input.value

          • JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.

        • Example where JsonPathInSpanValue has a value:

          • SpanName: LLM

          • JsonPathInSpan: attributes.input.value

          • JsonPathInSpanValue: Because the content of the JsonPathInSpan element is a JSON string, enter text[0] here.

      • Answer Extraction Configuration

        • SpanName: LLM

        • JsonPathInSpan: attributes.output.value

        • JsonPathInSpanValue: This parameter is empty.

      • Context Extraction Configuration

        The sample service in this topic does not include a context extraction configuration. The following values show an example of one:

        • SpanName: retrieve

        • JsonPathInSpan: attributes.retrieval.documents[*].document.content

          Important: Only the context configuration supports using an asterisk (*).

        • JsonPathInSpanValue: Because the content of the JsonPathInSpan element is not a JSON string, this parameter is empty.

  2. After you configure the parameters, click OK.

    When a result appears in the Evaluation Result column, the evaluation is successful. You can click the evaluation result to view its details.
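
To make the extraction configuration easier to picture, the following Python sketch shows one way that SpanName, JsonPathInSpan, and JsonPathInSpanValue can be interpreted against span data. The helper functions and the span structure are assumptions made for this illustration; the actual extraction runs inside the evaluation service.

import json
import re

# Illustrative only: the span layout and helper names below are assumptions for
# this example. Paths use the a.b.c format, name[0] selects a list element, and
# name[*] (context extraction only) fans out over every element of a list.

_TOKEN = re.compile(r"([^.\[\]]+)|\[(\*|\d+)\]")

def _tokens(path):
    """Split 'attributes.retrieval.documents[*].document.content' into segments."""
    return [name if name else index for name, index in _TOKEN.findall(path)]

def _walk(value, tokens):
    """Follow the parsed path through nested dictionaries and lists."""
    if not tokens:
        return value
    head, rest = tokens[0], tokens[1:]
    if head == "*":                      # fan out over every list element
        return [_walk(item, rest) for item in value]
    if head.isdigit():                   # numeric index such as text[0]
        return _walk(value[int(head)], rest)
    return _walk(value[head], rest)      # plain dictionary key

def find_span(spans, span_name):
    """Return the first span whose name matches SpanName."""
    return next(s for s in spans if s.get("name") == span_name)

def extract_from_span(span, json_path_in_span, json_path_in_span_value=""):
    """Apply JsonPathInSpan; if the result is a JSON string, apply JsonPathInSpanValue."""
    value = _walk(span, _tokens(json_path_in_span))
    if json_path_in_span_value:
        value = _walk(json.loads(value), _tokens(json_path_in_span_value))
    return value

# Example matching the Query Extraction Configuration above: the span named LLM
# stores a JSON string in attributes.input.value, and text[0] holds the query.
spans = [{
    "name": "LLM",
    "attributes": {"input": {"value": json.dumps({"text": ["capital of china"]})}},
}]
span = find_span(spans, "LLM")
print(extract_from_span(span, "attributes.input.value", "text[0]"))  # capital of china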

Method 2: Evaluate traces in batches online

  1. On the Online Evaluation tab of the Tracing tab, click Create Evaluation.

  2. On the Create Evaluation Task page, configure the following parameters and then click OK.

    Basic Configuration

    • Task Name: Enter a custom task name based on the prompts on the interface.

    Evaluation Configuration

    • Evaluation Metrics: This is a fixed configuration and cannot be changed. The evaluation is performed based on the following dimensions:

      • Correctness: Determines whether the answer correctly addresses the question based on the input and reference text.

      • Faithfulness: Determines whether the answer is generated based on the input and reference text and whether it contains hallucinations.

      • Retrieval Relevance: Determines whether the retrieved content is relevant to the input question. It includes the following four metrics:

        • nDCG: Normalized Discounted Cumulative Gain

        • Hit Rate

        • Precision@K

        • MRR: Mean Reciprocal Rank

    • Model Selection: The following two models are supported:

    • Model Token: Enter the token for the selected model:

    • Sampling Start and End Time: Select the start and end dates for sampling.

    • Sampling Policy: The following two sampling policies are supported:

      • Sample by time window: Samples once every X minutes.

      • Sample by probability: Randomly and uniformly samples a certain percentage of traces.

    QCA Extraction Configuration

    Trace data is a JSON-formatted string. The QCA (query, context, and answer) extraction configuration specifies the path of the QCA in the JSON string. The value at that path is the specific content of the QCA.

    • Query Extraction Configuration: Extracts the user query content (input).

    • Answer Extraction Configuration: Extracts the system-generated answer (output).

    • Context Extraction Configuration: Extracts the text or background information provided to the system (documents).

    For each configuration, set the SpanName, JsonPathInSpan, and JsonPathInSpanValue parameters to extract the corresponding content. For more information about how to configure these parameters, see Extraction Configuration in Method 1.

    When the evaluation task Status is Completed, all sampling evaluation operations have finished, and the task will not generate any new evaluation results.

  3. After the evaluation is complete, you can view the results in the Evaluation Result column of the evaluation task. You can also click the task name to view its details.

    • View evaluation results: The system dynamically calculates and displays the average value of the evaluation results based on successful traces. A value closer to 1 indicates stronger relevance.

    • View evaluation details: Click the task name to view the evaluation details.

You can then perform management operations on the evaluation task, such as updating, stopping, deleting, and cloning it. Cloning only copies the task configuration to create a new evaluation task.