Alibaba Cloud defines LLM Trace fields based on the OpenTelemetry standard and concepts from the large language model (LLM) application domain. These fields extend Attributes, Resources, and Events to describe the semantics of LLM application trace data. They reflect key operations such as LLM input and output requests and token consumption. They provide rich, context-aware semantic data for scenarios such as Completion, Chat, retrieval-augmented generation (RAG), Agent, and Tool to facilitate data tracing and reporting. These semantic fields will be continuously updated and optimized as the community evolves.
Top-level Span field definitions are based on the OpenTelemetry open standard. For more information about the top-level Trace fields stored by Managed Service for OpenTelemetry, see Trace analysis parameters.
The LLM-related SpanKind is an Attribute. It is different from the Span kind defined in OpenTelemetry Traces.
Common fields
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Session ID | String |  | Conditionally required |
|  | The ID of the end user of the application. | String |  | Conditionally required |
|  | Operation type | String | See LLM Span Kind | Required |
|  | The type of framework used. | String |  | Conditionally required |
Resources
| ResourceKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Application name | String |  | Required |
Chain
A Chain is a tool that connects an LLM with multiple other components to perform complex tasks. It can include Retrieval, Embedding, LLM calls, and even nested Chains.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For a Chain, the value must be | String |  | Required |
|  | Secondary operation type | String |  | Conditionally required |
|  | Input content | String |  | Recommended |
|  | Returned content | String |  | Recommended |
|  | Time to first token. The overall time from when the server receives a user's request to when the first packet of the response is returned. The unit is nanoseconds. | Integer | 1000000 | Recommended |
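The time-to-first-token field above is a duration in nanoseconds measured from request receipt to the first response packet. A minimal sketch of computing it and assembling Chain span attributes; the attribute keys (`llm.span.kind`, `chain.input`, etc.) are illustrative placeholders, since the documented keys are not shown in this table:

```python
import time

def chain_span_attributes(input_text, output_text, request_received_ns, first_packet_ns):
    # Attribute keys below are hypothetical placeholders, not the
    # documented keys; substitute the real keys from the table above.
    return {
        "llm.span.kind": "CHAIN",           # operation type (Required)
        "chain.input": input_text,          # input content (Recommended)
        "chain.output": output_text,        # returned content (Recommended)
        # Time to first token: nanoseconds from request receipt to the
        # first response packet.
        "chain.ttft_ns": first_packet_ns - request_received_ns,
    }

received = time.perf_counter_ns()
# ... the chain would run its components here ...
first_packet = received + 1_000_000  # pretend the first packet arrived 1 ms later
chain_attrs = chain_span_attributes("What is a trace?", "A tree of spans.",
                                    received, first_packet)
```

In real instrumentation the two timestamps come from the server's request handler and the first write to the response stream.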
Retriever
A Retriever typically accesses a vector store or database to retrieve data. This is often used to supplement context to improve the accuracy and efficiency of the LLM response.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For a Retriever, the value must be | String |  | Required |
|  | The query used for retrieval. | String |  | Recommended |
|  | A list of retrieved documents. | JSON array |  | Required |
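The retrieved-documents attribute is a JSON array. A sketch of serializing retrieval results into such a value; the per-document fields (`document.id`, `document.content`, `document.score`, `document.metadata`) are an assumed shape, not the documented schema:

```python
import json

def documents_attribute(docs):
    # Serialize retrieved documents into a JSON-array attribute value.
    # The field names are hypothetical; adapt them to the documented schema.
    return json.dumps(
        [
            {
                "document.id": d["id"],
                "document.content": d["content"],
                "document.score": d["score"],
                "document.metadata": d.get("metadata", {}),
            }
            for d in docs
        ],
        ensure_ascii=False,
    )

retrieved = [
    {"id": "doc-1", "content": "OpenTelemetry is a telemetry standard.", "score": 0.92},
    {"id": "doc-2", "content": "Traces consist of spans.", "score": 0.87},
]
docs_value = documents_attribute(retrieved)
```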
Reranker
A Reranker sorts multiple input documents based on their relevance to a query. It may return the top-K documents for the LLM.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For a Reranker, the value must be | String |  | Required |
|  | The input parameter for the Reranker request. | String |  | Optional |
|  | The name of the model used by the Reranker. | String |  | Optional |
|  | The rank after reranking. | Integer |  | Optional |
|  | Metadata related to the input documents for reranking. It is a JSON array structure. The metadata contains basic document information, such as path, file name, and source. | String | - | Required |
|  | Metadata related to the output documents after reranking. It is a JSON array structure. The metadata contains basic document information, such as path, file name, and source. | String | - | Required |
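A sketch of the rerank step and the two metadata attributes it produces. The scoring function here is a trivial keyword-overlap stand-in for a real reranking model, and the metadata field names are assumptions, not the documented schema:

```python
import json

def rerank(query, documents, top_k):
    # Order documents by a relevance score (here: naive term overlap,
    # standing in for a real reranking model) and keep the top-K.
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

candidates = [
    {"name": "a.md", "text": "spans and traces"},
    {"name": "b.md", "text": "token usage in llm traces and spans"},
    {"name": "c.md", "text": "unrelated cooking notes"},
]
top = rerank("llm traces spans", candidates, top_k=2)

# Input/output document metadata are recorded as JSON-array strings
# of basic document information (file name here; path/source likewise).
input_metadata = json.dumps([{"file_name": d["name"]} for d in candidates])
output_metadata = json.dumps([{"file_name": d["name"]} for d in top])
```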
LLM
An LLM span indicates a call to a large language model, such as using an SDK or OpenAPI to request inference or text generation from different large models.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For an LLM, the value must be | String |  | Required |
|  | Secondary operation type | String |  | Optional |
|  | Prompt template | String |  | Optional |
|  | The specific values for the prompt template. | String |  | Optional |
|  | The version number of the prompt template. | String |  | Optional |
|  | The provider of the large model. | String |  | Required |
|  | Input parameters for the LLM call. | String |  | Optional |
|  | Model name | String |  | Optional |
|  | The unique ID of the conversation. Collect this if the session ID can be easily obtained during instrumentation. | String |  | Conditionally required |
|  | The output type specified in the LLM request. Collect this if the request specifies a type, such as an output format, and the value is available. | String |  | Conditionally required |
|  | The number of candidate generations requested from the LLM. | Integer |  | Conditionally required if the value is not 1 |
|  | The model name specified in the LLM request. | String |  | Required |
|  | The seed specified in the LLM request. | String |  | Conditionally required |
|  | The frequency penalty set in the LLM request. | Float |  | Recommended |
|  | The maximum number of tokens specified in the LLM request. | Integer |  | Recommended |
|  | The presence penalty set in the LLM request. | Float |  | Recommended |
|  | The temperature specified in the LLM request. | Float |  | Recommended |
|  | The top_p value specified in the LLM request. | Float |  | Recommended |
|  | The top_k value specified in the LLM request. | Float |  | Recommended |
|  | Indicates whether the response is streamed. If this field is absent, the value is treated as false. | Boolean |  | Recommended |
|  | The stop sequences for the LLM. | String[] |  | Recommended |
|  | The content of tool calls. (To be deprecated and replaced with | String |  | Recommended |
|  | The unique ID generated by the LLM. | String |  | Recommended |
|  | The name of the model used for LLM generation. | String |  | Recommended |
|  | The reason why the LLM stopped generating. | String[] |  | Recommended |
|  | The time to first token of the model itself in a streaming response scenario: the overall time from when the server receives the user's request to when the first packet of the response is returned. The unit is nanoseconds. | Integer |  | Recommended |
|  | The inference time of the reasoning model: the duration of the reasoning process in the response. The unit is milliseconds. | Integer |  | Recommended |
|  | The number of tokens used for the input. | Integer |  | Recommended |
|  | The number of tokens used for the output. | Integer |  | Recommended |
|  | The total number of tokens used. | Integer |  | Recommended |
|  | A link to the model input content. | String |  | Recommended |
|  | A link to the model output content. | String |  | Recommended |
|  | A link to the content of the system prompt (system instruction). If the system prompt content can be obtained separately, record it in this field. If it is part of the model call, record it in the link corresponding to the model input. | String |  | Recommended if available |
|  | Model input content. Messages must be provided in the order they are sent to the model or agent. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | Model output content. Messages must be provided in the order they are returned by the model or agent. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | The content of the system prompt (system instruction), recorded as a JSON string. If the system prompt content can be obtained separately, record it in this field. If it is part of the model call, record it in the model input content instead. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | The reasoning content from the reasoning model: the content of the reasoning process in the response. The default length limit is 1024 characters; content that exceeds this limit should be truncated. | String |  | Optional |
|  | Tool definitions | String |  | Recommended |
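A sketch of assembling LLM span attributes from a request/response pair, including the usage invariant that total tokens equal input tokens plus output tokens. All keys (`llm.request.model`, `llm.usage.input_tokens`, etc.) and the model name are illustrative placeholders, not the documented attribute names:

```python
def llm_span_attributes(request, response):
    # Keys are hypothetical placeholders mirroring the table above.
    usage = response["usage"]
    return {
        "llm.span.kind": "LLM",
        "llm.request.model": request["model"],
        "llm.request.temperature": request.get("temperature"),
        "llm.request.top_p": request.get("top_p"),
        "llm.request.max_tokens": request.get("max_tokens"),
        # Absent stream flag is treated as false, per the table above.
        "llm.request.is_stream": request.get("stream", False),
        "llm.response.finish_reasons": response["finish_reasons"],
        "llm.usage.input_tokens": usage["input_tokens"],
        "llm.usage.output_tokens": usage["output_tokens"],
        # Total token usage is the sum of input and output tokens.
        "llm.usage.total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }

llm_request = {"model": "example-model", "temperature": 0.7, "top_p": 0.9, "max_tokens": 512}
llm_response = {
    "finish_reasons": ["stop"],
    "usage": {"input_tokens": 120, "output_tokens": 48},
}
llm_attrs = llm_span_attributes(llm_request, llm_response)
```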
Embedding
An Embedding span indicates an embedding process, such as an operation to embed text into a large model. This can be used later for similarity queries to optimize subsequent operations.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For an Embedding, the value must be | String |  | Required |
|  | Token consumption for the input text. | Integer |  | Optional |
|  | Total token consumption for the embedding. | Integer |  | Optional |
|  | The name of the embedding model. (To be deprecated and replaced with | String |  | Optional |
|  | Embedding result. (To be deprecated). | String | - | Optional |
|  | Secondary operation type | String |  | Conditionally required |
|  | Encoding format | String | ["base64"] | Recommended |
|  | Number of embedding dimensions. | Integer | 100 | Recommended |
|  | The model name specified in the Embedding request. | String |  | Conditionally required |
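A sketch of Embedding span attributes, deriving the dimension count from the vector length and using the `["base64"]` encoding format shown above. The keys, the model name, and the float32 packing convention are all assumptions for illustration:

```python
import base64
import struct

def embedding_attributes(model_name, vector):
    # Keys are hypothetical placeholders. The base64 value packs the
    # floats as little-endian float32 -- one assumed convention, not
    # necessarily the documented one.
    packed = struct.pack(f"<{len(vector)}f", *vector)
    return {
        "llm.span.kind": "EMBEDDING",
        "embedding.model_name": model_name,
        "embedding.dimensions": len(vector),       # derived from the vector
        "embedding.encoding_formats": ["base64"],  # example value from the table
        "embedding.vector": base64.b64encode(packed).decode("ascii"),
    }

emb_attrs = embedding_attributes("example-embedding-model", [0.1, -0.2, 0.3, 0.4])
```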
Tool
A Tool span indicates a call to an external tool. For example, it might involve calling a calculator or requesting the latest weather information from a weather API.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For a Tool, the value must be | String |  | Required |
|  | Tool name. (To be deprecated and replaced with | String |  | Required |
|  | Tool description. (To be deprecated and replaced with | String |  | Required |
|  | Tool input parameters. (To be deprecated and replaced with | String |  | Required |
|  | Secondary operation type | String |  | Conditionally required |
|  | Tool ID | String |  | Recommended |
|  | Tool description | String |  | Recommended |
|  | Tool name | String |  | Recommended |
|  | Tool type | String |  | Recommended |
|  | Input arguments for the tool call. | String |  | Optional |
|  | Return value of the tool call. | String |  | Optional |
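A sketch of recording a tool call's arguments and return value as JSON strings on a Tool span. The attribute keys and the `get_weather` tool are hypothetical stand-ins for illustration:

```python
import json

def tool_span_attributes(name, description, call_args, result):
    # Keys are hypothetical placeholders; input arguments and the
    # return value are recorded as JSON strings per the table above.
    return {
        "llm.span.kind": "TOOL",
        "tool.name": name,
        "tool.description": description,
        "tool.call.arguments": json.dumps(call_args),
        "tool.call.result": json.dumps(result),
    }

def get_weather(city):
    # Stand-in for a real weather-API call.
    return {"city": city, "temperature_c": 21}

call_args = {"city": "Hangzhou"}
tool_result = get_weather(**call_args)
tool_attrs = tool_span_attributes("get_weather", "Look up current weather.",
                                  call_args, tool_result)
```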
Agent
An Agent span represents an agent scenario. It is a more complex type of Chain that makes decisions for the next step based on the inference results of a large model. For example, it might involve multiple calls to LLMs and Tools, making step-by-step decisions to reach a final answer.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For an Agent, the value must be | String |  | Required |
|  | Input parameter. Records the original input. | String |  | Required |
|  | Input MIME type. | String |  | Optional |
|  | Return result. Returns the final output. | String |  | Required |
|  | Output MIME type. | String |  | Optional |
|  | The time to first token for the Agent. It represents the overall time from when the server receives a user's request to when the first packet of the response is returned. The unit is nanoseconds. | Integer |  | Recommended |
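An Agent span typically sits at the root of the trace, with each model call and tool call it decides to make recorded as a child span. A minimal stand-in structure to show that nesting (the `Span` class and attribute keys are illustrative only, not a real SDK):

```python
class Span:
    # Minimal stand-in for a trace span, used only to show nesting.
    def __init__(self, kind, parent=None):
        self.attributes = {"llm.span.kind": kind}  # hypothetical key
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

# The Agent span is the parent; each decision step becomes a child.
agent = Span("AGENT")
plan = Span("LLM", parent=agent)     # ask the model for the next step
lookup = Span("TOOL", parent=agent)  # execute the chosen tool
answer = Span("LLM", parent=agent)   # produce the final answer
agent.attributes["agent.output"] = "Final answer text"  # hypothetical key
```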
Task
A Task span indicates an internal custom method. For example, it might involve calling a local function or other application-defined logic.
Attributes
| AttributeKey | Description | Type | Example value | Requirement level |
| --- | --- | --- | --- | --- |
|  | Operation type. A dedicated enumeration for the LLM spanKind. For a Task, the value must be | String |  | Required |
|  | Input parameters | String | Custom JSON format | Optional |
|  | Input MIME type. | String |  | Optional |
|  | Output MIME type. | String |  | Optional |
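One common way to instrument such internal methods is a decorator that records the function's input and output on a Task span. A sketch with an in-memory list standing in for a span exporter; the attribute keys and MIME-type values are illustrative:

```python
import functools
import json

RECORDED_SPANS = []  # stand-in for a real span exporter

def task_span(func):
    # Wrap a local function in a Task span; keys are hypothetical.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {
            "llm.span.kind": "TASK",
            "task.input": json.dumps({"args": list(args), "kwargs": kwargs}),
            "task.input.mime_type": "application/json",
        }
        result = func(*args, **kwargs)
        span["task.output"] = json.dumps(result)
        span["task.output.mime_type"] = "application/json"
        RECORDED_SPANS.append(span)
        return result
    return wrapper

@task_span
def normalize(text):
    # Example of application-defined logic worth a Task span.
    return {"normalized": text.strip().lower()}

out = normalize("  Hello World  ")
```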