Alibaba Cloud defines its Large Language Model (LLM) trace fields based on the OpenTelemetry open standard and concepts from the LLM application domain. By extending Attributes, Resource, and Event, these fields describe the semantics of LLM application call chain data and capture key information such as LLM request inputs and outputs and token consumption. They provide rich, context-aware semantic data for scenarios such as Completion, Chat, retrieval-augmented generation (RAG), Agent, and Tool to simplify data tracking and reporting. These semantic fields are continuously updated and optimized as the community evolves.
The definitions of level-1 span fields are based on the OpenTelemetry open standard. For more information about the underlying level-1 trace fields stored in Alibaba Cloud Managed Service for OpenTelemetry, see Trace analysis parameters.
The LLM-related SpanKind is an attribute and is different from the Span kind defined in OpenTelemetry traces.
Common fields
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The session ID. | String |  | Conditionally required |
|  | The ID of the end user of the application. | String |  | Conditionally required |
|  | The operation type. | String | See LLM Span Kind | Required |
|  | The type of framework used. | String |  | Conditionally required |
Resources
| Resource key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The application name. | String |  | Required |
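As a rough illustration of how these common fields might be reported with the OpenTelemetry Python SDK, the sketch below registers the application name as a resource attribute and sets the common session, user, operation type, and framework attributes on a span. The key names used here (`service.name`, `gen_ai.span.kind`, `gen_ai.session.id`, `gen_ai.user.id`, `gen_ai.framework`) and the value `CHAIN` are illustrative placeholders, not confirmed by this page; use the exact attribute keys and enumeration values defined in the tables.

```python
# Minimal sketch (OpenTelemetry Python SDK); attribute key names are illustrative placeholders.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# The application name is reported as a resource attribute.
resource = Resource.create({"service.name": "my-llm-app"})
trace.set_tracer_provider(TracerProvider(resource=resource))  # add a span processor/exporter in real use
tracer = trace.get_tracer("llm-trace-demo")

with tracer.start_as_current_span("chat") as span:
    # Common attributes shared by all LLM span kinds (placeholder keys and values).
    span.set_attribute("gen_ai.span.kind", "CHAIN")
    span.set_attribute("gen_ai.session.id", "session-001")
    span.set_attribute("gen_ai.user.id", "user-42")
    span.set_attribute("gen_ai.framework", "langchain")
```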
Chain
A Chain is a tool that connects an LLM with multiple other components to perform complex tasks. It can be nested and may contain Retrieval, Embedding, and LLM calls.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For a Chain, the value must be | String |  | Required |
|  | The sub-type of the operation. | String |  | Conditionally required |
|  | The input content. | String |  | Recommended |
|  | The returned content. | String |  | Recommended |
|  | The time to first token (TTFT). This is the latency for the first packet of the overall response to a query. It measures the time from when the server receives the user request to when the first packet is returned. The unit is nanoseconds. | Integer | 1000000 | Recommended |
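For example, a Chain span might record its input content, returned content, and TTFT as in the sketch below. This is a generic OpenTelemetry Python sketch: the attribute keys (`gen_ai.span.kind`, `input.value`, `output.value`, `gen_ai.response.first_token_duration`) are placeholders introduced for illustration, and the TTFT is computed in nanoseconds as the table specifies.

```python
# Illustrative Chain span; key names are placeholders, TTFT is in nanoseconds per the table.
import time
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

def run_chain(question: str) -> str:
    start_ns = time.time_ns()
    with tracer.start_as_current_span("rag-chain") as span:
        span.set_attribute("gen_ai.span.kind", "CHAIN")  # placeholder value
        span.set_attribute("input.value", question)      # placeholder key: input content
        answer = "Paris is the capital of France."       # stand-in for nested retrieval + LLM calls
        # Time from receiving the request to producing the first packet, in nanoseconds.
        span.set_attribute("gen_ai.response.first_token_duration", time.time_ns() - start_ns)
        span.set_attribute("output.value", answer)       # placeholder key: returned content
        return answer
```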
Retriever
A Retriever typically accesses a vector store or database to retrieve data. This data is used to supplement context to improve the accuracy and efficiency of the LLM's response.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For a Retriever, the value must be | String |  | Required |
|  | The short query string for retrieval. | String |  | Recommended |
|  | A list of retrieved documents. | JSON array |  | Required |
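Because the retrieved documents are reported as a JSON array, a Retriever span might serialize them as in the sketch below. The attribute keys (`retrieval.query`, `retrieval.documents`) and document fields are placeholders for illustration.

```python
# Illustrative Retriever span; the document list is serialized as a JSON array string.
import json
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

with tracer.start_as_current_span("vector-store-retrieval") as span:
    span.set_attribute("gen_ai.span.kind", "RETRIEVER")         # placeholder value
    span.set_attribute("retrieval.query", "capital of France")  # placeholder key: query string
    documents = [
        {"id": "doc-1", "content": "Paris is the capital of France.", "score": 0.93},
        {"id": "doc-2", "content": "France is in Western Europe.", "score": 0.71},
    ]
    span.set_attribute("retrieval.documents", json.dumps(documents))  # placeholder key
```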
Reranker
A Reranker sorts multiple input documents based on their relevance to the query content and may return the top K documents for the LLM.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For a Reranker, the value must be | String |  | Required |
|  | The input parameter for the Reranker request. | String |  | Optional |
|  | The name of the model used by the Reranker. | String |  | Optional |
|  | The rank after reranking. | Integer |  | Optional |
|  | Metadata related to the input documents for reranking. This is a JSON array. The metadata contains basic document information, such as the path, filename, and source. | String | - | Required |
|  | Metadata related to the output documents after reranking. This is a JSON array. The metadata contains basic document information, such as the path, filename, and source. | String | - | Required |
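A Reranker span might record the document metadata before and after reordering as sketched below; both metadata lists are JSON arrays as the table requires. The keys (`reranker.model_name`, `reranker.top_k`, `reranker.input_documents`, `reranker.output_documents`) are illustrative placeholders.

```python
# Illustrative Reranker span; input and output document metadata are JSON arrays.
import json
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

candidates = [
    {"id": "doc-2", "path": "kb/europe.md", "score": 0.71},
    {"id": "doc-1", "path": "kb/paris.md", "score": 0.93},
]

with tracer.start_as_current_span("rerank") as span:
    span.set_attribute("gen_ai.span.kind", "RERANKER")                     # placeholder value
    span.set_attribute("reranker.model_name", "my-rerank-model")           # placeholder key and model name
    span.set_attribute("reranker.top_k", 1)                                # placeholder key: rank after reranking
    span.set_attribute("reranker.input_documents", json.dumps(candidates)) # placeholder key
    top = sorted(candidates, key=lambda d: d["score"], reverse=True)[:1]   # keep the top K documents
    span.set_attribute("reranker.output_documents", json.dumps(top))       # placeholder key
```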
LLM
An LLM span identifies a call to a large model, such as requesting inference or text generation using an SDK or OpenAPI.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For an LLM, the value must be | String |  | Required |
|  | The sub-type of the operation. | String |  | Optional |
|  | The prompt template. | String |  | Optional |
|  | The specific values for the prompt template. | String |  | Optional |
|  | The version number of the prompt template. | String |  | Optional |
|  | The provider of the large model. | String |  | Required |
|  | The input parameters for the LLM call. | String |  | Optional |
|  | The model name. | String |  | Optional |
|  | The unique ID of the conversation. This should be collected if the instrumentation can easily obtain the session ID. | String |  | Conditionally required |
|  | The output type specified in the LLM request. This should be collected if it is available and the request specifies a type, such as an output format. | String |  | Conditionally required |
|  | The number of candidate generations requested from the LLM. | Integer |  | Conditionally required (if the value is not 1) |
|  | The model name specified in the LLM request. | String |  | Required |
|  | The seed specified in the LLM request. | String |  | Conditionally required |
|  | The frequency penalty set in the LLM request. | Float |  | Recommended |
|  | The maximum number of tokens specified in the LLM request. | Integer |  | Recommended |
|  | The presence penalty set in the LLM request. | Float |  | Recommended |
|  | The temperature specified in the LLM request. | Float |  | Recommended |
|  | The top_p value specified in the LLM request. | Float |  | Recommended |
|  | The top_k value specified in the LLM request. | Float |  | Recommended |
|  | Indicates whether the response is streamed. If this attribute is not present, the value is considered false. | Boolean |  | Recommended |
|  | The stop sequences for the LLM. | String[] |  | Recommended |
|  | The content of the tool calls. | String |  | Recommended |
|  | The unique ID generated by the LLM. | String |  | Recommended |
|  | The name of the model used for the LLM generation. | String |  | Recommended |
|  | The reason why the LLM stopped generating. | String[] |  | Recommended |
|  | The time to first token for the large model itself in a streaming scenario. It represents the latency for the first packet of the overall response to a query, measured from when the server receives the user request to when the first packet is returned. The unit is nanoseconds. | Integer |  | Recommended |
|  | The inference time of the reasoning model. It represents the duration of the response reasoning process. The unit is milliseconds. | Integer |  | Recommended |
|  | The number of tokens used for the input. | Integer |  | Recommended |
|  | The number of tokens used for the output. | Integer |  | Recommended |
|  | The total number of tokens used. | Integer |  | Recommended |
|  | A link to the model's input content. | String |  | Recommended |
|  | A link to the model's output content. | String |  | Recommended |
|  | A link to the content of the system prompt. This is used to separately record an external link to the content of the system prompt (system instruction). If the system prompt content can be obtained separately, record it using this field. If the system prompt content is part of the model call, record it in the link for the model's input content instead. | String |  | Recommended if available |
|  | The model's input content. Messages must be provided in the order they were sent to the model or agent. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | The model's output content. Messages must be provided in the order they were sent to the model or agent. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | The content of the system prompt. This is used to separately record the content of the system prompt (system instruction) as a JSON string. If the system prompt content can be obtained separately, record it using this field. If the system prompt content is part of the model call, record it in the field for the model's input content instead. By default, this information should not be collected unless the user explicitly enables it. | String |  | Optional |
|  | The reasoning content from the reasoning model. This represents the content of the response reasoning process. The default length is limited to 1,024 characters. Any content exceeding this limit should be truncated. | String |  | Optional |
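Putting several of these fields together, an LLM span might be reported as in the sketch below: request parameters on the way in, token usage and the finish reason on the way out. The key names (`gen_ai.request.*`, `gen_ai.usage.*`, and so on), the model name, and the response values are illustrative placeholders, and the model call itself is stubbed out.

```python
# Illustrative LLM span; key names are placeholders and the model call is stubbed out.
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

with tracer.start_as_current_span("llm-generation") as span:
    span.set_attribute("gen_ai.span.kind", "LLM")            # placeholder value
    # Request-side parameters from the table above.
    span.set_attribute("gen_ai.request.model", "example-model")  # placeholder key and model name
    span.set_attribute("gen_ai.request.temperature", 0.7)
    span.set_attribute("gen_ai.request.max_tokens", 512)
    span.set_attribute("gen_ai.request.top_p", 0.9)
    span.set_attribute("gen_ai.request.is_stream", False)

    # response = client.chat.completions.create(...)          # the actual model call would go here
    # Response-side attributes, filled from the (stubbed) response.
    span.set_attribute("gen_ai.response.finish_reason", ["stop"])  # String[] per the table
    span.set_attribute("gen_ai.usage.input_tokens", 128)
    span.set_attribute("gen_ai.usage.output_tokens", 64)
    span.set_attribute("gen_ai.usage.total_tokens", 192)
```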
Embedding
An Embedding span identifies an embedding process, such as an operation on a text embedding model. This embedding can be used later to optimize questions based on similarity queries.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For an Embedding, the value must be | String |  | Required |
|  | The token consumption of the input text. | Integer |  | Optional |
|  | The total token consumption for the embedding. | Integer |  | Optional |
|  | The name of the embedding model. | String |  | Optional |
|  | The embedding result. | String | - | Optional |
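An Embedding span might record the model name and token consumption as below; the keys (`embedding.model_name`, `gen_ai.usage.*`) are placeholders and the embedding call is stubbed.

```python
# Illustrative Embedding span; key names are placeholders.
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

with tracer.start_as_current_span("text-embedding") as span:
    span.set_attribute("gen_ai.span.kind", "EMBEDDING")               # placeholder value
    span.set_attribute("embedding.model_name", "my-embedding-model")  # placeholder key and model name
    # vector = embed("What is the capital of France?")                # the embedding call would go here
    span.set_attribute("gen_ai.usage.input_tokens", 9)                # placeholder key: input token consumption
    span.set_attribute("gen_ai.usage.total_tokens", 9)                # placeholder key: total token consumption
```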
Tool
A Tool span identifies a call to an external tool, such as calling a calculator or requesting the latest weather conditions from a weather API.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For a Tool, the value must be | String |  | Required |
|  | The tool name. | String |  | Required |
|  | The tool description. | String |  | Required |
|  | The input parameters for the tool. | String |  | Required |
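A Tool span might wrap an external call such as a weather API request, recording the tool name, description, and JSON-serialized parameters. The keys (`tool.name`, `tool.description`, `tool.parameters`) are placeholders and the external call is stubbed.

```python
# Illustrative Tool span; key names are placeholders and the external call is stubbed.
import json
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

def get_weather(city: str) -> str:
    with tracer.start_as_current_span("tool-get-weather") as span:
        span.set_attribute("gen_ai.span.kind", "TOOL")                            # placeholder value
        span.set_attribute("tool.name", "get_weather")                            # placeholder key
        span.set_attribute("tool.description", "Query current weather by city")   # placeholder key
        span.set_attribute("tool.parameters", json.dumps({"city": city}))         # placeholder key
        return "Sunny, 24°C"  # stand-in for the real weather API response
```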
Agent
An Agent represents an agent scenario. It is a more complex Chain that decides the next step based on the inference results of an LLM. For example, it may involve multiple calls to LLMs and Tools, making decisions step-by-step to produce a final answer.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For an Agent, the value must be | String |  | Required |
|  | The input parameters. Records the original input. | String |  | Required |
|  | The MIME type of the input. | String |  | Optional |
|  | The returned result. Returns the final output. | String |  | Required |
|  | The MIME type of the output. | String |  | Optional |
|  | The time to first token for the Agent. It represents the latency for the first packet of the overall response to a query, measured from when the server receives the user request to when the first packet is returned. The unit is nanoseconds. | Integer |  | Recommended |
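Because an Agent is a Chain that decides its next step from LLM output, its trace typically nests LLM and Tool spans under one Agent span, roughly as sketched below. The attribute keys and span names are placeholders, and the decision loop is reduced to a single step for brevity.

```python
# Illustrative Agent span with nested LLM and Tool child spans; key names are placeholders.
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

def run_agent(question: str) -> str:
    with tracer.start_as_current_span("weather-agent") as agent_span:
        agent_span.set_attribute("gen_ai.span.kind", "AGENT")      # placeholder value
        agent_span.set_attribute("input.value", question)          # placeholder key: original input
        agent_span.set_attribute("input.mime_type", "text/plain")  # placeholder key

        with tracer.start_as_current_span("plan-step") as llm_span:    # the LLM decides the next step
            llm_span.set_attribute("gen_ai.span.kind", "LLM")
            next_action = "get_weather"                                 # stand-in for the model's decision

        with tracer.start_as_current_span("tool-step") as tool_span:   # the chosen tool is invoked
            tool_span.set_attribute("gen_ai.span.kind", "TOOL")
            tool_span.set_attribute("tool.name", next_action)
            answer = "It is sunny in Hangzhou today."                   # stand-in for the tool result

        agent_span.set_attribute("output.value", answer)            # placeholder key: final output
        agent_span.set_attribute("output.mime_type", "text/plain")  # placeholder key
        return answer
```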
Task
A Task span identifies an internal custom method, such as calling a local function to apply custom logic.
Attributes
| Attribute key | Description | Type | Example | Requirement level |
| --- | --- | --- | --- | --- |
|  | The operation type. This is an enumeration specific to the LLM SpanKind. For a Task, the value must be | String |  | Required |
|  | The input parameters. | String | Custom JSON format | Optional |
|  | The MIME type of the input. | String |  | Optional |
|  | The MIME type of the output. | String |  | Optional |
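A Task span simply wraps a local custom function, for example via a small decorator as sketched below. The attribute keys (`input.value`, `input.mime_type`) are placeholders for illustration.

```python
# Illustrative Task span wrapping a local function; key names are placeholders.
import functools
import json
from opentelemetry import trace

tracer = trace.get_tracer("llm-trace-demo")

def task_span(func):
    """Decorator that records a custom local function as a Task span."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with tracer.start_as_current_span(func.__name__) as span:
            span.set_attribute("gen_ai.span.kind", "TASK")                                  # placeholder value
            span.set_attribute("input.value", json.dumps({"args": [str(a) for a in args]}))  # placeholder key
            span.set_attribute("input.mime_type", "application/json")                        # placeholder key
            return func(*args, **kwargs)
    return wrapper

@task_span
def normalize_question(text: str) -> str:
    return text.strip().lower()
```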