Add custom instrumentation to traces with loongsuite-util-genai and the OpenTelemetry SDK - Cloud Monitor

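After your application is connected to ARMS Application Monitoring, the ARMS agent automatically instruments common AI frameworks, so trace data is collected without any code changes. If you also want traces to show how your own business methods execute, you can add loongsuite-util-genai and the OpenTelemetry SDK to your project and create custom instrumentation in your business code. This topic describes how to use loongsuite-util-genai and the OpenTelemetry Python SDK to add custom spans and custom attributes.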

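For the AI components and frameworks that the ARMS agent supports, see the ARMS documentation on supported components and frameworks.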

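Prerequisites

Your application is connected to ARMS Application Monitoring.

Install dependencies

pip install loongsuite-util-genai

The package provides the opentelemetry.util.genai module and extension interfaces such as ExtendedTelemetryHandler. For more information, see the loongsuite-util-genai documentation.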

Use loongsuite-util-genai and the OpenTelemetry SDK

With loongsuite-util-genai and the OpenTelemetry SDK, you can:

Create spans with GenAI semantics (Entry, Agent, Tool, ReAct Step, and more).
Create custom spans with the OpenTelemetry SDK.
Add custom attributes to spans.
Get the current trace context and print the traceId.

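Terminology

Span: a single operation within a request, such as one LLM call or one tool execution.
SpanContext: the trace context of a span, which contains the traceId, spanId, and related information.
Attribute: an additional field on a span that records key information, such as the model name or token usage.
Handler: the ExtendedTelemetryHandler provided by loongsuite-util-genai, used to create spans that follow the GenAI semantic conventions.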
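In the OpenTelemetry Python API, the context of the active span is returned by `trace.get_current_span().get_span_context()`, and its `trace_id` and `span_id` are plain integers. To print a traceId in the usual W3C Trace Context form, format the trace ID as 32 lowercase hex digits and the span ID as 16. The stdlib-only sketch below covers only that formatting step. The helper names are hypothetical, and the sample IDs are the example values from the W3C Trace Context specification, not real data.

```python
def format_trace_id(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex digits."""
    return format(trace_id, "032x")


def format_span_id(span_id: int) -> str:
    """Render a 64-bit span ID as 16 lowercase hex digits."""
    return format(span_id, "016x")


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a W3C traceparent value: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{format_trace_id(trace_id)}-{format_span_id(span_id)}-{flags}"


if __name__ == "__main__":
    # Hypothetical IDs. In an instrumented app they would come from
    # trace.get_current_span().get_span_context().trace_id / .span_id
    trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
    span_id = 0x00F067AA0BA902B7
    print("traceId:", format_trace_id(trace_id))
    print("traceparent:", build_traceparent(trace_id, span_id))
```

Zero-padding matters. Without it, an ID whose leading bits are zero would print with fewer digits and would not match the traceId shown in the console.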

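The following table lists all span types that loongsuite-util-genai supports. This topic focuses on Entry, Agent, Tool, and ReAct Step. For details on the other types (Embedding, Retrieval, Rerank, Memory, and so on), see the full loongsuite-util-genai documentation.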

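Span type	Operation name	Description
Entry	`enter`	Application entry point; carries session_id / user_id and the complete interaction of the request
Agent	`invoke_agent {name}`	Agent invocation; aggregates token usage
Tool	`execute_tool {name}`	Tool/function execution
Step	`react`	Marks a single ReAct iteration
LLM	`chat {model}`	LLM chat (usually collected automatically by the ARMS agent)
Embedding	`embeddings {model}`	Vector embedding
Retriever	`retrieval {data_source}`	Retrieval (RAG)
Reranker	`rerank {model}`	Reranking
Memory	`memory {operation}`	Memory read/write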
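The naming pattern in the table can be read as a template per span type, with the placeholder filled from the invocation (agent or tool name, model, data source, or memory operation). The sketch below encodes the table as a lookup so the resulting names are easy to predict when searching the trace view. It is only an illustration of the pattern. loongsuite-util-genai presumably sets span names itself, so you would not normally build them by hand, and `operation_name` is a hypothetical helper.

```python
# Operation-name templates copied from the span-type table.
OPERATION_TEMPLATES: dict[str, str] = {
    "Entry": "enter",
    "Agent": "invoke_agent {name}",
    "Tool": "execute_tool {name}",
    "Step": "react",
    "LLM": "chat {model}",
    "Embedding": "embeddings {model}",
    "Retriever": "retrieval {data_source}",
    "Reranker": "rerank {model}",
    "Memory": "memory {operation}",
}


def operation_name(span_type: str, **fields: str) -> str:
    """Fill the placeholder(s) of a span type's template; raises KeyError if one is missing."""
    return OPERATION_TEMPLATES[span_type].format(**fields)


if __name__ == "__main__":
    print(operation_name("Agent", name="TechContentAgent"))
    print(operation_name("Tool", name="search_product_knowledge"))
    print(operation_name("Step"))
```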

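The following steps show how to instrument each span type, with a standalone code snippet for each step. For complete, runnable sample code, see the Appendix at the end of this topic.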

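Important

Always obtain the Handler instance by calling get_extended_telemetry_handler(). Do not instantiate TelemetryHandler directly. The ARMS agent is adapted only for get_extended_telemetry_handler(), and instantiating TelemetryHandler directly may cause environment variable compatibility issues.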

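Important

When you add custom instrumentation, follow the semantic conventions in the LLM Trace field definitions. AI application observability features such as token statistics and session analysis are adapted and rendered according to those definitions. If span attributes do not conform, the related data may not display correctly in the console.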

1. Get the Handler and Tracer

Call get_extended_telemetry_handler() to get the singleton Handler of loongsuite-util-genai, and call get_tracer(__name__) to get an OpenTelemetry SDK Tracer. The Handler creates GenAI semantic spans, and the Tracer creates custom business spans.

from opentelemetry.util.genai.extended_handler import get_extended_telemetry_handler
from opentelemetry.util.genai.extended_types import (
    ExecuteToolInvocation,
    InvokeAgentInvocation,
)
from opentelemetry.util.genai._extended_common import EntryInvocation, ReactStepInvocation
from opentelemetry.util.genai.types import Error, InputMessage, OutputMessage, Text
from opentelemetry.trace import get_tracer

handler = get_extended_telemetry_handler()
tracer = get_tracer(__name__)

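The Handler supports two usage patterns:

Context manager (with handler.entry(inv), and so on): the recommended approach, because the span lifecycle is managed automatically.
start/stop/fail API (handler.start_entry(inv) / handler.stop_entry(inv) / handler.fail_entry(inv, error)): for asynchronous code, callbacks, streaming, and other scenarios where a with statement cannot be used.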
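If you use the start/stop/fail API but want `with`-style guarantees in code you control, you can wrap the three calls yourself. The following minimal sketch assumes only that the handler exposes start/stop/fail methods shaped like those listed above. `span_scope` and `RecordingHandler` are hypothetical names used for illustration. The real `fail_*` methods expect an `Error(message=..., type=...)` object rather than the raw exception, so with the real handler you would pass an adapter such as `lambda inv, exc: handler.fail_entry(inv, Error(message=str(exc), type=type(exc)))`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def span_scope(
    start: Callable[[Any], None],
    stop: Callable[[Any], None],
    fail: Callable[[Any, BaseException], None],
    invocation: Any,
) -> Iterator[Any]:
    """Call start on entry, stop on normal exit, or fail (then re-raise) on error."""
    start(invocation)
    try:
        yield invocation
    except BaseException as exc:
        fail(invocation, exc)
        raise
    else:
        stop(invocation)


class RecordingHandler:
    """Hypothetical stand-in that only records which lifecycle calls were made."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def start_entry(self, inv: Any) -> None:
        self.calls.append("start")

    def stop_entry(self, inv: Any) -> None:
        self.calls.append("stop")

    def fail_entry(self, inv: Any, error: BaseException) -> None:
        self.calls.append(f"fail:{type(error).__name__}")


if __name__ == "__main__":
    h = RecordingHandler()
    with span_scope(h.start_entry, h.stop_entry, h.fail_entry, "inv"):
        pass  # business logic goes here
    print(h.calls)
```

Each invocation therefore ends exactly once: either it stops or it fails, and it never stays open.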

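2. Create an Entry span

Create an Entry span at the request entry point. Set session_id and user_id on it, and record the user input in input_messages. After the streaming response completes, join the output chunks, set the result in output_messages, and then call stop_entry to end the span. The console then shows both the complete input and the final output of the request.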

entry_inv = EntryInvocation(
    session_id=req.session_id or str(uuid.uuid4()),
    user_id=req.user_id or "anonymous",
    input_messages=[
        InputMessage(role="user", parts=[Text(content=req.topic)]),
    ],
)

def event_generator():
    handler.start_entry(entry_inv)
    output_chunks: list[str] = []

    try:
        for chunk in run_agent_stream(topic=req.topic):
            output_chunks.append(chunk)
            yield f"data: {json.dumps({'content': chunk}, ensure_ascii=False)}\n\n"
        yield "data: [DONE]\n\n"
    except Exception as exc:
        handler.fail_entry(entry_inv, Error(message=str(exc), type=type(exc)))
        yield f"data: {json.dumps({'error': str(exc)}, ensure_ascii=False)}\n\n"
        return
    entry_inv.output_messages = [
        OutputMessage(
            role="assistant",
            parts=[Text(content="".join(output_chunks))],
            finish_reason="stop",
        ),
    ]
    handler.stop_entry(entry_inv)
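The snippet above ends the span on success and on ordinary exceptions. When a generator is closed early (for example, if the server closes the stream after a client disconnects), Python raises `GeneratorExit` at the paused `yield`. `except Exception` does not catch `GeneratorExit`, so in that case neither `stop_entry` nor `fail_entry` may run. The stdlib-only sketch below shows one way to guarantee that exactly one of them is called. `traced_stream` is a hypothetical helper, and `start`/`stop`/`fail` are plain callbacks rather than the real handler methods.

```python
from typing import Callable, Iterable, Iterator


def traced_stream(
    chunks: Iterable[str],
    start: Callable[[], None],
    stop: Callable[[str], None],
    fail: Callable[[BaseException], None],
) -> Iterator[str]:
    """Re-yield chunks and end the span exactly once: stop on success, fail otherwise."""
    start()
    parts: list[str] = []
    try:
        for chunk in chunks:
            parts.append(chunk)
            yield chunk
    except BaseException as exc:
        # Catches ordinary errors and also GeneratorExit, which Python raises at
        # the paused `yield` when the consumer closes the stream early.
        fail(exc)
        raise
    # Only reached when the upstream iterator was fully consumed.
    stop("".join(parts))
```

To wire this to the Entry snippet, `start` would call `handler.start_entry(entry_inv)`. `stop` would set `entry_inv.output_messages` from the joined text and call `handler.stop_entry(entry_inv)`. `fail` would call `handler.fail_entry(entry_inv, Error(message=str(exc), type=type(exc)))`.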

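3. Create an Agent span

Call start_invoke_agent to create an Agent span that records the agent name, model, and description. The Agent span is the parent of the agent run: all subsequent ReAct Step, LLM call, and Tool spans are created as its child spans.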

invocation = InvokeAgentInvocation(
    provider="dashscope",
    agent_name="TechContentAgent",
    agent_description="技術內容產生助手",
    request_model="qwen-plus",
)
total_input_tokens = 0
total_output_tokens = 0

handler.start_invoke_agent(invocation)
try:
    # ... Agent core logic (ReAct loop); add each round's response.usage to the totals ...

    invocation.input_tokens = total_input_tokens
    invocation.output_tokens = total_output_tokens
    handler.stop_invoke_agent(invocation)
except Exception as exc:
    handler.fail_invoke_agent(invocation, Error(message=str(exc), type=type(exc)))
    raise

After the agent finishes, write the accumulated total_input_tokens and total_output_tokens to the Agent span so that token usage is aggregated at the agent level.
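The aggregation is a running sum over the ReAct rounds. The sketch below assumes an OpenAI-compatible response whose `usage` object has `prompt_tokens` and `completion_tokens`, as used in the appendix, and that `usage` can be missing for some responses. `Usage` and `sum_usage` are hypothetical stand-ins, and the token counts are illustrative numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    """Stand-in for the `usage` object on an OpenAI-compatible chat response."""
    prompt_tokens: int
    completion_tokens: int


def sum_usage(usages: Iterable[Optional[Usage]]) -> tuple[int, int]:
    """Sum prompt/completion tokens across ReAct rounds, skipping rounds without usage."""
    total_in = total_out = 0
    for usage in usages:
        if usage is None:
            continue
        total_in += usage.prompt_tokens
        total_out += usage.completion_tokens
    return total_in, total_out


if __name__ == "__main__":
    rounds = [Usage(1200, 80), None, Usage(1500, 420)]  # illustrative values
    input_tokens, output_tokens = sum_usage(rounds)
    print(input_tokens, output_tokens)  # 2700 500
    # These totals would be assigned to invocation.input_tokens and
    # invocation.output_tokens before handler.stop_invoke_agent(invocation).
```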

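4. Create a ReAct Step span

Create a Step span for each ReAct reasoning iteration and pass in the current round number. When the iteration ends, set finish_reason to continue if another iteration is needed, or to stop if the model produced the final answer. In this sample, the LLM call in each iteration is instrumented automatically by the ARMS agent, so you do not need to create that span manually.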

step_inv = ReactStepInvocation(round=iteration + 1)
handler.start_react_step(step_inv)

try:
    response = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        tools=TOOL_DEFINITIONS,
    )
    # ... handle the response ...

    step_inv.finish_reason = "stop"  # 或 "continue"
    handler.stop_react_step(step_inv)
except Exception as exc:
    handler.fail_react_step(step_inv, Error(message=str(exc), type=type(exc)))
    raise
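The appendix chooses finish_reason from three outcomes of a round: a detected response loop, a tool call, or a final text answer. The sketch below isolates that decision so the priority order is explicit, with the loop check first, as in the appendix. `step_finish_reason` is a hypothetical helper. The final fallback (no text and no tool calls) returning "continue" is an assumption, because the snippet above does not cover that case.

```python
from typing import Optional


def step_finish_reason(has_tool_calls: bool, content: Optional[str], is_loop: bool) -> str:
    """Map one ReAct round's outcome to the finish_reason values used in this topic."""
    if is_loop:
        return "loop_detected"  # the agent stops early
    if has_tool_calls:
        return "continue"  # tool results go back to the model next round
    if content and content.strip():
        return "stop"  # the final answer was produced
    return "continue"  # assumption: nothing usable yet, try another round


if __name__ == "__main__":
    print(step_finish_reason(has_tool_calls=True, content=None, is_loop=False))
    print(step_finish_reason(has_tool_calls=False, content="Final answer", is_loop=False))
```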

5. Create a Tool span

When the model returns tool calls, create a Tool span for each tool_call. The span records the tool name, call ID, arguments, and result.

tool_inv = ExecuteToolInvocation(
    tool_name=tool_call.function.name,
    tool_call_id=tool_call.id,
    tool_call_arguments=tool_call.function.arguments,
    tool_type="function",
)
handler.start_execute_tool(tool_inv)
try:
    result = dispatch_tool(tool_call.function.name, tool_call.function.arguments)
    tool_inv.tool_call_result = result
except Exception as exc:
    handler.fail_execute_tool(tool_inv, error=Error(message=str(exc), type=type(exc)))
    raise
else:
    handler.stop_execute_tool(tool_inv)
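`tool_call.function.arguments` arrives as a JSON string, and the tool result is sent back to the model as the content of a `tool` message. The sketch below, a hypothetical `run_tool` rather than the appendix's `dispatch_tool`, parses the arguments and serializes non-string results. This way, the value recorded as `tool_call_result` matches what the model receives. Unlike `dispatch_tool`, it raises on an unknown tool or bad JSON instead of returning an error string. That is a design choice that makes the `fail_execute_tool` branch above run for those cases.

```python
import json
from typing import Any, Callable


def run_tool(registry: dict[str, Callable[..., Any]], name: str, arguments: str) -> str:
    """Parse JSON arguments, call the tool, and return a string result for the span."""
    func = registry.get(name)
    if func is None:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments or "{}")  # raises json.JSONDecodeError on bad input
    result = func(**kwargs)
    # Serialize non-string results so the recorded tool_call_result and the
    # tool message content sent back to the model are the same string.
    return result if isinstance(result, str) else json.dumps(result, ensure_ascii=False)


if __name__ == "__main__":
    registry = {"add": lambda a, b: {"sum": a + b}}
    print(run_tool(registry, "add", '{"a": 1, "b": 2}'))  # {"sum": 3}
```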

6. Create custom spans with the OpenTelemetry SDK

In addition to the GenAI semantic spans provided by loongsuite-util-genai, you can create custom business spans with the OpenTelemetry SDK's tracer.start_as_current_span() and use them alongside the GenAI spans.
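When you set custom attributes, keep in mind that OpenTelemetry attribute values must be a string, boolean, integer, or floating-point number, or a homogeneous array of one of those types. None is not a valid value, and invalid values may be dropped rather than recorded. The following stdlib-only sketch coerces arbitrary values before they are passed to `span.set_attributes(...)`. `to_otel_attributes` is a hypothetical helper. Falling back to a JSON string for everything else is one possible policy, not a requirement. As a side note, a list of strings such as `duplicates` in the following examples is already a valid attribute value, so it does not need `str()`.

```python
import json
from typing import Any, Mapping

_PRIMITIVES = (str, bool, int, float)


def to_otel_attributes(raw: Mapping[str, Any]) -> dict[str, Any]:
    """Coerce values into types OpenTelemetry accepts as attribute values."""
    attrs: dict[str, Any] = {}
    for key, value in raw.items():
        if value is None:
            continue  # None is not a valid attribute value; skip it
        if isinstance(value, _PRIMITIVES):
            attrs[key] = value
        elif (
            isinstance(value, (list, tuple))
            and all(isinstance(v, _PRIMITIVES) for v in value)
            and len({type(v) for v in value}) <= 1
        ):
            attrs[key] = list(value)  # homogeneous (or empty) arrays are allowed
        else:
            # Dicts, mixed-type lists, objects: store a JSON string instead.
            attrs[key] = json.dumps(value, ensure_ascii=False, default=str)
    return attrs


if __name__ == "__main__":
    print(to_otel_attributes({"count": 3, "tools": ["a", "b"], "meta": {"k": 1}, "x": None}))
```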

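The following examples show two typical uses of custom spans:

`duplicate_tool_detection`: duplicate tool call detection

This check runs before each ReAct iteration. It uses a Counter to count the calls to each tool and writes the result to gen_ai.loop_detection.* attributes. If a duplicate is found, it appends a system message that tells the model to avoid repeated calls.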

def _check_duplicate_tools(
    tool_usage_counter: Counter,
    messages: list[dict[str, Any]],
) -> None:
    duplicates = [name for name, count in tool_usage_counter.items() if count > 1]
    has_duplicates = len(duplicates) > 0

    with tracer.start_as_current_span("duplicate_tool_detection") as span:
        span.set_attributes({
            "gen_ai.loop_detection.detected": has_duplicates,
            "gen_ai.loop_detection.duplicate_tools": str(duplicates) if has_duplicates else "[ ]",
            "gen_ai.loop_detection.total_calls": sum(tool_usage_counter.values()),
            "gen_ai.loop_detection.unique_tools": len(tool_usage_counter),
        })

    if has_duplicates:
        details = ", ".join(f"{n}({tool_usage_counter[n]}次)" for n in duplicates)
        messages.append({
            "role": "system",
            "content": f"[系統提示] 檢測到工具被重複調用：{details}。請避免重複調用。",
        })
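The detection itself is a filter over the counter. Because the counter accumulates over the whole agent run, the check above reports a duplicated tool (and appends the hint) again in every later round once that tool has been called twice. The sketch below reproduces the core logic and adds a hypothetical `new_duplicates` variant that reports each tool only once. That variant is an optional tweak, not part of the sample.

```python
from collections import Counter


def find_duplicates(counter: Counter) -> list[str]:
    """Tool names called more than once, in first-seen order."""
    return [name for name, count in counter.items() if count > 1]


def new_duplicates(counter: Counter, already_warned: set[str]) -> list[str]:
    """Duplicates not reported before; records them so the hint is appended once."""
    fresh = [name for name in find_duplicates(counter) if name not in already_warned]
    already_warned.update(fresh)
    return fresh


if __name__ == "__main__":
    counter = Counter(["search_product_knowledge", "get_audience_profile", "search_product_knowledge"])
    print(find_duplicates(counter))  # ['search_product_knowledge']
    print(sum(counter.values()), len(counter))  # 3 2 (total_calls, unique_tools)
```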

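`response_loop_detection`: LLM response loop detection

This check runs after each LLM response. It compares the text of the current response with the text of the previous response and writes metrics such as is_loop and overlap_ratio to span attributes. The overlap ratio is the length of the common prefix divided by the length of the longer text. If a loop is detected (the texts are identical, or the overlap ratio exceeds 80%), finish_reason is set to loop_detected and the agent stops early.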

def _check_response_loop(
    current_content: str | None,
    previous_content: str | None,
) -> bool:
    cur = (current_content or "").strip()
    prev = (previous_content or "").strip()

    with tracer.start_as_current_span("response_loop_detection") as span:
        if not prev or not cur:
            span.set_attributes({
                "gen_ai.loop_detection.is_loop": False,
                "gen_ai.loop_detection.reason": "no_text_content",
            })
            return False

        is_identical = cur == prev
        longer = max(len(cur), len(prev))
        common_prefix_len = 0
        for a, b in zip(cur, prev):
            if a != b:
                break  # stop at the first differing character
            common_prefix_len += 1
        overlap_ratio = common_prefix_len / longer if longer > 0 else 0.0
        is_loop = is_identical or overlap_ratio > 0.8

        span.set_attributes({
            "gen_ai.loop_detection.is_loop": is_loop,
            "gen_ai.loop_detection.is_identical": is_identical,
            "gen_ai.loop_detection.overlap_ratio": round(overlap_ratio, 2),
            "gen_ai.loop_detection.current_length": len(cur),
            "gen_ai.loop_detection.previous_length": len(prev),
        })
        return is_loop
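The prefix-based overlap ratio can be checked in isolation. The sketch below computes the same ratio with `os.path.commonprefix`, which also works character by character on plain strings, and shows where the strict `> 0.8` threshold falls. `prefix_overlap` is a hypothetical name. A prefix measure misses loops in which the model only rephrases its opening. If that matters, `difflib.SequenceMatcher(None, cur, prev).ratio()` is a stdlib alternative, though it needs its own threshold.

```python
import os


def prefix_overlap(cur: str, prev: str) -> float:
    """Common-prefix length divided by the longer text's length (0.0 if both empty)."""
    cur, prev = cur.strip(), prev.strip()
    longer = max(len(cur), len(prev))
    if longer == 0:
        return 0.0
    return len(os.path.commonprefix([cur, prev])) / longer


if __name__ == "__main__":
    print(prefix_overlap("abcdefghij", "abcdefghXY"))  # 0.8 -> not a loop (needs > 0.8)
    print(prefix_overlap("abcdefghij", "abcdefghiZ"))  # 0.9 -> loop
    print(prefix_overlap("same text", "same text"))    # 1.0 -> identical
```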

Note

Custom spans are not part of the LLM semantic conventions. To see them in the trace view of the console, switch to the All view.

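View monitoring details

Log on to the CloudMonitor 2.0 console, select the target workspace, and in the left-side navigation pane choose All Features > AI Application Observability.
On the AI application list page, you can see the connected applications. Click an application name to view its detailed monitoring data.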

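Instrumentation results

1. Entry span details

The Entry span shows key attributes such as gen_ai.session.id and gen_ai.user.id. Because they are set at the function entry point, they are automatically propagated to LLM, Tool, and other spans, so you can correlate session and user information during analysis. The Entry span also carries gen_ai.input.messages (the user input) and gen_ai.output.messages (the final output), so you can view the complete interaction of the request directly in the console.

2. Agent span details

The Agent span shows the agent's name and description, as well as the agent-level token usage aggregated by the sample code above.

3. Tool span details

The Tool span shows the tool name, its input arguments, and the tool call result.

4. LLM span details

LLM spans are not instrumented manually in the sample code. Because the calls go through the OpenAI client, the ARMS agent collects them automatically, and you can see the complete context and token consumption of each LLM call.

5. Custom span details

The sample code creates two custom business spans with the OpenTelemetry SDK to show how custom instrumentation can be combined with GenAI semantic spans. Because these custom spans are not part of the LLM semantic conventions, switch to the All view to see them.

duplicate_tool_detection: runs before each ReAct iteration to detect whether the agent keeps calling the same tools. The span attributes record whether duplicates were detected, the list of duplicated tools, the total number of calls, and the number of distinct tools. This helps you quickly locate tool-call loops in ARMS.
response_loop_detection: runs after each LLM response to detect whether the model keeps returning highly similar content. The span attributes record whether the response is judged to be a loop, whether the text is identical, the overlap ratio, and the text lengths of the current and previous responses. This helps you troubleshoot cases in which the model is stuck producing repeated output.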

Appendix

Complete sample code

app.py

import json
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from opentelemetry.util.genai.extended_handler import get_extended_telemetry_handler
from opentelemetry.util.genai._extended_common import EntryInvocation
from opentelemetry.util.genai.types import Error, InputMessage, OutputMessage, Text

from agent import run_marketing_agent_stream

app = FastAPI(title="雲產品技術內容產生助手")


class GenerateRequest(BaseModel):
    content_type: str = "blog"
    product: str = "CMS"
    target_audience: str = "營運工程師"
    topic: str = ""
    session_id: str = ""
    user_id: str = ""


@app.post("/api/v1/generate/stream")
async def generate_stream(req: GenerateRequest) -> StreamingResponse:
    handler = get_extended_telemetry_handler()

    user_prompt = (
        f"內容類型: {req.content_type}, 產品: {req.product}, "
        f"目標受眾: {req.target_audience}, 主題: {req.topic}"
    )

    entry_inv = EntryInvocation(
        session_id=req.session_id or str(uuid.uuid4()),
        user_id=req.user_id or "anonymous",
        input_messages=[
            InputMessage(role="user", parts=[Text(content=user_prompt)]),
        ],
    )

    def event_generator():
        handler.start_entry(entry_inv)
        output_chunks: list[str] = []
        try:
            for chunk in run_marketing_agent_stream(
                content_type=req.content_type,
                product=req.product,
                target_audience=req.target_audience,
                topic=req.topic,
            ):
                output_chunks.append(chunk)
                yield f"data: {json.dumps({'content': chunk}, ensure_ascii=False)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as exc:
            handler.fail_entry(
                entry_inv,
                Error(message=str(exc), type=type(exc)),
            )
            yield f"data: {json.dumps({'error': str(exc)}, ensure_ascii=False)}\n\n"
            return
        entry_inv.output_messages = [
            OutputMessage(
                role="assistant",
                parts=[Text(content="".join(output_chunks))],
                finish_reason="stop",
            ),
        ]
        handler.stop_entry(entry_inv)

    return StreamingResponse(event_generator(), media_type="text/event-stream")


@app.get("/health")
async def health():
    return {"status": "ok"}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
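To consume `/api/v1/generate/stream`, a client reads the `data:` lines that `event_generator` writes. The sketch below handles only the three event shapes produced by app.py above: `{"content": ...}`, `{"error": ...}`, and `[DONE]`. It is not a general Server-Sent Events parser. `iter_sse_content` is a hypothetical helper. With `urllib.request.urlopen`, you would decode each response line before passing it in.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the 'content' chunks from the data: lines emitted by app.py."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        if "error" in event:
            raise RuntimeError(event["error"])
        yield event["content"]


if __name__ == "__main__":
    sample = ['data: {"content": "Hello"}', "", 'data: {"content": " world"}', "", "data: [DONE]", ""]
    print("".join(iter_sse_content(sample)))  # Hello world
```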

agent.py

import os
from collections import Counter
from collections.abc import Generator
from typing import Any

from openai import OpenAI
from opentelemetry.trace import get_tracer
from opentelemetry.util.genai.extended_handler import get_extended_telemetry_handler
from opentelemetry.util.genai.extended_types import (
    ExecuteToolInvocation,
    InvokeAgentInvocation,
)
from opentelemetry.util.genai._extended_common import ReactStepInvocation
from opentelemetry.util.genai.types import Error

from tools import TOOL_DEFINITIONS, dispatch_tool

tracer = get_tracer(__name__)

MODEL_NAME = os.environ.get("MODEL_NAME", "qwen-plus")
BASE_URL = os.environ.get(
    "OPENAI_BASE_URL",
    "https://dashscope.aliyuncs.com/compatible-mode/v1",
)
API_KEY = os.environ.get("DASHSCOPE_API_KEY", "")

MAX_ITERATIONS = 10

SYSTEM_PROMPT = """\
你是阿里雲CloudMonitor 2.0（CMS 2.0）的技術內容產生助手。\
面向營運工程師和架構師，用其熟悉的專業語言產生高價值技術內容。

關鍵原則：根據目標受眾調整內容的視角和語言風格——
- 營運工程師：聚焦實操步驟、排障效率、工具整合，用一線營運的日常術語
- 架構師：聚焦架構設計、標準化、可擴充性，用技術深度的專業表達

你必須嚴格按以下步驟執行，每一步都要調用對應的工具：

第一步：使用 search_product_knowledge 工具搜尋 CMS 產品資訊（features 或 comparison）
第二步：使用 get_audience_profile 工具擷取目標受眾的畫像和痛點
第三步：使用 get_industry_cases 工具尋找相關行業案例
第四步：如果是部落格文章，使用 generate_seo_keywords 工具擷取 SEO 關鍵詞
第五步：根據收集到的資訊產生內容
第六步：使用 check_content_compliance 工具檢查合規性

內容要求：圍繞產品優勢和受眾痛點，引用案例資料，中文撰寫，800 字以內。"""


def _build_client() -> OpenAI:
    return OpenAI(base_url=BASE_URL, api_key=API_KEY)


def _build_user_message(
    content_type: str,
    product: str,
    target_audience: str,
    topic: str,
) -> str:
    type_labels = {
        "blog": "面向一線技術人員的實戰技術部落格",
        "email": "精準觸達目標角色的技術推薦郵件",
        "case_study": "可落地參考的客戶實踐案例",
        "comparison": "輔助技術選型的產品對比分析",
    }
    label = type_labels.get(content_type, content_type)
    return (
        f"請為 {product} 產品產生一篇{label}。\n"
        f"目標受眾：{target_audience}\n"
        f"主題/方向：{topic}\n\n"
        f"請用目標受眾日常工作中熟悉的語言和視角來撰寫，"
        f"嚴格按照步驟調用工具收集資訊後再產生內容。"
    )


def _check_duplicate_tools(
    tool_usage_counter: Counter,
    messages: list[dict[str, Any]],
) -> list[str]:
    duplicates = [name for name, count in tool_usage_counter.items() if count > 1]
    total_calls = sum(tool_usage_counter.values())
    has_duplicates = len(duplicates) > 0

    duplicate_details = ", ".join(
        f"{name}({tool_usage_counter[name]}次)" for name in duplicates
    ) if has_duplicates else "none"

    with tracer.start_as_current_span("duplicate_tool_detection") as detect_span:
        detect_span.set_attributes({
            "gen_ai.loop_detection.detected": has_duplicates,
            "gen_ai.loop_detection.duplicate_tools": str(duplicates) if has_duplicates else "[]",
            "gen_ai.loop_detection.details": duplicate_details,
            "gen_ai.loop_detection.total_calls": total_calls,
            "gen_ai.loop_detection.unique_tools": len(tool_usage_counter),
        })

    if not has_duplicates:
        return []

    hint_message = (
        f"[系統提示] 檢測到以下工具被重複調用：{duplicate_details}。"
        f"請避免重複調用相同的工具，直接使用已擷取的資訊繼續執行後續步驟。"
    )
    messages.append({"role": "system", "content": hint_message})

    return duplicates


def _check_response_loop(
    current_content: str | None,
    previous_content: str | None,
) -> bool:
    """Compare consecutive LLM text responses to detect stuck loops."""
    cur = (current_content or "").strip()
    prev = (previous_content or "").strip()

    with tracer.start_as_current_span("response_loop_detection") as span:
        if not prev or not cur:
            span.set_attributes({
                "gen_ai.loop_detection.is_loop": False,
                "gen_ai.loop_detection.reason": "no_text_content",
            })
            return False

        is_identical = cur == prev

        common_prefix_len = 0
        for a, b in zip(cur, prev):
            if a == b:
                common_prefix_len += 1
            else:
                break
        longer = max(len(cur), len(prev))
        overlap_ratio = common_prefix_len / longer if longer > 0 else 0.0
        is_loop = is_identical or overlap_ratio > 0.8

        span.set_attributes({
            "gen_ai.loop_detection.is_loop": is_loop,
            "gen_ai.loop_detection.is_identical": is_identical,
            "gen_ai.loop_detection.overlap_ratio": round(overlap_ratio, 2),
            "gen_ai.loop_detection.current_length": len(cur),
            "gen_ai.loop_detection.previous_length": len(prev),
        })
        return is_loop


def run_marketing_agent_stream(
    content_type: str,
    product: str,
    target_audience: str,
    topic: str,
) -> Generator[str, None, None]:
    client = _build_client()
    handler = get_extended_telemetry_handler()

    user_message = _build_user_message(content_type, product, target_audience, topic)

    invocation = InvokeAgentInvocation(
        provider="dashscope",
        agent_name="TechContentAgent",
        agent_description="面向不同技術角色的雲產品內容產生助手",
        request_model=MODEL_NAME,
    )

    total_input_tokens = 0
    total_output_tokens = 0
    tool_usage_counter: Counter = Counter()
    previous_content: str | None = None

    handler.start_invoke_agent(invocation)
    try:
        messages: list[dict[str, Any]] = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ]

        for iteration in range(MAX_ITERATIONS):
            _check_duplicate_tools(tool_usage_counter, messages)

            step_inv = ReactStepInvocation(round=iteration + 1)
            handler.start_react_step(step_inv)
            try:
                response = client.chat.completions.create(
                    model=MODEL_NAME,
                    messages=messages,
                    tools=TOOL_DEFINITIONS,
                    temperature=0.7,
                )

                choice = response.choices[0]
                message = choice.message

                if response.usage:
                    total_input_tokens += response.usage.prompt_tokens
                    total_output_tokens += response.usage.completion_tokens

                current_content = message.content
                if _check_response_loop(current_content, previous_content):
                    step_inv.finish_reason = "loop_detected"
                    handler.stop_react_step(step_inv)
                    if current_content:
                        yield current_content
                    break
                if (current_content or "").strip():
                    previous_content = current_content

                if message.tool_calls:
                    messages.append(message.model_dump())

                    for tool_call in message.tool_calls:
                        tool_name = tool_call.function.name
                        tool_args = tool_call.function.arguments
                        tool_usage_counter[tool_name] += 1

                        tool_inv = ExecuteToolInvocation(
                            tool_name=tool_name,
                            tool_call_id=tool_call.id,
                            tool_call_arguments=tool_args,
                            tool_type="function",
                        )

                        handler.start_execute_tool(tool_inv)
                        try:
                            result = dispatch_tool(tool_name, tool_args)
                            tool_inv.tool_call_result = result
                        except Exception as exc:
                            handler.fail_execute_tool(
                                tool_inv,
                                error=Error(message=str(exc), type=type(exc)),
                            )
                            raise
                        else:
                            handler.stop_execute_tool(tool_inv)

                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": result,
                        })

                    step_inv.finish_reason = "continue"
                    handler.stop_react_step(step_inv)
                    continue

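                if choice.finish_reason == "stop" or message.content:
                    step_inv.finish_reason = "stop"
                    handler.stop_react_step(step_inv)
                    if message.content:
                        yield message.content
                    break

                # No text and no tool calls: close this step before the next round
                step_inv.finish_reason = "continue"
                handler.stop_react_step(step_inv)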
            except Exception as exc:
                handler.fail_react_step(
                    step_inv, Error(message=str(exc), type=type(exc))
                )
                raise

        invocation.input_tokens = total_input_tokens
        invocation.output_tokens = total_output_tokens
        handler.stop_invoke_agent(invocation)
    except Exception as exc:
        handler.fail_invoke_agent(
            invocation, Error(message=str(exc), type=type(exc))
        )
        raise

tools.py

import json
from typing import Any

PRODUCT_KNOWLEDGE: dict[str, dict[str, str]] = {
    "CMS": {
        "features": (
            "CloudMonitor 2.0（CMS 2.0）是阿里雲一站式可觀測平台，"
            "融合 SLS + CMS + ARMS 三大產品能力：\n"
            "1. 全棧統一監控：指標、鏈路、日誌、事件統一視圖\n"
            "2. UModel 統一建模：資源自動關聯與觀測圖譜構建\n"
            "3. AI 智能分析：異常檢測、警示降噪、對話式營運 Copilot\n"
            "4. 開放相容：支援 Prometheus、Grafana、OpenTelemetry 生態\n"
            "5. AI 應用可觀測：LLM 調用鏈追蹤、Token 統計、模型效能分析"
        ),
        "comparison": (
            "CloudMonitor 2.0 vs 傳統監控方案：\n"
            "1. 資料融合：傳統方案需在 3-5 個控制台間切換；CMS 2.0 一站式融合\n"
            "2. AI 能力：傳統靜態閾值警示誤判率 30%+；CMS 2.0 AI 降噪 80%\n"
            "3. 觀測圖譜：CMS 2.0 通過 UModel 自動構建依賴圖譜\n"
            "4. AI 應用可觀測：傳統方案不支援；CMS 2.0 原生支援 LLM/Agent 全鏈路"
        ),
    },
}
AUDIENCE_PROFILES: dict[str, dict[str, str]] = {
    "營運工程師": {
        "role": "營運工程師 / SRE",
        "pain_points": (
            "1. 故障排查耗時間長度：微服務架構下定位問題平均 30-60 分鐘\n"
            "2. 警示風暴：大促期間警示激增，難以區分優先順序\n"
            "3. 工具片段化：需在 5-6 個監控工具間切換\n"
            "4. AI 營運盲區：大模型調用鏈路不透明"
        ),
        "interests": "全鏈路追蹤、根因分析、警示降噪、Prometheus/Grafana 整合",
        "decision_factors": "技術成熟度等級、社區活躍度、學習成本、整合難度",
    },
    "架構師": {
        "role": "架構師 / 技術專家",
        "pain_points": (
            "1. 微服務 + AI Agent 混合架構的可觀測性挑戰\n"
            "2. 開源自建 vs 商業方案選型缺乏客觀對比\n"
            "3. 各團隊監控方案不統一，資料格式片段化\n"
            "4. 現有方案能否支撐業務 10 倍增長"
        ),
        "interests": "架構設計、OpenTelemetry 標準化、資料模型統一、可擴充性",
        "decision_factors": "架構先進性、標準化程度、可擴充性、開放性、社區生態",
    },
}

INDUSTRY_CASES: dict[str, list[dict[str, str]]] = {
    "金融": [
        {
            "company": "某頭部股份制銀行",
            "scenario": (
                "核心交易系統可觀測升級：覆蓋 200+ 微服務，"
                "日均處理 5000 萬筆交易的全鏈路追蹤"
            ),
            "results": (
                "故障 MTTR 從 45 分鐘降至 8 分鐘，降幅 82%；"
                "警示準確率從 60% 提升至 95%；"
                "營運人效提升 3 倍，等保三級合規檢查一次通過"
            ),
        },
    ],
    "互連網": [
        {
            "company": "某社交平台",
            "scenario": (
                "千萬 DAU 應用的全棧可觀測：覆蓋 App 端體驗監控 → "
                "CDN → API Gateway → 2000+ 微服務 → 資料庫/緩衝"
            ),
            "results": (
                "使用者側 Crash 率從 0.5% 降至 0.08%；"
                "API P99 延遲最佳化 40%；"
                "每月節省 10 萬元+ 監控成本（相比自建方案）"
            ),
        },
    ],
}

COMPLIANCE_RULES: dict[str, dict[str, Any]] = {
    "product_names": {
        "incorrect": {
            "Aliyun": "阿里雲",
            "CMS2.0": "CMS 2.0",
            "CloudMonitor2.0": "CloudMonitor 2.0",
        },
    },
    "claim_rules": [
        "資料引用必須標註來源",
        "避免絕對化用語（如'最好的''唯一的''第一'）",
        "對比競品時使用客觀資料",
    ],
}

SEO_KEYWORDS_DB: dict[str, dict[str, Any]] = {
    "可觀測": {
        "primary": "可觀測性",
        "long_tail": ["雲原生可觀測性方案", "微服務可觀測平台選型"],
        "search_volume": "高",
    },
    "AI可觀測": {
        "primary": "AI 應用可觀測",
        "long_tail": ["LLM 調用鏈追蹤", "AI Agent 可觀測性"],
        "search_volume": "中（快速增長）",
    },
}


def search_product_knowledge(product: str, aspect: str) -> str:
    product_key = "CMS"
    product_data = PRODUCT_KNOWLEDGE.get(product_key)
    if not product_data:
        available = ", ".join(PRODUCT_KNOWLEDGE.keys())
        return f"未找到產品 '{product}' 的知識庫。可用產品：{available}"

    aspect_lower = aspect.lower()
    aspect_data = product_data.get(aspect_lower)
    if not aspect_data:
        available = ", ".join(product_data.keys())
        return f"未找到 '{product}' 的 '{aspect}' 方面資訊。可查詢方面：{available}"

    return f"【{product} - {aspect}】\n{aspect_data}"


def get_audience_profile(audience_type: str) -> str:
    profile = AUDIENCE_PROFILES.get(audience_type)
    if not profile:
        available = ", ".join(AUDIENCE_PROFILES.keys())
        return f"未找到受眾類型 '{audience_type}'。可用類型：{available}"

    return (
        f"受眾畫像 — {profile['role']}\n\n"
        f"核心痛點:\n{profile['pain_points']}\n\n"
        f"關注領域: {profile['interests']}\n\n"
        f"決策因素: {profile['decision_factors']}"
    )


def get_industry_cases(industry: str) -> str:
    cases = INDUSTRY_CASES.get(industry)
    if not cases:
        available = ", ".join(INDUSTRY_CASES.keys())
        return f"未找到 '{industry}' 行業的案例。可用行業：{available}"

    parts: list[str] = [f"【{industry}行業案例】\n"]
    for i, case in enumerate(cases, 1):
        parts.append(
            f"案例 {i}: {case['company']}\n"
            f"  情境: {case['scenario']}\n"
            f"  成效: {case['results']}"
        )
    return "\n\n".join(parts)


def check_content_compliance(content_type: str, key_claims: str) -> str:
    issues: list[str] = []

    for wrong, correct in COMPLIANCE_RULES["product_names"]["incorrect"].items():
        if wrong in key_claims and correct not in key_claims:
            issues.append(f"產品名稱 '{wrong}' 應更正為 '{correct}'")

    for word in ("最好", "唯一", "第一", "最強"):
        if word in key_claims:
            issues.append(f"包含絕對化用語 '{word}'，建議替換為客觀表述")

    rules_text = "\n".join(
        f"  {i+1}. {rule}"
        for i, rule in enumerate(COMPLIANCE_RULES["claim_rules"])
    )

    result = "合規檢查結果:\n\n"
    if issues:
        result += "發現問題:\n" + "\n".join(f"  - {i}" for i in issues) + "\n\n"
    else:
        result += "未發現明顯合規問題。\n\n"
    result += f"合規規則:\n{rules_text}"
    return result


def generate_seo_keywords(topic: str) -> str:
    topic_lower = topic.lower()
    matched: list[dict[str, Any]] = []

    for key, data in SEO_KEYWORDS_DB.items():
        if key.lower() in topic_lower or topic_lower in key.lower() or any(
            w in topic_lower for w in key.lower().split() if len(w) > 1
        ):
            matched.append({"keyword": key, **data})

    if not matched:
        all_keywords = list(SEO_KEYWORDS_DB.keys())
        return (
            f"未找到與 '{topic}' 直接匹配的關鍵詞資料。\n"
            f"建議關鍵詞方向：{', '.join(all_keywords)}\n"
            f"通用 SEO 建議：標題包含核心關鍵詞，"
            f"H2/H3 使用長尾關鍵詞，內容長度 2000+ 字"
        )

    parts: list[str] = [f"SEO 關鍵詞分析 — '{topic}':\n"]
    for item in matched:
        long_tail = "\n".join(f"    - {kw}" for kw in item["long_tail"])
        parts.append(
            f"主關鍵詞: {item['primary']}\n"
            f"  搜尋熱度: {item['search_volume']}\n"
            f"  長尾關鍵詞:\n{long_tail}"
        )
    return "\n\n".join(parts)


TOOL_REGISTRY: dict[str, Any] = {
    "search_product_knowledge": search_product_knowledge,
    "get_audience_profile": get_audience_profile,
    "get_industry_cases": get_industry_cases,
    "check_content_compliance": check_content_compliance,
    "generate_seo_keywords": generate_seo_keywords,
}


def dispatch_tool(name: str, arguments: str) -> str:
    func = TOOL_REGISTRY.get(name)
    if not func:
        return f"未知工具: {name}"
    try:
        kwargs = json.loads(arguments)
    except json.JSONDecodeError:
        return f"工具參數解析失敗: {arguments}"
    return func(**kwargs)


TOOL_DEFINITIONS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "search_product_knowledge",
            "description": "搜尋 CMS 產品知識庫，擷取特性或競品對比資訊。",
            "parameters": {
                "type": "object",
                "properties": {
                    "product": {
                        "type": "string",
                        "description": "產品名稱",
                        "enum": ["CMS"],
                    },
                    "aspect": {
                        "type": "string",
                        "description": "查詢方面",
                        "enum": ["features", "comparison"],
                    },
                },
                "required": ["product", "aspect"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_audience_profile",
            "description": "擷取目標受眾畫像，包括痛點、關注領域和決策因素。",
            "parameters": {
                "type": "object",
                "properties": {
                    "audience_type": {
                        "type": "string",
                        "description": "目標受眾類型",
                        "enum": ["營運工程師", "架構師"],
                    },
                },
                "required": ["audience_type"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_industry_cases",
            "description": "擷取行業客戶成功案例，包括情境和成效資料。",
            "parameters": {
                "type": "object",
                "properties": {
                    "industry": {
                        "type": "string",
                        "description": "目標行業",
                        "enum": ["金融", "互連網"],
                    },
                },
                "required": ["industry"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_content_compliance",
            "description": "檢查內容合規性，包括產品名稱規範和宣傳用語。",
            "parameters": {
                "type": "object",
                "properties": {
                    "content_type": {
                        "type": "string",
                        "description": "內容類型",
                        "enum": ["blog", "case_study", "comparison"],
                    },
                    "key_claims": {
                        "type": "string",
                        "description": "關鍵宣傳點和資料引用",
                    },
                },
                "required": ["content_type", "key_claims"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "generate_seo_keywords",
            "description": "基於主題產生 SEO 關鍵詞，產生部落格文章時調用。",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {
                        "type": "string",
                        "description": "文章主題或核心關鍵詞",
                    },
                },
                "required": ["topic"],
            },
        },
    },
]

requirements.txt

openai
fastapi
uvicorn[standard]
loongsuite-util-genai
