
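Function Compute: Metrics monitoring integration for LLM inference model services

Last updated: Mar 15, 2026

By deploying the OpenTelemetry Collector as a sidecar in a function instance, you can automatically collect and report the Prometheus metrics exposed by the LLM inference engine (such as token throughput and request latency) without modifying business code. The metrics are delivered to the Alibaba Cloud ARMS Managed Service for Prometheus, which gives you production-grade real-time monitoring of the LLM service with visualization and alerting.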

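Prerequisites

Before you begin, prepare the following cloud services and resources, and obtain the key information and permissions required for later configuration. If you already meet all of these requirements, go directly to the procedure.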

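Activate Object Storage Service and create a bucket

The bucket stores the model files and the OpenTelemetry Collector binary and configuration file so that Function Compute instances can access them through a mount. If you already have a usable bucket, skip to the next item. Otherwise, do the following:

  1. Activate Object Storage Service (OSS). In the left-side navigation pane, go to the Buckets page and click Create Bucket.

  2. Configure the following key parameters and keep the default values for the rest:

    • Bucket Name: enter a globally unique name. To ensure uniqueness, use a combination of project name, region, and random string, for example llmfiles-hangzhou-a1b2c3.

    • Region: select the same region as the model service deployed in Function Compute.

  3. Click Create.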

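Deploy an LLM inference model in Function Compute

If you have not yet deployed an LLM inference model service in Function Compute, follow these steps to deploy a Qwen3-0.6B model served by the vLLM inference engine for learning and testing. If you already have one, skip to the next item.

  1. Download the model files and upload them to OSS. Download link: Qwen3-0.6B

    Important

    The model file model.safetensors is large. Wait until the status of every file is Uploaded.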

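  2. Go to the Function Compute console > Functions, select a region at the top of the page, and click Create Function.

  3. Select GPU Function, then click Create.

  4. On the Create Function page, configure the following core settings. Keep the defaults for other settings, or configure them as needed by referring to Create a GPU function:

    1. Function Name: can contain only letters, digits, underscores (_), and hyphens (-). It cannot start with a digit or a hyphen, and must be 1 to 64 characters long. A combination of model name and random string is recommended, for example qwen3-06b-a1b2c3.

    2. Instance Type: select Elastic Instance and keep the defaults for the rest.

    3. Sample Code: select Use Image from Custom Repository.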

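    4. Container Image:

      serverless-registry.cn-hangzhou.cr.aliyuncs.com/functionai/vllm-openai:v0.10.1
    5. Startup Command:

      vllm serve /mnt/qwen3 --port 9000 --served-model-name Qwen/Qwen3-0.6B --trust-remote-code
    6. Listening Port: 9000.

    7. Execution Timeout: 600 is recommended.

    8. Permissions: for Function Role, select AliyunFcDefaultRole.

    9. Storage: enable Mount OSS. For Bucket/Bucket Subdirectory, select the subdirectory that contains the model files. For Function Local Directory, enter /mnt/qwen3 (the same directory used in the Startup Command).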

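  5. Click Create. You are redirected to the function details page.

  6. Click the trigger in the function topology diagram. On the Edit Trigger page, change Authentication Method to No Authentication.

    Note

    Change Authentication Method to No Authentication only for testing. In production environments, configure authentication by referring to Authentication and authorization.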
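The function naming rule stated in the list above (letters, digits, underscores, and hyphens; no leading digit or hyphen; 1 to 64 characters) can be checked locally before you open the console. The following is a small sketch; the helper name is_valid_function_name is hypothetical, and the console remains the authority on what it accepts.

```python
import re

# First character: a letter or underscore (not a digit or hyphen).
# Remaining characters: letters, digits, underscores, or hyphens.
# Total length: 1 to 64 characters.
_FUNCTION_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,63}")


def is_valid_function_name(name: str) -> bool:
    """Return True if name satisfies the rule described in this topic."""
    return _FUNCTION_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("qwen3-06b-a1b2c3", "3qwen", "-qwen", "qwen.3"):
        print(candidate, is_valid_function_name(candidate))
```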

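Activate ARMS and create a Prometheus instance

The Prometheus instance receives and persists the LLM model service performance metrics pushed by the OpenTelemetry Collector and provides data query capabilities. If you do not have a usable instance, create one with the following configuration and record its Remote Write URL. If you already have one, skip to the next item.

  1. Activate Application Real-Time Monitoring Service (ARMS) and go to the ARMS console.

  2. On the Prometheus Monitoring > Instances page, click Create Prometheus Instance.

  3. On the Create Instance page, set Instance Type to General-purpose Instance and enter a custom instance name. Keep the defaults for other settings, or configure them as needed by referring to Manage Prometheus instances.

  4. Click Create Now. You are redirected to the instance Settings page, where you can record the Remote Write URL. You need it when you configure the OpenTelemetry Collector.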

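Procedure

Step 1: Download and configure the OpenTelemetry Collector

The core task of the OpenTelemetry Collector is to scrape the Prometheus metrics exposed by the LLM inference service and push them to a remote Prometheus instance over the Remote Write protocol.

  1. Download and decompress otelcol-contrib_0.136.0_linux_amd64.zip. After decompression, you get a binary file named otelcol-contrib and a configuration file named otel_config.yaml.

    This topic uses v0.136.0 as an example. For other versions, see opentelemetry-collector-releases.
  2. Edit otel_config.yaml and set the following parameters. For the full parameter reference, see opentelemetry/configuration.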

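    • job_name: the job name. Use the name of the function whose metrics you want to collect, which simplifies later metric queries and analysis.

    • targets: replace {port} with the listening port of the function. For the function created in Deploy an LLM inference model in Function Compute, the listening port is 9000.

      To find it: on the Function Management > Functions page, click the function name to open the function details page, then choose Configuration > Basic Configuration > Listening Port.
    • username/password: set these to the AccessKey ID and AccessKey secret of your Alibaba Cloud account. If you use a RAM user, make sure the RAM user has the following permissions: AliyunPrometheusMetricWriteAccess, AliyunCloudMonitorFullAccess. For details, see Write open source Prometheus data to an Alibaba Cloud Prometheus instance through the Remote Write URL.

    • endpoint: enter the public Remote Write URL of the Prometheus instance.

      To find it: go to the Prometheus instance list, click the instance name or instance ID, and copy the URL from the Settings page.
    • scrape_interval (optional): the interval at which metrics are scraped.

    A complete otel_config.yaml example: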

    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: "{uner-vllm}"
              scrape_interval: 15s
              metrics_path: /metrics
              static_configs:
                - targets: ["0.0.0.0:9000"]
    
    processors:
      batch:
    
    extensions:
      basicauth/prw:
        client_auth:
          username: "LT******S2K"
          password: "t6Z7o******B55"
    
    exporters:
      prometheusremotewrite:
        endpoint: "https://workspace-default-******.cn-hangzhou.******/api/v1/write"
        auth:
          authenticator: basicauth/prw
    
    service:
      extensions: [basicauth/prw]
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [prometheusremotewrite]
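To avoid pasting an AccessKey pair into the file by hand, the same configuration can be rendered from parameters. The sketch below reproduces the structure of the sample above. The function name render_otel_config and the environment variable names PROM_REMOTE_WRITE_URL, ALIBABA_CLOUD_ACCESS_KEY_ID, and ALIBABA_CLOUD_ACCESS_KEY_SECRET are assumptions made for illustration, not something this topic prescribes.

```python
import json
import os

# Same structure as the sample otel_config.yaml in this topic.
TEMPLATE = """\
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: {job_name}
          scrape_interval: {scrape_interval}
          metrics_path: /metrics
          static_configs:
            - targets: [{target}]

processors:
  batch:

extensions:
  basicauth/prw:
    client_auth:
      username: {username}
      password: {password}

exporters:
  prometheusremotewrite:
    endpoint: {endpoint}
    auth:
      authenticator: basicauth/prw

service:
  extensions: [basicauth/prw]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
"""


def render_otel_config(job_name, port, endpoint, username, password,
                       scrape_interval="15s"):
    """Return the Collector configuration as a YAML string."""
    quote = json.dumps  # a JSON string is a valid YAML double-quoted scalar
    return TEMPLATE.format(
        job_name=quote(job_name),
        scrape_interval=scrape_interval,
        target=quote(f"0.0.0.0:{port}"),  # same host form as the sample
        username=quote(username),
        password=quote(password),
        endpoint=quote(endpoint),
    )


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders are used if unset.
    print(render_otel_config(
        job_name="qwen3-06b-a1b2c3",
        port=9000,
        endpoint=os.environ.get("PROM_REMOTE_WRITE_URL", "<remote-write-url>"),
        username=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "<access-key-id>"),
        password=os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "<access-key-secret>"),
    ))
```

Redirect the output to otel_config.yaml before uploading it. Because the rendered file still contains the credentials in plain text, restrict access to the bucket path that stores it.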

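Step 2: Upload the binary and configuration file to OSS

  1. On the Buckets page, click the name of the target bucket.

  2. In the left-side navigation pane, choose Object Management > Objects, then click Upload Object.

  3. In the upload panel, drag the otelcol-contrib binary from Step 1 and the modified otel_config.yaml into the upload area.

  4. Keep the defaults for other parameters and click Upload Object.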

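Step 3: Modify the function instance configuration

  1. Go to the Function Compute console, choose Function Management > Functions, and click the function name.

  2. Configure startup parameters. Some engines expose LLM metrics by default, such as the vLLM engine used in this topic, and need no change in this step. Others require metrics to be enabled explicitly through a startup parameter.

    For example, for SGLang, click Edit under Configuration > Basic Configuration and add --enable-metrics to the Startup Command.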

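  3. Edit Configuration > Advanced Configuration > Storage: enable Mount OSS and configure an OSS mount point.

    Bucket/Bucket Subdirectory: select the bucket and directory that contain the otelcol-contrib binary and the modified otel_config.yaml.

    Note

    If you already mounted OSS in Deploy an LLM inference model in Function Compute, click Add Mount Point to add another bucket.

  4. Function Local Directory: enter the local directory in the function instance to mount the bucket to, for example /mnt1/.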

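  5. Edit Configuration > Instance Configuration:

    1. Enable Instance Warm-up.

    2. Set Warm-up Timeout to 10.

      This is the timeout for executing the Initializer hook, in seconds. Valid values: 1 to 300. Adjust it as needed.
    3. For Warm-up Program Type, select Execute Command and enter the following command content:

      Note

      Replace /mnt1 in the following command with the actual function local directory of the mount.

      nohup /mnt1/otelcol-contrib --config=/mnt1/otel_config.yaml >/dev/null 2>&1 &
    4. Click Deploy to complete the configuration.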

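Verification

Verify that the model service runs normally

Call the model service with curl. Replace https://{fc_public_endpoint} with the public endpoint of the function's HTTP trigger, for example https://*******.cn-****.fcapp.run

To find it: on the Function Management > Functions page, click the function name to open the function details page, then choose Triggers > Configuration > Public Endpoint.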
curl --request POST \
  --url https://{fc_public_endpoint}/v1/chat/completions \
  --header 'Authorization: Bearer REPLACE_BEARER_TOKEN' \
  --header 'content-type: application/json' \
  --data '{"messages":[{"content":"Hello! 你是誰?","role":"user"}],"model":"Qwen/Qwen3-0.6B","stream":false}'

Sample response:

{
  "id":"chatcmpl-b27876a60f9749859638********b",
  "object":"chat.completion",
  "created":1762760556,
  "model":"Qwen/Qwen3-0.6B",
  "choices":[
    {
      "index":0,
      "message":{
        "role":"assistant",
        "content":"我很高興認識你,是每一位同學!想和你聊天開心嗎?\t\t\t\t\t\t\t\n好了,現在我們可以聊聊學習和生活的問題~",
        "refusal":null,
        "annotations":null,
        "audio":null,
        "function_call":null,
        "tool_calls":[],
        "reasoning_content":null
      },
      "logprobs":null,
      "finish_reason":"stop",
      "stop_reason":null
    }
  ],
  "service_tier":null,
  "system_fingerprint":null,
  "usage":{
    "prompt_tokens":14,
    "total_tokens":42,
    "completion_tokens":28,
    "prompt_tokens_details":null
  },
  "prompt_logprobs":null,
  "kv_transfer_params":null
}
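The same call can be scripted, which is convenient for generating a little traffic so that token and latency metrics have data to report. The following is a minimal sketch using only the Python standard library. The helper names build_chat_request, summarize, and chat are hypothetical, and the Authorization header is omitted on the assumption that the trigger uses No Authentication as configured earlier in this topic.

```python
import json
import urllib.request


def build_chat_request(endpoint, prompt, model="Qwen/Qwen3-0.6B"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    body = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response_text):
    """Extract the reply text and the token usage from a response body."""
    payload = json.loads(response_text)
    usage = payload["usage"]
    return {
        "reply": payload["choices"][0]["message"]["content"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


def chat(endpoint, prompt, timeout=600):
    """Send one request and return the summary (performs a network call)."""
    request = build_chat_request(endpoint, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(response.read().decode("utf-8"))
```

Call chat with the public endpoint of your HTTP trigger and a prompt. The 600-second default timeout mirrors the execution timeout recommended earlier and leaves room for a cold start.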

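Verify that metrics are reported

  1. Go to the ARMS console > Prometheus Monitoring > Instances and click the name of the Prometheus instance.

  2. On the Prometheus instance details page, click Metric Management and select Metric Statistics. This page shows information such as the number of metrics and the data volume.

  3. Click Metric Explorer and run the PromQL query vllm:prompt_tokens_created{}.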
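If the query returns nothing, it can help to first confirm which metric names the engine itself exposes on its /metrics path, independently of the Collector and ARMS. The sketch below parses Prometheus text exposition output and lists the names with a given prefix. The helper name metric_names is hypothetical, and the SAMPLE text is an invented illustration whose metric names match those used in the vLLM dashboard JSON in this topic; it is not captured output.

```python
def metric_names(exposition_text, prefix="vllm:"):
    """Return the set of metric names in the text that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is '<name>{labels} <value>' or '<name> <value>'.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return names


# Invented sample in Prometheus text exposition format (illustrative only).
SAMPLE = """\
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="Qwen/Qwen3-0.6B"} 14.0
vllm:num_requests_running{model_name="Qwen/Qwen3-0.6B"} 0.0
process_cpu_seconds_total 1.5
"""

if __name__ == "__main__":
    print(sorted(metric_names(SAMPLE)))
```

Feed the function the body returned by the engine's /metrics path on the listening port. If an expected name is missing there, the problem lies with the engine's startup parameters rather than with the Collector or the Remote Write configuration.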

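Next steps

Configure a Grafana dashboard

  1. In the ARMS console, choose Managed Service for Grafana > Overview, click Create Workspace, and configure the workspace as follows: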

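    Configuration items and their descriptions:

    • Workspace Name: a custom name for the Grafana workspace.

    • Admin Password: the password of the Admin account.

    • Region: the region in which the workspace resides.

    • Edition: the billing edition of Managed Service for Grafana. For billing details of each edition, see Billing rules.

    • Grafana Version: the Grafana version to use.

    • Number of User Accounts: the number of accounts. For billing details of each edition, see Billing rules.

    • Purchase Duration: the subscription duration.

    • Auto-renewal on Expiration: if you select auto-renewal, the workspace automatically switches to paid mode when the trial expires. For billing details of each edition, see Billing rules.

    • Resource Group: resource groups let you classify and group the resources under your cloud account, so that you can manage permissions, deploy resources, and monitor resources by group instead of handling each resource separately.

    • Tags: a tag key and tag value for the workspace, which simplifies group management.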

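  2. In the workspace list, select the workspace you just created.

  3. In the Cloud Service Integration section, select Prometheus Monitoring Service and integrate the target Prometheus instance.

  4. In Basic Information, find the public endpoint, open it in a browser, choose Sign in with Alibaba Cloud, and log in with your Alibaba Cloud account.

  5. Configure the dashboard. You can use the Grafana configuration examples provided by vLLM, SGLang, and other engines. The following content is for reference only:

    vLLM engine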

    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": {
              "type": "grafana",
              "uid": "-- Grafana --"
            },
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "target": {
              "limit": 100,
              "matchAny": false,
              "tags": [],
              "type": "dashboard"
            },
            "type": "dashboard"
          }
        ]
      },
      "description": "Monitoring vLLM Inference Server",
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 0,
      "id": 1,
      "links": [],
      "liveNow": false,
      "panels": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "End to end request latency measured in seconds.",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "s"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "id": 9,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.99, sum by(le) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P99",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.95, sum by(le) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P95",
              "range": true,
              "refId": "B",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.9, sum by(le) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P90",
              "range": true,
              "refId": "C",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.5, sum by(le) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P50",
              "range": true,
              "refId": "D",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "rate(vllm:e2e_request_latency_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])\n/\nrate(vllm:e2e_request_latency_seconds_count{model_name=\"$model_name\"}[$__rate_interval])",
              "hide": false,
              "instant": false,
              "legendFormat": "Average",
              "range": true,
              "refId": "E"
            }
          ],
          "title": "E2E Request Latency",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Number of tokens processed per second",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 0
          },
          "id": 8,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "rate(vllm:prompt_tokens_total{model_name=\"$model_name\"}[$__rate_interval])",
              "fullMetaSearch": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "Prompt Tokens/Sec",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "rate(vllm:generation_tokens_total{model_name=\"$model_name\"}[$__rate_interval])",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "Generation Tokens/Sec",
              "range": true,
              "refId": "B",
              "useBackend": false
            }
          ],
          "title": "Token Throughput",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Inter token latency in seconds.",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "s"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "id": 10,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.99, sum by(le) (rate(vllm:inter_token_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P99",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.95, sum by(le) (rate(vllm:inter_token_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P95",
              "range": true,
              "refId": "B",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.9, sum by(le) (rate(vllm:inter_token_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P90",
              "range": true,
              "refId": "C",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.5, sum by(le) (rate(vllm:inter_token_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P50",
              "range": true,
              "refId": "D",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "rate(vllm:inter_token_latency_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])\n/\nrate(vllm:inter_token_latency_seconds_count{model_name=\"$model_name\"}[$__rate_interval])",
              "hide": false,
              "instant": false,
              "legendFormat": "Mean",
              "range": true,
              "refId": "E"
            }
          ],
          "title": "Inter Token Latency",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Number of requests in RUNNING, WAITING, and SWAPPED state",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "none"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 8
          },
          "id": 3,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "vllm:num_requests_running{model_name=\"$model_name\"}",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Num Running",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "vllm:num_requests_waiting{model_name=\"$model_name\"}",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Num Waiting",
              "range": true,
              "refId": "C",
              "useBackend": false
            }
          ],
          "title": "Scheduler State",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "P50, P90, P95, and P99 TTFT latency in seconds.",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "s"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "id": 5,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.99, sum by(le) (rate(vllm:time_to_first_token_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P99",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.95, sum by(le) (rate(vllm:time_to_first_token_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P95",
              "range": true,
              "refId": "B",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.9, sum by(le) (rate(vllm:time_to_first_token_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P90",
              "range": true,
              "refId": "C",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "histogram_quantile(0.5, sum by(le) (rate(vllm:time_to_first_token_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": false,
              "instant": false,
              "legendFormat": "P50",
              "range": true,
              "refId": "D",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "rate(vllm:time_to_first_token_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])\n/\nrate(vllm:time_to_first_token_seconds_count{model_name=\"$model_name\"}[$__rate_interval])",
              "hide": false,
              "instant": false,
              "legendFormat": "Average",
              "range": true,
              "refId": "E"
            }
          ],
          "title": "Time To First Token Latency",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Percentage of used cache blocks by vLLM.",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              },
              "unit": "percentunit"
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 16
          },
          "id": 4,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "vllm:gpu_cache_usage_perc{model_name=\"$model_name\"}",
              "instant": false,
              "legendFormat": "GPU Cache Usage",
              "range": true,
              "refId": "A"
            }
          ],
          "title": "Cache Utilization",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Heatmap of request prompt length",
          "fieldConfig": {
            "defaults": {
              "custom": {
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "scaleDistribution": {
                  "type": "linear"
                }
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 24
          },
          "id": 12,
          "options": {
            "calculate": false,
            "cellGap": 1,
            "cellValues": {
              "unit": "none"
            },
            "color": {
              "exponent": 0.5,
              "fill": "dark-orange",
              "min": 0,
              "mode": "scheme",
              "reverse": false,
              "scale": "exponential",
              "scheme": "Spectral",
              "steps": 64
            },
            "exemplars": {
              "color": "rgba(255,0,255,0.7)"
            },
            "filterValues": {
              "le": 1e-9
            },
            "legend": {
              "show": true
            },
            "rowsFrame": {
              "layout": "auto",
              "value": "Request count"
            },
            "tooltip": {
              "mode": "single",
              "show": true,
              "showColorScale": false,
              "yHistogram": true
            },
            "yAxis": {
              "axisLabel": "Prompt Length",
              "axisPlacement": "left",
              "reverse": false,
              "unit": "none"
            }
          },
          "pluginVersion": "10.0.9",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "sum by(le) (increase(vllm:request_prompt_tokens_bucket{model_name=\"$model_name\"}[$__rate_interval]))",
              "format": "heatmap",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "{{le}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Request Prompt Length",
          "type": "heatmap"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Heatmap of request generation length",
          "fieldConfig": {
            "defaults": {
              "custom": {
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "scaleDistribution": {
                  "type": "linear"
                }
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 24
          },
          "id": 13,
          "options": {
            "calculate": false,
            "cellGap": 1,
            "cellValues": {
              "unit": "none"
            },
            "color": {
              "exponent": 0.5,
              "fill": "dark-orange",
              "min": 0,
              "mode": "scheme",
              "reverse": false,
              "scale": "exponential",
              "scheme": "Spectral",
              "steps": 64
            },
            "exemplars": {
              "color": "rgba(255,0,255,0.7)"
            },
            "filterValues": {
              "le": 1e-9
            },
            "legend": {
              "show": true
            },
            "rowsFrame": {
              "layout": "auto",
              "value": "Request count"
            },
            "tooltip": {
              "mode": "single",
              "show": true,
              "showColorScale": false,
              "yHistogram": true
            },
            "yAxis": {
              "axisLabel": "Generation Length",
              "axisPlacement": "left",
              "reverse": false,
              "unit": "none"
            }
          },
          "pluginVersion": "10.0.9",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "sum by(le) (increase(vllm:request_generation_tokens_bucket{model_name=\"$model_name\"}[$__rate_interval]))",
              "format": "heatmap",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "{{le}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Request Generation Length",
          "type": "heatmap"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "description": "Number of finished requests by their finish reason: either an EOS token was generated or the max sequence length was reached.",
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 32
          },
          "id": 11,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "sum by(finished_reason) (increase(vllm:request_success_total{model_name=\"$model_name\"}[$__rate_interval]))",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "interval": "",
              "legendFormat": "__auto",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Finish Reason",
          "type": "timeseries"
        },
        {
          "datasource": {
            "default": false,
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "seconds",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 32
          },
          "id": 14,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "rate(vllm:request_queue_time_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "__auto",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Queue Time",
          "type": "timeseries"
        },
        {
          "datasource": {
            "default": false,
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 40
          },
          "id": 15,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "rate(vllm:request_prefill_time_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Prefill",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "rate(vllm:request_decode_time_seconds_sum{model_name=\"$model_name\"}[$__rate_interval])",
              "hide": false,
              "instant": false,
              "legendFormat": "Decode",
              "range": true,
              "refId": "B"
            }
          ],
          "title": "Requests Prefill and Decode Time",
          "type": "timeseries"
        },
        {
          "datasource": {
            "default": false,
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 40
          },
          "id": 16,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "mode": "single",
              "sort": "none"
            }
          },
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "rate(vllm:request_max_num_generation_tokens_sum{model_name=\"$model_name\"}[$__rate_interval])",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Tokens",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Max Generation Token in Sequence Group",
          "type": "timeseries"
        }
      ],
      "refresh": "5s",
      "schemaVersion": 38,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": [
          {
            "current": {
              "selected": false,
              "text": "prom-lrg2uqia5q",
              "value": "prom-lrg2uqia5q"
            },
            "hide": 0,
            "includeAll": false,
            "label": "datasource",
            "multi": false,
            "name": "DS_PROMETHEUS",
            "options": [],
            "query": "prometheus",
            "queryValue": "",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          },
          {
            "current": {
              "selected": false,
              "text": "/mnt1/model/Qwen3-4B-FP8",
              "value": "/mnt1/model/Qwen3-4B-FP8"
            },
            "datasource": {
              "type": "prometheus",
              "uid": "${DS_PROMETHEUS}"
            },
            "definition": "label_values(model_name)",
            "hide": 0,
            "includeAll": false,
            "label": "model_name",
            "multi": false,
            "name": "model_name",
            "options": [],
            "query": {
              "query": "label_values(model_name)",
              "refId": "StandardVariableQuery"
            },
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 0,
            "type": "query"
          }
        ]
      },
      "time": {
        "from": "now-5m",
        "to": "now"
      },
      "timepicker": {},
      "timezone": "",
      "title": "vLLM",
      "uid": "b281712d-8bff-41ef-9f3f-71ad43c05e9b",
      "version": 2,
      "weekStart": ""
    }
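
Before importing a dashboard, it can help to list which engine metrics each panel queries, so the names can be compared with what the inference engine actually exposes on its metrics endpoint. The following is a minimal sketch, not part of the official tooling: the helper name `panel_metrics` and the inline `sample` dictionary are illustrative assumptions, and the sample only mirrors two panels from the vLLM dashboard above.

```python
import re

# Matches engine metric names such as vllm:gpu_cache_usage_perc or
# sglang:e2e_request_latency_seconds_bucket inside a PromQL expression.
METRIC_RE = re.compile(r"\b(?:vllm|sglang):[A-Za-z_][A-Za-z0-9_]*")


def panel_metrics(dashboard: dict) -> dict:
    """Map each panel title to the sorted metric names its targets query."""
    result = {}
    for panel in dashboard.get("panels", []):
        names = set()
        for target in panel.get("targets", []):
            names.update(METRIC_RE.findall(target.get("expr", "")))
        result[panel.get("title", "")] = sorted(names)
    return result


# Hypothetical excerpt shaped like the dashboard JSON above (two panels only).
sample = {
    "panels": [
        {
            "title": "Time To First Token Latency",
            "targets": [
                {
                    "expr": 'rate(vllm:time_to_first_token_seconds_sum{model_name="$model_name"}[$__rate_interval])'
                    "\n/\n"
                    'rate(vllm:time_to_first_token_seconds_count{model_name="$model_name"}[$__rate_interval])'
                }
            ],
        },
        {
            "title": "Cache Utilization",
            "targets": [
                {"expr": 'vllm:gpu_cache_usage_perc{model_name="$model_name"}'}
            ],
        },
    ]
}

for title, names in panel_metrics(sample).items():
    print(f"{title}: {', '.join(names)}")
```

To check a full dashboard, the same function can be applied to the result of `json.load` on a locally saved copy of the JSON. Any metric listed here that the engine does not expose would leave the corresponding panel empty after import.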

    SGLang engine

    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": {
              "type": "grafana",
              "uid": "-- Grafana --"
            },
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 0,
      "id": 8,
      "links": [],
      "liveNow": false,
      "panels": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "id": 14,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.99, sum by (le) (rate(sglang:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P99",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.9, sum by (le) (rate(sglang:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P90",
              "range": true,
              "refId": "B",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.5, sum by (le) (rate(sglang:e2e_request_latency_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P50",
              "range": true,
              "refId": "C",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "avg(rate(sglang:e2e_request_latency_seconds_sum{model_name=\"$model_name\"}[$__rate_interval]) /  rate(sglang:e2e_request_latency_seconds_count{model_name=\"$model_name\"}[$__rate_interval]))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Avg",
              "range": true,
              "refId": "D",
              "useBackend": false
            }
          ],
          "title": "End-to-End Request Latency",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "custom": {
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "scaleDistribution": {
                  "type": "linear"
                }
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 0
          },
          "id": 17,
          "maxDataPoints": 30,
          "options": {
            "calculate": false,
            "calculation": {
              "yBuckets": {
                "scale": {
                  "type": "linear"
                }
              }
            },
            "cellGap": 1,
            "cellValues": {},
            "color": {
              "exponent": 0.5,
              "fill": "dark-orange",
              "mode": "scheme",
              "reverse": false,
              "scale": "exponential",
              "scheme": "Spectral",
              "steps": 64
            },
            "exemplars": {
              "color": "rgba(255,0,255,0.7)"
            },
            "filterValues": {
              "le": 1e-9
            },
            "legend": {
              "show": true
            },
            "rowsFrame": {
              "layout": "auto"
            },
            "tooltip": {
              "mode": "single",
              "show": true,
              "showColorScale": true,
              "yHistogram": false
            },
            "yAxis": {
              "axisPlacement": "left",
              "reverse": false,
              "unit": "secs"
            }
          },
          "pluginVersion": "10.0.9",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "expr": "sum(increase(sglang:e2e_request_latency_seconds_bucket{model_name=~\"$model_name\"}[$__rate_interval])) by (le)",
              "format": "heatmap",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "{{le}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "End-to-End Request Latency(s) Heatmap",
          "type": "heatmap"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "id": 20,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.99, sum by (le) (rate(sglang:time_to_first_token_seconds_bucket{model_name=\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P99",
              "range": true,
              "refId": "A",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.9, sum by (le) (rate(sglang:time_to_first_token_seconds_bucket{model_name=~\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P90",
              "range": true,
              "refId": "B",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "histogram_quantile(0.5, sum by (le) (rate(sglang:time_to_first_token_seconds_bucket{model_name=~\"$model_name\"}[$__rate_interval])))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "P50",
              "range": true,
              "refId": "C",
              "useBackend": false
            },
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "avg(rate(sglang:time_to_first_token_seconds_sum{model_name=~\"$model_name\"}[$__rate_interval]) / rate(sglang:time_to_first_token_seconds_count{model_name=~\"$model_name\"}[$__rate_interval]))",
              "fullMetaSearch": false,
              "hide": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "Avg",
              "range": true,
              "refId": "D",
              "useBackend": false
            }
          ],
          "title": "Time-To-First-Token Latency",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "custom": {
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "scaleDistribution": {
                  "type": "linear"
                }
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 8
          },
          "id": 19,
          "maxDataPoints": 30,
          "options": {
            "calculate": false,
            "calculation": {
              "xBuckets": {
                "value": ""
              },
              "yBuckets": {
                "mode": "size",
                "scale": {
                  "type": "linear"
                },
                "value": ""
              }
            },
            "cellGap": 1,
            "color": {
              "exponent": 0.5,
              "fill": "dark-orange",
              "mode": "scheme",
              "reverse": false,
              "scale": "exponential",
              "scheme": "Spectral",
              "steps": 64
            },
            "exemplars": {
              "color": "rgba(255,0,255,0.7)"
            },
            "filterValues": {
              "le": 1e-9
            },
            "legend": {
              "show": true
            },
            "rowsFrame": {
              "layout": "auto"
            },
            "tooltip": {
              "mode": "single",
              "show": true,
              "showColorScale": true,
              "yHistogram": false
            },
            "yAxis": {
              "axisPlacement": "left",
              "reverse": false
            }
          },
          "pluginVersion": "10.0.9",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "builder",
              "exemplar": false,
              "expr": "sum by(le) (increase(sglang:time_to_first_token_seconds_bucket{model_name=~\"$model_name\"}[$__rate_interval]))",
              "format": "heatmap",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "interval": "",
              "legendFormat": "{{le}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Time-To-First-Token Seconds Heatmap",
          "type": "heatmap"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "id": 7,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "sglang:num_running_reqs{model_name=~\"$model_name\"}",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "interval": "",
              "legendFormat": "{{instance}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Num Running Requests",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 16
          },
          "id": 18,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "editorMode": "code",
              "expr": "sglang:gen_throughput{model_name=~\"$model_name\"}",
              "instant": false,
              "legendFormat": "{{instance}}",
              "range": true,
              "refId": "A"
            }
          ],
          "title": "Token Generation Throughput (Tokens / S)",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 24
          },
          "id": 11,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "sglang:cache_hit_rate{model_name=~\"$model_name\"}",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "{{instance}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Cache Hit Rate",
          "type": "timeseries"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "palette-classic"
              },
              "custom": {
                "axisCenteredZero": false,
                "axisColorMode": "text",
                "axisLabel": "",
                "axisPlacement": "auto",
                "barAlignment": 0,
                "drawStyle": "line",
                "fillOpacity": 0,
                "gradientMode": "none",
                "hideFrom": {
                  "legend": false,
                  "tooltip": false,
                  "viz": false
                },
                "lineInterpolation": "linear",
                "lineWidth": 1,
                "pointSize": 5,
                "scaleDistribution": {
                  "type": "linear"
                },
                "showPoints": "auto",
                "spanNulls": false,
                "stacking": {
                  "group": "A",
                  "mode": "none"
                },
                "thresholdsStyle": {
                  "mode": "off"
                }
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green"
                  },
                  {
                    "color": "red",
                    "value": 80
                  }
                ]
              }
            },
            "overrides": []
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 24
          },
          "id": 8,
          "options": {
            "legend": {
              "calcs": [],
              "displayMode": "list",
              "placement": "bottom",
              "showLegend": true
            },
            "tooltip": {
              "hideZeros": false,
              "mode": "single",
              "sort": "none"
            }
          },
          "pluginVersion": "11.6.0",
          "targets": [
            {
              "datasource": {
                "type": "prometheus",
                "uid": "${DS_PROMETHEUS}"
              },
              "disableTextWrap": false,
              "editorMode": "code",
              "expr": "sglang:num_queue_reqs{model_name=~\"$model_name\"}",
              "fullMetaSearch": false,
              "includeNullMetadata": true,
              "instant": false,
              "legendFormat": "{{instance}}",
              "range": true,
              "refId": "A",
              "useBackend": false
            }
          ],
          "title": "Number Queued Requests",
          "type": "timeseries"
        }
      ],
      "refresh": false,
      "schemaVersion": 38,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": [
          {
            "current": {
              "selected": true,
              "text": "prom-lrg2uqia5q",
              "value": "prom-lrg2uqia5q"
            },
            "hide": 0,
            "includeAll": false,
            "label": "datasource",
            "multi": false,
            "name": "DS_PROMETHEUS",
            "options": [],
            "query": "prometheus",
            "queryValue": "",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          },
          {
            "current": {
              "selected": true,
              "text": [
                "/mnt1/model/Qwen3-32B"
              ],
              "value": [
                "/mnt1/model/Qwen3-32B"
              ]
            },
            "datasource": {
              "type": "prometheus",
              "uid": "${DS_PROMETHEUS}"
            },
            "definition": "label_values(model_name)",
            "hide": 0,
            "includeAll": true,
            "label": "model_name",
            "multi": true,
            "name": "model_name",
            "options": [],
            "query": {
              "query": "label_values(model_name)",
              "refId": "StandardVariableQuery"
            },
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "sort": 0,
            "type": "query"
          }
        ]
      },
      "time": {
        "from": "2025-10-20T11:22:46.266Z",
        "to": "2025-10-20T11:25:59.269Z"
      },
      "timepicker": {},
      "timezone": "browser",
      "title": "SGLang Dashboard1",
      "uid": "sglang-dashboard1",
      "version": 3,
      "weekStart": ""
    }

Configure alerts

You can configure alerts in Alibaba Cloud Managed Service for Prometheus. For the procedure, see Create a Prometheus alert rule.
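
As a starting point, the PromQL expressions below sketch two alert conditions built from metrics that the SGLang dashboard above already queries. The 5-minute window and both thresholds (2 seconds, 10 requests) are assumed example values, not recommendations from this guide. Tune them to your own latency targets and traffic before using them in an alert rule.

```promql
# P99 time-to-first-token exceeds 2 seconds over the last 5 minutes.
# The window and the threshold are assumed example values.
histogram_quantile(0.99, sum by (le) (rate(sglang:time_to_first_token_seconds_bucket[5m]))) > 2

# More than 10 requests are waiting in the queue on at least one instance.
# The threshold is an assumed example value.
max(sglang:num_queue_reqs) > 10
```

Each expression returns a result only while its condition holds, which is the behavior an alert rule expects. If you serve several models, adding a `model_name` label matcher keeps the alerts separate for each model.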

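Billing

The main cloud products used in this solution, and where to find their billing details, are as follows:

  • Application Real-Time Monitoring Service (ARMS): The Prometheus and Grafana services used in this solution are both provided by ARMS. For billing details, see Product billing (new version).

  • Object Storage Service: Storing the OpenTelemetry Collector binary and configuration file incurs a small storage fee. For details, see Billing overview.

  • Function Compute (FC): Running the Collector process consumes a small amount of CPU and memory, so you may need to adjust the function instance specification based on actual load. For fees, see Billing overview.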