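Configure rate limiting policies for a Model API - API Gateway

Call the UpdateHttpApi operation to configure and update aiTokenRateLimitConfig (the AI Token rate limiting policy) in the deployConfigs of a Model API. This enables multi-dimensional rate limiting based on token consumption.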

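Scenarios

The AI Token rate limiting policy uses token consumption as its core traffic-control metric. Unlike traditional request-count rate limiting, it closely tracks the compute that large models actually consume. You can configure rules across multiple dimensions, including consumer, request header, query parameter, cookie, client IP, and model name. You can also apply global (API-level) limits.

High-concurrency scenarios: During large e-commerce promotions, cap the total tokens each user can consume per unit of time to prevent malicious high-frequency calls.
AI service calls: Rate-limit calls to large model APIs so that traffic bursts do not degrade service quality.
Multi-tenant systems: Allocate an independent rate limiting quota to each tenant to ensure fairness and resource isolation.
Fine-grained per-model control: Set different thresholds for different models to protect high-cost model resources.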

API information

Request method: PUT
Action: UpdateHttpApi

Request parameter structure

The AI Token rate limiting configuration is located in deployConfigs[].policyConfigs[] of UpdateHttpApiRequest and is passed as a PolicyConfig structure.

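PolicyConfig structure

Parameter	Type	Required	Description
type	String	Yes	Policy type. Fixed value: `"AiTokenRateLimit"`
enable	Boolean	Yes	Whether to enable the rate limiting policy. `true` enables it; `false` disables it.
aiTokenRateLimitConfig	Object	Yes (when enable is true)	Details of the AI Token rate limiting configuration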

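AiTokenRateLimitConfig structure

Parameter	Type	Required	Description
rules	Array<AiTokenRateLimitRule>	Conditional	List of regular rate limiting rules (including dimension-based and model-based rules). Configure at least one of `rules` and `globalRules`.
enableGlobalRules	Boolean	No	Whether to enable global (API-level) rate limiting rules. Default: `false`.
globalRules	Array<AiTokenRateLimitRule>	Conditional	List of global rate limiting rules. Takes effect only when `enableGlobalRules` is `true`. The `limitType` of each rule must be `"Global"`.
redisConfig	Object	No	Redis configuration. AI Gateway instances use the built-in KVStore by default, so `redisConfig` is not required. Configure `redisConfig` only when you need to store rate limiting keys in an external Redis instance.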
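The tables above define how a rate limiting policy nests inside the request body. The following minimal sketch assumes you assemble the body in Python before signing and sending it with your own client. The helpers `build_rate_limit_policy` and `build_update_body` are hypothetical and not part of any SDK. They show the `deployConfigs[].policyConfigs[]` nesting and enforce the rule that at least one of `rules` and `globalRules` must be present.

```python
import json


def build_rate_limit_policy(rules=None, global_rules=None, redis_config=None, enable=True):
    """Assemble a PolicyConfig dict for the AiTokenRateLimit policy (illustrative helper)."""
    rules = list(rules or [])
    global_rules = list(global_rules or [])
    if enable and not rules and not global_rules:
        # Validation rule: at least one of rules / globalRules is required
        raise ValueError("configure at least one of rules or globalRules")
    config = {}
    if rules:
        config["rules"] = rules
    if global_rules:
        # globalRules only take effect when enableGlobalRules is true
        config["enableGlobalRules"] = True
        config["globalRules"] = global_rules
    if redis_config is not None:
        # Only for an external Redis; AI Gateway instances use the built-in KVStore
        config["redisConfig"] = redis_config
    return {"type": "AiTokenRateLimit", "enable": enable, "aiTokenRateLimitConfig": config}


def build_update_body(gateway_id, policy_configs):
    """Wrap policies in the deployConfigs[].policyConfigs[] layout of UpdateHttpApi."""
    return {"deployConfigs": [{"gatewayId": gateway_id, "policyConfigs": policy_configs}]}


if __name__ == "__main__":
    consumer_rule = {
        "limitType": "Consumer", "matchKey": "", "matchType": "All",
        "matchValue": "*", "limitMode": "TokenPerMinute", "limitValue": 1000,
    }
    body = build_update_body("gw-xxxxxxxxxxxxx", [build_rate_limit_policy(rules=[consumer_rule])])
    print(json.dumps(body, indent=2))
```

Running the script prints a body with the same shape as the consumer rule in the configuration examples.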

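AiTokenRateLimitRule structure

Parameter	Type	Required	Description
limitType	String	Yes	Rate limiting dimension type
matchKey	String	Conditional	Name of the key to match. May be empty for `Consumer`. Automatically set to `from-remote-addr` for `IP` and to `x-higress-llm-model` for `Model`. Automatically cleared for `Global`.
matchType	String	Conditional	Match mode. Automatically set to `IP` for `IP` and to `Exact` for `Model`. Automatically cleared for `Global`.
matchValue	String	Conditional	Value to match. Automatically set to `*` in `All` mode. For `IP`, must be a valid IP address or IP range. For `Model`, the model name. Automatically cleared for `Global`.
limitMode	String	Yes	Rate limiting mode
limitValue	Integer	Yes	Rate limit threshold. Must be greater than 0.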
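The system-filled values described for matchKey, matchType, and matchValue can be read as a normalization step applied to each rule. The sketch below is illustrative only. `normalize_rule` is a hypothetical helper, and representing a cleared field as an empty string is an assumption.

```python
# limitType -> (matchKey, matchType) that the system fills in automatically
AUTO_FIELDS = {
    "IP": ("from-remote-addr", "IP"),
    "Model": ("x-higress-llm-model", "Exact"),
}


def normalize_rule(rule):
    """Return a copy of an AiTokenRateLimitRule with system-filled fields applied."""
    out = dict(rule)
    limit_type = out["limitType"]
    if limit_type in AUTO_FIELDS:
        out["matchKey"], out["matchType"] = AUTO_FIELDS[limit_type]
    elif limit_type == "Global":
        # Global rules carry no key; "" stands in for a cleared field (assumption)
        for field in ("matchKey", "matchType", "matchValue"):
            out[field] = ""
    if out.get("matchType") == "All":
        # All mode always matches, so matchValue becomes "*"
        out["matchValue"] = "*"
    if limit_type == "Consumer":
        # Consumer rules may leave matchKey empty
        out.setdefault("matchKey", "")
    return out
```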

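limitType values

Value	Description
Header	Limit by request header
Parameter	Limit by query parameter
Consumer	Limit by consumer (consumer authentication must be enabled first)
Cookie	Limit by request cookie
IP	Limit by client IP
Model	Limit by model name (sets an independent quota for a specific model)
Request	Limit by request count (for a specific key)
Concurrency	Limit by concurrency (for a specific key)
Global	Global limit (API level, not split by key; used only in `globalRules`)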

matchType values

Value	Description	Applicable limitType
Exact	Exact match	Header / Parameter / Consumer / Cookie / Request / Concurrency
Prefix	Prefix match	Header / Parameter / Consumer / Cookie / Request / Concurrency
Regex	Regular expression match	Header / Parameter / Consumer / Cookie / Request / Concurrency
All	Match anything (matchValue is automatically set to `*`)	Header / Parameter / Consumer / Cookie / Request / Concurrency
IP	IP match (set automatically; no need to specify)	IP

Note

Match priority: exact match > prefix match > regex match > match anything. Matching any rule triggers rate limiting under that rule.
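To illustrate the documented priority order, the sketch below picks a single rule from several candidates that could match the same value, such as several Header rules on one key. `select_rule` and `matches` are hypothetical helpers. Treating Regex as a full-string match is an assumption, because this section does not say whether the gateway anchors the pattern.

```python
import re

# Lower number = higher priority: Exact > Prefix > Regex > All
PRIORITY = {"Exact": 0, "Prefix": 1, "Regex": 2, "All": 3}


def matches(rule, value):
    """Check one rule's matchType/matchValue against a request value."""
    mode, pattern = rule.get("matchType"), rule.get("matchValue", "")
    if mode == "Exact":
        return value == pattern
    if mode == "Prefix":
        return value.startswith(pattern)
    if mode == "Regex":
        # Full-string match is an assumption about the gateway's semantics
        return re.fullmatch(pattern, value) is not None
    if mode == "All":
        return True
    return False


def select_rule(rules, value):
    """Return the highest-priority rule whose pattern matches value, or None."""
    candidates = [r for r in rules if matches(r, value)]
    return min(candidates, key=lambda r: PRIORITY[r["matchType"]], default=None)
```

For example, with an Exact rule for `beta`, a Prefix rule for `be`, and an All rule, the value `beta` selects the Exact rule and `gold` falls through to the All rule.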

limitMode values

Value	Description	Applicable limitType
TokenPerSecond	Tokens per second	All types
TokenPerMinute	Tokens per minute	All types
TokenPerHour	Tokens per hour	All types
TokenPerDay	Tokens per day	All types
RequestPerSecond	Requests per second	Model / Global / Request
RequestPerMinute	Requests per minute	Model / Global / Request
RequestPerHour	Requests per hour	Model / Global / Request
RequestPerDay	Requests per day	Model / Global / Request
ConcurrencyLimit	Maximum concurrent requests	Model / Global / Concurrency
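The compatibility column above can be checked on the client side before you submit a rule. The following sketch encodes the table as written. `is_mode_allowed` and `window_seconds` are hypothetical helpers. The window lengths are plain unit conversions and do not describe how the gateway aligns its windows.

```python
TOKEN_MODES = {"TokenPerSecond", "TokenPerMinute", "TokenPerHour", "TokenPerDay"}
REQUEST_MODES = {"RequestPerSecond", "RequestPerMinute", "RequestPerHour", "RequestPerDay"}

# Seconds per time unit used by the Token*/Request* modes
WINDOW_SECONDS = {"Second": 1, "Minute": 60, "Hour": 3600, "Day": 86400}


def is_mode_allowed(limit_type, limit_mode):
    """Return True if limit_mode is listed as applicable to limit_type."""
    if limit_mode in TOKEN_MODES:
        return True  # token modes apply to all limit types
    if limit_mode in REQUEST_MODES:
        return limit_type in {"Model", "Global", "Request"}
    if limit_mode == "ConcurrencyLimit":
        return limit_type in {"Model", "Global", "Concurrency"}
    return False


def window_seconds(limit_mode):
    """Window length of a per-time-unit mode, or None for ConcurrencyLimit."""
    for unit, seconds in WINDOW_SECONDS.items():
        if limit_mode.endswith("Per" + unit):
            return seconds
    return None
```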

redisConfig structure

Parameter	Type	Required	Description
host	String	Yes	Redis server address
port	Integer	Yes	Redis server port
username	String	No	Redis username
password	String	No	Redis password
timeout	Integer	No	Connection timeout
databaseNumber	Integer	No	Redis database number

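Important

AI Gateway instances use the built-in KVStore by default and do not need redisConfig.
Configure redisConfig only when you need to store rate limiting keys in an external Redis instance.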

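Configuration examples

Example 1: Rate limiting by consumer and IP

Consumer rate limiting for the Model API: each consumer can use at most 1,000 tokens per minute.
IP rate limiting for the Model API: each client IP can use at most 500 tokens per minute.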

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Consumer",
                "matchKey": "",
                "matchType": "All",
                "matchValue": "*",
                "limitMode": "TokenPerMinute",
                "limitValue": 1000
              },
              {
                "limitType": "IP",
                "matchValue": "0.0.0.0/0",
                "limitMode": "TokenPerMinute",
                "limitValue": 500
              }
            ]
          }
        }
      ]
    }
  ]
}

Note

The system sets the matchKey and matchType of IP rules automatically, so you do not need to specify them.
Setting matchValue to 0.0.0.0/0 matches all client IP addresses.
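An IP rule's matchValue must be a valid IP address or IP range. The sketch below uses Python's standard `ipaddress` module to validate a matchValue and test a client address against it. `parse_ip_match_value` and `ip_matches` are hypothetical helpers. Under `ipaddress` semantics, `0.0.0.0/0` covers every IPv4 address but no IPv6 address. This section does not state how the gateway treats IPv6 clients for that value.

```python
import ipaddress


def parse_ip_match_value(match_value):
    """Parse matchValue as a network; a bare address becomes a /32 (or /128) network.

    Raises ValueError if the value is not a valid IP address or range.
    """
    return ipaddress.ip_network(match_value, strict=False)


def ip_matches(client_ip, match_value):
    """Return True if client_ip falls inside the rule's matchValue range."""
    network = parse_ip_match_value(match_value)
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address never falls inside an IPv6 network, and vice versa
    return address.version == network.version and address in network
```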

Example 2: Rate limiting by exact match on a request header

Limit requests whose x-user-level header equals beta to 100 tokens per minute.

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Header",
                "matchKey": "x-user-level",
                "matchType": "Exact",
                "matchValue": "beta",
                "limitMode": "TokenPerMinute",
                "limitValue": 100
              }
            ]
          }
        }
      ]
    }
  ]
}

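Example 3: Rate limiting by model name

Set different limits for different models:

qwen-max: at most 500 tokens per minute.
qwen-plus: at most 2,000 tokens per minute.
qwen-max: at most 10 requests per minute.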

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Model",
                "matchValue": "qwen-max",
                "limitMode": "TokenPerMinute",
                "limitValue": 500
              },
              {
                "limitType": "Model",
                "matchValue": "qwen-plus",
                "limitMode": "TokenPerMinute",
                "limitValue": 2000
              },
              {
                "limitType": "Model",
                "matchValue": "qwen-max",
                "limitMode": "RequestPerMinute",
                "limitValue": 10
              }
            ]
          }
        }
      ]
    }
  ]
}

Note

The system sets the matchKey and matchType of Model rules automatically (to x-higress-llm-model and Exact, respectively), so you do not need to specify them.
Set matchValue to the target model name.
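When many models need their own limits, the Model rules can be generated from a compact mapping instead of being written out by hand. In the sketch below, `model_rules` and its `{model: {limitMode: limitValue}}` input shape are hypothetical conventions. matchKey and matchType are omitted because the system fills them in for Model rules.

```python
def model_rules(limits):
    """Expand {model: {limitMode: limitValue}} into Model-type AiTokenRateLimitRule dicts."""
    rules = []
    for model, modes in limits.items():
        for mode, value in modes.items():
            # matchKey/matchType are filled in by the system for Model rules
            rules.append({
                "limitType": "Model",
                "matchValue": model,
                "limitMode": mode,
                "limitValue": value,
            })
    return rules


if __name__ == "__main__":
    # Same three rules as Example 3
    for rule in model_rules({
        "qwen-max": {"TokenPerMinute": 500, "RequestPerMinute": 10},
        "qwen-plus": {"TokenPerMinute": 2000},
    }):
        print(rule)
```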

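Example 4: Enable global rate limiting

Enable API-level global rate limiting: the entire API can consume at most 10,000 tokens per minute, serve at most 100 requests per minute, and handle at most 20 concurrent requests.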

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Consumer",
                "matchKey": "",
                "matchType": "All",
                "matchValue": "*",
                "limitMode": "TokenPerMinute",
                "limitValue": 1000
              }
            ],
            "enableGlobalRules": true,
            "globalRules": [
              {
                "limitType": "Global",
                "limitMode": "TokenPerMinute",
                "limitValue": 10000
              },
              {
                "limitType": "Global",
                "limitMode": "RequestPerMinute",
                "limitValue": 100
              },
              {
                "limitType": "Global",
                "limitMode": "ConcurrencyLimit",
                "limitValue": 20
              }
            ]
          }
        }
      ]
    }
  ]
}

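Note

Global rules must have limitType set to "Global" and do not need matchKey, matchType, or matchValue (the system clears these fields automatically).
Global rate limiting supports three kinds of limit mode: token, request, and concurrency.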
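This section does not describe the gateway's internal counting algorithm. As a hypothetical, single-process illustration only, the sketch below models the three global limits from Example 4. Fixed-window counters handle the token and request budgets, and a simple counter handles concurrency. A request is admitted only if every limit still has budget. Charging tokens after the response is an assumption, based on the fact that token usage is known only once the model replies. All class and function names are invented, and the code is not thread-safe.

```python
import time


class FixedWindowLimiter:
    """Hypothetical fixed-window counter for one limit, e.g. 10000 tokens per 60 s."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_start = None
        self.used = 0

    def _roll(self):
        now = self.clock()
        start = now - (now % self.window)
        if start != self.window_start:
            # A new window has begun: reset the counter
            self.window_start, self.used = start, 0

    def allow(self):
        """True if the current window still has budget left."""
        self._roll()
        return self.used < self.limit

    def consume(self, amount):
        """Record usage (tokens, or 1 per request) in the current window."""
        self._roll()
        self.used += amount


class ConcurrencyLimiter:
    """Counts in-flight requests against a maximum."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0

    def acquire(self):
        if self.active >= self.limit:
            return False
        self.active += 1
        return True

    def release(self):
        self.active = max(0, self.active - 1)


def admit(token_limiter, request_limiter, concurrency):
    """Admit a request only if every limit has budget; any exhausted limit throttles."""
    if not (token_limiter.allow() and request_limiter.allow()):
        return False
    if not concurrency.acquire():
        return False
    request_limiter.consume(1)
    return True
```

After an admitted call finishes, the caller would charge the response's token count with `token_limiter.consume(...)` and free the slot with `concurrency.release()`.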

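Example 5: Configure an external Redis instance

Store rate limiting keys in an external Redis instance instead of the built-in KVStore. AI Gateway instances do not need this configuration.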

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "gatewayType": "API",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Consumer",
                "matchKey": "",
                "matchType": "All",
                "matchValue": "*",
                "limitMode": "TokenPerMinute",
                "limitValue": 1000
              }
            ],
            "redisConfig": {
              "host": "r-bp1xxxxxxxxxxxxx.redis.rds.aliyuncs.com",
              "port": 6379,
              "username": "",
              "password": "your-redis-password",
              "databaseNumber": 0
            }
          }
        }
      ]
    }
  ]
}

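Example 6: Disable token rate limiting

Set enable to false to disable the rate limiting policy. After the policy is disabled, the configured rules no longer take effect, but the configuration is retained.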

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": false,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Consumer",
                "matchKey": "",
                "matchType": "All",
                "matchValue": "*",
                "limitMode": "TokenPerMinute",
                "limitValue": 1000
              }
            ]
          }
        }
      ]
    }
  ]
}

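Example 7: Update rate limiting rules

To update existing rate limiting rules, pass the complete policyConfigs configuration in the request. The following example raises the consumer limit from 1,000 to 2,000 tokens per minute.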

PUT /v1/http-apis/{httpApiId}

{
  "deployConfigs": [
    {
      "gatewayId": "gw-xxxxxxxxxxxxx",
      "policyConfigs": [
        {
          "type": "AiTokenRateLimit",
          "enable": true,
          "aiTokenRateLimitConfig": {
            "rules": [
              {
                "limitType": "Consumer",
                "matchKey": "",
                "matchType": "All",
                "matchValue": "*",
                "limitMode": "TokenPerMinute",
                "limitValue": 2000
              }
            ]
          }
        }
      ]
    }
  ]
}

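Configuration update rules

When you update policy configurations through UpdateHttpApi, the system processes deployConfigs and policyConfigs according to the following rules:

Full replacement of policyConfigs: When the policyConfigs field is not empty, it replaces all policy configurations under that gatewayId. When you update rate limiting rules, you must also pass the configurations of other policies (such as AiFallback and AiStatistics). Otherwise, those configurations are cleared.
Full update of rate limiting rules: aiTokenRateLimitConfig.rules and aiTokenRateLimitConfig.globalRules are both fully replaced, so each request must contain the complete rule list.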

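Important

Before you update the rate limiting configuration, we recommend that you call GetHttpApi to retrieve the current complete deployConfigs and policyConfigs. Modify only the rate limiting settings, and then submit the update. This prevents other policy configurations from being overwritten by mistake.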
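The recommended read-modify-write flow can be sketched as a pure function over the deployConfigs list. The sketch assumes you have already extracted that list from a GetHttpApi response; this section does not cover the response layout. `replace_rate_limit_policy` and `update_deploy_configs` are hypothetical helpers. They swap only the AiTokenRateLimit entry, keep every other policy in its original position, and leave the input unmodified.

```python
import copy


def replace_rate_limit_policy(policy_configs, new_policy):
    """Return a new policyConfigs list with the AiTokenRateLimit entry replaced.

    Other policies (e.g. AiFallback, AiStatistics) are copied through unchanged,
    so the full-replacement update does not clear them.
    """
    result, replaced = [], False
    for policy in policy_configs:
        if policy.get("type") == "AiTokenRateLimit":
            if not replaced:
                result.append(copy.deepcopy(new_policy))
                replaced = True
            # Any duplicate AiTokenRateLimit entries are dropped
        else:
            result.append(copy.deepcopy(policy))
    if not replaced:
        result.append(copy.deepcopy(new_policy))
    return result


def update_deploy_configs(deploy_configs, gateway_id, new_policy):
    """Apply the replacement to the deployConfigs entry for gateway_id only."""
    updated = copy.deepcopy(deploy_configs)
    for deploy in updated:
        if deploy.get("gatewayId") == gateway_id:
            deploy["policyConfigs"] = replace_rate_limit_policy(
                deploy.get("policyConfigs", []), new_policy
            )
    return updated
```

Wrap the returned list as `{"deployConfigs": ...}` to form the UpdateHttpApi body, as in Example 7.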

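Configuration validation rules

When you submit a configuration, the system validates aiTokenRateLimitConfig as follows:

Validation item	Rule
Number of rules	At least one of `rules` and `globalRules` must be configured
limitType and matchType combination	Header / Parameter / Consumer / Cookie / Request / Concurrency support Exact / Prefix / Regex / All; IP supports only the IP mode; Model is automatically set to Exact; Global does not require matchType
matchKey	Required for Header / Parameter / Cookie / Request / Concurrency; may be empty for Consumer; set automatically by the system for IP / Model / Global
matchValue	Required in Exact / Prefix / Regex mode; automatically set to `*` in All mode; for IP, must be a valid IP address or IP range
limitValue	Must be greater than 0; for Model rules, `limitValue` cannot exceed the maximum Int32 value (2147483647)
limitType in globalRules	Must be `"Global"`
limitType in rules	Cannot be `"Global"` (the Global type is allowed only in globalRules)
redisConfig	AI Gateway instances use the built-in KVStore by default and do not need `redisConfig`; configure `redisConfig` only when you need to store rate limiting keys in an external Redis instance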
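A client-side pre-check that mirrors the validation table can catch errors before the request is sent. The following sketch covers only the rules listed in the table, and its error strings are invented. `validate_config` and `validate_rule` are hypothetical helpers, so the server's own validation remains authoritative.

```python
import ipaddress

KEYED_TYPES = {"Header", "Parameter", "Consumer", "Cookie", "Request", "Concurrency"}
KEY_REQUIRED = KEYED_TYPES - {"Consumer"}  # Consumer may leave matchKey empty
PATTERN_MODES = {"Exact", "Prefix", "Regex"}
KNOWN_TYPES = KEYED_TYPES | {"IP", "Model", "Global"}
INT32_MAX = 2**31 - 1


def validate_rule(rule, in_global):
    """Return a list of problems found in one AiTokenRateLimitRule."""
    errors = []
    limit_type = rule.get("limitType")
    if limit_type not in KNOWN_TYPES:
        errors.append("unknown limitType")
    if in_global and limit_type != "Global":
        errors.append("globalRules entries must use limitType Global")
    if not in_global and limit_type == "Global":
        errors.append("Global is only allowed in globalRules")
    value = rule.get("limitValue")
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        errors.append("limitValue must be greater than 0")
    elif limit_type == "Model" and value > INT32_MAX:
        errors.append("Model limitValue exceeds Int32 max")
    if limit_type in KEYED_TYPES:
        if rule.get("matchType") not in PATTERN_MODES | {"All"}:
            errors.append("matchType must be Exact, Prefix, Regex or All")
        if limit_type in KEY_REQUIRED and not rule.get("matchKey"):
            errors.append("matchKey is required")
        if rule.get("matchType") in PATTERN_MODES and not rule.get("matchValue"):
            errors.append("matchValue is required")
    if limit_type == "IP":
        try:
            ipaddress.ip_network(rule.get("matchValue") or "", strict=False)
        except ValueError:
            errors.append("IP matchValue must be an IP address or range")
    return errors


def validate_config(config):
    """Validate an aiTokenRateLimitConfig dict; an empty list means no problems found."""
    rules = config.get("rules") or []
    global_rules = config.get("globalRules") or []
    if not rules and not global_rules:
        return ["configure at least one of rules or globalRules"]
    errors = []
    for rule in rules:
        errors += validate_rule(rule, in_global=False)
    for rule in global_rules:
        errors += validate_rule(rule, in_global=True)
    return errors
```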

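FAQ

Q: Do AI Gateway instances need redisConfig?

A: No. AI Gateway instances use the built-in KVStore by default and do not need redisConfig. Configure redisConfig only when you need to store rate limiting keys in an external Redis instance.

Q: What should I keep in mind when configuring consumer-based rate limiting?

A: Before you configure consumer (Consumer) rate limiting, enable consumer authentication for the Model API. Otherwise, the policy cannot identify consumers and the rules do not take effect.

Q: How do multiple rules relate to each other?

A: Multiple rules are combined with OR logic: matching any rule triggers rate limiting. Rules with the same limiting dimension, that is, the same limitType + matchKey, are merged into one rule group for execution.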
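The grouping described in this answer can be sketched as bucketing rules by their `(limitType, matchKey)` pair. In the hypothetical `group_rules` helper below, rules without a matchKey are bucketed under an empty key. The system fills the same key into every IP or Model rule, so this simplification does not change which rules end up together.

```python
from collections import defaultdict


def group_rules(rules):
    """Group rules that share a limiting dimension (same limitType + matchKey)."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule["limitType"], rule.get("matchKey", ""))].append(rule)
    return dict(groups)
```

Applied to the three Model rules from Example 3 plus the Header rule from Example 2, this yields two groups: one holding all three Model rules and one holding the Header rule.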

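Q: Can global rate limiting and regular rules be used together?

A: Yes. Global rate limiting (globalRules) applies to the entire API without distinguishing specific keys, while regular rules (rules) apply limits per dimension. You can combine the two, and rate limiting is enforced as soon as any rule is triggered.

Q: How long does it take for an updated rate limiting configuration to take effect?

A: After the configuration is updated, the system automatically pushes the new rules to the gateway data plane. The rules usually take effect within a few seconds.

Q: How accurate is rate limiting in a distributed architecture?

A: Because of the nature of distributed architectures, rate limit counters may deviate slightly. The number of requests actually allowed may differ from the configured value depending on request volume, request rate, backend latency, and other factors.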