
API Gateway: Throttling

Last Updated: Apr 21, 2025

A policy or plug-in of this type dynamically throttles traffic based on token usage instead of request counts or body sizes, which makes it particularly suitable for Large Language Model (LLM) services and high-concurrency scenarios. A throttling policy allows you to configure throttling rules for consumers in multiple dimensions, such as identity, request header, query parameter, and client IP address. It can also meter and throttle traffic in real time based on the total number of tokens consumed in a single API call. Because this throttling mode is centered on token consumption, it matches the resource consumption characteristics of LLM workloads, effectively prevents system overload and API abuse, and ensures that core services run stably in complex scenarios.
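To illustrate the difference between token-based and request-based throttling, the following minimal Python sketch counts the tokens that each caller consumes within a fixed time window instead of counting requests. The window length, quota, key format, and helper functions are illustrative assumptions and do not represent the gateway's internal implementation.

```python
import time
from collections import defaultdict

# Assumed example values: a 1-minute window with a quota of 100 tokens.
WINDOW_SECONDS = 60
TOKENS_PER_WINDOW = 100

# (dimension key, window index) -> tokens consumed in that window.
# The real gateway persists this state in Tair instead of in memory.
_usage = defaultdict(int)

def allow_request(dimension_key: str) -> bool:
    """Return True if the caller identified by dimension_key still has quota."""
    window = int(time.time() // WINDOW_SECONDS)
    return _usage[(dimension_key, window)] < TOKENS_PER_WINDOW

def record_usage(dimension_key: str, tokens_consumed: int) -> None:
    """Add the tokens consumed by one LLM call; the count is known only after the response."""
    window = int(time.time() // WINDOW_SECONDS)
    # A request-count limiter would add 1 here; a token limiter adds the actual usage.
    _usage[(dimension_key, window)] += tokens_consumed

# Example: throttle by a request header value, such as a "beta" user group.
key = "header:x-user-group:beta"
if allow_request(key):
    # Forward the request to the LLM backend, then record its token usage.
    record_usage(key, tokens_consumed=42)
else:
    # Reject the request, typically with HTTP 429 Too Many Requests.
    pass
```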

Benefits

  • Prevents system overload: This policy limits high-frequency or malicious requests based on flexible settings, such as throttling by consumer, header, query parameter, cookie, or client IP address, and thus prevents system breakdown or performance deterioration caused by overload. In combination with caching policies, this policy can further improve system performance.

  • Allows dynamic throttling: You can throttle a consumer at multiple granularities, such as per second, per minute, per hour, or per day. You can also flexibly adjust throttling rules based on your business requirements to ensure that your system runs stably under high concurrency.

  • Supports multiple matching rules: Throttling policies support multiple matching rules that are evaluated in a clear order of priority, which meets the needs of complex business scenarios.

  • Prevents attacks: By throttling specific consumers, headers, query parameters, or cookies, you can effectively limit the access of crawlers or automated tools to protect data security.

Scenarios

  • High-concurrency scenarios: In scenarios such as e-commerce promotions, API callers can be throttled based on the number of tokens that they consume within a unit of time. This prevents malicious high-frequency calls and ensures service stability and fairness during promotions.

  • AI service calls: Calls to LLM APIs can be throttled to prevent service quality degradation or system breakdown caused by traffic bursts.

  • Multi-tenant systems: Different throttling quotas can be assigned to different tenants in an open platform or multi-tenant architecture to ensure fairness and resource isolation.

  • Defense against attacks: Throttling mechanisms can be established against crawler attacks, DDoS attacks, and API abuse.

Prerequisites

An AI API is created. For more information, see Manage AI APIs.

Procedure

Important

You must add the CIDR block of the virtual private cloud (VPC) where your gateway instance resides to the whitelist in the Tair (Redis OSS-compatible) console.

  1. Log on to the Cloud-native API Gateway console.

  2. In the left-side navigation pane, click API. In the top navigation bar, select a region.

  3. Click the AI API tab. In the API list, click the API that you want to manage.

  4. On the Policies and Plug-ins tab, turn on Current limiting, configure the parameters, and then click Save.

    Note

    Tair is used to store the token usage and time information of each request. This allows Cloud-native API Gateway to calculate the total usage within a time range and determine whether to trigger throttling. For a simplified sketch of this counting logic, see the example after the parameter descriptions.

    The following list describes the parameters.

    • Current limiting: The throttling switch. By default, this switch is turned off.

    • Redis service URL: The Tair service URL.

    • Port: The Tair service port.

    • Access Method: The method that is used to access Tair. Valid values: Account + password, Password-only, and Password-free.

    • Database Account: The account that is used to log on to the destination database.

    • Database Password: The password of the database account.

    • Database No.: The number of the destination database.

    • Throttling Policy: The conditions, matching rules, and time range that are used to throttle requests.

      • A throttling policy supports the following condition types:

      • By request header: For example, throttle requests with the beta identifier in the header to 100 tokens per minute.

      • By request query: For example, throttle requests with the user_id=1 query parameter to 100 tokens per minute.

      • By request cookie: For example, throttle requests with the specified identifier in the cookie to 100 tokens per minute.

      • By Consumer: For example, throttle all consumers to 1,000 tokens per minute.

        Important

        To configure throttling by consumer, you must enable Consumer authentication.

      • By client IP address: For example, throttle each client IP address to 100 tokens per minute.

      • All condition types support four matching rules: Exact match, Prefix match, Regex match, and Random match. The rules take effect in the following descending order of priority: exact match > prefix match > regex match > random match.

      Note

      If multiple rules are configured, a request is intercepted when any of the rules is matched.

      • The throttling range can be Every second, Every minute, Every hour, or Every day.

      Note

      Throttling is performed based on the number of input and output tokens that are consumed by the LLM.
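The note in step 4 explains that Tair stores the token usage and time information of each request so that the gateway can compute the total usage within a window. The following minimal sketch shows one way such a fixed-window token counter can be kept in a Redis-compatible store with the redis-py client. The connection values mirror the parameters above, but the key layout, quota, and race handling are simplified assumptions rather than the gateway's actual data model.

```python
import redis  # redis-py client; Tair is Redis OSS-compatible

# Example connection values that correspond to the parameters above (placeholders).
client = redis.Redis(
    host="r-example.redis.rds.aliyuncs.com",  # Redis service URL
    port=6379,                                # Port
    password="your-password",                 # Database Password (password-only access)
    db=0,                                     # Database No.
)

WINDOW_SECONDS = 60   # throttling range: every minute (assumed)
TOKEN_LIMIT = 100     # example quota: 100 tokens per window

def consume_tokens(dimension_key: str, tokens: int) -> bool:
    """Record the tokens used by one call and report whether the caller is still within quota.

    dimension_key identifies the throttled entity, for example
    "consumer:team-a", "header:x-user-group:beta", or "ip:203.0.113.7".
    """
    key = f"throttle:{dimension_key}"
    total = client.incrby(key, tokens)        # add this call's token usage
    if total == tokens:                       # first write in this window: start the timer
        client.expire(key, WINDOW_SECONDS)    # the counter resets when the window expires
    return total <= TOKEN_LIMIT

# Example: a call for consumer "team-a" that consumed 42 tokens.
if not consume_tokens("consumer:team-a", 42):
    print("Quota exhausted: further requests in this window should receive HTTP 429.")
```

A production implementation would perform the increment and expiration atomically, for example with a Lua script, and would first apply the matching-rule priority described above to decide which rule's quota a request counts against.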