RateLimitingPolicy is a Custom Resource Definition (CRD) in the Service Mesh (ASM) traffic scheduling suite. Use it to declaratively configure global rate limiting for services in an ASM instance. Rate limiting is based on the token bucket algorithm.
How it works
RateLimitingPolicy uses a token bucket to control request rates:
- The bucket holds a fixed number of tokens, defined by bucket_capacity.
- Tokens are added at a steady rate: fill_amount tokens every interval.
- Each incoming request consumes one token. When the bucket is empty, requests are rejected with HTTP 429.

Setting bucket_capacity equal to fill_amount prevents burst traffic. Setting bucket_capacity higher than fill_amount allows short bursts above the steady-state rate.
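The token bucket behavior described above can be illustrated with a small simulation. This is a minimal sketch, not the ASM implementation: the TokenBucket class and simulate helper are hypothetical, the bucket is assumed to start full, and tokens are added all at once when each interval elapses (the smooth refill of continuous_fill is not modeled here).

```python
class TokenBucket:
    """Toy token bucket. Parameter names mirror the CRD fields; the class is hypothetical."""

    def __init__(self, bucket_capacity, fill_amount, interval):
        self.bucket_capacity = bucket_capacity
        self.fill_amount = fill_amount
        self.interval = interval          # seconds
        self.tokens = bucket_capacity     # assumption: the bucket starts full
        self.last_fill = 0.0

    def allow(self, now):
        """Return the HTTP status for a request arriving at time `now` (seconds)."""
        # Add fill_amount tokens for every whole interval that has elapsed.
        elapsed_intervals = int((now - self.last_fill) // self.interval)
        if elapsed_intervals > 0:
            refilled = self.tokens + elapsed_intervals * self.fill_amount
            self.tokens = min(self.bucket_capacity, refilled)
            self.last_fill += elapsed_intervals * self.interval
        # Each request consumes one token; an empty bucket rejects with 429.
        if self.tokens >= 1:
            self.tokens -= 1
            return 200
        return 429


def simulate(bucket, arrival_times):
    return [bucket.allow(t) for t in arrival_times]


# bucket_capacity == fill_amount: no burst beyond 2 requests per 30 seconds.
no_burst = simulate(TokenBucket(2, 2, 30), [0, 1, 2, 30, 31, 32])
print(no_burst)  # [200, 200, 429, 200, 200, 429]

# bucket_capacity > fill_amount: a full bucket absorbs a burst of 3 requests.
burst = simulate(TokenBucket(3, 2, 30), [0, 1, 2, 3])
print(burst)  # [200, 200, 200, 429]
```

In the second run the third request succeeds only because the bucket had accumulated more tokens than a single interval adds; once the burst is spent, throughput falls back to fill_amount per interval.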
To apply separate rate limits per client or user, group requests by a label (such as a user ID header) so that each group gets its own independent token bucket.
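Grouping by label can be pictured as a map from label value to bucket. The following is an illustrative sketch only: make_limiter is a hypothetical helper, refill is omitted for brevity, and the handling of requests that lack the label (here they share one bucket) is an assumption rather than documented behavior.

```python
from collections import defaultdict


def make_limiter(bucket_capacity):
    """Return an allow() function that keeps one token counter per label value."""
    tokens = defaultdict(lambda: bucket_capacity)  # new groups start with a full bucket

    def allow(labels, limit_by_label_key):
        # Assumption: requests without the label all fall into the "" group.
        group = labels.get(limit_by_label_key, "")
        if tokens[group] >= 1:
            tokens[group] -= 1
            return True
        return False

    return allow


allow = make_limiter(bucket_capacity=2)
key = "http.request.header.user_id"
results = [allow({key: user}, key) for user in ["alice", "alice", "alice", "bob"]]
print(results)  # [True, True, False, True]
```

The third request from alice is rejected because her bucket is empty, while bob is unaffected because his requests draw from a separate bucket.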
CRD structure
apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: ...
  namespace: ...
spec:
  rate_limiter:                            # RateLimiter (required)
    fill_amount: ...                       # double - tokens added per interval
    bucket_capacity: ...                   # double - max tokens in bucket
    parameters:                            # RateLimiterParameters (required)
      interval: ...                        # Duration - refill interval
      limit_by_label_key: ...              # string - group requests by label
      continuous_fill: ...                 # bool - smooth refill (default: true)
      delay_initial_fill: ...              # bool - delay first fill (default: false)
      max_idle_time: ...                   # Duration - idle bucket TTL (default: 7200s)
      lazy_sync:                           # RateLimiterParametersLazySync
        enabled: ...                       # bool - enable lazy sync (default: false)
        num_sync: ...                      # int - syncs per interval (default: 4)
    request_parameters:                    # RateLimiterRequestParameters
      denied_response_status_code: ...     # int - override HTTP 429
      tokens_label_key: ...                # string - override token cost per request
    selectors:                             # []Selector (required)
      - agent_group: ...
        control_point: ...
        service: ...
Examples
Basic rate limiting
The following configuration rate-limits the httpbin service to 2 requests every 30 seconds. Because bucket_capacity equals fill_amount, no burst traffic is allowed. Requests are grouped by the user_id header, so each unique user_id gets its own token bucket.
apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: ratelimit
  namespace: istio-system
spec:
  rate_limiter:
    bucket_capacity: 2  # Max 2 tokens in the bucket
    fill_amount: 2      # Add 2 tokens per interval
    parameters:
      interval: 30s     # Refill every 30 seconds
      limit_by_label_key: http.request.header.user_id  # One bucket per user_id header
    selectors:
      - agent_group: default
        control_point: ingress
        service: httpbin.default.svc.cluster.local
Per-user rate limiting with burst allowance
The following configuration allows each user 100 requests per minute, with bursts of up to 150 requests. Lazy sync is enabled to reduce latency at the cost of slightly less accurate enforcement, and rejected requests receive HTTP 503 instead of the default 429.
apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: per-user-ratelimit
  namespace: istio-system
spec:
  rate_limiter:
    bucket_capacity: 150  # Allow bursts up to 150 requests
    fill_amount: 100      # Steady-state: 100 requests per minute
    parameters:
      interval: 60s
      limit_by_label_key: http.request.header.user_id
      continuous_fill: true  # Smooth token distribution
      lazy_sync:
        enabled: true        # Local decisions, periodic sync
        num_sync: 4          # Sync 4 times per interval
    request_parameters:
      denied_response_status_code: 503  # Return 503 instead of 429
    selectors:
      - agent_group: default
        control_point: ingress
        service: my-api.production.svc.cluster.local
Field reference
RateLimitingPolicySpec
The top-level spec field of a RateLimitingPolicy resource.
| Field | Type | Required | Description |
|---|---|---|---|
| rate_limiter | RateLimiter | Yes | Rate limiter configuration. |
RateLimiter
Defines the token bucket parameters, request handling, and target selectors.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| fill_amount | double | Yes | -- | Number of tokens added to the bucket each interval. Together with interval, this defines the steady-state request rate. |
| bucket_capacity | double | Yes | -- | Maximum number of tokens the bucket can hold. Set equal to fill_amount to prevent bursts. Set higher to allow short traffic spikes. |
| parameters | RateLimiterParameters | Yes | -- | Rate limiter runtime parameters. |
| request_parameters | RateLimiterRequestParameters | No | -- | Custom request handling configuration. |
| selectors | []Selector | Yes | -- | Services and traffic to which rate limiting applies. |
RateLimiterParameters
Controls how the rate limiter fills the token bucket and groups requests.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| interval | Duration | Yes | -- | Token bucket refill interval. Example: 30s adds fill_amount tokens every 30 seconds. |
| limit_by_label_key | string | No | -- | Groups requests by a request label. Each unique label value gets its own token bucket. See Request labels for available label keys. |
| continuous_fill | bool | No | true | When true, tokens are added smoothly over the interval rather than all at once when the interval elapses. |
| delay_initial_fill | bool | No | false | When false, the bucket starts at full capacity on the first request. This may allow more requests than the configured rate during the first interval. |
| lazy_sync | RateLimiterParametersLazySync | No | -- | Lazy synchronization configuration. See Lazy sync: accuracy vs. latency. |
| max_idle_time | Duration | No | 7200s | Time to keep a per-label token bucket after its last request. Only applies when limit_by_label_key is set. |
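The difference between the two continuous_fill settings can be sketched numerically. The function below is a hypothetical illustration, not the ASM implementation; it assumes that "smoothly" means a linear refill over the interval and that the bucket is otherwise idle.

```python
def tokens_available(t, fill_amount, bucket_capacity, interval,
                     continuous_fill=True, start_tokens=0.0):
    """Tokens in an idle bucket t seconds after it held start_tokens."""
    if continuous_fill:
        # Assumption: tokens accrue linearly across the interval.
        added = fill_amount * (t / interval)
    else:
        # Tokens arrive in one batch each time a full interval elapses.
        added = fill_amount * (t // interval)
    return min(bucket_capacity, start_tokens + added)


# fill_amount=100, bucket_capacity=150, interval=60s, starting from an empty bucket.
print(tokens_available(15, 100, 150, 60, continuous_fill=True))   # 25.0
print(tokens_available(15, 100, 150, 60, continuous_fill=False))  # 0.0
print(tokens_available(120, 100, 150, 60))                        # 150 (capped at capacity)
```

Under these assumptions, a client that has drained the bucket can resume after a fraction of the interval with continuous fill, but must wait for the full interval boundary without it.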
RateLimiterRequestParameters
Overrides default request handling behavior.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| denied_response_status_code | int | No | 429 | HTTP status code returned when a request is rate-limited. |
| tokens_label_key | string | No | -- | Request label whose value determines the number of tokens consumed per request, overriding the default of 1. |
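As a rough illustration of tokens_label_key, the sketch below derives a per-request token cost from a request label. The tokens_for_request helper is hypothetical, and the fallback to a cost of 1 when the label is missing or not a positive number is an assumption; the actual handling of such values is not specified here.

```python
def tokens_for_request(labels, tokens_label_key=None):
    """Return how many tokens a request consumes, given its labels."""
    if tokens_label_key is None:
        return 1.0  # default cost: one token per request
    try:
        cost = float(labels.get(tokens_label_key))
    except (TypeError, ValueError):
        return 1.0  # assumption: missing or non-numeric label falls back to 1
    return cost if cost > 0 else 1.0  # assumption: non-positive values fall back to 1


labels = {"http.method": "POST", "http.request_content_length": "431"}
print(tokens_for_request(labels))                                 # 1.0
print(tokens_for_request(labels, "http.request_content_length"))  # 431.0
print(tokens_for_request({}, "http.request_content_length"))      # 1.0
```

Using http.request_content_length as the cost label, as in this example, would turn the policy into an approximate limit on request bytes rather than on request count.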
RateLimiterParametersLazySync
Controls lazy synchronization between Envoy and the remote agent. See Lazy sync: accuracy vs. latency for guidance on when to enable this feature.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | false | Enables lazy synchronization. |
| num_sync | int | No | 4 | Number of times Envoy syncs with the remote agent within each interval. |
Lazy sync: accuracy vs. latency
By default, Envoy contacts the remote agent on every request for a precise rate limiting decision. Lazy sync changes this behavior: Envoy decides locally and syncs with the remote agent periodically.
| Mode | Behavior | Accuracy | Latency |
|---|---|---|---|
| Default (lazy sync disabled) | Envoy checks the remote agent per request | High -- exact token count enforcement | Higher -- remote call on every request |
| Lazy sync enabled | Envoy decides locally, syncs num_sync times per interval | Lower -- temporary over/under-counting between syncs | Lower -- most requests skip the remote call |
Enable lazy sync for high-throughput APIs where approximate rate enforcement is acceptable and low latency matters more than exact counting.
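To get a feel for how stale a local decision can be, it helps to translate num_sync into a time between syncs. The sketch below assumes the syncs are evenly spaced across the interval, which this reference does not state; sync_schedule is a hypothetical helper.

```python
def sync_schedule(interval_s, num_sync=4):
    """Offsets (seconds into the interval) at which a sync would occur,
    assuming evenly spaced syncs."""
    period = interval_s / num_sync
    return [round(i * period, 3) for i in range(1, num_sync + 1)]


# interval: 60s with the default num_sync of 4 gives one sync roughly every 15 seconds.
print(sync_schedule(60))      # [15.0, 30.0, 45.0, 60.0]
print(sync_schedule(30, 10))  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0, 27.0, 30.0]
```

Under this assumption, raising num_sync shortens the window in which local counts can drift from the remote agent's count, at the cost of more remote calls.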
Request labels
The ASM traffic scheduling suite assigns labels to each request as key-value pairs. Use these labels with limit_by_label_key to group requests for per-group rate limiting, or with tokens_label_key to vary the token cost per request.
HTTP request metadata
Each HTTP request is automatically labeled with the following metadata:
| Label key | Value | Example |
|---|---|---|
| http.method | HTTP method | POST |
| http.flavor | HTTP protocol version | 1.1 |
| http.host | Request host | httpbin.default.svc.cluster.local |
| http.target | Request path | /get |
| http.request_content_length | Request body size in bytes | 431 |
| http.request.header.<header_name> | Value of the specified request header | http.request.header.user_agent |
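The table above can be read as a mapping from an HTTP request to a flat set of labels. The following sketch is illustrative only: http_labels is a hypothetical function, and the header-name normalization (lowercasing and replacing hyphens with underscores) is inferred from the user_agent example rather than documented.

```python
def http_labels(method, flavor, host, target, headers, body=b""):
    """Build the request labels listed in the table for one HTTP request."""
    labels = {
        "http.method": method,
        "http.flavor": flavor,
        "http.host": host,
        "http.target": target,
        "http.request_content_length": str(len(body)),
    }
    for name, value in headers.items():
        # Assumption: "User-Agent" becomes "user_agent" in the label key.
        normalized = name.lower().replace("-", "_")
        labels["http.request.header." + normalized] = value
    return labels


request_labels = http_labels(
    "POST", "1.1", "httpbin.default.svc.cluster.local", "/get",
    headers={"User-Agent": "curl/8.0", "user_id": "alice"},
    body=b"x" * 431,
)
print(request_labels["http.request.header.user_id"])  # alice
print(request_labels["http.request_content_length"])  # 431
```

Any key in the resulting dictionary is a candidate value for limit_by_label_key or tokens_label_key.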
Baggage header
Baggage is an OpenTelemetry standard for propagating context across distributed systems. If a request includes a baggage HTTP header, each key-value pair is converted into a request label.
Example header:
baggage: userId=alice,isProduction=false
This produces two labels: userId: alice and isProduction: false. To rate-limit per user based on Baggage, set limit_by_label_key to userId.
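The header-to-label conversion can be sketched as a small parser. This is a simplified illustration, not the code ASM runs: baggage_labels is a hypothetical function that splits list members on commas, ignores any optional properties after a semicolon, and percent-decodes values as the W3C Baggage format specifies.

```python
from urllib.parse import unquote


def baggage_labels(header_value):
    """Convert a baggage header value into a dict of request labels."""
    labels = {}
    for member in header_value.split(","):
        entry = member.split(";", 1)[0]  # drop optional properties after ';'
        if "=" not in entry:
            continue  # skip malformed members
        key, value = entry.split("=", 1)
        labels[key.strip()] = unquote(value.strip())
    return labels


parsed = baggage_labels("userId=alice,isProduction=false")
print(parsed)  # {'userId': 'alice', 'isProduction': 'false'}
```

With these labels in place, a policy that sets limit_by_label_key to userId would give alice her own token bucket.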