RateLimitingPolicy CRD Overview - Alibaba Cloud Service Mesh

RateLimitingPolicy is a Custom Resource Definition (CRD) in the Service Mesh (ASM) traffic scheduling suite. Use it to declaratively configure global rate limiting for services in an ASM instance. Rate limiting is based on the token bucket algorithm.

How it works

RateLimitingPolicy uses a token bucket to control request rates:

The bucket holds at most bucket_capacity tokens.
Tokens are added at a steady rate: fill_amount tokens every interval.
By default, each incoming request consumes one token. When the bucket is empty, requests are rejected with HTTP 429, or with the status code set in denied_response_status_code.
Setting bucket_capacity equal to fill_amount prevents burst traffic. Setting bucket_capacity higher than fill_amount allows short bursts above the steady-state rate.
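The rules above can be sketched as a small simulation. This is a minimal illustrative model, not the ASM implementation: the TokenBucket class and its allow method are hypothetical names, timestamps are passed in explicitly instead of read from a clock, and the bucket is assumed to start full and to refill continuously.

```python
class TokenBucket:
    """Toy token bucket with continuous refill (illustrative, not the ASM implementation)."""

    def __init__(self, bucket_capacity: float, fill_amount: float, interval: float):
        self.bucket_capacity = bucket_capacity
        self.fill_amount = fill_amount
        self.interval = interval          # seconds
        self.tokens = bucket_capacity     # assumption: the bucket starts full
        self.last_refill = 0.0            # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the time elapsed since the last call, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.last_refill)
        self.last_refill = max(self.last_refill, now)
        refill = self.fill_amount * elapsed / self.interval
        self.tokens = min(self.bucket_capacity, self.tokens + refill)
        if self.tokens >= cost:
            self.tokens -= cost
            return True    # request admitted
        return False       # request would be rejected (HTTP 429 by default)


# bucket_capacity == fill_amount: no burst above 2 requests per 30 seconds.
strict = TokenBucket(bucket_capacity=2, fill_amount=2, interval=30)
strict_at_start = [strict.allow(now=0.0) for _ in range(3)]   # [True, True, False]
strict_after_15s = strict.allow(now=15.0)                     # True: one token refilled

# bucket_capacity > fill_amount: a burst of up to 150 requests is admitted at once.
bursty = TokenBucket(bucket_capacity=150, fill_amount=100, interval=60)
burst_admitted = sum(bursty.allow(now=0.0) for _ in range(200))  # 150
```

In the strict case the third request at time 0 is rejected because both tokens are spent. In the bursty case 150 of 200 simultaneous requests are admitted, after which the rate falls back to 100 requests per minute.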

To apply separate rate limits per client or user, set limit_by_label_key to a request label (such as a user ID header) so that each label value gets its own independent token bucket.
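A minimal sketch of per-label grouping, assuming an in-memory dictionary that maps each label value to its own bucket state. The allow function and the buckets store are hypothetical, and the handling of requests that lack the label (a shared bucket under an empty key) is an assumption made only for this illustration.

```python
BUCKET_CAPACITY = 2.0
FILL_AMOUNT = 2.0
INTERVAL = 30.0  # seconds
LIMIT_BY_LABEL_KEY = "http.request.header.user_id"

# Hypothetical in-memory store: label value -> (tokens, last refill timestamp).
buckets = {}


def allow(labels, now):
    """Admit or reject a request using the bucket that belongs to its label value."""
    # Assumption: requests without the label share one bucket under the empty key.
    group = labels.get(LIMIT_BY_LABEL_KEY, "")
    tokens, last_refill = buckets.get(group, (BUCKET_CAPACITY, now))
    tokens = min(BUCKET_CAPACITY, tokens + FILL_AMOUNT * (now - last_refill) / INTERVAL)
    allowed = tokens >= 1.0
    buckets[group] = (tokens - 1.0 if allowed else tokens, now)
    return allowed


alice = {LIMIT_BY_LABEL_KEY: "alice"}
bob = {LIMIT_BY_LABEL_KEY: "bob"}
decisions = [allow(alice, 0.0), allow(alice, 0.0), allow(alice, 0.0), allow(bob, 0.0)]
# decisions == [True, True, False, True]: alice's empty bucket does not affect bob
```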

CRD structure

apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: ...
  namespace: ...
spec:
  rate_limiter:                        # RateLimiter (required)
    fill_amount: ...                   # double  - tokens added per interval
    bucket_capacity: ...               # double  - max tokens in bucket
    parameters:                        # RateLimiterParameters (required)
      interval: ...                    # Duration - refill interval
      limit_by_label_key: ...          # string  - group requests by label
      continuous_fill: ...             # bool    - smooth refill (default: true)
      delay_initial_fill: ...          # bool    - delay first fill (default: false)
      max_idle_time: ...               # Duration - idle bucket TTL (default: 7200s)
      lazy_sync:                       # RateLimiterParametersLazySync
        enabled: ...                   # bool    - enable lazy sync (default: false)
        num_sync: ...                  # int     - syncs per interval (default: 4)
    request_parameters:                # RateLimiterRequestParameters
      denied_response_status_code: ... # int     - override HTTP 429
      tokens_label_key: ...            # string  - override token cost per request
    selectors:                         # []Selector (required)
    - agent_group: ...
      control_point: ...
      service: ...

Examples

Basic rate limiting

The following configuration limits each user of the httpbin service to 2 requests every 30 seconds. Because bucket_capacity equals fill_amount, no burst traffic is allowed. Requests are grouped by the http.request.header.user_id label, so each unique value of the user_id header gets its own token bucket.

apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: ratelimit
  namespace: istio-system
spec:
  rate_limiter:
    bucket_capacity: 2                                    # Max 2 tokens in the bucket
    fill_amount: 2                                        # Add 2 tokens per interval
    parameters:
      interval: 30s                                       # Refill every 30 seconds
      limit_by_label_key: http.request.header.user_id     # One bucket per user_id header
    selectors:
    - agent_group: default
      control_point: ingress
      service: httpbin.default.svc.cluster.local

Per-user rate limiting with burst allowance

The following configuration allows each user 100 requests per minute with a burst capacity of 150, and returns HTTP 503 instead of 429 for rejected requests. Lazy sync is enabled to reduce latency at the cost of slightly less accurate enforcement.

apiVersion: istio.alibabacloud.com/v1
kind: RateLimitingPolicy
metadata:
  name: per-user-ratelimit
  namespace: istio-system
spec:
  rate_limiter:
    bucket_capacity: 150                                  # Allow bursts up to 150 requests
    fill_amount: 100                                      # Steady-state: 100 requests per minute
    parameters:
      interval: 60s
      limit_by_label_key: http.request.header.user_id
      continuous_fill: true                               # Smooth token distribution
      lazy_sync:
        enabled: true                                     # Local decisions, periodic sync
        num_sync: 4                                       # Sync 4 times per interval
    request_parameters:
      denied_response_status_code: 503                    # Return 503 instead of 429
    selectors:
    - agent_group: default
      control_point: ingress
      service: my-api.production.svc.cluster.local

Field reference

RateLimitingPolicySpec

The top-level spec field of a RateLimitingPolicy resource.

Field	Type	Required	Description
`rate_limiter`	RateLimiter	Yes	Rate limiter configuration.

RateLimiter

Defines the token bucket parameters, request handling, and target selectors.

Field	Type	Required	Default	Description
`fill_amount`	double	Yes	--	Number of tokens added to the bucket each interval. Together with `interval`, this defines the steady-state request rate.
`bucket_capacity`	double	Yes	--	Maximum number of tokens the bucket can hold. Set equal to `fill_amount` to prevent bursts. Set higher to allow short traffic spikes.
`parameters`	RateLimiterParameters	Yes	--	Rate limiter runtime parameters.
`request_parameters`	RateLimiterRequestParameters	No	--	Custom request handling configuration.
`selectors`	[]Selector	Yes	--	Services and traffic to which rate limiting applies.

RateLimiterParameters

Controls how the rate limiter fills the token bucket and groups requests.

Field	Type	Required	Default	Description
`interval`	Duration	Yes	--	Token bucket refill interval. Example: `30s` adds `fill_amount` tokens every 30 seconds.
`limit_by_label_key`	string	No	--	Groups requests by a request label. Each unique label value gets its own token bucket. See Request labels for available label keys.
`continuous_fill`	bool	No	`true`	When `true`, tokens are added smoothly over the interval rather than all at once when the interval elapses.
`delay_initial_fill`	bool	No	`false`	When `false`, the bucket starts at full capacity on the first request. This may allow more requests than the configured rate during the first interval.
`lazy_sync`	RateLimiterParametersLazySync	No	--	Lazy synchronization configuration. See Lazy sync: accuracy vs. latency.
`max_idle_time`	Duration	No	`7200s`	Time to keep a per-label token bucket after its last request. Only applies when `limit_by_label_key` is set.
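To make the effect of continuous_fill concrete, the following sketch compares how many tokens have been added partway through an interval in each mode. The tokens_added function is a hypothetical helper that ignores bucket_capacity. It only models the description in the table above and is not the ASM implementation.

```python
import math


def tokens_added(elapsed, fill_amount, interval, continuous_fill=True):
    """Tokens added `elapsed` seconds after the last fill (bucket capacity ignored)."""
    if continuous_fill:
        # Smooth refill: tokens accrue in proportion to elapsed time.
        return fill_amount * elapsed / interval
    # Stepped refill: tokens arrive all at once each time a full interval elapses.
    return fill_amount * math.floor(elapsed / interval)


# fill_amount: 2, interval: 30s, observed 15 seconds after the last fill.
smooth = tokens_added(15, fill_amount=2, interval=30)                               # 1.0
stepped = tokens_added(15, fill_amount=2, interval=30, continuous_fill=False)       # 0
stepped_full = tokens_added(30, fill_amount=2, interval=30, continuous_fill=False)  # 2
```

With continuous fill, a client that has exhausted its bucket can send one request after 15 seconds. With stepped fill it waits the full 30 seconds and then receives both tokens together.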

RateLimiterRequestParameters

Overrides default request handling behavior.

Field	Type	Required	Default	Description
`denied_response_status_code`	int	No	`429`	HTTP status code returned when a request is rate-limited.
`tokens_label_key`	string	No	--	Request label whose value determines the number of tokens consumed per request, overriding the default of 1.
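A sketch of how a per-request token cost could be derived from a label. The token_cost function is hypothetical, and the fallback to a cost of 1 when the label is missing or not a positive number is an assumption made for this illustration rather than documented ASM behavior.

```python
def token_cost(labels, tokens_label_key=None):
    """Return the number of tokens a request consumes (1 unless a label overrides it)."""
    if tokens_label_key is None:
        return 1.0
    raw = labels.get(tokens_label_key)
    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return 1.0  # assumption: missing or non-numeric label falls back to the default
    return cost if cost > 0 else 1.0


labels = {"http.method": "POST", "http.request_content_length": "431"}

default_cost = token_cost(labels)                                # 1.0
size_cost = token_cost(labels, "http.request_content_length")    # 431.0
missing_cost = token_cost(labels, "http.request.header.tokens")  # 1.0
```

Using http.request_content_length as the token source would, under these assumptions, turn the limit into a bytes-per-interval budget instead of a requests-per-interval budget.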

RateLimiterParametersLazySync

Controls lazy synchronization between Envoy and the remote agent. See Lazy sync: accuracy vs. latency for guidance on when to enable this feature.

Field	Type	Required	Default	Description
`enabled`	bool	No	`false`	Enables lazy synchronization.
`num_sync`	int	No	`4`	Number of times Envoy syncs with the remote agent within each `interval`.

Lazy sync: accuracy vs. latency

By default, Envoy contacts the remote agent on every request for a precise rate limiting decision. Lazy sync changes this behavior: Envoy decides locally and syncs with the remote agent periodically.

Mode	Behavior	Accuracy	Latency
Default (lazy sync disabled)	Envoy checks the remote agent per request	High -- exact token count enforcement	Higher -- remote call on every request
Lazy sync enabled	Envoy decides locally, syncs `num_sync` times per `interval`	Lower -- temporary over/under-counting between syncs	Lower -- most requests skip the remote call

Enable lazy sync for high-throughput APIs where approximate rate enforcement is acceptable and low latency matters more than exact counting.
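The trade-off can be illustrated with a toy model. Everything here is an assumption made for illustration: SharedBucket stands in for the count kept by the remote agent, LazyProxy stands in for a proxy that decides from a local snapshot, refill is omitted, and the actual synchronization protocol used by ASM may differ.

```python
class SharedBucket:
    """Stand-in for the token count kept by the remote agent (no refill, for brevity)."""

    def __init__(self, tokens):
        self.tokens = tokens

    def take(self):
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LazyProxy:
    """Stand-in for a proxy that decides from a local snapshot and reports usage on sync."""

    def __init__(self, shared):
        self.shared = shared
        self.snapshot = shared.tokens   # token count seen at the last sync
        self.pending = 0                # tokens spent locally since the last sync

    def allow(self):
        if self.snapshot - self.pending >= 1:
            self.pending += 1
            return True
        return False

    def sync(self):
        self.shared.tokens -= self.pending   # can go negative: tokens were over-spent
        self.pending = 0
        self.snapshot = self.shared.tokens


def sync_period(interval, num_sync):
    """Seconds between syncs when syncing num_sync times per interval."""
    return interval / num_sync


period = sync_period(60, 4)  # 15.0 seconds for interval: 60s, num_sync: 4

# Default mode: every request is checked against the shared count.
exact = SharedBucket(10)
exact_admitted = sum(exact.take() for _ in range(30))               # 10

# Lazy mode: two proxies each receive 15 requests before either one syncs.
shared = SharedBucket(10)
proxies = [LazyProxy(shared), LazyProxy(shared)]
lazy_admitted = sum(p.allow() for p in proxies for _ in range(15))  # 20
for p in proxies:
    p.sync()
after_sync = sum(p.allow() for p in proxies for _ in range(15))     # 0
```

In this model both proxies spend the same 10 tokens before syncing, so 20 requests are admitted against a budget of 10, and the over-spend is only corrected after the next sync. A higher num_sync shortens the window in which this can happen, at the cost of more remote calls.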

Request labels

The ASM traffic scheduling suite assigns labels to each request as key-value pairs. Use these labels with limit_by_label_key to group requests for per-group rate limiting, or with tokens_label_key to vary the token cost per request.

HTTP request metadata

Each HTTP request is automatically labeled with the following metadata:

Label key	Value	Example
`http.method`	HTTP method	`POST`
`http.flavor`	HTTP protocol version	`1.1`
`http.host`	Request host	`httpbin.default.svc.cluster.local`
`http.target`	Request path	`/get`
`http.request_content_length`	Request body size in bytes	`431`
`http.request.header.<header_name>`	Value of the specified request header	Key: `http.request.header.user_agent`
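The following sketch shows how the label set in the table above might look for a single request. The http_labels function is hypothetical. The header name normalization (lowercase, with hyphens replaced by underscores) is an assumption inferred from the `user_agent` example key, so verify the exact key format against your ASM version before relying on it.

```python
def http_labels(method, flavor, host, target, headers, body=b""):
    """Build a label dictionary for one HTTP request (illustrative only)."""
    labels = {
        "http.method": method,
        "http.flavor": flavor,
        "http.host": host,
        "http.target": target,
        "http.request_content_length": str(len(body)),
    }
    for name, value in headers.items():
        # Assumption: header names are lowercased and "-" becomes "_" in the label key.
        key = name.lower().replace("-", "_")
        labels["http.request.header." + key] = value
    return labels


labels = http_labels(
    method="POST",
    flavor="1.1",
    host="httpbin.default.svc.cluster.local",
    target="/get",
    headers={"User-Agent": "curl/8.5.0", "user_id": "alice"},
    body=b"x" * 431,
)
user_label = labels["http.request.header.user_id"]      # "alice"
agent_label = labels["http.request.header.user_agent"]  # "curl/8.5.0"
```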

Baggage header

Baggage is a W3C-specified HTTP header, also used by OpenTelemetry, for propagating context across distributed systems. If a request includes a baggage HTTP header, each key-value pair is converted into a request label.

Example header:

baggage: userId=alice,isProduction=false

This produces two labels: userId: alice and isProduction: false. To rate-limit per user based on Baggage, set limit_by_label_key to userId.
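The conversion from header to labels can be sketched as follows. The baggage_to_labels function is a hypothetical, simplified parser: it splits members on commas, drops optional properties after a semicolon, and percent-decodes values. It is not the parser used by ASM.

```python
from urllib.parse import unquote


def baggage_to_labels(header):
    """Convert a baggage header value into a dictionary of request labels."""
    labels = {}
    for member in header.split(","):
        entry = member.split(";", 1)[0]        # drop optional member properties
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            continue                           # skip malformed members
        labels[key.strip()] = unquote(value.strip())
    return labels


labels = baggage_to_labels("userId=alice,isProduction=false")
# labels == {"userId": "alice", "isProduction": "false"}

limit_by_label_key = "userId"
bucket_group = labels[limit_by_label_key]      # "alice": selects alice's token bucket
```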