Application Load Balancer (ALB) service extensions inject custom logic into the request forwarding path through built-in components. Use them to add authentication, content rewriting, and AI context awareness at key processing stages without modifying application code or deploying additional proxy layers.
Use cases:
Authenticate requests to AI service endpoints using API keys or JWTs
Control token consumption for AI workloads with per-user or per-account rate limiting
Add custom processing logic to request and response flows
Service extensions require ALB Extensible Edition, which is currently in public preview. For more information, see Public preview of ALB Extensible Edition.
Supported components
A service extension contains one or more components. Each component adds specific functionality to the request processing pipeline. A service extension supports up to 5 components, and duplicate components are not allowed.
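The composition rules above (at least one component, at most 5, no duplicates) can be checked on the client side before a configuration is submitted. The following is a minimal sketch, not the ALB API: the function name `validate_components` and the component identifiers are hypothetical and chosen only for illustration.

```python
MAX_COMPONENTS = 5  # limit stated in this document


def validate_components(components):
    """Check a list of component names against the documented rules.

    The names are hypothetical identifiers, not real ALB values.
    """
    if not components:
        raise ValueError("a service extension needs at least one component")
    if len(components) > MAX_COMPONENTS:
        raise ValueError(f"at most {MAX_COMPONENTS} components are allowed")
    duplicates = sorted({c for c in components if components.count(c) > 1})
    if duplicates:
        raise ValueError(f"duplicate components: {', '.join(duplicates)}")
    return list(components)


# Example with hypothetical component names.
validate_components(["token-rate-limit", "jwt-auth"])
```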
Traffic management
Component | Description |
Token rate limiting | Provides token-level rate limiting for AI applications. Identifies and parses request and response payloads that comply with the OpenAI-compatible protocol, automatically extracts token usage, and counts tokens using a sliding window algorithm. Supports rate limiting by dimensions such as user or account. |
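To illustrate the idea behind token-level rate limiting, the following is a minimal sketch of a sliding window counter keyed by user or account. It is not the ALB implementation. It assumes that token usage is read from the `usage.total_tokens` field of an OpenAI-compatible response body; the class name, method names, and limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowTokenLimiter:
    """Tracks token usage per key (for example, a user or account ID)."""

    def __init__(self, max_tokens, window_seconds, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = defaultdict(deque)  # key -> deque of (timestamp, tokens)
        self._totals = defaultdict(int)    # key -> tokens currently in the window

    def _evict(self, key, now):
        # Drop usage records that have slid out of the window.
        events = self._events[key]
        while events and events[0][0] <= now - self.window_seconds:
            _, tokens = events.popleft()
            self._totals[key] -= tokens

    def allow(self, key):
        """Return True if the key still has token budget in the window."""
        self._evict(key, self.clock())
        return self._totals[key] < self.max_tokens

    def record(self, key, response_body):
        """Add the usage reported in an OpenAI-style response body."""
        tokens = response_body.get("usage", {}).get("total_tokens", 0)
        now = self.clock()
        self._evict(key, now)
        self._events[key].append((now, tokens))
        self._totals[key] += tokens
        return tokens


# Example: 1,000 tokens per 60 seconds for each user (hypothetical limits).
limiter = SlidingWindowTokenLimiter(max_tokens=1000, window_seconds=60)
if limiter.allow("user-42"):
    limiter.record("user-42", {"usage": {"total_tokens": 350}})
```

Because the token count is only known after the response arrives, this sketch admits a request whenever the window still has budget and charges the actual usage afterwards, so a single large response can exceed the limit before later requests are rejected.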
Authentication
Component | Description |
API key authentication | Authenticates and authorizes inbound requests based on API keys. Parses API keys from HTTP headers, URL parameters, or cookies, then validates their legitimacy and access permissions. Simple to implement, with low overhead, and suitable for non-sensitive operations. Because API keys are less secure than JWT authentication, strict credential management is required. |
JWT authentication | Authenticates and authorizes inbound requests based on JSON Web Tokens (JWT). Parses JWTs from HTTP headers, URL parameters, or cookies, then validates signatures using HMAC, RSA, or ECDSA algorithms and checks access permissions. Provides stronger security than API key authentication. |
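Both authentication components first locate a credential in the request and then validate it. The following is a minimal sketch of those two steps using only the Python standard library. It is not the ALB implementation: the `Authorization` header lookup, the `token` parameter and cookie name, and all function names are assumptions made for illustration, and only HMAC-SHA256 (HS256) signatures are covered, not RSA or ECDSA.

```python
import base64
import hashlib
import hmac
import json
import time
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def extract_credential(headers, url, name="token"):
    """Look in a header first, then the query string, then a cookie."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    query = parse_qs(urlsplit(url).query)
    if name in query:
        return query[name][0]
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if name in cookie:
        return cookie[name].value
    return None


def _b64url_decode(segment):
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    """Build an HS256 token; included only to make the example testable."""
    header_b64 = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header_b64}.{payload_b64}.{_b64url_encode(signature)}"


def verify_hs256(token, secret, now=None):
    """Return the claims if signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError):
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # never trust an unexpected algorithm from the token
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        return None
    return claims
```

An API key check would reuse `extract_credential` and compare the result against stored keys with `hmac.compare_digest`, which avoids leaking information through comparison timing.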
Billing
During the public preview period, ALB Extensible Edition instances and service extension features are free of charge. Internet data transfer fees are charged based on standard Alibaba Cloud Internet data transfer pricing. After the public preview ends, ALB Extensible Edition will be billed according to official pricing standards.
Quotas
To request a quota increase, contact your account manager.
Quota name | Description | Default value |
| Service extensions per region | 50 |
| Associated resources per service extension | 200 |
| Components per service extension | 5 |