To meet the demand for unified ingress for modern and AI applications, Alibaba Cloud introduces ALB Extensible Edition. Built on flexible Service Extensions, this edition provides core traffic management capabilities such as identity authentication and content-based routing. It also adds AI-native features like multi-model proxy, GPU-aware scheduling, and token-based rate limiting to create an integrated, intelligent traffic gateway for applications and AI.
Benefits
- Application-layer elasticity: ALB Extensible Edition operates at the application layer, providing domain names and virtual IP addresses (VIPs) with multi-level distribution to handle large-scale requests. It helps you scale out your applications by distributing traffic to eliminate single points of failure and improve system availability. You can also customize availability zone combinations and elastically scale across zones to prevent resource bottlenecks.
- Advanced content-based routing: ALB Extensible Edition can identify specific service traffic and forward it to different backend servers based on various conditions, such as domain names, paths, and HTTP headers. It also supports advanced actions such as redirects, rewrites, and custom HTTP headers.
- Application-layer content awareness: ALB Extensible Edition supports deep inspection to dynamically route traffic to different backend services based on the content of the request body. Beyond traditional Layer 7 proxying, this provides "L7+" proxy capabilities ideal for AI application scenarios.
- Flexible service extensions: ALB supports Service Extensions, which let you use plugins and external service calls for custom business scenarios, such as AI applications.
- Security and reliability: ALB Extensible Edition natively supports security features such as credential management, built-in DDoS protection, and integration with a Web Application Firewall (WAF). It also provides end-to-end HTTPS encryption and supports TLS security policies and TLS 1.3. These features support encryption-sensitive workloads and Zero-Trust security architectures.
- SSE streaming: ALB Extensible Edition supports Server-Sent Events (SSE) streaming. In large language model (LLM) applications, you can use SSE to return generated inference results in real time, improving the user experience.
- Elastic and flexible billing: ALB Extensible Edition uses Elastic IP Addresses (EIPs) and Internet Shared Bandwidth for public network access, enabling flexible billing for public network usage. This edition uses a pricing model based on Load Balancer Capacity Units (LCUs), making it ideal for elastic workloads with traffic peaks.
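To illustrate the SSE streaming benefit, the following is a minimal sketch of how a client might read events from an SSE response that passes through the load balancer. The helper name `parse_sse_events` and the sample stream are hypothetical and are not part of any ALB SDK. The parsing rules follow the general SSE wire format: `data:` lines accumulate, a blank line ends an event, and lines that start with `:` are comments.

```python
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream.

    Hypothetical helper for illustration only.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
    # A trailing event without a terminating blank line is discarded.


# Made-up stream, similar to what an LLM backend might emit token by token.
stream = [
    "data: Hello\n",
    "\n",
    ": keep-alive\n",
    "data: world\n",
    "\n",
]
events = list(parse_sse_events(stream))
```

In a real client, `lines` would come from iterating over a streaming HTTP response instead of an in-memory list.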
Use cases
- Application traffic gateway: Distributes traffic, provides authentication, and enforces rate limiting for traditional web and AI applications.
- Modern application and AI workloads: Optimizes modern application and AI/machine learning workloads by implementing model-aware routing. It directs traffic based on specific model requirements to optimize GPU utilization and provide low-latency inference.
- Unified multi-model proxy: Supports model adaptation, intelligent scheduling, and dynamic failover, and integrates fine-grained identity authentication to build secure, elastic, and highly reliable AI infrastructure.
- Highly available deployment for hybrid and multi-cloud applications: Simplifies hybrid connectivity and ensures security. It acts as a core component of cross-cloud networks and provides high-performance application delivery and security for data centers, branch offices, and multi-cloud resources.
- Container Ingress gateway: Routes external HTTP(S) requests to services in container clusters. It supports blue-green deployment, A/B testing, TLS termination, and content-based routing.
- High-performance secure application delivery: Provides high-performance load balancing with auto-scaling capabilities and integrates with security products such as WAF, DDoS Protection, and Cloud Firewall to secure application delivery.
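The model-aware routing use case can be pictured with a small sketch. It assumes that requests carry a JSON body with a `model` field, as many chat-completion style APIs do. The field name, the model names, and the server group names are illustrative assumptions. This is not the Service Extensions API, only a conceptual picture of routing on request-body content.

```python
import json

# Hypothetical mapping from model name to server group.
MODEL_TO_SERVER_GROUP = {
    "small-chat": "sgp-cpu-pool",
    "large-chat": "sgp-gpu-pool",
}
DEFAULT_SERVER_GROUP = "sgp-default"


def pick_server_group(body: bytes) -> str:
    """Choose a server group from the 'model' field of a JSON request body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return DEFAULT_SERVER_GROUP
    if not isinstance(payload, dict):
        return DEFAULT_SERVER_GROUP
    model = payload.get("model")
    if not isinstance(model, str):
        return DEFAULT_SERVER_GROUP
    return MODEL_TO_SERVER_GROUP.get(model, DEFAULT_SERVER_GROUP)


target = pick_server_group(b'{"model": "large-chat", "messages": []}')
```

Falling back to a default group for unknown or malformed requests keeps the sketch safe for traffic that does not follow the assumed body format.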
Instance performance metrics
An ALB instance allocates three IP addresses from each specified vSwitch: one virtual IP address (VIP) for client-facing services and two local IP addresses for backend communication and health checks.
To ensure that all elastic features of ALB are available, we recommend reserving at least eight IP addresses in each vSwitch where the ALB instance is deployed.
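The IP address guidance above can be checked before you create an instance. The following sketch assumes that you already know the number of free IP addresses in each vSwitch; the function name, the status labels, and the vSwitch IDs are hypothetical. Treating fewer than three free addresses as insufficient is an inference from the allocation rule above, not a documented error condition.

```python
IPS_ALLOCATED_PER_VSWITCH = 3  # one VIP plus two local IP addresses
RECOMMENDED_FREE_IPS = 8       # recommended reservation per vSwitch


def classify_vswitches(free_ips_by_vswitch):
    """Map each vSwitch ID to a status based on its free IP address count."""
    status = {}
    for vswitch_id, free_ips in free_ips_by_vswitch.items():
        if free_ips < IPS_ALLOCATED_PER_VSWITCH:
            status[vswitch_id] = "insufficient"
        elif free_ips < RECOMMENDED_FREE_IPS:
            status[vswitch_id] = "below-recommended"
        else:
            status[vswitch_id] = "ok"
    return status


# Hypothetical free-IP counts for three vSwitches.
report = classify_vswitches({"vsw-zone-a": 12, "vsw-zone-b": 5, "vsw-zone-c": 2})
```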
| VIP metrics | Maximum performance |
| --- | --- |
| Maximum requests per second (QPS) | 500,000 |
| Maximum new connections per second (CPS) | 200,000 |
| Maximum concurrent connections | 5,000,000 |
| Maximum private network bandwidth | 25 Gbps |
The default public bandwidth for a dual-availability-zone ALB instance is 400 Mbps. The total bandwidth of all EIPs associated with the ALB instance determines the actual public bandwidth.
In a single region, the total peak bandwidth of all pay-by-traffic EIPs under a single Alibaba Cloud account cannot exceed 5 Gbps. For more information, see the bandwidth cap section in pay-as-you-go.
If you require more bandwidth, you can purchase an Internet Shared Bandwidth instance. For more information about how to purchase an Internet Shared Bandwidth instance, see Create and manage Internet Shared Bandwidth.
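As a rough illustration of the bandwidth rules above, the following sketch sums the bandwidth of a set of EIPs and flags the case where the pay-by-traffic EIPs together exceed the 5 Gbps cap. The function name, the tuple layout, and the billing labels are assumptions made for this example. It also assumes that 1 Gbps equals 1,000 Mbps and that the list contains every pay-by-traffic EIP of the account in the region.

```python
PAY_BY_TRAFFIC_CAP_MBPS = 5_000  # 5 Gbps per account per region


def summarize_public_bandwidth(eips):
    """Summarize EIP bandwidth.

    eips: list of (bandwidth_mbps, billing) tuples, where billing is
    'traffic' (pay-by-traffic) or 'bandwidth' (pay-by-bandwidth).
    """
    total = sum(bw for bw, _ in eips)
    pay_by_traffic = sum(bw for bw, billing in eips if billing == "traffic")
    return {
        "total_mbps": total,
        "pay_by_traffic_mbps": pay_by_traffic,
        "exceeds_cap": pay_by_traffic > PAY_BY_TRAFFIC_CAP_MBPS,
    }


# Two hypothetical EIPs, one per availability zone.
summary = summarize_public_bandwidth([(200, "traffic"), (200, "traffic")])
```

When `exceeds_cap` is true, moving the EIPs to an Internet Shared Bandwidth instance is the option described above.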
Components of ALB Extensible Edition
| Concept | Description |
| --- | --- |
| Instance | An instance that operates at Layer 7 and provides powerful Layer 7 load balancing capabilities. It expands the service throughput of application systems by distributing traffic to different backend servers. A single instance can handle up to 1 million QPS. |
| Listener | A listener is the smallest service unit of an ALB instance. You must configure a protocol and port for a listener to process a specific type of traffic, such as HTTP traffic on port 80. Each ALB instance must have at least one listener to process and forward traffic. By default, you can configure up to 50 listeners for each ALB instance to handle different service traffic. |
| Forwarding rule | A forwarding rule determines how an ALB instance routes requests to backend servers in one or more server groups. ALB Extensible Edition supports multiple routing rules based on conditions such as domain names, paths, and HTTP headers. When associated with Service Extensions, it supports deep inspection of request bodies to implement application-layer content-aware scheduling. |
| Service Extensions | ALB Service Extensions lets you inject custom logic into the data forwarding path. Through a plugin mechanism and external service calls (callouts), you can execute business logic such as dynamic routing, authentication and authorization, content rewriting, and AI context awareness at key request processing nodes. ALB Extensible Edition provides a built-in component library that covers common scenarios, eliminating the need for application code modifications or additional proxy layers. |
| Server group | A server group is a logical group of backend servers that process business requests distributed by an ALB instance. In ALB, server groups are independent of ALB instances. You can attach the same server group to different ALB instances. Server groups in ALB Extensible Edition support the following backend server types: server (ECS, ECI, and elastic network interface (ENI)), IP, Function Compute, DNS name, and AI service. |
| Health check | ALB uses health checks to determine the operational status of backend servers. ALB detects unhealthy servers in a server group and stops distributing traffic to them. ALB supports flexible health check configurations, such as protocols, ports, and various health check thresholds. ALB also provides easy-to-apply health check templates for different server groups. |
| Credential management | The credential management feature of ALB Extensible Edition supports centralized creation and management of outbound credentials, which Key Management Service (KMS) encrypts and stores. When adding a backend service, you can directly reference these credentials. ALB automatically includes the credentials in forwarded requests for identity authentication. |
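The health check thresholds mentioned in the table can be pictured with a small state tracker. This sketch assumes the common model in which a server changes state only after a number of consecutive probe results that contradict its current state. The class name, the default thresholds, and the initial healthy state are assumptions, not documented ALB behavior.

```python
class HealthTracker:
    """Track the health of one backend server from successive probe results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True  # assumed initial state
        self._streak = 0     # consecutive probes that contradict the current state

    def record(self, probe_ok):
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            # The probe agrees with the current state; reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if probe_ok else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=2, unhealthy_threshold=3)
states = [tracker.record(ok) for ok in [False, False, False, True, True]]
```

In this run the server stays healthy through two failed probes, becomes unhealthy on the third, and returns to healthy after two consecutive successes.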
Instance types
Alibaba Cloud provides Internet-facing and internal-facing ALB instances. You can configure an Internet-facing or internal-facing ALB instance based on your business scenario. Your selection determines whether the instance uses Internet Shared Bandwidth and Elastic IP Addresses.
| Concept | Description |
| --- | --- |
| VIP (virtual IP address) | The service endpoint that an ALB instance uses to distribute traffic. Each VIP is a private IP address in a Virtual Private Cloud (VPC). |
| EIP | An Elastic IP Address (EIP) is required only when you create an Internet-facing ALB instance. It is the IP address that an ALB instance uses to provide services over the internet. An Internet-facing ALB instance can have multiple EIPs. For high availability, an Internet-facing ALB instance must have at least two EIPs in different availability zones. |
| Internet Shared Bandwidth | Internet Shared Bandwidth provides region-level bandwidth sharing and reuse. You can add EIPs in the same region to an Internet Shared Bandwidth instance to reuse its bandwidth and reduce public network costs. |
| Domain name | A domain name that resolves over the internet or within a private network to the EIP or VIP of an ALB instance. You must map your domain name to the domain name of the ALB instance by using a CNAME record. For more information, see Configure a CNAME record for an ALB instance. |
Enable ALB Extensible Edition
- Apply for a quota: Apply for the ALB Extensible Edition privilege. Once approved, you can use the service.
- Get started: Log on to the Application Load Balancer (ALB) console. Then, create and configure an ALB Extensible Edition instance.
ALB Extensible Edition is in public preview. For more information and to apply for access, see Public Preview Description.
Usage notes
ALB Extensible Edition instances do not support client access from the 33.0.0.0/8 and 22.0.0.0/16 CIDR blocks.
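To check whether a given client address falls in one of these unsupported ranges, you could use a sketch like the following, which relies only on the Python standard library `ipaddress` module. The function name is hypothetical.

```python
import ipaddress

# CIDR blocks from the usage note above.
UNSUPPORTED_CLIENT_CIDRS = [
    ipaddress.ip_network("33.0.0.0/8"),
    ipaddress.ip_network("22.0.0.0/16"),
]


def is_unsupported_client(ip):
    """Return True if the client IP address is in an unsupported CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in UNSUPPORTED_CLIENT_CIDRS)


blocked = is_unsupported_client("33.1.2.3")
```

Note that `22.0.0.0/16` covers only addresses from 22.0.0.0 to 22.0.255.255, so an address such as 22.1.0.1 is outside the block.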