Platform For AI: Dedicated gateway capacity and QPS

Last Updated: Apr 01, 2026

Select a gateway specification based on two complementary dimensions: capacity thresholds and QPS performance. Capacity thresholds define how many connections and how much bandwidth a gateway node can sustain. QPS performance defines how many requests per second the gateway can process. In most cases, a specification that satisfies one dimension also satisfies the other. If only one dimension is the bottleneck, upgrade the specification until both dimensions meet your traffic requirements.

Capacity thresholds

The following table lists capacity thresholds per gateway node. Keep all gateway capacity metrics below the security thresholds for production workloads.

Deploy at least two nodes per gateway to meet your service-level agreement (SLA) targets. A single-node deployment cannot guarantee SLA compliance. When two or more nodes are deployed, thresholds apply per node based on each node's specification.

Threshold behavior:

| Threshold | Behavior |
| --- | --- |
| Security | The gateway maintains high throughput and low latency even if traffic doubles. |
| Warning | Latency may increase, and traffic spikes can introduce stability risks. |
| Overload | The gateway rejects new connections to protect stability. |

Capacity thresholds per node:

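| Gateway capacity metric | Threshold | 2 cores, 4 GiB | 4 cores, 8 GiB | 8 cores, 16 GiB | 16 cores, 32 GiB |
| --- | --- | --- | --- | --- | --- |
| Number of client connections | Security | 12,000 | 24,000 | 48,000 | 96,000 |
| Number of client connections | Warning | 24,000 | 48,000 | 96,000 | 192,000 |
| Number of client connections | Overload | 40,000 | 80,000 | 160,000 | 320,000 |
| New HTTPS connections per second | Security | 400 | 800 | 1,600 | 3,200 |
| New HTTPS connections per second | Warning | 800 | 1,600 | 3,200 | 6,400 |
| New HTTPS connections per second | Overload | | | | |
| Network bandwidth (Gbit/s) | Security | 1 | 2 | 4 | 8 |
| Network bandwidth (Gbit/s) | Warning | 1 | 2 | 4 | 8 |
| Network bandwidth (Gbit/s) | Overload | | | | |
| CPU utilization | Security | 30% | 30% | 30% | 30% |
| CPU utilization | Warning | 60% | 60% | 60% | 60% |
| CPU utilization | Overload | 90% | 90% | 90% | 90% |
| Memory usage | Security | 75% | 75% | 75% | 75% |
| Memory usage | Warning | 75% | 75% | 75% | 75% |
| Memory usage | Overload | 90% | 90% | 90% | 90% |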
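
As a rough sketch of how the per-node connection thresholds could be applied in a script, the Python snippet below reports the highest threshold that an observed per-node connection count has reached, and estimates how many nodes keep that count below the security threshold. The function names and the short spec keys (for example `2c4g` for 2 cores, 4 GiB) are hypothetical, the threshold numbers are copied from the table above, and the node estimate assumes connections are spread evenly across nodes, which this page does not state.

```python
# Per-node client-connection thresholds, copied from the capacity table.
# Keys are shorthand for the node specification (cores, memory in GiB).
CONNECTION_THRESHOLDS = {
    "2c4g": {"security": 12_000, "warning": 24_000, "overload": 40_000},
    "4c8g": {"security": 24_000, "warning": 48_000, "overload": 80_000},
    "8c16g": {"security": 48_000, "warning": 96_000, "overload": 160_000},
    "16c32g": {"security": 96_000, "warning": 192_000, "overload": 320_000},
}


def highest_threshold_reached(spec, connections_per_node):
    """Return the highest threshold the count has reached, or None.

    None means the count is still below the security threshold.
    """
    thresholds = CONNECTION_THRESHOLDS[spec]
    reached = None
    for name in ("security", "warning", "overload"):
        if connections_per_node >= thresholds[name]:
            reached = name
    return reached


def min_nodes_below_security(spec, total_connections):
    """Estimate the node count that keeps each node below the security threshold.

    Assumption (not from the source): connections are balanced evenly.
    The result is never less than 2, the minimum the page recommends.
    """
    security = CONNECTION_THRESHOLDS[spec]["security"]
    # Integer division plus one keeps the per-node share strictly below
    # the threshold, even when the total is an exact multiple of it.
    nodes = total_connections // security + 1
    return max(2, nodes)
```

For example, under these assumptions `min_nodes_below_security("2c4g", 30_000)` evaluates to 3, because two nodes would each carry 15,000 connections, which is above the 12,000 security threshold.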

QPS performance

The following tables list conservative QPS values, measured while CPU utilization stays below 30%. Actual throughput varies with response size, HTTPS usage, and gzip compression.

New HTTPS connections are CPU-intensive. If your workload opens a large number of concurrent HTTPS connections at once, use the short-lived connections table to assess gateway capacity.

The gzip compression feature is available to allowlisted users only. To request access, submit a ticket.

Short-lived connections — 1 KB response

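| HTTPS | gzip | 2c4g x 3 nodes | 2c4g x 5 nodes | 4c8g x 3 nodes | 4c8g x 5 nodes | 8c16g x 3 nodes | 8c16g x 5 nodes | 16c32g x 3 nodes | 16c32g x 5 nodes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No | No | 5,200 | 8,700 | 10,500 | 17,500 | 21,000 | 35,000 | 42,000 | 70,000 |
| Yes | No | 1,600 | 2,700 | 3,200 | 5,500 | 6,500 | 11,000 | 13,000 | 22,000 |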

Persistent connections — 1 KB response

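| HTTPS | gzip | 2c4g x 3 nodes | 2c4g x 5 nodes | 4c8g x 3 nodes | 4c8g x 5 nodes | 8c16g x 3 nodes | 8c16g x 5 nodes | 16c32g x 3 nodes | 16c32g x 5 nodes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No | No | 6,500 | 10,800 | 13,000 | 21,700 | 26,000 | 43,500 | 52,000 | 87,000 |
| Yes | No | 6,000 | 10,000 | 12,000 | 20,000 | 24,000 | 40,000 | 48,000 | 80,000 |
| Yes | Yes | 5,200 | 8,700 | 10,500 | 17,500 | 21,000 | 35,000 | 42,000 | 70,000 |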

Persistent connections — 10 KB response

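| HTTPS | gzip | 2c4g x 3 nodes | 2c4g x 5 nodes | 4c8g x 3 nodes | 4c8g x 5 nodes | 8c16g x 3 nodes | 8c16g x 5 nodes | 16c32g x 3 nodes | 16c32g x 5 nodes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No | No | 5,600 | 9,300 | 11,200 | 18,700 | 22,500 | 37,500 | 45,000 | 75,000 |
| Yes | No | 5,300 | 9,000 | 10,700 | 18,000 | 21,500 | 36,000 | 43,000 | 72,000 |
| Yes | Yes | 3,100 | 5,200 | 6,200 | 10,500 | 12,500 | 21,000 | 25,000 | 42,000 |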
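
To illustrate how the QPS tables above might be used for sizing, the following Python sketch looks up the row that matches a traffic profile and returns the first configuration, in table column order, whose listed QPS meets a target. The function name, the dictionary layout, and the `short`/`persistent` keys are hypothetical. The numbers are copied from the three tables, and the lookup only covers the combinations those tables list.

```python
# Column order shared by all three QPS tables: (node spec, node count).
CONFIGS = [
    ("2c4g", 3), ("2c4g", 5), ("4c8g", 3), ("4c8g", 5),
    ("8c16g", 3), ("8c16g", 5), ("16c32g", 3), ("16c32g", 5),
]

# Rows keyed by (connection type, response size in KB, HTTPS, gzip).
QPS_TABLE = {
    ("short", 1, False, False): [5200, 8700, 10500, 17500, 21000, 35000, 42000, 70000],
    ("short", 1, True, False): [1600, 2700, 3200, 5500, 6500, 11000, 13000, 22000],
    ("persistent", 1, False, False): [6500, 10800, 13000, 21700, 26000, 43500, 52000, 87000],
    ("persistent", 1, True, False): [6000, 10000, 12000, 20000, 24000, 40000, 48000, 80000],
    ("persistent", 1, True, True): [5200, 8700, 10500, 17500, 21000, 35000, 42000, 70000],
    ("persistent", 10, False, False): [5600, 9300, 11200, 18700, 22500, 37500, 45000, 75000],
    ("persistent", 10, True, False): [5300, 9000, 10700, 18000, 21500, 36000, 43000, 72000],
    ("persistent", 10, True, True): [3100, 5200, 6200, 10500, 12500, 21000, 25000, 42000],
}


def first_config_meeting(target_qps, connection, response_kb, https, gzip):
    """Return (spec, nodes, listed_qps) for the first column that meets the target.

    Returns None when no listed configuration reaches target_qps.
    Raises KeyError for a profile that the tables do not cover.
    """
    row = QPS_TABLE[(connection, response_kb, https, gzip)]
    for (spec, nodes), listed_qps in zip(CONFIGS, row):
        if listed_qps >= target_qps:
            return spec, nodes, listed_qps
    return None
```

For instance, `first_config_meeting(9000, "persistent", 1, True, False)` returns `("2c4g", 5, 10000)`. Because the listed values are conservative and real throughput depends on the workload, a result like this is a starting point to validate against the capacity thresholds rather than a guarantee.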