The number of queries per second (QPS) that a service can process is different from the maximum number of on-demand instances that can be used at a specific point in time.
You can estimate the required number of on-demand instances by using one of the following formulas, depending on the instance concurrency:
A single instance with concurrency = 1: An instance processes one request at a time.
Number of required instances = Requests per second × Request processing time (seconds)
For example, assume that the service receives 10,000 requests per second. If the average processing time per request is 1 second, up to 10,000 on-demand instances are required (10,000 × 1 = 10,000). If the average processing time per request is 10 milliseconds (0.01 seconds), up to 100 on-demand instances are required (10,000 × 0.01 = 100).
A single instance with concurrency > 1: An instance processes multiple requests at the same time.
Number of required instances = Requests per second × Request processing time (seconds) / Instance concurrency
For example, assume that the service receives 10,000 requests per second and the instance concurrency is 10. If the average processing time per request is 1 second, up to 1,000 on-demand instances are required (10,000 × 1 / 10 = 1,000). If the average processing time per request is 10 milliseconds (0.01 seconds), up to 10 on-demand instances are required (10,000 × 0.01 / 10 = 10).
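The two formulas above can be sketched as a single helper, because the single-concurrency case is the same calculation with a concurrency of 1. This is a minimal illustration, not part of any Alibaba Cloud SDK: the function name `required_instances` and its parameters are hypothetical, and rounding the result up to a whole instance is an assumption that the formulas do not state explicitly.

```python
import math


def required_instances(requests_per_second, processing_time_s, concurrency=1):
    """Estimate the number of on-demand instances needed (hypothetical helper).

    Applies: requests per second x processing time (seconds) / concurrency.
    """
    if requests_per_second < 0 or processing_time_s < 0:
        raise ValueError("request rate and processing time must be non-negative")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")

    # Requests in flight at any moment, spread across the slots of each instance.
    estimate = requests_per_second * processing_time_s / concurrency

    # Round away floating-point noise first, then round up to a whole instance
    # (assumption: a fractional instance still requires one full instance).
    return math.ceil(round(estimate, 9))


if __name__ == "__main__":
    # The four scenarios from the examples above.
    for time_s, concurrency in [(1, 1), (0.01, 1), (1, 10), (0.01, 10)]:
        count = required_instances(10_000, time_s, concurrency)
        print(f"time={time_s}s, concurrency={concurrency}: {count} instances")
```

The result is an estimate for steady traffic based on the average processing time; bursts or slower-than-average requests may require more instances.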
For more information, see Configure instance concurrency.
Note: By default, each Alibaba Cloud account can run up to 100 instances in each region. The actual quota displayed on the General Quotas page in the Quota Center console prevails. You can increase the quota in the Quota Center console.
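To see what an instance quota means in terms of traffic, the instance formula can be rearranged to solve for the request rate: Requests per second = Instance quota × Instance concurrency / Request processing time (seconds). The sketch below is illustrative only. The function name `max_requests_per_second` is hypothetical, and the quota of 100 is the default mentioned in the note; the quota on your account may differ.

```python
def max_requests_per_second(instance_quota, processing_time_s, concurrency=1):
    """Estimate the request rate that a given instance quota can sustain.

    Hypothetical helper that rearranges the instance formula:
    requests per second = quota x concurrency / processing time (seconds).
    """
    if instance_quota < 0:
        raise ValueError("instance quota must be non-negative")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return instance_quota * concurrency / processing_time_s


if __name__ == "__main__":
    default_quota = 100  # default per-region quota from the note; yours may differ
    # 1-second requests, one request per instance at a time.
    print(max_requests_per_second(default_quota, 1))
    # 10-millisecond requests with an instance concurrency of 10.
    print(max_requests_per_second(default_quota, 0.01, concurrency=10))
```

If the estimated traffic exceeds this rate, either raise the instance concurrency, reduce the processing time, or request a higher quota.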