To ensure the stability of Alibaba Cloud services and the fair use of cloud resources, Elastic Compute Service (ECS) throttles API requests. This topic describes how to view the throttling thresholds of ECS API operations and provides suggestions on how to avoid and handle throttling.

Throttling range

Throttling applies to all ECS API operations. Each API operation has its own throttling threshold, which is set per region. Access traffic generated by calls to an API operation in a single region within the same Alibaba Cloud account cannot exceed the system-defined throttling threshold. Otherwise, the system denies requests to that API operation that are initiated within the unit of time during which throttling is in effect.

Access traffic includes traffic from the ECS console, Resource Access Management (RAM) users, self-managed platforms based on Alibaba Cloud accounts, and Infrastructure as a Service (IaaS) orchestration platforms such as Terraform and Ansible.

You can log on to the Quota Center and click Elastic Compute Service on the Products with API Rate Limits page to view the throttling thresholds of different ECS API operations.
Note: Only the throttling thresholds of specific ECS API operations can be viewed. Throttling thresholds of more ECS API operations will be made available soon.

Throttling rules

Take note of the following API throttling rules:
  • Traffic generated by calls to each API operation is independently calculated. If traffic on a single API operation within a region reaches the throttling threshold, calls to other API operations and calls to this API operation in other regions are not affected.
  • If an API operation is throttled at minute T, the operation can continue to be called as of minute T+1.
  • If an API operation is throttled and returns an error when it is called by using an SDK or Alibaba Cloud CLI, the corresponding features in the ECS console are throttled as well.
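
The rules above can be pictured with a small client-side model. This is a minimal sketch, not the algorithm that ECS uses: it assumes fixed one-minute windows, and the `MinuteWindowCounter` class and the threshold values are hypothetical. It only illustrates that each operation is counted independently per region and that a throttled operation can be called again in the next minute.

```python
from collections import defaultdict


class MinuteWindowCounter:
    """Hypothetical client-side model: one counter per operation, region, and minute."""

    def __init__(self, thresholds):
        # thresholds maps (operation, region) to calls allowed per minute.
        # The values are illustrative; real thresholds are shown in Quota Center.
        self.thresholds = thresholds
        self.counts = defaultdict(int)

    def try_call(self, operation, region, now_seconds):
        """Return True if the call fits in the budget of the current minute."""
        minute = int(now_seconds // 60)
        key = (operation, region, minute)
        if self.counts[key] >= self.thresholds[(operation, region)]:
            return False  # this call would be throttled in minute T
        self.counts[key] += 1
        return True


counter = MinuteWindowCounter({
    ("DescribeInstances", "cn-hangzhou"): 2,
    ("DescribeInstances", "cn-beijing"): 2,
    ("StartInstance", "cn-hangzhou"): 1,
})
```

In this model, a third call to DescribeInstances in cn-hangzhou within the same minute is rejected, while calls to the same operation in cn-beijing, calls to StartInstance, and calls made in the next minute still succeed.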

Suggestions

When traffic generated by ECS API requests reaches the throttling threshold, the Throttling error code is returned and the affected requests are not processed. Therefore, when you build an IaaS platform, plan how and how often the platform calls API operations. The following suggestions are provided:
  • Request aggregation

    Some ECS API operations can be called to batch query or perform batch operations on resources. We recommend that you call these API operations to query multiple resources or perform operations on multiple resources at a time.

  • Fixed call frequency
    If you want to call API operations to check the states of resources, we recommend that you call them at a fixed interval or use the reverse backoff mechanism. Examples:
    • In most cases, the interval at which resource states (such as Starting and Stopping) are checked is 1 to 2 seconds.
    • In the reverse backoff mechanism, you do not check the state of a resource for several seconds after operations are performed on the resource, and then gradually increase the check frequency until a fixed interval of 1 to 2 seconds is reached.
  • Backoff retry policy

    If an error code is returned due to throttling when you call an API operation, you must configure a backoff retry policy for requests to that API operation. When you retry an API operation within the same Alibaba Cloud account, you can send requests at a rate of one query per second (1 QPS) to check whether the API operation is available again.
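
The three suggestions above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than an official implementation: `ThrottlingError` is a hypothetical exception that your own client wrapper would raise when it sees the Throttling error code, the exponential growth factor and the cap are assumed values, and the batch size passed to `chunked` must match the limit of the batch operation that you call.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical exception raised by your wrapper for the Throttling error code."""


def chunked(ids, size):
    """Request aggregation: split IDs into batches for one batch call per batch."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def poll_intervals(initial_wait=8.0, floor=1.0, checks=6):
    """Reverse backoff: start with a long wait and halve it down to a fixed floor."""
    intervals, wait = [], initial_wait
    for _ in range(checks):
        intervals.append(wait)
        wait = max(floor, wait / 2)
    return intervals


def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=5):
    """Backoff retry: delays in seconds that grow exponentially up to `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_backoff(func, delays=None, sleep=time.sleep, jitter=0.0):
    """Call func(); on ThrottlingError wait, then retry. The last attempt may raise."""
    delays = backoff_delays() if delays is None else delays
    for delay in delays:
        try:
            return func()
        except ThrottlingError:
            # Optional random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, jitter))
    return func()  # final attempt; a ThrottlingError propagates to the caller
```

With the default arguments, `backoff_delays()` starts at 1 second, which matches the 1 QPS retry rate mentioned above, and `poll_intervals()` settles at a 1-second check interval. The `sleep` parameter exists so that the retry loop can be tested without waiting.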

API request throttling based on resources

When you call the CreateInstance or RunInstances operation to create one or more ECS instances, you must consider the limits on the number of resources in addition to API request throttling. You can create up to 5,000 ECS instances within 1 minute within an Alibaba Cloud account. In real-world scenarios, if API requests to create a total of 5,000 instances are submitted within 1 minute, it may take more than 1 minute for the instances to be created and enter the Running state.
Note: This limit of 5,000 instances per minute indicates that API requests to create up to 5,000 instances can be submitted within 1 minute. It does not mean that this number of instances is created and enters the Running state within 1 minute.
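
As a rough illustration of planning around this limit, the following sketch splits a large creation job into per-minute batches. The `plan_submissions` function is hypothetical, and the default of 100 instances per request is an assumed per-call maximum that you should check against the documentation of the operation that you use. The plan covers only the resource limit: the number of requests per minute must also stay below the API throttling threshold of the operation.

```python
def plan_submissions(total, per_request=100, per_minute_cap=5000):
    """Return one list per minute; each entry is the instance count of one request."""
    if total < 0 or per_request <= 0 or per_minute_cap <= 0:
        raise ValueError("total must be >= 0 and the limits must be > 0")
    plan, remaining = [], total
    while remaining > 0:
        # Never submit more than the per-minute cap in one minute.
        budget = min(remaining, per_minute_cap)
        remaining -= budget
        minute = []
        while budget > 0:
            amount = min(per_request, budget)
            minute.append(amount)
            budget -= amount
        plan.append(minute)
    return plan


plan = plan_submissions(12050)
```

For 12,050 instances, the plan spans three minutes: 5,000 instances in each of the first two minutes and 2,050 in the third.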