When you design applications that upload data to or download data from OSS, we recommend that you use the design patterns and performance guidelines described in this topic to optimize the performance of your applications.

Cache frequently accessed content

If a large number of users of your application in the same region need to download a static object from OSS at the same time, you can cache the object on the edge nodes of Alibaba Cloud Content Delivery Network (CDN). This way, users obtain a cached copy of the object from the nearest edge node, which reduces latency and improves download speed. Applications that use caches also send fewer requests to OSS, which reduces request fees.

Alibaba Cloud CDN is a distributed network built on top of the bearer network. CDN caches resources from the origin server on edge nodes in different regions so that users can obtain the resources from the edge node nearest to their location instead of from the origin. This reduces direct access to the origin.

Establish timeout retry mechanisms for latency-sensitive applications

OSS limits the queries per second (QPS) of management operations, such as GetService (ListBuckets), PutBucket, and GetBucketLifecycle. If your application initiates a large number of such requests at the same time, OSS may return HTTP status code 503 to indicate that the requests are too frequent. In this case, we recommend that you retry the requests after a few seconds.
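As a sketch of such a retry, the helper below retries a call a few seconds after the server answers with HTTP 503. The `send` callable is hypothetical and stands in for any management operation issued through your SDK or HTTP client; the function name and default delay are illustrative, not part of the OSS SDK:

```python
import time

def with_503_retry(send, retries=3, delay=3.0):
    """Call send() and retry a few seconds after each HTTP 503 response.

    send is a hypothetical zero-argument callable that performs one
    management request (for example, ListBuckets) and returns
    (status_code, body).
    """
    for attempt in range(retries + 1):
        status, body = send()
        if status != 503 or attempt == retries:
            return status, body
        time.sleep(delay)  # wait a few seconds before retrying, as recommended
```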

OSS limits the QPS of requests sent by an Alibaba Cloud account to 10,000. If you require a higher QPS, contact technical support. OSS also imposes a limit on the QPS of requests sent to a single partition. If you send a large number of requests at the same time to access objects within the same partition, OSS may return HTTP status code 503 even if the total QPS of the requests does not exceed 10,000. In this case, wait for a short period of time and then resend the requests. To avoid this problem, configure different prefixes for your object names. This way, OSS creates more partitions to store the objects and can sustain a higher QPS. For more information about how to configure prefixes for objects, see OSS performance and scalability best practices.
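One common way to spread object names across partitions is to prepend a short hash-based prefix to each key. The sketch below illustrates the idea; the two-character prefix width and the use of MD5 are assumptions for illustration, not OSS requirements:

```python
import hashlib

def sharded_key(key):
    """Prepend a short hash prefix so that object names no longer share
    one common prefix, letting OSS spread them across more partitions.
    """
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:2]
    return f"{prefix}/{key}"
```

Sequential keys such as `logs/0001` and `logs/0002` map to prefixes scattered across the keyspace, while the original key remains recoverable after the first `/`.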

If you send a large number of requests of mixed sizes, especially requests larger than 128 MB, we recommend that you measure the throughput and retry the slowest 5% of the requests. In general, responses to small requests, such as requests smaller than 512 KB, are returned within tens of milliseconds. If you need to retry GET or PUT requests, we recommend that you retry a request 2 seconds after it is sent. If a request fails multiple times, we recommend that you back off and retry the request at increasing intervals. For example, retry a request 2 seconds after it is sent, and if it fails again, retry it after another 4 seconds.
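The two tactics above can be sketched as two small helpers: one computes the latency threshold that marks the slowest 5% of observed requests, and the other produces the doubling retry delays (2 s, then 4 s, and so on). Both are illustrative, not part of any OSS SDK:

```python
import statistics

def slow_threshold(latencies, percentile=95):
    """Latency above which a request falls into the slowest (100 - percentile)%.

    latencies is a list of observed request latencies in seconds.
    """
    # quantiles(n=100) returns the 99 percentile cut points.
    return statistics.quantiles(latencies, n=100)[percentile - 1]

def backoff_schedule(first_delay=2.0, attempts=3):
    """Doubling retry delays: wait 2 s before the first retry, 4 s before the next."""
    return [first_delay * 2 ** i for i in range(attempts)]
```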

If the requests sent by your application are the same in size and you want the response time of all requests to be consistent, we recommend that you identify and retry the slowest 1% of the requests. Retrying these outliers reduces their response time and keeps the overall response time consistent.

Send requests in a distributed and concurrent manner for high throughput

OSS is a large-scale distributed storage system. To make full use of the throughput of OSS, we recommend that you send requests to OSS concurrently and distribute them across multiple OSS service nodes. This way, the workload is distributed across multiple network paths.

To increase throughput during data transmission, we recommend that you create multiple threads or instances and use them to initiate concurrent upload and download requests. How you distribute the requests depends on the architecture of your application and the structure of the objects that you want to access.
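As a minimal sketch of concurrent downloads, the helper below splits an object into byte ranges and fetches them in a thread pool. The `fetch_range` callable is hypothetical and stands in for an HTTP range GET against OSS; any SDK's range-get call can be plugged in:

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total_size, part_size):
    """Yield inclusive (start, end) byte ranges that cover the object."""
    for start in range(0, total_size, part_size):
        yield start, min(start + part_size, total_size) - 1

def download_concurrently(fetch_range, total_size,
                          part_size=8 * 1024 * 1024, workers=4):
    """Fetch all ranges in parallel threads and reassemble the object.

    fetch_range(start, end) is assumed to return the bytes of that range.
    """
    ranges = list(split_ranges(total_size, part_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)
```

The same pattern applies to uploads, for example by sending the parts of a multipart upload from the thread pool.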

Before you adjust the number of concurrent requests, measure the resources that a single request consumes. We recommend that you first measure the bandwidth and other resources consumed by a single request so that you can identify the bottleneck resource and determine the number of concurrent requests based on that resource. For example, if processing one request requires 10% of the CPU, you can send up to 10 concurrent requests.
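The arithmetic in the example above can be captured in a small helper that divides available capacity by per-request consumption; the 10% CPU figure is the topic's own example, and the function name is illustrative:

```python
def max_concurrency(per_request_fraction, capacity=1.0):
    """Number of concurrent requests the bottleneck resource can sustain.

    per_request_fraction: the share of the bottleneck resource (CPU,
    bandwidth, and so on) that one in-flight request consumes.
    """
    if per_request_fraction <= 0:
        raise ValueError("per_request_fraction must be positive")
    return int(capacity / per_request_fraction)
```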

Use transfer acceleration to speed up uploads and downloads over long distances

OSS uses data centers distributed around the globe to implement transfer acceleration. When a request is sent to your bucket, it is parsed and routed over an optimal network path and protocol to the data center where the bucket resides. The transfer acceleration feature provides an optimized end-to-end solution for accessing OSS over the Internet. For more information, see Transfer acceleration.

You can visit The Comparison of OSS Direct Data Transfer and Accelerated Data Transfer in Different Regions to compare the access speeds of the accelerate endpoint and the default endpoint in different regions. Transfer acceleration fees are calculated separately based on the outbound traffic over the Internet. For example, if you use the accelerate endpoint of a bucket for which transfer acceleration is enabled to download 1 GB of data, you are charged both transfer acceleration fees and outbound traffic fees for the 1 GB of data.
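Switching to transfer acceleration is only an endpoint change in your client configuration. As an illustration, the helper below builds the endpoint URL; the regional form `oss-<region>.aliyuncs.com` and the global accelerate endpoint `oss-accelerate.aliyuncs.com` follow the public OSS endpoint naming, but verify the exact endpoint for your region before use:

```python
def oss_endpoint(region, accelerate=False):
    """Return the endpoint URL to pass to an OSS SDK client.

    The accelerate endpoint is global: requests sent to it are routed
    to the data center where the bucket resides over an optimized path.
    """
    if accelerate:
        return "https://oss-accelerate.aliyuncs.com"
    return f"https://oss-{region}.aliyuncs.com"
```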