Three Strategies of High Concurrency Architecture Design - Part 1: Caching

This article discusses high concurrency and explains the strategies for dealing with it. Part 1 of this article focuses on caching.


By Luotian


Background of High Concurrency

With the rapid growth of the Internet industry, there has been a significant increase in the number of users, resulting in immense pressure on systems due to concurrent requests.

Software systems strive for three goals: high performance, high concurrency, and high availability. The three goals are distinct but related, and each involves a broad set of skills and knowledge. A comprehensive discussion of all three would take too long, so this article focuses on high concurrency.

High Concurrency Challenge to Systems

High concurrency poses challenges to systems such as performance degradation, resource contention, and stability issues.

What Is High Concurrency?

Definition of High Concurrency

High concurrency refers to the ability of a system or application to handle a large number of concurrent requests in the same period. Specifically, systems in high-concurrency environments need to process a large number of requests simultaneously without noticeable performance degradation or added response latency.

Features of High Concurrency

  1. Large number of requests: in high-concurrency scenarios, the system needs to process a large number of requests at the same time. These requests may come from different users or clients.
  2. Simultaneous access: these requests arrive at the system almost simultaneously and need to be processed and responded to in a short time.
  3. Resource contention: due to a large number of requests arriving at the same time, system resources such as CPU, memory, and network bandwidth may face competition and contention.
  4. High requirements for response time: high-concurrency scenarios usually require fast system response, because users hope to obtain response results quickly.

Scenarios and Applications of High Concurrency

High-concurrency scenarios are commonly found in popular websites, e-commerce platforms, and social media applications on the Internet. For instance, on e-commerce platforms, there is a significant number of users simultaneously browsing, searching for products, and placing orders. Similarly, on social media platforms, there is a large influx of users posting, liking, and commenting concurrently. These situations require the system to efficiently handle a substantial number of requests while ensuring optimal system performance, availability, and user experience.

Impact of High Concurrency

• Decreased system performance and increased latency

• Resource contention and exhaustion

• Challenges to system stability and availability

Strategies for Dealing with High Concurrency

Caching: relieve system load pressure and improve system response speed.

Rate limiting: control concurrent page views and protect the system from overload.

Degradation: guarantee the stability of core features, discard non-critical business or simplify processing.



In the development of websites and apps, a caching mechanism is indispensable for improving access speed and reducing the pressure on databases. In a high-concurrency environment, its role is even more pronounced: it not only reduces the database load but also improves the stability and performance of the system, which gives users a better experience.

How it Works

The principle of caching is to look for the data in the cache first. If the data exists in the cache, it is returned directly to the user. If the data does not exist in the cache, the actual data is read from the slow device, returned, and put into the cache for subsequent requests.
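The read path described above can be sketched as follows. This is a minimal illustration, not a production implementation: the `cache` and `slow_store` dictionaries, the `stats` counter, and the function names are hypothetical stand-ins for a real cache and a real slow device such as a database.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for a cache and a slow backing store.
cache: Dict[str, str] = {}
slow_store: Dict[str, str] = {"user:1": "Alice"}
stats = {"slow_reads": 0}  # counts how often the slow device is read


def read_from_slow_store(key: str) -> Optional[str]:
    """Simulate a read from the slow device (for example, a database)."""
    stats["slow_reads"] += 1
    return slow_store.get(key)


def get(key: str) -> Optional[str]:
    """Return the cached value if present; otherwise load it and cache it."""
    if key in cache:                      # cache hit
        return cache[key]
    value = read_from_slow_store(key)     # cache miss: read the slow device
    if value is not None:
        cache[key] = value                # populate the cache for next time
    return value
```

Calling `get("user:1")` twice reads the slow store only once; the second call is served from the cache.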

Common Technology

Browser Caching


Browser caching refers to storing resources in a webpage such as HTML, CSS, JavaScript, and images in users' browsers so that the same resource can be directly obtained from the local cache when subsequent requests are made, instead of downloading it from the server again.


Browser caching is suitable for static web pages with only a few content changes and static resources. In these scenarios, browser caching can significantly improve website performance and user experience, and reduce server load.

Common Usage

With browser caching, you can control cache behavior by setting the Expires and Cache-Control fields in the response header.

  1. Using the Expires field: the Expires field specifies the expiration time of the cached resource as a specific date and time. The server can add the Expires field to the response header to tell the browser that, until that time, the resource can be obtained directly from the cache without sending a request to the server. For example: Expires: Sat, 31 Dec 2022 23:59:59 GMT.
  2. Using the Cache-Control field: the Cache-Control field provides more flexible caching control options. You can specify the maximum valid time of the cache (unit: seconds) by setting the max-age directive. For example: Cache-Control: max-age=3600 means that the resource can be obtained directly from the cache within 1 hour. You can also use other directives, such as no-cache, which allows the response to be stored but requires revalidation with the server before each reuse, and no-store, which disables caching.
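As a rough illustration of the two fields, the following sketch builds both headers for a given lifetime. The function name `cache_headers` and its parameters are hypothetical; only the standard-library `email.utils.formatdate` call is a real API, used here to produce an HTTP date in GMT.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Build response headers that let a browser reuse a resource for max_age_seconds."""
    if now is None:
        now = time.time()
    return {
        # Relative lifetime in seconds; when present, it takes precedence over Expires.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute expiry as an HTTP date, always expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

A server would attach the returned dictionary to the response for a static resource. Sending both fields is a common choice so that older clients that only understand Expires still cache the resource.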


Browser caching stores data that is not time-sensitive, such as product frames, seller ratings, comments, and advertisement words. The cached data has an expiration time that is controlled by the response header. Data with high real-time requirements is not suitable for browser caching.

Client-side Caching


Client-side caching refers to storing data on the client, such as in the browser or the app, to improve access speed and reduce server requests.


Before a big sales promotion, to protect the server from heavy traffic pressure, you can send materials such as JS, CSS, and images to the client in advance for caching, so that they are not requested again during the promotion. In addition, some background data or style files can be stored in the client-side cache so that the app keeps working normally when server or network exceptions occur.

CDN Caching


A Content Delivery Network (CDN) is a distributed network built on a bearer network. It consists of edge node servers distributed in different regions.

CDN caching is usually used to store static page data, event pages, and images. It has two caching mechanisms: the push mechanism which actively pushes data to the CDN nodes and the pull mechanism which pulls data from the source server and stores the data in the CDN nodes at the first access.



CDN caching can improve website access speed. It is suitable for scenarios where traffic is heavy, access from users to the source server is slow, and the data changes infrequently.

Common Tools and Usage

Common CDN caching tools include Cloudflare, Akamai, Fastly, and AWS CloudFront. These tools provide a globally distributed CDN network to accelerate content delivery and improve performance. They provide consoles and APIs for configuring CDN caching rules, managing cached content, and refreshing and updating the cache.

Reverse Proxy Caching


Reverse proxy caching refers to storing the response to requests in the reverse proxy server to improve service performance and user experience. It stores frequently requested static content in the proxy server, and when a user requests the same content, the proxy server returns the cached response directly without requesting the source server again.


This method is suitable for scenarios where the speed of accessing external services is slow but data changes are infrequent.

Common Tools and Usage
  1. NGINX: a high-performance reverse proxy server that supports reverse proxy caching. You can set the caching policy through the configuration file. The NGINX proxy layer caching is mainly configured by using the HTTP module and the proxy_cache module.
  2. Varnish: a dedicated open-source software for reverse proxy caching. It caches efficiently and provides fast responses.
  3. Squid: a powerful caching proxy server that supports reverse proxy caching and forward proxy caching.

Local Caching


Local caching refers to storing data or resources in client storage media, such as hard disks, memory, or databases. It can be temporary, that is, it is valid only within the duration of the application. It can also be persistent, that is, it remains valid for different application sessions.


Local caching is suitable for scenarios where data is frequently accessed, or where offline access, bandwidth consumption reduction and better user experience are required.

Common Tools and Usage

Local caching is generally divided into disk caching, CPU caching, and application caching.

  1. Disk caching: caches are stored in permanent storage media such as hard disks to accelerate data reads and access.
  2. CPU caching: a high-speed memory inside the processor that temporarily stores frequently accessed data or instructions to improve computer performance.
  3. Application caching: application data or resources are stored in memory to improve the response speed and user experience of the app. Taking Java services as an example, it is divided into on-heap caching and off-heap caching.
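To make application caching more concrete, here is a minimal sketch of an in-memory cache with a size limit and a per-entry time to live. The class name `LocalCache`, its parameters, and the choice of least-recently-used eviction are assumptions made for illustration; the article does not prescribe a specific eviction policy.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LocalCache:
    """A small in-memory cache with a size limit (LRU eviction) and per-entry TTL."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:     # entry is stale: drop it
            del self._data[key]
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The size limit keeps memory usage bounded, and the TTL bounds how long a stale value can be served. This sketch is not thread-safe; a real service would add locking or use an existing cache library.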

Distributed Caching


Distributed caching refers to a cache solution that distributes cached data across multiple servers.


Distributed caching is suitable for scenarios that require high-concurrency reads, data sharing and collaborative processing across services, elasticity and scalability, and a reduction in the number of backend requests.

Common Tools and Usage
  1. Redis: a high-performance key-value caching system that supports a wide range of data types and flexible caching policies. You can use Redis to build a distributed cache cluster and combine its fast read/write capability with sharding (hash slots in Redis Cluster, or consistent hashing on the client side) to implement data sharding and load balancing.
  2. Memcached: a simple but fast distributed in-memory object caching system that reduces database load and accelerates dynamic web applications. Data is sharded across servers by a distributed hash algorithm, which is typically applied on the client side.
  3. Hazelcast: an open-source distributed in-memory data grid platform that provides distributed caching and computing capabilities. It can be used to build high-throughput and high-availability distributed caching systems.
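The idea of sharding keys across cache servers with consistent hashing can be sketched as below. This is an illustrative client-side ring under stated assumptions, not the internal algorithm of Redis, Memcached, or Hazelcast; the class name `HashRing`, the node names, and the number of virtual nodes are hypothetical.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """Map keys to cache nodes with consistent hashing and virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        ring = []  # (hash, node) points placed on the ring
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position on the ring."""
        index = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[index]
```

For example, `HashRing(["cache-a", "cache-b", "cache-c"]).node_for("user:42")` always returns the same node for the same key. When a node is added, only the keys that now fall to the new node move, which keeps most of the cache warm during scaling.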

Caching Problems

Cache Penetration

Keywords: data exists in neither the cache nor the database, concurrent access

Cache penetration refers to the situation where the requested data exists in neither the database nor the cache, so every request for this non-existent record misses the cache and then queries the database. A large number of such requests may cause DB downtime.

  1. Bloom Filter: it is a data structure that quickly determines, with a small memory footprint, whether an element may be in a collection. The keys of all existing data are hashed into a bit array that is large enough. When a request comes in, the Bloom Filter checks whether the requested key may exist. If the filter reports that the key does not exist, the request is returned directly, which avoids query pressure on the database. A Bloom Filter can report false positives but never false negatives, so a small number of requests for non-existent keys may still pass through.
  2. Empty objects caching: for a non-existent data record, an empty object is stored in the cache to indicate that the data does not exist. When a request is made to access this non-existent data, an empty object is returned directly from the cache to avoid each request penetrating the database layer for queries.
  3. Delayed dual-determination: when a query request penetrates the cache to the database layer, the query is first performed in the database. If the database does not have corresponding data, the empty result is written to the cache with a short expiration time. In this way, the next identical query gets the empty result from the cache without penetrating to the database again.
  4. Hot data pre-loading: some hot data is asynchronously loaded to the cache when the system is started or before the cache expires. This ensures that the cached hot data always exists and prevents the penetration of frequently requested data due to cache expiration.
  5. Rate limiting policy: you can set rate limiting policies for frequently requested data. For example, you can use the token bucket algorithm or the leaky bucket algorithm to limit the frequency of requests to the data and reduce the pressure on the database.
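The first two measures in the list above, a Bloom filter in front of the cache and caching of empty results, can be sketched together as follows. All names here (`BloomFilter`, `lookup`, the `database` and `cache` dictionaries) and the filter sizes are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    """A fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self._size = size_bits
        self._num_hashes = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        for seed in range(self._num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical stand-ins for the database and the cache.
database: Dict[str, str] = {"item:1": "book", "item:2": "pen"}
cache: Dict[str, Optional[str]] = {}
stats = {"db_queries": 0}

known_keys = BloomFilter()
for existing_key in database:
    known_keys.add(existing_key)


def lookup(key: str) -> Optional[str]:
    if not known_keys.might_contain(key):
        return None                  # definitely absent: skip cache and database
    if key in cache:
        return cache[key]            # may be a cached None (empty object)
    stats["db_queries"] += 1
    value = database.get(key)
    cache[key] = value               # cache empty results too
    return value
```

In a real system, the cached empty result would carry a short expiration time so that a record created later becomes visible, and the filter would be updated whenever new keys are written.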

Cache Breakdown

Keywords: Expiration of a single hot key, concurrent access

Cache breakdown refers to the situation where hot data exists in the database but not in the cache, typically because a single hot key has just expired. A large number of concurrent requests for that data then miss the cache and all reach the DB, which may lead to DB downtime.

  1. Setting a hot time window for hot data: for hot data, you can set a hot time window. Within this time window, if a piece of data is frequently accessed, its cache time will be extended to avoid cache breakdown caused by frequent cache refreshing.
  2. Using mutual exclusion locks (mutexes) or distributed locks: when the cache entry is invalid, a mutex or distributed lock ensures that only one thread queries the database and rebuilds the entry, while the other threads wait for the result. This avoids the excessive database pressure caused by many threads querying the database at the same time.
  3. Setting the cache to never expire: for some hot data, you can set the cache to never expire, or set a long expiration time, so that even if the cache expires, there is enough time to refresh the cache to avoid cache breakdown.
  4. Updating the cache asynchronously: when the cache is invalid, you can update the cache asynchronously instead of querying the database and refreshing the cache synchronously. This reduces direct access to the database and does not block responses to other requests.
  5. Using multi-level cache architecture: the multi-level cache architecture can be used to distribute hot data to multiple cache nodes, preventing the entire cache layer from crashing due to the failure of a single cache node. When a cache node is invalid, you can obtain data from other cache nodes or databases.
  6. Setting a circuit breaking mechanism: when the cache layer fails or does not work normally, a circuit breaking mechanism can bypass it and access the database directly to ensure the normal operation of the system.
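The mutex approach from the list above can be sketched for a single process as follows. The per-key lock table and the function names are assumptions for illustration; across multiple service instances, a distributed lock would take the place of `threading.Lock`.

```python
import threading
from typing import Dict

cache: Dict[str, str] = {}
stats = {"db_queries": 0}
_locks: Dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()


def _lock_for(key: str) -> threading.Lock:
    """Return the lock dedicated to one key, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def query_database(key: str) -> str:
    """Hypothetical slow database query."""
    stats["db_queries"] += 1
    return f"value-for-{key}"


def get(key: str) -> str:
    value = cache.get(key)
    if value is not None:
        return value                 # fast path: cache hit, no locking
    with _lock_for(key):
        # Re-check: another thread may have rebuilt the entry while we waited.
        value = cache.get(key)
        if value is None:
            value = query_database(key)
            cache[key] = value
        return value
```

The second check inside the lock is what keeps the database query count at one: threads that were waiting find the rebuilt entry and return it without querying again.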

Cache Avalanche

Keywords: Expiration of batch keys, concurrent access

Cache avalanche refers to the simultaneous expiration of a large number of cache keys, which results in a large number of requests to the database, and this may finally lead to DB downtime.

  1. Using multi-level cache architecture: the cache can be divided into multiple levels, and the cache at each level has a different expiration time. For example, hot data is stored at a cache level with a short expiration time, while non-hot data is stored at a cache level with a long expiration time. In this way, even if the cache at a certain level is invalid, data can still be obtained from other levels to avoid all requests directly accessing the database.
  2. Setting random expiration time of cached data: when setting the expiration time of cached data, a random value can be added so that different entries expire at different times. This prevents a large amount of data from expiring at the same time and reduces the database load.
  3. Using distributed locks or mutexes: you can use distributed locks or mutexes to ensure that only one request can reload the cache when the cache is invalid. Other requests wait for the request to complete and then obtain data directly from the cache. This prevents multiple requests from accessing the database at the same time.
  4. Data prefetching: during system startup or off-peak hours, hot data is loaded into the cache in advance to prefetch the cache. In this way, data can be obtained from the cache even at high concurrency, which reduces the pressure on the database.
  5. Cache rate limiting: when a cache expiration is detected, the number of concurrent requests allowed through to the database can be limited. This prevents a large number of requests from accessing the database at the same time and causing excessive database loads.
  6. Optimizing the database: in addition to the cache level, you can also optimize the database level, such as improving database performance and increasing database capacity to cope with database pressure caused by a large number of requests.
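Randomized expiration, the second measure in the list above, takes only a few lines. The function name `ttl_with_jitter` and the example values are hypothetical; the appropriate base time and jitter range depend on the workload.

```python
import random
from typing import Optional


def ttl_with_jitter(base_ttl_seconds: int, max_jitter_seconds: int,
                    rng: Optional[random.Random] = None) -> int:
    """Return the base TTL plus a random offset in [0, max_jitter_seconds]."""
    source = rng if rng is not None else random
    return base_ttl_seconds + source.randint(0, max_jitter_seconds)
```

For example, `ttl_with_jitter(600, 120)` yields an expiration between 600 and 720 seconds, so entries written in the same batch do not all expire in the same instant.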

Cache Consistency

Cache consistency refers to the data consistency between the cache and the DB. We need to ensure by various means that the cache is either consistent with the DB data or at least eventually consistent with it.


The cache consistency issue can be addressed at different layers:

1.  Database Layer:

• At the database layer, transactions can be used to ensure data consistency. By placing read and write operations in the same transaction, you can ensure that data updates and queries are consistent.

• You can use database triggers or stored procedures to actively trigger a cache update when the data is updated, so that data consistency between the cache and the database can be ensured.

2.  Cache Layer:

• At the cache layer, cache update policies can be used to maintain data consistency between the cache and the database. For example, scheduled tasks can update the cache regularly, or an asynchronous message queue (MQ) can update the cache when the data is updated.

• You can use mutexes or distributed locks to ensure the atomicity of read and write operations on the cache and avoid data conflicts.

• You can set an appropriate expiration time for cached data to prevent the data inconsistency caused by stale data staying in the cache for a long time.

3.  Application Layer:

• At the application layer, a read-write splitting strategy can be adopted to distribute read requests and write requests to different nodes. Read requests obtain data directly from the cache while write requests update the database and the cache to maintain data consistency.

• You can use cache middleware or cache components to automatically update the cache and reduce the complexity of manually maintaining the cache.

4.  Monitoring and Alerting:

• You can establish a monitoring and alerting mechanism. By monitoring indicators such as the states of the cache layer and database layer, and data consistency, exceptions can be detected in time and alerts can be triggered to deal with problems.

The combined use of the strategies at different layers can effectively deal with the cache consistency issue and ensure data consistency and system stability. Strategies at different layers can cooperate to form a complete cache consistency solution.
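As one illustration of keeping the cache in step with the database on writes, the sketch below updates the database first and then invalidates the cached entry, so the next read reloads fresh data. Invalidation is swapped in here for directly updating the cache; it is a common variant, not necessarily the approach the article has in mind. The dictionaries and function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the database and the cache.
database: Dict[str, str] = {"price:1": "10"}
cache: Dict[str, str] = {}


def read(key: str) -> Optional[str]:
    """Serve from the cache when possible; otherwise load from the database."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value


def write(key: str, value: str) -> None:
    """Update the database, then invalidate the cached copy."""
    database[key] = value        # 1. update the source of truth first
    cache.pop(key, None)         # 2. drop the stale entry; the next read reloads it
```

After `write("price:1", "12")`, a call to `read("price:1")` returns "12" and repopulates the cache. Under concurrent access a brief window of inconsistency remains possible, which is why expiration times and the other measures listed above are still needed.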


We benefit a lot from caching. Almost every user request is served with the help of several caches. However, caching also brings great challenges, such as the issues mentioned above: cache penetration, cache breakdown, cache avalanche, and cache consistency.

In addition, some other cache issues might also be involved, such as cache skew, cache blocking, cache slow query, cache primary-secondary consistency, cache high availability, cache fault discovery and recovery, cluster scaling, and large keys and hot keys.


Browser caching
Introduction: It is a cache of the storage in users' devices to store static resources and page content.
Solution/tool: It controls caching behavior by setting caching-related fields in HTTP headers.
Advantages:
• It provides quick response to avoid frequent access to the server or network.
• It reduces network bandwidth consumption and improves website performance.
Disadvantages:
• The cached data may not be up-to-date, and the design of cache consistency and update mechanism needs to be considered.
• The cache hit rate is limited by the choice of cache capacity and caching policy.
Scenarios:
• Caching of static resources.
• Reducing the network bandwidth consumption.

Client-side caching
Introduction: It is a cache of application storage in users' devices to store data, computation results, or other business-related content.
Solution/tool: It uses Web APIs such as local storage, SessionStorage, LocalStorage, or IndexedDB to store and read data.
Advantages:
• It reduces backend load and improves system performance.
• It provides quick response to avoid frequent access to the server or network.
Disadvantages:
• The cached data may not be up-to-date, and the design of cache consistency and update mechanism needs to be considered.
• The cache hit rate is limited by the choice of cache capacity and caching policy.
Scenarios:
• Frequently accessed data or computing results.
• Relieving the backend load.

CDN caching
Introduction: It is a cache of content delivery network to store and accelerate the distribution of static resources.
Solution/tool: It deploys static resources to the CDN server and configures the CDN caching policy. Users' requests will be forwarded to the nearest CDN node to accelerate content distribution and access.
Advantages:
• It accelerates access to static resources and improves user experience.
• It reduces the source server load and improves system scalability.
Disadvantages:
• It is only suitable for caching static resources. Dynamic content cannot be cached.
• It involves the complexity of CDN configuration and management.
Scenarios:
• Distribution of static resources and access to them.
• Accelerating the loading of static resources and access to them.

Reverse proxy caching
Introduction: It is a cache located at the frontend server to store and accelerate access to dynamic content and static resources.
Solution/tool: It configures the reverse proxy server and sets up a caching policy to forward user requests to the cache server, reduce the load on the backend server, and accelerate content access.
Advantages:
• It accelerates the access to content and improves user experience.
• It reduces the load on the source server and improves system scalability.
Disadvantages:
• It is only suitable for specific Web servers and APPs.
Scenarios:
• Caching of dynamic content and static resources, and acceleration of access to them.
• Relieving the load on the backend server.

Local caching
Introduction: It is a cache of an application in users' devices to store data and resources to improve the performance and response speed of APPs.
Solution/tool: It uses a cache library or framework such as localStorage, sessionStorage, and Workbox to implement the local caching function.
Advantages:
• It improves the performance and response speed of APPs.
• It reduces dependence on remote resources and improves offline experience.
Disadvantages:
• The capacity of local caching is limited by the storage space of users' devices.
Scenarios:
• Frequently accessed data or resources.
• Performance improvement and faster response of the application.

Distributed caching
Introduction: It is a cache used in distributed systems to store and share data. It is typically deployed on multiple servers. It provides high concurrent read and write capabilities and scalability of data access. It is commonly used in large-scale applications and systems.
Solution/tool: It uses distributed caching systems such as Redis and Memcached to store and access cached data.
Advantages:
• It provides high concurrent read and write capabilities and scalability of data storage.
Disadvantages:
• Additional server resources are required to deploy and manage the distributed caching system.
• Cache consistency and data synchronization issues need to be considered.
Scenarios:
• High concurrent read/write capabilities and data storage scalability.
• Caching and data sharing of large-scale applications or systems.

The above is a horizontal comparison between browser caching, client caching, CDN caching, reverse proxy caching, local caching, and distributed caching, including details of the introduction, solutions/tools, advantages and disadvantages, and applicable scenarios. According to the specific needs and system architecture, you can select the appropriate caching type and solution to improve system performance, reduce server load, improve user experience, and ensure data consistency.

Continue reading Part 2 of this article

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
