
Implementation of Java API Throttling

The article introduces four common throttling algorithms and their implementations.

By Feiyou


1. Throttling

Why Is Throttling Needed?

  1. High instantaneous traffic overwhelms the service.
  2. Malicious users frequently access the server, causing crashes.
  3. Fast message consumption puts a heavy load on the database, leading to performance degradation or even crashes.

What Is Throttling?

Throttling limits the number of requests within a certain time window to maintain system availability and stability, preventing the system from slowing down or crashing under traffic surges.

In high-concurrency scenarios, especially in distributed systems, throttling is the most commonly used way to protect services: it avoids crashes caused by sudden traffic surges while keeping services highly available and stable.

What Are the Common Throttling Algorithms?

There are four common throttling algorithms: the fixed window algorithm, sliding window algorithm, leaky bucket algorithm, and token bucket algorithm.

2. Throttling Algorithms

2.1 Fixed Window

2.1.1 Implementation Principle

The fixed window, also known as the fixed window throttling algorithm or the fixed window counter algorithm, is the simplest throttling algorithm.

Implementation principle: Requests within a fixed period are counted. When the count reaches the configured threshold, throttling is triggered; when the next period starts, the count is reset to zero. As shown in the figure, suppose no more than 150 requests are allowed within 3 seconds:

[Figure 1: Fixed window throttling, at most 150 requests per 3-second window]

2.1.2 Code Implementation


import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FixedWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(FixedWindowRateLimiter.class);
    // The size of the time window. Unit: milliseconds.
    long windowSize;
    // The number of allowed requests per window.
    int maxRequestCount;
    // The number of requests that have passed in the current window.
    AtomicInteger counter = new AtomicInteger(0);
    // The right boundary of the window.
    long windowBorder;

    public FixedWindowRateLimiter(long windowSize, int maxRequestCount) {
        this.windowSize = windowSize;
        this.maxRequestCount = maxRequestCount;
        this.windowBorder = System.currentTimeMillis() + windowSize;
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            // Advance the boundary until it covers the current time.
            do {
                windowBorder += windowSize;
            } while (windowBorder < currentTime);
            counter.set(0);
        }

        if (counter.get() < maxRequestCount) {
            counter.incrementAndGet();
            logger.info("tryAcquire success");
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}
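
A minimal usage sketch, matching the figure's limit of 150 requests per 3 seconds (the caller code is an assumption for this example):

FixedWindowRateLimiter limiter = new FixedWindowRateLimiter(3000, 150);
if (limiter.tryAcquire()) {
    // Handle the request.
} else {
    // Reject the request, for example, by returning an error code to the client.
}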

2.1.3 Advantages and Disadvantages

Advantages: Simple to implement and easy to understand.

Disadvantages:

1) Uneven throttling. For instance, with a rate limit of 3 requests per second, if 3 requests are sent in the first millisecond, throttling is triggered and requests during the remaining time of the window are rejected, resulting in a poor user experience.

2) Inability to handle the window boundary problem. Because throttling is applied within a fixed time window, a window boundary effect may occur: a large number of requests can pass on both sides of a boundary, producing burst traffic. For instance, if 150 requests arrive in the last second of one window and another 150 requests arrive in the first second of the next window, then 300 requests pass within two seconds, far exceeding the intended limit of 150 requests per 3 seconds, as shown in the figure below:

[Figure 2: Window boundary effect, 300 requests pass within two seconds across the window boundary]

2.2 Sliding Window

2.2.1 Implementation Principle

The sliding window is an improved version of the fixed window. It addresses the boundary issue where up to twice the threshold number of requests can pass around a window switch. In the sliding window algorithm, the start and end times of the window are dynamic, while the window size remains fixed. The algorithm handles the window boundary problem effectively, but its implementation is relatively complex because the time of each request must be tracked.

Implementation principle: The sliding window algorithm divides the fixed window into finer shards: a window is split into several equal small windows, and the whole window slides forward by one small window at a time. Each small window covers a different time span and keeps an independent counter. When the time of a request exceeds the maximum time point of the current window, the window shifts forward by one small window: the count of the oldest small window is discarded and the current request is recorded in the newest small window. The total number of requests in the entire window must not exceed the threshold. Sentinel, among other frameworks, implements throttling with the sliding window algorithm. The following figure illustrates it:

[Figure 3: Sliding window throttling]

Core steps:

1) Divide 3 seconds into 3 small windows, each with an upper limit of 50 requests.

2) Suppose the rate limit is no more than 150 requests in 3 seconds. The window then holds 3 small windows and slides forward over time. Each time a request arrives, the total number of requests across all small windows in the sliding window is checked against the limit.

2.2.2 Code Implementation


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlidingWindowRateLimiter {
    Logger logger = LoggerFactory.getLogger(SlidingWindowRateLimiter.class);
    // The size of the time window. Unit: milliseconds.
    long windowSize;
    // The number of window shards.
    int shardNum;
    // The number of allowed requests.
    int maxRequestCount;
    // The request count of each small window.
    int[] shardRequestCount;
    // The total number of requests in the whole window.
    int totalCount;
    // The index of the current small window.
    int shardId;
    // The size of each small window. Unit: milliseconds.
    long tinyWindowSize;
    // The right boundary of the current small window.
    long windowBorder;

    public SlidingWindowRateLimiter(long windowSize, int shardNum, int maxRequestCount) {
        this.windowSize = windowSize;
        this.shardNum = shardNum;
        this.maxRequestCount = maxRequestCount;
        this.shardRequestCount = new int[shardNum];
        this.tinyWindowSize = windowSize / shardNum;
        this.windowBorder = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long currentTime = System.currentTimeMillis();
        if (windowBorder < currentTime) {
            logger.info("window reset");
            // Slide forward one small window at a time, dropping the oldest counts.
            do {
                shardId = (shardId + 1) % shardNum;
                totalCount -= shardRequestCount[shardId];
                shardRequestCount[shardId] = 0;
                windowBorder += tinyWindowSize;
            } while (windowBorder < currentTime);
        }

        if (totalCount < maxRequestCount) {
            logger.info("tryAcquire success:{}", shardId);
            shardRequestCount[shardId]++;
            totalCount++;
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

2.2.3 Advantages and Disadvantages

Advantages: The sliding window solves the window boundary problem of the fixed window algorithm and prevents burst traffic from overwhelming the server.

Disadvantages: Throttling is still not smooth. For example, with a rate limit of 3 requests per second, if 3 requests are sent in the first millisecond, throttling is triggered and requests during the remaining time of the window are rejected, which hurts the user experience.

2.3 Leaky Bucket Algorithm

2.3.1 Implementation Principle

The leaky bucket throttling algorithm is a common traffic shaping and traffic policing algorithm that helps regulate the rate of data transmission and avoid network congestion. The leaky bucket algorithm can:

  1. Control the rate at which traffic enters the network.
  2. Smooth out burst traffic in the network.

Implementation principle: The "leaky bucket" is a vivid metaphor for this algorithm. External requests flow into the bucket like water being poured in, and the bucket releases requests at a fixed maximum output rate. Water exceeding the bucket's capacity is discarded. No matter how quickly water is injected into the bucket, the output rate remains constant. Message middleware commonly adopts the idea of leaky bucket throttling. The following figure illustrates the leaky bucket algorithm.

[Figure 4: Leaky bucket throttling]

Core steps:

  1. In a leaky bucket with a fixed capacity, the water leaks (the requests are processed) at a fixed rate.
  2. When the water is injected too fast, it overflows directly. Similarly, when processing requests, if the number of requests exceeds the limit, they will be rejected directly.

When there is no water in the bucket, no water will flow out. Similarly, if there is no request in the bucket, no request will be processed.

2.3.2 Code Implementation

import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LeakyBucketRateLimiter {
    Logger logger = LoggerFactory.getLogger(LeakyBucketRateLimiter.class);
    // The capacity of the bucket.
    int capacity;
    // The existing water volume in the bucket.
    AtomicInteger water = new AtomicInteger();
    // The time when the water last leaked.
    long leakTimestamp;
    // The rate at which water leaks, that is, the number of requests processed per second.
    int leakRate;

    public LeakyBucketRateLimiter(int capacity, int leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    public synchronized boolean tryAcquire() {
        // No water in the bucket: restart counting.
        if (water.get() == 0) {
            logger.info("start leaking");
            leakTimestamp = System.currentTimeMillis();
            if (capacity > 0) {
                water.incrementAndGet();
                return true;
            }
            return false;
        }
        // Leak first, then count the remaining water volume.
        long currentTime = System.currentTimeMillis();
        int leakedWater = (int) ((currentTime - leakTimestamp) / 1000 * leakRate);
        logger.info("lastTime:{}, currentTime:{}. LeakedWater:{}", leakTimestamp, currentTime, leakedWater);
        // Less than a second may have elapsed, in which case no water leaks yet.
        if (leakedWater != 0) {
            int leftWater = water.get() - leakedWater;
            // The water may have run out; floor it at 0.
            water.set(Math.max(0, leftWater));
            leakTimestamp = System.currentTimeMillis();
        }
        logger.info("Remaining capacity:{}", capacity - water.get());
        if (water.get() < capacity) {
            logger.info("tryAcquire success");
            water.incrementAndGet();
            return true;
        } else {
            logger.info("tryAcquire fail");
            return false;
        }
    }
}

2.3.3 Advantages and Disadvantages

Advantages:

1) Smooth traffic. Since the leaky bucket algorithm processes requests at a fixed rate, it can effectively smooth out and shape traffic to avoid burst traffic and fluctuations. This is similar to the effect of load shifting in a message queue.

2) Prevention of overload. When the number of requests flowing in exceeds the capacity of the bucket, they are directly discarded to prevent system overload.

Disadvantages:

1) Inability to handle burst traffic. As the bucket leaks at a constant rate, it cannot handle burst traffic; even when traffic is light, the leaky bucket cannot process requests at a faster rate.

2) Potential data loss. If the traffic exceeds the capacity of the bucket, some requests have to be discarded. This may be a problem in scenarios where missing requests are unacceptable.

3) Unsuitability for scenarios with large rate changes. If the rate varies greatly or needs to be adjusted dynamically, the leaky bucket algorithm cannot meet the requirements.

4) Inefficient use of resources. Regardless of the current system load, all requests must be queued, even when the server load is small. This results in a waste of system resources.

Due to the obvious defects of the leaky bucket, it is rarely applied in real businesses.

2.4 Token Bucket Algorithm

2.4.1 Implementation Principle

The token bucket algorithm is an improved version of the leaky bucket algorithm. It limits the average rate of service calls while allowing certain burst calls.

Implementation Principle:

1) The system adds tokens to the bucket at a fixed rate.

2) When a request is sent, it attempts to remove a token from the bucket. If the bucket contains enough tokens, the request can be processed or the packet can be sent.

3) If no token exists in the bucket, the request is rejected.

4) The number of tokens in a bucket cannot exceed the bucket capacity. New tokens will be discarded if the number exceeds the bucket capacity.

5) An important feature of the token bucket algorithm is its ability to handle burst traffic. When there are sufficient tokens in the bucket, it can process multiple requests at a time, which is useful for application scenarios where burst traffic needs to be handled. Meanwhile, because the number of tokens in the bucket is limited, it will not overwhelm the server by increasing the processing rate indefinitely.

The following figure illustrates the token bucket algorithm:

[Figure 5: Token bucket throttling]

2.4.2 Code Implementation

Guava's RateLimiter is implemented based on the token bucket and can be used directly, as shown in Section 3.
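
For reference, the following is a minimal hand-rolled token bucket sketch. It illustrates the principle above and is not Guava's actual implementation; the class name and fields are assumptions for this example.

public class TokenBucketRateLimiter {
    // The capacity of the bucket, that is, the maximum burst size.
    long capacity;
    // The number of tokens generated per second.
    long refillRate;
    // The current number of tokens in the bucket.
    long tokens;
    // The time of the last refill. Unit: milliseconds.
    long lastRefillTime;

    public TokenBucketRateLimiter(long capacity, long refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        // Start with a full bucket so that an initial burst can pass.
        this.tokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    // Lazily add the tokens accumulated since the last refill, capped at the
    // bucket capacity. Fractions of a token are dropped for simplicity.
    private void refill() {
        long now = System.currentTimeMillis();
        long newTokens = (now - lastRefillTime) * refillRate / 1000;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillTime = now;
        }
    }
}

Refilling lazily on each call avoids a background timer thread, which keeps the sketch simple; Guava's implementation uses a similar lazy strategy.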

2.4.3 Advantages and Disadvantages

Advantages:

1) Ability to handle burst traffic. The token bucket algorithm can handle burst traffic. When the bucket is full, it can process requests at maximum rate, which is useful for application scenarios where burst traffic needs to be handled.

2) Limit on average rate. In long-term operation, the data transmission rate will be limited to a predefined average rate, that is, the rate at which tokens are generated is limited.

3) Flexibility. The token bucket algorithm provides more flexibility than the leaky bucket algorithm does. For example, it can dynamically adjust the rate at which tokens are generated.

Disadvantages:

1) Potential overload. If the token generation rate is too fast, it may cause a large amount of burst traffic, which may overload the network or service.

2) Requirements for storage space. Token buckets require a certain amount of storage space to store tokens, which may result in a waste of memory resources.

3) Complexity in implementation. Compared with the counter algorithm, the implementation of the token bucket algorithm is more complex.

3. Application Practice

Guava's RateLimiter is implemented based on the token bucket and can be used directly. The whole practice below is based on Guava.

3.1 Introduce Dependencies

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>

3.2 Directly Use the API

3.2.1 Generate Tokens at a Fixed Rate

    @Test
    public void acquireTest() {
        // Tokens are generated at a fixed rate of 5 tokens per second.
        RateLimiter rateLimiter = RateLimiter.create(5);
        for (int i = 0; i < 10; i++) {
            double time = rateLimiter.acquire();
            logger.info("等待时间:{}s", time);
        }
    }

Results:

[Figure 6: Log output showing a token issued about every 200ms]

As shown, a token is generated and a request is allowed every 200ms. This means five requests are allowed in one second. RateLimiter can be used to implement throttling on a single server.
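
Note that acquire() blocks until a token is available. RateLimiter also provides a non-blocking tryAcquire(), which returns immediately. A small sketch under the same 5-permits-per-second setup (the test name is an assumption for this example):

    @Test
    public void tryAcquireTest() {
        RateLimiter rateLimiter = RateLimiter.create(5);
        for (int i = 0; i < 10; i++) {
            // tryAcquire() does not wait: it returns true only if a token is available right now.
            boolean acquired = rateLimiter.tryAcquire();
            logger.info("tryAcquire:{}", acquired);
        }
    }

Run back to back, typically only the first call succeeds; the rest fail until a new token is generated about 200ms later.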

3.2.2 Generate Multiple Tokens Simultaneously

Back to the burst traffic situation mentioned earlier: how does the token bucket solve this problem? RateLimiter introduces the concept of pre-consumption.

The number of tokens requested does not affect the response time of that request itself: acquire(1) and acquire(1000) take the same time to return. However, it does affect the response time of the next request.

If a task that consumes a large number of tokens reaches an idle RateLimiter, it is approved for execution immediately, but the next request then waits an additional period of time to pay for the cost of the previous one.

The following example explains why. Consider a system in an idle state that suddenly receives a task consuming 100 tokens. Making it wait idly until enough tokens have accumulated would waste resources, so the token bucket lets the task execute immediately and stretches the throttling delay onto subsequent requests. This is how it handles burst traffic.
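
A minimal sketch of this pre-consumption behavior (the rate of 5 permits per second and the test name are assumptions for this example):

    @Test
    public void preConsumeTest() {
        RateLimiter rateLimiter = RateLimiter.create(5);
        // The expensive request on an idle limiter returns almost immediately.
        double first = rateLimiter.acquire(100);
        logger.info("Wait time:{}s", first);
        // The next request pays for it and waits roughly 100 / 5 = 20s.
        double second = rateLimiter.acquire();
        logger.info("Wait time:{}s", second);
    }

Guava also supports a warm-up period through the RateLimiter.create(permitsPerSecond, warmupPeriod, unit) overload, where token issuance starts slowly and accelerates to the stable rate within the warm-up period, as the following test shows: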

    @Test
    public void acquireSmoothly() {
        // 5 permits per second, with a 3-second warm-up period.
        RateLimiter rateLimiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
        long startTimeStamp = System.currentTimeMillis();
        for (int i = 0; i < 15; i++) {
            double time = rateLimiter.acquire();
            logger.info("Wait time:{}s, Total time:{}ms", time, System.currentTimeMillis() - startTimeStamp);
        }
    }

Results:

[Figure 7: Log output showing the token interval shrinking from about 500ms to a steady 200ms]

It can be seen that the interval between issued tokens gradually shortened from the initial 500ms to a constant 200ms after about 3 seconds.

In general, RateLimiter's token bucket implementation is powerful. Besides throttling, it distributes requests evenly across each time period, making it a widely applied throttling component in single-server scenarios.

3.3 AOP Aspect

Step 1: Create an Annotation

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD})
@Documented
public @interface Limit {
    // The key that identifies the throttled resource.
    String key() default "";
    // The number of requests allowed per second.
    double permitsPerSeconds();
    // The maximum time to wait for a token.
    long timeout();
    // The unit of the timeout.
    TimeUnit timeUnit() default TimeUnit.MILLISECONDS;
    // The prompt message returned when throttling is triggered.
    String msg() default "The system is busy. Please try again later.";
}

Step 2: Implement the AOP Aspect

import java.lang.reflect.Method;
import java.util.Map;

import com.google.common.collect.Maps;
import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class LimitAspect {
    Logger logger = LoggerFactory.getLogger(LimitAspect.class);
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();

    @Around("@annotation(com.alibaba.xxx.xxx.annotation.Limit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Method method = signature.getMethod();
        // Read the @Limit annotation.
        Limit limit = method.getAnnotation(Limit.class);
        if (limit != null) {
            // The key distinguishes throttling for different interfaces.
            String key = limit.key();
            // Create the token bucket for this key once, atomically, so that
            // concurrent first calls do not create duplicate buckets.
            RateLimiter rateLimiter = limitMap.computeIfAbsent(key, k -> {
                logger.info("New token bucket={}, capacity={}", k, limit.permitsPerSeconds());
                return RateLimiter.create(limit.permitsPerSeconds());
            });
            // Try to acquire a token within the configured timeout.
            boolean acquired = rateLimiter.tryAcquire(limit.timeout(), limit.timeUnit());
            if (!acquired) {
                // Failed to acquire a token: report the throttling message.
                logger.debug("Token bucket={}, token acquisition failed", key);
                throw new RuntimeException(limit.msg());
            }
        }
        return joinPoint.proceed();
    }
}

Step 3: Apply the Annotation

@Limit(key = "query",permitsPerSeconds = 1,timeout = 1,msg = "API throttling is triggered. Please try again.")

Step 4: Observe the Response

If the annotation is placed on an HTTP mapping method (a controller endpoint), the following is returned:

{
    "timestamp": "2023-12-07 11:21:47",
    "status": 500,
    "error": "Internal Server Error",
    "path": "/table/query"
}

If it is placed on a service API, the following is returned:

{
    "code": -1,
    "message": "API throttling is triggered. Please try again",
    "data": "fail"
}
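
The 500 response above appears because the uncaught RuntimeException reaches Spring Boot's default error handling. A hypothetical global exception handler could unify both cases into the friendly JSON form (the class and field names are assumptions, not part of the original implementation; a dedicated exception type would be cleaner than catching RuntimeException broadly):

import java.util.HashMap;
import java.util.Map;

import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
public class LimitExceptionHandler {
    // Convert the throttling exception into a unified JSON body.
    @ExceptionHandler(RuntimeException.class)
    public Map<String, Object> handleLimit(RuntimeException e) {
        Map<String, Object> body = new HashMap<>();
        body.put("code", -1);
        body.put("message", e.getMessage());
        body.put("data", "fail");
        return body;
    }
}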

4. Summary

The implementation method described in this article is an application-level throttling method, which only performs throttling in a single application. Global throttling is not supported. If the application is deployed to multiple machines, distributed throttling or access-layer throttling is required.
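
As a sketch of what distributed throttling can look like, the following is a minimal Redis-backed fixed window counter, assuming a Jedis client; the class and key naming are assumptions for this example. Note that the INCR/EXPIRE pair is not atomic, so production code would typically use a Lua script to close that gap.

import redis.clients.jedis.Jedis;

public class RedisFixedWindowRateLimiter {
    private final Jedis jedis;
    // The number of allowed requests per window.
    private final long maxRequestCount;
    // The size of the time window. Unit: seconds.
    private final long windowSizeSeconds;

    public RedisFixedWindowRateLimiter(Jedis jedis, long maxRequestCount, long windowSizeSeconds) {
        this.jedis = jedis;
        this.maxRequestCount = maxRequestCount;
        this.windowSizeSeconds = windowSizeSeconds;
    }

    public boolean tryAcquire(String resource) {
        String key = "rate_limit:" + resource;
        // All application instances share the same counter in Redis.
        long count = jedis.incr(key);
        if (count == 1) {
            // First request of this window: start the window timer.
            jedis.expire(key, windowSizeSeconds);
        }
        return count <= maxRequestCount;
    }
}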

In general, throttling is essential to a system's stress tolerance. Although some user requests may be discarded, such losses are usually acceptable compared with a system crash caused by burst traffic. Throttling can also be combined with circuit breaking and degradation to ensure the availability and robustness of services.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
