Rate Limiter Implementation — Token Bucket Algorithm

Nataraj Srikantaiah
January 4, 2021

API Rate Limiting

Rate limiting is a strategy for limiting access to APIs. It restricts the number of API calls that a client can make within a given timeframe. This helps defend the API against abuse, whether unintentional or malicious.

Rate limits are often applied to an API by tracking the client's IP address, API key, or access token. As API developers, we can choose to respond in several different ways when a client reaches the limit:

  • Queueing the request until the remaining time period has elapsed.
  • Allowing the request immediately but charging extra for this request.
  • Rejecting the request with HTTP 429 Too Many Requests, which is the most common approach.

Token Bucket Algorithm

Assume that we have a bucket whose capacity is defined as the number of tokens it can hold. Whenever a consumer wants to access an API endpoint, it must get a token from the bucket. If a token is available, it is removed from the bucket and the request is accepted. If no token is available, the server rejects the request.

As requests consume tokens, we also need to refill them at a fixed rate, without ever exceeding the capacity of the bucket. Let’s consider an API that has a rate limit of 100 requests per minute. We can create a bucket with a capacity of 100 and a refill rate of 100 tokens per minute.
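To make the refill arithmetic concrete, here is a minimal sketch of a token bucket in plain Java. It is only an illustration and not how bucket4j works internally: the class name SimpleTokenBucket and its methods are hypothetical, and the current time is passed in explicitly (an assumption made so the behaviour is easy to reason about and test).

```java
import java.time.Duration;

/** Hypothetical minimal token bucket; an illustration, not bucket4j's implementation. */
class SimpleTokenBucket {
    private final long capacity;
    private final long refillTokens;
    private final long periodNanos;
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillTokens, Duration period, long nowNanos) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.periodNanos = period.toNanos();
        this.tokens = capacity;            // the bucket starts full
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns true when the request may proceed. */
    synchronized boolean tryConsume(long nowNanos) {
        refill(nowNanos);
        if (tokens == 0) {
            return false;                  // bucket is empty: reject
        }
        tokens--;
        return true;
    }

    /** Adds the whole tokens earned since the last refill, capped at capacity. */
    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed <= 0) {
            return;
        }
        // Sketch only: this multiplication can overflow after very long idle periods.
        long newTokens = elapsed * refillTokens / periodNanos;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            // Advance by the time that produced whole tokens, keeping the remainder.
            lastRefillNanos += newTokens * periodNanos / refillTokens;
        }
    }
}
```

With a capacity of 100 and a refill of 100 tokens per minute, an empty bucket in this sketch gains one token every 600 ms and never holds more than 100.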

Please refer to the Understanding Rate Limiting Algorithms blog where the Token Bucket and other algorithms have been explained in detail.

Building a Spring Boot Application with API Rate Limiter

Create a new Spring Boot application from Spring Initializr with a dependency on the Spring Web module.

Unzip the downloaded project and import it into your IDE. Let’s begin by adding the bucket4j dependency to our pom.xml.


We are going to implement a simple calculator REST API that supports operations such as add and subtract.

@RestController
@RequestMapping(value = "/api/calculator")
public class CalculatorController {

   @GetMapping(value = "/add")
   public ResponseEntity<Calculator> add(@RequestParam int left, @RequestParam int right) {
      return ResponseEntity.ok(Calculator.builder().operation("add").answer(left + right).build());
   }

   @GetMapping(value = "/subtract")
   public ResponseEntity<Calculator> subtract(@RequestParam int left, @RequestParam int right) {
      return ResponseEntity.ok(Calculator.builder().operation("subtract").answer(left - right).build());
   }
}

Let’s make sure that the APIs above are up and running as expected. You can use cURL or Postman to make an API call.

curl -X GET -H "Content-Type: application/json" 'http://localhost:9090/api/calculator/add?left=20&right=30'
{"operation":"add","answer":50}

Now that we have APIs ready to consume, next let’s introduce some subscription plans with rate limits. Let’s assume that we have the following subscription plans for our clients:

  • Free Subscription allows 2 requests per 60 seconds.
  • Basic Subscription allows 10 requests per 60 seconds.
  • Professional Subscription allows 20 requests per 60 seconds.

Each API client gets a unique API key that it must send along with each request. This helps us identify the client and the subscription plan linked to it.

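public enum SubscriptionPlan {
   // Requests allowed per 60 seconds, matching the plans listed above.
   SUBSCRIPTION_FREE(2), SUBSCRIPTION_BASIC(10), SUBSCRIPTION_PROFESSIONAL(20);

   private final int bucketLimit;

   private SubscriptionPlan(int bucketLimit) {
      this.bucketLimit = bucketLimit;
   }
   public int getBucketLimit() {
      return this.bucketLimit;
   }
   // Bucket capacity and refill amount both equal the plan limit, refilled every minute.
   public Bandwidth getBandwidth() {
      return Bandwidth.classic(bucketLimit,
            Refill.intervally(bucketLimit, Duration.ofMinutes(1)));
   }
}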

Next, we create a subscription service that stores the bucket reference for each API client in memory.

@Service
public class SubscriptionService {

    // One bucket per API key, created lazily on the first request.
    private final Map<String, Bucket>
            subscriptionCacheMap = new ConcurrentHashMap<>();

    public Bucket resolveBucket(String subscriptionKey) {
        return subscriptionCacheMap.computeIfAbsent(
                subscriptionKey, this::getSubscriptionBucket);
    }

    private Bucket getSubscriptionBucket(String subscriptionKey) {
        return buildBucket(
                resolveSubscriptionPlanByKey(subscriptionKey).getBandwidth());
    }

    private Bucket buildBucket(Bandwidth limit) {
        return Bucket4j.builder().addLimit(limit).build();
    }

    // The key prefix identifies the plan; unknown prefixes fall back to the free plan.
    private SubscriptionPlan resolveSubscriptionPlanByKey(
            String subscriptionKey) {
        if (subscriptionKey.startsWith("PS1129-")) {
            return SubscriptionPlan.SUBSCRIPTION_PROFESSIONAL;
        } else if (subscriptionKey.startsWith("BS1129-")) {
            return SubscriptionPlan.SUBSCRIPTION_BASIC;
        }
        return SubscriptionPlan.SUBSCRIPTION_FREE;
    }
}

Let’s understand the implementation. The API client sends an API key with the X-Subscription-Key request header. We use the SubscriptionService to get the bucket for this API key and check whether the request is allowed by consuming a token from the bucket.
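The lookup-and-consume flow described here can be sketched without Spring or bucket4j. The snippet below is a simplified stand-in, not the code used in this post: PerKeyLimiter, KeyBucket, limitFor and allow are hypothetical names, the per-minute limits are copied from the plans above, and each bucket is simply reset to its full limit once per 60-second interval. The time is passed in explicitly, which is an assumption made to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the per-key lookup: one bucket per API key. */
class PerKeyLimiter {
    private static final long INTERVAL_NANOS = 60_000_000_000L; // 60 seconds

    /** Per-key state: tokens left and the start of the current interval. */
    private static final class KeyBucket {
        private final int limit;
        private int remaining;
        private long intervalStart;

        KeyBucket(int limit, long nowNanos) {
            this.limit = limit;
            this.remaining = limit;
            this.intervalStart = nowNanos;
        }

        synchronized boolean tryConsume(long nowNanos) {
            long intervals = (nowNanos - intervalStart) / INTERVAL_NANOS;
            if (intervals > 0) {           // at least one full interval has passed
                remaining = limit;         // restore all tokens
                intervalStart += intervals * INTERVAL_NANOS;
            }
            if (remaining == 0) {
                return false;
            }
            remaining--;
            return true;
        }
    }

    private final Map<String, KeyBucket> buckets = new ConcurrentHashMap<>();

    /** Limits per 60 seconds, chosen by key prefix; unknown prefixes get the free plan. */
    static int limitFor(String apiKey) {
        if (apiKey.startsWith("PS1129-")) return 20;
        if (apiKey.startsWith("BS1129-")) return 10;
        return 2;
    }

    /** Finds or creates the bucket for this key, then tries to take one token. */
    boolean allow(String apiKey, long nowNanos) {
        KeyBucket bucket = buckets.computeIfAbsent(
                apiKey, key -> new KeyBucket(limitFor(key), nowNanos));
        return bucket.tryConsume(nowNanos);
    }
}
```

Because computeIfAbsent creates the bucket only on the first request for a key, every later request with the same key consumes from the same bucket, while other keys are unaffected.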

In order to enhance the client experience of the API, we will add the following additional response headers to send information about the rate limit.

  • X-Rate-Limit-Remaining - number of tokens remaining in the current time window.
  • X-Rate-Limit-Retry-After-Seconds - remaining time in seconds until the bucket is refilled with new tokens.

We can call ConsumptionProbe methods getRemainingTokens and getNanosToWaitForRefill, to get the count of the remaining tokens in the bucket and the time remaining until the next refill, respectively. The getNanosToWaitForRefill method returns 0 if we are able to consume the token successfully.
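Since the wait time comes back in nanoseconds while the response header is expressed in seconds, a conversion is needed. The interceptor in this post uses integer division by 1_000_000_000, which rounds down, so a wait shorter than one second would be reported as 0. As an alternative sketch (the RetryAfter class and toSecondsCeil method are hypothetical helpers, not part of bucket4j), the value can be rounded up instead:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for building the retry-after header value. */
class RetryAfter {
    /** Converts a nanosecond wait into whole seconds, rounding up. */
    static long toSecondsCeil(long nanosToWait) {
        if (nanosToWait <= 0) {
            return 0;                      // nothing to wait for
        }
        long nanosPerSecond = TimeUnit.SECONDS.toNanos(1);
        long wholeSeconds = nanosToWait / nanosPerSecond;
        // Any leftover fraction of a second counts as one more second.
        return nanosToWait % nanosPerSecond == 0 ? wholeSeconds : wholeSeconds + 1;
    }
}
```

Whether to round up or down is a design choice: rounding up means a client that waits for the advertised number of seconds should find a token available, at the cost of sometimes waiting slightly longer than strictly necessary.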

Let’s create a RateLimiterInterceptor and implement the rate limit code in its preHandle method. This keeps the implementation cleaner than repeating the check in every API method.

@Component
public class RateLimiterInterceptor implements HandlerInterceptor {
   private static final String HEADER_SUBSCRIPTION_KEY = "X-Subscription-Key";
   private static final String HEADER_LIMIT_REMAINING = "X-Rate-Limit-Remaining";
   private static final String HEADER_RETRY_AFTER = "X-Rate-Limit-Retry-After-Seconds";
   private static final String SUBSCRIPTION_QUOTA_EXHAUSTED =
      "You've exhausted your API Request Quota. Please upgrade your subscription plan.";

   @Autowired
   private SubscriptionService subscriptionService;

   @Override
   public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
      String subscriptionKey = request.getHeader(HEADER_SUBSCRIPTION_KEY);
      // Reject requests that do not carry an API key.
      if (StringUtils.isEmpty(subscriptionKey)) {
         response.sendError(HttpStatus.BAD_REQUEST.value(),
            "Missing Request Header: " + HEADER_SUBSCRIPTION_KEY);
         return false;
      }
      // Look up the bucket for this key and try to take one token.
      Bucket tokenBucket = subscriptionService.resolveBucket(subscriptionKey);
      ConsumptionProbe consumptionProbe = tokenBucket.tryConsumeAndReturnRemaining(1);
      if (!consumptionProbe.isConsumed()) {
         // No token left: tell the client how long to wait and reject with 429.
         long waitTime = consumptionProbe.getNanosToWaitForRefill() / 1_000_000_000;
         response.addHeader(HEADER_RETRY_AFTER, String.valueOf(waitTime));
         response.sendError(HttpStatus.TOO_MANY_REQUESTS.value(), SUBSCRIPTION_QUOTA_EXHAUSTED);
         return false;
      }
      response.addHeader(HEADER_LIMIT_REMAINING, String.valueOf(consumptionProbe.getRemainingTokens()));
      return true;
   }
}

Finally, let’s add the interceptor to Spring Boot’s InterceptorRegistry so that the RateLimiterInterceptor intercepts each request to our calculator API endpoints.

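@SpringBootApplication
public class TokenBucketApplication implements WebMvcConfigurer {
   @Autowired @Lazy private RateLimiterInterceptor interceptor;
   public void addInterceptors(InterceptorRegistry registry) {
      registry.addInterceptor(interceptor).addPathPatterns("/api/calculator/**");
   }
   public static void main(String[] args) {
      SpringApplication.run(TokenBucketApplication.class, args);
   }
}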

Let’s invoke the calculator API to see the behaviour.

curl -X GET -H "Content-Type: application/json" 'http://localhost:9090/api/calculator/add?left=20&right=30'
{"timestamp":"2020-12-25T12:43:43.239+0000","status":400,"error":"Bad Request","message":"Missing Request Header: X-Subscription-Key","path":"/api/calculator/add"}

The client has to send the API key in an HTTP header; otherwise the interceptor rejects the request. Let’s add the API key to the header and make the call.

curl -v -X GET -H "Content-Type: application/json" -H "X-subscription-key:A1129-12" 'http://localhost:9090/api/calculator/add?left=20&right=30'
* Connected to localhost (::1) port 9090 (#0)
> GET /api/calculator/add?left=20&right=30 HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> X-subscription-key:A1129-12
< HTTP/1.1 200
< X-Rate-Limit-Remaining: 1
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Fri, 25 Dec 2020 12:46:06 GMT
* Connection #0 to host localhost left intact{"operation":"add","answer":50}
* Closing connection 0

You can see that the API key is sent in the request header, the API responds to our request, and the response includes a header showing how many requests remain for this API key.

Let’s make two more calls. The first one uses the last remaining token; the second one exceeds the free plan’s limit of 2 requests per 60 seconds, so the API returns a 429 response.

curl -v -X GET -H "Content-Type: application/json" -H "X-subscription-key:A1129-12" 'http://localhost:9090/api/calculator/add?left=20&right=30'
* Connected to localhost (::1) port 9090 (#0)
> GET /api/calculator/add?left=20&right=30 HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> X-subscription-key:A1129-12
< HTTP/1.1 429
< X-Rate-Limit-Retry-After-Seconds: 51
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Fri, 25 Dec 2020 12:49:11 GMT
* Connection #0 to host localhost left intact{"timestamp":"2020-12-25T12:49:11.358+0000","status":429,"error":"Too Many Requests","message":"You've exhausted your API Request Quota. Please upgrade your subscription plan.","path":"/api/calculator/add"}
* Closing connection 0

It looks like we have successfully implemented the rate limiter using the Token Bucket algorithm. We can keep adding endpoints and the interceptor would apply the rate limit for each request.

As usual, the source code for the above spring boot implementation is available over on GitHub.
