API Rate Limiting
Rate limiting is a strategy for limiting access to APIs. It restricts the number of API calls that a client can make within a given timeframe. This helps defend the API against overuse, whether unintentional or malicious.
Rate limits are often applied by tracking the client’s IP address, API key, or access token. As API developers, we can choose to respond in several different ways when a client reaches the limit:
- Queueing the request until the remaining time period has elapsed.
- Allowing the request immediately but charging extra for this request.
- Rejecting the request with HTTP 429 Too Many Requests, which is the most common approach.
Token Bucket Algorithm
Assume that we have a bucket whose capacity is the number of tokens it can hold. Whenever a consumer wants to access an API endpoint, it must get a token from the bucket. If a token is available, it is removed from the bucket and the request is accepted. If no token is available, the server rejects the request.
As requests consume tokens, we also need to refill the bucket at a fixed rate, never exceeding its capacity. Let’s consider an API that has a rate limit of 100 requests per minute. We can create a bucket with a capacity of 100 and a refill rate of 100 tokens per minute.
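To make the mechanics concrete, here is a minimal, library-free sketch of a token bucket in plain Java. It is not Bucket4j’s implementation: the class name TokenBucket, its methods, and the injectable clock are all illustrative assumptions, and it refills in whole periods only (all 100 tokens once per minute in the example above).

```java
import java.util.function.LongSupplier;

/** Illustrative token bucket; names are hypothetical, not Bucket4j's API. */
final class TokenBucket {
    private final long capacity;          // maximum number of tokens the bucket can hold
    private final long refillTokens;      // tokens added per completed refill period
    private final long refillPeriodNanos; // length of one refill period in nanoseconds
    private final LongSupplier clock;     // nanosecond time source, injectable for testing
    private long tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillTokens, long refillPeriodNanos, LongSupplier clock) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillPeriodNanos = refillPeriodNanos;
        this.clock = clock;
        this.tokens = capacity;           // the bucket starts full
        this.lastRefillNanos = clock.getAsLong();
    }

    /** Takes one token if available; false means the request should be rejected. */
    synchronized boolean tryConsume() {
        refill();
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    synchronized long availableTokens() {
        refill();
        return tokens;
    }

    /** Adds tokens for every fully elapsed period, never exceeding capacity. */
    private void refill() {
        long periods = (clock.getAsLong() - lastRefillNanos) / refillPeriodNanos;
        if (periods > 0) {
            tokens = Math.min(capacity, tokens + periods * refillTokens);
            lastRefillNanos += periods * refillPeriodNanos;
        }
    }

    public static void main(String[] args) {
        // 100 requests per minute, as in the example above
        TokenBucket bucket = new TokenBucket(100, 100, 60_000_000_000L, System::nanoTime);
        int accepted = 0;
        for (int i = 0; i < 150; i++) {
            if (bucket.tryConsume()) {
                accepted++;
            }
        }
        System.out.println("accepted " + accepted + " of 150 requests");
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic under test; a production library would also handle concerns such as partial refills and distributed state.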
Please refer to the Understanding Rate Limiting Algorithms blog where the Token Bucket and other algorithms have been explained in detail.
Building a Spring Boot Application with API Rate Limiter
Create a new Spring Boot application from Spring Initializr with a dependency on the Spring Web module.
Unzip the downloaded project and import it into your IDE. Let’s begin by adding the Bucket4j dependency to our pom.xml.
We are going to implement a simple calculator REST API that supports operations like add and subtract.
Let’s ensure that these APIs are up and running as expected. You can use cURL or Postman to make an API call.
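The dependency entry might look like the fragment below. The coordinates and version shown are assumptions rather than the ones used in the original project: recent Bucket4j releases are published under the com.bucket4j group, while older ones used com.github.vladimir-bukhtoyarov, so check Maven Central for the release you want.

```xml
<!-- Assumed coordinates and version; verify against Maven Central -->
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.1.0</version>
</dependency>
```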
Now that we have APIs ready to consume, next let’s introduce some subscription plans with rate limits. Let’s assume that we have the following subscription plans for our clients:
- Free Subscription allows 2 requests per 60 seconds.
- Basic Subscription allows 10 requests per 60 seconds.
- Professional Subscription allows 20 requests per 60 seconds.
Each API client gets a unique API key that it must send along with each request. This helps us identify the client and the subscription plan linked to it.
Next, we create a subscription service that stores the bucket reference for each API client in memory.
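One way to model these plans is an enum that carries each plan’s limit and resolves a plan from the API key. The sketch below is plain Java under a loudly stated assumption: the key prefixes "PRO-" and "BAS-" are a hypothetical convention invented for illustration, and a real service would more likely look the plan up in a database. With Bucket4j, each constant would typically expose a bandwidth limit instead of a raw number.

```java
import java.time.Duration;

/** Subscription plans from the list above; the key-prefix lookup is hypothetical. */
enum SubscriptionPlan {
    FREE(2),
    BASIC(10),
    PROFESSIONAL(20);

    /** All plans share the same 60-second window. */
    static final Duration WINDOW = Duration.ofSeconds(60);

    private final long requestsPerWindow;

    SubscriptionPlan(long requestsPerWindow) {
        this.requestsPerWindow = requestsPerWindow;
    }

    long requestsPerWindow() {
        return requestsPerWindow;
    }

    /**
     * Resolves the plan for an API key.
     * Assumption: the plan is encoded as a key prefix; anything else is FREE.
     */
    static SubscriptionPlan fromApiKey(String apiKey) {
        if (apiKey == null || apiKey.isEmpty()) {
            return FREE;
        }
        if (apiKey.startsWith("PRO-")) {
            return PROFESSIONAL;
        }
        if (apiKey.startsWith("BAS-")) {
            return BASIC;
        }
        return FREE;
    }
}
```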
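The in-memory storage can be sketched with a ConcurrentHashMap that creates a bucket the first time a key is seen and reuses it afterwards. This is a library-free illustration, not the original implementation: the type parameter B stands in for the bucket type (Bucket4j’s Bucket in the real application), and the method name resolveBucket and the factory argument are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Keeps one bucket per API key in memory; B stands in for the bucket type. */
final class SubscriptionService<B> {
    private final Map<String, B> buckets = new ConcurrentHashMap<>();
    private final Function<String, B> bucketFactory; // builds a bucket for a new key

    SubscriptionService(Function<String, B> bucketFactory) {
        this.bucketFactory = bucketFactory;
    }

    /**
     * Returns the bucket for this key, creating it on first use.
     * computeIfAbsent runs the factory at most once per key, even under
     * concurrent requests. A null key throws NullPointerException, so the
     * caller should reject requests without a key before calling this.
     */
    B resolveBucket(String apiKey) {
        return buckets.computeIfAbsent(apiKey, bucketFactory);
    }
}
```

Because the map lives in a single JVM and is never evicted, this approach suits a single-instance demo; multiple instances would each keep their own counts.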
Let’s understand the implementation. The API client sends an API key with the X-Subscription-Key request header. We use the SubscriptionService to get the bucket for this API key and check whether the request is allowed by consuming a token from the bucket.
In order to enhance the client experience of the API, we will add the following additional response headers to send information about the rate limit.
- X-Rate-Limit-Remaining - number of tokens remaining in the current time window.
- X-Rate-Limit-Retry-After-Seconds - remaining time in seconds until the bucket is refilled with new tokens.
We can call ConsumptionProbe methods getRemainingTokens and getNanosToWaitForRefill, to get the count of the remaining tokens in the bucket and the time remaining until the next refill, respectively. The getNanosToWaitForRefill method returns 0 if we are able to consume the token successfully.
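As a sketch of how those two values could become response headers, the plain-Java helper below takes them as parameters rather than calling Bucket4j directly. The class name RateLimitHeaders and its method are hypothetical, and two choices here are assumptions: the remaining count is sent only on accepted requests and the retry time only on rejected ones, and nanoseconds are rounded up so a client is never told to retry too early.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Builds rate limit response headers from probe-style values (hypothetical helper). */
final class RateLimitHeaders {
    static final String REMAINING = "X-Rate-Limit-Remaining";
    static final String RETRY_AFTER = "X-Rate-Limit-Retry-After-Seconds";

    private static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    private RateLimitHeaders() {
    }

    /**
     * @param consumed             whether a token was consumed for this request
     * @param remainingTokens      tokens left in the bucket after the attempt
     * @param nanosToWaitForRefill time until a token is available again
     */
    static Map<String, String> from(boolean consumed, long remainingTokens,
                                    long nanosToWaitForRefill) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (consumed) {
            headers.put(REMAINING, Long.toString(remainingTokens));
        } else {
            // Round up to whole seconds so the client does not retry too early
            long seconds = (nanosToWaitForRefill + NANOS_PER_SECOND - 1) / NANOS_PER_SECOND;
            headers.put(RETRY_AFTER, Long.toString(seconds));
        }
        return headers;
    }
}
```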
Rather than repeating the rate limit code in every API method, let’s create a RateLimitInterceptor and implement it once in the preHandle method, which gives us a cleaner implementation.
Finally, let’s add the interceptor to Spring’s InterceptorRegistry so that the RateLimitInterceptor intercepts each request to our calculator API endpoints.
Let’s invoke the calculator API to see the behaviour.
The client has to send the API key in an HTTP header; otherwise, the interceptor will not process the request. Let’s add the API key to the header and make the call.
You can see that the API key has been added to the header and the API responds to our request. It also adds a response header showing how many requests remain for this API key.
Let’s make two more calls. The Free plan’s limit is now exhausted, and the API responds with HTTP 429.
It looks like we have successfully implemented the rate limiter using the Token Bucket algorithm. We can keep adding endpoints and the interceptor would apply the rate limit for each request.
As usual, the source code for the above Spring Boot implementation is available on GitHub.