Rate limiting is a strategy for limiting access to APIs. It restricts the number of API calls that a client can make within a given timeframe. This helps defend the API against overuse, whether it comes from an unintentional mistake or a malicious script.
Rate limits are usually applied by tracking the client's IP address, API key, or access token. As API developers, we can respond in several different ways when a client reaches the limit:
Queueing the request until the remaining time period has elapsed.
Allowing the request immediately but charging extra for this request.
Rejecting the request with HTTP 429 Too Many Requests, which is the most common approach.
Token Bucket Algorithm
Assume that we have a bucket whose capacity is defined as the number of tokens it can hold. Whenever a consumer wants to access an API endpoint, it must get a token from the bucket. If a token is available, it is removed from the bucket and the request is accepted. If no token is available, the server rejects the request.
As requests consume tokens, we also need to refill the bucket at a fixed rate, without ever exceeding its capacity. Let’s consider an API that has a rate limit of 100 requests per minute. We can create a bucket with a capacity of 100 and a refill rate of 100 tokens per minute.
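To make the mechanics concrete, here is a minimal, library-free sketch of a token bucket. It is not how Bucket4j is implemented; the class name SimpleTokenBucket, its methods, and the injectable clock are assumptions made for illustration. It tops up the bucket once per elapsed interval and never lets the count exceed the capacity.

```java
import java.util.function.LongSupplier;

/**
 * Illustrative token bucket (hypothetical class, not part of Bucket4j).
 * Tokens are added in one batch for every full interval that has elapsed.
 */
class SimpleTokenBucket {
    private final long capacity;
    private final long refillTokens;
    private final long refillIntervalNanos;
    private final LongSupplier clock; // injectable so the sketch can be exercised without sleeping
    private long availableTokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillTokens, long refillIntervalNanos, LongSupplier clock) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillIntervalNanos = refillIntervalNanos;
        this.clock = clock;
        this.availableTokens = capacity; // the bucket starts full
        this.lastRefillNanos = clock.getAsLong();
    }

    /** Takes one token if available; returns false when the request should be rejected. */
    synchronized boolean tryConsume() {
        refill();
        if (availableTokens == 0) {
            return false;
        }
        availableTokens--;
        return true;
    }

    synchronized long getAvailableTokens() {
        refill();
        return availableTokens;
    }

    private void refill() {
        long now = clock.getAsLong();
        long elapsedIntervals = (now - lastRefillNanos) / refillIntervalNanos;
        if (elapsedIntervals > 0) {
            // Add tokens for each full interval, capped at the bucket capacity.
            availableTokens = Math.min(capacity, availableTokens + elapsedIntervals * refillTokens);
            lastRefillNanos += elapsedIntervals * refillIntervalNanos;
        }
    }
}
```

For the 100 requests per minute example, such a bucket could be created as new SimpleTokenBucket(100, 100, 60_000_000_000L, System::nanoTime), with tryConsume() called once per incoming request.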
We are going to implement a simple calculator REST API that supports operations such as add and subtract.
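import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Calculator is the response DTO; its builder() is assumed to be generated, for example by Lombok's @Builder.
@RestController
@RequestMapping(value = "/api/calculator")
public class CalculatorController {
    @GetMapping(value = "/add")
    public ResponseEntity<Calculator> add(@RequestParam int left, @RequestParam int right) {
        return ResponseEntity.ok(Calculator.builder().operation("add").answer(left + right).build());
    }

    @GetMapping(value = "/subtract")
    public ResponseEntity<Calculator> subtract(@RequestParam int left, @RequestParam int right) {
        return ResponseEntity.ok(Calculator.builder().operation("subtract").answer(left - right).build());
    }
}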
Let’s make sure the APIs above are up and running as expected. You can use cURL or Postman to make an API call.
curl -X GET -H "Content-Type: application/json" 'http://localhost:9090/api/calculator/add?left=20&right=30'
{"operation":"add","answer":50}
Now that we have APIs ready to consume, next let’s introduce some subscription plans with rate limits. Let’s assume that we have the following subscription plans for our clients:
Free Subscription allows 2 requests per 60 seconds.
Basic Subscription allows 10 requests per 60 seconds.
Professional Subscription allows 20 requests per 60 seconds.
Each API client gets a unique API key that it must send with every request. This lets us identify the client and the subscription plan linked to it.
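import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Refill;
import java.time.Duration;

public enum SubscriptionPlan {
    SUBSCRIPTION_FREE(2),
    SUBSCRIPTION_BASIC(10),
    SUBSCRIPTION_PROFESSIONAL(20);

    // Requests allowed per minute; also used as the bucket capacity.
    private final int bucketLimit;

    private SubscriptionPlan(int bucketLimit) {
        this.bucketLimit = bucketLimit;
    }

    public int getBucketLimit() {
        return this.bucketLimit;
    }

    // Capacity equals the plan limit, and the full allowance is refilled once per minute.
    public Bandwidth getBandwidth() {
        return Bandwidth.classic(bucketLimit,
                Refill.intervally(bucketLimit, Duration.ofMinutes(1)));
    }
}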
Next, we create a subscription service that stores the bucket reference for each API client in memory.
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Bucket4j;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.stereotype.Service;

@Service
public class SubscriptionService {
    // One bucket per API key, created lazily on first use.
    private final Map<String, Bucket> subscriptionCacheMap = new ConcurrentHashMap<>();

    public Bucket resolveBucket(String subscriptionKey) {
        return subscriptionCacheMap.computeIfAbsent(
                subscriptionKey, this::getSubscriptionBucket);
    }

    private Bucket getSubscriptionBucket(String subscriptionKey) {
        return buildBucket(
                resolveSubscriptionPlanByKey(subscriptionKey)
                        .getBandwidth());
    }

    private Bucket buildBucket(Bandwidth limit) {
        return Bucket4j.builder().addLimit(limit).build();
    }

    // The key prefix identifies the plan; any other key falls back to the free plan.
    private SubscriptionPlan resolveSubscriptionPlanByKey(
            String subscriptionKey) {
        if (subscriptionKey.startsWith("PS1129-")) {
            return SubscriptionPlan.SUBSCRIPTION_PROFESSIONAL;
        } else if (subscriptionKey.startsWith("BS1129-")) {
            return SubscriptionPlan.SUBSCRIPTION_BASIC;
        }
        return SubscriptionPlan.SUBSCRIPTION_FREE;
    }
}
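The per-client lookup can be tried out without Spring or Bucket4j. The following sketch is a simplified stand-in, not the service itself: the class PerClientQuota is hypothetical, it uses a plain counter instead of a bucket, and it deliberately omits refilling. It only illustrates that computeIfAbsent creates one entry per API key and that every later request with the same key shares it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for SubscriptionService that uses a counter instead of a bucket. */
class PerClientQuota {
    private final Map<String, AtomicInteger> remainingByKey = new ConcurrentHashMap<>();

    /** Same prefix rules as resolveSubscriptionPlanByKey in the service. */
    static int limitFor(String subscriptionKey) {
        if (subscriptionKey.startsWith("PS1129-")) {
            return 20;
        } else if (subscriptionKey.startsWith("BS1129-")) {
            return 10;
        }
        return 2;
    }

    /** Creates the counter on first use; later calls with the same key return the same instance. */
    AtomicInteger resolve(String subscriptionKey) {
        return remainingByKey.computeIfAbsent(subscriptionKey, key -> new AtomicInteger(limitFor(key)));
    }

    /** Decrements the counter while tokens remain; returns false once the quota is used up. */
    boolean tryConsume(String subscriptionKey) {
        return resolve(subscriptionKey).getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
    }
}
```

Because the map is keyed by the API key, two clients on the same plan still get independent quotas.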
Let’s understand the implementation. The API client sends an API key with the X-Subscription-Key request header. We use the SubscriptionService to get the bucket for this API key and check whether the request is allowed by consuming a token from the bucket.
In order to enhance the client experience of the API, we will add the following additional response headers to send information about the rate limit.
X-Rate-Limit-Remaining - number of tokens remaining in the current time window.
X-Rate-Limit-Retry-After-Seconds - remaining time in seconds until the bucket is refilled with new tokens.
We can call the ConsumptionProbe methods getRemainingTokens and getNanosToWaitForRefill to get the number of tokens remaining in the bucket and the time remaining until the next refill, respectively. The getNanosToWaitForRefill method returns 0 if we were able to consume the token successfully.
Let’s create a RateLimiterInterceptor and implement the rate limit logic in its preHandle method. This keeps the implementation cleaner than repeating the check in every API method.
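Since the wait time is reported in nanoseconds and the header is expressed in seconds, the conversion deserves a closer look. Plain integer division truncates, so a wait of 0.9 seconds would be reported as 0. The sketch below compares truncation with rounding up; the RetryAfter class and its method names are hypothetical and only illustrate the arithmetic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper comparing two ways of turning a nanosecond wait into whole seconds. */
class RetryAfter {
    /** Truncates toward zero, like dividing by 1_000_000_000. */
    static long truncatedSeconds(long nanosToWait) {
        return TimeUnit.NANOSECONDS.toSeconds(nanosToWait);
    }

    /** Rounds up, so a client that waits this long does not retry too early. Assumes a non-negative input. */
    static long ceilingSeconds(long nanosToWait) {
        long nanosPerSecond = TimeUnit.SECONDS.toNanos(1);
        return (nanosToWait + nanosPerSecond - 1) / nanosPerSecond;
    }
}
```

Whether rounding up is worth it depends on how clients use the header; it is a design choice rather than a requirement.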
import io.github.bucket4j.Bucket;
import io.github.bucket4j.ConsumptionProbe;
import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http on Spring Boot 3 and later
import javax.servlet.http.HttpServletResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

@Component
public class RateLimiterInterceptor implements HandlerInterceptor {
    private static final String HEADER_SUBSCRIPTION_KEY = "X-Subscription-Key";
    private static final String HEADER_LIMIT_REMAINING = "X-Rate-Limit-Remaining";
    private static final String HEADER_RETRY_AFTER = "X-Rate-Limit-Retry-After-Seconds";
    private static final String SUBSCRIPTION_QUOTA_EXHAUSTED =
            "You've exhausted your API Request Quota. Please upgrade your subscription plan.";

    @Autowired
    private SubscriptionService subscriptionService;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // Reject requests that do not carry an API key.
        String subscriptionKey = request.getHeader(HEADER_SUBSCRIPTION_KEY);
        if (subscriptionKey == null || subscriptionKey.isEmpty()) {
            response.sendError(HttpStatus.BAD_REQUEST.value(),
                    "Missing Request Header: " + HEADER_SUBSCRIPTION_KEY);
            return false;
        }

        // Try to take one token from this client's bucket.
        Bucket tokenBucket = subscriptionService.resolveBucket(subscriptionKey);
        ConsumptionProbe consumptionProbe = tokenBucket.tryConsumeAndReturnRemaining(1);
        if (!consumptionProbe.isConsumed()) {
            // No token left: tell the client how long to wait and respond with 429.
            long waitTime = consumptionProbe.getNanosToWaitForRefill() / 1_000_000_000;
            response.addHeader(HEADER_RETRY_AFTER, String.valueOf(waitTime));
            response.setContentType(MediaType.APPLICATION_JSON_VALUE);
            response.sendError(HttpStatus.TOO_MANY_REQUESTS.value(), SUBSCRIPTION_QUOTA_EXHAUSTED);
            return false;
        }

        response.addHeader(HEADER_LIMIT_REMAINING, String.valueOf(consumptionProbe.getRemainingTokens()));
        return true;
    }
}
Finally, let’s add the interceptor to Spring Boot’s InterceptorRegistry so that the RateLimiterInterceptor intercepts each request to our calculator API endpoints.
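import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Lazy;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@SpringBootApplication
public class TokenBucketApplication implements WebMvcConfigurer {
    @Autowired @Lazy
    private RateLimiterInterceptor interceptor;

    // Apply the rate limiter to every calculator endpoint.
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(interceptor).addPathPatterns("/api/calculator/**");
    }

    public static void main(String[] args) {
        SpringApplication.run(TokenBucketApplication.class, args);
    }
}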
The client has to send the API key in an HTTP header; otherwise the interceptor rejects the request. Let’s add the API key to the header and make the call.
curl -v -X GET -H "Content-Type: application/json" -H "X-subscription-key:A1129-12" 'http://localhost:9090/api/calculator/add?left=20&right=30'
* Connected to localhost (::1) port 9090 (#0)
> GET /api/calculator/add?left=20&right=30 HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> X-subscription-key:A1129-12
>
< HTTP/1.1 200
< X-Rate-Limit-Remaining: 1
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Fri, 25 Dec 2020 12:46:06 GMT
<
* Connection #0 to host localhost left intact
{"operation":"add","answer":50}
* Closing connection 0
You can see that the API key was sent in the request header, the API responded to our request, and the response includes a header showing how many requests remain for this API key.
Let’s make two more calls within the same minute. The first uses the last remaining token; the second exceeds the free plan’s limit, so the API responds with 429.
curl -v -X GET -H "Content-Type: application/json" -H "X-subscription-key:A1129-12" 'http://localhost:9090/api/calculator/add?left=20&right=30'
* Connected to localhost (::1) port 9090 (#0)
> GET /api/calculator/add?left=20&right=30 HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> X-subscription-key:A1129-12
>
< HTTP/1.1 429
< X-Rate-Limit-Retry-After-Seconds: 51
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Fri, 25 Dec 2020 12:49:11 GMT
<
* Connection #0 to host localhost left intact
{"timestamp":"2020-12-25T12:49:11.358+0000","status":429,"error":"Too Many Requests","message":"You've exhausted your API Request Quota. Please upgrade your subscription plan.","path":"/api/calculator/add"}
* Closing connection 0
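On the consuming side, a client can use the 429 status and the X-Rate-Limit-Retry-After-Seconds header to decide when to try again. The following is a small sketch of that decision, not part of the service above: the RetryPolicy class, the retryDelay method, and the 60 second fallback (one refill window) are all assumptions.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical client-side helper that decides how long to wait before retrying. */
class RetryPolicy {
    static final int TOO_MANY_REQUESTS = 429;
    // Assumed fallback when the header is missing or malformed: one full refill window.
    static final Duration FALLBACK_DELAY = Duration.ofSeconds(60);

    /** Returns the delay before the next attempt, or empty when the response was not rate limited. */
    static Optional<Duration> retryDelay(int statusCode, String retryAfterSecondsHeader) {
        if (statusCode != TOO_MANY_REQUESTS) {
            return Optional.empty();
        }
        // Accept only a short run of digits so parsing cannot fail or overflow.
        if (retryAfterSecondsHeader == null || !retryAfterSecondsHeader.trim().matches("\\d{1,9}")) {
            return Optional.of(FALLBACK_DELAY);
        }
        return Optional.of(Duration.ofSeconds(Long.parseLong(retryAfterSecondsHeader.trim())));
    }
}
```

With java.net.http.HttpClient, the header value could be read with response.headers().firstValue("X-Rate-Limit-Retry-After-Seconds").orElse(null) and passed to retryDelay together with response.statusCode().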
It looks like we have successfully implemented the rate limiter using the Token Bucket algorithm. We can keep adding endpoints and the interceptor would apply the rate limit for each request.
As usual, the source code for the above Spring Boot implementation is available on GitHub.