Which rate limiting algorithm is best?

Token bucket for simplicity, sliding window for precision. Token bucket allows short bursts (useful for UX), while sliding windows stop clients from gaming window boundaries. For most APIs, a token bucket with a small burst allowance is a good balance.
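To make the burst behaviour concrete, here is a minimal in-memory token bucket sketch. The class name, the parameters (rate, capacity, cost) and the injectable clock are assumptions chosen for illustration; a production limiter would also need locking and shared storage.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the sketch can be tested
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity is the burst size and the rate is the sustained limit, so TokenBucket(rate=1, capacity=3) lets three requests through at once and then one per second.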

Should I rate limit by IP or by user?

Both. By IP for public endpoints (prevents attacks before authentication), by user for authenticated endpoints (more precise). Combine both: generous IP limit, strict user limit.

How do I handle legitimate users who exceed the limit?

First, verify your limits are reasonable. Then, implement an appeal system: users can contact support, you review logs, and if legitimate, increase their limit. Also consider degrading gracefully: instead of 429, return cached data with a header indicating it's rate limited.

What status code should I use for rate limiting?

429 Too Many Requests. Include a Retry-After header with the number of seconds until the client can retry. Some APIs use 503 Service Unavailable, but 429 is more semantically correct and clients recognize it specifically.

Rate Limit Policy Generator

Effective rate limiting strategies

IP-based rate limiting stops floods from a single source, but it penalizes users behind corporate NATs, where hundreds of employees share one IP, and it does little against attacks spread across many addresses. Combine generous IP limits with strict authenticated user limits. Example: /login allows 5 attempts per IP every 15 minutes, but only 3 per email in the same window. The per-email limit stops brute force against a single account; if entire offices share one address, raise the per-IP number so it does not lock them out.
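The /login example above can be sketched as two checks against the same window, one keyed by IP and one by email. This is an in-memory illustration only: the function and variable names are hypothetical, and the sketch records an attempt only when it is allowed, which is a design choice rather than a requirement.

```python
import time
from collections import defaultdict, deque

WINDOW = 15 * 60                   # 15 minutes, as in the example above
LIMITS = {"ip": 5, "email": 3}     # attempts allowed per key within WINDOW

_attempts = defaultdict(deque)     # (kind, value) -> timestamps of recent attempts


def _over_limit(kind, value, now):
    recent = _attempts[(kind, value)]
    while recent and recent[0] <= now - WINDOW:   # drop attempts outside the window
        recent.popleft()
    return len(recent) >= LIMITS[kind]


def allow_login_attempt(ip, email, now=None):
    """Return True if the attempt may proceed, recording it under both keys."""
    now = time.time() if now is None else now
    if _over_limit("ip", ip, now) or _over_limit("email", email, now):
        return False
    _attempts[("ip", ip)].append(now)
    _attempts[("email", email)].append(now)
    return True
```

Because the email key is independent of the IP key, an attacker who rotates addresses still hits the per-email limit for the account being targeted.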

Sliding windows are superior to fixed windows. With 1-minute fixed windows, an attacker can make 100 requests at 10:00:59 and another 100 at 10:01:00, achieving 200 requests within two seconds. Sliding windows count the last 60 seconds at any moment, preventing this burst. Implement them with sorted sets in Redis: ZADD each request with its timestamp as the score, ZREMRANGEBYSCORE to drop entries older than the window, and ZCARD to count what remains.
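The boundary burst described above can be reproduced in a few lines. The two classes below are simplified in-memory sketches (names are illustrative) that take the current time as an argument, so the 10:00:59 / 10:01:00 scenario can be replayed as seconds 59 and 60.

```python
from collections import deque


class FixedWindow:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.bucket, self.count = None, 0

    def allow(self, now):
        bucket = int(now // self.window)   # which window this request falls in
        if bucket != self.bucket:          # new window: the counter resets
            self.bucket, self.count = bucket, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class SlidingWindow:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.hits = deque()                # timestamps of accepted requests

    def allow(self, now):
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()            # forget requests older than the window
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True


def burst(limiter):
    """Send 100 requests at second 59 and 100 more at second 60."""
    times = [59.0] * 100 + [60.0] * 100
    return sum(limiter.allow(t) for t in times)
```

With a limit of 100 per 60 seconds, burst(FixedWindow(100, 60)) lets all 200 requests through, while burst(SlidingWindow(100, 60)) accepts only the first 100.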

Limits should reflect real usage patterns. Analyze your logs: how many requests does an average user make? And the 95th percentile? Your limit should be between the 95th and 99th. A legitimate user should never hit the limit in normal use. If they do regularly, your limit is too low or your frontend polls inefficiently.
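One way to get those percentiles from logs is sketched below, assuming you have already reduced the logs to a list of per-user request counts per minute (that preprocessing step, and the function name, are assumptions). It uses statistics.quantiles from the Python standard library.

```python
import statistics


def usage_percentiles(requests_per_user_minute):
    """Return (p95, p99) of the observed per-user request rates."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile,
    # index 98 is the 99th.
    cuts = statistics.quantiles(requests_per_user_minute, n=100)
    return cuts[94], cuts[98]
```

A limit picked between the two returned values follows the guideline above; if many real users sit above the 95th percentile figure, look at client polling before raising the limit.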

Implementing rate limiting

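Redis is the usual backend for distributed rate limiting. The simplest approach is a fixed-window counter: INCR a key and give it a TTL with EXPIRE; if the value exceeds the limit, reject. (A true token bucket also has to store the last refill time, which usually means a small Lua script.) For sliding windows, use a sorted set: ZADD key timestamp member with a unique member per request, ZREMRANGEBYSCORE to drop entries older than the window, ZCARD (or ZCOUNT key (now-window) now) to count, and EXPIRE key window so idle keys disappear. Run these commands in a MULTI/EXEC transaction or a Lua script; otherwise concurrent requests can interleave between the count and the write and slip past the limit.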
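A hedged sketch of the sorted-set approach follows. It assumes `client` is a redis-py client (for example redis.Redis()); the function name and key layout are illustrative. Each request gets a unique member, because sorted-set members are unique and a constant member would simply be overwritten. Note that this version records rejected requests too, so a client that keeps hammering stays blocked.

```python
import time
import uuid


def allow_request(client, key, limit, window, now=None):
    """Sliding-window check backed by a Redis sorted set.

    `client` is assumed to be a redis-py client; `window` is in seconds.
    """
    now = time.time() if now is None else now
    pipe = client.pipeline()                      # redis-py wraps this in MULTI/EXEC by default
    pipe.zremrangebyscore(key, 0, now - window)   # drop entries older than the window
    member = f"{now}:{uuid.uuid4().hex}"          # unique member per request
    pipe.zadd(key, {member: now})                 # score = request timestamp
    pipe.zcard(key)                               # entries still inside the window
    pipe.expire(key, int(window) + 1)             # let idle keys expire on their own
    _, _, count, _ = pipe.execute()
    return count <= limit
```

Usage would look like allow_request(redis.Redis(), "rl:user:42", limit=100, window=60), where the key format is only an example.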

In small applications, in-process memory stores work. Node has rate-limiter-flexible, Python has slowapi. With their in-memory stores, these keep counters and timestamps in the process itself. The problem: that state is not shared between instances. If you have 3 servers, each allows 100 req/min independently, giving you 300 total. Works if your limits are generous; fails if you need precision.
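The 3-server arithmetic above can be checked with a small simulation. LocalCounter and allowed_through are hypothetical names; the load balancer is modelled as plain round-robin.

```python
from itertools import cycle


class LocalCounter:
    """Per-process counter: each instance sees only its own traffic."""

    def __init__(self, limit):
        self.limit, self.count = limit, 0

    def allow(self):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def allowed_through(instances, total_requests):
    """Spread requests round-robin and count how many are accepted overall."""
    servers = cycle(instances)
    return sum(next(servers).allow() for _ in range(total_requests))
```

Three instances with a limit of 100 each accept 300 of 1000 requests in one window. Dividing the limit by the instance count (33 each) brings the total back to 99, but only while traffic is evenly balanced, which is why a shared store is the more precise option.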

Response headers are crucial: X-RateLimit-Limit (total limit), X-RateLimit-Remaining (available requests), X-RateLimit-Reset (reset timestamp). Well-designed clients read these headers and adjust their behavior automatically. Also include Retry-After in the 429 response to tell the client when to retry.
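A small helper for building these headers might look like the following. The function name and arguments are assumptions; the header names are the ones listed above, and Retry-After is added only when the request is actually being rejected with 429.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now, limited=False):
    """Build rate limit headers; `reset_epoch` and `now` are Unix timestamps."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if limited:
        # Seconds until the window resets, rounded up so clients do not retry early.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```

Sending the three X-RateLimit headers on every response, not only on 429s, lets clients slow down before they hit the limit.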

Rate limiting by endpoint type

Authentication endpoints need aggressive limits: 5 login attempts per IP every 15 minutes prevents brute force without frustrating users who forgot their password. Password reset should be even more restrictive (3/hour) because each request sends an expensive email and attackers can use it to enumerate valid users.

Public APIs need tiers: free (100 req/min), pro (1000 req/min), enterprise (no limit). Implement this with different API keys linked to plans. The free tier should be enough for development but not for serious production. This incentivizes upgrades without blocking initial adoption.
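As a sketch of the key-to-plan lookup, the tiers above can live in a small table. The API keys below are made-up placeholders, and in practice the mapping would come from your database rather than a module-level dict.

```python
# Requests per minute for each plan; None means "no limit".
PLAN_LIMITS = {"free": 100, "pro": 1000, "enterprise": None}

# Hypothetical API keys mapped to plans, for illustration only.
API_KEY_PLANS = {
    "key_free_example": "free",
    "key_pro_example": "pro",
    "key_ent_example": "enterprise",
}


def limit_for_key(api_key):
    """Return the per-minute limit for an API key, or None if unlimited."""
    plan = API_KEY_PLANS.get(api_key)
    if plan is None:
        raise KeyError("unknown API key")
    return PLAN_LIMITS[plan]


def is_allowed(api_key, used_this_minute):
    limit = limit_for_key(api_key)
    return limit is None or used_this_minute < limit
```

Keeping the numbers in one table means a plan change is a data change, not a code change.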

Expensive operations (exports, reports, ML inference) deserve long-term limits: 10 requests/hour instead of 10/minute. A user exporting 10 reports in 1 minute is probably an automated script, not a human. Hour limits prevent abuse without affecting legitimate use spaced over time.

Handling exceptions and edge cases

Webhooks from external providers need allowlisting. If Stripe sends payment events, you can't rate limit them strictly or you'll lose events. Configure generous limits (200 req/min) and validate the webhook signature. If the signature check fails, apply aggressive rate limiting, because the sender is probably an attacker.
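The "signature decides the limit" idea can be sketched as below. This uses a generic HMAC-SHA256 check over the raw payload as a stand-in; it is not Stripe's actual signature scheme, and real integrations should use the provider's own verification library. The untrusted limit of 5 req/min is an assumed value.

```python
import hashlib
import hmac

TRUSTED_LIMIT = 200     # req/min for correctly signed webhooks, as above
UNTRUSTED_LIMIT = 5     # req/min when the signature fails (assumed value)


def webhook_limit(secret, payload, signature_hex):
    """Pick a rate limit based on whether the payload's signature verifies.

    `secret` and `payload` are bytes; `signature_hex` is the hex digest
    the sender supplied.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if hmac.compare_digest(expected, signature_hex):
        return TRUSTED_LIMIT
    return UNTRUSTED_LIMIT
```

The returned number would then be passed to whichever limiter the endpoint uses.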

Enterprise users often need custom limits. Instead of hardcoding exceptions, implement an override system: a table in your DB with user_id and custom_limits. Your rate limiting middleware checks this first. This lets you give a client 10,000 req/min without deploying new code.
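A minimal version of that override lookup, using SQLite from the standard library so the sketch is self-contained. The table and column names are hypothetical; any database works the same way.

```python
import sqlite3

# Hypothetical schema: one row per user with a negotiated limit.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rate_limit_overrides (
    user_id TEXT PRIMARY KEY,
    requests_per_minute INTEGER NOT NULL
)
"""


def limit_for_user(conn, user_id, default_limit):
    """Return the user's custom limit if one exists, else the default."""
    row = conn.execute(
        "SELECT requests_per_minute FROM rate_limit_overrides WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row is not None else default_limit
```

In a real middleware you would likely cache this lookup for a short time so the override table is not queried on every request.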

During incidents, you need a kill switch. A feature flag that reduces all limits to 10% of normal can save you from outages. When your database is dying, serving a tenth of normal traffic is better than going down completely. Implement this in your rate limiter config, not application code.
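The kill switch can be a single function that every limit passes through. In this sketch the flag is passed in as a boolean; where it comes from (a feature flag service, a config file, an environment variable) is left open, and the floor of 1 is an assumption so that small limits never round down to zero.

```python
def effective_limit(base_limit, kill_switch_on, percent=10, floor=1):
    """Scale a limit down to `percent` of normal while the kill switch is on."""
    if not kill_switch_on:
        return base_limit
    # Integer arithmetic avoids floating point surprises; never go below `floor`.
    return max(floor, base_limit * percent // 100)
```

With the switch on, a limit of 100 req/min becomes 10 and a limit of 1000 becomes 100; turning the flag off restores the normal values without a deploy.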
