Effective rate limiting strategies
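IP-based rate limiting stops floods from a single source, but it does little against distributed attacks, and it penalizes users behind corporate NATs where hundreds of employees share one IP. Combine a per-IP limit with a stricter limit on the targeted account. Example: /login allows 5 attempts per IP every 15 minutes, but only 3 per email in the same window. The per-email limit is what stops brute force against a single account, so the per-IP limit can be raised for networks with many users without weakening that protection, and entire offices are not blocked.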
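The /login example can be sketched as two attempt logs checked together. This is a minimal in-process illustration, not a production design: the class name `LoginLimiter`, the key prefixes, and the choice not to record rejected attempts are all assumptions made for the sketch.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60
MAX_PER_IP = 5       # numbers taken from the /login example
MAX_PER_EMAIL = 3


class LoginLimiter:
    """Hypothetical helper: one sliding attempt log per IP and per email."""

    def __init__(self):
        self._attempts = defaultdict(deque)  # key -> timestamps of recent attempts

    def _recent(self, key, now):
        log = self._attempts[key]
        # Drop attempts that have left the 15-minute window.
        while log and log[0] <= now - WINDOW_SECONDS:
            log.popleft()
        return log

    def allow(self, ip, email, now=None):
        now = time.time() if now is None else now
        ip_log = self._recent("ip:" + ip, now)
        email_log = self._recent("email:" + email.lower(), now)
        if len(ip_log) >= MAX_PER_IP or len(email_log) >= MAX_PER_EMAIL:
            return False  # rejected attempts are not recorded in this sketch
        ip_log.append(now)
        email_log.append(now)
        return True
```

A fourth attempt on the same email is refused even when it comes from a fresh IP, which is the property that matters against credential guessing.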
Sliding windows avoid the boundary burst that fixed windows allow. With 1-minute fixed windows and a limit of 100, an attacker can make 100 requests at 10:00:59 and another 100 at 10:01:00, achieving 200 requests within 2 seconds, twice the intended rate. Sliding windows count the last 60 seconds at any moment, which prevents this burst. Implement them with sorted sets in Redis: ZADD each request with its timestamp as the score, ZREMRANGEBYSCORE to remove entries older than the window, and ZCARD to count what remains.
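The boundary burst is easy to reproduce. The sketch below replays the 10:00:59 / 10:01:00 scenario against two in-process limiters; the class names and the `burst` helper are illustrative, and a deque stands in for the Redis sorted set.

```python
from collections import deque

LIMIT = 100
WINDOW = 60  # seconds


class FixedWindow:
    """Counter that resets whenever the clock crosses a window boundary."""

    def __init__(self):
        self.window_id = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // WINDOW)
        if window_id != self.window_id:
            self.window_id, self.count = window_id, 0
        if self.count >= LIMIT:
            return False
        self.count += 1
        return True


class SlidingWindow:
    """Log of accepted request times covering the last WINDOW seconds."""

    def __init__(self):
        self.log = deque()

    def allow(self, now):
        while self.log and self.log[0] <= now - WINDOW:
            self.log.popleft()
        if len(self.log) >= LIMIT:
            return False
        self.log.append(now)
        return True


def burst(limiter):
    # 100 requests at second 59, then 100 more at second 60.
    times = [59.0] * 100 + [60.0] * 100
    return sum(limiter.allow(t) for t in times)
```

Here `burst(FixedWindow())` lets all 200 requests through, while `burst(SlidingWindow())` accepts only the first 100.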
Limits should reflect real usage patterns. Analyze your logs: how many requests per window does an average user make? And the user at the 95th percentile? A reasonable starting limit sits between the 95th and 99th percentile of per-user request counts. A legitimate user should almost never hit the limit in normal use. If they do so regularly, your limit is too low or your frontend polls inefficiently.
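As a rough sketch of that analysis, assume the logs have already been reduced to one number per user: their busiest requests-per-minute count. The function names and the nearest-rank percentile method are choices made for this example, not the only valid ones.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggested_limit_range(peak_requests_per_minute):
    """Return (p95, p99) of per-user peak request counts as a starting range."""
    return (
        percentile(peak_requests_per_minute, 95),
        percentile(peak_requests_per_minute, 99),
    )
```

With made-up data of 90 users peaking at 10 req/min, 8 at 40, and 2 at 300, the range is 40 to 300. A gap that wide suggests the two heaviest users are scripts or a polling bug, and are worth inspecting before picking a number.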
Implementing rate limiting
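Redis is the standard backend for distributed rate limiting. The simplest algorithm is a fixed-window counter: INCR a key and set a TTL on it when it is first created. If the value exceeds the limit, reject. (This is not a token bucket, which refills tokens at a steady rate and lets clients spend saved tokens in short bursts.) For sliding windows, use a ZSET: ZADD key timestamp member with a unique member per request (a constant member such as 1 would be overwritten on every call), ZREMRANGEBYSCORE key 0 (now-window) to drop old entries, ZCARD key to count, and EXPIRE key window so idle keys disappear. Run these commands atomically, in a MULTI/EXEC transaction or a Lua script, so concurrent requests cannot slip past the limit. Memory grows with the number of requests inside the window, so for very high limits the counter approach is cheaper.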
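Both Redis approaches can be sketched with redis-py. The functions below assume `client` is a `redis.Redis` instance (it is passed in, so nothing here opens a connection); the function names, the key argument, and the UUID-suffixed member are assumptions made for illustration.

```python
import time
import uuid


def fixed_window_allow(client, key, limit, window_seconds):
    """Fixed-window counter: INCR, with a TTL set when the key is created."""
    count = client.incr(key)
    if count == 1:
        # First hit in this window starts the clock. If the process dies
        # between INCR and EXPIRE the key never expires; a Lua script
        # would close that gap.
        client.expire(key, window_seconds)
    return count <= limit


def sliding_window_allow(client, key, limit, window_seconds, now=None):
    """Sliding-window log in a sorted set, run as one MULTI/EXEC transaction."""
    now = time.time() if now is None else now
    # Unique member so two requests with the same timestamp are both stored.
    member = f"{now}:{uuid.uuid4().hex}"
    pipe = client.pipeline(transaction=True)
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # drop expired entries
    pipe.zadd(key, {member: now})                        # record this request
    pipe.zcard(key)                                      # count the window
    pipe.expire(key, int(window_seconds) + 1)            # clean up idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit
```

Note that this sliding-window variant records rejected requests too, so a client that keeps hammering stays blocked until it slows down. Whether that is desirable depends on the endpoint.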
In small applications, in-process memory stores work. Node has rate-limiter-flexible, Python has slowapi. Both can keep their counters in process memory. The problem: that state is not shared between instances. If you have 3 servers behind a load balancer, each allows 100 req/min independently, so a client can reach up to 300 in total. This works if your limits are generous; it fails if you need precision, in which case point the same libraries at a shared store such as Redis.
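Response headers are crucial: X-RateLimit-Limit (total limit), X-RateLimit-Remaining (requests left in the current window), X-RateLimit-Reset (when the window resets). These names are a widely used convention rather than a standard, and the Reset value varies between APIs (Unix timestamp or seconds remaining), so document which one you send. Well-designed clients read these headers and adjust their behavior automatically. Also include Retry-After in the 429 response to tell the client how many seconds to wait before retrying.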
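A small helper can build these headers in one place. The sketch below is framework-agnostic and assumes X-RateLimit-Reset carries a Unix timestamp; the function name and its arguments are hypothetical.

```python
import math
import time


def rate_limit_response(limit, used, reset_epoch, now=None):
    """Return (status, headers) for a request that brought the window's count to `used`."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # Unix timestamp (assumed convention)
    }
    if used <= limit:
        return 200, headers
    # Retry-After takes whole seconds; round up so the client never retries early.
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
    return 429, headers
```

Sending the three X-RateLimit headers on successful responses as well lets clients slow down before they are ever rejected.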
Rate limiting by endpoint type
Authentication endpoints need aggressive limits: 5 login attempts per IP every 15 minutes makes brute force impractical without frustrating users who forgot their password. Password reset should be even more restrictive (3/hour) because each request sends an email, which costs money, and attackers can use the endpoint to enumerate valid users.
Public APIs need tiers: free (100 req/min), pro (1000 req/min), enterprise (no fixed limit). Link each API key to a plan and look up the limit from the plan. The free tier should be enough for development but not for serious production. This incentivizes upgrades without blocking initial adoption.
Expensive operations (exports, reports, ML inference) deserve limits over longer windows: 10 requests/hour instead of 10/minute. A user exporting 10 reports in 1 minute is probably an automated script, not a human. Hourly limits prevent abuse without affecting legitimate use that is spread over time.
Handling exceptions and edge cases
Webhooks from external providers need special handling. If Stripe sends payment events, strict rate limiting risks rejecting legitimate events. Configure generous limits (200 req/min) and validate the webhook signature. If the signature check fails, apply aggressive rate limiting, because the request probably comes from an attacker.
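The idea of choosing the limit from the signature check can be sketched as follows. This uses a generic HMAC-SHA256 over the raw body, which is an assumption for illustration: each provider defines its own signing scheme, so follow its documentation or SDK for the real verification. The limit for unverified requests (5 req/min) is also an invented number.

```python
import hashlib
import hmac

VERIFIED_LIMIT_PER_MIN = 200   # generous limit from the text
UNVERIFIED_LIMIT_PER_MIN = 5   # assumption: "aggressive" limit for failed signatures


def signature_is_valid(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Generic HMAC-SHA256 check over the raw request body."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def webhook_limit(secret: bytes, payload: bytes, signature_hex: str) -> int:
    """Pick the per-minute limit based on whether the signature verifies."""
    if signature_is_valid(secret, payload, signature_hex):
        return VERIFIED_LIMIT_PER_MIN
    return UNVERIFIED_LIMIT_PER_MIN
```

Verification must run on the raw bytes of the request body, before any JSON parsing or re-serialization changes them.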
Enterprise users often need custom limits. Instead of hardcoding exceptions, implement an override system: a table in your DB with user_id and custom_limits. Your rate limiting middleware checks this first. This lets you give a client 10,000 req/min without deploying new code.
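A minimal sketch of such an override lookup, using SQLite from the standard library so it runs anywhere. The table name `rate_limit_overrides`, its columns, and the `PLAN_DEFAULTS` mapping are hypothetical; a real middleware would also cache the result rather than query on every request.

```python
import sqlite3

# Assumed plan defaults in requests per minute.
PLAN_DEFAULTS = {"free": 100, "pro": 1000}


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rate_limit_overrides ("
        "user_id TEXT PRIMARY KEY, "
        "requests_per_minute INTEGER NOT NULL)"
    )


def effective_limit(conn, user_id, plan):
    """Check the override table first, then fall back to the plan default."""
    row = conn.execute(
        "SELECT requests_per_minute FROM rate_limit_overrides WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is not None:
        return row[0]
    return PLAN_DEFAULTS[plan]
```

Granting a client 10,000 req/min then becomes a single INSERT into the table, with no deploy.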
During incidents, you need a kill switch. A feature flag that reduces all limits to 10% of normal can save you from outages. When your database is dying, dropping a 100 req/min limit to 10 req/min globally is better than going down completely. Implement this in your rate limiter config, not in application code, so it can be flipped without a deploy.
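One way to sketch the kill switch is a config object that scales every limit when an emergency flag is set. The class name, the flag, and the floor of 1 request are assumptions; in practice the flag would be read from a feature-flag service or shared config rather than set in process.

```python
class LimiterConfig:
    """Hypothetical limiter config with an emergency scaling flag."""

    def __init__(self, base_limits, emergency_percent=10):
        self.base_limits = dict(base_limits)  # name -> requests per minute
        self.emergency_percent = emergency_percent
        self.emergency = False                # the kill switch

    def effective(self, name):
        base = self.base_limits[name]
        if not self.emergency:
            return base
        # Integer math avoids float rounding; keep at least 1 so an
        # endpoint is throttled rather than disabled outright.
        return max(1, base * self.emergency_percent // 100)
```

Because every limit is derived from the same flag, one change throttles all endpoints at once and flipping it back restores normal limits.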