AI API rate limits, by provider
Rate limits are the invisible ceiling every AI app eventually hits. Providers cap you on
requests per minute (RPM), tokens per minute (TPM), and sometimes requests or tokens per
day, and each provider counts, resets, and surfaces those limits differently. A request
that sails through at 2 a.m. can come back as an HTTP 429 at peak traffic because your
combined traffic crossed a TPM limit you didn't know existed.
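To see how a request can be refused on tokens while the request count still has plenty of room, here is a minimal client-side guard. It is a sketch that assumes a rolling 60-second window and caller-supplied token estimates. Providers may replenish capacity continuously rather than per fixed window, so the class name `MinuteWindowLimiter` and its behavior are illustrative, not any provider's actual algorithm.

```python
import time
from collections import deque
from typing import Optional


class MinuteWindowLimiter:
    """Hypothetical local RPM + TPM budget over a rolling window.

    Assumption: a simple sliding window; real providers may count differently,
    so use this only as a conservative guard in front of the API.
    """

    def __init__(self, rpm: int, tpm: int, window: float = 60.0):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self._events = deque()  # (timestamp, tokens) for each admitted request

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        """Admit a request estimated at `tokens` if both budgets allow it."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        used_requests = len(self._events)
        used_tokens = sum(t for _, t in self._events)
        if used_requests + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False  # would cross the RPM line or the TPM line
        self._events.append((now, tokens))
        return True
```

With `rpm=60, tpm=10_000`, an 8,000-token request followed one second later by a 3,000-token request is refused even though only 1 of 60 requests has been used: the TPM line, not the RPM line, is the one that bites.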
These references lay out the actual numbers per provider and tier, explain which response
headers tell you how much headroom is left (OpenAI's x-ratelimit-remaining-*, Anthropic's
anthropic-ratelimit-*, and friends), and show how the limits scale as you move up usage
tiers. Where a provider gates higher limits behind spend or time on the platform, we spell
out the checklist. Treat every number as accurate only as of its stamped date: providers
raise and reshuffle limits frequently, so confirm against your own dashboard before you
size a workload.
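Reading headroom from response headers can be sketched as below. This assumes OpenAI-style header names (x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, x-ratelimit-reset-tokens) with reset values written as short durations such as "1s" or "6m0s". The helpers `parse_reset`, `read_headroom`, and the `Headroom` type are hypothetical. Other providers, including Anthropic's anthropic-ratelimit-* headers, format their values differently, so this parser would not apply there as-is.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# "ms" must come before "m" so that "20ms" is not read as 20 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset(value: str) -> float:
    """Convert a duration string such as '6m0s' or '20ms' into seconds."""
    parts = _DURATION_PART.findall(value)
    if not parts:
        raise ValueError(f"unrecognized duration: {value!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


@dataclass
class Headroom:
    remaining_requests: Optional[int]
    remaining_tokens: Optional[int]
    reset_requests_s: Optional[float]
    reset_tokens_s: Optional[float]


def read_headroom(headers: Mapping[str, str]) -> Headroom:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    def as_int(name: str) -> Optional[int]:
        v = h.get(name)
        return int(v) if v is not None else None

    def as_secs(name: str) -> Optional[float]:
        v = h.get(name)
        return parse_reset(v) if v is not None else None

    return Headroom(
        remaining_requests=as_int("x-ratelimit-remaining-requests"),
        remaining_tokens=as_int("x-ratelimit-remaining-tokens"),
        reset_requests_s=as_secs("x-ratelimit-reset-requests"),
        reset_tokens_s=as_secs("x-ratelimit-reset-tokens"),
    )
```

Missing headers come back as `None` rather than zero, so a caller can tell "no data" apart from "no headroom".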
Anthropic Claude Rate Limits Explained: RPM, ITPM & OTPM
Anthropic Claude rate limits reference: requests, input tokens, and output tokens per minute, usage tiers, the rate-limit headers, and handling 429 vs 529.
Updated June 2, 2026
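A retry policy that treats 429 (you exceeded a rate limit) and 529 (the API is overloaded) as retryable might look like this sketch. It assumes a numeric retry-after header should be honored when present and otherwise falls back to full-jitter exponential backoff. The function name `retry_delay` and its `base`/`cap` defaults are hypothetical choices, not an SDK's behavior.

```python
import random
from typing import Mapping, Optional

# 429: you crossed one of your own limits; 529: provider-side overload.
RETRYABLE_STATUSES = {429, 529}


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0,
                rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before retry number `attempt`, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None  # e.g. a 400 will fail the same way again
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        try:
            # Prefer the server's hint when it is a number of seconds.
            return max(0.0, float(h["retry-after"]))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to backoff below
    rng = rng if rng is not None else random.Random()
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many clients apart, which matters most for 529s, where every caller is hitting the same overloaded capacity at once.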
OpenAI Rate Limits Explained: RPM, TPM & Tiers
OpenAI rate limits reference: how RPM and TPM work, the usage-tier table, the response headers that show your headroom, and how to engineer around the limits.
Updated June 1, 2026