AI API rate limits, by provider

Rate limits are the invisible ceiling every AI app eventually hits. Providers cap you on requests per minute (RPM), tokens per minute (TPM), and sometimes requests or tokens per day, and each provider counts, resets, and surfaces those limits differently. A request that sails through at 2 a.m. can come back as an HTTP 429 (Too Many Requests) at peak traffic because you crossed a TPM line you didn't know existed.

These references lay out the actual numbers per provider and tier, explain which response headers tell you how much headroom is left (x-ratelimit-remaining-* and related headers), and show how the limits scale as you move up usage tiers. Where a provider gates higher limits behind spend or time-on-platform, we spell out the checklist. Treat every number as "as of" its stamped date: providers raise and reshuffle limits frequently, so confirm against your own dashboard before you size a workload.
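
As a rough illustration of reading headroom from response headers, here is a minimal Python sketch. It assumes the provider reports remaining quota as plain integers in headers whose names start with x-ratelimit-remaining-; the function name remaining_headroom and the sample values are hypothetical, and some providers use different header names or formats, so check the per-provider reference before relying on it.

```python
from typing import Dict, Mapping

# Assumed prefix; not every provider names its headers this way.
PREFIX = "x-ratelimit-remaining-"


def remaining_headroom(headers: Mapping[str, str]) -> Dict[str, int]:
    """Collect x-ratelimit-remaining-* headers into {suffix: count}."""
    headroom: Dict[str, int] = {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if not key.startswith(PREFIX):
            continue
        try:
            headroom[key[len(PREFIX):]] = int(value)
        except ValueError:
            # Skip values that are not plain integers rather than guess.
            continue
    return headroom


# Illustrative values only, not real limits from any provider.
sample = {
    "Content-Type": "application/json",
    "X-RateLimit-Remaining-Requests": "58",
    "x-ratelimit-remaining-tokens": "149000",
}
print(remaining_headroom(sample))  # {'requests': 58, 'tokens': 149000}
```

Logging these values on every response is one simple way to see how close a workload runs to its RPM and TPM ceilings before the first 429 appears.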