AI API Cost Calculator
Estimate and compare monthly spend across frontier models at your real traffic.
| # | Model | Provider | Input /1M | Output /1M | Monthly cost ↑ | vs. cheapest |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Flash (cheapest) | DeepSeek | $0.14 | $0.28 | $224.00 | baseline |
| 2 | DeepSeek-V4-Pro | DeepSeek | $0.43 | $0.87 | $696.00 | +211% |
| 3 | Gemini 2.5 Flash | Google | $0.30 | $2.50 | $1,050.00 | +369% |
| 4 | GPT-5.4 mini | OpenAI | $0.75 | $4.50 | $2,100.00 | +838% |
| 5 | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $2,500.00 | +1,016% |
| 6 | Gemini 3.5 Flash | Google | $1.50 | $9.00 | $4,200.00 | +1,775% |
| 7 | Gemini 2.5 Pro | Google | $1.25 | $10.00 | $4,250.00 | +1,797% |
| 8 | GPT-5.4 | OpenAI | $2.50 | $15.00 | $7,000.00 | +3,025% |
| 9 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $7,500.00 | +3,248% |
| 10 | Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | $12,500.00 | +5,480% |
| 11 | GPT-5.5 | OpenAI | $5.00 | $30.00 | $14,000.00 | +6,150% |
Estimating what a large language model will cost in production is deceptively hard. The sticker price (dollars per million tokens) is only one of several moving parts. Your real bill depends on how many requests you serve, how long your prompts are, how much the model writes back, and the fact that output tokens usually cost several times more than input tokens: from 2× to roughly 8× at the rates listed above. Two models with near-identical headline rates can land far apart once your input/output ratio enters the picture, and a "cheap" model that needs longer prompts or more retries can quietly cost more than a pricier one.
This calculator cuts through that. Enter your monthly request volume and average input and output tokens per request, pick the models you're weighing across OpenAI, Anthropic, Google, and DeepSeek, and it shows the estimated monthly cost for each, side by side — with the cheapest option highlighted, the input-vs-output cost split, and a "+%" delta so you can see how much you'd pay above the floor. It runs entirely in your browser; nothing is sent anywhere. Use it to sanity-check a budget, compare a potential migration, or decide whether routing cheap tasks to a smaller model is worth it. Prices are illustrative and dated — always confirm against official pricing before you commit spend.
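The arithmetic behind an estimate like this can be sketched in a few lines. This is a minimal illustration, not the calculator's actual source: the names `Rate`, `monthly_cost`, and `compare` are hypothetical, the rates are copied from the table above, and the traffic figures (1,000,000 requests at 1,000 input and 300 output tokens each) are an assumption chosen because they reproduce the table's totals for these three models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per one million tokens."""
    input_per_m: float
    output_per_m: float


def monthly_cost(requests: int, in_tokens: int, out_tokens: int, rate: Rate) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one month of traffic."""
    input_cost = requests * in_tokens / 1_000_000 * rate.input_per_m
    output_cost = requests * out_tokens / 1_000_000 * rate.output_per_m
    return input_cost, output_cost


def compare(requests: int, in_tokens: int, out_tokens: int, rates: dict[str, Rate]):
    """Return (model, total_usd, pct_above_cheapest) rows, cheapest first."""
    totals = {
        name: sum(monthly_cost(requests, in_tokens, out_tokens, rate))
        for name, rate in rates.items()
    }
    floor = min(totals.values())
    rows = [(name, total, (total / floor - 1) * 100) for name, total in totals.items()]
    return sorted(rows, key=lambda row: row[1])


# Illustrative rates from the table above; confirm against official pricing.
RATES = {
    "Claude Haiku 4.5": Rate(1.00, 5.00),
    "Gemini 2.5 Flash": Rate(0.30, 2.50),
    "DeepSeek-V4-Flash": Rate(0.14, 0.28),
}

if __name__ == "__main__":
    # Assumed traffic: 1M requests, 1,000 input and 300 output tokens each.
    for name, total, delta in compare(1_000_000, 1_000, 300, RATES):
        print(f"{name:<20} ${total:>10,.2f}  +{delta:.0f}%")
```

Keeping input and output cost separate in `monthly_cost` is what makes the cost split visible: at these assumed token counts, output is under a quarter of the tokens but the larger share of the bill for most models.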
How to use this calculator
- Enter your monthly request volume. The total number of API calls you expect per month. If you think in requests per second, multiply by ~2.6 million for a steady month.
- Set average input tokens per request. Everything you send: system prompt, context/RAG payload, few-shot examples, and the user message. Long context drives this up fast.
- Set average output tokens per request. What the model generates back. This is the expensive half, so estimate it carefully — and trim it where you can.
- Tick the models to compare. The table re-sorts instantly, cheapest first, and flags the lowest-cost option.
- Read the delta. The "+%" next to each model shows how much more it costs than the cheapest, so trade-offs (quality vs. price) are obvious at a glance.
Get your token numbers from the usage object every provider returns on a real sample of traffic, then average. Guessing is fine for a first pass; just refine once you have data.
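The two inputs that are easiest to get wrong, monthly volume and average token counts, can be derived from numbers you already have. The sketch below assumes you have logged usage records as plain dictionaries; the helper names are hypothetical, and the field names (`prompt_tokens`/`completion_tokens` or `input_tokens`/`output_tokens`) are the common spellings, so check what your provider's SDK actually returns.

```python
from statistics import mean

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000: the "~2.6 million" multiplier


def monthly_requests(requests_per_second: float) -> int:
    """Convert a steady requests-per-second rate into a 30-day request count."""
    return round(requests_per_second * SECONDS_PER_MONTH)


def _pick(usage: dict, *keys: str) -> int:
    """Return the first of `keys` present in a usage record."""
    for key in keys:
        if key in usage:
            return usage[key]
    raise KeyError(f"none of {keys} found in usage record")


def average_tokens(usages: list[dict]) -> tuple[float, float]:
    """Average (input, output) tokens over a sample of logged usage records."""
    inputs = [_pick(u, "input_tokens", "prompt_tokens") for u in usages]
    outputs = [_pick(u, "output_tokens", "completion_tokens") for u in usages]
    return mean(inputs), mean(outputs)


if __name__ == "__main__":
    # Hypothetical sample mixing both field-name styles.
    sample = [
        {"prompt_tokens": 900, "completion_tokens": 250},
        {"input_tokens": 1100, "output_tokens": 350},
    ]
    print(monthly_requests(0.5), average_tokens(sample))
```

A plain mean is enough for a budget estimate. If your traffic has a long tail of very large prompts, it may be worth sampling over a full week so those requests are represented.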
Tips to reduce AI API costs
Once the calculator shows where your money goes, these levers move the bill the most — roughly in order of impact:
1. Right-size the model
The biggest savings come from not over-paying for capability you don't need. Route easy, high-volume tasks (classification, extraction, short answers) to a mini/flash-tier model and reserve flagship models for genuinely hard work. A two-tier setup often beats picking a single model for everything — see our OpenAI vs Anthropic pricing comparison for worked examples.
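To see what a two-tier setup might save, a rough blended estimate is enough. The function below is a hypothetical helper, and it makes a simplifying assumption: easy and hard requests have the same average token counts, so each model's full-traffic monthly cost can be scaled by its share of traffic.

```python
def blended_cost(cheap_monthly: float, flagship_monthly: float, cheap_share: float) -> float:
    """Monthly cost when `cheap_share` of requests go to the cheaper model.

    Both cost arguments are what each model would cost at 100% of traffic.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_monthly + (1.0 - cheap_share) * flagship_monthly


if __name__ == "__main__":
    # Table figures for GPT-5.4 mini ($2,100) and GPT-5.4 ($7,000); 80% routed to mini.
    print(f"${blended_cost(2100.0, 7000.0, 0.8):,.2f}")
```

With the table's figures and 80% of traffic on the smaller model, the blend comes to about $3,080 a month instead of $7,000. In practice hard requests tend to be longer, so treat the result as a lower bound on the flagship share of spend.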
2. Use prompt caching
If you send the same long prefix on every call — a big system prompt, few-shot examples, a reused document — mark it cacheable. Cached input is billed at a steep discount (often 50–90% off), which is pure savings on input-heavy workloads. Output cost is unaffected.
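A quick way to gauge the effect is to split input tokens into a cached prefix and a fresh remainder. This sketch is deliberately simplified and its assumptions matter: it treats every request as a cache hit and ignores any surcharge a provider may apply when the cache is first written. The function name and parameters are illustrative.

```python
def input_cost_with_cache(
    requests: int,
    cached_prefix_tokens: int,
    fresh_tokens: int,
    input_per_m: float,
    cache_discount: float,
) -> float:
    """Monthly input cost in USD when a shared prefix is billed at a discount.

    cache_discount is a fraction, e.g. 0.9 for "90% off".
    Assumes a 100% cache hit rate and no cache-write surcharge.
    """
    cached = requests * cached_prefix_tokens / 1_000_000 * input_per_m * (1.0 - cache_discount)
    fresh = requests * fresh_tokens / 1_000_000 * input_per_m
    return cached + fresh


if __name__ == "__main__":
    # Assumed workload: 1M requests, 800-token shared prefix, 200 fresh tokens, $3.00 per 1M input.
    without = input_cost_with_cache(1_000_000, 800, 200, 3.00, 0.0)
    with_cache = input_cost_with_cache(1_000_000, 800, 200, 3.00, 0.9)
    print(f"no cache: ${without:,.2f}   cached: ${with_cache:,.2f}")
```

Under those assumptions the input bill drops from $3,000 to about $840. Real savings will be smaller when hit rates are below 100% or when cache writes carry a premium.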
3. Cut output length
Because output tokens are the priciest, shorter responses save the most. Ask for concise answers, set a realistic max_tokens, request structured/JSON output instead of prose where you can, and stop the model from "thinking out loud" when you only need the result.
4. Trim and compress input
Don't stuff the whole knowledge base into context. Retrieve only the relevant chunks, drop stale conversation history, and summarize long threads. Every token you don't send is a token you don't pay for — on every request, forever.
5. Batch non-realtime work
For jobs that don't need an instant response (bulk summarization, backfills, evals), batch endpoints typically apply around a 50% discount and use separate, higher rate limits — a double win.
6. Avoid silent retry waste
Retries on rate limits and timeouts cost real tokens. Cap and jitter your backoff so a blip doesn't turn into a billing spike — see the 429 rate-limit guide.
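One common way to cap and jitter retries is exponential backoff with "full jitter": each wait is drawn uniformly between zero and an exponentially growing ceiling, and the ceiling itself is capped. The sketch below shows that pattern; `RetryableError` and `call_with_retries` are hypothetical names, and you would map your client library's rate-limit and timeout exceptions onto them.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (429s, timeouts)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay in seconds: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retries(fn, max_retries: int = 4, sleep=time.sleep):
    """Call fn(); on RetryableError wait and retry, at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_retries:
                raise  # retry budget spent: surface the error instead of looping
            sleep(backoff_delay(attempt))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        # Simulated call that fails twice before succeeding.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RetryableError("simulated 429")
        return "ok"

    print(call_with_retries(flaky), "after", attempts["n"], "attempts")
```

The hard limit on `max_retries` is what protects the bill: a request that keeps failing is paid for at most five times here, and the jitter keeps many clients from retrying in lockstep after the same blip.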
AI API cost calculator — FAQ
How does the AI API cost calculator work?
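Are the prices accurate and up to date?
Prices are illustrative and dated, so confirm them against official pricing before you commit spend. The rates are kept in a pricing.json file so they're easy to refresh.
Why is output so much more expensive than input?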
What are input and output tokens?
How do I estimate my average token counts?
Read the usage object the APIs return (prompt/input tokens and completion/output tokens) over a representative sample of real traffic, then average. If you can't measure yet, estimate: count the words in a typical prompt and expected answer, then divide by ~0.75 to approximate tokens.
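The word-count fallback is simple enough to script. This is a rough sketch that assumes ordinary English prose and whitespace-separated words; code, non-English text, and unusual formatting can tokenize quite differently, so replace the estimate with measured usage as soon as you can. The function name is illustrative.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: one token is roughly 0.75 words


def estimate_tokens(text: str) -> int:
    """Approximate token count from a whitespace word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two short sentences."
    print(estimate_tokens(prompt))
```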