AI API Cost Calculator


Estimate and compare monthly spend across frontier models at your real traffic.

Rates as of June 2026

Monthly requests: 1M requests / month

Avg input tokens / request: 1,000 (≈ 750 words in)

Avg output tokens / request: 300 (≈ 225 words out)


Models to compare

OpenAI (3 models): GPT-5.5, GPT-5.4, GPT-5.4 mini

Anthropic (3 models): Claude Opus 4.8, Claude Sonnet 4.6, Claude Haiku 4.5

Google (3 models): Gemini 3.5 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash

DeepSeek (2 models): DeepSeek-V4-Flash, DeepSeek-V4-Pro

Cheapest option: DeepSeek-V4-Flash (DeepSeek), $224.00/mo

Potential saving: $13,776/mo

Switching from the priciest pick, GPT-5.5 ($14,000/mo), to DeepSeek-V4-Flash cuts spend by 98%.

#	Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Monthly cost	vs. cheapest
1	DeepSeek-V4-Flash (cheapest)	DeepSeek	$0.14	$0.28	$224.00	baseline
2	DeepSeek-V4-Pro	DeepSeek	$0.43	$0.87	$696.00	+211%
3	Gemini 2.5 Flash	Google	$0.30	$2.50	$1,050.00	+369%
4	GPT-5.4 mini	OpenAI	$0.75	$4.50	$2,100.00	+838%
5	Claude Haiku 4.5	Anthropic	$1.00	$5.00	$2,500.00	+1,016%
6	Gemini 3.5 Flash	Google	$1.50	$9.00	$4,200.00	+1,775%
7	Gemini 2.5 Pro	Google	$1.25	$10.00	$4,250.00	+1,797%
8	GPT-5.4	OpenAI	$2.50	$15.00	$7,000.00	+3,025%
9	Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$7,500.00	+3,248%
10	Claude Opus 4.8	Anthropic	$5.00	$25.00	$12,500.00	+5,480%
11	GPT-5.5	OpenAI	$5.00	$30.00	$14,000.00	+6,150%


monthly = (requests × input_tokens × input_rate ÷ 1M) + (requests × output_tokens × output_rate ÷ 1M)
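The formula above is easy to reproduce in a few lines. The Python sketch below is a minimal illustration, not the calculator's actual source: `RATES`, `monthly_cost`, and `compare` are hypothetical names, rates are assumed to be USD per 1M tokens, and only four models from the table are included. With 1M requests, 1,000 input tokens, and 300 output tokens per request it reproduces the table's figures, such as $224.00 for DeepSeek-V4-Flash and $14,000.00 for GPT-5.5.

```python
# Hypothetical rate table: (input, output) in USD per 1M tokens,
# copied from the June 2026 table above. Refresh before relying on it.
RATES: dict[str, tuple[float, float]] = {
    "DeepSeek-V4-Flash": (0.14, 0.28),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.5": (5.00, 30.00),
}


def monthly_cost(requests: float, input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly USD cost: input and output tokens are billed separately per 1M tokens."""
    input_cost = requests * input_tokens * input_rate / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    return input_cost + output_cost


def compare(requests: float, input_tokens: float,
            output_tokens: float) -> list[tuple[str, float, float]]:
    """Return (model, monthly cost, % above cheapest), cheapest first."""
    rows = sorted(
        ((name, monthly_cost(requests, input_tokens, output_tokens, i_rate, o_rate))
         for name, (i_rate, o_rate) in RATES.items()),
        key=lambda row: row[1],
    )
    floor = rows[0][1]  # the cheapest model is the baseline for the "+%" delta
    return [(name, cost, (cost / floor - 1) * 100) for name, cost in rows]


if __name__ == "__main__":
    for name, cost, delta in compare(1_000_000, 1_000, 300):
        print(f"{name:<18} ${cost:>10,.2f}   +{delta:,.0f}%")
```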

Standard input/output list rates, as of June 2026 (last verified 2026-06-06). Prices change often and exclude batch, cached-input, and volume discounts, so confirm on each provider's pricing page before budgeting. For deeper breakdowns, see our pricing comparisons.

Estimating what a large language model will cost in production is deceptively hard. The sticker price (dollars per million tokens) is only one of several moving parts. Your real bill depends on how many requests you serve, how long your prompts are, how much the model writes back, and the fact that output tokens cost several times more than input tokens (from 2× to roughly 8× across the models in the table above). Two models with near-identical headline rates can land far apart once your input/output ratio enters the picture, and a "cheap" model that needs longer prompts or more retries can quietly cost more than a pricier one.

This calculator cuts through that. Enter your monthly request volume and average input and output tokens per request, pick the models you're weighing across OpenAI, Anthropic, Google, and DeepSeek, and it shows the estimated monthly cost for each, side by side — with the cheapest option highlighted, the input-vs-output cost split, and a "+%" delta so you can see how much you'd pay above the floor. It runs entirely in your browser; nothing is sent anywhere. Use it to sanity-check a budget, compare a potential migration, or decide whether routing cheap tasks to a smaller model is worth it. Prices are illustrative and dated — always confirm against official pricing before you commit spend.
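To see how the input/output ratio can reorder two models whose headline rates are close, here is a small Python sketch using the Gemini 3.5 Flash and Gemini 2.5 Pro rates from the table. The three workloads are made-up examples, and `cost_per_million_requests` is a hypothetical helper; with exactly 1M requests the ÷1M in the formula cancels, so cost is simply tokens × rate.

```python
# (input, output) rates in USD per 1M tokens, from the table above.
GEMINI_35_FLASH = (1.50, 9.00)
GEMINI_25_PRO = (1.25, 10.00)


def cost_per_million_requests(rates: tuple[float, float],
                              input_tokens: int, output_tokens: int) -> float:
    """USD for 1M requests: the 1M requests and the per-1M-token rate cancel out."""
    input_rate, output_rate = rates
    return input_tokens * input_rate + output_tokens * output_rate


# Hypothetical workloads: (avg input tokens, avg output tokens) per request.
workloads = {
    "input-heavy (10,000 in / 100 out)": (10_000, 100),
    "default mix (1,000 in / 300 out)": (1_000, 300),
    "output-heavy (1,000 in / 1,000 out)": (1_000, 1_000),
}

for label, (tokens_in, tokens_out) in workloads.items():
    flash = cost_per_million_requests(GEMINI_35_FLASH, tokens_in, tokens_out)
    pro = cost_per_million_requests(GEMINI_25_PRO, tokens_in, tokens_out)
    cheaper = "Gemini 3.5 Flash" if flash < pro else "Gemini 2.5 Pro"
    print(f"{label}: Flash ${flash:,.0f} vs Pro ${pro:,.0f} -> {cheaper} is cheaper")
```

At the default mix Gemini 3.5 Flash is slightly cheaper ($4,200 vs. $4,250), and it stays ahead on output-heavy work ($10,500 vs. $11,250). At 10,000 in / 100 out, Gemini 2.5 Pro wins ($13,500 vs. $15,900) because its input rate is lower.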

How to use this calculator

1. Enter your monthly request volume. The total number of API calls you expect per month. If you think in requests per second, multiply by about 2.6 million (the number of seconds in a 30-day month) to get a steady month's volume.
2. Set average input tokens per request. Everything you send: system prompt, context/RAG payload, few-shot examples, and the user message. Long context drives this up fast.
3. Set average output tokens per request. What the model generates back. This is the expensive half, so estimate it carefully and trim it where you can.
4. Tick the models to compare. The table re-sorts instantly, cheapest first, and flags the lowest-cost option.
5. Read the delta. The "+%" next to each model shows how much more it costs than the cheapest, so quality-vs-price trade-offs are obvious at a glance.
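The requests-per-second conversion in the first step is just the number of seconds in the month. A minimal Python sketch, assuming a 30-day month and steady traffic (real traffic has peaks, which matter for rate limits but not for the monthly total); both function names are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def monthly_requests_from_rps(avg_rps: float, days: int = 30) -> int:
    """Convert a steady average requests-per-second rate into requests per month."""
    return round(avg_rps * days * SECONDS_PER_DAY)


def avg_rps_from_monthly(monthly_requests: int, days: int = 30) -> float:
    """Inverse: the average request rate implied by a monthly volume."""
    return monthly_requests / (days * SECONDS_PER_DAY)


print(monthly_requests_from_rps(1))                  # 2592000, i.e. about 2.6 million
print(f"{avg_rps_from_monthly(1_000_000):.2f} rps")  # 0.39 rps
```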

Take your token numbers from the usage object that every provider returns: log it over a real sample of traffic, then average. Guessing is fine for a first pass; refine once you have data.
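A minimal Python sketch of that averaging step. It assumes you have already logged usage records and normalized their field names, since providers differ (for example, OpenAI's Chat Completions API reports `prompt_tokens` and `completion_tokens`, while Anthropic's Messages API reports `input_tokens` and `output_tokens`). The sample records and `average_tokens` are hypothetical.

```python
from statistics import mean

# Hypothetical usage records, already normalized to input_tokens/output_tokens.
sample = [
    {"input_tokens": 950, "output_tokens": 280},
    {"input_tokens": 1_120, "output_tokens": 310},
    {"input_tokens": 930, "output_tokens": 310},
]


def average_tokens(records: list[dict[str, int]]) -> tuple[float, float]:
    """Return (average input tokens, average output tokens) per request."""
    if not records:
        raise ValueError("need at least one usage record")
    avg_in = mean(r["input_tokens"] for r in records)
    avg_out = mean(r["output_tokens"] for r in records)
    return avg_in, avg_out


avg_in, avg_out = average_tokens(sample)
print(f"avg input ≈ {avg_in:.0f} tokens, avg output ≈ {avg_out:.0f} tokens")
```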

Tips to reduce AI API costs

Once the calculator shows where your money goes, these levers move the bill the most — roughly in order of impact:

1. Right-size the model

The biggest savings come from not overpaying for capability you don't need. Route easy, high-volume tasks (classification, extraction, short answers) to a mini/flash-tier model and reserve flagship models for genuinely hard work. A two-tier setup often beats picking a single model for everything; see our OpenAI vs Anthropic pricing comparison for worked examples.
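A rough way to size a two-tier setup is to split the monthly volume between a small and a large model and price each share with the same formula. This Python sketch assumes both tiers see the same average token counts, which real routing rarely guarantees; the 80/20 split and the choice of GPT-5.4 mini and GPT-5.5 (rates from the table) are illustrative only.

```python
def monthly_cost(requests: int, input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly USD cost with rates in USD per 1M tokens."""
    return requests * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def two_tier_cost(requests: int, input_tokens: float, output_tokens: float,
                  small_rates: tuple[float, float], large_rates: tuple[float, float],
                  share_to_small: float) -> float:
    """Blended cost when share_to_small of traffic goes to the small model."""
    if not 0.0 <= share_to_small <= 1.0:
        raise ValueError("share_to_small must be between 0 and 1")
    small_requests = round(requests * share_to_small)
    large_requests = requests - small_requests
    return (monthly_cost(small_requests, input_tokens, output_tokens, *small_rates)
            + monthly_cost(large_requests, input_tokens, output_tokens, *large_rates))


GPT_54_MINI = (0.75, 4.50)  # USD per 1M input / output tokens
GPT_55 = (5.00, 30.00)

blended = two_tier_cost(1_000_000, 1_000, 300, GPT_54_MINI, GPT_55, share_to_small=0.8)
print(f"${blended:,.2f}/mo")  # $4,480.00/mo, vs. $14,000.00 for GPT-5.5 alone
```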

2. Use prompt caching

If you send the same long prefix on every call — a big system prompt, few-shot examples, a reused document — mark it cacheable. Cached input is billed at a steep discount (often 50–90% off), which is pure savings on input-heavy workloads. Output cost is unaffected.
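To estimate what caching is worth, split each prompt into a cached prefix and an uncached remainder. The Python sketch below is a simplification under stated assumptions: every request hits the cache, the 90% discount is a placeholder you should replace with your provider's actual cached-input rate, and cache-write fees and minimum-prefix rules, which vary by provider, are ignored. The Claude Sonnet 4.6 input rate comes from the table; the 800-token prefix is hypothetical.

```python
def input_cost_with_cache(requests: int, input_tokens: int, cached_prefix_tokens: int,
                          input_rate: float, cache_discount: float) -> float:
    """Monthly input cost (USD) when part of every prompt is served from cache.

    Assumes a cache hit on every request; ignores cache-write fees and
    minimum cacheable lengths, which differ between providers.
    """
    if not 0 <= cached_prefix_tokens <= input_tokens:
        raise ValueError("cached prefix must fit inside the prompt")
    uncached = input_tokens - cached_prefix_tokens
    cached_rate = input_rate * (1 - cache_discount)
    per_request = uncached * input_rate + cached_prefix_tokens * cached_rate
    return requests * per_request / 1_000_000


# Claude Sonnet 4.6 input rate from the table: $3.00 per 1M tokens.
without_cache = input_cost_with_cache(1_000_000, 1_000, 0, 3.00, 0.90)
with_cache = input_cost_with_cache(1_000_000, 1_000, 800, 3.00, 0.90)
print(f"input: ${without_cache:,.0f} -> ${with_cache:,.0f} per month")
```

Output cost is unchanged, so under these assumptions Sonnet's total at the default mix would drop from $7,500 to about $5,340 per month.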

3. Cut output length

Because output tokens are the priciest, shorter responses save the most. Ask for concise answers, set a realistic max_tokens, request structured/JSON output instead of prose where you can, and stop the model from "thinking out loud" when you only need the result.
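A quick worked comparison shows why output is the better place to trim. This Python sketch uses the GPT-5.5 rates from the table and a hypothetical 150-token cut at 1M requests per month; `monthly_saving` is an illustrative helper.

```python
def monthly_saving(requests: int, tokens_saved_per_request: int,
                   rate_per_million: float) -> float:
    """USD saved per month by sending or generating fewer tokens per request."""
    return requests * tokens_saved_per_request * rate_per_million / 1_000_000


# GPT-5.5 list rates from the table: $5.00 input, $30.00 output per 1M tokens.
output_saving = monthly_saving(1_000_000, 150, 30.00)
input_saving = monthly_saving(1_000_000, 150, 5.00)
print(f"150 fewer output tokens: ${output_saving:,.0f}/mo")  # $4,500/mo
print(f"150 fewer input tokens:  ${input_saving:,.0f}/mo")   # $750/mo
```

The same 150 tokens are worth six times more on the output side, matching GPT-5.5's 6:1 output-to-input price ratio.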

4. Trim and compress input

Don't stuff the whole knowledge base into context. Retrieve only the relevant chunks, drop stale conversation history, and summarize long threads. Every token you don't send is a token you don't pay for — on every request, forever.
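One way to drop stale history is to keep the system prompt plus only the most recent turns that fit a token budget. A minimal Python sketch, assuming plain-string messages (real chat APIs use role/content objects) and the rough four-characters-per-token heuristic instead of a real tokenizer; `estimate_tokens` and `trim_history` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    """Rough English heuristic: about 4 characters per token (varies by tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(system_prompt: str, turns: list[str], budget_tokens: int) -> list[str]:
    """Keep the system prompt plus the newest turns that fit within budget_tokens."""
    used = estimate_tokens(system_prompt)
    kept: list[str] = []
    for turn in reversed(turns):        # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break                       # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    return [system_prompt] + kept[::-1]  # restore chronological order


history = ["a" * 400, "b" * 400, "c" * 400]  # three turns of ~100 tokens each
print(len(trim_history("x" * 40, history, budget_tokens=250)))  # 3: system + 2 newest
```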

5. Batch non-realtime work

For jobs that don't need an instant response (bulk summarization, backfills, evals), batch endpoints typically apply around a 50% discount and use separate, higher rate limits — a double win.
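If only part of your traffic can wait, the saving scales with that share. A Python sketch, assuming a flat 50% batch discount (check your provider's actual batch pricing) and the table's $7,500/mo Claude Sonnet 4.6 figure; the 40% batchable share is hypothetical.

```python
def cost_with_batch(realtime_monthly_cost: float, batchable_share: float,
                    batch_discount: float = 0.5) -> float:
    """Monthly cost when batchable_share of requests moves to a discounted batch endpoint."""
    if not 0.0 <= batchable_share <= 1.0:
        raise ValueError("batchable_share must be between 0 and 1")
    return realtime_monthly_cost * (1 - batchable_share * batch_discount)


print(f"${cost_with_batch(7_500, 0.40):,.2f}/mo")  # $6,000.00/mo
```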

6. Avoid silent retry waste

Retries on rate limits and timeouts cost real tokens. Cap and jitter your backoff so a blip doesn't turn into a billing spike — see the 429 rate-limit guide.
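A minimal Python sketch of capped exponential backoff with full jitter. `RetryableError` is a hypothetical stand-in for whatever 429 or timeout exception your client library raises, and the default limits are placeholders; the point is that `max_attempts` bounds how many calls a single failing request can make.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 / timeout error from your API client."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # retry budget spent: surface the error instead of looping
            cap = min(max_delay, base_delay * 2 ** attempt)  # 0.5, 1, 2, 4 ... capped
            sleep(random.uniform(0, cap))  # full jitter spreads clients' retries apart
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable; in production the default `time.sleep` applies. If the provider sends a `Retry-After` header, honoring it is usually better than a guessed delay.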

AI API cost calculator — FAQ

How does the AI API cost calculator work?

It multiplies your monthly request volume by your average input and output tokens, then applies each model's per-million-token input and output rates. Monthly cost = (requests × input tokens ÷ 1M × input rate) + (requests × output tokens ÷ 1M × output rate). It compares every selected model side by side and highlights the cheapest.

Are the prices accurate and up to date?

The prices are illustrative and stamped with an 'as of' date. AI providers change pricing frequently, so always confirm against each provider's official pricing page before budgeting. The numbers live in a single pricing.json file so they're easy to refresh.

Why is output so much more expensive than input?

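Generating tokens is more compute-intensive than reading them: the prompt is processed in one parallel pass, while output is produced one token at a time. Providers typically price output several times higher than input, from 2× to roughly 8× across the models in the table above (5–6× is the most common). That's why output-heavy workloads (agents, long drafting) cost far more than input-heavy ones (classification, short answers) at the same request volume.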

What are input and output tokens?

Input tokens are everything you send — system prompt, context, user message, documents. Output tokens are what the model generates back. As a rough rule, one token is about four characters or three-quarters of a word in English, but it varies by tokenizer.

How do I estimate my average token counts?

Log the usage object the APIs return (prompt/input tokens and completion/output tokens) over a representative sample of real traffic, then average. If you can't measure yet, estimate: count the words in a typical prompt and expected answer, then divide by ~0.75 to approximate tokens.
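The word-based rule, and the four-characters-per-token rule from the previous answer, translate directly into code. This Python sketch is only a first approximation for English text; `estimate_tokens` and `tokens_from_word_count` are hypothetical helpers, and a real tokenizer (such as OpenAI's `tiktoken` library for OpenAI models) gives exact counts for a specific model.

```python
def estimate_tokens(text: str) -> dict[str, int]:
    """Two rough English heuristics: words / 0.75 and characters / 4."""
    words = len(text.split())
    return {
        "by_words": round(words / 0.75),
        "by_chars": round(len(text) / 4),
    }


def tokens_from_word_count(word_count: int) -> int:
    """Convert an estimated word count (e.g. a typical prompt) into tokens."""
    return round(word_count / 0.75)


print(tokens_from_word_count(750), tokens_from_word_count(225))  # 1000 300
```

Those two values line up with the calculator's "≈ 750 words in" and "≈ 225 words out" hints.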

Does the calculator account for prompt caching or batch discounts?

Not yet — it uses standard input/output rates so comparisons stay apples-to-apples. Prompt caching can cut repeated-prefix input costs by 50–90%, and batch endpoints often halve cost for non-realtime work, so your real bill may be lower. See our pricing guides for how those discounts change the math.

Which AI model is cheapest?

It depends on your input/output ratio. Mini/flash-tier models (GPT-5.4 mini, Gemini 2.5 Flash) and DeepSeek-V4-Flash are usually cheapest per token, but a flagship model can win on total cost if it needs fewer retries or shorter prompts. Run your own numbers in the calculator rather than assuming.

Is this calculator free?

Yes, completely free and runs entirely in your browser — no sign-up, and your numbers never leave your device.

Can one endpoint route to the cheapest model automatically?

Yes. A router like OpenRouter exposes an OpenAI-compatible endpoint across many providers and can route each request to the cheapest option that meets your constraints, without you maintaining multiple SDKs. You can also build equivalent fallback logic yourself.
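If you build the fallback yourself, one simple policy is: filter to the models you allow, sort by estimated cost, and try them cheapest first. The Python sketch below illustrates that policy only; it is not OpenRouter's routing algorithm, the three candidates reuse rates from the table, and `call` is a hypothetical function you supply that sends the request to the named model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    provider: str
    input_rate: float   # USD per 1M input tokens
    output_rate: float  # USD per 1M output tokens


CANDIDATES = [
    Candidate("DeepSeek-V4-Flash", "DeepSeek", 0.14, 0.28),
    Candidate("GPT-5.4 mini", "OpenAI", 0.75, 4.50),
    Candidate("Claude Haiku 4.5", "Anthropic", 1.00, 5.00),
]


def request_cost(c: Candidate, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request on candidate c."""
    return (input_tokens * c.input_rate + output_tokens * c.output_rate) / 1_000_000


def route(candidates: Iterable[Candidate], input_tokens: int, output_tokens: int,
          call: Callable[[str], str],
          allowed_providers: Optional[set[str]] = None) -> tuple[str, str]:
    """Try allowed candidates cheapest-first; return (model name, response)."""
    eligible = [c for c in candidates
                if allowed_providers is None or c.provider in allowed_providers]
    eligible.sort(key=lambda c: request_cost(c, input_tokens, output_tokens))
    errors: list[str] = []
    for c in eligible:
        try:
            return c.name, call(c.name)
        except Exception as exc:  # narrow this to your client's error types
            errors.append(f"{c.name}: {exc}")
    raise RuntimeError("no candidate succeeded: " + "; ".join(errors))
```

With a `call` that fails for DeepSeek-V4-Flash, `route(CANDIDATES, 1_000, 300, call)` falls back to GPT-5.4 mini, the next cheapest at that token mix.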