OpenAI vs Anthropic Pricing: A Developer's Cost Comparison
The headline “dollars per million tokens” rarely predicts your actual bill. Your input/output ratio, prompt caching, and batch usage move the number more than the sticker rate does — two providers with near-identical headline prices can produce a 3× difference on your invoice. This page compares OpenAI and Anthropic on the things that actually show up on the bill — and benchmarks Google Gemini and DeepSeek alongside them — with worked math you can copy.
AI pricing changes frequently and quietly. Figures below are standard list rates as of June 2026 (the same data behind our cost calculator) — confirm on each provider’s official pricing page before committing a budget.
The cost model: input, output, cache, batch
Four numbers determine your spend per request:
- Input price — per million input tokens (prompt, context, system message).
- Output price — per million generated tokens, usually 3–5× the input price.
- Cached input price — a steep discount for repeated prefixes you mark cacheable.
- Batch discount — typically ~50% off for non-realtime work via batch endpoints.
The single biggest mistake is comparing on input price alone. Output is the expensive half, so a chatbot that reads long context but answers briefly (input-bound) behaves completely differently from an agent that writes long completions (output-bound).
Per-model rates — all four providers
Below are the current standard list rates across OpenAI, Anthropic, Google, and DeepSeek — the same data that powers our cost calculator. Don’t rank on the headline input price; rank on your input/output mix.
| Model | Input / 1M | Output / 1M | Cached input / 1M |
|---|---|---|---|
| OpenAI GPT-5.5 | $5.00 | $30.00 | $0.50 |
| OpenAI GPT-5.4 | $2.50 | $15.00 | $0.25 |
| OpenAI GPT-5.4 mini | $0.75 | $4.50 | $0.075 |
| Anthropic Claude Opus 4.8 | $5.00 | $25.00 | $0.50 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| Google Gemini 3.5 Flash | $1.50 | $9.00 | — |
| Google Gemini 2.5 Pro | $1.25 | $10.00 | — |
| Google Gemini 2.5 Flash | $0.30 | $2.50 | — |
| DeepSeek DeepSeek-V4-Flash | $0.14 | $0.28 | $0.0028 |
| DeepSeek DeepSeek-V4-Pro | $0.435 | $0.87 | $0.0036 |
Two things to notice: output costs 3–5× input at every provider, and cached-input rates vary wildly — Anthropic and DeepSeek discount cached reads by ~90%, which can flip the cheapest pick for prompt-heavy, cache-friendly workloads. The mini/flash tiers (GPT‑5.4 mini, Gemini Flash) and DeepSeek run an order of magnitude cheaper than the flagships.
Worked example: an output-light vs output-heavy app
Take 1,000,000 requests/month. Compare two shapes:
- Support bot (input-heavy): 2,000 input tokens, 200 output tokens each.
- Drafting agent (output-heavy): 1,000 input tokens, 1,500 output tokens each.
Monthly token volume:
| Workload | Input tokens | Output tokens |
|---|---|---|
| Support bot | 2.0B | 0.2B |
| Drafting agent | 1.0B | 1.5B |
Now cost both workloads across every model (no caching), using tokens ÷ 1,000,000 × rate, sorted cheapest-first by the support-bot bill:
| Model | Support bot /mo | Drafting agent /mo |
|---|---|---|
| DeepSeek DeepSeek-V4-Flash | $336 | $560 |
| DeepSeek DeepSeek-V4-Pro | $1,044 | $1,740 |
| Google Gemini 2.5 Flash | $1,100 | $4,050 |
| OpenAI GPT-5.4 mini | $2,400 | $7,500 |
| Anthropic Claude Haiku 4.5 | $3,000 | $8,500 |
| Google Gemini 2.5 Pro | $4,500 | $16,250 |
| Google Gemini 3.5 Flash | $4,800 | $15,000 |
| OpenAI GPT-5.4 | $8,000 | $25,000 |
| Anthropic Claude Sonnet 4.6 | $9,000 | $25,500 |
| Anthropic Claude Opus 4.8 | $15,000 | $42,500 |
| OpenAI GPT-5.5 | $16,000 | $50,000 |
Two patterns jump out. First, the mini/flash tiers and DeepSeek run 10–50× cheaper than the flagships at the same traffic. Second, rankings flip between workloads — a model that’s cheap for the read-heavy support bot can fall behind for the output-heavy agent, because output dominates that bill. So don’t crown one “cheapest model”; pick per workload. Want this math at your own traffic? The cost calculator does it live across all 11 models.
How caching and batching change the answer
Prompt caching can cut repeated-prefix input costs by 50–90%. Batch endpoints typically halve total cost for non-realtime work. A provider that looks pricier at sticker price can win once these are applied — always model your own usage with them on.
The support bot above reuses a large system prompt on every call. If 1,800 of its 2,000 input tokens are a cacheable prefix, Anthropic’s cheap cached-input rate ($0.30/1M vs $3.00 base) erases much of its input cost — narrowing or reversing the gap with OpenAI. Caching helps input-bound apps the most; it does nothing for output cost.
For offline jobs (bulk classification, embeddings backfills, nightly summarization), routing through each provider’s batch API roughly halves the bill and uses separate, higher rate limits — a double win that’s independent of which provider is cheaper at sticker price.
A quick cost estimator
def monthly_cost(requests, in_tok, out_tok, in_rate, out_rate,
cached_in_tok=0, cached_rate=0.0, batch_discount=0.0):
billable_in = (in_tok - cached_in_tok) * in_rate + cached_in_tok * cached_rate
per_req = (billable_in + out_tok * out_rate) / 1_000_000
total = per_req * requests
return total * (1 - batch_discount)
# Support bot on Anthropic Sonnet, 1.8k of 2k input cached, realtime (no batch):
print(monthly_cost(
requests=1_000_000, in_tok=2000, out_tok=200,
in_rate=3.00, out_rate=15.00,
cached_in_tok=1800, cached_rate=0.30,
)) # -> dramatically lower than the no-cache $9,000 above When to switch, when to split
- Switch if one provider is meaningfully cheaper for your specific input/output ratio and quality is acceptable for the task. Confirm with the estimator on real token counts, not headline rates.
- Split by routing cheap, tolerant tasks to a mini/Haiku model and reserving flagship/Sonnet for hard ones. This usually saves more than picking a single provider, and a thin abstraction makes it easy — see the wrapper in the OpenAI → Anthropic migration guide.
- Factor in non-token costs: rate-limit headroom (a cheaper provider you constantly 429 against isn’t cheaper), latency, and quality on your eval set.
What to do next
- Plug your real token counts into the estimator above — separate input from output.
- Turn on caching and batching in the model before deciding; they often change the winner.
- Validate quality on your own eval set, not vendor benchmarks.
- Ready to move workloads? Follow the OpenAI → Anthropic migration guide, and size throughput with the OpenAI and Anthropic Claude rate-limit references.