Migrating from OpenAI to Anthropic Claude: A Practical Guide

Moving from OpenAI’s Chat Completions to Anthropic’s Messages API is mostly a mechanical remap — but a handful of differences (system prompts, a required max_tokens, tool schemas, streaming events) will bite if you miss them. This guide maps the concepts field by field, gives you before/after code for each surface, and ends with a rollout checklist so you can cut over without crossing your fingers.

API concept mapping

OpenAI Chat Completions → Anthropic Messages (as of June 2026).
| Concept | OpenAI | Anthropic |
|---|---|---|
| Auth header | Authorization: Bearer <key> | x-api-key: <key> + anthropic-version |
| Endpoint | POST /v1/chat/completions | POST /v1/messages |
| System prompt | messages[] with role: system | top-level system parameter |
| Max output tokens | max_tokens (optional) | max_tokens (REQUIRED) |
| Assistant text | choices[0].message.content | content[0].text |
| Stop reason | choices[0].finish_reason | stop_reason |
| Token usage | usage.prompt_tokens / usage.completion_tokens | usage.input_tokens / usage.output_tokens |
| Tool calling | tools + message.tool_calls | tools + tool_use content blocks |
| Tool result | role: tool message | user message with tool_result block |
| Streaming | SSE deltas (choices[].delta) | typed events (content_block_delta) |

The three that break builds
  • max_tokens is required on Anthropic — omit it and the request errors with a 400.
  • The system prompt is a top-level field, not a message with role: "system".
  • The response shape differs: read content[0].text, not choices[0].message.content.
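
At the raw HTTP level, the first rows of the mapping come down to a different URL, different headers, and a slightly different body. The sketch below only builds those pieces and sends nothing. The function name build_request is hypothetical, and the anthropic-version value "2023-06-01" is an assumption about the version string to pin; check it against the current Anthropic docs before relying on it.

```python
import json

def build_request(provider, api_key, system, user_text, max_tokens=1024):
    """Return (url, headers, body) for a single-turn chat request. Sends nothing."""
    if provider == "openai":
        url = "https://api.openai.com/v1/chat/completions"
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
        body = {
            "model": "gpt-4o",
            # The system prompt travels as the first message.
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user_text},
            ],
        }
    elif provider == "anthropic":
        url = "https://api.anthropic.com/v1/messages"
        headers = {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",  # assumed value; verify in the docs
            "content-type": "application/json",
        }
        body = {
            "model": "claude-sonnet-4-6",
            "max_tokens": max_tokens,  # required
            "system": system,          # top-level field, not a message
            "messages": [{"role": "user", "content": user_text}],
        }
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return url, headers, body

url, headers, body = build_request("anthropic", "sk-example", "You are concise.", "Hi")
print(url)
print(json.dumps(body, indent=2))
```

Diffing the two bodies side by side is a quick way to spot a system prompt that was left inside messages.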

Basic chat: before / after

Before — OpenAI Chat Completions
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain rate limits in one line."},
    ],
)
print(resp.choices[0].message.content)
After — Anthropic Messages
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,                      # required
    system="You are concise.",            # top-level, not a message
    messages=[
        {"role": "user", "content": "Explain rate limits in one line."},
    ],
)
print(resp.content[0].text)               # different response shape
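
If your codebase already builds OpenAI-style message lists, a small adapter can do the system-prompt move for you. The following is a minimal sketch, not an official helper: to_anthropic_kwargs and join_text are hypothetical names, and the adapter only handles plain-string user and assistant turns. join_text also covers the case where content[0] is not a text block, which happens once tools are involved.

```python
from types import SimpleNamespace

def to_anthropic_kwargs(openai_messages, max_tokens=1024):
    """Split OpenAI-style messages into Anthropic's `system` and `messages` kwargs."""
    system_parts = []
    chat = []
    for msg in openai_messages:
        role = msg["role"]
        if role == "system":
            system_parts.append(msg["content"])
        elif role in ("user", "assistant"):
            chat.append({"role": role, "content": msg["content"]})
        else:
            # Tool messages need the tool_result mapping; out of scope for this sketch.
            raise ValueError(f"unsupported role: {role!r}")
    kwargs = {"max_tokens": max_tokens, "messages": chat}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs

def join_text(content_blocks):
    """Concatenate every text block, skipping tool_use and other block types."""
    return "".join(b.text for b in content_blocks if b.type == "text")

kwargs = to_anthropic_kwargs([
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Explain rate limits in one line."},
])
print(kwargs["system"])
print(kwargs["messages"])

# Stand-in objects shaped like SDK content blocks, for illustration only.
blocks = [SimpleNamespace(type="text", text="Hello"),
          SimpleNamespace(type="tool_use", id="toolu_example")]
print(join_text(blocks))
```

With a real client, the result would be passed as client.messages.create(model="claude-sonnet-4-6", **kwargs).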

Streaming: before / after

OpenAI streams SSE chunks with choices[].delta; Anthropic streams typed events. The high-level SDK helpers smooth this over:

Before — OpenAI streaming
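from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    stream=True,
    messages=[{"role": "user", "content": "Count to 5."}],
)
for chunk in stream:
    if not chunk.choices:               # some chunks carry no choices (e.g. usage-only)
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)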
After — Anthropic streaming (text helper)
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Count to 5."}],
) as stream:
    for text in stream.text_stream:     # yields incremental text
        print(text, end="", flush=True)
    final = stream.get_final_message()  # full message when done
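
If you consume the typed events yourself instead of using the text helper, the core job is folding content_block_delta events into text and picking the stop reason off message_delta. Here is a minimal sketch over hand-written event dicts modeled on the documented event shapes. collect_stream is a hypothetical name, and the sample events are illustrative, not captured output.

```python
def collect_stream(events):
    """Fold Anthropic-style stream events into (text, stop_reason)."""
    parts = []
    stop_reason = None
    for event in events:
        etype = event["type"]
        if etype == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                parts.append(delta["text"])
            # Other delta types (e.g. tool input JSON) are ignored in this sketch.
        elif etype == "message_delta":
            stop_reason = event["delta"].get("stop_reason", stop_reason)
        # message_start, content_block_start/stop, message_stop, ping carry no text.
    return "".join(parts), stop_reason

sample = [
    {"type": "message_start", "message": {"role": "assistant", "content": []}},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "1, 2, "}},
    {"type": "ping"},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "3"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 7}},
    {"type": "message_stop"},
]
print(collect_stream(sample))  # → ('1, 2, 3', 'end_turn')
```

Unlike the OpenAI loop, there is no choices array to index into: the event type tells you what the payload is.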

Tool / function calling: before / after

Both support tools, but the request schema and the response/return shapes differ. OpenAI returns tool_calls; Anthropic returns tool_use content blocks and expects results back as a tool_result block in a user message.

Before — OpenAI tools
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
r = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)
# tool_calls is None when the model answers directly; arguments is a JSON string.
call = r.choices[0].message.tool_calls[0]   # .function.name / .function.arguments
After — Anthropic tools
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get weather for a city",
    "input_schema": {                     # note: input_schema, not parameters
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
r = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)

# Claude returns a tool_use block; return the result as a user tool_result block.
# next() raises StopIteration if Claude answered without calling the tool.
tool_use = next(b for b in r.content if b.type == "tool_use")
followup = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "content": r.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": "18°C, clear",
        }]},
    ],
)
print(followup.content[0].text)
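
With more than a couple of tools, hand-porting schemas gets error-prone. The schema difference shown above is regular enough to automate, so here is a minimal sketch of a converter plus a builder for the tool_result turn. Both function names are hypothetical, and the converter only handles OpenAI tools of type "function".

```python
def openai_tool_to_anthropic(tool):
    """Convert one OpenAI function-tool definition to the Anthropic tool shape."""
    if tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    fn = tool["function"]
    converted = {
        "name": fn["name"],
        # `parameters` becomes `input_schema`; fall back to an empty object schema.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in fn:
        converted["description"] = fn["description"]
    return converted

def tool_result_message(tool_use_id, result_text):
    """Build the user turn that carries a tool result back to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result_text,
        }],
    }

openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]
anthropic_tools = [openai_tool_to_anthropic(t) for t in openai_tools]
print(anthropic_tools[0]["name"], sorted(anthropic_tools[0]))
```

The JSON Schema inside parameters carries over unchanged; only the wrapper keys move.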

A thin abstraction for safe rollout

Wrap both providers behind one function so you can A/B outputs, cut over per-request, and keep the wrapper afterward for failover (handy when Anthropic returns a 529 Overloaded):

Python — provider-agnostic wrapper
def generate(prompt, system="", provider="anthropic"):
    """Send one user prompt to the chosen provider and return the reply text."""
    if provider == "openai":
        from openai import OpenAI
        msgs = ([{"role": "system", "content": system}] if system else []) + \
               [{"role": "user", "content": prompt}]
        r = OpenAI().chat.completions.create(model="gpt-4o", messages=msgs)
        return r.choices[0].message.content
    elif provider == "anthropic":
        import anthropic
        kwargs = {"system": system} if system else {}   # send system only when set
        r = anthropic.Anthropic().messages.create(
            model="claude-sonnet-4-6", max_tokens=1024,
            messages=[{"role": "user", "content": prompt}], **kwargs,
        )
        return r.content[0].text
    raise ValueError(f"unknown provider: {provider!r}")   # don't silently fall through

# Flip 'provider' per request to shadow-test outputs before cutting over.
print(generate("Explain rate limits in one line.", "You are concise."))
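
For the shadow-testing step, it helps to run both providers on the same prompts and record the results side by side. The harness below is a rough sketch under the assumption that each provider is exposed as a callable taking a prompt and returning text. shadow_compare is a hypothetical name, and the stub callables stand in for real API calls.

```python
def shadow_compare(prompts, primary, shadow):
    """Run each prompt through both callables; shadow failures are recorded, not raised."""
    rows = []
    for prompt in prompts:
        primary_out = primary(prompt)
        try:
            shadow_out = shadow(prompt)
            error = None
        except Exception as exc:
            # The shadow provider must never break the live path.
            shadow_out, error = None, repr(exc)
        rows.append({
            "prompt": prompt,
            "primary": primary_out,
            "shadow": shadow_out,
            "shadow_error": error,
            "identical": error is None and primary_out == shadow_out,
        })
    return rows

# Stubs for illustration only; swap in real provider calls.
def fake_openai(prompt):
    return prompt.upper()

def fake_anthropic(prompt):
    return prompt.upper() if "limit" in prompt else prompt.title()

for row in shadow_compare(["rate limits", "hello world"], fake_openai, fake_anthropic):
    print(row["prompt"], "|", row["identical"])
```

Exact string equality is a crude signal, since two providers rarely produce identical text. In practice you would log the rows and score them with your eval set.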

Behavioral differences to expect

  • Prompts need re-tuning. Claude responds differently to the same prompt — formatting, verbosity, and refusal behavior shift. Don’t expect drop-in parity; budget time to adjust prompts and re-run evals.
  • Token counting differs. The tokenizer isn’t the same, so your per-request token math (and cost/limit budgets) changes — recheck against the Anthropic Claude rate limits.
  • Stop reasons differ. Map finish_reason → stop_reason (stop → end_turn, length → max_tokens, tool_calls → tool_use).
  • Errors differ. Anthropic adds 529 Overloaded on top of 429; handle both with backoff.
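
If downstream code already branches on OpenAI's finish_reason, one low-risk option is to translate Anthropic's stop_reason at the boundary. A minimal sketch of the mapping listed above follows. The names are hypothetical, and any stop_reason outside the table is deliberately rejected so that new values surface loudly instead of being misread.

```python
FINISH_TO_STOP = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}

# Reverse direction, plus stop_sequence: OpenAI reports a hit stop sequence as "stop".
STOP_TO_FINISH = {stop: finish for finish, stop in FINISH_TO_STOP.items()}
STOP_TO_FINISH["stop_sequence"] = "stop"

def to_finish_reason(stop_reason):
    """Translate an Anthropic stop_reason into an OpenAI-style finish_reason."""
    try:
        return STOP_TO_FINISH[stop_reason]
    except KeyError:
        raise ValueError(f"unmapped stop_reason: {stop_reason!r}") from None

for reason in ("end_turn", "max_tokens", "tool_use", "stop_sequence"):
    print(reason, "->", to_finish_reason(reason))
```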

Rollout checklist

  • Map every tool schema (parameters → input_schema) and port the tool-result round-trip.
  • Set max_tokens everywhere — it’s required.
  • Move system prompts to the top-level field, not a role: system message.
  • Update response parsing to content[0].text, stop_reason, and usage.input_tokens / usage.output_tokens.
  • Re-tune prompts and re-run your eval set; compare quality, not just “it returns something.”
  • Handle 429 and 529 with capped backoff — see the 529 fix.
  • Re-budget tokens and limits against the Claude rate limits reference.
  • Shadow-run both providers on a traffic slice via the wrapper, diff outputs, then cut over gradually.
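
For the backoff item in the checklist, here is one possible shape: capped exponential backoff with full jitter, retrying only on 429 and 529. This is a sketch rather than a recommended production policy. call_with_backoff is a hypothetical name, and it assumes the raised exception exposes a status_code attribute (the demo uses a fake error class; verify the attribute on your SDK's error types). The official SDKs also have a max_retries client option, so check what they already retry before layering this on top.

```python
import random
import time

RETRYABLE = {429, 529}

def call_with_backoff(fn, max_attempts=5, base=0.5, cap=8.0, sleep=time.sleep):
    """Call fn(); on a retryable status, wait with capped, jittered backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)  # assumed attribute
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            delay = min(cap, base * 2 ** attempt)       # 0.5, 1, 2, 4, 8, 8, ...
            sleep(random.uniform(0, delay))             # full jitter

# Demo with a fake error type; no network and no real sleeping.
class FakeStatusError(Exception):
    def __init__(self, status_code):
        super().__init__(f"status {status_code}")
        self.status_code = status_code

attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeStatusError(529)
    return "ok"

print(call_with_backoff(flaky, sleep=lambda seconds: None), attempts["count"])
```

Injecting sleep keeps the helper testable without waiting; the default remains time.sleep.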

What to do next

  1. Confirm the economics first with the OpenAI vs Anthropic pricing comparison.
  2. Drop in the wrapper and shadow-test on a small traffic slice.
  3. Work the rollout checklist, re-tuning prompts as you go.
  4. Size throughput with the Claude rate limits reference before full cutover.

Frequently asked questions

Is migrating from OpenAI to Anthropic hard?
It's moderate: mostly a mechanical remap of the endpoint, the system prompt, required max_tokens, and the response shape. The real work is re-tuning prompts, porting tool schemas, and validating quality on your evals before cutover.

What breaks most often during the migration?
Forgetting that max_tokens is required on Anthropic, leaving the system prompt as a role: system message instead of the top-level field, and reading the wrong response path (it's content[0].text, not choices[0].message.content).

How do tools/function calling differ?
Anthropic uses input_schema instead of parameters, returns a tool_use content block instead of tool_calls, and expects the result back as a tool_result block inside a user message rather than a role: tool message.

Can I run both providers at once?
Yes, and you should during rollout. Wrap both behind one function, shadow-run a slice of traffic, diff the outputs, then cut over gradually. Keep the abstraction afterward for provider failover on 429/529.

Will my prompts work the same on Claude?
Not exactly. Claude's formatting, verbosity, and refusal behavior differ from GPT models, so expect to adjust prompts and re-run your eval set rather than copying them verbatim and assuming parity.