Moving from OpenAI’s Chat Completions to Anthropic’s Messages API is mostly a mechanical remap — but a handful of differences (system prompts, a required max_tokens, tool schemas, streaming events) will bite if you miss them. This guide maps the concepts field by field, gives you before/after code for each surface, and ends with a rollout checklist so you can cut over without crossing your fingers.
API concept mapping
| Concept | OpenAI | Anthropic |
|---|---|---|
| Auth header | Authorization: Bearer <key> | x-api-key: <key> + anthropic-version |
| Endpoint | POST /v1/chat/completions | POST /v1/messages |
| System prompt | messages[] with role: system | top-level system parameter |
| Max output tokens | max_tokens (optional) | max_tokens (REQUIRED) |
| Assistant text | choices[0].message.content | content[0].text |
| Stop reason | choices[0].finish_reason | stop_reason |
| Token usage | usage.prompt/completion_tokens | usage.input_tokens/output_tokens |
| Tool calling | tools + message.tool_calls | tools + tool_use content blocks |
| Tool result | role: tool message | user message with tool_result block |
| Streaming | SSE deltas (choices[].delta) | typed events (content_block_delta) |
- max_tokens is required on Anthropic: omit it and the request errors with a 400.
- The system prompt is a top-level field, not a message with role: "system".
- The response shape differs: read content[0].text, not choices[0].message.content.
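The first two gotchas above can be handled mechanically at the request level. The following is a minimal sketch, not an official converter: `to_anthropic_request` and `default_max_tokens` are hypothetical names, and it assumes plain-text messages with only system, user, and assistant roles (tool messages need the separate handling covered later in this guide).

```python
def to_anthropic_request(openai_request: dict, model: str,
                         default_max_tokens: int = 1024) -> dict:
    """Reshape an OpenAI-style chat request dict into a Messages-style dict.

    Hypothetical helper: handles plain-text system/user/assistant messages only.
    """
    system_parts = []
    messages = []
    for msg in openai_request["messages"]:
        if msg["role"] == "system":
            # System prompts move out of messages[] into the top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    request = {
        "model": model,  # model IDs do not carry over, so pass the target explicitly
        # max_tokens is required on Anthropic, so fall back to a default.
        "max_tokens": openai_request.get("max_tokens") or default_max_tokens,
        "messages": messages,
    }
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request
```

The resulting dict could then be passed along as `client.messages.create(**request)`; the default of 1024 is only a placeholder and should be sized to your own outputs.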
Basic chat: before / after
# Before (OpenAI)
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Explain rate limits in one line."},
    ],
)
print(resp.choices[0].message.content)

# After (Anthropic)
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,            # required
    system="You are concise.",  # top-level, not a message
    messages=[
        {"role": "user", "content": "Explain rate limits in one line."},
    ],
)
print(resp.content[0].text)  # different response shape

Streaming: before / after
OpenAI streams SSE chunks with choices[].delta; Anthropic streams typed events. The high-level SDK helpers smooth this over:
# Before (OpenAI); client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o", stream=True,
    messages=[{"role": "user", "content": "Count to 5."}],
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

# After (Anthropic); client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-6", max_tokens=256,
    messages=[{"role": "user", "content": "Count to 5."}],
) as stream:
    for text in stream.text_stream:  # yields incremental text
        print(text, end="", flush=True)
    final = stream.get_final_message()  # full message when done

Tool / function calling: before / after
Both support tools, but the request schema and the response/return shapes differ. OpenAI returns tool_calls; Anthropic returns tool_use content blocks and expects results back as a tool_result block in a user message.
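Because the two tool definitions carry the same information under different keys, an existing OpenAI tool list can be translated rather than rewritten by hand. A minimal sketch, assuming each tool is a standard `{"type": "function", "function": {...}}` entry; `to_anthropic_tool` is a hypothetical helper name, not part of either SDK.

```python
def to_anthropic_tool(openai_tool: dict) -> dict:
    """Translate one OpenAI function-tool definition into Anthropic's shape."""
    fn = openai_tool["function"]
    tool = {
        "name": fn["name"],
        # OpenAI calls the JSON Schema "parameters"; Anthropic calls it "input_schema".
        # Assumption: a tool declared without parameters takes an empty object.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in fn:
        tool["description"] = fn["description"]
    return tool


openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]
anthropic_tools = [to_anthropic_tool(t) for t in openai_tools]
```

One related difference to watch when porting the handler side: OpenAI delivers `function.arguments` as a JSON string that you parse yourself, while the `input` on an Anthropic tool_use block is already a parsed object.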
# Before (OpenAI); client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]
r = client.chat.completions.create(model="gpt-4o", tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}])
call = r.choices[0].message.tool_calls[0]  # .function.name / .function.arguments

# After (Anthropic); client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get weather for a city",
    "input_schema": {"type": "object",  # note: input_schema, not parameters
                     "properties": {"city": {"type": "string"}},
                     "required": ["city"]},
}]
r = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}])

# Claude returns a tool_use block; return the result as a user tool_result block:
tool_use = next(b for b in r.content if b.type == "tool_use")
followup = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=1024, tools=tools,
    messages=[
        {"role": "user", "content": "Weather in Paris?"},
        {"role": "assistant", "content": r.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": "18°C, clear",
        }]},
    ],
)
print(followup.content[0].text)

A thin abstraction for safe rollout
Wrap both providers behind one function so you can A/B outputs, cut over per-request, and keep the wrapper afterward for failover (handy when Anthropic returns a 529 Overloaded):
def generate(prompt, system="", provider="anthropic"):
    if provider == "openai":
        from openai import OpenAI
        msgs = ([{"role": "system", "content": system}] if system else []) + \
               [{"role": "user", "content": prompt}]
        r = OpenAI().chat.completions.create(model="gpt-4o", messages=msgs)
        return r.choices[0].message.content
    else:
        import anthropic
        r = anthropic.Anthropic().messages.create(
            model="claude-sonnet-4-6", max_tokens=1024,
            system=system, messages=[{"role": "user", "content": prompt}],
        )
        return r.content[0].text

# Flip 'provider' per request to shadow-test outputs before cutting over.
print(generate("Explain rate limits in one line.", "You are concise."))

Behavioral differences to expect
- Prompts need re-tuning. Claude responds differently to the same prompt — formatting, verbosity, and refusal behavior shift. Don’t expect drop-in parity; budget time to adjust prompts and re-run evals.
- Token counting differs. The tokenizer isn’t the same, so your per-request token math (and cost/limit budgets) changes — recheck against the Anthropic Claude rate limits.
- Stop reasons differ. Map finish_reason → stop_reason (stop → end_turn, length → max_tokens, tool_calls → tool_use).
- Errors differ. Anthropic adds 529 Overloaded on top of 429; handle both with backoff.
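If both providers stay behind one wrapper during the rollout, it helps to normalize stop reasons into a single vocabulary so downstream checks do not branch on provider. A minimal sketch using only the mapping listed above; `FINISH_TO_STOP` and `normalize_stop_reason` are hypothetical names, and any value outside the mapping is passed through unchanged rather than guessed at.

```python
# OpenAI finish_reason -> Anthropic stop_reason, per the mapping above.
FINISH_TO_STOP = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}


def normalize_stop_reason(provider: str, reason: str) -> str:
    """Return the reason in Anthropic's vocabulary regardless of provider."""
    if provider == "openai":
        # Unmapped values are returned as-is so they stay visible in logs.
        return FINISH_TO_STOP.get(reason, reason)
    return reason
```

Downstream code can then test for a single value, for example treating "max_tokens" as the truncation signal for either provider.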
Rollout checklist
- Map every tool schema (parameters → input_schema) and port the tool-result round-trip.
- Set max_tokens everywhere; it’s required.
- Move system prompts to the top-level field, not a role: system message.
- Update response parsing to content[0].text, stop_reason, and usage.input_tokens/output_tokens.
- Re-tune prompts and re-run your eval set; compare quality, not just “it returns something.”
- Handle 429 and 529 with capped backoff — see the 529 fix.
- Re-budget tokens and limits against the Claude rate limits reference.
- Shadow-run both providers on a traffic slice via the wrapper, diff outputs, then cut over gradually.
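The capped-backoff item in the checklist can be sketched as a small retry helper. This is one possible shape under stated assumptions, not the approach of any linked fix: `call_with_backoff` is a hypothetical name, the delay parameters are placeholders, and it assumes the raised exception exposes a `status_code` attribute.

```python
import random
import time

RETRYABLE_STATUS = {429, 529}  # rate limited, overloaded


def call_with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(), retrying on 429/529 with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Assumption: the error object carries the HTTP status as .status_code.
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # Exponential growth, capped, with full jitter to avoid retry bursts.
            delay = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A call site might look like `call_with_backoff(lambda: generate("Explain rate limits in one line."))`. The `sleep` parameter is injectable only so the helper can be exercised in tests without real waiting.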
What to do next
- Confirm the economics first with the OpenAI vs Anthropic pricing comparison.
- Drop in the wrapper and shadow-test on a small traffic slice.
- Work the rollout checklist, re-tuning prompts as you go.
- Size throughput with the Claude rate limits reference before full cutover.
Frequently asked questions
Is migrating from OpenAI to Anthropic hard?
What breaks most often during the migration?
Forgetting that max_tokens is required on Anthropic, leaving the system prompt as a role: system message instead of the top-level field, and reading the wrong response path (it's content[0].text, not choices[0].message.content).
How do tools/function calling differ?
Anthropic uses input_schema instead of parameters, returns a tool_use content block instead of tool_calls, and expects the result back as a tool_result block inside a user message rather than a role: tool message.