- Retry with exponential backoff and jitter: a 529 is Anthropic’s side being temporarily saturated, not you doing anything wrong.
- Don’t confuse it with a 429: 429 is your rate limit; 529 is Anthropic’s capacity. Both are retryable, but only 429 is fixed by tiering up.
- Add a fallback model or provider for user-facing paths so a capacity blip doesn’t become an outage.
529 Overloaded is Anthropic telling you its API is temporarily over capacity and can’t take your request right now. It is not your rate limit and not a bug in your code. The correct response is to back off and retry — and for anything user-facing, to have a fallback ready so a brief provider blip doesn’t turn into your outage.
What this error means
A 529 carries an overloaded_error and means the service is briefly saturated:
```json
{
  "type": "error",
  "error": { "type": "overloaded_error", "message": "Overloaded" }
}
```
This is distinct from 429 (rate_limit_error), which means you exceeded your own RPM/ITPM/OTPM. The practical difference: a 429 is fixed by slowing down or raising your tier; a 529 can only be fixed by retrying and spreading load, because the constraint is on Anthropic’s side, not yours.
Anthropic error codes at a glance
| Status | error.type | Cause | Retry? |
|---|---|---|---|
| 429 | rate_limit_error | You exceeded your RPM / ITPM / OTPM | Yes — backoff; also fixable by tiering up |
| 529 | overloaded_error | Anthropic API temporarily over capacity | Yes — backoff + fallback |
| 500 | api_error | Unexpected server-side error | Yes — backoff |
| 400 | invalid_request_error | Malformed request (e.g. missing max_tokens) | No — fix the request |
| 401 | authentication_error | Missing / wrong x-api-key | No — fix the credential |
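The table can be encoded as a small lookup so that retry decisions live in one place. The sketch below is illustrative only: `RETRY_POLICY` and `should_retry` are hypothetical names, not part of the Anthropic SDK, and it assumes you have the raw JSON error body in the shape shown earlier. Treating unknown or malformed bodies as non-retryable is a conservative assumption, not documented API behavior.

```python
import json

# Hypothetical lookup mirroring the table above: error.type -> retry?
RETRY_POLICY = {
    "rate_limit_error": True,        # 429: back off (or raise your tier)
    "overloaded_error": True,        # 529: back off and fall back
    "api_error": True,               # 500: back off
    "invalid_request_error": False,  # 400: fix the request
    "authentication_error": False,   # 401: fix the credential
}


def should_retry(error_body: str) -> bool:
    """Return True if the JSON error body names a retryable error type."""
    try:
        error_type = json.loads(error_body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        # Malformed or unexpected body: assume it is not safe to retry.
        return False
    return RETRY_POLICY.get(error_type, False)


overloaded = '{"type": "error", "error": {"type": "overloaded_error", "message": "Overloaded"}}'
print(should_retry(overloaded))  # True
```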
Common causes
529s are correlated with overall demand, not with your individual traffic, which is why they tend to arrive in bursts.
- Provider-side demand spikes. Popular models get saturated at peak hours; capacity is shared across all customers.
- Large or long-running requests are more likely to be shed when the service is under pressure.
- Retry storms from your own fleet. A naive retry-immediately loop across many workers amplifies a brief blip into sustained 529s — you become part of the overload.
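To make the retry-storm point concrete, here is a rough simulation, assuming a hypothetical fleet of 1,000 workers that all hit a 529 at the same moment. It compares how many retries land in the same one-second window when every worker retries immediately versus when each worker sleeps for a full-jitter delay. The function names, fleet size, and attempt number are made up for illustration.

```python
import random
from collections import Counter


def peak_retries_per_second(delays):
    """Largest number of retries that land in the same one-second bucket."""
    return max(Counter(int(d) for d in delays).values())


def simulate(workers=1000, attempt=3, cap=30.0, seed=0):
    """Compare immediate retries with full-jitter retries for one failure wave."""
    rng = random.Random(seed)
    # Naive loop: every worker retries at t = 0.
    immediate = [0.0] * workers
    # Full jitter: each worker picks a random delay in [0, min(2**attempt, cap)).
    jittered = [min(2 ** attempt, cap) * rng.random() for _ in range(workers)]
    return peak_retries_per_second(immediate), peak_retries_per_second(jittered)


immediate_peak, jittered_peak = simulate()
print("immediate retry peak per second:", immediate_peak)
print("full-jitter retry peak per second:", jittered_peak)
```

With immediate retries the whole fleet arrives in the same second; with jitter the same retries are spread across the backoff window, so the peak load on the provider is a fraction of the fleet size.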
How to fix it
The baseline is capped exponential backoff with jitter. The official SDKs already retry 429, 529, and 5xx automatically, but you’ll usually want your own outer loop for control and observability.
```python
import random
import time

import anthropic

client = anthropic.Anthropic(max_retries=4)  # SDK retries 429/529/5xx itself

RETRYABLE = (
    anthropic.RateLimitError,       # 429
    anthropic.InternalServerError,  # 5xx / 529 surface here
)


def message_with_backoff(messages, model="claude-sonnet-4-6", max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model=model, max_tokens=1024, messages=messages,
            )
        except RETRYABLE:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: random point in [0, 2**attempt] seconds, capped.
            delay = min(2 ** attempt, 30) * random.random()
            time.sleep(delay)


resp = message_with_backoff([{"role": "user", "content": "Hello"}])
print(resp.content[0].text)
```

```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 4 });
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function messageWithBackoff(messages, { model = "claude-sonnet-4-6", maxAttempts = 6 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({ model, max_tokens: 1024, messages });
    } catch (err) {
      const retryable = err?.status === 529 || err?.status === 429 || err?.status >= 500;
      if (!retryable || attempt === maxAttempts - 1) throw err;
      // Full jitter: random point in [0, 2**attempt] seconds, capped at 30 s.
      const delayMs = Math.min(2 ** attempt, 30) * 1000 * Math.random();
      await sleep(delayMs);
    }
  }
}

console.log((await messageWithBackoff([{ role: "user", content: "Hello" }])).content[0].text);
```
Add a fallback for user-facing paths
Backoff hides short blips, but a sustained overload still fails. For anything a user is waiting on, fall back — to another Claude model, or another provider entirely:
```python
import anthropic

anthropic_client = anthropic.Anthropic(max_retries=2)


def generate(prompt):
    # 1) Try the preferred Claude model, then a second Claude model.
    for model in ("claude-sonnet-4-6", "claude-haiku-4-5"):
        try:
            r = anthropic_client.messages.create(
                model=model, max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.content[0].text
        except (anthropic.RateLimitError, anthropic.InternalServerError):
            continue  # 529/429/5xx: try the next option

    # 2) Last resort: a different provider (see the migration guide for the wrapper).
    from openai import OpenAI
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content


print(generate("Summarize the CAP theorem in one sentence."))
```
Add a circuit breaker so you stop hammering
When 529s persist, stop sending for a cool-down window instead of retrying every request — this protects both you and the service:
```python
import time


class Breaker:
    def __init__(self, threshold=5, cooldown=20):
        self.fails = 0
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds to stay open
        self.open_until = 0

    def allow(self):
        # Closed (or cool-down elapsed) once the monotonic clock passes open_until.
        return time.monotonic() >= self.open_until

    def record(self, ok):
        if ok:
            self.fails = 0
        else:
            self.fails += 1
            if self.fails >= self.threshold:
                self.open_until = time.monotonic() + self.cooldown


breaker = Breaker()


def call(fn):
    if not breaker.allow():
        raise RuntimeError("circuit open: backing off provider entirely")
    try:
        out = fn()
        breaker.record(True)
        return out
    except Exception:
        breaker.record(False)
        raise
```
How to prevent it
- Cap and jitter retries so your own fleet doesn’t amplify the overload into a retry storm.
- Add a fallback (another model or provider) for user-facing requests — see the provider-abstraction pattern in the OpenAI → Anthropic migration guide.
- Move non-urgent work to the Message Batches API, which has separate capacity and keeps bulk jobs out of your live path.
- Shift batch jobs off-peak to reduce overlap with demand spikes.
- Monitor the 529 rate as a distinct metric from 429 so you can tell provider weather from your own over-sending.
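As a minimal sketch of the monitoring point, the snippet below counts responses by HTTP status so the 529 rate and the 429 rate can be read separately. `StatusMetrics` is a hypothetical in-process counter used only for illustration; in practice you would likely emit these counts to whatever metrics system you already run, and the sample statuses are invented.

```python
from collections import Counter


class StatusMetrics:
    """Hypothetical in-process counter; swap in your real metrics client."""

    def __init__(self):
        self.counts = Counter()

    def record(self, status: int) -> None:
        """Count one response with the given HTTP status."""
        self.counts[status] += 1

    def rate(self, status: int) -> float:
        """Fraction of all recorded responses that had this status."""
        total = sum(self.counts.values())
        return self.counts[status] / total if total else 0.0


metrics = StatusMetrics()
for status in (200, 200, 529, 200, 429, 529, 200, 200):  # invented sample
    metrics.record(status)

print(f"529 rate: {metrics.rate(529):.2%}")  # provider capacity
print(f"429 rate: {metrics.rate(429):.2%}")  # your own rate limit
```

Keeping the two rates apart means a rising 529 rate points at provider capacity, while a rising 429 rate points at your own sending pattern.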
What to do next
- Wrap calls in capped backoff with jitter (code above).
- For user-facing paths, add a model/provider fallback.
- If 529s persist, add a circuit breaker and check Anthropic’s status page.
- Also seeing 429s? That’s your own limit: read the Anthropic Claude rate limits reference. For the OpenAI equivalent of throttling, see the OpenAI 429 fix.
Frequently asked questions
Is a 529 my fault?
No. A 529 (overloaded_error) means Anthropic's API is temporarily over capacity. Your code, key, and rate limits are fine; back off and retry, and add a fallback for critical paths.
How is 529 different from 429?
A 429 (rate_limit_error) means you exceeded your own RPM/ITPM/OTPM and can be fixed by throttling or a higher tier. A 529 means the service itself is saturated; only retrying and load-spreading help, and tiering up does nothing for it.
Will upgrading my plan reduce 529s?
No. Tiering up raises your own rate limits, which addresses 429s; a 529 reflects Anthropic's capacity, not your limit.
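How long should I back off on a 529?
Use capped exponential backoff with full jitter, as in the examples above: a random delay between 0 and min(2^attempt, 30) seconds, so the wait grows with each attempt but never exceeds 30 seconds.
Should I retry 529 forever?
No. Cap the number of attempts (the examples above stop after six), and if 529s persist, open a circuit breaker and fall back to another model or provider.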
Do the Anthropic SDKs retry 529 automatically?
Yes. The official SDKs retry 429, 529, and 5xx automatically (configurable via max_retries / maxRetries). An outer loop is still useful for fallback logic, metrics, and circuit breaking.