Anthropic Error 529: Overloaded

Quick Fix

Retry with exponential backoff and jitter — a 529 is Anthropic’s side being temporarily saturated, not you doing anything wrong.
Don’t confuse it with a 429 — 429 is your rate limit; 529 is Anthropic’s capacity. Both retry, but only 429 is fixed by tiering up.
Add a fallback model or provider for user-facing paths so a capacity blip doesn’t become an outage.

529 Overloaded is Anthropic telling you its API is temporarily over capacity and can’t take your request right now. It is not your rate limit and not a bug in your code. The correct response is to back off and retry — and for anything user-facing, to have a fallback ready so a brief provider blip doesn’t turn into your outage.

What this error means

A 529 carries an overloaded_error and means the service is briefly saturated:

Typical 529 body

{
  "type": "error",
  "error": { "type": "overloaded_error", "message": "Overloaded" }
}

This is distinct from 429 (rate_limit_error), which means you exceeded your own RPM/ITPM/OTPM. The practical difference: a 429 is fixed by slowing down or raising your tier; a 529 can only be fixed by retrying and spreading load, because the constraint is on Anthropic’s side, not yours.
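One way to act on this distinction in code is to read error.type from the response body instead of matching on the message text. The following is a minimal sketch that assumes the body shape shown above; error_type_from_body is a hypothetical helper name, not part of any SDK.

```python
import json

def error_type_from_body(raw_body):
    """Return error.type from an Anthropic-style error body, or None if absent."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        return None  # not JSON, e.g. an HTML error page from a proxy
    if not isinstance(payload, dict) or payload.get("type") != "error":
        return None
    error = payload.get("error")
    return error.get("type") if isinstance(error, dict) else None

body = '{"type": "error", "error": {"type": "overloaded_error", "message": "Overloaded"}}'
print(error_type_from_body(body))  # overloaded_error
```

Returning None for anything that is not a well-formed error body lets the caller fall back to the HTTP status code alone.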

Anthropic error codes at a glance

Common Anthropic API status codes and how to respond (as of May 2026).
Status	error.type	Cause	Retry?
429	rate_limit_error	You exceeded your RPM / ITPM / OTPM	Yes — backoff; also fixable by tiering up
529	overloaded_error	Anthropic API temporarily over capacity	Yes — backoff + fallback
500	api_error	Unexpected server-side error	Yes — backoff
400	invalid_request_error	Malformed request (e.g. missing max_tokens)	No — fix the request
401	authentication_error	Missing / wrong x-api-key	No — fix the credential
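The table can be carried into code as a small lookup. This is only a sketch: POLICY and should_retry are hypothetical names, and the rule for statuses missing from the table (retry only if 5xx) is an assumption rather than documented behavior.

```python
# Mirrors the table above: status -> (error.type, retry?, suggested action).
POLICY = {
    429: ("rate_limit_error", True, "back off; throttle or raise tier"),
    529: ("overloaded_error", True, "back off; use a fallback"),
    500: ("api_error", True, "back off"),
    400: ("invalid_request_error", False, "fix the request"),
    401: ("authentication_error", False, "fix the credential"),
}

def should_retry(status):
    """True if the table marks this status as retryable."""
    if status in POLICY:
        return POLICY[status][1]
    # Assumption: statuses not listed in the table are retried only if 5xx.
    return 500 <= status < 600

for status in (429, 529, 400):
    print(status, should_retry(status))
```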

Common causes

Why 529s cluster

529s correlate with overall demand on the service, not with your individual traffic, which is why they tend to arrive in bursts.

Provider-side demand spikes. Popular models get saturated at peak hours; capacity is shared across all customers.
Large or long-running requests are more likely to be shed when the service is under pressure.
Retry storms from your own fleet. A naive retry-immediately loop across many workers amplifies a brief blip into sustained 529s — you become part of the overload.
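The retry-storm effect can be illustrated with a small simulation. This is a toy model, not a measurement: it assumes every worker fails at the same instant and retries once, and peak_retries_per_second is a hypothetical helper name.

```python
import random
from collections import Counter

def peak_retries_per_second(n_workers, attempt, jitter, seed=0):
    """Largest number of retries that land in the same one-second bucket."""
    rng = random.Random(seed)
    cap = min(2 ** attempt, 30)  # capped exponential delay, in seconds
    buckets = Counter()
    for _ in range(n_workers):
        # Full jitter picks a random point in [0, cap); no jitter waits exactly cap.
        delay = cap * rng.random() if jitter else cap
        buckets[int(delay)] += 1
    return max(buckets.values())

workers = 1000
print("no jitter:  ", peak_retries_per_second(workers, attempt=3, jitter=False))
print("full jitter:", peak_retries_per_second(workers, attempt=3, jitter=True))
```

Without jitter, all 1000 simulated workers retry in the same second. With full jitter, the same retries spread across about eight one-second buckets, so the peak the service sees is far lower.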

How to fix it

The baseline is capped exponential backoff with jitter. The official SDKs already retry 429, 529, and 5xx automatically, but you’ll usually want your own outer loop for control and observability.

Python — backoff for 529 (and 429/5xx) with the Anthropic SDK

import random
import time

import anthropic

# The outer loop owns retries here, so SDK-level retries are switched off.
# Leaving both on multiplies attempts (SDK retries x outer attempts).
client = anthropic.Anthropic(max_retries=0)

def is_retryable(err):
    # Match on status code rather than exception class, since the class
    # raised for 529 can differ between SDK versions: 429 plus any 5xx.
    return err.status_code == 429 or err.status_code >= 500

def message_with_backoff(messages, model="claude-sonnet-4-6", max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model=model, max_tokens=1024, messages=messages,
            )
        except anthropic.APIStatusError as e:
            if not is_retryable(e) or attempt == max_attempts - 1:
                raise
            # Full jitter: random delay in [0, min(2**attempt, 30)] seconds.
            delay = min(2 ** attempt, 30) * random.random()
            time.sleep(delay)

resp = message_with_backoff([{"role": "user", "content": "Hello"}])
print(resp.content[0].text)

JS — retry 529 with full jitter

import Anthropic from "@anthropic-ai/sdk";

// The outer loop owns retries here, so SDK-level retries are switched off.
const client = new Anthropic({ maxRetries: 0 });
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function messageWithBackoff(messages, { model = "claude-sonnet-4-6", maxAttempts = 6 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({ model, max_tokens: 1024, messages });
    } catch (err) {
      // 429 plus any 5xx (529 included) is retryable; everything else is rethrown.
      const retryable = err?.status === 429 || err?.status >= 500;
      if (!retryable || attempt === maxAttempts - 1) throw err;
      // Full jitter: random delay in [0, min(2^attempt, 30)] seconds.
      const delayMs = Math.min(2 ** attempt, 30) * 1000 * Math.random();
      await sleep(delayMs);
    }
  }
}

console.log((await messageWithBackoff([{ role: "user", content: "Hello" }])).content[0].text);

Add a fallback for user-facing paths

Backoff hides short blips, but a sustained overload still fails. For anything a user is waiting on, fall back — to another Claude model, or another provider entirely:

Python — fall back across models, then providers, on 529

import anthropic
from openai import OpenAI

anthropic_client = anthropic.Anthropic(max_retries=2)

def generate(prompt):
    # 1) Try the preferred Claude model, then a second Claude model.
    for model in ("claude-sonnet-4-6", "claude-haiku-4-5"):
        try:
            r = anthropic_client.messages.create(
                model=model, max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.content[0].text
        except anthropic.APIStatusError as e:
            # Fall through only on 429, 529, and other 5xx.
            # Request errors such as 400 or 401 are re-raised, not masked.
            if e.status_code != 429 and e.status_code < 500:
                raise

    # 2) Last resort: a different provider (see the migration guide for the wrapper).
    r = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content

print(generate("Summarize the CAP theorem in one sentence."))

Add a circuit breaker so you stop hammering

When 529s persist, stop sending for a cool-down window instead of retrying every request — this protects both you and the service:

Python — minimal circuit breaker around the call

import time

class Breaker:
    """Opens after `threshold` consecutive failures, then stays open for `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=20):
        self.threshold = threshold
        self.cooldown = cooldown
        self.fails = 0
        self.open_until = 0.0

    def allow(self):
        # Requests are allowed once the cool-down window has elapsed.
        return time.monotonic() >= self.open_until

    def record(self, ok):
        if ok:
            self.fails = 0
            return
        self.fails += 1
        if self.fails >= self.threshold:
            self.open_until = time.monotonic() + self.cooldown

breaker = Breaker()

def call(fn):
    if not breaker.allow():
        raise RuntimeError("circuit open: backing off provider entirely")
    try:
        out = fn()
    except Exception:
        breaker.record(False)
        raise
    breaker.record(True)
    return out
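A possible variation on the breaker idea counts only provider-side failures, so that a malformed request (400) or a bad key (401) never trips it. The sketch below is one way to do that under stated assumptions: OverloadBreaker, its record(status) signature, and the injectable clock are all hypothetical choices made for illustration and testing.

```python
import time

class OverloadBreaker:
    """Opens after `threshold` consecutive 529/5xx results, for `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=20.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so the cool-down can be tested without sleeping
        self.fails = 0
        self.open_until = 0.0

    def allow(self):
        return self.clock() >= self.open_until

    def record(self, status):
        """Pass None for a success, or the HTTP status code of a failure."""
        if status is None:
            self.fails = 0
        elif status >= 500:  # 529 and other server-side failures
            self.fails += 1
            if self.fails >= self.threshold:
                self.open_until = self.clock() + self.cooldown
        # Other statuses (400, 401, 429) say nothing about provider health: ignored.

now = [0.0]  # fake clock for the demonstration
breaker = OverloadBreaker(threshold=3, cooldown=20.0, clock=lambda: now[0])
for _ in range(3):
    breaker.record(529)
print(breaker.allow())  # False: three consecutive 529s opened the circuit
now[0] = 25.0
print(breaker.allow())  # True: the 20-second cool-down has elapsed
```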

How to prevent it

Cap and jitter retries so your own fleet doesn’t amplify the overload into a retry storm.
Add a fallback (another model or provider) for user-facing requests — see the provider-abstraction pattern in the OpenAI → Anthropic migration guide.
Move non-urgent work to the Message Batches API, which is processed asynchronously under its own rate limits and keeps bulk jobs out of your live path.
Shift batch jobs off-peak to reduce overlap with demand spikes.
Monitor the 529 rate as a distinct metric from 429 so you can tell provider weather from your own over-sending.
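Tracking 529 separately from 429 can start as a per-status tally. The sketch below is a stand-in for whatever metrics client you already use; StatusTally is a hypothetical name and the sample statuses are made up for illustration.

```python
from collections import Counter

class StatusTally:
    """Counts responses by HTTP status so 529 and 429 rates stay separate."""

    def __init__(self):
        self.counts = Counter()

    def record(self, status):
        self.counts[status] += 1

    def rate(self, status):
        total = sum(self.counts.values())
        return self.counts[status] / total if total else 0.0

tally = StatusTally()
for status in (200, 200, 529, 200, 429, 529, 200, 200):  # illustrative sample
    tally.record(status)

print(f"529 rate: {tally.rate(529):.1%}")  # 529 rate: 25.0%
print(f"429 rate: {tally.rate(429):.1%}")  # 429 rate: 12.5%
```

A rising 529 rate with a flat 429 rate points at provider capacity; the reverse points at your own sending rate.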

What to do next

Wrap calls in capped backoff with jitter (code above).
For user-facing paths, add a model/provider fallback.
If 529s persist, add a circuit breaker and check Anthropic’s status page.
Also seeing 429s? That’s your own limit — read the Anthropic Claude rate limits reference. For the OpenAI equivalent of throttling, see the OpenAI 429 fix.

Frequently asked questions

Is a 529 my fault?

No. A 529 (overloaded_error) means Anthropic's API is temporarily over capacity. Your code, key, and rate limits are fine — back off and retry, and add a fallback for critical paths.

How is 529 different from 429?

A 429 (rate_limit_error) means you exceeded your own RPM/ITPM/OTPM and can be fixed by throttling or a higher tier. A 529 means the service itself is saturated; only retrying and load-spreading help — tiering up does nothing for it.

Will upgrading my plan reduce 529s?

Not directly — 529 is about provider-side capacity, not your account tier. The durable fixes are backoff with jitter plus a fallback model or provider for requests a user is waiting on.

How long should I back off on a 529?

Use full-jitter exponential backoff: a random delay in [0, min(2^attempt, 30)] seconds, over roughly 5 to 6 attempts. Jitter is important because synchronized retries across your fleet re-create the overload.
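To see what that schedule costs in waiting time, the delay ceilings can be computed directly. This is a small worked example; backoff_caps is a hypothetical helper, and base 2 with a 30-second cap are the values quoted above.

```python
def backoff_caps(retries, base=2, cap=30):
    """Upper bound of the random delay, in seconds, before each retry."""
    return [min(base ** attempt, cap) for attempt in range(retries)]

caps = backoff_caps(6)
print(caps)           # [1, 2, 4, 8, 16, 30]
print(sum(caps))      # 61 seconds: worst-case total wait across six retries
print(sum(caps) / 2)  # 30.5 seconds: expected total wait under full jitter
```

Each delay is drawn uniformly from zero to its ceiling, so the expected total wait is half the worst case.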

Should I retry 529 forever?

No. Cap attempts and add a circuit breaker so that when overloads persist you stop sending for a cool-down window. Endless retries make you part of the problem and can mask a real outage from your monitoring.

Do the Anthropic SDKs retry 529 automatically?

Yes — the official Python and TypeScript SDKs retry 429, 529, and 5xx with backoff (configurable via max_retries / maxRetries). An outer loop is still useful for fallback logic, metrics, and circuit breaking.
