Part
3
  |  
The API Layer
  |  
Chapter
10

Structured Responses and JSON Mode

Free-text responses are fine for chat. The moment you need to feed Claude's output into another system, you need structure — and hoping the model formats things correctly is not a strategy.
Reading Time
11
mins
BACK TO CLAUDE MASTERCLASS

The trap is treating Claude like a human who happens to know JSON. Developers ask Claude to "return the response as JSON" in natural language, get a well-formatted JSON blob back, and assume the problem is solved. It works in testing. It works on the demo. Then a user submits an edge-case input and Claude wraps the JSON in a polite explanation, adds a markdown code fence, or decides that this particular request deserves a conversational preamble before the data. Your json.loads() call throws a JSONDecodeError, your pipeline halts, and you spend an hour debugging what turns out to be a missing system prompt constraint.

Asking Claude to return JSON is a suggestion. Telling Claude it is a JSON service is a constraint. The difference shows up at 2 a.m. when your parser breaks.

Structured output is not about asking nicely. It's about engineering the request so that Claude produces machine-parseable responses every time, on every input, including the weird ones. This chapter covers the techniques that actually work — from system prompt discipline to schema enforcement — and the failure modes that bite teams who skip them.

Why Structure Matters

Free-text responses are fine when a human reads them. The moment another piece of software reads them, you need guarantees. A downstream function expects a dictionary with specific keys. A database insert expects typed fields. A frontend component expects an array of objects with a known shape.

Claude can produce all of these. The question is whether it produces them reliably — meaning every response, not just the ones you tested.

Consider this scenario: you're building a ticket classification system. Users submit support requests, and Claude categorizes them into a priority level, a department, and a short summary. If Claude returns free text, you need a parsing layer that extracts those fields from natural language — which is its own NLP problem. If Claude returns JSON, the parsing layer is json.loads() and a dictionary lookup. The difference in engineering effort is enormous.

The gap is not just in code — it's in reliability. A free-text parsing layer breaks whenever Claude changes its phrasing. "Priority: High" in one response becomes "This is a high-priority issue" in the next. Both say the same thing; your regex catches the first and misses the second. With JSON, the field is "priority": "high" every time, because the schema constrains the format and Claude follows it.

I've seen this pattern across dozens of integrations. Teams start with free text because it's faster to prototype. They add regex extraction. The regex gets more complex as they encounter edge cases. Eventually the extraction layer has more bugs than the feature it supports. The teams that start with structured output from day one skip that entire arc.

Framework · The Schema Contract · SC

Define the exact JSON schema you expect before writing the prompt. The schema is the contract between your application and Claude. Every field, every type, every constraint. If the schema is ambiguous, the output will be ambiguous. If the schema is precise, Claude matches it with remarkable consistency.

The System Prompt Approach

The most reliable way to get structured JSON from Claude is to make the system prompt absolutely unambiguous about the output format. Not "please return JSON" — a full specification of the schema, the constraints, and what Claude must not do.

import anthropic
import json

client = anthropic.Anthropic()

system_prompt = """You are a JSON-only response service. You must follow these rules exactly:

1. Return ONLY valid JSON. No explanations, no markdown, no code fences, no preamble.
2. Every response must follow this exact schema:
{
  "title": "string — a concise title for the task",
  "category": "string — one of: bug, feature, question, documentation",
  "priority": "string — one of: low, medium, high, critical",
  "summary": "string — one sentence summarizing the core issue",
  "suggested_steps": ["string — each step is one actionable sentence"]
}
3. Infer category and priority from the user's description. Do not ask for clarification.
4. suggested_steps must contain between 2 and 5 items.
5. Do not add any fields beyond those specified."""

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system=system_prompt,
    messages=[
        {
            "role": "user",
            "content": "Our checkout page throws a 500 error when users apply a discount code that has expired. This is happening on every transaction that uses old promo codes from last quarter's campaign."
        }
    ]
)

raw_output = response.content[0].text
parsed = json.loads(raw_output)

print(f"Title: {parsed['title']}")
print(f"Category: {parsed['category']}")
print(f"Priority: {parsed['priority']}")
print(f"Summary: {parsed['summary']}")
print("Steps:")
for i, step in enumerate(parsed['suggested_steps'], 1):
    print(f"  {i}. {step}")

This works. Claude reads the schema, follows the constraints, and returns clean JSON that passes json.loads() without complaint. But notice the level of specificity in the system prompt. It doesn't say "return JSON." It says "return ONLY valid JSON," specifies every field name and type, defines the allowed values for enum fields, sets a range for the array length, and explicitly forbids extra fields. Every constraint you leave out is a degree of freedom Claude fills with its own judgment — which is usually reasonable but not guaranteed.

The system prompt is the cheapest place to enforce structure. Adding a constraint costs you a few tokens of input. Not adding it costs you a parsing failure at 2 a.m. and an engineer's time to debug it.

Defensive Parsing

Even with a strong system prompt, defensive parsing is not optional. Claude is a language model, not a JSON generator. It will produce valid JSON in the vast majority of cases, but edge cases exist: markdown code fences wrapping the output, a stray newline before the opening brace, or a conversational prefix on unusual inputs.

Here's the parsing function I use in every project that expects JSON from Claude:

import json
import re

def parse_claude_json(raw: str) -> dict:
    """Parse JSON from Claude's response, handling common formatting artifacts."""
    # Strip markdown code fences if present
    cleaned = raw.strip()
    cleaned = re.sub(r'^```(?:json)?\s*', '', cleaned)
    cleaned = re.sub(r'\s*```$', '', cleaned)
    cleaned = cleaned.strip()

    try:
        return json.loads(cleaned)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Failed to parse Claude's response as JSON: {e}\n"
            f"Raw output was:\n{raw}"
        ) from e

The function does three things: strips whitespace, removes markdown code fences (the most common formatting artifact), and attempts to parse. If it fails, it raises a clear error with the raw output included — so you can see exactly what Claude returned instead of guessing.

Why code fences appear

Claude is trained on conversations where JSON is often presented inside markdown code fences for readability. Even with a strong system prompt saying "no code fences," edge cases can trigger them — especially when the user's input contains markdown formatting. The defensive strip costs nothing and catches a real failure mode.

Enforcing Output Format with Prefill

There's a technique that dramatically increases JSON reliability: prefilling Claude's response. You add an assistant message with the opening of the JSON structure, and Claude continues from there.

import anthropic
import json

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system="You are a data extraction service. Return only valid JSON. No explanations.",
    messages=[
        {
            "role": "user",
            "content": "Extract the entities from this text: 'Dr. Sarah Chen at Stanford published her findings on CRISPR-Cas9 in the journal Nature on March 15, 2024.'"
        },
        {
            "role": "assistant",
            "content": "{"
        }
    ]
)

# Claude continues from the opening brace
raw_json = "{" + response.content[0].text
parsed = json.loads(raw_json)
print(json.dumps(parsed, indent=2))

By providing { as the start of Claude's response, you eliminate any possibility of preamble text, markdown formatting, or conversational hedging. Claude sees that its response has already started with a JSON object and continues accordingly. You prepend the { back to the output and parse it.

Prefilling the response with an opening brace is the simplest reliability trick in the Claude API toolkit. It costs zero extra tokens and eliminates the most common parsing failure.

This technique works for arrays too — prefill with [ if you expect a JSON array. The principle is the same: by controlling the first character of the output, you anchor Claude's formatting for the entire response.

A subtlety to watch for: when you prefill with {, Claude's response starts after the brace. So you need to prepend { when parsing: "{" + response.content[0].text. If you forget this step, you get a JSON string missing its opening brace, and json.loads() fails with a cryptic error about expecting a value. I've seen this exact bug in three different codebases. The fix is always the same one line.

You can extend the prefill technique to include the first key. Prefilling with {"title": anchors both the format and the first field, which is useful when you want to guarantee field ordering or when Claude occasionally starts with a different field than expected. The tradeoff is that longer prefills are more brittle — if you change the schema, you need to update the prefill to match.

Key takeaway

Three layers of defense make JSON output reliable: a strict system prompt defining the schema, a prefilled assistant message anchoring the format, and a defensive parser that strips common artifacts. Use all three. Together, they handle edge cases that any single technique misses.

Schema Validation

Parsing JSON is necessary but not sufficient. A valid JSON document can still have wrong types, missing fields, or unexpected values. Schema validation closes that gap.

import anthropic
import json

client = anthropic.Anthropic()

EXPECTED_FIELDS = {
    "title": str,
    "category": str,
    "priority": str,
    "summary": str,
    "suggested_steps": list,
}

VALID_CATEGORIES = {"bug", "feature", "question", "documentation"}
VALID_PRIORITIES = {"low", "medium", "high", "critical"}

def validate_response(data: dict) -> list[str]:
    """Validate parsed JSON against the expected schema. Returns a list of errors."""
    errors = []

    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            errors.append(f"Missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"Field '{field}' should be {expected_type.__name__}, got {type(data[field]).__name__}")

    if data.get("category") not in VALID_CATEGORIES:
        errors.append(f"Invalid category: {data.get('category')}. Must be one of {VALID_CATEGORIES}")

    if data.get("priority") not in VALID_PRIORITIES:
        errors.append(f"Invalid priority: {data.get('priority')}. Must be one of {VALID_PRIORITIES}")

    steps = data.get("suggested_steps", [])
    if not (2 <= len(steps) <= 5):
        errors.append(f"suggested_steps must have 2-5 items, got {len(steps)}")

    return errors

# Use it after parsing
response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system="You are a JSON-only ticket classification service. Return valid JSON with fields: title (string), category (bug|feature|question|documentation), priority (low|medium|high|critical), summary (string), suggested_steps (array of 2-5 strings).",
    messages=[
        {"role": "user", "content": "Login page is slow on mobile devices."},
        {"role": "assistant", "content": "{"}
    ]
)

raw = "{" + response.content[0].text
data = json.loads(raw)
errors = validate_response(data)

if errors:
    print("Validation errors:")
    for err in errors:
        print(f"  - {err}")
else:
    print("Valid response:")
    print(json.dumps(data, indent=2))

This pattern — parse, validate, use — is the backbone of every production system that consumes Claude's output programmatically. Skipping validation means trusting that the model always follows the schema perfectly. It usually does. "Usually" is not a production-grade guarantee.

What happens when validation fails? That depends on your application. For a real-time user-facing feature, you might retry the request once with the same input — Claude's output is non-deterministic, so the second attempt often produces valid output even if the first didn't. For a batch pipeline, you might log the failure, skip the item, and flag it for human review. What you should never do is silently pass invalid data downstream. A missing priority field that defaults to None in your database is a bug that surfaces weeks later when a dashboard filter breaks, and by then you've lost the context to diagnose it.

Retry with tighter constraints

When a validation failure occurs, a useful pattern is to retry with the validation errors included in the user message: "Your previous response had these issues: [errors]. Please correct them and return valid JSON." Claude is remarkably good at self-correction when you tell it exactly what went wrong. This costs one extra API call but produces a valid response almost every time.

Choosing the Right Output Format

JSON is not the only structured format Claude can produce. Depending on your downstream consumer, other formats may serve you better.

✕ Use JSON when
  • Output feeds into code or APIs
  • You need typed fields and nested data
  • Multiple systems consume the response
  • You need schema validation
✓ Use other formats when
  • Markdown: output is rendered to humans
  • Bullet points: simple lists, no nesting
  • CSV: tabular data for spreadsheets
  • YAML: config files or human-readable data

The principles are the same regardless of format. Define the structure in the system prompt. Be explicit about what's allowed and what's not. Validate the output before using it. The format changes; the discipline doesn't.

One format I want to highlight specifically: bullet points and numbered lists. These sit between free text and JSON on the structure spectrum. They're human-readable but parseable — you can split on newlines and strip the leading dash or number. For internal tools where the consumer is a dashboard or a simple list renderer, bullet-point output is often the pragmatic choice. You skip the JSON parsing overhead while still getting a consistent, predictable format.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=512,
    system="Return exactly 5 bullet points. Each line must start with '- '. No other formatting. No preamble.",
    messages=[
        {"role": "user", "content": "Key risks of deploying a language model to production without monitoring."}
    ]
)

lines = [line.strip()[2:] for line in response.content[0].text.strip().split("\n") if line.strip().startswith("- ")]
for risk in lines:
    print(f"Risk: {risk}")

The parsing is trivial, the output is clean, and the format is resilient to minor model variations. For cases where JSON feels like overkill, this is the right tool.

When to use Claude's native tool use

Claude has a built-in tool use feature that forces structured output by defining a tool schema. The model returns a structured tool_use content block that matches your schema exactly. For high-reliability structured output, tool use is often more dependable than prompt-based JSON — because the schema is enforced at the API level, not just in the prompt. We'll cover tool use in detail in a later chapter.

Monday-Morning Moves

Write a JSON schema before writing the prompt

Start with the output shape. Define every field name, type, and constraint. Then write the system prompt to enforce that schema. This order — schema first, prompt second — prevents the common mistake of building a prompt and discovering your schema is ambiguous when parsing fails.

Implement the defensive parser from this chapter

Copy the parse_claude_json() function into your project. Use it everywhere you parse Claude's JSON output. The code fence stripping alone will save you from a class of production bugs that only appear on unusual inputs.

Add prefill to every JSON request

Set the assistant message to { (or [ for arrays) in every API call that expects structured output. This single change eliminates preamble text and conversational formatting — the two most common reasons json.loads() fails on Claude's output.

Build a validation function for your schema

Write a function like validate_response() that checks for missing fields, wrong types, and invalid enum values. Run it on every parsed response. Log validation failures separately from parse failures — the diagnostic tells you whether the problem is formatting or content.

Test with adversarial inputs

Feed your structured output pipeline the weirdest inputs you can think of: empty strings, very long text, text in other languages, text with special characters and code blocks. The goal is to find the inputs where Claude deviates from your schema before your users find them for you.

Structured output is an engineering discipline, not a prompt trick. The schema is the contract. The system prompt is the specification. The parser is the enforcer. All three must agree, or the pipeline breaks on the input you didn't test.