What Claude Code Does Before the Model Sees Your Message
A deep dive into Claude Code's preprocessing pipeline — the 6-stage system that transforms raw user input into a carefully orchestrated message stream with context injection, file pre-reading, skill discovery, and message normalization. Based on source code analysis.
Abstract:
Claude Code doesn't just forward your prompt to Claude. Between hitting Enter and the model generating a response, your input passes through a 6-stage preprocessing pipeline that assembles system prompts with cache boundaries, captures git state, processes images, runs user-defined hooks, injects ~25 types of contextual attachments, and normalizes the entire message history for API compatibility. This is context engineering at production scale.
Estimated reading time: 10 minutes
I've been studying the Claude Code source to understand how production agent harnesses actually work. The most interesting finding isn't any single feature — it's the sheer amount of preprocessing that happens between you typing a message and the model seeing it.
The model never sees your raw input. It sees a carefully orchestrated message stream.
The Pipeline at a Glance
User types message
│
▼
┌─────────────────────┐
│ 1. System Prompt │ Static cacheable prefix + dynamic session suffix
│ Assembly │ with explicit cache boundary marker
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 2. User Context │ CLAUDE.md hierarchy + git status snapshot
│ (memoized once) │ + current date
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 3. Input Processing │ Image resize, bridge safety, ultraplan
│ │ keyword rewrite, slash command routing
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 4. Hook Execution │ UserPromptSubmit hooks can block,
│ │ inject context, or stop continuation
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 5. Attachment │ ~25 attachment types computed in parallel:
│ Injection │ files, skills, diffs, tasks, agent mail...
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 6. Message │ Reorder, merge, strip virtual messages,
│ Normalization │ remove failed media, clean tool refs
└──────────┬──────────┘
│
▼
API call to Claude
Let's walk through each stage.
Stage 1: System Prompt Assembly
The system prompt isn't a static string. It's assembled from ~15 sections, split by an explicit cache boundary marker (__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__).
Before the boundary (static, globally cacheable):
- Identity and intro (including output style if configured)
- System rules (tool permissions, tags, hooks, compression notice)
- "Doing tasks" guidelines (code style, security, no over-engineering)
- Action safety guidelines (reversibility, blast radius)
- Tool usage instructions (prefer dedicated tools over Bash)
- Tone and style
- Output efficiency
After the boundary (dynamic, per-session):
- Session-specific guidance (which tools are available, agent/skill configuration)
- Auto-memory prompt
- Environment info (model name, cwd, platform, OS version)
- Language preference
- MCP server instructions (for connected external tools)
- Scratchpad instructions
- Function-result-clearing hints
- Compaction-related instructions
The boundary exists for prompt prefix caching. Everything before it gets a scope: 'global' cache key. This means the static portion is cached across all users and orgs — only the dynamic tail varies per session. That's a meaningful cost optimization when you're serving millions of sessions.
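The boundary mechanics are simple to sketch. Here's a minimal TypeScript illustration of the idea — the section names and the `assembleSystemPrompt`/`splitAtBoundary` helpers are my own assumptions for illustration, not Claude Code's actual implementation:

```typescript
// The marker string is taken from the source; everything else here is a sketch.
const DYNAMIC_BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

// Static sections go before the marker, per-session sections after it.
function assembleSystemPrompt(
  staticSections: string[],
  dynamicSections: string[],
): string {
  return [...staticSections, DYNAMIC_BOUNDARY, ...dynamicSections].join("\n\n");
}

// At request time, the prefix before the marker is the globally cacheable
// part; only the tail varies per session.
function splitAtBoundary(prompt: string): { cacheable: string; dynamic: string } {
  const idx = prompt.indexOf(DYNAMIC_BOUNDARY);
  if (idx === -1) return { cacheable: prompt, dynamic: "" };
  return {
    cacheable: prompt.slice(0, idx).trimEnd(),
    dynamic: prompt.slice(idx + DYNAMIC_BOUNDARY.length).trimStart(),
  };
}
```

Because the static prefix is byte-identical across sessions, the provider-side prompt cache can match it regardless of which user sent the request.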
There's also a CLAUDE_CODE_SIMPLE env var that collapses the entire system prompt to a single line with just the CWD and date. Presumably for testing.
Ant-only sections
Some sections are gated on process.env.USER_TYPE === 'ant' (Anthropic internal users). These include:
- More detailed comment-writing guidelines ("Default to writing no comments")
- False-claims mitigation ("Never claim 'all tests pass' when output shows failures")
- Numeric length anchors ("keep text between tool calls to ≤25 words")
- A verification agent system for non-trivial implementations
- Assertiveness counterweights ("If you notice the user's request is based on a misconception... say so")
This is interesting because it means Anthropic's internal build has noticeably different model behavior tuning than the external release. The @[MODEL LAUNCH] comments throughout suggest these are actively adjusted per model generation.
Stage 2: User Context (Memoized)
Two context blocks are computed once per conversation and cached:
getUserContext()
Walks the directory hierarchy to find and assemble all CLAUDE.md files. These are the user's project-specific instructions. Also injects the current date. Can be disabled with CLAUDE_CODE_DISABLE_CLAUDE_MDS or --bare mode.
The result gets cached in a side channel for the "yolo classifier" (the auto-mode permission classifier) — which reads it without importing the claudemd module directly, avoiding an import cycle.
getSystemContext()
Captures a git status snapshot: current branch, main/default branch, git user name, short status (truncated at 2K chars), and last 5 commits. This runs once and doesn't update during the conversation — it's a snapshot.
There's also a system prompt injection mechanism gated behind BREAK_CACHE_COMMAND that can bust the prompt cache. The comment says "ant-only, ephemeral debugging state."
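The once-per-conversation caching is ordinary memoization. A minimal sketch, assuming nothing about the real function signatures beyond "computed once, then reused":

```typescript
// Generic memoize-once helper; Claude Code's actual caching may differ.
function memoizeOnce<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) cached = { value: compute() }; // first call computes...
    return cached.value;                        // ...later calls reuse
  };
}

let computeCount = 0;

// Hypothetical stand-in for getSystemContext(): captures a git snapshot once.
const getSystemContextSketch = memoizeOnce(() => {
  computeCount++;
  // The real version would shell out for branch, status, and recent commits.
  return { branch: "main", recentCommits: 5 };
});
```

This is why the git status block is a snapshot: the closure holds the first result for the rest of the conversation.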
Stage 3: User Input Processing
processUserInput() is where raw input gets transformed. The processing order matters:
1. Image handling — Images (both pasted and in content blocks) are resized and downsampled via maybeResizeAndDownsampleImageBlock(). Metadata (dimensions, source paths) is collected for a hidden isMeta message.
2. Bridge safety — Remote inputs (from mobile/web clients) get slash commands filtered. Only "bridge-safe" commands pass through. Unsafe commands get a polite error: "/<command> isn't available over Remote Control." Unknown /foo inputs fall through as plain text — "A mobile user typing '/shrug' shouldn't see 'Unknown skill'."
3. Ultraplan keyword detection — If the user's input contains a special keyword (detected on the pre-expansion input so pasted content can't accidentally trigger it), the entire input gets rewritten as a /ultraplan command. This is gated behind an ULTRAPLAN feature flag and only works in interactive prompt mode.
4. Slash command routing — /-prefixed input routes to command handlers. There are 70+ commands.
5. Attachment extraction — For non-slash-command inputs, the system extracts @-mentioned files, MCP resource references, and IDE selections. These become attachment messages.
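The routing and extraction steps can be sketched as a small dispatcher. The type names and regex here are illustrative assumptions, not the real API:

```typescript
// Hypothetical simplified result type for input routing.
type ProcessedInput =
  | { kind: "command"; name: string; args: string }
  | { kind: "prompt"; text: string; mentions: string[] };

function routeInput(raw: string): ProcessedInput {
  // Slash-prefixed input routes to a command handler...
  if (raw.startsWith("/")) {
    const [name, ...rest] = raw.slice(1).split(" ");
    return { kind: "command", name, args: rest.join(" ") };
  }
  // ...otherwise @-mentions are extracted as attachment candidates.
  const mentions = [...raw.matchAll(/@(\S+)/g)].map((m) => m[1]);
  return { kind: "prompt", text: raw, mentions };
}
```

The real pipeline does much more per branch (bridge-safety filtering, keyword rewrites), but the fork between "command" and "prompt with attachments" is the structural core.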
Stage 4: Hook Execution
After input processing, UserPromptSubmit hooks run. These are user-defined shell commands configured in settings. The hook system can:
- Block the prompt entirely — returns a system warning message, erasing the original input
- Prevent continuation — keeps the prompt in context but stops processing
- Inject additional context — hook output gets attached as context
Hook output is truncated at MAX_HOOK_OUTPUT_LENGTH = 10000 characters to prevent context bloat.
This is where CI/CD-style guardrails can intercept prompts. A user could, for example, run a linter on the prompt or check it against a policy before it reaches the model.
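The three hook outcomes map naturally to a discriminated union. This sketch assumes field names; only the three behaviors and the 10,000-character limit come from the source:

```typescript
const MAX_HOOK_OUTPUT_LENGTH = 10_000; // from the source

// Hypothetical result shape for a UserPromptSubmit hook.
type HookResult =
  | { decision: "block"; reason: string }      // erase the prompt, emit warning
  | { decision: "stop" }                       // keep prompt in context, halt turn
  | { decision: "continue"; output?: string }; // optionally inject context

// Returns the context to inject, or null if nothing should be added.
function applyHookOutput(result: HookResult): string | null {
  if (result.decision !== "continue" || !result.output) return null;
  // Injected context is truncated so a chatty hook can't bloat the window.
  return result.output.slice(0, MAX_HOOK_OUTPUT_LENGTH);
}
```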
Stage 5: The Attachment System
This is the most fascinating stage. Every turn, ~25 attachment types are computed in parallel and injected as AttachmentMessages into the conversation. The model sees these as context, but they're invisible to the user in the transcript.
Here's the full inventory:
User Input Attachments
| Type | What it does |
|---|---|
at_mentioned_files | Reads files referenced with @path/to/file syntax |
mcp_resources | Resolves MCP resource URI references |
agent_mentions | Detects @agent-<name> syntax for subagent routing |
skill_discovery | Runs a Haiku-based search to surface relevant skills for the current input |
Per-Turn Context Attachments
| Type | What it does |
|---|---|
queued_commands | Drains queued commands (agent-scoped) |
date_change | Notifies when the calendar date rolls over mid-session |
ultrathink_effort | Adjusts thinking effort based on input signals |
deferred_tools_delta | Announces tools that became available or were removed since last turn |
agent_listing_delta | Announces agent definition changes |
mcp_instructions_delta | Announces MCP server connections/disconnections |
companion_intro | Injects the buddy/companion sprite context (feature-flagged) |
changed_files | Detects files modified since the last turn — so the model knows what changed |
nested_memory | Loads CLAUDE.md files from newly-accessed directories |
dynamic_skill | Loads skill definitions relevant to current context |
skill_listing | Full skill metadata listing |
plan_mode / plan_mode_exit | Plan mode state management |
auto_mode / auto_mode_exit | Auto-mode (YOLO) state management |
todo_reminders / task_reminders | Reminds about incomplete tasks |
teammate_mailbox | Inter-agent messages in swarm mode |
team_context | Shared team state in swarm mode |
agent_pending_messages | Pending messages for agent coordination |
critical_system_reminder | High-priority system state |
compaction_reminder | Hints when context is approaching limits |
context_efficiency | Snip-based context efficiency metrics |
The skill_discovery attachment is particularly clever. It runs a Haiku call to search for relevant skills based on the user's input — essentially using a small model to curate context for the large model. The comment notes that "97% of those Haiku calls found nothing in prod," so they moved inter-turn discovery to an async prefetch to avoid blocking.
The changed_files attachment means the model is always aware of what files changed between turns, even if it didn't make the changes itself. This is how it stays aware of external edits.
The ordering matters too: user input attachments run first, then per-turn attachments. This ensures that @-mentioned files trigger nested_memory loading for their directories.
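The "computed in parallel" part is worth making concrete. A minimal sketch, assuming each attachment type is a collector function that resolves to an attachment or null (the names are mine, not the source's):

```typescript
type Attachment = { type: string; content: string };
type Collector = () => Promise<Attachment | null>;

// All collectors run concurrently; most find nothing on a given turn
// and resolve to null, which gets filtered out.
async function collectAttachments(collectors: Collector[]): Promise<Attachment[]> {
  const results = await Promise.all(collectors.map((c) => c()));
  return results.filter((a): a is Attachment => a !== null);
}
```

With ~25 collectors per turn, running them concurrently means turn latency is bounded by the slowest collector rather than the sum of all of them.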
Stage 6: Message Normalization for API
The final transformation before the API call handles compatibility and cleanup:
1. Attachment reordering — Attachment messages get bubbled up past user messages until they hit a tool_result or assistant message. This ensures context appears before the prompt that needs it.
2. Virtual message stripping — Display-only messages (like REPL inner tool calls marked isVirtual) are removed. These exist for the CLI UI but shouldn't reach the model.
3. Error-based content stripping — If a PDF was too large, password-protected, or invalid, or if an image exceeded size limits, the system finds the offending block in the preceding user message and strips it. Specific error types map to specific block types to remove.
4. Consecutive user message merging — The Anthropic first-party API supports multiple consecutive user messages, but Bedrock doesn't. Messages get merged for compatibility.
5. Tool reference cleanup — When tool search is enabled, tool_reference blocks for tools that no longer exist (e.g., disconnected MCP servers) get stripped. When tool search is disabled, all tool_reference blocks are stripped entirely.
6. System message conversion — local_command system messages (from slash command output) get converted to user messages so the model can reference previous command output in later turns.
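The Bedrock-compatibility merge is the easiest of these to sketch. Message shape simplified to role plus string content; the join separator is my assumption:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Fold runs of consecutive user messages into one, since some providers
// (e.g. Bedrock) reject back-to-back messages with the same user role.
function mergeConsecutiveUserMessages(messages: Msg[]): Msg[] {
  const out: Msg[] = [];
  for (const m of messages) {
    const last = out[out.length - 1];
    if (last && last.role === "user" && m.role === "user") {
      last.content += "\n\n" + m.content; // append instead of pushing
    } else {
      out.push({ ...m }); // copy so the input array isn't mutated
    }
  }
  return out;
}
```

Note the ordering dependency: this merge has to run after attachment reordering, since bubbling attachments up is what creates the consecutive user messages in the first place.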
What This Tells Us About Production Agent Systems
A few takeaways from studying this pipeline:
Context engineering is the real product. The model is powerful, but the value of Claude Code is in what happens before and after the model runs. The attachment system alone — dynamically enriching every turn with file changes, skill suggestions, task reminders, and inter-agent messages — is a substantial engineering effort.
Caching is a first-class architectural concern. The system prompt cache boundary, memoized context, and the careful separation of static vs. dynamic content all serve prompt prefix caching. When you're serving at scale, cache hit rates directly impact cost and latency.
Feature flags enable gradual rollout of model behavior changes. The @[MODEL LAUNCH] comments, USER_TYPE gates, and bun:bundle feature flags mean different users can get meaningfully different system prompts. Anthropic can A/B test prompt changes on internal users before rolling them out externally.
The attachment system is an implicit RAG pipeline. Rather than a traditional retrieval-augmented generation setup with vector embeddings and similarity search, Claude Code uses deterministic, rule-based context injection. @-mentioned files are pre-read. Changed files are detected. Skills are discovered. It's RAG, but the "retrieval" is a set of parallel, purpose-built functions rather than a vector store.
Hook systems turn the agent into a platform. By letting users inject shell commands at key lifecycle points (prompt submission, tool use, session start), Claude Code becomes extensible without code changes. This is the same pattern that made Git powerful — hooks as extension points.
Wrapping Up
The next time you type a message in Claude Code and see the model respond, remember: your message went through image resizing, bridge safety checks, keyword detection, slash command routing, file extraction, hook execution, ~25 parallel attachment computations, and message normalization before Claude ever saw it.
The model is the engine. The preprocessing pipeline is the transmission, steering, and suspension. Both matter.
This analysis is based on source code study of the Claude Code harness. The code belongs to Anthropic — this is research and commentary, not a distribution of their work.