The context window is not infinite patience
You are midway through building a feature. You need to check how the auth middleware works. Then you want a second opinion on the database schema. Then you realize you should verify the design system tokens before writing the UI. Each detour loads dozens of files into the same conversation, and by the time you return to the original task, the context is bloated with noise that has nothing to do with the feature you are shipping.
This is the single-thread trap. One agent, one context window, every task crammed into the same session. It works for small changes. It collapses on anything that requires exploration, review, and implementation in the same sitting. The alternative is delegation: spin up isolated subagents that handle the ancillary work and return only the synthesized result.
What a subagent actually is
A subagent is a self-contained Claude instance with its own context window, its own tool permissions, and its own lifecycle. It reads the codebase, does its work, and reports back to the main conversation. The main agent never sees the raw file reads or intermediate tool calls — it gets a summary.
This matters because context windows are RAM, not disk. Every file your agent reads, every tool call it makes, every intermediate result it evaluates — all of that competes for the same finite space. Subagents move that overhead into separate memory, so your primary conversation stays focused on the work that requires your attention.
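To make the memory analogy concrete, here is a minimal sketch that models a context window as a fixed token budget. The `Context` class, the file names, and every token count are hypothetical illustrations, not measurements from a real session or part of any Claude API.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy model of a context window with a fixed token budget (illustrative only)."""
    budget: int
    used: int = 0
    log: list[str] = field(default_factory=list)

    def add(self, label: str, tokens: int) -> None:
        # Everything added competes for the same finite space.
        if self.used + tokens > self.budget:
            raise MemoryError(f"context full while adding {label!r}")
        self.used += tokens
        self.log.append(label)

    @property
    def remaining(self) -> int:
        return self.budget - self.used


def explore_inline(main: Context, file_sizes: dict[str, int]) -> None:
    """Single-thread approach: every raw file read lands in the main context."""
    for path, tokens in file_sizes.items():
        main.add(f"read {path}", tokens)


def explore_delegated(main: Context, file_sizes: dict[str, int],
                      summary_tokens: int) -> Context:
    """Delegated approach: raw reads fill a separate context; main gets a summary."""
    sub = Context(budget=main.budget)
    for path, tokens in file_sizes.items():
        sub.add(f"read {path}", tokens)
    main.add("subagent summary", summary_tokens)
    return sub


# Hypothetical files and token counts, chosen only to make the arithmetic visible.
files = {"auth/middleware.py": 4_000, "auth/session.py": 3_000, "db/schema.sql": 5_000}

inline = Context(budget=20_000)
explore_inline(inline, files)

delegated = Context(budget=20_000)
sub = explore_delegated(delegated, files, summary_tokens=500)

print(inline.remaining)     # 8000
print(delegated.remaining)  # 19500
```

In this toy model the same exploration costs the main conversation 12,000 tokens inline and 500 tokens when delegated. The total work is not smaller; it simply happens in a context that is discarded afterwards.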
Claude Code ships three built-in subagent types: general-purpose agents for broad tasks, explore agents optimized for codebase search, and plan agents for designing implementation strategies. Custom subagents can be defined in project or user-level configuration with specific system prompts, tool scopes, and model overrides.
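As a rough sketch of what a custom definition can look like, the helper below renders a Markdown file with YAML frontmatter. It assumes the layout Claude Code documents for subagent files (`name`, `description`, optional `tools` and `model` fields, then the system prompt as the body); verify the field names against the current documentation before relying on them. The helper function, the `schema-reviewer` agent, and its prompt are hypothetical.

```python
from typing import Optional, Sequence


def render_subagent_definition(
    name: str,
    description: str,
    system_prompt: str,
    tools: Optional[Sequence[str]] = None,
    model: Optional[str] = None,
) -> str:
    """Render a subagent definition as Markdown with YAML frontmatter.

    Assumed layout: frontmatter fields `name`, `description`, and optionally
    `tools` (comma-separated) and `model`, followed by the system prompt.
    Values are written verbatim, so keep them free of YAML special characters.
    """
    lines = ["---", f"name: {name}", f"description: {description}"]
    if tools:
        lines.append(f"tools: {', '.join(tools)}")  # tool scope
    if model:
        lines.append(f"model: {model}")             # model override
    lines.append("---")
    return "\n".join(lines) + "\n\n" + system_prompt.strip() + "\n"


# Hypothetical read-only reviewer; tool and model names are assumptions.
definition = render_subagent_definition(
    name="schema-reviewer",
    description="Reviews database schema changes without editing files",
    system_prompt="You review schema changes. Report risks; never modify files.",
    tools=["Read", "Grep", "Glob"],
    model="sonnet",
)
print(definition)
```

The rendered text would typically be saved under a project-level or user-level agents directory; the exact path depends on your setup and is left out here on purpose.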
Five signals that you should delegate
Not every task benefits from subagents. Spinning up a separate context has overhead — token cost and coordination latency. The payoff comes when the task would otherwise pollute your main session with noise. Here are the concrete signals.
- The task requires reading ten or more files to gather context. A research subagent synthesizes findings instead of dumping raw content into your conversation.
- You have three or more independent pieces of work. Parallel subagents complete them simultaneously, which can cut latency by 40 to 60 percent on read-heavy tasks such as exploration, triage, and test verification.
- You need an unbiased second opinion. A fresh subagent has no memory of your implementation journey, so it catches issues that familiarity obscures.
- The work has clear sequential stages with handoffs. A pipeline of focused subagents — research, then implement, then review — keeps each phase sharp instead of accumulating cross-stage noise.
- You want to verify before committing. A read-only review subagent with restricted tool access gives you confidence without the risk of accidental edits.
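The five signals above can be sketched as a simple checklist function. The `Task` fields and signal labels are hypothetical names introduced for illustration; the thresholds (ten files, three independent pieces) come straight from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a piece of work; all fields are illustrative."""
    files_to_read: int = 0
    independent_parts: int = 1
    needs_fresh_review: bool = False
    has_staged_handoffs: bool = False
    verify_before_commit: bool = False


def delegation_signals(task: Task) -> list[str]:
    """Return the delegation signals a task triggers, in the order listed above."""
    signals = []
    if task.files_to_read >= 10:
        signals.append("research")          # heavy context gathering
    if task.independent_parts >= 3:
        signals.append("parallel")          # independent pieces of work
    if task.needs_fresh_review:
        signals.append("second opinion")    # unbiased fresh context
    if task.has_staged_handoffs:
        signals.append("pipeline")          # research, implement, review
    if task.verify_before_commit:
        signals.append("read-only review")  # restricted tool access
    return signals


print(delegation_signals(Task(files_to_read=14, verify_before_commit=True)))
# ['research', 'read-only review']
print(delegation_signals(Task(files_to_read=2)))
# []
```

An empty result is the cue to stay in the main session: no signal fired, so the overhead of a separate context is unlikely to pay for itself.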
Practical delegation patterns
The most effective pattern is research before implementation. Delegate codebase exploration to a subagent first — how does the auth flow work, what utilities already exist, where are the relevant type definitions. The subagent returns a focused summary. Your implementation conversation starts informed instead of spending its first twenty messages on archaeology.
For modifications that touch multiple independent files, parallel subagents avoid the cross-context confusion of updating everything in one thread. Each subagent focuses on a single file or module. The constraint: never send two subagents to edit the same file. Neither one sees the other's changes, so concurrent edits overwrite each other or produce conflicting versions.
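The one-file-per-subagent constraint can be checked before any work is dispatched. The sketch below uses `asyncio` from the standard library; `run_subagent` is a hypothetical stand-in that only sleeps, since the real dispatch mechanism depends on your tooling, and the file names are made up.

```python
import asyncio
from collections import Counter


def find_overlaps(assignments: dict[str, set[str]]) -> set[str]:
    """Return files that appear in more than one subagent's assignment."""
    counts = Counter(path for files in assignments.values() for path in files)
    return {path for path, n in counts.items() if n > 1}


async def run_subagent(name: str, files: set[str]) -> str:
    """Hypothetical stand-in for a real subagent call; sleeps instead of working."""
    await asyncio.sleep(0.01)
    return f"{name}: updated {len(files)} file(s)"


async def run_parallel(assignments: dict[str, set[str]]) -> list[str]:
    """Run all subagents concurrently, refusing overlapping file assignments."""
    overlaps = find_overlaps(assignments)
    if overlaps:
        raise ValueError(f"files assigned to more than one subagent: {sorted(overlaps)}")
    # gather() returns results in the same order the awaitables were passed in.
    results = await asyncio.gather(
        *(run_subagent(name, files) for name, files in assignments.items())
    )
    return list(results)


# Hypothetical assignment with disjoint file sets.
plan = {
    "auth": {"auth/middleware.py", "auth/session.py"},
    "ui": {"ui/button.tsx"},
}
results = asyncio.run(run_parallel(plan))
print(results)  # ['auth: updated 2 file(s)', 'ui: updated 1 file(s)']
```

Failing fast on an overlap is a deliberate choice here: it is cheaper to re-partition the work up front than to untangle two competing edits afterwards.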
The most underused pattern is the independent review. After implementing a feature, spin up a fresh subagent with read-only access and ask it to check for security issues, edge cases, and error handling. It has no knowledge of your reasoning during implementation — which is exactly why it catches things you missed.
When to keep it in one thread
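Subagents are not a default. They are a tool for specific situations. If the work is sequential and each step depends closely on the details of the previous result, a single session handles the chain more cleanly than a pipeline of handoffs, because a summary passed between stages drops exactly the detail the next step needs. Staged pipelines pay off only when each stage can hand over a compact, self-contained result. If the edit is small (a quick fix, a rename, a config tweak), the coordination overhead outweighs the benefit.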
The practical rule: start conversational. Use natural language to delegate when the signals appear. As patterns become clear and repeatable, encode them into custom subagent definitions or skills that your entire team can invoke. The goal is not maximum parallelism — it is keeping your primary context clean so your attention stays on the decisions that matter.