Long-running agents do not fail only because they run out of tokens.
They also fail because their context gets messy.
Old errors, stale assumptions, duplicate tool output, irrelevant files, and half-resolved plans accumulate until the model can no longer see the actual task.
Context budget management is how you prevent that.
Start with a Budget
Give each category of context a token target:
- task summary
- retrieved evidence
- local code context
- current plan
- tool output
- verification results
Do not let any one category dominate the window.
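A minimal Python sketch of per-category budgets follows. The category names mirror the list above. Everything else is a hypothetical illustration: the share values, the `apply_budget` and `truncate_tokens` helpers, and the word count that stands in for a tokenizer. A real agent would count tokens with its model's own tokenizer.

```python
# Hypothetical helper: shares of the context window per category (sum to 1.0).
BUDGET_SHARES = {
    "task_summary": 0.10,
    "retrieved_evidence": 0.25,
    "local_code": 0.25,
    "current_plan": 0.10,
    "tool_output": 0.20,
    "verification": 0.10,
}


def count_tokens(text: str) -> int:
    # Rough stand-in: whitespace-separated words. Swap in your model's tokenizer.
    return len(text.split())


def truncate_tokens(text: str, limit: int) -> str:
    words = text.split()
    if len(words) <= limit:
        return text
    # Keep the head and mark the cut so the model knows content was dropped.
    kept = " ".join(words[:limit])
    marker = f"[...truncated {len(words) - limit} tokens]"
    return f"{kept} {marker}".strip()


def apply_budget(sections: dict[str, str], window: int) -> dict[str, str]:
    """Trim each category to its share of the window so none can dominate."""
    trimmed = {}
    for name, text in sections.items():
        # Categories without a share get a zero budget.
        limit = int(window * BUDGET_SHARES.get(name, 0.0))
        trimmed[name] = truncate_tokens(text, limit)
    return trimmed


sections = {
    "task_summary": "Fix the failing auth refresh test.",
    "tool_output": "log line " * 400,  # 800 words of noisy output
}
trimmed = apply_budget(sections, window=1000)
```

Capping each category separately keeps one noisy tool result from crowding out the plan or the evidence. Keeping the head of the text is just the simplest choice. For test logs or stack traces, the tail often carries the useful lines.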
Summarize Durable Facts
After each major step, compress what you learned into a short summary:
Verified facts:
- The failing test is auth-refresh.spec.ts.
- The route uses src/app/api/auth.
- The SDK now expects refreshToken().
Open risks:
- Middleware behavior still needs checking.
Carry the facts forward.
Drop the noise.
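The sketch below turns the compressed summary into a small object. It assumes the agent rebuilds its context from this summary after each step rather than replaying the full transcript. `DurableSummary` and `record_step` are hypothetical names, and the facts are the ones from the example above.

```python
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class DurableSummary:
    """Hypothetical helper: facts and risks carried between steps, no raw transcript."""
    verified_facts: list[str] = field(default_factory=list)
    open_risks: list[str] = field(default_factory=list)

    def record_step(
        self,
        new_facts: Sequence[str] = (),
        new_risks: Sequence[str] = (),
        resolved: Sequence[str] = (),
    ) -> None:
        # Add each fact and risk once, preserving the order they were learned.
        for fact in new_facts:
            if fact not in self.verified_facts:
                self.verified_facts.append(fact)
        for risk in new_risks:
            if risk not in self.open_risks:
                self.open_risks.append(risk)
        # Drop risks this step has resolved.
        self.open_risks = [r for r in self.open_risks if r not in resolved]

    def render(self) -> str:
        """Format the summary that replaces the raw history in the next prompt."""
        lines = ["Verified facts:"]
        lines += [f"- {fact}" for fact in self.verified_facts]
        lines.append("Open risks:")
        lines += [f"- {risk}" for risk in self.open_risks]
        return "\n".join(lines)


summary = DurableSummary()
summary.record_step(
    new_facts=[
        "The failing test is auth-refresh.spec.ts.",
        "The route uses src/app/api/auth.",
        "The SDK now expects refreshToken().",
    ],
    new_risks=["Middleware behavior still needs checking."],
)
context_header = summary.render()
```

Once the middleware check is done, a later `record_step(resolved=[...])` removes that risk, and repeated facts are not duplicated. The summary stays small no matter how many steps have run.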
Retrieve on Demand
Do not preload every possible document.
Expose retrieval tools and let the agent search when the task requires it.
This matters most for web context. A live search tool like Ninelayer can fetch current evidence at the moment the agent needs it, instead of filling the window up front.
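The sketch below shows the general shape of an on-demand retrieval tool. It assumes a generic search backend that returns scored snippets and does not use Ninelayer's actual API. `OnDemandRetriever`, `fake_search`, the `top_k` and `min_score` parameters, and the scores are all placeholders for illustration.

```python
from typing import Callable

# Hypothetical backend signature: query -> list of (relevance score, snippet).
SearchFn = Callable[[str], list[tuple[float, str]]]


class OnDemandRetriever:
    """Hypothetical tool: nothing enters the context until the agent calls it."""

    def __init__(self, search: SearchFn, top_k: int = 3, min_score: float = 0.5):
        self.search = search
        self.top_k = top_k
        self.min_score = min_score
        self.calls = 0

    def __call__(self, query: str) -> list[str]:
        self.calls += 1
        hits = self.search(query)
        # Filter weak matches, then keep only the best few snippets.
        relevant = [hit for hit in hits if hit[0] >= self.min_score]
        relevant.sort(key=lambda hit: hit[0], reverse=True)
        return [snippet for _, snippet in relevant[: self.top_k]]


def fake_search(query: str) -> list[tuple[float, str]]:
    # Placeholder for a real search API such as a live web search tool.
    return [(0.9, "snippet A"), (0.2, "weak match"), (0.7, "snippet B")]


# The agent sees the tool, not the documents; retrieval happens per request.
tools = {"search": OnDemandRetriever(fake_search, top_k=2)}
```

The score threshold and `top_k` cap do the filtering at the tool boundary. The agent only pays for the snippets that survive, and only for queries it actually makes.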
Add Retry Circuit Breakers
If the same error appears twice, stop regenerating.
The agent should:
- name the failed assumption
- retrieve or inspect new evidence
- patch only after the assumption changes
That saves tokens and prevents repeated wrong edits.
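A minimal circuit breaker along these lines follows. It assumes that errors can be matched by a normalized signature, with digits replaced so that a shifted line number does not hide a repeat. It also assumes the agent states its working assumption with each attempt, which covers the "name the failed assumption" step. Gathering new evidence happens outside this class. `RetryCircuitBreaker`, `record_error`, and `may_patch` are hypothetical names, and the error text is illustrative.

```python
import re
from collections import Counter


class RetryCircuitBreaker:
    """Hypothetical helper: trips when the same error signature repeats."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen = Counter()
        self.tripped_on = None  # error signature that tripped the breaker
        self.assumption = None  # assumption in force when it tripped

    @staticmethod
    def signature(error: str) -> str:
        # Replace digits (line numbers, timings) so reruns of one error match.
        return re.sub(r"\d+", "N", error.strip())

    def record_error(self, error: str, assumption: str) -> bool:
        """Return True if a retry is allowed, False if the agent must investigate."""
        sig = self.signature(error)
        self.seen[sig] += 1
        if self.seen[sig] >= self.max_repeats:
            self.tripped_on = sig
            self.assumption = assumption
            return False
        return True

    def may_patch(self, current_assumption: str) -> bool:
        # After a trip, patching is allowed only once the assumption has changed.
        if self.tripped_on is None:
            return True
        return current_assumption != self.assumption


# Illustrative errors and assumptions, not taken from a real run.
breaker = RetryCircuitBreaker()
first = breaker.record_error(
    "TypeError at line 12: refresh is not a function", "SDK exposes refresh()"
)
second = breaker.record_error(
    "TypeError at line 14: refresh is not a function", "SDK exposes refresh()"
)
```

Replacing every digit is deliberately crude, because it can merge errors that differ only by a numeric value. A real signature function might normalize only line numbers, paths, or timestamps.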
The Practical Takeaway
Long-running agents need context discipline.
Summarize facts.
Filter retrieval.
Cap tool output.
Stop retry loops early.
The goal is not a bigger window.
The goal is a cleaner one.
Sources
- Ninelayer blog: How to Reduce AI Agent Token Usage
- Claude Code docs: MCP output limits and warnings
