4 Context Engineering Strategies Every AI Engineer Needs to Know
The thing nobody explains about building AI agents.
Hey friend,
A few months ago, I was building an AI agent to help engineers debug production issues. The idea was simple: pull logs from multiple sources, find patterns, and explain what went wrong.
"Search the logs and tell me why this alert fired."
The agent would come back with something like:
"At 14:32 UTC, the checkout service started returning 503 errors. The root cause was the Redis cache hitting memory limits. The issue self-resolved at 14:47."
Incredible, right?
Except it didn't work.
The log data was massive and noisy. Within a few conversational turns, I'd maxed out the context window. The agent couldn't keep all those log outputs in memory. It would start strong, then eventually fail or hallucinate.
The solution wasnât to switch models or add more data. It was to rethink the context management strategy.
What Is Context Engineering?
When you talk to an AI model, it sees more than just your prompts. Your instructions, the conversation so far, tool call results, documents: all of it sits in the context window together.
Andrej Karpathy has a useful mental model for this: the LLM is the CPU, and the context window is the RAM. It's the model's working memory. Everything has to fit there.
But it's not just about overflow. Even before you hit the limit, models suffer from "context rot": performance degrades as more tokens are added, even well inside the window.
Think about finding one important note on a desk. Easy with 10 papers. Hard with 1,000. The note is still there, but good luck finding it.
Drew Breunig outlined four ways bad context breaks your agent:
Context Poisoning: A hallucination enters context and corrupts all future reasoning
Context Distraction: Too much context overwhelms the model
Context Confusion: Irrelevant information influences responses
Context Clash: Different parts of the context contradict each other
If your agent works at first but drifts later, one of these is usually why.
Why This Matters For Agents
Here's the thing that makes this click: LLMs are stateless.
They don't "remember" anything between calls. Every time you call the model, you pass in the entire conversation history again.
→ User asks a question (20 tokens)
→ Assistant decides to call a tool (50 tokens)
→ Tool returns results (2,000 tokens)
→ Assistant reasons about the results (100 tokens)
→ ...repeat 50 times...
Eventually, you're passing hundreds of thousands of tokens just to generate the next sentence.
This is the context engineering problem.
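To make that concrete, here's a minimal sketch of the loop, assuming a generic chat-style API; call_model is a hypothetical stand-in for whatever client you actually use:

```python
# Sketch: the model is stateless, so the caller re-sends the whole
# history on every turn. call_model is a hypothetical placeholder.

def call_model(messages: list[dict]) -> str:
    """Stand-in for a chat-completions API call."""
    raise NotImplementedError  # replace with a real API call

history = [{"role": "system", "content": "You are a debugging assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = call_model(history)  # the ENTIRE history goes over the wire, every time
    history.append({"role": "assistant", "content": reply})
    return reply

# After 50 turns of tool calls and 2,000-token outputs, `history` is huge,
# yet every new call pays for all of it again just to produce one more message.
```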
The Four Strategies
So how do you actually manage context? There are four main strategies. We'll use Claude Code (a terminal AI agent) as a reference because it uses all of these.
1. Write (External Memory)
Donât keep everything in context. Have your agent write important stuff somewhere external.
Claude Code writes its plans to disk. It also uses a TodoWrite tool to persist task state. When debugging a complex issue across 15 files, instead of holding "fixed auth.ts, need to check db.ts, then run tests" in context, it writes each step to a structured todo list. The todos live outside the window; the agent references them when needed, not constantly.
Cursor and Windsurf use rules files. ChatGPT saves memories across sessions. Same idea: give your agent a write_to_scratch tool that writes findings and plans to a file. Those notes don't cost attention until the agent pulls them back in.
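Here's a minimal sketch of what such a tool could look like; write_to_scratch and read_scratch are illustrative names, not any particular framework's API:

```python
import json
from pathlib import Path

SCRATCH_FILE = Path("scratchpad.json")

def write_to_scratch(key: str, note: str) -> str:
    """Persist a finding or plan step outside the context window."""
    notes = json.loads(SCRATCH_FILE.read_text()) if SCRATCH_FILE.exists() else {}
    notes[key] = note
    SCRATCH_FILE.write_text(json.dumps(notes, indent=2))
    return f"Saved note '{key}'."  # only this short confirmation flows back into context

def read_scratch(key: str) -> str:
    """Pull a note back into context only when the agent explicitly asks for it."""
    notes = json.loads(SCRATCH_FILE.read_text()) if SCRATCH_FILE.exists() else {}
    return notes.get(key, f"No note named '{key}'.")
```

Expose both as tools: the agent pays a handful of tokens for the confirmation string instead of carrying every finding around in its window.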
2. Select (Just-in-Time Retrieval)
Some people dump all docs and tools into context upfront. Don't do this.
Claude Code never reads an entire codebase upfront. It uses Glob to find file paths matching a pattern (e.g., **/*.ts), Grep to locate specific code references, then Read to pull in only the relevant file. A question like "where is authentication handled?" triggers a targeted search, not a 50-file dump into context.
Keep references instead (file paths, database queries) and load the content only when the agent needs it. And if you have 50 tools, the model parses 50 descriptions every turn; keep your toolset minimal or load definitions dynamically based on the task.
Claude Skills is a recent feature built on this approach: it reads a short description of each skill, not the entire definition.
Here's the algorithm (sketched in code below):
Give the agent a compressed summary of each tool ("Use this tool if the user asks about LinkedIn posts")
Load the full tool definition dynamically only when the agent decides it needs that tool
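A minimal sketch of that two-step flow; the tool names and the load_tool_definition helper are made up for illustration:

```python
# Compressed summaries go into every request; full definitions are
# loaded just-in-time once the agent picks a tool. All names are hypothetical.

TOOL_SUMMARIES = {
    "linkedin_search": "Use this tool if the user asks about LinkedIn posts.",
    "log_search": "Use this tool to query production logs for errors.",
}

FULL_DEFINITIONS = {  # in practice these might live on disk or in a registry
    "linkedin_search": {
        "name": "linkedin_search",
        "description": "Search LinkedIn posts by keyword and date range.",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
    "log_search": {
        "name": "log_search",
        "description": "Query production logs by service, time window, and pattern.",
        "parameters": {"type": "object", "properties": {"pattern": {"type": "string"}}},
    },
}

def tools_for_prompt() -> str:
    """Step 1: a few tokens per tool, included in every request."""
    return "\n".join(f"- {name}: {desc}" for name, desc in TOOL_SUMMARIES.items())

def load_tool_definition(name: str) -> dict:
    """Step 2: the full schema, pulled in only when the agent selects the tool."""
    return FULL_DEFINITIONS[name]
```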
3. Compress and Prune
Even with a 200k token window, a messy context leads to bad answers.
Summarization: If you've used Claude Code, you've seen this. When the window fills, it summarizes the conversation, preserving architectural decisions but dropping the exploration that led there.
Context editing (pruning): Sometimes you don't need a summary. You just need to delete. Anthropic found that simply removing stale tool outputs reduced token usage by 84% on long-running tasks.
Did the agent run an ls -la command 10 turns ago? Delete the output; the model already used that info. Did a tool return 5,000 lines of logs? Summarize it to "Found 847 errors, 92% were Redis timeouts: org.redis.client.RedisTimeoutException: Redis server response timeout (3000 ms) occurred for command: (GET)", then delete the raw data.
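A minimal sketch of that pruning pass, assuming the conversation is a list of role-tagged message dicts and that anything older than the last few tool results is fair game:

```python
def prune_stale_tool_outputs(messages: list[dict], keep_last: int = 3) -> list[dict]:
    """Replace old tool outputs with a short stub; keep the most recent ones verbatim."""
    tool_turns = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_turns[:-keep_last]) if keep_last else set(tool_turns)
    return [
        {"role": "tool", "content": "[output pruned: already acted on]"} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Run something like this before each model call; summarization can then handle whatever pruning alone doesn't shrink.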
4. Isolate (Multi-Agent Systems)
This is my favourite technique for complex tasks. Instead of one agent drowning in context, split the work.
Claude Code spawns specialized agents by type: Explore for codebase navigation, Plan for architecture decisions, claude-code-guide for documentation lookup. Each operates in its own context window. If the user asks "how does billing work?" and "what's in the docs about webhooks?", two agents run in parallel, each with fresh context, returning focused summaries to the main conversation.
When delegating to a sub-agent, the prompt is compressed: "Find all API endpoints that modify user data" rather than the full conversation history. The sub-agent explores freely, then returns a summary. The orchestrator never sees the 30 files the sub-agent read, just the 500-token answer.
Uses more total tokens. Gets better results.
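A minimal sketch of the handoff; run_agent is a hypothetical stand-in for whichever agent loop or framework you use:

```python
def run_agent(system_prompt: str, task: str) -> str:
    """Stand-in: run an agent loop in its own fresh context and return its final answer."""
    return f"(summary of: {task})"  # replace with a real agent call

def delegate(task: str) -> str:
    # The sub-agent gets a compressed task description, never the full conversation.
    return run_agent(
        system_prompt="You are a codebase explorer. Return a concise summary only.",
        task=task,  # e.g. "Find all API endpoints that modify user data"
    )

# Two independent questions -> two sub-agents, each with a fresh window.
# Only their short answers land back in the orchestrator's context.
answers = [delegate(t) for t in (
    "How does billing work in this codebase?",
    "What do the docs say about webhooks?",
)]
```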
TL;DR
Managing context is essential when building long-running AI agents: the window fills up fast.
The counterintuitive thing about context windows is that bigger doesn't always mean better. A 200k window full of noise performs worse than a 20k window with exactly what matters. Context engineering isn't about cramming more in. It's about curating what the model sees.
How to solve this:
Write: Save state to external files.
Select: Load data only when needed.
Compress: Summarise history and delete stale tool outputs.
Isolate: Use sub-agents to encapsulate high-token tasks.
Back to my logs agent: the fix was combining a few of these strategies together. What felt like a model limitation was actually a context engineering problem.
Thanks for reading.
Have an awesome week : )
P.S. If you want to go deeper on building AI systems, I run a community where we build agents hands-on: https://skool.com/aiengineer


