The Four Questions

A framework for understanding any AI agent system

[Image: a figure at the center of a grand stone rotunda, glowing threads of light connecting each arched corridor — multiple paths radiating outward]

Every AI agent system, whether it’s a coding assistant, a customer support bot, a research agent, or an autonomous pipeline, reduces to the same four questions. The answers predict whether the system will work. And more importantly, they predict how it will fail.

I’ve spent months building and operating agent systems. The ones that succeed aren’t the ones with the fanciest models or the most elaborate prompts. They’re the ones where someone sat down and actually answered these four questions before writing a line of code.


The Framework

Strip away the hype. Strip away the model names and the framework du jour. What remains is architecture. And agent architecture, no matter how complex the system, decomposes into four zones.

[Diagram — Agent Framework: Triggers, Context, Tools, Outputs. 1. Triggers: message, cron, event. 2. Context: identity, memory, state. 3. Tools: read, write, search. 4. Outputs: response, memory, action.]

1. What triggers the agent?

Something has to wake it up. A user sends a message. A cron job fires. A webhook arrives. Another agent completes its work and passes the baton. An event in a database triggers a listener.

The trigger question sounds simple, but it’s where most design failures start. Because the trigger determines the agent’s operating context: what it knows at wake-up time, what urgency it carries, what the human expects.

Consider the difference:

A user-triggered coding assistant has a human on the other end waiting. Latency matters, conversational context is available, and if the task is ambiguous the human can just say so. The feedback loop is tight by design.

A scheduled research agent has none of that. It wakes up alone, no one watching, no one to ask. Latency is irrelevant. But if the task specification is vague, the agent has to handle that ambiguity on its own or fail gracefully. Nobody’s there to clarify.

An event-triggered pipeline agent is different again. Something happened (a deployment completed, a test failed, a file changed) and that event itself carries context. The trigger doesn’t just wake the agent up. It tells the agent what just changed and what might need attention.

Each trigger type produces a different agent design, and the differences aren’t cosmetic. If you don’t think about the trigger carefully, you’ll build a conversational agent for a scheduled task, or an autonomous agent for an interactive context. Both will feel wrong to the user. You won’t understand why until you trace it back to the trigger mismatch.
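The trigger-to-design mapping above can be sketched in code. This is a minimal illustration, not a real framework API — the trigger names and config fields are assumptions made for the example:

```python
# A sketch of trigger-aware agent setup: the trigger alone determines the
# agent's operating posture. All names here are illustrative.
from dataclasses import dataclass
from enum import Enum, auto

class Trigger(Enum):
    USER_MESSAGE = auto()   # a human is waiting; latency matters
    SCHEDULE = auto()       # cron-style; no one to ask for clarification
    EVENT = auto()          # webhook/listener; the payload carries context

@dataclass
class AgentConfig:
    interactive: bool          # can the agent ask a human to clarify?
    latency_sensitive: bool    # is someone waiting on the answer?
    needs_event_payload: bool  # does wake-up context come from the event?

def config_for(trigger: Trigger) -> AgentConfig:
    """Derive the operating posture from the trigger alone."""
    if trigger is Trigger.USER_MESSAGE:
        return AgentConfig(interactive=True, latency_sensitive=True,
                           needs_event_payload=False)
    if trigger is Trigger.SCHEDULE:
        # No human in the loop: ambiguity must be resolved internally
        # or the agent must fail gracefully.
        return AgentConfig(interactive=False, latency_sensitive=False,
                           needs_event_payload=False)
    return AgentConfig(interactive=False, latency_sensitive=True,
                       needs_event_payload=True)
```

Building a conversational agent for a scheduled task is, in these terms, shipping the wrong `AgentConfig` for the trigger.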

2. What context does it get?

When the agent wakes up, what does it know? Toy demos hand-curate this. Production systems have to assemble it programmatically, under time pressure, from multiple sources. That gap is where most things break.

A customer support bot that gets nothing but the user’s latest message will hallucinate answers. Give it the full conversation history, the customer’s account data, and the knowledge base, and it actually works. Same model. Same prompt structure. Radically different context.

Context has dimensions that matter:

Breadth. How much of the relevant world does the agent see? A coding assistant that can only see the current file versus one that sees the full repository will produce different quality output.

Freshness. Is the context current? An agent making decisions based on yesterday’s data in a system that changes hourly is making decisions blind.

Relevance. Not everything the agent could see is useful. Dumping an entire codebase into the context window doesn’t help; it drowns signal in noise. The art is selective injection: give the agent what it needs for this specific turn, not everything that might be relevant someday.

Persistence. Does the agent remember previous interactions? A stateless agent treats every conversation as the first. A persistent one builds understanding over time. Neither is inherently better — but the choice has massive implications for how the system feels to use.

If you’re building an agent system and things feel brittle, check the context assembly first. That’s where most systems fail quietly. Not in the model, not in the prompt, but in the gap between what the agent needed to know and what it actually got.
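The four dimensions can be made concrete in a context-assembly sketch. The source structure, the lexical-overlap scoring, and the caps are all assumptions for illustration — a real system would use embeddings and per-source policies:

```python
# Hypothetical context assembly showing breadth, freshness, and relevance
# as explicit filters. Persistence would live in the sources themselves.
import time

MAX_ITEMS = 5      # breadth cap: selective injection, not a full dump
MAX_AGE_S = 3600   # freshness cap: drop facts older than an hour

def relevance(query: str, text: str) -> int:
    """Toy lexical-overlap score; stands in for real retrieval scoring."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def assemble_context(query, sources, now=None):
    """Pull from each source, filter stale and irrelevant facts, cap breadth."""
    now = now if now is not None else time.time()
    items = []
    for name, fetch in sources.items():
        for fact in fetch(query):
            if now - fact["ts"] > MAX_AGE_S:
                continue                                   # freshness
            score = relevance(query, fact["text"])
            if score > 0:
                items.append((score, name, fact["text"]))  # relevance
    items.sort(reverse=True)
    return [text for _, _, text in items[:MAX_ITEMS]]      # breadth
```

The useful property is that each dimension is a visible, tunable line of code — when the system feels brittle, there is a specific filter to inspect.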

3. What tools can it call?

Strip out the tools and what you have is a text generator. Fine for brainstorming. Not enough for anything that has to interact with the real world.

Tools are the agent’s exoskeleton. They determine what the agent can do, not just what it can say. And the tool set is a trust boundary; every tool you give an agent is a capability you’re granting, and every tool you withhold is a guardrail.

This is where design gets interesting:

A coding assistant might get: file read, file write, terminal execution, web search. That’s enough to build software. It’s also enough to delete your repository if the prompt goes sideways.

A customer support agent might get: knowledge base search, ticket creation, escalation to human. Notably absent: the ability to issue refunds, modify accounts, or access internal systems. Not because it couldn’t — because it shouldn’t.

A research agent might get: web search, document reading, citation generation. No file writes. No code execution. Its output is information, not action.

The tool set isn’t just a capability list; it’s a narrowing. Each agent profile is a restricted view of the full capability space. A well-designed system makes the restrictions intentional and explicit. What can this agent do? Equally important: what can it not do?

The mistake I see most often: giving every agent every tool. It’s the path of least resistance, and it’s how you end up with an agent that sends emails when it was supposed to write code, or modifies production data when it was supposed to generate a report. Capability without constraint isn’t power. It’s exposure.
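The narrowing described above amounts to a deny-by-default allowlist per agent profile. A minimal sketch, with invented tool and profile names matching the examples in the text:

```python
# Each profile is an explicit, restricted view of the full capability space.
# Anything not listed is refused — the withheld tools are the guardrails.
FULL_TOOLSET = {"file_read", "file_write", "terminal", "web_search",
                "kb_search", "create_ticket", "escalate", "read_doc", "cite"}

PROFILES = {
    "coding_assistant": {"file_read", "file_write", "terminal", "web_search"},
    "support_agent":    {"kb_search", "create_ticket", "escalate"},
    "research_agent":   {"web_search", "read_doc", "cite"},  # no writes, no exec
}

def check_call(profile: str, tool: str) -> bool:
    """Deny by default: a tool absent from the profile's allowlist is refused."""
    allowed = PROFILES.get(profile, set())
    assert allowed <= FULL_TOOLSET, "profile references an unknown tool"
    return tool in allowed
```

The inverse of this design — every profile mapped to `FULL_TOOLSET` — is the "every agent gets every tool" failure mode: no line of code ever says no.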

[Image: ancient keys of various sizes hanging from hooks on a candlelit stone wall — each key different, each for a different lock]

4. What does it output?

The agent did its thing. Now what? Where does the output go, what form does it take, and who — or what — consumes it?

Possible outputs:

A response to a human. The simplest case. The agent generates text, the human reads it. But even here, design matters: is the response streamed or delivered complete? Does it include citations? Can the human provide feedback that feeds back into the system?

A persistent artifact. The agent writes a file, creates a database record, generates a report. Now the output has a lifecycle. It exists after the agent dies. It can be wrong in ways that aren’t immediately visible. It needs versioning, review, or validation.

A trigger for another agent. Agent A completes and its output becomes Agent B’s context. Now you have a pipeline. The output format becomes a contract between agents. Get the contract wrong and the whole chain breaks — but the failure might not surface until three agents downstream.

A state change. The agent modifies something in the world: deploys code, updates a configuration, sends a notification. This is the highest-stakes output because it’s the hardest to undo. An agent that sends the wrong Slack message is embarrassing. An agent that deploys the wrong code is a production incident.

What happens if the output is wrong? How expensive is the mistake? How quickly will someone notice? Those questions aren’t afterthoughts. They should drive the entire system design.
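One way to let those questions drive the design is to encode reversibility and review requirements per output kind. The policy table and thresholds below are assumptions for the sketch, not a prescription:

```python
# Encode "how expensive is a wrong output?" directly in the routing logic.
# Output kinds mirror the four cases above; policies are illustrative.
REVIEW_POLICY = {
    "response":     {"reversible": True,  "needs_review": False},
    "artifact":     {"reversible": True,  "needs_review": True},  # versioned, reviewable
    "handoff":      {"reversible": True,  "needs_review": True},  # contract with next agent
    "state_change": {"reversible": False, "needs_review": True},  # hardest to undo
}

def gate(kind: str, confidence: float) -> str:
    """Route an output: ship it, queue it for human review, or block it."""
    policy = REVIEW_POLICY[kind]
    if not policy["reversible"] and confidence < 0.9:
        return "block"    # irreversible and uncertain: don't act
    if policy["needs_review"]:
        return "review"
    return "ship"
```

The point is structural: an uncertain state change gets blocked by design, while an uncertain chat response merely ships a mediocre answer.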


How the Answers Interact

Here’s the thing: the four questions aren’t independent.

Trigger determines context requirements. A user-triggered agent needs conversational context. A scheduled agent needs time-based state. An event-triggered agent needs the event payload.

Context constrains tool usefulness. An agent with broad context but no tools can inform but not act. An agent with powerful tools but narrow context will act blindly.

Tools shape possible outputs. An agent that can only read and search produces information. An agent that can write and execute produces artifacts and state changes.

Outputs create new triggers. The output of one agent becomes the trigger or context for the next. This is how simple agents compose into complex systems.
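The composition rule — one agent's output is the next agent's context — can be shown with agents as plain functions. The planner/executor pair and their contract are invented for the sketch:

```python
# A minimal pipeline: the output format is a contract between agents,
# and a broken contract fails loudly instead of three agents downstream.
def planner(context):
    return {"plan": ["fetch data", "summarize"], "source": "planner"}

def executor(context):
    # Contract: executor requires a "plan" key produced upstream.
    assert "plan" in context, "contract broken: executor needs a plan"
    return {"result": f"ran {len(context['plan'])} steps", "source": "executor"}

def run_pipeline(agents, initial_context):
    ctx = initial_context
    for agent in agents:
        ctx = agent(ctx)   # output of one agent becomes context for the next
    return ctx
```

Swap the agent order and the assertion fires immediately — which is exactly the property you want from an inter-agent contract.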

When an agent system isn’t working, trace the four questions. The problem is almost always a mismatch. The agent is triggered in a way that doesn’t match its context. It has context it can’t act on because it lacks the right tools. Its output goes somewhere nobody checks and errors compound silently. The specific failure mode varies; the diagnostic is always the same.


Applying the Framework

Next time you’re evaluating an agent system — whether building one, buying one, or trying to understand why one isn’t working — run it through the four questions.

What triggers it? If the answer is vague (“it just runs”), the system is underspecified. Every agent needs a clear activation condition.

What context does it get? If the answer is “everything” or “I’m not sure,” the system is either overloaded or opaque. Good context is specific, fresh, and relevant.

What tools can it call? If the answer is “all of them,” there are no guardrails. If the answer is “none,” it’s just a chatbot. The interesting design space is in the intentional middle.

What does it output? If the answer doesn’t include what happens when the output is wrong, the system isn’t ready for production.
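The checklist above can even be mechanized as a pre-build review. The spec fields and the specific red-flag strings are illustrative, mirroring the vague answers called out in the text:

```python
# The four questions as a structured spec plus a readiness check.
from dataclasses import dataclass

@dataclass
class AgentSpec:
    trigger: str            # "it just runs" fails the check
    context_sources: list   # "everything" fails the check
    tools: list             # every tool = no guardrails
    failure_plan: str       # what happens when the output is wrong?

def is_production_ready(spec: AgentSpec, full_toolset: set) -> list:
    """Return the list of unanswered questions (empty means ready)."""
    problems = []
    if spec.trigger.strip().lower() in {"", "it just runs"}:
        problems.append("trigger is underspecified")
    if not spec.context_sources or spec.context_sources == ["everything"]:
        problems.append("context is overloaded or opaque")
    if set(spec.tools) >= full_toolset:
        problems.append("no guardrails: agent has every tool")
    if not spec.failure_plan:
        problems.append("no plan for wrong output")
    return problems
```

Running a proposed agent through this before writing any prompt is the cheap version of the exercise the article recommends.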

What the Pattern Books Miss

The emerging industry consensus on agent architecture has arrived at a taxonomy. Pattern catalogs now name and describe twenty-plus distinct agent behaviors — tool use, reflection, planning, multi-agent orchestration — and assign them maturity levels from basic reasoning engines up through collaborative autonomous systems. Framework documentation for LangChain, ADK, CrewAI, and others maps these patterns to concrete implementation choices. It’s useful work.

Pattern catalogs answer a different question than the one that matters most, though. They answer: what can this agent do? The Four Questions answer: should this agent exist, and what happens when it fails?

There’s also a composability problem the catalogs sidestep. Each pattern is described in isolation: here’s what a reflection loop looks like, here’s how to implement a planning agent. Real systems don’t use one pattern. They combine them. A production pipeline might chain a planning agent into a tool-using executor into a reflection loop into a handoff to human review. When that system breaks, which pattern failed? You can’t debug at the pattern level. You have to trace the whole thing.

Trigger, context, tools, output compose cleanly across any number of chained agents. Take a single question — say, what context does it get? — and trace it through every agent in a pipeline. You’ll find the point where context degrades or gaps appear. The pattern taxonomy tells you what each agent is doing. The Four Questions tell you whether the whole chain is coherent.

These four questions won’t tell you which model to use or how to write your prompts. They’ll tell you whether your agent system has a coherent architecture or whether you’re building on a foundation that will crack under load.

I spent weeks on model selection and prompt tuning before realizing the architecture was already decided, by accident, and that’s where the actual problems were.