June 2025

The Agent Boundary

Where agents help, where they fail, and when to replace them with software.

A manufacturing team asked me to look at an agent that was supposed to triage deviation reports. It read incoming emails, pulled context from three systems, drafted responses, and sometimes opened tickets. In a demo it felt like magic. In week two it created two tickets for the same lot because it could not see the first one had already been filed under a different subject line.

That is the agent boundary in one story. Agents are good at stitching together messy inputs when a human is watching. They are bad at owning state, enforcing rules, and behaving the same way every time.

Where agents help

Exploration. You have ten data sources and no spec yet. An agent can poke around, summarize what it finds, and propose a workflow. That saves days of manual reading.

Drafting inside a bounded task. Summarize this PDF. Classify this email into one of five categories. Generate a first pass at a SQL query from a plain English question. The output is a draft, not a record.
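
One way to keep a classification task bounded is to accept the model's answer only if it lands in a closed set. The sketch below assumes five category names that I made up for illustration; `parse_category` is a hypothetical helper, not part of any library.

```python
# Hypothetical category names; substitute the five your workflow actually uses.
CATEGORIES = ("deviation", "complaint", "inquiry", "change_request", "other")


def parse_category(model_output: str) -> str:
    """Normalize a model's free-text label and reject anything outside the closed set."""
    label = model_output.strip().lower().replace(" ", "_")
    if label not in CATEGORIES:
        # Fail loudly so a human reviews the draft instead of trusting a stray label.
        raise ValueError(f"unexpected category: {model_output!r}")
    return label
```

Anything that raises here goes back to a person. The draft stays a draft.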

Glue during development. Wire up APIs quickly, prototype integrations, generate boilerplate. Throw it away or harden it once you know the shape.

Where agents fail

Long-running ownership. Anything that must remember state across days, enforce idempotency, or recover from partial failure needs a database and code, not a conversation.
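
A minimal sketch of what "a database and code" can mean for the duplicate-ticket problem, using Python's built-in sqlite3. It assumes one ticket per lot, which may not match your process; the table and function names are hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a ticket store keyed on lot ID (assumption: one ticket per lot)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickets ("
        " lot_id TEXT PRIMARY KEY,"
        " subject TEXT NOT NULL)"
    )
    return conn


def file_ticket(conn: sqlite3.Connection, lot_id: str, subject: str) -> bool:
    """Return True if a ticket was created, False if the lot already had one."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO tickets (lot_id, subject) VALUES (?, ?)",
            (lot_id, subject),
        )
    # rowcount is 0 when the primary key already exists and the insert is ignored.
    return cur.rowcount == 1
```

The subject line no longer matters. The second attempt for the same lot returns False no matter how the email was worded, because the database owns the state.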

Regulated decisions. If an inspector asks why a batch was released, you need a row in a table with a user ID and timestamp, not a chat log.
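
As a rough illustration of "a row in a table with a user ID and timestamp", here is a sketch with sqlite3. The schema and function names are assumptions for the example, and a real regulated system would need more than this (access control, immutability, retention).

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the decision table if needed (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_decisions ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " batch_id TEXT NOT NULL,"
        " decision TEXT NOT NULL,"
        " reason TEXT NOT NULL,"
        " user_id TEXT NOT NULL,"
        " decided_at TEXT NOT NULL)"
    )
    return conn


def record_decision(conn, batch_id, decision, reason, user_id):
    """Append one decision row stamped with the current UTC time."""
    decided_at = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "INSERT INTO batch_decisions"
            " (batch_id, decision, reason, user_id, decided_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (batch_id, decision, reason, user_id, decided_at),
        )
    return decided_at


def why(conn, batch_id):
    """Answer the inspector: every decision for a batch, oldest first."""
    return conn.execute(
        "SELECT decision, reason, user_id, decided_at"
        " FROM batch_decisions WHERE batch_id = ? ORDER BY id",
        (batch_id,),
    ).fetchall()
```

The point is that `why` is a query, not a search through transcripts.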

High-volume repetition. Running the same multi-step reasoning loop four hundred times a day burns money and adds variance. Software does the same step the same way every time.

When to replace an agent with software

Replace when the demo ships to production. Replace when the same workflow runs daily. Replace when two teams depend on the output. Replace when audit asks for evidence.

The replacement does not throw away the agent work. It extracts the useful parts: which APIs to call, which fields matter, which prompts classify well. Those become functions with tests. The agent becomes a design tool, not a runtime.
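
To make "functions with tests" concrete, here is a sketch of one extracted piece: pulling the lot ID out of an email subject. The `LOT-12345` format is an assumption for illustration, as are the function and pattern names; the real pattern comes from whatever the agent learned about your data.

```python
import re
from typing import Optional

# Assumed format: "LOT" followed by an optional dash or space and 4+ digits.
LOT_PATTERN = re.compile(r"\bLOT[-\s]?(\d{4,})\b", re.IGNORECASE)


def extract_lot_id(text: str) -> Optional[str]:
    """Return the lot ID in canonical LOT-<digits> form, or None if absent."""
    match = LOT_PATTERN.search(text)
    return f"LOT-{match.group(1)}" if match else None


def test_extract_lot_id() -> None:
    # Same lot, different subject lines: both must resolve to one key.
    assert extract_lot_id("Deviation report for LOT-20451") == "LOT-20451"
    assert extract_lot_id("RE: FW: lot 20451 follow-up") == "LOT-20451"
    # No lot mentioned: return None rather than guess.
    assert extract_lot_id("Weekly status update") is None


if __name__ == "__main__":
    test_extract_lot_id()
```

The tests encode the cases the agent stumbled on, so the same mistake cannot come back silently.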

A simple test

Ask: if this runs at 2am with no human watching, what happens when it is wrong? If the answer is unclear, you are past the agent boundary. Build software.