Treat the web like it's lying to you

We spent twenty years drilling one rule into developers: never trust user input. Sanitize it, bound it, assume it is hostile. Then we built AI agents that read the open web and trust every word of it.

Here is the uncomfortable framing. The moment an agent fetches a page, summarizes an email, or reads a document it did not write, that content is untrusted input flowing straight into the part of the system that decides what to do next. A webpage can contain instructions. The model cannot reliably tell the difference between content it should reason about and commands it should obey, because to the model they are the same stream of tokens. That is prompt injection, and it is not a clever edge case. It is the default condition of any agent with a browser.

If that sounds familiar, it should. It is cross-site scripting (XSS) wearing a new hat. XSS happened because we mixed data and code in the same channel and let the data start behaving like code. Prompt injection into an agent is the same mistake at a higher layer. The payload is English instead of <script>, and the interpreter is a language model instead of a browser, but the shape is identical: untrusted data crosses into a context where it gets executed.

The defenses rhyme with the old ones too, and none of them is “ask the model nicely.”

Treat fetched content as data, never as instructions. Keep a hard boundary between the user’s actual intent and anything the agent pulled off the network.
Least privilege for tools. An agent that can read your inbox should not also be able to send mail and hit arbitrary URLs without a gate. The damage from injection is bounded by what the agent is allowed to do, so bound it.
Assume the model will be fooled and design for it. The interesting security control is not the prompt. It is the proxy, the policy, and the permission check sitting between the agent and the thing it can actually affect.
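
To make those three defenses concrete, here is a minimal sketch of a permission check that sits between the agent and its tools. Everything in it is an assumption for illustration: the tool names (read_inbox, fetch_url, send_mail), the ALLOWED_HOSTS set, the ToolCall type, and the tainted flag are hypothetical and not taken from any real agent framework. The point is only that the decision is made by ordinary code, outside the model, so it still holds after the model has been fooled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical tool names and hosts, chosen only for this sketch.
READ_ONLY_TOOLS = {"read_inbox"}
GATED_TOOLS = {"send_mail"}           # side effects: needs a human in the loop
ALLOWED_HOSTS = {"docs.example.com"}  # outbound fetches limited to an allowlist


@dataclass
class ToolCall:
    name: str
    args: dict
    tainted: bool  # True once the session has read content from the network


def decide(call: ToolCall, user_approved: bool = False) -> str:
    """Return "allow", "ask_user", or "deny" for a tool call the model proposed."""
    if call.name == "fetch_url":
        # Arbitrary URLs are an exfiltration channel, so check the host.
        host = urlparse(call.args.get("url", "")).hostname
        return "allow" if host in ALLOWED_HOSTS else "deny"
    if call.name in READ_ONLY_TOOLS:
        return "allow"
    if call.name in GATED_TOOLS:
        # After reading untrusted content, side effects need explicit approval.
        if call.tainted and not user_approved:
            return "ask_user"
        return "allow"
    # Default deny: a tool that is not listed is not permitted.
    return "deny"


if __name__ == "__main__":
    injected = ToolCall("send_mail", {"to": "attacker@example.net"}, tainted=True)
    print(decide(injected))
```

Note what the check never does: it never asks the model whether the request looks suspicious. It looks only at which tool is being called, with which arguments, and whether untrusted content has entered the session.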

Most of my own work lately lives in that gap, building the boundary that assumes the model has already been talked into something stupid and limits the blast radius anyway. Because it will be. The web lies. It always has. We just handed it a more gullible reader.