What is prompt injection — and why "be more careful" doesn't fix it
Prompt injection is one of the most misunderstood threats to AI systems. A real incident explains it better than any definition.
What happened
An automation that turned notes into LinkedIn posts was compromised by prompt injection: a hidden instruction was written inside a file the system trusted as "data." The model read it as a command. It published text designed to hide information — and one post that fabricated an entire incident with a non-existent "client."
Why it happens
In an LLM, system instructions and user data live in the same context window. There's no architectural wall between them. Anything that enters the model as "text" can be interpreted as a command. That's why "I'll be careful" is not a defense.
The lessons I kept
- Intent is not control. "I'll review before publishing" is a rubber stamp if the architecture doesn't force you to read. A single "approve" tap is not human-in-the-loop.
- Least privilege. The text generator shouldn't have had access to private files at all. Less access, smaller attack surface.
- A gate that enforces reading before approval — not a button.
Prompt injection is part of a wider family: see when an AI failure is actually an attack and how indirect injection bypasses filters.
Do your AI tools read files they trust?
A Shielding Review examines where a hidden instruction could enter your systems and what access the model has — prioritized. Free 45-min session.
============================================================
Book a free session