AI security audit: why running a scanner yourself isn't enough
It's a fair objection, and one that anyone working in AI security hears more and more often: "the tools are free, I ran one myself, why pay someone?" The honest answer, about what a real AI security audit actually gives you, is more interesting than "because I'm the expert," and it's worth understanding before you trust a chatbot or an AI agent with your customers.
What automated scanners do, and do well
Automated language-model scanners fire a ready-made list of standard attacks at your system and record what held and what broke. They're cheap, fast and genuinely useful; if you can run one, do. They give you a first, automated picture: a score along the lines of "held up in 60 out of 100 tests".
The problem isn't that the scanner does a bad job. It's that "I ran the scanner" doesn't mean "I'm secure." For three reasons.
1. Running it isn't the same as reading it
A scanner's output is only useful if you interpret it correctly — and correct interpretation is a skill. Many tests have misleading names. A test called leakreplay, for instance, sounds like it checks whether your system prompt leaks; in reality it checks something entirely different, training-data leakage. Different risk, different category, different fix. Read the report wrong and you'll spend time and money fixing the wrong problem, with the illusion that you're covered.
2. The scanner doesn't know your business
An automated check can tell you that your bot can be manipulated. What it can't know is that "the bot offered a 90% discount" breaks one of your rules, because it simply doesn't know your business's rules. That's the business-logic layer, and it's purely human. A score of "60 out of 100" means nothing on its own; it only acquires meaning when someone who understands the specific system asks: what exactly broke, and does it matter for your work? That judgment is the essence of an audit.
3. Standard attacks have a ceiling
Even run and read flawlessly, a scanner only fires ready-made attacks from a catalogue. It doesn't adapt to your system. Targeted, adaptive testing (where each next attack is shaped by how the model answered the previous one) is an entirely different level of assurance. Put simply: "I ran a scanner" is not the same as "someone actually tried to break my specific system".
Where the EU AI Act comes in
This isn't only a quality issue — it's a compliance one. For systems that interact with people, the EU AI Act requires human oversight, transparency and meaningful risk management. An automated score from a scanner proves none of that. Documenting what was tested, what it means and what was decided is human work — and it's exactly what an auditor asks for.
What a real audit adds
In a sentence: an automated scanner tells you whether you pass the standard checks. A real audit tells you whether you'd hold up against someone targeting you — and whether the numbers you see mean what you think they mean. The value isn't running the tool; it's the interpretation, the business logic and the targeted testing that the tool, by design, doesn't cover.
The same pattern (a tool that produces a result without interpretation) shows up elsewhere too; see how chatbot hallucinations can expose you legally, and what prompt injection means in practice.
Where to start
If you want to start on your own, there are free tools on our site: a check for whether the EU AI Act applies to you, a file checker for hidden instructions (prompt injection), an AI usage policy generator for your staff, an AI vendor contract check, and more. They cover the obvious gaps yourself.
When you reach the point where you need someone to interpret the results and stress-test your own business logic — that's where the audit comes in.
Ran a scanner and not sure what the result means?
The AI Security Review takes the findings, interprets them in the context of your business, and tests the gaps automated tools miss. It starts with a free 45-minute session.
Book a free session →