Can We Trust AI To Write Vulnerability Checks? Here’s What We Found

September 29, 2025

Vulnerability management is always a race. Attackers move quickly, scans take time, and if your scanner can’t keep up, you’re left exposed.

That’s why Intruder’s security team kicked off a research project: could AI help us build new vulnerability checks faster, without dropping our high standards for quality?

After all, speed is only useful if the detections are solid. A check that fires false positives (or worse, misses real issues) doesn’t help anyone.

In this post, we’ll share how we’ve been experimenting with AI, what’s working well, and where it falls short.

One-shot vs. Agentic Approach

We started simple: drop prompts into an LLM chatbot and see if it could write Nuclei templates. The results were messy. Outputs referenced features that didn’t exist, spat out invalid syntax, and used weak matchers and extractors. This was consistent across ChatGPT, Claude, and Gemini.

So we tried an agentic approach. Unlike a chatbot, an agent can use tools, search reference material, and follow rules. We went in with healthy skepticism (recent “vibe coding” disasters didn’t inspire confidence), but the improvement was immediate.

We used Cursor’s agent, and very quickly saw that, even with minimal prompts, the output from initial runs was far more promising.

From there, we layered on rules and indexed a curated repo of Nuclei templates. This gave the agent solid examples to learn from, cut down inconsistencies, and nudged it towards using the right functionality. The quality of the templates jumped noticeably, and they were far closer to what we’d expect from our engineers.

But it wasn’t set-and-forget. Left alone, the agent still needed course corrections. With clear prompting, though, it could generate checks that looked like they’d been written manually.

That’s when our goal shifted: not full automation, but a productivity tool that helps us ship quality checks faster without lowering the bar.

Our Current Workflow

The process we’ve settled on (for now) uses a standard set of prompts and rules. The engineer provides key inputs, such as:

With these in place, the agent builds the template. It’s not fully “vibe-coded,” but it’s much faster and frees our engineers to spend more time on deeper research.

Successes

Attack Surface Checks

Agentic AI has been especially useful for creating checks where no public templates exist. One sweet spot: detecting admin panels exposed to the internet. These checks are simple in principle, but writing them at scale is time-consuming. With automation, we can produce far more of them, much faster.

We’re often surprised at how many products aren’t covered by the major scanners we use under the hood. This process helps us fill those gaps and gives customers a fuller view of their attack surface. If your VM scanner isn’t flagging exposed panels, and your estate is large, chances are you won’t know they’re there.

Unsecured Elasticsearch

We created an unsecured Elasticsearch check as a quick win for the agentic workflow. A public Nuclei detection template existed, but it didn’t cover the worst case: instances left wide open where anyone can read data. That’s the case we wanted to reliably detect.

What we fed the agent:

The task in 2-3 short sentences, e.g. detect Elasticsearch instances, make a request to X endpoint and then a follow-up request to Y endpoint to see if data is really exposed.
A list of testing targets hosting Elasticsearch servers
An example target that was vulnerable to the method we wanted to test
An example target that was not vulnerable

The agent then iterated through our process using the custom rules that we set.

The final result was a Nuclei template that lists data sources and follows promising endpoints to confirm whether unauthenticated users can read data. It is a multi-request template with working matchers and extractors, suitable for automated scanning.
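To make the two-step logic concrete, here is a minimal Python sketch of the same idea: list data sources, then follow each one to see whether data actually comes back without authentication. It is not Intruder's template. The article leaves the two endpoints unnamed, so the paths in the usage example, the `index` key, the `hits.hits` response shape, and the rule of skipping names that start with a dot are all assumptions made for illustration. The `fetch` callable is a hypothetical stand-in for an unauthenticated HTTP GET, so the sketch runs without a network.

```python
import json
from typing import Callable, Optional, Tuple

# A fetcher takes a request path and returns (status_code, body_text).
# Hypothetical stand-in for an unauthenticated HTTP GET against one target.
Fetcher = Callable[[str], Tuple[int, str]]


def _parse_json(body: str):
    """Return the decoded JSON value, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


def find_readable_source(fetch: Fetcher, list_path: str,
                         read_path_template: str) -> Optional[str]:
    """Return the first data source whose contents are readable, else None."""
    # Step 1: list data sources. Anything other than a 200 with a JSON list
    # means we cannot confirm exposure, so report nothing.
    status, body = fetch(list_path)
    sources = _parse_json(body) if status == 200 else None
    if not isinstance(sources, list):
        return None

    # Step 2: follow each promising source and check that documents come back.
    for entry in sources:
        name = entry.get("index") if isinstance(entry, dict) else None
        if not name or name.startswith("."):  # assumption: skip internal names
            continue
        status, body = fetch(read_path_template.format(index=name))
        result = _parse_json(body) if status == 200 else None
        hits = result.get("hits") if isinstance(result, dict) else None
        docs = hits.get("hits") if isinstance(hits, dict) else None
        if docs:
            return name
    return None


# Usage with fake targets, mirroring the "vulnerable" and "not vulnerable"
# examples the engineer supplies. Paths below are illustrative assumptions.
def fake_open_target(path: str) -> Tuple[int, str]:
    if path == "/_cat/indices?format=json":
        return 200, json.dumps([{"index": ".internal"}, {"index": "customers"}])
    if path == "/customers/_search":
        return 200, json.dumps({"hits": {"hits": [{"_id": "1"}]}})
    return 404, ""


def fake_locked_target(path: str) -> Tuple[int, str]:
    return 401, ""


open_result = find_readable_source(
    fake_open_target, "/_cat/indices?format=json", "/{index}/_search")
locked_result = find_readable_source(
    fake_locked_target, "/_cat/indices?format=json", "/{index}/_search")
```

Requiring a successful read in step 2, rather than stopping at detection, is what separates "Elasticsearch is present" from "anyone can read the data".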

There was still manual input and judgement from our security engineering team, but the agent handled the repetitive heavy lifting.

Challenges

Our exploration so far has not been without its roadblocks and rethinks.

Limits of Current Outputs

Even with rules in place, the agent sometimes strays. One example: it built a check for an exposed admin panel but didn’t include strong enough matchers, which risked false positives. A quick extra prompt fixed it (we added a favicon matcher unique to that product), but it’s a reminder that the agent still needs guardrails. Until it can reliably choose the strongest matchers and validate them, human oversight remains essential.
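The difference between a weak and a strong matcher can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nuclei's matcher engine: the product name `ExamplePanel`, the `Response` type, and the use of SHA-256 as the favicon fingerprint are all hypothetical choices made so the example runs with the standard library alone.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical container for what a scanner saw on one target."""
    status: int
    body: str
    favicon: bytes


def weak_match(resp: Response) -> bool:
    # Fires on almost any login page, so it is prone to false positives.
    return resp.status == 200 and "login" in resp.body.lower()


def strong_match(resp: Response, title_marker: str, favicon_sha256: str) -> bool:
    # Requires a product-specific title AND a favicon fingerprint to agree.
    # SHA-256 is a stand-in here; use whichever hash your scanner supports.
    fingerprint = hashlib.sha256(resp.favicon).hexdigest()
    return (resp.status == 200
            and title_marker in resp.body
            and fingerprint == favicon_sha256)


# Two fake targets: the real product panel and an unrelated login page.
product_icon = b"example-panel-icon-bytes"
known_fingerprint = hashlib.sha256(product_icon).hexdigest()

panel = Response(200, "<title>ExamplePanel Admin</title> Login", product_icon)
other = Response(200, "<title>Some Intranet</title> Login", b"other-icon-bytes")

weak_results = (weak_match(panel), weak_match(other))
strong_results = (
    strong_match(panel, "ExamplePanel Admin", known_fingerprint),
    strong_match(other, "ExamplePanel Admin", known_fingerprint),
)
```

The weak matcher flags both pages, while the combined matcher flags only the product it was written for.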

Truncated Curl Output

Cursor often pipes ‘curl’ responses through ‘head’ to save tokens. Unfortunately, this can miss unique identifiers that would make ideal matchers. It’s an efficiency feature, but it works against us and we haven’t fully solved it yet.
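A small Python simulation shows why truncation hurts. The page content and the `ExamplePanel` marker are made up for illustration, and `head` here only mimics keeping the first few lines of a response. The `lines_containing` helper is one possible token-cheap alternative, offered as an assumption rather than anything the article says Intruder has adopted.

```python
from typing import List


def head(text: str, max_lines: int = 10) -> str:
    """Keep only the first max_lines lines, like piping through head."""
    return "\n".join(text.splitlines()[:max_lines])


def lines_containing(text: str, needles: List[str]) -> List[str]:
    """Return only the lines that mention any needle (case-insensitive)."""
    lowered = [n.lower() for n in needles]
    return [line for line in text.splitlines()
            if any(n in line.lower() for n in lowered)]


# A fake response where the distinctive marker sits below the cut-off.
filler = [f"<!-- filler line {i} -->" for i in range(30)]
page = "\n".join(["<html>", "<head>", *filler,
                  "<title>ExamplePanel Admin</title>", "</head>", "</html>"])

marker = "ExamplePanel"
truncated = head(page)

marker_in_full_page = marker in page
marker_in_truncated = marker in truncated
title_lines = lines_containing(page, ["<title"])
```

Filtering for likely identifier lines keeps the output short while still surfacing the title that truncation drops.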

Forgetting the Basics

Sometimes Cursor overlooks Nuclei’s own flags, like -l for running against a host list, and instead scripts a manual loop. We’re working on new rules to remind it of key Nuclei features and cut out that inefficiency.
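The contrast can be shown by building the two sets of command lines side by side. This sketch only constructs argument lists and does not run Nuclei. The file names `hosts.txt` and `check.yaml` and the host names are hypothetical, and `-u` and `-t` are used on the understanding that they are Nuclei's single-target and template flags.

```python
from typing import List


def per_host_commands(hosts: List[str], template: str) -> List[List[str]]:
    """The manual-loop pattern: one nuclei invocation per host."""
    return [["nuclei", "-u", host, "-t", template] for host in hosts]


def host_list_command(list_file: str, template: str) -> List[str]:
    """The built-in pattern: one invocation reading targets from a file."""
    return ["nuclei", "-l", list_file, "-t", template]


hosts = ["a.example.com", "b.example.com", "c.example.com"]

loop_commands = per_host_commands(hosts, "check.yaml")
single_command = host_list_command("hosts.txt", "check.yaml")
```

With a host list, the number of invocations stays at one however many targets there are, and Nuclei handles the iteration itself.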

What’s Next?

AI is being pitched everywhere as a silver bullet to replace complex tasks outright. From our perspective, much of that is marketing hype. We’re still a long way from handing over security engineering to an AI agent without close supervision.

That’s not to say it’s impossible, but for now we’re wary of anyone claiming full automation. We’ll keep pushing AI in vulnerability management, both as a productivity tool and, where possible, towards safe automation.

But the bottom line today is clear: to ship high-quality custom checks that don’t miss vulns or generate false positives, expert engineers remain essential.

Author bio: Benjamin Marr, Security Engineer at Intruder

Ben is a Security Engineer at Intruder, where he automates offensive security scanning and carries out security research. His background is as an OSWE certified penetration tester and PHP software engineer.

Sponsored and written by Intruder.
