Mythos Proves Potent in Vulnerability Discovery, Less Convincing Elsewhere

May 14, 2026

Mythos appears to be as powerful as claimed at detecting software vulnerabilities, but its capabilities in other areas are more nuanced.

Anthropic’s Mythos AI model has been making waves since its announcement in early April, primarily because of its reputed ability to unearth significantly more vulnerabilities than any other AI model. XBOW, an autonomous offensive security firm, has aimed its own AI testing armory at Mythos Preview to examine the validity of this and other claimed Mythos capabilities.

Anthropic’s main claim is confirmed. “Mythos Preview presents a major step up over all existing models, regardless of provider,” reports XBOW.

As Gary McGraw commented 20 years ago, operational defects occur in the interplay between source code bugs and architectural design flaws. “You can’t find design defects by looking at code – a higher-level understanding is required,” he said. XBOW tested Mythos both with access to the code alone and with the code running in a live environment. It found that the model excels at finding issues when testing ‘live + source’, but does not do as well against the source code alone.

This doesn’t detract from the power of Mythos in probing source code, but XBOW points out that while any AI model can find something interesting, the ‘something’ won’t be the same as ‘everything’.

Other XBOW tests explored Mythos’ capability in terms of judgment, reverse engineering, analysis of native apps, and visual acuity.

In judgment, it rejected false positives better than its predecessors, “but sometimes lost true positives when evidence didn’t formally satisfy its criteria.” Mythos requires precise prompts for best results.

The model shows substantial strength in both native code vulnerability discovery and reverse engineering.

In the reverse engineering tests, XBOW concluded Mythos is “capable of triaging both its own results and competitor-model findings,” and the model could reason through unusual firmware and embedded systems contexts.

XBOW’s visual acuity tests examine the model’s ability to interact with live websites through a browser interface; that is, the ability to identify the right UI element and click in the right place. “It was not perfectly pixel-accurate when asked for exact coordinates, but it was practically effective at choosing the right browser actions,” writes XBOW.

There is, however, one statistic that can easily be missed by users overawed by the power of Mythos. “Mythos Preview is not just any new model: it’s a real titan. But titans are big, and big means expensive.”

At the time of writing, specific costs aren’t available, although Anthropic has said it will be 5x as expensive as an Opus model. This made XBOW question whether it would be possible to give a cheaper model more time and get more accuracy at less cost.

The conclusion was yes. “If we normalize by estimated running cost, the picture is fairly clear: Mythos Preview isn’t terribly inefficient, at least if you need high accuracy, but it’s not best-in-class on our benchmarks either.” For finding web vulnerabilities with a fixed token budget, Mythos outperforms Opus 4.6 but is outperformed by GPT5.5.
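XBOW’s exact normalization method is not described here. As a rough illustration only, one simple way to compare models on a per-cost basis is to divide each model’s benchmark score by its relative running cost. The names `ModelRun`, `score_per_cost`, and `rank_by_efficiency` are hypothetical, and every figure below is a placeholder rather than an XBOW result; only the 5x price ratio echoes the figure reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    name: str
    score: float          # benchmark score, e.g. fraction of issues found (placeholder)
    relative_cost: float  # running cost relative to a baseline model (baseline = 1.0)


def score_per_cost(run: ModelRun) -> float:
    """Return the benchmark score divided by the relative running cost."""
    if run.relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return run.score / run.relative_cost


def rank_by_efficiency(runs: list[ModelRun]) -> list[ModelRun]:
    """Sort runs from most to least cost-efficient."""
    return sorted(runs, key=score_per_cost, reverse=True)


# Placeholder inputs: NOT measured data, chosen only to show the arithmetic.
runs = [
    ModelRun("baseline-model", score=0.40, relative_cost=1.0),
    ModelRun("larger-model", score=0.80, relative_cost=5.0),
]

for run in rank_by_efficiency(runs):
    print(f"{run.name}: {score_per_cost(run):.2f} score per unit cost")
```

In this made-up case the larger model scores twice as high but costs five times as much, so it ranks lower per unit of cost. That is the shape of the trade-off XBOW describes, not a reproduction of its numbers.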

None of these findings detract from the original fundamental claim. Mythos is better at finding vulnerabilities in code than other models. Overall, however, the primary takeaways from XBOW’s testing are:

Mythos is extremely powerful for source code audits.
It’s good, but less powerful, at validating exploits.
Its judgment is mixed. It can be too literal and conservative, and it also tends to overstate the practical relevance of its findings.
It’s strong in native-code vulnerability discovery and reverse engineering.

“Mythos Preview is strong at finding candidate vulnerabilities, particularly from source code, and shows impressive ability across web, native-code, and reverse-engineering tasks,” concludes XBOW.
