Artificial intelligence (AI) company Anthropic has announced a new cybersecurity initiative called Project Glasswing that will use a preview version of its new frontier model, Claude Mythos, to find and address security vulnerabilities.
The model will be used by a small set of organizations, including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, alongside Anthropic, to secure critical software.
The company said it is forming the initiative in response to capabilities observed in its general-purpose frontier model that demonstrate a "level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities." Because of these cybersecurity capabilities and concerns that they could be abused, Anthropic has opted not to make the model generally available.
Mythos Preview, Anthropic claimed, has already discovered hundreds of high-severity zero-day vulnerabilities across every major operating system and web browser. These include a now-patched 27-year-old bug in OpenBSD, a 16-year-old flaw in FFmpeg, and a memory-corrupting vulnerability in a memory-safe virtual machine monitor.
In one instance highlighted by the company, Mythos Preview is said to have autonomously devised a web browser exploit that chained together four vulnerabilities to escape the renderer and operating system sandboxes. Anthropic also noted in the preview's system card that the model solved a corporate network attack simulation that would have taken a human expert more than 10 hours.
In perhaps the most eyebrow-raising of the findings, Mythos Preview managed to follow instructions from a researcher running an evaluation to escape a secured "sandbox" computer it was provided with, indicating a "potentially dangerous capability" to bypass its own safeguards.
The model didn't stop there. It went on to perform a series of additional actions, including devising a multi-step exploit to gain broad internet access from the sandboxed system and sending an email message to the researcher, who was eating a sandwich in a park.
"In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to several hard-to-find, but technically public-facing, websites," Anthropic said.

The company described Project Glasswing as an "urgent attempt" to put frontier model capabilities to defensive use before those same capabilities are adopted by hostile actors. It is also committing up to $100 million in usage credits for Mythos Preview, as well as $4 million in direct donations to open-source security organizations.
"We didn't explicitly train Mythos Preview to have these capabilities," Anthropic said. "Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model significantly more effective at patching vulnerabilities also make it significantly more effective at exploiting them."
News of Mythos leaked last month after details about the model were inadvertently stored in a publicly accessible data cache due to human error. The draft material described it as the most powerful and capable AI model built to date. Days later, Anthropic suffered a second security lapse that unintentionally exposed nearly 2,000 source code files and over half a million lines of code associated with Claude Code for about three hours.
The leak also led to the discovery of a security issue that bypasses certain safeguards when the AI coding agent is presented with a command composed of more than 50 subcommands. The issue has since been formally addressed by Anthropic in Claude Code version 2.1.90, released last week.
"Claude Code, Anthropic's flagship AI coding agent that executes shell commands on developers' machines, silently ignores user-configured security deny rules when a command contains more than 50 subcommands," AI security company Adversa said. "A developer who configures 'never run rm' will see rm blocked when run alone, but the same 'rm' runs without restriction if preceded by 50 harmless statements. The security policy silently vanishes."
"Security evaluation costs tokens. Anthropic's engineers hit a performance problem: checking every subcommand froze the UI and burned compute. Their fix: stop checking after 50. They traded safety for speed. They traded safety for cost."
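The failure mode Adversa describes can be illustrated with a minimal sketch. This is not Anthropic's actual code; the function names and the `&&`-splitting logic are illustrative assumptions. It shows how a deny-rule checker that caps the number of subcommands it evaluates will silently let anything past the cap through:

```python
# Illustrative sketch (NOT Anthropic's implementation) of a capped
# deny-rule check: subcommands beyond the cap are never evaluated,
# so the deny policy silently stops applying to them.
import shlex

MAX_CHECKED_SUBCOMMANDS = 50  # hypothetical cap mirroring the reported limit

def is_blocked(command: str, deny_rules: list[str]) -> bool:
    """Return True if any *checked* subcommand starts with a denied program."""
    subcommands = [s.strip() for s in command.split("&&")]
    # Bug: the slice silently skips everything after the first 50 subcommands.
    for sub in subcommands[:MAX_CHECKED_SUBCOMMANDS]:
        program = shlex.split(sub)[0] if sub else ""
        if program in deny_rules:
            return True
    return False

deny = ["rm"]
print(is_blocked("rm -rf /tmp/scratch", deny))   # True: rm alone is blocked

# Pad with 50 harmless statements; rm becomes the 51st subcommand and
# is never checked, so the deny rule vanishes.
padded = " && ".join(["true"] * 50) + " && rm -rf /tmp/scratch"
print(is_blocked(padded, deny))                  # False: rm slips through
```

The fix in version 2.1.90 presumably removes or correctly handles the cap; the point of the sketch is that a performance-motivated early exit in a policy check fails open rather than failing closed.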



