Claude Mythos Finds Over 10,000 Critical Bugs in Weeks – Glasswing Project Shocks Security World
Anthropic's Claude Mythos preview model, deployed in the Glasswing project, uncovered more than 10,000 high‑severity vulnerabilities across core software in just weeks, validated by independent researchers, while highlighting the massive gap between rapid AI‑driven bug discovery and the slower human patching process.
Finding Vulnerabilities Becomes Easy
Anthropic, together with dozens of technology partners, launched Project Glasswing to harden the world’s most critical software before powerful AI models can be misused. The unreleased Claude Mythos preview model was used to generate the first performance report.
Open‑Source Software Health Report
The model scanned over 1 000 open‑source projects that underpin the internet and enterprise infrastructure. In total it identified 23 019 vulnerabilities, of which 6 202 were estimated to be high‑severity or critical. Six independent security firms manually evaluated 1 752 of the high‑severity candidates, confirming 1 587 true positives (90.6 %) and classifying 1 094 as high‑severity (62.4 %). Extrapolating this verification rate suggests that even if scanning stopped now, nearly 3 900 additional high‑severity bugs could still be confirmed.
Partner case studies illustrate the impact:
Cloudflare’s security team discovered 2 000 bugs in critical‑path systems, including 400 high‑severity issues, and reported that the AI’s false‑positive rate was lower than that of human testers.
Mozilla, testing Firefox 150, found and fixed 271 bugs—more than ten times the number uncovered with Claude Opus 4.6 on Firefox 148.
The UK AI Security Institute called Mythos the first model to solve every problem in their complex multi‑step network‑range testbed.
Independent benchmarks ExploitBench and ExploitGym showed Mythos achieving dominant performance across exploit‑generation tasks.
Real‑world impact includes a bank that averted a $1.5 million fraudulent wire transfer thanks to Mythos‑assisted detection, and the discovery of a critical flaw in the wolfSSL library (CVE‑2026‑5194) that allowed certificate forgery.
Defender Strategies
Despite the rapid discovery of bugs, patch development lags severely. On average it takes two weeks to remediate a high‑severity vulnerability found by Mythos, and only 75 of the 530 disclosed high‑severity bugs have been fully patched, with 65 public advisories issued. Open‑source maintainers—mostly volunteers—are overwhelmed by a flood of AI‑generated low‑quality reports and the traditional 90‑day coordinated‑disclosure timeline, creating a dangerous time gap that attackers can exploit.
Recommendations emphasize shortening patch cycles, adopting NIST and UK NCSC critical controls (default‑secure configurations, mandatory MFA, comprehensive logging), and improving update mechanisms so end users can quickly apply fixes.
Looking ahead, the Glasswing project demonstrates that AI can make vulnerability discovery orders of magnitude faster, but without robust safety guardrails the disparity between finding and fixing widens. Anthropic is releasing toolkits, shared Skills, and a Threat Model Builder to qualified customers, and expects future AI models from major vendors to match Mythos‑level capabilities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
SuanNi
A community for AI developers that aggregates large-model development services, models, and compute power.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
