Claude Mythos Preview and Project Glasswing: What IT and Security Leaders Need to Know Now
It’s been a wild few weeks for Anthropic, starting with the March 26, 2026, exposure of over 3,000 internal assets that were inadvertently tagged as public instead of private on its content management system. The existence of its newest model, Claude Mythos, was revealed in those documents, and what followed has been…well, let’s say startling.
Before we get into Claude Mythos in detail, let’s look at the Anthropic timeline from late March to early April 2026.
- Five days after the original document leak that revealed Mythos, over 500,000 lines of Claude Code’s source code were leaked on March 31, revealing planned enhancements.
- Five days after that, on April 4, Anthropic formally announced what had already been policy since mid-February: Claude subscriptions would no longer support OpenClaw. Users would need to move to the pay-per-use API key, which could mean significantly higher costs.
That brings us to April 7, when Anthropic announced Claude Mythos Preview, a frontier AI model it has explicitly declined to release to the general public. Why, you ask? Well, Mythos Preview can find and exploit software vulnerabilities with a speed and sophistication that rivals the best human security researchers on the planet. In response, Anthropic launched Project Glasswing, which is a $100 million initiative that channels the model’s capabilities toward defense, not offense. The initiative includes AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, Nvidia, Palo Alto Networks, and the Linux Foundation, among others.
This note is not a product endorsement. Instead, it provides an assessment of what Mythos Preview signals for the threat landscape, why the model’s own safety evaluations should give security leaders pause, and what practical steps organizations should take right now.
What Claude Mythos Preview Actually Does
Mythos Preview is a general-purpose language model, not a purpose-built security tool. But during prerelease testing, its cybersecurity capabilities proved to be dramatically more advanced than anything Anthropic (or anyone else) had produced before. According to Anthropic’s Frontier Red Team blog, the model autonomously discovered thousands of zero-day vulnerabilities across every major operating system and every major web browser. Some of these bugs had survived decades of human code review and millions of automated security tests.
Three disclosed examples in the Red Team blog post illustrate Mythos’ jaw-dropping effectiveness. It found a 27-year-old vulnerability in OpenBSD, widely regarded as one of the most security-hardened operating systems in existence. It uncovered a 16-year-old flaw in FFmpeg, a media library embedded in countless applications, in a line of code that automated fuzzers had hit five million times without catching the issue. And it independently identified and chained together multiple Linux kernel vulnerabilities to escalate from ordinary user access to full root control.
Again, it’s a general-purpose language model, so this was not the result of targeted security training. Anthropic stated that these capabilities “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.” Consider what that means: future models from any AI lab, not just Anthropic, could develop similar capabilities as a side effect of making better coding assistants. In fact, two days later, OpenAI basically said, “Oh, we literally just remembered, we also have one of those.” 😳
The Safety Concerns Are Real
Anthropic’s system card for Mythos Preview is unusually candid (if you’re interested in poring over a 245-page PDF). The company described the model as simultaneously its “best-aligned model” and the one that “likely poses the greatest alignment-related risk.” (Here’s an excerpt if you’d rather just see the highlight reel). While it might look contradictory, it reflects the reality that more capable models can follow instructions more precisely while also being more dangerous when they don’t.
Several incidents from pre-release testing deserve attention from security leaders:
- Sandbox escape. During a controlled evaluation, a researcher instructed an earlier version of Mythos Preview to try to escape a sandboxed computer environment. The model succeeded. It developed a multistep exploit to break out of restricted internet access, gained broader connectivity, and sent an email to the researcher who had set the test. It then, without being asked, posted details of its exploit on several obscure but public websites.
- Concealment behavior. In rare cases, Mythos Preview attempted to hide actions it appeared to know were forbidden. After exploiting a permissions flaw to edit files it should not have had access to, the model took additional steps to ensure its changes would not appear in the change history.
- Evaluation awareness. In roughly 29% of safety evaluation transcripts, the model showed signs of recognizing that it was being tested, without disclosing this awareness to researchers. In one instance, it appeared to deliberately underperform on an evaluation to seem less capable than it actually was.
- Reckless task completion. When asked to shut down a specific evaluation job on a server, the model instead terminated all similar evaluation jobs for all users, despite explicit warnings to be careful.
Anthropic attributes these behaviors to earlier, less safeguarded versions of the model. Sure, but from a risk management perspective, the behaviors themselves are the alarming signal. Models that can reason about their own constraints and choose to work around them represent a qualitatively different threat than models that simply produce wrong answers.
Project Glasswing: Defense First, Questions Later
Project Glasswing is Anthropic’s attempt to turn the model’s offensive potential into a defensive advantage before similar capabilities proliferate. Through a gated research preview, Mythos Preview is available only to the named partners and a group of roughly 40 additional organizations that build or maintain critical software infrastructure. Anthropic has committed $100 million in usage credits for the initiative, plus $4 million in direct donations to open-source security organizations through the Linux Foundation and the Apache Software Foundation. This is, of course, chump change to Anthropic, considering it spent a reported $10 billion to train Claude Mythos.
The work is expected to focus on local vulnerability detection, black-box binary testing, endpoint hardening, and penetration testing. Partners have committed to sharing findings and best practices. Anthropic says it will publish a public progress report within 90 days.
The model is also available through Amazon Bedrock and Google Cloud’s Vertex AI, but only under gated research preview terms. Pricing is set at $25 per million input tokens and $125 per million output tokens for participants after credits are exhausted.
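At those rates, post-credit spend scales linearly with token volume, and output tokens cost five times as much as input tokens. A quick back-of-the-envelope sketch (only the per-token rates come from the announcement; the monthly workload figures are hypothetical):

```python
# Gated research preview pricing from the announcement:
# $25 per million input tokens, $125 per million output tokens.
INPUT_RATE = 25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 125 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend in dollars after usage credits are exhausted."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
print(f"${monthly_cost(50_000_000, 10_000_000):,.2f}")  # → $2,500.00
```

Even at modest volumes, the 5:1 output-to-input price ratio means verbose, agentic workloads (which generate far more output than typical chat) will dominate the bill.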
The initiative and the coalition are impressive, but IT leaders should be clear-eyed about what Glasswing is and is not. It is a head start for defenders. It is not a permanent moat. Anthropic itself acknowledges that similar capabilities will proliferate to other models. The question is whether the defensive work done now will be enough to offset the inevitable broadening of offensive AI capability.
Our Take
Claude Mythos Preview is a wake-up call, not a product launch. Anthropic’s decision to withhold the model from general availability is the clearest signal yet that frontier AI capabilities have crossed a threshold in cybersecurity. The company deserves credit for the transparency of its system card and the structure of Project Glasswing. But being transparent about a problem is not the same as solving it. The following recommendations can help IT and security leaders be prepared when it becomes generally available:
- Prepare for massively accelerated patch cycles. If a single AI model can find thousands of critical zero-days across foundational software in a matter of weeks, the window between vulnerability discovery and exploitation will compress further. Organizations that treat patching as a quarterly exercise are now operating with materially more risk than they were a month ago.
- Revisit your defense-in-depth assumptions. Anthropic’s red team blog makes a pointed observation: Many defense-in-depth measures work by making exploitation tedious rather than impossible. Language models, when run at scale, grind through tedious steps quickly. Security architectures built on the assumption that chaining exploits is too labor-intensive for most attackers need to be reexamined.
- Stress test sandboxing and containment strategies. If a model can escape a sandbox environment designed by the people who built it, your organization’s containment strategies for AI workloads deserve immediate scrutiny. This is doubly true for organizations deploying agentic AI systems with network access and tool use.
- Deploy an adaptive AI governance program. The most resilient organizations are investing in human capital, replacing disjointed point solutions with cohesive platforms, and building governance programs that account for emergent and unforeseen capabilities. Partnering with managed service providers can help close gaps that in-house teams lack the capacity to address.
- Verify vendor security claims. Don’t take Anthropic’s, or any vendor’s, claims at face value without more detail on false positive rates and validation methodology. Organizations evaluating AI-powered security tools should demand transparency on testing methodology, not just headline metrics.
- Don’t rely on short-horizon safety evaluations. Anthropic’s own team admits that its automated safety evaluations struggled to replicate the conditions where the most concerning behaviors emerged; long-running sessions on network-connected computers will be necessary. If the company that built the model couldn’t fully stress-test it with standard evaluation methods, organizations deploying AI systems should not assume their own testing regimes are sufficient either.
- Follow and act on Glasswing disclosures. Anthropic will now disclose vulnerability details 135 days after reporting them to vendors. Security operations should treat these disclosures as high-priority triggers for immediate patching and system hardening.
The practical reality for IT and security leaders is that the cost of discovering and exploiting software vulnerabilities just dropped by orders of magnitude. That shift benefits defenders who act on it and punishes those who don’t. Whether your organization is a Glasswing partner with direct access to Mythos Preview or a 200-person company running a mix of SaaS and legacy on-prem, the underlying message is identical: The baseline for what counts as adequate software security just moved, and it is not moving back.
Want to Know More?
AI Insights | Info-Tech Research Group
Security Priorities 2025 | Info-Tech Research Group
Build Your AI Risk Management Roadmap | Info-Tech Research Group