On June 9, artificial intelligence company Anthropic made its most capable model, Claude Fable 5, generally available, simultaneously introducing a dual-product strategy. This innovative approach splits the powerful AI not by its core capabilities, but by layers of safety classifiers, creating a public-facing version and a restricted version for cybersecurity professionals.
The public-facing model is Claude Fable 5. Its counterpart, Claude Mythos 5, uses the same underlying model with the cybersecurity safeguards removed. Mythos 5 remains accessible only to a vetted group of cyber defenders and critical infrastructure operators, and Anthropic claims it is the world’s strongest cybersecurity model.
The practical distinction lies in how each model handles specific types of requests. Fable 5 redirects requests flagged for cybersecurity, biology, chemistry, and distillation to the less powerful Claude Opus 4.8. In contrast, Mythos 5 retains its full cybersecurity capabilities for its approved users. Both models are priced at $10 per million input tokens and $50 per million output tokens, representing a significant price reduction from the earlier Mythos Preview. Fable 5 is now accessible via the Claude API, and will be included in Pro, Max, Team, and Enterprise plans at no additional cost through June 22, after which it will transition to usage credits.
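To make the pricing concrete, the following is a minimal sketch of the arithmetic at the quoted rates. The function name and the example workload are illustrative assumptions, and the sketch does not model how requests that fall back to Opus 4.8 are billed, which the announcement as summarized here does not specify.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 10.0
OUTPUT_PRICE_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request at the quoted per-token rates."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
print(request_cost_usd(200_000, 20_000))  # prints 3.0
```

Because output tokens cost five times as much as input tokens, the 20,000 output tokens in this example account for a third of the bill despite being a tenth of the input volume.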
How Fable 5’s Cybersecurity Classifiers Operate
The decision to split the model stems from concerns that Mythos-class models possess the ability to effectively identify and exploit software vulnerabilities. Anthropic believes that releasing this capability to the general public without stringent controls could significantly empower malicious actors.
The mechanism relies on a set of classifiers, separate AI systems designed to detect misuse and attempts to bypass safety protocols. When a user’s request triggers a classifier, Fable 5 does not refuse outright. Instead, the request is rerouted to Opus 4.8, and the user is informed of the handoff. Distillation, the process of extracting a model’s capabilities to train a competing model, is also blocked to prevent sensitive, near-frontier abilities from leaking without accompanying safeguards.
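The reroute-instead-of-refuse pattern can be sketched in a few lines. This is an illustration of the control flow only, not Anthropic’s implementation: the model labels are placeholder strings rather than real API identifiers, and the keyword-based `toy_classifier` is a hypothetical stand-in for classifiers that are themselves AI systems.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Categories the article says trigger a fallback.
FLAGGED_CATEGORIES = {"cybersecurity", "biology", "chemistry", "distillation"}


@dataclass
class RoutedResponse:
    model: str              # placeholder label for the model that answers
    notice: Optional[str]   # handoff message shown to the user, if any


def route(request: str,
          classify: Callable[[str], Optional[str]]) -> RoutedResponse:
    """Send a request to the primary model unless the classifier flags it."""
    category = classify(request)
    if category in FLAGGED_CATEGORIES:
        # Flagged requests are handed off rather than refused.
        return RoutedResponse(
            model="opus-4.8-fallback",
            notice=f"Flagged as {category}; answered by the fallback model.",
        )
    return RoutedResponse(model="fable-5-primary", notice=None)


def toy_classifier(request: str) -> Optional[str]:
    """Keyword stand-in for a real classifier, for illustration only."""
    return "cybersecurity" if "exploit" in request.lower() else None


print(route("Write an exploit for this bug", toy_classifier).model)
print(route("Summarize this quarterly report", toy_classifier).model)
```

Passing the classifier in as a function keeps the routing decision separate from the detection logic, which mirrors the article’s description of classifiers as systems distinct from the model itself.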
The cybersecurity classifier is broadly designed to impede not only the development of exploits but also a wider range of offensive cyber tasks. This includes activities like reconnaissance, discovery, and lateral movement – the sequential steps involved in carrying out a cyberattack. Internal evaluations conducted by Anthropic, with Fable 5 configured to block rather than reroute flagged requests, and without attempts to evade safeguards, reportedly showed no progress on these offensive tasks. An external partner similarly found that Fable 5 did not comply with any harmful single-turn requests related to cyberattack planning, exploit development, or defense evasion, demonstrating resilience against 30 different public jailbreak techniques.
A notable trade-off is the potential for false positives, where harmless requests are incorrectly flagged. Anthropic states that it tuned the safeguards conservatively to enable a timely launch, which may lead to occasional misclassifications. The company reports that fallbacks occur in under 5% of all sessions, meaning Fable 5 behaves like the unrestricted Mythos 5 in over 95% of sessions. This figure covers all fallbacks, including genuine blocks, so it is an upper bound on overall disruption rather than a measure of the false-positive rate alone. Anthropic has indicated plans to refine the safeguards post-launch to reduce false positives.
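A short calculation shows why the fallback figure bounds the false-positive rate from above. The session counts below are invented for illustration, since Anthropic has not published a breakdown, and the sketch assumes each fallback session is counted as either a genuine block or a false positive.

```python
def fallback_breakdown(sessions: int, genuine_blocks: int,
                       false_positives: int) -> tuple[float, float]:
    """Return (fallback rate, false-positive rate) as fractions of sessions."""
    fallback_rate = (genuine_blocks + false_positives) / sessions
    false_positive_rate = false_positives / sessions
    return fallback_rate, false_positive_rate


# Hypothetical counts: 10,000 sessions, 300 genuine blocks, 150 false positives.
fallback_rate, false_positive_rate = fallback_breakdown(10_000, 300, 150)
print(f"fallback rate: {fallback_rate:.1%}")
print(f"false-positive rate: {false_positive_rate:.1%}")
```

Whatever the real split, the false-positive rate cannot exceed the fallback rate, so a fallback rate under 5% implies a false-positive rate under 5% as well.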
Regarding robustness, external evaluations have yielded specific results. A bug bounty program running for over 1,000 hours did not uncover a universal jailbreak, prompt, or harness capable of completely bypassing the safeguards. External red teams also reported no success on long-form agentic tasks. However, Anthropic acknowledged a caveat: the UK’s AI Security Institute reportedly made progress toward a universal jailbreak during a brief initial testing period. Anthropic concedes that completely preventing universal jailbreaks might be impossible, stating its goal is to make any successful jailbreaks slow and costly enough to detect before widespread exploitation.
The Threat Posed by Advanced AI Capabilities
The rationale for a cautious approach to this model was previously outlined in April, when Anthropic released Claude Mythos Preview to a limited cohort through Project Glasswing. A technical write-up from Anthropic’s red team detailed significant findings.
During testing, Mythos Preview identified and exploited zero-day vulnerabilities across major operating systems and web browsers when directed to do so. The oldest flaw it found was a 27-year-old bug in OpenBSD, an operating system renowned for its security. The model also autonomously created a remote code execution exploit against FreeBSD’s NFS server, using a 17-year-old bug that was triaged as CVE-2026-4747. Anthropic described this outcome as granting full root access to an unauthenticated attacker from anywhere on the internet.
Anthropic stated that these capabilities were not explicitly trained but emerged as a byproduct of general improvements in code generation, reasoning, and autonomy, the same enhancements that improve the model’s ability to patch vulnerabilities. The red team issued a stark warning: mitigations relying on friction rather than hard technical barriers become significantly weaker against a model that can systematically execute tedious exploitation steps at scale. Hard technical barriers like KASLR and W^X still raise the cost and difficulty for attackers. The warning is aimed specifically at defenses that depend on attackers running out of patience or manual effort, both of which the model can now supply automatically.
Mythos 5 inherits these advanced skills, and Anthropic suggests users will find its performance comparable to, or even slightly stronger than, Mythos Preview. The implications for cybersecurity are substantial, as this AI can accelerate the discovery and exploitation of previously unknown vulnerabilities.
The Defender’s Evolving Challenge in Cybersecurity
The defensive implications are not theoretical. In the initial weeks of Project Glasswing, Anthropic and approximately 50 partners utilized Mythos Preview to uncover over ten thousand high- or critical-severity vulnerabilities in systemically important software. Cloudflare alone identified 2,000 bugs, with 400 classified as high- or critical-severity. Mozilla reported finding and fixing 271 vulnerabilities in Firefox 150, a marked increase compared to its findings with the older Opus 4.6. Anthropic indicates that this accelerated pace of vulnerability discovery is evident across the broader industry, with vendors releasing uncharacteristically large security advisories.
This surge in discovered vulnerabilities presents a new challenge: the bottleneck has shifted from discovery to the verification, triage, and patching of these findings. These processes still largely rely on human timelines. Anthropic reports that open-source maintainers, already inundated with lower-quality AI-generated bug reports, have requested a slower disclosure rate due to their inability to develop patches quickly enough. In Project Glasswing, Anthropic estimates that an average of two weeks is required to patch a high- or critical-severity bug identified by the model.
The widening gap between a public disclosure of a vulnerability and the deployment of a patch is now the critical window of opportunity for attackers. The red team’s experiments with disclosed CVEs and their patches underscore this point: starting with only the notification of a CVE and its fix, Mythos Preview was able to construct working Linux privilege-escalation exploits in under a day for each, at a compute cost of a few thousand dollars or less. This rapid exploitation capability necessitates a fundamental shift in defensive strategies.
For defenders, the evolving landscape demands a proactive approach. The working assumption should be that a high-severity CVE can be turned into a working exploit within a day of its disclosure, not weeks. This underscores the importance of prioritizing auto-update mechanisms for internet-facing systems and treating dependency updates that include CVE fixes as time-sensitive tasks rather than items for the backlog. Foundational security measures such as multi-factor authentication (MFA) and comprehensive logging remain critical, so that a single unpatched vulnerability is not enough, on its own, to compromise a system undetected.
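One way to act on this advice is to sort pending updates so that CVE fixes on exposed systems rise to the top of the queue. The sketch below is one possible ordering under stated assumptions, not an established standard: the `PendingUpdate` fields, the package names, and the ranking rule (CVE fixes first, then internet-facing systems, then severity) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingUpdate:
    package: str
    fixes_cve: bool
    severity: str          # "low", "medium", "high", or "critical"
    internet_facing: bool


SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def patch_order(updates: list[PendingUpdate]) -> list[PendingUpdate]:
    """Order updates: CVE fixes first, then exposed systems, then severity."""
    return sorted(
        updates,
        key=lambda u: (not u.fixes_cve,
                       not u.internet_facing,
                       SEVERITY_RANK[u.severity]),
    )


# Hypothetical backlog of dependency updates.
backlog = [
    PendingUpdate("ui-theme", fixes_cve=False, severity="low",
                  internet_facing=False),
    PendingUpdate("batch-tool", fixes_cve=True, severity="high",
                  internet_facing=False),
    PendingUpdate("tls-lib", fixes_cve=True, severity="critical",
                  internet_facing=True),
]

for update in patch_order(backlog):
    print(update.package)
```

A team with different exposure might reasonably rank severity ahead of internet exposure. The point of the sketch is that the ordering is explicit and automated rather than left to whoever next opens the backlog.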
To facilitate responsible use, Anthropic has launched a Cyber Verification Program, enabling vetted security professionals to utilize its models for legitimate offensive security work without the standard cybersecurity safeguards. This program aims to provide defenders with the tools needed to stay ahead of potential threats.
New Data Retention Policy for Advanced Models
Anthropic is also implementing a new data retention policy for its Mythos-class models. A 30-day retention period will be enforced for all traffic involving Fable 5, Mythos 5, and future models of comparable capability, across both first-party and third-party platforms. The company has stated that this data will not be used for training or any purpose other than safety, all human access will be logged, and the data will be deleted after 30 days, unless a safety investigation or legal obligation necessitates longer retention.
The stated rationale for this policy is defensive, aiming to help detect novel attacks and jailbreaks that may manifest across multiple user requests. Organizations with stringent data-handling requirements will need to consider this 30-day retention window when routing sensitive traffic through these advanced AI models.
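For teams modeling the policy in their own compliance tooling, the rule reduces to a simple check. The following is a sketch of the policy as described above, with hypothetical names and timestamps. It says nothing about how Anthropic implements deletion internally.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in the policy.
RETENTION_WINDOW = timedelta(days=30)


def deletion_due(stored_at: datetime, now: datetime,
                 on_hold: bool = False) -> bool:
    """Return True once a record has aged past the retention window.

    on_hold models the stated exceptions: a safety investigation or a
    legal obligation that requires longer retention.
    """
    if on_hold:
        return False
    return now - stored_at >= RETENTION_WINDOW


# Hypothetical timestamps for illustration.
stored = datetime(2026, 6, 9, tzinfo=timezone.utc)
check = datetime(2026, 7, 9, tzinfo=timezone.utc)  # exactly 30 days later

print(deletion_due(stored, check))                # prints True
print(deletion_due(stored, check, on_hold=True))  # prints False
```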
Anthropic intends to expand access to Mythos 5 through a trusted-access program. Furthermore, as compute capacity increases, the company aims to reintegrate Fable 5 into its standard subscription plans without the usage-credit premium that begins after June 22. The broader implications of this launch are significant, raising the question of how similarly capable models from other AI developers will be deployed. Many future models may not include the extensive safety classifiers that Anthropic has implemented. The defensive head start that Project Glasswing was intended to provide is only valuable if the rest of the industry adopts and leverages similar safeguards.

