As cybersecurity professionals gain access to increasingly sophisticated AI models like Anthropic’s Mythos and OpenAI’s Daybreak, organizations are anticipating a significant rise in reported vulnerabilities. This surge in AI-assisted bug hunting is already affecting bug bounty programs, which are struggling to process the increased volume of submissions.
GitHub, a leading code-hosting platform, has announced it is refining its criteria for “complete” bug reports due to a notable uptick in AI-generated submissions over the past year. While the influx has some benefits, many reports lack essential details such as a proof of concept, present unrealistic attack scenarios, or address issues already deemed ineligible, making it harder to separate valid findings from noise.
The Challenge of AI-Generated Vulnerability Reports
Jarom Brown, senior product security engineer at GitHub, said the problem is not unique to his company. “Programs across the industry are grappling with the same challenge, and some have shut down entirely,” Brown wrote. GitHub aims to avoid an outright ban on AI-assisted reports, recognizing their potential to enhance security efforts. Instead, the company is asking researchers to validate their discoveries and provide proof of exploitability.
Brown clarified GitHub’s stance: “What we need is the same standard we’ve always expected: validation. An AI-assisted finding that’s been verified, reproduced, and submitted with a working proof of concept is a great submission. An unvalidated output submitted as-is without reproduction or demonstrated impact is not.” This call for validation underscores the need for reproducible and actionable findings, regardless of the tools used for their discovery.
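The standard Brown describes can be pictured as a simple completeness check applied before a report reaches a human triager. The following is a minimal sketch, not GitHub's actual process: the `Submission` type, its field names, and both helper functions are hypothetical and exist only to illustrate the three criteria he lists (verified, reproduced, working proof of concept).

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical bug bounty report; all field names are illustrative."""
    title: str
    ai_assisted: bool       # recorded, but deliberately never used as a filter
    verified: bool          # the researcher confirmed the finding is real
    reproduced: bool        # the issue was triggered again from written steps
    has_working_poc: bool   # a working proof of concept is attached


def missing_evidence(report: Submission) -> list[str]:
    """Return the validation evidence the report still lacks."""
    gaps = []
    if not report.verified:
        gaps.append("verification")
    if not report.reproduced:
        gaps.append("reproduction")
    if not report.has_working_poc:
        gaps.append("working proof of concept")
    return gaps


def is_complete(report: Submission) -> bool:
    """A report is complete when no evidence is missing, whatever tool found it."""
    return not missing_evidence(report)


# Two AI-assisted reports: one validated, one submitted as raw model output.
validated = Submission("Validated finding", True, True, True, True)
raw_output = Submission("Unvalidated model output", True, False, False, False)

for report in (validated, raw_output):
    status = "complete" if is_complete(report) else f"missing: {missing_evidence(report)}"
    print(f"{report.title}: {status}")
```

In this sketch the `ai_assisted` flag is never consulted, mirroring the stated position that the bar is validation rather than the tool used for discovery.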
False Positives and Exploitability Concerns
Grant Bourzikas, chief security officer at Cloudflare, echoed these sentiments, noting that triaging bugs and confirming their exploitability has always been a complex aspect of vulnerability research. He added that AI vulnerability scanners and code generation tools have exacerbated this challenge. For instance, AI tools scanning code written in memory-unsafe languages like C and C++ are prone to generating a higher number of false positives for potential exploits.
A primary concern with current AI tools is their tendency to give users the result they asked for, reporting a vulnerability even when none actually exists. This can produce bug reports filled with speculative language and qualifiers about exploitability, which take significant human effort to verify. Bourzikas described this as a “reasonable bias for an exploratory tool” but a “ruinous one for a triage queue,” where each speculative finding consumes valuable human attention.
Evaluating Newer AI Models
Cloudflare shared results from recent tests in which it ran Anthropic’s Mythos against 50 of its own code repositories. Bourzikas characterized Mythos as a distinct tool with advanced capabilities, noting its effectiveness in reducing false positives. He highlighted Mythos’s ability to chain exploits together and generate its own proof-of-concept code, features that set it apart from previous models.
Older AI models could identify many bugs but often struggled to demonstrate exploitability in real-world conditions. Not everyone, however, sees the newer models as a revolutionary leap. Daniel Stenberg, lead developer of the open-source tool curl, reported a similar increase in AI-assisted submissions but observed that low-quality reports have tapered off since March as models have improved.
Mixed Results and Future Outlook
Despite praise for AI’s general impact on cybersecurity, Stenberg found that Mythos offered only a marginal improvement over existing tools. Mythos analyzed 178,000 lines of curl code and flagged five “confirmed” vulnerabilities. Subsequent human review, however, found that four of these were false positives or had no security impact; the single remaining bug was a low-severity flaw slated for a routine update. Stenberg concluded, “My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing.”
The ongoing evolution of AI capabilities in cybersecurity presents a complex landscape for bug bounty programs. While newer models show promise in identifying vulnerabilities, the industry continues to emphasize the critical need for human validation and reproducible evidence of exploitability. Future developments will likely focus on AI’s ability to provide more actionable and less speculative findings, alongside continued efforts by organizations to refine their reporting standards.

