AI shows potential to reduce false positives, but cannot eliminate all inaccuracies

As cybersecurity professionals gain access to increasingly sophisticated AI models such as Anthropic’s Mythos and OpenAI’s Daybreak, organizations expect a significant rise in reported vulnerabilities. The surge in AI-assisted bug hunting is already straining bug bounty programs, which are struggling to process the increased volume of submissions.

GitHub, the code-hosting platform, has announced that it is refining its criteria for what counts as a “complete” bug report after a notable uptick in AI-generated submissions over the past year. The influx has some benefits, but many reports lack essential details such as a proof of concept, describe unrealistic attack scenarios, or cover issues already deemed ineligible, which makes it harder to separate valid findings from noise.

The Challenge of AI-Generated Vulnerability Reports

Jarom Brown, senior product security engineer at GitHub, said the problem is not unique to his company. “Programs across the industry are grappling with the same challenge, and some have shut down entirely,” Brown wrote. GitHub wants to avoid an outright ban on AI-assisted reports because they can strengthen security efforts. Instead, the company is asking researchers to validate their discoveries and provide proof of exploitability.

Brown clarified GitHub’s stance: “What we need is the same standard we’ve always expected: validation. An AI-assisted finding that’s been verified, reproduced, and submitted with a working proof of concept is a great submission. An unvalidated output submitted as-is without reproduction or demonstrated impact is not.” The standard is the same regardless of the tools used: findings must be reproducible and actionable.
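
The standard Brown describes can be read as a simple pre-submission checklist. The Python sketch below is illustrative only: the `Report` fields and the `missing_requirements` helper are hypothetical names chosen for this example and do not reflect GitHub’s actual intake tooling or criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    """Hypothetical bug-report record; all field names are assumptions."""
    title: str
    verified: bool = False              # researcher confirmed the finding by hand
    reproduced: bool = False            # issue was reproduced, not just predicted
    has_proof_of_concept: bool = False  # a working PoC is attached
    in_scope: bool = True               # not an issue already deemed ineligible


def missing_requirements(report: Report) -> List[str]:
    """Return the validation steps a report still lacks (empty if complete)."""
    checks = [
        (report.verified, "verification"),
        (report.reproduced, "reproduction"),
        (report.has_proof_of_concept, "working proof of concept"),
        (report.in_scope, "eligible scope"),
    ]
    return [name for passed, name in checks if not passed]


# An unvalidated model output submitted as-is, versus a validated finding.
raw_ai_output = Report(title="Possible overflow in parser")
validated = Report(
    title="Overflow in parser",
    verified=True,
    reproduced=True,
    has_proof_of_concept=True,
)

print(missing_requirements(raw_ai_output))
print(missing_requirements(validated))
```

In this sketch the unvalidated report is returned with three missing items, while the validated one comes back with an empty list, regardless of whether an AI tool was involved in finding the bug.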

False Positives and Exploitability Concerns

Grant Bourzikas, chief security officer at Cloudflare, echoed these sentiments, noting that triaging bugs and confirming their exploitability has always been a complex part of vulnerability research. He added that AI vulnerability scanners and code generation tools have made the problem worse. For instance, AI tools that scan code written in memory-unsafe languages such as C and C++ tend to generate more false positives for potential exploits.

A primary concern with current AI tools is their tendency to give users what they ask for, even when the requested result does not exist. The resulting bug reports are often filled with speculative language and qualifiers about exploitability, and verifying them takes significant human effort. Bourzikas described this as a “reasonable bias for an exploratory tool” but a “ruinous one for a triage queue,” where each speculative finding consumes valuable human attention.

Evaluating Newer AI Models

Cloudflare shared results from recent tests in which it ran Anthropic’s Mythos against 50 of its own code repositories. Bourzikas characterized Mythos as a distinct tool with advanced capabilities and said it was effective at reducing false positives. He highlighted its ability to chain exploits together and generate its own proof-of-concept code, features that set it apart from previous models.

Older AI models could identify many bugs but often struggled to demonstrate exploitability in real-world conditions. Not everyone, however, sees the newer models as a revolutionary leap. Daniel Stenberg, lead developer of the open-source tool curl, reported a similar increase in AI-assisted submissions but said low-quality reports have tapered off since March as models have improved.

Mixed Results and Future Outlook

Despite praise for AI’s general impact on cybersecurity, Stenberg found that Mythos offered only a marginal improvement over existing tools. A Mythos analysis of 178,000 lines of curl code flagged five “confirmed” vulnerabilities. Subsequent human review found that four of these were false positives or had no security impact; the one remaining bug is a low-severity flaw slated for a routine update. Stenberg concluded, “My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing.”
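
As a worked check on the curl figures above, the short Python sketch below computes the share of flagged findings that survived human review. Counting the single low-severity bug as the only true positive is an interpretation made for this example, not a calculation Stenberg published, and the `precision` helper is a hypothetical name.

```python
def precision(true_positives: int, flagged: int) -> float:
    """Fraction of flagged findings that survived human review."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return true_positives / flagged


flagged = 5    # "confirmed" vulnerabilities reported for curl
dismissed = 4  # false positives or no security impact after review
remaining = flagged - dismissed  # the one low-severity flaw

print(f"{precision(remaining, flagged):.0%}")  # prints "20%"
```

On these numbers, four out of every five findings labeled “confirmed” still cost reviewer time without yielding a security fix.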

The evolution of AI capabilities in cybersecurity leaves bug bounty programs in a complicated position. Newer models show promise in identifying vulnerabilities, but the industry continues to insist on human validation and reproducible evidence of exploitability. Future developments will likely focus on AI’s ability to provide more actionable and less speculative findings, alongside continued efforts by organizations to refine their reporting standards.
