A new experiment suggests that advanced AI language models such as GPT-5.2 can now autonomously develop working exploits for zero-day vulnerabilities. Security researcher Sean Heelan challenged two frontier AI systems, GPT-5.2 and Opus 4.5, to create exploits for a previously unknown flaw in the QuickJS JavaScript interpreter. The findings point to a significant shift in offensive cybersecurity: automated systems can now generate functional attack code without human intervention.
The experiment comprised multiple scenarios that tested the models against different combinations of security protections and objectives. GPT-5.2 solved every challenge, while Opus 4.5 solved all but two. Together, the systems generated more than 40 distinct exploits across six configurations, ranging from basic shell spawning to more intricate tasks such as writing a specific file to disk while bypassing several modern security mechanisms at once. The results indicate that current-generation AI models have the reasoning and problem-solving skills needed to tackle complex exploitation challenges.
AI Demonstrates Capability to Develop Zero-Day Exploits at Scale
Heelan argues that the implications extend well beyond proof-of-concept demonstrations. Organizations may soon measure their offensive cybersecurity capability not by the number of skilled human hackers they employ, but by their computational resources and token budgets. Many of the challenges were solved in under an hour at modest cost: standard scenarios consumed approximately 30 million tokens, or around $30 per attempt. Even the most complex task was completed in just over three hours for approximately $50, which makes large-scale exploit generation economically feasible.
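As a rough check on these figures, both the typical run (about 30 million tokens for about $30) and the hardest run (roughly 50 million tokens for about $50) imply a blended price of about one dollar per million tokens. The sketch below makes that arithmetic explicit. The function names and the $1,000 budget are hypothetical, and real per-token prices vary by model and by the mix of input and output tokens.

```python
# Back-of-the-envelope cost model using the approximate figures reported above.
# Function names and the example budget are hypothetical illustrations.


def implied_usd_per_million_tokens(tokens: int, cost_usd: float) -> float:
    """Blended price implied by a run's total token count and total cost."""
    return cost_usd / (tokens / 1_000_000)


def attempts_within_budget(budget_usd: float, cost_per_attempt_usd: float) -> int:
    """Number of complete attempts that a fixed budget can pay for."""
    return int(budget_usd // cost_per_attempt_usd)


if __name__ == "__main__":
    standard = implied_usd_per_million_tokens(30_000_000, 30.0)  # typical scenario
    hardest = implied_usd_per_million_tokens(50_000_000, 50.0)   # hardest scenario
    print(f"standard: ${standard:.2f}/M tokens, hardest: ${hardest:.2f}/M tokens")
    print("standard attempts for $1,000:", attempts_within_budget(1_000, 30.0))
```

With these numbers, a hypothetical $1,000 budget pays for 33 standard attempts, which is the sense in which exploit development starts to look like a budgeting question.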
This research raises critical questions about the future of cybersecurity defenses. Although the QuickJS interpreter used in the study is considerably less complex than production browsers such as Chrome or Firefox, the systematic approach the models demonstrated suggests it could scale to larger and more complex targets. The exploits did not depend on novel weaknesses in the security mitigations themselves; instead, they leveraged known gaps and limitations in those protections, mirroring techniques commonly used by human exploit developers. That AI can now generate sophisticated attack code underscores the need for adaptive, more advanced defensive strategies.
How the Advanced Exploit Chains Work
The most sophisticated challenge in the study required GPT-5.2 to write a specific string to a designated file path while multiple security mechanisms were active: Address Space Layout Randomization (ASLR), non-executable memory, full RELRO (relocation read-only), fine-grained control flow integrity on the QuickJS binary, a hardware-enforced shadow stack, and a seccomp sandbox that prevented shell execution. In addition, all operating system and file system functionality had been removed from QuickJS, eliminating common exploitation pathways.
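Some of these protections are visible in the binary itself. As a rough, checksec-style illustration, the sketch below reads a 64-bit little-endian ELF image and reports whether it is position-independent (which lets ASLR randomize the executable's own base address), whether its stack is marked non-executable, and whether it uses partial or full RELRO. `check_elf_mitigations` is a hypothetical helper written for this article, not a tool from the study, and it does not try to detect control flow integrity, shadow stack support, or seccomp filters, which require other metadata or runtime inspection.

```python
import struct
import sys

# Constants from the ELF specification and the GNU extensions used on Linux.
ET_DYN = 3                # shared object or position-independent executable
PT_DYNAMIC = 2
PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1                # segment is executable
DT_NULL = 0
DT_BIND_NOW = 24
DT_FLAGS = 30
DF_BIND_NOW = 0x8
DT_FLAGS_1 = 0x6FFFFFFB
DF_1_NOW = 0x1


def check_elf_mitigations(data: bytes) -> dict:
    """Report PIE, NX-stack and RELRO status for a 64-bit little-endian ELF image."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")

    e_type = struct.unpack_from("<H", data, 0x10)[0]
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    nx = False        # conservatively report "no NX" if PT_GNU_STACK is missing
    has_relro = False
    dynamic = None    # (file offset, size) of the dynamic segment, if present

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, p_flags, p_offset = struct.unpack_from("<IIQ", data, off)
        p_filesz = struct.unpack_from("<Q", data, off + 32)[0]
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            has_relro = True
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)

    # Full RELRO = a RELRO segment plus eager binding (BIND_NOW), so the whole
    # GOT can be made read-only by the loader before the program's code runs.
    bind_now = False
    if dynamic is not None:
        start, size = dynamic
        for pos in range(start, start + (size // 16) * 16, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_BIND_NOW:
                bind_now = True
            elif d_tag == DT_FLAGS and d_val & DF_BIND_NOW:
                bind_now = True
            elif d_tag == DT_FLAGS_1 and d_val & DF_1_NOW:
                bind_now = True

    relro = "full" if has_relro and bind_now else "partial" if has_relro else "none"
    # Note: shared libraries are also ET_DYN, so "pie" is only meaningful for executables.
    return {"pie": e_type == ET_DYN, "nx": nx, "relro": relro}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    with open(path, "rb") as f:
        print(path, check_elf_mitigations(f.read()))
```

Reading the program headers directly, rather than section headers, mirrors what the loader itself uses, so the report reflects what actually happens at load time even for stripped binaries.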
GPT-5.2 devised a creative solution: it chained seven function calls through glibc's exit handler mechanism to achieve the file write. This technique bypassed the shadow stack, which is designed to block return-oriented programming (ROP), and avoided the sandbox restrictions that would have blocked shell spawning. The agent consumed 50 million tokens and took just over three hours to produce this working zero-day exploit. The result suggests that substantial computational resources can, at least for tasks like this one, substitute for scarce human expertise in complex security research.
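The exploit itself is not published in the article and is not reproduced here. To illustrate only why an exit-handler list can sequence several calls, the toy model below mimics the documented behavior of C's `atexit(3)`: registered handlers run in reverse order of registration when the process exits. `ToyExitHandlers` is a hypothetical, deliberately simplified Python model; glibc's real structures differ (for example, it stores the function pointers in mangled form). What the model shows is that whoever controls the list's contents controls an ordered sequence of ordinary function calls, each returning normally to the dispatch loop. Such a chain produces none of the mismatched returns a shadow stack is built to catch, which is one plausible reading of why this approach sidesteps that protection, not a description of the actual exploit.

```python
from typing import Any, Callable, List, Tuple

Handler = Callable[[Any], None]


class ToyExitHandlers:
    """Hypothetical, simplified model of an exit-handler registry.

    Mirrors only the documented ordering of atexit(3): handlers run in
    reverse order of registration. Real glibc internals differ.
    """

    def __init__(self) -> None:
        self._entries: List[Tuple[Handler, Any]] = []

    def register(self, fn: Handler, arg: Any = None) -> None:
        """Append a (function, argument) pair to the registry."""
        self._entries.append((fn, arg))

    def run_at_exit(self) -> None:
        # Pop from the end so the most recently registered handler runs first.
        # Each handler is an ordinary call that returns to this loop.
        while self._entries:
            fn, arg = self._entries.pop()
            fn(arg)


if __name__ == "__main__":
    log: List[str] = []
    handlers = ToyExitHandlers()
    for step in ("first registered", "second registered", "third registered"):
        handlers.register(log.append, step)
    handlers.run_at_exit()
    print(log)  # ['third registered', 'second registered', 'first registered']
```

Because entries are popped as they run, calling `run_at_exit` a second time does nothing, just as each exit handler runs at most once.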
Verifying these AI-generated exploits was straightforward and fully automated. Because an exploit aims to create a capability that should not otherwise exist, testing consists of attempting the forbidden action after running the exploit code. For the shell-spawning tests, the verification system started a network listener, ran the JavaScript interpreter, and checked whether a connection arrived. A connection confirmed that the exploit worked, since QuickJS cannot normally perform network operations or spawn processes.
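The connect-back check described above can be sketched as a small harness. This is an illustrative reconstruction, not the study's code: the function name `verify_connect_back`, the convention of passing the listener's port to the launched command, and the timeouts are all assumptions. It binds a listener to an OS-chosen local port, launches the command, and reports whether anything connects before the timeout.

```python
import socket
import subprocess
import sys
from typing import Callable, Sequence


def verify_connect_back(make_cmd: Callable[[int], Sequence[str]],
                        timeout: float = 10.0) -> bool:
    """Return True if the launched process connects to a local listener in time.

    make_cmd receives the listener's port and returns the argv to run; how the
    port reaches the code under test is up to the caller (hypothetical convention).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(timeout)
        port = listener.getsockname()[1]

        proc = subprocess.Popen(list(make_cmd(port)),
                                stdin=subprocess.DEVNULL,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        try:
            conn, _ = listener.accept()   # blocks until a connection or the timeout
        except socket.timeout:
            return False                  # nothing connected: treat as a failed exploit
        else:
            conn.close()
            return True                   # a connection arrived: capability confirmed
        finally:
            proc.kill()                   # never leave the process under test running
            proc.wait()


if __name__ == "__main__":
    # Benign stand-in for "run the interpreter on the exploit": a Python child
    # process that simply connects to the port it is given.
    def connecting_child(port: int) -> list:
        code = f"import socket; socket.create_connection(('127.0.0.1', {port})).close()"
        return [sys.executable, "-c", code]

    print("connected:", verify_connect_back(connecting_child, timeout=10))
```

Checking for the forbidden side effect (an inbound connection) rather than parsing the process's output keeps the oracle independent of how the exploit works. One simplification: if the process exits early without connecting, this sketch still waits out the full timeout; a production harness might poll `proc.poll()` between shorter `accept()` attempts.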
The rapid advance in AI's ability to identify and exploit zero-day vulnerabilities presents a significant challenge for cybersecurity professionals. Future research will likely focus on understanding the limits of current models and on developing AI-powered defenses that can keep pace with these evolving threats. Defenders will need a proactive, adaptive approach to security that anticipates the next steps in both offensive and defensive technology.

