New research reveals that large language models (LLMs), increasingly tasked with generating passwords, produce credentials that are significantly weaker than they appear, posing a substantial cybersecurity risk. Seemingly random strings like ‘G7$kL9#mQ2&xP4!w’ may pass standard password-strength checkers, but they carry the predictability and repetition that LLMs introduce by their nature.
A study by Irregular analysts examining prominent LLMs such as GPT, Claude, and Gemini found that their password generation does not behave like a cryptographically secure pseudorandom number generator (CSPRNG). A CSPRNG draws each character uniformly and independently of the others, whereas an LLM is built to favor the most probable next token given the preceding text. This predictive mechanism is fundamentally at odds with the requirements of randomness, and it leads to exploitable patterns.
LLM-Generated Passwords Expose Major Security Flaws Due to Predictability
The research highlighted significant issues across multiple models. In repeated tests with Claude Opus 4.6, only 30 unique passwords were generated across 50 runs, and one specific sequence, ‘G7$kL9#mQ2&xP4!w’, appeared 18 times, or in 36% of the runs. Furthermore, GPT-5.2 consistently produced passwords beginning with the letter “v,” while Gemini 3 Flash favored passwords starting with “K” or “k.” These are not superficial anomalies; they are systematic biases that attackers could exploit to narrow their guesses.
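The repetition figures above are simple to compute for any batch of generated passwords. The following is a minimal sketch, not the researchers' tooling: the function name `repetition_stats` is hypothetical, and the sample batch is synthetic, built only to mirror the reported counts (50 runs, 30 unique values, one value seen 18 times).

```python
from collections import Counter


def repetition_stats(passwords):
    """Summarize how often a batch of generated passwords repeats itself."""
    counts = Counter(passwords)
    top_password, top_count = counts.most_common(1)[0]
    # Count leading characters to expose biases such as a favored first letter.
    first_chars = Counter(p[0] for p in passwords if p)
    return {
        "runs": len(passwords),
        "unique": len(counts),
        "top_password": top_password,
        "top_share": top_count / len(passwords),
        "first_char_counts": dict(first_chars),
    }


# Synthetic stand-in data (NOT the study's raw output): the repeated password
# from the article 18 times, plus 32 placeholder strings with 29 distinct values.
sample = ["G7$kL9#mQ2&xP4!w"] * 18 + [f"placeholder-{i % 29}" for i in range(32)]

stats = repetition_stats(sample)
print(stats["runs"], stats["unique"], f"{stats['top_share']:.0%}")
```

Run against real model output, the same counts would show at a glance whether one password or one leading character dominates.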
The implications extend beyond individual users seeking password assistance from chatbots. Coding agents, including Claude Code, Codex, and Gemini-CLI, have been observed generating LLM-based passwords during software development tasks, sometimes without explicit user prompts. In development environments with less rigorous oversight, these weak credentials can be integrated into production systems unnoticed, creating widespread vulnerabilities.
Assessing the Actual Weakness of LLM Passwords
To quantify the insecurity of these LLM-generated passwords, researchers employed the Shannon entropy formula and analyzed log-probability data directly from the models. A well-constructed 16-character password typically possesses around 98 bits of entropy, making it virtually uncrackable through brute-force methods within a reasonable timeframe. In stark contrast, passwords generated by Claude Opus 4.6 exhibited an estimated entropy of only 27 bits, a search space of roughly 134 million candidates. Even more concerning, 20-character passwords from GPT-5.2 showed approximately 20 bits of entropy, or about one million candidates, a level low enough to be cracked by standard computing power in mere seconds.
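The entropy figures above can be reproduced approximately with a few lines of arithmetic. This is a sketch under stated assumptions rather than the study's method: the 70-character alphabet is an assumption chosen because it yields roughly 98 bits at 16 characters, the rate of one billion guesses per second is an arbitrary illustrative figure, and all function names are hypothetical.

```python
import math


def uniform_entropy_bits(length, alphabet_size):
    """Entropy of a password whose characters are drawn uniformly and independently."""
    return length * math.log2(alphabet_size)


def shannon_entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def surprisal_bits(token_logprobs):
    """Surprisal of ONE generated string, from natural-log token probabilities.

    Averaging this value over many sampled strings estimates the entropy
    of the model's password distribution.
    """
    return -sum(token_logprobs) / math.log(2)


def seconds_to_exhaust(bits, guesses_per_second):
    """Worst-case time to try every candidate in a 2**bits search space."""
    return 2 ** bits / guesses_per_second


ASSUMED_ALPHABET = 70   # assumption: 52 letters + 10 digits + 8 symbols
ASSUMED_RATE = 1e9      # assumption: one billion guesses per second

print(round(uniform_entropy_bits(16, ASSUMED_ALPHABET)))  # 98
for bits in (98, 27, 20):
    print(bits, seconds_to_exhaust(bits, ASSUMED_RATE))
```

At the assumed rate, the 27-bit and 20-bit search spaces are exhausted in well under a second, while the 98-bit space is far beyond reach.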
Adjusting the “temperature” setting on LLMs, a sampling parameter that controls how much randomness is applied when choosing each token, proved ineffective. Even at its maximum setting of 1.0, Claude continued to produce repetitive patterns. Reducing the temperature to 0.0 resulted in the same password being generated every single time. The study also found that predictable LLM-generated password prefixes, such as ‘K7#mP9’ and ‘k9#vL’, are already present in public code repositories like GitHub and in various online technical documents.
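Why temperature cannot rescue a skewed distribution can be illustrated with the standard temperature-scaled softmax. The sketch below is a generic illustration, not any vendor's implementation: the logits are invented, the function name `apply_temperature` is hypothetical, and treating temperature 0 as greedy selection is a common convention assumed here.

```python
import math


def apply_temperature(logits, temperature):
    """Turn next-token scores into sampling probabilities at a given temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means always pick the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Invented scores for four candidate next characters; the first is strongly favored.
hypothetical_logits = [4.0, 2.0, 1.0, 0.5]

at_one = apply_temperature(hypothetical_logits, 1.0)
at_zero = apply_temperature(hypothetical_logits, 0.0)
print([round(p, 3) for p in at_one])
print(at_zero)
```

With these invented scores, the favored character still takes more than 80% of the probability at temperature 1.0, and at 0.0 it is chosen every time, which matches the identical-password behavior described above.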
In light of these findings, cybersecurity teams are advised to audit and promptly rotate any credentials that may have been generated by AI tools or coding agents. Developers should configure AI agents to use cryptographically secure generation methods, such as `openssl rand` or `/dev/random`, rather than letting the model invent the password itself. A critical step before deployment is a thorough review of all AI-generated code to identify and remove any hardcoded passwords.
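Both recommendations can be sketched in a few lines. The example below uses Python's standard `secrets` module as a stand-in for `openssl rand`; it is a swapped-in technique, not something the study prescribes. The alphabet, the function names, and the substring-based scan are illustrative assumptions, and the prefix list contains only the two examples quoted in the article.

```python
import secrets
import string

# Assumed alphabet: 52 letters, 10 digits, and 8 symbols (70 characters in total).
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"

# The two prefixes quoted in the article; a real audit would need a fuller list.
SUSPECT_PREFIXES = ("K7#mP9", "k9#vL")


def generate_password(length=16):
    """Build a password from the operating system's CSPRNG via `secrets`."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def find_suspect_lines(text):
    """Return (line number, line) pairs that contain a known LLM-style prefix."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(prefix in line for prefix in SUSPECT_PREFIXES):
            hits.append((number, line.strip()))
    return hits


# Hypothetical config snippet used only to exercise the scan.
example_source = 'DB_USER = "app"\nDB_PASSWORD = "K7#mP9xQ2vL8nR4"\n'

print(generate_password())
print(find_suspect_lines(example_source))
```

A plain substring match like this is crude and will miss most hardcoded secrets, so it complements manual review and rotation rather than replacing them.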

