A recent study by ETH Zurich has revealed that large language models (LLMs) can significantly expedite the process of identifying individuals online, potentially undermining internet anonymity. Researchers demonstrated how these advanced AI tools can efficiently connect disparate pieces of online information to unmask pseudonymous users, a capability that could reshape privacy expectations in the digital age.
The study involved LLM agents analyzing anonymized biographical data derived from real user profiles on platforms like Hacker News and Reddit. The agents were tasked with searching the internet for additional details to identify the individuals behind the profiles. While success rates varied, the models achieved in minutes what would typically take a human investigator hours of manual work. In one instance involving data from AI company Anthropic, a fine-tuned LLM re-identified 9 out of 125 candidates by correlating profile summaries with publicly available information, including professional networking sites like LinkedIn.
LLMs Accelerate Online Deanonymization Efforts
The research underscores a fundamental shift in online privacy: LLMs enable automated deanonymization attacks that operate on unstructured text at scale. Daniel Paleka, a doctoral student and co-author of the study, said the findings indicate that these AI tools have dramatically lowered the barrier to identifying individuals who maintain a degree of online anonymity. According to Paleka, operational security models that relied on the amount of time an adversary would have to invest may no longer hold.
For ethical reasons, however, the researchers did not test their methods on individuals actively seeking to conceal their identities. The subjects whose data was used were not high-privacy users. This distinction is crucial, because the findings may not reflect how the methods would perform against people who use advanced privacy-protection techniques.
Real-World Implications and Concerns
Instances of AI-driven deanonymization are already emerging. For example, xAI’s Grok recently revealed the legal name and address of an adult film actress who had exclusively used a stage name for over a decade. The actress reported being “doxxed” by the AI tool, with her private information subsequently disseminated across the internet by other AI scrapers. This case highlights the potential for misuse and the rapid spread of personally identifiable information once exposed.
While law enforcement and intelligence agencies have long used open-source intelligence gathering, LLMs can perform similar tasks with unprecedented speed and cost-effectiveness. Tasks like determining a user’s nationality, location, or employment history, which previously might have required significant human effort or professional services, can potentially be accomplished by LLMs in seconds for minimal processing costs. Paleka expressed significant concern over the privacy implications, referring to LLM deanonymization capabilities as a “large scale invasion of privacy.”
The study suggests these AI advancements could reshape the online environment for various actors, including governments, law enforcement, advertisers, and cybercriminals. In authoritarian regimes, these tools could pose amplified risks to dissidents, journalists, and human rights activists who rely on anonymity to operate safely. Jacob Hoffman-Andrews, a senior staff technologist at the Electronic Frontier Foundation, noted that even small pieces of identifying information, when processed by LLMs, could unexpectedly lead to an individual’s unmasking.
Posting seemingly innocuous personal details or maintaining a consistent online presence over extended periods can give LLMs enough data to correlate accounts and, eventually, uncover the real-world identity behind them. The capacity of LLMs to efficiently summarize information and tirelessly process vast datasets makes them powerful tools for such investigations.
Looking Ahead: The Future of Online Anonymity
Companies offering insurance or background-check services could adopt deanonymization technology, and AI companies themselves may build these capabilities into standalone products. The likely long-term result is an internet where maintaining anonymity becomes considerably harder for individuals, regardless of their intentions.
Hoffman-Andrews emphasized the value of pseudonymity online and argued that individuals should not need to be advanced security experts to protect their privacy from sophisticated, LLM-equipped adversaries. The next steps will likely involve discussion and potential policy development around the responsible development and deployment of deanonymization technologies, as well as user education on evolving online privacy risks.

