China employs human labor for AI-powered hacking, despite claims of autonomy.

Anthropic revealed Thursday that a sophisticated, previously unknown Chinese state-sponsored hacking group used the company’s Claude generative AI product in a campaign that breached the defenses of at least 30 organizations. The disclosure marks a significant escalation in the misuse of advanced AI tools for cyberattacks.

According to Anthropic’s research, the threat actor circumvented Claude’s safeguards through two primary tactics. First, it broke malicious tasks into discrete operations so that the overall intent went undetected. Second, it deceived the AI into believing it was performing a legitimate security audit. The operation highlights the novel ways threat actors are using AI.

Novel AI Misuse in Cyber Espionage

Jacob Klein, who heads Anthropic’s threat intelligence team, stated that the company has observed an increasing number of inventive uses of Claude by malicious actors over the past year. Initially, threat actors were observed copying and pasting interactions from chatbots to construct malware or phishing lures. Following the release of Claude Code, Anthropic’s code development tool, the company noted a rise in the use of this tool by bad actors to accelerate script generation and code development for their operations.

Klein described the September operation as the “most autonomous misuse” seen to date. However, he clarified that “most autonomous” is a relative term, as evidence suggests this hacking group invested substantial human and technical resources into orchestrating its use of Claude.

The Role of Human Oversight and Framework Development

Claude’s advanced automation was enabled by a custom-built frontend framework designed to manage and support its operations. Building that framework required scripting, provisioning the necessary servers, and significant backend development to ensure each stage of the attack proceeded as intended. Klein emphasized that constructing the framework was the most challenging part of the entire operation and, critically, the most human-intensive.

Claude also used Model Context Protocol (MCP) servers to interact with external open-source tools for reconnaissance, vulnerability scanning, and other tasks. MCP servers connect AI models to external tools and data sources. Establishing these connections requires coding expertise, careful planning, and considerable human technical effort to ensure the components interoperate.

Furthermore, Claude’s actions were consistently subject to human validation and review. The documented attack chain illustrates at least four distinct steps where human operators verified Claude’s output or directed the AI to revise its work before proceeding to subsequent actions.

This iterative oversight indicates that Claude could execute individual tasks autonomously, but the operation as a whole depended on humans. Operators reviewed outputs, confirmed findings, kept backend systems functioning, and directed next steps.

Anthropic’s report also underscores a limitation common to AI-generated research: models like Claude frequently “hallucinate.” They fabricate credentials, exaggerate findings, or present publicly available information as novel discoveries. Because of this unreliability, threat actors, like any other users, must rely on human technical experts to review and correct AI outputs at each stage before trusting the results.

For example, in vulnerability scanning, Klein noted that “step one is Claude comes back and says, ‘here’s all the assets I found related to this target,’ then sends it back to the human.” Claude did not proceed to the penetration testing phase until human review was completed.

Despite the human effort involved, Klein expressed significant concern about the findings. His key worry is that such frameworks let a human operator “scale themselves fairly dramatically,” greatly amplifying what one person can accomplish.

Attribution and Behavioral Indicators

Regarding the suspected ties to China, Klein cited overlaps in infrastructure and operational behavior with known Chinese state-sponsored actors. The targeting set also strongly aligned with the strategic objectives attributed to the Chinese Ministry of State Security.

Additional, albeit circumstantial, evidence of a Chinese nexus comes from usage logs. They show the group typically operated during standard business hours and stopped activity during a Chinese holiday. Klein said other proprietary information further supported the attribution but could not be disclosed publicly.

AI and Cybersecurity Experts Offer Divergent Views

Research into AI-powered cyber espionage is still emerging. Even so, there is substantial evidence that large language models have improved markedly at cybersecurity-specific tasks over the past year. Earlier this year, the AI vulnerability scanning and patching tool from the startup XBOW reached the top of the rankings on bug bounty platforms such as HackerOne.

On the offensive side, researchers at NYU developed a framework similar to the one used in the Anthropic campaign. It used a publicly available version of ChatGPT to automate significant portions of a ransomware attack. The Anthropic report is the first publicly known account of a nation-state reportedly using such a process to carry out successful attacks.

However, the implications of this campaign and Anthropic’s findings have sparked debate within AI and cybersecurity communities. Some view it as validation of existing concerns about AI-enabled hacking, while others argue the report’s conclusions may misrepresent the current state of cyber-espionage operations.

Kevin Beaumont, a UK-based cybersecurity researcher, criticized Anthropic’s report on two grounds. He said it lacks transparency, which leaves little room for outside validation, and that it describes techniques already achievable with existing tools. Writing on LinkedIn, he noted that the report includes no indicators of compromise and that the techniques it describes are “off-the-shelf,” with established detection methods.

Klein responded that Anthropic has shared indicators of compromise with relevant entities through established information-sharing agreements, but not with the general public.

Other observers contend that Anthropic’s findings are a crucial milestone in the application of AI to cybersecurity. Jen Easterly, former director of the Cybersecurity and Infrastructure Security Agency, acknowledged the transparency concerns but credited Anthropic for disclosing the attacks. She noted that it remains unclear which tasks AI genuinely accelerated and which could have been done with standard tools. She added that the report lacks specifics on the agent chains, model hallucinations, how often humans intervened, and how reliable the outputs were.

Tiffany Saade, an AI researcher with Cisco’s AI defense team, said the report clearly shows that tools like Claude give attackers advantages in speed and scale. Given the limitations of LLMs, she questioned whether those advantages are enough to make hackers choose them over other automation methods. She also asked whether their adoption will lead to more sophisticated attacks.

Saade also raised questions about the chosen AI model, finding it unusual for a Chinese state-sponsored actor to employ a major U.S. AI model for automation when in-house alternatives exist. She suggested that using prominent public models like Claude might indicate a desire for visibility.

Saade suggested one possible motivation: a geopolitical message to Washington, D.C., demonstrating that Beijing’s hackers can execute the very capabilities that are widely feared. On this view, the operation may have been intended to attract attention and validate hypotheses about AI’s offensive capabilities, rather than purely to achieve stealth or sabotage.

Further analysis is expected next, along with possible responses from cybersecurity firms and governments on AI security protocols and threat intelligence sharing. Open questions remain about the full scope of AI’s capabilities in cyber warfare and about the ongoing arms race between AI developers and malicious actors.
