A newly disclosed attack technique, dubbed “Lies-in-the-Loop,” exploits the trust users place in AI code assistant safety dialogs, turning them into a vector for remote code execution. Researchers from Checkmarx identified the vulnerability, which affects prominent AI platforms such as Claude Code and Microsoft Copilot Chat.
The Lies-in-the-Loop attack specifically targets Human-in-the-Loop (HITL) controls, a crucial safeguard designed to prevent unintended or malicious operations by requiring explicit user approval. However, by manipulating the content displayed in these approval dialogs, attackers can deceive users into authorizing the execution of harmful code, essentially weaponizing AI’s own safety mechanisms.
Lies-in-the-Loop Attack Undermines AI Safety
The core of the Lies-in-the-Loop attack is its ability to manipulate what HITL dialogs display. Attackers achieve this through indirect prompt injection, where malicious instructions are embedded within seemingly innocuous text. The technique exploits the trust users place in these confirmation prompts: users assume that the information shown accurately represents the impending action.
According to Checkmarx researchers, the malicious payload is often padded with benign-looking text. This padding is designed to push the dangerous commands beyond the visible area of terminal windows or code editors. When a user is prompted to review and approve an action, they may see only a portion of the generated text. A user who reviews only the visible, harmless-looking instructions and does not scroll through the full dialog can unknowingly approve arbitrary code execution on their system.
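The effect of padding can be sketched in a few lines of Python. The example below is illustrative only and is not taken from the Checkmarx research: the function names, the 24-by-80 viewport, and the sample dialog text are all assumptions. It estimates how many terminal rows a dialog occupies after line wrapping and flags any dialog that cannot fit on one screen, which is the condition the padding relies on.

```python
import math


def rendered_rows(text: str, cols: int) -> int:
    """Estimate the terminal rows `text` occupies when lines wrap at `cols`."""
    # An empty line still takes one row; a long line wraps onto several.
    return sum(max(1, math.ceil(len(line) / cols)) for line in text.split("\n"))


def overflows_viewport(text: str, rows: int = 24, cols: int = 80) -> bool:
    """Return True if the dialog cannot be shown on a single screen."""
    return rendered_rows(text, cols) > rows


# Hypothetical dialog: 40 lines of benign filler followed by the real command.
padding = "\n".join(f"Step {i}: review project notes" for i in range(1, 41))
dialog = padding + "\necho payload"  # placeholder for the attacker's command

if overflows_viewport(dialog):
    print("Warning: part of this dialog is off screen; scroll before approving.")
```

A check like this does not decide whether a command is malicious. It only signals that some of the dialog is out of view, which is when a reviewer is most likely to miss what they are approving.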
As a proof of concept, the researchers demonstrated the attack by executing a simple command that launches the Windows calculator application (calc.exe). However, security experts warn that more damaging payloads, including data exfiltration, system compromise, or the deployment of ransomware, are possible through the same route.
The danger is amplified when the Lies-in-the-Loop attack is combined with Markdown injection vulnerabilities. This combination allows attackers to artfully craft entirely fake approval dialogs that mimic the legitimate interface of the AI assistant. This sophisticated manipulation can make the attack exceptionally difficult for users to detect, as the visual cues they rely on to identify trustworthy prompts are compromised.
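One generic mitigation for the Markdown side of the problem is to escape formatting characters in untrusted text before it is rendered, so that injected content cannot draw headings, bold labels, or links that imitate the assistant's own interface. The sketch below is an assumption about how a client could do this, not a description of how Claude Code or Copilot Chat handle rendering; `escape_markdown` is a hypothetical helper.

```python
import re

# Punctuation that Markdown renderers commonly treat as formatting syntax.
_MARKDOWN_SPECIALS = re.compile(r"([\\`*_{}\[\]()#+\-.!|>~<])")


def escape_markdown(untrusted: str) -> str:
    """Backslash-escape Markdown syntax so the text renders literally."""
    return _MARKDOWN_SPECIALS.sub(r"\\\1", untrusted)


# Hypothetical injected text that tries to look like a trusted prompt.
injected = "**Approve** this [safe action](http://example)"
print(escape_markdown(injected))
```

Escaping keeps the attacker's text readable but strips its ability to restyle the dialog. It does not address padding or misleading wording.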
Infection Mechanism Revealed
The infection mechanism employed by the Lies-in-the-Loop attack operates through a three-step process. First, attackers inject malicious content into the AI agent’s context. This is typically achieved through external sources that the AI might process, such as code repositories, web pages, or shared documents. This initial injection poisons the environment the AI operates within.
Following the injection, the AI agent processes the compromised instructions and generates a HITL dialog. This dialog appears benign on the surface, but its content is shaped by the attacker’s injected instructions. The AI, designed to solicit user confirmation for potentially sensitive actions, presents this manipulated dialog to the user.
Finally, the user approves the dialog. This approval is given without full awareness of the actual payload concealed within the surrounding deceptive text. The attack capitalizes on the user’s inability to independently verify the exact executable instruction that the AI agent intends to run, relying solely on the potentially misleading interface provided.
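The verification gap in the final step can be narrowed if the approval prompt shows the exact command separately from any agent-written explanation. The following is a minimal sketch under stated assumptions: `build_approval_prompt`, its parameters, and the prompt wording are hypothetical and do not reflect either vendor's implementation. The command is rendered from the argument list that would actually be executed, and the untrusted description is collapsed and truncated so that padding cannot push the command out of view.

```python
import shlex
from typing import Sequence


def build_approval_prompt(
    argv: Sequence[str], description: str, max_desc_chars: int = 200
) -> str:
    """Build a confirmation prompt that shows the exact command first."""
    # Render the command from the real argument list, with shell quoting.
    command_line = shlex.join(argv)
    # Collapse all whitespace so newline padding cannot add extra rows.
    summary = " ".join(description.split())
    if len(summary) > max_desc_chars:
        summary = summary[:max_desc_chars] + " [truncated]"
    return (
        f"Command to run: {command_line}\n"
        f"Agent-supplied description (untrusted): {summary}\n"
        "Approve? [y/N]"
    )


# Hypothetical usage: a heavily padded description stays on one line.
prompt = build_approval_prompt(["rm", "-rf", "my dir"], "harmless\n" * 500)
print(prompt)
```

The design choice here is that the text the user approves is derived from what will run, not from what the agent says will run.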
Both Anthropic, the developer of Claude Code, and Microsoft, the company behind Copilot Chat, have acknowledged the findings. However, they have reportedly classified these vulnerabilities as outside their current threat models, on the grounds that successful exploitation requires multiple, non-default user actions. Despite this classification, security researchers stress that the Lies-in-the-Loop attack highlights a fundamental challenge in the design of AI agents and their interfaces.
When humans rely on dialog content that they cannot independently audit or verify, the trust inherent in these safety mechanisms becomes a critical point of failure. As AI systems become more integrated and capable of autonomous actions, traditional security paradigms need re-evaluation. Protecting users from sophisticated social engineering attacks at the human-AI interface level is paramount.
The implications of this discovery extend to the broader landscape of AI development and security. Developers will need to devise more robust methods for validating AI-generated prompts and ensuring the integrity of HITL dialogs. Users, in turn, may need to exercise increased vigilance and employ advanced verification techniques when interacting with AI-powered tools that handle sensitive operations. The ongoing evolution of AI capabilities necessitates a continuous adaptation of security measures to stay ahead of emerging threats like Lies-in-the-Loop.

