Cybersecurity researchers have uncovered critical remote code execution vulnerabilities affecting prominent AI inference engines from Meta, Nvidia, Microsoft, and open-source projects like vLLM and SGLang. The flaws, identified by Oligo Security, stem from a shared insecure coding pattern dubbed “ShadowMQ,” in which data received over ZeroMQ (ZMQ) sockets is deserialized with Python’s pickle module. The pattern has propagated across multiple libraries through code reuse, creating significant security risks for AI deployments.
The root cause of these vulnerabilities has been traced back to Meta’s Llama large language model (LLM) framework. Specifically, a previously patched vulnerability (CVE-2024-50050) involved the unsafe use of ZeroMQ’s `recv_pyobj()` method, which deserializes incoming data using Python’s pickle module. Because the framework exposed the ZeroMQ socket over the network, attackers could send malicious data, leading to arbitrary code execution. While Meta addressed this issue and the pyzmq Python library has also been updated, the underlying pattern of insecure deserialization had already made its way into other critical AI infrastructure components.
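To illustrate why deserializing untrusted bytes with pickle amounts to code execution, here is a minimal sketch that uses only the standard library. It assumes that `recv_pyobj()` is effectively `pickle.loads()` applied to the received bytes, so no socket is needed for the demonstration. The `Payload` class and `parse_message` helper are hypothetical names, and a harmless arithmetic expression stands in for an attacker's command.

```python
import json
import pickle


class Payload:
    """Stand-in for an attacker-crafted object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless expression stands in for a real attacker command.
        return (eval, ("6 * 7",))


# Bytes an attacker could send to an exposed socket.
wire_bytes = pickle.dumps(Payload())

# Roughly what recv_pyobj() does with the bytes it receives.
result = pickle.loads(wire_bytes)
print(result)  # the expression was evaluated during deserialization


def parse_message(raw: bytes) -> dict:
    """Data-only alternative: JSON parsing never invokes callables."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message


# The same attacker bytes are rejected by the data-only parser.
try:
    parse_message(wire_bytes)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The receiver never has to call anything on the deserialized object; loading it is enough to run the attacker's chosen callable. A data-only format such as JSON avoids this class of problem, at the cost of not being able to transmit arbitrary Python objects.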
ShadowMQ: A Widespread AI Inference Engine Vulnerability
Oligo Security researchers observed the same insecure pattern, described as “pickle deserialization over unauthenticated ZMQ TCP sockets,” recurring in several other influential AI inference engines. These include NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang. According to Oligo researcher Avi Lumelsky, this indicates a concerning trend where developers across different organizations and projects unintentionally replicate the same security flaw, often through direct code copying.
The report highlights instances where vulnerable code was directly copied. For example, a vulnerable file in SGLang explicitly states it was adapted from vLLM. Similarly, the Modular Max Server reportedly borrowed logic from both vLLM and SGLang, effectively amplifying the propagation of the same security weakness across multiple codebases.
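The pattern Oligo describes has two parts: an unsafe deserializer and an unauthenticated socket. As a rough sketch of how both could be addressed in application code, the following example authenticates each message with an HMAC and parses the body as JSON only after the tag verifies. This is an illustration under stated assumptions, not the fix any of the affected projects shipped; `SECRET_KEY`, `pack`, and `unpack` are hypothetical names, and with pyzmq the frames would be passed through plain `socket.send()` and `socket.recv()` rather than `send_pyobj()` and `recv_pyobj()`.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice, load it from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def pack(message: dict, key: bytes = SECRET_KEY) -> bytes:
    """Serialize as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(message).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unpack(frame: bytes, key: bytes = SECRET_KEY) -> dict:
    """Verify the tag first, then parse the body as data only."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short")
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad authentication tag")
    message = json.loads(body)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message


frame = pack({"op": "generate", "max_tokens": 16})
print(unpack(frame))

# A frame modified in transit fails verification before any parsing.
tampered = frame[:-1] + b"!"
try:
    unpack(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
print(tamper_detected)
```

This sketch provides message authentication only. It does not encrypt traffic or prevent replay of old frames, and binding the socket to a non-public interface remains a separate, complementary measure.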
Identified Vulnerabilities and Their Impact
Several of the vulnerabilities have been assigned CVE identifiers and CVSS scores, while others remain untracked or only partly fixed. The status of each affected project is:
- CVE-2025-30165 (CVSS score: 8.0) in vLLM. The issue is not fully patched, but the project now uses its V1 engine by default to mitigate the risk.
- CVE-2025-23254 (CVSS score: 8.8) in NVIDIA TensorRT-LLM, which has been fixed in version 0.18.2.
- CVE-2025-60455 (CVSS score: N/A) affecting Modular Max Server, which has also been patched.
- Sarathi-Serve remains unpatched, posing an ongoing risk.
- SGLang has implemented fixes, though they are described as incomplete.
The implications of these vulnerabilities are substantial. Inference engines are fundamental to AI infrastructure. A successful compromise of a single node could enable attackers to execute arbitrary code across an entire cluster. This could lead to privilege escalation, theft of valuable AI models, or the deployment of malicious payloads such as cryptocurrency miners for financial gain. Lumelsky emphasized that the rapid pace of development in AI projects often leads to borrowing code components, but this practice can quickly spread insecure patterns.
Broader AI Security Concerns Beyond Inference Engines
This disclosure follows other recent reports highlighting security risks within AI development tools. A separate report from AI security platform Knostic found that Cursor’s built-in browser can be abused to compromise developer workstations through JavaScript injection, potentially facilitated by malicious extensions. One attack involves setting up a rogue local Model Context Protocol (MCP) server that bypasses Cursor’s security measures. This allows attackers to present fake login pages that harvest user credentials and exfiltrate them to a remote server.
Furthermore, because Cursor is based on Visual Studio Code, malicious extensions could be crafted to inject JavaScript directly into the IDE. This allows attackers to execute arbitrary actions, including falsely flagging legitimate extensions as malicious. According to Knostic, JavaScript running within the IDE’s Node.js interpreter inherits its privileges, granting attackers full file-system access and the ability to modify IDE functions and persist malicious code that restarts with the IDE. This can effectively turn the development environment into a platform for malware distribution and data exfiltration.
To mitigate these risks, users are advised to disable auto-run features in their IDEs, carefully vet all extensions before installation, and only install MCP servers from trusted sources. It is also crucial to review the data and APIs accessed by these servers, use API keys with the minimum necessary permissions, and audit MCP server source code for critical integrations. The ongoing evolution of AI technologies necessitates continuous vigilance against emerging security threats like the ShadowMQ vulnerability.

