NVIDIA and Lakera AI have jointly proposed a novel unified framework aimed at enhancing the safety and security of increasingly autonomous agentic system safety. This collaborative effort addresses the significant security challenges posed by advanced AI agents that can interact with and manipulate digital environments and tools. The urgency for such a framework stems from the rapid evolution of AI, where traditional security paradigms are proving insufficient against the unique risks introduced by these sophisticated systems.
The proposed framework redefines safety and security not as static attributes of an AI model but as emergent properties arising from the complex interplay between AI models, their underlying orchestration, the tools they utilize, and the data they access. This holistic perspective is intended to provide a more robust method for identifying and managing potential risks throughout the entire lifecycle of an agentic system, from its initial development to its operational deployment in real-world applications. This proactive approach is crucial as AI agents become more integrated into various sectors.
The Limits of Traditional Security and the Need for a New Paradigm
The researchers highlight that conventional security assessment tools, such as the Common Vulnerability Scoring System (CVSS), are inadequate for evaluating the multifaceted risks associated with agentic AI. They explain that a seemingly minor security flaw at a component level could potentially escalate into significant, system-wide harm to users. This underscores the need for a more comprehensive and nuanced approach to security assessment.
The new model offers a structured methodology for understanding how localized vulnerabilities can compound and lead to unforeseen and potentially large-scale failures. Illustrated by an architectural diagram in their research, the framework provides a clear roadmap for assessing these complex systems. It is designed with enterprise-grade workflows in mind, aiming to ensure that as AI agents become more deeply embedded in business processes, their operations remain consistently aligned with established safety and security policies.
AI-Driven Risk Discovery for Agentic Systems
A key innovation within the proposed framework is its emphasis on AI-driven risk discovery, leveraging a sophisticated red teaming process. Within a secure, sandboxed environment, specialized “evaluator” AI agents are deployed to systematically probe the primary agentic system for latent weaknesses. These evaluators simulate a wide array of potential attack scenarios, including prompt injection tactics and advanced attempts at tool misuse, with the goal of uncovering vulnerabilities before they can be exploited in live environments.
This automated evaluation process enables developers to identify and mitigate novel agentic risks, such as unintended amplification of control or cascading sequences of actions, within a controlled and predictable setting. This capability is vital for building trust and reliability in AI systems that are expected to operate autonomously.
To further support the research and development in this critical area, the researchers have also made available the Nemotron-AIQ Agentic Safety Dataset 1.0. This comprehensive dataset comprises over 10,000 detailed traces of agent behaviors observed during both attack and defense simulations. This valuable resource is expected to empower the broader AI community to study and develop more resilient safety measures for the next generation of agentic AI, offering evolving insights into the operational dynamics of these complex systems.
The ongoing research in agentic system safety and security is expected to yield further advancements in safeguarding AI systems as they become more pervasive. The development and adoption of such unified frameworks are crucial steps toward ensuring that AI’s potential can be harnessed responsibly and securely for the benefit of society and industry alike. Future work will likely focus on refining these AI-driven testing methodologies and expanding the datasets to cover an even broader range of potential risks and operational scenarios.

