What if agentic systems could defend themselves from adversarial attacks? In our paper, we experimented with and developed a defensive system for AI agents to protect themselves against adversarial prompt attacks and jailbreak attempts. We demonstrated that our multi-agentic systems can:
This will have a significant impact in the near term, especially as agentic systems are required to function autonomously without human supervision.
Our work is available at: Preprint Guardians of the Agentic System: Preventing Many Shots Jailb...
Our code and experiments are open-sourced at: https://github.com/GitsSaikat/Guardian-Agent