In our recent paper on prompt injections, we derived new threats facing applications built on top of LLMs. In this post, I will take these abstract threat models and show how they will affect software being deployed to hundreds of millions of users- including nation-states and militaries. We will look at LLM applications in escalating order of stupidity, ending with attackers potentially compromising military LLMs to suggest kinetic options (a euphemism for bombing people with battlefield AIs compromised by bad actors)⊕ I guess it would be an evil use-case no matter if the battlefield AI was compromised by other evil actors- icing on the cake.
My take so far is that there isn’t really any great options to protect against prompt injections. Simon Wilson presents an idea here on his blog which could is a bit interesting. NVIDIA has open sourced a framework for this as well, but it’s not without problems. Otherwise I’ve mostly seen prompt injection firewall products but I wouldn’t trust them too much yet.