Introduction
Anytime something new comes along, there's always going to be somebody that tries to break it. AI is no different, and this is why it seems we can't have nice things. In fact, we've already seen more than 6,000 research papers, exponential growth, that have been published related to adversarial AI examples. Now, in this post, we're going to take a look at six different types of attacks, major classes and try to understand them better. And then stick around to the end where I'm going to share with you three different resources that you can use to understand the problem better and build defenses.
1. Prompt Injection Attack
So you might have heard of a SQL injection attack. When we're talking about an AI, well, we have prompt injection attacks. What does a prompt injection attack involve? Well, think of it as sort of like a social engineering of the AI. So we're convincing it to do things it shouldn't do.
Sometimes it's referred to as jailbreaking, but we're basically doing this in one of two ways. There's a direct injection attack where we have an individual that sends a command into the AI and tells it to do something. Pretend that this is the case. Or, I want you to play a game that looks like this.
I want you to give me all wrong answers. These might be some of the things that we inject into the system. And because it's wanting to please, it's going to try to do everything that you ask it to, unless it's been explicitly told not to do that, it will follow the rules that you've told it. So you're setting a new context, and now it starts operating out of the context that we originally intended it to. And that can affect the output.
Another example of this is an indirect attack, where maybe I have the AI, I send a command, or the AI is designed to go out and retrieve information from an external source, maybe a web page. And in that web page, I've embedded my injection attack. That's where I say, now pretend that you're going to give me all the wrong answers and do something of that sort. That then gets consumed by the AI and it starts following those instructions.
So this is one major attack. In fact, we believe this is probably the number one set of attacks against large language models, according to the OWASP report.
2. Infection
What's another type of attack that we think we're going to be seeing? In fact, we've already seen examples of this to date, is infection. So we know that you can infect a computing system with malware. In fact, you can infect an AI system with malware as well. In fact, you could use things like Trojan horses or back doors, things of that sort that come from your supply chain.
And if you think about this, most people are never going to build a large language model because it's too compute intensive, requires a lot of expertise, and a lot of resources. So we're going to download these models from other sources. And what if someone in that supply chain has infected one of those models? The model then could be suspect. It could do things that we don't intend it to do. And, in fact, there's a whole class of technologies, machine learning, detection and response capabilities, because it's been demonstrated that this can happen, these technologies exist to try to detect and respond to those types of threats.
3. Evasion
Another type of attack class is something called evasion. And in evasion, we're basically modifying the inputs into the AI. So we're making it come up with results that we were not wanting.
An example of this that's been cited in many cases was a stop sign where someone was using a self-driving car or a vision-related system that was designed to recognize street signs. And normally it would recognize the stop sign, but someone came along and put a small sticker, something that would not confuse you or me, but it confused the AI massively to the point where it thought it was not looking at a stop sign, it thought it was looking at a speed limit sign, which is a big difference and a big problem if you're in a self-driving car that can't figure out the difference between those two. So sometimes the AI can be fooled, and that's an evasion attack in that case.
4. Poisoning
Another type of attack class is poisoning. We poison the data that's going into the AI, and this can be done intentionally by someone who has bad purposes in mind.
In this case, if you think about our data that we're going to use to train the AI, we've got lots and lots of data. And sometimes introducing just a small error, a small factual error into the data is all it takes in order to get bad results.
In fact, there was one research study that came out and found that as little as 0.001% of error introduced in the training data for an AI was enough to cause results to be anomalous and be wrong.
5. Extraction
Another class of attack is what we refer to as extraction. Think about the AI system that we built and the valuable information that's in it. So we've got in this system potentially intellectual property that's valuable to our organization. We've got data that we maybe used to train and tune the models that are in here. We might have even built a model ourselves. And all of these things we consider to be valuable assets to the organization.
So what if someone decided they just wanted to steal all of that stuff? Well, one thing they could do is a set of extensive queries into the system. So maybe I ask it a little bit and I get a little bit of information.
I send another query, I get a little more information. And I keep getting more and more information. If I do this enough, and if I fly sort of slow and low below radar, no one sees that I've done this, in enough time I've built my own database and I have basically lifted your model and stolen your IP, extracted it from your AI.
6. Denial of Service (DOS)
And the final class of attack that I want to discuss is denial of service. This is basically just to overwhelm the system by sending too many requests, there may be other types of this but the most basic version, you just send too many requests into the system and the whole thing goes boom.
It cannot keep up, and therefore it denies access to all the other legitimate users. If you've read some of my other posts, you know I often refer to a thing that we call the CIA triad. It's confidentiality, integrity, and availability. These are the focus areas that we have in cyber security.
We're trying to make sure that we keep this information that is sensitive available only to the people that are justified in having it, and with integrity that the data is true to itself, it hasn't been tampered with, and availability that the system still works when you need it to.
Well, in IT security generally, historically, what we have mostly focused on is confidentiality and availability. But there's an interesting thing to look at here. If we look at these attacks, confidentiality, well that's definitely what the extraction attack is about. And maybe it could be an infection attack if that infects and then pulls data out through a back door.
But then let's take a look at availability, well, that's basically there. This denial of service is an availability attack. The others, though, this is an integrity attack. This could be an integrity attack. This is an integrity attack. I hope you understand.
So you see what's happening is, in the era of AI, integrity attacks now become something we're going to have to focus a lot more on than we've been focusing on in the past. So, be aware.
How To Protect Your AI Model From Cyber Attacks
Now I hope you understand that AI is the new attack surface. We need to be smart so that we can guard against these new threats. And I'm going to recommend three things for you that you can do that will make you smarter about these attacks. And by the way, the links to all of these things are down in the description below, so please make sure you check that out.
First of all, a couple of posts I'll refer you to, one that I did on securing your computer and another on comprehensive guide to cyber attacks. Both of those should give you a better idea of what the threats look like and in particular some of the things that you can do to guard against those threats.
The next thing, download IBM's the guide to cybersecurity in the era of generative AI. That's a free document that will also give you some additional insights and a point of view on how to think about these threats. Check their website for detail on how to download.
Finally, there's a tool that IBM research group has come out with that you can download for free also, and it's called the Adversarial Robustness Toolkit. And this thing will help you test your AI to see if it's susceptible to at least some of these attacks. Subscribe to this blog so we can continue to bring you content that matters to you. Happy Learning!!
Print this post
0 Comments