Data poisoning attacks, which exploit machine learning and AI by tainting training data for criminal ends, are widely considered to be the next big cybersecurity threat. But what exactly is data poisoning, and how does it endanger AI and machine learning models? We talked with Jeff Chan, vice president of technology at MOXFIVE, who gave us an overview of what data poisoning is, why it’s done, and how end users can help stop this emerging practice.
What is data poisoning?
Data poisoning is a type of attack where adversaries corrupt data, often a large data set, in order to manipulate the outcome that is derived from it. For example, they take spam or malicious emails and mark them as safe and legitimate. The systems or algorithms that are put in place to keep those emails out of our mailboxes let them through instead, and they land in our inbox. It’s very similar to, let’s say, fake Amazon or Yelp reviews. The fake reviewer is trying to poison a large data set so that a two-star product appears to be a five-star product.
So it’s the same concept here from a security perspective where people are just trying to say that this malicious email is really legitimate. And what ends up happening is that, as end users, we succumb to whatever our platforms are giving us.
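To put rough numbers on the fake-review comparison, here is a minimal Python sketch. The figures are invented for illustration: it assumes a product with 50 genuine two-star reviews and a site that shows a product as five stars once its average reaches 4.5. The function name fake_reviews_needed is hypothetical and does not describe how any real review platform works.

```python
def average_rating(ratings):
    """Plain arithmetic mean of a list of star ratings."""
    return sum(ratings) / len(ratings)


def fake_reviews_needed(genuine, target, fake_rating=5):
    """Smallest number of fake reviews that lifts the average to the target.

    Hypothetical helper: assumes every fake review carries the same rating
    and that the platform uses an unweighted average.
    """
    if target >= fake_rating:
        raise ValueError("target must be below the fake rating to be reachable")
    total = sum(genuine)
    count = len(genuine)
    fakes = 0
    # Add one fake review at a time until the average reaches the target.
    while (total + fakes * fake_rating) / (count + fakes) < target:
        fakes += 1
    return fakes


# Assumed data: 50 genuine reviews, all two stars.
genuine_reviews = [2] * 50
needed = fake_reviews_needed(genuine_reviews, target=4.5)
poisoned_average = average_rating(genuine_reviews + [5] * needed)

print(f"Genuine average: {average_rating(genuine_reviews):.1f}")
print(f"Fake reviews needed: {needed}")
print(f"Average after poisoning: {poisoned_average:.1f}")
```

Under these assumptions the attacker has to outnumber the genuine reviewers several times over, which is why this kind of poisoning only works at volume.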
How do threat actors go about the process of “poisoning” data?
What they do is set up fake accounts with Microsoft or Google, the two largest email service providers. They start sending malicious samples to those inboxes and marking them as safe, essentially saying “No, this is not spam. This is actually legitimate.” In doing so, they are training the Google and Microsoft systems.
When you do that at scale, not just with one user but with 20 or 30 different users marking a bunch of emails as legitimate, the sample data that Microsoft and Google have becomes contaminated or “poisoned” to the extent that fake or incorrect data supersedes the true data, which can have a very negative impact.
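The effect of those “not spam” reports can be sketched with a toy filter. This is only an illustration under loud assumptions: the word-count classifier, the sample messages, and the figure of 30 fake reports are all invented here, and real providers’ filters are far more sophisticated and are not described in this interview.

```python
from collections import Counter


def train(labeled_messages):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "safe": Counter()}
    for text, label in labeled_messages:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Label a message by which class its words were seen in more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][word] for word in words)
    safe_score = sum(counts["safe"][word] for word in words)
    return "spam" if spam_score > safe_score else "safe"


# Assumed clean training data: honest user reports.
clean_reports = [
    ("urgent wire transfer needed today", "spam"),
    ("verify your account password now", "spam"),
    ("urgent invoice wire payment overdue", "spam"),
    ("team lunch on friday", "safe"),
    ("meeting notes attached for review", "safe"),
    ("quarterly report draft attached", "safe"),
]

# The message the attacker wants delivered next week.
lure = "urgent wire transfer invoice attached"

# Assumed attack: 30 fake accounts each mark a copy of the lure as "not spam".
poisoned_reports = [(lure, "safe")] * 30

verdict_before = classify(train(clean_reports), lure)
verdict_after = classify(train(clean_reports + poisoned_reports), lure)

print("Before poisoning:", verdict_before)
print("After poisoning: ", verdict_after)
```

In this sketch the same message is flagged as spam when the filter learns only from honest reports and is let through once the fake reports outweigh them. Honest “this is spam” reports push the counts back the other way.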
Are data poisoning attacks limited to email?
Not necessarily. It can happen to other data sets depending on the organization. As an example, let’s use a company that tracks statistics for every professional basketball game around the globe. If a threat actor compromised that company’s environment, they could poison its large data set by inserting malicious or fake information into its databases, ultimately skewing the data. However, a major cybersecurity concern from a corporate perspective is business email compromise and wire fraud, and data poisoning can set the stage for that. Our end users are likely going to receive these attempts at a business email compromise. And when that happens, it’s really up to that user to make the right judgment call. Will they end up clicking that malicious email or will they not? And how can we prevent that?
How common is data poisoning right now?
Very common. Threat actors run sprints. So let’s say they have a campaign that’s going to go out next week. They’ll start running more sprints a couple of weeks beforehand to make sure their emails can actually get through the email providers and land in somebody’s mailbox. The more they do it, the better their results will be. It obviously takes a lot of effort to carry out data poisoning successfully, but that effort increases the chances that a victim falls into their trap. That in turn helps threat actors reach their end goal, which could be compromising an email account or, worse, compromising an environment, exfiltrating confidential data, and deploying ransomware.
It sounds like it is usually a means to another criminal end, but is it ever done simply for the sake of poisoning data?
It’s almost always financially motivated. A threat actor’s main goal is getting paid, whether by deploying ransomware or by conducting wire fraud through a business email compromise. Threat actors who gain access to a mailbox or an email account could also sell that access on the dark web. Other threat actors could then leverage it to further exploit the user, ultimately getting to a point where, worst case scenario, organizations get impacted by ransomware. There could also be a situation where an organization manages data for medicines or vaccines, and an attacker skews that data to the point where the research results say a specific product is good to go to market when, in fact, it is not. In that sort of case, incorrect or malicious data could ultimately impact users’ lives.
What can everyday users do to prevent data poisoning from happening?
To some extent we have to rely on the tools that these companies are developing to detect malicious data. But we can also do our part by not clicking on suspicious emails and training employees to do the same. In addition, correctly marking things as spam or malicious actually helps to train these systems to better detect them. If we don’t do anything, then these threat actors are going to keep adding more malicious data that will eventually overpower legitimate data and that’s a bad outcome for everyone.