Cyber Modeling And The Importance Of Collecting The Right Data
In this discussion with NetDiligence President Mark Greisiger, Scott Stransky, Vice President of Cyber Modeling at AIR Worldwide—and a member of the Verisk Cyber Solutions team—explains the important role of firmographic and technographic data in effective cyber risk modeling and risk management for insurance carriers and reinsurers.
While ransomware remains a hot topic, in reality, it’s one of many cyber risks faced by businesses and insurance carriers—risks that continually evolve and change. For instance, who would have thought that most of the world would be working from home with all the potential security risks involved? For insurance carriers and reinsurers, this makes cyber coverage underwriting challenging. It’s difficult to have an efficient process, develop effective coverage, and be profitable when so many risks are present (and ever changing).
At NetDiligence, we’ve launched a 25-person ransomware advisory group to forecast issues over the next six to twelve months. As someone on the cutting edge of helping insurance carriers understand these complex issues, Scott Stransky has many valuable insights to share—beginning with answering a general question about data and modeling.
MG: Data, of course, is becoming more and more critical to cyber modeling. It seems very hard to collect the quality and kind of data needed. Scott, what types of data do you use, and how do you process these data?
SS: Mark, data’s really important to cyber modeling and to looking at cyber risk as a whole. I think about data in a few different buckets. The first bucket is what we call firmographic data. By firmographic data, we’re talking about things like a company’s revenue, the industry it’s in, how many employees it has—very basic facts about a risk. This gives us a first pass at how risky, from a cyber perspective, a particular company will be.
It’s important to collect this data in a large database because we know it’s somewhat difficult to collect good, detailed firmographic data at the time of underwriting. That being said, collecting it is easier than collecting what we would call technographic data. And, technographic data is perhaps more interesting than firmographic data. With technographic data, we’re thinking about things such as which cloud service provider or email hosting provider they use, but also things like how good is their patching cadence—is there evidence of bot infection in their network? We can collect a lot of different things, and we’ve spent a lot of time over the past few years trying to source out the best firmographic and technographic data possible.
On the firmographic side, we look at hundreds of millions of risks around the world, and we try to understand where these are located, which countries they are in, and which industries are most important. But when we think about technographics, what we’ve done is deploy a network of sensors across the internet. These sensors collect, without requiring a company’s permission, information on whom a company is talking to, how often, and whether that traffic is legitimate.
What does that mean? I’ll give you an example using a police analogy. Think about a police force out to catch drunk drivers on a holiday weekend. The police can’t be everywhere, but they know where drunk drivers tend to congregate: major highways, party districts, college towns, and so on.
So, what we can think of in a virtual sense is this network of police we’ve deployed across the internet, scoping out where malicious traffic is flowing. We may not collect all of it, but we’ll collect the vast majority of what we call malicious intersections. These are intersections of real companies with malicious sites, which may be known botnet websites or command-and-control sites for ransomware.
This allows us to understand who’s doing good things—who’s talking to Amazon Web Services or Microsoft Azure—versus who’s talking to the Mirai botnet or a Maze-style ransomware server. We’re able to differentiate this traffic. And because we have this network at the raw level, we’re not bound by any pre-baked scores. We’re able to create and train scores on our own data sets.
We have many of our own data sets. We have insurance claims data and historical incident data that allows us to take that raw technographic and firmographic data and actually translate it into an insurance use case. Meaning we’re not just saying, “Oh, you have 651 malicious intersections.”
An insurance company is not necessarily going to know what to do with that. So, we actually train models to understand what it means for your cyber insurance risk if you have 651 malicious intersections (and you’re a company based here with certain criteria and characteristics). We’re trying to add that extra level of value to the data.
MG: One thing that is always mind-boggling to me in cybersecurity, and even in traditional parts of insurance: there’s just so much data out there. How do you know that the data you’re collecting actually adds value to the cyber risk modeling results?
SS: There are several different ways to see if it’s adding value, but we employ the scientific method, where you build a control model and then run experiments on top of it. Our control model is one that’s based on just revenue and industry. That’s the most basic underwriting criteria you can use: it’s a large retailer, it’s a small bank. That’s a decent model, but we know we can do better than that. So, we first build a model and train it based on those two criteria alone.
We then add on technographics one at a time, five at a time, 20 at a time. We do it in many different combinations and versions until we get a model that gives us a statistical lift above that control model—in insurance terms, the version that adds the most profitability.
If you were running just that control model, how profitable would your book of business be? If you were running our model with the firmographics plus the technographics, how much more profitable would you be? We can explicitly quantify that lift (how much extra profitability you get with our technographic data and with our full model). We can literally, scientifically prove the extra value you get by adding this data in.
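The lift experiment Scott describes can be illustrated with a minimal, self-contained sketch. Everything here is invented for illustration—the industry base rates, the flat severity, the adverse-selection rule, and the assumed link between malicious-intersection counts and breach probability are hypothetical assumptions, not AIR’s actual model. The sketch compares a control model (industry average only) against a full model that also prices each company’s own technographic signal:

```python
import random

random.seed(7)

# Hypothetical parameters, chosen for illustration only.
BASE_RATE = {"retail": 0.04, "bank": 0.06, "healthcare": 0.08}
SEVERITY = 1_000_000  # flat loss amount per incident, for simplicity

def make_company():
    industry = random.choice(list(BASE_RATE))
    intersections = random.randint(0, 1000)              # observed malicious intersections
    true_p = BASE_RATE[industry] + intersections / 5000  # assumed link to breach risk
    return industry, intersections, true_p

portfolio = [make_company() for _ in range(50_000)]
avg_inter = sum(c[1] for c in portfolio) / len(portfolio)

def control_rate(industry, _inter):
    # Control model: industry average only, blind to technographics.
    return BASE_RATE[industry] + avg_inter / 5000

def full_rate(industry, inter):
    # Full model: prices each company's own technographic signal.
    return BASE_RATE[industry] + inter / 5000

def expected_profit(rate_fn):
    profit = 0.0
    for industry, inter, true_p in portfolio:
        premium = 1.2 * rate_fn(industry, inter) * SEVERITY  # 20% load
        # Adverse selection: companies that are overpriced buy elsewhere,
        # so the blind insurer is left holding the riskier accounts.
        if premium <= 1.25 * true_p * SEVERITY:
            profit += premium - true_p * SEVERITY
    return profit

control_profit = expected_profit(control_rate)
full_profit = expected_profit(full_rate)
print(f"control model profit:     {control_profit:,.0f}")
print(f"full model profit:        {full_profit:,.0f}")
print(f"lift from technographics: {full_profit - control_profit:,.0f}")
```

Under these assumptions the control model retains mostly the accounts it has underpriced, while the full model prices every risk at its level—the difference between the two profit figures is the “lift” being quantified.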
MG: Ransomware is on the minds of every carrier and reinsurer that we talk to. The whole industry is suffering large losses from targeted ransomware. It’s a major problem they’re trying to get their arms around. Can you actually help model this?
SS: We think about ransomware in a couple of areas. One would be systemic ransomware like NotPetya or WannaCry—which are very widespread, almost scattershot, events. And then we also think about targeted ransomware, or individual risk ransomware, where bad actors focus on a particular organization and really go after that one company. These each have different ramifications and different modeling methodologies.
When we’re thinking about systemic ransomware, we need to understand the points of aggregation that could lead to it. Things like which operating system is in use, how well it’s patched, and maybe even the geographic extent of a company’s internet footprint. And we saw with NotPetya that it hit companies based in Ukraine because of how it was released into the wild.
With targeted ransomware, we also have to think about data exfiltration. Unlike systemic ransomware, which is scattershot, targeted attacks often exfiltrate data from a company, which can cause even more loss than business interruption and remediation alone. So we need to focus on the various aspects of it to build appropriate models for both systemic and targeted.
One other thing we need to think about for targeted ransomware is the correlation amongst the risks. While it is still individual bad actors going after individual companies, it can be quite correlated.
We saw in the news last year a campaign of targeted ransomware against hospitals: one bad actor going after one hospital at a time, but across the group of bad actors, lots of hospitals were being hit. So while it wasn’t a systemic campaign—meaning a single worm released onto the internet that propagated on its own—it was a correlated event. We need to explicitly model that correlation to capture what’s going on in the world.
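One way to see why that correlation matters is to simulate a portfolio of hospitals two ways: with fully independent attacks, and with a shared “campaign” shock that temporarily raises every hospital’s attack probability. The probabilities and multipliers below are hypothetical, chosen purely to illustrate the effect, not calibrated model parameters:

```python
import random
import statistics

random.seed(11)

# Hypothetical parameters for illustration only.
N_HOSPITALS = 200
BASE_P = 0.02       # per-hospital annual chance of targeted ransomware
CAMPAIGN_P = 0.10   # chance a coordinated campaign runs in a given year
CAMPAIGN_MULT = 8   # campaign multiplies each hospital's attack probability

def simulate_year(correlated):
    # In a campaign year, every hospital's probability rises together:
    # that shared shock is the correlation being modeled.
    p = BASE_P
    if correlated and random.random() < CAMPAIGN_P:
        p = min(1.0, BASE_P * CAMPAIGN_MULT)
    return sum(random.random() < p for _ in range(N_HOSPITALS))

def tail(years):
    years.sort()
    return years[int(0.99 * len(years))]  # 99th-percentile incident count

indep = [simulate_year(False) for _ in range(10_000)]
corr = [simulate_year(True) for _ in range(10_000)]

print("mean incidents, independent:", statistics.mean(indep))
print("mean incidents, correlated: ", statistics.mean(corr))
print("99th pct,       independent:", tail(indep))
print("99th pct,       correlated: ", tail(corr))
```

The point of the sketch: a model that treats each hospital as independent badly understates the tail, because campaign years concentrate many incidents into the same policy year—exactly the correlated behavior a portfolio model has to capture explicitly.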
MG: I’m going to end on a quick question about looking into a crystal ball: What do you see as maybe the next frontier of cyber modeling?
SS: I would say right now we have a pretty good grasp on what we would call affirmative cyber—meaning you write a cyber insurance policy and know the potential for loss from that policy or a portfolio of those policies.
Where I think the next frontier lies is in what we could call non-affirmative cyber, silent cyber, or cyber as a peril. It’s effectively the cyber risk embedded in non-cyber policies. I know silent cyber is always a topic at the conferences you put on. It has become a big deal, and we’re actively researching it and building out new models for it. We’re thinking about things like risk to property policies, bodily injury risk, and many other forms of risk from cyber incidents. We’re working closely with our client partners to understand their needs on the non-affirmative side.
For further insights, watch the full discussion about the data challenges insurance carriers and reinsurers face when it comes to understanding cyber risk, and about NetDiligence’s new ransomware advisory group. And don’t miss the latest post-attack ransomware issue more companies are facing today.