Using Big Data to Protect Against Cyber Risk

Uncategorized / December 01, 2015

A Q&A with Lance Forbes of LemonFish Technologies
Of all Big Data’s capabilities, the means to proactively detect cyber breach events is especially intriguing. I spoke with Lance Forbes, chief scientist of LemonFish Technologies, to find out more about how analytics can be used to find lost data across the internet.

What’s the difference between forensics services and your company’s services?
Forensics goes in when a problem has been detected and responds to it, trying to figure out what happened retroactively. At LemonFish, we use Big Data and analytics to find the problem proactively.

It seems any Data Loss Prevention (DLP) system can be defeated, especially by a determined and malicious insider. Even the NSA, as technically sophisticated as it is, was the victim of a data breach. What can a company do if its DLP system is defeated?
The 2014 Verizon Data Breach Investigations Report stated that it took companies an average of 220 days to discover a data breach. That’s a problem and, potentially, a lot of exposure. The most important thing a company in this situation can do is detect the breach as fast as possible, remove the content from the public domain, and monitor for its reappearance. The way to reduce time to detection is to use a proactive analytics approach to look for data. The next step is to have a plan in place, including arrangements with legal support, to remove content where possible. A company should also set up maintenance monitoring to watch for copies of data that tend to pop up over time.
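
To make the idea of watching for copies concrete, here is a minimal Python sketch of one common way to recognize a reappearing document: hash overlapping runs of words ("shingles") and measure how many of the protected document's hashes show up in a page found online. This is an illustration only, not a description of how LemonFish's product works. The function names, the five-word shingle size, and the 0.5 threshold are all assumptions chosen for the example.

```python
import hashlib
import re


def shingle_hashes(text: str, size: int = 5) -> set:
    """Hash every run of `size` consecutive words.

    Only the hashes need to be kept by the monitoring job, so the
    raw protected text does not have to be stored alongside it.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return set()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def containment(protected: set, candidate: set) -> float:
    """Fraction of the protected document's shingles found in the candidate."""
    if not protected:
        return 0.0
    return len(protected & candidate) / len(protected)


def looks_like_copy(protected_text: str, candidate_text: str,
                    threshold: float = 0.5) -> bool:
    """Flag a candidate page when enough protected shingles reappear in it.

    The threshold is a hypothetical starting point and would need tuning.
    """
    score = containment(shingle_hashes(protected_text),
                        shingle_hashes(candidate_text))
    return score >= threshold
```

A scheduled job could run `looks_like_copy` against newly found pages and raise an alert on a match, which is one way to approach the maintenance monitoring described above.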

The Internet is a big place, especially if you include the deep and dark web. Despite Google’s massive infrastructure, it only indexes a fraction of the internet. How can a company look across such a large space to determine if their sensitive files are out there?
Federate existing search engines, and crawl high-value areas not indexed by others. Our product, LemonAid+ for Data Detection, does both of these things to help us monitor more of the Internet. Our technology can federate multiple sources. We create the right kind of queries to find out what’s out there while not exposing our customers’ sensitive information in the questions themselves. We talk with companies with extensive overseas operations to help them prevent business interruption, exposure, and theft of intellectual property. This requires our systems to look in the open, deep, and dark parts of the Internet.
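
As a rough illustration of what federating sources can mean in practice, the Python sketch below sends one query to several search backends and merges the results, recording which backends returned each URL. Everything here is hypothetical: the backends are plain callables standing in for real search engines or custom crawls, and `federated_search` and `normalize` are names invented for the example. It does not cover query construction, so keeping sensitive terms out of the query is left to the caller.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit

# A backend is any callable that takes a query and yields result URLs.
SearchBackend = Callable[[str], Iterable[str]]


def normalize(url: str) -> str:
    """Reduce a URL to host + path (+ query) so trivial variants match.

    Dropping the scheme and trailing slash is a simplification made
    for this sketch; real deduplication is usually more careful.
    """
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def federated_search(query: str,
                     backends: Dict[str, SearchBackend]) -> Dict[str, List[str]]:
    """Run one query on every backend and merge results by normalized URL.

    Returns a mapping of normalized URL -> names of the backends that
    returned it, in the order the backends were queried.
    """
    merged: Dict[str, List[str]] = {}
    for name, search in backends.items():
        for url in search(query):
            sources = merged.setdefault(normalize(url), [])
            if name not in sources:
                sources.append(name)
    return merged
```

Keeping track of which backend produced each hit is a deliberate choice: it shows where the sources overlap, which matters when deciding which ones are worth paying for.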

Given the size of the internet, looking everywhere, all the time, is cost-prohibitive. How can a company determine the best places to watch?
We will post an article on our site that discusses our approach in detail. In short, however, we use the topics our customer needs to investigate to compare sources of data. We compare how much novel information each source of data provides. This helps us assess the value of a search engine or a custom crawl. LemonFish wants to be sure our customer is getting the best value for the money they spend.
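
One simple way to picture the comparison described above is to run the same topic queries against each source and ask what share of a source's results no other source returned. The Python sketch below does exactly that. It is an assumption about how novelty could be scored, not LemonFish's actual method, and the function name and sample source names are invented for illustration.

```python
from typing import Dict, Set


def novelty_by_source(results: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each source by the fraction of its results that are unique to it.

    `results` maps a source name to the set of result identifiers
    (for example, normalized URLs) it returned for the topic queries.
    A score of 1.0 means nothing it found was seen elsewhere;
    0.0 means every result was also returned by another source.
    """
    scores: Dict[str, float] = {}
    for name, items in results.items():
        # Everything returned by the other sources combined.
        others: Set[str] = set().union(
            *(found for other, found in results.items() if other != name)
        )
        scores[name] = len(items - others) / len(items) if items else 0.0
    return scores


if __name__ == "__main__":
    sample = {
        "engine_a": {"u1", "u2", "u3"},
        "engine_b": {"u2", "u3"},
        "custom_crawl": {"u4"},
    }
    for source, score in novelty_by_source(sample).items():
        print(f"{source}: {score:.2f}")
```

In this sample, `engine_b` adds nothing that `engine_a` does not already cover, while the custom crawl is entirely novel. Dividing each score by what the source costs to run would turn it into a rough value-for-money figure.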

What are the key things the C-suite needs to know, so far as data loss is concerned?
First, they need to know where they store their protected data. This helps answer questions like, “Is this data stored in as few places as possible?” and “Is our data properly marked or watermarked?” Second, as soon as possible, they need to know if their sensitive data is appearing on the Internet. This allows them to act immediately, so they can minimize the damage. We have talked to Fortune 100 companies that don’t know where their data resides. We came out of the government, so we are used to classifying the sensitivity of data down to the paragraph, and we are often surprised to see that few companies in the private sector do this. But understanding where the data resides is really key.

In Summary…
We want to thank Dr. Forbes and LemonFish for raising this important asset risk topic. One may argue the theft of trade secrets is the leading cyber risk issue facing the Fortune 1000. It serves to remind us all that protecting intellectual property may require undertaking proactive and sophisticated measures.


NetDiligence® is a cyber risk assessment and data breach services company. Since 2001, NetDiligence has conducted thousands of enterprise-level cyber risk assessments for organizations. NetDiligence services are used by leading cyber liability insurers in the U.S. and U.K. to support loss-control and education objectives. NetDiligence hosts a semiannual Cyber Liability Conference attended by risk managers, privacy attorneys and cyber liability insurance leaders from around the world. NetDiligence is also an acknowledged leader in data and privacy breach prevention and recovery. Its eRiskHub® portal is licensed by cyber liability insurers to provide education and breach recovery services to their policyholders.
