For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.
This unknown and unused data, known as dark data comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.
In this article, we’ll look at this dark data. Just how much of it is created by companies, what are the reasons this data isn’t being analyzed, and what are the costs and implications of companies not using the majority of the data they collect.
Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:
“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).
To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much is dark. Respondents were from IT and business roles, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. across various industries. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”
Of the vast quantities of data collected by companies, how much of it is dark data? The chart below shows the estimate among the 1,300 executives on the percentage of their data that goes unused:
Fifty five percent of all data collected by companies is dark data. Within this category of dark data lies two subcategories — data that they know has been captured but don’t know how to use and data that they are not even sure with certainty that they have.
Furthermore, while 55% is the global average for dark data, some companies have a lot more and others less.
Just 11% of executives believe that less than a quarter of their organization’s data is dark. On the other hand, three times as many people believe that more than 75% of their company’s data is dark.
Business leaders in France think that their companies have the highest rates of dark data. 42% of executives at French companies estimate that more than 75% of the data they collect is unusable. On the other hand, leaders in China have the highest confidence that their organization is using most of the data they collect. Even still, 44% of executives in China believe that over half the data they collect is dark data.
While the costs of storing data has decreased overtime, the cost of saving septillions of gigabytes of wasted data is still significant. What's more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?
The following chart shows the reasons why dark data isn’t currently being harnessed:
By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.
Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in useable format.
Additionally, dark data exists because there is simply too much data out there and a lot of is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. Additionally, these kinds of datasets often time require analysis by individuals with significant data science expertise who are often is short supply.
The implications of the prevalence are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. The next two charts show the percentage of executives who think their companies knows where all the sensitive data is located and those that think their companies are compliant with consumer data protection laws:
According to a survey from digital security company Gemalto, 46% of executives believe their companies do not know where all the sensitive or private information is stored.
Lack of clarity on where data is stored makes it difficult to safeguard sensitive information. Considering the cost of dealing with data breaches and associated government penalties, dark data can be very expensive; the list of companies receiving substantial GDPR fines thus far is expanding rapidly.
With businesses generating more and more data the costs and liabilities associated with dark data continue to grow. For business leaders aiming to increase their usability and security of data collected, what are some options they should consider? The chart below from the Splunk State of Dark Data report shows the percentage of leaders that view various solutions rated as having potential for solving dark data problems:
The most promising solution according to business leaders is talent. 76% of executives in the Splunk survey rated training current employees in data science as a potential solution and 70% also thought hiring more data experts was promising.
Business leaders also rated new software solutions as the second most promising avenue for making more data usable. 75% of those surveyed suggested that software allowing less technical employees to analyze large data sets would help them alleviate the dark data problem. Additionally, executives rate increased training about the value of data, increased funding for data projects and artificial intelligence as potential solutions.
The value of data is only projected to increase as machine learning and artificial intelligence become mainstream solutions employed by most businesses. At the same time, data that is misused or improperly protected makes business vulnerable to legal action or theft from hackers. Both of these trends conspire to make dark data an increasingly expensive problem. Those good news however is that software and data science training is increasing geared towards solving the problem of dark data. After all, what’s the good of collecting data if you can’t put it to work?
Splunk helps organizations worldwide turn data into doing. With solutions for IT, security, IoT and business operations, Splunk empowers people to make faster, better decisions and take action on all kinds of data in real-time. Read more about dark data, AI and the future of work in The State of Dark Data Report.