Dark Data And Why It Matters In Big Data
By Thanh Pham
Dark data is data that is generated by regular business activities but rarely used. It is not analyzed for insights that could inform business decisions. Instead, it tends to be retained, mostly for compliance purposes. Handling and storing it often translates into additional expense and an elevated risk of manipulation rather than added value for the business.
How big is dark data in big data?
According to IBM, an estimated 90 percent of the big data collected from multiple streams is dark data that never gets used. The main reasons are that companies may generate more data than they can process, the available analysis tools may be ineffective, and the data may come in types and formats that are incompatible with existing analytics tools. The cost of analyzing dark data may also be too high for most businesses.
Some of the dark data collected during typical business operations include:
- Customer call detail records
- Server log files depicting the behavior of website visitors
- Mobile geo-location data
- Data stored in obsolete storage devices that can no longer be accessed
- Raw survey data
- Customer profiles and information
- Email correspondence
- Financial statements
- Old documents and business notes
These examples show that collecting dark data is inevitable. Although it is rarely used, it is an essential component of business data.
Analysis of Dark Data
As mentioned earlier, much of the dark data collected from business processes is unstructured, which makes it difficult to categorize and analyze with standard software.
To make dark data analyzable, businesses have to convert it (for example, video, audio, and email) into structured formats that conventional tools can handle.
However, this conversion often takes multiple stages and is done manually, although some formats can be converted with third-party tools and software. These approaches require a significant investment of time and money, and so can be costly for businesses. The resources needed to analyze dark data are the main reason businesses fail to use it, even though most find their intelligence reporting inadequate.
Nonetheless, dark data has the potential to improve business intelligence.
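To make the conversion step described above more concrete, here is a minimal sketch of turning one piece of unstructured correspondence into a structured record. It assumes the email is available as raw text and uses only Python's standard library. The sample message, the field names, and the helpers email_to_record and records_to_csv are all hypothetical choices for illustration, not a prescribed approach.

```python
import csv
import io
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical raw email, standing in for one item of archived correspondence.
RAW_EMAIL = """\
From: Jane Doe <jane@example.com>
To: support@example.com
Subject: Delayed order
Date: Mon, 03 Jun 2024 09:15:00 +0000

My order has not arrived yet. Could you check its status?
"""


def email_to_record(raw: str) -> dict:
    """Turn one raw email into a flat record with fixed fields."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # plain string for a non-multipart message
    return {
        "sender": parseaddr(msg["From"])[1],
        "recipient": parseaddr(msg["To"])[1],
        "subject": msg["Subject"],
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
        "body_words": len(body.split()),
    }


def records_to_csv(records: list) -> str:
    """Serialize records as CSV so conventional tools can load them."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = email_to_record(RAW_EMAIL)
print(records_to_csv([record]))
```

Once correspondence is in rows and columns like this, it can be loaded into a spreadsheet or database. Real archives would also need handling for attachments, multipart messages, and malformed headers, which this sketch leaves out.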
Advantages of dark data
When put into use, dark data can be beneficial to businesses. Some of the top benefits include:
Drawing value from data collected during mundane business processes
While some processes can be mundane and repetitive, the data they generate has the power to transform how businesses deliver services to their clients. For instance, your website's server logs can provide insights into how user-friendly the site is. Call logs and email correspondence are also useful sources of data for gauging customer satisfaction and the quality of your service delivery.
Analyzing dark data lets businesses make the most of every data element they collect.
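As a rough illustration of the server log example, the sketch below counts requests and error responses per page. It assumes the logs follow the Common Log Format, which may not match your server's configuration. The sample lines, the LOG_PATTERN expression, and the summarize helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical access-log lines in Common Log Format; real logs may differ.
SAMPLE_LOG = [
    '203.0.113.5 - - [03/Jun/2024:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.9 - - [03/Jun/2024:10:00:04 +0000] "GET /checkout HTTP/1.1" 500 312',
    '203.0.113.5 - - [03/Jun/2024:10:00:09 +0000] "GET /checkout HTTP/1.1" 500 312',
    '198.51.100.7 - - [03/Jun/2024:10:00:15 +0000] "GET /old-page HTTP/1.1" 404 128',
    'malformed line that should be skipped',
]

LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Count requests and error responses (status >= 400) per path."""
    requests = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match["path"]
        requests[path] += 1
        if int(match["status"]) >= 400:
            errors[path] += 1
    return requests, errors


requests, errors = summarize(SAMPLE_LOG)
for path, count in errors.most_common():
    print(f"{path}: {count} of {requests[path]} requests failed")
```

Pages with a high share of failed requests are a natural starting point when asking how friendly a website is to its users.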
Minimizing accumulation of the data to maximize the value of real-time information
Dark data accumulates rapidly, and if it is not used quickly, the sheer volume can overshadow the usefulness of the information. Extracting value from dark data before it outgrows your storage capacity lets you benefit from the information and avoid additional storage costs, because obsolete data can be discarded once it has been used.
Simplifying the costly and tedious analysis of large volumes of dark data
Analyzing dark data involves converting it into simpler formats that conventional tools and software can handle. When volumes are large, this process can be tedious and costly. When the analysis is done continuously, while data volumes are still low, it becomes less tedious and less costly, and the business can derive value from the dark data in real time.
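One way to picture continuous analysis is a summary that is updated as each small batch arrives, so the raw batch does not have to be kept for a later bulk run. The sketch below assumes the data is a stream of numeric values, here hypothetical call durations in seconds. The RunningSummary class and the sample batches are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Running count, total, and maximum kept up to date batch by batch."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def update(self, batch):
        for value in batch:
            self.count += 1
            self.total += value
            self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical daily batches of call durations in seconds.
daily_batches = [
    [120, 300, 45],
    [600, 90],
    [30, 30, 240, 180],
]

summary = RunningSummary()
for batch in daily_batches:
    summary.update(batch)  # the raw batch can be archived or dropped afterwards
    print(f"calls={summary.count} mean={summary.mean:.1f}s max={summary.maximum}s")
```

Because each update touches only the newest batch, the work per day stays small no matter how much data has accumulated overall.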
Best practices for handling dark data
What data is being collected, and why?
It is important to be deliberate about the data you collect and handle as a business. Communicate with potential users in your organization about the type of data being collected and why the collection is necessary.
Periodic audits and database trimming
Dark data should be analyzed continuously. Periodic audits and database trimming, which discard data sets that are no longer useful for informing business processes, should complement this continuous analysis. Organizations should establish clear guidelines for the retention and disposal of data. This work can be done by either in-house or outsourced software development teams.
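A retention guideline can be expressed as a simple rule set that an audit script checks against an inventory of stored data. The sketch below is one possible shape for such a check. The categories, the retention periods in RETENTION_DAYS, the inventory entries, and the audit function are all hypothetical, and real retention periods depend on your organization's policy and legal obligations.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days per data category.
RETENTION_DAYS = {
    "server_log": 90,
    "survey_raw": 365,
    "financial_statement": 7 * 365,
}

# Hypothetical inventory of stored data sets.
INVENTORY = [
    {"name": "web-2024-01.log", "category": "server_log", "created": date(2024, 1, 31)},
    {"name": "web-2024-05.log", "category": "server_log", "created": date(2024, 5, 31)},
    {"name": "survey-2022.csv", "category": "survey_raw", "created": date(2022, 11, 1)},
    {"name": "fy2020.pdf", "category": "financial_statement", "created": date(2020, 12, 31)},
    {"name": "notes.txt", "category": "unknown", "created": date(2019, 3, 1)},
]


def audit(inventory, today):
    """Split the inventory into keep, dispose, and needs-review lists."""
    keep, dispose, review = [], [], []
    for item in inventory:
        days = RETENTION_DAYS.get(item["category"])
        if days is None:
            review.append(item["name"])  # no rule: a person must decide
        elif today - item["created"] > timedelta(days=days):
            dispose.append(item["name"])
        else:
            keep.append(item["name"])
    return keep, dispose, review


keep, dispose, review = audit(INVENTORY, today=date(2024, 6, 30))
print("keep:", keep)
print("dispose:", dispose)
print("review:", review)
```

The script only reports candidates. Anything without a matching rule goes to a review list rather than being deleted, so that data of unknown purpose gets a human decision.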
Data encryption
Encryption enhances the security of stored data. This is a critical practice, given that businesses are sometimes unaware of some of the data they are collecting.
Dark data remains largely unused in most organizations. However, it can be an excellent source of information that adds significant value to how businesses operate.