For many years now, the path towards business digitization has been driving us to search for relevant information in the data generated in daily operations. SQL cubes, pivot tables in spreadsheets and data mining were some of the tools and strategies that we used back in the day. More recently, we started talking about big data because of the huge volume of data that was being generated. Let’s take a look at what dark data is all about.
With the latest boost that the pandemic gave to the digital transformation of all organizations, the amount of data available exceeds anything previously imaginable. The current buzzword is dark data. Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes”, such as business analytics, business relationships or direct monetizing of this data, which is often stored only because it’s required by law.
If data sits unused, storing and protecting it can be more expensive (and sometimes riskier) than it’s worth. Does that make any sense? Obviously not, especially when data-driven decisions need to be made to achieve a competitive advantage, improve results or shorten the time to return on investment.
It is well known that data is generated in every business process flow, every communication, and every task executed in a digital environment, and that’s a lot of data of different types and with different characteristics. Data governance is the strategy designed by each organization to find, identify, sort, process, and leverage as much dark data as possible, so that the value it contains can be unlocked.
Just as a detailed mapping of systems, processes, and circuits that information travels through is key to any cybersecurity strategy, the same applies to identifying what type of information is generated, when and where, so as to design methods that allow value to be obtained from that data and move it from darkness to light.
“Artificial Intelligence (AI) can be very useful in helping to make sense of unstructured data that’s not being used. By using AI and machine learning techniques, people can work with 1% of dark data and classify their relevance. Then, a reinforcement learning model can quickly produce relevance scores for the remaining data to prioritize which data to look at more closely.” Source: Mexico CIO magazine
Amazon created a solution for these tasks: Textract. It’s a service that extracts text and structured data, such as tables and forms, from documents. It goes beyond simple optical character recognition (OCR) by also extracting relationships and structure. Microsoft isn’t far behind with its Azure Cognitive Services, and neither are IBM with Datacap or Google with its various APIs.
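Once text has been extracted, the idea of labeling a small sample and letting a model score the rest can be illustrated with a minimal sketch. This is not the reinforcement learning model mentioned in the quote, and it is not tied to any of the vendor services named here. It is a simple naive Bayes-style word scorer written with the Python standard library only. The function names (`train`, `relevance_score`, `rank`) and the sample documents are hypothetical and made up purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(labeled_docs):
    """Count words per label from a small hand-labeled sample.

    labeled_docs is a list of (text, is_relevant) pairs.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled_docs:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return a score between 0 and 1; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    # Prior log-odds from the labeled sample, with add-one smoothing.
    log_odds = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the labeled sample carry no evidence
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irr = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_rel / p_irr)
    # Clamp before the logistic transform to avoid overflow on long documents.
    log_odds = max(-50.0, min(50.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank(model, docs):
    """Sort unlabeled documents from most to least likely relevant."""
    return sorted(docs, key=lambda d: relevance_score(model, d), reverse=True)


# Hypothetical hand-labeled sample (standing in for the "1%" reviewed by people).
labeled = [
    ("invoice overdue payment customer contract", True),
    ("customer complaint refund contract renewal", True),
    ("printer driver debug log heartbeat", False),
    ("heartbeat ping log rotation debug", False),
]
# Hypothetical remaining dark data to prioritize.
unlabeled = [
    "debug heartbeat log archive",
    "contract renewal payment reminder",
]

model = train(labeled)
ranked = rank(model, unlabeled)
for doc in ranked:
    print(f"{relevance_score(model, doc):.3f}  {doc}")
```

In practice the labeled sample would be far larger and the model more capable, but the shape of the workflow is the same: people judge a small slice, and the scores decide which of the remaining documents deserve a closer look first.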
Although roles associated with data analysis have been emerging for some time, university degree programs in Data Science are quite new. Organizations are progressively incorporating this type of talent to work specifically on the identification, recovery and enhancement of dark data.
Only 35% of organizations classify all their data. That’s bad news for the remaining 65% that only do so partially or not at all, because we also found that organizations that classify all their data make more effective business decisions and display higher levels of trust.
The scope of data governance has lately broadened from a technical approach (master data management, data catalogs, data quality, etc.) to incorporate data privacy, protection, and sovereignty. But, as stated by Forrester, organizations have a growing appetite to leverage their data for a business advantage, whether through internal collaboration, cross-ecosystem data sharing, direct marketing, or as the basis for AI‑driven business decision-making.
Forrester also advises that, in doing so, organizations must be careful to maintain policies that engage employees, partners, and customers in their approach to leveraging data while respecting current regulations (compliance).