What is dark data?

by Michael Schenkel | 25.01.2021

Dark data – a business model built on data!

Imagine a former Google employee and a former Pinterest employee jointly developing a new app. The app is supposed to combine a social network with an audio platform. The aim is to give people the opportunity to talk with each other. The app is released as a beta version[1] and only runs on Apple devices. But with the right idea, these limitations need not be a disadvantage. What is needed is a marketing strategy that increases demand and takes advantage of the FOMO (fear of missing out) effect[2]. “Let’s use invitations! Only those who receive an invitation from an active user can join! And later, when we have enough reach, we’ll open the network to everyone!” No sooner said than done. Less than a year later, the app has 600,000 users[3], and the number is rising sharply. And it is valued at just under 100 million US dollars[4]. That sounds like a nice success story, doesn’t it? From zero to 100 million in less than a year. Impressive!

How would you like it if this company had your contact details? And not because you decided to use the app, but because you know someone who uses it. That someone has given the app access to your contact details. Not in bad faith, of course. He or she didn’t ask you if you actually wanted it, but what’s the worst that could happen? “I’m sure the company isn’t particularly interested in using the data!” However, the question then arises as to why access to the data is explicitly demanded by the app! The motives remain in the dark. And there is a term for data that is collected by companies in this or a similar way: dark data.

The definition of dark data

The American market research company Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value”.

To be honest, I do not think much of this definition, because

  • it fits Small Data much better than dark data. Small Data is data that exists in companies but is not actively used, e.g. unused customer feedback. However, Small Data does not only benefit customers, users or consumers; employees can benefit from it too. (You can read more about this – including the distinction from Smart Data and Big Data – in another post on this blog.)
  • the actual use of data is often not even recognisable from the outside. Moreover, it often develops in directions that were not even anticipated internally at the time of collection.[5] Just look at how Google, Facebook, WhatsApp, Twitter and co. monetise data.
  • compliance addresses something different. Of course, companies keep data for compliance reasons. They use investigative, preventive and reactive compliance management systems in an attempt to prevent, among other things, criminal offences being committed from within the company. But this has nothing to do with companies’ mania for collecting data.
  • storing data is very cheap these days. Gone are the days when emails were printed out and stored in thick folders in cupboards. In fact, there are now even calls to tax internet giants not just on their profits but on the records they store.
  • where exactly would the risks lie?
  • collecting and monetising data has long been a common business model in numerous industries!

In short, I propose a different definition of dark data: “Dark data are the information assets that companies collect, process and store as part of their business activities or as part of the business activities of affiliated companies, and typically use for extensive analysis and prognosis, initiation of business relationships and direct and indirect monetisation.”

Sounds a bit scary at first glance, unfortunately. And it is!

Examples of dark data

What are some examples of dark data? Which companies monetise data? The usual suspects, of course: global internet giants like Google, Facebook, TikTok etc. Since the services of these giants are literally world class, many users don’t mind their data being used in exchange for them.

  • If network operators like Telekom or Vodafone locate your mobile phone in radio cells, why shouldn’t Google Maps use the location of your mobile phone to calculate traffic jam information?
  • Why shouldn’t the Google search engine use your search queries to auto-complete similar search queries from other users? (Interestingly, not only the actual search results on Google are personalised, but also the auto-completions.) Germany’s Federal Court of Justice gave one answer to this question back in 2013: it shouldn’t if doing so violates the personality rights of the people concerned.[6]

Facebook goes a whole step further in its use of dark data. Advertisers can not only place ads there for a “core audience” with defined characteristics such as age, location or end device, but can then use specific characteristics, settings or preferences of the prospective customers who click on those ads to address a “lookalike audience” with similar or identical characteristics. This is perfectly legal, but unfortunately it can also be used to influence the formation of opinion in a political process, for example.

Are there other examples of the use of dark data, far away from the internet giants? Yes. However, these are less prominent. On the one hand, of course, because companies rarely talk about it publicly. Statements like “We use progressive profiling forms and get the information we need step by step. Works quite well.”[7] may be found in forums, but not in the “About us” section of a website. On the other hand, because they are simply overlooked. Here are a few examples:

  • If a website offers you a PDF to download and the website operator wants to send you the document, what data does he need for that? Your email address, obviously. Why does he need your name? Your address, your position in the company, your company size and your telephone number? Surely it’s nice to be addressed by name in the delivery email. And the rest? Dark data! The sales department wants to use the data to qualify you as a lead and win you as a customer. Without your explicit consent and a separate verification of the telephone number, the operator would not be allowed to call you, but where there is no plaintiff, there is no judge.[8]
  • How do you think it is that you regularly receive newsletters, but you are sure that you never signed up to receive them?
  • How is it possible that parents find “appropriate” advertisements and tips in their letterbox directly after the birth of a child? Perhaps hospitals monetise relevant information, but the data is quite certainly passed on by the residents’ registration office![9]

At least there is also good news: in Germany and in the European Union there are clear rules on data protection!

Data protection is …

The EU General Data Protection Regulation (GDPR) has applied since 25 May 2018. “Data protection will be an important competitive factor in the future”, a lawyer confidently declared at an information event at the time. I’m not sure I’ve heard another statement since then that had as little to do with reality as this prediction. Here is a small example from our website:

  • In 2020 we had 663,547 page views and 10,360 downloads were made.
  • To confirm the sending of a requested document, interested parties have to accept our privacy policy. Of course, this is directly linked and easy to read with one click in a separate tab. Essentially, it says that we use the email address (that’s all we collect) to send the document. Full stop.

How many times do you think our privacy policy was accessed last year? The answer is: 451 times. In terms of downloads, that’s 4.35%; in terms of page views, that’s 0.068%. Any questions?
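For anyone who wants to check the two percentages, here is the arithmetic, using only the figures given above (451 privacy policy views, 10,360 downloads, 663,547 page views):

```latex
\frac{451}{10\,360} \approx 0.0435 = 4.35\,\%, \qquad \frac{451}{663\,547} \approx 0.00068 = 0.068\,\%
```

In other words, fewer than 1 in 20 people who downloaded a document and fewer than 1 in 1,000 page views led to the privacy policy being opened.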

Data protection is not really important to many people. That’s a shame, but do you know what’s even less important to people? The data protection of other people! That’s the only way an app that demands access to all the data of the user’s contacts is worth $100 million. That’s the only way dark data works.

Dark data: Use it or leave it alone?

Regardless of the fact that many small and medium-sized enterprises have neither the knowledge nor the capacities to monetise data, one question remains: should one use dark data or leave it alone? Should one try to buy e-mail addresses, ratings in portals, backlinks or followers for various social media platforms? Should the sales department take action if a lead has not given permission to be contacted when downloading a document? Should social selling automation be activated on LinkedIn, data-driven personas be developed via your website, or “do not track” settings accepted from website visitors?

Morally, most of these questions should be relatively easy to answer. Unfortunately, morality rarely pays any bills. In addition, those who do not want to benefit from dark data (and take data protection seriously) are in some ways at a disadvantage. A sales pipeline is fairly easy to fill externally by throwing in a few bucks, and a filled sales pipeline increases the chances of sales. If a company actively chooses to collect, process, store and monetise dark data internally, considerably more money will be needed, but even one or two new customers should justify the effort. And if the company then manages to establish a scalable system, it might even become a business model!

So: Use it or leave it?

My personal answer to the question is: Hands off! I would rather focus on our customers. I think it is better if we put our energy into the concrete development of products and the improvement of our services. And judging by the work of my colleagues, they see it the same way.

One last thought to finish with: Maybe you will soon receive an invitation to an app that combines a social network and an audio platform. If you know my mobile number, why not decline the invitation and arrange to meet me for a walk-and-talk coffee instead? We are also welcome to discuss something other than dark data. And of course I’ll pay for the coffee too! 😉

 

Notes (some in German):

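[1] Beta Version – An early stage of software development
[2] FOMO Effect
[3] Willkommen im Club (Welcome to the club)
[4] Silicon Valley is going crazy …
[5] Definite recommendation: The Social Dilemma
[6] Autocomplete-Funktion: Bettina Wulff und Google einigen sich (Autocomplete function: Bettina Wulff and Google reach an agreement)
[7] Diskussion auf LinkedIn über die Optimierung von Formularen (Discussion on LinkedIn about optimising forms)
[8] Wichtiges Urteil: Ist das DSGVO Kopplungsverbot Vergangenheit? (Important ruling: Is the GDPR prohibition on coupling a thing of the past?)
[9] Bundesmeldegesetz: Datenweitergabe durch Meldebehörden (Federal Registration Act: disclosure of data by registration offices)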

Michael Schenkel has published more articles in the t2informatik Blog, including

Speed as unfair advantage

The organisational rebel – after all, a good idea?

Learning to code in schools

Michael Schenkel

Head of Marketing, t2informatik GmbH

Michael Schenkel has a heart for marketing, so it is fitting that he is responsible for marketing at t2informatik. He likes to blog, likes a change of perspective and tries to offer useful information (e.g. here in the blog) at a time when there is a lot of talk about people's decreasing attention span. If you feel like it, arrange to meet him for a coffee and a piece of cake; he will certainly look forward to it!