The fair-trade standard for data

Nick Shibanov
7 min read · Aug 28, 2018


Born over half a century ago, the fair-trade movement has made significant strides since its inception. Its goal can be summed up as spreading prosperity by improving trade conditions for producers in developing countries. To over-simplify in order to make a point: producers were being overworked and underpaid, and firms in developed nations were profiting at their expense. The value these producers generated was not being fairly compensated; instead, it was concentrated in the hands of a small number of firms and individuals.

History has a tendency to repeat itself and can teach us a lot if we just pay attention. With the advent of the internet and an ever-connected world, people's identities have split in two. Suddenly, a parallel universe was born in which a different and arguably more powerful version of ourselves resides. The digital realm gave birth to our digital identity. Unlike our physical selves, our digital selves forget nothing: they are a complete representation of our actions, preferences, opinions, secrets, and who we are. Our digital identity can be thought of as all the data we generate throughout our lives, compiled into a highly valuable, computable package. It is this data that can be fed into machines which produce something of value from it.

What is this "something of value"? It can be anything from knowing the optimal time to show someone a product, so as to maximise the probability of a purchase, to a highly sophisticated AI. Consider self-driving cars or Google Duplex, an AI assistant capable of making bookings in an eerily familiar voice. It is crucial to note that these cognitive services depend on the data we produce; without it, they cannot exist.

“Human intelligence and human data drive the machines and are the reason these machines exist in the first place.” — John G. Messerly

Yet the data generators, the people who produce this value, are exploited for profit without obtaining any piece of it. Moreover, people do not have control of their own digital identity. The current paradigm of data usage has resulted in a world where people are not fairly compensated for the value they generate. Once again, the value we generate is concentrated with a small number of firms and individuals. But this time, there is a big difference: it is now economically beneficial to all stakeholders to make privacy the default and compensate people for the data they generate. Imagine what knowledge or services could be provided if the use of people's data were a collaborative, win-win affair. To quickly bring in my venture, OMNIA protocol: we are creating the first fair-trade standard for data, which can be broken down as follows:

Privacy + Compensation = Fair-trade data

Where is the value for firms?

The value of the fair-trade data standard for individuals is easy to infer. Ultimate control of our digital identity means controlling who has access to the data we generate, that is, having privacy as the default; the value of compensation for the data we generate speaks for itself. For the firm, beyond boosts to reputation and consumer trust, the value stems from the following:

  1. Significant reduction of costs and risks associated with accessing data.
  2. Putting data in users' hands and incentivising them to share it with companies results in near-optimal distribution and use of that data.
  3. Access to a greater depth of data.
  4. Ability to implement a new generation of highly valuable, privacy-preserving applications.

Before breaking down these points further, it is necessary to give an overview of our platform, OMNIA. We are building a platform which preserves and enhances the utility of current data practices while ensuring that generated data is never revealed to anyone. OMNIA gives firms and other interested parties the ability to compute on data, but never to see it at an individual level.

Reduction of costs and risks

Think about the journey your data must take from the point you generate it to the point it is computed on. Before firms even obtain your data, they must incur legal costs such as drafting terms and conditions, privacy policies, and data usage agreements. Then come the infrastructure costs of obtaining that data and keeping it secure, which leads to one of the most significant risks a firm can face: data breaches. According to IBM's Cost of a Data Breach study conducted last year, the direct costs of these breaches alone can vary from an average of US$3.6 million to US$300 million, depending on the size of the breach. But all of these costs and risks are significantly reduced if the firm never actually holds or sees any sensitive data. There is no such thing as perfect security, but you can't have a sensitive data breach if you don't have any sensitive data to begin with.

Near-optimal distribution and depth of data

The next two points build on each other. The second point is based on a recent paper by Jones et al. (2017). To keep things brief, I provide an extract from their conclusion:

“Our framework supports this: when firms own data, they may overuse it and not adequately respect consumer privacy. But another important consideration arises from the nonrivalry of data. Because data does not get depleted, there are large social gains to allocations in which the same data is used by multiple firms simultaneously.”

The study suggests that when firms own data, they are reluctant to sell it out of fear of creative destruction. Thus, shifting ownership to the data generators is economically beneficial: multiple machine learning algorithms can use the same sets of data. The authors state that data generators must weigh the compensation they can receive from selling data against the privacy consequences of having to disclose it. But what if they didn't have to? What if there was a way to obviate these privacy concerns, giving firms access to a new depth of data while allowing data generators to maximise the compensation from the data they produce? This brings me to my third point.

A consequence of a practical privacy solution coupled with compensation for data is that people are likely to be more willing to give firms access to their most sensitive data. Remember, with our fair-trade data, that access entitles the firm to compute on the data, not to see it. So, if someone offered you a reward, be it fiat or otherwise, in exchange for being able to input your financial or even health data into their computations, but they would never see your data, would you do it? It is a difficult concept to grasp: how do firms extract value from my data if they can never see it? The answer is that they use that data to compute a low-dimensional and unidentifiable output. It may be useful to think about an example of such an output: a health rating. Let's take the optimal case where we have access to a large set of data to construct a highly accurate health rating. The inputs to a health rating can span a wide range of data: heart rate, accelerometer readings, financial data, photos of receipts, health records, sleep patterns, and even music. A data generator then runs a computation which turns those sensitive inputs into a health rating. Given a health rating, one cannot derive the inputs which make it up; it behaves like a one-way function. This is just one level of the privacy layer and is intended to show that data does not have to be seen to be valuable.
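To make this concrete, here is a minimal, purely illustrative sketch in Python of such a computation. Nothing in it is taken from OMNIA itself; the field names, weights, and scoring formula are invented for the example. The point is only that the function runs where the raw data lives and returns a single low-dimensional score from which the inputs cannot be recovered.

```python
from dataclasses import dataclass


@dataclass
class SensitiveInputs:
    """Raw inputs that stay on the data generator's device."""
    resting_heart_rate: float    # beats per minute
    avg_daily_steps: float
    avg_sleep_hours: float
    monthly_health_spend: float  # e.g. gym and pharmacy spend from receipts


def health_rating(data: SensitiveInputs) -> int:
    """Collapse many sensitive inputs into one coarse 0-100 score.

    The firm only ever receives the returned integer. Because many
    different input combinations map to the same score, the score
    cannot be inverted back into the raw data.
    """
    score = 50.0
    score += max(0.0, 70.0 - data.resting_heart_rate) * 0.5  # lower resting HR is better
    score += min(data.avg_daily_steps / 1000.0, 10.0)        # cap the step benefit
    score += (data.avg_sleep_hours - 7.0) * 2.0
    score += min(data.monthly_health_spend / 50.0, 5.0)
    return int(max(0.0, min(100.0, round(score))))


# Runs where the data lives; only the single number is ever shared.
mine = SensitiveInputs(resting_heart_rate=62, avg_daily_steps=9500,
                       avg_sleep_hours=7.5, monthly_health_spend=120)
print(health_rating(mine))  # 67
```

Many different people, with very different raw inputs, would land on the same score, which is exactly why the score alone tells the firm nothing identifiable about any one of them.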

Privacy-preserving applications

The last point I want to make is about the ability of firms to develop highly personalised yet privacy-preserving applications. Consider the following example. A health insurance firm issues you a computation which takes all of the aforementioned data and determines that you have a high risk factor for cardiovascular disease. Some integrated application then notifies you of this and offers to book an online video appointment with a partnered virtual doctor service. You agree and give permission for that particular doctor to access the health data relevant to the detected risk factor. You save time, and therefore money, on the appointment, as the doctor knows exactly what they need to know. The health insurer has just helped their customer be healthier and helped drive business to the virtual doctor service. Where does privacy come in? No one except you knows this happened. The health insurance firm knows only that the correct service was offered to and then accepted by one of its customers. These are the kinds of future applications in which we envision our platform playing an important part.
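As a rough illustration of that flow, and nothing more, here is a hypothetical Python sketch. The record fields, risk thresholds, and callback are all invented for the example; they are not OMNIA's API. It shows only the shape of the interaction: the insurer supplies a computation, it runs where the data lives, and the user alone decides what happens with the result.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HealthRecord:
    """Sensitive data held locally by the user; never sent to the insurer."""
    blood_pressure: Tuple[float, float]  # (systolic, diastolic)
    cholesterol_mmol_l: float
    smoker: bool


def cardio_risk(record: HealthRecord) -> bool:
    """Insurer-issued computation, executed on the user's device.

    Only the outcome of acting on it ever becomes visible to the insurer.
    """
    systolic, _ = record.blood_pressure
    return systolic > 140 or record.cholesterol_mmol_l > 6.2 or record.smoker


def run_locally(record: HealthRecord,
                computation: Callable[[HealthRecord], bool],
                on_risk: Callable[[], None]) -> None:
    """Run the firm's computation next to the data and hand control back
    to the user if a risk factor is detected."""
    if computation(record):
        on_risk()


record = HealthRecord(blood_pressure=(152, 95), cholesterol_mmol_l=5.8, smoker=False)
run_locally(
    record,
    cardio_risk,
    on_risk=lambda: print(
        "High cardiovascular risk detected. Book a video appointment and "
        "share only the relevant records with the doctor you choose?"
    ),
)
```

The insurer never sees the blood pressure, cholesterol, or smoking status; it learns at most that one of its customers accepted the offered service.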

To be clear, we do not think we can build such an application alone, nor do we intend to. We see ourselves playing an important role in a new generation of privacy-preserving applications, and we wish to bring humanity to a new level of prosperity. Ultimately, we envision the fair-trade data standard becoming a societal norm, and we aim to spark another social movement, this time to ensure the value we generate is compensated fairly. A practical privacy solution, and the transition to a world where people are compensated for the value they generate, is crucial for the future of humanity.
