
Commentary

The democratization of data

A little more than a year ago, the international community endorsed the Sustainable Development Goals. While the goals reinforced attention to previously identified challenges in health, education, and sanitation, they also introduced new areas of focus, including a call for a “data revolution.” But what do we actually mean by the data revolution in connection with social and economic development? What examples can we point to? I often put these simple questions to friends and colleagues but, somehow, never get a very clear answer.

Let’s step back and consider the digital revolution thus far. Do you remember when you saw a computer for the first time and how you experienced the internet initially?

My first computer was an Atari, which I got in 1990. I wrote my first university seminar papers on it. Loading time was around five minutes, even though I only used it for word processing. The software had to be loaded from floppy disks (those born after 1990 can google what they were) because built-in disk storage was unusual at that time. In hindsight, it seems like it must have been really painful. At the time it was bliss, considering the alternative was my mother’s typewriter and removing each mistake with Tipp-Ex or retyping the whole page.

A few years later, at the end of 1994, I had my first encounter with email and the internet. Our university had one computer room where you could connect. We would converge on that room, in a separate building, to go “online,” much in the same way as our grandparents in the 1950s would huddle around the one TV set that the wealthy neighbor owned. But the internet then was very different: It was just text, with links to other pages equally packed with text. I used to think how nice it would be to have a few pictures and a more playful way to interact.

Another memorable moment was in the spring of 2000, shortly after I joined the World Bank. My mentor, who was already in his 50s, came proudly to my office to show me something only he knew about in our department. It was a search engine, which had a clear and simple interface and gave you answers instantly. This was the first time I used Google.

What all of these big tech innovations of our lifetime (the personal computer, email, and the internet) have in common is that they succeeded through simplification and personalization. It was the genius of Bill Gates and Paul Allen to understand that software would change the way people around the world worked and played. Since then, “computers” have evolved from clunky machines to stylish items featured in most offices and homes (at least in the developed world). Virtually anyone can use a computer for daily work, transactions, or bookings without the help of specialized technology staff.

This is exactly the type of “revolution” that we are still waiting for in the field of data. Data scientists and software engineers can build the machines and tools that we need to collect, mine, curate, and analyze data. But we should all be able to use them. We should make the same bet that Gates and Allen made a generation ago and personalize (big) data in a way that empowers individual users. The experience should be playful, pleasant, and simple.

This big transformation is yet to come in many segments of society and sectors of work. This is true even for economists, who work with data most of the day but often still operate in a 20th-century, pre-data-revolution world. In this “ancien régime” we have experts, mainly statisticians and economists, who crunch official data and make projections. The public accepts this official wisdom until the forecasts get corrected by the same actors.

This old model needs an upgrade, badly. The opportunities are enormous, ranging from leveraging data traces from the world’s 7 billion cell phones to exploiting high-quality satellite imagery (for which costs are declining rapidly). However, the breakthrough will not come from technology alone. The true game changer will come when data collection, aggregation, and communication are effectively democratized. For that to happen, an important first step is to learn to live with the imperfection of the existing data and of the systems that generate them. If we are transparent about these imperfections, we can create systems for improvement and error correction.

Here are three surprising findings from population.io, a tool I helped develop that personalizes demographic data and tells you, among other things, how long you might expect to live:

  1. A girl born in South Korea today can expect to live 99 years. This makes South Korea the country with the highest life expectancy in the world.
  2. A 45-year-old Russian man has a lower life expectancy than his counterpart in the Democratic Republic of Congo. By contrast, Russian women in the same age bracket live 10 years longer than women in the Congo.
  3. If you are 80 years old in Brazil, you can expect to live longer than any other octogenarian in the world. For example, an 80-year-old Brazilian woman would live on average another 10.5 years, compared to 10 years in the United States or Germany.

Each of these three examples warrants a closer look at the data. There are obvious hypotheses to explain the first two results, such as girls’ education in South Korea and alcohol abuse in Russia. The Brazilian case is more difficult to explain. If it is an example of a data error, we should correct it. If not, we may create a body of research on the secrets of long lives in Brazil for those who successfully make it through their first 80 years.
