Hans Rosling1 was the co-founder of the Gapminder Foundation2, an organization dedicated to educating the public by using data to dispel common myths about the so-called developing world. The organization uses data to show how actual trends in health and economics contradict the narratives that emanate from sensationalist media coverage of catastrophes, tragedies, and other unfortunate events. As stated in the Gapminder Foundation’s website:

Journalists and lobbyists tell dramatic stories. That’s their job. They tell stories about extraordinary events and unusual people. The piles of dramatic stories pile up in peoples’ minds into an over-dramatic worldview and strong negative stress feelings: “The world is getting worse!”, “It’s we vs. them!”, “Other people are strange!”, “The population just keeps growing!” and “Nobody cares!”

Hans Rosling conveyed actual data-based trends in a dramatic way of his own, using effective data visualization. This section is based on two talks that exemplify this approach to education: New Insights on Poverty3 and The Best Stats You’ve Ever Seen4. Specifically, in this section, we use data to attempt to answer the following two questions:

  1. Is it a fair characterization of today’s world to say it is divided into western rich nations and the developing world in Africa, Asia, and Latin America?
  2. Has income inequality across countries worsened during the last 40 years?

To answer these questions, we will be using the gapminder dataset provided in dslabs. This dataset was created using a number of spreadsheets available from the Gapminder Foundation. You can access the table like this:

library(tidyverse)
library(dslabs)
data(gapminder)
gapminder %>% as_tibble()
#> # A tibble: 10,545 x 9
#>   country  year infant_mortality life_expectancy fertility population
#>   <fct>   <int>            <dbl>           <dbl>     <dbl>      <dbl>
#> 1 Albania  1960            115.             62.9      6.19    1636054
#> 2 Algeria  1960            148.             47.5      7.65   11124892
#> 3 Angola   1960            208              36.0      7.32    5270844
#> 4 Antigu…  1960             NA              63.0      4.43      54681
#> 5 Argent…  1960             59.9            65.4      3.11   20619075
#> # … with 10,540 more rows, and 3 more variables: gdp <dbl>,
#> #   continent <fct>, region <fct>

Hans Rosling’s quiz

As done in the New Insights on Poverty video, we start by testing our knowledge regarding differences in child mortality across different countries. For each of the six pairs of countries below, which country do you think had the highest child mortality rates in 2015? Which pairs do you think are most similar?

  1. Sri Lanka or Turkey
  2. Poland or South Korea
  3. Malaysia or Russia
  4. Pakistan or Vietnam
  5. Thailand or South Africa

When answering these questions without data, the non-European countries are typically picked as having higher child mortality rates: Sri Lanka over Turkey, South Korea over Poland, and Malaysia over Russia. It is also common to assume that countries considered to be part of the developing world: Pakistan, Vietnam, Thailand, and South Africa, have similarly high mortality rates.

To answer these questions with data, we can use dplyr. For example, for the first comparison we see that:

gapminder %>% 
  filter(year == 2015 & country %in% c("Sri Lanka","Turkey")) %>% 
  select(country, infant_mortality)
#>     country infant_mortality
#> 1 Sri Lanka              8.4
#> 2    Turkey             11.6

Turkey has the higher infant mortality rate.

We can use this code on all comparisons and find the following:

#> New names:
#> * country -> country...1
#> * infant_mortality -> infant_mortality...2
#> * country -> country...3
#> * infant_mortality -> infant_mortality...4
country infant mortality country infant mortality
Sri Lanka 8.4 Turkey 11.6
Poland 4.5 South Korea 2.9
Malaysia 6.0 Russia 8.2
Pakistan 65.8 Vietnam 17.3
Thailand 10.5 South Africa 33.6

We see that the European countries on this list have higher child mortality rates: Poland has a higher rate than South Korea, and Russia has a higher rate than Malaysia. We also see that Pakistan has a much higher rate than Vietnam, and South Africa has a much higher rate than Thailand. It turns out that when Hans Rosling gave this quiz to educated groups of people, the average score was less than 2.5 out of 5, worse than what they would have obtained had they guessed randomly. This implies that more than ignorant, we are misinformed. In this chapter we see how data visualization helps inform us.