Inequality and economic growth in Argentina

A small clarification before we start: I am neither for nor against the current Argentinian government, and if I had to vote in next Sunday’s elections I would not be able to choose a side. The following is not real political commentary; what I like is data visualization.

Recently I got to read Alberto Cairo’s The Functional Art, and I was strongly attracted to a great infographic about the evolution of the economy and inequality in Brazil. It got me thinking about how that would look for Argentina, so I went and got the data from the World Bank (starting in 1986, and only up to 2013), and reproduced the plot in a very similar style but using the Argentinian data.

Link to the original plot and discussion

My plot:

desigualdadArgentinav3

The chart’s interpretation is that points higher up represent more inequality, while points to the right mean a higher production of economic value. In the words of Alberto Cairo, one of the messages of the chart is that growth in GDP does not always mean a reduction in inequality.

A few technical comments:

Like in the original chart, I used the Gini coefficient to measure inequality. It is far from a perfect indicator, but it is a very popular one. To measure economic growth I decided to use the GDP per capita instead of the total one used in the original chart. I think this would be a little better since the country’s population grew significantly in the three decades spanned by the data.

I respected the original design and separated the presidential terms by color, which I think is a brilliant decision. It totally makes the plot. There is one caveat though: each data point measures a whole year, and changes of president happen at different points during that year. Therefore there is some leeway in how to assign the color (and I didn’t figure out an impartial algorithmic way to decide this).
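For what it is worth, one candidate rule (not the one used for the chart, and not necessarily impartial) is to give each year the color of whoever held office for the most days of that year. Here is a minimal Python sketch of that idea; the function name president_for_year and the names and dates in the example are hypothetical placeholders, not the real Argentinian terms.

```python
from datetime import date


def president_for_year(year, terms):
    """Return the name with the most days in office during `year`.

    `terms` is a list of (name, start_date, end_date) tuples, where
    end_date is exclusive. Returns None if nobody overlaps the year.
    """
    year_start = date(year, 1, 1)
    year_end = date(year + 1, 1, 1)
    days_in_office = {}
    for name, start, end in terms:
        # Number of days of this term that fall inside the calendar year.
        overlap = (min(end, year_end) - max(start, year_start)).days
        if overlap > 0:
            days_in_office[name] = days_in_office.get(name, 0) + overlap
    if not days_in_office:
        return None
    return max(days_in_office, key=days_in_office.get)


# Hypothetical terms, for illustration only.
example_terms = [
    ("President A", date(2000, 1, 1), date(2003, 5, 25)),
    ("President B", date(2003, 5, 25), date(2008, 1, 1)),
]
handover_year = president_for_year(2003, example_terms)
```

A rule like this is at least reproducible, although a handover close to the middle of the year still ends up decided by a handful of days.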

At the risk of ruining the incredible work by Cairo and collaborators, I took the liberty of adding a few labels indicating important economic events (and I also changed fonts, colors, stroke widths, and other small stuff that makes my chart subtly but clearly worse, of course).

Comments on the content:

Like in the original chart, you can see a clear mark or general tendency that is very different for each presidency (including the difference between Menem’s first and second term).

More importantly, beginning in 2003, and coinciding with the Kirchners’ rise to power (just like Lula in Brazil), the country grows economically and inequality goes down in an unprecedented manner (except for a small glitch during the global economic crisis of 2008). I could not find Gini estimates before 1986, but there are some for the urban area of Buenos Aires, and they show that only in 1984, and before that in 1974, were there levels of inequality as low as today’s.

An important detail: reliability of the source

After a lot of comments by many of the readers of the original post (in Spanish), I caved, gave more hours to this project, and included an alternative measure of GDP (alternative to the World Bank, that is).

Why? Well, for those not well versed in Argentinian politics, the current government intervened in the official statistics institute (INDEC), and since then its numbers (which feed the World Bank’s database) have been strongly questioned by many. The World Bank itself recognises the unreliability of the official data from Argentina and adds a disclaimer saying “the World Bank is also using alternative data sources and estimates for the surveillance of macroeconomic developments in Argentina”.

Many people asked me to get an independent source and check the data, so I did. Along the way I discovered other things, for instance that it is terribly difficult to match different methodologies for measuring GDP. Therefore, I decided that the best approach (as a compromise between ease and correctness) was to take the official numbers for 2007 as valid, and then calculate the subsequent years from the relative growth (year-to-year percentage) measured by this other indicator (called ARKLEMS; it just rolls off the tongue, right?). The data from this analysis is shown in the alternative timeline in gray, which, unexpectedly given the comments, almost lines up with the data from the World Bank.
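To make the chaining concrete, here is a minimal Python sketch of the idea: start from a trusted base-year value and apply each later year’s percentage growth in turn. The function names (chain_from_base, per_capita_growth_pct) and every number in the example are hypothetical; they are not the real World Bank or ARKLEMS figures, and the per-capita adjustment shown is only one plausible way to do it.

```python
def per_capita_growth_pct(gdp_growth_pct, population_growth_pct):
    """Convert total GDP growth into per-capita growth (both in percent).

    Assumption: per-capita growth = (1 + g_gdp) / (1 + g_pop) - 1.
    """
    ratio = (1 + gdp_growth_pct / 100) / (1 + population_growth_pct / 100)
    return (ratio - 1) * 100


def chain_from_base(base_year, base_value, growth_pct_by_year):
    """Extend a series forward from a base value using yearly growth rates.

    `growth_pct_by_year` maps each year to its growth relative to the
    previous year, in percent. Years up to the base year are ignored, and
    the years after it are assumed to be consecutive.
    """
    series = {base_year: base_value}
    value = base_value
    for year in sorted(growth_pct_by_year):
        if year <= base_year:
            continue
        value *= 1 + growth_pct_by_year[year] / 100
        series[year] = value
    return series


# Made-up numbers: a base of 100 in 2007, then +10% and -5%.
example_series = chain_from_base(2007, 100.0, {2008: 10.0, 2009: -5.0})
```

Because only relative changes are taken from the second source, any difference in absolute levels between the two methodologies drops out; what remains is the assumption that the two sources would agree on the 2007 starting point.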

There are still things that bother me about this: first, I am plotting different things that may have no way of being normalized to the same values; and second, perhaps more importantly, the agreement may indicate that the adjustment was already taken into account by the World Bank in my original data set (they say they are doing it, but not how). Hopefully this will lead to good conversations and discussions on how to recover the reliability of the INDEC.

New questions:

The evolution in Brazil, shown in the original chart, is strikingly similar to that of Argentina. This makes me wonder how much of it is due to particular presidents (surely there must be some effect, but still) compared to the global and regional environment. To answer this question I got the full data set from the World Bank, and maybe in one or two weeks I’ll have a complete infographic with a comparison (it won’t be easy, as I have to rethink the design and maybe even go interactive…)

Note:
All the data comes from the World Bank, from this table and this table, except for the Gini coefficients for 1988, 1989, and 1990, which I found on Gapminder.org. They cite the World Bank as a source, but they probably took it from somewhere else. As a side note, Gapminder already lets you do the comparison between countries of the region, only there is no presidential term information. The alternative GDP calculation from 2008 onwards was taken from ARKLEMS, which I adjusted for population growth; I only used the year-to-year change to estimate the movement from the 2007 World Bank data point.

The UX Visualization Diaries · Number 1

Someone once accused me of not doing visualizations. That is not actually true (I’ve done more than one, so there must be a lot of people out there with a bad memory). However, I have to admit that it’s not really my job.

My job, my function in the Visualization department (apart from designing user-friendly interfaces, of course), is to make my team’s visualizations more understandable:

Trying to prevent them from putting 20 variables in the same graphic in an attempt to demonstrate that it can be done, just for the sake of it.

Sometimes I managed to do it. Sometimes I didn’t.

That’s the reason why I decided to write this series of blog entries dedicated to analyzing things that are not clear or aspects that could be improved upon, always from the UX point of view.

Taking Tufte’s work as a base, Fernanda Viegas and Martin Wattenberg wrote a blog entry titled Design & Redesign, which suggested that data scientists should not only criticize other people’s work but also improve on it with suggestions on how to redesign the visualization. I’ll try to do the same, analyzing visualizations while respecting their original style.

Journal visualization: the number of publications is represented by the size of the circles

First, I’ll confess that I chose this representation, Journals, because I thought at first glance that it was appealing and easy to analyze. I also wanted to prove that my point of view coincided with, or at least complemented, the view of the experts (my boss, basically). And it did.

The first problem we see is that, depending on the subject or type of publication, we see a different time period (from 2004 to 2013, from 1970 to 2010…)

The question is why not represent the same time period in all the graphs, which would also show that some of them have no data for certain years.

A different problem is the time step, which changes from one graph to the next: every two years, every five years…
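A redesign could put every series on the same axis before drawing anything. The following is a minimal Python sketch of that step, assuming each series is a simple mapping from year to number of publications; the function name align_to_common_axis and the sample data are hypothetical.

```python
def align_to_common_axis(series_by_name, step=1):
    """Put every series on one shared year axis with a fixed step.

    `series_by_name` maps a series name to a {year: value} dict.
    Years with no data are kept and filled with None, so that gaps
    stay visible instead of being silently dropped.
    """
    all_years = [year for series in series_by_name.values() for year in series]
    years = list(range(min(all_years), max(all_years) + 1, step))
    aligned = {
        name: [series.get(year) for year in years]
        for name, series in series_by_name.items()
    }
    return years, aligned


# Made-up publication counts, for illustration only.
example_years, example_aligned = align_to_common_axis(
    {"Subject A": {2004: 3, 2006: 5}, "Subject B": {2005: 1}}
)
```

With the None entries drawn as empty slots, a reader can see at a glance which subjects have no data in a given year.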

different step in timeline

After thinking about the color range, in the end I deduced it was not relevant. The color range selected only tries to differentiate one line from another, but some users could have thought: Does the range (blue, red, green) mean something? Does the color intensity mean something else? Is the light blue more relevant than the dark one?

And the last and most important design error: why are some totally different values represented with the same or a similar radius?

diapo_values

The basic problem is that the relative radius has been changed for every line. So if you don’t see the values beside two similar circles, you might think they represent similar values, but they don’t. One circle could have a value of 20, while a similar circle could have a value of 2. So at first glance, and without any interaction, you can’t easily compare two graphs (or even two lines).
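One possible fix is to compute every radius from a single scale shared by all the lines. The Python sketch below assumes that the circle’s area (not its radius) should be proportional to the value, which is a common choice but not necessarily what the original visualization intended; the function name radii_shared_scale, the max_radius parameter and the sample values are all hypothetical.

```python
import math


def radii_shared_scale(rows, max_radius=20.0):
    """Compute circle radii for several rows using one global scale.

    `rows` maps a row name to a list of non-negative values. The largest
    value across ALL rows gets `max_radius`; every other circle is sized
    so that its area is proportional to its value.
    """
    global_max = max(value for values in rows.values() for value in values)
    if global_max <= 0:
        raise ValueError("at least one value must be positive")
    return {
        name: [max_radius * math.sqrt(value / global_max) for value in values]
        for name, values in rows.items()
    }


# Made-up values echoing the 20-versus-2 example in the text.
example_radii = radii_shared_scale({"Line 1": [20, 5], "Line 2": [2, 1]})
```

With a shared scale, the circle for 2 comes out visibly smaller than the circle for 20, so two lines can be compared without reading the numbers.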