On returning home after a walk, with a stop at a mountain restaurant, my partner told me with a smile of excitement:
– I’ve been to a fantastic place. Great food, and they also have a celiac menu. I’ve taken a picture of it, in case you want to come with me someday.
Those who have any kind of intolerance will know that the problem affects not only the sufferer but also family and friends when it comes to finding a restaurant where everybody can eat. Depending on the level of intolerance, nothing may happen if the food has some traces of gluten, for example. But for very sensitive people, an error can be a serious problem, or even lead to death. (Several of my friends always carry an adrenaline injection, in case they accidentally encounter ingredients to which they are allergic.)
Returning to the subject of the restaurant, I looked at the photo I had received on my mobile phone. After examining it for a while, I raised my eyebrows, looked at him sternly and said with a sour face:
– This is not a celiac menu…
– What are you saying? Here it says what has gluten and what doesn’t.
– That’s not a celiac menu (I insisted). In fact, it’s something completely different.
People say I’m not the nicest of people, especially when I’m hungry for something and there’s no way to satisfy that hunger (quite an apt expression, given the subject at hand). If you have never suffered from any kind of intolerance, you may perceive my reaction as a bit OTT, but in this particular case I’m completely right: an allergen card is not a celiac menu; it’s basically the opposite.
After EU Regulation 1169/2011 came into force, it became mandatory to inform customers about the allergens in the dishes served in bars and throughout the restaurant trade, although there are still many establishments that do not comply with the regulation and many others that don’t present the information properly.
Let’s take a look at the differences:
Normally an allergen card is presented as follows: A list of dishes with their possible adverse components listed one after another for each dish.
Sometimes the allergens do not even appear in the same order, which makes it difficult to process the information and determine which dishes you can eat and which you can’t. Even when the icons follow a certain order, each dish has different ingredients, so it is hard to visually scan for the icons we need to look out for.
Let’s imagine the case of someone who is celiac and also allergic to celery: we will read the dishes one by one and check which of them have the wheat icon (we will not enter the debate on how meaningful that icon is).
Once this is done, we have to memorize the dishes with gluten and begin the search for celery, which little by little increases our cognitive load.
Some of you may be thinking that this is not what happens in real life: that users do not search this way, but start from a specific craving, focusing on the dishes they like or feel like eating at that precise moment, and then check whether those dishes are suitable for them.
I can assure you that this never happens: almost all the dishes we might fancy cannot be eaten, because they contain specific substances we cannot eat. There are even theories that say we only become allergic to what we like, because of the load on the system: we ‘abuse’, or consume more of, the things we like, and in some cases the body reaches its tolerance limit.
In other, somewhat more developed allergen cards, we may find something similar to this:
Icons are listed in columns ordered by allergen, highlighting the substances present in each dish. In this case the information is easier to process, even when checking for several intolerances: it is much easier to spot the rows in which the problematic substances appear. But these systems still have a problem:
They focus on what you cannot eat, instead of what you can. It often happens that after checking an entire menu (especially in tapas bars during the summer) you reach the conclusion that you cannot eat anything at all, and by then your appetite has been well and truly awakened, if it wasn’t already when you left home.
Normally a specialized restaurant will not cover every allergen, perhaps only the most common ones, but in this case the menu is oriented to the dishes you can eat, so you only process the food you are able to eat:
What is the big difference?
So far the examples have focused on what I can NOT eat, this last menu focuses on the opposite: What I can eat.
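Turning the allergen card around is essentially a filter over the dish-by-allergen matrix described above. Here is a minimal Python sketch, assuming the menu is stored as a mapping from each dish to the set of allergens it contains; the dish names and allergen labels are hypothetical examples, not taken from any real menu.

```python
# Hypothetical allergen matrix: dish -> set of allergens it contains.
MENU = {
    "Grilled trout": {"fish"},
    "Croquettes": {"gluten", "milk", "egg"},
    "Mixed salad": {"celery"},
    "Roast chicken": set(),
    "Cheesecake": {"milk", "egg", "gluten"},
}


def dishes_i_can_eat(menu, my_allergens):
    """Return the dishes that contain none of the diner's allergens, in menu order."""
    avoid = set(my_allergens)
    # A dish is safe only if its allergen set has no overlap with ours.
    return [dish for dish, allergens in menu.items() if allergens.isdisjoint(avoid)]


safe = dishes_i_can_eat(MENU, {"gluten", "celery"})
if safe:
    print("You can eat:", ", ".join(safe))
else:
    print("No results matching the search criteria were found.")
```

The diner only ever sees the positive list, so a long list of allergies shrinks the output instead of multiplying the icons to scan.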
But this kind of menu, although good, is insufficient. What would happen to a person allergic to many of the products? How would we solve the problem? One option would be to go back to the second case (Figure 3), look for the rows with the fewest highlighted icons, and then check whether anything is possible, although the ideal would be a customizable menu that automatically tells us what we can and cannot eat. I am sure that within a few years, thanks to artificial intelligence, we will have this information filtered just by passing our biometric bracelet over the welcome sensor at the door, including the combination that best suits us at that time of day and in line with the exercise we have done that day. In the meantime, we could have something similar to the following on a digital menu on a tablet or mobile:
To get something like this:
No results matching the search criteria were found:
Better go home. Or drink a glass of water.
Or, better still, that we could “print” on demand whatever we like. No allergens included. Or, even better, that nobody had allergies to any food.
Spanish companies made large investments in working capital and in international companies, in spite of a slight slowdown in the economy during 2018.
According to figures from the consultancy firm TTR, Spanish investors spent more than 12,250 million € acquiring international capital. Of the total completed transactions over the last year, only those transactions for which the total outlay is known were included in this report. For some of these, however, the associated debt is not known.
Leading the ranking:
Abanca paid 1,379 million € to Deutsche Bank for a retail business in Portugal, Amadeus (from the tourism industry) acquired TravelClick (from the USA) for 1,319 million €, and Cepsa paid 1,208 million € to Abu Dhabi’s state-owned oil company ADNOC.
Through the following visualization, we will better understand the size of these investments per country and sector:
The #Cuéntalo movement started early one morning in April 2018, inviting women to share on Twitter their personal experiences of sexual aggression. In a few days, it generated more than 2.5 million tweets and retweets with stories told by their protagonists.
Archivists Vicenç Ruiz and Aniol María gathered the tweets in real time, and together with data journalist Karma Peiró they came to BSC to talk about how we could study and visualise this dataset.
The tweets range from the uncomfortable to the unbearable: first-person stories, occasionally mixed with a woman speaking on behalf of another because she doesn’t have a computer, or because she doesn’t dare to tell her story, or because she was murdered.
“My mom’s boyfriend raped me when she was drunk because ‘I looked a lot like her and it was wrong to make love in her state’ and ‘so that you learn for when you get older’, from when I was 9 until I was 12 years old”
There are many tweets, and each of them is important. Our goal was to study them with statistics and visualise the results to convey the shocking magnitude of the movement, while trying to respect the unique identity of each narration. Cristina Fallarás, the journalist who started and pushed the viral movement, gave us one more goal: she wanted the #Cuentalo visualization to be shocking, but also to remain a safe space, a place without fear or shame where victims could tell their brutally honest testimonies in first person (many of them telling them for the first time in their lives). A place where they can tell what’s hidden in plain sight, because even if it happens every day, what is not said out loud does not exist.
In the next few posts we will explain our analysis and design process for this project: From data gathering, cleaning, and treatment, to the conceptualisation and implementation of the final result.
The original dataset contained 2.1 million tweets in JSON format, created between April 27th and May 12th, 2018. The collection was missing two days, which we were able to partially recover, reaching 2.75 million tweets in total. Each tweet has many properties that we analysed (we will discuss this in the next post); here we will focus on what we used for the visualization.
One important distinction is that 160 thousand tweets had content written by users, while the rest (2.6 million) were retweets. These are indeed crucial for spreading the movement (and we assume they are supportive of it), but the data we want to visualize is in the content of the tweets we call “original” (i.e. not retweets).
We wanted to visualize who tweeted: these mostly anonymous 160 thousand tweets can be split into those that gave a first-person testimony, those that tell something on behalf of another woman who doesn’t dare to or can’t (for example, because she doesn’t have internet access or was murdered), and those that express words of support. We also found a few unclassifiable tweets (advertising, images) and a small group of trolls and jokers.
On the other hand, we wanted to visualize what people wrote about: what testimonies were told? Because of the volume of the data this was a complex task, but we knew it would be valuable, as we know that 70% to 80% or more of sexual aggressions are never reported. This is one of the key points of #Cuéntalo: bringing to light things that happen every day and are not recorded in any registry, either out of fear or, worse, because society does not believe the victims.
“I’ve been raped twice; part of my family doesn’t speak to me anymore for saying this, and for saying I wouldn’t shut up about it. I’ve even withdrawn a rape report because the police made me afraid. I won’t shut up anymore #Cuentalo”
In order to categorise the 160 thousand original tweets, 16 people in our team manually classified the content of over 10,600 randomly chosen tweets. The goal was to use this sample to train an algorithm to classify the rest of the tweets automatically, and for this we used the largest number of categories that we could process correctly.
The first categorization was about who wrote the tweet: testimonies in first or second person, expressions of support, people opposing the movement, and random tweets (for example, tweets in other languages, people who used the popularity of the hashtag for advertising, or tweets containing just a screenshot). This last category did include people who told stories too long for Twitter’s 280 characters, but we could not read them correctly.
“The stories of #Cuentalo make me shudder. They hurt. They hurt a lot. But it is necessary to read them to understand they are not isolated cases: they are everyday situations.”
After this we categorized the content of the tweets. We found accounts ranging from the sense of fear and insecurity that women feel every day to murders with torture, with all types of aggression in between (physical, verbal or virtual), including beatings and rape. To ease the training process (more details to come in the next post), we opted for the simplest possible categorization, sadly at the expense of the precision we would have liked (or that we could have had if we had read all the tweets by hand). Aspects like frequency or duration, degree of cruelty, age, or the victim-aggressor relationship were not worked on, not because we don’t think they are important but for technical reasons.
Our categorization is incomplete and could be improved in many ways (for example, from legal and social points of view). We argued a lot about this issue, especially worried about not minimizing or simplifying the seriousness of the facts.
Initially, the categories we used were: murder, rape, sexual aggression, assault, harassment, fear (explicitly mentioned), and emotions like disgust, rage, and indignation. As we said, this categorization is imperfect because of its simplicity, and for the visualization we aggregated it even further into just three categories: assault or any kind of physical aggression (including murder, rape, and so on), non-physical aggression (harassment, fear), and emotional reactions (like rage). The previous, finer categorization work was not in vain, as we were able to use it to estimate the percentage of tweets in each category in the full dataset. Of the 10,632 tweets we labelled by hand, 31.03% are written in first person, 8.91% are told by someone else, 40.18% are support tweets, 3.12% are tweets against the movement, and 16.69% are unrelated tweets. Extrapolating these percentages to the total, we would have error margins of around 1.5% for the tweets against #cuentalo, 3% for the testimonial tweets, and almost 6% for the support tweets.
Within the testimonial tweets (first and second person, totaling almost 40%), 3.92% mention murder, 5.59% mention rape, 11.18% mention sexual assault, 6.27% mention physical assault, 14.19% talk about harassment, 11.78% mention fear, and 19.23% some reaction like disgust/rage/sadness (the percentages don’t add up to 100 because a single tweet can mention many things). Again, if we extrapolate these numbers to the global dataset, we find error bars of 1% for rapes and murders, 3% for assault and harassment, and 6% for reaction tweets.
It’s often said that people understand frequencies better than percentages, so we presented these results on the webpage as such:
Let’s conclude this section by noting that our algorithm was able to label tweets from the dataset with a precision of 80% for who is writing the tweet and 70% for the topics mentioned. Not exact, but quite good given the small size of the training data (better results are obtained with millions of sentences). We expect many tweets to be mislabeled, so the predictions are more a suggestion than a final conclusion. Our takeaway is that the visualization cannot depend too much on the precision of this classification until we can label all the tweets manually.
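The extrapolation above amounts to scaling the hand-labelled shares to the roughly 160 thousand original tweets. The Python sketch below uses the percentages reported in this post and also rewrites each share as an "about 1 in N" natural frequency. It computes point estimates only. The error margins quoted above come from the team's own estimation and are not reproduced here, and the helper names are hypothetical.

```python
# Shares (in %) of the 10,632 hand-labelled tweets by author, as reported above.
WHO_WROTE = {
    "first person": 31.03,
    "told by someone else": 8.91,
    "support": 40.18,
    "against the movement": 3.12,
    "unrelated": 16.69,
}
TOTAL_ORIGINAL = 160_000  # approximate number of original (non-retweet) tweets


def extrapolate(shares, total):
    """Scale sample percentages to approximate counts in the full dataset."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


def one_in(pct):
    """Express a percentage as a rounded natural frequency, 'about 1 in N'."""
    return round(100 / pct)


counts = extrapolate(WHO_WROTE, TOTAL_ORIGINAL)
for label, n in counts.items():
    print(f"{label}: ~{n:,} tweets (about 1 in {one_in(WHO_WROTE[label])})")
```

Adding the first two shares gives 39.94%, the "almost 40%" of testimonial tweets mentioned above.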
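The post does not say which classifier was used or how its precision was measured. One common approach is to compare predictions against hand-labelled tweets that were held out of training. The Python sketch below assumes that setup and computes per-class precision (correct predictions of a class divided by all predictions of that class) together with overall accuracy. The labels are hypothetical.

```python
from collections import Counter


def per_class_precision(true_labels, predicted_labels):
    """Precision per predicted class: correct predictions / all predictions of that class."""
    predicted = Counter(predicted_labels)
    correct = Counter(p for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / n for label, n in predicted.items()}


def accuracy(true_labels, predicted_labels):
    """Fraction of tweets whose predicted label matches the hand label."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(t == p for t, p in pairs) / len(pairs)


# Hypothetical held-out sample: hand labels vs. classifier output.
truth = ["first", "support", "support", "random", "first"]
preds = ["first", "support", "first", "random", "first"]
print(per_class_precision(truth, preds), accuracy(truth, preds))
```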
We start by discussing images and themes that we used as inspiration, including some of our own sketches as we started producing them.
Initially we started with a preconception: we expected to find many conversation threads, and we tried to represent them with trees like this:
However, we found that there were overwhelmingly more single tweets than conversations, and that the conversations were short, so this technique did not work.
We then thought about a linear time narrative, showing the virality of the phenomenon and its magnitude. This chart shows the number of tweets per minute (the highest point is about one thousand), from the 27th of April to the 13th of May:
We played with the idea of this chart looking like a sound wave, representing the movement as a giant scream heard around the world:
But in the end we were not convinced by the scream metaphor. Also, the linear time representation would prevent us from adding data in the future: it felt like freezing the event in time, not allowing it to continue. This visualization did allow us to represent tweets by country, though:
We ended up dropping this option also because we had lost the individuality of each tweet, which was important for us.
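The tweets-per-minute curve we dropped is simple to compute from the raw JSON. The Python sketch below assumes each tweet carries a classic Twitter API `created_at` string such as `Fri Apr 27 10:00:05 +0000 2018`. That format is an assumption about the dataset, not something this post states.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format: the classic Twitter API "created_at" string.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_minute(created_at_strings):
    """Count tweets per minute, keyed by the (timezone-aware) start of each minute."""
    counts = Counter()
    for stamp in created_at_strings:
        moment = datetime.strptime(stamp, TWITTER_TIME_FORMAT)
        counts[moment.replace(second=0, microsecond=0)] += 1
    return counts


sample = [
    "Fri Apr 27 10:00:05 +0000 2018",
    "Fri Apr 27 10:00:59 +0000 2018",
    "Fri Apr 27 10:01:00 +0000 2018",
]
per_minute = tweets_per_minute(sample)
print(max(per_minute.values()))  # peak tweets in a single minute
```

Plotting these counts in time order gives the sound-wave shape. The same aggregation also shows the drawback described above: each tweet collapses into a bucket and loses its individuality.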
In order to be able to let people add more tweets, we started exploring circular representations and periodicity.
In our first attempts, we plotted the hours of the day around the circle and placed tweets from the inside out in order of creation. These sketches were made thinking of an eclipse, or a human eye:
This representation is quite versatile: it lets us incorporate more dimensions like country (e.g. as color) or number of retweets, AND it lets us explore each tweet interactively, one by one.
In order to escape the ink-blot shape and take better advantage of the radial position of each point, we tried adding some structure, for example by ordering the tweets inside-out in a ring for each day, with the size of each ring proportional to the total volume of tweets for that day:
The results are interesting, but they also have the problem of being difficult to expand with new tweets in the future.
At this point we met with Cristina Fallarás, who took us back to the origins and made us focus on where we should put the main message: #Cuentalo, aside from being an event with a large social impact, was a safe space where women could tell their story. We then decided to use the radial coordinate to represent this. We needed to put the women telling their stories in a space at the center of focus, surrounded by the people who support them, who also isolate them from the random tweets and trolls in the outside world.
With this in mind, we created the first sketches that approach the final solution:
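A minimal Python sketch of this first circular layout follows, assuming each tweet is reduced to its hour of day (as a float) and that the input list is already sorted by creation time. The radius step and the choice of 0 h pointing right are arbitrary assumptions for illustration.

```python
import math


def radial_layout(hours_of_day, inner_radius=1.0, step=0.05):
    """Place tweets, given in creation order, around a 24-hour circle.

    The angle encodes the time of day (a full turn = 24 h, 0 h pointing right);
    the radius grows with creation order, so earlier tweets sit inside.
    """
    points = []
    for order, hours in enumerate(hours_of_day):
        angle = 2 * math.pi * (hours % 24) / 24
        radius = inner_radius + step * order
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


print(radial_layout([0.0, 6.0, 12.5]))
```

Extra dimensions such as country or retweet count would be added as colour or size when drawing these points.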
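The expansion problem becomes concrete once the per-day rings are computed. The Python sketch below assumes ring thickness (rather than area) is proportional to each day's tweet count. The daily counts and scale factor are hypothetical.

```python
from itertools import accumulate


def day_rings(daily_counts, inner_radius=1.0, scale=1e-4):
    """Return an (inner, outer) radius pair per day, innermost day first.

    Ring thickness is proportional to the day's tweet count (an assumption;
    the rings could also be sized by area).
    """
    thickness = [scale * n for n in daily_counts]
    outer = [inner_radius + t for t in accumulate(thickness)]
    inner = [inner_radius] + outer[:-1]
    return list(zip(inner, outer))


before = day_rings([10_000, 20_000, 5_000])
after = day_rings([10_500, 20_000, 5_000])  # a few late tweets added to day 1
print(before[2], after[2])  # the last ring moves outward
```

Adding tweets to any earlier day shifts every ring outside it, so the whole picture has to be redrawn whenever new data arrives.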
Our classification algorithm (of who wrote the tweet) allowed us to make a final improvement: we removed those tweets that we were at least 90% sure were random (jokes, advertising, and unfortunately also tweets with only an image). In the final visualization we ended up with only 100 thousand original tweets, with retweets subtly encoded in the size of each point.
So, before getting to the final representation, let’s recall our design goals, what the visualization should say:
– Magnitude: these things happen, and more than we think. These are numbers that should set off our alarms, especially because behind these tweets there are so many more women who haven’t dared to say anything.
– Empathy: this may have happened to you too, or could happen to anyone around you. Empathy helps us understand someone else’s suffering and fear as something close to us, which makes us intervene and speak up.
– The diversity and atrocity of the crimes: murder, rape, torture, crimes against minors, and crimes committed by family members and friends.
– Hope that the victims will find a safe environment where they are not judged or questioned, helping others come forward. This safe space is reinforced by the messages of those who denounce the situation without having experienced it personally, standing up to those who do not believe how serious the problem is and resort to victim shaming and blaming. Only by legitimizing the suffering of so many can we discuss justice reform and make the law better reflect what happens in reality, and thus help change and improve our world.
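Here is a minimal Python sketch of that last filtering step and of a subtle size encoding. The `p_random` field, the threshold semantics (drop at 0.9 or above), and the logarithmic size formula are assumptions for illustration. The post does not say how the classifier's output was stored or how retweets were mapped to size.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    p_random: float  # hypothetical classifier probability that the tweet is "random"
    retweets: int


def keep_for_visualization(tweets, threshold=0.9):
    """Drop tweets the classifier is at least `threshold` sure are random."""
    return [t for t in tweets if t.p_random < threshold]


def point_size(retweets, base=2.0, k=0.5):
    """A subtle size encoding: grows with log(1 + retweets), not linearly."""
    return base + k * math.log1p(retweets)


tweets = [
    Tweet("advert", p_random=0.97, retweets=0),
    Tweet("borderline", p_random=0.90, retweets=3),
    Tweet("testimony", p_random=0.05, retweets=120),
]
for t in keep_for_visualization(tweets):
    print(t.text, round(point_size(t.retweets), 2))
```

The logarithm keeps heavily retweeted points from swamping the cloud, which fits the idea of encoding retweets only subtly.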
The final visualization #cuéntalo
The final visualization (here) ended up using the circular representation to evoke a safe and protected space. Our estimation of who wrote each tweet lets us place testimonies in the central ring, surrounded and protected by the support tweets in the second ring. The rest, unrelated to or against the cause, sit well outside the inner circles. Each tweet is represented by a point in space, forming a large, dense cloud that gives an idea of the magnitude and repercussion of the phenomenon. The individuality of each tweet is preserved through exploration: hovering with the mouse reveals the content of each story. The time of day when the tweet was written is encoded in the angle, reminding us that this is an issue that happens at all times of the day, everywhere. The bright colors over a dark background represent a light in the darkness. The color palette (from red to white) evokes the violence of the topic, and by using it to encode the probability that a tweet talks about physical aggression, we discover something interesting: most of the tweets that talk about assault are located near the center. The inner ring, where women tell their stories, is tinged with deep red where the most piercing tweets are.
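The encodings described above can be sketched as two small functions. In this Python sketch the band radii, the red endpoint of the palette, and the `depth` parameter (where a point sits inside its band) are all hypothetical. The post does not give the actual values or say how points were spread within each ring.

```python
import math

# Hypothetical radial bands (inner, outer) per author category:
# testimonies at the centre, support around them, everything else outside.
RINGS = {"testimony": (0.0, 1.0), "support": (1.0, 2.0), "other": (2.0, 3.0)}
WHITE = (255, 255, 255)
DEEP_RED = (180, 0, 0)  # assumed palette endpoint


def position(category, hour_of_day, depth):
    """Angle from time of day; radius inside the category's band (depth in [0, 1])."""
    inner, outer = RINGS[category]
    radius = inner + depth * (outer - inner)
    angle = 2 * math.pi * (hour_of_day % 24) / 24
    return radius * math.cos(angle), radius * math.sin(angle)


def colour(p_physical):
    """Interpolate from white (p = 0) to deep red (p = 1) as an RGB tuple."""
    p = min(max(p_physical, 0.0), 1.0)
    return tuple(round(w + p * (r - w)) for w, r in zip(WHITE, DEEP_RED))


print(position("support", 6.0, 0.5), colour(0.9))
```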
The visualization is complex, and the legend helps us interpret it:
In alphabetical order: Sol Bucalo, Luz Calvo, Carlos Carrasco, Fernando Cucchietti, Artur García Saez, Carlos García Calatrava, David García Povedano, Juan Felipe Gómez, Camilo Arcadio González, Guillermo Marín, Irene Meta, Patricio Reyes, Feliu Serra and Diana Fernanda Vélez. Also from BSC for the classification we had the collaboration of María Coto and Laura Gutierrez.
Epilogue: Unexplored options
In the process of selecting the themes we wanted to explore, we dropped some interesting options, for example the age of the victims: more than three thousand tweets report victims under 18 years old. Many of them are adults today and are telling their story for the first time.
We also thought about focusing on the second-person tweets that mention a woman who was murdered (10% of all testimonies), usually ending with a phrase like “I’m telling this because…can’t”. This is an example visualization of all the names mentioned in this kind of phrase, in homage to the famous “Iraq’s Bloody Toll” visualization:
Note: The charts in this post can be explored interactively and for many more countries here. Source code and data are here.
This post has two parts: a commentary on the charts, and a more technical discussion of the visualization.
Recently I posted a chart on the evolution of the economy and social inequality in Argentina. The chart was a reproduction of one made by Alberto Cairo for Brazil, which showed that starting with Lula’s governments (in principle more left-leaning) the country grew and inequality was reduced. The same trend can be seen in the Argentina chart starting from the governments of Kirchner and Fernández (perhaps even from the interim Duhalde government). From the beginning of this exercise my attention was drawn to the fact that the trend starts around 2002 in both countries, so I thought it would be interesting to make the same chart for the countries in the region:
Here I reduced the importance of presidential periods (by lowering colour contrast), so we can focus on the trends of the curves. As countries have very different sizes, putting them in the same chart is not very helpful. Let’s try a separate chart for each country (now with more colours):
I think it is clear that, yes, all the countries of southern South America (the Southern Cone, as we call it in Argentina) have grown economically and lowered inequality from 2002 on. Without entering into a discussion of the causes of the trend (which would require a rigorous statistical study and a good theoretical basis), there are many interesting details we can see:
Chile is the only country with a clear and constant trend for the whole observed period, even though its Gini coefficient is higher than those of Uruguay or Argentina (however, comparing Ginis between countries can be tricky; it’s best to compare them when the methodologies are the same).
Paraguay, Argentina, and Brazil suffered huge swings in the late 90s.
Bolivia and Paraguay are much smaller economically, although in the interactive version you can see that actually all of Latin America is smaller than the developed countries.
I have the feeling that the charts above do not do the best job at highlighting how coupled the countries are. For this purpose, I created a second chart in which I used a different colour for each of the four combinations of larger or smaller GDP and Gini. The idea is that when all countries change in the same direction (irrespective of the size of the change) it will look like a solid block of colour. This is the result:
I am pretty happy with the chart; the joint trend after 2002 is now very evident. I am still wondering how to include the size of the change at the same time without losing this strong highlight. It could be very useful: take for instance Chile, which has a few bad years between 1998 and 2000 that are actually much smaller than Argentina’s 2001 crisis, but in this chart they look the same. So: bad chart, bad.
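The direction chart reduces each year-to-year change to one of four categories. The Python sketch below assumes yearly GDP and Gini series of equal length. The series values and colour codes are hypothetical, and ties are arbitrarily counted as "down".

```python
def direction_labels(gdp, gini):
    """Label each year-to-year change with one of four GDP/Gini direction combinations.

    Only the sign of each change is kept; its size is discarded, which is why a
    small dip and a deep crisis end up with the same colour. Ties count as "down".
    """
    labels = []
    for i in range(1, len(gdp)):
        gdp_dir = "GDP up" if gdp[i] > gdp[i - 1] else "GDP down"
        gini_dir = "Gini up" if gini[i] > gini[i - 1] else "Gini down"
        labels.append(f"{gdp_dir}, {gini_dir}")
    return labels


# Hypothetical colour per combination.
COLOURS = {
    "GDP up, Gini down": "#1b9e77",
    "GDP up, Gini up": "#d95f02",
    "GDP down, Gini down": "#7570b3",
    "GDP down, Gini up": "#e7298a",
}

labels = direction_labels([100, 110, 105], [45.0, 44.0, 46.0])
print(labels, [COLOURS[label] for label in labels])
```

Any way of bringing back the size of the change, such as colour intensity, would have to keep the magnitude that this function deliberately throws away.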
In a continuation post we shall study how to include this information as well as presidential periods in this deconstructed Cairo chart.
As before, the source for the data is the World Bank (this table and this table), and presidential periods were extracted from here.
Having many countries together in the plot makes it hard to distinguish presidential periods, which in my opinion is what puts Alberto Cairo’s original chart in a different class from a simple connected scatter plot (maybe we can call it a Cairo chart?). Here I lowered the colour cacophony on purpose to highlight just the direction of the curves, but in doing so maybe I should have distinguished before and after 2002? Perhaps we could plot a simpler variable, like left- or right-leaning government? I would need to catalogue all those governments, though… (you can help in this repository).
Placing the labels in the interactive is a nightmare for which I don’t have time. Literally: it has haunted me in my dreams. The charts here are heavily stylized in Illustrator, but it’s not simple to place labels automatically without putting them on top of interesting things. Map people know this all too well.
Perhaps the interactive could be improved by bringing presidential periods to the foreground when you hover over them? I mean, not just the label, but also fading out the rest?
The multiple scales are a discussion of their own. I left a few of the best options in the interactive. The default version is the one I like the most, but it is certainly a little misleading. It is more accurate to see all curves on the same scale, but that makes it difficult to see the details of each country’s trend.
The only important scale option I left out is an isometric one on both axes (same percentage change). The closest I have is the combination of seeing all curves in the same plot and then using a reference year to see percentage change. Just by coincidence, the x axis changes about 75% and the y axis about 60%. So, almost there.
The list of presidents I found online is not what you would expect for some countries like Germany (and others), which have a president (head of state) AND a chancellor or prime minister (head of government) who holds the actual power. Adding this information is easy if I can get help (and if the World Bank has the GDP and Gini data).
Finally, on the direction chart: I’ve been trying small lines under the squares to denote presidents, a colour gradient instead of a single colour for each direction, and little arrows instead of a single colour block. None is very satisfying, but I think it’s worth continuing to try. Ideas?
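The reference-year option converts each series into percentage change from a chosen year. The Python sketch below assumes each indicator is a `{year: value}` dictionary. The numbers in the example are hypothetical.

```python
def pct_change_from(series, ref_year):
    """Express a {year: value} series as percentage change relative to ref_year."""
    base = series[ref_year]
    return {year: 100.0 * (value - base) / base for year, value in series.items()}


# Hypothetical GDP per capita and Gini series for one country.
gdp = {2002: 8000.0, 2013: 14000.0}
gini = {2002: 50.0, 2013: 40.0}
print(pct_change_from(gdp, 2002), pct_change_from(gini, 2002))
```

Once both axes are in percentage change, an isometric view only needs equal units on x and y (for example, an equal aspect ratio in the plotting library).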
A small clarification before we start: I am not for or against the current Argentinian government, and if I had to vote in next Sunday’s elections I would not be able to choose a side. The following is not real political commentary; what I like is data visualization.
Recently I got to read Alberto Cairo’s The Functional Art, and I was strongly attracted to a great infographic about the evolution of the economy and inequality in Brazil. It got me thinking about how that would look for Argentina, so I got the data from the World Bank (starting in 1986, and only up to 2013) and reproduced the plot in a very similar style but using the Argentinian data.
The chart’s interpretation is that points higher up represent more inequality, while points to the right mean a higher production of economic value. In the words of Alberto Cairo, one of the messages of the chart is that growth in GDP does not always mean a reduction in inequality.
A few technical comments:
Like in the original chart, I used the Gini coefficient to measure inequality. It is far from a perfect indicator, but it is a very popular one. To measure economic growth I decided to use the GDP per capita instead of the total one used in the original chart. I think this would be a little better since the country’s population grew significantly in the three decades spanned by the data.
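For readers unfamiliar with the indicator, here is a minimal Python sketch of the textbook Gini coefficient for a list of non-negative incomes, using the sorted-rank formula. It only illustrates what the number measures. The World Bank's published estimates come from household surveys and their own methodology, not from this function.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted
    ascending and ranks i starting at 1.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]), gini([0, 0, 0, 1]))
```

GDP per capita, the other axis, is simply total GDP divided by population for each year.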
I respected the original design and separated the presidential terms by colour, which I think is a brilliant decision. It totally makes the plot. There is one caveat, though: each data point covers a whole year, and changes of president happen at different points during the year. Therefore there is some leeway in how to assign the colour (and I didn’t figure out an impartial algorithmic way to decide this).
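One possible rule, not the one used for the chart, is to give each year the colour of whoever held office for the most days in it. The Python sketch below assumes terms are given as `(name, start, end)` with an exclusive end date. The names and dates in the example are hypothetical. Ties go to the earlier term in the list.

```python
from datetime import date


def president_for_year(year, terms):
    """Return the name of whoever held office for the most days in `year`.

    terms: list of (name, start_date, end_date), end date exclusive.
    """
    year_start, year_end = date(year, 1, 1), date(year + 1, 1, 1)
    best_name, best_days = None, 0
    for name, start, end in terms:
        overlap = (min(year_end, end) - max(year_start, start)).days
        if overlap > best_days:  # strict ">" keeps the earlier term on ties
            best_name, best_days = name, overlap
    return best_name


# Hypothetical terms: A hands over to B on 25 May 2003.
TERMS = [
    ("A", date(2000, 1, 1), date(2003, 5, 25)),
    ("B", date(2003, 5, 25), date(2007, 12, 10)),
]
print(president_for_year(2003, TERMS))
```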
With the risk of ruining the incredible work by Cairo and collaborators, I took the liberty of adding a few labels indicating important economic events (and I also changed fonts, colors, stroke widths, and other small stuff that makes my chart subtly but clearly worse, of course).
Comments on the content:
Like in the original chart, you can see a clear mark or general tendency that is very different for each presidency (including the difference between Menem’s first and second terms).
More importantly, beginning in 2003 and coinciding with Kirchner’s rise to power (just like Lula in Brazil), the country grows economically and inequality goes down in an unprecedented manner (except for a small glitch during the global economic crisis of 2008). I could not find Gini estimates before 1986, but there are some for the urban area of Buenos Aires, and they show that only in 1984, and before that in 1974, were there levels of inequality as low as today’s.
An important detail: reliability of the source
After a lot of comments by many of the readers in the original (in spanish) post, I caved and gave more hours to this project and included an alternative measure of GDP (alternative to the World Bank, that is).
Why? Well, for those not well versed in Argentinian politics, the current government intervened the official statistics institute, and since then their numbers (which feed the World Bank’s database) have been strongly questioned by many. The World Bank itself recognises that unreliability of the official data from Argentina and puts a disclaimer saying “the World Bank is also using alternative data sources and estimates for the surveillance of macroeconomic developments in Argentina”.
Many people asked me to get an independent source and check the data, so I did. I discovered other things in the middle, like for instance that it is terribly difficult to match different methodologies when measuring GDP. Therefore, I decided that the best (in terms of a compromise between easiness and correctness) was to give the official numbers for 2007 as valid, and then calculate the subsequent years from the relative growth (year to year percentage) measured with this other indicator (called ARKLEMS, it just rolls of of the tongue right?). The data from this analysis is shown in the alternative timeline in gray, which unexpectedly (from the comments), almost lines up with the data from the World Bank.
There are still things that bother me with this: first, I am plotting different things that maybe have no way of being normalized to the same values, and second, and perhaps more important, perhaps this indicates that the adjustment was already taken into account by the World Bank in my original data set (they say they are doing it, but not how). Hopefully this will lead to good conversations and discussions on how to recover the reliability of the INDEC.
The evolution in Brazil, the original chart, is strikingly similar to that of Argentina. This makes me wonder how much is the effect of particular presidents (sure there must be some, but still) compared to the global and regional environment. To answer this question I got the full data set from the World Bank and maybe in one or two weeks I’ll get a complete infographic with a comparison (it won’t be easy as I have to rethink the design and maybe even go interactive…)
All the data comes from the World Bank, from this table and this table, except for the Gini coefficient for 1988, 1989 and 1990, which I found on Gapminder.org. They cite the World Bank as a source, but they probably took it from somewhere else. As a side note, Gapminder already lets you compare the countries of the region, except that it has no information on presidential terms. The alternative GDP calculation from 2008 onwards was taken from ARKLEMS, which I adjusted for population growth, using only the year-to-year change to move the 2007 World Bank data point forward.
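The alternative timeline boils down to chain-linking: start from the 2007 World Bank value and apply each year's growth rate in turn. Below is a minimal sketch, assuming the plotted series is GDP per capita, that ARKLEMS supplies total GDP growth as fractions (0.05 for +5%), and that population growth is available for the same years; the function name `chain_link` and its arguments are hypothetical, not part of either data source.

```python
def chain_link(base_value, base_year, gdp_growth, pop_growth):
    """Extend a base-year level forward using year-to-year growth rates.

    gdp_growth and pop_growth map year -> fractional change versus the
    previous year (hypothetical inputs). Per-capita growth is taken as
    (1 + gdp growth) / (1 + population growth).
    """
    series = {base_year: base_value}
    year, level = base_year, base_value
    # Keep going while there is a growth rate for the following year.
    while (year + 1) in gdp_growth:
        year += 1
        level *= (1 + gdp_growth[year]) / (1 + pop_growth[year])
        series[year] = level
    return series


if __name__ == "__main__":
    # Made-up numbers, only to show the mechanics.
    print(chain_link(100.0, 2007, {2008: 0.05, 2009: -0.02},
                     {2008: 0.01, 2009: 0.01}))
```

Only the ratios between consecutive years come from the alternative source; the absolute level is anchored to the 2007 official figure, which is why the gray line can be drawn on the same axis as the World Bank data.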
Someone once accused me of not doing visualizations. That is not actually true (I've done more than one, so there must be a lot of people out there with bad memories), but I have to admit that it's not really my job.
My job in the Visualization department (apart from designing user-friendly interfaces, of course) is to make my team's visualizations more understandable:
Trying to stop them from putting 20 variables in the same graphic just to demonstrate that it can be done.
Sometimes I managed to do it. Sometimes I didn’t.
That's why I decided to write this series of blog entries analyzing things that are unclear or could be improved upon, always from a UX point of view.
Taking Tufte's work as a basis, Fernanda Viégas and Martin Wattenberg wrote a blog entry titled Design & Redesign, suggesting that data scientists should not only criticize other people's work but also improve on it by proposing how to redesign the visualization. I'll try to analyze the examples while respecting their original style.
First, I'll confess that I chose this representation, Journals, because at first glance I thought it was appealing and easy to analyze. I also wanted to prove that my point of view coincided with, or at least complemented, that of the experts (basically, my boss). And it did.
The first problem we see is that, depending on the subject or type of publication, a different time period is shown (from 2004 to 2013, from 1970 to 2010…)
So why not represent the same time period in all the graphs, showing explicitly that there is no data for some years?
Another problem is that the time step changes from one graph to another: every two years, every five years…
After thinking about the color range, I concluded that it carries no meaning: it only tries to differentiate one line from another. But some users could think: Does the range (blue, red, green) mean something? Does the color intensity mean something else? Is the light blue more relevant than the dark one?
And the last and most important design error: why are some totally different values represented with the same or a similar radius?
The underlying problem is that the radius scale changes from line to line. So if you don't read the values beside two similar circles, you might think they represent similar values, but they don't. One circle could have a value of 20, while a similar-looking circle could have a value of 2. So at first glance, and without any interaction, you can't easily compare two graphs (or even two lines).
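To make the radius problem concrete, here is a minimal sketch comparing two ways of sizing circles. It assumes circle area should be proportional to the value (so radius grows with the square root) and that each line is a list of values; the function names and the `max_radius` parameter are hypothetical, not taken from the original graphic.

```python
import math


def radius(value, scale_max, max_radius=20.0):
    """Area-proportional radius: the circle for scale_max gets max_radius."""
    return max_radius * math.sqrt(value / scale_max)


def radii_per_row(rows, max_radius=20.0):
    # The problematic approach: every line is rescaled to its own maximum,
    # so the biggest circle of each line looks the same whatever its value.
    return [[radius(v, max(row), max_radius) for v in row] for row in rows]


def radii_shared(rows, max_radius=20.0):
    # One scale for every line (and every graph): equal sizes mean equal values.
    global_max = max(v for row in rows for v in row)
    return [[radius(v, global_max, max_radius) for v in row] for row in rows]


if __name__ == "__main__":
    rows = [[20, 10], [2, 1]]
    print(radii_per_row(rows))  # 20 and 2 both get the maximum radius
    print(radii_shared(rows))   # 2 gets a much smaller circle than 20
```

With a shared scale, a value of 20 and a value of 2 end up with areas in a 10:1 ratio, which is exactly the comparison the per-line scaling hides.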
This is not intended to be a deep or extensive visualization experiment. I just wanted to share with you a simple data visualization exercise using Tableau.
I have to admit that my first contact with the tool was a few months ago, and I must say that it has improved a lot over the past year, adding some useful features. (The video tutorials might have had something to do with it, but to tell the truth, I'm falling in love again.)
The visualization shows the offenses committed during 2013, broken down by bias motivation, using the FBI's crime dataset.
The offenses (committed in 2013 in the USA) were grouped by type of offense: race, religion, sexual orientation, and even gender. Each type was subdivided into subtypes; for religion, for instance, we know whether the incident was directed against Catholics, Jews, Muslims, etc.
My first representation simply showed the number of incidents by incident type:
Then I added the subtype variable to the color to give more detail for each listed type:
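Outside Tableau, the two charts correspond to two levels of aggregation: a count per bias type (the bar lengths) and a count per type and subtype (the colored segments within each bar). A minimal sketch, assuming each incident is a (type, subtype) pair; the records below are invented for illustration and are not FBI figures, and the function names are hypothetical.

```python
from collections import Counter


def count_by_type(records):
    """Incidents per bias type (one bar per type)."""
    return Counter(bias_type for bias_type, _ in records)


def count_by_type_and_subtype(records):
    """Incidents per subtype within each type (the colored segments)."""
    breakdown = {}
    for bias_type, subtype in records:
        breakdown.setdefault(bias_type, Counter())[subtype] += 1
    return breakdown


if __name__ == "__main__":
    # Invented toy records, only to show the shape of the aggregation.
    toy = [
        ("Religion", "Anti-Catholic"),
        ("Religion", "Anti-Jewish"),
        ("Religion", "Anti-Jewish"),
        ("Gender", "Anti-Female"),
    ]
    print(count_by_type(toy))
    print(count_by_type_and_subtype(toy))
```

Because the segments partition each bar, the subtype counts for a type always add up to that type's total, which is what keeps the stacked version readable.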
I won't draw many conclusions from this; I simply want to say that it is rather sad that race-related incidents are still the most frequent, and that the number of offenses against the African-American population is three times the number against white people. Race-related incidents are followed by offenses linked to sexual orientation, most of them against the gay community.
Well, not really a troll. By definition, a troll is a person who posts inflammatory messages in an online community with the main intention of annoying the users or readers or provoking an emotional response from them.
Provocative message? Guilty.
Trying to provoke an emotional response? Guilty.
Message irrelevant? Not at all.
The conflict began on 4 August with this first tweet, in response to an infographic published on El País's Twitter account:
which apparently compared the most expensive signings made by English and Spanish league clubs from 1998 to 2015.
I say "apparently" because it seems the information was wrong: someone called Bale had not been included. I have to admit I didn't even know he existed, much less anything about football signings.
The second tweet (he was persistent) was this:
showing some graphics created in Tableau with data taken from The Guardian: totally different results.
And the last tweet:
giving as a reference an article with the same (and correct) numbers.
Wrong processing? Some kind of error? Maybe something deliberate?
After sitting in front of the great Julio Pomar for months, I know that ignoring him is not the best way to deal with a troll, especially when the troll is telling the truth and we have published wrong data.
The best way to react would have been to admit the error, apologize and correct the data.
Talking with my boss about this two weeks ago, I learned the reason for this unpleasant comment (I have to say that's not his usual way of doing things; he is actually nice, respectful and always ready to help anybody):
"I think that journalists have access to a lot of information, information that most of us don't normally have access to, so they have a commitment to society to be honest and unbiased. Those things made me indignant."
So I decided to write this post, and I might mention the author (@rodrigo0silva) in a tweet linking to this article. I probably won’t get an answer.
The Book of Trees covers over 800 years of human culture through the lens of the tree chart, from its roots in medieval religious exegesis to its contemporary, secular and digital forms. With more than 200 images, the book traces the visual evolution of this universal metaphor throughout history, showing us the recent emergence of new visual models.
This book, written by Manuel Lima (Visual Complexity), turns visualization into a prism through which we can observe the evolution of culture.
Manuel is a leading voice on information visualization and has spoken at numerous conferences, schools and festivals around the world, including TED, Lift, OFFF, Eyeo, Ars Electronica, IxDA Interaction, Harvard, MIT, Royal College of Art, NYU Tisch School of the Arts, ENSAD Paris, University of Amsterdam and MediaLab Prado Madrid. He has also been featured in various magazines and newspapers, such as Wired, New York Times, Science, BusinessWeek, Creative Review, Fast Company, Forbes, Eye, Grafik, SEED, Étapes and El País.