On returning home after a walk, with a stop at a mountain restaurant, my partner told me with a smile of excitement:
– I’ve been in a fantastic place. Great food and they also have a celiac menu. I’ve taken a picture of it, in case you want to come with me someday.
Those who have any kind of intolerance will know that the problem affects not only the sufferer but also family and friends when it comes to finding a restaurant where everybody can eat. Depending on the level of intolerance, maybe nothing will happen if the food has some traces of gluten, for example. But sometimes, for very sensitive people, an error could be a serious problem, or even lead to death. (Several of my friends always carry an adrenaline injection, in case they accidentally encounter ingredients to which they are allergic.)
Returning to the subject of the restaurant, I was looking at the photo I received on the mobile phone. After examining it for a while, I raised my eyebrows and looked at him sternly and said with a sour face:
– This is not a celiac menu…
– What are you saying? Here it says what has gluten and what doesn’t.
– That’s not a celiac menu (I insisted). In fact, it’s something completely different.
People usually say I’m not the nicest of people, especially when I am hungry for something and there’s no way to satisfy that hunger (quite an apt expression for the subject at hand). If you have never suffered any kind of intolerance-related issue, you may perceive my reaction as a bit over the top, but in this particular case I’m completely right: an allergen card is not a celiac menu; it’s basically the opposite.
After the European law 1169/11 came into force, it became mandatory to inform customers about the allergens in the dishes on the menu in bars and anywhere in the restaurant trade, although there are still many that do not comply with the regulations and many others that don’t present the information properly.
Let’s take a look at the differences:
Normally an allergen card is presented as follows: A list of dishes with their possible adverse components listed one after another for each dish.
Sometimes they do not even appear in the same order, which makes it difficult to process the information and determine which dishes you can eat and which you can’t. Although the icons follow a certain order, the dishes do not share the same ingredients, so it is difficult to scan visually for the icons we should be looking out for.
Let’s imagine the case of a celiac who is also allergic to celery: in this case, we will read the dishes one by one and then check which of them have the wheat icon (we will not enter the debate on the icon’s meaningfulness).
Once this is done, we will memorize the dishes with gluten and begin the search for celery, something that little by little will increase our cognitive load.
Some of you may be thinking that this is not what happens in real life, that users do not search in this way, and that they start from a specific whim, focusing on the dishes they like or feel like eating at that precise moment, and then check whether those dishes are suitable for them.
I can assure you that this never happens: almost all the dishes that we might fancy cannot be consumed, because they contain specific substances that we cannot eat. There are even several theories that say we only become allergic to what we like, because of the burden on the system: we ‘abuse’ or consume more of the things that we like, and in some cases the body reaches its limit and becomes intolerant.
In other, somewhat more developed allergen cards, we may find something similar to this:
Icons listed in columns ordered by allergen, highlighting the substances present in each dish. In this case, the information is easier to process, even when checking for several intolerances: it is much easier to detect the rows in which the problematic substances appear together. But these systems still have a problem:
They are focused on what you cannot eat, instead of what you can. It often happens that, after having checked an entire menu (especially in tapas bars during the summer), you reach the conclusion that you cannot consume anything at all (and by then your appetite has been awoken, even if it had not been when you left home).
Normally a specialized restaurant will not consider all cases of allergens, perhaps only the most common ones, but in this case, the menu is oriented to the dishes you can eat, so you only process the food that you are able to eat:
What is the big difference?
So far, the examples have focused on what I can NOT eat. This last menu focuses on the opposite: what I can eat.
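The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the dishes, the allergen names and the function `dishes_i_can_eat` are invented for the example and do not come from any real menu. Instead of listing the allergens per dish, it returns only the dishes that are safe for a given diner.

```python
# Hypothetical menu: dish name -> set of allergens it contains.
MENU = {
    "Grilled vegetables": set(),
    "Waldorf salad": {"celery", "nuts"},
    "Croquettes": {"gluten", "milk", "eggs"},
    "Grilled hake": {"fish"},
}


def dishes_i_can_eat(menu, my_allergens):
    """Return the dishes that contain none of the diner's allergens."""
    avoid = set(my_allergens)
    return sorted(
        name for name, allergens in menu.items()
        if allergens.isdisjoint(avoid)
    )


# A celiac who is also allergic to celery only sees what is safe to order.
print(dishes_i_can_eat(MENU, {"gluten", "celery"}))
```

With this kind of filter the diner never has to memorize the forbidden dishes; an empty list simply means there is nothing on the menu for them.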
But the previous example, although good, is insufficient. What would happen to a person who had allergies to many of the products? How would we solve the problem? One of the options would be to go back to the second case (Figure 3), trying to find the rows with fewer icons highlighted, and then check if there is any possibility, although this would mean having a customizable menu that automatically gave us information about what to eat and what not. I am sure that within a few years, thanks to artificial intelligence, we will have this information filtered just by passing our biometric bracelet over the welcome sensor at the door, including the combination that will best suit us at that time of day and in line with the exercise we have done that day. In the meantime, we could have something similar to the following on a digital menu running on a tablet or mobile:
To get something like this:
No results matching the search criteria were found:
Better go home. Or drink a glass of water.
Or better still, that we “print” on demand whatever we like. No allergens included. Or better still, that nobody has allergies to any food.
Spanish companies made large investments in working capital and in international companies, in spite of a slight slowdown in the economy during 2018.
According to figures from the consultancy firm TTR, Spanish investors spent more than 12,250 million € acquiring international capital. Of the total completed transactions over the last year, only those transactions for which the total outlay is known were included in this report. For some of these, however, the associated debt is not known.
Leading the ranking:
Abanca paid 1,379 million € to Deutsche Bank for a retail business in Portugal. Amadeus (from the tourist industry) acquired the US company TravelClick for 1,319 million €, and Cepsa closed a deal worth 1,208 million € with Adnoc, Abu Dhabi’s state-owned oil company.
Through the following visualization, we will better understand the size of these investments per country and sector:
The #Cuéntalo movement started early one morning in April 2018, inviting women to share on Twitter their personal experiences of sexual aggression. In a few days, it generated more than 2.5 million tweets and retweets with stories told by their protagonists.
Archivists Vicenç Ruiz and Aniol María gathered the tweets in real time, and together with data journalist Karma Peiró they came to BSC to talk about how we could study and visualise this dataset.
The tweets go from the uncomfortable to the unbearable, stories in first person occasionally mixed with a woman speaking in the name of another one because she doesn’t have a computer, or because she doesn’t dare to tell her story, or because she was murdered.
“My mom’s boyfriend raped me when she was drunk because ‘I was very much alike and it was wrong to make love in her state’ and ‘so that you learn for when you get older’, since I was 9 until I was 12 years old”
There are many tweets, and each of them is important. Our goal was to study them with statistics and visualise the results to convey the shocking magnitude of the movement, while trying to respect the unique identity of each narration. Cristina Fallarás, the journalist who started and pushed the viral movement, gave us one more goal: she wanted the #Cuentalo visualization to be shocking, but also to remain a safe space, a place without fear or shame where victims could tell their brutally honest testimonies in first person (many of them telling them for the first time in their lives). A place where they can tell what’s hidden in plain sight, because even if it happens every day, what is not said out loud does not exist.
In the next few posts we will explain our analysis and design process for this project: From data gathering, cleaning, and treatment, to the conceptualisation and implementation of the final result.
The original dataset contained 2.1 million tweets in JSON format, created between April 27th and May 12th, 2018. The collection was missing two days that we were able to recover partially, reaching 2.75 million tweets in total. Each tweet has a lot of properties that we analysed (we will talk about this in the next post); here we will focus on what we used for the visualization.
One important distinction is that there were 160 thousand tweets with content written by users, while the rest (2.6 million) were participations in the form of retweets. These are indeed crucial for spreading the movement (and we assume they are supportive of it), but the data we want to visualize is in the content of the tweets we call “original” (i.e. not retweets).
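As a rough sketch of this split, and assuming the collection follows the standard Twitter API v1.1 format (where a retweet carries a `retweeted_status` object) stored as one JSON object per line, the separation could look like this. The function name `split_originals` and the line-per-tweet layout are assumptions; the post does not describe how the files were organised.

```python
import json


def split_originals(lines):
    """Separate original tweets from retweets.

    Assumes one JSON-encoded tweet per line and that retweets carry a
    'retweeted_status' field, as in the Twitter API v1.1 tweet object.
    """
    originals, retweets = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        if "retweeted_status" in tweet:
            retweets.append(tweet)
        else:
            originals.append(tweet)
    return originals, retweets


# Two made-up tweets: one original, one retweet of it.
sample = [
    '{"id": 1, "text": "#Cuentalo ..."}',
    '{"id": 2, "text": "RT ...", "retweeted_status": {"id": 1}}',
]
originals, retweets = split_originals(sample)
print(len(originals), len(retweets))
```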
We want to visualize who tweeted: These mostly anonymous 160 thousand tweets can be split between those that gave some testimony in first person, the ones that tell something on behalf of another woman who doesn’t dare or can’t do it (for example, because they don’t have internet access or they were murdered), and those that express words of support. We also found a few unclassifiable tweets (advertising, images), and a small group of trolls and jokers.
On the other hand, we wanted to visualize what people wrote about: what testimonies were told? Because of the volume of the data, this was a complex task, but we knew it would be valuable, as we know that more than 70% or 80% of sexual aggressions are never reported. This is one of the key points of #Cuéntalo: bringing to light those things that happen every day and are not recorded in any registry, either out of fear or, worse, because society does not believe the victims.
“I’ve been raped twice; part of my family doesn’t speak to me anymore for saying this, and for saying I wouldn’t shut up about it. I’ve even withdrawn a rape report because the police made me afraid. I won’t shut up anymore #Cuentalo”
In order to categorise the 160 thousand original tweets, 16 people in our team manually classified the content of over 10600 tweets (randomly chosen). The goal was to use this sample to create an algorithm to classify the rest of the tweets automatically, and for this we used the maximum number of categories that we could process correctly.
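Drawing such a random sample is straightforward. The sketch below uses Python's standard `random` module; the function name `draw_labelling_sample` and the fixed seed are assumptions added so the draw can be repeated, and they are not taken from the project itself.

```python
import random


def draw_labelling_sample(tweet_ids, sample_size, seed=2018):
    """Pick `sample_size` distinct tweets uniformly at random.

    A fixed seed (an arbitrary choice here) makes the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(tweet_ids), sample_size)


# Stand-in ids for the roughly 160 thousand original tweets.
sample = draw_labelling_sample(range(160_000), 10_600)
print(len(sample))
```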
The first categorization was about who wrote the tweet: we have testimonies in first or second person, expressions of support, tweets from someone who opposes the movement, and random tweets (for example, tweets in other languages, people who used the popularity of the hashtag for advertising, or tweets containing just a screenshot). This last group did include people who told stories that were too long for the 280 characters of Twitter, but we could not read them correctly.
“The stories of #Cuentalo give me the shudders. They pain. They pain a lot. But it is necessary to read them to understand they are not isolated cases: they are everyday situations.”
After this we categorized the content of the tweets. We found accounts that go from the sensation of fear and insecurity that women feel every day, to murders with torture, with all types of aggressions in between (physical, verbal, or virtual), including beatings and rape. To ease the training process (more details to come in the next post), we opted for the simplest possible categorization, sadly at the expense of the precision we would like to have (or that we could have if we read all the tweets by hand). Aspects like frequency or duration, degree of cruelty, age, or victim-aggressor relation were not worked on, not because we don’t think they are important but for technical reasons.
Our categorization is incomplete and open to improvement in many ways (for example from legal and social points of view). We argued a lot about this issue, especially worried about not minimizing or simplifying the seriousness of the facts.
Initially, the categories we used were: murder, rape, sexual aggression, assault, harassment, fear (the explicit mention of it), and emotions like disgust, rage, and indignation. As we said, this categorization is imperfect because of its simplicity, and for the visualization we aggregated it even further into just three categories: assault or any kind of physical aggression (including murder, rape, and so on), non-physical aggression (harassment, fear), and an emotional reaction (like rage, etc.). The previous, finer categorization work was not in vain, as we were able to use it to estimate the percentage of tweets in each category in the full dataset. Of the 10632 tweets we labelled by hand, 31.03% are written in first person, 8.91% are told by someone else, 40.18% are support tweets, 3.12% are tweets against the movement, and 16.69% are unrelated tweets. Extrapolating these percentages to the total, we would have error margins of around 1.5% for the tweets against #cuentalo, 3% in the estimation of testimonial tweets, and almost 6% for the support tweets.
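To give an idea of what this extrapolation means in absolute numbers, here is a minimal sketch that scales the hand-labelled shares up to the roughly 160 thousand original tweets. It also computes a textbook 95% margin for a simple random sample (normal approximation). That margin covers sampling noise only and is an assumption of this sketch: it is not how the error margins quoted above were obtained, since the post does not describe their derivation.

```python
import math

SAMPLE_SIZE = 10632        # tweets labelled by hand
ORIGINAL_TWEETS = 160_000  # "160 thousand" in the text, a rounded figure

# Shares (in %) of the hand-labelled sample, by who wrote the tweet.
SHARES = {
    "first person": 31.03,
    "told by someone else": 8.91,
    "support": 40.18,
    "against": 3.12,
    "unrelated": 16.69,
}


def extrapolate(share_pct, total=ORIGINAL_TWEETS, n=SAMPLE_SIZE, z=1.96):
    """Scale a sample share to the full set of original tweets.

    Returns (estimated count, sampling margin in tweets). The margin is the
    normal-approximation interval for a proportion, sampling error only.
    """
    p = share_pct / 100
    margin = z * math.sqrt(p * (1 - p) / n)
    return round(p * total), round(margin * total)


for label, pct in SHARES.items():
    estimate, margin = extrapolate(pct)
    print(f"{label}: about {estimate} tweets (sampling margin +/- {margin})")
```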
Within the testimonial tweets (first and second person, totaling almost 40%), 3.92% mention murder, 5.59% mention rape, 11.18% mention sexual assault, 6.27% mention physical assault, 14.19% talk about harassment, 11.78% mention fear, and 19.23% some reaction like disgust/rage/sadness (the percentages don’t add to 100 because in the same tweet many things can be mentioned). Again, if we extrapolate these numbers to the global dataset, we find error bars of 1% for rapes and murders, 3% error in the estimation of assault and harassment, and 6% for reaction tweets.
It is often said that people understand frequencies better than percentages, so on the webpage we wrote these results as follows:
Let’s conclude this section by noting that our algorithm was able to label tweets from the dataset with a precision of 80% for who is writing the tweets, and a precision of 70% for the topics mentioned. In general, the results are not exact but quite good given the reduced size of the input data (better results are obtained with millions of sentences). We expect that there will be many mislabelled tweets, so the predictions are more a suggestion than a final conclusion. Our takeaway is that the visualization cannot depend too much on the precision of this classification until we can manually label all the tweets.
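One way to turn a percentage into a frequency statement is sketched below. The helper name `one_in_n` and the "about 1 in N" wording are assumptions made for illustration; they are not the phrasing used on the webpage.

```python
def one_in_n(percent):
    """Express a percentage as an approximate '1 in N' frequency."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"about 1 in {round(100 / percent)}"


# Shares of testimonial tweets quoted in the text.
for topic, pct in [("murder", 3.92), ("rape", 5.59), ("harassment", 14.19)]:
    print(f"{topic}: {one_in_n(pct)} testimonial tweets")
```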
We start by discussing images and themes that we used as inspiration, including some of our own sketches as we started producing them.
We started with a preconception: we were going to find many conversation threads, so we tried to represent them with trees like this:
However, we found that there were overwhelmingly more single tweets and that the conversations were not long, so this technique did not work.
We then thought about a linear time narrative, showing the virality of the phenomenon and its magnitude. This chart shows the number of tweets per minute (the highest point is about one thousand), from the 27th of April to the 13th of May:
We played with the idea of this chart looking like a sound wave, representing the movement as a giant scream heard around the world:
But in the end we were not convinced by the scream metaphor. Also, the linear time representation would prevent us from adding data in the future. It felt like freezing the event in time, not allowing it to continue. This visualization did allow us to represent tweets by country, though:
We ended up dropping this option also because we had lost the individuality of each tweet, which was important for us.
In order to be able to let people add more tweets, we started exploring circular representations and periodicity.
In our first attempts, we plotted the hours of the day around the circle and placed the tweets from the inside out in order of creation. These sketches were made thinking of an eclipse, or a human eye:
This representation is quite versatile: it allows us to incorporate more dimensions, like country (e.g. in color) or number of retweets, and it also allows each tweet to be explored interactively, one by one.
In order to escape the ink blot shape and take better advantage of the radial position of each point, we tried adding some structure, for example by ordering the tweets inside-out in a ring for each day, with the size of the ring proportional to the total volume of tweets for that day:
The results are interesting, but they also have the problem of being difficult to expand with new tweets in the future.
At this point, we met with Cristina Fallarás who sent us back to the origins, and focused on where we should put the main message: #Cuentalo, aside from being an event with a large social impact, was a safe space where women could tell their story. We then decided to use our radial coordinate to somehow represent this. We needed to put the women who were telling their stories in a space at the center of focus, and surround them by the people who support them but also isolate them from the randoms and trolls outside in the world.
With this in mind, we created the first sketches that approach the final solution:
Our classification algorithm (of who wrote the tweet) allowed us to make a final improvement: We removed those tweets where we were at least 90% sure that they were random (jokes, advertising, and unfortunately also tweets with only an image). In the final visualization we end up with only 100 thousand original tweets, and retweets encoded in the size (subtly).
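In code, this last filtering step amounts to a simple threshold on the classifier output. The sketch below assumes each tweet carries a field with the estimated probability of being random; the field name `p_random` and the function name are hypothetical.

```python
RANDOM_THRESHOLD = 0.9  # "at least 90% sure" that the tweet is random


def keep_for_visualization(tweets, threshold=RANDOM_THRESHOLD):
    """Drop tweets that the classifier marks as random with high confidence.

    Each tweet is a dict with a hypothetical 'p_random' field in [0, 1].
    """
    return [t for t in tweets if t["p_random"] < threshold]


tweets = [
    {"id": 1, "p_random": 0.05},  # very likely a testimony or support
    {"id": 2, "p_random": 0.95},  # very likely advertising or a joke
    {"id": 3, "p_random": 0.60},  # uncertain, so it stays
]
print([t["id"] for t in keep_for_visualization(tweets)])
```

Keeping the uncertain cases is a deliberate choice in this sketch: only tweets the classifier is very sure about are removed.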
So, before getting to the final representation, let’s remember our design goals, what the visualization should say:
– Magnitude: These things happen, and more often than we think. These are numbers that should set off our alarms, especially because behind these tweets there are so many more women who haven’t dared to say anything.
– Empathy: Maybe this happened to you too, or it could happen to anyone around you. It is empathy that helps us understand someone else’s suffering and fear, like a close feeling that makes us intervene and speak up.
– The diversity and atrocity of the crimes: Murder, rape, torture, crimes against minors, and crimes committed by family and friends.
– Hope that the victims will find a safe environment where they are not judged and questioned, helping others come forward. This safe space is reinforced by the messages of those who denounce the situation without having experienced it personally, facing those who do not believe how serious the problem is and resort to victim shaming and blaming. Only by legitimizing the suffering of many can we discuss justice reform and make the law better reflect what happens in reality, and thus help change and improve our world.
The final visualization #cuéntalo
The final visualization (here) ended up using the circular representation to evoke a safe and protected space. Our estimation of who wrote the tweets lets us represent testimonials in the center ring, surrounded and protected by the tweets of support in the second circle. The rest, far from or against the cause, lie well outside the inner circle. Each tweet is represented by a point in space, forming a large, dense cloud that gives us an idea of the magnitude and repercussion of the phenomenon. The individuality of each tweet is preserved thanks to exploration, allowing us to see the content of each story by hovering with the mouse. The time of day when the tweet was written is encoded in the angular variable, reminding us that this is an issue that happens at all times of the day, everywhere. The bright colors over a dark background represent a light in the darkness. The color palette (from red to white) evokes the violence of the topic, and by using it to encode the probability that a tweet is talking about a physical aggression, we discover something interesting: most of the tweets that talk about assault are located near the center. The inner ring, where women tell their stories, is tainted with deep red where the most piercing tweets are.
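The encoding described above can be sketched as a small mapping from tweet attributes to a position and a color. This is only an illustration under several assumptions: the ring boundaries in `RING_RADII`, the `depth` parameter that places a tweet inside its ring, the choice of midnight at the top, and the linear white-to-red scale (red meaning a high probability of physical aggression) are all hypothetical and not taken from the actual implementation.

```python
import math

# Hypothetical ring boundaries (inner radius, outer radius) per author type.
RING_RADII = {
    "testimony": (0.0, 1.0),  # center: women telling their stories
    "support": (1.0, 2.0),    # second ring: messages of support
    "other": (2.0, 3.0),      # outside: random tweets and trolls
}


def polar_position(hour, minute, category, depth):
    """Map a tweet to (x, y): time of day sets the angle, category the ring.

    `depth` in [0, 1] places the tweet within its ring (how this was
    chosen in the real visualization is not described in the text).
    Midnight sits at the top and time runs clockwise.
    """
    angle = 2 * math.pi * (hour * 60 + minute) / (24 * 60)
    inner, outer = RING_RADII[category]
    radius = inner + depth * (outer - inner)
    return radius * math.sin(angle), radius * math.cos(angle)


def colour(p_physical):
    """RGB from white (p = 0) to red (p = 1) for the aggression probability."""
    fade = round(255 * (1 - p_physical))
    return (255, fade, fade)


# A testimony written at 06:00 with a high probability of physical aggression.
print(polar_position(6, 0, "testimony", 0.5), colour(0.9))
```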
The visualization is complex, and the legend helps us interpret it:
In alphabetical order: Sol Bucalo, Luz Calvo, Carlos Carrasco, Fernando Cucchietti, Artur García Saez, Carlos García Calatrava, David García Povedano, Juan Felipe Gómez, Camilo Arcadio González, Guillermo Marín, Irene Meta, Patricio Reyes, Feliu Serra and Diana Fernanda Vélez. Also from BSC for the classification we had the collaboration of María Coto and Laura Gutierrez.
Epilogue: Unexplored options
In selecting the themes we wanted to explore, we dropped some interesting options, for example talking about the age of the victims: more than three thousand tweets report victims under 18 years old. Many of them are adults today, telling their story for the first time.
We also thought about focusing on those second-person tweets that mention a woman who was murdered (10% of all testimonies), usually ending with a phrase like “I’m telling this because…can’t”. This is an example visualization of all the names mentioned in this kind of phrase, in homage to the famous “Iraq’s bloody toll” visualization: