Songs are grouped by similarity. Each line represents the distance to the next group of songs, calculated from their sonic properties.




How many times have we heard that "all the songs by that artist sound the same"? How similar are they, really? How many kinds of songs did they actually write?

This project was originally inspired by an interest in finding out more about musicians and their careers, in particular in getting a glimpse of the evolution and variety of their work.

For this, we analyzed the songs by extracting sonic features that describe them(1), the same kind of features used, for example, to classify musical genres automatically. Using these features we can mathematically quantify the differences between songs and group them by similarity using clustering algorithms. It was important to use artists with a large discography, and since we also wanted to see their evolution, we chose people with long active careers. Finally, for reasons of availability, we settled on U2, David Bowie, and Pink Floyd.
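As a rough illustration of how features become differences between songs, the sketch below rescales each feature and computes the Euclidean distance between every pair of songs. It is a minimal sketch under assumptions: the text does not say which scaling or distance was used, and the song names and feature values are made up for illustration.

```python
import math

def standardize(features):
    """Rescale each feature column to zero mean and unit variance so that a
    feature measured in large units (e.g. bandwidth in Hz) does not dominate."""
    columns = list(zip(*features))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        var = sum((x - mean) ** 2 for x in col) / len(col)
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in features]

def distance_matrix(vectors):
    """Euclidean distance between every pair of feature vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]

# Hypothetical feature vectors: [bandwidth in Hz, a pitch-like descriptor].
songs = ["A", "B", "C"]
features = [[1000.0, 0.20], [1100.0, 0.25], [3000.0, 0.90]]
D = distance_matrix(standardize(features))
for name, row in zip(songs, D):
    print(name, [round(d, 2) for d in row])
```

Songs A and B end up close to each other and far from C, which is the kind of structure the clustering step then looks for.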

The algorithm works like this: the first groups are formed from the songs that are most similar to each other. These groups are then enlarged with the song closest to each group, and so on, until all songs have been classified. The distance between groups is very important. In the example below, if the distance between song C and the mini-group A-B is small, we cannot really call A-B a group that is very different from C; instead we should talk about an A-B-C group. On the other hand, if song C is very different from the A-B group, the distance will be large, and we can talk about separate groups A-B and C.
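The procedure described above is a form of agglomerative (bottom-up) hierarchical clustering. The sketch below uses average linkage, where the distance between two groups is the mean distance between their songs, and in its general form it also lets groups merge with other groups. The text does not state which linkage was used, so this choice is an assumption, and the distances for songs A, B and C are made up to mirror the example.

```python
def agglomerative(dist, labels):
    """Average-linkage agglomerative clustering.

    dist: symmetric matrix of pairwise song distances.
    labels: song names, in the same order as dist.
    Returns the merges in order, as (group_a, group_b, merge_distance).
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []

    def linkage(a, b):
        # Mean distance between every song in group a and every song in group b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current groups.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append(([labels[i] for i in a], [labels[i] for i in b], d))
        # Replace both groups by their union; delete y first (y > x keeps x valid).
        del clusters[y]
        clusters[x] = a + b
    return merges

# Hypothetical distances: A and B are close, C is far from both.
dist = [[0, 1, 9],
        [1, 0, 8],
        [9, 8, 0]]
for group_a, group_b, d in agglomerative(dist, ["A", "B", "C"]):
    print(group_a, "+", group_b, "at distance", d)
```

A and B merge at distance 1.0, while joining C to the A-B group costs 8.5. That jump is what suggests treating A-B and C as separate groups rather than one A-B-C group.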

We can turn this reasoning into a criterion(2) that lets us define how many genuinely different groups of songs there are. If we apply it automatically, the criterion says that a band like U2 has basically two types of songs. However, we can fine-tune the splitting with our own judgment, which is more nuanced than the computer's(3). In the case of U2, we can visually identify up to 9 groups, although mathematically it is hard to justify more than 6.
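The criterion mentioned in footnote (2) is the average silhouette width. For each song it compares a, the mean distance to the other songs in its own group, with b, the mean distance to the nearest other group, giving s = (b - a) / max(a, b). Averaging s over all songs scores a partition, and the number of groups with the highest score is chosen. The sketch below is a minimal pure-Python version. The partitions are hand-written here for illustration; in practice they would come from cutting the clustering tree at different heights.

```python
def silhouette_width(dist, labels):
    """Average silhouette width of a partition.

    dist: symmetric matrix of pairwise distances.
    labels: group label for each song, in the same order as dist.
    """
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        raise ValueError("silhouette width needs at least two groups")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            continue  # song alone in its group: s = 0 by convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in groups.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def best_number_of_groups(dist, partitions):
    """partitions: dict mapping k -> labels of a k-group partition."""
    scores = {k: silhouette_width(dist, labs) for k, labs in partitions.items()}
    return max(scores, key=scores.get), scores

# Hypothetical data: two tight pairs of songs, far from each other.
dist = [[0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0]]
partitions = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best, scores = best_number_of_groups(dist, partitions)
print(best, scores)
```

Splitting the second pair into two singleton groups lowers the score, so the maximum points to two groups.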

What conclusions can we draw? Both U2 and David Bowie seem to have written only a few types of songs. Pink Floyd's songs, on the other hand, resisted any reasonable grouping: they show the widest variety of the three artists we studied (we are still double-checking this). Just as important, the output of all three artists seems roughly stable over time; that is, the groups contain songs from most periods of their careers. Only when looking at the smallest subgroups does some sparsity in time appear, although at that level the groups are not very different from their neighbors (we are looking for a metric to quantify this properly).

PS: Would you like to see your favorite band analyzed? Get in touch with us and let's talk.

(1) The feature sets we extracted include low-level signal properties (bandwidth, frequency, pitch, etc.) and mel-frequency cepstral coefficients (MFCC), as in this paper, although we used a combination of five 7-second samples from each song, separated by 30 seconds.
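The sampling scheme in footnote (1) can be sketched as a small helper that returns the excerpt windows for one song. Two assumptions are not stated in the text: that the 30 seconds are measured from the start of one excerpt to the start of the next, and that the excerpts are centred in the song unless a start time is given. The function name and its parameters are hypothetical.

```python
def sample_windows(song_length, n_samples=5, sample_len=7.0, spacing=30.0, first_start=None):
    """Return (start, end) times in seconds of the excerpts taken from one song.

    Assumes `spacing` is start-to-start and, unless `first_start` is given,
    centres the whole set of excerpts within the song.
    """
    span = (n_samples - 1) * spacing + sample_len  # 4 * 30 + 7 = 127 s with the defaults
    if first_start is None:
        first_start = (song_length - span) / 2
    if first_start < 0 or first_start + span > song_length:
        raise ValueError(f"song too short: need {span} s from {first_start} s, have {song_length} s")
    return [(first_start + k * spacing, first_start + k * spacing + sample_len)
            for k in range(n_samples)]

print(sample_windows(227.0))
```

Under these assumptions every song must be at least 127 seconds long; if the 30 seconds were instead a gap between excerpts, the minimum would grow to 155 seconds.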

(2) We used the maximum of the so-called silhouette width.

(3) It would be reasonable to choose any number of groups within a range where the silhouette width is similar.
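Footnote (3) can be made concrete with a helper that lists every number of groups whose silhouette width is close to the best one. The tolerance is a hypothetical parameter, since the text does not say how similar "similar" is, and the scores below are illustrative values, not results from this study.

```python
def reasonable_group_counts(scores, tolerance=0.02):
    """Return every k whose silhouette width is within `tolerance` of the best.

    scores: dict mapping number of groups k -> average silhouette width.
    tolerance: hypothetical threshold for "similar"; not given in the text.
    """
    best = max(scores.values())
    return sorted(k for k, s in scores.items() if best - s <= tolerance)

# Illustrative scores only.
scores = {2: 0.31, 3: 0.20, 4: 0.30, 5: 0.295, 6: 0.25}
print(reasonable_group_counts(scores))
```

Here two groups score highest, but four or five would be almost as well supported, which is the kind of range within which a human can pick a more nuanced split.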

This research was carried out during the PRACE Summer of HPC programme by Sofia Kypraiou, in collaboration with the scientific visualization team at the Barcelona Supercomputing Center.

To speed up the analysis, we used the PyCOMPSs parallel programming model and the MareNostrum supercomputer. A full report including methods and more results will be available from PRACE soon.