Do you listen to music? Do you use a music streaming service? Are you a data nerd?
If you answered yes to these 3 questions then I have the website for you! LastFM is a great way to keep track of your listening habits and gather statistics on your music streaming without having to wait around for Spotify Wrapped once a year.
In this post, I will perform an exploratory data analysis (EDA) of my music listening habits by analyzing data from my personal last.fm page, which contains a mostly accurate log of every song I have listened to on Spotify since I joined the website in April of 2018.
All code will be kept in the Jupyter notebook. In this post I will include some visualizations and discussion of the analysis. Code will be referenced by hyperlink for those interested. The full Jupyter Notebook can be found here.
What is Last.fm? 🎶
Last.fm is a website that allows you to track your music listening habits among other things.
A last.fm account can be linked to a Spotify account, after which Spotify automatically starts sending "Scrobbles" to last.fm.
A "Scrobble" is just one entry of a listened track on your last.fm page.
Scrobble data includes:
Track Name
Album Name
Artist Name
Album Artwork (if any)
Exact timestamp of listen
Last.fm API 📑
Last.fm provides a developer API which we can call by making Python requests. Each API call returns a 'page' from a user profile, with a maximum of 200 scrobbles. Therefore, we need a function to get the number of pages per user.
Once we have this, we can write a function to loop through these pages and get scrobble information to add to a pandas dataframe.
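For readers who want a feel for the paging logic without opening the notebook, here is a minimal sketch. It is not the notebook's code: it uses the standard library's urllib instead of a third-party HTTP package, the function names are hypothetical, and the JSON field names ("recenttracks", "@attr", "totalPages", "#text", "uts") reflect my understanding of the user.getRecentTracks response, so check them against the API docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://ws.audioscrobbler.com/2.0/"
PAGE_SIZE = 200  # maximum number of scrobbles per page


def fetch_page(user, api_key, page):
    """Fetch one page of recent tracks as parsed JSON (needs network access)."""
    params = urllib.parse.urlencode({
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": PAGE_SIZE,
        "page": page,
    })
    with urllib.request.urlopen(f"{API_URL}?{params}") as response:
        return json.load(response)


def get_total_pages(user, api_key, fetch=fetch_page):
    """Read the page count from the metadata attached to the first page."""
    first = fetch(user, api_key, 1)
    return int(first["recenttracks"]["@attr"]["totalPages"])


def get_scrobbles(user, api_key, fetch=fetch_page):
    """Loop over every page and flatten each track into a plain dict."""
    rows = []
    for page in range(1, get_total_pages(user, api_key, fetch) + 1):
        tracks = fetch(user, api_key, page)["recenttracks"]["track"]
        if isinstance(tracks, dict):  # tolerate a page holding a single track
            tracks = [tracks]
        for track in tracks:
            if "date" not in track:  # a "now playing" entry has no timestamp yet
                continue
            rows.append({
                "artist": track["artist"]["#text"],
                "album": track["album"]["#text"],
                "track": track["name"],
                "timestamp": int(track["date"]["uts"]),  # Unix seconds
            })
    return rows
```

The list of dicts returned by get_scrobbles can be passed straight to pandas.DataFrame. The fetch argument is only there so the loop can be exercised with canned responses instead of live API calls.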
Pandas DataFrame Cleaning 🐼
To clean the data, we first format datetimes, then concatenate artists with albums and tracks to create unique identifiers (avoiding the issue of duplicate album or track names across multiple artists).
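As a rough illustration of those two cleaning steps, here is a small pandas sketch. The column names and sample rows are made up for the example and may not match the notebook.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "artist": ["Artist A", "Artist B"],
    "album": ["Greatest Hits", "Greatest Hits"],
    "track": ["Intro", "Intro"],
    "timestamp": [1523000000, 1523003600],  # Unix seconds (UTC)
})

# Convert Unix seconds into timezone-aware datetimes.
df["datetime"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

# Prefix album and track names with the artist so identical titles stay distinct.
df["album_id"] = df["artist"] + " - " + df["album"]
df["track_id"] = df["artist"] + " - " + df["track"]
```

Both sample tracks are called "Intro", but their track_id values differ, which is exactly what the concatenation is for.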
I also created a small webscraper to parse available metadata for each unique artist.
Lastly, from the geodata, we can extract country and subdivision using simple string splits.
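The exact format of the scraped geodata isn't shown in this post, so the sketch below assumes a comma-separated string such as "City, Subdivision, Country". The helper name split_location is hypothetical.

```python
def split_location(location):
    """Return (subdivision, country) from a comma-separated location string.

    Assumes the country is always the last part and the subdivision, when
    present, is the part right before it.
    """
    parts = [part.strip() for part in location.split(",")]
    country = parts[-1]
    # With fewer than three parts we cannot tell a city from a subdivision.
    subdivision = parts[-2] if len(parts) >= 3 else None
    return subdivision, country
```

For example, split_location("Toronto, Ontario, Canada") gives ("Ontario", "Canada"), while a bare "Canada" gives (None, "Canada").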
Preliminary Analysis 📊
Taking a first look at the data
In this section, we'll create a line graph to take a look at the average number of scrobbles per day, as well as find the longest streak of continuous daily scrobbling:
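The streak calculation boils down to finding the longest run of consecutive calendar days. Here is one possible way to do it with plain Python dates; longest_streak is a hypothetical helper, not the notebook's function.

```python
from datetime import date, timedelta


def longest_streak(days):
    """Length of the longest run of consecutive calendar days in `days`."""
    best = current = 0
    previous = None
    for day in sorted(set(days)):  # drop duplicate days, then walk in order
        if previous is not None and day - previous == timedelta(days=1):
            current += 1
        else:
            current = 1  # a gap (or the first day) starts a new streak
        best = max(best, current)
        previous = day
    return best
```

Passing in one date per scrobble is fine, since repeated days are collapsed before counting.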
This bar graph is something that is already easily viewable on last.fm user pages, but when we colour by artist, we can see that half of my top tracks are by the same artist!
Top 10 artists is another graph that is readily visible on my user page; however, if we colour by band vs. solo artist, we find that exactly half of my top artists are bands, and half are solo acts!
• Genre
Now let's take a look at tracks by genre. Keep in mind the parser only grabs the top tag of a track, and many tracks contain multiple tags, but this will still give us a general idea of the genres of music in my library:
• Duration
A histogram is a great way to see the distribution of duration (i.e. track length) of songs in my library. Unfortunately, not all tracks in last.fm's API contain duration data, but we'll work with what we have to get a good sense of it anyway:
Deeper Analysis 📊📉
Asking and answering specific questions about the data
When did I reach scrobble milestones on my account?
Specific milestones are somewhat arbitrary in this case but they do give us a sense of my overall listening habits, which look to be fairly consistent throughout the 3 years I've had my lastfm account:
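Finding a milestone is just a matter of sorting scrobbles by time and picking out the Nth one. A minimal sketch, assuming a list of Unix timestamps and a hypothetical helper name:

```python
def milestone_timestamps(timestamps, step):
    """Map each milestone (step, 2*step, ...) to the timestamp that reached it."""
    ordered = sorted(timestamps)
    # The n-th scrobble sits at index n - 1 of the time-ordered list.
    return {n: ordered[n - 1] for n in range(step, len(ordered) + 1, step)}
```

With step=10000, for instance, the keys would be 10000, 20000 and so on, up to the largest milestone the account has actually reached.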
How did my top artists become my top artists?
To see how my current top 10 artists became my current top 10 artists, we can graph their cumulative scrobbles over time to get a sense for inflection points for specific artists:
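One way to build the data behind a graph like this is a running count per artist. The sketch below uses made-up rows and assumed column names ("artist", "datetime"), with a top 2 instead of a top 10 to keep the sample small.

```python
import pandas as pd

# Hypothetical scrobbles; column names are assumptions for this sketch.
df = pd.DataFrame({
    "artist": ["A", "B", "A", "C", "A", "B"],
    "datetime": pd.to_datetime([
        "2018-04-01", "2018-04-02", "2018-05-01",
        "2018-05-02", "2018-05-03", "2018-06-01",
    ]),
})

TOP_N = 2  # the post looks at the top 10

# Artists with the most scrobbles overall.
top_artists = df["artist"].value_counts().head(TOP_N).index

# Keep only those artists, order by time, and number each artist's scrobbles.
top = df[df["artist"].isin(top_artists)].sort_values("datetime").copy()
top["cumulative"] = top.groupby("artist").cumcount() + 1
```

Plotting "cumulative" against "datetime" with one line per artist gives the cumulative curves.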
How have my top artists changed over time?
Prompted by a similar question to the above graph, I wanted to see how my top 10 artists changed over time, not just how my current top 10 artists got there. To accomplish that, I created a bar plot race which tracks cumulative artist scrobbles every month:
When did I discover the most new artists?
Below is a line graph showing the 7 day rolling average for number of new artists discovered. Click on the red trace to see the true values:
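For anyone curious how such a series can be computed, here is a small pandas sketch under the assumption that an artist counts as "discovered" on the day of their first scrobble. The sample rows and column names are invented for the example.

```python
import pandas as pd

# Hypothetical scrobbles; column names are assumptions for this sketch.
df = pd.DataFrame({
    "artist": ["A", "B", "A", "C"],
    "datetime": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 21:00",
        "2020-01-02 10:00", "2020-01-03 08:00",
    ]),
})

# The first scrobble of each artist marks the day that artist was discovered.
first_seen = df.groupby("artist")["datetime"].min().dt.normalize()

# Count discoveries per calendar day, filling days without new artists with 0.
all_days = pd.date_range(
    df["datetime"].min().normalize(), df["datetime"].max().normalize(), freq="D"
)
new_per_day = first_seen.value_counts().reindex(all_days, fill_value=0)

# 7-day rolling average; the first days use however many values are available.
rolling_avg = new_per_day.rolling(window=7, min_periods=1).mean()
```

Here new_per_day holds the true daily values and rolling_avg the smoothed ones.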
Where are artists in my library from?
This question will have multiple visualizations to answer it.
First off: a comparison between American artists vs. artists from outside the USA:
Now let's see that in a choropleth map:
Note the logarithmic colour scale.
Now to folium, plotting the birthplaces of solo artists in my library. Zoom in to get a better sense of exact locations! The circle radius is proportional to the number of artists born there:
And here's the same idea, but showing where the bands in my library were formed:
As a last visualization for the geodata, here's a Sankey diagram showing various splits from available data to geodata:
When were the artists I listen to born? When were the bands formed?
Histograms again. We can see an interesting bimodal distribution here, with one peak around 1940-1944 and the second peak around 1985-1999.
Same idea for band founding years: 1965-1969 seems to have more bands relative to surrounding years, with 2005-2014 holding the highest proportion:
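The 5-year ranges quoted above suggest the years were grouped into 5-year bins. Assuming that is the case, the binning itself can be sketched in a couple of lines (five_year_bins is a hypothetical helper):

```python
from collections import Counter


def five_year_bins(years):
    """Count years per 5-year bin, keyed by the first year of each bin."""
    # year - year % 5 rounds down to the start of the bin, e.g. 1987 -> 1985.
    counts = Counter(year - year % 5 for year in years)
    return dict(sorted(counts.items()))
```

A key of 1940 therefore covers 1940-1944, matching the ranges used in the text.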
How popular are my tracks on the site as a whole? Is there any correlation in listens?
The scatterplot below plots a dot for each track in my library, with my plays (logarithmic) on the x-axis and the sitewide playcount (also logarithmic) on the y-axis. There is no discernible strong linear correlation here:
Next up, a scatterplot with logarithmic playcount against sitewide unique listeners. Here we can see a strong discernible correlation both linearly and logarithmically:
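To put a number on "linear" versus "logarithmic" correlation, one option is the Pearson coefficient computed on the raw values and again on their base-10 logs. This is a plain-Python sketch of that idea, not necessarily how the notebook measures it; both function names are hypothetical, and all values must be positive for the log version.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_pearson(xs, ys):
    """Pearson correlation of the base-10 logs (inputs must be positive)."""
    return pearson([math.log10(x) for x in xs], [math.log10(y) for y in ys])
```

A value near 1 from log_pearson means the points fall close to a straight line on log-log axes, even if the raw values do not.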
When do I listen to the most or least music? What day and what month?
A radial bar plot is a good way of visualizing this distribution. We can see most of my scrobbles occur between the hours of 12:00 - 20:59, with very few in the morning hours of 4:00 - 6:59.
Note: times are in EST
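Scrobble timestamps are stored in UTC, so they need shifting before counting by hour. Here is a minimal sketch that assumes Unix timestamps in seconds and a fixed UTC-5 offset for EST (so it ignores daylight saving); scrobbles_per_hour is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset; an assumption that ignores daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")


def scrobbles_per_hour(timestamps):
    """Count scrobbles in each local hour (0-23) from Unix timestamps in seconds."""
    counts = Counter(datetime.fromtimestamp(ts, tz=EST).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]
```

The resulting 24 counts map directly onto the bars of a radial plot, one per hour.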
Now let's plot scrobbles by day of week. We can see that on Fridays & Saturdays I average slightly above 60 scrobbles, whereas Sundays are the lowest but not by much:
Same idea but for month of year. Here we can see more variation, with March and October being standouts among their surrounding months and August and December having relatively fewer scrobbles:
Does my listening change within weekdays throughout the year?
Here's a heatmap showing my average number of scrobbles in a day of week against month of year. Interestingly, we can see Thursdays in October are when I listen to the most music on average, followed by Fridays & Saturdays in October, Fridays in November and Tuesdays in April. August stands out as a lower month for music listening:
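The grid behind a heatmap like this can be built by counting scrobbles per calendar day and then averaging those daily counts for each weekday and month pair. The pandas sketch below uses invented rows and an assumed "datetime" column; note that days with no scrobbles never appear in the data, so they are left out of the averages here.

```python
import pandas as pd

# Hypothetical scrobbles: 3 on Thu 1 Oct, 2 on Fri 2 Oct, 1 on Thu 8 Oct 2020.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2020-10-01 10:00", "2020-10-01 11:00", "2020-10-01 12:00",
    "2020-10-02 09:00", "2020-10-02 18:00",
    "2020-10-08 20:00",
])})

# Step 1: count scrobbles on each calendar day.
daily = (
    df.groupby(df["datetime"].dt.normalize())
    .size()
    .rename("scrobbles")
    .reset_index()
)

# Step 2: label each day with its weekday and month names.
daily["weekday"] = daily["datetime"].dt.day_name()
daily["month"] = daily["datetime"].dt.month_name()

# Step 3: average the daily counts within each weekday/month cell.
heatmap = daily.pivot_table(
    index="weekday", columns="month", values="scrobbles", aggfunc="mean"
)
```

In this sample, the two Thursdays in October have 3 and 1 scrobbles, so that cell averages to 2.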
Summary 🎼
Overall I feel like I have a new understanding of what I listen to, who I listen to and when I listen to music. These graphs are just a few highlights from my full Exploratory Data Analysis of my LastFM page. The Jupyter notebook can be found here on Jovian or Github*.
*Note: Plotly graphs may have issues rendering in GitHub.
If you have a LastFM account, you can run the EDA for your stats by changing the username variable near the top of the notebook to your username. Enjoy!
Visualization Packages 📚
The full list of packages used in the EDA can be found here in the Jupyter Notebook.
Future Work 📅
Optimizing the API scraper function to be more efficient and run faster
Improving the tags parser for better genre analysis
Including scraping of image URLs to allow for artist photo and album artworks to be integrated into the data visualizations
Creating a "mainstream-o-meter" to determine how mainstream vs. distinct a user's music taste is
Creating a dashboard for all LastFM users to access these visualizations for their personal music profiles