Rohan Dawar

The Bar Chart Race - Visualizing My Music


Intro

As a music-loving data geek who uses Spotify, I have often been disappointed by the platform’s lack of user statistics. There is currently no way to see statistics about your own streaming habits within the app. The closest thing comes once a year, in December, with Spotify Wrapped, which shows your top artists, albums and songs streamed that year and creates a playlist of your top tracks.

That is why in 2018 I joined Last.fm, a platform that connects to your Spotify account and, from then on, tracks every song you stream; it calls these plays scrobbles. It also provides recommendations, which I personally find superior to Spotify’s recommendation algorithm. Once your scrobbles start rolling in, you can see lists and grids of your top artists, albums and tracks over any given time frame.


But the real power of this data comes with the Last.fm API. Here are just a few examples of the amazing tools the Last.fm developer community has built on it: {link lastfm data tools examples built on lastfm API}. Here is a somewhat more comprehensive list of currently working Last.fm tools: {link Reddit post}.


A simple and practical tool is benjaminbenben’s Last.fm to csv. It is exactly what it sounds like: you input your Last.fm username and it outputs a downloadable csv file of your entire scrobbling history.


Flourish is a service that produces beautiful data visualizations and animations from your data.


The goal of this script is to convert my Last.fm .csv into data that Flourish can read, so that I can visualize my top artists over time as a ‘bar chart race’ animation.


Raw Data

The csv has no header row. Each row contains the artist, album, track and date, with the date in the format DD MMM YYYY HH:mm.
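To make the date format concrete, here is a minimal sketch that parses one hypothetical row and builds the same "YYYY Month" key the script uses further down. It swaps in Python’s standard datetime module for arrow, so it is an illustration of the format rather than the script’s own parsing code; the row values are made up.

```python
from datetime import datetime

# Hypothetical row in the export's column order: artist, album, track, date
sample_row = ["Artist A", "Album A", "Track A", "15 Aug 2019 14:32"]

# "%d %b %Y %H:%M" corresponds to DD MMM YYYY HH:mm; %b expects an English
# month abbreviation, which is what Python uses unless the locale is changed
played_at = datetime.strptime(sample_row[3], "%d %b %Y %H:%M")

# Only month and year matter, so bucket the scrobble as "YYYY Month"
month_key = played_at.strftime("%Y %B")
print(month_key)  # 2019 August
```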

Script - Dependencies & Skeleton

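import csv
import arrow  # third-party: pip install arrow
from pandas import read_csv  # third-party: pip install pandas

csv_filename = "lastfm-aug15.csv"  # export from Last.fm to csv

def prep_csv(filename):
    # add header, then process
    process_csv(filename)

def process_csv(filename):
    pass  # Insert Script Below

# prep_csv is the entry point: the header must exist before process_csv reads the file
prep_csv(csv_filename)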

Script - Add Header with Pandas

This uses pandas to add the header row artist, album, track, date to the top of our csv file, overwriting the file in place, and then passes the file on to process_csv.

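def prep_csv(filename):
    # header=None: the export has no header row, so the first line is data, not column names
    # dtype=str and keep_default_na=False stop pandas from turning names like "NA" into blanks
    # or stripping leading zeros from numeric-looking titles
    df = read_csv(filename, header=None, dtype=str, keep_default_na=False)
    df.columns = ['artist', 'album', 'track', 'date']
    # overwrite the export with the header added (run this once per fresh export)
    df.to_csv(filename, index=False)
    process_csv(filename)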
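If you would rather not pull in pandas just to add one row, the standard csv module can do the same job. This is a minimal sketch, not the original script; add_header is a hypothetical helper, and it writes to a separate destination instead of overwriting the export in place. Note that pandas’ read_csv treats the first line as column names unless you pass header=None, whereas this version never does, so the first scrobble is always kept as data.

```python
import csv
import io

HEADER = ["artist", "album", "track", "date"]

def add_header(src, dst):
    """Copy a headerless CSV from file object src to dst, writing HEADER first."""
    writer = csv.writer(dst)
    writer.writerow(HEADER)
    # Every input line is copied as a data row, including the first one
    for row in csv.reader(src):
        writer.writerow(row)

# Usage with in-memory files standing in for the real export and output
src = io.StringIO("Artist A,Album A,Track A,15 Aug 2019 14:32\n")
dst = io.StringIO()
add_header(src, dst)
print(dst.getvalue())
```

For real files, open both with newline="" and encoding="utf-8", as the csv module documentation recommends.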

Script - Read Data

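1) Creates artists, a nested dictionary of the form {artist: {date: play count}}, and dates, a list of every month that appears in the data.

2) Reads each row of the csv and uses arrow to parse the timestamp. Only the month and year matter here, so each scrobble is bucketed under a date key such as "2019 August".

3) Adds the artist to the artists dictionary if it is not already there.

4) Starts a counter at 1 for that artist in that month, or increments it if it already exists.

5) Adds the month to the dates list if it is not already there. New months are inserted at the front, which leaves dates in chronological order as long as the export lists the most recent scrobbles first.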

def process_csv(filename):
    #1
    artists = {}  # {artist: {date: play count}}
    dates = []    # every month seen, oldest first

    #2
    with open(filename, 'rt', encoding='utf-8', newline='') as f:
        csv_reader = csv.DictReader(f)
        for row in csv_reader:
            artist = row['artist']
            # parse once and keep only year and month, e.g. "2019 August"
            played = arrow.get(row['date'], 'DD MMM YYYY HH:mm')
            date = played.format('YYYY MMMM')

            #3
            if artist not in artists:
                artists[artist] = {}

            #4
            if date not in artists[artist]:
                artists[artist][date] = 1
            else:
                artists[artist][date] += 1

            #5 assuming the export is newest first, inserting new months
            # at the front leaves dates in chronological order
            if date not in dates:
                dates.insert(0, date)
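The loop above relies on the export listing scrobbles newest first, since that is what makes dates.insert(0, date) produce chronological order. Here is a hedged sketch of a variant that does not depend on row order: it keys each month by a datetime and sorts at the end. It uses only the standard library (datetime in place of arrow, collections.Counter in place of the manual counters), and count_plays is a hypothetical name, not part of the original script.

```python
from collections import Counter, defaultdict
from datetime import datetime

DATE_FORMAT = "%d %b %Y %H:%M"  # DD MMM YYYY HH:mm

def count_plays(rows):
    """Return ({artist: Counter({month: plays})}, months sorted oldest first).

    rows is an iterable of dicts with 'artist' and 'date' keys, like the
    ones csv.DictReader yields once the header has been added.
    """
    plays = defaultdict(Counter)
    months = set()
    for row in rows:
        # Truncate each timestamp to midnight on the first day of its month
        month = datetime.strptime(row["date"], DATE_FORMAT).replace(
            day=1, hour=0, minute=0)
        plays[row["artist"]][month] += 1
        months.add(month)
    # Sorting datetimes gives chronological order whatever the input order
    return plays, sorted(months)

# Made-up rows, deliberately not in date order
rows = [
    {"artist": "Artist A", "date": "02 Sep 2019 09:15"},
    {"artist": "Artist B", "date": "20 Aug 2019 18:00"},
    {"artist": "Artist A", "date": "15 Aug 2019 14:32"},
]
plays, months = count_plays(rows)
print([m.strftime("%Y %B") for m in months])  # ['2019 August', '2019 September']
```

Sorting real datetimes rather than "YYYY Month" strings matters because the strings do not sort chronologically ("2019 April" would come before "2019 August").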

Script - Accumulate Counts

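For each artist, walks through the months in chronological order, keeps a running total of plays, and stores the cumulative count per month in the dictionary artistsTotal. Each entry also holds an 'artist' key, so it can be written out directly as a csv row.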

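    # continues inside process_csv, after the reading loop
    artistsTotal = {}
    for artist in artists:
        total = 0  # running play count (avoids shadowing the built-in sum)
        artistsTotal[artist] = {'artist': artist}
        for date in dates:
            total += artists[artist].get(date, 0)
            # record the running total for every month, including months with
            # no plays, so the count carries forward instead of leaving a gap
            artistsTotal[artist][date] = total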
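The running total can also be expressed with itertools.accumulate. In this sketch, cumulative_row is a hypothetical helper rather than part of the original script, and the month labels and counts are made up. A month with no plays contributes 0, so every month column gets a value and the artist’s total carries forward.

```python
from itertools import accumulate

def cumulative_row(artist, monthly_plays, months):
    """Build one output row: the artist plus a running total for every month.

    monthly_plays maps month label -> plays in that month; months missing
    from it count as 0, so the total carries forward instead of leaving a gap.
    """
    per_month = [monthly_plays.get(month, 0) for month in months]
    row = {"artist": artist}
    row.update(zip(months, accumulate(per_month)))
    return row

months = ["2019 August", "2019 September", "2019 October"]
row = cumulative_row("Artist A", {"2019 August": 3, "2019 October": 2}, months)
print(row)
# {'artist': 'Artist A', '2019 August': 3, '2019 September': 3, '2019 October': 5}
```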

Script - Write Data

Writes a csv file with artists in the first column and months (dates) in the remaining columns; each cell holds the cumulative number of plays for that artist up to and including that month.

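    # continues inside process_csv
    fieldnames = ['artist'] + dates  # artist column first, then months oldest to newest
    # newline='' stops the csv module from producing blank lines between rows on Windows
    with open(f'{filename}-processed.csv', 'w', encoding='utf-8', newline='') as out:
        csvOut = csv.DictWriter(out, fieldnames)
        csvOut.writeheader()
        for artist in artistsTotal:
            csvOut.writerow(artistsTotal[artist])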
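To see the exact shape of the file that goes to Flourish, here is a small in-memory sketch with made-up rows. It uses csv.DictWriter’s restval parameter, which fills any column a row dictionary lacks (here with 0 instead of the default empty string), and writerows to write all rows in one call. This is an illustration under those assumptions, not the original script.

```python
import csv
import io

# Hypothetical cumulative rows; Artist B has no entry for the first month
rows = [
    {"artist": "Artist A", "2019 August": 3, "2019 September": 5},
    {"artist": "Artist B", "2019 September": 1},
]
fieldnames = ["artist", "2019 August", "2019 September"]

# StringIO stands in for the real output file, which should be opened
# with open(path, "w", encoding="utf-8", newline="")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval=0)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```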

Processed Data

The processed file has artists in the first column and their cumulative play counts for each successive month in the columns that follow.
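Before uploading, it can help to preview in the terminal what the race will show. This hedged sketch reads a file in the processed layout described above and lists, for each month, the n artists with the highest cumulative plays, treating blank cells as 0. top_artists is a hypothetical helper and the sample data is made up.

```python
import csv
import io

def top_artists(processed_csv, n=3):
    """Yield (month, [(artist, plays), ...]) with the n highest cumulative counts."""
    reader = csv.DictReader(processed_csv)
    rows = list(reader)
    months = reader.fieldnames[1:]  # first column is "artist"
    for month in months:
        # Blank cells (no plays recorded yet) count as 0
        ranked = sorted(
            ((row["artist"], int(row[month] or 0)) for row in rows),
            key=lambda pair: pair[1],
            reverse=True,
        )
        yield month, ranked[:n]

sample = io.StringIO(
    "artist,2019 August,2019 September\n"
    "Artist A,3,5\n"
    "Artist B,,6\n"
    "Artist C,1,2\n"
)
for month, leaders in top_artists(sample, n=2):
    print(month, leaders)
# 2019 August [('Artist A', 3), ('Artist C', 1)]
# 2019 September [('Artist B', 6), ('Artist A', 5)]
```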

Upload Processed CSV to Flourish

Flourish lets you customize most aspects of your visualization, including bar colours, images and animation speed. Play around with it; here’s what I got:
