  • Rohan Dawar

Data Transform for a Sankey

A Sankey diagram depicts flows of any kind, where the width of each flow is proportional to its quantity. A flow is any quantity that moves from one state to another. For example, a Sankey diagram is well suited to depicting a job search, as applications turn into replies, then interviews, then offers (flowing across states). Below is an example of this type of visualization from Reddit user u/778dmsn, on r/dataisbeautiful:

There exists a great online tool for building Sankey Diagrams called SankeyMATIC.

To create Sankey diagrams with this tool, data must be listed in the following format:

SOURCE [AMOUNT] TARGET
SOURCE [AMOUNT] TARGET
etc.
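
As a minimal sketch of that format, a small helper can build one line per flow. The name `format_flow` is hypothetical, introduced here only for illustration; it is not part of SankeyMATIC.

```python
def format_flow(source: str, amount: int, target: str) -> str:
    """Return one flow in the 'SOURCE [AMOUNT] TARGET' text format."""
    return f"{source} [{amount}] {target}"


print(format_flow("Yukon", 2, "Alaska"))  # Yukon [2] Alaska
```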

I decided to write a short script to transform data from this Wikipedia table of USA-Canada land border crossings:

The interesting thing about this data for a Sankey diagram is that each crossing can be counted from either side of the border (i.e. at a land border crossing you can cross into either country), and the totals differ between U.S. states and Canadian provinces/territories. For example, Yukon has 2 crossings, both to Alaska. Alaska, however, has 5 crossings to Canada: 2 to Yukon + 3 to British Columbia.


First, import and clean the DataFrame:

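import pandas as pd

# Load the border-crossing table (saved as a CSV file)
df = pd.read_csv('/content/table-3.csv')

# Shorten the column names used later and drop the columns the diagram does not need
df = df.rename(columns={
    'CanadaPort of Entry Name': 'name',
    'Province/ Territory': 'prov',
}).drop(columns=[
    'CanadaRoad/Highway [Community]',
    'Notes',
    'Structure orNotable Feature',
    'Coordinates',
])

# The U.S. port name is not needed either; only 'prov' and 'State' are used below
new_df = df.drop(columns=['U.S.Port of Entry Name'])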

Next, the loop to generate the text format required by SankeyMATIC:

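for prov in new_df.prov.unique():
  # Rows for this province/territory only
  testdf = new_df[new_df['prov'] == prov]
  # Number of crossings from this province/territory into each state
  counts = testdf.State.value_counts()
  for state in testdf.State.unique():
    print(f'{prov} [{counts[state]}] {state}')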

Output:

Yukon [2] Alaska
British Columbia [3] Alaska
British Columbia [13] Washington
British Columbia [2] Idaho
British Columbia [1] Montana
Alberta [6] Montana
Saskatchewan [6] Montana
Saskatchewan [6] North Dakota
Manitoba [12] North Dakota
Manitoba [4] Minnesota
Ontario [3] Minnesota
Ontario [4] Michigan
Ontario [7] New York
Quebec [9] New York
Quebec [15] Vermont
Quebec [1] New Hampshire
Quebec [7] Maine
New Brunswick [18] Maine
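
The same counts can also be produced without a nested loop, using pandas' `groupby`. This is only a sketch of an alternative, not the script used for the output above. The `sample` DataFrame is a made-up stand-in with a few rows chosen to reproduce three of the counts listed; the column names `prov` and `State` are the ones used in the script.

```python
import pandas as pd

# Stand-in rows for illustration only; the real table has one row per crossing.
sample = pd.DataFrame({
    "prov": ["Yukon", "Yukon",
             "British Columbia", "British Columbia", "British Columbia",
             "British Columbia", "British Columbia"],
    "State": ["Alaska", "Alaska",
              "Alaska", "Alaska", "Alaska",
              "Idaho", "Idaho"],
})

# sort=False keeps each (prov, State) pair in order of first appearance,
# which matches the nested loop when each province's rows are contiguous.
counts = sample.groupby(["prov", "State"], sort=False).size()

# One 'SOURCE [AMOUNT] TARGET' line per (prov, State) pair.
lines = [f"{prov} [{count}] {state}" for (prov, state), count in counts.items()]
print("\n".join(lines))
```

Grouping on both columns at once counts every province/state pair in a single pass, so the per-province filtering step is no longer needed.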

Pasting this into SankeyMATIC and adjusting the styling gives this:

What I like about this data and the resulting visualization is that every province and state is linked to the others through the regions on the 'opposite' side of the border, so one can 'travel' down the Sankey diagram through successive connections (e.g. going from Yukon to New Brunswick).
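
To make the 'travel' idea concrete, here is a rough sketch that reads the flow lines from the script output above and finds a route from Yukon to New Brunswick with a breadth-first search. The helper names `parse_flows` and `find_path` are hypothetical, and treating every flow as a two-way link is an assumption made for this illustration.

```python
import re
from collections import defaultdict, deque

# Flow lines copied from the script output above.
FLOWS = """\
Yukon [2] Alaska
British Columbia [3] Alaska
British Columbia [13] Washington
British Columbia [2] Idaho
British Columbia [1] Montana
Alberta [6] Montana
Saskatchewan [6] Montana
Saskatchewan [6] North Dakota
Manitoba [12] North Dakota
Manitoba [4] Minnesota
Ontario [3] Minnesota
Ontario [4] Michigan
Ontario [7] New York
Quebec [9] New York
Quebec [15] Vermont
Quebec [1] New Hampshire
Quebec [7] Maine
New Brunswick [18] Maine
"""


def parse_flows(text):
    """Parse 'SOURCE [AMOUNT] TARGET' lines into (source, amount, target) tuples."""
    pattern = re.compile(r"^(.+?) \[(\d+)\] (.+)$")
    flows = []
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            flows.append((match.group(1), int(match.group(2)), match.group(3)))
    return flows


def find_path(flows, start, goal):
    """Breadth-first search over the flows, treated as undirected links."""
    neighbors = defaultdict(list)
    for source, _, target in flows:
        neighbors[source].append(target)
        neighbors[target].append(source)

    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through 'previous' to rebuild the route.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no route between start and goal


path = find_path(parse_flows(FLOWS), "Yukon", "New Brunswick")
print(" -> ".join(path))
```

With these 18 flows the route alternates between provinces and states all the way across, which is the zig-zag visible in the diagram.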

