58
Rush Hour Dynamics: Using Python to Study the Rush Hour Dynamics: Using Python to Study the London Underground London Underground Camilla Montonen Camilla Montonen PyData Paris 2015 PyData Paris 2015 Full slides available at (http://cs-with-python.github.io/) http://cs-with-python.github.io/ 1 of 58 10/04/15 00:30

PyData Paris 2015 - Track 2.2 Camilla Montonen

Embed Size (px)

Citation preview

Page 1: PyData Paris 2015 - Track 2.2 Camilla Montonen

Rush Hour Dynamics: Using Python to Study theRush Hour Dynamics: Using Python to Study theLondon UndergroundLondon UndergroundCamilla MontonenCamilla Montonen

PyData Paris 2015PyData Paris 2015

Full slides available at (http://cs-with-python.github.io/)http://cs-with-python.github.io/

1 of 58 10/04/15 00:30

Page 2: PyData Paris 2015 - Track 2.2 Camilla Montonen

IntroductionIntroduction

2 of 58 10/04/15 00:30

Page 3: PyData Paris 2015 - Track 2.2 Camilla Montonen

BackgroundBackgroundBryn Mawr College 2013University of Edinburgh 2014Currently working in QA at Caplin Systems Ltd.Member of Pyladies London and Women in Data. If you're ever in London, pleasedrop in to one of our meetups!

3 of 58 10/04/15 00:30

Page 4: PyData Paris 2015 - Track 2.2 Camilla Montonen

There are interesting data problems everywhere...There are interesting data problems everywhere...Python gives you the tools, but you have to ask the questions!

4 of 58 10/04/15 00:30

Page 5: PyData Paris 2015 - Track 2.2 Camilla Montonen

Back in August 2014...Back in August 2014...

5 of 58 10/04/15 00:30

Page 6: PyData Paris 2015 - Track 2.2 Camilla Montonen

Which Tube line should I take to work?Which Tube line should I take to work?

6 of 58 10/04/15 00:30

Page 7: PyData Paris 2015 - Track 2.2 Camilla Montonen

Some days it was all good...Some days it was all good...

7 of 58 10/04/15 00:30

Page 8: PyData Paris 2015 - Track 2.2 Camilla Montonen

Other days ...not so goodOther days ...not so good

8 of 58 10/04/15 00:30

Page 9: PyData Paris 2015 - Track 2.2 Camilla Montonen

A pattern starts to emergeA pattern starts to emerge

Source: (http://news.bbc.co.uk/1/hi/in_pictures/8092917.stm)BBC News

9 of 58 10/04/15 00:30

Page 10: PyData Paris 2015 - Track 2.2 Camilla Montonen

Observation: delays or suspensions on one station can affect remoteObservation: delays or suspensions on one station can affect remotestationsstations

10 of 58 10/04/15 00:30

Page 11: PyData Paris 2015 - Track 2.2 Camilla Montonen

QuestionsQuestions

What are the most "important" stations in the London UndergroundWhat are the most "important" stations in the London Undergroundnetwork?network?

How does suspending these "important" stations affect the rest of theHow does suspending these "important" stations affect the rest of thenetworknetwork

11 of 58 10/04/15 00:30

Page 12: PyData Paris 2015 - Track 2.2 Camilla Montonen

Let's bring the Python to the DataLet's bring the Python to the Data

12 of 58 10/04/15 00:30

Page 13: PyData Paris 2015 - Track 2.2 Camilla Montonen

In the beginning, there was the 'Data'In the beginning, there was the 'Data'How do I translate a physical map of the London Underground into aHow do I translate a physical map of the London Underground into aGraph I can process with Python?Graph I can process with Python?

13 of 58 10/04/15 00:30

Page 14: PyData Paris 2015 - Track 2.2 Camilla Montonen

StartStart

14 of 58 10/04/15 00:30

Page 15: PyData Paris 2015 - Track 2.2 Camilla Montonen

GoalGoal

15 of 58 10/04/15 00:30

Page 16: PyData Paris 2015 - Track 2.2 Camilla Montonen

16 of 58 10/04/15 00:30

Page 17: PyData Paris 2015 - Track 2.2 Camilla Montonen

GoalGoal

17 of 58 10/04/15 00:30

Page 18: PyData Paris 2015 - Track 2.2 Camilla Montonen

18 of 58 10/04/15 00:30

Page 19: PyData Paris 2015 - Track 2.2 Camilla Montonen

Data collection:Data collection:It would be cool to program some kind of OCR to automatically read the data from the mapand produce a data file! But alas, I had to resort to manually creating a data file:

#Station #Neighbour(line)Acton Town Chiswick Park (District), South Ealing (Picadilly), Turnham Green (Picadilly)Aldgate Tower Hill (Circle; District), Liverpool Street (Metropolitan; Circle; District)Aldgate East Tower Hill (District), Liverpool Street (HammersmithCity; Metropolitan)Alperton Sudbury Town (Picadilly), Park Royal (Picadilly)

19 of 58 10/04/15 00:30

Page 20: PyData Paris 2015 - Track 2.2 Camilla Montonen

Now it's a piece of cake...Now it's a piece of cake...

20 of 58 10/04/15 00:30

Page 21: PyData Paris 2015 - Track 2.2 Camilla Montonen

... to perform some analysis... to perform some analysis

21 of 58 10/04/15 00:30

Page 22: PyData Paris 2015 - Track 2.2 Camilla Montonen

Let's go back to our question 1Let's go back to our question 1What is the most "important" station in the London UndergroundWhat is the most "important" station in the London Undergroundnetwork?network?

22 of 58 10/04/15 00:30

Page 23: PyData Paris 2015 - Track 2.2 Camilla Montonen

Defining "importance"Defining "importance"

23 of 58 10/04/15 00:30

Page 24: PyData Paris 2015 - Track 2.2 Camilla Montonen

Let's talk about betweenness centralityLet's talk about betweenness centrality

24 of 58 10/04/15 00:30

Page 25: PyData Paris 2015 - Track 2.2 Camilla Montonen

Betweenness seems like a good metric to measure theBetweenness seems like a good metric to measure the"importance" of a station"importance" of a station

The higher the betweenness of a station, the moreThe higher the betweenness of a station, the morecommuters will pass through itcommuters will pass through it

How can we compute betweenness on our LondonHow can we compute betweenness on our LondonUnderground graph?Underground graph?

25 of 58 10/04/15 00:30

Page 26: PyData Paris 2015 - Track 2.2 Camilla Montonen

Graphs and Python: Graphs and Python: graph-tool

graph-tool is a Python library written by Tiago Peixoto that provides a number oftools for analyzing and plotting graphs.

26 of 58 10/04/15 00:30

Page 27: PyData Paris 2015 - Track 2.2 Camilla Montonen

What can you do with What can you do with graph-tool ? ?

27 of 58 10/04/15 00:30

Page 28: PyData Paris 2015 - Track 2.2 Camilla Montonen

Create a graph objectCreate a graph objectIn [14]: from graph_tool.all import Graph

#create a new Graph objectgraph_object=Graph()

28 of 58 10/04/15 00:30

Page 29: PyData Paris 2015 - Track 2.2 Camilla Montonen

Add edges and vertices to the graphAdd edges and vertices to the graphIn [15]:

In [16]:

# add a vertex vertex1 = graph_object.add_vertex()vertex2 = graph_object.add_vertex()

# add an edgeedge1 = graph_object.add_edge(vertex1, vertex2)

29 of 58 10/04/15 00:30

Page 30: PyData Paris 2015 - Track 2.2 Camilla Montonen

Create property mapsCreate property maps

helpful for storing information about your nodes and edgeshelpful for storing information about your nodes and edges

In [17]: # create a property mapvertex_names = graph_object.new_vertex_property("string")

## iterate through the vertices in the graphfor vertex in graph_object.vertices(): vertex_names[vertex]="some_name"

30 of 58 10/04/15 00:30

Page 31: PyData Paris 2015 - Track 2.2 Camilla Montonen

Create visualizationsCreate visualizationsIn [10]: from graph_tool.draw import graph_draw

from graph_tool.all import price_network

# draw a small graphgraph_draw(graph_object, output="somefile.png")

#create a price network price_graph=price_network(5000)graph_draw(price_graph, output="price.png")

Out[10]: <PropertyMap object with key type 'Vertex' and value type 'vector<double>', for Graph 0x7f27b0121190, at 0x7f277c05cf10>

31 of 58 10/04/15 00:30

Page 32: PyData Paris 2015 - Track 2.2 Camilla Montonen

A Simple GraphA Simple Graph

32 of 58 10/04/15 00:30

Page 33: PyData Paris 2015 - Track 2.2 Camilla Montonen

A Price NetworkA Price Network

33 of 58 10/04/15 00:30

Page 34: PyData Paris 2015 - Track 2.2 Camilla Montonen

34 of 58 10/04/15 00:30

Page 35: PyData Paris 2015 - Track 2.2 Camilla Montonen

Filter vertices and edgesFilter vertices and edges

35 of 58 10/04/15 00:30

Page 36: PyData Paris 2015 - Track 2.2 Camilla Montonen

A sample visualization of the London UndergroundA sample visualization of the London Underground

36 of 58 10/04/15 00:30

Page 37: PyData Paris 2015 - Track 2.2 Camilla Montonen

Let's go back to betweennessLet's go back to betweenness

Easily calculate betweenness by calling theEasily calculate betweenness by calling thebetweenness function in function ingraph_tool.centrality

37 of 58 10/04/15 00:30

Page 38: PyData Paris 2015 - Track 2.2 Camilla Montonen

BetweennessBetweenness

38 of 58 10/04/15 00:30

Page 39: PyData Paris 2015 - Track 2.2 Camilla Montonen

39 of 58 10/04/15 00:30

Page 40: PyData Paris 2015 - Track 2.2 Camilla Montonen

We have our answer for question 1...We have our answer for question 1...

40 of 58 10/04/15 00:30

Page 41: PyData Paris 2015 - Track 2.2 Camilla Montonen

Let's take our analysis of betweenness one step further...Let's take our analysis of betweenness one step further...and answer question 2and answer question 2

41 of 58 10/04/15 00:30

Page 42: PyData Paris 2015 - Track 2.2 Camilla Montonen

How do problems on one of these important stations affectHow do problems on one of these important stations affectthe Underground network?the Underground network?

42 of 58 10/04/15 00:30

Page 43: PyData Paris 2015 - Track 2.2 Camilla Montonen

Bokeh: creating interactive data visualizationBokeh: creating interactive data visualization

43 of 58 10/04/15 00:30

Page 44: PyData Paris 2015 - Track 2.2 Camilla Montonen

A basic visualization of the London UndergroundA basic visualization of the London Underground

44 of 58 10/04/15 00:30

Page 45: PyData Paris 2015 - Track 2.2 Camilla Montonen

45 of 58 10/04/15 00:30

Page 46: PyData Paris 2015 - Track 2.2 Camilla Montonen

A Basic Visualization of BetweennessA Basic Visualization of Betweenness

46 of 58 10/04/15 00:30

Page 47: PyData Paris 2015 - Track 2.2 Camilla Montonen

47 of 58 10/04/15 00:30

Page 48: PyData Paris 2015 - Track 2.2 Camilla Montonen

How does the betwenness of each station changeHow does the betwenness of each station changewhen Baker Street is suspended?when Baker Street is suspended?

48 of 58 10/04/15 00:30

Page 49: PyData Paris 2015 - Track 2.2 Camilla Montonen

49 of 58 10/04/15 00:30

Page 50: PyData Paris 2015 - Track 2.2 Camilla Montonen

Bokeh allows us to visualize this interactively in the browserBokeh allows us to visualize this interactively in the browser

50 of 58 10/04/15 00:30

Page 51: PyData Paris 2015 - Track 2.2 Camilla Montonen

In [13]: from IPython.display import YouTubeVideo

YouTubeVideo('VouLqY-Uegs')

Out[13]:

51 of 58 10/04/15 00:30

Page 52: PyData Paris 2015 - Track 2.2 Camilla Montonen

Bokeh can do alot more than thisBokeh can do alot more than this

In fact, we can build "real time" simulations by using the built-in bokeh-In fact, we can build "real time" simulations by using the built-in bokeh-server app to stream data to a graphserver app to stream data to a graph

52 of 58 10/04/15 00:30

Page 53: PyData Paris 2015 - Track 2.2 Camilla Montonen

A Simple Bokeh Simulation of the UndergroundA Simple Bokeh Simulation of the UndergroundEach station is assigned a random number of commuters1. Each commuter is assigned a random destination2. At each step in the simulation, commuters travel over one edge3. Bokeh allows us to observe how the number of commuters at each station changesover time

4.

53 of 58 10/04/15 00:30

Page 54: PyData Paris 2015 - Track 2.2 Camilla Montonen

In [11]: YouTubeVideo('ZKHMtu1eKtc')

Out[11]:

54 of 58 10/04/15 00:30

Page 55: PyData Paris 2015 - Track 2.2 Camilla Montonen

SummarySummaryAt the beginning, we set off to answer two questions:At the beginning, we set off to answer two questions:

55 of 58 10/04/15 00:30

Page 56: PyData Paris 2015 - Track 2.2 Camilla Montonen

1. What are the most important stations on the1. What are the most important stations on theUnderground?Underground?

We used We used graph-tool to calculate betweenness to calculate betweenness

We determined that Baker Street, King's Cross St. Pancras and LiverpoolWe determined that Baker Street, King's Cross St. Pancras and LiverpoolStreet are the most importantStreet are the most important

56 of 58 10/04/15 00:30

Page 57: PyData Paris 2015 - Track 2.2 Camilla Montonen

2. How does suspending one of the important stations2. How does suspending one of the important stationsaffect the rest of the network?affect the rest of the network?

We used bokeh to create interactive graphicsWe used bokeh to create interactive graphics

We saw that removing Baker Street can put more pressure on almost anWe saw that removing Baker Street can put more pressure on almost anentire Tube line worth of stationsentire Tube line worth of stations

57 of 58 10/04/15 00:30

Page 58: PyData Paris 2015 - Track 2.2 Camilla Montonen

Thank you very much!Thank you very much!

Questions, comments and critique are very welcome!Questions, comments and critique are very welcome!

Please get in touch at camillamon[at]gmail.com orPlease get in touch at camillamon[at]gmail.com orinfo[at]winterflower.netinfo[at]winterflower.net

58 of 58 10/04/15 00:30