PyData Paris 2015 - Track 2.2 Camilla Montonen

Preview:

Citation preview

Rush Hour Dynamics: Using Python to Study theRush Hour Dynamics: Using Python to Study theLondon UndergroundLondon UndergroundCamilla MontonenCamilla Montonen

PyData Paris 2015PyData Paris 2015

Full slides available at (http://cs-with-python.github.io/)http://cs-with-python.github.io/

1 of 58 10/04/15 00:30

IntroductionIntroduction

2 of 58 10/04/15 00:30

BackgroundBackgroundBryn Mawr College 2013University of Edinburgh 2014Currently working in QA at Caplin Systems Ltd.Member of Pyladies London and Women in Data. If you're ever in London, pleasedrop in to one of our meetups!

3 of 58 10/04/15 00:30

There are interesting data problems everywhere...There are interesting data problems everywhere...Python gives you the tools, but you have to ask the questions!

4 of 58 10/04/15 00:30

Back in August 2014...Back in August 2014...

5 of 58 10/04/15 00:30

Which Tube line should I take to work?Which Tube line should I take to work?

6 of 58 10/04/15 00:30

Some days it was all good...Some days it was all good...

7 of 58 10/04/15 00:30

Other days ...not so goodOther days ...not so good

8 of 58 10/04/15 00:30

A pattern starts to emergeA pattern starts to emerge

Source: (http://news.bbc.co.uk/1/hi/in_pictures/8092917.stm)BBC News

9 of 58 10/04/15 00:30

Observation: delays or suspensions on one station can affect remoteObservation: delays or suspensions on one station can affect remotestationsstations

10 of 58 10/04/15 00:30

QuestionsQuestions

What are the most "important" stations in the London UndergroundWhat are the most "important" stations in the London Undergroundnetwork?network?

How does suspending these "important" stations affect the rest of theHow does suspending these "important" stations affect the rest of thenetworknetwork

11 of 58 10/04/15 00:30

Let's bring the Python to the DataLet's bring the Python to the Data

12 of 58 10/04/15 00:30

In the beginning, there was the 'Data'In the beginning, there was the 'Data'How do I translate a physical map of the London Underground into aHow do I translate a physical map of the London Underground into aGraph I can process with Python?Graph I can process with Python?

13 of 58 10/04/15 00:30

StartStart

14 of 58 10/04/15 00:30

GoalGoal

15 of 58 10/04/15 00:30

16 of 58 10/04/15 00:30

GoalGoal

17 of 58 10/04/15 00:30

18 of 58 10/04/15 00:30

Data collection:Data collection:It would be cool to program some kind of OCR to automatically read the data from the mapand produce a data file! But alas, I had to resort to manually creating a data file:

#Station #Neighbour(line)Acton Town Chiswick Park (District), South Ealing (Picadilly), Turnham Green (Picadilly)Aldgate Tower Hill (Circle; District), Liverpool Street (Metropolitan; Circle; District)Aldgate East Tower Hill (District), Liverpool Street (HammersmithCity; Metropolitan)Alperton Sudbury Town (Picadilly), Park Royal (Picadilly)

19 of 58 10/04/15 00:30

Now it's a piece of cake...Now it's a piece of cake...

20 of 58 10/04/15 00:30

... to perform some analysis... to perform some analysis

21 of 58 10/04/15 00:30

Let's go back to our question 1Let's go back to our question 1What is the most "important" station in the London UndergroundWhat is the most "important" station in the London Undergroundnetwork?network?

22 of 58 10/04/15 00:30

Defining "importance"Defining "importance"

23 of 58 10/04/15 00:30

Let's talk about betweenness centralityLet's talk about betweenness centrality

24 of 58 10/04/15 00:30

Betweenness seems like a good metric to measure theBetweenness seems like a good metric to measure the"importance" of a station"importance" of a station

The higher the betweenness of a station, the moreThe higher the betweenness of a station, the morecommuters will pass through itcommuters will pass through it

How can we compute betweenness on our LondonHow can we compute betweenness on our LondonUnderground graph?Underground graph?

25 of 58 10/04/15 00:30

Graphs and Python: Graphs and Python: graph-tool

graph-tool is a Python library written by Tiago Peixoto that provides a number oftools for analyzing and plotting graphs.

26 of 58 10/04/15 00:30

What can you do with What can you do with graph-tool ? ?

27 of 58 10/04/15 00:30

Create a graph objectCreate a graph objectIn [14]: from graph_tool.all import Graph

#create a new Graph objectgraph_object=Graph()

28 of 58 10/04/15 00:30

Add edges and vertices to the graphAdd edges and vertices to the graphIn [15]:

In [16]:

# add a vertex vertex1 = graph_object.add_vertex()vertex2 = graph_object.add_vertex()

# add an edgeedge1 = graph_object.add_edge(vertex1, vertex2)

29 of 58 10/04/15 00:30

Create property mapsCreate property maps

helpful for storing information about your nodes and edgeshelpful for storing information about your nodes and edges

In [17]: # create a property mapvertex_names = graph_object.new_vertex_property("string")

## iterate through the vertices in the graphfor vertex in graph_object.vertices(): vertex_names[vertex]="some_name"

30 of 58 10/04/15 00:30

Create visualizationsCreate visualizationsIn [10]: from graph_tool.draw import graph_draw

from graph_tool.all import price_network

# draw a small graphgraph_draw(graph_object, output="somefile.png")

#create a price network price_graph=price_network(5000)graph_draw(price_graph, output="price.png")

Out[10]: <PropertyMap object with key type 'Vertex' and value type 'vector<double>', for Graph 0x7f27b0121190, at 0x7f277c05cf10>

31 of 58 10/04/15 00:30

A Simple GraphA Simple Graph

32 of 58 10/04/15 00:30

A Price NetworkA Price Network

33 of 58 10/04/15 00:30

34 of 58 10/04/15 00:30

Filter vertices and edgesFilter vertices and edges

35 of 58 10/04/15 00:30

A sample visualization of the London UndergroundA sample visualization of the London Underground

36 of 58 10/04/15 00:30

Let's go back to betweennessLet's go back to betweenness

Easily calculate betweenness by calling theEasily calculate betweenness by calling thebetweenness function in function ingraph_tool.centrality

37 of 58 10/04/15 00:30

BetweennessBetweenness

38 of 58 10/04/15 00:30

39 of 58 10/04/15 00:30

We have our answer for question 1...We have our answer for question 1...

40 of 58 10/04/15 00:30

Let's take our analysis of betweenness one step further...Let's take our analysis of betweenness one step further...and answer question 2and answer question 2

41 of 58 10/04/15 00:30

How do problems on one of these important stations affectHow do problems on one of these important stations affectthe Underground network?the Underground network?

42 of 58 10/04/15 00:30

Bokeh: creating interactive data visualizationBokeh: creating interactive data visualization

43 of 58 10/04/15 00:30

A basic visualization of the London UndergroundA basic visualization of the London Underground

44 of 58 10/04/15 00:30

45 of 58 10/04/15 00:30

A Basic Visualization of BetweennessA Basic Visualization of Betweenness

46 of 58 10/04/15 00:30

47 of 58 10/04/15 00:30

How does the betwenness of each station changeHow does the betwenness of each station changewhen Baker Street is suspended?when Baker Street is suspended?

48 of 58 10/04/15 00:30

49 of 58 10/04/15 00:30

Bokeh allows us to visualize this interactively in the browserBokeh allows us to visualize this interactively in the browser

50 of 58 10/04/15 00:30

In [13]: from IPython.display import YouTubeVideo

YouTubeVideo('VouLqY-Uegs')

Out[13]:

51 of 58 10/04/15 00:30

Bokeh can do alot more than thisBokeh can do alot more than this

In fact, we can build "real time" simulations by using the built-in bokeh-In fact, we can build "real time" simulations by using the built-in bokeh-server app to stream data to a graphserver app to stream data to a graph

52 of 58 10/04/15 00:30

A Simple Bokeh Simulation of the UndergroundA Simple Bokeh Simulation of the UndergroundEach station is assigned a random number of commuters1. Each commuter is assigned a random destination2. At each step in the simulation, commuters travel over one edge3. Bokeh allows us to observe how the number of commuters at each station changesover time

4.

53 of 58 10/04/15 00:30

In [11]: YouTubeVideo('ZKHMtu1eKtc')

Out[11]:

54 of 58 10/04/15 00:30

SummarySummaryAt the beginning, we set off to answer two questions:At the beginning, we set off to answer two questions:

55 of 58 10/04/15 00:30

1. What are the most important stations on the1. What are the most important stations on theUnderground?Underground?

We used We used graph-tool to calculate betweenness to calculate betweenness

We determined that Baker Street, King's Cross St. Pancras and LiverpoolWe determined that Baker Street, King's Cross St. Pancras and LiverpoolStreet are the most importantStreet are the most important

56 of 58 10/04/15 00:30

2. How does suspending one of the important stations2. How does suspending one of the important stationsaffect the rest of the network?affect the rest of the network?

We used bokeh to create interactive graphicsWe used bokeh to create interactive graphics

We saw that removing Baker Street can put more pressure on almost anWe saw that removing Baker Street can put more pressure on almost anentire Tube line worth of stationsentire Tube line worth of stations

57 of 58 10/04/15 00:30

Thank you very much!Thank you very much!

Questions, comments and critique are very welcome!Questions, comments and critique are very welcome!

Please get in touch at camillamon[at]gmail.com orPlease get in touch at camillamon[at]gmail.com orinfo[at]winterflower.netinfo[at]winterflower.net

58 of 58 10/04/15 00:30