Big Data “Triage” for Long Range Planning
Transportation Engineering and Safety Conference
Reuben S MacMartin
December 12, 2014
Delaware Valley Regional Planning Commission
Metropolitan Planning Organization (MPO)
2 States
9 Counties
351 Municipalities
5.6 Million Population
3,800 sq. miles
~115 employees
Activities:
Long Range Plan (LRP)
Transportation Improvement Program (TIP)
Wide range of planning and technical support for regional partners
Outline
What do we use data for?
Traditional data sources – traffic counts, surveys, demographic data
The old-new – OSM, GTFS, VPP Suite, Bluetooth
The new-new – CycleTracks, real-time transit data,…, data-mined GPS data, etc.
What do we use data for?
Current conditions on transportation studies
Current Conditions
Definition and analysis of congestion for the Congestion Management Process (CMP)
A bad day compared to average
Long Range Planning
Calibration and validation of travel forecasting models
250 Riders in 2040
Also a data provider – e.g. RIMIS
“Traditional” Planning Data Sources
Inventories: traffic counts (78,300+); bike and ped counts (1,000+); travel time surveys
Behavioral Surveys: household travel survey (2012-2013); transit on-board (2010-2012)
Demographic Data: Census, American Community Survey; National Establishment Time-Series (NETS)
The old “new” data
These were innovative 5 years ago:
Open source data for our travel demand model networks
Travel Demand Model Networks
The need: accurate representations of regional highway and transit networks
The past: “hand” code from paper maps, schedules, etc., or combine a multitude of different data sources
The innovation: fuse OpenStreetMap (OSM) and GTFS (i.e. “Google-transit”) and add extra data for modeling
Open Data Mash-up for Transportation Modeling
Data integration
Data objects of different origin are merged
New relationships are created
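[Figure: integrated data model diagram. Entities are grouped as "from OSM", "from GTFS", and "Travel Demand Data".
Entities and key fields: Node (Number); Link (From Node, To Node); Stop Point (Number); Stop Area (Number); Line (Name); Service Pattern (Line Name, Route Name, Direction); Scheduled Run (Line Name, Route Name, Direction, Index); Zone (Zone Number); Connector (Zone Number, Node Number, Direction).
Legend (relationship cardinality): 2; 1 or more; 0 or more; exactly 1.]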
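One way to picture "data objects of different origin are merged" is attaching each transit Stop Point to its nearest street Node. The sketch below is illustrative only: the class fields beyond Number, the haversine distance, the 50 m threshold, and the sample coordinates are all assumptions, not the procedure used for the DVRPC model networks.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Street-network node (the kind of object that would come from OSM)."""
    number: int
    lat: float
    lon: float


@dataclass(frozen=True)
class StopPoint:
    """Transit stop (the kind of object that would come from GTFS)."""
    number: int
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6_371_000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def attach_stops(stops, nodes, max_dist_m=50.0):
    """Map each stop number to its nearest node number.

    Stops farther than max_dist_m (an assumed threshold) from every node
    map to None so they can be reviewed by hand.
    """
    links = {}
    for s in stops:
        best = min(nodes, key=lambda n: haversine_m(s.lat, s.lon, n.lat, n.lon))
        d = haversine_m(s.lat, s.lon, best.lat, best.lon)
        links[s.number] = best.number if d <= max_dist_m else None
    return links


# Hypothetical sample data
nodes = [Node(1, 39.9526, -75.1652), Node(2, 39.9556, -75.1820)]
stops = [StopPoint(100, 39.9527, -75.1653), StopPoint(200, 40.1000, -75.3000)]
stop_to_node = attach_stops(stops, nodes)
```

The resulting stop-to-node mapping is the "new relationship" between the two sources; unmatched stops are flagged rather than forced onto a distant node.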
Integrated Street & Transit Network
© in part by OSM and CC-by-SA
TIM 2 Highway Network
© in part by OSM and CC-by-SA
New, accurate topology (& routable)
Legacy DVRPC network model
Original SEPTA GTFS (2010)
VISUM Imported Network
VISUM Exported Network (WKTPoly shape)
The old “new” data
These were innovative 5 years ago:
Open source data for our travel demand model networks
Bluetooth detectors for speed and O-D data
Automated Passenger Counter (APC) data – SEPTA
Why APC data?
Time stamped boarding and alighting data by line by stop
Time period level targets for modeling
Stop and line level expansion values for On Board Survey work
Used in calibration/validation of path builder
Transit studies: O-D matrices by line
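The first two uses above can be sketched in a few lines: roll time-stamped boardings up to line and time period, then divide by the number of surveys collected to get an expansion value. The record layout, period boundaries, and sample numbers are hypothetical, not SEPTA's APC format or DVRPC's period definitions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical APC records: (line, stop, timestamp, boardings, alightings)
apc_records = [
    ("21", "A", "2014-10-01 07:15", 12, 0),
    ("21", "B", "2014-10-01 07:22", 5, 4),
    ("21", "A", "2014-10-01 12:40", 3, 1),
    ("42", "C", "2014-10-01 17:05", 9, 2),
]

# Assumed time periods: (name, start hour inclusive, end hour exclusive)
PERIODS = [("AM", 6, 10), ("MD", 10, 15), ("PM", 15, 19)]


def period_of(timestamp):
    """Return the time-period name for a timestamp, or "NT" outside all periods."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    return "NT"


def boardings_by_line_period(records):
    """Sum boardings for each (line, time period) pair."""
    totals = defaultdict(int)
    for line, _stop, timestamp, ons, _offs in records:
        totals[(line, period_of(timestamp))] += ons
    return dict(totals)


def expansion_factors(apc_totals, survey_counts):
    """Counted boardings divided by surveys collected, per (line, period)."""
    return {
        key: apc_totals[key] / n
        for key, n in survey_counts.items()
        if n > 0 and key in apc_totals
    }


targets = boardings_by_line_period(apc_records)
factors = expansion_factors(targets, {("21", "AM"): 2})
```

Here each of the two surveys collected on line 21 in the AM period would stand in for 8.5 counted boardings.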
The new “new” data
User-sourced bike data – CyclePhilly
CyclePhilly – User Generated GPS Data
www.cyclephilly.org
Raw GPS Trace
Snapped GPS
Model Path
Model vs. Data
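The step from a raw GPS trace to a snapped one can be illustrated with nearest-link snapping. This is a simplified sketch, not the method behind the slides: it assumes planar coordinates in meters, straight-line links, and a 30 m tolerance, and it ignores route continuity, which a full map-matching procedure would enforce.

```python
import math


def project_to_segment(p, a, b):
    """Closest point to p on segment a-b, plus the distance to it (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Fraction along the segment, clamped so the result stays between a and b
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)


def snap_trace(trace, links, max_dist=30.0):
    """For each GPS point return (link id, snapped point), or None if no link is close enough."""
    snapped = []
    for p in trace:
        best = None
        for link_id, (a, b) in links.items():
            q, d = project_to_segment(p, a, b)
            if d <= max_dist and (best is None or d < best[2]):
                best = (link_id, q, d)
        snapped.append(None if best is None else (best[0], best[1]))
    return snapped


# Hypothetical network (two links meeting at a corner) and GPS trace
links = {"L1": ((0.0, 0.0), (100.0, 0.0)), "L2": ((100.0, 0.0), (100.0, 100.0))}
trace = [(10.0, 4.0), (60.0, -3.0), (104.0, 40.0), (300.0, 300.0)]
result = snap_trace(trace, links)
```

Points that fall outside the tolerance come back as None, which keeps stray GPS fixes out of any comparison between modeled and observed paths.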
The new “new” data
User-sourced bike data – CyclePhilly
Vehicle probe data – INRIX
PM Peak TTI – INRIX
Archived Operational Data – INRIX
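The slides do not define TTI. Assuming it is the commonly used Travel Time Index (peak travel time divided by free-flow travel time over the same segment), it can be computed from probe speeds as in the sketch below. The segment names and speeds are hypothetical, and the choice of free-flow reference speed is left as an input.

```python
def travel_time_index(peak_speed_mph, free_flow_speed_mph):
    """Travel Time Index for one segment.

    TTI = peak travel time / free-flow travel time. Over a fixed segment
    length this equals free-flow speed / peak speed.
    """
    if peak_speed_mph <= 0 or free_flow_speed_mph <= 0:
        raise ValueError("speeds must be positive")
    return free_flow_speed_mph / peak_speed_mph


# Hypothetical segments: (PM peak speed, free-flow speed) in mph
segments = {"seg-1": (30.0, 60.0), "seg-2": (45.0, 45.0)}
tti = {seg: travel_time_index(peak, ff) for seg, (peak, ff) in segments.items()}
```

A value of 2.0 means the trip takes twice as long in the PM peak as under free-flow conditions; 1.0 means no delay.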
The new “new” data
User-sourced bike data – CyclePhilly
Vehicle probe data – INRIX
SEPTA Key (new fare payment technology) data – SEPTA (availability TBD)
Fare Card Data – Possibilities
Anonymized full day transit-based tour data for all riders
O-D data
Route choice data
Transfer behavior
Frequency of transit use
Much higher resolution data than current survey methods
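Since availability of this data is still TBD, the following is only a sketch of how transfer behavior might be derived from anonymized taps: consecutive taps on the same card within a time window are linked into one journey. The tap layout, route names, and 45-minute window are all assumptions.

```python
from datetime import datetime, timedelta
from itertools import groupby

TRANSFER_WINDOW = timedelta(minutes=45)  # assumed maximum gap between linked taps

# Hypothetical anonymized taps: (card id, timestamp, route)
taps = [
    ("c1", "2014-12-01 07:30", "R1"),
    ("c1", "2014-12-01 07:55", "R2"),
    ("c1", "2014-12-01 17:10", "R2"),
    ("c2", "2014-12-01 08:05", "R3"),
]


def parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M")


def link_journeys(taps, window=TRANSFER_WINDOW):
    """Group each card's taps into journeys; a gap longer than window starts a new journey."""
    journeys = []
    ordered = sorted(taps, key=lambda t: (t[0], parse(t[1])))
    for card, card_taps in groupby(ordered, key=lambda t: t[0]):
        current, last_time = [], None
        for _card, timestamp, route in card_taps:
            when = parse(timestamp)
            if last_time is not None and when - last_time > window:
                journeys.append((card, current))
                current = []
            current.append(route)
            last_time = when
        journeys.append((card, current))
    return journeys


journeys = link_journeys(taps)
# Number of transfers in a journey is the number of linked taps minus one
transfers = [(card, len(routes) - 1) for card, routes in journeys]
```

Counting journeys per card over a longer period would give frequency of transit use from the same table.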
Triage – Making Data Usable
Aggregation – Resolution and limits of existing analytical tools/methods
Cleaning – You can’t check every data point
Initial spot check and clean as you go if you find discrepancies
Sampling Biases – Not all big data is truly random
Compare non-random to random sources whenever possible
Declare biases of data when using it