U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data
Jianting Zhang (1, 3, 4), Hongmian Gong (2, 3, 4), Camille Kamga (2, 4), Le Gruenwald (5)
1 CUNY City College (CCNY), 2 CUNY Hunter College, 3 CUNY Graduate Center
4 University Transportation Research Center Region II, 5 University of Oklahoma
Outline
• Introduction & Background
• System Architecture and Implementation
– Time-Segmented Column-Oriented Data Layout
– Efficient Spatial-Temporal Aggregations
– Spatial Join with Infrastructural Data
• Case Studies and Performance Evaluations
• Conclusion and Future Work
Introduction
Ubiquitous Urban Sensing Origin-Destination Data (U2SOD), e.g.:
• Taxi trips
• Cellular phone calls
• Social network activities
• What do they have in common?
– They are produced and collected by end users with commodity sensing devices, and they accumulate in large volumes in urban areas.
– They are a special type of spatial-temporal data: the intermediate locations between origins and destinations are unavailable, inaccessible, or unimportant.
– They can be more effective for understanding the real dynamics of urban areas, in terms of spatial/temporal resolution and representativeness.
• How should U2SOD data be managed?
– Geographical Information Systems (GIS)
– Spatial Databases (SDB)
– Moving Object Databases (MOD)
• How good are they?
– Pretty good for small amounts of data
– But rather poor for large-scale data
• Example 1:
– Loading 170 million taxi pickup locations into PostgreSQL/PostGIS:
UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
– 105.8 hours!
• Example 2:
– Finding the nearest tax blocks for 170 million taxi pickup locations using open-source libspatialindex + GDAL
– 30.5 hours!
I do not have time to wait...
Can we do better?
• The combination of architectural and organizational enhancements led to 16 years of sustained performance growth, at an annual rate of 50%, from 1986 to 2002.
• However, because of the combined power, memory, and instruction-level parallelism (ILP) limits, the growth rate dropped to about 20% per year from 2002 to 2006.
• In contrast, GPU performance has continued to grow at 50% per year.
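To make the gap between these rates concrete, here is a back-of-the-envelope check that treats the quoted annual rates as exact compound growth (an idealization; the function name is illustrative only):

```python
def cumulative_speedup(annual_rate: float, years: int) -> float:
    """Total performance multiple after `years` of compound growth at `annual_rate`."""
    return (1.0 + annual_rate) ** years

# Rates quoted on the slide, treated as exact compound growth for illustration.
cpu_1986_2002 = cumulative_speedup(0.50, 2002 - 1986)  # 1.5**16, roughly 657x
cpu_2002_2006 = cumulative_speedup(0.20, 2006 - 2002)  # 1.2**4, roughly 2.07x
gpu_same_span = cumulative_speedup(0.50, 2006 - 2002)  # 1.5**4, roughly 5.06x

print(f"CPU 1986-2002: {cpu_1986_2002:.0f}x")
print(f"CPU 2002-2006: {cpu_2002_2006:.2f}x")
print(f"50%/yr over the same 4 years: {gpu_same_span:.2f}x")
```

Over just four years, a 50% annual rate compounds to roughly 2.4 times the improvement that a 20% rate delivers.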
Nvidia Quadro 6000: $4,000
Nvidia GTX 690: 3,072 cores (915 MHz), 4 GB GDDR5 memory, 384 GB/s bandwidth; under $1,000
• So, the goal is to design a data management system that efficiently manages large-scale U2SOD data on massively data-parallel GPUs
• And cut the runtimes from hours to seconds on a single commodity GPU device
• With the help of new data models, data structures and algorithms
System Design and Implementation
[Figure: U2SOD-DB architecture. Components: raw data (year / month / day); compression, aggregation and indexing; physical data layout; spatial joins and shortest path computation]
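One way to picture a time-segmented layout is as a set of per-day partitions, each holding its rows column-wise, so that a query over a date range reads only the partitions in that range. The sketch below is an illustration under stated assumptions, not the U2SOD-DB implementation: records are plain dicts, segmentation is by pickup date, and `segment_by_day` and `trips_in_range` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, datetime

def segment_by_day(records):
    """Group row records into per-day segments keyed by pickup date.

    Within a segment, values are stored column-wise: {column_name: [values, ...]}.
    """
    segments = defaultdict(lambda: defaultdict(list))
    for rec in records:
        day = rec["Trip_Pickup_DateTime"].date()
        for column, value in rec.items():
            segments[day][column].append(value)
    return segments

def trips_in_range(segments, first_day, last_day):
    """Count trips picked up in [first_day, last_day]; segments outside the
    range are skipped without reading their columns."""
    return sum(
        len(seg["Trip_Pickup_DateTime"])
        for day, seg in segments.items()
        if first_day <= day <= last_day
    )

# Toy records using two of the taxi trip columns.
trips = [
    {"Trip_Pickup_DateTime": datetime(2009, 1, 1, 8, 5), "Fare_Amt": 7.5},
    {"Trip_Pickup_DateTime": datetime(2009, 1, 1, 23, 40), "Fare_Amt": 12.0},
    {"Trip_Pickup_DateTime": datetime(2009, 1, 2, 9, 15), "Fare_Amt": 5.3},
]
segs = segment_by_day(trips)
print(sorted(segs))
print(trips_in_range(segs, date(2009, 1, 2), date(2009, 1, 31)))
```

Coarser segments (month or year) would follow the same pattern with a different key.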
Taxi trip record columns (in 11 groups):
– Medallion#, Shift#, Trip#
– Trip_Pickup_DateTime, Trip_Dropoff_DateTime
– Trip_Pickup_Location, Trip_Dropoff_Location
– Start_Lon, Start_Lat, End_Lon, End_Lat
– Payment_Type, Surcharge, Total_Amt, Rate_Code
– Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Trip_Time
– Trip_Distance
– vendor_name, date_loaded, store_and_forward
– time_between_service, distance_between_service
– Start_Zip_Code, End_Zip_Code
– start_x, start_y, end_x, end_y (local projection)
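With this many attributes per trip, a column-oriented layout lets an aggregation read only the columns it needs. A minimal sketch, assuming a hypothetical three-column subset of the schema above and Python's standard `array` module as a stand-in for contiguous device buffers:

```python
from array import array

# Hypothetical subset of the trip schema, with an array typecode per column.
SCHEMA = {"Passenger_Count": "b", "Fare_Amt": "d", "Tip_Amt": "d"}

def to_columns(rows):
    """Transpose row tuples (ordered as SCHEMA) into one contiguous typed array per column."""
    columns = {name: array(code) for name, code in SCHEMA.items()}
    names = list(SCHEMA)
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns

def tip_share(columns):
    """Total tips / total fares. Only the two referenced columns are scanned."""
    return sum(columns["Tip_Amt"]) / sum(columns["Fare_Amt"])

rows = [(1, 10.0, 2.0), (2, 20.0, 3.0), (1, 5.0, 0.0)]
cols = to_columns(rows)
print(cols["Fare_Amt"])
print(round(tip_share(cols), 3))
```

Because each column is a separate contiguous, typed buffer, adjacent GPU threads would read adjacent elements of the same column, which is one reason column layouts suit coalesced memory access.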
[Figure: aggregation hierarchies over NYC taxi trip records]
– Temporal levels (from pickup/drop-off timestamps): 15/30-minutes, Hour, Day, Month, Year; Day of the Year, Week of the Year, Day of the Week; Peak/off-peak
– Spatial levels (from pickup/drop-off locations): Level 0 grid, Level k grid, Top level grid; Tax Lot, Tax Block, Street Segment, Census Block, Census Tract, Police Precinct, Community District, Borough, City
– Auxiliary data (weather, events…)
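Most temporal levels in this hierarchy can be derived directly from a timestamp. The sketch below computes them for one pickup/drop-off time. The ISO 8601 week numbering and, especially, the peak-period definition (weekdays, 7-10 a.m. and 4-7 p.m.) are assumptions made for illustration, not definitions taken from U2SOD-DB.

```python
from datetime import datetime

# Hypothetical peak definition, for illustration only: weekday hours 7-9 and 16-18.
PEAK_HOURS = set(range(7, 10)) | set(range(16, 19))

def temporal_keys(ts: datetime, slot_minutes: int = 15) -> dict:
    """Derive the temporal aggregation levels for one pickup/drop-off timestamp."""
    _, iso_week, iso_weekday = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": iso_week,      # ISO 8601 week number
        "day_of_week": iso_weekday,    # 1 = Monday ... 7 = Sunday
        # Index of the 15- or 30-minute slot within the day.
        "slot": (ts.hour * 60 + ts.minute) // slot_minutes,
        "peak": iso_weekday <= 5 and ts.hour in PEAK_HOURS,
    }

print(temporal_keys(datetime(2009, 1, 5, 8, 40)))
```

Aggregating at any level then amounts to grouping records by the corresponding key.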
The three types of spatial joins (P2P-T, P2N-D and P2P-D) are now supported by U2SOD-DB entirely on GPUs, with significant speedups.
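The acronyms are not expanded here. If P2P-T is read as a point-to-polygon test, each candidate point/polygon pair reduces to a point-in-polygon check. The serial sketch below uses the standard even-odd (ray-crossing) rule. It assumes simple polygons given as vertex lists in projected coordinates, the function names are hypothetical, and it omits the spatial filtering and GPU parallelization a real join would need.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: cast a horizontal ray from (x, y) toward +x and count edge
    crossings. `ring` is a list of (x, y) vertices; the closing edge is implied."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Half-open test: the edge straddles the ray's line (vertices counted once).
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def containing_polygon(point, polygons):
    """Brute-force join for one point: id of the first polygon containing it, or None."""
    x, y = point
    for pid, ring in polygons.items():
        if point_in_polygon(x, y, ring):
            return pid
    return None

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(point_in_polygon(2, 2, square), point_in_polygon(2, 2, l_shape))
print(containing_polygon((0.5, 3), {"A": l_shape, "B": square}))
```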
Case Studies and Performance Evaluations
• Data
– Taxi trip records: 300 million over two years (2008-2010); ~170 million in 2009 (~150 million in Manhattan)
– NYC DCP LION street network data: 147,011 street segments
– NYC Census 2000 blocks: 38,794
– NYC MapPLUTO tax blocks: 735,488 in four boroughs (excluding Staten Island) and 43,252 in Manhattan
• Hardware
– Dell T5400 with dual quad-core CPUs and 16 GB memory
– Nvidia Quadro 6000 with 448 cores and 6 GB memory
[Figure: spatial aggregation maps. Top: grid size 256 × 256, resolution 128 feet; right: grid size 8192 × 8192, resolution 4 feet]
Spatial Aggregation
9,424 / 326 ≈ 29X (8192 × 8192)
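Spatial aggregation on a uniform grid amounts to mapping each projected location to a cell and counting. Both grids above cover the same 32,768-foot extent (256 × 128 ft = 8192 × 4 ft), and since 8192 = 256 × 2^5, the coarse grid sits five 2 × 2 levels above the fine one. The sketch below bins points into a level-0 grid and builds coarser levels by summing 2 × 2 blocks. This is one plausible reading of the "Level 0 grid … Level k grid … Top level grid" hierarchy, not the U2SOD-DB GPU code. The origin `(x0, y0)` and the function names are hypothetical.

```python
def grid_counts(points, x0, y0, cell_size, n):
    """Level-0 aggregation: count points per cell of an n x n grid anchored at (x0, y0).
    Points outside the grid extent are ignored."""
    counts = [[0] * n for _ in range(n)]
    for x, y in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= row < n and 0 <= col < n:
            counts[row][col] += 1
    return counts

def coarsen(counts):
    """Next pyramid level: each cell sums a 2 x 2 block of the finer level."""
    half = len(counts) // 2
    return [
        [counts[2 * r][2 * c] + counts[2 * r][2 * c + 1]
         + counts[2 * r + 1][2 * c] + counts[2 * r + 1][2 * c + 1]
         for c in range(half)]
        for r in range(half)
    ]

def pyramid(counts):
    """All levels from level 0 up to a 1 x 1 top level (n must be a power of two)."""
    levels = [counts]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels

# The last point falls outside the 16 x 16 extent and is dropped.
pts = [(1, 1), (5, 1), (6, 2), (15, 15), (20, 1)]
levels = pyramid(grid_counts(pts, 0, 0, 4, 4))
print(levels[1], levels[-1])
```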
Temporal Aggregation
1,709 / 198 ≈ 8.6X (minute)
1,598 / 165 ≈ 9.7X (hour)
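A common data-parallel pattern for aggregations like these is to compute a key per record (for example, the minute or hour of pickup), sort by key, and reduce runs of equal keys. On GPUs this corresponds to primitives such as Thrust's `thrust::sort` and `thrust::reduce_by_key`. Whether U2SOD-DB uses exactly this pattern is not stated here. The serial sketch below only illustrates the idea, with hypothetical function names and an hour-of-year key.

```python
from datetime import datetime
from itertools import groupby

def hour_key(ts: datetime) -> int:
    """Hours elapsed since Jan 1 00:00 of the timestamp's year."""
    return (ts.timetuple().tm_yday - 1) * 24 + ts.hour

def reduce_by_key(keys, values):
    """Sort (key, value) pairs by key, then sum values within each run of equal keys.
    Returns parallel lists of unique keys and per-key totals."""
    pairs = sorted(zip(keys, values), key=lambda kv: kv[0])
    out_keys, out_sums = [], []
    for k, run in groupby(pairs, key=lambda kv: kv[0]):
        out_keys.append(k)
        out_sums.append(sum(v for _, v in run))
    return out_keys, out_sums

pickups = [datetime(2009, 1, 1, 0, 5), datetime(2009, 1, 1, 1, 10),
           datetime(2009, 1, 1, 0, 50), datetime(2009, 1, 2, 0, 1)]
keys = [hour_key(t) for t in pickups]
# Counting trips per hour: every record contributes a value of 1.
hours, trip_counts = reduce_by_key(keys, [1] * len(keys))
print(hours, trip_counts)
```

Minute-level aggregation only changes the key function (for example, `hour_key(ts) * 60 + ts.minute`).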
T-Drive dataset: 17,762,489 GPS point locations; aggregation takes 47.25 ms on the GPU versus 4,110 ms on the CPU (using STL), an 87X speedup
Spatial joins (infrastructure data: 147,011 street segments; 38,794 census blocks with 470,941 points; 735,488 tax blocks with 4,698,986 points)

            P2P-T         P2N-D         P2P-D
CPU time    -             15.2 hours    30.5 hours
GPU time    10.9 seconds  11.2 seconds  33.1 seconds
Speedup     -             4,900X        3,200X
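If P2N-D is read as a point-to-network distance join (assigning each pickup to its nearest street segment), the refinement step reduces to a point-to-segment distance. The brute-force serial sketch below scans every segment for each point. The names are hypothetical, segments are assumed to be `(ax, ay, bx, by)` tuples in projected coordinates, and a practical join would first filter candidates with a spatial index such as a grid.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB (projected coordinates)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: A == B
        return math.hypot(px - ax, py - ay)
    # Parameter of P's projection onto line AB, clamped to the segment [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_segment(point, segments):
    """Index and distance of the segment closest to `point` (brute force)."""
    px, py = point
    best = min(range(len(segments)),
               key=lambda i: point_segment_distance(px, py, *segments[i]))
    return best, point_segment_distance(px, py, *segments[best])

segs = [(0, 0, 10, 0), (0, 5, 10, 5), (20, 0, 20, 10)]
print(nearest_segment((3, 1), segs))
print(nearest_segment((12, 4), segs))
```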
Conclusion and Future Work
• We reported our design and implementation of U2SOD-DB, a column-oriented, GPU-accelerated, in-memory data management system targeted at large-scale ubiquitous urban sensing origin-destination data
• Experiments have demonstrated significant speedups over serial CPU implementations in main memory (10-100X) and over traditional disk-resident systems (3,000-5,000X) for processing 170 million taxi trip records and their spatial joins with various types of urban infrastructure data
• Extend U2SOD-DB to handle other types of OD data as well as trajectory data
• Further improve the performance by designing and implementing more efficient data structures and algorithms on GPUs
• Apply U2SOD-DB to in-depth analysis of trip purposes and urban dynamics in NYC, in collaboration with transportation researchers and urban geographers.