Developing HPC-enabled Simulation System for Multi-scale Mobility Network Analytics
PI: Pengfei (Taylor) Li
October 12, 2018
Project Kickoff Meeting
Overview
• Research Team (2018~2019):
  • Dr. Taylor Li, PI (HPC-enabled simulation engine development)
  • Dr. Linkan Bian, Co-PI (Methodology design of big data analytics)
  • Dr. Wenmeng Tian, Senior staff (Big data analytics and visualization)
  • Slade Wang, Ph.D. student (HPC environment configuration and data visualization in the HPC environment)
Objective
• Build a multi-purpose, scalable, agent-based simulation engine within the HPC environment
• Mega-scale data streaming, archiving, visualizing and analyzing within the distributed computing environment
  • At the magnitude of hundreds of GB, TB or PB
• Preliminary investigation of a pioneering GPU-based computing engine
  • Combinatorial optimization
  • Discrete-event simulation
Research Tasks
• Task 1: Develop and fine-tune the HPC-enabled discrete-event simulation engine
  • Mega-scale, scalable, high-fidelity
• Task 2: Data analysis and visualization
  • Single computer
  • Distributed environment
• Task 3: Optimization
Preliminary Results
• HPC-enabled simulation engine development
  • Second-by-second formulation
  • Path first-in-first-out rule
  • The engine can be accelerated on a computer cluster
  • Hybrid programming: MPI (for shortest path finding) + multi-threading (for network loading)
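The second-by-second, first-in-first-out loading idea can be illustrated with a minimal single-threaded sketch. It is not the project's engine: the point-queue link model, the function name `load_network`, and all parameter names are assumptions made for illustration, and the MPI and multi-threading layers are left out.

```python
from collections import deque

def load_network(paths, departures, travel_time, capacity, horizon):
    """Second-by-second loading with FIFO links (illustrative point-queue model).

    paths[v]       -- list of link ids that vehicle v follows (assumed precomputed)
    departures[v]  -- departure second of vehicle v
    travel_time[l] -- free-flow travel time of link l, in seconds
    capacity[l]    -- maximum number of vehicles leaving link l per second
    Returns the arrival second of each vehicle (None if it has not arrived).
    """
    queues = {link: deque() for link in travel_time}  # FIFO queue per link
    position = {v: 0 for v in paths}                  # index of current link on the path
    arrival = {v: None for v in paths}
    by_departure = {}
    for v, t0 in departures.items():
        by_departure.setdefault(t0, []).append(v)

    for t in range(horizon):
        # Collect exits first so a vehicle advances at most one link per second.
        moves = []
        for link, q in queues.items():
            sent = 0
            # Only the head of the queue may leave: this enforces FIFO.
            while q and sent < capacity[link] and q[0][1] <= t:
                v, _ = q.popleft()
                moves.append(v)
                sent += 1
        for v in moves:
            position[v] += 1
            if position[v] == len(paths[v]):
                arrival[v] = t
            else:
                nxt = paths[v][position[v]]
                queues[nxt].append((v, t + travel_time[nxt]))
        # Vehicles departing at second t enter their first link.
        for v in sorted(by_departure.get(t, [])):
            first = paths[v][0]
            queues[first].append((v, t + travel_time[first]))
    return arrival
```

For example, two vehicles departing at second 0 on a single link with a 2-second travel time and a capacity of one vehicle per second arrive at seconds 2 and 3, because the second vehicle has to wait behind the first.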
Data Set for Experiment
• Houston network: 28+ million vehicles with known origins and destinations per day, 86,400 sec (24 hours), 70k links and 20k nodes
• Shortest path finding per iteration: 6 hours
• Network loading per day: 5 hours
• Mobility data size: 14+ gigabytes per day
[Figures: Time-of-Day Demand; Road Network]
Preliminary Results
• Two files:
  • Vehicle shortest path data (14 GB): with no link capacity constraints
  • Vehicle trajectory data: with link capacity constraints
• Multiple runs will be needed.
Data Analytics Overview
• Model 1: Time series analysis
  • Analyze the change of traffic flow over time at critical locations
• Model 2: Spatial analysis
  • Recognize spatial patterns at limited time points
• Model 3: Spatio-temporal analysis
  • Identify spatio-temporal patterns using a tensor-based approach
Model 1: Time Series Analysis at Critical Locations
• 74990 links -> 74990 time series models
• Drawbacks
  • This approach totally ignores the spatial correlation between the links
  • Huge redundancy in parameter estimation
Time Series Analysis based on AR(p) model*
Link ID   Xt-1     Xt-2     Xt-3      Xt-4      Xt-5
1         0.9954   0.0135   0.0353    -0.0897
...
26248     0.9801   0.0223   0.0018    0.0372    -0.0438
34838     1.0070   0.0046   -0.0170   -0.0052   0.0156
*The first 10k points are used for demonstration purposes
[Figure: number of vehicles vs. time for Link ID=26248 (3104559.2805,13946399.335), Link ID=34838 (3115688.604,13756788.775) and Link ID=1 (3118945.5045,13841059.685)]
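As a rough illustration of how one AR(p) model per link could be estimated, the sketch below fits the lag coefficients by ordinary least squares with NumPy. The slides do not say which estimator was used, so least squares, the omission of an intercept, and the function name `fit_ar` are all assumptions.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of AR(p) coefficients (no intercept; an assumption).

    Returns [a_1, ..., a_p] for the model x[t] = a_1*x[t-1] + ... + a_p*x[t-p] + noise.
    """
    x = np.asarray(series, dtype=float)
    # Column k holds the lag-k values x[t-k] for t = p, ..., len(x)-1.
    lagged = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    target = x[p:]
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coef
```

Calling `fit_ar(counts, 5)` on the vehicle-count series of one link would give one row of a table like the one above. Repeating the fit independently for every link is what produces the redundancy noted in the drawbacks.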
Model 2: Spatial Analysis at Limited Time Points
• Characterize the spatial correlation at each time stamp
  • How does the traffic pattern change over space?
  • Predict traffic volume at unobserved locations
• Drawbacks
  • Totally ignores temporal correlation
  • Very computationally intensive
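The slides do not name the spatial model, so the sketch below uses inverse-distance weighting purely as a stand-in to show what "predicting traffic volume at unobserved locations" means at a single time stamp. The function name `idw_predict` and the `power` parameter are illustrative assumptions, not the project's method.

```python
import numpy as np

def idw_predict(known_xy, known_vol, query_xy, power=2.0):
    """Inverse-distance-weighted volume estimate (stand-in for the spatial model).

    known_xy  -- (n, 2) coordinates of observed locations
    known_vol -- (n,) traffic volumes observed at one time stamp
    query_xy  -- (m, 2) coordinates of unobserved locations
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_vol = np.asarray(known_vol, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise distances between every query point and every observed point.
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    # Clamp tiny distances so a query on top of an observation does not divide by zero.
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    return (weights * known_vol).sum(axis=1) / weights.sum(axis=1)
```

The pairwise distance matrix grows with the product of observed and queried locations, which hints at why a spatial analysis over tens of thousands of links becomes computationally intensive.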
Model 3: Spatio-temporal Analysis using Tensor-based Approach
• Advantages
  • Handles large-scale spatio-temporal data
  • Jointly considers spatio-temporal correlation
  • Effective data compression in space and time
  • Projection matrices indicate the importance of each single point on the map
• Challenges
  • Large matrix operations may still be a problem
• Proposed solution
  • Distributed computing
                  Dimension
Original Data     121*121*10
Compressed Data   50*52*4
Dimension reduced by 92.9%
[Figure: original tensor with axes x, y, t compressed to a smaller tensor with axes x', y', t']
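One common way to obtain a compressed tensor together with per-mode projection matrices is a truncated higher-order SVD (a Tucker-style decomposition). The slides do not state which tensor method is used, so the sketch below is only an assumed example; the names `unfold`, `hosvd_compress` and `reconstruct` are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_compress(tensor, ranks):
    """Truncated higher-order SVD: returns (core, projection matrices)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode unfolding form the projection.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project the current mode onto its reduced basis.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the compressed core back to the original space (approximate if truncated)."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out
```

With a 121*121*10 input and ranks (50, 52, 4), the core has 10,400 entries against 146,410 in the original, so 1 - 10400/146410 is about 92.9%, which matches the reduction quoted above when only the core is counted. Storing the three projection matrices as well would add to that total.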
On-going Efforts and Future Plan
• Improve the simulation engine
  • Generate the benchmark data set for cross-comparing various big-data analytics approaches
• Explore alternative options for mega-size data processing and visualization
  • Alt 1: Distributed MATLAB
  • Alt 2: Python + Spark within the HPC environment
  • Alt 3: Other appropriate tools?
• Preliminary GPU-based computing
  • GPU development environment setup
  • GPU-based big data analytical packages
  • GPU-based computing engine development