1
Have we met?
Trying to find similar events in an archival database in the context of travel time estimation
Rafael J. Fernández-Moctezuma
April 3, 2007
2
Acknowledgements
• Kristin Tufte for introducing me to the “fetch similar” problem – and for helping break the problem into smaller pieces to solve.
• Umut Ozertem (OGI) for valuable discussion from a Machine Learning perspective.
3
Contents
• Issues in estimating travel time
– “Instantaneous estimate”
– A posteriori estimate
• Combining archived data efficiently
• Imposing structure to minimize the search space
• Preliminary results
• Concurrent / Future work
4
Motivation: Travel time
[Diagram: freeway segment, direction of flow]
5
Travel time
[Diagram: stations A, B, C along the direction of flow, with detectors MA, MB, MC]
Inductive loop detectors measure speed, occupancy, and volume. These are the three fundamental quantities used to reason theoretically about traffic flow.
6
Travel time
[Same diagram, with a Region of Influence marked around each detector]
7
Travel time
[Diagram: stations A, B, C with detectors MA, MB, MC; a VMS at entry point α, exit at Ω; regions of influence marked]
A Variable Message Sign could inform drivers of estimated travel times. This is useful before intersections: drivers may select alternate routes if considerable delay lies ahead.
8
Travel time
[Same diagram, with region boundaries AB and BC marked between α and Ω]
9
Instantaneous estimate
[Same diagram, with timeline t0 → tf]
At time t0, calculate the travel time from α to Ω as the sum of times between the regions (i.e., time from α to AB + time from AB to BC + time from BC to Ω).
However, by the time the vehicle arrives at AB, conditions measured at B may have changed.
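The instantaneous estimate can be sketched in a few lines. This is a minimal illustration, not the author's MATLAB prototype; the segment lengths and speeds are made-up numbers.

```python
# Instantaneous travel-time estimate: sum of (sub-segment length / speed
# measured right now at the detector covering that sub-segment).
def instantaneous_estimate(lengths_mi, speeds_mph):
    """Travel time in minutes, assuming each current speed holds for the whole trip."""
    return sum(60.0 * length / speed for length, speed in zip(lengths_mi, speeds_mph))

# Hypothetical segment split into three regions of influence.
lengths = [1.2, 1.5, 1.0]          # miles, covered by MA, MB, MC
speeds_at_t0 = [55.0, 30.0, 45.0]  # mph measured at time t0
eta = instantaneous_estimate(lengths, speeds_at_t0)
```

The weakness named on the slide is visible here: all three speeds are read at t0, even though the vehicle reaches the later sub-segments minutes afterwards.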
10
A posteriori estimate
[Same diagram, with timeline t0 → t1 → t2 → tf]
At time t0, calculate the travel time from α to AB (say, t1). At time t1, look at the condition reported by B, and calculate the time between AB and BC (and so on).
This is closer to the travel time a vehicle experienced, but this estimate cannot be computed online at α for the complete segment, since we cannot see the future.
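The a posteriori estimate walks forward in time, reading the speed that was in effect at each detector at the (simulated) arrival time. A minimal sketch, with a hypothetical speed history rather than real archive data:

```python
# A posteriori estimate: advance through the sub-segments, reading the speed
# measured at each downstream detector at the time we actually arrive there.
def a_posteriori_estimate(lengths_mi, speed_history, t0):
    """speed_history[i] maps a time (minutes) to the speed (mph) at detector i.
    Returns total travel time in minutes; only computable after the fact."""
    t = t0
    for i, length in enumerate(lengths_mi):
        speed = speed_history[i](t)      # speed in effect when we arrive
        t += 60.0 * length / speed
    return t - t0

# Hypothetical history: the middle detector slows down after minute 2.
history = [
    lambda t: 60.0,
    lambda t: 30.0 if t < 2.0 else 20.0,
    lambda t: 45.0,
]
tt = a_posteriori_estimate([1.0, 1.0, 1.0], history, t0=0.0)
```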
11
Archived data is useful
• It is possible to compute a posteriori estimates for previously observed measurements.
• This opens the possibility for incorporating previously seen travel times (associated with instantaneous measurements) for online estimation.
12
Identical vs. Similar
• We cannot guarantee that all possible combinations of measured values have been observed already.
• We would also like the recall of relevant data points to be fast – we don’t want to go through the entire history at every refresh.
• ML people constantly complain about “not having enough data”. In this case, we have a lot of data and we wish to quickly extract a representative sample.
13
Let’s step out and think in general terms: A model system
[Diagram: a data stream of measurements feeds a selection mechanism, which retrieves similar historical measurements from the archive; a fusion mechanism combines them into an estimation]
14
A model system
[Same model-system diagram]
How can we do this efficiently?
15
A model system
[Same model-system diagram]
What is a reasonable strategy?
16
A model system
[Same model-system diagram]
Our effort so far has concentrated on this section.
17
Impose structure in the data archive
• Databases are very efficient when we know what to ask (e.g., “value >= 20” benefits greatly from an index lookup, if the index exists).
• Can we index “similarity”?
• Consider imposing structure on previously seen information. We can be clever about what to index and reduce the search space on the fly.
18
Similar as “close” in a vector space
Retrieve the 3 nearest points to the new point
For n existing points in the space, we need n comparisons.
This problem is referred to as k-nearest neighbors (KNN).
We can define “close” in terms of Euclidean distance.
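The baseline search described above is easy to state in code: n distance computations, no pruning. A minimal sketch with made-up 2-D points:

```python
import math

# Brute-force k-nearest neighbors under Euclidean distance:
# n comparisons against the n stored points, with no pruning at all.
def knn(points, query, k):
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
nearest3 = knn(pts, (0.0, 0.1), k=3)
```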
19
What if we can prune off some points?
Suppose we are given regional boundaries. It is now feasible to look first at which one is the closest region, and then perform the search within it.
The outcome of clustering can give us such boundaries.
20
K-means
One of the simplest clustering algorithms, k-means attempts to minimize the variance within each cluster while maximizing the variance between clusters.
It finds the centroids of k clusters. These points may not be observed points.
For this example, an initial comparison with three centroids reduces the search space to ~ 1/3.
21
K-means
The random-start property of k-means implies several limitations, three of which stand out:
(1) For a particular choice of k, there can be one or more solutions.
(2) It is possible to end up with empty clusters.
(3) The initial choice of centroids can be problematic (a bad start can converge to a poor solution).
We can cope with these limitations by doing several Monte Carlo runs.
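The combination above – Lloyd's k-means iteration plus several random restarts, keeping the lowest-cost run – can be sketched as follows. This is an illustrative plain-Python version, not the author's MATLAB code, and the toy data points are invented.

```python
import random

# Minimal k-means (Lloyd's algorithm) with random restarts ("Monte Carlo runs"):
# keep the run with the smallest total within-cluster squared distance.
def kmeans(data, k, restarts=5, iters=50, seed=0):
    rng = random.Random(seed)
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    best_cost, best_centroids = float("inf"), None
    for _ in range(restarts):
        centroids = rng.sample(data, k)              # random start
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in data:                           # assign to nearest centroid
                clusters[min(range(k), key=lambda i: d2(p, centroids[i]))].append(p)
            centroids = [                            # recompute means; keep old
                tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
                for i, cl in enumerate(clusters)     # centroid if cluster empty
            ]
        cost = sum(min(d2(p, c) for c in centroids) for p in data)
        if cost < best_cost:
            best_cost, best_centroids = cost, centroids
    return best_centroids

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
cents = kmeans(data, k=2)
```

Note the centroids are means of cluster members, so (as the slide says) they need not be observed points.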
22
May not be a small enough set for KNN
What if we could further reduce the search within a cluster?
Suppose a torus is defined in terms of distances to the cluster centroid – we have already pre-computed them in pre-processing, and we have computed the novelty point’s distance when classifying as well.
Only do KNN with points between the two radii (d + λ and d – λ).
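The torus (in 2-D, an annulus) filter can be sketched as follows: a stored point survives only if its precomputed distance to the centroid lies within λ of the query's distance d; KNN then runs over the survivors. The points and λ here are hypothetical.

```python
import math

# "Torus" pruning: keep only stored points whose distance to the cluster
# centroid is within lam of the query's centroid distance d, then run
# brute-force KNN on the (much smaller) surviving set.
def annulus_knn(points, centroid, query, k, lam):
    d = math.dist(query, centroid)
    candidates = [p for p in points
                  if d - lam <= math.dist(p, centroid) <= d + lam]
    return sorted(candidates, key=lambda p: math.dist(p, query))[:k]

centroid = (0.0, 0.0)
pts = [(1.0, 0.0), (0.0, 1.1), (3.0, 0.0), (0.9, 0.1)]
hits = annulus_knn(pts, centroid, query=(1.0, 0.1), k=3, lam=0.5)
```

In a real deployment the centroid distances of archived points would be precomputed once, and d is already known from classifying the novelty, so the filter costs only comparisons.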
23
Back to transportation
• Input vector for a particular time t, in a segment s that contains n sensor stations:

v(t, s) = (speed(1), volume(1), occupancy(1), …, speed(n), volume(n), occupancy(n), traveltime)
• What are we clustering?
– The input vectors, looking only at the fundamental measurements from the n stations.
– Travel time is our measure of interest, i.e., the target for prediction. Clustering is “blind” to it.
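The split described above – fundamental measurements in, travel time held out as the target – can be sketched directly. The record layout (speed, volume, occupancy per station, travel time last) follows the input vector on the slide; the numbers are invented.

```python
# Each archived record is (speed, volume, occupancy) per station, followed by
# the a posteriori travel time.  Clustering sees only the first 3n values;
# travel time is kept aside as the prediction target.
def split_record(record, n_stations):
    features = record[: 3 * n_stations]   # what k-means clusters on
    travel_time = record[3 * n_stations]  # target; clustering is "blind" to it
    return features, travel_time

record = (55.0, 120, 0.08,   # station 1: speed, volume, occupancy
          42.0, 150, 0.15,   # station 2
          60.0,  90, 0.05,   # station 3
          7.5)               # a posteriori travel time (minutes)
feats, tt = split_record(record, n_stations=3)
```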
24
Considerations
• I said that clustering is “blind” to the a posteriori estimate of travel time
• This much is true, but we want clusters that help us predict travel time.
• The core assumption is that the fundamental measurements at a time t are related to the travel time (they are, we just haven’t expressed that yet).
25
Considerations
• Look at the difference in travel times among members of a cluster. This helps us choose a suitable number of clusters.
• We could use variance, but (1) it grows quadratically, and (2) it is not intuitive.
• Proposed error function should decrease smoothly as the number of clusters increases.
• The error function is saying “+/- 3 is all the same to me.”

For a cluster k with N_k elements, where x_i is the travel time of member i and x̄_k is the cluster mean:

error_k = (1 / N_k) · Σ_{i=1}^{N_k} | x_i − x̄_k |

Average over clusters for each choice of K:

Error(K) = (1 / K) · Σ_{k=1}^{K} error_k
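Reading the error function as the mean absolute deviation of travel time within each cluster, averaged over the K clusters, it is a few lines of code. The two clusters of travel times below are hypothetical.

```python
# Error(K): for each cluster, the mean absolute deviation of its members'
# travel times from the cluster mean; then the average over the K clusters.
def cluster_error(travel_times):
    mean = sum(travel_times) / len(travel_times)
    return sum(abs(x - mean) for x in travel_times) / len(travel_times)

def total_error(clusters):
    return sum(cluster_error(c) for c in clusters) / len(clusters)

# Two hypothetical clusters of travel times (minutes).
err = total_error([[4.0, 6.0], [10.0, 10.0, 13.0]])
```

Unlike variance, the deviation is in the same units as travel time, so “+/- 3 minutes” reads directly off the value.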
26
Prototype implementation
• Looked at a US 26 E sub-segment.
• Morning period that includes the peak: 06:00 – 11:00.
• Treated one day as historical, tested on another day (Oct. 9, 2006 and Oct. 12, 2006).
• Careful: if we just shuffle points to estimate performance, we are fooling ourselves – the fitting process may have seen past and future. For this domain, if we are to simulate data loss, always leave the test day(s) out.
27
US 26 E
Image from http://maps.google.com/
Looked at three stations: Cornell, Murray, Cedar Hills.
Hypothetical VMS between 185th and Cornell, with target destination between Cedar Hills and Parkway.
Segment length: 3.7 miles.
Believe it or not, it can take up to 30 minutes during rush hour (been there, done that).
28
Choice of k
• As expected, error function drops
[Plot “Choice of k”: error (on the order of 10^4) vs. number of clusters, 0–16; the error drops as k increases]
29
Choice of k
• As expected, error function drops
[Plot “Choice of k”, zoomed: error (100–600) vs. number of clusters (2–10); a suitable k is marked]
30
Simplifying criteria
• Radii determining the torus centered around the centroid:

R1 = d/2
R2 = 3d/2

where d is the distance from the novelty to the centroid.
• “Fusion mechanism” is the average of 3-nearest neighbors within the torus.
31
Experimental results
The trends during the peak period are followed correctly.
Unsurprisingly, the early peak is only somewhat captured – the fitting set did not have one. Still, ups and downs are discovered.
[Plot “Measured vs. Estimated travel times”: travel time in minutes (2–18) vs. time of day (06:00–11:00); legend: a posteriori estimate, on-line cluster-based estimate. ERRATA: legend labels reversed.]
32
Fitting set time series
[Plot: a posteriori travel time estimates for 10/9/2007; travel time in minutes (0–30) vs. time of day (6:00–11:00)]
33
Ongoing work (suggested by Kristin)
• Looking at probe runs and comparing the measured times with a posteriori estimates – previous efforts were made with instantaneous estimates only. Curious as to whether the a posteriori estimate is significantly different from the instantaneous one.
• Pick a larger dataset – OR 217 has better sensor density. Test over one month or so.
34
Future work
• Current prototype is in MATLAB – should I start getting familiar with Niagara?
• Any better ideas for the “fusion”? Should this just be an extra parameter? (“pick k nearest neighbors”)
• The radius estimate can be a potential problem. Any suggestions? Should this just be one more parameter to find during fitting?
35
Future work
• Is a torus the right shape? How about a hypercone? Could it be easily derived on the fly from pre-computed information?
• We have considered expanding the feature vector (temperature, precipitation, etc.). These measurements are updated hourly, and are sometimes available only the next day. Any other sources that may make sense?