2006-09-15
VLDB 2006
Haibo Hu (Hong Kong Baptist University, Hong Kong)
Dik Lun Lee (Hong Kong University of Science and Technology, Hong Kong)
Victor Lee (City University of Hong Kong, Hong Kong)
Distance Indexing on Road Networks
Modeling Road Networks
- Network -> Undirected weighted graph
- Road junction -> Vertex (node)
- Road segment -> Edge
- Distance -> Edge weight
- Data object and query point -> On node only
[Figure: a highway with gas stations (objects) and a query point, marking the actual nearest object and the nearest object in Euclidean space.]
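The modeling choices above can be sketched as a small adjacency map. This is only an illustration: `build_network`, the node names, and the segment lengths are hypothetical and are not taken from the talk's dataset.

```python
from collections import defaultdict

def build_network(segments):
    """Build an undirected weighted graph as {node: {neighbor: weight}}."""
    graph = defaultdict(dict)
    for u, v, length in segments:
        # A road segment is an undirected edge, so store it in both directions.
        graph[u][v] = length
        graph[v][u] = length
    return dict(graph)

# Hypothetical road segments: (junction, junction, distance).
segments = [("n1", "n2", 6), ("n2", "n3", 4), ("n2", "n5", 5)]
network = build_network(segments)

# Data objects and query points sit on nodes only.
objects = {"n3", "n5"}
query_point = "n1"
```

Keeping objects as a set of node ids reflects the "on node only" assumption; objects in the middle of a segment would need an extra node.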
Query Processing on Road Networks
Queries:
- Window query
- kNN, continuous kNN
Processing methods:
- Network Expansion [Papadias VLDB03]
  - Use Euclidean distance for preliminary pruning
  - Index the objects with a spatial index
- Precomputed Index [Kolahdouzan VLDB04]
  - Voronoi Network Nearest Neighbor (VN3)
  - NN list: precompute and store the kNNs for some large-degree nodes
Problems and Disadvantages
Distance computation is still expensive
- By Dijkstra's single-source shortest path algorithm:
  - Maintain the nodes whose distances are not finalized
  - Pick the node with the shortest distance and finalize it
  - Relax all not-yet-finalized distances
  - Repeat until all distances are finalized
- Limitations:
  - Must visit nodes in ascending order of distance
  - Running time O(N lg V)
Precomputed indexes cannot suit all queries
- Return the k nearest neighbors
- Return the actual shortest path
Precomputed indexes are costly to store and update
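The Dijkstra steps listed above can be sketched with a binary heap. This is a minimal textbook version, not code from the talk; the graph format ({node: {neighbor: weight}}) and the toy graph are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Return {node: shortest distance from source} for a weighted graph."""
    dist = {source: 0}
    finalized = set()
    heap = [(0, source)]  # tentative (distance, node) pairs
    while heap:
        d, u = heapq.heappop(heap)  # node with the shortest tentative distance
        if u in finalized:
            continue  # stale heap entry
        finalized.add(u)
        # Relax the distances of neighbors that are not yet finalized.
        for v, w in graph[u].items():
            if v not in finalized and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical toy graph.
toy = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 2}, "c": {"a": 4, "b": 2}}
distances = dijkstra(toy, "a")
```

Nodes are finalized in ascending order of distance, which is the limitation noted above: a query cannot jump to a faraway object without first settling everything closer.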
Our Solution at a Glance
Distance signature: the first general-purpose index on road networks that
- Categorizes the distances of a node to all objects
- Supports both rough and exact distance computation
- Accelerates processing of common query types
- Reduces the storage and maintenance cost
- Is orthogonal to other query optimization techniques
Roadmap
- Background
- Distance Signature Overview
- Operations on Signatures
- Query Processing on Signatures
- Smart Choice of Distance Categories
- Construction and Maintenance
- Experimental Results
- Conclusion
Distance Signature
Basic idea:
- Precomputing distances is a good trade-off between having no indexing and solution space indexing
- Maintain the approximate distance between objects and nodes
- How rough is the approximation?
  - Apply rough approximation to faraway objects
  - Queries are always interested in local objects
  - There are more faraway objects than local objects
We use an exponential sequence of categories
- In the form of $[0, T), [T, cT), [cT, c^2T), [c^2T, c^3T), \ldots$
- T and c are constant parameters
- E.g., T = 3, c = 2 gives [0, 3), [3, 6), [6, 12), [12, 24), ...
[Figure: number line with boundaries 3, 6, 12, 24 separating Cat 0, Cat 1, Cat 2, Cat 3.]
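The exponential categories can be sketched as two small helpers. The function names `category` and `category_range` are hypothetical; the parameters T = 3 and c = 2 are the example values from the slide.

```python
def category(dist, T, c):
    """Index of the exponential category [0,T), [T,cT), [cT,c^2T), ... holding dist."""
    cat, upper = 0, T
    while dist >= upper:
        upper *= c  # move to the next, c-times-wider category
        cat += 1
    return cat

def category_range(cat, T, c):
    """Half-open distance range [low, high) covered by a category index."""
    if cat == 0:
        return (0, T)
    return (T * c ** (cat - 1), T * c ** cat)

# Slide example: T = 3, c = 2.
examples = [category(d, 3, 2) for d in (2, 3, 6, 12, 23.9)]
```

A loop is used instead of a logarithm so that distances exactly on a boundary are not misclassified by floating-point rounding.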
Distance Signature (Cont'd)
For each node n, signature component S(n)[i] denotes the category of dist(n, i)
S(n)[i].link denotes the next node from n on the shortest path to i
Signature S(n) is the whole set of components S(n)[i]
[Figure: example road network with nodes n1 to n7 (nodes and objects), its adjacency list, and the signatures s(n2) and s(n4) showing each distance category and link.]
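One possible in-memory layout for a signature is sketched below. The `Component` class and the example values are hypothetical; only the two fields (category and link) come from the definition above, and the categories assume T = 3, c = 2 with dist(n2, n3) = 4 and dist(n2, n6) = 9.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Component:
    category: int         # index of the distance category containing dist(n, i)
    link: Optional[str]   # next node from n on the shortest path to object i

# A signature maps every object id to its component.
Signature = Dict[str, Component]

# Hypothetical signature of node n2 for objects n3 and n6.
s_n2: Signature = {
    "n3": Component(category=1, link="n3"),  # 4 falls in [3, 6)
    "n6": Component(category=2, link="n3"),  # 9 falls in [6, 12), reached via n3
}
```

Because each component stores only a small category index and one neighbor reference, its size does not depend on the actual distance value.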
Distance Operations on Signatures
Principle: trace back the link until the distance range is accurate enough
Retrieval (distance between node and object)
- Exact: trace back through the link from node to object
- Approximate: terminate once the distance range does not partially overlap with the input
Comparison (distances from node n to objects a and b)
- Trace back until the two distance ranges don't overlap
Sorting
- Exact: first apply approximate sorting, then apply bubble sort using exact comparison
- Approximate: quick sort using approximate comparison
[Figure: nodes n2, n3, and n6, with p1 and p2 marking the possible positions of n4.]
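The two retrieval operations can be sketched on a hand-built path a - b - c - o with a single object at o. Everything here is an assumption made for illustration: the names `EDGES`, `SIGNATURES`, `exact_distance`, and `approximate_distance`, the edge weights, and the reading of "input" as a half-open range [lo, hi).

```python
T, C = 3, 2  # category parameters from the earlier slide example

def category_range(cat):
    """Half-open distance range [low, high) of a category."""
    return (0, T) if cat == 0 else (T * C ** (cat - 1), T * C ** cat)

# Hypothetical path a - b - c - o with edge weights 5, 4, 2.
EDGES = {("a", "b"): 5, ("b", "c"): 4, ("c", "o"): 2}

def weight(u, v):
    return EDGES[(u, v)] if (u, v) in EDGES else EDGES[(v, u)]

# SIGNATURES[node][object] = (category, link); true distances are 11, 6, 2.
SIGNATURES = {
    "a": {"o": (2, "b")},
    "b": {"o": (2, "c")},
    "c": {"o": (0, "o")},
}

def exact_distance(node, obj):
    """Follow the links from node to the object, summing edge weights."""
    total = 0
    while node != obj:
        nxt = SIGNATURES[node][obj][1]
        total += weight(node, nxt)
        node = nxt
    return total

def approximate_distance(node, obj, lo, hi):
    """Refine the distance range until it no longer partially overlaps [lo, hi)."""
    travelled = 0
    while node != obj:
        cat, nxt = SIGNATURES[node][obj]
        r_lo, r_hi = category_range(cat)
        cur_lo, cur_hi = travelled + r_lo, travelled + r_hi
        inside = lo <= cur_lo and cur_hi <= hi
        outside = cur_hi <= lo or cur_lo >= hi
        if inside or outside:
            return (cur_lo, cur_hi)  # accurate enough for this input range
        travelled += weight(node, nxt)
        node = nxt
    return (travelled, travelled)  # reached the object: the distance is exact
```

For the input range [0, 12) the category of a alone settles the answer, while for [0, 10) one link must be followed before the refined range falls clearly outside.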
Approximate Distance Comparison
What and why?
- Compare the distances of two objects based on one signature
- Avoid accessing the signatures of other nodes
- Used to get a rough result of distance sorting
How?
- Example: compare dist(n4, n2) with dist(n4, n6)
  - Select an observer n3
  - Embed objects n2, n3, n6 into Euclidean space
  - n3 tells if n2 or n6 is closer to n4
- If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]?
- Let multiple observers vote
[Figure: observer n3 with objects n2 and n6; p1 and p2 mark possible positions for n4.]
kNN Search on Signatures
Procedure
- Read signature s(q) of query node q
- Categories tell the approximate distances between q and other objects
- Get the k closest objects according to their category values
- If there is no need to know the distances or order, return objects based on category ranges
- To find the ordering: sort objects within each category
- To find exact distances: perform exact distance retrieval for each of the k nearest neighbors
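The procedure above can be sketched as a bucket scan over category values. This is a simplified reading, not the authors' algorithm: the function `knn`, the example signature, and the use of an exact-distance callback for sorting inside a category are assumptions.

```python
def knn(signature, k, exact_distance, ordered=False):
    """Pick k nearest objects from {object: category}, refining only when needed."""
    buckets = {}
    for obj, cat in signature.items():
        buckets.setdefault(cat, []).append(obj)
    result = []
    for cat in sorted(buckets):  # nearest categories first
        remaining = k - len(result)
        if remaining <= 0:
            break
        bucket = buckets[cat]
        # Sort inside a category only if order matters or the bucket must be cut.
        if ordered or len(bucket) > remaining:
            bucket = sorted(bucket, key=exact_distance)
        result.extend(bucket[:remaining])
    return result

# Hypothetical signature s(q) with T = 3, c = 2, and the matching true distances.
s_q = {"o1": 0, "o2": 2, "o3": 1, "o4": 2, "o5": 3}
true_dist = {"o1": 2, "o2": 11, "o3": 4, "o4": 7, "o5": 20}

two_nearest = knn(s_q, 2, true_dist.get)
three_nearest = knn(s_q, 3, true_dist.get)
```

With k = 2 the categories alone decide the answer; with k = 3 only the two objects sharing category 2 need their distances compared.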
Smart Choice of Distance Categories
Exponential categories $[0, T), [T, cT), [cT, c^2T), \ldots$
How to determine c and T?
Factors:
- Dataset density, distribution
- Query type, load (metric: spreading)
- Storage availability
Simplifications
- The road network is a uniform grid
- Spreading is uniformly distributed in [0, SP]
- Unlimited disk storage
Theorem
- The optimal $c = e$, $T = (SP/e)^{0.5}$
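As a quick numeric check of the theorem, the sketch below evaluates c and T for a given maximum spreading SP and lists the first category boundaries. The function names and the value SP = 100 are hypothetical.

```python
import math

def optimal_parameters(sp):
    """Return (c, T) from the theorem: c = e, T = sqrt(SP / e)."""
    return math.e, math.sqrt(sp / math.e)

def boundaries(T, c, count):
    """First `count` category boundaries T, cT, c^2 T, ..."""
    return [T * c ** i for i in range(count)]

c_opt, t_opt = optimal_parameters(100.0)  # hypothetical SP = 100
first_boundaries = boundaries(t_opt, c_opt, 3)
```

For SP = 100 this gives T of roughly 6.07, so the first three boundaries are roughly 6.07, 16.5, and 44.8.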
Signature Construction
Basic procedure
- Allocate storage for signatures
- Build the shortest path spanning tree for each object (Dijkstra)
- Fill in s(n)[i] when the tree of object i is spanned to node n
Variable length encoding
- Observation: the number of objects in each category is not even
  - # of objects 1 unit, 2 units, 3 units, ... away: 4, 8, 12, ...
- Use fewer bits for larger categories
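The basic construction procedure can be sketched by running Dijkstra from each object and recording, for every node reached, the distance category and the parent in the shortest path tree (which is the link toward the object in an undirected graph). The function name, the graph format, and the toy network are assumptions; storage allocation and encoding are left out.

```python
import heapq

def build_signatures(graph, objects, T, c):
    """Return {node: {object: (category, link)}} for an undirected weighted graph."""
    def category(d):
        cat, upper = 0, T
        while d >= upper:
            upper *= c
            cat += 1
        return cat

    signatures = {n: {} for n in graph}
    for obj in objects:
        dist, parent = {obj: 0}, {obj: None}
        heap, done = [(0, obj)], set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            # The tree of obj has just been spanned to u: fill in s(u)[obj].
            signatures[u][obj] = (category(d), parent[u])
            for v, w in graph[u].items():
                if v not in done and d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    parent[v] = u  # next node from v on the shortest path to obj
                    heapq.heappush(heap, (d + w, v))
    return signatures

# Hypothetical network: path a - b - c - o plus a long direct edge a - o.
toy = {
    "a": {"b": 5, "o": 20},
    "b": {"a": 5, "c": 4},
    "c": {"b": 4, "o": 2},
    "o": {"c": 2, "a": 20},
}
sigs = build_signatures(toy, ["o"], T=3, c=2)
```

One Dijkstra run per object is the dominant cost in this sketch.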
Variable Length Encoding
Reverse zero coding
- Based on Huffman encoding scheme
- Under the assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal
Categories: $[0, T), [T, cT), [cT, c^2T), [c^2T, c^3T), [c^3T, \infty)$
Reverse coding codewords: 1, 01, 001, 0001, 0000
Fixed coding codewords: 000, 001, 010, 011, 100
Average code length is approximately $c^2 / (c^2 - 1) \approx 1.2$
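A possible encoder and decoder for the five codewords is sketched below. The assignment of codewords to categories is an assumption: the farthest category gets the 1-bit code and the nearest gets 0000, which matches "fewer bits for larger categories" but is not spelled out on the slide. The function names are hypothetical.

```python
import math

def encode(cat, m):
    """Codeword for category index cat out of m categories (assumed assignment)."""
    zeros = m - 1 - cat
    # Category 0 is all zeros; every other codeword ends with a single 1.
    return "0" * zeros + ("1" if cat > 0 else "")

def decode(bits, m):
    """Decode a concatenation of codewords back into category indexes."""
    cats, zeros = [], 0
    for b in bits:
        if b == "1":
            cats.append(m - 1 - zeros)
            zeros = 0
        else:
            zeros += 1
            if zeros == m - 1:  # m - 1 zeros in a row: the nearest category
                cats.append(0)
                zeros = 0
    return cats

def average_code_length(c):
    """Approximate average code length from the slide: c^2 / (c^2 - 1)."""
    return c ** 2 / (c ** 2 - 1)

codewords = [encode(cat, 5) for cat in range(5)]
stream = "".join(encode(cat, 5) for cat in [4, 2, 0, 1, 4, 3])
avg_at_e = average_code_length(math.e)
```

The code is prefix-free, so codewords can be concatenated without separators. At c = e the formula evaluates to about 1.16, in line with the rounded 1.2 on the slide.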
Signature Compression
Idea: many objects share the same link
[Figure: node n with objects u and v.]
If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag
Not compressed in memory
Signature Update
Requirement
- The shortest path spanning trees of all objects
- A reverse index, for each edge, of the trees that contain this edge
  - Limits the number of trees affected by a change to this edge
How (suppose edge (a, b) is updated):
- Find the affected spanning trees
- For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller)
- Propagate to adjacent nodes until no more updates
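The reverse index mentioned in the requirement can be sketched as a map from each edge to the objects whose shortest path tree uses it. The tree format ({object: {node: parent}}) and all names are assumptions; the propagation step itself is not shown.

```python
def build_reverse_index(trees):
    """Map each undirected edge to the set of objects whose tree contains it."""
    index = {}
    for obj, parent in trees.items():
        for node, par in parent.items():
            if par is not None:  # the root (the object itself) has no parent edge
                index.setdefault(frozenset((node, par)), set()).add(obj)
    return index

def affected_trees(index, a, b):
    """Objects whose spanning tree must be checked when edge (a, b) changes."""
    return index.get(frozenset((a, b)), set())

# Hypothetical shortest path trees of two objects, stored as parent maps.
trees = {
    "o1": {"o1": None, "a": "o1", "b": "a"},
    "o2": {"o2": None, "b": "o2", "a": "b"},
}
reverse_index = build_reverse_index(trees)
```

Looking up an edge returns only the trees that route through it. A weight decrease may also pull an edge into trees that did not use it before, which this lookup does not cover.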
Experiment Settings
Statistics
- 183K nodes
- 351K edges
- Random edge weights from 1 to 10
- Page size: 4K bytes
kNN competitors
- Signature indexing
- Full indexing (NN list for all nodes)
- Network Voronoi Diagram (NVD) from VN3
Tuning parameters
- p: object density
- T, c, k
Comparison metrics: page access (I/O cost), CPU time
Index Construction Cost
Good for medium and sparse datasets
kNN Search Performance
Moderate performance over various k
Robustness
The choice of parameters does not make a large difference
Conclusion
Our contributions
- The first index for distance computation on road networks
- Speeds up general query processing
- Optimal choice of distance categories and category encoding
Future work
- Cross-node signature compression
  - The signatures of nearby nodes are similar
- Derivation of optimal distance categories for a wider range of network topologies and object distributions