IntroductionKd-trees
Range searching and kd-trees
Computational Geometry
Lecture 7: Range searching and kd-trees
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Databases
Databases store records or objects
Personnel database: Each employee has a name, id code, dateof birth, function, salary, start date of employment, . . .
Fields are textual or numerical
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Database queries
A database query may ask forall employees with agebetween a1 and a2, and salarybetween s1 and s2
date of birth
salary
19,500,000 19,559,999
G. Ometerborn: Aug 16, 1954salary: $3,500
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Database queries
When we see numerical fields of objects as coordinates, adatabase stores a point set in higher dimensions
Exact match query: Asks for the objects whose coordinatesmatch query coordinates exactly
Partial match query: Same but not all coordinates arespecified
Range query: Asks for the objects whose coordinates lie in aspecified query range (interval)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Database queries
Example of a 3-dimensional(orthogonal) range query:children in [2 , 4], salary in[3000 , 4000], date of birth in[19,500,000 , 19,559,999]
19,500,000 19,559,999
3,000
4,000
2
4
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Data structures
Idea of data structures
Representation of structure, for convenience (like DCEL)
Preprocessing of data, to be able to solve futurequestions really fast (sub-linear time)
A (search) data structure has a storage requirement, a querytime, and a construction time (and an update time)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
1D range query problem
1D range query problem: Preprocess a set of n points onthe real line such that the ones inside a 1D query range(interval) can be reported fast
The points p1, . . . ,pn are known beforehand, the query [x,x′]only later
A solution to a query problem is a data structure description,a query algorithm, and a construction algorithm
Question: What are the most important factors for theefficiency of a solution?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Balanced binary search trees
A balanced binary search tree with the points in the leaves
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Balanced binary search trees
The search path for 25
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Balanced binary search trees
The search paths for 25 and for 90
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Example 1D range query
A 1-dimensional range query with [25, 90]
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Node types for a query
Three types of nodes for a given query:
White nodes: never visited by the query
Grey nodes: visited by the query, unclear if they lead tooutput
Black nodes: visited by the query, whole subtree isoutput
Question: What query time do we hope for?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Node types for a query
The query algorithm comes down to what we do at each typeof node
Grey nodes: use query range to decide how to proceed: tonot visit a subtree (pruning), to report a complete subtree, orjust continue
Black nodes: traverse and enumerate all points in the leaves
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Example 1D range query
A 1-dimensional range query with [61, 90]
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
split node
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
1D range query algorithm
Algorithm 1DRangeQuery(T, [x : x′])1. νsplit ←FindSplitNode(T,x,x′)2. if νsplit is a leaf3. then Check if the point in νsplit must be reported.4. else ν ← lc(νsplit)5. while ν is not a leaf6. do if x≤ xν
7. then ReportSubtree(rc(ν))8. ν ← lc(ν)9. else ν ← rc(ν)10. Check if the point stored in ν must be reported.11. ν ← rc(νsplit)12. Similarly, follow the path to x′, and . . .
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Query time analysis
The efficiency analysis is based on counting the numbers ofnodes visited for each type
White nodes: never visited by the query; no time spent
Grey nodes: visited by the query, unclear if they lead tooutput; time determines dependency on n
Black nodes: visited by the query, whole subtree isoutput; time determines dependency on k, the output size
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Query time analysis
Grey nodes: they occur on only two paths in the tree, andsince the tree is balanced, its depth is O(logn)
Black nodes: a (sub)tree with m leaves has m−1 internalnodes; traversal visits O(m) nodes and finds m points for theoutput
The time spent at each node is O(1) ⇒ O(logn+ k) querytime
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Storage requirement and preprocessing
A (balanced) binary search tree storing n points uses O(n)storage
A balanced binary search tree storing n points can be built inO(n) time after sorting, so in O(n logn) time overall(or by repeated insertion in O(n logn) time)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Result
Theorem: A set of n points on the real line can bepreprocessed in O(n logn) time into a data structure of O(n)size so that any 1D range query can be answered inO(logn+ k) time, where k is the number of answers reported
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Example 1D range counting query
A 1-dimensional range tree for range counting queries
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
1 1 1 1 1 1 1 1 1 1 1 1
112 22222
3 34 4
7 7
14
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Example 1D range counting query
A 1-dimensional range counting query with [25, 90]
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
1 1 1 1 1 1 1 1 1 1 1 1
112 22222
3 34 4
7 7
14
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Database queries1D range trees
Result
Theorem: A set of n points on the real line can bepreprocessed in O(n logn) time into a data structure of O(n)size so that any 1D range counting query can be answered inO(logn) time
Note: The number of points does not influence the outputsize so it should not show up in the query time
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Range queries in 2D
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Range queries in 2D
Question: Why can’t we simply use a balanced binary tree inx-coordinate?
Or, use one tree on x-coordinate and one on y-coordinate, andquery the one where we think querying is more efficient?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-trees
Kd-trees, the idea: Split the point set alternatingly byx-coordinate and by y-coordinate
split by x-coordinate: split by a vertical line that has half thepoints left and half right
split by y-coordinate: split by a horizontal line that has halfthe points below and half above
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-trees
Kd-trees, the idea: Split the point set alternatingly byx-coordinate and by y-coordinate
split by x-coordinate: split by a vertical line that has half thepoints left or on, and half right
split by y-coordinate: split by a horizontal line that has halfthe points below or on, and half above
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-trees
p4
p1
p5
p3
p2
p7
p9
p10
p6
p8
ℓ1
ℓ2
ℓ3
ℓ4
ℓ5
ℓ6
ℓ7
ℓ8
ℓ9
p1 p2
ℓ8 p3 p4
ℓ4 ℓ5
p5
p6 p7
p8 p9 p10
ℓ7
ℓ3
ℓ1
ℓ6
ℓ9
ℓ2
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree construction
Algorithm BuildKdTree(P,depth)
1. if P contains only one point2. then return a leaf storing this point3. else if depth is even4. then Split P with a vertical line ℓ through the
median x-coordinate into P1 (left of oron ℓ) and P2 (right of ℓ)
5. else Split P with a horizontal line ℓ throughthe median y-coordinate into P1 (belowor on ℓ) and P2 (above ℓ)
6. νleft ← BuildKdTree(P1,depth+1)
7. νright ← BuildKdTree(P2,depth+1)
8. Create a node ν storing ℓ, make νleft the leftchild of ν , and make νright the right child of ν .
9. return ν
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree construction
The median of a set of n values can be computed in O(n)time (randomized: easy; worst case: much harder)
Let T(n) be the time needed to build a kd-tree on n points
T(1) = O(1)
T(n) = 2 ·T(n/2)+O(n)
A kd-tree can be built in O(n logn) time
Question: What is the storage requirement?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree regions of nodes
ℓ1
ℓ2
ℓ3region(ν)
ν
ℓ1
ℓ2
ℓ3
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree regions of nodes
How do we know region(ν) when we are at a node ν?
Option 1: store it explicitly with every node
Option 2: compute it on-the-fly, when going fromthe root to ν
Question: What are reasons to choose one or the otheroption?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree querying
p1 p2
p2
p1
p3
p3 p4
p4
p5
p5
p6p6
p7
p7 p8
p8
p9
p9
p10
p10
p11
p11
p12
p12 p13
p13
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree querying
Algorithm SearchKdTree(ν ,R)
Input. The root of (a subtree of) a kd-tree, and a range R
Output. All points at leaves below ν that lie in the range.1. if ν is a leaf2. then Report the point stored at ν if it lies in R
3. else if region(lc(ν)) is fully contained in R
4. then ReportSubtree(lc(ν))5. else if region(lc(ν)) intersects R
6. then SearchKdTree(lc(ν),R)
7. if region(rc(ν)) is fully contained in R
8. then ReportSubtree(rc(ν))9. else if region(rc(ν)) intersects R
10. then SearchKdTree(rc(ν),R)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree querying
Question: How about a range counting query?How should the code be adapted?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
To analyze the query time of kd-trees, we use the concept ofwhite, grey, and black nodes
White nodes: never visited by the query; no time spent
Grey nodes: visited by the query, unclear if they lead tooutput; time determines dependency on n
Black nodes: visited by the query, whole subtree isoutput; time determines dependency on k, the output size
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
p1 p2
p2
p1
p3
p3 p4
p4
p5
p5
p6p6
p7
p7 p8
p8
p9
p9
p10
p10
p11
p11
p12
p12 p13
p13
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
White, grey, and black nodes with respect to region(ν):
White node ν: R does not intersect region(ν)
Grey node ν: R intersects region(ν), but region(ν) 6⊆ R
Black node ν: region(ν)⊆ R
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Question: How many grey and how many black leaves?Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Question: How many grey and how many black nodes?Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Grey node ν : R intersects region(ν), but region(ν) 6⊆ R
It implies that the boundaries of R and region(ν) intersect
Advice: If you don’t know what to do, simplify until you do
Instead of taking the boundary of R, let’s analyze the numberof grey nodes if the query is with a vertical line ℓ
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Question: How many grey and how many black leaves?Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
We observe: At every vertical split, ℓ is only to one side, whileat every horizontal split ℓ is to both sides
Let G(n) be the number of grey nodes in a kd-tree with n
points (leaves). Then G(1) = 1 and:
If a subtree has n leaves: G(n) = 1+G(n/2) at even depth
If a subtree has n leaves: G(n) = 1+2 ·G(n/2) at odd depth
If we use two levels at once, we get:
G(n) = 2+2 ·G(n/4) or G(n) = 3+2 ·G(n/4)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
x
y y
y
x x
n leaves n leaves
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
G(1) = 1
G(n) = 2 ·G(n/4)+O(1)
Question: What does this recurrence solve to?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
The grey subtree has unary and binary nodes
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
The depth is logn, so the binary depth is 12· logn
Important: The logarithm is base-2
Counting only binary nodes, there are
212·logn = 2logn1/2
= n1/2 =√
n
Every unary grey node has a unique binary parent (except theroot), so there are at most twice as many unary nodes asbinary nodes, plus 1
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
The number of grey nodes if the query were a vertical lineis O(
√n)
The same is true if the query were a horizontal line
How about a query rectangle?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
The number of grey nodes for a query rectangle is at mostthe number of grey nodes for two vertical and two horizontallines, so it is at most 4 ·O(
√n) = O(
√n) !
For black nodes, reporting a whole subtree with k leaves,takes O(k) time (there are k−1 internal black nodes)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Result
Theorem: A set of n points in the plane can be preprocessedin O(n logn) time into a data structure of O(n) size so thatany 2D range query can be answered in O(
√n+ k) time,
where k is the number of answers reported
For range counting queries, we need O(√
n) time
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Efficiency
n logn√
n
4 2 216 4 464 6 8
256 8 161024 10 324096 12 64
1.000.000 20 1000
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Higher dimensions
A 3-dimensional kd-tree alternates splits on x-, y-, andz-coordinate
A 3D range query is performed with a box
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Higher dimensions
The construction of a 3D kd-tree is a trivial adaptation of the2D version
The 3D range query algorithm is exactly the same as the 2Dversion
The 3D kd-tree still requires O(n) storage if it stores n points
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Higher dimensions
How does the query time analysis change?
Intersection of B and region(ν) depends on intersection offacets of B ⇒ analyze by axes-parallel planes (B has no moregrey nodes than six planes)
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Higher dimensions
m leaves
x
y
z
y
z zz
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Kd-tree query time analysis
Let G3(n) be the number of grey nodes for a query with anaxes-parallel plane in a 3D kd-tree
G3(1) = 1
G3(n) = 4 ·G3(n/8)+O(1)
Question: What does this recurrence solve to?
Question: How many leaves does a perfectly balanced binarysearch tree with depth 2
3logn have?
Computational Geometry Lecture 7: Range searching and kd-trees
IntroductionKd-trees
Kd-treesQuerying in kd-treesKd-tree query time analysisHigher-dimensional kd-trees
Result
Theorem: A set of n points in d-space can be preprocessed inO(n logn) time into a data structure of O(n) size so that anyd-dimensional range query can be answered in O(n1−1/d + k)time, where k is the number of answers reported
Computational Geometry Lecture 7: Range searching and kd-trees
Introduction2D Range trees
Degenerate cases
Range trees
Computational Geometry
Lecture 8: Range trees
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Database queries
A database query may ask forall employees with agebetween a1 and a2, and salarybetween s1 and s2
date of birth
salary
19,500,000 19,559,999
G. Ometerborn: Aug 16, 1954salary: $3,500
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Result
Theorem: A set of n points on the real line can bepreprocessed in O(n logn) time into a data structure of O(n)size so that any 1D range [counting] query can be answered inO(logn [+k]) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Result
Theorem: A set of n points in the plane can be preprocessedin O(n logn) time into a data structure of O(n) size so thatany 2D range query can be answered in O(
√n+ k) time,
where k is the number of answers reported
For range counting queries, we need O(√
n) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
Can we achieve O(logn [+k]) query time?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
Can we achieve O(logn [+k]) query time?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
If the corners of the query rectangle fall in specific cells of thegrid, the answer is fixed (even for lower left and upper rightcorner)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
Build a tree so that the leaves correspond to the differentpossible query rectangle types (corners in same cells of grid),and with each leaf, store all answers (points) [or: the count]
Build a tree on the different x-coordinates (to search with leftside of R), in each of the leaves, build a tree on the differentx-coordinates (to search with the right side of R), in each ofthe leaves, . . .
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
n
n
n
n
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
Question: What are the storage requirements of thisstructure, and what is the query time?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Faster queries
Recall the 1D range tree and range query:
Two search paths (grey nodes)
Subtrees in between have answers exclusively (black)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Example 1D range query
A 1-dimensional range query with [25, 90]
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Example 1D range query
A 1-dimensional range query with [61, 90]
3 10 19 23 30 37 59 62 70 80
893 19
10
30 59 70
62
93
89
8023
49
93 97
37
49
split node
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Examining 1D range queries
Observation: Ignoring the search path leaves, all answers arejointly represented by the highest nodes strictly between thetwo search paths
Question: How many highest nodes between the searchpaths can there be?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Examining 1D range queries
For any 1D range query, we can identify O(logn) nodes thattogether represent all answers to a 1D range query
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Toward 2D range queries
For any 2d range query, we can identify O(logn) nodes thattogether represent all points that have a correct firstcoordinate
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Toward 2D range queries
(3, 8)(1, 5) (4, 2) (5, 9) (6, 7) (8, 1)(7, 3) (9, 4)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Toward 2D range queries
(3, 8)(1, 5) (4, 2) (5, 9) (6, 7) (8, 1)(7, 3) (9, 4)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Toward 2D range queries
(3, 8)(1, 5) (4, 2) (5, 9) (6, 7) (8, 1)(7, 3) (9, 4)
data structurefor searching ony-coordinate
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate casesRange queries
Toward 2D range queries
(3, 8)(1, 5) (4, 2) (5, 9) (6, 7) (8, 1)(7, 3) (9, 4)
(3, 8)
(1, 5)
(4, 2)
(5, 9)
(6, 7)
(8, 1)
(7, 3)
(9, 4)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range trees
Every internal node stores a whole tree in an associated
structure, on y-coordinate
Question: How much storage does this take?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Storage of 2D range trees
To analyze storage, two arguments can be used:
By level: On each level, any point is stored exactly once.So all associated trees on one level together have O(n)size
By point: For any point, it is stored in the associatedstructures of its search path. So it is stored in O(logn) ofthem
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Construction algorithm
Algorithm Build2DRangeTree(P)
1. Construct the associated structure: Build a binary searchtree Tassoc on the set Py of y-coordinates in P
2. if P contains only one point3. then Create a leaf ν storing this point, and make
Tassoc the associated structure of ν .4. else Split P into Pleft and Pright, the subsets ≤ and >
the median x-coordinate xmid
5. νleft ← Build2DRangeTree(Pleft)
6. νright ← Build2DRangeTree(Pright)
7. Create a node ν storing xmid, make νleft the leftchild of ν , make νright the right child of ν , andmake Tassoc the associated structure of ν
8. return ν
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Efficiency of construction
The construction algorithm takes O(n log2 n) time
T(1) = O(1)
T(n) = 2 ·T(n/2)+O(n logn)
which solves to O(n log2 n) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Efficiency of construction
Suppose we pre-sort P on y-coordinate, and whenever we splitP into Pleft and Pright, we keep the y-order in both subsets
For a sorted set, the associated structure can be built in lineartime
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Efficiency of construction
The adapted construction algorithm takes O(n logn) time
T(1) = O(1)
T(n) = 2 ·T(n/2)+O(n)
which solves to O(n logn) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range queries
How are queries performed and why are they correct?
Are we sure that each answer is found?
Are we sure that the same point is found only once?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range queries
ν
µ µ′ p
p
p
p
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Query algorithm
Algorithm 2DRangeQuery(T, [x : x′]× [y : y′])1. νsplit ←FindSplitNode(T,x,x′)2. if νsplit is a leaf3. then report the point stored at νsplit, if an answer4. else ν ← lc(νsplit)5. while ν is not a leaf6. do if x≤ xν
7. then 1DRangeQ(Tassoc(rc(ν)), [y : y′])8. ν ← lc(ν)9. else ν ← rc(ν)10. Check if the point stored at ν must be reported.11. Similarly, follow the path from rc(νsplit) to x′ . . .
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range query time
Question: How much time does a 2D range query take?
Subquestions: In how many associated structures do wesearch? How much time does each such search take?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range queries
ν
µ µ′
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range query efficiency
We search in O(logn) associated structures to perform a 1Drange query; at most two per level of the main tree
The query time is O(logn)×O(logm+ k′), or
∑ν
O(lognν + kν)
where ∑kν = k the number of points reported
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range query efficiency
Use the concept of grey and black nodes again:
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range query efficiency
The number of grey nodes is O(log2 n)
The number of black nodes is O(k) if k points are reported
The query time is O(log2 n+ k), where k is the size of theoutput
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Result
Theorem: A set of n points in the plane can be preprocessedin O(n logn) time into a data structure of O(n logn) size sothat any 2D range query can be answered in O(log2 n+ k)time, where k is the number of answers reported
Recall that a kd-tree has O(n) size and answers queries inO(√
n+ k) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Efficiency
n logn log2 n√
n
16 4 16 464 6 36 8
256 8 64 161024 10 100 324096 12 144 64
16384 14 196 12865536 16 256 256
1M 20 400 1K16M 24 576 4K
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
2D range query efficiency
Question: How about range counting queries?
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Higher dimensional range trees
A d-dimensional range tree hasa main tree which is aone-dimensional balancedbinary search tree on the firstcoordinate, where every nodehas a pointer to an associatedstructure that is a(d−1)-dimensional range treeon the other coordinates
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Storage
The size Sd(n) of a d-dimensional range tree satisfies:
S1(n) = O(n) for all n
Sd(1) = O(1) for all d
Sd(n)≤ 2 ·Sd(n/2)+Sd−1(n) for d ≥ 2
This solves to Sd(n) = O(n logd−1 n)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Query time
The number of grey nodes Gd(n) satisfies:
G1(n) = O(logn) for all n
Gd(1) = O(1) for all d
Gd(n)≤ 2 · logn+2 · logn ·Gd−1(n) for d ≥ 2
This solves to Gd(n) = O(logd n)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Result
Theorem: A set of n points in d-dimensional space can bepreprocessed in O(n logd−1 n) time into a data structure ofO(n logd−1 n) size so that any d-dimensional range query canbe answered in O(logd n+ k) time, where k is the number ofanswers reported
Recall that a kd-tree has O(n) size and answers queries inO(n1−1/d + k) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Comparison for d = 4
n logn log4 n n3/4
1024 10 10,000 18165,536 16 65,536 4096
1M 20 160,000 32,7681G 30 810,000 5,931,6411T 40 2,560,000 1G
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
We can improve the query time of a 2D range tree fromO(log2 n) to O(logn) by a technique called fractionalcascading
This automatically lowers the query time in d dimensions toO(logd−1 n) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
The idea illustrated best by a different query problem:
Suppose that we have a collection of sets S1, . . . ,Sm, where|S1|= n and where Si+1 ⊆ Si
We want a data structure that can report for a querynumber x, the smallest value ≥ x in all sets S1, . . . ,Sm
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
Suppose that we have a collection of sets S1, . . . ,Sm, where|S1|= n and where Si+1 ⊆ Si
We want a data structure that can report for a querynumber x, the smallest value ≥ x in all sets S1, . . . ,Sm
This query problem can be solved in O(logn+m) time insteadof O(m · logn) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
Can we do something similar for m 1-dimensional rangequeries on m sets S1, . . . ,Sm?
We hope to get a query time of O(logn+m+ k) with k thetotal number of points reported
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Improving the query time
1 2 3 5 8 13 21 34 55
1 3 5 8 13 21 34 55
1 3 13 34 55
3 34 55
21
[6,35]
S1
S2
S3
S4
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
Now we do “the same” on the associated structures of a2-dimensional range tree
Note that in every associated structure, we search with thesame values y and y′
Replace all associated structures except for the one ofthe root by a linked list
For every list element (and leaf of the associatedstructure of the root), store two pointers to theappropriate list elements in the lists of the left child andof the right child
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
(2, 19)
(5, 80)
(7, 10)
(8, 37)
(12, 3)
(15, 99)
(17, 62) (21, 49)
(33, 30)
(41, 95)
(52, 23)
(58, 59)
(67, 89)
(93, 70)
2
5 7 8 12 15
17
21 33 41 52
58
67
17
8
155
7 12 21 41
33
52
58 67
932
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
3 99
10 19 37 80
30 4962
3 10 19 23 30 37 49 59 62 70 80 89 95 99
89705923 9530 49803 9962371910
3 9962 89705923 9530 49
897023 9510 3719 80
70892395371019
59
4980 3 3099
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
(2, 19)
(5, 80)
(7, 10)
(8, 37)
(12, 3)
(15, 99)
(17, 62) (21, 49)
(33, 30)
(41, 95)
(52, 23)
(58, 59)
(67, 89)
(93, 70)
2
5 7 8 12 15
17
21 33 41 52
58
67
17
8
155
7 12 21 41
33
52
58 67
932
[4, 58]× [19, 65]
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
3 99
10 19 37 80
30 4962
3 10 19 23 30 37 49 59 62 70 80 89 95 99
89705923 9530 49803 9962371910
3 9962 89705923 9530 49
897023 9510 3719 80
70892395371019
59
4980 3 3099
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
3 99
10 19 37 80
30 4962
3 10 19 23 30 37 49 59 62 70 80 89 95 99
89705923 9530 49803 9962371910
3 9962 89705923 9530 49
897023 9510 3719 80
70892395371019
59
4980 3 3099
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Fractional cascading
Instead of doing a 1D range query on the associated structureof some node ν , we find the leaf where the search to y wouldend in O(1) time via the direct pointer in the associatedstructure in the parent of ν
The number of grey nodes reduces to O(logn)
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
ConstructionQueryingHigher dimensionsFractional cascading
Result
Theorem: A set of n points in d-dimensional space can bepreprocessed in O(n logd−1 n) time into a data structure ofO(n logd−1 n) size so that any d-dimensional range query canbe answered in O(logd−1 n+ k) time, where k is the number ofanswers reported
Recall that a kd-tree has O(n) size and answers queries inO(n1−1/d + k) time
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
Degenerate cases
Both for kd-trees and for rangetrees we have to take care ofmultiple points with the samex- or y-coordinate
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
Degenerate cases
Both for kd-trees and for rangetrees we have to take care ofmultiple points with the samex- or y-coordinate
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
Degenerate cases
Treat a point p = (px,py) with two reals as coordinates as apoint with two composite numbers as coordinates
A composite number is a pair of reals, denoted (a|b)
We let (a|b)< (c|d) iff a < c or ( a = c and b < d ); thisdefines a total order on composite numbers
Computational Geometry Lecture 8: Range trees
Introduction2D Range trees
Degenerate cases
Degenerate cases
The point p = (px,py) becomes ((px|py) , (py|px)). Then notwo points have the same first or second coordinate
The median x-coordinate or y-coordinate is a compositenumber
The query range [x : x′]× [y : y′] becomes
[(x|−∞) : (x′|+∞)]× [(y|−∞) : (y′|+∞)]
We have (px,py) ∈ [x : x′]× [y : y′] iff
((px|py) , (py|px)) ∈ [(x|−∞) : (x′|+∞)]× [(y|−∞) : (y′|+∞)]
Computational Geometry Lecture 8: Range trees