View
212
Download
0
Category
Preview:
Citation preview
Spatial (or N-Dimensional) Searchin a Relational World
Jim Gray
Equations Define Subspaces
• For (x,y) above the lineax+by > c
• Reverse the space by-ax + -by > -c
• Intersect a 3 volumes: a1x + b1y > c1
a2x + b2y > c2
a3x + b3y > c3
x
y
x=c/a
y=c/b
ax + by = c
x
y
Domain is Union of Convex Hulls
• Simple volumes are unions of convex hulls.
• Higher order curves also work
• Complex volumes have holes and their holes have holes. (that is harder).
Not a convex hull
+
Now in Relational Termscreate table HalfSpace (
domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null, -- grouping a set of ½ spaces halfSpaceID int identity(), -- a particular ½ space x float not null, -- the (a,b,..) parameters y float not null, -- defining the ½ space z float not null, c float not null, -- the constant (“c” above) primary key (domainID, convexID, halfSpaceID)
(x,y,z) inside a convex if it is inside all lines of the convex(x,y,z) inside a convex if it is NOT OUTSIDE ANY line of the convex
select convexID -- return the convex hullsfrom HalfSpace -- from the constraintswhere @x * x + @y * y + @x * z < l -- point outside the line?group by all convexID -- consider all the lines of a
convexIDhaving count(*) = 0 -- count outside == 0
The Algebra is Simple (Boolean)@domainID = spDomainNew (@type varchar(16), @comment varchar(8000))@convexID = spDomainNewConvex (@domainID int)@halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @x float, @y float, @z float, @l float)@returnCode = spDomainDrop(@domainID)
select * from fDomainsContainPoint(@x float, @y float, @z float) Once constructed they can be manipulated with the Boolean operations.@domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000))@domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000))@domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000))
What! No Bounding Box?
• Bounding box limits search.A subset of the convex hulls.
• If query runs at 3M halfspace/sec then no need for bounding box, unless you have more than 10,000 lines.
• But, if you have a lot of half-spaces then bounding box is good.
A Different Problem• Table-valued function
find points near a point– Select * from fGetNearbyEq(ra,dec,r)
• Use Hierarchical Triangular Mesh www.sdss.jhu.edu/htm/
– Space filling curve, bounding triangles…– Standard approach
• 13 ms/call… So 70 objects/second.• Too slow, so precompute neighbors:
Materialized view.• At 70 objects/sec
it takes 6 months to compute a billion objects.
Zone Based Spatial Join• Divide space into zones• Key points by Zone, offset
(on the sphere this need wrap-around margin.)
• Point search look in a few zonesat a limited offset: ra ± ra bounding box that has
1-π/4 false positives• All inside the relational engine• Avoids “impedance mismatch” • Can “batch” all-all comparisons• 33x faster and parallel
6 days, not 6 months!
r ra-zoneMax
√(r2+(ra-zoneMax)2)cos(radians(zoneMax))
zoneMax
x
Ra ± x
In SQL
select o1.objID -- find objectsfrom zone o1 -- in the zoned tablewhere o1.zoneID between -- where zone #
floor((@dec-@r)/@zoneHeight) and -- overlaps the circlefloor((@dec+@r)/@zoneHeight)
and o1.ra between @ra - @r and @ra + @r -- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2))))
< @r -- careful filter on distance
Eliminates the ~ 21% = 1-π/4False positives
Bounding box
Summary
• SQL is a set oriented language
• You can express constraints as rows
• Then You – Can evaluate LOTS of predicates per second– Can do set algebra on the predicates.
• Benefits from SQL parallelism
• SQL == Prolog?
Recommended