SQLPASS 2011 Presentation on Spatial Indexing in SQL Server 2008, 2008R2 and what is new in SQL Server 2012.
October 11-14, Seattle, WA
Performance Tuning of Spatial Queries in SQL Server
Deep Dive into Spatial Indexing
Michael Rys (@SQLServerMike)
Principal Program Manager
Microsoft Corp.
DEMO: A spatial query…
Q: Why is my Query so Slow?
A: Usually because the index isn’t being used.
Q: How do I tell?
A: Run the query and check its plan: SELECT * FROM T WHERE g.STIntersects(@x) = 1
[Figure: two query plans for this query, labeled "NO INDEX" and "INDEX!"]
AD404-M| Spatial Performance
Hinting the Index
Spatial indexes can be forced if needed.
SELECT * FROM T WITH(INDEX(T_g_idx)) WHERE g.STIntersects(@x) = 1
Use SQL Server 2008 SP1 or 2008 R2!
But Why Isn't My Index Used?
Plan choice is cost-based
• QO uses various information, including cardinality
When can we estimate cardinality?
• Variables: never
• Literals: not for spatial since they are not literals under the covers
• Parameters: yes, but cached, so first call matters

Variable:
DECLARE @x geometry = 'POINT (0 0)'
SELECT *
FROM T
WHERE T.g.STIntersects(@x) = 1

Literal:
SELECT *
FROM T
WHERE T.g.STIntersects('POINT (0 0)') = 1

Parameter:
EXEC sp_executesql
  N'SELECT * FROM T WHERE T.g.STIntersects(@x) = 1',
  N'@x geometry',
  N'POINT (0 0)'
Spatial Indexing Basics
In general, split predicates in two
• Primary filter finds all candidates, possibly with false positives (but never false negatives)
• Secondary filter removes false positives
The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
• Sometimes possible to skip secondary filter
[Figure: objects A, B, C, D, E are narrowed down first by the Primary Filter (Index lookup) and then by the Secondary Filter (Original predicate)]
Using B+-Trees for Spatial Index
SQL Server has B+-Trees
Spatial indexing is usually done through other structures
• Quad tree, R-Tree
Challenge: How do we repurpose the B+-Tree to handle spatial queries?
• Add a level of indirection!
Mapping to the B+-Tree
B+-Trees handle linearly ordered sets well
We need to somehow linearly order 2D space
• Either the plane or the globe
We want a locality-preserving mapping from the original space to the line
• i.e., close objects should be close in the index
• Can’t be done, but we can approximate it
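One common way to approximate such a locality-preserving ordering is a Hilbert curve, which the deck names as the numbering scheme for its grid cells. The following is a minimal Python sketch of the textbook coordinate-to-index conversion, not SQL Server's internal code. The names `hilbert_index` and `numbered_grid`, and the choice of a 4x4 grid, are assumptions made for illustration.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the 0-based Hilbert-curve position of cell (x, y) in a
    2**order by 2**order grid (x = column, y = row, origin at top-left)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def numbered_grid(order: int):
    """Rows of 1-based Hilbert numbers, one list per grid row."""
    n = 1 << order
    return [[hilbert_index(order, x, y) + 1 for x in range(n)] for y in range(n)]


if __name__ == "__main__":
    # Print the 4x4 grid (order 2) with 1-based cell numbers.
    for row in numbered_grid(2):
        print(" ".join(f"{v:2d}" for v in row))
```

For order 2 this reproduces the 1..16 cell numbering in the deck's grid figure (first row 1 2 15 16). Consecutive numbers always land in neighboring cells, which is the "close objects should be close in the index" property; the converse does not always hold (cells 2 and 15 are neighbors but far apart on the line), which is why the mapping is only an approximation.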
SQL Server Spatial Indexing Story
Planar Index
• Requires bounding box
• Only one grid
Geographic Index
• No bounding box
• Two top-level projection grids
Indexing Phase
1. Overlay a grid on the spatial object
2. Identify grids for spatial object to store in index
Primary Filter
3. Identify grids for query object(s)
4. Intersecting grids identifies candidates
Secondary Filter
5. Apply actual CLR method on candidates to find matches

[Figure: three copies of a 4x4 grid illustrating steps 1 to 3, with cells numbered as follows]
 1  2 15 16
 4  3 14 13
 5  8  9 12
 6  7 10 11
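The five steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the engine's implementation: objects are axis-aligned rectangles, the grid is a single fixed 4x4 level over an assumed bounding box of (0, 0, 100, 100), and a plain rectangle-overlap test stands in for the actual CLR method. All names (`cells_for`, `build_index`, `query`) are hypothetical.

```python
from typing import Dict, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]                    # (column, row)

GRID = 4        # cells per side (assumed)
EXTENT = 100.0  # bounding box is (0, 0, EXTENT, EXTENT) (assumed)


def cells_for(rect: Rect) -> Set[Cell]:
    """Steps 1-3: overlay the grid and return the cells the rectangle touches."""
    size = EXTENT / GRID

    def clamp(v: float) -> int:
        return max(0, min(GRID - 1, int(v // size)))

    xmin, ymin, xmax, ymax = rect
    return {(c, r)
            for c in range(clamp(xmin), clamp(xmax) + 1)
            for r in range(clamp(ymin), clamp(ymax) + 1)}


def intersects(a: Rect, b: Rect) -> bool:
    """Exact test; stands in for the real spatial method (secondary filter)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_index(objects: Dict[str, Rect]) -> Dict[Cell, Set[str]]:
    """Indexing phase: record which object keys touch each grid cell."""
    index: Dict[Cell, Set[str]] = {}
    for key, rect in objects.items():
        for cell in cells_for(rect):
            index.setdefault(cell, set()).add(key)
    return index


def query(objects: Dict[str, Rect], index: Dict[Cell, Set[str]], window: Rect):
    """Step 4 (primary filter) followed by step 5 (secondary filter)."""
    candidates: Set[str] = set()
    for cell in cells_for(window):
        candidates |= index.get(cell, set())
    matches = {k for k in candidates if intersects(objects[k], window)}
    return candidates, matches


if __name__ == "__main__":
    objs = {"A": (5, 5, 10, 10), "B": (20, 20, 30, 30), "C": (60, 60, 70, 70)}
    cands, hits = query(objs, build_index(objs), (12, 12, 22, 22))
    print(sorted(cands), sorted(hits))
```

In this example A shares a grid cell with the query window without actually intersecting it, so it survives the primary filter as a false positive and is removed by the secondary filter; C never becomes a candidate.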
SQL Server Spatial Indexing Story
Multi-Level Grid
• Much more flexible than a simple grid
• Hilbert numbering
• Modified adaptable QuadTree
Grid index features
• 4 levels
• Customizable grid subdivisions
• Customizable maximum number of cells per object (default 16)
• NEW IN SQL Server Codename “DENALI”: New default tessellation with 8 levels of cell nesting
Multi-Level Grid
• Deepest-cell Optimization: Only keep the lowest level cell in index
• Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object
• Cell-per-object Optimization: User restricts max number of cells per object
[Figure: multi-level grid with cell path labels / (“cell 0”) and /4/2/3/1]
Implementation of the Index
Persist a table-valued function
• Internally rewrite queries to use the table

Base Table T
Prim_key  geography
1         g1
2         g2
3         g3

CREATE SPATIAL INDEX sixd
ON T(geography)

Internal Table for sixd
Prim_key  cell_id  srid  cell_attr
1         0x00007  42    0
3         0x00007  42    1
3         0x0000A  42    2
3         0x0000B  42    0
3         0x0000C  42    1
1         0x0000D  42    0
2         0x00014  42    1

Column notes:
• Prim_key: 15 columns and 895 byte limitation
• cell_id: Varbinary(5) encoding of grid cell id
• srid: Spatial Reference ID; has to be the same to produce a match
• cell_attr:
  • 0: cell at least touches the object (but not 1 or 2)
  • 1: guarantee that object partially covers cell
  • 2: object covers cell
New AUTO GRID Index
• NEW IN SQL Server Codename “DENALI”
• Has 8 levels of cell nesting
• No manual grid density selection:
  • Fixed at HLLLLLLL
• Default number of cells per object:
  • 8 for geometry
  • 12 for geography
• More stable performance
  • for windows of different size
  • for data with different spatial density
• For default values:
  • Up to 2x faster for longer queries > 500 ms
    • More efficient primary filter
    • Fewer rows returned
  • 10ms slower for very fast queries < 50 ms
    • Increased tessellation time which is constant
Spatial Index Performance
• New grid gives much more stable performance for query windows of different size
• Better grid coverage gives fewer high peaks
Index Creation and Maintenance

Create index example GEOMETRY:
CREATE SPATIAL INDEX sixd
ON spatial_table(geom_column)
USING GEOMETRY_GRID
WITH (
  BOUNDING_BOX = (0, 0, 500, 500),
  GRIDS = (LOW, LOW, MEDIUM, HIGH),
  CELLS_PER_OBJECT = 20)

Create index example GEOGRAPHY:
CREATE SPATIAL INDEX sixd
ON spatial_table(geogr_column)
USING GEOGRAPHY_GRID
WITH (
  GRIDS = (LOW, LOW, MEDIUM, HIGH),
  CELLS_PER_OBJECT = 20)

NEW IN SQL Server “DENALI” (equivalent to default creation):
CREATE SPATIAL INDEX sixd
ON spatial_table(geogr_column)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 20)
Use ALTER and DROP INDEX for maintenance.
DEMO: Indexing and Performance
Spatial Methods supported by Index
Geometry:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STTouches() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1
Geography:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1
[Slide legend: "New in Denali" marks the entries added in that release]
How Costing is Done
• The stats on the index contain a trie constructed on the string form of the packed binary(5) typed Cell ID.
• When a window query is compiled with a sniffable window object, the tessellation function on the window object is run at compile time. The results are used to construct a trie for use during compilation.
  • May lead to wrong compilation for later objects
• No costing on:
  • Local variables, constants, results of expressions
• Use different indices and different stored procs to account for different query characteristics
Understanding the Index Query Plan
Seeking into a Spatial Index
Minimize I/O and random I/O
Intuition: small windows should touch small portions of the index
A cell 7.2.4 matches
• Itself
• Ancestors
• Descendants
[Figure: Spatial Index S, showing cells 7, 7.2, and 7.2.4]
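The reason a hierarchical cell id keeps I/O small can be shown with a short Python sketch. It assumes, for illustration only, that cell ids sort like tuples of path components, so that a cell and all of its descendants form one contiguous key range while each ancestor is a single point lookup. The real index uses a packed binary encoding; the names `ancestors` and `matching_cells` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

CellId = Tuple[int, ...]  # (7, 2, 4) stands for cell 7.2.4


def ancestors(cell: CellId) -> List[CellId]:
    """Proper prefixes of the cell path: 7.2.4 -> [7, 7.2]."""
    return [cell[:i] for i in range(1, len(cell))]


def matching_cells(sorted_keys: List[CellId], cell: CellId) -> List[CellId]:
    """Return the index keys that match `cell`: its ancestors (one point
    lookup each), then the cell itself plus descendants (one range scan)."""
    hits: List[CellId] = []
    for anc in ancestors(cell):
        i = bisect_left(sorted_keys, anc)
        if i < len(sorted_keys) and sorted_keys[i] == anc:
            hits.append(anc)
    # Everything from `cell` up to (but excluding) its next sibling is the
    # cell itself or one of its descendants, because a prefix sorts first.
    lo = bisect_left(sorted_keys, cell)
    hi = bisect_left(sorted_keys, cell[:-1] + (cell[-1] + 1,))
    hits.extend(sorted_keys[lo:hi])
    return hits


if __name__ == "__main__":
    keys = sorted([(3,), (7,), (7, 1), (7, 2), (7, 2, 3), (7, 2, 4),
                   (7, 2, 4, 1), (7, 2, 4, 9), (7, 2, 5), (8,)])
    print(matching_cells(keys, (7, 2, 4)))
```

For the sample keys, a window cell 7.2.4 touches only 7, 7.2, 7.2.4, 7.2.4.1, and 7.2.4.9; siblings such as 7.2.3 and 7.2.5 are never read, which is the "small windows touch small portions of the index" intuition.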
Understanding the Index Query Plan
[Figure: query plan fragment with the operators T(@g), Optional Sort, Remove dup ranges, Ranges, and Spatial Index Seek]
Other Query Processing Support
• Index intersection
  • Enables efficient mixing of spatial and non-spatial predicates
• Matching
  • New in SQL Server “Denali”: Nearest Neighbor query
  • Distance queries: convert to STIntersects
  • Commutativity: a.STIntersects(b) = b.STIntersects(a)
  • Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
  • Various bounding boxes, granularities
• Outer references as window objects
  • Enables spatial join to use one index
Other Spatial Performance Improvements in SQL Server Codename “Denali”
• Spatial index build time for point data can be as much as four to five times faster
• Optimized spatial query plan for STDistance and STIntersects like queries
• Faster point data queries
• Optimized STBuffer, lower memory footprint
Spatial Nearest Neighbor (Denali)
Main scenario
• Give me the closest 5 Italian restaurants
Execution plan
• SQL Server 2008/2008 R2: table scan
• SQL Server Codename “Denali”: uses spatial index
Specific query pattern required

SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
  AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me)
DEMO: Nearest Neighbor performance
Nearest Neighbor Performance
[Chart: NN query vs best current workaround (sort all points in 10km radius)]
Task: find the closest 50 business points (22 million in total)
*Average time for NN query is ~236ms
Limitations of Spatial Plan Selection
• Off whenever window object is not a parameter:
  • Spatial join (window is an outer reference)
  • Local variable, string constant, or complex expression
• Has the classic SQL Server parameter-sensitivity problem
  • SQL compiles once for one parameter value and reuses the plan for all parameter values
  • Different plans for different sizes of window require application logic to bucketize the windows
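As a rough illustration of what "bucketizing" could look like on the application side, the Python sketch below routes a query window to one of several stored procedures based on its area, so that each procedure's cached plan is only ever compiled and reused for windows of similar size. The bucket limits and the procedure names (`dbo.FindInSmallWindow` and so on) are hypothetical and would need to be tuned against real data.

```python
from typing import List, Tuple

# (upper area limit, procedure name). Limits and names are hypothetical.
BUCKETS: List[Tuple[float, str]] = [
    (1.0, "dbo.FindInSmallWindow"),
    (100.0, "dbo.FindInMediumWindow"),
]
FALLBACK = "dbo.FindInLargeWindow"


def pick_procedure(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Choose the stored procedure whose cached plan suits this window size."""
    area = abs(xmax - xmin) * abs(ymax - ymin)
    for limit, name in BUCKETS:
        if area <= limit:
            return name
    return FALLBACK


if __name__ == "__main__":
    for window in [(0, 0, 0.5, 0.5), (0, 0, 5, 5), (0, 0, 50, 50)]:
        print(window, "->", pick_procedure(*window))
```

Each procedure would contain the same parameterized spatial query; keeping them separate means a plan sniffed for a tiny window is not reused for a very large one.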
Index Support
• Can be built in parallel
• Can be hinted
• File groups/Partitioning
  • Aligned to base table or Separate file group
  • Full rebuild only
• New catalog views, DDL Events
• DBCC Checks
• Supportability stored procedures
• New in SQL Server “Denali”: Index Page and Row Compression
  • Ca. 50% smaller indices, 0-15% slower queries
• Not supported
  • Online rebuild
  • Database Tuning advisor
SET Options
Spatial indexes require:
• ANSI_NULLS: ON
• ANSI_PADDING: ON
• ANSI_WARNINGS: ON
• CONCAT_NULL_YIELDS_NULL: ON
• NUMERIC_ROUNDABORT: OFF
• QUOTED_IDENTIFIER: ON
Index Hinting
FROM T WITH (INDEX (<Spatial_idxname>))
• Spatial index is treated the same way a non-clustered index is
  • the order of the hint is reflected in the order of the indexes in the plan
  • multiple index hints are concatenated
  • no duplicates are allowed
• The following restrictions exist:
  • The spatial index must be either first in the first index hint or last in the last index hint for a given table.
  • Only one spatial index can be specified in any index hint for a given table.
Query Window Hinting (Denali)
SELECT *
FROM table t WITH(SPATIAL_WINDOW_MAX_CELLS=1024)
WHERE t.geom.STIntersects(@window)=1

• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
  • Higher value makes primary filter phase longer but reduces work in secondary filter phase
  • Set higher for dense spatial data
  • Set lower for sparse spatial data
DEMO: Query hinting
Spatial Catalog Views
• sys.spatial_indexes catalog view
• sys.spatial_index_tessellations catalog view
• Entries in sys.indexes for a spatial index:
  • A clustered index on the internal table of the spatial index
  • A spatial index (type = 4) for spatial index
• An entry in sys.internal_tables
• An entry in sys.index_columns
New Spatial Histogram Helpers (Denali)
• sp_help_spatial_geometry_histogram
• sp_help_spatial_geography_histogram
Used for spatial data and index analysis
[Figure: histogram of 22 million business points over the US. Left: SSMS view of a histogram. Right: custom drawing on top of Bing Maps]
Indexing Support Procedures
• sys.sp_help_spatial_geometry_index
• sys.sp_help_spatial_geometry_index_xml
• sys.sp_help_spatial_geography_index
• sys.sp_help_spatial_geography_index_xml
Provide information about index:
• 64 properties
• 10 of which are considered core
sys.sp_help_spatial_geometry_index
Arguments (parameter, type, description):
• @tabname (nvarchar(776)): the name of the table for which the index has been specified
• @indexname (sysname): the index name to be investigated
• @verboseoutput (tinyint): 0 = core set of properties is reported; 1 = all properties are being reported
• @query_sample (geometry): a representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
Results in property name/value pair table of the format:
• PropName: nvarchar(256)
• PropValue: sql_variant
Some of the returned Properties
Property (type, category): description
• Number_Of_Rows_Selected_By_Primary_Filter (bigint, Core): P = Number of rows selected by the primary filter.
• Number_Of_Rows_Selected_By_Internal_Filter (bigint, Core): S = Number of rows selected by the internal filter. For these rows, the secondary filter is not called.
• Number_Of_Times_Secondary_Filter_Is_Called (bigint, Core): Number of times the secondary filter is called.
• Percentage_Of_Rows_NotSelected_By_Primary_Filter (float, Core): Suppose there are N rows in the base table, suppose P are selected by the primary filter. This is (N-P)/N as percentage.
• Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter (float, Core): This is S/P as a percentage. The higher the percentage, the better the index is at avoiding the more expensive secondary filter.
• Number_Of_Rows_Output (bigint, Core): O = Number of rows output by the query.
• Internal_Filter_Efficiency (float, Core): This is S/O as a percentage.
• Primary_Filter_Efficiency (float, Core): This is O/P as a percentage. The higher the efficiency, the fewer false positives have to be processed by the secondary filter.
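The percentage properties are simple ratios of the four counts N, P, S, and O defined above. The Python sketch below recomputes them from those counts as a worked example; the function name `filter_metrics` and the choice to return 0.0 when a denominator is zero are assumptions for illustration, not documented behavior of the stored procedure.

```python
from typing import Dict


def filter_metrics(n: int, p: int, s: int, o: int) -> Dict[str, float]:
    """Recompute the ratio properties from the raw counts.

    n: rows in the base table
    p: rows selected by the primary filter
    s: rows selected by the internal filter
    o: rows output by the query
    """
    def pct(a: int, b: int) -> float:
        # Assumption: report 0.0 instead of failing on an empty denominator.
        return 100.0 * a / b if b else 0.0

    return {
        "Percentage_Of_Rows_NotSelected_By_Primary_Filter": pct(n - p, n),
        "Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter": pct(s, p),
        "Internal_Filter_Efficiency": pct(s, o),
        "Primary_Filter_Efficiency": pct(o, p),
    }


if __name__ == "__main__":
    # Example counts: 1000 rows, 200 candidates, 50 accepted by the
    # internal filter, 100 rows finally returned.
    for name, value in filter_metrics(1000, 200, 50, 100).items():
        print(f"{name}: {value:.1f}%")
```

With these example counts the primary filter discards 80% of the table, and half of its 200 candidates turn out to be false positives (Primary_Filter_Efficiency of 50%), which would suggest trying denser grids or a tighter bounding box.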
DEMO: Indexing Supportability
Spatial Tips on index settings
Some best practice recommendations (YMMV):
• Start out with new default tessellation
• Point data: always use HIGH for all 4 levels. CELLS_PER_OBJECT is not relevant in this case.
• Simple, relatively consistent polygons: set all levels to LOW, or use MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
  • High number of CELLS_PER_OBJECT (often 8192 is best)
  • Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings which have highly variable sizes: experimentation is needed.
• Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
What to do if my Spatial Query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or “Denali”
• Check query plan for use of index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-spatial predicate
• Run above index support procedure:
  • Assess effectiveness of primary filter (Primary_Filter_Efficiency)
  • Assess effectiveness of internal filter (Internal_Filter_Efficiency)
• Redefine or define a new index with better characteristics
  • More appropriate bounding box for GEOMETRY
  • Better grid densities
Related Content
Weblog
• http://blogs.msdn.com/isaac
• http://blogs.msdn.com/edkatibah
• http://johanneskebeck.spaces.live.com/
• http://sqlblog.com/blogs/michael_rys/
Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
Whitepapers, Websites & Code
• Denali CTP3: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/08/08/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-3.aspx
• Spatial Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx
• SQL Server 2008 Spatial Site: http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
• SQL Spatial Codeplex: http://www.codeplex.com/sqlspatialtools
• http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx
• http://www.codeplex.com/ProjNET
• http://www.geoquery2008.com/
• SIGMOD 2008 Paper: Spatial Indexing in Microsoft SQL Server 2008
• And of course Books Online!
Thank you for attending this session and the 2011 PASS Summit in Seattle