October 11-14, Seattle, WA Performance Tuning of Spatial Queries in SQL Server Deep Dive into Spatial Indexing Michael Rys (@SQLServerMike) Principal Program Manager Microsoft Corp.

SQLPASS AD404-M Spatial Index MRys


DESCRIPTION

SQLPASS 2011 Presentation on Spatial Indexing in SQL Server 2008, 2008R2 and what is new in SQL Server 2012.


Page 1: SQLPASS AD404-M Spatial Index MRys

October 11-14, Seattle, WA

Performance Tuning of Spatial Queries in SQL Server
Deep Dive into Spatial Indexing

Michael Rys (@SQLServerMike)
Principal Program Manager
Microsoft Corp.

Page 2: SQLPASS AD404-M Spatial Index MRys

DEMO: A spatial query ...

Page 3: SQLPASS AD404-M Spatial Index MRys

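Q: Why is my Query so Slow?

A: Usually because the index isn’t being used.
Q: How do I tell?
A: Look at the query plan for the query, for example:

    SELECT * FROM T WHERE g.STIntersects(@x) = 1

[Figure: two query plans for this statement, one labeled "NO INDEX" and one labeled "INDEX!"]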

AD404-M| Spatial Performance
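One way to answer "How do I tell?" without the graphical plan viewer is to ask the server for the actual plan. The following is a minimal sketch, assuming the table T with a geometry column g from the slide and a hypothetical query window. In SSMS, a plan that uses the spatial index usually contains a seek operator labeled "Clustered Index Seek (Spatial)".

```sql
-- Return the actual execution plan as XML alongside the query results.
SET STATISTICS XML ON;
GO

-- Hypothetical query window; T and g are the table and column from the slide.
DECLARE @x geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

SELECT * FROM T WHERE g.STIntersects(@x) = 1;
GO

SET STATISTICS XML OFF;
GO
```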

Page 4: SQLPASS AD404-M Spatial Index MRys

Hinting the Index

Spatial indexes can be forced if needed.

    SELECT * FROM T WITH (INDEX(T_g_idx)) WHERE g.STIntersects(@x) = 1

Use SQL Server 2008 SP1 or 2008 R2!

Page 5: SQLPASS AD404-M Spatial Index MRys

But Why Isn't My Index Used?

Plan choice is cost-based
• The query optimizer (QO) uses various information, including cardinality

When can we estimate cardinality?
• Variables: never
• Literals: not for spatial since they are not literals under the covers
• Parameters: yes, but cached, so first call matters

Variable:

    DECLARE @x geometry = 'POINT (0 0)'
    SELECT *
    FROM T
    WHERE T.g.STIntersects(@x) = 1

Literal:

    SELECT *
    FROM T
    WHERE T.g.STIntersects('POINT (0 0)') = 1

Parameter:

    EXEC sp_executesql
        N'SELECT * FROM T WHERE T.g.STIntersects(@x) = 1',
        N'@x geometry',
        N'POINT (0 0)'

Page 6: SQLPASS AD404-M Spatial Index MRys

Spatial Indexing Basics

In general, split predicates in two
• Primary filter finds all candidates, possibly with false positives (but never false negatives)
• Secondary filter removes false positives

The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
• Sometimes possible to skip secondary filter

[Figure: objects A to E; the primary filter (index lookup) yields D, A, B; the secondary filter (original predicate) yields A, B]
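The split into two filters can be observed from T-SQL. The sketch below assumes the table T with an indexed geometry column g from the earlier slides and a hypothetical query window. Filter() is the documented index-only method: it may return false positives when an index is used, and it behaves like STIntersects() when no index is available.

```sql
-- Hypothetical query window.
DECLARE @x geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

-- Primary filter only: candidates from the index, possibly with false positives.
SELECT COUNT(*) AS candidate_rows
FROM T
WHERE g.Filter(@x) = 1;

-- Primary plus secondary filter: exact matches only.
SELECT COUNT(*) AS matching_rows
FROM T
WHERE g.STIntersects(@x) = 1;
```

A large gap between the two counts suggests the primary filter lets through many false positives for this window.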

Page 7: SQLPASS AD404-M Spatial Index MRys

Using B+-Trees for Spatial Index

SQL Server has B+-Trees
Spatial indexing is usually done through other structures
• Quad tree, R-Tree

Challenge: How do we repurpose the B+-Tree to handle spatial queries?
• Add a level of indirection!

Page 8: SQLPASS AD404-M Spatial Index MRys

Mapping to the B+-Tree

B+-Trees handle linearly ordered sets well
We need to somehow linearly order 2D space
• Either the plane or the globe

We want a locality-preserving mapping from the original space to the line
• i.e., close objects should be close in the index
• Can’t be done, but we can approximate it

Page 9: SQLPASS AD404-M Spatial Index MRys

SQL Server Spatial Indexing Story

Planar Index
• Requires bounding box
• Only one grid

Geographic Index
• No bounding box
• Two top-level projection grids

[Figure: three panels (1., 2., 3.), each showing a 4 x 4 grid whose cells are numbered as follows]

     1   2  15  16
     4   3  14  13
     5   8   9  12
     6   7  10  11

Indexing Phase
1. Overlay a grid on the spatial object
2. Identify grids for spatial object to store in index

Primary Filter
3. Identify grids for query object(s)
4. Intersecting grids identifies candidates

Secondary Filter
5. Apply actual CLR method on candidates to find matches

Page 10: SQLPASS AD404-M Spatial Index MRys

SQL Server Spatial Indexing Story

Multi-Level Grid
• Much more flexible than a simple grid
• Hilbert numbering
• Modified adaptable QuadTree

Grid index features
• 4 levels
• Customizable grid subdivisions
• Customizable maximum number of cells per object (default 16)
• NEW IN SQL Server Codename “DENALI”: New default tessellation with 8 levels of cell nesting
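To make "Hilbert numbering" concrete, the following sketch numbers the cells of a 4 x 4 grid with the textbook Hilbert-curve algorithm. It is an illustration only, not SQL Server's internal implementation; all variable and column names are made up. With column as x and row (top to bottom) as y, it yields the numbering 1 2 15 16 / 4 3 14 13 / 5 8 9 12 / 6 7 10 11 shown on the grid slide.

```sql
-- Textbook Hilbert curve numbering of an @n x @n grid (@n must be a power of two).
DECLARE @n int = 4;
DECLARE @cells TABLE (grid_row int, grid_col int, hilbert_no int);
DECLARE @row int = 0, @col int, @x int, @y int, @d int, @s int,
        @rx int, @ry int, @tmp int;

WHILE @row < @n
BEGIN
    SET @col = 0;
    WHILE @col < @n
    BEGIN
        SET @x = @col; SET @y = @row; SET @d = 0; SET @s = @n / 2;
        WHILE @s > 0
        BEGIN
            -- Which quadrant of the current square does (x, y) fall into?
            SET @rx = CASE WHEN (@x & @s) > 0 THEN 1 ELSE 0 END;
            SET @ry = CASE WHEN (@y & @s) > 0 THEN 1 ELSE 0 END;
            SET @d = @d + @s * @s * ((3 * @rx) ^ @ry);

            -- Rotate/flip so the sub-square has the canonical orientation.
            IF @ry = 0
            BEGIN
                IF @rx = 1
                BEGIN
                    SET @x = @n - 1 - @x;
                    SET @y = @n - 1 - @y;
                END;
                SET @tmp = @x; SET @x = @y; SET @y = @tmp;
            END;
            SET @s = @s / 2;
        END;
        INSERT INTO @cells VALUES (@row, @col, @d + 1);  -- 1-based, as on the slide
        SET @col = @col + 1;
    END;
    SET @row = @row + 1;
END;

-- Show the grid one row per line.
SELECT grid_row,
       MAX(CASE WHEN grid_col = 0 THEN hilbert_no END) AS col_0,
       MAX(CASE WHEN grid_col = 1 THEN hilbert_no END) AS col_1,
       MAX(CASE WHEN grid_col = 2 THEN hilbert_no END) AS col_2,
       MAX(CASE WHEN grid_col = 3 THEN hilbert_no END) AS col_3
FROM @cells
GROUP BY grid_row
ORDER BY grid_row;
```

Consecutive numbers always land in adjacent cells, which is the locality property the B+-Tree mapping relies on.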

Page 11: SQLPASS AD404-M Spatial Index MRys

Multi-Level Grid

Deepest-cell Optimization: Only keep the lowest level cell in index

Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object

Cell-per-object Optimization: User restricts max number of cells per object

[Figure: cell hierarchy from the root / (“cell 0”) down to the cell path /4/2/3/1]

Page 12: SQLPASS AD404-M Spatial Index MRys

Implementation of the Index

Persist a table-valued function
• Internally rewrite queries to use the table

Base Table T

    Prim_key  geography
    1         g1
    2         g2
    3         g3

    CREATE SPATIAL INDEX sixd
    ON T(geography)

Internal Table for sixd

    Prim_key  cell_id  srid  cell_attr
    1         0x00007  42    0
    3         0x00007  42    1
    3         0x0000A  42    2
    3         0x0000B  42    0
    3         0x0000C  42    1
    1         0x0000D  42    0
    2         0x00014  42    1

Column notes
• Prim_key: 15 columns and 895 byte limitation
• cell_id: Varbinary(5) encoding of grid cell id
• srid: Spatial Reference ID. Have to be the same to produce match
• cell_attr:
  • 0 – cell at least touches the object (but not 1 or 2)
  • 1 – guarantee that object partially covers cell
  • 2 – object covers cell

Page 13: SQLPASS AD404-M Spatial Index MRys

New AUTO GRID Index
• NEW IN SQL Server Codename “DENALI”
• Has 8 levels of cell nesting
• No manual grid density selection:
  • Fixed at HLLLLLLL
• Default number of cells per object:
  • 8 for geometry
  • 12 for geography
• More stable performance
  • for windows of different size
  • for data with different spatial density
• For default values:
  • Up to 2x faster for longer queries > 500 ms
    • More efficient primary filter
    • Fewer rows returned
  • 10ms slower for very fast queries < 50 ms
    • Increased tessellation time which is constant

Page 14: SQLPASS AD404-M Spatial Index MRys

Spatial Index Performance

New grid gives much more stable performance for query windows of different size
Better grid coverage gives fewer high peaks

Page 15: SQLPASS AD404-M Spatial Index MRys

Index Creation and Maintenance

Create index example GEOMETRY:

    CREATE SPATIAL INDEX sixd ON spatial_table(geom_column)
    WITH (
        BOUNDING_BOX = (0, 0, 500, 500),
        GRIDS = (LOW, LOW, MEDIUM, HIGH),
        CELLS_PER_OBJECT = 20)

Create index example GEOGRAPHY:

    CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
    USING GEOGRAPHY_GRID
    WITH (
        GRIDS = (LOW, LOW, MEDIUM, HIGH),
        CELLS_PER_OBJECT = 20)

NEW IN SQL Server “DENALI” (equivalent to default creation):

    CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
    USING GEOGRAPHY_AUTO_GRID
    WITH (CELLS_PER_OBJECT = 20)

Use ALTER and DROP INDEX for maintenance.
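The maintenance statements can be sketched end to end as follows. The table definition is an assumption added for illustration (the slides only name spatial_table and geom_column); the base table needs a clustered primary key before a spatial index can be created on it.

```sql
-- Hypothetical base table; a spatial index requires a clustered primary key.
CREATE TABLE dbo.spatial_table (
    id          int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    geom_column geometry NULL
);
GO

-- Manual grid, stated explicitly so the GRIDS option applies on any version.
CREATE SPATIAL INDEX sixd ON dbo.spatial_table(geom_column)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (0, 0, 500, 500),
    GRIDS = (LOW, LOW, MEDIUM, HIGH),
    CELLS_PER_OBJECT = 20);
GO

-- Maintenance: rebuild the index (always a full rebuild).
ALTER INDEX sixd ON dbo.spatial_table REBUILD;
GO

-- Maintenance: drop the index, for example before recreating it with other settings.
DROP INDEX sixd ON dbo.spatial_table;
GO
```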

Page 16: SQLPASS AD404-M Spatial Index MRys

DEMO: Indexing and Performance

Page 17: SQLPASS AD404-M Spatial Index MRys

Spatial Methods supported by Index

Geometry:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STTouches() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1

Geography:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1

[Slide callout marking the entries that are new in Denali]

Page 18: SQLPASS AD404-M Spatial Index MRys

How Costing is Done
• The stats on the index contain a trie constructed on the string form of the packed binary(5) typed Cell ID.
• When a window query is compiled with a sniffable window object, the tessellation function on the window object is run at compile time. The results are used to construct a trie for use during compilation.
  • May lead to wrong compilation for later objects
• No costing on:
  • Local variables, constants, results of expressions
• Use different indices and different stored procs to account for different query characteristics

Page 19: SQLPASS AD404-M Spatial Index MRys

Understanding the Index Query Plan

Page 20: SQLPASS AD404-M Spatial Index MRys

Seeking into a Spatial Index

Minimize I/O and random I/O
Intuition: small windows should touch small portions of the index
A cell 7.2.4 matches
• Itself
• Ancestors
• Descendants

[Figure: Spatial Index S with the cells 7, 7.2 and 7.2.4]
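The matching rule for a cell can be illustrated with plain strings. This sketch uses dotted paths such as 7.2.4 as on the slide, not the internal varbinary(5) encoding, and the sample cells are made up.

```sql
-- Hypothetical set of cells stored in an index, written as dotted paths.
DECLARE @cells TABLE (cell_path varchar(20) PRIMARY KEY);
INSERT INTO @cells VALUES
    ('7'), ('7.2'), ('7.2.4'), ('7.2.4.1'), ('7.2.5'), ('7.3'), ('8');

DECLARE @q varchar(20) = '7.2.4';   -- cell produced for the query window

SELECT cell_path,
       CASE WHEN cell_path = @q THEN 'itself'
            WHEN @q LIKE cell_path + '.%' THEN 'ancestor'
            ELSE 'descendant'
       END AS relation
FROM @cells
WHERE cell_path = @q
   OR @q LIKE cell_path + '.%'      -- stored cell is a prefix of the query cell
   OR cell_path LIKE @q + '.%'      -- query cell is a prefix of the stored cell
ORDER BY cell_path;
```

The query returns 7 and 7.2 as ancestors, 7.2.4 itself, and 7.2.4.1 as a descendant; the siblings 7.2.5, 7.3 and 8 are not matched.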

Page 21: SQLPASS AD404-M Spatial Index MRys

Understanding the Index Query Plan

[Figure: query plan diagram with the elements T(@g), Spatial Index Seek, Ranges, Remove dup ranges, Optional Sort]

Page 22: SQLPASS AD404-M Spatial Index MRys

Other Query Processing Support

• Index intersection
  • Enables efficient mixing of spatial and non-spatial predicates
• Matching
  • New in SQL Server “Denali”: Nearest Neighbor query
  • Distance queries: convert to STIntersects
  • Commutativity: a.STIntersects(b) = b.STIntersects(a)
  • Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
  • Various bounding boxes, granularities
• Outer references as window objects
  • Enables spatial join to use one index
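The commutativity and duality rules that the matcher relies on can be checked directly on two sample instances. The shapes below are made up for illustration: a square and a point inside it.

```sql
DECLARE @a geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
DECLARE @b geometry = geometry::STGeomFromText('POINT (1 1)', 0);

SELECT @a.STIntersects(@b) AS a_intersects_b,   -- commutativity: equals the next column
       @b.STIntersects(@a) AS b_intersects_a,
       @a.STContains(@b)   AS a_contains_b,     -- duality: equals the next column
       @b.STWithin(@a)     AS b_within_a;
```

All four columns return 1 for this pair, so a predicate written from either side can be matched to the same index.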

Page 23: SQLPASS AD404-M Spatial Index MRys

Other Spatial Performance Improvements in SQL Server Codename “Denali”

• Spatial index build time for point data can be as much as four to five times faster
• Optimized spatial query plan for STDistance and STIntersects like queries
• Faster point data queries
• Optimized STBuffer, lower memory footprint

Page 24: SQLPASS AD404-M Spatial Index MRys

Spatial Nearest Neighbor (Denali)

Main scenario
• Give me the closest 5 Italian restaurants

Execution plan
• SQL Server 2008/2008 R2: table scan
• SQL Server Codename “Denali”: uses spatial index

Specific query pattern required

    SELECT TOP(5) *
    FROM Restaurants r
    WHERE r.type = 'Italian'
      AND r.pos.STDistance(@me) IS NOT NULL
    ORDER BY r.pos.STDistance(@me)

Page 25: SQLPASS AD404-M Spatial Index MRys

DEMO: Nearest Neighbor performance

Page 26: SQLPASS AD404-M Spatial Index MRys

Nearest Neighbor Performance

Find the closest 50 business points (22 million in total)

NN query vs best current workaround (sort all points in 10km radius)

*Average time for NN query is ~236ms
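The two approaches being compared might look like the following. This is a sketch under assumptions: Businesses and its geography column pos are hypothetical names, and the location is an arbitrary point.

```sql
-- geography::Point takes latitude, longitude, SRID.
DECLARE @me geography = geography::Point(47.6062, -122.3321, 4326);

-- Workaround: restrict to a fixed radius, then sort the survivors.
-- STDistance returns meters for SRID 4326, so 10000 is 10 km.
SELECT TOP(50) *
FROM Businesses b
WHERE b.pos.STDistance(@me) < 10000
ORDER BY b.pos.STDistance(@me);

-- Nearest Neighbor pattern recognized by "Denali": no radius needed.
SELECT TOP(50) *
FROM Businesses b
WHERE b.pos.STDistance(@me) IS NOT NULL
ORDER BY b.pos.STDistance(@me);
```

The workaround returns fewer than 50 rows when the radius is too small and sorts many rows when it is too large, which is why a dedicated nearest neighbor plan helps.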

Page 27: SQLPASS AD404-M Spatial Index MRys

Limitations of Spatial Plan Selection
• Off whenever window object is not a parameter:
  • Spatial join (window is an outer reference)
  • Local variable, string constant, or complex expression
• Has the classic SQL Server parameter-sensitivity problem
  • SQL compiles once for one parameter value and reuses the plan for all parameter values
  • Different plans for different sizes of window require application logic to bucketize the windows
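One possible way to bucketize windows is to route them to separate stored procedures, each of which gets its own cached plan. The sketch below assumes the table T with geometry column g from the earlier slides; the procedure names and the area threshold are hypothetical and would need tuning against real data.

```sql
-- Each procedure is compiled for the first window it sees, so small and
-- large windows no longer share one plan.
CREATE PROCEDURE dbo.FindInSmallWindow @w geometry
AS
    SELECT * FROM T WHERE g.STIntersects(@w) = 1;
GO

CREATE PROCEDURE dbo.FindInLargeWindow @w geometry
AS
    SELECT * FROM T WHERE g.STIntersects(@w) = 1;
GO

-- Dispatcher: picks the bucket by window area (threshold is an assumption).
CREATE PROCEDURE dbo.FindInWindow @w geometry
AS
BEGIN
    IF @w.STArea() < 100.0
        EXEC dbo.FindInSmallWindow @w;
    ELSE
        EXEC dbo.FindInLargeWindow @w;
END;
GO
```

The same routing could live in application code instead of a dispatcher procedure.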

Page 28: SQLPASS AD404-M Spatial Index MRys

Index Support
• Can be built in parallel
• Can be hinted
• File groups/Partitioning
  • Aligned to base table or Separate file group
  • Full rebuild only
• New catalog views, DDL Events
• DBCC Checks
• Supportability stored procedures
• New in SQL Server “Denali”: Index Page and Row Compression
  • Ca. 50% smaller indices, 0-15% slower queries
• Not supported
  • Online rebuild
  • Database Tuning advisor

Page 29: SQLPASS AD404-M Spatial Index MRys

SET Options

Spatial indexes require:
• ANSI_NULLS: ON
• ANSI_PADDING: ON
• ANSI_WARNINGS: ON
• CONCAT_NULL_YIELDS_NULL: ON
• NUMERIC_ROUNDABORT: OFF
• QUOTED_IDENTIFIER: ON
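A session can set and then verify these options as sketched below. SESSIONPROPERTY returns 1 for ON and 0 for OFF.

```sql
-- Apply the settings listed on the slide to the current session.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;
SET QUOTED_IDENTIFIER ON;

-- Verify the current session state.
SELECT SESSIONPROPERTY('ANSI_NULLS')              AS ansi_nulls,
       SESSIONPROPERTY('ANSI_PADDING')            AS ansi_padding,
       SESSIONPROPERTY('ANSI_WARNINGS')           AS ansi_warnings,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS concat_null_yields_null,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT')      AS numeric_roundabort,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       AS quoted_identifier;
```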

Page 30: SQLPASS AD404-M Spatial Index MRys

Index Hinting

FROM T WITH (INDEX (<Spatial_idxname>))
• Spatial index is treated the same way a non-clustered index is
  • the order of the hint is reflected in the order of the indexes in the plan
  • multiple index hints are concatenated
  • no duplicates are allowed
• The following restrictions exist:
  • The spatial index must be either first in the first index hint or last in the last index hint for a given table.
  • Only one spatial index can be specified in any index hint for a given table.
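As an illustration of the placement rule, the sketch below hints a spatial index alone and then together with a regular nonclustered index. The Restaurants table comes from the nearest neighbor slide; the index names r_pos_sidx and r_type_idx are hypothetical, and whether the optimizer can build a plan for a given combination depends on the query.

```sql
DECLARE @me geography = geography::Point(47.6062, -122.3321, 4326);

-- Spatial index hinted on its own.
SELECT r.*
FROM Restaurants r WITH (INDEX(r_pos_sidx))
WHERE r.pos.STDistance(@me) < 1000;

-- Spatial index listed first, followed by a nonclustered index on r.type.
SELECT r.*
FROM Restaurants r WITH (INDEX(r_pos_sidx, r_type_idx))
WHERE r.type = 'Italian'
  AND r.pos.STDistance(@me) < 1000;
```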

Page 31: SQLPASS AD404-M Spatial Index MRys

Query Window Hinting (Denali)

    SELECT *
    FROM spatial_table t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024)
    WHERE t.geom.STIntersects(@window) = 1

• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
  • Higher value makes primary filter phase longer but reduces work in secondary filter phase
  • Set higher for dense spatial data
  • Set lower for sparse spatial data

Page 32: SQLPASS AD404-M Spatial Index MRys

DEMO: Query hinting

Page 33: SQLPASS AD404-M Spatial Index MRys

Spatial Catalog Views

• sys.spatial_indexes catalog view
• sys.spatial_index_tessellations catalog view
• Entries in sys.indexes for a spatial index:
  • A clustered index on the internal table of the spatial index
  • A spatial index (type = 4) for spatial index
• An entry in sys.internal_tables
• An entry to sys.index_columns
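These views can be joined to inspect how an index was defined. The sketch below lists every spatial index in the current database with its tessellation settings, then looks up internal tables for one base table; spatial_table is the hypothetical table name used elsewhere in the deck.

```sql
-- Tessellation settings of all spatial indexes in the current database.
SELECT OBJECT_NAME(si.object_id) AS table_name,
       si.name                   AS index_name,
       si.tessellation_scheme,
       t.level_1_grid_desc,
       t.level_2_grid_desc,
       t.level_3_grid_desc,
       t.level_4_grid_desc,
       t.cells_per_object,
       t.bounding_box_xmin, t.bounding_box_ymin,
       t.bounding_box_xmax, t.bounding_box_ymax
FROM sys.spatial_indexes AS si
JOIN sys.spatial_index_tessellations AS t
  ON t.object_id = si.object_id
 AND t.index_id  = si.index_id;

-- Internal tables that belong to one base table (spatial indexes included).
SELECT it.name, it.internal_type_desc, it.parent_minor_id AS index_id
FROM sys.internal_tables AS it
WHERE it.parent_object_id = OBJECT_ID('dbo.spatial_table');
```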

Page 34: SQLPASS AD404-M Spatial Index MRys

New Spatial Histogram Helpers (Denali)

sp_help_spatial_geometry_histogram
sp_help_spatial_geography_histogram
Used for spatial data and index analysis

[Figure: Histogram of 22 million business points over US. Left: SSMS view of a histogram. Right: Custom drawing on top of Bing Maps]
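A call to the geometry histogram helper might look like this. The table name, column name and bounding box are assumptions carried over from the index creation examples; the resolution and sample percentage are arbitrary illustration values.

```sql
-- Count objects per cell of a 64 x 64 grid laid over the given bounding box.
EXEC sp_help_spatial_geometry_histogram
    @tabname    = 'spatial_table',
    @colname    = 'geom_column',
    @resolution = 64,
    @xmin = 0, @ymin = 0, @xmax = 500, @ymax = 500,
    @sample     = 100;   -- percentage of the table to scan
```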

Page 35: SQLPASS AD404-M Spatial Index MRys

Indexing Support Procedures

sys.sp_help_spatial_geometry_index
sys.sp_help_spatial_geometry_index_xml
sys.sp_help_spatial_geography_index
sys.sp_help_spatial_geography_index_xml

Provide information about index:
• 64 properties
• 10 of which are considered core

Page 36: SQLPASS AD404-M Spatial Index MRys

sys.sp_help_spatial_geometry_index

Arguments

    Parameter       Type           Description
    @tabname        nvarchar(776)  the name of the table for which the index has been specified
    @indexname      sysname        the index name to be investigated
    @verboseoutput  tinyint        0: core set of properties is reported
                                   1: all properties are being reported
    @query_sample   geometry       A representative query sample that will be used to test the
                                   usefulness of the index. It may be a representative object
                                   or a query window.

Results in property name/value pair table of the format:

    PropName: nvarchar(256)    PropValue: sql_variant
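Putting the arguments together, a call could be sketched as follows. The table and index names follow the index creation examples, and the query window is a made-up sample that should be replaced by a window typical for the workload.

```sql
-- Representative query window (hypothetical).
DECLARE @window geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 100 0, 100 100, 0 100, 0 0))', 0);

EXEC sys.sp_help_spatial_geometry_index
    @tabname       = 'spatial_table',
    @indexname     = 'sixd',
    @verboseoutput = 0,          -- 0 = core properties only, 1 = all properties
    @query_sample  = @window;
```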

Page 37: SQLPASS AD404-M Spatial Index MRys

Some of the returned Properties

All properties below are core properties.

• Number_Of_Rows_Selected_By_Primary_Filter (bigint): P = Number of rows selected by the primary filter.
• Number_Of_Rows_Selected_By_Internal_Filter (bigint): S = Number of rows selected by the internal filter. For these rows, the secondary filter is not called.
• Number_Of_Times_Secondary_Filter_Is_Called (bigint): Number of times the secondary filter is called.
• Percentage_Of_Rows_NotSelected_By_Primary_Filter (float): Suppose there are N rows in the base table, suppose P are selected by the primary filter. This is (N-P)/N as percentage.
• Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter (float): This is S/P as a percentage. The higher the percentage, the better the index is at avoiding the more expensive secondary filter.
• Number_Of_Rows_Output (bigint): O = Number of rows output by the query.
• Internal_Filter_Efficiency (float): This is S/O as a percentage.
• Primary_Filter_Efficiency (float): This is O/P as a percentage. The higher the efficiency is, the fewer false positives have to be processed by the secondary filter.
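A small worked example shows how the percentages relate to the counts. The numbers are invented: N = 1000 rows in the table, P = 200 selected by the primary filter, S = 120 selected by the internal filter, O = 150 rows output.

```sql
DECLARE @N float = 1000,  -- rows in the base table
        @P float = 200,   -- rows selected by the primary filter
        @S float = 120,   -- rows selected by the internal filter
        @O float = 150;   -- rows output by the query

SELECT 100 * (@N - @P) / @N AS Percentage_Of_Rows_NotSelected_By_Primary_Filter,               -- 80
       100 * @S / @P        AS Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter,  -- 60
       100 * @S / @O        AS Internal_Filter_Efficiency,                                     -- 80
       100 * @O / @P        AS Primary_Filter_Efficiency;                                      -- 75
```

In this example the secondary filter runs on the remaining P - S = 80 candidates, of which O - S = 30 turn out to be real matches.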

Page 38: SQLPASS AD404-M Spatial Index MRys

DEMO: Indexing Supportability

Page 39: SQLPASS AD404-M Spatial Index MRys

Spatial Tips on index settings

Some best practice recommendations (YMMV):
• Start out with new default tessellation
• Point data: always use HIGH for all 4 levels. CELLS_PER_OBJECT is not relevant in this case.
• Simple, relatively consistent polygons: set all levels to LOW or MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
  • High number of CELLS_PER_OBJECT (often 8192 is best)
  • Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings which have highly variable sizes: experimentation is needed.
• Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
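Translated into DDL, some of these recommendations could look like the sketch below. All table, column and index names and the bounding box are hypothetical; the settings are starting points to measure against, not guaranteed optima.

```sql
-- Point data: HIGH on all four levels.
CREATE SPATIAL INDEX sixd_points ON point_table(geom_column)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (0, 0, 500, 500),
    GRIDS = (HIGH, HIGH, HIGH, HIGH));

-- Very complex polygons: HIGH on all levels plus many cells per object.
CREATE SPATIAL INDEX sixd_complex ON polygon_table(geom_column)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (0, 0, 500, 500),
    GRIDS = (HIGH, HIGH, HIGH, HIGH),
    CELLS_PER_OBJECT = 8192);

-- GEOGRAPHY: try HHMM when MMMM does not perform.
CREATE SPATIAL INDEX sixd_geog ON geog_table(geogr_column)
USING GEOGRAPHY_GRID
WITH (GRIDS = (HIGH, HIGH, MEDIUM, MEDIUM));
```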

Page 40: SQLPASS AD404-M Spatial Index MRys

What to do if my Spatial Query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or “Denali”
• Check query plan for use of index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-spatial predicate
• Run above index support procedure:
  • Assess effectiveness of primary filter (Primary_Filter_Efficiency)
  • Assess effectiveness of internal filter (Internal_Filter_Efficiency)
  • Redefine or define a new index with better characteristics
    • More appropriate bounding box for GEOMETRY
    • Better grid densities

Page 41: SQLPASS AD404-M Spatial Index MRys

Related Content

Weblog
• http://blogs.msdn.com/isaac
• http://blogs.msdn.com/edkatibah
• http://johanneskebeck.spaces.live.com/
• http://sqlblog.com/blogs/michael_rys/

Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1

Whitepapers, Websites & Code
• Denali CTP3: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/08/08/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-3.aspx
• Spatial Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx
• SQL Server 2008 Spatial Site: http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
• SQL Spatial Codeplex: http://www.codeplex.com/sqlspatialtools
• http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx
• http://www.codeplex.com/ProjNET
• http://www.geoquery2008.com/
• SIGMOD 2008 Paper: Spatial Indexing in Microsoft SQL Server 2008
• And of course Books Online!


Page 43: SQLPASS AD404-M Spatial Index MRys

Thank you
for attending this session and the 2011 PASS Summit in Seattle
