26
Indexes Rose-Hulman Institute of Technology Curt Clifton

Indexes Rose-Hulman Institute of Technology Curt Clifton

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Indexes Rose-Hulman Institute of Technology Curt Clifton

Indexes

Rose-Hulman Institute of Technology

Curt Clifton

Page 2: Indexes Rose-Hulman Institute of Technology Curt Clifton

Overview Introduction to Indexes Index Architecture How SQL Server Retrieves Stored Data How SQL Server Maintains Index and Heap

Structures Deciding Which Columns to Index

Page 3: Indexes Rose-Hulman Institute of Technology Curt Clifton

Data Pages

Page 7 Page 8 Page 9AkhtarAkhtarFunkFunkSmithSmithMartinMartin......

Page 4 Page 5 Page 6 ..............................

ConConFunkFunkWhiteWhite............

RuddRuddWhiteWhiteBarrBarr............

SmithSmithOtaOtaJonesJones............

MartinMartinPhuaPhuaJonesJonesSmithSmith......

GanioGanioJonesJonesHallHall............

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

......

Storing and Accessing Data How Data Is Stored

Rows are stored in data pages Heaps are a collection of data pages for a table

Page 4: Indexes Rose-Hulman Institute of Technology Curt Clifton

Whether to Create Indexes Why to Create an Index

Speeds up data access Enforces uniqueness of rows

Why Not to Create an Index Consumes disk space Incurs overhead

Page 5: Indexes Rose-Hulman Institute of Technology Curt Clifton

Using Heaps SQL Server: Uses Index Allocation Map Pages That:

Contain information on where the extents of a heap are stored

Navigate through the heap and find available space for new rows being inserted

Connect data pages Reclaims Space for New Rows in the Heap When a

Row Is Deleted

Page 6: Indexes Rose-Hulman Institute of Technology Curt Clifton

Using Clustered Indexes Each Table Can Have Only One Clustered

Index The Physical Row Order of the Table and the

Order of Rows in the Index Are the Same Key Value Uniqueness Is Maintained

Explicitly or Implicitly

Page 7: Indexes Rose-Hulman Institute of Technology Curt Clifton

Using Nonclustered Indexes Nonclustered Indexes Are the SQL Server

Default Existing Nonclustered Indexes Are

Automatically Rebuilt When: An existing clustered index is dropped A new clustered index is created The DROP_EXISTING option is used to change

which columns define the clustered index

Page 8: Indexes Rose-Hulman Institute of Technology Curt Clifton

Maintaining Index and Heap Structures

Page 9: Indexes Rose-Hulman Institute of Technology Curt Clifton

Page Splits in an Index

Index Pages

Non-LeafLevel

INSERT member (last name)VALUES lastname = ‘Jackson'

INSERT member (last name)VALUES lastname = ‘Jackson'

LangLangSmithSmith

……

AkhtarAkhtarGanioGanio

……

AkhtarAkhtar……

MartinMartin

Jackson

Leaf Level(Key Value)

AkhtarAkhtarBarrBarrBarrBarrBormBormBuhlBuhl

…………………………

LangLangMartinMartinMartinMartinMartinMartinMorisMoris

…………………………

SmithSmithSmithSmithSmithSmithSmithSmithSmithSmith

…………………………

… ……GanioGanioHallHallHartHart

JonesJonesJonesJones

…………………………

Leaf Level(Key Value)

AkhtarAkhtarBarrBarrBarrBarrBormBormBuhlBuhl

…………………………

JacksonJacksonJonesJonesJonesJones

………………

LangLangMartinMartinMartinMartinMartinMartinMorisMoris

…………………………

SmithSmithSmithSmithSmithSmithSmithSmithSmithSmith

…………………………

… ……GanioGanioHallHallHartHart

……………… …

Jackson …

Page 10: Indexes Rose-Hulman Institute of Technology Curt Clifton

Forwarding Pointer in a Heap

Non-LeafLevel

Page 12 - RootPage 37 Page 28

Leaf Level(Key Value)

Page 41 Page 51 Page 61 Page 71

AkhtarAkhtar......MartinMartin

AkhtarAkhtarBarrBarrConConFunkFunkFunkFunk

4:706:014:706:014:705:034:705:034:704:014:704:014:706:024:706:024:704:024:704:02

MartinMartinSmithSmith......

SmithSmithSmithSmithSmithSmithWhiteWhiteWhiteWhite

4:706:034:706:034:708:044:708:044:707:014:707:014:704:034:704:034:705:024:705:02

AkhtarAkhtarGanioGanio......

GanioGanioHallHall

JonesJonesJonesJonesJonesJones

4:709:014:709:014:709:044:709:044:709:024:709:024:708:034:708:034:707:034:707:03

HeapPage 707 Page 708 Page 709

0101020203030404......

......

......

......

......

......

AkhtarAkhtarFunkFunkSmithSmithMartinMartin......

Page 704 Page 705 Page 706 010102020303............

......

......

......

......

......

ConnConnFunkFunkWhiteWhite............

010102020303............

......

......

......

......

......

RuddRuddWhiteWhiteBarrBarr

......

......

01010202030304040505

......

......

......

......

......

SmithSmithOtaOta

JonesJonesCoretsCoretsNashNash

0101020203030404......

......

......

......

......

......

MartinMartinPhuaPhuaJonesJonesSmithSmith......

010102020303............

......

......

......

......

......

GanioGanioJonesJonesHallHall

......

......

MartinMartinMartinMartin

OtaOtaPhuaPhuaRuddRudd

4:708:014:708:014:706:044:706:044:707:024:707:024:708:024:708:024:705:014:705:01

NonClustered

Index

File ID #4

idid indid = 2indid = 2 rootrootsysindexes

Non-Leaf

Level

Page 12 - RootPage 37 Page 28

Leaf Level(Key Value)

Page 41 Page 51 Page 61 Page 71

AkhtarAkhtar......MartinMartin

AkhtarAkhtarBarrBarrConConFunkFunkFunkFunk

4:706:014:706:014:705:034:705:034:704:014:704:014:706:024:706:024:704:024:704:02

MartinMartinSmithSmith......

SmithSmithSmithSmithSmithSmithWhiteWhiteWhiteWhite

4:706:034:706:034:708:044:708:044:707:014:707:014:704:034:704:034:705:024:705:02

AkhtarAkhtarGanioGanio......

GanioGanioHallHall

JonesJonesJonesJonesJonesJones

4:709:014:709:014:709:044:709:044:709:024:709:024:708:034:708:034:707:034:707:03

HeapPage 707 Page 708 Page 709

0101020203030404......

......

......

......

......

......

AkhtarAkhtarFunkFunkSmithSmithMartinMartin......

Page 704 Page 705 Page 706 010102020303............

......

......

......

......

......

ConnConnFunkFunkWhiteWhite............

010102020303............

......

......

......

......

......

RuddRuddWhiteWhiteBarrBarr

......

......

01010202030304040505

......

......

......

......

......

SmithSmithOtaOta

JonesJonesCoretsCoretsNashNash

0101020203030404......

......

......

......

......

......

MartinMartinPhuaPhuaJonesJonesSmithSmith......

010102020303............

......

......

......

......

......

GanioGanioJonesJonesHallHall

......

......

MartinMartinMartinMartin

OtaOtaPhuaPhuaRuddRudd

4:708:014:708:014:706:044:706:044:707:024:707:024:708:024:708:024:705:014:705:01

Nonclustered

Index

File ID #4

0202 ...... OtaOta0202 ...... OtaOta

MartinMartin

MartinMartin

0202 OtaOta

0404 ...... OtaOta

OtaOta 4:707:024:707:02

UPDATE memberSET Address = <really long>WHERE lastname = 'Ota'

UPDATE memberSET Address = <really long>WHERE lastname = 'Ota'

Page 11: Indexes Rose-Hulman Institute of Technology Curt Clifton

Row Updates Generally do not cause rows to move Like a delete followed by an insert

Logically Sometimes practically

Batch updates touch each index once

Page 12: Indexes Rose-Hulman Institute of Technology Curt Clifton

Deletion Deletion creates “ghost records” Reclaiming space

Free pages when empty For indexed table:

Can overwrite ghost records immediately For non-indexed table:

Compact records when more space is needed for insert

Page 13: Indexes Rose-Hulman Institute of Technology Curt Clifton

Deciding What to Index

Page 14: Indexes Rose-Hulman Institute of Technology Curt Clifton

What You Need to Know Logical and Physical Database Design Data Characteristics How Data Is Used

The types of queries performed The frequency of queries that are typically

performed

Page 15: Indexes Rose-Hulman Institute of Technology Curt Clifton

Indexing Guidelines Columns to Index

Primary and foreign keys Those frequently searched in ranges Those frequently accessed in sorted order Those frequently grouped together during aggregation

Columns Not to Index Those seldom referenced in queries Those that contain few unique values Those defined with text, ntext, or image data types

Page 16: Indexes Rose-Hulman Institute of Technology Curt Clifton

Choosing the Clustered Index Heavily Updated Tables

A clustered index with an identity column keeps updated pages in memory

Sorting A clustered index keeps the data pre-sorted

Column Length and Data Type Limit the number of columns Reduce the number of characters Use the smallest data type possible

Page 17: Indexes Rose-Hulman Institute of Technology Curt Clifton

Data Characteristics – Densitylast_namelast_namelast_namelast_name first_namefirst_namefirst_namefirst_name

RandallRandall

..

..

..

JoshuaJoshua

RandallRandall CynthiaCynthia

RandallRandall TristanTristan

..

..

..

OtaOta LaniLani

..

..

..

SELECT *FROM memberWHERE last_name = ‘Ota’

SELECT *FROM memberWHERE last_name = ‘Ota’

Low Density

SELECT *FROM memberWHERE last_name = ‘Randall’

SELECT *FROM memberWHERE last_name = ‘Randall’

High Density

Page 18: Indexes Rose-Hulman Institute of Technology Curt Clifton

Data Characteristics – Selectivity How effective is a column at selecting a

subset of the data A property of a given query:

Rows matching property / Total number of rows

Page 19: Indexes Rose-Hulman Institute of Technology Curt Clifton

Determining Selectivity High selectivitymember_nomember_nomember_nomember_no

11

22

..

..

last_namelast_namelast_namelast_name first_namefirst_namefirst_namefirst_name

RandallRandall

FloodFloodJoshuaJoshua

KathieKathie

..

1000010000 AndersonAnderson BillBill

SELECT *FROM memberWHERE member_no > 8999

SELECT *FROM memberWHERE member_no > 8999

100010000 = 10%

Number of rows meeting criteriaTotal number of rows in table =

Low selectivitymember_nomember_nomember_nomember_no

11

22

..

..

last_namelast_namelast_namelast_name first_namefirst_namefirst_namefirst_name

RandallRandall

FloodFloodJoshuaJoshua

KathieKathie

..

1000010000 AndersonAnderson BillBill

SELECT *FROM memberWHERE member_no < 9001

SELECT *FROM memberWHERE member_no < 9001

900010000 = 90%

Number of rows meeting criteriaTotal number of rows in table =

Page 20: Indexes Rose-Hulman Institute of Technology Curt Clifton

Indexing to Support Queries Writing Good Search Arguments

Specify a WHERE clause in the query Verify that the WHERE clause limits the number

of rows Avoid using leading wildcards

Page 21: Indexes Rose-Hulman Institute of Technology Curt Clifton

Introduction to Statistics

Page 22: Indexes Rose-Hulman Institute of Technology Curt Clifton

Standard Distribution of Values

F - JA - E K - O P - U V - ZLast Name

Number ofLast Names

Even Distribution of Values

Last Name

Number ofLast Names

C - FA - B G - K L - N O - Z

Page 23: Indexes Rose-Hulman Institute of Technology Curt Clifton

How Statistics Are Gathered DMBS reads/samples column values

Produces an evenly distributed sorted list of values

Performs a full scan or sampling of rows Depending on size of table and granularity

wanted Selects samplings if necessary

Picks rows to be sampled Includes all rows on the page of selected rows

Page 24: Indexes Rose-Hulman Institute of Technology Curt Clifton

Creating Statistics Automatically Creating Statistics

Indexed columns that contain data Non-indexed columns that are used in a join

predicate or a WHERE clause Manually Creating Statistics

Columns that are not indexed All columns other than the first column of a

composite index

Page 25: Indexes Rose-Hulman Institute of Technology Curt Clifton

Viewing Statistics The DBCC SHOW_STATISTICS Statement

Returns Statistical Information in the Distribution Page for an Index or Column

Statistical Information Includes: The time when the statistics were last updated The number of rows sampled to produce the

histogram Density information Average key length Histogram step information

Page 26: Indexes Rose-Hulman Institute of Technology Curt Clifton

Performance Considerations Create Indexes on Foreign Keys Create the Clustered Index Before

Nonclustered Indexes Consider Creating Composite Indexes Create Multiple Indexes for a Table That Is

Read Frequently