
File and Indexing


Data Storage and Indexing

Outline

• Data Storage
  – storing data: disks
  – buffer manager
  – representing data
  – external sorting
• Indexing
  – types of indexes
  – B+ trees
  – hash tables (maybe)

Physical Addresses

• Each block and each record has a physical address that consists of:
  – The host
  – The disk
  – The cylinder number
  – The track number
  – The block within the track
  – For records: an offset in the block
    • sometimes this is in the block's header

Logical Addresses

• Logical address: a string of bytes (10-16)
• More flexible: can move blocks/records around
• But need a translation table:

    Logical address   Physical address
    L1                P1
    L2                P2
    L3                P3

Main Memory Address

• When the block is read into main memory, it receives a main memory address
• The buffer manager has another translation table:

    Memory address   Logical address
    M1               L1
    M2               L2
    M3               L3
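
As a minimal illustration (an assumption of these notes, not part of the lecture), the two tables can be represented as dictionaries and chained to decide whether a block is already buffered:

logical_to_physical = {"L1": "P1", "L2": "P2", "L3": "P3"}   # logical -> disk address
memory_to_logical = {"M1": "L1", "M2": "L2", "M3": "L3"}     # buffer manager's table

# Invert the buffer manager's table to ask: is logical block L already in memory?
logical_to_memory = {v: k for k, v in memory_to_logical.items()}

def locate(logical_addr):
    """Return ('memory', M) if the block is buffered, else ('disk', P)."""
    if logical_addr in logical_to_memory:
        return ("memory", logical_to_memory[logical_addr])
    return ("disk", logical_to_physical[logical_addr])

print(locate("L2"))   # ('memory', 'M2')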


The I/O Model of Computation

• In main memory algorithms – we care about CPU time

• In databases time is dominated by I/O cost

• Assumption: cost is given only by I/O

• Consequence: need to redesign certain algorithms

• Will illustrate here with sorting

Sorting

• Illustrates the difference in algorithm design when your data is not in main memory:
  – Problem: sort 1GB of data with 1MB of RAM
• Arises in many places in database systems:
  – Data requested in sorted order (ORDER BY)
  – Needed for grouping operations
  – First step in the sort-merge join algorithm
  – Duplicate removal
  – Bulk loading of B+ tree indexes

2-Way Merge Sort: Requires 3 Buffers

• Pass 1: Read a page, sort it, write it.
  – only one buffer page is used
• Pass 2, 3, …, etc.:
  – three buffer pages used (INPUT 1, INPUT 2, OUTPUT)

[Figure: two input buffer pages and one output buffer page in main memory, streaming runs from disk and back to disk]
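
For illustration only, a minimal Python sketch of one merge step: two sorted runs are consumed through the two input buffers and emitted through the single output buffer, one page at a time (the list-of-pages representation is an assumption of the sketch):

def merge_runs(run_a, run_b, page_size=2):
    """Merge two sorted runs (lists of sorted pages) using 3 buffer pages."""
    out_pages, out_buf = [], []          # OUTPUT buffer
    buf_a, buf_b = [], []                # INPUT 1 and INPUT 2 buffers
    ia = ib = 0                          # next page to read from each run

    def refill(buf, run, idx):
        if not buf and idx < len(run):   # "read" the next page from disk
            return list(run[idx]), idx + 1
        return buf, idx

    while True:
        buf_a, ia = refill(buf_a, run_a, ia)
        buf_b, ib = refill(buf_b, run_b, ib)
        if not buf_a and not buf_b:
            break
        # move the smaller head element of the two input buffers to the output
        if buf_b and (not buf_a or buf_b[0] < buf_a[0]):
            out_buf.append(buf_b.pop(0))
        else:
            out_buf.append(buf_a.pop(0))
        if len(out_buf) == page_size:    # output buffer full: "write" it
            out_pages.append(out_buf)
            out_buf = []
    if out_buf:
        out_pages.append(out_buf)
    return out_pages

print(merge_runs([[2, 3], [4, 6]], [[4, 7], [8, 9]]))
# [[2, 3], [4, 4], [6, 7], [8, 9]]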

Two-Way External Merge Sort

• Each pass we read + write each page in the file
• N pages in the file => the number of passes is ⌈log2 N⌉ + 1
• So the total cost is: 2N (⌈log2 N⌉ + 1) I/Os
• Improvement: start with larger runs
• Sort 1GB with 1MB memory in 10 passes

Example (file of 7 pages, 2 records per page):

  Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
  PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
  PASS 1 (2-page runs):  2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
  PASS 2 (4-page runs):  2,3,4,4,6,7,8,9 | 1,2,3,5,6
  PASS 3 (8-page run):   1,2,2,3,3,4,4,5,6,6,7,8,9
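
A small Python sketch of the cost formula above (assumption: N counts pages and the cost counts page I/Os only):

import math

def two_way_sort_cost(n_pages):
    """I/O cost of two-way external merge sort on a file of n_pages pages."""
    passes = math.ceil(math.log2(n_pages)) + 1   # pass 0 plus the merge passes
    return 2 * n_pages * passes                  # each pass reads and writes every page

print(two_way_sort_cost(7))          # 7 pages -> 4 passes -> 56 I/Os
print(two_way_sort_cost(1_000_000))  # ~1M pages -> 21 passes -> 42000000 I/Os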

Can We Do Better?

• We have more main memory
• Should use it to improve performance


Cost Model for Our Analysis

• B: Block size

• M: Size of main memory

• N: Number of records in the file

• R: Size of one record


Indexes


Outline

• Types of indexes

• B+ trees

• Hash tables (maybe)

Indexes

• An index on a file speeds up selections on the search key field(s)
• Search key = any subset of the fields of a relation
  – Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation)
• Entries in an index: (k, r), where:
  – k = the key
  – r = the record OR record id OR record ids

Index Classification

• Clustered/unclustered
  – Clustered = records sorted in the key order
  – Unclustered = not sorted
• Dense/sparse
  – Dense = each record has an entry in the index
  – Sparse = only some records have an entry
• Primary/secondary
  – Primary = on the primary key
  – Secondary = on any key
  – Some books interpret these differently

• B+ tree / Hash table / …

Clustered Index

• File is sorted on the index attribute
• Dense index: sequence of (key, pointer) pairs

[Figure: dense index with entries 10, 20, 30, ..., 80, each pointing to the data record with that key; the data file is sorted in the same order]

Clustered Index

• Sparse index: one key per data block

[Figure: sparse index with keys 10, 30, 50, 70, 90, 110, 130, 150, each pointing to the data block that starts with that key; the first blocks hold records 10-20, 30-40, 50-60, 70-80]

Clustered Index with Duplicate Keys

• Dense index: point to the first record with that key

[Figure: index keys 10, 20, 30, 40, ...; data records 10, 10, 10, 20, 20, 20, 30, 40; each index entry points to the first data record with that key]

Clustered Index with Duplicate Keys

• Sparse index: pointer to the lowest search key in each block
• Search for 20

[Figure: index entries 10, 10, 20, 30 (the lowest key of each data block); data blocks (10,10), (10,20), (20,20), (30,40). The search for 20 cannot simply jump to the first index entry equal to 20, because 20 already appears in the second block]

Clustered Index with Duplicate Keys

• Better: pointer to the lowest new search key in each block
• Search for 20

[Figure: index entries 10, 20, 30, 40, ...; data blocks (10,10), (10,20), (30,30), (40,50). Each entry is the lowest key appearing for the first time in its block, so the search for 20 goes directly to the right block]

Unclustered Indexes

• To index other attributes than the primary key
• Always dense (why?)

[Figure: dense index with keys 10, 10, 20, 20, 20, 30, 30, 30; the data records (20, 30, 30, 20, 10, 20, 10, 30) are not sorted on this key, so the pointers fan out across the file]

Summary: Clustered vs. Unclustered Index

[Figure: in both cases the index file holds the data entries, which point into the data file holding the data records. CLUSTERED: the data records are stored in (roughly) the same order as the data entries. UNCLUSTERED: the data records are in arbitrary order]

Composite Search Keys

• Composite search keys: search on a combination of fields
  – Equality query: every field value is equal to a constant value. E.g. wrt <sal,age> index:
    • age=20 and sal=75
  – Range query: some field value is not a constant. E.g.:
    • age=20; or age=20 and sal > 10

Examples of composite key indexes using lexicographic order
(data records sorted by name; data entries in each index in sorted order):

  name  age  sal      <age,sal>   <sal,age>   <age>   <sal>
  bob    12   10        11,80       10,12       11      10
  cal    11   80        12,10       20,12       12      20
  joe    12   20        12,20       75,13       12      75
  sue    13   75        13,75       80,11       13      80
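
A minimal Python sketch of why lexicographic order helps: with a <sal,age> index, equality on both fields and a range on the leading field are both answered by binary search over sorted tuples (the tuple representation and bisect are stand-ins for the index, an assumption of the sketch):

import bisect

# Hypothetical data entries for a <sal,age> index: (sal, age, record_id)
entries = sorted([(10, 12, "bob"), (80, 11, "cal"), (20, 12, "joe"), (75, 13, "sue")])

# Equality query on both fields (sal=75 and age=13): one binary search.
i = bisect.bisect_left(entries, (75, 13))
print(entries[i])                     # (75, 13, 'sue')

# Range query on the leading field (sal > 10): scan from the first qualifying entry.
start = bisect.bisect_right(entries, (10, float("inf")))
print(entries[start:])                # every entry with sal > 10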

B+ Trees

• Search trees
• Idea in B Trees:
  – make 1 node = 1 block
• Idea in B+ Trees:
  – make leaves into a linked list (range queries are easier)

B+ Trees Basics

• Parameter d = the degree
• Each node has >= d and <= 2d keys (except the root)

[Figure: internal node with keys 30, 120, 240 and four child pointers, for keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k]

• Each leaf has >= d and <= 2d keys

[Figure: leaf with keys 40, 50, 60; each key points to its record, and the leaf points to the next leaf]

B+ Tree Example (d = 2)

[Figure: root with key 80; children with keys (20, 60) and (100, 120, 140); leaves (10, 15, 18), (20, 30, 40, 50), (60, 65), (80, 85, 90), each pointing to the records with those keys]

B+ Tree Design

• How large should d be?
• Example:
  – Key size = 4 bytes
  – Pointer size = 8 bytes
  – Block size = 4096 bytes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
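
A quick check of this arithmetic in Python, using the example's sizes:

key_size, ptr_size, block_size = 4, 8, 4096

# Largest d such that 2d keys and 2d+1 pointers fit in one block:
# 2d*key_size + (2d+1)*ptr_size <= block_size
d = (block_size - ptr_size) // (2 * key_size + 2 * ptr_size)
print(d)   # 170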

Searching a B+ Tree

• Exact key values:
  – Start at the root
  – Proceed down to the leaf

  Select name
  From people
  Where age = 25

• Range queries:
  – As above
  – Then sequential traversal

  Select name
  From people
  Where 20 <= age and age <= 30
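
For illustration, a minimal Python sketch of both kinds of search; the dict-based node layout (keys, children, next) is an assumption of the sketch, not the lecture's data structure:

def search(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while not node["is_leaf"]:
        i = 0
        while i < len(node["keys"]) and key >= node["keys"][i]:
            i += 1                       # find the child whose key range covers `key`
        node = node["children"][i]
    return node

def range_scan(root, lo, hi):
    """Range query: descend once, then follow the leaf-level linked list."""
    leaf = search(root, lo)
    while leaf is not None:
        for k in leaf["keys"]:
            if lo <= k <= hi:
                yield k
            elif k > hi:
                return
        leaf = leaf["next"]

leaf1 = {"is_leaf": True, "keys": [10, 15, 18], "next": None}
leaf2 = {"is_leaf": True, "keys": [20, 30, 40, 50], "next": None}
leaf1["next"] = leaf2
root = {"is_leaf": False, "keys": [20], "children": [leaf1, leaf2]}

print(search(root, 30)["keys"])          # [20, 30, 40, 50]
print(list(range_scan(root, 15, 30)))    # [15, 18, 20, 30]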

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%
  – average fanout = 133
• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in the buffer pool:
  – Level 1 = 1 page = 8 KBytes
  – Level 2 = 133 pages = 1 MByte
  – Level 3 = 17,689 pages = 133 MBytes

Insertion in a B+ Tree

Insert (K, P)
• Find the leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split the node, insert in parent:

  [K1 K2 K3 K4 K5 / P0 P1 P2 P3 P4 P5]   splits into
  [K1 K2 / P0 P1 P2]  and  [K4 K5 / P3 P4 P5],  and (K3, pointer to the new node) goes to the parent

• If leaf, keep K3 too in the right node
• When the root splits, the new root has 1 key only
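
A minimal Python sketch of the leaf-split rule for d = 2; the (key, pointer) tuples are illustrative, and the middle key is only returned, not actually inserted into a parent node:

d = 2   # degree: a leaf may hold at most 2d = 4 keys

def insert_into_leaf(leaf, key, ptr):
    """Insert (key, ptr) into a sorted leaf; split on overflow (2d+1 keys)."""
    leaf = sorted(leaf + [(key, ptr)])
    if len(leaf) <= 2 * d:
        return leaf, None, None                    # no overflow
    mid = len(leaf) // 2                           # position of the middle key (K3)
    left, right = leaf[:mid], leaf[mid:]           # K3 stays in the right leaf
    return left, right, right[0][0]                # middle key is copied up to the parent

leaf = [(10, "r10"), (15, "r15"), (18, "r18"), (19, "r19")]
left, right, up = insert_into_leaf(leaf, 12, "r12")
print(left, right, up)
# [(10, 'r10'), (12, 'r12')] [(15, 'r15'), (18, 'r18'), (19, 'r19')] 15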

Insertion in a B+ Tree

• Insert K=19

[Figure: the example tree with d = 2; root 80; inner nodes (20, 60) and (100, 120, 140); leaves (10, 15, 18), (20, 30, 40, 50), (60, 65), (80, 85, 90)]

Insertion in a B+ Tree

• After the insertion

[Figure: the leaf (10, 15, 18) has become (10, 15, 18, 19); the rest of the tree is unchanged]

Insertion in a B+ Tree

• Now insert 25

[Figure: same tree; the target leaf (20, 30, 40, 50) is already full]

Insertion in a B+ Tree

• After the insertion

[Figure: the leaf temporarily holds (20, 25, 30, 40, 50), i.e. 2d+1 = 5 keys]

Insertion in a B+ Tree

• But now we have to split!

[Figure: same tree, with the overfull leaf (20, 25, 30, 40, 50)]

Insertion in a B+ Tree

• After the split

[Figure: the leaf splits into (20, 25) and (30, 40, 50); key 30 is copied into the parent, whose keys become (20, 30, 60)]

Deletion from a B+ Tree

• Delete 30

[Figure: current tree: root 80; inner nodes (20, 30, 60) and (100, 120, 140); leaves (10, 15, 18, 19), (20, 25), (30, 40, 50), (60, 65), (80, 85, 90)]

Deletion from a B+ Tree

• After deleting 30

[Figure: the leaf becomes (40, 50); the separator 30 in the parent may be changed to 40, or not]

Deletion from a B+ Tree

• Now delete 25

[Figure: same tree; the target leaf is (20, 25)]

Deletion from a B+ Tree

• After deleting 25
• Need to rebalance
• Rotate: borrow a key from the left sibling

[Figure: the leaf now holds only 20, fewer than d = 2 keys; its left sibling (10, 15, 18, 19) has a key to spare]

Deletion from a B+ Tree

• Now delete 40

[Figure: after the rotation the parent's keys are (19, 30, 60) and the leaves are (10, 15, 18), (19, 20), (40, 50), (60, 65), (80, 85, 90)]

Deletion from a B+ Tree

• After deleting 40
• Rotation is not possible (the siblings have only d keys each)
• Need to merge nodes

[Figure: the underfull leaf holds only 50; it is merged with its left sibling (19, 20)]

Deletion from a B+ Tree

• Final tree

[Figure: root 80; inner nodes (19, 60) and (100, 120, 140); leaves (10, 15, 18), (19, 20, 50), (60, 65), (80, 85, 90)]

Hash Tables

• Secondary storage hash tables are much like main memory ones
• Recall the basics:
  – There are n buckets
  – A hash function f(k) maps a key k to {0, 1, …, n-1}
  – Store in bucket f(k) a pointer to the record with key k
• Secondary storage: bucket = block; use overflow blocks when needed
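
A minimal Python sketch of bucket-as-block with overflow chaining; the capacity of 2 keys per block matches the example on the next slide, and the hash function and names are assumptions of the sketch:

BLOCK_CAPACITY = 2   # keys (+ pointers) per block
N_BUCKETS = 4

def h(key):
    return hash(key) % N_BUCKETS

# Each bucket is a chain of blocks: one primary block plus any overflow blocks.
buckets = [[[]] for _ in range(N_BUCKETS)]

def insert(key, rid):
    chain = buckets[h(key)]
    if len(chain[-1]) == BLOCK_CAPACITY:   # last block full -> add an overflow block
        chain.append([])
    chain[-1].append((key, rid))

def lookup(key):
    # costs one disk read per block in the chain of bucket h(key)
    for block in buckets[h(key)]:
        for k, rid in block:
            if k == key:
                return rid
    return None

for x in ["a", "b", "c", "d", "e", "f", "g"]:
    insert(x, "rid_" + x)
print(lookup("e"))   # 'rid_e'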

Hash Table Example

• Assume 1 bucket (block) stores 2 keys + pointers
• h(e)=0
• h(b)=h(f)=1
• h(g)=2
• h(a)=h(c)=3

[Figure: bucket 0 holds e; bucket 1 holds b, f; bucket 2 holds g; bucket 3 holds a, c]

Searching in a Hash Table

• Search for a:
• Compute h(a)=3
• Read bucket 3
• 1 disk access

[Figure: the same four buckets; the search goes directly to bucket 3, which holds a and c]

Insertion in Hash Table

• Place in the right bucket, if there is space
• E.g. h(d)=2

[Figure: d is stored in bucket 2 next to g; the other buckets are unchanged]

Insertion in Hash Table

• Create an overflow block, if there is no space
• E.g. h(k)=1
• More overflow blocks may be needed

[Figure: bucket 1 already holds b and f, so k goes into a new overflow block chained to bucket 1]

Hash Table Performance

• Excellent, if no overflow blocks
• Degrades considerably when the number of keys exceeds the number of buckets (i.e., many overflow blocks)

Extensible Hash Table

• Allows the hash table to grow, to avoid performance degradation
• Assume a hash function h that returns numbers in {0, …, 2^k – 1}
• Start with n = 2^i << 2^k; only look at the first i most significant bits
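
A minimal Python sketch of the directory mechanics; it follows the slides' example (k = 4 bits, 2 keys per block), but the class layout and splitting code are an assumption of the sketch, not a definitive implementation:

K = 4            # the hash function returns K-bit values
CAPACITY = 2     # keys per block, as in the example that follows

class Block:
    def __init__(self, depth):
        self.depth = depth      # how many leading bits this block actually uses
        self.keys = []

def top_bits(key, i):
    """The i most significant bits of a K-bit key, as an integer."""
    return key >> (K - i)

class ExtensibleHash:
    def __init__(self):
        self.i = 1
        self.dir = [Block(1), Block(1)]   # directory indexed by the first i bits

    def insert(self, key):
        b = self.dir[top_bits(key, self.i)]
        if len(b.keys) < CAPACITY:
            b.keys.append(key)
            return
        if b.depth == self.i:             # directory too small: double it
            self.dir = [blk for blk in self.dir for _ in (0, 1)]
            self.i += 1
        # split the overflowing block on one more bit
        b.depth += 1
        b0, b1 = Block(b.depth), Block(b.depth)
        for j, blk in enumerate(self.dir):
            if blk is b:
                self.dir[j] = b1 if (j >> (self.i - b.depth)) & 1 else b0
        for k in b.keys + [key]:          # redistribute (may split again)
            self.insert(k)

ht = ExtensibleHash()
for key in [0b0010, 0b1011, 0b1110, 0b1010]:   # the keys used in the slides
    ht.insert(key)
print(ht.i)                                                  # 2
print([[format(k, "04b") for k in blk.keys] for blk in ht.dir])
# [['0010'], ['0010'], ['1011', '1010'], ['1110']]  (entries 00 and 01 share one block)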

Extensible Hash Table

• E.g. i=1, n=2, k=4
• Note: we only look at the first bit (0 or 1)

[Figure: directory with i=1 and entries 0 and 1; entry 0 points to a block holding 0(010), entry 1 to a block holding 1(011); each block is labeled with the number of bits it uses (here 1)]

Insertion in Extensible Hash Table

• Insert 1110

[Figure: 1110 goes into the block for first bit 1, which now holds 1(011) and 1(110)]

Insertion in Extensible Hash Table

• Now insert 1010
• Need to extend the table, split the block
• i becomes 2

[Figure: the block for first bit 1 would have to hold 1(011), 1(110) and 1(010) — one key too many]

Insertion in Extensible Hash Table

• After extending the directory and splitting the block: i = 2

[Figure: directory entries 00, 01, 10, 11; entries 00 and 01 both point to the block holding 0(010), which still uses 1 bit; entry 10 points to the block holding 10(11) and 10(10); entry 11 points to the block holding 11(10)]

Insertion in Extensible Hash Table

• Now insert 0000, then 0101
• Need to split the block

[Figure: the block shared by directory entries 00 and 01 would have to hold 0(010), 0(000) and 0(101) — one key too many; the blocks for 10 and 11 are unchanged]

Insertion in Extensible Hash Table

• After splitting the block

[Figure: i is still 2; entry 00 points to a block holding 00(10), 00(00); entry 01 to a block holding 01(01); entry 10 to the block holding 10(11), 10(10); entry 11 to the block holding 11(10); all blocks now use 2 bits]

Performance of Extensible Hash Tables

• No overflow blocks: an access is always one read
• BUT:
  – Extensions can be costly and disruptive
  – After an extension, the table may no longer fit in memory

Linear Hash Table

• Idea: extend only one entry at a time
• Problem: n is no longer a power of 2
• Let i be such that 2^(i-1) < n <= 2^i
• After computing h(k), use the last i bits:
  – If the last i bits represent a number m >= n, change the most significant of those bits from 1 to 0 (this gives a number < n)
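
A minimal Python sketch of this bucket-selection rule (h(k) is assumed to already be a non-negative integer):

def bucket_number(hash_value, n, i):
    """Bucket of a linear hash table with n buckets, where 2**(i-1) < n <= 2**i."""
    m = hash_value & ((1 << i) - 1)    # last i bits
    if m >= n:
        m -= 1 << (i - 1)              # flip the most significant of the i bits
    return m

# n = 3 buckets, i = 2 bits: hash values ending in 11 are redirected to bucket 01.
for hv in [0b0100, 0b1100, 0b1010, 0b0111]:
    print(format(hv, "04b"), "->", format(bucket_number(hv, 3, 2), "02b"))
# 0100 -> 00, 1100 -> 00, 1010 -> 10, 0111 -> 01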

Linear Hash Table Example

• n=3, i=2

[Figure: buckets 00, 01, 10. Bucket 00 holds (01)00 and (11)00; bucket 10 holds (10)10. Key (01)11 ends in 11, which is >= n, so the bit is flipped (BIT FLIP) and it goes into bucket 01]

Linear Hash Table Example

• Insert 1000: overflow blocks…

[Figure: (10)00 hashes to bucket 00, which already holds (01)00 and (11)00, so it goes into an overflow block chained to bucket 00; bucket 01 holds (01)11, bucket 10 holds (10)10]

Linear Hash Tables

• Extension: independent of overflow blocks
• Extend n := n+1 when the average number of records per block exceeds (say) 80% of the block capacity

Linear Hash Table Extension

• From n=3 to n=4
• Only need to touch one block (which one?)

[Figure: before, with n=3: bucket 00 holds (01)00, (11)00; bucket 01 holds (01)11; bucket 10 holds (10)10. After adding bucket 11 (n=4, i still 2), the key (01)11 moves from bucket 01 to the new bucket 11; no other bucket is touched]

Linear Hash Table Extension

• From n=3 to n=4: finished
• Extension from n=4 to n=5 (new bit)
• Need to touch every single block (why?)

[Figure: the n=4 table: bucket 00 holds (01)00, (11)00; bucket 01 is empty; bucket 10 holds (10)10; bucket 11 holds (01)11. Going to n=5 adds a third bit, so every key's bucket address must be recomputed]