Index Variations

Indexed Files - Part Three: Analyzing the Options


Second Level Index in Multiple Files

The main index points to multiple secondary index files, each of which is one cluster in size (e.g., 8 KB).

Main index:

Product ID     Secondary Index Filename
AD1089-A       IF1
CNR-5439-Z     IF2
HDD-8208-12    IF3
GRA12          IF4
MONT-880       IF5
TRS-2012A      IF6
ZWD-1200       IF7

Each secondary index file (IF1, IF2, ...) holds Product ID / DRRN pairs.

Each 1:1 index file must have room to grow.
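A lookup through this structure can be sketched in Python. The sketch assumes (the slide does not say so) that each main-index entry holds the lowest Product ID stored in its secondary file and that each secondary file is a sorted list of (Product ID, DRRN) pairs. Reading a one-cluster file from disk is simulated by a dictionary lookup; `MAIN_INDEX`, `SECONDARY_FILES`, `lookup`, and all keys and DRRNs below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical main index: (lowest Product ID in the file, secondary index filename).
# It is sorted by Product ID and small enough to stay in memory.
MAIN_INDEX = [("AA-100", "IF1"), ("CC-300", "IF2"), ("EE-500", "IF3")]

# Hypothetical secondary index files, each a sorted 1:1 list of (Product ID, DRRN).
# A real system would read one cluster (e.g. 8 KB) from disk; a dict stands in here.
SECONDARY_FILES = {
    "IF1": [("AA-100", 4), ("AB-110", 0), ("BB-200", 7)],
    "IF2": [("CC-300", 2), ("CD-310", 5)],
    "IF3": [("EE-500", 1), ("FF-600", 3), ("ZZ-900", 6)],
}


def lookup(product_id):
    """Return the DRRN for product_id, or None if it is not indexed."""
    main_keys = [key for key, _ in MAIN_INDEX]
    # Last main-index entry whose key is <= product_id.
    pos = bisect_right(main_keys, product_id) - 1
    if pos < 0:
        return None  # smaller than every indexed key
    filename = MAIN_INDEX[pos][1]
    entries = SECONDARY_FILES[filename]  # simulated read of one cluster
    keys = [key for key, _ in entries]
    i = bisect_right(keys, product_id) - 1
    if i >= 0 and keys[i] == product_id:
        return entries[i][1]
    return None


print(lookup("CD-310"))  # 5
print(lookup("BB-200"))  # 7
print(lookup("XX-000"))  # None
```

Only one secondary file is touched per search, which is why each file is kept to a single cluster; leaving free space in each cluster lets new entries be added without splitting the file.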

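Sparse Second Level Index

Q: Assuming two levels, does the second level need to be 1:1 (one index entry per data record)?

A: It depends. If the data file is sorted, then no; if the data file is unsorted, then yes. In a sorted file, a target record lies between the records named by two consecutive index entries, so an entry for every k-th record is enough; in an unsorted file, a record could be anywhere, so every record needs its own entry.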

Key / DRRN (one entry per data record):

Key   DRRN
0     0
1     1
2     2
3     3
4     4
5     5
6     6
7     7
8     8
9     9
10    10
11    11
12    12
...   ...

Key / IRRN (one entry for every 20th record):

Key   IRRN
0     0
20    20
40    40
60    60
80    80
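A minimal Python sketch of a sparse level over a sorted data file, assuming the same layout as the tables above (key k stored at DRRN k) and one index entry per 20 records. Here the sparse level points straight into the sorted data file; `BLOCK`, `data_file`, `sparse_index`, and `sparse_lookup` are hypothetical names.

```python
from bisect import bisect_right

BLOCK = 20  # hypothetical: one sparse-index entry per 20 sorted records

# Sorted data file, simulated as a list; position in the list is the DRRN.
# As in the slide's example, key k is stored at DRRN k.
data_file = [{"key": k, "payload": f"record {k}"} for k in range(100)]

# Sparse index: (key, DRRN) for every BLOCK-th record only.
sparse_index = [(data_file[r]["key"], r) for r in range(0, len(data_file), BLOCK)]


def sparse_lookup(key):
    """Return the DRRN holding key, or None. Valid only because data_file is sorted."""
    index_keys = [k for k, _ in sparse_index]
    pos = bisect_right(index_keys, key) - 1  # last index entry with key <= target
    if pos < 0:
        return None
    start = sparse_index[pos][1]
    # Scan at most one block of the data file, beginning at the indexed record.
    for drrn in range(start, min(start + BLOCK, len(data_file))):
        if data_file[drrn]["key"] == key:
            return drrn
    return None


print(sparse_lookup(47))   # 47
print(sparse_lookup(150))  # None
```

If `data_file` were unsorted, the block scan could miss a key that lives elsewhere in the file, which is why an unsorted file needs a 1:1 second level.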

Multiple Indexes for Multiple Keys

If the data file is sorted by Customer Name, it cannot also be sorted by Customer ID (the Acct Num column), so each key gets its own index. Please note that names are terrible keys, because names are not unique.

Data file:

DRRN  Name      Acct Num  Yadda yadda
0     Carey     1001
1     Foster    1003
2     Barnes    1004
3     Zinn      1009
4     Critter   1002
5     Faulk     1011
6     Adams     1010
7     Wilks     1005
8     Bishop    1013
9     Farrow    1012
10    Duncan    1014
11    Dinkins   1015
12    West      1020
...   ...       ...
18    Bell      1042
19    Conner    1044
20    Davis     1043
21    Dannelly  1045
...   ...       ...
80    Camp      2134
81    Young     2135
...   ...       ...
98    Fuller    2598
99    Crook     2600

Name index:

RRN  Name      DRRN
0    Adams     6
1    Barnes    2
2    Bell      18
3    Bishop    8
4    Camp      80
5    Carey     0
6    Conner    19
7    Critter   4
8    Crook     99
9    Dannelly  21
10   Davis     20
11   ...       ...

Acct Num index:

RRN  AcctNum  DRRN
0    1001     0
1    1002     4
2    1003     1
3    1004     2
4    1005     7
5    1009     3
6    1010     6
7    1011     5
8    1012     9
9    1013     8
10   1014     10
11   ...      ...
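Both indexes can be built from the same unsorted data file. A minimal Python sketch, using the first eleven records from the tables above; each index is a sorted list of (key, DRRN) pairs, and `find_all` (a hypothetical helper) returns a list because names are not unique.

```python
from bisect import bisect_left

# First records from the slide's data file: (DRRN, Name, Acct Num).
DATA_FILE = [
    (0, "Carey", 1001), (1, "Foster", 1003), (2, "Barnes", 1004),
    (3, "Zinn", 1009), (4, "Critter", 1002), (5, "Faulk", 1011),
    (6, "Adams", 1010), (7, "Wilks", 1005), (8, "Bishop", 1013),
    (9, "Farrow", 1012), (10, "Duncan", 1014),
]

# One index per key: each is a sorted list of (key, DRRN).
# The data file itself stays in its original, unsorted order.
name_index = sorted((name, drrn) for drrn, name, _ in DATA_FILE)
acct_index = sorted((acct, drrn) for drrn, _, acct in DATA_FILE)


def find_all(index, key):
    """Return every DRRN whose key equals `key` (a list, since keys such as names may repeat)."""
    i = bisect_left(index, (key,))  # (key,) sorts before any (key, drrn) pair
    result = []
    while i < len(index) and index[i][0] == key:
        result.append(index[i][1])
        i += 1
    return result


print(find_all(name_index, "Adams"))  # [6]
print(find_all(acct_index, 1012))     # [9]
```

Appending a hypothetical second customer `(50, "Adams", 1099)` to `DATA_FILE` before building the indexes would make `find_all(name_index, "Adams")` return `[6, 50]`, while the account-number lookup stays unique.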

Multiple Indexes for Multiple "Views"

Suppose there are different types of users, with different levels of access. Each type of user gets its own index, which contains entries only for the records that user may see. On Linux this is implemented via "set effective user ID".

All Accts

AcctNum  DRRN
1001     0
1002     4
1003     1
1004     2
1005     7
1009     3
1010     6
1011     5
1012     9
1013     8
1014     10
...      ...

SE Region Accts

AcctNum  DRRN
1001     0
1002     4
1004     2
1005     7
1009     3
1012     9
1013     8
1014     10
1042     18
1044     19

Data file: the same records as in the previous slide.

West Region Accts

AcctNum  DRRN
1003     1
1010     6
1011     5
1015     11
1020     12
1021     13
1022     14
1043     20
...      ...
2598     98
2600     99
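The view idea can be sketched in Python as one index per user type, each covering only that type's accounts. The index contents come from the tables above; the role names, `VIEW_FOR_ROLE`, and `lookup_account` are hypothetical. This sketch does not model the operating-system side: on the slide, "set effective user ID" is what keeps a user from bypassing their view and reading the data file directly.

```python
# Per-view indexes (Acct Num -> DRRN), taken from the slide's tables.
ALL_ACCTS = {1001: 0, 1002: 4, 1003: 1, 1004: 2, 1005: 7, 1009: 3,
             1010: 6, 1011: 5, 1012: 9, 1013: 8, 1014: 10}
SE_REGION = {1001: 0, 1002: 4, 1004: 2, 1005: 7, 1009: 3,
             1012: 9, 1013: 8, 1014: 10, 1042: 18, 1044: 19}
WEST_REGION = {1003: 1, 1010: 6, 1011: 5, 1015: 11, 1020: 12,
               1021: 13, 1022: 14, 1043: 20, 2598: 98, 2600: 99}

# Hypothetical mapping from user type to the only index that user may search.
VIEW_FOR_ROLE = {"admin": ALL_ACCTS, "se_manager": SE_REGION, "west_manager": WEST_REGION}


def lookup_account(role, acct_num):
    """Return the DRRN visible to this role, or None if the account is outside its view."""
    view = VIEW_FOR_ROLE[role]
    return view.get(acct_num)


print(lookup_account("se_manager", 1042))    # 18
print(lookup_account("west_manager", 1042))  # None
```

All three views point into the same data file; only the set of reachable DRRNs differs.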

Three Level Index

Level One

Key      IRRN
Adams    0
Ingram   3
Randall  6
Young    9

Level Two

RRN  Key      IRRN
0    Adams    0
1    Davis    10
2    Foster   20
3    Ingram   30
4    Lambert  40
5    Norris   50
6    Randall  60
7    Tyler    70
8    West     80
9    Young    90

Level Three

RRN  Key       DRRN
0    Adams     6
1    Barnes    2
2    Bell      18
3    Bishop    8
4    Camp      80
5    Carey     0
6    Conner    19
7    Critter   4
8    Crook     99
9    Dannelly  21
10   Davis     20
11   Dinkins   11
12   Duncan    10
...  ...       ...
18   Faulk     5
19   Farrow    9
20   Foster    1
21   Fuller    98
...  ...       ...
80   West      12
81   Wilks     7
...  ...       ...
98   Zane      81
99   Zinn      3

Data File

DRRN  Name      Acct Num  Address  Yadda yadda
0     Carey
1     Foster
2     Barnes
3     Zinn
4     Critter
5     Faulk
6     Adams
7     Wilks
8     Bishop
9     Farrow
10    Duncan
11    Dinkins
12    West
...   ...
18    Bell
19    Conner
20    Davis
21    Dannelly
...   ...
80    Camp
81    Zane
...   ...
98    Fuller
99    Crook
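A multi-level index like this can be built and searched with a short Python sketch. It assumes every level uses the same fan-out, which the slide's small example does not (its Level One has one entry per three Level Two records). Python lists stand in for index files, and the slice searched at each lower level corresponds to one section read from disk; `build_levels` and `search` are hypothetical names.

```python
from bisect import bisect_right


def build_levels(entries, fanout):
    """entries: sorted 1:1 list of (key, DRRN). Returns levels from top (Level One) down.

    Each upper level keeps one (key, IRRN) pair per `fanout` entries of the level below,
    where IRRN is the position of that entry in the level below.
    """
    levels = [entries]
    while len(levels[0]) > fanout:
        below = levels[0]
        levels.insert(0, [(below[i][0], i) for i in range(0, len(below), fanout)])
    return levels


def search(levels, key, fanout):
    """Walk from Level One down to the 1:1 level; return the DRRN or None."""
    lo, hi = 0, len(levels[0])  # the whole top level is searched
    for depth, level in enumerate(levels):
        keys = [k for k, _ in level[lo:hi]]
        pos = bisect_right(keys, key) - 1  # last entry whose key <= target
        if pos < 0:
            return None
        k, ptr = level[lo + pos]
        if depth == len(levels) - 1:  # 1:1 level: ptr is a DRRN
            return ptr if k == key else None
        # ptr is an IRRN: search only the next `fanout` entries one level down.
        lo, hi = ptr, min(ptr + fanout, len(levels[depth + 1]))
    return None


# 1,000 sorted keys pointing at DRRNs in an unsorted data file (hypothetical data).
entries = [(2 * i, 999 - i) for i in range(1000)]
levels = build_levels(entries, fanout=10)
print([len(level) for level in levels])  # [10, 100, 1000]
print(search(levels, 1234, 10))          # 382
```

With a fan-out of 10 and 1,000 entries, three levels appear, and each search touches 10 entries per level instead of scanning the full 1:1 index.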

When to Use 3 Levels

When the number of records in the data file is very large.

Assume each index record in Levels Two and Three takes 28 bytes:
key = product ID = 20 bytes
RRN = long int = 8 bytes

Best maximum index section for internal sorting = one cluster = 292 records (8 KB cluster / 28 B = 8,192 / 28 ≈ 292.6)

If N = 1,000,000 data records, then:
Level Three contains 1,000,000 index records
Level Two contains 3,425 index records (1,000,000 / 292 ≈ 3,424.7, rounded up)
Level One contains 12 index records (3,425 / 292 ≈ 11.7, rounded up)
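The level sizes can be computed with a short Python function. It assumes an 8 KB cluster of 8,192 bytes, index records packed with no per-cluster overhead, and one upper-level entry per full cluster of the level below; `level_sizes` is a hypothetical name.

```python
def level_sizes(n_records, key_bytes=20, rrn_bytes=8, cluster_bytes=8192):
    """Return (records per cluster, index sizes from the 1:1 level up to the top level)."""
    per_cluster = cluster_bytes // (key_bytes + rrn_bytes)  # whole records only
    sizes = [n_records]  # bottom index level is 1:1 with the data file
    while sizes[-1] > per_cluster:  # add levels until one fits in a single cluster
        sizes.append(-(-sizes[-1] // per_cluster))  # ceiling division
    return per_cluster, sizes


per_cluster, sizes = level_sizes(1_000_000)
print(per_cluster)  # 292
print(sizes)        # [1000000, 3425, 12]
```

The loop stops as soon as a level fits in one cluster, so the number of levels grows only with the logarithm of N.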

Timing Analysis 1

Given a data file with:
100,000 records
no index
sorted

How much time does a search take on average?

Using a binary search of the file: about $\log_2 N$ reads of data file records.

$\log_2 100{,}000 \approx 16.6$, rounded up to 17 reads of data file records.
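The read count can be checked with a Python sketch that treats each probe of the binary search as one data file record read. A list of 100,000 sorted integers stands in for the file; `binary_search_reads` is a hypothetical name.

```python
import math


def binary_search_reads(sorted_keys, key):
    """Binary search a sorted 'file'; return (DRRN or None, number of record reads)."""
    lo, hi = 0, len(sorted_keys) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1  # each probe reads one data file record
        if sorted_keys[mid] == key:
            return mid, reads
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, reads


N = 100_000
data = list(range(N))  # stand-in for a sorted file of 100,000 keys
counts = [binary_search_reads(data, k)[1] for k in range(N)]
print(math.ceil(math.log2(N)))  # 17
print(max(counts))              # 17
print(sum(counts) / N)          # average reads over all successful searches
```

The worst case matches the rounded-up $\log_2 N$; the average over all successful searches comes out a little lower.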

Timing Analysis 2

Given an indexed file with:
100,000 records
two index levels
top-level index size = 1,000 elements

How much time does a search take on average?

Level One: linear search in memory, time = N / 2 = 1,000 / 2 = 500 memory compares.

Level Two: each top-level entry covers 100,000 / 1,000 = 100 second-level index records. Read that portion into memory = 1 read of 100 index records; binary search that portion = $\log_2 100 \approx 6.6$, rounded up to 7 memory compares.

Data File: 1 data file record read.
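The counts above can be reproduced with a Python simulation. It assumes a 1:1 second level where key k maps to DRRN k, and a top level holding every 100th second-level key; lists stand in for files, and `indexed_search` is a hypothetical name that returns the DRRN together with the work it did.

```python
N_RECORDS = 100_000
TOP_SIZE = 1_000
PORTION = N_RECORDS // TOP_SIZE  # 100 second-level index records per top-level entry

# Hypothetical index: second level is 1:1 (key k -> DRRN k);
# the top level keeps (key, IRRN) for every PORTION-th second-level record.
second_level = [(k, k) for k in range(N_RECORDS)]
top_level = [(second_level[i][0], i) for i in range(0, N_RECORDS, PORTION)]


def indexed_search(key):
    """Return (DRRN or None, top-level compares, portion compares, disk reads)."""
    # Level One: linear search in memory for the last entry whose key <= key.
    top_compares, start = 0, None
    for k, irrn in top_level:
        top_compares += 1
        if k > key:
            break
        start = irrn
    if start is None:
        return None, top_compares, 0, 0
    # Level Two: one read brings in PORTION index records; binary search them in memory.
    portion = second_level[start:start + PORTION]
    reads = 1
    lo, hi, compares = 0, len(portion) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        compares += 1
        if portion[mid][0] == key:
            return portion[mid][1], top_compares, compares, reads + 1  # + 1 data file read
        if portion[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, top_compares, compares, reads


print(indexed_search(54321))
```

Averaged over many keys, the top-level count comes out near 1,000 / 2, the portion search never needs more than 7 compares, and every successful search costs exactly 2 disk reads.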

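To index, or not to index...

No index file: 17 reads of data file records.

Two levels of index: 507 memory compares (500 + 7), 1 read of 100 index records, and 1 data file record read.

Because a disk read costs far more than an in-memory compare, the indexed file needs only 2 reads instead of 17.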
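The comparison can be made concrete with a small cost model. The unit costs below (10 ms per random disk read, 10 ns per in-memory compare) are assumptions chosen only for illustration; real values depend on the hardware. `estimated_time` is a hypothetical name, and the no-index case counts only the reads, as the slide does.

```python
# Hypothetical unit costs; real values depend on the hardware.
DISK_READ_SECONDS = 0.010       # assumed 10 ms per random disk read
MEMORY_COMPARE_SECONDS = 1e-8   # assumed 10 ns per in-memory compare


def estimated_time(disk_reads, memory_compares):
    """Estimated search time in seconds under the assumed unit costs."""
    return disk_reads * DISK_READ_SECONDS + memory_compares * MEMORY_COMPARE_SECONDS


no_index = estimated_time(disk_reads=17, memory_compares=0)     # slide counts only reads
two_levels = estimated_time(disk_reads=2, memory_compares=507)  # 1 index read + 1 data read
print(f"no index:   {no_index:.6f} s")    # no index:   0.170000 s
print(f"two levels: {two_levels:.6f} s")  # two levels: 0.020005 s
```

Under almost any realistic ratio of read cost to compare cost, the 507 compares are negligible and the read count decides the result.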