calvin-briggs
Index Variations
Indexed Files - Part Three
Analyzing the Options
Second Level Index in Multiple Files
Main index points to multiple secondary index files, which are each one cluster in size (e.g. 8KB).
Product ID     Secondary Index Filename
AD1089-A       IF1
CNR-5439-Z     IF2
HDD-8208-12    IF3
GRA12          IF4
MONT-880       IF5
TRS-2012A      IF6
ZWD-1200       IF7
Each secondary index file holds (Product ID, DRRN) records.
Must have space to allow each 1:1 Index File to grow.
Sparse Second Level Index
Q: Assuming 2 levels, does the second level need to be 1:1?
A: It depends. If the datafile is sorted, then no. If the datafile is unsorted, then yes.
Key DRRN
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
. . . . . .
Key IRRN
0 0
20 20
40 40
60 60
80 80
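The sorted case in the answer above can be sketched in Python. This is a minimal illustration, not the slides' own code: the function names `build_sparse_index` and `sparse_lookup` are hypothetical, and the data file is assumed to be an in-memory list of sorted keys whose position is the DRRN. The sparse index keeps one entry per 20 records, and a lookup scans only the block that the chosen entry covers.

```python
from bisect import bisect_right

def build_sparse_index(sorted_keys, step):
    """Keep one (key, DRRN) entry for every `step`-th record of a sorted data file."""
    return [(sorted_keys[drrn], drrn) for drrn in range(0, len(sorted_keys), step)]

def sparse_lookup(sparse_index, sorted_keys, target):
    """Return the DRRN of `target`, or None if it is not in the data file."""
    index_keys = [key for key, _ in sparse_index]
    pos = bisect_right(index_keys, target) - 1      # last entry with key <= target
    if pos < 0:
        return None                                 # target sorts before every key
    start = sparse_index[pos][1]
    if pos + 1 < len(sparse_index):
        end = sparse_index[pos + 1][1]              # block ends where the next one starts
    else:
        end = len(sorted_keys)
    for drrn in range(start, end):                  # scan only this block of the data file
        if sorted_keys[drrn] == target:
            return drrn
    return None

# Assumed data: 100 records whose key equals their DRRN, as in the tables above.
data_keys = list(range(100))
index = build_sparse_index(data_keys, 20)
print(index)                                # [(0, 0), (20, 20), (40, 40), (60, 60), (80, 80)]
print(sparse_lookup(index, data_keys, 47))  # 47
```

If the data file were unsorted, the block scan would not be valid, which is why the index must then hold one entry per record.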
Multiple Indexes for Multiple Keys
If our key is Customer Name, then we cannot sort by Customer ID.
Please note that names are terrible keys, because names are not unique.
DRRN Name Acct Num Yadda yadda
0 Carey 1001
1 Foster 1003
2 Barnes 1004
3 Zinn 1009
4 Critter 1002
5 Faulk 1011
6 Adams 1010
7 Wilks 1005
8 Bishop 1013
9 Farrow 1012
10 Duncan 1014
11 Dinkins 1015
12 West 1020
. . . . . .
18 Bell 1042
19 Conner 1044
20 Davis 1043
21 Dannelly 1045
... . . .
80 Camp 2134
81 Young 2135
... . . .
98 Fuller 2598
99 Crook 2600
IRRN Name DRRN
0 Adams 6
1 Barnes 2
2 Bell 18
3 Bishop 8
4 Camp 80
5 Carey 0
6 Conner 19
7 Critter 4
8 Crook 99
9 Dannelly 21
10 Davis 20
11 . . . . . .
IRRN AcctNum DRRN
0 1001 0
1 1002 4
2 1003 1
3 1004 2
4 1005 7
5 1009 3
6 1010 6
7 1011 5
8 1012 9
9 1013 8
10 1014 10
11 . . . . . .
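As a rough sketch of how the two indexes above relate to one unsorted data file, the Python below builds both from the first thirteen records shown in the data file table. The helper name `build_index` is hypothetical. Each index is a sorted list of (key, DRRN) pairs, so duplicate keys such as repeated names are allowed.

```python
# First thirteen data file records from the table above; list position = DRRN.
DATA_FILE = [
    ("Carey", 1001), ("Foster", 1003), ("Barnes", 1004), ("Zinn", 1009),
    ("Critter", 1002), ("Faulk", 1011), ("Adams", 1010), ("Wilks", 1005),
    ("Bishop", 1013), ("Farrow", 1012), ("Duncan", 1014), ("Dinkins", 1015),
    ("West", 1020),
]

def build_index(records, field):
    """Return a 1:1 index on one field: (key, DRRN) pairs sorted by key."""
    return sorted((record[field], drrn) for drrn, record in enumerate(records))

name_index = build_index(DATA_FILE, 0)   # keyed on Name
acct_index = build_index(DATA_FILE, 1)   # keyed on Acct Num

print(name_index[:3])   # [('Adams', 6), ('Barnes', 2), ('Bishop', 8)]
print(acct_index[:3])   # [(1001, 0), (1002, 4), (1003, 1)]
```

The data file itself is never reordered. Only the index files are sorted, one per key.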
Multiple Indexes for Multiple "Views"
Suppose there are different types of users, with different levels of access. Implemented in Linux via "set effective user id".
All Accts
AcctNum DRRN
1001 0
1002 4
1003 1
1004 2
1005 7
1009 3
1010 6
1011 5
1012 9
1013 8
1014 10
. . . . . .
SE Region Accts
AcctNum DRRN
1001 0
1002 4
1004 2
1005 7
1009 3
1012 9
1013 8
1014 10
1042 18
1044 19
DRRN Name Acct Num Yadda yadda
0 Carey 1001
1 Foster 1003
2 Barnes 1004
3 Zinn 1009
4 Critter 1002
5 Faulk 1011
6 Adams 1010
7 Wilks 1005
8 Bishop 1013
9 Farrow 1012
10 Duncan 1014
11 Dinkins 1015
12 West 1020
. . . . . .
18 Bell 1042
19 Conner 1044
20 Davis 1043
21 Dannelly 1045
... . . .
80 Camp 2134
81 Young 2135
... . . .
98 Fuller 2598
99 Crook 2600
West Region Accts
AcctNum DRRN
1003 1
1010 6
1011 5
1015 11
1020 12
1021 13
1022 14
1043 20
. . . . . .
2598 98
2600 99
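One way to picture a "view" is as a separate index that lists only the records a class of user may reach. The sketch below is an assumption-laden illustration in Python: the dictionaries are excerpts of the tables above, the function name `lookup` is hypothetical, and access control itself (for example the effective user id mechanism) is not modeled.

```python
# Excerpt of the data file: DRRN -> Name (only rows shown in the slide).
DATA_FILE = {
    0: "Carey", 1: "Foster", 2: "Barnes", 3: "Zinn", 4: "Critter",
    5: "Faulk", 6: "Adams", 7: "Wilks", 8: "Bishop", 9: "Farrow",
    10: "Duncan", 11: "Dinkins", 12: "West", 18: "Bell", 19: "Conner",
    20: "Davis", 98: "Fuller", 99: "Crook",
}

# Each view is its own index: AcctNum -> DRRN, restricted to one region.
SE_REGION = {1001: 0, 1002: 4, 1004: 2, 1005: 7, 1009: 3,
             1012: 9, 1013: 8, 1014: 10, 1042: 18, 1044: 19}
WEST_REGION = {1003: 1, 1010: 6, 1011: 5, 1015: 11, 1020: 12,
               1043: 20, 2598: 98, 2600: 99}

def lookup(view, acct_num):
    """Return the record name reachable through `view`, or None if not listed."""
    drrn = view.get(acct_num)
    return None if drrn is None else DATA_FILE[drrn]

print(lookup(SE_REGION, 1002))    # Critter
print(lookup(SE_REGION, 1003))    # None: account 1003 is not in the SE view
print(lookup(WEST_REGION, 1003))  # Foster
```

A user who can open only the SE Region index cannot locate West Region records, even though all records share one data file.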
Three Level Index
DRRN Name Acct Num Address Yadda yadda
0 Carey
1 Foster
2 Barnes
3 Zinn
4 Critter
5 Faulk
6 Adams
7 Wilks
8 Bishop
9 Farrow
10 Duncan
11 Dinkins
12 West
. . . . . .
18 Bell
19 Conner
20 Davis
21 Dannelly
... . . .
80 Camp
81 Zane
... . . .
98 Fuller
99 Crook
Key IRRN
Adams 0
Ingram 3
Randall 6
Young 9
IRRN Key DRRN
0 Adams 6
1 Barnes 2
2 Bell 18
3 Bishop 8
4 Camp 80
5 Carey 0
6 Conner 19
7 Critter 4
8 Crook 99
9 Dannelly 21
10 Davis 20
11 Dinkins 11
12 Duncan 10
. . . . . . . . .
18 Farrow 9
19 Faulk 5
20 Foster 1
21 Fuller 98
... ... ...
80 West 12
81 Wilks 7
... ... ...
98 Zane 81
99 Zinn 3
Key IRRN
0 Adams 0
1 Davis 10
2 Foster 20
3 Ingram 30
4 Lambert 40
5 Norris 50
6 Randall 60
7 Tyler 70
8 West 80
9 Young 90
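Lookup path: Level One, then Level Two, then Level Three, then the Data File.
(The tables above appear in this order: Data File, Level One, Level Three, Level Two.)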
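The descent through the three levels can be sketched as follows. This is an illustrative Python version under stated assumptions: the names `build_upper` and `search` are hypothetical, the keys are made-up integers rather than the customer names in the slide, and every level lives in memory. The shape matches the figure: level three is 1:1 with the data file, level two keeps every 10th entry of level three, and level one keeps every 3rd entry of level two.

```python
from bisect import bisect_right

def build_upper(lower, step):
    """Sparse level over `lower`: (key, IRRN) for every `step`-th entry of `lower`."""
    return [(lower[irrn][0], irrn) for irrn in range(0, len(lower), step)]

def search(levels, target):
    """Descend from levels[0] to the dense last level; return the DRRN or None."""
    start, end = 0, len(levels[0])            # section of the current level to search
    for depth, level in enumerate(levels):
        section = level[start:end]
        pos = bisect_right([key for key, _ in section], target) - 1
        if pos < 0:
            return None                       # target sorts before every key
        key, rrn = section[pos]
        if depth == len(levels) - 1:          # dense level: rrn is a DRRN
            return rrn if key == target else None
        here = start + pos                    # position within the whole level
        if here + 1 < len(level):
            next_end = level[here + 1][1]     # next entry's section start
        else:
            next_end = len(levels[depth + 1])
        start, end = rrn, next_end

# Hypothetical data: 100 keys (0, 5, ..., 495) stored in scrambled DRRN order.
level_three = [(5 * i, (i * 37) % 100) for i in range(100)]
level_two = build_upper(level_three, 10)      # IRRNs 0, 10, ..., 90
level_one = build_upper(level_two, 3)         # IRRNs 0, 3, 6, 9
levels = [level_one, level_two, level_three]

print(level_one)            # [(0, 0), (150, 3), (300, 6), (450, 9)]
print(search(levels, 235))  # 39
print(search(levels, 236))  # None
```

Only the section selected by the level above is searched at each step, which is what keeps the work per level small.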
When to Use 3 Levels
When the number of records in the datafile is very large.
Assume each index record in levels Two and Three takes 28 bytes:
  key = product ID = 20 bytes
  RRN = long int = 8 bytes
Best maximum index section size for internal sorting = 292 records
  8KB cluster / 28B = 8192 / 28 = 292.6
If N = 1,000,000 data records, then
  level three contains 1,000,000 index records
  level two contains 3,425 index records (1,000,000 / 292 = 3424.7)
  level one contains 12 index records (3,425 / 292 = 11.7)
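The level sizes can be checked with a short calculation. The Python below is a sketch under the assumption that an 8KB cluster is 8192 bytes and that a partial index record cannot be stored, so the records per cluster are rounded down. The function name `level_sizes` is hypothetical.

```python
import math

def level_sizes(num_records, fanout):
    """Index records per level, top level first, adding levels until the
    top level fits in one section of `fanout` records."""
    sizes = [num_records]                    # dense bottom level is 1:1 with the data
    while sizes[0] > fanout:
        sizes.insert(0, math.ceil(sizes[0] / fanout))
    return sizes

CLUSTER_BYTES = 8 * 1024                     # assumption: 8KB = 8192 bytes
INDEX_RECORD_BYTES = 20 + 8                  # 20-byte key + 8-byte RRN
fanout = CLUSTER_BYTES // INDEX_RECORD_BYTES

print(fanout)                                # 292
print(level_sizes(1_000_000, fanout))        # [12, 3425, 1000000]
```

The loop stops at three levels because the 12 top-level records fit in a single cluster.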
Timing Analysis 1
Given a data file:
  100,000 records
  no index
  sorted
How much time does a search take on average?
Using a binary file search: log2(N) reads of datafile records
  log2(100,000) = 16.6, rounded up to 17 reads of datafile records
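To make the read count concrete, the following Python sketch counts record reads during a binary search of a sorted file. The file is simulated: `read_key` is a hypothetical stand-in for one disk read, and the key values (twice the RRN) are invented for the example.

```python
import math

def binary_search_file(read_key, num_records, target):
    """Binary search a sorted file. Returns (RRN or None, number of record reads)."""
    low, high = 0, num_records - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        key = read_key(mid)                  # stands in for one datafile record read
        reads += 1
        if key == target:
            return mid, reads
        if key < target:
            low = mid + 1
        else:
            high = mid - 1
    return None, reads

def read_key(rrn):
    """Hypothetical sorted file: the record at `rrn` has key 2 * rrn."""
    return 2 * rrn

N = 100_000
bound = math.ceil(math.log2(N))
rrn, reads = binary_search_file(read_key, N, 123_456)
print(bound)         # 17
print(rrn)           # 61728
print(reads <= bound)  # True
```

No search of this file needs more than 17 reads, which is the figure used in the analysis above.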
Timing Analysis 2
Given an Indexed File with:
  100,000 records
  two index levels
  top level index size = 1,000 elements
How much time does a search take on average?
Level One: linear search time = N / 2
  1,000 / 2 = 500 memory compares
Level Two: read the portion into memory = 1 read of 100 index records
  binary search that portion = log2(100) = 6.6, rounded up to 7 memory compares
Data File: 1 datafile record read
To index, or not to index...
No Index File:
  17 reads of datafile records
Two Levels of Index:
  507 memory compares
  1 read of 100 index records
  1 datafile record read
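The comparison can be reproduced with a small cost model. This Python sketch only restates the arithmetic of the two timing analyses. The function names are hypothetical, and it assumes the top level is searched linearly in memory while the selected second-level section is read once and binary searched.

```python
import math

def no_index_reads(num_records):
    """Datafile record reads for a binary search of the sorted data file."""
    return math.ceil(math.log2(num_records))

def two_level_cost(num_records, top_level_size):
    """Costs for one search through a two-level index."""
    section_size = num_records // top_level_size        # index records per section
    linear_compares = top_level_size // 2               # average linear search, N / 2
    binary_compares = math.ceil(math.log2(section_size))
    return {
        "memory_compares": linear_compares + binary_compares,
        "index_section_reads": 1,
        "datafile_reads": 1,
    }

print(no_index_reads(100_000))           # 17
print(two_level_cost(100_000, 1_000))
# {'memory_compares': 507, 'index_section_reads': 1, 'datafile_reads': 1}
```

The indexed search trades 17 disk reads for 2 disk reads plus 507 memory compares, and memory compares are far cheaper than disk reads.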