Kruse/Ryba ch09 1
Object Oriented Data Structures
Tables and Information Retrieval
Rectangular Tables
Tables of Various Shapes
Radix Sort
Hashing
What is an INDEX?
An index lets you impose order on a file without actually rearranging the file.
An index gives keyed access to fixed or variable-length record files.
Simple Index
A simple index uses a simple array to implement the index.
IBM calls this ISAM (Indexed Sequential Access Method).
Index file
Key         Reference
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
MER75016    300
RCA2626     77
WAR23699    132

Data file
Address     Actual data record
32          LON|2312|Romeo and Juliet|...
77          RCA|2626|Quartet in C Sharp...
132         WAR|23699|Touchstone|...
167         ANG|3795|Symphony No. 9|...
211         COL|38358|Nebraska|...
256         DG|18807|Symphony No. 9|...
300         MER|75016|Coq d'or Suite|...
353         COL|31809|Symphony No. 9|...
396         DG|139201|Violin Concerto|...
442         FF|245|Good News|...
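Keyed access through a simple index can be sketched as a binary search over the sorted key column. This is only a minimal sketch, assuming the index is already in memory as a sorted array. The names IndexEntry, find_offset and sample_index are hypothetical and do not come from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One index record: a key and the address of its data record.
// (Hypothetical type; the slides only show the two columns.)
struct IndexEntry {
    std::string key;
    long offset;
};

// Binary search over an index kept sorted by key.
// Returns the data-file address, or -1 if the key is absent.
long find_offset(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return it->offset;
    return -1;
}

// The ten index entries from the example, already in key order.
std::vector<IndexEntry> sample_index() {
    return {{"ANG3795", 167}, {"COL31809", 353}, {"COL38358", 211},
            {"DG139201", 396}, {"DG18807", 256}, {"FF245", 442},
            {"LON2312", 32},   {"MER75016", 300}, {"RCA2626", 77},
            {"WAR23699", 132}};
}
```

With this sketch, find_offset(sample_index(), "DG18807") yields 256, the address at which the data file would then be read.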
Concerns
There are two files to deal with.
The index file is easier to deal with than the data file because it has fixed-length records.
Fixed-length fields impose limits on the size of keys.
In the example, the index carries no information other than the keys and the reference fields. Other data could be included (for example, the record length).
Basic Operations
Create the original empty index and data files.
Load the index file into memory before using it.
Rewrite the index file from memory after using it.
Add records to the data file and index.
Delete records from the data file.
Update records in the data file.
Creating the Files
Create both the index and data files as empty files. Write headers to both files.
Loading the Index into Memory
Assume that the index file is small enough to fit into RAM.
Each array element is an index record.
Safety Mechanisms
Know when the index is out of date.
Be able to reconstruct the index from the data file.
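Reconstructing the index amounts to one sequential pass over the data file. The sketch below assumes a record layout the slides do not specify: one newline-terminated line per record, '|'-separated fields, a key formed by joining the first two fields, and no file header. The function name rebuild_index is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Rebuild a primary index by scanning the data file from start to end.
// Assumptions (not from the slides): newline-terminated records,
// '|'-separated fields, key = first field + second field, no header.
std::map<std::string, long> rebuild_index(std::istream& data) {
    std::map<std::string, long> index;  // key -> byte offset, sorted by key
    std::string line;
    long offset = 0;
    while (std::getline(data, line)) {
        std::size_t first = line.find('|');
        std::size_t second = (first == std::string::npos)
                                 ? std::string::npos
                                 : line.find('|', first + 1);
        if (second != std::string::npos) {
            std::string key = line.substr(0, first) +
                              line.substr(first + 1, second - first - 1);
            index[key] = offset;
        }
        offset += static_cast<long>(line.size()) + 1;  // +1 for the '\n'
    }
    return index;
}
```

Because std::map keeps its keys ordered, the result can be written straight back out as a sorted index file.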
Record Addition
Adding a new record to the data file requires that we also add a record to the index file.
Index and data file after adding the record LON|783|Sweet Somethings|...

Index file
Key         Reference
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
LON783      486
MER75016    300
RCA2626     77
WAR23699    132

Data file
Address     Actual data record
32          LON|2312|Romeo and Juliet|...
77          RCA|2626|Quartet in C Sharp...
132         WAR|23699|Touchstone|...
167         ANG|3795|Symphony No. 9|...
211         COL|38358|Nebraska|...
256         DG|18807|Symphony No. 9|...
300         MER|75016|Coq d'or Suite|...
353         COL|31809|Symphony No. 9|...
396         DG|139201|Violin Concerto|...
442         FF|245|Good News|...
486         LON|783|Sweet Somethings|...
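The index half of an addition can be sketched as an insertion into a sorted array: the new entry goes in at its key position and the later entries (MER75016, RCA2626, ...) shift down one place. This sketch assumes the data record has already been appended to the data file and its address is known. IndexEntry and add_index_entry are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical index record: key plus address of the data record.
struct IndexEntry {
    std::string key;
    long offset;
};

// Insert (key, offset) into an index that is kept sorted by key.
// Returns false, leaving the index unchanged, if the key already exists.
bool add_index_entry(std::vector<IndexEntry>& index,
                     const std::string& key, long offset) {
    auto pos = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (pos != index.end() && pos->key == key) return false;
    index.insert(pos, IndexEntry{key, offset});  // shifts later entries down
    return true;
}
```

The shifting is cheap only because the index is held in memory; the data file itself is never rearranged.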
Record Deletion
Any of the methods discussed in chapter 5 could be used. However, the index file must now be considered.
The index entry could be removed and the array adjusted, or the index entry could just be marked as deleted.
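The two choices for the index entry can be compared side by side. This is a sketch under assumptions: the deleted flag is one possible way to mark an entry (the slides do not fix a layout), and the names locate, erase_entry and mark_deleted are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long offset;
    bool deleted;  // tombstone flag (an assumption, not from the slides)
};

// Locate a key in the sorted index; returns index.end() if it is absent.
std::vector<IndexEntry>::iterator locate(std::vector<IndexEntry>& index,
                                         const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    return (it != index.end() && it->key == key) ? it : index.end();
}

// Option 1: remove the entry and shift the rest of the array up.
bool erase_entry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end()) return false;
    index.erase(it);
    return true;
}

// Option 2: leave the entry in place and mark it as deleted.
bool mark_deleted(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end() || it->deleted) return false;
    it->deleted = true;
    return true;
}
```

Marking avoids shifting the array, at the cost of dead entries that every lookup must skip until the index is rewritten.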
Record Updating
Updating changes the key field
– conceptually, this is best thought of as a deletion followed by an addition
Updating does not change a key field
– this does not reorder the index file, but it could well change the data file if the size of the record changes; a record that moves needs its reference in the index updated
Indexes too large to fit in RAM
Essentially, the later text material deals with this problem.
Hashed Organization
Tree-structures
Access by Multiple Keys
Secondary key index organized by composer

Secondary key         Primary key
BEETHOVEN             ANG3795
BEETHOVEN             DG139201
BEETHOVEN             DG18807
BEETHOVEN             RCA2626
COREA                 WAR23699
DVORAK                COL31809
PROKOFIEV             LON2312
RIMSKY-KORSAKOV       MER75016
SPRINGSTEEN           COL38358
SWEET HONEY IN THE    FF245
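A secondary index maps a secondary key to primary keys rather than to data-file addresses, so each hit still goes through the primary index. A minimal sketch, assuming an in-memory std::multimap stands in for the index file; SecondaryIndex, primary_keys_for and composer_index are hypothetical names, and the COREA entry is taken to refer to WAR23699, the primary key used elsewhere in the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// secondary key (composer) -> primary key (label + ID)
typedef std::multimap<std::string, std::string> SecondaryIndex;

// Collect every primary key filed under one secondary key.
std::vector<std::string> primary_keys_for(const SecondaryIndex& index,
                                          const std::string& secondary_key) {
    std::vector<std::string> keys;
    auto range = index.equal_range(secondary_key);
    for (auto it = range.first; it != range.second; ++it)
        keys.push_back(it->second);
    return keys;
}

// The composer index from the example.
SecondaryIndex composer_index() {
    typedef SecondaryIndex::value_type Entry;
    SecondaryIndex index;
    index.insert(Entry("BEETHOVEN", "ANG3795"));
    index.insert(Entry("BEETHOVEN", "DG139201"));
    index.insert(Entry("BEETHOVEN", "DG18807"));
    index.insert(Entry("BEETHOVEN", "RCA2626"));
    index.insert(Entry("COREA", "WAR23699"));
    index.insert(Entry("DVORAK", "COL31809"));
    index.insert(Entry("PROKOFIEV", "LON2312"));
    index.insert(Entry("RIMSKY-KORSAKOV", "MER75016"));
    index.insert(Entry("SPRINGSTEEN", "COL38358"));
    index.insert(Entry("SWEET HONEY IN THE", "FF245"));
    return index;
}
```

Storing primary keys instead of addresses means the secondary index need not change when a data record moves; only the primary index does.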
Record Addition
Additional indices imply additional overhead when new records are added.
Record Deletion
Deleting a record usually implies removing all references to that record in the file system.
Alternatively, delete only the primary index entry and leave the secondary indices alone. Since the primary index does reflect the deletion, a request that arrives from a secondary index fails at the primary index, implying the record has been deleted.
Such a method leaves dead entries that waste space in the secondary index.
Record Updating
If the update changes the secondary key
– it may be necessary to rearrange the secondary key index so it stays in sorted order
If the update changes the primary key
– this has a major impact on the secondary indices, because every entry that refers to the old primary key must be changed
If the update is confined to other fields
– updates that do not affect either the primary or secondary key fields do not affect the secondary key index
Access by Multiple Keys
Secondary key index organized by recording title

Secondary key        Primary key
COQ D'OR SUITE       MER75016
GOOD NEWS            FF245
NEBRASKA             COL38358
QUARTET IN C SHAR    RCA2626
ROMEO AND JULIET     LON2312
SYMPHONY NO. 9       ANG3795
SYMPHONY NO. 9       COL31809
SYMPHONY NO. 9       DG18807
TOUCHSTONE           WAR23699
VIOLIN CONCERTO      DG139201
Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9

Matches in the title index (SYMPHONY NO. 9):    ANG3795, COL31809, DG18807
Matches in the composer index (BEETHOVEN):      ANG3795, DG139201, DG18807, RCA2626
LOGICAL AND of the two lists:                   ANG3795, DG18807

The primary index then gives the data file addresses: ANG3795 at 167 and DG18807 at 256.
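Because both secondary indexes return their primary keys in sorted order, the LOGICAL AND can be computed as a single merge-style pass. A minimal sketch using std::set_intersection; the function name match_both is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Primary keys present in both lists. Both inputs must be sorted,
// which holds when they are read off a sorted secondary index.
std::vector<std::string> match_both(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b) {
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

For the query above, the composer list {ANG3795, DG139201, DG18807, RCA2626} and the title list {ANG3795, COL31809, DG18807} intersect in ANG3795 and DG18807. No data record is read until the intersection is known.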
Problems
We have to rearrange the index file every time a new record is added to the file, even if the new record has an existing secondary key.
A Better Solution: Linking the List of References
Inverted lists work their way backward from a secondary key to the primary key to the record itself.
Secondary key    Primary key references
BEETHOVEN        ANG3795, DG139201, DG18807, RCA2626
COREA            WAR23699
DVORAK           COL31809
PROKOFIEV        LON2312

Storing each list separately might create a large number of small files, one for each composer.
Improved Version
Redefine the secondary key index so it consists of records with two fields - a secondary key field, and a field containing the relative record number of the first corresponding primary key reference in the inverted list.
The actual primary key references associated with each secondary key would be stored in a separate entry-sequenced file.
Secondary Index File
Rec#   Secondary key      First reference
0      BEETHOVEN          3
1      COREA              2
2      DVORAK             7
3      PROKOFIEV          10
4      RIMSKY-KORSAKOV    6
5      SPRINGSTEEN        4
6      SWEET HONEY IN     9

Label ID List File
Rec#   Primary key   Next reference
0      LON2312       -1
1      RCA2626       -1
2      WAR23699      -1
3      ANG3795       8
4      COL38358      -1
5      DG18807       1
6      MER75016      -1
7      COL31809      -1
8      DG139201      5
9      FF245         -1
10     ANG3193       0

A reference of -1 marks the end of a list.
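Following a linked list of references can be sketched with two small routines: one walks a chain of relative record numbers, the other appends a new reference to the entry-sequenced list file and links it in. The names ListRecord, references and add_reference are hypothetical, and keeping each chain in primary key order is an assumption made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One record of the entry-sequenced list file.
struct ListRecord {
    std::string primary_key;
    int next;  // relative record number of the next reference, -1 = end
};

// Walk the chain that starts at record number `first`.
std::vector<std::string> references(const std::vector<ListRecord>& list_file,
                                    int first) {
    std::vector<std::string> keys;
    for (int rrn = first; rrn != -1; rrn = list_file[rrn].next)
        keys.push_back(list_file[rrn].primary_key);
    return keys;
}

// Append a reference at the end of the list file and link it into the
// chain so the chain stays in primary key order (assumption of this sketch).
// `first` is the secondary index field for this key and may be updated.
void add_reference(std::vector<ListRecord>& list_file, int& first,
                   const std::string& primary_key) {
    int new_rrn = static_cast<int>(list_file.size());
    int prev = -1, cur = first;
    while (cur != -1 && list_file[cur].primary_key < primary_key) {
        prev = cur;
        cur = list_file[cur].next;
    }
    list_file.push_back(ListRecord{primary_key, cur});
    if (prev == -1) first = new_rrn;           // new head of the chain
    else list_file[prev].next = new_rrn;       // splice after prev
}
```

Only link fields change on an addition: existing records in the list file never move, and the secondary index is rewritten only when a chain gets a new head or a new secondary key appears.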
Hash Functions
Truncation
– Ignore part of the key; use the rest as the index
Folding
– Partition the key and combine the parts
Modular Arithmetic
Perfect Hash Function
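The first three methods can be illustrated on integer keys. This is a sketch under assumptions: keys are non-negative eight-digit integers, truncation keeps the first, second and fifth digits from the right, and folding splits the key into three-digit groups from the right. The function names and these particular choices are illustrative, not taken from the slides.

```cpp
#include <cassert>

// Truncation: ignore most of the key and keep three chosen digits
// (here the 5th, 2nd and 1st digits counted from the right).
int hash_truncate(long key) {
    int d1 = static_cast<int>(key % 10);
    int d2 = static_cast<int>((key / 10) % 10);
    int d5 = static_cast<int>((key / 10000) % 10);
    return d5 * 100 + d2 * 10 + d1;
}

// Folding: partition the key into three-digit groups (from the right),
// add the groups, then reduce the sum to the table size.
int hash_fold(long key, int table_size) {
    long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return static_cast<int>(sum % table_size);
}

// Modular arithmetic: the remainder on division by the table size.
// A prime table size tends to spread keys more evenly.
int hash_mod(long key, int table_size) {
    return static_cast<int>(key % table_size);
}
```

For the key 21296876, truncation gives 976, folding into a table of 1000 gives (876 + 296 + 21) mod 1000 = 193, and the remainder modulo 997 is 956.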
C++ Example

int hash(const Key &target)
{
    int value = 0;
    // Combine the first eight key letters; earlier letters carry more weight.
    for (int position = 0; position < 8; position++)
        value = 4 * value + target.key_letter(position);
    return value % hash_size;
}
Collision Resolution
Linear Probing
– Clustering
Rehashing
Increment Functions
Quadratic Probing
– h + i^2 (mod hash_size)
Key-Dependent Increments
– increment = (int) the_data.key_letter(0);
Random Probing
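The difference between linear and quadratic probing shows up in the sequence of positions each one visits from the same home address h. A minimal sketch; the function names are hypothetical, and h is assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Linear probing visits h, h+1, h+2, ... (mod table_size).
std::vector<int> linear_probes(int h, int table_size, int count) {
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) seq.push_back((h + i) % table_size);
    return seq;
}

// Quadratic probing visits h, h+1, h+4, h+9, ... (mod table_size).
std::vector<int> quadratic_probes(int h, int table_size, int count) {
    std::vector<int> seq;
    int probe = h % table_size;
    int increment = 1;
    for (int i = 0; i < count; ++i) {
        seq.push_back(probe);
        probe = (probe + increment) % table_size;
        increment += 2;  // 1, 3, 5, ... so the running offset is i^2
    }
    return seq;
}
```

From h = 3 in a table of size 11, quadratic probing visits 3, 4, 7, 1, 8, while linear probing visits 3, 4, 5, 6, 7. Adding the odd numbers 1, 3, 5, ... avoids computing i^2 directly.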
Error_code Hash_table::insert(const Record &new_entry)
{
    Error_code result = success;
    int probe_count,   // Counter to be sure that table is not full.
        increment,     // Increment used for quadratic probing.
        probe;         // Position currently probed.
    Key null;          // Null key for comparison purposes.
    null.make_blank();
    probe = hash(new_entry);
    probe_count = 0;
    increment = 1;
    while (table[probe] != null                     // Is the location empty?
           && table[probe] != new_entry             // Duplicate key?
           && probe_count < (hash_size + 1) / 2) {  // Has overflow occurred?
        probe_count++;
        probe = (probe + increment) % hash_size;
        increment += 2;   // Prepare increment for next iteration.
    }
    if (table[probe] == null) table[probe] = new_entry;
    else if (table[probe] == new_entry) result = duplicate_error;
    else result = overflow;   // The table is full.
    return result;
}
Collision Resolution with Buckets
Collision Resolution by Chaining
Advantages
– Saving of space
– Simple, efficient collision handling
– Size of hash table does not need to exceed the number of records
– Deletion becomes quick and easy
Disadvantage
– Links require space
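A chained table can be sketched with one linked list per table position. This is a sketch under assumptions: the class name ChainedTable and its members are hypothetical, keys are plain strings, the table has at least one slot, and the hash mirrors the letter-combining style of the earlier C++ example without being the same function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

class ChainedTable {
public:
    // `slots` must be at least 1.
    explicit ChainedTable(std::size_t slots) : chains_(slots), count_(0) {}

    // Returns false if the key is already present.
    bool insert(const std::string& key) {
        std::list<std::string>& chain = chains_[slot(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;
        chain.push_front(key);
        ++count_;
        return true;
    }

    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = chains_[slot(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Deletion only unlinks one list node; no other entry is disturbed.
    bool remove(const std::string& key) {
        std::list<std::string>& chain = chains_[slot(key)];
        std::list<std::string>::iterator it =
            std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::size_t slot(const std::string& key) const {
        unsigned long value = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            value = 4 * value + static_cast<unsigned char>(key[i]);
        return value % chains_.size();
    }

    std::vector<std::list<std::string> > chains_;
    std::size_t count_;
};
```

A table built with three slots still accepts five or more keys, which illustrates the point that the table size need not exceed the number of records.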
Theoretical Comparison

Successful search, expected number of probes:

Load factor            0.10   0.50   0.80   0.90   0.99   2.00
Chaining               1.05   1.25   1.40   1.45   1.50   2.00
Open, random probes    1.05   1.40   2.0    2.6    4.6    -
Open, linear probes    1.06   1.50   3.0    5.5    50.5   -
Theoretical Comparison

Unsuccessful search, expected number of probes:

Load factor            0.10   0.50   0.80   0.90   0.99   2.00
Chaining               0.10   0.50   0.80   0.90   0.99   2.00
Open, random probes    1.1    2.00   5.0    10.0   100    -
Open, linear probes    1.12   2.50   13     50     5000   -
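The theoretical figures are consistent with the standard approximations for a load factor lf: chaining 1 + lf/2 and lf, random probing (1/lf) ln(1/(1 - lf)) and 1/(1 - lf), linear probing (1/2)(1 + 1/(1 - lf)) and (1/2)(1 + 1/(1 - lf)^2), for successful and unsuccessful searches respectively. The slides do not state these formulas, so treat the sketch below as an assumption checked against the tabulated values. The function names are hypothetical, and the open addressing formulas require 0 < lf < 1.

```cpp
#include <cassert>
#include <cmath>

// Chaining: defined for any load factor lf >= 0.
double chaining_successful(double lf)   { return 1.0 + lf / 2.0; }
double chaining_unsuccessful(double lf) { return lf; }

// Open addressing with random probes: requires 0 < lf < 1.
double random_successful(double lf) {
    return std::log(1.0 / (1.0 - lf)) / lf;
}
double random_unsuccessful(double lf) { return 1.0 / (1.0 - lf); }

// Open addressing with linear probes: requires 0 <= lf < 1.
double linear_successful(double lf) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lf));
}
double linear_unsuccessful(double lf) {
    double d = 1.0 - lf;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

For example, linear_successful(0.9) evaluates to 5.5 and linear_unsuccessful(0.8) to 13, matching the tables; chaining grows only linearly in the load factor.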
Empirical Comparison

Successful search, average number of probes:

Load factor               0.10   0.50   0.80   0.90   0.99   2.00
Chaining                  1.04   1.2    1.4    1.4    1.59   2.00
Open, quadratic probes    1.04   1.50   2.1    2.7    5.2    -
Open, linear probes       1.05   1.60   3.4    6.2    21.3   -
Empirical Comparison

Unsuccessful search, average number of probes:

Load factor               0.10   0.50   0.80   0.90   0.99   2.00
Chaining                  0.10   0.50   0.80   0.90   0.99   2.00
Open, quadratic probes    1.13   2.20   5.2    11.9   126    -
Open, linear probes       1.13   2.70   15.4   59.8   430    -
Highlights
Sequential search is Θ(n).
Binary search is Θ(log n).
Hash-table retrieval is Θ(1).
Chapter 9 - The End