leslie-harris
Today’s Goals
Look at how Dictionaries are used in the real world
  Where this would occur & why they are used there
  In real-world setting, what problems can/do occur
Indexed file usage presented and shown
  How & why we split index & data files
  Formatting of each file and how they get used
  Describe what problems are solved using indexed files
Java coding techniques that simplify using these files
  Idea needed when using multiple indexes shown
Dictionaries in Real World
Often need large database on many machines
  Split search terms across machines
  Updating & searching work split between machines
  Database way too large for any single machine
If you think about it, this is incredibly common
  Where?
Splitting Keys From Values
In real world, we often have many indices
  Simple units measure where we can find values
  Values could be searched for in multiple ways
Index & Data Files
Split information into two (or more) files
  Data file uses fixed-size records to store data
  Index files contain search terms & data locations
Fixed-size records usually used in data file
  Each record will use exactly that much space
  Extra space wasted if the value is smaller
  But limits data size, cannot get more space
  Makes it far easier to reuse space & rebuild index
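A minimal sketch of the fixed-size record idea, not the course's actual code: the class name FixedRecords, the two-field layout, and the zero-byte padding are all assumptions made for illustration. It shows why record n can be found by arithmetic alone, and where the wasted space comes from.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class FixedRecords {
    // Hypothetical layout: a 50-byte name followed by a 4-byte price.
    static final int NAME_SZ = 50;
    static final int PRICE_SZ = Integer.BYTES;
    static final int RECORD_SZ = NAME_SZ + PRICE_SZ;

    /** Every record has the same size, so record n starts at n * RECORD_SZ. */
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SZ;
    }

    static void write(RandomAccessFile file, int n, String name, int price) throws IOException {
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > NAME_SZ) {
            // The fixed size limits the data: there is no way to get more space.
            throw new IllegalArgumentException("name longer than " + NAME_SZ + " bytes");
        }
        byte[] field = Arrays.copyOf(raw, NAME_SZ); // unused bytes stay 0 (the wasted space)
        file.seek(positionOf(n));
        file.write(field);
        file.writeInt(price);
    }

    static String readName(RandomAccessFile file, int n) throws IOException {
        byte[] field = new byte[NAME_SZ];
        file.seek(positionOf(n));
        file.readFully(field);
        int len = 0;
        while (len < NAME_SZ && field[len] != 0) {
            len++; // stop at the first padding byte
        }
        return new String(field, 0, len, StandardCharsets.US_ASCII);
    }

    /** Writes two records to a temporary file and reads the second one back. */
    static String demo() {
        try {
            Path path = Files.createTempFile("records", ".dat");
            try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                write(file, 0, "International Business Machines", 106);
                write(file, 1, "Ford Motorcars, Inc.", 12);
                return readName(file, 1) + " @ " + positionOf(1) + " / " + file.length();
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a freed record is exactly as large as any new one, its space can be handed to the next record without moving anything else in the file.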
Index File Format
No standard format – depends on type of data
  Often variable sized, but this is not a specific requirement
  Each entry in index file begins with exact search term
  Followed by position containing matching data
  As a result, often find indexes smushed together
Can read indexes at start of program execution
  Reasonably assumes index file smaller than data file
  Changes written immediately, however
When program starts, do NOT read data file
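Since there is no standard index format, the following is only one possible sketch. It assumes each entry is a search term written with writeUTF followed by the position as a long; the class name IndexFile and its methods are hypothetical. It shows the whole index being read into memory at startup while the data file is left alone.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class IndexFile {
    /** Writes each entry as: search term, then position of the matching record. */
    static void save(Path path, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    /** Reads the entire index into memory; the data file is never opened here. */
    static TreeMap<String, Long> load(Path path) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }

    /** Round-trips the ticker index from the slides through a temporary file. */
    static TreeMap<String, Long> demo() {
        try {
            Path path = Files.createTempFile("ticker", ".idx");
            try {
                Map<String, Long> index = new TreeMap<>();
                index.put("F", 224L);
                index.put("IBM", 0L);
                index.put("T", 112L);
                save(path, index);
                return load(path);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```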
Indexed Files
Enables splitting search terms across computers
  Alphabetical split makes searches faster on many servers
[Figure: servers handling the ranges A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
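One way to picture the alphabetical split is a router that maps the first letter of a search term to the server owning that range. The sketch below is an assumption-laden illustration: the ShardRouter class and the server names are made up, and only the letter ranges come from the figure.

```java
import java.util.Map;
import java.util.TreeMap;

class ShardRouter {
    // Lower bound of each alphabetical range -> hypothetical server name.
    private final TreeMap<Character, String> rangeStartToServer = new TreeMap<>();

    ShardRouter() {
        rangeStartToServer.put('A', "server-A-C");
        rangeStartToServer.put('D', "server-D-E");
        rangeStartToServer.put('F', "server-F-H");
        rangeStartToServer.put('I', "server-I-P");
        rangeStartToServer.put('Q', "server-Q-R");
        rangeStartToServer.put('S', "server-S-T");
        rangeStartToServer.put('U', "server-U-X");
        rangeStartToServer.put('Y', "server-Y-Z");
    }

    /** Picks the server whose range contains the first letter of the term. */
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        // floorEntry finds the greatest range start that is <= first.
        Map.Entry<Character, String> range = rangeStartToServer.floorEntry(first);
        if (range == null || first > 'Z') {
            throw new IllegalArgumentException("no server handles: " + term);
        }
        return range.getValue();
    }
}
```

Each server then only needs to load the index entries for its own range, which keeps every individual index small.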
Indexed Files
Enables splitting search terms across computers
  Create indexes for different types of searching
[Figure: separate indexes by song name and by song length]
How Does This Work?
Using index files simplified using positions
  Look in index structure to find position of data in file
  With this position can then seek to specific record
  Create instance & initialize by reading data from file
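The three steps above can be sketched as follows. This is an illustration under simplifying assumptions: each record holds only a 4-byte price, and the IndexedLookup and Quote names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class IndexedLookup {
    /** Plain object created from the contents of one record. */
    static final class Quote {
        final String ticker;
        final int price;

        Quote(String ticker, int price) {
            this.ticker = ticker;
            this.price = price;
        }
    }

    static Quote find(Map<String, Long> index, RandomAccessFile data, String ticker)
            throws IOException {
        Long position = index.get(ticker); // 1. look in index structure for the position
        if (position == null) {
            return null;                   // search term is not indexed
        }
        data.seek(position);               // 2. seek to that specific record
        int price = data.readInt();        // 3. read the data and build the instance
        return new Quote(ticker, price);
    }

    /** Builds a three-record temporary data file and looks up one ticker. */
    static int demo() {
        try {
            Path path = Files.createTempFile("prices", ".dat");
            try (RandomAccessFile data = new RandomAccessFile(path.toFile(), "rw")) {
                Map<String, Long> index = new HashMap<>();
                String[] tickers = {"IBM", "T", "F"};
                int[] prices = {106, 23, 12};
                for (int i = 0; i < tickers.length; i++) {
                    index.put(tickers[i], data.getFilePointer());
                    data.writeInt(prices[i]);
                }
                return find(index, data, "T").price;
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```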
Starting with Indexed Files
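[Figure: one data file searched through two indexes]
Name index (search term, position): American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
Ticker index (search term, position): F 224; IBM 0; T 112
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 2 F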
Where Was "Searching" Used?
Indexed files used in Maps and Dictionaries
  Read data into searchable object after opening file
  For each record, Entry uses indexed data as its key
Single data file has multiple indexes to search it
  Not a problem, each index has own Collection
  Cannot have multiple instances for each data item
  Cannot have single instance for each data item
Then how can we construct each Entry's value?
Proxy Pattern For The Win!
Create proxy instances to use as Entry's value
  Proxy pretends to have the data by defining getters & setters
  Data's position & file are the only fields these objects have
  Whenever a method is called, it looks up & returns data
Other classes will think proxy has fields declared
  Simplifies using class & ensures up-to-date data used
  But little memory needed, since data resides on disk!
Starting with Indexed Files
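[Figure: one data file searched through two indexes]
Name index (search term, position): American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
Ticker index (search term, position): F 224; IBM 0; T 112
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 12 F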
Coding
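import java.io.RandomAccessFile;

public class Stock {
    // Offset within a record to the start of each field (_OFF)
    // and the fixed maximum size of each field (_SZ)
    private static final int NAME_OFF = 0;
    private static final int NAME_SZ = 50;
    private static final int PRC_OFF = NAME_OFF + NAME_SZ;
    private static final int PRC_SZ = 4;
    private static final int TICK_OFF = PRC_OFF + PRC_SZ;
    private static final int TICK_SZ = 6;
    // Fixed size of a record in the data file
    private static final int SIZE = TICK_OFF + TICK_SZ;

    // The only fields: where the record starts and which file holds it
    private long position;
    private RandomAccessFile theFile;

    public Stock(long pos, RandomAccessFile file) {
        position = pos;
        theFile = file;
    }
    // Class continues on a later slide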
Coding
NAME_SZ, PRC_SZ, TICK_SZ: fixed max. size of each field
SIZE: fixed size of a record in data file
NAME_OFF, PRC_OFF, TICK_OFF: offset in record to field start
Coding
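import java.io.IOException; // seek, read & write calls can all throw this

public class Stock { // Continues from last time
    public int getStockPrice() throws IOException {
        theFile.seek(position + PRC_OFF);
        return theFile.readInt();
    }

    public void setStockPrice(int price) throws IOException {
        theFile.seek(position + PRC_OFF);
        theFile.writeInt(price);
    }

    public void setTickerSymbol(String sym) throws IOException {
        // writeUTF stores a 2-byte length before the characters,
        // so the encoded symbol must fit in TICK_SZ - 2 bytes
        theFile.seek(position + TICK_OFF);
        theFile.writeUTF(sym);
    }
    // More getters & setters from here…
}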
Visualizing Indexed Files
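[Figure: one data file searched through two indexes]
Name index (search term, position): American Telephone & Telegraph 112; International Business Machines 0; Ford Motorcars, Inc. 224
Ticker index (search term, position): F 224; IBM 0; T 112
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 12 F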
How Do We Add Data?
Adding new records takes only a few steps
  Add space for record with setLength on data file
  Update index structure(s) to include new record
  Records in data file updated at each change
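A sketch of these steps, assuming a hypothetical RecordAppender class and a record that stores only a price in its first 4 bytes. The record size of 112 is chosen only so the positions match the ones in the slides' figures.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class RecordAppender {
    static final int RECORD_SZ = 112; // assumed size, matches positions 0, 112, 224, 336

    /** Adds one record at the end of the data file and returns its position. */
    static long add(RandomAccessFile data, Map<String, Long> index, String ticker, int price)
            throws IOException {
        long position = data.length();
        data.setLength(position + RECORD_SZ); // add space for the record
        data.seek(position);
        data.writeInt(price);                 // data file is updated right away
        index.put(ticker, position);          // update the in-memory index structure
        return position;
    }

    /** Returns {position of C, data file length, position of F} after four adds. */
    static long[] demo() {
        try {
            Path path = Files.createTempFile("stocks", ".dat");
            try (RandomAccessFile data = new RandomAccessFile(path.toFile(), "rw")) {
                Map<String, Long> index = new TreeMap<>();
                add(data, index, "IBM", 106);
                add(data, index, "T", 23);
                add(data, index, "F", 12);
                long citibank = add(data, index, "C", -2);
                return new long[] {citibank, data.length(), index.get("F")};
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With several indexes, the same position would be put into each index structure under its own search term.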
Adding New Data To The Files
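[Figure: space for a new record added at position 336 and both indexes updated]
Ticker index (search term, position): C 336; F 224; IBM 0; T 112
Name index (search term, position): American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 12 F; 0 Ø (new record, not yet filled in)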
Adding New Data To The Files
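[Figure: new record filled in]
Ticker index (search term, position): C 336; F 224; IBM 0; T 112
Name index (search term, position): American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 12 F; Citibank -2 C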
How Does This Work?
Removing records even easier
  To prevent using record, remove items from indexes
  Do NOT update index file(s) until program completes
  Use impossible magic numbers for record in data file
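Removal could look roughly like the sketch below. The RecordRemover class, the 16-byte record, and the choice of Integer.MIN_VALUE as the impossible magic number are all assumptions; the slides do not say which value is used.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class RecordRemover {
    static final int RECORD_SZ = 16;              // assumed record size
    static final int REMOVED = Integer.MIN_VALUE; // assumed magic number, never a real price

    /** Drops the term from the in-memory index and marks its record as unused. */
    static void remove(RandomAccessFile data, Map<String, Long> index, String ticker)
            throws IOException {
        Long position = index.remove(ticker); // searches can no longer reach the record
        if (position == null) {
            return;
        }
        data.seek(position);
        data.writeInt(REMOVED);               // data file is changed immediately
        // The index FILE is left alone; it would be rewritten when the program ends.
    }

    /** Scans for a record marked as removed so its space can be reused; -1 if none. */
    static long firstFreeSlot(RandomAccessFile data) throws IOException {
        for (long pos = 0; pos + RECORD_SZ <= data.length(); pos += RECORD_SZ) {
            data.seek(pos);
            if (data.readInt() == REMOVED) {
                return pos;
            }
        }
        return -1;
    }

    /** Removes T from a three-record file and reports which slot became free. */
    static long demo() {
        try {
            Path path = Files.createTempFile("stocks", ".dat");
            try (RandomAccessFile data = new RandomAccessFile(path.toFile(), "rw")) {
                Map<String, Long> index = new TreeMap<>();
                String[] tickers = {"IBM", "T", "F"};
                int[] prices = {106, 23, 12};
                data.setLength((long) tickers.length * RECORD_SZ);
                for (int i = 0; i < tickers.length; i++) {
                    long pos = (long) i * RECORD_SZ;
                    data.seek(pos);
                    data.writeInt(prices[i]);
                    index.put(tickers[i], pos);
                }
                remove(data, index, "T");
                return index.containsKey("T") ? -2 : firstFreeSlot(data);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The magic number is what lets an index be rebuilt from the data file alone: a scan can tell live records from removed ones without consulting any index.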
Removing Data As We Go
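[Figure: files before removing a record]
Ticker index (search term, position): C 336; F 224; IBM 0; T 112
Name index (search term, position): American Telephone & Telegraph 112; Citibank 336; International Business Machines 0; Ford Motorcars, Inc. 224
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; Ford 12 F; Citibank -2 C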
Removing Data As We Go
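[Figure: files after removing Ford]
Ticker index (search term, position): C 336; IBM 0; T 112
Name index (search term, position): American Telephone & Telegraph 112; Citibank 336; International Business Machines 0
Data file records (name, price, ticker): IBM 106 IBM; AT & T 23 T; 0 Ø (removed record); Citibank -2 C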
Using Multiple Indexes
Multiple indexes for data file very often needed
  Provides many ways of searching for important data
Since each index file is read individually, could also create problem
  Multiple proxy instances for data could be created
  Duplicates of instance are created for each index
  Makes removing them all difficult, since not linked
Very easy to solve: use Map while loading index
  Converts positions in file to proxy instances to solve this
Linking Multiple Indexes
Use one Map instance while reading all indexes
  For each position in file, check if already in Map
  Use existing proxy instance, if position already in Map
  If a search in Map returns null, create new instance
  Make sure to call put() when we must create proxy
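The linking steps above can be sketched like this. ProxyLinker and StockProxy are hypothetical names, and the proxy here keeps only its position so the example stays short; a real proxy would also hold the file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ProxyLinker {
    /** Stand-in for the proxy class; only the record position is kept here. */
    static final class StockProxy {
        final long position;

        StockProxy(long position) {
            this.position = position;
        }
    }

    /** Turns one index (search term -> position) into search term -> shared proxy. */
    static Map<String, StockProxy> link(Map<String, Long> rawIndex,
                                        Map<Long, StockProxy> byPosition) {
        Map<String, StockProxy> linked = new TreeMap<>();
        for (Map.Entry<String, Long> entry : rawIndex.entrySet()) {
            long position = entry.getValue();
            StockProxy proxy = byPosition.get(position); // already created by another index?
            if (proxy == null) {
                proxy = new StockProxy(position);        // first time this position is seen
                byPosition.put(position, proxy);         // remember it for later indexes
            }
            linked.put(entry.getKey(), proxy);
        }
        return linked;
    }

    /** Links the two indexes from the slides and checks that they share proxies. */
    static boolean demo() {
        Map<String, Long> tickers = new TreeMap<>();
        tickers.put("F", 224L);
        tickers.put("IBM", 0L);
        tickers.put("T", 112L);
        Map<String, Long> names = new TreeMap<>();
        names.put("American Telephone & Telegraph", 112L);
        names.put("International Business Machines", 0L);
        names.put("Ford Motorcars, Inc.", 224L);

        Map<Long, StockProxy> byPosition = new HashMap<>(); // ONE Map for all indexes
        Map<String, StockProxy> byTicker = link(tickers, byPosition);
        Map<String, StockProxy> byName = link(names, byPosition);

        return byPosition.size() == 3
                && byTicker.get("IBM") == byName.get("International Business Machines")
                && byTicker.get("T") == byName.get("American Telephone & Telegraph")
                && byTicker.get("F") == byName.get("Ford Motorcars, Inc.");
    }
}
```

Because every index ends up holding the same proxy instance for a given record, removing that record means removing one object from each index rather than hunting for unlinked duplicates.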
For Next Lecture
Angel now has week #9 assignment (due 3/20)
  This is after break, but might want to get started now
Angel will also have project #2 available
  Has staggered submissions like previous project
  Based upon index files, so can start working now!
Will discuss implementing space-efficient BST
  Start coloring nodes red & black
  Keeps balanced, but limits amount of movement