Maps
Finding Information
Immediately
© schmiedecke 07 Inf2-7-Trees 2
What is a Map?
• Associative data structure
• Complex information:
- record of data
- unique key value for identification
- e.g. number plate of a car
• Task:
- Store data so that it can be retrieved by means of the key:
Who is the owner of 'HZ-AX 1'?
Associative Structures
• Associative data structure:
- Retrieval by content rather than by index
- Structure of a key plus information ("payload")
- Hides the physical data structure (storage concept)
• It's all about information retrieval.
• How to implement an efficient lookup?
[Figure: stack of record cards, each reading "Smith, Peter, * 7.7.97"]
Example: Electronic Patient Records
• Patient Records
• Kept in a database to efficiently find, compare and evaluate patient information
• So a patient record would be a line in a database table
• Unique, identifying key: insurance number
Natural Associative Data Structures
• dictionaries
• symbol tables
• administrative structures
• lists of utilities (hotels, tools, lectures)
• guest lists
• ....
• you can store them in an array or list, BUT
• in all of these cases, indices would "feel artificial"
• so you should at least hide them (ADT)
Basic Map Operations
• A map entry is a pair (key, content)
• e.g. (Benni, 0177-7777777)
• so the basic operations are
- add(key, content)
- retrieve(key) – returns content
- remove(key) – may return content
- contains(key) – returns boolean
• i.e., as long as you have the key, you can retrieve the content.
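As a rough illustration of these four operations, the sketch below uses java.util.HashMap as a stand-in. Mapping add/retrieve/remove/contains onto put/get/remove/containsKey is an assumption made for this example, and the class name PhoneBookDemo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneBookDemo {
    /** Builds a tiny phone book holding the entry from the slide. */
    static Map<String, String> sample() {
        Map<String, String> book = new HashMap<>();
        book.put("Benni", "0177-7777777");             // add(key, content)
        return book;
    }

    public static void main(String[] args) {
        Map<String, String> book = sample();
        System.out.println(book.containsKey("Benni")); // contains(key)
        System.out.println(book.get("Benni"));         // retrieve(key)
        System.out.println(book.remove("Benni"));      // remove(key), returns the content
        System.out.println(book.containsKey("Benni")); // the entry is gone now
    }
}
```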
(Simple) Associative Table Interface
public interface AssociativeTable {
public boolean contains(Object key, Object val);
public Object get(Object key);
public void add(Object key, Object val);
public Object remove(Object key);
}
For multiple entries (synonyms in a symbol table),
- Functions for navigating in synonym list.
- Generalize as an Iterator object:
public Iterator keyIterator();
// defines next(), previous(), first(), last();
Interface Map
SearchTree Implementation
• Easy:
- adding a symbol (will be added as leaf)
- removing a leaf
- removing the root of an incomplete tree
• Hard:
- removing a middle node
public class BinNode {
Comparable key;
Object val;
BinNode left, right;
// constructors as required
}
public class SearchTree implements SymbolTable {
protected BinNode root = new BinNode(); // Empty
....
}
SearchTree: Utility Methods
/** find the node holding key, or the node below which key would be added */
protected BinNode locate(BinNode root, Comparable key) {
    Comparable rootKey = root.key;   // the tree is ordered by key, not by payload
    BinNode child;
    if (rootKey.equals(key)) return root;
    if (rootKey.compareTo(key) < 0) child = root.right;
    else child = root.left;
    if (child.isEmpty()) return root;
    else return locate(child, key);
}
/** find (logical) predecessor of given node:
 *  rightmost entry of left subtree */
protected BinNode predecessor(BinNode root) {
    BinNode result = root.left;
    while (!result.right.isEmpty()) result = result.right;
    return result;
}
Implementation using SortedTrees
• Let's consider implementing a symbol table
• Keys implement Comparable, i.e. they are ordered (most keys are ordered)
• So we can use a search tree (sorted tree) for logarithmic access.
interface Comparable { public int compareTo(Object o); }
// less, equal, more → neg, 0, pos
// ClassCastException if not comparable
interface SymbolTable {
public boolean contains(Comparable symbol);
public void add(Comparable symbol, Object value);
public Object get(Comparable symbol);
public Object remove(Comparable symbol);
}
SearchTree: contains()
public boolean contains(Comparable symbol) {
    if (root.isEmpty()) return false;
    // locate returns either the matching node or the would-be parent
    return symbol.equals(locate(root, symbol).key);
}
Adding a Node
[Figure: example search tree rooted at 10, with locate marking the insertion points for new nodes]
for identical values, locate returns a subtree:
append new node right of predecessor
SearchTree: add()
public void add(Comparable symbol, Object value) {
    BinNode newnode = new BinNode(symbol, value);
    if (root.isEmpty()) root = newnode;
    else {
        BinNode insertpoint = locate(root, symbol);
        if (symbol.compareTo(insertpoint.key) > 0) {
            insertpoint.right = newnode;
        } else {
            // identical key: the new node takes over the left subtree
            if (symbol.compareTo(insertpoint.key) == 0)
                newnode.left = insertpoint.left;
            insertpoint.left = newnode;
        }
    }
}
// Simpler variant of the else branch:
// save the left subtree without checking whether it is empty
//     newnode.left = insertpoint.left;
//     insertpoint.left = newnode;
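The lookup and insertion logic can be sketched end to end as follows. This is a minimal sketch under several assumptions that differ from the slides: empty subtrees are represented by null rather than by empty sentinel nodes, keys are Strings, and adding an existing key overwrites its value instead of storing a duplicate. The class name BstSketch is hypothetical.

```java
class BstSketch {
    static final class Node {
        final String key;
        Object val;
        Node left, right;
        Node(String key, Object val) { this.key = key; this.val = val; }
    }

    private Node root;   // null means "empty tree" in this sketch

    /** Returns the node holding key, or the node below which key would be added. */
    private Node locate(Node node, String key) {
        int cmp = key.compareTo(node.key);
        if (cmp == 0) return node;
        Node child = (cmp < 0) ? node.left : node.right;
        return (child == null) ? node : locate(child, key);
    }

    void add(String key, Object val) {
        if (root == null) { root = new Node(key, val); return; }
        Node at = locate(root, key);
        int cmp = key.compareTo(at.key);
        if (cmp == 0) at.val = val;                     // assumption: overwrite on equal key
        else if (cmp < 0) at.left = new Node(key, val);
        else at.right = new Node(key, val);
    }

    Object get(String key) {
        if (root == null) return null;
        Node at = locate(root, key);
        return key.equals(at.key) ? at.val : null;
    }

    boolean contains(String key) {
        return root != null && key.equals(locate(root, key).key);
    }
}
```

Because locate returns either the matching node or the future parent, add, get and contains all reduce to one call plus a key comparison.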
Removing a Node
[Figure: example search tree rooted at 10, illustrating the two removal cases]
move up solitary subtree
move up predecessor node
Root Removal (General Case)
Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003
SearchTree: remove()
/** to be used for node removal
* yields new subtree with top removed */
protected BinNode removeTop(BinNode top) {
    BinNode topleft = top.left, topright = top.right;
    top.left = null; top.right = null; // disconnect
    if (topleft.isEmpty()) return topright;
    if (topright.isEmpty()) return topleft;
    BinNode pred = topleft.right;
    if (pred.isEmpty())
    { topleft.right = topright; return topleft; }
    BinNode parent = topleft;
while (!pred.right.isEmpty())
{ parent = pred; pred = pred.right; }
parent.right = pred.left;
pred.left = topleft;
pred.right = topright;
return pred;
}
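The same removal strategy can be tried out in isolation. The following is a minimal sketch, assuming int keys and null for empty subtrees (the slides use empty sentinel nodes); RemoveTopSketch and inOrder are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveTopSketch {
    static final class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Removes the top node and returns the root of the remaining subtree (may be null). */
    static Node removeTop(Node top) {
        Node l = top.left, r = top.right;
        top.left = null; top.right = null;            // disconnect
        if (l == null) return r;                      // move up solitary subtree
        if (r == null) return l;
        if (l.right == null) { l.right = r; return l; }
        Node parent = l, pred = l.right;              // find rightmost node of left subtree
        while (pred.right != null) { parent = pred; pred = pred.right; }
        parent.right = pred.left;                     // unhook the predecessor
        pred.left = l;                                // move the predecessor up
        pred.right = r;
        return pred;
    }

    /** Collects the keys in sorted (in-order) sequence. */
    static void inOrder(Node n, List<Integer> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }
}
```

An in-order traversal before and after the call should differ only by the removed key, which is a convenient way to check that the search tree property survives.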
Splay Trees
• AVL trees: balancing criterion requires structure analysis.
• Efficient heuristic alternative: Splay Trees
• invented in 1985 by Daniel Sleator and Robert Tarjan.
• always make new node root:
- insert in proper place as leaf
- move upward by successive "rotations"
• Splaying keeps tree "balanced on average"
• Saves structure analysis and bookkeeping
• Amortized logarithmic complexity for adding and lookup
• Splaying after every access operation keeps recently used elements near the root → statistically relevant efficiency gain!
Rotating operations (pattern)
Splay Rotation
Rotation Implementation
protected void rotateRight() {
    BinNode parent = this.parent;
    BinNode newroot = this.left;
    boolean wasChild = parent != null;
    boolean wasLeftChild = isLeftChild();
    this.setLeft(newroot.right); // setLeft/setRight also update the parent link
    newroot.setRight(this);
    if (wasChild) {
        if (wasLeftChild) parent.setLeft(newroot);
        else parent.setRight(newroot);
    }
}
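To see the effect of a rotation without parent links, here is a simplified sketch. It assumes int keys and null for empty subtrees, and it returns the new subtree root instead of re-hooking it into a parent; RotationSketch is a hypothetical name.

```java
class RotationSketch {
    static final class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Rotates right around pivot and returns the new subtree root; pivot.left must not be null. */
    static Node rotateRight(Node pivot) {
        Node newRoot = pivot.left;
        pivot.left = newRoot.right;   // the inner subtree changes sides
        newRoot.right = pivot;        // the old root becomes the right child
        return newRoot;
    }

    /** Mirror image of rotateRight; pivot.right must not be null. */
    static Node rotateLeft(Node pivot) {
        Node newRoot = pivot.right;
        pivot.right = newRoot.left;
        newRoot.left = pivot;
        return newRoot;
    }
}
```

A rotation changes the shape of the tree but not its in-order sequence, which is why splaying can restructure freely without breaking the ordering.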
© schmiedecke 07 Inf2-7-Trees 22
Quietly Use Splay Trees?
• Watch out for equality:
• When are two SearchTrees equal?
• Problem: Hidden restructuring
• General rule:
- Define equality representation-independent!
- You can base it on the linearization using the Iterator
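One way to read this rule: two trees are equal if their in-order sequences are equal, regardless of shape. The sketch below assumes int keys and null for empty subtrees; TreeEqualitySketch, linearize and sameContent are hypothetical names, and a list is used in place of an Iterator for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class TreeEqualitySketch {
    static final class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Returns the keys in in-order sequence, i.e. the linearization of the tree. */
    static List<Integer> linearize(Node root) {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }

    /** Equality based on content only, not on the current shape of the trees. */
    static boolean sameContent(Node a, Node b) {
        return linearize(a).equals(linearize(b));
    }
}
```

With this definition, a tree compares equal to itself before and after splaying, even though its root has changed.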
© schmiedecke 07 Inf2-7-Trees 23
Why Use a Sorted Tree for a Symbol Table?
• Characteristics:
- frequent additions and removals
- frequent lookups
• Obviously a good idea to keep the table sorted at low cost
→ use a sorted tree
- Preferably use a splay tree or an AVL tree; they differ only in the implementation of the add method.
• So, a tree is fine, but what about a simple list?
- keeping it ordered requires moving chunks of data
- only acceptable, if modifications are infrequent.
• Are there alternatives to trees and lists?
© schmiedecke 07 Inf2-7-Trees 24
Arbitrary (non-ordered) Keys
• True (and realistic) Associativity ☺
• E.g. Fingerprints, Biometrics, Pictures, Sounds, Toys, CDs,...
• Mapping:
- right-unique (functional) relation
- i.e. unique right value for every left value
• Associative Table can be considered a mapping from key to value.
- usually a unique mapping
• For data retrieval, we need a second mapping:
- from key to storage – i.e. how do we find the key?
- with ordered keys, the order provides the mapping (index, tree pos.)
- but with arbitrary keys?
© schmiedecke 07 Inf2-7-Trees 25
Hash Maps
• Time of entry provides an immediate order.
• Can be used as a mapping: simply add entries to a List as they turn up.
• Mapping cannot be determined from the key, i.e. it is not a computable mapping.
• But linear search is unacceptable!
• Find a computable mapping from keys to order (Chinese letters: lookup table to determine order)
• Let the computer compute the mapping
• Better: find a computable immediate mapping from key to index
• → hash code
• Results in constant access time: O(1) on average
ingenious solution!
© schmiedecke 07 Inf2-7-Trees 26
Hashing is like a personal ordering scheme...
"John, where is the torque screw driver?"
"Just where it belongs, of course!"
"And where does it belong?"
"In the gloves drawer of course!"
"And WHY there???"
"Because I use it as rarely as my gloves, of course. Got it?"
In other words:
The hashing scheme need not be logical, as long as it works.
So you may use the second character or the word length or the sum of the character values, whatever...
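The three schemes mentioned here can be written down in a few lines. This is only a sketch: the class and method names are hypothetical, and slot is an assumed helper that maps a hash code onto a table of the given capacity.

```java
class AdHocHashes {
    /** Hash by the second character; requires at least two characters. */
    static int bySecondChar(String s) {
        return s.charAt(1);
    }

    /** Hash by word length. */
    static int byLength(String s) {
        return s.length();
    }

    /** Hash by the sum of the character values. */
    static int byCharSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += s.charAt(i);
        return sum;
    }

    /** Maps any hash code to a valid index in 0..capacity-1. */
    static int slot(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }
}
```

All three are valid hash functions because equal strings always produce equal results; how well they spread the keys over the table is a separate question.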
© schmiedecke 07 Inf2-7-Trees 27
Example: 26 Slots
Hashing color names of antique glass: hash code is index of first letter
a) data entry: names are hashed into the first available slot, possibly after rehashing.
b) data lookup: identical process to entry.
Hash Clashes
• What if the hash slot is already full?
• Open Addressing:
- Rehash by adding an offset and try again
- Constant offset
- Double Hashing: computed offset.
• Problem: clustering
• Best distribution with prime length tables
• External Chaining:
- let each entry be the head of a chain of hash-isomorphic entries
© schmiedecke 07 Inf2-7-Trees 29
26 Slots: Deleting Entries
a) Deletion of an entry leaves a shaded reserved cell as placeholder.
b) A reserved cell is considered empty during entry and full during lookup.
26 Slots: Clustering
a) Primary clustering: Two values with identical hash code compete during rehashing
b) Secondary clustering: Two values with originally different hash codes compete during rehashing
26 Slots: Double Hashing
• Double hashing uses a second hash function to determine the rehashing offset
• In this case, the second letter of the name is used, requiring at least two-letter names.
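The probe sequence for this scheme might look as follows. This is a sketch under assumptions: names consist of ASCII letters, the table has 26 slots, and the offset is folded into the range 1..25 so that it is never zero (that folding is a choice made for this example, not something the slides specify). DoubleHashingSketch and its methods are hypothetical names.

```java
class DoubleHashingSketch {
    static final int SLOTS = 26;

    /** First hash: index of the first letter, 0..25. */
    static int firstHash(String name) {
        return Character.toLowerCase(name.charAt(0)) - 'a';
    }

    /** Second hash: offset derived from the second letter, folded into 1..25. */
    static int offset(String name) {
        return (Character.toLowerCase(name.charAt(1)) - 'a') % 25 + 1;
    }

    /** Returns the first count slots that would be probed for name. */
    static int[] probes(String name, int count) {
        int[] result = new int[count];
        int slot = firstHash(name);
        for (int i = 0; i < count; i++) {
            result[i] = slot;
            slot = (slot + offset(name)) % SLOTS;   // rehash with the computed offset
        }
        return result;
    }
}
```

For "violet" the sequence starts 21, 4, 13. For "amber" the offset is 13, so the sequence alternates between slots 0 and 13 and never reaches the rest of the table, because 13 divides 26. This is one reason prime table lengths are preferred.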
Locate() with Open Addressing
public interface AssociativeTable {
    public boolean contains(Comparable key, Object val);
    public Object get(Comparable key);
    public void add(Comparable key, Object val);
    public Object remove(Comparable key);
}
public class HashTable implements AssociativeTable {
    // capacity, NONE, RESERVED and rehash() are assumed to be defined elsewhere in the class
    protected Object[] keys = new Object[capacity];   // translation
    protected Object[] values = new Object[capacity]; // tables
    protected int locate(Object key) {
        int hash = Math.abs(key.hashCode() % capacity); // hashCode() may be negative
        int firstReserved = NONE; // no reserved cell found yet
        while (keys[hash] != null) {
            if (key.equals(keys[hash])) return hash;
            // remember first reserved slot
            if (keys[hash] == RESERVED) {
                if (firstReserved == NONE) firstReserved = hash;
            }
            hash = rehash(hash); // e.g. (hash+1)%capacity
        }
        // return first empty slot available
        if (firstReserved == NONE) return hash;
        else return firstReserved;
    }
    // ... remaining methods omitted
}
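To show how locate interacts with deletion, here is a compact, self-contained sketch of an open-addressing table with reserved cells. It assumes linear probing with offset 1 and a table that is never completely full (otherwise the probe loop would not terminate); OpenAddressingSketch and its constructor parameter are hypothetical.

```java
class OpenAddressingSketch {
    private static final Object RESERVED = new Object();   // marks a deleted cell
    private final Object[] keys;
    private final Object[] values;

    OpenAddressingSketch(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    /** Slot holding key, or the slot where key should be stored. */
    private int locate(Object key) {
        int slot = Math.floorMod(key.hashCode(), keys.length);
        int firstReserved = -1;                             // no reserved cell found yet
        while (keys[slot] != null) {
            if (key.equals(keys[slot])) return slot;
            if (keys[slot] == RESERVED && firstReserved < 0) firstReserved = slot;
            slot = (slot + 1) % keys.length;                // linear probing
        }
        return firstReserved >= 0 ? firstReserved : slot;
    }

    void add(Object key, Object val) {
        int slot = locate(key);
        keys[slot] = key;
        values[slot] = val;
    }

    Object get(Object key) {
        int slot = locate(key);
        return key.equals(keys[slot]) ? values[slot] : null;
    }

    Object remove(Object key) {
        int slot = locate(key);
        if (!key.equals(keys[slot])) return null;
        Object old = values[slot];
        keys[slot] = RESERVED;                              // leave a placeholder, not null
        values[slot] = null;
        return old;
    }
}
```

The reserved marker keeps probe chains intact: a lookup walks past it as if the cell were full, while a later add may reuse it as if it were empty.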
© schmiedecke 07 Inf2-7-Trees 33
26 Slots: External Chaining
Use each entry as chain head for hash-isomorphic entries:
a: amber, aubergine
d: dusk
m: mandarine, mauve
v: violet
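External chaining with the 26 first-letter slots could be sketched like this. The sketch assumes names start with an ASCII letter and uses java.util.LinkedList for the chains; ChainedColors and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedColors {
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedColors() {
        for (int i = 0; i < 26; i++) buckets.add(new LinkedList<>());
    }

    /** Hash code is the index of the first letter, 0..25. */
    private static int hash(String name) {
        return Character.toLowerCase(name.charAt(0)) - 'a';
    }

    void add(String name) {
        LinkedList<String> chain = buckets.get(hash(name));
        if (!chain.contains(name)) chain.add(name);   // append to the chain, no rehashing needed
    }

    boolean contains(String name) {
        return buckets.get(hash(name)).contains(name);
    }

    boolean remove(String name) {
        return buckets.get(hash(name)).remove(name);  // no reserved cells required
    }

    /** Returns the chain stored under the given letter. */
    List<String> chain(char letter) {
        return buckets.get(Character.toLowerCase(letter) - 'a');
    }
}
```

Clashes only lengthen one chain, so deletion is simple and entries with different hash codes never compete for a slot.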
© schmiedecke 07 Inf2-7-Trees 34
Hash Functions
• Using the index of the first letter is a simple hash function.
• Is it a good one?
- criterion: even distribution over table
- so, not so good...
- but prime table length helps.
• Better hash functions on Strings:
- sum of all letter indices modulo length
- sum of weighted characters (higher powers of two)
- sum of weighted selected characters.
• In Java, hashCode() is an Object function
- override it for your own types, if you like.
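The string hash functions listed above can be compared with a small sketch. The names StringHashes, charSum, weightedSum, polynomial31 and bucket are hypothetical; polynomial31 follows the polynomial that the String.hashCode() documentation describes, with 31 as the weight.

```java
class StringHashes {
    /** Plain sum of character values: anagrams always clash. */
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += s.charAt(i);
        return sum;
    }

    /** Weighted sum: earlier characters get higher powers of two as weights. */
    static int weightedSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 2 * h + s.charAt(i);
        return h;
    }

    /** Same scheme with weight 31, as documented for java.lang.String.hashCode(). */
    static int polynomial31(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        return h;
    }

    /** Folds a (possibly negative) hash code into 0..tableLength-1. */
    static int bucket(int hash, int tableLength) {
        return Math.floorMod(hash, tableLength);
    }
}
```

With the plain sum, "ab" and "ba" land in the same bucket; the weighted variants take character positions into account and separate them.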
© schmiedecke 07 Inf2-7-Trees 35
Hash Distribution
Hash frequency of dictionary words:
• list size: 997 "buckets"
• hash function: character sum
Hash Distribution
Hash frequency of dictionary words:
• list size: 997 "buckets"
• hash function: weighted character sum using powers of 2
Hash Distribution
Hash frequency of dictionary words:
• list size: 997 "buckets"
• hash function: weighted character sum using powers of 256
Hash Distribution
Hash frequency of dictionary words:
• list size: 997 "buckets"
• hash function: Java String hashCode()
Hash Map Performance
Resizing a Hash Map
• Performance depends on load factor.
• Resize at load factor between 60% and 70%.
• Resizing requires rehashing all entries.
- but you can traverse the list linearly.
• Best strategy: double size.
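Resizing by rehashing can be sketched with a small set of strings. The assumptions are loud ones: linear probing, an initial capacity of 8, a load-factor threshold of 0.7, and no support for deletion. ResizingSketch and its members are hypothetical names.

```java
class ResizingSketch {
    private String[] slots = new String[8];
    private int size = 0;

    int capacity() { return slots.length; }
    int size() { return size; }

    void add(String key) {
        if (contains(key)) return;
        // grow before the load factor would exceed 70%
        if (size + 1 > 0.7 * slots.length) resize(2 * slots.length);
        insert(slots, key);
        size++;
    }

    boolean contains(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }

    /** Stores key in the first free slot of its probe sequence. */
    private static void insert(String[] table, String key) {
        int i = Math.floorMod(key.hashCode(), table.length);
        while (table[i] != null) i = (i + 1) % table.length;
        table[i] = key;
    }

    /** Traverses the old table linearly and rehashes every entry into the new one. */
    private void resize(int newCapacity) {
        String[] old = slots;
        slots = new String[newCapacity];
        for (String key : old) {
            if (key != null) insert(slots, key);
        }
    }
}
```

Entries cannot simply be copied to the same index, because the slot depends on the table length; each one has to be hashed again.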
© schmiedecke 07 Inf2-7-Trees 41
Java Hash Code
• Hash Code of an Object o: o.hashCode();
• Make sure for all new Types: x.equals(y) ⇒ x.hashCode() == y.hashCode()
• Hash Code of a Collection:
• Sum of element hash codes (as for Java sets)!
public int hashCode() {
    Iterator i = this.iterator();
    int result = 0;
    while (i.hasNext()) {
        Object o = i.next();
        if (o != null) result += o.hashCode();
    }
    return result;
}
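For a user-defined key type, the rule that equal objects must have equal hash codes might be honored as follows. NumberPlate is a hypothetical class inspired by the number-plate example earlier in the slides; splitting the plate into district and code is an assumption made for illustration.

```java
import java.util.Objects;

final class NumberPlate {
    private final String district;
    private final String code;

    NumberPlate(String district, String code) {
        this.district = district;
        this.code = code;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NumberPlate)) return false;
        NumberPlate p = (NumberPlate) other;
        return district.equals(p.district) && code.equals(p.code);
    }

    /** Built from exactly the fields that equals() compares. */
    @Override
    public int hashCode() {
        return Objects.hash(district, code);
    }
}
```

If hashCode() were left at the default from Object, two equal plates would usually land in different slots, and a HashMap lookup with a freshly created key would fail.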
© schmiedecke 07 Inf2-7-Trees 42
Maps and Hash Tables in the Java Collections Framework
interface Map<K,V> {
    public boolean containsKey(Object key);
    public boolean containsValue(Object value);
    public boolean isEmpty();
    public int size();
    public V get(Object key);
    public V put(K key, V value);
    public V remove(Object key);
    // ... further methods omitted
}
© schmiedecke 07 Inf2-7-Trees 43
HashSet
Special case HashSet:
• Simple, unstructured content
• instead of an association (key, content)
• content immediately used as key
Same retrieval properties as HashMap:
• constant access time
• no order required on keys (i.e. content)
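A short sketch of the "content is its own key" idea, using java.util.HashSet. GuestListDemo and guests are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class GuestListDemo {
    /** Builds a set from the given names; duplicates are silently ignored. */
    static Set<String> guests(String... names) {
        Set<String> set = new HashSet<>();
        for (String name : names) set.add(name);   // the content itself serves as key
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = guests("Ann", "Bob", "Ann");
        System.out.println(set.size());            // number of distinct guests
        System.out.println(set.contains("Bob"));   // membership test by content
    }
}
```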
© schmiedecke 07 Inf2-7-Trees 44
TreeMap
Associations with ordered keys:
• can also be implemented as ordered lists
• e.g. a SortedTree
Choose implementation according to usage
• Use a TreeMap, if
- the order is important (you need ordered iterators)
- deletions are frequent
• Use a HashMap, if
- the order is of little relevance
- you need very fast access
- deletions are less frequent
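The difference in iteration order can be observed directly. The following sketch fills a java.util.TreeMap and a java.util.HashMap with the same entries; MapChoiceDemo, fill and keysInIterationOrder are hypothetical names, and the color names with their numbers are sample data only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapChoiceDemo {
    /** Puts the same four sample entries into whatever map is passed in. */
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("mauve", 3);
        map.put("amber", 1);
        map.put("violet", 4);
        map.put("dusk", 2);
        return map;
    }

    /** Returns the keys in the order the map's iterator delivers them. */
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // TreeMap iterates in key order; HashMap order is unspecified
        System.out.println(keysInIterationOrder(fill(new TreeMap<>())));
        System.out.println(keysInIterationOrder(fill(new HashMap<>())));
    }
}
```

Both maps hold the same associations; only the TreeMap guarantees that the keys come back sorted.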
So much for Information Retrieval...
Let's finish up on sorting
before we start climbing graphs...