
Maps

Finding Information

Immediately

© schmiedecke 07 Inf2-7-Trees 2

What is a Map?

• Associative data structure

• Complex information:

- record of data

- unique key value for identification

- e.g. number plate of a car

• Task:

- Store data so that it can be retrieved by means of the key:

Who is the owner of 'HZ-AX 1'?


Associative Structures

• Associative data structure:

- Retrieval by content rather than by index

- Structure of a key plus information ("payload")

- hide physical data structure (storage concept)

• It's all about information retrieval.

• How to implement an efficient lookup?

[Figure: example record "Smith, Peter, * 7.7.97"]


Example: Electronic Patient Records

• Patient Records

• Kept in a database to efficiently find, compare and evaluate patient information

• So a patient record would be a line in a database table

• Unique, identifying key: insurance number


Natural Associative Data Structures

• dictionaries

• symbol tables

• administrative structures

• lists of utilities (hotels, tools, lectures)

• guest lists

• ....

• you can store them in an array or list, BUT

• in all of these cases, indices would "feel artificial"

• so you should at least hide them (ADT)


Basic Map Operations

• A map entry is a pair (key, content)

• e.g. (Benni, 0177-7777777)

• so the basic operations are

- add(key, content)

- retrieve(key) – returns content

- remove(key) – may return content

- contains(key) – returns boolean

• i.e., as long as you have the key, you can retrieve the content.
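The four operations map directly onto `java.util.HashMap`, where add is called `put`, retrieve is `get` and contains is `containsKey`. A minimal sketch; the class name `PhoneBookDemo` and the second entry are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class PhoneBookDemo {
    /** Builds a small phone book keyed by name. */
    static Map<String, String> build() {
        Map<String, String> book = new HashMap<>();
        book.put("Benni", "0177-7777777");   // add(key, content)
        book.put("Anna", "0170-1234567");    // hypothetical second entry
        return book;
    }

    public static void main(String[] args) {
        Map<String, String> book = build();
        System.out.println(book.get("Benni"));          // retrieve(key)
        System.out.println(book.containsKey("Anna"));   // contains(key)
        System.out.println(book.remove("Anna"));        // remove(key) returns the content
        System.out.println(book.containsKey("Anna"));   // the entry is gone now
    }
}
```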


(Simple) Associative Table Interface

public interface AssociativeTable {

public boolean contains(Object key, Object val);

public Object get(Object key);

public void add(Object key, Object val);

public Object remove(Object key);

}

For multiple entries (synonyms in a symbol table),

- Functions for navigating in synonym list.

- Generalize as an Iterator object:

public Iterator keyIterator();

// defines next(), previous(), first(), last();
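To illustrate hiding the indices behind an ADT, here is a sketch of a list-backed table. The class name `ListTable` is hypothetical, it uses a one-argument `contains(key)` as in the list of basic operations, and it replaces the content when a key is added twice (no synonyms). Lookup is a linear search, so this only shows the interface idea, not an efficient implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list-backed table: indices exist, but only inside the class. */
class ListTable {
    private final List<Object> keys = new ArrayList<>();
    private final List<Object> vals = new ArrayList<>();

    public boolean contains(Object key) {
        return keys.indexOf(key) >= 0;
    }

    public Object get(Object key) {
        int i = keys.indexOf(key);           // linear search, O(n)
        return i < 0 ? null : vals.get(i);
    }

    public void add(Object key, Object val) {
        int i = keys.indexOf(key);
        if (i >= 0) {
            vals.set(i, val);                // replace an existing entry
        } else {
            keys.add(key);
            vals.add(val);
        }
    }

    public Object remove(Object key) {
        int i = keys.indexOf(key);
        if (i < 0) return null;
        keys.remove(i);                      // remove by index
        return vals.remove(i);               // returns the removed content
    }
}
```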


Interface Map


SearchTree Implementation

• Easy:

- adding a symbol (will be added as leaf)

- removing a leaf

- removing the root of an incomplete tree

• Hard:

- removing a middle node

public class BinNode {

Comparable key;

Object val;

BinNode left, right;

// constructors as required

}

public class SearchTree implements SymbolTable {

protected BinNode root = new BinNode(); // Empty

....

}


SearchTree: Utility Methods

/** find the node holding the given key, or the node under which it would be added */

protected BinNode locate(BinNode root, Object key) {
    Comparable rootKey = root.key;   // nodes are ordered by key
    BinNode child;
    if (rootKey.equals(key)) return root;
    if (rootKey.compareTo(key) < 0) child = root.right;
    else child = root.left;
    if (child.isEmpty()) return root;
    else return locate(child, key);
}

/** find (logical) predecessor of given node:

 * rightmost entry of left subtree */

protected BinNode predecessor(BinNode root) {

BinNode result = root.left;

while (!result.right.isEmpty()) result = result.right;

return result;

}


Implementation using SortedTrees

• Let's consider implementing a symbol table

• Keys implement Comparable, i.e. they are ordered (most keys are ordered)

• So we can use a search tree (sorted tree) for logarithmic access.

interface Comparable { public int compareTo(Object o); }

// less, equal, more → neg, 0, pos

// ClassCastException if not comparable

interface SymbolTable {

public boolean contains(Comparable symbol);

public void add(Comparable symbol, Object value);

public Object get(Comparable symbol);

public Object remove(Comparable symbol);

}


SearchTree: contains()

public boolean contains(Comparable symbol) {
    if (root.isEmpty()) return false;
    // locate returns the matching node or the would-be parent
    return symbol.equals(locate(root, symbol).key);
}


Adding a Node

[Figure: example search tree (root 10); locate finds the place for each new node]

For identical values, locate returns a subtree: append the new node to the right of the predecessor.


SearchTree: add()

public void add(Comparable symbol, Object value) {
    BinNode newnode = new BinNode(symbol, value);
    if (root.isEmpty()) root = newnode;
    else {
        BinNode insertpoint = locate(root, symbol);
        if (symbol.compareTo(insertpoint.key) > 0) {
            insertpoint.right = newnode;
        } else {
            // smaller or equal key: save the left subtree (empty unless
            // the keys are equal), without checking for null
            newnode.left = insertpoint.left;
            insertpoint.left = newnode;
        }
    }
}
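For experimenting, the following is a self-contained sketch of a search-tree symbol table. It is not the slides' `SearchTree`: the class name `TinySearchTree` is hypothetical, empty subtrees are represented by `null` instead of an `isEmpty()` node, and adding an existing key replaces its value rather than storing a synonym.

```java
/** Minimal search-tree symbol table; empty subtrees are null here (an assumption). */
class TinySearchTree<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        final K key;
        V val;
        Node<K, V> left, right;
        Node(K key, V val) { this.key = key; this.val = val; }
    }

    private Node<K, V> root;

    /** Adds the pair as a new leaf, or replaces the value of an existing key. */
    public void add(K key, V val) {
        if (root == null) { root = new Node<>(key, val); return; }
        Node<K, V> n = root;
        while (true) {
            int c = key.compareTo(n.key);
            if (c == 0) { n.val = val; return; }
            if (c < 0) {
                if (n.left == null) { n.left = new Node<>(key, val); return; }
                n = n.left;
            } else {
                if (n.right == null) { n.right = new Node<>(key, val); return; }
                n = n.right;
            }
        }
    }

    /** Walks down from the root; returns null if the key is absent. */
    public V get(K key) {
        Node<K, V> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return n.val;
            n = (c < 0) ? n.left : n.right;
        }
        return null;
    }

    /** Assumes that stored values are never null. */
    public boolean contains(K key) { return get(key) != null; }
}
```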


Removing a Node

[Figure: removing nodes from a search tree (root 10)]

- move up solitary subtree

- move up predecessor node


Root Removal (General Case)

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


SearchTree: remove()

/** to be used for node removal

* yields new subtree with top removed */

protected BinNode removeTop(BinNode top) {
    BinNode topleft = top.left, topright = top.right;
    top.left = null; top.right = null; // disconnect
    if (topleft.isEmpty()) return topright;
    if (topright.isEmpty()) return topleft;
    BinNode pred = topleft.right;
    if (pred.isEmpty()) {
        // topleft itself is the predecessor
        topleft.right = topright;
        return topleft;
    }
    BinNode parent = topleft;
    while (!pred.right.isEmpty()) {
        parent = pred;
        pred = pred.right;
    }
    parent.right = pred.left;  // unlink predecessor
    pred.left = topleft;
    pred.right = topright;
    return pred;
}
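The two removal cases (move up a solitary subtree, move up the predecessor) can be tried out with the following sketch. It is a simplification under stated assumptions: the class name `IntTree` is hypothetical, keys are plain ints, `null` marks an empty subtree, and the predecessor's key is copied into the removed node instead of relinking the nodes as `removeTop` does.

```java
/** Sketch of node removal in a search tree of ints; null marks an empty subtree. */
class IntTree {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    void add(int key) { root = add(root, key); }

    private static Node add(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = add(n.left, key);
        else if (key > n.key) n.right = add(n.right, key);
        return n;                                // duplicates are ignored
    }

    void remove(int key) { root = remove(root, key); }

    private static Node remove(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) { n.left = remove(n.left, key); return n; }
        if (key > n.key) { n.right = remove(n.right, key); return n; }
        // n is the node to remove
        if (n.left == null) return n.right;      // move up solitary subtree
        if (n.right == null) return n.left;
        Node pred = n.left;                      // rightmost node of the left subtree
        while (pred.right != null) pred = pred.right;
        n.key = pred.key;                        // move up predecessor
        n.left = remove(n.left, pred.key);       // then unlink the predecessor itself
        return n;
    }

    /** In-order listing, e.g. "2 4 5", to check that the order survives removal. */
    String inOrder() {
        StringBuilder sb = new StringBuilder();
        inOrder(root, sb);
        return sb.toString().trim();
    }

    private static void inOrder(Node n, StringBuilder sb) {
        if (n == null) return;
        inOrder(n.left, sb);
        sb.append(n.key).append(' ');
        inOrder(n.right, sb);
    }
}
```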


Splay Trees

• AVL trees: balancing criterion requires structure analysis.

• Efficient heuristic alternative: Splay Trees

• invented in 1985 by Daniel Sleator and Robert Tarjan.

• always make new node root:

- insert in proper place as leaf

- move upward by successive "rotations"

• Splaying keeps tree "balanced on average"

• Saves structure analysis and bookkeeping

• Amortized logarithmic adding and lookup complexity

• Splaying after every access operation keeps recently used elements near the root → statistically relevant efficiency gain!


Rotating operations (pattern)



Splay Rotation



Rotation Implementation

protected void rotateRight() {
    BinNode parent = this.parent;
    BinNode newroot = this.left;
    // remember the old position before relinking
    boolean wasChild = parent != null;
    boolean wasLeftChild = isLeftChild();
    this.setLeft(newroot.right); // updating parent
    newroot.setRight(this);
    if (wasChild) {
        if (wasLeftChild) parent.setLeft(newroot);
        else parent.setRight(newroot);
    }
}
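Without parent references, the same rotation can be written as a function that returns the new subtree root. A minimal sketch; the names `Rot` and `Node` are hypothetical and the nodes carry only int keys:

```java
/** Right rotation on a node type without parent links; returns the new subtree root. */
class Rot {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Requires top.left != null. */
    static Node rotateRight(Node top) {
        Node newRoot = top.left;      // left child moves up
        top.left = newRoot.right;     // its right subtree changes sides
        newRoot.right = top;          // old top becomes the right child
        return newRoot;
    }
}
```

The in-order sequence of keys is the same before and after the rotation; only the shape changes.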


Quietly Use Splay Trees?

• Watch out for equality:

• When are two SearchTrees equal?

• Problem: Hidden restructuring

• General rule:

- Define equality independently of the representation!

- You can base it on the linearization using the Iterator
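A sketch of equality based on the linearization: two trees of different shape count as equal when their in-order traversals yield the same sequence. The names `Shape`, `Node`, `linearize` and `sameContent` are hypothetical, and a plain list stands in for the Iterator.

```java
import java.util.ArrayList;
import java.util.List;

class Shape {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** In-order linearization: the sorted key sequence, whatever the tree shape. */
    static List<Integer> linearize(Node n) {
        List<Integer> out = new ArrayList<>();
        collect(n, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }

    /** Equal if both trees hold the same keys in the same order. */
    static boolean sameContent(Node a, Node b) {
        return linearize(a).equals(linearize(b));
    }
}
```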


Why Use a Sorted Tree for a Symbol Table?

• Characteristics:

- frequent additions and removals

- frequent lookups

• Obviously a good idea to keep the table sorted at low cost

→ use Sorted Tree

- Preferably use a Splay Tree or an AVL tree; they differ only in the add method implementation.

• So, a tree is fine – but what about a simple list?

- keeping it ordered requires moving chunks of data

- only acceptable, if modifications are infrequent.

• Are there alternatives to trees and lists?


Arbitrary (non-ordered) Keys

• True (and realistic) Associativity ☺

• E.g. Fingerprints, Biometrics, Pictures, Sounds, Toys, CDs,...

• Mapping:

- left unique relation

- i.e. unique right value for every left value

• Associative Table can be considered a mapping from key to value.

- usually a unique mapping

• For data retrieval, we need a second mapping:

- from key to storage – i.e. how do we find the key?

- with ordered keys, the order provides the mapping (index, tree pos.)

- but with arbitrary keys?


Hash Maps

• Time of entry provides an immediate order.

• Can be used as a mapping: simply add entries to a List as they turn up.

• Mapping cannot be determined from the key, i.e. it is not a computable mapping.

• But linear search is unacceptable!

• Find a computable mapping from keys to order (Chinese letters: lookup table to determine order)

• Let the computer compute the mapping

• Better: find a computable immediate mapping from key to index

• → hash code

• Results in constant access time! O(1)

ingenious solution!


Hashing is like a personal ordering scheme....

"John, where is the torque screw driver?"

"Just where it belongs, of course!"

"And where does it belong?"

"In the gloves drawer of course!"

"And WHY there???"

"Because I use it as rarely as my gloves, of course. Got it?"

In other words:

The hashing scheme need not be logical, as long as it works.

So you may use the second character or the word length or the sum of the character values, whatever...


Example: 26 Slots

Hashing color names of antique glass: hash code is the index of the first letter

a) data entry: names are hashed into the first available slot, possibly after rehashing.

b) data lookup: identical process to entry.



Hash Clashes

• What if the hash slot is already full?

• Open Addressing:

- Rehash by adding an offset and try again

- Constant offset

- Double Hashing: computed offset.

• Problem: clustering

• Best distribution with prime length tables

• External Chaining:

- let each entry be the head of a chain of hash-isomorphic entries


26 Slots: Deleting Entries

a) Deletion of an entry leaves a shaded reserved cell as placeholder.

b) A reserved cell is considered empty during entry and full during lookup.

[Figure: table with a reserved cell; example lookup "marigold?"]



26 Slots: Clustering

a) Primary clustering:

Two values with identical hash code compete during rehashing

b) Secondary Clustering:

Two values with originally different hash code compete during

rehashing



26 Slots: Double Hashing

• Double hashing uses a second hash function to determine the rehashing offset

• In this case, the second letter of the name is used, requiring at least two-letter names.

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003
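The probe sequence of such a scheme can be sketched as follows. The details are assumptions, not taken from the figure: names are lowercase a-z, the home slot is the index of the first letter, and the offset is the index of the second letter plus one so that it is never zero. The class name `DoubleHash` is hypothetical.

```java
class DoubleHash {
    static final int SLOTS = 26;

    /** Home slot: index of the first letter (assumes lowercase a-z). */
    static int first(String name) {
        return name.charAt(0) - 'a';
    }

    /** Offset from the second letter; the +1 (an assumption) keeps it non-zero. */
    static int offset(String name) {
        return (name.charAt(1) - 'a') + 1;
    }

    /** Slot tried on the given attempt (0 = home slot). */
    static int probe(String name, int attempt) {
        return (first(name) + attempt * offset(name)) % SLOTS;
    }
}
```

Two names with the same first letter but different second letters now follow different paths through the table. Since 26 is not prime, an offset that shares a factor with 26 reaches only part of the table, which is one reason for preferring prime table lengths.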


Locate() with Open Addressing

public interface AssociativeTable {
    public boolean contains(Comparable key, Object val);
    public Object get(Comparable key);
    public void add(Comparable key, Object val);
    public Object remove(Comparable key);
}

public class HashTable implements AssociativeTable {
    protected Object[] keys = new Object[capacity];   // translation
    protected Object[] values = new Object[capacity]; // tables

    protected int locate(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes
        int hash = Math.floorMod(key.hashCode(), capacity);
        int firstReserved = NONE; // no reserved cell found yet
        while (keys[hash] != null) {
            if (key.equals(keys[hash])) return hash;
            // remember first reserved slot
            if (keys[hash] == RESERVED) {
                if (firstReserved == NONE) firstReserved = hash;
            }
            hash = rehash(hash); // e.g. (hash+1)%capacity
        }
        // return first empty slot available
        if (firstReserved == NONE) return hash;
        else return firstReserved;
    }
    ....
}
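To see locate() and the reserved cells working together, here is a self-contained sketch with add, get and remove built on top of it. The class name `OpenTable` and the linear rehash `(hash + 1) % capacity` are assumptions, and the table must never fill up completely, otherwise locate() would not terminate.

```java
/** Sketch of an open-addressing table with linear rehashing and RESERVED placeholders. */
class OpenTable {
    private static final Object RESERVED = new Object(); // marks a deleted cell
    private static final int NONE = -1;
    private final int capacity;
    private final Object[] keys;
    private final Object[] values;

    OpenTable(int capacity) {
        this.capacity = capacity;
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    /** Index of the key if present, else of the first cell usable for it. */
    private int locate(Object key) {
        int hash = Math.floorMod(key.hashCode(), capacity);
        int firstReserved = NONE;
        while (keys[hash] != null) {
            if (key.equals(keys[hash])) return hash;
            if (keys[hash] == RESERVED && firstReserved == NONE) firstReserved = hash;
            hash = (hash + 1) % capacity;          // linear rehash
        }
        return firstReserved == NONE ? hash : firstReserved;
    }

    public void add(Object key, Object val) {
        int i = locate(key);                       // existing, reserved or empty cell
        keys[i] = key;
        values[i] = val;
    }

    public Object get(Object key) {
        int i = locate(key);
        return key.equals(keys[i]) ? values[i] : null;
    }

    public Object remove(Object key) {
        int i = locate(key);
        if (!key.equals(keys[i])) return null;     // key not in the table
        Object old = values[i];
        keys[i] = RESERVED;                        // placeholder keeps later entries reachable
        values[i] = null;
        return old;
    }
}
```

With capacity 7, the Integer keys 1, 8 and 15 all hash to slot 1, so removing 8 leaves a reserved cell that a lookup of 15 has to pass over.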


26 Slots: External Chaining

Use each entry as chain head for hash-isomorphic entries

[Figure: chains per slot: amber → aubergine; dusk; mandarine → mauve; violet]
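External chaining for the 26-slot example can be sketched like this. The class name `ChainedColors` is hypothetical, names are assumed to be lowercase a-z, and `java.util.LinkedList` serves as the chain.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedColors {
    private final List<LinkedList<String>> slots = new ArrayList<>();

    ChainedColors() {
        for (int i = 0; i < 26; i++) slots.add(new LinkedList<>());
    }

    /** Hash code: index of the first letter (assumes lowercase a-z). */
    private static int hash(String name) {
        return name.charAt(0) - 'a';
    }

    void add(String name) {
        LinkedList<String> chain = slots.get(hash(name));
        if (!chain.contains(name)) chain.add(name);   // append to the slot's chain
    }

    boolean contains(String name) {
        return slots.get(hash(name)).contains(name);  // search one chain only
    }

    /** Deletion needs no reserved cells: the entry is simply unlinked. */
    boolean remove(String name) {
        return slots.get(hash(name)).remove(name);
    }

    List<String> chain(char letter) {
        return slots.get(letter - 'a');
    }
}
```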


Hash Functions

• Using the index of the first letter is a simple hash function.

• Is it a good one?

- criterion: even distribution over table

- so, not so good...

- but prime table length helps.

• Better hash functions on Strings:

- sum of all letter indices modulo length

- sum of weighted characters (higher powers of two)

- sum of weighted selected characters.

• In Java, hashCode() is a method of Object:

- override it for your own types, if you like.
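The hash functions listed above could look as follows. This is a sketch under assumptions: the class and method names are hypothetical, character codes are used instead of letter indices, and the weighted sum doubles the running value at each step so that positions are weighted with powers of two.

```java
class StringHashes {
    /** Index of the first letter (assumes lowercase a-z). */
    static int firstLetter(String s, int length) {
        return (s.charAt(0) - 'a') % length;
    }

    /** Sum of all character values modulo the table length. */
    static int charSum(String s, int length) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += s.charAt(i);
        return sum % length;
    }

    /** Weighted sum: doubling the running value weights positions with powers of two. */
    static int weightedSum(String s, int length) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum = 2 * sum + s.charAt(i);
        return Math.floorMod(sum, length);   // stays non-negative even after int overflow
    }

    /** Java's own String.hashCode(), reduced to a table index. */
    static int javaHash(String s, int length) {
        return Math.floorMod(s.hashCode(), length);
    }
}
```

The plain character sum cannot tell anagrams apart ("ab" and "ba" land in the same bucket), while the weighted sum depends on the position of each character.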


Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: character sum



Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: weighted character sum using powers of 2



Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: weighted character sum using powers of 256



Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: Java String hashCode()



Hash Map Performance



Resizing a Hash Map

• Performance depends on load factor.

• Resize at load factor between 60% and 70%.

• Resizing requires rehashing all entries.

- but you can traverse the list linearly.

• Best strategy: double size.
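A sketch of resizing by doubling, using a small set of strings with linear rehashing. The class name `GrowingSet`, the initial capacity of 8 and the 70% threshold are assumptions chosen for illustration.

```java
class GrowingSet {
    private String[] slots = new String[8];
    private int size = 0;

    private static int home(String s, int capacity) {
        return Math.floorMod(s.hashCode(), capacity);
    }

    /** Places s by linear rehashing; returns false if it is already present. */
    private static boolean place(String[] table, String s) {
        int i = home(s, table.length);
        while (table[i] != null) {
            if (table[i].equals(s)) return false;
            i = (i + 1) % table.length;
        }
        table[i] = s;
        return true;
    }

    void add(String s) {
        if (size + 1 > 0.7 * slots.length) resize();    // keep the load factor at or below 70%
        if (place(slots, s)) size++;
    }

    private void resize() {
        String[] bigger = new String[2 * slots.length]; // double the size
        for (String s : slots) {                        // linear pass over the old table
            if (s != null) place(bigger, s);            // every entry is rehashed
        }
        slots = bigger;
    }

    boolean contains(String s) {
        int i = home(s, slots.length);
        while (slots[i] != null) {
            if (slots[i].equals(s)) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }

    int capacity() { return slots.length; }
    int size() { return size; }
}
```

Doubling gives up the prime table length recommended earlier; a variant could pick the next prime above the doubled size instead.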


Java Hash Code

• Hash Code of an Object o: o.hashCode();

• Make sure for all new types: x.equals(y) ⇒ x.hashCode() == y.hashCode()

• Hash Code of a Collection:

• Sum of element hash codes!

public int hashCode() {
    Iterator i = this.iterator();
    int result = 0;
    while (i.hasNext()) {
        Object o = i.next();
        if (o != null) result += o.hashCode();
    }
    return result;
}
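For a type of your own, the rule means overriding equals() and hashCode() together. A minimal sketch; the class `NumberPlate` and its two fields are hypothetical, and `java.util.Objects.hash` is just one convenient way to combine the field hash codes.

```java
import java.util.Objects;

final class NumberPlate {
    private final String district;
    private final String id;

    NumberPlate(String district, String id) {
        this.district = district;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NumberPlate)) return false;
        NumberPlate p = (NumberPlate) o;
        return district.equals(p.district) && id.equals(p.id);
    }

    /** Built from the same fields as equals(), so equal plates get equal hash codes. */
    @Override
    public int hashCode() {
        return Objects.hash(district, id);
    }
}
```

Without the hashCode() override, two equal plates would usually land in different hash slots, and a HashMap or HashSet lookup with a freshly created key would fail.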


Maps and HashTables in the Java Collection Framework

// excerpt of java.util.Map
interface Map<K,V> {
    public boolean containsKey(Object key);
    public boolean containsValue(Object value);
    public boolean isEmpty();
    public int size();
    public V remove(Object key);
}


HashSet

Special case HashSet:

• Simple, unstructured content

• instead of an association (key, content)

• content immediately used as key

Same retrieval properties as HashMap:

• constant access time

• no order required on keys (i.e. content)
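A guest list is a typical case: the name is both content and key. A short sketch using `java.util.HashSet`; the class name `GuestList` is made up.

```java
import java.util.HashSet;
import java.util.Set;

class GuestList {
    private final Set<String> guests = new HashSet<>();

    /** Returns false if the name was already on the list. */
    boolean invite(String name) {
        return guests.add(name);
    }

    boolean isInvited(String name) {
        return guests.contains(name);   // hashed lookup, no index and no order needed
    }

    int count() {
        return guests.size();
    }
}
```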


TreeMap

Associations with ordered keys:

• can also be implemented as ordered lists

• e.g. a SortedTree

Choose implementation according to usage

• Use a TreeMap, if

- the order is important (you need ordered iterators)

- deletions are frequent

• Use a HashMap, if

- the order is of little relevance

- you need very fast access

- deletions are less frequent
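The difference in ordering can be seen with `java.util.TreeMap`, whose key set iterates in key order, whereas a HashMap makes no promise about iteration order. A sketch; the class name `PlateRegistry` and all entries except "HZ-AX 1" are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PlateRegistry {
    /** Sample registry keyed by number plate. */
    static TreeMap<String, String> sample() {
        TreeMap<String, String> owners = new TreeMap<>();
        owners.put("HZ-AX 1", "Smith, Peter");
        owners.put("M-CD 7", "hypothetical owner A");
        owners.put("B-XY 12", "hypothetical owner B");
        return owners;
    }

    /** Keys in ascending order, as delivered by the TreeMap iterator. */
    static List<String> orderedPlates() {
        return new ArrayList<>(sample().keySet());
    }

    public static void main(String[] args) {
        System.out.println(orderedPlates());
        System.out.println(sample().firstKey());   // smallest key
    }
}
```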

So much for Information Retrieval...

Let's finish up on sorting

before we start climbing graphs...
