
Maps

Finding Information

Immediately


© schmiedecke 07 Inf2-7-Trees 2

What is a Map?

• Associative data structure

• Complex information:

- record of data

- unique key value for identification

- e.g. number plate of a car

• Task:

- Store data so that it can be retrieved by means of the key:

Who is the owner of 'HZ-AX 1'?


Associative Structures

• Associative data structure:

- Retrieval by content rather than by index

- Structure of a key plus information ("payload")

- Hides the physical data structure (storage concept)

• It's all about information retrieval.

• How to implement an efficient lookup?

[Figure: records labeled "Smith, Peter, * 7.7.97"]


Example: Electronic Patient Records

• Patient Records

• Kept in a database to efficiently find, compare and evaluate patient information

• So a patient record would be a line in a database table

• Unique, identifying key: insurance number


Natural Associative Data Structures

• dictionaries

• symbol tables

• administrative structures

• lists of utilities (hotels, tools, lectures)

• guest lists

• ....

• you can store them in an array or list, BUT

• in all of these cases, indices would "feel artificial"

• so you should at least hide them (ADT)


Basic Map Operations

• A map entry is a pair (key, content)

• e.g. (Benni, 0177-7777777)

• so the basic operations are

- add(key, content)

- retrieve(key) – returns content

- remove(key) – may return content

- contains(key) – returns boolean

• i.e., as long as you have the key, you can retrieve the content.
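As a rough illustration of these four operations, the sketch below wraps java.util.HashMap. The class name PhoneBook and its method names are made up here to mirror the slide; only put, get, remove and containsKey are actual library calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper that exposes the slide's operation names.
class PhoneBook {
    private final Map<String, String> entries = new HashMap<>();

    void add(String key, String content) { entries.put(key, content); }

    // Returns the content stored under key, or null if there is none.
    String retrieve(String key) { return entries.get(key); }

    // Removes the entry and returns its content (null if it was absent).
    String remove(String key) { return entries.remove(key); }

    boolean contains(String key) { return entries.containsKey(key); }

    public static void main(String[] args) {
        PhoneBook book = new PhoneBook();
        book.add("Benni", "0177-7777777");
        System.out.println(book.contains("Benni"));
        System.out.println(book.retrieve("Benni"));
        book.remove("Benni");
        System.out.println(book.contains("Benni"));
    }
}
```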


(Simple) Associative Table Interface

public interface AssociativeTable {
    public boolean contains(Object key, Object val);
    public Object get(Object key);
    public void add(Object key, Object val);
    public Object remove(Object key);
}

For multiple entries (synonyms in a symbol table),

- Functions for navigating in synonym list.

- Generalize as an Iterator object:

public Iterator keyIterator();

// defines next(), previous(), first(), last();


Interface Map


SearchTree Implementation

• Easy:

- adding a symbol (will be added as leaf)

- removing a leaf

- removing the root of an incomplete tree

• Hard:

- removing a middle node

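public class BinNode {
    Comparable key;
    Object val;
    BinNode left, right;
    // constructors as required

    // assumed helper: a node without a key marks the empty end of a branch
    boolean isEmpty() { return key == null; }
}

public class SearchTree implements SymbolTable {
    protected BinNode root = new BinNode(); // Empty
    ....
}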


SearchTree: Utility Methods

/** find the node for key, or the place to add a new node for key */
protected BinNode locate(BinNode root, Comparable key) {
    Comparable rootKey = root.key;   // keys are Comparable, so compareTo is available
    BinNode child;
    if (rootKey.equals(key)) return root;
    if (rootKey.compareTo(key) < 0) child = root.right;
    else child = root.left;
    if (child.isEmpty()) return root;
    else return locate(child, key);
}

/** find (logical) predecessor of given node:
 *  rightmost entry of left subtree */
protected BinNode predecessor(BinNode root) {
    BinNode result = root.left;
    while (!result.right.isEmpty()) result = result.right;
    return result;
}


Implementation using SortedTrees

• Let's consider implementing a symbol table

• Keys implement Comparable, i.e. they are ordered (most keys are ordered)

• So we can use a search tree (sorted tree) for logarithmic access.

interface Comparable { public int compareTo(Object o); }

// less, equal, more → neg, 0, pos

// ClassCastException if not comparable

interface SymbolTable {

public boolean contains(Comparable symbol);

public void add(Comparable symbol, Object value);

public Object get(Comparable symbol);

public Object remove(Comparable symbol);

}


SearchTree: contains()

public boolean contains(Comparable symbol) {
    if (root.isEmpty()) return false;
    // locate returns either the matching node or the would-be parent
    return symbol.equals(locate(root, symbol).key);
}


Adding a Node

[Figure: example search tree with root 10; locate finds the place for each new value]

For identical values, locate returns a subtree: append the new node to the right of the predecessor.


SearchTree: add()

public void add(Comparable symbol, Object value) {
    BinNode newnode = new BinNode(symbol, value);
    if (root.isEmpty()) root = newnode;
    else {
        BinNode insertpoint = locate(root, symbol);
        if (symbol.compareTo(insertpoint.key) > 0) {
            insertpoint.right = newnode;
        } else {
            // smaller or equal key: save left subtree, without checking for null
            newnode.left = insertpoint.left;
            insertpoint.left = newnode;
        }
    }
}
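A minimal, self-contained sketch of locate, add and contains follows. It assumes String keys and uses null instead of empty sentinel nodes, so it is a variation on the slide code rather than the lecture's implementation; SimpleSearchTree and Node are hypothetical names.

```java
// Sketch of a search tree used as a symbol table (String keys assumed).
class SimpleSearchTree {
    private static final class Node {
        final String key;
        final Object val;
        Node left, right;
        Node(String key, Object val) { this.key = key; this.val = val; }
    }

    private Node root;

    // Returns the node holding key, or the node under which key would be attached.
    private Node locate(Node start, String key) {
        Node current = start;
        while (true) {
            int cmp = key.compareTo(current.key);
            if (cmp == 0) return current;
            Node child = (cmp > 0) ? current.right : current.left;
            if (child == null) return current;
            current = child;
        }
    }

    void add(String key, Object val) {
        Node newNode = new Node(key, val);
        if (root == null) { root = newNode; return; }
        Node insertPoint = locate(root, key);
        if (key.compareTo(insertPoint.key) > 0) {
            insertPoint.right = newNode;
        } else {
            // Smaller or equal key: the new node takes over the left subtree.
            newNode.left = insertPoint.left;
            insertPoint.left = newNode;
        }
    }

    boolean contains(String key) {
        return root != null && locate(root, key).key.equals(key);
    }

    // Returns the value of the topmost node with this key, or null.
    Object get(String key) {
        if (root == null) return null;
        Node n = locate(root, key);
        return n.key.equals(key) ? n.val : null;
    }
}
```

Keeping equal keys in the left subtree matches the slide's choice of allowing multiple entries per key; a map that forbids duplicates would overwrite the value instead.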


Removing a Node

[Figure: example search tree with root 10, illustrating two removal cases]

- move up solitary subtree

- move up predecessor node


Root Removal (General Case)

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


SearchTree: remove()

/** to be used for node removal:
 *  yields new subtree with top removed */
protected BinNode removeTop(BinNode top) {
    BinNode topleft = top.left, topright = top.right;
    top.left = null; top.right = null; // disconnect
    if (topleft.isEmpty()) return topright;
    if (topright.isEmpty()) return topleft;
    BinNode pred = topleft.right;
    if (pred.isEmpty()) {
        // left child has no right subtree: it is the predecessor itself
        topleft.right = topright;
        return topleft;
    }
    BinNode parent = topleft;
    while (!pred.right.isEmpty()) {
        parent = pred;
        pred = pred.right;
    }
    parent.right = pred.left; // detach predecessor, keep its left subtree
    pred.left = topleft;
    pred.right = topright;
    return pred;
}
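To see removeTop in context, here is a hedged, self-contained sketch with int keys and null children (instead of empty sentinel nodes). RemovalSketch, insert, remove and inOrder are names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class RemovalSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Removes the top node and returns the root of the remaining subtree.
    static Node removeTop(Node top) {
        Node left = top.left, right = top.right;
        top.left = null;
        top.right = null;
        if (left == null) return right;      // move up solitary subtree
        if (right == null) return left;
        if (left.right == null) {            // left child is the predecessor
            left.right = right;
            return left;
        }
        Node parent = left, pred = left.right;
        while (pred.right != null) { parent = pred; pred = pred.right; }
        parent.right = pred.left;            // detach predecessor
        pred.left = left;                    // move predecessor up
        pred.right = right;
        return pred;
    }

    // Removes one node with the given key, if present.
    static Node remove(Node root, int key) {
        if (root == null) return null;
        if (key < root.key) root.left = remove(root.left, key);
        else if (key > root.key) root.right = remove(root.right, key);
        else return removeTop(root);
        return root;
    }

    static List<Integer> inOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }
}
```

Removing a middle node is then just removeTop applied to that node, with the result hooked back into the parent, which is what the recursive remove does.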


Splay Trees

• AVL trees: balancing criterion requires structure analysis.

• Efficient heuristic alternative: Splay Trees

• invented in 1985 by Daniel Sleator and Robert Tarjan.

• always make new node root:

- insert in proper place as leaf

- move upward by successive "rotations"

• Splaying keeps tree "balanced on average"

• Saves structure analysis and bookkeeping

• Amortized logarithmic adding and lookup complexity

• Splaying after every access operation keeps recently used elements near the root → statistically relevant efficiency gain!


Rotating operations (pattern)

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Splay Rotation

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Rotation Implementation

protected void rotateRight() {
    BinNode parent = this.parent;
    BinNode newroot = this.left;
    boolean wasChild = parent != null;
    boolean wasLeftChild = isLeftChild();
    this.setLeft(newroot.right); // updating parent
    newroot.setRight(this);
    if (wasChild) {
        if (wasLeftChild) parent.setLeft(newroot);
        else parent.setRight(newroot);
    }
}
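The same rotation can be sketched without parent pointers by returning the new subtree root to the caller. This is an illustrative variant, not the slide's code; RotationSketch and its Node class are hypothetical.

```java
class RotationSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Right rotation: the left child becomes the root of this subtree.
    // Assumes pivot.left is not null.
    static Node rotateRight(Node pivot) {
        Node newRoot = pivot.left;
        pivot.left = newRoot.right;  // middle subtree changes sides
        newRoot.right = pivot;
        return newRoot;
    }

    // Left rotation: mirror image. Assumes pivot.right is not null.
    static Node rotateLeft(Node pivot) {
        Node newRoot = pivot.right;
        pivot.right = newRoot.left;
        newRoot.left = pivot;
        return newRoot;
    }
}
```

Both rotations preserve the sorted (in-order) sequence of keys; only the shape of the tree changes, which is what splaying relies on.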


Quietly Use Splay Trees?

• Watch out for equality:

• When are two SearchTrees equal?

• Problem: Hidden restructuring

• General rule:

- Define equality representation-independent!

- You can base it on the linearization using the Iterator
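One possible reading of this rule, sketched under the assumption of int keys: two trees count as equal when their in-order linearizations match, whatever their shape. TreeEquality, build, linearize and sameContent are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TreeEquality {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Builds a tree by inserting the keys in the given order.
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    // In-order traversal: the sorted sequence of keys, independent of shape.
    static List<Integer> linearize(Node root) {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }

    // Representation-independent equality: compare linearizations only.
    static boolean sameContent(Node a, Node b) {
        return linearize(a).equals(linearize(b));
    }
}
```

With this definition, a hidden restructuring such as splaying never changes whether two trees are considered equal.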


Why Use a Sorted Tree for a Symbol Table?

• Characteristics:

- frequent additions and removals

- frequent lookups

• Obviously a good idea to keep the table sorted at low cost

→ use a sorted tree

- Preferably use a Splay Tree or an AVL tree; they differ only in the add method implementation.

• So, a tree is fine, but what about a simple list?

- keeping it ordered requires moving chunks of data

- only acceptable if modifications are infrequent.

• Are there alternatives to trees and lists?


Arbitrary (non-ordered) Keys

• True (and realistic) Associativity ☺

• E.g. Fingerprints, Biometrics, Pictures, Sounds, Toys, CDs,...

• Mapping:

- right-unique relation

- i.e. unique right value for every left value

• Associative Table can be considered a mapping from key to value.

- usually a unique mapping

• For data retrieval, we need a second mapping:

- from key to storage – i.e. how do we find the key?

- with ordered keys, the order provides the mapping (index, tree pos.)

- but with arbitrary keys?


Hash Maps

• Time of entry provides an immediate order.

• Can be used as a mapping: simply add entries to a List as they turn up.

• Mapping cannot be determined from the key, i.e. it is not a computable mapping.

• But linear search is unacceptable!

• Find a computable mapping from keys to order (Chinese letters: lookup table to determine order)

• Let the computer compute the mapping

• Better: find a computable immediate mapping from key to index

• → hash code

• Results in (on average) constant access time! O(1)

ingenious solution!


Hashing is like a personal ordering scheme....

"John, where is the torque screwdriver?"

"Just where it belongs, of course!"

"And where does it belong?"

"In the gloves drawer of course!"

"And WHY there???"

"Because I use it as rarely as my gloves, of course. Got it?"

In other words:

The hashing scheme need not be logical, as long as it works.

So you may use the second character or the word length or the sum of the character values, whatever...


Example: 26 Slots

Hashing color names of antique glass: hash code is index of first letter

a) data entry: names are hashed into the first available slot, possibly after rehashing.

b) data lookup: identical process to entry.

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Hash Clashes

• What if the hash slot is already full?

• Open Addressing:

- Rehash by adding an offset and try again

- Constant offset

- Double Hashing: computed offset.

• Problem: clustering

• Best distribution with prime length tables

• External Chaining:

- let each entry be the head of a chain of hash-isomorphic entries (entries with the same hash code)


26 Slots: Deleting Entries

a) Deletion of an entry leaves a shaded reserved cell as placeholder.

b) A reserved cell is considered empty during entry and full during lookup (figure example: looking up "marigold").

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


26 Slots: Clustering

a) Primary clustering: Two values with identical hash code compete during rehashing

b) Secondary clustering: Two values with originally different hash code compete during rehashing

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


26 Slots: Double Hashing

• Double hashing uses a second hash function to determine the rehashing offset

• In this case, the second letter of the name is used, requiring at least two-letter names.

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Locate() with Open Addressing

public interface AssociativeTable {
    public boolean contains(Comparable key, Object val);
    public Object get(Comparable key);
    public void add(Comparable key, Object val);
    public Object remove(Comparable key);
}

public class HashTable implements AssociativeTable {
    // capacity, NONE, RESERVED and rehash() are assumed to be defined elsewhere in the class
    protected Object[] keys = new Object[capacity];   // translation
    protected Object[] values = new Object[capacity]; // tables

    protected int locate(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes
        int hash = Math.floorMod(key.hashCode(), capacity);
        int firstReserved = NONE; // no reserved cell found yet
        while (keys[hash] != null) {
            if (key.equals(keys[hash])) return hash;
            // remember first reserved slot
            if (keys[hash] == RESERVED) {
                if (firstReserved == NONE) firstReserved = hash;
            }
            hash = rehash(hash); // e.g. (hash+1)%capacity
        }
        // return first empty slot available
        if (firstReserved == NONE) return hash;
        else return firstReserved;
    }
    // ... remaining methods not shown on the slide
}
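The following self-contained sketch shows how add, get and remove could be built on such a locate, using linear probing and a RESERVED placeholder for deleted entries. ProbingTable, the probe counter and the "table is full" check are assumptions added for this illustration; they are not part of the slide.

```java
class ProbingTable {
    private static final Object RESERVED = new Object(); // placeholder for deleted entries
    private static final int NONE = -1;

    private final Object[] keys;
    private final Object[] values;
    private final int capacity;

    ProbingTable(int capacity) {
        this.capacity = capacity;
        this.keys = new Object[capacity];
        this.values = new Object[capacity];
    }

    private int rehash(int hash) { return (hash + 1) % capacity; } // constant offset 1

    // Returns the slot of key, or the slot where key should be stored.
    private int locate(Object key) {
        int hash = Math.floorMod(key.hashCode(), capacity);
        int firstReserved = NONE;
        int probes = 0;
        while (keys[hash] != null && probes < capacity) {
            if (key.equals(keys[hash])) return hash;
            if (keys[hash] == RESERVED && firstReserved == NONE) firstReserved = hash;
            hash = rehash(hash);
            probes++;
        }
        return (firstReserved == NONE) ? hash : firstReserved;
    }

    void add(Object key, Object value) {
        int slot = locate(key);
        Object found = keys[slot];
        if (found != null && found != RESERVED && !found.equals(key)) {
            throw new IllegalStateException("table is full");
        }
        keys[slot] = key;
        values[slot] = value;
    }

    Object get(Object key) {
        int slot = locate(key);
        return key.equals(keys[slot]) ? values[slot] : null;
    }

    Object remove(Object key) {
        int slot = locate(key);
        if (!key.equals(keys[slot])) return null;
        Object old = values[slot];
        keys[slot] = RESERVED;   // keep the probe chain intact for later lookups
        values[slot] = null;
        return old;
    }
}
```

The reserved cell is what lets a lookup continue past a deleted entry, while a later add may reuse that cell.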


26 Slots: External Chaining

Use each entry as chain head for hash-isomorphic entries

[Figure: chains by first letter: a: amber, aubergine; d: dusk; m: mandarine, mauve; v: violet]
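External chaining for the 26-slot example might look roughly like this. The sketch assumes that every name starts with a letter a to z; ChainedColorTable and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedColorTable {
    // One chain per slot; slot index = index of the first letter.
    private final List<List<String>> slots = new ArrayList<>();

    ChainedColorTable() {
        for (int i = 0; i < 26; i++) slots.add(new ArrayList<>());
    }

    // Assumes name is non-empty and starts with a letter a..z (any case).
    static int hash(String name) {
        return Character.toLowerCase(name.charAt(0)) - 'a';
    }

    void add(String name) {
        List<String> chain = slots.get(hash(name));
        if (!chain.contains(name)) chain.add(name);
    }

    boolean contains(String name) {
        return slots.get(hash(name)).contains(name);
    }

    // No reserved cells needed: the entry is simply unlinked from its chain.
    boolean remove(String name) {
        return slots.get(hash(name)).remove(name);
    }

    List<String> chain(char letter) {
        return slots.get(letter - 'a');
    }
}
```

Compared with open addressing, clashes never spill into neighbouring slots, at the price of a linear search within each chain.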


Hash Functions

• Using the index of the first letter is a simple hash function.

• Is it a good one?

- criterion: even distribution over table

- so, not so good...

- but prime table length helps.

• Better hash functions on Strings:

- sum of all letter indices modulo length

- sum of weighted characters (higher powers of two)

- sum of weighted selected characters.

• In Java, hashCode() is a method of Object:

- override it for your own types, if you like.
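The three kinds of string hash functions mentioned above can be sketched as follows. The method names, the use of raw character codes instead of letter indices, and the Horner-style weighting are choices made for this illustration.

```java
class StringHashes {
    // Index of the first letter; assumes s is non-empty.
    static int firstLetter(String s, int tableLength) {
        return Math.floorMod(Character.toLowerCase(s.charAt(0)) - 'a', tableLength);
    }

    // Plain sum of character codes: anagrams always clash.
    static int charSum(String s, int tableLength) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += s.charAt(i);
        return Math.floorMod(sum, tableLength);
    }

    // Weighted sum: each step doubles the running sum, so characters
    // are weighted by powers of 2 according to their position.
    static int weightedSum(String s, int tableLength) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum = 2 * sum + s.charAt(i);
        return Math.floorMod(sum, tableLength); // floorMod copes with int overflow
    }

    public static void main(String[] args) {
        int length = 997; // prime table length, as on the distribution slides
        for (String word : new String[] {"amber", "mauve", "violet"}) {
            System.out.println(word + ": " + firstLetter(word, 26) + ", "
                    + charSum(word, length) + ", " + weightedSum(word, length));
        }
    }
}
```

The position weighting is what separates anagrams such as "ab" and "ba", which the plain character sum cannot distinguish.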


Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: character sum

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: weighted character sum using powers of 2

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: weighted character sum using powers of 256

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Hash Distribution

Hash frequency of dictionary words:

• list size: 997 "buckets"

• hash function: Java String hashCode()

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Hash Map Performance

Graphics according to D. Bailey, Java Structures, McGraw-Hill 2003


Resizing a Hash Map

• Performance depends on load factor.

• Resize at load factor between 60% and 70%.

• Resizing requires rehashing all entries.

- but you can traverse the list linearly.

• Best strategy: double size.
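A hedged sketch of resizing by doubling: a small open-addressing set of Strings that rehashes all entries into a table of twice the size before the load factor would exceed 60%. ResizingSet, the initial capacity of 8 and the constant MAX_LOAD are assumptions for illustration; note that plain doubling does not keep the table length prime.

```java
class ResizingSet {
    private static final double MAX_LOAD = 0.6; // resize threshold (assumed)

    private String[] slots = new String[8];
    private int size;

    boolean add(String key) {
        if (contains(key)) return false;
        // Grow first if this entry would push the load factor above the limit.
        if (size + 1 > MAX_LOAD * slots.length) resize(2 * slots.length);
        insert(slots, key);
        size++;
        return true;
    }

    boolean contains(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null) {            // terminates: table is never full
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }

    // Linear probing into the first free slot of the given table.
    private static void insert(String[] table, String key) {
        int i = Math.floorMod(key.hashCode(), table.length);
        while (table[i] != null) i = (i + 1) % table.length;
        table[i] = key;
    }

    // Rehash every entry: slot indices depend on the table length.
    private void resize(int newCapacity) {
        String[] bigger = new String[newCapacity];
        for (String key : slots) {            // linear traversal of the old table
            if (key != null) insert(bigger, key);
        }
        slots = bigger;
    }

    int size() { return size; }

    int capacity() { return slots.length; }
}
```

Doubling keeps the total rehashing work proportional to the number of entries added so far, which is why it is usually preferred over growing by a fixed amount.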


Java Hash Code

• Hash Code of an Object o: o.hashCode();

• Make sure for all new Types: x.equals(y) ⇒ x.hashCode() == y.hashCode()

• Hash Code of a Collection:

• Sum of element hash codes!

public int hashCode() {
    Iterator i = this.iterator();
    int result = 0;
    while (i.hasNext()) {
        Object o = i.next();
        if (o != null) result += o.hashCode();
    }
    return result;
}
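For a user-defined key type, the rule "equal objects have equal hash codes" could be honoured as in this sketch. NumberPlate and its two fields are invented for illustration; Objects.hash is the standard library helper from java.util.

```java
import java.util.Objects;

// Hypothetical key type: equality and hash code use the same fields.
final class NumberPlate {
    private final String district;
    private final String code;

    NumberPlate(String district, String code) {
        this.district = district;
        this.code = code;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NumberPlate)) return false;
        NumberPlate p = (NumberPlate) other;
        return district.equals(p.district) && code.equals(p.code);
    }

    @Override
    public int hashCode() {
        // Same fields as equals, so x.equals(y) implies equal hash codes.
        return Objects.hash(district, code);
    }
}
```

Without the hashCode override, two equal plates would usually land in different slots, and a HashMap lookup with a freshly created key would fail.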


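Maps and HashTables in the Java Collections Framework

interface Map<K,V> {
    // abridged; get(), put() and further methods are omitted here
    // (in java.util.Map, containsKey, containsValue and remove take Object parameters)
    public boolean containsKey(K key);
    public boolean containsValue(V val);
    public boolean isEmpty();
    public int size();
    public V remove(K key);
}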


HashSet

Special case HashSet:

• Simple, unstructured content

• instead of an association (key, content)

• content immediately used as key

Same retrieval properties as HashMap:

• constant access time

• no order required on keys (i.e. content)


TreeMap

Associations with ordered keys:

• can also be implemented as ordered lists

• e.g. a SortedTree

Choose implementation according to usage

• Use a TreeMap, if

- the order is important (you need ordered iterators)

- deletions are frequent

• Use a HashMap, if

- the order is of little relevance

- you need very fast access

- deletions are less frequent
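The difference in iteration order can be seen in a small sketch using java.util.HashMap and java.util.TreeMap. MapChoiceDemo and fill are hypothetical names, and apart from the plate 'HZ-AX 1' the entries are made-up sample data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapChoiceDemo {
    // Fills any Map implementation with the same sample entries.
    static Map<String, String> fill(Map<String, String> owners) {
        owners.put("HZ-AX 1", "Smith");
        owners.put("B-XY 42", "Jones");
        owners.put("M-AB 7", "Miller");
        return owners;
    }

    public static void main(String[] args) {
        Map<String, String> hashed = fill(new HashMap<>());
        TreeMap<String, String> sorted = new TreeMap<>(hashed);

        // HashMap: fast lookup by key, iteration order unspecified.
        System.out.println(hashed.get("HZ-AX 1"));

        // TreeMap: keys are iterated in ascending order.
        System.out.println(sorted.keySet());
        System.out.println(sorted.firstKey());
    }
}
```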


So much for Information Retrieval...

Let's finish up on sorting

before we start climbing graphs....