CIS 068
Welcome to CIS 068!
Lesson 10: Data Structures
Overview
Description, Usage and Java-Implementation of
Collections
Lists
Sets
Hashing
Definition
Data Structures
Definition (www.nist.gov):
“An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person.”
Efficiency
“An organization of information …for better algorithm efficiency...”:
Isn’t the efficiency of an algorithm defined by the order of magnitude O( )?
Yes, but the efficiency actually achieved depends on the implementation, in particular on how the data is organized.
Introduction
• Data structures define the structure of a collection of data, i.e. of primitive values or objects
• The structure provides different ways to access the data
• Different tasks need different ways to access the data
• Different tasks need different data structures
Introduction
Typical properties of different structures:
• fixed length / variable length
• access by index / access by iteration
• duplicate elements allowed / not allowed
Examples
Tasks:
• Read 300 integers
• Read an unknown number of integers
• Read 5th element of sorted collection
• Read next element of sorted collection
• Merge element at 5th position into collection
• Check if object is in collection
Examples
Although you can invent any data structure you want, there are ‘classic structures’ that provide:
• Coverage of most (classic) problems
• Analysis of efficiency
• Basic implementation in modern languages, like JAVA
Data Structures in JAVA
Let‘s see what JAVA has to offer:
The Collection Hierarchy
Collection: top interface, specifying requirements for all collections
Collection Interface
Iterator Interface
Purpose:
• Sequential access to collection elements
• Note: the technique used so far, accessing elements sequentially by index, is not reasonable in general (why?)
Methods: hasNext(), next(), remove()
Iterator Interface
Iterator points ‘between’ the elements of the collection (example elements: 1 2 3 4 5):
• Before element 1: first position, hasNext() = true, remove() throws an error
• Between 2 and 3: current position after 2 calls to next(); remove() deletes element 2
• A further call to next() returns element 3 and moves the position between 3 and 4
• After element 5: hasNext() = false
Iterator Interface Usage
Typical usage of iterator:
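The slide's own listing is not preserved here. As a stand-in, a minimal sketch of the usual hasNext() / next() / remove() loop could look like this; the class name IteratorUsage and the method sumAndDropNegatives are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class IteratorUsage {
    // Removes every negative number from the collection and returns the sum of the rest.
    static int sumAndDropNegatives(Collection<Integer> numbers) {
        int sum = 0;
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {        // is there an element behind the current position?
            int value = it.next();    // return it and move one position forward
            if (value < 0) {
                it.remove();          // deletes the element just returned by next()
            } else {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, -2, 3, -4, 5));
        System.out.println(sumAndDropNegatives(numbers)); // prints 9
        System.out.println(numbers);                      // prints [1, 3, 5]
    }
}
```

The loop never uses an index, so it works for any Collection, not only for lists.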
Back to Collections
AbstractCollection
AbstractCollection
• Facilitates implementation of Collection interface
• Providing a skeletal implementation
• Implementation of a concrete class:
• Provide data structure (e.g. array)
• Provide access to data structure
AbstractCollection
• Concrete class must provide an implementation of Iterator
• To maintain the ‘abstract character’ of the data, the (non-abstract) methods implemented in AbstractCollection use Iterator methods to access the data

AbstractCollection:
    add() {
        Iterator i = iterator();
        …
    }
    clear() {
        Iterator i = iterator();
        …
    }

myCollection (extends AbstractCollection, implements Iterator):
    int[] data;
    Iterator iterator() {
        return this;
    }
    hasNext() {
        …
    }
    …
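A minimal runnable sketch of this idea, assuming a hypothetical class IntBag that stores its elements in an int array. Two details differ from the sketch above and are choices of this example: java.util.AbstractCollection leaves add() unsupported by default (it throws UnsupportedOperationException), so IntBag overrides it, and iterator() returns a separate iterator object instead of 'this', so that several traversals can run independently.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example class: a fixed-capacity bag of Integers backed by an int array.
class IntBag extends AbstractCollection<Integer> {
    private final int[] data;
    private int size = 0;

    IntBag(int capacity) {
        data = new int[capacity];
    }

    @Override
    public boolean add(Integer value) {
        if (size == data.length) {
            throw new IllegalStateException("bag is full");
        }
        data[size++] = value;
        return true;
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = 0;   // index of the element next() will return

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return data[next++];
            }
        };
    }
}
```

With only iterator(), size() and add() written by hand, inherited methods such as contains() and toString() already work, because AbstractCollection implements them on top of the iterator.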
Back to Collections
List Interface
List Interface
• Extends the Collection Interface
• Adds methods to insert and retrieve objects by their position (index)
• Note: Collection Interface could NOT specify the position
• A new Iterator, the ListIterator, is introduced
• ListIterator extends Iterator, allowing for bidirectional traversal (previousIndex()...)
List Interface
• Incorporates an index!
• A new Iterator type (can move forward and backward)
Example: Selection-Sorting a List
Part 1: call to selection sort
• Actual implementation of List does not matter!
• Call to SelectionSort
• Use only Iterator-properties of ListIterator (upcasting)

Part 2: selection sort
• Access at index ‘fill’
• Inner loop
• Swap
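The original listing for this example is not preserved. The following is a sketch of what such a selection sort might look like, not the lecture's actual code; the class name SelectionSortDemo and the variable minIndex are assumptions, while 'fill' follows the slide annotation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class SelectionSortDemo {
    // Sorts the list in place, using only methods of the List interface.
    static void selectionSort(List<Integer> list) {
        for (int fill = 0; fill < list.size() - 1; fill++) {
            int minIndex = fill;
            // inner loop: find the smallest element in positions fill .. size-1
            for (int i = fill + 1; i < list.size(); i++) {
                if (list.get(i) < list.get(minIndex)) {
                    minIndex = i;
                }
            }
            // swap the smallest element into position 'fill'
            Integer tmp = list.get(fill);
            list.set(fill, list.get(minIndex));
            list.set(minIndex, tmp);
        }
    }

    public static void main(String[] args) {
        // Part 1: the concrete List implementation does not matter
        List<Integer> list = new LinkedList<>(Arrays.asList(5, 2, 9, 1));
        selectionSort(list);
        // a ListIterator used only through its Iterator properties (upcasting)
        Iterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            System.out.print(it.next() + " ");
        }
    }
}
```

Because the method only relies on get() and set(), it accepts an ArrayList as well as a LinkedList, although index access on a LinkedList is slow.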
Back to Collections
AbstractList: ...again the implementation of some methods...
Note:
Still ABSTRACT !
Concrete Lists
ArrayList and Vector:
at last concrete implementations !
ArrayList and Vector
Vector:
• Kept for compatibility reasons (only)
• Use ArrayList instead
ArrayList:
• Underlying data structure is an array
• List properties add advantages over a plain array:
• Size can grow and shrink
• Elements can be inserted and removed in the middle
An Alternative Implementation (1) to (3)
Collections
The underlying array data structure has
• advantages for index-based access
• disadvantages for insertion / removal of middle elements: the elements behind the position must be copied, so both operations take O(n)
• Alternative: linked lists
Linked List
Flexible structure, providing
• Insertion and removal at any place in O(1) once the position is reached, compared to O(n) for array-based list
• Sequential access
• Random access in O(n), compared to O(1) for array-based list
Linked List
• List of dynamically allocated nodes
• Nodes arranged into a linked structure
• Data Structure ‘node‘ must provide
• Data itself (example: the bead-body)
• A possible link to another node (ex.: the link)
Children’s pop-beads as an example for a linked list
Linked List
Old node (next) -> New node (next) -> (null)
Connecting Nodes
creating the nodes
connecting
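The slide's listing is missing. A minimal sketch of the two steps, assuming a hypothetical Node class with the fields data and link (the name 'link' is taken from the following slides):

```java
class ConnectDemo {
    // Hypothetical node type: one data item plus a link to the next node.
    static class Node {
        String data;
        Node link;              // null means "no successor"

        Node(String data) {
            this.data = data;
        }
    }

    static Node buildPair() {
        // creating the nodes
        Node p = new Node("first");
        Node q = new Node("second");
        // connecting: p now points to q, q remains the last node
        p.link = q;
        return p;
    }
}
```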
Inserting Nodes
Insert node r between p and q:
p.link = r
r.link = q
q can then be accessed by p.link.link
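The two assignments can be wrapped in a small helper. This is a sketch with a hypothetical Node class and method name; it assumes q is not held in a separate variable and is only reachable as p.link, which is why r.link is set first.

```java
class InsertDemo {
    // Hypothetical node type: one data item plus a link to the next node.
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Inserts r directly behind p; whatever followed p now follows r.
    static void insertAfter(Node p, Node r) {
        r.link = p.link;   // first: r points to p's old successor (q)
        p.link = r;        // then: p points to r
    }
}
```

Reversing the two statements would lose the only reference to q.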
Removing Nodes
p q
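The figure for this slide is lost, so the following is only one plausible reading: the node between p and q is removed by letting p skip it. Node and removeAfter are hypothetical names.

```java
class RemoveDemo {
    // Hypothetical node type: one data item plus a link to the next node.
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Unlinks the node that follows p, if there is one, and returns it.
    static Node removeAfter(Node p) {
        Node removed = p.link;
        if (removed != null) {
            p.link = removed.link;  // p now skips the removed node
            removed.link = null;    // detach the removed node completely
        }
        return removed;
    }
}
```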
Traversing a List
(null)
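Traversal follows the links until (null) is reached. A minimal sketch, again with a hypothetical Node class; counting the nodes is just an example of work done per node.

```java
class TraverseDemo {
    // Hypothetical node type: one data item plus a link to the next node.
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Follows the links from 'head' until (null) is reached and counts the nodes.
    static int length(Node head) {
        int count = 0;
        for (Node current = head; current != null; current = current.link) {
            count++;
        }
        return count;
    }
}
```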
Double Linked Lists
Single linked list: each node links only to its successor; the last link is (null)
Double linked list: each node holds data, successor and predecessor; the first predecessor and the last successor are (null)
Back to Collections
AbstractSequentialList and LinkedList
LinkedList
An implementation example:
See textbook
Sets
Example task:
Check whether the collection contains object o
Solution using a List:
-> O(n) operation!
Sets
Comparison to List:
• Set is designed to overcome the limitation of O(n)
• Contains unique elements
• contains() / remove() operate in O(1) (hash-based sets) or O(log n) (tree-based sets)
• No get() method, no index-access...
• ...but iterator can (still) be used to traverse set
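A small sketch of these differences using the standard classes ArrayList, HashSet and TreeSet. The helper class ContainsDemo and its method fill are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.TreeSet;

class ContainsDemo {
    static final String[] NAMES = {"Apu", "Bob", "Daria", "Apu"};

    // Adds all example names to the given collection and returns it.
    static <C extends Collection<String>> C fill(C target) {
        for (String name : NAMES) {
            target.add(name);
        }
        return target;
    }

    public static void main(String[] args) {
        Collection<String> list = fill(new ArrayList<String>());
        Collection<String> hashSet = fill(new HashSet<String>());
        Collection<String> treeSet = fill(new TreeSet<String>());

        System.out.println(list.size());            // prints 4: duplicates are kept
        System.out.println(hashSet.size());         // prints 3: elements are unique
        System.out.println(list.contains("Bob"));   // searches element by element
        System.out.println(hashSet.contains("Bob"));// looks up via the hashcode
        System.out.println(treeSet.contains("Bob"));// looks up via a sorted tree

        // no get(index) on a set, but iteration still works
        for (String name : treeSet) {
            System.out.println(name);
        }
    }
}
```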
Back to Collections
Interface Set
Hashing
How can method ‘contains()’ be implemented to be an O(1) operation?
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
Idea:
• Retrieving an object from an array can be done in O(1) if the index is known
• Determine the index to store and retrieve an object from the object itself!
Hashing
Determine the index ... by the object itself:
Example:
Store Strings “Apu“, “Bob“, “Daria“ as Set.
Define function H: String -> integer:
• Take first character, A=1, B=2,...
Store names in String array at position H(name)
Hashing
Apu: first character A, H(A) = 1
Bob: first character B, H(B) = 2
Daria: first character D, H(D) = 4
...
Resulting array:
[1] Apu
[2] Bob
[3] (unused)
[4] Daria
[5] (unused)
…
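The example can be sketched in a few lines. The class name FirstLetterHash and the table size of 27 are assumptions of this sketch, and it only handles names that start with a letter A to Z.

```java
class FirstLetterHash {
    // H from the example: A=1, B=2, ... based on the first character only.
    static int h(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A' + 1;
    }

    public static void main(String[] args) {
        String[] table = new String[27];   // index 0 stays unused, 1..26 for A..Z
        for (String name : new String[] {"Apu", "Bob", "Daria"}) {
            table[h(name)] = name;
        }
        // lookup needs no search: the index is computed from the object itself
        System.out.println("Daria".equals(table[h("Daria")])); // prints true
        System.out.println(table[3]);                          // prints null (unused)
    }
}
```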
Hashing
• The function H(o) is called the hashcode of the object o
• Properties of a hashcode function:
• If a.equals(b) then H(a) = H(b)
• BUT NOT NECESSARILY VICE VERSA:
• H(a) = H(b) does NOT guarantee a.equals(b)!
• If H() has ‘sufficient variation’, then different objects most likely have different hashcodes
Hashing
• Additionally, an array is needed that has sufficient space to contain at least all elements.
• The hashcode must not address an index outside the array; this can easily be achieved by:
• H1(o) = H(o) % n
• % = modulo function, n = array length
• The larger the array, the more distinct values H1() can take!
Hashing
Back to the example:
Insert ‘Abe’
First character: A, H(A) = 1
H(Apu) = H(Abe); this is called a collision
Solving Collisions
Method 1:
Don’t use an array of objects, but an array of linked lists!
Array contains (start of) linked lists:
[1] Apu -> Abe
[2] Bob
[3] (unused)
[4] Daria
[5] (unused)
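Method 1 might be sketched as follows. ChainedStringSet is a hypothetical class for illustration; it reuses the first-letter function H of the running example so that ‘Apu’ and ‘Abe’ really collide, and it assumes names that start with a letter.

```java
import java.util.LinkedList;

// Hypothetical example class: a set of names that resolves collisions with linked lists.
class ChainedStringSet {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedStringSet(int n) {
        buckets = new LinkedList[n];   // every slot starts as null (unused)
    }

    // H from the running example (A=1, B=2, ...), reduced to a valid array index.
    private int index(String name) {
        int h = Character.toUpperCase(name.charAt(0)) - 'A' + 1;
        return Math.floorMod(h, buckets.length);
    }

    // Returns false if the name was already in the set.
    boolean add(String name) {
        int i = index(name);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        if (buckets[i].contains(name)) {
            return false;
        }
        buckets[i].add(name);
        return true;
    }

    boolean contains(String name) {
        LinkedList<String> bucket = buckets[index(name)];
        return bucket != null && bucket.contains(name);
    }

    // Number of elements sharing the slot of 'name' (0 if the slot is unused).
    int chainLength(String name) {
        LinkedList<String> bucket = buckets[index(name)];
        return bucket == null ? 0 : bucket.size();
    }
}
```

Only the short list in one slot is searched, so a lookup stays fast as long as the chains stay short.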
Solving Collisions
Drawback:
• Objects must be ‘wrapped’ in a node structure to provide links, which adds memory and access overhead for every element
• Example: ’Apu’ is wrapped in a Node that holds ’Apu’ and a link
Solving Collisions
Method 2:
• Iteratively apply different hashcodes H0, H1, H2, ... to object o until the collision is resolved
• As long as the different hashcodes are used in the same order, the search is guaranteed to be consistent
• Example: for ‘Apu’, H0, H1, H2 address the candidate slots of the array one after another
Solving Collisions
The easiest hashcode series, Hinc:
H_0 = H
H_i = H_{i-1} + 1 (that is, H_i = H + i)
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
add
Example implementation of ‘add(Object o)’ using Hinc
(assume array A has length n and at least one free slot, H as given above)
index = H(o) % n
while ( A[index] != null ) {
    if ( o.equals(A[index]) )
        break;                      // o is already in the set
    else
        index = (index + 1) % n;    // try the next slot
}
A[index] = o                        // store o at the slot found
contains
Example implementation of ‘contains(Object o)’ using Hinc
(assume array A has length n and at least one free slot, H as given above)
index = H(o) % n
found = false;
while ( A[index] != null ) {
    if ( o.equals(A[index]) ) {
        found = true;
        break;
    } else {
        index = (index + 1) % n;
    }
}
// ‘found’ is true if the set contains object o
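The two pseudocode routines for add and contains could be turned into Java roughly as follows. ProbingSet is a hypothetical class; two details are choices of this sketch rather than part of the pseudocode: Math.floorMod keeps the start index non-negative (hashCode() may return negative values, and % would then yield a negative index), and the loops stop after n probes so that a full table cannot cause an endless loop.

```java
// Hypothetical example class: a set using the Hinc series (try the next slot on collision).
class ProbingSet {
    private final Object[] a;   // array A of length n
    private int size = 0;

    ProbingSet(int n) {
        a = new Object[n];
    }

    // H(o) % n, kept non-negative.
    private int start(Object o) {
        return Math.floorMod(o.hashCode(), a.length);
    }

    // Returns false if o was already in the set.
    boolean add(Object o) {
        int index = start(o);
        for (int probes = 0; probes < a.length; probes++) {
            if (a[index] == null) {
                a[index] = o;
                size++;
                return true;
            }
            if (o.equals(a[index])) {
                return false;
            }
            index = (index + 1) % a.length;
        }
        throw new IllegalStateException("table is full");
    }

    boolean contains(Object o) {
        int index = start(o);
        for (int probes = 0; probes < a.length; probes++) {
            if (a[index] == null) {
                return false;          // reached an unused slot: o is not stored
            }
            if (o.equals(a[index])) {
                return true;
            }
            index = (index + 1) % a.length;
        }
        return false;                  // table full and o not found
    }

    int size() {
        return size;
    }
}
```

For example, in a table of length 8 the Integers 1 and 9 both start at index 1, so the second one ends up at index 2 and is still found by contains().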
Analysis
• If there is no collision, contains() operates in O(1)
• If the set contains elements having the same hashcode, there is a collision. Let dupmax be the maximum number of elements sharing the same hashcode; then contains() operates in O(dupmax)
• If dupmax is near n, there is no increase in speed, since contains() then operates in O(n)
A Real Hashcode
• JAVA provides a hashcode for every object: method hashCode in java.lang.Object
• For String, for example, hashCode is computed as:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
n = length of string, s[i] = character at position i
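The String formula can be checked against the built-in method with a few lines. StringHashDemo and hash are names made up for this sketch; the sum is evaluated with Horner's scheme and int arithmetic, so overflow simply wraps around.

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] step by step.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Apu", "Bob", "Daria"}) {
            // both columns show the same value for each name
            System.out.println(name + ": " + hash(name) + " " + name.hashCode());
        }
    }
}
```

Since the result can be negative for longer strings, code that derives an array index from it usually applies Math.floorMod rather than a plain %.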
Rehashing a table
What happens if the array is full?
• Create a new array, e.g. of double size, and insert all elements of the old table into the new table
• Note: the elements won’t keep their index, since the modulus applied to the hashcode has changed!
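A sketch of rehashing for a table that resolves collisions by trying the next slot. RehashDemo and rehash are hypothetical names, and the sketch assumes the old table has at least one slot.

```java
class RehashDemo {
    // Returns a table of twice the size that contains the same elements.
    static Object[] rehash(Object[] old) {
        Object[] bigger = new Object[old.length * 2];
        for (Object o : old) {
            if (o == null) {
                continue;                        // unused slot, nothing to move
            }
            // the index is recomputed with the new array length
            int index = Math.floorMod(o.hashCode(), bigger.length);
            while (bigger[index] != null) {      // collision: try the next slot
                index = (index + 1) % bigger.length;
            }
            bigger[index] = o;
        }
        return bigger;
    }
}
```

For instance, the Integers 5 and 9 collide in a table of length 4 (both start at index 1), but after rehashing to length 8 they land at indices 5 and 1.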
Hashcode Summary
• A hashtable provides the Set operations add() and contains() in O(1) if the hashcode is chosen properly and the array allows for sufficient variation
• Speed is gained by using more memory
• If many collisions occur, a hashtable might be slower than a list due to overhead (computation of H, ...)