Welcome to CIS 068 !


Lesson 10:

Data Structures


Overview

Description, usage, and Java implementation of:

• Collections

• Lists

• Sets

• Hashing


Definition

Data Structures

Definition (www.nist.gov):

“An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person.”


Efficiency

“An organization of information …for better algorithm efficiency...”:

Isn’t the efficiency of an algorithm defined by its order of magnitude, O( )?

Yes, but the order of magnitude depends on the implementation: the same task can be faster or slower depending on the data structure that holds the data.

Introduction

• Data structures define the structure of a collection of data, i.e. values of primitive data types or objects

• The structure provides different ways to access the data

• Different tasks need different ways to access the data

• Different tasks need different data structures


Introduction

Typical properties of different structures:

• fixed length / variable length

• access by index / access by iteration

• duplicate elements allowed / not allowed


Examples

Tasks:

• Read 300 integers

• Read an unknown number of integers

• Read 5th element of sorted collection

• Read next element of sorted collection

• Merge element at 5th position into collection

• Check if object is in collection


Examples

Although you can invent any data structure you want, there are ‘classic structures’, providing:

• Coverage of most (classic) problems

• Analysis of efficiency

• Basic implementation in modern languages, like Java

Data Structures in Java

Let’s see what Java has to offer:


The Collection Hierarchy

Collection: top interface, specifying requirements for all collections


Collection Interface


Iterator Interface

Purpose:

• Sequential access to collection elements

• Note: the technique used so far, accessing elements sequentially by index, is not reasonable in general (why?)

Methods: hasNext(), next(), remove()

Iterator Interface

The iterator points ‘between’ the elements of the collection:

Elements: 1 2 3 4 5

• First position (before element 1): hasNext() = true, remove() throws an error

• Current position after 2 calls to next() (between elements 2 and 3): remove() deletes element 2, the element returned by the last call to next()

• Last position (after element 5): hasNext() = false


Iterator Interface Usage

Typical usage of iterator:
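The original slide showed its code as an image, so the following is only a minimal sketch of the usual hasNext()/next() loop. The class name IteratorUsage and both helper methods are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

class IteratorUsage {
    // Sums all elements, visiting them only through the Iterator.
    static int sum(Collection<Integer> numbers) {
        int total = 0;
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {      // is there an element after the current position?
            total += it.next();     // return it and move the position past it
        }
        return total;
    }

    // Removes all even elements; remove() deletes the element last returned by next().
    static void removeEven(Collection<Integer> numbers) {
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(sum(numbers));
        removeEven(numbers);
        System.out.println(numbers);
    }
}
```

Because both methods accept any Collection, the same loop works for lists and sets alike, which is the point of accessing elements through an iterator rather than by index.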


Back to Collections

AbstractCollection


AbstractCollection

• Facilitates implementation of Collection interface

• Providing a skeletal implementation

• Implementation of a concrete class:

• Provide data structure (e.g. array)

• Provide access to data structure

AbstractCollection

• The concrete class must provide an implementation of Iterator

• To maintain the ‘abstract character’ of the data, the implemented (non-abstract) methods of AbstractCollection use only Iterator methods to access the data

AbstractCollection:

add() { Iterator i = iterator(); }

clear() { Iterator i = iterator(); }

myCollection (extends AbstractCollection, implements Iterator):

int[] data;

Iterator iterator() { return this; }

hasNext() { }
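As a rough sketch of this idea, a concrete class only has to supply iterator() and size(); inherited methods such as contains() and toString() then work through the iterator. The class IntArrayCollection is hypothetical. Unlike the diagram, it returns a fresh iterator object rather than `this`, so the collection can be traversed more than once. Note also that in java.util.AbstractCollection the inherited add() throws UnsupportedOperationException unless the subclass overrides it, so this sketch is read-only.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only collection backed by an int array.
class IntArrayCollection extends AbstractCollection<Integer> {
    private final int[] data;

    IntArrayCollection(int... values) {
        this.data = values.clone();
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int position = 0;   // index of the next element to return

            @Override
            public boolean hasNext() {
                return position < data.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return data[position++];
            }
        };
    }

    @Override
    public int size() {
        return data.length;
    }
}
```

Calling contains(4) on such a collection runs the inherited implementation, which walks the elements using only hasNext() and next().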


Back to Collections

List Interface


List Interface

• Extends the Collection Interface

• Adds methods to insert and retrieve objects by their position (index)

• Note: Collection Interface could NOT specify the position

• A new Iterator, the ListIterator, is introduced

• ListIterator extends Iterator, allowing for bidirectional traversal (previousIndex()...)
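A short sketch of both additions, positional access and backward traversal, might look as follows. The class name ListIteratorDemo and the method reversedCopy are hypothetical; the List and ListIterator calls are standard java.util methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Returns the elements of the list in reverse order by walking backward.
    static <T> List<T> reversedCopy(List<T> list) {
        List<T> result = new ArrayList<>();
        ListIterator<T> it = list.listIterator(list.size()); // start after the last element
        while (it.hasPrevious()) {
            result.add(it.previous());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Apu", "Bob", "Daria"));
        names.add(1, "Abe");                  // insert by position
        System.out.println(names.get(2));     // retrieve by position
        System.out.println(reversedCopy(names));
    }
}
```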

List Interface

• Incorporates the index!

• A new iterator type (can move forward and backward)

Example: Selection-Sorting a List

Part 1: call to selection sort

• The actual implementation of List does not matter!

• Call to SelectionSort

• Use only the Iterator properties of ListIterator (upcasting)

Example: Selection-Sorting a List

Part 2: selection sort

• Access at index ‘fill’

• Inner loop

• Swap
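The slide's code was an image and is not recoverable, so the following is a sketch of a selection sort written against the List interface, not the original listing. The class and method names are hypothetical; only the variable name fill is taken from the slide.

```java
import java.util.List;

class SelectionSortDemo {
    // Sorts the list in ascending order; works for any List implementation.
    static <T extends Comparable<T>> void selectionSort(List<T> list) {
        for (int fill = 0; fill < list.size() - 1; fill++) {
            int smallest = fill;
            // Inner loop: find the smallest element in positions fill .. size-1.
            for (int i = fill + 1; i < list.size(); i++) {
                if (list.get(i).compareTo(list.get(smallest)) < 0) {
                    smallest = i;
                }
            }
            // Swap the smallest element into position 'fill'.
            T temp = list.get(fill);
            list.set(fill, list.get(smallest));
            list.set(smallest, temp);
        }
    }
}
```

The method accepts an ArrayList as well as a LinkedList. With a linked list each get(i) is itself O(n), so the choice of implementation does not matter for correctness but does matter for speed.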


Back to Collections

AbstractList: ...again the implementation of some methods...

Note: still ABSTRACT!


Concrete Lists

ArrayList and Vector:

at last concrete implementations !

ArrayList and Vector

Vector:

• Kept for compatibility reasons (only)

• Use ArrayList instead

ArrayList:

• The underlying data structure is an array

• The List properties add advantages over a plain array:

• Size can grow and shrink

• Elements can be inserted and removed in the middle

An Alternative Implementation (parts 1 to 3)

Collections

The underlying array data structure has

• advantages for index-based access: O(1)

• disadvantages for insertion / removal of middle elements: the following elements must be copied, so insertion / removal is O(n)

• Alternative: linked lists


Linked List

Flexible structure, providing

• Insertion and removal at any place in O(1) once the node at that place has been reached, compared to O(n) for an array-based list

• Sequential access

• Random access in O(n), compared to O(1) for an array-based list


Linked List

• List of dynamically allocated nodes

• Nodes arranged into a linked structure

• The data structure ‘node’ must provide

• The data itself (example: the bead body)

• A possible link to another node (example: the link)

Children’s pop-beads as an example for a linked list

Linked List

Old node --next--> New node --next--> (null)


Connecting Nodes

creating the nodes

connecting
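The two steps can be sketched as follows. The Node class is hypothetical; its field name link follows the notation used on the later slides, and the stored names are only sample data.

```java
// Hypothetical node type: the data itself plus a possible link to another node.
class Node {
    String data;
    Node link;      // next node, or null at the end of the list

    Node(String data) {
        this.data = data;
    }
}

class ConnectingNodes {
    // Creates three nodes and connects them: Apu -> Bob -> Daria -> null.
    static Node buildList() {
        // creating the nodes
        Node p = new Node("Apu");
        Node q = new Node("Bob");
        Node r = new Node("Daria");
        // connecting
        p.link = q;
        q.link = r;
        return p;   // p is the head of the list
    }
}
```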

Inserting Nodes

Insert node r between p and q:

p.link = r

r.link = q

q can then be accessed by p.link.link
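A minimal sketch of the insertion, assuming a hypothetical node class InsertNode with a link field. It sets r.link first, so the operation also works when q is reachable only through p.link and not held in a variable of its own.

```java
// Hypothetical node type for this sketch.
class InsertNode {
    String data;
    InsertNode link;

    InsertNode(String data) {
        this.data = data;
    }

    // Inserts r directly after p; only two links change, so this is O(1).
    static void insertAfter(InsertNode p, InsertNode r) {
        r.link = p.link;   // r now points to q, the old successor of p
        p.link = r;        // p now points to r
    }
}
```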


Removing Nodes

p q
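The slide's diagram is lost, so the following sketch rests on an assumption: q is the node directly after p, and it is removed by letting p point past it. The class RemoveNode and the method removeAfter are hypothetical.

```java
// Hypothetical node type for this sketch.
class RemoveNode {
    String data;
    RemoveNode link;

    RemoveNode(String data) {
        this.data = data;
    }

    // Unlinks the node that follows p and returns it (null if p is the last node).
    static RemoveNode removeAfter(RemoveNode p) {
        RemoveNode q = p.link;
        if (q != null) {
            p.link = q.link;   // bypass q
            q.link = null;     // q no longer refers into the list
        }
        return q;
    }
}
```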


Traversing a List

(null)
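Traversal follows the links from the first node until (null) is reached. The sketch below uses a hypothetical node class TraverseNode. It also shows why random access into a linked list is O(n): reaching index i means following i links.

```java
// Hypothetical node type for this sketch.
class TraverseNode {
    int data;
    TraverseNode link;

    TraverseNode(int data, TraverseNode link) {
        this.data = data;
        this.link = link;
    }

    // Visits every node, starting at head, until the link is null.
    static int length(TraverseNode head) {
        int count = 0;
        for (TraverseNode current = head; current != null; current = current.link) {
            count++;
        }
        return count;
    }

    // Random access: follows 'index' links; assumes 0 <= index < length(head).
    static int get(TraverseNode head, int index) {
        TraverseNode current = head;
        for (int i = 0; i < index; i++) {
            current = current.link;
        }
        return current.data;
    }
}
```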

Doubly Linked Lists

Singly linked list: each node holds data and a successor link; the last successor is (null)

Doubly linked list: each node holds data, a successor link, and a predecessor link; the first predecessor and the last successor are (null)


Back to Collections

AbstractSequentialList and LinkedList


LinkedList

An implementation example:

See textbook


Sets

Example task:

Check whether the collection contains object o

Solution using a List: compare o with the elements one by one

-> O(n) operation!


Sets

Comparison to List:

• Set is designed to overcome the limitation of O(n)

• Contains unique elements

• contains() / remove() operate in O(1) (hash-based sets, e.g. HashSet) or O(log n) (tree-based sets, e.g. TreeSet)

• No get() method, no index-access...

• ...but iterator can (still) be used to traverse set
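These properties can be tried out with the standard java.util.HashSet and java.util.TreeSet classes. The class SetDemo and its two helper methods are hypothetical and serve only as an illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetDemo {
    // Builds a hash-based set; adding a duplicate leaves the set unchanged.
    static Set<String> uniqueNames(String... names) {
        Set<String> set = new HashSet<>();
        for (String name : names) {
            set.add(name);
        }
        return set;
    }

    // A set has no get(index), but it can still be traversed with an iterator.
    // Copying into a TreeSet makes the traversal order sorted.
    static List<String> sortedNames(Set<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : new TreeSet<>(names)) {
            result.add(name);
        }
        return result;
    }
}
```

The for-each loop in sortedNames() uses the set's iterator behind the scenes.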


Back to Collections

Interface Set


Hashing

How can the method ‘contains()’ be implemented as an O(1) operation?

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

Idea:

• Retrieving an object from an array can be done in O(1) if the index is known

• Determine the index used to store and retrieve an object from the object itself!


Hashing

Determine the index ... by the object itself:

Example:

Store the Strings “Apu”, “Bob”, “Daria” as a Set.

Define a function H: String -> integer:

• Take the first character: A=1, B=2, ...

Store the names in a String array at position H(name)

Hashing

• Apu: first character A, H(A) = 1

• Bob: first character B, H(B) = 2

• Daria: first character D, H(D) = 4

• ...

Array (positions 1 to 5): Apu | Bob | (unused) | Daria | (unused)

Hashing

• The function H(o) is called the hash code of the object o

• Properties of a hash code function:

• If a.equals(b), then H(a) = H(b)

• BUT NOT NECESSARILY VICE VERSA:

• H(a) = H(b) does NOT guarantee a.equals(b)!

• If H() has ‘sufficient variation’, then different objects most likely have different hash codes
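Both properties can be observed with Java's built-in String.hashCode(). The strings "Aa" and "BB" are a well-known pair that share a hash code without being equal. The class HashCodeProperties and its method are hypothetical helpers.

```java
class HashCodeProperties {
    // True if a and b have the same hash code although they are not equal.
    static boolean isCollision(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        // Equal objects always have equal hash codes.
        String first = new String("Apu");
        String second = new String("Apu");
        System.out.println(first.equals(second) && first.hashCode() == second.hashCode());

        // The reverse does not hold: same hash code, different strings.
        System.out.println(isCollision("Aa", "BB"));
    }
}
```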

Hashing

• Additionally, an array is needed that has sufficient space to contain at least all elements

• The hash code must not address an index outside the array; this can easily be achieved by:

• H1(o) = H(o) % n

• % = modulo function, n = array length

• The larger the array, the more H1() varies!

Array: Apu | Bob | (unused) | Daria | (unused)

Hashing

Back to the example:

Insert ‘Abe’

First character: A, H(A) = 1

H(Apu) = H(Abe): this is called a collision

Array: Apu | Bob | (unused) | Daria | (unused)

Solving Collisions

Method 1:

Don’t use an array of objects, but an array of linked lists!

Array: Apu -> Abe | Bob | (unused) | Daria | (unused)

The array contains (the start of) the linked lists
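A minimal sketch of method 1, using the first-character function H from the running example. The class ChainedNameSet is hypothetical and assumes non-empty names. Math.floorMod is used in place of % so that the index can never be negative.

```java
import java.util.LinkedList;

// Hypothetical set of names that resolves collisions with linked lists.
class ChainedNameSet {
    private final LinkedList<String>[] table;

    @SuppressWarnings("unchecked")
    ChainedNameSet(int n) {
        table = new LinkedList[n];   // each entry is (the start of) a linked list
    }

    // H from the example: first character, A=1, B=2, ..., reduced modulo n.
    private int index(String name) {
        int h = Character.toUpperCase(name.charAt(0)) - 'A' + 1;
        return Math.floorMod(h, table.length);
    }

    boolean add(String name) {
        int i = index(name);
        if (table[i] == null) {
            table[i] = new LinkedList<>();
        }
        if (table[i].contains(name)) {
            return false;            // already in the set
        }
        table[i].add(name);          // colliding names share one list
        return true;
    }

    boolean contains(String name) {
        int i = index(name);
        return table[i] != null && table[i].contains(name);
    }
}
```

With this sketch, ‘Apu’ and ‘Abe’ both land in the list at index 1, and contains() only has to search that one list.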

Solving Collisions

Drawback:

• Objects must be ‘wrapped’ in a node structure to provide links, which introduces the overhead of an extra node object for every stored element

‘Apu’ --wrap--> Node (‘Apu’, link)

Solving Collisions

Method 2:

• Iteratively apply different hash codes H0, H1, H2, ... to object o until the collision is resolved

• As long as the different hash codes are used in the same order, the search is guaranteed to be consistent

Array: Apu | Bob | (unused) | Daria | (unused), with H0, H1, H2 pointing to successive candidate positions

Solving Collisions

The easiest hash code series, Hinc:

H_0 = H

H_i = H_(i-1) + 1, i.e. H_i = H + i (taken modulo the array length)

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

Array: Apu | Bob | (unused) | Daria | (unused), probed at H0, H1, H2

add

Example implementation of ‘add(Object o)’ using Hinc

(assume array A has length n and at least one free position, H as given above)

index = H(o) % n;
while (A[index] != null) {
    if (o.equals(A[index]))
        break;                      // o is already in the set
    index = (index + 1) % n;        // try the next position
}
A[index] = o;                       // add element at position A[index]

contains

Example implementation of ‘contains(Object o)’ using Hinc

(assume array A has length n and at least one free position, H as given above)

index = H(o) % n;
found = false;
while (A[index] != null) {
    if (o.equals(A[index])) {
        found = true;
        break;
    }
    index = (index + 1) % n;        // try the next position
}
// ‘found’ is true if the set contains object o
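The add and contains operations can be combined into a small runnable class. This is only a sketch under stated assumptions: ProbingSet is a hypothetical name, H is taken to be Java's hashCode(), the array length is at least 1, and the table refuses to fill its last free position so that the search loop always ends. Math.floorMod is used instead of % because hashCode() may be negative.

```java
// Hypothetical fixed-size set using the probing series Hinc (index, index+1, ...).
class ProbingSet {
    private final Object[] a;
    private int size = 0;

    ProbingSet(int n) {
        a = new Object[n];
    }

    // Returns the index that holds o, or the first free index on o's probe path.
    private int find(Object o) {
        int index = Math.floorMod(o.hashCode(), a.length);
        while (a[index] != null && !o.equals(a[index])) {
            index = (index + 1) % a.length;
        }
        return index;
    }

    boolean add(Object o) {
        int index = find(o);
        if (a[index] != null) {
            return false;                       // o is already in the set
        }
        if (size == a.length - 1) {
            throw new IllegalStateException("table full, rehashing needed");
        }
        a[index] = o;
        size++;
        return true;
    }

    boolean contains(Object o) {
        return a[find(o)] != null;
    }
}
```

Because add() and contains() follow the same probe sequence, an element is always found again at the position where it was stored.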

Analysis

• If there is no collision, contains() operates in O(1)

• If the set contains elements with the same hash code, there is a collision. With dupmax being the maximum number of elements that share one hash code, contains() operates in O(dupmax)

• If dupmax is near n, there is no gain in speed, since contains() then operates in O(n)

A Real Hashcode

• Java provides a hash code for every object: method hashCode() in java.lang.Object

• The hashCode() of e.g. String is computed by:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

n = length of the string, s[i] = character at position i, ^ = exponentiation (computed in int arithmetic, so the value may overflow)
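The formula can be evaluated without computing any powers by using Horner's rule: multiply the running value by 31 and add the next character. The class StringHash below is a hypothetical re-implementation for illustration; real code would simply call s.hashCode().

```java
class StringHash {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // overflow wraps around, as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("Apu") == "Apu".hashCode());
    }
}
```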

Rehashing a table

What happens if the array is full?

• Create a new array, e.g. of double size, and insert all elements of the old table into the new table

• Note: the elements won’t keep their index, since the modulus n applied to the hash code has changed!
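A sketch of rehashing for a table that uses the Hinc probing series. The class Rehash and its method are hypothetical, and the sketch assumes the old table has length at least 1 and that empty positions hold null.

```java
class Rehash {
    // Returns a table of double size that contains all elements of oldTable.
    static Object[] rehash(Object[] oldTable) {
        Object[] newTable = new Object[oldTable.length * 2];
        for (Object o : oldTable) {
            if (o != null) {
                // The index is recomputed because the array length has changed.
                int index = Math.floorMod(o.hashCode(), newTable.length);
                while (newTable[index] != null) {
                    index = (index + 1) % newTable.length;
                }
                newTable[index] = o;
            }
        }
        return newTable;
    }
}
```

For example, the Integer 6 has hash code 6: in a table of length 4 it sits at index 6 % 4 = 2, while after rehashing to length 8 it moves to index 6.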

Hashcode Summary

• A hash table provides the Set operations add() and contains() in O(1) if the hash code is chosen properly and the array allows for sufficient variation

• Speed is gained by using more memory

• If many collisions occur, a hash table might be slower than a list due to overhead (computation of H, ...)
