Data Structures Fundamental Data Storage

Page 1: Data Structures

Data Structures

Fundamental Data Storage

Page 2: Data Structures

Data Structures
• For sizeable programs, one problem that can quickly arise is that of data storage.
  – What is the most efficient or effective way to organize and utilize information within a program?
  – Quick answer: it depends on the task.

Page 3: Data Structures

Data Structures
• For some tasks, it is helpful (at minimum) and possibly necessary to have sorted data.
• For other tasks, it is not necessary to note where any given piece of data is stored within a storage data structure.

Page 4: Data Structures

Data Structures
• Note: while we have seen these in passing and as examples earlier in the course, we will now examine them a little more closely.

Page 5: Data Structures

Arrays
• Possibly the most basic non-trivial data storage structure is that of the array.
  – We’ve already seen the notion of a “vector” that dynamically resizes.

[Figure: an array whose slots are labeled with indices 0 through 9]

Page 6: Data Structures

Beyond Arrays
• Note that the main structure being implemented by an array is effectively that of an ordered list.
  – Just like with an array, each element being stored has a specific location, which implies an ordering.

[Figure: an array whose slots are labeled with indices 0 through 9]

Page 7: Data Structures

Beyond Arrays
• In Java, there is an ArrayList class in the java.util package.
  – This class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
  – This resizing is handled internally and automatically by the class.

Page 8: Data Structures

Beyond Arrays
• In C++, there is a vector class as part of the std namespace.
  – Likewise, this class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
  – This resizing is also handled internally and automatically by the class.
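
The automatic resizing described above can be observed from the outside. The following is a minimal sketch; the helper name count_reallocations is made up for illustration, while push_back() and capacity() are standard std::vector members. The exact growth pattern is implementation-defined, so the sketch assumes no particular capacities.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n integers and counts how many times the
// vector moved to a larger internal array along the way.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> numbers;
    std::size_t reallocations = 0;
    std::size_t last_capacity = numbers.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        numbers.push_back(static_cast<int>(i));  // grows itself when full
        if (numbers.capacity() != last_capacity) {
            ++reallocations;  // the internal array was replaced
            last_capacity = numbers.capacity();
        }
    }
    return reallocations;
}
```

Calling count_reallocations(1000) typically returns a number far smaller than 1000, because the vector grows its internal array in large steps rather than one slot at a time.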

Page 9: Data Structures

Beyond Arrays
• However, arrays are not the only way to model a list.
  – Another such model is that of the linked list. (See the graphic below.)

Page 10: Data Structures

Linked Lists
• The linked list stores each data element separately and individually, allocating space for new elements as they are added to the list.

Page 11: Data Structures

Linked Lists
• Adding data to the end of a linked list is trivial, as it (usually) also is for an array.

Page 12: Data Structures

Linked Lists
• Adding data in the middle of the list, or at its beginning, is (relatively) very time-consuming for an array.
• For a linked list, however, it is often a much simpler operation.

Page 13: Data Structures

Adding Elements
• Remember that for an array, elements are in fixed locations.
• Inserting an element into the middle of an array requires moving all elements at and after the point of insertion, e.g., insert 7 at index 3.

[Figure: the array 3 8 1 2 4 13 42 9 5, with index labels beneath it]

Page 14: Data Structures

Adding Elements

[Figure: inserting 7 at index 3]
Before: 3 8 1 2 4 13 42 9 5
After:  3 8 1 7 2 4 13 42 9 5
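
The shifting that an array insertion requires can be sketched as follows. The function name insert_with_shift is hypothetical, and the sketch assumes the index is no larger than the current size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: inserts value at index by moving every element at
// and after that position one slot to the right (assumes index <= size).
void insert_with_shift(std::vector<int>& data, std::size_t index, int value) {
    data.push_back(0);  // make room for one more element at the end
    for (std::size_t i = data.size() - 1; i > index; --i) {
        data[i] = data[i - 1];  // move each later element one slot right
    }
    data[index] = value;
}
```

Applied to 3 8 1 2 4 13 42 9 5 with value 7 and index 3, six elements have to move before 7 can be stored. The standard std::vector::insert member does the same kind of shifting internally.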

Page 15: Data Structures

Adding Elements
• For a linked list, however, each element’s storage space is distinct and separate from the others.
• New storage may be placed directly in the middle of the chain.

Page 16: Data Structures

Adding Elements

Page 17: Data Structures

Linked Lists
• Naturally, there is the question of what these “links of the chain” actually are, or more properly, how to represent them.

Page 18: Data Structures

Linked Lists
• In their most basic and simple form…

template <typename T>
class Node {
public:
    T value;
    Node<T>* next;
};

Page 19: Data Structures

Linked Lists

template <typename T>
class Node {
public:
    T value;        // the data stored in this link
    Node<T>* next;  // pointer to the following link
};

[Figure: a node drawn as a box with two fields, “value” and “next”]

Page 20: Data Structures

Linked Lists

Remember: next is a pointer, so a Node<T> doesn’t actually contain another Node<T>, just the address of the next one in line.

Page 21: Data Structures

Linked Lists

The end of the “linked list chain” is denoted by a null pointer in the last node.

The “ground” symbol at the end of the diagram denotes this.
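
A minimal sketch of how such a chain might be built and walked. Node is written as a struct here for brevity, and the helper names insert_after, to_vector, and destroy are made up for this illustration.

```cpp
#include <vector>

template <typename T>
struct Node {
    T value;
    Node<T>* next;  // nullptr marks the end of the chain
};

// Hypothetical helper: splices a new node in directly after position.
// No other element has to move.
template <typename T>
Node<T>* insert_after(Node<T>* position, const T& value) {
    Node<T>* fresh = new Node<T>{value, position->next};
    position->next = fresh;
    return fresh;
}

// Walks the chain from head until the null pointer, collecting the values.
template <typename T>
std::vector<T> to_vector(const Node<T>* head) {
    std::vector<T> out;
    for (const Node<T>* cur = head; cur != nullptr; cur = cur->next) {
        out.push_back(cur->value);
    }
    return out;
}

// Frees every node in the chain.
template <typename T>
void destroy(Node<T>* head) {
    while (head != nullptr) {
        Node<T>* next = head->next;
        delete head;
        head = next;
    }
}
```

Inserting in the middle only rewires two pointers, which is why the operation is cheap once the insertion point is known. Finding that point still requires walking the chain.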

Page 22: Data Structures

Lists
• Note that we now have two different ways of storing data, each of which has its own pros and cons.
  – Arrays
    • Good for adding items to the end of lists and for random access to items within the list.
    • Bad for cases with many additions and removals at various places within the list.


Page 24: Data Structures

Lists
• Note that we now have two different ways of storing data, each of which has its own pros and cons.
  – Linked Lists
    • Better for adding and removing items at random locations within the list.
    • Bad at randomly accessing items from the list.
      – Note that to use a random item within the list, we must traverse the chain to find it.

Page 25: Data Structures

Lists
• Note that both of these objects fulfill the same end goal: to represent a group of objects with some implied ordering upon them.
• While they meet this goal differently, their primary purpose is identical.

Page 26: Data Structures

Templates
• Templates are integral to generic programming in C++.
  – A template is like a blueprint.
  – The blueprint is used to instantiate a function when it is actually used in code.
  – “Actual” types are substituted in for the “formal” types of the template.

Page 27: Data Structures

Why Templates?
What is the difference between the following two functions?

int compare(const string &v1, const string &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

int compare(const double &v1, const double &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

Only the types!

Page 28: Data Structures

Why Templates?
What if we could write the function once for any type and have the compiler just use the right types?

template <typename T>
int compare(const T &v1, const T &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

Requires type T to have a < operator.

Page 29: Data Structures

Exercise 1
• Implement the generic compare function.
• Implement a main() that compares two doubles, two ints, two chars, and two strings using the compare function.
• Compile and see that it is good!

Page 30: Data Structures

What is Going On?
• The compiler sees only the structure when a template is defined; the generic function (coded in a header) serves as a blueprint.
• When a call to the function is seen, the compiler substitutes the types used in the invocation into the blueprint and generates the required code.
• Many errors can’t be caught until an invocation is seen.
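
The substitution step can be illustrated with a short sketch. The Point type and the demo function are hypothetical and exist only for this example; compare is the generic function from the earlier slide.

```cpp
#include <string>

// The blueprint: no code is generated until compare is called with real types.
template <typename T>
int compare(const T& v1, const T& v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

// Hypothetical type with no operator<, used only to illustrate late errors.
struct Point {
    int x;
    int y;
};

int demo() {
    int a = compare(1, 2);                                // instantiates compare<int>
    int b = compare(2.5, 1.5);                            // instantiates compare<double>
    int c = compare(std::string("a"), std::string("a"));  // compare<std::string>
    // compare(Point{0, 0}, Point{1, 1});  // would fail only here: Point has no operator<
    return a + b + c;
}
```

The template itself compiles even though Point could never be used with it. The error appears only if the commented-out call is enabled, because that is when the compiler tries to generate compare<Point>.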

Page 31: Data Structures

Abstracting Beyond Lists
• We have this notion of a “list” structure, which maps its stored objects to indices.
  – What if we don’t actually need to have a lookup position for our stored objects?
    • But wait! How could we possibly iterate over the objects in a for loop?

Page 32: Data Structures

The Iterator
• Many programming languages provide objects called iterators for enumerating objects contained within data structures.
  – C++ and Java are no exceptions.
  – C++’s general iterator utilities are defined in the <iterator> header file; each container’s own header provides that container’s iterator type.
  – (see 3.4 – 3.5)

Page 33: Data Structures

The Iterator
• This iterator may be used to get each contained object in order, one at a time, in a controllable manner.
  – It’s especially designed to work well with for loops.

Page 34: Data Structures

The Iterator
• Example code:

vector<int> numbers;

// omitted code initializing numbers.

vector<int>::iterator iter;
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}

Page 35: Data Structures

The Iterator
• In C++, iterators are designed to look like and act something like pointers.
  – The * and -> operators are overloaded to give pointer-like semantics, allowing users of the iterator object to “dereference” the object currently “referenced” by the iterator.

Page 36: Data Structures

The Iterator
• In C++, iterators are designed to look like and act something like pointers.
  – Note the use of operator ++ to increment the iterator to the next item.
    • This is another way we can interact with pointers; it’s useful for iterating across an array while using pointer semantics… but keep a copy of the original around!

Page 37: Data Structures

The Iterator

vector<int> numbers;

// omitted code initializing numbers.

vector<int>::iterator iter;
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}

Page 38: Data Structures

The Iterator
• C++11 and later standards also provide an alternate version of the for loop (the range-based for loop), which is designed to work with iterable structures and iterators.
• It looks like “foreach” in other languages:

vector<Person> structure;
for (Person &p : structure) {
    // Code.
}

Page 39: Data Structures

The Iterator
• Both the std::vector and std::list classes of C++ implement iterators.
  – begin() returns an iterator to the list’s first element.
  – end() returns a special iterator “just after” the final element of the list, useful for checking when we’re done with iteration.
  – Use != to check for termination.
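
Because std::vector and std::list both provide begin() and end(), the same loop can walk either one. A minimal sketch follows; the function names join and sum are made up for this illustration, and both assume a container of int.

```cpp
#include <list>
#include <string>
#include <vector>

// Works for any container of int that provides begin() and end(),
// such as std::vector<int> and std::list<int>.
template <typename Container>
std::string join(const Container& items) {
    std::string out;
    for (auto it = items.begin(); it != items.end(); ++it) {
        if (it != items.begin()) out += ' ';
        out += std::to_string(*it);  // *it dereferences like a pointer
    }
    return out;
}

// The same walk written with the range-based for loop.
template <typename Container>
int sum(const Container& items) {
    int total = 0;
    for (const auto& item : items) {
        total += item;
    }
    return total;
}
```

Neither function uses an index, so the linked list is traversed just as easily as the array-based vector.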

Page 40: Data Structures

Exercise 2
• Include the <iterator> header.
• Use an iterator to walk through an array you define and print out its contents.
• Compile and run.
• See that it is good.

Page 41: Data Structures

Abstracting Beyond Lists
• There are many, many other techniques for storing data than the model of a list.
  – Such other data structures have different techniques for accessing stored data.
  – You have seen one in your lab exercises.

Page 42: Data Structures

Other Data Structures
• Let’s move on from this idea of a “list” structure.
• In particular, note how lists map their stored objects to indices (or can map an index to the stored object).
  – What if we don’t actually need to have a lookup position for our stored objects?
  – In particular, does it really need to be an integer?

Page 43: Data Structures

Other Data Structures
• There are many, many other techniques for storing data than the model of a list.
  – Such other data structures have different techniques for accessing and handling stored data.
  – These “different techniques” are often designed with a focus on different usage patterns.

Page 44: Data Structures

Other Data Structures
• A first example: arrays index their contained objects by integers.
  – Should integers be the only thing by which we can index an item within a collection-oriented data structure?
  – Think up some examples with neighbors.

[Figure: example keys: apple, bear, cake, A113, 42, blue, red, …]

Page 45: Data Structures

Maps
• The interface built on this idea within Java is the Map.
  • TreeMap and HashMap are the two prominent implementations.
  – The value is the object being stored within the map.
  – The key is the data element used as an index into the map for that value (i.e., how you “look up” the value).
  – A key is like a key in a database; it is sometimes called a “tag” in associative memory.

Page 46: Data Structures

Maps
• The classes built on this idea within C++ are map and unordered_map.
• Sidenote: these are also not polymorphically related.
  – map stores items in order of keys.
  – unordered_map does not require keys to have an order relation at all!
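
A short sketch contrasting the two C++ classes. The helper names sorted_keys and lookup are hypothetical; find() and end() are standard members of both map types.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the keys of a std::map in iteration order, which is sorted by key.
std::vector<std::string> sorted_keys(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& entry : m) {
        keys.push_back(entry.first);
    }
    return keys;
}

// Looks up a key in an unordered_map; returns fallback if the key is absent.
int lookup(const std::unordered_map<std::string, int>& m,
           const std::string& key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

A map filled with "cake", "apple", and "bear" in that order hands its keys back as apple, bear, cake. An unordered_map makes no promise about iteration order, so the sketch only looks values up by key.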

Page 47: Data Structures

Maps
• How would such a map work?
  – We could just use matching arrays for the keys and values.
  – However, this wouldn’t be the most efficient idea; better techniques are known.

Page 48: Data Structures

Hash Maps
• Hash maps work by converting the key to a unique integer, where possible, through a hashing function.
  – C++: hash maps are represented by unordered_map.
  – The selection of such a function is not a simple operation.
    • As such, unordered_map accepts a hashing function (as a template parameter that defaults to std::hash for the key type, and optionally as a constructor argument), mapping each key to a nearly-unique integer.

Page 49: Data Structures

Hash Maps
• This “hash code” is then mapped into an array for storage.
  – Problem: the “hash code” can easily be larger than the storage array’s size.
  – Solution: modular arithmetic. Divide by the array’s size and use the remainder.

Page 50: Data Structures

Hash Maps

New input: (“Football”, “Will”)
hash(“Football”) = -2070369658
-2070369658 mod 7 = 0

i   Key            Value
0   “Football”     “Will”
1
2
3
4
5
6

Page 51: Data Structures

Hash Maps

New input: (“Basketball”, “Billy”)
hash(“Basketball”) = -2127646392
-2127646392 mod 7 = -4 => 3 (a negative remainder is shifted up by the array size, 7)

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5
6

Page 52: Data Structures

Hash Maps

New input: (“Gymnastics”, “Rhonda”)
hash(“Gymnastics”) = 2068792
2068792 mod 7 = 5

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6

Page 53: Data Structures

Hash Maps

New input: (“Soccer”, “Becky”)
hash(“Soccer”) = -2026118662
-2026118662 mod 7 = -1 => 6

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6   “Soccer”       “Becky”
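
The remainder step in these examples, including the adjustment for negative hash codes, can be sketched as follows. The function name bucket_index is hypothetical, and the hash codes used with it below are taken from the slides as given rather than recomputed.

```cpp
#include <cstdint>

// Maps a possibly negative hash code to a slot in [0, table_size).
// The C++ % operator keeps the sign of the left operand, so a negative
// remainder is shifted up by table_size.
std::int64_t bucket_index(std::int64_t hash_code, std::int64_t table_size) {
    std::int64_t remainder = hash_code % table_size;
    return remainder < 0 ? remainder + table_size : remainder;
}
```

With a table of size 7, the four hash codes above land in slots 0, 3, 5, and 6. Note that std::hash returns an unsigned std::size_t, so the negative case arises only with signed hash codes such as the ones shown on these slides.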

Page 54: Data Structures

Hash Maps
• Pros:
  – direct, near-instant lookup of values (constant time on average), regardless of the key’s type.
• Cons:
  – does not support sorting
  – requires a specialized hashing function for keys that ideally creates a distinct int for each possible key.

Page 55: Data Structures

Map Example

#include <cstddef>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, size_t> word_count;  // word -> number of occurrences
    string word;
    while (cin >> word) {
        ++word_count[word];  // use map to look up value (absent keys start at 0)
    }
    for (const auto &w : word_count) {  // iterator; visits entries in key order
        cout << w.first << " occurs " << w.second
             << ((w.second > 1) ? " times" : " time") << endl;
    }
    return 0;
}

Page 56: Data Structures

Exercise 3
• Include the <unordered_map> header (or <map> if you code up the previous example).
• Use an unordered_map
  – to store at least four <key, value> pairs of your choice
  – Look up values based on keys and print them
  – Or code up the previous example
• Compile and run.
• See that it is good.

Page 57: Data Structures

Maps
• What if we want to have the entries sorted by their keys?
  – It is possible to build structures that efficiently keep their data permanently sorted by key!

Page 58: Data Structures

Binary Tree
• The binary tree is an example of one structure that can accomplish this.
  – Think of it as a linked list, but with two links per node instead of one.

Page 59: Data Structures

Binary Tree
• The corresponding Java structure is the TreeMap class.
  – It implements the SortedMap interface.

Page 60: Data Structures

Binary Tree
• The corresponding C++ structure, on the other hand, is the std::map class.

Page 61: Data Structures

Binary Tree
• The “first” node of the tree is called the root.
  – Any key smaller than the root’s key is in the left branch.
  – Any key larger than the root’s key is in the right branch.

Page 62: Data Structures

Binary Tree

[Figure: a binary tree; 13 is the root]

          13
        /    \
       7      25
      / \    /  \
     2   9  17   42

Page 63: Data Structures

Binary Tree
• Binary trees require the ability to compare the keys.
  – C++ assumes that operator< has been overloaded for custom data types.

Page 64: Data Structures

Binary Tree
• Of particular note with binary trees: operations on them tend to be highly recursive due to their structure.
  – You’ve done this in lab, twice now!
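
The recursive flavor of tree operations can be seen in a minimal sketch of an unbalanced binary search tree over int keys. All names here (TreeNode, insert, in_order, destroy_tree) are made up for this illustration. std::map is typically implemented as a self-balancing tree, which this sketch does not attempt.

```cpp
#include <vector>

struct TreeNode {
    int key;
    TreeNode* left = nullptr;   // subtree of smaller keys
    TreeNode* right = nullptr;  // subtree of larger keys
    explicit TreeNode(int k) : key(k) {}
};

// Recursively inserts key and returns the (possibly new) subtree root.
// Duplicate keys are ignored.
TreeNode* insert(TreeNode* root, int key) {
    if (root == nullptr) return new TreeNode(key);
    if (key < root->key) {
        root->left = insert(root->left, key);
    } else if (root->key < key) {
        root->right = insert(root->right, key);
    }
    return root;
}

// In-order traversal: left subtree, then the node, then the right subtree.
// Appends the keys to out in ascending order.
void in_order(const TreeNode* root, std::vector<int>& out) {
    if (root == nullptr) return;
    in_order(root->left, out);
    out.push_back(root->key);
    in_order(root->right, out);
}

// Recursively frees the whole tree.
void destroy_tree(TreeNode* root) {
    if (root == nullptr) return;
    destroy_tree(root->left);
    destroy_tree(root->right);
    delete root;
}
```

Inserting 13, 7, 25, 2, 9, 17, 42 in that order reproduces the tree drawn earlier, and the in-order traversal visits the keys in sorted order. Only operator< is used to compare keys, matching the requirement noted above.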

Page 65: Data Structures

Binary Tree
• Pros:
  – the items are always in an established, sorted order! (By key)
• Pro/Con:
  – accesses are slower than an unordered_map, but generally faster than a list.

Page 66: Data Structures

Questions?
• You have already implemented trees.

Page 67: Data Structures

Input/Output Modeling
• Certain other structures exist to model specialized, restricted input and output behavior.
  – Consider the usual interaction someone might have with a stack of papers.
  – Another possibility: the usual behavior of a group of people waiting in line… in a queue waiting to be served.

Page 68: Data Structures

Stacks
• The data structure known as a stack is a “Last In, First Out” (LIFO) structure.
  – That is, the last input to the structure is the first output obtained from it.
  – Consider a stack of papers: when searching through it, one typically starts at the top and searches downward, from newest to oldest.

Page 69: Data Structures

Stacks

[Figure: stack contents over time, each listed bottom to top: a; a b; a b c; a b; a b d; a b]

Page 70: Data Structures

Stacks
• Stacks are a very good model for function calls.
  – When function A calls function B, B must complete before A resumes operation.
    • Similarly, if B calls C, C completes before B.
  – A may then call other methods before completing.

Page 71: Data Structures

Stacks

[Figure: stack contents over time, each listed bottom to top: a; a b; a b c; a b; a b d; a b]

Page 72: Data Structures

Stacks
• Stacks are a very good model for function calls.
  – In fact, this is one reason why we’re examining it now. Stacks are the model of how recursion mechanically works.
  – In turn, recursion is necessary for operating upon many data structures.

Page 73: Data Structures

Stacks
• When debugging, the stack trace (or call stack) of a program at a given point of execution is exactly this: a description of the order of active method calls within the program.
• The area of memory where function data lives is literally called the stack space.

Page 74: Data Structures

Stacks + Math
• Stacks have often been used in mathematical operations.
  – Some graphing calculators use what is called “Reverse Polish Notation” (RPN), which is based upon postfix operators.
  – Combined with a stack, this notation is much easier to program for than infix operations.

Page 75: Data Structures

Stacks + Math
• Let’s consider the following mathematical expression:

2 + 5 * 7 - 6 / 3

• In what order do we perform the operations?
  – Consider trying to code something that would be able to interpret this!

Page 76: Data Structures

Stacks + Math
• Using the standard order of operations, this becomes:

2 + (5 * 7) - (6 / 3)

• The postfix notation for this:

2 5 7 * + 6 3 / -
((2 (5 7 *) +) (6 3 /) -)

Page 77: Data Structures

Stacks + Math

2 + (5 * 7) - (6 / 3)

2 + (35) - (2)

37 - 2

35

Page 78: Data Structures

Stacks + Math

2 5 7 * + 6 3 / -

• Let’s see how this facilitates getting the right answer.

Page 79: Data Structures

Stacks + Math

2 5 7 * + 6 3 / -

[Figure: stack contents after each token, listed bottom to top]
2  =>  2
5  =>  2 5
7  =>  2 5 7
*  =>  2 35
+  =>  37
6  =>  37 6

Page 80: Data Structures

Stacks + Math

2 + (5 * 7) - (6 / 3)

2 + (35) - (6 / 3)

37 - (6 / 3)

Page 81: Data Structures

Stacks + Math

2 5 7 * + 6 3 / -
37 6 3 / -

[Figure: stack contents after each remaining token, listed bottom to top]
6  =>  37 6
3  =>  37 6 3
/  =>  37 2
-  =>  35

Page 82: Data Structures

Stacks + Math

2 + (5 * 7) - (6 / 3)

2 + (35) - (6 / 3)

37 - (6 / 3)

37 - 2

35
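
The stack walk traced on these slides can be sketched as a small evaluator. The function name evaluate_postfix is hypothetical, and the sketch assumes well-formed, space-separated input over integers using only + - * / (there is no error handling).

```cpp
#include <sstream>
#include <stack>
#include <string>

// Evaluates a space-separated postfix expression over integers.
// Assumes well-formed input; division is integer division.
int evaluate_postfix(const std::string& expression) {
    std::stack<int> operands;
    std::istringstream tokens(expression);
    std::string token;
    while (tokens >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            int right = operands.top();  // the operand pushed last
            operands.pop();
            int left = operands.top();
            operands.pop();
            if (token == "+") operands.push(left + right);
            else if (token == "-") operands.push(left - right);
            else if (token == "*") operands.push(left * right);
            else operands.push(left / right);
        } else {
            operands.push(std::stoi(token));  // a number: push it
        }
    }
    return operands.top();  // the single remaining value is the result
}
```

No parentheses or precedence rules are needed: each operator simply consumes the two most recent values on the stack, which is why postfix is easier to program for than infix.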

Page 83: Data Structures

Stacks + Math
• Math done in “standard” (i.e., infix) notation is typically first converted to postfix notation for actual computation.
  – One well-known algorithm for this conversion is the shunting-yard algorithm. It’s up on Wikipedia, so feel free to take a look.

Page 84: Data Structures

Stacks
• C++ provides the std::stack class.
  – This implementation is something of a “wrapper class” that uses a vector, list, or deque internally, limiting it to stack-like behavior.
    • We’ll see deques in a moment.
    • std::stack exposes push(), pop(), and top(); it implements them using the underlying container’s push_back(), pop_back(), and back().

Page 85: Data Structures

Questions?
• Home exercise: implement and use a stack.

Page 86: Data Structures

Queues
• The data structure known as a queue is a “First In, First Out” (FIFO) structure.
  – That is, the first input to the structure is the first output obtained from it.
  – Consider a line of people: the person in front has priority to whatever the line is waiting on… like buying tickets at the movies or gaining access to a sports event.

Page 87: Data Structures

Queues
• Queues are significantly like lists, except that we have additional restrictions placed on them.
  – Additions may only happen at the list’s end.
  – Removals may only happen at the list’s beginning.
• As a result, standard array-based behavior may not be optimal.

Page 88: Data Structures

Queues

[Figure: queue contents over time, front to back: a; a b; a b c; b c]

Page 89: Data Structures

Queues
• In C++, the queue class is provided.
  – This implementation is also something of a “wrapper class” that uses a list or deque internally, limiting it to queue-like behavior.
    • list works well as a queue, as linked lists can easily be altered from both ends.
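
A minimal sketch of std::queue wrapping a std::list, as described above. The function name serve_in_order is made up for this illustration; push(), front(), pop(), and empty() are the standard std::queue members.

```cpp
#include <list>
#include <queue>
#include <string>

// Enqueues three names, then serves them in arrival order.
// std::list is chosen as the underlying container here; std::deque is
// the default when no second template argument is given.
std::string serve_in_order() {
    std::queue<std::string, std::list<std::string>> line;
    line.push("a");  // joins at the back
    line.push("b");
    line.push("c");
    std::string served;
    while (!line.empty()) {
        served += line.front();  // the earliest arrival is at the front
        line.pop();              // removal happens only at the front
    }
    return served;
}
```

The wrapper exposes no way to insert in the middle or remove from the back, which is the restriction that makes the structure a queue.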

Page 90: Data Structures

Stacks + Queues
• The “deque”, or double-ended queue, combines the behaviors of stacks and queues into a single structure.
  – Items may be added or removed at either end of the structure.
  – This allows for either LIFO or FIFO behavior; it’s all in how you use the structure.
    • Mixed behavior is also possible, so beware!

Page 91: Data Structures

Deques
• C++ defines the deque class for such uses.
  – This is a full-fledged object in its own right, and is array-based.
    • It may use multiple arrays and modular arithmetic, to allow efficient additions at the front, for example.
  – It is the default object used internally by both stack and queue.
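
The following sketch uses one std::deque first as a stack and then as a queue. The function name deque_demo is hypothetical; the member functions used are standard.

```cpp
#include <deque>
#include <vector>

// Uses one std::deque first as a stack (LIFO), then as a queue (FIFO),
// and records the values observed at each removal point.
std::vector<int> deque_demo() {
    std::vector<int> removed;
    std::deque<int> d;

    // Stack-like: add and remove at the same end.
    d.push_back(1);
    d.push_back(2);
    removed.push_back(d.back());  // 2, the last one in
    d.pop_back();
    d.pop_back();

    // Queue-like: add at the back, remove from the front.
    d.push_back(3);
    d.push_back(4);
    removed.push_back(d.front());  // 3, the first one in
    d.pop_front();

    // Adding at the front is also supported.
    d.push_front(5);
    removed.push_back(d.front());  // 5
    return removed;
}
```

Nothing in the deque itself enforces LIFO or FIFO order, so the discipline comes entirely from which end the calling code chooses.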

Page 92: Data Structures

Questions?
• Home exercise: implement and use a queue and a deque.