Data Structures


Fundamental Data Storage

Data Structures

• For sizeable programs, one problem that can quickly arise is that of data storage.
  – What is the most efficient or effective way to organize and utilize information within a program?
  – Quick answer: it depends on the task.

Data Structures

• For some tasks, it is helpful (at minimum) and possibly necessary to have sorted data.
• For other tasks, it is not necessary to note where any given piece of data is stored within a storage data structure.

Data Structures

• Note: while we have seen these in passing and as examples earlier in the course, we will now examine them a little more closely.

Arrays

• Possibly the most basic non-trivial data storage structure is that of the array.
  – We’ve already seen the notion of a “vector” that dynamically resizes.

[Figure: an array of ten cells, indexed 0 through 9]

Beyond Arrays

• Note that the main structure being implemented by an array is effectively that of an ordered list.
  – Just like with an array, each element being stored has a specific location, which implies an ordering.

[Figure: an array of ten cells, indexed 0 through 9]

Beyond Arrays

• In Java, there is an ArrayList class in the java.util package.
  – This class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
  – This resizing is handled internally and automatically by the class.

Beyond Arrays

• In C++, there is a vector class as part of the std namespace.
  – Likewise, this class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
  – This resizing is also handled internally and automatically by the class.

Beyond Arrays

• However, arrays are not the only way to model a list.
  – Another such model is that of the linked list. (See the graphic below.)

Linked Lists

• The linked list stores each data element separately and individually, allocating space for new elements as they are added to the list.

Linked Lists

• Adding data to the end of a linked list is trivial, as it (usually) also is for an array.

Linked Lists

• Adding data in the middle of the list, or at its beginning, is (relatively) very time-consuming for an array.
• For a linked list, however, it is often a much simpler operation.

Adding Elements

• Remember that for an array, elements are in fixed locations.
• Inserting an element into the middle of an array requires moving all elements at and after the point of insertion, e.g., insert 7 at index 3.

[Figure: inserting 7 at index 3 of an array whose cells are indexed 0 through 9]

Before: 3 8 1 2 4 13 42 9 5

Adding Elements

After: 3 8 1 7 2 4 13 42 9 5

Adding Elements

• For a linked list, however, each element’s storage space is distinct and separate from the others.
• New storage may be placed directly in the middle of the chain.


Linked Lists

• Naturally, there is the question of what these “links of the chain” actually are, or more properly, how to represent them.

Linked Lists

• In their most basic and simple form…

template <typename T>
class Node {
public:
    T value;        // the data stored in this link
    Node<T>* next;  // pointer to the following node in the chain
};

[Figure: a node drawn as two boxes, value and next]

Linked Lists

Remember: the class Node<T> doesn’t actually contain another Node<T>, just a pointer to the next one in line (a reference, in Java terms).

Linked Lists

The end of the “linked list chain” is denoted by a null pointer (nullptr in C++) in the last node.

The “ground” symbol at the end denotes this.

Lists

• Note that we now have two different ways of storing data, each of which has its own pros and cons.
  – Arrays
    • Good for adding items to the end of lists and for random access to items within the list.
    • Bad for cases with many additions and removals at various places within the list.
  – Linked Lists
    • Better for adding and removing items at random locations within the list.
    • Bad at randomly accessing items from the list.
      – Note that to use a random item within the list, we must traverse the chain to find it.

Lists

• Note that both of these objects fulfill the same end goal: to represent a group of objects with some implied ordering upon them.
• While they meet this goal differently, their primary purpose is identical.

Templates

• Templates are integral to generic programming in C++.
  – A template is like a blueprint.
  – The blueprint is used to instantiate a function when it is actually used in code.
  – “Actual” types are substituted in for the “formal” types of the template.

Why Templates?

What is the difference between the following two functions?

int compare(const string &v1, const string &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

int compare(const double &v1, const double &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

Only the types!

Why Templates?

What if we could write the function once for any type and have the compiler just use the right types?

template <typename T>
int compare(const T &v1, const T &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}

Requires type T to have the < operator.

Exercise 1

• Implement the generic compare function.
• Implement a main() that compares two doubles, two ints, two chars, and two strings using the compare function.
• Compile and see that it is good!

What is Going On?

• When a template is defined, the compiler sees only its structure: a blueprint, typically coded in a header.
• When a call to the function is seen, the compiler substitutes the types used in the invocation into the blueprint and generates the required code.
• Many errors can’t be caught until an invocation is seen.

Abstracting Beyond Lists

• We have this notion of a “list” structure, which maps its stored objects to indices.
  – What if we don’t actually need to have a lookup position for our stored objects?
    • But wait! How could we possibly iterate over the objects in a for loop?

The Iterator

• Many programming languages provide objects called iterators for enumerating objects contained within data structures.
  – C++ and Java are no exceptions.
  – C++’s versions are defined in the <iterator> header file.
  – (see 3.4 – 3.5)

The Iterator

• This iterator may be used to get each contained object in order, one at a time, in a controllable manner.
  – It’s especially designed to work well with for loops.

The Iterator

• Example code:

vector<int> numbers;

// omitted code initializing numbers.

vector<int>::iterator iter;  // the iterator type belongs to the container
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}

The Iterator

• In C++, iterators are designed to look like and act something like pointers.
  – The * and -> operators are overloaded to give pointer-like semantics, allowing users of the iterator object to “dereference” the object currently “referenced” by the iterator.

The Iterator

• In C++, iterators are designed to look like and act something like pointers.
  – Note the use of operator ++ to increment the iterator to the next item.
    • This is another way we can interact with pointers; it’s useful for iterating across an array while using pointer semantics… but keep a copy of the original around!


The Iterator

• C++11 and later standards also provide an alternate version of the for loop, which is designed to work with iterable structures and iterators.
• It looks like “foreach” in other languages:

vector<Person> structure;
for (Person &p : structure) {
    // Code.
}

The Iterator

• Both the std::vector and std::list classes of C++ implement iterators.
  – begin() returns an iterator to the list’s first element.
  – end() is a special iterator “just after” the final element of the list, useful for checking when we’re done with iteration.
  – Use != to check for termination.

Exercise 2

• Include the <iterator> header.
• Use an iterator to walk through an array you define and print out its contents.
• Compile and run.
• See that it is good.

Abstracting Beyond Lists

• There are many, many other techniques for storing data than the model of a list.
  – Such other data structures have different techniques for accessing stored data.
  – You have seen one in your lab exercises.

Other Data Structures

• Let’s move on from this idea of a “list” structure.
• In particular, note how lists map their stored objects to indices (or can map an index to the stored object).
  – What if we don’t actually need to have a lookup position for our stored objects?
  – In particular, does it really need to be an integer?

Other Data Structures

• There are many, many other techniques for storing data than the model of a list.
  – Such other data structures have different techniques for accessing and handling stored data.
  – These “different techniques” are often designed with a focus on different usage patterns.

Other Data Structures

• A first example: arrays index their contained objects by integers.
  – Should integers be the only thing by which we can index an item within a collection-oriented data structure?
  – Think up some examples with your neighbors.

[Figure: sample entries: bear, apple, A113, cake, 42, blue, red, …]

Maps

• The interface built on this idea within Java is the Map.
• TreeMap and HashMap are the two prominent implementations.
  – The value is the object being stored within the map.
  – The key is the data element used as an index into the map for that value (i.e., how you “look up” the value).
  – The key is like a key in a database, sometimes called a “tag” in associative memory.

Maps

• The classes built on this idea within C++ are map and unordered_map.
• Sidenote: these are also not polymorphically related.
  – map stores items in order of keys.
  – unordered_map does not require keys to have an order relation at all!

Maps

• How would such a map work?
  – We could just use matching arrays for the keys and values.
  – However, this wouldn’t be the most efficient idea; better techniques are known.

Hash Maps

• Hash maps work by converting the key to a unique integer, where possible, through a hashing function.
  – C++: hash maps are represented by unordered_map.
  – The selection of such a function is not a simple operation.
    • As such, unordered_map accepts a hashing function (as a template parameter, and optionally as a constructor argument) that maps each key to a nearly-unique integer.

Hash Maps

• This “hash code” is then mapped into an array for storage.
  – Problem: the “hash code” can easily be larger than the storage array’s size.
  – Solution: modular arithmetic. Divide by the array’s size and use the remainder.

Hash Maps

New input: (“Football”, “Will”)

hash(“Football”) = -2070369658
-2070369658 mod 7 = 0

i   Key            Value
0   “Football”     “Will”
1
2
3
4
5
6

Hash Maps

New input: (“Basketball”, “Billy”)

hash(“Horton”) = -2127646392
-2127646392 mod 7 = -4 => 3

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5
6

Hash Maps

New input: (“Gymnastics”, “Rhonda”)

hash(“Gymnastics”) = 2068792
2068792 mod 7 = 5

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6

Hash Maps

New input: (“Soccer”, “Becky”)

hash(“Soccer”) = -2026118662
-2026118662 mod 7 = -1 => 6

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6   “Soccer”       “Becky”

Hash Maps

• Pros:
  – direct, instant lookup of values, regardless of the key’s type.
• Cons:
  – does not support sorting.
  – requires a specialized hashing function for keys that creates a (nearly) unique int for each possible key.

Map Example

#include <iostream>
#include <iterator>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, size_t> word_count;  // maps each word to how often it occurs
    string word;
    while (cin >> word) {
        ++word_count[word];  // use map to look up value
    }
    for (const auto &w : word_count) {  // iterator
        cout << w.first << " occurs " << w.second
             << ((w.second > 1) ? " times " : " time ") << endl;
    }
    return 0;
}

Exercise 3

• Include the <unordered_map> header.
• Use an unordered map
  – to store >= four <key, value> pairs of your choice.
  – Look up values based on keys and print them.
  – Or code up the previous example.
• Compile and run.
• See that it is good.

Maps

• What if we want to have the entries sorted by their keys?
  – It is possible to build structures that efficiently keep their data permanently sorted by key!

Binary Tree

• The binary tree is an example of one structure that can accomplish this.
  – Think of it as a linked list, but with two links per node instead of one.

Binary Tree

• The corresponding Java structure is the TreeMap class.
  – It implements the SortedMap interface.

Binary Tree

• The corresponding C++ structure, on the other hand, is the std::map class.

Binary Tree

• The “first” node of the tree is called the root.
  – Any key smaller than the root’s key is in the left branch.
  – Any key larger than the root’s key is in the right branch.

Binary Tree

          13  (root)
        /    \
       7      25
      / \    /  \
     2   9  17   42

Binary Tree

• Binary trees require the ability to compare the keys.
  – C++ assumes that operator< has been overloaded for custom data types.

Binary Tree

• Of particular note with binary trees: operations on them tend to be highly recursive due to their structure.
  – You’ve done this in lab, twice now!

Binary Tree

• Pros:
  – the items are always in an established, sorted order! (By key)
• Pro/Con:
  – accesses are slower than an unordered_map, but generally faster than a list.

Questions?

• You have already implemented trees.

Input/Output Modeling

• Certain other structures exist to model specialized, restricted input and output behavior.
  – Consider the usual interaction someone might have with a stack of papers.
  – Another possibility: the usual behavior of a group of people waiting in line… in a queue waiting to be served.

Stacks

• The data structure known as a stack is a “Last In, First Out” (LIFO) structure.
  – That is, the last input to the structure is the first output obtained from it.
  – Consider a stack of papers: when searching through it, one typically starts at the top and searches downward, from newest to oldest.

Stacks

[Figure: a stack after each step (bottom to top): a | a b | a b c | a b | a b d | a b]

Stacks

• Stacks are a very good model for function calls.
  – When function A calls function B, B must complete before A resumes operation.
    • Similarly, if B calls C, C completes before B.
  – A may then call other methods before completing.

Stacks

[Figure: the call stack after each step (bottom to top): a | a b | a b c | a b | a b d | a b]

Stacks

• Stacks are a very good model for function calls.
  – In fact, this is one reason why we’re examining them now. Stacks are the model of how recursion mechanically works.
  – In turn, recursion is necessary for operating upon many data structures.

Stacks

• When debugging, the stack trace (or call stack) of a program at a given point of execution is exactly this: a description of the order of active method calls within the program.
• The area of memory where function data lives is literally called the stack space.

Stacks + Math

• Stacks have often been used in mathematical operations.
  – Some graphing calculators use what is called “Reverse Polish Notation” (RPN), which is based upon postfix operators.
  – Combined with a stack, this notation is much easier to program for than infix operations.

Stacks + Math

• Let’s consider the following mathematical expression:

2 + 5 * 7 - 6 / 3

• In what order do we perform the operations?
  – Consider trying to code something that would be able to interpret this!

Stacks + Math

• Using the standard order of operations, this becomes:

2 + (5 * 7) - (6 / 3)

• The postfix notation for this:

2 5 7 * + 6 3 / -
((2 (5 7 *) +) (6 3 /) -)

Stacks + Math

2 + (5 * 7) - (6 / 3)
2 + (35) - (2)
37 - 2
35

Stacks + Math

2 5 7 * + 6 3 / -

• Let’s see how this facilitates getting the right answer.

Stacks + Math

2 5 7 * + 6 3 / -

Stack after each token (bottom to top):
2:  2
5:  2 5
7:  2 5 7
*:  2 35
+:  37
6:  37 6

Stacks + Math

2 + (5 * 7) - (6 / 3)
2 + (35) - (6 / 3)
37 - (6 / 3)

Stacks + Math

2 5 7 * + 6 3 / -
37 6 3 / -

Stack after each remaining token (bottom to top):
6:  37 6
3:  37 6 3
/:  37 2
-:  35

Stacks + Math

2 + (5 * 7) - (6 / 3)
2 + (35) - (6 / 3)
37 - (6 / 3)
37 - 2
35

Stacks + Math

• Math done in “standard” (i.e., infix) notation is typically first converted to postfix notation for actual computation.
  – This “conversion” is known as the Shunting-yard algorithm. It’s up on Wikipedia, so feel free to take a look.

Stacks

• C++ provides the std::stack class.
  – This implementation is something of a “wrapper class” that uses a vector, list, or deque internally, limiting it to stack-like behavior.
    • We’ll see deques in a moment.
    • std::stack itself exposes push(), pop(), and top(); these are implemented through the underlying container’s push_back(), pop_back(), and back().

Questions?

• Home exercise: implement and use a stack.

Queues

• The data structure known as a queue is a “First In, First Out” (FIFO) structure.
  – That is, the first input to the structure is the first output obtained from it.
  – Consider a line of people: the person in front has priority to whatever the line is waiting on… like buying tickets at the movies or gaining access to a sports event.

Queues

• Queues are much like lists, except that we have additional restrictions placed on them.
  – Additions may only happen at the list’s end.
  – Removals may only happen at the list’s beginning.
• As a result, standard array-based behavior may not be optimal.

Queues

[Figure: a queue as a, b, and c are added at the end and removed from the front]

Queues

• In C++, the queue class is provided.
  – This implementation is also something of a “wrapper class” that uses a list or deque internally, limiting it to queue-like behavior.
    • list works well as a queue, as linked lists can easily be altered from both ends.

Stacks + Queues

• The “deque”, or double-ended queue, combines the behaviors of stacks and queues into a single structure.
  – Items may be added or removed at either end of the structure.
  – This allows for either LIFO or FIFO behavior; it’s all in how you use the structure.
    • Mixed behavior is also possible, so beware!

Deques

• C++ defines the deque class for such uses.
  – This is a full-fledged class in its own right, and is array-based.
    • It may use multiple arrays and modular arithmetic to allow efficient additions at the front, for example.
  – It is the default container used internally by both stack and queue.

Questions?

• Home exercise: implement and use a queue and a deque.
