Data Structures
Fundamental Data Storage
Data Structures
• For sizeable programs, one problem that can quickly arise is that of data storage.
– What is the most efficient or effective way to organize and utilize information within a program?
– Quick answer: it depends on the task.
Data Structures
• For some tasks, it is helpful (at minimum) and possibly necessary to have sorted data.
• For other tasks, it is not necessary to note where any given piece of data is stored within a storage data structure.
Data Structures
• Note: while we have seen these in passing and as examples earlier in the course, we will now examine these a little more closely.
Arrays
• Possibly the most basic non-trivial data storage structure is that of the array.
– We’ve already seen the notion of a “vector” that dynamically resizes.
[Figure: an array whose slots are labeled with the indices 0 through 9]
Beyond Arrays
• Note that the main structure being implemented by an array is effectively that of an ordered list.
– Just like with an array, each element being stored has a specific location, which implies an ordering.
[Figure: an array whose slots are labeled with the indices 0 through 9]
Beyond Arrays
• In Java, there is an ArrayList class in the java.util package.
– This class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
– This resizing is handled internally and automatically by the class.
Beyond Arrays
• In C++, there is a vector class as part of the std namespace.
– Likewise, this class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list.
– This resizing is also handled internally and automatically by the class.
Beyond Arrays
• However, arrays are not the only way to model a list.
– Another such model is that of the linked list. (See the graphic below.)
Linked Lists
• The linked list stores each data element separately and individually, allocating space for new elements as they are added to the list.
Linked Lists
• Adding data to the end of a linked list is trivial, as it (usually) also is for an array.
Linked Lists
• Adding data in the middle of the list, or at its beginning, is (relatively) very time-consuming for an array.
• For a linked list, however, it is often a much simpler operation.
Adding Elements
• Remember that for an array, elements are in fixed locations.
• To insert an element into the middle of an array requires moving all elements at and after the point of insertion, e.g., insert 7 at index 3.
[Figure: the array 3 8 1 2 4 13 42 9 5, with index labels, before the insertion]
Adding Elements
[Figure: the elements from index 3 onward shift right by one slot, giving 3 8 1 7 2 4 13 42 9 5]
Adding Elements
• For a linked list, however, each element’s storage space is distinct and separate from the others.
• New storage may be placed directly in the middle of the chain.
Adding Elements
Linked Lists
• Naturally, there is the question of what these “links of the chain” actually are, or more properly, how to represent them.
Linked Lists
• In their most basic and simple form…
template <typename T>
class Node {
public:
    T value;        // the data stored in this link
    Node<T>* next;  // pointer to the next link in the chain
};
Linked Lists
[Figure: a node drawn as two boxes, labeled value and next]
Linked Lists
Remember: next is a pointer, so a Node<T> doesn’t actually contain another Node<T>, just the address of the next one in line.
Linked Lists
The end of the “linked list chain” is denoted by a null pointer (nullptr) in the last node.
The “ground” symbol at the end denotes this.
Lists
• Note that we now have two different ways of storing data, each of which has its own pros and cons.
– Arrays
• Good for adding items to the end of lists and for random access to items within the list.
• Bad for cases with many additions and removals at various places within the list.
Lists
• Note that we now have two different ways of storing data, each of which has its own pros and cons.
– Linked Lists
• Better for adding and removing items at random locations within the list.
• Bad at randomly accessing items from the list.
– Note that to use a random item within the list, we must traverse the chain to find it.
Lists
• Note that both of these objects fulfill the same end goal: to represent a group of objects with some implied ordering upon them.
• While they meet this goal differently, their primary purpose is identical.
Templates
• Templates are integral to generic programming in C++.
– A template is like a blueprint.
– The blueprint is used to instantiate a function when it is actually used in code.
– “Actual” types are substituted in for the “formal” types of the template.
Why Templates?
What is the difference between the following two functions?
int compare(const string &v1, const string &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}
int compare(const double &v1, const double &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}
Only the types!
Why Templates?
What if we could write the function once for any type and have the compiler just use the right types?
template <typename T>
int compare(const T &v1, const T &v2) {
    if (v1 < v2) return -1;
    if (v2 < v1) return 1;
    return 0;
}
Requires type T to have a < operator.
Exercise 1
• Implement the generic compare function.
• Implement a main() that compares two doubles, two ints, two chars, and two strings using the compare function.
• Compile and see that it is good!
What is Going On?
• The compiler sees only the structure (the blueprint) when the template is defined, i.e., when the generic function is coded (in a header).
• When a call to the function is seen, the compiler substitutes the types used in the invocation into the blueprint and generates the required code.
• It can’t catch many errors until an invocation is seen.
Abstracting Beyond Lists
• We have this notion of a “list” structure, which maps its stored objects to indices.
– What if we don’t actually need to have a lookup position for our stored objects?
• But wait! How could we possibly iterate over the objects in a for loop?
The Iterator
• Many programming languages provide objects called iterators for enumerating objects contained within data structures.
– C++ and Java are no exceptions.
– C++’s versions are defined in the <iterator> header file.
– (see 3.4 – 3.5)
The Iterator
• This iterator may be used to get each contained object in order, one at a time, in a controllable manner.
– It’s especially designed to work well with for loops.
The Iterator
• Example code:
vector<int> numbers;
// omitted code initializing numbers.
vector<int>::iterator iter;  // the iterator type belongs to the container
for (iter = numbers.begin(); iter != numbers.end(); iter++) {
    cout << *iter << ' ';
}
The Iterator
• In C++, iterators are designed to look like and act something like pointers.
– The * and -> operators are overloaded to give pointer-like semantics, allowing users of the iterator object to “dereference” the object currently “referenced” by the iterator.
The Iterator
• In C++, iterators are designed to look like and act something like pointers.
– Note the use of operator ++ to increment the iterator to the next item.
• This is another way we can interact with pointers; it’s useful for iterating across an array while using pointer semantics… but keep a copy of the original around!
The Iterator
• C++11 also provides an alternate version of the for loop (the range-based for), which is designed to work with iterable structures and iterators.
• Looks like “foreach” in other languages:
vector<Person> structure;
for (Person &p : structure) {
    // Code.
}
The Iterator
• Both the std::vector and std::list classes of C++ implement iterators.
– begin() returns an iterator to the list’s first element.
– end() is a special iterator “just after” the final element of the list, useful for checking when we’re done with iteration.
– Use != to check for termination.
Exercise 2
• Include the <iterator> header.
• Use an iterator to walk through an array you define and print out its contents.
• Compile and run.
• See that it is good.
Abstracting Beyond Lists
• There are many, many other techniques for storing data than the model of a list.
– Such other data structures have different techniques for accessing stored data.
– You have seen one in your lab exercises.
Other Data Structures
• Let’s move on from this idea of a “list” structure.
• In particular, note how lists map their stored objects to indices (or can map an index to the stored object).
– What if we don’t actually need to have a lookup position for our stored objects?
– In particular, does it really need to be an integer?
Other Data Structures
• There are many, many other techniques for storing data than the model of a list.
– Such other data structures have different techniques for accessing and handling stored data.
– These “different techniques” are often designed with a focus on different usage patterns.
Other Data Structures
• A first example: arrays index their contained objects by integers.
– Should integers be the only thing by which we can index an item within a collection-oriented data structure?
– Think up some examples with neighbors.
[Figure: a collection whose entries carry non-integer labels such as apple, bear, cake, A113, 42, blue, red, …]
Maps
• The interface built on this idea within Java is the Map.
• TreeMap and HashMap are the two prominent implementations.
– The value is the object being stored within the map.
– The key is the data element used as an index into the map for that value (i.e., how you “look up” the value).
– The key is like a key in a database, sometimes called a “tag” in associative memory.
Maps
• The classes built on this idea within C++ are map and unordered_map.
• Sidenote: these are also not polymorphically related.
– map stores items in order of keys.
– unordered_map does not require keys to have an order relation at all!
Maps
• How would such a map work?
– We could just use matching arrays for the keys and values.
– However, this wouldn’t be the most efficient idea; better techniques are known.
Hash Maps
• Hash maps work by converting the key to a unique integer, where possible, through a hashing function.
– C++: hash maps are represented by unordered_map.
– The selection of such a function is not a simple operation.
• As such, unordered_map lets you supply a hashing function (std::hash is used by default), mapping each key to a nearly-unique integer.
Hash Maps
• This “hash code” is then mapped into an array for storage.
– Problem: the “hash code” can easily be larger than the storage array’s size.
– Solution: modular arithmetic. Divide by the array’s size and use the remainder.
Hash Maps
New input: (“Football”, “Will”)
hash(“Football”) = -2070369658
-2070369658 mod 7 = 0

i   Key            Value
0   “Football”     “Will”
1
2
3
4
5
6
Hash Maps
New input: (“Basketball”, “Billy”)
hash(“Horton”) = -2127646392
-2127646392 mod 7 = -4 => 3

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5
6
Hash Maps
New input: (“Gymnastics”, “Rhonda”)
hash(“Gymnastics”) = 2068792
2068792 mod 7 = 5

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6
Hash Maps
New input: (“Soccer”, “Becky”)
hash(“Soccer”) = -2026118662
-2026118662 mod 7 = -1 => 6

i   Key            Value
0   “Football”     “Will”
1
2
3   “Basketball”   “Billy”
4
5   “Gymnastics”   “Rhonda”
6   “Soccer”       “Becky”
Hash Maps
• Pros:
– direct, near-instant lookup of values (constant time on average), regardless of the key’s type.
• Cons:
– does not support sorting.
– requires a specialized hashing function for keys that creates a nearly-unique int for each possible key.
Map Example
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, size_t> word_count;
    string word;
    while (cin >> word) {
        ++word_count[word];  // use map to look up (or create) the count
    }
    for (const auto &w : word_count) {  // range-based for uses the map's iterators
        cout << w.first << " occurs " << w.second
             << ((w.second > 1) ? " times" : " time") << endl;
    }
    return 0;
}
Exercise 3
• Include the <map> header.
• Use an unordered map:
– to store >= four <key, value> pairs of your choice.
– Look up values based on keys and print.
– Or code up the previous example.
• Compile and run.
• See that it is good.
Maps
• What if we want to have the entries sorted by their keys?
– It is possible to build structures that efficiently keep their data permanently sorted by key!
Binary Tree
• The binary tree is an example of one structure that can accomplish this.
– Think of it as a linked list, but with two links per node instead of one.
Binary Tree
• The corresponding Java structure is the TreeMap class.
– It implements the SortedMap interface.
Binary Tree
• The corresponding C++ structure, on the other hand, is the std::map class.
Binary Tree
• The “first” node of the tree is called the root.
– Any key smaller than the root’s key is in the left branch.
– Any key larger than the root’s key is in the right branch.
Binary Tree
[Figure: a binary tree whose root is 13; its children are 7 and 25; below 7 are 2 and 9, and below 25 are 17 and 42]
Binary Tree
• Binary trees require the ability to compare the keys.
– C++ assumes that operator< has been overloaded for custom data types.
Binary Tree
• Of particular note with binary trees: operations on them tend to be highly recursive due to their structure.
– You’ve done this in lab, twice now!
Binary Tree
• Pros:
– the items are always in an established, sorted order! (By key)
• Pro/Con:
– accesses are slower than an unordered_map, but generally faster than a list.
Questions?
• You have already implemented trees.
Input/Output Modeling
• Certain other structures exist to model specialized, restricted input and output behavior.
– Consider the usual interaction someone might have with a stack of papers.
– Another possibility: the usual behavior of a group of people waiting in line… in a queue waiting to be served.
Stacks
• The data structure known as a stack is a “Last In, First Out” (LIFO) structure.
– That is, the last input to the structure is the first output obtained from it.
– Consider a stack of papers: when searching through it, one typically starts at the top and searches downward, from newest to oldest.
Stacks
[Figure: successive states of a stack, listed bottom to top: a; a b; a b c; a b; a b d; a b]
Stacks
• Stacks are a very good model for function calls.
– When function A calls function B, B must complete before A resumes operation.
• Similarly, if B calls C, C completes before B.
– A may then call other methods before completing.
Stacks
• Stacks are a very good model for function calls.
– In fact, this is one reason why we’re examining it now. Stacks are the model of how recursion mechanically works.
– In turn, recursion is necessary for operating upon many data structures.
Stacks
• When debugging, the stack trace (or call stack) of a program at a given point of execution is exactly this: a description of the order of active method calls within the program.
• The area of memory where function data lives is literally called the stack space.
Stacks + Math
• Stacks have often been used in mathematical operations.
– Some graphing calculators use what is called “Reverse Polish Notation” (RPN), which is based upon postfix operators.
– Combined with a stack, this notation is much easier to program for than infix operations.
Stacks + Math
• Let’s consider the following mathematical expression:
2 + 5 * 7 - 6 / 3
• In what order do we perform the operations?
– Consider trying to code something that would be able to interpret this!
Stacks + Math
• Using the standard order of operations, this becomes:
2 + (5 * 7) - (6 / 3)
• The postfix notation for this:
2 5 7 * + 6 3 / -
((2 (5 7 *) +) (6 3 /) -)
Stacks + Math
2 + (5 * 7) - (6 / 3)
2 + (35) - (2)
37 - 2
35
Stacks + Math
2 5 7 * + 6 3 / -
• Let’s see how this facilitates getting the right answer.
Stacks + Math
2 5 7 * + 6 3 / -
[Figure: successive stack states, listed bottom to top, while reading the expression: 2; 2 5; 2 5 7; 2 35; 37; 37 6]
Stacks + Math
2 + (5 * 7) - (6 / 3)
2 + (35) - (6 / 3)
37 - (6 / 3)
Stacks + Math
2 5 7 * + 6 3 / -
37 6 3 / -
[Figure: successive stack states, listed bottom to top, for the rest of the expression: 37 6; 37 6 3; 37 2; 35]
Stacks + Math
2 + (5 * 7) - (6 / 3)
2 + (35) - (6 / 3)
37 - (6 / 3)
37 - 2
35
Stacks + Math
• Math written in “standard” (i.e., infix) notation is typically first converted to postfix notation for actual computation.
– A well-known algorithm for this conversion is the shunting-yard algorithm. It’s up on Wikipedia, so feel free to take a look.
Stacks
• C++ provides the std::stack class.
– This implementation is something of a “wrapper class” that uses a vector, list, or deque internally, limiting it to stack-like behavior.
• We’ll see deques in a moment.
• Its push(), pop(), and top() methods are implemented with the underlying container’s push_back(), pop_back(), and back().
Questions?
• Home exercise: implement and use a stack.
Queues
• The data structure known as a queue is a “First In, First Out” (FIFO) structure.
– That is, the first input to the structure is the first output obtained from it.
– Consider a line of people: the person in front has priority to whatever the line is waiting on… like buying tickets at the movies or gaining access to a sports event.
Queues
• Queues are significantly like lists, except that we have additional restrictions placed on them.
– Additions may only happen at the list’s end.
– Removals may only happen at the list’s beginning.
• As a result, standard array-based behavior may not be optimal.
Queues
[Figure: the elements a, b, and c entering a queue at one end and leaving from the other]
Queues
• In C++, the queue class is provided.
– This implementation is also something of a “wrapper class” that uses a list or deque internally, limiting it to queue-like behavior.
• list works well as a queue, as linked lists can easily be altered from both ends.
Stacks + Queues
• The “deque”, or double-ended queue, combines the behaviors of stacks and queues into a single structure.
– Items may be added or removed at either end of the structure.
– This allows for either LIFO or FIFO behavior; it’s all in how you use the structure.
• Mixed behavior is also possible, so beware!
Deques
• C++ defines the deque class for such uses.
– This is a full-fledged object in its own right, and is array-based.
• It may use multiple arrays and modular arithmetic, to allow efficient additions at the front, for example.
– It is the default object used internally by both stack and queue.
Questions?
• Home exercise: implement and use a queue and a deque.