Problem-solving on large-scale clusters:
theory and applications
Lecture 1: Introduction and Theoretical Background
Today’s Outline
• Introductions
• Quiz
• Course Objective & Administrative Info
• fold and map: Theory
Introductions
• Name + trivia
Quiz Time!
• Not graded; helps us calibrate how difficult to make this seminar
• Okay (and encouraged!) to leave questions blank
Course Outline
• Introduction to parallel programming and distributed system design
– successfully decompose problems into map and reduce stages
– decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses
– understand the basic tradeoffs and major issues in distributed system design
– know the common pitfalls of distributed system design
• This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”
Course Information (1 of 2)
• Lecturers:
– Albert J. Wong
– Hannah Tang
• Lab consultant:
– Alden King
• Liaisons:
– John Zahorjan
– Christophe Bisciglia
Course Information (2 of 2)
• Textbook
– None; see online course readings
• Webpage: http://www.cs.washington.edu/cse490h
• Mailing lists:
– Course discussion: cse490h@...
Warning: Theory Ahead!
• Before we can talk about MapReduce, we need to talk about the concepts on which it is founded:
– Programming languages: fold and map
– Distributed systems: data dependencies
Digression: Function Objects (1 of 3)
• A function object is a function that can be manipulated as an object
– Sometimes referred to as a “functor”
• In Java, this is usually implemented with a class that has an execute() (or similarly named) method
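import java.util.*;

// A function object: its compare() method is the operation handed to sort().
class ReverseAlphaOrder implements Comparator<String> {
  public int compare(String s1, String s2) {
    // Swapping the operands reverses the natural (alphabetical) order.
    return s2.compareTo(s1);
  }
}

// Example values; any list of strings works.
List<String> myStrings = new ArrayList<>(Arrays.asList("map", "fold", "reduce"));
ReverseAlphaOrder rao = new ReverseAlphaOrder();
Collections.sort(myStrings, rao);  // myStrings is now [reduce, map, fold]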
Digression: Function Objects (2 of 3)
• Example: Implementing the Comparator interface to use Collections.sort()
• The underlying idea is to pass the “greater than” operation to sort()
Digression: Function Objects (3 of 3)
• In Java, methods that take function objects are “higher-order functions”
– Collections.sort() is a higher-order function
• Mathematically, a “higher-order function” is a function which does at least one of the following:
– Take one or more functions as input
– Output a function
• Examples:
– The derivative (from calculus)
d/dx (x^3 + 2x) = 3x^2 + 2
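Both halves of the definition can be sketched in Haskell, the notation this lecture uses for fold and map. This is a minimal illustration, not part of the course code: the names reverseAlpha, derivative and cubic, and the step size h, are assumptions made for the example.

```haskell
import Data.List (sortBy)

-- Takes a function as input: the comparison decides the order,
-- much like passing a comparator object to Collections.sort().
reverseAlpha :: [String] -> [String]
reverseAlpha = sortBy (\a b -> compare b a)

-- Takes a function and outputs a function: a central-difference
-- approximation of the derivative. The step size h is an assumption.
derivative :: (Double -> Double) -> (Double -> Double)
derivative f = \x -> (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-5

-- The example above: f(x) = x^3 + 2x, whose derivative is 3x^2 + 2.
cubic :: Double -> Double
cubic x = x ^ 3 + 2 * x
```

Loaded into GHCi, derivative cubic 2 should come out close to 14, the value of 3x^2 + 2 at x = 2; the result is only approximate because the derivative is computed numerically.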
fold - Introduction
• fold is a family of higher-order functions that process a data structure and return a single value
– Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l
– The return value may be “complex”, e.g. a list
• Example:
– fold (+) [1,2,4,8] -> ???
– fold (/) [64,8,4,2] -> ???
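The two points above can be tried out with Haskell's Prelude. Unlike the slide's shorthand, Prelude's foldr takes an explicit initial value; the names total and reversed and the input lists are assumptions chosen for this sketch.

```haskell
-- Combine the elements with (+), starting from 0.
total :: Int
total = foldr (+) 0 [1, 2, 3, 4]          -- 1 + (2 + (3 + (4 + 0)))

-- The result of a fold may itself be a list: each element is appended
-- after the already-folded tail, which reverses the input.
reversed :: [Int]
reversed = foldr (\x acc -> acc ++ [x]) [] [1, 2, 3]
```

Here total evaluates to 10 and reversed to [3,2,1].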
fold - Directionality
• Remember how we said fold was “a family of functions”?
– foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16
– foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1
• “fold right” – recursively applies f over the right side of the list
• “fold left” – recursively applies f over the left side of the list
[Figure: expression trees for the two folds of (/) over [64,8,4,2]. Right fold: 64 ÷ (8 ÷ (4 ÷ 2)). Left fold: ((64 ÷ 8) ÷ 4) ÷ 2.]
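The two groupings can be checked directly in Haskell. This sketch uses foldr1 and foldl1, the Prelude variants that take no initial value and so match the notation on this slide; the names rightFold and leftFold are assumptions.

```haskell
-- foldr1 and foldl1 take no initial value, so they fail on an empty list.
rightFold :: Double
rightFold = foldr1 (/) [64, 8, 4, 2]   -- 64 / (8 / (4 / 2))

leftFold :: Double
leftFold = foldl1 (/) [64, 8, 4, 2]    -- ((64 / 8) / 4) / 2
```

rightFold evaluates to 16.0 and leftFold to 1.0, so the direction of the fold matters whenever f is not associative.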
fold - Questions
• Discussion questions:
– What should the base case return?
• foldr (+) [] -> ???
• foldr (/) [] -> ???
– Can a right fold be implemented as a loop (using tail recursion)? What about left fold?
• Enrichment questions:
– What happens to a right fold when given an infinite list? What about left fold?
fold - Formal Definition
• fold takes a function and a list as its inputs, but it can also take more values.
– In particular, fold maintains context / state across each invocation of f
-- If the list is empty, return the initial value 'z'
foldr f z [] = z
-- If the list is not empty, calculate the result of folding the
-- rest, and apply f to the first element and to that result.
-- The context from previous invocations of f is implicitly
-- passed to the current invocation of f via foldr
foldr f z (x:xs) = f x (foldr f z xs)
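To experiment with this definition, it can be loaded under a different name so that it does not clash with Prelude's foldr. The names myFoldr and totalLength are assumptions for this sketch; the type signature is the one Prelude gives foldr for lists.

```haskell
-- The same two equations, renamed to avoid a clash with Prelude.foldr.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- The element type (a) and the state type (b) may differ:
-- here the elements are Strings and the folded state is an Int.
totalLength :: [String] -> Int
totalLength = myFoldr (\s n -> length s + n) 0
```

For example, totalLength ["fold", "map"] evaluates to 7.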
What is the formal definition of foldl?
fold – An Intuition
• fold “iterates” over a data structure, and maintains one unit of state
– At each iteration, f is invoked with the current element and the current state
– fold’s return value is the result of f’s final invocation
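One way to watch the state evolve is Prelude's scanl, which behaves like foldl but keeps every intermediate state. The input list and the names states and final are assumptions for this sketch.

```haskell
-- scanl is foldl that also records each intermediate state.
states :: [Int]
states = scanl (+) 0 [3, 1, 4, 1]    -- [0, 3, 4, 8, 9]

-- The fold's return value is the state after the final invocation of f.
final :: Int
final = foldl (+) 0 [3, 1, 4, 1]     -- 9
```

The last element of states is the same value as final.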
map - Introduction
• map is a higher-order function that “transforms” each element in a sequence of elements
– Commonly, map takes a function f and a sequence s, and applies f to each element of s
• Example:
– map square_root [1,4,9,16] -> ???
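A small Haskell sketch of the same idea, using inputs different from the example above; the names doubled and lengths are assumptions.

```haskell
-- Apply one function to every element, independently of the others.
doubled :: [Int]
doubled = map (* 2) [1, 2, 3]                    -- [2, 4, 6]

-- The function may also change the element type (String to Int here).
lengths :: [Int]
lengths = map length ["fold", "map", "reduce"]   -- [4, 3, 6]
```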
map’s Return Value
• map returns a sequence
– The new sequence s’ is not necessarily the same size as s
– The elements of s’ do not necessarily have the same type as the elements of s
• Recall that the sum of N vectors was equal to the sum of their components:
• Let components() decompose a vector into its X and Y components
map’s Return Value – Example
[Figure: vectors a and b, and their sum a+b]
map components [v1, v2, v3] = [(x1, y1), (x2, y2), (x3, y3)] ???
                            = [x1, y1, x2, y2, x3, y3] ???
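The two candidate results can be compared in Haskell. In this sketch the Vector type, the body of components, and the sample vectors are all assumptions. Haskell's own map always returns exactly one result per input element, so the flattened form needs concatMap instead.

```haskell
-- A 2-D vector as an (x, y) pair; this representation is an assumption.
type Vector = (Double, Double)

-- Hypothetical components(): the X and Y components as a two-element list.
components :: Vector -> [Double]
components (x, y) = [x, y]

-- Sample input; the third vector is the sum of the first two.
vectors :: [Vector]
vectors = [(1, 2), (3, 4), (4, 6)]

-- map keeps one result per input element: a list of three lists.
nested :: [[Double]]
nested = map components vectors

-- concatMap flattens the per-element results into six numbers.
flat :: [Double]
flat = concatMap components vectors
```

nested is [[1.0,2.0],[3.0,4.0],[4.0,6.0]] and flat is [1.0,2.0,3.0,4.0,4.0,6.0]. The element type changes in both cases, but only the flattened form changes the length.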
map - Questions
• Enrichment questions:
– For what values of f and z will fold f z l = l? How can you modify f such that fold f z l = map f l?
– Bonus question: can you implement map in terms of fold?
– Visit foldl.com and foldr.com :)
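One possible answer to the first enrichment question and the bonus question, sketched in Haskell with a right fold. The names identityFold and mapViaFold are assumptions, and other formulations exist.

```haskell
-- Folding with the list constructor (:) and the empty list rebuilds the list.
identityFold :: [a] -> [a]
identityFold = foldr (:) []

-- Applying f to each element before consing it turns the fold into a map.
mapViaFold :: (a -> b) -> [a] -> [b]
mapViaFold f = foldr (\x acc -> f x : acc) []
```

For example, mapViaFold (+ 1) [1, 2, 3] gives the same list as map (+ 1) [1, 2, 3].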
map – Formal definition
• map takes a function and a data structure as its inputs
-- If the list is empty, there's nothing to do
map f [] = []
-- If the list is not empty, apply f to the first element and
-- add the result to the mapping of f on all other elements
map f (x:xs) = f x : map f xs
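The definition can be loaded under a different name to avoid clashing with Prelude's map. The names myMap and shouted are assumptions for this sketch.

```haskell
-- The same two equations, renamed to avoid a clash with Prelude.map.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

-- Each f x depends only on x, so no state is carried between elements.
shouted :: [String]
shouted = myMap (++ "!") ["fold", "map"]   -- ["fold!", "map!"]
```

Unlike foldr, nothing computed for one element is passed to the next, which is the property that makes map easy to run in parallel.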
What is the complexity of map? What is its runtime?
Exercise (1 of 2)
• Individually:
– Determine how these operations can be solved with a fold, a map, or some combination of fold and map:
• Given a list of vectors, add them to determine the resultant vector.
• Ray tracing a single ray
– Ray tracing takes a list of rays that intersect the camera, and traces their paths back to their respective light sources, even across their reflections over several surfaces
• Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per person.
• Run-length encoding.
– Run-length encoding takes a possibly-repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”.
• Find the smallest element in an array
– Come up with some challenging problems yourself!
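As a starting point for discussion, here is one possible decomposition of three of these operations in Haskell. It is a sketch under assumptions, not a model answer: vectors are taken to be (x, y) pairs, all names are hypothetical, and the run-length version leans on group from Data.List, which can itself be written as a fold.

```haskell
import Data.List (group)

-- Resultant vector: a fold that adds component-wise, starting from (0, 0).
resultant :: [(Double, Double)] -> (Double, Double)
resultant = foldr (\(x, y) (sx, sy) -> (x + sx, y + sy)) (0, 0)

-- Smallest element: a fold whose state is the minimum seen so far.
-- foldr1 has no initial value, so the list must be non-empty.
smallest :: [Int] -> Int
smallest = foldr1 min

-- Run-length encoding: group equal neighbours, then map each run
-- to a (value, frequency) pair. Runs from group are never empty.
runLengths :: String -> [(Char, Int)]
runLengths = map toPair . group
  where toPair run = (head run, length run)
```

For example, runLengths "aaabccccc" gives [('a',3),('b',1),('c',5)].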
Exercise (2 of 2)
• In small groups, compare your answers to the above, and stump your team with the problems you came up with!