MapReduce

Theory and Practice

Peng Bo (彭波)
[email protected]
School of Information Science and Technology, Peking University (北京大学信息科学技术学院)

7/15/2010

Some slides borrowed from Jimmy Lin and Aaron Kimball

http://net.pku.edu.cn/~course/cs402/2010/

http://www.umiacs.umd.edu/~jimmylin/cloud-computing/index.html

http://www.cs.washington.edu/education/courses/cse490h/07wi/

Outline

Functional Language and MapReduce
MapReduce Basic
MapReduce Algorithm Design
Hadoop and Java Practice

Functional Language and MapReduce


What is Functional Programming?

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state. [1]

[1] http://en.wikipedia.org/wiki/Functional_programming#cite_note-hudak1989-0

Example

Summing the integers 1 to 10 in Java:

int total = 0;
for (int i = 1; i <= 10; ++i)
    total = total + i;

The computation method is variable assignment.


Example

Summing the integers 1 to 10 in Haskell:

sum [1..10]

The computation method is function application.


Why is it Useful?

The abstract nature of functional programming leads to considerably simpler programs;

It also supports a number of powerful new ways to structure and reason about programs.


Functional Programming Review

Functional operations do not modify data structures: they always create new ones
  Original data still exists in unmodified form
Data flows are implicit in program design
Order of operations does not matter

Functional Programming Review

fun foo(l: int list) = sum(l) + mul(l) + length(l)

The order of sum(), mul(), etc. does not matter
  They do not modify l

Functional Updates Do Not Modify Structures

fun append(x, lst) = let val lst' = rev lst in rev (x :: lst') end

The append() function above reverses a list, adds a new element to the front, and returns all of that, reversed, which appends an item.

But it never modifies lst!


Functions Can Be Used As Arguments

fun DoDouble(f, x) = f (f x)

It does not matter what f does to its argument; DoDouble() will do it twice.

A function is called higher-order if it takes a function as an argument or returns a function as a result


Map

map f lst: ('a->'b) -> ('a list) -> ('b list)

Creates a new list by applying f to each element of the input list; returns output in order.

[Figure: f applied independently to each element of the input list]

Fold

fold f x0 lst: ('a*'b->'b) -> 'b -> ('a list) -> 'b

Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.

[Figure: f applied successively along the list, carrying the accumulator from the initial value to the returned result]

fold left vs. fold right

Order of list elements can be significant
  Fold left moves left-to-right across the list
  Fold right moves right-to-left across the list

SML Implementation:

fun foldl f a [] = a
  | foldl f a (x::xs) = foldl f (f(x, a)) xs

fun foldr f a [] = a
  | foldr f a (x::xs) = f(x, (foldr f a xs))
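To make the difference between the two folds concrete, here is a minimal Python sketch. The names foldl and foldr mirror the SML definitions, and f takes (element, accumulator) as in the SML code; building them on functools.reduce is an assumption of this sketch, not part of the original slides.

```python
from functools import reduce

def foldl(f, acc, xs):
    """Left fold: consume xs left-to-right; f takes (element, accumulator)."""
    return reduce(lambda a, x: f(x, a), xs, acc)

def foldr(f, acc, xs):
    """Right fold: consume xs right-to-left; f takes (element, accumulator)."""
    return reduce(lambda a, x: f(x, a), reversed(xs), acc)

def cons(x, acc):
    # Prepend x to the accumulated list without modifying acc.
    return [x] + acc

as_left = foldl(cons, [], [1, 2, 3])   # visits 1, then 2, then 3
as_right = foldr(cons, [], [1, 2, 3])  # visits 3, then 2, then 1
```

With cons as the combining function, the left fold reverses the list while the right fold rebuilds it in order, which shows why the direction of a fold can matter.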


Example

fun foo(l: int list) = sum(l) + mul(l) + length(l)

How can we implement this with map and foldl?

Example (Solved)


fun sum(lst) = foldl (fn (x,a)=>a+x) 0 lst
fun mul(lst) = foldl (fn (x,a)=>a*x) 1 lst
fun length(lst) = foldl (fn (x,a)=>a+1) 0 lst

map Implementation

fun map f [] = []
  | map f (x::xs) = (f x) :: (map f xs)

This implementation moves left-to-right across the list, mapping elements one at a time

… But does it need to?

Implicit Parallelism In map

In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements

If order of application of f to elements in list is commutative, we can reorder or parallelize execution

This is the “secret” that MapReduce exploits
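As a small illustration of this point, the following Python sketch applies a pure function in three ways: sequentially, in a shuffled order, and concurrently. The function square and the use of ThreadPoolExecutor are assumptions chosen for the example; any side-effect-free function would do.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure function: the result depends only on x; no shared state is touched.
    return x * x

data = list(range(10))

# Sequential, left-to-right map.
sequential = [square(x) for x in data]

# Apply f in a random order, then put each result back in its input slot.
order = list(range(len(data)))
random.shuffle(order)
reordered = [None] * len(data)
for i in order:
    reordered[i] = square(data[i])

# Apply f concurrently; Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))
```

All three lists are equal because no application of square can observe another, so the schedule is free to change.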


References

http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/functional.ppt

http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/





MapReduce Basic


Typical Large-Data Problem

Iterate over a large number of records
Extract something of interest from each (Map)
Shuffle and sort intermediate results
Aggregate intermediate results (Reduce)
Generate final output

Key idea: provide a functional abstraction for these two operations

(Dean and Ghemawat, OSDI 2004)

Roots in Functional Programming

[Figure: Map applies f to each element independently; Fold applies g across the results to aggregate them]

MapReduce

Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
  All values with the same key are sent to the same reducer
The execution framework handles everything else…

[Figure: four mappers read inputs (k1, v1) … (k6, v6) and emit (a, 1), (b, 2); (c, 3), (c, 6); (a, 5), (c, 2); (b, 7), (c, 8). Shuffle and sort aggregates values by key: a → [1, 5], b → [2, 7], c → [2, 3, 6, 8]. Three reducers emit (r1, s1), (r2, s2), (r3, s3).]
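The data flow in this example can be imitated in a single process. The sketch below is only an illustration under stated assumptions: run_mapreduce, identity_mapper, and sum_reducer are hypothetical names, the reducer is assumed to sum its values, and nothing here is distributed or fault tolerant.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of map, shuffle-and-sort, and reduce."""
    # Map phase: each input (k, v) may emit any number of (k', v') pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))

    # Shuffle and sort: group values by intermediate key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)

    # Reduce phase: one call per intermediate key, in sorted key order.
    output = []
    for k in sorted(groups):
        output.extend(reducer(k, groups[k]))
    return output

def identity_mapper(key, value):
    # Hypothetical mapper: passes each pair through unchanged.
    return [(key, value)]

def sum_reducer(key, values):
    # Hypothetical reducer: sums all values that share a key.
    return [(key, sum(values))]

# The intermediate pairs from the example, fed in directly as records.
pairs = [("a", 1), ("b", 2), ("c", 3), ("c", 6),
         ("a", 5), ("c", 2), ("b", 7), ("c", 8)]
result = run_mapreduce(pairs, identity_mapper, sum_reducer)
```

Here the reducer for key c sees the values 3, 6, 2, and 8 together, which is the guarantee that all values with the same key reach the same reducer.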


What’s “everything else”?


MapReduce “Runtime”

Handles scheduling
  Assigns workers to map and reduce tasks
Handles “data distribution”
  Moves processes to data
Handles synchronization
  Gathers, sorts, and shuffles intermediate data
Handles errors and faults
  Detects worker failures and restarts
Everything happens on top of a distributed FS (later)

MapReduce

Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
  All values with the same key are reduced together
The execution framework handles everything else…
Not quite… usually, programmers also specify:
  partition (k’, number of partitions) → partition for k’
    Often a simple hash of the key, e.g., hash(k’) mod n
    Divides up key space for parallel reduce operations
  combine (k’, v’) → <k’, v’>*
    Mini-reducers that run in memory after the map phase
    Used as an optimization to reduce network traffic

[Figure: the same data flow with combiners and partitioners. Map outputs (a, 1), (b, 2); (c, 3), (c, 6); (a, 5), (c, 2); (b, 7), (c, 8) pass through per-mapper combiners, which merge (c, 3) and (c, 6) into (c, 9), and then through partitioners. Shuffle and sort aggregates values by key: a → [1, 5], b → [2, 7], c → [2, 9, 8]. Three reducers emit (r1, s1), (r2, s2), (r3, s3).]
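A rough Python sketch of the two extra functions, using the same example data. The function names are hypothetical, the combiner is assumed to sum values, and zlib.crc32 stands in for the hash function because Python's built-in hash() of strings varies between runs.

```python
from collections import defaultdict
import zlib

def partition(key, num_partitions):
    # Deterministic hash of the key, mod n (crc32 is an assumption of this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def combine(pairs):
    """Mini-reducer: sum values per key within one mapper's output."""
    local = defaultdict(int)
    for k, v in pairs:
        local[k] += v
    return sorted(local.items())

# Output of four mappers, as in the example.
mapper_outputs = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]

num_reducers = 3
combined = [combine(out) for out in mapper_outputs]

# Route each combined pair to the reducer chosen by the partitioner.
reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
for out in combined:
    for k, v in out:
        reducer_inputs[partition(k, num_reducers)][k].append(v)

pairs_before = sum(len(out) for out in mapper_outputs)  # pairs without combining
pairs_after = sum(len(out) for out in combined)         # pairs actually shuffled
```

The second mapper's two pairs for c collapse into a single (c, 9), so one fewer pair crosses the network, and every key lands at exactly one reducer.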


Two more details…

Barrier between map and reduce phases
  But we can begin copying intermediate data earlier
Keys arrive at each reducer in sorted order
  No enforced ordering across reducers

“Hello World”: Word Count

Map(String docid, String text):
  for each word w in text:
    Emit(w, 1);

Reduce(String term, Iterable<Int> values):
  int sum = 0;
  for each v in values:
    sum += v;
  Emit(term, sum);
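The word count pseudo-code translates almost line for line into Python. This is a single-process sketch: word_count and its grouping dictionary stand in for the framework's shuffle and sort, and splitting on whitespace is an assumed tokenization.

```python
from collections import defaultdict

def map_fn(docid, text):
    # Emit (word, 1) for every word occurrence.
    for w in text.split():
        yield (w, 1)

def reduce_fn(term, values):
    # Sum the partial counts for one term.
    yield (term, sum(values))

def word_count(docs):
    """Run map, group by key, then reduce, all in one process."""
    groups = defaultdict(list)
    for docid, text in docs.items():
        for k, v in map_fn(docid, text):
            groups[k].append(v)  # stands in for shuffle and sort
    result = {}
    for term in sorted(groups):
        for k, total in reduce_fn(term, groups[term]):
            result[k] = total
    return result

counts = word_count({"d1": "a b a", "d2": "b c"})
```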


MapReduce can refer to…

The programming model
The execution framework (aka “runtime”)
The specific implementation

Usage is usually clear from context!

MapReduce Implementations

Google has a proprietary implementation in C++
  Bindings in Java, Python
Hadoop is an open-source implementation in Java
  Development led by Yahoo, used in production
  Now an Apache project
  Rapidly expanding software ecosystem
Lots of custom research implementations
  For GPUs, cell processors, etc.

[Figure: MapReduce execution overview. (1) The user program submits a job to the master; (2) the master schedules map and reduce tasks on workers; (3) map workers read input splits (split 0 … split 4) and (4) write intermediate files to local disk; (5) reduce workers remotely read the intermediate files and (6) write output files (output file 0, output file 1). Adapted from (Dean and Ghemawat, OSDI 2004)]

MapReduce Algorithm Design


“Everything Else”

The execution framework handles everything else…
  Scheduling: assigns workers to map and reduce tasks
  “Data distribution”: moves processes to data
  Synchronization: gathers, sorts, and shuffles intermediate data
  Errors and faults: detects worker failures and restarts
Limited control over data and execution flow
  All algorithms must be expressed in m, r, c, p (map, reduce, combine, partition)
You don’t know:
  Where mappers and reducers run
  When a mapper or reducer begins or finishes
  Which input a particular mapper is processing
  Which intermediate key a particular reducer is processing

Tools for Programmer

Cleverly-constructed data structures
  Bring partial results together
Sort order of intermediate keys
  Control order in which reducers process keys
Partitioner
  Control which reducer processes which keys
Preserving state in mappers and reducers
  Capture dependencies across multiple keys and values

Preserving State

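[Figure: lifecycle of Mapper and Reducer objects. One object is created per task and holds state across calls. Mapper object: configure (API initialization hook), map (one call per input key-value pair), close (API cleanup hook). Reducer object: configure (API initialization hook), reduce (one call per intermediate key), close (API cleanup hook).]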

Scalable Hadoop Algorithms: Themes

Avoid object creation
  Inherently costly operation
  Garbage collection
Avoid buffering
  Limited heap size
  Works for small datasets, but won’t scale!

Importance of Local Aggregation

Ideal scaling characteristics:
  Twice the data, twice the running time
  Twice the resources, half the running time
Why can’t we achieve this?
  Synchronization requires communication
  Communication kills performance
Thus… avoid communication!
  Reduce intermediate data via local aggregation
  Combiners can help

Shuffle and Sort

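[Figure: shuffle and sort. Mapper output goes to a circular buffer (in memory), then to spills (on disk), then to merged spills (on disk); combiners may run along the way. Reducers read intermediate files (on disk) from this mapper and from other mappers; other reducers do the same.]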

Word Count: Baseline

What’s the impact of combiners?


Word Count: Version 1

Are combiners still needed?


Word Count: Version 2


Key: preserve state across input key-value pairs!

Design Pattern for Local Aggregation

“In-mapper combining”
  Fold the functionality of the combiner into the mapper by preserving state across multiple map calls
Advantages
  Speed
  Why is this faster than actual combiners?
Disadvantages
  Explicit memory management required
  Potential for order-dependent bugs
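A minimal Python sketch of in-mapper combining for word count. The class and method names (configure, map, close) follow the lifecycle hooks named in these slides; they are illustrative only and are not the Hadoop API. The plain mapper naive_map is included for comparison.

```python
from collections import defaultdict

class InMapperCombiningMapper:
    """Word-count mapper that aggregates locally across map() calls."""

    def configure(self):
        # Initialization hook: state shared by all map() calls of this task.
        self.counts = defaultdict(int)

    def map(self, docid, text):
        # Nothing is emitted here; only the in-memory partial counts change.
        for w in text.split():
            self.counts[w] += 1

    def close(self):
        # Cleanup hook: emit one pair per distinct word seen by this task.
        return sorted(self.counts.items())

def naive_map(docid, text):
    # Baseline mapper: one (word, 1) pair per word occurrence.
    return [(w, 1) for w in text.split()]

docs = [("d1", "a b a"), ("d2", "a c")]

mapper = InMapperCombiningMapper()
mapper.configure()
for docid, text in docs:
    mapper.map(docid, text)
emitted = mapper.close()

naive_emitted = [p for docid, text in docs for p in naive_map(docid, text)]
```

The stateful mapper emits three pairs where the baseline emits five. The dictionary grows with the number of distinct words, which is the explicit memory management cost listed under disadvantages; a real implementation would likely flush it when it gets too large.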


Combiner Design

Combiners and reducers share the same method signature
  Sometimes, reducers can serve as combiners
  Often, not…
Remember: combiners are optional optimizations
  Should not affect algorithm correctness
  May be run 0, 1, or multiple times
Example: find average of all integers associated with the same key

Computing the Mean: Version 1

Why can’t we use reducer as combiner?



Why doesn’t this work?



Fixed?
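One way to see the issue with the mean, sketched in Python under assumptions: a mean of partial means is generally not the overall mean, so a mean reducer cannot double as a combiner. A common remedy, shown here with hypothetical function names, is to carry (sum, count) pairs and divide only once in the reducer, with mappers emitting (value, 1) so the reducer sees the same type whether or not the combiner runs.

```python
def mean_reducer(values):
    # Reducer that averages raw values directly.
    return sum(values) / len(values)

def combiner(pairs):
    # Combines (sum, count) pairs into one partial (sum, count) pair.
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))

def pair_reducer(pairs):
    # Adds up partial sums and counts, then divides once at the end.
    total, count = combiner(pairs)
    return total / count

# Values for one key, split across two mappers.
mapper1, mapper2 = [1, 2, 3], [10]

true_mean = mean_reducer(mapper1 + mapper2)
mean_of_means = mean_reducer([mean_reducer(mapper1), mean_reducer(mapper2)])

# Mappers emit (value, 1); the combiner may or may not run on each output.
out1 = [(v, 1) for v in mapper1]
out2 = [(v, 1) for v in mapper2]
with_combiner = pair_reducer([combiner(out1), combiner(out2)])
without_combiner = pair_reducer(out1 + out2)
```

Because the combiner's input and output have the same (sum, count) shape, it can run zero, one, or several times without changing the final mean.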


Algorithm Design: Running Example

Term co-occurrence matrix for a text collection
  M = N x N matrix (N = vocabulary size)
  Mij: number of times i and j co-occur in some context (for concreteness, let’s say context = sentence)
Why?
  Distributional profiles as a way of measuring semantic distance
  Semantic distance useful for many language processing tasks

MapReduce: Large Counting Problems

Term co-occurrence matrix for a text collection = specific instance of a large counting problem
  A large event space (number of terms)
  A large number of observations (the collection itself)
  Goal: keep track of interesting statistics about the events
Basic approach
  Mappers generate partial counts
  Reducers aggregate partial counts
How do we aggregate partial counts efficiently?

First Try: “Pairs”

Each mapper takes a sentence:
  Generate all co-occurring term pairs
  For all pairs, emit (a, b) → count
Reducers sum up counts associated with these pairs
Use combiners!

Pairs: Pseudo-Code
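As a hedged sketch of the "pairs" approach in Python: the function names are hypothetical, the context is one whitespace-tokenized sentence, and co-occurrence is counted over ordered pairs of distinct word positions. The grouping dictionary stands in for shuffle and sort.

```python
from itertools import permutations

def pairs_mapper(sentence):
    # Emit ((a, b), 1) for every ordered pair of distinct positions.
    words = sentence.split()
    return [((a, b), 1) for a, b in permutations(words, 2)]

def pairs_reducer(pair, counts):
    # Sum the partial counts for one (a, b) pair.
    return (pair, sum(counts))

def cooccurrence_pairs(sentences):
    grouped = {}
    for s in sentences:
        for pair, one in pairs_mapper(s):
            grouped.setdefault(pair, []).append(one)  # stands in for shuffle
    return dict(pairs_reducer(p, c) for p, c in grouped.items())

matrix = cooccurrence_pairs(["a b c", "a b"])
```

A sentence of n words produces n(n-1) intermediate pairs, which is why this approach sorts and shuffles a lot of data.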


“Pairs” Analysis

Advantages
  Easy to implement, easy to understand
Disadvantages
  Lots of pairs to sort and shuffle around (upper bound?)
  Not many opportunities for combiners to work

Another Try: “Stripes”

Idea: group together pairs into an associative array

(a, b) → 1
(a, c) → 2
(a, d) → 5
(a, e) → 3
(a, f) → 2

a → { b: 1, c: 2, d: 5, e: 3, f: 2 }

Each mapper takes a sentence:
  Generate all co-occurring term pairs
  For each term, emit a → { b: countb, c: countc, d: countd … }
Reducers perform element-wise sum of associative arrays

  a → { b: 1, d: 5, e: 3 }
+ a → { b: 1, c: 2, d: 2, f: 2 }
  a → { b: 2, c: 2, d: 7, e: 3, f: 2 }

Key: cleverly-constructed data structure brings together partial results

Stripes: Pseudo-Code
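A corresponding Python sketch of the "stripes" approach, under the same assumptions as a sentence-level, whitespace-tokenized context. The function names are hypothetical, and collections.Counter plays the role of the associative array.

```python
from collections import Counter, defaultdict

def stripes_mapper(sentence):
    # For each term a, emit a -> {b: count} over the other positions.
    words = sentence.split()
    stripes = defaultdict(Counter)
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j:
                stripes[a][b] += 1
    return list(stripes.items())

def stripes_reducer(term, stripes):
    # Element-wise sum of the associative arrays for one term.
    total = Counter()
    for s in stripes:
        total.update(s)
    return (term, dict(total))

def cooccurrence_stripes(sentences):
    grouped = defaultdict(list)
    for s in sentences:
        for term, stripe in stripes_mapper(s):
            grouped[term].append(stripe)  # stands in for shuffle and sort
    return dict(stripes_reducer(t, ss) for t, ss in grouped.items())

rows = cooccurrence_stripes(["a b c", "a b"])
```

Each mapper emits one value per distinct term rather than one per pair, so fewer key-value pairs are shuffled, at the cost of heavier values.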


“Stripes” Analysis

Advantages
  Far less sorting and shuffling of key-value pairs
  Can make better use of combiners
Disadvantages
  More difficult to implement
  Underlying object more heavyweight
  Fundamental limitation in terms of size of event space

Cluster size: 38 cores
Data source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)

Relative Frequencies

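How do we estimate relative frequencies from counts?

\[
f(B \mid A) = \frac{\mathrm{count}(A, B)}{\mathrm{count}(A)} = \frac{\mathrm{count}(A, B)}{\sum_{B'} \mathrm{count}(A, B')}
\]

Here count(A, B) is the joint event count and count(A) is the marginal.

Why do we want to do this?
How do we do this with MapReduce?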

f(B|A): “Stripes”

Easy!
  One pass to compute (a, *)
  Another pass to directly compute f(B|A)

a → {b1:3, b2 :12, b3 :7, b4 :1, … }


f(B|A): “Pairs”

(a, *) → 32   (reducer holds this value in memory)

(a, b1) → 3     gives   (a, b1) → 3 / 32
(a, b2) → 12    gives   (a, b2) → 12 / 32
(a, b3) → 7     gives   (a, b3) → 7 / 32
(a, b4) → 1     gives   (a, b4) → 1 / 32
…

For this to work:
  Must emit extra (a, *) for every bn in mapper
  Must make sure all a’s get sent to same reducer (use partitioner)
  Must make sure (a, *) comes first (define sort order)
  Must hold state in reducer across different key-value pairs
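The requirements for the "pairs" version can be sketched in Python as follows. This is a single-reducer illustration with hypothetical names: sort_key plays the role of the custom sort order, and the loop in relative_frequencies plays the role of a reducer that keeps the marginal in memory. The count for b5 is an assumed value, added only so that the marginal comes to 32.

```python
def sort_key(item):
    (a, b), _count = item
    # For the same a, the special marker '*' sorts before every real term b.
    return (a, b != "*", b)

def relative_frequencies(pairs):
    """Reducer-side loop over ((a, b), count) pairs for one reducer."""
    out = []
    current, marginal = None, 0
    for (a, b), count in sorted(pairs, key=sort_key):
        if a != current:
            current, marginal = a, 0  # new left term: reset held state
        if b == "*":
            marginal += count         # marginal arrives first; keep it in memory
        else:
            out.append(((a, b), count / marginal))
    return out

# Joint counts for term a; b5 is hypothetical and brings the total to 32.
joint = {"b1": 3, "b2": 12, "b3": 7, "b4": 1, "b5": 9}

emitted = []
for b, n in joint.items():
    emitted.append((("a", b), n))
    emitted.append((("a", "*"), n))   # extra (a, *) contribution per pair

freqs = dict(relative_frequencies(emitted))
```

Because every (a, *) record is processed before any (a, b) record, the reducer never has to buffer the joint counts while waiting for the marginal.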


“Order Inversion”

Common design pattern
  Computing relative frequencies requires marginal counts
  But marginal cannot be computed until you see all counts
  Buffering is a bad idea!
  Trick: getting the marginal counts to arrive at the reducer before the joint counts
Optimizations
  Apply in-memory combining pattern to accumulate marginal counts
  Should we apply combiners?

Synchronization: Pairs vs. Stripes

Approach 1: turn synchronization into an ordering problem
  Sort keys into correct order of computation
  Partition key space so that each reducer gets the appropriate set of partial results
  Hold state in reducer across multiple key-value pairs to perform computation
  Illustrated by the “pairs” approach
Approach 2: construct data structures that bring partial results together
  Each reducer receives all the data it needs to complete the computation
  Illustrated by the “stripes” approach

Secondary Sorting

MapReduce sorts input to reducers by key
  Values may be arbitrarily ordered
What if we want to sort the values as well?
  E.g., k → (v1, r), (v3, r), (v4, r), (v8, r)…

Secondary Sorting: Solutions

Solution 1:
  Buffer values in memory, then sort
  Why is this a bad idea?
Solution 2:
  “Value-to-key conversion” design pattern: form composite intermediate key, (k, v1)
  Let execution framework do the sorting
  Preserve state across multiple key-value pairs to handle processing
  Anything else we need to do?
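A small Python sketch of value-to-key conversion. The record layout (k, v, r), the payload strings, and the byte-sum hash in partition are assumptions made for illustration. The sketch also suggests one likely answer to the last question: partition on the natural key k only, so that all composite keys for the same k reach the same reducer.

```python
from itertools import groupby

def partition(composite_key, num_partitions):
    # Partition on the natural key only, ignoring the value part.
    k, _v = composite_key
    return sum(k.encode("utf-8")) % num_partitions

# Records of the form (k, v, r), arriving in arbitrary order.
records = [("k1", 8, "r8"), ("k2", 5, "r5"), ("k1", 1, "r1"),
           ("k1", 4, "r4"), ("k1", 3, "r3")]

# Value-to-key conversion: move v into the key, keep the payload r as value.
intermediate = [((k, v), r) for k, v, r in records]

# The framework's sort now orders by k first, then by v.
intermediate.sort(key=lambda kv: kv[0])

# Reducer side: walk the sorted stream and group by the natural key k.
grouped = {
    k: [(v, r) for (_k, v), r in group]
    for k, group in groupby(intermediate, key=lambda kv: kv[0][0])
}
```

The values for k1 come out in ascending order of v without the reducer ever buffering and sorting them itself.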


Recap: Tools for Synchronization

Cleverly-constructed data structures
  Bring data together
Sort order of intermediate keys
  Control order in which reducers process keys
Partitioner
  Control which reducer processes which keys
Preserving state in mappers and reducers
  Capture dependencies across multiple keys and values

Issues and Tradeoffs

Number of key-value pairs
  Object creation overhead
  Time for sorting and shuffling pairs across the network
Size of each key-value pair
  De/serialization overhead
Local aggregation
  Opportunities to perform local aggregation vary
  Combiners make a big difference
  Combiners vs. in-mapper combining
  RAM vs. disk vs. network

Debugging at Scale

Works on small datasets, won’t scale… why?
  Memory management issues (buffering and object creation)
  Too much intermediate data
  Mangled input records
Real-world data is messy!
  Word count: how many unique words in Wikipedia?
  There’s no such thing as “consistent data”
  Watch out for corner cases
  Isolate unexpected behavior, bring local

Hadoop and Java Practice


Basic Hadoop API*(0.20.0)

Mapper
  map(KEYIN key, VALUEIN value, Mapper.Context context)
  setup(Mapper.Context context)
  cleanup(Mapper.Context context)
Reducer/Combiner
  reduce(KEYIN key, Iterable<VALUEIN> values, Reducer.Context context)
  setup/cleanup
Partitioner
  getPartition(KEY key, VALUE value, int numPartitions)

*Note: forthcoming API changes…

http://net.pku.edu.cn/%7Ecourse/cs402/2010/resource/2009/hadoop-0.20.0/docs/api/org/apache/hadoop/mapreduce/Mapper.html#map%28KEYIN,%20VALUEIN,%20org.apache.hadoop.mapreduce.Mapper.Context%29

http://net.pku.edu.cn/%7Ecourse/cs402/2010/resource/2009/hadoop-0.20.0/docs/api/org/apache/hadoop/mapreduce/Mapper.html


http://net.pku.edu.cn/%7Ecourse/cs402/2010/resource/2009/hadoop-0.20.0/docs/api/org/apache/hadoop/mapreduce/Mapper.Context.html

http://net.pku.edu.cn/%7Ecourse/cs402/2010/resource/2009/hadoop-0.20.0/docs/api/org/apache/hadoop/mapreduce/Mapper.html#setup%28org.apache.hadoop.mapreduce.Mapper.Context%29


http://net.pku.edu.cn/%7Ecourse/cs402/2010/resource/2009/hadoop-0.20.0/docs/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup%28org.apache.hadoop.mapreduce.Mapper.Context%29


Data Types in Hadoop

Writable
  Defines a de/serialization protocol. Every data type in Hadoop is a Writable.
WritableComparable
  Defines a sort order. All keys must be of this type (but not values).
IntWritable, LongWritable, Text, …
  Concrete classes for different data types.
SequenceFiles
  Binary encoding of a sequence of key/value pairs

Complex Data Types in Hadoop

How do you implement complex data types?
The easiest way:
  Encode it as Text, e.g., (a, b) = “a:b”
  Use regular expressions to parse and extract data
  Works, but pretty hack-ish
The hard way:
  Define a custom implementation of WritableComparable
  Must implement: readFields, write, compareTo
  Computationally efficient, but slow for rapid prototyping
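To show what the three required methods are responsible for, here is a Python analogue of a custom key type. It is not the Hadoop API: PairOfInts, write, and read_fields are hypothetical names that mirror write and readFields, and the comparison operators play the role that compareTo plays in Java. The fixed 8-byte layout is an assumption of the sketch.

```python
import struct
from functools import total_ordering

@total_ordering
class PairOfInts:
    """Python analogue of a custom key: serialization plus a sort order."""

    def __init__(self, left=0, right=0):
        self.left, self.right = left, right

    def write(self):
        # Serialize as two big-endian 32-bit signed integers.
        return struct.pack(">ii", self.left, self.right)

    def read_fields(self, data):
        # Deserialize in the same field order used by write().
        self.left, self.right = struct.unpack(">ii", data)

    def __eq__(self, other):
        return (self.left, self.right) == (other.left, other.right)

    def __lt__(self, other):
        # Sort by left element, then by right element.
        return (self.left, self.right) < (other.left, other.right)

original = PairOfInts(3, -7)
restored = PairOfInts()
restored.read_fields(original.write())

ordered = sorted([PairOfInts(2, 1), PairOfInts(1, 5), PairOfInts(1, 2)])
```

Compared with encoding the pair as the text "3:-7" and parsing it back with a regular expression, the binary form has a fixed size and needs no parsing, which is the efficiency argument made above.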


Basic Cluster Components

One of each:
  Namenode (NN)
  Jobtracker (JT)
Set of each per slave machine:
  Tasktracker (TT)
  Datanode (DN)

Putting everything together…

[Figure: cluster layout. The namenode machine runs the namenode daemon; the job submission node runs the jobtracker; each slave node runs a tasktracker and a datanode daemon on top of the local Linux file system.]

Anatomy of a Job

MapReduce program in Hadoop = Hadoop job
  Jobs are divided into map and reduce tasks
  An instance of running a task is called a task attempt
  Multiple jobs can be composed into a workflow
Job submission process
  Client (i.e., driver program) creates a job, configures it, and submits it to job tracker
  JobClient computes input splits (on client end)
  Job data (jar, configuration XML) are sent to JobTracker
  JobTracker puts job data in shared location, enqueues tasks
  TaskTrackers poll for tasks
  Off to the races…

[Figure: the InputFormat divides input files into InputSplits; a RecordReader reads each InputSplit and feeds a Mapper, which produces intermediates.]

[Figure: each Mapper's intermediates pass through a Partitioner and are routed to the Reducers (combiners omitted here).]

[Figure: each Reducer writes an output file through a RecordWriter supplied by the OutputFormat.]

Source: redrawn from a slide by Cloudera, cc-licensed

Input and Output

InputFormat:
  TextInputFormat
  KeyValueTextInputFormat
  SequenceFileInputFormat
  …
OutputFormat:
  TextOutputFormat
  SequenceFileOutputFormat
  …

Shuffle and Sort in Hadoop

Probably the most complex aspect of MapReduce!
Map side
  Map outputs are buffered in memory in a circular buffer
  When buffer reaches threshold, contents are “spilled” to disk
  Spills merged in a single, partitioned file (sorted within each partition): combiner runs here
Reduce side
  First, map outputs are copied over to reducer machine
  “Sort” is a multi-pass merge of map outputs (happens in memory and on disk): combiner runs here
  Final merge pass goes directly into reducer
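The merge step can be pictured with a few lines of Python. This sketch makes simplifying assumptions: spills are in-memory lists rather than files, partitions are ignored, and the function names spill and merge_spills are hypothetical. heapq.merge performs a streaming merge of already-sorted runs, which is the essence of the "sort" described above.

```python
import heapq
from itertools import groupby

def spill(buffer):
    # A spill is the buffer contents sorted by key.
    return sorted(buffer)

def merge_spills(spills, combiner=None):
    """Merge sorted spills into one sorted run, optionally combining per key."""
    merged = heapq.merge(*spills)
    if combiner is None:
        return list(merged)
    # Equal keys are adjacent after the merge, so the combiner can run here.
    return [(k, combiner(v for _, v in group))
            for k, group in groupby(merged, key=lambda kv: kv[0])]

spills = [
    spill([("c", 3), ("a", 1)]),
    spill([("b", 2), ("a", 5)]),
    spill([("c", 6)]),
]

plain = merge_spills(spills)
combined = merge_spills(spills, combiner=sum)
```

Since each spill is already sorted, the merge never needs to hold all records in memory at once, and running the combiner during the merge shrinks the data before it moves on.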

Q&A


What is Hugs?

An interpreter for Haskell, and the most widely used implementation of the language;

An interactive system, which is well-suited for teaching and prototyping purposes;

Hugs is freely available from:

www.haskell.org/hugs


The Standard Prelude

When Hugs is started it first loads the library file Prelude.hs, and then repeatedly prompts the user for an expression to be evaluated.

For example:

> 2+3*4
14

> (2+3)*4
20

The standard prelude also provides many useful functions that operate on lists. For example:

> length [1,2,3,4]
4

> product [1,2,3,4]
24

> take 3 [1,2,3,4,5]
[1,2,3]

Function Application

In mathematics, function application is denoted using parentheses, and multiplication is often denoted using juxtaposition or space.

f(a,b) + c d

Apply the function f to a and b, and add the result to the product of c and d.

In Haskell, function application is denoted using space, and multiplication is denoted using *.

f a b + c*d

As previously, but in Haskell syntax.
