MapReduce

Theory and Practice

http://net.pku.edu.cn/~course/cs402/2010/

彭波 (Peng Bo)

pb@net.pku.edu.cn

School of Electronics Engineering and Computer Science, Peking University (北京大学信息科学技术学院)

7/15/2010

Some slides borrowed from Jimmy Lin and Aaron Kimball


Outline

Functional Language and MapReduce
MapReduce Basic
MapReduce Algorithm Design
Hadoop and Java Practice


Functional Language and MapReduce


What is Functional Programming?

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state.[1]


Example

Summing the integers 1 to 10 in Java:

int total = 0;
for (int i = 1; i <= 10; ++i)
    total = total + i;

The computation method is variable assignment.


Example

Summing the integers 1 to 10 in Haskell:

sum [1..10]

The computation method is function application.


Why is it Useful?

The abstract nature of functional programming leads to considerably simpler programs;

It also supports a number of powerful new ways to structure and reason about programs.


Functional Programming Review

Functional operations do not modify data structures: they always create new ones
Original data still exists in unmodified form
Data flows are implicit in program design
Order of operations does not matter


Functional Programming Review

fun foo(l: int list) = sum(l) + mul(l) + length(l)

The order in which sum(), mul(), etc. are evaluated does not matter
They do not modify l


Functional Updates Do Not Modify Structures

fun append(x, lst) = let val lst' = rev lst in rev (x :: lst') end

The append() function above reverses the list, adds the new element to the front of the reversed list, and reverses the result again. The net effect is that x is appended to the end.

But it never modifies lst!
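The same idea can be sketched in Python. This is only an illustration under the assumption that we build new lists instead of calling in-place methods such as list.append; the name append mirrors the SML function above.

```python
def append(x, lst):
    """Return a new list with x at the end; lst itself is never modified."""
    reversed_copy = list(reversed(lst))   # a new list, lst is untouched
    with_front = [x] + reversed_copy      # add x to the front of the reversed copy
    return list(reversed(with_front))     # reverse again, so x ends up last


original = [1, 2, 3]
extended = append(4, original)
print(original)  # [1, 2, 3]
print(extended)  # [1, 2, 3, 4]
```

The original list is still available in unmodified form after the call.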


Functions Can Be Used As Arguments

fun DoDouble(f, x) = f (f x)

It does not matter what f does to its argument; DoDouble() will do it twice.

A function is called higher-order if it takes a function as an argument or returns a function as a result
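A small Python sketch of both kinds of higher-order function. The names do_double and make_adder are illustrative: do_double takes a function as an argument, and make_adder (a hypothetical helper) returns one.

```python
def do_double(f, x):
    """Apply f twice, whatever f does to its argument."""
    return f(f(x))


def make_adder(n):
    """Return a new function that adds n to its argument."""
    def add(x):
        return x + n
    return add


add_three = make_adder(3)
print(do_double(add_three, 10))       # 16
print(do_double(lambda s: s + "!", "hi"))  # hi!!
```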


Map

map f lst: ('a -> 'b) -> ('a list) -> ('b list)

Creates a new list by applying f to each element of the input list; returns output in order.

[Figure: f applied independently to each element of the list]


Fold

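fold f x0 lst: ('a*'b -> 'b) -> 'b -> ('a list) -> 'b

Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.

[Figure: f applied step by step along the list, starting from the initial accumulator value; the last application yields the returned value]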


fold left vs. fold right

Order of list elements can be significant
Fold left moves left-to-right across the list
Fold right moves from right-to-left

SML Implementation:

fun foldl f a [] = a | foldl f a (x::xs) = foldl f (f(x, a)) xs

fun foldr f a [] = a | foldr f a (x::xs) = f(x, (foldr f a xs))
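To see why the direction can matter, here is a Python sketch that mirrors the two SML definitions (f receives the element first and the accumulator second). The helper cons is an assumption made for the demonstration: it prepends an element to the accumulated list.

```python
def foldl(f, acc, lst):
    """Left fold: visit elements left to right."""
    for x in lst:
        acc = f(x, acc)
    return acc


def foldr(f, acc, lst):
    """Right fold: visit elements right to left."""
    for x in reversed(lst):
        acc = f(x, acc)
    return acc


def cons(x, acc):
    """Put x in front of the accumulated list (builds a new list)."""
    return [x] + acc


print(foldl(cons, [], [1, 2, 3]))  # [3, 2, 1]
print(foldr(cons, [], [1, 2, 3]))  # [1, 2, 3]
```

With an operation such as addition, where grouping and order do not matter, both folds give the same answer; with cons they do not.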


Example

fun foo(l: int list) = sum(l) + mul(l) + length(l)

How can we implement this by map and foldl?


Example (Solved)

fun foo(l: int list) = sum(l) + mul(l) + length(l)

fun sum(lst) = foldl (fn (x,a) => a+x) 0 lst
fun mul(lst) = foldl (fn (x,a) => a*x) 1 lst
fun length(lst) = foldl (fn (x,a) => a+1) 0 lst


map Implementation

This implementation moves left-to-right across the list, mapping elements one at a time

… But does it need to?

fun map f [] = [] | map f (x::xs) = (f x) :: (map f xs)


Implicit Parallelism In map

In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements

If order of application of f to elements in list is commutative, we can reorder or parallelize execution

This is the “secret” that MapReduce exploits
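A rough Python sketch of this point. It assumes f is a pure function (square here is just an example) and uses a thread pool from the standard library purely to show that the elements can be handed to different workers without changing the result; it is not meant as a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """A pure function: the result depends only on x."""
    return x * x


def sequential_map(f, lst):
    return [f(x) for x in lst]


def parallel_map(f, lst, workers=4):
    # Each element may be processed by a different worker;
    # Executor.map still returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, lst))


data = list(range(10))
print(sequential_map(square, data) == parallel_map(square, data))  # True
```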


MapReduce Basic


Typical Large-Data Problem

Iterate over a large number of records
Extract something of interest from each (Map)
Shuffle and sort intermediate results
Aggregate intermediate results (Reduce)
Generate final output

Key idea: provide a functional abstraction for these two operations

(Dean and Ghemawat, OSDI 2004)


Roots in Functional Programming

[Figure: Map applies f to each element independently; Fold combines the results step by step with g]


MapReduce

Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
All values with the same key are sent to the same reducer
The execution framework handles everything else…


[Figure: four map tasks read the input pairs (k1, v1) … (k6, v6) and emit (a, 1), (b, 2); (c, 3), (c, 6); (a, 5), (c, 2); (b, 7), (c, 8). Shuffle and sort aggregates values by key: a → [1, 5], b → [2, 7], c → [2, 3, 6, 8]. Three reduce tasks emit (r1, s1), (r2, s2), (r3, s3).]


MapReduce

Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
All values with the same key are sent to the same reducer
The execution framework handles everything else…

What’s “everything else”?


MapReduce “Runtime”

Handles scheduling
  Assigns workers to map and reduce tasks
Handles “data distribution”
  Moves processes to data
Handles synchronization
  Gathers, sorts, and shuffles intermediate data
Handles errors and faults
  Detects worker failures and restarts
Everything happens on top of a distributed FS (later)


MapReduce

Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, v’) → <k’, v’>*
All values with the same key are reduced together
The execution framework handles everything else…

Not quite… usually, programmers also specify:
  partition (k’, number of partitions) → partition for k’
    Often a simple hash of the key, e.g., hash(k’) mod n
    Divides up key space for parallel reduce operations
  combine (k’, v’) → <k’, v’>*
    Mini-reducers that run in memory after the map phase
    Used as an optimization to reduce network traffic
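The two optional functions can be sketched in Python as follows. This is an illustration, not Hadoop's API: partition uses zlib.crc32 as an assumed stand-in for a stable hash, and combine assumes the values are counts that can be summed.

```python
import zlib
from collections import defaultdict


def partition(key, num_partitions):
    """hash(k') mod n, using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def combine(pairs):
    """Mini-reducer: sum the values per key within one mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


mapper_output = [("c", 3), ("c", 6)]
print(combine(mapper_output))  # [('c', 9)]

# The same key always lands in the same partition.
print(partition("c", 3) == partition("c", 3))  # True
```

Here the combiner turns two pairs into one before anything crosses the network, and the partitioner guarantees that every pair for a given key reaches the same reducer.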


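[Figure: the same data flow with combiners and partitioners. Four map tasks read (k1, v1) … (k6, v6) and emit (a, 1), (b, 2); (c, 3), (c, 6); (a, 5), (c, 2); (b, 7), (c, 8). A combiner runs on each map output, turning (c, 3), (c, 6) into (c, 9); a partitioner then assigns each key to a reducer. Shuffle and sort aggregates values by key: a → [1, 5], b → [2, 7], c → [2, 9, 8] (instead of [2, 3, 6, 8] without combiners). Three reduce tasks emit (r1, s1), (r2, s2), (r3, s3).]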


Two more details…

Barrier between map and reduce phases
  But we can begin copying intermediate data earlier
Keys arrive at each reducer in sorted order
  No enforced ordering across reducers


“Hello World”: Word Count

Map(String docid, String text):
    for each word w in text:
        Emit(w, 1);

Reduce(String term, Iterable<Int> values):
    int sum = 0;
    for each v in values:
        sum += v;
    Emit(term, sum);
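The pseudo-code can be tried out with a tiny in-memory simulation. The driver run_mapreduce below is a hypothetical stand-in for the execution framework (single process, no distribution, no fault tolerance); it only groups intermediate values by key and calls the reducer in sorted key order.

```python
from collections import defaultdict


def map_fn(docid, text):
    """Emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word, 1


def reduce_fn(term, values):
    """Emit the total count for one term."""
    yield term, sum(values)


def run_mapreduce(inputs, mapper, reducer):
    # "Shuffle and sort": group intermediate values by key.
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    # Each reducer call sees one key with all of its values.
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


docs = [("d1", "a b a"), ("d2", "b c")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)  # [('a', 2), ('b', 2), ('c', 1)]
```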


MapReduce can refer to…

The programming model
The execution framework (aka “runtime”)
The specific implementation

Usage is usually clear from context!


MapReduce Implementations

Google has a proprietary implementation in C++
  Bindings in Java, Python
Hadoop is an open-source implementation in Java
  Development led by Yahoo, used in production
  Now an Apache project
  Rapidly expanding software ecosystem
Lots of custom research implementations
  For GPUs, cell processors, etc.


[Figure: MapReduce execution overview. (1) The user program submits the job to the master. (2) The master schedules map tasks and reduce tasks on workers. Map phase: map workers (3) read the input splits (split 0 to split 4) from the input files and (4) write intermediate files to local disk. Reduce phase: reduce workers (5) read the intermediate files remotely and (6) write the output files (output file 0, output file 1).]

Adapted from (Dean and Ghemawat, OSDI 2004)


MapReduce Algorithm Design


“Everything Else”

The execution framework handles everything else…
  Scheduling: assigns workers to map and reduce tasks
  “Data distribution”: moves processes to data
  Synchronization: gathers, sorts, and shuffles intermediate data
  Errors and faults: detects worker failures and restarts

Limited control over data and execution flow
  All algorithms must be expressed in terms of m, r, c, p (map, reduce, combine, partition)

You don’t know:
  Where mappers and reducers run
  When a mapper or reducer begins or finishes
  Which input a particular mapper is processing
  Which intermediate key a particular reducer is processing


Tools for Programmer

Cleverly-constructed data structures
  Bring partial results together
Sort order of intermediate keys
  Control order in which reducers process keys
Partitioner
  Control which reducer processes which keys
Preserving state in mappers and reducers
  Capture dependencies across multiple keys and values


Preserving State

Mapper object (one object per task), holds state
  configure: API initialization hook
  map: one call per input key-value pair
  close: API cleanup hook

Reducer object (one object per task), holds state
  configure: API initialization hook
  reduce: one call per intermediate key
  close: API cleanup hook


Scalable Hadoop Algorithms: Themes

Avoid object creation
  Inherently costly operation
  Garbage collection
Avoid buffering
  Limited heap size
  Works for small datasets, but won’t scale!


Importance of Local Aggregation

Ideal scaling characteristics:
  Twice the data, twice the running time
  Twice the resources, half the running time
Why can’t we achieve this?
  Synchronization requires communication
  Communication kills performance
Thus… avoid communication!
  Reduce intermediate data via local aggregation
  Combiners can help


Shuffle and Sort

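[Figure: shuffle and sort. Mapper output goes into a circular buffer (in memory), is written out as spills (on disk), and the spills are merged (on disk) into intermediate files (on disk); a combiner can run when spilling and when merging. The reducer fetches intermediate files from this mapper and from other mappers; other reducers fetch their partitions in the same way.]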


Word Count: Baseline

What’s the impact of combiners?


Word Count: Version 1

Are combiners still needed?


Word Count: Version 2

Are combiners still needed?

Key: preserve state across input key-value pairs!


Design Pattern for Local Aggregation

“In-mapper combining”
  Fold the functionality of the combiner into the mapper by preserving state across multiple map calls
Advantages
  Speed
  Why is this faster than actual combiners?
Disadvantages
  Explicit memory management required
  Potential for order-dependent bugs
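A minimal Python sketch of in-mapper combining for word count. The class and its configure / map / close methods are modeled on the mapper lifecycle described earlier; they are illustrative names, not the Hadoop API, and the sketch assumes the per-task dictionary fits in memory.

```python
from collections import Counter


class InMapperCombiningWordCount:
    """Word-count mapper that aggregates locally across map() calls."""

    def configure(self):
        # Initialization hook: state that survives across map() calls.
        self.counts = Counter()

    def map(self, docid, text):
        # Nothing is emitted here; counts are only accumulated.
        for word in text.split():
            self.counts[word] += 1

    def close(self):
        # Cleanup hook: emit one pair per distinct word seen by this task.
        return sorted(self.counts.items())


mapper = InMapperCombiningWordCount()
mapper.configure()
mapper.map("d1", "a b a")
mapper.map("d2", "b c")
emitted = mapper.close()
print(emitted)  # [('a', 2), ('b', 2), ('c', 1)]
```

Five words went in but only three pairs come out, and no intermediate (word, 1) objects were ever created or serialized. The price is that the dictionary must be managed explicitly, for example by flushing it when it grows too large.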


Combiner Design

Combiners and reducers share same method signature
  Sometimes, reducers can serve as combiners
  Often, not…
Remember: combiners are optional optimizations
  Should not affect algorithm correctness
  May be run 0, 1, or multiple times
Example: find average of all integers associated with the same key
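A short Python sketch of why the mean is a problem and of one common way around it. It assumes the mapper emits (value, 1) pairs, so that the combiner and the reducer both work on (sum, count) pairs; the function names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


# Using the mean itself as a combiner gives the wrong answer:
split_a, split_b = [1, 2], [3, 4, 5]
mean_of_means = mean([mean(split_a), mean(split_b)])   # mean of 1.5 and 4.0
true_mean = mean(split_a + split_b)
print(mean_of_means, true_mean)  # 2.75 3.0


def combine(pairs):
    """Combiner: add up partial (sum, count) pairs; output has the same type."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total, count


def reduce_mean(pairs):
    """Reducer: merge all (sum, count) pairs, then divide once at the end."""
    total, count = combine(pairs)
    return total / count


as_pairs = lambda values: [(v, 1) for v in values]
with_combiner = reduce_mean([combine(as_pairs(split_a)), combine(as_pairs(split_b))])
without_combiner = reduce_mean(as_pairs(split_a + split_b))
print(with_combiner, without_combiner)  # 3.0 3.0
```

Because the combiner's input and output types match, the result is the same whether the combiner runs zero, one, or several times.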


Computing the Mean: Version 1

Why can’t we use reducer as combiner?


Computing the Mean: Version 2

Why doesn’t this work?


Computing the Mean: Version 3

Fixed?


Computing the Mean: Version 4

Are combiners still needed?


Algorithm Design: Running Example

Term co-occurrence matrix for a text collection
  M = N x N matrix (N = vocabulary size)
  Mij: number of times i and j co-occur in some context (for concreteness, let’s say context = sentence)
Why?
  Distributional profiles as a way of measuring semantic distance
  Semantic distance useful for many language processing tasks


MapReduce: Large Counting Problems

Term co-occurrence matrix for a text collection = specific instance of a large counting problem
  A large event space (number of terms)
  A large number of observations (the collection itself)
  Goal: keep track of interesting statistics about the events
Basic approach
  Mappers generate partial counts
  Reducers aggregate partial counts

How do we aggregate partial counts efficiently?


First Try: “Pairs”

Each mapper takes a sentence:
  Generate all co-occurring term pairs
  For all pairs, emit (a, b) → count
Reducers sum up counts associated with these pairs
Use combiners!
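A rough Python sketch of the “pairs” approach, run in a single process. It assumes that two terms co-occur when they appear at different positions of the same sentence, and that pairs are ordered, so (a, b) and (b, a) are counted separately; the function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations


def pairs_map(sentence):
    """Emit ((a, b), 1) for every ordered pair of positions in the sentence."""
    terms = sentence.split()
    for a, b in permutations(terms, 2):
        yield (a, b), 1


def pairs_reduce(pair, counts):
    """Sum the partial counts of one pair."""
    return pair, sum(counts)


def cooccurrence_pairs(sentences):
    groups = defaultdict(list)            # stands in for shuffle and sort
    for sentence in sentences:
        for pair, count in pairs_map(sentence):
            groups[pair].append(count)
    return dict(pairs_reduce(p, c) for p, c in sorted(groups.items()))


matrix = cooccurrence_pairs(["a b c", "a b"])
print(matrix[("a", "b")])  # 2
print(matrix[("a", "c")])  # 1
```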


Pairs: Pseudo-Code


“Pairs” Analysis

Advantages
  Easy to implement, easy to understand
Disadvantages
  Lots of pairs to sort and shuffle around (upper bound?)
  Not many opportunities for combiners to work


Another Try: “Stripes”

Idea: group together pairs into an associative array

(a, b) → 1
(a, c) → 2
(a, d) → 5
(a, e) → 3
(a, f) → 2

becomes

a → { b: 1, c: 2, d: 5, e: 3, f: 2 }

Each mapper takes a sentence:
  Generate all co-occurring term pairs
  For each term, emit a → { b: count_b, c: count_c, d: count_d … }
Reducers perform element-wise sum of associative arrays

  a → { b: 1, d: 5, e: 3 }
+ a → { b: 1, c: 2, d: 2, f: 2 }
= a → { b: 2, c: 2, d: 7, e: 3, f: 2 }

Key: cleverly-constructed data structure brings together partial results
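The “stripes” idea can be sketched in Python with collections.Counter playing the role of the associative array. As before, this is a single-process illustration with assumed function names, and co-occurrence is taken to mean “at a different position of the same sentence”.

```python
from collections import Counter


def stripes_map(sentence):
    """Emit one stripe (term -> neighbor counts) per term occurrence."""
    terms = sentence.split()
    for i, a in enumerate(terms):
        stripe = Counter(b for j, b in enumerate(terms) if j != i)
        if stripe:
            yield a, stripe


def stripes_reduce(term, stripes):
    """Element-wise sum of all stripes that share the same key."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)      # Counter.update adds counts
    return term, dict(total)


# The element-wise sum shown above:
term, merged = stripes_reduce("a", [
    Counter({"b": 1, "d": 5, "e": 3}),
    Counter({"b": 1, "c": 2, "d": 2, "f": 2}),
])
print(merged == {"b": 2, "c": 2, "d": 7, "e": 3, "f": 2})  # True
```

The same stripes_reduce function can also serve as a combiner, since its input and output have the same shape.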


Stripes: Pseudo-Code


“Stripes” Analysis

Advantages
  Far less sorting and shuffling of key-value pairs
  Can make better use of combiners
Disadvantages
  More difficult to implement
  Underlying object more heavyweight
  Fundamental limitation in terms of size of event space


Cluster size: 38 cores
Data source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)


Relative Frequencies

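How do we estimate relative frequencies from counts?

\[ f(B|A) = \frac{\mathrm{count}(A, B)}{\mathrm{count}(A)} = \frac{\mathrm{count}(A, B)}{\sum_{B'} \mathrm{count}(A, B')} \]

count(A, B) is the joint event count; count(A) is the marginal.

Why do we want to do this?
How do we do this with MapReduce?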


f(B|A): “Stripes”

Easy!
  One pass to compute (a, *)
  Another pass to directly compute f(B|A)

a → {b1:3, b2:12, b3:7, b4:1, … }


f(B|A): “Pairs”

For this to work:
  Must emit extra (a, *) for every bn in mapper
  Must make sure all a’s get sent to same reducer (use partitioner)
  Must make sure (a, *) comes first (define sort order)
  Must hold state in reducer across different key-value pairs

Reducer input, in sorted order:
  (a, *) → 32    (reducer holds this value in memory)
  (a, b1) → 3
  (a, b2) → 12
  (a, b3) → 7
  (a, b4) → 1
  …

Reducer output:
  (a, b1) → 3 / 32
  (a, b2) → 12 / 32
  (a, b3) → 7 / 32
  (a, b4) → 1 / 32
  …
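The reducer side of this scheme can be sketched in Python. The input below is assumed to be already aggregated per key and uses the numbers above (the marginal 32 includes the pairs elided by “…”); sort_key stands in for the custom sort order, and a single call to relative_frequencies stands in for one reducer that receives every key starting with the same a.

```python
MARGINAL = "*"


def sort_key(item):
    """Sort by left term, with the special (a, *) key before all (a, b) keys."""
    (a, b), _count = item
    return a, b != MARGINAL, b     # False sorts before True


def relative_frequencies(pairs):
    result = {}
    marginal = None                # state held across key-value pairs
    for (a, b), count in sorted(pairs, key=sort_key):
        if b == MARGINAL:
            marginal = count       # arrives first thanks to the sort order
        else:
            result[(a, b)] = count / marginal
    return result


reducer_input = [
    (("a", "b2"), 12), (("a", "b1"), 3), (("a", MARGINAL), 32),
    (("a", "b4"), 1), (("a", "b3"), 7),
]
freqs = relative_frequencies(reducer_input)
print(freqs[("a", "b1")])  # 0.09375
print(freqs[("a", "b2")])  # 0.375
```

No buffering of the joint counts is needed: each relative frequency is emitted as soon as its pair is seen.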


“Order Inversion”

Common design pattern
  Computing relative frequencies requires marginal counts
  But marginal cannot be computed until you see all counts
  Buffering is a bad idea!
  Trick: getting the marginal counts to arrive at the reducer before the joint counts
Optimizations
  Apply in-memory combining pattern to accumulate marginal counts
  Should we apply combiners?


Synchronization: Pairs vs. Stripes

Approach 1: turn synchronization into an ordering problem
  Sort keys into correct order of computation
  Partition key space so that each reducer gets the appropriate set of partial results
  Hold state in reducer across multiple key-value pairs to perform computation
  Illustrated by the “pairs” approach
Approach 2: construct data structures that bring partial results together
  Each reducer receives all the data it needs to complete the computation
  Illustrated by the “stripes” approach


Secondary Sorting

MapReduce sorts input to reducers by key
  Values may be arbitrarily ordered
What if we want to sort the values as well?
  E.g., k → (v1, r), (v3, r), (v4, r), (v8, r)…


Secondary Sorting: Solutions

Solution 1:
  Buffer values in memory, then sort
  Why is this a bad idea?
Solution 2:
  “Value-to-key conversion” design pattern: form composite intermediate key, (k, v1)
  Let execution framework do the sorting
  Preserve state across multiple key-value pairs to handle processing
  Anything else we need to do?
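A Python sketch of value-to-key conversion. The sorted() call stands in for the framework's sort, and the partition function illustrates one usual answer to the last question: partition on the natural key k only, so that all composite keys (k, v) for the same k reach the same reducer. All names are illustrative, and zlib.crc32 is an assumed stand-in for a stable hash.

```python
import zlib
from itertools import groupby


def to_composite(key, value):
    """Move the sortable part v of the value (v, r) into the key."""
    v, r = value
    return (key, v), r


def partition(composite_key, num_partitions):
    """Partition on the natural key only, ignoring the value part."""
    natural_key, _v = composite_key
    return zlib.crc32(natural_key.encode("utf-8")) % num_partitions


def secondary_sort(records):
    # The framework sorts by composite key; nothing is buffered and sorted by hand.
    composite = sorted(to_composite(k, v) for k, v in records)
    result = {}
    for natural_key, group in groupby(composite, key=lambda kv: kv[0][0]):
        result[natural_key] = [(ck[1], r) for ck, r in group]
    return result


records = [("k", (4, "r4")), ("k", (1, "r1")), ("k", (8, "r8")), ("k", (3, "r3"))]
print(secondary_sort(records))
# {'k': [(1, 'r1'), (3, 'r3'), (4, 'r4'), (8, 'r8')]}
```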


Recap: Tools for Synchronization

Cleverly-constructed data structures
  Bring data together
Sort order of intermediate keys
  Control order in which reducers process keys
Partitioner
  Control which reducer processes which keys
Preserving state in mappers and reducers
  Capture dependencies across multiple keys and values


Issues and Tradeoffs

Number of key-value pairs
  Object creation overhead
  Time for sorting and shuffling pairs across the network
Size of each key-value pair
  De/serialization overhead
Local aggregation
  Opportunities to perform local aggregation vary
  Combiners make a big difference
  Combiners vs. in-mapper combining
  RAM vs. disk vs. network


Debugging at Scale

Works on small datasets, won’t scale… why?
  Memory management issues (buffering and object creation)
  Too much intermediate data
  Mangled input records
Real-world data is messy!
  Word count: how many unique words in Wikipedia?
  There’s no such thing as “consistent data”
  Watch out for corner cases
  Isolate unexpected behavior, bring local


Hadoop and Java Practice


Data Types in Hadoop

Writable: Defines a de/serialization protocol. Every data type in Hadoop is a Writable.

WritableComparable: Defines a sort order. All keys must be of this type (but not values).

IntWritable, LongWritable, Text, …: Concrete classes for different data types.

SequenceFiles: Binary encoding of a sequence of key/value pairs.


Complex Data Types in Hadoop

How do you implement complex data types?
The easiest way:
  Encode it as Text, e.g., (a, b) = “a:b”
  Use regular expressions to parse and extract data
  Works, but pretty hack-ish
The hard way:
  Define a custom implementation of WritableComparable
  Must implement: readFields, write, compareTo
  Computationally efficient, but slow for rapid prototyping
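To make the three required methods concrete, here is a Python analogue of a custom pair type. It is not the Hadoop API: the class IntPair and its methods write, read_fields and compare_to are illustrative counterparts of write, readFields and compareTo, with struct providing an assumed fixed-width binary encoding.

```python
import struct
from functools import total_ordering


@total_ordering
class IntPair:
    """A pair of ints with a binary encoding and a sort order."""

    def __init__(self, left=0, right=0):
        self.left, self.right = left, right

    def write(self):
        """Serialize as two big-endian 32-bit integers."""
        return struct.pack(">ii", self.left, self.right)

    def read_fields(self, data):
        """Deserialize in place from the bytes produced by write()."""
        self.left, self.right = struct.unpack(">ii", data)

    def compare_to(self, other):
        """Negative, zero or positive: order by left, then by right."""
        mine, theirs = (self.left, self.right), (other.left, other.right)
        return (mine > theirs) - (mine < theirs)

    def __eq__(self, other):
        return self.compare_to(other) == 0

    def __lt__(self, other):
        return self.compare_to(other) < 0


copy = IntPair()
copy.read_fields(IntPair(3, 7).write())
print(copy.left, copy.right)  # 3 7

ordered = sorted([IntPair(2, 1), IntPair(1, 9), IntPair(1, 2)])
print([(p.left, p.right) for p in ordered])  # [(1, 2), (1, 9), (2, 1)]
```

Compared with the “a:b” text encoding, nothing has to be parsed with regular expressions and the encoded size is fixed, at the cost of writing more code up front.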


Basic Cluster Components

One of each:
  Namenode (NN)
  Jobtracker (JT)
Set of each per slave machine:
  Tasktracker (TT)
  Datanode (DN)


Putting everything together…

[Figure: cluster layout. The namenode machine runs the namenode daemon; the job submission node runs the jobtracker; each of the three slave nodes runs a tasktracker and a datanode daemon on top of the Linux file system.]


Anatomy of a Job

MapReduce program in Hadoop = Hadoop job
  Jobs are divided into map and reduce tasks
  An instance of running a task is called a task attempt
  Multiple jobs can be composed into a workflow
Job submission process
  Client (i.e., driver program) creates a job, configures it, and submits it to job tracker
  JobClient computes input splits (on client end)
  Job data (jar, configuration XML) are sent to JobTracker
  JobTracker puts job data in shared location, enqueues tasks
  TaskTrackers poll for tasks
  Off to the races…


[Figure: the InputFormat divides the input files into InputSplits; a RecordReader reads each InputSplit and feeds its records to a Mapper, which produces intermediates.]

Source: redrawn from a slide by Cloudera, cc-licensed


[Figure: the output of each Mapper passes through a Partitioner, which assigns the intermediates to Reducers; each Reducer receives its share of the intermediates from every Mapper (combiners omitted here).]

Source: redrawn from a slide by Cloudera, cc-licensed


[Figure: each Reducer writes its own output file through a RecordWriter provided by the OutputFormat.]

Source: redrawn from a slide by Cloudera, cc-licensed


Input and Output

InputFormat:
  TextInputFormat
  KeyValueTextInputFormat
  SequenceFileInputFormat
  …
OutputFormat:
  TextOutputFormat
  SequenceFileOutputFormat
  …


Shuffle and Sort in Hadoop

Probably the most complex aspect of MapReduce!
Map side
  Map outputs are buffered in memory in a circular buffer
  When buffer reaches threshold, contents are “spilled” to disk
  Spills merged in a single, partitioned file (sorted within each partition): combiner runs here
Reduce side
  First, map outputs are copied over to reducer machine
  “Sort” is a multi-pass merge of map outputs (happens in memory and on disk): combiner runs here
  Final merge pass goes directly into reducer


Q&A


What is Hugs?

An interpreter for Haskell, and the most widely used implementation of the language;

An interactive system, which is well-suited for teaching and prototyping purposes;

Hugs is freely available from:

www.haskell.org/hugs


The Standard Prelude

When Hugs is started it first loads the library file Prelude.hs, and then repeatedly prompts the user for an expression to be evaluated.

For example:

> 2+3*4
14

> (2+3)*4
20

The standard prelude also provides many useful functions that operate on lists. For example:

> length [1,2,3,4]
4

> product [1,2,3,4]
24

> take 3 [1,2,3,4,5]
[1,2,3]


Function Application

In mathematics, function application is denoted using parentheses, and multiplication is often denoted using juxtaposition or space.

f(a,b) + c d

Apply the function f to a and b, and add the result to the product of c and d.


In Haskell, function application is denoted using space, and multiplication is denoted using *.

f a b + c*d

As previously, but in Haskell syntax.