85
Introduction to Map/Reduce Examples and Principles

Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Introduction to Map/Reduce

Examples and Principles

Page 2: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Recall the framework:

D1 map()

• User defines <key,value>, mapper, and reducer

reduce() Output

Page 3: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Recall the framework:

D1

D2

Dn

map()

map()

map()

• Hadoop handles the logistics

reduce()

reduce()

O1

Om

Page 4: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Hadoop Rule of Thumb

• 1 mapper per data split (typically)

Page 5: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Hadoop Rule of Thumb

• 1 mapper per data split (typically)

• 1 reducer per computer core (best parallelism)

Page 6: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Hadoop Rule of Thumb

• 1 mapper per data split (typically)

• 1 reducer per computer core (best parallelism)

Processing Time

Number Output Files

Page 7: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Wordcount Strategy

• Let <word, 1> be the <key,value> • Simple mapper & reducer • Hadoop did the hard work of shuffling &

grouping

Page 8: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Good key-value properties • Simple • Enables reducers to get correct output

Shuffling & Grouping

Key-Value simplicity

Page 9: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Good Task Decomposition:

Mappers: simple and separable Reducers: easy consolidation

Page 10: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Example: Trending Wordcount

Page 11: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount • Twitter Data: date, message, location,

… [other metadata]

Page 12: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount • Twitter Data: date, message, location,

… [other metadata]

Task 1 Get word count by day Task 2 Get total word count

Page 13: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount

Task 1: get word count by day

Page 14: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount

Task 1: get word count by day Design: Use composite key Map/Reduce: <date word,count>

Page 15: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount

Task 2: get total word count

Page 16: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount

Task 2: get total word count Easy way: re-use previous wordcount

Page 17: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Trending Wordcount

Task 2: get total word count Alternatively: use Task 1 output (it’s partially aggregated)

Page 18: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Cascading Map/Reduce

D1 map() reduce() Output

Task 1:

E1 map() reduce() Output

Task 2:

Task 3 …

Page 19: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Example: Joining Data

Page 20: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: combine datasets by key

– A standard data management function

Page 21: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: combine datasets by key

– A standard data management function – In pseudo SQL

Select * from table A, table B, where A.key=B.key

Page 22: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: combine datasets by key

– A standard data management function – In pseudo SQL

Select * from table A, table B, where A.key=B.key

– Joins can be inner, left or right outer

Page 23: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: given two wordcount datasets …

Page 24: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: given two wordcount datasets …

able , 5

actor , 18 burger , 25 . . .

File A: <word, total-count>

Page 25: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Task: given two wordcount datasets …

able , 5

actor , 18 burger , 25 . . .

File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 May-03 actor , 3 Jul-4 burger, 20 . . .

Page 26: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

able , 5 actor , 18 burger , 25 . . .

Joining Data • Task: combine by word

Jan-16 able , 2 Feb-22 actor , 15 May-03 actor , 3 Jul-04 burger, 20 . . .

File A: <word, total-count> File B: <date word, day-count>

Page 27: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Result wanted:

able Jan-16, 2 5 actor Feb-22, 15 18 actor May-03, 1 18 burger Jul-04, 20 25 . . .

File AjoinB: <word date, day-count total-count >

Page 28: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Joining Data • Recall that data is split in parts

actor 18

Feb-22 actor 15

May-03 actor 1

Apr-15 actor 2 How to gather the right pieces?

Page 29: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition • Main design consideration: Join depends on word (e.g. Select * where A.word=B.word)

Page 30: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition • For the join:

– Let <key> = word – Let <value> = other info

<word, … >

Page 31: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition

• Note: able , 5

actor , 18 . . .

File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .

Page 32: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition

• Note: able , 5

actor , 18 . . .

File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .

word already the key

Page 33: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition

• Note: able , 5

actor , 18 . . .

File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .

date needs to be filtered out

Page 34: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition

• Note: able , 5

actor , 18 . . .

File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .

date needs to be filtered out Where should date info go?

Page 35: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Key-Value & Task Decomposition <word, date day-count total-count > put date into value field

Page 36: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • Now data sets are:

File B_new: <word, date count> File A: <word, total-count> able , Jan 16 2 actor , Feb-22 15 actor , May-03 3 burger , Jul-04 20 . . .

able , 5 actor , 18 burger , 25 . . .

Page 37: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • How will Hadoop shuffle & group these?

File B_new: <word, date day-count> File A: <word, total-count> able , Jan-16 2 actor , Feb-22 15 actor , May-03 3 burger , Jul-04 20 . . .

able , 5 actor , 18 burger , 25 . . .

Page 38: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • How will Hadoop shuffle & group these?

actor , Feb-22 15 actor , May-03 3

actor , 18

Let’s focus on 1 key:

Page 39: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • Hadoop gathers the data for a join

actor , Feb-22 15 actor , May-03 3

actor , 18

actor , Feb-22 15 actor , 18 actor , May-03 3

Page 40: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • Reducer now has all the data for same

word grouped together

actor , 18 actor , Feb-22 15 actor , May-03 3

A number or date indicates file source

Page 41: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition • Reducer can now join the data and put

date back into key

actor , 18 actor , Feb-22 15 actor , May-03 3

Feb-22 actor, 15 18 May-03 actor, 3 18

Page 42: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Example: Vector Multiplication

Page 43: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication • Task: multiply 2 arrays of N numbers

– A basic mathematical operation – Let’s assume N is very large

Page 44: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

X A B = (𝟓 x 𝟐.𝟕) # 1st of A & B + (𝟒 x 1.9) # 2nd of A & B + (–𝟑.𝟐 x –1.3) # 3rd … . . . + (–𝟐 x 𝟏) # Nth of A & B

5 4 -3.2 . . . -2

2.7 1.9 -1.3 . . . 1

• Task: multiply 2 arrays of N numbers

Page 45: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

• Recall: – data partitioned in HDFS

5 4

-3.2 .

. -2

2.7

1.9 -1.3 .

. 1

. . . . . .

A B

Page 46: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication • Main design consideration: need elements with same index together Let <key, value> = <index, number>

Page 47: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

• Problem: array partitions don’t have an index

5 4

-3.2 .

. -2

2.7

1.9 -1.3 .

. 1

. . . . . .

A B

Page 48: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

• Problem: array partitions don’t have an index

5 4

-3.2 .

. -2

2.7

1.9 -1.3 .

. 1

. . . . . .

A B

no index!

Page 49: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

5 4

-3.2 .

. -2

2.7

1.9 -1.3 .

. 1

. . . . . .

A B

Environment Information

Page 50: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

5 4

-3.2 .

. -2

2.7

1.9 -1.3 .

. 1

. . . . . .

A B

Environment Information

map() os.getenv ('map_input_file')

Side effect

map() can access info

info outside map/reduce <key, value>

Page 51: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication • Let’s assume:

– each line already has <index, number>

B A 1, 5 2, 4

3, -3.2 .

. N, -2

1, 2.7

2, 1.9 3,-1.3 .

. N, 1

. . . . . .

Page 52: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication • Let’s assume:

– each line already has <index, number>

B A 1, 5 2, 4

3, -3.2 .

. N, -2

1, 2.7

2, 1.9 3,-1.3 .

. N, 1

. . . . . . Note: mapper only needs to pass data (identity function)

Page 53: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication

shuffle & group indices

B A

1, 5 2, 4

3, -3.2 .

. N, -2

1, 2.7

2, 1.9 3,-1.3 .

. N, 1

. . . . . .

<index, num> 1, 5

1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4 …

<index, num> <index, num>

Page 54: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication What should reducers do?

1, 5 1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4

<index, num>

. . .

A,B grouped

Page 55: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication Reducer: -get pairs of <index, number>

1, 5 1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4

<index, num>

A,B grouped

. . .

Page 56: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication Reducer: -get pairs of <index, number> -multiply & add

1, 5 1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4

<index, num>

A,B grouped

17.66

7.6

subtotals

. . .

+

Page 57: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector Multiplication Reducer: -get pairs of <index, number> -multiply & add (Still need get total sum, but should be largely reduced)

1, 5 1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4

<index, num>

A,B grouped

17.66

7.6

subtotals

. . .

+

Page 58: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs

In Vector Multiplication

Page 59: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs

• For Vector Multiplication – How many <index, number> are

output from map()?

Page 60: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs

• For Vector Multiplication – How many <index, number> are

output from map()? – How many <index> groups have to

be shuffled?

Page 61: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs

• How many <index, number> are output ? A B

1, 5 2, 4 3, -3.2 . . . N, -2

1, 2.7 2, 1.9 3, -1.3 . . . N, 1

For: 2 Vectors with N indices each Then: 2N <index, number> are output from map()

Page 62: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • How many <index> groups have to be shuffled?

For: 2N indices and N pairs Then: N groups are shuffled to reducers

1, 5 1, 2.7 3, -1.3 3, -3.2

2, 1.9 2, 4 …

<index, num>

A,B grouped

Page 63: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Can we reduce shuffling?

Page 64: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Can we reduce shuffling?

• Try: ‘combine’ map indices in mapper

(works better for Wordcount)

Page 65: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Can we reduce shuffling?

• Or Try: use index ranges of length R

Page 66: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Index Ranges: let R=10 & bin the array indices

1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N N keys

Page 67: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Index Ranges: let R=10 & bin the array indices

1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N

1 2 3 . . . N/R

place in bins

Page 68: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • For example, let R=10, and bin the array indices

1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N

1 2 3 . . . N/R

N keys are now N/R = N/10 keys

Page 69: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • For example, let R=10, and bin the array indices

1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N

1 2 3 . . . N/R

N keys are now N/R=N/10 keys <key,value> is now <index bin, original-index number>

Page 70: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Computational Costs • Now shuffling costs depend on N/R groups

If: R=1 Then: N/R=N groups (same as before) If: R>1 Then: N/R<N (less shuffling to do)

Page 71: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

• Trade-offs:

Computational Costs

If: size of (N/R) ↑ Then: shuffle costs ↑

Page 72: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

• Trade-offs:

Computational Costs

If: size of (N/R) ↑ Then: shuffle costs ↑ But: reducer complexity ↓

Page 73: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

• Trade-offs:

Computational Costs

If: size of (N/R) ↑ Then: shuffle costs ↑ But: reducer complexity ↓

-you control R (specific tradeoffs depend on data and hardware)

Page 74: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Vector to Matrices • Matrix multiplication needs row-index and

col-index in the keys

• Matrix multiplication more pertinent to data analytic topics

Page 75: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary

And Looking Beyond

Page 76: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Task Decomposition

• mappers are separate and independent • mappers work on data parts

Page 77: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• <key, value> must enable correct output • Let Hadoop do the hard work • Trade-offs

Hadoop

Programming

Page 78: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Common mappers: – Filter (subset data) – Identity (just pass data) – Splitter (as for counting)

Page 79: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Composite <keys>

Page 80: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Composite <keys> • Extra info in <values>

Page 81: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs

Page 82: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs • Bin keys into ranges

Page 83: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Summary of Map/Reduce Design Considerations

• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs • Bin keys into ranges • Aggregate map output when possible

(combiner option)

Page 84: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Potential Limitations Map/Reduce

• Must fit <key, value> paradigm • Map/Reduce data not persistent • Requires programming/debugging • Not interactive

Page 85: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall

Beyond Map/Reduce

• Data access tools (Pig, HIVE) – SQL like syntax

• Interactivity & Persistency (Spark)