Knowledge Discovery and Data Mining 1 (VO) (707.003): Map-Reduce
Denis Helic
KTI, TU Graz
Oct 24, 2013
Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82
Big picture: KDDM
Linear Algebra Map-Reduce
Mathematical Tools Infrastructure
Knowledge Discovery Process
Information Theory Statistical Inference
Probability Theory
Outline
1 Motivation
2 Large Scale Computation
3 Map-Reduce
4 Environment
5 Map-Reduce Skew
Slides
Slides are partially based on the “Mining Massive Datasets” course from Stanford University by Jure Leskovec
Motivation
Map-Reduce
Today’s data is huge
Challenges
How to distribute computation?
Distributed/parallel programming is hard
Map-reduce addresses both of these points
Google’s computational/data manipulation model
Elegant way to work with huge data
Motivation
Single node architecture
Figure: Single node architecture: CPU, memory, and disk
Data fits in memory
Machine learning, statistics
“Classical” data mining
Motivation
Motivation: Google example
20+ billion Web pages
Approx. 20 KB per page
Approx. 400+ TB for the whole Web
Approx. 1000 hard drives to store the Web
A single computer reads 30–35 MB/s from disk
Approx. 4 months to read the Web with a single computer
Motivation
Motivation: Google example
Takes even more time to do something with the data
E.g. to calculate the PageRank
If m is the number of links on the Web
The average degree on the Web is approx. 10, thus m ≈ 2 · 10^11
To calculate PageRank we need m multiplications per iteration step
We need approx. 100+ iteration steps
Motivation
Motivation: Google example
Today a standard architecture for such problems is emerging
Cluster of commodity Linux nodes
Commodity network (ethernet) to connect them
2–10 Gbps between racks
1 Gbps within racks
Each rack contains 16–64 nodes
Motivation
Cluster architecture
Figure: Cluster architecture: racks of commodity nodes (each with CPU, memory, and disk), connected by intra-rack and inter-rack switches
Motivation
Motivation: Google example
2011 estimation: Google had approx. 1 million machines
http://www.datacenterknowledge.com/archives/2011/08/01/report-google-uses-about-900000-servers/
Other examples: Facebook, Twitter, Amazon, etc.
But also smaller examples: e.g. Wikipedia
Single-source shortest path: O(m + n) time complexity; for Wikipedia m + n ≈ 260 · 10^6
Large Scale Computation
Large scale computation
Large scale computation for data mining on commodity hardware
Challenges
How to distribute computation?
How can we make it easy to write distributed programs?
How to cope with machine failures?
Large Scale Computation
Large scale computation: machine failures
One server may stay up 3 years (1000 days)
The failure rate per day: p = 10^−3
How many failures per day if we have n machines?
Binomial r.v.
Large Scale Computation
Large scale computation: machine failures
PMF of a Binomial r.v.:

p(k) = (n choose k) (1 − p)^(n−k) p^k

Expectation of a Binomial r.v.:

E[X] = np
Large Scale Computation
Large scale computation: machine failures
n = 1000, E[X] = 1
If we have 1000 machines we lose one per day
n = 1000000, E[X] = 1000
If we have 1 million machines (Google) we lose one thousand per day
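These expectations are easy to check numerically; a minimal sketch (function names are illustrative):

```python
def expected_failures(n, p=1e-3):
    # Expectation of a Binomial(n, p) r.v.: E[X] = n * p failures per day
    return n * p

def prob_at_least_one_failure(n, p=1e-3):
    # P(X >= 1) = 1 - P(no machine fails) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p) ** n

print(expected_failures(1000))     # 1.0 expected failure per day
print(expected_failures(1000000))  # 1000.0 expected failures per day
```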
Large Scale Computation
Coping with node failures: Exercise
Exercise
Suppose a job consists of n tasks, each of which takes time T seconds. Thus, if there are no failures, the sum over all compute nodes of the time taken to execute tasks at that node is nT. Suppose also that the probability of a task failing is p per job per second, and when a task fails, the overhead of management of the restart is such that it adds 10T seconds to the total execution time of the job. What is the total expected execution time of the job?
Example
Example 2.4.1 from “Mining Massive Datasets”.
Large Scale Computation
Coping with node failures: Exercise
Failure of a single task is a Bernoulli r.v. with parameter p
The number of failures in n tasks is a Binomial r.v. with parameters n and p
PMF of a Binomial r.v.:

p(k) = (n choose k) (1 − p)^(n−k) p^k
Large Scale Computation
Coping with node failures: Exercise
The time until the first failure of a task is a Geometric r.v. with parameter p
PMF:

p(k) = (1 − p)^(k−1) p
Large Scale Computation
Coping with node failures: Exercise
If we go to very fine time scales we can approximate a Geometric r.v. with an Exponential r.v. with λ = p

The time T until a task fails is distributed exponentially

PDF:

f(t; λ) = λ e^(−λt) for t ≥ 0, and 0 for t < 0

CDF:

F(t; λ) = 1 − e^(−λt) for t ≥ 0, and 0 for t < 0
Large Scale Computation
Coping with node failures: Exercise
Expected execution time of a task:

E[T] = P_S T_S + P_F (E[T_L] + T_R + E[T])

where T_S is the task's failure-free execution time, T_R the restart overhead, T_L the time lost to a failure, and

P_S = 1 − P_F
P_F = 1 − e^(−λT_S)
P_S = e^(−λT_S)
Large Scale Computation
Coping with node failures: Exercise
After simplifying, we get:

E[T] = T_S + (P_F / (1 − P_F)) (E[T_L] + T_R)

P_F / (1 − P_F) = (1 − e^(−λT_S)) / e^(−λT_S) = e^(λT_S) − 1    (1)

E[T] = T_S + (e^(λT_S) − 1) (E[T_L] + T_R)
Large Scale Computation
Coping with node failures: Exercise
E[T_L] = ?

What is the PDF of T_L?

T_L models the time lost because of a failure

Time is lost if and only if a failure occurs before the task finishes, i.e. we know that within [0, T_S] a failure has occurred

Let this be an event B
Large Scale Computation
Coping with node failures: Exercise
What is P(B)?

P(B) = F(T_S) = 1 − e^(−λT_S)

Our T_L is now a r.v. conditioned on the event B

I.e. we are interested in the probability of the event A (failure occurs at time t < T_S) given that B occurred

A has density λ e^(−λt)
Large Scale Computation
Coping with node failures: Exercise
P(A|B) = P(A ∩ B) / P(B)

What is A ∩ B?

A: failure occurs at time t < T_S
B: failure occurs within [0, T_S]
A ∩ B: failure occurs at time t < T_S
A ∩ B = A
Large Scale Computation
Coping with node failures: Exercise
P(A|B) = P(A) / P(B)

PDF of the r.v. T_L:

f(t) = λ e^(−λt) / (1 − e^(−λT_S))
Large Scale Computation
Coping with node failures: Exercise
Expectation E[T_L]:

E[T_L] = ∫_(−∞)^(∞) t f(t) dt

E[T_L] = ∫_0^(T_S) t λ e^(−λt) / (1 − e^(−λT_S)) dt = (1 / (1 − e^(−λT_S))) ∫_0^(T_S) t λ e^(−λt) dt    (2)
Large Scale Computation
Coping with node failures: Exercise
E[T_L] = (1 / (1 − e^(−λT_S))) [−t e^(−λt) − e^(−λt)/λ] evaluated from 0 to T_S = 1/λ − T_S / (e^(λT_S) − 1)

E[T] = T_S + (e^(λT_S) − 1) (1/λ − T_S / (e^(λT_S) − 1) + T_R) = (e^(λT_S) − 1) (1/λ + T_R)    (3)
Large Scale Computation
Coping with node failures: Exercise
For a single task:

E[T] = (e^(pT) − 1) (1/p + 10T)

For n tasks:

E[T] = n (e^(pT) − 1) (1/p + 10T)
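The final formula can be evaluated directly; a small sketch (assuming `T` in seconds and a per-second failure probability `p`; the function name is illustrative):

```python
import math

def expected_job_time(n, T, p):
    # Result of the derivation above: n tasks of T seconds each, failure
    # probability p per second, each failure costs the lost time plus a
    # 10*T restart overhead: E[T_total] = n * (e^(pT) - 1) * (1/p + 10T)
    return n * (math.exp(p * T) - 1.0) * (1.0 / p + 10.0 * T)

# Sanity check: as p -> 0, (e^(pT) - 1)/p -> T, so the job time tends to n*T
print(expected_job_time(100, 60.0, 1e-6))
```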
Large Scale Computation
Coordination
Using this information we can improve scheduling
We can also optimize checking for node failures
Check-pointing strategies
Large Scale Computation
Large scale computation: data copying
Copying data over network takes time
Bring data closer to computation
I.e. process data locally at each node
Replicate data to increase reliability
Map-Reduce
Solution: Map-reduce
Storage infrastructure
Distributed file system
Google: GFS
Hadoop: HDFS
Programming model
Map-reduce
Map-Reduce
Storage infrastructure
Problem: if a node fails, how do we store a file persistently?
Distributed file system
Provides a global file namespace
Typical usage pattern
Huge files: several 100s of GB to 1 TB
Data is rarely updated in place
Reads and appends are common
Map-Reduce
Distributed file system
Chunk servers
File is split into contiguous chunks
Typically each chunk is 16–64 MB
Each chunk is replicated (usually 2x or 3x)
Try to keep replicas in different racks
Map-Reduce
Distributed file system
Master node
Stores metadata about where files are stored
Might be replicated
Client library for file access
Talks to master node to find chunk servers
Connects directly to chunk servers to access data
Map-Reduce
Distributed file system
Reliable distributed file system
Seamless recovery from node failures
Bring computation directly to data
Chunk servers also used as computation nodes
Figure: Chunk servers 1 to N holding replicated chunks; data is kept in “chunks” spread across machines, each chunk replicated on different machines, giving seamless recovery from disk or machine failure. Chunk servers also serve as compute servers (“bring computation directly to the data”). From slides by Jure Leskovec.
Map-Reduce
Programming model: Map-reduce
Running example
We want to count the number of occurrences for each word in a collection of documents. In this example, the input file is a repository of documents, and each document is an element.
Example
This is by now the standard Map-reduce example.
Map-Reduce
Programming model: Map-reduce
words input_file | sort | uniq -c
Three-step process
1 Split file into words, each word on a separate line
2 Group and sort all words
3 Count the occurrences
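The same three-step pipeline can be sketched in Python (a toy, in-memory stand-in for the shell command, not for the distributed setting):

```python
from itertools import groupby

def word_count(text):
    # 1. Split the input into words
    words = text.split()
    # 2. Group: sorting brings equal words next to each other
    words.sort()
    # 3. Count: collapse each run of equal words, like `uniq -c`
    return [(word, len(list(run))) for word, run in groupby(words)]

print(word_count("to be or not to be"))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```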
Map-Reduce
Programming model: Map-reduce
This captures the essence of Map-reduce
Split|Group|Count
Naturally parallelizable
E.g. split and count
Map-Reduce
Programming model: Map-reduce
Sequentially read a lot of data
Map: extract something that you care about (key , value)
Group by key: sort and shuffle
Reduce: Aggregate, summarize, filter or transform
Write the result
Outline
Outline is always the same: Map and Reduce change to fit the problem
Map-Reduce
Programming model: Map-reduce
In brief, a map-reduce computation executes as follows:

1 Some number of Map tasks are each given one or more chunks from a distributed file system. These Map tasks turn their chunk into a sequence of key-value pairs. The way key-value pairs are produced from the input data is determined by the code written by the user for the Map function.

2 The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.

3 The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. The manner of combination of values is determined by the code written by the user for the Reduce function.

Figure: Schematic of a map-reduce computation: input chunks → key-value (k, v) pairs → group by keys → keys with all their values (k, [v, w, …]) → combined output. From the book “Mining Massive Datasets”.
Map-Reduce
Programming model: Map-reduce
Input: a set of (key, value) pairs (e.g. key is the filename, value is a single line in the file)

Map(k, v) → (k′, v′)*

Takes a (k, v) pair and outputs a set of (k′, v′) pairs
There is one Map call for each (k, v) pair

Reduce(k′, (v′)*) → (k′′, v′′)*

All values v′ with the same key k′ are reduced together and processed in v′ order
There is one Reduce call for each unique k′
Map-Reduce
Programming model: Map-reduce
Big Document

“Star Wars is an American epic space opera franchise centered on a film series created by George Lucas. The film series has spawned an extensive media franchise called the Expanded Universe including books, television series, computer and video games, and comic books. These supplements to the two film trilogies…”

Map:
(Star, 1) (Wars, 1) (is, 1) (an, 1) (American, 1) (epic, 1) (space, 1) (opera, 1) (franchise, 1) (centered, 1) (on, 1) (a, 1) (film, 1) (series, 1) (created, 1) (by, 1) …

Group by key:
(Star, 1) (Star, 1); (Wars, 1) (Wars, 1); (a, 1) (a, 1) (a, 1) (a, 1) (a, 1) (a, 1); (film, 1) (film, 1) (film, 1); (franchise, 1); (series, 1) (series, 1); …

Reduce:
(Star, 2) (Wars, 2) (a, 6) (film, 3) (franchise, 1) (series, 2) …
Map-Reduce
Programming model: Map-reduce
map(key, value):
// key: document name
// value: a single line from a document
foreach word w in value:
emit(w, 1)
Map-Reduce
Programming model: Map-reduce
reduce(key, values):
// key: a word
// values: an iterator over counts
result = 0
foreach count c in values:
result += c
emit(key, result)
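The two pseudocode functions above can be wired together with a minimal single-machine driver that mimics the Map, group-by-key, and Reduce phases (a sketch; `run_mapreduce` and the `yield`-based `emit` are illustrative, not a real framework API):

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: a single line from the document
    for w in value.split():
        yield (w, 1)

def reduce_fn(key, values):
    # key: a word; values: an iterable of counts
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: apply map_fn to every (key, value) input pair,
    # then group the intermediate pairs by key (the "shuffle")
    groups = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    # Reduce phase: one reduce_fn call per distinct intermediate key
    out = {}
    for k2, vs in sorted(groups.items()):
        for k3, v3 in reduce_fn(k2, vs):
            out[k3] = v3
    return out

counts = run_mapreduce([("doc1", "star wars is a film"),
                        ("doc2", "a film series")],
                       map_fn, reduce_fn)
print(counts)
```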
Environment
Map-reduce computation
Map-reduce environment takes care of:
1 Partitioning the input data
2 Scheduling the program’s execution across a set of machines
3 Performing the group-by-key step
4 Handling machine failures
5 Managing required inter-machine communication
Environment
Map-reduce computation
Figure: Map reads the input and produces a set of key-value pairs; group by key collects all pairs with the same key (hash merge, shuffle, sort, partition); Reduce collects all values belonging to each key and outputs the result. From the course by Jure Leskovec (Stanford University).
Environment
Map-reduce computation
All phases are distributed, with many tasks doing the work

Figure: From the course by Jure Leskovec (Stanford University)
Environment
Data flow
Input and final output are stored in distributed file system
Scheduler tries to schedule map tasks close to physical storagelocation of input data
Intermediate results are stored on local file systems of Map andReduce workers
Output is often input to another Map-reduce computation
Environment
Coordination: Master
Master node takes care of coordination
Task status, e.g. idle, in-progress, completed
Idle tasks get scheduled as workers become available
When a Map task completes, it notifies the master about the size andlocation of its intermediate files
Master pushes this info to reducers
Master pings workers periodically to detect failures
Environment
Map-reduce execution details
Figure: Overview of the execution of a map-reduce program (Figure 2.3 in “Mining Massive Datasets”): the user program forks a Master and Worker processes; the Master assigns Map and Reduce tasks to Workers; Map tasks read input chunks and write intermediate files on local disk, which Reduce tasks then read to produce the final output in the distributed file system.
Map-Reduce Skew
Maximizing parallelism
If we want maximum parallelism then
Use one Reduce task for each reducer (i.e. a single key and its associated value list)
Execute each Reduce task at a different compute node

This plan is typically not the best one
Map-Reduce Skew
Maximizing parallelism
There is overhead associated with each task we create
We might want to keep the number of Reduce tasks lower than the number of different keys
We do not want to create a task for a key with a “short” list
There are often far more keys than there are compute nodes
E.g. count words from Wikipedia or from the Web
Map-Reduce Skew
Input data skew: Exercise
Exercise
Suppose we execute the word-count map-reduce program on a large repository such as a copy of the Web. We shall use 100 Map tasks and some number of Reduce tasks.

1 Do you expect there to be significant skew in the times taken by the various reducers to process their value lists? Why or why not?
2 If we combine the reducers into a small number of Reduce tasks, say 10 tasks, at random, do you expect the skew to be significant? What if we instead combine the reducers into 10,000 Reduce tasks?
Example
The example is based on example 2.2.1 from “Mining Massive Datasets”.
Map-Reduce Skew
Maximizing parallelism
There is often significant variation in the lengths of value lists for different keys

Different reducers take different amounts of time to finish

If we make each reducer a separate Reduce task then the task execution times will exhibit significant variance
Map-Reduce Skew
Input data skew
Input data skew describes an uneven distribution of the number ofvalues per key
Examples include power-law graphs, e.g. the Web or Wikipedia
Other data with Zipfian distribution
E.g. the number of word occurrences
Map-Reduce Skew
Power-law (Zipf) random variable
PMF:

p(k) = k^(−α) / ζ(α),  k ∈ ℕ, k ≥ 1, α > 1

ζ(α) is the Riemann zeta function:

ζ(α) = Σ_(k=1)^∞ k^(−α)
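A quick numerical sketch of this PMF (the zeta value is approximated by a truncated sum; names are illustrative):

```python
def zipf_pmf(k, alpha, terms=100000):
    # p(k) = k^(-alpha) / zeta(alpha); zeta is approximated by truncating
    # the series, which is accurate here since the tail is O(terms^(1 - alpha))
    zeta = sum(i ** -alpha for i in range(1, terms + 1))
    return k ** -alpha / zeta

# For alpha = 2: p(1) = 1/zeta(2) = 6/pi^2 ~ 0.608, and p(2)/p(1) = 2^-2 = 0.25
print(zipf_pmf(1, 2.0), zipf_pmf(2, 2.0))
```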
Map-Reduce Skew
Power-law (Zipf) random variable
Figure: Probability mass function of a Zipf random variable for differing α values (α = 2.0 and α = 3.0)
Map-Reduce Skew
Power-law (Zipf) input data
Figure: Key size distribution (log scale): sum = 189681, µ = 1.897, σ² = 58.853
Map-Reduce Skew
Tackling input data skew
We need to distribute skewed (power-law) input data over a number of reducers/Reduce tasks/compute nodes

The distribution of the key lengths inside the reducers/Reduce tasks/compute nodes should be approximately normal

The variance of these distributions should be smaller than the original variance

If the variance is small, efficient load balancing is possible
Map-Reduce Skew
Tackling input data skew
Each Reduce task receives a number of keys
The total number of values to process is the sum of the number ofvalues over all keys
The average number of values that a Reduce task processes is theaverage of the number of values over all keys
Equivalently, each compute node receives a number of Reduce tasks
The sum and average for a compute node is the sum and averageover all Reduce tasks for that node
Map-Reduce Skew
Tackling input data skew
How should we distribute keys to Reduce tasks?
Uniformly at random
Other possibilities?
Calculate the capacity of a single Reduce task
Add keys until capacity is reached, etc.
Map-Reduce Skew
Tackling input data skew
We are averaging over a skewed distribution
Are there laws that describe how the averages of sufficiently large samples drawn from a probability distribution behave?
In other words, how are the averages of samples of a r.v. distributed?
Central-limit Theorem
Map-Reduce Skew
Central-Limit Theorem
The central-limit theorem describes the distribution of the arithmetic mean of sufficiently large samples of independent and identically distributed random variables
The means are normally distributed
The mean of the new distribution equals the mean of the originaldistribution
The variance of the new distribution equals σ²/n, where σ² is the variance of the original distribution
Thus, we keep the mean and reduce the variance
Map-Reduce Skew
Central-Limit Theorem
Theorem

Suppose X_1, …, X_n are independent and identically distributed r.v. with expectation µ and variance σ². Let Y_n be the r.v. defined as:

Y_n = (1/n) Σ_(i=1)^n X_i

For large n, the CDF F_n(y) of Y_n tends to the CDF of a normal r.v. with mean µ and variance σ²/n:

F_n(y) ≈ (1 / √(2πσ²/n)) ∫_(−∞)^y e^(−n(x−µ)² / (2σ²)) dx
Map-Reduce Skew
Central-Limit Theorem
Practically, it is possible to replace Fn(y) with a normal distribution for n > 30
We should always average over at least 30 values
Example
Approximating uniform r.v. with a normal r.v. by sampling and averaging
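That sampling-and-averaging experiment can be sketched in a few lines (uniform samples on [0, 1], for which µ = 0.5 and σ² = 1/12; the seed and sample sizes are arbitrary choices):

```python
import random

random.seed(0)

def sample_means(n_samples, sample_size):
    # Draw n_samples groups of sample_size uniform(0, 1) values and
    # return the arithmetic mean of each group
    return [sum(random.random() for _ in range(sample_size)) / sample_size
            for _ in range(n_samples)]

means = sample_means(2000, 30)
grand_mean = sum(means) / len(means)
variance = sum((m - grand_mean) ** 2 for m in means) / len(means)
# By the CLT the means are approximately normal with mean 0.5 and
# variance (1/12)/30 ~ 0.00278
print(grand_mean, variance)
```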
Map-Reduce Skew
Central-Limit Theorem
Figure: Histogram of the sample averages: µ = 0.5, σ² = 0.08333
Map-Reduce Skew
Central-Limit Theorem
Figure: Histogram of the sample averages: µ = 0.499, σ² = 0.00270
Map-Reduce Skew
Central-Limit Theorem
IPython Notebook examples
http://kti.tugraz.at/staff/denis/courses/kddm1/clt.ipynb
Command Line
ipython notebook --pylab=inline clt.ipynb
Map-Reduce Skew
Input data skew
We can reduce the impact of the skew by using fewer Reduce tasks than there are reducers

If keys are sent randomly to Reduce tasks we average over value-list lengths

Thus, we average over the total time for each Reduce task (Central-limit Theorem)

We should make sure that the sample size is large enough (n > 30)
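A small simulation of this idea (the Zipf-like key lengths and all sizes here are illustrative assumptions): assign skewed keys uniformly at random to Reduce tasks and inspect the per-task workloads.

```python
import random

random.seed(42)

def simulate_skew(n_keys, n_tasks):
    # Illustrative Zipf-like value-list lengths: the key of rank r gets
    # roughly 100 / r^1.5 values, with a minimum of one value per key
    lengths = [max(1, int(100 / r ** 1.5)) for r in range(1, n_keys + 1)]
    # Send each key to a Reduce task chosen uniformly at random
    loads = [0] * n_tasks
    for length in lengths:
        loads[random.randrange(n_tasks)] += length
    return loads

# Far fewer tasks than keys: each task averages over ~200 value lists,
# so per-task workloads concentrate around the mean (Central-limit Theorem)
loads = simulate_skew(10000, 50)
print(min(loads), max(loads))
```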
Map-Reduce Skew
Input data skew
We can further reduce the skew by using more Reduce tasks than there are compute nodes

Long Reduce tasks might occupy a compute node fully

Several shorter Reduce tasks are executed sequentially at a single compute node

Thus, we average over the total time for each compute node (Central-limit Theorem)
We should make sure that the sample size is large enough (n > 30)
Map-Reduce Skew
Input data skew
Figure: Key size distribution (log scale): sum = 196524, µ = 1.965, σ² = 243.245
Map-Reduce Skew
Input data skew
Figure: Task key sizes: sum = 196524, µ = 196.524, σ² = 25136.428
Map-Reduce Skew
Input data skew
Figure: Task key averages (log scale): µ = 1.958, σ² = 1.886
Map-Reduce Skew
Input data skew
Figure: Node key sizes: sum = 196524, µ = 19652.400, σ² = 242116.267; node key averages: µ = 1.976, σ² = 0.030
Map-Reduce Skew
Input data skew
IPython Notebook examples
http://kti.tugraz.at/staff/denis/courses/kddm1/mrskew.ipynb
Command Line
ipython notebook --pylab=inline mrskew.ipynb
Map-Reduce Skew
Combiners
Sometimes a Reduce function is associative and commutative
Commutative: x ◦ y = y ◦ x
Associative: (x ◦ y) ◦ z = x ◦ (y ◦ z)
The values can be combined in any order, with the same result
The addition in the reducer of the word-count example is such an operation
Map-Reduce Skew
Combiners
When the Reduce function is associative and commutative we can push some of the reducers’ work to the Map tasks

E.g. instead of emitting (w, 1), (w, 1), . . .

We can apply the Reduce function within the Map task

In that way the output of the Map task is “combined” before grouping and sorting
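For word count this in-Map combining amounts to local pre-aggregation; a minimal sketch (`map_with_combiner` is an illustrative name, not a framework API):

```python
from collections import Counter

def map_with_combiner(doc_name, text):
    # The combiner applies the (associative, commutative) addition inside
    # the Map task: one (word, count) pair per distinct word is emitted
    # instead of one (word, 1) pair per occurrence
    return list(Counter(text.split()).items())

pairs = map_with_combiner("doc1", "a film a series a film")
print(pairs)
```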
Map-Reduce Skew
Combiners
Map:
(Star, 1) (Wars, 1) (is, 1) … (American, 1) (epic, 1) (space, 1) … (franchise, 1) (centered, 1) (on, 1) … (film, 1) (series, 1) (created, 1) …

Combiner:
(Star, 2) … (Wars, 2) (a, 6) … (a, 3) … (film, 3) (franchise, 1) (series, 2) …

Group by key:
(Star, 2); (Wars, 2); (a, 6) (a, 3) (a, 4); (film, 3); (franchise, 1); (series, 2); …

Reduce:
(Star, 2) (Wars, 2) (a, 13) (film, 3) (franchise, 1) (series, 2) …
Map-Reduce Skew
Input data skew: Exercise
Exercise
Suppose we execute the word-count map-reduce program on a large repository such as a copy of the Web. We shall use 100 Map tasks and some number of Reduce tasks.

1 Do you expect there to be significant skew in the times taken by the various reducers to process their value lists? Why or why not?
2 If we combine the reducers into a small number of Reduce tasks, say 10 tasks, at random, do you expect the skew to be significant? What if we instead combine the reducers into 10,000 Reduce tasks?
3 Suppose we do use a combiner at the 100 Map tasks. Do you expect skew to be significant? Why or why not?
Map-Reduce Skew
Input data skew
Figure: Key size distribution (log scale): sum = 195279, µ = 1.953, σ² = 83.105
Map-Reduce Skew
Input data skew
Figure: Task key averages (log scale): µ = 1.793, σ² = 10.986
Map-Reduce Skew
Input data skew
IPython Notebook examples
http://kti.tugraz.at/staff/denis/courses/kddm1/combiner.ipynb
Command Line
ipython notebook --pylab=inline combiner.ipynb
Map-Reduce Skew
Map-Reduce: Exercise
Exercise
Suppose we have an n × n matrix M, whose element in row i and column j will be denoted m_ij. Suppose we also have a vector v of length n, whose jth element is v_j. Then the matrix-vector product is the vector x of length n, whose ith element is given by

x_i = Σ_(j=1)^n m_ij v_j
Outline a Map-Reduce program that calculates the vector x.
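One possible outline, as a single-machine sketch (under the usual assumption that v fits in memory and is available to every Map task; `matvec_mapreduce` is an illustrative name):

```python
from collections import defaultdict

def matvec_mapreduce(entries, v):
    # Map: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v[j])
    groups = defaultdict(list)
    for i, j, m_ij in entries:
        groups[i].append(m_ij * v[j])
    # Group by key brings all partial products for row i together;
    # Reduce: sum them to obtain x_i
    return {i: sum(parts) for i, parts in groups.items()}

# 2x2 example: M = [[1, 2], [3, 4]], v = [1, 1]  ->  x = (3, 7)
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
x = matvec_mapreduce(entries, [1, 1])
print(x)
```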