
Topic 5: MapReduce Theory and Implementation


Cloud Computing Workshop 2013, ITU


Page 1: Topic 5: MapReduce Theory and Implementation

5: MapReduce Theory and Implementation

Zubair Nabi

[email protected]

April 18, 2013

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 1 / 34

Page 2: Topic 5: MapReduce Theory and Implementation

Outline

1 Introduction

2 Programming Model

3 Implementation

4 Refinements

5 Hadoop



Page 4: Topic 5: MapReduce Theory and Implementation

Common computations at Google

Process large amounts of data generated from crawled documents, web request logs, etc.

Compute inverted index, graph structure of web documents, summaries of pages crawled per host, etc.

Common properties:

1 Computation is conceptually simple and is distributed across hundreds or thousands of machines to leverage parallelism

2 Input data is large

3 The original simple computation is made complex by system-level code to deal with issues of work assignment and distribution, and fault-tolerance



Page 9: Topic 5: MapReduce Theory and Implementation

Enter MapReduce

Based on the insights mentioned in the previous slide, two Google engineers, Jeff Dean and Sanjay Ghemawat, designed MapReduce in 2004

I Abstraction that helps the programmer express simple computations

I Hides the gory details of parallelization, fault-tolerance, data distribution, and load balancing

I Relies on user-provided map and reduce primitives present in functional languages

Leverages one key insight: most of the computation at Google involved applying a map operator to each logical record in the input dataset to obtain a set of intermediate key/value pairs, and then applying a reduce operation to all values with the same key, for aggregation



Page 14: Topic 5: MapReduce Theory and Implementation

(Figure slide; image not captured in the transcript)

Page 15: Topic 5: MapReduce Theory and Implementation

Outline

1 Introduction

2 Programming Model

3 Implementation

4 Refinements

5 Hadoop


Page 16: Topic 5: MapReduce Theory and Implementation

Programming Model

Input: Set of key/value pairs

Output: Set of key/value pairs

The user provides the entire computation in the form of two functions: map and reduce



Page 19: Topic 5: MapReduce Theory and Implementation

User-defined functions

1 Map

I Takes an input pair and produces a set of intermediate key/value pairs

I The framework groups together the intermediate values by key for consumption by the Reduce

2 Reduce

I Takes as input a key and a list of associated values

I In the common case, it merges these values to result in a smaller set of values



Page 23: Topic 5: MapReduce Theory and Implementation

Example: Word Count

Counting the occurrence of each word in a large collection of documents

1 Map

I Emits each word and the value 1

2 Reduce

I Sums together all counts emitted for a particular word



Page 26: Topic 5: MapReduce Theory and Implementation

Example: Word Count (2)

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

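The pseudocode above can be translated into a minimal, runnable single-machine sketch in Python. The in-memory `run_mapreduce` driver simulating the group-by-key shuffle is illustrative only, not part of any real framework:

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: a list of counts
    yield (key, sum(values))

def run_mapreduce(documents, map_fn, reduce_fn):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            groups[k].append(v)
    # Reduce: invoke the user-defined reduce once per key
    out = {}
    for k, vs in groups.items():
        for rk, rv in reduce_fn(k, vs):
            out[rk] = rv
    return out

docs = {"d1": "the quick fox", "d2": "the fox"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 2, 'quick': 1, 'fox': 2}
```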

Page 27: Topic 5: MapReduce Theory and Implementation

Types

User-supplied map and reduce functions have associated types

1 Map

I map(k1, v1) → list(k2, v2)

2 Reduce

I reduce(k2, list(v2)) → list(v2)

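One way to express these signatures concretely is with Python type hints; the alias names here are my own, and Word Count instantiates the scheme with k1 = document name, v1 = contents, k2 = word, v2 = count:

```python
from typing import Callable, Iterable, TypeVar

K1, V1 = TypeVar("K1"), TypeVar("V1")
K2, V2 = TypeVar("K2"), TypeVar("V2")

# map(k1, v1) -> list(k2, v2)
MapFn = Callable[[K1, V1], Iterable[tuple[K2, V2]]]
# reduce(k2, list(v2)) -> list(v2)
ReduceFn = Callable[[K2, list[V2]], Iterable[V2]]

def wc_map(name: str, contents: str) -> list[tuple[str, int]]:
    return [(w, 1) for w in contents.split()]

def wc_reduce(word: str, counts: list[int]) -> list[int]:
    return [sum(counts)]
```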


Page 29: Topic 5: MapReduce Theory and Implementation

More applications

Distributed Grep

1 Map

F Emits a line if it matches a user-provided pattern

2 Reduce

F Identity function

Count of URL Access Frequency

1 Map

F Similar to the Word Count map; instead of words we have URLs

2 Reduce

F Similar to the Word Count reduce

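Distributed Grep can be sketched as map/reduce like so; the `"error"` pattern is illustrative (in practice the user supplies it when configuring the job):

```python
import re

PATTERN = re.compile(r"error")  # illustrative user-provided pattern

def grep_map(offset, line):
    # Emit the line itself if it matches; the value is unused
    if PATTERN.search(line):
        yield (line, "")

def grep_reduce(key, values):
    # Identity: the matching line passes through unchanged
    yield key

log = ["error: disk full", "all systems go", "fatal error in worker"]
matches = [k for i, line in enumerate(log) for k, _ in grep_map(i, line)]
```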


Page 31: Topic 5: MapReduce Theory and Implementation

More applications (2)

Inverted Index

1 Map

F Emits a sequence of <word, document_ID>

2 Reduce

F Emits <word, list(document_ID)>

Distributed Sort

1 Map

F Identity

2 Reduce

F Identity

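The Inverted Index application can be sketched the same way, with an in-memory group-by standing in for the shuffle:

```python
from collections import defaultdict

def index_map(doc_id, contents):
    # Emit <word, document_ID> once per distinct word in the document
    for word in set(contents.split()):
        yield (word, doc_id)

def index_reduce(word, doc_ids):
    # Emit <word, list(document_ID)>
    yield (word, sorted(doc_ids))

docs = {"d1": "fox quick", "d2": "fox"}
groups = defaultdict(list)      # simulated shuffle: word -> doc IDs
for doc_id, text in docs.items():
    for w, d in index_map(doc_id, text):
        groups[w].append(d)
index = {w: ids for word, ids_list in groups.items()
         for w, ids in index_reduce(word, ids_list)}
```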


Page 33: Topic 5: MapReduce Theory and Implementation

Outline

1 Introduction

2 Programming Model

3 Implementation

4 Refinements

5 Hadoop


Page 34: Topic 5: MapReduce Theory and Implementation

Cluster architecture

A large cluster of shared-nothing commodity machines connected via Ethernet

Each node is an x86 system running Linux with local memory

Commodity networking hardware connected in the form of a tree topology

As clusters consist of hundreds or thousands of machines, failure is pretty common

Each machine has local hard drives

I The Google Filesystem runs atop these disks and employs replication to ensure availability and reliability

Jobs are submitted to a scheduler, which maps tasks within that job to available machines within the cluster



Page 40: Topic 5: MapReduce Theory and Implementation

MapReduce architecture

1 Master: In charge of all metadata, work scheduling and distribution, and job orchestration

2 Workers: Contain slots to execute map or reduce functions



Page 42: Topic 5: MapReduce Theory and Implementation

Execution

1 The user writes map and reduce functions and stitches together a MapReduce specification with the location of the input dataset, the number of reduce tasks, and other attributes

2 The master logically splits the input dataset into M splits, where M = (Input_dataset_size) / (GFS_block_size)

I The GFS block size is typically a multiple of 64 MB

3 It then earmarks M map tasks and assigns them to workers. Each worker has a configurable number of task slots. Each time a worker completes a task, the master assigns it more pending map tasks

4 Once all map tasks have completed, the master assigns R reduce tasks to worker nodes

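The split count in step 2 is a simple division, rounded up so the last partial block still gets a map task; a sketch (the 64 MB default mirrors the typical GFS block size mentioned above):

```python
import math

def num_map_tasks(input_size_bytes, block_size_bytes=64 * 1024 * 1024):
    # M = ceil(input dataset size / GFS block size)
    return math.ceil(input_size_bytes / block_size_bytes)

m = num_map_tasks(10 * 1024**3)  # a 10 GB input yields M = 160 map tasks
```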


Page 47: Topic 5: MapReduce Theory and Implementation

Mappers

1 A map worker reads the contents of the input split that it has been assigned

2 It parses the file, converts it to key/value pairs, and invokes the user-defined map function for each pair

3 The intermediate key/value pairs produced by the map logic are collected (buffered) in memory

4 Once the buffered key/value pairs exceed a threshold, they are written to local disk and partitioned (using a partitioning function) into R partitions. The location of each partition is passed to the master

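The spill-time partitioning in step 4 can be sketched as follows. The default partitioner is hash(key) mod R; `crc32` stands in here as a deterministic hash, and the buffered pairs are hypothetical:

```python
import zlib

def partition(key, num_reducers):
    # hash(key) mod R: every pair with the same key lands in the
    # same partition, and hence reaches the same reduce worker
    return zlib.crc32(key.encode()) % num_reducers

R = 4
buffered = [("the", 1), ("fox", 1), ("the", 1), ("quick", 1)]
spills = {}  # partition index -> pairs destined for that reducer
for k, v in buffered:
    spills.setdefault(partition(k, R), []).append((k, v))
```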


Page 51: Topic 5: MapReduce Theory and Implementation

Reducers

1 A reduce worker gets the locations of its input partitions from the master and uses HTTP requests to retrieve them

2 Once it has read all its input, it sorts it by key to group together all occurrences of the same key

3 It then invokes the user-defined reduce for each key, passing it the key and its associated values

4 The key/value pairs generated by the reduce logic are written to a final output file, which is subsequently written to the distributed filesystem

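Steps 2 and 3, the sort-then-group pattern at the heart of the reduce worker, can be sketched in a few lines of Python:

```python
from itertools import groupby
from operator import itemgetter

def reduce_phase(intermediate, reduce_fn):
    # Sort by key so all occurrences of the same key are adjacent,
    # then invoke the user-defined reduce once per key
    intermediate.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [v for _, v in group]
        out.extend(reduce_fn(key, values))
    return out

result = reduce_phase([("b", 1), ("a", 1), ("b", 1)],
                      lambda k, vs: [(k, sum(vs))])
# [('a', 1), ('b', 2)]
```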


Page 55: Topic 5: MapReduce Theory and Implementation

(Figure slide; image not captured in the transcript)

Page 56: Topic 5: MapReduce Theory and Implementation

Book-keeping by the Master

The master maintains metadata for all jobs running in the cluster

For each map and reduce task, it stores the state (pending, in-progress, or completed) and the ID of the worker on which it is executing (while in-progress)

It stores the locations and sizes of the partitions produced by each map task

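The master's per-task book-keeping can be sketched as a small record type; the field and state names are illustrative, not Google's actual data structures:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class TaskState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class TaskInfo:
    state: TaskState = TaskState.PENDING
    worker_id: Optional[str] = None  # set only while in-progress
    # For map tasks: partition index -> (location, size in bytes)
    partitions: dict = field(default_factory=dict)

# The master keeps one record per task, e.g. keyed by task ID
tasks = {"map-0": TaskInfo(), "reduce-0": TaskInfo()}
tasks["map-0"].state = TaskState.IN_PROGRESS
tasks["map-0"].worker_id = "worker-7"
```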


Page 59: Topic 5: MapReduce Theory and Implementation

Fault-tolerance

For large compute clusters, failures are the norm rather than the exception

1 Worker:

I Each worker sends a periodic heartbeat signal to the master

I If the master does not receive a heartbeat from a worker within a certain amount of time, it marks the worker as failed

I In-progress map and reduce tasks are simply re-executed on other nodes. The same goes for completed map tasks (as their output is lost on machine failure)

I Completed reduce tasks are not re-executed, as their output resides on the distributed filesystem

2 Master:

I The entire computation is marked as failed

I But it is simple to checkpoint the master's soft state and re-spawn it



Locality

Network bandwidth is a scarce resource in typical clusters

GFS slices files into 64MB blocks and stores 3 replicas across the cluster

The master exploits this information by scheduling a map task near its input data. The preference order is node-local, then rack/switch-local, then any node
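The three-level preference order can be sketched as a simple selection function; the function and parameter names are illustrative, not taken from the actual scheduler.

```python
def pick_worker(replica_nodes, idle_workers, rack_of):
    """Choose a worker for a map task, preferring node-local placement,
    then rack-local, then any idle worker.

    replica_nodes: set of nodes holding a replica of the input block
    idle_workers:  list of currently idle worker nodes
    rack_of:       mapping from node name to its rack ID
    """
    # 1. Node-local: an idle worker that already holds a replica
    for w in idle_workers:
        if w in replica_nodes:
            return w
    # 2. Rack-local: an idle worker in the same rack as some replica
    replica_racks = {rack_of[n] for n in replica_nodes}
    for w in idle_workers:
        if rack_of[w] in replica_racks:
            return w
    # 3. Fall back to any idle worker (data crosses the network)
    return idle_workers[0] if idle_workers else None
```

Each step down the ladder costs more network bandwidth, which is exactly the resource the heuristic is trying to conserve.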

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 24 / 34


Speculative re-execution

Every now and then, the entire computation is held up by a “straggler” task

Stragglers can arise for a number of reasons, such as machine load, network traffic, software/hardware bugs, etc.

To deal with stragglers, the master speculatively re-executes slow tasks on other machines

The task is marked as completed whenever either the primary or the backup finishes its execution
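The "first finisher wins" rule can be modelled in a few lines; the class is a toy illustration, with names that are assumptions rather than anything from the paper.

```python
class SpeculativeTask:
    """Toy model of backup (speculative) execution: the first attempt to
    finish, whether primary or backup, marks the task completed; any later
    finisher is discarded as a duplicate."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.completed_by = None  # attempt that won, or None if still running

    def report_done(self, attempt_id):
        # Only the first completion report is accepted
        if self.completed_by is None:
            self.completed_by = attempt_id
            return True
        return False
```

Because duplicate completions are simply ignored, launching a backup copy is always safe: it can only shorten the job's tail, never corrupt its result.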

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 25 / 34


Scalability

Possible to run on multiple scales: from single nodes to data centers with tens of thousands of nodes

Nodes can be added/removed on the fly to scale up/down

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 26 / 34


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Refinements

5 Hadoop

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 27 / 34


Partitioning

By default MapReduce uses hash partitioning to partition the keyspace

I hash(key) % R

Optionally, the user can provide a custom partitioning function to, say, mitigate skew or to ensure that certain keys always end up at a particular reduce worker
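Both cases can be sketched briefly. The default partitioner below follows the hash(key) % R scheme above; the hostname-based one is an illustrative custom partitioner (the paper's URL example), with names chosen for this sketch.

```python
from urllib.parse import urlparse

R = 4  # number of reduce tasks (illustrative)

def default_partition(key):
    # Default partitioner: hash the key and take it modulo R
    return hash(key) % R

def hostname_partition(url):
    # Custom partitioner: all URLs from the same host hash to the same
    # partition, so they end up at the same reduce worker
    return hash(urlparse(url).hostname) % R
```

A custom partitioner changes only where keys land, not what the reduce function computes.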

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 28 / 34


Combiner function

For reduce functions which are commutative and associative, the user can additionally provide a combiner function, which is applied to the output of the map for local merging

Typically, the same reduce function is used as a combiner
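The classic word-count example illustrates this: since addition is commutative and associative, the reduce function can merge the map output locally before it is shuffled over the network. The function names below are illustrative.

```python
def map_fn(line):
    # Emit (word, 1) for every word, as in the canonical word-count example
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Sum is commutative and associative, so the same function can also
    # serve as the combiner
    return sum(values)

def map_with_combiner(line):
    # Apply the reduce function to the map output locally, shrinking the
    # data that must cross the network during the shuffle
    grouped = {}
    for word, count in map_fn(line):
        grouped.setdefault(word, []).append(count)
    return [(word, reduce_fn(word, counts)) for word, counts in grouped.items()]
```

For a line like "the cat the", the combiner sends ("the", 2), ("cat", 1) instead of three separate pairs.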

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 29 / 34


Input/output formats

By default, the library supports a number of input/output formats

I For instance, text as input and key/value pairs as output

Optionally, the user can specify custom input readers and output writers

I For instance, to read/write from/to a database
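A custom reader of that kind might look like the sketch below, which yields key/value records from a SQLite database instead of text files. This is an illustrative generator, not any real MapReduce library's reader API; the function name and schema are assumptions.

```python
import sqlite3

def db_record_reader(db_path, query):
    """Illustrative custom input reader: yields (key, value) records from a
    database so the map function can consume them like any other input."""
    conn = sqlite3.connect(db_path)
    try:
        for key, value in conn.execute(query):
            yield key, value
    finally:
        conn.close()
```

The map function never sees where records come from; swapping the reader is enough to change the input source.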

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 30 / 34


Outline

1 Introduction

2 Programming Model

3 Implementation

4 Refinements

5 Hadoop

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 32 / 34


Hadoop

Open-source implementation of MapReduce, developed by Doug Cutting, originally at Yahoo! in 2004

Now a top-level Apache open-source project

Implemented in Java (Google’s in-house implementation is in C++)

Comes with an associated distributed filesystem, HDFS (clone of GFS)

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 33 / 34


References

Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI '04), Vol. 6. USENIX Association, Berkeley, CA, USA.

Zubair Nabi 5: MapReduce Theory and Implementation April 18, 2013 34 / 34