22
SLIDES CREATED BY : SHRIDEEP P ALLICKARA L10.1 CS555: Distributed Systems [Fall 2019] Dept. Of Computer Science, Colorado State University CS555: Distributed Systems [Fall 2019] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [MAPREDUCE] Shrideep Pallickara Computer Science Colorado State University September 26, 2019 L10.1 CS555: Distributed Systems [Fall 2019] Dept. Of Computer Science, Colorado State University L10.2 Professor: SHRIDEEP P ALLICKARA Frequently asked questions from the previous class survey September 26, 2019

CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.1

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS 555: DISTRIBUTED SYSTEMS

[MAPREDUCE]

Shrideep PallickaraComputer Science

Colorado State University

September 26, 2019 L10.1

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.2Professor: SHRIDEEP PALLICKARA

Frequently asked questions from the previous class survey

September 26, 2019

Page 2: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.2

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.3Professor: SHRIDEEP PALLICKARA

Topics covered in this lecture

¨ MapReduce

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

MAPREDUCE

September 26, 2019 L10.4

Page 3: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.3

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.5Professor: SHRIDEEP PALLICKARA

MapReduce: Topics that we will cover

¨ Why?

¨ What it is and what it is not?¨ The core framework and original Google paper

¨ Development of simple programs using Hadoop¤ The dominant MapReduce implementation

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.6Professor: SHRIDEEP PALLICKARA

MapReduce

¨ It’s a framework for processing data residing on a large number of computers

¨ Very powerful framework¤ Excellent for some problems¤ Challenging or not applicable in other classes of problems

September 26, 2019

Page 4: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.4

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.7Professor: SHRIDEEP PALLICKARA

What is MapReduce?

¨ More a framework than a tool

¨ You are required to fit (some folks shoehorn it) your solution into the MapReduce framework

¨ MapReduce is not a feature, but rather a constraint

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.8Professor: SHRIDEEP PALLICKARA

What does this constraint mean?

¨ It makes problem solving easier and harder

¨ Clear boundaries for what you can and cannot do¤ You actually need to consider fewer options than what you are used to

¨ But solving problems with constraints requires planning and a change in your thinking

September 26, 2019

Page 5: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.5

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.9Professor: SHRIDEEP PALLICKARA

But what does this get us?

¨ Tradeoff of being confined to the MapReduce framework?¤ Ability to process data on a large number of computers¤ But, more importantly, without having to worry about concurrency, scale,

fault tolerance, and robustness

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.10Professor: SHRIDEEP PALLICKARA

A challenge in writing MapReduce programs

¨ Design!¤ Good programmers can produce bad software due to poor design¤ Good programmers can produce bad MapReduce algorithms

¨ Only in this case your mistakes will be amplified¤ Your job may be distributed on 100s or 1000s of machines and operating

on a Petabyte of data

September 26, 2019

Page 6: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.6

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.11Professor: SHRIDEEP PALLICKARA

MapReduce: Origins of the design

¨ Process crawled data and logs of web requests

¨ Several computations work on this raw data to compute derived data¤ Inverted indices¤ Representation of graph structure of web documents¤ Pages crawled per host¤ Most frequent queries in a day …

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.12Professor: SHRIDEEP PALLICKARA

Most computations are conceptually straightforward

¨ But data is large

¨ Computations must be scalable¤ Distributed across thousands of machines¤ To complete in a reasonable amount of time

September 26, 2019

Page 7: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.7

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.13Professor: SHRIDEEP PALLICKARA

Complexity of managing distributed computations can …

¨ Obscure simplicity of original computation

¨ Contributing factors:

① How to parallelize computation

② Distribute the data

③ Handle failures

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.14Professor: SHRIDEEP PALLICKARA

MapReduce was developed to cope with this complexity

¨ Express simple computations

¨ Hide messy details of ¤ Parallelization¤ Data distribution¤ Fault tolerance¤ Load balancing

September 26, 2019

Page 8: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.8

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.15Professor: SHRIDEEP PALLICKARA

MapReduce

¨ Programming model

¨ Associated implementation for ¤ Processing & Generating large data sets

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.16Professor: SHRIDEEP PALLICKARA

Programming model

¨ Computation takes a set of input key/value pairs

¨ Produces a set of output key/value pairs

¨ Express the computation as two functions:¤ Map¤ Reduce

September 26, 2019

Page 9: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.9

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.17Professor: SHRIDEEP PALLICKARA

Map

¨ Takes an input pair

¨ Produces a set of intermediate key/value pairs

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.18Professor: SHRIDEEP PALLICKARA

MapReduce library

¨ Groups all intermediate values with the same intermediate key

¨ Passes them to the Reduce function

September 26, 2019

Page 10: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.10

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.19Professor: SHRIDEEP PALLICKARA

Reduce function

¨ Accepts intermediate key I and ¤ Set of values for that key

¨ Merge these values together to get¤ Smaller set of values

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.20Professor: SHRIDEEP PALLICKARA

Counting number occurrences of each word in a large collection of documentsmap (String key, String value)

//key: document name//value: document contents

for each word w in valueEmitIntermediate(w, “1”)

September 26, 2019

Page 11: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.11

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.21Professor: SHRIDEEP PALLICKARA

Counting number occurrences of each word in a large collection of documents

reduce (String key, Iterator values)//key: a word//value: a list of counts

int result = 0; for each v in values

result += ParseInt(v);Emit(AsString(result)); Sums together all counts

emitted for a particular word

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.22Professor: SHRIDEEP PALLICKARA

MapReduce specification object contains

¨ Names of¤ Input¤ Output

¨ Tuning parameters

September 26, 2019

Page 12: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.12

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.23Professor: SHRIDEEP PALLICKARA

Map and reduce functions have associated types drawn from different domains

map(k1, v1) à list(k2, v2)

reduce(k2, list(v2)) à list(v2)

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.24Professor: SHRIDEEP PALLICKARA

What’s passed to-and-from user-defined functions

¨ Strings

¨ User code converts between¤ String¤ Appropriate types

September 26, 2019

Page 13: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.13

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.25Professor: SHRIDEEP PALLICKARA

Programs expressed as MapReduce computations: Distributed Grep

¨ Map¤ Emit line if it matches specified pattern

¨ Reduce¤ Just copy intermediate data to the output

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.26Professor: SHRIDEEP PALLICKARA

Term-Vector per Host

¨ Summarizes important terms that occur in a set of documents <word, frequency>

¨ Map¤ Emit <hostname, term vector>¤ For each input document

¨ Reduce function¤ Has all per-document vectors for a given host¤ Add term vectors; discard away infrequent terms

n <hostname, term vector>

September 26, 2019

Page 14: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.14

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

IMPLEMENTATION OF THE RUNTIME

September 26, 2019 L10.27

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.28Professor: SHRIDEEP PALLICKARA

Implementation

¨ Machines are commodity machines

¨ GFS is used to manage the data stored on the disks

September 26, 2019

Page 15: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.15

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.29Professor: SHRIDEEP PALLICKARA

Execution Overview – Part I

¨ Maps distributed across multiple machines

¨ Automatic partitioning of data into M splits

¨ Splits processed concurrently on different machines

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.30Professor: SHRIDEEP PALLICKARA

Execution Overview – Part II

¨ Partition intermediate key space into R pieces

¨ E.g. hash(key) mod R¨ User specified parameters

¤ Partitioning function¤ Number of partitions (R)

September 26, 2019

Page 16: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.16

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.31Professor: SHRIDEEP PALLICKARA

Execution Overview

Split 0Split 1Split 2Split 3Split 4

User Program

Master

Worker

Worker

Worker

Worker

Worker

Output file 0

Output file 1

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.32Professor: SHRIDEEP PALLICKARA

Execution Overview: Step IThe MapReduce library

¨ Splits input files into M pieces¤ 16-64 MB per piece

¨ Starts up copies of the program on a cluster of machines

September 26, 2019

Page 17: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.17

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.33Professor: SHRIDEEP PALLICKARA

Execution Overview: Step IIProgram copies

¨ One of the copies is a Master

¨ There are M map tasks and R reduce tasks to assign

¨ Master¤ Picks idle workers¤ Assigns each worker a map or reduce task

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.34Professor: SHRIDEEP PALLICKARA

Execution Overview: Step IIIWorkers that are assigned a map task

¨ Read contents of their input split

¨ Parses <key, value> pairs out of input data

¨ Pass each pair to user-defined Map function

¨ Intermediate <key, value> pairs from Maps¤ Buffered in Memory

September 26, 2019

Page 18: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.18

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.35Professor: SHRIDEEP PALLICKARA

Execution Overview: Step IVWriting to disk

¨ Periodically, buffered pairs are written to disk

¨ These writes are partitioned¤ By the partitioning function

¨ Locations of buffered pairs on local disk¤ Reported to back to Master¤ Master forwards these locations to reduce workers

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.36Professor: SHRIDEEP PALLICKARA

Execution Overview: Step VReading Intermediate data

¨ Master notifies Reduce worker about locations

¨ Reduce worker reads buffered data from the local disks of Maps

¨ Read all intermediate data; sort by intermediate key¤ All occurrences of same key grouped together¤ Many different keys map to the same Reduce task

September 26, 2019

Page 19: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.19

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.37Professor: SHRIDEEP PALLICKARA

Execution Overview: Step VIProcessing data at the Reduce worker

¨ Iterate over sorted intermediate data

¨ For each unique key pass¤ Key + set of intermediate values to Reduce function

¨ Output of Reduce function is appended¤ To output file of reduce partition

September 26, 2019

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.38Professor: SHRIDEEP PALLICKARA

Execution Overview: Step VIIWaking up the user

¨ After all Map & Reduce tasks have been completed

¨ Control returns to the user code

September 26, 2019

Page 20: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.20

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

TASK GRANULARITY

September 26, 2019 L10.39

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.40Professor: SHRIDEEP PALLICKARA

Task Granularity

¨ Subdivide map phase into M pieces

¨ Subdivide reduce phase into R pieces

¨ M, R >> number of worker machines

¨ Each worker performing many different tasks¤ Improves dynamic load balancing

¤ Speeds up recovery during failures

September 26, 2019

Page 21: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.21

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.41Professor: SHRIDEEP PALLICKARA

Master Data Structures

September 26, 2019

¨ For each Map and Reduce task¤ State: {idle, in-progress, completed}¤ Worker machine identity

¨ For each completed Map task store ¤ Location and sizes of R intermediate file regions

¨ Information pushed incrementally to in-progress Reduce tasks

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.42Professor: SHRIDEEP PALLICKARA

Practical bounds on how large M and R can be

¨ Master must make O(M + R) scheduling decisions

¨ Keep O(MR) state in memory

September 26, 2019

Page 22: CS 555: DISTRIBUTED SYSTEMS [MAPREDUCEcs555/lectures/slides/CS555-L10... · SLIDESCREATEDBY: SHRIDEEPPALLICKARA L10.1 CS555: Distributed Systems[Fall 2019] Dept. Of Computer Science,

SLIDES CREATED BY: SHRIDEEP PALLICKARA L10.22

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2019]Dept. Of Computer Science, Colorado State University

L10.43Professor: SHRIDEEP PALLICKARA

The contents of this slide-set are based on the following references¨ JEFFREY DEAN and SANJAY GHEMAWAT: MapReduce: Simplified Data Processing on

Large Clusters. OSDI 2004: 137-150

¨ MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoopand Other Systems. 1st Edition. Donald Miner and Adam Shook. O'Reilly Media ISBN: 978-1449327170. [Chapter 1]

September 26, 2019