
CS60021: Scalable Data Mining

MapReduce
Sourangshu Bhattacharya

Motivation: Google Example
• 20+ billion web pages x 20 KB = 400+ TB
• 1 computer reads 30-35 MB/sec from disk

– ~4 months to read the web
• ~1,000 hard drives to store the web
• Takes even more to do something useful with the data!

• Today, a standard architecture for such problems is emerging:
– Cluster of commodity Linux nodes
– Commodity network (Ethernet) to connect them

Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Cluster Architecture
[Diagram: racks of commodity nodes, each node with CPU, memory, and disk, connected by in-rack switches and a backbone switch.]
• Each rack contains 16-64 nodes
• 1 Gbps bandwidth between any pair of nodes in a rack
• 2-10 Gbps backbone between racks

• In 2011 it was estimated that Google had ~1M machines, http://bit.ly/Shh0RO

Large-scale Computing
• Large-scale computing for data mining problems on commodity hardware

• Challenges:
– How do you distribute computation?
– How can we make it easy to write distributed programs?

– Machines fail:
• One server may stay up 3 years (1,000 days)
• If you have 1,000 servers, expect to lose 1 per day
• People estimated Google had ~1M machines in 2011

– 1,000 machines fail every day!

Big Data Challenges
• Scalability: processing should scale with increase in data.
• Fault tolerance: function in the presence of hardware failure.
• Cost effective: should run on commodity hardware.
• Ease of use: programs should be small.
• Flexibility: able to process unstructured data.

• Solution: MapReduce!

Idea and Solution
• Issue: Copying data over a network takes time
• Idea:

– Bring computation close to the data
– Store files multiple times for reliability

• Map-reduce addresses these problems
– Elegant way to work with big data
– Storage infrastructure: file system

• Google: GFS. Hadoop: HDFS
– Programming model

• Map-Reduce

Storage Infrastructure

• Problem:
– If nodes fail, how to store data persistently?

• Answer:
– Distributed File System:

• Provides a global file namespace
• Google GFS; Hadoop HDFS

• Typical usage pattern
– Huge files (100s of GB to TB)
– Data is rarely updated in place
– Reads and appends are common


What is Hadoop?
• A scalable, fault-tolerant distributed system for data storage and processing.
• Core Hadoop:

– Hadoop Distributed File System (HDFS)
– Hadoop YARN: job scheduling and cluster resource management
– Hadoop MapReduce: framework for distributed data processing

• Open-source system with large community support: https://hadoop.apache.org/

Hadoop Architecture
[Diagram: Hadoop architecture with YARN (resource management) and HDFS (storage) layers.]
Courtesy: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html

HDFS
• Assumptions:

– Hardware failure is the norm.
– Streaming data access.
– Write once, read many times.
– High throughput, not low latency.
– Large data sets.

• Characteristics:
– Performs best with a modest number of large files
– Optimized for streaming reads
– Layer on top of the native file system

HDFS
• Data is organized into files and directories.
• Files are divided into blocks and distributed to nodes.
• Block placement is known at the time of read
– Computation is moved to the same node.

• Replication is used for:
– Speed
– Fault tolerance
– Self-healing

Goals of HDFS
• Very Large Distributed File System
– 10K nodes, 100 million files, 10 PB
• Assumes Commodity Hardware

– Files are replicated to handle hardware failure
– Detects failures and recovers from them

• Optimized for Batch Processing
– Data locations are exposed so that computations can move to where the data resides
– Provides very high aggregate bandwidth

• Runs in user space, on heterogeneous operating systems

Distributed File System
• Single namespace for the entire cluster
• Data Coherency

– Write-once-read-many access model
– Clients can only append to existing files

• Files are broken up into blocks
– Typically 128 MB block size
– Each block replicated on multiple DataNodes

• Intelligent Client
– Client can find the location of blocks
– Client accesses data directly from the DataNode

NameNode Metadata
• Metadata in Memory
– The entire metadata is in main memory
– No demand paging of metadata

• Types of Metadata
– List of files
– List of blocks for each file
– List of DataNodes for each block
– File attributes, e.g. creation time, replication factor

• A Transaction Log
– Records file creations, file deletions, etc.

DataNode
• A Block Server
– Stores data in the local file system (e.g. ext3)
– Stores metadata of a block (e.g. CRC)
– Serves data and metadata to clients

• Block Report
– Periodically sends a report of all existing blocks to the NameNode

• Facilitates Pipelining of Data
– Forwards data to other specified DataNodes

HDFS read (client)
[Diagram: HDFS read path from the client.]
Source: Hadoop: The Definitive Guide

HDFS write (client)
[Diagram: HDFS write path from the client.]
Source: Hadoop: The Definitive Guide

Block Placement
• Current Strategy (sketched below)
– One replica on the local node
– Second replica on a remote rack
– Third replica on the same remote rack
– Additional replicas are randomly placed

• Clients read from the nearest replica
• Would like to make this policy pluggable
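A minimal sketch of the placement strategy above, using a made-up cluster description (the real HDFS placement policy is a Java class and handles many more cases):

import random

def place_replicas(local_node, nodes_by_rack, replication=3):
    # First replica on the writer's node, second on a node in a different
    # rack, third on another node in that same remote rack, extras at random.
    local_rack = next(r for r, ns in nodes_by_rack.items() if local_node in ns)
    remote_rack = random.choice([r for r in nodes_by_rack if r != local_rack])
    remote_nodes = nodes_by_rack[remote_rack]
    targets = [local_node, remote_nodes[0]]
    if len(remote_nodes) > 1:
        targets.append(remote_nodes[1])
    all_nodes = [n for ns in nodes_by_rack.values() for n in ns]
    while len(targets) < replication:
        candidate = random.choice(all_nodes)
        if candidate not in targets:
            targets.append(candidate)
    return targets[:replication]

# Example: two racks of three nodes each; the client runs on "r1n1".
racks = {"rack1": ["r1n1", "r1n2", "r1n3"], "rack2": ["r2n1", "r2n2", "r2n3"]}
print(place_replicas("r1n1", racks))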

NameNode Failure
• A single point of failure
• Transaction Log stored in multiple directories
– A directory on the local file system
– A directory on a remote file system (NFS/CIFS)

• Need to develop a real HA solution

Data Pipelining
• Client retrieves a list of DataNodes on which to place replicas of a block
• Client writes the block to the first DataNode
• The first DataNode forwards the data to the next DataNode in the pipeline
• When all replicas are written, the client moves on to write the next block in the file
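A toy single-process simulation of that write pipeline; the DataNode class and block id are invented for illustration, not the HDFS client API:

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, data, downstream):
        # Store the block locally, then forward it to the next DataNode
        # in the pipeline, if there is one.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].write(block_id, data, downstream[1:])

def client_write(block_id, data, pipeline):
    # The client only talks to the first DataNode; replication happens
    # by forwarding along the pipeline.
    pipeline[0].write(block_id, data, pipeline[1:])

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
client_write("blk_0001", b"...block bytes...", nodes)
print([list(n.blocks) for n in nodes])   # every node now holds blk_0001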

MAPREDUCE

What is MapReduce?
• Method for distributing a task across multiple servers.
• Proposed by Dean and Ghemawat, 2004.
• Consists of two developer-created phases:

– Map
– Reduce

• In between Map and Reduce is the Shuffle and Sort phase.
• The user is responsible for casting the problem into the map-reduce framework.
• Multiple map-reduce jobs can be "chained".

Programming Model: MapReduce

Warm-up task:
• We have a huge text document

• Count the number of times each distinct word appears in the file

• Sample application:
– Analyze web server logs to find popular URLs


Task: Word Count

Case 1:
– File too large for memory, but all <word, count> pairs fit in memory

Case 2:
• Count occurrences of words:
– words(doc.txt) | sort | uniq -c
• where words takes a file and outputs the words in it, one per line

• Case 2 captures the essence of MapReduce
– The great thing is that it is naturally parallelizable (see the sketch below)
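A rough Python equivalent of that pipeline (doc.txt is just a placeholder file name), making the map-like and reduce-like stages explicit:

from collections import Counter

def words(path):
    # Map-like step: emit each word in the file, one at a time.
    with open(path) as f:
        for line in f:
            for w in line.split():
                yield w

# Reduce-like step: count occurrences per word, as `sort | uniq -c` would.
counts = Counter(words("doc.txt"))
for word, n in counts.most_common(10):
    print(n, word)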


MapReduce: Overview
• Sequentially read a lot of data
• Map:

– Extract something you care about

• Group by key: Sort and Shuffle
• Reduce:

– Aggregate, summarize, filter or transform

• Write the result

The outline stays the same; Map and Reduce change to fit the problem


MapReduce: The Map Step
[Diagram: each input key-value pair is passed through a map function, producing intermediate key-value pairs.]


MapReduce: The Reduce Step
[Diagram: intermediate key-value pairs are grouped by key into key-value groups; each group is passed through a reduce function to produce output key-value pairs.]


More Specifically
• Input: a set of key-value pairs
• Programmer specifies two methods:

– Map(k, v) → <k', v'>*
• Takes a key-value pair and outputs a set of key-value pairs
– E.g., key is the filename, value is a single line in the file

• There is one Map call for every (k, v) pair

– Reduce(k', <v'>*) → <k', v''>*
• All values v' with the same key k' are reduced together and processed in v' order

• There is one Reduce function call per unique key k'
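For concreteness, the two signatures written as Python type aliases (the type-variable names are ours, not part of the model):

from typing import Callable, Iterable, Tuple, TypeVar

K, V = TypeVar("K"), TypeVar("V")                          # input key/value types
K2, V2, V3 = TypeVar("K2"), TypeVar("V2"), TypeVar("V3")   # intermediate/output types

# Map(k, v) -> <k', v'>*: one call per input pair, emitting zero or more pairs.
Mapper = Callable[[K, V], Iterable[Tuple[K2, V2]]]

# Reduce(k', <v'>*) -> <k', v''>*: one call per unique intermediate key.
Reducer = Callable[[K2, Iterable[V2]], Iterable[Tuple[K2, V3]]]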


MapReduce: Word Counting

Big document (the input, read sequentially; only sequential reads):
"The crew of the space shuttle Endeavor recently returned to Earth as ambassadors, harbingers of a new era of space exploration. Scientists at NASA are saying that the recent assembly of the Dextre bot is the first step in a long-term space-based man/machine partnership. 'The work we're doing now -- the robotics we're doing -- is what we're going to need ...' ..."

MAP (provided by the programmer): read input and produce a set of (key, value) pairs:
(The, 1) (crew, 1) (of, 1) (the, 1) (space, 1) (shuttle, 1) (Endeavor, 1) (recently, 1) ...

Group by key: collect all pairs with the same key:
(crew, 1) (crew, 1) (space, 1) (the, 1) (the, 1) (the, 1) (shuttle, 1) (recently, 1) ...

Reduce (provided by the programmer): collect all values belonging to the key and output:
(crew, 2) (space, 1) (the, 3) (shuttle, 1) (recently, 1) ...


Word Count Using MapReduce

map(key, value):
  // key: document name; value: text of the document
  for each word w in value:
    emit(w, 1)

reduce(key, values):
  // key: a word; values: an iterator over counts
  result = 0
  for each count v in values:
    result += v
  emit(key, result)
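The same program as a minimal runnable single-process simulation; the shuffle is simulated with a dictionary, whereas in a real MapReduce system it happens across machines:

from collections import defaultdict

def map_fn(doc_name, text):
    for w in text.split():
        yield (w, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))

def mapreduce(inputs, map_fn, reduce_fn):
    intermediate = []                 # Map phase
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    groups = defaultdict(list)        # Shuffle/sort phase: group values by key
    for k, v in intermediate:
        groups[k].append(v)
    output = []                       # Reduce phase: one call per unique key
    for k in sorted(groups):
        output.extend(reduce_fn(k, groups[k]))
    return output

docs = [("doc1", "the crew of the space shuttle"), ("doc2", "the space crew")]
print(mapreduce(docs, map_fn, reduce_fn))
# [('crew', 2), ('of', 1), ('shuttle', 1), ('space', 2), ('the', 3)]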


Map Phase
• The user writes the mapper method.
• Input is an unstructured record:
– E.g. a row of an RDBMS table,
– a line of a text file, etc.
• Output is a set of records of the form <key, value>
– Both key and value can be anything, e.g. text, number, etc.
– E.g. for a row of an RDBMS table: <column id, value>
– For a line of a text file: <word, count>

Shuffle/Sort Phase
• The shuffle phase ensures that all mapper output records with the same key value go to the same reducer.
• Sort ensures that, among the records received at each reducer, records with the same key arrive together (see the sketch below).
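One way to picture the sort guarantee: once a reducer's records are sorted by key, a single pass hands the reducer contiguous groups. This is only a sketch of the idea, not Hadoop's actual merge-sort implementation:

from itertools import groupby
from operator import itemgetter

records = [("the", 1), ("crew", 1), ("the", 1), ("space", 1), ("crew", 1)]
records.sort(key=itemgetter(0))           # sort by key
for key, group in groupby(records, key=itemgetter(0)):
    print(key, [v for _, v in group])     # all values for one key arrive together
# crew [1, 1] / space [1] / the [1, 1]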

Reduce Phase
• The reducer is a user-defined function which processes mapper output records for some of the keys output by the mappers.
• Input is of the form <key, value>
– All records having the same key arrive together.
• Output is a set of records of the form <key, value>
– The key is not important

Parallel Picture
[Diagram: many map and reduce tasks running in parallel across the cluster.]

Example
• Word Count: Count the total number of occurrences of each word

MapReduce
• What was the max/min temperature for the last century?
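A sketch of how that weather question maps onto the model, with made-up (year, temperature) records and a simulated shuffle:

from collections import defaultdict

readings = [("1950", 0), ("1950", 22), ("1949", 111), ("1949", 78), ("1950", -11)]

def map_fn(year, temp):
    yield (year, temp)                 # key = year, value = one reading

def reduce_fn(year, temps):
    yield (year, max(temps))           # or min(temps) for the minimum

groups = defaultdict(list)             # simulated shuffle: group readings by year
for year, temp in readings:
    for k, v in map_fn(year, temp):
        groups[k].append(v)
for year in sorted(groups):
    print(list(reduce_fn(year, groups[year])))
# [('1949', 111)] then [('1950', 22)]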

Hadoop MapReduce
• Provides:
– Automatic parallelization and distribution
– Fault tolerance
– Methods for interfacing with HDFS for colocation of computation and storage of output
– Status and monitoring tools
– API in Java
– Ability to define the mapper and reducer in many languages through Hadoop Streaming

Hadoop MR Data Flow
[Diagram: Hadoop MapReduce data flow.]
Source: Hadoop: The Definitive Guide

Hadoop (v2) MR Job
[Diagram: how a MapReduce job runs on Hadoop v2 (YARN).]
Source: Hadoop: The Definitive Guide

Shuffle and Sort
[Diagram: shuffle and sort between the map and reduce sides.]
Source: Hadoop: The Definitive Guide

Data Flow
• Input and final output are stored on a distributed file system (FS):
– The scheduler tries to schedule map tasks "close" to the physical storage location of the input data
• Intermediate results are stored on the local FS of Map and Reduce workers
• Output is often input to another MapReduce task

Coordination: Master
• The Master node takes care of coordination:
– Task status: (idle, in-progress, completed)
– Idle tasks get scheduled as workers become available
– When a map task completes, it sends the master the location and sizes of its R intermediate files, one for each reducer
– The master pushes this info to reducers
• The master pings workers periodically to detect failures


Fault Tolerance
• Comes from scalability and cost effectiveness
• HDFS:
– Replication
• MapReduce:
– Restarting failed tasks: map and reduce
– Writing map output to the FS
– Minimizes re-computation

Dealing with Failures
• Map worker failure
– Map tasks completed or in-progress at the worker are reset to idle
– Reduce workers are notified when a task is rescheduled on another worker
• Reduce worker failure
– Only in-progress tasks are reset to idle
– The reduce task is restarted
• Master failure
– The MapReduce task is aborted and the client is notified


Failures
• Task failure
– Task has failed: report the error to the node manager, application master, and client.
– Task not responsive or JVM failure: the node manager restarts the task.
• Application Master failure
– The application master sends heartbeats to the resource manager.
– If they are not received, the resource manager retrieves the job history of the tasks that have run.
• Node manager failure

How many Map and Reduce jobs?
• M map tasks, R reduce tasks
• Rule of thumb:
– Make M much larger than the number of nodes in the cluster
– One DFS chunk per map task is common
– Improves dynamic load balancing and speeds up recovery from worker failures
• Usually R is smaller than M
– Because output is spread across R files
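A back-of-the-envelope example under these rules of thumb, with assumed numbers (1 TB of input, 128 MB DFS chunks, 50 worker nodes):

input_bytes = 1 * 2**40          # 1 TB of input
chunk_bytes = 128 * 2**20        # one DFS chunk per map task
nodes = 50

M = input_bytes // chunk_bytes   # 8192 map tasks: far more than the node count
R = 2 * nodes                    # R kept smaller, e.g. a small multiple of the nodes
print(M, R)                      # 8192 100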


Task Granularity & Pipelining
• Fine granularity tasks: map tasks >> machines
– Minimizes time for fault recovery
– Can do pipelined shuffling with map execution
– Better dynamic load balancing


Refinements: Backup Tasks
• Problem
– Slow workers significantly lengthen the job completion time:
• Other jobs on the machine
• Bad disks
• Weird things
• Solution
– Near the end of a phase, spawn backup copies of tasks
• Whichever one finishes first "wins"
• Effect
– Dramatically shortens job completion time


Refinement: Combiners
• Often a Map task will produce many pairs of the form (k, v1), (k, v2), … for the same key k
– E.g., popular words in the word count example
• Can save network time by pre-aggregating values in the mapper:
– combine(k, list(v1)) → v2
– The combiner is usually the same as the reduce function
• Works only if the reduce function is commutative and associative


Refinement: Combiners
• Back to our word counting example:
– The combiner combines the values of all keys of a single mapper (single machine), as in the sketch below
– Much less data needs to be copied and shuffled!
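A minimal sketch of local pre-aggregation for word count; the function names are ours (in Hadoop you would instead configure a combiner class on the job):

from collections import Counter

def map_fn(doc_name, text):
    for w in text.split():
        yield (w, 1)

def combine_fn(pairs):
    # Runs on the mapper's machine: sum counts per word locally, so only one
    # (word, partial_count) pair per word crosses the network. This is safe
    # because summing is commutative and associative.
    local = Counter()
    for w, c in pairs:
        local[w] += c
    return list(local.items())

mapped = list(map_fn("doc1", "the crew of the space shuttle the"))
print(len(mapped), "pairs before combining")   # 7 pairs
print(combine_fn(mapped))                      # e.g. [('the', 3), ('crew', 1), ...]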


Refinement: Partition Function
• Want to control how keys get partitioned
– Inputs to map tasks are created by contiguous splits of the input file
– Reduce needs to ensure that records with the same intermediate key end up at the same worker
• The system uses a default partition function:
– hash(key) mod R
• Sometimes it is useful to override the hash function:
– E.g., hash(hostname(URL)) mod R ensures URLs from a host end up in the same output file
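The default and overridden partition functions, sketched in Python (the hostname parsing is a simplification; in Hadoop a partitioner is a Java class set on the job):

from urllib.parse import urlparse

R = 4   # number of reduce tasks

def default_partition(key):
    return hash(key) % R                      # hash(key) mod R

def url_host_partition(url):
    return hash(urlparse(url).hostname) % R   # hash(hostname(URL)) mod R

# All URLs from the same host land in the same reducer (same output file):
print(url_host_partition("http://example.org/a"),
      url_host_partition("http://example.org/b"))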


Example: Join By Map-Reduce
• Compute the natural join R(A, B) ⋈ S(B, C)
• R and S are each stored in files
• Tuples are pairs (a, b) or (b, c)


R:
A   B
a1  b1
a2  b1
a3  b2
a4  b3

S:
B   C
b2  c1
b2  c2
b3  c3

R ⋈ S:
A   C
a3  c1
a3  c2
a4  c3

Map-Reduce Join
• Use a hash function h from B-values to 1...k
• A Map process turns:
– Each input tuple R(a, b) into key-value pair (b, (a, R))
– Each input tuple S(b, c) into (b, (c, S))
• Map processes send each key-value pair with key b to Reduce process h(b)
– Hadoop does this automatically; just tell it what k is.
• Each Reduce process matches all the pairs (b, (a, R)) with all (b, (c, S)) and outputs (a, b, c).
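A minimal single-process sketch of this join on the example tables above; the tags "R" and "S" mark which relation a tuple came from, and the shuffle is simulated with a dictionary:

from collections import defaultdict

R = [("a1", "b1"), ("a2", "b1"), ("a3", "b2"), ("a4", "b3")]
S = [("b2", "c1"), ("b2", "c2"), ("b3", "c3")]

def map_join(R, S):
    for a, b in R:
        yield (b, ("R", a))      # R(a, b) -> (b, (a, R))
    for b, c in S:
        yield (b, ("S", c))      # S(b, c) -> (b, (c, S))

def reduce_join(b, tagged):
    a_vals = [x for tag, x in tagged if tag == "R"]
    c_vals = [x for tag, x in tagged if tag == "S"]
    for a in a_vals:
        for c in c_vals:
            yield (a, b, c)      # match every R-tuple with every S-tuple on key b

groups = defaultdict(list)       # simulated shuffle on key b
for b, tv in map_join(R, S):
    groups[b].append(tv)
out = [t for b in sorted(groups) for t in reduce_join(b, groups[b])]
print(out)   # [('a3', 'b2', 'c1'), ('a3', 'b2', 'c2'), ('a4', 'b3', 'c3')]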
