64
csinparallel.org Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel Adams, Calvin College

Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Using Map-Reduce to Teach Parallel Programming

Concepts

Dick Brown, St. Olaf CollegeLibby Shoop, Macalester College

Joel Adams, Calvin College

Page 2: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Workshopsite

CSinParallel.org->Workshops->WMRWorkshopSeealsoworkshophandout

Page 3: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Introductorycomments

– Roleofundergraduateresearchers:Therewouldbenoworkshopwithoutthem!

– ThankstoAmazonWebServicesforprovidingcreditstohostourWMRinstance

– Disclaimer:Wearenotproposingmap-reduceastheonlyapproachtointroducingparallelism,concurrency

– Avalue:Honorthyneighbor'scurricularapproach

Page 4: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Goals

–  Introducemap-reducecompuLng,usingtheWebMapReduce(WMR)simplifiedinterfacetoHadoop

• Whyusemap-reduceinthecurriculum?

– Hands-onexerciseswithWMRforfoundaLoncourses

Page 5: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Goals

Part1–IntroducLon– Map-reducecompuLng,andtheWebMapReduce(WMR)simplifiedinterfacetoHadoop

– Hands-onexerciseswithWMRforfoundaLoncourses

Part2–TeachingwithWMR– Whyusemap-reduceinthecurriculum?– UseofWMRforintermediateandadvancedcourses– Hands-onexercisesformoreadvanceduse

Part3(opLonal)–What’sunderthehood?

Page 6: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

SneakPreview:Materialsavailable

(Incaseyoualreadyknowyourmap-reduce…)

•  CSinParallelmodule:Map-reduceCompu;ngforIntroductoryStudentsusingWebMapReduce– Seecsinparallel.org

Page 7: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Part1:IntroducLontoMap-ReduceCompuLngandWMR

Page 8: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

IntroducLontoMap-ReduceCompuLng

Page 9: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

History

– ThecomputaLonalmodelofusingmapandreduceoperaLonswasdevelopeddecadesago,forLISP

– GoogledevelopedMapReducesystemforsearchengine,published(DeanandGhemawat,2004)

– Yahoo!createdHadoop,anopen-sourceimplementaLon(underApache);Javamappersandreducers

Page 10: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Map-Reduce:The2-minuteoverview

Whatifyouwantedtocountthefrequenciesofallwords

in1,000,000books?

Page 11: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Map-Reduce:The2-minuteoverview

Whatifyouwantedtocountthefrequenciesofallwords

in1,000,000books?

1.  Breakupthelinesoftext:generateonelabelledpieceperword

•  Usethatwordaslabel;value1foreachpiece

2.  Groupthepiecesaccordingtolabel(word)3.  Addupthe1’sineachgroup

Page 12: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Map-ReduceConcept

Page 13: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel•  Map-reduceisatwo-stageprocesswitha"shuffletwist"

betweenthestages.

•  StagesarecontrolledbyfuncLons:mapper(),reducer()

Page 14: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel

•  mapper()funcLon:– Argumentisonelineofinputfromafile–  Produces(key,value)pairs

•  Example:word-countmapper()"thecatinthehat”-->[mapperforthisline]("the","1"),("cat","1"),("in","1"),("the","1"),("hat","1")

Page 15: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel

•  Shufflestage:– groupallmappers’(key,value)pairstogetherthathavethesamekey,andfeedeachgrouptoitsowncallofreduce()

–  Input:all(key,value)pairsfromallmappers– Output:Thosepairsrearranged,senttocallsofreduce()accordingtokey

•  Note:Shufflealsosorts(opLmizaLon)

Page 16: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel

•  reducer()funcLon:– Receivesallkey-valuepairsforonekey– Producesanaggregateresult

•  Example:word-countreducer()("the","1"),("the","1")-->[reducerfor"the"]("the","2")

Page 17: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel

–  Inmap-reduce,aprogrammercodesonlytwofuncLons(plusconfiginformaLon)

•  Amodelforfutureparallel-programmingframeworks

– Underlyingmap-reducesystemreusescodefor•  ParLLoningthedataintochunksandlines,•  Runsmappers/reducerswherethechunksarelocal•  Movingdatabetweenmappersandreducers•  Auto-recoveringfromanycrashesthatmayoccur•  ...

– OpLmized,Distributed,Fault-tolerant,Scalable

Page 18: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Themap-reducecomputaLonalmodel

•  OpLmized,Distributed,Fault-tolerant,Scalable

1.mappers 2.shuffle 3.reducersLocalI/O

GlobalI/O

Page 19: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

DemoofWMR

cumulus.cs.stolaf.edu/wmr

Intromodule

Page 20: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Materialsavailable

•  CSinParallelmodule:Map-reduceCompu;ngforIntroductoryStudentsusingWebMapReduce– Seecsinparallel.org

Page 21: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Overviewofsuggestedexercises

Availableonthecsinparallel.orgsite

– Runwordcount(provided),withsmallandlargedata

– Modify,runvariaLonsonwordcount:strippunctuaLon;caseinsensiLve;etc.

– AlternaLveexercises

Page 22: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

AddiLonalexercises

Beyondyourfirstsimpleexercises,considerexploringthefollowing:•  Variousdatasets

– Note:PleaseavoidlargeGutenberg"groups"forthisworkshop

•  ExtendedsetofexercisesforCS1(textanalysis)

Page 23: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Hands-onexploraLonofWMR

Page 24: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Part2:TeachingwithWMR

Whymap-reduce?WhyWMR?

TeachingWMRinCS1;inothercourses

Page 25: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Whyteachmap-reduce?

Page 26: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Whymap-reduceforteachingnoLonsofparallelism/concurrency?

– Concepts:•  dataparallelism;•  taskparallelism;•  locality;•  effectsofscale;•  exampleeffecLveparallelprogrammingmodel;•  distributeddatawithredundancyforfaulttolerance;...

Page 27: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Whymap-reduceforteachingnoLonsofparallelism/concurrency?

– Real-World:Hadoopwidelyused– ExciLng:theappealofGoogle,Facebook,etc.– Useful:forappropriateapplicaLons– Powerful:scalabilitytolargeclusters,largedata

Page 28: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WhyWMR?

–  Introduceconceptsofparallelism– Lowbarforentry,feasibleforCS1(andbeyond)– CapturetheimaginaLonsofstudents

•  SupportsrapidintroducLonofconceptsofparallelismforeveryCSstudent–  Intromoduledesignedfor1-3daysofclass

Page 29: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WebMapReduce(WMR)

– SimplifiedwebinterfaceforHadoopcomputaLons

– Goals:•  StrategicallySimplesuitableforCS1,butnotatoy

•  Configurablewritemappers/reducersinanylanguage

•  AccessiblewebapplicaLon•  MulL-plamorm,front-endandback-end

Page 30: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WMRFeatures(Briefly)

– TesLnginterface•  Errorfeedback•  BypassesHadoop--smalldataonly!

–  StudentsenterthefollowinginformaLon:•  choiceoflanguage•  datatoprocess•  definiLonofmapperinthatlanguage•  definiLonofreducerinthatlanguage

Page 31: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WMRsysteminformaLon

– Languagescurrentlysupported:Java,C++,Python,Scheme,C,C#,Javascript

•  Rcomingsoon

– Backendstodate:localcluster,AmazonEC2cloudimages

•  Versionlimitsandmorebackendscomingsoon

Moredetailsaboutthesystemin(opLonal)Part3oftheworkshop

Page 32: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Teachingmap-reducewithWMRintheintroductorysequence

Page 33: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

KinestheLcstudentacLvity•  VisualizaLonsofmap-reducecomputaLonsareenoughforsomestudents,butnotall

•  Anin-classac/vitytoactoutthemap/shuffle/reduceprocesshelpsothers

•  Alsohelpful:imagesofclusters;sequenLalversions;contextofwell-knownwebservices

Page 34: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WMRinadvancedcourses

Example:PDCelecLve•  CS1module•  Map-reduceprogrammingtechniques

– FeaturesofWMR– Contextforwarding– Structuredvalues;structuredkeys– Mul;-casemappers;mul;-casereducers– Broadcas;ngdatavalues

Page 35: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

ExamplesforTextProcessingTechniques

•  Combiningdatawithinamapper–  Mapper:Tallycountsofwordsbeforesendingtoreducer

•  ComputaLonallinguisLcs:–  wordsthatareco-located

•  FindandcountpairsExampleIn:thecatinthecathatEmits:1cat|in1in|the1cat|hat2the|cat

•  Usecombiningproceduretofind‘stripes’ExampleIn:thecatandthedogfoughtoverthedogboneEmits:(the,{cat:1,dog:2}

Thanksto:DataIntensiveTextProcessing,byJimmyLinandChrisDyer

Page 36: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

ApplicaLonideas

•  Examplesintheintroductorymodule•  Bigdatasetspeoplecareabout

•  Especiallyforunstructureddata•  Convenientforcertainkindsofprojects

– E.g.,mostcommonmedicalterminology

Page 37: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WMRHands-on,conLnuedModuleexercisesExtendedexercisesetDatasetsavailable/shared/MovieLens2/movieRaLngs/shared/gutenberg/WarAndPeace.txt/shared/gutenberg/CompleteShakespeare.txt

Page 38: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Part3:What’sunderthehood

Page 39: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

AboutWMR

•  WMRanditsarchitecture

•  ObtainingandinstallingWMR– WebMapReduce.sf.com

ClusterHeadNode

WebServer

UserBrowser

UserBrowser

Cluster

Page 40: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.orgBasic

Hadoopcomponents•  Internals:

–  Jobmanagement(percluster)– Taskmanagement(percomputaLonnode)

•  Somecomponentsvisibletotheuser:– HadoopAPI–Java,orarbitraryexecutables(“Streaming”)

– HadoopDistributedFileSystem(HDFS)– Supporttools,includinghadoop command– Limitedjobmonitoring…

Page 41: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Goals

–  Introducemap-reducecompuLng,usingtheWebMapReduce(WMR)simplifiedinterfacetoHadoop

• Whyusemap-reduceinthecurriculum?– Hands-onexerciseswithWMRforfoundaLoncourses

– UseofWMRforintermediateandadvancedcourses

• What’sunderthehoodwithWMR•  ApeekatHadoop…

– Hands-onexercisesformoreadvanceduse

Page 42: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

WMRinadvancedcourses

Page 43: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

InverLng"Chapter1:CallmeIshmael.Some…”"Chapter2:Istuffedashirtortwo...""Chapter3:Enteringthatgable-ended...”

-->[mapper]("call","1"),("me","1"),...,("i","2"),("stuffed”,"2"),...,("entering","3"),...

-->[reducer]

"a""1,1,1,1,...,2,2,2,...""aback""3,7,7,8,...”...

Page 44: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Whenismap-reduceappropriate?

•  Massive,unstructuredorirregularlystructured“bigdata”(Terascaleandupward)– Rawtext– Webpages– XML– Unstructuredstreamsofdata

•  Otherapproachesmayfitstructured“bigdata”– Scalabledatabases– Large-scalestaLsLcalapproaches

Page 45: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

UsingHadoopdirectly(Java)

WordCount.javaexample

Page 46: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

DirectHadoopExamples

•  Wordcount–  Java

Page 47: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

QuickquesLons/commentssofar?

Page 48: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Hands-on

Page 49: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Overviewofsuggestedexercises

– ComputaLonswithMovieLens2data;mulLplemap-reducecycles

– Trafficdataanalysis– NetworkanalysisusingFlixterdata– TheMillionSongdataset

Page 50: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Discussion

Page 51: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

EvaluaLons!

Linksat:CSinParallel.org->Workshops->WMRWorkshop(endofthepage)

Page 52: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

SomeconsideraLonswithHadoop

– Numbersofmappersandreducers– DFS– Faulttolerance–  I/Oformats

– Note:wehavefurtherslideswithaddiLonalinformaLonabouttheseaspects,foryoutolookatonyourown.

Page 53: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

DirectHadoopexercisesetup

– Edityourownfiles,locally– scptocluster'sadminnode(oncloud)– sshtocompile,launchjob– Percentageprogressoutputisprovided– MulLplecyclesupportviaDFS–  (Cleanup)

Page 54: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

AddiLonalDetailsaboutHadoop

Page 55: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

ThehadoopprojectdocumentaLon

•  h}p://hadoop.apache.org/common/docs/current/index.html

Page 56: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Howmanymappers?

•  TheHadoopMap/ReduceframeworkspawnsonemappertaskforeachInputSplitgeneratedbytheInputFormatforthejob.

•  Thenumberofmappersisusuallydrivenbythetotalsizeoftheinputs,thatis,thetotalnumberofblocksoftheinputfiles.

Page 57: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Howmanyreducers?

•  ThenumberofreducersforthejobissetbytheuserviaJobConf.setNumReduceTasks(int)

•  Thesizeofyoureventualoutputmaydictatehowmanyreducersyouchoose.

Page 58: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

HDFS

•  Fault-tolerantdistributedfilesystemmodeleda~ertheGoogleFileSystem– we'vehadstudentsreadtheoriginalGFSpaperinanadvancedcourse

•  h}p://hadoop.apache.org/hdfs/docs/current/index.html

•  NotethesecLonaboutthefilesystemcommandsyoucanrunfromthecommandline:hadoopfs-lsHadoopfs-getor-put

Page 59: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

HDFSAssumpLonsandGoals

•  Hardwarefailure–  HardwarefailureisthenormratherthantheexcepLon.

•  Streamingdataaccess–  ApplicaLonsthatrunonHDFSneedstreamingaccesstotheirdatasets.TheyarenotgeneralpurposeapplicaLonsthattypicallyrunongeneralpurposefilesystems.

•  LargeDataSets•  Simplecoherencymodel

–  Readmany,writeonce•  MovingcomputaLonsissimplerthanmovingdata•  Portabilityacrossvarioushardware/so~ware

Page 60: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Page 61: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Page 62: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Input/Outputformats

–  InputintomappersareinterpretedusingclassesimplemenLngtheinterfaceInputFormat,andoutputfromreducersareimplementedusingclassesimplemenLngtheinterfaceOutputFormat.

–  InWMR,themapperinputandreduceroutputisperformedwithkey-valuepairs.ThiscorrespondstousingtheclassesKeyValueTextInputFormatandTextOutputFormat.

–  IndirectHadoop,thedefaultinputformatisTextInputFormat,inwhichvaluesarelinesofthefileandkeysareposiLonswithinthatfile.

Page 63: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

SomefurtherfeaturesofHadoop

– Combiner,anopLmizaLon:performsome"reducLon"duringthemapphase,a~ermapper()andbeforeshuffle

– SorLngcontrol•  Note:hardtosortonsecondarykey

– Threeprogramminginterfaces:Java;pipes(C++);streaming(executables)

Page 64: Using Map-Reduce to Teach Parallel Programming Concepts · Using Map-Reduce to Teach Parallel Programming Concepts Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel

csinparallel.org

Pagerankalgorithmideas

•  Originaldata:onewebpageperline

–  mapperproduces("dest","1/kPn")foreachlinkinpagePnwhereklinksappearwithinthatpagePnreducerproduces("dest","weight_0P1P2P2P3P4...")whereweightissumoftheweightsfromkeyvaluepairsemi}edbyP1,P2,...

–  SubsequentmappersandreducersproducerefinedweightsthattakeintoaccountdeeperchainsofpagespoinLngtopages

–  Finalreducerdelivers("dest","weight_k")[dropPns]