8/4/2019 Computer Science 37 Lecture 31
http://slidepdf.com/reader/full/computer-science-37-lecture-31 1/15
Lecture 31: Multiprocessors
Question: What does it mean to compute?

[Figure: a Turing machine, with its program and data.]

Perhaps: manipulate and transform data.
Question: Why would I want to have more than one computer work on the same problem at the same time?

The idea is: if it takes time T to finish a task using one computer, it will take time T/N to accomplish the same task using N computers. Right? Well, kind of.
Amdahl's Law: The Law of Diminishing Returns

Execution time after improvement =
    (Execution time affected by improvement / Amount of improvement)
    + Execution time unaffected
Conclusion: In order to see speedup equal to the number of processors used to execute an application in parallel, the application must have no sequential component at all.

Sometimes, when the size of the problem grows very large, the fraction of execution time which can be affected by improvement grows much faster than the execution time that is unaffected. In those cases, parallel computing will yield great gains.
Execution time after improvement =
    (Execution time affected by improvement / Amount of improvement)
    + Execution time unaffected
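Amdahl's Law is easy to play with in a few lines of Python. This is a minimal sketch (the function names are illustrative, not from the lecture); the second function rewrites the law as an overall speedup for a given parallelizable fraction of the work:

```python
def time_after_improvement(t_affected, t_unaffected, improvement):
    """Amdahl's Law: only the affected part of execution time shrinks."""
    return t_affected / improvement + t_unaffected

def speedup(parallel_fraction, n_processors):
    """Overall speedup when parallel_fraction of the work runs on N processors."""
    return 1.0 / (parallel_fraction / n_processors + (1.0 - parallel_fraction))

# Even with 95% of the work parallelized, 64 processors give far less than 64x:
print(round(speedup(0.95, 64), 1))  # → 15.4
```

This is the diminishing-returns point of the slide: the sequential 5% caps the speedup at 20x no matter how many processors are added.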
A little roadmap for the rest of the lecture:

- Quick glance at a few problems that arise in multiprocessing
- Categories of multiprocessor systems
- Panoramic look at multiprocessor computers
- A word or two on programming multiprocessors
[Diagram: several processor + cache pairs share a single bus to main memory and I/O.]

Question: What are the problems with this picture?

Enter the cache coherency protocols…
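The coherency problem and one classic answer (a write-invalidate, write-through snooping protocol) can be sketched as a toy simulation. This is an illustration only, with made-up class names, not any real protocol implementation:

```python
class Bus:
    """Shared bus: carries memory traffic and snooped invalidations."""
    def __init__(self):
        self.memory = {}    # shared main memory: address -> value
        self.caches = []

    def broadcast_invalidate(self, addr, writer):
        # Every other cache snoops the bus and drops its stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)

class Cache:
    """Per-processor cache in a write-invalidate, write-through scheme."""
    def __init__(self, bus):
        self.lines = {}     # cached copies: address -> value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss: fetch over the bus
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, self)   # keep others coherent
        self.lines[addr] = value
        self.bus.memory[addr] = value               # write-through to memory

bus = Bus()
a, b = Cache(bus), Cache(bus)
a.write(0x10, 1)
print(b.read(0x10))   # → 1  (b fetches the value over the bus)
a.write(0x10, 2)
print(b.read(0x10))   # → 2  (b's copy was invalidated, so it refetches)
```

Without the `broadcast_invalidate` step, `b` would keep returning its stale `1` after `a`'s second write, which is exactly the problem the question on this slide is pointing at.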
[Diagram: processor + cache pairs connected through an interconnection network to multiple memory modules.]

Question: What are the problems with this picture?
Classification According to Memory Access Times

Single address space:

- UMA: Uniform Memory Access. Same time, no matter which processor, no matter what address is accessed (SMP).
- NUMA: Non-Uniform Memory Access. Time depends on which processor is asking for the data and where the data is in memory.

Multiple address spaces: distributed memory, message passing.
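The multiple-address-space style can be sketched with Python's standard `multiprocessing` module: each process has its own private memory, so data moves only through explicit messages over a pipe (the `worker` function and its little sum job are illustrative):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Separate address space: the only data this process sees
    # is what arrives as an explicit message.
    data = conn.recv()
    conn.send(sum(data))
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # message out...
    print(parent_end.recv())        # → 10  ...result message back
    p.join()
```

Contrast this with the single-address-space (UMA/NUMA) picture, where both processors would simply load and store the same memory locations.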
Classification According to Processing Model

- SISD: Single instruction stream, single data stream
- MIMD: Multiple instruction streams, multiple data streams
- MISD: Multiple instruction streams, single data stream
- SIMD: Single instruction stream, multiple data streams
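The SIMD-versus-MIMD distinction can be imitated in plain Python with a thread pool; this is only a sketch of the programming styles, not real vector or multiprocessor hardware:

```python
from concurrent.futures import ThreadPoolExecutor

data = [1, 2, 3, 4]

with ThreadPoolExecutor() as pool:
    # SIMD-style: one operation (squaring) applied across all data elements.
    squares = list(pool.map(lambda x: x * x, data))

    # MIMD-style: independent instruction streams (min, max, sum),
    # each running over its own view of the data.
    tasks = [pool.submit(min, data), pool.submit(max, data), pool.submit(sum, data)]
    results = [t.result() for t in tasks]

print(squares)   # → [1, 4, 9, 16]
print(results)   # → [1, 4, 10]
```

In SIMD style the programmer writes one operation and the system fans it out over the data; in MIMD style each processor follows its own program.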
SIMD Computers: The MASPAR

- ACU: array control unit; issues instructions to all the PEs (RISC).
- PEs: clusters of 32-bit ALUs; 64 KB memory, 64 32-bit registers.
- Topology: grid connection.
- Scalability: 1024, 2048, 4096, 8192, or 16384 processors.
- Target: great for data-parallel applications.
SIMD Computers: The Connection Machine CM-2

A 5-foot-tall cube formed of smaller cubes, representing the 12-dimensional hypercube structure of the network that connected the processors together.

"This hard geometric object, black, the non-color of sheer, static mass, was transparent, filled with a soft, constantly changing cloud of lights from the processor chips, red, the color of life and energy. It was the archetype of an electronic brain, a living, thinking machine."
MIMD Computers: The SGI Origin 2000

Expandable and flexible rack design: add processors as needs grow. Uses cc-NUMA building blocks to scale the single shared-memory system from 2 to 16 processors in a single rack.

Each module supports two to eight MIPS® processors and up to 16 GB of memory, and provides I/O bandwidth of 6.24 GB per second.

"Capable of connecting with multiple racks to scale to 64 processors in a single-system image utilizing the revolutionary NUMAlink™ interconnect, a high-speed, scalable interconnect fabric that provides incremental bandwidth while maintaining the shared-memory model of an SMP server."
MIMD Computers: The Sun Enterprise 6500

Key specifications: Up to 30 CPUs, maximum memory of 60 GB (SMP-style shared memory), RAID disks.

Key benefits: A highly expandable system that offers mission-critical performance and availability.
MIMD Computers: Beowulf-type Clusters

Grendel (Clemson University): an experimental parallel computer built from commodity components.

A pile-of-PCs of 18 machines, each with the following: 150 MHz Pentium CPU, 64 MB EDO DRAM, 2 GB IDE disk, 2 Fast Ethernet cards.

Operating system: Red Hat Linux (kernel >= v2.0)

The machines are tied together with two networks. The first is a bus network using a stack of 100 Mb/s hubs. The second is a full-duplex switched network using a Fast Ethernet switch. Grendel defines 2 nodes for interaction with the system, and uses the other 16 as dedicated compute and I/O servers. The concept includes not only commodity off-the-shelf (COTS) hardware, but also the use of freely available operating systems such as Linux, message-passing software such as PVM and MPI, and other software often contributed by Beowulf users.

Cost: Can it get any lower??? Woo-hoo!
Does Multiprocessing Alone Solve The Performance Problem?

It has been decades since research on parallel processing started, yet programming a multiprocessor is still a hard task.
Processor A:                    Processor B:

Loop: {                         Loop: {
    Read data;                      Read data;
    Process data;                   Process data;
    Write data;                     Write data;
}                               }
Problem: Communication (it takes time to transfer data around).

Problem: Synchronization (do we have to agree on time?).

Problem: At the root of it all: DATA DEPENDENCIES.
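The read/process/write loops above hide exactly this data-dependency problem: if both processors read, modify, and write the same datum, the result depends on timing. A minimal sketch in Python, using threads to stand in for the two processors (the counter and iteration count are illustrative):

```python
import threading

counter = 0                 # shared data both "processors" read and write
lock = threading.Lock()

def process_data(iterations):
    global counter
    for _ in range(iterations):
        with lock:          # synchronization: serialize the read-modify-write
            counter += 1    # data dependency: each value builds on the last

threads = [threading.Thread(target=process_data, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # → 200000 with the lock; without it, often less
```

Removing the `with lock:` line lets the two read-modify-write sequences interleave, silently losing updates: the communication and synchronization costs named on this slide are the price of making the dependency explicit.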