Lecture 31: Multiprocessors

Computer Science 37 Lecture 31


Page 1: Computer Science 37 Lecture 31


Lecture 31

Multiprocessors

Page 2: Computer Science 37 Lecture 31


Question: What does it mean to compute?

Turing Machine

[Figure: a Turing machine operating on a program and data]

Perhaps: manipulate and transform data.

Page 3: Computer Science 37 Lecture 31


Question: Why would I want to have more than one computer work on the same problem at the same time?

The idea is: if it takes time T to finish a task using one computer, it will take time T/N to accomplish the same task using N computers. Right? Well, kind of.

Amdahl’s Law: The Law of Diminishing Returns

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
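
A minimal sketch (an illustration, not from the lecture) that tabulates this formula for a hypothetical program: the run time and the parallelizable fraction are made-up values, and the amount of improvement is taken to be the number of processors N applied to the affected part.

    #include <stdio.h>

    /* Amdahl's Law: new time = affected / improvement + unaffected.
     * Hypothetical program: 100 time units total, 90% of it parallelizable. */
    int main(void) {
        double total    = 100.0;
        double fraction = 0.90;

        for (int n = 1; n <= 64; n *= 2) {
            double affected   = total * fraction;
            double unaffected = total * (1.0 - fraction);
            double new_time   = affected / n + unaffected;
            printf("N = %2d  time = %6.2f  speedup = %5.2f\n",
                   n, new_time, total / new_time);
        }
        return 0;
    }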

Page 4: Computer Science 37 Lecture 31


Conclusion: in order to see speedup equal to the number of processors used to execute an application in parallel, this application must have no sequential component at all.

Sometimes, when the size of the problem grows very large, the fraction of execution time which can be affected by improvement grows much faster than the execution time that is unaffected. In those cases, parallel computing will yield great gains.

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
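
For instance (numbers invented for illustration): if 10 s of a 100 s run is sequential, 16 processors give 100 / (90/16 + 10) ≈ 6.4x. If the problem grows so that the parallel part swells to 990 s while the sequential part stays at 10 s, the same 16 processors give 1000 / (990/16 + 10) ≈ 13.9x.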

Page 5: Computer Science 37 Lecture 31


Little road map for the rest of the lecture:

Quick glance at a few problems that arise in multiprocessing

Categories of multiprocessor systems

Panoramic look at multiprocessor computers

A word or two on programming multiprocessors

Page 6: Computer Science 37 Lecture 31


[Diagram: three processors, each with its own cache, attached to a single bus shared with main memory and I/O]

Question: What are the problems with this picture?

Enter the cache coherency protocols…
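
To make the problem concrete, here is a minimal pthreads sketch (an illustration, not from the lecture): two threads repeatedly update the same shared location. Each processor works on the copy held in its own cache, so the hardware must keep those copies coherent, and even then the program below loses updates because the read-modify-write is not synchronized.

    #include <pthread.h>
    #include <stdio.h>

    /* Two threads hammer on the same shared variable. On a bus-based SMP each
     * processor caches its own copy of the line, so the hardware must keep the
     * copies coherent; the program still needs locking to avoid lost updates. */
    static long counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;              /* unsynchronized read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Expected 2000000; usually less, because increments get lost. */
        printf("counter = %ld\n", counter);
        return 0;
    }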

Page 7: Computer Science 37 Lecture 31


[Diagram: three processors, each with its own cache and its own local memory, connected by an interconnection network]

Question: What are the problems with this picture?
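
Because each processor can address only its own memory, data has to be moved across the interconnection network explicitly; that is what message-passing libraries do. A minimal MPI sketch (an illustration, not from the slides; MPI comes up again later with the Beowulf material) in which rank 0 ships one integer to rank 1:

    #include <mpi.h>
    #include <stdio.h>

    /* Run with at least two processes, e.g. mpirun -np 2 ./send_recv.
     * Each process owns a private memory; the explicit send/receive pair is
     * the only way the value reaches rank 1. */
    int main(int argc, char **argv) {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }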

Page 8: Computer Science 37 Lecture 31


Classification According to Memory Access Times

Single address space:

UMA: Uniform Memory Access. Same time, no matter which processor, no matter what address is accessed (SMP).

NUMA: Non-Uniform Memory Access. Time depends on which processor is asking for the data and where the data is in memory.

Multiple address spaces: distributed memory, message passing.

Page 9: Computer Science 37 Lecture 31


Classification According to Processing Model

SISD: Single instruction stream, Single data stream

MIMD: Multiple instruction streams, Multiple data streams

MISD: Multiple instruction streams, Single data stream

SIMD: Single instruction stream, Multiple data streams

Page 10: Computer Science 37 Lecture 31


SIMD Computers: The MasPar

ACU: array control unit; issues instructions to all the PEs (RISC).

PEs: clusters of 32-bit ALUs; 64 KB memory, 64 32-bit registers.

Topology: grid connection.

Scalability: 1024, 2048, 4096, 8192, or 16384 processors.

Target: great for data-parallel applications.
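
For a feel of the data-parallel style such a machine targets, here is a rough shared-memory analogue (an OpenMP sketch, not MasPar code): one operation applied uniformly to every element of an array.

    #include <stdio.h>

    #define N 16384   /* one element per PE on the largest configuration above */

    /* Data-parallel style: the same operation on every element. A SIMD machine
     * drives all PEs with one instruction stream; OpenMP merely mimics the
     * effect with threads when compiled with -fopenmp. */
    int main(void) {
        static float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[100] = %f\n", c[100]);
        return 0;
    }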

Page 11: Computer Science 37 Lecture 31


SIMD Computers: The Connection Machine CM-2

A cube, 5 feet tall, formed of smaller cubes, representing the 12-dimensional hypercube structure of the network that connected the processors together.

“This hard geometric object, black, the non-color of sheer, static mass, was transparent, filled with a soft, constantly changing cloud of lights from the processor chips, red, the color of life and energy. It was the archetype of an electronic brain, a living, thinking machine.”

Page 12: Computer Science 37 Lecture 31


MIMD Computers: The SGI Origin 2000

Expandable and flexible rack design: add processors as needs grow. Uses cc-NUMA building blocks to scale the single shared-memory system from 2 to 16 processors in a single rack.

Each module supports two to eight MIPS® processors and up to 16 GB of memory, and provides I/O bandwidth of 6.24 GB per second.

“Capable of connecting with multiple racks to scale to 64 processors in a single-system image utilizing the revolutionary NUMAlink™ interconnect, a high-speed, scalable interconnect fabric that provides incremental bandwidth while maintaining the shared-memory model of an SMP server.”

Page 13: Computer Science 37 Lecture 31


MIMD Computers: The Sun Enterprise 6500

Key Specifications: up to 30 CPUs, maximum memory of 60 GB (SMP-style shared memory), RAID disks.

Key Benefits: a highly expandable system that offers mission-critical performance and availability.

Page 14: Computer Science 37 Lecture 31


MIMD Computers: Beowulf-type Clusters

Grendel (Clemson University): an experimental parallel computer built from commodity components.

A pile-of-PCs of 18 machines, each with the following: 150 MHz Pentium CPU, 64 MB EDO DRAM, 2 GB IDE disk, 2 Fast Ethernet cards.

Operating System: Red Hat Linux (kernel >= v2.0)

The machines are tied together with two networks. The first is a bus network using a stack of 100 Mb/s hubs. The second is a full-duplex switched network using a Fast Ethernet switch. The cluster defines 2 nodes for interaction with the system, and uses the other 16 as dedicated compute and I/O servers. The concept includes not only commodity off-the-shelf (COTS) hardware, but also the use of freely available operating systems such as Linux, message-passing software such as PVM and MPI, and other software often contributed by Beowulf users.

Cost: Can it get any lower???

Woo-hoo!
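
A minimal sketch of the kind of MPI program such a cluster runs (an illustration, not from the slides): launched with something like mpirun -np 16 ./hello, one copy starts per node and reports in.

    #include <mpi.h>
    #include <stdio.h>

    /* Every process gets a rank; printing the processor name shows which
     * machine in the pile of PCs it landed on. */
    int main(int argc, char **argv) {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }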

Page 15: Computer Science 37 Lecture 31


Does Multiprocessing Alone Solve the Performance Problem?

It has been decades since research on parallel processing started, yet programming a multiprocessor is still a hard task.

Processor A:

Loop: {
  Read data;
  Process data;
  Write data;
}

Processor B:

Loop: {
  Read data;
  Process data;
  Write data;
}

Problem: Communication (it takes time to transfer data around).

Problem: Synchronization (do we have to agree on time?).

Problem: At the root of it all: DATA DEPENDENCIES.
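
A minimal pthreads sketch (an illustration, not from the lecture) of the dependency in the picture above: Processor B cannot process the data until Processor A has written it, so B must wait, and that waiting is overhead no number of extra processors removes.

    #include <pthread.h>
    #include <stdio.h>

    /* B's "read data" depends on A's "write data": a condition variable makes
     * the dependency explicit; the time B spends waiting is pure overhead. */
    static int data_ready = 0;
    static int shared_data = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *processor_a(void *arg) {          /* producer */
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_data = 42;                          /* "write data" */
        data_ready = 1;
        pthread_cond_signal(&cond);                /* tell B the data is there */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *processor_b(void *arg) {          /* consumer */
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!data_ready)                        /* wait out the dependency */
            pthread_cond_wait(&cond, &lock);
        printf("B read %d\n", shared_data);        /* "read data" */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&b, NULL, processor_b, NULL);
        pthread_create(&a, NULL, processor_a, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }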