Computer Science 37 Lecture 31


8/4/2019 Computer Science 37 Lecture 31

http://slidepdf.com/reader/full/computer-science-37-lecture-31 1/15


Lecture 31

Multiprocessors


Question: What does it mean to compute?

[Figure: a Turing Machine, labeled with "program" and "data".]

Perhaps: manipulate and transform data.


Question: Why would I want to have more than one computer work on the same problem at the same time?

The idea is: if it takes time T to finish a task using one computer, it will take time T/N to accomplish the same task using N computers. Right? Well, kind of.

Amdahl's Law: The Law of Diminishing Returns

\[
\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}
\]
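As a rough illustration of Amdahl's Law, the sketch below evaluates the formula for a hypothetical task with 90 seconds of parallelizable work and 10 seconds of sequential work. The function names and the 90/10 split are assumptions made for this example, not values from the lecture.

```python
def time_after_improvement(affected: float, unaffected: float, amount: float) -> float:
    """Execution time after improvement, following Amdahl's Law."""
    return affected / amount + unaffected


def speedup(affected: float, unaffected: float, amount: float) -> float:
    """Ratio of the original execution time to the improved execution time."""
    before = affected + unaffected
    return before / time_after_improvement(affected, unaffected, amount)


# Hypothetical task: 90 s can be spread over N processors, 10 s cannot.
for n in (1, 2, 10, 100, 1000):
    print(n, round(speedup(90.0, 10.0, n), 2))
```

With these assumed numbers the speedup stays below 10 no matter how many processors are added, because the 10 sequential seconds are never reduced.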


Conclusion: In order to see speedup equal to the number of processors used to execute an application in parallel, the application must have no sequential component at all.

Sometimes, when the size of the problem grows very large, the fraction of execution time that can be affected by the improvement grows much faster than the execution time that is unaffected. In those cases, parallel computing will yield great gains.

\[
\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}
\]
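To make the problem-size effect concrete, here is a small sketch under an assumed workload: the unaffected (sequential) time grows linearly with the problem size n, while the affected (parallelizable) time grows quadratically. Both growth rates, the function name, and the choice of 64 processors are hypothetical.

```python
def speedup_for_size(n: int, processors: int) -> float:
    """Speedup from Amdahl's Law for an assumed workload of size n."""
    unaffected = float(n)      # assumption: sequential work grows linearly
    affected = float(n * n)    # assumption: parallelizable work grows quadratically
    before = affected + unaffected
    after = affected / processors + unaffected
    return before / after


for n in (10, 100, 1000, 10000):
    print(n, round(speedup_for_size(n, 64), 1))
```

Under these assumptions the speedup climbs toward the processor count as n grows, although it never quite reaches it while any sequential work remains.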


Little roadmap for the rest of the lecture:

Quick glance at a few problems that arise in multiprocessing

Categories of multiprocessor systems

Panoramic look at multiprocessor computers

A word or two on programming multiprocessors


[Figure: three processors, each with its own cache, connected by a single bus to main memory and I/O.]

Question: What are the problems with this picture?

Enter the cache coherency protocols…
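One problem with private caches on a shared bus is that a processor may keep reading an old copy of data that another processor has already changed. The toy model below is a sketch of that situation, assuming write-through caches and a simple write-invalidate rule. The `Cache` class and `demo` function are hypothetical, and real protocols track more state per cache line than this.

```python
class Cache:
    """Toy write-through cache in front of a shared memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch from main memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, peers=()):
        # Write through to memory and invalidate the copies held by peers.
        self.lines[addr] = value
        self.memory[addr] = value
        for peer in peers:
            peer.lines.pop(addr, None)


def demo(coherent: bool) -> int:
    memory = {0: 1}
    a, b = Cache(memory), Cache(memory)
    b.read(0)                                      # B caches the value 1
    a.write(0, 2, peers=(b,) if coherent else ())  # A updates address 0
    return b.read(0)                               # what does B see now?


print(demo(coherent=False))  # B still sees its stale copy
print(demo(coherent=True))   # B's copy was invalidated, so it refetches
```

Without invalidation, processor B keeps returning the value it cached before A's write; with invalidation, B misses and picks up the new value from memory.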


[Figure: three processors, each with a cache, three memory modules, and an interconnection network.]

Question: What are the problems with this picture?


Classification According to Memory Access Times

Single address space:

UMA (Uniform Memory Access): Same time, no matter which processor, no matter what address is accessed (SMP).

NUMA (Non-Uniform Memory Access): Time depends on which processor is asking for the data and where the data is in memory.

Multiple address spaces: distributed memory, message passing.


Classification According to Processing Model

SISD:
Single instruction stream
Single data stream

MIMD:
Multiple instruction streams
Multiple data streams

MISD:
Multiple instruction streams
Single data stream

SIMD:
Single instruction stream
Multiple data streams
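The four categories follow mechanically from two counts: how many instruction streams and how many data streams a machine handles. A minimal sketch of that rule, using a hypothetical helper named `flynn_class`:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Name the processing model from the number of instruction and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"  # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"         # Single or Multiple data
    return f"{i}I{d}D"


print(flynn_class(1, 1))      # a conventional uniprocessor
print(flynn_class(1, 4096))   # one instruction stream driving many data streams
print(flynn_class(16, 16))    # independent processors, each with its own data
```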


SIMD Computers: The MasPar

ACU: array control unit; issues instructions to all the PEs (RISC).

PEs: clusters of 32-bit ALUs; 64KB memory, 64 32-bit registers.

Topology: grid connection.

Scalability: 1024, 2048, 4096, 8192 or 16384 processors.

Target: great for data parallel applications.


SIMD Computers: The Connection Machine CM-2

A 5-foot-tall cube formed of smaller cubes, representing the 12-dimensional hypercube structure of the network that connected the processors together.

“This hard geometric object, black, the non-color of sheer, static mass, was transparent, filled with a soft, constantly changing cloud of lights from the processor chips, red, the color of life and energy. It was the archetype of an electronic brain, a living, thinking machine.”
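In a hypercube network, each node can be given a binary address, and two nodes are directly linked when their addresses differ in exactly one bit. The sketch below illustrates this for a 12-dimensional hypercube, which has 2^12 = 4096 nodes with 12 neighbors each. The function names are hypothetical, and the code models only the topology, not the actual CM-2 router.

```python
def hypercube_neighbors(node: int, dimensions: int = 12) -> list:
    """Addresses of the nodes directly linked to `node` in a hypercube."""
    if not 0 <= node < 2 ** dimensions:
        raise ValueError("node address out of range")
    # Flipping each address bit in turn gives one neighbor per dimension.
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hops(a: int, b: int) -> int:
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(a ^ b).count("1")


print(len(hypercube_neighbors(0)))  # one neighbor per dimension
print(hypercube_neighbors(0)[:3])   # the first few neighbors of node 0
print(hops(0, 2 ** 12 - 1))         # the two most distant nodes
```

The appeal of this topology is that the longest route between any two of the 4096 nodes is only 12 links.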


MIMD Computers: The SGI Origin 2000

Expandable and flexible rack design: add processors as needs grow. Uses cc-NUMA building blocks to scale the single shared-memory system from 2 to 16 processors in a single rack.

Each module supports two to eight MIPS® processors and up to 16GB of memory and provides I/O bandwidth of 6.24GB per second.

“Capable of connecting with multiple racks to scale to 64 processors in a single-system image utilizing the revolutionary NUMAlink™ interconnect, a high-speed, scalable interconnect fabric that provides incremental bandwidth while maintaining the shared-memory model of an SMP server.”


MIMD Computers: The Sun Enterprise 6500

Key Specifications: Up to 30 CPUs, maximum memory of 60GB (SMP-style shared memory), RAID disks.

Key Benefits: A highly expandable system that offers mission-critical performance and availability.


MIMD Computers: Beowulf-type Clusters

Grendel (Clemson University): an experimental parallel computer built from commodity components.

A pile-of-PCs of 18 machines, each with the following: 150MHz Pentium CPU, 64MB EDO DRAM, 2GB IDE disk, 2 Fast Ethernet cards.

Operating System: Red Hat Linux (kernel >= v2.0)

The machines are tied together with two networks. The first is a bus network using a stack of 100Mb/s hubs. The second is a full-duplex switched network using a Fast Ethernet switch. Two nodes are set aside for interaction with the system, and the other 16 serve as dedicated compute and I/O servers. The concept includes not only commodity off-the-shelf (COTS) hardware, but also the use of freely available operating systems such as Linux, message passing software such as PVM and MPI, and other software often contributed by Beowulf users.

Cost: Can it get any lower???

Woo-hoo!


Does Multiprocessing Alone Solve The Performance Problem?

It's been decades since research on parallel processing started, yet programming a multiprocessor is still a hard task.

Processor A:

 Loop: {
   Read data;
   Process data;
   Write data;
 }

Processor B:

 Loop: {
   Read data;
   Process data;
   Write data;
 }

Problem: Communication (it takes time to transfer data around).

Problem: Synchronization (do we have to agree on time?).

Problem: At the root of it all: DATA DEPENDENCIES.
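The difference a data dependency makes can be sketched with two loops. In the first, each output depends only on its own input, so the work splits cleanly between two workers standing in for Processor A and Processor B. In the second, every iteration needs the result of the previous one, so it cannot be split this naively. The function names are hypothetical, and the threads here illustrate the structure only, since threads in standard CPython do not run this kind of computation in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def process(x: int) -> int:
    return x * x


def independent(data: list) -> list:
    """No dependencies between iterations: split the loop across two workers."""
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(lambda: [process(x) for x in data[:mid]])  # "Processor A"
        b = pool.submit(lambda: [process(x) for x in data[mid:]])  # "Processor B"
        return a.result() + b.result()


def running_total(data: list) -> list:
    """Loop-carried dependency: each step needs the total from the step before."""
    out, total = [], 0
    for x in data:
        total += x
        out.append(total)
    return out


data = list(range(8))
print(independent(data))
print(running_total(data))
```

In the second loop, a worker handed the second half of the data could not start until it learned the total accumulated over the first half, which is exactly the communication and synchronization cost named above.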