Lecture 31: Multiprocessors

Computer Science 37 Lecture 31


Page 1: Computer Science 37 Lecture 31


Lecture 31

Multiprocessors

Page 2: Computer Science 37 Lecture 31


Question: What does it mean to compute?

Turing Machine

[Figure: a Turing machine operating on a program and data]

Perhaps: manipulate and transform data.

Page 3: Computer Science 37 Lecture 31


Question: Why would I want to have more than one computer work on the same problem at the same time?

The idea is: if it takes time T to finish a task using one computer, it will take time T/N to accomplish the same task using N computers. Right? Well, kind of.

Amdahl’s Law: The Law of Diminishing Returns

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
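
A minimal sketch (an illustration, not from the lecture) that tabulates this formula for a hypothetical program: the run time and the parallelizable fraction are made-up values, and the amount of improvement is taken to be the number of processors N applied to the affected part.

    #include <stdio.h>

    /* Amdahl's Law: new time = affected / improvement + unaffected.
     * Hypothetical program: 100 time units total, 90% of it parallelizable. */
    int main(void) {
        double total    = 100.0;
        double fraction = 0.90;

        for (int n = 1; n <= 64; n *= 2) {
            double affected   = total * fraction;
            double unaffected = total * (1.0 - fraction);
            double new_time   = affected / n + unaffected;
            printf("N = %2d  time = %6.2f  speedup = %5.2f\n",
                   n, new_time, total / new_time);
        }
        return 0;
    }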

Page 4: Computer Science 37 Lecture 31


Conclusion: in order to see speedup equal to the number of processors used to execute an application in parallel, this application must have no sequential component at all.

Sometimes, when the size of the problem grows very large, the fraction of execution time which can be affected by improvement grows much faster than the execution time that is unaffected. In those cases, parallel computing will yield great gains.

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
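
For instance (numbers invented for illustration): if 10 s of a 100 s run is sequential, 16 processors give 100 / (90/16 + 10) ≈ 6.4x. If the problem grows so that the parallel part swells to 990 s while the sequential part stays at 10 s, the same 16 processors give 1000 / (990/16 + 10) ≈ 13.9x.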

Page 5: Computer Science 37 Lecture 31


Little road map for the rest of the lecture:

Quick glance at a few problems that arise in multiprocessing

Categories of multiprocessor systems

Panoramic look at multiprocessor computers

A word or two on programming multiprocessors

Page 6: Computer Science 37 Lecture 31


[Diagram: three processors, each with its own cache, attached to a single bus shared with main memory and I/O]

Question: What are the problems with this picture?

Enter the cache coherency protocols…
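
To make the problem concrete, here is a minimal pthreads sketch (an illustration, not from the lecture): two threads repeatedly update the same shared location. Each processor works on the copy held in its own cache, so the hardware must keep those copies coherent, and even then the program below loses updates because the read-modify-write is not synchronized.

    #include <pthread.h>
    #include <stdio.h>

    /* Two threads hammer on the same shared variable. On a bus-based SMP each
     * processor caches its own copy of the line, so the hardware must keep the
     * copies coherent; the program still needs locking to avoid lost updates. */
    static long counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;              /* unsynchronized read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Expected 2000000; usually less, because increments get lost. */
        printf("counter = %ld\n", counter);
        return 0;
    }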

Page 7: Computer Science 37 Lecture 31


[Diagram: three processors, each with its own cache and its own local memory, connected by an interconnection network]

Question: What are the problems with this picture?
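
Because each processor can address only its own memory, data has to be moved across the interconnection network explicitly; that is what message-passing libraries do. A minimal MPI sketch (an illustration, not from the slides; MPI comes up again later with the Beowulf material) in which rank 0 ships one integer to rank 1:

    #include <mpi.h>
    #include <stdio.h>

    /* Run with at least two processes, e.g. mpirun -np 2 ./send_recv.
     * Each process owns a private memory; the explicit send/receive pair is
     * the only way the value reaches rank 1. */
    int main(int argc, char **argv) {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }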

Page 8: Computer Science 37 Lecture 31


Classification According to Memory Access Times

Single address space:

UMA: Uniform Memory Access. Same time, no matter which processor, no matter what address is accessed (SMP).

NUMA: Non-Uniform Memory Access. Time depends on which processor is asking for the data and where the data is in memory.

Multiple address spaces: distributed memory, message passing.

Page 9: Computer Science 37 Lecture 31


Classification According to Processing Model

SISD: Single instruction stream, Single data stream

MIMD: Multiple instruction streams, Multiple data streams

MISD: Multiple instruction streams, Single data stream

SIMD: Single instruction stream, Multiple data streams

Page 10: Computer Science 37 Lecture 31


SIMD Computers: The MasPar

ACU: array control unit; issues instructions to all the PEs (RISC).

PEs: clusters of 32-bit ALUs; 64 KB memory, 64 32-bit registers.

Topology: grid connection.

Scalability: 1024, 2048, 4096, 8192, or 16384 processors.

Target: great for data-parallel applications.
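
For a feel of the data-parallel style such a machine targets, here is a rough shared-memory analogue (an OpenMP sketch, not MasPar code): one operation applied uniformly to every element of an array.

    #include <stdio.h>

    #define N 16384   /* one element per PE on the largest configuration above */

    /* Data-parallel style: the same operation on every element. A SIMD machine
     * drives all PEs with one instruction stream; OpenMP merely mimics the
     * effect with threads when compiled with -fopenmp. */
    int main(void) {
        static float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[100] = %f\n", c[100]);
        return 0;
    }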

Page 11: Computer Science 37 Lecture 31


SIMD Computers: The Connection Machine CM-2

A cube, 5 feet tall, formed of smaller cubes, representing the 12-dimensional hypercube structure of the network that connected the processors together.

“This hard geometric object, black, the non-color of sheer, static mass, was transparent, filled with a soft, constantly changing cloud of lights from the processor chips, red, the color of life and energy. It was the archetype of an electronic brain, a living, thinking machine.”

Page 12: Computer Science 37 Lecture 31


MIMD Computers: The SGI Origin 2000

Expandable and flexible rack design: add processors as needs grow. Uses cc-NUMA building blocks to scale the single shared-memory system from 2 to 16 processors in a single rack.

Each module supports two to eight MIPS® processors and up to 16 GB of memory, and provides I/O bandwidth of 6.24 GB per second.

“Capable of connecting with multiple racks to scale to 64 processors in a single-system image utilizing the revolutionary NUMAlink™ interconnect, a high-speed, scalable interconnect fabric that provides incremental bandwidth while maintaining the shared-memory model of an SMP server.”

Page 13: Computer Science 37 Lecture 31


MIMD Computers: The Sun Enterprise 6500

Key Specifications: up to 30 CPUs, maximum memory of 60 GB (SMP-style shared memory), RAID disks.

Key Benefits: a highly expandable system that offers mission-critical performance and availability.

Page 14: Computer Science 37 Lecture 31


MIMD Computers: Beowulf-type Clusters

Grendel (Clemson University): an experimental parallel computer built from commodity components.

A pile-of-PCs of 18 machines, each with the following: 150 MHz Pentium CPU, 64 MB EDO DRAM, 2 GB IDE disk, 2 Fast Ethernet cards.

Operating System: Red Hat Linux (kernel >= v2.0)

The machines are tied together with two networks. The first is a bus network using a stack of 100 Mb/s hubs. The second is a full-duplex switched network using a Fast Ethernet switch. The cluster defines 2 nodes for interaction with the system, and uses the other 16 as dedicated compute and I/O servers. The concept includes not only commodity off-the-shelf (COTS) hardware, but also the use of freely available operating systems such as Linux, message-passing software such as PVM and MPI, and other software often contributed by Beowulf users.

Cost: Can it get any lower???

Woo-hoo!
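
A minimal sketch of the kind of MPI program such a cluster runs (an illustration, not from the slides): launched with something like mpirun -np 16 ./hello, one copy starts per node and reports in.

    #include <mpi.h>
    #include <stdio.h>

    /* Every process gets a rank; printing the processor name shows which
     * machine in the pile of PCs it landed on. */
    int main(int argc, char **argv) {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }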

Page 15: Computer Science 37 Lecture 31


Does Multiprocessing Alone Solve the Performance Problem?

It has been decades since research on parallel processing started, yet programming a multiprocessor is still a hard task.

Processor A:

Loop: {
  Read data;
  Process data;
  Write data;
}

Processor B:

Loop: {
  Read data;
  Process data;
  Write data;
}

Problem: Communication (it takes time to transfer data around).

Problem: Synchronization (do we have to agree on time?).

Problem: At the root of it all: DATA DEPENDENCIES.
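
A minimal pthreads sketch (an illustration, not from the lecture) of the dependency in the picture above: Processor B cannot process the data until Processor A has written it, so B must wait, and that waiting is overhead no number of extra processors removes.

    #include <pthread.h>
    #include <stdio.h>

    /* B's "read data" depends on A's "write data": a condition variable makes
     * the dependency explicit; the time B spends waiting is pure overhead. */
    static int data_ready = 0;
    static int shared_data = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *processor_a(void *arg) {          /* producer */
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_data = 42;                          /* "write data" */
        data_ready = 1;
        pthread_cond_signal(&cond);                /* tell B the data is there */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *processor_b(void *arg) {          /* consumer */
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!data_ready)                        /* wait out the dependency */
            pthread_cond_wait(&cond, &lock);
        printf("B read %d\n", shared_data);        /* "read data" */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&b, NULL, processor_b, NULL);
        pthread_create(&a, NULL, processor_a, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }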