Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Curs1
ProgramareParalelasiDistribuita
Curs 1 - PPD - 2
Continutulcursului(realizatsipebazapehttp://grid.cs.gsu.edu/~tcpp/curriculum/?q=home)
Teoretic• Notiuniintroductive:arhitecturi,
concurenta,paralelism• Etapeindezvoltareaprogramelorparalele• Evaluareaperformanteiprogramelor
paralele• Modeledeprogramareparalela
– Diferentaintrecelebazatepememoriepartajatasimemoriedistribuita
• Patterns– Ptprogramareparalela– Ptprogramaredistribuita
Practic• Javathreads(lowlevelAPI)• C++(>=C++11)threads• High-levelAPI:pacheteJava->
java.util.concurrentpackages.• Javastreams• OpenMP(C++)• CUDA(C++)• MPI–MessagePassingInterface
– exemplificariC,C++
Curs 1 - PPD - 3
ProblemaConcurs
Alternativalaexamenulscris• Link:
https://drive.google.com/drive/folders/1qU77HrC-wG96x6EXtP2NeDs20hcpmecW?usp=sharing
• Nuesteobligatoriu!• Sepoateincercadarpentruvalidaretrebuierespectate
conditiile!
Curs 1 - PPD - 4
Bibliografie• IanFoster.DesigningandBuildingParallelPrograms,Addison-Wesley1995.• BernaL.Massingill,TimothyG.Mattson,andBeverlyA.Sanders,AddisonAPatternLanguagefor
ParallelProgramming.WesleySoftwarePatternsSeries,2004.• MichaelMcCool,ArchRobinson,JamesReinders,StructuredParallelProgramming:Patternsfor
EfficientComputation,”MorganKaufmann,,2012.• D.Culler,J.PalSingh,A.Gupta.ParallelComputerArchitecture:AHardware/SoftwareApproach.
MorganKaufmann.1998.• Grama,A.Gupta,G.Karypis,V.Kumar.IntroductiontoParallelComputing,AddisonWesley,2003.• D.Grigoras.CalcululParalel.Delasistemelaprogramareaaplicatiilor.ComputerLibrisAgora,2000.• V.Niculescu.CalculParalel.Proiectaresidezvoltareformalaaprogramelorparalele.PresaUniv.
Clujana,2006.• B.Wilkinson,M.Allen,ParallelProgrammingTechniquesandApplicationsUsingNetworked
WorkstationsandParallelComputers,PrenticeHall,2002• A.Williams.C++ConcurrencyinActionPRACTICALMULTITHREADING.ManningPublisher.2012.• TutorialeJava:http://docs.oracle.com/javase/tutorial/essential/concurrency/further.html• C++11http://en.cppreference.com/w/• OpenMP:http://openmp.org/• MPI:http://www.mpi-forum.org/
Curs 1 - PPD - 5
Evaluare
• Laborator(30%)>=5– Exercitiilaborator– ProgramefolosindJava/C++threads,OpenMP,MPI
• Seminar(10%)– Prezenta(7puncte<–7prezente)– Participareactiva[-1(insuficient),0(suficient),1(bine),2(foartebine)]
• Testpractic(multithreading)– 05%• ProiectCUDA-10%• Examen>=5
– Scris– sesiune40%
• Informatiicurs– http://www.cs.ubbcluj.ro/~vniculescu/didactic/PPD/CursPPD.html
6Curs 1 - PPD -
ProcesareParalela
• Uncalculatorparalelesteuncalculator(sistem)carefolosestemultipleelementedeprocesaresimultanaintr-omanieracooperativapentruarezolvaoproblemacomputationala.
• ProcesareaParalelaincludetehnicisitehnologiicarefacposibilcalcululinparalel– Hardware,retele,SO,biblioteci,limbaje,compilatoare,algoritmi…
• Paralelismulestenatural.
• PERFORMANTA– Parallelismisverymuchaboutperformance!
7Curs 1 - PPD -
Curs 1 - PPD - 8
CalculSerialvs.Paralel(imagesfromIntroductiontoParallelComputingBlaiseBarney)
“Itwouldappearthatwehavereachedthelimitsofwhatitispossibletoachievewithcomputertechnology,althoughoneshouldbecarefulwithsuchstatements,astheytendtosoundprettysillyin5years.“(JohnvonNeumann,1949)
Curs 1 - PPD - 9
Curs 1 - PPD - 10
Limitealeprogramariiseriale• Vitezadetransmisie–
• Vitezaluminii(30cm/nanosecond),limitadetransmisiepefirdecupru(9cm/nanosecond).
• Limitareaminiaturizarii–numardetrazistoripechip.– LegealuiMoore:
număruldetranzistoricarepotfiplasatipeunsingurcircuitintegrat(persquareinchchip)sedubleazalafiecare2ani.
– Darimpunecosturimari.
• Limitarieconomice
Istoric
• CrestereaperformanteiprocesorprincrestereafrecventeiceasuluiCPU(CPUclockfrequency)– RidingMoore’slaw
• Probleme:incalzireaputernicaachipurilor!– Frecventaceasmaimare⇒consumelectricmaimare(Pentium4heatsink¦FryinganeggonaPentium4)
• Solutie–adugaremaimultorcore-uriptaajungelaperformantadorita– Sepastreazafrecventadeceaslafelsauchiarmicsorare– nucresteconsumul.
Curs 1 - PPD - 11
Nivelurideparalelism
1.paralelismlaniveldejob:-intrejoburi;-intrefazealejoburilor;2.paralelismlaniveldeprogram:-intrepărţialeprogramului;-inanumitecicluri;3.paralelismlaniveldeinstrucţiune:
-intrediferitefazedeexecuţiealeuneiinstrucţiuni;4.paralelismlanivelaritmeticşilaniveldebit:- intreelementealeuneioperaţiivectoriale;- intrecircuitelelogiciiaritmetice.
Curs 1 - PPD - 12
• Arhitecturilecurentesebazeazatotmaimultpeparalelismlanivelhardwarepentruaimbunatatiperformanta:
– Multipleexecutionunits– Pipelinedinstructions– Multi-core
Curs 1 - PPD - 13
Paralelism<->Concurenta
• Considerammaimultetaskuricaretrebuieexecutatepeuncalculator• Taskurileseconsideraafipurparaleledaca:
– Sepotexecutainacelasitimp(parallelexecution)• Dependente->executieconcurenta:
– Untaskarenevoiederezultatelealtora;– Untasktrebuiesaseexecutedupaceoanumitaconditieeindeplinita– Maimultetaskuriincearcasafoloseascaaceeasiresursa
=>Formedesincronizaretrebuiefolositepentruasatisfaceconditiile/dependentele
• Concurentaestefundamentalaincomputerscience– Sistemedeoperare,bazededate,networking,…
14Curs 1 - PPD -
Paralelismvs.Concurenta
15 Curs 1 - PPD -
Obs:– Sepotfolosithreadurisauproceseinambelecazuri– Dacauncalculparalelnecesitaacceslaresursecomuneatunci
estenevoiesasegestionezecorectconcurenta=>Paralelismulpoateimplicaconcurenta
Paralelism:Sefolosescmaimulteresursepentruarezolvaoproblemamairapid
resources
Concurenta:Gestiuneacorectasieficientaaaccesuluilaresursecomune
requestswork
resource
ConcurentasiParalelism
• Concurentvs.paralel• ExecutieParalela:
– Taskurileseexecutaefectivinacelasitimp;
– Estenecesaraexistentademultipleresursedecalcul
• Paralelism=concurenta+hardware“paralel”
16Curs 1 - PPD -
Paralelism
• Existamaimultenivelurideparalelism:– Procese,threads,routine,instructiuni,…
• Trebuiesafiesuportatederesurselehardware– Procesoare,nuclee(cores),…(executiainstructiunilor)– Memorii,DMA,retele,…(operatiiasociate)
17Curs 1 - PPD -
oabordaresimplista
Curs 1 - PPD - 18
http://www.java-programming.info/tutorial/pdf/java/11-Java-Multithreaded-Programming.pdf
Decesafolosimprogramareparalela?
• Motiveprimare:– Timpdecalculmairapid(responsetime)– Rezolvareaproblemelor‘mari’decalcul(intimprezonabildecalcul)
• Motivesecundare:– Folosireaefectivaaresurselordecalcul– Costurireduse– Reducereaconstrangerilorasociatememoriei– Limitarilemasinilorseriale
• Paralelism=concurenta+hardware‘paralel’+performanta
19Curs 1 - PPD -
Curs 1 - PPD - 20
• Rezolvareaproblemelordificile,mari:– "GrandChallenge"(en.wikipedia.org/wiki/Grand_Challenge)problemsrequiring
PetaFLOPSandPetaBytesofcomputingresources.– Websearchengines/databasesprocessingmillionsoftransactionspersecond
• Folosirearesurselornon-locale:
– SETI@home(setiathome.berkeley.edu)usesover330,000computersforacomputepowerover528TeraFLOPS(asofAugust04,2008)
– Folding@home(folding.stanford.edu)usesover340,000computersforacomputepowerof4.2PetaFLOPS(asofNovember4,2008)
Directiiinprocesareaparalela
• Arhitecturiparalele– NecesitatiHardware– Computersystemdesign
• Sistemedeoperare(Paralelism/concurenta)• Gestionareaaspectelorsistempentruuncalculatorparalel• Programareparalela
– Biblioteci(low-level,high-level)– Limbaje– Mediidedezvoltare– Software
• AlgoritmiParaleli• Evaluareaperformanteiprogramelorparalele• Testareavs.asigurareacorectitudinii• Paralleltools:
– Performanta,analize,vizualizare,…
21Curs 1 - PPD -
Decesastudiemprogramareparalela?
• Arhitecturidecalcul– Inovatiileconduclanoimodeledeprogramare
• Convergentatehnologica– “killermicro”estepestetot– Laptop-urilesisupercomputeresuntfundamentalsimilare– Trend-urileactualeconduclaconvergentaabordarilordiverse
• Trenduriletehnologicefaccalcululparalelinevitabil– Multi-coreprocessors!– Acumoricesistemdecalculesteparalel
• Intelegereaprincipiilorfundamentale!!!– Programare,comunicatii,memorie,…– Performanta
• “Parallelismisthefutureofcomputing”-BlaiseBarney– M.Andrews,J.S.Walicki.“Concurrencyandparallelism—futureofcomputing”in
ProceedingofACM'85Proceedingsofthe1985ACMannualconferenceonTherangeofcomputing:mid-80'sperspective.pp.224-231.
22Curs 1 - PPD -
InevitabilitateaProcesariiParalele
• Cerinteleptaplicatii– Necesitateauriasadecicluridecalcul
• Trenduritehnologice– Procesaresimemorie
• TrenduriArchitecturale• Factorieconomici• Treduriactuale:
– Today’smicroprocessorshavemultiprocessorsupport– Serversandworkstationsavailableasmultiprocessors– Tomorrow’smicroprocessorsaremultiprocessors– Multi-coreisheretostayand#cores/processorisgrowing– Accelerators(GPUs,gamingsystems)
23Curs 1 - PPD -
exemple…
• ProcesorAMDRyzenThreadripper1950X~1300USD(2017)– MemorieCache40MB– Frecventaprocesor(MHz)3500– TurboBoostpanala 4000MHz– Numarnuclee 16Nuclee=>32Threads
• Intel®Xeon®ProcessorE7v4Family~8000USD(2017)
– #ofCores=24– #ofThreads=48– ProcessorBaseFrequency=2.40GHz– MaxTurboFrequency=3.40GHz– Cache=60MB
Curs 1 - PPD - 24
UBBCLUSTER–IBMIntelligentCluster• Hybridarchitecture
– HPCsystem+– privatecloud
PPD-curs2 25
HPC – IBM NextScale
• Rpeak 62 Tflops, Rmax 40 Tflops • 68 noduri NX360 M5, din care
– 12 nodes with 2 GPU Nvidia K40X, – 6 nodes with Intel Phi
• 2 processors E5-2660 v3 with 10Cores per node • 128 GB RAM per node, 2 HDD SATA de 500 Gb / node • Subscription rate 1:1 between nodes based on Switch: IB Melanox SX6512
with 216 ports • Storage NetApp E5660, 120 HDD SAS cu 600 Gb/Hdd => total 72Tb
– IBM GPFS 4.x -parallel file system
• IBM TS3100 Tape library for data archivation • Operating systems on each node : RedHat Linux 6 with subscription • Management Software: IBM Platform HPC 4.2
PPD-curs2 26
Private Cloud – IBM Flex System
• 10 virtualization servers Flex System x240 – 128 Gb RAM / server – Procesoare 2 x Intel Xeon E5-2640 v2 / server – 2 x SSD SATA 240 Gb / server
• 1 management server • Software for private cloud: IBM cloud manager with OpenStack 4.2 • Software for monitorizing and management: IBM Flex System Manager
software stack • Virtualization software: Vmware vSphere Enterprise 5.1
PPD-curs2 27
Mini cluster – IBM nestscale
~90cores
• 4nodesx2processorsX10cores• 1Managementnode• IP:193.226.40.133
• NOTA– Rmax-MaximalLINPACK(benchmark)performanceachieved– Rpeak-Theoreticalpeakperformance.
• NodeperformanceinGFlops=(CPUspeedinGHz)x(numberofCPUcores)x(CPUinstructionpercycle)x(numberofCPUspernode)
28PPD-curs2
Aplicatiiinformaticeperformante
• Performantaaplicatiilorimpunehardwareperformant(rapid,resursemultiple,etc)
• Hardware-ulavansatgenereazanoiaplicatii• Noileaplicatiiaucererideperformantamaimari
– Crestereexponentialaaperformanteimicroprocesoarelor– Inovatiiinarhitecturileparalelesiinintegrare
• Cerintedeperformanta=>– Performantasistemelortrebuiesaseimbunatateascainansamblu– Schimbari/abordari/reevaluariinSoftwareengineering– Costuri-technologieavansata
applicationsperformance
hardware
29Curs 1 - PPD -
Curs 1 - PPD - 30
Programareparalelavs.Programaredistribuita
INTERCONNECTION NETWORK
P2
P3
P1
P4 P5
Pn . . . .
Curs 1 - PPD - 31
TiPURI DE MULTIPROCESARE PARALLEL DISTRIBUTED
ASPECTE TEHNICE • PARALLEL COMPUTERS (- IN MOD UZUAL ) LUCREAZA BAZAT PE
• CUPLARE STRANSA, • in general bazate pe SINCRONICITATE, • CU UN SISTEM DE COMUNICATIE FOARTE RAPID SI FIABIL • Spatiu unic de adresare (intr-o masura mare)
• DISTRIBUTED COMPUTERS • MAI INDEPENDENTE, • COMUNICATIE MAI PUTIN FRECVENTA SI mai putin RAPIDA (ASINCRONA) • COOPERARE LIMITATA • NU EXISTA CEAS GLOBAL • “Independent failures”
SCOPURI
• PARALLEL COMPUTERS COOPEREAZA PENTRU A REZOLVA MAI EFICIENT PROBLEME DIFICILE
• DISTRIBUTED COMPUTERS AU SCOPURI INDIVIDUALE SI ACTIVITATI PRIVATE. DOAR UNEORI INTERCOMUNICAREA ESTE NECESARA
PARALLEL COMPUTERS: COOPERARE IN SENS “POSITIV”
DISTRIBUTED COMPUTERS: COOPERARE IN SENS “NEGATIV” -- DOAR ATUNCI CAND ESTE NECESARA
Curs 1 - PPD - 32
Aplicatii paralele Suntem interesati sa rezolvam problemele mai rapid in paralel
Aplicatii distribuite
Suntem interesati sa rezolvam anumite probleme specifice :
• COMMUNICATION SERVICES ROUTING BROADCASTING
• MAINTENANCE OF CONTROL STUCTURE TOPOLOGY UPDATE LEADER ELECTION • RESOURCE CONTROL ACTIVITIES LOAD BALANCING MANAGING GLOBAL DIRECTORIES
Ingeneral…
Curs 1 - PPD - 33
Parallelv.s.DistributedSystems(fromM.FUKUDACSS434SystemModels)
Parallel Systems Distributed Systems
Memory Tightly coupled shared memory UMA, NUMA
Distributed memory Message passing, RPC, and/or used of distributed shared memory
Control Global clock control SIMD, MIMD
No global clock control Synchronization algorithms needed
Processor interconnection
Order of Tbps Bus, mesh, tree, mesh of tree, and hypercube (-related) network
Order of Gbps Ethernet(bus), token ring and SCI (ring), myrinet(switching network)
Main focus Performance Ex. - Scientific computing
Performance(cost and scalability) Reliability/availability Information/resource sharing
Unpunctdevedere…
Curs 1 - PPD - 34
SistemeleDistribuite
-potfifolositepentru-
• Aplicatiidistribuiteimplicit– BDDistribuite,rezervaribilieteavion/etc.sistembancar
• Informatiipartajateintreuseri• Partajareresurse• Raportcost/performantamaibunptaplicatiiparalele
– Potfifolositeeficientpt.aplicatiicugranularitatemare(coarse-grained)si/sauptaplicatiiparaleledetipembarrassinglyparallelapplications
• Fiabilitate(Reliability).• Scalabilitate
– Cuplareslaba(Looselycoupledconnection);hotplug-in
• Flexibilitate– Reconfiguraresistemptaintrunicerintele
Curs 1 - PPD - 35
Performanta/Scalabilitate
Spredeosebiredesistemeleparaleleceledistribuiteimplica:- mediumaiputinrapiddetransferaldatelor(reteamaiputinrapida)- HeterogenitateSolutii:-Procesarebatchamesajelor:
• SeevitainterventiaSOptfiecaretransferdemesaj.– Cachedata
• Seevitarepetareatransferuluiaceleiasidate– Evitateaentitatilorsiaalgoritmilorcentralizati
• Evitareasaturariiretelei– Realizareoperatii“post”lanivelulclientului
• Evitareatraficuluiintensintreclientisiservere
– ….
Curs 1 - PPD - 36
Securitate
• Nuexistadoarunsingurpunctdecontrol
• Probleme:– Mesaje,furate,modificate,copiate,…
• Solutie:folosireCriptografie– Failures
• FaultTolerancesolutions
Tipuridesistemedistribuite(Munehiro Fukuda – PDP Fundamentals)
• Modele– Minicomputer– Workstation– Workstation-server– Processor-pool– Cluster– Gridcomputing
Curs 1 - PPD - 37
38
ModelulMinicomputer
• ExtensieasistemuluideTimesharing– Useriitrebuiesaseloghezepepropriulminicomputer.– Apoiselogheazalaaltamasina(remotemachine)prinprogramede
tiptelnet.• Partajareresurse
– DB– High-performancedevices
Mini-computer
Mini-computer
Mini-computer
net
Curs 1 - PPD -
Curs 1 - PPD - 39
ModelulWorkstation
• MigrareProcese– Useriiselogheazamaiintaipestatiadelucrupersonala;– Dacaexistastatii“inasteptare”–unjob“mare”poatemigrala
unadintreele.• Probleme:
– Cumseidentificastatiile“inasteptare”(idle)?– Cummigreazaunjob?– Ceseintampladacaunaltuserselogheazapemasinafolosita?
100MbpsLAN
Workstation
Workstation Workstation
WorkstationWorkstation
Curs 1 - PPD - 40
ModelulWorkstation-Server
• StatiiClient– AplicatiileGrafice/interactiveseproceseazalocal– Ptaltecereridecalculsetrimitcererilaservere.
• Servere(minicomputers)– Fiecareserverestededicatunuiasaumaimultor
tipurideservicii.• Modeldecomunicare:Client-Servermodel
– RPC(RemoteProcedureCall)– RMI(RemoteMethodInvocation)
• Unprocesclientcheamaofunctieaprocesuluiserver.
• Nusefacemigraredeprocese• Examplu:NFS
100GbpsLAN
Workstation
Workstation Workstation
Mini-Computerfileserver
Mini-Computerhttpserver
Mini-Computercycleserver
Curs 1 - PPD - 41
ModelulProcessor-Pool
• Clientii:– Selogheazalaunterminal– Toateserviciilesuntgestionatede
catreservere.• Servere:
– Ptfiecareusersealocanrnecesardeprocesoaredinpool.
• Utilizarebunadarinteractivitateslaba.
Server1
100MbpsLAN
ServerN
Curs 1 - PPD - 42
ClusterModel
• ConstainmaimultePC/workstationsconectatelaoreteadetiphigh-speed.
• Focuspeperformanta.100Mbps
LAN
Workstation
Workstation Workstation
Masternode
Slave1
SlaveN
Slave2
1GbpsSAN
httpserver1
httpserver2
httpserverN
Curs 1 - PPD - 43
High-speedInformationhighway
GridComputing
• Scop– Colectareaputeridecalculamaimultor
supercomputeresiclusteredispersategeografic
• DistributedSupercomputing– Ptproblemwfoartemari.Dificile.
(CPUintensive,memoryintensive).• High-ThroughputComputing
– Folosireamultorresursecarenusuntfolosite
• On-DemandComputing– Resurseladistantaintegrateincalculullocal
• Data-intensiveComputing– distributeddata
• CollaborativeComputing– Suportptcomunicareintremaimulteparti
Super-computer
Cluster
Super-computer Cluster
Mini-computer
Workstation
Workstation Workstation