2005 ©Erik F. Dirkx Limits of Parallel/Distributed Computing Prof. Erik DIRKX VUB-INFO-PADS Erik.Dirkx@vub.ac.be http://parallel.vub.ac.be


Limits of Parallel/Distributed Computing

Prof. Erik DIRKX, VUB-INFO-PADS

Erik.Dirkx@vub.ac.be

http://parallel.vub.ac.be


Introduction

• (Cluster) computers : a tool for a new way of doing science & engineering (cheap : BYO !!!)

• “Hardware”

• “Software”


Need for Speed

• Processing :

signal : structured (e.g. MP3)

dynamic : unstructured

• Data : pictures, movie, simulation, …

• Interconnect :

bandwidth >< latency


Scalable computing : COW

• Cluster Of Workstations (TX : RanchOW)

Fundamental Observation : (Erik’s law)

cost : $\lim_{n \to \infty} \$(n\text{-th copy of chip}) = 0$

profit : $(\text{price} - \text{cost}) \cdot n = ??$

(>< marginal production cost = 0) (remember : 20%+ of earth = Si …)

• Only general purpose programmable devices will survive in the long term, yet …

“programmable” = ??


The original cluster (neo-cortex)

• 10**11 general purpose neurons

=> compute & memory = “gray” matter

• 10**5 connections / neuron

=> interconnect = “white” matter

• Switching time > 1 ms (digital PPM)

• Input ~100 Mbps (pre-thalamus)

• Output << Input

• Storage : ~10**17 bits (do not drink & think …)

• ~20 W, electro-chemical, carbohydrate powered


General Purpose (neo)Cortex

• General purpose “cellular columns” (e.g. blind musician)

• 6 layers : 1 in, 1 out, 4 compute

• 4 A4 pages, constant density

• Tuned by “emotional” subsystem : real time, pre-emptive priorities

• Hierarchy root = “prefrontal cortex” (L=+, R=-)


Comparison

• Human (general purpose)

speed - - [12km/h]

endurance - - [42.195 km]

power - - [200 W @ 120 km]

force - - [52*13 ???]

accuracy - -

(re?)-configurability +++

=> Learning (Software ?)

• Other predator (special purpose)

speed ++ (e.g. cheetah)

endurance ++ (e.g. orca)

power ++ (e.g. hyena)

force ++ (e.g. shark)

accuracy ++ (e.g. eagle)

++ @ price of general purposeness

=> Genetics (Hardware?)


Fundamental Bound (to Your enthusiasm ?) (physical)

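Speedup : $S = \frac{T_1}{T_n}$, bounded by $S_{max}$

[Equations : two bounds with constants $k_1$ and $k_2$ that depend on cost ($), technology and problem; one grows as $n$, the other as $n \cdot \lg(n)$]

yet

[Plot : $S$ versus $n$, with regions $\frac{dS}{dn} > 0$, $\frac{dS}{dn} = 0$ and $\frac{dS}{dn} < 0$]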
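The rise, plateau and fall of the speedup curve can be illustrated with a toy model. This is a minimal sketch, assuming a hypothetical cost model T_n = T_1/n + c*lg(n) that does not come from the slides; the function names and the constants are illustrative only.

```python
import math

def parallel_time(t1, n, c):
    """Toy model (an assumption, not from the slides): perfectly divisible
    work t1/n plus a communication overhead of c*lg(n)."""
    return t1 / n + c * math.log2(n)

def speedup(t1, n, c):
    """Speedup S = T_1 / T_n."""
    return t1 / parallel_time(t1, n, c)

def best_n(t1, c, n_max):
    """Processor count in 1..n_max with the highest speedup (where dS/dn ~ 0)."""
    return max(range(1, n_max + 1), key=lambda n: speedup(t1, n, c))

if __name__ == "__main__":
    t1, c = 1000.0, 1.0  # hypothetical time units
    n_star = best_n(t1, c, 4096)
    for n in (1, 16, n_star, 4096):
        print(n, round(speedup(t1, n, c), 1))
```

In this model, adding processors beyond `n_star` makes the run slower, because the overhead term grows while the compute term has almost nothing left to give.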


• Critical Parameter : Granularity

$G = \frac{T_{Compute}}{T_{Communicate}}$

informally : Gray > (Compute cap.) / White < (Communication cap.)

=> Hard <> Easy Problems


Granularity (II)

• Experience : situation, optimum :

too coarse => sub-optimal : !

too fine => comm bottleneck : !

• Tcomp = #instr * CPI * 1/f = Rproblem * Rmachine

• Tcomm = latency + #bits/bandwidth ?=? Cproblem * Cmachine

• Cproblem = #databits

• Cmachine = …
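The two time formulas can be put into a few lines of code to see where a given workload sits between "too coarse" and "too fine". This is a minimal sketch that takes granularity as the ratio of compute time to communication time; the function names and all numbers are hypothetical.

```python
def t_comp(instructions, cpi, freq_hz):
    """Tcomp = #instr * CPI * 1/f, in seconds."""
    return instructions * cpi / freq_hz

def t_comm(latency_s, bits, bandwidth_bps):
    """Tcomm = latency + #bits/bandwidth, in seconds."""
    return latency_s + bits / bandwidth_bps

def granularity(tcomp, tcomm):
    """G = Tcomp / Tcomm; large G = coarse grain, small G = fine grain."""
    return tcomp / tcomm

if __name__ == "__main__":
    # Hypothetical numbers: 1e6 instructions, CPI 1, 1 GHz CPU;
    # 1e4 bits over a 1 Gb/s link with 10 us latency.
    tc = t_comp(1e6, 1.0, 1e9)
    tm = t_comm(10e-6, 1e4, 1e9)
    print(f"Tcomp={tc:.2e} s  Tcomm={tm:.2e} s  G={granularity(tc, tm):.1f}")
```

Shrinking the work per message (fewer instructions between communications) lowers G until the latency term dominates, which is the "comm bottleneck" case.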


Granularity (III)

• Cmachine = $\frac{latency}{\#bits} + \frac{1}{bandwidth}$

• bandwidth ~ 10**12 b/s => 1/bandwidth ~ 1 ps

• latency : 10 GHz = 0.1 ns = 3 cm (vacuum); 3 mm (Si) => 1 ps ~ 30 µm
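Dividing Tcomm = latency + #bits/bandwidth by Cproblem = #databits gives a per-bit machine cost, and the distance figures follow from the speed of light. The sketch below checks that arithmetic; the 1 ns latency and the function names are assumptions for illustration, and the factor 0.1 is simply what the slide's on-chip figure (3 mm per 0.1 ns) implies relative to vacuum.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def c_machine(latency_s, bits, bandwidth_bps):
    """Cmachine = latency/#bits + 1/bandwidth, in seconds per bit."""
    return latency_s / bits + 1.0 / bandwidth_bps

def distance_m(time_s, speed_fraction=1.0):
    """Distance a signal covers in time_s at speed_fraction of c."""
    return C_VACUUM * speed_fraction * time_s

if __name__ == "__main__":
    period = 1.0 / 10e9  # one 10 GHz clock period = 0.1 ns
    print(f"{distance_m(period) * 100:.1f} cm per cycle in vacuum")
    print(f"{distance_m(1e-12, 0.1) * 1e6:.0f} um per ps at 0.1 c")
    # Hypothetical 1 ns latency on a 10**12 b/s link:
    # for one bit the latency term dominates, for many bits 1/bandwidth does.
    for bits in (1, 1_000_000):
        print(bits, c_machine(1e-9, bits, 1e12))
```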


Granularity (IV)

• Amdahl sections (i.e. bottlenecks, rest = “easy” parallelism) : #bits ~ 1 !!

• ?? How to construct a “compiler”/computer system with dynamically tunable machine granularity to adapt to dynamically varying demands on R and C from application(s)

• Structured ?! [ad hoc] / Unstructured ??
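Why the Amdahl sections matter can be seen from Amdahl's law itself, S(n) = 1 / (s + (1 - s)/n), where s is the fraction of run time that cannot be parallelised. A minimal sketch; the 5% serial fraction is a hypothetical value, not a figure from the slides.

```python
def amdahl_speedup(serial_fraction, n):
    """Amdahl's law: S(n) = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

if __name__ == "__main__":
    s = 0.05  # hypothetical: 5% of the run time is an Amdahl section
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(s, n), 2))
    print("limit for n -> infinity:", 1.0 / s)
```

However many processors are added, the speedup stays below 1/s, so the bottleneck sections set the ceiling.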


Fine Grain Parallelism

• FPGA implementation (NOT automatic)

• ATM switch sim @ faster than real time …

• Speed-up = traffic pattern dependent


Conclusion

• Cluster computing is here to stay

• Cluster computing is a vehicle for a new way of doing science & engineering (for the masses)

• COW is only one example of compute engines satisfying fundamental laws

• (Digital) hardware : understood & economically sound

• “Software” : cf. 1950’s : ad-hoc, need for language(s), theoretical support, run-time, fault tolerance, …

• VUB (INFO) : “Advanced Computer Architecture” + “Concurrente Systemen”(NL)/”Parallel Systems”(E)

• http://parallel.vub.ac.be


General Purpose “Computer”

• Amplifying elements

transistor (n*1000 atoms + quantum mechanics)

• Connecting Elements

wire/fibre/wireless (Maxwell equations)


Cluster : BYO 101

• Step 1 : design Your “compute element” : e.g. look-up table, ALU, …

and build a BIG factory

• Step 2 : design Your “memory element” : e.g. a capacitor, MRAM, …

and build a BIG factory

• Step 3 : design Your “switch” : e.g. cross-bar

and build a BIG factory


Generic Multiprocessor

Processors/Memory

Interconnect (1)

Front-end Processors/Memory

Interconnect (2)

VUB/Internet

Interconnect 1 : High Bandwidth, Low Latency, DL-free (!!)

Interconnect 2 : OTS TCP/IP


Cluster BYO (1+2)

• Step 1 // Step 2

PC based COW :

B(uild) Y(our) O(wn)

Motherboards : BI- or QUAD CPU ?!

CPU, Memory : OTS

Disk : RAID5

Hierarchical Control (remember pre-frontal cortex…)

Bottleneck !!!!


Cluster BYO (3)

• Step 3

Buy a few km of wire/fibre

Buy switches

=> compute switches (PVM/MPI)

=> diagnostic out-of-band (TCP/IP)

=> KVM switches


VUB INFO : BYO (1995)

• Design & Test

! Experience

! Students

• Blue Gene (2004)

256000 OTS CPUs

4 GB/CPU DRAM

10 Gbps/node

Power : ??

MTBF : ??

job run-time : months …


Other Examples

• Field Programmable Gate Array

Satisfies description, Fundamental Limits !

“Program”/(Re)configure ??

• Hybrid : COW cluster + accelerator in each node, e.g. Deep Blue : 32 * (1 + 8)

=> Variable Granularity … (someone interested in an interesting PhD topic ??)

=>VUB / Erasmus hogeschool


Lookahead Accumulation in Discrete Event Simulation

• Improve Gproblem through compile time aggregation

• A-synchronous synchronization system (!)


System Software

• Compute => Sequential Languages (?? Non-determinism, synchronization)

• Storage => Virtual Memory, RAID, …

• Communication => Communication Library

e.g. “Parallel Virtual Machine” : Open

e.g. “Message Passing Interface” : Standard

• Fundamental Issue :

Parallel Operating System n*Linux + MPI …

(21st century Microsoft/Intel ?)


Application Software

• BYO : course “Parallel Systems”, VUB

• Public Domain Packages

=> granularity !!

Numerical, well structured

non-Numerical, dynamic, ill structured

databases (e.g. Google)


History

• 1985 : B Army

mainframe + staff : <100 Kops, x00 MB, n*4800 bps

PC to “fine tune” + 1 temporary mil. service : >1 Mips, 20 MB, 10 Mbps

+ 1 EE/CS student in search of a PhD topic

• 1990 : “A Parallel Simulation Testbed for Computer Networks” : solved 0.1, posed 10 questions …

• 1992 : IBM T.J. Watson Vulcan/Deep Blue

• 1993 : ETL, Tsukuba : Heterogeneous granularity

• 1999 : Xilinx, San Jose : Reconfiguration


Vrije Universiteit Brussel : location

• Belgium : an EU experiment avant la lettre ??

Holland (A’dam) France (Paris)

• ° [Alamo – 6]

• 3 languages (NL, F, D)

• 5 governments (w/o county, city !)

• NO supercomputer

• (Meta) stable ??

• ~ Free education

• 60 km coast / 10 M people, 1 2L highway

• No capital gains tx …

• L&H … (Martha ??)

• Airforce : F16 – ECM

• CEC location & 1 of the capitals …


Alternative

• Special purpose device

=> temporary

=> point solution

??? $$$$ [design, debug, …]

??? Dynamic environment

!? Power (cf. context)

?! Patent (cf. EU software patent dispute …)


General Purpose Hardware

• P(rocessor)

=> ALU (compute) + CU (control)

• M(emory)

=> as much as possible

=> as fast as possible

• S(witch)

=> throughput (telecom !)

=> latency (telecom ?)