Transcript

Simulation on ReconfiguraSimulation on ReconfiguraSimulation on ReconfiguraDipl Inf Alexander Schöll Prof Dr Hans JoDipl.-Inf. Alexander Schöll, Prof. Dr. Hans-JopI tit t f C t A hit t d C tInstitute of Computer Architecture and Computp p

MotivationMotivationHeterogeneous computer architectures are integrated into single chipsHeterogeneous computer architectures are integrated into single chips

Heterogeneous ArchitecturesHeterogeneous Architectures

Latency‐optimized Throughput‐optimized Reconfigurabley p g p p g

GPU FPGACPU GPUGraphics

FPGAField

CPUCentral Graphics

ProcessingField 

ProgrammableCentral 

Processing ProcessingUnit

ProgrammableGate Array

Processing Unit y

CPU GPU CPU FPGACPU GPUCPU GPU FPGACPU GPU CPU FPGACPU GPUCPU GPU FPGA

x86‐64ARM

Intel MIC AMD PiledriverI t l H ll

AMD RXN idi K l

Xilinx Zynq Xilinx VirtexAlt St tiARM

MIPSIntel HaswellNvidia Maxwell

Nvidia Kepler Altera StratixMIPSSPARC

Nvidia MaxwellSPARC

Overview of current heterogeneous architecturesOverview of current heterogeneous architectures

Upcoming: Reconfigurable Heterogeneous Computer ArchitecturesUpcoming: Reconfigurable Heterogeneous Computer Architectures

GoalGoal

Development of new methods that enable the direct mapping of− Development of new methods that enable the direct mapping of simulation applications to innovative reconfigurable heterogeneoussimulation applications to innovative reconfigurable heterogeneous computer architecturesp

Current CollaborationsCurrent CollaborationsExpansion of the Pro-Apoptotic Mapping of DomaExpansion of the Pro-Apoptotic Mapping of Doma

Receptor Clustering Model the Nonlinear WReceptor Clustering Model the Nonlinear WH tCooperation with M. Daub • G. Schneider Heterogeneoug

C ti ithInclusion of the extracellular space Cooperation withpProblem: Solve theSimulation of random impact of Ligands on Problem: Solve the Simulation of random impact of Ligands on

th ll b on large domains wthe cell membrane on large domains wnodes

Simulation on multiple time scales for optimal nodes

p pbalance between biological process u u=balance between biological process tt xxu u= −resolution and computation costsresolution and computation costs

Mapping of equationMapping of particle simulation to

Mapping of equationMapping of particle simulation to heterogeneous archheterogeneous architectures

heterogeneous archheterogeneous architectures

SimTech Phase I: Simulation Applications andSimTech Phase I: Simulation Applications andppParallel Simulation of Pro-Apoptotic Acceleration of MParallel Simulation of Pro-Apoptotic Acceleration of M

Receptor Clustering on GPGPU Carlo MoleculaReceptor Clustering on GPGPUM C A hit t [1]

Carlo MoleculaH b id C tMany-Core Architectures[1] Hybrid Computy y p

C. Braun • M. Daub • A. Schöll • G. Schneider • H.-J. Wunderlich C. Braun • S. Holst • J. Cas

Apoptosis: Structured decomposition of Markov-Chain MontApoptosis: Structured decomposition of d d i f t d ll i lti ll l

Markov Chain Monti l ti thdamaged or infected cells in multi-cellular simulations are the

organisms thermodynamicsorganisms thermodynamics

Initiated by signal competent clusters of Problem: Inherent seInitiated by signal competent clusters of Problem: Inherent setumor necrosis factor receptors (TNFR) and this method very hartumor necrosis factor receptors (TNFR) and this method very harcorresponding TNF Ligands on the cell Typically coarse grap g gmembrane

Typically coarse-gramembrane. distribution of simulaP ti l i l ti h b d t

distribution of simulak t ti idParticle simulation has been mapped to or workstation grids pp

GPGPU many core architectureg

GPGPU many-core architecture. MCMC molecular simR d ti f i l ti ti f d t GPGPUReduction of simulation times from mapped to GPGPUsmonths to hours enables practical energy calculationsmonths to hours enables practical energy calculations application of the model in research evaluation of Monte-application of the model in research (S d 300 )

evaluation of Monte(Speedup: 300x) Speedups of up to( p p ) Speedups of up to

[1] C Braun M Daub A Schöll G Schneider and H J Wunderlich "Parallel Sim[1] C. Braun, M. Daub, A. Schöll, G. Schneider, and H.-J. Wunderlich, Parallel Simi P IEEE I t ti l C f Bi i f ti d Bi di i BIBin Proc. IEEE International Conference on Bioinformatics and Biomedicine, BIB

[2] C Braun S Holst J Castillo J Groß and H J Wunderlich "Acceleration of[2] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, Acceleration ofi P 30th IEEE I t ti l C f C t D i ICCD'12 Min Proc. 30th IEEE International Conference on Computer Design, ICCD'12, Mo

able Heterogeneous Architecturesable Heterogeneous Architecturesable Heterogeneous Architecturesoachim Wunderlichoachim Wunderlich

E i i U i it f St tt t Ger Engineering, University of Stuttgart, Germanyg g, y g , y

ChallengesChallengesgMapping of simulation applicationsMapping of simulation applications

− Simulation applications contain algorithms from distinct problem classesSimulation applications contain algorithms from distinct problem classesDense

Spectral StructuredSparseLinear 

SpectralM h d

Structured Grids U t t d

pLinearAlgebra

MethodsN‐Body

Grids Unstructuredd

Linear Algebra

gAlgebra N BodyGrids

gAlgebra

Many different programming paradigms and languages− Many different programming paradigms and languages

Achieving optimal performanceAchieving optimal performance

P i l f d l i h d b id d− Particular aspects of underlying hardware structures must be consideredPerformance depends on the combination of implementation and utilized− Performance depends on the combination of implementation and utilized architecturearchitecture

CPU

GPUOptimal

GPUMapping?pp g

FPGA

Simulation Mapping Heterogeneous Reliability Applications

Mapping gArchitecturesReliability Applications Architectures

Simulation applications are executed for days and months− Simulation applications are executed for days and months− Fault-tolerant execution memories and communication requiredFault tolerant execution, memories and communication required

Current WorkCurrent WorkImplementation of specific compute modulesin Decomposition for Implementation of specific compute modules in Decomposition for for heterogeneous architecturesWave Equation to for heterogeneous architecturesWave Equation to

A hit t − Modules perform basic tasks in targetedus Architectures Modules perform basic tasks in targeted h M D b G S h id simulation applicationsh M. Daub • G. Schneider ppnonlinear wave equation − Implementation in distinctive realizations fornonlinear wave equation Implementation in distinctive realizations for

diff t t f tiwith a huge amount of different types of computing resourceswith a huge amount of

− Goal: Abstract and hide the actual execution f d l f th i l ti li ti

3d u n u+ of modules from the simulation applicationc cd u n u− +

n solver to reconfigurablen solver to reconfigurable CPU CPU CPU CPUFFT

hitecturesCPU CPU CPU CPUFFT

hitecturesMM

GPU GPU GPU GPUMapping

CFD

d Collaborations FPGA FPGA FPGA FPGA…d CollaborationsSimulation Modules Heterogeneous

Markov-Chain Monte- Applications ArchitectureMarkov-Chain Monte- pp

ar Simulations onDevelopment of resource management and

ar Simulations on t A hit t [2] Development of resource management and ter Architectures[2]

load-balancing mechanismsload balancing mechanismsstillo • J. Groß • H.-J. Wunderlich

− Investigation of efficient module organizationInvestigation of efficient module organization and execution strategieste-Carlo (MCMC) gte Carlo (MCMC)

f t k i − Goal: Optimization of simulation performancecore of many tasks in Goal: Optimization of simulation performance d d ti t h i t ditiand adaption to changing system conditions

during runtimeerial dependencies make during runtime erial dependencies make rd to parallelizerd to parallelize

GPUCPU …

ined parallelization andined parallelization and ation instances on clustersation instances on clusters i li d

FFT MM CFD

is appliedDevelopment of fault tolerance measures for

ppDevelopment of fault-tolerance measures for mulation has been modulesl iti ll l moduless, exploiting parallel

− Repeated or redundant execution of modulesand speculative − Repeated or redundant execution of modules and speculative and comparison of results-Carlo moves pCarlo moves

− Algorithm-based fault-tolerance400x Algorithm based fault tolerance 400x

mulation of Apoptotic Receptor Clustering on GPGPU Many Core Architectures"mulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures ,BM 2012 Phil d l hi USA 4 7 O t b 2012 155 160BM 2012, Philadelphia, USA, 4.-7. October, 2012, pp. 155-160.f Monte Carlo Molecular Simulations on Hybrid Computing Architectures"f Monte-Carlo Molecular Simulations on Hybrid Computing Architectures ,

t l C d 30 9 3 10 2012 207 212ontreal, Canada, 30.9.-3.10.2012, pp. 207-212.