1
Simulation on Reconfigura Simulation on Reconfigura Simulation on Reconfigura Dipl Inf Alexander Schöll Prof Dr Hans Jo Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Jo I tit t fC t A hit t dC t Institute of Computer Architecture and Comput Motivation Motivation Heterogeneous computer architectures are integrated into single chips Heterogeneous computer architectures are integrated into single chips Heterogeneous Architectures Heterogeneous Architectures Latencyoptimized Throughputoptimized Reconfigurable GPU FPGA CPU GPU Graphics FPGA Field CPU Central Graphics Processing Field Programmable Central Processing Processing Unit Programmable Gate Array Processing Unit CPU GPU CPU FPGA CPU GPU CPU GPU FPGA CPU GPU CPU FPGA CPU GPU CPU GPU FPGA x8664 ARM Intel MIC AMD Piledriver Itl H ll AMD RX N idi K l Xilinx Zynq Xilinx Virtex Alt St ti ARM MIPS Intel Haswell Nvidia Maxwell Nvidia Kepler Altera Stratix MIPS SPARC Nvidia Maxwell SPARC Overview of current heterogeneous architectures Overview of current heterogeneous architectures Upcoming: Reconfigurable Heterogeneous Computer Architectures Upcoming: Reconfigurable Heterogeneous Computer Architectures Goal Goal Development of new methods that enable the direct mapping of Development of new methods that enable the direct mapping of simulation applications to innovative reconfigurable heterogeneous simulation applications to innovative reconfigurable heterogeneous computer architectures Current Collaborations Current Collaborations Expansion of the Pro-Apoptotic Mapping of Doma Expansion of the Pro-Apoptotic Mapping of Doma Receptor Clustering Model the Nonlinear W Receptor Clustering Model the Nonlinear W Ht Cooperation with M. Daub • G. Schneider Heterogeneou C ti ith Inclusion of the extracellular space Cooperation with Problem: Solve the Simulation of random impact of Ligands on Problem: Solve the Simulation of random impact of Ligands on th ll b on large domains w the cell membrane on large domains w nodes Simulation on multiple time scales for optimal nodes balance between biological process u u = balance between biological process tt xx u u = resolution and computation costs resolution and computation costs Mapping of equation Mapping of particle simulation to Mapping of equation Mapping of particle simulation to heterogeneous arch heterogeneous architectures heterogeneous arch heterogeneous architectures SimTech Phase I: Simulation Applications and SimTech Phase I: Simulation Applications and Parallel Simulation of Pro-Apoptotic Acceleration of M Parallel Simulation of Pro-Apoptotic Acceleration of M Receptor Clustering on GPGPU Carlo Molecula Receptor Clustering on GPGPU M C A hit t [1] Carlo Molecula H b id C t Many-Core Architectures [1] Hybrid Comput C. Braun • M. Daub • A. Schöll • G. Schneider • H.-J. Wunderlich C. Braun S. Holst J. Cas Apoptosis: Structured decomposition of Markov-Chain Mont Apoptosis: Structured decomposition of d d if td ll i lti ll l Markov Chain Mont i l ti th damaged or infected cells in multi-cellular simulations are the organisms thermodynamics organisms thermodynamics Initiated by signal competent clusters of Problem: Inherent se Initiated by signal competent clusters of Problem: Inherent se tumor necrosis factor receptors (TNFR) and this method very har tumor necrosis factor receptors (TNFR) and this method very har corresponding TNF Ligands on the cell Typically coarse gra membrane Typically coarse-gra membrane. distribution of simula P ti l i l ti h b dt distribution of simula k t ti id Particle simulation has been mapped to or workstation grids GPGPU many core architecture GPGPU many-core architecture. MCMC molecular sim Rd ti f i l ti ti f dt GPGPU Reduction of simulation times from mapped to GPGPUs months to hours enables practical energy calculations months to hours enables practical energy calculations application of the model in research evaluation of Monte- application of the model in research (S d 300 ) evaluation of Monte (Speedup: 300x) Speedups of up to Speedups of up to [1] C Braun M Daub A Schöll G Schneider and H J Wunderlich "Parallel Sim [1] C. Braun, M. Daub, A. Schöll, G. Schneider, and H.-J. Wunderlich, Parallel Sim i P IEEE I t ti lC f Bi i f ti d Bi di i BIB in Proc. IEEE International Conference on Bioinformatics and Biomedicine, BIB [2] C Braun S Holst J Castillo J Groß and H J Wunderlich "Acceleration of [2] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, Acceleration of i P 30th IEEE I t ti lC f C t D i ICCD'12 M in Proc. 30th IEEE International Conference on Computer Design, ICCD'12, Mo able Heterogeneous Architectures able Heterogeneous Architectures able Heterogeneous Architectures oachim Wunderlich oachim Wunderlich E i i Ui it f St tt tG er Engineering, University of Stuttgart, Germany Challenges Challenges Mapping of simulation applications Mapping of simulation applications Simulation applications contain algorithms from distinct problem classes Simulation applications contain algorithms from distinct problem classes Dense Spectral Structured Sparse Linear Spectral M hd Structured Grids U t t d Linear Algebra Methods NBody Grids Unstructured d Linear Algebra Algebra N Body Grids Algebra Many different programming paradigms and languages Many different programming paradigms and languages Achieving optimal performance Achieving optimal performance P i l f d li h d b id d Particular aspects of underlying hardware structures must be considered Performance depends on the combination of implementation and utilized Performance depends on the combination of implementation and utilized architecture architecture CPU GPU Optimal GPU Mapping? FPGA Simulation Mapping Heterogeneous Reliability Applications Mapping Architectures Reliability Applications Architectures Simulation applications are executed for days and months Simulation applications are executed for days and months Fault-tolerant execution memories and communication required Fault tolerant execution, memories and communication required Current Work Current Work Implementation of specific compute modules in Decomposition for Implementation of specific compute modules in Decomposition for for heterogeneous architectures Wave Equation to for heterogeneous architectures Wave Equation to A hit t Modules perform basic tasks in targeted us Architectures Modules perform basic tasks in targeted hM D b GSh id simulation applications h M. Daub • G. Schneider nonlinear wave equation Implementation in distinctive realizations for nonlinear wave equation Implementation in distinctive realizations for diff tt f ti with a huge amount of different types of computing resources with a huge amount of Goal: Abstract and hide the actual execution f dl f th i l ti li ti 3 du nu + of modules from the simulation application c c du nu + n solver to reconfigurable n solver to reconfigurable CPU CPU CPU CPU FFT hitectures CPU CPU CPU CPU FFT hitectures MM GPU GPU GPU GPU Mapping CFD d Collaborations FPGA FPGA FPGA FPGA d Collaborations Simulation Modules Heterogeneous Markov-Chain Monte- Applications Architecture Markov-Chain Monte- ar Simulations on Development of resource management and ar Simulations on t A hit t [2] Development of resource management and ter Architectures [2] load-balancing mechanisms load balancing mechanisms stillo J. Groß H.-J. Wunderlich Investigation of efficient module organization Investigation of efficient module organization and execution strategies te-Carlo (MCMC) te Carlo (MCMC) f t k i Goal: Optimization of simulation performance core of many tasks in Goal: Optimization of simulation performance d d ti t h i t diti and adaption to changing system conditions during runtime erial dependencies make during runtime erial dependencies make rd to parallelize rd to parallelize GPU CPU ined parallelization and ined parallelization and ation instances on clusters ation instances on clusters i li d FFT MM CFD is applied Development of fault tolerance measures for Development of fault-tolerance measures for mulation has been modules l iti ll l modules s, exploiting parallel Repeated or redundant execution of modules and speculative Repeated or redundant execution of modules and speculative and comparison of results -Carlo moves Carlo moves Algorithm-based fault-tolerance 400x Algorithm based fault tolerance 400x mulation of Apoptotic Receptor Clustering on GPGPU Many Core Architectures" mulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures , BM 2012 Phil d l hi USA 4 7Otb 2012 155 160 BM 2012, Philadelphia, USA, 4.-7. October, 2012, pp. 155-160. f Monte Carlo Molecular Simulations on Hybrid Computing Architectures" f Monte-Carlo Molecular Simulations on Hybrid Computing Architectures , t lC d 30 9 3 10 2012 207 212 ontreal, Canada, 30.9.-3.10.2012, pp. 207-212.

Simulation on Reconfiguraable Heterogeneous Architectures · Expansion of the Pro-ApoptoticApoptotic Mapping of Doma Receptor Clustering Model the Nonlinear W Cooperation with M

  • Upload
    ngoanh

  • View
    214

  • Download
    1

Embed Size (px)

Citation preview

Simulation on ReconfiguraSimulation on ReconfiguraSimulation on ReconfiguraDipl Inf Alexander Schöll Prof Dr Hans JoDipl.-Inf. Alexander Schöll, Prof. Dr. Hans-JopI tit t f C t A hit t d C tInstitute of Computer Architecture and Computp p

MotivationMotivationHeterogeneous computer architectures are integrated into single chipsHeterogeneous computer architectures are integrated into single chips

Heterogeneous ArchitecturesHeterogeneous Architectures

Latency‐optimized Throughput‐optimized Reconfigurabley p g p p g

GPU FPGACPU GPUGraphics

FPGAField

CPUCentral Graphics

ProcessingField 

ProgrammableCentral 

Processing ProcessingUnit

ProgrammableGate Array

Processing Unit y

CPU GPU CPU FPGACPU GPUCPU GPU FPGACPU GPU CPU FPGACPU GPUCPU GPU FPGA

x86‐64ARM

Intel MIC AMD PiledriverI t l H ll

AMD RXN idi K l

Xilinx Zynq Xilinx VirtexAlt St tiARM

MIPSIntel HaswellNvidia Maxwell

Nvidia Kepler Altera StratixMIPSSPARC

Nvidia MaxwellSPARC

Overview of current heterogeneous architecturesOverview of current heterogeneous architectures

Upcoming: Reconfigurable Heterogeneous Computer ArchitecturesUpcoming: Reconfigurable Heterogeneous Computer Architectures

GoalGoal

Development of new methods that enable the direct mapping of− Development of new methods that enable the direct mapping of simulation applications to innovative reconfigurable heterogeneoussimulation applications to innovative reconfigurable heterogeneous computer architecturesp

Current CollaborationsCurrent CollaborationsExpansion of the Pro-Apoptotic Mapping of DomaExpansion of the Pro-Apoptotic Mapping of Doma

Receptor Clustering Model the Nonlinear WReceptor Clustering Model the Nonlinear WH tCooperation with M. Daub • G. Schneider Heterogeneoug

C ti ithInclusion of the extracellular space Cooperation withpProblem: Solve theSimulation of random impact of Ligands on Problem: Solve the Simulation of random impact of Ligands on

th ll b on large domains wthe cell membrane on large domains wnodes

Simulation on multiple time scales for optimal nodes

p pbalance between biological process u u=balance between biological process tt xxu u= −resolution and computation costsresolution and computation costs

Mapping of equationMapping of particle simulation to

Mapping of equationMapping of particle simulation to heterogeneous archheterogeneous architectures

heterogeneous archheterogeneous architectures

SimTech Phase I: Simulation Applications andSimTech Phase I: Simulation Applications andppParallel Simulation of Pro-Apoptotic Acceleration of MParallel Simulation of Pro-Apoptotic Acceleration of M

Receptor Clustering on GPGPU Carlo MoleculaReceptor Clustering on GPGPUM C A hit t [1]

Carlo MoleculaH b id C tMany-Core Architectures[1] Hybrid Computy y p

C. Braun • M. Daub • A. Schöll • G. Schneider • H.-J. Wunderlich C. Braun • S. Holst • J. Cas

Apoptosis: Structured decomposition of Markov-Chain MontApoptosis: Structured decomposition of d d i f t d ll i lti ll l

Markov Chain Monti l ti thdamaged or infected cells in multi-cellular simulations are the

organisms thermodynamicsorganisms thermodynamics

Initiated by signal competent clusters of Problem: Inherent seInitiated by signal competent clusters of Problem: Inherent setumor necrosis factor receptors (TNFR) and this method very hartumor necrosis factor receptors (TNFR) and this method very harcorresponding TNF Ligands on the cell Typically coarse grap g gmembrane

Typically coarse-gramembrane. distribution of simulaP ti l i l ti h b d t

distribution of simulak t ti idParticle simulation has been mapped to or workstation grids pp

GPGPU many core architectureg

GPGPU many-core architecture. MCMC molecular simR d ti f i l ti ti f d t GPGPUReduction of simulation times from mapped to GPGPUsmonths to hours enables practical energy calculationsmonths to hours enables practical energy calculations application of the model in research evaluation of Monte-application of the model in research (S d 300 )

evaluation of Monte(Speedup: 300x) Speedups of up to( p p ) Speedups of up to

[1] C Braun M Daub A Schöll G Schneider and H J Wunderlich "Parallel Sim[1] C. Braun, M. Daub, A. Schöll, G. Schneider, and H.-J. Wunderlich, Parallel Simi P IEEE I t ti l C f Bi i f ti d Bi di i BIBin Proc. IEEE International Conference on Bioinformatics and Biomedicine, BIB

[2] C Braun S Holst J Castillo J Groß and H J Wunderlich "Acceleration of[2] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, Acceleration ofi P 30th IEEE I t ti l C f C t D i ICCD'12 Min Proc. 30th IEEE International Conference on Computer Design, ICCD'12, Mo

able Heterogeneous Architecturesable Heterogeneous Architecturesable Heterogeneous Architecturesoachim Wunderlichoachim Wunderlich

E i i U i it f St tt t Ger Engineering, University of Stuttgart, Germanyg g, y g , y

ChallengesChallengesgMapping of simulation applicationsMapping of simulation applications

− Simulation applications contain algorithms from distinct problem classesSimulation applications contain algorithms from distinct problem classesDense

Spectral StructuredSparseLinear 

SpectralM h d

Structured Grids U t t d

pLinearAlgebra

MethodsN‐Body

Grids Unstructuredd

Linear Algebra

gAlgebra N BodyGrids

gAlgebra

Many different programming paradigms and languages− Many different programming paradigms and languages

Achieving optimal performanceAchieving optimal performance

P i l f d l i h d b id d− Particular aspects of underlying hardware structures must be consideredPerformance depends on the combination of implementation and utilized− Performance depends on the combination of implementation and utilized architecturearchitecture

CPU

GPUOptimal

GPUMapping?pp g

FPGA

Simulation Mapping Heterogeneous Reliability Applications

Mapping gArchitecturesReliability Applications Architectures

Simulation applications are executed for days and months− Simulation applications are executed for days and months− Fault-tolerant execution memories and communication requiredFault tolerant execution, memories and communication required

Current WorkCurrent WorkImplementation of specific compute modulesin Decomposition for Implementation of specific compute modules in Decomposition for for heterogeneous architecturesWave Equation to for heterogeneous architecturesWave Equation to

A hit t − Modules perform basic tasks in targetedus Architectures Modules perform basic tasks in targeted h M D b G S h id simulation applicationsh M. Daub • G. Schneider ppnonlinear wave equation − Implementation in distinctive realizations fornonlinear wave equation Implementation in distinctive realizations for

diff t t f tiwith a huge amount of different types of computing resourceswith a huge amount of

− Goal: Abstract and hide the actual execution f d l f th i l ti li ti

3d u n u+ of modules from the simulation applicationc cd u n u− +

n solver to reconfigurablen solver to reconfigurable CPU CPU CPU CPUFFT

hitecturesCPU CPU CPU CPUFFT

hitecturesMM

GPU GPU GPU GPUMapping

CFD

d Collaborations FPGA FPGA FPGA FPGA…d CollaborationsSimulation Modules Heterogeneous

Markov-Chain Monte- Applications ArchitectureMarkov-Chain Monte- pp

ar Simulations onDevelopment of resource management and

ar Simulations on t A hit t [2] Development of resource management and ter Architectures[2]

load-balancing mechanismsload balancing mechanismsstillo • J. Groß • H.-J. Wunderlich

− Investigation of efficient module organizationInvestigation of efficient module organization and execution strategieste-Carlo (MCMC) gte Carlo (MCMC)

f t k i − Goal: Optimization of simulation performancecore of many tasks in Goal: Optimization of simulation performance d d ti t h i t ditiand adaption to changing system conditions

during runtimeerial dependencies make during runtime erial dependencies make rd to parallelizerd to parallelize

GPUCPU …

ined parallelization andined parallelization and ation instances on clustersation instances on clusters i li d

FFT MM CFD

is appliedDevelopment of fault tolerance measures for

ppDevelopment of fault-tolerance measures for mulation has been modulesl iti ll l moduless, exploiting parallel

− Repeated or redundant execution of modulesand speculative − Repeated or redundant execution of modules and speculative and comparison of results-Carlo moves pCarlo moves

− Algorithm-based fault-tolerance400x Algorithm based fault tolerance 400x

mulation of Apoptotic Receptor Clustering on GPGPU Many Core Architectures"mulation of Apoptotic Receptor-Clustering on GPGPU Many-Core Architectures ,BM 2012 Phil d l hi USA 4 7 O t b 2012 155 160BM 2012, Philadelphia, USA, 4.-7. October, 2012, pp. 155-160.f Monte Carlo Molecular Simulations on Hybrid Computing Architectures"f Monte-Carlo Molecular Simulations on Hybrid Computing Architectures ,

t l C d 30 9 3 10 2012 207 212ontreal, Canada, 30.9.-3.10.2012, pp. 207-212.