Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs
Cooperation with M. Daub • G. Schneider
Extending the Scope of Approximate Computing to Scientific Computing and Simulation Technology
Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Joachim Wunderlich
E-mail: [email protected], [email protected]
Institute of Computer Architecture and Computer Engineering
Motivation
Heterogeneous computer architectures
Goal: Efficient and fault-tolerant execution of simulation applications
Simulation on Reconfigurable Heterogeneous Architectures
Challenges
Reliability
• Simulation applications:
  • Often executed for days and months
• Modern CMOS devices:
  • Increasingly vulnerable to reliability threats
• Required: Fault-tolerant simulation algorithms
Achieving optimal performance
• Performance depends on the combination of implementation and utilized architecture
Alexander Schöll
Hans-Joachim Wunderlich
SimTech Cluster of Excellence
www.simtech.uni-stuttgart.de
Approximate Computing
• Trade off precision for efficiency
• Often limited to applications with inherent error tolerance
Applying approximate computing to simulation technology
• Tight accuracy constraints
• Often low error resilience
Acceleration of Markov-Chain Monte-Carlo Molecular Simulations
Cooperation with J. Castillo • J. Groß
Markov-Chain Monte-Carlo (MCMC)
• Core of many tasks in thermodynamics
• Mapping to GPU: exploiting parallel energy calculations and speculative evaluation of Monte-Carlo moves
• Heterogeneous mapping to CPU and GPU results in significant speedups
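To make the structure of a Monte-Carlo move concrete, the following is a minimal serial sketch of a single-particle Metropolis step. It is not the implementation from the cited work: the Lennard-Jones pair energy in reduced units, the small lattice start configuration, and all function names and parameters (`pair_energy`, `particle_energy`, `metropolis_step`, `run_chain`, `beta`, `max_shift`) are assumptions chosen for illustration. The inner loop of `particle_energy` consists of independent terms, which is the kind of work that could be evaluated in parallel on a GPU; speculative evaluation of several moves is not shown.

```python
import math
import random


def pair_energy(r2):
    """Lennard-Jones energy for a squared distance r2 (reduced units, assumed model)."""
    inv6 = 1.0 / (r2 ** 3)
    return 4.0 * (inv6 * inv6 - inv6)


def particle_energy(positions, index, candidate):
    """Energy of particle `index` placed at `candidate` against all other particles.

    Every term of the sum is independent of the others, so this loop is a
    natural candidate for parallel evaluation.
    """
    total = 0.0
    for j, other in enumerate(positions):
        if j != index:
            r2 = sum((a - b) ** 2 for a, b in zip(candidate, other))
            total += pair_energy(r2)
    return total


def metropolis_step(positions, beta, max_shift, rng):
    """Attempt one single-particle move; return True if it was accepted."""
    i = rng.randrange(len(positions))
    old = positions[i]
    new = tuple(x + rng.uniform(-max_shift, max_shift) for x in old)
    delta = particle_energy(positions, i, new) - particle_energy(positions, i, old)
    # Metropolis criterion: always accept downhill moves, accept uphill
    # moves with probability exp(-beta * delta).
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        positions[i] = new
        return True
    return False


def run_chain(n_steps=200, seed=0):
    """Run a short chain on a hypothetical 2x2x2 lattice of particles."""
    rng = random.Random(seed)
    positions = [(1.5 * x, 1.5 * y, 1.5 * z)
                 for x in range(2) for y in range(2) for z in range(2)]
    accepted = sum(metropolis_step(positions, 1.0, 0.1, rng)
                   for _ in range(n_steps))
    return positions, accepted
```

Calling `run_chain()` returns the final configuration and the number of accepted moves; the acceptance count depends on the seed and on the assumed parameters.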
Collaborations in SimTech
Current work
Molecular Configuration
Motivation
• Deeper understanding of the activation of apoptosis
Simulation: Dominated by extensive computing times
Goals
• Reduction of computation time
• … to obtain extensive and detailed conclusions about the clustering behavior
Computational Performance Results
• Adaptive discretization of time and heterogeneous mapping to CPU and GPU result in significant speedups
Biological Evaluation
Evolution of ligand-receptor clusters in less than 0.5 s
Preconditioned Conjugate Gradient (PCG): Important sparse linear system solver
• Iterative solving method
Goal: PCG on approximate hardware with guaranteed result accuracy
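For readers unfamiliar with the solver, here is a minimal sketch of PCG on precise hardware. It assumes a symmetric positive definite matrix, a Jacobi (diagonal) preconditioner, and a dense list-of-lists matrix for brevity; the poster does not state which preconditioner or storage format is used, and the names `matvec`, `dot`, and `pcg` are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product; a sparse format would be used in practice."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A.

    Uses a Jacobi preconditioner (assumption). Returns the solution and the
    number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^-1 for Jacobi
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        # Stop once the relative residual norm is small enough.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # direction update factor
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The matrix-vector product dominates the cost of each iteration, which is why it is the usual place to apply approximation and error checks.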
Challenges:
• Error resilience changes over time
• Overhead from additional operations to monitor error resilience
Solution:
• Use efficient fault tolerance to monitor and adapt approximation at runtime
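One way to picture runtime monitoring is a checksum test on the matrix-vector product, the classic idea behind algorithm-based fault tolerance. The sketch below is a textbook-style illustration and not the scheme from the cited papers: it uses a dense matrix, a single column-sum checksum, and a hypothetical `fault` hook that perturbs one output element to emulate an error; the names `column_checksum`, `checked_matvec`, and `rel_tol` are assumptions.

```python
def column_checksum(A):
    """Precompute c = 1^T A (the column sums); done once per matrix."""
    return [sum(col) for col in zip(*A)]


def checked_matvec(A, x, checksum, rel_tol=1e-9, fault=None):
    """Compute y = A x and test it against the checksum.

    Returns (y, ok). `fault` is a hypothetical (index, error) pair that
    perturbs one element of y to emulate an error caused by unreliable or
    approximate hardware.
    """
    y = [sum(a * b for a, b in zip(row, x)) for row in A]
    if fault is not None:
        idx, err = fault
        y[idx] += err
    # In exact arithmetic, sum(y) equals c . x; allow for rounding error.
    expected = sum(c * xi for c, xi in zip(checksum, x))
    actual = sum(y)
    scale = max(1.0, abs(expected), abs(actual))
    return y, abs(actual - expected) <= rel_tol * scale
```

A solver could react to `ok == False` by recomputing the product or by lowering the degree of approximation; the tolerance `rel_tol` then determines how large an error is still accepted.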
Experimental Results
• Hardware utilization and iteration count compared to execution on precise hardware
[Figure: hardware utilization and iteration count relative to precise hardware; axis from 50% to 130%]
[1] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations", in Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN'16, Toulouse, France, 28. June-1. July, 2016, pp. 251-262.
[2] A. Schöll, C. Braun, and H.-J. Wunderlich, "Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware", in Proceedings of the 29th Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'16, University of Connecticut, USA, 19.-20. September, 2016, pp. 21-26. DFTS Best Paper Award 2016.
[3] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver", in Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'15, Amherst, MA, USA, 12.-14. October, 2015, pp. 60-66.
[4] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, "Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures", in Proceedings of the 30th IEEE International Conference on Computer Design, ICCD'12, Montreal, Canada, 30. September-3. October, 2012, pp. 207-212.
[5] A. Schöll, C. Braun, M. Daub, G. Schneider, and H.-J. Wunderlich, "Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs", in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2014, Belfast, UK, 2.-5. November, 2014, pp. 424-431. SimTech Best Paper Award 2014.
[Figure: heterogeneous computer architectures combining CPUs (Central Processing Units), GPUs (Graphics Processing Units), and FPGAs (Field Programmable Gate Arrays). Labels: x86-64, ARM, SPARC, Intel MIC, AMD Excavator, Intel Skylake, Nvidia Pascal, Xilinx Zynq, Xilinx Virtex, Altera Stratix]
[Figure: emerging Approximate Computing (AC) paradigm]
• Trade off precision for a gain in efficiency
• Required: Exploit inherent error tolerance of applications
[Figure: Approximate Computing in image processing]