The Queen’s Tower, Imperial College London, South Kensington, SW7
27th Jan 2008 | Ashley Brown
Profile-directed speculative optimization of reconfigurable floating point data paths
Workshop on Reconfigurable Computing 2008
Ashley Brown, 27th Jan 2008
Introduction
• Computational science requires reproducible and accurate results
• IEEE-754 is a compromise
– Broad range of values
– Many special cases
• Idea: use profiling to reduce range and remove special cases
– Generate floating-point data-paths for FPGAs which are smaller and faster
• BUT KEEP RESULTS CONSISTENT WITH IEEE-754
Advantages of Smaller Floating Point
• Embedded Systems
– Do the same work for a lower cost
– Implement IEEE-754 compliant floating point where it may not have been possible before
• High performance
– Do more work with the same hardware
– Increase in parallel execution on FPGAs
– No need to sacrifice IEEE-754 compliance
Four Pictures to Explain: #1
Four Pictures to Explain: #2
Four Pictures to Explain: #3
Four Pictures to Explain: #4
[Figure labels: Pre-optimisation, Post-optimisation]
Optimisation Technique
• Remove features from the floating-point unit:
– Operand alignment
– Normalisation
– Operand swap
• If a removed feature is needed at run time, detect this and fall back to an alternative solution:
– Software-based, on an embedded/host processor
– Hardware-based full implementation, for larger designs
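The detect-and-fall-back idea above can be sketched in software. This is a minimal model, not the hardware from the talk: `max_shift` (the largest alignment shift the reduced unit still supports) and both function names are assumptions made for illustration, and only operand alignment is modelled, not normalisation or operand swap.

```python
import math


def needs_fallback(a: float, b: float, max_shift: int) -> bool:
    """Return True when a + b needs more alignment than the reduced unit has.

    max_shift is a hypothetical parameter: the largest alignment shift (in
    bits) that the reduced data path still implements.
    """
    # Special values (NaN, infinities) are left to the full implementation.
    if not (math.isfinite(a) and math.isfinite(b)):
        return True
    # A zero operand needs no alignment at all.
    if a == 0.0 or b == 0.0:
        return False
    # frexp returns (mantissa, exponent) with 0.5 <= |mantissa| < 1.
    _, exp_a = math.frexp(a)
    _, exp_b = math.frexp(b)
    return abs(exp_a - exp_b) > max_shift


def speculative_add(a: float, b: float, max_shift: int):
    """Add two doubles and report whether the fall-back path was taken."""
    if needs_fallback(a, b, max_shift):
        # Stand-in for the software or full-hardware fall-back.
        return a + b, True
    # Stand-in for the reduced adder; inside its supported range it is
    # assumed to match a full IEEE-754 adder.
    return a + b, False
```

For example, `speculative_add(1.0, 1024.0, 4)` takes the fall-back path because the operands' exponents differ by 10 bits, while `speculative_add(1.0, 1.5, 4)` stays on the reduced path.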
Optimisation Options
The stages of optimisation
• Profile target application with training datasets
– Source usually FORTRAN, C
• Identify frequently-executed blocks
• Check for good value-locality
• Generate reduced-size floating point datapath
– Reduced operand alignment hardware
– Reduced normalisation hardware
• Error checking: execute with additional datasets, check error rates
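The "identify frequently-executed blocks" and "check for good value-locality" stages might look like the sketch below. The trace format (a list of `(block_id, value)` records), the thresholds `min_share` and `max_spread`, and the function names are all hypothetical; the slides do not specify how these checks are made.

```python
import math
from collections import defaultdict


def summarise_blocks(trace):
    """Group (block_id, value) records and summarise each block.

    Returns {block_id: (operation_count, exponent_spread)}, where
    exponent_spread is the max minus min binary exponent over the block's
    nonzero finite values.
    """
    counts = defaultdict(int)
    exponents = defaultdict(list)
    for block_id, value in trace:
        counts[block_id] += 1
        if value != 0.0 and math.isfinite(value):
            exponents[block_id].append(math.frexp(value)[1])
    summary = {}
    for block_id, count in counts.items():
        exps = exponents[block_id]
        spread = max(exps) - min(exps) if exps else 0
        summary[block_id] = (count, spread)
    return summary


def candidate_blocks(trace, min_share=0.25, max_spread=8):
    """Pick blocks that are both hot and narrow in value range."""
    summary = summarise_blocks(trace)
    total = sum(count for count, _ in summary.values())
    return sorted(
        block_id
        for block_id, (count, spread) in summary.items()
        if count / total >= min_share and spread <= max_spread
    )
```

A block that executes often but spans many orders of magnitude is rejected here, since a reduced data path would fall back too frequently to pay off.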
FloatWatch Profiler
• Valgrind-based value profiler
• Can return some metrics of interest here:
– Floating point value ranges
– Ratio of floating point operands
• Each has uses for optimisation!
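To show why the operand ratio matters, here is a small sketch that turns recorded operand pairs into a histogram of alignment shifts. It does not reproduce FloatWatch's output format or interface; the input list of `(a, b)` pairs and both function names are assumptions for illustration.

```python
import math
from collections import Counter


def alignment_histogram(operand_pairs):
    """Count how often each alignment shift (exponent difference) occurs."""
    hist = Counter()
    for a, b in operand_pairs:
        # Zeros and special values are skipped in this sketch.
        if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
            continue
        hist[abs(math.frexp(a)[1] - math.frexp(b)[1])] += 1
    return hist


def coverage(hist, max_shift):
    """Fraction of recorded operations whose shift fits within max_shift bits."""
    total = sum(hist.values())
    if total == 0:
        return 1.0
    covered = sum(n for shift, n in hist.items() if shift <= max_shift)
    return covered / total
```

If most operand pairs have a ratio close to 1, the histogram is concentrated at small shifts and a narrow alignment shifter covers nearly all operations.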
VFLOAT Library
• VHDL variable-precision floating-point library
– Initially developed by Belanovic at Northeastern, continued development under the supervision of Leeser
• Allows basic customisation of precision, exponent bit widths
• Further customisations added for our optimisations:
– Operand alignment
– Normalisation
• Performance is lower than vendor-specific libraries
Data-path Generator
• Takes user-selected data-path and generates VHDL implementation
• Assembles modified version of the RPL library – customised to allow removal of various items
• Builds hardware/software integration layer
– C library for software
– VHDL for hardware
• Does not modify the software source automatically (yet)
Proof-of-Concept Testing
• Original application modified to call C library (usually from FORTRAN)
• Data sent to hardware, calculated, and returned
– Software waits for response
– No data-aggregation or hardware-side error detection occurs
• Software layer performs same calculation for verification
• Overall error rate reported
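The verification step can be pictured as a bit-for-bit comparison between the hardware result and a software reference. The sketch below is an assumption-laden stand-in: `mock_reduced_add` is a hypothetical model of a reduced adder with no error detection (it simply drops the smaller operand when the alignment shift is out of range), not the behaviour of the actual hardware.

```python
import math
import struct


def same_bits(x: float, y: float) -> bool:
    """Compare two doubles bit for bit, so that -0.0 and 0.0 differ."""
    return struct.pack("<d", x) == struct.pack("<d", y)


def error_rate(operand_pairs, hardware_add):
    """Fraction of operations where hardware_add disagrees with software."""
    pairs = list(operand_pairs)
    if not pairs:
        return 0.0
    mismatches = sum(
        1 for a, b in pairs if not same_bits(hardware_add(a, b), a + b)
    )
    return mismatches / len(pairs)


def mock_reduced_add(a, b, max_shift=4):
    """Hypothetical reduced adder: exact in range, lossy out of range."""
    if a != 0.0 and b != 0.0:
        if abs(math.frexp(a)[1] - math.frexp(b)[1]) > max_shift:
            # Out of range: the smaller operand is dropped entirely.
            return a if abs(a) > abs(b) else b
    return a + b
```

Note that an out-of-range shift does not always produce a wrong answer: for `1.0 + 2.0**-60` the smaller operand is below half a unit in the last place, so dropping it still matches the IEEE-754 result.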
‘ydl_pij’
• ‘ydl_pij’ is an iterative solver for quantum mechanics, using the “Molecular Mechanics – Valence Bond” method
• Datasets of various sizes available, allowing a variety of test cases to be used
• Initial profiling and testing use separate datasets
‘ydl_pij’: Profiling (Hot Code Section)
[Figure annotation: Narrow value ranges]
‘ydl_pij’: Identification
• FloatWatch identifies the regions of code executing the most operations
• In this case, these show narrow value ranges
• Create optimised datapaths for testing
– Maximum operand alignment reduced to 2^n, where n is in the range [1, 6]
– Normalisation hardware modified similarly
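The sweep over n can be mimicked in software to estimate how often each candidate data path would need re-execution. This sketch assumes the limit is 2^n bits for both the alignment and the normalisation shifter, and that a complete cancellation (a + b == 0) always re-executes; neither detail is confirmed by the slides, and the function names are hypothetical.

```python
import math


def shifts(a, b):
    """Return (alignment_shift, normalisation_shift) for computing a + b.

    Assumes finite, nonzero operands and a finite sum. The normalisation
    shift is the number of leading bit positions lost to cancellation. A
    complete cancellation is reported as infinite so that it always counts
    as a re-execution in this sketch.
    """
    exp_a, exp_b = math.frexp(a)[1], math.frexp(b)[1]
    align = abs(exp_a - exp_b)
    total = a + b
    if total == 0.0:
        return align, math.inf
    norm = max(exp_a, exp_b) - math.frexp(total)[1]
    # A carry out gives norm == -1, which needs no left shift.
    return align, max(norm, 0)


def re_execution_rates(operand_pairs):
    """Map each n in 1..6 to the share of operations exceeding 2**n bits."""
    all_shifts = [shifts(a, b) for a, b in operand_pairs]
    rates = {}
    for n in range(1, 7):
        limit = 2 ** n
        misses = sum(
            1 for align, norm in all_shifts if align > limit or norm > limit
        )
        rates[n] = misses / len(all_shifts) if all_shifts else 0.0
    return rates
```

Run over operand pairs from a validation dataset, the resulting table gives one re-execution rate per n, which can then be weighed against the area saved by each variant.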
‘ydl_pij’ Error Rate
[Figure annotation: Not profiled]
‘ydl_pij’: Error Rate and Size
• 20% size reduction with negligible re-execution rate (< 0.5%)
• 27% size reduction with 3% re-execution rate
• Size reduction permits a ~40% increase in parallelism due to better space usage
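As a rough cross-check of the parallelism figure, assume (this is not stated in the slides) that the number of units fitting in a fixed area scales inversely with the area of one unit. A fractional size reduction r then gives a gain g in unit count:

```latex
g(r) = \frac{1}{1 - r} - 1, \qquad
g(0.20) = \frac{1}{0.80} - 1 = 0.25, \qquad
g(0.27) = \frac{1}{0.73} - 1 \approx 0.37
```

Under that assumption the 27% reduction accounts for roughly 37% more units, which is in the same region as the ~40% quoted above. The slide itself attributes the gain to better space usage.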
‘ydl_pij’: Area saving for one F.P. adder/subtractor
[Figure labels: Pre-optimisation, Post-optimisation]
Coming Soon
• Per-operation optimisations
– Currently only at data-path level
• Optimisation of operand-swap hardware
• Per-operation exponent customisation (size, bias)
• Performance evaluation using state-of-the-art FPGA accelerator hardware
• Implementation of error detection and re-execution
• Potential for even greater size reductions