Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms
Kyungtae Han
Ph.D. Defense
Committee Members:
Prof. Ross Baldick (Dept. of ECE)
Prof. Brian L. Evans (Dept. of ECE), advisor
Prof. Margarida F. Jacome (Dept. of ECE)
Prof. Earl E. Swartzlander (Dept. of ECE)
Prof. Robert A. van de Geijn (Dept. of CS)
Computer Engineering Curriculum Track
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
May 9th, 2006
2
Outline
• Introduction
• Background
• Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
• Conclusion
3
Implementing Digital Signal Processing Algorithms
Introduction
[Figure: design flow. Digital signal processing algorithms start as a floating-point program; code conversion yields fixed point (uniform wordlength), and wordlength optimization yields fixed point (optimized wordlength). Target hardware shown: floating-point processor, fixed-point processor, and fixed-point ASIC, with H/L ratings for price and power consumption.]
ASIC: Application Specific Integrated Circuit
4
Transformations to Fixed Point
• Advantages
  - Lower hardware complexity
  - Lower power consumption
  - Faster speed in processing
• Disadvantages
  - Introduces distortion due to quantization error
  - Search for optimum wordlength by trial & error is time-consuming
• Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs
[Figure: transformation of a floating-point program into fixed point (optimized wordlength) through code conversion followed by wordlength optimization.]
6
Fixed-Point Data Format
• Integer wordlength (IWL)
  - Number of bits assigned to integer representation
• Fractional wordlength (FWL)
  - Number of bits assigned to fraction
• Wordlength (WL)
  - WL = IWL + FWL
SystemC format (www.systemc.org)
[Figure: bit layout S X X X X X with the binary point marked; the wordlength spans all bits and is split into the integer wordlength and the fractional wordlength. In the examples below the sign bit S counts toward IWL.]
π = 3.14159…(10) [Floating Point]
3.140625(10) = 011.001001(2) [WL=9; IWL=3; FWL=6]
3.141479492(10) = 011.0010010000111(2) [WL=16; IWL=3; FWL=13]
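The two π examples can be reproduced with a few lines of Python. This is a minimal sketch, not the tool's implementation: the helper names `quantize` and `to_bits` are hypothetical, truncation (rounding toward negative infinity) and saturation on overflow are assumptions, and IWL is taken to include the sign bit as in the examples.

```python
import math

def quantize(value, iwl, fwl):
    """Truncate `value` to a signed fixed-point grid (assumed: IWL includes the sign bit)."""
    scale = 1 << fwl                       # 2**FWL steps per unit
    wl = iwl + fwl                         # total wordlength
    lowest = -(1 << (wl - 1))              # most negative integer code
    highest = (1 << (wl - 1)) - 1          # most positive integer code
    code = math.floor(value * scale)       # truncation toward negative infinity
    code = max(lowest, min(highest, code)) # saturate instead of wrapping (assumption)
    return code / scale

def to_bits(value, iwl, fwl):
    """Two's-complement bit string of the quantized value, with the binary point shown."""
    wl = iwl + fwl
    code = int(round(quantize(value, iwl, fwl) * (1 << fwl))) & ((1 << wl) - 1)
    bits = format(code, "0{}b".format(wl))
    return bits[:iwl] + "." + bits[iwl:]

print(quantize(math.pi, 3, 6), to_bits(math.pi, 3, 6))
print(quantize(math.pi, 3, 13), to_bits(math.pi, 3, 13))
```

Rounding to nearest instead of truncating would give a different 13-bit value, which suggests the slide's examples use truncation.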
Background
7
Distortion vs. Complexity Tradeoffs
• Shorter wordlength may increase application distortion and decrease implementation complexity
• Minimize implementation cost
• Minimize application distortion
Notation:
  c(w)   Implementation cost function
  Cmax   Constant for maximum implementation cost
  d(w)   Application distortion function
  Dmax   Constant for maximum application distortion
  $\underline{w}$   Wordlength lower bounds
  $\overline{w}$   Wordlength upper bounds
[Figure: application distortion d(w) plotted against implementation complexity c(w), showing the feasible region and the optimal tradeoff curve.]
8
Wordlength Optimization Constraints
• Distortion constraint: d(w) ≤ Dmax
• Complexity constraint: c(w) ≤ Cmax
[Figure: two plots of application-specific distortion d(w) against implementation complexity c(w), one marked with the Dmax bound and one with the Cmax bound.]
Enforcing both constraints bounds the search to a region of finite area
9
Wordlength Optimization
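• Wordlengths of signals (variables) in digital system as vector
$$\mathbf{w} = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$$
• Multiple objective optimization
$$\min_{\mathbf{w} \in \mathbb{I}^n} \; [\,c(\mathbf{w}),\ d(\mathbf{w})\,] \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max},\quad c(\mathbf{w}) \le C_{\max},\quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$
• Single objective optimization
$$\min_{\mathbf{w} \in \mathbb{I}^n} \; \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max},\quad c(\mathbf{w}) \le C_{\max},\quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$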
10
Genetic Algorithm
• Evolutionary algorithm
  - Inspired by Holland 1975
  - Mimic processes of plant and animal evolution
  - Find optimum of a complex function
[Figure: genetic algorithm cycle with stages labeled new gene pool, function evaluation, genes w/ measure, selection, parental genes, mating, child genes, and mutation. From Greg Rohling's Ph.D. Defense, 2004.]
11
Pareto Optimality
• Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Allan Schick 1970]
• Pareto optimal set is the set of nondominated solutions
  - E is dominated by C, as all objectives for C are less than the corresponding objectives for E
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions
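The nondominated set can be extracted with a short filter. The sketch below assumes both objectives are minimized and uses the common definition of dominance (no worse in every objective, strictly better in at least one), which is slightly weaker than the wording on the slide. The function names and the (area, error) values are hypothetical and chosen only for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Return the nondominated points, sorted by the first objective."""
    points = list(points)
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front)

# Hypothetical (area, error) pairs, both to be minimized.
candidates = {
    "A": (20, 0.90),
    "B": (35, 0.40),
    "C": (50, 0.15),
    "D": (80, 0.05),
    "E": (60, 0.30),   # dominated by C
    "F": (90, 0.50),   # dominated by B, C, D, and E
}

front = pareto_front(candidates.values())
print(front)
```

Connecting the returned points in order of the first objective gives the tradeoff curve described above.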
[Figure: solutions A to I plotted against Objective 1 and Objective 2, marked as nondominated or dominated, with the Pareto front drawn through the nondominated solutions.]
13
Search for Optimum Wordlength
• Complete search
  - Search whole space
  - Impractical in systems with many variables
• Gradient-based search
  - Utilizes gradient information to determine next candidates
  - Complexity measure (CM) [Sung and Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed)
• Guided random search
  - Genetic algorithm for single objective [Leban and Tasic, 2000]
  - Multiple objective genetic algorithm (proposed)
Contribution #1
14
Complexity-and-Distortion Measure
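• Weighted combination of measures
• Single objective function:
$$f_{cd}(\mathbf{w}) = \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w}), \quad \text{where } \alpha_c + \alpha_d = 1,\ 0 \le \alpha_c \le 1,\ 0 \le \alpha_d \le 1$$
$$\min_{\mathbf{w} \in \mathbb{I}^n} \; f_{cd}(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max},\quad c(\mathbf{w}) \le C_{\max},\quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$
• Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion gradient information
Notation:
  c(w)   Complexity function
  d(w)   Distortion function
  Dmax   Constant for maximum distortion
  Cmax   Constant for maximum complexity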
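One way to picture the greedy search is the sketch below. It is an illustration under stated assumptions, not the algorithm from the cited papers: the search starts from the upper-bound wordlengths, tries removing one bit from each variable in turn, and accepts the feasible move with the lowest weighted objective. The function `greedy_cdm_search` and the toy cost and distortion models (`toy_cost`, `toy_distortion`, `GAINS`) are hypothetical stand-ins for an area model and a simulation-based error measurement.

```python
def greedy_cdm_search(cost, distortion, w_upper, w_lower, d_max, c_max, alpha_c=0.5):
    """Greedy descent on f_cd(w) = alpha_c * c(w) + alpha_d * d(w) (illustrative sketch)."""
    alpha_d = 1.0 - alpha_c

    def f_cd(w):
        return alpha_c * cost(w) + alpha_d * distortion(w)

    w = list(w_upper)                       # assumed starting point: widest wordlengths
    if distortion(w) > d_max:
        raise ValueError("upper-bound wordlengths already violate Dmax")
    best = f_cd(w)
    while True:
        candidates = []
        for i in range(len(w)):
            if w[i] - 1 < w_lower[i]:
                continue                    # respect the lower bound
            trial = w[:i] + [w[i] - 1] + w[i + 1:]
            if distortion(trial) <= d_max:  # keep only moves that stay within Dmax
                candidates.append((f_cd(trial), trial))
        if not candidates:
            break
        f_new, w_new = min(candidates)      # steepest feasible decrease
        if f_new >= best:
            break                           # no improving move: local optimum
        best, w = f_new, w_new
    if cost(w) > c_max:
        raise ValueError("no solution within Cmax was found")
    return w

# Toy models (assumptions): cost is the total number of bits, distortion is a
# gain-weighted sum of quantization step sizes.
GAINS = [1.0, 4.0, 0.5]

def toy_cost(w):
    return float(sum(w))

def toy_distortion(w):
    return sum(g * 2.0 ** (-b) for g, b in zip(GAINS, w))

solution = greedy_cdm_search(toy_cost, toy_distortion,
                             w_upper=[16, 16, 16], w_lower=[2, 2, 2],
                             d_max=0.1, c_max=48.0)
print(solution, toy_cost(solution), toy_distortion(solution))
```

Because each step only looks one bit ahead, the result is a local optimum in the sense that no further single-bit reduction is both feasible and improving.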
15
Case Study: Filter Design
• Infinite impulse response (IIR) filter
  - Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
  - Distortion measure: Root mean square (RMS) error
  - Seven fixed-point variables (indicated by slashes)
[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, and coefficients b0, b1, and -a1.]
16
Case Study: Gradient-Based Search
• In this case study, CDM reached lower complexity with fewer simulations than DM and CM
Search Method | Gradient Measure | Number of Simulations | Complexity Estimate (LUT) | Distortion (RMS)*
Gradient      | DM               | 316                   | 51.05                     | 0.0981
Gradient      | CDM              | 145                   | 49.85                     | 0.0992
Gradient      | CM               | 417                   | 51.95                     | 0.0986
Complete      | -                | 16^7 **               | -                         | -
* Maximum distortion measured by root mean square (RMS) error is 0.1
** 16^7 = 268,435,456 (8.5 years, if 1 second per 1 simulation)
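The size of the complete search follows from simple counting. The sketch below assumes that 16^7 comes from seven variables with 16 candidate wordlengths each (the slides state the seven variables; the 16 candidates per variable are inferred from the exponent) and that a year has 365 days. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def complete_search_size(num_variables, choices_per_variable):
    """Number of wordlength combinations an exhaustive search must simulate."""
    return choices_per_variable ** num_variables

simulations = complete_search_size(7, 16)
years = simulations / SECONDS_PER_YEAR      # at 1 second per simulation

print(simulations, round(years, 1))
```

Each extra variable multiplies the count by 16, which is why exhaustive search becomes impractical so quickly.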
17
Case Study: Genetic Algorithm
• Search Pareto optimal set (nondominated)
• Handles multiple objectives: Error and Area
[Figure: three scatter plots of Error (RMS, log scale from 10^-2 to 10^0) against Area (LUTs, 20 to 100) showing the population and Pareto front at the 100th generation (9,000 simulations), 250th generation (22,500 simulations), and 500th generation (45,000 simulations). Legend counts across the panels: non-dom 67/90 with dom 23/90; non-dom 76/90 with dom 14/90; non-dom 90/90.]
* Population for one generation: 90
LUT: Lookup table
18
Case Study: Comparison
• Superpose gradient-based search (GS) results on GA results
• GS methods can get stuck in a local minimum
• GS methods reduce running time (CDM: 145 simulations)
* Gradient-based searches were run with maximum RMS error Dmax ∈ {0.12, 0.1, 0.08}
[Figure: two scatter plots of Error (RMS, log scale from 10^-2 to 10^0) against Area (LUTs, 20 to 100) with the DM, CDM, and CM solutions superposed on the GA population: 500th generation (45,000 simulations; non-dom 90/90) and 50th generation (4,500 simulations; non-dom 35/90, dom 55/90).]
19
Comparison of Proposed Methods
                      | Gradient-based search | Genetic algorithm
Type of Solution      | One point             | Family of points
Tradeoff Curve Found  | No                    | Yes
Execution Time        | Short                 | Long
Amount of Computation | Low                   | High
Parallelism           | Low                   | High
21
Lower Power Consumption in DSP
• Power dissipation must be minimized because battery power and cooling capacity are limited
• Multipliers are often a major source of dynamic power consumption in typical DSP applications
• Multi-precision multipliers can select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
• Wordlength reduction to select any word size [Han, Evans, and Swartzlander 2004] (proposed)
Contribution #2
22
Wordlength Reduction in Multiplication
• Input data wordlength reduction
  - Fewer bits are often enough to represent the operands, e.g. π x π ≈ 9
• Truncation
• Signed right shift
  - Move toward the least significant bit (LSB)
  - Sign bit extended for arithmetic right shift
(a) Original multiplication, operands:          0001 0010 0011 0100 and 1101 1100 1010 1001
(b) Reduction by truncation, operands:          0001 0010 0000 0000 and 1101 1100 0000 0000
(c) Reduction by signed right shift, operands:  0000 0000 0001 0010 and 1111 1111 1101 1100
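The two reductions can be checked against the 16-bit operands shown on the slide. This is a small sketch with hypothetical helper names; it removes 8 bits in both cases, matching examples (b) and (c).

```python
WL = 16                      # operand wordlength in bits
MASK = (1 << WL) - 1

def to_signed(word):
    """Interpret a WL-bit pattern as a two's-complement integer."""
    return word - (1 << WL) if word & (1 << (WL - 1)) else word

def truncate(word, n):
    """Zero the n least significant bits, keeping the bit positions of the rest."""
    return word & ~((1 << n) - 1) & MASK

def signed_right_shift(word, n):
    """Arithmetic right shift by n bits; the sign bit is replicated on the left."""
    return (to_signed(word) >> n) & MASK

def bits(word):
    """Format a WL-bit pattern in groups of four bits."""
    s = format(word, "0{}b".format(WL))
    return " ".join(s[i:i + 4] for i in range(0, WL, 4))

for operand in (0x1234, 0xDCA9):
    print(bits(operand), "|", bits(truncate(operand, 8)), "|",
          bits(signed_right_shift(operand, 8)))
```

Truncation clears the low bits in place, while the signed right shift moves the kept bits toward the LSB and fills the upper bits with copies of the sign.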
23
Power Reduction via Wordlength Reduction
• Power dissipation
  - Switching power consumption
  - Static power consumption
• Switching power consumption
  - Switching activity parameter, α
  - Reduce α by wordlength reduction
$$P_{\text{switching}} = \alpha\, C_L\, V_{dd}^{2}\, f_{clk}$$
  C_L    Load capacitance
  V_dd   Operating voltage
  f_clk  Operating frequency
Relationship between reduced wordlength and switching parameter α in power consumption?
24
Analytical Method
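• Consider stream of data for one of the multiplicands
• Compare two adjacent numbers in stream after reduction
• Expectation of bit switching, x, with probability Px
$$E(X) = \sum_{x=0}^{L} x\, P_X(x)$$
  - L-bit input data
$$E(X_L) = \frac{L}{2}$$
  - Truncate input data to M bits (N bits are removed)
$$E(X_{tr}) = \frac{L - N}{2} = \frac{M}{2}$$
  - N-bit signed right shift in L-bit input (Y is sign bit)
$$E(X_{rs}) = \frac{1}{2} E(X \mid Y = 0) + \frac{1}{2} E(X \mid Y = 1) = \frac{L}{2}$$
[Figure: bit layouts of the L-bit input (sign bit S followed by data bits), the truncated input (M bits kept, N bits removed), and the signed right-shifted input (sign bit S replicated in the upper positions).]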
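The three expectations can be verified by brute force for a small wordlength. The sketch below assumes adjacent words in the stream are independent and uniformly distributed, counts switching as the Hamming distance between the two reduced words, and uses L = 6 and N = 2 only to keep the enumeration small; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def hamming(a, b):
    """Number of bit positions that differ between two words."""
    return bin(a ^ b).count("1")

def truncate(word, n):
    """Zero the n least significant bits."""
    return word & ~((1 << n) - 1)

def signed_right_shift(word, n, length):
    """Arithmetic right shift of a `length`-bit two's-complement word."""
    signed = word - (1 << length) if word >> (length - 1) else word
    return (signed >> n) & ((1 << length) - 1)

def expected_switching(length, reduce_fn):
    """Exact mean Hamming distance over all ordered pairs of `length`-bit words."""
    words = range(1 << length)
    total = sum(hamming(reduce_fn(a), reduce_fn(b))
                for a, b in product(words, repeat=2))
    return Fraction(total, (1 << length) ** 2)

L, N = 6, 2
M = L - N
full = expected_switching(L, lambda w: w)
trunc = expected_switching(L, lambda w: truncate(w, N))
shift = expected_switching(L, lambda w: signed_right_shift(w, N, L))

print(full, trunc, shift)
```

The signed right shift keeps the expectation at L/2 because the replicated sign bits all toggle together whenever the sign changes, which offsets the bits that were shifted out.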
25
Analytical Method
Input                    | Switching expectation
Full length used         | L/2
Truncate N bits          | M/2
N-bit signed right shift | L/2
Wordlength (L) = 16
[Figure: bit layouts of the three inputs (L bits; M bits kept with N bits removed; sign bit S replicated after the shift), annotated "Reduction" and "No Reduction".]
26
Dynamic Power Consumption for Wallace Multiplier (1 MHz)
16-bit x 16-bit multiplier (Simulated on Xilinx XC3S200-5FT256 FPGA)
[Figure: dynamic power consumption of the Wallace multiplier, with curves for truncating the first argument and truncating the second argument (recode, nonrecode); annotated reduction: 56%.]
Wallace multiplier used in TI 320C64 DSP
27
Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)
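16-bit x 16-bit multiplier (Simulated on Xilinx XC3S200-5FT256 FPGA)
[Figure: dynamic power consumption of the radix-4 modified Booth multiplier, with curves for truncating the first argument and truncating the second argument (recode, nonrecode); annotations: reduction (31%), sensitive (13%), swapping could have benefit.]
Radix-4 modified Booth multiplier used in TI 320C62 DSP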
28
Summary of Contribution #2
• Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift exhibits no estimated power reduction in Wallace multiplier (for any shift) and 25% reduction in Booth multipliers (for 8-bit shift)
• Power consumption in tree-based multiplier
  - Highly depends on input data
  - Simulation of all switching activity matches analysis of switching activity in reduced multiplicands in Wallace multiplier
• Operand swapping can reduce power consumption
  - In Booth multiplier, non-recoded operand is 13% more sensitive in power consumption
30
Automating Transformations from Floating Point to Fixed Point
• Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Manually find optimum wordlength by trial and error
• Automating transformations
  - Fully automate conversion and wordlength optimization process (Proposed)
Fixed-point tools:
  - SNU gFix, Autoscaler
  - CoWare SPW HDS
  - Synopsys CoCentric
  - MATLAB Fixed-point toolbox
  - MATLAB Fixed-point blockset
  - AccelChip DSP synthesis
  - Catalytic RMS, MCS
[Figure: floating-point program, then code conversion, then wordlength optimization, then wordlength-optimized fixed-point program.]
Contribution #3
31
Automatic Transformation Flow
• Code generation
  - Parse floating-point program
  - Generate a raw fixed-point program and auxiliary programs (top, objective, cost, etc.)
• Range estimation
  - Estimate range to avoid overflow (Analytical/Simulation)
  - Determine integer wordlength (IWL)
• Wordlength optimization
  - Optimize wordlength according to given input and error specification (Analytical/Simulation)
  - Determine fractional wordlength (FWL)
[Figure: flow from code generation to range estimation to wordlength optimization.]
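The range-estimation step can be illustrated with a simulation-based sketch: record the largest magnitude a signal takes, then pick the smallest integer wordlength that avoids overflow. The formula below is a common convention and an assumption here, not necessarily the one used by the released tool; IWL is taken to include the sign bit, and both function names are hypothetical.

```python
import math

def estimate_range(samples):
    """Largest magnitude observed for a signal during simulation."""
    return max(abs(s) for s in samples)

def integer_wordlength(max_abs):
    """Smallest signed IWL (sign bit included) with max_abs < 2**(IWL - 1)."""
    if max_abs <= 0:
        return 1                                   # sign bit only
    return max(1, math.floor(math.log2(max_abs)) + 2)

observed = [3.14159, -2.0, 0.5]                    # hypothetical simulation trace
iwl = integer_wordlength(estimate_range(observed))
print(iwl)
```

Using the magnitude for both signs is slightly conservative: a trace whose extreme value is exactly -4 would fit in IWL = 3, but this sketch assigns 4.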
32
Code Generation for Fixed-Point Program
• Adder function in MATLAB

(a) Floating point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (S, WL, FWL determined by designers with trial and error)

    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);   % signed, WL = 32, FWL = 16
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;           % assign into c so it keeps its fixed-point type

(c) Converted fixed-point program for automating optimization (Proposed)

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);   % numeric types are passed in by the optimizer
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;

fi(a, S, WL, FWL) is a constructor function for a fixed-point object in fixed-point toolbox [S: Signed, WL: Wordlength, FWL: Fraction length]
33
Automating Transformation Environment for Wordlength Optimization
• Given floating-point program and options, auxiliary programs are automatically generated
• Given input data, optimum wordlength is searched
[Figure: environment block diagram with blocks labeled floating-point program, fixed-point program, range estimation, top program, search engine (gradient-based or genetic algorithm), and evaluation program (objectives) with error estimation and complexity estimation; input data enters the environment and the optimum wordlength is produced.]
34
Demo of Released Software
35
Conclusion
• Search for optimum wordlength
  - Gradient-based search reduces execution time with complexity-and-distortion measure method, while solutions could be trapped in local optimum
  - Genetic algorithm can find distortion vs. complexity tradeoff curve, but it requires longer execution time
• Reduce power consumption by data wordlength reduction of multiplicands
• Automate transformations from floating-point programs to fixed-point programs
  - Free software release is available at www.ece.utexas.edu/~bevans/projects/wordlength/converter/
36
End