View
218
Download
0
Category
Preview:
Citation preview
1
Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders
Chung-Kuan ChengComputer Science and Engineering
Depart. University of California, San Diego
2
Outline Prefix Adder Problem
Background & Previous Work Extensions: High-radix, Ling
Our Work Area/Timing/Power Models Mixed-Radix (2,3,4) Adders ILP Formulation
Experimental Results Future Work
Prefix Adder – Challenges
Logical Levels
Wire Tracks
Fanouts
Area
Physical placement
Detail routing
New Design Scope
Timing
Gate Cap
Wire Cap
Gate sizingBuffer insertion
Signal slope
Input arrival time
Output require time
Increasing impact of physical design and concern of power.
Power
Static power
Dynamic power
Power gating
Activity Probability
4
Binary Addition Input: two n-bit binary numbers
and , one bit carry-in Output: n-bit sum and one bit
carry out Prefix Addition: Carry generation &
propagation
011... aaan
011... bbbn 0c
011... sssn
nc
)(
:Propagate:Generate
1
iiii
iiii
iii
iii
bacscpgc
bapbag
5
Prefix Addition – Formulation
iiiiii bapbag Pre-processing:
Post-processing:
Prefix Computation:
iii
iii
cps
cPGc
0]0:[]0:[1
]:1[]:[]:[
]:1[]:[]:[]:[
kjjiki
kjjijiki
PPP
GPGG
6
1234
12:13:14:1
Prefix Adder – Prefix Structure Graph
gpi
pi
G[i:0]
si
biai
GP[i, j] GP[j-1, k]
GP[i, k]
gp generator
sum generator
GP cell
Pre-processing
Post-processing
Prefix Computation
7
Previous Works – Classical prefix adders
12345678
12:13:14:16:17:18:1 5:1
Brent-Kung:Logical levels: 2log2n–1 Max fanouts: 2Wire tracks: 1
12345678
12:13:14:16:17:18:1 5:1
Kogge-Stone:Logical levels: log2nMax fanouts: 2Wire tracks: n/2
12345678
12:13:14:16:17:18:1 5:1
Sklansky:Logical levels: log2nMax fanouts: n/2Wire tracks: 1
8
High-Radix Adders Each cell has more than two fan-
in’s Pros: less logic levels
6 levels (radix-2) vs. 3 levels (radix-4) for 64-bit addition
Cons: larger delay and power in each cell
10
Ling Adders
iiiiii bapbag Pre-processing:
Post-processing:
Prefix Computation:
iii
iii
cps
cPGc
0]0:1[]0:1[
]:1[]:[]:[
]:1[]:[]:[]:[
kjjiki
kjjijiki
PPP
GPGG
Prefix Ling
* *[ 1:0] [ 1:0] 1( )i i i i i is G t G t p
,i i i i i i
i i i
g a b p a bt a b
1*
]1:[1*
]1:[ , iiiiiiii ppPggG* * * *[ : ] [ : ] [ 1: 1] [ 1: ]i k i j i j j kG G P G
*]:1[
*]:[
*]:[ kjjiki PPP
*]0:1[1 iii Gpc
12
Area Model Distinguish physical placement from logical
structure, but keep the bit-slice structure.
Logical view Physical view
Bit position
Logical level
Bit position
Physical level
Compact placement
12345678 12345678
13
Timing Model Cell delay calculation:
pfd Effort Delay Intrinsic Delay
hgf
Logical Effort Electrical Effort = Cout/Cin= (fanouts+wirelength) / size
Intrinsic properties of the cell
14
Power Model Total power consumption:
Dynamic power + Static Power Static power: leakage current of device
Psta = *#cells Dynamic power: current switching
capacitancePdyn = Cload
is the switching probability = j (j is the logical level*)
cellsCjPPP loadstadyntotal # * Vanichayobon S, etc, “Power-speed Trade-off in Parallel Prefix Circuits”
15
ILP Formulation OverviewStructure variables: •GP cells•Connections (wires)•Physical positions
Capacitance variables: •Gate cap•Vertical wire cap•Horizontal wire cap
Timing variables: •Input arrival time•Output arrival time
Power Objective
ILPILOG CPLEX
Optimal Solution
16
Integer Linear Programming (ILP) ILP: Linear Programming with integer
variables. Difficulties and techniques:
Constraints are not linear Linearize using pseudo linear constraints
Search Space too large Reduce search space
Search is slow Add redundant constraints to speedup
17
ILP – Integer Linear Programming
Linear Programming: linear constraints, linear objective, fractional variables.
Integer Linear Programming: Linear Programming with integer variables.
LP Optimal
Constraints
ILP Optimal
ILP – Pseudo-Linear Constraint
Minimize: x3
Subject to: x1 300 x2 500 x3 = min(x1, x2)
Minimize: x3
Subject to: x1 300 x2 500 x3 x1
x3 x2
x3 x1 – 1000 b1 (1)
x3 x2 – 1000 (1 – b1) (2) b1 is binary
Problem: ILP formulation:
LP objective: 0ILP objective: 300
• A constraint is called pseudo-linear if it’s not effective until some integer variables are fixed.
• Pseudo-linear constraints mostly arise from IF/ELSE scenarios• binary decision variables are introduced to indicate true or false.
19
ILP Solver Search Procedure
0Root (all vars are fractional)
b1
b2
b3
b4
2
‘0’ ‘1’infeasible
‘0’ ‘1’
3feasible
(current best)
infeasible
Minimize F(b1, b2, b3, b4, f1,…) bi is binary
It is VERY helpful if ILP objective is close to LP objective
2
‘0’ ‘1’
4
‘0’ ‘1’
3 2
4
55‘0’ ‘1’
Cut
2 Bound (Smallest candidate)
Interval Adjacency Constraint
H1H2H3H4H5H6H7H8
12345678
(7,3): Interval [7,1]
(3,2): Interval [3,1]
(7,2): Interval [7,4]
Must be adjacent,i.e. 4 = 3 + 1
(column id, logic level)
21
Linearization for Interval Adjacency Constraint
(i, j)
(i, h) (k1, l1) (k2, l2)
wl wr1 wr2
],[ ),(),(R
hiL
hi yy ],[ )1,1()1,1(R
lkL
lk yy ],[ )2,2()2,2(R
lkL
lk yy
],[ ),(),(R
jiL
ji yy
11 if 1),(),( (i,j,k,l) wrwl(i,j,h) yy Llk
Rhi
1 if 1),,,(1),(
),( wl(i,j,h) lkjiwrkylk
Rhi
11 ),,,(1),(
),( wl(i,j,h))(nlkjiwrkylk
Rhi
11 ),,,(1),(
),( wl(i,j,h))(nlkjiwrkylk
Rhi
iyLji ),(
Linearize
Pseudo Linear
Left interval bound equal to column index
22
Search Space Reduction Ling’s adder:
separate odd and even bits
Double the bit-width we are able to search H1H2H3H4H5H6H7H8
12345678
23
Redundant Constraints Cell (i,j) is known to have logic level j before
wire connection Assume load is MinLoad (fanout=1 with
minimum wire length):
MinLoadjP ji ),(
Cell (i,j) has a path of length j-1 Assume each cell along the path has MinLoad
)(),( MinLoadLEPDjT ji
26
Min-Power Radix-2 Adder (delay= 22, power = 45.5FO4 )
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
27
Min-Power Radix-2&4 Adder (delay=18, power = 29.75FO4 )
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
Radix-2 Cell Radix-4 Cell
28
Min-Power Mixed-Radix Adder (delay=20, power = 28.0FO4)
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
Radix-2 Cell Radix-4 Cell Radix-3 Cell
29
Experiments – 16-bit Non-uniform Time (Mixed Radix)
ILP is able to handle non-uniform timingsLing adders are most superior in increasing arrival time– faster carries
36
Experiments – 64-bit Hierarchical Structure (Mixed-Radix) Handle high bit-width applications 16x4 and 8x8
ILP Block ILP Block ILP Block ILP Block
ILP Block
a1b1a16b16a17b17a32b32a33b33a48b48a49b49a64b64
…... …... …... …...
Level 1
Level 2
…... …... …... …...
…... …... …... …...
GP*[64:50]GP*[48:34] GP*[32:18] GP*[16:2]
GP*[1:1]GP*[17:17]GP*[33:33]GP*[49:49]
…... …... …...H64 H49 H48 H33 H32 H17 H16 H1
37
Experiments – 64-bit Hierarchical Structure
TSL: a 64-bit high-radix three-stage Ling adderV. Oklobdzija and B. Zeydel, “Energy-Delay Characteristics of CMOS Adders”,in High-Performance Energy-Efficient Microprocessor Design, pp. 147-170, 2006
38
ASIC Implementation - Results 64-bit hierarchical design (mixed-radix)
by ILP vs. fast carry look-ahead adder by Synopsys Design Compiler
TSMC 90nm standard cell library was used
39
Future Work ILP formulation improvement
Expected to handle 32 or 64 bit applications without hierarchical scheme
Optimizing other computer arithmetic modules Comparator, Multiplier
Recommended