Word-Size Optimization for Low Energy, Variable Workload Sub-threshold Systems Sudhanshu Khanna, Anurag Nigam ECE 632 – Fall 2008 University of Virginia <sk4fs, an2z>@virginia.edu


Page 1: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Word-Size Optimization for Low Energy, Variable Workload Sub-threshold Systems

Sudhanshu Khanna, Anurag Nigam

ECE 632 – Fall 2008
University of Virginia

<sk4fs, an2z>@virginia.edu

Page 2: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Introduction

– Energy constrained Sub-Vt systems
  • Medical devices
  • Environmental sensors

– Need to lower E in order to enable “lifelong” operation

– SMALL “FORM-FACTOR” => Area Reduction

– Total E = Active E + Sleep E

Page 3: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Top Level Problems Addressed

• Energy Reduction
  – Active
  – Sleep Mode

• Area Reduction

• Adaptation of Super-threshold designs to sub-threshold

Page 4: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Current Approaches

Voltage Regulated from THIS off-chip, (expensive) DC-DC converter

Ref: K.Craig, R.Matthews, EE632 Fall 2008

Page 5: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Our approach

Make the “starting point” design more E-efficient, Specifically for Sleep Mode operation

Page 6: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Sure way of lowering CV^2: Lower V => Sub-threshold

[Figure: the same logic system operated at 1.2 V and at 0.2 V]

Can we optimize the logic system for sub-Vt operation, or should it be the same?

Page 7: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Sure way of lowering CV^2: Lower V => Sub-threshold

[Figure: logic system at 1.2 V replaced by a smaller logic system at 0.2 V]

Make the system as small as feasible.

Use it over and over until the required operation is done.

Then go to sleep and leak less!

How do we make the system smaller? USE A SMALLER WORD-SIZE

Will using the SMALL system over and over increase the ACTIVE energy?

Page 8: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Smaller Word-Size: Problems Addressed

• For sure, small word-size means:
  – Lower Area
  – Lower Sleep Energy
  – Higher Delay

• We need to find:
  – How much is the Area/Sleep E benefit?
  – Impact of multi-cycle operation on Active E?
  – Can we somehow make them faster without losing the Sleep E and Area advantage?

Page 9: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Smaller Word-Size: Our Contribution

• For sure, small word-size means:
  – Lower Area
  – Lower Sleep Energy
  – Higher Delay

• We need to find:
  – How much is the Area/Sleep E benefit?
    > 20x area benefit
    > 33x sleep energy benefit
  – Impact of multi-cycle operation on Active E?
    Multi-cycle operation increases Active E, but the final value of the Active E is about the same as or lower than that of a 32-bit system.
  – Can we somehow make them faster without losing the Sleep E and Area advantage?
    Yes, delay degradation can be overcome while still being more energy efficient!

Page 10: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Systems Compared

• Addition of two 32-bit numbers using:
  – Large word-size (32-bit)
    • Kogge-Stone Adder
    • Ripple Carry Adder
    • Full-Adder
  – Small word-size (1-bit)
    • 1-bit is taken for simplicity; the trends would be valid for other word-sizes, e.g. 16-bit, 8-bit etc.

• Addition is taken as a sample digital function. However, the trends found can be generalized to other digital functions as well.

Page 11: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

32-bit Kogge-Stone Adder (KSA), 32-bit Ripple Carry Adder (RCA)

[Figure: two 32-bit registers holding the parallel inputs PA and PB feed a 32-bit KSA or RCA; the 32-bit result is stored in a third 32-bit register and read at OUT. Control signals shown: CLK, Reset]

PA = Parallel input A

PB = Parallel input B

OUT = Parallel output from Sum Register

Page 12: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Small-Word Size system

Let the smaller word-size be n (n < 32). Then the system will look like this:

[Figure: n-bit full adder with three n-bit registers, clocked by CLK]

Just like a 32-bit system, but only smaller!

In general, an n-bit word system will have n-bit operands.

In case n = 1, the system will take 32 clock cycles to add two 32-bit numbers. Hence the higher delay.

[Figure: 1-bit Serial Adder (SA), the n = 1 case: 1-bit full adder with three 1-bit registers, clocked by CLK]
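The cycle count above can be checked with a small behavioral model. This is only a sketch, not the simulated circuit: the function name `word_serial_add` is hypothetical, and it assumes the carry out of each n-bit step is held over to the next clock cycle (as a carry flip-flop would do).

```python
def word_serial_add(a, b, total_bits=32, word_bits=1):
    """Add two unsigned total_bits-wide integers, word_bits at a time.

    Returns (sum modulo 2**total_bits, final carry, clock cycles used).
    """
    if total_bits % word_bits != 0:
        raise ValueError("total_bits must be a multiple of word_bits")
    mask = (1 << word_bits) - 1
    carry = 0    # assumed carry storage between cycles
    result = 0
    cycles = 0
    for shift in range(0, total_bits, word_bits):
        a_word = (a >> shift) & mask
        b_word = (b >> shift) & mask
        s = a_word + b_word + carry          # one pass through the n-bit adder
        result |= (s & mask) << shift        # store the n sum bits
        carry = s >> word_bits               # carry into the next cycle
        cycles += 1
    return result, carry, cycles


for n in (1, 8, 32):
    total, carry_out, cycles = word_serial_add(123456789, 987654321, word_bits=n)
    print(f"n={n:2d}: sum={total}, carry={carry_out}, cycles={cycles}")
```

With n = 1 the loop runs 32 times, which is the delay penalty the slide refers to; with n = 32 it runs once.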

Page 13: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

[Figure: A conceptual fully-serial 1-bit system. A serial ADC (analog input), 1-bit registers, a 1-bit full adder, a serial multiplier and a serial DAC (analog output) are clocked by CLK and take 1-bit inputs from other parts of the chip. The simulated 1-bit SA is marked within the system]

Page 14: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

32-bit Serial Adder (SA) using Full-Adder

[Figure: two 32-bit shift registers loaded with the parallel inputs PA and PB feed a 1-bit full adder, 1 bit at a time; a carry flip-flop connects Cout back to Cin; the 1-bit sum goes to a third 32-bit shift register (OUT). All are clocked by CLK]

Regular 32-bit word system, but the parallel adder is replaced by a 1-bit full adder => LOWER SLEEP ENERGY

Takes 32 cycles but is amenable for use in an unmodified 32-bit word system

Page 15: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Important Metric: Energy per operation

• Energy drawn for addition of two 32-bit numbers is measured for all 4 systems:
  – Large word-size systems: 32-bit KSA, 32-bit RCA, 32-bit SA
  – Small word-size system: 1-bit SA

• Clock and register power taken into account

Page 16: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Active Energy @ VDD = 300mV

[Figure: stacked bar chart of active energy E (pJ), split into Elkg and Edyn, for the 1-bit SA, 32-bit KSA, 32-bit RCA and 32-bit SA, each at 90nm and 22nm. Annotation: HIGH Edyn ~ Etot ~ 6pJ, but leakage current is 1.7x lower]

Shows that active energy of 1-bit system < 32-bit systems

40% active energy benefit @ 22nm

33x reduction in leakage current (note that the plot above shows only active energy)

Page 17: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Conclusions @ 300mV

• 1-bit SA has 40% lower active E than the best 32-bit system

• 1-bit SA has 33x lower leakage current than the best 32-bit system

• 32-bit SA has 1.7x lower leakage current than 32-bit KSA

Thus multi-cycle operation doesn’t increase active energy too much

Hence once sleep time is added, benefits of small-word systems will increase

=> if word-size is limited to 32, serial addition will save energy if the application has a lot of sleep time, e.g. in sensor nodes
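To see how sleep time changes the comparison, Total E = Active E + Sleep E can be evaluated for a range of sleep durations. The sketch below uses the ratios from the slides (40% lower active energy, 33x lower leakage) but the absolute values for the 32-bit baseline are hypothetical placeholders, and it assumes sleep power scales directly with the leakage current.

```python
def energy_per_operation(active_pj, sleep_power_nw, sleep_ms):
    """Total E = Active E + Sleep E, in pJ (1 nW for 1 ms is 1 pJ)."""
    return active_pj + sleep_power_nw * sleep_ms


# Hypothetical baseline for the best 32-bit system (not measured values).
ACTIVE_32_PJ = 1.0
SLEEP_POWER_32_NW = 1.0

# Ratios taken from the conclusions at 300 mV.
ACTIVE_1_PJ = 0.6 * ACTIVE_32_PJ             # 40% lower active energy
SLEEP_POWER_1_NW = SLEEP_POWER_32_NW / 33.0  # assumed: sleep power follows leakage


def benefit_ratio(sleep_ms):
    """Energy of the 32-bit system divided by energy of the 1-bit system."""
    e32 = energy_per_operation(ACTIVE_32_PJ, SLEEP_POWER_32_NW, sleep_ms)
    e1 = energy_per_operation(ACTIVE_1_PJ, SLEEP_POWER_1_NW, sleep_ms)
    return e32 / e1


for sleep_ms in (0, 1, 10, 100, 1000):
    print(f"sleep = {sleep_ms:5d} ms -> 32-bit / 1-bit energy = {benefit_ratio(sleep_ms):.2f}")
```

Under these assumptions the ratio starts near the active-energy advantage alone and grows toward the leakage ratio as the sleep time per operation increases.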

Page 18: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

VDD increases => delay decreases
• Can be used to make small-word size systems faster!
• But what is the impact of the VDD increase on energy?

[Figure: logic system at 1.2 V and at 0.2 V; small-word logic system at 0.2 V (already compared) and at 0.4 V]

Page 19: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Energy @ constant delay

• Delay is equal

• Now we compare energy at constant delay

Small word-size more energy efficient even after the VDD increase

But the margins of energy benefits do go down

The same is not true in super-Vt ! WHY???

Difference in On-Current Equation in super-Vt and sub-Vt

[Figure: logic system at 0.2 V compared with small-word logic system at 0.4 V]
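The slide does not spell out the two on-current equations. The commonly used first-order forms are given below as a reference; the symbols (I_0, n, v_th, alpha, K, C) are assumptions introduced here and do not come from the slides.

```latex
% Sub-threshold (V_{DD} < V_T): on-current is exponential in the supply voltage
I_{on}^{sub} \approx I_0 \exp\!\left(\frac{V_{DD} - V_T}{n\,v_{th}}\right),
\qquad v_{th} = \frac{kT}{q}

% Super-threshold (V_{DD} > V_T): alpha-power law
I_{on}^{super} \propto (V_{DD} - V_T)^{\alpha}, \qquad 1 \le \alpha \le 2

% Gate delay in either regime
t_d \approx \frac{K\,C\,V_{DD}}{I_{on}}
```

Because the sub-threshold current is exponential in VDD, a small supply increase recovers a large amount of delay, whereas in super-threshold the same increase gives only a polynomial improvement.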

Page 20: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

[Figure: two log-scale plots against VDD (in V, 0 to 1.8): Delay (in ps) and Energy (in pJ). Each plot marks the sub-Vt and super-Vt regions and a SMALL SLOPE and a LARGE SLOPE segment]

VDD change => no impact on E !!
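The difference in delay sensitivity between the two regions can be illustrated numerically. This is a rough sketch with hypothetical device parameters, not a fit to the plotted data: it assumes an exponential on-current below threshold, an alpha-power-law on-current above threshold, and a delay proportional to VDD / Ion.

```python
import math

# Hypothetical device parameters, chosen only for illustration.
V_T = 0.45         # threshold voltage (V)
N_SLOPE = 1.5      # sub-threshold slope factor
V_THERMAL = 0.026  # thermal voltage kT/q near room temperature (V)
ALPHA = 1.3        # alpha-power-law exponent


def subvt_delay(vdd):
    """Relative delay for vdd below V_T (arbitrary units)."""
    i_on = math.exp((vdd - V_T) / (N_SLOPE * V_THERMAL))
    return vdd / i_on


def supervt_delay(vdd):
    """Relative delay for vdd above V_T (arbitrary units)."""
    i_on = (vdd - V_T) ** ALPHA
    return vdd / i_on


# Speedup obtained from the same 0.1 V supply increase in each region.
subvt_speedup = subvt_delay(0.2) / subvt_delay(0.3)
supervt_speedup = supervt_delay(1.0) / supervt_delay(1.1)

print(f"sub-Vt   0.2 V -> 0.3 V speedup: {subvt_speedup:.2f}x")
print(f"super-Vt 1.0 V -> 1.1 V speedup: {supervt_speedup:.2f}x")
```

The two delay functions use different arbitrary units, so only the speedup ratios within each region are meaningful, not a comparison of absolute values across regions.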

Page 21: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Pareto-Optimal E-D Curve

[Figure: log-log plot of Energy vs. Delay for the 32-bit KSA and the 1-bit SA, with the super-Vt and sub-Vt regions marked]

Super-Vt -> 32-bit system is Pareto-optimal

Sub-Vt -> 1-bit system is Pareto-optimal

Cross-over: 1-bit system becoming optimal
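A Pareto-optimal energy-delay curve keeps only the design points that no other point beats in both delay and energy. The sketch below shows one simple way to extract such a front from a list of (delay, energy, label) points; the function name and the sample points are hypothetical and are not the simulated data.

```python
def pareto_front(points):
    """Return the non-dominated (delay, energy, label) points, sorted by delay.

    A point is kept only if its energy is strictly lower than that of
    every point with a smaller or equal delay.
    """
    front = []
    best_energy = float("inf")
    for delay, energy, label in sorted(points):
        if energy < best_energy:
            front.append((delay, energy, label))
            best_energy = energy
    return front


# Hypothetical design points: (delay, energy, label), arbitrary units.
design_points = [
    (1e1, 1e6, "32-bit, high VDD"),
    (1e2, 1e5, "32-bit, mid VDD"),
    (3e2, 2e5, "1-bit, high VDD"),
    (1e4, 5e2, "32-bit, low VDD"),
    (8e3, 2e2, "1-bit, low VDD"),
]

for delay, energy, label in pareto_front(design_points):
    print(f"{label:18s} delay={delay:.0e} energy={energy:.0e}")
```

In this made-up data set the 32-bit points form the front at short delays and a 1-bit point takes over at long delays, which is the kind of cross-over the slide describes.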

Page 22: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Generality of Trends

• 1-bit system is used as an example. Energy and area benefits will be achieved in any small word-size system.

• Shift in the Pareto-optimal curve happens because of the difference in the Ion equation.

• Hence this behavior can be observed in other parts of a digital system as well, and not just addition.

Opens energy saving opportunities in more areas of digital design

Page 23: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Conclusions @ constant delay

• While going into sub-Vt operation, re-examine the word-size of the system being used.

• Optimal word-size goes down: small word-size gives lower E and area and matches delay.

[Figure: logic system at 0.2 V vs. small-word logic system at 0.4 V. Small word: energy less, leakage less, area ($$$) less, delay same]

Page 24: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

Different Word-Size Systems

• 1-bit (Digital Audio System – Sharp)

• 4-bit (Marc4 microcontroller, Intel 4040)

• 8-bit (Microcontrollers, Intel 8080 processor)

• 16-bit (Intel 8086 processor)

• 64-bit (Athlon 64, Opteron processor)

Page 25: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

FIR Filter

• Used in many real-time DSP systems (audio, video processing)

4-Tap FIR Filter

$$Y(n) = \sum_{i=0}^{3} K(i)\, X(n-i)$$

K(i): Filter Coefficients

• Serial Implementation of a Parallel FIR filter

Page 26: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

[Figure: 4-tap parallel FIR filter. X(n) passes through three delay elements to give X(n-1), X(n-2) and X(n-3); four multipliers scale the taps by K0, K1, K2 and K3; a 4-input parallel adder sums the products to give Y(n)]

K0, K1, K2, K3: Filter Coefficients, stored in memory
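The parallel structure computes one output sample as the sum of four tap products. A minimal behavioral sketch follows; the function name `fir_filter` is hypothetical, and samples before the start of the input are assumed to be zero.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: Y(n) = sum over i of K(i) * X(n - i)."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for i, k in enumerate(coeffs):
            if n - i >= 0:  # assumed: X(m) = 0 for m < 0
                acc += k * samples[n - i]
        output.append(acc)
    return output


# A unit impulse returns the coefficients themselves.
print(fir_filter([1, 0, 0, 0, 0], [1, 2, 3, 4]))
```

In the parallel hardware all four products are formed in the same cycle; the inner loop here only models the arithmetic, not the timing.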

Page 27: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

[Figure: serial FIR filter. Serial input data X(n) and the filter coefficients (K3, K2, K1, K0) from memory feed a serial-parallel multiplier, followed by a 1-bit serial adder and a register; Y(n) is the serial output]
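The arithmetic of a serial implementation can be sketched as a shift-and-add multiplier that consumes one input bit per step, with the products accumulated one tap at a time. This is a behavioral illustration only: it assumes unsigned operands and LSB-first input, the function names are hypothetical, and whether the actual design reuses a single multiplier across taps in this way is an assumption.

```python
def serial_parallel_multiply(x, k, x_bits=8):
    """Multiply serial operand x (one bit per step, LSB first) by parallel operand k."""
    product = 0
    for bit_index in range(x_bits):
        x_bit = (x >> bit_index) & 1
        if x_bit:
            product += k << bit_index  # add the shifted coefficient
    return product


def serial_fir_output(history, coeffs, x_bits=8):
    """Compute one Y(n), where history[i] holds X(n - i), one tap at a time."""
    acc = 0  # models the accumulating register
    for x_val, k in zip(history, coeffs):
        acc += serial_parallel_multiply(x_val, k, x_bits)
    return acc


# history = [X(n), X(n-1), X(n-2), X(n-3)], coeffs = [K0, K1, K2, K3]
print(serial_fir_output([4, 3, 2, 1], [1, 2, 3, 4]))
```

The result matches the parallel sum of products; the serial version trades x_bits steps per tap for much less hardware.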

Page 28: Word-Size Optimization for  Low Energy, Variable Workload  Sub-threshold Systems

QUESTIONS