23
Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom Integrated Circuits Conference September 21, 2011 1

Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Y. Shakhsheer, S. Khanna, K. Craig,

S. Arrabi, J. Lach, and B. H. Calhoun

University of Virginia, Charlottesville, VA

Custom Integrated Circuits Conference

September 21, 2011

1

Page 2: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Motivation

2

Portable applications require extended battery life & small form factor

Electronics need high performance for a fraction of their life

High performance struggles with power density

Page 3: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Classic Dynamic Voltage Scaling (DVS)

Adjust VDD to adjust speed to match workload

Usually chip or core level

Use DC-DC converters to adjust VDD = Slow, infrequent VDD transitions

Constrained by block with highest workload

Need for finer granularity in space and time

3

Page 4: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Panoptic DVS (PDVS) Structure

4

Single-VDD All components share one VDD

Multi-VDD Each component statically

tied to a VDD

PDVS PMOS header switches used

to select a specific VDD

Small set of voltage rails (2-4)

Uses common components

Fine temporal granularity

Fine spatial granularity

[1] Putic, M. et al., ICCD, pp.491-497, 01/10/2009.[2] Di, L. et al., ICCD, pp.605-611, 08/2008.

Page 5: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Contributions

5

This work:

Demonstrates DSP Processor using PDVS

Demonstrates single clock cycle VDD-switching & VDD dithering for near optimal energy scalability

Demonstrates switch efficiently between high performance DVS and subthreshold modes.

Demonstrates energy savings compared to single-VDD and multi-VDD alternatives.

Page 6: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Outline

Chip Architecture

Measured Results

Subthreshold Mode

Conclusion

6

Page 7: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Chip Architecture

7

32b Data Flow Processor• 4 Kogge Stone Adders• 4 Baugh Wooley Multipliers• PMOS Header • Level Converters

Execute arbitrary data flow graphs (DFGs)

32 Kb Data Memory

40 Kb Instruction Memory

Register File

160

32kb Data

Memory

40 kb Instruction

Memory

Control

VDDH VDDM VDDL

*

x4

LC

VDDH VDDM VDDL

+

x4

x8 General Purpose

32b

Coefficients x15

32b

Register Bank

Cro

ssbar

32

PDVS data path

Multi-VDD data path

Single-VDD data path

Sub-VT PDVS data path

VDDH VDDM VDDL

+

+

+

VDDH

+

+

e.g.

e.g.

Fig. 2. Block diagram of the PDVS data flow processor. SRAMs and control

serve four data paths for direct comparison of PDVS with SVDD & MVDD.

Page 8: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

2-stage Pipelined SRAM

8

During the first phase of a conventional SRAM read access when row decode occurs, the SA lies idle

In our scheme, in the first cycle, row decode and BL droop development is completed

At the beginning of the second cycle, SA enable signal occurs

Col Mux acts as pipeline register

Helps lower the cycle time by TSA

23% reduction of the cycle time

WL

SAE

WL

BL

TDEC

TBC

TSA

Col Mux

WL

Drivers

SAs

CLK

WL

SAE

SA Output

BL

TDEC

TBC

TSA

C1 C2

Page 9: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Chip Vitals

9

PDVS MVDD Sub VT SVDD

Inst

Memory

Data

Memory

VCO & Inst

Block

3.3mm

Mult

Adder

Headers

for the

mult

Headers

for the

adder

4.3mm

Feature This Chip

Process 90nm CMOS Bulk w/ Dual VT

Area 4.3mm x 3.3mm

TransistorCount ~2 million

VDD 250mV – 1.2V

SRAMs 40kb & 32kb

Page 10: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Outline

Chip Architecture

Measured Results

Subthreshold Mode

Conclusion

10

Page 11: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Dithering

11

Page 12: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

System Results

12

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

SVDD

MVDD

PDVS

Measu

red

En

ergy

Savin

gs

Measu

red

En

ergy

Savin

gs

Norm

ali

zed

En

ergy

Workload

Page 13: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

System Results Cont’d

13

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

SVDD

MVDD

PDVS

Measu

red

En

ergy

Savin

gs

Measu

red

En

ergy

Savin

gs

Norm

ali

zed

En

ergy

Workload

Page 14: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

System Results Cont’d

14

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

SVDD

MVDD

PDVS

Introducing slack increases savingsM

easu

red

En

ergy

Savin

gs

Are

a S

avin

gs

Norm

ali

zed

En

ergy

Workload

Page 15: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Area Savings vs MVDD

15

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

SVDD

MVDD

PDVS

PDVS area savings are a result of reducing the number of copies

Measu

red

En

ergy

Savin

gs

Are

a S

avin

gs

Norm

ali

zed

En

ergy

Workload

Page 16: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Measured Dithering

16

Time

Page 17: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Overheads

17

32b Adder 32b Mult.

Header Area

2.4% 1.7%

Level Conv. Delay

32.0% 2.0%

Level Conv. Energy

8.0% 0.3%

Level Conv. Area

11.4% 2.1%

Sw. Delay 10.4% 12.0%

Sw. Energy 215.3% 35.0%

BreakevenCycles (NBE)

< 4 < 1

switch

LowHigh

BEE

EEN

Overh

ead

vs

1 M

UL

T @

VD

DH

VDDL (V)

Overh

ead

vs

1 M

UL

T @

VD

DH

VDDL (V)

Multiplier level converter

Multiplier switching

Page 18: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Outline

Chip Architecture

Measured Results

Subthreshold Mode

Conclusion

18

Page 19: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Subthreshold Operation

19

Chip provides subthreshold mode E.g. VDDH/M/L @ 1V, 0.5V 0.25V

Minimum energy operating point

50 1000.2

0.4

0.6

0.8

1

1.2

Vir

tual V

DD

(V)

Time

Page 20: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Subthreshold Circuit Optimizations

20

Added PMOS headers to: Register bank

Crossbar

Tied circuit & Sub-VT

header body connection to Virtual VDD

Optimized level converter for subthreshold Converts from 0.25 up to 1.0V

Added bypass for these level converters

Out

Bypass

Bypass

In

VDDH VDDM VSUBVT

32b

Register Bank

Cro

ssb

ar

VDDH VSUBVT

Sub-VT data path

components

VSUBVT

VDDH

High VT

Out

Bypass

Bypass

In

VDDH VDDM VSUBVT

32b

Register Bank

Cro

ssb

ar

VDDH VSUBVT

Sub-VT data path

components

VSUBVT

VDDH

High VT

Page 21: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Outline

Chip Architecture

Measured Results

Subthreshold Mode

Conclusion

21

Page 22: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Conclusion

22

First processor using PDVS Demonstrate single clock

cycle VDD-switching & VDDdithering for near optimal energy scalability

Switch efficiently between high performance DVS and subthreshold modes.

Energy savings up to 50% and 46% compared to single-VDD and multi-VDD alternatives.

FeatureHoward

ISSCC10

Truong

VLSI08

Nam

ISSCC07

This

work

VDD

Granularity6 cores 1 core 1 core

Add,

Mult

Speed of

VDD change

>10µs

(e.g.[4])2-5ns

>10µs

(e.g. [4])<2ns

VDD

ditheringNo No No Yes

Sub-

thresholdNo No No Yes

[1] J. Howard et al., ISSCC , pp. 22-33, 2010.[2] D. Truong et al., Symp. VLSI, pp.22-23, 2008.[3] B. Nam et al., ISSCC, pp.278-603, 2007.[4] C. Zheng and D. Ma, ISSCC, pp.204-205, 2010.

Page 23: Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach ...Y. Shakhsheer, S. Khanna, K. Craig, S. Arrabi, J. Lach, and B. H. Calhoun University of Virginia, Charlottesville, VA Custom

Thank you!

Any Questions?

23