EE 587 SoC Design & Test


1

EE 587 SoC Design & Test

Partha Pande
School of EECS
Washington State University
pande@eecs.wsu.edu

2

System Design Issues

3

Low Energy FPGA Architecture

• Architectural level optimization
– Level 0 – Nearest Neighbor
– Level 1 – Mesh
– Level 2 – Hierarchical

4

Different Architectures

5

Paths in Interconnect

• Connection may be long, complex:

[Figure: grid of logic elements (LEs) with horizontal and vertical wiring channels between them]

6

Interconnect Architecture

• Connections from wiring channels to LEs.
• Connections between wires in the wiring channels.

[Figure: two LEs connected to a wiring channel]

7

Switchbox

[Figure: switchbox at the intersection of horizontal and vertical wiring channels]

8

Mesh-based Interconnect Network

• Switch Box: routes the data
• Connect Box: connects cell I/Os to the global interconnect
• Interconnect Point

Courtesy DeHon and Wawrzynek

9

Circuit Level Optimization

• The connecting path from one CLB to another is an RC chain
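The slide does not name a delay model for the RC chain. A common first-order estimate is the Elmore delay, sketched below under that assumption. The segment values (1 kOhm, 10 fF, 4 segments) are hypothetical and only illustrate how the delay grows with chain length.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) at the far end of an RC ladder.

    resistances[i]  : series resistance of segment i, in ohms
    capacitances[i] : capacitance at the node after segment i, in farads
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance between driver and this node
        delay += upstream_r * c  # this node's capacitance charges through it
    return delay


# Hypothetical chain: 4 identical segments of 1 kOhm and 10 fF.
n, r_seg, c_seg = 4, 1e3, 10e-15
print(elmore_delay([r_seg] * n, [c_seg] * n))
```

For n identical segments the sum is R·C·n(n+1)/2, so the estimated delay grows roughly quadratically with the number of segments on the path between two CLBs.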

10

Low Swing Interconnect

Mode         E (pJ)   D (ns)   ED (pJ·ns)
Full Swing   72.3     1.9      137
Low Swing    31.4     2.3      72
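The ED column is the product of the energy and delay columns. The short check below recomputes it from the table values; the function name is illustrative only.

```python
def energy_delay_product(energy_pj, delay_ns):
    """Energy-delay product in pJ*ns."""
    return energy_pj * delay_ns


# (energy in pJ, delay in ns), taken from the table above.
modes = {
    "Full Swing": (72.3, 1.9),
    "Low Swing": (31.4, 2.3),
}

for name, (energy, delay) in modes.items():
    print(name, round(energy_delay_product(energy, delay)))
```

Low swing is slower per transfer, but the energy saving outweighs the added delay, so its energy-delay product is roughly half that of full swing.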

11

Low Power SRAM Design

12

Memory Organization

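[Figure: memory array organization. Storage cells sit at the crossings of word lines and bit lines. Address bits A_K … A_(L-1) select one of 2^(L-K) word lines; address bits A_0 … A_(K-1) drive the column decoder, which selects among M·2^K columns. Sense amplifiers/drivers connect the columns to the input-output (M bits).]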
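The address labels in this organization imply a split of an L-bit address into K column-select bits (A_0 … A_(K-1)) and L-K row-select bits (A_K … A_(L-1)). The sketch below illustrates that split; the function names and the example sizes are hypothetical.

```python
def split_address(address, total_bits, column_bits):
    """Split an address into (row, column) indices.

    The low `column_bits` bits go to the column decoder,
    the remaining high bits select the word line.
    """
    if not 0 <= column_bits <= total_bits:
        raise ValueError("column_bits must be between 0 and total_bits")
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address out of range")
    column = address & ((1 << column_bits) - 1)
    row = address >> column_bits
    return row, column


def array_shape(total_bits, column_bits, word_bits):
    """Rows and columns of the cell array: 2^(L-K) by M * 2^K."""
    rows = 1 << (total_bits - column_bits)
    columns = word_bits * (1 << column_bits)
    return rows, columns


# Hypothetical memory: L = 10 address bits, K = 2, M = 8-bit words.
print(array_shape(10, 2, 8))
print(split_address(0b101101, 6, 2))
```

Choosing K trades array height against width: a larger K gives fewer, longer word lines and shorter bit lines.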

13

SRAM Cell

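[Figure: SRAM column. The cell connects to the complementary bit lines (BL and its complement) through the word line (WL); a precharge circuit (PC, EQ) ties the bit lines to VDD; a sense amplifier drives the output.]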

14

Cell Array Power Management

• Smaller transistors
• Low supply voltage
• Lower voltage swing (0.1 V – 0.3 V for SRAM)
– Sense amplifier restores the full voltage swing for outside use.

15

SRAM Cell Design

• A 6-transistor SRAM cell reduces static (leakage) current but takes more area
• Reducing Vth in very-low-Vdd SRAMs leads to large leakage current
• Use multiple-threshold devices:
– Memory cells with high Vth (reduce leakage)
– Peripheral circuits with low Vth (improve speed)
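To see why a higher Vth in the memory cells helps, a rough sketch can use the standard subthreshold model, in which leakage drops by one decade for every S millivolts of added threshold voltage. The slide gives no model or numbers, so the swing S = 100 mV/decade and the two Vth values below are assumptions chosen only for illustration.

```python
def leakage_ratio(vth_high, vth_low, swing_mv_per_decade=100.0):
    """Factor by which subthreshold leakage falls when Vth rises.

    Assumes leakage is proportional to 10 ** (-Vth / S), with S the
    subthreshold swing in mV per decade (assumed value, not from the slides).
    """
    delta_mv = (vth_high - vth_low) * 1000.0
    return 10 ** (delta_mv / swing_mv_per_decade)


# Hypothetical thresholds: 0.5 V in the cells, 0.3 V in the periphery.
print(leakage_ratio(0.5, 0.3))
```

Under these assumptions a 0.2 V higher threshold cuts cell leakage by about two orders of magnitude, which is why the slower high-Vth devices go in the large cell array and the fast low-Vth devices in the periphery.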

16

Banked Organization

• Banking reduces the total switched capacitance per access, which lowers power and improves speed

17

Divided Word Line

• Main idea: Divide each row of RAM cells into segments (blocks), use a decoder to access only one segment

• Only the memory cells in the activated block have their bit line pair driven

18

Divided Word Line

• Pros:
– Improves speed (by decreasing word line delay)
– Lowers power dissipation (by decreasing the number of bit line pairs activated)
• However, local decoders add delay
• Fewer cells per block reduces power, but increases area (more local decoders)
• Chang, 1997:
– 49.8% power reduction, 14.8% area penalty
– 82.9% power reduction, 24.8% area penalty
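The trade-off can be illustrated with a toy count of activated bit line pairs per access. This is a simplified sketch, not the model behind the Chang, 1997 figures: it ignores decoder power and wire capacitance, and the row width of 256 columns is a hypothetical value.

```python
def activated_bit_line_pairs(columns_per_row, num_blocks):
    """Bit line pairs driven per access when a row is split into equal blocks."""
    if num_blocks < 1 or columns_per_row % num_blocks:
        raise ValueError("num_blocks must evenly divide columns_per_row")
    return columns_per_row // num_blocks


# Hypothetical 256-column row split into 1, 2, 4 and 8 blocks.
for blocks in (1, 2, 4, 8):
    pairs = activated_bit_line_pairs(256, blocks)
    # One local decoder per block is the area cost of finer division.
    print(f"{blocks} blocks: {pairs} bit line pairs active, {blocks} local decoders per row")
```

The active bit line count falls as 1/blocks while the local decoder count rises linearly, which is the qualitative shape of the power and area numbers quoted above.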

19

Reduced Bit Line Swing

• Limit voltage swing on bit lines to improve both speed and power:

1. Pulsed word line

2. Bit line isolation

• Need sense amplifiers for each column to sense/restore signal
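As a rough illustration of the power benefit, the energy drawn from the supply to restore a bit line after it has swung by ΔV is approximately C_BL · VDD · ΔV. The capacitance, supply and swing values below are hypothetical.

```python
def bit_line_energy(c_bitline, vdd, swing):
    """Energy (joules) drawn from the supply to restore one bit line.

    c_bitline : bit line capacitance in farads
    vdd       : supply voltage in volts
    swing     : voltage swing on the bit line in volts
    """
    return c_bitline * vdd * swing


# Hypothetical values: 1 pF bit line, 1.8 V supply.
full = bit_line_energy(1e-12, 1.8, 1.8)  # full-swing bit line
low = bit_line_energy(1e-12, 1.8, 0.2)   # swing limited to 0.2 V
print(low / full)
```

In this first-order view the energy scales linearly with the swing, so limiting the swing to a fraction of VDD saves the same fraction of bit line energy, and the smaller swing also takes less time to develop.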

20

Pulsed Word Line

• Main idea: Isolate memory cells from the bit lines after sensing, to prevent the cells from changing the bit line voltage further

21

Pulsed Word Line

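[Figure: pulsed word line circuit. Blocks: word decoder, word driver, memory core with the accessed row, dummy column, and sense amplifiers (SA). A latch with inputs R and S and output Q carries the labels "Reset from dummy sense-amp" and "Word enable".]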

22

Pulsed Word Line

• The dummy bit lines reach full swing and trigger the pulse shut-off when the regular bit lines reach 10% swing

• Generation of the word line pulses is very critical

– Too long: power efficiency is degraded

– Too short: sense amplifier operation may fail

• Generating the word line pulse with delay lines is susceptible to process and temperature variations

23

Bit Line Isolation

• Main idea: Isolate the sense amplifiers from the bit lines after sensing, to prevent the bit lines from developing large voltage swings

24

Row Decoders

• Collection of 2^M complex logic gates
• Organized in a regular and dense fashion

(N)AND Decoder

NOR Decoder
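Both decoder styles compute the same one-hot function: for M address bits, exactly one of the 2^M word lines is asserted. The sketch below models only the logic, not the circuit or signal polarity, and the function names are illustrative.

```python
def and_decoder(address_bits):
    """One-hot word lines from address bits (least significant bit first).

    Word line `row` is the AND of M literals: the bit itself where the
    row index has a 1, and its complement where it has a 0.
    """
    m = len(address_bits)
    word_lines = []
    for row in range(1 << m):
        literals = [bit if (row >> i) & 1 else 1 - bit
                    for i, bit in enumerate(address_bits)]
        word_lines.append(int(all(literals)))
    return word_lines


def nor_decoder(address_bits):
    """Same function built as a NOR of the complemented literals (De Morgan)."""
    m = len(address_bits)
    word_lines = []
    for row in range(1 << m):
        inverted = [1 - bit if (row >> i) & 1 else bit
                    for i, bit in enumerate(address_bits)]
        word_lines.append(int(not any(inverted)))
    return word_lines


# Address 5 = binary 101, given LSB first.
print(and_decoder([1, 0, 1]))
print(nor_decoder([1, 0, 1]))
```

Each word line needs an M-input gate in this flat form, which is what makes the decoder large and slow for wide addresses.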

25

Hierarchical Decoders

[Figure: NAND decoder using 2-input pre-decoders. The address pairs (A0, A1) and (A2, A3) are each pre-decoded into their four true/complement combinations, and the pre-decoded lines are combined to drive the word lines WL0, WL1, …]

Multi-stage implementation improves performance
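A minimal sketch of the two-stage idea for a 4-bit address follows. It models the logic function only (active-high, no NAND polarity), and the function names are hypothetical.

```python
def predecode_pair(a_low, a_high):
    """One-hot decode of a 2-bit group; selected index is a_low + 2 * a_high."""
    selected = a_low + 2 * a_high
    return [int(i == selected) for i in range(4)]


def hierarchical_decode(a0, a1, a2, a3):
    """16 word lines, each a 2-input AND of one line from each pre-decoder."""
    low = predecode_pair(a0, a1)    # decodes A0, A1
    high = predecode_pair(a2, a3)   # decodes A2, A3
    return [low[k % 4] & high[k // 4] for k in range(16)]


# Address 5: A0 = 1, A1 = 0, A2 = 1, A3 = 0.
word_lines = hierarchical_decode(1, 0, 1, 0)
print(word_lines.index(1))
```

The final stage uses sixteen 2-input gates instead of sixteen 4-input gates, at the cost of eight small pre-decode gates that are shared by all word lines.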

26

Data Retention in SRAM

[Figure: leakage current I_leakage (A), axis from 100n to 1.30u, versus VDD, axis from 0.00 to 1.80, for 0.13 µm CMOS and 0.18 µm CMOS; the curves are annotated "Factor 7".]

SRAM leakage increases with technology scaling

27

Reducing Retention Current

• Turning off unused memory blocks
• Increasing the thresholds by using body biasing
• Inserting extra resistance in the leakage path
• Lowering the supply voltage

28

Suppressing Leakage in SRAM

[Figure: two leakage suppression circuits. Inserting extra resistance: the SRAM cells sit between internal rails VDD,int and VSS,int, which connect to the supplies through sleep-controlled transistors (low-threshold transistor). Reducing the supply voltage: a sleep signal switches the VDD,int rail of the SRAM cells between VDD and VDDL.]
