22
Improved Clock Improved Clock-Gating Control Scheme Gating Control Scheme for Transparent Pipeline for Transparent Pipeline 1 J.H. Choi J.H. Choi 1 , B.G. Kim , B.G. Kim 2 , A. Dasgupta , A. Dasgupta 3 , and K. Roy , and K. Roy 2 1 SoC Architecture Lab, DMC R&D Center, Samsung Electronics, Korea SoC Architecture Lab, DMC R&D Center, Samsung Electronics, Korea 2 Electrical and Computer Eng., Purdue University, W. Lafayette, IN, USA Electrical and Computer Eng., Purdue University, W. Lafayette, IN, USA 3 SoC Enabling Group, Intel Corporation, Austin, TX, USA SoC Enabling Group, Intel Corporation, Austin, TX, USA

Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent Pipelinefor Transparent Pipeline

1

J.H. ChoiJ.H. Choi11, B.G. Kim, B.G. Kim22, A. Dasgupta, A. Dasgupta33, and K. Roy, and K. Roy22

11SoC Architecture Lab, DMC R&D Center, Samsung Electronics, KoreaSoC Architecture Lab, DMC R&D Center, Samsung Electronics, Korea22Electrical and Computer Eng., Purdue University, W. Lafayette, IN, USAElectrical and Computer Eng., Purdue University, W. Lafayette, IN, USA

33SoC Enabling Group, Intel Corporation, Austin, TX, USASoC Enabling Group, Intel Corporation, Austin, TX, USA

Page 2: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

OutlineOutline

�� IntroductionIntroduction�� Previous WorksPrevious Works�� Proposed ApproachProposed Approach�� Simulation Results Simulation Results �� ConclusionConclusion

2

Page 3: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

IntroductionIntroduction

�� What is the clockWhat is the clock--gating?gating?�� Reduce power by blocking clock pulses to inactive Reduce power by blocking clock pulses to inactive

logic blockslogic blocks�� Used for dynamic power reductionUsed for dynamic power reduction

Applicable in different levels of design hierarchy Applicable in different levels of design hierarchy �� Applicable in different levels of design hierarchy Applicable in different levels of design hierarchy from the individual register level to the system levelfrom the individual register level to the system level

3

Circuit BlockCLK

EN_CLK

Gated_CLK

Page 4: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

StageStage--Level ClockLevel Clock--GatingGating

�� Traditional stageTraditional stage--level clocklevel clock--gatinggating

Input Output

i-th stage n-th stage

FF

EN0 EN1 ENi ENn

4

�� ClockClock--gating registergating register�� EN : clock enable signalEN : clock enable signal�� Clocked whenever Valid=1Clocked whenever Valid=1

FFValid0 Valid1 Validi Validn

FFFF FF

D Q

CLK

EN

Clock-gating Logic

D_out

CLK

D_in

latch

N

gCLKEN Operation

0 Hold (clock-gated)

1 Normal operation

Page 5: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

StageStage--Level ClockLevel Clock--Gating (cont.)Gating (cont.)

�� Traditional stageTraditional stage--level clocklevel clock--gatinggating

��Valid=1 when new data Valid=1 when new data arrives at the first stage.arrives at the first stage.

Input Output

FF FF FF FF FFValid

EN

Time

5

Valid 1 0 0 0 0

Clock C U U U U

Valid 0 1 0 0 0

Clock U C U U U

Valid 0 0 1 0 0

Clock U U C U U

Valid 0 0 0 1 0

Clock U U U C U

Valid 0 0 0 0 1

Clock U U U U C [ C: clocked, U: clock-gated (hold) ]

arrives at the first stage.arrives at the first stage.

��Each stage is clocked Each stage is clocked when its Valid(=EN) is 1.when its Valid(=EN) is 1.

��Each stage is clocked Each stage is clocked one time for one data set.one time for one data set.

Time

(1)

(2)

(3)

(4)

(5)

Page 6: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Transparent PipelineTransparent Pipeline

�� Transparent pipelineTransparent pipeline�� Save clock power by dynamically making registers transparentSave clock power by dynamically making registers transparent�� Intermediate stages are replaced with transparent stages which Intermediate stages are replaced with transparent stages which

can selectively bypass data.can selectively bypass data.

Input Output

[Jacobson, ISLPED’04]

6

�� Example: the 2Example: the 2ndnd stage bypass the data A driven by the 1stage bypass the data A driven by the 1stst stage.stage.

EN0 EN1 ENi ENn

A-

1st 2nd 3rd

A

1st 2nd 3rd

Page 7: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Transparent Pipeline (cont.)Transparent Pipeline (cont.)

Valid 1 0 0 0 0Clock C U U U UValid 0 1 0 0 0Clock U C U U U

Valid 1 0 0 0 0Clock C T T T UValid 0 1 0 0 0Clock U T T T U

DataInput

1st 2nd 3rd 4th 5th

DataInput

1st 2nd 3rd 4th 5th

Time Step

(1)

(2)

7[ C: clocked, U: clock-gated (hold), T: transparent (bypassing) ]

Clock U C U U UValid 0 0 1 0 0Clock U U C U UValid 1 0 0 1 0Clock C U U C UValid 0 1 0 0 1Clock U C U U CValid 0 0 1 0 0Clock U U C U UValid 0 0 0 1 0Clock U U U C UValid 0 0 0 0 1Clock U U U U C

Clock U T T T UValid 0 0 1 0 0Clock U T T T UValid 1 0 0 1 0Clock C T T C UValid 0 1 0 0 1Clock U T T U CValid 0 0 1 0 0Clock U T T U UValid 0 0 0 1 0Clock U T T T UValid 0 0 0 0 1Clock U T T T C

(2)

(3)

(4)

(5)

(6)

(7)

(8)

Page 8: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

�� Adjust pipeline depth with Adjust pipeline depth with transparenttransparent stagesstages�� Normal operation with freq. = FNormal operation with freq. = F

Collapsible PipelineCollapsible Pipeline

Data

[Shimada et al., ISLPED’03]

�� LowLow--power operation (shallow mode) with freq. = F/2power operation (shallow mode) with freq. = F/2

�� Clock power saving in Clock power saving in shallowshallow modemode�� No power reduction in normal operationNo power reduction in normal operation

8

Data

Page 9: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

�� Previous transparent pipeline Previous transparent pipeline [Jacobson, ISLPED’04][Jacobson, ISLPED’04]

�� Limited to two consecutive transparent stagesLimited to two consecutive transparent stages�� Transparent stages need Transparent stages need two separate clock lines two separate clock lines for separate for separate

control of master and slave latch.control of master and slave latch.

ContributionsContributions

DataInput

Flip-flop

�� Proposed approachProposed approach�� Improved control logic for transparent pipelineImproved control logic for transparent pipeline

�� Applicable to any number of pipeline stagesApplicable to any number of pipeline stages�� Easily extended for pipeline collapsingEasily extended for pipeline collapsing

�� Low overhead FF for transparent modeLow overhead FF for transparent mode

9

InputM S

m_clk s_clk

Page 10: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Proposed Transparent PipelineProposed Transparent Pipeline

RT

ith stage

RTRG

Input

RT

2nd stage

RG

Nth stage(N-1)th stage1st stage

Output

10

Valid3 Validi

FF

CLCLReqi-1

CL

FFValid1

FF

EN1

FFValid2

Req1

EN2 B2

Req2

ValidN

FF

ENN

ENi Bi

Validi+1

Reqi

ENN-1 BN-1

Page 11: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Pipeline RegistersPipeline Registers

D Q

CLK

EN

Clock-gating Logic

D_outD_in

latch

N

gCLK

D Q

CLK

D_outD_in

01

N

FF with transparent mode

�� Normal (opaque) stageNormal (opaque) stage �� Transparent stageTransparent stage

11

EN B Operation

0 0 Bypass (transparent)

0 1 Hold (clock-gated)1 1 Normal operation

EN Operation

0 Hold (clock-gated)

1 Normal operation

CLK

EN

CLK

Blatch gCLK

�� EN: Clock enable signalEN: Clock enable signal�� B: Transparent state bitB: Transparent state bit

(0: transparent, 1: opaque)(0: transparent, 1: opaque)

Page 12: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

�� Schematic of static DSchematic of static D--FFFF

Pipeline Registers (cont.)Pipeline Registers (cont.)

CLK

CLK

CLK

CLK

CLKCLK

U3 U6

CLK=0 CLK=1

U1/U6 onU3/U4 off

(M.Latch on)

U1/U6 offU3/U4 on

(S.Latch on)

12(Tri-state buffer)

CLK

CLK

IN OUT IN OUTCLK

CLK

D

QCLK

CLK

CLK

CLK

U1 U2 U4 U5

U7

Master latch Slave latch

Page 13: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Pipeline Register (cont.)Pipeline Register (cont.)

D

CLK

CLK

CLK

CLK

CLKCLK

U3 U6

INOUT

CLK

B

�� DD--FF with transparent modeFF with transparent mode

13

D

QCLK CLK

U1 U2 U4 U5

U7

IN OUTB

BCLK

CLK

INB

CLK

B Operation

0 (CLK=0) Bypass D to Q

1 Normal D-FF

Page 14: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Proposed Transparent PipelineProposed Transparent Pipeline

RT

ith stage

RTRG

Input

RT

2nd stage

RG

Nth stage(N-1)th stage1st stage

Output

14

Valid3 Validi

FF

CLCLReqi-1

CL

FFValid1

FF

EN1

FFValid2

Req1

EN2 B2

Req2

ValidN

FF

ENNENi Bi

Validi+1

Reqi

ENN-1 BN-1

�� Criteria for correct functionality Criteria for correct functionality (to avoid race conditions):(to avoid race conditions):At least one opaque stage between two different data setsAt least one opaque stage between two different data sets

�� ReqReq signalsignal: Generated when a stage needs to accept new data set, : Generated when a stage needs to accept new data set, to notify that other stage in the downstream should take the data.to notify that other stage in the downstream should take the data.

Page 15: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

�� Control logic for transparent stage registersControl logic for transparent stage registers

Control LogicControl Logic

Req_prev Req_next

Valid EN

B_curr B_next

ReqiReqi-1

Validi ENi

Clock-gating Control Logic (CL)

(to the next stage)(from the previous stage)

15

B_curr B_next

FF

Bi

Inputs Outputs

B_curr: current state (0: transparent) B_next: next state

Valid: data valid bit EN: clock enable signal

Req_prev: from the prev stage Req_next: to the next stage

Control signalsfor data registers

Page 16: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Control Logic (cont.)Control Logic (cont.)

ValidReq_prev

B_next

EN

Inputs Outputs

B_curr Valid Req_prev B_next EN Req_next

0 0 0 0 0 0

1 0 0 1 0 0

�� Truth table and gateTruth table and gate--level implementationlevel implementation

16

Valid

Req_prev

B_curr

Req_next0

1

ValidB_curr

B_next1 0 0 1 0 0

0 0 1 0 0 1

0 1 1 1 1 0

1 1 1 1 1 1

0 1 0 0 0 0

1 1 0 0 0 1

1 0 1 x (1) x (0) x (1)

Transparent stage is clocked (EN=1) when both Valid and Req_prev are 1.

Page 17: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Control Logic (cont.)Control Logic (cont.)

�� Working example in 5Working example in 5--stage pipelinestage pipeline

Timestep

1st (opaque)

2nd(trans)

3rd (trans)

4th (trans)

5th (opaque)

V E R+ V B- B+ E R+ V B- B+ E R+ V B- B+ E R+ V E R+

(1) 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0

1st 2nd 3rd 4th 5th

A

A-

(1)

(2)

17

(1) 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0

(2) 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

(3) 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

(4) 1 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1 0 0 0 0

(5) 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1

(6) 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0

(7) 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0

(8) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

[ V: Valid, E: Enable, B-: B_curr, B+: B_next, R+: Req_next ]

-

B- A

A-

AB

(2)

(3)

(4)

(5)

Page 18: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Control Logic (cont.)Control Logic (cont.)

�� Extension for collapsible pipelineExtension for collapsible pipeline

Inputs Outputs

Collapse B_curr Valid Req_prev B_next EN Req_next

0 Same as previous Same as previous

18

1 x x x 0 0 Req_prev

DataInput

1st 2nd 3rd 4th 5th

Collapse Collapse

Validi Validi+1

FF

Reqi-1 Reqi

ENi

FF

Bi

01

CL

Collapse

CollapseCL

5 stages in normal mode3 stages in shallow mode

Page 19: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

�� Comparison with the previous transparent pipeline Comparison with the previous transparent pipeline technique technique [Jacobson, ISLPED’04][Jacobson, ISLPED’04]

�� VddVdd=1.2V, f=500MHz, IBM 90nm technology=1.2V, f=500MHz, IBM 90nm technology

�� Pipeline utilization varies from 0.1 to 0.9.Pipeline utilization varies from 0.1 to 0.9.

�� 6464--bit 4bit 4--stage pipelinestage pipeline

Simulation ResultsSimulation Results

1600

Clo

ck p

ow

er c

on

sum

pti

on

�� 6464--bit 4bit 4--stage pipelinestage pipeline

19

0

200

400

600

800

1000

1200

1400

1600

0 0.2 0.4 0.6 0.8 1

Clo

ck p

ow

er c

on

sum

pti

on

(u

W)

Pipeline Utilization

Previous approach

Our approach

DataInput

1st 2nd 3rd 4th

Page 20: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Simulation Results (cont.)Simulation Results (cont.)

200

250

Clo

ck p

ow

er s

avin

gs

(uW

) 64-bit (7 stages)

64-bit (5 stages)

32-bit (7 stages)200

250

Clo

ck p

ow

er s

avin

gs

(uW

) 4 stages

5 stages

6 stages

�� PPowerower saving over traditional stagesaving over traditional stage--level clocklevel clock--gatinggating�� Various pipeline depth and width are considered.Various pipeline depth and width are considered.

�� Power overhead of control logics is included.Power overhead of control logics is included.

64-bit

20

-50

0

50

100

150

0 0.2 0.4 0.6 0.8 1Clo

ck p

ow

er s

avin

gs

(uW

)

Pipeline Utilization

32-bit (5 stages)

-50

0

50

100

150

0 0.2 0.4 0.6 0.8 1Clo

ck p

ow

er s

avin

gs

(uW

)

Pipeline Utilization

6 stages

7 stages

�� More power saving from wider and deeper pipelinesMore power saving from wider and deeper pipelines�� At high utilization (0.9), the proposed approach consumes At high utilization (0.9), the proposed approach consumes more power due to control overhead and increased glitch.more power due to control overhead and increased glitch.

Page 21: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

Simulation Results (cont.)Simulation Results (cont.)

2500

3000

Clo

ck p

ow

er c

on

sum

pti

on

(u

W)

Normal mode (f=500MHz)

�� Extension to collapsible pipelineExtension to collapsible pipeline�� 6464--bit 5bit 5--stage pipeline (stage pipeline (2nd and 4th stages are collapsible)2nd and 4th stages are collapsible)

�� ffnormalnormal=500MHz, =500MHz, ffshallowshallow=250MHz=250MHz

Data

1st 2nd 3rd 4th 5th

21

0

500

1000

1500

2000

2500

0 0.2 0.4 0.6 0.8 1

Clo

ck p

ow

er c

on

sum

pti

on

(u

W)

Pipeline Utilization

Shallow mode (f=250MHz)DataInput

Collapse Collapse

�� In shallow mode, clock In shallow mode, clock power decreases to ~1/3 power decreases to ~1/3 of normal mode.of normal mode.

Page 22: Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme … · 2010-07-12 · Improved ClockImproved Clock--Gating Control Scheme Gating Control Scheme for Transparent

SummarySummary

�� ClockClock--gating technique for dynamic power gating technique for dynamic power reductionreduction�� Compact control logic for transparent pipelineCompact control logic for transparent pipeline�� Applicable to any number of pipeline stagesApplicable to any number of pipeline stages�� Transparent FF with low hardware overheadTransparent FF with low hardware overhead�� Energy/performance tradeEnergy/performance trade--off through pipeline stage off through pipeline stage

collapsingcollapsing

22