Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification

Alok Garg, M. W. Rashid, and Michael Huang
Department of Electrical & Computer Engineering, University of Rochester


6/20/2006 "Slackened Memory Dependence Enforcement", Alok Garg, ISCA 2006 2

Motivation

Out-of-order execution needs efficient memory dependence enforcement logic

Conventional approach: complex and hard to scale
  Forwarding and enforcement are tightly coupled

We use two decoupled components to simplify the task:
  Opportunistic forwarding using an L0 cache
  Verification against in-order re-execution
Together they form slackened memory dependence enforcement (SMDE)


LSQ: complex & hard to scale

Needs priority CAMs

Forwarding from the LSQ is on the timing-critical path
  Serialized with address translation

Design further complicated by:
  Coherence and consistency considerations
  Corner cases, e.g., partial overlap of operands
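To make the priority-CAM point concrete, here is a minimal Python sketch of a conventional store-queue search, under simplifying assumptions: stores carry an integer age (smaller is older), addresses are byte addresses, and the names `SQEntry` and `sq_forward` are hypothetical. In hardware every entry is compared in parallel (the list comprehension here), then a priority select picks the youngest older overlapping store; a store that covers only part of the load shows up as the partial-overlap corner case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SQEntry:
    age: int     # program-order sequence number (smaller = older)
    addr: int    # byte address written by the store
    data: bytes  # bytes written; the store covers addr .. addr + len(data) - 1


def sq_forward(sq: List[SQEntry], ld_age: int, ld_addr: int,
               ld_size: int) -> Tuple[str, Optional[bytes]]:
    """Hypothetical model of a conventional store-queue search for one load."""
    ld_end = ld_addr + ld_size
    # Associative (CAM) part: every older store whose byte range overlaps the load
    matches = [e for e in sq
               if e.age < ld_age
               and e.addr < ld_end and ld_addr < e.addr + len(e.data)]
    if not matches:
        return ("cache", None)  # no older overlapping store: read the data cache
    # Priority part: the youngest of the older matching stores wins
    youngest = max(matches, key=lambda e: e.age)
    if youngest.addr <= ld_addr and ld_end <= youngest.addr + len(youngest.data):
        off = ld_addr - youngest.addr
        return ("forward", youngest.data[off:off + ld_size])
    # Partial overlap: the load's bytes would have to be merged from several
    # sources, a corner case designs typically resolve by stalling or replaying
    return ("partial", None)


sq = [SQEntry(1, 0x100, b"\x11\x22\x33\x44"), SQEntry(3, 0x100, b"\xaa\xbb")]
print(sq_forward(sq, 5, 0x100, 2))  # ('forward', b'\xaa\xbb')
print(sq_forward(sq, 5, 0x100, 4))  # ('partial', None)
```

Both the match and the priority select sit on the load's critical path, which is why this structure is hard to scale.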


Highlights of prior work

Two-level load-store queue [sethumadhavan03], [akkary03], [baugh04], [roth04], [torres05], [gandhi05]

Reducing search frequency using clever filtering and prediction mechanisms [park03], [sethumadhavan03]

Memory dependence prediction [moshovos.isca97], [moshovos.micro97], [sha05], [stone05]

Value-based re-execution [cain04], [roth04], [sha05]

(More detailed contrast in the paper)


Outline

Overview of SMDE
Optional performance optimizations
Evaluation
Conclusion


Overview of SMDE


Decoupled execution

LSQ: competing requirements
  Front-end execution: little memory dependence enforcement
  Back-end execution: detects violations (memory accesses only)
  Memory bandwidth: naturally handled

[Figure: block diagram of decoupled execution. Components shown: Fetch/Decode/Dispatch, Execution (out-of-order), Commit, L0, L1, LSQ, memory hierarchy, and a MUX, with front-end and back-end execution paths.]
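The decoupling can be sketched as two passes over the same instruction sequence. This is a simplified Python model, not the authors' implementation: memory is a dict of word addresses, the front end runs in an arbitrary issue order against a private L0 copy, and the back end re-executes in program order against committed memory and flags the first load whose front-end value differs. The names `front_end` and `back_end` and the word-granular, single-pass structure are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

# An op is ("ST", addr, value) or ("LD", addr); addresses are word-granular
Op = Tuple


def front_end(program: Sequence[Op], issue_order: Sequence[int],
              l0: Dict[int, int]) -> Dict[int, int]:
    """Opportunistic forwarding: run ops in (out-of-order) issue order on the L0.
    Stores update the L0 speculatively; loads take whatever the L0 holds then."""
    values = {}
    for i in issue_order:
        op = program[i]
        if op[0] == "ST":
            l0[op[1]] = op[2]
        else:
            values[i] = l0.get(op[1], 0)
    return values


def back_end(program: Sequence[Op], memory: Dict[int, int],
             fe_values: Dict[int, int]) -> Optional[int]:
    """In-order re-execution against committed memory. Returns the index of the
    first load whose front-end value was wrong (replay needed), else None."""
    for i, op in enumerate(program):
        if op[0] == "ST":
            memory[op[1]] = op[2]          # store commits in program order
        elif memory.get(op[1], 0) != fe_values[i]:
            return i                       # verification failure
    return None


program = [("ST", 0x10, 7), ("LD", 0x10), ("LD", 0x20)]
memory = {0x10: 1, 0x20: 5}

ok = front_end(program, [0, 1, 2], dict(memory))   # load issues after the store
print(back_end(program, dict(memory), ok))         # None
bad = front_end(program, [1, 0, 2], dict(memory))  # load issues before the store
print(back_end(program, dict(memory), bad))        # 1
```

Only the back-end pass decides what is committed; a wrong front-end value costs a replay but never correctness. The sketch only reports which load failed; how much of the pipeline is then squashed is a separate design choice.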


Why it works: two perspectives

Back-end execution is the only one required
  Totally in-order, preserving dependences
  Any front-end execution is OK
  The L0 is effectively a slow but accurate value predictor

Front-end execution is correct most of the time
  Common case: 99% of loads execute at the right time
  Speculation is on the timing of load-store pairs (whereas two-level LSQs speculate on the scope of stores)
  Relatively expensive replays are OK
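A back-of-the-envelope view of why expensive replays can be tolerated: if a fraction $p$ of loads get the correct value in the front end and each misprediction costs $C$ cycles, the amortized cost per load is roughly $(1-p)\,C$. Here $p = 0.99$ follows the slide's common-case figure, $C = 20$ cycles is a hypothetical penalty, and the estimate ignores replays that trigger further replays.

```latex
\text{cost per load} \approx (1 - p)\,C = (1 - 0.99) \times 20 = 0.2 \text{ cycles}
```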


Advantage: simplicity

No priority CAM
Decoupled design: flexible, modular
Front end: large degree of freedom
  No need for address translation
  Soft errors can be ignored (ECC not needed)
  Corner cases: partial overlaps handled naturally
  Coherence invalidations can be ignored
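The partial-overlap and address-translation points follow from the L0 being an ordinary byte-addressed cache. In this toy Python model (class and method names are hypothetical; it assumes virtual-address indexing and ignores tags, sets, and line fills), a load simply reads whatever bytes are present, so overlapping stores of different sizes merge without special logic. A load may even see a younger store's bytes; catching that is the back end's job, not the L0's.

```python
from typing import Dict


class SpeculativeL0:
    """Toy byte-granular speculative L0 (hypothetical). Indexed directly by the
    (virtual) address: no translation, no ECC, no coherence state."""

    def __init__(self, backing: Dict[int, int]):
        self.data: Dict[int, int] = {}   # byte address -> byte written speculatively
        self.backing = backing           # stands in for fills from the L1

    def store(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.data[addr + i] = b

    def load(self, addr: int, size: int) -> bytes:
        # Each byte comes from the last store that wrote it (in time), else the backing
        return bytes(self.data.get(a, self.backing.get(a, 0))
                     for a in range(addr, addr + size))

    def flush(self) -> None:
        # Hypothetical flush hook; the real flush policy is a separate design choice
        self.data.clear()


l0 = SpeculativeL0(backing={})
l0.store(0x100, b"\x11\x22\x33\x44")  # 4-byte store
l0.store(0x102, b"\xaa\xbb")          # 2-byte store partially overlapping it
print(l0.load(0x100, 4).hex())        # 1122aabb
```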


Performance of naïve design (LQ: 64, SQ: 48)


Optional performance optimizations


Reducing replay frequency

Major replay cause: RAW violations
  48% of replays are due to RAW violations
  Replays indirectly cause more replays
  Often the store address is available (but the data is not)

Fuzzy disambiguation queue (FDQ): reject known premature loads
  Best effort is enough; nothing needs to be guaranteed
  A conventional LSQ handles this too (e.g., POWER4)


FDQ: How it works

[Figure: the Fuzzy Disambiguation Queue, a column of Address/AGE entries, next to a ROB of LD/ST entries numbered 1 to 6 (old to new); an incoming "Address 2" is checked against the FDQ.]


FDQ is not complex: very different from a conventional SQ
  Has no priority logic
  No need to merge with the cache data path
  A small queue is sufficient: no scalability pressure

Stores do not stay in the FDQ for their entire lifetime
  Flexible replacement

A “local” technique
  Only support needed: load rejection
  No need to augment the issue logic to enforce predicted dependences
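The FDQ idea can be sketched as a small best-effort table. In this Python model (all names hypothetical), a store enters when its address is known but its data is not, leaves when its data becomes ready, and an arriving load is rejected if any older store in the table has the same address. Exact address matching, the eviction rule, and the insertion/removal points are assumptions. No priority select is needed because the answer is only reject or proceed, and dropping an entry loses accuracy, never correctness.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FDQ:
    """Hypothetical fuzzy disambiguation queue: a small, best-effort list of
    (address, age) pairs for stores whose address is known but data is not."""
    capacity: int = 8
    entries: List[Tuple[int, int]] = field(default_factory=list)

    def insert_store(self, addr: int, age: int) -> None:
        # Flexible replacement: if full, drop the oldest-inserted entry.
        # Losing an entry can only let a premature load through (a later replay).
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)
        self.entries.append((addr, age))

    def remove_store(self, age: int) -> None:
        # The store's data is ready, so it no longer needs to hold loads back
        self.entries = [e for e in self.entries if e[1] != age]

    def should_reject(self, ld_addr: int, ld_age: int) -> bool:
        # Yes/no answer only: any older store to the same address means the load
        # is known to be premature. No priority select among matches is needed.
        return any(a == ld_addr and age < ld_age for a, age in self.entries)


fdq = FDQ(capacity=2)
fdq.insert_store(0x40, age=2)
print(fdq.should_reject(0x40, ld_age=5))  # True: older store, same address
print(fdq.should_reject(0x40, ld_age=1))  # False: the store is younger
fdq.remove_store(age=2)
print(fdq.should_reject(0x40, ld_age=5))  # False: store data now available
```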


Write buffer at the back end

Temporarily holds not-yet-committed stores

Allows back-end execution of loads and stores to start early

A few entries are sufficient to streamline back-end execution
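A minimal Python sketch of the back-end write buffer, assuming word-granular addresses and hypothetical names: back-end stores execute into the buffer before they commit, back-end loads check the buffer before the L1, and commit drains the oldest store into the L1. Because back-end execution is in order, every buffered store is older than the load being executed, so the most recent matching entry holds the right value. The linear scan is a stand-in for whatever lookup a real design uses; a membership test for the write buffer is not modeled here.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class BackEndWriteBuffer:
    """Hypothetical back-end write buffer holding executed but uncommitted stores."""

    def __init__(self, l1: Dict[int, int], entries: int = 8):
        self.l1 = l1
        self.entries = entries
        self.buf: Deque[Tuple[int, int]] = deque()  # (addr, value), program order

    def execute_store(self, addr: int, value: int) -> bool:
        if len(self.buf) >= self.entries:
            return False  # buffer full: back-end execution waits for a commit
        self.buf.append((addr, value))
        return True

    def execute_load(self, addr: int) -> int:
        # Back-end execution is in order, so every buffered store is older than
        # this load; the most recent matching entry holds the correct value.
        for a, v in reversed(self.buf):
            if a == addr:
                return v
        return self.l1.get(addr, 0)

    def commit_oldest_store(self) -> None:
        addr, value = self.buf.popleft()
        self.l1[addr] = value  # the store becomes architecturally visible


l1 = {0x10: 1}
wb = BackEndWriteBuffer(l1, entries=2)
wb.execute_store(0x10, 7)
print(wb.execute_load(0x10), l1[0x10])  # 7 1
wb.commit_oldest_store()
print(l1[0x10])                         # 7
```

Without the buffer, a back-end load behind an uncommitted store would have to wait for that store to commit; with it, back-end execution runs ahead by up to `entries` stores.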


Evaluation


Evaluation environment
  Simulator strives to model SMDE very faithfully:
    Load speculation, load rejection, and store-load replay
    Data values in the caches
    Scheduling of replays
    No load queue entry allocated for prefetches

SPEC CPU2000 benchmark suite

System configuration
  ROB / registers (INT, FP): 512 / (400, 400)
  LSQ (LQ, SQ): 112 (64, 48)
  L0 speculative cache: 16 KB, 2-way, 1 cycle
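As a quick check on the L0 geometry: assuming 64-byte lines (the line size is not given on the slide), a 16 KB 2-way cache has 16384 / (2 × 64) = 128 sets, i.e. 7 index bits and 6 offset bits. The hypothetical helper below does the same arithmetic for power-of-two sizes.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple:
    """Return (sets, index_bits, offset_bits) for a set-associative cache.
    Assumes every size is a power of two."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, index_bits, offset_bits


# L0 from the slide: 16 KB, 2-way; the 64-byte line is an ASSUMPTION
print(cache_geometry(16 * 1024, 2, 64))  # (128, 7, 6)
```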


Impact of an 8-entry write buffer


Replay frequency reduction

(a) Integer applications.

(b) Floating-point applications.


Replay breakdown


Performance improvement


Scalability test

Memory dependence logic unchanged; ROB, RFs, and IQs doubled


Other details in the paper

Scope of replay
Detailed study of replay causes
Replay suppression technique
Age-based filtering
Discussion of the L0 flush policy
Understanding the write buffer
Membership test for the write buffer

* “Implementation Issues of Slackened Memory Dependence Enforcement”, A. Garg, M. Rashid, and M. Huang, Technical Report.


Conclusions

Common-case forwarding and the correctness guarantee are handled separately

Decoupled execution allows modular design, verification, and optimization

Forwarding logic is simple to design and incurs minimal interference with execution

Scales very well

Can achieve close-to-ideal performance


Link to technical report: http://www.ece.rochester.edu/~garg/documents/isca06tr.pdf


Streamlining back-end execution

[Figure: back-end execution timeline over cycles 1 to 7 for stores (ST) and loads (LD) from the ROB, numbered 1 to 3 by age (old to new); labels include "Reload", "Verification/commit", and "Bubble".]


Streamlining back-end execution

[Figure: the same back-end sequence over cycles 1 to 7 with a write buffer; labels include ST, LD, WB, RL, and CT.]

Insert write buffer at the commit stage