TurboROB: A Low Cost Checkpoint/Restore Accelerator. HIPEAC 2008. Patrick Akl and Andreas Moshovos, AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto. {pakl, moshovos}@eecg.toronto.edu


Page 1: TurboROB A Low Cost Checkpoint/Restore Accelerator

HIPEAC 2008

TurboROB: A Low Cost Checkpoint/Restore Accelerator

Patrick Akl and Andreas Moshovos

AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto

{pakl, moshovos}@eecg.toronto.edu

Page 2: TurboROB A Low Cost Checkpoint/Restore Accelerator

Recovering From Control Flow Mispredictions

Execution Timeline

[Figure: execution timeline. Predict a Branch Outcome (Predicted Path); Misprediction Discovered; Recover Processor State; Redirect Fetch; Resume Execution (Correct Path).]

• Accelerate Recovery – Improve Performance

Page 3: TurboROB A Low Cost Checkpoint/Restore Accelerator

State-of-the-Art Recovery

[Figure: timeline from "Predict a Branch Outcome" to "Misprediction Discovered", with two recovery mechanisms: the ROB, a Log of Changes (what, old value), and a State Snapshot.]

• Scalability and/or Performance Issues

Page 4: TurboROB A Low Cost Checkpoint/Restore Accelerator

Turbo-ROB

• Make common case fast:
  – Recover only at branches
• Store only as much as needed:
  – Partial Log

[Figure: timeline from "Predict a Branch Outcome" to "Misprediction Discovered". ROB: Log of Changes. Turbo-ROB: Partial Log of Changes.]

Page 5: TurboROB A Low Cost Checkpoint/Restore Accelerator


Outline

• Control Flow Misspeculation Recovery

• TurboROB

• Methodology and Results

• Summary

Page 6: TurboROB A Low Cost Checkpoint/Restore Accelerator

State Recovery Example: Register Alias Table

[Figure: RAT indexed by Architectural Register, each entry holding a Physical Register. Labels: # arch. regs; Lg(# arch. regs). Entry values shown: p1, p2, p3, and an entry animated through p4, p5, p5, p4.]

Original Code

A: add r1, r2, 100
B: breq r1, E
C: sub r1, r2, r2

Renamed Code

A: add p4, p2, 100
B: breq p4, E
C: sub p5, p2, p2
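The renaming step in this example can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the function name `rename`, the tuple encoding of instructions, the free list, and the initial mappings (in particular r1 → p1) are assumptions made for the sketch.

```python
def rename(instrs, rat, free_regs):
    """Rename architectural registers to physical ones, updating `rat` in place.

    instrs: list of (opcode, dest_or_None, [sources]) tuples (hypothetical encoding).
    rat: dict mapping architectural register -> physical register.
    free_regs: list of free physical registers, allocated in order.
    """
    renamed = []
    for op, dest, srcs in instrs:
        # Sources read the current mapping; immediates and labels pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            # Each destination gets a fresh physical register.
            new_dest = free_regs.pop(0)
            rat[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# Assumed initial state: r1 -> p1, r2 -> p2, r3 -> p3; p4 and p5 are free.
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
code = [
    ("add", "r1", ["r2", "100"]),   # A
    ("breq", None, ["r1", "E"]),    # B
    ("sub", "r1", ["r2", "r2"]),    # C
]
renamed = rename(code, rat, ["p4", "p5"])
```

After renaming, the RAT maps r1 to p5. If branch B turns out to be mispredicted, the mapping of r1 has to go back to p4, which is the state-recovery problem the following slides address.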

Page 7: TurboROB A Low Cost Checkpoint/Restore Accelerator

ROB: Slow, Fine-Grain Recovery

[Figure: Reorder Buffer in Program Order containing several branches (B); the RAT is marked INVALID during recovery.]

1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order

• Too slow: recovery latency proportional to number of instructions to squash

Each entry contains
1. Architectural destination register
2. Its previous RAT map
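The three recovery steps can be modelled as a reverse walk over the ROB. The sketch below is a simplified software model under stated assumptions: each ROB entry is a `(arch_dest, previous_map)` pair as listed on the slide, branches are encoded with a `None` destination, and the function name `rob_recover` is hypothetical.

```python
def rob_recover(rat, rob, branch_index):
    """Undo the RAT updates of every ROB entry younger than rob[branch_index].

    rob: list of (arch_dest, previous_map) in program order, oldest first.
    Returns the number of entries walked, a proxy for recovery latency.
    """
    steps = 0
    while len(rob) > branch_index + 1:
        dest, prev_map = rob.pop()      # start from the newest instruction
        if dest is not None:
            rat[dest] = prev_map        # restore the previous RAT map
        steps += 1
    return steps


# Assumed example: A writes r1, B is a branch, C writes r1, D writes r2.
rat = {"r1": "p5", "r2": "p6"}          # RAT state after renaming A..D
rob = [("r1", "p1"), (None, None), ("r1", "p4"), ("r2", "p2")]
steps = rob_recover(rat, rob, branch_index=1)
```

The returned step count grows with the number of squashed instructions, which is the latency problem named on the slide.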

Page 8: TurboROB A Low Cost Checkpoint/Restore Accelerator

Global Checkpoints: Fast, Coarse-Grain Recovery

[Figure: Reorder Buffer in Program Order containing several branches (B), with a checkpoint associated with some of the branches; the RAT is marked INVALID during recovery.]

1. Misprediction discovered

• Branch w/ GC: Recovery is “Instantaneous”
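A global checkpoint can be thought of as a full copy of the RAT taken when a branch is renamed. The sketch below is a minimal model under assumptions: the class and method names are hypothetical, the checkpoint budget is a constructor parameter, and restoring is modelled as a single copy regardless of how many instructions are squashed.

```python
class CheckpointedRAT:
    """RAT with a limited number of global checkpoints (illustrative model)."""

    def __init__(self, mapping, max_checkpoints):
        self.rat = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}           # branch id -> RAT snapshot, oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; returns False if no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.rat)
        return True

    def restore(self, branch_id):
        """Restore the snapshot in one step and free it and all younger ones."""
        self.rat = dict(self.checkpoints[branch_id])
        ids = list(self.checkpoints)
        for stale in ids[ids.index(branch_id):]:
            del self.checkpoints[stale]


c = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_first = c.take_checkpoint("B")     # checkpoint at branch B
c.rat["r1"] = "p5"                      # wrong-path updates
c.rat["r2"] = "p6"
took_second = c.take_checkpoint("B2")   # budget exhausted: branch left uncovered
c.restore("B")
```

The second `take_checkpoint` call fails because only one checkpoint is available in this example, which illustrates why a small checkpoint budget cannot cover every branch.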

Page 9: TurboROB A Low Cost Checkpoint/Restore Accelerator

Impact of More Checkpoints

• More checkpoints?
  – Power hungry structure
  – Increased delay
• Only a few checkpoints can practically be implemented
  – Cannot always cover all branches

[Figure: RAT mapping architectural register to physical register. Concept: RAT with checkpoints. Actual Implementation: Working Copy with checkpoints.]

Page 10: TurboROB A Low Cost Checkpoint/Restore Accelerator

Intelligent Checkpointing & BranchTap

• Use Few Checkpoints Effectively
• BranchTap:
  – Throttle Speculation

[Figure: sequence of branches (B) with checkpoints assigned to some of them.]

Page 11: TurboROB A Low Cost Checkpoint/Restore Accelerator

Conventional Mechanisms: Recovery Scenarios

[Figure: recovery scenarios over sequences of branches (B B B), with and without a checkpoint; one scenario is labeled Re-Execution.]

Page 12: TurboROB A Low Cost Checkpoint/Restore Accelerator


Outline

• Background

• Turbo-ROB

• Methodology and Results

• Summary

Page 13: TurboROB A Low Cost Checkpoint/Restore Accelerator

Turbo-ROB

We only need to reverse the first subsequent change for every RAT entry.

[Figure: ROB Recovery after a branch B, with entries labeled R1, R1, R2, R2, R1 marked as useful or redundant; the number of entries walked is ~ Recovery Cost.]
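The idea of logging only the first change to each RAT entry after a branch can be sketched as follows. This is a simplified software model, not the hardware design from the paper: the class name, the use of a Python set to remember which registers were already logged since the latest branch, and the branch-marker encoding are all assumptions. Releasing entries when a branch resolves correctly is not modelled.

```python
class TurboROB:
    """Partial undo log: first RAT change per register after each branch."""

    def __init__(self, rat):
        self.rat = rat
        self.log = []                   # ("branch", id, None) or ("undo", reg, old_map)
        self.logged = set()             # registers logged since the latest branch
        self.in_flight_branch = False

    def branch(self, branch_id):
        self.log.append(("branch", branch_id, None))
        self.logged.clear()
        self.in_flight_branch = True

    def write(self, arch_reg, new_phys):
        # Only the first change after a branch is needed to undo the register.
        if self.in_flight_branch and arch_reg not in self.logged:
            self.log.append(("undo", arch_reg, self.rat[arch_reg]))
            self.logged.add(arch_reg)
        self.rat[arch_reg] = new_phys

    def recover(self, branch_id):
        """Walk the log backwards to the branch; return the undo steps taken."""
        steps = 0
        while self.log:
            kind, key, old_map = self.log.pop()
            if kind == "branch":
                if key == branch_id:
                    break
            else:
                self.rat[key] = old_map
                steps += 1
        self.logged.clear()
        self.in_flight_branch = any(k == "branch" for k, _, _ in self.log)
        return steps


rat = {"r1": "p1", "r2": "p2"}
trob = TurboROB(rat)
trob.write("r1", "p4")                  # before any branch: nothing is logged
trob.branch("B")
for reg, phys in [("r1", "p5"), ("r1", "p6"), ("r2", "p7"), ("r2", "p8"), ("r1", "p9")]:
    trob.write(reg, phys)               # five updates, only two get logged
steps = trob.recover("B")
```

In this example five RAT updates follow the branch, but recovery undoes only two log entries, one per modified register. A full ROB walk would have visited all five.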

Page 14: TurboROB A Low Cost Checkpoint/Restore Accelerator

Turbo-ROB Replacing the ROB

[Figure: recovery scenarios over sequences of branches (B B B) using the TROB; one scenario is labeled Re-Execution.]

Page 15: TurboROB A Low Cost Checkpoint/Restore Accelerator

Selective Turbo-ROB w/ ROB

[Figure: sequence of branches (B B B) with a TROB.]

Selective Turbo-ROB w/ GCs

[Figure: sequence of branches (B B B) with a TROB and a checkpoint.]

Page 16: TurboROB A Low Cost Checkpoint/Restore Accelerator


Outline

• Background

• TurboROB

• Methodology and Results

• Summary

Page 17: TurboROB A Low Cost Checkpoint/Restore Accelerator

Results Overview

• TROB as an ROB replacement
  – BranchTap offers better performance than ROB
  – Fewer resources
  – Even for smaller windows
• Selective TROB as a GC reduction mechanism
  – TROB reduces pressure for GCs
  – Offload a critical structure: RAT
• In the paper:
  – Selective TROB as an ROB accelerator
  – Even the smallest TROB accelerates recovery

Page 18: TurboROB A Low Cost Checkpoint/Restore Accelerator

Methodology

• Simulator based on SimpleScalar
  – Alpha/OSF
• 24 SPEC CPU 2000 benchmarks
• Reference Inputs
• Processor configurations
  – 4-way OoO core
  – 128/256/512 in-flight instructions
  – 1K-entry confidence table for low confidence branch identification / similar results with Anyweak
• 1B committed instructions after skipping 2B
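The slides do not say how the 1K-entry confidence table works. One common scheme, shown here purely as an assumption, is a table of saturating counters that are incremented on a correct prediction and reset on a misprediction; a branch is treated as low confidence while its counter is below a threshold. The class name, counter range, threshold, and PC indexing below are all hypothetical.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (assumed scheme, not from the slides)."""

    def __init__(self, entries=1024, max_count=15, threshold=15):
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.entries

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0        # a misprediction resets confidence


table = ConfidenceTable(entries=1024, max_count=3, threshold=3)
pc = 0x1200
low_at_start = table.is_low_confidence(pc)
for _ in range(3):
    table.update(pc, prediction_correct=True)
low_after_streak = table.is_low_confidence(pc)
table.update(pc, prediction_correct=False)
low_after_miss = table.is_low_confidence(pc)
```

A branch flagged as low confidence would then be a candidate for a checkpoint or a TROB recovery point, while high-confidence branches are left uncovered.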

Page 19: TurboROB A Low Cost Checkpoint/Restore Accelerator

“Perfect Checkpointing” Configuration

• A checkpoint is auto-magically taken at all mispredicted branches
  – All recoveries are fast
• We report the “deterioration relative to perfect checkpointing”
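The slides do not give a formula for this metric. One plausible reading, stated here as an assumption, is the relative performance loss against the perfect-checkpointing baseline, computed from IPC. The function name and the numbers in the usage line are hypothetical and are not results from the paper.

```python
def deterioration(ipc_perfect, ipc_config):
    """Relative performance loss versus perfect checkpointing (assumed definition)."""
    if ipc_perfect <= 0:
        raise ValueError("baseline IPC must be positive")
    return (ipc_perfect - ipc_config) / ipc_perfect


# Illustrative values only: a configuration at 1.8 IPC against a 2.0 IPC baseline.
loss = deterioration(2.0, 1.8)
```

Under this reading, 0% means the configuration matches perfect checkpointing and larger values are worse.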

Page 20: TurboROB A Low Cost Checkpoint/Restore Accelerator

TROB Replacing the ROB/512-Entry Window

• 64-entry TROB == ROB on the Average
• Pathological cases exist: 256-entry needed
• 512-Entry TROB better than ROB

[Figure: bar chart, y-axis from 0% to 50%, with a "better" direction marker. Benchmarks: 164.gzip, 176.gcc, 179.art, 197.parser, 301.apsi, AVG. Series: ROB, TROB_32, TROB_64, TROB_128, TROB_256, TROB_512.]

Page 21: TurboROB A Low Cost Checkpoint/Restore Accelerator

TROB Replacing the ROB/128-Entry Window

• 64-Entry 50% better than ROB
• Fewer pathological cases
• 128-Entry TROB better than ROB

[Figure: bar chart, y-axis from 0% to 50%, with a "better" direction marker. Benchmarks: 164.gzip, 176.gcc, 179.art, 197.parser, 301.apsi, AVG. Series: ROB, TROB_32, TROB_64, TROB_128.]

Page 22: TurboROB A Low Cost Checkpoint/Restore Accelerator

sTROB and Global Checkpoints/128-Entry Window

• TROB + 1 GC better than 4 GCs

Page 23: TurboROB A Low Cost Checkpoint/Restore Accelerator

Summary

• TROB vs. ROB
  – Replacement
    • Same resources: better performance
    • Fewer resources: often better performance
      – Except when accuracy is high
  – Acceleration:
    • ¼ resources: 35% improvement
• TROB vs. GCs
  – Reduce pressure from the critical path
  – With just 1 GC match the performance of four GCs
• One more alternative for designers
  – Allows different area/performance/power tradeoffs

