Out-of-Order Superscalar CPU

Cliff Frey and Vicky Liu

6.884 Final Project Presentation, May 6th, 2005

Source: csg.csail.mit.edu/6.884/projects/group2-presentation.pdf (group2-presentation.ppt, created 5/12/2005)

The Basis of Our Design

Tomasulo’s Algorithm

- Allows out-of-order execution
- Instructions wait in Reservation Stations
- Instructions execute once their operands have been computed
- Can reorder instructions that have only WAW or WAR dependences

The Basis of Our Design

Tomasulo’s Algorithm

- In the Decode stage, each instruction result is assigned a Tag
- Each register maps to a Value or to a Tag
- When a result is computed, the result and its tag are broadcast
- All instances of the Tag are updated with the computed value
- Updates RegFile and Reservation Stations
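
The tag-and-broadcast scheme above can be sketched in a few lines of Python. This is only an illustrative model, not the project's BlueSpec code: the names `Tag`, `ReservationStation`, `Machine`, `decode`, `broadcast`, and `execute_one`, and the two-operation "add"/"sub" instruction set, are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Tag:
    """Stands in for a result that has not been computed yet."""
    id: int


Operand = Union[int, Tag]


@dataclass
class ReservationStation:
    op: str        # "add" or "sub" (hypothetical two-operation ISA)
    dest: Tag      # tag assigned to this instruction's result
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        # An instruction may execute once neither operand is still a Tag.
        return not isinstance(self.src1, Tag) and not isinstance(self.src2, Tag)


class Machine:
    def __init__(self, initial_regs: List[int]):
        self.regfile: List[Operand] = list(initial_regs)  # Value or Tag per register
        self.stations: List[ReservationStation] = []
        self.next_tag = 0

    def decode(self, op: str, rd: int, rs1: int, rs2: int) -> Tag:
        # Assign a fresh tag to the result and remap the destination register.
        tag = Tag(self.next_tag)
        self.next_tag += 1
        self.stations.append(
            ReservationStation(op, tag, self.regfile[rs1], self.regfile[rs2]))
        self.regfile[rd] = tag
        return tag

    def broadcast(self, tag: Tag, value: int) -> None:
        # Every instance of the tag (RegFile and Reservation Stations) gets the value.
        self.regfile = [value if r == tag else r for r in self.regfile]
        for rs in self.stations:
            if rs.src1 == tag:
                rs.src1 = value
            if rs.src2 == tag:
                rs.src2 = value

    def execute_one(self) -> bool:
        # Execute the first ready instruction, if any, and broadcast its result.
        for i, rs in enumerate(self.stations):
            if rs.ready():
                result = rs.src1 + rs.src2 if rs.op == "add" else rs.src1 - rs.src2
                del self.stations[i]
                self.broadcast(rs.dest, result)
                return True
        return False


m = Machine([0, 5, 7, 0])
t0 = m.decode("add", 3, 1, 2)  # r3 = r1 + r2
t1 = m.decode("add", 0, 3, 3)  # r0 = r3 + r3, waits on t0
t2 = m.decode("sub", 3, 2, 1)  # r3 = r2 - r1, WAW with the first add
while m.execute_one():
    pass
```

In this sketch the second write to r3 simply remaps the register to a newer tag, so the older add's broadcast no longer touches r3. That is the sense in which WAW dependences stop constraining the order.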

High Level Design

The major components

- Fetch Unit
- Decode
- Renaming Register File
- Reservation Stations
- Functional Units
- Common Data Bus

High Level Design

Pipeline stages: Fetch, Decode, Execute, Write Back (CDB)

BlueSpec Rule & Module Design

High Level Design Issues

- Unresolved branches stall the decode stage
- Memory operations need to be in order
- Back-to-back dependent adds take 2 cycles

Design Exploration: Supporting Precise Exceptions

Shortcomings of Tomasulo’s algorithm

- Register File contents can be lost
- External changes need to be ordered

Design Exploration: Supporting Precise Exceptions

A Processor Supports Precise Exceptions If…

- … instructions before the excepting instruction execute normally
- … instructions after and including the excepting instruction do not change any programmer-visible state of the processor

Original High Level Design

Updated High Level Design

Our Solution

- Minimal changes to original design
- Reorder Buffer (ROB) and Commit stage
- Architectural Register File
- External changes made at commit time
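
As a rough illustration of how a Reorder Buffer plus a Commit stage keeps the Architectural Register File in program order, here is a minimal Python sketch. It is not the project's implementation: `RobEntry`, `ReorderBuffer`, `allocate`, `writeback`, and `commit` are hypothetical names, and committing every finished head entry in a single call is an assumption, since the slides do not state the commit width.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class RobEntry:
    tag: int
    dest_reg: int
    value: Optional[int] = None  # None until the result is written back


class ReorderBuffer:
    def __init__(self, num_slots: int, num_regs: int):
        self.num_slots = num_slots
        self.entries: Deque[RobEntry] = deque()  # oldest instruction at the left
        self.arch_regs: List[int] = [0] * num_regs
        self.next_tag = 0

    def allocate(self, dest_reg: int) -> Optional[int]:
        """Reserve a slot in program order; None means decode must stall."""
        if len(self.entries) == self.num_slots:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(RobEntry(tag, dest_reg))
        return tag

    def writeback(self, tag: int, value: int) -> None:
        """Results may arrive in any order; they only fill in the ROB entry."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value

    def commit(self) -> List[int]:
        """Retire finished entries from the head only, so state changes in order."""
        committed = []
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.popleft()
            self.arch_regs[entry.dest_reg] = entry.value
            committed.append(entry.tag)
        return committed


rob = ReorderBuffer(num_slots=4, num_regs=4)
first = rob.allocate(dest_reg=1)
second = rob.allocate(dest_reg=2)
rob.writeback(second, 20)        # the younger instruction finishes first
early = rob.commit()             # nothing can commit: the head is not done
rob.writeback(first, 10)
late = rob.commit()              # now both retire, oldest first
```

The point of the sketch is that `writeback` never touches `arch_regs`. Only `commit` does, and only from the head of the buffer.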

Updated High Level Design

Handling Exceptions

ROB

Undo

- Set PC to interrupt vector (0x1100)
- Exception PC stored in coprocessor register EPC
- Correct speculative results in Rename Register File
- Clear cached information in Functional Units
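
The four recovery steps can be modelled as below. This is a hedged sketch rather than the design itself: `Core`, `take_exception`, and the field names are hypothetical, and overwriting the rename register file with the committed architectural values is just one plausible way to "correct" it. The "Undo" label on the slide suggests the original may instead walk the ROB. Only the vector address 0x1100 and the EPC register come from the slide.

```python
from dataclasses import dataclass, field
from typing import List

INTERRUPT_VECTOR = 0x1100  # address given on the slide


@dataclass
class Core:
    pc: int = 0
    epc: int = 0                                            # coprocessor register EPC
    arch_regs: List[int] = field(default_factory=list)      # committed state
    rename_regs: List[object] = field(default_factory=list) # values or pending tags
    rob: List[str] = field(default_factory=list)            # uncommitted instructions
    fu_in_flight: List[str] = field(default_factory=list)   # state cached in functional units


def take_exception(core: Core, excepting_pc: int) -> None:
    """Apply the recovery steps when the excepting instruction reaches commit."""
    core.epc = excepting_pc                   # remember where to resume
    core.pc = INTERRUPT_VECTOR                # redirect fetch to the handler
    core.rename_regs = list(core.arch_regs)   # assumption: restore from committed state
    core.rob.clear()                          # drop all uncommitted instructions
    core.fu_in_flight.clear()                 # clear cached information in functional units


core = Core(
    pc=0x13A8,
    arch_regs=[0, 1, 2, 3],
    rename_regs=[0, "T5", 2, 99],   # r1 waits on a tag, r3 holds a speculative value
    rob=["T5", "T6"],
    fu_in_flight=["T6"],
)
take_exception(core, excepting_pc=0x13A0)
```

After the call, none of the speculative values or pending tags remain visible, which matches the precise-exception requirement that instructions after and including the excepting one leave no programmer-visible trace.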

Other Features to Get High Performance

Implemented Features

- Speculative fetch
- External changes need to be ordered
- Memory unit can handle many requests at a time

Unimplemented Features

- Branch prediction and target buffering
- Speculative execution

A Closer Look at the Load/Store Unit

(Diagram labels: result.get(), Mem)

BlueSpec Stories: Conflicting Rules

BlueSpec Stories: The Fix

Possible Solutions

- One rule for every possible data path
- Use config regs everywhere
- Be slow and blame BlueSpec =P

Our Solutions

- Homemade completion buffer
- Make methods write to RWires
- Write “magic” rule to handle all combinations of cases

Bypassing from writeback to decode

An Excerpt from our Trace Output

Back-to-back, nondependent adds

Fetch  Decode                Execute  Writeback  Commit
F     |               [  ]   -        -         |
F     |00001000=0 ADD [  ]   -        -         |
F     |00001004=1 ADD [ 0]   -        -         |
F     |00001008=2 ADD [ 1]   A-0      -00000001 |
      |0000100c=3 ADD [ 2]   A-1      -00000001 | 0
      |               [ 3]   A-2      -00000002 | 1
      |               [  ]   A-3      -00000002 | 2
      |               [  ]   -        -         | 3

An Excerpt from our Trace Output

Instruction stream with reordering

Decode                     add   mem    BR     WB     commit
001398 LW r1, r10          [     |M     |    ]        |
00139c ADDI r2, r2, -4     [     |M LW  |    ]        |
0013a0 SLT r1, r11, r1     [ADDI |M     |    ]        |
0013a4 BEQZ r1, 0x13d8     [     |M     |    ]ADDI    |
0013a8 SUBI r3, r12, -1    [     |M LW  |    ]        |
                           [SUBI |M     |    ]LW      |
                           [SLT  |M     |    ]SUBI    |LW
                           [     |M     |    ]SLT     |ADDI
                           [     |M     |BEQZ]        |SLT
                           [     |M     |    ]BEQZ    |
                           [     |M     |    ]        |BEQZ *taken!
                           [     |M     |    ]        |SUBI

Synthesis Results

Clock period = 4 ns

Area = 0.38 mm²

Design Choices and Performance

Configurable Parameters

- Resizing reservation stations
- Number of slots in ROB and the Fetch Unit buffer
- Different functional unit setup
- Easily support multicycle functional units

Performance

- Branches and stores really hurt performance
- Achieved IPC ≈ 0.5 on vector-add and quicksort
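
To put the IPC figure in context, the short calculation below combines it with the 4 ns figure from the Synthesis Results slide. It assumes that 4 ns is the clock period and that the synthesized design would sustain the same IPC. Neither is stated in the slides, so the numbers are an estimate, not a reported result. The function names are made up for the example.

```python
CLOCK_PERIOD_NS = 4.0  # from the Synthesis Results slide, read as the clock period
IPC = 0.5              # approximate IPC reported for vector-add and quicksort


def clock_frequency_mhz(period_ns: float) -> float:
    # 1 / (period in ns) gives GHz; multiply by 1000 for MHz.
    return 1000.0 / period_ns


def million_instructions_per_second(period_ns: float, ipc: float) -> float:
    # Instructions per cycle times millions of cycles per second.
    return ipc * clock_frequency_mhz(period_ns)


def avg_ns_per_instruction(period_ns: float, ipc: float) -> float:
    # At IPC < 1, each instruction costs more than one clock period on average.
    return period_ns / ipc


freq = clock_frequency_mhz(CLOCK_PERIOD_NS)
mips = million_instructions_per_second(CLOCK_PERIOD_NS, IPC)
ns_per_instr = avg_ns_per_instruction(CLOCK_PERIOD_NS, IPC)
print(f"{freq:.0f} MHz, about {mips:.0f} MIPS, {ns_per_instr:.0f} ns per instruction")
```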