18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015




Agenda for Today & Next Few Lectures

- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, …
- Alternative Approaches to Instruction Level Parallelism


Reminder: Announcements

- Lab 3 due next Friday (Feb 20)
  - Pipelined MIPS
  - Competition for high performance
    - You can optimize both cycle time and CPI
    - Document and clearly describe what you do during check-off
- Homework 3 due Feb 25
  - A lot of questions that enable you to learn the concepts via hands-on exercise
  - Remember this is all for your benefit (to learn and prepare for exams)
    - HWs have very little contribution to overall grade
    - Solutions to almost all questions are online anyway


Readings Specifically for Today

- Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995
  - More advanced pipelining
  - Interrupt and exception handling
  - Out-of-order and superscalar execution concepts
- Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999.


Recap of Last Lecture

- Issues with Multi-Cycle Execution
- Exceptions vs. Interrupts
- Precise Exceptions/Interrupts
- Why Do We Want Precise Exceptions?
- How Do We Ensure Precise Exceptions?
  - Reorder buffer
  - History buffer
  - Future register file (best of both worlds)
  - Checkpointing
- Register renaming with a reorder buffer
- How to Handle Exceptions
- How to Handle Branch Mispredictions
- Speed of State Recovery: Recovery and Interrupt Latency
  - Checkpointing
- Registers vs. Memory


Important: Register Renaming with a Reorder Buffer

- Output and anti dependencies are not true dependencies
  - WHY? The same register refers to values that have nothing to do with each other
  - They exist due to a lack of register IDs (i.e. names) in the ISA
- The register ID is renamed to the reorder buffer entry that will hold the register’s value
  - Register ID → ROB entry ID
  - Architectural register ID → Physical register ID
  - After renaming, the ROB entry ID is used to refer to the register
- This eliminates anti- and output- dependencies
  - Gives the illusion that there are a large number of registers
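The renaming rule above can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the `rename` function, the tuple encoding of instructions, and the `ROB0`, `ROB1`, ... tag names are assumptions made for the example. Each destination gets a fresh tag, and each source reads the tag of the latest producer if one is in flight.

```python
from itertools import count


def rename(program):
    """Rename destinations to fresh ROB-entry tags (hypothetical encoding).

    program: list of (op, dest, (src1, src2)) using architectural names.
    Returns the same instructions with destinations replaced by tags and
    sources replaced by the tag of their latest producer, if any.
    """
    latest_tag = {}  # architectural register -> tag of its latest writer
    tags = (f"ROB{i}" for i in count())
    renamed = []
    for op, dest, srcs in program:
        # Read sources first, so "ADD R3 <- R3, R1" sees the OLD R3 mapping.
        new_srcs = tuple(latest_tag.get(s, s) for s in srcs)
        tag = next(tags)
        latest_tag[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


program = [
    ("IMUL", "R3", ("R1", "R2")),
    ("ADD", "R3", ("R3", "R1")),   # output dependence on R3 with the IMUL
    ("ADD", "R1", ("R6", "R7")),   # anti dependence on R1 with both above
]
for instr in rename(program):
    print(instr)
```

After renaming, the two writers of R3 target different tags and the write to R1 no longer conflicts with the earlier reads of R1. Only the true dependence (the first ADD reading the IMUL result through `ROB0`) remains.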


Review: Register Renaming Examples

Boggs et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, 2001.


Review: Checkpointing Idea

- Goal: Restore the frontend state (future file) such that the correct next instruction after the branch can execute right away after the branch misprediction is resolved
- Idea: Checkpoint the frontend register state/map at the time a branch is decoded and keep the checkpointed state updated with results of instructions older than the branch
  - Upon branch misprediction, restore the checkpoint associated with the branch
- Hwu and Patt, “Checkpoint Repair for Out-of-order Execution Machines,” ISCA 1987.


Review: Checkpointing

- When a branch is decoded
  - Make a copy of the future file/map and associate it with the branch
- When an instruction produces a register value
  - All future file/map checkpoints that are younger than the instruction are updated with the value
- When a branch misprediction is detected
  - Restore the checkpointed future file/map for the mispredicted branch when the branch misprediction is resolved
  - Flush instructions in pipeline younger than the branch
  - Deallocate checkpoints younger than the branch
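The three checkpointing rules above can be sketched as follows. This is a simplified model under stated assumptions: the `FutureFile` class and its method names are hypothetical, instructions are identified by a program-order sequence number, each register has at most one in-flight writer at a time, and the restored branch's own checkpoint is freed together with the younger ones.

```python
class FutureFile:
    """Toy future file with per-branch checkpoints (illustrative only)."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.checkpoints = []  # (branch_seq, snapshot), in decode order

    def decode_branch(self, branch_seq):
        # Copy the future file and associate the copy with the branch.
        self.checkpoints.append((branch_seq, dict(self.regs)))

    def write_result(self, seq, reg, value):
        # The producing instruction updates the future file ...
        self.regs[reg] = value
        # ... and every checkpoint taken by a branch younger than it.
        for branch_seq, snapshot in self.checkpoints:
            if branch_seq > seq:
                snapshot[reg] = value

    def mispredict(self, branch_seq):
        # Restore the checkpoint of the mispredicted branch and drop it
        # together with all younger checkpoints (assumption: the restored
        # checkpoint is no longer needed once it has been copied back).
        index = next(i for i, (b, _) in enumerate(self.checkpoints)
                     if b == branch_seq)
        self.regs = dict(self.checkpoints[index][1])
        del self.checkpoints[index:]


ff = FutureFile({"R1": 1, "R2": 2})
ff.decode_branch(branch_seq=1)     # branch is instruction 1
ff.write_result(2, "R2", 99)       # younger than the branch: wrong path
ff.write_result(0, "R1", 10)       # older than the branch, completes late
ff.mispredict(branch_seq=1)
print(ff.regs)                     # wrong-path R2 is gone, late R1 is kept
```

The late result of the older instruction survives recovery because it was copied into the checkpoint, while the wrong-path write to R2 is discarded.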


Review: Registers versus Memory

- So far, we considered mainly registers as part of state
- What about memory?
- What are the fundamental differences between registers and memory?
  - Register dependences known statically – memory dependences determined dynamically
  - Register state is small – memory state is large
  - Register state is not visible to other threads/processors – memory state is shared between threads/processors (in a shared memory multiprocessor)


Maintaining Speculative Memory State: Stores

- Handling out-of-order completion of memory operations
  - UNDOing a memory write more difficult than UNDOing a register write. Why?
  - One idea: Keep store address/data in reorder buffer
    - How does a load instruction find its data?
  - Store/write buffer: Similar to reorder buffer, but used only for store instructions
    - Program-order list of un-committed store operations
    - When store is decoded: Allocate a store buffer entry
    - When store address and data become available: Record in store buffer entry
    - When the store is the oldest instruction in the pipeline: Update the memory address (i.e. cache) with store data
- We will get back to this after today!
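A minimal sketch of the store buffer described above, in Python. The slide leaves open how a load finds its data; the `load` method here shows one common answer (searching older buffered stores from youngest to oldest before falling back to memory), which is an assumption of this example and not something the slide states. The class and method names are hypothetical, and the sketch refuses to guess when an older store's address is still unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    seq: int                     # program-order sequence number
    addr: Optional[int] = None   # recorded once the address is computed
    data: Optional[int] = None   # recorded once the data is available


class StoreBuffer:
    def __init__(self, memory):
        self.memory = memory     # dict: address -> value (stands in for the cache)
        self.entries = []        # un-committed stores, oldest first

    def allocate(self, seq):
        # When the store is decoded: allocate an entry in program order.
        self.entries.append(StoreEntry(seq))

    def record(self, seq, addr, data):
        # When address and data become available: record them in the entry.
        for entry in self.entries:
            if entry.seq == seq:
                entry.addr, entry.data = addr, data
                return
        raise KeyError(seq)

    def commit_oldest(self):
        # When the store is the oldest instruction: update memory.
        entry = self.entries.pop(0)
        self.memory[entry.addr] = entry.data

    def load(self, seq, addr):
        # Assumed policy: youngest older store to the same address wins.
        for entry in reversed(self.entries):
            if entry.seq > seq:
                continue         # store is younger than the load: ignore
            if entry.addr is None:
                raise RuntimeError("older store address not yet known")
            if entry.addr == addr:
                return entry.data
        return self.memory[addr]


memory = {100: 1}
sb = StoreBuffer(memory)
sb.allocate(seq=1)
sb.record(seq=1, addr=100, data=42)
print(sb.load(seq=2, addr=100), memory[100])  # load sees 42, memory still 1
sb.commit_oldest()
print(memory[100])                            # memory updated at commit
```

Until `commit_oldest` runs, memory is untouched, so squashing a speculative store only requires dropping its buffer entry.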

Remember: Static vs. Dynamic Scheduling


Remember: Questions to Ponder

- What is the role of the hardware vs. the software in the order in which instructions are executed in the pipeline?
  - Software based instruction scheduling → static scheduling
  - Hardware based instruction scheduling → dynamic scheduling
- What information does the compiler not know that makes static scheduling difficult?
  - Answer: Anything that is determined at run time
    - Variable-length operation latency, memory addr, branch direction


Dynamic Instruction Scheduling

- Hardware has knowledge of dynamic events on a per-instruction basis (i.e., at a very fine granularity)
  - Cache misses
  - Branch mispredictions
  - Load/store addresses
- Wouldn’t it be nice if hardware did the scheduling of instructions?


Out-of-Order Execution (Dynamic Instruction Scheduling)


An In-order Pipeline

[Figure: in-order pipeline with stages F, D, E, R, W. The E stage takes one cycle for integer add, four for integer mul, eight for FP mul, and a variable number for a cache miss.]

- Problem: A true data dependency stalls dispatch of younger instructions into functional (execution) units
- Dispatch: Act of sending an instruction to a functional unit


Can We Do Better?

- What do the following two pieces of code have in common (with respect to execution in the previous design)?

    IMUL R3 ← R1, R2          LD   R3 ← R1 (0)
    ADD  R3 ← R3, R1          ADD  R3 ← R3, R1
    ADD  R1 ← R6, R7          ADD  R1 ← R6, R7
    IMUL R5 ← R6, R8          IMUL R5 ← R6, R8
    ADD  R7 ← R9, R9          ADD  R7 ← R9, R9

- Answer: First ADD stalls the whole pipeline!
  - ADD cannot dispatch because its source registers unavailable
  - Later independent instructions cannot get executed
- How are the above code portions different?
  - Answer: Load latency is variable (unknown until runtime)
  - What does this affect? Think compiler vs. microarchitecture


Preventing Dispatch Stalls

- Multiple ways of doing it
- You have already seen at least THREE:
  1. Fine-grained multithreading
  2. Value prediction
  3. Compile-time instruction scheduling/reordering
- What are the disadvantages of the above three?
- Any other way to prevent dispatch stalls?
  - Actually, you have briefly seen the basic idea before
    - Dataflow: fetch and “fire” an instruction when its inputs are ready
  - Problem: in-order dispatch (scheduling, or execution)
  - Solution: out-of-order dispatch (scheduling, or execution)


Out-of-order Execution (Dynamic Scheduling)

- Idea: Move the dependent instructions out of the way of independent ones (s.t. independent ones can execute)
  - Rest areas for dependent instructions: Reservation stations
- Monitor the source “values” of each instruction in the resting area
- When all source “values” of an instruction are available, “fire” (i.e. dispatch) the instruction
  - Instructions dispatched in dataflow (not control-flow) order
- Benefit:
  - Latency tolerance: Allows independent instructions to execute and complete in the presence of a long latency operation


In-order vs. Out-of-order Dispatch

    IMUL R3 ← R1, R2
    ADD  R3 ← R3, R1
    ADD  R1 ← R6, R7
    IMUL R5 ← R6, R8
    ADD  R7 ← R3, R5

- In order dispatch + precise exceptions: [Figure: pipeline timing diagram over stages F, D, E, R, W, with STALL cycles]
- Out-of-order dispatch + precise exceptions: [Figure: pipeline timing diagram over stages F, D, E, R, W, with WAIT cycles]
- 16 vs. 12 cycles

This slide is actually correct


Enabling OoO Execution

1. Need to link the consumer of a value to the producer
   - Register renaming: Associate a “tag” with each data value
2. Need to buffer instructions until they are ready to execute
   - Insert instruction into reservation stations after renaming
3. Instructions need to keep track of readiness of source values
   - Broadcast the “tag” when the value is produced
   - Instructions compare their “source tags” to the broadcast tag → if match, source value becomes ready
4. When all source values of an instruction are ready, need to dispatch the instruction to its functional unit (FU)
   - Instruction wakes up if all sources are ready
   - If multiple instructions are awake, need to select one per FU


Tomasulo’s Algorithm

- OoO with register renaming invented by Robert Tomasulo
  - Used in IBM 360/91 Floating Point Units
  - Read: Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of R&D, Jan. 1967.
- What is the major difference today?
  - Precise exceptions: IBM 360/91 did NOT have this
  - Patt, Hwu, Shebanow, “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985.
  - Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985.
- Variants are used in most high-performance processors
  - Initially in Intel Pentium Pro, AMD K5
  - Alpha 21264, MIPS R10000, IBM POWER5, IBM z196, Oracle UltraSPARC T4, ARM Cortex A15


Two Humps in a Modern Pipeline

[Figure: pipeline with stages F, D, SCHEDULE, E (integer add, integer mul, FP mul, load/store), REORDER, W, and a TAG and VALUE broadcast bus. The stages are marked in order, out of order, in order.]

- Hump 1: Reservation stations (scheduling window)
- Hump 2: Reordering (reorder buffer, aka instruction window or active window)


General Organization of an OOO Processor

- Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proc. IEEE, Dec. 1995.


Tomasulo’s Machine: IBM 360/91

[Figure: block diagram with load buffers (from memory), FP registers (from instruction unit), store buffers (to memory), the operation bus, reservation stations, two FP FUs, and the common data bus.]


Register Renaming

- Output and anti dependencies are not true dependencies
  - WHY? The same register refers to values that have nothing to do with each other
  - They exist because there are not enough register IDs (i.e. names) in the ISA
- The register ID is renamed to the reservation station entry that will hold the register’s value
  - Register ID → RS entry ID
  - Architectural register ID → Physical register ID
  - After renaming, the RS entry ID is used to refer to the register
- This eliminates anti- and output- dependencies
  - Approximates the performance effect of a large number of registers even though the ISA has a small number

Tomasulo’s Algorithm: Renaming

- Register rename table (register alias table)

    Register   tag   value   valid?
    R0                       1
    R1                       1
    R2                       1
    R3                       1
    R4                       1
    R5                       1
    R6                       1
    R7                       1
    R8                       1
    R9                       1


Tomasulo’s Algorithm

- If reservation station available before renaming
  - Instruction + renamed operands (source value/tag) inserted into the reservation station
  - Only rename if reservation station is available
- Else stall
- While in reservation station, each instruction:
  - Watches common data bus (CDB) for tag of its sources
  - When tag seen, grab value for the source and keep it in the reservation station
  - When both operands available, instruction ready to be dispatched
- Dispatch instruction to the Functional Unit when instruction is ready
- After instruction finishes in the Functional Unit
  - Arbitrate for CDB
  - Put tagged value onto CDB (tag broadcast)
  - Register file is connected to the CDB
    - Register contains a tag indicating the latest writer to the register
    - If the tag in the register file matches the broadcast tag, write broadcast value into register (and set valid bit)
  - Reclaim rename tag
    - no valid copy of tag in system!
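The steps above can be sketched as a small, untimed Python model. It is an illustration under stated assumptions, not the 360/91 design: the `Tomasulo` class, the tag strings, and the `["tag", t]` / `["value", v]` operand encoding are hypothetical, there is no CDB arbitration or cycle timing, and reservation stations are unlimited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RatEntry:
    value: int = 0
    valid: bool = True
    tag: Optional[str] = None   # latest in-flight writer when not valid


@dataclass
class RsEntry:
    op: str
    srcs: list                  # each source is ["value", v] or ["tag", t]


class Tomasulo:
    def __init__(self, regs):
        self.rat = {name: RatEntry(value=v) for name, v in regs.items()}
        self.rs = {}            # tag -> reservation station entry

    def rename(self, tag, op, dest, src_regs):
        # Insert the instruction with renamed operands (value or tag).
        srcs = []
        for name in src_regs:
            reg = self.rat[name]
            srcs.append(["value", reg.value] if reg.valid else ["tag", reg.tag])
        self.rs[tag] = RsEntry(op, srcs)
        # The destination register now names this instruction as latest writer.
        self.rat[dest] = RatEntry(valid=False, tag=tag)

    def ready(self, tag):
        return all(kind == "value" for kind, _ in self.rs[tag].srcs)

    def execute(self, tag):
        # Dispatch, compute, and broadcast in one step (no timing modeled).
        assert self.ready(tag)
        entry = self.rs.pop(tag)
        a, b = (v for _, v in entry.srcs)
        self.broadcast(tag, a * b if entry.op == "MUL" else a + b)

    def broadcast(self, tag, value):
        # Waiting instructions capture the value when they see their tag.
        for entry in self.rs.values():
            for src in entry.srcs:
                if src == ["tag", tag]:
                    src[:] = ["value", value]
        # A register is written only if this tag is still its latest writer.
        for reg in self.rat.values():
            if not reg.valid and reg.tag == tag:
                reg.value, reg.valid, reg.tag = value, True, None


m = Tomasulo({"R1": 2, "R2": 3, "R3": 0, "R4": 4, "R5": 0})
m.rename("x", "MUL", "R3", ["R1", "R2"])
m.rename("a", "ADD", "R5", ["R3", "R4"])   # waits for tag x
m.rename("b", "ADD", "R5", ["R1", "R2"])   # younger writer of R5
m.execute("b")                             # runs before a, out of order
m.execute("x")                             # wakes up a
m.execute("a")                             # result is NOT written to R5
print(m.rat["R3"].value, m.rat["R5"].value)
```

The last step shows why the register keeps a tag: the older ADD finishes after the younger one, its broadcast tag no longer matches R5, and R5 keeps the value of the latest writer.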


An Exercise

    MUL R3 ← R1, R2
    ADD R5 ← R3, R4
    ADD R7 ← R2, R6
    ADD R10 ← R8, R9
    MUL R11 ← R7, R10
    ADD R5 ← R5, R11

Pipeline stages: F D E W

- Assume ADD (4 cycle execute), MUL (6 cycle execute)
- Assume one adder and one multiplier
- How many cycles
  - in a non-pipelined machine
  - in an in-order-dispatch pipelined machine with imprecise exceptions (no forwarding and full forwarding)
  - in an out-of-order dispatch pipelined machine with imprecise exceptions (full forwarding)
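The first question (the non-pipelined machine) is plain arithmetic and can be checked with a short Python sketch. It assumes that F, D, and W each take one cycle and that a non-pipelined machine starts an instruction only after the previous one has finished; neither assumption is stated on the slide, and the function name is made up for the example. The pipelined cases depend on the stall and forwarding rules and are not attempted here.

```python
EXEC_LATENCY = {"ADD": 4, "MUL": 6}   # execute latencies given in the exercise

# Opcodes of the six instructions in the exercise, in program order.
PROGRAM = ["MUL", "ADD", "ADD", "ADD", "MUL", "ADD"]


def non_pipelined_cycles(program, single_cycle_stages=3):
    """Total cycles when instructions run strictly one after another.

    single_cycle_stages counts the one-cycle stages around execute
    (assumed: F, D, W, so 3).
    """
    return sum(single_cycle_stages + EXEC_LATENCY[op] for op in program)


# 2 MULs * (3 + 6) + 4 ADDs * (3 + 4) = 18 + 28 = 46
print(non_pipelined_cycles(PROGRAM))
```

Passing `single_cycle_stages=4` models the F D E R W variant of the same question, which gives 2 * 10 + 4 * 8 = 52 under the same assumptions.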


Exercise Continued


How It Works


Cycle 0


Cycle 2



Cycle 3


Cycle 4


Cycle 7


Cycle 8


Some Questions

- What is needed in hardware to perform tag broadcast and value capture?
  → make a value valid
  → wake up an instruction
- Does the tag have to be the ID of the Reservation Station Entry?
- What can potentially become the critical path?
  - Tag broadcast → value capture → instruction wake up
- How can you reduce the potential critical paths?


An Exercise, with Precise Exceptions

    MUL R3 ← R1, R2
    ADD R5 ← R3, R4
    ADD R7 ← R2, R6
    ADD R10 ← R8, R9
    MUL R11 ← R7, R10
    ADD R5 ← R5, R11

Pipeline stages: F D E R W

- Assume ADD (4 cycle execute), MUL (6 cycle execute)
- Assume one adder and one multiplier
- How many cycles
  - in a non-pipelined machine
  - in an in-order-dispatch pipelined machine with reorder buffer (no forwarding and full forwarding)
  - in an out-of-order dispatch pipelined machine with reorder buffer (full forwarding)


Out-of-Order Execution with Precise Exceptions

- Idea: Use a reorder buffer to reorder instructions before committing them to architectural state
- An instruction updates the register alias table (essentially a future file) when it completes execution
- An instruction updates the architectural register file when it is the oldest in the machine and has completed execution
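The retirement rule in the last bullet can be sketched with a queue. This is a minimal model with hypothetical names (`ReorderBuffer`, `allocate`, `complete`, `retire`); it covers only in-order update of the architectural register file and leaves out the register alias table and exception handling.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, arch_regs):
        self.arch = dict(arch_regs)   # architectural register file
        self.entries = deque()        # program order, oldest on the left

    def allocate(self, dest):
        # Entries are allocated in program order at decode/rename.
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # Execution may finish in any order.
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        # Only the oldest instruction may update architectural state,
        # and only once it has completed execution.
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.arch[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired


rob = ReorderBuffer({"R3": 0, "R7": 0})
mul = rob.allocate("R3")       # older, long latency
add = rob.allocate("R7")       # younger, finishes first
rob.complete(add, 9)
print(rob.retire(), rob.arch)  # nothing retires: the MUL is still pending
rob.complete(mul, 6)
print(rob.retire(), rob.arch)  # both retire, in program order
```

If the MUL raised an exception instead of completing, the architectural file would still hold the state from before the MUL, which is what makes the exception precise.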


Out-of-Order Execution with Precise Exceptions

[Figure: pipeline with stages F, D, SCHEDULE, E (integer add, integer mul, FP mul, load/store), REORDER, W, and a TAG and VALUE broadcast bus. The stages are marked in order, out of order, in order.]

- Hump 1: Reservation stations (scheduling window)
- Hump 2: Reordering (reorder buffer, aka instruction window or active window)


Modern OoO Execution w/ Precise Exceptions

- Most modern processors use
  - Reorder buffer to support in-order retirement of instructions
  - A single register file to store registers (speculative and architectural) – INT and FP are still separate
  - Future register map → used for renaming
  - Architectural register map → used for state recovery


An Example from Modern Processors

Boggs et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, 2001.


Enabling OoO Execution, Revisited

1. Link the consumer of a value to the producer
   - Register renaming: Associate a “tag” with each data value
2. Buffer instructions until they are ready
   - Insert instruction into reservation stations after renaming
3. Keep track of readiness of source values of an instruction
   - Broadcast the “tag” when the value is produced
   - Instructions compare their “source tags” to the broadcast tag → if match, source value becomes ready
4. When all source values of an instruction are ready, dispatch the instruction to functional unit (FU)
   - Wakeup and select/schedule the instruction
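Step 4 needs a selection policy when several woken-up instructions compete for the same functional unit. The slides do not name a policy; the Python sketch below assumes oldest-first selection, which is one common choice, and uses hypothetical names (`select`, `fu_of`) and a made-up encoding of ready instructions as `(age, tag, op)` tuples, where a smaller age means older.

```python
def select(ready, fu_of):
    """Pick at most one ready instruction per functional unit.

    ready: iterable of (age, tag, op) for instructions whose sources are ready.
    fu_of: mapping from opcode to the functional unit that executes it.
    Returns a dict: functional unit -> tag of the selected instruction.
    """
    chosen = {}
    for age, tag, op in sorted(ready):     # oldest (smallest age) first
        chosen.setdefault(fu_of[op], tag)  # first claim on an FU wins
    return chosen


fu_of = {"ADD": "adder", "MUL": "multiplier"}
ready = [(5, "e", "ADD"), (2, "b", "ADD"), (3, "c", "MUL")]
print(select(ready, fu_of))
```

With one adder and one multiplier, the older ADD (`b`) and the MUL (`c`) are dispatched this cycle, and the younger ADD (`e`) stays in its reservation station.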


Summary of OOO Execution Concepts

- Register renaming eliminates false dependencies, enables linking of producer to consumers
- Buffering enables the pipeline to move for independent ops
- Tag broadcast enables communication (of readiness of produced value) between instructions
- Wakeup and select enables out-of-order dispatch


OOO Execution: Restricted Dataflow

- An out-of-order engine dynamically builds the dataflow graph of a piece of the program
  - which piece?
- The dataflow graph is limited to the instruction window
  - Instruction window: all decoded but not yet retired instructions
- Can we do it for the whole program?
- Why would we like to?
- In other words, how can we have a large instruction window?
- Can we do it efficiently with Tomasulo’s algorithm?


Dataflow Graph for Our Example

    MUL R3 ← R1, R2
    ADD R5 ← R3, R4
    ADD R7 ← R2, R6
    ADD R10 ← R8, R9
    MUL R11 ← R7, R10
    ADD R5 ← R5, R11
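The dataflow graph of this example can be derived mechanically from the true (read-after-write) dependences. The Python sketch below is an illustration with hypothetical function names; it uses the execute latencies from the exercise (ADD 4 cycles, MUL 6 cycles) and counts only execute cycles along the longest dependence chain, ignoring fetch, decode, and writeback and assuming unlimited functional units.

```python
EXEC_LATENCY = {"ADD": 4, "MUL": 6}

# (op, destination, sources), in program order.
PROGRAM = [
    ("MUL", "R3", ("R1", "R2")),
    ("ADD", "R5", ("R3", "R4")),
    ("ADD", "R7", ("R2", "R6")),
    ("ADD", "R10", ("R8", "R9")),
    ("MUL", "R11", ("R7", "R10")),
    ("ADD", "R5", ("R5", "R11")),
]


def raw_edges(program):
    """Edges (producer index, consumer index) for true dependences only."""
    last_writer = {}
    edges = []
    for i, (_, dest, srcs) in enumerate(program):
        for src in srcs:
            if src in last_writer:
                edges.append((last_writer[src], i))
        last_writer[dest] = i
    return edges


def critical_path(program):
    """Execute cycles along the longest chain of true dependences."""
    preds = {i: [] for i in range(len(program))}
    for producer, consumer in raw_edges(program):
        preds[consumer].append(producer)
    finish = []
    for i, (op, _, _) in enumerate(program):
        # Producers always precede consumers in program order.
        start = max((finish[p] for p in preds[i]), default=0)
        finish.append(start + EXEC_LATENCY[op])
    return max(finish)


print(raw_edges(PROGRAM))
print(critical_path(PROGRAM))
```

The two writes to R5 do not create an edge by themselves: the final ADD depends on the first ADD only because it reads R5. Both chains into the final ADD (MUL, ADD, ADD and ADD, MUL, ADD) need 14 execute cycles.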


State of RAT and RS in Cycle 7


Dataflow Graph


In-Class Exercise on Tomasulo


Tomasulo Template


We did not cover the following slides in lecture. These are for your preparation for the next lecture.


Restricted Data Flow

- An out-of-order machine is a “restricted data flow” machine
  - Dataflow-based execution is restricted to the microarchitecture level
  - ISA is still based on von Neumann model (sequential execution)
- Remember the data flow model (at the ISA level):
  - Dataflow model: An instruction is fetched and executed in data flow order
  - i.e., when its operands are ready
  - i.e., there is no instruction pointer
  - Instruction ordering specified by data flow dependence
    - Each instruction specifies “who” should receive the result
    - An instruction can “fire” whenever all operands are received


Questions to Ponder

- Why is OoO execution beneficial?
  - What if all operations take single cycle?
  - Latency tolerance: OoO execution tolerates the latency of multi-cycle operations by executing independent operations concurrently
- What if an instruction takes 500 cycles?
  - How large of an instruction window do we need to continue decoding?
  - How many cycles of latency can OoO tolerate?
  - What limits the latency tolerance scalability of Tomasulo’s algorithm?
    - Active/instruction window size: determined by both scheduling window and reorder buffer size
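A back-of-the-envelope way to approach the 500-cycle question, as a Python sketch. It rests on simplifying assumptions that are not in the slides: the machine decodes a fixed number of instructions every cycle (the `decode_width` parameter, with 4 and 128 as made-up example values), and the long-latency instruction blocks in-order retirement, so every instruction decoded behind it occupies a window entry until it finishes.

```python
def window_entries_needed(latency_cycles, decode_width):
    """Entries needed to keep decoding for the whole latency.

    While the long-latency instruction is outstanding nothing behind it can
    retire, so the window must hold everything decoded in the meantime.
    """
    return latency_cycles * decode_width


def tolerable_latency(window_entries, decode_width):
    """Cycles of latency hidden before the window fills and decode stalls."""
    return window_entries // decode_width


print(window_entries_needed(500, decode_width=4))   # entries for 500 cycles
print(tolerable_latency(128, decode_width=4))       # cycles a 128-entry window hides
```

Under these assumptions the required window grows linearly with the latency, which is why the window size (scheduling window and reorder buffer together) limits how much latency the machine can tolerate.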