44
Out of Order Execution Pooja Saraff Manoj Mardithaya

Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Out of Order Execution

Pooja Saraff Manoj Mardithaya

Page 2: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

How the notion was conceived Sequential Execution Pipelining Superscalar Execution Out-Of-Order Execution

fetch decode ld fetch decode add

fetch decode sub fetch decode bne

time

fetch decode ld fetch decode add fetch decode sub

fetch decode bne

Tim

e

2

Page 3: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

What is OOO? • OOO execution is a type of processing where the instructions

can begin execution as soon as operands are ready • Instructions are issued in order however execution proceeds

out of order • Evolution CDC 6600 IBM 360/91 Tomasulo's algorithm IBM/Motorola PowerPC 601 Fujitsu/HAL SPARC64, Intel Pentium Pro MIPS R10000, AMD K5 DEC Alpha 21264 Sandy Bridge

1964

1993

1996

1995

1998

1966

2011 3

Page 4: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Architecture without Common Data Bus

Stores field in register-to-register instruction

Storage-to-register instruction

Store data instruction

4

Page 5: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Operation

Decodes Sets Control Bit

As the buffer gets filled

Results 5

Page 6: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Results

• It doesn’t take care of data dependency • Thus busy bit added – however FLOS hold-up

because of busy sink register • Solution to it – Reservation Station (control,sink,source) • Execution now depends on appropriate

reservation station

6

Page 7: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

3 Types of Data Dependencies

• RAW (Read After Write) R2 <- R1 + R3 R4 <- R2 + R3

• WAR (Write After Read) R4 <- R1 + R3 R3 <- R1 + R2

• WAW (Write After Write) R2 <- R4 + R7 R2 <- R1 + R2

Register Renaming uses in-order decoding to properly identify dependences.

7

Page 8: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Register Renaming A: DIVF F3, F1, F0 r1, -, - B: SUBF F2, F1, F0 r2, -, - C: MULF F0, F2, F4 r3, r2, - D: SUBF F6, F2, F3 r4, r2, r1 E: ADDF F2, F5, F4 r5, -, - F: ADDF F0, F0, F2 r6, r3, r5

Need more physical registers than architectural Ignores control flow for the time being.

8

Page 9: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Architecture

6 5 4 3 2 1

Form memory

Load buffers

From instruction unit Floating- point operations FP registers

FP adders FP multipliers

Store buffers

to memory

Common data bus (CDB)

3 2 1

3 2 1

Operation bus

2 1

Reservation Stations

Operand bus

- 3 Adders - 2 Multipliers - Load buffers (6) - Store buffers (3) - FP Queue - FP registers - CDB: Common Data Bus

9

Page 10: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Tomasulo’s Algorithm Steps • Issue

- Issue if empty reservation station is found, fetch operands if they are in registers, otherwise assign a tag

- If no empty reservation is found, stall and wait for one to get free

- Renaming is performed here and WAW and WAR are resolved

• Execute – If operands are not ready, monitor the CDB for them – RAWs are resolved – When they are ready, execute the op in the FU

• Write Back – Send the results to CDB and update registers and the Store

buffers – Store Buffers will write to memory during this step

10

Page 11: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

6 5 4 3 2 1

Form memory

Load buffers

From instruction unit Floating- point operations FP registers

FP adders FP multipliers

Store buffers

to memory

Common data bus (CDB)

3 2 1

3 2 1

Operation bus

2 1

Reservation Stations

Operand bus

TAG

0

AD F0, FLB1

11

Page 12: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

6 5 4 3 2 1

Form memory

Load buffers

From instruction unit Floating- point operations FP registers

FP adders FP multipliers

Store buffers

to memory

Common data bus (CDB)

3 2 1

3 2 1

Operation bus

2 1

Reservation Stations

Operand bus

TAG

1010 (A1)

AD F0, FLB1

Busy bit – 1 Ctrl bit - 1

FLOS issues instruction and update Tag AD F0,…..

12

Page 13: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

6 5 4 3 2 1

Form memory

Load buffers

From instruction unit Floating- point operations FP registers

FP adders FP multipliers

Store buffers

to memory

Common data bus (CDB)

3 2 1

3 2 1

Operation bus

2 1

Reservation Stations

Operand bus

TAG

1011 (A2)

AD F0, FLB1 AD F0,…..

However reservation tag for A1 has initial F0 tag i.e. 1010

Sink - 1010 (A1)

- execution

Tag miss match

Tag 1010

13

Page 14: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

A. Moshovos © ECE1773 - Fall ‘06 ECE Toronto

Drawbacks

• CDB is a bottleneck • Limits the execution time of any instruction

to 2 cycles, minimum

• Complex implementation

14

Page 15: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

HPS: a new microarchitecture

15

Page 16: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

High Performance Substrate

• ~20 years after Tomasulo’s seminal paper, slightly different game.

• Three Tiers to Optimize – Global Parallelism – Sequential Flow – Local Parallelism

• Exploit local parallelism with a very small

‘window’ of instructions at the microarchitecture level

16

Page 17: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

17

Page 18: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Add 1000, A, B

1000

RD

+

WR

A B

Result Buffer

18

Page 19: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Add 1000, A, B

Read from Addr. A

Add 1000, A

Write to Addr. B

Decoder

Result Buffer

19

Page 20: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables Merger

Add 1000, A, B

Read from Addr. A

Add 1000, A

Write to Addr. B

Decoder

?

?

Result Buffer

20

Page 21: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

1000

RD

+

WR

A B

Add 1000, A, B

Add 100, B, C

100

RD

+

WR

B C

21

Page 22: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

100

RD

+

WR

B C 1000

RD

+ WR

A B

Add 1000, A, B Add 100, B, C

100

RD

+ WR

C

22

Page 23: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables Merger

Add 1000, A, B

Read from Addr. A

Add 1000, A

Write to Addr. B

Decoder

?

?

Result Buffer

23

Page 24: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Still Unknown: • Format of the

Node Table • Finding

Dependencies • Scheduling

Nodes

24

Page 25: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Node Tables Operation Tag Ope rands

25

Page 26: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Still Unknown: Format of the

Node Table • Finding

Dependencies • Scheduling

Nodes

26

Page 27: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Register Alias Table

What about Memory Access? 27

Page 28: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Register Alias Table

Still Unknown: Format of the

Node Table Finding

Dependencies • Scheduling

Nodes

28

Page 29: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Scheduling • Node is ‘ready to fire’ when Ready Bits of all

operands are set. • Oldest ‘Fires’ when a Functional Unit is ready.

* Can the scheduling make smarter choices? 29

Page 30: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Register Alias Table

Still Unknown: Format of the

Node Table Finding

Dependencies Scheduling

Nodes

30

Page 31: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Decoder

Fn. Unit

Fn. Unit

Fn. Unit

Fn. Unit

Merger

Node Tables

Result Buffer

Register Alias Table

CDB!

31

Page 32: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Advantages over Tomasulo’s Algorithm

• No ‘renaming’ involved, register alias table. – Eliminates anti and output dependencies without

messy renaming schemes.

• Don’t need to queue instructions to ‘reservation stations’ before both source and sink are ready.

• Node tables allow an ‘active window’ worth of possible parallelism.

32

Page 33: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

HPSm[inimal]

• Implementation of the HPS model. • Minimal, because of practical issues HPS did

not address: – Branch Prediction.

– Memory dependencies.

– Number of nodes per instruction.

33

Page 34: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

HPSm[inimal]

• Implementation of the HPS model. • Minimal, because of practical issues HPS did

not address: – Branch Prediction.

• Fixed to 1 unresolved prediction at a time.

– Memory dependencies. • Fire oldest writes, then oldest reads.

– Number of nodes per instruction. • At most two.

34

Page 35: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

• We are executing out of order. – What happens when the executed instructions

shouldn’t have been? – What happens if an exception is thrown?

• Solution: Register Alias Table has backups Current

Backup

Current

Backup

1 2 3 4 5 B 6 7 8

35

Page 36: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

Current

Backup

1 2 3 4 5 B 6 7 8

36

Current

Backup

Current

Backup

Current

Backup

When the instructions are decoded:

Page 37: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

Current

Backup

1 2 3 4 5 B 6 7 8

37

Current

Backup

Current

Backup

Current

Backup

When they are executed (out of order):

Page 38: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

Current

Backup

1 2 3 4 5 B 6 7 8

38

Current

Backup

Current

Backup

Current

Backup

When they are executed (out of order):

Page 39: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

Current

Backup

1 2 3 4 5 B 6 7 8

39

Current

Backup

Current

Backup

Current

Backup

When they are executed (out of order):

Page 40: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Wrong Predictions and Exceptions

Backup

Backup

1 2 3 4 5 B X X X

40

Backup

Backup

Backup

Backup

Backup

Backup

When they are executed (out of order):

Page 41: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Memory Dependencies

• Algorithm: – Fire the oldest fire-able memory write. If none: – Fire the oldest fire-able memory read.

• All access addresses translated, sit in the Write Buffer.

• Reads check the Write Buffer before going to memory.

• Write to memory only when the instruction RETIRES.

41

Page 42: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

HPSm Results • Comparison against the RISC II, with both

non-optimizing and optimizing compilers

0.5

1

2

4

8

16

Spee

d up

Nor

mal

ized

to R

ISC

Opt

imize

d

RISC RISC Optimized HPSm HPSm Optimized

42

Page 43: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Out of Order: Today

• Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution capabilities • Use Physical Register Files for renaming.

• Cortex A9, over a year ago – 1st mobile CPU with an OOO Execution Engine.

• Superscalar, OOO, Register Renaming – We’re hitting an ‘ILP Wall’

43

Page 44: Out of Order Execution - University of California, San Diego · 2011. 12. 1. · Out of Order: Today •Sandy Bridge, POWER, Bulldozer and Bobcat all have Awesome out of order execution

Thank you

Any Questions?

44