21
Mothers of Pipelines Sava Krsti´ c, Robert Jones, John O’ Leary Intel Corporation PDPAR 2006

Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

  • Upload
    hathu

  • View
    215

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Mothers of Pipelines

Sava Krstic, Robert Jones, John O’ Leary

Intel Corporation

PDPAR 2006

Page 2: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Pipeline Verification: P vs. ISA

Prove that a high-level model P of a pipelined processor is

faithful to the given instruction set architecture (ISA).

✹ Both P and ISA are seen as transition systems; the goal is

to prove that ISA in some reasonable sense simulates P .

2

Page 3: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

ISA

States: I = 〈pc : IAddr, rf : RF, dmem : DMem, imem : IMem〉

Transitions:

guard: imem.pc = . . . action

opc1 dest src1 src2 pc := pc + 4 rf .dest := alu opc1 (rf .src1) (rf .src2)

opc2 dest src1 imm pc := pc + 4 rf .dest := alu opc2 (rf .src1) imm

ld dest src1 offset pc := pc + 4 rf .dest := dmem.(rf .src1 + offset)

st src1 dest offset pc := pc + 4 dmem.(rf .dest + offset) := rf .src1

opc3 reg offset pc :=

{

br target if br takenpc + 4 otherwise

, where

br target = target pc offsetbr taken = taken opc3 (rf .reg)

dest , src1 , src2 , reg : Reg imm, offset : Wordopc1 : {add, sub, mult} opc2 : {addi, subi, multi} opc3 : {beqz, bnez, j}

3

Page 4: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Example: P = DLX

. States of P : 〈pc, rf , dmem, imem, p1, p2, p3, p4〉

.

58?9>:=;<,,

,,,,

,,

��,,

,,,

p1

48?9>:=;<,,

,,,

��,,

,,,

p2

38?9>:=;<,,

,,,

��,,

,,,

p3

28?9>:=;<,,

,,,

��,,

,,,

p4

18?9>:=;<,,

,,,

��,,

,,,,

,,

p′1 p′2 p′3 p′4

p1

68?9>:=;<,,

,,,

��,,

,,,

p2

38?9>:=;<,,

,,,

��,,

,,,

p3

28?9>:=;<,,

,,,

��,,

,,,

p4

18?9>:=;<,,

,,,

��,,

,,,,

,,

ep′2 p′3 p′4

p1

��

p2

38?9>:=;<,,

,,,

��,,

,,,

p3

28?9>:=;<,,

,,,

��,,

,,,

p4

18?9>:=;<,,

,,,

��,,

,,,,

,,

p1e

p′3 p′4

[regular cycle] [branch taken] [stall for load]

.

18?9>:=;< p4 writes back to rf and retires

28?9>:=;< p3 does memory access

38?9>:=;< alu computes result/mem. address for p2

48?9>:=;< p1 gets data from rf or by forwarding;

in the branch case, target/taken computed

58?9>:=;< new p′1 is fetched; pc incremented

68?9>:=;< like 48?9>:=;<, plus updating pc with computed target

4

Page 5: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Simulating P in ISA

. First need to map states of P to ISA states:

.P -stateX = p1 p2 p3 p4 pc rf dmem imem

.

α

��

.

.

.

ISA-stateX = pc∗ rf ∗ dmem∗ imem∗

5

Page 6: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Flushing

. Mapping states of P to ISA states:

.P -state

α

��

X = p1

��

p2

��

p3

��

p4

��

pc rf dmem imem

. Xe

��

p′2

��

p′3

��

p′4

��

pc′ rf ′ dmem ′ imem

Xe

��

e

��

p′′3

��

p′′4

��

pc′ rf ′′ dmem ′′ imem

Xe e e

p′′′4

��

pc′ rf ′′′ dmem ′′′ imem

. Xe e e e

pc′ rf ′′′′ dmem ′′′ imem

ISA-stateX = pc′ rf ′′′′ dmem ′′′ imem

6

Page 7: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

The Burch-Dill Method for DLX [CAV 94]

✹ D DLX states

✹ dlx step : D → D DLX transitions

✹ α : D → I flushing function

D

α

��

dlx step//D

α

��

Theorem

I[isa step]

//I

7

Page 8: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

The Burch-Dill Method in General

✹ P = I × PipeRegs states of P

✹ p step : P → P transitions of P

✹ α : P → I flushing function

✹ Correctness Theorem for P :

P

α

��

p step//P

α

��

I(isa step)∗

//I

✹ For proof, don’t need defs of alu, target, taken, and only needbasic read-write array properties of rf and mem

✹ Correctness thm expressed in the language of SMT solvers

✹ . . . but it’s huge for complex P . Must partition the thm.

8

Page 9: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

The MOP Method

Put a highly-nondeterministic “machine” MOP between P andISA and deduce the P vs. ISA correctness from P vs. MOP andMOP vs. ISA.

✹ MOP is derived from ISA and the features of P

✹ MOP is the mother of many pipelines:

P1

((QQQQQQQQQQQQQQ

P2

,,XXXXXXXXXXXXX

... MOP oo ____________ ISA

...

Pn

66mmmmmmmmmmmmmm

✹ Out-of-order execution and non-determinism in P are OK

✹ Potential to systematically partition correctness theorem

9

Page 10: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

MOP : Starting Observations

✹ One can formalize “instructions-in-flight”—parcels

✹ Parcels form a poset of finite height—progress ordering

✹ With every state of P we can associate a set of parcels

✹ At each cycle of execution of P :

• pc, rf , mem get updated

• some parcels get in (fetched)

• some parcels get out (retired)

• some parcels progress

10

Page 11: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Parcels

Parcel =

〈instr : Instr⊥my pc : IAddr⊥dest , src1 , src2 : Reg⊥imm : Word⊥opc : Opcode⊥data1 , data2 : Word⊥res , mem addr : Word⊥tkn : bool⊥next pc : IAddr⊥wb : {⊥,⊤}pc upd : {⊥, s,m,⊤}〉

11

Page 12: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Definition of MOP

States: M = I × 〈q : N→ Parcel, head : N, tail : N〉

q

head tail

0 1 2 3 4 5 6 7 8

pc rf dmem imem

...

Transitions: Atomic actions occurring in executions

12

Page 13: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Transitions

def i = imem.pc fetch

grd length = 0 ∨ q.tail .pc upd 6= ⊥

act q.(tail + 1) := empty parcel[instr 7→ i, my pc 7→ pc] tail := tail + 1

def p = q.j decode j

grd head ≤ j ≤ tail ¬(decoded p)

act p := decode p

✹ 16 more rules: data1 rf , data1 forward, result, mem addr,write back j, load j, store j, branch target j, branch taken j,next pc branch j, next pc nonbranch j, pc update, speculate,prediction ok j, squash, retire

13

Page 14: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Confluence

MOP# ≡ MOP without fetch

Theorem MOP# is terminating.

Theorem Both MOP and MOP#

are locally confluent.

(≈ 400 little theorems)

ρ1

����������������

ρ2

��222

2222

2222

222

?

��

?

14

Page 15: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Local Confluence: About Proof

Most cases are resolved trivially:

•ρ1

������

����

����

ρ2

��666

6666

6666

6

ρ2��6

6666

6666

666 •

ρ1����

����

����

��

or •

ρ1

����������������

ρ2

��222

2222

2222

222

•ρ2

33•

Some are interesting:

o •

retire

������

����

����

data queue j

��666

6666

6666

6

data rf j

��666

6666

6666

6 •

retire

������

����

����

15

Page 16: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Flushing and Burch-Dill for MOP

✹ Define

• flushed MOP state ≡ its q-component is empty

✹ Given m ∈ M, any MOP# run m→ m1 → m2 → . . .

• . . . terminates . . .

• . . . in the same final flushed state m′

✹ Define

• α(m) = the ISA-component of m′

✹ Burch-Dill Theorem for MOP :

M

α

��

ρ//M

α

��

(for any MOP rule ρ)

I[isa step]

//I16

Page 17: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Simulating Microarchitectures in MOP

Theorem To verify P against ISA, it suffices to find β : P →M

such that, for every p ∈ P , β (p step p) is reachable from β p.

Proof:

p

β

��

p step// p′

β

��

m //

α

��

• //

��

• //

��

• //

��

• //

}}{{{{

{{{{

{{{{

{{{{

{{{{

{{• //

��111

1111

1111

1111

11• //

��

• //

�� m′

α

��

←− mam steps

s //• //• // s′ ←− isa steps

17

Page 18: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Partitioning the Simulation Proof

p

β

��

p step// p′

β

��

m //• //• //• //• //• //• //• // m′

✹ How to find a sequence of MOP transitions from m to m′?

✹ —Using sequence of “quasi P -states” from p to p′:

〈v1, v2, v3, . . . , vn〉 = p

; 〈v′1, v2, v3, . . . , vn〉 = p1

; 〈v′1, v′2, v3, . . . , vn〉 = p2

; . . .

; 〈v′1, v′2, v′3, . . . , v

′n〉 = pn−1

; 〈v′1, v′2, v′3, . . . , v

′n〉 = p′

v′i = next v i (v1, . . . , vn)

✹ . . . get the corresponding MOP states m, m1, . . . , mn−1, m′

and short MOP paths from mi to mi+1.

18

Page 19: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

What We’ve Done

✹ Models of ISA,MOP ,DLX in reFLect

✹ Local confluence proofs with CVCL

✹ Simulation proofs for DLX (via short paths in MOP) with

CVCL

19

Page 20: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

To Do

✹ Case study of an out-of-order processor model

✹ Refining the method

• Systematic ways of defining β : P →M and finding short

paths to connect β(p) with β(p step p) in MOP

• Controlling term size in subgoals (heuristics for

expanding function definitions vs. treating them as

uninterpreted)

✹ Fast and flexible SMT solvers

20

Page 21: Mothers of Pipelines - brenta.dit.unitn.itbrenta.dit.unitn.it/~rseba/pdpar06/SLIDES_TALKS/Krstic_pdpar06.pdf · opc3 reg offset pc := ... One can formalize “instructions-in-flight”—parcels

Selection of Related Work

✹ Burch & Dill [CAV 94]

✹ Damm & Pnueli [CHARME 97]

✹ Shen & Arvind [Formal Techn. for Hardware 98]

✹ Skakkebæk & al. [CAV 98]

✹ Hosabettu & al. [CAV 00]

✹ Lahiri & Bryant [CAV 03]

✹ Manolios & Srinivasan [ICCAD 05]

21