Mothers of Pipelines

Sava Krstic, Robert Jones, John O'Leary

Intel Corporation

PDPAR 2006

Pipeline Verification: P vs. ISA

Prove that a high-level model P of a pipelined processor is faithful to the given instruction set architecture (ISA).

✹ Both P and ISA are seen as transition systems; the goal is to prove that ISA in some reasonable sense simulates P.

States: I = 〈pc : IAddr, rf : RF, dmem : DMem, imem : IMem〉

Transitions:

    guard: imem.pc = ...       action
    -----------------------    ------------------------------------------------------
    opc1 dest src1 src2        pc := pc + 4     rf.dest := alu opc1 (rf.src1) (rf.src2)
    opc2 dest src1 imm         pc := pc + 4     rf.dest := alu opc2 (rf.src1) imm
    ld dest src1 offset        pc := pc + 4     rf.dest := dmem.(rf.src1 + offset)
    st src1 dest offset        pc := pc + 4     dmem.(rf.dest + offset) := rf.src1
    opc3 reg offset            pc := br_target if br_taken, pc + 4 otherwise

where

    br_target = target pc offset
    br_taken  = taken opc3 (rf.reg)

    dest, src1, src2, reg : Reg
    imm, offset : Word
    opc1 : {add, sub, mult}
    opc2 : {addi, subi, multi}
    opc3 : {beqz, bnez, j}
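The transition table above can be read as a single step function. The following is a minimal sketch, not the authors' model: the slides leave alu, target and taken abstract, so the concrete definitions below (including target = pc + offset and unwritten memory reading as 0) are assumptions chosen only to make the example executable.

```python
from dataclasses import dataclass

# Assumed concrete stand-ins; the slides treat alu, target, taken as abstract.
ALU_OPS = {
    "add": lambda a, b: a + b, "addi": lambda a, b: a + b,
    "sub": lambda a, b: a - b, "subi": lambda a, b: a - b,
    "mult": lambda a, b: a * b, "multi": lambda a, b: a * b,
}

def alu(opc, a, b):
    return ALU_OPS[opc](a, b)

def target(pc, offset):
    return pc + offset  # assumption: branch target is pc-relative

def taken(opc, value):
    return {"beqz": value == 0, "bnez": value != 0, "j": True}[opc]

@dataclass
class IsaState:
    pc: int
    rf: dict    # register name -> word
    dmem: dict  # address -> word (assumption: unwritten cells read as 0)
    imem: dict  # address -> instruction tuple

def isa_step(s):
    """One ISA transition, selected by the instruction at imem.pc."""
    instr = s.imem[s.pc]
    op = instr[0]
    rf, dmem, pc = dict(s.rf), dict(s.dmem), s.pc + 4
    if op in ("add", "sub", "mult"):
        _, dest, src1, src2 = instr
        rf[dest] = alu(op, s.rf[src1], s.rf[src2])
    elif op in ("addi", "subi", "multi"):
        _, dest, src1, imm = instr
        rf[dest] = alu(op, s.rf[src1], imm)
    elif op == "ld":
        _, dest, src1, offset = instr
        rf[dest] = s.dmem.get(s.rf[src1] + offset, 0)
    elif op == "st":
        _, src1, dest, offset = instr
        dmem[s.rf[dest] + offset] = s.rf[src1]
    elif op in ("beqz", "bnez", "j"):
        _, reg, offset = instr
        if taken(op, s.rf[reg]):
            pc = target(s.pc, offset)
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return IsaState(pc, rf, dmem, s.imem)
```

Each call returns a fresh state and leaves its argument untouched, which keeps the function usable as the ISA side of a simulation check.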

Example: P = DLX

✹ States of P: 〈pc, rf, dmem, imem, p1, p2, p3, p4〉

[Figure: one step of DLX in three cases, built from the numbered actions listed below; e marks an empty pipeline slot]

    [regular cycle]      actions (5) (4) (3) (2) (1) give   p′1 p′2 p′3 p′4
    [branch taken]       actions (6) (3) (2) (1) give       e p′2 p′3 p′4
    [stall for load]     actions (3) (2) (1) give           p′3 p′4

(1) p4 writes back to rf and retires

(2) p3 does memory access

(3) alu computes result/mem. address for p2

(4) p1 gets data from rf or by forwarding; in the branch case, target/taken computed

(5) new p′1 is fetched; pc incremented

(6) like (4), plus updating pc with computed target

Simulating P in ISA

✹ First need to map states of P to ISA states:

    P-state   = 〈p1, p2, p3, p4, pc, rf, dmem, imem〉
        ↓
    ISA-state = 〈pc∗, rf∗, dmem∗, imem∗〉

Flushing

✹ Mapping states of P to ISA states: step P with fetching disabled until the pipeline is empty (e marks an empty slot):

    p1   p2    p3     p4      |  pc    rf       dmem      imem      (P-state)
    e    p′2   p′3    p′4     |  pc′   rf′      dmem′     imem
    e    e     p′′3   p′′4    |  pc′   rf′′     dmem′′    imem
    e    e     e      p′′′4   |  pc′   rf′′′    dmem′′′   imem
    e    e     e      e       |  pc′   rf′′′′   dmem′′′   imem

ISA-state = 〈pc′, rf′′′′, dmem′′′, imem〉

The Burch-Dill Method for DLX [CAV 94]

✹ D                      DLX states

✹ dlx_step : D → D       DLX transitions

✹ α : D → I              flushing function

Theorem: the following diagram commutes.

           dlx_step
      D --------------> D
      |                 |
    α |                 | α
      v                 v
      I --------------> I
          [isa_step]

The Burch-Dill Method in General

✹ P = I × PipeRegs       states of P

✹ p_step : P → P         transitions of P

✹ α : P → I              flushing function

✹ Correctness Theorem for P: the following diagram commutes.

            p_step
      P --------------> P
      |                 |
    α |                 | α
      v                 v
      I --------------> I
         (isa_step)∗

✹ For the proof, we don't need the definitions of alu, target, taken, and need only basic read-write array properties of rf and mem

✹ Correctness theorem is expressed in the language of SMT solvers

✹ ... but it's huge for complex P. Must partition the theorem.
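To make the commuting-diagram statement concrete, here is a small sketch that checks it by brute force on a toy machine. Everything in it is hypothetical and far simpler than DLX: a two-register ISA with a fixed three-instruction program, and a pipeline with a single latch holding a computed but not yet written-back result. The sketch checks the special case where each pipeline step corresponds to exactly one ISA step, whereas the theorem above allows (isa_step)∗.

```python
from itertools import product

# Toy program (assumption): each entry (dest, src, imm) means rf[dest] := rf[src] + imm.
IMEM = [(0, 1, 1), (1, 0, 2), (1, 1, 1)]

def isa_step(s):
    """ISA state is (pc, rf); pc wraps around so that the step is total."""
    pc, rf = s
    dest, src, imm = IMEM[pc]
    new_rf = list(rf)
    new_rf[dest] = rf[src] + imm
    return ((pc + 1) % len(IMEM), tuple(new_rf))

def write_back(rf, latch):
    """Commit the latched (dest, value) pair, if any, to the register file."""
    if latch is None:
        return rf
    dest, value = latch
    new_rf = list(rf)
    new_rf[dest] = value
    return tuple(new_rf)

def p_step(p, forward=True):
    """Pipeline state is (pc, rf, latch): retire the latch, execute the next instruction."""
    pc, rf, latch = p
    dest, src, imm = IMEM[pc]
    if forward and latch is not None and latch[0] == src:
        operand = latch[1]   # forwarded from the in-flight instruction
    else:
        operand = rf[src]    # read from the (possibly stale) register file
    return ((pc + 1) % len(IMEM), write_back(rf, latch), (dest, operand + imm))

def alpha(p):
    """Flushing: drain the latch without fetching, then project to the ISA state."""
    pc, rf, latch = p
    return (pc, write_back(rf, latch))

def burch_dill_holds(forward=True):
    """Check alpha(p_step p) == isa_step(alpha p) over a small state space."""
    latches = [None] + [(d, v) for d in range(2) for v in range(3)]
    states = [(pc, rf, latch)
              for pc in range(len(IMEM))
              for rf in product(range(3), repeat=2)
              for latch in latches]
    return all(alpha(p_step(p, forward)) == isa_step(alpha(p)) for p in states)
```

Calling burch_dill_holds(forward=False) exposes the kind of bug the theorem is meant to catch: without forwarding, an instruction can read a register whose new value is still in the latch.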

The MOP Method

Put a highly nondeterministic “machine” MOP between P and ISA, and deduce the P vs. ISA correctness from P vs. MOP and MOP vs. ISA.

✹ MOP is derived from ISA and the features of P

✹ MOP is the mother of many pipelines:

[Diagram: solid arrows from several pipelines (...) into MOP; a dashed arrow from ISA to MOP]

✹ Out-of-order execution and non-determinism in P are OK

✹ Potential to systematically partition correctness theorem

MOP : Starting Observations

✹ One can formalize “instructions-in-flight”—parcels

✹ Parcels form a poset of finite height—progress ordering

✹ With every state of P we can associate a set of parcels

✹ At each cycle of execution of P :

• pc, rf , mem get updated

• some parcels get in (fetched)

• some parcels get out (retired)

• some parcels progress

Parcels

Parcel =

  〈 instr : Instr⊥
    my_pc : IAddr⊥
    dest, src1, src2 : Reg⊥
    imm : Word⊥
    opc : Opcode⊥
    data1, data2 : Word⊥
    res, mem_addr : Word⊥
    tkn : bool⊥
    next_pc : IAddr⊥
    wb : {⊥, ⊤}
    pc_upd : {⊥, s, m, ⊤} 〉

Definition of MOP

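States: M = I × 〈q : N → Parcel, head : N, tail : N〉

[Diagram: the queue q drawn as cells indexed 0 1 2 3 4 5 6 7 8, with head and tail pointing into it, alongside the ISA components pc, rf, dmem, imem]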

Transitions: Atomic actions occurring in executions

Transitions

fetch

    def   i = imem.pc
    grd   length = 0 ∨ q.tail.pc_upd ≠ ⊥
    act   q.(tail + 1) := empty_parcel[instr ↦ i, my_pc ↦ pc]
          tail := tail + 1

decode j

    def   p = q.j
    grd   head ≤ j ≤ tail
          ¬(decoded p)
    act   p := decode p

✹ 16 more rules: data1_rf, data1_forward, result, mem_addr, write_back j, load j, store j, branch_target j, branch_taken j, next_pc_branch j, next_pc_nonbranch j, pc_update, speculate, prediction_ok j, squash, retire
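The guard/action shape of the two rules shown above can be sketched in code as a pair of functions per rule. This is only an illustration under stated assumptions: the Parcel here carries a handful of the fields, None plays the role of ⊥, the queue is taken to occupy indices head..tail so that length = tail − head + 1, "decoded" is taken to mean that opc is set, and decoding handles only instruction tuples of the form (opc, dest, ...). None of these choices is confirmed by the slides.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional

@dataclass(frozen=True)
class Parcel:
    # A small subset of the parcel fields; None stands for ⊥.
    instr: Optional[tuple] = None
    my_pc: Optional[int] = None
    opc: Optional[str] = None
    dest: Optional[str] = None
    pc_upd: Optional[str] = None

@dataclass
class Mop:
    pc: int
    imem: Dict[int, tuple]
    q: Dict[int, Parcel] = field(default_factory=dict)
    head: int = 1   # assumption: the queue occupies indices head..tail
    tail: int = 0

def length(m):
    return m.tail - m.head + 1

def fetch_enabled(m):
    # grd: length = 0 or q.tail.pc_upd is defined
    return length(m) == 0 or m.q[m.tail].pc_upd is not None

def fetch(m):
    # act: append a fresh parcel holding the fetched instruction and its pc
    assert fetch_enabled(m)
    parcel = Parcel(instr=m.imem[m.pc], my_pc=m.pc)
    return replace(m, q={**m.q, m.tail + 1: parcel}, tail=m.tail + 1)

def decoded(p):
    return p.opc is not None   # assumption about what "decoded" means

def decode_enabled(m, j):
    # grd: head <= j <= tail and parcel j is not yet decoded
    return m.head <= j <= m.tail and not decoded(m.q[j])

def decode(m, j):
    # act: fill in the decoded fields of parcel j
    assert decode_enabled(m, j)
    p = m.q[j]
    return replace(m, q={**m.q, j: replace(p, opc=p.instr[0], dest=p.instr[1])})
```

Note that fetch does not touch pc: after one fetch, a second one stays disabled until some other rule sets pc_upd on the tail parcel, which is how the guard serializes fetches.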

Confluence

MOP# ≡ MOP without fetch

Theorem MOP# is terminating.

Theorem Both MOP and MOP# are locally confluent.

(≈ 400 little theorems)

[Diagram: local confluence diamond]

Local Confluence: About Proof

Most cases are resolved trivially:

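[Diagrams: two ways a pair of rules ρ1, ρ2 enabled in the same state • is joined trivially. In the first, the rules commute: ρ1 followed by ρ2 reaches the same state as ρ2 followed by ρ1. The second diagram, introduced by “or”, survives only as a fragment labelled ρ2.]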

Some are interesting:

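[Diagram: a diamond with edges labelled retire, data_queue j, data_rf j, retire. Presumably one path is retire followed by data_rf j, and the other is data_queue j followed by retire.]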

Flushing and Burch-Dill for MOP

✹ Define

• flushed MOP state ≡ its q-component is empty

✹ Given m ∈ M, any MOP# run m→ m1 → m2 → . . .

• . . . terminates . . .

• . . . in the same final flushed state m′

✹ Define

• α(m) = the ISA-component of m′
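The definition of α relies on a standard fact (Newman's lemma, not named on the slides): a terminating and locally confluent system is confluent, so every state has exactly one normal form. The sketch below checks both properties by exhaustive search on a toy rule set. The rules and states are hypothetical and unrelated to the actual MOP rules; a rule is modelled as a function that returns the next state, or None when its guard fails.

```python
def successors(state, rules):
    """States reachable in one step; a rule returns None when it is disabled."""
    return {r(state) for r in rules} - {None}

def reachable(state, rules):
    """All states reachable in zero or more steps (finite search)."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in successors(stack.pop(), rules):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def locally_confluent(states, rules):
    """Every pair of one-step successors of a state has a common descendant."""
    for s in states:
        succ = list(successors(s, rules))
        for a in succ:
            for b in succ:
                if not (reachable(a, rules) & reachable(b, rules)):
                    return False
    return True

def normal_forms(state, rules):
    """Reachable states in which no rule is enabled."""
    return {t for t in reachable(state, rules) if not successors(t, rules)}

# Toy rules on pairs (a, b); each step decreases 2*a + b, so every run terminates.
def dec_a(s):
    return (s[0] - 1, s[1]) if s[0] > 0 else None

def dec_b(s):
    return (s[0], s[1] - 1) if s[1] > 0 else None

def move(s):
    return (s[0] - 1, s[1] + 1) if s[0] > 0 else None

TOY_RULES = [dec_a, dec_b, move]
TOY_STATES = [(a, b) for a in range(3) for b in range(3)]
```

In this toy system the unique normal form (0, 0) plays the role of the flushed state m′, and a map like α would read its result off that state.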

✹ Burch-Dill Theorem for MOP: the following diagram commutes (for any MOP rule ρ).

              ρ
      M --------------> M
      |                 |
    α |                 | α
      v                 v
      I --------------> I
          [isa_step]

Simulating Microarchitectures in MOP

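Theorem To verify P against ISA, it suffices to find β : P → M such that, for every p ∈ P, β (p_step p) is reachable from β p.

Proof:

[Diagram in three rows. Top row: p --p_step--> p′. Middle row: a path • → • → ... → m′ of MOP steps, with vertical arrows from the top row down to its endpoints. Bottom row: s → • → • → s′ of ISA steps, with arrows from the MOP states on the middle path down to the ISA states; some of these arrows converge on the same ISA state.]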

Partitioning the Simulation Proof

[Diagram: p --p_step--> p′ on top, with vertical arrows from p down to m and from p′ down to m′; below, a path m → • → • → • → • → • → • → • → m′]

✹ How to find a sequence of MOP transitions from m to m′?

✹ Use a sequence of “quasi P-states” from p to p′:

      〈v1, v2, v3, ..., vn〉 = p
    ⇝ 〈v′1, v2, v3, ..., vn〉 = p1
    ⇝ 〈v′1, v′2, v3, ..., vn〉 = p2
    ⇝ ...
    ⇝ 〈v′1, v′2, v′3, ..., v′n−1, vn〉 = pn−1
    ⇝ 〈v′1, v′2, v′3, ..., v′n〉 = p′

    where v′i = next_v_i (v1, ..., vn)

✹ ... get the corresponding MOP states m, m1, ..., mn−1, m′ and short MOP paths from mi to mi+1.
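The quasi-state sequence is easy to generate mechanically. The sketch below assumes that a P-state is a tuple of components and that the next-state functions are given as a list of functions of the whole current state; both representations are illustrative choices, not taken from the slides.

```python
def quasi_states(p, next_fns):
    """Return [p, p1, ..., p_{n-1}, p'], updating one component at a time.

    Every primed value v'_i is computed from the ORIGINAL state p, so the
    intermediate tuples are generally not reachable states of the pipeline.
    """
    primed = tuple(f(p) for f in next_fns)
    return [primed[:k] + tuple(p[k:]) for k in range(len(p) + 1)]
```

For example, with p = (1, 2, 3) and next-state functions v1 + v2, v2 * v3 and v1, the sequence is (1, 2, 3), (3, 2, 3), (3, 6, 3), (3, 6, 1). Consecutive quasi-states differ in one component only, which is what keeps each MOP path from mi to mi+1 short.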

What We’ve Done

✹ Models of ISA, MOP, DLX in reFLect

✹ Local confluence proofs with CVCL

✹ Simulation proofs for DLX (via short paths in MOP) with

✹ Case study of an out-of-order processor model

✹ Refining the method

• Systematic ways of defining β : P → M and finding short paths to connect β(p) with β(p_step p) in MOP

• Controlling term size in subgoals (heuristics for expanding function definitions vs. treating them as uninterpreted)

✹ Fast and flexible SMT solvers

Selection of Related Work

✹ Burch & Dill [CAV 94]

✹ Damm & Pnueli [CHARME 97]

✹ Shen & Arvind [Formal Techn. for Hardware 98]

✹ Skakkebæk & al. [CAV 98]

✹ Hosabettu & al. [CAV 00]

✹ Lahiri & Bryant [CAV 03]

✹ Manolios & Srinivasan [ICCAD 05]
