NASA/CR-1999-208991
A Systematic Methodology for Verifying
Superscalar Microprocessors
Mandayam Srivas
SRI International, Menlo Park, CA
Ravi Hosabettu & Ganesh Gopalakrishnan
University of Utah, Salt Lake City, UT
February 1999
The NASA STI Program Office ... in Profile

Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role.

The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types:

• TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA counterpart of peer-reviewed formal professional papers, but having less stringent limitations on manuscript length and extent of graphic presentations.

• TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.

• CONTRACTOR REPORT. Scientific and technical findings by NASA-sponsored contractors and grantees.

• CONFERENCE PUBLICATION. Collected papers from scientific and technical conferences, symposia, seminars, or other meetings sponsored or co-sponsored by NASA.

• SPECIAL PUBLICATION. Scientific, technical, or historical information from NASA programs, projects, and missions, often concerned with subjects having substantial public interest.

• TECHNICAL TRANSLATION. English-language translations of foreign scientific and technical material pertinent to NASA's mission.

Specialized services that complement the STI Program Office's diverse offerings include creating custom thesauri, building customized databases, organizing and publishing research results ... even providing videos.

For more information about the NASA STI Program Office, see the following:

• Access the NASA STI Program Home Page at http://www.sti.nasa.gov

• E-mail your question via the Internet to [email protected]

• Fax your question to the NASA STI Help Desk at (301) 621-0134

• Phone the NASA STI Help Desk at (301) 621-0390

• Write to:
  NASA STI Help Desk
  NASA Center for AeroSpace Information
  7121 Standard Drive
  Hanover, MD 21076-1320
NASA/CR-1999-208991
A Systematic Methodology for Verifying
Superscalar Microprocessors
Mandayam Srivas
SRI International, Menlo Park, CA
Ravi Hosabettu & Ganesh Gopalakrishnan
University of Utah, Salt Lake City, UT
National Aeronautics and
Space Administration
Langley Research Center
Hampton, Virginia 23681-2199

Prepared for Langley Research Center under Contract NAS1-20334
February 1999
Available from:
NASA Center for AeroSpace Information (CASI)
7121 Standard Drive
Hanover, MD 21076-1320
(301) 621-0390

National Technical Information Service (NTIS)
5285 Port Royal Road
Springfield, VA 22161-2171
(703) 605-6000
Abstract
We present a systematic approach to decompose and incrementally build
the proof of correctness of pipelined microprocessors. The central idea is to
construct the abstraction function by using completion functions, one per
unfinished instruction, each of which specifies the effect (on the observables)
of completing the instruction. In addition to avoiding the term size and case
explosion problem that limits the pure flushing approach, our method helps
localize errors, and also handles stages with iterative loops. The technique
is illustrated on pipelined and superscalar pipelined implementations of a
subset of the DLX architecture. It has also been applied to a processor with out-of-order execution.
Contents

1 Introduction 1

2 The Completion Functions Approach 5
  2.1 Pipelined Microprocessor Correctness Criteria ......... 5
  2.2 The Completion Functions Approach .............. 7

3 Application of Our Methodology 11
  3.1 Application to the DLX Processor ............... 11
    3.1.1 Completion Functions and Constructing the Abstraction Function . 13
    3.1.2 The Decomposition of the Proof ............ 15
    3.1.3 The Proof Details .................... 17
  3.2 Application to Superscalar DLX Processor ........... 20
    3.2.1 Completion Functions and the Abstraction Function . 21
  3.3 Application to Out-of-order Execution ............. 22
    3.3.1 Constructing the Abstraction Function ......... 23
    3.3.2 Proof Details ....................... 24
    3.3.3 Comparison with the MAETT Approach ....... 25
  3.4 Hybrid Approach to Reduce the Manual Effort ........ 25

4 Conclusions 27
List of Figures

2.1 Pipelined microprocessor correctness criteria .......... 6
2.2 A simple four-stage pipeline and decomposition of the proof
    under completion functions ................... 8
3.1 Pipelined implementation .................... 12
3.2 The decomposition of the commutative diagram for regfile . 16
3.3 The issue logic in the superscalar DLX processor ....... 20
3.4 The processor with out-of-order execution (example used in
    [SH97]) .............................. 23
Chapter 1
Introduction
In the past few years, advances in hardware verification research have resulted
in the successful verification of several large, real hardware designs.
The verification [SM95] using PVS [ORSvH95] of Rockwell International's
AAMP5 and AAMP-FV microprocessors, which was sponsored by NASA's
Langley Research Laboratory, was one example of such an effort. While such
verification efforts have certainly increased the awareness of the value of for-
mal verification within the hardware design industry, the technology is still
far from being successfully and completely transitioned to industry. As the
AAMP verification projects demonstrated, the main obstacles to technology
transition, especially in microprocessor verification, are the following:
1. Lack of efficient capabilities for symbolic simulation (with uninter-
preted functions) of hardware designs and automatic decision proce-
dures for the most commonly used data types in hardware designs, such as bit-vectors.
2. Lack of suitable verification methodologies that are applicable to the
kind of challenging architectures, such as superscalar pipelines, out-
of-order execution, etc., employed by modern microprocessors.
Support for efficient symbolic simulation is crucial because symbolic simulation is at the core of most of the methods based on commutative
diagram correspondence checking used in verifying microprocessor designs
at the register-transfer level. Although there exist systems, such as ACL2
[KM96], that support faster symbolic simulation than the current public
version of PVS, efficient symbolic simulation alone is not sufficient for scal-
ing up verification to state-of-the-art microprocessors, such as the Pentium
processor. We also need a verification methodology, i.e., appropriate ab-
straction/refinement techniques and re-usable proof strategies, to set up the
overall verification and decompose the complex verification problem into
properties that can be automatically verified by symbolic simulation and
decision procedures. Under the sponsorship of NASA's Langley Research
Center, SRI has been working on developing solutions to the above obstacles
in scaling up formal verification of microprocessors. This document reports
the result of the second task of developing a systematic methodology for
verifying microprocessors that employ advanced design features, such as su-
perscalar pipelining, speculative execution, and out-of-order execution to
enhance their throughput. Under a separate NASA task, we are enhancing
the efficiency and automation capabilities of symbolic simulation in PVS.
Most approaches to mechanical verification of pipelined processors rely
on several key techniques. First, given a pipelined implementation and a
simpler Instruction Set Architecture (ISA)-level specification, they require
a suitable abstraction mapping from an implementation state to a specifica-
tion state. They use the abstraction function to establish a correspondence
between the two machines by means of a commutative diagram. Second,
they use symbolic simulation to derive logical expressions corresponding
to the two paths in the commutative diagram, which are then tested for
equivalence. An automatic way to perform this equivalence testing is to use
ground decision procedures for equality with uninterpreted functions such as
the ones in PVS. This strategy has been used to verify several processors in
PVS [Cyr93, CRSS94, SM95]. Some of the approaches to pipelined proces-
sor verification rely on the user providing the definition for the abstraction
function. Burch and Dill in [BD94] observed that the effect of flushing the
pipeline, for example by pumping a sequence of NOPs, can be used to auto-
matically compute a suitable abstraction function. Burch and Dill used this
flushing approach along with a validity checker [JDB95, BDL96] (i.e., their
version of a decision procedure for uninterpreted functions with equality) to
effectively automate the verification of pipelined implementations of several
processors.
The pure flushing approach has the drawback of making the size of the
generated abstraction function and the number of examined cases imprac-
tically large for deep and complex superscalar pipelines. To verify a su-
perscalar example using the flushing approach, Burch [Bur96] decomposed
the verification problem into three subproblems and suggested a technique
requiring the user to add some extra control inputs to the implementation
and set them appropriately to construct the abstraction function. He also
had to fine-tune the validity checker used in the experiment, requiring the
user to help it with many manually derived case splits. It is unclear how the
decomposition of the proof and the abstraction function used in [Bur96] can
be reused for verifying other superscalar examples. Another drawback of the
pure flushing approach is that it is hard to use for pipelines with
indeterminate latency. Such a situation can arise if the control involves
data-dependent loops or if some part of the processor, such as the
memory-cache interface, is abstracted away for managing the complexity in
verifying a large system.

We propose a systematic methodology to modularize as well as decompose
the proof of correctness of microprocessors with complex pipeline
architectures. Called the completion functions method, our approach relies on
the user expressing the abstraction function in terms of a set of completion
functions, one per unfinished instruction in the machine. Each completion
function specifies the desired effect (on the observables) of completing the
instruction. Notice that one is not obligated to state how such completion
would actually be attained, which, indeed, can be very complex, involving
details such as squashing, pipeline stalls, and even data-dependent iterative
loops. Moreover, we strongly believe that a typical designer would have a
very clear understanding of the completion functions, and would not find
the task of describing them and constructing the abstraction function oner-
ous. In addition to actually gaining from designers' insights, verification
based on the completion functions method has other advantages. It results
in a natural decomposition of proofs. Proofs build up in a layered manner
where the designer actually debugs the last pipeline stage first through a
verification condition, and then uses this verification condition as a rewrite
rule in debugging the penultimate stage, and so on. Because of this layering,
the proof strategy employed is fairly simple and almost generic in practice.
Debugging is far more effective than in other methods because errors can
be localized to a stage, eliminating the need to wade through monolithic
proofs.
Related Work
Levitt and Olukotun [LO96] use an "unpipelining" technique for merging
successive pipeline stages through a series of behavior preserving transfor-
mations. While unpipelining also results in a decomposition of the proofs,
their transformation is performed on the implementation, whereas comple-
tion functions are defined based on the specification. Their transformations,
which have been used only for a single-issue pipeline, can get complex for
superscalar processors and processors with out-of-order execution. Cyrluk's
technique in [Cyr96], which has also been applied to a superscalar processor,
tackles the term size and case explosion problem by lazily "inverting the ab-
straction mapping" to replace big implementation terms with smaller spec-
ification terms and using the conditions in the specification terms to guide
the proof. Park and Dill have used aggregation functions [PD96], which are
conceptually similar to completion functions, for distributed cache coherence
protocol verification. In [SH97], Sawada and Hunt used an incremental ver-
ification technique to verify a processor with out-of-order execution, which
we have reverified with our approach. We describe the differences between
the two approaches in section 3.3.
Chapter 2
The Completion Functions
Approach to Processor
Verification
The completion functions approach aims to develop the proof of correctness
of pipelined processors in a modular and layered fashion.
2.1 Pipelined Microprocessor Correctness Criteria
Figure 2.1(a) shows the correctness criterion (used in [SH97, BD94]) that we
aim to establish. Figure 2.1(a) requires that every sequence of n implemen-
tation transitions that start and end with flushed states (i.e., no partially
executed instructions) corresponds to a sequence of m instructions (i.e., tran-
sitions) executed by the specification machine. I_step is the implementation
transition function and A_step is the specification transition function. The
projection extracts only those implementation state components visible
to the specification (i.e., the observables). This criterion is preferred over
others where the commute diagram does not necessarily start with a flushed
state because it corresponds to the intuition that a real pipelined micropro-
cessor starting at a flushed state, running some program and terminating in
a flushed state is emulated by a specification machine whose starting and
terminating states are in direct correspondence through projection. This
criterion can be proved by induction on n once the commutative diagram
condition shown in Figure 2.1(b) has been proved on a single implementa-
Chapter 2. The Completion Functions Approach
tion machine transition. This inductive proof can be constructed once, as we
have demonstrated in the proof files given in [Hos98], for arbitrary machines
that satisfy the conditions described in the next paragraph. In the rest of
the paper, we concentrate on verifying the commutative diagram condition.
[Figure 2.1: Pipelined microprocessor correctness criteria. (a) A sequence of
n I_step transitions between flushed implementation states corresponds, via
projection, to m A_step transitions of the specification machine. (b) The
commutative diagram for a single transition, relating I_step and A_step
through the abstraction function ABS.]
Intuitively, Figure 2.1(b) states that if the implementation machine starts
in an arbitrary reachable state impl_state and the specification machine
starts in a corresponding specification state (given by an abstraction func-
tion ABS), then after execution of a transition, their new states correspond.
ABS must be chosen so that for all flushed states fs the projection condi-
tion ABS(fs) = projection(fs) holds. The commutative diagram uses a
modified transition function A_step', which denotes zero or more applica-
tions of A_step, because an implementation transition from an arbitrary
state might correspond to executing in the specification machine zero in-
structions (e.g., if the implementation machine stalls because of pipeline
interlocks) or more than one instruction (e.g., if the implementation ma-
chine has multiple pipelines). The number of instructions executed by the
specification machine is provided by a user-defined synchronization function
on implementation states. One of the crucial proof obligations is to show
that this function does not always return zero. One also needs to prove that
the implementation machine will eventually reach a flushed state if no more
instructions are inserted into the machine, to make sure that the correctness
criterion in Figure 2.1(a) is not vacuous. In addition, the user may need to
discover invariants to restrict the set of impl_state considered in the proof
of Figure 2.1(b) and prove that it is closed under I_step.
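The single-transition criterion of Figure 2.1(b), together with the synchronization function, can be made concrete with a small executable sketch. The two-stage machine, the names, and the encoding below are illustrative assumptions of ours, not the PVS development described in this report:

```python
from itertools import product

# Toy two-stage machine. Implementation state: observable register x, plus
# one latch holding an unfinished "increment" instruction (True) or a
# bubble (False).

def I_step(state, fetch):
    """Implementation step: retire the latched instruction, latch a new one."""
    x, pending = state
    return (x + 1 if pending else x, fetch)

def A_step(spec_x):
    """ISA step: execute one increment instruction."""
    return spec_x + 1

def C_latch(state):
    """Completion function: the effect, on the observable, of finishing
    the latched instruction."""
    x, pending = state
    return (x + 1 if pending else x, False)

def ABS(state):
    """Abstraction: complete every unfinished instruction, then project."""
    return C_latch(state)[0]

def sync(state, fetch):
    """How many spec instructions one implementation step corresponds to."""
    return 1 if fetch else 0

# Check the commutative diagram of Figure 2.1(b) over all states and inputs.
for x, pending, fetch in product(range(4), [False, True], [False, True]):
    s = (x, pending)
    lhs = ABS(I_step(s, fetch))
    rhs = ABS(s)
    for _ in range(sync(s, fetch)):   # A_step' = sync(s) applications of A_step
        rhs = A_step(rhs)
    assert lhs == rhs
print("commutative diagram checked")
```

When no instruction is fetched, sync returns zero and the specification machine takes no step, matching the stall case described above.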
2.2 The Completion Functions Approach
One way of defining ABS is to use a part of the implementation definition,
modified, if necessary, to construct an explicit flush operation [BD94, Bur96].
The completion functions approach is based on using an abstraction function
that is behaviorally equivalent to flushing but is not derived operationally
via flushing (in Section 3.4 we discuss a hybrid scheme extension that does use operational flushing). Rather, we construct the abstraction function as a composi-
tion (followed by a projection) in terms of a set of completion functions that
map an implementation state to an implementation state. Each completion
function specifies the desired effect on the observables of completing a partic-
ular unfinished instruction in the machine (assuming those that were fetched
ahead of it are completed), leaving all nonobservable state components un-
changed. The order in which these functions are composed is determined
by the program order of the unfinished instructions. One can use any order
that is consistent, i.e., that has the same effect, as the program order. The
conditions under which each function is composed with the rest, if any, is
determined by whether the unfinished instructions ahead of it could disrupt
the flow of instructions, for example, by being a taken branch or by raising
an exception. Observe that one is not required to state how these conditions
are actually realized in the implementation. Any mistakes, either in speci-
fying the completion functions or in constructing the abstraction function,
might lead to a false negative verification result, but never a false positive.
Consider a very simple four-stage pipeline with one observable state com-
ponent regfile, which is shown in Figure 2.2. The instructions flow down
the pipeline with every cycle in order with no stalls, hazards, and so forth,
updating the regfile in the last stage. (This is unrealistically simple, but
we explain how to handle these artifacts in subsequent sections.) At any
time, the pipeline can contain three unfinished instructions, which are held
in the three sets of pipeline registers labeled IF/ID, ID/EX, and EX/WB.
The completion function corresponding to an unfinished instruction held
in a set of pipeline registers (such as ID/EX) defines how the information
stored in those registers is combined to complete that instruction. In our
example, the completion functions are C_EX_WB, C_ID_EX, and C_IF_ID, re-
spectively. Now the abstraction function, whose effect should be to flush
the pipeline, can be expressed as a composition of these completion func-
tions as follows (we omit projection here as regfile is the only observable
state component):
ABS(impl_state) = C_IF_ID(C_ID_EX(C_EX_WB(impl_state)))
[Figure 2.2: A simple four-stage pipeline (Fetch IF, Decode ID, Execute EX,
Writeback WB, with pipeline latches IF/ID, ID/EX, EX/WB and observable
regfile) and decomposition of the proof under completion functions: the upper
path applies I_step followed by C_EX_WB, C_ID_EX, and C_IF_ID; the lower path
applies the completion functions followed by A_step.]
This definition of the abstraction function leads to a decomposition of
the proof of the commutative diagram for regfile as shown in Figure 2.2,
generating the following series of verification conditions, the last one of which
corresponds to the complete commutative diagram:
VC1: regfile(I_step(impl_state)) = regfile(C_EX_WB(impl_state))
VC2: regfile(C_EX_WB(I_step(impl_state))) =
regfile(C_ID_EX(C_EX_WB(impl_state)))
VC3: regfile(C_ID_EX(C_EX_WB(I_step(impl_state)))) =
regfile(C_IF_ID(C_ID_EX(C_EX_WB(impl_state))))
VC4: regfile(C_IF_ID(C_ID_EX(C_EX_WB(I_step(impl_state))))) =
regfile(A_step(C_IF_ID(C_ID_EX(C_EX_WB(impl_state)))))
The strategy behind the generation of verification conditions uses the fact
that I_step executes some part of each of the instructions already in the
pipeline as well as the newly fetched instruction. Each verification condition
states the effect I_step is expected to have in advancing an instruction in the
pipeline. This effect can be expressed in terms of the completion functions.
For example, VC1 expresses the effect of I_step on the instruction in the
EX/WB registers: since regfile is updated in the last stage, we would
expect that after I_step is executed, the contents of regfile would be the
same as after completion of the instruction in the EX/WB registers.
Now consider the instruction in ID/EX. I_step executes it partially as per
the logic in stage EX, and then moves the result to the EX/WB registers.
C_EX_WB can now be used to complete this instruction. This computation
must result in the same contents of regfile as completion of the instruc-
tions held in sets EX/WB and ID/EX of pipeline registers in that order.
This requirement is captured by VC2. VC3 and VC4 are constructed simi-
larly. Note that our ultimate goal is to prove VC4, with the proofs of VC1
through VC3 acting as "helpers." Each verification condition in the above
series can be proved using a standard strategy that involves expanding the
outermost function on both sides of the equation and using the previously
proved verification conditions (if any) as rewrite rules to simplify the expres-
sions, followed by automatic case analysis of the boolean terms appearing
in the conditional structure of the simplified expressions. Since we expand
only the topmost functions on both sides, and because we use the previously
proved verification conditions, the sizes of the expressions produced during
the proof and the required case analysis are kept in check.
The completion functions approach also supports incremental and lay-
ered verification. When proving VC1, we are verifying the write-back stage
of the pipeline against its specification C_EX_WB. When proving VC2, we are
verifying one more stage of the pipeline, and so on. This makes it easier to
locate errors. In the flushing approach, if there is a bug in the pipeline, the
validity checker would produce a counterexample--a set of formulas poten-
tially involving all the implementation variables--that implies the negation
of the formula corresponding to the commutative diagram. Such a coun-
terexample cannot isolate the stage in which the bug occurred.
Another advantage of using completion functions is that their definition,
unlike that of a flush operation, is not dependent on the latency of the
pipeline. Hence, our method is applicable even when the latency of the
pipeline is indeterminate. Such a situation can occur when, for example, the
pipeline contains data-dependent iterative loops or when the implementation
machine has nondeterminism. The proof that the implementation eventually
reaches a flushed state can be constructed by defining a measure function
that returns the number of cycles the implementation takes to flush and
showing that the measure decreases after a transition from a nonflushed state.
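For a fixed-latency pipeline such a measure is easy to exhibit. The following sketch (our own toy encoding, with fetching disabled) checks that it strictly decreases from every nonflushed state:

```python
from itertools import product

# A three-latch toy pipeline with fetching disabled; a latch holds 0 for
# a bubble (names and encoding are illustrative, not the report's).

def I_step(s):
    """One transition: the deepest latch retires into regfile, the others
    advance, and no new instruction is fetched."""
    rf, ex_wb, id_ex, if_id = s
    return (rf + ex_wb, id_ex, if_id, 0)

def measure(s):
    """Cycles until flushed: the position of the deepest occupied latch."""
    _, ex_wb, id_ex, if_id = s
    return 3 if if_id else (2 if id_ex else (1 if ex_wb else 0))

for rf, a, b, c in product(range(2), repeat=4):
    s = (rf, a, b, c)
    if measure(s) > 0:                       # every nonflushed state
        assert measure(I_step(s)) < measure(s)
print("measure decreases from every nonflushed state")
```

For pipelines with data-dependent loops the measure is less obvious, but the proof obligation has the same shape.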
A disadvantage of the completion functions approach is that the user
must explicitly specify the definitions for these completion functions and
then construct an abstraction function. In a later section, we describe a
hybrid approach to reduce the manual effort involved in this process.
Chapter 3
Application of Our
Methodology
In this section, we illustrate the application of our methodology to verify
three examples: pipelined and superscalar pipelined implementations of a
subset of the DLX processor [HP90] and a processor with out-of-order execu-
tion. The DLX example was previously verified in [BD94] using the flushing
approach, the superscalar DLX example in [Bur96], and the processor with
out-of-order execution in [SH97]. We describe how to specify the comple-
tion functions, construct an abstraction function, and handle stalls. We also
show the handling of speculative fetching and out-of-order execution, and
illustrate the particular decomposition and the proof strategies we used. In
Section 3.4, we explain a hybrid approach that reduces the effort in specify-
ing the completion functions in some cases. Our verification is carried out
in PVS [ORSvH95]. The detailed implementation, specification, and the
proofs for all these examples can be found at [Hos98].
3.1 Application to the DLX Processor
The specification of this processor has four state components: the program
counter pc, the register file regfile, the data memory dmem, and the in-
struction memory imem. The processor supports six types of instruction:
load, store, unconditional jump, conditional branch, alu-immediate and
a three-register alu instruction. The ALU is modeled using an uninter-
preted function. The memory system and the register file are modeled as
stores with read and write operations.
The implementation uses a five-stage pipeline as shown in Figure 3.1. We
[Figure 3.1: Pipelined implementation. Five stages: IF (fetch a new
instruction, update pc), ID (read operands, complete jump and branch
instructions), EX (compute the ALU result or the target memory address),
MEM (store to or load from memory), WB (write to the register file). The
pipeline latches are IF/ID (bubble_id, instr_id), ID/EX (bubble_ex,
operand_a, operand_b, opcode_ex, dest_ex, offset_ex), EX/MEM (dest_mem,
result_mem, load flag, store flag, mar), and MEM/WB (dest_wb, result_wb).]
organize the fifteen pipeline registers holding information about the partially
executed instructions in the design into four sets (shown in columns in Fig-
ure 3.1). The intended functionality of each stage is described in words
inside the box denoting the stage. The observable components modified in
each stage are indicated above the stage (e.g., pc is incremented in the IF
stage and conditionally modified in the ID stage--and hence is shown twice).
The implementation uses a simple "assume not taken" prediction strategy
for jump and branch instructions. Consequently, if a jump or branch is
indeed taken (br_taken signal is asserted), then the pipeline squashes the
subsequent instruction and corrects the pc. If the instruction in the IF/ID
registers is dependent on a load in the ID/EX registers, then that instruc-
tion will be stalled for one cycle (st_issue signal is asserted); otherwise,
the instructions flow down the pipeline with every cycle. No instruction is
fetched in the cycle where stall_input is asserted. The implementation
provides forwarding of data to the instruction decode unit (ID stage) where
the operands are read. The details of forwarding are not shown in the figure.
3.1.1 Completion Functions and Constructing the Abstraction Function
The processor can have at most four partially executed instructions at any
time, one each in the four sets of pipeline registers shown in Figure 3.1.
We associate a completion function with each such instruction. We need
to identify how a partially executed instruction is stored in a particular set
of pipeline registers--once this is done, the completion function for that
unfinished instruction can be easily derived from the ISA specification.
Consider the set IF/ID of pipeline registers. The intended functionality
of the IF stage is to fetch an instruction (place it in instr_id) and increment
the pc. The bubble_id register indicates whether or not the instruction is
valid. (It might be invalid, for example, if it is being squashed due to a taken
branch). So, to complete the execution of this instruction, the completion
function should do nothing if the instruction is not valid, otherwise it should
update the pc with the target address if it is a jump or a taken branch
instruction, update the dmem if it is a store instruction and update the
regfile if it is a load, alu-immediate or alu instruction according to the
semantics of the instruction. The details of how these operations are done
can be obtained from the ISA specification. This function is not obtained by
tracing the implementation; instead, the user directly provides the intended
effect. Also note that we are not concerned with load interlock or data
forwarding while specifying the completion function. We call this function
C_IF_ID.
% Complete the unfinished instruction in ID/EX pipeline registers.
Complete_ID_EX(is:state_I):state_I =
  is WITH [ (dmem) :=
              % Complete the store instruction.
              IF (instr_class(opcode_ex(is)) = store)
                 AND NOT (bubble_ex(is)) THEN
                write_dmem(dmem(is),add(operand_a(is),
                           offset_ex(is)),operand_b(is))
              % Otherwise leave it unchanged.
              ELSE dmem(is) ENDIF,
            (regfile) :=
              % Complete the load instruction.
              IF NOT (dest_ex(is)=zero_reg) AND NOT (bubble_ex(is))
                 AND (instr_class(opcode_ex(is)) = load) THEN
                write_reg(regfile(is),dest_ex(is),read_dmem(dmem(is),
                          add(operand_a(is),offset_ex(is))))
              % Complete alu_reg & alu_immed instructions.
              ELSIF NOT (dest_ex(is)=zero_reg) AND NOT (bubble_ex(is))
                 AND ((instr_class(opcode_ex(is)) = alu_reg) OR
                      (instr_class(opcode_ex(is)) = alu_immed)) THEN
                write_reg(regfile(is),dest_ex(is),
                          alu(alu_op_of(opcode_ex(is)),
                              operand_a(is),operand_b(is)))
              % Otherwise leave it unchanged.
              ELSE regfile(is) ENDIF ]

% Complete the unfinished instruction in MEM/WB pipeline registers.
Complete_MEM_WB(is:state_I):state_I =
  is WITH [ (regfile) :=
              % regfile is the only component updated here.
              IF NOT (dest_wb(is)=zero_reg) THEN
                write_reg(regfile(is),dest_wb(is),result_wb(is))
              ELSE regfile(is) ENDIF ]
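For readers less familiar with PVS, Complete_MEM_WB can be paraphrased in Python roughly as follows; the dictionary encoding of the state and the use of 0 for zero_reg are our assumptions:

```python
def complete_mem_wb(is_):
    """Complete the MEM/WB-stage instruction: write result_wb to dest_wb,
    leaving every other state component unchanged."""
    t = dict(is_)
    if is_["dest_wb"] != 0:                 # writes to zero_reg are dropped
        t["regfile"] = {**is_["regfile"], is_["dest_wb"]: is_["result_wb"]}
    return t

s = {"regfile": {0: 0, 1: 10, 2: 20}, "dest_wb": 2, "result_wb": 99}
assert complete_mem_wb(s)["regfile"] == {0: 0, 1: 10, 2: 99}
assert complete_mem_wb({**s, "dest_wb": 0})["regfile"] == s["regfile"]
```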
Now consider the unfinished instruction in the set ID/EX of pipeline
registers. The ID stage completes the execution of jump and branch in-
structions, so this instruction would affect only dmem and regfile. The
bubble_ex indicates whether or not this instruction is valid, operand_a
and operand_b are the two operands read by the ID stage, opcode_ex and
dest_ex determine the opcode and the destination register of the instruc-
tion and offset_ex is used to calculate the memory address for load and
store instructions. The completion function should state how these bits
of information can be combined to complete the instruction, which again
can be gleaned from the specification. We call this function C_ID_EX. Simi-
larly, the completion functions for the other two sets of pipeline registers--
C_EX_MEM and C_MEM_WB--are specified. Two of these functions--C_ID_EX and
C_MEM_WB--are shown in the PVS code fragment above.
The completion functions for the unfinished instructions in the initial
sets of pipeline registers are very close to the specification and it is very
easy to derive them. (For example, C_IF_ID is almost the same as the speci-
fication.) However, the completion functions for the unfinished instructions
in the later sets of pipeline registers are harder to derive, as the user needs to understand how the information about the instruction is stored in the
various pipeline registers, but the functions themselves are usually much
more compact. For example, once the designer knows that result_wb holds
the result of the instruction in the write-back stage, all C_NEM_WBhas to do
is to update the register using result_wb. Also the completion functions
are independent of how the various stages are implemented and just depend
on their functionality.
Since the instructions flow down the pipeline in program order, the ab-
straction function--which should have the cumulative effect of flushing the
pipeline--is defined as a simple composition of these completion functions:
ABS(impl_state) =
  projection(C_IF_ID(C_ID_EX(C_EX_MEM(C_MEM_WB(impl_state)))))
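The flavor of this composition can be sketched in Python on a toy two-instruction pipeline. Every name below (the state fields, the completion functions, the pretend single-operation ALU) is a hypothetical stand-in for the PVS definitions, not the report's actual code:

```python
# Toy pipeline state: an architectural register file plus two in-flight
# (unfinished) instructions, one per set of pipeline registers.

def complete_mem_wb(s):
    """Complete the instruction in the MEM/WB registers: write its result."""
    rf = dict(s["regfile"])
    if s["dest_wb"] is not None:        # analogous to dest_wb /= zero_reg
        rf[s["dest_wb"]] = s["result_wb"]
    return {**s, "regfile": rf}

def complete_ex_mem(s):
    """Complete the instruction in the EX/MEM registers: do its ALU op."""
    rf = dict(s["regfile"])
    if not s["bubble_ex"]:              # only if the instruction is valid
        rf[s["dest_ex"]] = s["operand_a"] + s["operand_b"]  # pretend: add
    return {**s, "regfile": rf}

def abs_fn(s):
    """Abstraction function: compose the completion functions, the oldest
    unfinished instruction first, then project out the observables."""
    return complete_ex_mem(complete_mem_wb(s))["regfile"]

state = {
    "regfile": {"r1": 10, "r2": 20, "r3": 0, "r4": 0},
    "dest_wb": "r3", "result_wb": 99,   # instruction in MEM/WB
    "bubble_ex": False, "dest_ex": "r4",
    "operand_a": 1, "operand_b": 2,     # instruction in EX/MEM
}
print(abs_fn(state))  # {'r1': 10, 'r2': 20, 'r3': 99, 'r4': 3}
```

Each completion function is a pure state-to-state map, so the cumulative effect of flushing is just their composition followed by a projection.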
The synchronization function in this example returns zero if the instruction
in IF/ID registers is not issued because of a load interlock, or if no instruction
is fetched (because stall_input is asserted), or if the instruction fetched is
squashed because of a taken branch; otherwise, it returns one.
sync(impl_state:state_I): nat =
  IF st_issue(impl_state) OR stall_input(impl_state)
     OR br_taken(impl_state) THEN 0
  ELSE 1 ENDIF
3.1.2 The Decomposition of the Proof
The decomposition we used for regfile in this example is shown in Fig-
ure 3.2. The justification for the first three verification conditions is similar
to that given for the example in Section 2.2.
However, in deriving verification conditions for the instruction i in the
IF/ID registers, it is necessary to consider two separate cases, depending on
whether or not the instruction could get stalled because of a load interlock.
If i is stalled, that is, if st_issue is true in impl_state, then I_step will not
advance instruction i, i.e., it has no effect on it. So, the observables
at point 1 in Figure 3.2 should be as though i is not completed; C_IF_ID
must still be applied on the upper path in the commutative diagram. VC4_r,
shown below, captures this case (condition P1 = st_issue).
VC5_r is for the case when the instruction i is issued (so it is proved
under condition P2 = NOT st_issue) and is generated similarly to the
first three verification conditions. Observe that st_issue also appears as
a disjunct in the synchronization function and hence in A_step. Finally,
VC6_r is the verification condition corresponding to the final commutative
diagram for regfile.
VC4_r: LEMMA
  FORALL (is:state_I, inp:inputs_type):
    st_issue(is) IMPLIES
      regfile(Complete_ID_EX(Complete_EX_MEM(
        Complete_MEM_WB(I_step(is,inp))))) =
      regfile(Complete_ID_EX(Complete_EX_MEM(Complete_MEM_WB(is))))
Figure 3.2: The decomposition of the commutative diagram for regfile
In general, we generate a separate verification condition for each of the
observables, because not every stage modifies every observable. The de-
composition used, and hence the VCs generated, for a particular observable
depends on the pipeline stages where that observable is updated. For exam-
ple, the first verification condition VC1_d for dmem, shown below, states that
completing the instruction in the MEM/WB registers has no effect on dmem,
since dmem is not updated in the last stage of the pipeline. The other
verification conditions are identical to those for regfile.
% First verification condition for dmem.
VC1_d: LEMMA
  FORALL (impl_state:state_I):
    dmem(C_MEM_WB(impl_state)) = dmem(impl_state)

% First verification condition for pc.
VC1_p: LEMMA
  FORALL (impl_state:state_I):
    pc(Complete_ID_EX(Complete_EX_MEM(Complete_MEM_WB(impl_state))))
      = pc(impl_state)

% Second verification condition for pc.
VC2_p: LEMMA
  FORALL (impl_state:state_I):
    NOT st_issue(impl_state) AND NOT br_taken(impl_state) IMPLIES
      pc(Complete_IF_ID(Complete_ID_EX(
           Complete_EX_MEM(Complete_MEM_WB(impl_state))))) =
      pc(impl_state)

% First verification condition for imem.
VC1_i: LEMMA
  FORALL (impl_state:state_I):
    imem(Complete_IF_ID(Complete_ID_EX(
           Complete_EX_MEM(Complete_MEM_WB(impl_state))))) =
    imem(impl_state)
The decomposition for pc has three verification conditions. The last
three stages do not modify pc; this fact is stated by VC1_p, shown
above. (The three completion functions are combined into one.) The second
verification condition, VC2_p, captures the conditions under which the in-
struction in the IF/ID registers does not affect pc. The third
verification condition corresponds to the final commutative diagram for pc.
Finally, the decomposition for imem has two verification conditions. The
first one is similar to VC1_p and is shown above; the second corre-
sponds to the final commutative diagram for imem.
In summary, the decomposition we used has six verification conditions
for regfile and dmem, three for pc and two for imem, all systematically
generated as explained earlier. Also, this is the particular decomposition
that we chose; others are possible. For example, we could have avoided
generating and proving, say VC2_r, and proved that goal when it arises
within the proof of VC3_r if the prover can handle the term sizes.
3.1.3 The Proof Details
The proof is organized into three phases:
• Generating and proving a set of rewrite rules that express certain
  general properties about the completion functions.

• Proving the verification conditions and other lemmas using the basic
  rewrite rules.

• Proving other proof obligations mentioned in Chapter 2, including
  invariants, if needed.
Rewrite rules about completion functions
These rules express the basic property that the completion functions should
not modify (i.e., should map to the same value) the hidden, nonobservable
variables. For each register in a particular set of pipeline registers, we need a
rewrite rule stating that the register is unaffected by the completion func-
tions of the unfinished instructions ahead of it. For example, for bubble_ex,
the rewrite rule is:
bubble_ex(C_EX_MEM(C_MEM_WB(impl_state))) = bubble_ex(impl_state).
All these rules can be automatically generated (once the completion func-
tions are identified) and automatically proved by rewriting with the defini-
tions of the completion functions. We then define a PVS strategy
setup-rewrite-rules that installs these rules in the prover, together with
the definitions and axioms from the implementation and the specification
(leaving out a few on which we do case analysis), as rewrite rules.
Proving the verification conditions and other lemmas
The proof strategy for all the verification conditions is similar:
use the PVS strategy setup-rewrite-rules to install the rewrite rules
mentioned earlier, set up the previously proved verification conditions as
rewrite rules, expand the outermost functions on both sides, use the PVS
command assert to do the rewrites and simplifications by decision proce-
dures, and then perform case analysis with the PVS strategy (apply (then*
(repeat (lift-if)) (bddsimp) (ground))). Minor differences were that
some verification conditions (like VC1_d and VC1_p) were proved simply by ex-
panding the definitions of the completion functions, some verification con-
ditions (like VC4_r) needed the outermost function to be expanded on only
one side (expand the first occurrence of C_ID_EX and then use VC3_r),
and some were slightly more involved (like VC6_r), needing case analysis on
the various terms introduced by expanding A_step, followed by a similar
proof strategy as mentioned above.
The proof above needed a lemma expressing the correctness of the feed-
back logic. With completion functions, we could state this succinctly as
follows:

% new_operand_a is the value returned by the feedback logic.
% val_a is the value found in the register file.
lemma_new_operand_a: LEMMA
  FORALL (is:state_I):
    NOT stall_issue(is) AND NOT bubble_id(is) IMPLIES
      new_operand_a(is) =
        val_a(Complete_ID_EX(Complete_EX_MEM(Complete_MEM_WB(is))))
That is, the value read in the ID stage by the feedback logic (when the
instruction in the IF/ID registers is valid and not stalled) is the same as
the value read from regfile after the three instructions ahead of it are
completed. Observe that without completion functions, it would be hard to
state the correctness of the feedback logic. Its proof is done by using the
strategy setup-rewrite-rules to install rewrite rules mentioned earlier,
and then setting up the definitions occurring in the lemma as rewrite rules,
followed by the PVS command assert to do the rewrites and simplifications
by decision procedures, followed by (apply (then* (repeat (lift-if))
(bddsimp) (ground))) to do the case analysis.
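A small Python model illustrates what the lemma asserts: the value the bypass logic forwards equals the value read from the register file after the instruction ahead is completed. All names here (one in-flight instruction, a single forwarded operand) are hypothetical simplifications, not the PVS definitions:

```python
def complete(s):
    """Completion function: write the in-flight result to its destination."""
    rf = dict(s["regfile"])
    if s["dest_wb"] is not None:
        rf[s["dest_wb"]] = s["result_wb"]
    return {**s, "regfile": rf}

def new_operand_a(s):
    """Feedback (bypass) logic: forward the in-flight result when the
    source register read in ID matches the in-flight destination."""
    if s["dest_wb"] is not None and s["src_a"] == s["dest_wb"]:
        return s["result_wb"]
    return s["regfile"][s["src_a"]]

s = {"regfile": {"r1": 5, "r2": 7}, "dest_wb": "r2", "result_wb": 42,
     "src_a": "r2"}
# The lemma on this state: forwarded value == register value after completion.
assert new_operand_a(s) == complete(s)["regfile"][s["src_a"]]  # both are 42
```

Stating the property against the *completed* state is what makes it expressible at all; against the raw implementation state, the forwarded value and the register file simply disagree.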
Other proof obligations
We needed one invariant on the reachable states in this example, and it was
discovered during the proof of VC3_r. The proof that the invariant is closed
under I_step is trivial.
Finally, we prove that the implementation machine eventually reaches
a flushed state (the "Eventual flush" obligation) if it is stalled sufficiently
long, and then check that in the flushed state fs, ABS(fs) = projection(fs).
For this example, this proof was done by observing that bubble_id will be
true after two stall transitions (hence there is no instruction in the IF/ID regis-
ters) and that this "no-instruction"-ness propagates down the pipeline with
every stall transition. We also need to show that the synchronization func-
tion does not always return zero (the "No indefinite stutter" obligation);
the proof is straightforward.
The table below shows the overall proof organization:
Proof Obligation                        Comments
--------------------------------------  ---------------------------------------
Rewrite rules                           Automatically generated.
Verification conditions                 6 each for regfile and dmem,
                                        3 for pc and 2 for imem.
                                        All systematically generated.
Lemma about feedback logic
One invariant
"Eventual flush" obligation             Uniformly needed in all examples.
"No indefinite stutter" obligation
3.2 Application to Superscalar DLX Processor
The superscalar DLX processor [Bur96] is a dual-issue version of the DLX
processor. Both the pipelines have a structure similar to the one shown in
Figure 3.1 except that the second pipeline executes only alu-immediate and
alu instructions. In addition, the processor has a one-instruction buffer.
Figure 3.3: The issue logic in the superscalar DLX processor (first pipeline:
all instructions; second pipeline: ALU instructions only; one-instruction
buffer)
The issue logic in this processor model, shown in Figure 3.3, is fairly
complex--from zero to two instructions can be issued per cycle. Instruction
j can get stalled because of a load interlock, a dependency on instruction i,
or because it is neither an alu-immediate nor an alu instruction. If instruction
i is a taken branch, then instructions j and k need to be squashed. These
factors affect the latency of an instruction waiting to be issued and lead to
many scenarios in the proof of the commutative diagram. Once instructions
are issued, they flow down the pipeline and complete execution as in the
DLX example.
3.2.1 Completion Functions and the Abstraction Function
Specifying the completion functions for the various unfinished instructions is
similar to the DLX example. This example has nine unfinished instructions,
so there are potentially nine completion functions. Since the issued instruc-
tions proceed down the pipeline in lockstep, we state the combined effect
of completing instructions in the corresponding stages in the two pipelines
for the last three stages, and so we have only six completion functions. One
main difference is in constructing the abstraction function--we must state
how the completion functions of the unfinished instructions (i, j, and k)
in the IF/ID registers and the instruction buffer are composed to handle
the speculative fetching of instructions. These unfinished instructions could
be potential branches since the branch instructions are executed in the first
stage of the first pipeline as shown in Figure 3.3. So, while constructing
the abstraction function, instruction j should be completed only if instruc-
tion i is not a taken branch. This is shown in the code fragment below, where
the completion functions are named C_i, C_j, and C_k. Similarly, instruction k should be
completed only if instructions i and j are not taken branches. We used a
similar idea in constructing the synchronization function. The specification
machine would not execute any new instructions if any of the instructions
i, j, k mentioned above is a taken branch.
% Completing the instructions i & j.
% 'rs' should be the composition of the completion functions of
% the instructions ahead of i, in order.

% This predicate tests if instruction i is a taken branch.
branch_taken_pipe_a(rs:real_state) : bool =
  instr_kind_a(rs) = J OR (instr_kind_a(rs) = BEQZ
    AND select(reg(rs),rfl_of(instr_id_a(rs))) = zero)

Complete_IF_ID_AB(rs: impl_state): impl_state =
  IF NOT bubble_id_a(rs) AND branch_taken_pipe_a(rs) THEN
    % Don't complete C_j, if instruction i is a taken branch.
    C_i(rs)
  ELSE
    % If not, complete instruction j.
    C_j(C_i(rs))
  ENDIF
It is very easy and natural to express these conditions by using comple-
tion functions since we are not concerned with exactly when the branches
are taken in the implementation machine. (See, for example, the predicate
branch_taken_pipe_a above). However, in the pure flushing approach, even
the definition of the synchronization function will be much more compli-
cated because it is necessary to cycle the implementation machine for many
cycles [Bur96].
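The guarded composition above can be mimicked in a few lines of Python; the state fields and the completion functions `c_i` and `c_j` below are hypothetical stand-ins for the PVS definitions:

```python
def c_i(s):
    """Completion function for instruction i (toy: just record completion)."""
    return {**s, "done": s["done"] + ["i"]}

def c_j(s):
    """Completion function for instruction j."""
    return {**s, "done": s["done"] + ["j"]}

def branch_taken_i(s):
    """Does instruction i resolve as a taken branch?"""
    return s["i_is_branch"] and s["i_taken"]

def complete_if_id_ab(s):
    """Complete i; complete j only if i is not a valid taken branch,
    since j would be squashed in that case."""
    if not s["bubble_i"] and branch_taken_i(s):
        return c_i(s)
    return c_j(c_i(s))

s = {"done": [], "bubble_i": False, "i_is_branch": True, "i_taken": True}
print(complete_if_id_ab(s)["done"])                        # ['i']
print(complete_if_id_ab({**s, "i_taken": False})["done"])  # ['i', 'j']
```

The abstraction function never needs to know *when* the branch resolves in the implementation; it only asks whether the already-fetched instruction i would be a taken branch in the completed state.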
3.2.2 The Differences with the DLX Proof
Because of the complexity of the issue logic in this example, we needed eight
additional verification conditions capturing the various scenarios in which
the instructions get issued, stalled, or moved around. The proofs of all
the verification conditions used similar strategies. The second difference was
that the synchronization function had many more cases in this example, and
the previously proved verification conditions were used several times during
the proof.
3.3 Application to Out-of-order Execution
We have applied our methodology to an out-of-order execution processor
that was verified by Sawada and Hunt in [SH97]. This processor, shown in
Figure 3.4, has three execution units--a multiplier, a load/store unit, and
an adder--sharing the write-back stage. The patterned rectangular boxes
show the pipeline registers. The structural hazard due to this sharing of the
write-back stage is resolved in the issue logic by ensuring there is at most
one instruction attempting to enter the write-back stage in any clock cycle.
An add instruction takes one cycle in the execution unit, a load instruction
takes two cycles, and a mult instruction takes three cycles. Since there is
no reorder buffer, instructions retire immediately after execution. So an add
instruction, issued immediately after a mult instruction, can complete before
it. The processor allows such out-of-order execution of an add instruction
only if its destination register is different from that of the mult instruction
issued earlier, to avoid write-after-write hazards. The processor keeps track
of the current instructions executing in the three execution units in the
Scheduling Registers block for this purpose.
Figure 3.4: The processor with out-of-order execution (example used in
[SH97])
3.3.1 Constructing the Abstraction Function
The abstraction function for this example is shown below, where the com-
pletion functions are named using the same convention as in the previous
examples. Note that the completion functions for the IF/DC and DC/EX
stages have been combined into one. The definitions of the completion
functions were derived in a fashion similar to that used in the previous
examples. In those examples, the program order was apparent from the
structure of the pipelines and the order in which the pipeline stages were
executed. Here, by contrast, because of the possibility of out-of-order execution,
the implementation machine does not have sufficient information to derive
the exact program order. For example, the instruction in EX/WB may or
may not be ahead of the instruction in ML2/ML3 in the program order.
Similarly, the program order of the instructions in LD1/LD2 and ML1/ML2
is unclear. In the abstraction function below, we have used Complete_ML2_ML3
after Complete_EX_WB, i.e., the stage execution order, in the composition,
because that order is always guaranteed to be consistent with the program
order, as there cannot be write-after-write hazards. For ML1/ML2 and
LD1/LD2, either order is fine because there can be at most one valid
instruction in those registers in any given cycle.
% Complete_IF_DC_EX completes the instruction in DC/EX
% and then the one in IF/DC,
% if the first one is not squashed.
ABS(is:impl_state): abs_state =
  project(Complete_IF_DC_EX(Complete_LD1_LD2(
    Complete_ML1_ML2(Complete_ML2_ML3(Complete_EX_WB(is))))))
3.3.2 Proof Details
The strategy behind generation of the verification conditions for this pro-
cessor is based on the observation that four cases arise when considering the
instruction about to access the write-back stage--a mult instruction in the
ML2/ML3 registers, a load instruction in the LD1/LD2 registers, an add
instruction in DC/EX registers that is about to be issued, and none of these
three possibilities. That these cases are mutually exclusive follows from
the fact that the structural hazard is resolved properly by the issue logic,
which is proved as an invariant. We then systematically build the proof of
the commutative diagram in the above four cases, formulating and proving
the verification conditions as in the earlier examples. The interesting case
is the scenario of out-of-order completion when the add instruction being
issued (from DC/EX registers) bypasses a mult instruction issued earlier
(and present in ML1/ML2 registers). As mentioned previously, the proces-
sor would issue such an add instruction only if its destination register is
different from that of the mult instruction issued earlier. So, though the
implementation completes the add instruction before the mult instruction,
one can prove that the net effect is as though mult is completed before add,
that is, the instructions are completed in the order used by the abstraction
function. This is captured by the following reordering lemma:
lemma_reordering: LEMMA
  FORALL (is:impl_state): issue_add?(is) IMPLIES
    % DC/EX has an add instruction and ML1/ML2 has a mult instruction.
    reg(Complete_ML1_ML2(Complete_DC_EX(
          Complete_ML2_ML3(Complete_EX_WB(is))))) =
    reg(Complete_DC_EX(Complete_ML1_ML2(
          Complete_ML2_ML3(Complete_EX_WB(is)))))
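The essence of the reordering lemma — completing the two instructions in either order yields the same register file precisely because their destinations differ — can be checked on a toy model. The Python below is an illustrative sketch, not the PVS proof:

```python
def write(rf, dest, val):
    """Pure register-file update."""
    new_rf = dict(rf)
    new_rf[dest] = val
    return new_rf

def complete_add(rf, add):    # completion of the add being issued from DC/EX
    return write(rf, add["dest"], add["val"])

def complete_mult(rf, mul):   # completion of the mult in ML1/ML2
    return write(rf, mul["dest"], mul["val"])

rf = {"r1": 0, "r2": 0}
add = {"dest": "r1", "val": 3}
mul = {"dest": "r2", "val": 8}    # issue logic guarantees distinct dests

# Either completion order yields the same register file:
assert complete_add(complete_mult(rf, mul), add) == \
       complete_mult(complete_add(rf, add), mul)

# With equal destinations (a write-after-write hazard) the orders disagree,
# which is exactly the case the issue logic forbids:
bad = {"dest": "r1", "val": 8}
assert complete_add(complete_mult(rf, bad), add) != \
       complete_mult(complete_add(rf, add), bad)
```

The second assertion shows why the lemma needs the `issue_add?` guard: without the distinct-destination restriction enforced by the issue logic, the two completion orders would not commute.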
The other details of the proof, such as handling the bypass logic and squash-
ing, are similar to the earlier examples.
3.3.3 Comparison with the MAETT Approach
In [SH97], Sawada and Hunt construct an intermediate abstraction of the
implementation machine by using a table (called MAETT) representing the
(infinite) trace of all executed instructions up to the present time. They
achieve incrementality by postulating and proving individually a large set
of invariant properties about this intermediate representation, from which
they derive the final correctness proof. The main difference of our approach
is that the incremental nature of the proof in our case arises from the way
we construct our abstraction function and the decomposition of the proof
of the commutative diagram to which it leads. This decomposition is to a
large extent independent of the processor design. Our approach also has
the advantage that the amount of information the user needs to specify is
significantly less than in their method. For example, we require just a few
simple invariants on the reachable states and do not need to construct an
explicit intermediate abstraction of the implementation machine.
3.4 Hybrid Approach to Reduce the Manual Effort
In some cases, it is possible to derive the definitions of some of the comple-
tion functions automatically from the implementation to reduce the manual
effort. We illustrate this hybrid approach on the DLX example.
The implementation machine is specified in the form of a typical transi-
tion function giving the "new" value for each state component as a function
of the old values. Since the implementation modifies the regfile in the
write-back stage, we take C_MEM_WB to be new_regfile, which is a func-
tion of dest_wb and result_wb. To determine how C_EX_MEM updates the
register file from C_MEM_WB, we perform a step of symbolic simulation of
the nonobservables in the definition of C_MEM_WB, that is, replace dest_wb
and result_wb in its definition with their "new" counterparts. Since the
MEM stage updates dmem, C_EX_MEM will have another component modify-
ing dmem, which we simply take as new_dmem. Similarly, we derive C_ID_EX
from C_EX_MEM through symbolic simulation. For the IF/ID registers, this
procedure gets complicated on two counts: the instruction there could
get stalled because of a load interlock, and the forwarding logic appears
in the ID stage. So, we let the user specify this function directly. We have
done a complete proof using these completion functions. The details of the
proof are similar. An important difference to note is that the verification
with this hybrid approach eliminated the need for the invariant that was
needed earlier.
While reducing the manual effort, this way of deriving the completion
functions from the implementation has the disadvantage that we are verify-
ing the implementation against itself. This contradicts our view of these as
desired specifications and negates our goal of incremental verification. To
combine the advantages of both, we could use a mixed approach where we
use explicitly provided and symbolically generated completion functions in
combination. For example, we could derive it for the last stage, specify it
for the penultimate stage, then derive it for the stage before that (from the
specification for the penultimate stage), and so on.
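A minimal Python sketch of this derivation treats a completion function's update as a term over named state fields and a step of symbolic simulation as substituting each field by its "new" counterpart. All names below are illustrative, not the PVS development:

```python
# C_MEM_WB's regfile update, as a term tree over state-field names:
# write_reg(regfile, dest_wb, result_wb)
c_mem_wb = ("write_reg", "regfile", "dest_wb", "result_wb")

# One implementation step, expressed as substitutions: the "new" value of
# each MEM/WB register in terms of the EX/MEM registers feeding it.
step = {"dest_wb": "dest_mem", "result_wb": "result_mem"}

def substitute(term, sub):
    """Symbolically simulate one step: rewrite each field to its new value."""
    if isinstance(term, tuple):
        return tuple(substitute(t, sub) for t in term)
    return sub.get(term, term)

# The regfile component of C_EX_MEM is C_MEM_WB pushed back one stage:
c_ex_mem = substitute(c_mem_wb, step)
print(c_ex_em := c_ex_mem)  # ('write_reg', 'regfile', 'dest_mem', 'result_mem')
```

Iterating the substitution once more (with the EX-stage transition) would yield the regfile component of C_ID_EX, mirroring the derivation described above.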
Chapter 4
Conclusions
One of the main obstacles to technology transition in the area of micropro-
cessor verification is the lack of a systematic reusable methodology for refin-
ing the verification task into small enough problems that can be discharged
automatically. The methodology must work for advanced optimization fea-
tures that are employed in today's processors. Toward this end, we have
developed a systematic approach to modularize and decompose the proof of
correctness of pipelined microprocessors with complex controllers to imple-
ment design features, such as superscalar pipelining, out-of-order execution,
and speculative execution. The overall efficiency and automation of our
method depends on the capabilities for symbolic simulation of the under-
lying verification system. Under a separate NASA task, we are enhancing
the efficiency and automation capabilities of symbolic simulation in PVS so
that the symbolic simulation speed can be brought to within a few orders
of magnitude of conventional simulation speed.
We have shown its generality by applying it to three different proces-
sors. The methodology relies on the user expressing the cumulative effect
of flushing in terms of a set of completion functions, one per unfinished
instruction. This method results in a natural decomposition of the proof
based on the individual stages of the pipeline and allows the verification to
proceed incrementally, overcoming the term size and case explosion problem
of the flushing approach. While this method increases the manual effort on
the part of the user, we found that the knowledge required in specifying the
completion functions, constructing the abstraction function, and formulat-
ing the verification conditions is close to the designer's intuition about how
the pipeline works.
One of our future plans is to build a system that uses PVS or a part of
it as a back-end to support the methodology presented. Besides automating
parts of the methodology, this system would help the user interactively apply
the rest of the process. We would also like to see how our approach can be
extended to verify more complex pipeline control that uses reorder buffers
or other out-of-order completion techniques. Other plans include testing the
efficacy of our approach for verifying pipelines with data-dependent iterative
loops and an asynchronous memory interface.
Acknowledgments
We thank John Rushby and David Cyrluk for their feedback on earlier drafts
of this report.
Bibliography
[BD94] J. R. Burch and D. L. Dill. Automatic verification of pipelined
microprocessor control. In David Dill, editor, Computer-Aided
Verification, CAV '94, volume 818 of Lecture Notes in Computer
Science, pages 68-80, Stanford, CA, June 1994. Springer-Verlag.

[BDL96] Clark Barrett, David Dill, and Jeremy Levitt. Validity checking
for combinations of theories with equality. In Srivas and
Camilleri [SC96], pages 187-201.

[Bur96] J. R. Burch. Techniques for verifying superscalar microprocessors.
In Design Automation Conference, DAC '96, June 1996.

[CRSS94] D. Cyrluk, S. Rajan, N. Shankar, and M. K. Srivas. Effective
theorem proving for hardware verification. In Ramayya Kumar
and Thomas Kropf, editors, Theorem Provers in Circuit
Design (TPCD '94), volume 910 of Lecture Notes in Computer
Science, pages 203-222, Bad Herrenalb, Germany, September
1994. Springer-Verlag.

[Cyr93] David Cyrluk. Microprocessor verification in PVS: A methodology
and simple example. Technical Report SRI-CSL-93-12,
Computer Science Laboratory, SRI International, Menlo Park,
CA, December 1993.

[Cyr96] David Cyrluk. Inverting the abstraction mapping: A methodology
for hardware verification. In Srivas and Camilleri [SC96],
pages 172-186.

[Hos98] Ravi Hosabettu. PVS specification and proofs of the DLX,
superscalar DLX examples and the processor with out-of-order
execution, 1998. Available at
http://www.csl.sri.com/~ravi/nasa/processor.html.
[HP90] John L. Hennessy and David A. Patterson. Computer Architecture:
A Quantitative Approach. Morgan Kaufmann, San Mateo,
CA, 1990.

[JDB95] R. B. Jones, D. L. Dill, and J. R. Burch. Efficient validity
checking for processor verification. In International Conference
on Computer Aided Design, ICCAD '95, 1995.

[KM96] Matt Kaufmann and J Strother Moore. ACL2: An industrial
strength version of Nqthm. In COMPASS '96 (Proceedings
of the Eleventh Annual Conference on Computer Assurance),
pages 23-34, Gaithersburg, MD, June 1996. IEEE Washington
Section.

[LO96] Jeremy Levitt and Kunle Olukotun. A scalable formal verification
methodology for pipelined microprocessors. In Design
Automation Conference, DAC '96, June 1996.

[ORSvH95] Sam Owre, John Rushby, Natarajan Shankar, and Friedrich von
Henke. Formal verification for fault-tolerant architectures: Prolegomena
to the design of PVS. IEEE Transactions on Software
Engineering, 21(2):107-125, February 1995.

[PD96] Seungjoon Park and David L. Dill. Protocol verification by aggregation
of distributed actions. In Rajeev Alur and Thomas A.
Henzinger, editors, Computer-Aided Verification, CAV '96, volume
1102 of Lecture Notes in Computer Science, pages 300-310,
New Brunswick, NJ, July/August 1996. Springer-Verlag.

[SC96] Mandayam Srivas and Albert Camilleri, editors. Formal Methods
in Computer-Aided Design (FMCAD '96), volume 1166 of
Lecture Notes in Computer Science, Palo Alto, CA, November
1996. Springer-Verlag.

[SH97] J. Sawada and W. A. Hunt, Jr. Trace table based approach
for pipelined microprocessor verification. In Orna Grumberg,
editor, Computer-Aided Verification, CAV '97, volume 1254 of
Lecture Notes in Computer Science, pages 364-375, Haifa, Israel,
June 1997. Springer-Verlag.

[SM95] Mandayam Srivas and Steven P. Miller. Formal verification
of a commercial microprocessor. Technical Report SRI-CSL-
95-4, Computer Science Laboratory, SRI International, Menlo
Park, CA, February 1995. Also available under the title Formal
Verification of an Avionics Microprocessor as NASA Contractor
Report 4682, July 1995.
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

1. AGENCY USE ONLY: (Leave blank)
2. REPORT DATE: February 1999
3. REPORT TYPE AND DATES COVERED: Contractor Report
4. TITLE AND SUBTITLE: A Systematic Methodology for Verifying Superscalar
   Microprocessors
5. FUNDING NUMBERS: C NAS1-20334; WU 519-50-11-01
6. AUTHOR(S): Mandayam Srivas; Ravi Hosabettu; Ganesh Gopalakrishnan
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): SRI International,
   Menlo Park, CA; University of Utah, Salt Lake City, UT
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): National
   Aeronautics and Space Administration, Langley Research Center,
   Hampton, VA 23681-2199
10. SPONSORING/MONITORING AGENCY REPORT NUMBER: NASA/CR-1999-208991
11. SUPPLEMENTARY NOTES: Srivas: SRI International, Menlo Park, CA;
    Hosabettu, Gopalakrishnan: University of Utah, Salt Lake City, UT.
    Langley Technical Monitor: Paul S. Miner. Final Report.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Unclassified-Unlimited.
    Subject Category: 61. Distribution: Standard.
    Availability: NASA CASI, (301) 621-0390.
13. ABSTRACT (Maximum 200 words): We present a systematic approach to
    decompose and incrementally build the proof of correctness of pipelined
    microprocessors. The central idea is to construct the abstraction
    function by using completion functions, one per unfinished instruction,
    each of which specifies the effect (on the observables) of completing
    the instruction. In addition to avoiding the term size and case
    explosion problem that limits the pure flushing approach, our method
    helps localize errors, and also handles stages with iterative loops.
    The technique is illustrated on pipelined and superscalar pipelined
    implementations of a subset of the DLX architecture. It has also been
    applied to a processor with out-of-order execution.
14. SUBJECT TERMS: Formal Methods, Microprocessor Verification,
    Superscalar, Theorem Proving
15. NUMBER OF PAGES: 39
16. PRICE CODE: A03
17-19. SECURITY CLASSIFICATION (REPORT / THIS PAGE / ABSTRACT):
    Unclassified / Unclassified / Unclassified

NSN 7540-01-280-5500  Standard Form 298 (Rev. 2-89), prescribed by
ANSI Std. Z39-18, 298-102