Logic Simulation: Languages, Algorithms, Simulators
Wolfgang Roesner, Verification Tools Development, IBM Corp., Austin, TX


Outline

Hardware Description Languages

Modeling Levels - A Taxonomy of HDL Constructs

General Purpose HDL Simulators - Event-Driven Sim

Improving Simulator Performance

Synchronous Design Methodology and Cycle-Based Sim

Let's Design a Cycle-Based Simulator

Simulation of Hardware Description Languages

Logic (or "functional") simulation today is done mostly with HDLs (Hardware Description Languages).

The most popular languages today (both are IEEE standards):
- Verilog
- VHDL

Verilog: a logic modeling and simulation language started in the EDA industry (by a start-up) in the 1980s, later acquired by Cadence and donated to the IEEE as a general industry standard; approx. 60% market share in the U.S. EDA market.

VHDL: a committee-designed language contracted by the U.S. Department of Defense (DoD), derived from Ada, for functional/logic modeling and simulation; approx. 40% market share in the U.S. EDA market.

Modeling Levels - Highest Level: Interface

Let's look at the common constructs we use to specify the functionality of a piece of hardware:

[Figure: a Model block with inputs and outputs; input behavior over time t produces output behavior over time]

Modeling Levels - Major Dimensions (I)

Temporal dimension (from continuous to discrete time): continuous (analog), gate delay (ps), clock cycle, instruction cycle, events

Data abstraction (from continuous to discrete values): continuous (analog), bit with multiple values, binary bit, abstract value, composite value ("struct")

Modeling Levels - Major Dimensions (II)

Functional dimension: continuous functions (e.g. differential equations), switch level (transistors as switches), Boolean logic, algorithmic (e.g. a sort procedure), abstract mathematical formula (e.g. matrix multiplication)

Structural dimension: single black box, functional blocks, detailed hierarchy with primitive library elements

A good VHDL-centric taxonomy can be found at: http://rassp.scra.org

Modeling Levels - Major Dimensions (III)

[Figure: the four modeling dimensions (temporal, data, functional, structural) drawn as axes, each running through the levels listed above]

Coverage of Modeling Levels - Verilog

[Figure: the four modeling axes, with the levels covered by Verilog highlighted]

Coverage of Modeling Levels - VHDL

[Figure: the four modeling axes, with the levels covered by VHDL highlighted; one level is marked with *]

* extremely inefficient compared to Verilog

In HDL Terms

VHDL and Verilog were both defined as simulation languages.

Big emphasis on structural refinement:
- VHDL: entity/architecture/component/port/signal
- Verilog: module/instance/port/signal/reg

Specification of function, in general with programming-language constructs:
- VHDL: user-defined data types, package, procedure, function, sequential code
- Verilog: function, task, sequential code

Parallelism:
- VHDL: process, signal update, wait
- Verilog: "always" block, fork/join, wait, event construct

Special-purpose H/W constructs:
- VHDL: concurrent assignment, delayed assignment, signal, Boolean logic
- Verilog: continuous assignment, 4-value Boolean logic, switch-level support
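To make Verilog's 4-value Boolean logic concrete, here is a minimal Python sketch of how a simulator might evaluate a 4-value AND and OR. It follows the Verilog rule that a controlling input (0 for AND, 1 for OR) decides the output and that otherwise any x or z input yields x. The names `and4`, `or4` and `LEVELS` are hypothetical, and a real simulator would use packed bit encodings rather than strings.

```python
# Verilog-style 4-value logic: "0", "1", "x" (unknown), "z" (high impedance)
LEVELS = {"0", "1", "x", "z"}

def _check(inputs):
    if not set(inputs) <= LEVELS:
        raise ValueError(f"not a 4-value logic level: {inputs!r}")

def and4(*inputs: str) -> str:
    """N-input AND; a z input is treated like x."""
    _check(inputs)
    if "0" in inputs:
        return "0"                    # 0 is the controlling value for AND
    if all(i == "1" for i in inputs):
        return "1"
    return "x"                        # remaining inputs include x or z

def or4(*inputs: str) -> str:
    """N-input OR; a z input is treated like x."""
    _check(inputs)
    if "1" in inputs:
        return "1"                    # 1 is the controlling value for OR
    if all(i == "0" for i in inputs):
        return "0"
    return "x"

print(and4("1", "z"), or4("1", "x"))
```

Every gate evaluation needs these extra comparisons, whereas a binary model can map each gate to a single machine instruction.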

General HDL Simulators

Before we can look at architectures of simulators we need to understand the execution model that the HDLs imply:

VHDL's execution model is defined in detail in the IEEE LRM (Language Reference Manual)

Verilog's execution model is defined by Cadence's Verilog-XL simulator ("reference implementation")

Event-Driven Execution

An event-driven VHDL example

Block 1:

    process (count)
    begin
      my_count <= count;
      trigger <= not trigger;
    end process;

Block 2:

    process (trigger)
    begin
      if (count <= 15) then
        count <= count + 1 after 1 ns;
      else
        count <= 0 after 1 ns;
      end if;
    end process;

Each process:
- loops forever
- waits for a change in a signal from the other process

A more hardware-oriented example

    s(0) <= a(0) xor b(0) after 2 ns;
    c(0) <= a(0) and b(0) after 1 ns;

    s(1) <= a(1) xor b(1) xor c(0) after 2 ns;
    c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns;

    sum_out(1 downto 0) <= s(1 downto 0);
    carry_out <= c(1);

[Figure: gate-level schematic of the 2-bit adder: XOR gates compute s(0) and s(1), which drive sum_out(0) and sum_out(1); AND and OR gates compute c(0) and the carry that drives carry_out]

Let's simulate: a = "11", b = "01"

[Figures: the adder schematic, stepped through one evaluation at a time with the current value annotated on each signal; red boxes mark the blocks evaluated in the current step. Time = 0 ns: steps 1 to 4. Time = 1 ns: steps 1 to 5.]
That's enough - you get the point

The dependencies between statements (processes) define a network.

Changes are propagated dynamically through the network.

Event-Driven Simulation

The simulator maintains:
- a list of all atomic executable blocks
- a data structure that represents the interconnect of the blocks via signals
- a value table that holds all current signal values

At start time, the simulator schedules all executable blocks of the model.

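Core Architecture of an Event-Driven Simulator

[Figure: flowchart of the simulator core. Scheduler data: signal sink lists and the event queue. Loop: select the next scheduled block, execute its block code and schedule its signal updates (more blocks?); then perform the signal updates (more blocks?); then increment time, until done.]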
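The core loop can be sketched in Python on the 2-bit adder from the walk-through. This is a minimal illustration under several assumptions: signals are binary and start at 0 (a VHDL std_logic signal would start at 'U'), each block's sink list is written out by hand, and every scheduled value is applied (transport delay), whereas a VHDL `after` clause defaults to inertial delay. The block names (`s0`, `so1`, ...) and the helper `majority` are hypothetical.

```python
import heapq
import itertools
from collections import defaultdict

def majority(x, y, z):
    return (x & y) | (y & z) | (z & x)

# Block table: name -> (output signal, delay in ns, evaluation function)
BLOCKS = {
    "s0": ("s(0)", 2, lambda v: v["a(0)"] ^ v["b(0)"]),
    "c0": ("c(0)", 1, lambda v: v["a(0)"] & v["b(0)"]),
    "s1": ("s(1)", 2, lambda v: v["a(1)"] ^ v["b(1)"] ^ v["c(0)"]),
    "c1": ("c(1)", 1, lambda v: majority(v["a(1)"], v["b(1)"], v["c(0)"])),
    "so0": ("sum_out(0)", 0, lambda v: v["s(0)"]),   # zero delay
    "so1": ("sum_out(1)", 0, lambda v: v["s(1)"]),
    "co": ("carry_out", 0, lambda v: v["c(1)"]),
}
# Signal sink lists: blocks to re-evaluate when a signal changes
SINKS = {
    "a(0)": ["s0", "c0"], "b(0)": ["s0", "c0"],
    "a(1)": ["s1", "c1"], "b(1)": ["s1", "c1"], "c(0)": ["s1", "c1"],
    "s(0)": ["so0"], "s(1)": ["so1"], "c(1)": ["co"],
}

def simulate(stimulus):
    """Run until the event queue is empty; return final values and a change trace."""
    values = defaultdict(int)              # value table, every signal starts at 0
    queue = []                             # event queue of (time, seq, signal, value)
    seq = itertools.count()                # tie-breaker: FIFO order within a time
    for sig, val in stimulus.items():
        heapq.heappush(queue, (0, next(seq), sig, val))
    to_eval = set(BLOCKS)                  # at start time every block is scheduled
    trace = []
    while queue:
        now = queue[0][0]
        # Signal updates: apply all events due now, wake sinks of changed signals
        while queue and queue[0][0] == now:
            _, _, sig, val = heapq.heappop(queue)
            if values[sig] != val:
                values[sig] = val
                trace.append((now, sig, val))
                to_eval.update(SINKS.get(sig, []))
        # Block code: evaluate each woken block once, schedule its output update.
        # Zero-delay outputs land at `now` and are handled in the next pass.
        for name in sorted(to_eval):
            out, delay, fn = BLOCKS[name]
            heapq.heappush(queue, (now + delay, next(seq), out, fn(values)))
        to_eval = set()
    return dict(values), trace

values, trace = simulate({"a(0)": 1, "a(1)": 1, "b(0)": 1, "b(1)": 0})
for event in trace:
    print(event)
```

With a = "11" and b = "01" the outputs settle at sum_out = "00" and carry_out = '1'. Under the transport assumption the trace also shows s(1) pulsing to 1 at 2 ns and back to 0 at 3 ns; VHDL's inertial delay would reject that pulse, one of the details a production scheduler must get right. Zero-delay updates being picked up in the next pass of the loop is how delta cycles fall out of this structure.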

Improving Simulation Speed

The most obvious bottleneck for functional verification is simulation throughput.

There are two basic ways to improve throughput:
- simulator performance
- running many simulations in parallel

Parallelization:
- Hard: parallel simulation algorithms. Much parallel event-driven simulation research has not yielded a breakthrough, and it is hard to compete against "trivial parallelization".
- Simple: run independent testcases on separate machines. Workstation "SimFarms": 100s to 1000s of engineers' workstations run simulations in the background, with an ideal parallelization factor.

Improving Simulator Performance (I)

Full-HDL support

If full coverage of VHDL/Verilog is important:

Optimizing compiler techniques treat sequential code constructs like a general programming language, so all optimizations for language compilers apply:
– data/control-flow analysis
– global optimizations
– local optimizations (loop unrolling, constant propagation)
– register allocation
– pipeline optimizations
– etc.

Global optimizations are limited because of model-build turnaround-time requirements.

Example: a modern microprocessor is designed with ~1 million lines of HDL. Imagine the compile time for a C program with 1M lines!

Improving Simulator Performance (II)

Full-HDL support

Better scheduling algorithms: scheduling is clearly the bottleneck in all event simulators.

Use hybrid techniques: some of the simplifications discussed in the following can be applied to localized "islands" of HDL. This requires an HDL compile process that automatically analyzes the structure of the model and uses "speed-up" modes for sub-partitions.

Problem:
– assume 50% of the model has such islands
– even if we could speed up simulation of those parts to take 0 time, we would only gain a speedup factor of 2x
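The 2x bound is an instance of Amdahl's law. Assuming the islands account for a fraction p of the simulation run time and are sped up by a factor s (symbols introduced here for illustration), the overall speedup S is:

```latex
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S(0.5, s) = \frac{1}{1 - 0.5} = 2
```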

Improving Simulator Performance (III)

Simplifications

Use higher-level HDL specification:

    -- assumes an unsigned "+" for these vectors (e.g. ieee.numeric_std_unsigned)
    s(2 downto 0) <= ('0' & a(1 downto 0)) + ('0' & b(1 downto 0));
    sum_out(1 downto 0) <= s(1 downto 0);
    carry_out <= s(2);

vs.

    s(0) <= a(0) xor b(0) after 2 ns;
    c(0) <= a(0) and b(0) after 1 ns;
    s(1) <= a(1) xor b(1) xor c(0) after 2 ns;
    c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns;
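Both descriptions compute the same function; the difference is how much work the simulator has to schedule. A Python sketch, assuming 2-bit unsigned inputs encoded as integers: the bit-level version performs nine Boolean operations spread over four separately scheduled statements, while the word-level version is a single add. The function names are hypothetical, and the exhaustive loop checks that the two agree on all 16 input combinations.

```python
def adder_bit_level(a: int, b: int) -> tuple[int, int]:
    """Gate-level 2-bit ripple-carry adder: returns (sum, carry)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 ^ b0
    c0 = a0 & b0
    s1 = a1 ^ b1 ^ c0
    c1 = (a1 & b1) | (b1 & c0) | (c0 & a1)
    return (s1 << 1) | s0, c1

def adder_word_level(a: int, b: int) -> tuple[int, int]:
    """Word-level version: one add on a 3-bit result, like ('0' & a) + ('0' & b)."""
    s = a + b
    return s & 0b11, s >> 2

# Exhaustively check that both views compute the same function
for a in range(4):
    for b in range(4):
        assert adder_bit_level(a, b) == adder_word_level(a, b)
print(adder_word_level(3, 1))
```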

Improving Simulator Performance (IV)

Principles of using higher-level HDL specification. Common theme: cut down the number of scheduled events.
- create larger sections of uninterrupted sequential code
- use coarser granularity for the model structure
  – -> smaller number of schedulable blocks
- use higher-level operators
- use zero delay wherever possible
  – methodology implication: timing verification is not done together with functional simulation

Data abstractions:
- use binary over multi-value bit values
  – multi-value: use only for bus contention situations, to resolve several drivers with different strengths (strong, resistive, high-impedance)
- use word-level operations over bit-level operations
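For the bus-contention case, a resolution function combines several drivers into one value. A minimal Python sketch under a simplified, hypothetical strength scheme (not the exact VHDL std_logic or Verilog strength tables): the strongest active driver wins, disagreement at the top strength yields X, and an undriven bus floats at Z. `STRENGTH` and `resolve` are illustrative names.

```python
# Hypothetical strength ranking: higher wins; "highz" means not driving
STRENGTH = {"highz": 0, "resistive": 1, "strong": 2}

def resolve(drivers: list[tuple[str, str]]) -> str:
    """Resolve a bus from (value, strength) pairs, value in {"0", "1"}."""
    active = [(value, STRENGTH[strength]) for value, strength in drivers
              if STRENGTH[strength] > 0]
    if not active:
        return "Z"                          # nobody drives the bus
    top = max(level for _, level in active)
    winners = {value for value, level in active if level == top}
    return winners.pop() if len(winners) == 1 else "X"   # contention -> unknown

print(resolve([("1", "strong"), ("0", "resistive")]))    # strong driver overrides pull-down
print(resolve([("1", "strong"), ("0", "strong")]))       # real contention
```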

Next: the most powerful simplification, a methodology-based subset of HDLs

Synchronous Design Methodology

Clock the design only so fast that the longest possible combinational delay path settles before the cycle is over.

- Cycle time depends on the longest topological path.
- Hazards/races do not disturb the function.

The longest topological path can be calculated analytically without using simulation -> a stronger result, obtained without simulation patterns.

[Figure: the adder's gate network (AND, XOR, OR gates); clock period dependent on the critical path]
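Because the longest path is a property of the network, it can be computed without any simulation patterns. A Python sketch using the gate delays from the adder example (2 ns XOR, 1 ns AND/OR), assuming a(i) and b(i) are primary inputs that arrive at time 0 and ignoring latch overhead; the `GATES` table and `critical_path` helper are written by hand for illustration. `graphlib` raises an error on cycles, matching the rule that the combinational network must be acyclic.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Gate output -> (delay in ns, fanin signals), taken from the adder example
GATES = {
    "s(0)": (2, ["a(0)", "b(0)"]),
    "c(0)": (1, ["a(0)", "b(0)"]),
    "s(1)": (2, ["a(1)", "b(1)", "c(0)"]),
    "c(1)": (1, ["a(1)", "b(1)", "c(0)"]),
}

def critical_path(gates):
    """Longest topological path: arrival(gate) = delay + max arrival of its fanins."""
    deps = {out: set(fanin) for out, (_, fanin) in gates.items()}
    arrival = {}
    for sig in TopologicalSorter(deps).static_order():   # fanins come first
        if sig in gates:
            delay, fanin = gates[sig]
            arrival[sig] = delay + max(arrival[i] for i in fanin)
        else:
            arrival[sig] = 0                              # primary input
    return max(arrival.values()), arrival

period, arrival = critical_path(GATES)
print(f"minimum clock period: {period} ns")
```

Here the critical path is a(0)/b(0) -> c(0) -> s(1) at 3 ns, so this adder alone would not allow a clock period below 3 ns.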

Logic Design Groundrules

- Synchronous design
- LSSD (level-sensitive scan design): design for testability
- The critical delay path defines the clock frequency

Behavioral function and timing correctness can be verified independently.

The design can be verified independently of its implementation (-> logic synthesis, -> custom design) -> synthesizable, high-level HDL becomes the main vehicle for functional verification, IF Boolean equivalence checking proves closure between the functional and the implementation view.

Functional verification can use zero-delay functional simulation --> cycle-based simulation, FSM-based formal verification.

Functional View of Synchronous Designs

- Logic is mapped to an acyclic network of Boolean functions
- The network also contains state-holding primitives: latch/reg/array
- The HDL does not contain any timing information
- The function can be evaluated by zero-delay signal evaluation --> speed of simulation + simplicity of tools

[Figure: a Boolean logic network feeding latches and arrays]

Cycle Simulation Algorithm

[Figure: levelized combinational logic between latches; gates A and B feed gate C]

Logic is ordered into levels so that the order of evaluation is correct. E.g., A and B are computed before C.
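A minimal Python sketch of the cycle simulation algorithm, assuming a hypothetical netlist of two-input gates for a 2-bit counter with enable. Levels are computed once at model-build time; then every cycle evaluates the combinational logic in level order with zero delay, each gate exactly once, and finally lets the latches take their new values. The recursion in `levelize` relies on the synchronous-design rule that the combinational network has no cycles.

```python
# Hypothetical netlist of a 2-bit counter with enable
GATES = {                      # gate output -> (operator, fanin signals)
    "t":  ("and", ["q0", "en"]),
    "d1": ("xor", ["q1", "t"]),
    "d0": ("xor", ["q0", "en"]),
}
LATCHES = {"q0": "d0", "q1": "d1"}        # latch output -> latch data input
OPS = {"and": lambda x, y: x & y, "xor": lambda x, y: x ^ y}

def levelize(gates):
    """Level = 1 + highest fanin level; inputs and latch outputs are level 0."""
    level = {}
    def lvl(sig):
        if sig not in gates:
            return 0
        if sig not in level:
            level[sig] = 1 + max(lvl(i) for i in gates[sig][1])
        return level[sig]
    return sorted(gates, key=lvl)

def run_cycles(n, en=1):
    order = levelize(GATES)                # done once, at model-build time
    values = {"en": en, "q0": 0, "q1": 0}
    for _ in range(n):
        for g in order:                    # each gate evaluated exactly once
            op, fanin = GATES[g]
            values[g] = OPS[op](*(values[i] for i in fanin))
        for q, d in LATCHES.items():       # clock edge: latches capture inputs
            values[q] = values[d]
    return 2 * values["q1"] + values["q0"]

print(levelize(GATES), run_cycles(5))
```

There is no event queue and no time wheel: the evaluation order is fixed at build time, which is what makes the per-cycle work so cheap. Five enabled cycles take the counter from 0 around to 1.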

Why Is This Faster?

As an example, let's assume we compile our circuit into a stream of pseudo-instructions:

[Figure: gate-level schematic of the 2-bit adder]

    Load  temp1, a(0)
    Load  temp2, b(0)
    Xor   temp1, temp2, temp3
    Store temp3, s(0)
    And   temp1, temp2, temp3
    Store temp3, c(0)
    Load  temp1, a(1)
    Load  temp2, b(1)
    Xor   temp1, temp2, temp3
    Load  temp4, c(0)
    Xor   temp3, temp4, temp5
    Store temp5, s(1)

We can cover every Boolean function with a minimal set of (~4 or better) instructions.
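The step from a levelized gate list to such an instruction stream can be sketched in Python. This is a minimal illustration under stated assumptions: the adder is first split into hypothetical two-input gates (the intermediate nets x1 and t1 to t4 are made up), registers are unlimited so nothing is ever spilled, and each signal is loaded only once and then kept in a register. The `execute` function is a stand-in for the target machine. The first four emitted instructions match the hand-written stream above.

```python
import operator

# The 2-bit adder as two-input gates, already in level order:
# (output, operator, input1, input2); x1 and t1..t4 are intermediate nets
ADDER = [
    ("s(0)", "Xor", "a(0)", "b(0)"),
    ("c(0)", "And", "a(0)", "b(0)"),
    ("x1",   "Xor", "a(1)", "b(1)"),
    ("s(1)", "Xor", "x1",   "c(0)"),
    ("t1",   "And", "a(1)", "b(1)"),
    ("t2",   "And", "b(1)", "c(0)"),
    ("t3",   "And", "c(0)", "a(1)"),
    ("t4",   "Or",  "t1",   "t2"),
    ("c(1)", "Or",  "t4",   "t3"),
]
ALU = {"And": operator.and_, "Or": operator.or_, "Xor": operator.xor}

def compile_network(gates):
    """Emit Load/op/Store pseudo-instructions; each signal is loaded at most once."""
    code, reg_of = [], {}
    def reg(sig):
        if sig not in reg_of:                      # first use: load from value table
            reg_of[sig] = f"temp{len(reg_of) + 1}"
            code.append(("Load", reg_of[sig], sig))
        return reg_of[sig]
    for out, op, x, y in gates:
        rx, ry = reg(x), reg(y)
        rd = reg_of[out] = f"temp{len(reg_of) + 1}"
        code.append((op, rx, ry, rd))
        code.append(("Store", rd, out))            # keep the value table up to date
    return code

def execute(code, values):
    """Run the straight-line stream against a value table (stand-in for the CPU)."""
    regs = {}
    for op, *args in code:
        if op == "Load":
            regs[args[0]] = values[args[1]]
        elif op == "Store":
            values[args[1]] = regs[args[0]]
        else:
            src1, src2, dst = args
            regs[dst] = ALU[op](regs[src1], regs[src2])
    return values

code = compile_network(ADDER)
for ins in code[:4]:
    print(*ins)
print(execute(code, {"a(0)": 1, "a(1)": 1, "b(0)": 1, "b(1)": 0}))
```

The resulting stream has no branches and no scheduler calls; a real model builder would add register allocation with spilling and instruction scheduling on top.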

Methodology Flow

[Figure: methodology flow connecting RTL (VHDL, Verilog), language compile and model build, the network of Boolean equations/operators, the cycle-based simulation model and cycle-based simulator, physical VLSI design tools / custom design, and formal verification by Boolean equivalence check (VERITY)]

Storage Elements

Latch inference for sequential HDL: if there is a possible set of signal evaluations that leaves the value of a signal undefined -> that signal is assumed to be a storage element.

Verification and logic synthesis must interpret the HDL the same way (check with Boolean equivalence checking).
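The latch-inference rule can be sketched as a path analysis in Python. The process body uses a hypothetical tuple representation, ("assign", signal) and ("if", then_branch, else_branch), and every branch is treated as reachable, which is the conservative reading of "a possible set of signal evaluations". The helper names are illustrative.

```python
def analyze(stmts):
    """Return (signals assigned on some path, signals assigned on every path)."""
    some, every = set(), set()
    for stmt in stmts:
        if stmt[0] == "assign":
            some.add(stmt[1])
            every.add(stmt[1])
        elif stmt[0] == "if":
            some_t, every_t = analyze(stmt[1])
            some_e, every_e = analyze(stmt[2])
            some |= some_t | some_e
            every |= every_t & every_e       # assigned only if both branches do it
        else:
            raise ValueError(f"unknown statement: {stmt!r}")
    return some, every

def inferred_latches(stmts):
    """Signals that some path leaves unassigned must hold their value: storage."""
    some, every = analyze(stmts)
    return some - every

# if en = '1' then q <= d; end if;  y <= a and b;
body = [("if", [("assign", "q")], []), ("assign", "y")]
print(inferred_latches(body))
```

Adding an else branch that also assigns q removes the inferred latch; logic synthesis has to apply exactly the same rule, which is why the two interpretations are cross-checked by equivalence checking.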

Implementation Options For Cycle Sim

HDL Compilation

Evaluation Sequence

Function Evaluation

HDL Compilation Options For Cycle Sim

Preserve the HDL structures:
- compile HDL like a programming language
- preserve design hierarchy, processes, modules, functions, procedures
- implementation process very similar to programming-language compilers
- incremental processing is trivial
- model optimization is hard (across function boundaries) and limited

Map HDL to a library of primitive functions (e.g. IBM "Texsim"):
- crush the design hierarchy to increase optimization potential
- synthesis-like process, but simpler because of missing physical constraints
- incremental model build is very hard
- the designer's view of the hierarchy must be preserved for the model debug process

Evaluation Sequence Options For Cycle Sim

Oblivious evaluation:
- for every cycle, evaluate all Boolean logic
- minimal amount of book-keeping and runtime control data structures
- large amount of redundant evaluation for large models with low change activity

Event-driven evaluation:
- evaluate changes only
- minimizes redundant work for low-activity models
- book-keeping overhead (but: use the synchronicity constraint)

Hybrid techniques (example: Texsim):
- use design partitions as the base granularity for event-driven evaluation
- use "key controlling signals" as guards for event-driven evaluation
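The trade-off between oblivious and event-driven evaluation can be sketched in Python by counting gate evaluations. The network and stimulus are hypothetical. The event-driven variant re-evaluates a gate, in level order and at most once per cycle, only when one of its fanins changed in that cycle; this is the kind of simplification the synchronicity constraint permits compared to a general event queue.

```python
import operator

OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}
# Hypothetical levelized network: gate -> (operator, fanins), in level order
GATES = {
    "g1": ("and", ["a", "b"]),
    "g2": ("xor", ["g1", "c"]),
    "g3": ("or",  ["g2", "a"]),
    "g4": ("and", ["g3", "b"]),
}

def simulate(stimulus, event_driven):
    """Return (values after each cycle, number of gate evaluations)."""
    values, evals, history = {}, 0, []
    for inputs in stimulus:
        changed = {s for s, v in inputs.items() if values.get(s) != v}
        values.update(inputs)
        for g, (op, fanin) in GATES.items():      # dict order is level order
            if event_driven and g in values and changed.isdisjoint(fanin):
                continue                          # no fanin changed: value still valid
            new = OPS[op](*(values[i] for i in fanin))
            evals += 1
            if values.get(g) != new:
                changed.add(g)
            values[g] = new
        history.append(dict(values))
    return history, evals

# Low activity: input c changes once in ten cycles
stimulus = [{"a": 1, "b": 1, "c": 0}] * 4 + [{"a": 1, "b": 1, "c": 1}] * 6
hist_o, evals_o = simulate(stimulus, event_driven=False)
hist_e, evals_e = simulate(stimulus, event_driven=True)
print(evals_o, evals_e, hist_o == hist_e)
```

With one input change in ten cycles the oblivious loop performs 40 gate evaluations against 6 for the event-driven one, with identical results. As activity rises the gap closes, while the change tracking stays as overhead.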

Function Evaluation Options For Cycle Sim

Interpreted:
- map the design network into an efficient data structure which the simulator "walks" at run time
- the "model" is data, highly portable
- few successful examples where interpretation of a data structure didn't add significant performance loss due to indirection

Compiled (example: Texsim):
- map the design network into a sequence of instructions
- generating "C source" is not an industrial-strength option
- the "model" consists of machine instructions, platform-dependent
- many programming-language compiler optimizations apply, but the problem is simpler (more constrained)
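The two options can be contrasted in a short Python sketch on a hypothetical two-input-gate netlist of the 2-bit adder. `interpret` walks the gate list on every evaluation; `compile_model` emits one straight-line function once, at model-build time. For brevity it generates Python source and uses `exec`, which is the analogue of the "generate C source" approach the slide calls not industrial-strength; a compiled simulator such as Texsim generates target machine code instead.

```python
import itertools
import operator

# Hypothetical 2-bit adder netlist of two-input gates, in level order
ADDER = [
    ("s(0)", "Xor", "a(0)", "b(0)"), ("c(0)", "And", "a(0)", "b(0)"),
    ("x1", "Xor", "a(1)", "b(1)"),   ("s(1)", "Xor", "x1", "c(0)"),
    ("t1", "And", "a(1)", "b(1)"),   ("t2", "And", "b(1)", "c(0)"),
    ("t3", "And", "c(0)", "a(1)"),   ("t4", "Or", "t1", "t2"),
    ("c(1)", "Or", "t4", "t3"),
]
ALU = {"And": operator.and_, "Or": operator.or_, "Xor": operator.xor}
PY_OP = {"And": "&", "Or": "|", "Xor": "^"}

def interpret(gates, values):
    """Interpreted: walk the network data structure on every evaluation."""
    for out, op, x, y in gates:
        values[out] = ALU[op](values[x], values[y])
    return values

def compile_model(gates):
    """Compiled: emit one straight-line function once, at model-build time."""
    lines = ["def model(v):"]
    for out, op, x, y in gates:
        lines.append(f"    v[{out!r}] = v[{x!r}] {PY_OP[op]} v[{y!r}]")
    lines.append("    return v")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["model"]

model = compile_model(ADDER)
for bits in itertools.product([0, 1], repeat=4):
    inputs = dict(zip(["a(0)", "a(1)", "b(0)", "b(1)"], bits))
    assert model(dict(inputs)) == interpret(ADDER, dict(inputs))
print("compiled and interpreted models agree")
```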

IBM's Texsim simulator

- Flattening of the hierarchy
- Network optimizations, using the levelized network:
  – constant propagation
  – DeMorgan's law and other Boolean optimizations
  – equivalent function removal
  – merging of functions with only one fanout
- Final levelization
- Structural analysis and logic partitioning (e.g. latch-partitioning, below)
- Create the model symbol table and allocate the value table
- Compile logic by partition and level
- Generate a reference history for use by register allocation
- Generate machine-independent code and perform peephole optimization
- Perform instruction scheduling
- Generate target machine object code
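Constant propagation, the first of the network optimizations listed, can be sketched in Python over a levelized two-input gate list. The netlist, the tied-off `test_mode` input and the `propagate_constants` helper are hypothetical, and only folding is shown: the surviving gate g2 reduces to a buffer of `data`, which a later pass (e.g. equivalent function removal or merging) would clean up.

```python
import operator

ALU = {"And": operator.and_, "Or": operator.or_, "Xor": operator.xor}

def propagate_constants(gates, constants):
    """Fold gates whose output is fixed by constant inputs (gates in level order)."""
    known = dict(constants)               # signal -> constant 0/1
    remaining = []
    for out, op, x, y in gates:
        vx, vy = known.get(x), known.get(y)
        if vx is not None and vy is not None:
            known[out] = ALU[op](vx, vy)  # all inputs constant
        elif op == "And" and 0 in (vx, vy):
            known[out] = 0                # x and 0 = 0
        elif op == "Or" and 1 in (vx, vy):
            known[out] = 1                # x or 1 = 1
        else:
            remaining.append((out, op, x, y))
    return remaining, known

# Hypothetical netlist with a test-mode input tied to 0 and vdd tied to 1
gates = [
    ("g1", "And", "test_mode", "scan_in"),
    ("g2", "Or", "g1", "data"),
    ("g3", "Xor", "test_mode", "vdd"),
]
remaining, known = propagate_constants(gates, {"test_mode": 0, "vdd": 1})
print(remaining, known)
```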

Orders of Magnitude in Speed

Relative simulation speed (event simulator = 1):
- Event simulator: 1
- Cycle simulator: 20
- Event-driven cycle simulator: 50
- Acceleration: 1,000
- Emulation: 100,000

These numbers are supposed to give you an intuitive feel. The actual numbers depend on many other factors:
- model activity rate
- test bench activity & implementation language
- multiple clock domains

We could write a simulator now...

But we have not talked about many areas:

- Model-build software principles (how to build a gigantic model in minutes)
- Simulator user interface (how to talk to the user and to programs)
- How a cycle simulator deals with multi-value signals: do we need those? Yes, mainly for bus logic and power-on-reset simulation
- How to take the cycle-sim algorithm and implement it in special-purpose hardware: simulation acceleration, emulation
- Where is there still a need for pure event simulation?

Good research: only a few papers in the field during the last 10 years. A good place to start is Peter Maurer at Florida State. Extremely interesting for folks who want to explore different algorithms.

And...

This field is just one segment of verification tools development, which is just one segment of Electronic Design Automation.