Design of Embedded Processors - WBUTHELP.COMinput buffers provide both the original and the inverted values of each PLA input. The input lines run horizontally into the AND matrix,

Design of Embedded Processors

Lesson 20

Field Programmable Gate Arrays and Applications

Instructional Objectives After going through this lesson the student will be able to

• Define what is a field programmable gate array (FPGA)

• Distinguish between an FPGA and a stored-memory processor

• List and explain the principle of operation of the various functional units within an FPGA

• Compare the architecture and performance specifications of various commercially available FPGA

• Describe the steps in using an FPGA in an embedded system Introduction

An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry.

When a FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system. FPGAs are truly parallel in nature so different processing operations do not have to compete for the same resources. As a result, the performance of one part of the application is not affected when additional processing is added. Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by an operator. However, unlike hard-wired printed circuit board (PCB) designs which have fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow reconfiguration after the control system is deployed to the field. FPGA devices deliver the performance and reliability of dedicated hardware circuitry. A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks shown in Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.

Fig. 20.1 Internal Structure of FPGA

PROGRAMMABLE INTERCONNECT

LOGIC BLOCKS

I/O BLOCKS

In an FPGA logic blocks are implemented using multiple level low fan-in gates, which gives it a more compact design compared to an implementation with two-level AND-OR logic. FPGA provides its user a way to configure:

1. The intersection between the logic blocks and 2. The function of each logic block.

Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:

1. Transistor pairs 2. combinational gates like basic NAND gates or XOR gates 3. n-input Lookup tables 4. Multiplexers 5. Wide fan-in And-OR structure.

Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. Density of logic block used in an FPGA depends on length and number of wire segments used for routing. Number of segments used for interconnection typically is a tradeoff between density of logic blocks used and amount of area used up for routing. Simplified version of FPGA internal architecture with routing is shown in Fig. 20.2.

Fig. 20.2 Simplified Internal Structure of FPGA

Logic block

I/O block

Why do we need FPGAs?

By the early 1980’s large scale integrated circuits (LSI) formed the back bone of most of

the logic circuits in major systems. Microprocessors, bus/IO controllers, system timers etc were implemented using integrated circuit fabrication technology. Random “glue logic” or interconnects were still required to help connect the large integrated circuits in order to:

1. Generate global control signals (for resets etc.) 2. Data signals from one subsystem to another sub system.

Systems typically consisted of few large scale integrated components and large number of SSI (small scale integrated circuit) and MSI (medium scale integrated circuit) components.Intial attempt to solve this problem led to development of Custom ICs which were to replace the large amount of interconnect. This reduced system complexity and manufacturing cost, and improved performance. However, custom ICs have their own disadvantages. They are relatively very expensive to develop, and delay introduced for product to market (time to market) because of increased design time. There are two kinds of costs involved in development of custom ICs 1. Cost of development and design 2. Cost of manufacture (A tradeoff usually exists between the two costs) Therefore the custom IC approach was only viable for products with very high volume, and which were not time to market sensitive.FPGAs were introduced as an alternative to custom ICs for implementing entire system on one chip and to provide flexibility of reporogramability to the user. Introduction of FPGAs resulted in improvement of density relative to discrete SSI/MSI components (within around 10x of custom ICs). Another advantage of FPGAs over Custom ICs is that with the help of computer aided design (CAD) tools circuits could be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing) Evaluation of FPGA

In the world of digital electronic systems, there are three basic kinds of devices: memory,

microprocessors, and logic. Memory devices store random information such as the contents of a

spreadsheet or database. Microprocessors execute software instructions to perform a wide variety of tasks such as running a word processing program or video game. Logic devices provide specific functions, including device-to-device interfacing, data communication, signal processing, data display, timing and control operations, and almost every other function a system must perform.

The first type of user-programmable chip that could implement logic circuits was the Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit inputs and data lines as outputs. Logic functions, however, rarely require more than a few product terms, and a PROM contains a full decoder for its address inputs. PROMS are thus an inefficient architecture for realizing logic circuits, and so are rarely used in practice for that purpose. The device that came as a replacement for the PROM’s are programmable logic devices or in short PLA. Logically, a PLA is a circuit that allows implementing Boolean functions in sum-of-product form. The typical implementation consists of input buffers for all inputs, the programmable AND-matrix followed by the programmable OR-matrix, and output buffers. The input buffers provide both the original and the inverted values of each PLA input. The input lines run horizontally into the AND matrix, while the so-called product-term lines run vertically. Therefore, the size of the AND matrix is twice the number of inputs times the number of product-terms. When PLAs were introduced in the early 1970s, by Philips, their main drawbacks were that they were expensive to manufacture and offered somewhat poor speed-performance. Both disadvantages were due to the two levels of configurable logic, because programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were developed. PALs provide only a single level of programmability, consisting of a programmable “wired” AND plane that feeds fixed OR-gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential circuits can be realized. These are often referred to as Simple Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of PLA and PAL.

Fig. 20.3 Simplified Structure of PLA and PAL

Inputs

Outputs

PLAInputs

Outputs

PAL

With the advancement of technology, it has become possible to produce devices with higher capacities than SPLD’s.As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, but not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows to implement either more logic equations or a more complicated design.

Fig. 20.4 Internal structure of a CPLD

Logic block

Logic block

Logic block

Logic block

Switch matrix

Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or less) than four logic blocks. These logic blocks are themselves comprised of macrocells and interconnect wiring, just like an ordinary PLD.

Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than PLDs, their potential uses are more varied. They are still sometimes used for simple applications like address decoding, but more often contain high-performance control-logic or complex finite state machines. At the high-end (in terms of numbers of gates), there is also a lot of overlap in potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required. Because of its less flexible internal architecture, the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter. The development of the FPGA was distinct from the SPLD/CPLD evolution just described.This is apparent from the architecture of FPGA shown in Fig 20.1. FPGAs offer the highest amount of logic density, the most features, and the highest performance. The largest FPGA now shipping, part of the Xilinx Virtex™ line of devices, provides eight million "system gates" (the relative density of logic). These advanced devices also offer features such as built-in hardwired processors (such as the IBM Power PC), substantial amounts of memory, clock management systems, and support for many of the latest, very fast device-to-device signaling technologies. FPGAs are used in a wide variety of applications ranging from data processing and storage, to instrumentation, telecommunications, and digital signal processing. The value of programmable logic has always been its ability to shorten development cycles for electronic equipment manufacturers and help them get their product to market faster. As PLD (Programmable Logic Device) suppliers continue to integrate more functions inside their devices, reduce costs, and increase the availability of time-saving IP cores, programmable logic is certain to expand its popularity with digital designers.

FPGA Structural Classification Basic structure of an FPGA includes logic elements, programmable interconnects and memory. Arrangement of these blocks is specific to particular manufacturer. On the basis of internal arrangement of blocks FPGAs can be divided into three classes: Symmetrical arrays This architecture consists of logic elements (called CLBs) arranged in rows and columns of a matrix and interconnect laid out between them shown in Fig 20.2. This symmetrical matrix is surrounded by I/O blocks which connect it to outside world. Each CLB consists of n-input Lookup table and a pair of programmable flip flops. I/O blocks also control functions such as tri-state control, output transition speed. Interconnects provide routing path. Direct interconnects between adjacent logic elements have smaller delay compared to general purpose interconnect Row based architecture Row based architecture shown in Fig 20.5 consists of alternating rows of logic modules and programmable interconnect tracks. Input output blocks is located in the periphery of the rows. One row may be connected to adjacent rows via vertical interconnect. Logic modules can be implemented in various combinations. Combinatorial modules contain only combinational elements which Sequential modules contain both combinational elements along with flip flops. This sequential module can implement complex combinatorial-sequential functions. Routing tracks are divided into smaller segments connected by anti-fuse elements between them. Hierarchical PLDs This architecture is designed in hierarchical manner with top level containing only logic blocks and interconnects. Each logic block contains number of logic modules. And each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input output blocks surround this scheme of logic blocks and interconnects. This type of architecture is shown in Fig 20.6.

Fig. 20.5 Row based Architecture

Routing Channels

Logic Block Rows

I/O Blocks

I/O Blocks

I/O

Blo

cks

I/O

Blo

cks

Fig. 20.6 Hierarchical PLD

I/O Block

I/O Block

I/O

Blo

ck

I/O

Blo

ck

Logic Module

Interconnects

FPGA Classification on user programmable switch technologies FPGAs are based on an array of logic modules and a supply of uncommitted wires to route signals. In gate arrays these wires are connected by a mask design during manufacture. In FPGAs, however, these wires are connected by the user and therefore must use an electronic device to connect them. Three types of devices have been commonly used to do this, pass transistors controlled by an SRAM cell, a flash or EEPROM cell to pass the signal, or a direct connect using antifuses. Each of these interconnect devices have their own advantages and disadvantages. This has a major affect on the design, architecture, and performance of the FPGA. Classification of FPGAs on user programmable switch technology is given in Fig. 20.7 shown below.

Fig. 20.7 FPGA Classification on user programmable technology

FPGA

SRAM- Programmed

Antifuse- Programmed

EEPROM- Programmed

Actel ACT1 & 2 Quicklogic’s pASIC Crosspoint’s CP20K Xilinx LCA

AT&T Orca Altera Flex

Toshiba Plesser’s ERA Atmel’s CLi

Altera’s MAX AMD’s Mach Xilinx’s EPLD

SRAM Based The major advantage of SRAM based device is that they are infinitely re-programmable and can be soldered into the system and have their function changed quickly by merely changing the contents of a PROM. They therefore have simple development mechanics. They can also be changed in the field by uploading new application code, a feature attractive to designers. It does however come with a price as the interconnect element has high impedance and capacitance as well as consuming much more area than other technologies. Hence wires are very expensive and slow. The FPGA architect is therefore forced to make large inefficient logic modules (typically a look up table or LUT).The other disadvantages are: They needs to be reprogrammed each time when power is applied, needs an external memory to store program and require large area. Fig. 20.8 shows two applications of SRAM cells: for controlling the gate nodes of pass-transistor switches and to control the select lines of multiplexers that drive logic block inputs. The figures gives an example of the connection of one logic block (represented by the AND-gate in the upper left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled by SRAM cells . Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.

Fig. 20.8 SRAM-controlled Programmable Switches.

Logic Cell Logic Cell

Logic Cell Logic Cell

SRAM

SRAM SRAM

Antifuse Based The antifuse based cell is the highest density interconnect by being a true cross point. Thus the designer has a much larger number of interconnects so logic modules can be smaller and more efficient. Place and route software also has a much easier time. These devices however are only one-time programmable and therefore have to be thrown out every time a change is made in the design. The Antifuse has an inherently low capacitance and resistance such that the fastest parts are all Antifuse based. The disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses into the IC process, which means the process will always lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built using modified CMOS technology. As an example, Actel’s antifuse structure is depicted in Fig. 20.9. The figure shows that an antifuse is positioned between two interconnect wires and physically consists of three sandwiched layers: the top and bottom layers are conductors, and the middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom layers, but when programmed the insulator changes to become a low-resistance link. It uses Poly-Si and n+ diffusion as conductors and ONO as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.

wire wire

antifuse

oxide dielectric

Poly-Si

n+ diffusion Silicon substrate

Fig. 20.9 Actel Antifuse Structure.

EEPROM Based The EEPROM/FLASH cell in FPGAs can be used in two ways, as a control device as in an SRAM cell or as a directly programmable switch. When used as a switch they can be very efficient as interconnect and can be reprogrammable at the same time. They are also non-volatile so they do not require an extra PROM for loading. They, however, do have their detractions. The EEPROM process is complicated and therefore also lags SRAM technology. Logic Block and Routing Techniques Crosspoint FPGA: consist of two types of logic blocks. One is transistor pair tiles in which transistor pairs run in parallel lines as shown in figure below:

Transistor Pair

Fig. 20.10 Transistor pair tiles in cross-point FPGA.

second type of logic blocks are RAM logic which can be used to implement random access memory. Plessey FPGA: Basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

Fig. 20.11 Plessey Logic Block

Latch

Config RAM

8 in

terc

onne

ct

lines

8-2

m

ultip

lexe

r

CLK

Data

Both Crosspoint and Plessey are fine grain logic blocks. Fine grain logic blocks have an advantage in high percentage usage of logic blocks but they require large number of wire segments and programmable switches which occupy lot of area. Actel Logic Block: If inputs of a multiplexer are connected to a constant or to a signal, it can be used to implement different logic functions. For example a 2-input multiplexer with inputs a and b, select, will implement function ac + bc´. If b=0 then it will implement ac, and if a=0 it will implement bc´.

Typically an Actel logic block consists of multiple number of multiplexers and logic gates.

w

x

y

z

0

1

0

1

0

1n1

n2

n3 n4

Fig. 20.12 Actel Logic Block

Xilinx Logic block In Xilinx logic block Look up table is used to implement any number of different functionality. The input lines go into the input and enable of lookup table. The output of the lookup table gives the result of the logic function that it implements. Lookup table is implemented using SRAM.

Fig. 20.13 Xilinx - LUT based

Inputs

Data in

Enable clock Clock

Reset

Look-up Table

Vix

Gnd (Global Reset)

OR

A B C D E

X

Y

M U X

M U X

S

R

R

S

Outputs

A k-input logic function is implemented using 2^k * 1 size SRAM. Number of different possible functions for k input LUT is 2^2^k. Advantage of such an architecture is that it supports implementation of so many logic functions, however the disadvantage is unusually large number of memory cells required to implement such a logic block in case number of inputs is large. Fig. 20.13 shows 5-input LUT based implementation of logic block LUT based design provides for better logic block utilization. A k-input LUT based logic block can be implemented in number of different ways with tradeoff between performance and logic density.

Set by configuration bit-stream Logic Block

INPUTS 4-LUT FF OUTPUT

4-input “look up table”

latch

1

0

An n-lut can be shown as a direct implementation of a function truth-table. Each of the latch holds the value of the function corresponding to one input combination. For Example: 2-lut shown in figure below implements 2 input AND and OR functions.

Example: 2-lut

INPUTS AND OR 00 0 0 01 0 1 10 0 1 11 1 1

Altera Logic Block Altera's logic block has evolved from earlier PLDs. It consists of wide fan in (up to 100 input) AND gates feeding into an OR gate with 3-8 inputs. The advantage of large fan in AND gate based implementation is that few logic blocks can implement the entire functionality thereby reducing the amount of area required by interconnects. On the other hand disadvantage is the low density usage of logic blocks in a design that requires fewer input logic. Another disadvantage is the use of pull up devices (AND gates) that consume static power. To improve power manufacturers provide low power consuming logic blocks at the expense of delay. Such logic blocks have gates with high threshold as a result they consume less power. Such logic blocks can be used in non-critical paths. Altera, Xilinx are coarse grain architecture. Example: Altera’s FLEX 8000 series consists of a three-level hierarchy. However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that FLEX 8000 is

a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more than 15,000 for the 8000 series. The overall architecture of FLEX 8000 is illustrated in Fig. 20.14.

Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs.

Fast Track interconnect

LAB(8 Logic Elements & local interconnect)

I/O

I/O

The basic logic block, called a Logic Element (LE) contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions. Details of the LE are illustrated in Fig. 20.15.

Fig. 20.15 Altera FLEX 8000 Logic Element (LE).

Cascade out

LE out

Carry out

Cascade in

data1 data2 data3 data4

cntrl2

cntrl3 cntrl4

cntrl1

Carry in Carry

clock

set/clear

Cascade D

R

S Q Look-up

Table

In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera’s CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect and each local wire can connect any LE to any other LE within the same LAB. Local interconnect also connects to the FLEX 8000’s global interconnect, called FastTrack. All FastTrack wires horizontal wires are identical, and so interconnect delays in the FLEX 8000 are more predictable than FPGAs that employ many smaller length segments because there are fewer programmable switches in the longer path

Fig. 20.16 Altera FLEX 8000 Logic Array Block (LAB).

To Fast Track interconnect



From Fast Track interconnect

to adjacent LAB

Local interconnect

data

cntrl Cascade, carry

LE

LE

LE

4

4 2

FPGA Design Flow One of the most important advantages of FPGA based design is that users can design it using CAD tools provided by design automation companies. Generic design flow of an FPGA includes following steps: System Design At this stage designer has to decide what portion of his functionality has to be implemented on FPGA and how to integrate that functionality with rest of the system. I/O integration with rest of the system Input Output streams of the FPGA are integrated with rest of the Printed Circuit Board, which allows the design of the PCB early in design process. FPGA vendors provide extra automation software solutions for I/O design process.

Design Description Designer describes design functionality either by using schematic editors or by using one of the various Hardware Description Languages (HDLs) like Verilog or VHDL. Synthesis Once design has been defined CAD tools are used to implement the design on a given FPGA. Synthesis includes generic optimization, slack optimizations, power optimizations followed by placement and routing. Implementation includes Partition, Place and route. The output of design implementation phase is bit-stream file. Design Verification Bit stream file is fed to a simulator which simulates the design functionality and reports errors in desired behavior of the design. Timing tools are used to determine maximum clock frequency of the design. Now the design is loading onto the target FPGA device and testing is done in real environment. Hardware design and development The process of creating digital logic is not unlike the embedded software development process. A description of the hardware's structure and behavior is written in a high-level hardware description language (usually VHDL or Verilog) and that code is then compiled and downloaded prior to execution. Of course, schematic capture is also an option for design entry, but it has become less popular as designs have become more complex and the language-based tools have improved. The overall process of hardware development for programmable logic is shown in Fig. 20.17 and described in the paragraphs that follow. Perhaps the most striking difference between hardware and software design is the way a developer must think about the problem. Software developers tend to think sequentially, even when they are developing a multithreaded application. The lines of source code that they write are always executed in that order, at least within a given thread. If there is an operating system it is used to create the appearance of parallelism, but there is still just one execution engine. During design entry, hardware designers must think-and program-in parallel. All of the input signals are processed in parallel, as they travel through a set of execution engines-each one a series of macrocells and interconnections-toward their destination output signals. Therefore, the statements of a hardware description language create structures, all of which are "executed" at the very same time.

Fig. 20.17 Programmable logic design process

Design Entry

Simulation

Synthesis

Place and Route

Download

Design Constraints

Design Library

Typically, the design entry step is followed or interspersed with periods of functional simulation. That's where a simulator is used to execute the design and confirm that the correct outputs are produced for a given set of test inputs. Although problems with the size or timing of the hardware may still crop up later, the designer can at least be sure that his logic is functionally correct before going on to the next stage of development. Compilation only begins after a functionally correct representation of the hardware exists. This hardware compilation consists of two distinct steps. First, an intermediate representation of the hardware design is produced. This step is called synthesis and the result is a representation called a netlist. The netlist is device independent, so its contents do not depend on the particulars of the FPGA or CPLD; it is usually stored in a standard format called the Electronic Design Interchange Format (EDIF). The second step in the translation process is called place & route. This step involves mapping the logical structures described in the netlist onto actual macrocells, interconnections, and input and output pins. This process is similar to the equivalent step in the development of a printed circuit board, and it may likewise allow for either automatic or manual layout optimizations. The result of the place & route process is a bitstream. This name is used generically, despite the fact that each CPLD or FPGA (or family) has its own, usually proprietary, bitstream format. Suffice it to say that the bitstream is the binary data that must be loaded into the FPGA or CPLD to cause that chip to execute a particular hardware design. Increasingly there are also debuggers available that at least allow for single-stepping the hardware design as it executes in the programmable logic device. But those only complement a simulation environment that is able to use some of the information generated during the place & route step to provide gate-level simulation. Obviously, this type of integration of device-specific information into a generic simulator requires a good working relationship between the chip and simulation tool vendors.

Things to Ponder Q.1 Define the following acronyms as they apply to digital logic circuits:

• ASIC • PAL • PLA • PLD • CPLD • FPGA

Q2.How granularity of logic block influences the performance of an FPGA? Q3. Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA, etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are there any applications where hard-wired logic would do a better job than a programmable device?

Q4.Some programmable logic devices (and PROM memory devices as well) use tiny fuses which are intentionally "blown" in specific patterns to represent the desired program. Programming a device by blowing tiny fuses inside of it carries certain advantages and disadvantages - describe what some of these are. Q5. Use one 4 x 8 x 4 PLA to implement the function.

1

2

( , , , ) ' ' ' ' '( , , , ) ' ' '

= + += +

F w x y z wx y z wx yz wxyF w x y z wx y x y z

Lesson 21

Introduction to Hardware Description Languages - I

Instructional Objectives At the end of the lesson the student should be able to

• Describe a digital IC design flow and explain its various abstraction levels.

• Explain the need for a hardware description language in the IC desing flow

• Model simple hardware devices at various levels of abstraction using Verilog (Gate/Switch/Behavioral)

• Write Verilog codes meeting the prescribed requirement at a specified level

1.1 Introduction 1.1.1 What is a HDL and where does Verilog come? HDL is an abbreviation of Hardware Description Language. Any digital system can be represented in a REGISTER TRANSFER LEVEL (RTL) and HDLs are used to describe this RTL. Verilog is one such HDL and it is a general-purpose language –easy to learn and use. Its syntax is similar to C. The idea is to specify how the data flows between registers and how the design processes the data. To define RTL, hierarchical design concepts play a very significant role. Hierarchical design methodology facilitates the digital design flow with several levels of abstraction. Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient representation of the RTL description of any digital design.

For example, an HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC) chip, i.e., the switch level or, it may describe the design at a more micro level in terms of logical gates and flip flops in a digital system, i.e., the gate level. Verilog supports all of these levels. 1.1.2 Hierarchy of design methodologies Bottom-Up Design The traditional method of electronic design is bottom-up (designing from transistors and moving to a higher level of gates and, finally, the system). But with the increase in design complexity traditional bottom-up designs have to give way to new structural, hierarchical design methods. Top-Down Design For HDL representation it is convenient and efficient to adapt this design-style. A real top-down design allows early testing, fabrication technology independence, a structured system design and offers many other advantages. But it is very difficult to follow a pure top-down design. Due to this fact most designs are mix of both the methods, implementing some key elements of both design styles.

1.1.3 Hierarchical design concept and Verilog To follow the hierarchical design concepts briefly mentioned above one has to describe the design in terms of entities called MODULES. Modules A module is the basic building block in Verilog. It can be an element or a collection of low level design blocks. Typically, elements are grouped into modules to provide common functionality used in places of the design through its port interfaces, but hides the internal implementation. 1.1.4 Abstraction Levels

• Behavioral level • Register-Transfer Level • Gate Level • Switch level

Behavioral or algorithmic Level This level describes a system by concurrent algorithms (Behavioral). Each algorithm itself is sequential meaning that it consists of a set of instructions that are executed one after the other. ‘initial’, ‘always’ ,‘functions’ and ‘tasks’ blocks are some of the elements used to define the system at this level. The intricacies of the system are not elaborated at this stage and only the functional description of the individual blocks is prescribed. In this way the whole logic synthesis gets highly simplified and at the same time more efficient. Register-Transfer Level Designs using the Register-Transfer Level specify the characteristics of a circuit by operations and the transfer of data between the registers. An explicit clock is used. RTL design contains exact timing possibility, operations are scheduled to occur at certain times. Modern definition of a RTL code is "Any code that is synthesizable is called RTL code". Gate Level Within the logic level the characteristics of a system are described by logical links and their timing properties. All signals are discrete signals. They can only have definite logical values (`0', `1', `X', `Z`). The usable operations are predefined logic primitives (AND, OR, NOT etc gates). It must be indicated here that using the gate level modeling may not be a good idea in logic design. Gate level code is generated by tools like synthesis tools in the form of netlists which are used for gate level simulation and for backend.

Switch Level This is the lowest level of abstraction. A module can be implemented in terms of switches, storage nodes and interconnection between them. However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. RTL is frequently used for Verilog description that is a combination of behavioral and dataflow while being acceptable for synthesis. Instances A module provides a template from where one can create objects. When a module is invoked Verilog creates a unique object from the template, each having its own name, variables, parameters and I/O interfaces. These are known as instances. 1.1.5 The Design Flow This block diagram describes a typical design flow for the description of the digital design for both ASIC and FPGA realizations.

LEVEL OF FLOW TOOLS USED

Specification Word processor like Word, Kwriter, AbiWord, Open Office

High Level Design Word processor like Word, Kwriter, AbiWord, for drawing waveform use tools like waveformer or testbencher or Word, Open Office.

Micro Design/Low level design

Word processor like Word, Kwriter, AbiWord, for drawing waveform use tools like waveformer or testbencher or Word. For FSM StateCAD or some similar tool, Open Office

RTL Coding Vim, Emacs, conTEXT, HDL TurboWriter

Simulation Modelsim, VCS, Verilog-XL, Veriwell, Finsim, iVerilog, VeriDOS

Synthesis Design Compiler, FPGA Compiler, Synplify, Leonardo Spectrum. You can download this from FPGA vendors like Altera and Xilinx for free

Place & Route For FPGA use FPGA' vendors P&R tool. ASIC tools require expensive P&R tools like Apollo. Students can use LASI, Magic

Post Si Validation For ASIC and FPGA, the chip needs to be tested in real environment. Board design, device drivers needs to be in place

Specification This is the stage at which we define the important parameters of the system that has to be designed. For example for designing a counter one has to decide its bit-size, whether it should have synchronous reset whether it must be active high enable etc. High Level Design This is the stage at which one defines various blocks in the design in the form of modules and instances. For instance for a microprocessor a high level representation means splitting the design into blocks based on their function. In this case the various blocks are registers, ALU, Instruction Decode, Memory Interface, etc. Micro Design/Low level design Low level design or Micro design is the phase in which, designer describes how each block is implemented. It contains details of State machines, counters, Mux, decoders, internal registers. For state machine entry you can use either Word, or special tools like State CAD. It is always a good idea if waveform is drawn at various interfaces. This is the phase, where one spends lot of time. A sample low level design is indicated in the figure below.

RTL Coding In RTL coding, Micro Design is converted into Verilog/VHDL code, using synthesizable constructs of the language. Normally, vim editor is used, and conTEXT, Nedit and Emacs are other choices. Simulation Simulation is the process of verifying the functional characteristics of models at any level of abstraction. We use simulators to simulate the the Hardware models. To test if the RTL code meets the functional requirements of the specification, see if all the RTL blocks are functionally correct. To achieve this we need to write testbench, which generates clk, reset and required test vectors. A sample testbench for a counter is as shown below. Normally, we spend 60-70% of time in verification of design.

We use waveform output from the simulator to see if the DUT (Device Under Test) is functionally correct. Most of the simulators come with waveform viewer, as design becomes complex, we write self checking testbench, where testbench applies the test vector, compares the output of DUT with expected value. There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place and Route). Here we include the gate delays and wire delays and see if DUT works at the rated clock speed. This is also called as SDF simulation or gate level simulation

Synthesis Synthesis is the process in which a synthesis tool like design compiler takes in the RTL in Verilog or VHDL, target technology, and constrains as input and maps the RTL to target technology primitives. The synthesis tool after mapping the RTL to gates, also does the minimal amount of timing analysis to see if the mapped design is meeting the timing requirements. (Important thing to note is, synthesis tools are not aware of wire delays, they know only gate delays). After the synthesis there are a couple of things that are normally done before passing the netlist to backend (Place and Route)

• Verification: Check if the RTL to gate mapping is correct. • Scan insertion: Insert the scan chain in the case of ASIC.

Place & Route Gate-level netlist from the synthesis tool is taken and imported into place and route tool in the Verilog netlist format. All the gates and flip-flops are placed, Clock tree synthesis and reset is routed. After this each block is routed. Output of the P&R tool is a GDS file, this file is used by a

foundry for fabricating the ASIC. Normally the P&R tool are used to output the SDF file, which is back annotated along with the gatelevel netlist from P&R into static analysis tool like Prime Time to do timing analysis. Post Silicon Validation Once the chip (silicon) is back from fabrication, it needs to be put in a real environment and tested before it can be released into market. Since the speed of simulation with RTL is very slow (number clocks per second), there is always a possibility to find a bug

1.2 Verilog HDL: Syntax and Semantics

1.2.1 Lexical Conventions The basic lexical conventions used by Verilog HDL are similar to those in the C programming language. Verilog HDL is a case-sensitive language. All keywords are in lowercase. 1.2.2 Data Types Verilog Language has two primary data types :

• Nets - represents structural connections between components. • Registers - represent variables used to store data.

Every signal has a data type associated with it. Data types are: • Explicitly declared with a declaration in the Verilog code. • Implicitly declared with no declaration but used to connect structural building blocks in

the code. Implicit declarations are always net type "wire" and only one bit wide.

Types of Net Each net type has functionality that is used to model different types of hardware (such as PMOS, NMOS, CMOS, etc).This has been tabularized as follows:

Net Data Type Functionality wire, tri Interconnecting wire - no special resolution function

wor, trior Wired outputs OR together (models ECL) wand,triand Wired outputs AND together (models open-collector)

tri0,tri1 Net pulls-down or pulls-up when not driven supply0,suppy1 Net has a constant logic 0 or logic 1 (supply strength)

Register Data Types

• Registers store the last value assigned to them until another assignment statement changes their value.

• Registers represent data storage constructs. • Register arrays are called memories.

• Register data types are used as variables in procedural blocks. • A register data type is required if a signal is assigned a value within a procedural block • Procedural blocks begin with keyword initial and always.

Some common data types are listed in the following table:

Data Types Functionality reg Unsigned variable

integer Signed variable – 32 bits time Unsigned integer- 64 bits real Double precision floating point variable

1.2.3 Apart from these there are vectors, integer, real & time register data types. Some examples are as follows: Integer integer counter; // general purpose variable used as a counter. initial counter= -1; // a negative one is stored in the counter Real real delta; // Define a real variable called delta. initial begin delta= 4e10; // delta is assigned in scientific notation delta = 2.13; // delta is assigned a value 2.13 end integer i; // define an integer I; initial i = delta ; // I gets the value 2(rounded value of 2.13) Time time save_sim_time; // define a time variable save_sim_time initial save_sim_time = $time; // save the current simulation time. n.b. $time is invoked to get the current simulation time Arrays integer count [0:7]; // an array of 8 count variables reg [4:0] port_id[0:7]; // Array of 8 port _ids, each 5 bit wide. integer matrix[4:0] [0:255] ; // two dimensional array of integers.

1.2.4 Some Constructs Using Data Types Memories Memories are modeled simply as one dimensional array of registers each element of the array is know as an element of word and is addressed by a single array index. reg membit [0:1023] ; // memory meme1bit with 1K 1- bit words reg [7:0] membyte [0:1023]; memory membyte with 1K 8 bit words membyte [511] // fetches 1 byte word whose address is 511. Strings A string is a sequence of characters enclosed by double quotes and all contained on a single line. Strings used as operands in expressions and assignments are treated as a sequence of eight-bit ASCII values, with one eight-bit ASCII value representing one character. To declare a variable to store a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required to hold a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators. When a variable is larger than required to hold a value being assigned, Verilog pads the contents on the left with zeros after the assignment. This is consistent with the padding that occurs during assignment of non-string values. Certain characters can be used in strings only when preceded by an introductory character called an escape character. The following table lists these characters in the right-hand column with the escape sequence that represents the character in the left-hand column. Modules

• Module are the building blocks of Verilog designs • You create design hierarchy by instantiating modules in other modules. • An instance of a module can be called in another, higher-level module.

Ports

• Ports allow communication between a module and its environment. • All but the top-level modules in a hierarchy have ports. • Ports can be associated by order or by name. You declare ports to be input, output or inout. The port declaration syntax is : input [range_val:range_var] list_of_identifiers; output [range_val:range_var] list_of_identifiers; inout [range_val:range_var] list_of_identifiers;

Schematic

1.2.5 Port Connection Rules

• Inputs : internally must always be type net, externally the inputs can be connected to variable reg or net type.

• Outputs : internally can be type net or reg, externally the outputs must be connected to a variable net type.

• Inouts : internally or externally must always be type net, can only be connected to a variable net type.

• Width matching: It is legal to connect internal and external ports of different sizes. But beware, synthesis tools could report problems.

• Unconnected ports : unconnected ports are allowed by using a "," • The net data types are used to connect structure • A net data type is required if a signal can be driven a structural connection.

Example – Implicit dff u0 ( q,,clk,d,rst,pre); // Here second port is not connected Example – Explicit dff u0 (.q (q_out), .q_bar (), .clk (clk_in), .d (d_in), .rst (rst_in), .pre (pre_in)); // Here second port is not connected 1.3 Gate Level Modeling In this level of abstraction the system modeling is done at the gate level ,i.e., the properties of the gates etc. to be used by the behavioral description of the system are defined. These definitions are known as primitives. Verilog has built in primitives for gates, transmission gates, switches, buffers etc.. These primitives are instantiated like modules except that they are predefined in verilog and do not need a module definition. Two basic types of gates are and/or gates & buf /not gates. 1.3.1 Gate Primitives And/Or Gates: These have one scalar output and multiple scalar inputs. The output of the gate is evaluated as soon as the input changes . wire OUT, IN1, IN2; // basic gate instantiations and a1(OUT, IN1, IN2); nand na1(OUT, IN1, IN2); or or1(OUT, IN1, IN2);

nor nor1(OUT, IN1, IN2); xor x1(OUT, IN1, IN2); xnor nx1(OUT, IN1, IN2); // more than two inputs; 3 input nand gate nand na1_3inp(OUT, IN1, IN2, IN3); // gate instantiation without instance name and (OUT, IN1, IN2); // legal gate instantiation Buf/Not Gates: These gates however have one scalar input and multiple scalar outputs \// basic gate instantiations for bufif bufif1 b1(out, in, ctrl); bufif0 b0(out, in, ctrl); // basic gate instantiations for notif notif1 n1(out, in, ctrl); notif0 n0(out, in, ctrl); Array of instantiations wire [7:0] OUT, IN1, IN2; // basic gate instantiations nand n_gate[7:0](OUT, IN1, IN2); Gate-level multiplexer A multiplexer serves a very efficient basic logic design element // module 4:1 multiplexer module mux4_to_1(out, i1, i2 , i3, s1, s0); // port declarations output out; input i1, i2, i3; input s1, s0; // internal wire declarations wire s1n, s0n; wire y0, y1, y2, y3 ; //gate instantiations // create s1n and s0n signals not (s1n, s1); not (s0n, s0); // 3-input and gates instantiated and (y0, i0, s1n, s0n); and (y1, i1, s1n, s0); and (y2, i2, s1, s0n); and (y3, i3, s1, s0); // 4- input gate instantiated or (out, y0, y1, y2, y3); endmodule

1.3.2 Gate and Switch delays In real circuits, logic gates haves delays associated with them. Verilog provides the mechanism to associate delays with gates.

• Rise, Fall and Turn-off delays. • Minimal, Typical, and Maximum delays

Rise Delay The rise delay is associated with a gate output transition to 1 from another value (0,x,z).

Fall Delay The fall delay is associated with a gate output transition to 0 from another value (1,x,z).

Turn-off Delay The Turn-off delay is associated with a gate output transition to z from another value (0,1,x). Min Value The min value is the minimum delay value that the gate is expected to have. Typ Value The typ value is the typical delay value that the gate is expected to have. Max Value The max value is the maximum delay value that the gate is expected to have. 1.4 Verilog Behavioral Modeling

1.4.1 Procedural Blocks Verilog behavioral code is inside procedures blocks, but there is an exception, some behavioral code also exist outside procedures blocks. We can see this in detail as we make progress. There are two types of procedural blocks in Verilog

• initial : initial blocks execute only once at time zero (start execution at time zero). • always : always blocks loop to execute over and over again, in other words as the name

means, it executes always.

Example – initial module initial_example(); reg clk,reset,enable,data; initial begin clk = 0; reset = 0; enable = 0; data = 0; end endmodule In the above example, the initial block execution and always block execution starts at time 0. Always blocks wait for the the event, here positive edge of clock, where as initial block without waiting just executes all the statements within begin and end statement. Example – always module always_example(); reg clk,reset,enable,q_in,data; always @ (posedge clk) if (reset) begin data <= 0; end else if (enable) begin data <= q_in; end endmodule In always block, when the trigger event occurs, the code inside begin and end is executed and then once again the always block waits for next posedge of clock. This process of waiting and executing on event is repeated till simulation stops. 1.4.2 Procedural Assignment Statements

• Procedural assignment statements assign values to reg , integer , real , or time variables and can not assign values to nets ( wire data types)

• You can assign to the register (reg data type) the value of a net (wire), constant, another register, or a specific value.

1.4.3 Procedural Assignment Groups If a procedure block contains more then one statement, those statements must be enclosed within Sequential begin - end block

• Parallel fork - join block Example - "begin-end" module initial_begin_end(); reg clk,reset,enable,data; initial begin

#1 clk = 0; #10 reset = 0; #5 enable = 0; #3 data = 0; end endmodule Begin : clk gets 0 after 1 time unit, reset gets 0 after 6 time units, enable after 11 time units, data after 13 units. All the statements are executed sequentially. Example - "fork-join" module initial_fork_join(); reg clk,reset,enable,data; initial fork #1 clk = 0; #10 reset = 0; #5 enable = 0; #3 data = 0; join endmodule

1.4.4 Sequential Statement Groups The begin - end keywords:

• Group several statements together. • Cause the statements to be evaluated sequentially (one at a time)

o Any timing within the sequential groups is relative to the previous statement. o Delays in the sequence accumulate (each delay is added to the previous delay) o Block finishes after the last statement in the block.

1.4.5 Parallel Statement Groups The fork - join keywords:

• Group several statements together. • Cause the statements to be evaluated in parallel ( all at the same time).

o Timing within parallel group is absolute to the beginning of the group. o Block finishes after the last statement completes( Statement with high delay, it

can be the first statement in the block). Example – Parallel module parallel(); reg a; initial fork #10 a = 0;’ #11 a = 1; #12 a = 0; #13 a = 1;

#14 a = $finish; join endmodule Example - Mixing "begin-end" and "fork - join" module fork_join(); reg clk,reset,enable,data; initial begin $display ( "Starting simulation" ); fork : FORK_VAL #1 clk = 0; #5 reset = 0; #5 enable = 0; #2 data = 0; join $display ( "Terminating simulation" ); #10 $finish; end endmodule 1.4.6 Blocking and Nonblocking assignment Blocking assignments are executed in the order they are coded, Hence they are sequential. Since they block the execution of the next statement, till the current statement is executed, they are called blocking assignments. Assignment are made with "=" symbol. Example a = b; Nonblocking assignments are executed in parallel. Since the execution of next statement is not blocked due to execution of current statement, they are called nonblocking statement. Assignment are made with "<=" symbol. Example a <= b; Example - blocking and nonblocking module blocking_nonblocking(); reg a, b, c, d ; // Blocking Assignment initial begin #10 a = 0;’ #11 a = 1; #12 a = 0; #13 a = 1; end initial begin #10 b <= 0; #11 b <=1; #12 b <=0; #13 b <=1; end initial begin c = #10 0; c = #11 1;

c = #12 0; c = #13 1; end initial begin d <= #10 0; d <= #11 1; d <= #12 0; d <= #13 1; end initial begin $monitor( " TIME = %t A = %b B = %b C = %b D = %b" ,$time, a, b, c, d ); #50 $finish(1); end endmodule 1.4.7 The Conditional Statement if-else The if - else statement controls the execution of other statements. In programming language like c, if - else controls the flow of program. When more than one statement needs to be executed for an if conditions, then we need to use begin and end as seen in earlier examples. Syntax: if if (condition) statements; Syntax: if-else if (condition) statements; else statements; 1.4.8 Syntax: nested if-else-if if (condition) statements; else if (condition) statements; ................ ................ else statements; Example- simple if module simple_if(); reg latch; wire enable,din; always @ (enable or din) if (enable) begin latch <= din; end endmodule Example- if-else module if_else();

reg dff; wire clk,din,reset; always @ (posedge clk) if (reset) begin dff <= 0; end else begin dff <= din; end endmodule Example- nested-if-else-if module nested_if(); reg [3:0] counter; wire clk,reset,enable, up_en, down_en; always @ (posedge clk) // If reset is asserted if (reset == 1'b0) begin counter <= 4'b0000; // If counter is enable and up count is mode end else if (enable == 1'b1 && up_en == 1'b1) begin counter <= counter + 1'b1; // If counter is enable and down count is mode end else if (enable == 1'b1 && down_en == 1'b1) begin counter <= counter - 1'b0; // If counting is disabled end else begin counter <= counter; // Redundant code end endmodule

Parallel if-else In the above example, the (enable == 1'b1 && up_en == 1'b1) is given highest pritority and condition (enable == 1'b1 && down_en == 1'b1) is given lowest priority. We normally don't include reset checking in priority as this does not fall in the combo logic input to the flip-flop as shown in figure below.

So when we need priority logic, we use nested if-else statements. On the other end if we don't want to implement priority logic, knowing that only one input is active at a time i.e. all inputs are mutually exclusive, then we can write the code as shown below. It is a known fact that priority implementation takes more logic to implement then parallel implementation. So if you know the inputs are mutually exclusive, then you can code the logic in parallel if. module parallel_if(); reg [3:0] counter; wire clk,reset,enable, up_en, down_en; always @ (posedge clk) // If reset is asserted if (reset == 1'b0) begin counter <= 4'b0000; end else begin // If counter is enable and up count is mode if (enable == 1'b1 && up_en == 1'b1) begin counter <= counter + 1'b1; end // If counter is enable and down count is mode if (enable == 1'b1 && down_en == 1'b1) begin counter <= counter - 1'b0; end end endmodule 1.4.9 The Case Statement The case statement compares an expression with a series of cases and executes the statement or statement group associated with the first matching case

• case statement supports single or multiple statements. • Group multiple statements using begin and end keywords. Syntax of a case statement look as shown below. case () < case1 > : < statement > < case2 > : < statement > default : < statement > endcase

1.4.10 Looping Statements Looping statements appear inside procedural blocks only. Verilog has four looping statements like any other programming language.

• forever • repeat • while • for

The forever statement The forever loop executes continually, the loop never ends. Normally we use forever statement in initial blocks. syntax : forever < statement >

Once should be very careful in using a forever statement, if no timing construct is present in the forever statement, simulation could hang. The repeat statement The repeat loop executes statement fixed < number > of times. syntax : repeat (< number >) (< statement >)

The while loop statement The while loop executes as long as an evaluates as true. This is same as in any other programming language. syntax: while (expression)<statement>

The for loop statement The for loop is same as the for loop used in any other programming language.

• Executes an < initial assignment > once at the start of the loop. • Executes the loop as long as an < expression > evaluates as true. • Executes a at the end of each pass through the loop

syntax : for (< initial assignment >; < expression >, < step assignment >) < statement >

Note : verilog does not have ++ operator as in the case of C language. 1.5 Switch level modeling 1.5.1 Verilog provides the ability to design at MOS-transistor level, however with increase in complexity of the circuits design at this level is growing tough. Verilog however only provides digital design capability and drive strengths associated to them. Analog capability is not into picture still. As a matter of fact transistors are only used as switches. MOS switches //MOS switch keywords nmos pmos Whereas the keyword nmos is used to model a NMOS transistor, pmos is used for PMOS transistors. Instantiation of NMOS and PMOS switches nmos n1(out, data, control); // instantiate a NMOS switch pmos p1(out, data, control); // instantiate a PMOS switch CMOS switches Instantiation of a CMOS switch.

cmos c1(out, data, ncontrol, pcontrol ); // instantiate a cmos switch The ‘ncontrol’ and ‘pcontrol’ signals are normally complements of each other Bidirectional switches These switches allow signal flow in both directions and are defined by keywords tran,tranif0 , and tranif1 Instantiation tran t1(inout1, inout2); // instance name t1 is optional tranif0(inout1, inout2, control); // instance name is not specified tranif1(inout1, inout2, control); // instance name t1 is not specified 1.5.2 Delay specification of switches pmos, nmos, rpmos, rnmos

• Zero(no delay) pmos p1(out,data, control); • One (same delay in all) pmos#(1) p1(out,data, control); • Two(rise, fall) nmos#(1,2) n1(out,data, control); • Three(rise, fall, turnoff)mos#(1,3,2) n1(out,data,control);

1.5.3 An Instance: Verilog code for a NOR- gate // define a nor gate, my_nor module my_nor(out, a, b); output out; input a, b; //internal wires wire c; // set up pwr n ground lines supply1 pwr;// power is connected to Vddsupply0 gnd; // connected to Vss // instantiate pmos switches pmos (c, pwr, b); pmos (out, c, a); //instantiate nmos switches nmos (out, gnd, a); Stimulus to test the NOR-gate // stimulus to test the gate

module stimulus; reg A, B; wire OUT; //instantiate the my_nor module my_nor n1(OUT, A, B); //Apply stimulus initial begin //test all possible combinations A=1’b0; B=1’b0; #5 A=1’b0; B=1’b1; #5 A=1’b1; B=1’b0; #5 A=1’b1; B=1’b1; end //check results initial $ monitor($time, “OUT = %b, B=%b, OUT, A, B); endmodule 1.6 Some Exercises 1.6.1 Gate level modelling i) A 2 inp xor gate can be build from my_and, my_or and my_not gates. Construct an xor module in verilog that realises the logic function z= xy'+x'y. Inputs are x, y and z is the output. Write a stimulus module that exercises all the four combinations of x and y ii) The logic diagram for an RS latch with delay is being shown.

Write the verilog description for the RS latch, including delays of 1 unit when instantiating the nor gates. Write the stimulus module for the RS latch using the following table and verify the outputs.

Set Reset Qn+10 0 qn 0 1 0 1 0 1 1 1 ?

iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below

The delay specification for gates b1 and b2 are as follows

Min Typ Max Rise 1 2 3 Fall 3 4 5 Turnoff 5 6 7

1.6.2. Behavioral modelling i) Using a while loop design a clk generator whose initial value is 0. time period of the clk is 10. ii) Using a forever statement, design a clk with time period=10 and duty cycle =40%. Initial value of clk is 0 iii) Using the repeat loop, delay the statement a=a+1 by 20 positive edges of clk. iv) Design a negative edge triggered D-FF with synchronous clear, active high (D-FF clears only at negative edge of clk when clear is high). Use behavioral statements only. (Hint: output q of D-FF must be declared as reg.) Design a clock with a period of 10units and test the D-FF v) Design a 4 to 1 multiplexer using if and else statements vi) Design an 8-bit counter by using a forever loop, named block, and disabling of named block. The counter starts counting at count =5 and finishes at count =67. The count is incremented at positive edge of clock. The clock has a time period of 10. The counter starts through the loop only once and then is disabled (hint: use the disable statement)

Lesson 22

Introduction to Hardware Description Languages - II


• Call a task and a function in a Verilog code and distinguish between them

• Plan and write test benches to a Verilog code such that it can be simulated to check the desired results and also test the source code

• Explain what are User Defined Primitives, classify them and use them in code 2.1 Task and Function 2.1.1 Task Tasks are used in all programming languages, generally known as procedures or sub- routines. Many lines of code are enclosed in -task....end task- brackets. Data is passed to the task, processing done, and the result returned to the main program. They have to be specifically called, with data in and out, rather than just wired in to the general netlist. Included in the main body of code, they can be called many times, reducing code repetition.

• Tasks are defined in the module in which they are used. it is possible to define a task in a separate file and use compile directive 'include to include the task in the file which instantiates the task.

• Tasks can include timing delays, like posedge, negedge, # delay and wait.

• Tasks can have any number of inputs and outputs.

• The variables declared within the task are local to that task. The order of declaration within the task defines how the variables passed to the task by the caller are used.

• Task can take, drive and source global variables, when no local variables are used. When local variables are used it assigns the output only at the end of task execution.

• One task can call another task or function.

• Task can be used for modeling both combinational and sequential logics.

• A task must be specifically called with a statement, it cannot be used within an expression as a function can.

Syntax

• task begins with the keyword task and ends with the keyword endtask

• Input and output are declared after the keyword task.

• Local variables are declared after input and output declaration. module simple_task(); task convert; input [7:0] temp_in;

output [7:0] temp_out; begin temp_out = (9/5) *( temp_in + 32) end endtask endmodule Example - Task using Global Variables module task_global (); reg[7:0] temp_in; reg [7:0] temp_out; task convert; always@(temp_in) begin temp_out = (9/5) *( temp_in + 32) end endtask endmodule Calling a task Lets assume that the task in example 1 is stored in a file called mytask.v. Advantage of coding the task in a separate file is that it can then be used in multiple modules. module task_calling (temp_a, temp_b, temp_c, temp_d); input [7:0] temp_a, temp_c; output [7:0] temp_b, temp_d; reg [7:0] temp_b, temp_d; `include "mytask.v" always @ (temp_a) Begin convert (temp_a, temp_b); End always @ (temp_c) Begin convert (temp_c, temp_d); End Endmodule Automatic (Re-entrant) Tasks Tasks are normally static in nature. All declared items are statically allocated and they are shared across all uses of the task executing concurrently. Therefore if a task is called simultaneously from two places in the code, these task calls will operate on the same task variables. it is highly likely that the result of such operation be incorrect. Thus, keyword automatic is added in front of the task keyword to make the tasks re-entrant. All items declared within the automatic task are allocated dynamically for each invocation. Each task call operates in an independent space.

Example // Module that contains an automatic re-entrant task //there are two clocks, clk2 runs at twice the frequency of clk and is synchronous with it. module top; reg[15:0] cd_xor, ef_xor; // variables in module top reg[15:0] c,d,e,f ; // variables in module top task automatic bitwise_xor output[15:0] ab_xor ; // outputs from the task input[15:0] a,b ; // inputs to the task begin #delay ab_and = a & b ab_or= a| b; ab_xor= a^ b; end endtask // these two always blocks will call the bitwise_xor task // concurrently at each positive edge of the clk, however since the task is re-entrant, the //concurrent calls will work efficiently always @(posedge clk) bitwise_xor(ef_xor, e ,f ); always @(posedge clk2)// twice the frequency as that of the previous clk bitwise_xor(cd_xor, c ,d ); endmodule 2.1.2 Function Function is very much similar to a task, with very little difference, e.g., a function cannot drive more then one output and, also, it can not contain delays.

• Functions are defined in the module in which they are used. It is possible to define function in separate file and use compile directive 'include to include the function in the file which instantiates the task.

• Function can not include timing delays, like posedge, negedge, # delay. This means that a function should be executed in "zero" time delay.

• Function can have any number of inputs but only one output.

• The variables declared within the function are local to that function. The order of declaration within the function defines how the variables are passed to it by the caller.

• Function can take, drive and source global variables when no local variables are used. When local variables are used, it basically assigns output only at the end of function execution.

• Function can be used for modeling combinational logic.

• Function can call other functions, but can not call a task. Syntax

• A function begins with the keyword function and ends with the keyword endfunction

• Inputs are declared after the keyword function. Example - Simple Function module simple_function(); function myfunction; input a, b, c, d; begin myfunction = ((a+b) + (c-d)); end endfunction endmodule Example - Calling a Function module function_calling(a, b, c, d, e, f); input a, b, c, d, e ; output f; wire f; `include "myfunction.v" assign f = (myfunction (a,b,c,d)) ? e :0; endmodule Automatic (Recursive) Function Functions used normally are non recursive. But to eliminate problems when the same function is called concurrently from two locations automatic function is used. Example // define a factorial with recursive function module top; // define the function function automatic integer factorial: input[31:0] oper; integer i: begin if (operan>=2) factorial= factorial(oper -1)* oper:// recursive call else factorial=1; end endfunction // call the function integer result; initial begin result=factorial(4); // call the factorial of 7 $ display (“Factorial of 4 is %0d”, result) ; // Displays 24 end endmodule

Constant function A constant function is a regular verilog function and is used to reference complex values, can be used instead of constants. Signed function These functions allow the use of signed operation on function return values. module top; // signed function declaration // returns a 64 bit signed value function signed [63:0] compute _signed (input [63:0] vector); -- -- endfunction // call to the signed function from a higher module if ( compute_signed(vector)<-3) begin -- end -- endmodule 2.1.3 System tasks and functions Introduction There are tasks and functions that are used to generate inputs and check the output during simulation. Their names begin with a dollar sign ($). The synthesis tools parse and ignore system functions, and, hence, they can be included even in synthesizable models. $display, $strobe, $monitor These commands have the same syntax, and display text on the screen during simulation. They are much less convenient than waveform display tools like GTKWave. or Undertow. $display and $strobe display once every time they are executed, whereas $monitor displays every time one of its parameters changes. The difference between $display and $strobe is that $strobe displays the parameters at the very end of the current simulation time unit rather than exactly where a change in it took place. The format string is like that in C/C++, and may contain format characters. Format characters include %d (decimal), %h (hexadecimal), %b (binary), %c (character), %s (string) and %t (time), %m (hierarchy level). %5d, %5b. b, h, o can be appended to the task names to change the default format to binary, octal or hexadecimal. Syntax

• $display ("format_string", par_1, par_2, ... );

• $strobe ("format_string", par_1, par_2, ... );

• $monitor ("format_string", par_1, par_2, ... );

• $displayb ( as above but defaults to binary..);

• $strobeh (as above but defaults to hex..);

• $monitoro (as above but defaults to octal..); $time, $stime, $realtime These return the current simulation time as a 64-bit integer, a 32-bit integer, and a real number, respectively. $reset, $stop, $finish $reset resets the simulation back to time 0; $stop halts the simulator and puts it in the interactive mode where the user can enter commands; $finish exits the simulator back to the operating system. $scope, $showscope $scope(hierarchy_name) sets the current hierarchical scope to hierarchy_name. $showscopes(n) lists all modules, tasks and block names in (and below, if n is set to 1) the current scope. $random $random generates a random integer every time it is called. If the sequence is to be repeatable, the first time one invokes random give it a numerical argument (a seed). Otherwise, the seed is derived from the computer clock. $dumpfile, $dumpvar, $dumpon, $dumpoff, $dumpall These can dump variable changes to a simulation viewer like Debussy. The dump files are capable of dumping all the variables in a simulation. This is convenient for debugging, but can be very slow. Syntax

• $dumpfile("filename.dmp")

• $dumpvar dumps all variables in the design.

• $dumpvar(1, top) dumps all the variables in module top and below, but not modules instantiated in top.

• $dumpvar(2, top) dumps all the variables in module top and 1 level below.

• $dumpvar(n, top) dumps all the variables in module top and n-1 levels below.

• $dumpvar(0, top) dumps all the variables in module top and all level below.

• $dumpon initiates the dump.

• $dumpoff stop dumping. $fopen, $fdisplay, $fstrobe $fmonitor and $fwrite These commands write more selectively to files.

• $fopen opens an output file and gives the open file a handle for use by the other commands.

• $fclose closes the file and lets other programs access it.

• $fdisplay and $fwrite write formatted data to a file whenever they are executed. They are the same except $fdisplay inserts a new line after every execution and $write does not.

• $strobe also writes to a file when executed, but it waits until all other operations in the time step are complete before writing. Thus initial #1 a=1; b=0;

• $fstrobe(hand1, a,b); b=1; will write write 1 1 for a and b. $monitor writes to a file whenever any one of its arguments changes.

Syntax

• handle1=$fopen("filenam1.suffix")

• handle2=$fopen("filenam2.suffix")

• $fstrobe(handle1, format, variable list) //strobe data into filenam1.suffix

• $fdisplay(handle2, format, variable list) //write data into filenam2.suffix

• $fwrite(handle2, format, variable list) //write data into filenam2.suffix all on one line. //put in the format string where a new line is // desired.

2.2 Writing Testbenches 2.2.1 Testbenches are codes written in HDL to test the design blocks. A testbench is also known as stimulus, because the coding is such that a stimulus is applied to the designed block and its functionality is tested by checking the results. For writing a testbench it is important to have the design specifications of the "design under test" (DUT). Specifications need to be understood clearly and test plan made accordingly. The test plan, basically, documents the test bench architecture and the test scenarios (test cases) in detail. Example – Counter Consider a simple 4-bit up counter, which increments its count when ever enable is high and resets to zero, when reset is asserted high. Reset is synchronous with clock.

Code for Counter // Function : 4 bit up counter module counter (clk, reset, enable, count); input clk, reset, enable; output [3:0] count; reg [3:0] count; always @ (posedge clk) if (reset == 1'b1) begin count <= 0; end else if ( enable == 1'b1) begin count <= count + 1; end endmodule 2.2.2 Test Plan We will write self checking test bench, but we will do this in steps to help you understand the concept of writing automated test benches. Our testbench environment will look something like shown in the figure.

DUT is instantiated in testbench which contains a clock generator, reset generator, enable logic generator, compare logic. The compare logic calculates the expected count value of the counter and compares its output with the calculated value 2.2.3 Test Cases

• Reset Test : We can start with reset deasserted, followed by asserting reset for few clock ticks and deasserting the reset, See if counter sets its output to zero.

• Enable Test : Assert/deassert enable after reset is applied.

• Random Assert/deassert of enable and reset. 2.2.4 Creating testbenches There are two ways of defining a testbench.

The first way is to simply instantiate the design block(DUT) and write the code such that it directly drives the signals in the design block. In this case the stimulus block itself is the top-level block. In the second style a dummy module acts as the top-level module and both the design(DUT) and the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to DUT are defined as reg and outputs from DUT are defined as wire. An important point is that there is no port list for the test bench. An example of the stimulus block is given below. Note that the initial block below is used to set the various inputs of the DUT to a predefined logic state. Test Bench with Clock generator module counter_tb; reg clk, reset, enable; wire [3:0] count; counter U0 ( .clk (clk), .reset (reset), .enable (enable), .count (count) initial begin clk = 0; reset = 0; enable = 0; end always #5 clk = !clk; endmodule Initial block in verilog is executed only once. Thus, the simulator sets the value of clk, reset and enable to 0(0 makes all this signals disabled). It is a good design practice to keep file names same as the module name. Another elaborated instance of the testbench is shown below. In this instance the usage of system tasks has been explored. module counter_tb; reg clk, reset, enable; wire [3:0] count; counter U0 ( .clk (clk), .reset (reset), .enable (enable), .count (count) initial begin clk = 0;

reset = 0; enable = 0; end always #5 clk = !clk; initial begin $dumpfile ( "counter.vcd" ); $dumpvars; end initial begin $display( "\t\ttime,\tclk,\treset,\tenable,\tcount" ); $monitor( "%d,\t%b,\t%b,\t%b,\t%d" ,$time, clk,reset,enable,count); end initial #100 $finish; //Rest of testbench code after this line Endmodule $dumpfile is used for specifying the file that simulator will use to store the waveform, that can be used later to view using a waveform viewer. (Please refer to tools section for freeware version of viewers.) $dumpvars basically instructs the Verilog compiler to start dumping all the signals to "counter.vcd". $display is used for printing text or variables to stdout (screen), \t is for inserting tab. Syntax is same as printf. Second line $monitor is bit different, $monitor keeps track of changes to the variables that are in the list (clk, reset, enable, count). When ever anyone of them changes, it prints their value, in the respective radix specified. $finish is used for terminating simulation after #100 time units (note, all the initial, always blocks start execution at time 0) Adding the Reset Logic Once we have the basic logic to allow us to see what our testbench is doing, we can next add the reset logic, If we look at the testcases, we see that we had added a constraint that it should be possible to activate reset anytime during simulation. To achieve this we have many approaches, but the following one works quite well. There is something called 'events' in Verilog, events can be triggered, and also monitored to see, if a event has occurred. Lets code our reset logic in such a way that it waits for the trigger event "reset_trigger" to happen. When this event happens, reset logic asserts reset at negative edge of clock and de-asserts on next negative edge as shown in code below. Also after de-asserting the reset, reset logic triggers another event called "reset_done_trigger". This trigger event can then be used at some where else in test bench to sync up. Code for the reset logic event reset_trigger;

event reset_done_trigger; initial begin forever begin @ (reset_trigger); @ (negedge clk); reset = 1; @ (negedge clk); reset = 0;

reset_done_trigger; end end Adding test case logic Moving forward, let’s add logic to generate the test cases, ok we have three testcases as in the first part of this tutorial. Let’s list them again.

• Reset Test : We can start with reset deasserted, followed by asserting reset for few clock ticks and deasserting the reset, See if counter sets its output to zero.

• Enable Test: Assert/deassert enable after reset is applied. Random Assert/deassert of enable and reset. Adding compare Logic To make any testbench self checking/automated, a model that mimics the DUT in functionality needs to be designed.For the counter defined previously the model looks similar to: Reg [3:0] count_compare; always @ (posedge clk) if (reset == 1'b1) count_compare <= 0; else if ( enable == 1'b1) count_compare <= count_compare + 1; Once the logic to mimic the DUT functionality has been defined, the next step is to add the checker logic. The checker logic at any given point keeps checking the expected value with the actual value. Whenever there is an error, it prints out the expected and the actual values, and, also, terminates the simulation by triggering the event “terminate_sim”. This can be appended to the code above as follows: always @ (posedge clk) if (count_compare != count) begin $display ( "DUT Error at time %d" , $time); $display ( " Expected value %d, Got Value %d" , count_compare, count); #5 -> terminate_sim; end

2.3 User Defined Primitives

2.3.1 Verilog comes with built in primitives like gates, transmission gates, and switches. This set sometimes seems to be rather small and a more complex primitive set needs to be constructed. Verilog provides the facility to design these primitives which are known as UDPs or User Defined Primitives. UDPs can model:

• Combinational Logic • Sequential Logic

One can include timing information along with the UDPs to model complete ASIC library models. Syntax UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be defined outside the main module definition. This code shows how input/output ports and primitve is declared. primitive udp_syntax ( a, // Port a b, // Port b c, // Port c d // Port d ) output a; input b,c,d; // UDP function code here endprimitive Note:

• A UDP can contain only one output and up to 10 inputs max. • Output Port should be the first port followed by one or more input ports. • All UDP ports are scalar, i.e. Vector ports are not allowed. • UDP's can not have bidirectional ports.

Body Functionality of primitive (both combinational and sequential) is described inside a table, and it ends with reserve word endtable (as shown in the code below). For sequential UDPs, one can use initial to assign initial value to output. // This code shows how UDP body looks like primitive udp_body ( a, // Port a b, // Port b c // Port c ); input b,c;

// UDP function code here // A = B | C; table // B C : A ? 1 : 1; 1 ? : 1; 0 0 : 0; endtable endprimitive Note: A UDP cannot use 'z' in input table and instead it uses ‘x’. 2.3.2 Combinational UDPs In combinational UDPs, the output is determined as a function of the current input. Whenever an input changes value, the UDP is evaluated and one of the state table rows is matched. The output state is set to the value indicated by that row. Let us consider the previously mentioned UDP. TestBench to Check the above UDP include "udp_body.v" module udp_body_tb(); reg b,c; wire a; udp_body udp (a,b,c); initial begin $monitor( " B = %b C = %b A = %b" ,b,c,a); b = 0; c=0; #1 b = 1; #1 c = 1; #1 b = 1'bx; #1 c = 0; #1 b = 1; #1 c = 1'bx; #1 b = 0; #10 $finish; end endmodule Sequential UDPs Sequential UDP’s differ in the following manner from the combinational UDP’s

• The output of a sequential UDP is always defined as a reg

• An initial statement can be used to initialize output of sequential UDP’s

• The format of a state table entry is somewhat different

• There are 3 sections in a state table entry: inputs, current state and next state. The three states are separated by a colon(:) symbol.

• The input specification of state table can be in term of input levels or edge transitions

• The current state is the current value of the output register.

• The next state is computed based on inputs and the current state. The next state becomes the new value of the output register.

• All possible combinations of inputs must be specified to avoid unknown output.

Level sensitive UDP’s // define level sensitive latch by using UDP primitive latch (q, d, clock, clear) //declarations output q; reg q; // q declared as reg to create internal storage input d, clock, clear; // sequential UDP initialization // only one initial statement allowed initial q=0; // initialize output to value 0 // state table table // d clock clear : q : q+ ; ? ? 1 : ? : 0 ;// clear condition // q+ is the new output value 1 1 0 : ? : 1 ;// latch q = data = 1 0 1 0 : ? : 0 ;// latch q = data = 0 ? 0 0 : ? : - ;// retain original state if clock = 0 endtable

endprimitive

Edgesensitive UDP’s //Define edge sensitive sequential UDP; primitive edge_dff(output reg q = 0 input d, clock, clear); // state table table // d clock clear : q : q+ ; ? ? 1 : ? : 0 ; // output=0 if clear =1

? ? (10): ? : - ; // ignore negative transition of clear 1 (10) 0 : ? : 1 ;// latch data on negative transition 0 (10) 0 : ? : 0 ;// clock ? (1x) 0 : ? : - ;// hold q if clock transitions to unknown state ? (0?) 0 : ? : - ;// ignore positive transitions of clock ? (x1) 0 : ? : - ;// ignore positive transitions of clock (??) ? 0 : ? : - ;// ignore any change in d if clock is steady endtable

endprimitive

Some Exercises 1. Task and functions

i. Define a function to multiply 2 four bit number. The output is a 32 bit value. Invoke the function by using stimulus and check results

ii. define a function to design an 8-function ALU that takes 2 bit numbers a and computes a 5 bit result out based on 3 bit select signal . Ignore overflow or underflow bits.

iii. Define a task to compute even parity of a 16 bit number. The result is a 1-bit value that is assigned to the output after 3 positive edges of clock. (Hint: use a repeat loop in the task)

iv. Create a design a using a full adder. Use a conditional compilation (idef). Compile the fulladd4 with def parameter statements in the text macro DPARAM is defined by the 'define 'statement; otherwise compile the full adder with module instance parameter values.

v. Consider a full bit adder. Write a stimulus file to do random testing of the full adder. Use a random number to generate a 32 bit random number. Pick bits 3:0 and apply them to input a; pick bits 7:4 and apply them to input b. use bit 8 and apply it to c_in. apply 20 random test vectors and see the output.

2. Timing i) a. Consider the negative edge triggered with the asynchronous reset D-FF shown below. Write the verilog description for the module D-FF. describe path delays using parallel connection.

b Modify the above if all the path delays are 5. ii) Assume that a six delay specification is to be specified for all the path delays. All path delays are equal. In the specify block define parameters t_01=4, t_10=5, t_0z=7,t_z1=2, t_z0=8. Using the previous DFF write the six delay specifications for all the paths.

3. UDP

i. Define a positive edge triggered d-f/f with clear as a UDP. Signal clear is active low. ii. Define a level sensitive latch with a preset signal. Inputs are d, clock, and preset. Output

is q. If clock=0, then q=d. If clock=1or x then q is unchanged. If preset=1, then q=1. If preset=0 then q is decided by clock and d signals. If preset=x then q=x.

iii. Define a negative edge triggered JK FF, jk_ff with asynchronous preset and clear as a UDP. Q=1when preset=1 and q=0 when clear=1

T he table for JK FF is as follows

J K qn+1 0 0 qn 0 1 0 1 0 1 1 1 qn’

Lesson 23

Introduction to Hardware Description Languages-III


• Interface Verilog code to C & C++ using Programming Language Interface • Synthesize a Verilog code and generate a netlist for layout • Verify the generated code, and carry out optimization and debugging • Classify various types of flows in Verification

3.1 Programming Language interface 3.1.1 Verilog

PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog code.

The function invoked in Verilog code is called a system call. Examples of built-in system calls are $display, $stop, $random. PLI allows the user to create custom system calls, something that Verilog syntax does not allow to do. Some of these are:-

• Power analysis.

• Code coverage tools.

• Can modify the Verilog simulation data structure - more accurate delays.

• Custom output displays.

• Co-simulation.

• Designs debug utilities.

• Simulation analysis.

• C-model interface to accelerate simulation.

• Testbench modeling. To achieve the above few application of PLI, C code should have the access to the internal data structure of the Verilog simulator. To facilitate this Verilog PLI provides with something called acc routines or access routines How it Works?

• Write the functions in C/C++ code.

• Compile them to generate shared lib (*.DLL in Windows and *.so in UNIX). Simulator like VCS allows static linking.

• Use this Functions in Verilog code (Mostly Verilog Testbench).

• Based on simulator, pass the C/C++ function details to simulator during compile process of Verilog Code (This is called linking, and you need to refer to simulator user guide to understand how this is done).

• Once linked just run the simulator like any other Verilog simulation. The block diagram representing above is as follows:

During execution of the Verilog code by the simulator, whenever the simulator encounters the user defined system tasks (the one which starts with $), the execution control is passed to PLI routine (C/C++ function). Example - Hello World Define a function hello ( ), which when called will print "Hello World". This example does not use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the simulator manuals must be referred. Each simulator implements its own strategy for linking with the C/C++ functions. C Code #include < stdio.h > Void hello () { printf ( "\nHello World\n" ); Verilog Code module hello_pli (); initial begin $hello; #10 $finish; end endmodule

3.1.2 Running a Simulation Once linking is done, simulation is run as a normal simulation with slight modification to the command line options. These modifications tell the simulator that the PLI routines are being used (e.g. Modelsim needs to know which shared objects to load in command line). Writing PLI Application (counter example) Write the DUT reference model and Checker in C and link that to the Verilog Testbench. The requirements for writing a C model using PLI

• Means of calling the C model, when ever there is change in input signals (Could be wire or reg or types).

• Means to get the value of the changes signals in Verilog code or any other signals in Verilog code from inside the C code.

• Means to drive the value on any signal inside the Verilog code from C code.

There are set of routines (functions), that Verilog PLI provides which satisfy the above requirements 3.1.3 PLI Application Specification This can be well understood in context to the above counter logic. The objective is to design the PLI function $counter_monitor and check the response of the designed counter using it. This problem can be addressed to in the following steps:

• Implement the Counter logic in C. • Implement the Checker logic in C. • Terminate the simulation, whenever the checker fails. This is represented in the block diagram in the figure 23.2.

Calling the C function The change in clock signal is monitored and with its change the counter function is executed The acc_vcl_add routine is used. The syntax can be obtained in the Verilog PLI LRM.

acc_vcl_add routine basically monitors the list of signals and whenever any of the monitor signals change, it calls the user defined function (this function is called the Consumer C routine). The vcl routine has four arguments.

• Handle to the monitored object • Consumer C routine to call when the object value changes • String to be passed to consumer C routine • Predefined VCL flags: vcl_verilog_logic for logic monitoring vcl_verilog_strength for

strength monitoring

acc_vcl_add (net, display_net, netname, vcl_verilog_logic);

C Code – Basic The desired C function is Counter_monitor , which is called from the Verilog Testbench. As like any other C code, header files specific to the application are included.Here the include e file comprises of the acc routines.

The access routine acc_initialize initializes the environment for access routines and must be called from the C-language application program before the program invokes any other access routines. Before exiting a C-language application program that calls access routines, it is necessary to exit the access routine environment by calling acc_close at the end of the program.

#include < stdio.h > #include "acc_user.h" typedef char * string; handle clk ; handle reset ; handle enable ; handle dut_count ; int count ; void counter_monitor() { acc_initialize(); clk = acc_handle_tfarg(1); reset = acc_handle_tfarg(2); enable = acc_handle_tfarg(3); dut_count = acc_handle_tfarg(4); acc_vcl_add(clk,counter,null,vcl_verilog_logic); acc_close(); } void counter () printf( "Clock changed state\n" ); Handles are used for accessing the Verilog objects. The handle is a predefined data type that is a pointer to a specific object in the design hierarchy. Each handle conveys information to access routines about a unique instance of an accessible object information about the object type and, also, how and where the data pertaining to it can be obtained. The information of specific object

to handle can be passed from the Verilog code as a parameter to the function $counter_monitor. This parameters can be accessed through the C-program with acc_handle_tfarg( ) routine.

For instance clk = acc_handle_tfarg(1) basically makes that the clk is a handle to the first parameter passed. Similarly, all the other handles are assigned clk can now be added to the signal list that needs to be monitored using the routine acc_vcl_add(clk, counter ,null , vcl_verilog_logic). Here clk is the handle, counter is the user function to execute, when the clk changes.

Verilog Code

Below is the code of a simple testbench for the counter example. If the object being passed is an instance, then it should be passed inside double quotes. Since here all the objects are nets or wires, there is no need to pass them inside the double quotes.

module counter_tb(); reg enable;; reg reset; reg clk_reg; wire clk; wire [3:0] count; initial begin clk = 0; reset = 0; $display( "Asserting reset" ); #10 reset = 1; #10 reset = 0; $display ( "Asserting Enable" ); #10 enable = 1; #20 enable = 0; $display ( "Terminating Simulator" ); #10 $finish; End Always #5 clk_reg = !clk_reg; assign clk = clk_reg; initial begin $counter_monitor(top.clk,top.reset,top.enable,top.count); end counter U( clk (clk), reset (reset), enable (enable), count (count) ); endmodule

Access Routines Access routines are C programming language routines that provide procedural access to information within Verilog. Access routines perform one of two operations:

• Extract information pertaining to an object from the internal data representation. • Write information pertaining to an object into the internal data representation.

Program Flow using access routines include < acc_user.h > void pli_func() { acc_initialize(); // Main body: Insert the user application code here acc_close();

• acc_user.h : all data-structure related to access routines • acc_initialize( ) : initialize variables and set up environment • main body : User-defined application • acc_close( ) : Undo the actions taken by the function acc_initialize( )

Utility Routines Interaction between the Verilog tool and the user’s routines is handled by a set of programs that are supplied with the Verilog toolset. Library functions defined in PLI1.0 perform a wide variety of operations on the parameters passed to the system call and are used to do simulation synchronization or implementing conditional program breakpoint. 3.2 Verilog and Synthesis 3.2.1 What is logic synthesis? Logic synthesis is the process of converting a high-level description of design into an optimized gate-level netlist representation. Logic synthesis uses standard cell libraries which consist of simple cells, such as basic logic gates like and, or, and nor, or macro cells, such as adder, muxes, memory, and flip-flops. Standard cells put together form the technology library. Normally, technology library is known by the minimum feature size (0.18u, 90nm). A circuit description is written in Hardware description language (HDL) such as Verilog Design constraints such as timing, area, testability, and power are considered during synthesis. Typical design flow with a large example is given in the last example of this lesson.

3.2.2 Impact of automation on Logic synthesis For large designs, manual conversions of the behavioral description to the gate-level representation are more prone to error. Prior to the development of modern sophisticated synthesis tools the earlier designers could never be sure that whether after fabrication the design constraints will be met. Moreover, a significant time of the design cycle was consumed in converting the high–level design into its gate level representation. On account of these, if the gate level design did not meet the requirements then the turnaround time for redesigning the blocks was also very high. Each designer implemented design blocks and there was very little consistency in design cycles, hence, although the individual blocks were optimized but the overall design still contained redundant logics. Moreover, timing, area and power dissipation was fabrication process specific and, hence, with the change of processes the entire process needed to be changed with the design methodology. However, now automated logic synthesis has solved these problems. The high level design is less prone to human error because designs are described at higher levels of abstraction. High level design is done without much concentration on the constraints. The tool takes care of all the constraints and sees to it that the constraints are taken care of. The designer can go back, redesign and synthesize once again very easily if some aspect is found unaddressed. The turnaround time has also fallen down considerably. Automated logic synthesis tools synthesize the design as a whole and, thus, an overall design optimization is achieved. Logic synthesis allows a technology independent design. The tools convert the design into gates using cells from the standard cell library provided by the vendor. Design reuse is possible for technology independent designs. If the technology changes the tool is capable of mapping accordingly.

Constructs Not Supported in Synthesis Construct Type Notes

Initial Only in testbenches event Events make more sense for syncing test bench components real Real data type not supported

time Time data type not supported

force and release force and release of data types not supported assign and deassign assign and deassign of reg data types is not supported, but,

assign on wire data type is supported Example of a Non-Synthesizable Verilog construct Codes containing one or more of the above constructs are not synthesizable. But even with synthesizable constructs, bad coding may cause serious synthesis concerns. Example - Initial Statement module synthesis_initial( clk,q,d); input clk,d; output q; reg q; initial begin q <= 0; end always @ (posedge clk) begin q <= d; end endmodule Delays are also non-synthesizable e.g. a = #10 b; This code is useful only for simulation purpose. Synthesis tool normally ignores such constructs, and just assumes that there is no #10 in above statement, treating the above code as just a = b. 3.2.3 Constructs and Their Description

Construct Type Keyword Description ports input, inout, output Use inout only at IO level.

parameters parameter This makes design more generic

module definition module

signals and variables wire, reg, tri Vectors are allowed

instantiation module instances primitive gate instances

Eg- nand (out,a,b) bad idea to code RTL this way.

function and tasks function , task Timing constructs ignored

procedural always, if, then, else, case, casex, casez initial is not supported

procedural blocks begin, end, named blocks, disable Disabling of named blocks allowed

data flow assign Delay information is ignored

named Blocks disable Disabling of named block supported.

loops for, while, forever While and forever loops must contain @(posedge clk) or @(negedge clk)

3.2.4 Operators and Their Description

Operator Type Operator Symbol DESCRIPTION Arithmetic * Multiply

/ Division + Add - Subtract % Modulus + Unary plus - Unary minus

Logical ! Logical negation && Logical and || Logical or

Relational > Greater than < Less than >= Greater than or equal <= Less than or equal

Equality == Equality != inequality

Reduction & Bitwise negation ~& nand | or ~| nor ^ xor ^~ ~^ xnor

Shift >> Right shift

<< Left shift Concatenation { } Concatenation

Conditional ? conditional Constructs Supported In Synthesis

Construct Type Keyword Description ports input, inout, output Use inout only at IO level.

parameters parameter This makes design more generic

module definition module

signals and variables wire, reg, tri Vectors are allowed

instantiation module instances primitive gate instances

Eg- nand (out,a,b) bad idea to code RTL this way.

function and tasks function , task Timing constructs ignored

procedural always, if, then, else, case, casex, casez initial is not supported

procedural blocks begin, end, named blocks, disable Disabling of named blocks allowed

data flow assign Delay information is ignored

named Blocks disable Disabling of named block supported.

loops for, while, forever While and forever loops must contain @(posedge clk) or @(negedge clk)

3.2.5 Overall Logic Circuit Modeling and Synthesis in brief Combinational Circuit modeling using assign RTL description – This comprises the high level description of the circuit incorporating the RTL constructs. Some functional verification is also done at this level to ensure the validity of the RTL description. RTL for magnitude comparator // module magnitude comparator module magnitude_comparator(A_gt_B, A_lt_B, A_eq_B, A,_B); //comparison output; output A_gt_B, A_lt_B, A_eq_B ; // 4- bit numbers input input [3:0] A,B; assign A_gt_B= (A>B) ; // A greater than B assign A_lt_B= (A<B) ; // A greater than B

assign A_eq_B= (A==B) ; // A greater than B endmodule Translation The RTL description is converted by the logic synthesis tool to an optimized, intermediate, internal representation. It understands the basic primitives and operators in the Verilog RTL description but overlooks any of the constraints. Logic optimization The logic is optimized to remove the redundant logic. It generates the optimized internal representation. Technology library The technology library contains standard library cells which are used during synthesis to replace the behavioral description by the actual circuit components. These are the basic building blocks. Physical layout of these, are done first and then area is estimated. Finally, modeling techniques are used to estimate the power and timing characteristics. The library includes the following:

• Functionality of the cells

• Area of the different cell layout

• Timing information about the various cells

• Power information of various cells The synthesis tools use these cells to implement the design. // Library cells for abc_100 technology VNAND// 2 – input nand gate VAND// 2 – input and gate VNOR // 2 – input nor gate VOR// 2 – input or gate VNOT// not gate VBUF// buffer Design constraints Any circuit must satisfy at least three constraints viz. area, power and timing. Optimization demands a compromise among each of these three constraints. Apart from these operating conditions-temperature etc. also contribute to synthesis complexity. Logic synthesis The logic synthesis tool takes in the RTL design, and generates an optimized gate level description with the help of technology library, keeping in pace with design constraints.

Verification of the gate –level netlist An optimized gate level netlist must always be checked for its functionality and, in addition, the synthesis tool must always serve to meet the timing specifications. Timing verification is done in order to manipulate the synthesis parameters in such a way that different timing constraints like input delay, output delay etc. are suitably met. Functional verification Identical stimulus is run with the original RTL and synthesized gate-level description of the design. The output is compared for matches. module stimulus reg [3:0] A, B; wire A_GT_B, A_LT_B, A_EQ_B; // instantiate the magnitude comparator MC (A_GT_B, A_LT_B, A_EQ_B,. A, B); initial $ monitor ($time, “A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b”, A_GT_B, A_LT_B, A_EQ_B, A, B) // stimulate the magnitude comparator endmodule

3.3 Verification 3.3.1 Traditional verification flow

Traditional verification follows the following steps in general.

1. To verify, first a design specification must be set. This requires analysis of architectural trade-offs and is usually done by simulating various architectural models of the design.

2. Based on this specification a functional test plan is created. This forms the framework for verification. Based on this plan various test vectors are applied to the DUT (design under test), written in verilog. Functional test environments are needed to apply these test vectors.

3. The DUT is then simulated using traditional software simulators. 4. The output is then analyzed and checked against the expected results. This can be done

manually using waveform viewers and debugging tools or else can be done automatically by verification tools. If the output matches expected results then verification is complete.

5. Optionally, additional steps can be taken to decrease the risk of future design respin. These include Hardware Acceleration, Hardware Emulation and assertion based Verification.

Functional verification When the specifications for a design are ready, a functional test plan is created based on them. This is the fundamental framework of the functional verification. Based on this test plan, test vectors are selected and given as input to the design_under_test(DUT). The DUT is simulated to compare its output with the desired results. If the observed results match the expected values, the verification part is over. Functional verification Environment The verification part can be divided into three substages :

• Block level verification: verification is done for blocks of code written in verilog using a number of test cases.

• Full chip verification: The goal of full chip verification, i.e, all the feature of the full chip described in the test plan is complete.

• Extended verification: This stage depicts the corner state bugs. 3.3.2 Formal Verification A formal verification tool proves a design by manipulating it as much as possible. All input changes must, however, conform to the constraints for behaviour validation. Assertions on interfaces act as constraints to the formal tool. Assertions are made to prove the assertions in the RTL code false. However, if the constraints are too tight then the tool will not explore all possible behaviours and may wrongly report the design as faulty. Both the formal and the semi-formal methodologies have come into precedence with the increasing complexity of design.

3.3.3 Semi- formal verification

Semi formal verification combines the traditional verification flow using test vectors with the power and thoroughness of formal verification.

• Semi-formal methods supplement simulation with test vectors

• Embedded assertion checks define the properties targeted by formal methods

• Embedded assertion checks defines the input constraints

• Semi-formal methods explore limited space exhaustibility from the states reached by simulation, thus, maximizing the effect of simulation.The exploration is limited to a certain point around the state reached by simulation.

3.3.4 Equivalence checking After logic synthesis and place and route tools create a gate level netlist and physical implementations of the RTL design, respectively, it is necessary to check whether these functionalities match the original RTL design. Here comes equivalence checking. It is an application of formal verification. It ensures that the gate level or physical netlist has the same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and gate level representations is constructed. It is mathematically proved that their functionality are same.

3.4 Some Exercises 3.4.1 PLI i) Write a user defined system task, $count_and_gates, which counts the number of and gate primitive in a module instance. Hierarchical module instance name is the input to the task. Use this task to count the number of and gates in a 4-to-1 multiplexer. 3.4.2 Verilog and Synthesis i) A 1-bit full subtractor has three inputs x, y, z(previous borrow) and two outputs D(difference) and B(borrow). The logic equations for D & B are as follows D=x’y’z+ x’yz’+ xy’z’ + xyz B= x’y + x’z+ yz Write the verilog RTL description for the full subtractor. Synthesize the full using any technology library available. Apply identical stimulus to the RTL and gate level netlist and compare the outputs. ii) Design a 3-8 decoder, using a Verilog RTL description. A 3-bit input a[2:0] is provided to the decoder. The output of the decoder is out[7:0]. The output bit indexed by a[2:0] gets the value 1, the other bits are 0. Synthesize the decoder, using any technology library available to you. Optimize for smallest area. Apply identical stimulus to the RTL and gate level netlist and compare the outputs. iii) Write the verilog RTL description for a 4-bit binary counter with synchronous reset that is active high.(hint: use always loop with the @ (posedge clock)statement.) synthesize the counter using any technology library available to you. Optimize for smallest area. Apply identical stimulus to the RTL and gate level netlist and compare the outputs.

Documents

Design of Embedded Processors - WBUTHELP.COMinput buffers provide both the original and the inverted values of each PLA input. The input lines run horizontally into the AND matrix,