Digital System Design with PLDs and FPGAs

Field Programmable Gate Arrays

Kuruvilla Varghese

DESE

Indian Institute of Science

Topics

• FPGA Architecture (Xilinx, Altera, Actel)

• FPGA related Design issues

• FPGA related Timing issues

• Tool Flow

• FPGA Configuration

• SoPC

• Debugging

• Case Study

Field Programmable Gate Arrays

• ASIC, MPGA/Standard Cell, FPGA

• Volumes, NRE cost, Turn around time

• Array of logic resources with programmable interconnection.

• Logic resources (Combinational, Flip flops)

• Combinational: LUT, Multiplexers, Gates

• Programmable interconnections: SRAM, Flash, Anti-fuse

• Special Resources: PLL/DLL, RAMs, FIFOs, Memory Controllers, Network Interfaces, Processors

Commercial FPGAs

• Xilinx

– Spartan-3, Spartan-6

– Virtex-4, Virtex-5, Virtex-6

– Artix-7, Kintex-7, Virtex-7, Zynq

• Altera

– Cyclone, Cyclone II, Cyclone III, Cyclone IV, Cyclone V

– Arria II, Arria V

– Stratix II, Stratix III, Stratix IV, Stratix V

Commercial FPGAs

• Actel

– Axcelerator (Antifuse)

– IGLOO, IGLOOE (Flash)

– ProASIC Plus (Flash)

– ProASIC3, ProASIC3E (Flash)

– RTAX (Radiation Tolerant, Anti-fuse)

– RTSX-SU (Radiation Tolerant, Anti-fuse)

– SmartFusion, SmartFusion2 (ARM Cortex-M3)

Structure of an FPGA (figure)

Structure of an FPGA (figure; source: Xilinx Data Sheets)

Detailed View (figure: CLBs with switch blocks, SB, between them)

Switch Block (figure)

Types of switch blocks (figure)

FPGA (figure; source: Xilinx Data Sheets)

FPGA

• I/O Blocks (Tri-state output / Input, Synchronizing Flip-flops)

• Array of Configurable Logic Blocks

• Horizontal and Vertical wires with programmable switches in between

• Single length, Double length, Quad, Hex and Long lines

• Resources available to user

• Resources for configuring programmable switches in the interconnect structures and Logic blocks

Programmable Connections

• SRAM (Pass Transistor)

• Flash

• Antifuse

SRAM (Pass Transistor) (figure; source: Xilinx Data Sheets)

Pass Transistor with configuration cell

• Flip-Flop to store the switch status (4 Transistors)

• Write Transistor to write Configuration status

• Total: 6 Transistors

• The flip-flops controlling the switches are organized as an SRAM, hence the name

(Figure labels: Pass Transistor, Flip-Flop, Write Transistor)

Flash Transistor

• MOS transistor with a floating gate

• Conducts when not programmed off

• Can be electrically programmed ‘off’ or ‘on’

Flash Transistor (figure)

Flash Cell Write (figure)

Flash Cell Erase (figure)

Anti-fuse (figure)

Programmable Connections

Name        Volatile   Re-programmable   Delay   Area

Flash       No         In-circuit        Large   Medium

SRAM        Yes        In-circuit        Large   Large

Anti-fuse   No         No                Small   Small

Logic Block size

• Coarse grain

– Owing to the area of an SRAM interconnect switch (6 transistors), logic blocks are made large in SRAM-based FPGAs

– Utilization is kept high through configurability within the logic block

• Fine grain

– Since an antifuse occupies less area and adds less delay, antifuse-based FPGAs employ smaller logic blocks

Logic Cell Structure – Coarse Grain (figure; source: Xilinx Data Sheets)

Logic Cell Structure – Fine Grain (figure)

Design Methodology

(Figure: design flow. HDL Source, Synthesis, PAR/Fitting, Programming; verification by Functional Simulation, Logic Simulation, Static Timing Analysis and Timing Simulation. Other labels: Constraints, Equations/Netlists, Timing Model, Configuration File.)

Structure of an FPGA (figure; source: Xilinx Data Sheets)

Commercial Tools

• Simulators

– ModelSim (Mentor Graphics)

– Active-HDL (Aldec)

• Synthesis Tools

– Synplify Pro (Synopsys)

– Precision Synthesis (Mentor Graphics)

• Vendor Tools

– Xilinx ISE (Synthesis, Simulation, PAR, Programming, …)

– Xilinx Vivado (Synthesis, Simulation, PAR, Programming, …)

– Altera Quartus II (Synthesis, Simulation, PAR, Programming, …)

– Actel Libero (Synthesis, Simulation, PAR, Programming, …)

Commercial Tools

• Cadence Suite

• Synopsys Suite

• Mentor Graphics Suite

Xilinx Virtex FPGA

• SRAM based programmable connections, configuration

• LUT based combinational Logic

• Flip-Flops with sync/async reset/preset

• Large Configurable Logic Cells (CLBs)

• Block RAM (SPRAM, DPRAM, FIFO)

• LUT as Distributed RAM

• Low skew clock trees, DLL, Tri-state gates for Buses

• Carry Chains / Cascade Chains

• JTAG, Serial, and Parallel Configuration schemes

• I/O Blocks (Registered / Non-registered)

• Multiple I/O standards

Xilinx Virtex FPGA (figure; source: Xilinx Data Sheets)

Note: present-day FPGAs use PLLs instead of DLLs and have DSP blocks for fixed-point arithmetic.

Virtex CLB (figure; source: Xilinx Data Sheets)

LUT

• Address lines as inputs, data line as output (read mode)

• Truth table written during configuration (write)

• 4 input, 6 input LUTs

• Fixed AND, Programmable OR
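
As a sketch of the read-mode behaviour described above, a LUT can be modelled in VHDL as a constant truth table indexed by its inputs. The entity name lut2_xor, the constant TRUTH_TABLE and the port names are illustrative assumptions, not vendor primitives; the table contents implement X XOR Y. A synthesis tool would normally infer the LUT from ordinary logic equations, so this form is only meant to show the "address in, data out" view.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 2-input LUT model: the inputs act as address lines,
-- the stored truth table supplies the data output.
entity lut2_xor is
  port (
    x, y : in  std_logic;   -- drive address lines A1 and A0
    d0   : out std_logic    -- data line D0
  );
end entity lut2_xor;

architecture rtl of lut2_xor is
  -- Truth table "written at configuration time":
  -- address 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0 (X XOR Y)
  constant TRUTH_TABLE : std_logic_vector(0 to 3) := "0110";
  signal addr : unsigned(1 downto 0);
begin
  addr <= x & y;                          -- form the 2-bit address
  d0   <= TRUTH_TABLE(to_integer(addr));  -- read the addressed bit
end architecture rtl;
```

Changing only the constant (for example to "0001" for X AND Y) changes the function without changing the structure, which is the sense in which the LUT is programmable.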

(Figure: a 2-input LUT implementing X XOR Y, with inputs X and Y on address lines A1 and A0 and the result on data line D0.)

A1 A0   D0

0  0    0

0  1    1

1  0    1

1  1    0

FPGA Configuration / Programming

• Writing to configuration memory

• Configuring options in Logic blocks

– Writing LUTs with truth tables

– Combining LUTs

– Using LUTs as memory

– Selecting clocks, Set/Reset for FFs

– Configuring Various Muxes in Slices

• Using special resources (RAM, FIFOs, PLLs)

• Programming Switch matrices

• Programming I/O blocks

Virtex Family (figure; source: Xilinx Data Sheets)

Important Specifications

• CLB Array, Block RAM Bits

• User I/O, Differential I/O

• Distributed RAM Bits can be calculated from number of CLBs (multiply by 4 x 64)

• System gates and logic gates are not useful figures: they are equivalent gate counts, so comparing them across vendors is meaningless

Structure of an FPGA (figure)

Virtex CLB (figure; source: Xilinx Data Sheets)

(Figure; source: Xilinx Data Sheets)

4 input LUT and Flip-Flops

• Use LUT and FFs independently

• Use LUT followed with FFs

(Figure: two 4-input LUTs (inputs I3, I2, I1, I0; output O), each with a flip-flop (pins S, D, Q, CK, AR).)

4 input LUT and Flip-Flops

• Independent LUT Outputs: X, Y

• Dedicated inputs to FF: BX, BY

5 input LUT

• Two 4-input LUTs are multiplexed by the F5 mux to form a 5-input LUT. The select line is connected to BX, so the bottom FF cannot be used independently; the F5 mux output is connected to this FF.

(Figure: two 4-input LUTs (inputs I3, I2, I1, I0) combined by the F5 mux, with I4 as the select.)

6 Input LUT

• Two 5-input LUTs are multiplexed by the F6 mux to form a 6-input LUT. The select line is connected to BY, so the top FF cannot be used independently; the F6 mux output is connected to this FF.
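
The muxing of smaller LUTs into a wider one is an instance of Shannon expansion, sketched below. The function names F, G_0, G_1 and f_{jk} are notation introduced here for illustration, and which LUT is selected by which select value is an assumption rather than something the slide states.

```latex
% A 5-input function built from two 4-input LUTs and the F5 mux (select I_4):
G_j(I_4,\ldots,I_0) = \overline{I_4}\, f_{j0}(I_3,I_2,I_1,I_0) + I_4\, f_{j1}(I_3,I_2,I_1,I_0)

% A 6-input function built from two such 5-input functions and the F6 mux (select I_5):
F(I_5,\ldots,I_0) = \overline{I_5}\, G_0(I_4,\ldots,I_0) + I_5\, G_1(I_4,\ldots,I_0)
```

Each f_{jk} is one 4-input truth table of 16 bits, so the full 6-input function uses 4 x 16 = 64 bits, one per minterm.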

(Figure: four 4-input LUTs paired by two F5 muxes (select I4), whose outputs are combined by the F6 mux (select I5).)

Cascading LUTs

• In the most general case, considering all possible minterms, 5-input and 6-input functions require the 5-input and 6-input LUTs built with the F5 and F6 muxes

• But in specific cases, a function of 5 or 6 inputs can be implemented using a cascade of two LUTs

6 inputs using 2 LUTs

Y = ABCDE or ABCDF

Y = (ABCD) and (E or F)

ABCD = X

Y = X and (E or F)

(Figure: two cascaded LUTs. The first, with inputs A, B, C, D, holds the truth table for X = ABCD; the second, with inputs X, E, F, holds the truth table for X and (E or F).)

5 inputs using 2 cascaded LUTs

Y = ABCDE

Y = (ABCD) and E

ABCD = X

Y = X and E

(Figure: two cascaded LUTs. The first, with inputs A, B, C, D, holds the truth table for X = ABCD; the second, with inputs X and E, holds the truth table for X and E.)

5 inputs using 2 cascaded LUTs

Y = ABCDE or AB/CDE/

Y = (ACD) and (BE or B/E/)

ACD = X

Y = X and (BE or B/E/)

(Figure: two cascaded LUTs; the first computes X and the second computes Y from X and the remaining inputs.)

5 inputs using 5 input LUT

Y = (ABCD) xor E

ABCD = Z

Y = ZE/ or Z/E

(Figure: two 4-input LUTs, each with inputs A, B, C, D, feeding the F5 mux; E drives the mux select and the mux output is Y.)

Virtex CLB: LUT

• LUT and FF can be used separately or together

• Four 4-input LUTs

• 5-input LUT from two 4-input LUTs using the F5 mux

• 6-input LUT from two 5-input LUTs through the F6 mux

• Four 4-input LUTs / two 5-input LUTs / one 6-input LUT

• FF: Sync/Async Set-Reset, Clock Enable

– Since both set and reset are available, registers can be initialized to any value without extra overhead.

LUT as RAM

• General routing lines can write to the LUT through the LUT RAM write control circuit, so that the LUT can be used as Distributed RAM

(Figure: 4-input LUT (inputs I3, I2, I1, I0; output O) with a LUT RAM Write block.)

Virtex CLB: LUT as distributed RAM

• When used for logic implementation, the LUT is written while configuring the FPGA.

• Write control signals can be connected to routing wires, so that the LUT can be used as RAM when it is not used for logic implementation.

• Four 16x1 distributed RAM per CLB

• These can be combined to make various memory sizes and data widths.

• Since it is spread across CLBs, it is called Distributed RAM

• Since it is spread across CLBs, access latency can vary; be careful if you use it without registering the read data.
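
A minimal sketch of a 16x1 memory written in the style that synthesis tools commonly map to LUT-based distributed RAM (synchronous write, asynchronous read). The entity name dist_ram16x1 and its ports are assumptions for illustration; whether a given tool actually places this in LUTs rather than block RAM or flip-flops depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16x1 RAM: one 4-input LUT's worth of storage.
entity dist_ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(3 downto 0);   -- the four LUT inputs
    din  : in  std_logic;
    dout : out std_logic
  );
end entity dist_ram16x1;

architecture rtl of dist_ram16x1 is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write through the write control signals
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: the LUT behaves as in logic mode
  dout <= ram(to_integer(unsigned(addr)));
end architecture rtl;
```

Registering dout in the user logic is one way to address the variable access latency mentioned above.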

Carry Chain

• Adder

• Requires two look-up tables (Si and Ci+1) at each stage.

• This, along with routing, makes the adder big and slow.

• Hence a dedicated carry chain implements Ci+1 to make the adder faster.

$$C_{i+1} = A_i B_i + (A_i \oplus B_i)\, C_i$$

$$S_i = A_i \oplus B_i \oplus C_i$$

Carry Chain

$$C_{i+1} = A_i B_i + (A_i \oplus B_i)\, C_i$$

(Figure: one carry-chain stage. Labels: LUT, Ai, Bi, Ci, Si, Ci+1, and a 2:1 mux with inputs 0 and 1.)

(Figure; source: Xilinx Data Sheets)

Carry Chain

• For adders, use the ‘+’ operator so that the carry chains can be used.

• For higher-level functions such as counters, the synthesis tool infers and uses carry chains.

• The AND gate combining Ai and Bi shown in the Slice diagram is for partial product generation in multipliers

• In some FPGAs, the carry chain has features to cascade (AND/OR) the LUT outputs.
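
A short sketch of an adder written with the ‘+’ operator, as recommended above. The entity name add8 and the 8-bit width are illustrative assumptions; the operands are widened by one bit so that the carry out is kept in the result.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit adder; the '+' lets the synthesis tool
-- choose the dedicated carry chain instead of LUT-only logic.
entity add8 is
  port (
    a, b : in  std_logic_vector(7 downto 0);
    sum  : out std_logic_vector(8 downto 0)   -- bit 8 is the carry out
  );
end entity add8;

architecture rtl of add8 is
begin
  -- Widen both operands to 9 bits so the carry is not lost
  sum <= std_logic_vector(resize(unsigned(a), 9) + resize(unsigned(b), 9));
end architecture rtl;
```

Writing the same adder as explicit per-bit sum and carry equations tends to hide the structure from the tool, which is why the operator form is preferred.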

Control of Sequential Circuits

(Figure: a Reg / Counter / FSM / Controller block with clk and en (RA_L) inputs.)

Clock Gating

(Figure: register clocked by a gated clock CLK'. Labels: D7:0, RA_E, RA_L, CLK, CLK'. Waveforms: CLK, RA_L, CLK'.)

Re-circulating Multiplexer

(Figure: register with a 2:1 mux (inputs 0, 1) on its D input. Labels: D7:0, RA_E, RA_L, CLK. Waveforms: CLK, RA_L. Register write on the clock edge.)

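Re-circulating Multiplexer

(Figure labels: two flip-flops (D, Q, CK), a 2:1 mux (inputs 0, 1), and a clock enable CE.)

process (clk)
begin
    if (clk'event and clk = '1') then
        if (cntrl_sig = '1') then   -- clock enable: load only when asserted
            q <= d;
        end if;                     -- otherwise q keeps its value
    end if;
end process;

Clock Gating for low power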

(Figure: clock gating for low power. Labels: CLK, D7:0, RA_E, RA_L, two flip-flops, CLK1, CLK2. Waveforms: CLK, RA_L, CLK1, CLK2.)

Combinational Circuit Mapping

(Figure: a combinational block (Comb) maps to one or more LUTs.)

Sequential Circuit Mapping

(Figure: flip-flops (FF) around a combinational block (Comb); the logic maps to one or more LUTs and the registers to one or more flip-flops.)

Counter, FSM Mapping

(Figure: next-state logic (NSL), flip-flops (FF) and output logic (OL); the logic maps to one or more LUTs and the state to one or more FFs.)

Virtex IOB (figure; source: Xilinx Data Sheets)

Virtex IOB

• Three paths: Output, Input, Tri-state enable

• Direct or Through Flip-flops (synchronization)

• Flip-Flops: Set/reset, Clock enable, Clock selection

• Programmable delay at input to make hold time zero (not an issue once registered at IOB, as tcq > th)

• Programmable pull-up, pull-down, hold (weak keeper) and slew rate

• PAR tool may move some of the input/output registers to IOB

Virtex IOB

• Various IO standards

– LVTTL

– LVCMOS33, LVCMOS25

– LVCMOS18, LVCMOS15, LVCMOS12 …

– PCI33, PCI66

– …

• Some IO standards require a Reference voltage for Inputs

• Banks of I/O pins support some of the IO standards

Weak keeper (Hold)

• The hold circuit holds the previous state of the bus, but with a weak drive, so that the bus can still be driven to ‘0’ or ‘1’.

• This avoids unnecessary switching of inputs due to noise, which could occur if the bus were left in high impedance.

(Figure: weak keeper on a Bus.)

Detailed View (figure: CLBs with switch blocks, SB, between them)

Virtex Routing (figure; source: Xilinx Data Sheets)

Virtex Routing

• Direct connection to adjacent CLB

• 24 single length lines (per GRM in each direction)

• 72 buffered Hex lines (per 6th GRM in each direction)

• 12 buffered long lines (horizontal & vertical)

• 4 tri-state lines (horizontal & vertical)

Bus Lines

• For busing and multiplexing it is better to use tri-state gates than multiplexers

(Figure; source: Xilinx Data Sheets)

Fitting Example: FSM

• FSM with 2 inputs, 3 states, and 2 Mealy outputs. How many CLBs does it need?

– State variables: 2 flip-flops (3 states)

– NSL: 2 state variables + 2 inputs = 4 inputs

– OL: 2 inputs + 2 state variables = 4 inputs

– 2 LUTs for NSL

– 2 FFs for state variables

– 2 LUTs for OL

– This requires 1 CLB, with two FFs left over. In fact, even if the outputs are registered, the design still fits in one CLB.

CLBs, FSM (figure; source: Xilinx Data Sheets. Labels: NSL, FF, OL.)

Fitting Example: Counter

• 8 bit up counter with parallel load feature

– State Variables: 8 Flip-flops

– Incrementer uses carry chain

– NSL: 1 state variable + load + 1 din = 3 inputs per state variable

– NSL requires 8 LUTs

– This requires 2 CLBs (4 slices)
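
For reference, the counter being fitted could be described as in the sketch below. The entity name counter8 and the port names load, din and q are assumptions chosen to match the terms used in the estimate above; the increment is written with ‘+’ so that the carry chain can be used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit up counter with parallel load.
entity counter8 is
  port (
    clk  : in  std_logic;
    load : in  std_logic;                      -- '1': load din, '0': count up
    din  : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end entity counter8;

architecture rtl of counter8 is
  signal count : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        count <= unsigned(din);   -- parallel load
      else
        count <= count + 1;       -- incrementer, wraps from 255 to 0
      end if;
    end if;
  end process;

  q <= std_logic_vector(count);
end architecture rtl;
```

Each bit needs one flip-flop and one LUT (selecting between din and the incremented value), which is where the 8 FFs and 8 LUTs in the estimate come from.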

CLB, Counter

(Figure: counter mapped onto CLBs. Labels: FF, +1.)

Signal Paths in CLB

library ieee;
use ieee.std_logic_1164.all;

entity test is
    port (a, b, c, d, e, f, g, h: in std_logic; z: out std_logic);
end entity test;

architecture arch_test of test is
begin
    process (a, b)
    begin
        if (a = '1') then                          -- a: asynchronous reset
            z <= '0';
        elsif (b'event and b = '1') then           -- b: clock, rising edge
            if (c = '1') then                      -- c: clock enable
                z <= (d and e and f and g) xor h;  -- 5-input function
            end if;
        end if;
    end process;
end arch_test;

(Figure: the design mapped onto a CLB. Signals: a, b, c, d, e, f, g, h, z.)

Virtex DPRAM (figure; source: Xilinx Data Sheets)

Virtex DPRAM

• True Dual port Memory

• Each port can be read/write, read or write

• Synchronous reads and writes

• Can be combined for larger widths and depths

• Instantiated through Core Generator Tool

• On a simultaneous read and write to the same location there is a conflict, and the read data could be wrong

• Can be initialized in VHDL code

Metastability

(Figure: D flip-flop with waveforms for CLK, D and Q, marking ts, th and tco.)

ts: Setup time: minimum time the input must be valid before the active clock edge

th: Hold time: minimum time the input must remain valid after the active clock edge

tco: Clock-to-output delay: delay from the active clock edge until the input value appears at the output

Minimum Clock period

(Figure: two D flip-flops on a common clock clk, with a combinational block (Comb) in the data path between them.)

tclk > tco + tcomb + tsetup

tco(min) + tcomb(min) > th(max)

Here we consider the data path from the first flip-flop to the next, and estimate the minimum clock period for proper latching of data into the second flip-flop.
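
A worked example of the two conditions, using made-up delay values chosen only for illustration (they are not taken from any data sheet):

```latex
% Assumed values: t_{co} = 1\,\text{ns},\; t_{comb} = 6\,\text{ns},\; t_{setup} = 1\,\text{ns}
t_{clk} > t_{co} + t_{comb} + t_{setup} = 1 + 6 + 1 = 8\ \text{ns}
\quad\Rightarrow\quad f_{max} < \frac{1}{8\ \text{ns}} = 125\ \text{MHz}

% Assumed values: t_{co(min)} = 0.5\,\text{ns},\; t_{comb(min)} = 1\,\text{ns},\; t_{h(max)} = 0.3\,\text{ns}
t_{co(min)} + t_{comb(min)} = 0.5 + 1 = 1.5\ \text{ns} > t_{h(max)} = 0.3\ \text{ns}
```

With these numbers the hold condition is met and the circuit can be clocked at any frequency below 125 MHz.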

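Minimum Clock period

• Sequential Circuit / FSM

tclk > tco + tcomb + tsetup

tco(min) + tcomb(min) > th(max)

(Figure: FSM. A combinational block (Comb) takes Inputs and the present state (PS) and produces Outputs and the next state (NS); a register (D, CK, Q, AR) driven by Clock and Reset holds the state.)

Clock skew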

• The previous analysis assumes that the clock reaches all flip-flops at the same time. In practice this is not true, as wire delays and buffer delays add up.

• This creates relative delays between pairs of flip-flops or registers.

• For analysis, it is important to consider the clock skew between flip-flops/registers that have a data path between them.

• Clock skew:

– Difference in arrival time of the clock at the flip-flops

Max Path and Min Path

(Figure: CHIP with a clock input, showing a Max Path and a Min Path.)

Clock Skew: Max path

(Figure: two D flip-flops clocked by CLK1 and CLK2, both derived from clk, with a combinational block (Comb) between them. Waveforms: CLK1, CLK2, marking tclk, tskew, tco, tcomb, ts and slack.)

tclk – tskew > tco(max) + tcomb(max) + tsetup

tclk > tco(max) + tcomb(max) + tsetup + tskew

slack = tclk – (tco(max) + tcomb(max) + tsetup + tskew)

Clock Skew: Max path

• Analysis for the data path from the first flip-flop to the next

• We assume tco + tcomb is greater than the hold time of the flip-flop

• Hence, when a clock edge reaches both flip-flops, new data from the first flip-flop arrives at the second flip-flop after the clock edge, and after the hold time, so it is not latched by the second flip-flop on that edge

• We therefore estimate the clock period such that, when the next clock edge reaches the second flip-flop, the data launched from the first flip-flop by the current clock edge is latched in the second flip-flop

Clock Skew: Max path

• Since the clock to the second flip-flop is skewed, arriving early compared to the first, the clock period has to accommodate this skew, requiring a larger clock period than if there were no skew
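
A worked max-path example, with delay values assumed purely for illustration:

```latex
% Assumed values: t_{co(max)} = 1\,\text{ns},\; t_{comb(max)} = 6\,\text{ns},\;
%                 t_{setup} = 1\,\text{ns},\; t_{skew} = 0.5\,\text{ns}
t_{clk} > t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew} = 1 + 6 + 1 + 0.5 = 8.5\ \text{ns}

% For a chosen clock period of 10 ns:
\text{slack} = t_{clk} - (t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew}) = 10 - 8.5 = 1.5\ \text{ns}
```

Without skew the same path would need only 8 ns, so here the skew costs 0.5 ns of clock period.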

Clock Skew: Min path

(Figure: two D flip-flops clocked by CLK1 and CLK2, both derived from clk, with a combinational block (Comb) between them. Waveforms: CLK1, CLK2, marking tclk, tskew, th, tco and tcomb.)

• Same edge

tco(min) + tcomb(min) > tskew(max) + thold

• Next edge

tclk > tco + tcomb + tsetup – tskew

Clock Skew: Min path

• Here, an analysis like the one for the max path (i.e. from one clock edge at the first flip-flop to the next clock edge at the second flip-flop) would result in a smaller clock period, as the clock edge arrives late at the second flip-flop

• But the real danger now is that data launched from the first flip-flop by the current edge appears within the hold-time window of the same edge at the second flip-flop

• If that happens, the only solution is to add extra delay to the data path between these flip-flops, or to route the clock in the opposite direction

• In practice, this can happen in shift registers, as there may be no combinational delay between flip-flops
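
A worked min-path check for a shift-register stage, with delay values assumed purely for illustration:

```latex
% Assumed values: t_{co(min)} = 0.4\,\text{ns},\; t_{comb(min)} = 0\,\text{ns},\;
%                 t_{skew(max)} = 0.5\,\text{ns},\; t_{hold} = 0.2\,\text{ns}
t_{co(min)} + t_{comb(min)} = 0.4 + 0 = 0.4\ \text{ns}
\qquad
t_{skew(max)} + t_{hold} = 0.5 + 0.2 = 0.7\ \text{ns}

% 0.4 ns is not greater than 0.7 ns, so the same-edge condition fails.
% Extra data-path delay t_{extra} needed:
t_{co(min)} + t_{comb(min)} + t_{extra} > t_{skew(max)} + t_{hold}
\quad\Rightarrow\quad t_{extra} > 0.3\ \text{ns}
```

Note that the clock period does not appear in this condition, so slowing the clock does not fix a violation of this kind.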

Clock routing

• Requirement

– Minimum relative delay between any two flip-flops, or at least between flip-flops that have a data path between them

• Solution

– Balance the number of buffers and equalize, approximately, the wire length from the clock input to each flip-flop

– H Clock Tree

Virtex Clock Tree (figure; source: Xilinx Data Sheets)

DLL

(Figure: DLL. Labels: CLKI, CLKO, CLKFB, CLKIN, CLKOUT. Waveforms: CLKIN, CLKOUT, marking tskew and tadd.)

The DLL delays CLKOUT by "tadd" so that the clock edges of CLKIN and CLKOUT match.

DLL / PLL

• In a DLL, input clock is delayed for de-skew

• In a PLL, a VCO synthesizes a clock synchronous to the input clock

• DLL adjusts the phase of the input clock.

• PLL synthesizes the clock of same phase and frequency as that of the input clock.

• PLL has the problem of working with a limited range of frequencies, but in FPGAs clock frequency may not change in most cases.

• PLL also cleans up input jitter.

• Xilinx Virtex 5 has PLL blocks in addition to DLL in DCM.

Current FPGAs

• PLL

• Digital Clock Manager (DCM)

– DLL for de-skewing

– Phase shifter

– Frequency multiplication / division

• Clock Buffers, Muxes (Glitchless)

• All these can be connected in clock path

– Clock pins, Clock tree

Special Resources Usage

• Resources

– Buffers

– DLL / PLL

– Block RAMs

– DSP Blocks

• Usage

– Vendor library components

– Inferred by synthesis tool, when possible

– VHDL attributes with code

Virtex Configuration

• JTAG: Prototyping (PC Board)

• Master Serial:

– Configuring from a Serial PROM

– Embedded boards

• Slave Serial

– Works as a slave to master FPGA connected to a serial PROM

• SelectMAP

– 8/16-bit wide synchronous slave configuration of FPGA

– Suitable when the FPGA is interfaced to a CPU

Virtex Configuration: Serial PROM (figure; source: Xilinx Data Sheets)

Serial Configuration

• Multiple FPGAs are configured from a single serial (Flash) PROM.

• The master FPGA supplies the clock to the PROM and the slave FPGAs.

• Master and slave FPGAs are daisy-chained.

• After power-on or a PROGRAM request, the configuration memory of all FPGAs is cleared.

• Init-phase synchronization is done through the INIT I/O pin.

• The master FPGA is programmed first, sending out ‘1’s on DOUT while the slave FPGAs wait.

• Once the master FPGA is configured, it sends out the configuration stream for the first slave, and so on.

• DONE synchronization is done through the open-drain DONE output, forming a wired-AND.

SelectMAP Scheme (figure; source: Xilinx Data Sheets)

SelectMAP Configuration: Timing (figure; source: Xilinx Data Sheets)

FPGA Controls while configuring

• While the FPGA is being configured, its internal state and its pin levels are not defined.

• Xilinx FPGAs have two internal signals that keep the FPGA in a sane state during and after configuration.

• GTS: drives all FPGA outputs to tri-state

• GSR: goes to the set/reset of every flip-flop and holds each flip-flop set or reset, as given by its specified reset state.

• Once the FPGA is configured, these signals are released.

• Use separate user resets for normal reset operation.

Spartan 6: Configuration

• Boundary Scan / JTAG / TAP / IEEE 1149.1

– Single Device, Chain

• Master Serial (Chain, Ganged) (SPI: x1, x2, x4)

• Slave Serial (SPI: x1, x2, x4)

• Master SelectMAP (x8, x16)

– Single Device, Chain, Ganged

• Slave SelectMAP (x8, x16)

Spartan 6: Bit Stream encryption

• Bit stream is AES encrypted with a 256-bit key using the BitGen tool

• The encryption key is programmed into the FPGA device through JTAG for decryption.

• Once programmed, the FPGA can be configured to disallow readback

• The configuration also can’t be read back.

• The AES key can be permanently fused in the FPGA, or held in an SRAM with external battery backup

Spartan 6: Bit Stream compression

• Bit stream can be compressed when a lot of resources are unused

• Less memory for storage

• Less configuration time

Spartan 6: Multi Boot

• Multiple Configuration Images in Program Flash

• At least one main configuration and one fallback/golden configuration

• During configuration, if a CRC error occurs in the bit stream, or sync word detection times out (WDT), the FPGA tries the fallback configuration

• Supported in SPI (x1, x2, x4) and BPI Modes

Spartan 6: DSP Slices

• Slices to support DSP computations

• 18 bit 2’s complement pre-adder

• 18 x 18 bit Multiplier, 36 bit result

• Result is sign extended to 48 bit

• 48 bit 2’s complement adder/subtracter
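
The data path listed above (pre-adder, multiplier, sign extension, accumulator) can be sketched behaviourally as below. The entity name mac18, the port names and the synchronous clear are assumptions for illustration; this is not the DSP48A1 primitive itself, and whether a tool maps it onto a DSP slice depends on the tool and on pipelining details not shown here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical multiply-accumulate: p <= p + (a + d) * b
entity mac18 is
  port (
    clk : in  std_logic;
    clr : in  std_logic;             -- synchronous clear of the accumulator
    a   : in  signed(17 downto 0);
    d   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(47 downto 0)
  );
end entity mac18;

architecture rtl of mac18 is
  signal acc : signed(47 downto 0) := (others => '0');
begin
  process (clk)
    variable pre  : signed(17 downto 0);
    variable prod : signed(35 downto 0);
  begin
    if rising_edge(clk) then
      pre  := a + d;                 -- 18-bit 2's complement pre-adder (wraps on overflow)
      prod := pre * b;               -- 18 x 18 multiplier, 36-bit result
      if clr = '1' then
        acc <= (others => '0');
      else
        acc <= acc + resize(prod, 48);  -- sign-extend to 48 bits, then accumulate
      end if;
    end if;
  end process;

  p <= acc;
end architecture rtl;
```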

Spartan 6: DSP48A1 Slice (figure; source: Xilinx Data Sheets)

Debug: Internal Signal Probing

• Probing the internal signals in FPGA for debug.

• Signal Probe / Logic Analysis

• Use a Signal Capture IP

• Interface this IP to the JTAG port

• PC based software to configure signal capture IP and display the signal waveforms

• Xilinx: ChipScope Pro

• Altera: Signal Probe

Xilinx ChipScope Pro (figure; source: Xilinx Data Sheets)

Virtex Pins (figure; source: Xilinx Data Sheets)

Virtex Pins (figure; source: Xilinx Data Sheets)

One hot encoding

(Figure: FSM block diagrams. First: Next State Logic and Output Logic around a state register (D, CK, Q, AR), with Inputs, Outputs, NS, PS, Clock and Reset. Second: the same FSM with a single combined Logic block.)

tclk > tco + tlogic + tsetup

One hot encoding

• e.g. FSM with 5 inputs, 18 states, and 6 outputs

• NSL: 5 + 5 = 10 inputs (worst case): 5 state variables for 18 states, plus 5 inputs

• For Virtex (worst case)

– Basic block: 4 input LUT

– 1 CLB → 6 input LUT

– 16 CLBs for a 10 input LUT (2^(10-6) = 16)

• The NSL would be distributed across CLBs, increasing the delay and bringing down the clock frequency of the FSM.

• Solution: one-hot encoding, where each state is encoded using its own flip-flop.

One hot encoding

Dj = condi . Qi + condj . Qj

NSL: 5 + 2 inputs (Worst Case)

(Figure: states Si and Sj with transition conditions condi and condj.)

One-hot encoding Output logic

• Most Moore outputs are a direct decode of one state or of more than one state

• If an output is a decode of a single state, then that state's flip-flop output is the output signal

• If multiple states produce an output, the output signal is the logical OR of all those state flip-flops

• Thus, one-hot encoding also reduces the output logic, at the cost of extra state flip-flops

One hot encoding

• State encoding

– Sequential, gray, one-hot-one, one-hot-zero

• User defined attributes (state encoding)

– attribute state_encoding of type-name: type is value;

(sequential, gray, one-hot-one, one-hot-zero)

attribute state_encoding of statetype: type is gray;

– attribute enum_encoding of type-name: type is "string";

attribute enum_encoding of statetype: type is "00 01 11 10";

One-hot one, One-hot zero

• One-hot one

00001

00010

00100

01000

10000

• One-hot zero (Almost one-hot)

0000

0001

0010

0100

1000

• Easy to initialize (reset all flip-flops)

• Starting state is never revisited

One hot encoding

• Explicit declaration of states

signal pr_state, nx_state: std_logic_vector(3 downto 0);

constant a: std_logic_vector(3 downto 0) := "0001";

constant b: std_logic_vector(3 downto 0) := "0010";

constant c: std_logic_vector(3 downto 0) := "0100";

constant d: std_logic_vector(3 downto 0) := "1000";
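
A minimal sketch of how these explicit one-hot declarations might be used in a complete FSM. The entity name onehot_fsm, the input go, the output busy and the state sequence a, b, c, d, a are all assumptions made for illustration; only the signal and constant declarations come from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-state one-hot FSM: waits in state a until go = '1',
-- then steps through b, c, d and returns to a.
entity onehot_fsm is
  port (
    clk, reset, go : in  std_logic;
    busy           : out std_logic   -- Moore output, '1' in states b, c, d
  );
end entity onehot_fsm;

architecture rtl of onehot_fsm is
  signal pr_state, nx_state : std_logic_vector(3 downto 0);
  constant a : std_logic_vector(3 downto 0) := "0001";
  constant b : std_logic_vector(3 downto 0) := "0010";
  constant c : std_logic_vector(3 downto 0) := "0100";
  constant d : std_logic_vector(3 downto 0) := "1000";
begin
  -- State register: one flip-flop per state, asynchronous reset to state a
  process (clk, reset)
  begin
    if reset = '1' then
      pr_state <= a;
    elsif rising_edge(clk) then
      pr_state <= nx_state;
    end if;
  end process;

  -- Next-state logic
  process (pr_state, go)
  begin
    case pr_state is
      when a =>
        if go = '1' then
          nx_state <= b;
        else
          nx_state <= a;
        end if;
      when b      => nx_state <= c;
      when c      => nx_state <= d;
      when d      => nx_state <= a;
      when others => nx_state <= a;   -- recover from illegal codes
    end case;
  end process;

  -- Output logic: OR of the state flip-flops that produce the output
  busy <= pr_state(1) or pr_state(2) or pr_state(3);
end architecture rtl;
```

The output is simply the OR of three state bits, which illustrates how one-hot encoding keeps the output logic small.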

Altera Stratix

• Two levels of interconnections

• SRAM based programmable connections

• Logic Array Block (10 LE’s)

• LUT as combinational Logic

• Flip-Flops with sync/async reset/preset

• RAM Block (SPRAM, DPRAM, FIFO)

• Low skew clock trees, PLL

• Carry, Cascade chains

• DSP Blocks (Multipliers, Shift Registers)

• I/O Blocks (Registered / Non-registered)

• Multiple I/O standards

• JTAG, Parallel, and Serial Configurations

Altera Stratix (figure)

Altera Stratix (figure; source: Altera Data Sheets)

Actel 54SX-A

• Antifuse based programmable interconnections

• Simple Combinational and Registered cells

• Simple I/O Blocks

• Low skew Clock trees

• Multiple I/O standards

• Hardware probe pins

Actel 54SX-A, C Cell (figure; source: Actel Data Sheets)

Actel 54SX-A, R Cell (figure; source: Actel Data Sheets)

Actel 54SX-A (figure; source: Actel Data Sheets)

Actel 54SX-A Routing (figure; source: Actel Data Sheets)

Actel 54SX-A Probe (figure; source: Actel Data Sheets)

Actel ProASIC Plus (figure; source: Actel Data Sheets)

ProASIC Plus, Logic Tile (figure; source: Actel Data Sheets)

Latch / FF

(Figures: a latch built with a mux (inputs 1, 0; signals clk, D, Q), and a flip-flop built from two latches (signals D, CLK, Q).)

ProASIC Plus Routing

• Fast Connect

• Short Lines (1, 2, 4), Long Lines

• Clock Tree

• Pad Ring (Pin Locking)

• SRAM Blocks

• Programming Tech: Flash

• Non-volatile

CPLD vs FPGA

Features                  CPLD     FPGA

Logic                     AND-OR   Mux / LUT / Gates

Register to Logic ratio   Small    Large

Timing                    Simple   Complex

Architecture Variation    Small    Large

Programming Technology    Flash    SRAM, Anti-Fuse, Flash

Capacity                  10 K     2 M LUT + RAM

Static Timing Analysis (STA)

• Timing simulation: simulates the real-time operation of the circuit, using timing models of the blocks, for the specified test vectors

• Exhaustive simulation is time consuming

• Static Timing Analysis analyzes the delays of the various paths from block and wire delays

• It can make mistakes, as it is not aware of the real-time behavior of the circuit (inputs, FSM/Controller behavior)

• A path that is never used in circuit operation may be reported (false paths)

• Paths to registers that are not enabled every clock cycle may be reported (multi-cycle paths)

STA: Sequential Circuit

• The register-to-register path decides the clock frequency. But if either of the other two delays exceeds it, one needs to choose the maximum of the three as the minimum clock period.

• In real life, this is often not a great concern: many a time we are designing IP that goes inside the chip, interfaced to other blocks close by. Even when inputs and outputs are brought to external pins, proper placement should take care of these delays.

(Figure: Input, flip-flop, Comb, flip-flop, Output on a common CLK, marking the setup-to-clock path at the input, the register-to-register (clock-to-setup) path, and the clock-to-output path.)

Static Timing Analysis: Sequential Circuit

• Clock to Setup: Register to register path with longest delay

– Clock to Setup on destination clock <clk_signal>

• Clock to Pad: FF output delay, from FF output to chip output pin

– Clock <clk_signal> to Pad

• Setup to Clock: Setup / Hold time of FF with respect to input pin/pad

– Setup/Hold to clock <clk_signal>

Static Timing Analysis

• Take the maximum of the three delays to find the maximum clock frequency for timing simulation

• But the actual throughput is given by Clock to Setup (the register-to-register path with the longest delay)

• In most cases, the Clock to Pad delay of a module is of no consequence, as these outputs, when used in a top-level module, go as inputs to a nearby module.

False Paths

• Improbable Paths

• Static Paths (e.g. Input Registers)

• Paths between clock domains

Multi-cycle path

Clock Enable CE2 comes 3 clock cycles after CE1

(Figure: two flip-flops with clock enables CE1 and CE2 on a common clk, with a combinational block (Comb) between them.)

Critical Path

(Figure: flip-flops FF1 and FF2 with clock enables CE1 and CE2 on a common clk, with combinational blocks C1 and C2 in the path between them.)

Critical path delay = tCO + tC1 + tC2 + tS

Constraint driven PAR

• Constraint editor

• I/O constraints

– I/O locations

– I/O standards (LVTTL, PCI66-3, LVDS ..)

– Drive strength (current)

– Slew rate

– I/O termination (pull up, pull down, hold)

– Input delay

Timing constraints

• Global

– Clock period, pad to setup, clock to pad

• Per port

– Pad to setup, clock to pad

• Per group (by net and clock)

– Pad to setup, clock to pad

– FROM – TO, FROM – THRU – TO

• False Paths

• Multi-cycle paths