ECE 551 Digital System Design & Synthesis


Lecture 12: “To synthesis, and beyond…”

So, the thing finally synthesized!

So, what have you created so far?
- A list of the required hardware cells
- A netlist describing their interconnections
- A simulation model that hopefully reflects reality more accurately than the pure HDL-level simulation
  - Includes semi-accurate logic delays

Now What?

- After synthesis, we have a netlist mapped to our specific tech library
  - ROMs
  - PLDs
  - FPGAs
  - Standard cells
  - Custom logic
- Choose implementation platform based on cost and performance requirements

ROMs

- Use like a GIANT truth table
- Can be inefficient for simple logic!
  - Gates: specify just the 1’s, or specify just the 0’s
  - ROM: has to specify both! All outputs for all possible minterms

Truth table stored in the ROM (address inputs d c b a, data outputs x y z):

address    data
d c b a    x y z
0 0 0 0    0 0 0
0 0 0 1    0 1 1
0 0 1 0    0 1 1
0 0 1 1    0 1 0
0 1 0 0    0 1 1
0 1 0 1    0 1 0
0 1 1 0    0 1 0
0 1 1 1    0 1 1
1 0 0 0    0 1 1
1 0 0 1    0 1 0
. . . .    . . .
1 1 0 0    0 1 0
1 1 0 1    0 1 1
1 1 1 0    0 1 1
1 1 1 1    1 1 0
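The rows that survive in the truth table on this slide are consistent with x = AND, y = OR, and z = XOR of the four address bits. Assuming that reading (the slide does not name the functions), a ROM can be modeled as a list indexed by address, where every entry must be filled in, including the all-zero outputs. A minimal sketch:

```python
from functools import reduce
from operator import and_, or_, xor

def build_rom(n_addr_bits=4):
    """Return a list indexed by address; each entry is the data word (x, y, z)."""
    rom = []
    for address in range(2 ** n_addr_bits):
        # Split the address into its individual bits (a is the least significant).
        bits = [(address >> i) & 1 for i in range(n_addr_bits)]
        x = reduce(and_, bits)  # 1 only when every address bit is 1
        y = reduce(or_, bits)   # 1 when any address bit is 1
        z = reduce(xor, bits)   # parity of the address bits
        rom.append((x, y, z))
    return rom

rom = build_rom()
for address, (x, y, z) in enumerate(rom):
    print(f"{address:04b} -> {x}{y}{z}")
```

Reading the ROM is a single indexed lookup, `rom[address]`, regardless of how simple or complex the stored functions are.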

ROMs

- Use like a GIANT truth table

[Figure: the same ROM truth table with address d c b a = 0 1 0 0 applied, reading out data x y z = 0 1 1]

ROMs

- Use like a GIANT truth table
- 64 Kbit ROM: 8K entries x 8 bits (13 address lines)
  - 8 Boolean functions using any of these 13 1-bit variables

[Figure: ROM with 13 address inputs a through m and 8 data outputs s through z]

ROMs

- Use like a GIANT truth table
- 64 Kbit ROM: 8K entries x 8 bits (13 address lines)
  - Two 4-bit functions of three 4-bit variables (plus a 1-bit flag)
  - Other options possible

[Figure: ROM with 13 address inputs a through m and 8 data outputs s through z]
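The second use of the 8K x 8 ROM can be sketched by packing three 4-bit operands and a flag into the 13-bit address and two 4-bit results into the 8-bit data word. The field order and the example functions `f` and `g` below are assumptions made for illustration; the slide does not specify them.

```python
ADDR_BITS = 13
DATA_BITS = 8

def pack_address(p, q, r, flag):
    """Concatenate a 1-bit flag and three 4-bit operands into a 13-bit address."""
    assert all(0 <= v < 16 for v in (p, q, r)) and flag in (0, 1)
    return (flag << 12) | (p << 8) | (q << 4) | r

def build_rom(f, g):
    """Fill all 2**13 entries; f goes in the high nibble, g in the low nibble."""
    rom = [0] * (2 ** ADDR_BITS)
    for flag in (0, 1):
        for p in range(16):
            for q in range(16):
                for r in range(16):
                    hi = f(p, q, r, flag) & 0xF  # keep only 4 bits
                    lo = g(p, q, r, flag) & 0xF
                    rom[pack_address(p, q, r, flag)] = (hi << 4) | lo
    return rom

# Hypothetical functions: the flag selects add or subtract for f; g is a bitwise XOR.
def f(p, q, r, flag):
    return p + q + r if flag else p + q - r

def g(p, q, r, flag):
    return p ^ q ^ r

rom = build_rom(f, g)
word = rom[pack_address(3, 5, 2, 1)]
print(len(rom), len(rom) * DATA_BITS, hex(word))
```

The ROM holds 8192 words, or 65536 bits in total, which is where the "64 Kbit" figure comes from.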

ROM Logical Structure

[Figure: n address inputs addr[0] … addr[n-1] feed a non-programmable address decoder, which forms 2^n minterms (word lines) from the inputs. The word lines drive an OR memory array (2^n x m) that produces m outputs w[m-1] … w[0].]

ROM Circuit Structure

[Figure: an n-to-2^n address decoder (non-programmable, with enable En_bar) takes inputs addr[0] … addr[n-1] and drives the word lines (product terms) of the OR-plane (2^n x m). Each memory cell is a mask-programmed pull-down transistor link. The outputs (bit lines) D[m-1] … D[0] have pull-up resistors to VDD.]

Erasable Programmable ROM (EPROM)

[Figure: an n-to-2^n address decoder (non-programmable, with enable En_bar) takes inputs addr[0] … addr[n-1] and drives word lines min_0 … min_(2^n - 1) into the OR-plane (2^n x m). Each memory cell uses a programmable floating-gate transistor. The outputs w[m-1] … w[0] have pull-up resistors to VDD.]

Flash Memory

- A flash memory is an electrically erasable PROM configured with additional circuitry to allow in-circuit erasure and programming of blocks of memory (e.g. 16-64 Kbytes).
- Widely used as the program storage memory for computers and embedded systems, as well as data storage memory (audio, video, file systems)
- High endurance: 100k to 1M+ erase cycles
- Flash-based storage (SSDs) is cost-competitive with magnetic disks up to several GB, with no mechanical shock issues and much better random-access times.
- Some FPGAs use flash memory instead of SRAM to allow instant-on behavior and to avoid exposing IP.

Comparison of ROMs

Device   Programming mode           Erase mode                       Example                            Access time
ROM*     Mask                       None                             TMS47C256, 32K x 8 CMOS
PROM     Custom by user (OTP***)    None                             AT27BV400, 256K x 16 or 512K x 8
EPROM    Out-of-circuit             Out-of-circuit; bulk, UV light   Intel 2732, 4K x 8 NMOS            45 ns
FLASH    In-circuit                 In-circuit; bulk or sector       AT49LV1024, 64K x 16 NMOS          70 ns**
EEPROM   In-circuit, byte-by-byte   In-circuit, byte-by-byte         Intel 2864, 8K x 8 NMOS

The slide also lists an access time of 150 ns whose row is not recoverable, and a "Complexity and Cost" column whose values did not survive.

*   Requires high volume to offset NRE
**  Programming time: 500 ms
*** One-time programmable

ROMs

- Cheap: a couple of bucks each
- Reuse EEPROMs with different truth tables
- Non-volatile: keep values when power is gone
- Very slow compared to gates (memory read)
- Combinational-only
- Limited to fairly simple designs (e.g., 20 or fewer inputs) due to exponential scaling
- ROMs are good for complex operations that use few variables (trigonometry, matrix inversion, etc.)
- They are often used in combination with other types of logic
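The exponential scaling mentioned above follows from the ROM storing one m-bit word for each of the 2^n input combinations. A short sketch of the arithmetic (the function name `rom_bits` is chosen here for illustration):

```python
def rom_bits(n_inputs, m_outputs):
    """Total storage bits for a ROM with n address inputs and m data outputs."""
    return (2 ** n_inputs) * m_outputs

# Each added input doubles the storage; 8 outputs assumed for the comparison.
for n in (4, 13, 20, 32):
    print(f"{n:2d} inputs x 8 outputs -> {rom_bits(n, 8):,} bits")
```

At 20 inputs and 8 outputs the table already needs 8 Mbit, and every further input doubles it.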

PLDs

Programmable Logic Devices
- PLA (Programmable Logic Array): programmable AND and OR arrays
- PAL (Programmable Array Logic): programmable AND array and fixed OR array
- Programming done at points where wires cross

[Figure: a PLA and a PAL side by side. In each, inputs a, !a, b, !b, c, !c, d, !d cross the product-term lines, which feed outputs x and y.]

PLDs

- Programming points where wires cross
  - x = a b c + a d
  - y = a b c d + a b d + b c d

[Figure: the PLA and PAL from the previous slide with programming points marked. Inputs a, !a, b, !b, c, !c, d, !d cross the product-term lines, which feed outputs x and y.]
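A PLA can be modeled as two programmable planes: an AND plane that forms product terms from the inputs and their complements, and an OR plane that selects which product terms feed each output. The sketch below programs the two equations exactly as they appear in the extracted slide text; complement bars may have been lost in extraction, so the literal polarities are an assumption.

```python
from itertools import product

# AND plane: one dict per product term, mapping input name -> required value
# (1 = true literal, 0 = complemented literal). Unlisted inputs are unconnected.
AND_PLANE = [
    {"a": 1, "b": 1, "c": 1},          # a b c
    {"a": 1, "d": 1},                  # a d
    {"a": 1, "b": 1, "c": 1, "d": 1},  # a b c d
    {"a": 1, "b": 1, "d": 1},          # a b d
    {"b": 1, "c": 1, "d": 1},          # b c d
]

# OR plane: which product-term rows are connected to each output.
OR_PLANE = {"x": [0, 1], "y": [2, 3, 4]}

def evaluate_pla(inputs):
    """Evaluate every product term, then OR the selected terms per output."""
    terms = [all(inputs[name] == value for name, value in term.items())
             for term in AND_PLANE]
    return {out: int(any(terms[i] for i in rows)) for out, rows in OR_PLANE.items()}

for a, b, c, d in product((0, 1), repeat=4):
    print(a, b, c, d, evaluate_pla({"a": a, "b": b, "c": c, "d": d}))
```

In a PAL only `AND_PLANE` would be programmable; the grouping in `OR_PLANE` would be fixed by the device.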

PLDs

- Moderate per-unit price: 1s to 10s of $
- Most are re-programmable
- Faster than ROMs
- Relatively slow compared to gates
  - Programming points cause delay
- Limited complexity
  - “Complex” PLDs have sequential ability, but are still too limited for very complex designs
  - Crossbar design scales poorly with number of inputs
- Good when you don’t need the complexity of an FPGA and want to save money.

FPGAs

Field Programmable Gate Array
- Temporary (Flash/SRAM based)
- Permanent (Anti-fuse): not as common

Pros
- Allow for very complex implementations
- Generally reusable (upgrades/bug-fixes/prototype)
- Low non-recurring engineering (NRE) costs

Cons
- Expensive per-unit (10s-100s of $)
- Slower than gates
  - Programming points

MPGA: mask-programmable (one time)

Programming an FPGA

- Most designs based on SRAM
- During configuration, the SRAM bits in the device are written with the desired values
  - Note that this means that your IP is being passed into the FPGA in a serial stream for the whole world to see!
- Different circuits implemented based on values set in SRAM bits that form LUTs, control multiplexers, and make routing connections

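Routing Elements

- Programmable connection
  [Figure: Routing Resource #1 joined to Routing Resource #2 through a programming point P]
- Programmable bypass
  [Figure: SIGNAL feeds a DFF; a programming point P selects whether OUT takes the DFF output or bypasses it]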

Logic Elements

Look-Up Table (LUT)
- Essentially a very small memory
- Most common size is 4-input LUT

[Figure: a 3-input LUT. Inputs a, b, c select one of eight programmable bits P1 to P8 (entries 0 to 7), which drives OUT.]

Logic Elements

Look-Up Table (LUT) Example
- OUT = a XOR b XOR c

[Figure: the 3-input LUT with entries 0 to 7 programmed as 0 1 1 0 1 0 0 1]

Logic Elements

Look-Up Table (LUT) Example
- OUT = ab + ac + bc

[Figure: the 3-input LUT programmed for the majority function; entries 0 to 7 hold 0 0 0 1 0 1 1 1]
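Both LUT examples can be reproduced by treating the LUT as an 8-entry memory addressed by (a, b, c). The helper names `program_lut` and `read_lut` are invented for this sketch; it assumes a is the most significant index bit, though both example functions are symmetric so the ordering does not change the contents.

```python
def program_lut(func, n_inputs=3):
    """Compute the LUT contents by evaluating func at every input combination."""
    contents = []
    for index in range(2 ** n_inputs):
        # Unpack the index into input bits, most significant first.
        bits = [(index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        contents.append(func(*bits) & 1)
    return contents

def read_lut(contents, *inputs):
    """Use the inputs as an address into the stored bits."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return contents[index]

xor_lut = program_lut(lambda a, b, c: a ^ b ^ c)
maj_lut = program_lut(lambda a, b, c: (a & b) | (a & c) | (b & c))

print(xor_lut)  # contents for OUT = a XOR b XOR c
print(maj_lut)  # contents for OUT = ab + ac + bc
```

Changing the implemented function only changes the stored bits; the read path stays the same, which is why a LUT can implement any function of its inputs.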

Logic Elements

Look-Up Table (LUT)
- Extremely flexible in implementing logic
  - Can implement any function!
- Larger and slower than just using gates

FPGA Logic Structure

- “Cell” or “logic block”:
  - 1 or more LUTs (generally 4-input)
  - At least one D flip-flop
  - Possibly fast carry logic
- Connect several logic blocks to form circuit

[Figure: a logic block with inputs I1 to I4 feeding a 4-LUT, carry logic between Cin and Cout, and a DFF on the path to OUT]

Xilinx 4000 Configurable Logic Block

Xilinx 4000 FPGA (# of CLBs not to scale)

[Figure: an array of CLBs interleaved with switch matrices and crossed by vertical and horizontal long lines, surrounded by a ring of IOBs]

FPGA Summary

- Allow for complex implementations
- Generally reusable (upgrades/bug-fixes/prototype)
- Low non-recurring engineering (NRE) costs
- Relatively expensive per-unit (10s-100s of $)
- Slower than pure gates (programming points), but FPGAs are normally among the first devices built on the latest process technology
- Newer FPGAs incorporate memories, multipliers, peripherals, and even processors all on the same chip

FPGA Trends

- Hardware specialization
  - Memory block hierarchies
  - I/O interfaces
    - High-speed serial I/O
  - Clock management
  - Hardware for DSP (MAC units)
- Intellectual Property (IP) cores
  - Hard-cores
  - Soft-cores
  - http://www.altera.com/products/ip/ipm-index.html
- Conversion to mask-programmed devices
  - Altera Hard Copy, Xilinx Easy Path

Current Technology Examples...

Xilinx Virtex-5

- Xilinx’s nearly top of the line FPGA
- 65nm process technology
- 550MHz RAM blocks
- 6-input LUTs
- Serial connectivity
  - Ethernet MACs
  - Rocket I/O serial 3.25Gbps
  - PCI Express endpoint
- Enhanced DSP blocks (25x18, 48b accum)
- 1760 pin BGA with 1200 I/O
- EasyPath

Xilinx Virtex-5 Applications

Xilinx Virtex-5 Family

Altera Stratix III

Altera NIOS

Stratix III vs. Virtex-5: http://www.altera.com/literature/wp/wp-01007.pdf

More Current Products

Actel FPGAs
- Flash-based design eliminates configuration time
- Less susceptible to radiation-induced upsets
- Also manufactured in antifuse technology

Mask-Programmable Gate Arrays

- Mask-programmable (MPGAs)
- Fixed logic elements, metal routing added

[Figure: gate array layout. Base cells (transistor / gate) sit at fixed spacing, with metal interconnect placed in channels between cells.]

MPGAs

- Cheap per-unit pricing ($1s-$10s)
- Fast compared to ROMs/PLDs/FPGAs
- Simpler mask than Standard Cell (routing only)
  - Fixed gates available
- High non-recurring engineering (NRE) cost: design time, mask fabrication... $10Ks-$100Ks
  - Best for medium-to-large quantities
- Used for medium-to-high-volume designs, or hardware that must be faster than FPGA

Standard Cells

- Gates and other small structures
- Can also use macroblocks
  - Groups of pre-optimized cells
  - Larger custom-layout structures
  - Better logic density

[Figure: standard cell layout. From: http://www.zuraleff.com/layout]

Standard Cell Layouts

[Figure: rows of standard cells (gate, flip-flop, 1-bit adder, …) at adjustable spacing, with megacells alongside and metal interconnect placed in channels between cells]

IC Layout Styles

Technologies in terms of layout styles:
- Standard Cell: cells (gate, flip-flop, 1-bit adder, …) at adjustable spacing, plus megacells; metal interconnect placed in channels between cells
- Gate Array: base cells (transistor / gate) at fixed spacing; metal interconnect placed in channels between cells

Standard Cells

- Cheap per-unit pricing ($1s-$10s)
- Achieve better logic density than MPGA
- Fast compared to ROMs/PLDs/FPGAs
- High NREs (design time, mask fabrication...): $100Ks-$10Ms
  - More expensive masks than Gate Arrays
- Used for large quantities and/or performance-critical operations

Custom Logic

- Manual layout
- Extremely high NRE
  - Huge design time!
  - Even longer verification time
- Maximum performance and density
- PLD/FPGA physical hardware is custom logic
  - They sell a LOT of them!
  - You don’t have to amortize all of their NRE, just part

Hardware Implementations

- Making the right platform choice is one of the most important decisions for a design project’s success
- There is no one “best” method
- Tradeoffs between cost, speed, time-to-market, upgradeability, power efficiency
- Technological changes are shifting traditional design choices. Engineers must be ready.

Hardware Trends

- Standard Cell & Custom getting more expensive
  - Validation is getting harder with smaller gates and more complex designs, and is not scaling well with Moore’s Law.
- Licensing of IP is being used to counteract NRE
  - “Hard” (layout) and “Soft” (HDL) IP cores
  - ARM architecture a great example

Hardware Trends

- FPGAs are getting faster and bigger
  - Big enough to implement a lot of designs that used to require Standard Cells
  - Lots of built-in IP for connectivity: Ethernet, USB, SATA
- Power is becoming a significant driver
  - Moore’s Law scaling survives for logic density but is dying for total power consumption
  - More computing devices are battery powered, and batteries are not keeping pace with Moore’s Law

Technology Mapping

- Generally part of synthesis
- Use different tools / components based on standard cells vs. FPGA target
- Divides your circuit into basic building blocks.

Tech Mapping: Standard Cells

- Need to select your library
  - Which cells you’re using
  - Which macro-cells / specialized structures
- In this class, we’re using:
  - TSMC 65/45/40 nm cell libraries
- Tech mapping then implements your netlist in terms of the available cells
  - How do you choose?

Tech Mapping: Standard Cells

- Example boolean equation: z = a b c + c d + e
- Example cell library: 2-input NAND, INV
- Resulting tech-mapped circuit:

[Figure: NAND/INV gate network with inputs a, b, c, d, e and output z]
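The slide's gate-level figure is not recoverable, so the sketch below shows one possible mapping of z = a b c + c d + e onto 2-input NAND and INV cells, not necessarily the one drawn on the slide. It applies De Morgan's law twice and then checks the mapped netlist against the original equation for all 32 input combinations.

```python
from itertools import product

def nand2(p, q):
    """2-input NAND cell."""
    return 1 - (p & q)

def inv(p):
    """Inverter cell."""
    return 1 - p

def z_mapped(a, b, c, d, e):
    """One possible NAND2/INV netlist for z = abc + cd + e (5 NAND2, 3 INV)."""
    n1 = nand2(a, b)      # NOT(a b)
    n2 = inv(n1)          # a b
    n3 = nand2(n2, c)     # NOT(a b c)
    n4 = nand2(c, d)      # NOT(c d)
    n5 = nand2(n3, n4)    # a b c + c d
    n6 = inv(n5)          # NOT(a b c + c d)
    n7 = inv(e)           # NOT(e)
    return nand2(n6, n7)  # a b c + c d + e

def z_reference(a, b, c, d, e):
    return (a & b & c) | (c & d) | e

equivalent = all(z_mapped(*v) == z_reference(*v) for v in product((0, 1), repeat=5))
print("mapping matches equation:", equivalent)
```

A real mapper would weigh several such candidate netlists against area and delay figures taken from the cell library.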

Tech Mapping: FPGAs

- Need to know building blocks of the FPGA
  - LUT size (if it uses LUTs)
  - Any special resources (multipliers, RAM blocks)
- Tech mapping then implements your netlist in terms of those building blocks

Tech Mapping: FPGAs

- Example boolean equation: z = a b c + c d + e
- Example basic block: 4-input LUT
- Resulting tech-mapped circuit:
  - LUT #1: y = a b c
  - LUT #2: z = y + c d + e

[Figure: the gate network for z with inputs a, b, c, d, e, divided into two regions labeled LUT #1 and LUT #2]
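The two-LUT decomposition can be checked by building the contents of each 4-input LUT and comparing the cascade with the original five-input equation. The helper names are invented for this sketch, and tying the unused fourth input of LUT #1 to 0 is an assumption.

```python
from itertools import product

def make_lut4(func):
    """Build the 16-entry contents of a 4-input LUT from a Python function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]

def read_lut4(lut, i3, i2, i1, i0):
    """Look up the stored bit addressed by the four inputs."""
    return lut[(i3 << 3) | (i2 << 2) | (i1 << 1) | i0]

LUT1 = make_lut4(lambda a, b, c, unused: a & b & c)     # y = a b c
LUT2 = make_lut4(lambda y, c, d, e: y | (c & d) | e)    # z = y + c d + e

def z_mapped(a, b, c, d, e):
    y = read_lut4(LUT1, a, b, c, 0)   # fourth input of LUT #1 tied to 0
    return read_lut4(LUT2, y, c, d, e)

def z_reference(a, b, c, d, e):
    return (a & b & c) | (c & d) | e

equivalent = all(z_mapped(*v) == z_reference(*v) for v in product((0, 1), repeat=5))
print("two-LUT mapping matches equation:", equivalent)
```

The function has five inputs, so it cannot fit in a single 4-input LUT; splitting off y = a b c leaves exactly four signals (y, c, d, e) for the second LUT.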

Now What?

- So you’ve:
  - Designed your hardware in Verilog.
  - Chosen your hardware implementation (std. cells, FPGA, etc.)
- How do you get from a netlist to silicon?
  - VLSI CAD (“Physical Design”)

VLSI CAD Flow

[Figure: flow chart]
Verified HDL Description
  -> Translation -> Generic Netlist
  -> Technology Mapping (using the Cell Library / FPGA Description) -> Post-Synthesis Netlist
  -> Partition & Floorplan -> Place -> Route
     Std. cells: -> Mask Gen. -> ...To Fab!
     FPGA:       -> Config. Bits -> …Program!

Partitioning & Floorplanning

- Sometimes you have BIG circuits
  - Makes placement take a long time
  - Yields poor results (too large a solution space)
- Use partitioning and floorplanning
  - Partitioning: Divide netlist into partitions
  - Floorplanning: Assign partitions to chip regions
  - Place regions separately
  - Benefit: Small problems are easier to solve well than large ones
- What’s the Disadvantage?

Partitioning Example

[Figure: netlist graph of twelve cells labeled A through L]

How might we choose to form 3 partitions?

Partitioning Example - Bad

[Figure: the twelve cells A through L grouped into three partitions with many connections crossing partition boundaries]

Partitioning

- We want to try to make our partitions as independent as possible.
  - Independent = fewer outside connections
- Why?
  - Want to keep wires short
  - Try to place partitions adjacent to the partitions they interconnect with
  - If we have a lot of interconnections, this may not be easy/possible
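"Fewer outside connections" can be quantified as the cut size: the number of connections whose endpoints land in different partitions. The connectivity of the slide's twelve-cell example is not recoverable, so the edge list below is a hypothetical netlist with three tightly connected clusters, used only to contrast a good and a bad grouping.

```python
# Hypothetical connectivity for cells A..L (not the slide's actual netlist).
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),  # cluster 1
    ("D", "E"),                                                   # bridge
    ("E", "F"), ("E", "G"), ("F", "G"), ("F", "H"), ("G", "H"),  # cluster 2
    ("H", "I"),                                                   # bridge
    ("I", "J"), ("I", "K"), ("J", "K"), ("J", "L"), ("K", "L"),  # cluster 3
]

def assign(groups):
    """Map each cell to the index of the group (partition) that contains it."""
    return {cell: idx for idx, group in enumerate(groups) for cell in group}

def cut_size(edges, partition_of):
    """Count connections that cross a partition boundary."""
    return sum(1 for u, v in edges if partition_of[u] != partition_of[v])

better = assign(["ABCD", "EFGH", "IJKL"])  # follows the clusters
bad = assign(["ADGJ", "BEHK", "CFIL"])     # scatters each cluster

print("better cut size:", cut_size(EDGES, better))
print("bad cut size:", cut_size(EDGES, bad))
```

With this netlist the cluster-aligned grouping cuts only the two bridge connections, while the scattered grouping cuts every connection.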


Partitioning Example - Better

[Figure: the twelve cells A through L grouped into three partitions with fewer connections crossing partition boundaries]

Floorplanning

- OK, so we’ve divided our problem up into partitions
- Now, figure out where partitions should be placed relative to one another
  - Assign partitions to regions of the silicon / FPGA
- Try to avoid long wires between partitions
- Don’t want to have to route wires through too many other partitions
  - Wastes area in those partitions

Floorplanning Example

[Figure: nine partitions numbered 1 through 9 to be assigned to chip regions]

Floorplanning Example

- Try to arrange partitions to minimize cross-partition routing

[Figure: the nine partitions, numbered 1 through 9, arranged in a 3 x 3 grid]

Eat your heart out, Sudoku.

Placement

- Need to assign physical locations to cells/LUTs
  - If partitioning: relative to the partition boundaries
  - Otherwise: relative to the chip boundaries
- Common goal
  - Reduce total wirelength of placed circuit
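The slide does not say how wirelength is measured. One widely used estimate is half-perimeter wirelength (HPWL): for each net, the width plus the height of the bounding box around its pins. The sketch below assumes that metric and uses a hypothetical four-cell netlist to compare two placements.

```python
def hpwl(nets, location):
    """Sum of bounding-box half-perimeters over all nets."""
    total = 0
    for net in nets:
        xs = [location[cell][0] for cell in net]
        ys = [location[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Hypothetical netlist: each tuple lists the cells connected by one net.
NETS = [("A", "B"), ("B", "C", "D"), ("A", "D")]

# Two candidate placements as (x, y) grid coordinates.
compact = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
spread = {"A": (0, 0), "B": (3, 3), "C": (0, 3), "D": (3, 0)}

print("compact placement HPWL:", hpwl(NETS, compact))
print("spread placement HPWL:", hpwl(NETS, spread))
```

A placer would search over assignments like these, keeping moves that lower the estimate.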

Placement

- Standard Cells:
  - Choosing a row for each cell
  - Choosing a location within the row for each cell
- FPGAs:
  - Choosing which physical LUT implements each netlist LUT

Routing

- Have locations for all the cells/LUTs in the netlist
- Now need to connect them together to actually make the circuit
- Different techniques for std. cell vs. FPGA
- Divided into:
  - Global
  - Detailed (local)

Global Routing

- Find a rough path for each net
- Figure out what areas a signal passes through

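Detailed Routing: Std. Cells

- Connect the cells within the global regions
- Common goal: minimize channel width

[Figure: a routing channel whose width is labeled "Channel Width", with net numbers marked at the pins along its two edges]
  1 2 2 4 4 0 3 0 4
  2 4 4 3 0 0 3 3 1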
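If the two rows of numbers in the figure are read as the net numbers at the pins along the two edges of the channel, with 0 meaning no pin (an assumption, following the usual channel-routing convention), the channel density gives a lower bound on the channel width. Density is the largest number of nets whose horizontal spans overlap at any column.

```python
TOP = [1, 2, 2, 4, 4, 0, 3, 0, 4]
BOTTOM = [2, 4, 4, 3, 0, 0, 3, 3, 1]

def net_spans(top, bottom):
    """Return {net: (leftmost column, rightmost column)} over both pin rows."""
    spans = {}
    for col, pins in enumerate(zip(top, bottom)):
        for net in pins:
            if net == 0:      # 0 marks an unused pin position
                continue
            lo, hi = spans.get(net, (col, col))
            spans[net] = (min(lo, col), max(hi, col))
    return spans

def channel_density(top, bottom):
    """Maximum number of net spans crossing any single column."""
    spans = net_spans(top, bottom)
    return max(
        sum(1 for lo, hi in spans.values() if lo <= col <= hi)
        for col in range(len(top))
    )

print(net_spans(TOP, BOTTOM))
print("channel density:", channel_density(TOP, BOTTOM))
```

Vertical ordering constraints between pins in the same column can force more tracks than the density, so this is a bound rather than the final channel width.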

Detailed Routing: FPGAs

- Assign signals in netlist to:
  - Wires
  - Switchbox points
- Fixed set of available resources
  - Can’t “widen” routing channels like Std Cell
- Common goal: Reduce congestion
  - Congestion is the ratio of signals:wires
  - By keeping areas “open”, more likely to be able to route later signals
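The signals:wires ratio can be computed per routing channel to find where the router is running out of resources. The channel names and counts below are hypothetical.

```python
def congestion(signals_per_channel, wires_per_channel):
    """Ratio of routed signals to available wires for each channel."""
    return {
        channel: signals_per_channel.get(channel, 0) / wires
        for channel, wires in wires_per_channel.items()
    }

# Hypothetical FPGA region: every channel has a fixed number of wires.
WIRES = {"ch0": 4, "ch1": 4, "ch2": 4}
SIGNALS = {"ch0": 2, "ch1": 5, "ch2": 4}

ratios = congestion(SIGNALS, WIRES)
overused = [ch for ch, r in ratios.items() if r > 1.0]

print(ratios)
print("channels that cannot be routed as assigned:", overused)
```

A ratio above 1.0 means more signals were assigned to a channel than it has wires, so at least one signal must move; a ratio of exactly 1.0 leaves no room for later signals.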


Detailed Routing: FPGAs

- Frequently start with an “idealized” routing
  - Signals can share wires
- Repeatedly “rip up” and reroute
  - One or more nets (signals)
- Stop when no wires are shared
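The rip-up-and-reroute loop can be illustrated with a toy model in which every net has a short list of precomputed alternative routes, each a set of wire names. This is a deliberately simplified sketch, not the algorithm of any particular tool; the victim selection rule and the data are assumptions.

```python
def rip_up_and_reroute(candidates, max_passes=20):
    """candidates: {net: [route, ...]}, where each route is a tuple of wire names.
    Returns {net: route} with no shared wires, or None if the pass limit is hit."""
    # Idealized start: every net takes its first-choice route, sharing allowed.
    choice = {net: 0 for net in candidates}
    for _ in range(max_passes):
        # Record which nets currently use each wire.
        usage = {}
        for net, idx in choice.items():
            for wire in candidates[net][idx]:
                usage.setdefault(wire, []).append(net)
        shared = {wire: nets for wire, nets in usage.items() if len(nets) > 1}
        if not shared:  # stop when no wires are shared
            return {net: candidates[net][idx] for net, idx in choice.items()}
        # Rip up one net on the first shared wire and move it to its next alternative.
        wire = sorted(shared)[0]
        victim = shared[wire][-1]
        choice[victim] = (choice[victim] + 1) % len(candidates[victim])
    return None

# Hypothetical nets and routes.
CANDIDATES = {
    "n1": [("w1", "w2"), ("w3", "w4")],
    "n2": [("w2", "w5"), ("w6",)],
    "n3": [("w1",), ("w7",)],
}

routing = rip_up_and_reroute(CANDIDATES)
print(routing)
```

Production routers choose which nets to rip up using cost functions that penalize congested wires, rather than the fixed order used here.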

Final Steps: Std. Cells

- Generate “masks” for each layer, indicating where the material in that layer goes
  - Have cell locations, cell library has cell “design”
  - Plus metal layers created during the routing phase
- Send to chip fabrication foundry

Final Steps: FPGAs

- Generate the “configuration bitstream”
  - The series of 1’s and 0’s that determine the FPGA’s function
  - Tools determine these values based on:
    - LUT contents
    - Routing resource usage
- Load the configuration onto the FPGA
  - Also called “programming” or “configuring”

Conclusion

- Synthesis isn’t the end of the process!
  - Many steps after it
- Choose target implementation
  - Examine cost/performance tradeoffs
- Use CAD tools to implement synthesized circuit on FPGA or std. cells
  - Optionally partition & floorplan
  - Place & Route
  - Generate bitstream or layout masks
- See ECE 556 for more details on CAD algorithms

3.125 Gb/s Transceiver

Xilinx Digital Clock Manager (DCM)

- Eliminate clock skew using Delay-Locked Loop (DLL)
  - Monitors clock skew on output and corrects
- Frequency doubling, multiphase clocks
- Fractional Digital Frequency Synthesizer (DFS): f_OUT = (M/N) x f_IN

Input/Output Block (IOB)

- Slew rate and drive strength control
- Pull-up, pull-down and keeper
- DDR signals
- Controlled-Z input/output
- Boundary scan
