Prefix - Chalmers

5/11/2008

1

Lava 4 (relevant to take home exam)

Stepping back to see the bigger picture

Where can more info. be found?

What are the hot research topics?


Prefix

Given inputs x1, x2, x3 … xn

Compute x1, x1*x2, x1*x2*x3, … , x1*x2*…*xn

Where * is an arbitrary associative (but not necessarily commutative) operator
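For reference, the same computation on plain Haskell lists is what the standard `scanl1` does, serially. This is a minimal sketch, not Lava code; the names `prefix`, `sums` and `concats` are chosen here for illustration only.

```haskell
-- Prefix (scan) of a list under an associative operator.
-- Plain Haskell, no Lava: scanl1 from the Prelude does exactly this.
prefix :: (a -> a -> a) -> [a] -> [a]
prefix = scanl1

-- Addition: associative and commutative.
sums :: [Int]
sums = prefix (+) [1 .. 5]

-- List concatenation: associative but NOT commutative,
-- so the order of the operands matters.
concats :: [String]
concats = prefix (++) ["a", "b", "c"]
```

Here `sums` is `[1,3,6,10,15]` and `concats` is `["a","ab","abc"]`.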


Why interesting?

Microprocessors contain LOTS of parallel prefix circuits

not only binary and FP adders

address calculation

priority encoding etc.

Overall performance depends on making them fast

But they should also have low power consumption...

Parallel prefix is a good example of a connection pattern for which it is interesting to do better synthesis


Serial prefix

least significant … most significant

inputs n=8, depth d=7, size s=7 (number of ops)

Pictures generated by symbolic evaluation of Lava descriptions

Style is specific to parallel prefix


serr _ [a] = [a]
serr op (a:b:bs) = a:cs
  where
    c = op(a,b)
    cs = serr op (c:bs)

*Main> simulate (serr plus) [1..10]
[1,3,6,10,15,21,28,36,45,55]
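The same serial network can be tried without Lava by running it on ordinary values. This is a sketch under assumptions: `plus` is defined here as plain integer addition on a pair (standing in for the Lava component), and the empty-list case is an addition that the slide does not have.

```haskell
-- Serial prefix on plain Haskell values; 'op' takes a pair, as on the slide.
serr :: ((a, a) -> a) -> [a] -> [a]
serr _  []       = []                 -- added case: the slide assumes >= 1 input
serr _  [a]      = [a]
serr op (a:b:bs) = a : serr op (op (a, b) : bs)

-- Stand-in for the Lava 'plus' component (an assumption for this sketch).
plus :: (Int, Int) -> Int
plus = uncurry (+)

example :: [Int]
example = serr plus [1 .. 10]
```

`example` evaluates to the same list as the simulation result shown on the slide.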

Sklansky


32 inputs, depth 5, 80 operators


skl _ [a] = [a]
skl op as = init los ++ ros'
  where
    (los,ros) = (skl op las, skl op ras)
    ros' = fan op (last los : ros)
    (las,ras) = halveList as
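A self-contained sketch of the Sklansky recursion on plain values follows. The helpers `halveList` and `fan` are not shown on the slide, so the definitions below are assumptions about their behaviour; `sklSize` is a hypothetical helper added here to count operators.

```haskell
-- Assumed helper: split a list in two; for odd lengths the left part is smaller.
halveList :: [a] -> ([a], [a])
halveList xs = splitAt (length xs `div` 2) xs

-- Assumed helper: combine the first element with each of the others,
-- passing the first element through unchanged.
fan :: ((a, a) -> a) -> [a] -> [a]
fan _  []     = []
fan op (x:ys) = x : [op (x, y) | y <- ys]

-- Sklansky: solve both halves, then fan the last left output over the right half.
skl :: ((a, a) -> a) -> [a] -> [a]
skl _  []  = []                       -- added case for the empty list
skl _  [a] = [a]
skl op as  = init los ++ ros'
  where
    (las, ras) = halveList as
    (los, ros) = (skl op las, skl op ras)
    ros'       = fan op (last los : ros)

-- Number of operators used by 'skl' on n inputs.
sklSize :: Int -> Int
sklSize n
  | n <= 1    = 0
  | otherwise = sklSize l + sklSize r + r   -- the fan adds one op per right wire
  where
    l = n `div` 2
    r = n - l
```

With these definitions `sklSize 32` is 80, which agrees with the operator count quoted for the 32-input picture.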


Brent Kung

fewer ops, at cost of being deeper. Fanout only 2

BK recursive pattern

P is another half size network operating on only the thick wires


Ladner Fischer

NOT the same as Sklansky; many books and papers are wrong about this (including slides from the Digital Circuit Design course)

Question

How do we design fast low power prefix networks?


Answer

Generalise the above recursive constructions

Use dynamic programming to search for a good solution

Use Wired to increase the accuracy of power and delay estimates (see later lecture by Emil)


BK recursive pattern


P is another half size network operating on only the thick wires

This is an alternative view to the ”forwards and backwards trees” that some of you saw in Jeppson’s course
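The recursive pattern can be sketched on plain lists as follows. This is one reading of the picture, not the lecture's Lava code: adjacent inputs are paired up to give the thick wires, `bk` is called recursively as the half size network P, and the remaining outputs are filled in with one operator each. The names `bk` and `everyOther` are hypothetical.

```haskell
-- Take elements 1, 3, 5, ... of a list.
everyOther :: [a] -> [a]
everyOther (x:_:rest) = x : everyOther rest
everyOther xs         = xs

-- Brent-Kung-style recursion (a sketch, assuming the reading described above).
bk :: ((a, a) -> a) -> [a] -> [a]
bk _  []  = []
bk _  [a] = [a]
bk op as  = weave odds' ps
  where
    odds  = everyOther as                       -- inputs 1, 3, 5, ...
    evens = everyOther (drop 1 as)              -- inputs 2, 4, 6, ...
    thick = zipWith (curry op) odds evens       -- the thick wires: x1*x2, x3*x4, ...
    ps    = bk op thick                         -- P: half size network on thick wires
    -- outputs 3, 5, ...: previous even output combined with the odd input
    odds' = head odds : zipWith (curry op) ps (drop 1 odds)
    -- interleave odd-position and even-position outputs
    weave (x:xs) (y:ys) = x : y : weave xs ys
    weave xs     []     = xs
    weave []     ys     = ys
```

Because the operands are always combined left to right, the sketch also works for non-commutative operators such as list concatenation.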


BK recursive pattern generalised

Each S is a serial network like that shown earlier

4 2 3 … 4

This sequence of numbers determines how the outer ”layer” looks


4 2 3 … 4

4 2 3 … 4

-1 +1

sequence for widths of fans at bottom is closely related


4 2 3 … 4

3 2 3 … 5

sequence for widths of fans at bottom is closely related
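Read literally, the -1 / +1 annotation says that the bottom fan widths are the top widths with the first entry decreased by one and the last increased by one. A minimal sketch of that reading; the function name `bottomWidths` is hypothetical, and the treatment of sequences with fewer than two entries is an assumption.

```haskell
-- Derive the bottom fan widths from the top serial-network widths:
-- first entry minus one, last entry plus one, the rest unchanged.
bottomWidths :: [Int] -> [Int]
bottomWidths []     = []
bottomWidths [w]    = [w]              -- assumption: -1 and +1 cancel out
bottomWidths (w:ws) = (w - 1) : init ws ++ [last ws + 1]
```

For example, `bottomWidths [4,2,3,4]` gives `[3,2,3,5]`, matching the two sequences shown.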


4 2 3 … 4

So just look at all possibilities for this sequence

and for each one find the best possibility for the smaller P

Then pick best overall!

Dynamic programming

Search!

need a measure function (e.g. number of operators)

Very similar to a ”shortest paths” algorithm
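To make the search idea concrete, here is a toy version over a much smaller family of networks than the one in the lecture. The family is an assumption made for this sketch: a network is either a single wire, or a left and a right sub-network followed by a fan from the last left output (as in Sklansky, but with an arbitrary split point). The measure function is the number of operators, and a lazily filled array provides the memoisation. The name `minOps` is hypothetical.

```haskell
import Data.Array (Array, array, (!))

-- minOps n d: fewest operators for n inputs within depth d in the toy
-- family described above, or Nothing if no such network fits.
minOps :: Int -> Int -> Maybe Int
minOps n d
  | n < 1 || d < 0 = Nothing
  | otherwise      = table ! (n, d)
  where
    -- Lazily filled table: each entry is computed at most once (memoisation).
    table :: Array (Int, Int) (Maybe Int)
    table = array ((1, 0), (n, d))
              [ ((i, j), best i j) | i <- [1 .. n], j <- [0 .. d] ]

    best :: Int -> Int -> Maybe Int
    best 1 _ = Just 0          -- base case: a single wire needs no operators
    best _ 0 = Nothing         -- more than one input cannot fit in depth 0
    best i j = pick
      [ l + r + (i - k)        -- measure: ops left + ops right + ops in the fan
      | k <- [1 .. i - 1]      -- try every split point
      , Just l <- [table ! (k, j - 1)]
      , Just r <- [table ! (i - k, j - 1)]
      ]

    -- Pick the best candidate, if any candidate fits.
    pick :: [Int] -> Maybe Int
    pick [] = Nothing
    pick xs = Just (minimum xs)
```

In this family `minOps 32 5` is `Just 80` (only the even split fits at depth 5, which is Sklansky), `minOps 8 7` is `Just 7` (the serial network), and `minOps 33 5` is `Nothing` because 33 inputs cannot fit in depth 5.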


wsoE f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([d],_,w) = trywire ([d],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where

. . . .

The real code!


f1 is the measure function being optimised for


g is max width of small S and F networks. Controls fanout.


context (is,xs,w): delays in, wire numbers (positions) in, allowed depth


use memoisation to avoid expensive recomputation


base case: single wire


Fail if it is simply impossible to fit a prefix network in the available depth


For each candidate sequence: build the resulting network (where the call of (prefix f) gives the best network for the recursive call inside)

(Needed to think hard about controlling the size of the search space)


parpre f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm




Finally, pick the best among all these candidates


Result when minimising number of ops, depth 6, 33 inputs, fanout 7

This network is Depth Size Optimal (DSO)

depth + number of ops = 2 × (number of inputs) - 2 (known to be the smallest possible number of ops for the given depth and number of inputs)

6 + 58 = 2*33 – 2
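The DSO condition is simple enough to check mechanically. A minimal sketch; the function name `isDSO` and its argument order are chosen here for illustration.

```haskell
-- Depth Size Optimal check: depth + number of ops = 2 * inputs - 2.
isDSO :: Int   -- ^ number of inputs
      -> Int   -- ^ depth
      -> Int   -- ^ number of operators
      -> Bool
isDSO inputs depth ops = depth + ops == 2 * inputs - 2
```

`isDSO 33 6 58` holds for the network above. By contrast, the 32-input Sklansky network (depth 5, 80 operators) does not satisfy the equation, since 5 + 80 is well above 2*32 - 2 = 62.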


64 inputs, depth 8, size 118 (also DSO)

BUT not min. depth.

We need to move away from DSO if we want shallow networks

A further generalisation


parpre1 f1 f2 g m ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([],_,w) = trywire ([],w)
        pm ([i],_,w) = trywire ([i],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC1 ds (prefix f) (prefix f2) | ds <- topds1 g h m lis]






extra base case for 0 inputs






now there are 2 recursive calls

Result

When minimising no. of ops: gives same as Ladner Fischer for 2^n inputs, depth n,

considerably fewer ops and lower fanout elsewhere (non power of 2, deeper)

Translates into low power plus decent speed when exported to Design Compiler


Link to Wired allows more accurate estimates. Can then explore design space


Can also export to Cadence SoC Encounter


Wired

Start with Lava-like description and then gradually add placement info. + wiring ”guides”

Can still use our bag of programming tricks

(still embedded in Haskell)

Quick but relatively accurate design exploration

See lecture by Emil on Thursday


Obvious questions

This is very low level. What about higher up, earlier in the design?

(Tentative assertion: these were general programming idioms with possible application at other levels of abstraction.)

What about the cases when such a structural approach is inappropriate?

Can we make refinement work?

Can we design appropriate GENERIC verification methods?


Putting the designer in control

Connection patterns are essential first step (and give some layout awareness when wanted)

We write circuit generators rather than circuit descriptions. Everything is done behind the scenes by symbolic evaluation. Full power of Haskell is available to the user (but we have some useful idioms to reduce the fear).

Circuit generators are short and sweet and LOOK LIKE circuit descriptions.


It’s all about programming

Non-standard interpretation used after generation (as we have long done) and now also to guide synthesis

Clever circuits are a good idiom. Can control choice of components, wiring and topology. Greatly increases the expressive power of the connection patterns approach.

Having a full functional language available is great once one has had some practice. More idioms to be discovered

Ideas compatible with Intel’s IDV


We can’t only think about function

Clever circuits give a way to allow non-functional properties to influence design (even early on). Makes blocks context sensitive.

Vital as we move to deep sub-micron

Separation of concerns becoming less and less possible

First experiments are (and will be) about module generation

Remains to be seen if there are applications at higher levels

Hopefully, a project on DSP Algorithm Design with Ericsson will explore this


The Big Picture (Design and Verification Languages) (see chapter in e-Book)

VHDL Verilog

CUML


The Big Picture (Languages)

VHDL Verilog

CUML


Intel

IDV (Seger)

Forte (Intel’s FV system)

IBM

SystemML (now called HDML, on SourceForge)

Masters projects possible

Behavioural Lava (York)

Lava + Wired

etc.

Bluespec SV

Lustre, Esterel

Cryptol


The Big Picture (Verification methods) (see course intro., lectures by Seger and Kunz)

Equivalence Checking (formal)

Simulation

Property Checking

Formal


Kunz (Infineon, Siemens, Bosch… OneSpin)

processor and SoC verification

SAT-based

Extremely impressive!

see also work at companies like NVIDIA, Freescale, … (see panel at FMCAD 2007 (links page))

A problem is that there is a lot of unpublished work….


Intel (Seger’s lecture)

Forte (STE)

niches (such as Floating Point Arith.)

IBM Sixth Sense

combines formal and semi-formal

emphasises scalability and automation

see great presentation by Baumgartner from FMCAD 2006 (links page)

Hot research topics

Coverage (OneSpin look to have something very interesting, but it is not public)

Methodology, Finding new FV ”recipes”

Moving up in abstraction levels

Satisfiability Modulo Theories (SMT), First Order Logic

How to design (and verify) complete systems

has become harder because of multicore

Getting control of non-functional properties (particularly power consumption)

Hot research topics

Parallelisation of EDA algorithms

Protocol verification

Increasing automation of FV

(e.g. transformation-based verification à la Sixth Sense)

how to build and use verification IP

reuse

Post-silicon verification


You should think about

The two different design flows that you have seen

What was good and bad about them

YOUR opinions based on your experience (which is influenced by previous expertise)

Formal Verification

evidence about its use (suitable niches, module verification)

limitations (a main one being scalability)

what it can give when it works
