
Prefix - Chalmers...5/11/2008 2 Why interesting? Microprocessors contain LOTS of parallel prefix circuits not only binary and FP adders address calculation priority encoding etc. Overall


5/11/2008


Lava 4 (relevant to take home exam)

Stepping back to see the bigger picture

Where can more info. be found?

What are the hot research topics?


Prefix

Given inputs x1, x2, x3 … xn

Compute x1, x1*x2, x1*x2*x3, … , x1*x2*…*xn

Where * is an arbitrary associative (but not necessarily commutative) operator
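
As a reference point, the specification above can be sketched in plain Haskell with the Prelude function scanl1. The name prefixSpec is a hypothetical helper introduced here for illustration and is not part of Lava. String concatenation stands in for an operator that is associative but not commutative.

```haskell
-- Reference specification of prefix (scan) for an associative operator.
-- prefixSpec is an illustrative name, not a Lava function.
prefixSpec :: (a -> a -> a) -> [a] -> [a]
prefixSpec = scanl1

-- Addition: associative and commutative.
sums :: [Int]
sums = prefixSpec (+) [1 .. 10]

-- String concatenation: associative but NOT commutative, so the
-- left-to-right order of the inputs must be preserved.
joined :: [String]
joined = prefixSpec (++) ["x1", "x2", "x3"]
```

Any prefix network discussed later should agree with prefixSpec on every input list, which makes it a convenient oracle for testing.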


Why interesting?

Microprocessors contain LOTS of parallel prefix circuits

not only binary and FP adders

address calculation

priority encoding etc.

Overall performance depends on making them fast

But they should also have low power consumption...

Parallel prefix is a good example of a connection pattern for which it is interesting to do better synthesis


Serial prefix

[Figure: serial prefix network, wires ordered from least significant to most significant]

inputs n=8, depth d=7, size s=7 (number of ops)

Pictures generated by symbolic evaluation of Lava descriptions. Style is specific to parallel prefix.


serr _ [a] = [a]
serr op (a:b:bs) = a:cs
  where
    c = op(a,b)
    cs = serr op (c:bs)

*Main> simulate (serr plus) [1..10]
[1,3,6,10,15,21,28,36,45,55]


Sklansky

32 inputs, depth 5, 80 operators


skl _ [a] = [a]
skl op as = init los ++ ros'
  where
    (los,ros) = (skl op las, skl op ras)
    ros' = fan op (last los : ros)
    (las,ras) = halveList as
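
To try the Sklansky recursion without Lava, here is a minimal sketch on ordinary Haskell lists. The definitions of halveList and fan below are assumptions about how those helpers behave (split in the middle; combine the first element with each of the others), and the empty-list case and depthOf are additions made for this illustration. depthOf uses a simple non-standard interpretation: every wire carries its depth, and an operator produces one more than the deeper of its two inputs.

```haskell
-- Assumed behaviour of the helper: split a list in the middle.
halveList :: [a] -> ([a], [a])
halveList xs = splitAt (length xs `div` 2) xs

-- Assumed behaviour of fan: combine the first wire with every other wire.
fan :: ((a, a) -> a) -> [a] -> [a]
fan _  []       = []
fan op (a : as) = a : [op (a, b) | b <- as]

-- Sklansky recursion on plain lists (empty case added for totality).
skl :: ((a, a) -> a) -> [a] -> [a]
skl _  []  = []
skl _  [a] = [a]
skl op as  = init los ++ ros'
  where
    (las, ras) = halveList as
    (los, ros) = (skl op las, skl op ras)
    ros'       = fan op (last los : ros)

-- Depth of the network on n >= 1 inputs: start every wire at depth 0
-- and let each operator return 1 + the larger input depth.
depthOf :: Int -> Int
depthOf n = maximum (skl (\(a, b) -> 1 + max a b) (replicate n 0))
```

With these assumptions, depthOf 32 evaluates to 5, in line with the depth quoted for the 32-input Sklansky network.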


Brent Kung

Fewer ops, at the cost of being deeper. Fanout only 2.

BK recursive pattern

P is another half size network operating on only the thick wires
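
The recursive pattern can be sketched on plain Haskell lists as follows. This is an illustration under assumptions, not the Lava description used in the course: the names brentKung, pairUp, everyOther and weave are hypothetical, and the operator is curried. The first layer combines adjacent pairs (producing the thick wires), P is the recursive call on those results, and a final layer gives each remaining odd-position input one more operator.

```haskell
-- List-level sketch of the Brent-Kung recursion (illustrative names).
brentKung :: (a -> a -> a) -> [a] -> [a]
brentKung _  []  = []
brentKung _  [a] = [a]
brentKung op xs@(x1 : _) = x1 : weave zs os
  where
    -- First layer: combine adjacent pairs; results are the thick wires.
    ys = pairUp xs
    -- P: a half-size prefix network operating on the thick wires only.
    zs = brentKung op ys
    -- Inputs x3, x5, ... that each need one final operator.
    os = everyOther (drop 2 xs)

    pairUp (a : b : rest) = op a b : pairUp rest
    pairUp _              = []

    everyOther (a : _ : rest) = a : everyOther rest
    everyOther rest           = rest

    -- Outputs after x1: z1, z1*x3, z2, z2*x5, ...
    weave (z : zs') (o : os') = z : op z o : weave zs' os'
    weave zs' _               = zs'
```

Each output of P feeds at most one operator in the final layer in addition to being an output itself, which is where the small fanout comes from.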


Ladner Fischer

NOT the same as Sklansky; many books and papers are wrong about this (including slides from the Digital Circuit Design course)

Question

How do we design fast low power prefix networks?


Answer

Generalise the above recursive constructions

Use dynamic programming to search for a good solution

Use Wired to increase accuracy of power and delay estimations (see later lecture by Emil)

BK recursive pattern

P is another half size network operating on only the thick wires.

This is an alternative view to the ”forwards and backwards trees” that some of you saw in Jeppson’s course.


BK recursive pattern generalised

Each S is a serial network like that shown earlier

4 2 3 … 4

This sequence of numbers determines how the outer ”layer” looks


Top sequence: 4 2 3 … 4

Apply -1 to the first entry and +1 to the last entry:

Bottom sequence: 3 2 3 … 5

The sequence for the widths of the fans at the bottom is closely related to the top sequence.
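
The relation between the two sequences can be written down as a small function. This is a sketch that reads the -1 and +1 annotations as applying to the first and last entries of the top sequence, which is what the 4 2 3 … 4 and 3 2 3 … 5 example suggests. The name fanWidths is hypothetical, and the single-entry case is an assumption (the two adjustments cancel).

```haskell
-- Derive the bottom (fan) widths from the top (serial network) widths:
-- first entry minus one, last entry plus one, middle entries unchanged.
fanWidths :: [Int] -> [Int]
fanWidths []       = []
fanWidths [w]      = [w]   -- assumption: -1 and +1 cancel on one block
fanWidths (w : ws) = (w - 1) : init ws ++ [last ws + 1]
```

For example, fanWidths [4,2,3,4] gives [3,2,3,5]. Both sequences sum to the same total, since they partition the same set of wires.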


4 2 3 … 4

So just look at all possibilities for this sequence

and for each one find the best possibility for the smaller P

Then pick best overall!

Dynamic programming

Search!

need a measure function (e.g. number of operators)

Very similar to a ”shortest paths” algorithm
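
The search idea can be illustrated on a much smaller design space. The sketch below is NOT the search used in the lecture: as a stated assumption, a network on n wires is either a single wire, or a left block of k wires and a right block of n-k wires followed by a fan that combines the last left output with every right output. The measure function is the number of operators, and a lazily built table plays the role of memoisation. All names (minOps, entry, pickBest) are hypothetical.

```haskell
-- Cost under the measure function "number of operators".
-- Nothing means that no network fits in the allowed depth.
type Cost = Maybe Int

-- minOps n d: fewest operators for n >= 0 inputs within depth d >= 0,
-- searching only the simplified split-and-fan family described above.
minOps :: Int -> Int -> Cost
minOps n d = look n d
  where
    -- Lazy table: each entry is evaluated at most once (memoisation).
    table :: [[Cost]]
    table = [[entry i h | h <- [0 .. d]] | i <- [0 .. n]]

    look i h = table !! i !! h

    entry :: Int -> Int -> Cost
    entry i h
      | i <= 0    = Nothing   -- no network on zero wires in this sketch
      | i == 1    = Just 0    -- base case: single wire
      | h == 0    = Nothing   -- several wires but no depth left
      | otherwise = pickBest
          [ (\l r -> l + r + (i - k)) <$> look k (h - 1) <*> look (i - k) (h - 1)
          | k <- [1 .. i - 1] ]

    -- Keep the cheapest candidate that did not fail.
    pickBest :: [Cost] -> Cost
    pickBest cands = case [c | Just c <- cands] of
      [] -> Nothing
      cs -> Just (minimum cs)
```

Within this family, minOps 8 7 is Just 7 (the serial network), minOps 32 5 is Just 80 (the balanced split at every level, as in Sklansky), and minOps 32 4 is Nothing because 32 wires cannot be handled in depth 4.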


wsoE f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([d],_,w) = trywire ([d],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where
            . . . .

The real code!


f1 is the measure function being optimised for


g is max width of small S and F networks. Controls fanout.


context (is,xs,w): delays in, wire numbers (positions) in, allowed depth


use memoisation to avoid expensive recomputation


base case: single wire


Fail if it is simply impossible to fit a prefix network in the available depth
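
The reason such a check is sound: with two-input operators, the last output depends on all n inputs, so a network of depth h can serve at most 2^h wires. A minimal sketch of the check follows. It ignores input delays, which the real code takes into account through maxd, and the names fits and minDepth are hypothetical.

```haskell
-- Can n wires possibly be handled within depth h (all inputs at delay 0)?
-- Mirrors the guard "2^h < length is = Fail", without input delays.
fits :: Int -> Int -> Bool
fits h n = 2 ^ h >= n

-- Smallest depth that can fit n >= 1 wires, i.e. ceiling (log2 n).
minDepth :: Int -> Int
minDepth n = length (takeWhile (< n) (iterate (* 2) 1))
```

For instance, minDepth 32 is 5 while minDepth 33 is 6, so a 33-input network can never be built in depth 5.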


For each candidate sequence: build the resulting network (where the call of (prefix f) gives the best network for the recursive call inside). (Needed to think hard about controlling the size of the search space.)


parpre f1 g ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([d],_,w) = trywire ([d],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC ds (prefix f) | ds <- topds g h (length is)]
          where
            . . . .

The real code!

Finally, pick the best among all these candidates


Result when minimising number of ops, depth 6, 33 inputs, fanout 7

This network is Depth Size Optimal (DSO)

depth + number of ops = 2(number of inputs)-2 (known to be smallest possible no. ops for given depth, inputs)

6 + 58 = 2*33 – 2
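
The DSO condition is easy to check mechanically. The helper below is a hypothetical name introduced for illustration. It simply restates the equation depth + number of ops = 2(number of inputs) - 2.

```haskell
-- Depth-size optimality test: depth + ops == 2 * inputs - 2.
isDSO :: Int -> Int -> Int -> Bool
isDSO depth ops inputs = depth + ops == 2 * inputs - 2
```

isDSO 6 58 33 holds for the network above. By contrast, the 32-input Sklansky network (depth 5, 80 operators) gives 85 rather than 62, so it is not DSO.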


64 inputs, depth 8, size 118 (also DSO)

BUT not min. depth.

We need to move away from DSO if we want shallow networks

A further generalisation


parpre1 f1 f2 g m ctx = getans (error "no fit") (prefix f1 ctx)
  where
    prefix f = memo pm
      where
        pm ([],_,w) = trywire ([],w)
        pm ([i],_,w) = trywire ([i],w)
        pm (is,_,w) | 2^h < length is = Fail
          where h = maxd(is,w)
        pm (is,xs,w) = ((bestOnE xs is f).dropFail)
                         [wrpC1 ds (prefix f) (prefix f2) | ds <- topds1 g h m lis]


extra base case for 0 inputs


now there are 2 recursive calls

Result

When minimising no. of ops: gives same as Ladner Fischer for 2^n inputs, depth n,

considerably fewer ops and lower fanout elsewhere (non power of 2, deeper)

Translates into low power plus decent speed when exported to Design Compiler


Link to Wired allows more accurate estimates. Can then explore design space


Can also export to Cadence SoC Encounter


Wired

Start with Lava-like description and then gradually add placement info. + wiring ”guides”

Can still use our bag of programming tricks

(still embedded in Haskell)

Quick but relatively accurate design exploration

See lecture by Emil on Thursday


Obvious questions

This is very low level. What about higher up, earlier in the design?

(Tentative assertion: these were general programming idioms with possible application at other levels of abstraction.)

What about the cases when such a structural approach is inappropriate?

Can we make refinement work?

Can we design appropriate GENERIC verification methods?


Putting the designer in control

Connection patterns are essential first step (and give some layout awareness when wanted)

We write circuit generators rather than circuit descriptions. Everything is done behind the scenes by symbolic evaluation. Full power of Haskell is available to the user (but we have some useful idioms to reduce the fear).

Circuit generators are short and sweet and LOOK LIKE circuit descriptions.


It’s all about programming

Non-standard interpretation used after generation (as we have long done) and now also to guide synthesis

Clever circuits are a good idiom. Can control choice of components, wiring and topology. Greatly increase expressive power of the connection patterns approach.

Having a full functional language available is great once one has had some practice. More idioms to be discovered.

Ideas compatible with Intel’s IDV


We can’t only think about function

Clever circuits give a way to allow non-functional properties to influence design (even early on). Makes blocks context sensitive.

Vital as we move to deep sub-micron

Separation of concerns becoming less and less possible

First experiments are (and will be) about module generation

Remains to be seen if there are applications at higher levels

Hopefully, a project on DSP Algorithm Design with Ericsson will explore this


The Big Picture (Design and Verification Languages)

(see chapter in e-Book)

VHDL Verilog

CUML


The Big Picture (Languages)

VHDL Verilog

CUML


Intel

IDV (Seger)

Forte (Intel’s FV system)

IBM

SystemML (now called HDML, on sourceforge)

Masters projects possible

Behavioural Lava (York)

Lava + Wired

etc.

Bluespec SV

Lustre, Esterel

Cryptol


The Big Picture (Verification methods)

(see course intro., lectures by Seger and Kunz)

Equivalence Checking (formal)

Simulation

Property Checking

Formal


Kunz (Infineon, Siemens, Bosch… OneSpin)

processor and SoC verification

SAT-based

Extremely impressive!

see also work at companies like NVIDIA, Freescale, … (see panel at FMCAD 2007 (links page))

A problem is that there is a lot of unpublished work…


Intel (Seger’s lecture): Forte (STE); niches (such as Floating Point Arith.)

IBM Sixth Sense: combines formal and semi-formal; emphasises scalability and automation

see great presentation by Baumgartner from FMCAD 2006 (links page)

Hot research topics

Coverage (OneSpin look to have something very interesting, but it is not public)

Methodology, Finding new FV ”recipes”

Moving up in abstraction levels

Satisfiability Modulo Theories (SMT), First Order Logic

How to design (and verify) complete systems

has become harder because of multicore

Getting control of non-functional properties (particularly power consumption)


Hot research topics

Parallelisation of EDA algorithms

Protocol verification

Increasing automation of FV

(e.g. transformation-based verification à la Sixth Sense)

how to build and use verification IP

reuse

Post-silicon verification


You should think about

The two different design flows that you have seen

What was good and bad about them

YOUR opinions based on your experience (which is influenced by previous expertise)

Formal Verification

evidence about its use (suitable niches, module verification)

limitations (a main one being scalability)

what it can give when it works
