Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4) Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu


Reconfigurable Computing (EN2911X, Fall07)

Lecture 16: Application-Driven Hardware Acceleration (1/4)

Prof. Sherief Reda, Division of Engineering, Brown University

http://ic.engin.brown.edu


Fast Fourier transform

• One of the most important subroutines in scientific computing

• Used in many applications, including signal and image processing, solution of differential equations, multiplication of polynomials, and data compression

• One of the most widely implemented hardware accelerators


Discrete Fourier transform

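The DFT takes the inputs $x_0, x_1, \ldots, x_{N-1}$ and produces the outputs $X_0, X_1, \ldots, X_{N-1}$:

$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N}$$

It maps a set of input points to another set of output points. The operation is reversible.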
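The reversibility claim can be checked numerically. The following is a minimal sketch, assuming the common convention of a negative exponent in the forward transform and a positive exponent plus a 1/N scale in the inverse; the function names `dft` and `idft` are illustrative and not from the lecture.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_i x[i] * exp(-j*2*pi*i*k/N) (assumed sign convention)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT: flips the exponent sign and scales by 1/N."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * i * k / n) for k in range(n)) / n
            for i in range(n)]

x = [1, 2, 3, 4]
X = dft(x)
x_back = idft(X)   # matches x up to floating-point rounding
```

Applying `idft` to the output of `dft` recovers the original samples, which is what "reversible" means here.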


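Roots of unity

[Figure: unit circle in the complex plane with real and imaginary axes, marking the points (1, 0), (0, j), (-1, 0), and (0, -j)]

• What are the Nth roots of unity? If N = 8 then we have

$$e^{j 2\pi \cdot 0/8},\ e^{j 2\pi \cdot 1/8},\ e^{j 2\pi \cdot 2/8},\ e^{j 2\pi \cdot 3/8},\ e^{j 2\pi \cdot 4/8},\ e^{j 2\pi \cdot 5/8},\ e^{j 2\pi \cdot 6/8},\ e^{j 2\pi \cdot 7/8}$$

Define $W_N = e^{-j 2\pi / N}$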
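As a quick numerical illustration, the sketch below lists the eight 8th roots of unity and the powers of $W_8$. It assumes the negative-exponent definition $W_N = e^{-j 2\pi / N}$; the variable names are illustrative.

```python
import cmath

N = 8
# The N-th roots of unity, walking counter-clockwise around the unit circle.
roots = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]

# Assumed definition: W_N = exp(-j*2*pi/N). Its powers visit the same
# N points, but clockwise.
W = cmath.exp(-2j * cmath.pi / N)
powers = [W ** k for k in range(N)]
```

Each entry raised to the 8th power equals 1 up to rounding, and `powers[k]` coincides with `roots[(N - k) % N]`, so either sign convention yields the same set of points.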


Calculating the DFT

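$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} = \sum_{i=0}^{N-1} x_i \, W_N^{ik}$$

In matrix form:

$$
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & W_N^{1} & W_N^{2} & W_N^{3} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & W_N^{6} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & W_N^{9} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & W_N^{3(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{N-1} \end{bmatrix}
$$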

How many arithmetic (+ and *) operations do we need to calculate the DFT?
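One way to count, assuming every entry of the matrix costs a complex multiplication (including the trivial multiplications by 1): each of the N outputs needs N multiplications and N-1 additions, so

$$\underbrace{N \cdot N}_{\text{multiplications}} + \underbrace{N (N-1)}_{\text{additions}} = 2N^2 - N = O(N^2)$$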


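Computing the DFT using the FFT

• How can we do better? Fast Fourier Transform (FFT)

$$
\begin{aligned}
X_k &= \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi (2i) k / N} + \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi (2i+1) k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)} + \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)} \, e^{-j 2\pi k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)} + W_N^{k} \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)} \\
    &= X^e_k + W_N^{k} X^o_k
\end{aligned}
$$

Here $X^e_k$ is the DFT of the even indices and $X^o_k$ is the DFT of the odd indices.

The sum of an N-point DFT has been broken into two N/2-point DFTs.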


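Example when N=8

Objective: Compute X0, X1, …, X7 given x0, x1, …, x7

[Figure: two "magic boxes", each a 4-point DFT. The first takes the even-indexed inputs x0, x2, x4, x6 and produces $X^e_0, X^e_1, X^e_2, X^e_3$; the second takes the odd-indexed inputs x1, x3, x5, x7 and produces $X^o_0, X^o_1, X^o_2, X^o_3$. The twiddle factors $W_8^0, W_8^1, \ldots, W_8^7$ combine these into the outputs X0, X1, …, X7]

Note that $W_N^{k+N/2} = -W_N^{k}$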
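A short derivation of this identity and what it buys, assuming $W_N = e^{-j 2\pi / N}$ and writing $X^e_k$, $X^o_k$ for the N/2-point DFTs of the even- and odd-indexed inputs:

$$W_N^{k+N/2} = W_N^{k} \, W_N^{N/2} = W_N^{k} \, e^{-j\pi} = -W_N^{k}$$

Since $X^e_k$ and $X^o_k$ repeat with period N/2, each pair of outputs shares one multiplication, for $k = 0, 1, \ldots, N/2-1$:

$$X_k = X^e_k + W_N^{k} X^o_k, \qquad X_{k+N/2} = X^e_k - W_N^{k} X^o_k$$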


Now let’s apply the idea recursively

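[Figure: the N=8 network after applying the split to each 4-point DFT. The inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7. The intermediate values $X^{ee}_0, X^{ee}_1, X^{eo}_0, X^{eo}_1, X^{oe}_0, X^{oe}_1, X^{oo}_0, X^{oo}_1$ are combined with the twiddle factors $W_4^0, \ldots, W_4^3$ into $X^e_0, \ldots, X^e_3$ and $X^o_0, \ldots, X^o_3$, which are combined with $W_8^0, \ldots, W_8^7$ into the outputs X0, X1, …, X7]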


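One more time

[Figure: the same N=8 network with the split applied once more, down to the individual inputs x0, x4, x2, x6, x1, x5, x3, x7. The labels $X^{ee}$, $X^{eo}$, $X^{oe}$, $X^{oo}$, $X^e$, $X^o$, the twiddle factors $W_4^0, \ldots, W_4^3$ and $W_8^0, \ldots, W_8^7$, and the outputs X0, X1, …, X7 are as on the previous slide]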

• How many operations do we need now?

• What is the execution time on a general-purpose CPU?

• What is the execution time on an FPGA? How many resources do you need?
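To make the operation count concrete, here is a minimal recursive radix-2 sketch that counts twiddle-factor multiplications. It assumes the negative-exponent convention $W_N = e^{-j 2\pi / N}$ and a power-of-two input length; the `counter` argument is an illustrative device, not part of the lecture.

```python
import cmath

def fft(x, counter):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two.

    counter[0] accumulates the number of twiddle-factor multiplications.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], counter)   # N/2-point DFT of the even-indexed samples
    odd = fft(x[1::2], counter)    # N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # W_N^k * X^o_k
        counter[0] += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t   # uses W_N^(k+N/2) = -W_N^k
    return out

counter = [0]
X = fft([1, 2, 3, 4, 5, 6, 7, 8], counter)
# counter[0] is now 12 = (N/2) * log2(N) for N = 8
```

For N = 8 this performs 12 complex multiplications (and 24 additions or subtractions) instead of the 64 multiplications of the direct matrix product; in general the count is (N/2) log2 N, i.e. O(N log N). On a CPU these steps run largely one after another, whereas the N/2 multiply-add pairs within one level are independent of each other and could in principle be computed in parallel in hardware.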


Another way to visualize FFT computations

How can we determine the order of the first inputs?

[Figure: butterfly network for N=8. The inputs enter in the order x0, x4, x2, x6, x1, x5, x3, x7 and pass through twelve butterfly blocks to produce the outputs X0, X1, …, X7]
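For a radix-2 network of this kind, the input order x0, x4, x2, x6, x1, x5, x3, x7 is the natural order with each index's binary representation reversed (for example 1 = 001 becomes 100 = 4). A small sketch, with the hypothetical helper name `bit_reversed_order`:

```python
def bit_reversed_order(n):
    """Return the bit-reversed index order for an n-point FFT (n a power of two)."""
    bits = n.bit_length() - 1          # number of index bits, e.g. 3 for n = 8
    order = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):           # if bit b of i is set ...
                rev |= 1 << (bits - 1 - b)   # ... set the mirrored bit
        order.append(rev)
    return order

order = bit_reversed_order(8)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

This ordering falls out of repeatedly splitting into even and odd indices: the first split sorts on the least significant bit, the next on the second bit, and so on.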


Application of FFT: faster multiplication of two polynomials

Suppose we want to evaluate A(x) at x0. How many operations do we need?

Use Horner’s rule
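Horner's rule rewrites the polynomial as nested multiply-add steps, a0 + x0(a1 + x0(a2 + …)), so evaluation takes one multiplication and one addition per coefficient. A minimal sketch, assuming the coefficients are stored lowest degree first (the function name is illustrative):

```python
def horner(coeffs, x0):
    """Evaluate A(x0) where coeffs = [a0, a1, ..., a_{N-1}] (lowest degree first)."""
    result = 0
    for a in reversed(coeffs):
        result = result * x0 + a   # one multiply and one add per coefficient
    return result

value = horner([1, 2, 3], 2)   # 1 + 2*2 + 3*2**2 = 17
```

The loop runs N times, so a single evaluation costs O(N) operations.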

Suppose you have two polynomials represented by the coefficient vectors

• How many operations does it take to add these two polynomials?

• How many operations does it take to multiply these two polynomials?
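Both questions can be answered by writing the operations out. The sketch below assumes two coefficient vectors stored lowest degree first; `poly_add` and `poly_mul` are illustrative names.

```python
def poly_add(a, b):
    """Add two coefficient vectors of equal length N: N additions, O(N)."""
    return [ai + bi for ai, bi in zip(a, b)]

def poly_mul(a, b):
    """Schoolbook product: len(a) * len(b) multiplications, O(N^2)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj    # a_i x^i times b_j x^j lands on x^(i+j)
    return c

s = poly_add([1, 2, 3], [4, 5, 6])   # [5, 7, 9]
p = poly_mul([1, 2, 3], [4, 5, 6])   # [4, 13, 28, 27, 18]
```

Addition is linear in N, while the doubly nested loop makes ordinary multiplication quadratic.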


Point value representation

A point-value representation of a polynomial A(x) of degree-bound N is a set of N point-value pairs

$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{N-1}, y_{N-1})\}$$

such that all of the $x_k$ are distinct and $y_k = A(x_k)$ for k = 0, 1, …, N-1

How many operations do we need to compute the point representation of a polynomial? How can we do better?


Interpolation of polynomials from point-value representations

Given the point representation of a polynomial, how can we invert the evaluation, i.e., determine the coefficient form of the polynomial from its point representation?

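$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{N-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N-1} & x_{N-1}^2 & \cdots & x_{N-1}^{N-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}
$$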

How can we find the a’s?


Adding and multiplying polynomials in point representation

If polynomial C(x)=A(x)+B(x), then we can get the point representation of C easily

Polynomial A

Polynomial B

How many operations do we need? How about C(x)=A(x)*B(x)?
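A sketch of the answer, using $y_k$ and $y'_k$ as assumed names for the values of A and B at shared points $x_k$ (the slide's own notation did not survive extraction):

$$C = A + B:\ \{(x_k,\ y_k + y'_k)\}, \qquad C = A \cdot B:\ \{(x_k,\ y_k \, y'_k)\}$$

Both take one operation per point, so O(N). For the product, if A and B have degree-bound N then C has degree-bound 2N-1, so A and B must each be evaluated at 2N-1 or more shared points for the resulting pairs to determine C.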


How can we convert a polynomial quickly from coefficient form to point-value and back?

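[Figure: two routes from the coefficient forms of the inputs to the coefficient form of the product. Direct route: ordinary multiplication, $O(N^2)$. Indirect route: evaluate, $O(N^2)$; point-wise multiplication, $O(N)$; interpolate, $O(N^2)$]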

The indirect route does not pay off yet. How can we evaluate and interpolate faster than $O(N^2)$? Can we choose the evaluation points smartly?


Choosing the evaluation points smartly

.

.

.


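Finally, multiplying polynomials in O(N log N)

[Figure: two routes from the coefficient forms of the inputs to the coefficient form of the product. Direct route: ordinary multiplication, $O(N^2)$. Indirect route: FFT, $O(N \log N)$; point-wise multiplication, $O(N)$; inverse FFT]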
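The whole pipeline fits in a few lines. This is a minimal sketch under stated assumptions: integer coefficients stored lowest degree first, a recursive radix-2 FFT with the negative-exponent convention, and zero-padding to the next power of two. The names `fft` and `poly_mul_fft` are illustrative.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    With invert=True the exponent sign flips (the caller divides by N).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def poly_mul_fft(a, b):
    """Multiply integer coefficient vectors: FFT, point-wise product, inverse FFT."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2                      # pad so all product coefficients fit
    fa = fft(list(a) + [0] * (size - len(a)))    # evaluate A at roots of unity
    fb = fft(list(b) + [0] * (size - len(b)))    # evaluate B at the same points
    fc = [u * v for u, v in zip(fa, fb)]         # O(N) point-wise products
    c = fft(fc, invert=True)                     # interpolate back
    # Divide by size to finish the inverse; round because inputs are integers.
    return [round((v / size).real) for v in c[:len(a) + len(b) - 1]]

product = poly_mul_fft([1, 2, 3], [4, 5, 6])   # [4, 13, 28, 27, 18]
```

The evaluation points here are the roots of unity, which is what lets the FFT do both the evaluation and the interpolation in O(N log N). The rounding step is only valid for integer inputs; real-valued coefficients would keep the floating-point result.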


Back to signal processing

[Figure: a linear system with impulse response $(b_0, b_1, \ldots, b_{N-1})$, driven by the input signal $(a_0, a_1, \ldots, a_{N-1})$]

T=0: a0b0

T=1: a0b1+a1b0

T=2: a0b2+a1b1+a2b0

….

The response of the system to the input signal at each time step equals the corresponding coefficient of the polynomial obtained by multiplying the input-signal polynomial by the impulse-response polynomial. This is commonly known as the convolution of the input and the system's impulse response. How can we find the output response faster than $O(N^2)$?
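Written as a formula, with $y_T$ as an assumed name for the output at time T and terms with out-of-range indices taken as zero, the pattern in the T=0, 1, 2 rows is

$$y_T = \sum_{i=0}^{T} a_i \, b_{T-i}$$

which is exactly the coefficient of $x^T$ in the product of the polynomials with coefficients $(a_0, \ldots, a_{N-1})$ and $(b_0, \ldots, b_{N-1})$. Computing every $y_T$ directly costs $O(N^2)$; transforming both sequences with the FFT, multiplying point-wise, and applying the inverse FFT brings this down to $O(N \log N)$.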


Summary

• The lecture covered one of the most important hardware accelerators: FFT

• We have seen how it can be parallelized and sped up

• Examined some of the applications