67
Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University Department of Computational Physics Joint Advanced Student School 2009

Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Parallel FFT-algorithms

Shmeleva YuliaSaint-Petersburg State University

Department of Computational Physics

Joint Advanced Student School 2009

Page 2: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Outline

• Motivation• Mathematical theory• Serial FFT• Parallel FFT

• Binary-exchange algorithm • Transpose algorithm

• Conclusion

Page 3: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Motivation

• Linear partial differential equations• Waveform analysis• Convolution and correlation• Digital signal processing • Image filtering

Page 4: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Continuous Fourier Transform

• (Forward) Fourier Transform

,2∫+∞

∞−

= dth(t)еH(f) πift

• (Inverse) Fourier Transform

,)()( 2∫+∞

∞−

−= dfefHth iftπ

where 1−=i

where 1−=i

Page 5: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Discrete Fourier transform

• Finite time series , sampled at an interval Δ

][][][ Δ== khthkh k

1-N ..., ,1 ,0=k

>−=< ]1[..., ],1[],0[ Nhhhh

where

Page 6: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Discrete Fourier transform

• The discrete (forward) Fourier transform

1-N ..., ,1 ,0=j

>−=< ]1[..., ],1[ ],0[ NHHHH

,][][1

0

/2∑−

=

=N

k

NikjekhjH πwhere

Page 7: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Discrete Fourier transform

• Let NiN eW /2π=

• The discrete (forward) Fourier transform

∑−

=

=1

0 ][][

N

k

jkNWkhjH

• The discrete (inverse) Fourier transform

∑−

=

−=1

0 ][1][

N

j

jkNWjH

Nkh

Page 8: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Discrete Fourier transform

Each H[j] requires N multiplications

The entire sequence H needs an order of

N² operations!

Page 9: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

DFT property of symmetry

Recall πi/NN eW 2 ⇒= 1 ,1W 2/

N −== NN

N W

If we associate ][][ jHkh ⇔

then ][][ jHkh −⇔−

H[j]][ lkWlkh −⇔+

][][Wlk ljHkh +⇔

Page 10: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT (Cooley and Tukey, 1965)

Assume that dN 2=

∑ ∑−

=

=

++=12/

0

12/

0

2kjN

jN

2N 1]Wk2[WW]2[][

N

k

N

k

kj hkhjH

∑ ∑−

=

=

+−=+12/

0

12/

0

2kjN

jN

2kjN 1]Wk2[W[2k]W]2/[

N

k

N

khhNjH

Page 11: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT

Butterfly

jNW

ja

jb

jj

Nj bWa +

jj

Nj bWa −

Page 12: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT (decomposition)

h[7] h[6] h[5] h[4] h[3] h[2] h[1] h[0]

h[6] h[4] h[2] h[0] h[7] h[5] h[3] h[1]

h[4] h[0] h[6] h[2] h[7] h[3]h[5] h[1]

h[0] h[4] h[2] h[6] h[5]h[1] h[3] h[7]

Page 13: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT (Cooley and Tukey, 1965)

H[0]

H[1]

H[2]

H[3]

H[5]

H[4]

H[6]

H[7]

0NW

4NW

2NW

6NW

7NW

3NW

5NW

1NW

1=m 3=m2=m0

NW

0NW

6NW

2NW

4NW

4NW

2NW

6NW

0NW

0NW

0NW

0NW

4NW

4NW

4NW

4NWh[6]

h[5]

h[3]

h[7]

h[2]

h[4]

h[0]

h[1]

Page 14: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT (Cooley and Tukey, 1965)

• iterationsN2log

• During each iteration – N operations

Page 15: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT

N² operationsOriginal DFT:

operationsFFT: NN 2log

Compare!

Page 16: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT (decomposition)

h[7] h[6] h[5] h[4] h[3] h[2] h[1] h[0]

h[6] h[4] h[2] h[0] h[7] h[5] h[3] h[1]

h[4] h[0] h[6] h[2] h[7] h[3]h[5] h[1]

h[0] h[4] h[2] h[6] h[5]h[1] h[3] h[7]

Page 17: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Serial FFT• Bit reversal sorting algorithm

Original sequence Rearranged sequence

decimal binary binary decimal0 000 000 01 001 100 42 010 010 23 011 110 64 100 001 15 101 101 56 110 011 37 111 111 7

Page 18: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Bisection width –minimum number of communication links that can be removed to break a network into two equal sized disconnected networks

bisection width is 1

Page 19: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Speedup –measure that gives the relative benefit of solving a problem in parallel

p

s

TTS =

Page 20: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Efficiency –measure of the fraction of time for which a processing element is usefully employed

pSE =

Page 21: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Scalability –measure of capacity to increase speedup in proportion to the number of processing elementsin order to maintain efficiency fixed.

Page 22: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Problem size –number of basic computation steps in the best sequential algorithm to solve the problem on a single processing element

Page 23: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Definitions

• Isoefficiency function –function which dictates the growth rate of problem size required to keep the efficiency fixed as a number of processors increases.

Page 24: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Parallel FFT

1. The Binary-Exchange algorithm

2. The transpose algorithm

Page 25: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Full bandwidth network• p parallel processes• bisection width is an order of p

Example: hypercube network

Page 26: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Simple mapping:

One task ↔ one processp = N

• Assume thatrN 2=

Page 27: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m

h[6]

0P

5P4P3P2P1P

7P6P

000

101

100

011

010

001

111

110

Page 28: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m

h[6]

0P

5P4P3P2P1P

7P6P

000

101

100

011

010

001

111

110

Page 29: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

0P

5P4P3P2P1P

7P6P

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m

h[6]

000

101

100

011

010

001

111

110

Page 30: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

0P

5P4P3P2P1P

7P6P

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m

h[6]

000

101

100

011

010

001

111

110

Page 31: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

0P

5P4P3P2P1P

7P6P

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m 2=m

h[6]

000

101

100

011

010

001

111

110

Page 32: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

0P

5P4P3P2P1P

7P6P

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

1=m 2=m

h[6]

000

101

100

011

010

001

111

110

Page 33: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

0P

5P4P3P2P1P

7P6P

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

H[0]

H[1]

H[2]

H[3]

H[5]

H[4]

H[6]

H[7]

1=m 2=m

h[6]

000

101

100

011

010

001

111

110

3=m

Page 34: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Another mapping:multiple tasks ↔ one process

p < N

• Assume that dr pN 2 , 2 ==

• Partition the sequence into blocks of N/p elements one block ↔ one process

Page 35: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

H[0]

H[1]

H[2]

H[3]

H[5]

H[4]

H[6]

H[7]

1=m 3=m2=m

h[6]

0P

3P

2P

1P

)3 ,2( 4 , 8 ==== rdpN• Consider

000

101

100

011

010

001

111

110

Page 36: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

H[0]

H[1]

H[2]

H[3]

H[5]

H[4]

H[6]

H[7]

1=m 3=m2=m

h[6]

0P

3P

2P

1P

)3 ,2( 4 , 8 ==== rdpN• Consider

101

100

011

010

111

110

000

001

Page 37: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

h[0]

h[1]

h[2]

h[3]

h[5]

h[4]

h[7]

H[0]

H[1]

H[2]

H[3]

H[5]

H[4]

H[6]

H[7]

1=m 3=m2=m

h[6]

0P

3P

2P

1P

)3 ,2( 4 , 8 ==== rdpN• Consider

101

100

111

110

000

001

010

011

Page 38: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

First d iterations N/p words of data exchangepd 2log=

Last (r-d) iterationsNr 2log=

No interprocess interaction

Page 39: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Denote:

st ─ the startup time for the data transfer

wt ─ the per-word transfer time

─ computation timect

Page 40: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• The parallel run time

ct ct

ppNtptN

pNtT wscp 222 logloglog ++=

• Speedup

p)N/t(tp)p/t(tNNNpN

TNNt

Scwcsp

c

22

22

logloglogloglog

++==

Page 41: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Efficiency

)log/()log()log/()log(11

2222 NtptNNtpptE

cwcs ++=

• Scalability

EEKpp

ttKpp

ttKppW wKt

c

w

c

s

−=

⎭⎬⎫

⎩⎨⎧

=1

,log ,log ,logmax 2t/

22c

.

Page 42: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Assume that

.

25 ,4 ,2 === swc ttt

Page 43: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Assume that

.

256 ,25 ,4 ,2 ==== pttt swc

Page 44: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Limited bandwidth network• p parallel processes• bisection width is less then Θ(p)

Example: a mesh interconnection network

Page 45: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Mapping:

pp ×N tasks ↔ processes

• Assume thatrN 2=dp 2=

Page 46: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

p2log steps ─communicating processes are in the same row

p2log steps ─communicating processes are in the same column

Page 47: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• The parallel run time

ct ct

pNtptN

pNtT wscp 2loglog 22 ++=

• Speedup

p)N/t(tp)p/t(tNNNpNS

cwcs 2logloglog

2

2

++=

Page 48: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Binary-Exchange algorithm

• Efficiency

)log/()(2)log/()log(11

222 NtptNNtpptE

cwcs ++=

• Scalability

EEKpp

ttKpp

ttKppW pKt

c

w

c

s w

−=

⎭⎬⎫

⎩⎨⎧

=1

,2 ,log ,logmax )t/(222

c

.

Page 49: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• Two-dimensional transpose algorithm

sequence of size N → NN × array

Page 50: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

1. A -point FFT is computed for each column of the initial array

2. The array is transposed

3. A -point FFT for each column of the transposed array

N

N

Page 51: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• Simple mappingone column ↔ one process

Np =

• Assume that dr pN 2 , 2 ==

• Full bandwidth network• bisection width is an order of p• example: hypercube

Page 52: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

Transpose

Page 53: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• Another mapping:Several columns ↔ one process

• Assume that dr pN 2 , 2 ==

• Partition the array into blocks of rows one block ↔ one process

pN /

Np <

Page 54: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• The parallel run time

ct ct

• Speedup pNtptN

pNtT wscp +−+= )1(log 2

NttpttNNNpN

Scwcs )/()/(log

log2

2

2

++≈

Page 55: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• Efficiency

• Scalability

)log( 22 ppW Θ=

.

)log/()log/()(11

222 NttNNtpt

Ecwcs ++

Page 56: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

Compare two algorithms!

pNtptN

pNtT wscp +−+= )1(log 2

Binary-exchangealgorithm

Transposealgorithm

ppNtptN

pNtT wscp 222 logloglog ++=

Page 57: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

Compare two algorithms!

pNtptN

pNtT wscp +−+= )1(log 2

Binary-exchangealgorithm

Transposealgorithm

ppNtptN

pNtT wscp 222 logloglog ++=

Page 58: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• Three-dimensional transpose algorithm

sequence of size N → 333 NNN ×× array

• Mapping

333 NNN ×× ↔ pp × mesh ofarray processes

Page 59: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

Page 60: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

1. A -point FFT along the z-axis2. Each of the cross-sections along the

y-z plane is transposed3. A -point FFT along the z-axis4. Each of the cross-sections along the

x-z plane is transposed5. A -point FFT along the z-axis

3 N

3 N

3 N

33 NN ×

33 NN ×

Page 61: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm• The transposition phases in the tree-dimensional

transpose algorithm

Page 62: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Transpose algorithm

• The parallel run time

ct ct

• Speedup

NttppttNNNpNS

cwcs )/(2)1()/(2loglog

2

2

+−+≈

pNt)p(tN

pNtT wscp 212log 2 +−+=

Page 63: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Comparison of described algorithms

• Assume that 64 ,25 ,4 ,2 ==== pttt swc

Page 64: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Conclusion

• Binary-exchange algorithm• high communication bandwidth • shared memory (Open MP)

• Transpose algorithm• limited communication bandwidth • distributed memory (MPI)

Page 65: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Parallel FFT software

• Parallel Engineering and Scientific Subroutine Library (PESSL)

• “Fastest Fourier Transform in the West.” (FFTW)

• Intel® Math Kernel Library (Intel® MKL)

Page 66: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Acknowledgments

Sergei Andreevitch Nemnyugin

Sergei Yurievitch Slavyanov

Roman Kuralev

Page 67: Parallel FFT-algorithms - TUM€¦ · Parallel FFT-algorithms Shmeleva Yulia Saint-Petersburg State University. Department of Computational Physics. Joint Advanced Student School

Jass 2009, Shmeleva Yulia, SPbSU

Thank you for attention!