CSE 246: Computer Arithmetic Algorithms and Hardware Design

Instructor: Prof. Chung-Kuan Cheng

Lecture 6.1 Multiplication Arithmetic

Topics:

Karatsuba’s Method (1962)
Toom’s Method (1963)
Modular Method
FFT

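Karatsuba’s Method

$U = 2^n U_1 + U_0$, $V = 2^n V_1 + V_0$

$UV = 2^{2n} U_1 V_1 + 2^n (U_1 V_0 + U_0 V_1) + U_0 V_0$

$= (2^{2n} + 2^n) U_1 V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1) U_0 V_0$

The second form needs only three products of $n$-bit numbers: $U_1 V_1$, $(U_1 - U_0)(V_0 - V_1)$, and $U_0 V_0$.

$T(2n) \le 3T(n) + cn$

$T(2^k) \le c(3^k - 2^k)$

$T(n) \le T(2^{\lceil \lg n \rceil}) \le c(3^{\lceil \lg n \rceil} - 2^{\lceil \lg n \rceil}) < 3c \cdot 3^{\lg n} = 3c\, n^{\lg 3}$

$\lg 3 \approx 1.585$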
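The three-product identity above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `karatsuba` and the `cutoff_bits` threshold are assumptions, and below the threshold the sketch simply uses Python's built-in multiplication.

```python
def karatsuba(u: int, v: int, cutoff_bits: int = 32) -> int:
    """Multiply non-negative integers using three half-size products per level."""
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v  # base case: small operands (threshold is an arbitrary choice)
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    u1, u0 = u >> n, u & mask      # U = 2^n * U1 + U0
    v1, v0 = v >> n, v & mask      # V = 2^n * V1 + V0
    hi = karatsuba(u1, v1, cutoff_bits)          # U1 * V1
    lo = karatsuba(u0, v0, cutoff_bits)          # U0 * V0
    a, b = u1 - u0, v0 - v1                      # either factor may be negative
    mid = karatsuba(abs(a), abs(b), cutoff_bits)
    if (a < 0) != (b < 0):
        mid = -mid                               # restore the sign of (U1-U0)(V0-V1)
    # (2^{2n} + 2^n) U1V1 + 2^n (U1-U0)(V0-V1) + (2^n + 1) U0V0
    return (hi << (2 * n)) + (hi << n) + (mid << n) + (lo << n) + lo
```

Each level replaces one product of $2n$-bit numbers with three products of roughly $n$-bit numbers plus a linear amount of shifting and adding, which is the recurrence $T(2n) \le 3T(n) + cn$.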

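Toom’s Method

$U = 2^{rn} U_r + \dots + 2^n U_1 + U_0$

$V = 2^{rn} V_r + \dots + 2^n V_1 + V_0$

$U(x) = x^r U_r + \dots + x U_1 + U_0$

$V(x) = x^r V_r + \dots + x V_1 + V_0$

$U(x)V(x) = W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$

so that $U = U(2^n)$, $V = V(2^n)$, and $UV = W(2^n)$.

Set up $2r+1$ equations to determine the coefficients of $W$: $W(0) = U(0)V(0)$, $W(1) = U(1)V(1)$, …, $W(2r) = U(2r)V(2r)$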

Toom’s Method

$T((r+1)n) \le (2r+1)T(n) + cn$

$T(n) \le c\, n^{\log_{r+1}(2r+1)} < c\, n^{1 + \log_{r+1} 2}$

Theorem: Given $\epsilon > 0$, there exists a multiplication algorithm such that the number of elementary operations $T(n)$ needed to multiply two $n$-bit numbers satisfies, for some constant $c(\epsilon)$ independent of $n$,

$T(n) < c(\epsilon)\, n^{1+\epsilon}$
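To see why the theorem follows from the bound above, it can help to tabulate the exponent $\log_{r+1}(2r+1)$ for growing $r$. The short Python sketch below does this; the helper name `toom_exponent` is an illustrative assumption.

```python
import math

def toom_exponent(r: int) -> float:
    """Exponent log_{r+1}(2r+1) appearing in T(n) <= c * n**exponent."""
    return math.log(2 * r + 1, r + 1)

for r in (1, 2, 4, 8, 16, 64, 1024):
    # r = 1 is Karatsuba's case (exponent lg 3); larger r pushes the exponent toward 1
    print(f"r = {r:5d}   exponent = {toom_exponent(r):.4f}")
```

The exponent stays above 1 but decreases toward it as $r$ grows, which is the content of the $n^{1+\epsilon}$ statement; the constant $c(\epsilon)$ is not modeled here.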

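Toom’s Method

Example: $U = (4, 13, 2)_{16}$, $V = (9, 2, 5)_{16}$

$U(x) = 4x^2 + 13x + 2$, $V(x) = 9x^2 + 2x + 5$, $W(x) = U(x)V(x)$

$W(0) = 10$, $W(1) = 304$, $W(2) = 1980$, $W(3) = 7084$, $W(4) = 18526$

$W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$, with $r = 2$ here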
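The five sample values in this example can be reproduced with a few lines of Python. This is only a sketch of the evaluation step; the helper `poly_eval` and the variable names are assumptions introduced for illustration.

```python
def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are listed from highest degree to lowest."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

U = [4, 13, 2]   # U(x) = 4x^2 + 13x + 2
V = [9, 2, 5]    # V(x) = 9x^2 + 2x + 5
r = len(U) - 1

# W(x) = U(x) V(x) has degree 2r, so 2r + 1 sample points determine it
W_points = [poly_eval(U, x) * poly_eval(V, x) for x in range(2 * r + 1)]
print(W_points)  # [10, 304, 1980, 7084, 18526]
```

Only $2r+1 = 5$ products of small numbers are needed, compared with $(r+1)^2 = 9$ digit products in the schoolbook method.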

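Toom’s Method

$W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$

Rewrite $W(x) = a_{2r} x^{\underline{2r}} + \dots + a_1 x^{\underline{1}} + a_0$,

where $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$.

$W(x+1) - W(x) = 2r\, a_{2r} x^{\underline{2r-1}} + (2r-1)\, a_{2r-1} x^{\underline{2r-2}} + \dots + a_1$

$(W(x+2) - W(x+1)) - (W(x+1) - W(x)) = 2r(2r-1)\, a_{2r} x^{\underline{2r-2}} + (2r-1)(2r-2)\, a_{2r-1} x^{\underline{2r-3}} + \dots + 2a_2$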

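Toom’s Method

Repeated differences of $W(0), \dots, W(4)$; the leading entry of the $j$-th difference divided by $j!$ is the coefficient $a_j$:

W(*)        = 10, 304, 1980, 7084, 18526
W'(*)       = 294, 1676, 5104, 11442
W''(*)      = 1382, 3428, 6338
W''(*)/2    = 691, 1714, 3169
W'''(*)/2   = 1023, 1455
W'''(*)/6   = 341, 485
W''''(*)/6  = 144
W''''(*)/24 = 36

$W(x) = 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10$

$= (((36(x-3) + 341)(x-2) + 691)(x-1) + 294)x + 10$

$= 36x^4 + 125x^3 + 64x^2 + 69x + 10$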

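Toom’s Method

Converting the falling-power coefficients to ordinary coefficients, one factor $(x - j)$ at a time:

36    341
      -3×36
36    233     691
      -2×36   -2×233
36    161     225     294
      -1×36   -1×161  -1×225
36    125     64      69      10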
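The difference table and the conversion tableau above can be sketched in Python as two small routines. The function names are illustrative assumptions; the logic follows the slides: divide the $j$-th difference row by $j$ to obtain the falling-power coefficients, then multiply by $(x - j)$ and add $a_j$, working from the highest term down.

```python
def falling_factorial_coeffs(values):
    """Given W(0..d), return [a_0, ..., a_d] with W(x) = sum a_j * x(x-1)...(x-j+1)."""
    diffs = list(values)
    coeffs = [diffs[0]]
    for j in range(1, len(values)):
        # dividing row j by j accumulates the 1/j! factor incrementally
        diffs = [(b - a) // j for a, b in zip(diffs, diffs[1:])]
        coeffs.append(diffs[0])
    return coeffs

def to_power_basis(a):
    """Convert falling-power coefficients a_0..a_d to ordinary ones, highest degree first."""
    d = len(a) - 1
    p = [a[d]]
    for j in range(d - 1, -1, -1):
        shifted = p + [0]                     # p(x) * x
        scaled = [0] + [j * c for c in p]     # j * p(x), aligned by degree
        p = [s - t for s, t in zip(shifted, scaled)]   # p(x) * (x - j)
        p[-1] += a[j]
    return p

a = falling_factorial_coeffs([10, 304, 1980, 7084, 18526])
W = to_power_basis(a)
print(a)  # [10, 294, 691, 341, 36]
print(W)  # [36, 125, 64, 69, 10]

# Evaluating W at the radix 16 gives the product of the original numbers
product = sum(c * 16 ** i for i, c in enumerate(reversed(W)))
```

The intermediate lists produced inside `to_power_basis` are exactly the rows of the tableau: `[36, 233]`, `[36, 161, 225]`, `[36, 125, 64, 69]`.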

Toom and Cook’s Method

Theorem: There is a constant $c$ such that the execution time of Toom and Cook’s method is less than $c\, n\, 2^{3.5\sqrt{\lg n}}$ cycles.

Modular Method (Schönhage)

Recursive formula: $q_0 = 1$, $q_{k+1} = 3q_k - 1$. Thus, we have $q_k = \frac{1}{2}(3^k + 1)$.

Relatively prime $p_i$: $6q_k - 1$, $6q_k + 1$, $6q_k + 2$, $6q_k + 3$, $6q_k + 5$, $6q_k + 7$

Set six moduli $m_i = 2^{p_i} - 1$
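The claims on this slide are easy to check numerically. The Python sketch below (helper names `q` and `exponents` are assumptions) verifies the closed form against the recurrence and confirms that the six exponents, and therefore the six moduli $2^{p_i} - 1$, are pairwise relatively prime for the first few $k$.

```python
from itertools import combinations
from math import gcd

def q(k: int) -> int:
    """Closed form q_k = (3^k + 1) / 2."""
    return (3 ** k + 1) // 2

def exponents(k: int):
    """The six exponents p_i = 6 q_k + d for d in (-1, 1, 2, 3, 5, 7)."""
    return [6 * q(k) + d for d in (-1, 1, 2, 3, 5, 7)]

for k in range(6):
    assert q(k + 1) == 3 * q(k) - 1                    # recurrence matches closed form
    p = exponents(k)
    assert all(gcd(a, b) == 1 for a, b in combinations(p, 2))
    moduli = [2 ** e - 1 for e in p]
    # gcd(2^a - 1, 2^b - 1) = 2^gcd(a, b) - 1, so coprime exponents give coprime moduli
    assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    print(k, p)
```

This checks only small $k$; it is a sanity check, not a proof of coprimality for all $k$.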

Modular Method

Given $U$ and $V$, find $W = U \times V$.

Compute $u_i = U \bmod m_i$, $v_i = V \bmod m_i$

Compute $w_i = u_i \times v_i \bmod m_i$

Recover $W$ from the residues $w_i$

$T(n) = O(n^{\log_3 6}) = O(n^{1.631})$
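The three steps above can be sketched in Python with a textbook Chinese-remainder reconstruction. This is a one-level illustration under stated assumptions: the function name `modular_multiply` is hypothetical, the residue products use Python's built-in multiplication rather than a recursive call, and the recovery step is the plain CRT formula, which is not necessarily the reconstruction used in Schönhage's method.

```python
from math import prod

def modular_multiply(u: int, v: int, moduli) -> int:
    """Multiply via residues modulo pairwise coprime moduli, then recover by CRT."""
    M = prod(moduli)
    # The product must be smaller than M for the recovery to be unique.
    if u.bit_length() + v.bit_length() >= M.bit_length():
        raise ValueError("operands too large for these moduli")
    w = 0
    for m in moduli:
        wi = (u % m) * (v % m) % m        # w_i = u_i * v_i mod m_i
        Mi = M // m
        w += wi * Mi * pow(Mi, -1, m)     # CRT term; pow(x, -1, m) needs Python 3.8+
    return w % M

# Moduli for k = 0: q_0 = 1, exponents 5, 7, 8, 9, 11, 13
moduli = [2 ** p - 1 for p in (5, 7, 8, 9, 11, 13)]
result = modular_multiply(30_000_001, 29_999_999, moduli)
```

With these six moduli the product may have up to about 52 bits, so the two 25-bit operands in the usage line fit comfortably.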

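FFT

Set $w = \exp(2\pi i / K)$, i.e. $w^K = 1$.

$\hat{u}_s = \sum_{0 \le t < K} w^{st} u_t$

$\hat{v}_s = \sum_{0 \le t < K} w^{st} v_t$

$\hat{U} \cdot \hat{V} = (\hat{u}_0 \hat{v}_0, \hat{u}_1 \hat{v}_1, \dots, \hat{u}_{K-1} \hat{v}_{K-1})$, $\hat{P} = \hat{U} \cdot \hat{V}$, $\hat{p}_s = \hat{u}_s \hat{v}_s$

$\hat{p}_s = \sum_{0 \le t < K} w^{st} p_t$

Given $U = (u_0, u_1, \dots, u_{K-1})$ and $V = (v_0, v_1, \dots, v_{K-1})$, find $P = (p_0, p_1, \dots, p_{K-1})$, where $p_t = \sum_{i+j \equiv t \pmod{K}} u_i v_j$.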

FFT

$K \ge 2n - 1$, $u_n = u_{n+1} = \dots = u_{K-1} = 0$, $v_n = v_{n+1} = \dots = v_{K-1} = 0$

$p_t = \sum_{i+j \equiv t \pmod{K}} u_i v_j = u_t v_0 + u_{t-1} v_1 + \dots + u_0 v_t$
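A small numeric check of the convolution property, written directly from the definitions. The sketch uses a naive $O(K^2)$ transform (not a fast one) so that it mirrors the sums on the slide; the function names and the reuse of the digits from the Toom example are illustrative assumptions.

```python
import cmath

def dft(u):
    """Transform by definition: u_hat[s] = sum_t w^(s*t) * u[t], with w = exp(2*pi*i/K)."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

def cyclic_convolution(u, v):
    """p[t] = sum of u[i] * v[j] over i + j = t (mod K)."""
    K = len(u)
    return [sum(u[i] * v[(t - i) % K] for i in range(K)) for t in range(K)]

# Digits of U = (4,13,2)_16 and V = (9,2,5)_16, least significant first,
# zero-padded to K = 8 >= 2n - 1 = 5 so the cyclic sum does not wrap around.
u = [2, 13, 4, 0, 0, 0, 0, 0]
v = [5, 2, 9, 0, 0, 0, 0, 0]

p = cyclic_convolution(u, v)
pointwise = [a * b for a, b in zip(dft(u), dft(v))]   # u_hat[s] * v_hat[s]
p_hat = dft(p)
print(p)  # [10, 69, 64, 125, 36, 0, 0, 0]
```

Up to floating-point rounding, `p_hat` and `pointwise` agree entry by entry, and the nonzero entries of `p` are the coefficients of $W(x)$ from the Toom example in reverse order.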

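FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)_2$)

Set $A_0(t_{k-1}, \dots, t_0) = u_t$, i.e. $A_0(t) = u_t$

Set $A_1(s_{k-1}, t_{k-2}, \dots, t_0) = A_0(0, t_{k-2}, \dots, t_0) + w^{2^{k-1} s_{k-1}} A_0(1, t_{k-2}, \dots, t_0)$

Set $A_2(s_{k-1}, s_{k-2}, t_{k-3}, \dots, t_0) = A_1(s_{k-1}, 0, t_{k-3}, \dots, t_0) + w^{2^{k-2} (s_{k-2} s_{k-1})_2} A_1(s_{k-1}, 1, t_{k-3}, \dots, t_0)$

Set $A_k(s_{k-1}, s_{k-2}, s_{k-3}, \dots, s_0) = A_{k-1}(s_{k-1}, \dots, s_1, 0) + w^{(s_0 s_1 \dots s_{k-1})_2} A_{k-1}(s_{k-1}, \dots, s_1, 1)$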

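FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)_2$)

Replace $t_{k-1}$ with $s_{k-1}$:
$s_{k-1}$ determines $w^{2^{k-1} s_{k-1}}$

Replace $t_{k-2}$ with $s_{k-2}$:
$s_{k-1}, s_{k-2}$ determine $w^{2^{k-2} (s_{k-2} s_{k-1})_2}$

Replace $t_0$ with $s_0$:
$s_{k-1}, s_{k-2}, \dots, s_0$ determine $w^{(s_0 s_1 \dots s_{k-1})_2}$

Binary $s = (s_0, s_1, \dots, s_{k-1})_2$, i.e. $s_0$ is the most significant bit of $s$.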

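FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)_2$)

By induction, we have

$A_j(s_{k-1}, \dots, s_{k-j}, t_{k-j-1}, \dots, t_0) = \sum_{t_{k-1}, \dots, t_{k-j}} w^{2^{k-j} (s_{k-j} \dots s_{k-1})_2 (t_{k-1} \dots t_{k-j})_2} u_t$

$A_k(s_{k-1}, \dots, s_0) = \sum_{t_{k-1}, \dots, t_0} w^{(s_0 \dots s_{k-1})_2 (t_{k-1} \dots t_0)_2} u_t = \hat{u}_s$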
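The recurrence for $A_1, \dots, A_k$ can be sketched in Python as $k$ passes over an array indexed by the bit string $(s_{k-1}, \dots, s_{k-j}, t_{k-j-1}, \dots, t_0)$. The function names are assumptions made for illustration, and the sketch recomputes powers of $w$ instead of tabulating them, so it shows the data flow rather than an efficient implementation.

```python
import cmath

def bit_reverse(x: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def fft_by_passes(u):
    """Compute u_hat[s] = sum_t w^(s*t) u[t] using the A_j recurrence; len(u) must be 2^k."""
    K = len(u)
    k = K.bit_length() - 1
    assert 1 << k == K, "length must be a power of two"
    w = cmath.exp(2j * cmath.pi / K)
    A = list(u)                                   # A_0(t) = u_t
    for j in range(1, k + 1):
        b = k - j                                 # bit position switching from t to s
        nxt = [0] * K
        for idx in range(K):
            # top j bits of idx hold s_{k-1}..s_{k-j}; the exponent is
            # 2^(k-j) * (s_{k-j} ... s_{k-1})_2, i.e. those bits reversed
            e = bit_reverse(idx >> b, j) << b
            nxt[idx] = A[idx & ~(1 << b)] + w ** e * A[idx | (1 << b)]
        A = nxt
    # A_k(s_{k-1},...,s_0) = u_hat[s] with s = (s_0 ... s_{k-1})_2: undo the bit reversal
    return [A[bit_reverse(s, k)] for s in range(K)]

def dft(u):
    """Naive O(K^2) reference transform, used only for comparison."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

sample = [3, 1, 4, 1, 5, 9, 2, 6]
fast, slow = fft_by_passes(sample), dft(sample)
```

Each pass touches every entry once, so the total work is $k \cdot K = K \lg K$ updates, and the result appears in bit-reversed index order until the final reordering.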

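FFT: k=2

$(\hat{u}_0, \hat{u}_1, \hat{u}_2, \hat{u}_3)^T = M\,(u_0, u_1, u_2, u_3)^T$, where the entry of $M$ in row $s$, column $t$ is $w^{st}$:

        (00)   (01)   (10)   (11)
(00)    1      1      1      1
(01)    1      w      w^2    w^3
(10)    1      w^2    w^4    w^6
(11)    1      w^3    w^6    w^9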

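FFT: k=2

Rows and columns permuted into bit-reversed order: $(\hat{u}_0, \hat{u}_2, \hat{u}_1, \hat{u}_3)^T = M'\,(u_0, u_2, u_1, u_3)^T$, with

        (00)   (10)   (01)   (11)
(00)    1      1      1      1
(10)    1      w^4    w^2    w^6
(01)    1      w^2    w      w^3
(11)    1      w^6    w^3    w^9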

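FFT: k=2

The same matrix simplified with $w^4 = 1$ and $w^2 = -1$: $(\hat{u}_0, \hat{u}_2, \hat{u}_1, \hat{u}_3)^T = M'\,(u_0, u_2, u_1, u_3)^T$, with

        (00)   (10)   (01)   (11)
(00)    1      1      1      1
(10)    1      1      -1     -1
(01)    1      -1     w      -w
(11)    1      -1     -w     w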

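FFT: k=2

The bit-reversed matrix factors into two sparse stages:

\[
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & w & -w \\ 1 & -1 & -w & w \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & w \\ 0 & 1 & 0 & -w \end{pmatrix}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}
\]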

FFT: k=3

Entry in row $s$, column $t$ is $w^{st}$:

         (000)  (001)  (010)  (011)  (100)  (101)  (110)  (111)
(000)    1      1      1      1      1      1      1      1
(001)    1      w      w^2    w^3    w^4    w^5    w^6    w^7
(010)    1      w^2    w^4    w^6    w^8    w^10   w^12   w^14
(011)    1      w^3    w^6    w^9    w^12   w^15   w^18   w^21
(100)    1      w^4    w^8    w^12   w^16   w^20   w^24   w^28
(101)    1      w^5    w^10   w^15   w^20   w^25   w^30   w^35
(110)    1      w^6    w^12   w^18   w^24   w^30   w^36   w^42
(111)    1      w^7    w^14   w^21   w^28   w^35   w^42   w^49

FFT: k=3

Rows and columns permuted into bit-reversed order:

         (000)  (100)  (010)  (110)  (001)  (101)  (011)  (111)
(000)    1      1      1      1      1      1      1      1
(100)    1      w^16   w^8    w^24   w^4    w^20   w^12   w^28
(010)    1      w^8    w^4    w^12   w^2    w^10   w^6    w^14
(110)    1      w^24   w^12   w^36   w^6    w^30   w^18   w^42
(001)    1      w^4    w^2    w^6    w      w^5    w^3    w^7
(101)    1      w^20   w^10   w^30   w^5    w^25   w^15   w^35
(011)    1      w^12   w^6    w^18   w^3    w^15   w^9    w^21
(111)    1      w^28   w^14   w^42   w^7    w^35   w^21   w^49

FFT: k=3

The same matrix simplified with $w^8 = 1$ and $w^4 = -1$:

         (000)  (100)  (010)  (110)  (001)  (101)  (011)  (111)
(000)    1      1      1      1      1      1      1      1
(100)    1      1      1      1      -1     -1     -1     -1
(010)    1      1      -1     -1     w^2    w^2    -w^2   -w^2
(110)    1      1      -1     -1     -w^2   -w^2   w^2    w^2
(001)    1      -1     w^2    -w^2   w      -w     w^3    -w^3
(101)    1      -1     w^2    -w^2   -w     w      -w^3   w^3
(011)    1      -1     -w^2   w^2    w^3    -w^3   w      -w
(111)    1      -1     -w^2   w^2    -w^3   w^3    -w     w

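FFT: k=3

The bit-reversed matrix factors into three sparse stages:

\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & w^2 & w^2 & -w^2 & -w^2 \\
1 & 1 & -1 & -1 & -w^2 & -w^2 & w^2 & w^2 \\
1 & -1 & w^2 & -w^2 & w & -w & w^3 & -w^3 \\
1 & -1 & w^2 & -w^2 & -w & w & -w^3 & w^3 \\
1 & -1 & -w^2 & w^2 & w^3 & -w^3 & w & -w \\
1 & -1 & -w^2 & w^2 & -w^3 & w^3 & -w & w
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & w^2 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -w^2 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & w & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -w & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & w^3 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -w^3
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & w^2 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -w^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & w^2 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -w^2
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix}
\]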

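FFT

Here $s$ denotes the evaluation point (a power of $w$); the subscripts $e$ and $d$ select the even- and odd-indexed coefficients.

$u(s) = u_0 + u_1 s + u_2 s^2 + \dots + u_{2^k-1} s^{2^k-1}$

$u(s) = (u_0 + u_2 s^2 + \dots + u_{2^k-2} s^{2^k-2}) + (u_1 s + u_3 s^3 + \dots + u_{2^k-1} s^{2^k-1})$

$u(s) = F_e(s^2) + s F_d(s^2)$

$F_e(s^2) = u_0 + u_2 s^2 + \dots + u_{2^k-2} s^{2^k-2}$

$F_d(s^2) = u_1 + u_3 s^2 + \dots + u_{2^k-1} s^{2^k-2}$

$u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$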

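FFT

$u(s) = u_0 + u_1 s + u_2 s^2 + \dots + u_{2^k-1} s^{2^k-1}$

$u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$

$u(s) = F_{eee}(s^8) + s^4 F_{eed}(s^8) + s^2[F_{ede}(s^8) + s^4 F_{edd}(s^8)] + s\{[F_{dee}(s^8) + s^4 F_{ded}(s^8)] + s^2[F_{dde}(s^8) + s^4 F_{ddd}(s^8)]\}$

$F_{x \dots x}(s^{2^{k-1}}) = F_{x \dots xe}(s^{2^k}) + s^{2^{k-1}} F_{x \dots xd}(s^{2^k})$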

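FFT

$u(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 + u_4 s^4 + u_5 s^5 + u_6 s^6 + u_7 s^7$

$u(s) = F_e(s^2) + s F_d(s^2)$

$F_e(s^2) = u_0 + u_2 s^2 + u_4 s^4 + u_6 s^6$

$F_d(s^2) = u_1 + u_3 s^2 + u_5 s^4 + u_7 s^6$

$F_e(s^2) = F_{ee}(s^4) + s^2 F_{ed}(s^4)$

$F_{ee}(s^4) = u_0 + u_4 s^4$, $F_{ed}(s^4) = u_2 + u_6 s^4$

$F_d(s^2) = F_{de}(s^4) + s^2 F_{dd}(s^4)$

$F_{de}(s^4) = u_1 + u_5 s^4$, $F_{dd}(s^4) = u_3 + u_7 s^4$

Because $F_x$ depends only on $s^2$, its values repeat: $F_x(s{=}w^0) = F_x(s{=}w^4)$, $F_x(s{=}w^2) = F_x(s{=}w^6)$, $F_x(s{=}w) = F_x(s{=}w^5)$, $F_x(s{=}w^3) = F_x(s{=}w^7)$,

for $x = e, d$ and $(s_0, s_1, s_2) = (-, 0, 0), (-, 0, 1), (-, 1, 0), (-, 1, 1)$.

Because $F_{xx}$ depends only on $s^4$: $F_{xx}(s{=}w^0) = F_{xx}(s{=}w^2) = F_{xx}(s{=}w^4) = F_{xx}(s{=}w^6)$, $F_{xx}(s{=}w) = F_{xx}(s{=}w^3) = F_{xx}(s{=}w^5) = F_{xx}(s{=}w^7)$,

for $xx = ee, ed, de, dd$ and $(s_0, s_1, s_2) = (-, -, 0), (-, -, 1)$.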
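The even/odd splitting above translates directly into a recursive routine. The Python sketch below is one common way to write it, with the function name `fft_recursive` chosen for illustration; it uses the slide's convention $w = \exp(2\pi i / K)$ and reuses each half-size value at the two points $w^s$ and $w^{s + K/2}$, which is the repetition noted above.

```python
import cmath

def fft_recursive(u):
    """u_hat[s] = F_e(w^(2s)) + w^s * F_d(w^(2s)); len(u) must be a power of two."""
    K = len(u)
    if K == 1:
        return list(u)
    even = fft_recursive(u[0::2])   # F_e evaluated at the K/2 distinct values of s^2
    odd = fft_recursive(u[1::2])    # F_d evaluated at the same points
    w = cmath.exp(2j * cmath.pi / K)
    out = [0] * K
    for s in range(K // 2):
        twiddle = w ** s * odd[s]
        out[s] = even[s] + twiddle            # point w^s
        out[s + K // 2] = even[s] - twiddle   # point w^(s + K/2) = -w^s, same s^2
    return out

def dft(u):
    """Naive reference transform for comparison."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

coeffs = [1, 2, 3, 4, 5, 6, 7, 8]   # u_0 .. u_7, the k = 3 case
fast, slow = fft_recursive(coeffs), dft(coeffs)
```

For $K = 8$ the recursion depth is three, matching the $F_x$, $F_{xx}$, $F_{xxx}$ levels on the slide.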

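FFT (Inversion)

$\hat{\hat{u}}_r = \sum_{0 \le s < K} w^{rs} \hat{u}_s$

$= \sum_{0 \le s, t < K} w^{rs} w^{st} u_t$

$= \sum_{0 \le t < K} u_t \sum_{0 \le s < K} w^{s(t+r)}$

$= K u_{(-r) \bmod K}$

since $\sum_{0 \le s < K} w^{sj} = K$ if $j \bmod K = 0$, and $0$ otherwise.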
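The inversion identity can be checked numerically by applying the same transform twice. This Python sketch uses a naive transform written from the definition; the names `dft`, `twice`, and `recovered` are illustrative.

```python
import cmath

def dft(u):
    """u_hat[s] = sum_t w^(s*t) * u[t] with w = exp(2*pi*i/K)."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

u = [3, 1, 4, 1, 5, 9, 2, 6]
K = len(u)

twice = dft(dft(u))                         # twice[r] = K * u[(-r) mod K]
recovered = [twice[(-t) % K] / K for t in range(K)]

# Rounding removes floating-point noise for integer inputs
print([round(x.real) for x in recovered])   # [3, 1, 4, 1, 5, 9, 2, 6]
```

So the inverse needs no separate algorithm: transform again, reverse the index modulo $K$, and divide by $K$.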

FFT

$2n \le 2^k g < 4n$, $K = 2^k$

Precision $m = 6k$

Let $M$ = time of an $m$-bit multiplication.

Total time to multiply $n$-bit numbers: $O(n) + O(Mnk/g)$
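Putting the pieces together, an end-to-end multiplication might look like the Python sketch below. Several things here are assumptions rather than the lecture's method: $g$ is read as the number of bits per group, double-precision floating point stands in for the $m$-bit fixed-point arithmetic, the inverse transform is computed with the conjugate root instead of index reversal, and the names `fft` and `fft_multiply` are hypothetical.

```python
import cmath

def fft(a, sign=1):
    """Recursive transform with w = exp(sign * 2*pi*i / K); len(a) must be a power of two."""
    K = len(a)
    if K == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    w = cmath.exp(sign * 2j * cmath.pi / K)
    out = [0] * K
    for s in range(K // 2):
        t = w ** s * odd[s]
        out[s], out[s + K // 2] = even[s] + t, even[s] - t
    return out

def fft_multiply(u: int, v: int, g: int = 8) -> int:
    """Multiply non-negative integers by convolving their g-bit groups (float sketch)."""
    n = max(u.bit_length(), v.bit_length(), 1)
    k = 0
    while (1 << k) * g < 2 * n:        # smallest k with 2n <= 2^k * g
        k += 1
    K = 1 << k
    mask = (1 << g) - 1
    ud = [(u >> (g * t)) & mask for t in range(K)]   # groups, least significant first
    vd = [(v >> (g * t)) & mask for t in range(K)]
    p_hat = [a * b for a, b in zip(fft(ud), fft(vd))]
    # inverse transform: conjugate root, divide by K, round away float noise
    p = [round((x / K).real) for x in fft(p_hat, sign=-1)]
    return sum(c << (g * t) for t, c in enumerate(p))  # carries resolve in the sum

x, y = 2 ** 200 + 12345, 3 ** 100
result = fft_multiply(x, y)
```

Floating-point rounding limits how large the operands can be before `round` picks the wrong integer, which is the issue the slide's precision bound $m = 6k$ addresses for the fixed-point version.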
