CSE 246: Computer Arithmetic Algorithms and Hardware Design
Instructor: Prof. Chung-Kuan Cheng
Lecture 6.1 Multiplication Arithmetic
CSE 246 2
Topics:
Karatsuba’s Method (1962)
Toom’s Method (1963)
Modular Method
FFT
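Karatsuba’s Method
$U = 2^n U_1 + U_0$, $V = 2^n V_1 + V_0$
$UV = 2^{2n} U_1 V_1 + 2^n (U_1 V_0 + U_0 V_1) + U_0 V_0$
$= (2^{2n} + 2^n) U_1 V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1) U_0 V_0$
$T(2n) \le 3T(n) + cn$
$T(2^k) \le c(3^k - 2^k)$
$T(n) = T(2^{\lg n}) \le c(3^{\lg n} - 2^{\lg n}) < 3c\,n^{\lg 3}$
$\lg 3 \approx 1.585$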
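The three-product identity above can be sketched in Python as follows. This is a minimal illustration, assuming non-negative integers; the function name `karatsuba` and the `cutoff_bits` threshold are hypothetical choices, not part of the lecture.

```python
def karatsuba(u, v, cutoff_bits=32):
    """Multiply non-negative integers using three half-size products."""
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v                      # base case: built-in multiplication
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    u1, u0 = u >> n, u & mask             # U = 2^n U1 + U0
    v1, v0 = v >> n, v & mask             # V = 2^n V1 + V0
    hi = karatsuba(u1, v1, cutoff_bits)   # U1 V1
    lo = karatsuba(u0, v0, cutoff_bits)   # U0 V0
    # (U1 - U0)(V0 - V1) may be negative, so recurse on the magnitudes.
    a, b = u1 - u0, v0 - v1
    sign = -1 if (a < 0) != (b < 0) else 1
    mid = sign * karatsuba(abs(a), abs(b), cutoff_bits)
    # (2^{2n} + 2^n) U1V1 + 2^n (U1 - U0)(V0 - V1) + (2^n + 1) U0V0
    return (hi << (2 * n)) + (hi << n) + (mid << n) + (lo << n) + lo
```

The recursion makes three calls on roughly half-size operands, which is where the $3T(n)$ term in the recurrence comes from.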
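Toom’s Method
$U = 2^{rn} U_r + \dots + 2^n U_1 + U_0$
$V = 2^{rn} V_r + \dots + 2^n V_1 + V_0$
$U(x) = x^r U_r + \dots + x U_1 + U_0$
$V(x) = x^r V_r + \dots + x V_1 + V_0$
$U(x)V(x) = W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$
Set $2r+1$ equations:
$W(0) = U(0)V(0)$, $W(1) = U(1)V(1)$, ..., $W(2r) = U(2r)V(2r)$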
Toom’s Method
$T((r+1)n) \le (2r+1)T(n) + cn$
$T(n) \le c\,n^{\log_{r+1}(2r+1)} < c\,n^{1 + \log_{r+1} 2}$
Theorem: Given $\epsilon > 0$, there exists a multiplication algorithm such that the number of elementary operations $T(n)$ needed to multiply two $n$-bit numbers satisfies, for some constant $c(\epsilon)$ independent of $n$,
$T(n) < c(\epsilon)\,n^{1+\epsilon}$
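To see how the exponent $\log_{r+1}(2r+1)$ approaches 1 as $r$ grows, the short sketch below tabulates it next to the bound $1 + \log_{r+1} 2$. The helper name `toom_exponent` and the sample values of $r$ are illustrative assumptions.

```python
import math

def toom_exponent(r):
    """Exponent log_{r+1}(2r+1) in T(n) <= c * n**log_{r+1}(2r+1)."""
    return math.log(2 * r + 1) / math.log(r + 1)

for r in (1, 2, 4, 8, 16, 64):
    bound = 1 + math.log(2) / math.log(r + 1)   # 1 + log_{r+1} 2
    print(f"r={r:3d}  exponent={toom_exponent(r):.3f}  bound={bound:.3f}")
```

For $r = 1$ the exponent is $\lg 3$, which matches Karatsuba’s method.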
Toom’s Method
$U = (4, 13, 2)_{16}$, $V = (9, 2, 5)_{16}$
$U(x) = 4x^2 + 13x + 2$, $V(x) = 9x^2 + 2x + 5$
$W(x) = U(x)V(x)$
$W(0) = 10$, $W(1) = 304$, $W(2) = 1980$, $W(3) = 7084$, $W(4) = 18526$
$W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$
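The sample values in this example can be reproduced with a few lines of Python. This is a sketch only; `horner` is a hypothetical helper name, and the digit lists are written most significant digit first, as on the slide.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients from highest to lowest degree."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

U = [4, 13, 2]    # U = (4, 13, 2) in radix 16
V = [9, 2, 5]     # V = (9, 2, 5) in radix 16
r = len(U) - 1
# Evaluate W(x) = U(x) V(x) at the 2r + 1 points x = 0, 1, ..., 2r.
samples = [horner(U, x) * horner(V, x) for x in range(2 * r + 1)]
print(samples)    # [10, 304, 1980, 7084, 18526]
```

Evaluating at $x = 16$ gives the integers themselves: $U = 1234$ and $V = 2341$.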
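Toom’s Method
$W(x) = x^{2r} W_{2r} + \dots + x W_1 + W_0$
Rewrite $W(x) = a_{2r} x^{\underline{2r}} + \dots + a_1 x^{\underline{1}} + a_0$,
where $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$
$W(x+1) - W(x) = 2r\,a_{2r} x^{\underline{2r-1}} + (2r-1)\,a_{2r-1} x^{\underline{2r-2}} + \dots + a_1$
$(W(x+2) - W(x+1)) - (W(x+1) - W(x)) = 2r(2r-1)\,a_{2r} x^{\underline{2r-2}} + (2r-1)(2r-2)\,a_{2r-1} x^{\underline{2r-3}} + \dots + 2a_2$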
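Toom’s Method
Here $W'$, $W''$, ... denote successive forward differences of the row above.
$W(\cdot)$: 10, 304, 1980, 7084, 18526
$W'(\cdot)$: 294, 1676, 5104, 11442
$W''(\cdot)$: 1382, 3428, 6338
$W''(\cdot)/2$: 691, 1714, 3169
$W'''(\cdot)/2$: 1023, 1455
$W'''(\cdot)/6$: 341, 485
$W''''(\cdot)/6$: 144
$W''''(\cdot)/24$: 36
$W(x) = 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10$
$= (((36(x-3) + 341)(x-2) + 691)(x-1) + 294)x + 10$
$= 36x^4 + 125x^3 + 64x^2 + 69x + 10$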
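The difference table can be computed mechanically. The sketch below assumes the samples are taken at $x = 0, 1, \dots, 2r$ and divides the $j$-th differences by $j$ at each step, which produces the same numbers as the cumulative divisions by 2, 6 and 24 above; `newton_coefficients` is a hypothetical name.

```python
def newton_coefficients(values):
    """Return [a_0, a_1, ..., a_m] such that W(x) = sum of a_j * x^(falling j),
    given the samples W(0), W(1), ..., W(m)."""
    coeffs = [values[0]]
    row = list(values)
    for j in range(1, len(values)):
        # After this step, row holds the j-th forward differences divided by j!.
        # The division is exact when W has integer coefficients.
        row = [(b - a) // j for a, b in zip(row, row[1:])]
        coeffs.append(row[0])
    return coeffs

print(newton_coefficients([10, 304, 1980, 7084, 18526]))
# [10, 294, 691, 341, 36]
```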
Toom’s Method
36    341
      -3×36
36    233     691
      -2×36   -2×233
36    161     225     294
      -1×36   -1×161  -1×225
36    125     64      69      10
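The scheme above converts falling-factorial coefficients into ordinary power-basis coefficients by repeatedly multiplying by $(x - j)$ and adding the next coefficient. A minimal sketch, with `falling_to_power` as a hypothetical name and coefficients of the result listed from highest degree to lowest:

```python
def falling_to_power(a):
    """a[j] multiplies x(x-1)...(x-j+1); return power-basis coefficients,
    highest degree first."""
    m = len(a) - 1
    poly = [a[m]]
    for j in range(m - 1, -1, -1):
        shifted = poly + [0]                     # poly * x
        scaled = [0] + [-j * c for c in poly]    # poly * (-j)
        poly = [s + t for s, t in zip(shifted, scaled)]
        poly[-1] += a[j]                         # add the next coefficient
    return poly

power = falling_to_power([10, 294, 691, 341, 36])
print(power)          # [36, 125, 64, 69, 10]

# Substituting x = 16 (the radix) recovers the integer product.
value = 0
for c in power:
    value = value * 16 + c
print(value)          # 2888794, which is 1234 * 2341
```

The intermediate values of `poly` are the rows of the scheme: `[36, 233]`, `[36, 161, 225]`, `[36, 125, 64, 69]`.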
Toom and Cook’s Method
Theorem: There is a constant $c$ such that the execution time of Toom and Cook’s method is less than $c\,n\,2^{3.5\sqrt{\lg n}}$ cycles.
Modular Method (Schönhage)
Recursive formula: $q_0 = 1$, $q_{k+1} = 3q_k - 1$
Thus, we have $q_k = \frac{1}{2}(3^k + 1)$
Relatively prime $p_i$: $6q_k - 1$, $6q_k + 1$, $6q_k + 2$, $6q_k + 3$, $6q_k + 5$, $6q_k + 7$
Set six moduli $m_i = 2^{p_i} - 1$
Modular Method
Given $U$ and $V$, find $W = U \times V$
Compute $u_i = U \bmod m_i$, $v_i = V \bmod m_i$
Compute $w_i = u_i \times v_i \bmod m_i$
Recover $W$
$T(n) = O(n^{\log_3 6}) = O(n^{1.631})$
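The steps above can be sketched as follows, using the six moduli $2^{p_i} - 1$ with exponents built from $q_k$. This is an illustration under stated assumptions: the residue products use Python’s built-in multiplication rather than a recursive call, recovery uses a textbook Chinese remainder formula, and the names `schonhage_moduli`, `crt` and `modular_multiply` are hypothetical.

```python
from math import prod

def schonhage_moduli(k):
    q = (3 ** k + 1) // 2                       # q_k = (3^k + 1) / 2
    exponents = [6 * q - 1, 6 * q + 1, 6 * q + 2,
                 6 * q + 3, 6 * q + 5, 6 * q + 7]
    return [(1 << p) - 1 for p in exponents]    # m_i = 2^{p_i} - 1

def crt(residues, moduli):
    """Recover x modulo prod(moduli) from x mod m_i (moduli pairwise coprime)."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m                    # product of the other moduli
        x += r * partial * pow(partial, -1, m)  # pow(.., -1, m): modular inverse
    return x % total

def modular_multiply(u, v, k=1):
    moduli = schonhage_moduli(k)
    capacity = prod(moduli).bit_length() - 1
    if u.bit_length() + v.bit_length() > capacity:
        raise ValueError("operands too large for these moduli")
    residues = [((u % m) * (v % m)) % m for m in moduli]   # w_i = u_i v_i mod m_i
    return crt(residues, moduli)
```

With `k=1` the exponents are 11, 13, 14, 15, 17 and 19, so the moduli cover products of up to 88 bits.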
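FFT
Set $w = \exp(2\pi i/K)$, i.e. $w^K = 1$
$\hat{u}_s = \sum_{0 \le t < K} w^{st} u_t$
$\hat{v}_s = \sum_{0 \le t < K} w^{st} v_t$
$U(s)V(s) = (\hat{u}_0 \hat{v}_0, \hat{u}_1 \hat{v}_1, \dots, \hat{u}_{K-1} \hat{v}_{K-1})$
$P(s) = U(s)V(s)$, $\hat{p}_s = \hat{u}_s \hat{v}_s$
$\hat{p}_s = \sum_{0 \le t < K} w^{st} p_t$
Given $U(t) = (u_0, u_1, \dots, u_{K-1})$, $V(t) = (v_0, v_1, \dots, v_{K-1})$, find $P(t) = (p_0, p_1, \dots, p_{K-1})$, where $p_t = \sum_{i+j \equiv t \pmod K} u_i v_j$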
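The relation between the transform of a cyclic convolution and the pointwise product of transforms can be checked numerically. The sketch below uses a direct $O(K^2)$ transform rather than a fast one; the names `dft` and `cyclic_convolution` and the sample vectors are illustrative assumptions.

```python
import cmath

def dft(u):
    """Direct transform: sum over t of w^(s t) u_t, with w = exp(2 pi i / K)."""
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

def cyclic_convolution(u, v):
    """p_t = sum of u_i v_j over all i + j congruent to t modulo K."""
    K = len(u)
    return [sum(u[i] * v[(t - i) % K] for i in range(K)) for t in range(K)]

u = [1, 2, 3, 4]
v = [5, 6, 7, 8]
lhs = dft(cyclic_convolution(u, v))               # transform of the convolution
rhs = [a * b for a, b in zip(dft(u), dft(v))]     # pointwise product of transforms
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)   # a small floating-point residue
```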
FFT
$K \ge 2n - 1$, $u_n = u_{n+1} = \dots = u_{K-1} = 0$, $v_n = v_{n+1} = \dots = v_{K-1} = 0$
$p_t = \sum_{i+j \equiv t \pmod K} u_i v_j = u_t v_0 + u_{t-1} v_1 + \dots + u_0 v_t$
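With enough zero padding, the cyclic convolution has no wrap-around and equals the ordinary polynomial product, so the digits of an integer product follow after substituting the radix. A minimal sketch, assuming least-significant-first digit lists; `multiply_digits` is a hypothetical name.

```python
def cyclic_convolution(u, v):
    K = len(u)
    return [sum(u[i] * v[(t - i) % K] for i in range(K)) for t in range(K)]

def multiply_digits(u_digits, v_digits, base=16):
    """Multiply two numbers given as digit lists, least significant digit first."""
    n = max(len(u_digits), len(v_digits))
    K = 2 * n - 1                                   # smallest K with K >= 2n - 1
    u = list(u_digits) + [0] * (K - len(u_digits))  # u_n = ... = u_{K-1} = 0
    v = list(v_digits) + [0] * (K - len(v_digits))
    p = cyclic_convolution(u, v)                    # i + j <= 2n - 2 < K: no wrap
    return sum(c * base ** t for t, c in enumerate(p))

print(multiply_digits([2, 13, 4], [5, 2, 9]))       # 2888794 = 1234 * 2341
```

The unevaluated convolution here is `[10, 69, 64, 125, 36]`, the coefficients of $36x^4 + 125x^3 + 64x^2 + 69x + 10$ read from low degree to high.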
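FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)$)
Set $A_0(t_{k-1}, \dots, t_0) = u_t$, i.e. $A_0(t) = u_t$
Set $A_1(s_{k-1}, t_{k-2}, \dots, t_0) = A_0(0, t_{k-2}, \dots, t_0) + w^{2^{k-1} s_{k-1}} A_0(1, t_{k-2}, \dots, t_0)$
Set $A_2(s_{k-1}, s_{k-2}, t_{k-3}, \dots, t_0) = A_1(s_{k-1}, 0, t_{k-3}, \dots, t_0) + w^{2^{k-2} (s_{k-2} s_{k-1})_2} A_1(s_{k-1}, 1, t_{k-3}, \dots, t_0)$
Set $A_k(s_{k-1}, s_{k-2}, s_{k-3}, \dots, s_0) = A_{k-1}(s_{k-1}, \dots, s_1, 0) + w^{(s_0 s_1 \dots s_{k-1})_2} A_{k-1}(s_{k-1}, \dots, s_1, 1)$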
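FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)$)
Replace $t_{k-1}$ with $s_{k-1}$
$s_{k-1}$ determines $w^{2^{k-1} s_{k-1}}$
Replace $t_{k-2}$ with $s_{k-2}$
$s_{k-1}, s_{k-2}$ determine $w^{2^{k-2} (s_{k-2} s_{k-1})_2}$
Replace $t_0$ with $s_0$
$s_{k-1}, s_{k-2}, \dots, s_0$ determine $w^{(s_0 s_1 \dots s_{k-1})_2}$
Binary $s = (s_0, s_1, \dots, s_{k-1})_2$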
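FFT ($K = 2^k$, $t = (t_{k-1}, \dots, t_0)$)
By induction, we have
$A_j(s_{k-1}, \dots, s_{k-j}, t_{k-j-1}, \dots, t_0) = \sum_{t_{k-1}, \dots, t_{k-j}} w^{2^{k-j} (s_{k-j} \dots s_{k-1})_2 (t_{k-1} \dots t_{k-j})_2} u_t$
$A_k(s_{k-1}, \dots, s_0) = \sum_{t_{k-1}, \dots, t_0} w^{(s_0 \dots s_{k-1})_2 (t_{k-1} \dots t_0)_2} u_t$
$= \hat{u}_s$, the transform $\sum_{0 \le t < K} w^{st} u_t$ with $s = (s_0 \dots s_{k-1})_2$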
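The recurrence for $A_j$ can be transcribed almost literally: each pass replaces one index bit $t_{k-j}$ by $s_{k-j}$, and the result comes out in bit-reversed order. The sketch below is written for clarity rather than speed and assumes $w = \exp(2\pi i/K)$; the function name is hypothetical.

```python
import cmath

def fft_by_bit_replacement(u):
    K = len(u)
    if K < 1 or K & (K - 1):
        raise ValueError("length must be a power of two")
    k = K.bit_length() - 1
    w = cmath.exp(2j * cmath.pi / K)
    A = list(u)                                  # A_0(t) = u_t
    for j in range(1, k + 1):
        pos = k - j                              # bit t_{k-j} becomes s_{k-j}
        nxt = [0] * K
        for idx in range(K):
            # e = (s_{k-j} ... s_{k-1})_2, read from the top j bits of idx,
            # with s_{k-1} as the least significant bit of e.
            e = 0
            for i in range(1, j + 1):
                e |= ((idx >> (k - i)) & 1) << (i - 1)
            zero = idx & ~(1 << pos)             # same index with bit pos = 0
            one = idx | (1 << pos)               # same index with bit pos = 1
            nxt[idx] = A[zero] + w ** ((1 << pos) * e) * A[one]
        A = nxt
    # A_k is indexed by (s_{k-1}, ..., s_0) while s = (s_0 ... s_{k-1})_2,
    # so undo the bit reversal when writing the output.
    out = [0] * K
    for idx in range(K):
        s = 0
        for p in range(k):
            s |= ((idx >> p) & 1) << (k - 1 - p)
        out[s] = A[idx]
    return out
```

Each of the $k$ passes does $K$ multiply-adds, which gives the $K \lg K$ operation count of the fast transform.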
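FFT: k=2
       (00)   (01)   (10)   (11)
(00)   1      1      1      1
(01)   1      w      w^2    w^3
(10)   1      w^2    w^4    w^6
(11)   1      w^3    w^6    w^9
$(\hat{u}_0, \hat{u}_1, \hat{u}_2, \hat{u}_3)^T = M\,(u_0, u_1, u_2, u_3)^T$, where $M$ is the matrix above and $\hat{u}_s = \sum_t w^{st} u_t$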
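FFT: k=2
       (00)   (10)   (01)   (11)
(00)   1      1      1      1
(10)   1      w^4    w^2    w^6
(01)   1      w^2    w      w^3
(11)   1      w^6    w^3    w^9
$(\hat{u}_0, \hat{u}_2, \hat{u}_1, \hat{u}_3)^T = M\,(u_0, u_2, u_1, u_3)^T$, where $M$ is the matrix above and $\hat{u}_s = \sum_t w^{st} u_t$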
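FFT: k=2
       (00)   (10)   (01)   (11)
(00)   1      1      1      1
(10)   1      1      -1     -1
(01)   1      -1     w      -w
(11)   1      -1     -w     w
$(\hat{u}_0, \hat{u}_2, \hat{u}_1, \hat{u}_3)^T = M\,(u_0, u_2, u_1, u_3)^T$, where $M$ is the matrix above and $\hat{u}_s = \sum_t w^{st} u_t$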
FFT: k=2
|  1   1   1   1 |     |  1   0   1   0 |     |  1   1   0   0 |
|  1   1  -1  -1 |  =  |  1   0  -1   0 |  ×  |  1  -1   0   0 |
|  1  -1   w  -w |     |  0   1   0   w |     |  0   0   1   1 |
|  1  -1  -w   w |     |  0   1   0  -w |     |  0   0   1  -1 |
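The factorization for $k = 2$ can be verified directly with $w = i$. The matrix names `M`, `F2` and `F1` below are labels introduced for this check only.

```python
w = 1j   # w = exp(2*pi*i/4), so w^2 = -1 and w^4 = 1

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

M = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, w, -w],
     [1, -1, -w, w]]
F2 = [[1, 0, 1, 0],
      [1, 0, -1, 0],
      [0, 1, 0, w],
      [0, 1, 0, -w]]
F1 = [[1, 1, 0, 0],
      [1, -1, 0, 0],
      [0, 0, 1, 1],
      [0, 0, 1, -1]]

product = matmul(F2, F1)

# The same matrix built from w^(s t), with rows and columns in the
# bit-reversed order (00), (10), (01), (11).
powers = [1, w, -1, -w]
order = [0, 2, 1, 3]
dft_reordered = [[powers[(s * t) % 4] for t in order] for s in order]

print(all(abs(product[i][j] - M[i][j]) < 1e-12
          for i in range(4) for j in range(4)))   # True
```

Each factor has only two nonzero entries per row, so applying the two factors in turn costs far fewer operations than applying the dense matrix.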
FFT: k=3
       (000)  (001)  (010)  (011)  (100)  (101)  (110)  (111)
(000)  1      1      1      1      1      1      1      1
(001)  1      w      w^2    w^3    w^4    w^5    w^6    w^7
(010)  1      w^2    w^4    w^6    w^8    w^10   w^12   w^14
(011)  1      w^3    w^6    w^9    w^12   w^15   w^18   w^21
(100)  1      w^4    w^8    w^12   w^16   w^20   w^24   w^28
(101)  1      w^5    w^10   w^15   w^20   w^25   w^30   w^35
(110)  1      w^6    w^12   w^18   w^24   w^30   w^36   w^42
(111)  1      w^7    w^14   w^21   w^28   w^35   w^42   w^49
FFT: k=3
       (000)  (100)  (010)  (110)  (001)  (101)  (011)  (111)
(000)  1      1      1      1      1      1      1      1
(100)  1      w^16   w^8    w^24   w^4    w^20   w^12   w^28
(010)  1      w^8    w^4    w^12   w^2    w^10   w^6    w^14
(110)  1      w^24   w^12   w^36   w^6    w^30   w^18   w^42
(001)  1      w^4    w^2    w^6    w      w^5    w^3    w^7
(101)  1      w^20   w^10   w^30   w^5    w^25   w^15   w^35
(011)  1      w^12   w^6    w^18   w^3    w^15   w^9    w^21
(111)  1      w^28   w^14   w^42   w^7    w^35   w^21   w^49
FFT: k=3
       (000)  (100)  (010)  (110)  (001)  (101)  (011)  (111)
(000)  1      1      1      1      1      1      1      1
(100)  1      1      1      1      -1     -1     -1     -1
(010)  1      1      -1     -1     w^2    w^2    -w^2   -w^2
(110)  1      1      -1     -1     -w^2   -w^2   w^2    w^2
(001)  1      -1     w^2    -w^2   w      -w     w^3    -w^3
(101)  1      -1     w^2    -w^2   -w     w      -w^3   w^3
(011)  1      -1     -w^2   w^2    w^3    -w^3   w      -w
(111)  1      -1     -w^2   w^2    -w^3   w^3    -w     w
FFT: k=3
$M = F_3 F_2 F_1$, where
$M$ =
1      1      1      1      1      1      1      1
1      1      1      1      -1     -1     -1     -1
1      1      -1     -1     w^2    w^2    -w^2   -w^2
1      1      -1     -1     -w^2   -w^2   w^2    w^2
1      -1     w^2    -w^2   w      -w     w^3    -w^3
1      -1     w^2    -w^2   -w     w      -w^3   w^3
1      -1     -w^2   w^2    w^3    -w^3   w      -w
1      -1     -w^2   w^2    -w^3   w^3    -w     w
$F_3$ =
1      0      0      0      1      0      0      0
1      0      0      0      -1     0      0      0
0      1      0      0      0      w^2    0      0
0      1      0      0      0      -w^2   0      0
0      0      1      0      0      0      w      0
0      0      1      0      0      0      -w     0
0      0      0      1      0      0      0      w^3
0      0      0      1      0      0      0      -w^3
$F_2$ =
1      0      1      0      0      0      0      0
1      0      -1     0      0      0      0      0
0      1      0      w^2    0      0      0      0
0      1      0      -w^2   0      0      0      0
0      0      0      0      1      0      1      0
0      0      0      0      1      0      -1     0
0      0      0      0      0      1      0      w^2
0      0      0      0      0      1      0      -w^2
$F_1$ =
1      1      0      0      0      0      0      0
1      -1     0      0      0      0      0      0
0      0      1      1      0      0      0      0
0      0      1      -1     0      0      0      0
0      0      0      0      1      1      0      0
0      0      0      0      1      -1     0      0
0      0      0      0      0      0      1      1
0      0      0      0      0      0      1      -1
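The three sparse factors for $k = 3$ follow a regular pattern, so they can be generated and multiplied out as a check. In the sketch below, `stage(j)` is a hypothetical helper that builds the factor acting on blocks of size $2^j$, and `rev` reverses the low bits of an index.

```python
import cmath

K = 8
w = cmath.exp(2j * cmath.pi / K)

def rev(x, bits):
    """Reverse the lowest `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def stage(j):
    """Sparse factor acting on blocks of size 2^j (two nonzeros per row)."""
    size, half = 1 << j, 1 << (j - 1)
    F = [[0] * K for _ in range(K)]
    for base in range(0, K, size):
        for a in range(half):
            tw = w ** ((K // size) * rev(a, j - 1))   # twiddle factor
            for sign, row in ((1, 2 * a), (-1, 2 * a + 1)):
                F[base + row][base + a] = 1
                F[base + row][base + a + half] = sign * tw
    return F

def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(K)) for j in range(K)]
            for i in range(K)]

order = [rev(i, 3) for i in range(K)]                 # 0, 4, 2, 6, 1, 5, 3, 7
M = [[w ** (s * t) for t in order] for s in order]    # bit-reversed w^(s t)
product = matmul(stage(3), matmul(stage(2), stage(1)))
max_error = max(abs(product[i][j] - M[i][j])
                for i in range(K) for j in range(K))
print(max_error)   # a small floating-point residue
```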
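FFT
$u(s) = u_0 + u_1 s + u_2 s^2 + \dots + u_{2^k-1} s^{2^k-1}$
$u(s) = u_0 + u_2 s^2 + \dots + u_{2^k-2} s^{2^k-2}$
$+\, u_1 s + u_3 s^3 + \dots + u_{2^k-1} s^{2^k-1}$
$u(s) = F_e(s^2) + s F_d(s^2)$
$F_e(s^2) = u_0 + u_2 s^2 + \dots + u_{2^k-2} s^{2^k-2}$
$F_d(s^2) = u_1 + u_3 s^2 + \dots + u_{2^k-1} s^{2^k-2}$
$u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$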
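FFT
$u(s) = u_0 + u_1 s + u_2 s^2 + \dots + u_{2^k-1} s^{2^k-1}$
$u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$
$u(s) = F_{eee}(s^8) + s^4 F_{eed}(s^8) + s^2[F_{ede}(s^8) + s^4 F_{edd}(s^8)] + s\{[F_{dee}(s^8) + s^4 F_{ded}(s^8)] + s^2[F_{dde}(s^8) + s^4 F_{ddd}(s^8)]\}$
$F_{x \dots x}(s^{2^{k-1}}) = F_{x \dots xe}(s^{2^k}) + s^{2^{k-1}} F_{x \dots xd}(s^{2^k})$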
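FFT
$u(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 + u_4 s^4 + u_5 s^5 + u_6 s^6 + u_7 s^7$
$u(s) = F_e(s^2) + s F_d(s^2)$
$F_e(s^2) = u_0 + u_2 s^2 + u_4 s^4 + u_6 s^6$
$F_d(s^2) = u_1 + u_3 s^2 + u_5 s^4 + u_7 s^6$
$F_e(s^2) = F_{ee}(s^4) + s^2 F_{ed}(s^4)$
$F_{ee}(s^4) = u_0 + u_4 s^4$, $F_{ed}(s^4) = u_2 + u_6 s^4$
$F_d(s^2) = F_{de}(s^4) + s^2 F_{dd}(s^4)$
$F_{de}(s^4) = u_1 + u_5 s^4$, $F_{dd}(s^4) = u_3 + u_7 s^4$
$F_x(s = w^0) = F_x(s = w^4)$, $F_x(s = w^2) = F_x(s = w^6)$, $F_x(s = w) = F_x(s = w^5)$, $F_x(s = w^3) = F_x(s = w^7)$,
$x = e, d$; $(s_0, s_1, s_2) = (-, 0, 0), (-, 0, 1), (-, 1, 0), (-, 1, 1)$
$F_{xx}(s = w^0) = F_{xx}(s = w^2) = F_{xx}(s = w^4) = F_{xx}(s = w^6)$, $F_{xx}(s = w) = F_{xx}(s = w^3) = F_{xx}(s = w^5) = F_{xx}(s = w^7)$,
$xx = ee, ed, de, dd$; $(s_0, s_1, s_2) = (-, -, 0), (-, -, 1)$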
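The even/odd splitting above leads directly to a recursive transform: the half-length transforms of the even- and odd-indexed coefficients are each reused for two output points, because $F_x$ takes the same value at $s = w^j$ and $s = w^{j + K/2}$. A minimal sketch, assuming the length is a power of two and $w = \exp(2\pi i/K)$; `fft_recursive` is a hypothetical name.

```python
import cmath

def fft_recursive(u):
    K = len(u)
    if K == 1:
        return list(u)
    even = fft_recursive(u[0::2])    # F_e evaluated at the squares w^(2s)
    odd = fft_recursive(u[1::2])     # F_d evaluated at the squares w^(2s)
    w = cmath.exp(2j * cmath.pi / K)
    out = [0] * K
    for s in range(K // 2):
        term = w ** s * odd[s]
        out[s] = even[s] + term              # u(w^s) = F_e + w^s F_d
        out[s + K // 2] = even[s] - term     # w^(s + K/2) = -w^s, same F_e, F_d
    return out
```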
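FFT (Inversion)
$\hat{\hat{u}}_r = \sum_{0 \le s < K} w^{rs} \hat{u}_s$
$= \sum_{0 \le s, t < K} w^{rs} w^{st} u_t$
$= \sum_{0 \le t < K} u_t \sum_{0 \le s < K} w^{s(t+r)}$
$= K u_{(-r) \bmod K}$
$\sum_{0 \le s < K} w^{sj} = K$ if $j \bmod K = 0$, and $0$ otherwise.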
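The inversion identity says that applying the same transform twice returns the input scaled by $K$ and index-reversed. The sketch below checks this numerically with a direct $O(K^2)$ transform; `dft` and `inverse_dft` are hypothetical names and the sample vector is arbitrary.

```python
import cmath

def dft(u):
    K = len(u)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * u[t] for t in range(K)) for s in range(K)]

def inverse_dft(u_hat):
    """Invert dft by transforming again and reading index (-t) mod K."""
    K = len(u_hat)
    twice = dft(u_hat)                       # twice[r] = K * u[(-r) mod K]
    return [twice[(-t) % K] / K for t in range(K)]

u = [3, 1, 4, 1, 5, 9, 2, 6]
twice = dft(dft(u))
K = len(u)
max_error = max(abs(twice[r] - K * u[(-r) % K]) for r in range(K))
print(max_error)   # a small floating-point residue
```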
FFT
$2n \le 2^k g < 4n$, $K = 2^k$
Precision $m = 6k$
Let $M$ = time of $m$-bit multiplication
Total time to multiply $n$-bit numbers: $O(n) + O(Mnk/g)$
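Putting the pieces together, two $n$-bit integers can be multiplied by cutting them into $K = 2^k$ pieces of $g$ bits each, transforming, multiplying pointwise and inverting. The sketch below is an illustration under loud assumptions: it uses Python’s double-precision complex numbers instead of fixed $m$-bit arithmetic, a default of `g=8` chosen only so that rounding stays safe for moderate sizes, and the hypothetical names `fft` and `fft_multiply`.

```python
import cmath

def fft(u):
    """Recursive transform with w = exp(2 pi i / K); len(u) is a power of two."""
    K = len(u)
    if K == 1:
        return list(u)
    even, odd = fft(u[0::2]), fft(u[1::2])
    w = cmath.exp(2j * cmath.pi / K)
    out = [0] * K
    for s in range(K // 2):
        term = w ** s * odd[s]
        out[s] = even[s] + term
        out[s + K // 2] = even[s] - term
    return out

def fft_multiply(u, v, g=8):
    """Multiply non-negative integers via g-bit pieces (assumed small g)."""
    n = max(u.bit_length(), v.bit_length(), 1)
    k = 1
    while (1 << k) * g < 2 * n:              # smallest k >= 1 with 2^k g >= 2n
        k += 1
    K = 1 << k
    mask = (1 << g) - 1
    U = [(u >> (g * t)) & mask for t in range(K)]   # upper half is zero padding
    V = [(v >> (g * t)) & mask for t in range(K)]
    P_hat = [a * b for a, b in zip(fft(U), fft(V))]
    twice = fft(P_hat)                       # twice[r] = K * p[(-r) mod K]
    p = [round((twice[(-t) % K] / K).real) for t in range(K)]
    return sum(c << (g * t) for t, c in enumerate(p))   # carries handled by +

print(fft_multiply(1234, 2341))   # 2888794
```

Floating-point rounding limits how large the operands can be in this sketch; the precision bound $m = 6k$ above is what a fixed-point implementation would rely on instead.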