Convolution as matrix multiplication
• Edwin Efraín Jiménez Lepe
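FeedForward: applying kernel rotation

Input (3x3):
16 24 32
47 18 26
68 12 9

W1:
0 1
-1 0

W2:
2 3
4 5

Im2col (input), one row per 2x2 patch, read column by column:
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26 9

Rotated filters, flattened column by column, one column per filter (W1, W2):
0 5
1 3
-1 4
0 2

Im2col (input) x rotated filters =
23 353
50 535
-14 354
-14 248

Rearrange each column into a 2x2 output channel:
Channel 1:
23 -14
50 -14
Channel 2:
353 354
535 248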
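The feedforward step can be reproduced in a few lines of NumPy. This is a minimal sketch rather than the author's code: the helper name `im2col` and the column-by-column (Fortran-order) flattening are assumptions, chosen so that the intermediate matrices match the ones on the slide.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):          # outer loop: patch column
        for i in range(h - k + 1):      # inner loop: patch row (varies fastest)
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

x = np.array([[16, 24, 32], [47, 18, 26], [68, 12, 9]])
w1 = np.array([[0, 1], [-1, 0]])
w2 = np.array([[2, 3], [4, 5]])

cols = im2col(x, 2)                     # (4, 4): one row per patch
# Rotate each kernel by 180 degrees, then flatten it column by column.
weights = np.stack([np.rot90(w, 2).flatten(order="F") for w in (w1, w2)], axis=1)
product = cols @ weights                # (4, 2): one column per filter
# Rearrange each column into a 2x2 output channel.
y = [product[:, f].reshape(2, 2, order="F") for f in range(2)]
print(y[0])
print(y[1])
```

Rotating the kernels before flattening is what turns the plain matrix product (a correlation) into a convolution.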
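FeedForward: now with bias

The input, the filters W1 and W2, and Im2col (input) are the same as in the previous slide. The bias is added inside the same product: a column of ones is appended to Im2col (input), and a row holding one bias per filter (1 for W1, 0 for W2) is appended to the rotated filters.

Im2col (input) with a column of ones:
16 47 24 18 1
47 68 18 12 1
24 18 32 26 1
18 12 26 9 1

Rotated filters with a bias row:
0 5
1 3
-1 4
0 2
1 0

Product:
24 353
51 535
-13 354
-13 248

Rearrange each column into a 2x2 output channel:
Channel 1:
24 -13
51 -13
Channel 2:
353 354
535 248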
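A short sketch of the bias trick, assuming the reading of the slide in which the bias is 1 for W1 and 0 for W2 and enters through an extra column of ones. The variable names are illustrative only.

```python
import numpy as np

# Im2col (input) and the rotated, flattened filters from the feedforward slide.
cols = np.array([[16, 47, 24, 18], [47, 68, 18, 12], [24, 18, 32, 26], [18, 12, 26, 9]])
weights = np.array([[0, 5], [1, 3], [-1, 4], [0, 2]])
bias = np.array([1, 0])                 # assumed: one bias per filter (W1, W2)

# Append a column of ones to the patches and the biases as a last weight row,
# so that a single matrix product computes patch . kernel + bias.
cols_b = np.hstack([cols, np.ones((cols.shape[0], 1), dtype=int)])
weights_b = np.vstack([weights, bias])
product = cols_b @ weights_b            # (4, 2)

y1 = product[:, 0].reshape(2, 2, order="F")
y2 = product[:, 1].reshape(2, 2, order="F")
print(y1)
print(y2)
```

Folding the bias into the weight matrix keeps the whole layer a single matrix multiplication.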
BackPropagation: d_w = input * d_y

Input (3x3):
16 24 32
47 18 26
68 12 9

d_y, one 2x2 channel per filter:
Channel 1:
0 0
-2.94504954e-05 0
Channel 2:
0 0
6.39539432e-06 0

Im2col (input):
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26 9

Im2col (d_y), one column per channel:
0 0
-2.94504954e-05 6.39539432e-06
0 0
0 0

Im2col (input) x Im2col (d_y) =
-1.38417328e-03 3.00583533e-04
-2.00263369e-03 4.34886814e-04
-5.30108917e-04 1.15117098e-04
-3.53405945e-04 7.67447318e-05

Rearrange the first column into the 2x2 d_w for channel 1:
-1.38417328e-03 -5.30108917e-04
-2.00263369e-03 -3.53405945e-04

The update corresponds to the rotated kernel.
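The weight gradient can be checked with the same im2col matrix. This is a sketch under assumptions: `im2col` is a hypothetical helper using column-by-column ordering, and the final 180-degree rotation is one way to act on the remark that the update corresponds to the rotated kernel.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):
        for i in range(h - k + 1):
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

x = np.array([[16, 24, 32], [47, 18, 26], [68, 12, 9]])
d_y = np.array([[[0, 0], [-2.94504954e-05, 0]],
                [[0, 0], [6.39539432e-06, 0]]])

cols = im2col(x, 2)                                            # (4, 4)
dy_cols = np.stack([c.flatten(order="F") for c in d_y], axis=1)  # (4, 2)
# In general the patches must be transposed; for a 3x3 input and 2x2 kernel
# the im2col matrix happens to be symmetric, so cols and cols.T coincide.
product = cols.T @ dy_cols                                     # (4, 2)
d_w_rot = [product[:, f].reshape(2, 2, order="F") for f in range(2)]
# Gradient with respect to the unrotated kernels: rotate back by 180 degrees.
d_w = [np.rot90(g, 2) for g in d_w_rot]
print(d_w_rot[0])
```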
BackPropagation: d_x = d_y * w (without rotation)

d_y, one 2x2 channel per filter:
Channel 1:
0 0
-2.94504954e-05 0
Channel 2:
0 0
6.39539432e-06 0

We need a full convolution, and we keep the kernel unrotated. Each channel of d_y is zero-padded to 4x4:
Channel 1:
0 0 0 0
0 0 0 0
0 -2.94504954e-05 0 0
0 0 0 0
Channel 2:
0 0 0 0
0 0 0 0
0 6.39539432e-06 0 0
0 0 0 0

Each padded channel is combined with its own unrotated filter:
W1:
0 1
-1 0
W2:
2 3
4 5
BackPropagation: d_x = d_y * w (without rotation)

In im2col form, each padded channel of d_y becomes a 4x9 matrix. Its transpose (T) multiplies the unrotated filter, flattened column by column.

Im2col (padded d_y, channel 1), transposed, x W1 flattened (0, -1, 1, 0):
0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0

Im2col (padded d_y, channel 2), transposed, x W2 flattened (2, 4, 3, 5):
0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0
BackPropagation: d_x = d_y * w (without rotation)

Each of the two products is a 9-element column vector, written here as a row:
Channel 1:
0 0 -0.2945e-04 0 0.2945e-04 0 0 0 0
Channel 2:
0 0.3198e-04 0.1919e-04 0 0.2558e-04 0.1279e-04 0 0 0
BackPropagation: d_x = d_y * w (without rotation)

Add the two per-channel vectors:
0 0 -0.2945e-04 0 0.2945e-04 0 0 0 0
+
0 0.3198e-04 0.1919e-04 0 0.2558e-04 0.1279e-04 0 0 0
=
0 0.3198e-04 -0.1026e-04 0 0.5503e-04 0.1279e-04 0 0 0

Reshape, column by column, into the 3x3 d_x:
0 0 0
0.3198e-04 0.5503e-04 0
-0.1026e-04 0.1279e-04 0
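The input gradient for this small example can be reproduced as follows. This is a sketch, assuming a hypothetical `im2col` helper with column-by-column ordering; it returns patches as rows, so it already yields the transposed form that the slides mark with T.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):
        for i in range(h - k + 1):
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

w = [np.array([[0, 1], [-1, 0]]), np.array([[2, 3], [4, 5]])]
d_y = [np.array([[0, 0], [-2.94504954e-05, 0]]),
       np.array([[0, 0], [6.39539432e-06, 0]])]

d_x_flat = np.zeros(9)
for dy_c, w_c in zip(d_y, w):
    padded = np.pad(dy_c, 1, mode="constant")   # 'full' mode: one ring of zeros, 4x4
    cols = im2col(padded, 2)                    # (9, 4): one row per input position
    d_x_flat += cols @ w_c.flatten(order="F")   # kernel used unrotated
d_x = d_x_flat.reshape(3, 3, order="F")
print(d_x)
```

The forward pass multiplied patches by the rotated kernel, so the backward pass over the padded deltas uses the kernel as stored.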
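BackPropagation: d_x in one operation

In fact, we can do it in just one operation: stack the two Im2col (padded d_y) matrices (8x9), transpose (T), and multiply by the stacked unrotated filters (0, -1, 1, 0, 2, 4, 3, 5).

Stacked Im2col (padded d_y):
0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0
0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0

Result (a 9-element column vector, written here as a row):
0 0.3198e-04 -0.1026e-04 0 0.5503e-04 0.1279e-04 0 0 0

Notice that every channel of delta is multiplied by the corresponding filter that generated it.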
A multi-channel example

Input (3,3,3):
Channel 1:
16 24 32
47 18 26
68 12 9
Channel 2:
26 57 43
24 21 12
2 11 19
Channel 3:
18 47 21
4 6 12
81 22 13

Filters (2,3,2,2), one 2x2 kernel per input channel:
Filter 1:
0 1
-1 0

2 3
4 5

-2 68
24 16
Filter 2:
18 32
22 60

23 7
46 35

42 20
81 78

Output = (2,2,2), applying Theano convolution (which rotates the filters automatically):
Channel 1:
2171 2170
5954 2064
Channel 2:
13042 13575
11023 6425
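The multi-channel output can be checked with a direct, loop-based convolution. This sketch does not call Theano; it is a plain NumPy stand-in that rotates each kernel by 180 degrees by hand, and the array layout (filters, channels, rows, columns) is assumed from the shapes quoted on the slide.

```python
import numpy as np

x = np.array([
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
])                                                    # (3, 3, 3)
w = np.array([
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],        # filter 1
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],  # filter 2
])                                                    # (2, 3, 2, 2)

n_f, n_c, k, _ = w.shape
out = x.shape[1] - k + 1
y = np.zeros((n_f, out, out))
for f in range(n_f):
    for c in range(n_c):
        w_rot = np.rot90(w[f, c], 2)                  # convolution flips the kernel
        for i in range(out):
            for j in range(out):
                # Each output value sums the contributions of all input channels.
                y[f, i, j] += np.sum(x[c, i:i + k, j:j + k] * w_rot)
print(y)
```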
A multi-channel example (vectorized)

The input (3,3,3) and the filters (2,3,2,2) are the same as in the previous slide.

Im2col (input), one 4x4 block per input channel (12x4), transposed (T) before the product:
Channel 1:
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26 9
Channel 2:
26 24 57 21
24 2 21 11
57 21 43 12
21 11 12 19
Channel 3:
18 4 47 6
4 81 6 22
47 6 21 12
6 22 12 13

Rotated filters, flattened column by column, one column per filter (12x2):
0 60
1 32
-1 22
0 18
5 35
3 7
4 46
2 23
16 78
68 20
24 81
-2 42
A multi-channel example (vectorized)

Im2col (input) transposed (4x12) x rotated filters (12x2) =
2171 13042
5954 11023
2170 13575
2064 6425

Rearrange each column into a 2x2 output channel:
Channel 1:
2171 2170
5954 2064
Channel 2:
13042 13575
11023 6425
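The vectorized multi-channel product can be sketched as a single matrix multiplication. The helper `im2col` and the column-by-column flattening are assumptions made to match the matrices on the slides; the helper returns patches as rows, so the per-channel blocks are placed side by side instead of being stacked and transposed.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):
        for i in range(h - k + 1):
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

x = np.array([
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
])
w = np.array([
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
])

# One row per output position, holding that patch from all three channels.
cols = np.hstack([im2col(x[c], 2) for c in range(3)])            # (4, 12)
# One column per filter: its three kernels, rotated and flattened, end to end.
weights = np.stack(
    [np.concatenate([np.rot90(w[f, c], 2).flatten(order="F") for c in range(3)])
     for f in range(2)],
    axis=1,
)                                                                # (12, 2)
product = cols @ weights                                         # (4, 2)
y = np.stack([product[:, f].reshape(2, 2, order="F") for f in range(2)])
print(y)
```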
Backpropagation

Imagine we received the following error from the layer above, d_y (2,2,2):
Channel 1:
.1678 .098
.002 .246
Channel 2:
0.5 0.67
0.21 0.487

We want to propagate it to the corresponding layer (the input of the convolution). We need to compute d_y * w (without rotation). Because this is a 'full' convolution, we add one ring of zero padding to d_y:
Channel 1:
0 0 0 0
0 .1678 .098 0
0 .002 .246 0
0 0 0 0
Channel 2:
0 0 0 0
0 0.5 .67 0
0 .21 .487 0
0 0 0 0
Backpropagation: d_y-1 = d_y * w (without rotation)

Im2col of the padded d_y, one 4x9 block per channel:
Channel 1:
0 0 0 0 .1678 .002 0 .098 .246
0 0 0 .1678 .002 0 .098 .246 0
0 .1678 .002 0 .098 .246 0 0 0
.1678 .002 0 .098 .246 0 0 0 0
Channel 2:
0 0 0 0 .5 .21 0 .67 .487
0 0 0 .5 .21 0 .67 .487 0
0 .5 .21 0 .67 .487 0 0 0
.5 .21 0 .67 .487 0 0 0 0
Backpropagation: d_y-1 = d_y * w (without rotation)

Im2col (padded d_y), both channels stacked (8x9) and transposed (T), x the unrotated filters, flattened column by column, one column per input channel (8x3):
0 2 -2
-1 4 24
1 3 68
0 5 16
18 23 42
22 46 81
32 7 20
60 35 78

=
30 18.339 41.6848
28.7678 11.3634 37.8224
6.722 1.476 4.336
51.0322 47.6112 98.3552
64.376 44.7626 99.7084
19.61 8.981 35.284
14.642 31.212 56.622
22.528 38.992 73.295
8.766 11.693 19.962

Notice that every channel of delta is multiplied by the corresponding filter that generated it.
Backpropagation: d_y-1 = d_y * w (without rotation)

Rearrange each column of the product into a 3x3 channel of d_y-1 (3,3,3):
Channel 1:
30 51.0322 14.642
28.7678 64.376 22.528
6.722 19.61 8.766
Channel 2:
18.339 47.6112 31.212
11.3634 44.7626 38.992
1.476 8.981 11.693
Channel 3:
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336 35.284 19.962
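The multi-channel input gradient can be reproduced with one matrix product. This is a sketch under assumptions: `im2col` is a hypothetical helper with column-by-column ordering that returns patches as rows (the transposed form), and the weight matrix groups, for each input channel, the unrotated kernels of both filters.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):
        for i in range(h - k + 1):
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

d_y = np.array([
    [[0.1678, 0.098], [0.002, 0.246]],
    [[0.5, 0.67], [0.21, 0.487]],
])                                                    # (2, 2, 2)
w = np.array([
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
])                                                    # (2, 3, 2, 2)

# 'Full' mode: pad every delta channel with one ring of zeros, then take patches.
cols = np.hstack([im2col(np.pad(d_y[f], 1, mode="constant"), 2) for f in range(2)])  # (9, 8)
# One column per input channel: kernel c of filter 1, then kernel c of filter 2, unrotated.
weights = np.stack(
    [np.concatenate([w[f, c].flatten(order="F") for f in range(2)]) for c in range(3)],
    axis=1,
)                                                     # (8, 3)
product = cols @ weights                              # (9, 3)
d_x = np.stack([product[:, c].reshape(3, 3, order="F") for c in range(3)])
print(d_x[0])
```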
Backpropagation (not vectorized): d_y-1 = d_y * w (without rotation)

d_y (2,2,2):
Channel 1:
.1678 .098
.002 .246
Channel 2:
0.5 0.67
0.21 0.487

The filters have shape (2,3,2,2). Transpose dimensions 0 and 1 to get (3,2,2,2): three filters, one per input channel, each with one 2x2 kernel per channel of d_y.
Filter 1:
0 1
-1 0

18 32
22 60
Filter 2:
2 3
4 5

23 7
46 35
Filter 3:
-2 68
24 16

42 20
81 78
Backpropagation (not vectorized, full convolution): d_y-1 = d_y * w (without rotation)

The full convolution of d_y (2,2,2) with the transposed filters (3,2,2,2), kernels unrotated, gives d_y-1 (3,3,3):
Channel 1:
30 51.0322 14.642
28.7678 64.376 22.528
6.722 19.61 8.766
Channel 2:
18.339 47.6112 31.212
11.3634 44.7626 38.992
1.476 8.981 11.693
Channel 3:
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336 35.284 19.962
Backpropagation: d_w = input * d_y

Input (3,3,3):
Channel 1:
16 24 32
47 18 26
68 12 9
Channel 2:
26 57 43
24 21 12
2 11 19
Channel 3:
18 47 21
4 6 12
81 22 13

d_y (2,2,2):
Channel 1:
.1678 .098
.002 .246
Channel 2:
0.5 0.67
0.21 0.487

The dimensions do not match, which tells us that we need to apply both filters (the two channels of d_y) to every channel of the input.
Input channel 1 * d_y =
9.5588 13.5952
12.7386 7.8064

42.716 49.882
55.684 33.323

Input channel 2 * d_y =
15.1628 16.7726
8.7952 9.3958

66.457 67.564
31.847 30.103

Input channel 3 * d_y =
9.1104 12.9086
6.8332 5.4248

44.252 44.674
33.744 21.991
Backpropagation: d_w = input * d_y

Add the three per-channel results (input channel 1 + input channel 2 + input channel 3):
33.832 43.2764
28.367 22.627

153.425 162.12
121.275 85.417

This is the error associated with the rotated kernel, which means we need to rotate this result to update the unrotated kernel.
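The per-channel products and their sum can be checked with a plain sliding-window correlation. This sketch follows the slides in adding the three per-channel results; `correlate_valid` is a hypothetical helper, and the last line is one way to apply the rotation the slide asks for.

```python
import numpy as np

def correlate_valid(a, k):
    """Slide k over a without flipping it and without padding ('valid' mode)."""
    kh, kw = k.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * k)
    return out

x = np.array([
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
])
d_y = np.array([
    [[0.1678, 0.098], [0.002, 0.246]],
    [[0.5, 0.67], [0.21, 0.487]],
])

# per_channel[c, f]: input channel c correlated with channel f of d_y (d_y is not rotated).
per_channel = np.array([[correlate_valid(x[c], d_y[f]) for f in range(2)] for c in range(3)])
summed = per_channel.sum(axis=0)                 # (2, 2, 2), as added up on the slide
# Rotate by 180 degrees to obtain the update for the unrotated kernels.
update = np.rot90(summed, 2, axes=(1, 2))
print(summed)
```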
Backpropagation vectorized: d_w = input * d_y (without rotating d_y)

Input (3,3,3) * d_y (2,2,2): the dimensions do not match, which tells us that we need to apply both filters (the two channels of d_y) to every channel of the input.

Im2col (input), one 4x4 block per input channel (12x4), transposed (T) before the product:
Channel 1:
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26 9
Channel 2:
26 24 57 21
24 2 21 11
57 21 43 12
21 11 12 19
Channel 3:
18 4 47 6
4 81 6 22
47 6 21 12
6 22 12 13

d_y flattened column by column, one column per channel, repeated once per input channel (12x2):
.1678 0.5
.002 0.21
.098 0.67
.246 0.487
.1678 0.5
.002 0.21
.098 0.67
.246 0.487
.1678 0.5
.002 0.21
.098 0.67
.246 0.487
Backpropagation vectorized: d_w = input * d_y (without rotating d_y)

Im2col (input) transposed (4x12) x stacked d_y (12x2) =
33.832 153.425
28.367 121.275
43.2764 162.12
22.627 85.417

Rearrange each column into a 2x2 block:
33.832 43.2764
28.367 22.627

153.425 162.12
121.275 85.417
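The vectorized weight gradient reduces to one matrix product. This is a sketch, assuming a hypothetical `im2col` helper with column-by-column ordering; as on the slides, the flattened d_y is repeated once per input channel, so the product adds up the contributions of the three channels.

```python
import numpy as np

def im2col(x, k):
    """Return one row per k x k patch of x; patches and entries are read column by column."""
    h, w = x.shape
    rows = []
    for j in range(w - k + 1):
        for i in range(h - k + 1):
            rows.append(x[i:i + k, j:j + k].flatten(order="F"))
    return np.array(rows)

x = np.array([
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
])
d_y = np.array([
    [[0.1678, 0.098], [0.002, 0.246]],
    [[0.5, 0.67], [0.21, 0.487]],
])

cols = np.vstack([im2col(x[c], 2) for c in range(3)])                   # (12, 4), stacked per channel
dy_cols = np.stack([d_y[f].flatten(order="F") for f in range(2)], axis=1)  # (4, 2)
dy_stacked = np.tile(dy_cols, (3, 1))                                   # (12, 2), one copy per channel
product = cols.T @ dy_stacked                                           # (4, 2)
d_w_rot = np.stack([product[:, f].reshape(2, 2, order="F") for f in range(2)])
print(d_w_rot)
```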