Convolution as matrix multiplication

• Edwin Efraín Jiménez Lepe

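FeedForward (applying kernel rotation)

Input (3x3):
16 24 32
47 18 26
68 12  9

W1 (2x2):
 0 1
-1 0

W2 (2x2):
2 3
4 5

Input ∗ [W1 W2] = Im2col(input) x [W1 W2]

Im2col(input), one row per 2x2 patch, each patch read column by column:
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9

[W1 W2], one column per filter, each filter rotated 180 degrees and read column by column:
 0 5
 1 3
-1 4
 0 2

Im2col(input) x [W1 W2]:
 23 353
 50 535
-14 354
-14 248

Rearrange each column into a 2x2 output channel:

Channel 1:
23 -14
50 -14

Channel 2:
353 354
535 248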
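The feed-forward slide can be checked with a few lines of NumPy. This is only a sketch: the helper name im2col and all variable names are assumptions made for illustration, and the column-by-column ordering is inferred from the numbers on the slide.

```python
import numpy as np

def im2col(x, k):
    """One row per k x k patch; patches and their entries are read column by column."""
    height, width = x.shape
    rows = [x[r:r + k, c:c + k].flatten(order="F")
            for c in range(width - k + 1)
            for r in range(height - k + 1)]
    return np.array(rows)

x = np.array([[16, 24, 32],
              [47, 18, 26],
              [68, 12, 9]])
w1 = np.array([[0, 1], [-1, 0]])
w2 = np.array([[2, 3], [4, 5]])

cols = im2col(x, 2)  # 4 patches x 4 entries
# Rotate each filter by 180 degrees, flatten it column by column, use it as a column.
w_mat = np.stack([np.rot90(f, 2).flatten(order="F") for f in (w1, w2)], axis=1)
out = cols @ w_mat   # 4 x 2, one column per filter
# Rearrange each column into a 2x2 output channel.
channels = [out[:, i].reshape(2, 2, order="F") for i in range(2)]
print(channels[0])
print(channels[1])
```

The two printed matrices should match Channel 1 and Channel 2 of the slide.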

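FeedForward, now with bias

The input, W1 and W2 are unchanged. The biases are b1 = 1 and b2 = 0. Append a column of ones to Im2col(input) and the bias row to [W1 W2]:

Im2col(input) with a column of ones:
16 47 24 18 1
47 68 18 12 1
24 18 32 26 1
18 12 26  9 1

[W1 W2] with the bias row:
 0 5
 1 3
-1 4
 0 2
 1 0

Product:
 24 353
 51 535
-13 354
-13 248

Rearrange each column into a 2x2 output channel:

Channel 1:
24 -13
51 -13

Channel 2:
353 354
535 248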
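The bias step can be sketched in NumPy as follows. The matrices are copied from the slide; reading the slide's extra ones and the values 1 and 0 as a ones column and a bias row is an assumption, and the variable names are illustrative.

```python
import numpy as np

# Im2col(input) from the slide: one row per 2x2 patch.
cols = np.array([[16, 47, 24, 18],
                 [47, 68, 18, 12],
                 [24, 18, 32, 26],
                 [18, 12, 26, 9]])
# Rotated, flattened filters as columns (W1, W2).
w_mat = np.array([[0, 5], [1, 3], [-1, 4], [0, 2]])
bias = np.array([1, 0])  # assumed reading of the slide: b1 = 1, b2 = 0

# A column of ones on the patches, the bias row under the filters.
cols_aug = np.hstack([cols, np.ones((4, 1), dtype=cols.dtype)])
w_aug = np.vstack([w_mat, bias])

out = cols_aug @ w_aug
ch1 = out[:, 0].reshape(2, 2, order="F")
ch2 = out[:, 1].reshape(2, 2, order="F")
print(ch1)
print(ch2)
```

Folding the bias into the matrices keeps the whole layer a single matrix product.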

BackPropagation: d_w = input ∗ d_y

Input (3x3):
16 24 32
47 18 26
68 12  9

d_y, channel 1:
0 0
-2.94504954e-05 0

d_y, channel 2:
0 0
6.39539432e-06 0

Im2col(input):
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9

Im2col(d_y), one column per channel:
0 0
-2.94504954e-05 6.39539432e-06
0 0
0 0

Im2col(input) x Im2col(d_y):
-1.38417328e-03 3.00583533e-04
-2.00263369e-03 4.34886814e-04
-5.30108917e-04 1.15117098e-04
-3.53405945e-04 7.67447318e-05

Rearrange the first column into the 2x2 d_w for W1:
-1.38417328e-03 -5.30108917e-04
-2.00263369e-03 -3.53405945e-04

The update corresponds to the rotated kernel.
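A minimal NumPy sketch of this d_w computation, using the numbers from the slide. The variable names are assumptions. The last line applies the 180-degree rotation the slides call for when the update must target the unrotated kernel.

```python
import numpy as np

# Im2col(input) from the slide: one row per 2x2 patch.
cols = np.array([[16, 47, 24, 18],
                 [47, 68, 18, 12],
                 [24, 18, 32, 26],
                 [18, 12, 26, 9]], dtype=float)

# d_y with one 2x2 channel per filter.
d_y = np.array([[[0.0, 0.0], [-2.94504954e-05, 0.0]],
                [[0.0, 0.0], [6.39539432e-06, 0.0]]])

# Flatten each channel column by column and use it as a column: 4 x 2.
d_y_mat = np.stack([c.flatten(order="F") for c in d_y], axis=1)

d_w_mat = cols @ d_y_mat  # 4 x 2, one column per filter
d_w = [d_w_mat[:, i].reshape(2, 2, order="F") for i in range(2)]

# d_w belongs to the rotated kernel; rotate it back for the unrotated one.
d_w1_unrotated = np.rot90(d_w[0], 2)
print(d_w[0])
```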

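BackPropagation: d_x = d_y ∗ w (without rotation)

d_w (for W1):
-1.38417328e-03 -5.30108917e-04
-2.00263369e-03 -3.53405945e-04

d_y, channel 1:
0 0
-2.94504954e-05 0

d_y, channel 2:
0 0
6.39539432e-06 0

We need a full convolution, and we keep the kernel unrotated. Pad each channel of d_y with one ring of zeros and convolve it with the filter that produced it.

Padded d_y, channel 1 (to be convolved with W1):
0 0 0 0
0 0 0 0
0 -2.94504954e-05 0 0
0 0 0 0

Padded d_y, channel 2 (to be convolved with W2):
0 0 0 0
0 0 0 0
0 6.39539432e-06 0 0
0 0 0 0

W1:
 0 1
-1 0

W2:
2 3
4 5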


BackPropagation

In matrix form, each padded channel of d_y becomes an Im2col matrix with one column per 2x2 patch. Its transpose is multiplied by the corresponding unrotated filter, read column by column.

Im2col(padded d_y), channel 1 (4x9):
0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0

W1 as a column vector: (0 -1 1 0)^T

Im2col(padded d_y, channel 1)^T x W1:
0 0 -0.2945e-04 0 0.2945e-04 0 0 0 0

Im2col(padded d_y), channel 2 (4x9):
0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0

W2 as a column vector: (2 4 3 5)^T

Im2col(padded d_y, channel 2)^T x W2:
0 0.3198e-04 0.1919e-04 0 0.2558e-04 0.1279e-04 0 0 0


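BackPropagation

Add the two channel results element by element:

Channel 1: 0 0 -0.2945e-04 0 0.2945e-04 0 0 0 0
Channel 2: 0 0.3198e-04 0.1919e-04 0 0.2558e-04 0.1279e-04 0 0 0
Sum:       0 0.3198e-04 -0.1026e-04 0 0.5503e-04 0.1279e-04 0 0 0

Reshape the sum, column by column, into the 3x3 d_x:
0 0 0
0.3198e-04 0.5503e-04 0
-0.1026e-04 0.1279e-04 0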


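BackPropagation

In fact, we can do it in just one operation: stack the two Im2col matrices, transpose the stack, and multiply it by the two flattened filters stacked in a single column vector.

Im2col(padded d_y), channels 1 and 2 stacked (8x9):
0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0
0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0

W1 and W2 stacked as a column vector: (0 -1 1 0 2 4 3 5)^T

Stack^T x filters:
0 0.3198e-04 -0.1026e-04 0 0.5503e-04 0.1279e-04 0 0 0

Notice that every channel of delta is multiplied by the corresponding filter that generated it.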
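The single-operation version of d_x can be sketched in NumPy as below. The helper im2col and the variable names are assumptions for illustration. The helper returns one row per patch, so its output is already the transpose of the Im2col matrices drawn on the slides.

```python
import numpy as np

def im2col(x, k):
    """One row per k x k patch; patches and their entries are read column by column."""
    height, width = x.shape
    rows = [x[r:r + k, c:c + k].flatten(order="F")
            for c in range(width - k + 1)
            for r in range(height - k + 1)]
    return np.array(rows)

d_y = np.array([[[0.0, 0.0], [-2.94504954e-05, 0.0]],
                [[0.0, 0.0], [6.39539432e-06, 0.0]]])
w = np.array([[[0, 1], [-1, 0]],
              [[2, 3], [4, 5]]], dtype=float)

# "Full" convolution: pad every channel of d_y with one ring of zeros.
padded = np.pad(d_y, ((0, 0), (1, 1), (1, 1)))
# One 9 x 4 block per channel, side by side: 9 x 8.
cols = np.hstack([im2col(p, 2) for p in padded])
# Unrotated filters, flattened column by column and stacked: 8 values.
w_vec = np.concatenate([f.flatten(order="F") for f in w])

d_x = (cols @ w_vec).reshape(3, 3, order="F")
print(d_x)
```

Because the channel blocks sit side by side, the matrix product already sums the two per-channel results, so no separate addition step is needed.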

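A multi-channel example

Input (3,3,3)

Channel 1:
16 24 32
47 18 26
68 12  9

Channel 2:
26 57 43
24 21 12
 2 11 19

Channel 3:
18 47 21
 4  6 12
81 22 13

Filters (2,3,2,2)

Filter 1, channel 1:
 0 1
-1 0

Filter 1, channel 2:
2 3
4 5

Filter 1, channel 3:
-2 68
24 16

Filter 2, channel 1:
18 32
22 60

Filter 2, channel 2:
23  7
46 35

Filter 2, channel 3:
42 20
81 78

Input ∗ Filters = Output (2,2,2), applying the Theano convolution (which automatically rotates the filters):

Output, channel 1:
2171 2170
5954 2064

Output, channel 2:
13042 13575
11023  6425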

A multi-channel example (vectorized)

The input (3,3,3) and the filters (2,3,2,2) are the same as in the previous example.

Im2col(input), one row per 2x2 patch, with the three channels side by side (4x12):
16 47 24 18 | 26 24 57 21 | 18  4 47  6
47 68 18 12 | 24  2 21 11 |  4 81  6 22
24 18 32 26 | 57 21 43 12 | 47  6 21 12
18 12 26  9 | 21 11 12 19 |  6 22 12 13

Filter matrix, one column per filter, each 2x2 kernel rotated 180 degrees and read column by column (12x2):
 0 60
 1 32
-1 22
 0 18
 5 35
 3  7
 4 46
 2 23
16 78
68 20
24 81
-2 42

Im2col(input) x filter matrix (first column is Channel 1, second column is Channel 2):
2171 13042
5954 11023
2170 13575
2064  6425

Rearrange each column into a 2x2 output channel:

Channel 1:
2171 2170
5954 2064

Channel 2:
13042 13575
11023  6425
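The vectorized multi-channel forward pass can be reproduced with NumPy. This is a sketch under the same column-by-column ordering used on the slides; the helper im2col and the variable names are assumptions, not part of the original material.

```python
import numpy as np

def im2col(x, k):
    """One row per k x k patch; patches and their entries are read column by column."""
    height, width = x.shape
    rows = [x[r:r + k, c:c + k].flatten(order="F")
            for c in range(width - k + 1)
            for r in range(height - k + 1)]
    return np.array(rows)

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12, 9]],
              [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
              [[18, 47, 21], [4, 6, 12], [81, 22, 13]]])       # (3,3,3)
w = np.array([[[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]]])  # (2,3,2,2)

# Channel blocks side by side: 4 patches x 12 entries.
cols = np.hstack([im2col(c, 2) for c in x])
# One column per filter: rotate every 2x2 kernel by 180 degrees, flatten, concatenate.
w_mat = np.stack(
    [np.concatenate([np.rot90(ch, 2).flatten(order="F") for ch in f]) for f in w],
    axis=1)                                                     # 12 x 2
out = cols @ w_mat                                              # 4 x 2
y = np.stack([out[:, i].reshape(2, 2, order="F") for i in range(2)])  # (2,2,2)
print(y)
```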

Backpropagation

Imagine we receive the following error from the layer above, and we want to propagate it to the corresponding layer (the input of the convolution).

d_y, channel 1:
.1678 .098
.002 .246

d_y, channel 2:
0.5 0.67
0.21 0.487

We need to compute d_y ∗ w (without rotation). Because this is a 'full' convolution, we add one ring of zero padding to d_y.

Padded d_y, channel 1:
0 0 0 0
0 .1678 .098 0
0 .002 .246 0
0 0 0 0

Padded d_y, channel 2:
0 0 0 0
0 0.5 .67 0
0 .21 .487 0
0 0 0 0

Backpropagation: d_y-1 = d_y ∗ w (without rotation)

Apply im2col to each padded channel of d_y (one column per 2x2 patch).

Im2col(padded d_y), channel 1 (4x9):
0 0 0 0 .1678 .002 0 .098 .246
0 0 0 .1678 .002 0 .098 .246 0
0 .1678 .002 0 .098 .246 0 0 0
.1678 .002 0 .098 .246 0 0 0 0

Im2col(padded d_y), channel 2 (4x9):
0 0 0 0 .5 .21 0 .67 .487
0 0 0 .5 .21 0 .67 .487 0
0 .5 .21 0 .67 .487 0 0 0
.5 .21 0 .67 .487 0 0 0 0

Stack the two matrices (8x9), transpose the stack, and multiply it by the filter matrix. Notice that every channel of delta is multiplied by the corresponding filter that generated it.

Filter matrix (8x3): rows 1 to 4 hold Filter 1 and rows 5 to 8 hold Filter 2, one column per input channel, each unrotated kernel read column by column:
 0  2 -2
-1  4 24
 1  3 68
 0  5 16
18 23 42
22 46 81
32  7 20
60 35 78

Stack^T x filter matrix (9x3):
30 18.339 41.6848
28.7678 11.3634 37.8224
6.722 1.476 4.336
51.0322 47.6112 98.3552
64.376 44.7626 99.7084
19.61 8.981 35.284
14.642 31.212 56.622
22.528 38.992 73.295
8.766 11.693 19.962

Rearrange each column into a 3x3 channel of d_y-1:

Channel 1:
30 51.0322 14.642
28.7678 64.376 22.528
6.722 19.61 8.766

Channel 2:
18.339 47.6112 31.212
11.3634 44.7626 38.992
1.476 8.981 11.693

Channel 3:
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336 35.284 19.962
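The multi-channel backward pass to the input can be sketched the same way in NumPy. The helper im2col and all variable names are assumptions for illustration. The helper returns one row per patch, which corresponds to the transposed Im2col matrices of the slides.

```python
import numpy as np

def im2col(x, k):
    """One row per k x k patch; patches and their entries are read column by column."""
    height, width = x.shape
    rows = [x[r:r + k, c:c + k].flatten(order="F")
            for c in range(width - k + 1)
            for r in range(height - k + 1)]
    return np.array(rows)

d_y = np.array([[[0.1678, 0.098], [0.002, 0.246]],
                [[0.5, 0.67], [0.21, 0.487]]])                  # (2,2,2)
w = np.array([[[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]]],
             dtype=float)                                       # (2,3,2,2)

# "Full" convolution: one ring of zeros around every channel of d_y.
padded = np.pad(d_y, ((0, 0), (1, 1), (1, 1)))
cols = np.hstack([im2col(p, 2) for p in padded])                # 9 x 8

# Rows: Filter 1 then Filter 2; columns: input channels; kernels are NOT rotated.
w_mat = np.vstack(
    [np.stack([ch.flatten(order="F") for ch in f], axis=1) for f in w])  # 8 x 3

d_x_mat = cols @ w_mat                                          # 9 x 3
d_x = np.stack([d_x_mat[:, c].reshape(3, 3, order="F") for c in range(3)])
print(d_x[0])
```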

Backpropagation (not vectorized): d_y-1 = d_y ∗ w (without rotation)

d_y (2,2,2)

Channel 1:
.1678 .098
.002 .246

Channel 2:
0.5 0.67
0.21 0.487

Filters (2,3,2,2)

Filter 1: channel 1 = (0 1 / -1 0), channel 2 = (2 3 / 4 5), channel 3 = (-2 68 / 24 16)
Filter 2: channel 1 = (18 32 / 22 60), channel 2 = (23 7 / 46 35), channel 3 = (42 20 / 81 78)

Transpose dimensions 0 and 1 of the filters, so that there is one filter per input channel and one filter channel per channel of d_y.

Filters after the transpose (3,2,2,2)

Filter 1: channel 1 = (0 1 / -1 0), channel 2 = (18 32 / 22 60)
Filter 2: channel 1 = (2 3 / 4 5), channel 2 = (23 7 / 46 35)
Filter 3: channel 1 = (-2 68 / 24 16), channel 2 = (42 20 / 81 78)

Backpropagation (not vectorized, full convolution): d_y-1 = d_y ∗ w (without rotation)

The full convolution of d_y (2,2,2) with the transposed filters (3,2,2,2) gives d_y-1 (3,3,3):

Channel 1:
30 51.0322 14.642
28.7678 64.376 22.528
6.722 19.61 8.766

Channel 2:
18.339 47.6112 31.212
11.3634 44.7626 38.992
1.476 8.981 11.693

Channel 3:
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336 35.284 19.962

Backpropagation: d_w = input ∗ d_y

Input (3,3,3)

Channel 1:
16 24 32
47 18 26
68 12  9

Channel 2:
26 57 43
24 21 12
 2 11 19

Channel 3:
18 47 21
 4  6 12
81 22 13

d_y (2,2,2)

Channel 1:
.1678 .098
.002 .246

Channel 2:
0.5 0.67
0.21 0.487

The dimensions do not match, which tells us that we need to apply both channels of d_y, as filters, to every channel of the input.

Input channel 1 ∗ d_y:
9.5588 13.5952
12.7386 7.8064

42.716 49.882
55.684 33.323

Input channel 2 ∗ d_y:
15.1628 16.7726
8.7952 9.3958

66.457 67.564
31.847 30.103

Input channel 3 ∗ d_y:
9.1104 12.9086
6.8332 5.4248

44.252 44.674
33.744 21.991

Backpropagation: d_w = input ∗ d_y

Add the results of the three input channels:

Sum for d_y channel 1:
33.832 43.2764
28.367 22.627

Sum for d_y channel 2:
153.425 162.12
121.275 85.417

This is the error associated with the rotated kernel, which means we need to rotate this result to update the unrotated kernel.

Backpropagation vectorized: d_w = input ∗ d_y (without rotating d_y)

The input (3,3,3) and d_y (2,2,2) are the same as before. The dimensions do not match, which tells us that we need to apply both channels of d_y, as filters, to every channel of the input.

Im2col(input), one row per 2x2 patch, with the three channels side by side (4x12):
16 47 24 18 | 26 24 57 21 | 18  4 47  6
47 68 18 12 | 24  2 21 11 |  4 81  6 22
24 18 32 26 | 57 21 43 12 | 47  6 21 12
18 12 26  9 | 21 11 12 19 |  6 22 12 13

d_y matrix (12x2): each channel of d_y read column by column, one column per channel, with the 4x2 block repeated once per input channel:
.1678 0.5
.002 0.21
.098 0.67
.246 0.487
.1678 0.5
.002 0.21
.098 0.67
.246 0.487
.1678 0.5
.002 0.21
.098 0.67
.246 0.487

Im2col(input) x d_y matrix:
33.832 153.425
28.367 121.275
43.2764 162.12
22.627 85.417

Rearrange each column into a 2x2 matrix:

33.832 43.2764
28.367 22.627

153.425 162.12
121.275 85.417
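A NumPy sketch of the vectorized d_w step, following the slides in summing the contributions of the three input channels. The helper im2col and the variable names are assumptions for illustration.

```python
import numpy as np

def im2col(x, k):
    """One row per k x k patch; patches and their entries are read column by column."""
    height, width = x.shape
    rows = [x[r:r + k, c:c + k].flatten(order="F")
            for c in range(width - k + 1)
            for r in range(height - k + 1)]
    return np.array(rows)

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12, 9]],
              [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
              [[18, 47, 21], [4, 6, 12], [81, 22, 13]]], dtype=float)  # (3,3,3)
d_y = np.array([[[0.1678, 0.098], [0.002, 0.246]],
                [[0.5, 0.67], [0.21, 0.487]]])                         # (2,2,2)

cols = np.hstack([im2col(c, 2) for c in x])                            # 4 x 12
# d_y is not rotated: flatten each channel column by column (4 x 2) ...
d_y_mat = np.stack([c.flatten(order="F") for c in d_y], axis=1)
# ... and repeat the block once per input channel (12 x 2).
d_y_rep = np.vstack([d_y_mat] * 3)

d_w_mat = cols @ d_y_rep                                               # 4 x 2
d_w = np.stack([d_w_mat[:, i].reshape(2, 2, order="F") for i in range(2)])
print(d_w)
```

As the non-vectorized slide notes, this result is associated with the rotated kernel, so it would be rotated by 180 degrees (for example with np.rot90(d_w[i], 2)) before updating an unrotated kernel.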

Software
