
Page 1: Convolution as matrix multiplication

Convolution as matrix multiplication

• Edwin Efraín Jiménez Lepe

Page 2: Convolution as matrix multiplication

Input (3x3):
16 24 32
47 18 26
68 12  9

Kernels (2x2):
W1 =  0 1    W2 = 2 3
     -1 0         4 5

Applying kernel rotation (true convolution rotates each kernel by 180 degrees), the convolution becomes a single matrix product.

im2col(input), one 2x2 patch per row, flattened column-major:
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9

multiplied by the rotated kernels, each flattened column-major into a column (W1, W2):
 0 5
 1 3
-1 4
 0 2

=
 23 353
 50 535
-14 354
-14 248

Rearrange each column into a 2x2 output map (feedforward output):
Channel 1 (W1):    Channel 2 (W2):
23 -14             353 354
50 -14             535 248
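A minimal NumPy sketch of this forward pass (not part of the original slides); the im2col helper, its column-major ordering, and all variable names are assumptions chosen to match the layout above:

import numpy as np

def im2col(img, kh, kw):
    # One kh x kw patch per row, flattened column-major;
    # patch positions are also enumerated column-major, as on the slide.
    H, W = img.shape
    return np.array([img[i:i+kh, j:j+kw].flatten(order='F')
                     for j in range(W - kw + 1) for i in range(H - kh + 1)])

x = np.array([[16, 24, 32], [47, 18, 26], [68, 12, 9]], dtype=float)
w1 = np.array([[0, 1], [-1, 0]], dtype=float)
w2 = np.array([[2, 3], [4, 5]], dtype=float)

X = im2col(x, 2, 2)                                           # (4, 4)
# True convolution: rotate each kernel 180 degrees, flatten column-major.
W = np.stack([np.rot90(w1, 2).flatten(order='F'),
              np.rot90(w2, 2).flatten(order='F')], axis=1)    # (4, 2)
Y = X @ W                                  # (4, 2), one column per kernel
y1 = Y[:, 0].reshape(2, 2, order='F')      # [[23, -14], [50, -14]]
y2 = Y[:, 1].reshape(2, 2, order='F')      # [[353, 354], [535, 248]]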

Page 3: Convolution as matrix multiplication

Same input and kernels, now with a bias per output channel.

Input (3x3):
16 24 32
47 18 26
68 12  9

Kernels (2x2):
W1 =  0 1    W2 = 2 3
     -1 0         4 5

Append a column of ones to im2col(input) and a row of biases (b1 = 1, b2 = 0) to the kernel matrix:

im2col(input) with a column of ones:
16 47 24 18 1
47 68 18 12 1
24 18 32 26 1
18 12 26  9 1

multiplied by the rotated, flattened kernels plus the bias row:
 0 5
 1 3
-1 4
 0 2
 1 0

=
 24 353
 51 535
-13 354
-13 248

Rearrange (feedforward output, now with bias):
Channel 1:    Channel 2:
24 -13        353 354
51 -13        535 248
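A short NumPy sketch of folding the bias into the same matrix product (illustrative, not the author's code); X and W below are the im2col matrix and flattened rotated kernels from the previous slide, written out literally:

import numpy as np

X = np.array([[16, 47, 24, 18],
              [47, 68, 18, 12],
              [24, 18, 32, 26],
              [18, 12, 26,  9]], dtype=float)   # im2col(input)
W = np.array([[ 0, 5],
              [ 1, 3],
              [-1, 4],
              [ 0, 2]], dtype=float)            # rotated kernels, flattened
b = np.array([1.0, 0.0])                        # bias per output channel (b1=1, b2=0)

X_aug = np.hstack([X, np.ones((4, 1))])         # append a column of ones
W_aug = np.vstack([W, b])                       # append the bias row
Y = X_aug @ W_aug                               # columns: [24, 51, -13, -13] and [353, 535, 354, 248]
y1 = Y[:, 0].reshape(2, 2, order='F')           # [[24, -13], [51, -13]]
y2 = Y[:, 1].reshape(2, 2, order='F')           # [[353, 354], [535, 248]]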

Page 4: Convolution as matrix multiplication

BackPropagation: d_w = input * d_y

Input (3x3):
16 24 32
47 18 26
68 12  9

Upstream deltas, one per output channel:
d_y (channel 1):           d_y (channel 2):
 0              0           0              0
-2.94504954e-05 0           6.39539432e-06 0

im2col(input):
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9

multiplied by im2col(d_y), each channel's delta flattened column-major into a column:
 0     0
-2e-05 6e-06
 0     0
 0     0

=
-1.38417328e-03 3.00583533e-04
-2.00263369e-03 4.34886814e-04
-5.30108917e-04 1.15117098e-04
-3.53405945e-04 7.67447318e-05

Rearrange (d_w for kernel 1):
-1.38417328e-03 -5.30108917e-04
-2.00263369e-03 -3.53405945e-04

This update corresponds to the rotated kernel, so it must be rotated back before updating the stored (unrotated) kernel.
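A NumPy sketch of this weight-gradient step (an illustration under the same assumptions as the sketches above, not the author's code); it reproduces the numbers shown for kernel 1:

import numpy as np

X = np.array([[16, 47, 24, 18],
              [47, 68, 18, 12],
              [24, 18, 32, 26],
              [18, 12, 26,  9]], dtype=float)              # im2col(input)

# Upstream deltas for the two output channels, flattened column-major.
d_y1 = np.array([[0.0, 0.0], [-2.94504954e-05, 0.0]])
d_y2 = np.array([[0.0, 0.0], [ 6.39539432e-06, 0.0]])
D = np.stack([d_y1.flatten(order='F'),
              d_y2.flatten(order='F')], axis=1)            # im2col(d_y), (4, 2)

dW_cols = X.T @ D                      # one column of kernel gradients per channel
dW1_rot = dW_cols[:, 0].reshape(2, 2, order='F')
# [[-1.384e-03, -5.301e-04], [-2.003e-03, -3.534e-04]] -> gradient of the ROTATED kernel
dW1 = np.rot90(dW1_rot, 2)             # rotate back before updating the stored kernel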

Page 5: Convolution as matrix multiplication

BackPropagation: d_x = d_y * w (without rotation)

d_y (channel 1):           d_y (channel 2):
 0              0           0              0
-2.94504954e-05 0           6.39539432e-06 0

We need a full convolution and we keep the kernel unrotated, so each channel of d_y is zero-padded:

Padded d_y (channel 1):
0 0 0 0
0 0 0 0
0 -2.94504954e-05 0 0
0 0 0 0

Padded d_y (channel 2):
0 0 0 0
0 0 0 0
0 6.39539432e-06 0 0
0 0 0 0

Kernels (unrotated):
W1 =  0 1    W2 = 2 3
     -1 0         4 5

Page 6: Convolution as matrix multiplication

BackPropagation: d_x = d_y * w (without rotation)

im2col(padded d_y, channel 1), one 2x2 patch per column (9 patches, column-major positions), transposed and multiplied by W1 flattened column-major [0, -1, 1, 0]:

0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0

im2col(padded d_y, channel 2), transposed and multiplied by W2 flattened column-major [2, 4, 3, 5]:

0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0

Page 7: Convolution as matrix multiplication

BackPropagation: d_x = d_y * w (without rotation)

The two products from the previous slide give one contribution per delta channel:

im2col(padded d_y, channel 1)T x [0, -1, 1, 0]T
  = [0, 0, -0.2945e-04, 0, 0.2945e-04, 0, 0, 0, 0]T

im2col(padded d_y, channel 2)T x [2, 4, 3, 5]T
  = [0, 0.3198e-04, 0.1919e-04, 0, 0.2558e-04, 0.1279e-04, 0, 0, 0]T

Each 9-vector is one channel's contribution to d_x, listed in column-major order of the 3x3 input.

Page 8: Convolution as matrix multiplication

BackPropagation: d_x = d_y * w (without rotation)

Adding the two channel contributions:

  [0, 0, -0.2945e-04, 0, 0.2945e-04, 0, 0, 0, 0]T
+ [0, 0.3198e-04, 0.1919e-04, 0, 0.2558e-04, 0.1279e-04, 0, 0, 0]T
= [0, 0.3198e-04, -0.1026e-04, 0, 0.5503e-04, 0.1279e-04, 0, 0, 0]T

Reshape (column-major) into the 3x3 d_x:

0           0          0
0.3198e-04  0.5503e-04 0
-0.1026e-04 0.1279e-04 0

Page 9: Convolution as matrix multiplication

BackPropagation: d_x = d_y * w (without rotation)

In fact, we can do it in just one operation: stack the im2col matrices of both padded d_y channels, transpose, and multiply by the stacked, unrotated kernels flattened column-major.

Stacked im2col of padded d_y (channels 1 and 2, 8 x 9):
0 0 0 0 0 -2.94e-05 0 0 0
0 0 0 0 -2.94e-05 0 0 0 0
0 0 -2.94e-05 0 0 0 0 0 0
0 -2.94e-05 0 0 0 0 0 0 0
0 0 0 0 0 6.395e-06 0 0 0
0 0 0 0 6.395e-06 0 0 0 0
0 0 6.395e-06 0 0 0 0 0 0
0 6.395e-06 0 0 0 0 0 0 0

Transposed and multiplied by the stacked kernels [0, -1, 1, 0, 2, 4, 3, 5]T, this gives

[0, 0.3198e-04, -0.1026e-04, 0, 0.5503e-04, 0.1279e-04, 0, 0, 0]T

which is exactly the d_x obtained before.

Notice: every channel of delta is multiplied by the corresponding filter that generated it.
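A NumPy sketch of this single-operation version (helper names such as im2col_cols are hypothetical, not from the slides):

import numpy as np

def im2col_cols(img, kh, kw):
    # One patch per column, flattened column-major; patch positions column-major.
    H, W = img.shape
    return np.array([img[i:i+kh, j:j+kw].flatten(order='F')
                     for j in range(W - kw + 1) for i in range(H - kh + 1)]).T

d_y = [np.array([[0.0, 0.0], [-2.94504954e-05, 0.0]]),     # delta, channel 1
       np.array([[0.0, 0.0], [ 6.39539432e-06, 0.0]])]     # delta, channel 2
w   = [np.array([[0.0, 1.0], [-1.0, 0.0]]),                # W1 (unrotated)
       np.array([[2.0, 3.0], [ 4.0, 5.0]])]                # W2 (unrotated)

# Full convolution: zero-pad each delta channel by one ring, then im2col.
cols = np.vstack([im2col_cols(np.pad(d, 1), 2, 2) for d in d_y])   # (8, 9)
kern = np.concatenate([k.flatten(order='F') for k in w])           # (8,)

d_x = (cols.T @ kern).reshape(3, 3, order='F')
# d_x ~ [[0, 0, 0],
#        [ 3.198e-05, 5.503e-05, 0],
#        [-1.026e-05, 1.279e-05, 0]]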

Page 10: Convolution as matrix multiplication

A multi-channel example

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

Filters (2,3,2,2), two filters with three channel slices each:
Filter 1:   0 1     2 3    -2 68
           -1 0     4 5    24 16

Filter 2:  18 32    23  7   42 20
           22 60    46 35   81 78

Output (2,2,2), applying Theano convolution (which rotates the filters automatically):
Channel 1:     Channel 2:
2171 2170      13042 13575
5954 2064      11023  6425
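As a cross-check, a plain loop-based NumPy sketch (not the slides' Theano call) that reproduces these outputs; the explicit 180-degree rotation mimics what the slide says Theano does automatically:

import numpy as np

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12,  9]],
              [[26, 57, 43], [24, 21, 12], [ 2, 11, 19]],
              [[18, 47, 21], [ 4,  6, 12], [81, 22, 13]]], dtype=float)   # (3, 3, 3)
w = np.array([[[[ 0,  1], [-1,  0]], [[ 2,  3], [ 4,  5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23,  7], [46, 35]], [[42, 20], [81, 78]]]],
             dtype=float)                                                  # (2, 3, 2, 2)

out = np.zeros((2, 2, 2))
for f in range(2):                      # filter / output channel
    for c in range(3):                  # input channel
        k = np.rot90(w[f, c], 2)        # convolution rotates the kernel slice
        for i in range(2):
            for j in range(2):
                out[f, i, j] += np.sum(x[c, i:i+2, j:j+2] * k)
# out[0] -> [[2171, 2170], [5954, 2064]]
# out[1] -> [[13042, 13575], [11023, 6425]]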

Page 11: Convolution as matrix multiplication

A multi-channel example (vectorized)

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

Filters (2,3,2,2):
Filter 1:   0 1     2 3    -2 68
           -1 0     4 5    24 16

Filter 2:  18 32    23  7   42 20
           22 60    46 35   81 78

Stack the im2col matrices of the three input channels (12 x 4):
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9
26 24 57 21
24  2 21 11
57 21 43 12
21 11 12 19
18  4 47  6
 4 81  6 22
47  6 21 12
 6 22 12 13

Transposed, this matrix is multiplied by the rotated filters flattened column-major, one column per filter (12 x 2):
 0 60
 1 32
-1 22
 0 18
 5 35
 3  7
 4 46
 2 23
16 78
68 20
24 81
-2 42

Page 12: Convolution as matrix multiplication

A multi-channel example (vectorized)

Multiplying the transposed, stacked im2col matrix (4 x 12) by the flattened rotated filters (12 x 2) gives:

2171 13042
5954 11023
2170 13575
2064  6425

Rearrange each column into a 2x2 output map:
Channel 1:     Channel 2:
2171 2170      13042 13575
5954 2064      11023  6425
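A NumPy sketch of the vectorized multi-channel forward pass; stacking the per-channel im2col blocks side by side is equivalent to the slides' vertical stack followed by a transpose (helper and variable names are illustrative):

import numpy as np

def im2col(img, kh, kw):
    # One patch per row, flattened column-major; patch positions column-major.
    H, W = img.shape
    return np.array([img[i:i+kh, j:j+kw].flatten(order='F')
                     for j in range(W - kw + 1) for i in range(H - kh + 1)])

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12,  9]],
              [[26, 57, 43], [24, 21, 12], [ 2, 11, 19]],
              [[18, 47, 21], [ 4,  6, 12], [81, 22, 13]]], dtype=float)
w = np.array([[[[ 0,  1], [-1,  0]], [[ 2,  3], [ 4,  5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23,  7], [46, 35]], [[42, 20], [81, 78]]]],
             dtype=float)

X = np.hstack([im2col(x[c], 2, 2) for c in range(3)])               # (4, 12)
# One column per filter: every channel slice rotated 180 degrees, flattened column-major.
W = np.stack([np.concatenate([np.rot90(w[f, c], 2).flatten(order='F')
                              for c in range(3)]) for f in range(2)], axis=1)   # (12, 2)

Y = X @ W                               # [[2171, 13042], [5954, 11023], [2170, 13575], [2064, 6425]]
y1 = Y[:, 0].reshape(2, 2, order='F')   # [[2171, 2170], [5954, 2064]]
y2 = Y[:, 1].reshape(2, 2, order='F')   # [[13042, 13575], [11023, 6425]]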

Page 13: Convolution as matrix multiplication

Backpropagation

Imagine we get the following error (delta) from the layer above:

d_y (channel 1):     d_y (channel 2):
.1678 .098           0.50 0.67
.002  .246           0.21 0.487

We want to propagate it back to the corresponding layer (the input of the convolution). We need to compute d_y * w (without rotation), but it is a 'full' convolution, so we add one ring of zero padding to each channel of d_y:

Padded d_y (channel 1):
0 0     0    0
0 .1678 .098 0
0 .002  .246 0
0 0     0    0

Padded d_y (channel 2):
0 0    0     0
0 0.5  0.67  0
0 0.21 0.487 0
0 0    0     0

Page 14: Convolution as matrix multiplication

Backpropagation: d_y-1 = d_y * w (without rotation)

Applying im2col to each padded channel of d_y (one 2x2 patch per column, flattened column-major):

Channel 1:
0 0 0 0 .1678 .002 0 .098 .246
0 0 0 .1678 .002 0 .098 .246 0
0 .1678 .002 0 .098 .246 0 0 0
.1678 .002 0 .098 .246 0 0 0 0

Channel 2:
0 0 0 0 .5 .21 0 .67 .487
0 0 0 .5 .21 0 .67 .487 0
0 .5 .21 0 .67 .487 0 0 0
.5 .21 0 .67 .487 0 0 0 0

Page 15: Convolution as matrix multiplication

Backpropagation: d_y-1 = d_y * w (without rotation)

Stack the two im2col matrices (8 x 9), transpose, and multiply by the unrotated filters regrouped per input channel (8 x 3). Notice: every channel of delta is multiplied by the corresponding filter that generated it.

Stacked im2col of padded d_y (channels 1 and 2):
0 0 0 0 .1678 .002 0 .098 .246
0 0 0 .1678 .002 0 .098 .246 0
0 .1678 .002 0 .098 .246 0 0 0
.1678 .002 0 .098 .246 0 0 0 0
0 0 0 0 .5 .21 0 .67 .487
0 0 0 .5 .21 0 .67 .487 0
0 .5 .21 0 .67 .487 0 0 0
.5 .21 0 .67 .487 0 0 0 0

Filters flattened column-major, one column per input channel (rows 1-4 from Filter 1, rows 5-8 from Filter 2):
 0  2 -2
-1  4 24
 1  3 68
 0  5 16
18 23 42
22 46 81
32  7 20
60 35 78

(stacked im2col)T x (filter matrix) =
30      18.339  41.6848
28.7678 11.3634 37.8224
6.722   1.476   4.336
51.0322 47.6112 98.3552
64.376  44.7626 99.7084
19.61   8.981   35.284
14.642  31.212  56.622
22.528  38.992  73.295
8.766   11.693  19.962

Page 16: Convolution as matrix multiplication

Backpropagation: d_y-1 = d_y * w (without rotation)

30      18.339  41.6848
28.7678 11.3634 37.8224
6.722   1.476   4.336
51.0322 47.6112 98.3552
64.376  44.7626 99.7084
19.61   8.981   35.284
14.642  31.212  56.622
22.528  38.992  73.295
8.766   11.693  19.962

Rearrange each column into a 3x3 map, one per input channel:

d_x (channel 1):
30      51.0322 14.642
28.7678 64.376  22.528
6.722   19.61   8.766

d_x (channel 2):
18.339  47.6112 31.212
11.3634 44.7626 38.992
1.476   8.981   11.693

d_x (channel 3):
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336   35.284  19.962
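A NumPy sketch of this vectorized multi-channel backward pass for d_x (helper names are illustrative, not from the slides):

import numpy as np

def im2col_cols(img, kh, kw):
    # One patch per column, flattened column-major; patch positions column-major.
    H, W = img.shape
    return np.array([img[i:i+kh, j:j+kw].flatten(order='F')
                     for j in range(W - kw + 1) for i in range(H - kh + 1)]).T

d_y = np.array([[[.1678, .098], [.002, .246]],
                [[.5, .67], [.21, .487]]])                                 # (2, 2, 2)
w = np.array([[[[ 0,  1], [-1,  0]], [[ 2,  3], [ 4,  5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23,  7], [46, 35]], [[42, 20], [81, 78]]]],
             dtype=float)                                                   # (2, 3, 2, 2)

# Full convolution: pad each delta channel, im2col it, and stack the channels.
cols = np.vstack([im2col_cols(np.pad(d, 1), 2, 2) for d in d_y])            # (8, 9)
# Unrotated filters regrouped per INPUT channel: column c stacks filter 1's and
# filter 2's channel-c slices, each flattened column-major.
W = np.stack([np.concatenate([w[f, c].flatten(order='F') for f in range(2)])
              for c in range(3)], axis=1)                                   # (8, 3)

dx_cols = cols.T @ W                              # (9, 3), one column per input channel
d_x = np.stack([dx_cols[:, c].reshape(3, 3, order='F') for c in range(3)])  # (3, 3, 3)
# d_x[0] ~ [[30, 51.0322, 14.642], [28.7678, 64.376, 22.528], [6.722, 19.61, 8.766]]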

Page 17: Convolution as matrix multiplication

Backpropagation (not vectorized): d_y-1 = d_y * w (without rotation)

d_y (2,2,2):
Channel 1:     Channel 2:
.1678 .098     0.50 0.67
.002  .246     0.21 0.487

Filters w (2,3,2,2):
Filter 1:   0 1     2 3    -2 68
           -1 0     4 5    24 16

Filter 2:  18 32    23  7   42 20
           22 60    46 35   81 78

Transpose dimensions 0 and 1 of w, so the filters are regrouped by input channel, shape (3,2,2,2):
Filter 1:   0 1    18 32
           -1 0    22 60

Filter 2:   2 3    23  7
            4 5    46 35

Filter 3:  -2 68   42 20
           24 16   81 78

Page 18: Convolution as matrix multiplication

Backpropagation (not vectorized, full convolution): d_y-1 = d_y * w (without rotation)

Full convolution of d_y (2,2,2) with the transposed filters (3,2,2,2), keeping the kernels unrotated, gives one 3x3 map per input channel:

d_x (channel 1):
30      51.0322 14.642
28.7678 64.376  22.528
6.722   19.61   8.766

d_x (channel 2):
18.339  47.6112 31.212
11.3634 44.7626 38.992
1.476   8.981   11.693

d_x (channel 3):
41.6848 98.3552 56.622
37.8224 99.7084 73.295
4.336   35.284  19.962
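The same maps with an off-the-shelf routine: assuming SciPy is available, the slides' "full convolution keeping the kernel unrotated" corresponds to a full cross-correlation, so scipy.signal.correlate2d applied after transposing filter dimensions 0 and 1 reproduces them (a sketch, not the author's code):

import numpy as np
from scipy.signal import correlate2d

d_y = np.array([[[.1678, .098], [.002, .246]],
                [[.5, .67], [.21, .487]]])                                 # (2, 2, 2)
w = np.array([[[[ 0,  1], [-1,  0]], [[ 2,  3], [ 4,  5]], [[-2, 68], [24, 16]]],
              [[[18, 32], [22, 60]], [[23,  7], [46, 35]], [[42, 20], [81, 78]]]],
             dtype=float)                                                   # (2, 3, 2, 2)

w_t = w.transpose(1, 0, 2, 3)          # transpose dimensions 0 and 1 -> (3, 2, 2, 2)

# Each delta channel is correlated (full mode) with the filter slice that produced
# it, then the contributions are summed per input channel.
d_x = np.array([sum(correlate2d(d_y[f], w_t[c, f], mode='full') for f in range(2))
                for c in range(3)])
# d_x[0] ~ [[30, 51.0322, 14.642], [28.7678, 64.376, 22.528], [6.722, 19.61, 8.766]]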

Page 19: Convolution as matrix multiplication

Backpropagation: d_w = input * d_y

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

d_y (2,2,2):
Channel 1:     Channel 2:
.1678 .098     0.50 0.67
.002  .246     0.21 0.487

The dimensions do not match, which tells us we need to apply both channels of d_y (one per filter) to every channel of the input:

Input channel 1 * d_y channel 1 =
 9.5588 13.5952
12.7386  7.8064

Input channel 1 * d_y channel 2 =
42.716 49.882
55.684 33.323

Input channel 2 * d_y channel 1 =
15.1628 16.7726
 8.7952  9.3958

Input channel 2 * d_y channel 2 =
66.457 67.564
31.847 30.103

Input channel 3 * d_y channel 1 =
 9.1104 12.9086
 6.8332  5.4248

Input channel 3 * d_y channel 2 =
44.252 44.674
33.744 21.991

Page 20: Convolution as matrix multiplication

Backpropagation: d_w = input * d_y

Summing the per-input-channel products:

d_w (filter 1):
 9.5588 13.5952     15.1628 16.7726      9.1104 12.9086     33.832 43.2764
12.7386  7.8064  +   8.7952  9.3958  +   6.8332  5.4248  =  28.367 22.627

d_w (filter 2):
42.716 49.882       66.457 67.564       44.252 44.674      153.425 162.12
55.684 33.323    +  31.847 30.103    +  33.744 21.991   =  121.275  85.417

This is the error associated with the rotated kernels, which means we need to rotate this result to update the unrotated kernels.
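A sketch of this non-vectorized weight gradient using scipy.signal.correlate2d for the valid cross-correlations, then summing over the input channels exactly as the slide does (illustrative, not the author's code):

import numpy as np
from scipy.signal import correlate2d

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12,  9]],
              [[26, 57, 43], [24, 21, 12], [ 2, 11, 19]],
              [[18, 47, 21], [ 4,  6, 12], [81, 22, 13]]], dtype=float)   # (3, 3, 3)
d_y = np.array([[[.1678, .098], [.002, .246]],
                [[.5, .67], [.21, .487]]])                                 # (2, 2, 2)

# Valid cross-correlation of every input channel with every delta channel...
per_channel = np.array([[correlate2d(x[c], d_y[f], mode='valid')
                         for c in range(3)] for f in range(2)])            # (2, 3, 2, 2)
# ...then sum over the input channels to get one 2x2 gradient per filter.
d_w = per_channel.sum(axis=1)
# d_w[0] ~ [[33.832, 43.2764], [28.367, 22.627]]
# d_w[1] ~ [[153.425, 162.12], [121.275, 85.417]]
# These correspond to the rotated kernels; rotate them 180 degrees before updating.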

Page 21: Convolution as matrix multiplication

Backpropagation vectorized: d_w = input * d_y (without rotating d_y)

Input (3,3,3):
Channel 1:     Channel 2:     Channel 3:
16 24 32       26 57 43       18 47 21
47 18 26       24 21 12        4  6 12
68 12  9        2 11 19       81 22 13

d_y (2,2,2):
Channel 1:     Channel 2:
.1678 .098     0.50 0.67
.002  .246     0.21 0.487

The dimensions do not match, which tells us we need to apply both channels of d_y to every channel of the input.

Stacked im2col of the input (12 x 4):
16 47 24 18
47 68 18 12
24 18 32 26
18 12 26  9
26 24 57 21
24  2 21 11
57 21 43 12
21 11 12 19
18  4 47  6
 4 81  6 22
47  6 21 12
 6 22 12 13

Transposed, this is multiplied by the d_y channels flattened column-major and tiled once per input channel (12 x 2):
.1678 0.5
.002  0.21
.098  0.67
.246  0.487
.1678 0.5
.002  0.21
.098  0.67
.246  0.487
.1678 0.5
.002  0.21
.098  0.67
.246  0.487

Page 22: Convolution as matrix multiplication

Backpropagation vectorized: d_w = input * d_y (without rotating d_y)

(stacked im2col of the input)T x (tiled, flattened d_y) =

33.832  153.425
28.367  121.275
43.2764 162.12
22.627   85.417

Rearrange each column into a 2x2 map, one per filter:

d_w (filter 1):     d_w (filter 2):
33.832 43.2764      153.425 162.12
28.367 22.627       121.275  85.417
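A NumPy sketch of this vectorized weight gradient (illustrative names, not the author's code); the per-channel im2col blocks are stacked side by side, which already accounts for the transpose shown on the slide:

import numpy as np

def im2col(img, kh, kw):
    # One patch per row, flattened column-major; patch positions column-major.
    H, W = img.shape
    return np.array([img[i:i+kh, j:j+kw].flatten(order='F')
                     for j in range(W - kw + 1) for i in range(H - kh + 1)])

x = np.array([[[16, 24, 32], [47, 18, 26], [68, 12,  9]],
              [[26, 57, 43], [24, 21, 12], [ 2, 11, 19]],
              [[18, 47, 21], [ 4,  6, 12], [81, 22, 13]]], dtype=float)   # (3, 3, 3)
d_y = np.array([[[.1678, .098], [.002, .246]],
                [[.5, .67], [.21, .487]]])                                 # (2, 2, 2)

X = np.hstack([im2col(x[c], 2, 2) for c in range(3)])                      # (4, 12)
# d_y channels flattened column-major, tiled once per input channel: (12, 2).
D = np.tile(np.stack([d.flatten(order='F') for d in d_y], axis=1), (3, 1))

Y = X @ D                                   # (4, 2)
d_w1 = Y[:, 0].reshape(2, 2, order='F')     # [[33.832, 43.2764], [28.367, 22.627]]
d_w2 = Y[:, 1].reshape(2, 2, order='F')     # [[153.425, 162.12], [121.275, 85.417]]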