Neural Networks for Data Science Applications, Master’s Degree in Data Science, Lecture 6: Designing deep convolutional networks, Lecturer: S. Scardapane

Neural Networks for Data Science Applications · ispac.diet.uniroma1.it/scardapane/wp-content/uploads/... · 2019-10-28


Neural Networks for Data Science Applications

Master’s Degree in Data Science

Lecture 6: Designing deep convolutional networks

Lecturer: S. Scardapane


Content for this lecture

1. We describe several problems afflicting deep neural networks.

2. We introduce some (novel and less novel) techniques to handle them (dropout, batch normalization, ...).

3. In parallel, we provide an historical overview of modern image classification architectures.


From shallow to deep networks

Overfitting and optimization


The danger of overfitting

A model that performs well on new data is said to generalize correctly. A model that fits the training data well but performs poorly on new data has overfit the training data.

Figure 1: Source: Wikipedia.


Overfitting in neural networks

Overfitting is a potential problem in any machine learning algorithm (studied by, among others, statistical learning theory, PAC theory, ...).

For deep NNs, it might be worse: a sufficiently large deep network can memorize the entire training set.[1]

In this scenario, the network ends up being nothing more than a simple look-up table on our dataset.

[1] Zhang, C., et al., 2016. Understanding deep learning requires rethinking generalization. Proc. ICLR 2017.


Visualization of overfitting in deep networks

Figure 2: Taken from (Zhang et al., 2016). A large CNN can fit data perfectly even with random labels and/or random pixels.


The optimization landscape

Unlike linear regression, the optimization landscape of a NN can have many stationary points, each with different generalization properties.

We can influence this landscape by changing the initialization of the network, its architecture, or even the optimization process.

Overall, this is still an open research field, and the ‘best’ selection is a combination of theory, experience and empirical rules of thumb.


Overview of the lecture

[Figure: overview diagram. Recoverable labels: Preprocessing, Normalization; Components: Fully-connected, Convolutional, Max-pooling, Sigmoid, ReLU, Dropout, Batch normalization, Weight initialization; Optimizers: SGD, Adam, Adagrad, ...; Training: Regularization, Early-stopping; Evaluation: Overfitting; Model selection & Hyper-parameter optimization; Data augmentation (optional); Training dataset, Validation dataset, Test dataset.]


Pre-AlexNet strategies

Early-stopping procedures


Overfitting and optimization

Because of overfitting, optimizing to convergence might not be beneficial (the network switches from learning to memorizing).

Figure 3: Blue line is training loss, red line is test loss. Source: Wikipedia.


Early stopping

Early stopping is a procedure to find this critical switching point:

1. Keep a portion of the dataset as the validation set.

2. At each epoch, check the validation loss (or accuracy).

3. When the validation loss has not improved for a while (a certain number of epochs), stop the optimization process.

Early stopping is extremely common in neural networks; it highlights the difference between pure optimization and learning.
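The three steps above can be sketched as a small stopping rule applied to a sequence of per-epoch validation losses. This is only an illustration: the function name, the `patience` and `min_delta` parameters, and the loss values are assumptions, not part of the lecture.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return epoch, best_epoch
    # The loss never stalled long enough: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: improvement stalls after epoch 2.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

In practice one would typically also keep a copy of the weights from the best epoch, so that the returned model is the one with the lowest validation loss rather than the one from the stopping epoch.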


Pre-AlexNet strategies

Regularization


Improving training with regularization

A warning sign of overfitting can be very large weights: these networks tend to be less smooth and make sharper changes in their outputs.

Regularization forces the optimization to select a network with smaller weights by penalizing large norms:

$$\theta^* = \arg\min \left\{ \sum_i l(f(x_i), y_i) + C \cdot \|\theta\|^2 \right\} , \tag{1}$$

C is a hyper-parameter: with C = 0 we have no regularization; with a C too large, all weights would go to 0.

Hint

A small C (e.g., $10^{-3}$) can improve results in many situations.
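To see the effect of C concretely, the sketch below uses a linear model with squared error as a stand-in for the network, an assumption made only because this special case has a closed-form minimizer. The data, the function names and the values of C are illustrative.

```python
import numpy as np


def regularized_loss(theta, X, y, C):
    """Sum of squared errors of a linear model plus the penalty C * ||theta||^2."""
    data_term = np.sum((X @ theta - y) ** 2)
    penalty = C * np.sum(theta ** 2)
    return data_term + penalty


def minimize_regularized_loss(X, y, C):
    """Closed-form minimizer; valid only for this linear, squared-error case."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + C * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

# The norm of the solution shrinks as C grows, and approaches 0 for very large C.
norms = []
for C in (0.0, 1e-3, 10.0, 1e6):
    theta = minimize_regularized_loss(X, y, C)
    norms.append(np.linalg.norm(theta))
    print(f"C={C:g}  ||theta||={norms[-1]:.4f}")
```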


Weight decay*

Consider the gradient update of a regularized loss:

$$-\text{Gradient of loss} = -\nabla \left[ \sum_i l(f(x_i), y_i) \right] - 2C\theta . \tag{2}$$

In the absence of the first term, the weights would decay exponentially to zero. In pure SGD, this form of regularization is also called weight decay.

In other optimization algorithms, weight decay and regularization are different strategies and must be implemented differently.[2]

[2] Loshchilov, I. and Hutter, F., 2017. Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101.
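The exponential decay can be checked numerically. With the loss gradient set to zero, each gradient step multiplies the weights by (1 - 2·η·C), where η is the learning rate. The values of η, C and the initial weights below are arbitrary assumptions.

```python
import numpy as np

learning_rate = 0.1  # assumed step size (eta)
C = 0.5              # assumed regularization strength
theta0 = np.array([2.0, -1.0, 0.5])

theta = theta0.copy()
steps = 20
for _ in range(steps):
    # Only the penalty term of Eq. (2) is applied: the loss gradient is taken as zero.
    theta = theta - learning_rate * 2 * C * theta

# Closed form of the same recursion: a geometric (exponential) decay towards zero.
expected = (1 - 2 * learning_rate * C) ** steps * theta0
print(theta, expected)
```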


Other forms of regularization

Regularization allows us to steer the optimization problem towards favourable solutions.

Many other types of regularization exist! For example, replacing the squared Euclidean norm of the weights with the sum of absolute values:

$$\theta^* = \arg\min \left\{ \sum_i l(f(x_i), y_i) + C \cdot \sum_j |\theta_j| \right\} , \tag{3}$$

can lead to sparser solutions.[3]

[3] Scardapane, S., Comminiello, D., Hussain, A. and Uncini, A., 2017. Group sparse regularization for deep neural networks. Neurocomputing, 241, pp. 81-89.
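Why the absolute-value penalty gives sparser solutions can be seen in a toy one-dimensional problem, chosen here as an assumption because both minimizers have a closed form. For the loss (θ - a)², the squared penalty only shrinks θ, while the absolute-value penalty sets it exactly to zero whenever |a| ≤ C/2 (soft-thresholding).

```python
import numpy as np


def l2_solution(a, C):
    """Coordinate-wise minimizer of (theta - a)^2 + C * theta^2."""
    return a / (1.0 + C)


def l1_solution(a, C):
    """Coordinate-wise minimizer of (theta - a)^2 + C * |theta| (soft-thresholding at C / 2)."""
    return np.sign(a) * np.maximum(np.abs(a) - C / 2.0, 0.0)


a = np.array([3.0, -0.2, 0.1, -1.5, 0.05])  # hypothetical unregularized solutions
C = 1.0
theta_l2 = l2_solution(a, C)  # every entry is shrunk, none is exactly zero
theta_l1 = l1_solution(a, C)  # small entries become exactly zero
print(theta_l2)
print(theta_l1)
```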


From AlexNet to ResNet

2012: AlexNet and ImageNet


Overview of AlexNet

AlexNet was the first CNN to win a real-world image classification competition by a large margin.[4]

It has 8 adaptable layers (5 convolutional, 3 fully-connected). For training, it exploited several ideas, some of which were relatively novel at the time:

• ReLU activation instead of sigmoid-like functions;

• Data augmentation and dropout to handle overfitting.

[4] Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).


Visualization of AlexNet

Figure 4: Top: LeNet (1998), bottom: simplified version of the original AlexNet (2012). Source: Dive Into Deep Learning, Chapter 7.1.


Additional observations

• Note the careful design: early convolutions have a larger receptive field because of the large image size.

• The dense layers have the vast majority of parameters (≈ 25M weights) and computational requirements.

Making the model work on multiple GPUs was a key achievement in the original paper. However, ReLU, dropout and data augmentation were equally important for its training.


From AlexNet to ResNet

Image augmentation


Data augmentation

Data augmentation is a technique to virtually increase the size of the dataset at training time:

1. Sample a mini-batch of examples;

2. For each example, apply one or more randomly sampled transformations (e.g., flipping, cropping, ...);

3. Train on the transformed mini-batch.

Data augmentation can be extremely helpful against overfitting, making the network more robust to small changes in the input data.
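A minimal NumPy sketch of steps 1 and 2, assuming images stored as a (N, H, W, C) array and using only two transformations (random horizontal flip and random crop). The function name, the image sizes and the crop size are illustrative choices.

```python
import numpy as np


def augment_batch(images, crop_size, rng):
    """Randomly flip (left-right) and crop each image of a (N, H, W, C) batch."""
    n, height, width, channels = images.shape
    out = np.empty((n, crop_size, crop_size, channels), dtype=images.dtype)
    for i in range(n):
        image = images[i]
        if rng.random() < 0.5:
            image = image[:, ::-1, :]  # horizontal flip
        # Top-left corner of the crop, sampled so the crop stays inside the image.
        top = rng.integers(0, height - crop_size + 1)
        left = rng.integers(0, width - crop_size + 1)
        out[i] = image[top:top + crop_size, left:left + crop_size, :]
    return out


rng = np.random.default_rng(0)
dataset = rng.random((100, 32, 32, 3))  # stand-in for a dataset of 32x32 RGB images

# Step 1: sample a mini-batch. Step 2: transform each example at random.
batch_idx = rng.choice(len(dataset), size=8, replace=False)
augmented = augment_batch(dataset[batch_idx], crop_size=28, rng=rng)
# Step 3 would pass `augmented` (with the unchanged labels) to the training step.
print(augmented.shape)
```

Because the transformations are re-sampled every time a mini-batch is drawn, the network rarely sees exactly the same input twice, which is what "virtually" increasing the dataset refers to.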


Visualizing image augmentation

Figure 5: Devising data augmentation strategies is especially easy with images, e.g., cropping, shearing, shifting (image source).


Other types of data augmentation

Devising efficient forms of data augmentation is a popular research field.

For example, mixup[5] combines two examples $(x_1, y_1)$ and $(x_2, y_2)$ by taking convex combinations with a random λ:

$$x = \lambda x_1 + (1 - \lambda) x_2 , \tag{4}$$

$$y = \lambda y_1 + (1 - \lambda) y_2 . \tag{5}$$

[5] Zhang, H., Cisse, M., Dauphin, Y.N. and Lopez-Paz, D., 2017. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
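Equations (4) and (5) translate directly into code. In the sketch below the labels are one-hot vectors, so the mixed label is a valid probability vector. Drawing λ from a Beta(0.2, 0.2) distribution is an assumption made for illustration; the text above only requires λ to be random in [0, 1].

```python
import numpy as np


def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two examples and of their labels, Eqs. (4)-(5)."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y


rng = np.random.default_rng(0)
x1, x2 = rng.random((4, 4)), rng.random((4, 4))  # two toy 4x4 "images"
y1 = np.array([1.0, 0.0, 0.0])                   # one-hot label: class 0
y2 = np.array([0.0, 0.0, 1.0])                   # one-hot label: class 2

lam = rng.beta(0.2, 0.2)  # assumed sampling distribution for lambda
x_mix, y_mix = mixup(x1, y1, x2, y2, lam)
print(lam, y_mix)
```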


From AlexNet to ResNet

Dropout regularization


Data augmentation revisited

Why is data augmentation helpful?

The core idea is that we can make the network more robust by adding slight perturbations to the input. We can prove this to be a form of regularization.[6]

Dropout extends this idea to the network itself: instead of perturbing the images, we perturb the hidden layers by randomly dropping (removing) some of the neurons.

[6] Bishop, C.M., 1995. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1), pp. 108-116.


Visualization of dropout

[Figure: two panels, "Normal operation" and "With dropout".]

Figure 6: With dropout, the network can be seen as being drawn from a (very large) collection of sub-networks. Dropout can also be applied to the input of the network.


Dropout regularization

Consider the output from a generic layer g(x) of the network. With dropout, during training we replace it with:

$$\tilde{g}(x) = g(x) \odot m , \tag{6}$$

where m is a binary vector with entries drawn from a Bernoulli distribution with probability p (i.e., each $m_i$ is 1 with probability p and 0 with probability 1 − p).

Srivastava, N., et al., 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), pp. 1929-1958.
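A NumPy sketch of the training-time operation in Eq. (6), where p is the probability of keeping a unit, as in the definition above. The function name and the array sizes are assumptions.

```python
import numpy as np


def dropout_train(g_x, p, rng):
    """Eq. (6): keep each entry of g(x) with probability p, zero it otherwise."""
    m = (rng.random(g_x.shape) < p).astype(g_x.dtype)  # Bernoulli(p) mask
    return g_x * m, m


rng = np.random.default_rng(0)
g_x = np.ones((2, 1000))  # stand-in for the output of a layer
dropped, mask = dropout_train(g_x, p=0.8, rng=rng)
print(mask.mean())  # fraction of units kept, close to p
```

A new mask is sampled at every forward pass, so each mini-batch effectively trains a different sub-network.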


Dropout at inference time

Dropout is different from the other layers we introduced up to now, because it is removed when not training.

To do this, when not training we replace the output of the layer with its expected value during training:

$$\mathbb{E}[\tilde{g}(x)] = p \cdot g(x) . \tag{7}$$

Failing to do so will introduce an undesired bias in the network’s behaviour.
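Equation (7) can be verified empirically: averaging the training-time output over many random masks approaches p · g(x). The layer output and the number of sampled masks below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                            # probability of keeping a unit
g_x = np.array([1.0, -2.0, 3.0])   # stand-in for the output of a layer

# Training-time outputs under 20000 independent Bernoulli(p) masks, as in Eq. (6).
masks = rng.random((20000, g_x.size)) < p
train_outputs = masks * g_x

empirical_mean = train_outputs.mean(axis=0)
inference_output = p * g_x         # Eq. (7): what the layer returns when not training
print(empirical_mean, inference_output)
```

Returning g(x) unscaled at inference would instead make the next layer see inputs that are, on average, 1/p times larger than those it was trained on.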


Dropout in TensorFlow

Many layers in TF have separate training/test behaviours. To select the correct one, one can use a specific flag when calling it:

    net = MyNetwork()
    net(x, training=True)  # Use the training version

Using the predict and fit functions of Keras automatically selects the correct version.


ImageNet 2014: VGG and GoogLeNet

From layers to blocks


Building deeper networks

In order to build deeper models, later groups started reasoning in terms of blocks: sequences of layers that could easily be repeated in the model itself.

Oxford’s Visual Geometry Group (VGG) was able to obtain good improvements by using a very simple block:

1. Multiple convolutional layers with size 3 × 3 and the same number of filters;

2. A single max-pooling layer with 2 × 2 windows.

In the VGG architecture, filters are generally doubled after one or two blocks.[7]

[7] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
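The bookkeeping for one such block can be sketched without any deep learning library. The sketch assumes that the convolutions preserve the spatial size ("same" padding), that each has a bias term, and that the 2 × 2 max-pooling uses stride 2; the function name and the input sizes are illustrative.

```python
def vgg_block_summary(height, width, in_channels, num_convs, filters):
    """Return (output shape, trainable parameters) of one VGG-style block."""
    params = 0
    channels = in_channels
    for _ in range(num_convs):
        # One 3x3 convolution: a 3x3 kernel per (input, output) channel pair, plus biases.
        params += 3 * 3 * channels * filters + filters
        channels = filters
    # 2x2 max-pooling with stride 2 halves both spatial dimensions and has no parameters.
    return (height // 2, width // 2, channels), params


first = vgg_block_summary(224, 224, 3, num_convs=1, filters=64)
second = vgg_block_summary(112, 112, 64, num_convs=2, filters=128)
print(first)
print(second)
```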


Visualization of VGG-11

[Figure: a VGG block (B convolutional layers with 3x3 kernels, followed by max-pooling with a 2x2 window) and the full network as a stack of blocks labelled B=1, 64 filters; B=1, 128 filters; B=3, 128 filters; B=3, 256 filters; B=3, 512 filters; followed by Fully-connected (4096), Fully-connected (4096), Fully-connected (1000).]

Figure 7: Original VGG-11. By varying the number and configuration of blocks, we go from VGG-11 to VGG-19.


A rationale for stacking convolutional blocks

The VGG team suggested that many, thinner convolutional blocks tend to be better than fewer, larger ones. In fact:

• Ignoring the nonlinearities, a sequence of three 3 × 3 convolutional layers has the same receptive field as a single 7 × 7 layer.

• It requires fewer parameters: $3 \cdot 3^2 = 27$ against $7^2 = 49$ for a single-input, single-output channel.
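Both claims can be checked with a few lines of arithmetic. The sketch assumes stride-1 convolutions, for which each extra layer widens the receptive field by (kernel size − 1), and counts kernel weights only (no biases), for a single input and output channel.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    field = 1
    for _ in range(num_layers):
        field += kernel_size - 1
    return field


def kernel_weights(kernel_size, num_layers):
    """Kernel weights for a single-input, single-output channel, ignoring biases."""
    return num_layers * kernel_size ** 2


stacked = (stacked_receptive_field(3, 3), kernel_weights(3, 3))
single = (stacked_receptive_field(7, 1), kernel_weights(7, 1))
print(stacked, single)
```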


ImageNet 2014: VGG and GoogLeNet

2014-2016: GoogLeNet


Reducing computational complexity

While VGG-11 has good accuracy, it comes at the cost of a larger number of parameters and a higher computational cost (almost 10× slower).

GoogLeNet[8] was designed to try to match its accuracy with a much smaller computational footprint, with two ideas:

• The inception block to process features of varying size;

• Global average pooling to remove the final fully-connected layers.

[8] Szegedy, C. et al., 2015. Going deeper with convolutions. In IEEE CVPR (pp. 1-9).


The inception block

Figure 8: Source: Dive into Deep Learning, Chapter 7.4.


The inception block

Figure 9: Source: Dive into Deep Learning, Chapter 7.4.


Explaining the inception block

• The inception module popularized the idea of having multiple branches processing the same input in parallel: in this case, processing the image at varying levels of granularity.

• The white 1 × 1 convolutions are simply used to reduce the number of channels, to reduce the computational complexity.

• Global average pooling takes the average of each channel over all spatial positions.

• Note the initial set of convolutions and max-pooling: this is sometimes called a stem block.
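Two of the operations in this list reduce to one line of NumPy each. The sketch assumes feature maps stored as (batch, height, width, channels); the sizes and the random weights are placeholders, and biases and nonlinearities are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((2, 7, 7, 256))  # hypothetical feature maps

# A 1x1 convolution applies the same linear map across channels at every spatial
# position, so it is a matrix product on the last axis (here 256 -> 64 channels).
w = rng.normal(size=(256, 64))         # hypothetical 1x1 kernel weights
reduced = features @ w

# Global average pooling: average each channel over all spatial positions.
pooled = reduced.mean(axis=(1, 2))
print(reduced.shape, pooled.shape)
```

After global average pooling each image is described by one number per channel, which is why no large fully-connected layers are needed on top of the flattened feature maps.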


ImageNet 2014: VGG and GoogLeNet

Batch normalization


Introducing batch normalization layers

Batch normalization (BN), introduced in 2015, is a simple heuristic that makes it possible to train deep networks significantly better.[9]

It allows each layer to control the mean and variance of its outputs, by appropriately rescaling them with a simple affine transformation.

Along with dropout and residual connections (introduced next), BN was instrumental in consolidating deep learning as the state of the art in many fields.

[9] Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proc. ICML.


How batch normalization works

Consider a unit with activations $s_1, \dots, s_B$ in a mini-batch. We first whiten them:

$$\hat{s}_i = \frac{s_i - \mu}{\sqrt{\sigma^2 + \epsilon}} , \tag{8}$$

where μ and σ² are the empirical mean and variance of the mini-batch, and ε is a small constant for numerical stability. We then rescale them using some trainable parameters α and β:

$$\tilde{s}_i = \alpha \hat{s}_i + \beta . \tag{9}$$

During test, μ and σ² are fixed to values computed on the overall training set.
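Equations (8) and (9) for a fully-connected layer can be sketched as follows, assuming activations of shape (batch, units). The function name, the `stats` argument used to pass fixed statistics at test time, and the value of ε are assumptions; here the fixed statistics are simply taken from one batch rather than from the whole training set.

```python
import numpy as np


def batch_norm(s, alpha, beta, eps=1e-5, stats=None):
    """Whiten per unit (Eq. 8), then rescale with alpha and beta (Eq. 9)."""
    if stats is None:
        # Training: statistics of the current mini-batch, one pair per unit.
        mu, var = s.mean(axis=0), s.var(axis=0)
    else:
        # Test: fixed statistics computed beforehand.
        mu, var = stats
    s_hat = (s - mu) / np.sqrt(var + eps)
    return alpha * s_hat + beta, (mu, var)


rng = np.random.default_rng(0)
s = 5.0 + 3.0 * rng.normal(size=(64, 4))  # activations far from zero mean, unit variance

out, batch_stats = batch_norm(s, alpha=np.ones(4), beta=np.zeros(4))
test_out, _ = batch_norm(s[:1], np.ones(4), np.zeros(4), stats=batch_stats)
print(out.mean(axis=0), out.std(axis=0))
```

With α = 1 and β = 0 the output has zero mean and unit variance per unit; training α and β lets the layer choose a different mean and variance when that is useful.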


BN in fully-connected and convolutional layers

The BN operation is generally applied before the nonlinearity (i.e., in order to project the activations into a region which is favourable to the nonlinearity):

$$g(x) = \phi(\text{BN}(x)) . \tag{10}$$

For convolutional layers, BN is still applied before the nonlinearity, but on each channel independently (i.e., mean and variance are adjusted channel-wise).
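For convolutional feature maps, "channel-wise" means that the statistics are computed over the batch and both spatial axes, leaving one mean and one variance per channel. The sketch assumes a (batch, height, width, channels) layout and uses ReLU as the nonlinearity φ; names and sizes are illustrative.

```python
import numpy as np


def conv_batch_norm_relu(x, alpha, beta, eps=1e-5):
    """Channel-wise BN followed by ReLU, as in Eq. (10), for (N, H, W, C) inputs."""
    # One mean and one variance per channel, computed over batch and spatial positions.
    mu = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    normalized = alpha * (x - mu) / np.sqrt(var + eps) + beta
    return np.maximum(normalized, 0.0), normalized


rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=4.0, size=(8, 5, 5, 3))  # stand-in for convolution outputs
activated, normalized = conv_batch_norm_relu(x, alpha=np.ones(3), beta=np.zeros(3))
print(normalized.mean(axis=(0, 1, 2)), activated.min())
```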


Combining BN and VGG

[Figure: "enhanced" VGG block. Each batch-normalized convolution consists of a convolutional layer (3x3 kernels), batch normalization (channel-wise) and an activation function (element-wise); the block ends with max-pooling (2x2 window).]

Figure 10: Combining VGG blocks with BN, we can train deeper VGG networks more easily.


Why does batch normalization work?

Despite its simplicity, batch normalization is extremely effective when training deep NNs.

Originally, its effectiveness was believed to be a consequence of reducing the so-called internal covariate shift (i.e., distributions of activations changing layer-by-layer).

Nowadays, it is believed that BN works by making the optimization landscape smoother and, consequently, the gradients more predictive.

Santurkar, S. et al., 2018. How does batch normalization help optimization? In NeurIPS (pp. 2483-2493).

Lipton, Z.C. and Steinhardt, J., 2018. Troubling trends in machine learning scholarship. arXiv preprint arXiv:1807.03341.


ImageNet 2014: VGG and GoogLeNet

2016 and beyond: residual connections


How deep can we go?

Residual connections were developed from a simple observation: if a network with L layers performs well, a network with L + 1 layers should perform at least as well.

However, this was not matched by practice:[10]

[10] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In IEEE CVPR (pp. 770-778).


Residual connections

The idea is to allow for skip connections, so that each block only has to model deviations from the identity function:

$$g(x) = f(x) + x, \tag{11}$$

where $f(x)$ is a standard network block (e.g., fully connected, VGG, ...). This simple idea allows networks to be scaled up to 100 and more layers.

If $x$ and $f(x)$ have different dimensionality, we can rescale $x$ with a matrix multiplication or a $1 \times 1$ convolutional block.
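As a rough illustration of Eq. (11), the following sketch implements a residual block in plain NumPy (rather than TensorFlow, to keep the arithmetic visible). The helper names `dense_block` and `residual_block`, the choice of a single fully connected layer with a ReLU as $f$, and the random projection matrix `W_skip` are assumptions made for this example and are not part of the original lecture.

```python
import numpy as np

def dense_block(x, W, b):
    """Stand-in for f(x): one fully connected layer followed by a ReLU."""
    return np.maximum(x @ W + b, 0.0)

def residual_block(x, W, b, W_skip=None):
    """Compute g(x) = f(x) + x, optionally projecting x with W_skip."""
    fx = dense_block(x, W, b)
    # If f changes the dimensionality, project x so that the sum is defined.
    skip = x if W_skip is None else x @ W_skip
    return fx + skip

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # batch of 4 inputs with 8 features each

# Same dimensionality: when f(x) = 0, the block reduces to the identity.
g_same = residual_block(x, np.zeros((8, 8)), np.zeros(8))

# Different dimensionality (8 -> 16): the skip path needs a projection.
W = rng.normal(scale=0.1, size=(8, 16))
W_skip = rng.normal(scale=0.1, size=(8, 16))
g_proj = residual_block(x, W, np.zeros(16), W_skip)

print(g_same.shape, g_proj.shape)
```

The first call shows why the identity is easy to represent: the block only has to drive $f(x)$ to zero. In a convolutional network the projection would typically be a $1 \times 1$ convolution instead of a matrix product.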


Visualizing residual connections

Figure 11: Source: Dive into Deep Learning, Chapter 7.6.


Residual connections with rescaling

Figure 12: Residual block with rescaling of the skip connection. Source: Dive into Deep Learning, Chapter 7.6.


Residual connections with rescaling

Figure 13: Concatenating many residual blocks, we obtain a residual network (ResNet). Source: Dive into Deep Learning, Chapter 7.6.


Overview

Figure 14: Evolution of accuracy vs. computational requirements of different architectures.


Additional topics

Advanced activation functions


Leaky ReLU and PReLU

Activation function design is an active area of research!

For example, a generalized version of ReLU has a small (e.g., $\alpha = 0.1$) slope for negative inputs, to avoid too many zeroes:

$$\text{Leaky-ReLU}(s) = \begin{cases} s & \text{if } s \ge 0, \\ \alpha s & \text{otherwise.} \end{cases} \tag{12}$$

The negative slope ($\alpha$) can also be learned separately for each unit (parametric ReLU, PReLU) [11].

[11] He, K., Zhang, X., Ren, S. and Sun, J., 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In IEEE Int. Conf. on Computer Vision (pp. 1026-1034).
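The activation above can be sketched in a few lines of NumPy, assuming the usual convention in which negative inputs are multiplied by a small positive $\alpha$. The function name `leaky_relu` and the per-unit slopes in `per_unit_alpha` are hypothetical; in an actual PReLU the slopes would be trainable parameters rather than fixed numbers.

```python
import numpy as np

def leaky_relu(s, alpha=0.1):
    """Return s where s >= 0 and alpha * s elsewhere."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0, s, alpha * s)

s = np.array([-2.0, -0.5, 0.0, 3.0])

# Leaky ReLU: one fixed slope shared by all units.
out = leaky_relu(s)

# PReLU-style usage: one slope per unit (fixed here, learned in practice).
per_unit_alpha = np.array([0.1, 0.2, 0.3, 0.4])
out_prelu = leaky_relu(s, per_unit_alpha)

print(out, out_prelu)
```

Negative inputs keep a nonzero output and a nonzero gradient ($\alpha$), which is what prevents units from becoming permanently inactive.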


Additional topics

Weight initialization


He weight initialization

Weight initialization is another important factor in gradient stability. For example, Xavier's initialization is:

$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_l + n_{l+1}}},\ \frac{\sqrt{6}}{\sqrt{n_l + n_{l+1}}}\right], \tag{13}$$

where $n_l$ and $n_{l+1}$ are the numbers of input and output channels, respectively.

TF has a full module for advanced initializations (tf.keras.initializers).

Xiao, L., Bahri, Y., Sohl-Dickstein, J., Schoenholz, S.S. and Pennington, J., 2018. Dynamical isometry and a mean field theory of CNNs: How to train 10,000-layer vanilla convolutional neural networks. arXiv preprint arXiv:1806.05393.
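To make Eq. (13) concrete, here is a minimal NumPy sketch of Xavier's uniform initialization. The function name `xavier_uniform` and the layer sizes are assumptions chosen for illustration. The sketch also checks the sampled variance against $2 / (n_l + n_{l+1})$, which follows from the variance $a^2/3$ of a uniform distribution on $[-a, a]$.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """Sample an (n_in, n_out) matrix from U[-a, a], a = sqrt(6) / sqrt(n_in + n_out)."""
    limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

rng = np.random.default_rng(0)
n_in, n_out = 256, 128  # illustrative layer sizes
W = xavier_uniform(n_in, n_out, rng)

limit = np.sqrt(6.0 / (n_in + n_out))
# A uniform distribution on [-a, a] has variance a^2 / 3 = 2 / (n_in + n_out).
target_var = 2.0 / (n_in + n_out)

print(W.shape, float(np.abs(W).max()) <= limit)
print(W.var(), target_var)
```

In TensorFlow the same scheme is available as `tf.keras.initializers.GlorotUniform`, so a hand-written version like this one is mainly useful for understanding the formula.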


About the lab session

The task


Learning to steer

We consider a very simplified autonomous driving setup, where a CNN learns to steer a car from a single camera frame.

Bojarski, M., et al., 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.


Our convolutional architecture

[Figure: block diagram of the network, from the camera image (455x256x3) to the steering command. Blocks shown: convolutional layer, batch normalization, convolutional layer, batch normalization, ..., flattening, fully-connected layer, dropout, fully-connected layer, dropout.]


About the lab session

Additional TF modules


New concepts for the lab session

1. tf.image combined with tf.data to create image processing pipelines.

2. Data augmentation (also with tf.image).

3. tf.keras.regularizers for adding regularization to a layer.

4. Building models from scratch and switching between training and test behaviours.

5. tf.function for speeding up code by compiling Python functions into TensorFlow graphs.
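The five items above can be combined in one small sketch, assuming TensorFlow 2.x is installed. Everything specific here is hypothetical and chosen only to keep the example short: the random 64x64 stand-in images (instead of real 455x256 camera frames), the `augment` function, the tiny `SteeringNet` model and its layer sizes, and the mean squared error loss. It is not the architecture or the code used in the lab.

```python
import tensorflow as tf

# Synthetic stand-in data (assumption): 8 random RGB frames and steering angles.
images = tf.random.uniform((8, 64, 64, 3))
angles = tf.random.uniform((8, 1), minval=-1.0, maxval=1.0)

def augment(image, angle):
    """Data augmentation with tf.image: random brightness and contrast."""
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return tf.clip_by_value(image, 0.0, 1.0), angle

# tf.data pipeline: shuffle, augment each example, then batch.
dataset = (tf.data.Dataset.from_tensor_slices((images, angles))
           .shuffle(8)
           .map(augment)
           .batch(4))

class SteeringNet(tf.keras.Model):
    """Tiny hypothetical CNN; layer sizes are illustrative only."""

    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(
            8, 3, strides=2, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(1e-4))
        self.bn = tf.keras.layers.BatchNormalization()
        self.flatten = tf.keras.layers.Flatten()
        self.drop = tf.keras.layers.Dropout(0.5)
        self.out = tf.keras.layers.Dense(1)

    def call(self, x, training=False):
        # The `training` flag switches batch normalization and dropout
        # between their training and test behaviours.
        x = self.bn(self.conv(x), training=training)
        x = self.drop(self.flatten(x), training=training)
        return self.out(x)

model = SteeringNet()
optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function  # traces the Python function into a TensorFlow graph
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        # Mean squared error plus the regularization terms collected by Keras.
        loss = tf.reduce_mean(tf.square(pred - y)) + tf.add_n(model.losses)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for x_batch, y_batch in dataset:
    loss = train_step(x_batch, y_batch)

# Test behaviour: dropout is disabled and BN uses its moving averages.
test_pred = model(images, training=False)
print(float(loss), test_pred.shape)
```

Passing `training` explicitly is the design choice worth noting: a model written from scratch does not know on its own whether it is being trained or evaluated, so the caller has to say so.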
