Energy Propagation in Deep Convolutional Neural Networks
Srijanie Dey
Washington State University, Vancouver
September 20, 2017
Srijanie Dey (WSU Vancouver) Seminar September 20, 2017 1 / 27
Overview
1 Introduction
2 Motivation
3 Goals
4 Main Results
5 Conclusion
Introduction

What is a Convolutional Neural Network (CNN)?

1. Inspired by the animal visual cortex
2. A neural network with convolutional layers that filter the input for useful information
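To make the second point concrete, here is a minimal NumPy sketch of a single convolutional layer: a small filter slides over a 1-D input and the result passes through a non-linearity (the modulus, which reappears later in these slides). The filter values are arbitrary illustrative choices, not taken from any particular network.

```python
import numpy as np

def conv1d_valid(f, g):
    """Discrete 'valid' convolution of signal f with filter g."""
    n = len(f) - len(g) + 1
    return np.array([np.sum(f[i:i + len(g)] * g[::-1]) for i in range(n)])

def conv_layer(f, g):
    """One CNN layer: convolution followed by the modulus non-linearity."""
    return np.abs(conv1d_valid(f, g))

f = np.array([1.0, -2.0, 3.0, -4.0, 5.0])
g = np.array([0.5, -0.5])  # arbitrary example filter (a crude edge detector)
out = conv_layer(f, g)
print(out)
```

Stacking such layers, with subsampling in between, yields the deep CNN architectures discussed in the following slides.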
Motivation

Let's talk about a little physics: energy loss during transmission.
Motivation

Applications like image processing and natural language processing use deep CNNs (DCNNs).
A DCNN is a series of convolutional, non-linear, and subsampling layers.
Conserving features throughout the network is a big challenge.
Goals

(1) How quickly does the energy contained in the feature maps decay across layers?

(2) How many layers are needed to preserve most of the input signal energy in the feature vector?
Main Results

Notations

1. Consider input signals $f \in L^2(\mathbb{R}^d)$.

2. Employ the module-sequence
$$\Omega := \big((\Psi_n, |\cdot|, \mathrm{Id})\big)_{n \in \mathbb{N}} \tag{1}$$
where
(i) $\Psi_n := \{\chi_n\} \cup \{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$,
(ii) the modulus non-linearity $|\cdot| : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$, $|f|(x) := |f(x)|$,
(iii) pooling through the identity operator with pooling factor equal to 1.
Main Results

Note: Frame Condition
$$A_n \|f\|_2^2 \;\le\; \|f * \chi_n\|_2^2 + \sum_{\lambda_n \in \Lambda_n} \|f * g_{\lambda_n}\|_2^2 \;\le\; B_n \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d),\ A_n, B_n > 0 \tag{2}$$
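The frame condition (2) can be checked numerically. The sketch below builds a toy discrete filter bank whose transfer functions square-sum to one at every frequency (a Parseval frame, $A_n = B_n = 1$) and verifies, via the DFT and Parseval's identity, that the filtered energies add up to the signal energy. The band edges are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
w = np.fft.fftfreq(n) * n            # integer DFT frequencies

# Toy Parseval filter bank: a low-pass chi and two band-pass g's whose
# squared magnitudes sum to 1 at every frequency.
chi_hat = (np.abs(w) <= 8).astype(float)
g1_hat = ((np.abs(w) > 8) & (np.abs(w) <= 20)).astype(float)
g2_hat = (np.abs(w) > 20).astype(float)

f = rng.standard_normal(n)
f_hat = np.fft.fft(f)

def energy(x_hat):
    # Parseval for the DFT: ||x||_2^2 = (1/n) * ||x_hat||_2^2
    return np.sum(np.abs(x_hat)**2) / n

total = energy(f_hat * chi_hat) + energy(f_hat * g1_hat) + energy(f_hat * g2_hat)
print(np.isclose(total, np.sum(f**2)))   # frame bounds A = B = 1
```

Because multiplication on the Fourier side is circular convolution on the signal side, `total` is exactly the left-hand sum in (2) for this discrete setting.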
Main Results

Notations

3. There exists an operator $U_n[\lambda_n]$ where
$$U_n[\lambda_n] f = |f * g_{\lambda_n}|. \tag{3}$$

4. Extend (3) to paths on index sets
$$q = (\lambda_1, \lambda_2, \ldots, \lambda_n) \in \Lambda_1 \times \Lambda_2 \times \cdots \times \Lambda_n =: \Lambda^n, \quad n \in \mathbb{N},$$
according to
$$U[q]f = U[(\lambda_1, \lambda_2, \ldots, \lambda_n)]f := U_n[\lambda_n] \cdots U_2[\lambda_2]\, U_1[\lambda_1] f. \tag{4}$$
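Operators (3) and (4) translate directly into code: one layer is "convolve, then take the modulus", and a path is the composition of such layers. A sketch using circular (FFT-based) convolution as a discrete stand-in for convolution on $\mathbb{R}^d$:

```python
import numpy as np

def U_layer(f, g):
    """U_n[lambda_n] f = |f * g|, with circular convolution via the FFT."""
    return np.abs(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

def U_path(f, path):
    """U[q]f = U_n[lambda_n] ... U_1[lambda_1] f for a path q = (g_1, ..., g_n)."""
    out = np.asarray(f, dtype=float)
    for g in path:                # apply the layers in order, innermost first
        out = U_layer(out, g)
    return out

# Sanity check: convolving with a discrete delta leaves only the modulus.
f = np.array([1.0, -2.0, 3.0, -4.0])
delta = np.array([1.0, 0.0, 0.0, 0.0])
print(U_path(f, [delta]))
```

The same `U_path` works for paths of any length, mirroring how (4) composes one `U_layer` per network layer.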
Main Results

Definitions

5. Feature Map: a function mapping input features to hidden units to form the new input for the next layer. The signals associated with the $n$-th network layer,
$$U[q]f, \quad q \in \Lambda^n,$$
are referred to as feature maps.

6. Feature Vector: an $n$-dimensional vector of numerical features that represent an object. The collection
$$\Phi_\Omega(f) := \bigcup_{n=0}^{\infty} \Phi_\Omega^n(f) \tag{5}$$
is referred to as the feature vector, where $\Phi_\Omega^n(f) := \big\{(U[q]f) * \chi_{n+1}\big\}_{q \in \Lambda^n}$.
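A sketch of the $n$-th feature-map collection $\Phi_\Omega^n(f)$ from (5): enumerate all paths $q \in \Lambda^n$, apply $U[q]$, and convolve each result with the output-generating filter $\chi_{n+1}$. This sketch assumes the same small filter bank in every layer; the concrete filter values are arbitrary.

```python
import numpy as np
from itertools import product

def cconv(f, g):
    """Circular convolution via the FFT (real signals)."""
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

def phi_n(f, bank, chi, n):
    """Phi^n(f) = {(U[q]f) * chi : q in Lambda^n}, same bank in every layer."""
    feats = []
    for q in product(bank, repeat=n):      # all paths of length n
        h = f
        for g in q:
            h = np.abs(cconv(h, g))        # one U_k[lambda_k] step
        feats.append(cconv(h, chi))        # output-generating filter chi_{n+1}
    return feats

f = np.array([1.0, 2.0, 0.0, -1.0])
bank = [np.array([0.5, -0.5, 0.0, 0.0]),
        np.array([0.25, 0.25, -0.25, -0.25])]
chi = np.array([0.5, 0.5, 0.0, 0.0])       # illustrative low-pass output filter
print(len(phi_n(f, bank, chi, 2)))         # |Lambda|^n paths at depth n
```

Note the combinatorial growth: the number of feature maps at depth $n$ is $|\Lambda|^n$, which is why the energy per path matters.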
Main Results

Figure (1): Network architecture underlying the feature extractor (5). The index $\lambda_n^{(k)}$ corresponds to the $k$-th filter $g_{\lambda_n^{(k)}}$ of the collection $\Psi_n$ associated with the $n$-th network layer. The function $\chi_{n+1}$ is the output-generating filter of the $n$-th network layer. The root of the network corresponds to $n = 0$.
Main Results

Energy Decay and Energy Conservation

Energy Decay: study the decay of
$$W_N(f) := \sum_{q \in \Lambda^N} \|U[q]f\|_2^2, \quad f \in L^2(\mathbb{R}^d) \tag{6}$$
across layers.

Energy Conservation: establish conditions for conservation of energy in the sense of
$$A_\Omega \|f\|_2^2 \;\le\; |||\Phi_\Omega(f)|||^2 \;\le\; B_\Omega \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d), \tag{7}$$
with $A_\Omega, B_\Omega > 0$.
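The quantity (6) can be computed by brute force for a toy network: sum the squared norms of all feature maps at depth $N$. In this sketch the filters (arbitrary choices, reused in every layer) have $\ell^1$ norms with $\|g_1\|_1^2 + \|g_2\|_1^2 \le 1$, so Young's inequality forces $W_N$ to be non-increasing in $N$.

```python
import numpy as np
from itertools import product

def cconv(f, g):
    return np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

def W(f, bank, N):
    """W_N(f) = sum over q in Lambda^N of ||U[q]f||_2^2."""
    total = 0.0
    for q in product(bank, repeat=N):
        h = f
        for g in q:
            h = np.abs(cconv(h, g))
        total += float(np.sum(h**2))
    return total

f = np.array([1.0, -1.0, 2.0, 0.5])
bank = [np.array([0.3, -0.3, 0.0, 0.0]),   # ||g1||_1^2 + ||g2||_1^2 = 1,
        np.array([0.2, 0.2, -0.2, -0.2])]  # so W_N cannot increase with N
w = [W(f, bank, N) for N in (1, 2, 3)]
print(w)
```

Watching how fast `w` shrinks is exactly question (1) from the Goals slide, answered here empirically for one toy filter bank.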
Main Results

Previous work: for wavelet-based networks, there exist $\varepsilon > 0$ and $a > 1$ such that (6) satisfies
$$W_N(f) \le \int_{\mathbb{R}} |\hat{f}(\omega)|^2 \left(1 - \left|\hat{r}_g\!\left(\frac{\omega}{\varepsilon a^{N-1}}\right)\right|^2\right) d\omega \tag{8}$$
for real-valued 1-D signals $f \in L^2(\mathbb{R})$, $N \ge 2$, where $\hat{r}_g(\omega) := e^{-\omega^2}$.
Main Results

Figure (2): Illustration of the impact of the network depth $N$ on the upper bound on $W_N(f)$ for $\varepsilon = 1$. The function $h_N(\omega) := \big(1 - \hat{r}_g\big(\tfrac{\omega}{\varepsilon a^N}\big)\big)$ is of increasingly high-pass nature.
Main Results

Assumption 1

The $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$, $n \in \mathbb{N}$, are analytic in the following sense: for every layer index $n \in \mathbb{N}$ and every $\lambda_n \in \Lambda_n$, there exists an orthant $H_{A_{\lambda_n}} \subseteq \mathbb{R}^d$, for $A_{\lambda_n} \in O(d)$, such that
$$\mathrm{supp}(\hat{g}_{\lambda_n}) \subseteq H_{A_{\lambda_n}}. \tag{9}$$
Moreover, there exists $\delta > 0$ such that
$$\sum_{\lambda_n \in \Lambda_n} |\hat{g}_{\lambda_n}(\omega)|^2 = 0, \quad \text{a.e. } \omega \in B_\delta(0). \tag{10}$$
Main Results

Note: Littlewood–Paley Condition
$$A_n \;\le\; |\hat{\chi}_n(\omega)|^2 + \sum_{\lambda_n \in \Lambda_n} |\hat{g}_{\lambda_n}(\omega)|^2 \;\le\; B_n, \quad \text{a.e. } \omega \in \mathbb{R}^d \tag{11}$$
Main Results

Theorem 1

Let $\Omega$ be the module-sequence (1) with filters $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying the conditions in Assumption 1, and let $\delta > 0$ be the radius of the spectral gap $B_\delta(0)$ left by the filters $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ according to (10). Furthermore, let $s \ge 0$, $A_\Omega^N := \prod_{k=1}^{N} \min\{1, A_k\}$, $B_\Omega^N := \prod_{k=1}^{N} \max\{1, B_k\}$, and
$$\alpha := \begin{cases} 1, & d = 1, \\ \log_2\!\big(\sqrt{d/(d - 1/2)}\big), & d \ge 2. \end{cases} \tag{12}$$

(i) We have
$$W_N(f) \le B_\Omega^N \int_{\mathbb{R}^d} |\hat{f}(\omega)|^2 \left(1 - \left|\hat{r}_l\!\left(\frac{\omega}{N^\alpha \delta}\right)\right|^2\right) d\omega, \quad \forall f \in L^2(\mathbb{R}^d),\ \forall N \ge 1, \tag{13}$$
where $\hat{r}_l : \mathbb{R}^d \to \mathbb{R}$, $\hat{r}_l(\omega) := (1 - |\omega|)_+^l$, with $l > \lfloor \tfrac{d}{2} \rfloor + 1$.
Main Results

Theorem 1 (contd.)

(ii) For every Sobolev function $f \in H^s(\mathbb{R}^d)$, there exists $\sigma > 0$ such that
$$W_N(f) = O\Big(B_\Omega^N\, N^{-\frac{\alpha(2s+\sigma)}{2s+\sigma+1}}\Big). \tag{14}$$

(iii) If, in addition to Assumption 1,
$$0 < A_\Omega := \lim_{N \to \infty} A_\Omega^N \;\le\; B_\Omega := \lim_{N \to \infty} B_\Omega^N < \infty, \tag{15}$$
then we have energy conservation according to
$$A_\Omega \|f\|_2^2 \;\le\; |||\Phi_\Omega(f)|||^2 \;\le\; B_\Omega \|f\|_2^2, \quad \forall f \in L^2(\mathbb{R}^d). \tag{16}$$
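The spectral factor in bound (13) can be tabulated directly. The sketch below evaluates $1 - |\hat{r}_l(\omega/(N^\alpha \delta))|^2$ at a fixed frequency for growing depth $N$ (with $d = 1$, so $\alpha = 1$; the values of $\delta$, $l$, and $\omega$ are arbitrary illustrative choices): the factor shrinks toward zero as $N$ grows, which is what drives the energy decay.

```python
import numpy as np

def r_hat(omega, l):
    """The kernel r_l on the Fourier side: (1 - |omega|)_+^l."""
    return np.maximum(1.0 - np.abs(omega), 0.0)**l

def spectral_factor(omega, N, alpha, delta, l):
    """The factor 1 - |r_l(omega / (N^alpha * delta))|^2 from (13)."""
    return 1.0 - r_hat(omega / (N**alpha * delta), l)**2

omega, delta, l, alpha = 2.0, 1.0, 2, 1.0   # d = 1, hence alpha = 1
vals = [spectral_factor(omega, N, alpha, delta, l) for N in (1, 2, 4, 8)]
print(vals)
```

At a fixed $\omega$, deeper networks push $\omega/(N^\alpha \delta)$ into the flat top of $\hat{r}_l$, so ever less of $|\hat{f}(\omega)|^2$ survives into the bound: the same high-pass widening pictured in Figure (2).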
Main Results

Theorem 2

Let $\hat{r}_l : \mathbb{R} \to \mathbb{R}$, $\hat{r}_l(\omega) := (1 - |\omega|)_+^l$, with $l > 1$.

Wavelets: let the mother and father wavelets $\psi, \phi \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ satisfy $\mathrm{supp}(\hat{\psi}) \subseteq [1/2, 2]$ and
$$|\hat{\phi}(\omega)|^2 + \sum_{j=1}^{\infty} |\hat{\psi}(2^{-j}\omega)|^2 = 1, \quad \text{a.e. } \omega \ge 0. \tag{17}$$
Moreover, let $g_j(x) := 2^j \psi(2^j x)$, for $x \in \mathbb{R}$, $j \ge 1$, and $g_j(x) := 2^{|j|} \psi(-2^{|j|} x)$, for $x \in \mathbb{R}$, $j \le -1$, and set $\chi(x) := \phi(|x|)$, for $x \in \mathbb{R}$.

Let $\Omega$ be the module-sequence (1) with filters $\Psi = \{\chi\} \cup \{g_j\}_{j \in \mathbb{Z} \setminus \{0\}}$ in every network layer.
Main Results

Theorem 2 (contd.)

Then,
$$W_N(f) \le B_\Omega^N \int_{\mathbb{R}} |\hat{f}(\omega)|^2 \left(1 - \left|\hat{r}_l\!\left(\frac{\omega}{(5/3)^{N-1}}\right)\right|^2\right) d\omega, \quad \forall f \in L^2(\mathbb{R}),\ \forall N \ge 1. \tag{18}$$

For every Sobolev function $f \in H^s(\mathbb{R})$, there exists $\sigma > 0$ such that
$$W_N(f) = O\Big((5/3)^{-\frac{N(2s+\sigma)}{2s+\sigma+1}}\Big). \tag{19}$$
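Condition (17) can be verified numerically for a concrete wavelet pair. The sketch below uses a Shannon-type pair on the Fourier side — one illustrative choice with $\mathrm{supp}(\hat{\psi}) \subseteq [1/2, 2]$; many other pairs satisfy (17) as well.

```python
import numpy as np

def psi_hat(w):
    """Shannon-type mother wavelet on the Fourier side, supp in (1/2, 1]."""
    return ((w > 0.5) & (w <= 1.0)).astype(float)

def phi_hat(w):
    """Matching father wavelet (low-pass) covering [0, 1]."""
    return ((w >= 0.0) & (w <= 1.0)).astype(float)

w = np.linspace(0.0, 50.0, 2001)            # frequency grid, omega >= 0
# Littlewood-Paley sum from (17); j up to 9 suffices since 2^9 > 50.
lp = phi_hat(w)**2 + sum(psi_hat(w / 2**j)**2 for j in range(1, 10))
print(np.allclose(lp, 1.0))
```

Each dyadic dilation $\hat{\psi}(2^{-j}\omega)$ covers the band $(2^{j-1}, 2^j]$, so together with the low-pass $\hat{\phi}$ the squared transfer functions tile $[0, \infty)$ exactly: the partition of unity that (17) demands.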
Main Results

Number of Layers Needed

How many layers are needed so that most of the input signal energy, say $((1-\varepsilon) \cdot 100)\%$, is contained in the feature vector?

Consider Parseval frames in all layers ($A_n = B_n = 1$, $n \in \mathbb{N}$), and check for bounds of the form
$$(1-\varepsilon)\|f\|_2^2 \;\le\; \sum_{n=0}^{N} |||\Phi_\Omega^n(f)|||^2 \;\le\; \|f\|_2^2. \tag{20}$$
Main Results

Corollary 1

Let $\Omega$ be the module-sequence (1) with filters $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ satisfying the conditions in Assumption 1, and let the corresponding frame bounds be $A_n = B_n = 1$, $n \in \mathbb{N}$. Let $\delta > 0$ be the radius of the spectral gap $B_\delta(0)$ left by the filters $\{g_{\lambda_n}\}_{\lambda_n \in \Lambda_n}$ according to (10). Furthermore, let $l > \lfloor \tfrac{d}{2} \rfloor + 1$, $\varepsilon \in (0, 1)$, let $\alpha$ be the decay exponent from (12), and let $f \in L^2(\mathbb{R}^d)$ be $L$-band-limited. If
$$N \;\ge\; \left\lceil \left(\frac{L}{\big(1 - (1-\varepsilon)^{\frac{1}{2l}}\big)\,\delta}\right)^{1/\alpha} - 1 \right\rceil, \tag{21}$$
then (20) holds.
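Formula (21) is easy to evaluate. The sketch below computes the smallest admissible $N$ for illustrative values ($d = 1$ so $\alpha = 1$, $l = 2$, band limit $L = 10$, spectral gap $\delta = 1$, $\varepsilon = 0.05$, i.e. 95% of the energy retained); all the numbers are hypothetical.

```python
import math

def layers_needed(L, delta, eps, l, alpha):
    """Smallest N satisfying (21):
    N >= ceil((L / ((1 - (1-eps)^(1/(2l))) * delta))^(1/alpha) - 1)."""
    factor = 1.0 - (1.0 - eps)**(1.0 / (2 * l))
    return math.ceil((L / (factor * delta))**(1.0 / alpha) - 1)

N = layers_needed(L=10.0, delta=1.0, eps=0.05, l=2, alpha=1.0)
print(N)
```

Loosening the requirement (larger $\varepsilon$) or widening the spectral gap (larger $\delta$) shrinks the required depth, matching the trade-off pictured in Figure (3).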
Main Results

Figure (3): Number $N$ of layers needed to ensure that $((1-\varepsilon) \cdot 100)\%$ of the input signal energy is contained in the features generated in the first $N$ network layers.
Conclusion
References
Questions