Sparse Wavelet Representations andClassification
Challenge Kaggle in class
DREEM
March 5th 2015
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 1/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.
I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
Introduction
I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.
I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.
I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.
I Skewed data set: only 7% of positive exs.I AUC metric.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10
Introduction
The data
The AUC metric
Classification algorithms we triedScatNet + AUC gradient descent (Herschtal, Raskutti,ICML 2004ScatNet + Mixture of Probabilistic PrincipalComponent Analysis (MPPCA) (Tipping, Bishop,1999)Neural NetworksIdea (cheat ?) use subjects IDs
Results and Conclusion
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 3/10
The data
The data
I Times series
I Proportion of each subject in the training setI Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The data
The dataI Times seriesI Proportion of each subject in the training set
I Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The data
The dataI Times seriesI Proportion of each subject in the training setI Positive rate for each subject in the training set
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10
The AUC metric
The AUC metric
I Area under ROC curve
I Estimator (Wilcoxon-Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
The AUC metric
The AUC metric
I Area under ROC curveI Estimator (Wilcoxon-
Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
The AUC metric
The AUC metric
I Area under ROC curveI Estimator (Wilcoxon-
Man-Whitney) :P(f (x+) > f (x−))
I Adapted to skewed data
false positive rate0 0.2 0.4 0.6 0.8 1
tru
e p
os
itiv
e r
ate
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1ROC curve - subj. 3 - (test on 20% of train)
AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
AUC gradient descentI linear classifier :
ˆAUC(~ )β =1
PQ
P∑j=1
Q∑k=1
g(~β.(x+j − x−
k )) (1)
I softening (for differentiability)
R(~β) =1
PQ
P∑j=1
Q∑k=1
sigmoid(~β.(x+j − x−
k )) (2)
I computationally feasible estimator :
Rl(~β) =1Q
Q∑k=1
sigmoid(~β.(x+kmodP − x−
k )) (3)
I progressively increase norm of β =⇒ close to realAUC + no local maximum
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each class
I Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−
I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
MPPCA
I Generative model
t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)
I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.
I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.
I Best architecture : 4 hidden layers with 14,13,10,9neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Neural Networks
I Apply on raw data (and not on top of ScatNet) not tooverfit.
I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9
neurons, and 400 iterations.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10
Classification algorithms we tried
Use subjects IDs
I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject
I ... lead to overfittingI and not useful for “ready trained” device
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10
Classification algorithms we tried
Use subjects IDs
I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject
I ... lead to overfittingI and not useful for “ready trained” device
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transforms
I NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but good
I All algorithms : less than 15 minutes on a laptop fortraining
I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
training
I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10
Results and Conclusion
Conclusion
MPPCA NN AUC GD0.81578 0.84976 0.86933
I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for
trainingI Possible improvements : ScatNet + AUCSVM
(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?
I Reinforcement learning to adapt the algorithm to thesubject using the device.
MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10