Learning Ensembles of Convolutional Neural Networks
Liran Chen
Faculty mentor: Greg Shakhnarovich
Motivation and Data
Goal: build a model, or a set of models, to efficiently classify images.

The Mixed National Institute of Standards and Technology database (MNIST) is a large database of handwritten digits that is commonly used for training various image processing systems. The database contains 60,000 training images and 10,000 test images.
Convolutional Neural Network
• Inspired by biological processes
• A type of feed-forward artificial neural network
• Individual neurons are tiled so that they respond to overlapping regions of the visual field
• Widely used models for image recognition/classification
• Composed of convolutional layers, fully connected layers, a softmax layer, etc.
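The two layer types named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the poster's actual network: `conv2d` computes a "valid" cross-correlation (what CNN libraries call convolution), and `softmax` turns scores into class probabilities. The toy image, kernel, and shapes are all assumptions for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

image = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
kernel = np.ones((3, 3)) / 9.0          # 3x3 averaging filter
feature_map = conv2d(image, kernel)     # shape (2, 2)
probs = softmax(feature_map.ravel())    # toy class probabilities, sum to 1
```

A real CNN stacks many such convolutions with nonlinearities and pooling before the final softmax; this sketch only shows the primitive operations.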
Training Procedure
Gradient Descent (“hill climbing”), in several variants:
• Batch
• Mini-batch
• Stochastic
• Online
An epoch is one full pass through the training set.

The training result depends on the stochastic procedure:
• The model is non-convex
• Randomness comes from:
  – the order of the data in each epoch
  – the initialization of the parameters
  – the learning rates
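The variants listed above differ only in how many examples feed each gradient step. A minimal sketch on a toy least-squares problem (the loop structure, not the model, is the point; all names and data here are illustrative assumptions): batch uses the whole set per step, mini-batch a small random subset, and stochastic/online a single example. Re-shuffling each epoch is one of the sources of randomness noted above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                          # noise-free toy targets

def sgd(X, y, batch_size, epochs=50, lr=0.1):
    n = len(X)
    w = np.zeros(X.shape[1])            # parameter initialization (random in practice)
    for _ in range(epochs):             # one epoch = one full pass over the data
        order = rng.permutation(n)      # re-shuffle the data order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Gradient of the mean squared error on this (mini-)batch
            grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

w_batch = sgd(X, y, batch_size=len(X))      # batch gradient descent
w_mini  = sgd(X, y, batch_size=32)          # mini-batch
w_sto   = sgd(X, y, batch_size=1, lr=0.01)  # stochastic / online
```

On this convex toy problem all three variants recover `w_true`; on a non-convex CNN loss, different shuffles and initializations can land in different minima, which is exactly the variability the ensemble methods below exploit.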
Ensemble Learning
A method for independently generating multiple versions of a predictor network and using them to obtain an aggregated prediction.
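The aggregation used throughout this poster is prediction averaging: each network outputs class probabilities, the ensemble averages them, and the predicted class is the argmax of the average. A minimal sketch, with three hard-coded probability vectors standing in for trained networks (purely illustrative values):

```python
import numpy as np

preds = np.array([
    [0.5, 0.3, 0.2],   # network 1: votes class 0
    [0.2, 0.6, 0.2],   # network 2: votes class 1
    [0.4, 0.1, 0.5],   # network 3: votes class 2
])

avg = preds.mean(axis=0)             # averaged class probabilities
ensemble_class = int(np.argmax(avg)) # ensemble prediction
```

Note that the three networks disagree individually, yet the average still commits to a single class; averaging smooths out individual networks' idiosyncratic errors.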
(Krizhevsky et al., 2012)
Five convolutional layers and three fully connected layers.

Averaging the predictions of five similar CNNs gives an error rate of 16.4%, reduced from 18.2%.

Adding an extra sixth convolutional layer over the last pooling layer and then “fine-tuning” it on ILSVRC-2012 gives an error rate of 16.6%. Averaging the predictions of seven such CNNs reduces the error to 15.4%.
(Zeiler & Fergus, 2013)

(Naftaly et al., 1996)
Autoregression model. Q is the number of models used in the ensemble; increasing Q contributes to reducing the error.
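Why increasing Q helps can be sketched with a standard variance argument (a textbook result, not taken from the poster). Write the ensemble output as the average of Q predictors:

```latex
\bar{y}(x) = \frac{1}{Q}\sum_{q=1}^{Q} y_q(x), \qquad
\operatorname{Var}\!\left[\bar{y}\right] = \frac{\sigma^2}{Q}
\quad \text{(uncorrelated errors, common variance } \sigma^2\text{)}, \qquad
\operatorname{Var}\!\left[\bar{y}\right] = \rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{Q}
\quad \text{(pairwise error correlation } \rho\text{)}.
```

With uncorrelated errors the variance falls as 1/Q; with correlated errors it approaches the floor ρσ² as Q grows, which is consistent with the diminishing returns observed when averaging more and more networks.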
30 CNNs, each trained for 1 to 20 epochs.

Increasing the number of epochs does not contribute to reducing the error.
Increasing the number of averaged networks contributes to reducing the error.

Beyond a threshold, no further accuracy is gained.
The tradeoff between the number of models and their complexity.

Fix nnet × epoch to be 30.

Given enough machines, training time is reduced while accuracy is gained.
New Softmax Layer
• Use stacked probabilities from training to train a new softmax layer
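A minimal sketch of this stacking idea, under assumed toy data and shapes (not the poster's actual setup): concatenate each base network's class probabilities into one feature vector per example, then train a fresh softmax (multinomial logistic) layer on those stacked probabilities by gradient descent on the cross-entropy. The synthetic "base model outputs" here are random vectors given an artificial boost at the true class.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_models, n_classes = 300, 3, 4

# Synthetic stacked probabilities: each base model's block is boosted at the
# true label, then normalized so it sums to 1 like a real softmax output.
labels = rng.integers(0, n_classes, size=n)
stacked = rng.random((n, n_models * n_classes))
for m in range(n_models):
    block = slice(m * n_classes, (m + 1) * n_classes)
    stacked[np.arange(n), m * n_classes + labels] += 2.0   # informative signal
    stacked[:, block] /= stacked[:, block].sum(axis=1, keepdims=True)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Train the new softmax layer on the stacked probabilities.
W = np.zeros((n_models * n_classes, n_classes))
onehot = np.eye(n_classes)[labels]
for _ in range(500):
    p = softmax(stacked @ W)
    W -= 0.5 * stacked.T @ (p - onehot) / n   # cross-entropy gradient step

accuracy = (np.argmax(softmax(stacked @ W), axis=1) == labels).mean()
```

Unlike simple averaging, the learned layer can weight the base networks unequally, e.g. downweighting a network that is systematically wrong on some class.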
Bagging
• The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets
• Aggregate the predictors by voting
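The two bullets above can be sketched directly: draw bootstrap replicates of the learning set, fit one predictor per replicate, and aggregate by majority vote. A 1-nearest-neighbor rule stands in for the base learner, and the two-cluster data are toy assumptions; only the bootstrap-and-vote structure reflects the method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, size=(40, 2)),   # class 0 cluster
                    rng.normal(+2, 1, size=(40, 2))])  # class 1 cluster
y = np.array([0] * 40 + [1] * 40)

def fit_1nn(Xtr, ytr):
    """Return a 1-nearest-neighbor classifier trained on (Xtr, ytr)."""
    def predict(x):
        return ytr[np.argmin(np.sum((Xtr - x) ** 2, axis=1))]
    return predict

# Bootstrap: sample n points with replacement, once per ensemble member.
n, n_predictors = len(X), 15
predictors = []
for _ in range(n_predictors):
    idx = rng.integers(0, n, size=n)           # bootstrap replicate
    predictors.append(fit_1nn(X[idx], y[idx]))

def bagged_predict(x):
    votes = [p(x) for p in predictors]         # aggregate by voting
    return int(np.bincount(votes).argmax())
```

Voting on hard labels is the classification form of bagging; the prediction averaging used elsewhere in this poster is its natural analogue for probabilistic outputs.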
(Breiman, 1994)
Advantage of Ensemble Learning
• Gain accuracy
• Save time
Theoretical Work
Appendix I
Appendix II
Reducing the units in the fully connected layer from 30 to 1.

Increasing the number of epochs does contribute to reducing the testing error.
Ensembling the networks
Fix the total number of epochs trained, i.e., nnet × epoch = constant.
0.5014 0.575 0.4377 0.4502 0.4655 0.3711 0.3704 0.3366 0.336 0.3392
0.2167 0.2777 0.2074 0.2183 0.2402 0.1545 0.1801 0.1489 0.1328 0.15
0.2363 0.2245 0.2892 0.2169 0.183 0.1232 0.1318 0.1229 0.1425 0.1543
0.1141 0.1131 0.1813 0.1133 0.1145 0.0577 0.0601 0.0538 0.0561 0.0594
0.085 0.0952 0.1504 0.0769 0.0869 0.0468 0.0469 0.043 0.0432 0.0441
0.0765 0.0775 0.1281 0.0689 0.0867 0.043 0.0415 0.0381 0.039 0.0405
0.0649 0.0713 0.1259 0.0795 0.0864 0.0405 0.0383 0.036 0.036 0.0355
0.0689 0.0697 0.1237 0.0735 0.0989 0.0397 0.0375 0.035 0.0362 0.0356
0.0652 0.0645 0.0989 0.0596 0.0989 0.0381 0.0354 0.0328 0.0334 0.0339
0.0664 0.0576 0.0846 0.056 0.086 0.0366 0.0347 0.0314 0.034 0.0329
0.0667 0.0556 0.0824 0.0535 0.1025 0.0355 0.0326 0.0304 0.0306 0.0311
0.0642 0.0504 0.0781 0.0517 0.0895 0.0323 0.0326 0.0297 0.0297 0.0288
0.06 0.0443 0.0697 0.0425 0.0704 0.0305 0.0291 0.0266 0.0275 0.0266
0.0582 0.0448 0.0639 0.0425 0.0583 0.031 0.0293 0.0265 0.0271 0.0268
0.0522 0.0416 0.0612 0.0426 0.0501 0.0301 0.0296 0.0268 0.0275 0.0268
0.0526 0.0433 0.0559 0.0428 0.0464 0.031 0.0295 0.0266 0.028 0.0273
0.0526 0.0431 0.0556 0.0412 0.0453 0.0326 0.0305 0.0285 0.0295 0.0286
0.0522 0.0407 0.049 0.039 0.0428 0.03 0.0288 0.0266 0.0276 0.0277
0.0524 0.041 0.0485 0.0366 0.0439 0.0317 0.03 0.0276 0.0287 0.0297
0.0463 0.0392 0.0461 0.0337 0.0411 0.0296 0.0287 0.0259 0.0273 0.0273
0.0476 0.0414 0.0433 0.0337 0.0408 0.0312 0.0295 0.0273 0.0275 0.0283
0.0454 0.0388 0.0406 0.0342 0.0383 0.0298 0.0284 0.026 0.0272 0.0277
0.048 0.0393 0.0392 0.033 0.0386 0.0294 0.0292 0.0261 0.028 0.0276
0.0457 0.0399 0.037 0.0317 0.0377 0.0293 0.0279 0.0254 0.0268 0.0261
0.0459 0.0368 0.0366 0.0318 0.0361 0.028 0.0264 0.0257 0.0266 0.025
0.0459 0.0378 0.0375 0.0313 0.0367 0.0287 0.0267 0.026 0.0268 0.0261
0.0443 0.0373 0.038 0.0314 0.0357 0.0281 0.0274 0.0259 0.026 0.0256
0.0452 0.0371 0.036 0.0314 0.0357 0.0288 0.028 0.0264 0.0268 0.0259
0.0442 0.0376 0.0374 0.0321 0.0378 0.0287 0.0281 0.0257 0.0267 0.0266
0.0455 0.0372 0.0377 0.0323 0.0369 0.0287 0.0282 0.0256 0.026 0.0262