
Learning Ensembles of Convolutional Neural Networks

Liran Chen
Faculty mentor: Greg Shakhnarovich

Motivation and Data

Goal: build a model, or a set of models, that efficiently classifies images.

The Modified National Institute of Standards and Technology database (MNIST) is a large database of handwritten digits that is commonly used for training various image processing systems. The database contains 60,000 training images and 10,000 testing images.
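As a minimal sketch of loading this dataset (assuming PyTorch and torchvision, which the slides do not specify; the normalization constants and batch sizes are illustrative choices):

```python
import torch
from torchvision import datasets, transforms

# Normalize with commonly used MNIST mean/std values (illustrative).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

# 60,000 training images and 10,000 testing images of handwritten digits.
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000)
```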

Convolutional Neural Network

• Inspired by biological processes
• A type of feed-forward artificial neural network
• Individual neurons are tiled in such a way that they respond to overlapping regions in the visual field
• Widely used models for image recognition/classification
• Composed of convolutional layers, fully-connected layers, a softmax layer, etc. (see the sketch below)
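A minimal sketch of such an architecture (again assuming PyTorch; the layer widths and kernel sizes are illustrative, not the ones used in these experiments):

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Two convolutional layers followed by fully-connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 100), nn.ReLU(),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        # The softmax is applied at prediction time; training works on the
        # logits with a cross-entropy loss.
        return self.classifier(self.features(x))
```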

Training Procedure

Gradient descent ("hill climbing"):

• Batch
• Mini-batch
• Stochastic
• Online
• Epoch: one full pass over the training set

The training result depends on the stochastic procedure, since the model is non-convex. Randomness comes from (see the sketch after this list):

• The order of the data in each epoch
• The initialization of the parameters
• The learning rates
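A minimal mini-batch SGD training loop, as a sketch building on the SmallCNN and train_loader above (the learning rate and the train_one_network name are illustrative). The seed controls the parameter initialization and, through the loader's shuffling, the data order in each epoch, which is what makes independently trained networks differ:

```python
import torch
import torch.nn.functional as F

def train_one_network(seed, epochs=20, lr=0.01):
    # Different seeds give different initializations and data orders.
    torch.manual_seed(seed)
    model = SmallCNN()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in train_loader:  # one pass = one epoch
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)  # mini-batch loss
            loss.backward()    # gradient of the loss w.r.t. the parameters
            optimizer.step()   # one gradient-descent step
    return model
```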

Ensemble Learning

A method for independently generating multiple versions of a predictor network and using them to obtain an aggregated prediction.
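As a sketch, one common aggregation rule averages the members' predicted class probabilities (the five-network choice below mirrors Krizhevsky et al.; train_one_network is the illustrative helper defined above):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    # Average the members' softmax probabilities, then predict the class
    # with the highest averaged probability.
    probs = torch.stack([m(images).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)

# Five independently trained networks (different seeds).
models = [train_one_network(seed) for seed in range(5)]
```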

(Krizhevsky et al., 2012)

Five convolutional layers and three fully-connected layers.

Averaging the predictions of five similar CNNs gives an error rate of 16.4%, reduced from 18.2%.

Adding an extra sixth convolutional layer over the last pooling layer and then "fine-tuning" on ILSVRC-2012 gives an error rate of 16.6%. Averaging the predictions of seven such CNNs reduces the error to 15.4%.

(Zeiler & Fergus, 2013)

(Naftaly et al., 1996)

Autoregression model. Q is the number of models used in the ensemble; increasing Q contributes to reducing the error.

Ensemble Learning

30 sets of CNNs, each trained for 1 to 20 epochs.

Increasing the number of epochs does not contribute to reducing the error.


Increasing the number of averaged networks contributes to reducing the error.

After a threshold, no further accuracy can be gained.


The tradeoff between the number of models and their complexity: fix nnet*epoch to be 30 (as sketched below).

Provided enough machines to train the networks in parallel, training time is reduced while accuracy is gained.
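A hypothetical sweep illustrating this fixed budget, using the helpers sketched above:

```python
# Hypothetical sweep: every (nnet, epochs) pair spends the same total
# budget of nnet * epochs = 30 network-epochs.
budget = 30
configs = [(n, budget // n) for n in (1, 2, 3, 5, 6, 10, 15, 30)]

for nnet, epochs in configs:
    models = [train_one_network(seed, epochs=epochs) for seed in range(nnet)]
    # With nnet machines available, the members train in parallel, so the
    # wall-clock cost is roughly one network trained for `epochs` epochs,
    # while accuracy benefits from averaging via ensemble_predict.
```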


New Softmax Layer

• Stack the probabilities the member networks produce on the training set and use them to train a new softmax layer (see the sketch below)
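A sketch of this stacking step (assuming the trained models and loaders above; scikit-learn's multinomial logistic regression stands in for the new softmax layer, an illustrative choice rather than the authors' implementation):

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def stacked_probabilities(models, loader):
    """Concatenate each member's softmax outputs into one feature vector."""
    feats, labels = [], []
    for images, y in loader:
        p = torch.cat([m(images).softmax(dim=1) for m in models], dim=1)
        feats.append(p.numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

X_train, y_train = stacked_probabilities(models, train_loader)
# Multinomial logistic regression is a softmax layer on the stacked features.
meta = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```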

Bagging

• The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets
• The predictors are aggregated by voting

(Breiman, 1994)
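A minimal bagging sketch under the same assumptions as the code above; sampling with replacement via RandomSampler is one way to form a bootstrap replicate:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, RandomSampler

def train_bagged_network(seed, epochs=20, lr=0.01):
    torch.manual_seed(seed)
    model = SmallCNN()
    # Bootstrap replicate: draw len(train_set) samples with replacement.
    sampler = RandomSampler(train_set, replacement=True,
                            num_samples=len(train_set))
    loader = DataLoader(train_set, batch_size=64, sampler=sampler)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            F.cross_entropy(model(images), labels).backward()
            optimizer.step()
    return model

@torch.no_grad()
def vote(models, images):
    # Aggregate by majority vote over the members' predicted classes.
    preds = torch.stack([m(images).argmax(dim=1) for m in models])
    return preds.mode(dim=0).values
```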

Advantage of Ensemble Learning

• Gain accuracy
• Save time

Theoretical Work


Appendix I

Appendix II

Reducing the number of units in the fully-connected layer from 30 to 1.

Increasing the number of epochs does contribute to reducing the testing error.

Ensembling the networks

Fix the total number of epochs trained, i.e., nnet*epoch = constant.

[Table: test error rates, 30 rows of 10 numeric values each; the row and column labels were lost in extraction.]