Learning Ensembles of Convolutional Neural Networks
Liran Chen
Faculty mentor: Greg Shakhnarovich

Source: theorycenter.cs.uchicago.edu/REU/2014/presentations/chen.pdf


Page 1

Learning Ensembles of Convolutional Neural Networks

Liran Chen
Faculty mentor: Greg Shakhnarovich

Page 2

Motivation and Data

Build a model, or a set of models, to classify the images efficiently.

The Modified National Institute of Standards and Technology database (MNIST) is a large database of handwritten digits that is commonly used for training various image processing systems.

The database contains 60,000 training images and 10,000 testing images.

Page 3

Convolutional Neural Network

• Inspired by biological processes

• A type of feed-forward artificial neural network

• Individual neurons are tiled in such a way that they respond to overlapping regions in the visual field

• Widely used models for image recognition/classification

• Composed of convolutional layers, fully connected layers, a softmax layer, etc.
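The core operation of a convolutional layer can be sketched in plain Python. This is an illustrative minimal version (single channel, stride 1, no padding), not the implementation used in the experiments:

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over a 2D image ("valid" mode: no padding, stride 1)."""
    n, k = len(image), len(kernel)
    out_size = n - k + 1
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            # Each output cell is the dot product of the kernel with one
            # overlapping region of the visual field.
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k)
            )
    return out

# A 3x3 "image" convolved with a 2x2 averaging kernel gives a 2x2 feature map.
feature_map = conv2d_valid(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[0.25, 0.25], [0.25, 0.25]],
)
```

In a real CNN many such kernels are learned per layer, and their outputs pass through a nonlinearity before the next layer.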

Page 4

Page 5

Training Procedure

Gradient Descent ("hill climbing" in reverse: descending the error surface)

• Batch

• Mini-batch

• Stochastic

• Online
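The variants above differ only in how many examples feed each parameter update: the full set (batch), a small subset (mini-batch), or a single example (stochastic/online). A minimal sketch on a toy least-squares problem; the data and function names here are hypothetical, not from the slides:

```python
import random

def gd_step(w, batch, lr=0.1):
    """One gradient step for least squares y ~ w*x on a batch of (x, y) pairs."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

data = [(x, 3.0 * x) for x in [0.2, 0.5, 1.0, 1.5]]  # true slope is 3.0

random.seed(0)
w = 0.0
for _ in range(200):
    # Batch GD would pass `data` itself; stochastic GD a single example.
    w = gd_step(w, random.sample(data, 2))  # mini-batch of size 2
```

With noise-free data every batch agrees on the minimizer, so all variants converge to the same slope; on real data smaller batches trade gradient accuracy for cheaper, more frequent updates.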

Page 6

Training Procedure (continued)

• Epoch: one full pass through the training data

• The training result depends on the stochastic procedure

• The model is non-convex

• Randomness comes from:

  the order of the data in each epoch

  the initialization of the parameters

  the learning rates
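These sources of randomness are exactly what make independently trained networks differ, which ensembling later exploits. A small pure-Python sketch (hypothetical shapes) of how a seed controls initialization and data order:

```python
import random

def init_and_order(seed, n_params=3, n_examples=5):
    """Two of the main sources of run-to-run randomness: init and data order."""
    rng = random.Random(seed)
    # Random initialization of the parameters.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_params)]
    # Random order of the data in each epoch.
    order = list(range(n_examples))
    rng.shuffle(order)
    return weights, order

# Same seed -> identical run; different seeds -> distinct ensemble members.
run_a = init_and_order(seed=0)
run_b = init_and_order(seed=0)
run_c = init_and_order(seed=1)
```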

Page 7

Ensemble Learning

A method for independently generating multiple versions of a predictor network and using them to obtain an aggregated prediction.
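For classification, the simplest aggregation is to average the class probabilities produced by each network and predict the highest-scoring class. A minimal sketch with made-up probabilities:

```python
def average_predictions(prob_lists):
    """Aggregate per-class probabilities from several networks by averaging."""
    n = len(prob_lists)
    return [sum(p[i] for p in prob_lists) / n
            for i in range(len(prob_lists[0]))]

# Three hypothetical networks' class probabilities for one image:
nets = [[0.7, 0.2, 0.1],
        [0.5, 0.4, 0.1],
        [0.6, 0.1, 0.3]]
avg = average_predictions(nets)
label = max(range(len(avg)), key=avg.__getitem__)  # predicted class index
```

Because each network's errors are partly independent (different seeds, data order, initialization), the averaged prediction tends to be more accurate than any single member.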

Page 8

(Krizhevsky et al., 2012)

The network has five convolutional layers and three fully connected layers.

Averaging the predictions of five similar CNNs reduces the error rate from 18.2% to 16.4%.

Adding an extra sixth convolutional layer over the last pooling layer and then "fine-tuning" on ILSVRC-2012 gives an error rate of 16.6%; averaging the predictions of seven such CNNs reduces the error to 15.4%.

Page 9

(Zeiler & Fergus, 2013)

Page 10

(Naftaly et al., 1996)

Autoregression model. Q is the number of models used in the ensemble; increasing Q reduces the error.

Page 11

Ensemble Learning

30 CNNs, each trained for 1 to 20 epochs.

Increasing the number of epochs does not reduce the error.

Page 12

Ensemble Learning

Increasing the number of averaged networks reduces the error.

Beyond a threshold, no further accuracy is gained.

Page 13

Ensemble Learning

The tradeoff between the number of models and their complexity.

Fix nnet*epoch = 30.

Given enough machines, training time is reduced while accuracy is gained.

Page 14

Ensemble Learning

New Softmax Layer

• Stack the class probabilities produced on the training set and use them to train a new softmax layer
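Instead of simple averaging, the member networks' probabilities can be concatenated ("stacked") into one feature vector feeding a new softmax layer. A sketch with hypothetical, untrained weights; in practice the weight matrix would be learned on the training set:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def stacked_features(per_net_probs):
    """Concatenate each network's class probabilities into one feature vector."""
    return [p for net in per_net_probs for p in net]

# Two hypothetical 3-class networks -> a 6-dimensional stacked input.
feats = stacked_features([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])

# A hypothetical weight matrix for the new softmax layer (3 classes x 6 inputs).
weights = [[0.5, 0.1, 0.0, 0.4, 0.2, 0.0],
           [0.0, 0.6, 0.1, 0.1, 0.5, 0.0],
           [0.0, 0.0, 0.7, 0.0, 0.0, 0.8]]
logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
probs = softmax(logits)
```

Unlike plain averaging, the trained layer can learn to weight reliable networks (or reliable network/class pairs) more heavily.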

Page 15

Bagging

• The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets

• Aggregate the predictors by voting

(Breiman, 1994)
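The two ingredients of bagging (bootstrap replicates, then voting) are easy to sketch; the labels here are hypothetical placeholders:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Sample len(data) examples with replacement: one bootstrap replicate."""
    return [rng.choice(data) for _ in data]

def vote(predictions):
    """Aggregate one label per predictor by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
learning_set = list(range(10))
# Three "new learning sets"; each would train its own predictor.
replicates = [bootstrap(learning_set, rng) for _ in range(3)]

final_label = vote(["cat", "dog", "cat"])  # the ensemble's prediction
```

Each replicate omits some examples and duplicates others, so the resulting predictors differ even with a deterministic training procedure.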

Page 16

Advantage of Ensemble Learning

• Gain accuracy

• Save time

Page 17

Theoretical Work

Page 18

Theoretical Work

Page 19

Appendix I

Page 20

Page 21

Page 22

Page 23

Appendix II

Reducing the number of units in the fully connected layer from 30 to 1.

With this smaller network, increasing the number of epochs does contribute to reducing the testing error.

Page 24

Ensemble the networks

Fix the total number of epochs trained, i.e., nnet*epoch = constant.

Page 25

Raw results table (30 rows of 10 values each; row boundaries restored where values had been run together):

0.5014 0.575 0.4377 0.4502 0.4655 0.3711 0.3704 0.3366 0.336 0.3392
0.2167 0.2777 0.2074 0.2183 0.2402 0.1545 0.1801 0.1489 0.1328 0.15
0.2363 0.2245 0.2892 0.2169 0.183 0.1232 0.1318 0.1229 0.1425 0.1543
0.1141 0.1131 0.1813 0.1133 0.1145 0.0577 0.0601 0.0538 0.0561 0.0594
0.085 0.0952 0.1504 0.0769 0.0869 0.0468 0.0469 0.043 0.0432 0.0441
0.0765 0.0775 0.1281 0.0689 0.0867 0.043 0.0415 0.0381 0.039 0.0405
0.0649 0.0713 0.1259 0.0795 0.0864 0.0405 0.0383 0.036 0.036 0.0355
0.0689 0.0697 0.1237 0.0735 0.0989 0.0397 0.0375 0.035 0.0362 0.0356
0.0652 0.0645 0.0989 0.0596 0.0989 0.0381 0.0354 0.0328 0.0334 0.0339
0.0664 0.0576 0.0846 0.056 0.086 0.0366 0.0347 0.0314 0.034 0.0329
0.0667 0.0556 0.0824 0.0535 0.1025 0.0355 0.0326 0.0304 0.0306 0.0311
0.0642 0.0504 0.0781 0.0517 0.0895 0.0323 0.0326 0.0297 0.0297 0.0288
0.06 0.0443 0.0697 0.0425 0.0704 0.0305 0.0291 0.0266 0.0275 0.0266
0.0582 0.0448 0.0639 0.0425 0.0583 0.031 0.0293 0.0265 0.0271 0.0268
0.0522 0.0416 0.0612 0.0426 0.0501 0.0301 0.0296 0.0268 0.0275 0.0268
0.0526 0.0433 0.0559 0.0428 0.0464 0.031 0.0295 0.0266 0.028 0.0273
0.0526 0.0431 0.0556 0.0412 0.0453 0.0326 0.0305 0.0285 0.0295 0.0286
0.0522 0.0407 0.049 0.039 0.0428 0.03 0.0288 0.0266 0.0276 0.0277
0.0524 0.041 0.0485 0.0366 0.0439 0.0317 0.03 0.0276 0.0287 0.0297
0.0463 0.0392 0.0461 0.0337 0.0411 0.0296 0.0287 0.0259 0.0273 0.0273
0.0476 0.0414 0.0433 0.0337 0.0408 0.0312 0.0295 0.0273 0.0275 0.0283
0.0454 0.0388 0.0406 0.0342 0.0383 0.0298 0.0284 0.026 0.0272 0.0277
0.048 0.0393 0.0392 0.033 0.0386 0.0294 0.0292 0.0261 0.028 0.0276
0.0457 0.0399 0.037 0.0317 0.0377 0.0293 0.0279 0.0254 0.0268 0.0261
0.0459 0.0368 0.0366 0.0318 0.0361 0.028 0.0264 0.0257 0.0266 0.025
0.0459 0.0378 0.0375 0.0313 0.0367 0.0287 0.0267 0.026 0.0268 0.0261
0.0443 0.0373 0.038 0.0314 0.0357 0.0281 0.0274 0.0259 0.026 0.0256
0.0452 0.0371 0.036 0.0314 0.0357 0.0288 0.028 0.0264 0.0268 0.0259
0.0442 0.0376 0.0374 0.0321 0.0378 0.0287 0.0281 0.0257 0.0267 0.0266
0.0455 0.0372 0.0377 0.0323 0.0369 0.0287 0.0282 0.0256 0.026 0.0262