34
Miranda Tilton, M.S. Miranda Tilton, M.S. APPLICATIONS OF A CNN APPLICATIONS OF A CNN FOR AUTOMATIC FOR AUTOMATIC CLASSIFICATION OF CLASSIFICATION OF OUTSOLE FEATURES OUTSOLE FEATURES Feb 20, 2020 Feb 20, 2020

APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Miranda Tilton, M.S.Miranda Tilton, M.S.

APPLICATIONS OF A CNNAPPLICATIONS OF A CNNFOR AUTOMATICFOR AUTOMATIC

CLASSIFICATION OFCLASSIFICATION OF

OUTSOLE FEATURESOUTSOLE FEATURES

Feb 20, 2020Feb 20, 2020

Page 2: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

2 / 342 / 34

Page 3: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

What is the probability ofWhat is the probability ofa coincidental match?a coincidental match?

3 / 343 / 34

Page 4: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Given a particular shoe outsole...

1. De�ne the comparison population

2. Sample shoes from the comparison population

3. Count the number of similar shoes from the comparisonpopulation that are similar to the given shoe

4. Estimate the probability of a coincidental match:

What is the probability ofWhat is the probability ofa coincidental match?a coincidental match?

4 / 344 / 34

Page 5: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

No 100% complete database of all shoes

manufacturer, model, size, tread style, manufacturing molds

Shoe purchases vs. frequency of wear

Local populations may differ wildly (Benedict, et al., 2014)

Obstacles: CharacterizingObstacles: CharacterizingComparison PopulationsComparison Populations

5 / 345 / 34

Page 6: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

How to collect data from the comparison population?

1. Build a low pro�le scanner, place in a high traf�c area

2. Scan shoes of those walking past

3. Create a local-area database of relevant scans

This is an engineering problem

Comparison PopulationComparison Population

Goal:

6 / 346 / 34

Page 7: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Assume a machine exists that can scan shoe outsoles of pedestrians

1. Identify relevant features within the scans

Comparison PopulationComparison Population7 / 347 / 34

Page 8: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Assume a machine exists that can scan shoe outsoles of pedestrians

1. Identify relevant features within the scans

2. De�ne similarity for shoe images

Comparison PopulationComparison Population8 / 348 / 34

Page 9: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Assume a machine exists that can scan shoe outsoles of pedestrians

1. Identify relevant features within the scans

2. De�ne similarity for shoe images

3. Assess the frequency of similar shoes in the sampled data

Comparison PopulationComparison Population9 / 349 / 34

Page 10: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Use features other than make/model and size to characterize shoes

Knockoffs often have very similar tread patternsSimilar styles have similar tread patterns across brandsUnknown shoes can still be classi�ed and assessed

Dr. Martens Eastland Timberland

Work 2295 Rigger 1955 Edition Jett 6" Premium Boot

Relevant FeaturesRelevant Features10 / 3410 / 34

Page 11: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Bowtie Chevron Circle

Line Polygon Quadrilateral

Star Text Triangle

Used to separate shoes by make/model in (small) local samples(Gross, et al., 2013)

Relevant FeaturesRelevant Features11 / 3411 / 34

Page 12: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

IMAGE ANALYSIS ANDIMAGE ANALYSIS AND

FEATURE DETECTIONFEATURE DETECTION

12 / 3412 / 34

Page 13: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Goal: Identify geometric tread features inGoal: Identify geometric tread features inimages of shoe outsolesimages of shoe outsoles

Robust to different lighting conditions, rotation, image quality

Fast processing of new images

Identify features that are explainable to practitioners

Image AnalysisImage Analysis13 / 3413 / 34

Page 14: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Our eyes detect features in an image, and our brains learn toconnect features with labels after seeing many examples.

Analogy: Human Classi�cationAnalogy: Human Classi�cation14 / 3414 / 34

Page 15: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Analogy: Human Classi�cationAnalogy: Human Classi�cation15 / 3415 / 34

Page 16: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

CONVOLUTIONAL NEURALCONVOLUTIONAL NEURAL

NETWORKSNETWORKS

16 / 3416 / 34

Page 17: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

CNN ArchitectureCNN Architecture17 / 3417 / 34

Page 18: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Very deep CNNs can require > 1 million images to optimizeperformance

Using a model base trained on different input data can greatlyreduce the required number of images

Our approach

Use pretrained convolutional base: VGG16Train a new model head

VGG16

Pre-trained CNN (Simonyan, et al., 2014)

Trained on 1.3 million images from ImageNet(Krizhevsky, et al., 2012)

Simple structure

Transfer LearningTransfer Learning18 / 3418 / 34

Page 19: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

FITTING CONNOR:FITTING CONNOR:CONVOLUTIONAL NEURALCONVOLUTIONAL NEURAL

NETWORK FOR OUTSOLENETWORK FOR OUTSOLE

RECOGNITIONRECOGNITION

19 / 3419 / 34

Page 20: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Acquire DataAcquire Data

ShoeScrapeRpackage

over 80,000images scrapedsince April 2018

20 / 3420 / 34

Page 21: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

LabelMe Annotation Tool used as a web interface - creates XML�les with labels and coordinates. (Russell, et al., 2008)

27,710 regions labeled with one or more geometric objects

37,562 labels

Label DataLabel Data

Labeling courtesy of

Jenny KimBen WonderlinMya FisherHolden JudMiranda TiltonCharlotte RoigerSusanVanderPlasJoe Zemmelsand others

21 / 3421 / 34

Page 22: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Label DataLabel Data22 / 3422 / 34

Page 23: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

256 x 256 pixel images

Training data (60%):

1x Augmented images (rotation, skew, zoom, crop) toprevent over�ttingClass weights used to counteract uneven class sizes

Validation and test data (20% each)

Fit using the keras package in R, which provides a high-levelAPI for the tensorflow library

Model TrainingModel Training

23 / 3423 / 34

Page 24: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Model TrainingModel Training

Binary Cross-entropyLoss:

24 / 3424 / 34

Page 25: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Evaluating the ModelEvaluating the Model25 / 3425 / 34

Page 26: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Evaluating the ModelEvaluating the Model26 / 3426 / 34

Page 27: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Evaluating the ModelEvaluating the Model

For multi-labelimages, onlyincorrectpredictionscontribute tooff-diagonalprobabilities

used asthe cutoff

27 / 3427 / 34

Page 28: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Heatmaps are scaled by class. Yellow = high activation

Interpreting the modelInterpreting the modelClass Activation MapsClass Activation Maps

Blue: Predictionmatches imagelabel

Grey: Predictiondoes not matchimage label

28 / 3428 / 34

Page 29: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Heatmaps are scaled by class. Yellow = high activation

Interpreting the modelInterpreting the modelClass Activation MapsClass Activation Maps

Blue: Predictionmatches imagelabel

Grey: Predictiondoes not matchimage label

29 / 3429 / 34

Page 30: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Heatmaps are scaled by class. Yellow = high activation

Interpreting the modelInterpreting the modelClass Activation MapsClass Activation Maps

Blue: Predictionmatches imagelabel

Grey: Predictiondoes not matchimage label

30 / 3430 / 34

Page 31: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

Geometric shapes provide a convenient feature space forassessing shoe similarity

Transfer learning allows application of CNNs to much smallerdatasets

CoNNOR performs well

Reduction in feature space: 256 x 256 x 3 -> 9

88% accuracy; many errors attributable to data labeling

Project SummaryProject Summary31 / 3431 / 34

Page 32: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

[1] I. Benedict, E. Corke, R. Morgan-Smith, et al. “Geographical variation of shoeprint comparisonclass correspondences”. In: Science and Justice 54.5 (2014), pp. 335–337.

[2] S. Gross, D. Jeppesen, and C. Neumann. “The variability and signi�cance of class characteristicsin footwear impressions”. In: Journal of Forensic Identi�cation 63.3 (2013), p. 332.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet Classi�cation with Deep ConvolutionalNeural Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira, C. J.C. Burges, L. Bottou and K. Q. Weinberger. Curran Associates, Inc., 2012, pp. 1097–1105.

[4] B. C. Russell, A. Torralba, K. P. Murphy, et al. “LabelMe: A Database and Web-Based Tool forImage Annotation”. En. In: International Journal of Computer Vision 77.1-3 (May. 2008). 02464, pp.157–173. ISSN: 0920-5691, 1573-1405. DOI: 10.1007/s11263-007-0090-8. URL:http://link.springer.com/10.1007/s11263-007-0090-8.

[5] K. Simonyan and A. Zisserman. “Very Deep Convolutional Networks for Large-Scale ImageRecognition”. En. In: arxiv.org (Sep. 2014). URL: https://arxiv.org/abs/1409.1556.

ReferencesReferences32 / 3432 / 34

Page 33: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

R Packages and Toolkits:

Modeling: keras, tensorflowData Wrangling: magrittr, dplyr, lubridate, stringr,tidyr, purrr, furrrImage Processing: jpeg, imager, magickAnnotation Manipulation: sf, spVisualization: ggplot2, viridis, ggcorrplot, deepviz,tidygraph, ggraph, shinyXML/Web Scraping: xml2, XML, rvest, RSeleniumSlides/Documentation: rmarkdown, xaringan, knitr

Other Software: Docker, Selenium, LabelMe Annotation Tool (w/Matlab toolbox), gimp image editor

ToolsTools33 / 3433 / 34

Page 34: APPLICATIONS OF A CNN A C O F - Center for Statistics and ... · R Packages and Toolkits: Modeling: keras, tensorflow Data Wrangling: magrittr, dplyr, lubridate, stringr, tidyr, purrr,

THANK YOUTHANK YOU

34 / 3434 / 34