Deep Learning for Food Analysis
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona &
Medical Imaging Laboratory, Computer Vision Center
Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
AMiTANS16, Albena, 26 June 2016
Metabolic diseases and health
In Europe, 4.2 million people die of chronic diseases (such as diabetes or cancer) linked to lack of physical activity and unhealthy diet. Physical activity can increase lifespan by 1.5-3.7 years. Obesity is a chronic disease associated with huge economic, social and personal costs; it is a risk factor for cancers, cardiovascular and metabolic disorders, and a leading cause of premature mortality worldwide.
Health and medical care
Today, 88% of U.S. healthcare dollars are spent on medical care: access to physicians, hospitals, procedures, drugs, etc.
However, medical care only accounts for approximately 10% of a person's health.
Approximately half the decline in U.S. deaths from coronary heart disease from 1980 through 2000 may be attributable to reductions in major risk factors (systolic blood pressure, smoking, physical inactivity).
Health and medical care
Recent data show evidence of stagnation that may be explained by increases in obesity and diabetes prevalence. Healthcare resources and dollars must now be dedicated to improving lifestyle and behavior.
Why food analysis?
Today, measuring physical activity is not a problem.
But what about food and nutrition? Nutritional health apps are based on food diaries.
Two main questions
What do we eat?
Automatic food recognition vs. food diaries
And how do we eat?
Automatic eating-pattern extraction: when, where, how, how long, with whom, in which context?
Lifelogging
Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
Why Learn?
Machine learning consists of:
Developing models, methods and algorithms to make computers learn, i.e. take decisions.
Training from large amounts of example data.
Learning is used when:
Humans are unable to explain their expertise (speech recognition).
Human expertise does not exist (navigating on Mars).
The solution changes over time (routing in a computer network).
The solution needs to be adapted to particular cases (user biometrics).
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail, from customer transactions to consumer behavior: people who bought "The Da Vinci Code" also bought "The Five People You Meet in Heaven" (www.amazon.com).
The goal is to build a model that is a good and useful approximation to the data.
Growth of Machine Learning
This trend is accelerating due to:
Big data and data science being a reality today.
Improved data capture, networking, faster computers.
New sensors / IO devices / Internet of Things.
Software too complex to write by hand.
Demand for self-customization to the user.
The difficulty of extracting knowledge from human experts (the failure of expert systems in the 1980s).
Improved machine learning algorithms.
Face recognition: pose, lighting, occlusion (glasses, beard), make-up, hair style.
Character recognition: different handwriting styles.
Speech recognition: temporal dependency; use of a dictionary or the syntax of the language.
Sensor fusion: combining multiple modalities, e.g. visual (lip image) and acoustic, for speech.
Medical diagnosis: from symptoms to illnesses.
Web advertising: predicting whether a user will click on an ad on the Internet.
Deep learning everywhere
Deep learning applications
Other methods also use unsupervised pre-training to structure a neural network, making it first learn generally useful feature detectors. The network is then trained further by supervised back-propagation to classify labeled data. The deep model of Hinton et al. (2006) involves learning the distribution of a high-level representation using successive layers of binary or real-valued latent variables. It uses a restricted Boltzmann machine to model each new layer of higher-level features. Each new layer guarantees an increase in the lower bound of the log-likelihood of the data, thus improving the model, if trained properly. Once sufficiently many layers have been learned, the deep architecture may be used as a generative model by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.[8] Hinton reports that his models are effective feature extractors over high-dimensional, structured data.[9]
Natural Language Processing: used heavily for language conversion in chat rooms and for processing text from human speech.
Optical Character Recognition: scanning of images; it is gaining traction lately for reading an image, extracting text from it and correlating it with the objects found in the image.
Speech recognition: applications like Siri or Cortana need no introduction.
Artificial intelligence in robots, automating at least a small fraction of the tasks a human can do; we want them to be a little smarter.
Drug discovery through medical-imaging-based diagnosis using deep learning; it is still in its early stages (see Butterfly Network for the work they are doing).
CRM: companies' needs are growing day by day; hundreds of thousands of companies around the globe, from small to big, want to know their potential customers, and deep learning has provided some outstanding results (see companies like RelateIQ for the success of machine learning in this area).
Formalization of learning
Consider:
training examples D = {z_1, z_2, ..., z_n}, with the z_i being examples sampled from an unknown process P(Z);
a model f and a loss functional L(f, Z) that returns a real-valued scalar.
Goal: minimize the expected value of L(f, Z) under the unknown generating process P(Z).
Supervised learning: each example is an (input, target) pair: Z = (X, Y).
Classification: Y is a finite integer (e.g. a symbol) corresponding to a class index, and we often take as loss function the negative conditional log-likelihood, with the interpretation that f_i(X) estimates P(Y=i|X):
L(f, (X,Y)) = -log f_Y(X), where f_i(X) >= 0 and Σ_i f_i(X) = 1.
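As a minimal numerical sketch of this loss (toy probabilities, not from the slides):

```python
import numpy as np

def nll_loss(probs, y):
    """Negative conditional log-likelihood: L(f,(X,Y)) = -log f_Y(X)."""
    probs = np.asarray(probs, dtype=float)
    # The interpretation f_i(X) = P(Y=i|X) requires a valid distribution:
    assert np.all(probs >= 0) and np.isclose(probs.sum(), 1.0)
    return float(-np.log(probs[y]))

probs = [0.7, 0.2, 0.1]             # hypothetical 3-class output f(X)
loss_likely = nll_loss(probs, 0)    # true class got probability 0.7 -> small loss
loss_unlikely = nll_loss(probs, 2)  # true class got probability 0.1 -> large loss
print(loss_likely, loss_unlikely)
```

The loss is small when the model assigns high probability to the true class and grows without bound as that probability approaches zero.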
Classification/Recognition
Is this an urban or rural area?
Input: x. Output: y ∈ {-1, +1} (binary classification).
Which city is this?
Output: y ∈ {1, 2, ..., C} (multi-class classification).
From: M. Pawan Kumar
Object Detection and Segmentation
Where is the object in the image?
Output: y ∈ {Pixels}
What is the semantic class of each pixel?
Output: y ∈ {1, 2, ..., C}^|Pixels|
(e.g. car, road, grass, tree, sky)
A Simplified View of the Pipeline
Input: x
Extract features: Φ(x)
Compute scores: f(Φ(x), y)
Prediction: y(f) = argmax_y f(Φ(x), y)
Learn f.
Learning Objective
Data distribution: P(x,y) (unknown)
Prediction: f* = argmin_f E_{P(x,y)} Error(y(f), y)
where y is the ground truth, Error measures prediction quality (error, loss), and the expectation is over the data distribution.
Learning Objective
Training data: {(x_i, y_i), i = 1, 2, ..., n}
Prediction: f* = argmin_f E_{P(x,y)} Error(y(f), y)
The expectation is still over the unknown data distribution, but only the training samples are available.
Learning Objective
Training data: {(x_i, y_i), i = 1, 2, ..., n}
Prediction: f* = argmin_f Σ_i Error(y_i(f), y_i)
The expectation over the data distribution is replaced by one over the empirical distribution (finite samples).
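The empirical minimization above can be sketched with a toy example (assumed 1-D data and a hypothetical family of threshold classifiers, not from the slides):

```python
import numpy as np

def zero_one_error(y_pred, y_true):
    """Empirical error: average of the 0/1 loss over the training samples."""
    return float(np.mean(y_pred != y_true))

# Toy 1-D training set: points below 2 are class -1, points above are +1.
x = np.array([0.5, 1.0, 1.5, 2.5, 3.0, 3.5])
y = np.array([-1, -1, -1, 1, 1, 1])

# Candidate models: threshold classifiers f_t(x) = sign(x - t).
thresholds = np.linspace(0.0, 4.0, 41)
errors = [zero_one_error(np.where(x > t, 1, -1), y) for t in thresholds]
t_star = float(thresholds[int(np.argmin(errors))])  # f* = empirical minimizer
print(t_star, min(errors))
```

Any threshold between the two classes attains zero empirical error; argmin simply returns the first one found.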
The problem of image classification
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Dual representation of images as points/vectors
Each image of M rows by N columns by C channels (C = 3 for color images) can be considered as a vector/point in R^(M×N×C), and vice versa; e.g. a 32×32×3 image is a 3072-D vector.
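A quick sketch of this dual view, using a blank 32×32×3 toy image:

```python
import numpy as np

# A 32x32 RGB image viewed as a point in R^(32*32*3) = R^3072, and back.
image = np.zeros((32, 32, 3), dtype=np.float32)  # M x N x C array
vector = image.reshape(-1)                       # the same image as a 3072-D vector
back = vector.reshape(32, 32, 3)                 # and vice versa
print(vector.shape, back.shape)
```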
Linear classifier and key classification components
Given two classes, how do we learn a hyperplane that separates them?
To find the hyperplane we need to specify:
a score function,
a loss function,
an optimization method.
Interpreting a linear classifier
Each input image is a 32×32×3 = 3072-D vector.
General learning pipeline
Training consists of constructing the prediction model f according to a training set.
The problem of image classification
Parametric approach: linear classifier
Score function: f(x, W) = W·x + b, mapping the 3072-D image vector x to one score per class.
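A sketch of such a score function with hypothetical, randomly initialized parameters; the sizes follow the slides' 10-class, 3072-D example:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim = 10, 32 * 32 * 3
W = rng.normal(scale=0.01, size=(num_classes, dim))  # 10 x 3072 weight matrix
b = np.zeros(num_classes)                            # one bias per class
x = rng.random(dim)                                  # a flattened toy image

scores = W @ x + b             # f(x, W) = W.x + b: 10 class scores
pred = int(np.argmax(scores))  # predicted class = highest-scoring row of W
print(scores.shape, pred)
```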
Loss function/optimization
The score function assigns a score to each class; a loss function is then needed to quantify how good those scores are.
Image classification
Loss function and optimisation
Question: if you were to assign a single number to how unhappy you are with these scores, what would you do?
Question: given the score and the loss function, how do we find the parameters W?
Interpreting a linear classifier
Here W is a 10×3072 matrix: one row of weights per class.
Why is a CNN doing deep learning?
Each output unit computes a weighted sum of its inputs:
f_i = Σ_j w_ij x_j,
with inputs x_1, ..., x_n, outputs f_1, ..., f_m and weights w_ij.
Activation functions of NN
Setting the number of layers and their size
Neurons are arranged into fully-connected layers.
Bigger = better (but you might have to regularize more strongly).
How many parameters are there to learn?
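As a rough answer, the parameters of fully-connected layers can be counted as weights plus biases (the layer sizes below are assumed for illustration):

```python
# Counting learnable parameters in fully-connected layers: a layer with
# n_in inputs and n_out neurons has n_in * n_out weights plus n_out biases.
def fc_param_count(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# A 3072-D image input, two hidden layers of 100 neurons, 10 output classes:
print(fc_param_count([3072, 100, 100, 10]))  # 318410
```

Even this small network already has hundreds of thousands of parameters, which is why fully connecting raw images quickly becomes expensive.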
Why is a CNN a neural network?
Architecture of neural networks
Modern CNNs: ~10 million neurons.
Human visual cortex: ~5 billion neurons.
Activation functions of NN
Exponential linear units (ELU): all the benefits of ReLU, does not die, closer-to-zero mean outputs, but the computation requires exp().
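The standard definitions of ReLU and ELU can be sketched directly:

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x); negative inputs are clipped to 0 (the "dying" region).
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # ELU: identity for x > 0, alpha*(exp(x)-1) for x <= 0; needs exp(),
    # but saturates smoothly toward -alpha, pushing mean outputs closer to 0.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(elu(x))
```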
What is a Convolutional Neural Network?
Convolutional and max-pooling layers
A convolutional layer slides learned filters over its input; a max-pool layer downsamples by keeping the maximum of each local window.
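A toy sketch of these two operations in plain numpy (a hypothetical 1×2 edge filter on a synthetic image; real CNN layers also have multiple channels and many learned filters):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNNs)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Downsample by taking the maximum of each size x size window."""
    H, W = x.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # synthetic 6x6 "image"
edge = np.array([[1.0, -1.0]])                    # hypothetical 1x2 edge filter
feat = conv2d(image, edge)                        # 6x5 feature map
pooled = max_pool(feat[:, :4])                    # 3x2 after 2x2 pooling
print(feat.shape, pooled.shape)
```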
How does the CNN work?
Example architecture
The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say "truck".
Training a CNN
Training a CNN consists of learning all of its parameters: the convolutional filters and the weights of the fully connected layers.
Several millions of parameters!
Learned convolutional filters
Neural network training
Using the chain rule, the parameters W of the neural network are optimized by gradient descent and backpropagation. Optimization means training several millions of parameters!
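A minimal sketch of gradient descent with a chain-rule gradient, on a toy one-layer linear model rather than a real CNN (backpropagation applies the same chain rule layer by layer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 toy samples, 3 features
true_W = np.array([1.0, -2.0, 0.5])
y = X @ true_W                 # noiseless targets

W = np.zeros(3)
lr = 0.1
for _ in range(200):
    pred = X @ W
    loss = np.mean((pred - y) ** 2)        # squared loss
    grad = 2 * X.T @ (pred - y) / len(y)   # chain rule: dL/dW
    W -= lr * grad                         # one gradient-descent step

print(np.round(W, 3))  # ≈ [1., -2., 0.5]
```

After 200 steps the learned W recovers the true parameters; in a deep network, the same update is applied to millions of parameters at once.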
Monitoring loss and accuracy
Loss looks linear? The learning rate is too low.
Loss decreases too slowly? The learning rate is too high.
Loss looks too noisy? Increase the batch size.
Big gap between training and validation accuracy? You're overfitting: increase regularization!
Transfer learning
ImageNet
1001 benefits of CNNs
Transfer learning: fine-tuning for object recognition
Replace and retrain the classifier on top of the ConvNet
Fine-tune the weights of the pre-trained network by continuing the backpropagation
Feature extraction by CNN
Object detection
Object segmentation
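A toy sketch of the "replace and retrain the classifier" idea; the frozen "pre-trained" layers here are a stand-in random projection, not a real ConvNet:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" frozen layers: a fixed random projection + ReLU (never updated),
# standing in for the convolutional feature extractor.
frozen_W = rng.normal(size=(64, 12)) / np.sqrt(12)

def extract_features(x):
    feats = np.maximum(0.0, x @ frozen_W.T)          # frozen forward pass
    return np.hstack([feats, np.ones((len(x), 1))])  # append a bias feature

X = rng.normal(size=(200, 12))
F = extract_features(X)                              # 200 x 65 feature matrix
y = (F[:, 0] > np.median(F[:, 0])).astype(float)     # toy labels, learnable from F

# Replace and retrain only the classifier on top (logistic regression head):
w = np.zeros(F.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w)))
    w -= 0.5 * F.T @ (p - y) / len(y)  # only the new head's weights change

acc = float(np.mean((F @ w > 0) == (y == 1)))
print(acc)
```

Fine-tuning would additionally continue backpropagation into `frozen_W`; here the pre-trained features stay untouched.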
Image similarity and matching by CNN
Convolutional neural networks (4096 features).
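A sketch of similarity matching on such feature vectors (random 4096-D stand-ins for real CNN activations):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors: 1 = identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=4096)                # features of the query image
near = query + 0.1 * rng.normal(size=4096)   # features of a visually similar image
far = rng.normal(size=4096)                  # features of an unrelated image

print(cosine_similarity(query, near) > cosine_similarity(query, far))  # True
```

Ranking a database by this similarity retrieves the images whose CNN features lie closest to the query's.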
ConvNets are everywhere
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
Automatic food analysis
Can we automatically recognize food? The goal is to detect every instance of a dish, in all of its variants, shapes and positions, in a large number of images.
The main problems that arise are:
Complexity and variability of the data.
Huge amounts of data to analyse.
Automatic Food Analysis
Food detection
Food recognition
Food environment recognition
Eating pattern extraction
Food datasets
Food256: 25,600 images (100 images/class); 256 classes.
Food101: 101,000 images (1,000 images/class); 101 classes.
Food101+FoodCAT: 146,392 images (101,000 + 45,392); 131 classes.
EgocentricFood: 5,038 images; 9 classes.
Food localization and recognition
General scheme of our food localization and recognition proposal.
Food localization
Examples of localization and recognition on UECFood256 (top) and EgocentricFood (bottom). Ground truth is shown in green and our method in blue.
Image input → foodness map extraction (food detection CNN) → food recognition CNN → food type (e.g. apple, strawberry).
Food recognition results: Top-1: 74.7%, Top-5: 91.6%. State of the art (Bossard, 2014): Top-1: 56.4%.
Demo
Food environment classification
Categories: bakery, banquet hall, bar, butcher shop, cafeteria, candy store, coffee shop, dinette, dining room, food court, galley, ice cream parlor, kitchen, kitchenette, market, pantry, picnic area, restaurant, restaurant kitchen, restaurant patio, supermarket.
Classification results:
0.92: food-related vs. non-food-related.
0.68: 22 classes of food-related categories.
Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
Wearable cameras and the life-logging trend
Shipments of wearable computing devices worldwide by category, 2013-2015 (in millions).
Life-logging data
What we have:
Wealth of life-logging data
We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution. The segmentation is achieved by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent treatment of the sequence.
A complete dataset of one day captured with a SenseCam contains more than 4,100 images.
The choice of device depends on: 1) where it is worn: a camera hung around the neck is considered more unobtrusive for the user; 2) its temporal resolution: a camera with a low frame rate captures less motion information, but there is less data to process.
We chose SenseCam or Narrative: cameras hung around the neck or pinned to clothing that capture 2-4 frames per minute.
100,000 images per month; 1 TB in 3 years. Or: the hell of life-logging data.
Visual life-logging data
Events to be extracted from life-logging images:
The camera captures up to 2,000 images per day, around 100,000 images per month. Applying computer vision algorithms, we are able to extract the diary of the person:
Activities he/she has done
Interactions he/she has participated in
Events he/she has taken part in
Duties he/she has performed
Environments and places he/she has visited, etc.
Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user's healthy habits (sedentary lifestyle, nutritional activity, etc.).
Conclusions
Healthy habits: one of the main health concerns for people, society, and governments.
Deep learning: a technology that is here to stay.
A new technological trend with huge power.
Especially useful for food recognition and analysis.
Lifelogging: an under-explored technology with big potential to help people monitor and describe their behaviour and thus improve their lifestyle.
Thank you for your attention!
Deep learning applications
Medical applications: there are tremendous advances in robotic surgery relying on extremely sensitive tactile equipment; if a doctor could advise a robot to "move a fraction of a millimeter to the left of the clavicle", they could potentially gain even more control by directing the robot via fully understood voice control.
Automotive: we are already seeing self-driving cars; deep learning will possibly be integrated into automated driving systems to detect and interpret sights and sounds that might be beyond the capacity of humans.
Military: drones are particularly well suited to deep learning.
Surveillance: here too drones will play a role, but the idea of computers that are able to sense and interpret with a human-like degree of accuracy will change the way surveillance is done.