Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications
Anand Joshi ([email protected])

[Figure panels: Results; iPad Notifications]

Introduction
Current surveillance and control systems in retail and elsewhere still require human supervision and intervention. This work aims to provide a real-time detection system for CCTV video, suitable for surveillance and control, inventory tracking, theft deterrence, and threat perception and detection, and to apply machine learning and deep learning techniques to real-world applications. It aims to automate many tasks that are otherwise error-prone because of human error and fatigue. The solution can potentially provide real-time alerts and notifications on smartphones and tablets, as well as rich data for analytics.

Dataset
The dataset comprised color images in the following categories:
a) Everyday objects found in retail environments, obtained from ImageNet. Over 1.2 million images were used for training, divided into over 1000 classes.
b) Guns and knives: the Knives Images Database, which contains 9340 negative examples and 3559 positive examples, and the Internet Movie Firearms Database, which contains 8557 images.
c) Human hands: the Hand Dataset, which contains about 14700 hand images from various sources, and the EgoHands Dataset, which contains about 120000 images.
90% of the data was used for training and 10% for validation.

Some Image Samples
[Figure: sample images]

Application Flow and System Setup
Detection using CNN -> Match found? (yes) -> Send push notification request using Node.js -> Apple Push Notification Service

Feature Extraction and Prediction using CNN
Convolution layer: a set of learnable filters (kernels). Each filter represents a specific part of the image while preserving the spatial relationship between pixels.
Pooling layer (subsampling): reduces the dimensionality of each feature map but retains the most important information.
Fully connected layer: ends in a softmax function that produces the prediction by exponentiating the inputs and then normalizing them. Its outputs represent the probabilities (confidence) of each class prediction:

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

This is most appropriate for image classification problems.

CNN architecture: Inception-ResNet V2. It is more accurate than previous state-of-the-art models. Its Top-1 and Top-5 validation accuracies on the ILSVRC 2012 image classification benchmark, based on a single crop of the image, are 80.4% and 95.3% respectively.

Deep learning framework: TensorFlow, for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

Training Epochs   Solver Type                   Base Learning Rate
100               Stochastic Gradient Descent   0.0001

Epochs   Solver   Learning Rate   Accuracy (%)
100      SGD      0.0001          99.97

References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[2] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, "Scalable Object Detection using Deep Neural Networks," Google, Inc.
[3] R. Olmos, S. Tabik, and F. Herrera, "Automatic Handgun Detection Alarm in Videos Using Deep Learning."
[4] A. Matiolański, A. Maksimowa, and A. Dziech, "CCTV object detection with fuzzy classification and image enhancement," Multimedia Tools and Applications, 2015.
[5] M. Grega, A. Matiolański, P. Guzik, and M. Leszczuk, "Automated Detection of Firearms and Knives in a CCTV Image," Sensors, ISSN 1424-8220.

Future
Although Inception-ResNet V2 performed best, I found that many predictions had a probability of only 20% to 40%, even when they were correct.
The first step I would like to take is to train the model further so that it assigns higher confidence to these predictions. This could be done by training it on more data or by increasing the number of epochs when training the CNN. Also, after developing an end-to-end proof-of-concept solution, I strongly feel that it has the potential to become a commercially viable product.
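
To make the softmax confidence and the 20% to 40% probabilities discussed above concrete, here is a minimal sketch in plain Python. It is not the poster's implementation: the class labels, logit values, watch list, and thresholds are hypothetical, and `should_notify` is an illustrative helper standing in for the "Match found?" decision in the application flow.

```python
import math


def softmax(logits):
    """Exponentiate each input and normalize so the outputs sum to 1.

    The maximum is subtracted first for numerical stability; this does
    not change the result.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, labels):
    """Return the most probable label and its softmax confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def should_notify(label, confidence, watch_labels, threshold):
    """Hypothetical alert rule: notify only for watched classes whose
    confidence reaches the threshold."""
    return label in watch_labels and confidence >= threshold


if __name__ == "__main__":
    labels = ["gun", "knife", "hand", "bottle"]  # hypothetical classes
    logits = [1.2, 0.9, 0.8, 0.7]                # hypothetical network outputs
    label, confidence = top_prediction(logits, labels)
    print(label, round(confidence, 2))
    # With closely spaced logits the top class wins with low confidence,
    # so a strict threshold suppresses an alert that a looser one sends.
    print(should_notify(label, confidence, {"gun", "knife"}, threshold=0.5))
    print(should_notify(label, confidence, {"gun", "knife"}, threshold=0.3))
```

In this sketch the correct class wins with a confidence of roughly one third, which is why low-confidence predictions matter for an alerting system: the chosen threshold decides whether a correct detection triggers a notification at all.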

Source: cs229.stanford.edu/proj2017/final-posters/5133020.pdf

