Unsupervised Visual Representation Learning by Context Prediction


Most slides in this presentation are adapted from the authors' original presentation at ICCV 2015

Berkan Demirel

ImageNet + Deep Learning

Beagle

- Image Retrieval
- Detection (RCNN)
- Segmentation (FCN)
- Depth Estimation
- …

ImageNet + Deep Learning

Beagle

Do we need semantic labels? Pose?

Boundaries? Geometry?

Parts? Materials?

Context as Supervision [Collobert & Weston 2008; Mikolov et al. 2013]

Deep Net

Context Prediction for Images

Semantics from a non-semantic task

Randomly Sample Patch, Sample Second Patch

CNN CNN

Classifier

Relative Position Task: 8 possible locations

CNN CNN

Classifier
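The 8-way relative position task can be sketched as a lookup from the second patch's grid offset to a class index. This is a minimal illustration only: the row-by-row numbering of the eight neighbors and the names `NEIGHBOR_OFFSETS` and `relative_position_label` are assumptions made here, not taken from the slides.

```python
# Hypothetical label convention: the 8 neighbors of the first patch are
# numbered row by row, as (row, col) offsets on a 3x3 grid.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def relative_position_label(d_row: int, d_col: int) -> int:
    """Return the class index (0-7) for the second patch's grid offset."""
    return NEIGHBOR_OFFSETS.index((d_row, d_col))
```

Under this convention a second patch directly to the right of the first has offset `(0, 1)`, which is the target the classifier would be trained to predict for that pair.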

Patch Embedding

Input, Nearest Neighbors

CNN Note: connects across instances!

Architecture

[Figure: two identical stacks with tied weights, one for Patch 1 and one for Patch 2. Each stack is built from Convolution, Max Pooling, and LRN layers and ends in a Fully connected layer; the two stack outputs are combined by a Fully connected layer and trained with a Softmax loss.]
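The idea of tied weights can be illustrated with a toy two-branch model: both patches pass through the same embedding parameters, the embeddings are concatenated, and a classifier produces 8 logits for the softmax loss. This is a sketch under strong simplifications; a single linear layer with ReLU stands in for the convolutional stack, and all names and sizes (`init_params`, `embed_dim=16`, and so on) are hypothetical.

```python
import numpy as np


def init_params(rng, patch_dim, embed_dim=16, n_classes=8):
    """Toy parameters: one shared embedding matrix and one classifier matrix."""
    return {
        "W_embed": rng.normal(0.0, 0.01, (patch_dim, embed_dim)),
        "W_cls": rng.normal(0.0, 0.01, (2 * embed_dim, n_classes)),
    }


def embed(params, patch):
    """Embed one patch; the same W_embed serves both branches (tied weights)."""
    return np.maximum(patch.reshape(-1) @ params["W_embed"], 0.0)


def pair_logits(params, patch1, patch2):
    """Concatenate the two embeddings and map them to 8 position logits."""
    fused = np.concatenate([embed(params, patch1), embed(params, patch2)])
    return fused @ params["W_cls"]


def softmax_loss(logits, label):
    """Cross-entropy of a softmax over the logits, computed stably."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]
```

Because `embed` reads the same `W_embed` for both inputs, any gradient update to that matrix affects both branches identically, which is what makes a single patch embedding usable on its own afterwards.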

Avoiding Trivial Shortcuts

Include a gap

Jitter the patch locations

Position in Image

A Not-So “Trivial” Shortcut

Chromatic Aberration

Solutions

Color Dropping: Randomly drop 2 of the 3 color channels from each patch, then replace the dropped colors with Gaussian noise (standard deviation ~1/100 the standard deviation of the remaining channel).

Projection: Shift green and magenta (red + blue) towards gray
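Both chromatic aberration remedies can be sketched in a few lines of NumPy. The sketch assumes patches are `(H, W, 3)` RGB arrays, that the replacement noise is zero-mean, and that the green-magenta axis in RGB space is `a = (-1, 2, -1)`; the slides give none of these details, so treat them as illustrative assumptions. The function names are hypothetical.

```python
import numpy as np


def drop_colors(patch, rng):
    """Keep one random channel; fill the other two with Gaussian noise.

    The noise standard deviation is 1/100 of the kept channel's standard
    deviation. Zero-mean noise is an assumption of this sketch.
    """
    keep = int(rng.integers(0, 3))
    kept = patch[..., keep].astype(np.float64)
    noise_std = kept.std() / 100.0
    out = np.empty(patch.shape, dtype=np.float64)
    for c in range(3):
        if c == keep:
            out[..., c] = kept
        else:
            out[..., c] = rng.normal(0.0, noise_std, size=kept.shape)
    return out, keep


def project_colors(patch):
    """Remove each pixel's component along the assumed green-magenta axis."""
    a = np.array([-1.0, 2.0, -1.0])  # assumed green-magenta axis in RGB
    B = np.eye(3) - np.outer(a, a) / a.dot(a)
    return patch.astype(np.float64) @ B
```

With this projection a gray pixel is left unchanged, while pure green `(0, 1, 0)` and pure magenta `(1, 0, 1)` both map to gray values, which matches the "shift towards gray" description.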

Implementation Details

• Train on the ImageNet 2012 training set (1.3M images), using only the images and discarding the labels.
• Resize each image to between 150K and 450K total pixels, preserving the aspect ratio.
• Sample patches at resolution 96-by-96.
• Sample the patches from a grid-like pattern. Each sampled patch can participate in as many as 8 separate pairings.
• Allow a gap of 48 pixels between the sampled patches in the grid, but also jitter the location of each patch in the grid by -7 to 7 pixels in each direction.
• Preprocess patches by (1) mean subtraction, (2) projecting or dropping colors, (3) randomly downsampling some patches to as little as 100 total pixels and then upsampling them, to build robustness to pixelation.
• Use batch normalization, without the scale and shift.
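The grid sampling with gap and jitter described above can be sketched as follows. Patch size (96), gap (48), and jitter range (-7 to 7) come from the list; the image-border margin, the function names, and the choice to enumerate ordered pairs are assumptions of this sketch rather than details from the slides.

```python
import numpy as np

PATCH, GAP, JITTER = 96, 48, 7
STRIDE = PATCH + GAP  # 144 pixels between neighboring grid positions


def grid_patch_corners(height, width, rng):
    """Top-left corners of jittered patches laid out on a regular grid."""
    # A JITTER-wide margin keeps every jittered patch inside the image.
    rows = np.arange(JITTER, height - PATCH - JITTER + 1, STRIDE)
    cols = np.arange(JITTER, width - PATCH - JITTER + 1, STRIDE)
    corners = np.empty((len(rows), len(cols), 2), dtype=int)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            dr, dc = rng.integers(-JITTER, JITTER + 1, size=2)
            corners[i, j] = (r + dr, c + dc)
    return corners


def adjacent_pairs(corners):
    """Yield (corner1, corner2, (d_row, d_col)) for every 8-neighbor pair."""
    n_rows, n_cols = corners.shape[:2]
    for i in range(n_rows):
        for j in range(n_cols):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < n_rows and 0 <= nj < n_cols
                    if (di, dj) != (0, 0) and inside:
                        yield corners[i, j], corners[ni, nj], (di, dj)
```

Since each patch is jittered independently by up to 7 pixels, the actual gap between two side-by-side patches ranges from 48 - 14 = 34 to 48 + 14 = 62 pixels. An interior patch appears as the first patch in 8 pairings, consistent with the "as many as 8" figure in the list.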

Experiments

• Chromatic Aberration
• Nearest-Neighbor Matching
• Object Detection
• Geometry Estimation
• Visual Data Mining
• Layout Prediction

Chromatic Aberration

Nearest-Neighbor Matching

• fc6 layer features are used, from only one of the two stacks.
• fc7 and higher layers are removed.
• Normalized cross correlation is used to find similar patches.
• Randomly selected 96x96 patches are used in the comparison.
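Nearest-neighbor lookup by normalized cross correlation can be sketched as below, assuming the fc6 features have already been extracted into NumPy arrays. The function names, the small epsilon guarding against division by zero, and the default `k` are assumptions of this sketch.

```python
import numpy as np


def normalized_cross_correlation(query, database):
    """NCC between one feature vector (D,) and each row of database (N, D)."""
    q = query - query.mean()
    d = database - database.mean(axis=1, keepdims=True)
    q = q / (np.linalg.norm(q) + 1e-12)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
    return d @ q


def nearest_neighbors(query, database, k=5):
    """Indices and scores of the k database rows most correlated with query."""
    scores = normalized_cross_correlation(query, database)
    order = np.argsort(-scores)[:k]
    return order, scores[order]
```

Because each vector is mean-subtracted and scaled to unit length, the score is invariant to a positive rescaling or constant shift of the features, so it compares the pattern of activations rather than their magnitude.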

What is learned?

[Figure columns: Input, Random Initialization, ImageNet AlexNet]

Still don’t capture everything
[Figure columns: Input, Ours, Random Initialization, ImageNet AlexNet]

You don’t always need to learn!
[Figure columns: Input, Ours, Random Initialization, ImageNet AlexNet]

Object Detection

Pre-train on relative-position task, w/o labels

[Girshick et al. 2014]

Object Detection

Multi-Task Training?

Surface-normal Estimation

                   Error (Lower Better)    % Good Pixels (Higher Better)
No Pretraining     38.6   26.5             33.1   46.8   52.5
Unsup. Track.      34.2   21.9             35.7   50.6   57.0
Ours               33.2   21.3             36.0   51.2   57.8
ImageNet Labels    33.3   20.8             36.7   51.7   58.1

Visual Data Mining

• Sample a constellation of four adjacent patches from an image (we use four to reduce the likelihood of a matching spatial arrangement happening by chance).
• Find the top 100 images which have the strongest matches for all four patches, ignoring spatial layout.
• Use a type of geometric verification to filter away the images where the four matches are not geometrically consistent.
• Apply the described mining algorithm to Pascal VOC 2011.
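The retrieval step (top 100 images by match strength, ignoring layout) might look like the sketch below. It covers only that step and omits geometric verification. Scoring an image by the sum of each constellation patch's best dot-product similarity is an assumption made here, as are the function name and the input layout; the slides do not specify how the four matches are aggregated.

```python
import numpy as np


def rank_images(constellation_feats, image_patch_feats, top_k=100):
    """Rank images by how well they match all four constellation patches.

    constellation_feats: (4, D) features of the four sampled patches.
    image_patch_feats: list of (P_i, D) arrays, one per candidate image.
    Spatial layout is ignored: each patch takes its best match anywhere.
    """
    scores = []
    for feats in image_patch_feats:
        sims = constellation_feats @ feats.T      # (4, P_i) similarities
        scores.append(sims.max(axis=1).sum())     # assumed aggregate: sum of best matches
    scores = np.asarray(scores)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

The surviving candidates would then go through the geometric verification step, which discards images whose four matches are not arranged consistently with the original constellation.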

Visual Data Mining

Via Geometric Verification

Simplified from [Chum et al. 2007]

Mined from Pascal VOC 2011

Layout Prediction

Visual Data Mining: algorithm results for 15,000 Street View images from Paris

Purity Test

So, do we need semantic labels?

Source Code & Supplementary Materials

• MagicInit
• Unsupervised Visual Representation Learning by Context Prediction
• Visual Data Mining Results on unlabeled PASCAL VOC 2011 Images
• Nearest Neighbors on PASCAL VOC 2007
• More

THANK YOU!
