Beckman Background Knowledge

Using Background Knowledge to Improve Visual Learning

Derek Hoiem

Beckman Director’s Seminar

March 11, 2009

Work with: Ali Farhadi, Ian Endres, Gang Wang, Santosh Divvala, James Hays, David Forsyth, Alexei Efros, Martial Hebert



What I’d like to make possible with computer vision

• Household Robot

• Intelligent Vehicle

• Security

• Photo Organization



What we can do (with the right dataset)

• Recognize faces

• Categorize scenes

• Detect, segment, and track objects

• 3D from multiple images or stereo

• Classify actions



But we’re a long way from “Rosie”

Computer vision has been divided into many task- and dataset-specific problems

– Difficult to coordinate pieces

– Poor generalization to unfamiliar environments

– Massive engineering and data collection effort required for every task/dataset



Goal

Use background knowledge: generalize known solutions to new problems or datasets



The Challenge

How can we use what we know to make learning

new things easier and more robust?



This Talk

• Three uses of background knowledge

– Contextual knowledge

– Compositional knowledge

– Organizational knowledge



I. Contextual Knowledge

Goal: Use knowledge of objects and spatial

layout to better detect a new object.

Work with Santosh Divvala, James Hays, Alexei Efros, Martial Hebert



Object Detection without Context

Search over many positions and scales




In each window: is this a cat?
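
The search over positions and scales can be sketched as follows. This is a minimal illustration, not the detector used in the talk: `score_window` is a hypothetical stand-in for a trained "is this a cat?" classifier, and the base window size, scale factors, and stride are assumed values chosen only for the example.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, base: int = 64,
                    scales: Tuple[float, ...] = (1.0, 1.5, 2.0),
                    stride_frac: float = 0.5) -> Iterator[Box]:
    """Yield square windows at many positions and scales.

    All default values are hypothetical, chosen for illustration only.
    """
    for s in scales:
        size = int(round(base * s))
        if size > img_w or size > img_h:
            continue  # window does not fit in the image at this scale
        step = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size, size)


def detect(img_w: int, img_h: int,
           score_window: Callable[[Box], float],
           top_k: int = 5) -> List[Tuple[float, Box]]:
    """Score every window and return the top_k highest-scoring ones."""
    scored = [(score_window(box), box)
              for box in sliding_windows(img_w, img_h)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Every window is scored independently here, which is exactly the "without context" setting: nothing about the rest of the image influences the answer for a given window.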



Training a Detector

Examples → Features (Color, Texture, Edges) → Classifier









• Top five cat detections in a challenging dataset

Detector: Felzenszwalb et al., CVPR 2008

Dataset: PASCAL VOC 2008



What do we know that can help us?




Knowledge of Other Objects and Scenes

Large Set of Loosely Annotated Images

Similar Images

Associated Keywords: Baby, Puppy, Sand, House, Kitten

Helps tell us how likely the object is to appear in this image.



Knowledge of Spatial Layout

Surface Layout, Occlusion Boundaries, Depth Estimates

Helps tell us where and how big the object is likely to be.

Hoiem et al. 2005, 2007



Context: Likelihood of Presence

1. Object presence

Contains Cat vs. No Cat



Context: Likelihood of Presence

Image

Gist

Surface Layout

Associated Keywords: Baby, Puppy, Sand, House, Kitten

Gist: Torralba and Oliva 2003

Likely to contain a cat?





Context: Likelihood of Size

• Predict height of object based on depth, surface orientations, gist, and image position

Predicted Height

Candidate Bounding Box

Size from Gist: Torralba and Oliva 2003



Rescoring Candidate Objects

Scores from independently trained classifiers:

• Presence Scores

• Position Scores

• Size Scores

• Appearance Score (from detector)

Combined with linear weights (L1-regularized logistic regression) into a Bounding Box Score
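
The rescoring step can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it fits an L1-regularized logistic regression by proximal gradient descent in plain Python, and the function names, learning rate, regularization strength, and toy score values are all hypothetical. Each row holds the four scores for one candidate box in the order presence, position, size, appearance.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.01, lr: float = 0.5,
                    iters: int = 500) -> Tuple[List[float], float]:
    """Learn linear weights over the scores (hyperparameters are assumed)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the logistic loss, then L1 shrinkage on weights.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not regularized
    return w, b


def rescore(scores: Sequence[float], w: Sequence[float], b: float) -> float:
    """Bounding box score from [presence, position, size, appearance]."""
    return sigmoid(sum(wj * sj for wj, sj in zip(w, scores)) + b)


# Toy data (hypothetical): the fourth row is a confident detector response
# that the context scores contradict, labeled as a false positive.
X_toy = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.7, 0.9, 0.6], [0.7, 0.9, 0.8, 0.7],
         [0.2, 0.3, 0.1, 0.8], [0.1, 0.2, 0.3, 0.4], [0.3, 0.1, 0.2, 0.2]]
y_toy = [1, 1, 1, 0, 0, 0]
weights, bias = fit_l1_logistic(X_toy, y_toy)
```

With weights learned this way, a candidate with a strong appearance score but weak presence, position, and size scores can be ranked below a candidate whose context agrees with the detector.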



Context improves detection

Top 5: Before Context

Top 5: After Context



Context improves detection accuracy

Average Precision (Higher is Better)



Context changes the error patterns

• More confusion

– Cats and Dogs

– Dogs and Sheep

– Motorbike and Bicycle

• Less confusion

– Objects and background



II. Compositional Knowledge

Goal: Describe new objects using attributes

learned from other objects.

Work with Ali Farhadi, Ian Endres, David Forsyth



A name doesn’t tell us much

Known Objects

Name: Cat

Name: Dog

Name: Horse

New Object

Name: Unknown



But what if we learn attributes?

Known Objects

Name: Cat

Properties: four legs, tail, eyes, ears, furry, has stripes, gray

Name: Dog

Properties: four legs, eyes, ears, snout, tan, muscular

Name: Horse

Properties: …, mane, eyes, ears, snout, tan

New Object

Name: Unknown



We can infer what the object is like

Known Objects

Name: Cat

Properties: four legs, tail, eyes, ears, furry, has stripes, gray

Name: Dog

Properties: four legs, eyes, snout, tan, muscular

Name: Horse

Properties: …, mane, eyes, ears, snout, tan

New Object

Name: Unknown

Properties: four legs, eyes, ears, snout, stripes, mane



Learning Attributes

• Learn to distinguish between things that have

an attribute and things that don’t

• Train one classifier per attribute



Learning Correlated Attributes

Problem

• Many attributes are strongly correlated through the object category

Most cars are “made of metal” and have “wheels”

When we try to learn “has

wheels”, we may accidentally

learn “made of metal”

Has Wheels, Made of Metal?



Decorrelating Attributes

Solution

• Select features that can distinguish between two classes

– Things that have wheels

– Things that do not, but have other attributes in common

“Has Wheels” vs. “No Wheels”
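
One way to picture this selection step is sketched below. It is an illustration under assumptions rather than the method from the talk: "other attributes in common" is approximated by sharing an object class label, each feature is scored by the gap between the mean value for examples with the attribute and examples without it inside the same class, and the function name and toy feature values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, List, Sequence


def within_class_feature_scores(features: Sequence[Sequence[float]],
                                has_attr: Sequence[bool],
                                class_ids: Sequence[Hashable]) -> List[float]:
    """Score features by how well they separate has-attribute from
    no-attribute examples *inside* each object class."""
    groups = defaultdict(lambda: ([], []))  # class -> (positives, negatives)
    for x, a, c in zip(features, has_attr, class_ids):
        groups[c][0 if a else 1].append(x)

    d = len(features[0])
    scores = [0.0] * d
    used = 0
    for pos, neg in groups.values():
        if not pos or not neg:
            continue  # this class offers no within-class contrast
        used += 1
        for j in range(d):
            scores[j] += abs(mean(x[j] for x in pos) - mean(x[j] for x in neg))
    return [s / used for s in scores] if used else scores


# Toy data (hypothetical): feature 0 responds to wheels, feature 1 to metal.
toy_features = [[0.9, 0.9], [0.8, 0.8],   # cars with visible wheels
                [0.1, 0.9], [0.2, 0.8],   # cars without visible wheels
                [0.1, 0.1], [0.2, 0.2]]   # cats
toy_has_wheels = [True, True, False, False, False, False]
toy_classes = ["car", "car", "car", "car", "cat", "cat"]
toy_scores = within_class_feature_scores(toy_features, toy_has_wheels,
                                         toy_classes)
```

In the toy data the metal feature separates cars from cats, so it would look useful for "has wheels" if all negatives were pooled. Comparing only within the car class removes that shortcut and leaves the wheel feature as the one that matters.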



Learning to Describe Objects



Describing New Objects



Identifying Unusual Attributes

• Absence of Typical Attributes: 752 reports, 68% are correct

• Presence of Atypical Attributes: 951 reports, 47% are correct



Recognition from Description

• Learn new classes by describing them to the

algorithm

• Goat = “Is furry, four legged, has snout, has horn”

• 12-Class Classification Accuracy = 32.5%

• Chance = 8%

• As good as having 8 visual examples with original image features
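
A minimal sketch of recognition from description follows. The matching rule is an assumption, since the slides do not give one: attributes are treated as independent, and a class is scored by the log-likelihood of its description under the per-attribute classifier outputs. The goat description comes from the slide above; the "car" description, the function names, and the probability values are hypothetical.

```python
import math
from typing import Dict, Set


def description_log_likelihood(attr_probs: Dict[str, float],
                               description: Set[str]) -> float:
    """Log-likelihood that an image matches a textual class description.

    attr_probs maps each attribute name to the predicted probability that
    the attribute is present in the image (from one classifier per attribute).
    """
    eps = 1e-6  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    for attr, p in attr_probs.items():
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p if attr in description else 1.0 - p)
    return total


def classify_from_descriptions(attr_probs: Dict[str, float],
                               descriptions: Dict[str, Set[str]]) -> str:
    """Pick the class whose description best explains the predictions."""
    return max(descriptions,
               key=lambda name: description_log_likelihood(
                   attr_probs, descriptions[name]))


descriptions = {
    "goat": {"furry", "four legged", "has snout", "has horn"},
    "car": {"has wheels", "made of metal"},  # hypothetical second class
}
# Hypothetical attribute classifier outputs for one image.
predicted = {"furry": 0.9, "four legged": 0.8, "has snout": 0.7,
             "has horn": 0.6, "has wheels": 0.05, "made of metal": 0.1}
best = classify_from_descriptions(predicted, descriptions)
```

No goat images are needed to add the goat class here; only the attribute classifiers, which were trained on other objects, and the description itself.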



III. Organizational Knowledge

Goal: Help a person organize his photos using image similarity learned from Flickr groups.

Work with Gang Wang, David Forsyth



Taming the Digital Explosion

Photos are easy to take and store.

But it’s still difficult to organize them.



Solution: Learn from photo sharing sites

• Billions of images in Flickr

• Hundreds of thousands of categories



Learn similarity

• Download hundreds of groups, each containing thousands of photos

• Train a classifier to predict whether a photo is likely to belong in each group

– Gang Wang created a super-fast online training method for kernelized SVMs

• Images are similar if they are likely to belong

to the same group
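
The idea that two images are similar when they are likely to belong to the same group can be sketched as below. The exact similarity measure is not given in the slides, so this sketch assumes one: each image is represented by its per-group membership scores, and similarity is the sum over groups of the product of the two scores. The function names, group names, and score values are hypothetical.

```python
from typing import Dict, List, Tuple


def group_similarity(scores_a: Dict[str, float],
                     scores_b: Dict[str, float]) -> float:
    """Similarity as the sum over groups of the product of the two images'
    membership scores (an assumed measure, for illustration)."""
    return sum(s * scores_b.get(group, 0.0) for group, s in scores_a.items())


def explain_similarity(scores_a: Dict[str, float],
                       scores_b: Dict[str, float],
                       top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the groups that contribute most to the similarity."""
    contributions = {group: scores_a[group] * scores_b[group]
                     for group in scores_a if group in scores_b}
    ranked = sorted(contributions.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical group-membership scores for three photos.
photo_a = {"fireworks": 0.9, "christmas": 0.6, "horses": 0.1}
photo_b = {"fireworks": 0.8, "christmas": 0.5, "horses": 0.05}
photo_c = {"fireworks": 0.05, "christmas": 0.1, "horses": 0.9}
```

Because the measure decomposes over groups, the largest terms double as an explanation of why two images match, which is the kind of output a ranked list of group names with weights would give.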



We can find similar images

Query Image

Retrieved Images Using

Feature Similarity

Retrieved Images Using

Similarity Learned from Flickr



We can say how two images are similar

Fireworks (15.6)

Christmas (7.6)

Rain (4.0)

Water drops (2.5)

Candles (2.0)

Sports (2.6)

Dances (2.0)

Weddings (1.0)

Toys (0.5)

Horses (0.5)

Painting (2.4)

Art (1.2)

Macro-flowers (0.9)

Hands (0.9)

Skateboarding (0.6)



Conclusions

• Background knowledge is a key missing component in today’s

computer vision algorithms

• Existing knowledge can make learning easier

– Provides new abilities (say two things are similar or different)

– More complete visual models (better accuracy, more reasonable

mistakes)

– Better able to handle new objects and situations

• We need to start designing systems that accumulate

visual knowledge



Thank you


