

SuperParsing: Scalable Nonparametric Image Parsing with Superpixels
Joseph Tighe and Svetlana Lazebnik
Dept. of Computer Science, University of North Carolina at Chapel Hill
http://www.cs.unc.edu/SuperParsing

Overview

• “Open universe” system: no training required, easy to accommodate an evolving dataset as new classes or new training exemplars are added

• State-of-the-art performance on the SIFT Flow dataset (Liu et al., CVPR 2009)

• New large-scale baseline for image parsing: per-pixel recognition results on a subset of LabelMe consisting of 15k images with 170 labels

[Figure: system overview. Panels: Query Image; Retrieval set of similar images; Superpixels; Per-class likelihood (Building, Road, Sky, Car); Semantic Classes (Building, Car, Road, Sky); Geometric Classes (Sky, Vertical, Horizontal).]

Image Parsing Method

Given a query (test) image:

• Find a retrieval set of 200 similar images by taking the minimum per-feature rank of four global image features.

• For each test superpixel $s_i$, described by multiple local features $\{f_i^k\}$, compute a likelihood ratio score for each class $c$ found in the retrieval set:

\[ L(s_i, c) = \frac{P(s_i \mid c)}{P(s_i \mid \neg c)} = \prod_k \frac{P(f_i^k \mid c)}{P(f_i^k \mid \neg c)} . \]

The feature likelihood $P(f_i^k \mid c)$ is given by a nonparametric density estimate:

\[ P(f_i^k \mid c) = \frac{\#(\text{retrieval set features of class } c \text{ within a fixed radius of } f_i^k)}{\#(\text{total features of class } c \text{ in the training set})} . \]

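• Use MRF inference to solve for the label field $\mathbf{c} = \{c_i\}$ over the entire test image:

\[ J(\mathbf{c}) = \sum_{s_i \in SP} -w_i \log L(s_i, c_i) + \lambda \sum_{(s_i, s_j) \in A} E_{\text{edge}}(c_i, c_j) , \]

where $SP$ is the set of superpixels, $A$ is the set of pairs of adjacent superpixels, $w_i$ is a weight based on the superpixel size, and the edge penalty $E_{\text{edge}}$ is based on the co-occurrence of adjacent labels in the training set:

\[ E_{\text{edge}}(c_i, c_j) = -\log\left[\frac{P(c_i \mid c_j) + P(c_j \mid c_i)}{2}\right] \times \delta[c_i \neq c_j] . \]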
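The first two steps of the method (retrieval set selection and likelihood ratio scoring) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of Euclidean distance, the additive smoothing constant `eps`, and the way P(f | ¬c) is estimated (the same counting rule applied to every class other than c) are all choices made for the example.

```python
import math

def retrieval_set(query_feats, train_feats, k):
    """Pick k training images by minimum per-feature rank.

    query_feats: dict feature_name -> vector for the query image.
    train_feats: list of such dicts, one per training image.
    """
    n = len(train_feats)
    best_rank = [n] * n
    for name, q in query_feats.items():
        dists = [math.dist(q, t[name]) for t in train_feats]
        order = sorted(range(n), key=lambda i: dists[i])
        for rank, i in enumerate(order):
            # Keep the best (smallest) rank this image gets under any feature.
            best_rank[i] = min(best_rank[i], rank)
    return sorted(range(n), key=lambda i: best_rank[i])[:k]

def log_likelihood_ratio(sp_feats, neighbors, class_totals, c, radius, eps=1.0):
    """log L(s_i, c), summed over the feature types k of one superpixel.

    sp_feats:     dict k -> feature vector of the test superpixel.
    neighbors:    dict k -> list of (vector, label) from retrieval-set superpixels.
    class_totals: dict label -> number of training-set features with that label
                  (assumed to contain at least two classes).
    eps:          smoothing count that avoids log(0); an assumption of this sketch.
    """
    total_all = sum(class_totals.values())
    score = 0.0
    for k, f in sp_feats.items():
        near = [lab for vec, lab in neighbors[k] if math.dist(f, vec) <= radius]
        n_c = sum(1 for lab in near if lab == c)
        n_not_c = len(near) - n_c
        p_c = (n_c + eps) / class_totals[c]
        p_not_c = (n_not_c + eps) / (total_all - class_totals[c])
        score += math.log(p_c / p_not_c)
    return score
```

Working in log space turns the product over feature types into a sum, which is also the form the MRF data term uses.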

[Figure: MRF labeling example. Panels: Query Image, Max. Likelihood Ratio, Edge Penalty, MRF Labeling. Labels shown: Road, Sky, Sea, Sand, Tree.]
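The MRF objective J(c) and the co-occurrence edge penalty can be sketched as follows. The poster does not name the inference algorithm, so this example substitutes iterated conditional modes (ICM), a simple greedy minimizer, purely for illustration. The function names, the `eps` floor inside the logarithm, and the toy inputs are assumptions.

```python
import math
from collections import Counter

def edge_penalty_table(adjacent_label_pairs, eps=1e-6):
    """Build E_edge from label pairs of adjacent training superpixels."""
    pair, marg = Counter(), Counter()
    for a, b in adjacent_label_pairs:
        pair[(a, b)] += 1
        pair[(b, a)] += 1
        marg[a] += 1
        marg[b] += 1

    def penalty(ci, cj):
        if ci == cj:  # the delta[c_i != c_j] factor
            return 0.0
        p_i_given_j = pair[(ci, cj)] / marg[cj] if marg[cj] else 0.0
        p_j_given_i = pair[(ci, cj)] / marg[ci] if marg[ci] else 0.0
        # eps keeps the log finite for label pairs never seen together.
        return -math.log(max((p_i_given_j + p_j_given_i) / 2, eps))

    return penalty

def energy(labels, log_ratio, weights, adjacency, penalty, lam):
    """J(c): weighted data term plus lambda times the sum of edge penalties."""
    data = sum(-weights[i] * log_ratio[i][labels[i]] for i in range(len(labels)))
    smooth = sum(penalty(labels[i], labels[j]) for i, j in adjacency)
    return data + lam * smooth

def icm(log_ratio, weights, adjacency, penalty, lam, sweeps=10):
    """Greedy minimization of J(c); a stand-in solver, not the poster's."""
    n = len(log_ratio)
    labels = [max(lr, key=lr.get) for lr in log_ratio]  # max likelihood ratio init
    nbrs = [[] for _ in range(n)]
    for i, j in adjacency:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            def local_cost(c):
                return (-weights[i] * log_ratio[i][c]
                        + lam * sum(penalty(c, labels[j]) for j in nbrs[i]))
            best = min(log_ratio[i], key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

Because a label is changed only when its local cost strictly decreases, each sweep can only lower J(c), so the loop terminates at a local minimum.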

Joint Semantic and Geometric Labeling

Simultaneously solve for a field of semantic labels ($\mathbf{c}$) and geometric labels ($\mathbf{g}$) over the image by optimizing

\[ H(\mathbf{c}, \mathbf{g}) = J(\mathbf{c}) + J(\mathbf{g}) + \mu \sum_{s_i \in SP} \varphi(c_i, g_i) , \]

where $\varphi(c_i, g_i)$ is a coherence term between the semantic and geometric labels of the same superpixel.
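A toy example shows how the coherence term can change the outcome. The poster does not give the form of the coherence term, so the 0/1 compatibility table below is hypothetical, as are all costs and names. The two J terms are reduced to per-superpixel costs without edge penalties to keep the sketch short.

```python
from itertools import product

def joint_energy(c, g, j_sem, j_geo, phi, mu):
    """H(c, g) = J(c) + J(g) + mu * sum_i phi(c_i, g_i)."""
    coherence = sum(phi(ci, gi) for ci, gi in zip(c, g))
    return j_sem(c) + j_geo(g) + mu * coherence

# Made-up costs for a single superpixel (lower is better).
SEM_COST = {"sky": 0.4, "road": 0.5}
GEO_COST = {"sky": 1.0, "horizontal": 0.2}
# Hypothetical set of (semantic, geometric) pairs considered coherent.
COMPATIBLE = {("sky", "sky"), ("road", "horizontal")}

def j_sem(c):
    return sum(SEM_COST[ci] for ci in c)

def j_geo(g):
    return sum(GEO_COST[gi] for gi in g)

def phi(ci, gi):
    return 0.0 if (ci, gi) in COMPATIBLE else 1.0

def best_joint(mu):
    """Exhaustive search over label pairs for the one-superpixel toy problem."""
    candidates = product(SEM_COST, GEO_COST)
    return min(candidates,
               key=lambda cg: joint_energy([cg[0]], [cg[1]], j_sem, j_geo, phi, mu))
```

With `mu = 0` the two labelings are chosen independently and the toy costs select the incoherent pair ("sky", "horizontal"); with `mu = 1` the coherence penalty moves the semantic label to "road".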

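[Figure: joint labeling example. Panels: Query, Ground Truth Labels, Initial Labeling, Semantic MRF, Joint Semantic and Geometric. Semantic classes: Road, Window, Door, Sidewalk, Building, Awning, Sign, Person. Geometric classes: Horz, Vert, Sky. Rates shown on the panels: 72.0, 68.6, 77.6 and 97.2, 97.6, 94.8.]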

Results on Large-Scale Datasets

• SIFT Flow dataset (Liu et al., CVPR 2009): 2,488 training images, 200 test images, 33 labels

• Barcelona dataset (a new large-scale benchmark): 14,871 training images, 279 test images, 170 labels

[Figure: Label Frequency in Dataset. Panels: SIFT Flow Dataset, Barcelona Dataset. y-axis: # of superpixels (×1000), ticks 0 to 80, with one bar annotated 264,945.]

Per-pixel classification rates (with average per-class rates in parentheses):

                           SIFT Flow                 Barcelona
                           Semantic      Geometric   Semantic     Geometric
Liu et al. (CVPR 2009)     74.75         N/A         N/A          N/A
Local labeling             73.2 (29.1)   89.8        62.5 (8.0)   89.9
MRF                        76.3 (28.8)   89.9        66.6 (7.6)   90.2
Joint semantic/geometric   76.9 (29.4)   90.8        66.9 (7.6)   90.7
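The gap between the per-pixel rate and the average per-class rate follows from how the two are computed. The sketch below assumes flat lists of ground-truth and predicted pixel labels with unlabeled pixels already removed; the function name and the toy data are illustrative only.

```python
from collections import Counter

def per_pixel_and_per_class(gt, pred):
    """Return (per-pixel rate, average per-class rate) as fractions in [0, 1]."""
    correct, total = Counter(), Counter()
    for t, p in zip(gt, pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_pixel = sum(correct.values()) / sum(total.values())
    # Every class present in the ground truth counts equally, however rare.
    per_class = sum(correct[c] / total[c] for c in total) / len(total)
    return per_pixel, per_class

# Toy case: one dominant class labeled perfectly, one rare class missed entirely.
gt = ["sky"] * 90 + ["car"] * 10
pred = ["sky"] * 100
rates = per_pixel_and_per_class(gt, pred)
```

In the toy case the per-pixel rate is 0.9 while the average per-class rate is 0.5. Rare classes weigh heavily on the per-class average, which is consistent with the low per-class numbers on the 170-label Barcelona dataset.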

[Figure: Per-class Performance. Panels: SIFT Flow Dataset, Barcelona Dataset. y-axis: 0% to 100%.]

Timing

                                   SIFT Flow       Barcelona
Training set size                  2,488           14,871
Image size                         256 × 256       640 × 480
Ave. # superpixels                 63.9            307.9
Feature extraction                 ∼4 sec          ∼5 min
Retrieval set search (sec)         0.04 ± 0.0      0.21 ± 0.0
Superpixel search (sec)            4.4 ± 2.3       34.2 ± 13.4
MRF solver (sec)                   0.005 ± 0.003   0.03 ± 0.02
Total, excluding features (sec)    4.4 ± 2.3       34.4 ± 13.4

[Figure: Full System Times. x-axis: Number of Superpixels (0 to 500); y-axis: Seconds (0 to 80). Series: SIFT Flow Dataset, Barcelona Dataset.]

Sample Output on SIFT Flow Dataset

[Figure: sample parsing results. Panels: Query, Ground Truth Labels, Initial Labeling, Edge Penalties, Final Labeling, Geometric Labeling. Each example is annotated with three rates: 98.7 / 98.6 / 99.2; 95.2 / 97.1 / 97.1; 86.6 / 88.4 / 88.5; 68.8 / 94.2 / 94.2; 85.4 / 86.2 / 94.2; 73.2 / 77.2 / 93.3; 57.9 / 73.2 / 81.3. Semantic labels in the legends: Unlabeled, Tree, Sky, Road, Car, Bridge, Sea, Mountain, Grass, Field, Desert, Sun, Building, Sidewalk, Window, Balcony, Door. Geometric labels: Horz, Vert, Sky.]

Small Datasets

Results on two small-scale datasets using trained boosted decision tree classifiers (instead of retrieval set and superpixel search):

                           Stanford Dataset          Geometric Context Dataset
                           715 images, 8 classes     300 images, 7 classes
                           Semantic    Geometric     Sub-classes    Main classes
Gould et al. (ICCV 2009)   76.4        91.0          N/A            86.9
Hoiem et al. (IJCV 2007)   N/A         N/A           61.5           88.1
Local labeling             76.9        90.5          57.6           87.8
MRF                        77.5        90.6          61.0           88.2
Joint semantic/geometric   77.5        90.6          61.0           88.1

Funding

This research was supported in part by NSF CAREER award IIS-0845629, a Microsoft Research Faculty Fellowship, and Xerox.
