Loss-based Visual Learning with Weak Supervision
M. Pawan Kumar
Joint work with Pierre-Yves Baudin, Danny Goodman,
Puneet Kumar, Nikos Paragios, Noura Azzabou, Pierre Carlier
SPLENDID
Nikos Paragios, Equipe Galen, INRIA Saclay
Daphne Koller, DAGS, Stanford
Machine Learning: Weak Annotations, Noisy Annotations
Applications: Computer Vision, Medical Imaging
Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data
2 Visits from INRIA to Stanford, 1 Visit from Stanford to INRIA (ICML 2012)
3 Visits Planned (MICCAI 2013)
Medical Image Segmentation
MRI Acquisitions of the thigh
Segments correspond to muscle groups
Random Walks Segmentation
Probabilistic segmentation algorithm
Computationally efficient
Interactive segmentation
Automated shape prior driven segmentation
Interactive: L. Grady, 2006; automated: L. Grady, 2005; Baudin et al., 2012
Random Walks Segmentation
y(i,s): Probability that voxel ‘i’ belongs to segment ‘s’
x: Medical acquisition
miny E(x,y) = yT L(x) y + wshape ||y – y0||2
L(x): positive semi-definite Laplacian matrix
y0: shape prior on the segmentation
wshape: parameter of the RW algorithm, hand-tuned
The problem is convex in y
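Because the energy is convex in y, its minimizer is found by solving one linear system: setting the gradient to zero gives (L(x) + wshape I) y = wshape y0. A minimal sketch in NumPy, using a toy 4-voxel chain-graph Laplacian; all names and data here are illustrative, not the authors' implementation:

```python
import numpy as np

def random_walks_segmentation(L, y0, w_shape):
    """Minimize E(y) = y^T L y + w_shape * ||y - y0||^2.

    The energy is convex; setting the gradient to zero gives the
    linear system (L + w_shape * I) y = w_shape * y0.
    L       : (n, n) positive semi-definite graph Laplacian
    y0      : (n,) shape-prior probabilities for one segment
    w_shape : scalar weight of the shape prior
    """
    n = L.shape[0]
    A = L + w_shape * np.eye(n)
    return np.linalg.solve(A, w_shape * y0)

# Toy example: a 4-voxel chain graph with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian
y0 = np.array([1.0, 1.0, 0.0, 0.0])     # prior: first two voxels in segment
y = random_walks_segmentation(L, y0, w_shape=2.0)
```

The result is a soft segmentation: probabilities decay smoothly away from the voxels favored by the prior.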
Random Walks Segmentation
Several Laplacians: L(x) = Σα wα Lα(x)
Several shape and appearance priors: Σβ wβ ||y – yβ||2
Hand-tuning large number of parameters is onerous
Parameter Estimation
Learn the best parameters from training data
E(x,y) = Σα wα yT Lα(x) y + Σβ wβ ||y – yβ||2
Parameter Estimation
Learn the best parameters from training data
E(x,y) = wT Ψ(x,y)
w: the set of all parameters
Ψ(x,y): the joint feature vector of the input and output
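The point of the joint feature vector is that the combined energy is linear in the parameters: stacking the quadratic terms yT Lα y and the prior terms ||y – yβ||2 into Ψ(x,y) gives E(x,y) = wT Ψ(x,y). A small sketch; the stand-in "Laplacians" are illustrative only:

```python
import numpy as np

def joint_feature_vector(y, laplacians, priors):
    """Stack the energy terms so that E(x, y) = w^T Psi(x, y).

    laplacians : list of (n, n) Laplacian matrices L_alpha(x)
    priors     : list of (n,) shape/appearance priors y_beta
    """
    quad = [y @ L @ y for L in laplacians]
    prior = [np.sum((y - yb) ** 2) for yb in priors]
    return np.array(quad + prior)

# Hypothetical toy check: the weighted energy equals w^T Psi.
n = 3
rng = np.random.default_rng(0)
L1 = np.eye(n)          # stand-in "Laplacians", illustration only
L2 = 2 * np.eye(n)
y0 = np.zeros(n)
y = rng.random(n)
w = np.array([0.5, 1.5, 2.0])
psi = joint_feature_vector(y, [L1, L2], [y0])
energy = 0.5 * (y @ L1 @ y) + 1.5 * (y @ L2 @ y) + 2.0 * np.sum((y - y0) ** 2)
```

This linearity in w is exactly what makes structured-output learning applicable.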
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Supervised Learning
Dataset of segmented MRIs
For sample xk and voxel i:
zk(i,s) = 1 if s is the ground-truth segment of voxel i, 0 otherwise
What about the probabilistic segmentation?
Supervised Learning
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – wTΨ(xk,zk) ≥ Δ(ŷ,zk) – ξk, for all ŷ
(wTΨ(xk,zk): energy of the ground truth; wTΨ(xk,ŷ): energy of any other segmentation)
Δ(ŷ,zk) = Fraction of incorrectly labeled voxels
Taskar et al., 2003; Tsochantaridis et al., 2004
Structured-output Support Vector Machine
Supervised Learning
Convex with several efficient algorithms
No parameter setting yields a ‘hard’ segmentation
We only need a correct ‘soft’ probabilistic segmentation
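For a finite set of candidate segmentations, the SSVM margin constraints can be checked directly: the smallest feasible slack ξk is the largest violation Δ(ŷ,zk) – (wTΨ(xk,ŷ) – wTΨ(xk,zk)) over all ŷ. A toy sketch with a hypothetical one-term feature map, using exhaustive enumeration for illustration only:

```python
import itertools
import numpy as np

def hamming_loss(y_hat, z):
    """Delta(y_hat, z): fraction of incorrectly labeled voxels."""
    return float(np.mean(y_hat != z))

def min_slack(w, psi, x, z, candidates):
    """Smallest slack xi_k satisfying every margin constraint
    w^T Psi(x, y_hat) - w^T Psi(x, z) >= Delta(y_hat, z) - xi_k,
    checked exhaustively over a finite candidate set."""
    gt_energy = w @ psi(x, z)
    xi = 0.0
    for y_hat in candidates:
        violation = hamming_loss(y_hat, z) - (w @ psi(x, y_hat) - gt_energy)
        xi = max(xi, violation)
    return xi

# Hypothetical toy feature: a single squared-distance term.
psi = lambda x, y: np.array([np.sum((np.asarray(y) - x) ** 2)])
x = np.array([0.9, 0.8, 0.1])     # "acquisition" values for 3 voxels
z = np.array([1, 1, 0])           # ground-truth hard labels
w = np.array([1.0])
candidates = [np.array(c) for c in itertools.product([0, 1], repeat=3)]
xi = min_slack(w, psi, x, z, candidates)
```

Here the ground truth is well separated, so every constraint holds with zero slack. In the real model, the maximization over ŷ is loss-augmented inference rather than enumeration.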
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Hard vs. Soft Segmentation
Hard segmentation zk
Don’t require 0-1 probabilities
Hard vs. Soft Segmentation
Soft segmentation yk
Compatible with zk
Binarizing yk gives zk
Hard vs. Soft Segmentation
yk ∈ C(zk)
Soft segmentation yk
Compatible with zk
Which yk should we use?
The yk provided by the best parameters
But the best parameters are unknown
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Learning with Hard Segmentation
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – wTΨ(xk,zk) ≥ Δ(ŷ,zk) – ξk
Learning with Soft Segmentation
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – minyk ∈ C(zk) wTΨ(xk,yk) ≥ Δ(ŷ,zk) – ξk
Latent Support Vector Machine
Smola et al., 2005; Felzenszwalb et al., 2008; Yu et al., 2009
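The inner minimization can be illustrated directly: C(zk) contains the soft segmentations whose binarization recovers zk, and the latent completion picks its lowest-energy member. A toy sketch with hypothetical candidates and a one-term feature map:

```python
import numpy as np

def is_compatible(y_soft, z_hard, threshold=0.5):
    """y_soft is in C(z_hard): binarizing the probabilities recovers z_hard."""
    binarized = (np.asarray(y_soft) >= threshold).astype(int)
    return np.array_equal(binarized, z_hard)

def best_soft_segmentation(w, psi, x, z, candidates):
    """yk* = argmin over yk in C(zk) of w^T Psi(xk, yk): the latent
    completion step, done exhaustively here for illustration."""
    feasible = [y for y in candidates if is_compatible(y, z)]
    return min(feasible, key=lambda y: float(w @ psi(x, y)))

# Hypothetical toy setup: 2 voxels, one squared-distance feature.
psi = lambda x, y: np.array([np.sum((np.asarray(y) - x) ** 2)])
x = np.array([0.95, 0.05])
z = np.array([1, 0])
w = np.array([1.0])
candidates = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
y_star = best_soft_segmentation(w, psi, x, z, candidates)
```

The third candidate binarizes to the wrong labels and is excluded; of the two compatible ones, the lower-energy soft segmentation is selected.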
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Latent SVM
Difference-of-convex problem:
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – minyk ∈ C(zk) wTΨ(xk,yk) ≥ Δ(ŷ,zk) – ξk
Concave-Convex Procedure (CCCP)
Repeat until convergence:
1. Estimate soft segmentation: yk* = argminyk ∈ C(zk) wTΨ(xk,yk)
   (efficient optimization using dual decomposition)
2. Update parameters: minw Σk ξk + λ||w||2
   s.t. wTΨ(xk,ŷ) – wTΨ(xk,yk*) ≥ Δ(ŷ,zk) – ξk
   (convex optimization)
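The CCCP loop itself is just an alternation between latent completion and a convex solve. A generic sketch, with a degenerate toy problem standing in for the dual-decomposition and SSVM solvers used in practice; all names here are hypothetical:

```python
import numpy as np

def cccp(w0, data, complete, update, max_iter=50, tol=1e-6):
    """Alternate the two CCCP steps until the parameters stop changing.

    complete(w, x, z) -> yk* : latent completion (min over yk in C(zk))
    update(data, ys)  -> w   : convex parameter update (e.g. an SSVM solve)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        ys = [complete(w, x, z) for x, z in data]  # step 1: impute latents
        w_new = update(data, ys)                   # step 2: convex update
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

# Degenerate toy: completion ignores w and returns z itself, and the
# "convex update" is a least-squares fit to the completed labels.
data = [(np.array([1.0]), 1.0), (np.array([2.0]), 0.0)]
complete = lambda w, x, z: z

def update(data, ys):
    X = np.array([x for x, _ in data])     # (2, 1) design matrix
    t = np.array(ys)
    w_fit, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w_fit

w = cccp(np.array([0.0]), data, complete, update)
```

In the real model, `complete` is the dual-decomposition step and `update` an SSVM solve; CCCP guarantees monotonic descent but only a local optimum of the difference-of-convex objective.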
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Dataset
• 30 MRI volumes of thigh
• Dimensions: 224 x 224 x 100
• 4 muscle groups + background
• 80% for training, 20% for testing
Parameters
• 4 Laplacians
• 2 shape priors
• 1 appearance prior
Baudin et al., 2012; Grady, 2005
Baselines
• Hand-tuned parameters
• Structured-output SVM with a soft segmentation based on the signed distance transform
• Structured-output SVM with the hard segmentation
Results
Small but statistically significant improvement
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Loss-based Learning
x: Input a: Annotation
Loss-based Learning
x: Input   a: Annotation   h: Hidden information
a = “jumping”, h = “soft-segmentation”
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Small improvement using small medical dataset
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Large improvement using large vision dataset
Loss-based Learning
min Σk Δ(correct {ak,hk}, predicted {ak,hk})   [Output Mismatch]
hk modeled using a distribution
Kumar, Packer and Koller, ICML 2012
Inexpensive annotation
No experts required
Richer models can be learnt
Questions?