Loss-based Visual Learning with Weak Supervision
M. Pawan Kumar
Joint work with Pierre-Yves Baudin, Danny Goodman,
Puneet Kumar, Nikos Paragios, Noura Azzabou, Pierre Carlier
SPLENDID
Nikos Paragios, Equipe Galen, INRIA Saclay
Daphne Koller, DAGS, Stanford
Machine Learning: Weak Annotations, Noisy Annotations
Applications: Computer Vision, Medical Imaging
Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data
2 Visits from INRIA to Stanford, 1 Visit from Stanford to INRIA (ICML 2012)
3 Visits Planned (MICCAI 2013)
Medical Image Segmentation
MRI Acquisitions of the thigh
Segments correspond to muscle groups
Random Walks Segmentation
Probabilistic segmentation algorithm
Computationally efficient
Interactive segmentation
Automated shape prior driven segmentation
Interactive: L. Grady, 2006; automated: L. Grady, 2005; Baudin et al., 2012
Random Walks Segmentation
y(i,s): Probability that voxel ‘i’ belongs to segment ‘s’
x: Medical acquisition
miny E(x,y) = yT L(x) y + wshape ||y – y0||2
L(x): positive semi-definite Laplacian matrix
y0: shape prior on the segmentation
wshape: parameter of the RW algorithm, hand-tuned
The problem is convex in y
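Because the energy is convex in y, its minimizer is found by solving one linear system: setting the gradient to zero gives (L(x) + wshape I) y = wshape y0. A minimal sketch in NumPy, using a toy 4-voxel chain-graph Laplacian; all names and data here are illustrative, not the authors' implementation:

```python
import numpy as np

def random_walks_segmentation(L, y0, w_shape):
    """Minimize E(y) = y^T L y + w_shape * ||y - y0||^2.

    The energy is convex; setting the gradient to zero gives the
    linear system (L + w_shape * I) y = w_shape * y0.
    L       : (n, n) positive semi-definite graph Laplacian
    y0      : (n,) shape-prior probabilities for one segment
    w_shape : scalar weight of the shape prior
    """
    n = L.shape[0]
    A = L + w_shape * np.eye(n)
    return np.linalg.solve(A, w_shape * y0)

# Toy example: a 4-voxel chain graph with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian
y0 = np.array([1.0, 1.0, 0.0, 0.0])     # prior: first two voxels in segment
y = random_walks_segmentation(L, y0, w_shape=2.0)
```

The result is a soft segmentation: probabilities decay smoothly away from the voxels favored by the prior.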
Random Walks Segmentation
Several Laplacians: L(x) = Σα wα Lα(x)
Several shape and appearance priors: Σβ wβ ||y – yβ||2
Hand-tuning large number of parameters is onerous
Parameter Estimation
Learn the best parameters from training data
E(x,y) = Σα wα yT Lα(x) y + Σβ wβ ||y – yβ||2
Parameter Estimation
Learn the best parameters from training data
E(x,y) = wT Ψ(x,y)
w: the set of all parameters
Ψ(x,y): the joint feature vector of the input and output
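The point of the joint feature vector is that the combined energy is linear in the parameters: stacking the quadratic terms yT Lα y and the prior terms ||y – yβ||2 into Ψ(x,y) gives E(x,y) = wT Ψ(x,y). A small sketch; the stand-in "Laplacians" are illustrative only:

```python
import numpy as np

def joint_feature_vector(y, laplacians, priors):
    """Stack the energy terms so that E(x, y) = w^T Psi(x, y).

    laplacians : list of (n, n) Laplacian matrices L_alpha(x)
    priors     : list of (n,) shape/appearance priors y_beta
    """
    quad = [y @ L @ y for L in laplacians]
    prior = [np.sum((y - yb) ** 2) for yb in priors]
    return np.array(quad + prior)

# Hypothetical toy check: the weighted energy equals w^T Psi.
n = 3
rng = np.random.default_rng(0)
L1 = np.eye(n)          # stand-in "Laplacians", illustration only
L2 = 2 * np.eye(n)
y0 = np.zeros(n)
y = rng.random(n)
w = np.array([0.5, 1.5, 2.0])
psi = joint_feature_vector(y, [L1, L2], [y0])
energy = 0.5 * (y @ L1 @ y) + 1.5 * (y @ L2 @ y) + 2.0 * np.sum((y - y0) ** 2)
```

This linearity in w is exactly what makes structured-output learning applicable.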
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Supervised Learning
Dataset of segmented MRIs
For sample xk and voxel i:
zk(i,s) = 1 if s is the ground-truth segment of voxel i, 0 otherwise
What about the probabilistic segmentation?
Supervised Learning
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – wTΨ(xk,zk) ≥ Δ(ŷ,zk) – ξk, for all ŷ
(wTΨ(xk,zk): energy of the ground truth; wTΨ(xk,ŷ): energy of any other segmentation)
Δ(ŷ,zk) = Fraction of incorrectly labeled voxels
Taskar et al., 2003; Tsochantaridis et al., 2004
Structured-output Support Vector Machine
Supervised Learning
Convex with several efficient algorithms
No parameter setting yields a ‘hard’ segmentation
We only need a correct ‘soft’ probabilistic segmentation
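For a finite set of candidate segmentations, the SSVM margin constraints can be checked directly: the smallest feasible slack ξk is the largest violation Δ(ŷ,zk) – (wTΨ(xk,ŷ) – wTΨ(xk,zk)) over all ŷ. A toy sketch with a hypothetical one-term feature map, using exhaustive enumeration for illustration only:

```python
import itertools
import numpy as np

def hamming_loss(y_hat, z):
    """Delta(y_hat, z): fraction of incorrectly labeled voxels."""
    return float(np.mean(y_hat != z))

def min_slack(w, psi, x, z, candidates):
    """Smallest slack xi_k satisfying every margin constraint
    w^T Psi(x, y_hat) - w^T Psi(x, z) >= Delta(y_hat, z) - xi_k,
    checked exhaustively over a finite candidate set."""
    gt_energy = w @ psi(x, z)
    xi = 0.0
    for y_hat in candidates:
        violation = hamming_loss(y_hat, z) - (w @ psi(x, y_hat) - gt_energy)
        xi = max(xi, violation)
    return xi

# Hypothetical toy feature: a single squared-distance term.
psi = lambda x, y: np.array([np.sum((np.asarray(y) - x) ** 2)])
x = np.array([0.9, 0.8, 0.1])     # "acquisition" values for 3 voxels
z = np.array([1, 1, 0])           # ground-truth hard labels
w = np.array([1.0])
candidates = [np.array(c) for c in itertools.product([0, 1], repeat=3)]
xi = min_slack(w, psi, x, z, candidates)
```

Here the ground truth is well separated, so every constraint holds with zero slack. In the real model, the maximization over ŷ is loss-augmented inference rather than enumeration.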
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Hard vs. Soft Segmentation
Hard segmentation zk
Don’t require 0-1 probabilities
Hard vs. Soft Segmentation
Soft segmentation yk
Compatible with zk
Binarizing yk gives zk
Hard vs. Soft Segmentation
yk ∈ C(zk)
Soft segmentation yk
Compatible with zk
Which yk should we use?
The yk provided by the best parameters
But the best parameters are unknown
Outline
• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Learning with Hard Segmentation
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – wTΨ(xk,zk) ≥ Δ(ŷ,zk) – ξk
Learning with Soft Segmentation
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – minyk ∈ C(zk) wTΨ(xk,yk) ≥ Δ(ŷ,zk) – ξk
Latent Support Vector Machine
Smola et al., 2005; Felzenszwalb et al., 2008; Yu et al., 2009
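The inner minimization can be illustrated directly: C(zk) contains the soft segmentations whose binarization recovers zk, and the latent completion picks its lowest-energy member. A toy sketch with hypothetical candidates and a one-term feature map:

```python
import numpy as np

def is_compatible(y_soft, z_hard, threshold=0.5):
    """y_soft is in C(z_hard): binarizing the probabilities recovers z_hard."""
    binarized = (np.asarray(y_soft) >= threshold).astype(int)
    return np.array_equal(binarized, z_hard)

def best_soft_segmentation(w, psi, x, z, candidates):
    """yk* = argmin over yk in C(zk) of w^T Psi(xk, yk): the latent
    completion step, done exhaustively here for illustration."""
    feasible = [y for y in candidates if is_compatible(y, z)]
    return min(feasible, key=lambda y: float(w @ psi(x, y)))

# Hypothetical toy setup: 2 voxels, one squared-distance feature.
psi = lambda x, y: np.array([np.sum((np.asarray(y) - x) ** 2)])
x = np.array([0.95, 0.05])
z = np.array([1, 0])
w = np.array([1.0])
candidates = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
y_star = best_soft_segmentation(w, psi, x, z, candidates)
```

The third candidate binarizes to the wrong labels and is excluded; of the two compatible ones, the lower-energy soft segmentation is selected.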
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Latent SVM
Difference-of-convex problem:
minw Σk ξk + λ||w||2
s.t. wTΨ(xk,ŷ) – minyk ∈ C(zk) wTΨ(xk,yk) ≥ Δ(ŷ,zk) – ξk
Concave-Convex Procedure (CCCP)
Repeat until convergence:
1. Estimate soft segmentation: yk* = argminyk ∈ C(zk) wTΨ(xk,yk)
   (efficient optimization using dual decomposition)
2. Update parameters: minw Σk ξk + λ||w||2
   s.t. wTΨ(xk,ŷ) – wTΨ(xk,yk*) ≥ Δ(ŷ,zk) – ξk
   (convex optimization)
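The CCCP loop itself is just an alternation between latent completion and a convex solve. A generic sketch, with a degenerate toy problem standing in for the dual-decomposition and SSVM solvers used in practice; all names here are hypothetical:

```python
import numpy as np

def cccp(w0, data, complete, update, max_iter=50, tol=1e-6):
    """Alternate the two CCCP steps until the parameters stop changing.

    complete(w, x, z) -> yk* : latent completion (min over yk in C(zk))
    update(data, ys)  -> w   : convex parameter update (e.g. an SSVM solve)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        ys = [complete(w, x, z) for x, z in data]  # step 1: impute latents
        w_new = update(data, ys)                   # step 2: convex update
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

# Degenerate toy: completion ignores w and returns z itself, and the
# "convex update" is a least-squares fit to the completed labels.
data = [(np.array([1.0]), 1.0), (np.array([2.0]), 0.0)]
complete = lambda w, x, z: z

def update(data, ys):
    X = np.array([x for x, _ in data])     # (2, 1) design matrix
    t = np.array(ys)
    w_fit, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w_fit

w = cccp(np.array([0.0]), data, complete, update)
```

In the real model, `complete` is the dual-decomposition step and `update` an SSVM solve; CCCP guarantees monotonic descent but only a local optimum of the difference-of-convex objective.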
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Dataset
• 30 MRI volumes of thigh
• Dimensions: 224 x 224 x 100
• 4 muscle groups + background
• 80% for training, 20% for testing
Parameters
• 4 Laplacians
• 2 shape priors
• 1 appearance prior
Baudin et al., 2012; Grady, 2005
Baselines
• Hand-tuned parameters
• Structured-output SVM with a soft segmentation based on the signed distance transform
• Structured-output SVM with the hard segmentation
Results
Small but statistically significant improvement
Outline
• Parameter Estimation
• Optimization
• Experiments
• Related and Future Work in SPLENDID
Loss-based Learning
x: Input a: Annotation
Loss-based Learning
x: Input   a: Annotation   h: Hidden information
a = “jumping”, h = “soft-segmentation”
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Small improvement using small medical dataset
Loss-based Learning
min Σk Δ(correct ak, predicted ak)   [Annotation Mismatch]
Large improvement using large vision dataset
Loss-based Learning
min Σk Δ(correct {ak,hk}, predicted {ak,hk})   [Output Mismatch]
hk modeled using a distribution
Kumar, Packer and Koller, ICML 2012
Inexpensive annotation
No experts required
Richer models can be learnt
Questions?