Loss-based Visual Learning with Weak Supervision


M. Pawan Kumar

Joint work with Pierre-Yves Baudin, Danny Goodman,

Puneet Kumar, Nikos Paragios, Noura Azzabou, Pierre Carlier

SPLENDID

Nikos Paragios, Equipe Galen, INRIA Saclay

Daphne Koller, DAGS, Stanford

Machine Learning: Weak Annotations, Noisy Annotations

Applications: Computer Vision, Medical Imaging

Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data

2 Visits from INRIA to Stanford, 1 Visit from Stanford to INRIA

2012: ICML

3 Visits Planned

2013: MICCAI

Medical Image Segmentation

MRI Acquisitions of the thigh


Segments correspond to muscle groups

Random Walks Segmentation

Probabilistic segmentation algorithm

Computationally efficient

Interactive segmentation (L. Grady, 2006)

Automated shape-prior-driven segmentation (L. Grady, 2005; Baudin et al., 2012)

Random Walks Segmentation

y(i,s): Probability that voxel ‘i’ belongs to segment ‘s’

x: Medical acquisition

min_y E(x,y) = y^T L(x) y + w_shape ||y - y_0||^2

L(x): positive semi-definite Laplacian matrix

||y - y_0||^2: shape prior on the segmentation

w_shape: hand-tuned parameter of the RW algorithm

The objective is convex
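To make this concrete, here is a minimal NumPy sketch (not the authors' implementation) of the RW minimization: since the energy is a convex quadratic, setting its gradient to zero reduces the problem to a single linear system.

```python
import numpy as np

def random_walks_segmentation(L, y0, w_shape):
    """Minimize E(y) = y^T L y + w_shape * ||y - y0||^2.

    The gradient is 2*L@y + 2*w_shape*(y - y0); setting it to zero
    gives (L + w_shape*I) y = w_shape * y0, which has a unique
    solution because L is positive semi-definite and w_shape > 0.
    Dense toy version; real volumes need sparse solvers.
    """
    n = L.shape[0]
    return np.linalg.solve(L + w_shape * np.eye(n), w_shape * y0)
```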

Random Walks Segmentation

Several Laplacians

L(x) = Σ_α w_α L_α(x)

Several shape and appearance priors

Σ_β w_β ||y - y_β||^2

Hand-tuning a large number of parameters is onerous

Parameter Estimation

Learn the best parameters from training data

Σ_α w_α y^T L_α(x) y + Σ_β w_β ||y - y_β||^2

Rewriting the energy: w^T Ψ(x,y)

w is the set of all parameters

Ψ(x,y) is the joint feature vector of input and output
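As an illustration of how the energy becomes linear in w, the following hedged sketch stacks the per-Laplacian and per-prior terms into Ψ(x,y); the function name and the flattened-vector representation of y are assumptions, not the paper's code.

```python
import numpy as np

def joint_feature_vector(laplacians, priors, y):
    """Psi(x, y): one entry per Laplacian term y^T L_a(x) y and one
    per prior term ||y - y_b||^2, so that E(x, y) = w @ Psi(x, y).

    y is the soft segmentation flattened into a vector."""
    quad_terms = [y @ L @ y for L in laplacians]              # y^T L_a(x) y
    prior_terms = [np.sum((y - y_b) ** 2) for y_b in priors]  # ||y - y_b||^2
    return np.array(quad_terms + prior_terms)
```

The energy is then w @ joint_feature_vector(...), i.e. linear in the parameters, which is what makes the max-margin formulations below applicable.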

Outline

• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Supervised Learning

Dataset of segmented MRIs

Sample x_k, voxel i:

z_k(i,s) = 1 if s is the ground-truth segment, 0 otherwise

But the RW algorithm outputs a probabilistic segmentation, not a hard one

Supervised Learning

Energy of ground truth: w^T Ψ(x_k, z_k)

Energy of any segmentation ŷ: w^T Ψ(x_k, ŷ)

min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - w^T Ψ(x_k, z_k) ≥ Δ(ŷ, z_k) - ξ_k, for all ŷ

Δ(ŷ, z_k) = fraction of incorrectly labeled voxels

Taskar et al., 2003; Tsochantaridis et al., 2004

Structured-output Support Vector Machine

Supervised Learning

Convex with several efficient algorithms

But no parameter setting yields a 'hard' segmentation

We only need a correct 'soft' probabilistic segmentation
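A minimal sketch of the structured-output SVM slack for one sample. It assumes a small explicit candidate set of segmentations in place of the loss-augmented inference that the cited algorithms perform; names and array shapes are illustrative.

```python
import numpy as np

def fraction_mislabeled(y_hat, z):
    """Delta(y_hat, z): fraction of voxels whose predicted segment
    (argmax over segments) differs from the ground-truth label.

    y_hat, z: arrays of shape (num_voxels, num_segments)."""
    return np.mean(y_hat.argmax(axis=1) != z.argmax(axis=1))

def structured_hinge_slack(w, psi, x_k, z_k, candidates):
    """xi_k = max(0, max over y_hat of
    Delta(y_hat, z_k) - (w^T Psi(x_k, y_hat) - w^T Psi(x_k, z_k)))."""
    gt_energy = w @ psi(x_k, z_k)
    worst = max(
        fraction_mislabeled(y_hat, z_k) - (w @ psi(x_k, y_hat) - gt_energy)
        for y_hat in candidates
    )
    return max(0.0, worst)
```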

Outline

• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Hard vs. Soft Segmentation

Hard segmentation z_k

Don't require 0-1 probabilities

Hard vs. Soft Segmentation

Soft segmentation y_k

Compatible with z_k: binarizing y_k gives z_k

Hard vs. Soft Segmentation

y_k ∈ C(z_k): the set of soft segmentations compatible with z_k

Which y_k should we use?

The y_k provided by the best parameters

But the best parameters are unknown
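One way to read the compatibility constraint y_k ∈ C(z_k): binarizing y_k, i.e. assigning each voxel its most probable segment, must recover z_k. A hedged sketch of that check:

```python
import numpy as np

def is_compatible(y, z):
    """y in C(z): for every voxel, the most probable segment under the
    soft segmentation y matches the hard ground-truth label in z.

    y, z: arrays of shape (num_voxels, num_segments); z is 0-1."""
    return np.array_equal(y.argmax(axis=1), z.argmax(axis=1))
```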

Outline

• Parameter Estimation
  – Supervised Learning
  – Hard vs. Soft Segmentation
  – Mathematical Formulation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Learning with Hard Segmentation

min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - w^T Ψ(x_k, z_k) ≥ Δ(ŷ, z_k) - ξ_k

Learning with Soft Segmentation

min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - w^T Ψ(x_k, y_k) ≥ Δ(ŷ, z_k) - ξ_k

Learning with Soft Segmentation

min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - min_{y_k ∈ C(z_k)} w^T Ψ(x_k, y_k) ≥ Δ(ŷ, z_k) - ξ_k

Smola et al., 2005; Felzenszwalb et al., 2008; Yu et al., 2009

Latent Support Vector Machine

Outline

• Parameter Estimation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Latent SVM

Difference-of-convex problem

min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - min_{y_k ∈ C(z_k)} w^T Ψ(x_k, y_k) ≥ Δ(ŷ, z_k) - ξ_k

Concave-Convex Procedure (CCCP)

CCCP

Repeat until convergence:

Estimate soft segmentation: y_k* = argmin_{y_k ∈ C(z_k)} w^T Ψ(x_k, y_k)

Efficient optimization using dual decomposition

Update parameters: min_w Σ_k ξ_k + λ||w||^2

s.t. w^T Ψ(x_k, ŷ) - w^T Ψ(x_k, y_k*) ≥ Δ(ŷ, z_k) - ξ_k

Convex optimization
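Schematically, CCCP alternates the two steps above until the objective stops decreasing. In this hedged sketch, impute_soft_segmentation and solve_structural_svm are hypothetical stand-ins for the dual-decomposition and convex solvers mentioned on the slide:

```python
def cccp(w_init, samples, impute_soft_segmentation, solve_structural_svm,
         max_iters=50, tol=1e-4):
    """Alternate between imputing the latent soft segmentations and
    solving the resulting convex structured SVM in w.

    samples: list of (x_k, z_k) pairs; the two callables are
    hypothetical stand-ins for the solvers described in the talk."""
    w, prev_obj = w_init, float("inf")
    for _ in range(max_iters):
        # Step 1: y_k* = argmin over y_k in C(z_k) of w^T Psi(x_k, y_k)
        imputed = [impute_soft_segmentation(w, x_k, z_k)
                   for x_k, z_k in samples]
        # Step 2: update w with y_k* held fixed (a convex problem)
        w, obj = solve_structural_svm(samples, imputed)
        if prev_obj - obj < tol:  # CCCP monotonically decreases the objective
            break
        prev_obj = obj
    return w
```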

Outline

• Parameter Estimation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Dataset

30 MRI volumes of the thigh

Dimensions: 224 x 224 x 100

4 muscle groups + background

80% for training, 20% for testing

Parameters

4 Laplacians

2 shape priors

1 appearance prior

Baudin et al., 2012; Grady, 2005

Baselines

Hand-tuned parameters

Structured-output SVM trained with:
  – Hard segmentation
  – Soft segmentation based on signed distance transform

Results

Small but statistically significant improvement

Outline

• Parameter Estimation

• Optimization

• Experiments

• Related and Future Work in SPLENDID

Loss-based Learning

x: input, a: annotation, h: hidden information

Example: a = “jumping”, h = “soft-segmentation”

Annotation mismatch: min Σ_k Δ(correct a_k, predicted a_k)


Small improvement using a small medical dataset


Large improvement using a large vision dataset

Output mismatch: min Σ_k Δ(correct {a_k, h_k}, predicted {a_k, h_k}), with the correct {a_k, h_k} modeled using a distribution

Kumar, Packer and Koller, ICML 2012
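To make the two criteria concrete, a toy sketch: annotation mismatch compares annotations only, while output mismatch compares annotation and hidden information jointly. The 0-1 losses and the equal weighting are illustrative placeholders, not the loss used in the paper:

```python
def annotation_mismatch(correct_a, predicted_a):
    """Delta over annotations only (0-1 loss as a placeholder)."""
    return 0.0 if predicted_a == correct_a else 1.0

def output_mismatch(correct_a, correct_h, predicted_a, predicted_h):
    """Delta over the joint output {a, h}; 0-1 losses and equal
    weighting are placeholders.  The talk instead models the correct
    {a_k, h_k} with a distribution (Kumar, Packer and Koller, 2012)."""
    return 0.5 * (annotation_mismatch(correct_a, predicted_a)
                  + (0.0 if predicted_h == correct_h else 1.0))
```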

Inexpensive annotation

No experts required

Richer models can be learnt

Questions?
