Generative Models for Image Understanding


Nebojsa Jojic and Thomas Huang
Beckman Institute and ECE Dept., University of Illinois

Problem: Summarization of High Dimensional Data

• Pattern Analysis: for several classes c = 1, ..., C of the data, define probability distribution functions p(x|c)

• Compression: define a probabilistic model p(x) and devise an optimal coding approach

• Video Summary: drop most of the frames in a video sequence and keep interesting information that summarizes it.

Generative density modeling

• Find a probability model that
  – reflects desired structure
  – randomly generates plausible images
  – represents the data by parameters

• ML estimation

• p(image|class) used for recognition, detection, ...

Problems we attacked

• Transformation as a discrete variable in generative models of intensity images
• Tracking articulated objects in dense stereo maps
• Unsupervised learning for video summary

• Idea - the structure of the generative model reveals the interesting objects we want to extract.

Mixture of Gaussians

[Graphical model: class c → image z]

The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μc, Φc)

P(c) = πc

Mixture of Gaussians

c: P(c) = πc
z: p(z|c) = N(z; μc, Φc)

• Parameters πc, μc and Φc represent the data
• For input z, the cluster responsibilities are

P(c|z) = p(z|c)P(c) / Σc' p(z|c')P(c')
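A minimal sketch (in NumPy, not the authors' MATLAB code) of computing these responsibilities for diagonal-covariance clusters; all names here are illustrative:

    import numpy as np

    def log_gaussian_diag(z, mu, var):
        # log N(z; mu, diag(var)) for one image z flattened to a vector
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

    def responsibilities(z, mus, variances, priors):
        # P(c|z) = p(z|c)P(c) / sum_c p(z|c)P(c), computed in the log domain
        log_joint = np.array([
            np.log(priors[c]) + log_gaussian_diag(z, mus[c], variances[c])
            for c in range(len(priors))
        ])
        log_joint -= log_joint.max()  # subtract the max for numerical stability
        post = np.exp(log_joint)
        return post / post.sum()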

Example: Simulation

[Figure: sample cluster c = 1 from P(c) = πc, then draw the image z from p(z|c) = N(z; μc, Φc)]

π1 = 0.6, π2 = 0.4

Example: Simulation

[Figure: sample cluster c = 2 from P(c) = πc, then draw the image z from p(z|c) = N(z; μc, Φc)]

π1 = 0.6, π2 = 0.4

Example: Learning - E step

[Figure: an image z from the data set and the current cluster means; responsibilities P(c=1|z) = 0.52, P(c=2|z) = 0.48]

π1 = 0.5, π2 = 0.5

Example: Learning - E step

[Figure: another image z from the data set; responsibilities P(c=1|z) = 0.48, P(c=2|z) = 0.52]

π1 = 0.5, π2 = 0.5

Example: Learning - M step

π1 = 0.5, π2 = 0.5

Set μ1 to the average of z P(c=1|z)
Set μ2 to the average of z P(c=2|z)

Example: Learning - M step

π1 = 0.5, π2 = 0.5

Set Φ1 to the average of diag((z − μ1)(z − μ1)ᵀ) P(c=1|z)
Set Φ2 to the average of diag((z − μ2)(z − μ2)ᵀ) P(c=2|z)

(both M-step updates are sketched in code below)
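A sketch of the corresponding M step, assuming Z holds one flattened image per row (shape T x N) and R holds the E-step responsibilities P(c|z) (shape T x C); this is my own illustration, not the authors' code:

    import numpy as np

    def m_step(Z, R):
        Nc = R.sum(axis=0)                 # effective count per cluster
        priors = Nc / Z.shape[0]           # new pi_c
        mus = (R.T @ Z) / Nc[:, None]      # new mu_c: responsibility-weighted means
        variances = np.empty_like(mus)     # new diagonal Phi_c
        for c in range(R.shape[1]):
            diff = Z - mus[c]
            variances[c] = (R[:, c, None] * diff ** 2).sum(axis=0) / Nc[c]
        return priors, mus, variances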

Transformation as a Discrete Latent Variable

with Brendan J. Frey
Computer Science, University of Waterloo, Canada
Beckman Institute & ECE, Univ. of Illinois at Urbana

Kind of data we’re interested in

Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...

One approach: Images → Normalization (labor) → Normalized images → Pattern Analysis

Our approach: Images → Joint Normalization and Pattern Analysis

• A continuous transformation moves an image along a continuous curve

• Our subspace model should assign images near this nonlinear manifold to the same point in the subspace

What transforming an image does in the vector space of pixel intensities

Tractable approaches to modeling the transformation manifold

• Linear approximation - good locally

• Discrete approximation - good globally

Adding “transformation” as a discrete latent variable

• Say there are N pixels

• We assume we are given a set of sparse N x N transformation generating matrices G1,…,Gl ,…,GL

• These generate points Gl z from a point z (one way to build such matrices is sketched below)
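One way such sparse transformation matrices could be built for integer shifts of an H x W image, sketched with SciPy; the wraparound boundary and the specific image size are my assumptions, not taken from the slides:

    import numpy as np
    from scipy.sparse import coo_matrix

    def shift_matrix(H, W, dy, dx):
        # G such that (G @ z) is the flattened image z shifted by (dy, dx), with wraparound
        N = H * W
        rows, cols = [], []
        for y in range(H):
            for x in range(W):
                rows.append(((y + dy) % H) * W + (x + dx) % W)
                cols.append(y * W + x)
        return coo_matrix((np.ones(N), (rows, cols)), shape=(N, N)).tocsr()

    # e.g. G1 = shift left and up, G2 = identity, G3 = shift right and up
    G = [shift_matrix(28, 44, -1, -1), shift_matrix(28, 44, 0, 0), shift_matrix(28, 44, -1, 1)]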

Transformed Mixture of Gaussians

[Graphical model: c → z → x ← l]

P(c) = πc, p(z|c) = N(z; μc, Φc)
P(l) = ρl, p(x|z,l) = N(x; Gl z, Ψ)

• Parameters ρl, πc, μc and Φc represent the data
• The cluster/transformation responsibilities P(c,l|x) are quite easy to compute (see the sketch below)
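A rough sketch of computing P(c,l|x), assuming diagonal Φc and Ψ and permutation-like shift matrices G (as built above), so that Gl Φc Glᵀ stays diagonal; this illustrates why the responsibilities are cheap and is not the authors' implementation:

    import numpy as np

    def tmg_responsibilities(x, G, mus, Phis, Psi, pi, rho):
        # P(c,l|x) is proportional to P(c) P(l) N(x; Gl mu_c, Gl Phi_c Gl^T + Psi)
        C, L = len(pi), len(rho)
        log_post = np.empty((C, L))
        for c in range(C):
            for l in range(L):
                mean = G[l] @ mus[c]
                var = G[l] @ Phis[c] + Psi    # permuted diagonal + diagonal noise
                log_post[c, l] = (np.log(pi[c]) + np.log(rho[l])
                                  - 0.5 * np.sum(np.log(2 * np.pi * var)
                                                 + (x - mean) ** 2 / var))
        log_post -= log_post.max()
        post = np.exp(log_post)
        return post / post.sum()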

Example: Simulation

G1 = shift left and up, G2 = I, G3 = shift right and up

[Figure: sample c = 1 and l = 1, draw z from p(z|c), then generate x = G1 z plus noise]

ML estimation of a Transformed Mixture of Gaussians using EM

[Graphical model: c → z → x ← l]

• E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data (a sketch of this posterior follows below)
• M step: set
  – πc = average of P(c|x)
  – ρl = average of P(l|x)
  – μc = average mean of p(z|c,x)
  – Φc = average variance of p(z|c,x)
  – Ψ = average variance of p(x − Gl z | x)
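Under the same diagonal/permutation assumptions as above, the per-(c, l) posterior over the latent image z used in the E step has a simple closed form; a hedged sketch, with names of my own choosing:

    import numpy as np

    def posterior_z(x, Gl, mu_c, phi_c, psi):
        # Gaussian posterior p(z | x, c, l): combine the prior N(mu_c, Phi_c)
        # with the transformed likelihood N(x; Gl z, Psi)
        prec = 1.0 / phi_c + Gl.T @ (1.0 / psi)   # diagonal posterior precision
        var = 1.0 / prec
        mean = var * (mu_c / phi_c + Gl.T @ (x / psi))
        return mean, var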

Face Clustering

Examples of 400 outdoor images of 2 people (44 x 28 pixels)

Mixture of Gaussians: 15 iterations of EM (MATLAB takes 1 minute)
[Figure: cluster means for c = 1, c = 2, c = 3, c = 4]

Transformed mixture of Gaussians: 30 iterations of EM
[Figure: cluster means for c = 1, c = 2, c = 3, c = 4]

Video Analysis Using Generative Models

with Brendan Frey, Nemanja Petrovic and Thomas Huang

Idea

• Use generative models of video sequences to do unsupervised learning

• Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.

Transformed Hidden Markov Model

[Graphical model: two time slices, t−1 and t, each with c → z → x ← l; the class and transformation at time t depend on the past through P(c,l|past)]

THMM Transition Models

• Independent distributions for class and transformation; relative motion:
  P(ct, lt | past) = P(ct | ct−1) P(d(lt, lt−1))
• Relative motion dependent on the class (sketched in code below):
  P(ct, lt | past) = P(ct | ct−1) P(d(lt, lt−1) | ct)
• Autoregressive model for the transformation distribution
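A sketch of the class-dependent relative-motion transition model, assuming the transformations index shifts on an S x S grid so that d(lt, lt−1) is just the shift difference; the names A_class and motion_pmf are illustrative assumptions:

    import numpy as np

    def thmm_transition(A_class, motion_pmf, S):
        # T[c_prev, l_prev, c, l] = P(c | c_prev) * P(d(l, l_prev) | c)
        C, L = A_class.shape[0], S * S
        T = np.zeros((C, L, C, L))
        for lp in range(L):
            yp, xp = divmod(lp, S)
            for l in range(L):
                y, x = divmod(l, S)
                for cp in range(C):
                    for c in range(C):
                        T[cp, lp, c, l] = A_class[cp, c] * motion_pmf(y - yp, x - xp, c)
        return T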

Inference in THMM

• Tasks:
  – Find the most likely state at time t given the whole observed sequence {xt} and the model parameters (class means and variances, transition probabilities, etc.)
  – Find the distribution over states for each time t
  – Find the most likely state sequence (see the Viterbi sketch below)
  – Learn the parameters that maximize the likelihood of the observed data
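For the most likely state sequence, standard Viterbi decoding over the composite state s = (c, l) applies; a minimal sketch, assuming the flattened (C·L) x (C·L) transition matrix and the per-frame emission log-likelihoods have already been computed:

    import numpy as np

    def viterbi(init, trans, loglik):
        # init: (S,) initial distribution; trans: (S, S); loglik: (T, S)
        T, S = loglik.shape
        log_trans = np.log(trans)
        delta = np.log(init) + loglik[0]
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans      # scores[prev, cur]
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + loglik[t]
        path = np.empty(T, dtype=int)
        path[-1] = int(delta.argmax())
        for t in range(T - 1, 0, -1):
            path[t - 1] = back[t, path[t]]
        return path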

Video Summary and Filtering

[Graphical model: c → z → x ← l, with p(z|c) = N(z; μc, Φc) and p(x|z,l) = N(x; Gl z, Ψ)]

• Video summary
• Image segmentation
• Removal of sensor noise
• Image stabilization

Example: Learning

DATA: hand-held camera, moving subject, cluttered background

• 1 class, 121 translations (11 vertical and 11 horizontal shifts)
• 5 classes
[Figure: learned class means]

Examples

• Normalized sequence

• Simulated sequence

• De-noising

• Seeing through distractions

Future work

• Fast approximate learning and inference

• Multiple layers

• Learning transformations from images

Nebojsa Jojic: www.ifp.uiuc.edu/~jojic

Subspace models of images

Example: image z ∈ R^1200, z = f(y), y ∈ R^2

[Figure: a 2-D subspace with axes "Frown" and "Shut eyes" mapping to face images]

The density of pixel intensities z given subspace point y is p(z|y) = N(z; μ + Λy, Φ)

p(y) = N(y; 0, I)

Factor analysis (generative PCA)

Manifold: f(y) = μ + Λy, linear

• Parameters μ, Λ represent the manifold
• Observing z induces a Gaussian p(y|z) (a code sketch follows the figure below):

COV[y|z] = (I + Λᵀ Φ⁻¹ Λ)⁻¹
E[y|z] = COV[y|z] Λᵀ Φ⁻¹ (z − μ)

[Graphical model: y → z, with p(y) = N(y; 0, I) and p(z|y) = N(z; μ + Λy, Φ)]
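A small sketch of these posterior formulas for diagonal Φ (Lambda is N x K, phi is the diagonal of Φ); the names are illustrative:

    import numpy as np

    def fa_posterior(z, mu, Lambda, phi):
        K = Lambda.shape[1]
        LtPinv = Lambda.T / phi                              # Lambda^T Phi^{-1}
        cov = np.linalg.inv(np.eye(K) + LtPinv @ Lambda)     # COV[y|z]
        mean = cov @ (LtPinv @ (z - mu))                     # E[y|z]
        return mean, cov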

Example: Simulation

[Figure, shown over three animation steps: the loading directions for "Frown" and "Shut eyes", a point y sampled from p(y) = N(y; 0, I) in that plane, and the image z drawn from p(z|y) = N(z; μ + Λy, Φ)]

Transformed Component Analysis

[Graphical model: y → z → x ← l]

p(y) = N(y; 0, I)
p(z|y) = N(z; μ + Λy, Φ)
P(l) = ρl
The probability of the observed image x is p(x|z,l) = N(x; Gl z, Ψ)
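Ancestral sampling from this generative model is straightforward; a minimal illustrative sketch (phi and psi stand for the diagonals of Φ and Ψ, G for the list of transformation matrices):

    import numpy as np

    def sample_tca(mu, Lambda, phi, psi, rho, G, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        y = rng.standard_normal(Lambda.shape[1])             # p(y) = N(0, I)
        z = mu + Lambda @ y + rng.normal(0, np.sqrt(phi))    # p(z|y) = N(mu + Lambda y, Phi)
        l = rng.choice(len(rho), p=rho)                      # P(l) = rho_l
        x = G[l] @ z + rng.normal(0, np.sqrt(psi))           # p(x|z,l) = N(Gl z, Psi)
        return y, z, l, x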

Example: Simulation

G1 = shift left & up, G2 = I, G3 = shift right & up

[Figure: sample y in the (Frown, Shut-eyes) plane, generate z from p(z|y), pick l = 3, and generate x = G3 z plus noise]

Example: Inference

G1 = shift left & up, G2 = I, G3 = shift right & up

[Figure: for an observed x, the inferred z and y under each candidate transformation l = 1, 2, 3; the wrong transformations explain the image as garbage, and P(l|x) concentrates on the correct shift]

EM algorithm for TCA

• Initialize μ, Λ, Φ, Ψ, ρ to random values
• E Step:
  – For each training case x(t), infer q(t)(l,z,y) = p(l,z,y | x(t)) (a sketch of the P(l|x) part follows below)
• M Step:
  – Compute μnew, Λnew, Φnew, Ψnew, ρnew to maximize Σt E[log p(y) p(z|y) P(l) p(x(t)|z,l)], where E[·] is with respect to q(t)(l,z,y)
• Each iteration increases log p(Data)
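One piece of the E step, P(l|x), can be sketched directly from the marginal of x under each transformation. This brute-force dense version is only an illustration under the diagonal Φ, Ψ assumption; it ignores the efficiency tricks a real implementation would need:

    import numpy as np
    from scipy.stats import multivariate_normal

    def tca_transformation_posterior(x, G, mu, Lambda, phi, psi, rho):
        # Under l, x is Gaussian with mean Gl mu and covariance Gl (Lambda Lambda^T + Phi) Gl^T + Psi
        C = Lambda @ Lambda.T + np.diag(phi)
        logp = np.empty(len(G))
        for l, Gl in enumerate(G):
            Gd = Gl.toarray() if hasattr(Gl, "toarray") else Gl
            cov = Gd @ C @ Gd.T + np.diag(psi)
            logp[l] = np.log(rho[l]) + multivariate_normal.logpdf(x, Gd @ mu, cov)
        logp -= logp.max()
        p = np.exp(logp)
        return p / p.sum()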

A tough toy problem

• 144 9 x 9 images
• 1 shape (pyramid)
• 3-D lighting
• cluttered background
• 25 possible locations

1st 8 principal components:

TCA:

• 3 components
• 81 transformations (9 horizontal shifts x 9 vertical shifts)
• 10 iterations of EM
• Model generates realistic examples

[Figure: learned components Λ1, Λ2, Λ3]

Expression modeling

• 100 16 x 24 training images

• variation in expression

• imperfect alignment

PCA: mean + first 10 principal components

Factor Analysis: mean + 10 factors after 70 iterations of EM

TCA: mean + 10 factors after 70 iterations of EM

[Figures: fantasies from the FA model vs. fantasies from the TCA model]

Modeling handwritten digits

• 200 8 x 8 images of each digit

• preprocessing normalizes vert/horiz translation and scale

• different writing angles (shearing) - see “7”

TCA: 29 shearing + translation combinations, 10 components per digit, 30 iterations of EM per digit

[Figures: mean of each digit and the transformed means]

FA: Mean + 10 components per digit

TCA: Mean + 10 components per digit

Classification Performance

• Training: 200 cases/digit, 20 components, 50 EM iterations
• Testing: 1000 cases, p(x|class) used for classification (the rule is sketched below)
• Results:

  Method                              Error rate
  k-nearest neighbors (optimized k)   7.6%
  Factor analysis                     3.2%
  Transformed component analysis      2.7%

• Bonus: P(l|x) infers the writing angle!
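The classification rule itself is just a likelihood comparison across the per-digit models; a tiny sketch, where the per-class log-likelihood functions are assumed to be supplied by the trained models:

    import numpy as np

    def classify(x, class_loglik_fns):
        # class_loglik_fns[c](x) returns log p(x | class c); pick the argmax class
        scores = np.array([f(x) for f in class_loglik_fns])
        return int(scores.argmax())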

Wrap-up

• Papers, MATLAB scripts:
  www.ifp.uiuc.edu/~jojic
  www.cs.uwaterloo.ca/~frey
• Other domains: audio, bioinformatics, …
• Other latent image models, p(z):
  – mixtures of factor analyzers (NIPS99)
  – layers, multiple objects, occlusions
  – time series (in preparation)

Wrap-up

• Discrete + linear combination: set some components equal to derivatives of the mean with respect to the transformations

• Multiresolution approach

• Fast variational methods, belief propagation,...

Other generative models

• Modeling human appearance in stereo images: articulated, self-occluding Gaussians
