
Unsupervised Object Segmentation by Redrawing

Mickaël Chen1, Thierry Artières2,3 and Ludovic Denoyer4

1 Sorbonne Université, CNRS, LIP6, F-75005, Paris, France
2 Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
3 Ecole Centrale Marseille
4 Facebook Artificial Intelligence Research

We present ReDO (ReDrawing Objects): an unsupervised, data-driven object segmentation method for real images. We assume that natural image generation is a composite process in which each object is generated independently. Object segmentation is then the discovery of regions that can be redrawn without seeing the rest of the image.

Image Composition Model

We consider the following underlying generative process G, which produces images in three steps.

1. Define the position of the different regions, i.e. the global structure of the image, by sampling n region masks.

$$M \sim p(M), \quad M \in \{0, 1\}^{n \times W \times H}, \quad \sum_{k=1}^{n} M^k_{x,y} = 1$$

2. Generate the contents of each region independently.

$$V^k \leftarrow G_k(M^k, z_k), \quad z_k \sim p(z) \quad \text{for } k \in \{1, \dots, n\}$$

3. Aggregate the resulting regions into the final image.

$$G(M, z_1, \dots, z_n) = \sum_{k=1}^{n} M^k \odot V^k$$
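The three steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: `sample_masks` draws a uniformly random partition in place of the unknown prior p(M), and `toy_region_generator` is a hypothetical stand-in for a learned generator G_k that simply paints a flat color given by z_k.

```python
import numpy as np

def sample_masks(rng, n, width, height):
    """Sample a hard partition: each pixel belongs to exactly one of n regions."""
    labels = rng.integers(0, n, size=(width, height))
    # One-hot encoding over the region axis, so the masks sum to 1 at every pixel.
    return (labels[None, :, :] == np.arange(n)[:, None, None]).astype(np.float64)

def toy_region_generator(mask, z):
    """Hypothetical stand-in for G_k: a flat image whose channel values are z."""
    return np.broadcast_to(z[:, None, None], (z.shape[0],) + mask.shape)

def compose(masks, z_list, generators):
    """G(M, z_1..z_n) = sum_k M^k * V^k, with V^k = G_k(M^k, z_k)."""
    regions = [g(m, z) for g, m, z in zip(generators, masks, z_list)]
    return sum(m[None, :, :] * v for m, v in zip(masks, regions))

rng = np.random.default_rng(0)
n, C, W, H = 2, 3, 4, 5
masks = sample_masks(rng, n, W, H)               # shape (n, W, H)
z_list = [rng.normal(size=C) for _ in range(n)]  # one noise vector per region
image = compose(masks, z_list, [toy_region_generator] * n)  # shape (C, W, H)
```

Because the masks are one-hot over the region axis, every pixel of `image` is drawn by exactly one region generator.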

Towards Learning To Segment

We replace step 1 with a segmentation function F that produces a mask from an input image $I \in \mathbb{R}^{C \times W \times H}$.

1. Obtain the mask using a segmentation function F.

$$M \leftarrow F(I), \quad M \in [0, 1]^{n \times W \times H}, \quad \sum_{k=1}^{n} M^k_{x,y} = 1$$

2. Generate the contents of each region independently.

$$V^k \leftarrow G_k(M^k, z_k), \quad z_k \sim p(z) \quad \text{for } k \in \{1, \dots, n\}$$

3. Aggregate the resulting regions into the final image.

$$G_F(I, z_1, \dots, z_n) = \sum_{k=1}^{n} M^k \odot V^k$$

We can train this model end-to-end using an adversarial loss to match the distribution of generated images to the dataset distribution. On its own, however, this training would naturally converge to trivial and uninformative solutions.
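The mask constraint in step 1 (values in [0, 1] that sum to 1 over regions at each pixel) can be met, for instance, by applying a softmax over the region axis. The sketch below assumes this choice; the poster does not state how F enforces the constraint, and `toy_segmenter` is a hypothetical brightness-based stand-in for a learned F.

```python
import numpy as np

def softmax_masks(logits):
    """Map per-region scores of shape (n, W, H) to soft masks that sum to 1 over regions."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def toy_segmenter(image, n):
    """Hypothetical stand-in for F: region k is favored where the pixel
    brightness is close to the k-th of n evenly spaced levels in [0, 1]."""
    brightness = image.mean(axis=0)                     # shape (W, H)
    levels = np.linspace(0.0, 1.0, n)[:, None, None]    # shape (n, 1, 1)
    return softmax_masks(-10.0 * (brightness[None, :, :] - levels) ** 2)

# A 2x2 image whose left column is dark and right column is bright.
image = np.zeros((3, 2, 2))
image[:, :, 1] = 1.0
masks = toy_segmenter(image, n=2)   # shape (2, 2, 2)
```

With this input, the first mask is close to 1 on the dark column and the second mask is close to 1 on the bright column.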

Conservation of Information

Problem 1: Mapping all pixels to one region is a trivial but valid solution.

Figure 1: In this example, the input is ignored by the segmentation function and region 2 is responsible for the whole image. The model has collapsed into a standard GAN.

Solution: We add a learned function δ that tries to reconstruct the noise vectors zk from the generated image.

Figure 2: As region 1 does not contribute, z1 cannot be retrieved from the generated image.
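The argument can be checked on a toy example: when one region's mask is empty, the composed image does not change with that region's noise vector, so no function δ could recover it. The sketch below uses hypothetical flat-color regions in place of learned generators; note that Python index 0 corresponds to region 1 in the figures.

```python
import numpy as np

def flat_region(z, width, height):
    """Hypothetical region content: a flat image whose channel values are z."""
    return np.broadcast_to(z[:, None, None], (z.shape[0], width, height))

def compose(masks, regions):
    """Sum of mask-weighted regions, as in the composition model."""
    return sum(m[None, :, :] * v for m, v in zip(masks, regions))

W, H = 4, 4
rng = np.random.default_rng(0)
z2 = rng.normal(size=3)
z1_a, z1_b = rng.normal(size=3), rng.normal(size=3)  # two different draws of z1

# Collapsed segmentation: region 1 is empty, region 2 covers the whole image.
collapsed = np.stack([np.zeros((W, H)), np.ones((W, H))])
same_when_collapsed = np.array_equal(
    compose(collapsed, [flat_region(z1_a, W, H), flat_region(z2, W, H)]),
    compose(collapsed, [flat_region(z1_b, W, H), flat_region(z2, W, H)]),
)

# Non-trivial segmentation: region 1 covers the left half of the image.
split = np.zeros((2, W, H))
split[0, :, : H // 2] = 1.0
split[1, :, H // 2 :] = 1.0
same_when_split = np.array_equal(
    compose(split, [flat_region(z1_a, W, H), flat_region(z2, W, H)]),
    compose(split, [flat_region(z1_b, W, H), flat_region(z2, W, H)]),
)
```

Here `same_when_collapsed` is true while `same_when_split` is false: only in the second case does the image carry information about z1, which is what the reconstruction term rewards.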

Redrawing a Single Region

Problem 2: The segmentation function can ignore the input.

Figure 3: In this example, the segmentation function chooses regions that are meaningless with respect to the input. The generator can still generate a perfectly fine image.

Solution: We tie the output to the input by regenerating only one region at a time, keeping the rest of the image the same.

Figure 4: Given a wrong segmentation, the generator cannot produce a consistent image.
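One way to read "regenerating only one region at a time" is as a blend between the generated region and the untouched input. The sketch below assumes that form; `redraw_region` and `flat_generator` are hypothetical names, and the generator is a flat-color stand-in rather than a learned network.

```python
import numpy as np

def redraw_region(image, masks, i, z_i, generator):
    """Regenerate region i only; pixels outside mask i are copied from the input."""
    m = masks[i][None, :, :]
    return m * generator(masks[i], z_i) + (1.0 - m) * image

def flat_generator(mask, z):
    """Hypothetical stand-in for G_i: a flat image whose channel values are z."""
    return np.broadcast_to(z[:, None, None], (z.shape[0],) + mask.shape)

rng = np.random.default_rng(0)
image = rng.uniform(size=(3, 4, 4))   # input I, shape (C, W, H)
masks = np.zeros((2, 4, 4))
masks[0, :, :2] = 1.0                 # region 0: left half
masks[1, :, 2:] = 1.0                 # region 1: right half
z = np.array([0.1, 0.2, 0.3])
redrawn = redraw_region(image, masks, 0, z, flat_generator)
```

The right half of `redrawn` is identical to the input, so a mask that cuts across an object forces the generator to match content it cannot see.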

Learning the full model for object segmentation

[Diagram labels: input I, f, inferred mask M1, inferred mask M2, z1, G1, generated region, generated image, ẑ1, D, "Real or Fake?"]

Figure 5: The ReDO pipeline. Learned neural networks are represented in bold colored lines.

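Objective functions: We use the hinge version of the adversarial loss.

$$\max_{G_F, \delta} \mathcal{L}_G = \mathbb{E}_{I \sim p_{data},\, i \sim \mathcal{U}(n),\, z_i \sim p(z)} \left[ D(G_F(I, z_i, i)) - \lambda_z \left\| \delta_i(G_F(I, z_i, i)) - z_i \right\|_2 \right]$$

$$\max_{D} \mathcal{L}_D = \mathbb{E}_{I \sim p_{data}} \left[ \min(0, -1 + D(I)) \right] + \mathbb{E}_{I \sim p_{data},\, i \sim \mathcal{U}(n),\, z_i \sim p(z)} \left[ \min(0, -1 - D(G_F(I, z_i, i))) \right]$$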
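As a rough numerical illustration, the two objectives can be evaluated on a batch of discriminator scores and reconstructed noise vectors. The function names, the batch values and the weight `lambda_z=0.5` below are made up for the example; the expectations are replaced by batch means.

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """L_D = E[min(0, -1 + D(I))] + E[min(0, -1 - D(G_F(I, z_i, i)))], to be maximized."""
    real_term = np.minimum(0.0, -1.0 + d_real).mean()
    fake_term = np.minimum(0.0, -1.0 - d_fake).mean()
    return real_term + fake_term

def generator_objective(d_fake, z_hat, z, lambda_z):
    """L_G = E[D(fake) - lambda_z * ||delta_i(fake) - z_i||_2], to be maximized."""
    recon_error = np.linalg.norm(z_hat - z, axis=1)  # one L2 norm per sample
    return (d_fake - lambda_z * recon_error).mean()

# Hypothetical batch of two samples.
d_real = np.array([2.0, 0.5])           # D(I) on real images
d_fake = np.array([-2.0, 0.0])          # D on redrawn images
z = np.array([[0.0, 0.0], [1.0, 1.0]])  # sampled noise vectors
z_hat = np.array([[3.0, 4.0], [1.0, 1.0]])  # reconstructions by delta

loss_d = discriminator_objective(d_real, d_fake)            # -0.75
loss_g = generator_objective(d_fake, z_hat, z, lambda_z=0.5)  # -2.25
```

Real images scored above 1 and redrawn images scored below -1 contribute nothing to L_D, which is the usual saturation behavior of the hinge loss.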

Preprint

https://arxiv.org/abs/1905.13539

Code and Pretrained

https://github.com/mickaelChen/ReDO


Experiments

We evaluate ReDO on 3 datasets of real images.

• Without supervision, ReDO discovers meaningful object masks, and the noise vectors z code for specific textures.
• ReDO’s performance is comparable to supervised baselines trained with about 50-100 labelled datapoints.
• Preliminary experiments indicate that ReDO can work on datasets with multiple objects or multiple classes without using labels.

ReDO and supervised baselines

[Plots: test set accuracy and test set IoU as a function of the number of training samples, for the LFW, CUB and Flowers datasets. Legend: supervised, ours (unsupervised).]

[Figure panels: Generated Samples; Dataset with 2 Categories; Datasets with 2 Objects; Additional masks]
