

Unsupervised Object Segmentation by Redrawing

Mickaël Chen¹, Thierry Artières²,³ and Ludovic Denoyer⁴

¹ Sorbonne Université, CNRS, LIP6, F-75005, Paris, France
² Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
³ Ecole Centrale Marseille
⁴ Facebook Artificial Intelligence Research

We present ReDO (ReDrawing Objects): an unsupervised, data-driven object segmentation method for real images. We assume that natural image generation is a composite process in which each object is generated independently. Object segmentation then amounts to discovering regions that can be redrawn without seeing the rest of the image.

Image Composition Model

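We consider the following underlying generative process G, which produces images in three steps.

1. Define the position of the different regions, i.e., the global structure of the image, by sampling n region masks:
$$M \sim p(M), \quad M \in \{0, 1\}^{n \times W \times H}, \quad \sum_{k=1}^{n} M_{k,x,y} = 1 \ \text{for every pixel } (x, y).$$

2. Generate the contents of each region independently:
$$V_k \leftarrow G_k(M_k, z_k), \quad z_k \sim p(z), \quad k \in \{1, \dots, n\}.$$

3. Aggregate the resulting regions into the final image:
$$G(M, z_1, \dots, z_n) = \sum_{k=1}^{n} M_k \odot V_k,$$
where $\odot$ denotes the element-wise (pixel-wise) product.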
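To make the three steps concrete, here is a minimal pure-Python sketch of the generative process on a tiny single-channel image flattened to a list of pixels. The one-hot mask sampler, the constant-valued region generators standing in for $G_k$, and the 1-D Gaussian noise are hypothetical choices made for illustration; in the actual model these are learned networks operating on $C \times W \times H$ tensors.

```python
import random

def sample_masks(n, num_pixels, rng):
    """Step 1 (toy p(M)): assign each pixel to exactly one of n regions, as one-hot masks."""
    labels = [rng.randrange(n) for _ in range(num_pixels)]
    return [[1.0 if label == k else 0.0 for label in labels] for k in range(n)]

def make_generator(k):
    """Hypothetical G_k: paints its whole region with one value derived from z_k."""
    def generate(mask, z):
        return [z[0] + float(k)] * len(mask)
    return generate

def compose(masks, contents):
    """Step 3: G(M, z_1, ..., z_n) = sum_k M_k * V_k, computed pixel by pixel."""
    num_pixels = len(masks[0])
    return [sum(m[p] * v[p] for m, v in zip(masks, contents)) for p in range(num_pixels)]

rng = random.Random(0)
n, num_pixels = 2, 6
masks = sample_masks(n, num_pixels, rng)
generators = [make_generator(k) for k in range(n)]
zs = [[rng.gauss(0.0, 1.0)] for _ in range(n)]                  # z_k ~ p(z), here N(0, 1) in 1-D
contents = [generators[k](masks[k], zs[k]) for k in range(n)]  # step 2: V_k <- G_k(M_k, z_k)
image = compose(masks, contents)
```

Because the masks are one-hot and sum to 1 at every pixel, each output pixel is exactly the content of the single region that owns it.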

Towards Learning To Segment

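We replace step 1 with a segmentation function F that produces the masks from an input image $I \in \mathbb{R}^{C \times W \times H}$.

1. Obtain the masks using the segmentation function F:
$$M \leftarrow F(I), \quad M \in [0, 1]^{n \times W \times H}, \quad \sum_{k=1}^{n} M_{k,x,y} = 1 \ \text{for every pixel } (x, y).$$

2. Generate the contents of each region independently:
$$V_k \leftarrow G_k(M_k, z_k), \quad z_k \sim p(z), \quad k \in \{1, \dots, n\}.$$

3. Aggregate the resulting regions into the final image:
$$G_F(I, z_1, \dots, z_n) = \sum_{k=1}^{n} M_k \odot V_k.$$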
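The poster does not say how F guarantees that the soft masks sum to 1 at every pixel. A per-pixel softmax over the n region scores is one common way to do it and is assumed in this sketch; `toy_segmenter` is a hypothetical hand-written stand-in for the learned network F, on a flattened single-channel image.

```python
import math

def softmax_masks(logits):
    """Turn per-region, per-pixel scores into soft masks in [0, 1]
    that sum to 1 at every pixel (softmax over the region axis)."""
    n, num_pixels = len(logits), len(logits[0])
    masks = [[0.0] * num_pixels for _ in range(n)]
    for p in range(num_pixels):
        column = [logits[k][p] for k in range(n)]
        top = max(column)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in column]
        total = sum(exps)
        for k in range(n):
            masks[k][p] = exps[k] / total
    return masks

def toy_segmenter(image):
    """Hypothetical F for n = 2: bright pixels lean to region 0, dark ones to region 1."""
    return [[4.0 * (x - 0.5) for x in image], [-4.0 * (x - 0.5) for x in image]]

image = [0.9, 0.8, 0.1, 0.2]  # a 4-pixel, single-channel input I
masks = softmax_masks(toy_segmenter(image))
```

Unlike the binary masks of the generative model, these masks take continuous values in [0, 1], which keeps F differentiable and trainable end-to-end.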

We can train this model end-to-end with an adversarial loss that matches the distribution of generated images to the dataset distribution. However, on its own this objective naturally converges to trivial, uninformative solutions.

Conservation of Information

Problem 1: Mapping all pixels to one region is a trivial but valid solution.

Figure 1: In this example, the input is ignored by the segmentation function and region 2 is responsible for the whole image. The model has collapsed into a standard GAN.

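Solution: We add a learned function $\delta$ that tries to reconstruct the noise vectors $z_k$ from the generated image. A region that contributes nothing to the image carries no information about its $z_k$, so $\delta$ fails on it and the collapsed solution is penalized.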

Figure 2: As region 1 does not contribute, $z_1$ cannot be retrieved from the generated image.
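The toy sketch below, using flattened single-channel images, shows why the reconstruction term penalizes the collapse of Figure 1. The constant-valued region generators and the read-back function `delta_1` are hypothetical; in ReDO, $\delta$ is a learned network. When region 1 covers no pixel, two different values of $z_1$ produce the same image, so no function of the image can recover both.

```python
import math

def compose(masks, contents):
    """sum_k M_k * V_k, computed pixel by pixel."""
    return [sum(m[p] * v[p] for m, v in zip(masks, contents)) for p in range(len(masks[0]))]

def generate(masks, zs):
    """Hypothetical G_k: region k is painted with the scalar z_k[0]."""
    contents = [[z[0]] * len(masks[0]) for z in zs]
    return compose(masks, contents)

def z_penalty(image, z, delta, lam_z=1.0):
    """lambda_z * ||delta(image) - z||_2, the reconstruction term."""
    z_hat = delta(image)
    return lam_z * math.sqrt(sum((a - b) ** 2 for a, b in zip(z_hat, z)))

def delta_1(image):
    """Hypothetical delta_1: reads z_1 back from the first pixel."""
    return [image[0]]

collapsed = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]  # region 1 unused (Figure 1)
split = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]      # both regions used

z2 = [0.3]
img_a = generate(collapsed, [[-1.0], z2])
img_b = generate(collapsed, [[2.0], z2])
print(img_a == img_b)  # True: z_1 leaves no trace in the image
print(z_penalty(generate(split, [[-1.0], z2]), [-1.0], delta_1))  # 0.0
```

With the collapsed masks, `delta_1` returns the same estimate for both values of $z_1$, so at least one of them incurs a positive penalty; with the split masks, $z_1$ is recovered exactly.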

Redrawing a Single Region

Problem 2: The segmentation function can ignore the input.

Figure 3: In this example, the segmentation function chooses regions that are meaningless with respect to the input. The generator can still generate a perfectly fine image.

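Solution: We tie the output to the input by regenerating only one region at a time and keeping the rest of the image unchanged. If a mask does not follow an object boundary, the redrawn region no longer fits the untouched pixels around it, and the resulting image looks inconsistent.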

Figure 4: Given a wrong segmentation, the generator cannot produce a consistent image.
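Since the masks sum to 1 at every pixel, keeping every region other than $i$ unchanged amounts to keeping $(1 - M_i) \odot I$, so redrawing region $i$ can be written $G_F(I, z_i, i) = M_i \odot G_i(M_i, z_i) + (1 - M_i) \odot I$. This closed form is derived here from the composition rule above rather than stated on the poster. The sketch applies it to a 4-pixel image; `constant_generator` is a hypothetical stand-in for $G_i$.

```python
def redraw_region(image, masks, i, generator, z):
    """G_F(I, z_i, i): regenerate region i only and keep the other pixels of I.

    Relies on sum_{k != i} M_k = 1 - M_i, which holds because the masks sum to 1.
    """
    m = masks[i]
    new_content = generator(m, z)  # V_i <- G_i(M_i, z_i)
    return [m[p] * new_content[p] + (1.0 - m[p]) * image[p] for p in range(len(image))]

def constant_generator(mask, z):
    """Hypothetical G_i: paints the region with the scalar z."""
    return [z] * len(mask)

image = [0.9, 0.8, 0.1, 0.2]               # object = first two pixels, background = last two
good_masks = [[1, 1, 0, 0], [0, 0, 1, 1]]  # follows the object boundary
bad_masks = [[1, 0, 1, 0], [0, 1, 0, 1]]   # ignores the input (Problem 2)

print(redraw_region(image, good_masks, 0, constant_generator, 0.5))  # [0.5, 0.5, 0.1, 0.2]
print(redraw_region(image, bad_masks, 0, constant_generator, 0.5))   # [0.5, 0.8, 0.5, 0.2]
```

With the good mask, the redrawn object sits on the real background; with the bad mask, new content is interleaved with original pixels, which is the kind of inconsistency Figure 4 describes.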

Learning the full model for object segmentation

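Figure 5: The ReDO pipeline. Learned neural networks are represented in bold colored lines. (Diagram labels: input $I$, segmentation network $F$, inferred masks $M_1$ and $M_2$, noise $z_1$, region generator $G_1$, generated region, generated image, reconstructed noise $\hat{z}_1$, discriminator $D$: "Real or Fake?")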

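Objective functions: We use the hinge version of the adversarial loss. Here $G_F(I, z_i, i)$ is the image obtained by redrawing only region $i$ of $I$, and $\delta_i$ reconstructs $z_i$ from it.
$$\max_{G_F, \delta} \mathcal{L}_G = \mathbb{E}_{I \sim p_{\text{data}},\, i \sim \mathcal{U}(n),\, z_i \sim p(z)} \Big[ D\big(G_F(I, z_i, i)\big) - \lambda_z \big\lVert \delta_i\big(G_F(I, z_i, i)\big) - z_i \big\rVert_2 \Big]$$
$$\max_{D} \mathcal{L}_D = \mathbb{E}_{I \sim p_{\text{data}}} \Big[ \min\big(0, -1 + D(I)\big) \Big] + \mathbb{E}_{I \sim p_{\text{data}},\, i \sim \mathcal{U}(n),\, z_i \sim p(z)} \Big[ \min\big(0, -1 - D(G_F(I, z_i, i))\big) \Big]$$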
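The two objectives can be evaluated on a mini-batch as below. This is a sketch on plain Python lists, assuming the discriminator scores, the reconstructed noise vectors $\hat{z}_i = \delta_i(G_F(I, z_i, i))$ and the sampled $z_i$ have already been computed; sample means stand in for the expectations. Deep-learning frameworks usually minimize losses, so in practice one would minimize $-\mathcal{L}_D$ and $-\mathcal{L}_G$.

```python
import math

def discriminator_objective(d_real, d_fake):
    """L_D (to maximize): mean min(0, -1 + D(I)) + mean min(0, -1 - D(G_F(I, z_i, i)))."""
    real_term = sum(min(0.0, -1.0 + d) for d in d_real) / len(d_real)
    fake_term = sum(min(0.0, -1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_objective(d_fake, z_hats, zs, lam_z):
    """L_G (to maximize over G_F and delta): mean of D(G_F(...)) - lam_z * ||z_hat_i - z_i||_2."""
    total = 0.0
    for d, z_hat, z in zip(d_fake, z_hats, zs):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_hat, z)))
        total += d - lam_z * dist
    return total / len(d_fake)

d_real = [1.5, 0.5]   # D(I) on two real images
d_fake = [-2.0, 0.0]  # D(G_F(I, z_i, i)) on two redrawn images
print(discriminator_objective(d_real, d_fake))  # -0.75
print(generator_objective(d_fake, [[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]], 1.0))  # -3.5
```

The hinge form stops rewarding D once a real image scores above 1 or a fake one below -1, which is why the first real sample and the first fake sample contribute 0 here.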

Preprint: https://arxiv.org/abs/1905.13539

Code and pretrained models: https://github.com/mickaelChen/ReDO


Experiments

We evaluate ReDO on three datasets of real images: LFW, CUB and Flowers.

• Without supervision, ReDO discovers meaningful object masks, and its noise vectors z code for specific textures.
• ReDO's performance is comparable to that of supervised baselines trained with about 50-100 labelled data points.
• Preliminary experiments indicate that ReDO can work on datasets with multiple objects or multiple classes without using labels.

ReDO and supervised baselines

[Figure: test set accuracy (top row, axis 0.70 to 1.00) and test set IoU (bottom row) as a function of the number of training samples (log scale, 10¹ to 10³; up to 10⁴ for CUB) on the LFW, CUB and Flowers datasets. Legend: supervised; ours (unsupervised).]
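The plots report test set accuracy and IoU, which the poster does not define. Assuming the usual definitions for a binary object/background mask (pixel accuracy over all pixels, and intersection over union of the object class), they can be computed as in this sketch; how the unsupervised regions are matched to the object label is not specified here.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose binary label (1 = object, 0 = background) is correct."""
    assert len(pred) == len(target)
    return sum(p == t for p, t in zip(pred, target)) / len(target)

def foreground_iou(pred, target):
    """|pred AND target| / |pred OR target| for the object (label 1) class."""
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, target) if p == 1 or t == 1)
    return inter / union if union else 1.0  # assumption: two empty masks count as a perfect match

pred = [1, 1, 0, 0, 1]    # predicted object mask, flattened
target = [1, 0, 0, 0, 1]  # ground-truth object mask, flattened
print(pixel_accuracy(pred, target))  # 0.8
print(foreground_iou(pred, target))  # 0.6666666666666666
```

IoU ignores correctly labelled background pixels, so it is typically lower than accuracy when the object occupies a small part of the image.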

[Figure panels: Generated Samples; Dataset with 2 Categories; Datasets with 2 Objects; Additional masks]