Automatic Dense Semantic Mapping From Visual Street-level Imagery


Sunando Sengupta[1], Paul Sturgess[1], Lubor Ladicky[2], Philip H.S. Torr[1]

[1] Oxford Brookes University
[2] Visual Geometry Group, Oxford University

http://cms.brookes.ac.uk/research/visiongroup/index.php

Dense Semantic Map

• Generate an overhead view of an urban region.
• Every pixel in the map view is associated with an object class label.

Object classes: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Dense Semantic Map

• Street images are captured inexpensively from a vehicle with multiple mounted cameras[1].

[1] Yotta DCL, “Yotta DCL case studies.” Available: http://www.yottadcl.com/surveys/case-studies/


Semantic Mapping Framework

• The semantic mapping framework comprises two stages:
– Semantic image segmentation at street level.
– Ground plane labelling at a global level.

• One of the first attempts at overhead mapping from street-level images.

Pipeline: street-level image acquisition → image segmentation → ground plane labelling

Semantic Image Segmentation

• Label every pixel in the image with an object class.

Figure: raw image (input) → automatic labeller, using the object class labels → labelled image (output)

Semantic Image Segmentation

• We use a Conditional Random Field (CRF) framework.

Figure: input image → CRF construction → final segmentation

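Semantic Image Segmentation - CRF

• Each pixel is a node in a grid graph G = (V, E).
• Each node is a random variable x_i taking a label from the label set.
• Total energy:

$$E(\mathbf{x}) = \underbrace{\sum_{i \in V} \psi_i(x_i)}_{E_{pix}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{ij}(x_i, x_j)}_{E_{pair}} + \underbrace{\sum_{c \in C} \psi_c(\mathbf{x}_c)}_{E_{region}}$$

• Optimal labelling given as:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} E(\mathbf{x})$$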
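To make the decomposition of the total energy concrete, here is a minimal Python sketch that evaluates E for one labelling of a tiny grid. The function name `total_energy`, the callable-based cost interface, the 4-connected neighbourhood and the toy cost values are all illustrative assumptions, not the authors' implementation. Each neighbouring pair is counted once here, which may differ from the summation convention on the slide by a constant factor.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)
RegionTerm = Tuple[Sequence[Pixel], Callable[[List[str]], float]]


def total_energy(
    labels: List[List[str]],
    unary: Callable[[Pixel, str], float],
    pairwise: Callable[[Pixel, Pixel, str, str], float],
    regions: Sequence[RegionTerm] = (),
) -> float:
    """Evaluate E = Epix + Epair + Eregion for one labelling (toy sketch)."""
    rows, cols = len(labels), len(labels[0])

    # Epix: one cost per pixel for the label it currently takes.
    e_pix = sum(
        unary((r, c), labels[r][c]) for r in range(rows) for c in range(cols)
    )

    # Epair: one cost per 4-connected edge (right and down neighbours only,
    # so every edge is visited exactly once).
    e_pair = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    e_pair += pairwise(
                        (r, c), (rr, cc), labels[r][c], labels[rr][cc]
                    )

    # Eregion: one cost per pixel group, computed from the labels inside it.
    e_region = sum(
        cost([labels[r][c] for r, c in pixels]) for pixels, cost in regions
    )
    return e_pix + e_pair + e_region


# Hypothetical toy problem: 2x2 image, two labels, plain Potts pairwise cost.
toy_unary = {"Car": 0.2, "Road": 0.3}
grid = [["Road", "Road"], ["Car", "Road"]]
energy = total_energy(
    grid,
    unary=lambda pixel, label: toy_unary[label],
    pairwise=lambda p, q, a, b: 0.0 if a == b else 1.0,
)
```

In this toy case the four pixel costs sum to 1.1 and two of the four edges join different labels, so the total is about 3.1.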

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Epix - Models an individual pixel's cost of taking a label.
– Computed via the dense boosting approach.
– Multi-feature variant of TextonBoost[1].

Example costs for a pixel x: Car 0.2, Road 0.3

[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Epair - Models the interactions between neighbouring pixels.
– Encourages label consistency in adjacent pixels.
– Sensitive to edges in images.
– Contrast-sensitive Potts model.

Epair for neighbouring pixels xi, xj: the cost is 0 when both take the same label (Car/Car or Road/Road) and g(i,j) when the labels differ (Car/Road).
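The pairwise term can be sketched in a few lines of Python. The slide does not give the form of g(i,j); the exponential fall-off with colour difference used below is one common choice for contrast-sensitive Potts models and is an assumption here, as are the function name and the parameter names `theta_p`, `theta_v`, `theta_beta` and their default values.

```python
import math
from typing import Sequence


def contrast_sensitive_potts(
    label_i: str,
    label_j: str,
    colour_i: Sequence[float],
    colour_j: Sequence[float],
    theta_p: float = 1.0,      # assumed constant part of the penalty
    theta_v: float = 4.0,      # assumed weight of the contrast-dependent part
    theta_beta: float = 0.05,  # assumed contrast sensitivity
) -> float:
    """Return 0 for equal labels, otherwise a contrast-dependent g(i, j)."""
    if label_i == label_j:
        return 0.0
    # Squared colour difference between the two neighbouring pixels.
    sq_diff = sum((a - b) ** 2 for a, b in zip(colour_i, colour_j))
    # Strong image edges (large sq_diff) shrink the penalty, so label
    # changes are cheaper where the image itself changes.
    return theta_p + theta_v * math.exp(-theta_beta * sq_diff)
```

With this form, two differently labelled neighbours of identical colour pay the full penalty `theta_p + theta_v`, while a label change across a strong edge costs close to `theta_p` only.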

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Eregion - Models the behaviour of a group of pixels.
– Classifies a region.
– Encourages all the pixels in a region to take the same label.
– Groups of pixels are given by multiple mean-shift segmentations.

Example costs for a region c: Car 0.3, Road 0.1
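One way to picture a region term that both classifies a region and rewards label agreement is a truncated "dominant label plus disagreement penalty" cost, in the style of the robust higher-order potentials used with hierarchical CRFs. The sketch below is an assumption about the general shape, not the authors' exact potential; `region_cost`, `gamma_max` and `per_pixel_penalty` are hypothetical names and values.

```python
from typing import Dict, Sequence


def region_cost(
    region_labels: Sequence[str],
    label_costs: Dict[str, float],
    gamma_max: float = 1.0,
    per_pixel_penalty: float = 0.25,
) -> float:
    """Cheapest explanation of a region by a single label, truncated.

    For each candidate label the cost is the region classifier's cost for
    that label plus a penalty for every pixel that disagrees with it. The
    result never exceeds gamma_max, so a badly mixed region pays a fixed
    maximum rather than an unbounded amount.
    """
    best = gamma_max
    for label, gamma_l in label_costs.items():
        disagreeing = sum(1 for x in region_labels if x != label)
        best = min(best, gamma_l + per_pixel_penalty * disagreeing)
    return best


# Region classifier costs taken from the example on the slide.
costs = {"Car": 0.3, "Road": 0.1}
consistent = region_cost(["Road"] * 4, costs)
one_outlier = region_cost(["Road", "Road", "Road", "Car"], costs)
```

A fully consistent Road region pays only the classifier cost of 0.1, one disagreeing pixel raises it to about 0.35, and a region whose pixels match neither label is capped at `gamma_max`.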

Semantic Image Segmentation

• Solved using the alpha-expansion algorithm[1].

Figure sequence: input image, then the labelling after the Road, Building, Sky and Pavement expansions, then the final solution.

[1] Y. Boykov et al., “Fast Approximate Energy Minimization via Graph Cuts,” ICCV 1999.
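The idea behind the expansion sequence is that each move picks one label alpha and lets any set of pixels switch to it, keeping the switch only if the energy drops. The Python sketch below illustrates that loop on a tiny problem. It deliberately replaces the graph cut used in practice with exhaustive enumeration of all 2^n switch patterns, so it is only usable for a handful of pixels. The function name, the toy costs and the 4-pixel chain are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence


def alpha_expansion_bruteforce(
    init: Sequence[str],
    label_set: Sequence[str],
    energy: Callable[[List[str]], float],
) -> List[str]:
    """Expansion-move minimisation with brute-force move search (toy sketch)."""
    current = list(init)
    best_energy = energy(current)
    improved = True
    while improved:  # repeat sweeps until no expansion move helps
        improved = False
        for alpha in label_set:
            base = list(current)
            # Try every subset of pixels switching to alpha; all others
            # keep the label they had at the start of this move.
            for switch in product((False, True), repeat=len(base)):
                candidate = [alpha if s else x for s, x in zip(switch, base)]
                candidate_energy = energy(candidate)
                if candidate_energy < best_energy:
                    current, best_energy = candidate, candidate_energy
                    improved = True
    return current


# Hypothetical per-pixel costs for a chain of four pixels.
UNARY = [
    {"Road": 0.1, "Pavement": 1.0, "Car": 1.0},
    {"Road": 0.6, "Pavement": 0.5, "Car": 1.0},  # noisy pixel
    {"Road": 0.1, "Pavement": 1.0, "Car": 1.0},
    {"Road": 1.0, "Pavement": 1.0, "Car": 0.1},
]


def chain_energy(labels: List[str]) -> float:
    """Pixel costs plus a Potts penalty of 0.5 per label change."""
    e = sum(UNARY[i][label] for i, label in enumerate(labels))
    e += sum(0.5 for a, b in zip(labels, labels[1:]) if a != b)
    return e


result = alpha_expansion_bruteforce(
    ["Pavement"] * 4, ["Road", "Pavement", "Car"], chain_energy
)
```

Starting from an all-Pavement labelling, the Road expansion takes over the whole chain, the Pavement expansion changes nothing, and the Car expansion claims the last pixel. The noisy second pixel stays Road because the smoothness penalty outweighs its slight preference for Pavement.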

Ground Plane Labelling

• Combine many labellings from street-level imagery.

Figure: street-level labellings (input) → automatic labeller → labelled ground plane (output)

Ground Plane CRF

• A CRF is defined over the ground plane.
• Each ground plane pixel (zi) is a random variable taking a label from the label set.
• The energy for the ground plane CRF is

$$E^g(Z) = E^g_{pix} + E^g_{pair}$$

Ground Plane Pixel Cost

Figure: street-level labelling (X) mapped by the homography onto the ground plane (Z); labels such as Road, Pavement and Post/Pole are accumulated into ground plane pixel histograms.

• We assume a flat world.
• A ground plane region is estimated.
• Each point in the image projects to a unique point on the ground plane, creating a homography.
• The image labelling is mapped to the ground plane via the homography.
• Labels projected from many views are combined in a histogram.
• The normalised histogram gives the naïve probability of the ground plane pixel taking a label.
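The projection and voting steps can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: each view comes with a known 3x3 image-to-ground homography, the ground plane is divided into square cells of side `cell_size`, and only pixels inside the estimated ground plane region are passed in. The function names are hypothetical, and turning the normalised histogram into a cost with a negative logarithm is an assumed convention rather than something stated on the slides.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Sequence, Tuple

Homography = Sequence[Sequence[float]]  # 3x3 matrix, image -> ground plane
Cell = Tuple[int, int]
LabelledPixel = Tuple[Tuple[float, float], str]  # ((u, v), label)


def project_to_ground(h: Homography, u: float, v: float) -> Tuple[float, float]:
    """Apply the homography to image point (u, v) and dehomogenise."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


def add_view(
    histograms: DefaultDict[Cell, Counter],
    h: Homography,
    labelled_pixels: Iterable[LabelledPixel],
    cell_size: float = 1.0,
) -> None:
    """Cast one vote per labelled image pixel into its ground plane cell."""
    for (u, v), label in labelled_pixels:
        gx, gy = project_to_ground(h, u, v)
        cell = (math.floor(gx / cell_size), math.floor(gy / cell_size))
        histograms[cell][label] += 1


def label_probabilities(histogram: Counter) -> Dict[str, float]:
    """Normalise the votes of one cell into a naive label distribution."""
    total = sum(histogram.values())
    return {label: count / total for label, count in histogram.items()}


def pixel_cost(histogram: Counter, label: str, eps: float = 1e-6) -> float:
    """Assumed cost form: negative log of the normalised vote share."""
    p = label_probabilities(histogram).get(label, 0.0)
    return -math.log(max(p, eps))


# Two hypothetical views observing the same ground cell.
histograms: DefaultDict[Cell, Counter] = defaultdict(Counter)
view_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # identity mapping
view_b = [[1, 0, -10], [0, 1, 0], [0, 0, 1]]  # shifted camera
add_view(histograms, view_a, [((2.5, 3.5), "Road"), ((2.2, 3.1), "Road")])
add_view(histograms, view_b, [((12.4, 3.9), "Pavement")])
probabilities = label_probabilities(histograms[(2, 3)])
```

Here cell (2, 3) receives two Road votes and one Pavement vote, so Road gets the higher naive probability and therefore the lower cost.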

Ground Plane Labelling

• A histogram is built for every ground plane pixel, giving Egpix.
• A pairwise cost (Egpair) is added to induce smoothness.
– Contrast-sensitive Potts model.

Ground Plane Labelling

• Final CRF solution obtained using alpha-expansion.

Figure sequence: initial Void map, then the labelling after the Road, Building, Pavement and Car expansions, then the final solution.

Dataset

• Subset of the images captured by the van: 14.8 km of track, 8000 images from each camera.
• Pixel-level labelled ground truth images. Dataset available[1].
• 13 object categories: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post.
• Training: 44 images; testing: 42 images.

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

SIS Results

• Input images, output of our image-level CRF, and ground truths.
• Used the Automatic Labelling Environment[1].

Figure rows: input, semantic segmentation, ground truth.

[1] L. Ladicky and P. H. S. Torr, The Automatic Labelling Environment. Code available: http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm

Semantic Map Results

Semantic map of Pembroke city


Ground Plane Map Evaluation

Figure rows: street images, back-projected map results, ground truth.

• We back-project the ground plane map into the image domain and evaluate the results.
• Global pixel accuracy of 86%.
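Global pixel accuracy is the fraction of evaluated pixels whose predicted label matches the ground truth. A minimal Python sketch of that measure follows. Skipping pixels whose ground truth is a void label is a common convention and is an assumption here, as are the function name and the `void_label` parameter.

```python
from typing import Sequence


def global_pixel_accuracy(
    predicted: Sequence[Sequence[str]],
    ground_truth: Sequence[Sequence[str]],
    void_label: str = "Void",
) -> float:
    """Fraction of non-void ground truth pixels that are labelled correctly."""
    correct = 0
    total = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, gt in zip(pred_row, gt_row):
            if gt == void_label:
                continue  # assumed convention: unlabelled pixels are ignored
            total += 1
            if pred == gt:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical 2x3 back-projected map compared against its ground truth.
back_projected = [["Road", "Road", "Car"], ["Pavement", "Road", "Road"]]
truth = [["Road", "Road", "Road"], ["Pavement", "Void", "Road"]]
accuracy = global_pixel_accuracy(back_projected, truth)
```

In this toy case five pixels are evaluated and four match, giving an accuracy of 0.8.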

Results


Conclusions

• Presented a method to generate overhead-view semantic maps.
• Experiments on large tracks (~15 km), which can be scaled up to country-wide mapping.
• Dataset available[1].

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

Future Work

• Perform 3D street-level semantic mapping and reconstruction.
• Add detailed street-level information such as signs and information boards.

Oxford Brookes Vision Group, Oxford Brookes University
http://cms.brookes.ac.uk/research/visiongroup/index.php

Thank you!

Ground Plane Pixel Cost

• Using a single view creates a shadow effect for objects that violate the flat-world assumption, leading to wrong label estimates.

Figure: single-view versus multi-view ground plane labelling via the homography (example labels: Road, Pavement, Post/Pole).
