
Building and road detection from large aerial imagery
Shunta SAITO*, Yoshimitsu AOKI*
* Graduate School of Science and Technology, Keio University, Japan

Motivation
• Understanding aerial imagery is in high demand for generating maps, analyzing disaster scale, detecting changes for estate management, etc.
• However, this work has usually been done by human experts, so it is both slow and costly.
• The remote sensing community has long focused on this task, but it is still difficult to detect terrestrial objects automatically from aerial images with high accuracy.

Goal
• Input: aerial image (RGB)
• Output: 3-channel map (R: road, G: building, B: others)
• The model must account for the trade-off between different objects at the same pixel.

Previous Works: Senaras et al., Building detection with decision fusion, 2013
[Process flow] Input aerial image (Infrared, RGB) → mean shift segmentation → 15 different features extracted per segment (Infrared-Red image, Infrared-Red-Green image, Hue-Saturation-Intensity (HSI) image, Normalized Difference Vegetation Index (NDVI) image, vegetation mask, shadow mask, etc.) → one classifier per feature → the multiple classification results are combined.
[Result] Input, predicted labels, and ground truth shown side by side.

Previous Works: Volodymyr Mnih, Machine Learning for Aerial Image Labeling, 2013
• They take a patch-based approach, which is well suited to Convolutional Neural Networks (CNNs).
• They formulate the problem as learning a mapping from an aerial image patch to a label image patch.
• However, they train two CNNs, one for buildings and another for roads, even though there may be trade-offs between them.

[Process flow] Aerial imagery → patches → CNN → predicted label (with a noise model).

Dataset
Aerial images with building labels: × 151
Aerial images with road labels: × 1109

Convolutional Neural Network
[Training scheme] 64 × 64 × 3 (RGB) patches are fed to the CNN, which outputs 16 × 16 × 3 (building, road, other) predictions; the loss between the predictions and the correct answers is calculated and minimized by backpropagation. (A sketch of one training step follows below.)
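One training step of this scheme might look like the following minimal PyTorch sketch. The model here is only a stand-in (the real architecture appears on the next slides), and the optimizer settings are assumptions, not taken from the authors' code.

import torch
import torch.nn as nn

model = nn.Sequential(                        # stand-in model, not the authors' CNN
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 16 * 16 * 3),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

patches = torch.randn(8, 3, 64, 64)           # a batch of 64x64x3 RGB input patches
targets = torch.randint(0, 3, (8, 16, 16))    # correct class index per label pixel

logits = model(patches).view(8, 3, 16, 16)    # predictions as 16x16x3 label patches
loss = nn.functional.cross_entropy(logits, targets)   # calculate loss vs. correct answers
loss.backward()                               # backpropagation
optimizer.step()                              # SGD parameter update
optimizer.zero_grad()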

Our Approach
We train a Convolutional Neural Network (CNN) as a mapping from an input aerial image patch to a 3-channel label image patch, using stochastic gradient descent.

[Architecture] Input aerial image patch (RGB) → C(64, 9x9/2) → P(2/1) → C(128, 7x7/1) → C(128, 5x5/1) → FC(4096) → FC(768) → predicted map patch

• We train a CNN with the above architecture.
• The CNN takes a small RGB image patch as input and outputs a predicted 3-channel label patch.
• The predicted label patch consists of a road channel, a building channel, and an others channel.
• No pre-processing such as segmentation is needed, and we don't need to design any image features: the CNN obtains good feature extractors automatically. (A sketch of the architecture follows below.)
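To make the layer notation concrete, here is a minimal sketch of the basic architecture. PyTorch, the zero-padding choice, and the exact intermediate tensor shapes are assumptions for illustration; the authors' implementation (linked at the end of the slides) may differ.

import torch
import torch.nn as nn

# Minimal sketch of the basic architecture, following the slide notation
# C(filters, kxk/stride), P(k/stride), FC(units). No padding is assumed.
class PatchLabelCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, stride=2),    # C(64, 9x9/2)
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=1),        # P(2/1)
            nn.Conv2d(64, 128, kernel_size=7, stride=1),  # C(128, 7x7/1)
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=5, stride=1), # C(128, 5x5/1)
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 17 * 17, 4096),               # FC(4096)
            nn.ReLU(inplace=True),
            nn.Linear(4096, 768),                         # FC(768) = 16 x 16 x 3
        )

    def forward(self, x):                                 # x: (N, 3, 64, 64)
        logits = self.classifier(self.features(x))
        return logits.view(-1, 3, 16, 16)                 # 3-channel 16x16 label patch

model = PatchLabelCNN()
print(model(torch.randn(1, 3, 64, 64)).shape)             # torch.Size([1, 3, 16, 16])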

Patch-based framework
• Using context: the input patch is larger than the predicted label patch, which allows the network to use surrounding pixels to predict labels in the center patch ("It's a building!?"). A patch-extraction sketch follows below.
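A minimal NumPy sketch (my own illustration, not the authors' code) of cutting out a 64 × 64 input patch around the 16 × 16 label patch it predicts:

import numpy as np

# A 64x64 input patch is cut around each 16x16 label patch, so the network
# sees 24 pixels of surrounding context on every side of the region it labels.
def extract_patch_pair(aerial, label, top, left, in_size=64, out_size=16):
    """Return (input_patch, label_patch) for the label patch whose
    top-left corner is (top, left) in the label image."""
    margin = (in_size - out_size) // 2            # 24 pixels of context per side
    in_patch = aerial[top - margin: top - margin + in_size,
                      left - margin: left - margin + in_size]
    out_patch = label[top: top + out_size, left: left + out_size]
    return in_patch, out_patch

aerial = np.zeros((1500, 1500, 3), dtype=np.uint8)   # RGB aerial image
label  = np.zeros((1500, 1500, 3), dtype=np.uint8)   # 3-channel label image
x, y = extract_patch_pair(aerial, label, top=100, left=100)
print(x.shape, y.shape)                              # (64, 64, 3) (16, 16, 3)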

Let s be an aerial image patch of size ws × ws and m̃ the corresponding predicted label patch of size wm × wm, where each label pixel m̃i is a 1-of-3 vector over {road, building, otherwise}, e.g. (1, 0, 0) or (0, 0, 1). Assuming the label pixels are conditionally independent given s,

p(\tilde{m} \mid s) = \prod_{i=1}^{w_m^2} p(\tilde{m}_i \mid s)

We learn p(m̃ | s) with the CNN.

Loss function
• Each pixel in the predicted label image:
  - is independent of the others (assumption)
  - always belongs to exactly one of the 3 labels (building, road, others)

For each pixel, the raw CNN output m̂i (e.g. (1.56, 4.37, 3.11)) is passed through a softmax to give the predicted label (e.g. (0.05, 0.74, 0.21)), which is compared with the correct 1-of-3 label m̃i (e.g. (1, 0, 0)).
c: channel (building, road, others)
P: correct label distribution (1-of-3 coding)
Q: predicted label distribution

Asymmetric cross entropy
We weight each channel's cross entropy by wc and just minimize this cross entropy by stochastic gradient descent:
wc: weight for each channel's loss
wbuilding = 1.5, wroad = 1.5, wothers = 0
*because prediction loss in the others channel is not important. (A sketch of this loss is given below.)
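A minimal NumPy sketch of the per-pixel computation above. The slides name an asymmetric (channel-weighted) cross entropy but do not spell out its formula, so the weighted sum used here is an assumption:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                  # subtract max for numerical stability
    return e / e.sum()

# Worked example from the slides: raw CNN output for one pixel,
# channels ordered as (building, road, others).
m_hat = np.array([1.56, 4.37, 3.11])
q = softmax(m_hat)                           # predicted distribution Q
print(np.round(q, 2))                        # [0.04 0.74 0.21], matching the slide's (0.05, 0.74, 0.21) up to rounding

p = np.array([1.0, 0.0, 0.0])                # correct 1-of-3 label P
w = np.array([1.5, 1.5, 0.0])                # w_building, w_road, w_others

# Assumed form of the asymmetric (channel-weighted) cross entropy for one pixel.
loss = -np.sum(w * p * np.log(q + 1e-12))
print(loss)                                  # ~4.66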

Dataset

[Figure] Aerial image; building label + road label → 3-channel label

• We combine Volodymyr Mnih's road and building detection datasets* to create our 3-channel map dataset (a sketch of this merge follows below).
• Our dataset contains 147 sets of aerial images and 3-channel label images: 137 sets for training and 10 sets for testing.
• Each image is 1500 × 1500 pixels at 1 m²/pixel resolution.
• The entire dataset covers roughly 340 km² of mainly urban regions in Massachusetts, United States.

[Figure] Example aerial images and 3-channel labels from the dataset.

* http://www.cs.toronto.edu/~vmnih/data/
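A rough sketch of how the separate building and road masks could be merged into the 3-channel labels (R: road, G: building, B: others). How overlapping building/road pixels are resolved here is my assumption, not taken from the authors' code:

import numpy as np

def make_3ch_label(building_mask, road_mask):
    """building_mask, road_mask: binary HxW arrays (1 = object present)."""
    h, w = building_mask.shape
    label = np.zeros((h, w, 3), dtype=np.uint8)
    label[..., 0] = road_mask                      # R channel: road
    label[..., 1] = building_mask                  # G channel: building
    others = (road_mask == 0) & (building_mask == 0)
    label[..., 2] = others.astype(np.uint8)        # B channel: others
    return label

building = np.zeros((1500, 1500), dtype=np.uint8)
road = np.zeros((1500, 1500), dtype=np.uint8)
print(make_3ch_label(building, road).shape)        # (1500, 1500, 3)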

Experiment

• Training with 137 images and labels

• Testing with 10 images and labels

• We test some variants of the basic architecture.

Basic architecture: input aerial image patch (RGB) → C(64, 9x9/2) → P(2/1) → C(128, 7x7/1) → C(128, 5x5/1) → FC(4096) → FC(768) → predicted map patch

Tested architectures

                  Activation   Dropout rate   Filter size
S-ReLU (Basic)    ReLU         N/A            9-7-5
S-ReLU-Dropout    ReLU         0.5            9-7-5
S-Maxout          Maxout       0.5            9-7-5
ReLU              ReLU         N/A            16-4-3
ReLU-Dropout      ReLU         0.5            16-4-3
Maxout            Maxout       0.5            16-4-3

Example of Test Results
[Figures] Input aerial images and the predicted 3-channel label images from the basic architecture.

Precision-recall curve
[Figures] Road channel and building channel results compared with Volodymyr Mnih's results.

Precision at breakeven point

                  Road     Building
S-ReLU (Basic)    0.8905   0.9241
S-ReLU-Dropout    0.8889   0.9220
S-Maxout          0.8842   0.9185
ReLU              0.8657   0.8984
ReLU-Dropout      0.8650   0.8973
Maxout            0.8548   0.8940
Volodymyr Mnih    0.8873   0.9150

• Our basic architecture achieved the best results.
• Using Maxout, Dropout, or both does not seem to improve performance.
• The architectures with smaller filters perform better than those with bigger filters.
(A sketch of how breakeven precision can be computed follows below.)
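For reference, a minimal NumPy sketch of computing precision at the breakeven point (the point on the precision-recall curve where precision equals recall) from per-pixel scores. This is an illustration only; the exact evaluation protocol behind the numbers above is not shown on the slides:

import numpy as np

def breakeven_precision(scores, truth, num_thresholds=256):
    """scores: HxW array in [0, 1]; truth: HxW binary array."""
    best = 0.0
    best_gap = np.inf
    for t in np.linspace(0.0, 1.0, num_thresholds):
        pred = scores >= t
        tp = np.logical_and(pred, truth).sum()
        precision = tp / max(pred.sum(), 1)
        recall = tp / max(truth.sum(), 1)
        gap = abs(precision - recall)
        if gap < best_gap:                 # closest point to precision == recall
            best_gap, best = gap, precision
    return best

scores = np.random.rand(16, 16)            # toy per-pixel scores for one channel
truth = np.random.rand(16, 16) > 0.5       # toy binary ground truth
print(breakeven_precision(scores, truth))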

Conclusion
• We propose a CNN-based building and road extraction method for aerial imagery.
• Our method doesn't need hand-designed image features, because good feature extractors are constructed automatically by training the CNN.
• Our CNN predicts building and road regions simultaneously at state-of-the-art accuracy.

Thank you for your kind attention.

All code to generate our dataset, train the CNN, and test the resulting models is available on GitHub.

https://github.com/mitmul/ssai