Auditory and Visual Spatial Sensing Stan Birchfield Department of Electrical and Computer Engineering Clemson University

Auditory and Visual Spatial Sensing

Stan Birchfield

Department of Electrical and Computer Engineering

Clemson University

Human Spatial Sensing

The five senses:

Hearing

Taste

Touch

Smell

Seeing

Hearing: f(t)

Seeing: f(x, y, λ, t)

Visual and Auditory Pathways

Two Problems in Spatial Sensing

Stereo Vision

Acoustic Localization

Clemson Vision Laboratory

head tracking

root detection reconstruction

highway monitoring

motion segmentation

Clemson Vision Lab (cont.)

microphone position calibration

speaker localization

Stereo Vision

INPUT: Left, Right

OUTPUT: Disparity map, Depth discontinuities

epipolar constraint

Epipolar Constraint

Left camera Right camera

world point

center of projection

epipolar plane

epipolar line

Energy Minimization

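[Figure: Left and Right intensity profiles, with occluded pixels marked]

minimize: $E = E_{\text{data}} + E_{\text{smoothness}} = \sum_{x_L} d(x_L, x_L - \delta) + \sum_i u(l_i)$

$E_{\text{data}}$: dissimilarity (underconstrained)

$E_{\text{smoothness}}$: discontinuity penalty (constraint)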

History of Stereo Correspondence

DYNAMIC PROGRAMMING (1D)

Baker & Binford 1981

Ohta & Kanade 1985

Belhumeur & Mumford 1992

Intille & Bobick 1994

Geiger et al. 1995

Birchfield & Tomasi 1998

MULTIWAY-CUT (2D)

Roy & Cox 1998

Boykov, Veksler, and Zabih 1998

Birchfield & Tomasi 1999

Kolmogorov & Zabih 2001, 2002

Lin & Tomasi 2002

Dynamic Programming: 1D Search

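stereo matching:

[Figure: LEFT scanline matched against RIGHT scanline; the path through the grid gives the disparity map, with an occlusion and a depth discontinuity marked]

string editing:

penalties: mismatch = 1, insertion = 1, deletion = 1

         c  a  r  t
   t  3  2  1  1  1
   a  2  1  0  1  2
   c  1  0  1  2  3
      0  1  2  3  4

c a t

c a r t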
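The string-editing analogy on this slide can be reproduced with a few lines of dynamic programming. The sketch below is a minimal illustration, not the stereo algorithm itself: it fills the cost table for editing "cat" into "cart" with the three unit penalties listed on the slide. The function name `edit_distance_table` and its parameters are hypothetical names chosen for this example.

```python
def edit_distance_table(source, target, mismatch=1, insertion=1, deletion=1):
    """Fill the dynamic-programming table for editing `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column: delete every character of the source prefix.
    for i in range(1, rows):
        table[i][0] = i * deletion
    # First row: insert every character of the target prefix.
    for j in range(1, cols):
        table[0][j] = j * insertion
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            substitute = table[i - 1][j - 1] + (0 if same else mismatch)
            table[i][j] = min(substitute,
                              table[i - 1][j] + deletion,
                              table[i][j - 1] + insertion)
    return table


table = edit_distance_table("cat", "cart")
# Print with the last row on top, as the slide draws it.
for label, row in reversed(list(zip(" cat", table))):
    print(label, row)
```

In the stereo setting, the two strings would correspond to the left and right scanlines, and the insertion and deletion penalties would play the role of occlusion costs; that mapping is an interpretation of the slide rather than something this sketch implements.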

Multiway-Cut: 2D Search

[Figure, two panels: graph connecting pixels and labels]

[Boykov, Veksler, Zabih 1998]

Multiway-Cut Algorithm

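Minimizes: $\sum_{\mathbf{x}} g(\mathbf{x}, f(\mathbf{x})) + \sum_{(\mathbf{x},\mathbf{x}')} w(\mathbf{x},\mathbf{x}')\,[f(\mathbf{x}) \neq f(\mathbf{x}')]$

$g(\mathbf{x}, f(\mathbf{x}))$: cost of assigning label to pixel

$w(\mathbf{x},\mathbf{x}')$: cost of label discontinuity

[Figure: graph with a source label and a sink label connected to the pixels; the minimum cut separates the two labels]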
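To make the two costs on this slide concrete, the sketch below evaluates the quantity being minimized for a given labeling: a per-pixel assignment cost plus a penalty for every neighboring pair of pixels whose labels differ. It then finds the best labeling of a three-pixel toy problem by brute force. This is only an illustration of the objective; it does not implement the minimum-cut computation, and all names and numbers (`labeling_energy`, `data_cost`, `edge_weight`) are hypothetical.

```python
from itertools import product


def labeling_energy(labels, data_cost, edge_weight):
    """Assignment cost of each pixel plus a penalty per label discontinuity."""
    data = sum(data_cost[pixel][label] for pixel, label in enumerate(labels))
    smooth = sum(weight for (p, q), weight in edge_weight.items()
                 if labels[p] != labels[q])
    return data + smooth


# Toy problem: three pixels in a row, two labels (0 and 1).
data_cost = [[0, 5],   # pixel 0 prefers label 0
             [4, 1],   # pixel 1 mildly prefers label 1
             [5, 0]]   # pixel 2 prefers label 1
edge_weight = {(0, 1): 2, (1, 2): 2}  # penalty when neighbors disagree

# Brute force over all 2^3 labelings (feasible only for toy sizes).
best = min(product(range(2), repeat=3),
           key=lambda labels: labeling_energy(labels, data_cost, edge_weight))
best_energy = labeling_energy(best, data_cost, edge_weight)
print(best, best_energy)
```

Brute force grows exponentially with the number of pixels, which is why a graph-cut formulation is attractive for real images.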

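Sampling-Insensitive Pixel Dissimilarity

[Figure: intensity functions $I_L$ and $I_R$ around pixels $x_L$ and $x_R$, illustrating $\bar{d}(x_L, x_R)$]

Our dissimilarity measure: $d(x_L, x_R) = \min\{\bar{d}(x_L, x_R),\ \bar{d}(x_R, x_L)\}$

[Birchfield & Tomasi 1998]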
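A minimal sketch of the measure named on this slide, following the definition in the cited Birchfield & Tomasi 1998 paper as commonly stated: each one-sided term compares one pixel's intensity with the range of the other image's linearly interpolated intensity within half a sample of the matching position, and the measure is the smaller of the two one-sided terms. The function names and the border handling (clamping to the first or last sample) are assumptions made for this example.

```python
def one_sided_distance(intensity_a, pos_a, intensity_b, pos_b):
    """Distance from sample a to interpolated b within half a sample of pos_b."""
    last = len(intensity_b) - 1
    center = intensity_b[pos_b]
    # Linearly interpolated values half a sample to either side
    # (indices are clamped at the borders, an assumption of this sketch).
    minus = 0.5 * (center + intensity_b[max(pos_b - 1, 0)])
    plus = 0.5 * (center + intensity_b[min(pos_b + 1, last)])
    lowest = min(minus, plus, center)
    highest = max(minus, plus, center)
    value = intensity_a[pos_a]
    # Zero when the value falls inside [lowest, highest].
    return max(0.0, value - highest, lowest - value)


def dissimilarity(left, x_left, right, x_right):
    """Symmetric measure: the smaller of the two one-sided distances."""
    return min(one_sided_distance(left, x_left, right, x_right),
               one_sided_distance(right, x_right, left, x_left))


# A linear ramp sampled by two cameras half a sample apart.
left_ramp = [0, 10, 20, 30]
right_ramp = [5, 15, 25, 35]
ramp_result = dissimilarity(left_ramp, 1, right_ramp, 1)
print(abs(left_ramp[1] - right_ramp[1]), ramp_result)

# A genuine mismatch still produces a large value.
mismatch_result = dissimilarity([0, 10, 20], 1, [100, 100, 100], 1)
print(mismatch_result)
```

On the ramp, the plain absolute difference is 5 even though both pixels view the same surface, while the sampling-insensitive measure is 0.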

Dissimilarity Measure Theorems

Given: An interval A such that [xL – ½, xL + ½] ⊆ A and [xR – ½, xR + ½] ⊆ A

Theorem 1: If | xL – xR | ≤ ½, then d(xL,xR) = 0 (when A is convex or concave)

Theorem 2: | xL – xR | ≤ ½ iff d(xL,xR) = 0 (when A is linear)

Correspondence as Segmentation

• Problem: disparities (fronto-parallel): O(Δ); surfaces (slanted): O(Δ² n) => computationally intractable!

• Solution: iteratively determine which labels to use

label pixels: multiway-cut (Expectation)

find affine parameters of regions: Newton-Raphson (Maximization)

Stereo Results (Dynamic Programming)

Stereo Results (Multiway-Cut)

Stereo Results on Middlebury Database

[Figure rows: image; Birchfield Tomasi 1999; Hong-Chen 2004]

Multiway-Cut Challenges

Multiway-cut

Dynamic programming

Acoustic Localization

Problem: Use microphone signals to determine sound source location

Traditional solutions:
1. Delay-and-sum beamforming ✓
2. Time-delay estimation (TDE) ✓

compact

distributed

Recent solutions:
3. Hemisphere sampling ✓✓
4. Accumulated correlation ✓✓
5. Bayesian ✓
6. Zero-energy ✓

Legend: ✓ efficient, ✓ accurate

Localization Geometry

t2

t1

$t_2 - t_1 = \tau$

(one-half hyperboloid)

microphones

sound source

time
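The geometry on this slide can be checked numerically: every source position that produces the same difference in arrival times t2 − t1 at a microphone pair lies on the same half hyperboloid, which is symmetric about the axis through the two microphones. The sketch below is a small illustration under stated assumptions: a speed of sound of 343 m/s, microphone coordinates invented for the example, and a hypothetical helper `arrival_time_difference`.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, an assumed value


def arrival_time_difference(source, mic1, mic2, speed=SPEED_OF_SOUND):
    """Return t2 - t1, the difference in arrival times at the two microphones."""
    return (math.dist(source, mic2) - math.dist(source, mic1)) / speed


mic1 = (-0.5, 0.0, 0.0)
mic2 = (0.5, 0.0, 0.0)

# Two sources related by a rotation about the microphone axis (the x-axis).
diff_a = arrival_time_difference((1.0, 2.0, 0.0), mic1, mic2)
diff_b = arrival_time_difference((1.0, 0.0, 2.0), mic1, mic2)

# A source on the plane midway between the microphones.
diff_mid = arrival_time_difference((0.0, 3.0, 1.0), mic1, mic2)

print(diff_a, diff_b, diff_mid)
```

A single time difference therefore constrains the source to a surface rather than a point, which is why more than one microphone pair is needed to localize it.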

Principle of Least Commitment

“Delay decisions as long as possible”

Example:

[Marr 1982; Russell & Norvig 1995]

Localization by Beamforming

Pipeline: mic 1-4 signals -> prefilter (each) -> delay (each) -> sum -> energy -> find peak

• accurate, NOT efficient

• makes decision late in pipeline (“principle of least commitment”)

• delays (shifts) each signal for each candidate location

[Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]
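The delay, sum, energy, and find-peak stages of this pipeline can be sketched in a few lines. The example below is a simplified illustration, assuming integer sample delays and omitting the prefilter; the names `steered_energy`, `locate_by_beamforming`, and the candidate labels are hypothetical.

```python
def steered_energy(signals, delays):
    """Shift each signal by its delay (in samples), sum, and return the energy."""
    length = len(signals[0])
    energy = 0.0
    for n in range(length):
        total = 0.0
        for signal, delay in zip(signals, delays):
            k = n + delay
            if 0 <= k < length:  # samples shifted past the ends are dropped
                total += signal[k]
        energy += total * total
    return energy


def locate_by_beamforming(signals, candidate_delays):
    """Return the candidate whose delays give the largest steered energy."""
    return max(candidate_delays,
               key=lambda name: steered_energy(signals, candidate_delays[name]))


def delayed(signal, shift):
    """Delay a signal by `shift` samples, padding with zeros."""
    return [0] * shift + signal[:len(signal) - shift]


pulse = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
# The same pulse reaches three microphones 0, 1, and 3 samples late.
signals = [delayed(pulse, 0), delayed(pulse, 1), delayed(pulse, 3)]

# Each candidate location implies one delay per microphone.
candidate_delays = {"A": [0, 1, 3], "B": [0, 0, 0], "C": [0, 2, 1]}
best_candidate = locate_by_beamforming(signals, candidate_delays)
print(best_candidate)
```

Every candidate location requires re-shifting and re-summing all of the signals, which is the cost the slide refers to when it calls beamforming accurate but not efficient.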

Localization by Time-Delay Estimation (TDE)

Pipeline:

mic 1 and mic 2 signals -> prefilter (each) -> correlate -> find peak

mic 3 and mic 4 signals -> prefilter (each) -> correlate -> find peak

both peaks -> intersect (may be no intersection)

• efficient, NOT accurate

• decision is made early

• cross-correlation computed once for each microphone pair

[Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]
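The correlate and find-peak stages for a single microphone pair can be sketched as follows. This is a minimal illustration, assuming integer lags and no prefilter; it stops at the per-pair delay and does not implement the intersection step. The function names are hypothetical.

```python
def cross_correlation(a, b, lag):
    """Sum of a[n] * b[n + lag] over the samples where both are defined."""
    return sum(a[n] * b[n + lag]
               for n in range(len(a))
               if 0 <= n + lag < len(b))


def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which the cross-correlation peaks."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(a, b, lag))


mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # same pulse, three samples later

delay_ab = estimate_delay(mic_a, mic_b, max_lag=5)
delay_ba = estimate_delay(mic_b, mic_a, max_lag=5)
print(delay_ab, delay_ba)
```

Keeping only the peak lag discards the rest of the correlation function, which is the early decision the slide contrasts with the principle of least commitment.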

Localization by Hemisphere Sampling

Pipeline: mic 1-4 signals -> prefilter (each) -> correlate (each microphone pair) -> map to common coordinate system -> sampled locus (one per pair) -> sum -> temporal smoothing -> final sampled locus -> find peak

• efficient

• accurate (but restricted to compact arrays)

[Birchfield & Gillmor 2001]

Localization by Accumulated Correlation

Pipeline: mic 1-4 signals -> prefilter (each) -> correlate (each microphone pair) -> map to common coordinate system -> sampled locus (one per pair) -> sum -> temporal smoothing -> final sampled locus -> find peak

• efficient

• accurate

[Birchfield & Gillmor 2002]

Accumulated Correlation Algorithm

[Figure: microphones and a candidate location]

likelihood = (pair 1) + (pair 2) + ...
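The sum over microphone pairs shown on this slide can be sketched as below. The cross-correlation of each pair is computed once; each candidate location then looks up, for every pair, the correlation value at the lag that location would produce, and the lookups are added. This is a simplified sketch and not the authors' implementation: it assumes integer lags, omits prefiltering and temporal smoothing, and uses a speed and sample rate of 1 so that distances equal sample delays. All names are hypothetical.

```python
import math


def cross_correlation(a, b, lag):
    """Sum of a[n] * b[n + lag] over the samples where both are defined."""
    return sum(a[n] * b[n + lag]
               for n in range(len(a))
               if 0 <= n + lag < len(b))


def accumulated_correlation(signals, mics, candidates, sample_rate, speed, max_lag):
    """Score each candidate by summing per-pair correlations at its implied lags."""
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    # One correlation table per pair, computed once and reused by all candidates.
    tables = {(i, j): {lag: cross_correlation(signals[i], signals[j], lag)
                       for lag in range(-max_lag, max_lag + 1)}
              for i, j in pairs}
    scores = []
    for candidate in candidates:
        total = 0.0
        for i, j in pairs:
            delay = (math.dist(candidate, mics[j]) - math.dist(candidate, mics[i])) / speed
            total += tables[(i, j)].get(round(delay * sample_rate), 0.0)
        scores.append(total)
    return scores


def delayed(signal, shift):
    """Delay a signal by `shift` samples, padding with zeros."""
    return [0] * shift + signal[:len(signal) - shift]


pulse = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
# A source at (0, 0) is 0, 4, and 3 units from the three microphones.
signals = [delayed(pulse, 0), delayed(pulse, 4), delayed(pulse, 3)]
candidates = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]

scores = accumulated_correlation(signals, mics, candidates,
                                 sample_rate=1.0, speed=1.0, max_lag=6)
best_index = scores.index(max(scores))
print(scores, candidates[best_index])
```

Because the per-pair correlations are shared across candidates, the per-candidate work is one table lookup per pair, while the peak is still chosen only after all pairs have been combined.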

Comparison

Bayesian:

Zero energy:

Acc corr:

Hem samp:

TDE:

similarity energy

efficient

accurate

Beamforming:

Unifying framework

efficient

accurate

Integration limits

Beamforming, Bayesian, Zero energy

Accumulated correlation, Hemisphere sampling, Time-delay estimation

Compact Microphone Array

microphone

d=15cm

sampled hemisphere

Results on compact array

[Plots: pan and tilt, without PHAT prefilter and with PHAT prefilter]

More Comparison

Beamforming

Hemisphere Sampling [Birchfield & Gillmor 2001]

Accumulated Correlation [Birchfield & Gillmor 2002]

Results on distributed array

Computational efficiency

[Bar chart: computing time per window (ms), axis from 0 to 8000, comparing beamforming and accumulated correlation on the compact and distributed arrays. Accumulated correlation: 600x faster (compact), 50x faster (distributed).]

Simultaneous Speakers

+ =

Detecting Noise Sources

background noise source

Connection with Stereo

[Okutomi & Kanade 1993]

“Multi-baseline stereo”

Conclusion

• Spatial sensing achieved by arrays of visual and auditory sensors

• Stereo vision
  – match visual signals from multiple cameras
  – recent breakthrough: multiway-cut
  – limitations of multiway-cut

• Acoustic localization
  – match acoustic signals from multiple microphones
  – recent breakthrough: accumulated correlation
  – connection with multi-baseline stereo