
Page 1:

Queensland University of Technology, CyPhy Lab

Vision-only place recognition

Peter Corke

http://tiny.cc/cyphy

ICRA 2014 Workshop on Visual Place Recognition in Changing Environments

Page 2:

cyphy laboratory

Navigation system

• Integrative component – dead reckoning: odometry, VO, inertial, etc.

Page 3:

cyphy laboratory

Navigation system

• Integrative component – dead reckoning: odometry, VO, inertial, etc.
• Exteroceptive component – GPS, visual place recognition, landmark recognition

Page 4:

cyphy laboratory

The core problem

• Given a new image of a place, determine which previously seen image is the most similar, from which we infer similarity of place
• Similar to the CV image retrieval problem
– Differences:
  • we can assume temporal and spatial continuity (locality) across images in the sequence
  • viewpoint might be quite different
  • the scene might appear different due to external factors
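A minimal sketch of this matching step, assuming each image has already been reduced to a fixed-length descriptor vector and that the temporal-locality assumption is encoded as a simple prior; the function, weights and parameter names are illustrative, not from the talk.

```python
import numpy as np

def match_place(query_desc, db_descs, prev_match=None, locality_sigma=3.0):
    """Return the index of the most similar stored image descriptor.

    query_desc : (D,) descriptor of the current image
    db_descs   : (N, D) descriptors of previously seen images
    prev_match : index matched for the previous frame, used as a
                 temporal/spatial locality prior (optional)
    """
    # cosine distance between the query and every stored descriptor
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    dist = 1.0 - db @ q

    if prev_match is not None:
        # favour database images near the expected next place (locality prior)
        idx = np.arange(len(db_descs))
        penalty = 1.0 - np.exp(-0.5 * ((idx - (prev_match + 1)) / locality_sigma) ** 2)
        dist = dist + 0.5 * penalty

    return int(np.argmin(dist))
```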

Page 5:

cyphy laboratory

Semantic classification

• Given a new image of a place, determine what type of place it is – e.g. kitchen, bathroom, auditorium

• Can be useful if we have strong priors like a map with labelled places

• Can be useful if place types are unique within the environment

Page 6:

cyphy laboratory

Issue #1: Appearance & geometry

• Geometry is the 3D structure of the world
• Appearance is a 2D projection of the world
• Geometry → appearance (computer graphics)
• Appearance ↛ geometry (the mapping is not invertible)

Page 7:

cyphy laboratory

...issue #1: Appearance & geometry

• door or not a door?

Page 8:

cyphy laboratory

Issue #2: Confounding factors

• Weather and lighting
• Shadows
• Seasons

Image credits (L to R): Milford and Wyeth (ICRA2012), Corke et al (IROS2013), Neubert et al (ECMR2013).

Page 9:

cyphy laboratory

Issue #3: Distractors

• Many pixels in the scene are not discriminative – sky, road, etc.

Page 10:

cyphy laboratory

Issue #4: Aliasing

• Where am I?
– Can I tell?
– Does it matter if I can't?

Page 11:

cyphy laboratory

Issue #5: Viewpoint

• What do we actually mean by place?
– Is this the same place?
– What if the same location, but facing the other way?

Page 12:

cyphy laboratory

...issue #5: Viewpoint

• Viewpoint affects the scene globally – all pixels change
• However, small elements of the scene are unchanged (invariant) – just shifted

Page 13:

cyphy laboratory

Issue #6: Getting good images

• Robots move – motion blur
• Huge dynamic range outdoors – from dark shadows to highlights
• Huge variation in mean illumination
– 0.001 lx moonless with clouds
– 0.27 lx full moon
– 500 lx office lighting
– 100,000 lx direct sunlight
• Color constancy

Page 14:

cyphy laboratory

Summary: the nub of the problem

• Appearance is a function of
– scene 3D geometry
– materials
– viewpoint
– lighting changes (intensity, color)
– exogenous factors (leaves, rain, snow)
• This function is complex, non-linear and not invertible
• Lots of undiscriminative stuff like sky, road, etc.

Page 15:

cyphy laboratory

The easy way out - go for geometry

• Roboticists began to use laser scanners in the early 1990s
• Increase in resolution, rotation rate, reflectance data
• Maximum range and cost little changed

Page 16:

cyphy laboratory

Summary: the nub of the problem

• Appearance is a function of
– scene 3D geometry
– materials
– viewpoint
– lighting changes (intensity, color)
– exogenous factors (leaves, rain, snow)
• This function is complex, non-linear and not invertible
• Lots of undiscriminative stuff like sky, road, etc.

Page 17:

cyphy laboratory

Why do we like lasers?

• metric
• sufficient range
• we are suckers for colored 3D models of our world

Page 18:

cyphy laboratory

Measurement principles

• Time of flight
• Phase shift
• Frequency modulated continuous wave (FMCW), or chirp
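For reference, the range relations behind the first two principles (standard results, not from the slides): a pulsed system measures the round-trip time directly, while a phase-shift system infers range from the phase of the returned modulation and is ambiguous beyond half the modulation wavelength.

```latex
% time of flight: round-trip time \Delta t, speed of light c
R = \frac{c\,\Delta t}{2}
% phase shift: modulation frequency f_m, measured phase shift \Delta\varphi
R = \frac{c\,\Delta\varphi}{4\pi f_m} \quad (\text{ambiguous modulo } c/2f_m)
```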

Page 19:

cyphy laboratory

2D scanning

• High-speed rotating mirror
• Typically a pulse every 0.5 deg

Page 20:

cyphy laboratory

The curse of 1/R⁴

Page 21:

cyphy laboratory

3D scanning

• 2-axis scanner
• multi-beam laser
• pushbroom
• flash LIDAR

Page 22:

cyphy laboratory

3D scanning

• 2-axis scanner
• multi-beam laser
• pushbroom
• flash LIDAR

Page 23:

cyphy laboratory

3D scanning

• 2-axis scanner
• multi-beam laser
• pushbroom
• flash LIDAR

Page 24:

cyphy laboratory

3D scanning

• 2-axis scanner
• multi-beam laser
• pushbroom
• flash LIDAR

Page 25:

cyphy laboratory

Long range flash LIDAR

• 1 foot resolution at 1km

US Patent 4,935,616 (1990)

Page 26:

cyphy laboratory

Laser sensing

✓ Clearly sufficient
✓ We have great algorithms for scan matching, building maps, closing loops, etc.
✓ Great hardware: SICK, Velodyne
− Price point still too high
− How will we cope with many vehicles using the same sensor?
− Misses out on color and texture

Page 27:

cyphy laboratory

The perpetual promise of vision

• Visual place recognition is possible

Page 28:

cyphy laboratory

The (amazing) sense of vision

• eye invented 540 million years ago
• 10 different eye designs
• lensed eye invented 7 times

Page 29:

cyphy laboratory

Compound Eyes of a Holocephala fusca Robber Fly 

Page 30:

cyphy laboratory

Anterior Median and Anterior Lateral Eyes of an Adult Female Phidippus putnami Jumping Spider 

Page 31:

Page 32:

cyphy laboratory

Datasheet for the eye/brain system

• 4.5M cone cells
– 150,000 per mm² (~2 µm square)
– daylight only
• 100M rod cells
– night time only
– respond to a single photon
• Total dynamic range 1,000,000:1 (20 bits)
• Human brain
– 1.5 kg
– 10¹¹ neurons
– ~20 W
– ~1/3 for vision

Page 33:

cyphy laboratory

We've been here before

• Eureka project 1987–95
• 1000 km on Paris highways, up to 130 km/h
• 1600 km Munich to Copenhagen, overtaking, up to 175 km/h
• distance between interventions: mean 9 km, max 158 km

1987

Page 34:

cyphy laboratory

...we’ve been here before

98% autonomous

1995

Page 35:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 36:

cyphy laboratory

Getting good images

underexposed

flare

blurry

Page 37:

cyphy laboratory

Pixel brightness

$F = \dfrac{f}{\phi}$ (f-number: focal length over aperture diameter)

$e \propto G\left(\dfrac{L\,T\,A\cos^{4}\theta}{4F^{2}} + h\right)$

where $e$ is pixel brightness, $G$ is gain, $L$ is scene luminance, $T$ is exposure time, $A$ is pixel area, $F$ is the f-number and $h$ is noise.

Page 38:

cyphy laboratory

Exposure time T

• T has to be compatible with camera + scene dynamics
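A rough back-of-envelope sketch of that constraint (my own illustration, not from the slides): for a camera translating at speed v past scenery at depth Z, the image of a point moves at roughly f·v/Z on the sensor, so keeping blur below b pixels bounds the exposure time.

```python
def max_exposure_time(speed, depth, focal_length_mm, pixel_pitch_um, max_blur_px=1.0):
    """Longest exposure (seconds) that keeps motion blur under max_blur_px pixels.

    Assumes sideways motion of the scene relative to the camera; all numbers
    below are illustrative, not from the talk.
    """
    f = focal_length_mm * 1e-3            # focal length in metres
    pitch = pixel_pitch_um * 1e-6         # pixel pitch in metres
    image_velocity = f * speed / depth    # image-plane speed in metres/second
    return max_blur_px * pitch / image_velocity

# e.g. 2 m/s robot, scenery at 5 m, 4 mm lens, 3 um pixels -> about 1.9 ms
print(max_exposure_time(2.0, 5.0, 4.0, 3.0))
```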

Page 39:

cyphy laboratory

Increase ϕ

Page 40:

cyphy laboratory

Photon boosting

Page 41:

cyphy laboratory

Log-response cameras

• Similar dynamic range to the human eye (10⁶), but no slow chemical adaptation time

[Embedded excerpts from Loose et al., "A self-calibrating single-chip CMOS camera with logarithmic response", IEEE Journal of Solid-State Circuits, vol. 36, no. 4, April 2001: a 384×288 pixel sensor with logarithmic photoreceptor response over ~6 decades of intensity (3 mW/m² to 3 kW/m²), slope ~250 mV/decade (adjustable 130–720 mV/decade). Built-in self-calibration reduces fixed-pattern noise to ~3.8% of a decade RMS (~0.63% of the total dynamic range), versus ~90% of a decade for an uncalibrated test column. Example images show a ~5-decade scene (incandescent bulb plus printed logo) captured in one exposure, which a CCD camera cannot do with either aperture setting.]

Markus Loose, “A Self-Calibrating CMOS Image Sensor with Logarithmic Response”, Ph.D. thesis, Institut für Hochenergiephysik, Universität Heidelberg, 1999.
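A toy model of the logarithmic response described in the excerpt above, assuming the ~250 mV/decade slope quoted there; the offset and the valid range are illustrative.

```python
import numpy as np

def log_pixel_response(irradiance_w_per_m2, slope_mv_per_decade=250.0, offset_mv=1500.0):
    """Toy logarithmic pixel response: output rises ~slope mV per decade of irradiance.

    Only meaningful over the sensor's logarithmic region (roughly 3 mW/m^2 to
    3 kW/m^2 for the sensor in the excerpt); offset_mv is an arbitrary choice.
    """
    E = np.asarray(irradiance_w_per_m2, dtype=float)
    return offset_mv + slope_mv_per_decade * np.log10(E)

# six decades of irradiance change the output by only ~1.5 V
print(log_pixel_response([3e-3, 3e3]))
```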

Page 42:

cyphy laboratory

Human visual response

[Plot: spectral response of the S, L and M cones]

• Rods – night time
• Cones – daytime, 3 flavours

Page 43:

The silicon equivalent

[Diagram: light passes through a colored filter array (CFA) onto the silicon photosensor, or pixel]

Page 44:

Dichromats

Page 45:

cyphy laboratory

Why stop at 3 cones?

Page 46:

cyphy laboratory

Multispectral cameras

• FluxData Inc. FS-1665
• 3 Bayer + 2 NIR
• 3 CCDs

Page 47:

cyphy laboratory

Assorted pixel arrays

[Figure residue: CFA patterns and their Nyquist limits — (a) a 3-color, 4-exposure CFA and (b) a 7-color, 1-exposure CFA, each shown against horizontal/vertical Nyquist frequencies and the optical resolution limit (f/5.6, λ = 555 nm) for a sub-micron-pitch sensor; aliasing does not occur where the input signal stays below the Nyquist limit.]

• Better dynamic range
– 2×2 Bayer filter cells with 3 levels of neutral density filter
• More colors
– 3×3 or 4×4 filter cells → 9 or 16 primaries

Page 48:

Wide field of view

Page 49:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 50:

cyphy laboratory

Whole scene descriptors

• GIST
• HOG
• SIFT/SURF on whole image
• Color histograms
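A minimal sketch of one such whole-scene descriptor, assuming a tiled gradient-orientation histogram (HOG-like) computed with NumPy; the grid size and bin count are arbitrary choices for illustration.

```python
import numpy as np

def whole_image_descriptor(gray, grid=(4, 4), bins=8):
    """HOG-like whole-scene descriptor: per-tile histograms of gradient orientation."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation in [0, pi)
    H, W = gray.shape
    desc = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            ys = slice(i * H // grid[0], (i + 1) * H // grid[0])
            xs = slice(j * W // grid[1], (j + 1) * W // grid[1])
            hist, _ = np.histogram(ang[ys, xs], bins=bins, range=(0, np.pi),
                                   weights=mag[ys, xs])
            desc.append(hist)
    desc = np.concatenate(desc)
    return desc / (np.linalg.norm(desc) + 1e-12)     # unit length for cosine matching
```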

Page 51:

cyphy laboratory

...issue #5: Viewpoint

• Viewpoint affects the scene globally – all pixels change
• However, small elements of the scene are unchanged (invariant) – just shifted

Page 52:

cyphy laboratory

Visual elements

• Bag of visual words (BoW) – FAB-MAP, OpenFABMAP
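A minimal bag-of-visual-words sketch, not FAB-MAP itself (which adds a probabilistic model over word co-occurrence): ORB features are quantised against a k-means vocabulary and each image becomes a normalised word histogram. Assumes OpenCV and scikit-learn; clustering binary ORB descriptors with Euclidean k-means is a simplification.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=500)

def orb_descriptors(gray):
    """Detect ORB keypoints and return their descriptors as float32 rows."""
    _, desc = orb.detectAndCompute(gray, None)
    return np.zeros((0, 32), np.float32) if desc is None else desc.astype(np.float32)

def build_vocabulary(training_images, k=200):
    """Cluster descriptors from training images into a k-word visual vocabulary."""
    all_desc = np.vstack([orb_descriptors(img) for img in training_images])
    return KMeans(n_clusters=k, n_init=10).fit(all_desc)

def bow_histogram(gray, vocab):
    """Quantise an image's descriptors against the vocabulary -> normalised histogram."""
    desc = orb_descriptors(gray)
    if len(desc) == 0:
        return np.zeros(vocab.n_clusters)
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)
```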

Page 53:

cyphy laboratory

• Feature-detection front ends fail completely across extreme perceptual change

Page 54:

cyphy laboratory

Future work

• Really interesting recent work on learning distinctive elements of a scene
• Contextual priming: choose the features for the situation
– day/night
– indoor/outdoor

Page 55:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 56:

cyphy laboratory

Understand variation over time

• Traditional visual localization methods are not robust to appearance change
• How do features change over time?
• Can we predict appearance based on time?

Page 57:

cyphy laboratory

Generalization of Temporal Change over Space

• Assume we have a “training set” of paired image sequences from locations under two different times of day

Page 58:

cyphy laboratory

Training Images

• Use known matched images to generate a temporal “codebook” across the two appearance configurations
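A rough sketch of the codebook idea, in the spirit of appearance-change prediction (e.g. Neubert et al.) rather than an exact reproduction of any published method: pixel-aligned patch pairs from the two conditions are clustered, each cluster stores the mean patch observed in the other condition, and a new image is translated patch by patch before matching. Patch size and cluster count are arbitrary; the training image pairs are assumed pixel-aligned and of equal size.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_patches(img, size=8):
    """Cut a grayscale image into non-overlapping size x size patches (flattened)."""
    H, W = img.shape
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(0, H - size + 1, size)
                     for x in range(0, W - size + 1, size)], dtype=float)

def learn_codebook(imgs_cond_a, imgs_cond_b, k=64, size=8):
    """Cluster condition-A patches; store the mean aligned condition-B patch per word."""
    pa = np.vstack([extract_patches(i, size) for i in imgs_cond_a])
    pb = np.vstack([extract_patches(i, size) for i in imgs_cond_b])
    km = KMeans(n_clusters=k, n_init=10).fit(pa)
    translated = np.array([pb[km.labels_ == w].mean(axis=0) for w in range(k)])
    return km, translated

def predict_appearance(img, km, translated, size=8):
    """Translate an image from condition A to its predicted condition-B appearance."""
    patches = extract_patches(img, size)
    words = km.predict(patches)
    H, W = img.shape
    out, idx = np.zeros_like(img, dtype=float), 0
    for y in range(0, H - size + 1, size):
        for x in range(0, W - size + 1, size):
            out[y:y + size, x:x + size] = translated[words[idx]].reshape(size, size)
            idx += 1
    return out
```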

Page 59:

cyphy laboratory

Generalizing about change

Page 60:

cyphy laboratory

Generalising about change: results

Page 61:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 62:

cyphy laboratory

Camera Resolutions...

courtesy Barry Hendy

Similar story for storage and compute

Page 63:

cyphy laboratory

Pixel subtended angle

• 10 Mpixel sensor, 30 deg FOV – 0.01 deg per pixel
• 64 pixel sensor, 30 deg FOV – 4 deg per pixel
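The arithmetic behind those two numbers, as a quick sketch (square pixel arrays assumed for simplicity):

```python
import math

def deg_per_pixel(total_pixels, fov_deg=30.0):
    """Approximate angle subtended by one pixel, assuming a square pixel array."""
    pixels_across = math.sqrt(total_pixels)
    return fov_deg / pixels_across

print(deg_per_pixel(10e6))   # ~0.0095 deg/pixel for a 10 Mpixel sensor
print(deg_per_pixel(64))     # ~3.75 deg/pixel for a 64 pixel (8x8) sensor
```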

Page 64:

cyphy laboratory

Use fewer pixels

Page 65:

cyphy laboratory

How many pixels do you need?

[Embedded excerpt from Milford, "Visual route recognition with a handful of bits", RSS 2012:]

…for the Eynsham dataset or datasets with odometry, it is possible to use a small range or even a single value of v_k.

Treating the normalized per-frame difference scores as independent, zero-mean, unit-variance random variables, their sum over a sub-route of n frames has mean zero and variance n, so the normalized route difference score (the sum divided by n) has mean zero and variance 1/n. Percentile rank scores can then be used to set the sub-route matching threshold: for the primary sub-route length n = 50, a threshold of −1 corresponds to a 7.7×10⁻¹³ probability of the match occurring by chance. The current sub-route is matched to the stored sub-route with the minimum score if that score is below the threshold s_m; otherwise it is stored as a new sub-route. In the second half of the Eynsham dataset the route is repeated, leading to lower minimum matching scores.

Figure 2. Normalized sub-route difference scores for the Eynsham dataset with the matching threshold s_m that yields 100% precision performance.

Four datasets were processed, each consisting of two traverses of the same route: a 70 km road journey in Eynsham (UK), 2 km of motorbike circuit racing in Rowrah (UK), 40 km of off-road racing up Pikes Peak (US), and about 100 m in an office building. Eynsham was the primary dataset for quantitative analysis; it consists of high-resolution captures from a Ladybug2 camera (a circular array of five cameras) at 9575 locations along the route. The Pikes Peak images came from cameras on two different racing cars, with the car dashboard and structure cropped from the images.

TABLE I. DATASETS
Dataset      Distance   Frames   Spacing between frames   Image storage
Eynsham      70 km      9575     6.7 m (median)            306 kB
Rowrah       2 km       440      4.5 m (mean)              7 kB
Pikes Peak   40 km      4971     8 m (mean)                159 kB
Office       53 m       832      0.13 m (mean)             1.6 kB

Videos: Rowrah http://www.youtube.com/watch?v=_UfLrcVvJ5o · Pikes Peak http://www.youtube.com/watch?v=4UIOq8vaSCc and http://www.youtube.com/watch?v=7VAJaZAV-gQ · Office data http://df.arcs.org.au/quickshare/790eb180b9e87d53/data3.mat

Figure 3. The (a) 35 km Eynsham, (b) 1 km Rowrah and (c) 20 km Pikes Peak routes, each of which was repeated twice. Copyright 2011 Google.
Figure 4. (a) The Lego Mindstorms data-acquisition rig with two sideways-facing light sensors and a GoPro camera for evaluating matched routes. (b) The 53 m route, repeated twice to create the dataset.

For the Eynsham dataset, image pre-processing consisted of crudely cropping and concatenating the raw camera images into a panorama (no undistortion, blending or illumination adjustment), then reducing its resolution (re-sampling using pixel-area relation in OpenCV 2.1.0) to the resolutions studied.
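A small sketch of the sub-route decision rule described in the excerpt above, assuming the per-frame difference scores have already been normalized to zero mean and unit variance:

```python
import numpy as np

def subroute_score(frame_diffs):
    """Normalized route difference score: mean of per-frame normalized differences.

    For n independent zero-mean, unit-variance frame scores this has
    mean 0 and variance 1/n.
    """
    return float(np.mean(frame_diffs))

def is_match(frame_diffs, threshold=-1.0):
    """Accept the sub-route match if its score falls below the threshold s_m."""
    return subroute_score(frame_diffs) < threshold
```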

Page 66:

cyphy laboratory

Example Route Match - Eynsham

Page 67:

cyphy laboratory

Eynsham Resolution Reduction Results

[Plot, with an arrow annotated “direction of goodness”]

Page 68:

cyphy laboratory

Eynsham Pixel Bit Depth Results

32 pixel images

2-bit images!

Page 69:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 70:

cyphy laboratory

Eynsham Sequence Length Results

32 pixel images

Page 71:

cyphy laboratory

Milford and Wyeth, ICRA2012
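A minimal sketch in the spirit of this sequence-based matching (SeqSLAM-style; not the authors' implementation): tiny patch-normalized images, a pairwise difference matrix, and a search for the lowest-cost diagonal, assuming roughly constant relative speed between the two traverses and that the query is the last seq_len frames.

```python
import numpy as np

def tiny_normalized(img, size=(32, 32), patch=8):
    """Downsample (nearest neighbour) and patch-normalize a grayscale image."""
    H, W = img.shape
    ys = np.arange(size[0]) * H // size[0]
    xs = np.arange(size[1]) * W // size[1]
    small = img[np.ix_(ys, xs)].astype(float)
    for y in range(0, size[0], patch):
        for x in range(0, size[1], patch):
            p = small[y:y + patch, x:x + patch]
            small[y:y + patch, x:x + patch] = (p - p.mean()) / (p.std() + 1e-6)
    return small

def difference_matrix(query_imgs, db_imgs):
    """D[i, j] = mean absolute difference between query image i and database image j."""
    Q = np.array([tiny_normalized(i).ravel() for i in query_imgs])
    B = np.array([tiny_normalized(i).ravel() for i in db_imgs])
    return np.mean(np.abs(Q[:, None, :] - B[None, :, :]), axis=2)

def best_sequence_match(D, seq_len=10):
    """Slide a straight diagonal of length seq_len over D; return (db_index, cost)."""
    nq, nb = D.shape
    costs = [np.mean([D[i, j + i] for i in range(seq_len)])
             for j in range(nb - seq_len + 1)]
    j = int(np.argmin(costs))
    return j, costs[j]
```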

Page 72:

cyphy laboratory

Remember ALVINN?

• Back in the '70s and '80s, robotics, AI and computer vision research used only low-resolution images
– camera limitations
– compute limitations
– algorithm limitations

Page 73:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 74:

cyphy laboratory

Shadows are everywhere! Yet, the human visual system is so adept at filtering them out, that we never give shadows a second thought; that is until we need to deal with them in our algorithms. Since the very beginning of computer vision, the presence of shadows has been responsible for wreaking havoc on a variety of applications....

Lalonde, Efros, Narasimhan ECCV 2010

Page 75:

cyphy laboratory

[Plots: pixel intensity vs. distance along profile (pixels), and color ratios R/G and B/G vs. distance along profile (pixels)]

Page 76:

cyphy laboratory

Blackbody illuminants

[Image labels: T = 2000–3000 K, T = 3000 K, T = 5000–5400 K, T = 8000–10000 K]

Page 77:

cyphy laboratory

Log-log chromaticity

$r_R = \dfrac{R}{G}, \qquad r_B = \dfrac{B}{G}$

$\log r_R = c_1 - \dfrac{c_2}{T}, \qquad \log r_B = c'_1 - \dfrac{c'_2}{T}$

[Plot of log r_B vs. log r_R: increasing the illuminant temperature T moves points along a line; position across that direction is a material property]

Page 78:

cyphy laboratory

Angle of the projection line

[Plots: the invariant image (axes u, v in pixels), and invariant-image variance as a function of the invariant line angle θ (rad); θ is the direction of the projection line in the (log r_R, log r_B) plane]
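A minimal sketch of forming the 1-D shadow-invariant image from the log-chromaticity coordinates above, given a projection-line angle θ (supplied by the caller here; in practice it is estimated, e.g. by the variance search shown on this slide):

```python
import numpy as np

def shadow_invariant(rgb, theta):
    """Project log-chromaticities onto the invariant line at angle theta.

    rgb   : float image, shape (H, W, 3), values > 0
    theta : invariant (projection) line angle in radians
    """
    eps = 1e-6
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    log_rR = np.log((R + eps) / (G + eps))   # log r_R = log(R/G)
    log_rB = np.log((B + eps) / (G + eps))   # log r_B = log(B/G)
    # 1-D invariant image: component along the direction at angle theta
    return np.cos(theta) * log_rR + np.sin(theta) * log_rB
```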

Page 79:

cyphy laboratory

Page 80:

cyphy laboratory

Car park sequence

Page 81:

cyphy laboratory

Car park sequence

Page 82:

cyphy laboratory

Outdoor localization

[Embedded excerpt from Corke et al. (IROS 2013):]

Fig. 14. The approach does not compensate for shadows containing reflected lighting from objects in the scene. The figure illustrates an example where shadows next to coloured walls are not fully removed.

…textures induced by shadows rather than the underlying structure. We have applied standard point feature extraction (e.g. SIFT, SURF etc. [20]) to the invariant image with success. Despite the lower SNR of the invariant image, all but the smallest-scale features reliably associate with material rather than lighting features of the scene.

F. Limitations

One of the limitations of this method is that the model assumes scene lighting by a single Planckian source (Section IV) and hence cannot fully compensate when shadows are partly illuminated by light reflected from objects populating a scene. For example, Figure 14 shows a strong shadow next to a building, but the shadow is clearly still evident in the invariant image. In this case the shadow region is illuminated by sky light reflected from the coloured wall of the building, which makes its spectrum non-Planckian.

VI. CONCLUSION

In this paper we have described an approach to eliminate shadows from colour images of outdoor scenes that is known in the computer vision community and applied it to a hard robotic problem of outdoor vision-based place recognition. We have described the details of key implementation steps such as minimising camera spectral channel overlap and estimating the direction of the projection line, and discussed approaches to overcome practical problems with low and high pixel values.


REFERENCES

[1] J. Lalonde, A. Efros, and S. Narasimhan, "Detecting ground shadows in outdoor consumer photographs," Computer Vision–ECCV 2010, pp. 322–335, 2010.
[2] W. Churchill and P. Newman, "Practice makes perfect? Managing and leveraging visual experiences for lifelong navigation," IEEE International Conference on Robotics and Automation, 2012.
[3] J. Zhu, K. Samuel, S. Masood, and M. Tappen, "Learning to recognize shadows in monochromatic natural images," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 223–230.
[4] R. Guo, Q. Dai, and D. Hoiem, "Single-image shadow detection and removal using paired regions," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 2033–2040.
[5] G. Finlayson, M. Drew, and C. Lu, "Intrinsic images by entropy minimization," Computer Vision–ECCV 2004, pp. 582–595, 2004.
[6] G. Finlayson, S. Hordley, C. Lu, and M. Drew, "On the removal of shadows from images," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 1, pp. 59–68, 2006.
[7] S. Narasimhan, V. Ramesh, and S. Nayar, "A class of photometric invariants: separating material from shape and illumination," in IEEE International Conference on Computer Vision, Oct. 2003, vol. 2, pp. 1387–1394.
[8] S. Nayar and S. Narasimhan, "Vision in bad weather," in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2. IEEE, 1999, pp. 820–827.
[9] S. Narasimhan and S. Nayar, "Chromatic framework for vision in bad weather," in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 1. IEEE, 2000, pp. 598–605.
[10] V. Kwatra, M. Han, and S. Dai, "Shadow removal for aerial imagery by information theoretic intrinsic image analysis," in Computational Photography (ICCP), 2012 IEEE International Conference on. IEEE, 2012, pp. 1–8.
[11] S. Park and S. Lim, "Fast shadow detection for urban autonomous driving applications," in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on. IEEE, 2009, pp. 1717–1722.
[12] M. Drew and H. Joze, "Sharpening from shadows: Sensor transforms for removing shadows using a single image," in Color Imaging Conference, 2009, pp. 267–271.
[13] M. Milford, "Visual route recognition with a handful of bits," in Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2012.
[14] P. I. Corke, Robotics, Vision & Control: Fundamental Algorithms in MATLAB. Springer, 2011, ISBN 978-3-642-20143-1.
[15] F. Wang, T. Syeda-Mahmood, B. Vemuri, D. Beymer, and A. Rangarajan, "Closed-form Jensen-Renyi divergence for mixture of Gaussians and applications to group-wise shape registration," Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009, pp. 648–655, 2009.
[16] Z. Botev, J. Grotowski, and D. Kroese, "Kernel density estimation via diffusion," The Annals of Statistics, vol. 38, no. 5, pp. 2916–2957, 2010.
[17] M. Sheehan, A. Harrison, and P. Newman, "Self-calibration for a 3D laser," The International Journal of Robotics Research, 2011.
[18] A. Hamza and H. Krim, "Image registration and segmentation by maximizing the Jensen-Renyi divergence," in Energy Minimization Methods in Computer Vision and Pattern Recognition. Springer, 2003, pp. 147–163.
[19] P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[20] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.

Page 83:

cyphy laboratory

Place 5

Page 84:

cyphy laboratory

Place 8

Page 85:

cyphy laboratory

Image similarity

• 5 places
– 2–9 images of each
– total 28 images (48×64)
– compared using ZNCC
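A minimal zero-mean normalized cross-correlation (ZNCC) similarity between two equal-size grayscale images, as used for the comparison above; this particular implementation is a sketch, not the talk's code.

```python
import numpy as np

def zncc(img_a, img_b):
    """Zero-mean normalized cross-correlation of two equal-size grayscale images.

    Returns a value in [-1, 1]; 1 means identical up to affine brightness change.
    """
    a = img_a.astype(float).ravel()
    b = img_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```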

Page 86:

cyphy laboratory

PR curve

[Precision-recall curves comparing greyscale and shadow-invariant images]

Page 87:

cyphy laboratory

Park example


Page 88:

cyphy laboratory

Fail!

Page 89:

cyphy laboratory

Approaches to robust place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)

Page 90:

Determining distance
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

[Embedded excerpt on depth cues:]

Occlusion provides only ordinal depth information — one can know that one object is in front of another, but not by how much — yet it can be trusted at all distances without attenuation and its depth threshold exceeds that of all other sources, around 0.1% of distance or better. Height in the visual field measures relations among the bases of objects projected to the eye, assuming a ground plane, gravity and no ceiling; its utility dissipates with distance and, for an upright adult observer with an eye height of about 1.6 m, no base closer than 1.6 m is available. Across many artistic traditions, occlusion appears to have been the first source of information used to depict depth, with height in the visual field the second.

Figure 1. Just-discriminable ordinal depth thresholds as a function of the logarithm of distance from the observer, from 0.5 to 10,000 m, for nine sources of information about layout. From "Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth," by J. E. Cutting and P. M. Vishton, 1995, in W. Epstein and S. Rogers (Eds.), Perception of Space and Motion (p. 80), San Diego: Academic Press. Copyright 1995 by Academic Press. Reprinted with permission from Elsevier.

Page 91:

Determining distance
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 92:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 93:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 94:

Page 95:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 96:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 97:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 98:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

Page 99:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

http://www.youtube.com/watch?v=6GliSCGkpZ4

"Eye Movement Terminology", YouTube, 2008, Sam Tapsell. Used with permission.

Page 100:

How do we estimate distance?
1. Occlusion
2. Height in visual field
3. Relative size
4. Texture density
5. Aerial perspective
6. Binocular disparity
7. Accommodation
8. Convergence
9. Motion perspective

[Video from a handheld camera while walking, with near and far objects moving past]

Page 101:

3D camera

Page 102:

cyphy laboratory

Use 3D structure to identify places

Page 103:

cyphy laboratory

Page 104:

cyphy laboratory

Summary: the nub of the problem

• Appearance is a function of
– scene 3D geometry
– materials
– viewpoint
– lighting changes (intensity, color)
– exogenous factors (leaves, rain, snow)
• This function is complex, non-linear and not invertible
• Lots of undiscriminative stuff like sky, road, etc.
Page 105:

cyphy laboratory

Approaches to robust visual place recognition

• Get better images
• Robust 2D image descriptors
• Understand variation over time
• Use fewer pixels
• Use a sequence of recent images
• Use some invariant
• Use 3D structure
– laser, stereo, active range camera (e.g. Kinect)
Page 106:

cyphy laboratory

• We’re doing: robust vision, semantic vision, vision & action, algorithms & architectures

• Looking for 16 postdocs

We’re hiring

www.roboticvision.org