❖INTRODUCTION ❖
GOAL: To recover the structure of a rigid object using a sequence of stereo images for such aerospace applications as autonomous precision landing, satellite servicing and retrieving payloads.
BASIC ASSUMPTIONS:
1. Localised point features, such as corners, are readily available.
2. Structure is represented by a collection of 3D points.
3. There is a single, unknown rigid motion between the cameras and the object.
RESULT: An integrated framework for reconstructing an increasingly accurate and dense representation of a rigid object.
❖3D RECONSTRUCTION ❖
PROBLEM FORMULATION:
$f$: frame number in the image sequence
$c \in \{L, R\}$: camera viewpoint
$\pi$: perspective projection function
A set of geometric or textural features is represented as 3D points $\mathcal{X}(f) = \{\mathbf{X}_1(f), \ldots, \mathbf{X}_N(f)\}$, and their 2D projections onto image $I^c(f)$ are $\mathbf{p}^c_k(f) = \pi\big(c, \mathbf{X}_k(f)\big)$, $k = 1, \ldots, N$.
OBJECTIVE:
Find $\mathcal{X}(f)$ given a set of images $I^c(f)$ varying in $f$ and/or $c$.
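To make $\pi(c, \mathbf{X})$ concrete, here is a minimal sketch that assumes an ideal pinhole model for a rectified stereo rig. Both cameras share one focal length, the right camera is displaced along the $x$ axis of the left camera, and points are expressed in the left-camera frame. The poster does not give its camera model. The calibration values `FOCAL_PX` and `BASELINE_MM` and the function name `project` are hypothetical.

```python
import numpy as np

# Hypothetical calibration of a rectified stereo rig (not given in the poster).
FOCAL_PX = 800.0     # focal length in pixels, shared by both cameras
BASELINE_MM = 300.0  # right camera sits this far along +x from the left camera


def project(camera: str, X: np.ndarray) -> np.ndarray:
    """Ideal pinhole projection pi(c, X) for camera c in {"L", "R"}.

    X is an (N, 3) array of points in the left-camera frame with Z > 0.
    Returns (N, 2) image coordinates relative to the principal point.
    """
    if camera not in ("L", "R"):
        raise ValueError("camera must be 'L' or 'R'")
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # Move into the chosen camera's frame: a pure translation along x.
    x_cam = X[:, 0] - (0.0 if camera == "L" else BASELINE_MM)
    u = FOCAL_PX * x_cam / X[:, 2]
    v = FOCAL_PX * X[:, 1] / X[:, 2]
    return np.stack([u, v], axis=1)


# A point 2.4 m in front of the rig lands 100 px apart in the two images.
point = np.array([[300.0, 150.0, 2400.0]])
print(project("L", point), project("R", point))
```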
CHALLENGES:
1. Feature correspondence: how to locate the projections of a physical 3D point on two different images?
♦ search problem
♦ ill-posed, often ambiguous
2. Structure estimation: how to recover the depth information from feature correspondences, and how accurate are the estimates?
♦ want to reduce as much as possible the sensitivity to noise and outliers in correspondences
3. Implementation: how to lower computational complexity and data storage requirements?
❖CLASSICAL APPROACHES ❖
STEREO
Set-up:
♦ spatially varying images
♦ large baseline (a)
♦ usually known stereo geometry
Method:
♦ area-based or feature-based stereo matching
♦ reconstruction by triangulation
Properties:
♦ large baseline ⇒ accurate depth estimates, but correspondence is difficult due to geometric distortion, occlusion, changes in specular reflection, etc.
(a) baseline = separation between two images in terms of the relative distance between camera positions
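Here is a minimal sketch of reconstruction by triangulation. It assumes a rectified stereo pair, where matching features lie on the same image row, with a hypothetical focal length $F$ = 800 px and baseline $b$ = 300 mm. Depth then follows directly from the disparity $d$ as $Z = F b / d$. General, non-rectified set-ups would instead need a full linear or optimal two-view triangulation. The function name is hypothetical.

```python
import numpy as np


def triangulate_rectified(z_left, z_right, focal_px=800.0, baseline_mm=300.0):
    """Reconstruct 3D points from matched features in a rectified stereo pair.

    z_left, z_right: (N, 2) image coordinates relative to the principal point.
    focal_px, baseline_mm: hypothetical calibration values.
    Returns (N, 3) points in the left-camera frame, in the units of the baseline.
    """
    zl = np.atleast_2d(np.asarray(z_left, dtype=float))
    zr = np.atleast_2d(np.asarray(z_right, dtype=float))
    disparity = zl[:, 0] - zr[:, 0]
    if np.any(disparity <= 0):
        raise ValueError("disparity must be positive for points in front of the rig")
    Z = focal_px * baseline_mm / disparity           # depth from disparity
    X = zl[:, 0] * Z / focal_px                      # back-project the left feature
    Y = 0.5 * (zl[:, 1] + zr[:, 1]) * Z / focal_px   # average the two row coordinates
    return np.stack([X, Y, Z], axis=1)


print(triangulate_rectified([[100.0, 50.0]], [[0.0, 50.0]]))
```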
MOTION
Set-up:
♦ temporally varying images
♦ small baseline
♦ known or unknown motion
Method:
♦ correspondence by optical flow or feature tracking
♦ motion estimation complementary to shape recovery
♦ recursive or batch processing of long sequences
Properties:
♦ small baseline ⇒ easy correspondence, but depth estimates are sensitive to errors in 2D feature positions
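A first-order error analysis makes the baseline trade-off concrete. This is a standard result for rectified stereo, not taken from the poster. Differentiating $Z = F b / d$ gives $|\Delta Z| \approx Z^2 |\Delta d| / (F b)$. A disparity error of fixed size therefore costs ten times more depth accuracy when the baseline shrinks tenfold. All numbers below are hypothetical.

```python
def depth_error(depth_mm, disparity_error_px, focal_px=800.0, baseline_mm=300.0):
    """First-order depth error |dZ| ~ Z**2 * |dd| / (F * b) for rectified stereo."""
    return depth_mm ** 2 * abs(disparity_error_px) / (focal_px * baseline_mm)


for baseline in (300.0, 30.0):  # hypothetical "large" (stereo) vs "small" (motion) baseline
    err = depth_error(2400.0, 0.5, baseline_mm=baseline)
    print(f"baseline {baseline:5.0f} mm -> depth error {err:6.1f} mm")
```

With these assumed values, a half-pixel disparity error at 2.4 m gives 12 mm of depth error for a 300 mm baseline and 120 mm for a 30 mm baseline.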
COMBINED STEREO AND MOTION
Set-up:
♦ two consecutive pairs of stereo images or a long stereo sequence
♦ known stereo geometry
♦ known or unknown motion
Method: Adaptations/extensions of existing stereo and/or motion techniques, e.g.,
♦ refine depth estimates for known initial structure
♦ known motion to constrain stereo matching
♦ extend optical flow to stereo pairs
Properties:
♦ stereo and motion complement each other to overcome individual weaknesses
♦ lack of a unified framework that addresses feature correspondence, motion estimation and structure estimation together
❖PROPOSED APPROACH ❖
BASIC IDEA:
♦ Feature matching, 3D reconstruction, feature tracking and motion estimation bootstrap each other;
♦ Initially unambiguous stereo correspondences provide 3D points for unique determination of motion estimates;
♦ Ambiguities do not need to be resolved immediately at each frame. Matching candidates are treated as hypotheses to be tested in future frames;
♦ Motion estimates give additional constraints for feature tracking and stereo matching ⇒ may resolve previous matching ambiguities ⇒ generate more 3D points for more accurate motion estimation
NOTATION:
$\mathbf{z}^L_i(f)$: image feature $i$ extracted from $I^L(f)$
$\mathbf{z}^R_j(f)$: image feature $j$ extracted from $I^R(f)$
$H_k(f, i, j)$: hypothesis that $\mathbf{z}^L_i(f)$ and $\mathbf{z}^R_j(f)$ are stereo correspondences
$\mathbf{p}^L_i(f)$: true projection of a 3D feature on $I^L(f)$
$\mathbf{p}^R_j(f)$: true projection of a 3D feature on $I^R(f)$
$\hat{\mathbf{X}}_{ij}(f)$: 3D point reconstructed from $\hat{\mathbf{p}}^L_i(f)$ and $\hat{\mathbf{p}}^R_j(f)$
MOTION AND MEASUREMENT MODELS
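2D MOTION MODEL: Initially, a second-order motion estimator is used for each 2D feature point in both the left and right images:
$$\mathbf{p}_k(f+1) = \begin{bmatrix} 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{p}_k(f) \\ \mathbf{p}_k(f-1) \\ \mathbf{p}_k(f-2) \end{bmatrix} + \mathbf{G}(f)\,\mathbf{w}(f)$$
where $\mathbf{w}(f) \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{G}(f)$ models the process noise.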
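The deterministic part of the 2D motion model is a constant-acceleration extrapolation. The coefficients $[3, -3, 1]$ make the third finite difference of $\mathbf{p}_k$ vanish, so any track that is quadratic in $f$ is predicted exactly. In the minimal sketch below, the process-noise term is omitted and the function name is hypothetical.

```python
import numpy as np


def predict_second_order(p_f, p_f_minus_1, p_f_minus_2):
    """Noise-free 2D motion model: p(f+1) = 3 p(f) - 3 p(f-1) + p(f-2).

    This is constant-acceleration extrapolation; the process-noise term
    G(f) w(f) is deliberately left out.
    """
    p0 = np.asarray(p_f, dtype=float)
    p1 = np.asarray(p_f_minus_1, dtype=float)
    p2 = np.asarray(p_f_minus_2, dtype=float)
    return 3.0 * p0 - 3.0 * p1 + p2


# A feature moving as (f**2, 2f): frames 0, 1, 2 predict frame 3 exactly.
track = [np.array([f ** 2, 2 * f], dtype=float) for f in range(4)]
print(predict_second_order(track[2], track[1], track[0]), track[3])
```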
2D MEASUREMENT MODEL: Feature extraction errors are modelled as $\mathbf{v}(f) \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_v)$:
$$\mathbf{z}_k(f) = \mathbf{p}_k(f) + \mathbf{v}(f) = \pi\big(c, \mathbf{X}(f)\big) + \mathbf{v}(f)$$
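The poster does not state how an extracted feature is compared with a prediction. Under Gaussian errors, one common choice is a chi-square validation gate on the Mahalanobis distance, and that is what this sketch assumes. The innovation covariance `S`, the 99% gate probability and the function name are all hypothetical. For two degrees of freedom the chi-square quantile has the closed form $-2\ln(1-P)$, which is about 9.21 for $P = 0.99$.

```python
import numpy as np


def in_validation_gate(z, z_pred, S, prob=0.99):
    """Chi-square gate for a 2D feature (assumed matching rule, not the poster's).

    Accepts measurement z for prediction z_pred when the squared Mahalanobis
    distance under innovation covariance S is below the 2-DOF chi-square quantile.
    Returns (accepted, squared_distance).
    """
    innovation = np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float)
    d2 = float(innovation @ np.linalg.solve(np.asarray(S, dtype=float), innovation))
    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-x/2): closed-form quantile.
    threshold = -2.0 * np.log(1.0 - prob)
    return bool(d2 <= threshold), d2


S = 4.0 * np.eye(2)  # hypothetical: 2-pixel standard deviation per coordinate
print(in_validation_gate([103.0, 54.0], [100.0, 50.0], S))  # inside the gate
print(in_validation_gate([106.0, 58.0], [100.0, 50.0], S))  # outside the gate
```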
3D MOTION MODEL: After 3D motion estimates become available, the rigidity constraint for the whole object is enforced using a single consistent motion:
$$\mathbf{X}(f+1) = R(f)\,\mathbf{X}(f) + T(f)$$
where $R(f)$ is a $3 \times 3$ rotation matrix and $T(f)$ is a translation vector.
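The poster does not say which estimator produces $R(f)$ and $T(f)$ from two sets of corresponding reconstructed points. As one plausible choice, this sketch uses the standard SVD-based (Kabsch-style) least-squares solution for a rigid motion. It needs at least three non-collinear points and, being plain least squares, is sensitive to outliers. The function name is hypothetical.

```python
import numpy as np


def estimate_rigid_motion(X_f, X_f1):
    """Least-squares R, T with X_f1 ~ R @ X_f + T (SVD-based absolute orientation).

    X_f, X_f1: (N, 3) arrays of corresponding 3D points, N >= 3, not all collinear.
    """
    A = np.asarray(X_f, dtype=float)
    B = np.asarray(X_f1, dtype=float)
    a_mean, b_mean = A.mean(axis=0), B.mean(axis=0)
    H = (A - a_mean).T @ (B - b_mean)  # 3x3 cross-covariance of the centred sets
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = b_mean - R @ a_mean
    return R, T


# Recover a known motion from noise-free synthetic points.
rng = np.random.default_rng(0)
X_f = rng.normal(scale=300.0, size=(10, 3))
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
T_true = np.array([5.0, -2.0, 20.0])
X_f1 = X_f @ R_true.T + T_true
R_est, T_est = estimate_rigid_motion(X_f, X_f1)
print(np.allclose(R_est, R_true), np.allclose(T_est, T_true))
```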
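3D MEASUREMENT MODEL: The measurement vector now consists of the extracted features on both the left and right images:
$$\begin{bmatrix} \mathbf{z}^L(f) \\ \mathbf{z}^R(f) \end{bmatrix} = \begin{bmatrix} \mathbf{p}^L(f) \\ \mathbf{p}^R(f) \end{bmatrix} + \begin{bmatrix} \mathbf{v}(f) \\ \mathbf{v}(f) \end{bmatrix} = \begin{bmatrix} \pi\big(L, \mathbf{X}(f)\big) \\ \pi\big(R, \mathbf{X}(f)\big) \end{bmatrix} + \begin{bmatrix} \mathbf{v}(f) \\ \mathbf{v}(f) \end{bmatrix}$$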
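Once $R(f)$ and $T(f)$ are available, predictions for both images come from the noise-free parts of the 3D motion and measurement models. The reconstructed point is moved rigidly and then projected into each camera. This sketch assumes a hypothetical rectified pinhole rig (800 px focal length, 300 mm baseline along $x$) and ignores the noise terms. The function names are hypothetical.

```python
import numpy as np

# Hypothetical rectified rig: focal length in pixels, baseline in mm (assumed values).
FOCAL_PX, BASELINE_MM = 800.0, 300.0


def project(X, camera):
    """Ideal pinhole projection into the left ("L") or right ("R") camera."""
    x_cam = X[:, 0] - (0.0 if camera == "L" else BASELINE_MM)
    return np.stack([FOCAL_PX * x_cam / X[:, 2], FOCAL_PX * X[:, 1] / X[:, 2]], axis=1)


def predict_stereo(X_hat, R, T):
    """Noise-free predictions at frame f+1: X(f+1) = R X(f) + T,
    then z^L = pi(L, X(f+1)) and z^R = pi(R, X(f+1))."""
    X_hat = np.atleast_2d(np.asarray(X_hat, dtype=float))
    X_next = X_hat @ np.asarray(R, dtype=float).T + np.asarray(T, dtype=float)
    return project(X_next, "L"), project(X_next, "R")


# The object moves 400 mm towards the rig with no rotation.
z_left, z_right = predict_stereo([[300.0, 150.0, 2400.0]], np.eye(3), [0.0, 0.0, -400.0])
print(z_left, z_right)
```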
ALGORITHM:
1. $\forall\, i, j$: if $\mathbf{z}^L_i(f)$ and $\mathbf{z}^R_j(f)$ satisfy a set of epipolar and minimum/maximum depth constraints
⇒ create $\{H_k(f, i, j)\}$
⇒ reconstruct $\{\hat{\mathbf{X}}_{ij}(f)\}$
2. For each $H_k$, generate predictions $\hat{\mathbf{z}}^L_i(f+1 \mid f)$ and $\hat{\mathbf{z}}^R_j(f+1 \mid f)$.
3. Match image features at frame $f+1$ with the predictions.
If $\mathbf{z}^L_m(f+1)$ is matched with $\hat{\mathbf{z}}^L_i(f+1 \mid f)$, $\mathbf{z}^R_n(f+1)$ is matched with $\hat{\mathbf{z}}^R_j(f+1 \mid f)$, and $\{\mathbf{z}^L_m(f+1), \mathbf{z}^R_n(f+1)\}$ satisfy the epipolar constraints
⇒ create $H_k(f+1, m, n)$
⇒ update $\hat{\mathbf{X}}_{mn}(f+1)$.
If $\mathbf{z}^L_m(f+1)$ has only one stereo matching candidate
⇒ add $\hat{\mathbf{X}}_{mn}(f+1)$ to $\mathcal{X}(f+1)$.
4. Estimate new 3D motion parameters $R(f)$ and $T(f)$ using $\mathcal{X}(f)$ and $\mathcal{X}(f+1)$.
5. Repeat from step 1.
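Step 1 can be sketched for a rectified rig. There, the epipolar constraint reduces to pairing features on (nearly) the same image row, and the minimum/maximum depth constraints become a disparity interval $[F b / Z_{max}, F b / Z_{min}]$. The `Hypothesis` record, the row tolerance and the depth bounds are all hypothetical. `unambiguous` mirrors the single-candidate test of step 3: it keeps hypotheses whose left feature has exactly one right-image candidate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record for H_k(f, i, j): left feature i matches right feature j at frame f."""
    frame: int
    i: int
    j: int


def create_hypotheses(frame, z_left, z_right, focal_px=800.0, baseline_mm=300.0,
                      z_min_mm=2000.0, z_max_mm=3200.0, row_tol_px=1.5):
    """Step 1 for a rectified rig (all parameter values hypothetical):
    epipolar constraint = same row within row_tol_px,
    min/max depth constraints = disparity within [F b / Z_max, F b / Z_min]."""
    d_min = focal_px * baseline_mm / z_max_mm
    d_max = focal_px * baseline_mm / z_min_mm
    hypotheses = []
    for i, (xl, yl) in enumerate(z_left):
        for j, (xr, yr) in enumerate(z_right):
            if abs(yl - yr) <= row_tol_px and d_min <= xl - xr <= d_max:
                hypotheses.append(Hypothesis(frame, i, j))
    return hypotheses


def unambiguous(hypotheses):
    """Hypotheses whose left feature has exactly one stereo matching candidate."""
    counts = Counter(h.i for h in hypotheses)
    return [h for h in hypotheses if counts[h.i] == 1]


z_left = [(100.0, 50.0), (200.0, 80.0)]
z_right = [(0.0, 50.0), (110.0, 80.0), (95.0, 80.5)]
hyps = create_hypotheses(0, z_left, z_right)
print(hyps)               # left feature 1 keeps two candidate matches
print(unambiguous(hyps))  # only left feature 0 is reconstructed immediately
```

Ambiguous hypotheses such as the two candidates for left feature 1 are not discarded. In the spirit of the approach, they stay active until later frames confirm or prune them.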
[Figure: The Incremental Reconstruction Algorithm. Blocks: 2D left image features; 2D right image features; multiple hypothesis tracking and stereo matching; validated stereo correspondences; 3D reconstruction (3D structure representation); validated motion correspondences; motion estimation (motion parameters).]
[Figure: Multiple hypothesis tracking and stereo matching. Blocks: stereo match hypotheses at frame f; for each hypothesis, generate predictions; predicted feature locations; matching with image features; generate new hypotheses; hypothesis management (pruning, merging); delay; stereo match hypotheses at frame f+1. Other labels: motion parameters, 3D structure; validated stereo & motion correspondences.]
[Figure: Predicting feature locations at frame f+1 from a stereo match hypothesis at frame f in the left and right images. Without 3D motion parameters: 2D dynamics applied in each image. With 3D motion parameters: 3D dynamics followed by projection.]
❖RESULTS ❖
SYNTHETIC PROBLEM
♦ Thirty 3D data points randomly generated on a synthetic model
♦ Simulated stereo set-up and motion used to create a stereo image sequence
♦ Occlusion not modelled
♦ Random noise added to simulate feature extraction noise
SUMMARY OF RESULTS:
♦ Increased number of reconstructed points and decreased number of stereo matching hypotheses over the first few frames
♦ 3D motion estimates incorporated after frame 6: some features were lost from tracking, but reconstruction accuracy improved
[Plot, synthetic sequence: number of points (0 to 60) versus frame number (0 to 20). Curves: active hypotheses, reconstructed points, mismatched points, visible features.]
REAL IMAGE SEQUENCE
♦ 30 corner features extracted from each image in the sequence
♦ Many features disappear due to lighting changes
♦ New features at each frame are added to the list of hypotheses
[Figure: left image and right image.]
One sample pair of images from the real sequence. Extracted features are shown as white points.
SUMMARY OF RESULTS:
♦ Results not as satisfactory as for the synthetic problem
♦ No ground truth to assess accuracy
♦ Many ambiguities unresolved by motion and epipolar constraints alone
♦ Motion estimates affected by outliers
[Plot, real sequence: number of points (0 to 60) versus frame number (0 to 20). Curves: active hypotheses, reconstructed points, existing (visible) features.]
❖CONCLUSIONS ❖
♦ presented incremental 3D reconstruction using a stereo image sequence
♦ feature matching, tracking, and motion and structure estimation all integrated into one single framework
♦ demonstrated potential on a synthetic problem
♦ motion and epipolar constraints alone not sufficient for the real sequence
♦ future work includes occlusion modelling, robust motion estimation, and integrating other stereo matching techniques
Acknowledgments:
Research in this paper is funded in part by the Natural Sciences and Engineering Research Council of Canada. Images are courtesy of MacDonald Dettwiler Space and Advanced Robotics Ltd.
Results of Reconstruction
[Figure: ground truth and reconstruction at frames 1, 5, 10 and 20, each shown in a front view (Y versus X, mm) and a top view (Z versus X, mm).]
Frame 1: all the points that initially have unambiguous stereo matches are reconstructed.
Frame 5: more points are reconstructed as some of the previous ambiguities are resolved.
Frame 10: 3D motion estimates have been incorporated. The accuracy of the depth estimates has improved.
Frame 20: the depth estimates of some of the points become even more accurate.