
Intel Serv Robotics

DOI 10.1007/s11370-016-0195-4

ORIGINAL RESEARCH PAPER

Online underwater optical mapping for trajectories with gaps

Armagan Elibol¹ · Hyunjung Shim¹ · Seonghun Hong² · Jinwhan Kim³ · Nuno Gracias⁴ · Rafael Garcia⁴

Received: 10 September 2015 / Accepted: 2 March 2016

© Springer-Verlag Berlin Heidelberg 2016

Abstract  This paper proposes a vision-only online mosaicing method for underwater surveys. Our method tackles a common problem in low-cost imaging platforms, where complementary navigation sensors produce imprecise or even missing measurements. Under these circumstances, the success of the optical mapping depends on the continuity of the acquired video stream. However, this continuity cannot always be guaranteed due to motion blur or lack of texture, common in underwater scenarios. Such temporal gaps hinder the extraction of reliable motion estimates from visual odometry, and compromise the ability to infer the presence of loops for producing an adequate optical map. Unlike traditional underwater mosaicing methods, our proposal can handle camera trajectories with gaps between time-consecutive images. This is achieved by constructing a minimum spanning tree, which verifies whether the current topology is connected or not. To do so, we embed a trajectory estimate correction step based on graph theory algorithms. The proposed method was tested with several different underwater image sequences, and results are presented to illustrate its performance.

Armagan Elibol (corresponding author): [email protected]
Hyunjung Shim: [email protected]
Seonghun Hong: [email protected]
Jinwhan Kim: [email protected]
Nuno Gracias: [email protected]
Rafael Garcia: [email protected]

1 School of Integrated Technology, Yonsei Institute of Convergence Technology, Yonsei University, Incheon, Republic of Korea
2 Robotics Program, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
3 Robotics Program, Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
4 Computer Vision and Robotics Institute, University of Girona, Girona, Spain

Keywords  Underwater robotics · Optical mapping · Image mosaicing · Environmental monitoring

1 Introduction

Rapid developments in the robotics field have made it possible to design small and low-cost robots equipped with a limited number of sensors for aerial and/or underwater optical mapping. These robots are capable of reaching areas beyond human reach. This capability allows them to be used for exploring and mapping unknown environments with difficult accessibility for humans, such as the Moon, Mars or the deep ocean. Collecting optical data and processing those data to obtain a map are crucial for different purposes: map-based navigation and planning the path of the vehicle during the execution of the mission; in addition, the generated map can be used for further processing by human experts (e.g., localization of areas of interest, or detection of temporal changes in the morphology or biodiversity of the mapped environment by comparing generated maps). Our target scenarios are surveys of an area of interest with minimally instrumented underwater vehicles such as low-cost ROVs, towed arrays with standalone underwater cameras, or diver and action cameras [1]. Most existing low-cost remotely operated vehicles (ROVs) [2,3] available on the market use low-cost sensors such as a video camera and a pressure sensor, and in some cases a compass [4,5]. This is different from autonomous underwater vehicles (AUVs), which are generally equipped with a wide range of navigation sensors such as ultra-short baseline (USBL), Doppler velocity log (DVL), inertial navigation system (INS), and ring laser gyroscopes. In the absence of such an extensive sensor suite, image sensing may provide valuable navigation information, although under the constraints of visibility and texture-rich environments.

Image mosaicing is a well-known tool for building 2D maps from images. The mosaicing process can be done online during data collection and/or offline, as a batch process after the data have been collected. Offline mosaicing mainly aims at producing a highly accurate map of the surveyed area, while online mosaicing is usually used for navigational purposes, in order to provide visual feedback to the robot. Due to online computational restrictions, online mosaics might not be as accurate as those built by offline methods.

Most online mosaicing approaches are based on image-to-image registration, which relies on time-consecutive image registration when there is no other navigational sensor information available. Underwater environmental challenges such as poor visibility and non-uniform illumination, combined with limited platform maneuverability or inadequate distance to the seafloor, may result in failure of the image registration process. As an example, non-overlapping time-consecutive images may appear when the camera operates at a low frame rate and the robot navigates at a very low altitude, leading to a very small footprint for each image.

The underlying goal of this paper is to develop a vision-only online mosaicing method that is capable of dealing with gaps between time-consecutive images, targeting low-cost platforms with very limited sensor suites. For such platforms, the images and the mosaics built from them are the only source of information about the area being mapped and the path being followed during the survey. Therefore, efficient gap handling is important to carry out successful surveys. To deal with gaps between time-consecutive images, our approach builds upon Kalman filter-based mosaicing approaches similar to [6]. We use a graph-based representation where an image is a node and a successfully registered image pair is represented as a link (or edge) between two nodes of the graph. Every link has an associated motion observation. In this paper, we consider plane-induced motions, which are represented by a 2D planar transformation in the form of a homography. This motion is computed from image registration [7,8]. When the registration of time-consecutive images fails, we add a virtual link between the current and previous nodes (or frames). The motion attached to these links is an identity mapping¹ with large uncertainty, and they are intended to keep the transects of the trajectory represented together with respect to the chosen global frame. A graph is regarded as connected when there is a path between every pair of nodes. In other words, there are no unreachable nodes in a connected graph. In our problem, if the current topology is connected without taking into account the virtual link(s), then every node can be reached, and this yields a proper trajectory estimate regardless of the gaps between time-consecutive images. Therefore, the assumption of overlapping time-consecutive images can be relaxed by the use of non-consecutive overlapping image pairs and by checking the connectivity of the topology graph. Motivated by this, we integrate an MST-based connectivity check and trajectory correction step into the framework; this step is triggered whenever there are virtual link(s) in the system. If the topology graph is connected without taking into account the present virtual links, the trajectory estimate is corrected by minimizing the symmetric transfer error.
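As a concrete illustration of this connectivity check, the following minimal Python sketch (not the authors' implementation; the edge-list format is an assumption) uses a union-find structure to decide whether the topology graph is connected once virtual links are excluded:

```python
# Sketch: connectivity of the topology graph, ignoring virtual links.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def topology_connected(n_images, links):
    """links: list of (i, j, is_virtual) edges between image nodes."""
    uf = UnionFind(n_images)
    for i, j, is_virtual in links:
        if not is_virtual:          # virtual links do not count
            uf.union(i, j)
    root = uf.find(0)
    return all(uf.find(k) == root for k in range(n_images))
```

When this test passes, all virtual links are redundant and can be removed before the trajectory correction step.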

2 Related work

Many online mosaicing algorithms have been presented within the context of visual simultaneous localization and mapping (SLAM). Garcia et al. [9] and Eustice et al. [10] employed an augmented-state Kalman filter (ASKF) to incorporate relative pose measurements obtained from image matching. In [11], a real-time seabed mosaicing system was proposed, combining vision and DVL odometry in order to estimate the vehicle's position and bound the odometry drift. However, the system was limited to covering only translational motion in the vision processing. Mahon et al. [12] presented large-area seafloor mapping results using a stereo vision system, and the stereo rig was also used to provide loop-closure observations. Kim and Eustice [13] showed a photomosaic of a ship hull as a result of autonomous inspection using a hovering-type AUV [14]. In their work, the authors used a pose-graph structure based on mutual information (extended information filter) for inference. These approaches have shown impressive results, but assume the availability of expensive motion sensors (e.g., DVL, INS). However, the availability of such sensors cannot be assumed in most commercial low-cost ROVs [2,3].

Bülow et al. [15] proposed an efficient online mosaicing method for unmanned aerial vehicles (UAVs) using the Fourier–Mellin transform. Even though the proposed methodology is able to create photo maps based on images alone, without any additional measurement input about the vehicle's motion, they did not address the problem of matching non-time-consecutive images. As stated, the proposed method fails if there is not enough overlapping area between time-consecutive images.

¹ An identity mapping has no rotation (0 degrees with respect to the chosen global frame), no translation, and a scale equal to 1.

Caballero et al. [6] formulated a vision-only online mosaicing algorithm based on the extended Kalman filter (EKF). In their model, the state vector was composed of absolute homographies; the state was updated when a loop closure was detected, and images were processed sequentially. Ferreira et al. [16] contributed a real-time mosaicing method in the frame of landmark-based SLAM without the need for costly motion sensors. Neither method explicitly considers the case of matching failure for sequential image pairs, and thus their approaches may fail if there is not enough overlap between time-consecutive images. Kekec et al. [17] utilized a geometric tool from computer graphics, called the separating axis theorem, to detect intersections between new and previous images in the mosaic. Such approaches can reduce the total computational cost by reducing the number of matching trials. However, this approach also requires overlap between time-consecutive images to compute the initial position of the last image.

A common feature of the above online approaches is either the reliance on navigation sensors, or the assumption that time-consecutive images can always be adequately registered.

This paper builds upon a body of work by the same group [18]. In [19], the batch mosaicing problem was addressed using a Kalman filter framework, and different strategies were presented for ranking possible overlapping image pairs. In [20], a global alignment method was presented which substitutes non-linear minimization with two successive linear steps, reducing the computational cost significantly. In [21], the problem of obtaining the topology with fewer image-matching attempts from previously acquired unordered images was addressed using graph theory algorithms. The current paper departs from these in the sense that it addresses the mapping online and with gaps between time-consecutive images.

3 Model definitions and nomenclature

This paper uses the following common mosaicing notation [6,19]:

• $^{i}H_{j}$ is the homography relating image points represented in the coordinate frame of image $j$ to that of image $i$.

• All images need to be represented in a common single coordinate frame, called the mosaic frame and denoted $M$.

• $^{M}H_{i}$ is the homography relating image points in image $i$ to the mosaic frame.

• The state vector, $\mathbf{x} = [x_{1}, x_{2}, x_{3}, \ldots, x_{N}]^{T}$, is composed of the homography values that relate every image with the mosaic frame, where $N$ is the total number of images:

$$^{M}H_{i} = \begin{bmatrix} s_{i}\cos\theta_{i} & -s_{i}\sin\theta_{i} & tx_{i} \\ s_{i}\sin\theta_{i} & s_{i}\cos\theta_{i} & ty_{i} \\ 0 & 0 & 1 \end{bmatrix},$$

where $s_{i}$ is the scale and $\theta_{i}$ the rotation, while $tx_{i}$ and $ty_{i}$ are the translation parameters. $vec(\cdot)$ is the function that converts the homography matrix into a vector, thus

$$x_{i} = vec(^{M}H_{i}) = [s_{i} \;\; \theta_{i} \;\; tx_{i} \;\; ty_{i}]^{T}.$$

• $P$ denotes the covariance matrix of the state vector $\mathbf{x}$.

• A new observation (measurement) is obtained when two images, $i$ and $j$, are successfully matched. The observation is represented by the homography between the corresponding images. The relation between the state and the observation at time $k$ can be expressed as follows:

$$z(k) = vec(^{i}H_{j}) + v(k) = vec(^{i}H_{M} \cdot {}^{M}H_{j}) + v(k) = vec(mat(x_{i})^{-1} \cdot mat(x_{j})) + v(k), \qquad (1)$$

where $mat(\cdot)$ is the function which converts state-vector entries into homography matrices and $v(k)$ is the $4 \times 1$ observation noise vector. It is assumed that the observation noise is Gaussian, is not correlated with the state noise, and has covariance matrix $R(k)$.

• The reference frame is the frame of the first image; therefore $M = 1$ and it is not part of the parameter vector:

$$^{M}H_{1} = mat([1 \;\; 0 \;\; 0 \;\; 0]^{T}). \qquad (2)$$
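The 4-parameter similarity parametrization above can be sketched in Python as follows; the function names vec and mat mirror the notation of this section, and the code is an illustrative sketch rather than the authors' implementation:

```python
import numpy as np

def mat(x):
    """[s, theta, tx, ty] -> 3x3 similarity homography M_H_i."""
    s, th, tx, ty = x
    return np.array([[s * np.cos(th), -s * np.sin(th), tx],
                     [s * np.sin(th),  s * np.cos(th), ty],
                     [0.0,             0.0,            1.0]])

def vec(H):
    """3x3 similarity homography -> [s, theta, tx, ty]."""
    s = np.hypot(H[0, 0], H[1, 0])          # scale from first column
    theta = np.arctan2(H[1, 0], H[0, 0])    # rotation angle
    return np.array([s, theta, H[0, 2], H[1, 2]])

def predicted_observation(x_i, x_j):
    """Noise-free part of Eq. (1): vec(mat(x_i)^-1 * mat(x_j))."""
    return vec(np.linalg.inv(mat(x_i)) @ mat(x_j))
```

Note that vec(mat(x)) recovers x, and that an image matched against itself yields the identity mapping [1, 0, 0, 0] of Eq. (2).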

4 Online mosaicing for trajectories with gaps

Our proposal is based on standard image-to-map (ITM) registration using EKF equations for carrying out trajectory estimation. For better numerical stability, we employ a square-root formulation of the Kalman filter, as in [22]. As our interest is in creating the optical map of an area from images alone, with no assumptions on the dynamics of the underwater platform, we do not take into account any control input. Our model does not have any state prediction equations; only update equations are used. The outline and detailed workflow of our proposal are given in Fig. 1.

Image acquisition and matching  Each new image acquired at time t is matched against image It−1.

Fig. 1 Workflow of the proposed method. It is divided into three main steps: image acquisition and matching, identification of possible overlapping pairs, and state update and trajectory correction

If this matching is successful, we set ConsFlag to 1, so that this image pair (It, It−1) will be used to update the state and its covariance accordingly. We compute the percentage of overlapping area between the images using a quick method: a subset of points from one image is projected onto the other using the motion computed through image matching, and the percentage of projected points that fall inside the frame of the second image is determined. If the matching fails, we add a virtual observation (or link) as if the matching had been successful. The main goal of adding such links is that they allow the segments of the trajectory to be represented with respect to the chosen global frame.² They are useful for searching for possible overlapping images between the last acquired image and different segments of the trajectory. Virtual links are treated as identity mappings (e.g., as in Eq. 2) with a suitably large observation noise covariance, so as to have a minimal impact on the state vector. The state vector is updated with this identity mapping. We increment the parameter Flag, which indicates whether there are virtual links in the topology, and also their cardinality.

² The first image frame is usually considered as the global frame in the absence of any other relevant information.
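The quick overlap check described above can be sketched as follows (an illustrative sketch, not the authors' code; the grid density is an assumed parameter):

```python
import numpy as np

def overlap_percentage(H, width, height, grid=20):
    """Project a grid of points from one image into the other using the
    estimated homography H and return the fraction landing inside."""
    xs, ys = np.meshgrid(np.linspace(0, width - 1, grid),
                         np.linspace(0, height - 1, grid))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(grid * grid)])
    proj = H @ pts
    proj = proj[:2] / proj[2]          # back to inhomogeneous coordinates
    inside = ((proj[0] >= 0) & (proj[0] < width) &
              (proj[1] >= 0) & (proj[1] < height))
    return inside.mean()
```

With the identity homography the overlap is 1.0, while a translation by a full image width yields 0.0, triggering the virtual-link path above.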

Identification of possible overlapping pairs  After finishing the image-matching step, either with a virtual link or with a successfully matched link, the next step is to find the possible overlapping image pairs between image It and all previous images. To do so, we employ two different strategies, depending on the existence of virtual links at time t. If there are virtual links, the trajectory estimate will likely be very inaccurate. Therefore, the trajectory estimate and its covariance are not used to predict overlapping image pairs. In such cases, we use only a visual similarity indicator (described next). If there are no virtual links, then we use both the visual similarity and a spatial proximity indicator.


The spatial proximity indicator is computed from the current state vector and its covariance, using the Mahalanobis distance between image centers [7,19]. The visual similarity indicator is based on feature descriptor matching. We scale down the image size to reduce the computational cost and maintain real-time performance, and extract speeded-up robust features (SURF), as they provide good performance underwater [23]. We use the local difference binary (LDB) descriptor [24], since it is a very fast and compact binary feature descriptor computed directly over an image patch. It is computed using the average intensity and first-order gradients, which can be obtained very quickly using integral images. The descriptors are matched using the Hamming distance. For each descriptor, the two best matches are found with a nearest-neighbor search. The distance ratio between the closest and the second-closest neighbor is used as the criterion to identify matching descriptors [25]. For a given pair of images, the visual similarity indicator is proportional to the number of descriptors that are matched using the distance-ratio criterion.
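The distance-ratio criterion can be sketched as follows for packed binary descriptors (an illustrative sketch; the ratio value 0.8 is an assumption, not the authors' exact setting):

```python
import numpy as np

def similarity_score(desc_a, desc_b, ratio=0.8):
    """desc_a, desc_b: (n, k) uint8 arrays of packed binary descriptors.
    Returns the number of distance-ratio matches from desc_a into desc_b."""
    # Pairwise Hamming distance via XOR followed by bit counting.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    matches = 0
    for row in dist:
        nearest, second = sorted(np.partition(row, 1)[:2])
        if nearest < ratio * second:   # distance-ratio test
            matches += 1
    return matches
```

The visual similarity indicator of an image pair is then proportional to the returned match count.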

A list of possibly overlapping image pairs is created by selecting the image pairs that have high visual similarity (when virtual links are present), or a combination of both high visual similarity and high spatial proximity (when no virtual links exist).

This list contains the pairs that should be matched in the following step using random sampling under the homography constraint. However, this list might be too large to process under real-time constraints. In such a case, only a subset of pairs is chosen, using the observation mutual information (OMI) [19,26] as a ranking criterion. The OMI can be regarded as a measure of the amount of information that one observation can provide to the system within a Kalman filter estimator, since the OMI quantifies the amount of state uncertainty that will be reduced when the observation is made. At time t, given an observation z(t), the OMI can be calculated as:

$$I(t, z(t)) = \frac{1}{2} \ln\left[\,|S(t)|\,|R(t)^{-1}|\,\right], \qquad (3)$$

where S(t) is the innovation covariance matrix in the Kalman filter formulation [27] and R(t) is the observation noise covariance matrix. The computation of the OMI requires the noise covariance matrix R(k). Since the real values for R(k) cannot be obtained without having matched the images, we use the same covariance matrix for all image pairs. OMI scores computed in this way can be regarded as the predicted information gain of the observations. The OMI score is calculated for each image pair in the list of possible overlapping image pairs.
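Equation (3) can be evaluated as in the following sketch, where H_obs denotes the observation Jacobian for a candidate pair (an assumed input for illustration; not the authors' code):

```python
import numpy as np

def omi_score(P, H_obs, R):
    """OMI of Eq. (3): 0.5 * ln(|S| * |R^-1|), with innovation
    covariance S = H_obs P H_obs^T + R."""
    S = H_obs @ P @ H_obs.T + R
    _, logdet_s = np.linalg.slogdet(S)   # ln|S|, numerically stable
    _, logdet_r = np.linalg.slogdet(R)   # ln|R|
    return 0.5 * (logdet_s - logdet_r)   # ln|R^-1| = -ln|R|
```

Using slogdet instead of det avoids overflow for large covariance matrices; candidate pairs are then ranked by this score.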

After ranking the image pairs, we select a subset. During our experiments, the subset size was chosen as five image pairs, which was found empirically to be an adequate value. Matching is then attempted for the image pairs in the subset. For image matching, SURF features and LDB descriptors are used, and the M-estimator sample consensus (MSAC) [28] is used for outlier rejection and homography estimation.

State update and trajectory correction  If some image pairs are successfully matched, then the homography between them is used to update the state. The noise covariance of the homography is calculated from the matched correspondences using first-order covariance propagation [29], assuming additive Gaussian noise on the positions of the correspondences. This noise covariance matrix is used to update the state covariance matrix using the Kalman filter update equations.
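The update described above follows the standard Kalman measurement update; a minimal sketch (assuming a linearized observation model with Jacobian H_obs, not the authors' square-root implementation) is:

```python
import numpy as np

def kalman_update(x, P, z, z_pred, H_obs, R):
    """Standard (E)KF measurement update: returns corrected state and
    covariance given observation z, its prediction z_pred, the
    observation Jacobian H_obs and observation noise covariance R."""
    S = H_obs @ P @ H_obs.T + R                 # innovation covariance
    K = P @ H_obs.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x + K @ (z - z_pred)                # state correction
    P_new = (np.eye(len(x)) - K @ H_obs) @ P    # covariance update
    return x_new, P_new
```

In the paper, a square-root formulation of this update is used instead, for better numerical stability, as noted at the start of this section.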

The final step in this framework is to check whether any of the existing virtual links can be removed. To do so, we check whether the successfully matched pairs so far are enough to establish a path from the first image to the current image. This can be done by finding a spanning tree (ST). If the current form of the topology allows for establishing such a connection, this means that the topology is already connected and there is no need for the virtual links. This also allows us to correct the trajectory estimate by minimizing the symmetric transfer error given in Eq. 4. Due to the time constraints of online operation, we use the four image corners as correspondences between overlapping image pairs during this step. Since the homography between overlapping image pairs is computed from a large set of inliers, the resulting homography is accurate enough to generate four virtual correspondences, namely the image corners, for trajectory correction [30]. After estimating the trajectory, its covariance is propagated [7].
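The symmetric transfer error minimized in this step can be evaluated, for a set of corner correspondences, as in the following sketch (an illustration of the cost function only, not the authors' minimizer):

```python
import numpy as np

def symmetric_transfer_error(H_ij, pts_i, pts_j):
    """pts_i, pts_j: (n, 2) corresponding points (e.g., the four corners
    of image j and their matched positions in image i). H_ij maps points
    of image j into image i; the error sums squared forward and backward
    reprojection distances."""
    def project(H, pts):
        p = np.vstack([pts.T, np.ones(len(pts))])   # homogeneous coords
        q = H @ p
        return (q[:2] / q[2]).T
    e_fwd = np.linalg.norm(pts_i - project(H_ij, pts_j), axis=1) ** 2
    e_bwd = np.linalg.norm(pts_j - project(np.linalg.inv(H_ij), pts_i),
                           axis=1) ** 2
    return float(np.sum(e_fwd + e_bwd))
```

Summing this cost over all overlapping pairs, with the homographies expressed through the state parameters, gives the objective minimized during trajectory correction.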

This MST correction step is triggered if at least one virtual link has been used since the last time the correction was applied. Once the image acquisition is completed, we apply this correction step once more, in order to reduce the effects of any remaining virtual links on the trajectory estimate.

The overall EKF algorithm is described by the following structured text outline:

1. Initialize data structures and set Flag = 0
2. Wait for the new time instant and set Switch = 0 and Empty = 1
3. Do state and covariance matrix augmentation with generic values
4. Acquire new image It
5. Match against last image It−1 and compute overlap
6. if matching is successful
   (a) Add image pair (It, It−1) to the successfully matched image pairs list.


Fig. 2 Simulated trajectory: white dots represent the image centers, while red dashed lines mark non-overlapping time-consecutive image pairs

   (b) Compute the overlap percentage between images (It, It−1)
7. else
   (a) Do EKF measurement update with the virtual measurement
   (b) Set Flag = Flag + 1, overlap = 0, Switch = 1, and Empty = 0
8. end

9. if overlap < overlapThreshold
   (a) if Flag == 0
       i. Find images that lie within a certain bounding area of the current position of It and that have a certain percentage of visual similarity.
   (b) else
       i. Find images that have a certain percentage of visual similarity.
   (c) end
   (d) Rank possible overlapping image pairs and select the top n of them.
   (e) Attempt to match the selected image pairs
   (f) if matching is successful
       i. Update the successfully matched list.
       ii. Set Empty = 1
       iii. if Switch == 1
           A. Set Flag = Flag − 1
       iv. end
   (g) end
10. end
11. if Empty == 1
   (a) Do EKF measurement update with all images successfully matched with It
   (b) if Flag >= 1
       i. Check if the current topology graph is connected

Fig. 3 Final trajectory obtained for the first dataset. Dots represent image centers and dotted (red) lines show the established non-consecutive overlapping image pairs, while dashed lines denote the non-overlapping time-consecutive images


           A. Correct the trajectory estimate
           B. Set Flag = 0
       ii. end
12. end
13. Go to 2

5 Experimental results

The framework described in the previous section was tested on a general setup for image surveys using different unmanned underwater vehicles (UUVs) equipped with a down-looking camera. Experiments were carried out using both real and simulated challenging underwater survey datasets.

During our experiments, we found that an image resolution of around 256 × 192 pixels provides an acceptable trade-off between visual similarity quality and computational performance. Therefore, the images were downscaled by a factor of 2 or 4, depending on the original resolution. The image overlap threshold was set to 90 % (overlapThreshold = 0.9) and the number of overlapping image pairs attempted to be matched was set to n = 5.

All the tests were performed using a desktop computer419

with an Intel Xeon E5-1650™ 3.5 Ghz processor with a420

64-bit operating system and running MATLAB™ on CPU.421

For feature detection and matching, OpenCV functions are422

employed through MEX-file interface. Feature detection and423

computing descriptors took 30 ms on average for an image424

of 256 × 192 pixels, while LDB descriptor matching for an425

image pair having maximum 250 descriptors took approxi-426

mately 0.5 ms. Other steps of the method were implemented427

purely in MATLAB.428
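To give a feel for why binary-descriptor matching is this fast, here is a minimal numpy sketch of brute-force Hamming matching between two descriptor sets. Binary descriptors such as LDB are compared by XOR plus a popcount; the descriptors below are random stand-ins, not real LDB output, and all names are ours.

```python
import numpy as np

def hamming_match(desc_a, desc_b, max_dist=64):
    """Brute-force Hamming matching between two sets of binary
    descriptors packed as uint8 rows (32 bytes = 256 bits each).
    Returns (index_in_a, index_in_b) pairs passing the distance test."""
    # XOR every descriptor in A against every descriptor in B, then
    # count differing bits via a per-byte popcount lookup table.
    popcount = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint16)
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]   # shape (na, nb, nbytes)
    dist = popcount[xor].sum(axis=2)                # Hamming distances
    best = dist.argmin(axis=1)                      # nearest neighbour in B
    return [(i, int(j)) for i, j in enumerate(best) if dist[i, j] <= max_dist]

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(250, 32), dtype=np.uint8)
b = a.copy()  # identical descriptor sets -> every match has distance 0
assert all(i == j for i, j in hamming_match(a, b))
```

The whole 250 × 250 distance matrix is a single vectorized pass, which is consistent with sub-millisecond per-pair matching on modest hardware.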

5.1 Dataset properties

The first dataset is composed of 245 images of 1024 × 1024 pixels, cropped from a high-resolution image using a subset of a real trajectory executed by the Victor-6000 ROV during the MoMAR08 cruise³ at the deep-sea Lucky Strike hydrothermal field (Mid-Atlantic Ridge) [31]. The trajectory superimposed on the high-resolution image is illustrated in Fig. 2. The total number of overlapping image pairs is 3321 and the total number of correspondences is 2,218,692. This dataset has six large jumps between time-consecutive images; in other words, there are six pairs of time-consecutive images that do not have an overlapping area. The distance between the centers of these time-consecutive images can be considered large, taking into account the area covered by a single

³ http://www.ifremer.fr/biocean/acces_gb/rapports/Appel_2cruisefr.html?numcruise=203. Accessed on August 25th, 2015.

Table 1 Errors in pixels on obtained trajectories for tested datasets

Dataset | Method | Overlapping image pairs (Successfully matched / Unsuccessful attempts) | Image center distance error (Mean / Std / Max) | Symmetric transfer distance error (Mean / Std / Max) | Error on mosaic frame (Mean / Std / Max)

Dataset 1 (a) (245 images, 1024 × 1024, 2,218,692 corresp.)
  Proposed | 1139 / 13 | 39.39 / 23.87 / 85.06 | 5.81 / 3.03 / 49.57 | 2.77 / 3.00 / 50.63
  Image-to-image (ITI) | 1053 / 156 | 5309.53 / 2344.96 / 16,739.53 | 4947.29 / 5413.05 / 65,126.64 | 1277.03 / 2424.49 / 12,601.56
  Offline mosaicing in [7] | 3215 / 283 | 42.36 / 25.75 / 84.80 | 5.92 / 3.00 / 45.97 | 2.83 / 2.98 / 46.67
  All-against-all (AGA) | 3321 / 26,569 | 39.57 / 23.38 / 79.08 | 5.96 / 3.01 / 46.04 | 2.85 / 2.99 / 47.18
  Image-to-map (ITM) (4-DOF) | N.A / N.A | 21.93 / 10.29 / 40.80 | 6.58 / 3.30 / 55.53 | 3.13 / 3.21 / 56.90
  Image-to-map (ITM) (8-DOF) | N.A / N.A | N.A / N.A / N.A | 1.19 / 0.38 / 4.03 | 0.56 / 0.36 / 3.89

Dataset 2 (1011 images, 512 × 384, 556,123 corresp.)
  Proposed | 1615 / 1669 | 43.82 / 25.65 / 124.85 | 8.41 / 8.13 / 192.59 | 4.19 / 8.27 / 178.94
  Image-to-image (ITI) | 2140 / 976 | 827.05 / 856.04 / 3665.37 | 131.36 / 259.81 / 4133.36 | 110.51 / 399.02 / 3971.27
  Offline mosaicing in [7] | 3259 / 35,784 | 3.01 / 1.86 / 10.96 | 6.96 / 2.64 / 44.35 | 3.45 / 2.63 / 41.17
  All-against-all (AGA) | 3340 / 507,215 | N.A / N.A / N.A | 7.14 / 2.68 / 39.63 | 3.54 / 2.67 / 39.79

Dataset 3 (493 images, 1440 × 806, 259,443 corresp.)
  Proposed | 1642 / 1219 | 250.49 / 171.67 / 646.74 | 29.16 / 15.50 / 311.62 | 13.28 / 14.56 / 269.74
  Image-to-image (ITI) | 1351 / 387 | 2130.37 / 1230.81 / 5396.63 | 2255.50 / 1541.36 / 7517.08 | 1425.63 / 1908.42 / 8607.78
  Offline mosaicing in [7] | 3676 / 27,810 | 287.52 / 970.79 / 3951.11 | 38.41 / 134.34 / 3657.10 | 17.56 / 126.83 / 3428.24
  All-against-all (AGA) | 3686 / 117,592 | N.A / N.A / N.A | 25.90 / 12.09 / 167.08 | 11.58 / 11.04 / 150.08

(a) For this dataset, trajectory comparison is done against ITM (6-DOF)


Fig. 4 Trajectory comparison for the second dataset. Left the trajectory shown with blue lines is obtained by minimizing the symmetric transfer errors using all overlapping image pairs, while the black one is obtained with the proposed method. Final mosaics (rendered using the last-on-top strategy without any intensity blending) for the second dataset: center obtained using all overlapping image pairs (mosaic size 2348 × 3565); right obtained with the proposed method (mosaic size 2305 × 3580)

image. For this dataset, the mean distance between the centers of non-overlapping time-consecutive pairs is approximately five times greater than a single image diagonal. The final trajectory obtained with our method is illustrated in Fig. 3.

By registering every single image to the high-resolution image, we obtain trajectories that can serve as ground truth. We registered the individual images with two different types of transformations, namely similarity (4-DOF) and projective (8-DOF). These trajectories are included in Table 1 as ITM 4-DOF and 8-DOF.

The second dataset originally consists of 1136 images of 512 × 384 pixels obtained by a Phantom XTL ROV during a survey of a patch reef located in the Florida Reef Tract (depth 7–10 m) near Key Largo in the US [32]. We removed some of the images in order to create a test scenario with relatively large jumps (at least 1.5 times the diagonal size of a single image) between some of the time-consecutive images. The resulting dataset has 1011 images, among which four pairs of time-consecutive images do not overlap. The total number of overlapping image pairs is 3340, while the total number of correspondences is 556,123.

The third dataset is composed of 493 images of 1440 × 806 pixels. This dataset has 15 non-overlapping time-consecutive image pairs. The total number of overlapping image pairs is 3686 and the total number of correspondences is 259,443.

The fourth dataset is extracted from a dataset that was acquired by the ICTINEU AUV [33] during experiments in the Mediterranean Sea, surveying at a depth of 16 m and keeping a distance of 3 m between the robot and the seafloor. The extracted data has 92 images of 384 × 288 pixels and is composed of two unconnected trajectories.

5.2 Trajectory accuracy comparison

For the accuracy comparison of trajectories, we computed a trajectory by minimizing the symmetric transfer error given below, using all overlapping image pairs identified by an exhaustive all-against-all (AGA) image-matching strategy:

\min_{{}^{1}H_{2},\,{}^{1}H_{3},\ldots,{}^{1}H_{N}} \sum_{k,m} \sum_{j=1}^{c} \left( \left\| {}^{k}p_{j} - {}^{1}H_{k}^{-1} \cdot {}^{1}H_{m} \cdot {}^{m}p_{j} \right\|^{2} + \left\| {}^{m}p_{j} - {}^{1}H_{m}^{-1} \cdot {}^{1}H_{k} \cdot {}^{k}p_{j} \right\|^{2} \right),   (4)

where k and m are image indices that were successfully matched, N is the total number of images, and c is the total number of correspondences between the overlapping image pairs. In our experiments, c is selected as the four corners of the images due to real-time constraints. This trajectory is referred to as AGA in Table 1. We computed

trajectories using the offline mosaicing method in [7], which is capable of creating mosaics from a totally unordered image set, unlike traditional methods that require overlap between time-consecutive images. This method makes use of similarity information between image pairs, computed a priori using descriptor matching; we computed this similarity information with the same parameter set used in our proposed method. We also include the results of traditional image-to-image online mosaicing, referred to as image-to-image (ITI), generating overlapping image pairs through the distance between image centers for a given trajectory estimate; non-overlapping time-consecutive images were introduced with an identity mapping. The obtained results are summarized in Table 1. The first image frame of each obtained trajectory is aligned with the first image frame of the AGA trajectory (except for Dataset 1), and then the distances between image centers are calculated against the image centers obtained with the AGA trajectory. Symmetric transfer errors are also computed using all correspondences detected over all identified overlapping image pairs.

Fig. 5 First row trajectory comparison for the third dataset. The trajectory illustrated with blue lines is obtained by minimizing the symmetric transfer errors using all overlapping image pairs, while the black one is obtained with the proposed method. Right zoomed regions of the trajectory comparison. While the difference between the trajectories in regions 1 and 2 can be regarded as large, the difference in region 3 is relatively small. This is mainly related to the number of non-consecutive overlapping image pairs used: images in region 1 have 474 overlapping image pairs, of which only 132 were identified and used by our method; region 2 has 267 overlapping image pairs, of which 198 were used; region 3 has 197 pairs, of which 179 were used. Second row final mosaics (rendered using the last-on-top strategy) for the third dataset. Left obtained using all overlapping image pairs (mosaic size 5935 × 7052 pixels). Right obtained with the proposed method (mosaic size 6090 × 7133 pixels)
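As an illustration, the symmetric transfer cost of Eq. (4) can be evaluated for a given set of absolute homographies as follows. This is a numpy sketch with hypothetical helper names and made-up data; the paper minimizes this cost over the homographies, whereas here we only evaluate it, using the four image corners as correspondences as in the experiments.

```python
import numpy as np

def to_h(p):       # Euclidean -> homogeneous
    return np.append(p, 1.0)

def from_h(q):     # homogeneous -> Euclidean
    return q[:2] / q[2]

def symmetric_transfer_cost(H_abs, pairs, corr):
    """Eq. (4): H_abs[k] maps image k into the global (first) frame.
    pairs holds matched (k, m) index pairs; corr[(k, m)] yields
    corresponding point pairs (p in image k, p in image m)."""
    cost = 0.0
    for k, m in pairs:
        Hk, Hm = H_abs[k], H_abs[m]
        for pk, pm in corr[(k, m)]:
            # transfer pm into image k, and pk into image m
            pm_in_k = from_h(np.linalg.inv(Hk) @ Hm @ to_h(pm))
            pk_in_m = from_h(np.linalg.inv(Hm) @ Hk @ to_h(pk))
            cost += np.sum((pk - pm_in_k) ** 2) + np.sum((pm - pk_in_m) ** 2)
    return cost

# Two images related by a pure translation of (10, 0) pixels:
H = [np.eye(3), np.array([[1, 0, 10], [0, 1, 0], [0, 0, 1]], dtype=float)]
corners = [np.array([0.0, 0.0]), np.array([256.0, 0.0]),
           np.array([256.0, 192.0]), np.array([0.0, 192.0])]
# Perfect correspondences: corner p of image 2 appears at p + (10, 0) in image 1
corr = {(0, 1): [(p + np.array([10.0, 0.0]), p) for p in corners]}
print(symmetric_transfer_cost(H, [(0, 1)], corr))  # ~0 for a consistent estimate
```

A mis-estimated homography shows up directly as a positive residual in this cost, which is what the minimization in Eq. (4) drives down.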

One of the important properties of the symmetric transfer error is its invariance to the selected global frame. On the other hand, it does not always directly provide information on the final mosaic quality, due to the rendering strategy we used, namely last-on-top. If there is a big scale difference between images, one of them may not be visible on the final mosaic, and the error between them may not affect the visual quality of the final mosaic. For this reason, we also report the error in the mosaic frame, computed as:

\min_{{}^{1}H_{2},\,{}^{1}H_{3},\ldots,{}^{1}H_{N}} \sum_{k,m} \sum_{j=1}^{c} \left| {}^{1}H_{k} \cdot {}^{k}x_{j} - {}^{1}H_{m} \cdot {}^{m}x_{j} \right|,   (5)

where k and m are images that have an overlapping area, and c is the total number of correspondences between them. The


Fig. 6 Initial similarity matrix computed by detecting SURF features and matching LDB descriptors without outlier rejection. The maximum number of matched descriptors is 228

main drawback of this error is that it suffers from a scaling effect: the smaller the mosaic size gets, the smaller the error becomes. For the second and third datasets, although the errors on the final trajectories may seem high, the final mosaic quality is similar to that of their counterparts, as seen in Figs. 4 and 5. Although the offline method was able to identify almost all overlapping image pairs, it failed to obtain a coherent trajectory for the third dataset, as one of the transects remained unconnected. This is mainly due to the initial similarity matrix given in Fig. 6. This initial information can be regarded as noisy, since it suggests many possible overlapping pairs that do not actually overlap. Our proposal was able to recover the connection between transects, as the MST check is applied iteratively for every image, unlike the offline method, which tries to find an MST from the first image to the last one by adaptively changing virtual links in each iteration.
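The scaling drawback just described is easy to check numerically: multiplying every absolute homography by a common scale shrinks the Eq. (5) error by the same factor. A minimal numpy sketch with made-up data (helper names are ours):

```python
import numpy as np

def mosaic_frame_error(H_abs, pairs, corr):
    """Eq. (5): accumulate |H_k . x_j - H_m . x_j| over all
    correspondences of every overlapping pair (k, m), measured
    directly in the mosaic (global) frame."""
    err = 0.0
    for k, m in pairs:
        for xk, xm in corr[(k, m)]:
            qk = H_abs[k] @ np.append(xk, 1.0)
            qm = H_abs[m] @ np.append(xm, 1.0)
            err += np.abs(qk[:2] / qk[2] - qm[:2] / qm[2]).sum()
    return err

# A slightly inconsistent pair: image 2 is placed 2 px off the true offset
H = [np.eye(3), np.array([[1, 0, 12], [0, 1, 0], [0, 0, 1]], dtype=float)]
corr = {(0, 1): [(np.array([10.0, 0.0]), np.array([0.0, 0.0]))]}
e1 = mosaic_frame_error(H, [(0, 1)], corr)

# Render the same mosaic at half size: the reported error halves as well
S = np.diag([0.5, 0.5, 1.0])
e2 = mosaic_frame_error([S @ h for h in H], [(0, 1)], corr)
print(e1, e2)  # 2.0 1.0
```

The same misregistration thus reports half the error on a half-size mosaic, which is why this metric is only comparable between mosaics of similar scale.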

5.3 Performance analysis with different numbers of non-overlapping time-consecutive image pairs

In order to see how our method performs with different numbers of non-overlapping time-consecutive image pairs, we designed an experiment using the first dataset. We randomly removed the correspondences of a certain number of overlapping time-consecutive image pairs and ran our method. Results are provided in Table 2. For each number of non-overlapping time-consecutive image pairs, results are statistically computed over 100 different runs; for the maximum error columns, we report the maximum over the different runs. Our method was able to obtain the final trajectory in all cases. This is mainly because the trajectory is fully connected and, moreover, dense (3321 overlapping pairs for 245 images); thus, removing links between time-consecutive overlapping pairs did not break the connectivity. We also observed that increasing the number of removed links increased the total number of executions of the correction step, as expected. On the other hand, if the distance the vehicle traveled between non-overlapping time-consecutive images is relatively small compared to the diagonal of a single image, the intermediate trajectory estimate with virtual link(s) added usually did not drift completely away from the original trajectory. This provides a good initial trajectory estimate for the correction step, thus reducing the total number of iterations in the non-linear optimization process. Also, in such cases, the execution of the correction step can be limited in order to meet computational requirements. Since the trajectory estimate usually contains virtual links (especially for a higher number of non-overlapping time-consecutive image pairs), the step of generating the list of possible overlapping image pairs mostly reduces to, and relies on, the visual similarity search between images alone. In such cases, the overall performance is highly dependent on the performance of the visual similarity search. To support these findings, we also ran a similar test with the third dataset, which

Table 2 Errors in pixels on obtained trajectories for different numbers of non-overlapping time-consecutive image pairs for the first dataset

Method | Number of non-overlapping consecutive image pairs | Image center distance error (Mean / Std / Max) | Symmetric transfer distance error (Mean / Std / Max) | Error on mosaic frame (Mean / Std / Max)

Proposed 25 4.30 3.49 44.40 5.86 3.13 57.49 2.80 3.09 51.54

All-against-all (AGA) 25 N.A N.A N.A 5.98 3.02 46.41 2.86 3.00 47.46

Proposed 50 5.02 4.56 45.39 5.93 3.25 58.45 2.83 3.19 51.54

All-against-all (AGA) 50 N.A N.A N.A 6.01 3.03 46.41 2.88 3.01 47.53

Proposed 100 8.22 7.54 45.55 6.13 3.72 58.56 2.93 3.59 51.59

All-against-all (AGA) 100 N.A N.A N.A 6.07 3.04 46.41 2.90 3.02 47.53

Proposed 200 12.59 10.24 46.47 6.49 4.35 59.06 3.10 4.13 51.89

All-against-all (AGA) 200 N.A N.A N.A 6.20 3.07 46.42 2.97 3.06 47.53


Fig. 7 Left full trajectory. Images from 180 to 225 and from 270 to 315 are extracted to form a new dataset. Right trajectory obtained with the proposed method. Due to the virtual link established between the last image of the first transect and the first image of the second transect, the absolute positioning of the transects is not the same as in the original trajectory. However, within themselves they are aligned similarly to the original trajectory shown in the left plot

Table 3 Errors in pixels on obtained trajectories for the transects of the unconnected trajectory

Dataset | Method | Overlapping image pairs (Successfully matched / Unsuccessful attempts) | Image center distance error (Mean / Std / Max) | Symmetric transfer distance error (Mean / Std / Max) | Error on mosaic frame (Mean / Std / Max)

Dataset 4, Transect 1
  Proposed | 152 / 72 | 28.00 / 20.80 / 63.94 | 6.86 / 2.98 / 41.75 | 5.38 / 4.68 / 58.30
  All-against-all (AGA) | 293 / 742 | N.A / N.A / N.A | 7.84 / 2.83 / 27.14 | 4.30 / 3.10 / 28.43

Dataset 4, Transect 2
  Proposed | 207 / 15 + 44 (a) | 4.13 / 4.31 / 16.04 | 7.39 / 3.11 / 37.00 | 6.00 / 5.07 / 58.04
  All-against-all (AGA) | 366 / 669 + 2116 (a) | N.A / N.A / N.A | 7.20 / 3.68 / 50.69 | 3.65 / 3.72 / 49.80

(a) This number is the image-matching attempts between image pairs among the two transects

is sparser compared to the first dataset. This dataset already has 15 non-overlapping time-consecutive image pairs. We randomly removed ten more links between overlapping time-consecutive image pairs and ran 100 trials. Taking the topology into account, removing ten links was more than enough to break the connectivity of the topology, which rules out the initial scenario of having a connected trajectory. In 17 of these trials, the trajectory became unconnected due to the removed links. In 36 of the remaining 83 trials, our method was able to recover a trajectory similar to the original one. In order to see how the result changes with a different parameter set for the visual similarity search, we performed a final experiment. We increased the number of feature points detected per image by changing the image resizing scale factor from 0.25


Fig. 8 Individual trajectory comparison for the transects of Dataset 4. Left first transect. Right second transect

to 0.5 for this dataset and by reducing the threshold of the feature detector. These changes increased the total number of features detected and naturally caused an increase in the computational cost: feature detection and description took approximately 80 ms per image, and descriptor matching took 7 ms on average per image pair, both running on the CPU. Again, we randomly removed ten more links between overlapping time-consecutive image pairs and ran 100 trials. In 15 trials the trajectory became unconnected, and our method was able to recover the trajectory in 76 of the remaining 85 trials. Since the proposed method is capable of dealing with different numbers of non-overlapping time-consecutive image pairs, depending on the visual similarity search performance, the image acquisition order is not critically important. This is mainly due to the fact that overlapping image pairs are searched for using their visual similarity scores. It should also be noted that the total number of executions of the correction step varies depending on the image order. For the first dataset, we ordered the images randomly and ran our method, repeating this 100 times; our method was successful in all 100 trials. This result mainly depends on the parameters used for the visual similarity search and partially on the density of the dataset. This flexibility regarding image order makes our method suitable for multi-robot surveying scenarios.
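The link-removal trials in this subsection can be mimicked with a small simulation: build the image pair graph, delete k random time-consecutive links, and test whether the remaining loop-closure links keep it connected. This is illustrative Python with a synthetic topology; the graph sizes and link patterns below are our own stand-ins, not the paper's datasets.

```python
import random
from collections import defaultdict

def n_components(n_images, links):
    """Number of connected components of the image pair graph (DFS)."""
    adj = defaultdict(set)
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), 0
    for s in range(n_images):
        if s in seen:
            continue
        comps += 1
        stack = [s]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
    return comps

def removal_trial(n_images, extra_links, k, rng):
    """Remove k random time-consecutive links; True if still connected."""
    consecutive = [(i, i + 1) for i in range(n_images - 1)]
    kept = set(consecutive) - set(rng.sample(consecutive, k))
    return n_components(n_images, list(kept) + extra_links) == 1

rng = random.Random(0)
# Dense topology: many non-consecutive (loop-closure) links survive removal
dense = [(i, j) for i in range(0, 50, 2) for j in range(i + 4, 50, 7)]
ok = sum(removal_trial(50, dense, 10, rng) for _ in range(100))
print(f"connected in {ok}/100 trials")
```

In a dense topology most trials stay connected, matching the first-dataset behaviour; with few loop-closure links the component count rises and the virtual-link correction step would be triggered instead.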

5.4 Test with unconnected trajectories

Finally, we tested our proposal on a trajectory composed of two unconnected transects, although our initial scenario assumes a connected trajectory. The trajectories are illustrated in Fig. 7, the obtained results are given in Table 3 as Dataset 4, and individual trajectory comparisons are depicted in Fig. 8. Our proposal was able to obtain the topologies of the independent parts within acceptable accuracy. The absolute positions of the transects were not obtained correctly due to the virtual link; since the trajectory is unconnected, removing the virtual link is not possible.

6 Conclusions

Owing to inexpensive robotic platforms and optical sensors, underwater exploration and mapping have become available to an increasing number of end-users, who can deploy these systems with little expertise. In this paper, we propose an online mosaicing method that, unlike traditional methods, is capable of handling gaps between time-consecutive images, a capability that makes it adequate for surveys with such low-cost platforms. This is achieved by the use


of an MST check step, which triggers the trajectory estimate correction step. We also show that the visual similarity search becomes crucial when no additional positioning information is available. The proposed method was tested with several different underwater image sequences, and results were presented to illustrate its performance.

References

1. Gleason A, Gracias N, Lirman D, Gintert B, Smith T, Dick M, Reid R (2010) Landscape video mosaic from a mesophotic coral reef. Coral Reefs 29(2):253
2. GNOM Baby (2015). http://www.gnomrov.com/products/gnom-baby/5
3. VideoRay Scout (2015). http://shop.videoray.com/shop-front#!/VideoRay-Scout-Remotely-Operated-Vehicle-ROV-System/p/39381588/category=0
4. SeaBotix LBV150-4 MiniROV (2013). http://www.seabotix.com/products/lbv150-4.htm
5. Proteus 500 ROV (2014). http://www.hydroacousticsinc.com/products/rov-remote-operated-vehicles/rov-product-specs.html
6. Caballero F, Merino L, Ferruz J, Ollero A (2007) Homography based Kalman filter for mosaic building. Applications to UAV position estimation. In: IEEE international conference on robotics and automation, pp 2004–2009
7. Elibol A, Gracias N, Garcia R, Gleason A, Gintert B, Lirman D, Reid PR (2011) Efficient autonomous image mosaicing with applications to coral reef monitoring. In: IROS 2011 workshop on robotics for environmental monitoring
8. Garcia-Fidalgo E, Ortiz A, Bonnin-Pascual F, Company JP (2015) A mosaicing approach for vessel visual inspection using a micro-aerial vehicle. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 104–110
9. Garcia R, Puig J, Ridao P, Cufí X (2002) Augmented state Kalman filtering for AUV navigation. In: IEEE international conference on robotics and automation, Washington, D.C., vol 3, pp 4010–4015
10. Eustice R, Pizarro O, Singh H (2004) Visually augmented navigation in an unstructured environment using a delayed state history. In: 2004 IEEE international conference on robotics and automation (ICRA'04), vol 1. IEEE, pp 25–32
11. Richmond K, Rock SM (2006) An operational real-time large-scale visual mosaicking and navigation system. In: OCEANS 2006. IEEE, pp 1–6
12. Mahon I, Williams SB, Pizarro O, Johnson-Roberson M (2008) Efficient view-based SLAM using visual loop closures. IEEE Trans Robot 24(5):1002–1014
13. Kim A, Eustice R (2009) Pose-graph visual SLAM with geometric model selection for autonomous underwater ship hull inspection. In: IEEE/RSJ international conference on intelligent robots and systems (IROS'09). IEEE, pp 1559–1565
14. Vaganay J, Elkins M, Willcox S, Hover F, Damus R, Desset S, Morash J, Polidoro V (2005) Ship hull inspection by hull-relative navigation and control. In: OCEANS 2005, Proceedings of MTS/IEEE. IEEE, pp 761–766
15. Bülow H, Birk A (2009) Fast and robust photomapping with an unmanned aerial vehicle (UAV). In: IEEE/RSJ international conference on intelligent robots and systems (IROS'09). IEEE, pp 3368–3373
16. Ferreira F, Veruggio G, Caccia M, Bruzzone G (2012) Real-time optical SLAM-based mosaicking for unmanned underwater vehicles. Intell Serv Robot 5(1):55–71
17. Kekec T, Yildirim A, Unel M (2014) A new approach to real-time mosaicing of aerial images. Robot Auton Syst 62(12):1755–1767
18. Elibol A, Gracias N, Garcia R (2012) Efficient topology estimation for large scale optical mapping. Springer tracts in advanced robotics, vol 82. Springer, New York
19. Elibol A, Gracias N, Garcia R (2010) Augmented state-extended Kalman filter combined framework for topology estimation in large-area underwater mapping. J Field Robot 27(5):656–674
20. Elibol A, Garcia R, Gracias N (2011) A new global alignment approach for underwater optical mapping. Ocean Eng 38(10):1207–1219
21. Elibol A, Gracias N, Garcia R (2013) Fast topology estimation for image mosaicing using adaptive information thresholding. Robot Auton Syst 61(2):125–136
22. Moon H, Tully S, Kantor G, Choset H (2007) Square root iterated Kalman filter for bearing-only SLAM. In: The 4th international conference on ubiquitous robots and ambient intelligence (URAI'07), Pohang
23. Garcia R, Gracias N (2011) Detection of interest points in turbid underwater images. In: IEEE OCEANS, pp 1–9
24. Yang X, Cheng KT (2014) Local difference binary for ultrafast and distinctive feature description. IEEE Trans Pattern Anal Mach Intell 36(1):188–194
25. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
26. Ila V, Porta JM, Andrade-Cetto J (2010) Information-based compact pose SLAM. IEEE Trans Robot 26(1):78–93
27. Anderson BDO, Moore JB (1979) Optimal filtering. Prentice-Hall, USA
28. Torr P, Zisserman A (1998) Robust computation and parametrization of multiple view relations. In: Sixth international conference on computer vision. IEEE, pp 727–732
29. Haralick RM (1998) Propagating covariance in computer vision. In: Theoretical foundations of computer vision, pp 95–114
30. Gracias N, Negahdaripour S (2005) Underwater mosaic creation using video sequences from different altitudes. In: MTS/IEEE OCEANS conference, Washington, D.C., pp 1234–1239
31. Escartin J, Garcia R, Delaunoy O, Ferrer J, Gracias N, Elibol A, Cufi X, Neumann L, Fornari DJ, Humphris SE, Renard J (2008) Globally aligned photomosaic of the Lucky Strike hydrothermal vent field (Mid-Atlantic Ridge, 37°18.5′N): release of georeferenced data, mosaic construction, and viewing software. Geochem Geophys Geosyst 9(12):Q12009
32. Lirman D, Gracias N, Gintert B, Gleason A, Reid RP, Negahdaripour S, Kramer P (2007) Development and application of a video-mosaic survey technology to document the status of coral reef communities. Environ Monit Assess 159:59–73
33. Ribas D, Palomeras N, Ridao P, Carreras M, Hernandez E (2007) Ictineu AUV wins the first SAUC-E competition. In: IEEE international conference on robotics and automation, Rome
