

Camera orientation from a single image using vanishing point accumulation: a practical case study

Michele Matteini

May 27, 2018

Abstract

In this paper a complete methodology for extracting the camera rotation matrix from a single photo is presented. The idea of accumulation spaces behind this technique is well explored in many other works, but here those ideas are extended and polished to be applied in a real context: detecting camera pose in interior photos. Following a real implementation, details such as scoring-system adjustments for vanishing points and projection-matrix decomposition are explained. An extension to this approach that selects the most fitting focal length from a set is also discussed.

1 Introduction

Augmented reality technologies that enable users to add 3d objects to a real scene are now widely used, with the support of new smartphones that provide developers with a known camera and sensors to retrieve the device orientation. In this case study, however, there is the need to perform similar operations on a photo taken with an unknown mobile device, with no information about the original camera. In particular, after taking a few photos of his room, the user should be able to open his device photo gallery with a desktop program that enables him to change floor tiling and wallpaper, and to add new furniture, to achieve the desired interior design. Since we expect the user to take these photos with his mobile device, a few simplifying hypotheses will be introduced in the following sections, the main one being to initially consider the camera focal length as known, looking only for the most fitting camera rotation matrix. A modification to retrieve both the camera orientation and a FOV estimation is then proposed in sec. 4. Unlike other technical papers, the focus here is not on researching new techniques or advancing existing ones, but on following the whole implementation process step by step and showing all the heuristic formulas used to compensate for errors and get the best out of an existing approach. Moreover, a detailed explanation of the scoring (a.k.a. voting) system used will be presented in sec. 3.5.

2 Previous works

The problem of extracting camera extrinsic and intrinsic parameters from a single image is not new, and it has been successfully solved in many other works [5][7][10][8]. Solutions proposed by these studies can usually be decomposed into 3 steps:

1. Pre-processing step where line segments are extracted from the image.

2. Vanishing point (or line clusters that lead to vanishing points) search on these lines.

3. Projection decomposition step that extracts the needed parameters from vanishing points.

To solve point 1, an available study will be used [11]. Many algorithms have been proposed to solve point 2; here we focus on a technique that makes use of a Hough transform accumulation space in the polar plane [9][8][2], because it is easier to adjust to perform well on a specific scenario and can be GPU-accelerated with just a few modifications if better performance is needed. Point 3 has a simple theoretical solution [6]; however, due to noise and errors in the discussed scenario, heuristic adjustments will also be introduced in sec. 3.6.

3 Proposed methodology

Given as input a single image, the algorithm can be summarized in the following six steps:

1. Line segment detection from the image

2. Accumulation of line intersection features in an ATan-Polar space map (referred to as APM from now on)

3. APM normalization and filtering

4. Conversion from APM to vanishing points

5. Scoring of vanishing point triplets

6. Camera rotation approximation from the best scoring triplet


which produce as output the camera rotation matrix. Each of these steps is explained separately in the following sections. Equations used throughout this paper will often contain numerical constants that will be experimentally calibrated later (see sec. 5). These will all be referred to as ki, where i is the constant identifier.

Figure 1: Coordinate system used throughout this paper for reference. The camera points in the -k direction.

3.1 Line detection

As explained in section 2, this step of the algorithm is just an adaptation of the work done in [11]. The output of this step is a list of line segments in the form:

si = AiBi

where Ai = (xai, yai) and Bi = (xbi, ybi) have coordinates in image pixel space. To both keep performance stable and detect only meaningful (i.e. architectural) lines, the image is scaled down to a fixed pixel width k1 before being processed. An example of detection can be seen in fig. 2, where the NFA value is used to select a set of meaningful lines.
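As an illustration, this step may be sketched as follows, assuming an OpenCV build that ships the LSD detector of [11]; the function name and parameter handling are illustrative, not the reference implementation:

    import cv2

    def detect_segments(path, k1=800):
        # Downscale to a fixed width k1 so that mostly meaningful
        # (i.e. architectural) lines survive detection.
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        scale = k1 / img.shape[1]
        if scale < 1.0:
            img = cv2.resize(img, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA)
        # LSD as in [11]; NFA values are only returned when the
        # detector is created with the advanced refinement mode.
        lsd = cv2.createLineSegmentDetector(cv2.LSD_REFINE_ADV)
        lines, _, _, nfa = lsd.detect(img)
        return lines.reshape(-1, 4), nfa  # one [xa, ya, xb, yb] row per segment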

3.2 Accumulation of line intersection features in APM

The coordinate system for the APM differs from the one used in [9] in that a tan^{-1} encoding of the distance is used instead of a natural logarithm; the reason behind this choice lies in how intersection errors are distributed. Given two orthogonal lines, if one of them is gradually tilted until they become parallel, the distance of their intersection will move with tangent speed: noise in segment orientation will follow the same distribution. Using a log encoding has also been found impractical due to an extreme dilation of space near the origin that makes it hard to detect vanishing points close to the image center. The mapping from image pixel space to the proposed coordinate system of a point P = (px, py) is thus defined as:

p'_y = \frac{2 \tan^{-1}(\|P\| \, k_2)}{\pi}    (1)

Figure 2: In (a), an image from the York Urban Database [4]; in (b), the detected lines are drawn over it. All the red lines (high NFA value) are used, while less probable orange lines (low NFA value) will be considered only to reach a minimum number of total lines if needed.


p'_x = \begin{cases} \frac{\mathrm{atan2}(p_x, p_y) + \pi}{2\pi} & y' > \frac{R}{2} \\ \frac{\left[\mathrm{atan2}(p_x, p_y) + \pi\right](y' + 1)}{R\pi} & y' \leq \frac{R}{2} \end{cases}    (2)

where R is the APM resolution and k2 is a scaling factor that in this study is fixed at 1/w, where w is the original image width in pixels. The upper half of the APM gets compressed in the x direction going toward the top: this is another modification, introduced to avoid excessive dilation of the accumulation space near the image center. An accurate conic projection has been considered and discarded, due to errors produced by excessive compression in the APM quantized space.
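As a reading aid, the forward mapping of eqs. 1 and 2 may be sketched as follows; the cell rounding and the center-relative convention for (px, py) are assumptions of this sketch:

    import numpy as np

    def to_apm_coords(px, py, R, k2):
        # (px, py): intersection point relative to the image center.
        # Eq. (1): tan^-1 distance encoding, normalized to [0, 1).
        y_norm = 2.0 * np.arctan(np.hypot(px, py) * k2) / np.pi
        y_cell = int(y_norm * (R - 1))
        # Eq. (2): angular coordinate; rows with y' <= R/2 are
        # progressively compressed in the x direction.
        angle = np.arctan2(px, py) + np.pi  # in [0, 2*pi)
        if y_cell > R / 2:
            x_norm = angle / (2.0 * np.pi)
        else:
            x_norm = angle * (y_cell + 1) / (R * np.pi)
        return int(x_norm * (R - 1)) % R, y_cell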

Considering now the APM cell list B = {b_i}, i = 1, ..., R^2, the value stored for each intersection between two lines l1, l2 differs from [9] in that it is not a simple "vote" but a 3-dimensional vector defined as:

bi = (si, ci, Pi) (3)

where Pi is the cosine of the angle between l1 and l2, and si and ci are respectively the sums of the sines and cosines of the line directions, multiplied by Pi. Each Pi value can be thought of as a score for the intersection: the lower it is, the higher the image-space angle between l1 and l2, making the two lines less likely to be parallel in world space. This value can also be scaled to include different angle ranges:

P'_i = \frac{P_i + k_3}{1 + k_3}    (4)

A complete APM (see Figure 3) should ideally have three Pi peaks corresponding to three orthogonal vanishing points.
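The accumulation of a single intersection may then be sketched as follows, reusing to_apm_coords from the sketch above (the explicit (px, py, theta1, theta2) parametrization is an assumption of this sketch):

    import numpy as np

    def accumulate_intersection(apm, px, py, theta1, theta2, R, k2, k3):
        # apm: R x R x 3 array of (s, c, P) cells, see eq. (3).
        P = abs(np.cos(theta1 - theta2))  # cosine of the angle between the lines
        P = (P + k3) / (1.0 + k3)         # eq. (4): shifted angle range
        if P <= 0:
            return                        # discarded, see sec. 3.2.1
        x, y = to_apm_coords(px, py, R, k2)
        apm[y, x, 0] += (np.sin(theta1) + np.sin(theta2)) * P  # s_i
        apm[y, x, 1] += (np.cos(theta1) + np.cos(theta2)) * P  # c_i
        apm[y, x, 2] += P                                      # P_i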

Figure 3: A visualization of the APM from fig. 2a: brightness is proportional to Pi, while hue is used for si and ci. Three bright spots can easily be seen: the green peak is the vertical vanishing point, the yellow and purple peaks are the two horizontal ones.

3.2.1 Discarding line intersections

All the possible intersections could be accumulated in the APM as explained above; however, this would result in a very noisy map where vanishing point candidates would be hard to isolate (see sec. 3.3). A few rules will thus be applied to filter out intersections that are not likely to lead to architectural vanishing points:

• Parallel lines: If l1 and l2 are parallel in image space, their infinitely far intersection is discarded. An accumulation space with tan^{-1} distance encoding may naively suggest the possibility of also tracking infinitely far vanishing points, but this is generally not feasible for a simple reason: it is very likely that many groups of parallel lines can be found in the image from patterns and other objects. In practice this leads to an APM where a lot of points at infinity completely overpower actual finite vanishing points. Excluding parallel lines, the maximum intersection distance is actually finite, since the original image has a finite resolution of w × h pixels, and can be proven to be:

d_{max} = \sqrt{w^2 h^2 + h^2} \approx wh

Intersections above this distance can thus be ignored. Fake vanishing points infinitely far away will then be reintroduced, as illustrated in section 3.4, to stabilize the process.

• Line segment–intersection distance: As a premise to this rule, a distance dimin is defined as the minimum of the two distances of the intersection point from the line segments corresponding to l1 and l2. Since in interior photography architectural vanishing points are usually not visually reached by lines inside the image plane, we can exploit dimin to identify a fake vanishing point created by a pattern or object in the scene (e.g. fabric folds, fans and books can sometimes generate these points; see Figure 4). These intersections can be easily discarded with a threshold on dimin:

d_{imin} > w \cdot k_4    (5)

where w is the image width.

Figure 4: An example of misleading detections caused by particular objects: on the left, a chandelier from above (detected point in orange); on the right, a curtain (detected point in red).

• Intersection angle: As explained above (sec. 3.2), line segments that form wider angles between them are considered less likely to be parallel in world space. If equation 4 results in a negative value, the corresponding intersection is discarded (a combined sketch of the three rules follows this list).
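A minimal sketch combining the three discard rules above; nearest-endpoint distance is used here as a simplification of the point-to-segment distance, and the parametrization is illustrative:

    import numpy as np

    def keep_intersection(p, seg1, seg2, w, h, k4, P_shifted):
        # p: intersection point, or None for parallel lines; seg1, seg2:
        # (A, B) endpoint pairs as 2d arrays; P_shifted: eq. (4).
        if p is None:
            return False                  # parallel lines
        if np.linalg.norm(p) > w * h:
            return False                  # beyond d_max ~ wh
        d_imin = min(np.linalg.norm(p - e)
                     for seg in (seg1, seg2) for e in seg)
        if d_imin <= w * k4:
            return False                  # eq. (5): likely a fake vanishing point
        return P_shifted > 0              # wide-angle pairs discarded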

3.3 APM normalization and filtering

Each APM cell vector bi is normalized by dividing it by Pi:

b'_i = \left( \frac{s_i}{P_i}, \frac{c_i}{P_i}, P_i \right)    (6)

Pi can then optionally be divided by its maximum over B, so that the most probable vanishing point has Pi = 1. Equation 6 thus becomes:

b'_i = \left( \frac{s_i}{P_i}, \frac{c_i}{P_i}, \frac{P_i}{\max_j(P_j)} \right)    (7)

The normalized map is then de-noised with a local maxima filter on Pi. Here a simple in-place implementation is used that erases all the cells that are not the maximum in a given 2d range. The result of this operation can be seen in fig. 5.
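A minimal sketch of this filter follows; the neighborhood radius is an assumed parameter, and the sketch works on a copy rather than strictly in place, which avoids the scan-order dependence an in-place erase would introduce:

    import numpy as np

    def local_maxima_filter(apm, radius=2):
        # Zero out every cell whose P value is not the maximum of its
        # (2*radius+1)^2 neighborhood.
        P = apm[:, :, 2]
        R = P.shape[0]
        out = apm.copy()
        for y in range(R):
            for x in range(R):
                y0, y1 = max(0, y - radius), min(R, y + radius + 1)
                x0, x1 = max(0, x - radius), min(R, x + radius + 1)
                if P[y, x] < P[y0:y1, x0:x1].max():
                    out[y, x] = 0.0
        return out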

Figure 5: Zoom of a noisy vanishing point in the APM (a). After filtering, this is reduced to a single APM cell (b).


3.4 From APM to vanishing points

After the filtering step, only a few hot cells should be left. Using the inverse of the encoding equations 1 and 2, the image-space locations of vanishing points can be retrieved. Using a threshold on the minimum value of Pi, the number of possible vanishing points can be further reduced: points with a low Pi value are the least voted ones, or are voted by line segment pairs with a high angle difference that are less likely to lead to a vanishing point.
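For reference, the inverse mapping may be sketched as follows, mirroring to_apm_coords from sec. 3.2 (cell-center offsets are ignored in this sketch):

    import numpy as np

    def from_apm_coords(x_cell, y_cell, R, k2):
        # Invert eq. (1): tan distance decoding.
        dist = np.tan((y_cell / (R - 1)) * np.pi / 2.0) / k2
        # Invert eq. (2), honoring the compressed lower rows.
        if y_cell > R / 2:
            angle = (x_cell / (R - 1)) * 2.0 * np.pi
        else:
            angle = (x_cell / (R - 1)) * R * np.pi / (y_cell + 1)
        theta = angle - np.pi
        # atan2(px, py) convention of eq. (2): x uses sin, y uses cos.
        return dist * np.sin(theta), dist * np.cos(theta)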

By this point only a low number of candidate vanishing points should be left, because of the combinatorial complexity of the subsequent step. Operating on the APM resolution R or on the Pi threshold has been found to be a good way to compensate for too many or too few points.

This step also introduces two orthogonal constructed vanishing points at infinity with Pi = 0 to compensate for missing points; this is an acceptable approximation that makes the subsequent steps converge to a plausible solution even when only a single vanishing point is found:

v∞1 = (0, dmax, 0)

v∞2 = (dmax, 0, 0)

A representation of the detected points mapped back to image space can be seen in fig. 6: the two points with the highest Pi are the x and z vanishing points. Even after filtering, many low-scoring points are still present (i.e. the blue points).

Figure 6: APM peaks mapped back to image space: each colored dot is an APM cell, green segments show the average cell direction (from si and ci), and color temperature is proportional to Pi, also written on the graph. In the background, image 2b gives a visual reference for vanishing point locations.

3.5 Vanishing point triplets scoring

Now that a list of candidate vanishing points is available, any triplet v1, v2, v3 can potentially be used as columns in a matrix A = KR, where K is the device calibration matrix and R is the camera rotation matrix we are looking for.

First all possible triplets are generated, then each triplet is given a score and the best scoring one is selected as the most probable camera projection A. A similar search step can also be found in [8], but here the score is determined using a product of different sub-scores, each of which evaluates a feature of the considered triplet. Below are the heuristic metrics that have been giving the best results with interior images:

• Principal point score: This is a well-known scoring formula where perspectives with a principal point far from the image center are penalized more. The principal point p0 is found as the orthocenter of v1, v2, v3 [6, p. 226]. The distance is scored with an exponential function:

s_{pp} = e^{-k_5 \left( \frac{\|p_0 - p_{ideal}\|}{\|v_1 - v_2\| + \|v_1 - v_3\| + \|v_2 - v_3\|} \right)^2}    (8)

where pideal is the image center. The factor used to divide the distance of p0 from pideal is the perimeter of the triangle formed by the three vanishing points. This has been experimentally found to be an efficient way to:

– Normalize scores among different image pixel sizes.

– Compensate for p0 noise sensitivity when vanishing points are far away from each other.

• Vanishing point orthogonality: A common scoring approach used in various similar works [8][7][5] and highlighted in books [6, p. 215] is to check vanishing points for mutual orthogonality using their relation with the absolute conic ω:

\cos_{ij} = \frac{v_i^T \omega v_j}{\sqrt{v_i^T \omega v_i} \sqrt{v_j^T \omega v_j}}    (9)

where cosij is the cosine of the angle formed by the directions of two vanishing points vi and vj. The score for orthogonality is defined as the maximum sin²ij among the three considered vanishing point pairs:

s_{ort} = \max_{i,j \in \{1,2,3\},\, i \neq j} \left( 1 - \cos^2_{ij} \right)    (10)

The absolute conic for this test can be obtained from a constructed calibration matrix K′ (see section 3.6) as:

\omega = K'^{-T} K'^{-1}    (11)


• Vanishing points score: This score is given by the sum of the Pi values of the three APM cells:

s_{vp} = P_1 + P_2 + P_3 + k_6 \max_{i \in \{1,2,3\}} P_i    (12)

An optional multiplier k6 is applied to the most voted point to promote (k6 > 0) or demote (k6 < 0) triplets that contain low-scoring points.

• Directional score: Directional information about the lines that lead to vanishing points is also accumulated in the APM, as seen in equation 3. It can be used to exclude triplets containing vanishing points originated from lines with similar directions in image space, as they are probably not orthogonal. To compute this score, for each of the three vanishing points represented by an APM cell bi, a 2d vector vdi = (si, ci) is considered. For each combination of two vectors vdj, vdk the scalar product vdj · vdk is computed, and the directional score is defined as:

s_{dir} = 1 - \max_{j,k \in \{1,2,3\},\, j \neq k} v_{dj} \cdot v_{dk}    (13)

The score of a vanishing point triplet v1, v2, v3 is thus defined as the product of the above sub-scores:

s(v1, v2, v3) = spp · sort · svp · sdir (14)

After scoring all the generated triplets, the top scoring one is chosen as the solution.
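The combined score of eq. 14 may be sketched as follows, under the definitions above; the unit normalization of the (s, c) cell directions in the sdir term is an assumption of this sketch, and the helper names are illustrative:

    import numpy as np
    from itertools import combinations

    def orthocenter(a, b, c):
        # Orthocenter of the 2d triangle (a, b, c): intersection of
        # the altitudes from a and b.
        d1, d2 = c - b, c - a
        M = np.array([d1, d2])
        return np.linalg.solve(M, np.array([d1 @ a, d2 @ b]))

    def score_triplet(vps, cells, K, k5, k6, w, h):
        # vps: three homogeneous vanishing points (x, y, 1) as arrays;
        # cells: the matching APM vectors (s, c, P).
        pairs = list(combinations(range(3), 2))
        # spp, eq. (8): penalize principal points far from the center.
        p0 = orthocenter(vps[0][:2], vps[1][:2], vps[2][:2])
        p_ideal = np.array([w / 2.0, h / 2.0])
        perim = sum(np.linalg.norm(vps[i][:2] - vps[j][:2]) for i, j in pairs)
        s_pp = np.exp(-k5 * (np.linalg.norm(p0 - p_ideal) / perim) ** 2)
        # sort, eqs. (9)-(11): mutual orthogonality via the absolute conic.
        Kinv = np.linalg.inv(K)
        omega = Kinv.T @ Kinv
        def cos2(u, v):
            return (u @ omega @ v) ** 2 / ((u @ omega @ u) * (v @ omega @ v))
        s_ort = max(1.0 - cos2(vps[i], vps[j]) for i, j in pairs)
        # svp, eq. (12): accumulated votes with the top point re-weighted.
        P = [c[2] for c in cells]
        s_vp = sum(P) + k6 * max(P)
        # sdir, eq. (13): penalize similar cell directions.
        dirs = []
        for c in cells:
            d = np.asarray(c[:2], dtype=float)
            n = np.linalg.norm(d)
            dirs.append(d / n if n > 0 else d)
        s_dir = 1.0 - max(dirs[i] @ dirs[j] for i, j in pairs)
        return s_pp * s_ort * s_vp * s_dir  # eq. (14)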

3.6 Projection decomposition

The top scoring triplet is composed of three vanishing points, which can be expressed in homogeneous coordinates as:

v_1 = (x_1, y_1, 1), \quad v_2 = (x_2, y_2, 1), \quad v_3 = (x_3, y_3, 1)    (15)

These can be assembled into the corresponding projection matrix as:

A = \{v_1^T, v_2^T, v_3^T\} \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} = KR    (16)

where K is the device calibration matrix, λi are scaling factors and R is the rotation matrix we are looking for. As explained in [3], with only this information both K and R can be analytically recovered under known-aspect-ratio and zero-skew assumptions. However, due to errors in the vanishing point positions, this approach is very unreliable by itself; a more heuristic system is thus used. If a calibration matrix is available and we fix λ1 = λ2 = λ3, the directions of the columns of the rotation matrix can be estimated as:

R_i = K^{-1} v_i^T, \quad v_i \in \{v_1, v_2, v_3\}    (17)

Since in this case study K is not available, a constructed K′ is used, defined as:

K' = \begin{pmatrix} scale & 0 & p_{0x} \\ 0 & scale & p_{0y} \\ 0 & 0 & 1 \end{pmatrix}, \quad scale = \frac{w}{2 \tan\left(\frac{FOV_x}{2}\right)}    (18)

where FOVx is the horizontal field of view of the camera and p0 = (p0x, p0y) is the principal point. For now, FOVx is set to a 30mm-equivalent focal length, a common value for most mobile devices equipped with a camera. The principal point could be obtained as the orthocenter of the three vanishing points [6, p. 226]; however, this measure is very sensitive to noise [3] and experimentally produced worse results than setting it to the image center (p0 = pideal). Once a calibration matrix is available, the vanishing points can be back-projected and normalized to obtain the rotated camera axes:

R_1 = \frac{K^{-1} v_1}{\|K^{-1} v_1\|}, \quad R_2 = \frac{K^{-1} v_2}{\|K^{-1} v_2\|}, \quad R_3 = \frac{K^{-1} v_3}{\|K^{-1} v_3\|}    (19)

A first camera rotation can thus be obtained by combining them into a matrix:

R = \{R_1^T, R_2^T, R_3^T\}    (20)
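The construction of K′ (eq. 18) and the back-projection of eqs. 19 and 20 may be sketched as follows, assuming p0 = pideal; this is a sketch, not the reference implementation:

    import numpy as np

    def rotation_from_triplet(vps, w, h, fov_x):
        # vps: three homogeneous vanishing points (x, y, 1);
        # fov_x: horizontal field of view in radians.
        scale = w / (2.0 * np.tan(fov_x / 2.0))
        K = np.array([[scale, 0.0, w / 2.0],
                      [0.0, scale, h / 2.0],
                      [0.0, 0.0, 1.0]])        # eq. (18) with p0 = p_ideal
        Kinv = np.linalg.inv(K)
        axes = []
        for v in vps:
            r = Kinv @ np.asarray(v, dtype=float)
            axes.append(r / np.linalg.norm(r))  # eq. (19)
        return np.column_stack(axes)            # eq. (20): axes as columns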

3.6.1 Selecting the correct axis order

After composing R as in eq. 20, it is practical to swap and flip R1, R2 and R3 so that the resulting x, y, z rotations are referred to a predefined coordinate system. A rule of thumb for small rotations (i.e. < 45 deg) is to choose as axis Ri, among the back-projected vanishing points, the one that forms the smallest angle with an expected average axis orientation Ei after rotation:

R'_i = R_j \quad \text{where} \quad j = \arg\max_j |E_i \cdot R_j|    (21)

For the expected axis directions Ei, the un-rotated camera axis vectors can be used. However, since in the considered scenario we know the user will point his device down to capture the floor, we can bias this basic approach by choosing an x-rotated Ez:

Ez = (0, sin(k7), cos(k7)) (22)


where k7 is the average camera x-rotation. Each axis is thus selected using eq. 21 and its sign is changed to match the wanted coordinate system.

3.6.2 Orthogonalization of R

Since the found vanishing points were not guaranteed to be orthogonal, the only missing step is orthogonalization. This is accomplished using the following strategy:

R_z = R'_3, \qquad R_y = R'_2 - \hat{k} \left( \frac{R'_{2x} R'_{3x} + R'_{2y} R'_{3y}}{R'_{3z}} + R'_{2z} \right), \qquad R_x = R_y \times R_z    (23)

where R′2 is orthogonalized against R′3 with a rotation around the x axis. As can be seen in equations 23, R′1 is not actually used here. To obtain three orthogonal directions any two vanishing points can be used; experimentally, however, the one pointing in the x direction (see Figure 1) has been found to be the least precise, due both to the lack of horizontal lines in the analyzed image sets and to the high sensitivity of its y-rotation to errors in the process. After normalization, Rx, Ry and Rz can be composed into the final camera rotation matrix:

R_{ortho} = \left\{ \frac{R_x^T}{\|R_x\|}, \frac{R_y^T}{\|R_y\|}, \frac{R_z^T}{\|R_z\|} \right\}    (24)
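One consistent reading of eqs. 23 and 24 may be sketched as follows; the sign handling is an interpretation of the reconstructed eq. 23, not the reference implementation:

    import numpy as np

    def orthogonalize(R1, R2, R3):
        # R3 is kept as Rz; R2's z component is corrected so that
        # Ry is perpendicular to Rz (a shear rather than a true
        # rotation; the final normalization compensates). R1 is
        # intentionally unused, as discussed above.
        Rz = np.asarray(R3, dtype=float)
        Ry = np.asarray(R2, dtype=float).copy()
        Ry[2] = -(R2[0] * R3[0] + R2[1] * R3[1]) / R3[2]
        Rx = np.cross(Ry, Rz)
        axes = [a / np.linalg.norm(a) for a in (Rx, Ry, Rz)]
        return np.column_stack(axes)  # eq. (24)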

4 Variable focal length extension

The proposed methodology is based on a fixed focal length assumption that, in this case study, all devices followed. Even if this constraint cannot be completely removed, multiple focal lengths can be considered together and the most fitting one selected with a simple modification. In section 3.5 a scoring system is introduced where a score is given to each triplet of vanishing points. As can be seen from equations 9 and 11, this score depends on K, which is built around a fixed field of view FOVx as explained in sec. 3.6.

If instead of a single value for FOVx a set of values {FOVx1, FOVx2, ..., FOVxn} is used, a score s(v1, v2, v3, FOVxi) can be computed for each FOVxi. Except for this change, R can be computed as before, considering triplet–FOV pairs instead of the triplets alone, so that when the best scoring pair is selected a guess for the field of view is also available.

The fact that this modification weakens the selectivity of the orthogonality score has to be taken into account, and a small set of focal lengths not too far from each other should be used.
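The resulting search over triplet–FOV pairs may be sketched as follows, reusing score_triplet from sec. 3.5 (the bookkeeping is illustrative):

    import numpy as np
    from itertools import combinations

    def best_triplet_and_fov(candidates, cells, w, h, fovs, k5, k6):
        # candidates: homogeneous vanishing points; cells: matching
        # APM vectors; fovs: candidate horizontal FOVs in radians.
        best, best_score = None, -np.inf
        for idx in combinations(range(len(candidates)), 3):
            vps = [candidates[i] for i in idx]
            trip_cells = [cells[i] for i in idx]
            for fov in fovs:
                scale = w / (2.0 * np.tan(fov / 2.0))
                K = np.array([[scale, 0.0, w / 2.0],
                              [0.0, scale, h / 2.0],
                              [0.0, 0.0, 1.0]])
                s = score_triplet(vps, trip_cells, K, k5, k6, w, h)
                if s > best_score:
                    best, best_score = (vps, fov), s
        return best  # the (triplet, FOV) pair with the top score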

Figure 7: On the x-coordinate, the difference (i.e. error) in degrees between the found camera orientation and the ground truth. On the y-coordinate, the number of pictures in the YUD that are classified with an error below that difference. The max error distribution (in black) uses the angle error along the worst performing axis; the average error distribution uses the average of the three axes instead.

Figure 8: The distribution of the total time used to calculate the camera rotation from the image.

Figure 9: An example of an incorrectly classified image from the YUD with the detected lines superimposed: most of the horizontal lines on the building go undetected due to limitations in the used line detection library.


     Meaning                                                                  Value
k1   Maximum width of the original image before line detection                800
k2   Scaling constant for tan distance encoding in the APM                    1/w
k3   Angle inclusion shift for the Pi score in APM cells                      -0.4
k4   Minimum intersection distance multiplier for fake vanishing point
     detection                                                                0.04
k5   Scaling constant for the spp scoring equation                            86
k6   Scaling constant for the svp scoring equation                            0.2
k7   Average expected camera x-rotation                                       -30
R    The APM vertical and horizontal resolution                               180

Table 1: Meaning and value list for all ki constants and the APM resolution.

5 Experimental Results

To measure how well the proposed methodology performs, a few tools have been developed to both create and batch-test image databases in the same format as the YUD [4]. The vector of constants ki used throughout this paper is set to the values in table 1 for all the tests in this section.

The results on the YUD can be seen in figure 7, where 95% (97 out of 102) of the pictures stay below 5 degrees of average orientation error. Two images are not correctly classified (i.e. error > 10 deg.), namely P1080023 and P1040779. In P1040779 the lack of a dominant horizontal line cluster makes it hard to match the y-rotation chosen as ground truth, as in [4]. In P1080023 the problem is instead caused by the line-detection algorithm, which is unable to detect intersected line segments, so that almost no line is detected in one of the horizontal directions (see Figure 9).

Execution time has also been tested on the YUD (see Figure 8), with an average time of about 200 ms on an Intel Core i7 7820X CPU. However, this is a single-core implementation and the APM accumulation step can easily be parallelized. Of the 200 ms single-core execution time, 70% has been experimentally found to be spent on line detection, which can also be optimized [1] to make the whole methodology suitable for real-time scenarios.

To test on the actual interior images considered in this case study, an additional database of 250 images with ground-truth vanishing point data has been created (see Figure 10). This db contains pictures taken on different cameras with different focal lengths, and has no constraints on having dominant orthogonal directions, but simply provides an accurate sample of what a user may be loading into the final product. The modification introduced in section 4 is used with focal lengths of 18mm, 30mm and 50mm. Results for this test can be seen in fig. 11. This time only 81% of the pictures (202 out of 250) have an orientation error below 5 degrees. To better understand these results, a set of the 20 worst instances has been visually examined. The most common failure causes and their frequencies are listed below:

• 3 / 20: Presence of architectural objects, not aligned with the room walls, with many detected line segments.

• 17 / 20: The camera orientation is correctly detected (i.e. max orientation error < 5 deg.) but the FOV estimation is wrong, resulting in one of the horizontal vanishing points being chosen to create an orthogonal triplet with that FOV. Since the ground-truth test is based on the vanishing point directions and not on the final camera rotation, these tests failed (i.e. reported high angle errors).

This means that most of the errors are caused by a wrong FOV while a correct camera orientation is preserved, resulting in a visually acceptable outcome for our case study. A FOV detection success rate for this test can be estimated as:

P_{fov} = \frac{\|I_{succeeded}\| + \|I_{failed}\| \cdot (1 - E_{fov})}{\|I\|} = \frac{202 + 48 \cdot (1 - \frac{17}{20})}{250} \approx 84\%

where ‖Isucceeded‖ and ‖Ifailed‖ are respectively the number of passed and failed tests, and Efov is the fraction of failures caused by an incorrect FOV estimation.

Figure 10: A few sample images from the interiors ground-truth db.


Figure 11: The cumulative average orientation error on the interiors ground-truth db. Almost 20% of the images display a high orientation error, mostly caused by a FOV not included in the search (see sec. 4).

6 Conclusions

In this paper a complete methodology to extract the camera orientation from interior images, based on a vanishing point accumulation space, has been explained, and its implementation details have been discussed for the scenario described in sec. 1. An additional extension that also gives a rough estimate of the FOV from a set has been proposed. A test on a publicly available ground-truth database has been carried out with state-of-the-art results. A new DB of 250 classified pictures, representative of the discussed case study, has then been tested, with good results on the extracted camera orientation. On this DB, which contains photos taken with different FOVs, the limits of estimating the FOV from a set are shown: almost 1 estimate out of 5 is significantly incorrect. Since in our scenario an incorrect FOV does not create a visually unappealing result, as both the input image FOV and the output FOV are constrained to a small range, our objectives can be considered reached. Analyzing the whole process, one of the most error-prone steps can be identified in line segment detection, which is the root cause of most of the estimation failures. Another observed issue of vanishing-point accumulation techniques is their dependence on line cluster size ratios, which causes smaller (potentially significant) clusters to all look alike next to larger ones, limiting precision when perturbing unaligned objects are visible in the image.

References

[1] C. Akinlar and C. Topal. EDLines: A real-time line segment detector with a false detection control. Pattern Recognition Letters, 32(13):1633–1642, 2011.

[2] V. Cantoni, L. Lombardi, M. Porta, and N. Sicard. Vanishing point detection: representation analysis and new approaches. In Image Analysis and Processing, 2001. Proceedings. 11th International Conference on, pages 90–94. IEEE, 2001.

[3] R. Cipolla, T. Drummond, and D. P. Robertson. Camera calibration from vanishing points in images of architectural scenes. In BMVC, volume 99, pages 382–391, 1999.

[4] P. Denis, J. H. Elder, and F. J. Estrada. Efficient edge-based methods for estimating Manhattan frames in urban imagery. In European Conference on Computer Vision, pages 197–210. Springer, 2008.

[5] W. Forstner. Optimal vanishing point detection and rotation estimation of single images from a Legoland scene. In Proc. ISPRS Commission III Symp. Photogramm. Comput. Vis. Image Anal., pages 157–162, 2010.

[6] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

[7] F. M. Mirzaei and S. I. Roumeliotis. Optimal estimation of vanishing points in a Manhattan world. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2454–2461. IEEE, 2011.

[8] C. Rother. A new approach to vanishing point detection in architectural environments. Image and Vision Computing, 20(9-10):647–655, 2002.

[9] K.-S. Seo, C.-J. Park, and H.-M. Choi. Log-polar coordinate image space for the efficient detection of vanishing points. ETRI Journal, 28(6):819–821, 2006.

[10] J.-P. Tardif. Non-iterative approach for fast and accurate vanishing point detection. In Computer Vision, 2009 IEEE 12th International Conference on, pages 1250–1257. IEEE, 2009.

[11] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD: a line segment detector. Image Processing On Line, 2:35–55, 2012.
