
Submission to Image Communication, Special Issue on Interactive Representation of Still and Dynamic Scenes

Mosaic-Based 3D Scene Representation and Rendering

Zhigang Zhu (Corresponding Author)

Department of Computer Science 130th Street and Convent Avenue

City College of New York, New York, NY 10031, USA Tel: 1 (212) 650-8799, Fax: 1 (212) 650-6248

[email protected]

Allen R. Hanson Department of Computer Science

130 Governors Drive University of Massachusetts Amherst, MA 01003, USA

[email protected]

Abstract - In this paper we address the problem of fusing images from many video cameras or a moving video camera. The captured images have obvious motion parallax, but they will be aligned and integrated into a few mosaics with a large field-of-view (FOV) that preserve 3D information. We have developed a compact geometric representation that can re-organize the original perspective images into a set of parallel projections with different oblique viewing angles. In addition to providing a wide field of view, mosaics with various oblique views well represent occlusion regions that cannot be seen in a usual nadir view. Stereo pair(s) can be formed from a pair of mosaics with different oblique viewing angles and thus image-based 3D viewing can be achieved. This representation can be used as both an advanced video interface and a pre-processing step for 3D reconstruction. A ray interpolation approach for generating the parallel-projection mosaics is presented, and efficient 3D scene/object rendering based on multiple parallel-projection mosaics is discussed. Several real-world examples are provided, with applications ranging from aerial video surveillance/environmental monitoring, ground mobile robot navigation, to under-vehicle inspection.

Keywords – image-based rendering, parallel projection mosaics, ray interpolation, three dimensional viewing, interactive visual representation.



I. INTRODUCTION

This paper presents a novel approach for fusing images from many spatially distributed video cameras

(or a moving video camera) into a few mosaiced images that preserve 3D information. In both cases, a

virtual 2D array of cameras with field-of-view (FOV) overlaps is formed to generate complete coverage

of a scene (or an object). The proposed mosaic representation has been applied to a variety of

applications, including airborne video for environmental monitoring and urban surveillance, ground

mobile robot navigation, and under-vehicle inspection (Fig. 1). These applications represent very different

imaging scenarios, from far-range through extreme close-range. We will show that the same approach can

be applied to all these cases.

Fig. 1. A few application examples: (a) airborne video for environmental monitoring; (b) airborne urban surveillance; (c) ground mobile robot; and (d) under-vehicle inspection.

A mosaic representation with a single viewpoint is not appropriate for representing a 3D scene

captured by a translating camera, due to the well-known problems of occlusion as illustrated in Fig. 2a. A

2D panoramic mosaic of a 3D scene generated from video from a translating camera has the geometry of

multiple viewpoints [1,2], but it only preserves information from a single viewing direction. An example

is shown in Fig. 2b with parallel projection in the direction of the drawing (i.e. in the plane of the page)

and perspective projection in the orthogonal direction (i.e. into the page). Three-dimensional (3D)

structure and surface information from other viewing directions of the original video is lost in such a



representation. A digital elevation map (DEM) generated using traditional aerial photogrammetry

consists of a sampled array of elevations (depths) for a number of ground positions at regularly spaced

intervals [3]. Even though 3D and texture data can be represented, a DEM generated from such a scenario

usually only has a nadir viewing direction (as in Fig. 2b, with parallel projections in both directions),

hence the surfaces from other viewing directions cannot be represented. However, in some applications

such as surveillance and security inspection, a scene or an object (e.g. a vehicle) needs to be observed

from many viewing directions to reveal hidden anomalies (see Section VI.C for an example). Stereo

panoramas [4,5] have been presented as a mechanism for obtaining the “best” 3D information from an

off-center rotating camera. In the case of a translating camera, various layered representations [6-8] have

been proposed to represent both 3D information and occlusions, but such representations need 3D

reconstructions.

Fig. 2. Mosaic representations with different projections. (a) Perspective; (b) orthogonal (nadir); (c) oblique looking forward and (d) oblique looking backward. The combination of b to d gives our multi-view parallel mosaic representation. (Panel labels mark the surfaces that are visible or invisible in each view.)

Fig. 2a illustrates the observation that many viewing directions are already included in each of the

original camera views. This property has been noted and used before, for example in the X-slit mosaics

with non-parallel rays [9] for image-based rendering. In this paper we propose a representation that can

re-organize the original perspective images into a set of parallel projections with different oblique

viewing angles (in both the x and the y directions of the 2D images I (x, y)). Representations with parallel

projections are efficient since only a few 2D images are needed. They are also effective in recovering the

3D structure of the scene in view using the optimal parallel stereo geometry [10-12]. Mosaics with 2D

oblique parallel projections are a unified representation for our previous work on parallel-perspective

stereo mosaics [11,12] and multi-camera mosaics [13,14]. Such representations provide a wide field of



view, optimal 3D information for stereo viewing and reconstruction, and the capability to represent

occlusions. Fig. 2 (from b to d) shows three oblique views where all the surfaces can be represented with

the combination of the three views. In practice, more views may be needed for both 3D reconstruction

and 3D viewing.

This paper will focus on mosaic representation and our approach for mosaic-based rendering of 3D

scenes, but other research issues, such as mosaic generation and 3D reconstruction will also be briefly

mentioned. Therefore, the rest of the paper is organized as follows. Section II will briefly introduce

mosaic representations with 2D oblique parallel projection and their inherent properties. In Section III, we

will present several practical cases where 2D parallel-projection mosaics can be generated, and discuss

related research issues in generating and using the parallel mosaics. In Section IV, we will present a

general ray interpolation approach for parallel-projection mosaic generation. This section will also discuss

some practical issues in generating the mosaics. An efficient mosaic-based 3D rendering method will be

presented in Section V. Experimental results are given in Section VI for three important applications –

aerial video surveillance, ground mobile robot navigation, and under vehicle inspection. Section VII is a

brief summary.

II. 2D OBLIQUE PARALLEL PROJECTION

A normal perspective camera has a single viewpoint, which means all the light rays pass through a

common nodal point. On the other hand, in orthogonal images with parallel projections in both the x and y

directions, all the rays are parallel to each other. Imagining that we have a sensor with parallel

projections, we could turn the sensor to capture images with different oblique angles (including both nadir

and oblique angles) in both the x and y directions. Thus we can create many pairs of parallel stereo

images, each with two different oblique angles, and can observe surfaces occluded in a nadir view.

Fig. 3. Depth from parallel stereo with multiple viewpoints: 1D case. (Labels: viewing angles β1 < 0 and β2 > 0, adaptive baseline B, scene point P, depth Z.)



Fig. 3 shows the parallel stereo in a 1D case, where two oblique angles β1 and β2 are chosen. The

depth of a point P can be calculated as

$$Z = \frac{B}{\tan\beta_2 - \tan\beta_1} \qquad (1)$$

where β1 and β2 are the angles of the two viewing directions, respectively, and B is the adaptive baseline

between the two viewpoints. This adaptive baseline information is recorded in a pair of stereo mosaics

with these two angles, and is proportional to the displacement of the corresponding image projections of

the point P. The baseline is adaptive since a large depth will have a larger baseline given the two angles

than a smaller depth. It has been shown by others [10] and by us [11, 12] that parallel stereo is superior to

both conventional perspective stereo and to the recently developed multi-perspective stereo with

concentric mosaics for 3D reconstruction (e.g., in [5]). The adaptive baseline inherent in the parallel-

perspective geometry permits depth accuracy independent of absolute depth in theory [10,11]. This result

can be easily obtained from Eq. (1) since depth Z is proportional to the adaptive baseline B and therefore

to the recorded visual displacement of the corresponding pair in the two mosaics. In practice, the depth

accuracy is a linear function of depth in stereo mosaics generated from perspective image sequences [12],

due to the ray interpolation process that will be discussed in Section IV. However, this is still better than

perspective stereo or concentric stereo, for which the depth error is proportional to the square of

depth.
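As a minimal numerical sketch of Eq. (1), the following Python fragment computes depth from the adaptive baseline and the two oblique angles. The function names and the sample values are illustrative assumptions, not part of the original system. Because Z is linear in B, a fixed measurement error in B maps to the same depth error at any depth, which is the theoretical property stated above.

```python
import math

def parallel_stereo_depth(baseline, beta1_deg, beta2_deg):
    """Depth Z from Eq. (1): Z = B / (tan(beta2) - tan(beta1))."""
    denom = math.tan(math.radians(beta2_deg)) - math.tan(math.radians(beta1_deg))
    if denom <= 0:
        raise ValueError("beta2 must be greater than beta1")
    return baseline / denom

def depth_error(baseline_error, beta1_deg, beta2_deg):
    """Depth error caused by a fixed error in the measured baseline.

    Eq. (1) is linear in B, so the error does not depend on Z itself.
    """
    return parallel_stereo_depth(baseline_error, beta1_deg, beta2_deg)

# Hypothetical values: two baselines observed with the same pair of angles.
z_near = parallel_stereo_depth(10.0, -5.0, 5.0)
z_far = parallel_stereo_depth(100.0, -5.0, 5.0)
err = depth_error(0.1, -5.0, 5.0)  # identical for z_near and z_far
```

With the angles fixed, a tenfold larger baseline corresponds to a tenfold larger depth, which is the sense in which the baseline is adaptive.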

Fig. 4. Parallel projections with two oblique angles α and β (around the x and y axes, respectively). (a) Nadir view (α=β=0); (b) β-oblique view (α=0, β≠0); (c) α-oblique view (α≠0, β=0) and (d) dual-oblique view (α≠0, β≠0). Parallel mosaics can be formed by populating each single selected ray in both the x and y directions.

We can make two extensions to this 1D case of parallel stereo. First, we can select various oblique

angles (other than just two) for constructing multiple parallel projections. By doing so we can observe

various degrees of occlusions and can construct stereo pairs with different depth resolution via the



selection of different pairs of oblique angles. Second, the 1D parallel projection can be extended to 2D

(Fig. 4), obtaining a mosaiced image that has a nadir view (Fig. 4a), oblique angle(s) only in one direction

(Fig. 4b and c) or oblique angles in both the x and the y directions (Fig. 4d).
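The 2D oblique parallel projection can be sketched in a few lines of Python. The sign convention used here (a ray with angle β reaches a point at depth Z after a lateral offset of Z·tan β in x, and likewise α in y) is an assumption chosen to agree with Eq. (1); the function name is hypothetical.

```python
import math

def parallel_project(point, alpha_deg, beta_deg):
    """Project a 3D point (X, Y, Z) into a parallel mosaic.

    beta is the oblique angle around the y axis (it shifts the x coordinate),
    alpha is the oblique angle around the x axis (it shifts the y coordinate).
    alpha = beta = 0 gives the nadir (orthogonal) view.
    """
    X, Y, Z = point
    u = X - Z * math.tan(math.radians(beta_deg))
    v = Y - Z * math.tan(math.radians(alpha_deg))
    return (u, v)

# Hypothetical point observed in two beta-oblique mosaics.
P = (2.0, 3.0, 10.0)
beta1, beta2 = -10.0, 10.0
u1, _ = parallel_project(P, 0.0, beta1)
u2, _ = parallel_project(P, 0.0, beta2)

# The displacement between the two mosaics plays the role of the baseline B.
B = u1 - u2
Z = B / (math.tan(math.radians(beta2)) - math.tan(math.radians(beta1)))
```

Under this convention the displacement of a point between two β-oblique mosaics is exactly the adaptive baseline B of Eq. (1), so the depth of P is recovered from the pair.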

III. PRACTICAL SCENARIOS AND RESEARCH ISSUES

It is impractical to use a single sensor to capture orthogonal images with full parallel projections in

both x and y dimensions for a large-scale scene, and with various oblique directions. However, there are at

least three practical ways of generating images with oblique parallel projections using existing sensors: a

2D sensor array of many spatially distributed cameras (Fig. 5a), a “scanner” with a 1D array of cameras

(Fig. 5b), and a single perspective camera that moves in 2D (Fig. 5c).

With a 2D array of many perspective cameras (Fig. 5a), we first assume that the optical axes of all the

cameras point in the same direction (into the paper in Fig. 5a), and the viewpoints of all cameras are on a

single plane perpendicular to their optical axes. Then the perspective images can be organized into

mosaiced images with any oblique viewing angles by extracting rays from the original perspective images

with the same viewing directions, one ray from each image. If the camera array is dense enough, then

densely mosaiced images can be generated.
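The ray-extraction step for an ideal 2D camera array can be sketched as follows. This is an illustration under stated assumptions only: identical pinhole cameras with a focal length `f_pixels` given in pixels, the principal point at the image center, and all images stored in one NumPy array. None of these names come from the original system.

```python
import math
import numpy as np

def ray_pixel(f_pixels, cx, cy, alpha_deg, beta_deg):
    """Pixel (row, col) whose ray has oblique angles (alpha, beta)."""
    col = int(round(cx + f_pixels * math.tan(math.radians(beta_deg))))
    row = int(round(cy + f_pixels * math.tan(math.radians(alpha_deg))))
    return row, col

def sparse_parallel_mosaic(images, f_pixels, alpha_deg, beta_deg):
    """Take one ray per camera to form a parallel mosaic.

    images has shape (camera_rows, camera_cols, H, W). The result has one
    pixel per camera, so it is only as dense as the camera array itself.
    """
    _, _, h, w = images.shape
    row, col = ray_pixel(f_pixels, (w - 1) / 2.0, (h - 1) / 2.0,
                         alpha_deg, beta_deg)
    if not (0 <= row < h and 0 <= col < w):
        raise ValueError("requested oblique angle is outside the field of view")
    return images[:, :, row, col].copy()

# Hypothetical 2 x 3 array of 5 x 5 images.
images = np.arange(2 * 3 * 5 * 5).reshape(2, 3, 5, 5)
nadir = sparse_parallel_mosaic(images, 2.0, 0.0, 0.0)
oblique = sparse_parallel_mosaic(images, 2.0, 0.0, 45.0)
```

Each oblique angle selects a different fixed pixel in every image, so a set of mosaics with different angles is obtained by repeating the extraction with different (α, β).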

Fig. 5. Parallel mosaics from a 2D bed of cameras. (a) 2D array; (b) 1D scan array and (c) a single scan camera.

If only a 1D linear array of perspective cameras is available (Fig. 5b), the camera array can be

‘scanned’ over the scene to synthesize a virtual 2D camera array. Then stereo mosaic pairs with oblique

parallel projections in both directions can still be generated, given that we can accurately control or

estimate the translation of the camera array. We have actually used this approach in an Under Vehicle

Inspection System (UVIS: Section VI.C) [13, 14, 18].



Even if just a single camera is used, we can still generate a 2D virtual bed of cameras by moving the

camera in two dimensions, along a “2D scan” path as shown in Fig. 5c. This is the case for aerial video

mosaics [11, 12, 15, 20] where a single camera is mounted on a light aircraft flying over an area (Section

VI.A).

In real applications where parallel-projection mosaics must be generated, there are two challenging

research issues. The first problem is camera orientation estimation (calibration). In our previous study on

an aerial video application, we used external orientation instruments, i.e., GPS, INS and a laser profiler, to

ease the problem of camera orientation estimation [11, 12, 20]. In the case of under-vehicle inspection

using a 1D array of cameras [13], relative relations among cameras can be obtained by an offline camera

calibration procedure. However, the motion of the cameras or vehicles should be estimated through image

matching. In applications of 3D rendering where accurate 3D estimation is not the main issue, an image-

based camera-motion estimation method [11] is used to get an approximation of the camera orientation

parameters, i.e., the affine transformation parameters. In this paper, we assume that the extrinsic and

intrinsic camera parameters are known at each camera location so that parallel-projection mosaics can be

generated.

The second problem is to generate dense parallel mosaics with a sparse, uneven, camera array, and for

a complicated 3D scene. To solve this problem, a Parallel Ray Interpolation for Stereo Mosaics (PRISM)

approach was proposed in [11]. While the PRISM algorithm was originally designed to generate parallel-

perspective stereo mosaics (parallel projection in one direction and perspective projection in the other),

the core idea of ray interpolation can be used for generating a mosaic with full parallel projection at any

oblique angle. This will be discussed in the next section.

In summary, in the stereo mosaic approach for large-scale 3D scene modeling and rendering, the

computation is efficiently distributed in three steps (Fig. 6): camera pose estimation via the external

measurement units, image mosaicing via ray interpolation, and 3D reconstruction from a pair of stereo

mosaics [11, 12, 15] or 3D rendering with multi-view mosaics [17]. In estimating camera poses (for

image rectification), only sparse tie points widely distributed in the two images are needed for performing

bundle adjustments [21]. In generating dense parallel rays in stereo mosaics, local matches are only

performed for parallel-perspective rays between small overlapping regions of successive frames (Section

IV). In using stereo mosaics for 3D recovery, matches are only carried out between the two final mosaics;

for 3D viewing, only mosaic selection and viewing window cropping are needed. Mosaic-based 3D

viewing will be discussed in Section V.



Step 1. Orientation Estimation (calibration, geo-location, bundle)

Step 2. Stereo Mosaicing (matching, ray interpolation)

Step 3. 3D Recovery/Viewing (stereo, motion, rendering)

Fig. 6. Three-step approach for generating and using parallel-projection mosaics.

IV. PRISM: VIDEO MOSAICING ALGORITHM

This section discusses the generalized Parallel Ray Interpolation for Stereo Mosaics (PRISM) approach

to generating dense parallel mosaics with a sparse, uneven, camera array, and for a complicated 3D scene.

Fig. 7 shows how the PRISM algorithm works for 1D images. The 1D camera has two axes – the optical

axis (Z) and the X-axis. Given the known camera orientation at each camera location, one ray with a

given oblique angle β can be chosen from the image at each camera location to contribute to the parallel

mosaic with this oblique angle β. The oblique angle is defined against the direction perpendicular to the

mosaicing direction, which is the dominant direction of the camera path (Fig. 7).

Fig. 7. Ray interpolation for parallel mosaicing from a camera array. (Labels: images A and B, interpolated ray I, camera path, nodal point, optical axis, image plane, parallel rays, mosaicing direction, axes X and Z, oblique angle β.)



However, the “mosaiced” image constructed from only those existing rays will be sparse

and uneven, since camera arrays are usually neither regular nor very dense. Therefore, interpolated parallel

rays between a pair of existing parallel rays (from two neighboring images) are generated by performing

local matching between these two images. The assumption is that we can find at least two images to

generate the parallel ray. Such an interpolated ray is shown in Fig. 7, where Ray I is interpolated from

Image A and Image B.
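The geometry of a single interpolated ray in the 1D case can be sketched as follows. This is a simplified illustration, not the PRISM implementation of [11]: it assumes two rectified cameras on a straight path with optical axes along Z, a focal length in pixels, and a given match (xa_img, xb_img) of the same scene point in Images A and B. All names are hypothetical.

```python
def interpolate_parallel_ray(xa_cam, xb_cam, xa_img, xb_img, f_pixels, tan_beta):
    """Locate the parallel ray with slope tan_beta through a matched point.

    xa_cam, xb_cam: camera positions along the mosaicing direction.
    xa_img, xb_img: image coordinates of the same scene point, measured from
                    the principal point, in Images A and B.
    Returns (x_virtual, Z): the position on the camera path from which a ray
    of angle beta would see the point, and the depth of the point.
    """
    disparity = xa_img - xb_img
    if disparity == 0:
        raise ValueError("zero disparity: the point is at infinity")
    Z = f_pixels * (xb_cam - xa_cam) / disparity   # triangulate the depth
    X = xa_cam + Z * xa_img / f_pixels             # position of the point
    x_virtual = X - Z * tan_beta                   # back-project along beta
    return x_virtual, Z

# Hypothetical match: cameras one unit apart, focal length 100 pixels.
x_virtual, Z = interpolate_parallel_ray(0.0, 1.0, 7.0, -3.0, 100.0, 0.02)
```

In this example the ray with tan β = 0.02 falls between x = 2 pixels in Image A and x = 2 pixels in Image B, so the virtual viewpoint lands between the two real cameras, which is the situation drawn for Ray I in Fig. 7.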

One interesting property of the parallel mosaics is that all the (virtual) viewpoints are at infinity.

Therefore, even if the original camera path has large deviation in the direction perpendicular to the

mosaicing direction, we can still generate full parallel mosaics. However, note that in practice, too large a

deviation in the perpendicular direction will result in a captured image sequence with rather different

spatial resolutions of the scene, hence the resulting mosaics will have varying spatial quality via ray

interpolation.

The extension of this approach to 2D images is straightforward, and a region triangulation strategy

similar to that in [11] can be applied here to deal with 2D cases. In principle, we need to match all the

points between the two overlapping slices of the successive frames to generate a complete parallel-

perspective mosaic. In an effort to reduce the computational complexity, a fast PRISM algorithm [11]

based on the proposed PRISM method has been developed. It only requires matches between a set of

point pairs in two successive images; the rest of the points are generated by warping a set of triangulated

regions defined by the control points in each of the two images. The proposed fast PRISM algorithm can

be easily extended to use more feature points (thus smaller triangles) in the overlapping slices so that each

triangle really covers a planar patch or a patch that is visually indistinguishable from a planar patch, or to

perform pixel-wise dense matches to achieve true parallel-perspective geometry.
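The triangle-based warping used to fill in points between matched control points can be illustrated with a generic affine fit. This is a sketch of the general technique (one affine map per triangle, determined by its three vertex correspondences), assuming NumPy; it is not the triangulation strategy of [11] itself, and the function names are hypothetical.

```python
import numpy as np

def affine_from_triangle(src, dst):
    """2x3 affine matrix mapping the three src vertices onto the dst vertices."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((3, 1))])  # rows are [x, y, 1]
    M = np.linalg.solve(A, dst)            # solve A @ M = dst, M is 3x2
    return M.T

def warp_point(M, p):
    """Apply the affine map to a point inside the triangle."""
    x, y = p
    return M @ np.array([x, y, 1.0])

# Hypothetical control points in a frame and their positions in the mosaic.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
M = affine_from_triangle(src, dst)
q = warp_point(M, (0.5, 0.5))
```

An affine map is exact only if the triangle covers a planar patch, which is why smaller triangles (more feature points) bring the result closer to true parallel-perspective geometry.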

One important issue here is the selection of neighborhood images for ray interpolation. For example,

with a 1D scan sequence of a single camera, it is hard to generate full parallel projection in the other

direction (i.e., the Y direction), which is perpendicular to the motion of the camera, since the interpolated

parallel rays far off the center of the images in the y direction have to use rays with rather different

oblique angles in the original perspective images.

Fig. 8 shows mosaic results from an aerial video sequence of a cultural scene, with a 1D scan of a

single camera. Note that parallel-perspective mosaics are generated in this example, due to the

aforementioned problem. Here we want to show the effectiveness of the PRISM algorithm. A few frames

of this 1000+-frame sequence are shown in Fig. 8a. In order to save matching time, every 10th frame is

used for generating the parallel-perspective mosaics. Please compare the results of parallel-perspective

mosaicing via the PRISM approach [11] vs. 2D mosaicing using a similar approach (manifold mosaicing



[2]), by looking along the many building boundaries (associated with depth changes) in the complete

4448x1616 set of mosaics at our web site [15]. Since it is hard to see subtle errors in the 2D mosaics of

the size of Fig. 8b, Fig. 8c and Fig. 8d show close-up windows of the 2D and 3D mosaics for the same

portion of the scene with the tall Campus Center building. In Fig. 8c the multi-perspective mosaic via 2D

mosaicing has obvious seams along the stitching boundaries between two frames. This can be observed by

looking at the region indicated by circles, where some fine structures (parts of a white blob and two

rectangles) are missing due to misalignments. As expected, the parallel-perspective mosaic constructed

using 3D mosaicing (Fig. 8d) does not exhibit these problems.

Fig. 8. Parallel-perspective mosaics of a campus scene from an airborne camera. The mosaic (b) is the left mosaic generated from a sub-sampled "sparse" image sequence (every 10th frame of 1000 frames in total, a few of which are shown in (a)) using the proposed PRISM algorithm. The bottom two zoomed sub-images show how PRISM deals with the large motion parallax of a tall building: (c) 2D mosaic result with obvious seams due to misalignment; (d) 3D mosaic result using PRISM, seamless after local matching.

V. STEREO VIEWING AND 3D RECONSTRUCTION

Parallel mosaics with various oblique angles represent scenes from the corresponding viewing angles

with parallel rays and with wide fields of view. There are two obvious applications of such representation.

First, for 3D recovery, matches are only performed on a pair of mosaics, not on individual video frames.



The stereo mosaic method also solves the baseline versus field-of-view (FOV) dilemma efficiently by

extending the FOV in the directions of mosaicing. More importantly, the parallel stereo mosaics have

optimal/adaptive baselines for all the points, which leads to uniform depth resolution in theory and linear

depth resolution in practice. For 3D reconstruction, epipolar geometry is rather simple due to the full

parallel projections in the mosaic pair. We will present an example of 3D reconstruction of forest scenes

in Section VI; methods and results on 3D reconstruction of urban scenes can be found in [19].

Fig. 9. 3D rendering based on multi-view parallel-projection mosaics. (Labels: axes X, Y and Z; angles α, β and γ; mosaic pairs (L1a, R1a), (L1b, R1b), (L1c, R1c), (L2, R2), (L3, R3) and (L4, R4).)

Second, a human can perceive the 3D scene from a pair of mosaics with different oblique angles (e.g.,

using polarized glasses) without any 3D recovery. If we have mosaics with various oblique angles in both

the x and the y direction, a virtual fly/walk-through can be generated by a simple procedure of selecting

different pairs of mosaics, cropping corresponding windows, and performing 2D image rotation and

scaling.

Fig. 9 shows the basic concept of mosaic-based rendering. Each grid represents an oblique viewing

direction (please also refer to Fig. 2). Translation, rotation and zoom of the virtual camera can be

simulated as the following simple operations:

(1) Translation in the xy plane can be simulated by shifting the current displayed mosaic pair, due to

the multiple-viewpoint property of the parallel-projection mosaics.



(2) Rotations around the X and the Y axes can be simulated by selecting different pairs of mosaics with

different oblique angles. In Fig. 9, the pair (L1a, R1a) gives a user a stereo view with the effect of

“panning” to the left, the pair (L1b, R1b) gives a stereo view with the effect of looking down (into the

paper), while the pair (L1c, R1c) gives a stereo view with the effect of panning to the right, all with a zero

α-oblique angle. The pair (L2, R2) gives a stereo view having both α- and β-oblique angles, thus allowing

the user to perform both panning and tilting.

(3) Rotation (with angle γ) around the optical axis Z only requires selection of an appropriate pair of

mosaics followed by rotation of this pair of mosaics in their image planes by angle γ. In Fig. 9, selecting

the pair (L3, R3) gives a stereo view with the effect of rotating the camera 90 degrees (γ=90), and turning

to the right, whereas the pair (L4, R4) gives a stereo view with the effect of rotating the camera 45 degrees

(γ=45) and turning the camera’s head to the right as well.

(4) The visual disparities can also be controlled by changing the selected angles between the two

mosaics for stereo viewing. In other words, if two images close together are selected (‘close’ in the

sense of Fig. 9), the visual disparities will be smaller; otherwise they will be larger. This is useful when

viewing 3D scenes with varying ranges.

(5) Camera zoom can be simulated by scaling the 2D images in the viewing window. By incorporating

visual disparity control with the scaling operation, a Z-translation effect can also be approximately

simulated.
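Operations (1), (2) and (4) above reduce to index selection and array slicing, as the following sketch suggests. It assumes a set of mosaics indexed by their β-oblique angles and held as NumPy arrays; the function names, the angle list, and the assignment of the smaller angle to the "left" view are illustrative assumptions. Image-plane rotation and scaling (operations (3) and (5)) are omitted.

```python
import numpy as np

def select_stereo_pair(angles_deg, view_deg, separation_deg):
    """Indices of the two mosaics nearest to view -/+ separation/2.

    Changing view_deg simulates rotation of the virtual camera; changing
    separation_deg controls the visual disparity.
    """
    angles = np.asarray(angles_deg, dtype=float)
    left = int(np.argmin(np.abs(angles - (view_deg - separation_deg / 2.0))))
    right = int(np.argmin(np.abs(angles - (view_deg + separation_deg / 2.0))))
    return left, right

def crop_window(mosaic, top, left, height, width):
    """Viewing window; shifting (top, left) simulates translation in xy."""
    return mosaic[top:top + height, left:left + width]

# Hypothetical set of seven mosaics with evenly spaced oblique angles.
angles = [-6, -4, -2, 0, 2, 4, 6]
mosaics = [np.zeros((40, 200)) for _ in angles]

i, j = select_stereo_pair(angles, view_deg=0, separation_deg=4)
left_view = crop_window(mosaics[i], 5, 20, 30, 60)
right_view = crop_window(mosaics[j], 5, 20, 30, 60)
```

Because only selection and cropping are involved, the per-frame cost of such a viewer is essentially independent of scene complexity.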

In a virtual fly-/walk-through, the number of mosaics and the switching between different pairs should

allow smooth viewing changes. Rendering results based on real mosaics will be shown in the next

section. Here we want to point out that, due to the nature of the parallel projection of each mosaiced image, the

3D visual effect is different from the usual perspective stereo viewing: usually exaggerated 3D effects

will be observed. However, the mosaic-based rendering approach, almost without any computation,

provides 3D information, occlusions, virtual translation and rotation of the cameras. In fact,

independently moving objects can also be represented and visualized in this approach. A discussion of

the dynamic aspects of the stereo mosaics can be found in [19].

VI. EXPERIMENTAL EXAMPLES

The proposed mosaic representation has been applied to a variety of applications, including (1)

airborne video for environmental monitoring and urban surveillance; (2) ground mobile robot navigation;

and (3) under-vehicle inspection. In this section, we will mainly show 3D rendering results with multi-



view parallel (perspective) mosaics in these three scenarios. These applications represent very different

imaging scenarios, including far-range, medium-range and extreme close-range imaging. We will show

that the same approach can be applied to all these three cases.

A. Video Mosaics from Aerial Video

In theory, with a camera on an airplane undergoing an ideal 1D translation and a nadir view direction,

two spatio-temporal images can be generated by extracting two rows of pixels at the front and rear edges

of each frame perpendicular to the direction of motion (Fig. 10). The mosaic images thus generated are

parallel-perspective, with parallel projection in the motion direction and perspective projection in the

other. In addition, these mosaics are obtained from two different oblique viewing angles of a single

camera’s field of view, so that a stereo pair of left and right mosaics captures the inherent 3D information.

Note that we do not generate parallel projection in the y direction for this 1D scan case due to the

difficulty mentioned in Section IV.
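For the ideal case described above, the two spatio-temporal images can be assembled by simple slicing. The sketch below assumes grayscale frames stored as a NumPy array of shape (N, H, W), motion along the image x axis, and a translation of exactly one pixel per frame; these are idealizations for illustration, and the function name is hypothetical. Real sequences require the ray interpolation of Section IV.

```python
import numpy as np

def slit_stereo_mosaics(frames, dx):
    """Build two mosaics from the front and rear slits of every frame.

    frames: array of shape (N, H, W); dx: distance between the slits in pixels.
    Each frame contributes one column to each mosaic.
    """
    _, _, w = frames.shape
    cx = w // 2
    front, rear = cx + dx // 2, cx - dx // 2
    if rear < 0 or front >= w:
        raise ValueError("slit distance dx does not fit inside the frame")
    mosaic_front = frames[:, :, front].T  # shape (H, N)
    mosaic_rear = frames[:, :, rear].T
    return mosaic_front, mosaic_rear

# Synthetic flat scene: frame k sees the ground texture shifted by k pixels.
rng = np.random.default_rng(0)
n_frames, height, width, dx = 12, 2, 9, 4
ground = rng.random((height, n_frames + width))
frames = np.stack([ground[:, k:k + width] for k in range(n_frames)])

mosaic_front, mosaic_rear = slit_stereo_mosaics(frames, dx)
```

For this flat synthetic scene the two mosaics are identical up to a constant shift of dx columns; scene points off the plane would show a different shift, which is the displacement that encodes depth.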

In our aerial video environmental monitoring application, a single camera is mounted in a small aircraft

undergoing 6 DOF motion, together with a GPS, INS and laser profiler to measure the moving camera

locations and the distance to the terrain [11, 12]. Given the acquired data, seamless stereo parallel-

perspective video mosaic strips can be generated from the image sequences with a 1D scan path, but with

a rather general motion model, using the proposed parallel ray interpolation for stereo mosaicing

(PRISM) approach [11].

Fig. 10. Parallel-perspective stereo mosaics with a 1D camera scan path. (Labels: left view mosaic and rays of left view; right view mosaic and rays of right view; front slit and rear slit of the perspective image, separated by dx; axes X, Y and Z with origin O; motion direction.)



A real-world example of video mosaicing using the PRISM approach was shown in Fig. 8. As another

example, Fig. 11 shows stereo mosaics (with two β-oblique angles) generated from a telephoto camera

and 3D recovery for a forest scene in the Amazon rain forest. The average height of the airplane is H = 385 m

(i.e., about 1000 feet), and the distance between the two slit windows is selected as dx = 160 pixels (in the

x direction) with images of 720 (x) * 480 (y) pixels. The image resolution is about 7.65 pixels/meter. The

depth map (Fig. 11c) generated from the stereo mosaics (Fig. 11 a and b) was obtained by using a

hierarchical sub-pixel dense correlation method [16]. The range of depth variations of the forest scene

(from a stereo fixation plane) is from -24.0 m (tree canopy) to 24.0 m (the ground). Even before any 3D

recovery, a human observer can perceive the 3D scene from the stereo pair using a pair of red/blue stereo

glasses (Fig. 11d).
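The numbers quoted above allow a rough consistency check based on Eq. (1). The calculation below is a back-of-the-envelope sketch under assumptions that are not stated in the text: the two slits are symmetric about the image center, so that tan β2 − tan β1 = dx / F with F the focal length in pixels; F is approximated as the ground resolution times H; and one mosaic pixel corresponds to 1/7.65 m along the flight direction. The variable names are hypothetical.

```python
H = 385.0                # average flying height in meters
dx = 160.0               # distance between the two slit windows in pixels
pixels_per_meter = 7.65  # image resolution quoted in the text

# Assumed focal length in pixels: an object of 1 m at distance H spans 7.65 pixels.
F = pixels_per_meter * H

# Eq. (1): a baseline change of one mosaic pixel (1/7.65 m) changes depth by
# (1 / pixels_per_meter) / (dx / F), which simplifies to H / dx.
depth_per_pixel = (1.0 / pixels_per_meter) * F / dx

# Displacement needed to span the reported 24.0 m depth variation.
pixels_for_24m = 24.0 / depth_per_pixel
```

Under these assumptions one pixel of displacement between the two mosaics corresponds to roughly 2.4 m of depth, so the reported range of −24.0 m to 24.0 m corresponds to about ±10 pixels of displacement; the resolution factor cancels and only H/dx matters.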

Fig. 11. Stereo mosaics and 3D reconstruction of a 166-frame telephoto video sequence. (a) Left mosaic (b) right mosaic (c) depth map and (d) stereoscopic view (use left-blue/right-red glasses).



Fig. 12. Multi-view parallel-perspective mosaics for 3D reconstruction and stereo viewing. Multi-view mosaics from a single moving camera: seven mosaics are shown with 7 different viewing directions, together with a perspective frame and the motion direction.

Fig. 13. Mosaic-based fly-through: snapshots. A 3D effect will be seen with a pair of red-blue glasses. The snapshots also show varying occlusions and independently moving targets (cars on the road).

Multiple oblique parallel-perspective mosaics (Fig. 12) generated in a similar way can be used for

image-based rendering as discussed in Section V. In the case of these parallel-perspective mosaics with

only β-oblique angles, only translation in the X and Y directions, rotation around the Y axis, and camera



zoom can be performed, but a very effective 3D virtual fly-through can be generated. A mosaic-based fly-

through demo may be found at [17], which uses 9 oblique mosaics generated from a real video sequence

of the UMass campus. This result shows motion parallax, occlusion and also moving objects in multiple

parallel-perspective mosaics; a few snapshots are shown in Fig. 13 to illustrate these effects. We note that

the rendering shows a parallel-perspective rather than a true perspective perception. A true perspective

fly-through will be enabled by 3D reconstruction from the multiple mosaics.

B. Video Mosaics for Mobile Robot Application

The same approach has also been applied to ground mobile robot applications where the ranges of the

roadside scenes to the camera on a mobile robot are from tens of feet (indoor) to hundreds of feet (outdoor). The

road-side parallel (-perspective) stereo mosaics can be used for human-robot interaction in robot

navigation. Fig. 14 shows three parallel-perspective mosaics from a 517-frame video sequence captured

from a mobile robot viewing a group of bookshelves and cabinets at close range as the robot moves from

one end to the other.

Fig. 14. Ground video application. (a) A few frames from a 517-frame sequence of image size 320*240. (b) Ground video mosaics: a left view, the center view and a right view are shown. Each mosaic is 4160*288.



Fig. 15. Mosaic-based walk-through: stereoscopic snapshots (with red/blue glasses)

For this example, eleven (11) mosaics are generated. The video clip of a virtual walk-through using

these 11 mosaics can be found at [22]. Fig. 15 shows a few snapshots extracted from the video clip at two

camera locations, one viewing the connection between two book shelves (the 1st row), and the other

viewing one end of a cabinet (the 2nd row). In this example, the 3D and occlusion effects are dramatic.

C. Video Mosaics for Under-Vehicle Inspection

As one of the real applications of full parallel stereo mosaics, an approximate version of mosaics with

full parallel projections has been generated from a virtual bed of 2D camera arrays by driving a car over a

1D array of cameras in an under-vehicle inspection system (UVIS) [13, 14, 18]. UVIS is a system

designed for security checkpoints such as those at borders, embassies, large sporting events, etc. It is an

example of generating mosaics from very short-range video; a 2D virtual array of cameras is necessary for

full coverage of the vehicle undercarriage.


Fig. 16. Conceptual 1D camera array for under-vehicle inspection [13, 14].



Fig. 17. 2D parallel mosaic from “13 cameras” spaced 3 inches apart traveling down the length of the vehicle.

Fig. 16 illustrates the system setup where an array of cameras is housed in a platform. When a car

drives over the platform, several mosaics with different oblique angles of the underside of a car are

created. The mosaics can then be viewed by an inspector to thoroughly examine the underside of the

vehicle from different angles. Fig. 17 shows such a mosaic covering the full under-body of a vehicle,

generated from a 1D array of 13 cameras spaced 3 inches apart traveling down the length of the vehicle

taking pictures every 3 inches. This is equivalent to a stationary 1D array of cameras and a moving

vehicle. The 1D array of 13 cameras is simulated by laterally shifting the real experimental set-up of 4

side-by-side cameras spaced 3 inches apart. Fig. 18 shows two more examples, with the array of 4 side-by-side

cameras undertaking more general motion that includes simulating turning, backing-up and

stopping-then-starting of the vehicle. The 2D parallel mosaics are generated in two steps. In the first step,

each lateral mosaic strip is generated from images captured by the 1D array of cameras at each location of

the camera array. Then in the second step, the sequence of mosaic strips is stitched together in the direction of

the vehicle’s motion to generate the full 2D mosaics.
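
The two-step assembly described above can be sketched as follows, under strong simplifying assumptions that are not from the paper: the images are already rectified, each camera contributes only a central patch whose footprint matches the camera spacing and the step between exposures, and no ray interpolation or blending is performed. The helper names are hypothetical.

```python
import numpy as np

def central_patch(img, patch_h, patch_w):
    """Cut the central patch_h x patch_w region of one camera image."""
    top = (img.shape[0] - patch_h) // 2
    left = (img.shape[1] - patch_w) // 2
    return img[top:top + patch_h, left:left + patch_w]

def lateral_strip(camera_images, patch_h, patch_w):
    """Step 1: join the central patches of all cameras at one array location."""
    return np.hstack([central_patch(im, patch_h, patch_w) for im in camera_images])

def assemble_2d_mosaic(frames, patch_h, patch_w):
    """Step 2: stack the lateral strips along the direction of motion."""
    return np.vstack([lateral_strip(cams, patch_h, patch_w) for cams in frames])

# Synthetic data: 3 array locations x 2 cameras, each image filled with 10*t + c
frames = [[np.full((4, 4), 10 * t + c, dtype=np.uint8) for c in range(2)]
          for t in range(3)]
mosaic = assemble_2d_mosaic(frames, patch_h=2, patch_w=2)
```

In a real system the patches would overlap and be merged by ray interpolation; plain concatenation is used here only to make the strip-then-stack ordering explicit.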




Fig. 18. 2D parallel mosaics from 4 cameras spaced 3 inches apart traveling down the length of the vehicle, (a) as the vehicle turns sharply; and (b) as it stops and backs up before starting forward again.

Fig. 19b shows one of the five mosaics, each with a different oblique view, generated from a 130-frame

video sequence (sample video frames are shown in Fig. 19a). Different “occluded” regions under a pipe in

the center can be observed by switching to different mosaics in the mosaic-based rendering results (Fig.

20). A PPT demo of these five oblique parallel views of the mosaics can be found at [18]. More results

on 2D parallel-projection mosaics can be found at [14].
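
Switching among oblique mosaics can be illustrated with a minimal sketch, assuming each mosaic is tagged with its oblique viewing angle and that a rendered view is simply a window cropped from the nearest-angle mosaic. The angle values, the function name `render_view`, and the cropping scheme are hypothetical and only illustrate why such rendering needs very little computation.

```python
import numpy as np

def render_view(mosaics, angles_deg, view_angle_deg, x0, width):
    """Pick the mosaic with the closest oblique angle and crop a window from it."""
    diffs = np.abs(np.asarray(angles_deg, dtype=float) - view_angle_deg)
    idx = int(np.argmin(diffs))
    return idx, mosaics[idx][:, x0:x0 + width]

# Five synthetic mosaics with hypothetical oblique angles (degrees)
angles = [-20.0, -10.0, 0.0, 10.0, 20.0]
mosaics = [np.full((4, 12), i, dtype=np.uint8) for i in range(5)]
idx, window = render_view(mosaics, angles, view_angle_deg=8.0, x0=3, width=5)
```

Here the requested 8-degree view falls closest to the 10-degree mosaic, so the window is read directly from that mosaic without any per-pixel reprojection.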

In the case of the 1D camera array, the fixed cameras were pre-calibrated and the geometric and

photometric distortions of these wide FOV cameras were corrected. However, challenges remain since (1)

the distances between cameras are large compared to the very short viewing distances to the bottom of the

car; and (2) without the assistance of GPS/INS for pose estimation, we need to determine the car’s motion

by other means, e.g. tracking line features on the car. The proposed ray interpolation approach needs to

take these two factors into consideration.




Fig. 19. Under-vehicle inspection: (a) four frames from a 130-frame video sequence with image size 611x447; (b) one mosaic of the stereo mosaic pair.

VII. CONCLUSIONS

This paper presents an approach to the fusion of images from many video cameras or a moving video

camera with external orientation data into a few mosaiced images with oblique parallel projections. In

both cases, a virtual 2D array of cameras with FOV overlaps is formed to generate coverage of the entire

scene (or object). The proposed representation provides wide FOV, preserves extensive 3D information,

and represents occlusions. This representation can be used as both an advanced video interface for

surveillance and a pre-processing step for 3D reconstruction.

We present several practical cases where 2D parallel-projection mosaics can be generated, and discuss

related research issues in generating and using the parallel mosaics. In particular, we present a general ray

interpolation approach for parallel-projection mosaic generation, and discuss some practical issues in

generating the mosaics. A mosaic-based 3D rendering method, requiring almost no computation, allows

very effective 3D rendering of various complicated visual scenes, from forestry scenes to urban scenes,

with various viewing ranges. Experimental results are given for three important applications – aerial

video surveillance, ground mobile robot navigation, and under-vehicle inspection.



Fig. 20. Mosaic-based vehicle inspection: rendering snapshots with stereoscopic viewing capability

ACKNOWLEDGMENT

This work is supported by the National Science Foundation (NSF) under Award EIA-9726401, Air Force

Research Lab (AFRL) under Grants FA8650-05-1-1853 and F33615-03-1-63-83, Army Research Office

(ARO) under Award No. W911NF-05-1-0011, and by funding from New York Institute for Advanced

Studies and from Atlantic Coast Technologies, Inc.



REFERENCES

[1] J Y Zheng and S Tsuji, Panoramic representation for route recognition by a mobile robot, International Journal of Computer Vision, 9(1), 1992: 55-76

[2] S Peleg, B Rousso, A Rav-Acha, A Zomet, Mosaicing on adaptive manifolds, IEEE Trans. PAMI, 22(10), Oct 2000: 1144-1154.

[3] USGS DEM, http://data.geocomm.com/dem/

[4] S Peleg, M Ben-Ezra and Y Pritch, OmniStereo: panoramic stereo imaging, IEEE Trans. PAMI, March 2001:279-290.

[5] H-Y Shum and R Szeliski, Stereo reconstruction from multiperspective panoramas, ICCV’99: 14-21.

[6] S Baker, R Szeliski and P Anandan, A layered approach to stereo reconstruction. CVPR'98: 434-441

[7] J Shade, S Gortler, L He and R Szeliski, Layered depth images. SIGGRAPH'98: 231-242

[8] Z Zhu and A R Hanson, LAMP: 3D Layered, Adaptive-resolution and Multi-perspective Panorama - a New Scene Representation, Computer Vision and Image Understanding, 96(3), Dec 2004: 294-326.

[9] A Zomet, D Feldman, S Peleg, D Weinshall, Mosaicing new views: crossed-slits projection, IEEE Trans. PAMI 25(6), June 2003.

[10] J Chai and H-Y Shum, Parallel projections for stereo reconstruction, CVPR'00: II 493-500.

[11] Z Zhu, E M Riseman, A Hanson, Generalized Parallel-Perspective Stereo Mosaics from Airborne Videos, IEEE Trans. PAMI, 26(2), Feb 2004:226-237.

[12] Z Zhu, A R Hanson, H Schultz and E M Riseman, Generation and error characteristics of parallel-perspective stereo mosaics from real video. In Video Registration, M. Shah and R. Kumar (Eds.), Kluwer, 2003: 72-105.

[13] P Dickson, J Li, Z Zhu, A R Hanson, E M Riseman, H Sabrin, H Schultz and G Whitten, Mosaic generation for under-vehicle inspection. WACV’02: 251-256

[14] http://vis-www.cs.umass.edu/projects/uvis/index.html

[15] http://www-cs.engr.ccny.cuny.edu/~zhu/StereoMosaic.html

[16] H Schultz, Terrain reconstruction from widely separated images, SPIE 2486, April 1995: 113-123.

[17] http://www-cs.engr.ccny.cuny.edu/~zhu/CampusVirtualFly.avi

[18] http://www-cs.engr.ccny.cuny.edu/~zhu/mosaic4uvis.html

[19] Z. Zhu, H. Tang, B. Shen, G. Wolberg, 3D and Moving Target Extraction from Dynamic Pushbroom Stereo Mosaics, IEEE Workshop on Advanced 3D Imaging for Safety and Security, June 25, 2005, San Diego, CA, USA

[20] Z. Zhu, E. M. Riseman, A. R. Hanson and H. Schultz, An Efficient Method for Geo-Referenced Video Mosaicing for Environmental Monitoring. Machine Vision Applications Journal, Springer-Verlag, 16(4): 203-216, 2005



[21] C. C. Slama (Ed.), Manual of Photogrammetry, Fourth Edition, American Society of Photogrammetry, 1980

[22] http://www-cs.engr.ccny.cuny.edu/~zhu/Multiview/indoor1Render.avi

Zhigang Zhu received his B.E., M.E. and Ph.D. degrees, all in computer

science from Tsinghua University, Beijing, China, in 1988, 1991 and 1997,

respectively. He is currently an Associate Professor in the Department of

Computer Sciences, the City College of the City University of New York, and

is directing the City College Visual Computing Laboratory (CcvcL).

Previously he was an Associate Professor at Tsinghua University and a Senior Research Fellow at the

University of Massachusetts, Amherst. His research interests include 3D computer vision, Human-

Computer Interaction (HCI), virtual / augmented reality, video representation, and various applications in

education, environment, robotics, surveillance and transportation. He has published over 100 technical

papers in the related fields. Dr. Zhu received the Science and Technology Achievement Award (second-prize

winner) from the Ministry of Electronic Industry, China, in 1996 and the C. C. Lin Applied Mathematics

Award (first-prize winner) from Tsinghua University in 1997. His Ph.D. thesis "On Environment

Modeling for Visual Navigation" was selected in 1999 in the top 100 dissertations in China, and a book

based on his Ph.D. thesis was published by China Higher Education Press in December 2001. He was a

recipient of the CUNY Certificate of Recognition "Salute to Scholars" Award, in both 2004 and 2005. He

is a senior member of the IEEE and a member of the ACM.

Allen R. Hanson received his B.S. degree from Clarkson College of

Technology in 1964 and his M.S. and Ph.D. degrees in Electrical Engineering

from Cornell University in 1966 and 1969, respectively. He joined the

Computer Science Department at the University of Massachusetts as



Associate Professor in 1981 and has been a full Professor since 1989. Professor Hanson has conducted

research in computer vision, artificial intelligence, learning, and pattern recognition, and has over 200

publications. He is Co-Director of the Computer Vision Laboratory and has a diverse range of recent

research including aerial digital video analysis for environmental science, three-dimensional terrain

reconstruction, distributed sensor networks, motion analysis and tracking, mobile robot navigation, under-

vehicle inspection for security applications, object recognition, image information retrieval, and

technology for the aged. He has served in various roles at most of the major conferences in computer

vision and is a member of the IEEE and ACM.
