IRIS: VISION BEYOND OBSTRUCTION


ABSTRACT
A single camera can be a useful surveillance tool, but video recorded from a single point of reference becomes ineffective when objects of interest are blocked by occlusions; furthermore, a traditional camera’s position and focal length cannot be modified after the video is captured.

Our system circumvents these shortcomings by processing data from an array of eighteen cameras, synthesizing a single virtual camera whose position and focal length can be adjusted entirely through software. The effective aperture of this virtual camera is larger than any physically realizable aperture and allows for the reconstruction of partially occluded objects lying on an arbitrary, user-specified plane.

The system can be used in real time or on pre-recorded video to penetrate occlusions such as foliage and crowds, making it a flexible tool for military and civilian surveillance applications where preserving a line of sight is important, even when objects are partially obscured.

AUTHORS:
Nick Annetta EE ’09
Timothy McKenna EE ’09
Anil Venkatesh EE ’09
Neeraj Wahi EE ’09

ADVISORS:
Jan Van der Spiegel
Shih-Schön Lin

Special thanks to Kostas Daniilidis and Sid Deliwala

DEMO TIMES:
Thursday, April 23, 2009
9:30, 10:00 AM, 2:00, 2:30 PM

GROUP #6

SYSTEM DESIGN
Our system is designed to collect data in real time from eighteen video cameras operating at 640×480 resolution and 30 frames per second. This configuration corresponds to an aggregate bandwidth of roughly 1.3 gigabits per second. The data is processed in parallel to overcome bandwidth and CPU constraints, enabling real-time performance. Each slave PC performs image transformations on three video streams, averages the transformed streams, and sends the result to the master PC. The master PC then performs final calculations before displaying the result in real time. The computers are networked over Ethernet using the User Datagram Protocol (UDP), which provides bidirectional communication between the master and slave PCs.
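
The 1.3 Gb/s figure is easy to verify. Below is a quick back-of-the-envelope check, assuming 8 bits per pixel of raw sensor data (the poster does not state the pixel format; 24-bit RGB would triple the total):

```python
# Sanity check of the camera array's aggregate bandwidth.
# Assumes 8 bits per pixel (raw sensor data) -- an assumption,
# since the poster does not specify the pixel format.
CAMERAS = 18
WIDTH, HEIGHT = 640, 480
FPS = 30
BITS_PER_PIXEL = 8

per_camera_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL  # ~73.7 Mb/s
total_bps = per_camera_bps * CAMERAS                    # ~1.33 Gb/s

print(f"Per camera:  {per_camera_bps / 1e6:.1f} Mb/s")
print(f"Array total: {total_bps / 1e9:.2f} Gb/s")
```

Splitting the eighteen streams three-per-slave keeps each slave's ingest near 220 Mb/s and pushes the heavy warping work out to the slaves before anything crosses the Ethernet link to the master.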

DEPTH OF FIELD
Depth of field is inversely proportional to the size of the aperture. Accordingly, a large aperture is desirable because it blurs near-field occluding objects while the camera remains focused on the object of interest.
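
This inverse relationship can be made precise with the standard thin-lens depth-of-field approximation (standard optics notation, not taken from the poster): N = f/D is the f-number, D the aperture diameter, c the circle of confusion, s the subject distance, and f the focal length.

```latex
% Depth of field for a subject distance s well inside the hyperfocal range:
%   DOF ~ 2 N c s^2 / f^2,  with N = f / D
\[
  \mathrm{DOF} \;\approx\; \frac{2 N c\, s^2}{f^2}
  \;=\; \frac{2 c\, s^2}{f\, D}
  \qquad\Longrightarrow\qquad
  \mathrm{DOF} \;\propto\; \frac{1}{D}
\]
% A synthetic aperture spanning the whole 18-camera array makes D very
% large, so the depth of field collapses to a thin slab around the chosen
% focal plane and near-field occluders blur away.
```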

MATHEMATICAL BACKGROUND
To focus on a single plane, the images must be warped before they are combined. This warp is realized by a projective transformation known as a homography, which maps the pixel coordinates of one image to those of an image from a different perspective. Using homography, every image is warped to a common perspective, aligning just those objects which lie at a given distance from the camera array. All other objects are misaligned and appear blurred when a simple average is taken.
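
A minimal sketch of this focus-and-average step, assuming OpenCV and grayscale frames (the poster does not specify the implementation; the function and parameter names are illustrative):

```python
import numpy as np
import cv2

def synthetic_aperture(frames, homographies, size=(640, 480)):
    """Warp every camera frame to a common reference perspective and
    average the results. Points on the chosen focal plane coincide in
    all warped images; off-plane points land in different places and
    blur out in the mean.

    frames:       list of grayscale uint8 images
    homographies: list of 3x3 numpy arrays, one per camera
    size:         output (width, height)
    """
    acc = np.zeros((size[1], size[0]), dtype=np.float64)
    for frame, H in zip(frames, homographies):
        acc += cv2.warpPerspective(frame, H, size)  # apply the 3x3 warp
    return (acc / len(frames)).astype(np.uint8)
```

Because the average is linear, each slave can average its own three warped streams locally and the master need only average the slave outputs, which matches the division of labor described under SYSTEM DESIGN.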


CAMERA CALIBRATION
Parallax calibration is used to find the location of each camera relative to the others. A large checkerboard is placed at two different distances in full view of all the cameras. Corresponding points are found on the checkerboard across all of the camera images, and the positions of the cameras are recovered from the change in the locations of these corresponding points.
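
Finding the corresponding points amounts to detecting the same checkerboard corners in every camera's view. A hedged sketch using OpenCV (the 9×6 inner-corner pattern size is an assumption; the poster says only that the checkerboard is large):

```python
import cv2

def checkerboard_points(image, pattern=(9, 6)):
    """Detect inner checkerboard corners, returned in a consistent
    board order so the same physical corner can be matched across
    all eighteen cameras' views."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        raise RuntimeError("checkerboard not fully visible in this view")
    # Refine to sub-pixel accuracy for a better least-squares fit.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    return corners.reshape(-1, 2)
```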

VIRTUAL VIEWS
Least-squares optimization yields a unique homography matrix for every camera, which warps each image to a novel perspective, producing a scene that is captured by no actual camera. A camera’s virtual-view homography matrix can be expressed as a linear combination of the camera’s homography matrices to two predetermined cameras, as shown below:

\[ H_i = (1 - \alpha)\, H_{i,1} + \alpha\, H_{i,2} \]
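
In code the blend is a single weighted sum. The sketch below assumes α ∈ [0, 1] is the interpolation weight (the original symbol was lost in extraction) and normalizes by the bottom-right entry, a common convention the poster does not spell out:

```python
import numpy as np

def virtual_homography(H_i1, H_i2, alpha):
    """Blend camera i's homographies to reference views 1 and 2:
        H_i = (1 - alpha) * H_i1 + alpha * H_i2
    Sweeping alpha from 0 to 1 moves the virtual viewpoint smoothly
    between the two predetermined reference perspectives."""
    H = (1.0 - alpha) * H_i1 + alpha * H_i2
    return H / H[2, 2]  # fix the homography's overall scale
```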


RESULTS




[Figures: image from a single camera vs. image from the synthetic aperture camera array]