
Multiview video compression and display

ABSTRACT

To perceive three dimensions, a person's eyes must see different, slightly offset images. In the real world, the spacing between the eyes makes that happen naturally; a display somehow has to present a different, separate view to each eye. Recent technological advances have made possible a number of new applications in the area of 3D video. One of the enabling technologies for many of these 3D applications is multiview video coding (MVC). This project addresses the signal processing issues involved in the coded representation, reconstruction, and rendering of multiview video for 3D display using a PandaBoard. The technology sheds the clunky 3D eyeglasses that were traditionally needed to view a 3D image. An experimental analysis of multiview video compression for various temporal and inter-view prediction structures is presented. The compression method is based on the multiple-reference-picture technique of the H.264/AVC video coding standard; the idea is to exploit the statistical dependencies of both temporal and inter-view reference pictures for motion-compensated prediction.


CONTENTS

Chapter 1: Introduction

1.1 Video compression

1.2 History of video compression standards

1.3 Literature survey

1.4 Motivation

1.5 Objective

Chapter 2: Overview of MVC

2.1 Rendering

2.2 Requirements of MVC

Chapter 3: Ubuntu

3.1 Introduction

3.2 Features

3.3 System requirements

3.4 Variants

3.5 Terminal in Ubuntu

Chapter 4: Multiview video coding

4.1 Similarities in time and among views

4.2 Compression schemes

4.3 MVC encoding

4.4 MVC decoder

4.5 Flowchart

Chapter 5: Experimentation and results

5.1 Experimentation on Linux platform (Ubuntu)

5.2 Test results

Application and future enhancement

Conclusion

Appendices

Bibliography


Chapter 1

Introduction

1.1 Video Compression

Video compression refers to reducing the quantity of data used to represent digital video images; it is a combination of spatial image compression and temporal motion compensation. Most video compression is lossy: it operates on the premise that much of the data present before compression is not necessary for achieving good perceptual quality. Video compression is a tradeoff between disk space, video quality, and the cost of the hardware required to decompress the video in a reasonable time. However, if the video is over-compressed in a lossy manner, visible (and sometimes distracting) artifacts can appear.

Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal). Spatial encoding takes advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can perceive changes in brightness, so very similar areas of color can be "averaged out". With temporal compression, only the changes from one frame to the next are encoded, as often a large number of pixels will be the same across a series of frames.
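As a minimal illustration of the temporal idea (a Python sketch added here for clarity, not part of the original project; the frame size and the simulated motion are made up), the following counts how many pixels actually change between two consecutive frames:

import numpy as np

# Hypothetical sketch: how much of a frame really changes between
# two consecutive frames of a grayscale video?
H, W = 240, 320                                  # assumed frame size
rng = np.random.default_rng(0)
frame0 = rng.integers(0, 256, (H, W), dtype=np.uint8)

frame1 = frame0.copy()                           # simulate a small moving object
patch = frame1[100:120, 150:180].astype(np.int16) + 10
frame1[100:120, 150:180] = np.clip(patch, 0, 255).astype(np.uint8)

diff = frame1.astype(np.int16) - frame0.astype(np.int16)
changed = np.count_nonzero(diff)
print(f"{changed} of {H * W} pixels changed ({100.0 * changed / (H * W):.1f}%)")
# A temporal coder only needs to encode the small changed region
# (plus motion information) instead of the whole frame.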

There are two types of video compression:

1. Lossless: Lossless compression preserves all the data but makes it more compact; the movie that comes out is exactly the same quality as what went in. Lossless compression produces very high quality digital audio or video but requires a lot of data. Its drawback is that it is inefficient when trying to maximize storage space or network and Internet delivery capacity (bandwidth).

2. Lossy: Lossy compression eliminates some of the data. Most images and sounds have more detail than the eye and ear can discern; by eliminating some of these details, lossy compression can achieve smaller files than lossless compression. However, as the files get smaller, the reduction in quality can become noticeable. The smaller file sizes make lossy compression ideal for placing video on a CD-ROM or delivering video over a network or the Internet. Most codecs in use today are lossy.

1.2 History of video compression standards

Year  Standard              Publisher         Popular implementations
1984  H.120                 ITU-T
1990  H.261                 ITU-T             Videoconferencing, video telephony
1993  MPEG-1 Part 2         ISO, IEC          Video CD
1995  H.262/MPEG-2 Part 2   ISO, IEC, ITU-T   DVD Video, Blu-ray, Digital Video Broadcasting, SVCD
1996  H.263                 ITU-T             Videoconferencing, video telephony, video on mobile phones (3GP)
1999  MPEG-4 Part 2         ISO, IEC          Video on Internet (DivX, Xvid, ...)
2003  H.264/MPEG-4 AVC      ISO, IEC, ITU-T   Blu-ray, Digital Video Broadcasting, iPod video, HD DVD
2008  VC-2 (Dirac)          ISO, BBC          Video on Internet, HDTV broadcast, UHDTV


1.3 Literature survey

3D video formats

3D depth perception of an observed visual scene can be provided by 3D display systems that ensure the user sees a specific, different view with each eye. Such a stereo pair of views must correspond to the human eye positions; the brain can then compute the 3D depth perception. The history of 3D displays dates back almost as far as classical 2D cinematography. In the past, users had to wear special glasses (anaglyph, polarization, or shutter) to ensure separation of the left and right views, which were displayed simultaneously. Together with limited visual quality, this is regarded as the main obstacle to the wide success of 3D video systems in home user environments.

1.3.1 Simulcast

The most obvious and straightforward way to represent stereo or multi-view video is simulcast, where each view is encoded independently of the others. This solution has low complexity, since dependencies between views are not exploited, keeping computation and processing delay to a minimum. It is also a backward-compatible solution, since one of the views can be decoded for legacy 2D displays. With simulcast, each view is assumed to be encoded at full spatial resolution. However, experiments with asymmetrical coding of stereo, whereby one of the views is encoded at lower quality, suggest that substantial savings in bit rate for the second view can be achieved: one of the views is more coarsely quantized than the other, or coded at a reduced spatial resolution, with an imperceptible impact on the stereo quality.

1.3.2 Stereo Interleaving

There is a class of formats for stereo content that we collectively refer to as stereo interleaving. This category includes both time-multiplexed and spatially multiplexed formats. In the time-multiplexed format, the left and right views are interleaved as alternating frames or fields. With spatial multiplexing, the left and right views appear in either a side-by-side or an over/under arrangement. As is often the case with spatial multiplexing, the respective views are "squeezed" in the horizontal or vertical dimension to fit within the size of an original frame. To distinguish the left and right views, some additional out-of-band signaling is necessary. For instance, the H.264/AVC standard specifies a stereo SEI message that identifies the left and right views; it can also indicate whether the encoding of a particular view is self-contained, i.e., whether frames or fields of the left view are predicted only from other frames or fields of the left view. Inter-view prediction for stereo is possible when the self-contained flag is disabled. Similar signaling would be needed for spatially multiplexed content.
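As a side note, the spatial packing just described can be sketched in a few lines of Python (illustrative only; it decimates columns directly, whereas a real system would low-pass filter before decimating):

import numpy as np

def side_by_side(left, right):
    # Pack two equal-size views into one frame by halving their width.
    assert left.shape == right.shape
    return np.hstack([left[:, ::2], right[:, ::2]])

left = np.zeros((240, 320), dtype=np.uint8)
right = np.full((240, 320), 255, dtype=np.uint8)
packed = side_by_side(left, right)
print(packed.shape)    # (240, 320): the size of one original frame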

1.3.3 2D + Depth

Another well-known representation is the 2D-plus-depth format. The inclusion of depth enables a display-independent solution for 3D that supports generation of as many views as needed by any stereoscopic display. A key advantage is that the main 2D video provides backward compatibility with legacy devices. This representation is also agnostic of the coding format, i.e., the approach works with both MPEG-2 and H.264/AVC. ISO/IEC 23002-3 (also referred to as MPEG-C Part 3) specifies the representation of auxiliary video and supplemental information; in particular, it enables signaling for depth map streams to support 3D video applications.
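To make the role of the depth map concrete, here is a toy Python sketch of depth-image-based view synthesis for rectified cameras (not from the report; the focal length, baseline, and hole handling are simplifying assumptions). Each pixel is shifted horizontally by the disparity d = f * b / Z:

import numpy as np

def synthesize_view(texture, depth, f=500.0, baseline=0.05):
    # Toy depth-image-based rendering: shift each pixel horizontally
    # by its disparity.  f and baseline are hypothetical values, and
    # occlusion holes simply stay at zero here.
    h, w = texture.shape
    out = np.zeros_like(texture)
    disparity = np.round(f * baseline / depth).astype(int)
    for y in range(h):
        for x in range(w):
            xs = x + disparity[y, x]
            if 0 <= xs < w:
                out[y, xs] = texture[y, x]
    return out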

1.4 Motivation

The literature survey reveals that existing solutions for rendering a 3D view have a number of limitations:

1. Coding efficiency is not maximized, since redundancy between views is not exploited.

2. The above techniques are not backward compatible.

3. The 2D-plus-depth technique can render only a limited depth range and has problems with occlusions.

Thus, in order to exploit inter-image similarities and overcome the above limitations, an efficient algorithm must be developed.

1.5 Objective

The technique proposed in this project attempts to:

1. Improve the coding efficiency of multiview video.

2. Provide better results than simple AVC-based simulcast at the same bit rate.

3. Provide backward compatibility.

We have used the Linux platform for the implementation, since the PandaBoard runs a Linux kernel. Chapter 3 provides a brief introduction to Linux and its usage.


Chapter 2

Overview of MVC

2.1 Rendering

Multiview video rendering belongs to the broad research field of image-based rendering and has been studied extensively in the literature. Here we focus on one particular form of multiview video: multi-stereoscopic video with depth maps. We assume we are given a number of video sequences captured from different viewpoints.

Figure 2.1: The rendering process from multiview video.

In the following, we briefly describe the process of rendering an image from a virtual viewpoint given the image set and the depth map. As shown in Figure 2.1, given a virtual viewpoint, we first split the view to be rendered into light rays. For each light ray, we trace it to the surface of the depth map, obtain the intersection, and re-project the intersection into nearby cameras. The intensity of the light ray is then the weighted average of the projected light rays in, for example, Cam 3 and Cam 4. The weight can be determined by many factors; in the simplest form, we can use the angular difference between the light ray to be rendered and the light ray being projected, assuming the capturing cameras are at roughly the same distance from the scene objects. Care must be taken when performing such rendering: as the virtual viewpoint moves away from Cam 4 (where the depth map is given), there will be occlusions and holes when computing the light ray/geometry intersection. In our algorithm, we first convert the given depth map into a 3D mesh surface, where each vertex corresponds to one pixel of the depth map. The mesh surface is then projected to the capturing cameras to compute any potential occlusions in the captured images. Finally, the mesh is projected to the virtual rendering point with multi-texture blending. Each vertex being rendered is projected into the nearby captured images to locate the corresponding texture coordinate. This process takes into consideration the occlusions computed earlier: if a vertex is occluded in a nearby view, its weight for that camera is set to zero.

With that information, multiple virtual renderings within the estimated range can be carried out to compute a combined weight map for compression. In addition, if the user's viewpoint does not change significantly, we may achieve a similar effect by simply smoothing the computed weight maps. During adaptive multiview video compression, the weight map is converted into a coarser one for macroblock-based encoding, which effectively smoothes the weight map as well.

2.2 MVC Requirements

2.2.1 Compression Related Requirements

1. Compression efficiency

MVC shall provide high compression efficiency relative to independent coding of each view of the same content. Some overhead, such as camera parameters, may be necessary to facilitate view interpolation, i.e., coding efficiency is traded for functionality. However, the overhead data should be limited in order to increase acceptance of new services.

2. View scalability

MVC shall support a scalable bit-stream structure that allows access to selected views with minimum decoding effort. This enables the video to be displayed on a multitude of different terminals and over networks with varying conditions.

3. Free viewpoint scalability

MVC shall support a scalable bit-stream structure that allows access to partial data from which new views can be generated, i.e., not the original camera views but views generated from them. Such content can be delivered to various types of displays. This enables free-viewpoint navigation on a scalable basis.

4. Spatial/temporal/SNR scalability

SNR scalability, spatial scalability, and temporal scalability should be supported.

5. Backward compatibility

At any instant in time, the bitstream corresponding to one view shall conform to AVC.

6. Resource consumption

MVC should be efficient in terms of resource consumption, such as memory size, memory bandwidth, and processing power.

7. Low delay

MVC shall support low encoding and decoding delay modes. Low delay is very important for real-time applications such as streaming and broadcasting of multi-view video.

8. Robustness

Robustness to errors, also known as error resilience, should be supported. This enables the delivery of multiview video content over error-prone networks, such as wireless networks.


9. Resolution, bit depth, chroma sampling format

MVC shall support spatial resolutions from QCIF to HD, the YUV 4:2:0 format, and 8 bits per pixel component. Future applications may require higher bit depths and higher chroma sampling formats.

10. Picture quality among views

MVC should enable flexible quality allocation across different views. For instance, consistent quality might be required for some applications.

11. Temporal random access

MVC shall support random access in the time dimension. For example, it shall be possible to access a frame at a given time with minimal decoding of frames in the time dimension.

12. View random access

MVC shall support random access in the view dimension. For example, it shall be possible to access a frame in a given view with minimal decoding of frames in the view dimension.

13. Spatial random access

MVC should support random access to a spatial area in a picture. This may be treated as view random access if a view is composed of several spatially smaller views.

14. Resource management

MVC shall support efficient management of decoder resources. For instance, the output timing of multiple pictures requires efficient management; in particular, pictures that share the same time stamp across all views shall be available from the decoder at the same time or sequentially.


15. Parallel processing

MVC shall support parallel processing of different views or segments of the multi-view video to facilitate efficient encoder and decoder implementations.

2.2.2 System Support Related Requirements

1. Synchronization

MVC shall support accurate temporal synchronization among the multiple views.

2. View generation

MVC should enable robust and efficient generation of virtual or interpolated views.

3. Non-planar imaging and display systems

MVC should support efficient representation and coding methods for 3D displays, including integral photography and non-planar image (e.g., dome) display systems.

4. Camera parameters

MVC should support transmission of camera parameters.


Block diagram of MVC system

The overall structure of MVC, defining its interfaces, is illustrated above. The MVC encoder receives temporally synchronized video streams and generates one video stream. The decoder receives the bit stream, decodes it, and provides a separate view for each eye.

The raw YUV 4:2:0 frames are provided as input; they are encoded and compressed using the various algorithms of MVC. The output of the encoder is given as input to the decoder, where the frames are decompressed and decoded on a PandaBoard running on the Linux platform. The PandaBoard is interfaced with a suitable 3D device, e.g., a 3D TV or a 3D mobile device.

Dept of E&C, VVIET 13 MYSORE

Page 14: Final Report

Multiview video compression and display

CHAPTER 3

UBUNTU

3.1 INTRODUCTION

Ubuntu is a computer operating system based on the Debian Linux distribution and distributed as free and open source software. It is named after the Southern African philosophy of ubuntu ("humanity towards others").

Ubuntu packages are based on packages from Debian's unstable branch: both distributions use Debian's deb package format and package management tools (APT and Synaptic). Debian and Ubuntu packages are not necessarily binary compatible with each other, however, and sometimes .deb packages must be rebuilt from source to be used in Ubuntu. Many Ubuntu developers are also maintainers of key packages within Debian. Ubuntu cooperates with Debian by pushing changes back to Debian, although there has been criticism that this does not happen often enough. Ian Murdock, the founder of Debian, has in the past expressed concern about Ubuntu packages potentially diverging too far from Debian to remain compatible.

3.2 FEATURES

Ubuntu is composed of many software packages, the vast majority of which are distributed under a free software license; the only exceptions are some proprietary hardware drivers. The main license used is the GNU General Public License (GNU GPL) which, along with the GNU Lesser General Public License (GNU LGPL), explicitly declares that users are free to run, copy, distribute, study, change, develop, and improve the software. On the other hand, proprietary software that runs on Ubuntu is also available. Ubuntu focuses on usability, security, and stability. The Ubiquity installer allows Ubuntu to be installed to the hard disk from within the live CD environment, without restarting the computer prior to installation. Ubuntu also emphasizes accessibility and internationalization to reach as many people as possible. Beginning with 5.04, UTF-8 became the default character encoding, which allows support for a variety of non-Roman scripts. As a security feature, the sudo tool is used to assign temporary privileges for performing administrative tasks, allowing the root account to remain locked and preventing inexperienced users from inadvertently making catastrophic system changes or opening security holes. PolicyKit is also being widely implemented into the desktop to further harden the system through the principle of least privilege.

Ubuntu comes installed with a wide range of software that includes OpenOffice, Firefox, Empathy (Pidgin in versions before 9.10), Transmission, GIMP (in versions prior to 10.04), and several lightweight games (such as Sudoku and chess). Additional software that is not installed by default can be downloaded and installed using the Ubuntu Software Center or the package manager Synaptic, which come pre-installed. Ubuntu allows networking ports to be closed using its firewall, with customized port selection available; end users can install Gufw (a GUI for Uncomplicated Firewall) and keep it enabled. GNOME (the current default desktop) offers support for more than 46 languages. Ubuntu can also run many programs designed for Microsoft Windows (such as Microsoft Office) through Wine or a virtual machine (such as VMware Workstation or VirtualBox). For the upcoming 11.04 release, Canonical intends to drop the GNOME Shell as the default desktop environment in favor of Unity, a graphical interface it first developed for the netbook edition of Ubuntu.

Ubuntu, unlike Debian, compiles its packages using gcc features such as PIE and buffer overflow protection to harden its software. These extra features greatly increase security at a performance cost of 1% on 32-bit and 0.01% on 64-bit systems.

3.3 SYSTEM REQUIREMENTS


The desktop version of Ubuntu currently supports the 32-bit and 64-bit x86 architectures. Unofficial support is available for the PowerPC, IA-64 (Itanium), and PlayStation 3 architectures. A supported GPU is required to enable desktop visual effects.

3.4 VARIANTS

The variants recognized by Canonical as contributing significantly to the Ubuntu project are the following:

Edubuntu: A GNOME-based subproject and add-on for Ubuntu, designed for school environments and home users.

Kubuntu: A desktop distribution using the KDE Plasma Workspaces desktop environment rather than GNOME.

Mythbuntu: A distribution designed for creating a home theater PC with MythTV; it uses the Xfce desktop environment.

Ubuntu Studio: A distribution made for professional video and audio editing; it comes with higher-end free editing software and is a DVD .iso image, unlike the live CD the other Ubuntu distributions use.

Xubuntu: A distribution based on the Xfce desktop environment instead of GNOME, designed to run more efficiently on low-specification computers.

3.5 TO OPEN A TERMINAL IN UBUNTU

All Linux commands are typed in the terminal. To open the terminal, go to Applications in the toolbar, select Accessories, and then click Terminal; a window appears as shown in Figure 3.1.

Dept of E&C, VVIET 16 MYSORE

Page 17: Final Report

Multiview video compression and display

Figure 3.1: Opening a terminal in Linux.

Table 3.1: Linux commands and their descriptions

COMMAND          DESCRIPTION
cd <directory>   Changes to the specified directory
cd Desktop       Changes to the Desktop folder
ls               Lists the files inside the current folder
make clean       Deletes previously generated object files
make             Builds the executable and object files
./configure      Builds the configuration file
./filename.exe   Runs an executable file on Linux
exit             Closes the terminal
gtkterm          Opens GTKTerm for serial communication

CHAPTER 4

MULTIVIEW VIDEO CODING

4.1 SIMILARITIES IN TIME AND AMONG VIEWS

Exploiting similarities among the multi-view video images is the key to efficient compression. When considering temporally successive images of one view sequence, i.e., one row of the matrix of pictures (MOP), the same viewpoint is captured at different time instances. Usually, the same objects appear in successive images, though possibly at different pixel locations; if so, the objects are in motion, and practical compression schemes use motion compensation techniques to exploit these temporal similarities. On the other hand, spatially neighboring views captured at the same time instant, i.e., the images in one column of the MOP, show the same objects from different viewpoints. Similarly to the previous case, the same objects appear in neighboring views but at different pixel locations. Here, the objects in each image are subject to parallax, and practical compression schemes use disparity compensation techniques to exploit these inter-view similarities.

4.1.1 Temporal Similarities

Consider temporally successive images of one view sequence, i.e., one row of the MOP. If objects in the scene are subject to motion, the same objects appear in successive images but at different pixel locations. To exploit these temporal similarities, sophisticated motion compensation techniques have been developed in the past. Frequently used are so-called block matching techniques, where a motion vector establishes a correspondence between two similar blocks of pixels chosen from two successive images. Practical compression schemes signal these motion vectors to the decoder as part of the bit stream. Variable block size techniques improve the adaptation of the block motion field to the actual shape of the object. Lately, so-called multi-frame techniques have been developed: classic block matching uses a single preceding image when choosing a reference for the correspondence, whereas multi-frame techniques permit choosing the reference from several previously transmitted images, and a different image can be selected for each block. Finally, superposition techniques are also widely used. Here, more than one correspondence per block of pixels is specified and signaled as part of the bit stream, and a linear combination of the blocks resulting from the multiple correspondences is used to better match the temporal similarities. A special example is the so-called bidirectionally predicted picture, where blocks resulting from two correspondences are combined: one correspondence uses a temporally preceding reference, the other a temporally succeeding reference. The generalized version is the so-called bi-predictive picture, in which the two correspondences are chosen from an arbitrary set of available reference images.

4.1.2 Inter-View Similarities

Consider spatially neighboring views captured at the same time instant, i.e., the images in one column of the MOP. Objects in each image are subject to parallax and appear at different pixel locations. To exploit these inter-view similarities, disparity compensation techniques are used. The simplest approach to disparity compensation is block matching, similar to that used for motion compensation. These techniques offer the advantage of not requiring knowledge of the geometry of the underlying 3D objects. However, if the cameras are sparsely distributed, the block-based translatory disparity model fails to compensate accurately. More advanced approaches to disparity compensation are depth-image-based rendering algorithms, which synthesize the image as seen from a given viewpoint using the reference texture and depth image as input data. These techniques offer the advantage that the given viewpoint image is compensated more accurately even when the cameras are sparsely distributed; however, they rely on accurate depth images, which are difficult to estimate. Finally, hybrid techniques that combine the advantages of both approaches may also be considered. For example, if the accuracy of a depth image is not sufficient for accurate depth-image-based rendering, block-based compensation techniques may be used on top for selective refinement.

4.2 COMPRESSION SCHEMES

The vast amount of multi-view data is a huge challenge not only for capturing and processing but also for compression. Efficient compression exploits the statistical dependencies within the multi-view video imagery. Practical schemes usually accomplish this either with predictive coding or with subband coding; in both cases, motion compensation and disparity compensation are employed to make better use of the statistical dependencies. Note that predictive coding and subband coding have different constraints for efficient compression.

Predictive Coding

Predictive coding schemes encode multiview video imagery sequentially. Two basic types of coded pictures are possible: intra and inter pictures. Intra pictures are coded independently of any other image. Inter pictures, on the other hand, depend on one or more reference pictures that have been encoded previously. By design, an intra picture does not exploit the similarities among the multiview images, but an inter picture is able to make use of these similarities by choosing one or more reference pictures and generating a motion- and/or disparity-compensated image for efficient predictive coding. The basic ideas of motion-compensated predictive coding are summarized in the box "Motion-Compensated Predictive Coding." When choosing the encoding order of images, various constraints should be considered; for example, high coding efficiency as well as good temporal multiresolution properties may be desirable.

Motion-compensated predictive coding of image sequences is accomplished with intra and inter pictures. As depicted in Figure 4.1(a), the input image x_k is independently encoded into the intra picture I_k, and the intra decoder is used to independently reconstruct the image x̂_k. Alternatively, the input image x_k is predicted by the motion-compensated (MC) reference image x̂_r. The prediction error, also called the displaced frame difference (DFD), is encoded and constitutes, in combination with the motion information, the inter picture P_k. The inter-picture decoder reverses this process but requires the same reference image x̂_r to be present at the decoder side. If the reference picture differs between the encoder and decoder sides, e.g., because of network errors, the decoder is not able to reconstruct the same image x̂_k that the encoder has encoded. Note that reference pictures can be either reconstructed intra pictures or other reconstructed inter pictures.

Figure 4.1(b) shows the "basic" inter picture (predictive picture), which chooses only one reference picture for compensation. More advanced are bi-predictive pictures, which use a linear combination of two motion-compensated reference pictures. Bidirectional motion-compensated prediction is a special case of bi-predictive pictures and is widely employed in standards like MPEG-1, MPEG-2, and H.263.

Dept of E&C, VVIET 21 MYSORE

Page 22: Final Report

Multiview video compression and display

Figure 4.1: Motion-compensated predictive coding.

4.3 MVC ENCODING

The block diagram showing the various steps in encoding is given in Figure 4.2. The picture captured by each camera is denoted "view i picture" and is given as input to the MVC encoder. The various steps in encoding are described below.

Figure 4.2: Block diagram of the MVC encoder. (The main blocks are the transform and quantization, entropy coding of the bitstream, inverse quantization and inverse transform, deblocking filter, intra prediction, motion estimation and compensation, disparity/illumination estimation and compensation, reference picture stores for view i and for other views, and the mode decision.)

4.3.1 VIDEO FORMAT

YUV is a color space typically used as part of a color image pipeline. It encodes a color image or video taking human perception into account, allowing reduced bandwidth for the chrominance components and thereby typically enabling transmission errors or compression artifacts to be masked more efficiently by human perception than with a "direct" RGB representation. Other color spaces have similar properties; the main reason to implement or investigate the properties of Y'UV is interfacing with analog or digital television or photographic equipment that conforms to certain Y'UV standards. The raw YUV frames used here are in 4:2:0 format, i.e., for every four Y samples, one Cb and one Cr sample are transmitted. This format is commonly used in video broadcasting.
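A minimal Python sketch (added for illustration, not from the report) of how a 4:2:0 frame is laid out in a raw .yuv file:

import numpy as np

def read_yuv420_frame(f, width, height):
    # One 4:2:0 frame = a full-resolution Y plane followed by
    # quarter-resolution Cb and Cr planes: 1.5 bytes per pixel overall.
    y  = np.frombuffer(f.read(width * height), dtype=np.uint8)
    cb = np.frombuffer(f.read(width * height // 4), dtype=np.uint8)
    cr = np.frombuffer(f.read(width * height // 4), dtype=np.uint8)
    return (y.reshape(height, width),
            cb.reshape(height // 2, width // 2),
            cr.reshape(height // 2, width // 2))

# For example, one 320x240 frame occupies 320 * 240 * 3 // 2 = 115200 bytes.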

4.3.2 TRANSFORM

The transform used in multiview video compression is the DCT. A discrete cosine transform (DCT) expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. The use of cosine rather than sine functions is critical: for compression, cosine functions turn out to be much more efficient. The DCT is applied to 8x8 blocks. The DCT equation computes the (i, j)th entry of the DCT of an image block:

D(i,j) = \frac{2}{N} C(i) C(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x,y) \cos\left[\frac{(2x+1) i \pi}{2N}\right] \cos\left[\frac{(2y+1) j \pi}{2N}\right]   (Eq. 1)

where C(u) = 1/\sqrt{2} for u = 0 and C(u) = 1 otherwise. Here p(x, y) is the (x, y)th element of the image block represented by the matrix p, and N is the size of the block on which the DCT is performed. The equation calculates one entry, (i, j), of the transformed image from the pixel values of the original image matrix. For the standard 8x8 block that JPEG compression uses, N equals 8, x and y range from 0 to 7, and the prefactor 2/N becomes 1/4.

Because the DCT uses cosine functions, the resulting matrix depends on the horizontal, diagonal, and vertical frequencies. An image block with a lot of change therefore has a very random-looking resulting matrix, while an image block of just one color has a resulting matrix with a large value for the first (DC) element and zeros for the other elements.
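As a cross-check of the equation above, here is a small self-contained Python implementation (an illustrative sketch, not the project's encoder) that builds the orthonormal DCT basis and confirms that a one-color block transforms to a single DC value:

import numpy as np

def dct2(block):
    # 2-D DCT-II of an NxN block via D = B @ p @ B.T, where
    # B[i, x] = sqrt(2/N) * C(i) * cos((2x+1) i pi / (2N)), so the
    # product reproduces the (2/N) C(i) C(j) double sum above.
    n = block.shape[0]
    i = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * i * np.pi / (2 * n))
    basis[0, :] /= np.sqrt(2.0)       # C(0) = 1/sqrt(2)
    return basis @ block @ basis.T

flat = np.full((8, 8), 128.0)         # a block of just one color...
print(np.round(dct2(flat), 1))        # ...gives DC = 1024, all else 0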

4.3.3 QUANTIZATION

Quantization is the process of mapping a large set of input values to a smaller set, for example by rounding values to some unit of precision. A device or algorithmic function that performs quantization is called a quantizer. Quantization is involved to some degree in nearly all digital signal processing, since representing a signal in digital form ordinarily involves rounding; it also forms the core of essentially all lossy compression algorithms.

Because quantization is a many-to-few mapping, it is an inherently non-linear and irreversible process: since the same output value is shared by multiple input values, it is in general impossible to recover the exact input value given only the output value.
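A minimal sketch of a uniform scalar quantizer in Python (illustrative; real codecs use per-frequency quantization matrices and a quantization parameter rather than a single step size):

import numpy as np

def quantize(coeffs, step):
    # Encoder side: map each coefficient to the nearest multiple of
    # 'step'.  Information is lost at this point.
    return np.round(coeffs / step).astype(int)

def dequantize(levels, step):
    # Decoder side: only the rounded values can be recovered.
    return levels * step

c = np.array([1024.0, -3.7, 12.2, 0.4])
q = quantize(c, step=8)
print(q, dequantize(q, 8))   # [128 0 2 0] -> [1024. 0. 16. 0.]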

4.3.4 MOTION ESTIMATOR

Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence. It is an ill-posed problem, as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches, or even individual pixels. The motion vectors may be represented by a translational model or by one of many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions plus zoom.

Closely related to motion estimation is optical flow, where the vectors correspond to the perceived movement of pixels. In motion estimation, an exact 1:1 correspondence of pixel positions is not a requirement.

Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG-1, -2, and -4 as well as many other video codecs.

There are many motion estimation techniques; high efficiency is achieved by the Enhanced Predictive Zonal Search (EPZS). EPZS, like other predictive algorithms, mainly comprises three steps: initial predictor selection, which selects the best motion vector predictor from a set of potentially likely predictors; adaptive early termination, which allows the search to stop at given stages of the estimation if certain rules are satisfied; and prediction refinement, which employs a refinement pattern around the best predictor to improve the final prediction. All of these features are vital to the performance of EPZS algorithms.
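For contrast, the following Python sketch shows the brute-force baseline that EPZS accelerates: an exhaustive block-matching search minimizing the sum of absolute differences (SAD). The block position and search range are illustrative assumptions, not values from the project.

import numpy as np

def full_search(cur_blk, ref, bx, by, search=8):
    # Exhaustive block matching with a SAD cost.  EPZS reaches a
    # similar answer far faster by testing predicted vectors first and
    # terminating early; this sketch simply tries every candidate.
    n = cur_blk.shape[0]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= ref.shape[1] - n and 0 <= y <= ref.shape[0] - n:
                cand = ref[y:y + n, x:x + n].astype(int)
                sad = np.abs(cur_blk.astype(int) - cand).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad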

4.3.5 MOTION COMPENSATION

Motion compensation is an algorithmic technique employed in the encoding of video data for video compression. It describes a picture in terms of the transformation of a reference picture to the current picture; the reference picture may be previous in time or even from the future. When images can be accurately synthesized from previously transmitted or stored images, compression efficiency can be improved.

Motion compensation exploits the fact that, for many frames of a movie, the only difference between one frame and the next is the result of either the camera moving or an object in the frame moving. In terms of a video file, this means that much of the information representing one frame will be the same as the information used in the next frame; this is temporal redundancy. Motion compensation is discussed in more detail in Sections 4.1 and 4.2.

4.3.6 DEBLOCKING FILTER

A deblocking filter is applied to blocks in decoded video to improve visual quality and prediction performance by smoothing the sharp edges that can form between macroblocks when block coding techniques are used. The filter aims to improve the appearance of decoded pictures.

In H.264 the deblocking filter is not an optional additional feature of the decoder. It operates on both the decoding path and the encoding path, so that the in-loop effects of the filter are taken into account in the reference macroblocks used for prediction. When a stream is encoded, the filter strength can be selected, or the filter can be switched off entirely; otherwise, the filter strength is determined by the coding modes of adjacent blocks, the quantization step size, and the steepness of the luminance gradient between blocks.

The filter operates on the edges of each 4x4 or 8x8 transform block in the luma and chroma planes of each picture. Each small block's edge is assigned a boundary strength based on whether it is also a macroblock boundary, the coding (intra/inter) of the blocks, whether references (in motion prediction and reference frame choice) differ, and whether it is a luma or chroma edge. Stronger levels of filtering are assigned by this scheme where there is likely to be more distortion. The filter can modify as many as three samples on either side of a given block edge (in the case where the edge is a luma edge lying between different macroblocks, at least one of which is intra coded). In most cases it can modify one or two samples on either side of the edge, depending on the quantization step size, the tuning of the filter strength by the encoder, the result of an edge detection test, and other factors.

4.3.7 ENTROPY ENCODER

Entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.

One of the main types of entropy coding creates and assigns a unique prefix-free code to each unique symbol that occurs in the input. These entropy encoders then compress data by replacing each fixed-length input symbol with the corresponding variable-length prefix-free output codeword. The length of each codeword is approximately proportional to the negative logarithm of the symbol's probability; therefore, the most common symbols use the shortest codes.
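As a concrete example of such a prefix-free code, here is a Python sketch of the order-0 Exp-Golomb code that H.264/AVC uses for many syntax elements (illustrative only; the JM reference software implements this internally):

def exp_golomb(n):
    # Order-0 Exp-Golomb code for a non-negative integer: write n + 1
    # in binary, then prefix it with (length - 1) zeros.  Smaller,
    # i.e. more probable, values get shorter codewords.
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits

for n in range(5):
    print(n, exp_golomb(n))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101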


There are two types of entropy encoding in H.264:

1. CABAC (Context-Based Adaptive Binary Arithmetic Coding).

2. CAVLC (Context-Based Adaptive Variable-Length Coding).

Context-Based Adaptive Binary Arithmetic Coding (CABAC)

The arithmetic coding scheme selected for H.264, CABAC [3], achieves good compression performance through (a) selecting probability models for each syntax element according to the element's context, (b) adapting probability estimates based on local statistics, and (c) using arithmetic coding.

Coding a data symbol involves the following stages.

1. Binarization: CABAC uses binary arithmetic coding, which means that only binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g., a transform coefficient or motion vector) is "binarized", i.e., converted into a binary code prior to arithmetic coding. This process is similar to converting a data symbol into a variable-length code, except that the binary code is further encoded (by the arithmetic coder) prior to transmission. Stages 2, 3, and 4 are repeated for each bit (or "bin") of the binarized symbol.

2. Context model selection: A "context model" is a probability model for one or more bins of the binarized symbol. The model may be chosen from a selection of available models depending on the statistics of recently coded data symbols. The context model stores the probability of each bin being "1" or "0".

3. Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability model. Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").


4. Probability update: The selected context model is updated based on the actual coded value (e.g., if the bin value was "1", the frequency count of "1"s is increased).
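A minimal sketch of the adaptive idea behind stages 2 and 4 (the real codec drives a finite-state probability table per context; this toy Python class with explicit counts is only illustrative):

class BinContext:
    # Toy adaptive context model: track counts of 0s and 1s and expose
    # the current probability estimate, updated after each coded bin.
    def __init__(self):
        self.counts = [1, 1]                 # Laplace-smoothed counts

    def p_one(self):
        return self.counts[1] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1

ctx = BinContext()
for bit in [1, 1, 0, 1, 1, 1]:
    ctx.update(bit)
print(round(ctx.p_one(), 2))                 # 0.75 after seeing mostly 1s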

4.3.8 MODE DECISION

A low-complexity mode decision algorithm is proposed to reduce the complexity of motion estimation (ME) and disparity estimation (DE). An experimental analysis is performed to study the inter-view correlation of coding information such as the prediction mode and the rate-distortion (RD) cost. Based on this correlation, we propose four efficient mode decision techniques: early SKIP mode decision, adaptive early termination, fast mode size decision, and selective intra coding in inter frames. Experimental results show that the proposed algorithm can significantly reduce the computational complexity of MVC while maintaining almost the same RD performance.

4.4 MVC DECODER

The decoder performs the exact reverse process of the encoder. The block diagram of the MVC decoder is shown in Figure 4.3.


Figure 4.3: MVC decoder.

The coded bitstream is applied to the entropy decoder; the decoded bits are then subjected to inverse quantization and inverse transformation to obtain the decoded YUV.

There are two modes of decoding: intra prediction and inter prediction. Intra pictures are coded independently, whereas inter pictures depend on one or more reference pictures that have been decoded previously. By design, an intra picture does not exploit the similarities among the multi-view images, but an inter picture makes use of these similarities by choosing one or more reference pictures and generating a motion- and/or disparity-compensated image for efficient predictive coding.

The signal obtained from the inverse quantization and inverse DCT is summed with the output of the intra prediction or the inter prediction; a mode-decision switch selects between the inter and intra prediction signals. The summed signal is passed to the deblocking filter, which is applied to blocks in the decoded video to improve visual quality and prediction performance by smoothing the sharp edges that can form between macroblocks when block coding techniques are used; the filter aims to remove discontinuities in the picture blocks. The filter output is stored in the picture memory for further computation, and the reference pictures held in picture memory are addressed by the reference picture index obtained from the entropy decoder. The decoded and reconstructed signals are finally obtained from the deblocking filter.

This chapter discussed the coding and decoding of YUV frames. The next chapter presents the experimentation and test results on the Linux platform.

Dept of E&C, VVIET 31 MYSORE

Page 32: Final Report

Multiview video compression and display

4.5 Flowchart

1. MVC encoder


2. MVC decoder


CHAPTER 5

EXPERIMENTATION AND RESULTS

5.1 EXPERIMENTATION ON LINUX PLATFORM (UBUNTU)

Step 1: Cross compilation. Cross compilation is done by pointing CC in the makefile to the GNU ARM toolchain.

Step 2: Any previously created object files are deleted using the command make clean.

Step 3: The make command is used to build the object and executable files.

Step 4: The required executable files, input files, and configuration files are copied to the SD card.

Step 5: The SD card is inserted into the PandaBoard, a 5 V power supply is connected to the board, the serial port of the computer is connected to the PandaBoard, and a gtkterm window is opened to communicate over the serial port.

Step 6: The baud rate is set to the maximum; we used a baud rate of 115200.

Step 7: The PandaBoard boots from the 5 V supply, and the executable files are then run on it using the command ./filename.exe.

Step 8: The output obtained is verified and the compression ratio is calculated.


Figure 5.1: Steps 2 and 3.


5.2 TEST RESULTS

TEST-1

Number of frames to be coded: 3

Output of the encoder:

Parsing Configfile encoder_stereo.cfg ...

Warning: Hierarchical coding or Referenced B slices used.

Make sure that you have allocated enough references

in reference buffer to achieve best performance.

------------------------------- JM 17.2 (FRExt) -------------------------------

Input YUV file : left_432x240.yuv

Input YUV file 2 : right_432x240.yuv

Output H.264 bitstream : test.264

Output YUV file : test_rec.yuv

Output YUV file 2 : test_rec2.yuv

YUV Format : YUV 4:2:0

Frames to be encoded : 3

Freq. for encoded bitstream : 30.00

PicInterlace / MbInterlace : 0/0


Transform8x8Mode : 1

ME Metric for Refinement Level 0 : SAD

ME Metric for Refinement Level 1 : Hadamard SAD

ME Metric for Refinement Level 2 : Hadamard SAD

Mode Decision Metric : Hadamard SAD

Motion Estimation for components : Y

Image format : 320x240 (320x240)

Error robustness : Off

Search range : 32

Total number of references : 5

References for P slices : 5

References for B slices (L0, L1) : 5, 1

Sequence type : Hierarchy (QP: I 28, P 28, B 30)

Entropy coding method : CABAC

Profile/Level IDC : (128,40)

Motion Estimation Scheme : EPZS

EPZS Pattern : Extended Diamond

EPZS Dual Pattern : Extended Diamond

EPZS Fixed Predictors : All P + B

EPZS Temporal Predictors : Enabled

EPZS Spatial Predictors : Enabled


EPZS Threshold Multipliers : (1 0 2)

EPZS Subpel ME : Basic

EPZS Subpel ME BiPred : Basic

Search range restrictions : none

RD-optimized mode decision : used

Data Partitioning Mode : 1 partition

Output File Format : H.264/AVC Annex B Byte Stream Format

------------------------------------------------------------------------------------

Frame View Bit/pic QP SnrY SnrU SnrV Time(ms) MET(ms) Frm/Fld Ref

------------------------------------------------------------------------------------

00000(NVB) 480

00000(IDR) 0 189936 28 36.814 35.359 35.318 1549 0 FRM 3

00000( P ) 1 135344 28 35.293 39.691 38.779 2384 344 FRM 2

00002( P ) 0 112176 28 37.830 35.056 34.754 1995 361 FRM 2

00002( P ) 1 91032 28 40.726 35.247 34.447 1761 512 FRM 2

00001( B ) 0 147672 30 33.395 31.602 31.631 3232 1084 FRM 0

00001( B ) 1 115024 30 34.741 32.077 33.265 3225 1259 FRM 0

-------------------------------------------------------------------------------

Total Frames: 6

Leaky BucketRateFile does not have valid entries.


Using rate calculated from avg. rate

Number Leaky Buckets: 8

Rmin Bmin Fmin

3955920 193416 193416

4944900 189936 189936

5933880 189936 189936

6922860 189936 189936

7911840 189936 189936

8900820 189936 189936

9889800 189936 189936

10878780 189936 189936

------------------ Average data all frames -----------------------------------

Total encoding time for the seq. : 14.148 sec (0.42 fps)

Total ME time for sequence : 3.563 sec

Y { PSNR (dB), cSNR (dB), MSE } : { 36.467, 35.888, 16.76049 }

U { PSNR (dB), cSNR (dB), MSE } : { 34.839, 34.125, 25.15187 }

V { PSNR (dB), cSNR (dB), MSE } : { 34.699, 34.205, 24.69435 }

View0_Y { PSNR (dB), cSNR (dB), MSE } : { 36.013, 35.577, 18.00490 }

View0_U { PSNR (dB), cSNR (dB), MSE } : { 34.006, 33.649, 28.06361 }

View0_V { PSNR (dB), cSNR (dB), MSE } : { 33.901, 33.580, 28.51354 }


View1_Y { PSNR (dB), cSNR (dB), MSE } : { 36.920, 36.223, 15.51609 }

View1_U { PSNR (dB), cSNR (dB), MSE } : { 35.671, 34.659, 22.24014 }

View1_V { PSNR (dB), cSNR (dB), MSE } : { 35.497, 34.935, 20.87516 }

Total bits : 791664 (I 189936, P 338552, B 262696 NVB 480)

View 0 Total-bits : 450104 (I 189936, P 112176, B 147672 NVB 320)

View 1 Total-bits : 341560 (I 0, P 226376, B 115024 NVB 160)

Bit rate (kbit/s) @ 30.00 Hz : 7916.64

View 0 BR (kbit/s) @ 30.00 Hz : 4501.04

View 1 BR (kbit/s) @ 30.00 Hz : 3415.60

Bits to avoid Startcode Emulation : 28

Bits for parameter sets : 480

Bits for filler data : 0

real 0m0.271s

user 0m0.212s

sys 0m0.056s

------------------------------------------------------------------------------


OUTPUT OF DECODER

Input H.264 bitstream : test.264

Output decoded YUV : test_dec.yuv

Input reference file : test_rec.yuv

POC must = frame# or field# for SNRs to be correct

--------------------------------------------------------------------------

Frame POC Pic# QP SnrY SnrU SnrV Y:U:V Time(ms)

--------------------------------------------------------------------------

00000(IDR) 0 0 28 0.0000 0.0000 0.0000 4:2:0 24

00000( P ) 0 0 28 13.8138 16.1082 15.1999 4:2:0 16

00002( P ) 4 1 28 0.0000 0.0000 0.0000 4:2:0 15

00002( P ) 4 1 28 18.0149 15.3684 14.0144 4:2:0 13

00001( b ) 2 2 30 0.0000 0.0000 0.0000 4:2:0 17

00001( b ) 2 2 30 15.6719 13.6850 13.0119 4:2:0 15

-------------------- Average SNR all frames ------------------------------

SNR Y(dB) : 7.92

SNR U(dB) : 7.53

SNR V(dB) : 7.04

Total decoding time : 0.102 sec (58.824 fps)[6 frm/102 ms]

--------------------------------------------------------------------------

Exit JM 17 (FRExt) decoder, ver 17.2

Output status file : log.dec

real 0m0.870s


user 0m0.228s

sys 0m0.044s

Similarly, the encoder and decoder were successfully tried with 100 and 135 frames respectively.

The results obtained in the various trials are tabulated below.

Table 5.1: Experimental results

Number of frames to be coded        3        100      135
Input YUV                           14.8 MB  14.8 MB  14.8 MB
test.264                            97 KB    1.6 MB   2.1 MB
Reconstructed YUV (encoder output)  338 KB   11 MB    14.8 MB
Decoded YUV (decoder output)        338 KB   11 MB    14.8 MB
Compression ratio                   0.6      5.4      7.09
Real                                0.870 s  12.25 s  17.23 s
User                                0.228 s  9.16 s   13.16 s
System                              0.044 s  0.83 s   1.29 s

As seen from Table 5.1, the greater the number of frames to be coded, the longer the execution takes. Trial 3, i.e., coding all the frames of the given view, achieves the highest compression ratio, and note that the input YUV, the reconstructed YUV, and the decoded YUV have the same size, indicating that encoding and decoding are performed effectively.
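For reference, the compression ratio in the table is simply the raw input size divided by the size of the compressed bitstream. A quick Python check for trial 3, using the rounded megabyte figures (hence the small deviation from the tabulated 7.09):

input_yuv_mb = 14.8        # raw input YUV, trial 3
test264_mb = 2.1           # compressed H.264 bitstream
print(f"compression ratio = {input_yuv_mb / test264_mb:.2f}")   # ~7.05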

The applications and future enhancement of the proposed technique are discussed in the following chapter.


Application and Future Enhancement

1. Applications

1. Free-viewpoint television.

2. The 3D technique using a cellophane sheet was applied to a laparoscope in order to expand the limited viewing capability of this minimally invasive surgical device. A unique feature of this 3D laparoscope is that it includes a virtual ruler to measure distances without physically touching the affected areas.

3. 3D games designed using MVC can draw the world at any angle and can let the player walk in any increment of steps they choose.

4. Immersive teleconferencing.

5. 3D mobiles.

6. 3D television.

2. Future Enhancement

As the statistics in Chapter 5 show, the time required for coding is high. MVC can be enhanced by minimizing the time required for encoding and decoding so that it can be used in real-time applications.


CONCLUSION

The presented prediction structures for multi-view video coding are based on the fact that multiple video bit streams showing the same scene from different camera perspectives exhibit significant inter-view statistical dependencies. The corresponding evaluation showed that these correlations can be exploited for efficient coding of multi-view video data. The multiview prediction structures have the advantage of achieving significant coding gains while being highly flexible in their adaptation to all kinds of spatial and temporal setups. These prediction structures are very similar to H.264/AVC and require only very minor syntax changes. Regarding coding efficiency, gains of up to 3.2 dB, with an average gain of 1.5 dB, were achieved.


APPENDICES

TECHNICAL SPECIFICATIONS OF PANDA BOARD

General

Low-cost mobile software development platform
1080p video, WLAN, Bluetooth and more
Dual-core ARM Cortex-A9 MPCore benefits
Community-driven projects and support

Display

HDMI v1.3 connector (Type A) to drive HD displays
DVI-D connector (can drive a second display simultaneously; requires an HDMI-to-DVI-D adapter)
LCD expansion header

Camera

Camera connector

Audio

3.5 mm stereo audio in/out
HDMI audio out

Wireless Connectivity

802.11 b/g/n (based on WiLink 6.0)
Bluetooth v2.1 + EDR (based on WiLink 6.0)

Memory

1 GB low-power DDR2 RAM
Full-size SD/MMC card cage with support for High-Speed and High-Capacity SD cards

Connectivity

Onboard 10/100 Ethernet

Expansion

1x USB 2.0 High-Speed On-the-Go port
2x USB 2.0 High-Speed host ports
General-purpose expansion header (I2C, GPMC, USB, MMC, DSS, ETM)
Camera expansion header

Debug Board

10/100 BASE-T Ethernet (RJ45 connector)
Mini-AB USB port (for debug UART connectivity)
60-pin MIPI debug expansion connector
Debug LED
1 GPIO button

Dimensions

Height: 4.5 in (114.3 mm)
Width: 4.0 in (101.6 mm)
Weight: 2.6 oz (74 g)

PandaBoard components

Function                   Vendor  Part ID
Application processor      TI      OMAP4430
Memory                     Elpida  EDB8064B1PB-8D-F
Power management IC        TI      TWL6030
Audio IC                   TI      TWL6040
Connectivity               LSR     LS240-WI-01-A20
4-port USB hub/Ethernet    SMSC    LAN9514-JZX
DVI transmitter            TI      TFP410PAP
3.5 mm dual stacked audio  KYCON   STX-4235-3/3-N

Bibliography

[1] A. Kubota, A. Smolic, M. Magnor, M. Tanimoto, T. Chen, and C. Zhang, "Multi-view imaging and 3DTV," IEEE Signal Processing Magazine, vol. 24, no. 6, pp. 10-21, 2007.

[2] Z. Yang, W. Wu, K. Nahrstedt, G. Kurillo, and R. Bajcsy, "ViewCast: View dissemination and management for multi-party 3D tele-immersive environments," in ACM Multimedia, 2007.

[3] H. Baker, D. Tanguay, I. Sobel, D. Gelb, M. Goss, W. Culbertson, and T. Malzbender, "The Coliseum immersive teleconferencing system," Tech. Rep., HP Labs, 2002.

[4] M. Flierl and B. Girod, "Multiview video compression," IEEE Signal Processing Magazine, vol. 24, no. 6, pp. 66-76, 2007.

[5] A. Smolic and P. Kauff, "Interactive 3-D video representation and coding technologies," Proceedings of the IEEE, vol. 93, no. 1, pp. 98-110, 2005.

[6] C. Zhang and J. Li, "Interactive browsing of 3D environment over the internet," in Proc. SPIE VCIP, 2001.

[7] C. Zhang and T. Chen, "A survey on image-based rendering: representation, sampling and compression," EURASIP Signal Processing: Image Communication, vol. 19, no. 1, pp. 1-28, 2004.

[8] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality video view interpolation using a layered representation," in ACM SIGGRAPH, 2004.

[9] C. Buehler, M. Bosse, L. McMillan, S. J. Gortler, and M. F. Cohen, "Unstructured lumigraph rendering," in ACM SIGGRAPH, 2001.

[10] ITU-T Rec. H.264 / ISO/IEC 14496-10, "Advanced Video Coding," Final Committee Draft, Document JVT-E022, September 2002.

[11] I. Richardson, Video CODEC Design, John Wiley & Sons, 2002.

[12] D. Marpe, G. Blättermann, and T. Wiegand, "Adaptive Codes for H.26L," ITU-T SG16/Q.6 document VCEG-L13, Eibsee, Germany, January 2001.
