Multiview video compression and display
ABSTRACT
To perceive three dimensions, a person's eyes must see two different, slightly
offset images. In the real world, the spacing between the eyes makes that happen
naturally. A 3D display somehow has to present a different, separate view to
each eye. Recent technological advances have made possible a number of new
applications in the area of 3D video. One of the enabling technologies for many of
these 3D applications is multiview video coding. This project examines signal
processing issues related to the coded representation, reconstruction, and rendering of
multiview video for 3D display using a Panda board. This technology sheds the
clunky 3D eyeglasses previously required to view a 3D image. An
experimental analysis of multiview video compression for various temporal and
inter-view prediction structures is presented in this project. The compression
method is based on the multiple reference picture technique in the H.264/AVC
video coding standard. The idea is to exploit the statistical dependencies of
both temporal and inter-view reference pictures for motion-compensated
prediction.
Dept of E&C, VVIET 1 MYSORE
CONTENTS
Chapter 1: Introduction
1.1 Video compression
1.2 History of video compression standards
1.3 Literature survey
1.4 Motivation
1.5 Objective
Chapter 2: Overview of MVC
2.1 Rendering
2.2 Requirements of MVC
Chapter 3: Ubuntu
3.1 Introduction
3.2 Features
3.3 System requirements
3.4 Variants
3.5 Terminal in Ubuntu
Chapter 1
Introduction
1.1 Video Compression
Video compression refers to reducing the quantity of data used to represent
digital video images, and is a combination of spatial image compression and
temporal motion compensation. Most video compression is lossy: it operates on
the premise that much of the data present before compression is not necessary for
achieving good perceptual quality. Video compression is a trade-off between disk
space, video quality, and the cost of the hardware required to decompress the video in
a reasonable time. However, if the video is over-compressed in a lossy manner,
visible (and sometimes distracting) artifacts can appear.
Video data contains spatial and temporal redundancy. Similarities can thus be
encoded by merely registering differences within a frame (spatial) and/or
between frames (temporal). Spatial encoding takes advantage of
the fact that the human eye is unable to distinguish small differences in color as
easily as it can perceive changes in brightness, so that very similar areas of color
can be "averaged out". With temporal compression, only the changes from one
frame to the next are encoded, as often a large number of pixels are the
same across a series of frames.
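The temporal part of this idea can be sketched in a few lines of Python. This is a toy illustration of plain frame differencing, not the motion-compensated scheme real codecs use:

```python
import numpy as np

def temporal_residual(prev_frame, curr_frame):
    """Encode only the per-pixel change between consecutive frames."""
    return curr_frame.astype(np.int16) - prev_frame.astype(np.int16)

def reconstruct(prev_frame, residual):
    """Decoder side: add the transmitted residual back to the reference."""
    return (prev_frame.astype(np.int16) + residual).astype(np.uint8)

# Two frames that differ only in a small moving block.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[2:4, 2:4] = 255

res = temporal_residual(prev, curr)
print(np.count_nonzero(res))                          # 4
print(np.array_equal(reconstruct(prev, res), curr))   # True
```

Because most residual samples are zero, an entropy coder downstream can represent them very compactly.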
There are two types of video compression:
1. Lossless—Lossless compression preserves all the data, but makes it more
compact. The movie that comes out is exactly the same quality as what went in.
Lossless compression produces very high-quality digital audio or video, but
requires a lot of data. The drawback of lossless compression is that it is
inefficient when trying to maximize storage space or network and Internet
delivery capacity (bandwidth).
2. Lossy—Lossy compression eliminates some of the data. Most images and
sounds have more detail than the eye and ear can discern. By eliminating some
of these details, lossy compression can achieve smaller files than lossless
compression. However, as the files get smaller, the reduction in quality can
become noticeable. The smaller file sizes make lossy compression ideal for
placing video on a CD-ROM or delivering video over a network or the Internet.
Most codecs in use today are lossy codecs.
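The two approaches can be contrasted with a toy experiment, using zlib as a stand-in lossless packer and simple bit truncation as a stand-in lossy quantizer (illustrative only; real codecs are far more sophisticated):

```python
import zlib
import numpy as np

# A smooth gradient image: lots of spatial redundancy.
img = np.tile(np.arange(256, dtype=np.uint8), (64, 1))

# Lossless: every byte is recoverable exactly.
packed = zlib.compress(img.tobytes())
print(len(img.tobytes()), len(packed))   # the redundant data already shrinks

# Lossy (toy): drop the 4 least significant bits of each sample.
lossy = (img >> 4) << 4
err = np.abs(img.astype(int) - lossy.astype(int)).max()
print(err)                               # 15: detail is gone for good
```

Decompressing `packed` returns the original bytes exactly; the truncated image can never be restored, which is precisely the lossless/lossy distinction described above.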
1.2 History of video compression standards
Year   Standard              Publisher          Popular Implementations
1984   H.120                 ITU-T              —
1990   H.261                 ITU-T              Videoconferencing, video telephony
1993   MPEG-1 Part 2         ISO, IEC           Video CD
1995   H.262/MPEG-2 Part 2   ISO, IEC, ITU-T    DVD-Video, Blu-ray, Digital Video Broadcasting, SVCD
1996   H.263                 ITU-T              Videoconferencing, video telephony, video on mobile phones (3GP)
1999   MPEG-4 Part 2         ISO, IEC           Video on the Internet (DivX, Xvid, ...)
2003   H.264/MPEG-4 AVC      ISO, IEC, ITU-T    Blu-ray, Digital Video Broadcasting, iPod video, HD DVD
2008   VC-2 (Dirac)          SMPTE, BBC         Video on the Internet, HDTV broadcast, UHDTV
1.3 Literature survey:
3D video formats
3D depth perception of an observed visual scene can be provided by 3D display
systems which ensure that the user sees a specific, different view with each eye.
Such a stereo pair of views must correspond to the human eye positions; the
brain can then compute the 3D depth perception. The history of 3D displays dates back
almost as long as classical 2D cinematography. In the past, users had to wear
special glasses (anaglyph, polarization, or shutter) to ensure separation of the left and
right views, which were displayed simultaneously. Together with limited visual
quality, this is regarded as the main obstacle to the wide success of 3D video systems in
home user environments.
1.3.1 Simulcast
The most obvious and straightforward means of representing stereo or multiview
video is simulcast, where each view is encoded independently of the others. This
solution has low complexity, since dependencies between views are not exploited,
thereby keeping computation and processing delay to a minimum. It is also a
backward-compatible solution, since one of the views can be decoded for legacy
2D displays. With simulcast, each view is assumed to be encoded at full spatial
resolution. However, studies of asymmetrical coding of stereo, whereby one of
the views is encoded at lower quality, suggest that substantial savings in bit rate
for the second view can be achieved. In this approach, one of the views is more
coarsely quantized than the other, or coded at a reduced spatial resolution,
with an almost imperceptible impact on the stereo quality.
1.3.2 Stereo Interleaving
There is a class of formats for stereo content that we collectively refer to as stereo
interleaving. This category includes both time-multiplexed and spatially multiplexed
formats. In the time-multiplexed format, the left and right views are interleaved as
alternating frames or fields. With spatial multiplexing, the left and right views
appear in either a side-by-side or over/under arrangement. As is often the case
with spatial multiplexing, the respective views are "squeezed" in the horizontal
dimension or vertical dimension to fit within the size of an original frame. To
distinguish the left and right views, some additional out-of-band signaling is
necessary. For instance, the H.264/AVC standard specifies a Stereo SEI message
that identifies the left view and right view; it also has the capability of indicating
whether the encoding of a particular view is self-contained, i.e., whether frames or fields
corresponding to the left view are predicted only from other frames or fields in the
left view. Inter-view prediction for stereo is possible when the self-contained flag
is disabled. A similar type of signaling would be needed for spatially
multiplexed content.
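A minimal sketch of side-by-side spatial packing, assuming naive 2:1 column decimation for the horizontal "squeeze" (a real system would low-pass filter before decimating):

```python
import numpy as np

def pack_side_by_side(left, right):
    """Squeeze each view horizontally by 2, then place them side by side
    so the packed frame keeps the original width and height."""
    squeezed_l = left[:, ::2]    # keep every second column
    squeezed_r = right[:, ::2]
    return np.hstack([squeezed_l, squeezed_r])

left = np.full((4, 8), 10, dtype=np.uint8)
right = np.full((4, 8), 20, dtype=np.uint8)
packed = pack_side_by_side(left, right)
print(packed.shape)   # (4, 8): same size as one original frame
print(packed[0])      # left half carries view L, right half view R
```

The packed frame is indistinguishable from ordinary 2D video to the transport layer, which is exactly why the out-of-band signaling described above is needed.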
1.3.3 2D + Depth
Another well-known representation format is the 2D-plus-depth format. The
inclusion of depth enables a display-independent solution for 3D that supports
generation of an increased number of views, as needed by any stereoscopic display.
A key advantage is that the main 2D video provides backward compatibility with
legacy devices. Also, this representation is agnostic of coding format, i.e., the
approach works with both MPEG-2 and H.264/AVC. ISO/IEC 23002-3 (also
referred to as MPEG-C Part 3) specifies the representation of auxiliary video and
supplemental information. In particular, it enables signaling for depth map
streams to support 3D video applications.
1.4 Motivation
The literature survey reveals that existing solutions for rendering a 3D view have a
number of limitations:
1. Coding efficiency is not maximized, since redundancy between views is not
exploited.
2. The above techniques are not backward compatible.
3. The 2D-plus-depth technique is only capable of rendering a limited depth range and has
problems with occlusions.
Thus, in order to exploit inter-image similarities and to overcome the above
listed limitations, an efficient algorithm must be developed.
1.5 Objective
The proposed technique in this project attempts to
1. Improve coding efficiency of multiview video.
2. Provide better results compared to simple AVC-based simulcast for the same
bit rate.
3. Provide backward compatibility.
We have used the Linux platform for the implementation, since the Panda
board runs on the Linux kernel. Chapter 3 provides a brief introduction to Linux and
its usage.
Chapter 2
Overview of MVC
2.1 Rendering
Multiview video rendering belongs to the broad research field of image-based
rendering, and has been studied extensively in the literature. Here, we focus on
one particular form of multiview video: multi-stereoscopic video with depth
maps. We assume we are given a number of video sequences captured from
different viewpoints.
Figure 2.1: The rendering process from multiview video.
In the following, we briefly describe the process of rendering an image from a
virtual viewpoint given the image set and the depth map. As shown in Figure 2.1,
given a virtual viewpoint, we first split the view to be rendered into light rays. For
each light ray, we trace the ray to the surface of the depth map, obtain the
intersection, and re-project the intersection into the nearby cameras. The intensity of
the light ray is then the weighted average of the projected light rays in Cam 3 and
Cam 4. The weight can be determined by many factors. In the simplest form, we
can use the angular difference between the light ray to be rendered and the light
ray being projected, assuming the capturing cameras are at roughly the same
distance from the scene objects. Care must be taken in performing such rendering:
as the virtual viewpoint moves away from Cam 4 (where the depth map is
given), there will be occlusions and holes when computing the light ray/geometry
intersection. In our algorithm, we first convert the given depth map into a 3D
mesh surface, where each vertex corresponds to one pixel in the depth map. The
mesh surface is then projected to the capturing cameras to compute any potential
occlusions in the captured images. Finally, the mesh is projected to the virtual
rendering point with multi-texture blending. Each vertex
being rendered is projected to the nearby captured images to locate the
corresponding texture coordinate. This process takes into consideration the
occlusions computed earlier. That is, if a vertex is occluded in a nearby view, its
weight for that camera is set to zero.
With that information, multiple virtual rendering within the estimated range can
be conducted to compute a combined weight map for compression. In addition, if
the user’s viewpoint does not change significantly, we may achieve a similar
effect by simply smoothing the computed weight maps. During adaptive
multiview video compression, the weight map is converted into a coarser
one for macroblock-based encoding, which effectively smooths the weight map
as well.
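The angular weighting and occlusion handling can be sketched as follows. This is an illustrative simplification: the real renderer traces rays against the depth-map mesh, whereas here the camera ray directions are simply given:

```python
import numpy as np

def angular_weights(target_dir, cam_dirs):
    """Weight each camera's ray by how closely it aligns with the ray
    being rendered (smaller angle -> larger weight)."""
    target = target_dir / np.linalg.norm(target_dir)
    weights = []
    for d in cam_dirs:
        cos_a = np.dot(target, d / np.linalg.norm(d))
        angle = np.arccos(np.clip(cos_a, -1.0, 1.0))
        weights.append(1.0 / (angle + 1e-6))   # avoid division by zero
    w = np.array(weights)
    return w / w.sum()

def blend(colors, weights, occluded):
    """An occluded view gets zero weight, as described above."""
    w = weights * ~np.asarray(occluded)
    return np.dot(w / w.sum(), colors)

# Virtual ray between two cameras; the second is slightly closer in angle.
w = angular_weights(np.array([0.0, 0.0, 1.0]),
                    [np.array([0.2, 0.0, 1.0]), np.array([0.1, 0.0, 1.0])])
pixel = blend(np.array([100.0, 200.0]), w, occluded=[False, False])
print(w, pixel)   # the closer camera dominates the blend
```

Setting `occluded=[True, False]` makes the result come entirely from the visible view, mirroring the zero-weight rule for occluded vertices.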
2.2 MVC Requirements:
2.2.1 Compression Related Requirements
1. Compression efficiency
MVC shall provide high compression efficiency relative to independent coding of
each view of the same content. Some overhead, such as camera parameters, may
be necessary for facilitating view interpolation, i.e., trading coding efficiency for
functionality. However, the overhead data should be limited in order to increase
acceptance of new services.
2. View scalability
MVC shall support a scalable bit stream structure to allow for access of selected
views with minimum decoding effort. This enables the video to be displayed on a
multitude of different terminals and over networks with varying conditions.
3. Free viewpoint scalability
MVC shall support a scalable bit stream structure to allow for access to partial
data from which new views can be generated, i.e., not the original camera views,
but the generated views from them. Such content can be delivered to various types
of displays. This enables the functionality of free-viewpoint navigation in a
scalable manner.
4. Spatial/Temporal/SNR scalability
SNR scalability, spatial scalability, and temporal scalability should be supported.
5. Backward compatibility
At any instant in time, the bitstream corresponding to one view shall
conform to AVC.
6. Resource consumption
MVC should be efficient in terms of resource consumption, such as memory size,
memory bandwidth, and processing power.
7. Low delay
MVC shall support low encoding and decoding delay modes. Low delay is very
important for real-time applications such as streaming and broadcasting
of multiview video.
8. Robustness
Robustness to errors, also known as error resilience, should be supported. This
enables the delivery of multiview video content over error-prone networks, such as
wireless networks.
9. Resolution, bit depth, chroma sampling format
MVC shall support spatial resolutions from QCIF to HD. MVC shall support the
YUV 4:2:0 format. MVC shall support 8 bits per pixel component. Future
applications may require higher bit depths and higher chroma sampling formats.
10. Picture quality among views
MVC should enable flexible quality allocation over different views. For instance,
consistent quality might be required for some applications.
11. Temporal random access
MVC shall support random access in the time dimension. For example, it shall be
possible to access a frame at a given time with minimal decoding of frames in the
time dimension.
12. View random access
MVC shall support random access in the view dimension. For example, it shall be
possible to access a frame in a given view with minimal decoding of frames in the
view dimension.
13. Spatial random access
MVC should support random access to a spatial area in a picture. This may be
treated as a view random access if a view is composed of several spatially smaller
views.
14. Resource management
MVC shall support efficient management of decoder resources. For instance, the
output timing of multiple pictures requires efficient management. In particular,
pictures that carry the same time stamp across all views shall be available from
the decoder at the same time or sequentially.
15. Parallel processing
MVC shall support parallel processing of different views or segments of the
multi-view video to facilitate efficient encoder and decoder implementations.
2.2.2 System Support Related Requirements
1. Synchronization
MVC shall support accurate temporal synchronization among the multiple views.
2. View generation
MVC should enable robust and efficient generation of virtual views or
interpolated views.
3. Non-planar imaging and display systems
MVC should support efficient representation and coding methods for 3D display
including integral photography and non-planar image (e.g. dome) display systems.
4. Camera parameters
MVC should support transmission of camera parameters.
Block diagram of MVC system
The overall structure of MVC, defining the interfaces, is illustrated above. The
MVC encoder receives temporally synchronized video streams and generates one
video stream. The decoder receives the bit stream, decodes it, and provides a separate
view to each eye.
The raw YUV 4:2:0 frames are provided as input; they are encoded and
compressed using the various algorithms of MVC. The output of the encoder is given as
input to the decoder, where the frames are decompressed and decoded on a
Panda board, which runs on the Linux platform. The Panda board is
interfaced with a suitable 3D device, such as a 3D TV or a 3D mobile device.
CHAPTER 3
UBUNTU
3.1 INTRODUCTION
Ubuntu is a computer operating system based on the Debian Linux distribution
and distributed as free and open source software. It is named after the Southern
African philosophy of Ubuntu ("humanity towards others").
Ubuntu packages are based on packages from Debian's unstable branch: both
distributions use Debian's deb package format and package management tools
(APT and Synaptic). Debian and Ubuntu packages are not necessarily binary
compatible with each other, however, and sometimes .deb packages may need to
be rebuilt from source to be used in Ubuntu. Many Ubuntu developers are also
maintainers of key packages within Debian. Ubuntu cooperates with Debian by
pushing changes back to Debian, although there has been criticism that this does
not happen often enough. In the past, Ian Murdock, the founder of Debian, has
expressed concern about Ubuntu packages potentially diverging too far from
Debian to remain compatible.
3.2 FEATURES
Ubuntu is composed of many software packages, the vast majority of which are
distributed under a free software license. The only exceptions are some
proprietary hardware drivers. The main license used is the GNU General Public
License (GNU GPL) which, along with the GNU Lesser General Public License
(GNU LGPL), explicitly declares that users are free to run, copy, distribute, study,
change, develop and improve the software. On the other hand, there is also
proprietary software available that can run on Ubuntu. Ubuntu focuses on
usability, security and stability. The Ubiquity installer allows Ubuntu to be
installed to the hard disk from within the Live CD environment, without the need
to restart the computer prior to installation. Ubuntu also emphasizes
accessibility and internationalization to reach as many people as possible.
Beginning with 5.04, UTF-8 became the default character encoding, which allows
for support of a variety of non-Roman scripts. As a security feature, the sudo tool
is used to assign temporary privileges for performing administrative tasks,
allowing the root account to remain locked and preventing inexperienced users
from inadvertently making catastrophic system changes or opening security holes.
PolicyKit is also being widely implemented into the desktop to further harden the
system through the principle of least privilege.
Ubuntu comes installed with a wide range of software that includes OpenOffice,
Firefox, Empathy (Pidgin in versions before 9.10), Transmission, GIMP (in
versions prior to 10.04), and several lightweight games (such as Sudoku and
chess). Additional software that is not installed by default can be downloaded and
installed using the Ubuntu Software Center or the package manager Synaptic,
which come pre-installed. Ubuntu allows networking ports to be closed using its
firewall, with customized port selection available. End users can install Gufw
(a GUI for Uncomplicated Firewall) and keep it enabled. GNOME (the current
default desktop) offers support for more than 46 languages. Ubuntu can also run
many programs designed for Microsoft Windows (such as Microsoft Office)
through Wine or using a virtual machine (such as VMware Workstation or
VirtualBox). For the upcoming 11.04 release, Canonical intends to drop the
GNOME Shell as the default desktop environment in favor of Unity, a graphical
interface it first developed for the notebook edition of Ubuntu.
Ubuntu, unlike Debian, compiles its packages using GCC features such as PIE
and buffer overflow protection to harden its software. These extra features
greatly increase security at a performance cost of about 1% on 32-bit and 0.01% on
64-bit systems.
3.3 SYSTEM REQUIREMENTS
The desktop version of Ubuntu currently supports the x86 32 bit and 64 bit
architectures. Unofficial support is available for the PowerPC, IA-64 (Itanium)
and PlayStation 3 architectures. A supported GPU is required to enable desktop
visual effects.
3.4 VARIANTS
The variants recognized by Canonical as contributing significantly towards the
Ubuntu project are the following:
Edubuntu: A GNOME-based subproject and add-on for Ubuntu, designed for
school environments and home users.
Kubuntu: A desktop distribution using the KDE Plasma Workspaces desktop
environment rather than GNOME.
Mythbuntu: A distribution designed for creating a home theater PC with MythTV;
it uses the Xfce desktop environment.
Ubuntu Studio: A distribution made for professional video and audio editing,
comes with higher-end free editing software and is a DVD .iso image unlike the
Live CD the other Ubuntu distributions use.
Xubuntu: A distribution based on the Xfce desktop environment instead of
GNOME, designed to run more efficiently on low-specification computers.
3.5 TO OPEN A TERMINAL IN UBUNTU
All Linux commands are typed in the terminal. To open the terminal, go
to Applications in the toolbar, select Accessories, and then click on Terminal; a
terminal window appears, as shown in Figure 3.1.
Figure 3.1: Opening a terminal in Linux.
Table 3.1: Linux commands and their descriptions

COMMAND          DESCRIPTION
cd directory     Change to the specified directory
cd Desktop       Open a folder on the Desktop
ls               List the files inside the current folder
make clean       Delete the previously generated object files
make             Build the executable and object files
./configure      Generate the build configuration
./filename.exe   Run an executable file on Linux
exit             Close the terminal
gtkterm          Open GTKTerm for serial communication
CHAPTER 4
MULTIVIEW VIDEO CODING
4.1 SIMILARITIES IN TIME AND AMONG VIEWS
Exploiting similarities among the multi-view video images is the key to efficient
compression. When considering temporally successive images of one view
sequence, i.e., one row of the matrix of pictures (MOP), the same view-point is
captured at different time instances.
possibly at different pixel locations. If so, objects are in motion and practical
compression schemes utilize motion compensation techniques to exploit these
temporal similarities. On the other hand, spatially neighboring views captured at
the same time instant, i.e., images in one column of the MOP, show the same
objects from different view-points. Similar to the previous case, the same objects
appear in neighboring views but at different pixel locations. Here, the objects in
each image are subject to parallax and practical compression schemes use
disparity compensation techniques to exploit these inter-view similarities.
4.1.1. Temporal Similarities
Consider temporally successive images of one view sequence, i.e., one row of the
MOP. If objects in the scene are subject to motion, the same objects appear in
successive images but at different pixel locations. To exploit these temporal
similarities, sophisticated motion compensation techniques have been developed
in the past. Frequently used are so-called block matching techniques where a
motion vector establishes a correspondence between two similar blocks of pixels
chosen from two successive images. Practical compression schemes signal these
motion vectors to the decoder as part of the bit-stream. Variable block size
techniques improve the adaptation of the block motion field to the actual shape of
the object. Lately, so-called multi-frame techniques have been developed. Classic
block matching techniques use a single preceding image when choosing a
reference for the correspondence. Multi-frame techniques, on the other hand,
permit choosing the reference from several previously transmitted images; a
different image could be selected for each block. Finally, superposition
techniques are also used widely. Here, more than one correspondence per block of
pixels is specified and signaled as part of the bit-stream. A linear combination of
the blocks resulting from multiple correspondences is used to better match the
temporal similarities. A special example is the so-called bidirectionally predicted
picture where blocks resulting from two correspondences are combined. One
correspondence uses a temporally preceding reference; the other uses a temporally
succeeding reference. The generalized version is the so-called bi-predictive
picture. Here, two correspondences are chosen from an arbitrary set of available
reference images.
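The block matching described above can be sketched with a full-search implementation (illustrative only; practical encoders use fast search strategies, sub-pixel refinement, and rate-distortion-aware vector selection):

```python
import numpy as np

def block_match(ref, cur, bx, by, bs=8, search=4):
    """Full-search block matching: find the motion vector that minimizes
    the sum of absolute differences (SAD) within +/- `search` pixels."""
    block = cur[by:by + bs, bx:bx + bs].astype(int)
    best_mv, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(block - ref[y:y + bs, x:x + bs].astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

ref = np.zeros((32, 32), dtype=np.uint8)
ref[8:16, 8:16] = 200                 # object in the reference frame
cur = np.zeros((32, 32), dtype=np.uint8)
cur[8:16, 10:18] = 200                # same object shifted 2 px to the right
mv, sad = block_match(ref, cur, bx=10, by=8)
print(mv, sad)                        # (-2, 0) with SAD 0: a perfect match
```

The same routine, restricted to horizontal displacements, is essentially the block-based disparity compensation discussed in the next subsection.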
4.1.2. Inter-View Similarities
Consider spatially neighboring views captured at the same time instant, i.e.,
images in one column of the MOP. Objects in each image are subject to parallax
and appear at different pixel locations. To exploit these inter-view similarities,
disparity compensation techniques are used. The simplest approach to disparity
compensation is block matching techniques similar to those used for motion
compensation. These techniques offer the advantage of not requiring knowledge
of the geometry of the underlying 3D objects. However, if the cameras are
sparsely distributed, the block-based translatory disparity model fails to
compensate accurately. More advanced approaches to disparity compensation are
depth-image-based rendering algorithms. They synthesize an image as seen from
a given view-point by using the reference texture and depth image as input data.
These techniques offer the advantage that the given view-point image is
compensated more accurately even when the cameras are sparsely distributed.
However, these techniques rely on accurate depth images, which are difficult to
estimate. Finally, hybrid techniques that combine the advantages of both
approaches may also be considered. For example, if the accuracy of a depth image
is not sufficient for accurate depth-image-based rendering, block-based
compensation techniques may be used on top for selective refinement.
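For rectified, parallel cameras the parallax has a simple closed form, disparity d = f·B/Z, where f is the focal length in pixels, B the camera baseline, and Z the depth. A sketch with hypothetical rig parameters:

```python
def disparity(focal_px, baseline_m, depth_m):
    """Horizontal disparity (in pixels) between two rectified views:
    nearer objects shift more, which is exactly the parallax that
    disparity compensation has to model."""
    return focal_px * baseline_m / depth_m

# Hypothetical rig: 1000 px focal length, 5 cm baseline.
near = disparity(1000, 0.05, depth_m=1.0)    # 50.0 px
far = disparity(1000, 0.05, depth_m=10.0)    # 5.0 px
print(near, far)
```

The 10x spread between near and far objects illustrates why a single block-based translational model breaks down for sparsely placed cameras, and why accurate depth maps help.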
4.2 COMPRESSION SCHEMES
The vast amount of multi-view data is a huge challenge not only for capturing and
processing but also for compression. Efficient compression exploits the statistical
dependencies within the multi-view video imagery. Usually, practical schemes
accomplish this either with predictive coding or with subband coding. In both
cases, motion compensation and disparity compensation are employed to make
better use of statistical dependencies. Note that predictive coding and subband
coding have different constraints for efficient compression.
Predictive Coding
Predictive coding schemes encode multiview video imagery sequentially. Two
basic types of coded pictures are possible: intra and inter pictures. Intra pictures
are coded independently of any other image. Inter pictures, on the other hand,
depend on one or more reference pictures that have been encoded previously. By
design, an intra picture does not exploit the similarities among the multiview
images. But an inter picture is able to make use of these similarities by choosing
one or more reference pictures and generating a motion- and/or disparity-
compensated image for efficient predictive coding. The basic ideas of motion-
compensated predictive coding are summarized in the box “Motion-Compensated
Predictive Coding.” When choosing the encoding order of images, various
constraints should be considered. For example, high coding efficiency
as well as good temporal multiresolution properties may be desirable.
Motion-compensated predictive coding of image sequences is accomplished with
intra and inter pictures. As depicted in Figure 4.1(a), the input image xk is
independently encoded into the intra picture Ik. The intra decoder is used to
independently reconstruct the image x̂k. The input image xk is predicted by the
motion-compensated (MC) reference image x̂r. The prediction error, also called the
displaced frame difference (DFD), is encoded and constitutes, in combination
with the motion information, the inter picture Pk. The inter-picture decoder
reverses this process but requires the same reference image x̂r to be present at the
decoder side. If the reference picture differs between encoder and decoder, e.g.,
because of network errors, the decoder is not able to reconstruct the same image
x̂k that the encoder has encoded. Note that reference pictures can be either
reconstructed intra pictures or other reconstructed inter pictures.
Figure 4.1(b) shows the “basic” inter picture (predictive picture), which chooses
only one reference picture for compensation. More advanced are bipredictive
pictures that use a linear combination of two motion-compensated reference
pictures. Bidirectional motion-compensated prediction is a special case of
bipredictive pictures and is widely employed in standards like MPEG-1, MPEG-2,
and H.263.
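The inter-picture encode/decode loop can be sketched as a toy with a scalar quantizer on the DFD (the reference is assumed to be already motion-compensated):

```python
import numpy as np

def encode_inter(cur, mc_ref, q_step=4):
    """Inter picture: quantize the displaced frame difference (DFD)."""
    dfd = cur.astype(int) - mc_ref.astype(int)
    return np.round(dfd / q_step).astype(int)

def decode_inter(q_dfd, mc_ref, q_step=4):
    """The decoder must use the SAME reference image, or it drifts."""
    return np.clip(mc_ref.astype(int) + q_dfd * q_step, 0, 255).astype(np.uint8)

mc_ref = np.full((4, 4), 100, dtype=np.uint8)   # motion-compensated reference
cur = mc_ref + 7                                # small prediction error
rec = decode_inter(encode_inter(cur, mc_ref), mc_ref)
print(np.abs(rec.astype(int) - cur.astype(int)).max())   # 1: bounded by the step
```

Feeding a different `mc_ref` into `decode_inter` than the encoder used reproduces exactly the mismatch (drift) problem described in the text.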
Figure 4.1: Motion-compensated predictive coding.
4.3 MVC ENCODING
The block diagram showing the various steps in encoding is shown in
Figure 4.2. The pictures captured by the various cameras are denoted "view i picture"
and are given as input to the MVC encoder. The various steps in encoding are
described below.
[Encoder blocks: "view i picture" input; transform; quantization; entropy coding;
bitstream output; inverse quantization; inverse transform; deblocking filter;
reference picture stores for view i and for other views; intra prediction; motion
estimation and compensation; disparity/illumination estimation and compensation;
mode decision.]
Figure 4.2: Block diagram of MVC encoder
4.3.1 VIDEO FORMAT
YUV is a color space typically used as part of a color image pipeline. It encodes a
color image or video taking human perception into account, allowing reduced
bandwidth for chrominance components, thereby typically enabling transmission
errors or compression artifacts to be more efficiently masked by human
perception than with a "direct" RGB representation. Other color spaces have
similar properties, and the main reason to implement or investigate properties of
Y'UV would be for interfacing with analog or digital television or photographic
equipment that conforms to certain Y'UV standards. The raw YUV frames used
here are in 4:2:0 format, i.e., for every four Y samples, one Cb and one Cr sample
are transmitted. This format is widely used in video broadcasting, because the
reduction in chroma resolution is barely perceptible to the human eye.
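The bandwidth saving from 4:2:0 subsampling is easy to quantify: the two chroma planes each carry a quarter of the luma samples, so a frame needs half the bytes of 24-bit RGB. A small sketch:

```python
def yuv420_frame_bytes(width, height, bits=8):
    """One 4:2:0 frame: a full-resolution Y plane plus quarter-resolution
    Cb and Cr planes (2x2 subsampling in both directions)."""
    y = width * height
    chroma = (width // 2) * (height // 2)
    return (y + 2 * chroma) * bits // 8

# A single 1080p frame:
print(yuv420_frame_bytes(1920, 1080))   # 3110400 bytes (~3 MB)
# Versus 24-bit RGB for the same frame:
print(1920 * 1080 * 3)                  # 6220800 bytes: 4:2:0 halves it
```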
4.3.2 TRANSFORM
The transform used in multiview video compression is the DCT. A
discrete cosine transform (DCT) expresses a sequence of finitely many data points
in terms of a sum of cosine functions oscillating at different frequencies. The use
of cosine rather than sine functions is critical in these applications: for
compression, it turns out that cosine functions are much more efficient. The DCT is
applied on 8x8 blocks. The (i, j)th entry of the DCT of an image block is

    D(i, j) = (2/N) C(i) C(j) sum_{x=0}^{N-1} sum_{y=0}^{N-1} p(x, y)
              cos[(2x+1) i pi / (2N)] cos[(2y+1) j pi / (2N)],

where C(0) = 1/sqrt(2) and C(k) = 1 for k > 0.
Dept of E&C, VVIET 23 MYSORE
Multiview video compression and display
Here p(x, y) is the (x, y)th element of the image block represented by the matrix p,
and N is the size of the block that the DCT is applied to. The equation calculates
one entry, D(i, j), of the transformed image from the pixel values of the original
image matrix. For the standard 8x8 block that JPEG compression uses, N equals 8
and x and y range from 0 to 7.
Because the DCT uses cosine functions, the resulting matrix depends on the
horizontal, diagonal, and vertical frequencies. Therefore an image block with a lot
of change has a very random-looking resulting matrix, while an image matrix of
just one color has a resulting matrix with a large value for the first element and
zeros for the other elements.
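This behaviour can be verified numerically. The sketch below (function name is ours) builds the orthonormal N x N DCT-II basis and applies it to a flat 8x8 block; as the text describes, all energy collapses into the first (DC) coefficient.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block, applied as C @ block @ C.T."""
    N = block.shape[0]
    C = np.zeros((N, N))
    for i in range(N):
        alpha = np.sqrt(1.0 / N) if i == 0 else np.sqrt(2.0 / N)
        for x in range(N):
            C[i, x] = alpha * np.cos((2 * x + 1) * i * np.pi / (2 * N))
    return C @ block @ C.T

flat = np.full((8, 8), 128.0)  # a block of just one colour
D = dct2(flat)
# D[0, 0] == 1024.0 while every other coefficient is ~0
```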
4.3.3 QUANTIZATION
Quantization is the process of mapping a large set of input values to a smaller set
– such as rounding values to some unit of precision. A device or algorithmic
function that performs quantization is called a quantizer. Quantization is involved
to some degree in nearly all digital signal processing, as the process of
representing a signal in digital form ordinarily involves rounding. Quantization
also forms the core of essentially all lossy compression algorithms.
Because quantization is a many-to-few mapping, it is an inherently non-linear and
irreversible process (i.e., because the same output value is shared by multiple
input values, it is impossible in general to recover the exact input value when
given only the output value).
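A minimal uniform quantizer illustrates this many-to-few mapping and its irreversibility (the step size and input values below are arbitrary examples, not values from the codec):

```python
import numpy as np

def quantize(coeffs, step):
    # Many-to-few: round each coefficient to the nearest multiple of `step`.
    return np.round(coeffs / step).astype(int)

def dequantize(levels, step):
    # Recovers only the multiple of `step`, never the exact original value.
    return levels * step

c = np.array([-7.6, -0.4, 3.1, 3.9, 22.0])
levels = quantize(c, step=4)      # [-2, 0, 1, 1, 6]
rec = dequantize(levels, step=4)  # [-8, 0, 4, 4, 24]
```

Note that 3.1 and 3.9 map to the same level, so their difference is lost for good: exactly the non-invertibility described above.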
4.3.4 MOTION ESTIMATOR
Motion estimation is the process of determining motion vectors that describe the
transformation from one 2D image to another; usually from adjacent frames in a
video sequence. It is an ill-posed problem as the motion is in three dimensions but
the images are a projection of the 3D scene onto a 2D plane. The motion vectors
may relate to the whole image (global motion estimation) or specific parts, such
as rectangular blocks, arbitrary shaped patches or even per pixel. The motion
vectors may be represented by a translational model or many other models that
can approximate the motion of a real video camera, such as rotation and
translation in all three dimensions and zoom.
Closely related to motion estimation is optical flow, where the vectors correspond
to the perceived movement of pixels. In motion estimation an exact 1:1
correspondence of pixel positions is not a requirement.
Applying the motion vectors to an image to synthesize the transformation to the
next image is called motion compensation. The combination of motion estimation
and motion compensation is a key part of video compression as used by MPEG 1,
2 and 4 as well as many other video codecs.
There are many motion estimation techniques; among them, the Enhanced
Predictive Zonal Search (EPZS) achieves high efficiency. EPZS, like other
predictive algorithms, mainly comprises three steps: the initial predictor selection
chooses the best MV predictor from a set of potentially likely predictors; the
adaptive early termination allows the search to stop at given stages of the
estimation if certain rules are satisfied; and the prediction refinement employs a
refinement pattern around the best predictor to improve the final prediction. All
of these features are vital to the performance of EPZS.
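For reference, the simplest (non-predictive) baseline that EPZS improves upon is an exhaustive block-matching search. The sketch below (names are ours) minimizes the sum of absolute differences (SAD) over a small search window around the co-located block.

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(cur_block, ref, top, left, search=4):
    """Exhaustive block matching: return the motion vector (dy, dx) and its
    SAD cost, searching +/- `search` pixels around the co-located block."""
    n = cur_block.shape[0]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                cost = sad(cur_block, ref[y:y + n, x:x + n])
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

ref = np.arange(256, dtype=np.uint8).reshape(16, 16)
cur_block = ref[5:13, 6:14]          # an 8x8 block that moved by (+1, +2)
mv, cost = full_search(cur_block, ref, top=4, left=4)
# mv == (1, 2) with cost == 0: the true displacement is recovered
```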
4.3.5 MOTION COMPENSATION
Motion compensation is an algorithmic technique employed in the encoding of
video data for video compression. Motion compensation describes a picture in
terms of the transformation of a reference picture to the current picture. The
reference picture may be previous in time or even from the future. When images
can be accurately synthesized from previously transmitted/stored images, the
compression efficiency can be improved.
Motion compensation exploits the fact that, often, for many frames of a movie,
the only difference between one frame and another is the result of either the
camera moving or an object in the frame moving. In reference to a video file, this
means much of the information that represents one frame will be the same as the
information used in the next frame. This is called temporal redundancy. A detailed
explanation of motion compensation is given in section 2.2
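Given a motion vector, compensation itself is just a displaced copy from the reference frame. A minimal integer-pel sketch (names are ours):

```python
import numpy as np

def motion_compensate(ref, mv, top, left, n):
    # Predict the n x n block at (top, left) by copying the block the
    # motion vector (dy, dx) points at in the reference frame.
    dy, dx = mv
    return ref[top + dy: top + dy + n, left + dx: left + dx + n]

ref = np.arange(64, dtype=np.int16).reshape(8, 8)
cur_block = ref[3:7, 2:6]                    # the scene merely shifted
pred = motion_compensate(ref, (1, -1), top=2, left=3, n=4)
residual = cur_block - pred                  # all zeros: only the MV need be sent
```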
4.3.6 DEBLOCKING FILTER
A deblocking filter is applied to blocks in decoded video to improve visual quality
and prediction performance by smoothing the sharp edges which can form
between macro blocks when block coding techniques are used. The filter aims to
improve the appearance of decoded pictures.
In H.264 the deblocking filter is not an optional additional feature in the decoder. It is
a feature on both the decoding path and on the encoding path, so that the in-loop
effects of the filter are taken into account in reference macro blocks used for
prediction. When a stream is encoded, the filter strength can be selected, or the
filter can be switched off entirely. Otherwise, the filter strength is determined by
coding modes of adjacent blocks, quantization step size, and the steepness of the
luminance gradient between blocks.
The filter operates on the edges of each 4×4 or 8×8 transform block in the luma
and chroma planes of each picture. Each small block's edge is assigned a
boundary strength based on whether it is also a macro block boundary, the coding
(intra/inter) of the blocks, whether references (in motion prediction and reference
frame choice) differ, and whether it is a luma or chroma edge. Stronger levels of
filtering are assigned by this scheme where there is likely to be more distortion.
The filter can modify as many as three samples on either side of a given block
edge (in the case where an edge is a luma edge that lies between different macro
blocks and at least one of them is intra coded). In most cases it can modify one or
two samples on either side of the edge (depending on the quantization step size,
the tuning of the filter strength by the encoder, the result of an edge detection test,
and other factors).
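The following toy filter is not the actual H.264 filter (whose decision logic is far more elaborate); it only illustrates the idea of blending samples across a block boundary, with a higher boundary strength producing stronger smoothing:

```python
import numpy as np

def smooth_edge(left_col, right_col, strength):
    # Blend the sample columns on either side of a block boundary;
    # higher boundary strength -> heavier smoothing (capped at a 50/50 blend).
    w = min(strength, 4) / 8.0
    l = (1 - w) * left_col + w * right_col
    r = (1 - w) * right_col + w * left_col
    return l, r

left = np.array([100.0, 100.0, 100.0, 100.0])
right = np.array([140.0, 140.0, 140.0, 140.0])
l, r = smooth_edge(left, right, strength=2)  # the 100/140 step becomes 110/130
```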
4.3.7 ENTROPY ENCODER
Entropy encoding is a lossless data compression scheme that is independent of
the specific characteristics of the medium.
One of the main types of entropy coding creates and assigns a unique prefix-free
code to each unique symbol that occurs in the input. These entropy encoders then
compress data by replacing each fixed-length input symbol by the corresponding
variable-length prefix-free output codeword. The length of each codeword is
approximately proportional to the negative logarithm of the probability.
Therefore, the most common symbols use the shortest codes.
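This relationship between probability and codeword length can be demonstrated with Huffman's algorithm, the classic prefix-free code constructor (a generic sketch, not the H.264 entropy coder itself):

```python
import heapq

def huffman_lengths(freqs):
    """Prefix-free code lengths via Huffman's algorithm: each merge adds
    one bit to every symbol inside the two merged groups."""
    heap = [(f, i, (s,)) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tick = len(heap)                 # unique tiebreaker for merged nodes
    while len(heap) > 1:
        f1, _, g1 = heapq.heappop(heap)
        f2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tick, g1 + g2))
        tick += 1
    return lengths

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code_len = huffman_lengths(probs)
# code_len matches -log2(p): {"a": 1, "b": 2, "c": 3, "d": 3}
```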
There are two types of entropy encoding:
1. CABAC (Context-Based Adaptive Binary Arithmetic Coding).
2. CAVLC (Context-Adaptive Variable-Length Coding).
Context-Based Adaptive Binary Arithmetic Coding (CABAC)
The arithmetic coding scheme selected for H.264, Context-Based Adaptive
Binary Arithmetic Coding or CABAC [3], achieves good compression
performance through:
(a) selecting probability models for each syntax element according to the
element's context,
(b) adapting probability estimates based on local statistics, and
(c) using arithmetic coding.
Coding a data symbol involves the following stages.
1. Binarization: CABAC uses Binary Arithmetic Coding which means that only
binary decisions (1 or 0) are encoded. A non-binary-valued symbol (e.g. a
transform coefficient or motion vector) is “binarized” or converted into a binary
code prior to arithmetic coding. This process is similar to the process of
converting a data symbol into a variable length code but the binary code is further
encoded (by the arithmetic coder) prior to transmission.
Stages 2, 3 and 4 are repeated for each bit (or “bin”) of the binarized symbol.
2. Context model selection: A “context model” is a probability model for one or
more bins of the binarized symbol. This model may be chosen from a selection of
available models depending on the statistics of recently-coded data symbols. The
context model stores the probability of each bin being “1” or “0”.
3. Arithmetic encoding: An arithmetic coder encodes each bin according to the
selected probability model. Note that there are just two sub-ranges for each bin
(corresponding to “0” and “1”).
4. Probability update: The selected context model is updated based on the actual
coded value (e.g. if the bin value was “1”, the frequency count of “1”s is
increased).
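Step 4 can be illustrated with a toy frequency-counting context model. Real CABAC uses a finite-state machine with tabulated transition rules, so this is only a conceptual sketch (class and method names are ours):

```python
class ContextModel:
    """Toy context model: estimates P(bin = 1) from counts and adapts
    after every coded bin, as in CABAC's probability-update step."""
    def __init__(self):
        self.ones, self.total = 1, 2   # start from a uniform prior

    def p_one(self):
        return self.ones / self.total

    def update(self, bin_value):
        self.ones += bin_value
        self.total += 1

ctx = ContextModel()
for b in [1, 1, 1, 0, 1]:
    ctx.update(b)
# having seen mostly 1s, the model now estimates P(1) = 5/7
```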
4.3.8 MODE DECISION
A low complexity mode decision algorithm is proposed to reduce complexity of
ME and DE. An experimental analysis is performed to study inter-view
correlation in the coding information such as the prediction mode and rate
distortion (RD) cost. Based on the correlation, we propose four efficient mode
decision techniques, including early SKIP mode decision, adaptive early
termination, fast mode size decision and selective intra coding in inter frame.
Experimental results show that the proposed algorithm can significantly reduce
computational complexity of MVC while maintaining almost the same RD
performance.
4.4 MVC DECODER
The exact reverse process of the encoder takes place in the decoder. The block
diagram of the MVC decoder is shown in Figure 4.3.
Fig 4.3: MVC decoder.
The coded bitstream is applied to the entropy decoder, and the decoded bits are
then subjected to inverse quantization and inverse transformation to obtain the
decoded YUV.
There are two modes of decoding: intra prediction and inter prediction. Intra
pictures are coded independently, whereas inter pictures depend on one or more
reference pictures that have been decoded previously. By design, an intra picture
does not exploit the similarities among the multi-view images. An inter picture,
however, is able to make use of these similarities by choosing one or more
reference pictures and generating a motion- and/or disparity-compensated image
for efficient predictive coding.
The signal obtained by inverse quantization and inverse DCT transform is
summed with the output of intra prediction or inter prediction. A mode-decision-
based switch selects either the intra or the inter prediction signal. The summed
signal is passed to the de-blocking filter, which is applied to blocks in the decoded
video to improve visual quality and prediction performance by smoothing the
sharp edges that can form between macro blocks when block coding techniques
are used. The filter aims to remove discontinuities in the picture block. The filter
output is then stored in the picture memory for further computation. The reference
pictures stored in picture memory are pointed to by the reference picture index
obtained from the entropy decoder.
The decoded and reconstructed signals are finally obtained from the de-blocking
filter.
This chapter discussed the coding and decoding of YUV frames. The next chapter
presents the experimentation and test results on the Linux platform.
4.5 Flowchart
1. MVC Encoder
2. MVC Decoder
CHAPTER 5
EXPERIMENTATION AND RESULTS
5.1 EXPERIMENTATION ON LINUX PLATFORM (UBUNTU)
Step 1: CROSS COMPILATION
The cross compilation is done by pointing CC in the makefile to the GNU ARM
toolchain.
Step 2: Any previously built object files are deleted using the command
make clean.
Step 3: The make command is used to build the object and executable files.
Step 4: The required executable files, input files and configuration files are copied
to the SD card.
Step 5: The SD card is inserted into the Panda board, a 5 V power supply is given
to the board, and the serial port of the computer is connected to the Panda board;
a gtkterm window is opened to communicate over the serial port.
Step 6: The baud rate is set to the maximum; we used a baud rate of 115200.
Step 7: The Panda board boots from the 5 V supply, and the executable files are
then run on the Panda board using the command ./filename.exe.
Step 8: The output obtained is verified and the compression ratio is calculated.
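Step 8's calculation is simply the ratio of raw input size to coded bitstream size. For example, with the Trial-3 file sizes reported in Table 5.1:

```python
def compression_ratio(original_bytes, compressed_bytes):
    # Higher is better: how many times smaller the bitstream is.
    return original_bytes / compressed_bytes

# 14.8 MB raw YUV coded into a 2.1 MB .264 bitstream (135-frame trial)
ratio = compression_ratio(14.8 * 1024**2, 2.1 * 1024**2)  # ~7.05
```

The small discrepancy from the 7.09 reported in Table 5.1 is consistent with rounding of the quoted file sizes.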
Figure 5.1: Steps 2 and 3.
5.2 TEST RESULTS
TEST-1
Number of frames to be coded:3
Output of the encoder:
Parsing Configfile encoder_stereo.cfg ...
Warning: Hierarchical coding or Referenced B slices used.
Make sure that you have allocated enough references
in reference buffer to achieve best performance.
------------------------------- JM 17.2 (FRExt) -------------------------------
Input YUV file : left_432x240.yuv
Input YUV file 2 : right_432x240.yuv
Output H.264 bitstream : test.264
Output YUV file : test_rec.yuv
Output YUV file 2 : test_rec2.yuv
YUV Format : YUV 4:2:0
Frames to be encoded : 3
Freq. for encoded bitstream : 30.00
PicInterlace / MbInterlace : 0/0
Transform8x8Mode : 1
ME Metric for Refinement Level 0 : SAD
ME Metric for Refinement Level 1 : Hadamard SAD
ME Metric for Refinement Level 2 : Hadamard SAD
Mode Decision Metric : Hadamard SAD
Motion Estimation for components : Y
Image format : 320x240 (320x240)
Error robustness : Off
Search range : 32
Total number of references : 5
References for P slices : 5
References for B slices (L0, L1) : 5, 1
Sequence type : Hierarchy (QP: I 28, P 28, B 30)
Entropy coding method : CABAC
Profile/Level IDC : (128,40)
Motion Estimation Scheme : EPZS
EPZS Pattern : Extended Diamond
EPZS Dual Pattern : Extended Diamond
EPZS Fixed Predictors : All P + B
EPZS Temporal Predictors : Enabled
EPZS Spatial Predictors : Enabled
EPZS Threshold Multipliers : (1 0 2)
EPZS Subpel ME : Basic
EPZS Subpel ME BiPred : Basic
Search range restrictions : none
RD-optimized mode decision : used
Data Partitioning Mode : 1 partition
Output File Format : H.264/AVC Annex B Byte Stream Format
------------------------------------------------------------------------------------
Frame View Bit/pic QP SnrY SnrU SnrV Time(ms) MET(ms) Frm/Fld
Ref
------------------------------------------------------------------------------------
00000(NVB) 480
00000(IDR) 0 189936 28 36.814 35.359 35.318 1549 0 FRM 3
00000( P ) 1 135344 28 35.293 39.691 38.779 2384 344 FRM 2
00002( P ) 0 112176 28 37.830 35.056 34.754 1995 361 FRM 2
00002( P ) 1 91032 28 40.726 35.247 34.447 1761 512 FRM 2
00001( B ) 0 147672 30 33.395 31.602 31.631 3232 1084 FRM 0
00001( B ) 1 115024 30 34.741 32.077 33.265 3225 1259 FRM 0
-------------------------------------------------------------------------------
Total Frames: 6
Leaky BucketRateFile does not have valid entries.
Using rate calculated from avg. rate
Number Leaky Buckets: 8
Rmin Bmin Fmin
3955920 193416 193416
4944900 189936 189936
5933880 189936 189936
6922860 189936 189936
7911840 189936 189936
8900820 189936 189936
9889800 189936 189936
10878780 189936 189936
------------------ Average data all frames -----------------------------------
Total encoding time for the seq. : 14.148 sec (0.42 fps)
Total ME time for sequence : 3.563 sec
Y { PSNR (dB), cSNR (dB), MSE } : { 36.467, 35.888, 16.76049 }
U { PSNR (dB), cSNR (dB), MSE } : { 34.839, 34.125, 25.15187 }
V { PSNR (dB), cSNR (dB), MSE } : { 34.699, 34.205, 24.69435 }
View0_Y { PSNR (dB), cSNR (dB), MSE } : { 36.013, 35.577, 18.00490 }
View0_U { PSNR (dB), cSNR (dB), MSE } : { 34.006, 33.649, 28.06361 }
View0_V { PSNR (dB), cSNR (dB), MSE } : { 33.901, 33.580, 28.51354 }
View1_Y { PSNR (dB), cSNR (dB), MSE } : { 36.920, 36.223, 15.51609 }
View1_U { PSNR (dB), cSNR (dB), MSE } : { 35.671, 34.659, 22.24014 }
View1_V { PSNR (dB), cSNR (dB), MSE } : { 35.497, 34.935, 20.87516 }
Total bits : 791664 (I 189936, P 338552, B 262696 NVB 480)
View 0 Total-bits : 450104 (I 189936, P 112176, B 147672 NVB 320)
View 1 Total-bits : 341560 (I 0, P 226376, B 115024 NVB 160)
Bit rate (kbit/s) @ 30.00 Hz : 7916.64
View 0 BR (kbit/s) @ 30.00 Hz : 4501.04
View 1 BR (kbit/s) @ 30.00 Hz : 3415.60
Bits to avoid Startcode Emulation : 28
Bits for parameter sets : 480
Bits for filler data : 0
real 0m0.271s
user 0m0.212s
sys 0m0.056s
------------------------------------------------------------------------------
OUTPUT OF DECODER
Input H.264 bitstream : test.264
Output decoded YUV : test_dec.yuv
Input reference file : test_rec.yuv
POC must = frame# or field# for SNRs to be correct
--------------------------------------------------------------------------
Frame POC Pic# QP SnrY SnrU SnrV Y:U:V Time(ms)
--------------------------------------------------------------------------
00000(IDR) 0 0 28 0.0000 0.0000 0.0000 4:2:0 24
00000( P ) 0 0 28 13.8138 16.1082 15.1999 4:2:0 16
00002( P ) 4 1 28 0.0000 0.0000 0.0000 4:2:0 15
00002( P ) 4 1 28 18.0149 15.3684 14.0144 4:2:0 13
00001( b ) 2 2 30 0.0000 0.0000 0.0000 4:2:0 17
00001( b ) 2 2 30 15.6719 13.6850 13.0119 4:2:0 15
-------------------- Average SNR all frames ------------------------------
SNR Y(dB) : 7.92
SNR U(dB) : 7.53
SNR V(dB) : 7.04
Total decoding time : 0.102 sec (58.824 fps)[6 frm/102 ms]
--------------------------------------------------------------------------
Exit JM 17 (FRExt) decoder, ver 17.2
Output status file : log.dec
real 0m0.870s
user 0m0.228s
sys 0m0.044s
Similarly the encoder and decoder were successfully tried with 100 and 135
frames respectively.
The results obtained in the various trials are tabulated below.
Table 5.1: Experimental results
Number of frames to be coded          3        100      135
Input YUV                             14.8 MB  14.8 MB  14.8 MB
Test.264                              97 KB    1.6 MB   2.1 MB
Reconstructed YUV (encoder output)    338 KB   11 MB    14.8 MB
Decoded YUV (decoder output)          338 KB   11 MB    14.8 MB
Compression ratio                     0.6      5.4      7.09
Real                                  0.870 s  12.25 s  17.23 s
User                                  0.228 s  9.16 s   13.16 s
System                                0.044 s  0.83 s   1.29 s
As seen from Table 5.1, the greater the number of frames to be coded, the more
time is taken for execution. Trial 3, i.e. coding all the frames of the given view,
achieves the highest compression ratio; note also that the input YUV,
reconstructed YUV and decoded YUV all have the same size, showing that
encoding and decoding are performed correctly.
The application and future enhancement of the proposed technique are discussed
in the following chapter.
Application and Future Enhancement
1. Applications
1. Free-viewpoint television.
2. The 3D technique using a cellophane sheet was applied to a laparoscope
in order to expand the limited viewing capability of this minimally
invasive surgical device. A unique feature of this 3D laparoscope is that it
includes a virtual ruler to measure distances without physically touching
affected areas.
3. 3D games designed using MVC can draw the world at any angle and can
have the player walk in any increment of steps they choose.
4. Immersive teleconferencing.
5. 3D mobiles.
6. 3D television.
2. Future Enhancement
As the statistics in Chapter 5 show, the time required for coding is high. MVC
can be enhanced by minimizing the time required for encoding and decoding so
that it can be used in real-time applications.
CONCLUSION
The presented prediction structures for multi-view video coding are based on the
fact that multiple video bit-streams, showing the same scene from different
camera perspectives, exhibit significant inter-view statistical dependencies. The
corresponding evaluation pointed out that these correlations can be exploited for
efficient coding of multi-view video data. The multiview prediction structures
have the advantage of achieving significant coding gains while being highly
flexible regarding their adaptation to all kinds of spatial and temporal setups.
These prediction structures for multi-view video coding are very similar to
H.264/AVC and require only very minor syntax changes. Regarding coding
efficiency, coding gains of up to 3.2 dB and an average gain of 1.5 dB were
achieved.
APPENDICES
TECHNICAL SPECIFICATIONS OF PANDA BOARD
General
Low-cost mobile software development platform
1080p video, WLAN, Bluetooth & more
Dual-core ARM Cortex-A9 MPCore benefits
Community-driven projects & support
Display
HDMI v1.3 Connector (Type A) to drive HD displays
DVI-D Connector (can drive a 2nd display, simultaneous display;
requires HDMI to DVI-D adapter)
LCD expansion header
Camera
Camera connector
Audio
3.5 mm stereo audio in/out
HDMI Audio out
Wireless Connectivity
802.11 b/g/n (based on WiLink 6.0)
Bluetooth v2.1 + EDR (based on WiLink 6.0)
Memory
1 GB low power DDR2 RAM
Full size SD/MMC card cage with support for High-Speed & High-
Capacity SD cards
Connectivity
Onboard 10/100 Ethernet
Expansion
1x USB 2.0 High-Speed On-the-go port
2x USB 2.0 High-Speed host ports
General purpose expansion header (I2C, GPMC, USB, MMC, DSS,
ETM)
Camera expansion header
Debug Board
10/100 BASE-T Ethernet (RJ45 connector)
Mini-AB USB port (For debug UART connectivity)
60-pin MIPI Debug expansion connector
Debug LED
1 GPIO Button
Dimensions
Height: 4.5" (114.3 mm)
Width: 4.0" (101.6 mm)
Weight: 2.6 oz (74 grams)
PandaBoard components

Function                     Vendor   Part ID
Application Processor        TI       OMAP4430
Memory                       Elpida   EDB8064B1PB-8D-F
Power Management IC          TI       TWL6030
Audio IC                     TI       TWL6040
Connectivity                 LSR      LS240-WI-01-A20
4-Port USB Hub/Ethernet      SMSC     LAN9514-JZX
DVI Transmitter              TI       TFP410PAP
3.5 mm Dual Stacked Audio    KYCON    STX-4235-3/3-N
Bibliography
[1] A. Kubota, A. Smolic, M. Magnor, M. Tanimoto, T. Chen, and C. Zhang,
“Multi-view imaging and 3dtv,” IEEE Signal Processing Magazine, vol.
24, no. 6, pp. 10–21, 2007.
[2] Z. Yang, W. Wu, K. Nahrstedt, G. Kurillo, and R. Bajcsy, “Viewcast:
View dissemination and management for multi-party 3d tele-immersive
environments,” in ACM Multimedia, 2007.
[3] H. Baker, D. Tanguay, I. Sobel, D. Gelb, M. Goss, W. Culbertson, and
T. Malzbender, “The coliseum immersive teleconferencing system,”
Tech. Rep., HP Labs, 2002.
[4] M. Flierl and B. Girod, “Multiview video compression,” IEEE Signal
Processing Magazine, vol. 24, no. 6, pp. 66–76, 2007.
[5] A. Smolic and P. Kauff, “Interactive 3-d video representation and coding
technologies,” Proceedings of the IEEE, vol. 93, no. 1, pp. 98–110, 2005.
[6] C. Zhang and J. Li, “Interactive browsing of 3D environment over the
internet,” in Proc. SPIE VCIP, 2001.
[7] C. Zhang and T. Chen, “A survey on image-based rendering – representation,
sampling and compression,” EURASIP Signal Processing: Image
Communication, vol. 19, no. 1, pp. 1–28, 2004.
[8] C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski,
“High-quality video view interpolation using a layered representation,”
in ACM SIGGRAPH, 2004.
[9] C. Buehler, M. Bosse, L. McMillan, S. J. Gortler, and M. F. Cohen,
“Unstructured lumigraph rendering,” in ACM SIGGRAPH, 2001.
[10] ITU-T Rec. H.264 / ISO/IEC 14496-10, “Advanced Video Coding,” Final
Committee Draft, Document JVT-E022, September 2002.
[11] I. Richardson, “Video CODEC Design,” John Wiley & Sons, 2002.
[12] D. Marpe, G. Blättermann, and T. Wiegand, “Adaptive Codes for H.26L,”
ITU-T SG16/Q.6 document VCEG-L13, Eibsee, Germany, January 2001.