Arko Barman Computer Vision & Artificial Intelligence Lab Department of Electrical Engineering Indian Institute of Science, Bangalore

Distributed Video Coding


Introduction

A new paradigm for video compression

Based on information-theoretic results from the 1970s (Slepian-Wolf, 1973; Wyner-Ziv, 1976)

A radical departure from traditional video compression techniques

Well suited to applications that require many encoders and a single decoder

Goals

Low-complexity, low-power encoder

Possibly higher-complexity decoder

Coding efficiency similar to that of conventional video compression techniques, i.e. rate-distortion performance approaching that of conventional schemes

Applications

Wireless low-power surveillance networks

The number of encoders is usually much higher than the number of decoders (usually one)

Video sequences from multiple cameras partially overlap

The decoder should exploit the correlation between the multiple encoded video sequences

Low-complexity encoders are required; the decoder may be of higher complexity


Wireless Mobile Video

Both the encoder and the decoder must be of low cost and low complexity

For low complexity, the encoder must be a Wyner-Ziv encoder and the decoder an MPEG-x or H.26x decoder

A base station receives the Wyner-Ziv encoded bitstream from the transmitter, decodes it, re-encodes it as MPEG-x or H.26x and transmits it to the receiver



Multi-view Acquisition

Neighbouring cameras of a large camera array capture overlapping, and hence correlated, video sequences

The videos are encoded independently in the individual cameras

Joint decoding at a central station must exploit the correlation between the different views

Used for image-based rendering (3D reconstruction with texture mapping)

Video-based Sensor Networks

Applications such as tracking a person throughout an environment, monitoring activities, tracking events and raising alarms

Multiple sensors with video-acquisition capabilities, which must be low-cost, low-power and low-complexity

A central decoding device with high computational capability and storage

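Background

Information Theory Fundamentals

Entropy: $H(X) = -\sum_{x} p(x) \log_2 p(x)$

where $p(x) = P(X = x)$ is the probability mass function of the discrete random variable X

Joint Entropy: $H(X,Y) = -\sum_{x,y} p(x,y) \log_2 p(x,y)$

Conditional Entropy: $H(X \mid Y) = -\sum_{x,y} p(x,y) \log_2 p(x \mid y) = H(X,Y) - H(Y)$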
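A small numerical check of these definitions may help. The sketch below computes $H(X)$, $H(X,Y)$ and $H(X \mid Y)$ from a joint probability table. The binary source, in which X and Y agree with probability 0.9, is a hypothetical example and is not data from the slides. The function names are also illustrative.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """H = -sum p * log2(p) over the nonzero probabilities of a pmf given as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1) from a joint pmf {(x, y): p}."""
    out = defaultdict(float)
    for xy, p in joint.items():
        out[xy[axis]] += p
    return dict(out)

def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y), i.e. the chain rule rearranged."""
    return entropy(joint) - entropy(marginal(joint, 1))

# Hypothetical source: X and Y are binary and agree with probability 0.9
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
H_X = entropy(marginal(joint, 0))
H_XY = entropy(joint)
H_X_given_Y = conditional_entropy(joint)
print(f"H(X) = {H_X:.3f}, H(X,Y) = {H_XY:.3f}, H(X|Y) = {H_X_given_Y:.3f}")
# H(X) = 1.000, H(X,Y) = 1.469, H(X|Y) = 0.469
```

Here $H(X \mid Y) \approx 0.469$ bits is well below $H(X) = 1$ bit. Knowing Y removes most of the uncertainty about X, and this is the redundancy that distributed coding exploits at the decoder.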

Lower bound on the bitrate of each signal (separate encoding and decoding): $R_X \ge H(X)$, $R_Y \ge H(Y)$

Lower bound on the total bitrate (joint encoding and decoding): $R_X + R_Y \ge H(X,Y)$


Slepian-Wolf Theorem

Consider two statistically dependent sequences X and Y that are encoded separately but decoded jointly

Is it possible to recover these dependent sequences with an arbitrarily small probability of reconstruction error?

In 1973, Slepian and Wolf determined the rate combinations $(R_X, R_Y)$ for which X and Y can be reconstructed with an arbitrarily small error probability

These bounds are given by the conditional entropies of X and Y and by their joint entropy


The bounds on the rates are

$R_X \ge H(X \mid Y), \quad R_Y \ge H(Y \mid X), \quad R_X + R_Y \ge H(X,Y)$
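The following sketch tests whether a rate pair satisfies the three Slepian-Wolf bounds, which makes the rate region concrete. It assumes, for illustration only, a uniform binary X and a Y obtained by flipping X with probability 0.1. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2 (1 - p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def in_slepian_wolf_region(r_x, r_y, h_x_given_y, h_y_given_x, h_xy, tol=1e-12):
    """True if the rate pair (r_x, r_y) satisfies all three Slepian-Wolf bounds."""
    return (r_x >= h_x_given_y - tol
            and r_y >= h_y_given_x - tol
            and r_x + r_y >= h_xy - tol)

# Assumed source model: X uniform binary, Y = X flipped with probability 0.1
h = binary_entropy(0.1)  # H(X|Y) = H(Y|X), about 0.469 bits
H_XY = 1.0 + h           # H(X,Y) = H(Y) + H(X|Y)

for r_x, r_y in [(h, 1.0), (0.75, 0.75), (0.7, 0.7), (0.4, 1.2)]:
    print(f"R_X={r_x:.3f}, R_Y={r_y:.3f}:", in_slepian_wolf_region(r_x, r_y, h, h, H_XY))
```

The pair (0.75, 0.75) passes even though each rate is below the 1 bit that separate coding of each source would need. The pair (0.7, 0.7) fails the sum-rate bound, and (0.4, 1.2) fails because $R_X < H(X \mid Y)$.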


Even when correlated sources are encoded independently, a total bitrate equal to their joint entropy is sufficient

Theoretically, therefore, separate encoding in distributed video coding need not incur any loss in compression efficiency compared with conventional video coding techniques

The bounds define an achievable rate region for reconstructing the dependent sequences with an arbitrarily small probability of error



Wyner-Ziv Theorem

In 1976, Wyner and Ziv studied a special case of Slepian-Wolf coding corresponding to the rate point $(R_X, R_Y) = (H(X \mid Y), H(Y))$

It deals with source coding of a sequence X when the sequence Y (known as side information) is available at the decoder

In its lossy form, this problem is known as lossy compression with decoder side information

The source values X are encoded without access to the side information Y

The decoder has access to Y and obtains a reconstruction of the source values

A certain distortion is acceptable; the Wyner-Ziv rate-distortion function $R^{WZ}_{X|Y}(D)$ is the achievable lower bound on the bitrate for an average distortion D

Mathematically,

$R^{WZ}_{X|Y}(D) \ge R_{X|Y}(D), \quad D \ge 0$

where $R_{X|Y}(D)$ is the minimum rate necessary to encode X when Y is also available at the encoder, i.e. when the statistical dependency between X and Y is exploited while encoding X, for an average distortion D

Note that for no distortion, i.e. D = 0, we obtain the same result as the Slepian-Wolf theorem: $R^{WZ}_{X|Y}(0) = R_{X|Y}(0) = H(X \mid Y)$

The inequality of the Wyner-Ziv theorem becomes an equality for Gaussian memoryless sources and the mean-squared error distortion measure: $R^{WZ}_{X|Y}(D) = R_{X|Y}(D)$
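For the Gaussian case, the equality gives a closed form under MSE distortion: $R^{WZ}_{X|Y}(D) = R_{X|Y}(D) = \max\left(0, \tfrac{1}{2}\log_2(\sigma^2_{X|Y}/D)\right)$, where $\sigma^2_{X|Y}$ is the conditional variance of X given Y. The sketch below evaluates this formula for an assumed model $X = Y + N$ with independent Gaussian Y and N. The variances and the function names are illustrative.

```python
import math

def rate_gaussian(variance, distortion):
    """R(D) = max(0, 0.5 * log2(variance / D)) bits/sample for a memoryless Gaussian source under MSE."""
    return max(0.0, 0.5 * math.log2(variance / distortion))

def wyner_ziv_rate_gaussian(cond_variance, distortion):
    """For jointly Gaussian X, Y and MSE, R^WZ_{X|Y}(D) = R_{X|Y}(D): the R(D) formula
    applied to the conditional variance of X given Y."""
    return rate_gaussian(cond_variance, distortion)

# Assumed model: X = Y + N with Y ~ N(0, 1) and N ~ N(0, 0.1) independent,
# so Var(X) = 1.1 and Var(X | Y) = Var(N) = 0.1
var_x, var_x_given_y = 1.1, 0.1
for D in (0.1, 0.025, 0.00625):
    print(f"D={D}: without side info {rate_gaussian(var_x, D):.3f} b/sample, "
          f"Wyner-Ziv {wyner_ziv_rate_gaussian(var_x_given_y, D):.3f} b/sample")
```

Because the rate loss is zero in this case, the decoder-only side information saves the full $\tfrac{1}{2}\log_2(1.1/0.1) \approx 1.73$ bits/sample at every distortion small enough that both rates are positive.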

In 1996, Zamir proved that for general statistics and the mean-squared error distortion measure, the rate loss is less than 0.5 bits/sample: $R^{WZ}_{X|Y}(D) - R_{X|Y}(D) < 0.5$

Combining this with the Wyner-Ziv theorem, we have

$R_{X|Y}(D) \le R^{WZ}_{X|Y}(D) < R_{X|Y}(D) + 0.5$

Distributed Source Coding

The term 'distributed' refers to the mode of the encoding operation, not to location

Two or more dependent sources are coded independently, i.e. a separate, independent encoder is associated with each source

Each encoder sends an independent bitstream: the signals are encoded without exploiting the correlation between them

A single decoder jointly decodes all received bitstreams, exploiting the statistical dependencies between them

Basic Architecture

Pixel-domain Codec

Quantization & Dequantization

The quantizer divides the signal space into cells, which may consist of non-contiguous sub-cells mapped into the same quantizer index Q

Practical implementations of the Lloyd algorithm for optimal vector quantizers either perform poorly or are prohibitively complex

Unfortunately, requiring code cells to be contiguous precludes optimality of the quantizer in general
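Coset binning is one common way to obtain non-contiguous cells. A fine uniform quantizer is used, but its index is sent only modulo a small number of cosets, so several disjoint intervals share one transmitted index. The decoder then picks, among the intervals with the received label, the one nearest to the side information y. This is a sketch of that idea under assumed parameters (`STEP` and `N_COSETS` are hypothetical). It is not the quantizer of any specific codec.

```python
import math

STEP = 1.0    # hypothetical fine quantizer step
N_COSETS = 4  # hypothetical number of cosets: only log2(4) = 2 bits are sent per sample

def encode(x):
    """Fine uniform quantization, then transmit only the coset label (index mod N_COSETS).
    Cells N_COSETS steps apart share a label, so each label covers disjoint intervals."""
    return math.floor(x / STEP) % N_COSETS

def decode(label, y):
    """Among nearby cells carrying this label, choose the one whose midpoint is closest
    to the side information y, and reconstruct at that midpoint."""
    base = math.floor(y / STEP)
    candidates = [k for k in range(base - N_COSETS, base + N_COSETS + 1) if k % N_COSETS == label]
    best = min(candidates, key=lambda k: abs((k + 0.5) * STEP - y))
    return (best + 0.5) * STEP

print(decode(encode(7.3), 6.8))  # 7.5: side information close to x, correct cell
print(decode(encode(7.3), 9.9))  # 11.5: side information too far from x, wrong cell
```

With these parameters, decoding is guaranteed to be correct whenever |x - y| < 1.5. In that case the midpoint of the true cell is less than 2 away from y, while every other cell with the same label is at least 2 away. The second call shows the failure mode when the side information is too far from the source.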

Introducing a rate measure that depends on both the quantization index and the side information decouples the dimensionality of the quantizer from the block length of the Slepian-Wolf coder, which is a fundamental requirement for practical system design

At high rates, and under certain other conditions, lattice quantizers are optimal for Wyner-Ziv coding

Disconnected quantization cells need not be mapped into the same index

Asymptotically, there is no performance loss from not having access to the side information at the encoder

Slepian-Wolf Encoder & Decoder

An unconventional video coding system

Individual frames are encoded independently but decoded conditionally

Only intraframe processing is required at the encoder

Interframe processing is performed only at the decoder

Previously decoded frames are used as side information for decoding a Wyner-Ziv coded frame

Performance is closer to that of conventional interframe coding (MPEG) than to that of conventional intraframe coding (Motion-JPEG)

Encoding may be performed in the pixel domain or in the transform domain


The Slepian-Wolf codec can be implemented using any of the following:

DISCUS (DIstributed Source Coding Using Syndromes)

Turbo codes, such as RCPT (Rate-Compatible Punctured Turbo codes)

LDPC (Low-Density Parity-Check) codes

IRA (Irregular Repeat-Accumulate) codes
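The classic (7,4) Hamming code example illustrates the syndrome approach used by DISCUS. The sketch assumes that the 7-bit words X and Y differ in at most one position. The encoder sends only the 3-bit syndrome of X, and the decoder uses Y to determine which of the 16 words with that syndrome was sent. This is a textbook illustration, not the RCPT-based codec discussed next, and the function names are hypothetical.

```python
# Parity-check matrix of the (7,4) Hamming code: column j (0-based) is the binary form of j + 1
H = [[((j + 1) >> bit) & 1 for j in range(7)] for bit in range(3)]

def syndrome(word):
    """3-bit syndrome H * word (mod 2) of a 7-bit word given as a list of 0/1 values."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def sw_encode(x):
    """Slepian-Wolf encoder: transmit only the syndrome of x (3 bits instead of 7)."""
    return syndrome(x)

def sw_decode(s, y):
    """Recover x from its syndrome s and side information y, assuming x and y differ
    in at most one bit. s XOR syndrome(y) = H * (x XOR y) names the differing position."""
    diff = [a ^ b for a, b in zip(s, syndrome(y))]
    position = sum(bit << i for i, bit in enumerate(diff))  # 0 means x == y
    x_hat = list(y)
    if position:
        x_hat[position - 1] ^= 1  # flip the bit where x and y differ
    return x_hat

x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 0, 0, 1]  # side information: x with bit 3 flipped
print(sw_decode(sw_encode(x), y) == x)  # True
```

Suppose X is uniform and the eight difference patterns (no error, or one of seven single-bit errors) are equally likely. Then $H(X \mid Y) = 3$ bits, so the 3-bit syndrome meets the Slepian-Wolf bound for this toy source.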


Encoding Techniques

Pixel-domain Codec using RCPT

A subset of frames, regularly spaced in the video sequence, is selected as keyframes, K

Keyframes are encoded and decoded using conventional intraframe coding based on the 8x8 Discrete Cosine Transform (DCT)

Frames between keyframes are called “Wyner-Ziv frames”

Wyner-Ziv frames are intraframe-encoded but interframe-decoded

For each Wyner-Ziv frame S, each pixel value is uniformly quantized with $2^M$ intervals

Subtractive dithering is applied to avoid contouring and to improve the subjective quality of the reconstructed image

A sufficiently large block of quantizer indices q is provided to the Slepian-Wolf encoder
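The following is a minimal sketch of the quantization step. It assumes 8-bit pixels, a hypothetical $M = 2$, and a dither drawn uniformly over one quantization interval from a pseudo-random generator whose seed is shared by encoder and decoder. The function names are illustrative. Subtracting the same dither after dequantization makes the reconstruction error independent of the pixel value, which is what removes contouring.

```python
import math
import random

def quantize(pixels, m_bits, dither):
    """Uniform quantizer with 2**m_bits intervals over the 8-bit range [0, 256), applied after adding the dither."""
    step = 256 / 2 ** m_bits
    # No clipping: near 0 or 255 the dither can push an index one step outside 0..2**m_bits - 1
    return [math.floor((p + d) / step) for p, d in zip(pixels, dither)]

def dequantize(indices, m_bits, dither):
    """Reconstruct at the interval midpoint, then subtract the same dither (subtractive dithering)."""
    step = 256 / 2 ** m_bits
    return [(q + 0.5) * step - d for q, d in zip(indices, dither)]

M = 2                      # hypothetical: 2**M = 4 intervals of width 64
step = 256 / 2 ** M
rng = random.Random(2024)  # encoder and decoder are assumed to share this seed
pixels = [12, 80, 130, 200, 255]
dither = [rng.uniform(-step / 2, step / 2) for _ in pixels]
q = quantize(pixels, M, dither)
recon = dequantize(q, M, dither)
print(q, [round(r, 1) for r in recon])
```

The bound |recon - pixel| <= step / 2 holds for every pixel because the index is not clipped. A practical codec would need to handle the one-step overflow at the ends of the range.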

RCPT provides rate flexibility: the rate adapts to the changing statistics between the side information and the frame to be encoded

In this system, the rate of the RCPT is chosen by the decoder and relayed to the encoder through feedback

For each Wyner-Ziv frame, the decoder generates side information $S'$ by using previously decoded keyframes and, possibly, previously decoded Wyner-Ziv frames


To exploit the side information, the decoder assumes a statistical model of the ‘correlation channel’

A Laplacian distribution is assumed for the difference between the individual pixel values of S and of the side information $S'$

The decoder estimates the parameter of the Laplacian distribution by observing the statistics of previously decoded frames
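Under the Laplacian model, $f(d) = \frac{\alpha}{2} e^{-\alpha |d|}$ has variance $2/\alpha^2$, so one simple estimator is $\hat{\alpha} = \sqrt{2 / \overline{d^2}}$. The sketch below applies this estimator to synthetic residuals. The moment-based estimator, the synthetic data and the function names are assumptions, because the slides do not specify how the parameter is estimated.

```python
import math
import random

def estimate_laplacian_alpha(decoded, side_info):
    """Fit f(d) = (alpha / 2) * exp(-alpha * |d|) to d = decoded - side_info.
    A zero-mean Laplacian has variance 2 / alpha**2, so alpha = sqrt(2 / E[d**2])."""
    residuals = [a - b for a, b in zip(decoded, side_info)]
    mean_square = sum(r * r for r in residuals) / len(residuals)
    return math.sqrt(2.0 / mean_square)

def laplacian_pdf(d, alpha):
    return 0.5 * alpha * math.exp(-alpha * abs(d))

# Synthetic stand-in for "previously decoded frame vs. its side information":
# the difference of two i.i.d. exponentials with rate alpha is Laplacian with parameter alpha
rng = random.Random(0)
true_alpha = 0.25
side_info = [rng.randint(0, 255) for _ in range(20000)]
decoded = [s + rng.expovariate(true_alpha) - rng.expovariate(true_alpha) for s in side_info]
alpha_hat = estimate_laplacian_alpha(decoded, side_info)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.3f}")
```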


The turbo decoder combines the side information and the received parity bits to recover the symbol stream

If the decoder cannot reliably decode the original symbols, it requests additional parity bits from the encoder buffer through feedback

This “request-and-decode” process is repeated until an acceptable probability of symbol error is achieved
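A real turbo decoder is too long to reproduce here, but the feedback logic can be sketched with a toy stand-in. The sketch uses random parity bits instead of RCPT parity, and a brute-force search over words close to the side information instead of turbo decoding. The decoder keeps requesting one more parity bit until a single candidate remains. The seed, the search radius and the helper names are hypothetical. The "unique candidate" test replaces the acceptable-error-probability criterion of the real system.

```python
import itertools
import random

def parity(word, row):
    """One parity bit: XOR of the bits of `word` selected by the 0/1 mask `row`."""
    return sum(w & r for w, r in zip(word, row)) % 2

def candidates_near(y, radius):
    """Every word within Hamming distance `radius` of the side information y."""
    for r in range(radius + 1):
        for positions in itertools.combinations(range(len(y)), r):
            word = list(y)
            for p in positions:
                word[p] ^= 1
            yield word

def request_and_decode(x, y, radius, seed=7, max_bits=64):
    """Toy request-and-decode loop. The encoder buffer is modelled as a stream of
    random parity bits of x (masks come from a seed shared by both sides). The
    decoder requests one more parity bit at a time until exactly one word near
    its side information y agrees with every parity bit received so far."""
    rng = random.Random(seed)
    rows, received = [], []
    while len(received) < max_bits:
        row = [rng.randint(0, 1) for _ in x]
        rows.append(row)
        received.append(parity(x, row))  # encoder side: release one more bit from the buffer
        matches = [w for w in candidates_near(y, radius)
                   if all(parity(w, r) == b for r, b in zip(rows, received))]
        if len(matches) == 1:
            return matches[0], len(received)
    raise RuntimeError("no unique decoding within max_bits parity bits")

data_rng = random.Random(42)
x = [data_rng.randint(0, 1) for _ in range(24)]
y = list(x)
y[3] ^= 1
y[17] ^= 1  # side information: x with two bit errors
x_hat, bits_used = request_and_decode(x, y, radius=2)
print(f"recovered: {x_hat == x}, parity bits requested: {bits_used} of {len(x)}")
```

Because x is always among the candidates, the single survivor is x itself. With only two bit errors in 24 bits, the loop typically stops well before 24 parity bits have been requested. The exact count depends on the seed.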


Using the side information, the decoder predicts the quantization bin q

Because of this, the decoder needs to request at most M bits to establish which of the $2^M$ bins a pixel belongs to

With the decoded bin $q'$ and the side information $S'$, the decoder computes the MMSE reconstruction $\hat{S} = E[S \mid q', S']$ of the original frame S


If the side information $S'$ lies within the reconstructed bin $q'$, the reconstructed pixel $\hat{S}$ takes a value close to $S'$

Otherwise, $S'$ lies outside $q'$, and the reconstruction function forces $\hat{S}$ to lie within the bin

The magnitude of the reconstruction error is therefore limited to a maximum value determined by the quantizer coarseness. This is a perceptually desirable property, since it eliminates large errors that might be annoying to the viewer
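The MMSE reconstruction can be evaluated numerically. The sketch below assumes that, given $S'$, the pixel S is Laplacian around $S'$ with parameter $\alpha$. It integrates over the decoded bin with the midpoint rule. The bin limits, $\alpha$ and the function name are illustrative.

```python
import math

def mmse_reconstruct(bin_lo, bin_hi, side_info, alpha, steps=2000):
    """E[S | bin_lo <= S < bin_hi, S'] assuming S given S' is Laplacian around S'
    with parameter alpha. The integral is evaluated with the midpoint rule."""
    nearest = min(max(side_info, bin_lo), bin_hi)  # point of the bin closest to S'
    offset = abs(nearest - side_info)              # rescales weights to avoid underflow
    width = (bin_hi - bin_lo) / steps
    num = den = 0.0
    for i in range(steps):
        s = bin_lo + (i + 0.5) * width
        weight = math.exp(-alpha * (abs(s - side_info) - offset))  # unnormalised density
        num += s * weight
        den += weight
    return num / den

# Hypothetical numbers: decoded bin q' = [64, 128), Laplacian parameter alpha = 0.1
print(round(mmse_reconstruct(64, 128, 90, 0.1), 2))   # S' inside the bin: result stays near 90
print(round(mmse_reconstruct(64, 128, 150, 0.1), 2))  # S' above the bin: pulled inside, toward 128
```

The result is a weighted average of points inside the bin, so it can never leave $q'$. This is why the reconstruction error is bounded by the bin width, whatever the side information.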


Compared with conventional motion-compensated coding, pixel-domain Wyner-Ziv coding is much less complex

Motion estimation, prediction and the DCT are not required for encoding Wyner-Ziv frames

The Slepian-Wolf (turbo) encoder requires only two feedback shift registers and an interleaver



Transform-domain Codec

Transform-domain Encoding using RCPT

A block-wise DCT is applied to the Wyner-Ziv frame W in the encoder to generate the transformed signal X

The transform coefficients are grouped together to form coefficient bands $X_k$, where k denotes the coefficient number

Each transform coefficient band is then encoded independently
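The following is a sketch of the block transform and band grouping. It assumes 4x4 blocks (the slides do not state the block size), an orthonormal DCT-II, and raster ordering of the coefficients inside each block. Practical codecs often use zigzag order instead. The helper names are illustrative. The inverse `idct2` corresponds to the final IDCT step at the decoder.

```python
import math

N = 4  # block size: an assumption of this sketch

# Orthonormal DCT-II matrix: coefficients = C * block * C^T, block = C^T * coefficients * C
C = [[math.sqrt((1 if u == 0 else 2) / N) * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)

def coefficient_bands(frame):
    """Block-wise DCT of a frame (dimensions assumed to be multiples of N); band k collects
    coefficient k (raster order inside the block) from every block."""
    bands = [[] for _ in range(N * N)]
    for by in range(0, len(frame), N):
        for bx in range(0, len(frame[0]), N):
            block = [row[bx:bx + N] for row in frame[by:by + N]]
            coeffs = dct2(block)
            for k in range(N * N):
                bands[k].append(coeffs[k // N][k % N])
    return bands

frame = [[(3 * x + 5 * y) % 256 for x in range(8)] for y in range(8)]  # toy 8x8 "frame"
bands = coefficient_bands(frame)
print(len(bands), "bands of", len(bands[0]), "coefficients each")  # 16 bands of 4 coefficients each
```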

For each band $X_k$, the coefficients are quantized using a uniform scalar quantizer with $2^{M_k}$ levels

The quantized symbols $q_k$ are converted to fixed-length binary codewords

Bits of equal significance from all codewords in the band are blocked together to form bit-plane vectors

Each bit-plane vector is coded by the Slepian-Wolf encoder
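The quantization and bit-plane steps can be sketched as follows. The per-band range `max_abs`, which both encoder and decoder are assumed to know, is an assumption of this sketch, as are the function names. `from_bitplanes` performs the regrouping that the decoder carries out once all bit-planes are decoded.

```python
def quantize_band(coeffs, m_bits, max_abs):
    """Uniform scalar quantizer with 2**m_bits levels over [-max_abs, max_abs];
    values outside the range saturate to the first or last level."""
    levels = 2 ** m_bits
    step = 2 * max_abs / levels
    return [min(levels - 1, max(0, int((c + max_abs) // step))) for c in coeffs]

def to_bitplanes(symbols, m_bits):
    """Fixed-length binary codewords regrouped into bit-plane vectors, most significant plane first."""
    return [[(q >> b) & 1 for q in symbols] for b in range(m_bits - 1, -1, -1)]

def from_bitplanes(planes):
    """Inverse regrouping: rebuild the quantized symbol stream from its bit-planes."""
    m_bits = len(planes)
    return [sum(bit << (m_bits - 1 - i) for i, bit in enumerate(column)) for column in zip(*planes)]

band = [-30.0, -2.5, 0.0, 7.9, 31.9]             # toy coefficient band X_k
q = quantize_band(band, m_bits=3, max_abs=32.0)  # [0, 3, 4, 4, 7]
planes = to_bitplanes(q, 3)                      # [[0, 0, 1, 1, 1], [0, 1, 0, 0, 1], [0, 1, 0, 0, 1]]
print(q, planes)
```

Each inner list of `planes` is one bit-plane vector. In the codec, each such vector would be passed to the Slepian-Wolf encoder, starting with the most significant one.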


The Slepian-Wolf coder is implemented using RCPT. Combined with feedback, RCPT provides the rate flexibility that is essential for adapting to the changing statistics between the side information and the frame to be encoded

The parity bits produced by the turbo encoder are stored in a buffer

The buffer transmits a subset of these parity bits to the decoder on request


Transform-domain Decoding using RCPT

The decoder uses previously reconstructed frames to form side information $\hat{W}$, an estimate of W

A block-wise DCT of $\hat{W}$ is taken to generate Y; the transform coefficients of Y are grouped together to form coefficient bands $Y_k$ (the side information corresponding to $X_k$)

To be able to use $Y_k$ in the turbo decoder and the reconstruction block, a statistical dependency model between $X_k$ and $Y_k$ is assumed

Given a coefficient band, the turbo decoder successively decodes its bit-planes, starting from the most significant bit-plane

To decode the current bit-plane, the decoder uses the received subset of parity bits for that bit-plane together with the side information

If the decoder cannot reliably decode the bits, it requests additional parity bits from the encoder buffer through feedback


This “request-and-decode” process continues until an acceptable probability of reconstruction error is achieved

The probabilities generated for the current bit-plane are used when decoding the bit-planes of lower significance

By using the side information and successively decoding bit-planes, the decoder needs to request at most $M_k$ bits to decide which of the $2^{M_k}$ bins a transform coefficient belongs to


When all bit-planes are decoded, the bits are regrouped and the quantized symbol stream $q'_k$ is reconstructed

The reconstructed coefficient band is calculated as $X'_k = E[X_k \mid q'_k, Y_k]$, where $Y_k$ is the corresponding side-information band. Assuming $q'_k$ is error-free, this reconstruction function bounds the magnitude of the reconstruction distortion to a maximum value that depends on the quantizer coarseness


This property is desirable since it eliminates large positive or negative errors for a given transform coefficient

Fewer errors are perceptible to the viewer, so the subjective quality of the reconstructed video improves

Finally, the reconstructed Wyner-Ziv frame is generated by taking the IDCT of the reconstructed coefficient bands


Side Information

Motion-Compensated Side Information

Motion-compensated side information is generated at the decoder

As a result, decoders are more complex than encoders

Here every odd frame is taken to be a keyframe and every even frame a Wyner-Ziv frame

Frames may or may not be decoded in their actual display order (as in conventional video coding techniques)

Motion-compensated Interpolation (MC-I)

Side information for a Wyner-Ziv frame at time index t is generated by motion-compensated interpolation from the decoded keyframes at times $t-1$ and $t+1$

This involves symmetric bidirectional block matching, smoothness constraints on the estimated motion, and overlapped block motion compensation

Since the next keyframe is needed for interpolation, frames are decoded out of order (similar to B-frames in predictive video coding)
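The core of MC-I can be sketched as symmetric bidirectional block matching. For brevity, the sketch below omits the smoothness constraint and overlapped block motion compensation, and it uses SAD as the matching cost. It also assumes that the Wyner-Ziv frame lies exactly halfway between the keyframes. The block size, the search range, the toy frames and the function names are all hypothetical.

```python
def get_block(frame, y, x, b):
    """b-by-b block whose top-left corner is (y, x)."""
    return [row[x:x + b] for row in frame[y:y + b]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def mc_interpolate(prev_key, next_key, b=4, search=2):
    """Side information halfway between two keyframes by symmetric bidirectional block
    matching: for each block position, pick the vector v minimising the SAD between
    prev_key at (pos - v) and next_key at (pos + v), then average the two matched blocks.
    Frame dimensions are assumed to be multiples of b."""
    h, w = len(prev_key), len(prev_key[0])
    side = [[0.0] * w for _ in range(h)]
    for by in range(0, h, b):
        for bx in range(0, w, b):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px, ny, nx = by - dy, bx - dx, by + dy, bx + dx
                    if min(py, px, ny, nx) < 0 or max(py, ny) + b > h or max(px, nx) + b > w:
                        continue  # a matched block would fall outside the frame
                    cost = sad(get_block(prev_key, py, px, b), get_block(next_key, ny, nx, b))
                    if best is None or cost < best[0]:
                        best = (cost, py, px, ny, nx)
            _, py, px, ny, nx = best  # v = (0, 0) is always valid, so best is set
            for i in range(b):
                for j in range(b):
                    side[by + i][bx + j] = 0.5 * (prev_key[py + i][px + j] + next_key[ny + i][nx + j])
    return side

def square_frame(col, size=16):
    """Toy frame: a 4x4 bright square in rows 4..7, starting at column `col`."""
    return [[100 if 4 <= y < 8 and col <= x < col + 4 else 0 for x in range(size)]
            for y in range(size)]

prev_key, next_key = square_frame(4), square_frame(8)  # keyframes at t-1 and t+1
side = mc_interpolate(prev_key, next_key)
print(side == square_frame(6))  # True: the square is placed halfway, at column 6
```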

Motion-compensated Extrapolation (MC-E)

To generate side information for the Wyner-Ziv frame at time index t, the motion between the previously decoded Wyner-Ziv frame at time $t-2$ and the previously decoded keyframe at time $t-1$ is estimated using block matching and a smoothness constraint

The estimated motion is extrapolated to time t, and the side information is generated by overlapped motion compensation using pixel values from the previous keyframe

Since a previously decoded Wyner-Ziv frame is used for motion estimation, reconstruction errors from all previously decoded Wyner-Ziv frames can accumulate and degrade the reliability of motion compensation

Unlike MC-I, MC-E allows all frames to be decoded sequentially
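The effect on decoding order can be shown with the odd/even frame assignment used here. This sketch assumes that frames are numbered from 1. It also assumes that a trailing Wyner-Ziv frame without a following keyframe is decoded last. The function names are hypothetical.

```python
def frame_types(n_frames):
    """Frames numbered from 1: odd frames are keyframes (K), even frames are Wyner-Ziv frames (WZ)."""
    return {t: "K" if t % 2 == 1 else "WZ" for t in range(1, n_frames + 1)}

def decode_order(n_frames, interpolate=True):
    """Interpolation (MC-I, Ave-I) needs keyframe t+1 before WZ frame t, so the order is shuffled;
    extrapolation (MC-E, Prev-E) only looks backwards, so frames are decoded sequentially."""
    if not interpolate:
        return list(range(1, n_frames + 1))
    order = []
    for t in range(1, n_frames + 1, 2):  # keyframes 1, 3, 5, ...
        order.append(t)
        if t > 1:
            order.append(t - 1)          # the WZ frame between the previous keyframe and this one
    if n_frames % 2 == 0:
        order.append(n_frames)           # last WZ frame has no next keyframe (would need extrapolation)
    return order

print(decode_order(7))                     # [1, 3, 2, 5, 4, 7, 6]
print(decode_order(7, interpolate=False))  # [1, 2, 3, 4, 5, 6, 7]
```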

Low-complexity side information

Simplified interpolation or extrapolation schemes reduce decoder complexity at the expense of compression efficiency:

1. Average Interpolation (Ave-I): side information for a Wyner-Ziv frame is generated by averaging the pixel values of the keyframes at times $t-1$ and $t+1$

2. Previous Frame Extrapolation (Prev-E): the previous keyframe is used directly as side information
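Both low-complexity schemes take only a line or two of code. The sketch below compares them on a toy one-row frame in which a bright pixel moves one position per frame. The frames, the mean-absolute-error helper and the function names are illustrative.

```python
def average_interpolation(prev_key, next_key):
    """Ave-I: pixel-wise average of the decoded keyframes at t-1 and t+1."""
    return [[0.5 * (a + b) for a, b in zip(row_p, row_n)] for row_p, row_n in zip(prev_key, next_key)]

def previous_frame_extrapolation(prev_key):
    """Prev-E: the previous decoded keyframe, copied unchanged."""
    return [list(row) for row in prev_key]

def mean_abs_error(a, b):
    flat = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(flat) / len(flat)

# One-row toy frames: a bright pixel moves one position per frame
prev_key = [[0, 100, 0, 0, 0]]  # keyframe at t-1
true_wz = [[0, 0, 100, 0, 0]]   # Wyner-Ziv frame at t (unknown to the decoder)
next_key = [[0, 0, 0, 100, 0]]  # keyframe at t+1

ave_i = average_interpolation(prev_key, next_key)
prev_e = previous_frame_extrapolation(prev_key)
print(ave_i, mean_abs_error(ave_i, true_wz))    # [[0.0, 50.0, 0.0, 50.0, 0.0]] 40.0
print(prev_e, mean_abs_error(prev_e, true_wz))  # [[0, 100, 0, 0, 0]] 40.0
```

In this toy, Ave-I produces two half-intensity copies of the moving pixel (ghosting), and Prev-E leaves the pixel at its old position. Neither scheme finds the true position. This is the loss in side-information quality that these schemes trade for lower decoder complexity.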

Performance

References

B. Girod, A. Aaron, S. Rane, D. Rebollo-Monedero, "Distributed Video Coding" (Invited Paper), Proceedings of the IEEE, Special Issue on Advances in Video Coding and Delivery, 2005

C. Brites, "Advances on Distributed Video Coding", MSc Thesis, Instituto Superior Técnico, Technical University of Lisbon

A. Aaron, R. Zhang, B. Girod, "Wyner-Ziv Coding of Motion Video", Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2002

A. Aaron, E. Setton, B. Girod, "Towards Practical Wyner-Ziv Coding of Video", Proc. IEEE International Conference on Image Processing, Barcelona, Spain, Sept. 2003

A. Aaron, S. Rane, E. Setton, B. Girod, "Transform-domain Wyner-Ziv Codec for Video", Proc. SPIE Visual Communications and Image Processing, San Jose, CA, Jan. 2004

Thank You
