Exploiting Temporal Redundancy of Visual Structures for Video Compression Georgios Georgiadis Stefano Soatto Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA {giorgos,soatto}@cs.ucla.edu We present a video coding system that partitions the scene into “visual structures” and a residual “background” layer. The system exploits the temporal redundancy of visual structures to compress video sequences. We construct a dictionary of track- templates, which correspond to a representation of visual structures. We subsequently choose a subset of the dictionary’s elements to encode video frames using a Markov Random Field (MRF) formulation that places the track-templates in “depth” layers (Fig. 1). Our video coding system o↵ers an improvement over H.265/H.264 and other methods in a rate-distortion comparison (Fig. 2) in four sequences from MOSEG 1 . + Input Visual Structures Layers (Excluding Background Layer) Reconstructed Visual Structures Input Frame Reconstructed Visual Structures Background Layer Reconstructed Frame ≈ = Figure 1: Reconstructing a frame. Visual structures are decomposed into depth layers and reconciled by overlaying them. The input frame is reconstructed by adding back to the visual structures the background layer. 0 50 100 150 200 25 30 35 40 45 Bit Rate (kbps) PSNR (dB) VS+h.265 h.265 VS+h.264 h.264 JPEG 0 50 100 150 200 25 30 35 40 45 Bit Rate (kbps) PSNR (dB) VS+h.265 h.265 VS+h.264 h.264 JPEG 0 50 100 150 200 25 30 35 40 45 Bit Rate (kbps) PSNR (dB) VS+h.265 h.265 VS+h.264 h.264 JPEG 0 50 100 150 200 25 30 35 40 45 Bit Rate (kbps) PSNR (dB) VS+h.265 h.265 VS+h.264 h.264 JPEG Figure 2: Typical results on four sequences (Cars-1,2, People-1,2). Top: PSNR against bit rate. “VS+H.265” (black) and “VS+H.264” (blue) outperform H.265 (yellow) and H.264 (red) respectively. Bottom: Propagated and newly-created tracks. Non-transparent tracks correspond to tracks that are motion-predicted from previous frames. Semi-transparent tracks are tracks that start in this frame. 1 http://lmb.informatik.uni-freiburg.de/resources/datasets/

Exploiting Temporal Redundancy of Visual Structures for

Download PDF Report

Upload
others
View
1
Download
0

Embed Size (px)

Citation preview

Exploiting Temporal Redundancy

of Visual Structures for Video Compression

Georgios Georgiadis Stefano Soatto

Department of Computer Science,

University of California, Los Angeles,

Los Angeles, CA 90095, USA

{giorgos,soatto}@cs.ucla.edu

We present a video coding system that partitions the scene into “visual structures”and a residual “background” layer. The system exploits the temporal redundancy ofvisual structures to compress video sequences. We construct a dictionary of track-templates, which correspond to a representation of visual structures. We subsequentlychoose a subset of the dictionary’s elements to encode video frames using a MarkovRandom Field (MRF) formulation that places the track-templates in “depth” layers(Fig. 1). Our video coding system o↵ers an improvement over H.265/H.264 and othermethods in a rate-distortion comparison (Fig. 2) in four sequences from MOSEG1.

Input Visual Structures Layers (Excluding Background Layer) Reconstructed

Visual Structures

Input Frame

Reconstructed Visual Structures

Background Layer

Reconstructed Frame

≈ =

Figure 1: Reconstructing a frame. Visual structures are decomposed into depth layers andreconciled by overlaying them. The input frame is reconstructed by adding back to thevisual structures the background layer.

0 50 100 150 200

Bit Rate (kbps)

PSNR

(dB)

VS+h.265h.265VS+h.264h.264JPEG

0 50 100 150 200

Bit Rate (kbps)

PSNR

(dB)

VS+h.265h.265VS+h.264h.264JPEG

0 50 100 150 200

Bit Rate (kbps)

PSNR

(dB)

VS+h.265h.265VS+h.264h.264JPEG

0 50 100 150 200

Bit Rate (kbps)

PSNR

(dB)

VS+h.265h.265VS+h.264h.264JPEG

Figure 2: Typical results on four sequences (Cars-1,2, People-1,2). Top: PSNR against bitrate. “VS+H.265” (black) and “VS+H.264” (blue) outperform H.265 (yellow) and H.264(red) respectively. Bottom: Propagated and newly-created tracks. Non-transparent trackscorrespond to tracks that are motion-predicted from previous frames. Semi-transparenttracks are tracks that start in this frame.

1http://lmb.informatik.uni-freiburg.de/resources/datasets/