Low-latency streaming of live-encoded and pre-stored video

HPL Low-latency Video Streaming Project Meeting

Feb. 20, 02


Outline

- Latency in video streaming
- Long-term memory prediction and error-resilience
- Delivery of live-encoded video
- Delivery of pre-encoded video
- Experimental results
- Open issues and future work


Challenges for Low-latency Video Streaming

- Undesirable latency in today's video streaming: a typical streaming system relies on a large receiver buffer and retransmission, which gives 10-15 seconds of latency.
- Today's Internet provides best-effort service with no QoS guarantee.
- Hybrid video codecs: inter frames are predicted from a reference frame with motion compensation (MC), so decoding depends on the reference.

Goal of this work: better management of packet dependency to achieve higher error-resilience and eliminate the need for retransmission
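To make the packet dependency problem concrete, here is a minimal Python sketch (not from this work) that marks which frames decode correctly when some packets are lost. It assumes one frame per packet and that a frame decodes correctly only if its own packet arrived and its reference frame also decoded correctly; `decodable_frames`, `refs` and `lost` are hypothetical names.

```python
def decodable_frames(refs, lost):
    """Return the set of frame indices that decode correctly.

    refs[n] is the index of the frame that frame n is predicted from,
    or None for an intra (I) frame; references always point backwards.
    lost is the set of frame indices whose packets never arrived.
    """
    ok = set()
    for n, ref in enumerate(refs):
        # A frame decodes only if its own packet arrived and, for an inter
        # frame, its reference frame decoded correctly as well.
        if n not in lost and (ref is None or ref in ok):
            ok.add(n)
    return ok


# One GOP of 10 frames, I P P P ..., each P predicted from the previous frame.
refs = [None] + list(range(9))
print(sorted(decodable_frames(refs, lost={3})))
```

With a conventional I P P P … chain, the single loss of frame 3 leaves only frames 0 to 2 decodable: every later frame in the GOP is corrupted until the next I frame, which is why conventional systems fall back on retransmission.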


LTM Prediction and Packet Dependency

Long-term memory (LTM) prediction at the macroblock level:
- Higher coding efficiency [Wiegand, Zhang, Girod '99]
- Higher error resilience [Wiegand, Färber, Girod '00]

Reference Picture Selection (RPS) in Annex N of H.263+

(Figure: RPS with NACK feedback)

In this work: extended RPS
- Dynamically manage packet dependency
- LTM prediction at the frame level
- Packetize one frame into one IP packet for transmission
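Reference selection with NACK feedback over a frame-level long-term memory can be sketched roughly as follows. This Python sketch is illustrative only: the class `LtmEncoderState`, the `memory` size, and the policy "use the newest frame in memory not known to be corrupted" are assumptions in the spirit of NACK-based RPS, not the extended RPS of this work, which selects references within an RD framework.

```python
from collections import deque


class LtmEncoderState:
    """Hypothetical encoder-side bookkeeping for frame-level LTM prediction.

    Keeps the indices of the last `memory` encoded frames (one frame per
    packet) and the frames the decoder has reported lost via NACK.
    """

    def __init__(self, memory):
        self.buffer = deque(maxlen=memory)  # oldest indices drop out automatically
        self.refs = {}                      # frame index -> reference index (None = intra)
        self.nacked = set()

    def _corrupted(self, n):
        # A frame is unusable if it, or any frame in its prediction chain,
        # was reported lost.
        while n is not None:
            if n in self.nacked:
                return True
            n = self.refs[n]
        return False

    def on_nack(self, n):
        self.nacked.add(n)

    def choose_reference(self):
        # Newest frame in the long-term memory not known to be corrupted;
        # None means the frame must be coded as INTRA.
        for n in reversed(self.buffer):
            if not self._corrupted(n):
                return n
        return None

    def encode(self, n):
        ref = self.choose_reference()
        self.refs[n] = ref
        self.buffer.append(n)
        return ref


enc = LtmEncoderState(memory=5)
for n in range(4):
    enc.encode(n)      # frame 0 is intra, then each frame references n - 1
enc.on_nack(2)         # the decoder reports frame 2 lost
print(enc.encode(4))   # frames 3 and 2 are skipped as corrupted
```

After the NACK for frame 2, the encoder skips frame 3 (predicted from 2) and frame 2 itself and predicts frame 4 from frame 1, so the error stops propagating without any retransmission.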


Error Resilience vs. Coding Efficiency

(Figure: prediction structures of picture types P1, P2, P5 and I)

Different picture types (prediction structures) provide different error resilience, at the cost of coding efficiency.

230 frames of Foreman coded with H.26L TML8.5; average PSNR = 33.4 dB.

Extension of picture types:
- INTER frame: P -> P1
- Extended INTER: P2, P3, …, PV
- INTRA: I
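Why extended INTER types help can be seen by counting how many packets a frame depends on. The Python sketch below reads P_v as "predicted from frame n - v" (consistent with the Layer II description of PV frames later in the deck) and assumes independent losses with rate p; `chain_length` and `p_decodable` are hypothetical helpers, and the rate penalty of larger v is not modeled.

```python
import math


def chain_length(d, v):
    """Frames that must all arrive for the frame d positions after its GOP's
    I frame to decode, when every inter frame is a P_v frame.

    Assumed convention: a P_v frame references frame n - v, and a frame
    fewer than v positions after the I frame references the I frame itself.
    """
    if d == 0:
        return 1  # the I frame depends only on itself
    # the frame itself and its P_v ancestors (ceil(d / v) in total), plus the I frame
    return math.ceil(d / v) + 1


def p_decodable(d, v, p):
    """Probability the frame decodes, assuming independent packet losses with
    rate p and one frame per packet (error concealment ignored)."""
    return (1 - p) ** chain_length(d, v)


for v in (1, 2, 5):
    print(f"P{v}: frames in chain = {chain_length(20, v)}, "
          f"P(decodable) = {p_decodable(20, v, 0.10):.3f}")
```

For a frame 20 positions into the GOP at p = 0.10, the chain shrinks from 21 frames for P1 to 5 for P5, raising the decoding probability from about 0.11 to about 0.59; the extra bits that P5 costs are what the RD selection weighs against this gain.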


Optimal Reference Picture Selection

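$$ v_{\mathrm{opt}}(n) = \arg\min_{v \in \{1, 2, \ldots, V, \infty\}} J_v(n) \qquad (v = \infty: \text{INTRA}) $$

$$ J_v(n) = D_v(n) + \lambda R_v(n) $$

$$ D_v(n) = \sum_{l=1}^{L} p_{v,l}\, D_{v,l}, \qquad \text{max. number of outcomes: } L = 2^{d_{fb}} $$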

The optimal reference picture is selected within a rate-distortion (RD) framework as the candidate with minimal cost J.
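The RD selection can be illustrated with a small Python sketch. It assumes each candidate v has a rate R_v and an expected distortion obtained by weighting per-outcome distortions by their probabilities, where the outcomes are the receive/loss patterns of frames whose feedback has not yet arrived, with independent losses at rate p. The rates, distortions, λ, the list of unacknowledged frames per candidate, and the two-level distortion model (`d_ok` / `d_err`) are hypothetical placeholders, not values or models from this work.

```python
import math
from itertools import product


def outcomes(unknown, p, d_ok, d_err):
    """(probability, distortion) pairs over every receive/loss pattern of the
    frames in `unknown`, whose status the encoder has not yet learned.

    Simplifying assumption: the new frame is reconstructed with distortion
    d_ok if all of these frames arrived and with d_err otherwise.
    """
    pairs = []
    for pattern in product((True, False), repeat=len(unknown)):  # True = received
        prob = 1.0
        for received in pattern:
            prob *= (1 - p) if received else p  # independent losses
        pairs.append((prob, d_ok if all(pattern) else d_err))
    return pairs


def select_reference(candidates, p, lam):
    """Return (v, J_v) minimizing J_v = sum_l p_{v,l} D_{v,l} + lam * R_v."""
    best_v, best_j = None, math.inf
    for v, (rate, unknown, d_ok, d_err) in candidates.items():
        d_exp = sum(prob * d for prob, d in outcomes(unknown, p, d_ok, d_err))
        j = d_exp + lam * rate
        if j < best_j:
            best_v, best_j = v, j
    return best_v, best_j


# Hypothetical numbers: v -> (rate, unacknowledged frames it depends on, D_ok, D_err).
# math.inf stands for an INTRA frame, which depends on no earlier frame.
candidates = {
    1: (20.0, ["n-1", "n-2", "n-3", "n-4", "n-5", "n-6"], 30.0, 400.0),
    5: (28.0, ["n-5"], 34.0, 400.0),
    math.inf: (80.0, [], 30.0, 30.0),
}
v_opt, cost = select_reference(candidates, p=0.10, lam=1.0)
print(v_opt, round(cost, 1))
```

With these made-up numbers the sketch picks v = 5 (J ≈ 98.6): P1 is cheapest in rate but depends on six unacknowledged frames, while an I frame is safe but expensive. The number of enumerated patterns grows as 2^k in the number k of unacknowledged frames, so k must stay small for exhaustive enumeration.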


Live-encoding – Results (1)

V = 5, d_fb = 7, p = 0.10.

Rate-distortion performance:


Live-encoding – Results (2)

V = 5, d_fb = 7.

Foreman, distortion vs. channel loss rate.

d_fb = 7, p = 0.10.

Foreman, distortion vs. length of LTM.


Cost of Error-resilience (1)

Error-resilience / low-latency is not free

PSNR at encoder (dB)   Bitrate increase, 5% loss   Bitrate increase, 10% loss
37.8                   14%                         35%
35.9                   20%                         43%
33.4                   17%                         39%

Distortion at the encoder; V = 5, d_fb = 7.


Cost of Error-resilience (2)

PSNR at encoder (dB)   Bitrate increase, 5% loss   Bitrate increase, 10% loss
39.3                   22%                         46%
40.0                   16%                         40%
36.4                   17%                         45%
35.0                   20%                         52%

Distortion at the encoder; V = 5, d_fb = 7.


Dynamic Bit-stream Assembly of Pre-encoded Video

Motivation:
- Low server complexity: bit-stream assembly can be done in real time
- Pre-encoded and pre-stored copies of video streams benefit a large number of users (at the cost of more disk storage)

Challenges: mismatch between encoder and decoder

S0 (encoded):  I I I I I I …
S1 (encoded):  I P P P P P …
Transmitted:   I P P I P P …
Decoded:       I P P I P P …

Previous work on the mismatch problem: S-frames [Färber, Girod, ICIP '97] and SP-frames [H.26L], both at the cost of a higher bitrate.
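The mismatch in the switching example above can be traced with a short Python sketch. It assumes every P frame of a stored stream was encoded from the previous frame of the same stream, and that switching streams at a P frame always gives the decoder a different reference (drift), which persists until an I frame is received; `drifting_frames` and the stream names are hypothetical.

```python
def drifting_frames(streams, choice):
    """Indices of transmitted frames decoded from a reference that differs
    from the one the encoder used.

    streams maps a stream name to its frame types, e.g. "IPPPPP";
    choice[n] names the stream from which frame n is transmitted.
    """
    affected = []
    drift = False
    for n, name in enumerate(choice):
        if streams[name][n] == "I":
            drift = False  # an intra frame resynchronizes encoder and decoder
        elif n > 0 and (choice[n - 1] != name or drift):
            # Switched streams at a P frame, or the reference itself drifted.
            drift = True
            affected.append(n)
    return affected


streams = {"S0": "IIIIII", "S1": "IPPPPP"}
choice = ["S1", "S1", "S1", "S0", "S1", "S1"]  # transmitted: I P P I P P
print(drifting_frames(streams, choice))
```

Frames 4 and 5 are flagged: frame 4 of S1 was encoded from S1's P frame 3, but the decoder holds S0's I frame 3, and frame 5 inherits the error.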


Layered Prediction Structure (1)

(Figure: three-layer prediction structure with T_GOP = 25 and V = 5; Layer I holds I frames and Layer III holds P5 frames; T_GOP/V versions are needed.)

1) Layer I: I frames define GOPs, with maximum length T_GOP.

2) Layer II defines SGOPs. Layer II frames have only two types: PV (predicted from the previous PV or I frame) and I. SYNC frames are the Layer I and II frames, positioned at kV, where switching is allowed.

3) Layer III restriction: a frame can only use previous frames in the same SGOP as a reference.
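The three rules can be expressed as a small classification of frame indices. This Python sketch assumes a fixed GOP of exactly T_GOP frames, T_GOP a multiple of V, and that an SGOP runs from the SYNC frame at kV up to frame kV + V - 1; `classify` and `allowed_references` are hypothetical helpers.

```python
def classify(n, t_gop=25, v=5):
    """Layer of frame n: "I" for GOP-starting I frames, "II" for the other
    SYNC frames (at multiples of v), "III" for everything else."""
    if n % t_gop == 0:
        return "I"
    if n % v == 0:
        return "II"
    return "III"


def allowed_references(n, t_gop=25, v=5):
    """Frames that frame n may be predicted from under rules 1-3."""
    layer = classify(n, t_gop, v)
    if layer == "I":
        return []               # rule 1: intra, no reference
    if layer == "II":
        return [n - v]          # rule 2: PV from the previous SYNC frame (PV or I)
    start = n - n % v           # SYNC frame opening this SGOP (assumed)
    return list(range(start, n))  # rule 3: earlier frames of the same SGOP only


print(classify(25), classify(30), classify(33))
print(allowed_references(30), allowed_references(33))
```

Because no Layer III frame refers back across a SYNC frame, swapping the version of a SYNC frame only requires sending the Layer III frames that match it, which is what keeps dynamic assembly free of encoder/decoder mismatch.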


Layered Prediction Structure (2)

SYNC frames:
- Pre-encode: T_GOP/V versions are encoded offline and their (R, D) values saved
- Transmit: the assembly is determined within an R-D framework, taking feedback into account; this requires V ≥ d_fb

Layer III:
- Pre-encode: frames are encoded offline with restricted OPTS, using a binary tree structure
- Transmit: the version matching the selected SYNC frame is used


Schemes Compared

- Proposed pre-encoding / dynamic assembly scheme
- Live encoding with ORPS (baseline)
- Simple P-I with multiple versions of the bit-stream, and with feedback

(Figure: P-I scheme with multiple stored versions of an I P P P P P P P P P I … bit-stream)


Pre-encoded – Results (1)

V = 5, d_fb = 5, p = 0.10.

Rate-distortion performance:


Pre-encoded – Results (2)

Only one version of the Layer III pictures is stored, predicted from the leading I frame.

V = 5, d_fb = 5, p = 0.10.


Cost of Layered Coding Structure (1)

(Figure: lossless channel, V = 5, d_fb = 5, p = 0; annotated values 23%, 25%, 30%, 32%)


Cost of Layered Coding Structure (2)

Channel loss rate = 5% (V = 5, d_fb = 5, p = 0.05).


Results – Video Sequence

Pre-encoded: Mother-Daughter at 100 kbps, 33.72 dB with OPTS vs. 31.89 dB with P/I

Live-encoded: Foreman at 132 kbps, 32.20 dB with ORPS vs. 29.73 dB with P/I


Conclusions

- With ORPS and dynamic management of packet dependency, error resilience is increased
- The need for retransmission is eliminated, which reduces latency from 10-15 seconds to several hundred milliseconds
- For pre-stored video, the mismatch can be solved by storing multiple versions of the pictures and using the restricted prediction structure
- The restricted coding structure does not compromise RD performance in lossy channels
- RD performance is improved by using OPTS


Future Work (1)

Study the tradeoff between latency and RD performance, considering retransmission, LTM prediction, and FEC:
- Retransmission: highest RD efficiency, at the cost of high delay
- LTM: lower RD efficiency, lowest delay
- FEC: lower RD efficiency, medium delay

Quantify and jointly optimize delay, rate and distortion


Future Work (2)

Extend the work using path diversity
- The problem: given the bandwidth and loss probability (Gilbert model) of multiple channels, find the optimal picture type and path to use
- Past related work:
  - Apostolopoulos et al., VCIP '01; INFOCOM '02
  - Lin et al., ICME '01 (RPS on multiple paths)
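Since each path in this extension is described by a Gilbert loss model, a short Python sketch of the common simplified two-state Gilbert channel may help. Here every packet sent in the bad state is lost and every packet in the good state arrives; the parameter names `p_gb` and `p_bg` are assumptions, and the picture-type and path selection itself is not shown.

```python
import random


def gilbert_losses(n, p_gb, p_bg, seed=0):
    """Simulate n packets over a two-state Gilbert channel.

    p_gb = P(good -> bad) and p_bg = P(bad -> good), applied once per packet.
    Returns a list of booleans, True where the packet was lost.
    """
    rng = random.Random(seed)
    bad = False
    losses = []
    for _ in range(n):
        # Take the state transition, then the packet's fate follows the new state.
        if bad:
            bad = rng.random() >= p_bg  # stay bad with probability 1 - p_bg
        else:
            bad = rng.random() < p_gb
        losses.append(bad)
    return losses


def stationary_loss_rate(p_gb, p_bg):
    # Long-run fraction of packets sent while the channel is in the bad state.
    return p_gb / (p_gb + p_bg)


losses = gilbert_losses(200_000, p_gb=0.02, p_bg=0.18)
print(sum(losses) / len(losses), stationary_loss_rate(0.02, 0.18))
```

Both printed rates should be close to 0.1, but unlike independent losses at the same rate, the losses arrive in bursts of about 1 / p_bg ≈ 5.6 packets on average, which matters for how many consecutive reference frames a path can wipe out.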
