Speech codecs and DCCP with TFRC VoIP mode

Magnus Westerlund
magnus.westerlund@ericsson.com


Important Features of TFRC VoIP mode

• Minimum packet interval 10 ms
• Packet rate is penalized:
  – X = X * S_true / (S_true + H)
  – H = 40 bytes; header size
  – S_true is the complete RTP packet size, i.e. RTP header + payload
• Still TFRC, and sending is delayed if sufficient bit-rate is not available.
• Slow start of 4 packets; the size limitation is not an issue for the discussed codecs.
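The header penalty on this slide can be sketched in Python. This is only an illustration: the function name `penalized_rate` and the example numbers are hypothetical, and H is assumed to be expressed in bytes.

```python
H_BYTES = 40  # header size H from the slide, assumed to be in bytes


def penalized_rate(x, s_true, h=H_BYTES):
    """Scale the allowed rate x by s_true / (s_true + h).

    x      -- rate allowed by TFRC before the penalty (any unit)
    s_true -- complete RTP packet size in bytes (RTP header + payload)
    """
    if s_true <= 0:
        raise ValueError("s_true must be positive")
    return x * s_true / (s_true + h)


# Hypothetical example: a 44-byte RTP packet (12-byte RTP header plus a
# 32-byte payload) keeps 44/84 of the allowed rate.
print(penalized_rate(84_000, 44))
```

The smaller the RTP packet, the larger the share of the allowed rate that is given up to the header term.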

System overview

• Contributors to system delay are:
  – Sampling buffering
  – Encoding delay
  – Packetization delay
  – Transmission delay
  – Transport delay (Internet)
  – Receiver buffering delay
  – Decoding delay
  – Playout delay
• The sum of delays should be less than 200 ms for high-quality conversational use, and less than 400 ms to be usable for conversational VoIP.

[Figure: Sender (MIC, Codec, Payload Packetization, DCCP), Internet, Receiver (DCCP, Jitter Buffer, Codec, Speaker)]
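As a rough sketch, the delay budget on this slide can be checked by summing the per-stage contributions and comparing the total with the 200 ms and 400 ms limits. The function name and every per-stage value below are hypothetical; none of the numbers in `example` come from the slides.

```python
# Thresholds from the slide, in milliseconds.
HIGH_QUALITY_MS = 200
USABLE_MS = 400


def classify_delay(contributions_ms):
    """Sum per-stage delays and classify the total against the thresholds."""
    total = sum(contributions_ms.values())
    if total < HIGH_QUALITY_MS:
        verdict = "high quality"
    elif total < USABLE_MS:
        verdict = "usable"
    else:
        verdict = "not usable"
    return total, verdict


# Hypothetical per-stage delays (ms), chosen only for illustration.
example = {
    "sampling buffering": 20,
    "encoding": 5,
    "packetization": 20,
    "transmission": 5,
    "transport (Internet)": 80,
    "receiver buffering": 40,
    "decoding": 5,
    "playout": 10,
}
print(classify_delay(example))
```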

Problems with TFRC style packet rate penalties

• Varying the packetization directly affects the system delay seen at the receiver.
• Requires a jitter buffer that is capable of handling the increased or decreased system delay.
• Frequent changes will make it harder for adaptive buffers to parameterize the jitter correctly.
• Buffer under-runs need to be handled with little impact on voice quality. Thus insertion of audio data or invocation of error concealment becomes necessary.

Speech and Audio Codecs with RTP Payload formats

• Narrowband codecs:
  – G.711 (PCMA or PCMU)
  – G.723
  – G.726
  – G.728
  – G.729
  – GSM
  – GSM-EFR
  – AMR
  – EVRC
  – SMV
  – QCELP
  – BroadVoice 16
  – iLBC
• Wideband codecs:
  – AMR-WB
  – VMR-WB
  – BroadVoice 32
  – G.722
• Variable sampling rate:
  – DVI4
  – VDVI
  – L8
  – L16
  – PCMA
  – PCMU

Codec and RTP payload properties

• Bit-rate of encoded content
• Sample or frame based
• Frame lengths: 2.5, 5, 10, 20, 30 ms, etc.
• Basically all payload formats support aggregation; however, some have modes where it is restricted.

DTX and Comfort Noise

• DTX is Discontinuous Transmission.
• A voice activity detector (VAD) detects whether there is active speech or not.
• When there is no active speech, different DTX procedures can be used:
  – No transmission at all
  – Comfort Noise (CN) using RFC 3389
  – Codec built-in CN, like the AMR SID (Silence Descriptor)
• The frequency of Comfort Noise packets varies, but is usually some fraction of the normal packet rate.

Sample based codecs

• Speech bandwidth depends on sampling rate.
• Sample based, and can usually handle any number of samples per packet.
• Usually no adaptivity other than packetization. Some can vary quantization, like G.726.
• Bit-rate depends on sampling rate and sample quantization.
• Example: G.711 uses 8 bits per sample and 8 kHz sampling, resulting in a 64 kbps audio data rate.
• Comfort noise may be supported using RFC 3389.
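The G.711 arithmetic above generalizes to any sample-based codec. The sketch below uses hypothetical function names; the 20 ms packet interval in the example is an assumption chosen for illustration, not a value from the slide.

```python
def audio_bit_rate_bps(sampling_rate_hz, bits_per_sample):
    """Audio data rate of a sample-based codec in bits per second."""
    return sampling_rate_hz * bits_per_sample


def payload_bytes(sampling_rate_hz, bits_per_sample, packet_interval_ms):
    """Audio payload carried by one packet, in bytes."""
    samples = sampling_rate_hz * packet_interval_ms // 1000
    return samples * bits_per_sample // 8


# G.711: 8 bits per sample at 8 kHz sampling.
print(audio_bit_rate_bps(8000, 8))   # bits per second
# Assumed 20 ms packet interval, for illustration only.
print(payload_bytes(8000, 8, 20))    # bytes of audio per packet
```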

AMR

• 3GPP defined, mandatory speech codec in UMTS 3G networks
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate: has 8 encoding modes with bit-rates between 12.2 and 4.75 kbps.
• Has comfort noise generation (SID) and DTX.
• The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size.

EVRC and SMV

• 3GPP2 defined, required in CDMA networks
• Narrowband codecs (8 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encodes at 3 (EVRC) or 4 (SMV) different rates, varying from 8.55 to 0.8 kbps depending on audio input. Thus highly variable packet sizes.
• The average bit-rate depends on the codec mode; each mode selects the encoding rates differently to provide a different average rate.
• Lacks DTX and needs to transmit all frames.
• One mode in the payload format requires a single frame per packet.

BroadVoice 16

• Broadcom-defined codec, used in voice over cable
• Narrowband codec (8 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 16 kbps.
• No built-in comfort noise or DTX.
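The "at least 2 frames per packet" requirement follows from the 10 ms minimum packet interval of TFRC VoIP mode and the 5 ms frame length. A minimal sketch, with a hypothetical function name:

```python
import math

MIN_PACKET_INTERVAL_MS = 10  # minimum packet interval of TFRC VoIP mode


def min_frames_per_packet(frame_ms, min_interval_ms=MIN_PACKET_INTERVAL_MS):
    """Smallest number of frames whose total duration reaches the minimum interval."""
    return max(1, math.ceil(min_interval_ms / frame_ms))


print(min_frames_per_packet(5))    # 5 ms frames (BroadVoice)
print(min_frames_per_packet(20))   # 20 ms frames (e.g. AMR)
```

Codecs with 20 ms frames already satisfy the minimum interval with a single frame per packet.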

BroadVoice 32

• Broadcom-defined codec, used in voice over cable
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode.
• No rate adaptation, fixed encoding at 32 kbps.
• No built-in comfort noise or DTX.

AMR-WB

• 3GPP specified codec, mandatory in UMTS 3G if wideband is supported
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Multi-rate encoding at 9 different rates between 23.85 and 6.6 kbps
• Has built-in support for DTX and comfort noise (SID)
• SID (Silence Descriptor) is sent every 8th frame and is 5 bytes in size

VMR-WB

• 3GPP2 defined
• Wideband codec (16 kHz audio sampling rate)
• Frame-based with 20 ms frames
• Encodes using 4 different rates (13.3-1.0 kbps)
• Has compatibility mode with AMR-WB (12.6, 8.85, 6.60 kbps)
• Has DTX mode

Summary of codecs

                          AMR        EVRC           SMV            BV16  BV32  AMR-WB     VMR-WB
Sampling rate (Hz)        8k         8k             8k             8k    16k   16k        16k
Frame size (ms)           20         20             20             5     5     20         20
Bit-rate (kbps)           4.75-12.2  0.8-8.8 (4.2)  0.8-8.8 (4.2)  16    32    6.6-23.85  1.0-13.3
Runtime codec adaptation  Y          Y              Y              N     N     Y          Y
DTX                       Y          N              N              N     N     Y          Y

The effects of codec bit-rate adaptation

• Reduction of codec bit-rate always means lower quality.
• The actual switching does affect user-perceived quality:
  – Codec transition effects (varying)
  – The change in quality can be noticeable
• Switching to a higher codec rate may not improve the user experience.
  – Flapping between modes can be more annoying than constant lower quality

Other codec developments

• Audio encoding, rather than speech:
  – Greater bit-rate span, 10-300 kbps
• Variable frame-rate, depending on codec mode (AMR-WB+), which is problematic in RTP
• Currently scalability is hot:
  – For audio, usually not speech
  – MPEG is doing something
  – European Union research project assuming arbitrary truncation of packets

Effects of packetization

• The AMR codec bit-rate adaptation has less impact on total bandwidth than the choice of packetization.
• Calculated using IP (20) + DCCP (12) + RTP (12) header bytes for each packet.
• Not unexpected, considering that a speech frame including payload overhead is 13, 18 and 32 bytes for the 4.75, 6.7 and 12.2 kbps modes respectively.

Codec mode (kbps)  Frames per packet  Total (kbps)
4.75               3                  11.2
6.7                3                  13.2
4.75               2                  14.2
6.7                2                  16.2
12.2               3                  18.8
12.2               2                  21.8
4.75               1                  23.2
6.7                1                  25.2
12.2               1                  30.8
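The totals in the table can be recomputed from the header sizes and frame sizes given on the slide. One assumption is needed to match the figures: a single extra byte per packet on top of the per-frame sizes (plausibly a per-packet payload header, but the slide does not say so). The names below are hypothetical.

```python
HEADER_BYTES = 20 + 12 + 12   # IP + DCCP + RTP, from the slide
EXTRA_BYTES_PER_PACKET = 1    # ASSUMPTION: not stated on the slide, but it
                              # is what makes the totals match the table
FRAME_MS = 20                 # AMR frame length

# Speech frame size including payload overhead, per AMR mode (bytes).
FRAME_BYTES = {"4.75": 13, "6.7": 18, "12.2": 32}


def total_kbps(mode, frames_per_packet):
    """Total bit-rate on the wire for a given mode and packetization."""
    packet_bytes = (HEADER_BYTES + EXTRA_BYTES_PER_PACKET
                    + frames_per_packet * FRAME_BYTES[mode])
    interval_ms = frames_per_packet * FRAME_MS
    return packet_bytes * 8 / interval_ms  # bits per millisecond equals kbps


for mode in FRAME_BYTES:
    for n in (1, 2, 3):
        print(f"{mode:>5} kbps, {n} frame(s) per packet: {total_kbps(mode, n):.1f} kbps")
```

Going from 1 to 3 frames per packet in the 12.2 kbps mode saves more bandwidth than dropping from 12.2 to 4.75 kbps at one frame per packet, which is the point the slide makes.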

System Delay Overview

• Contributors to system delay are:
  – Sampling buffering
  – Encoding delay
  – Packetization delay
  – Transmission delay
  – Transport delay (Internet)
  – Receiver buffering delay
  – Decoding delay
  – Playout delay
• The sum of delays should be less than 200 ms for high-quality conversational use, and less than 400 ms to be usable for conversational VoIP.

[Figure: Sender (MIC, Codec, Payload Packetization, DCCP), Internet, Receiver (DCCP, Jitter Buffer, Codec, Speaker)]

Delay and Robustness Effects

• Although it seems tempting to use 3 frames per packet to save bandwidth, it costs considerable delay.
• For optimal quality, the quality reduction from lower bit-rate modes must be traded off against the expected system delay.
• For a system that already has a large delay: reduce the codec mode.
• For a system with small delays, changing packetization to use more frames per packet can be done without much quality cost.
• More frames per packet also reduces robustness.
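Both costs of aggregation can be put in numbers. The sketch below assumes 20 ms frames (as for AMR) and uses hypothetical function names: the oldest frame in a packet waits for the later frames to be produced, and one lost packet removes all of its frames.

```python
FRAME_MS = 20  # assumed frame length, as for AMR


def extra_packetization_delay_ms(frames_per_packet, frame_ms=FRAME_MS):
    """Added delay compared with sending one frame per packet."""
    return (frames_per_packet - 1) * frame_ms


def audio_lost_per_packet_ms(frames_per_packet, frame_ms=FRAME_MS):
    """Duration of audio that disappears when a single packet is lost."""
    return frames_per_packet * frame_ms


for n in (1, 2, 3):
    print(n, extra_packetization_delay_ms(n), audio_lost_per_packet_ms(n))
```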

Questions for future studies

• How hard is it to maintain a periodic transmission with TFRC VoIP mode? If it cannot be maintained, extra jitter is introduced, which requires more receiver buffering.
• What are the effects of DTX, as in the AMR case, where the packet rate drops to 1/8 of the rate during active speech?
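To make the DTX question concrete, the packet rates for AMR can be worked out from figures given earlier in the deck (20 ms frames, a 5-byte SID every 8th frame). The sketch assumes one frame per packet during active speech; the function names are hypothetical.

```python
FRAME_MS = 20          # AMR frame length
SID_EVERY_N_FRAMES = 8 # SID is sent in every 8th frame
SID_BYTES = 5          # SID size


def active_packets_per_second(frames_per_packet=1, frame_ms=FRAME_MS):
    """Packet rate during active speech."""
    return 1000 / (frames_per_packet * frame_ms)


def sid_packets_per_second(frame_ms=FRAME_MS, every_n=SID_EVERY_N_FRAMES):
    """Packet rate during silence when only SID frames are sent."""
    return 1000 / (every_n * frame_ms)


def sid_payload_bps(sid_bytes=SID_BYTES):
    """SID payload bit-rate during silence, excluding all headers."""
    return sid_bytes * 8 * sid_packets_per_second()


print(active_packets_per_second())  # packets per second while talking
print(sid_packets_per_second())     # packets per second during silence
print(sid_payload_bps())            # bits per second of SID payload
```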