
1

Introduction to MPEG Surround

韓志岡 2/9/2005

2

Outline

Background
– Motivation
– Perception of sound in space

Principle of MPEG Surround
– Downmixing to one channel
– Estimation of spatial cues
– Synthesis of spatial cues

Conclusions & Reference

3

Motivation

The vast majority of audio playback equipment uses the traditional two-channel presentation (stereo).

Playback over more reproduction channels (“multi-channel audio” or “surround sound”) is now quite visible in the marketplace.

A non-disruptive transition from stereo to multi-channel audio requires media formats that can serve both those using conventional stereo equipment and those using next-generation multi-channel equipment.

4

Perception of sound in space

The HRTF (head-related transfer function) models the path of sound from a source to the entrances of the left and right ears.

5

Perception of sound in space (cont.)

Three parameters (cues) describe how humans localize sound in the horizontal plane:

– Interaural level difference (ILD)
– Interaural time difference (ITD)
– Interaural coherence (IC)

6

ITD (Interaural time difference) & ILD (Interaural level difference)

$$\mathrm{ILD} = 20 \log_{10}\!\left(\frac{a_2}{a_1}\right)\ \text{(dB)}$$

$$\mathrm{ITD} = d_2 - d_1$$
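As a small numeric illustration of the two definitions on this slide (ILD as 20 log10 of a gain ratio, ITD as a difference), the sketch below evaluates them for made-up values. It assumes that a_1, a_2 are linear amplitude gains and d_1, d_2 are delays of the paths to the two ears; the slide text does not state units, and the function names are hypothetical.

```python
import math

def ild_db(a1: float, a2: float) -> float:
    """ILD in dB: 20*log10(a2/a1), for linear amplitude gains a1, a2 > 0."""
    return 20.0 * math.log10(a2 / a1)

def itd(d1: float, d2: float) -> float:
    """ITD as the difference d2 - d1 (same unit as the inputs)."""
    return d2 - d1

# Illustrative values only: ear 2 receives half the amplitude, 0.6 ms later.
print(ild_db(1.0, 0.5))       # about -6.02 dB
print(itd(0.0030, 0.0036))    # about 0.0006 s
```

Halving the amplitude at one ear corresponds to roughly 6 dB of level difference, which is why the factor is 20 rather than 10 for amplitude ratios.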

7

ITD (Interaural time difference) & ILD (Interaural level difference) (cont.)

ITD and ILD between a pair of headphone signals determine the location of the auditory event which appears in the frontal section of the upper head.

8

IC (Interaural coherence)

The spatial impression of the auditory event is related to the IC.

9

Two sound sources: summing localization

– Inter-channel time difference (ICTD)
– Inter-channel level difference (ICLD)
– Inter-channel coherence (ICC)

10

Two sound sources: summing localization (cont.)

11

MPEG Surround

MPEG Surround exploits inter-channel differences in level, phase and coherence, equivalent to the ILD, ITD and IC cues, to capture the spatial image of a multi-channel audio signal.

It downmixes the signal and encodes these cues in a very compact form, so that the cues and the transmitted downmix can be decoded to synthesize a high-quality multi-channel representation.

It provides backward compatibility with stereo/mono audio systems.

12

Coding Scheme

13

Downmixing to one channel (1/2)

The sum signal is generated by adding the input channels in a subband domain

The sum is then multiplied by a factor in order to preserve the signal power.

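$$\tilde{s}(k) = e(k) \sum_{c=1}^{C} \tilde{x}_c(k)$$

where the factor $e(k)$ satisfies

$$\sum_{c=1}^{C} p_{\tilde{x}_c}(k) = e^2(k)\, p_{\tilde{x}}(k), \qquad e(k) = \sqrt{\frac{\sum_{c=1}^{C} p_{\tilde{x}_c}(k)}{p_{\tilde{x}}(k)}}$$

Here $p_{\tilde{x}_c}(k)$ is a short-time estimate of the power of $\tilde{x}_c(k)$, and $p_{\tilde{x}}(k)$ is a short-time estimate of the power of the sum $\sum_{c=1}^{C} \tilde{x}_c(k)$.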
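The power-preserving downmix on this slide can be sketched for a single subband as follows. This is a minimal illustration, not the standard's implementation: it assumes real-valued subband samples, uses a plain sliding mean as the short-time power estimate, and the names `short_time_power`, `downmix`, `win` and `eps` are hypothetical.

```python
import math

def short_time_power(x, k, win=64):
    """Mean of x[n]**2 over up to `win` samples ending at index k."""
    seg = x[max(0, k - win + 1): k + 1]
    return sum(v * v for v in seg) / len(seg)

def downmix(channels, win=64, eps=1e-12):
    """Mono downmix of C equal-length subband signals, scaled by e(k)."""
    n = len(channels[0])
    # Plain sum of the input channels, sample by sample.
    total = [sum(ch[k] for ch in channels) for k in range(n)]
    out = []
    for k in range(n):
        p_sum = short_time_power(total, k, win)                      # power of the sum
        p_ch = sum(short_time_power(ch, k, win) for ch in channels)  # sum of channel powers
        e = math.sqrt(p_ch / (p_sum + eps))                          # eps avoids division by zero
        out.append(e * total[k])
    return out

# Two identical channels: the plain sum has 4x the power of one channel,
# so e(k) is about 1/sqrt(2) and the output power equals the sum of both input powers.
x = [1.0, -1.0] * 8
s = downmix([x, x], win=4)
print(s[:2])
```

When the channels are nearly out of phase the sum power becomes very small and e(k) grows large; a practical system would likely limit this gain, which the sketch does not do.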

14

Downmixing to one channel (2/2)

15

Estimation of spatial cues (1/4)

The spatial cues (ICTD, ICLD, and ICC) are estimated in a subband domain. The estimation is applied independently to each subband.

16

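Estimation of spatial cues (2/4)

ICTD (samples):

$$\tau_{12}(k) = \arg\max_{d} \{\Phi_{12}(d,k)\}$$

with a short-time estimate of the normalized cross-correlation function

$$\Phi_{12}(d,k) = \frac{p_{\tilde{x}_1\tilde{x}_2}(d,k)}{\sqrt{p_{\tilde{x}_1}(k-d_1)\, p_{\tilde{x}_2}(k-d_2)}}$$

where

$$d_1 = \max\{-d, 0\}, \qquad d_2 = \max\{d, 0\}$$

and $p_{\tilde{x}_1\tilde{x}_2}(d,k)$ is a short-time estimate of the mean of $\tilde{x}_1(k-d_1)\,\tilde{x}_2(k-d_2)$.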
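The ICTD estimate on this slide, the lag that maximizes the normalized cross-correlation, can be sketched as below for one subband. The sketch assumes real-valued samples and a rectangular window for all short-time estimates; `norm_xcorr`, `ictd`, `win` and `max_lag` are hypothetical names, and the search range is an arbitrary choice.

```python
import math

def norm_xcorr(x1, x2, d, k, win=64, eps=1e-12):
    """Normalized cross-correlation at lag d, over `win` samples ending at k."""
    d1, d2 = max(-d, 0), max(d, 0)
    num = p1 = p2 = 0.0
    for n in range(k - win + 1, k + 1):
        i, j = n - d1, n - d2
        if i < 0 or j < 0:
            continue  # skip samples before the start of the signal
        num += x1[i] * x2[j]
        p1 += x1[i] ** 2
        p2 += x2[j] ** 2
    return num / (math.sqrt(p1 * p2) + eps)

def ictd(x1, x2, k, max_lag=8, win=64):
    """Lag d in [-max_lag, max_lag] that maximizes the normalized cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda d: norm_xcorr(x1, x2, d, k, win))

x1 = [math.sin(0.37 * n * n) for n in range(200)]  # deterministic noise-like test signal
x2 = [0.0] * 3 + x1[:-3]                           # x2 is x1 delayed by 3 samples
print(ictd(x1, x2, k=199))                         # -3 under this sign convention
```

With d_1 = max{-d, 0} and d_2 = max{d, 0}, only one of the two signals is shifted for any given lag, so a single signed integer d covers both directions.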

17


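Estimation of spatial cues (3/4)

ICLD (dB):

$$\Delta L_{12}(k) = 10 \log_{10}\!\left(\frac{p_{\tilde{x}_2}(k)}{p_{\tilde{x}_1}(k)}\right)$$

ICC:

$$c_{12}(k) = \max_{d} \left|\Phi_{12}(d,k)\right|$$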
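The ICLD and ICC definitions on this slide can be illustrated on a single frame of one subband. The sketch assumes real-valued samples and treats the whole frame as the short-time window; `icld_db`, `icc`, `max_lag` and `eps` are hypothetical names.

```python
import math

def icld_db(x1, x2, eps=1e-12):
    """ICLD in dB: 10*log10(p2/p1), using the mean power of each frame."""
    p1 = sum(v * v for v in x1) / len(x1)
    p2 = sum(v * v for v in x2) / len(x2)
    return 10.0 * math.log10((p2 + eps) / (p1 + eps))

def icc(x1, x2, max_lag=8, eps=1e-12):
    """ICC: largest |normalized cross-correlation| over lags in [-max_lag, max_lag]."""
    n = len(x1)
    best = 0.0
    for d in range(-max_lag, max_lag + 1):
        d1, d2 = max(-d, 0), max(d, 0)
        pairs = [(x1[i - d1], x2[i - d2]) for i in range(max(d1, d2), n)]
        num = sum(a * b for a, b in pairs)
        den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
        best = max(best, abs(num) / (den + eps))
    return best

x = [math.sin(0.37 * n * n) for n in range(128)]
half = [0.5 * v for v in x]
print(icld_db(x, half))   # about -6.02 dB: channel 2 has a quarter of the power
print(icc(x, half))       # close to 1: the channels differ only by a gain
```

Because the ICC takes the absolute value, a phase-inverted copy of a channel also yields a value near 1.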

18


Estimation of spatial cues (4/4)

For multi-channel audio signals, ICTD and ICLD are defined between a reference channel and each of the other C−1 channels.

19

Synthesis of spatial cues(1/3)

ICTDs are synthesized by imposing delays, ICLDs by scaling, and ICC by applying decorrelation filters.

20


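Synthesis of spatial cues (2/3)

The delays are determined by the ICTDs:

$$d_c = \begin{cases} -\dfrac{1}{2}\left(\max\limits_{2 \le l \le C} \tau_{1l}(k) + \min\limits_{2 \le l \le C} \tau_{1l}(k)\right), & c = 1 \\[2ex] \tau_{1c}(k) + d_1, & 2 \le c \le C \end{cases}$$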
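The mapping from ICTDs to per-channel delays on this slide can be written as a short helper. This is a sketch under the assumption that the ICTDs are given relative to reference channel 1, in samples; `synthesis_delays` and `tau` are hypothetical names.

```python
def synthesis_delays(tau):
    """tau[i] is the ICTD between channel 1 and channel c = i + 2.

    Returns the list [d_1, ..., d_C] of delays for all C channels.
    """
    d1 = -0.5 * (max(tau) + min(tau))      # delay of the reference channel
    return [d1] + [t + d1 for t in tau]    # d_c = tau_1c + d_1 for c >= 2

# Example with C = 4 channels and illustrative ICTDs of 4, -2 and 0 samples.
print(synthesis_delays([4, -2, 0]))   # [-1.0, 3.0, -3.0, -1.0]
```

Choosing d_1 as minus the midpoint of the largest and smallest ICTD centers the delays of channels 2 to C around zero, so the largest delay magnitude among them is half the ICTD range (3 samples in the example).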

21


Synthesis of spatial cues (3/3)

The scale factors are determined by the ICLDs, satisfying:

$$\frac{a_c}{a_1} = 10^{\Delta L_{1c}(k)/20}$$

After the delays and scaling, the correlation between the subbands of the output channels still needs to be reduced. This is achieved by designing the filters $h_c$, which are controlled as a function of the ICC.
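The scale-factor condition on this slide, a_c / a_1 = 10^(ΔL_1c(k)/20), fixes only the ratios between channels. The sketch below computes those ratios and, as an explicit assumption not stated on the slide, optionally normalizes them so that the squared scale factors sum to 1; `scale_factors` and `normalize` are hypothetical names.

```python
import math

def scale_factors(icld_db, normalize=True):
    """icld_db[i] is the ICLD in dB between channel 1 and channel c = i + 2.

    Returns [a_1, ..., a_C] with a_c / a_1 = 10 ** (icld / 20).
    """
    ratios = [1.0] + [10.0 ** (level / 20.0) for level in icld_db]
    if not normalize:
        return ratios
    # Assumed normalization: sum of a_c**2 equals 1.
    norm = math.sqrt(sum(r * r for r in ratios))
    return [r / norm for r in ratios]

a = scale_factors([6.0, -6.0])
print(a)
print(20.0 * math.log10(a[1] / a[0]))   # about 6 dB, the requested ICLD
```

The normalization does not change any ratio a_c / a_1, so the ICLDs are reproduced either way; it only sets the overall level of the synthesized channels.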

22

Conclusions (1/2)

Well-known perceptual audio coders, such as MP3, primarily exploit a single channel’s ability to mask its own quantization noise.

In contrast, spatial perception is primarily attributed to three parameters: ILD, ITD, and IC.

23

Conclusions (2/2)

MPEG Surround provides an extremely efficient method for coding of multi-channel sound via the transmission of a compressed stereo (or even mono) audio program plus a low-rate side-information channel.

MPEG Surround is the latest technology for bitrate-efficient and backward-compatible presentation of multi-channel audio.

24

Reference

ISO/IEC JTC1/SC29/WG11 (MPEG), Document N7390, “Tutorial on MPEG Surround Audio Coding,” July 2005, Poznan, Poland.

C. Faller, “Parametric coding of spatial audio,” in Proc. DAFx (Digital Audio Effects), October 2004.
