
1

Introduction to MPEG Surround

韓志岡, 2/9/2005

2

Outline

Background
– Motivation
– Perception of sound in space

Principle of MPEG Surround
– Downmixing to one channel
– Estimation of spatial cues
– Synthesis of spatial cues

Conclusions & References

3

Motivation

The vast majority of audio playback equipment uses traditional two-channel presentation (stereo).

Playback over more reproduction channels (“multi-channel audio” or “surround sound”) is increasingly visible in the marketplace.

A non-disruptive transition from stereo to multi-channel audio requires media formats that can serve both those using conventional stereo equipment and those using next-generation multi-channel equipment.

4

Perception of sound in space

The head-related transfer function (HRTF) models the path of sound from a source to the left and right ear entrances.

5

Perception of sound in space (cont.)

Three parameters (cues) describe how humans localize sound in the horizontal plane:

– Interaural level difference (ILD)
– Interaural time difference (ITD)
– Interaural coherence (IC)

6

ITD (Interaural time difference) & ILD (Interaural level difference)

\[ \mathrm{ILD} = 20 \log_{10} \frac{a_2}{a_1} \ \text{(dB)}, \qquad \mathrm{ITD} = d_2 - d_1 \]
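As a small numeric illustration of the two definitions on this slide, the sketch below assumes that a1, a2 are the attenuations and d1, d2 the delays of the two source-to-ear paths, with ILD = 20 log10(a2/a1) and ITD = d2 − d1. The function names and the example values are hypothetical and not taken from the slides.

```python
import math

def ild_db(a1, a2):
    """ILD in dB from the (assumed) path attenuations a1 and a2."""
    return 20.0 * math.log10(a2 / a1)

def itd(d1, d2):
    """ITD as the difference of the (assumed) path delays d1 and d2."""
    return d2 - d1

# Hypothetical example: path 2 is attenuated to half the amplitude of path 1
# and arrives 0.6 ms later.
print(round(ild_db(1.0, 0.5), 2))   # about -6.02 dB
print(itd(0.0030, 0.0036))          # about 0.0006 s
```

Halving the amplitude on one path corresponds to a level difference of roughly 6 dB, which is why ILD is usually quoted on a logarithmic scale.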

7

ITD (Interaural time difference) & ILD (Interaural level difference) (cont.)

ITD and ILD between a pair of headphone signals determine the location of the auditory event which appears in the frontal section of the upper head.

8

IC (Interaural coherence)

The spatial impression of the auditory event is related to the IC.

9

Two sound sources: Summing localization

– Inter-channel time difference (ICTD)
– Inter-channel level difference (ICLD)
– Inter-channel coherence (ICC)

10

Two sound sources: Summing localization (cont.)

11

MPEG Surround

MPEG Surround exploits inter-channel differences in level, phase and coherence, equivalent to the ILD, ITD and IC cues, to capture the spatial image of a multi-channel audio signal.

It downmixes the signal and encodes these cues in a very compact form, such that the cues and the transmitted signal can be decoded to synthesize a high-quality multi-channel representation.

It provides backward compatibility with stereo and mono audio systems.

12

Coding Scheme

13

Downmixing to one channel (1/2)

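The sum signal is generated by adding the input channels in a subband domain:

\[ \tilde{s}(k) = e(k) \sum_{c=1}^{C} \tilde{x}_c(k) \]

The sum is multiplied by the factor \(e(k)\) in order to preserve signal power, i.e. \(e(k)\) is chosen such that

\[ \sum_{c=1}^{C} p_{\tilde{x}_c}(k) = e^2(k)\, p_{\tilde{x}}(k), \qquad e(k) = \sqrt{\frac{\sum_{c=1}^{C} p_{\tilde{x}_c}(k)}{p_{\tilde{x}}(k)}} \]

where \(p_{\tilde{x}_c}(k)\) is a short-time estimate of the power of \(\tilde{x}_c(k)\) and \(p_{\tilde{x}}(k)\) is a short-time estimate of the power of the sum \(\sum_{c=1}^{C} \tilde{x}_c(k)\).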
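The power-preserving downmix can be sketched for a single subband as follows. This is a minimal illustration, not the encoder's actual implementation: it assumes that the short-time power estimates are plain averages over one frame, and the function name `downmix` and the frame layout (one list of samples per channel) are hypothetical.

```python
import math

def downmix(subband_channels):
    """Sum C subband signals and rescale so the total power is preserved.

    subband_channels: list of C equal-length lists (one frame per channel).
    Returns (sum_signal, e), where e is the equalization factor.
    """
    n = len(subband_channels[0])
    # Plain sum of the channels, sample by sample.
    total = [sum(ch[k] for ch in subband_channels) for k in range(n)]
    # Frame-averaged powers (assumption: one frame = one short-time estimate).
    power_of_channels = sum(sum(v * v for v in ch) for ch in subband_channels) / n
    power_of_sum = sum(v * v for v in total) / n
    # e^2 * p_sum = sum of channel powers; fall back to 1.0 for a silent sum.
    e = math.sqrt(power_of_channels / power_of_sum) if power_of_sum > 0.0 else 1.0
    return [e * v for v in total], e

# Two identical channels: the plain sum would double the amplitude
# (4x the power of one channel), so e comes out as sqrt(1/2).
s, e = downmix([[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]])
print(e)
```

Without the factor, in-phase channels would be boosted and out-of-phase channels attenuated by the summation; the factor restores the combined power of the inputs.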

14

Downmixing to one channel (2/2)

15

Estimation of spatial cues (1/4)

The spatial cues (ICTD, ICLD, and ICC) are estimated in a subband domain. The estimation is applied independently to each subband.

16

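Estimation of spatial cues (2/4)

ICTD (samples):

\[ \tau_{12}(k) = \arg\max_{d} \{ \Phi_{12}(d, k) \} \]

with a short-time estimate of the normalized cross-correlation function

\[ \Phi_{12}(d, k) = \frac{p_{\tilde{x}_1 \tilde{x}_2}(d, k)}{\sqrt{p_{\tilde{x}_1}(k - d_1)\, p_{\tilde{x}_2}(k - d_2)}} \]

where

\[ d_1 = \max\{-d, 0\}, \qquad d_2 = \max\{d, 0\} \]

and \(p_{\tilde{x}_1 \tilde{x}_2}(d, k)\) is a short-time estimate of the mean of \(\tilde{x}_1(k - d_1)\, \tilde{x}_2(k - d_2)\).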
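The ICTD estimate can be sketched for one frame of two subband signals as below. The sketch assumes d1 = max{−d, 0} and d2 = max{d, 0}, replaces the sliding short-time estimates by sums over the frame, and searches a limited lag range; the names `normalized_xcorr`, `ictd` and `max_lag` are hypothetical.

```python
import math

def normalized_xcorr(x1, x2, d):
    """Normalized cross-correlation of two equal-length frames at lag d."""
    d1, d2 = max(-d, 0), max(d, 0)
    ks = range(max(d1, d2), len(x1))   # indices where both shifted samples exist
    cross = sum(x1[k - d1] * x2[k - d2] for k in ks)
    p1 = sum(x1[k - d1] ** 2 for k in ks)
    p2 = sum(x2[k - d2] ** 2 for k in ks)
    denom = math.sqrt(p1 * p2)
    return cross / denom if denom > 0.0 else 0.0

def ictd(x1, x2, max_lag):
    """Lag (in samples) that maximizes the normalized cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda d: normalized_xcorr(x1, x2, d))

# x2 is x1 shifted three samples later in time.
x1 = [0, 0, 0, 1, -2, 3, 0, 0, 0, 0, 0, 0]
x2 = [0, 0, 0, 0, 0, 0, 1, -2, 3, 0, 0, 0]
print(ictd(x1, x2, 5))   # -3 under the sign convention assumed here
```

Note the sign: with the lag split assumed above, a second channel that lags the first yields a negative value. A real estimator would update the power and cross-power estimates recursively over time rather than per frame.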

17

Estimation of spatial cues (3/4)

ICLD (dB):

\[ \Delta L_{12}(k) = 10 \log_{10} \frac{p_{\tilde{x}_2}(k)}{p_{\tilde{x}_1}(k)} \]

ICC:

\[ c_{12}(k) = \max_{d} \left| \Phi_{12}(d, k) \right| \]
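Both remaining cues can be computed from the same quantities. The sketch below assumes frame-averaged powers, nonzero power in both channels, and the same normalized cross-correlation as used for the ICTD; the helper names (`frame_power`, `icld_db`, `icc`) and `max_lag` are hypothetical.

```python
import math

def frame_power(x):
    """Frame-averaged power (stand-in for the short-time estimate)."""
    return sum(v * v for v in x) / len(x)

def icld_db(x1, x2):
    """Level of channel 2 relative to channel 1 in dB (both powers nonzero)."""
    return 10.0 * math.log10(frame_power(x2) / frame_power(x1))

def normalized_xcorr(x1, x2, d):
    """Normalized cross-correlation of two equal-length frames at lag d."""
    d1, d2 = max(-d, 0), max(d, 0)
    ks = range(max(d1, d2), len(x1))
    cross = sum(x1[k - d1] * x2[k - d2] for k in ks)
    p1 = sum(x1[k - d1] ** 2 for k in ks)
    p2 = sum(x2[k - d2] ** 2 for k in ks)
    denom = math.sqrt(p1 * p2)
    return cross / denom if denom > 0.0 else 0.0

def icc(x1, x2, max_lag):
    """Largest magnitude of the normalized cross-correlation over the lags."""
    return max(abs(normalized_xcorr(x1, x2, d))
               for d in range(-max_lag, max_lag + 1))

# Channel 2 is channel 1 at half amplitude: fully coherent, about 6 dB quieter.
x1 = [1.0, -2.0, 3.0, 0.5]
x2 = [0.5 * v for v in x1]
print(round(icld_db(x1, x2), 2), icc(x1, x2, 2))
```

Because the ICC takes the maximum over lags, a pure delay or a pure gain between the channels does not reduce it; only genuinely different signal content does.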

18

Estimation of spatial cues (4/4)

For multi-channel audio signals, the ICTD and ICLD are defined between the reference channel and each of the other C−1 channels.

19

Synthesis of spatial cues (1/3)

ICTDs are synthesized by imposing delays, ICLDs by scaling, and ICCs by applying de-correlation filters.

20

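Synthesis of spatial cues (2/3)

The delays are determined by the ICTDs:

\[ d_c = \begin{cases} -\dfrac{1}{2} \left( \max_{2 \le l \le C} \tau_{1l}(k) + \min_{2 \le l \le C} \tau_{1l}(k) \right), & c = 1 \\[6pt] \tau_{1c}(k) + d_1, & 2 \le c \le C \end{cases} \]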
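The delay rule can be illustrated with a few lines of code. The sketch assumes the ICTDs are given relative to reference channel 1 and that the reference delay is d_1 = −(max + min)/2 of those ICTDs, with every other channel delayed by its ICTD plus d_1; the function name is hypothetical.

```python
def synthesis_delays(tau):
    """Per-channel delays from the ICTDs tau_1c, c = 2..C (reference: channel 1).

    Returns [d_1, d_2, ..., d_C].
    """
    d1 = -0.5 * (max(tau) + min(tau))      # delay of the reference channel
    return [d1] + [t + d1 for t in tau]    # d_c = tau_1c + d_1 for c >= 2

# Hypothetical ICTDs of channels 2..4 relative to channel 1, in samples.
print(synthesis_delays([4, -2, 0]))   # [-1.0, 3.0, -3.0, -1.0]
```

With this choice of d_1, the delays of channels 2 to C are centered around zero (here between −3 and 3 samples), which keeps the largest delay magnitude among them as small as possible while leaving all pairwise differences, and therefore the ICTDs, unchanged.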

21

Synthesis of spatial cues (3/3)

The scale factors are determined by the ICLDs, satisfying:

\[ \frac{a_c}{a_1} = 10^{\Delta L_{1c}(k)/20} \]

After the delays and scaling, the correlation between the subbands needs to be reduced. This is achieved by designing the filters \(h_c\) so that they are controlled as a function of the ICC.
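The ICLD scaling rule fixes only the ratios a_c / a_1, so one more condition is needed to obtain absolute scale factors. The sketch below assumes, as an illustration only, that the factors are normalized so that their squares sum to one; this normalization and the function name are assumptions, not something stated on the slide.

```python
import math

def scale_factors(icld_db):
    """Scale factors [a_1, ..., a_C] from the ICLDs Delta L_1c (dB), c = 2..C.

    Assumption: the factors are normalized so that sum(a_c ** 2) == 1.
    """
    # Ratios a_c / a_1; the reference channel has ratio 1.
    ratios = [1.0] + [10.0 ** (level / 20.0) for level in icld_db]
    a1 = 1.0 / math.sqrt(sum(r * r for r in ratios))
    return [a1 * r for r in ratios]

# Hypothetical ICLDs: channel 2 at the same level as channel 1, channel 3 is 20 dB louder.
a = scale_factors([0.0, 20.0])
print(a[2] / a[0])   # about 10: 20 dB corresponds to an amplitude ratio of 10
```

The exponent uses a divisor of 20 rather than 10 because the ICLD is defined on powers while the scale factors act on amplitudes.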

22

Conclusions (1/2)

Well-known perceptual audio coders, such as MP3, primarily exploit a single channel’s ability to mask its own quantization noise.

In contrast, spatial perception is primarily attributed to three parameters: ILD, ITD, and IC.

23

Conclusions (2/2)

MPEG Surround provides an extremely efficient method for coding of multi-channel sound via the transmission of a compressed stereo (or even mono) audio program plus a low-rate side-information channel.

MPEG Surround is the latest technology for bitrate-efficient and backward-compatible presentation of multi-channel audio.

24

References

ISO/IEC JTC1/SC29/WG11 (MPEG), Document N7390, “Tutorial on MPEG Surround Audio Coding,” Poznan, Poland, July 2005.

C. Faller, “Parametric coding of spatial audio,” in Proc. DAFx (Digital Audio Effects), October 2004.