Science Meets Arts: An Introduction to Perceptual Audio Coding EQ2460: Seminars in Wireless Systems February 22, 2017 KTH, Stockholm Erik Björkman, M.Sc. EE Research Engineer, ATG Sound Technology Research Dolby Sweden AB www.dolby.com | [email protected] | http://se.linkedin.com/in/ebjorkm


Science Meets Arts: An Introduction to Perceptual Audio Coding

EQ2460: Seminars in Wireless Systems

February 22, 2017 KTH, Stockholm

Erik Björkman, M.Sc. EE

Research Engineer, ATG Sound Technology Research

Dolby Sweden AB

www.dolby.com | [email protected] | http://se.linkedin.com/in/ebjorkm


© 2016-17 DOLBY LABORATORIES, INC.

What is a Perceptual Audio Codec?


• Meaning

• Perceptual ~ related to human perception

• Audio ~ from the Latin verb audīre or audiō (“to hear” or “I hear”)

• Codec ~ (en)coder + decoder ≈ codec

• Data compression

• lossless coding vs. lossy coding

Sound example

• Original (CD-quality) 1411.2 kbps ≈ 1.4 Mbps

– 16-bit PCM, 44.1 kHz, stereo

– BW = 22.050 kHz

• Perceptual audio codec 24 kbps

– Compression ratio ~1:59

– BW = 14.815 kHz
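
The figures in the sound example follow from simple arithmetic. A minimal sketch (the function and variable names are illustrative, not from the slides):

```python
def pcm_bitrate_kbps(bits_per_sample: int, sample_rate_hz: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

cd_kbps = pcm_bitrate_kbps(16, 44_100, 2)   # 16-bit, 44.1 kHz, stereo
codec_kbps = 24.0                           # bitrate of the coded example
ratio = cd_kbps / codec_kbps                # how many times smaller the coded stream is
nyquist_khz = 44_100 / 2 / 1000.0           # largest bandwidth the sample rate can carry

print(f"CD bitrate: {cd_kbps:.1f} kbps")
print(f"Compression ratio: about 1:{ratio:.0f}")
print(f"Bandwidth limit: {nyquist_khz:.3f} kHz")
```

The CD bandwidth of 22.050 kHz is simply half the sampling rate; the codec bandwidth of 14.815 kHz is a property of the coded example and cannot be derived this way.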


Presentation Outline

• My Background

• Dolby Laboratories

• About Dolby

• Overview of recent products

• Master’s thesis proposal

• Introduction to Perceptual Audio Coding

• Psychoacoustics

• Waveform Coding

– Sub-Band Coding (SBC)

• Parametric Coding Algorithms

– Spectral Band Replication (SBR)

– Parametric Stereo (PS)


My Background

• Started the E programme at KTH 2006

• Non-matriculated graduate student at Stanford University 2009/2010

• Erasmus Mundus research intern at the MTG of UPF in Barcelona 2011

• Consultant in RF and electronics design at Tritech Technology AB 2012-2016

• Master’s degree from KTH completed in 2015

• Research engineer at Dolby Sweden AB since October 2016

Hardware setup for headphone measurements during Master’s degree project, Signal Processing Lab, KTH.

What’s Dolby to you?


Dolby Laboratories

• Founded 1965 by Ray Dolby (1933-2013)

• Dolby Noise Reduction [1]

• Technologies for entertainment businesses

• Cinema audio and video

• Home theaters

• Audio and video coding

• Audio post-processing

• Video and image processing

• …and more(!)

• Product concepts

• Professional equipment

• Technology licensing for consumer products

Ray Dolby was honored with a star on the Hollywood Walk of Fame on January 22, 2015, adapted from [2].

Dolby Sweden

• Dolby Laboratories

• Headquarters in San Francisco

• ~40 offices worldwide

• ~2000 employees

• ~400 employees in R&D or related

Dolby Sweden

• Coding Technologies

– originally founded by Lars “Stockis” Liljeryd in 1997

– Dolby acquired Coding Technologies in 2007

• ~15 researchers

• Research focused on audio compression

• Office in Vasastan, Stockholm

Fussball table at Dolby Sweden with a Dolby Conference Phone [3,4] in the background.

Dolby Products

Some of our recent products include

• Dolby Atmos [5,6]

• Dolby Vision [7-9]

• Dolby Cinema [10]

• Dolby Atmos for the Home [11,12]

• Dolby AC-4 [14-19]

• Dolby Home Theater v4 [20]

Feel Every Dimension in Dolby Atmos.

Dolby Atmos

Dolby Atmos trailer (blank slide in hand-out version).

• Audio platform for cinema

• Released in 2012

• Novelties

• Audio objects

• Overhead speakers, front surround speakers

• Up to 64 individual speaker feeds

• 650+ movies mixed for Atmos

• ~2700 planned or installed Atmos screens

• Closest Atmos screen(s):

– Filmstaden Scandinavia #2, #3, #4, Solna [21]

Channel-based vs. object-based surround, adapted from [6].

Dolby Atmos: Rendering

• Inputs: up to 128 tracks

• 9.1 bed (static)

• 118 objects (dynamic)

• Outputs: up to 64 speaker feeds

• Filmstaden Scandinavia [21]

– Screen #2: 55.1

– Screen #3: 47.1

– Screen #4: 42.1

• Benefits

• Increased immersive sound experiences

• Reduced singular-sweet-spot effect

Reproduction principle for Dolby Atmos for the Cinema, adapted from [6].

Dolby Atmos: Speaker Outline

• Overhead surround speakers

• Enables sounds or objects from above

• Front surround speakers Lfs / Rfs

• Surround speaker arrays (L = Left, R = Right)

5.1: L/R surround Ls / Rs

7.1: L/R side/rear surround Lss / Lrs / Rrs / Rss

9.1: Two more arrays for surround bed Lfs / Rfs + 7.1

• Increased spatial resolution in the azimuthal plane

• Individual feeds for surround speakers

• Previous platforms

– Surround speakers are only grouped as arrays

• Object positions

– Enables arbitrary spatial rendering

Speaker outline for a Dolby Atmos theater, adapted from [6]. Labels in the figure: Lfs, Rfs, Lss, Rss, Lrs, Rrs, with Ls = Lss + Lrs and Rs = Rss + Rrs.

Dolby Vision

• Traditional TVs

• Legacy: Cathode Ray Tube (CRT) technology

• Brightness from 0.117 to 100 nits (1 nit = 1 cd/m²)

– Sunny day may be measured at up to 50,000 nits

• HDR reproduction

• Poor reproduction of HDR images at ≤ 100 nits

• Narrow dynamic range

• Modern TVs may offer playback at 300-500 nits

• Image-distorting post-processing depends on manufacturer and model

– Lack of source data for post-processing

Brightness of an HDR flower image, adapted from [8].

Dolby Vision

• Research at Dolby

• 0.001 to 10,000 nits to please 90% of viewers [9]

• Dolby Vision concept

• Standardize post-processing

– Extract metadata during image pre-processing

– Master images with a calibrated display (Pulsar)

– Algorithms are tuned to each TV or monitor

• No new codec required

– Dual layer: base (SDR) + enhancement (EDR; Vision)

– SDR images transmitted through industry-standard codecs (HEVC, AVC)

– Enhancement layer ignored by SDR displays

Summary of dynamic range subjective test results, adapted from [9].

Dolby Vision

• Perceptual Quantizer (PQ)

• Novel EOTF allows coding a 10,000-nit range with 12 bits instead of 14 bits

• Standardized as SMPTE ST-2084

• Image mastering at 4,000 nits

• CMU maps images to a reference display at SDR brightness 100 nits

• EDR metadata enables rendering to arbitrary brightness up to 4,000 nits

• Up to 10,000 nits in future versions

• More detailed images (EDR = HDR + WCG)

• Higher dynamic range (HDR)

• Wider color “gamut” (WCG)

• Results in greater contrast and color “volume”
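
The slide names the Perceptual Quantizer but does not show the curve. A minimal sketch of the PQ transfer function, using the constants published in SMPTE ST 2084 (the function names are illustrative, and this is not Dolby's implementation):

```python
# Constants as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0

def pq_eotf(signal: float) -> float:
    """Map a non-linear PQ signal value in [0, 1] to luminance in nits."""
    e = max(0.0, min(1.0, signal)) ** (1.0 / M2)
    return PEAK_NITS * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)

def pq_inverse_eotf(nits: float) -> float:
    """Map luminance in nits (0 to 10,000) to a PQ signal value in [0, 1]."""
    y = (max(0.0, min(PEAK_NITS, nits)) / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

for nits in (0.1, 100.0, 1000.0, 4000.0, 10000.0):
    code_12bit = round(pq_inverse_eotf(nits) * 4095)   # full-range 12-bit code value
    print(f"{nits:8.1f} nits -> 12-bit code {code_12bit}")
```

The curve spends roughly half of the code range on luminances up to 100 nits, which is where the eye is most sensitive to relative steps.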

Graphical representations of gamut and dynamic range, adapted from [8].

Dolby Cinema

Dolby Cinema at AMC, adapted from [10].

Total cinema experience with Dolby Atmos, Dolby Vision and premium interior design.

Dolby Atmos for the Home

• Audio reproduction system for the home theater

• Released in 2014

• Immersive content via…

• Blu-ray

• Streaming

• Broadcast

• Speaker options

• Traditional surround system + either:

– overhead speakers

– upward-firing Atmos speakers

• Atmos enabled sound bar + optional subwoofer

Dolby Atmos for the Home 7.1.6 surround system with upward-firing Atmos speakers, adapted from [23].

Dolby AC-4

• State-of-the-art audio codec

• Broadcasting

• Streaming

• End-to-end system from content creation to distribution and playback

• Key features

• Personalization and accessibility

• Support for immersive content

• Efficiency: 50 % greater compression efficiency compared to Dolby Digital Plus (DD+)

• Video-frame alignment

Audio representation in DD+ (conventional) vs. AC-4 (present), adapted from [18].

AC-4 Video-Frame Alignment

AC-4 video-frame alignment (conventional approach vs. AC-4), adapted from [18].

Complexity Comparison: Enhanced AC-3 (DD+) vs. AC-4

Number of pages used to describe the respective syntax elements of Enhanced AC-3 and AC-4, adapted from [17].

Dolby Home Theater v4

Figure: group labels “Improve Content Quality” and “Enhance the Playback System”; features shown: Surround Decoder, Dialog Enhancer, Intelligent EQ, Volume Leveler, Surround Virtualizer, Volume Maximizer, Audio Optimizer, Audio Regulator, Graphic EQ.

Master’s Thesis Proposal: Algorithm for Loudness Metering of Object-Based Audio Content

Location: Dolby Sweden office, Stockholm, Sweden

Duration, Start: 20 weeks (30hp); Flexible (a.s.a.p)

Objective: Develop and evaluate real-time algorithms for loudness-leveling of object-based audio content

Profile:

• Last-year MSc EE, MSc CS student or similar with focus on digital signal processing

• Courses or practical experience in one or more of the following: audio signal processing, psychoacoustics, non-linear signal processing

• Good programming skills in C/C++ and experience in Matlab or Python for digital signal processing

• Excellent communication skills and fluent in English

Applications: Apply online at https://career4.successfactors.com/career?company=Dolby , requisition number 24981


Introduction to Perceptual Audio Coding

Sub-Band Coding

&

Parametric Coding Algorithms


Audio Coding (compression)

• General objective

• To reduce the number of bits needed to represent an audio signal in a digital form

• Lossless coding: redundancy reduction

• Perfect reconstruction

– Comparable to general data compression; “.zip”

• Dolby TrueHD, FLAC

• Lossy coding: irrelevancy reduction

• “Intelligent” quantization

• Reconstructed signals contain additional quantization noise and time-aliasing distortion

– Ideally: coding artifacts are inaudible

– Comparable to lossy image compression; “.jpeg”

• AC-4, Dolby Digital Plus, AAC, mp3, ogg vorbis
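
The difference between the two families can be shown on a short test tone. The sketch below uses `zlib` as a stand-in for a lossless coder (in the spirit of the “.zip” comparison) and plain bit truncation as a deliberately crude lossy step; neither is what FLAC or AAC actually does, and the test signal is an assumption made for illustration.

```python
import math
import struct
import zlib

# 0.1 s of a 440 Hz tone as 16-bit PCM at 44.1 kHz (hypothetical test signal).
samples = [int(round(10000 * math.sin(2 * math.pi * 440 * n / 44100)))
           for n in range(4410)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: redundancy is removed, and decoding returns every bit unchanged.
packed = zlib.compress(pcm, 9)
lossless_ok = zlib.decompress(packed) == pcm

# Lossy (crude): discard the 8 least significant bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
max_err = max(abs(a - b) for a, b in zip(samples, coarse))

print(f"lossless: {len(pcm)} -> {len(packed)} bytes, exact round trip: {lossless_ok}")
print(f"lossy: 8 of 16 bits kept, maximum sample error: {max_err}")
```

A perceptual codec differs from the truncation step in where it spends the error: it shapes the quantization noise so that it stays below what the listener can hear.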


Perceptual Audio Coding

• Objective

• To use as few bits as possible

• Straightforward solution

• Coarser quantization

– uses fewer bits but…

– introduces more noise(!)

• What’s the possible trade-off then?

– No need to reproduce sounds that “we” can’t hear

• Alternative solution

• Exploit observed limitations of human hearing

– Sub-band coding (waveform data)

– Parametric coding algorithms (parametric data)

Outline of a perceptual audio codec (sub-band coding).

Psychoacoustics

• Is human hearing linear?

• Pitch or frequency detection mechanisms

• Absolute threshold of hearing

• Masking phenomena

• Spectral or simultaneous masking*

– Frequency domain

• Temporal or non-simultaneous masking

– Time-domain: forward masking, backward masking

• Frequency resolution

• Critical bandwidth

– Low vs. High frequencies
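
The slide shows these quantities as figures only. As a rough numerical companion, the sketch below uses closed-form approximations that are widely quoted in the audio coding literature (Terhardt's fit for the threshold in quiet, and the Zwicker and Terhardt fits for critical bandwidth and the Bark scale). They are not taken from the slides, and the function names are illustrative.

```python
import math

def threshold_in_quiet_db_spl(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's fit)."""
    f = freq_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate critical bandwidth in Hz around a centre frequency."""
    f = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f * f) ** 0.69

def hz_to_bark(freq_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

for f in (100, 1000, 3300, 10000):
    print(f"{f:6d} Hz: threshold {threshold_in_quiet_db_spl(f):6.1f} dB SPL, "
          f"critical bandwidth {critical_bandwidth_hz(f):7.1f} Hz, "
          f"{hz_to_bark(f):5.2f} Bark")
```

The output illustrates both bullet points: hearing is most sensitive around 3 to 4 kHz, and critical bands are about 100 Hz wide at low frequencies but well over 2 kHz wide at 10 kHz.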

Absolute threshold of hearing, adapted from [24].

Spectral Masking

• Masking thresholds

• Spectral components of simultaneous sounds below the masking threshold are inaudible

• Frequency-dependent

• Level-dependent

• Playback at different levels

• Requires compromise and approximation

• *Importance of spectral masking

• May be the most important psychoacoustic phenomenon applied in perceptual audio codecs

• EQ2321 Speech and Audio Processing

• Launched in P3, 2018

• 7.5 credits

Spectral masking thresholds for critical-band-wide noise, adapted from [24].

Quantization Noise

• Signal-to-Quantization-Noise-Ratio

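• $\mathrm{SQNR} = 20 \log_{10}\!\left(2^{N}\right)\ \mathrm{dB} \approx 6.02 \cdot N\ \mathrm{dB}$

• $N$ = number of quantization bits

• CD-quality audio

• 16-bit PCM stereo audio sampled at 44.1 kHz

• $\mathrm{SQNR} \approx 6.02 \cdot 16\ \mathrm{dB} \approx 96\ \mathrm{dB}$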

• The figure to the right

• Q-noise << Masking threshold < Audio signal

• Weak sounds

• Q-noise ~ Masking threshold

• What happens if played through a powerful amplifier?
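
The 6 dB-per-bit rule can be checked numerically. The sketch below evaluates the rule of thumb and also measures the SQNR of a uniformly quantized full-scale sine; for that particular signal the textbook result is about 1.76 dB higher than the rule of thumb. The test signal and function names are assumptions made for illustration.

```python
import math

def sqnr_rule_of_thumb_db(bits: int) -> float:
    """20*log10(2^bits), i.e. roughly 6.02 dB per quantization bit."""
    return 20.0 * math.log10(2.0 ** bits)

def measured_sqnr_db(bits: int, num_samples: int = 20000) -> float:
    """SQNR of a full-scale sine after uniform (rounding) quantization."""
    step = 2.0 / (2 ** bits)              # step size for a range of -1 to +1
    signal_power = 0.0
    error_power = 0.0
    for n in range(num_samples):
        x = math.sin(2.0 * math.pi * 0.01234567 * n)
        q = round(x / step) * step        # quantize to the nearest level
        signal_power += x * x
        error_power += (x - q) ** 2
    return 10.0 * math.log10(signal_power / error_power)

for bits in (8, 12, 16):
    print(f"{bits:2d} bits: rule of thumb {sqnr_rule_of_thumb_db(bits):6.2f} dB, "
          f"measured for a sine {measured_sqnr_db(bits):6.2f} dB")
```

Every bit removed raises the noise floor by about 6 dB, which is why the bits have to be removed where the masking threshold leaves room for the extra noise.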

Quantization noise spectrum (dark grey) in linearly quantized audio, adapted from [25].

Ideal Audio Codec

Quantization noise spectrum (dark grey) of an ideal perceptual audio coder, adapted from [25].

• Quantization errors just below the masking threshold…

• …if the bitrate is sufficiently high

Sound example

• Original (CD-quality) ~1.4 Mbps

– BW = 22.050 kHz

• AAC (SBC) 128 kbps

– BW = 16.0 kHz

– Compression ratio ~1:11

• Difference signal @ –19.3 dB

• White noise @ –19.3 dB

• Original + white noise SNR @ +19.3 dB


Sub-Band Coding (SBC)

• Compute the masking threshold

• Perceptual masking model derived from psychoacoustic experiments

• Encoding

• Split the signal into sub-bands

• Quantize the signal within each sub-band

– Q-error < Masking threshold in each sub-band

– Use an “optimal” number of bits in each sub-band

• Decoding

• Decode the quantized sub-band signals

• Synthesize output waveform in a synthesis filterbank

– “Inverse encoding”
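
The bit allocation idea can be sketched as follows, assuming the signal level and masking threshold per sub-band are already known in dB. Each bit lowers the quantization noise by about 6.02 dB, so a band needs just enough bits to push the noise below its threshold. The band values are hypothetical, and real encoders use more elaborate allocation loops under a bitrate constraint.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB of SQNR per bit

def allocate_bits(signal_db, mask_db, max_bits=16):
    """Smallest bit count per sub-band that keeps the Q-error below the mask."""
    bits = []
    for level, mask in zip(signal_db, mask_db):
        smr = level - mask                 # signal-to-mask ratio in dB
        if smr <= 0.0:
            bits.append(0)                 # band is fully masked: send nothing
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Hypothetical levels for four sub-bands (dB).
signal_db = [60.0, 50.0, 30.0, 10.0]
mask_db = [40.0, 45.0, 35.0, 20.0]
print(allocate_bits(signal_db, mask_db))
```

Bands whose signal lies below the masking threshold receive no bits at all, which is where most of the savings over plain PCM come from.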

Quantization concept in sub-band coding.

Low-Bitrate Coding

Quantization noise spectrum (dark grey) of a suboptimal audio coder, adapted from [25].

• Bitrate too low

• Quantization errors become audible

Sound example

• Original (CD-quality) ~1.4 Mbps

– BW = 22.050 kHz

• AAC (constrained BW = 16.0 kHz)

– BW = 16.0 kHz 48 kbps

– BW = 16.0 kHz 24 kbps

Bandlimit the input signal?


Band-Limiting

Quantization noise spectrum (dark grey) of an ideal but bandlimited audio coder, adapted from [25].

• Less audible quantization errors…

• …but more spectral “distortion”

Sound example

• Original (CD-quality) ~1.4 Mbps

– BW = 22.050 kHz

• AAC (unconstrained BW)

– BW 8.5 kHz 48 kbps

– BW 6.6 kHz 24 kbps

Other remedies?

• Parametric coding algorithms…


Parametric Coding Algorithms

Spectral Band Replication

&

Parametric Stereo


Spectral Band Replication (SBR)

• Problem: available bitrate too low

• Idea

• Human hearing has relatively poor high-frequency resolution

• Correlation between low and high frequencies

• “Reconstruct” high-frequency components

– Transposition of bandlimited waveform

– Adjust transposed frequencies with parameters estimated by the encoder

– Reconstruction outline on the next slide…

• Solution: Spectral Band Replication [25]

• Developed by Coding Technologies

• Used in mp3PRO and aacPlus v1 (HE-AAC)

– SBC (mp3 or AAC core) + SBR

• A-SPX in AC-4 [15] derived from SBR

SBR transposition of bandlimited waveform, adapted from [25].

SBR Reconstruction

1. Transposition

2. Envelope adjustment

3. Tonality adjustment

Sound example

• Original (CD-quality) ~1.4 Mbps

– BW = 22.050 kHz

• AAC 48 kbps

– BW 8.5kHz

• aacPlus v1 (AAC+SBR) 48 kbps

– BW = 16.193kHz

– Compression ratio ~1:29

Output spectrum after SBR reconstruction, adapted from [25].

SBR Encoding

Outline of an SBR encoder.

1. Extract parametric data

• Cut-off frequency

– Depends on final bitrate, typically fc > 3-4 kHz

• Spectral envelope

• Tonality

2. Extract core waveform

• Low-pass filtering

• Down-sampling

3. SBC encoding of core waveform

• AAC, mp3

4. Bitstream multiplexing

• Encoded waveform + parametric data
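
The data flow of these steps can be illustrated with a toy model that operates on a list of band energies rather than on a waveform. Everything here is an assumption made for illustration: there is no filterbank, no down-sampling, no AAC core and no tonality estimate, and the function name and dictionary layout are hypothetical.

```python
def sbr_encode_toy(band_energies, cutoff_band, group_size=2):
    """Split a spectrum into a core part and a coarse high-band envelope."""
    core = band_energies[:cutoff_band]          # stands in for the low-passed core
    high = band_energies[cutoff_band:]          # bands that will not be waveform-coded
    envelope = []
    for start in range(0, len(high), group_size):
        group = high[start:start + group_size]
        envelope.append(sum(group) / len(group))   # one mean energy per group
    # "Multiplexing": core data and parametric data travel together.
    return {"core": core, "cutoff_band": cutoff_band,
            "group_size": group_size, "envelope": envelope}

# Hypothetical energies for eight bands, lowest frequency first.
spectrum = [8.0, 6.0, 4.0, 2.0, 1.0, 1.0, 0.5, 0.5]
frame = sbr_encode_toy(spectrum, cutoff_band=4)
print(frame)
```

Four high-band values are replaced by two envelope values in this example; in a real codec the parametric data is far smaller than the waveform data it replaces.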


SBR Decoding

Outline of an SBR decoder.

1. Bitstream demultiplexing

• encoded waveform

• SBR data

2. Decode core waveform

3. SBR reconstruction

• Transposition of core waveform

• Envelope adjustment of reconstructed frequencies

• Tonality adjustment through addition of sinusoids and noise

Details follow in the upcoming slides…

SBR: core decoding

Core waveform decoding.

1. Perform decoding of core waveform

SBR: waveform analysis

1. Perform decoding of core waveform

2. Analyze core waveform in a QMF filterbank

QMF analysis of core waveform.

SBR: sub-band transposition

1. Perform decoding of core waveform

2. Analyze core waveform in a QMF filterbank

3. Transpose the QMF sub-bands

Transposition of the core waveform QMF bands.

SBR: envelope adjustment

1. Perform decoding of core waveform

2. Analyze core waveform in a QMF filterbank

3. Transpose the QMF sub-bands

4. Adjust the spectral envelope of the transposed QMF bands to match the original envelope

Envelope adjustment of transposed QMF bands.

SBR: sinusoids and noise

1. Perform decoding of core waveform

2. Analyze core waveform in a QMF filterbank

3. Transpose the QMF sub-bands

4. Adjust the spectral envelope of the transposed QMF bands to match the original envelope

5. Add additional sinusoids and noise to match the tonality of the original signal

Add sinusoids and noise if necessary.

SBR: waveform synthesis

1. Perform decoding of core waveform

2. Analyze core waveform in a QMF filterbank

3. Transpose the QMF sub-bands

4. Adjust the spectral envelope of the transposed QMF bands to match the original envelope

5. Add additional sinusoids and noise to match the tonality of the original signal

6. Sample rate conversion of core QMF bands and waveform synthesis of the output signal in a synthesis filterbank
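
Steps 3 and 4 can be illustrated with a toy model that works on band energies instead of QMF sub-band signals. This is a sketch under strong assumptions: the core bands are simply repeated upward as the “transposition”, the envelope is one mean energy per group of bands, and steps 1, 2, 5 and 6 are left out. The function name and data layout are hypothetical.

```python
def sbr_decode_toy(core, envelope, group_size=2):
    """Rebuild high bands by copying core bands up and rescaling them."""
    num_high = len(envelope) * group_size
    # Step 3, transposition: reuse the core bands as raw high-band material.
    patched = [core[i % len(core)] for i in range(num_high)]
    # Step 4, envelope adjustment: match each group's mean energy to the target.
    adjusted = []
    for index, target in enumerate(envelope):
        group = patched[index * group_size:(index + 1) * group_size]
        mean = sum(group) / len(group)
        gain = target / mean if mean > 0.0 else 0.0
        adjusted.extend(energy * gain for energy in group)
    return core + adjusted

core = [8.0, 6.0, 4.0, 2.0]      # decoded core bands (hypothetical values)
envelope = [1.0, 0.5]            # transmitted mean energy per high-band group
output = sbr_decode_toy(core, envelope)
print([round(e, 3) for e in output])
```

The fine structure of the high bands comes from the core, while only their coarse shape comes from the transmitted parameters. That is why the tonality correction in step 5 is needed when the copied material does not resemble the original high band.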

Waveform synthesis.

Parametric Stereo (PS)

• Problem: available bitrate is still too low

• Idea

• Stereo vs. Mono

– Mono: half the bitrate for a given “Signal-to-Quantization-Error-Ratio”

• “Reconstruct” stereo from a decoded mono down-mix with parameters estimated by the encoder

• Solution: Parametric Stereo [15]

• Developed by Coding Technologies

• ~2-3 kbps of parametric data

• Enhanced aacPlus (aacPlus v2)

– SBC (AAC core) + SBR + PS

• A-CPL in AC-4 [15] derived from PS

Outline of a generalized PS encoder, adapted from [26].

PS: coding principle

• Encoding

• Estimate intensity and correlation properties of the stereo channels

– Inter-channel Intensity Difference (IID) -> “pan”

– Inter-Channel Correlation (ICC) -> “ambience”

• Down-mix stereo to mono M

– Typically  

• Decoding

• Reconstruct channel intensities through the IID

• Decorrelate channel signals through all-pass filters estimated from the ICC
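
A broadband toy version of the parameter estimation and intensity reconstruction is sketched below. It assumes a down-mix of M = (L + R)/2 and a gain normalization of g_L² + g_R² = 2, both chosen for illustration rather than taken from the slides. Real PS works per frequency band and time segment, and the ICC-driven decorrelation is omitted here.

```python
import math

def ps_encode_toy(left, right):
    """Estimate IID (dB) and ICC, and form a mono down-mix (assumed (L+R)/2)."""
    power_l = sum(x * x for x in left)
    power_r = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))
    iid_db = 10.0 * math.log10(power_l / power_r)    # "pan"
    icc = cross / math.sqrt(power_l * power_r)       # "ambience"
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    return mono, iid_db, icc

def ps_decode_intensity(mono, iid_db):
    """Restore the left/right power ratio from the mono signal and the IID."""
    ratio = 10.0 ** (iid_db / 10.0)
    gain_l = math.sqrt(2.0 * ratio / (1.0 + ratio))
    gain_r = math.sqrt(2.0 / (1.0 + ratio))
    return [gain_l * m for m in mono], [gain_r * m for m in mono]

# Hypothetical stereo frame: same waveform, left channel twice as loud.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
mono, iid_db, icc = ps_encode_toy(left, right)
out_l, out_r = ps_decode_intensity(mono, iid_db)
print(f"IID = {iid_db:.2f} dB, ICC = {icc:.2f}")
```

The sketch assumes both channels carry some energy; a silent channel would need special handling before taking the logarithm.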

Parametric view of stereo signals, adapted from [26].

Combining SBR and PS

Sound example

• Original (CD-quality) ~1.4 Mbps

– BW = 22.050 kHz

• AAC 24 kbps

– BW = 6.6 kHz

• aacPlus v1 (AAC+SBR) 24 kbps

– BW = 12.748 kHz

• aacPlus v2 (AAC+SBR+PS) 24 kbps

– BW = 14.815 kHz

– Compression ratio ~1:59

Conclusions

• If the bitrate is too low for SBC coding

– Form a hybrid codec with SBC and SBR

• If the bitrate is too low for SBR+SBC coding

– Form a hybrid codec with SBC and SBR+PS

Outline of a hybrid decoder with both SBR and PS, adapted from [26].

Summary & Remarks

Psychophysical models

Lossy compression codecs for audio, video and images can benefit from psychophysical models derived from observed limitations in human perception

Parametric coding

Parametric coding algorithms may offer substantial bitrate reductions when combined with core codecs that are unsuitable for further bitrate reductions

The use of a core codec (AAC, HEVC/AVC) in combination with an enhancement codec (SBR/PS, EDR) may offer backwards compatibility with devices only capable of decoding the core signal

Increased system efficiency at the expense of increased complexity

May offer further bitrate reductions like in the case of multiple language representations in AC-4

Quality measures

Subjective quality may vary among individuals and objective quality shall be measured in studies with blind-testing conditions [27] or with validated measurement algorithms [28]

The sound examples in this presentation are subject to bias and shall only be viewed as educational examples

Further Questions?

Dolby Laboratories

Dolby Products

Master’s Thesis Proposal

Perceptual Audio Coding


Abbreviations & Keywords

A-CPL  Advanced CouPLing
A-SPX  Advanced SPectral eXtension
AAC  Advanced Audio Coding
ATG  Advanced Technology Group (Dolby Laboratories)
AVC  Advanced Video Coding
BW  BandWidth
CMU  Content Mapping Unit
CRT  Cathode Ray Tube
DD+  Dolby Digital Plus
EDR  Extended Dynamic Range (Dolby Vision)
EOTF  Electro-Optical Transfer Function
EQ  EQualizer (audio)
HDR  High Dynamic Range
HE-AAC  High-Efficiency AAC (aacPlus)
HEVC  High-Efficiency Video Coding
HF  High-Frequency
ICC  Inter-Channel Correlation
IID  Inter-channel Intensity Difference
kbps  KiloBits Per Second
L  Left (multi-channel audio)
Lfs  Left Front Surround (9.1)
Lrs  Left Rear Surround (7.1, 9.1)
Ls  Left Surround (5.1)
Lss  Left Side Surround (7.1, 9.1)
Mbps  MegaBits Per Second
MDCT  Modified Discrete Cosine Transform
MTG  Music Technology Group (UPF)


PCM  Pulse-Code Modulation
PQ  Perceptual Quantizer (Dolby Vision)
PQF  Polyphase Quadrature Filter(s)
PS  Parametric Stereo
Q-error  Quantization error (“noise + distortion”)
Q-noise  Quantization noise
QMF  Quadrature Mirror Filter(s)
R  Right (multi-channel audio)
R&D  Research & Development
RF  Radio-Frequency
Rfs  Right Front Surround (9.1)
Rrs  Right Rear Surround (7.1, 9.1)
Rs  Right Surround (5.1)
Rss  Right Side Surround (7.1, 9.1)
SBC  Sub-Band Coding
SBR  Spectral Band Replication
SDR  Standard Dynamic Range
SMPTE  Society of Motion Picture and Television Engineers
SQNR  Signal-to-Quantization-Noise-Ratio
STFT  Short-Time Fourier Transform
UPF  Universitat Pompeu Fabra (Barcelona, Spain)
WCG  Wide Color Gamut
-> critical bandwidth (psychoacoustics)
-> spectral/simultaneous masking (psychoacoustics)


Links & References

1. R. Dolby, “An Audio Noise Reduction System”, Journal of the Audio Engineering Society, October 1967, Vol. 15, p.383-388

2. https://www.dolby.com/us/en/about/leadership/ray-dolby-walk-of-fame.html (URL 2017/02/16)

3. https://www.dolby.com/us/en/technologies/dolby-voice.html (URL 2017/02/16)

4. https://www.dolby.com/us/en/professional/products/dolby-conference-phone.html (URL 2017/02/16)

5. https://www.dolby.com/us/en/technologies/cinema/dolby-atmos.html (URL 2017/02/16)

6. https://www.dolby.com/us/en/professional/cinema/products/dolby-atmos-next-generation-audio-for-cinema-white-paper.pdf (URL 2017/02/16)

7. https://www.dolby.com/us/en/brands/dolby-vision.html (URL 2017/02/16)


8. https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-white-paper.pdf (URL 2017/02/16)

9. https://www.dolby.com/us/en/technologies/dolby-vision/the-art-of-better-pixels.pdf (URL 2017/02/16)

10. https://www.dolby.com/us/en/platforms/dolby-cinema.html / (URL 2017/02/16)

11. http://www.dolby.com/us/en/technologies/home/dolby-atmos.html (URL 2017/02/03)

12. https://www.dolby.com/us/en/technologies/dolby-atmos/dolby-atmos-for-the-home-theater.pdf (URL 2017/02/16)

13. https://www.dolby.com/uploadedFiles/wwwdolbycom/Content/Gutter/dolby-atmos-for-sound-bar-applications.pdf (URL 2017/02/16)

14. https://www.dolby.com/us/en/brands/dolby-audio.html (URL 2017/02/16)


15. K. Kjörling et al., “AC-4 – The Next Generation Audio Codec”, 140th AES Convention, Paris, June 4–7, 2016

16. H. Purnhagen et al., “Immersive Audio Delivery Using Joint Object Coding”, 140th AES Convention, Paris, June 4–7, 2016

17. J. Larsen and M. Wolters, “Development Tools for Modern Audio Codecs”, 140th AES Convention, Paris, June 4–7, 2016

18. https://www.dolby.com/uploadedFiles/wwwdolbycom/Content/Technologies/AC-4/Dolby-AC-4-Audio-System-for-Next-Generation-Broadcast-Services.pdf (URL 2017/02/16)

19. https://www.dolby.com/us/en/technologies/ac-4/Next-Generation-Entertainment-Services.pdf (URL 2017/02/16)

20. https://www.dolby.com/us/en/technologies/dolby-pc-entertainment-experience-v4-overview.pdf (URL 2017/02/16)


21. http://www.sf.se/biografer/Stockholm---Filmstaden-Scandinavia/Dolby-ATMOS/ (URL 2017/02/07)

22. https://www.dolby.com/us/en/star-wars/index.html (URL 2017/02/16)

23. https://www.dolby.com/us/en/guide/7.1.6-dolby-atmos-enabled-speaker-setup-guide.pdf (URL 2017/02/17)

24. E. Zwicker and H. Fastl, “Psychoacoustics – Facts and Models”, Third Edition. Springer. 2007. pages 17, 64.

25. M. Dietz et al., “Spectral Band Replication, a Novel Approach in Audio Coding”, 112th AES Convention, Munich, May 10-13, 2002

26. H. Purnhagen, “Low Complexity Parametric Stereo Coding in MPEG-4,” in Proc. 7th Int. Conf. Digital Audio Effects (DAFx’04), Naples, October 5-8, 2004


27. International Telecommunications Union, “Method for the Subjective Assessment of Intermediate Quality Level of Audio Systems”, Rec. ITU-R BS.1534-3 (10/2015)

28. International Telecommunications Union, “Method for Objective Measurements of Perceived Audio Quality”, Rec. ITU-R BS.1387-1 (11/2001)
