
Properties of Musical Sound

Subjective           Objective
Pitch                Frequency
Volume               Amplitude/power/intensity
Timbre               Overtone content/spectrum
Duration in beats    Duration in time

Direct Sound

Sound waves that travel directly from the source to the listener.

Direct sound intensity attenuates with distance according to the inverse square law:

$$ I = \frac{K}{\mathrm{Dist}^2} $$

For example, doubling the distance attenuates the intensity by a factor of 4, i.e. about 6 dB.
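A small sketch of the inverse square law arithmetic. The function names are illustrative, and distances can be in any consistent unit because only their ratio matters.

```python
import math

def intensity_ratio(d1, d2):
    """Intensity at distance d2 relative to d1, from I = K / Dist**2."""
    return (d1 / d2) ** 2

def attenuation_db(d1, d2):
    """Attenuation in dB when the listener moves from d1 to d2 (positive = quieter)."""
    return -10.0 * math.log10(intensity_ratio(d1, d2))

print(intensity_ratio(1.0, 2.0))           # 0.25, a quarter of the intensity
print(round(attenuation_db(1.0, 2.0), 2))  # 6.02
```

Each further doubling of distance adds another 6 dB of attenuation.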

Early (first order) Reflection

Sound waves that travel to the listener after reflecting “once” from the environment (mainly walls).

Early reflections arriving within 35 ms of the direct sound reinforce it.

According to Beranek, who studied 54 concert halls, an “intimate” impression was felt when early reflections arrived within 20 ms of the direct sound.

In large halls, suspended reflectors are used to provide early reflections to the center seats.

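Reverberation (second- and higher-order reflections)

Sound waves that reach the listener after first-order reflections have been reflected again.

Reverberation decays with time as sound energy is absorbed by the environment.

Reverberation time is the time taken for the sound pressure level to drop by 60 dB from its initial level, generally quoted for the 500-1000 Hz range. Sabine's formula estimates it in seconds as

$$ T_R \approx 0.161\,\frac{\text{Room volume } V\ (\mathrm{m}^3)}{\text{Total room surface absorption } A\ (\mathrm{m}^2)} $$

where A is the sum, over all surfaces, of each surface area multiplied by its absorption coefficient.

High-frequency sound is absorbed in air more quickly than low-frequency sound, so its reverberation time is shorter.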
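A minimal sketch of a Sabine reverberation-time estimate. The constant 0.161 s/m is the usual metric Sabine constant; the room dimensions and absorption coefficients below are hypothetical, chosen only to illustrate the calculation.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine estimate of reverberation time in seconds (metric units).

    surfaces: list of (area_m2, absorption_coefficient) pairs. The coefficients
    depend on frequency, so use values for the band of interest.
    Air absorption is ignored.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)  # m^2
    return 0.161 * volume_m3 / total_absorption

# Hypothetical 10 m x 8 m x 4 m room with illustrative coefficients
room = [
    (80.0, 0.30),   # floor
    (80.0, 0.05),   # ceiling
    (144.0, 0.10),  # four walls
]
print(round(sabine_rt60(10 * 8 * 4, room), 2))  # about 1.22 s
```

Doubling every absorption coefficient halves the estimated reverberation time.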

Microphone

Dynamic: magnetic induction

(Figure: a classical ribbon microphone, showing the magnet, coil and ribbon diaphragm.)

Simple, economical, robust.

Condenser: capacitor transducer

(Figure: a condenser microphone.)

Complicated, expensive, sharper transient response, requires phantom power.

Polar pattern: the magnitude of a microphone’s response to pressure changes arriving from different directions. With θ measured from the front axis (0°):

Omnidirectional: $r(\theta) = 1$
Bidirectional (figure-eight): $r(\theta) = \cos\theta$
Standard cardioid: $r(\theta) = 0.5 + 0.5\cos\theta$
Supercardioid: $r(\theta) = 0.37 + 0.63\cos\theta$
Subcardioid: $r(\theta) = 0.75 + 0.25\cos\theta$

(Figures: polar plot of each pattern, 0° to 330° in 30° steps, radial scale 0.25 to 1.0.)
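The first-order patterns above all have the form r(θ) = A + B cos θ with A + B = 1, so one small function covers them. This sketch evaluates the response and finds the null angle where r(θ) = 0; the dictionary and function names are illustrative.

```python
import math

# First-order patterns r(theta) = A + B*cos(theta)
PATTERNS = {
    "omnidirectional": (1.0, 0.0),
    "bidirectional": (0.0, 1.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.37, 0.63),
    "subcardioid": (0.75, 0.25),
}

def response(pattern, theta_deg):
    """Relative response; a negative value means inverted polarity (rear lobe)."""
    a, b = PATTERNS[pattern]
    return a + b * math.cos(math.radians(theta_deg))

def null_angle(pattern):
    """Angle in degrees where the response is zero, or None if there is no null."""
    a, b = PATTERNS[pattern]
    if b == 0 or abs(a / b) > 1:
        return None
    return math.degrees(math.acos(-a / b))

for name in PATTERNS:
    print(f"{name:16s} r(90)={response(name, 90):+.2f} "
          f"r(180)={response(name, 180):+.2f} null={null_angle(name)}")
```

The cardioid's null sits at 180°, the supercardioid's at roughly 126°, and the omnidirectional and subcardioid patterns have no null at all.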

XY (coincident pair) Microphone Recording

(Figure: top and front views; angle between the capsules 90° to 135°.)

Two identical cardioids aimed across each other at 90° to 135°, 12 inches or less apart.

Extremely mono-compatible, moderate stereo effect.

Localization of sound source based on difference in amplitude.

e.g., if L > R, the source seems to be closer to the left side.
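A sketch of amplitude-based localization with an XY pair. It assumes ideal cardioids, a 0° front axis, positive angles to the left, and the left capsule aimed at +half the included angle. These conventions are assumptions made for the example.

```python
import math

def cardioid(theta_deg):
    return 0.5 + 0.5 * math.cos(math.radians(theta_deg))

def xy_levels(source_deg, included_angle_deg=90.0):
    """(L, R) cardioid outputs for a source at source_deg.

    Assumed convention: positive angles are to the left; the left capsule points
    at +included/2 and the right capsule at -included/2.
    """
    half = included_angle_deg / 2.0
    left = cardioid(source_deg - half)
    right = cardioid(source_deg + half)
    return left, right

def level_difference_db(source_deg, included_angle_deg=90.0):
    """Inter-channel level difference; positive means L > R."""
    left, right = xy_levels(source_deg, included_angle_deg)
    return 20.0 * math.log10(left / right)

print(xy_levels(30.0))                      # L > R: image pulled to the left
print(round(level_difference_db(30.0), 2))
```

Because the capsules are coincident, arrival times match and only the level difference carries the direction, which is why the pair stays mono-compatible.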

Blumlein Coincident Microphone Recording

(Figure: top and front views; L and R figure-eight microphones at 90°.)

Two identical figure-eight microphones placed at 90° to each other, one directly on top of the other.

Created by Alan Blumlein, the arrangement provides precise stereo imaging of sound sources at the front and picks up reverberation from the rear.

Near-coincident Microphone Recording

(Figure: top view; angle between the capsules 90° to 135°.)

ORTF (Office de Radiodiffusion Télévision Française): two cardioids spaced 17 cm apart, angled 110° apart.
NOS (Nederlandse Omroep Stichting): two cardioids spaced 30 cm apart, angled 90° apart.
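Near-coincident pairs add a small arrival-time difference to the level difference of the angled cardioids. This far-field sketch assumes a speed of sound of 343 m/s (air at about 20 °C) and a source angle measured from the front axis.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 °C)

def interchannel_delay_ms(spacing_m, source_deg):
    """Far-field arrival-time difference between two spaced capsules, in ms."""
    return 1000.0 * spacing_m * math.sin(math.radians(source_deg)) / SPEED_OF_SOUND

# ORTF spacing (17 cm), source fully to one side
print(round(interchannel_delay_ms(0.17, 90), 3))  # about 0.496 ms
```

A source straight ahead produces no delay, so the centre image relies on equal levels and times in both channels.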

MS Microphone Recording: the recording and playback configurations can be different

M (main/mono) is a microphone of any polar pattern, e.g. a cardioid; S (side) is a bidirectional microphone.

$$ L = M + S, \qquad R = M - S, \qquad M = \frac{L + R}{2} $$

Preserves monophonic compatibility: summing L and R cancels S.

Flexible stereo perspective: the balance of M and S can be changed after recording.

Simulates an equivalent microphone pair at playback.
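A sketch of the MS matrix in both directions. The `width` argument is a hypothetical stereo-width control added for illustration: it scales S before decoding, so 0 gives mono and 1 gives the plain L = M + S, R = M − S matrix.

```python
def ms_to_lr(m, s, width=1.0):
    """Decode mid/side samples to left/right; width is a hypothetical control."""
    left = [mi + width * si for mi, si in zip(m, s)]
    right = [mi - width * si for mi, si in zip(m, s)]
    return left, right

def lr_to_ms(left, right):
    """Encode left/right back to mid/side: M = (L + R)/2, S = (L - R)/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

L, R = ms_to_lr([1.0, 0.5], [0.25, -0.5])
print(L, R)            # [1.25, 0.0] [0.75, 1.0]
print(lr_to_ms(L, R))  # ([1.0, 0.5], [0.25, -0.5])
```

Changing `width` at playback is what makes the stereo perspective adjustable after recording.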

Optimized Cardioid Triangle (OCT)

(Figure: C (center), LF (left front) and RF (right front) microphones; labelled distances 8cm and 4-100cm.)

INA 5: Ideale Nierenanordnung (ideal cardioid arrangement)

Five cardioid microphones oriented in five directions to supply the five channels.

(Figure: C (center), L, R, LS and RS microphones; labelled distances 17.5 cm and 60 cm, angle 60°.)

Fukada Tree

Developed by NHK.

INA 5 as the basis, plus two omnidirectional microphones to expand the spatial impression.

(Figure: C (center), L, R, LS and RS microphones, with LL and RR outrigger microphones.)

Pair-wise panning: a pan-pot permits positioning of a sound source.

Non-zero gain is applied only to the two speakers adjacent to the phantom image location. Even if there are more than two speakers, only the pair that encloses the phantom image is considered.

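(Figure: phantom image P between speakers L and R at angles θ1 and θ2, with gains g1 and g2.)

Assuming the gain decreases linearly in one channel and increases linearly in the other, we have

$$ g_1 = \frac{\theta_2 - \theta_P}{\theta_2 - \theta_1}, \qquad g_2 = \frac{\theta_P - \theta_1}{\theta_2 - \theta_1} $$

Let $\theta_1 = 315^\circ$ (i.e. $-45^\circ$), $\theta_2 = 45^\circ$ and $\theta_P \in (-45^\circ, 45^\circ)$. The totals are

$$ T_{\text{gain}} = \sum_{i=1}^{2} g_i, \qquad T_{\text{power}} = \sum_{i=1}^{2} g_i^2 $$

Ideal case: the totals are independent of the image position.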

(Figure: Linear Panning: Total Gain. Gain (0 to 1.2) vs. image position θP (−45° to 45°) for channel one, channel two and the total gain.)

(Figure: Linear Panning: Total Power. Power (0 to 1.2) vs. image position θP (−45° to 45°) for channel one, channel two and the total power.)

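Loudness is proportional to power rather than gain. Linear panning keeps the total gain constant, but the total power dips to half (−3 dB) when the image is centered.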
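A sketch of linear panning. It assumes speakers at −45° and +45°, and a gain that falls linearly in channel one and rises linearly in channel two. It prints the total gain and total power across image positions.

```python
def linear_pan(theta_p, theta1=-45.0, theta2=45.0):
    """Linear pair-wise panning gains (g1, g2) for a phantom image at theta_p degrees."""
    span = theta2 - theta1
    g1 = (theta2 - theta_p) / span  # falls linearly from 1 to 0
    g2 = (theta_p - theta1) / span  # rises linearly from 0 to 1
    return g1, g2

for theta in (-45, -25, 0, 25, 45):
    g1, g2 = linear_pan(theta)
    print(theta, round(g1 + g2, 3), round(g1 ** 2 + g2 ** 2, 3))
```

The total gain stays at 1 everywhere. The total power drops to 0.5 at the center, which is the loudness dip that motivates constant-power panning.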

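Constant Gain Optimization: linear panning keeps the total gain constant.

Constant Power Optimization: let

$$ \theta_m = 90^\circ \cdot \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}, \qquad g_1 = \cos\theta_m, \qquad g_2 = \sin\theta_m $$

so that $g_1^2 + g_2^2 = 1$ for every image position.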

(Figure: Constant Power Panning: Total Gain. Gain (−0.2 to 1.6) vs. image position θP (−45° to 45°) for channel one, channel two and the total gain.)

(Figure: Constant Power Panning: Total Power. Power (0 to 1.2) vs. image position θP (−45° to 45°) for channel one, channel two and the total power.)
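A sketch of constant-power panning with the sine/cosine law g1 = cos θm and g2 = sin θm. It again assumes speakers at −45° and +45°.

```python
import math

def constant_power_pan(theta_p, theta1=-45.0, theta2=45.0):
    """Constant-power gains: g1 = cos(theta_m), g2 = sin(theta_m)."""
    theta_m = math.radians(90.0 * (theta_p - theta1) / (theta2 - theta1))
    return math.cos(theta_m), math.sin(theta_m)

for theta in (-45, -25, 0, 25, 45):
    g1, g2 = constant_power_pan(theta)
    print(theta, round(g1 + g2, 3), round(g1 ** 2 + g2 ** 2, 3))
```

Here the total power is 1 at every position. The total gain rises to √2 (about 1.414) at the center, which matches the total-gain curve peaking above 1.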

Time domain Digitization (e.g. CD)

x(t) → Sampling → x(n) → Quantization → y(n) → …01001010…

Bit-rate = Sampling rate (fs) × Bits per sample × Number of channels

Example: the bit-rate of a 16-bit, 44.1 kHz stereo signal is

44,100 × 16 × 2 = 1,411,200 bits per second = 176,400 bytes per second
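The bit-rate formula as a one-line function, reproducing the CD example.

```python
def bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit-rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

bps = bit_rate(44_100, 16, 2)
print(bps, bps // 8)  # 1411200 176400
```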


After sampling, the maximum frequency that can be represented is half the sampling frequency. Why?

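(Figure: the highest-frequency pattern possible with sampling interval T alternates every sample, so its period is 2T.)

The highest repetitive pattern that can be obtained with a sampling interval of T has a minimum period of 2T. With $f_s = 1/T$:

$$ f_{\max} = \frac{1}{2T} = \frac{f_s}{2} $$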

A common convention is to normalize digital frequencies to the range $\omega \in [0, 2\pi)$, where $\omega = 2\pi f / f_s$:

f              ω
0              0
fs/8           π/4
fs/6           π/3
fs/4           π/2
fs/2 (fmax)    π
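A sketch of the normalization ω = 2πf/fs that reproduces the table. It also gives one concrete answer to the "why?" above: a tone above fs/2 yields exactly the same samples as a tone below it, so the two cannot be told apart. The sampling rate of 44.1 kHz is just an example value.

```python
import math

def normalized_frequency(f_hz, fs_hz):
    """Digital frequency in radians per sample: omega = 2*pi*f/fs."""
    return 2.0 * math.pi * f_hz / fs_hz

fs = 44_100.0
for label, f in [("fs/8", fs / 8), ("fs/6", fs / 6), ("fs/4", fs / 4), ("fs/2", fs / 2)]:
    print(label, round(normalized_frequency(f, fs) / math.pi, 4), "* pi")

# A tone at 0.75*fs produces the same samples as a tone at 0.25*fs (aliasing).
high = [math.cos(2 * math.pi * 0.75 * n) for n in range(16)]
low = [math.cos(2 * math.pi * 0.25 * n) for n in range(16)]
print(all(abs(a - b) < 1e-9 for a, b in zip(high, low)))  # True
```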

Frequency spectrum of a digitized audio signal

(Figure: spectrum from 0 to fs after increasing the sampling rate by two times; the audio band occupies only 0 to fs/4 of the range below fs/2.)

(Figure: spectrum after increasing the sampling rate by N times; the audio band occupies only 0 to fs/2N.)

Quantization noise

Idea: relocate the quantization error to the high-frequency end, where it has less effect on the signal.

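(Figure: adjacent quantization levels p_k and p_{k+1} separated by the step size q; the error lies within ±q/2.)

If the signal is random (white), the quantization error is uniformly distributed over [−q/2, q/2], and the noise power (mean-square quantization error) is

$$ N_q = \frac{1}{q}\int_{-q/2}^{q/2} x^2\,dx = \frac{q^2}{12} $$

Whenever q is halved, the noise power is reduced by a factor of 4, i.e. 6 dB.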
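A Monte Carlo sketch that checks N_q = q²/12 and the 6 dB-per-halving rule. It assumes a noise-like input drawn uniformly from [−1, 1] and a rounding quantizer with step q. The seed and sample count are arbitrary choices.

```python
import math
import random

def uniform_quantize(x, q):
    """Round x to the nearest multiple of the step size q."""
    return q * round(x / q)

def measured_noise_power(q, n=100_000, seed=1):
    """Mean-square quantization error for a uniformly distributed input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = uniform_quantize(x, q) - x
        total += e * e
    return total / n

p1, p2 = measured_noise_power(0.1), measured_noise_power(0.05)
print(p1, 0.1 ** 2 / 12)                  # measured vs. q^2/12
print(round(10 * math.log10(p1 / p2), 2))  # close to 6 dB
```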

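(Figure: noise-shaping quantizer. The added noise d(n) is summed with u(n) before the quantizer Q; the error y(n) − u(n) is filtered by h(n) and subtracted from x(n).)

$$ u(n) = x(n) - \big[y(n) - u(n)\big] * h(n) \qquad (1) $$

$$ y(n) = Q\big[u(n) + d(n)\big] \qquad (2) $$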

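The combined noise addition and quantization can be represented by an overall noise term e(n):

(Figure: the same loop with Q and d(n) replaced by the additive noise e(n).)

$$ u(n) = x(n) - \big[y(n) - u(n)\big] * h(n) \qquad \text{(as before)} $$

$$ y(n) = u(n) + e(n) \qquad (3) $$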

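Applying the Fourier transform gives

$$ U(\omega) = X(\omega) - \big[Y(\omega) - U(\omega)\big]H(\omega) \qquad (4) $$

$$ Y(\omega) = U(\omega) + E(\omega) \qquad (5) $$

From (5), $Y - U = E$, so (4) becomes $U = X - EH$. Substituting this into (5):

$$ Y(\omega) = X(\omega) + \big[1 - H(\omega)\big]E(\omega) \qquad (6) $$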

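If $H(\omega) = 1$, the quantization error is eliminated. In general the output is $Y(\omega) = X(\omega) + N(\omega)$, where $N(\omega) = [1 - H(\omega)]E(\omega)$ is the shaped noise.

However, this filter cannot be implemented in practice, because h(n) = δ(n) would feed the current error back with no delay. Instead, a different transfer function can be chosen so that the noise is attenuated more at the low-frequency end:

$$ H(\omega) = e^{-j\omega} \qquad (7) $$

$$ Y(\omega) = X(\omega) + \big(1 - e^{-j\omega}\big)E(\omega) = X(\omega) + N(\omega) \qquad (8) $$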

f        ω       |N(ω)|²    dB
0        0       0          −∞
fs/8     π/4     0.59       −2.3
fs/6     π/3     1          0
fs/4     π/2     2          3
fs/2     π       4          6

Here $|N(\omega)|^2 = |1 - e^{-j\omega}|^2 = 2 - 2\cos\omega$, taking $|E(\omega)|^2 = 1$.

Note that the noise is attenuated at the low-frequency end (below fs/6) and amplified toward the high-frequency end.
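A sketch of the first-order case H(ω) = e^{−jω}, i.e. h(n) = δ(n − 1), written as an error-feedback quantizer in the time domain. The sine input and step size are arbitrary. The output error y(n) − x(n) equals e(n) − e(n − 1), a first difference, so its sum over the block telescopes to a single error value and the low-frequency (DC) noise is suppressed.

```python
import math

def quantize(v, q):
    """Uniform quantizer: round v to the nearest multiple of q."""
    return q * math.floor(v / q + 0.5)

def noise_shaped_quantize(x, q):
    """First-order error-feedback quantizer (h(n) = delta(n - 1)).

    u(n) = x(n) - e(n-1),  y(n) = Q[u(n)],  e(n) = y(n) - u(n)
    so y(n) = x(n) + e(n) - e(n-1): the error is high-pass shaped.
    """
    y, errors = [], []
    prev_e = 0.0
    for xn in x:
        u = xn - prev_e  # subtract the previous quantization error
        yn = quantize(u, q)
        prev_e = yn - u  # this sample's quantization error e(n)
        y.append(yn)
        errors.append(prev_e)
    return y, errors

x = [0.3 * math.sin(2 * math.pi * n / 50) for n in range(1000)]
y, e = noise_shaped_quantize(x, 0.1)
# The shaped noise y - x telescopes, so its sum is just e(N-1), at most q/2.
print(abs(sum(y) - sum(x)) <= 0.05 + 1e-9)  # True
```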

Noise power gain:

$$ \mathrm{NPG} = \frac{1}{2\pi}\int_{0}^{2\pi} \big|1 - H(\omega)\big|^2\, d\omega = 2 \qquad (9) $$

Hence the noise shaper increases the total noise power by 3 dB (a factor of 2), while moving it toward the high-frequency end.
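A numerical check of the table and of the noise power gain for H(ω) = e^{−jω}. It evaluates |1 − H(ω)|² at the tabulated frequencies and averages it over one period with a midpoint rule. The grid size K is arbitrary. The exact value at π/4 is 2 − √2 ≈ 0.59.

```python
import cmath
import math

def noise_gain(omega):
    """|1 - H(omega)|^2 for H(omega) = e^{-j omega}."""
    return abs(1 - cmath.exp(-1j * omega)) ** 2

for label, w in [("fs/8", math.pi / 4), ("fs/6", math.pi / 3),
                 ("fs/4", math.pi / 2), ("fs/2", math.pi)]:
    g = noise_gain(w)
    print(label, round(g, 3), round(10 * math.log10(g), 2), "dB")

# Noise power gain: average of |1 - H|^2 over one period (midpoint rule)
K = 10_000
npg = sum(noise_gain(2 * math.pi * (k + 0.5) / K) for k in range(K)) / K
print(round(npg, 6), round(10 * math.log10(npg), 2), "dB")  # 2.0, about 3 dB
```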

Time Frequency domain Digitization (e.g. MD)

(Figure: x(t) → Sampling → x(n), divided into Block 1, Block 2, …, Block N; each block's spectrum is split into Band 1 to Band K, quantized by Quantizer 1 to Quantizer K, and passed through a frequency-to-time converter to give y(n).)

1. The time signal is chopped into segments, or blocks.

2. Each block is transformed into its frequency spectrum.

3. The frequency spectrum is partitioned into bands.

4. Each band is digitized and quantized.

5. In the player, each digitized band is converted back to analogue form.

6. The frequency bands are combined to reconstruct the frequency spectrum.

7. The frequency spectrum is transformed back to the time domain to reproduce the time segment.

If each frequency band is quantized with the same number of levels, no compression is achieved, and the extra, complicated effort is wasted.

Compression is attained if certain bands can be quantized with fewer levels. However, those bands will be subject to more distortion, in the form of “quantization noise”.

Is there a way to satisfy both requirements? Quantization noise is less audible at some frequencies than at others.
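A minimal sketch of the seven steps, assuming NumPy is available. For simplicity it uses an orthonormal DCT-II on non-overlapping 64-sample blocks, which stands in for the MDCT discussed later. The block size, the four bands and the per-band step sizes are hypothetical choices.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors), so C @ C.T == I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def code_block(block, step_sizes):
    """Transform one block, quantize each band with its own step, transform back."""
    c = dct_matrix(len(block))
    spectrum = c @ block                               # step 2: time -> frequency
    bands = np.array_split(spectrum, len(step_sizes))  # step 3: partition into bands
    quantized = [q * np.round(b / q) for b, q in zip(bands, step_sizes)]  # step 4
    rebuilt = np.concatenate(quantized)                # step 6: recombine the bands
    return c.T @ rebuilt, spectrum, rebuilt            # step 7: frequency -> time

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
blocks = x.reshape(-1, 64)                             # step 1: chop into blocks
steps = [0.01, 0.05, 0.2, 0.5]                         # hypothetical per-band steps
y = np.concatenate([code_block(b, steps)[0] for b in blocks])
print(np.max(np.abs(y - x)))
```

With equal steps in every band, nothing is gained. Coarser steps in some bands save bits at the cost of more quantization noise in exactly those bands.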

Key researchers in the study of the human auditory system (HAS)

1. G. von Bekesy

2. J.B. Allen

3. H. Fletcher

4. B. Scharf

5. D.D. Greenwood

Important Findings: Hearing Sensitivity, Tone-Masking-Noise, Critical Bands

“The brain interprets signal received via the auditory system rather than its objective representation.”

Author: Diana Deutsch Source: http://psy.ucsd.edu/~ddeutsch/psychology/figures/fig3.jpg Copyright: Diana Deutsch

Listeners grouped the tones by frequency proximity rather than by the ear (L or R) to which each tone was actually presented.

“When two identical but delayed audio sources are heard, the first one will inhibit the other if the delay is within 25 to 35 ms.”

This is true even if the second sound is 10 dB louder than the first.

As a result, the sound seems to originate from the first source only, and its loudness is increased.

Analysis in frequency (critical) bands

The ear operates like a spectrum analyser. Critical bandwidths are about 100 Hz below 500 Hz, and 1/6 to 1/3 of an octave above 500 Hz.

High energy in one band may inhibit neighboring bands.

Masking also occurs before the masking tone starts and after it ends: backward and forward masking.

Frequency response of human ears is non-uniform

• Place a listener in a quiet room
• Raise a 1 kHz tone until it is just audible and record the amplitude
• Repeat with other frequencies

(Figure: threshold of hearing, level in dB vs. frequency in kHz, 0 to 10 kHz.)
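The measured threshold curve is often approximated analytically. This sketch uses Terhardt's approximation of the absolute threshold of hearing, a formula commonly used in perceptual audio coders. It is a substitute for the measured data and not the curve in the figure. The 10 Hz search grid is arbitrary.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

freqs = range(100, 10_001, 10)
most_sensitive = min(freqs, key=threshold_in_quiet_db)
print(most_sensitive, round(threshold_in_quiet_db(most_sensitive), 2))
```

The minimum falls in the 3 to 4 kHz region, where the ear is most sensitive. The threshold rises steeply toward low frequencies and again above about 10 kHz.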

Masking involves two signals: a masker (M) and a probe (P).

Masking is the hiding of one signal at a given frequency by another signal at or near that frequency.

(Figure: M and P presented to the HAS; only M is perceived.)

P is masked by M.


The level at which P is just audible is known as the “just noticeable difference” (JND).

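Masking by a 1 kHz tone

(Figure: masking threshold of a 1 kHz tone, level in dB (up to 60) vs. frequency in kHz (0 to 10).)

Masking of multiple tones

(Figure: masking thresholds of tones at about 0.25, 1, 4 and 8 kHz, level in dB vs. frequency in kHz.)

Note: two types of masking.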

Determine the masking envelope

Divide the signal into spectral bands

Determine the masked noise region

(Figure: a masking tone, and the noise under its envelope that can be masked.)
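A deliberately simplified sketch of determining a masking envelope. Each masker spreads as a triangle on the Bark axis, falling more steeply toward lower frequencies than toward higher ones, and the envelope is the maximum over all maskers. The slopes of 27 and 10 dB/Bark and the 10 dB offset are hypothetical round numbers, not values taken from these notes or from any standard model.

```python
def masking_envelope(z, maskers, lower_slope=27.0, upper_slope=10.0, offset=10.0):
    """Simplified masking threshold (dB) at Bark position z.

    maskers: list of (bark, level_db). Masking spreads further toward higher
    frequencies (shallower upper slope) than toward lower ones.
    """
    best = float("-inf")
    for zm, level in maskers:
        dz = z - zm
        slope = upper_slope if dz >= 0 else lower_slope
        best = max(best, level - offset - slope * abs(dz))
    return best

def is_masked(noise_db, z, maskers):
    """Noise at Bark z is inaudible if it lies at or below the envelope."""
    return noise_db <= masking_envelope(z, maskers)

tones = [(9.0, 60.0), (13.0, 50.0)]  # hypothetical tones: 1 kHz at 60 dB, 2 kHz at 50 dB
for z in (7.0, 9.0, 11.0, 13.0, 16.0):
    print(z, masking_envelope(z, tones))
```

Noise that stays under the envelope falls in the masked noise region and can be tolerated.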


Quantization is a kind of noise:

$$ Q(S) = S + \text{Noise} $$

The coarser the quantization, the lower the bit-rate. The effect, however, is negligible if the noise can be masked.

(Figure: a masking tone with quantization noise below its masking envelope.)
The narrower the bandwidth of each band, the better the noise masking effect.

Time resolution is best at higher frequencies: it is easier to locate the instant at which a particular tone occurs.

Frequency resolution is best at low frequencies: it is easier to discriminate different frequencies.

This suggests a non-uniform partitioning of the audio frequency spectrum.

Standard: The Bark Scale (after Barkhausen)

Partitioning of frequency spectrum into Critical Bands according to the Psychoacoustic model

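$$ z(f) = \begin{cases} \dfrac{f}{100}, & f \le 500\ \text{Hz} \\[4pt] 9 + 4\log_2\!\left(\dfrac{f}{1000}\right), & \text{otherwise} \end{cases} \qquad \text{(in Bark)} $$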
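A sketch of the piecewise frequency-to-Bark mapping: f/100 up to 500 Hz, and 9 + 4 log2(f/1000) above. The printed values agree with the Bark labels on the multiple-tone masking figure (0.25 kHz at 2.5 Bark through 4 kHz at 17 Bark).

```python
import math

def hz_to_bark(f_hz):
    """Piecewise Bark approximation: f/100 up to 500 Hz, 9 + 4*log2(f/1000) above."""
    if f_hz <= 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

for f in (250, 500, 1000, 2000, 4000):
    print(f, hz_to_bark(f))  # 2.5, 5.0, 9.0, 13.0, 17.0
```

Above 500 Hz, each octave spans 4 Bark, which is one way to see the non-uniform partitioning.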

Masking of multiple tones

(Figure: masking thresholds in dB (up to 60) vs. Bark (0 to 20) for tones at 0.25 kHz (2.5 Bark), 0.5 kHz (5 Bark), 1 kHz (9 Bark), 2 kHz (13 Bark) and 4 kHz (17 Bark).)

(Figure: level in dB (up to 60) vs. time in ms (0 to 20), showing a mask tone followed by a test tone.)

A test tone shortly after the mask is not audible.

Quantization Noise is less audible at some frequencies than at others

Sensitivity of the ear varies with frequency:

Most sensitive: around 4 kHz

Less sensitive: at higher frequencies

Simultaneous masking: a softer sound is less audible in the presence of a louder sound.

Quantization noise is less audible at frequencies on, or close to, loud tones.

(Figure: analyzing time windows. The input x(n) is divided into segments at times t0, t1, …; the DCT of each segment gives coefficients Y0, Y1, …, YN-1.)

A single spectral component Yi is shown for each time slot; the others are computed in the same way.

The MDCT blends one frame into the next to avoid block-boundary artifacts between frames. The MDCT output of each frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame, and added.

(Figures: Case 1, equal-sized windows; Case 2, non-equal-sized windows.)
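A sketch of the equal-sized-window case, assuming NumPy. Frames of 2N samples advance by N (50% overlap). Each frame is sine-windowed and MDCT-transformed into N coefficients, then inverse-transformed, windowed again and overlap-added. The sine window and the 2/N inverse scaling are one standard choice (assumed here) for which the time-domain aliasing of neighbouring frames cancels, so the input is recovered exactly.

```python
import numpy as np

def mdct_basis(n_half):
    """Kernel cos[pi/N (n + 1/2 + N/2)(k + 1/2)], shape (N, 2N)."""
    n = np.arange(2 * n_half)[None, :]
    k = np.arange(n_half)[:, None]
    return np.cos(np.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def sine_window(n_half):
    return np.sin(np.pi * (np.arange(2 * n_half) + 0.5) / (2 * n_half))

def mdct_roundtrip(x, n_half):
    """Analyse x in 50%-overlapped frames of 2N samples and resynthesise it."""
    assert len(x) % n_half == 0
    basis, window = mdct_basis(n_half), sine_window(n_half)
    padded = np.concatenate([np.zeros(n_half), x, np.zeros(n_half)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - n_half, n_half):
        frame = padded[start:start + 2 * n_half]
        coeffs = basis @ (window * frame)  # N coefficients per 2N samples
        # Windowed IMDCT; the 2/N scale makes the overlap-add reconstruct exactly
        out[start:start + 2 * n_half] += window * (2.0 / n_half) * (basis.T @ coeffs)
    return out[n_half:-n_half]

x = np.random.default_rng(0).standard_normal(64)
print(np.allclose(mdct_roundtrip(x, 16), x))  # True
```

Each frame yields only N coefficients for 2N input samples, so the overlap costs no extra data.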

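A single window: the DCT maps N samples of x(n) to N coefficients Y0, Y1, …, YN-1:

$$ Y_i = \sum_{k=0}^{N-1} x(k)\cos\!\left[\frac{\pi}{N}\left(k+\tfrac{1}{2}\right)\left(i+\tfrac{1}{2}\right)\right], \quad i = 0, 1, \ldots, N-1 $$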

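Overlapping window w(n): the input is first windowed,

$$ x'(n) = x(n)\,w(n) $$

and the DCT of the N windowed samples gives Y0, Y1, …, YN-1:

$$ Y_i = \sum_{k=0}^{N-1} x'(k)\cos\!\left[\frac{\pi}{N}\left(k+\tfrac{1}{2}\right)\left(i+\tfrac{1}{2}\right)\right] $$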

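The IDCT recovers the windowed samples from Y0, Y1, …, YN-1:

$$ x'(k) = \frac{2}{N}\sum_{i=0}^{N-1} Y_i\cos\!\left[\frac{\pi}{N}\left(k+\tfrac{1}{2}\right)\left(i+\tfrac{1}{2}\right)\right], \quad k = 0, 1, \ldots, N-1 $$

(Figure: IDCT output x'(n) with the overlapping window.)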
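A sketch of the windowed transform pair, assuming NumPy. It also assumes the per-window transform is a DCT-IV, cos[π/N (k + ½)(i + ½)]. That is an assumption, because the equations on the original slides are only partly legible. The DCT-IV matrix is symmetric and satisfies C·C = (N/2)·I, so the inverse is the same transform scaled by 2/N. The sine-shaped window is a hypothetical choice.

```python
import numpy as np

def dct4_matrix(n):
    """C[i, k] = cos[pi/N (k + 1/2)(i + 1/2)]; the matrix is symmetric."""
    idx = np.arange(n) + 0.5
    return np.cos(np.pi / n * np.outer(idx, idx))

def window_and_transform(x, w):
    """x'(n) = x(n) w(n), then Y = DCT-IV of x'."""
    x_windowed = x * w
    return x_windowed, dct4_matrix(len(x)) @ x_windowed

def inverse_transform(y):
    """x'(k) = (2/N) * sum_i Y_i cos[...]: the DCT-IV is its own inverse up to 2/N."""
    n = len(y)
    return (2.0 / n) * (dct4_matrix(n) @ y)

n = 16
w = np.sin(np.pi * (np.arange(n) + 0.5) / n)  # hypothetical window choice
x = np.random.default_rng(1).standard_normal(n)
x_windowed, coeffs = window_and_transform(x, w)
print(np.allclose(inverse_transform(coeffs), x_windowed))  # True
```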

Discard frequency bands that are less essential to the HAS

x(n) = [x(0), x(1), …, x(N-1)], where N is the number of data samples.

Decompose x(n) into N MDCT coefficients.

Select the coefficients to which the HAS is sensitive and discard the rest (e.g. keep K bands, where K < N).

Disadvantage: noticeable distortion in the discarded bands.

A better approach: assign a different quantization step size to each coefficient, according to its tolerance to quantization noise based on the HAS.

x(n) = [x(0), x(1), …, x(N-1)], where N is the number of data samples.

Decompose x(n) into N MDCT coefficients and give each coefficient its own step size $q_j$, $j = 0, 1, \ldots, N-1$.

Quantize each coefficient so that its quantization noise is below the masking threshold (1 bit = 6 dB).
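A sketch of the "1 bit = 6 dB" rule applied to bit allocation. Each band receives enough bits that its quantization noise drops below the masking threshold, i.e. one bit per 6.02 dB of signal-to-mask ratio (SMR). This simplified rule and the band levels are hypothetical illustrations, not the allocation used by any particular codec.

```python
import math

DB_PER_BIT = 6.02  # each extra bit lowers quantization noise by about 6 dB

def bits_for_band(signal_db, mask_db):
    """Bits needed so quantization noise falls below the masking threshold.

    Simplified rule: one bit per ~6 dB of signal-to-mask ratio (SMR).
    """
    smr = signal_db - mask_db
    if smr <= 0:
        return 0  # the whole band is already masked
    return math.ceil(smr / DB_PER_BIT)

bands = [(70.0, 40.0), (55.0, 50.0), (30.0, 45.0)]  # hypothetical (signal, mask) in dB
print([bits_for_band(s, m) for s, m in bands])  # [5, 1, 0]
```

A band lying entirely under the masking threshold needs no bits at all, which is where most of the compression comes from.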

A total of 52 critical bands:
20 bands for lower frequencies
16 bands for middle frequencies
16 bands for higher frequencies

Smallest time window: 1.45 ms
Longest time window: 11.60 ms
Number of time windows: 8

Source bit-rate: 1.4 Mb/s
Target bit-rate: 292 kb/s

(Figure: masking threshold in dB vs. Bark (0 to 20), marking regions where fine quantization is required, where coarse quantization is allowed, and where there is no inter-band masking.)

The different noise-masking responses of the bands can be used to quantize frequency components adaptively.

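(Figure: encoder block diagram. A 1.4 Mbps input passes through a frequency range analyser into H (11-22 kHz), M (5.5-11 kHz) and L (0.5-5.5 kHz) bands, each followed by an MDCT, with block size decision and bit allocation producing a 292 kbps output.)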
