lorin-bishop
Properties of Musical Sound
Subjective | Objective
Pitch | Frequency
Volume | Amplitude/power/intensity
Timbre | Overtone content/spectrum
Duration in beats | Duration in time
Direct Sound
Sound waves that travel directly from the source to the listener.
Direct sound intensity attenuates with distance according to the inverse square law:
$I = \frac{K}{\mathrm{Dist}^2}$
For example, doubling the distance results in an attenuation of 4 times, or 6 dB.
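As a quick check of the inverse square law, the following minimal sketch computes the level drop in dB between two distances. The function name `attenuation_db` is an illustrative assumption, not part of the slides.

```python
import math

def attenuation_db(d_ref, d):
    """Level drop in dB when moving from distance d_ref to distance d,
    assuming intensity is proportional to 1 / distance^2."""
    intensity_ratio = (d_ref / d) ** 2
    return -10.0 * math.log10(intensity_ratio)

# Doubling the distance quarters the intensity, a drop of about 6 dB.
print(round(attenuation_db(1.0, 2.0), 2))
```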
Early (first order) Reflection
Sound waves that travel to the listener after reflecting “once” from the environment (mainly walls).
Early reflections arriving within 35 ms of the direct sound reinforce it.
According to Beranek, who studied 54 concert halls, an “intimate” effect is felt with early reflections arriving in less than 20 ms.
In large halls, suspended reflectors are employed to provide early reflection to center seats.
Reverberation (second to higher order reflection)
Sound waves that reach the listener after further reflections of the first-order reflections.
Reverberation decays with time as sound energy is absorbed by the environment.
Reverberation time is the duration for the sound pressure level to drop by 60 dB from its initial level, in general quoted for the frequency range of 500-1000 Hz:
$T_R\ (\mathrm{s}) \propto \frac{\text{Room volume } (\mathrm{m}^3)}{\text{Total room surface } (\mathrm{m}^2)}$
High-frequency signals are absorbed more quickly in air than low-frequency ones, so their reverberation time is shorter.
Microphone: Dynamic (magnetic induction)
[Figure: a classical ribbon microphone, labelled with magnet, coil and ribbon diaphragm]
Simple, economical, robust
Microphone: Condenser (capacitor transducer)
[Figure: a condenser microphone, labelled with current]
Complicated, expensive, sharper transient response, needs phantom power
Polar pattern: the magnitude of a microphone’s response to pressure changes arriving from different directions.
[Figure: polar plots of microphone response, angle from 0° to 330° in 30° steps, radial scale 0.25 to 1.0]
Omnidirectional: $\rho(\theta) = 1$
Bidirectional (figure-eight): $\rho(\theta) = \cos\theta$
Standard cardioid: $\rho(\theta) = 0.5 + 0.5\cos\theta$
Supercardioid: $\rho(\theta) = 0.37 + 0.63\cos\theta$
Subcardioid: $\rho(\theta) = 0.75 + 0.25\cos\theta$
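The polar patterns listed here all fit the first-order form a + (1 - a) cos θ. The sketch below evaluates that form for each pattern; the dictionary `PATTERNS` and the function `response` are illustrative names, and the coefficients are the ones quoted on the slides.

```python
import math

# Omnidirectional weight "a" for each pattern: response = a + (1 - a) * cos(theta)
PATTERNS = {
    "omnidirectional": 1.0,
    "subcardioid": 0.75,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "bidirectional": 0.0,
}

def response(a, theta_deg):
    """Relative sensitivity at angle theta_deg (0 degrees = on axis)."""
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))

for name, a in PATTERNS.items():
    values = [round(response(a, angle), 2) for angle in (0, 90, 180)]
    print(name, values)  # sensitivity at front, side and rear
```

A negative value means the rear lobe has inverted polarity, which is why polar plots usually show the magnitude.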
XY (coincident pair) Microphone Recording
[Figure: top view and front view of the pair, angled at 90° to 135°]
Two identical cardioids aimed across each other at 90° to 135°, 12 inches or less apart
Extremely mono-compatible, moderate stereo effect.
Localization of sound source based on difference in amplitude.
e.g., if L > R, the source seems to be closer to the left side.
Blumlein coincident Microphone Recording
[Figure: top view and front view of the pair, angled at 90°, with L and R outputs]
Two identical “figure 8” microphones placed at 90°, one directly on top of the other
Created by Alan Blumlein; provides precise stereo imaging of sound sources at the front and reverberation from the rear.
Near coincident Microphone Recording
[Figure: top view of the pair, angled at 90° to 135°]
ORTF (Office de Radiodiffusion Télévision Française): 2 cardioids spaced 17 cm apart, angled 110° apart.
NOS (Nederlandse Omroep Stichting): 2 cardioids spaced 30 cm apart, angled 90° apart.
MS Microphone Recording: Recording and playback configuration can be different
M (main/mono): a microphone of any polar pattern, e.g. a cardioid
S (side): a bidirectional microphone
$L = M + S$
$R = M - S$
Preserves monophonic compatibility: $L + R = 2M$
Flexible stereoscopic perspectives.
Simulates equivalent microphones at playback
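The sum and difference decoding of an MS recording can be sketched as follows. The function name `ms_decode` and the `width` parameter are assumptions added for illustration: `width` scales the side signal to show how the stereo perspective can be changed after recording.

```python
def ms_decode(m, s, width=1.0):
    """Decode mid/side sample lists into left/right sample lists.

    width is a hypothetical control: 0 gives mono, 1 gives L = M + S, R = M - S.
    """
    left = [mi + width * si for mi, si in zip(m, s)]
    right = [mi - width * si for mi, si in zip(m, s)]
    return left, right

mid = [0.5, 0.2, -0.1]
side = [0.1, -0.3, 0.05]
left, right = ms_decode(mid, side)

# Summing the channels cancels S, leaving 2M: this is the mono compatibility.
print([l + r for l, r in zip(left, right)])
```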
Optimized Cardioid Triangle (OCT)
[Figure: OCT layout with C (Center), LF (Left Front) and RF (Right Front) microphones; labelled dimensions 8 cm and 4-100 cm]
INA 5: Ideale Nierenanordnung (ideal cardioid arrangement)
Five cardioid microphones oriented in 5 directions to supply the five channels
[Figure: INA 5 layout with C (Center), L, R, LS and RS microphones; labelled dimensions 17.5 cm, 60 cm and 60°]
Fukada Tree
Developed by NHK
INA 5 as basis, plus two omnidirectional microphones to expand the spatial impression
[Figure: Fukada Tree layout with C (Center), L, R, LS, RS, LL and RR microphones]
Pair-wise pan-pot permits positioning of a sound source
Non-zero gain is applied only to the two speakers adjacent to the phantom image location. Even if there are more than two speakers, only the pair that encloses the phantom image is considered.
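[Figure: phantom image P between the two speakers L and R, with speaker angles $\theta_1$, $\theta_2$ and image angle $\theta_P$]
$\theta_1 \le \theta_P \le \theta_2$
Assuming the gain decreases linearly in one channel and increases linearly in the other, we have
$g_1 = \frac{\theta_2 - \theta_P}{\theta_2 - \theta_1}, \quad g_2 = \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}$
Let $\theta_1 = -45^\circ\ (315^\circ)$, $\theta_2 = 45^\circ$
Total gain: $T = \sum_{i=1}^{2} g_i$
Total power: $T = \sum_{i=1}^{2} g_i^2$
Ideal case: independent of image position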
[Figure: Linear Panning: Total Gain. Channel one, channel two and total gain (0 to 1.2) versus image angle (-45° to 45°)]
[Figure: Linear Panning: Total Power. Channel one, channel two and total power (0 to 1.2) versus image angle (-45° to 45°)]
Loudness is proportional to power instead of gain
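A minimal sketch of linear panning between speakers at -45° and 45° makes the problem concrete: the total gain stays at 1, but the total power dips at the centre. The function name `linear_pan` and its default angles are illustrative assumptions.

```python
def linear_pan(theta_p, theta_1=-45.0, theta_2=45.0):
    """Linear pan law: one gain falls linearly while the other rises linearly."""
    g1 = (theta_2 - theta_p) / (theta_2 - theta_1)
    g2 = (theta_p - theta_1) / (theta_2 - theta_1)
    return g1, g2

for angle in (-45, -15, 0, 15, 45):
    g1, g2 = linear_pan(angle)
    total_gain = g1 + g2
    total_power = g1 ** 2 + g2 ** 2
    print(angle, round(total_gain, 3), round(total_power, 3))
```

At the centre position both gains are 0.5, so the total power is 0.5, about 3 dB below the power at either extreme.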
Constant Gain Optimization
Constant Power Optimization
Let $\theta_m = 90^\circ \cdot \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}$, then $g_1 = \cos\theta_m$, $g_2 = \sin\theta_m$
[Figure: Constant Power Panning: Total Gain. Channel one, channel two and total gain (-0.2 to 1.6) versus image angle (-45° to 45°)]
[Figure: Constant Power Panning: Total Power. Channel one, channel two and total power (0 to 1.2) versus image angle (-45° to 45°)]
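The constant power law can be sketched in the same way: mapping the image angle onto 0° to 90° and taking the cosine and sine keeps the total power at 1 for every position, while the total gain peaks at the centre. The function name `constant_power_pan` and its default angles are illustrative assumptions.

```python
import math

def constant_power_pan(theta_p, theta_1=-45.0, theta_2=45.0):
    """Constant power pan law: g1 = cos(theta_m), g2 = sin(theta_m)."""
    theta_m = math.radians(90.0 * (theta_p - theta_1) / (theta_2 - theta_1))
    return math.cos(theta_m), math.sin(theta_m)

for angle in (-45, -15, 0, 15, 45):
    g1, g2 = constant_power_pan(angle)
    total_gain = g1 + g2
    total_power = g1 ** 2 + g2 ** 2
    print(angle, round(total_gain, 3), round(total_power, 3))
```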
Time domain Digitization (e.g. CD)
[Figure: x(t) → Sampling → x(n) → Quantization → y(n) → …01001010...]
Bit-rate = Sampling rate (fs) × Bits per sample × Number of channels
Example: bit-rate of a 16-bit, 44.1 kHz stereo signal = 44,100 × 16 × 2 = 1,411,200 bits per second = 176,400 bytes per second
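The bit-rate arithmetic is easy to verify with a short sketch; the function name `bit_rate` is an illustrative assumption.

```python
def bit_rate(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit-rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

cd_bits = bit_rate(44_100, 16, 2)
print(cd_bits)       # bits per second for CD audio
print(cd_bits // 8)  # bytes per second
```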
After sampling, the maximum frequency of the signal will be restricted to half the sampling frequency (why?).
The highest-frequency repetitive pattern that can be obtained with a sampling interval of T is shown below:
[Figure: sampled pattern with sample spacing T and period 2T]
$f_s = \frac{1}{T}$
Minimum period $= 2T$, so $f_{\max} = \frac{1}{2T} = \frac{f_s}{2}$
A common convention: normalize the digital frequencies to the range $\omega \in [0, 2\pi]$, with $\omega = 2\pi f / f_s$

f | ω
0 | 0
fs/8 | π/4
fs/6 | π/3
fs/4 | π/2
fs/2 (fmax) | π
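The mapping in this table can be reproduced with a small helper, assuming the usual relation ω = 2πf/fs; the function name `normalized_frequency` is illustrative.

```python
import math

def normalized_frequency(f_hz, fs_hz):
    """Map a frequency in Hz to the normalized digital frequency in radians per sample."""
    return 2.0 * math.pi * f_hz / fs_hz

fs = 44_100.0
for divisor in (8, 6, 4, 2):
    omega = normalized_frequency(fs / divisor, fs)
    print(f"fs/{divisor} -> {omega / math.pi:.4f} * pi")
```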
[Figure: frequency spectrum of a digitized audio signal, axis marked at fs/2]
[Figure: increasing the sampling rate by two times, axis marked at fs/4, fs/2 and fs]
[Figure: increasing the sampling rate by N times, axis marked at fs/2N, fs/2 and fs]
Quantization noise
Relocate the quantization errors to the high-frequency end so that their effect on the signal is reduced
[Figure: adjacent quantization levels pk and pk+1, separated by step size q, with maximum error q/2]
If the signal is random (white), the probability distribution of the quantization noise is uniform, and the noise power (mean square quantization error) is
$N_q = \frac{1}{q}\int_{-q/2}^{q/2} x^2\,dx = \frac{q^2}{12}$
Whenever q is halved, the noise power is reduced by a factor of 4, i.e. 6 dB.
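The q²/12 result and the 6 dB per halving rule can be checked numerically. This sketch sweeps the input evenly over [0, 1) instead of drawing random samples, which is an assumption made to keep the result deterministic; `quantize` and `noise_power` are illustrative names.

```python
import math

def quantize(x, q):
    """Uniform quantizer with step size q (round to the nearest level)."""
    return q * round(x / q)

def noise_power(q, n=20_000):
    """Mean square quantization error for inputs spread evenly over [0, 1)."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        error = quantize(x, q) - x
        total += error * error
    return total / n

q = 1.0 / 16
p_coarse = noise_power(q)
p_fine = noise_power(q / 2)
print(p_coarse, q * q / 12)                 # measured vs. predicted noise power
print(10 * math.log10(p_coarse / p_fine))   # improvement in dB when q is halved
```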
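[Figure: noise-shaping quantizer. The input x(n) minus the filtered error gives u(n); noise d(n) is added to u(n) and the sum is quantized by Q to give y(n); the error y(n) - u(n) is fed back through the filter h(n)]
$u(n) = x(n) - [y(n) - u(n)] * h(n)$ (1)
$y(n) = Q[u(n) + d(n)]$ (2)
where $*$ denotes convolution.
The combined noise addition and quantization can be represented by an overall noise term e(n), with u(n) as before:
$y(n) = u(n) + e(n)$ (3)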
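Applying the Fourier Transform gives
$U(\omega) = X(\omega) - [Y(\omega) - U(\omega)]\,H(\omega)$ (4)
$Y(\omega) = U(\omega) + E(\omega)$ (5)
$Y(\omega) = X(\omega) + [1 - H(\omega)]\,E(\omega)$ (6)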
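If $H(\omega) = 1$ then the quantization error will be eliminated:
$Y(\omega) = X(\omega) + [1 - H(\omega)]\,E(\omega) = X(\omega) + N(\omega)$ (6)
However, this kind of filter cannot be implemented in practice. Alternatively, a different transfer function can be selected so that the noise is attenuated more at the low-frequency end:
$H(\omega) = e^{-j\omega}$ (7)
$Y(\omega) = X(\omega) + [1 - e^{-j\omega}]\,E(\omega) = X(\omega) + N(\omega)$ (8)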
f | ω | |N(ω)|² (with |E(ω)| = 1) | dB
0 | 0 | 0 | -infinity
fs/8 | π/4 | 0.59 | -2.3
fs/6 | π/3 | 1 | 0
fs/4 | π/2 | 2 | 3
fs/2 | π | 4 | 6
Note that the noise is attenuated more at the low-frequency end than at the high-frequency end.
Noise power gain
$NPG = \frac{1}{2\pi}\int_{0}^{2\pi} |1 - H(\omega)|^2\,d\omega = 2$ (9)
Hence the noise shaper has increased the total noise power by 3 dB
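The noise transfer function of this first-order shaper can be evaluated directly. The sketch below assumes H(ω) = e^{-jω} as in equation (7) and integrates |1 - H(ω)|² numerically with the midpoint rule; `noise_gain` and `noise_power_gain` are illustrative names.

```python
import cmath
import math

def noise_gain(omega):
    """|1 - H(omega)|^2 for the first-order shaper H(omega) = exp(-j * omega)."""
    return abs(1 - cmath.exp(-1j * omega)) ** 2

def noise_power_gain(steps=10_000):
    """(1 / 2pi) * integral of |1 - H|^2 over [0, 2pi], by the midpoint rule."""
    h = 2 * math.pi / steps
    total = sum(noise_gain((i + 0.5) * h) for i in range(steps))
    return total * h / (2 * math.pi)

for label, omega in (("fs/8", math.pi / 4), ("fs/6", math.pi / 3),
                     ("fs/4", math.pi / 2), ("fs/2", math.pi)):
    g = noise_gain(omega)
    print(label, round(g, 3), round(10 * math.log10(g), 1), "dB")

npg = noise_power_gain()
print(round(npg, 3), round(10 * math.log10(npg), 2), "dB")
```

The closed form is |1 - e^{-jω}|² = 2 - 2 cos ω, so the noise is reduced below fs/6 and amplified above it.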
Time Frequency domain Digitization (e.g. MD)
[Figure: x(t) → Sampling → x(n) → Block 1, Block 2, …, Block N; each Block M (1 ≤ M ≤ N) is split into Band 1, Band 2, Band 3, …, Band K, which feed Quantizer 1, Quantizer 2, …, Quantizer K → Freq. to Time Converter → y(n)]
1. The time signal is chopped into segments or blocks
2. Each block is transformed into its frequency spectrum
3. The frequency spectrum is partitioned into bands
4. Each band is digitized and quantized
5. In the player, each digitized band is converted back to analogue form
6. The frequency bands are combined to reconstruct the frequency spectrum
7. The frequency spectrum is transformed back to the time domain to reproduce the time segment.
If each frequency band is quantized with the same number of levels, no compression is achieved, and the extra, complicated effort is wasted.
Compression is attained if certain bands can be quantized with fewer levels.
However, those bands will be subject to more distortion.
The distortion is in the form of “Quantization Noise”.
Is there any solution that makes both ends meet?
Quantization Noise is less audible at some frequencies than at others
Key researchers in the study of the human auditory system (HAS)
1. G. von Bekesy
2. J.B. Allen
3. H. Fletcher
4. B. Scharf
5. D.D. Greenwood
Important Findings: Hearing Sensitivity, Tone-Masking-Noise, Critical Bands
“The brain interprets signal received via the auditory system rather than its objective representation.”
Author: Diana Deutsch Source: http://psy.ucsd.edu/~ddeutsch/psychology/figures/fig3.jpg Copyright: Diana Deutsch
Listeners grouped tones by frequency proximity, rather than the actual representation
“When two identical but delayed audio sources are heard, the first one will inhibit the other if the delay is within 25 to 35 ms.”
This is true even if the second sound is 10 dB above the first one.
The result is that the sound seems to originate from the first source only, and the loudness is increased.
Analysis with frequency (critical) bands
The ear operates like a spectrum analyser
Critical bandwidth: 100 Hz below 500 Hz; 1/6 to 1/3 of an octave above 500 Hz
High energy in one band may inhibit neighboring bands
Masking also occurs shortly before the masking tone starts and shortly after it ends: backward and forward masking
Frequency response of human ears is non-uniform
• Place a listener in a quiet room
• Raise a 1 kHz tone until it is just audible and record the amplitude
• Repeat with other frequencies
[Figure: threshold of hearing in dB (10 to 20) versus frequency in kHz (2 to 10)]
Masking involves two signals; a Masker (M) and a Probe (P)
Hiding of one signal at a given frequency by another signal at or near that frequency
[Figure: masker M and probe P presented to the HAS; only M is perceived]
P is masked by M
The level at which P is just audible is known as the “just noticeable difference” (JND)
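Masking by 1 kHz tone
[Figure: masking threshold in dB (20 to 60) versus frequency in kHz (2 to 10), masking tone at 1 kHz]
Note: Two types of masking
Masking of multiple tones
[Figure: masking threshold in dB (20 to 60) versus frequency in kHz (2 to 10), masking tones at 0.25, 1, 4 and 8 kHz]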
Determine masking envelope
Divide signal into spectral bands
Determine masked noise region
[Figure: masking tone and the noise that can be masked]
Quantization is a kind of noise: quantized signal = S + Noise
The coarser the quantization, the smaller the bit-rate. The effect, however, is negligible if the noise can be masked.
[Figure: masking tone and the noise that can be masked]
The narrower the bandwidth of each band, the better is the noise masking effect.
Time resolution is best at higher frequencies:
Easier to locate the instant at which a particular tone occurs
Frequency resolution is best at low frequencies:
Easier to discriminate different frequencies
This suggests non-uniform partitioning of the audio frequency spectrum
Standard: The Bark Scale (after Barkhausen)
Partitioning of the frequency spectrum into Critical Bands according to the Psychoacoustic model
$\text{Bark} = \begin{cases} \dfrac{f}{100} & \text{for } f < 500\ \text{Hz} \\ 9 + 4\log_2\dfrac{f}{1000} & \text{otherwise} \end{cases}$
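The Bark mapping, read here as f/100 below 500 Hz and 9 + 4 log2(f/1000) otherwise, can be sketched as a small function. The name `hz_to_bark` is an illustrative assumption; the printed values can be compared with the Bark positions quoted for 0.25, 0.5, 1, 2 and 4 kHz.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band number (Bark) for a frequency in Hz."""
    if f_hz < 500:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

for f in (250, 500, 1000, 2000, 4000):
    print(f, "Hz ->", hz_to_bark(f), "Bark")
```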
Masking of multiple tones
[Figure: masking threshold in dB (20 to 60) versus Bark (5 to 20), with masking tones at 0.25 kHz (2.5 Bk), 0.5 kHz (5 Bk), 1 kHz (9 Bk), 2 kHz (13 Bk) and 4 kHz (17 Bk)]
A test tone shortly after the mask tone is not audible
[Figure: mask tone and test tone levels in dB (20 to 60) versus time in ms (0 to 20)]
Quantization Noise is less audible at some frequencies than at others
Sensitivity of the ear varies with frequency
Most sensitive: around 4kHz
Less sensitive: at higher frequencies
Simultaneous masking: A softer sound is less audible in the presence of a louder sound
Quantization Noise is less audible at frequencies on, or close to, loud tones.
[Figure: segments of the input signal x(n) are analysed in time windows t0, t1, …; the DCT of each segment gives coefficients Y0, Y1, …, YN-1]
A single spectral component Yi is computed for each time slot. Others are computed in the same way.
The MDCT blends one frame into the next to avoid inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added.
Case 1: equal-sized windows
Case 2: non-equal-sized windows
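[Figure: a single window of N samples of x(n) passed through the DCT to give coefficients Y0, Y1, …, Yi, …, YN-1]
$Y_i = \sum_{k=0}^{N-1} x(k)\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$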
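[Figure: an overlapping window w(n) applied to x(n); the N windowed samples x’(n) are passed through the DCT to give coefficients Y0, Y1, …, Yi, …, YN-1]
$x'(n) = x(n)\,w(n)$
$Y_i = \sum_{k=0}^{N-1} x'(k)\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$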
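$x'(k) = \sum_{i=0}^{N/2-1} Y_i\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right], \quad k = 0, 1, \ldots, N-1$
[Figure: coefficients Y0, Y1, …, Yi, …, YN-1 passed through the IDCT to give x’(n), followed by the overlapping window]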
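The windowed transform, inverse transform and 50% overlap-add can be sketched end to end. This is a minimal illustration under stated assumptions, not the codec's actual implementation: it uses a sine window, N/2 coefficients per N-sample frame, and a 4/N scale factor in the inverse, chosen so that the overlapped frames add back to the input. The names `sine_window`, `mdct`, `imdct` and `analyse_synthesise` are illustrative.

```python
import math

def sine_window(n_len):
    """Sine window; w[k]^2 + w[k + N/2]^2 = 1, as overlap-add requires."""
    return [math.sin(math.pi * (k + 0.5) / n_len) for k in range(n_len)]

def _phase(k, i, n_len):
    return math.pi / (2 * n_len) * (2 * k + 1 + n_len / 2) * (2 * i + 1)

def mdct(frame):
    """N windowed samples -> N/2 coefficients."""
    n_len = len(frame)
    return [sum(frame[k] * math.cos(_phase(k, i, n_len)) for k in range(n_len))
            for i in range(n_len // 2)]

def imdct(coeffs):
    """N/2 coefficients -> N time-aliased samples (4/N scale is an assumption)."""
    n_len = 2 * len(coeffs)
    return [(4.0 / n_len) * sum(c * math.cos(_phase(k, i, n_len))
                                for i, c in enumerate(coeffs))
            for k in range(n_len)]

def analyse_synthesise(x, n_len=8):
    """Window, transform, inverse transform, window again and overlap-add by 50%.
    len(x) is assumed to be a multiple of n_len // 2."""
    hop = n_len // 2
    w = sine_window(n_len)
    padded = [0.0] * hop + list(x) + [0.0] * hop  # every sample sits in two frames
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_len + 1, hop):
        frame = [padded[start + k] * w[k] for k in range(n_len)]
        y = imdct(mdct(frame))
        for k in range(n_len):
            out[start + k] += y[k] * w[k]
    return out[hop:hop + len(x)]

signal = [math.sin(0.3 * n) for n in range(16)]
rebuilt = analyse_synthesise(signal)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # reconstruction error
```

Each frame on its own contains time-domain aliasing; it cancels only when adjacent frames are overlapped and added, which is why the frames must overlap by 50%.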
Discard frequency bands that are less essential to the HAS
Decompose x(n) = [x(0), x(1), ....., x(N-1)] into N MDCT coefficients, where N is the number of data samples
Select the coefficients that the HAS is sensitive to and discard the rest (e.g. select K bands where K < N)
Disadvantage: Noticeable distortion in discarded bands
A better approach: Assign a different quantization step-size to each coefficient according to its tolerance to quantization noise, based on the HAS
Decompose x(n) = [x(0), x(1), ....., x(N-1)] into N MDCT coefficients, where N is the number of data samples
Step sizes $q_j$, $0 \le j \le N-1$, where in general $q_j \ne q_{j+1}$
Quantize each coefficient so that the noise is below the masking threshold (1 bit = 6 dB)
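The "1 bit = 6 dB" rule suggests a simple way to choose the word length per band: count how many 6 dB steps separate the signal level from the masking threshold. The sketch below is a hypothetical illustration of that rule only; `bits_needed` and the example levels are assumptions, not values from the slides.

```python
import math

def bits_needed(signal_db, mask_db, db_per_bit=6.0):
    """Smallest number of bits that keeps quantization noise below the mask.

    Each extra bit lowers the quantization noise by about db_per_bit dB.
    """
    signal_to_mask = signal_db - mask_db
    return max(0, math.ceil(signal_to_mask / db_per_bit))

# Hypothetical (signal level, masking threshold) pairs in dB for three bands.
bands = [(60.0, 24.0), (45.0, 40.0), (30.0, 40.0)]
print([bits_needed(s, m) for s, m in bands])
```

A band whose signal already lies below the masking threshold needs no bits at all, which is where the compression comes from.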
A total of 52 critical bands:
20 bands for lower frequencies
16 bands for middle frequencies
16 bands for higher frequencies
Number of time windows: 8
Smallest time window: 1.45 ms
Longest time window: 11.60 ms
Source bit-rate: 1.4 Mb/s
Target bit-rate: 292 kb/s
The different noise masking responses of the bands can be used to quantize frequency components adaptively
[Figure: masking threshold in dB versus Bark (5 to 20), marking regions where fine quantization is required and regions where coarse quantization is allowed]
No inter-band masking
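[Figure: encoder block diagram. A frequency range analyser splits the 1.4 Mbps input into H (11-22 kHz), M (5.5-11 kHz) and L (0.5-5.5 kHz) bands; each band passes through its own MDCT, controlled by a block size decision, followed by bit allocation to give the 292 kbps output]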