
Sound Recording Techniques

MediaCity, Salford Wednesday 26th March, 2014

Perception and automated assessment of recorded audio quality, focussing on user-generated content

www.goodrecording.net

Iain Jackson, Bruno M. Fazenda, Trevor J. Cox, Paul Kendrick, Francis F. Li, Stephen Groves-Kirkby, & Alex Wilson

Acoustics Research Centre, University of Salford

How distortion affects the perceived quality of music: Psychoacoustic experiments

• How does clipping affect the perception of quality in music?

– Are hard clipping and soft clipping perceived differently in terms of quality?

• How well does HASQI predict subjective quality ratings of clipped music?

• How robust is HASQI across different styles of music?

What is HASQI?

• Hearing-Aid Speech Quality Index (Kates & Arehart, 2010).

• Models the effect of degradation on quality.

• Measures the combined effect of noise, nonlinear distortion, and linear filtering.

• Applies to both normal-hearing and hearing-impaired listeners.

• Performs well for speech signals (Kressner et al., 2013).

– What happens when it is applied to music?

Arehart, Kates & Anderson (2011)

• Wide variety of degradation/processing:

– Additive noise, peak clipping, amplitude quantisation, compression, compression + babble, spectral subtraction, high-pass filter, low-pass filter, bandpass filter, positive spectral tilt, negative spectral tilt, single resonance peak, multiple peaks, stationary noise...

– ...a total of 112 conditions.

• But only 3 music samples: “jazz”, “Haydn” and “vocalise”.

• Quality ratings were “reasonably well predicted” by HASQI.

– They were also “significantly affected by genre of music”.

• In contrast to previous work, we assess the effect of a single type of processing – hard clipping – against a comprehensive range of musical styles.

Experiment 1: The effect of hard clipping on perceptions of quality

Sample Selection

• Aim: select a representative sample spanning as wide a range of musical styles as possible.

• Guided by previous work (Rentfrow & Gosling, 2003):

– 25 prototype songs from each of 14 genres: classical, jazz, blues, folk, alternative, rock, heavy metal, country, pop, religious, rap/hip-hop, soul, funk, and electronica/dance.

• Final sample library of 140 songs.

– We obtained CD copies of 117 of the songs on the list.

• How can this be scaled down to a manageable number of songs for testing?

– Sort and cluster the songs by timbre.

Sample Selection: Why select by timbre, not genre?

• Genre

– Intuitively useful but lacking in objectivity.

• Timbre

– Apply objective methods to compare songs.

• Samples were clustered using a modified version of the technique of Aucouturier and Pachet (2002).

– A Gaussian Mixture Model (GMM) is fitted to the Mel-frequency cepstral coefficients (MFCCs) of 3 sections of each song; the songs are then clustered by the similarity of their models.

• Total number of clusters is an emergent feature:

– In this case it was found to be 6.
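The clustering step can be sketched as below. This is a simplified stand-in, not the authors' modified Aucouturier and Pachet method: each song is modelled by a single diagonal-covariance Gaussian rather than a full GMM, songs are compared with the symmetrised Kullback-Leibler divergence, and single-linkage grouping under a distance threshold lets the number of clusters emerge from the data. The function names, the threshold value and the `synthetic_frames` helper (which fakes MFCC frames so the sketch runs without audio) are all hypothetical.

```python
import math
from statistics import fmean, pvariance

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian to a song's MFCC frames
    (a simplification: the original technique fits a GMM)."""
    dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(dims)]
    # Floor the variances so constant coefficients do not cause division by zero.
    variances = [max(pvariance([f[d] for f in frames]), 1e-6) for d in range(dims)]
    return means, variances

def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p) for diagonal Gaussians; the log terms cancel."""
    (mp, vp), (mq, vq) = p, q
    total = 0.0
    for a, va, b, vb in zip(mp, vp, mq, vq):
        total += va / vb + vb / va + (a - b) ** 2 * (1 / va + 1 / vb) - 2
    return 0.5 * total

def cluster(models, threshold):
    """Single-linkage clustering: songs whose models are closer than `threshold`
    share a cluster, so the number of clusters is not fixed in advance."""
    parent = list(range(len(models)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            if symmetric_kl(models[i], models[j]) < threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(models))]

def synthetic_frames(centre, n_frames=200, dims=3):
    """Hypothetical stand-in for real MFCC frames: a deterministic wobble around `centre`."""
    return [[centre + math.sin(0.7 * i + d) for d in range(dims)] for i in range(n_frames)]

if __name__ == "__main__":
    songs = [synthetic_frames(c) for c in (0.0, 0.1, 5.0, 5.2)]
    print(cluster([fit_diag_gaussian(s) for s in songs], threshold=5.0))  # [0, 0, 1, 1]
```

The threshold plays the role of the similarity cut-off: raising it merges clusters, lowering it splits them, and the cluster count is read off the result.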

The test set

• From each of our 6 timbre clusters we draw two samples.

– However, one cluster (number 4) contains only one sample.

• Additionally, we include the three samples used by Arehart et al. (2011) in their previous assessment of HASQI and music (“jazz”, “Haydn”, “vocalise”).

• Thus the final test set consists of 14 samples.

Table 1. The 14 songs the final test samples were taken from, by cluster number. Quoted titles are the three samples from Arehart et al. (2011).

Cluster | Song Name | Artist/Composer
1 | Riverboat Set: Denis Dillon’s Square Dance Polka, Dancing on the Riverboat | John Whelan
1 | Crazy Train | Ozzy Osbourne
1 | “Haydn” | –
2 | Ave Maria | Franz Schubert
2 | Packin' Truck | Leadbelly
2 | “vocalise” | Tierney Sutton
3 | Kalifornia | Fatboy Slim
3 | Brown Sugar | The Rolling Stones
4 | The Four Seasons: Spring | Antonio Vivaldi
5 | For What It's Worth | Buffalo Springfield
5 | The Girl From Ipanema | Stan Getz
6 | Spoonful | Howlin' Wolf
6 | Nobody Loves Me But My Mother | B.B. King
6 | “jazz” | –

Method

Distortion of samples

• HASQI is continuous, ranging from 0 to 1.

– HASQI values were used to define discrete distortion levels.

• 10 levels per song sample:

– 9 levels of distortion, spread at equal intervals over the full range of (available) HASQI values.

– Plus the original, clean sample.
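The level-setting step can be sketched as a bisection search on the clipping threshold, assuming that distortion grows monotonically as the threshold (in % of peak level) is lowered, and that the 9 targets sit at equal steps up to the largest distortion available. HASQI is not reproduced here: `toy_distortion` is a hypothetical stand-in metric (normalised error energy) playing the role of 1 − HASQI, and `hard_clip`, `target_levels` and `find_threshold` are illustrative names.

```python
import math

def hard_clip(signal, threshold_pct):
    """Clip every sample to +/- threshold_pct percent of the signal's peak level."""
    limit = max(abs(x) for x in signal) * threshold_pct / 100.0
    return [max(-limit, min(limit, x)) for x in signal]

def toy_distortion(clean, processed):
    """Hypothetical stand-in for 1 - HASQI: error energy over signal energy, capped at 1."""
    err = sum((a - b) ** 2 for a, b in zip(clean, processed))
    ref = sum(a * a for a in clean)
    return min(err / ref, 1.0)

def target_levels(max_distortion, n_levels=9):
    """n_levels distortion targets at equal intervals, the last one at max_distortion."""
    return [max_distortion * k / n_levels for k in range(1, n_levels + 1)]

def find_threshold(clean, target, lo=0.1, hi=100.0, iters=40):
    """Bisect on the threshold (% of peak) until the stand-in metric reaches target.
    Assumes distortion decreases monotonically as the threshold rises."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if toy_distortion(clean, hard_clip(clean, mid)) > target:
            lo = mid  # too distorted: raise the threshold
        else:
            hi = mid  # not distorted enough: lower the threshold
    return (lo + hi) / 2

if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
    d_max = toy_distortion(clean, hard_clip(clean, 0.1))  # most distortion on offer
    for level, target in enumerate(target_levels(d_max), start=1):
        print(f"level {level}: threshold {find_threshold(clean, target):.1f}% of peak")
```

Bisection only needs the metric to be monotonic in the threshold, so a real HASQI implementation could replace `toy_distortion` without changing the search.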

Figure: Relationship between HASQI values and clipping threshold, shown for “Crazy Train” and “For What It's Worth” (x-axis: distortion level, 1 − HASQI, 0 to 1; y-axis: threshold, % of peak level, 0 to 100).


• We broadly reproduced the method used by Arehart et al. (2011).

• 30 participants.

– Mean age 23.7 years (SD: 4.7 years)

– No reported hearing impairments

• Sounds presented over headphones.

– Sennheiser HD 650

– Stereo, 72 dB (linear)

• 140 trials.

– 14 songs × 10 processing conditions

– 7-second samples (randomised presentation order)

• Ratings of overall quality.

– Slider labelled “Bad” and “Excellent” at either end (output: 0 to 100)
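A per-participant presentation order for the 140 trials could be generated as follows. The source only states that the order was randomised; a full shuffle of all song × condition pairs, with a per-participant seed, is an assumption, as are the labels used below.

```python
import random

def trial_order(songs, conditions, seed=None):
    """Every song x condition pair exactly once, in a random order.
    A per-participant seed makes each participant's order reproducible."""
    trials = [(song, cond) for song in songs for cond in conditions]
    random.Random(seed).shuffle(trials)
    return trials

if __name__ == "__main__":
    songs = [f"song {k}" for k in range(1, 15)]                    # 14 songs
    conditions = ["clean"] + [f"level {k}" for k in range(1, 10)]  # 10 conditions
    order = trial_order(songs, conditions, seed=1)
    print(len(order), order[:3])
```

Using a dedicated `random.Random` instance keeps each participant's shuffle independent of any other use of the global random state.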

Results

Figure 1. Mean quality ratings of each cluster, as a function of distortion level. (Error bars show 95% CIs.)


• Are there differences in quality between timbre clusters?

– Repeated-measures ANOVA:

• Independent variables: level of distortion, cluster.

• Dependent variable: mean quality ratings.

• Significant main effect of distortion level (F(4.97, 144.26) = 458.38, p < .01, ηp² = .94).

• Significant main effect of cluster (F(2.33, 67.48) = 42.43, p < .01, ηp² = .59).

• Significant interaction of cluster × distortion level (F(11.91, 345.41) = 6.98, p < .01, ηp² = .19).

• Each successive level of distortion is associated with a significant decrease in quality ratings, but the rate of degradation is not perceived equally across all timbres.
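The reported effect sizes can be cross-checked against the F statistics, since partial eta squared follows from F and its degrees of freedom: ηp² = F·df_effect / (F·df_effect + df_error). If, as the fractional values suggest, a sphericity correction was applied, both degrees of freedom are scaled by the same factor and the ratio is unchanged. A minimal sketch using the values reported above:

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)

# (F, df_effect, df_error, reported partial eta squared), as listed above.
reported = {
    "distortion level": (458.38, 4.97, 144.26, 0.94),
    "cluster": (42.43, 2.33, 67.48, 0.59),
    "cluster x distortion level": (6.98, 11.91, 345.41, 0.19),
}

if __name__ == "__main__":
    for name, (f, df1, df2, eta) in reported.items():
        print(f"{name}: {partial_eta_squared(f, df1, df2):.2f} (reported {eta:.2f})")
```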

Table 2. Clusters grouped according to (between group) significantly different quality ratings.



Results: HASQI performance

Table 3. Correlation coefficients for quality ratings and values predicted by HASQI for each timbre cluster.

Cluster | Correlation with quality ratings
1 | .828
2 | .689
3 | .693
4 | .671
5 | .801
6 | .755
Mean (SD) | .732 (.065)

Reference values: HASQI performance for speech = .942 (Kates & Arehart, 2010); for music = .838 (range .770 to .849; Arehart et al., 2011). Rnonlin performance for music = .95 (1 music sample, 10 participants; Moore et al., 2004).
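The source does not state which correlation coefficient was computed; assuming Pearson's r between HASQI predictions and mean ratings, a minimal sketch is below. Because r is unaffected by linear rescaling, HASQI's 0 to 1 scale and the 0 to 100 rating slider can be correlated directly. The example values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    hasqi = [0.95, 0.80, 0.62, 0.41, 0.20]  # hypothetical HASQI predictions (0 to 1)
    ratings = [88, 74, 55, 47, 18]          # hypothetical mean ratings (0 to 100)
    print(f"r = {pearson_r(hasqi, ratings):.3f}")
```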

Conclusions

• How robust is the HASQI model over a comprehensive range of musical styles?

– HASQI was found to be slightly less accurate than previous work suggests.

– Overall correlation of predicted vs. actual quality ratings = .73 (compared with an equivalent value of .84 in Arehart et al., 2011).

• The predictive accuracy of HASQI can be improved by factoring in timbral features of the samples.

Experiment 2: The effect of hard vs. soft clipping on perceptions of quality

Hard versus soft clipping

• Partial replication of Experiment 1.

– Both hard and soft clipping processing conditions were included in the test set.

– Distortion levels were equivalent to levels 1 to 5 of Experiment 1 (rather than the full range of levels 1 to 9).

• Samples (original, clean files), experimental set-up, procedure, and number of participants were all as per Experiment 1.

Figure: Hard Clipping Thresholds (left) and Soft Clipping Thresholds (right); y-axis: threshold (% of peak level), 0 to 70; x-axis range 0 to 0.6.
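The difference between the two processing types can be sketched as follows, with both thresholds expressed as a percentage of the peak level. The tanh curve used for `soft_clip` is an assumed example of a soft clipper; the source does not specify which curve was used, and both function names are illustrative.

```python
import math

def hard_clip(signal, threshold_pct):
    """Flat-top clipping at +/- threshold_pct percent of the peak level."""
    limit = max(abs(x) for x in signal) * threshold_pct / 100.0
    return [max(-limit, min(limit, x)) for x in signal]

def soft_clip(signal, threshold_pct):
    """Smooth saturation towards +/- limit using limit * tanh(x / limit).
    The tanh curve is an assumed example, not necessarily the study's curve."""
    limit = max(abs(x) for x in signal) * threshold_pct / 100.0
    return [limit * math.tanh(x / limit) for x in signal]

if __name__ == "__main__":
    sine = [math.sin(2 * math.pi * n / 100) for n in range(100)]
    hard, soft = hard_clip(sine, 50), soft_clip(sine, 50)
    print(f"peak after hard clipping: {max(hard):.3f}")  # flat top at the threshold
    print(f"peak after soft clipping: {max(soft):.3f}")  # rounded, just under the threshold
```

Note that, unlike the hard clipper, this soft clipper also slightly reduces samples below the threshold, rather than leaving them untouched.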


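Table 4. Comparison examples of hard and soft clipping at equivalent HASQI levels.

Song Name | Artist/Composer | Clipping type | Example samples
Ave Maria | Franz Schubert | Hard | Clean, Low, Medium distortion level
Ave Maria | Franz Schubert | Soft | Clean, Low, Medium distortion level
Packin' Truck | Leadbelly | Hard | Clean, Low, Medium distortion level
Packin' Truck | Leadbelly | Soft | Clean, Low, Medium distortion level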

Figure 4. Mean quality ratings for hard (left) and soft (right) distortion conditions, shown by cluster. Error bars show 95% CIs.


Panels: hard (left), soft (right). x-axis: distortion level (0 is clean, 9 is most distorted); y-axis: mean quality rating, 0 to 100; one line per cluster (Clusters 1 to 6).


• Across all samples, there was no significant difference between quality ratings for hard and soft clipping.

• HASQI performance was unaffected by the type of distortion.

Experiment 3: Descriptions of quality attributes in different distortion categories

Digital audio sample statistics

Because digital audio is encoded as discrete samples of the audio waveform, much can be said about a recording from the statistical properties of these samples.

The probability mass function (PMF) of the sample values can reveal the presence of distortion in mastered audio. Consider three categories:

1. The ‘clean’ distribution, where there is no clipping and a wide dynamic range.

2. Hard-clipped audio, whose PMF has large values at its extremes, where the maximum amplitude has been reached.

3. Softer distortions, where there is no single large value at the extremes but rather gentler bumps in the nearby regions.
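The PMF idea can be illustrated on integer (PCM-style) sample values. The sketch below is an illustration rather than the analysis used by Wilson & Fazenda: it computes the PMF and a hypothetical summary, `extreme_mass`, the share of samples sitting exactly at the largest absolute value. A hard-clipped sine concentrates a large share of its samples there, while the clean sine does not.

```python
import math
from collections import Counter

def pmf(samples):
    """Probability mass function of integer sample values, as {value: probability}."""
    counts = Counter(samples)
    n = len(samples)
    return {value: count / n for value, count in sorted(counts.items())}

def extreme_mass(samples):
    """Hypothetical summary: share of samples sitting exactly at the peak magnitude."""
    peak = max(abs(s) for s in samples)
    return sum(1 for s in samples if abs(s) == peak) / len(samples)

# A clean 5-cycle sine quantised to integers, and a hard-clipped copy at half its peak.
clean = [round(1000 * math.sin(2 * math.pi * 5 * n / 1000)) for n in range(1000)]
clipped = [max(-500, min(500, s)) for s in clean]

if __name__ == "__main__":
    print(f"clean:   {extreme_mass(clean):.3f} of samples at the peak")
    print(f"clipped: {extreme_mass(clipped):.3f} of samples at the peak")
    print(f"clipped PMF at +500: {pmf(clipped)[500]:.3f}")
```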

Subjective Test (Wilson & Fazenda, submitted)

• 63 samples of music, containing a mix of ‘clean’, hard-clipped and soft-distortion samples.

• 22 participants rated the quality of each sample on a 5-point scale and also provided 2 descriptors.

• Ratings for clean samples were significantly higher than for the two distorted categories.

• The two distortion categories did not significantly differ between themselves (F(1, 2) = 5.72, p < 0.001, η2 = 0.008).

Verbal descriptions of distortion categories

• As well as giving a rating out of 5, participants were asked to provide two words describing the attributes on which they assessed quality.

• For example:

• “I gave this sample 5 stars because it was clear and full”

• “I gave this sample 1 star because it was distorted and dull”

Word-clouds of the most common attributes associated with (a) clean samples, (b) hard clipped samples, (c) soft distortion samples.

• The table shows the five most commonly used descriptor words and their absolute frequencies for each of the clean, hard-clipped and soft distortion categories.

• Chi-square analysis shows significant variation in the distribution of words used to describe each of the three categories (χ²(8, N = 547) = 33.28, p < 0.001).

• Bold frequencies in the table indicate values significantly greater (>) or less (<) than the counts expected under the null hypothesis.
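A chi-square test of independence on a words × categories table can be sketched as below. With five words and three categories the degrees of freedom are (5 − 1) × (3 − 1) = 8, matching the reported χ²(8). The source does not say how individual cells were judged significant; flagging adjusted standardised residuals beyond ±1.96 (roughly p < .05 per cell) is an assumption, and the counts in the usage example are invented.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for a words x categories table.
    Returns (statistic, degrees of freedom, adjusted standardised residuals)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
            # Adjusted residual: approximately standard normal under independence.
            se = math.sqrt(expected * (1 - rows[i] / n) * (1 - cols[j] / n))
            res_row.append((observed - expected) / se)
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, residuals

def flag_cells(residuals, z=1.96):
    """Mark cells observed significantly more ('>') or less ('<') often than expected."""
    return [[">" if r > z else "<" if r < -z else "" for r in row] for row in residuals]

if __name__ == "__main__":
    # Invented counts for illustration only (rows: five descriptor words;
    # columns: clean, hard-clipped, soft distortion).
    counts = [[12, 30, 41], [25, 14, 9], [18, 20, 15], [10, 9, 16], [22, 8, 6]]
    stat, df, res = chi_square_independence(counts)
    print(f"chi2({df}) = {stat:.2f}")
    for row in flag_cells(res):
        print(row)
```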

Verbal descriptions of distortion categories

• “Distorted” is used less than expected by chance to describe samples in the ‘clean’ category. The opposite is true for both other categories, the hard-clipped and soft distortion samples.

• Samples in the soft category are more frequently described as “Distorted” than those in the hard category. This suggests that small amounts of hard clipping can go unnoticed.

• “Punchy” is used less often to describe soft distortions than hard clipping. This may be due to the lesser influence of inter-sample peaks in soft distortions compared with hard clipping.

• “Harsh” was not associated with either of the distortion categories, but does appear more often than expected by chance among words describing the clean samples.


Conclusions

• Overall, HASQI was found to predict degradation in music quality reasonably well.

– Performance across hard and soft clipping is very good.

• A limitation of HASQI for music: it was not developed for stereo signals.

– The model does not account for stereo width and panning.

References

• K.H. Arehart, J.M. Kates and M.C. Anderson. Effects of noise, nonlinear processing, and linear filtering on perceived music quality. Int. J. Audiol. 50(3): 177–190. (2011).

• J.J. Aucouturier and F. Pachet. Music similarity measures: What’s the use? Proc. ISMIR. (2002).

• J.M. Kates and K.H. Arehart. The Hearing-Aid Speech Quality Index (HASQI). J. Audio Eng. Soc. 58(5): 363–381. (2010).

• A. Kressner, D. Anderson and C. Rozell. Evaluating the generalization of the Hearing Aid Speech Quality Index (HASQI). IEEE Trans. Audio, Speech, Lang. Process. 21(2): 407–415. (2013).

• B.C.J. Moore, C.-T. Tan, N. Zacharov and V.-V. Mattila. Measuring and predicting the perceived quality of music and speech subjected to combined linear and nonlinear distortion. J. Audio Eng. Soc. 52(12): 1228–1244. (2004).

• J.P. Rentfrow and S.D. Gosling. The Do Re Mi’s of everyday life: The structure and personality correlates of music preferences. J. Pers. Soc. Psychol. 84(6): 1236–1256. (2003).

• A. Wilson and B.M. Fazenda. Sonic character: Categorisation of distortion profiles in relation to audio quality of music recordings. Submitted to 17th Int. Conference on Digital Audio Effects (DAFx-14).
