Evaluation of singing synthesis: methodology and case study with concatenative and performative systems

Lionel Feugère (1), Christophe d'Alessandro (1), Samuel Delalez (1), Luc Ardaillon (2), Axel Roebel (2)

(1) LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France
(2) IRCAM, CNRS, Sorbonne Universités UPMC, 75004 Paris, France

13th Interspeech 2016, September 8th-12th, San Francisco

Context and Goal

Singing synthesis challenges:
- 1993 Stockholm Music Acoustics Conference
- 2007 Interspeech
- 2016 Interspeech

Goals:
- Proposing a method for evaluating singing synthesis
- Evaluating synthesis systems from the ChaNTeR project (http://chanter.limsi.fr/)

Outline

- Synthesis systems
- Methodology
- Protocol
- Results
- Conclusion

Case study

Melodic and rhythmic control:
- Text-to-Chant (TTC): input symbolic score; off-line model of melody and phoneme duration. Ardaillon, L. & Roebel, A. (Ircam)
- Singing instrument (Calliphony): musician; live control of articulation rhythm (foot pedal) and pitch (pen tablet). Delalez, S. & d'Alessandro, C. (LIMSI)

Segmental basis:
- Natural monocord-isochron
- Database for concatenation

Concatenation and/or freq.-time scaling:
- PAN
- SuperVP (SVP)
- RT-PSOLA

References: Le Beux, S. et al.; Roebel, A.; Degottex, G. et al.; Hubber, S. et al.

Methodology - Types of listening tests

AB test
- Task: preference between 2 sounds
- Results: mean % of preference for each pair of systems
- Direct comparison => short sounds are preferable; better for assessing a particular dimension (quality of articulation, quality of ornamentation)
- All sounds are compared by pair => few sounds are preferable; better not to add references
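When all sounds are compared by pair, the number of AB trials grows quadratically with the number of sounds, which is why few sounds are preferable and why references are better left out. The sketch below is only an illustration: the system labels and the helper name `ab_pairs` are hypothetical and do not come from the slides.

```python
from itertools import combinations

def ab_pairs(items):
    """Return every unordered pair of items, i.e. one AB trial per pair."""
    return list(combinations(items, 2))

# Hypothetical labels: six systems under test.
systems = ["A", "B", "C", "D", "E", "F"]
print(len(ab_pairs(systems)))  # 15 pairs = 6 * 5 / 2

# Adding four hypothetical reference stimuli triples the number of pairs.
with_references = systems + ["ref1", "ref2", "ref3", "ref4"]
print(len(ab_pairs(with_references)))  # 45 pairs = 10 * 9 / 2
```

Each pair would also be repeated across excerpts and listeners, so the listening time scales with this count.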

Absolute Category Rating (ACR)
- Task: opinion score (1-5)
- Results: mean opinion score (MOS) for each system
- Indirect comparison => allows long sounds; better for general quality assessment
- Each sound is assessed individually => allows a higher number of sounds; allows references to be added (natural; pitch, timbre and phoneme degradations)
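In Absolute Category Rating, the mean opinion score of a system is the average of the 1-5 scores it received. A minimal sketch, assuming the ratings are available as (system, score) tuples; the ratings and the function name `mean_opinion_scores` are hypothetical, not data from the study.

```python
from collections import defaultdict
from statistics import mean

def mean_opinion_scores(ratings):
    """Average the 1-5 opinion scores per system.

    `ratings` is an iterable of (system_label, score) tuples.
    """
    by_system = defaultdict(list)
    for system, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score}")
        by_system[system].append(score)
    return {system: mean(scores) for system, scores in by_system.items()}

# Hypothetical ratings, for illustration only.
ratings = [("sys_a", 4), ("sys_a", 5), ("sys_b", 2), ("sys_b", 3), ("sys_b", 4)]
print(mean_opinion_scores(ratings))
```

Because each sound is rated on its own, adding a system or a reference only adds its own ratings to the list rather than a new set of pairs.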

Protocol

Material & participants
- 25 paid subjects, active in audio/music, not involved in the project
- Songs: Summer Time and Autumn Leaves, with French lyrics
- Synthesized by each system

Listening test
- Absolute Category Rating, ~7 sec sounds (4 bars). Question: “Globally, how do you rate the quality of what you have just heard?” Response: bad (1), poor (2), fair (3), good (4), excellent (5)
- AB test 1, ~2 sec sounds: “Choose the item for which you rate the quality of lyrics articulation the best”
- AB test 2, ~2 sec sounds: “Choose the item for which you rate the quality of ornamentation (vibrato, portamento) the best”

Results – General quality (ACR)

Diamonds are MOS.

References:
- Nat = natural
- DC1 = pitch degraded
- DC2 = timbre degraded
- DC3 = phoneme degraded

Segmental basis:
- Con = concatenation
- Mi = natural monocord-isochron

Concatenation / time-freq. scaling:
- PAN = Text-to-Chant with PAN
- SVP = Text-to-Chant with SuperVP
- Cal = Calliphony singing instrument

[Case study system diagram, annotated for general quality (ACR) with the comparison markers ">", "=", "="]

Results – articulation quality (AB)

[Case study system diagram, annotated with "<~70% preference", "~60-80% preference" and the comparison markers ">", "="]

Results – ornamentation quality (AB)

[Case study system diagram, annotated with "~60% preference", "~60-70% preference" and the comparison markers ">", "="]

Conclusion

- Global and analytical evaluation methods for assessing overall quality, articulation quality and ornamentation quality
- Absolute Category Rating allows longer extracts when there is a large number of systems => better for overall musical quality
- The AB test finds differences where Absolute Category Rating did not => better for quality on specific dimensions
- Text-to-Chant system > Calliphony singing instrument, but the methodology is better suited to Text-to-Chant systems

Thank you for your attention
lionel.feugere@limsi.fr

Calliphony singing instrument: samuel.delalez@limsi.fr
Text-to-Chant system: luc.ardaillon@ircam.fr

ChaNTeR project: http://chanter.limsi.fr/
Sound examples can be downloaded or played online (see paper)

Results (AB)

Percentage of preference of the column system over the line system.
Columns: Mi-SVP, Con-PAN, Mi-PAN, Con-Cal, Mi-Cal.
Each entry gives AB1 / AB2, where AB1 = articulation quality and AB2 = ornamentation quality.

Con-SVP: 68%* / 58%*, 56% / 57%, 15%* / 29%*, 40%* / 34%*
Mi-SVP: 20%* / 28%
Con-PAN: 71%* / 48%, 13%* / 31%*, 35%* / 33%*
Mi-PAN: 17%* / 37%*
Con-Cal: 71%* / 55%

* = significant
yellow = less than 1/3 or more than 2/3
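The slide marks some preferences as significant without naming the test used. One common choice for AB preference counts, assumed here purely for illustration, is an exact two-sided binomial test against chance (50%). The trial counts below are hypothetical; the slides report percentages only.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p0=0.5):
    """Exact two-sided binomial p-value against a chance level p0."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    probs = [
        comb(trials, i) * p0**i * (1 - p0) ** (trials - i)
        for i in range(trials + 1)
    ]
    observed = probs[successes]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)

# Hypothetical counts: 68 of 100 AB trials prefer one system over the other.
p_value = binomial_two_sided_p(68, 100)
print(f"p = {p_value:.4f}")
```

With fewer trials the same percentage can fail to reach significance, so the percentage alone does not determine which cells carry an asterisk.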
