Evaluation of singing synthesis: methodology and case study with concatenative and performative systems Lionel Feugère 1 , Christophe d'Alessandro 1 , Samuel Delalez 1 , Luc Ardaillon 2 , Axel Roebel 2 1 LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France 2 IRCAM, CNRS, Sorbonne Universités UPMC, 75004 Paris, France 13th Interspeech 2016, September 8 th - 12 th , San Francisco


Context and Goal

Singing synthesis challenges
1993: Stockholm Music Acoustics Conference
2007: Interspeech
2016: Interspeech

Goals
Propose a method for evaluating singing synthesis
Evaluate the synthesis systems from the ChaNTeR project: http://chanter.limsi.fr/

Outline

Synthesis systems

Methodology

Protocol

Results

Conclusion

Case study

Melodic and rhythmic control
Text-to-Chant (TTC): input symbolic score; off-line model of melody and phoneme duration
Singing instrument (Calliphony): musician; live control of articulation rhythm (foot pedal) and pitch (pen tablet)
Delalez, S. & d'Alessandro, C. (LIMSI); Ardaillon, L. & Roebel, A. (Ircam)

Segmental basis
Natural monocord-isochron
Database for concatenation

Concatenation and/or freq.-time scaling
PAN
SuperVP
RT-PSOLA
Le Beux, S. et al.; Roebel, A.; Degottex, G. et al.; Hubber, S. et al.
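The two segmental bases and the three synthesis back ends combine into the six conditions that the results slides label Con-SVP, Mi-SVP, Con-PAN, Mi-PAN, Con-Cal and Mi-Cal. A minimal sketch of that crossing is given here; the function and constant names are hypothetical, and only the labels come from the slides.

```python
from itertools import product

# Abbreviations as used in the results legend of the slides.
SEGMENTAL_BASES = ["Con", "Mi"]    # concatenation database, natural monocord-isochron
BACK_ENDS = ["SVP", "PAN", "Cal"]  # TTC with SuperVP, TTC with PAN, Calliphony


def conditions(bases=SEGMENTAL_BASES, back_ends=BACK_ENDS):
    """Cross every segmental basis with every back end (hypothetical helper)."""
    return [f"{basis}-{back_end}" for back_end, basis in product(back_ends, bases)]


if __name__ == "__main__":
    for label in conditions():
        print(label)
```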


Methodology - Types of listening tests

AB test

Task: preference between 2 sounds
Results: mean % of preference between each pair of systems

Direct comparison: all sounds are compared by pair

=> Short sounds are preferable
Better for assessing a particular dimension (quality of articulation, quality of ornamentation)

=> Few sounds are preferable
Better not to add references
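Why few sounds are preferable in an AB test can be seen by counting trials. If every stimulus is compared with every other one, the number of pairs grows as n(n-1)/2, whereas a test that rates each stimulus individually grows linearly. The sketch below assumes a full pairwise design, which the slides do not confirm; the helper names and stimulus labels are hypothetical.

```python
from itertools import combinations


def ab_trials(stimuli):
    """All unordered pairs of stimuli, as in a full pairwise AB design."""
    return list(combinations(stimuli, 2))


def individual_trials(stimuli):
    """One trial per stimulus, as when each sound is rated on its own."""
    return list(stimuli)


if __name__ == "__main__":
    for n in (3, 6, 10):
        stimuli = [f"s{i}" for i in range(n)]
        print(n, len(individual_trials(stimuli)), len(ab_trials(stimuli)))
```

With six stimuli this gives 15 pairs, and adding four reference sounds would raise it to 45.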

Absolute Category Rating (ACR)

Task: opinion score (1-5)
Results: mean opinion score (MOS) for each system

Indirect comparison: each sound is assessed individually

=> Allows long sounds
Better for general quality assessment

=> Allows a higher number of sounds
Allows references to be added (natural; pitch, timbre and phoneme degradations)
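The mean opinion score is simply the average of the 1-5 ratings collected for a system. A minimal sketch follows; the system names and ratings are made-up illustration data, not results from the study, and the helper name is hypothetical.

```python
from statistics import mean

VALID_SCORES = {1, 2, 3, 4, 5}  # the five ACR categories


def mean_opinion_scores(ratings):
    """Map each system to the mean of its ratings (hypothetical helper).

    ratings: dict of system name -> list of integer scores in 1..5.
    """
    for system, scores in ratings.items():
        if not scores or any(score not in VALID_SCORES for score in scores):
            raise ValueError(f"invalid ratings for {system!r}")
    return {system: mean(scores) for system, scores in ratings.items()}


if __name__ == "__main__":
    # Illustrative ratings only.
    example = {"system_a": [4, 5, 3, 4], "system_b": [2, 3, 2, 1]}
    print(mean_opinion_scores(example))
```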

Protocol

Material & participants

25 paid subjects, active in audio/music, not involved in the project
Songs: Summer Time and Autumn Leaves, with French lyrics, synthesized by each system

Listening test

Absolute Category Rating (~7 sec sounds, 4 bars)
Question: "Globally, how do you rate the quality of what you have just heard?"
Response: bad (1), poor (2), fair (3), good (4), excellent (5)

AB test 1 (~2 sec sounds)
"Choose the item for which you rate the quality of lyrics articulation the best"

AB test 2 (~2 sec sounds)
"Choose the item for which you rate the quality of ornamentation (vibrato, portamento) the best"

Results – General quality (ACR)

Diamonds are MOS

References
Nat = natural
DC1 = pitch degraded
DC2 = timbre degraded
DC3 = phoneme degraded

Segmental basis
Con = concatenation
Mi = natural monocord-isochron

Concatenation / time-freq. scaling
PAN = Text-to-Chant with PAN
SVP = Text-to-Chant with SuperVP
Cal = Calliphony singing instrument

[Diagram: case-study overview annotated with the ACR comparison symbols ">", "=", "="]

Results – articulation quality (AB)

[Diagram: case-study overview annotated with "<~70% preference", "~60-80% preference", ">", "="]

Results – ornamentation quality (AB)

[Diagram: case-study overview annotated with "~60% preference", "~60-70% preference", ">", "="]

Conclusion

Global and analytical evaluation methods for assessing overall quality, articulation quality and ornamentation quality

Absolute Category Rating allows longer extracts when there is a large number of systems
=> better for overall musical quality

The AB test finds differences where Absolute Category Rating did not
=> better for quality on specific dimensions

Text-to-Chant system > singing instrument Calliphony
But the methodology is better suited to Text-to-Chant systems

Thank you for your attention
[email protected]

Calliphony singing instrument: [email protected]
Text-to-Chant system: [email protected]

ChaNTeR project: http://chanter.limsi.fr/
Sound examples can be downloaded or played online (see paper)

Results (AB)

Percentage of preference of the column system over the row system
First value: AB1 (articulation quality); second value: AB2 (ornamentation quality)
* = significant
yellow = less than 1/3 or more than 2/3

Columns: Mi-SVP, Con-PAN, Mi-PAN, Con-Cal, Mi-Cal

Con-SVP: 68%* / 58%*, 56% / 57%, 15%* / 29%*, 40%* / 34%*
Mi-SVP: 20%* / 28%
Con-PAN: 71%* / 48%, 13%* / 31%*, 35%* / 33%*
Mi-PAN: 17%* / 37%*
Con-Cal: 71%* / 55%
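One common way to decide whether a preference percentage differs from chance is an exact two-sided binomial test against 50%. The slides do not say which test produced the asterisks, nor how many judgments each cell contains, so the sketch below is only an illustration: the trial counts are invented, and the function names are hypothetical. It also applies the 1/3 and 2/3 thresholds used for the highlighting.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided binomial p-value against chance (p = 0.5).

    Under p = 0.5 the distribution is symmetric, so the p-value is the total
    probability of outcomes at least as far from trials / 2 as the observed one.
    """
    distance = abs(successes - trials / 2)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= distance
    )
    return extreme / 2 ** trials


def flag(successes, trials, alpha=0.05):
    """Summarize one AB cell: percentage, significance, 1/3-2/3 highlight."""
    share = successes / trials
    return {
        "percent": round(100 * share),
        "significant": binomial_two_sided_p(successes, trials) < alpha,
        "highlight": share < 1 / 3 or share > 2 / 3,
    }


if __name__ == "__main__":
    # Hypothetical counts: 34 of 50 and 28 of 50 judgments for the column system.
    print(flag(34, 50))
    print(flag(28, 50))
```

Because each percentage is the preference for the column system, the preference for the row system in the same cell is its complement to 100%.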