Pronunciation Lexicon Background Paolo Baggia, Loquendo W3C SSML Workshop Beijing – 2-3 Nov 2005


Pronunciation Lexicon Background

Paolo Baggia, Loquendo

W3C SSML Workshop, Beijing – 2-3 Nov 2005


Overview

• Introduction to Pronunciation Lexicon

• Pronunciation Alphabets

• The PLS language

• Issues for the workshop


Introduction to Pronunciation Lexicon Specification

• The PLS spec is about “Pronunciation Lexicon”:

– How to pronounce words and phrases

– How to deal with the variability of pronunciations by country, region, person, etc.

– How to spell abbreviations and acronyms

• Two main uses:

– Speech Synthesis (SSML documents)

– Speech Recognition (SRGS grammars)

– Other uses are possible (embedded or referenced in other mark-up)


The TTS perspective

• A TTS engine’s job is to transform an “input text” into speech. This involves a lot of processing, including:

– Text normalization

– Word pronunciation (lexical stress, phonetic transcription)

– Sentence structure (intonation, rhythm)

– Sentence level modification in phonetic transcription (co-articulation)

– Computation of prosodic parameters

– Generation of the acoustic signal

• SSML documents enable TTS enhancement, acting on several levels of processing through SSML markup elements

• PLS improves SSML on text normalization and phonetic transcription


An SSML example document

• This is a simple SSML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Roberto Benigni.
</speak>

• This is an enhancement of the same example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The title of the movie is:
  <phoneme alphabet="ipa"
           ph="&#x2C8;l&#x251; &#x2C8;vi&#x2D0;&#x27E;&#x259; &#x2C8;&#x294;e&#x26A; &#x2C8;b&#x25B;l&#x259;">
    La vita è bella
  </phoneme>
  <!-- The IPA pronunciation is “ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə” -->
  (Life is beautiful), which is directed by
  <phoneme alphabet="ipa"
           ph="&#x279;&#x259;&#x2C8;b&#x25B;&#x2D0;&#x279;&#x27E;o&#x28A; b&#x25B;&#x2C8;ni&#x2D0;nji">
    Roberto Benigni
  </phoneme>
  <!-- The IPA pronunciation is “ɹəˈbɛːɹɾoʊ bɛˈniːnji” -->
</speak>


An SSML example with PLS

• This is a simple SSML document that references an external Pronunciation Lexicon:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <lexicon uri="http://www.example.com/movie_lexicon.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Roberto Benigni.
</speak>

• PLS factors all the changes out into an external document

• The TTS engine loads the PLS document(s) and applies them transparently to the SSML document

• An application may define contextual PLS documents to be used at different points of the interaction
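
A minimal sketch of what the referenced movie_lexicon.pls might contain. The file itself is not shown in the slides, so its content is an assumption: the element names and namespace follow the PLS slides later in this deck, and the pronunciations are copied from the inline <phoneme> example on the previous slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- Hypothetical content for movie_lexicon.pls -->
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Roberto Benigni</grapheme>
    <phoneme>ɹəˈbɛːɹɾoʊ bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>
```

With a lexicon like this, the SSML text stays free of pronunciation markup, and the same file can be reused by any document that mentions these names.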


The ASR perspective

• An ASR engine’s job is to transform an audio signal into a textual or semantic representation of the meaning of the sentence

• Using SRGS grammars constrains the sentences to be recognized and improves ASR performance

• PLS improves ASR performance by allowing multiple pronunciations of words and phrases, and by handling abbreviations and text normalization


An SRGS example grammar

• This is a very simple SRGS grammar:

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="city_state" mode="voice">
  <rule id="city" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Miami</item>
      <item>Fargo</item>
    </one-of>
  </rule>
  <rule id="state" scope="public">
    <one-of>
      <item>Florida</item>
      <item>North Dakota</item>
      <item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>

• The grammar recognizes sentences like:

– “Boston Massachusetts” or “Miami Florida”

but also:

– “Boston Florida” or “Fargo Massachusetts”


An SRGS example with PLS

• This is a simple SRGS grammar that references an external Pronunciation Lexicon:

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="city_state" mode="voice">
  <lexicon uri="http://www.example.com/city_state.pls"/>
  <rule id="city" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Miami</item>
      <item>Fargo</item>
    </one-of>
  </rule>
  <rule id="state" scope="public">
    <one-of>
      <item>Florida</item>
      <item>North Dakota</item>
      <item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>

• The grammar allows different pronunciations of words to accommodate many different speakers
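
A sketch of what the referenced city_state.pls could look like, assuming the PLS syntax presented later in this deck. The file is not part of the slides, and the transcriptions below are illustrative assumptions rather than authoritative pronunciations; they only show how alternative pronunciations of a grammar word can be listed for the recognizer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- Hypothetical content for city_state.pls -->
  <lexeme>
    <grapheme>Boston</grapheme>
    <!-- Two alternative vowel qualities in the first syllable (illustrative) -->
    <phoneme>ˈbɔstən</phoneme>
    <phoneme>ˈbɑstən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Miami</grapheme>
    <!-- Two alternative final vowels (illustrative) -->
    <phoneme>maɪˈæmi</phoneme>
    <phoneme>maɪˈæmə</phoneme>
  </lexeme>
</lexicon>
```

Words without an entry, such as “Fargo” here, would fall back on the recognizer's built-in pronunciation.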


PLS allows you…

• to create Pronunciation Lexicons to be used by both ASR and TTS

• to take into account different usages:

– For TTS: to improve reading proper names

– For ASR: to give multiple pronunciations

– For TTS/ASR: to expand abbreviations and acronyms

• to exchange Pronunciation Lexicons between different applications (interoperability)

• to use contextual Pronunciation Lexicons at different points of the application

• The PLS is a W3C standard language!

PLS saves application developers time/money for creating good speech applications!


Phonetic Alphabets

• To describe the pronunciation of a word/phrase, you need a phonetic alphabet

• An alphabet contains symbols to represent speech sounds, just like in a dictionary, e.g.

Cracked /krakt/ adj. 1 having cracks. 2 (predic.) slang crazy

• The PLS spec suggests using either:

– a standard pronunciation alphabet, such as IPA (defined by the International Phonetic Association, see: http://www2.arts.gla.ac.uk/IPA/index.html)

– other alphabets:

  • SAMPA, which is an ASCII encoding of IPA, and X-SAMPA

  • Pinyin, JEITA, etc.


IPA – Chart

• IPA was founded in 1886

• It is the major international association of phoneticians

• The IPA alphabet provides symbols making possible the phonemic transcription of all known languages

• IPA characters can be encoded in Unicode by supplementing ASCII with characters from other ranges, particularly:

– IPA Extensions (0250–02AF)

– Latin Extended-A (0100–017F)

• See the detailed charts at: http://www.unicode.org/charts


SAMPA – SAM Phonetic Alphabet

• Developed for phonetic transcription in an EU-funded project called Speech Assessment Methods (SAM)

• It is ASCII based (easy to write). It is an “ASCII-ization” of IPA

• Recently, Prof. John C. Wells proposed an alphabet called “X-SAMPA”, which encodes all the IPA symbols in ASCII format

• A few examples:

– “thin” IPA: /θɪn/ X-SAMPA: /TIn/

– “thing” IPA: /θɪŋ/ X-SAMPA: /TIN/

– “flabbergasted” IPA: /ˈflæbəgɑːstɪd/ X-SAMPA: /"fl{b@gA:stId/

– “Weltanschauung” IPA: /ˈvɛltʔanˌʃaʊʊŋ/ X-SAMPA: /"vElt?an%SaUUN/

– en-GB: “vice versa” IPA: /vaɪsə ˈvɜːsə/ X-SAMPA: /vaIs@ "v3:s@/

– it-IT: “vice versa” IPA: /ˈviʧe ˈvɛrsa/ X-SAMPA: /"vitSe "vErsa/


Phonetic Alphabets – Issues

• How to write pronunciation in a reliable and easy way?

• Problems with fonts, word processors, browsers

• There are very few tools to help with writing pronunciation and to let you listen to what you have written

• The standardization process may push the creation of tools and the improvement of the coverage by word processors.

• Is IPA of any use for Asian languages?

• Are there standard phonetic alphabets for Asian languages, such as Pinyin, Jyutping or JEITA?

• Should they be referenced in a standard way, like “ipa”?


The PLS language

• PLS is an XML language

<?xml version="1.0" encoding="UTF-8"?>

• The container element is <lexicon>, attributes:

– version (required): "1.0"

– xmlns (required): "http://www.w3.org/2005/pronunciation-lexicon"

– alphabet (optional): "ipa" (default value)

– xml:lang (optional): "en-US" or "zh-CN" or "ja"

Example:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="zh-CN">
  <!-- The lexicon for Chinese Mandarin! -->
</lexicon>

• The current PLS is monolingual!


The PLS language – metadata

• Metadata (annotation of the document for other uses, …) can be of two varieties:

– <meta> element (for compatibility with other markup, like SRGS and SSML)

– <metadata> element (which contains the annotations in either RDF format or other formats)

Example of metadata:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- Metadata about the PLS document -->
      <rdf:Description rdf:about=""
          dc:title="Pronunciation lexicon for W3C terms"
          dc:description="This lexicon contains common pronunciations for many W3C acronyms and abbreviations, such as I18N, WSDL or WAI"
          dc:publisher="W3C"
          dc:language="en-US"
          dc:date="2005-11-29"
          dc:rights="Copyright 2002 W3C"
          dc:format="application/pls+xml">
        <dc:creator>The W3C Voice Browser Working Group</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!-- Add lexicon entries here!! -->
</lexicon>


The PLS language – <lexeme>

• The <lexeme> element is the container of a lexicon entry. It is composed of:

– One or more <grapheme> elements that indicate the words/phrases to be matched in the input

– One or more <phoneme> or <alias> elements that indicate the possible pronunciations or expansions, respectively

• First considerations:

– Several <grapheme> elements may be present: this means that all of them will match the pronunciations

– Several <phoneme> elements may be present: this means that the pronunciations are alternatives

– A mixture of <alias> and <phoneme> elements may be present: there is a preference mechanism to choose a single one for TTS


The PLS language – <grapheme>

• The <grapheme> element contains CDATA that represents orthographies:

– Regional spelling variations, e.g. "colour" and "color"

– Free spelling variations, e.g. "judgment" and "judgement"

– Traditional vs modern spellings, e.g. in German it is common to replace "ö" with "oe"

– Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji) and phonemic spelling systems, such as Katakana or Hiragana, for representing the orthography of a word or phrase

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Kana">にほんご</grapheme>
    <!-- Here you can insert the pronunciation of “nihongo”.
         In IPA it could be: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>

• Is an explicit “orthography” attribute useful?

• Is it redundant?


The PLS language – <phoneme>

• The <phoneme> elements are contained inside <lexeme>

• <phoneme> contains CDATA specifying the pronunciation in a given pronunciation alphabet:

– An “alphabet” attribute may be specified to override the alphabet of the whole lexicon

– A “prefer” attribute may be present to indicate precedence among pronunciations

Example of lexeme for Sepulveda:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>s&#x0259;'p&#x028C;lv&#x026A;d&#x0259;</phoneme>
    <!-- The IPA string is: "sə'pʌlvɪdə" -->
  </lexeme>
</lexicon>
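
As a sketch of the per-element “alphabet” override described above, the lexeme below gives “thin” once in the lexicon-wide IPA and once in X-SAMPA, reusing the transcriptions from the SAMPA slide. The alphabet name "x-sampa" is an assumption made for illustration: the slides do not say which names other than "ipa" an engine accepts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>thin</grapheme>
    <!-- Uses the lexicon-wide alphabet ("ipa") -->
    <phoneme>θɪn</phoneme>
    <!-- Overrides the alphabet for this element only;
         "x-sampa" is a hypothetical alphabet name -->
    <phoneme alphabet="x-sampa">TIn</phoneme>
  </lexeme>
</lexicon>
```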


The PLS language – <phoneme>

• Other examples

Example for more than one pronunciation of the word “huge”:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme prefer="true">hju:&#x02A4;</phoneme>
    <!-- IPA string is: "hju:ʤ" -->
    <phoneme>ju:&#x02A4;</phoneme>
    <!-- IPA string is: "ju:ʤ" -->
  </lexeme>
</lexicon>

Example for the Japanese word “nihongo” with different spellings:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Kana">にほんご</grapheme>
    <phoneme>n&#x026A;h&#x0252;&#x014B;&#x0252;</phoneme>
    <!-- IPA string is: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>


The PLS language – <alias>

• The <alias> elements are contained inside <lexeme>

• <alias> is used to indicate the pronunciation of an acronym or an abbreviated term in the form of other orthographies.

• <alias> may contain:

– A “prefer” attribute to indicate precedence among pronunciations

• Both <phoneme> and <alias> may occur in a <lexeme>

Example of lexeme with an <alias>:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
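
A sketch of a lexeme that carries both an <alias> and a <phoneme>, with “prefer” marking the one a TTS engine should choose. This variant is not in the original slides, and the letter-by-letter transcription is an illustrative assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- Preferred for TTS: expand the acronym -->
    <alias prefer="true">World Wide Web Consortium</alias>
    <!-- Alternative: spell it out letter by letter (illustrative transcription) -->
    <phoneme>ˈdʌbəljuː θriː siː</phoneme>
  </lexeme>
</lexicon>
```

For ASR, both forms would be accepted as ways of saying “W3C”; for TTS, only the preferred one would be spoken.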


Use Cases/Future Issues

The current version of PLS can deal with:

• Multiple Pronunciations for ASR

• Homographs

• Abbreviations

But it cannot deal with:

• Homophones

• Part of speech annotations (and other contextual information)

• Grouping lexemes and external references

These tasks are too challenging to be solved in PLS version 1.0
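
One way the homograph case might be written, as a sketch: a single lexeme lists both pronunciations of a word such as “read”. Because the lexicon carries no part-of-speech or other contextual information, nothing in it tells the engine which pronunciation applies in a given sentence. The transcriptions are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>read</grapheme>
    <!-- Present tense (illustrative) -->
    <phoneme prefer="true">riːd</phoneme>
    <!-- Past tense (illustrative) -->
    <phoneme>rɛd</phoneme>
  </lexeme>
</lexicon>
```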


Issues for the workshop

• Monolingual lexicon?

• Orthography attribute: Useful or redundant?

• Mandate new phonetic alphabets?


Quick demo of SSML+PLS

• Mobile device (with embedded TTS)

• By GPRS, the device connects to a server:

– It downloads news from a news site (RSS)

– The news is transformed into SSML

– The SSML is returned to the mobile device

• The device then:

– Shows the news on the screen

– Reads the SSML document (which includes a lexicon) using the TTS engine


Use Cases – Multiple pronunciations

• More than one pronunciation for a word (very common for ASR)

Example of two pronunciations for the word “Newton”:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:'t&#x0259;n</phoneme>
    <!-- IPA string is: "nju:'tən" -->
    <phoneme>nu:'t&#x0259;n</phoneme>
    <!-- IPA string is: "nu:'tən" -->
  </lexeme>
</lexicon>


Use Cases – Multiple Orthographies

• More than one orthography for a word (common for ASR and TTS)

Example of two orthographies for colour/color:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>color</grapheme>
    <grapheme>colour</grapheme>
    <phoneme>'k&#x028C;l&#x0259;</phoneme>
    <!-- IPA string is: "'kʌlə" -->
  </lexeme>
</lexicon>


Final Remarks

• The usage of PLS:

– Simplifies the development of a speech application

– Improves the performance of speech recognition (in a standard way)

– Enhances TTS output

• A standard language for PLS enables the exchange of pronunciations between applications