Pronunciation Lexicon Background Paolo Baggia, Loquendo W3C SSML Workshop Beijing – 2-3 Nov 2005


Overview

• Introduction to Pronunciation Lexicon

• Pronunciation Alphabets

• The PLS language

• Issues for the workshop



Introduction to Pronunciation Lexicon Specification

• The PLS spec is about the “Pronunciation Lexicon”:

– How to pronounce words and phrases

– How to deal with the variability of pronunciations by country, region, person, etc.

– How to spell abbreviations and acronyms

• Two main uses:

– Speech Synthesis (SSML documents)

– Speech Recognition (SRGS grammars)

– Other uses are possible (embedded or referenced in other mark-up)



The TTS perspective

• A TTS engine’s job is to transform an “input text” into speech. This involves several processing steps, including:

– Text normalization

– Word pronunciation (lexical stress, phonetic transcription)

– Sentence structure (intonation, rhythm)

– Sentence level modification in phonetic transcription (co-articulation)

– Computation of prosodic parameters

– Generation of the acoustic signal

• SSML documents enhance TTS output: SSML markup elements act on several of these processing levels

• PLS complements SSML at the text normalization and phonetic transcription levels



An SSML example document

• This is a simple SSML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Roberto Benigni.
</speak>

• This is an enhancement of the same example (UTF-8 is used here because the IPA symbols cannot be represented in ISO-8859-1):

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The title of the movie is:
  <phoneme alphabet="ipa" ph="ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə">La vita è bella</phoneme>
  (Life is beautiful), which is directed by
  <phoneme alphabet="ipa" ph="ɹəˈbɛːɹɾoʊ bɛˈniːnji">Roberto Benigni</phoneme>
</speak>



An SSML example with PLS

• This is a simple SSML document that references an external Pronunciation Lexicon:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <lexicon uri="http://www.example.com/movie_lexicon.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Roberto Benigni.
</speak>

• PLS factors all the pronunciation changes out into an external document

• The TTS engine loads the PLS document(s) and applies it (them) transparently to the SSML document

• An application may define contextual PLS documents to be used at different points of the interaction
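The referenced movie_lexicon.pls is not shown on the slide. A minimal sketch of what it might contain is given here, reusing the IPA transcriptions from the inline SSML example and the PLS syntax described in this presentation. The file content is an assumption for illustration, not the original lexicon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical content of movie_lexicon.pls (illustrative only) -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Roberto Benigni</grapheme>
    <phoneme>ɹəˈbɛːɹɾoʊ bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>
```

With a lexicon like this, the SSML text itself stays free of phonetic markup, and the same transcriptions can be shared by every document that references the file.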



The ASR perspective

• An ASR engine’s job is to transform an audio signal into a textual or semantic representation of the meaning of the sentence

• Using SRGS grammars constrains the sentences to be recognized and improves ASR performance

• PLS improves ASR performance by allowing multiple pronunciations of words and phrases, and by handling abbreviations and text normalization



An SRGS example grammar

• This is a very simple SRGS grammar:

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="city_state" mode="voice">
  <rule id="city" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Miami</item>
      <item>Fargo</item>
    </one-of>
  </rule>
  <rule id="state" scope="public">
    <one-of>
      <item>Florida</item>
      <item>North Dakota</item>
      <item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/>
    <ruleref uri="#state"/>
  </rule>
</grammar>

• The grammar recognizes sentences like:

– “Boston Massachusetts” or “Miami Florida”

but also:

– “Boston Florida” or “Fargo Massachusetts”



An SRGS example with PLS

• This is a simple SRGS grammar that references an external Pronunciation Lexicon:

<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" root="city_state" mode="voice">
  <lexicon uri="http://www.example.com/city_state.pls"/>
  <rule id="city" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Miami</item>
      <item>Fargo</item>
    </one-of>
  </rule>
  <rule id="state" scope="public">
    <one-of>
      <item>Florida</item>
      <item>North Dakota</item>
      <item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/>
    <ruleref uri="#state"/>
  </rule>
</grammar>

• The lexicon allows different pronunciations of the grammar's words, to accommodate many different speakers
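The referenced city_state.pls is not reproduced on the slide. The sketch below shows what such a lexicon might look like, written in the PLS syntax of this presentation. The choice of entries and the IPA transcriptions are illustrative assumptions, not taken from the original file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical content of city_state.pls (illustrative only) -->
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Boston</grapheme>
    <!-- Two alternative pronunciations; the first is marked as preferred -->
    <phoneme prefer="true">ˈbɔstən</phoneme>
    <phoneme>ˈbɑstən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Florida</grapheme>
    <phoneme prefer="true">ˈflɔrɪdə</phoneme>
    <phoneme>ˈflɑrɪdə</phoneme>
  </lexeme>
</lexicon>
```

For recognition, all listed pronunciations would be accepted for the matching grammar item, so the grammar itself does not need to change when a new variant is added.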



PLS allows you…

• to create Pronunciation Lexicons to be used by both ASR and TTS

• to take into account different usages:

– For TTS: to improve reading proper names

– For ASR: to give multiple pronunciations

– For TTS/ASR: to expand abbreviations and acronyms

• to exchange Pronunciation Lexicons between different applications (interoperability)

• to use contextual Pronunciation Lexicons at different points of the application

• The PLS is a W3C standard language!

PLS saves application developers time/money for creating good speech applications!



Phonetic Alphabets

• To describe the pronunciation of a word/phrase, you need a phonetic alphabet

• An alphabet contains symbols to represent speech sounds, just like in a dictionary, e.g.

Cracked /krakt/ adj. 1 having cracks. 2 (predic.) slang crazy

• The PLS spec suggests using either:

– a standard pronunciation alphabet, such as IPA (defined by the International Phonetic Association, see: http://www2.arts.gla.ac.uk/IPA/index.html)

– other alphabets:

• SAMPA, which is an ASCII encoding of IPA, and X-SAMPA

• Pinyin, JEITA, etc.



IPA – Chart

• The International Phonetic Association (IPA) was founded in 1886

• It is the major international association of phoneticians

• The IPA alphabet provides symbols that make the phonemic transcription of all known languages possible

• IPA characters can be encoded in Unicode by supplementing ASCII with characters from other ranges, particularly:

– IPA extensions (0250–02AF)

– Latin Extended-A (0100–017F)

• See the detailed charts: http://www.unicode.org/charts
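When fonts or editors make it hard to type IPA symbols directly, XML numeric character references can carry the same Unicode code points in plain ASCII. The sketch below writes the word “thing”, IPA /θɪŋ/, in an SSML <phoneme> element using only character references. The sample sentence is an assumption for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- &#x3B8; = θ (U+03B8, Greek and Coptic)
       &#x26A; = ɪ (U+026A, IPA Extensions)
       &#x14B; = ŋ (U+014B, Latin Extended-A) -->
  One more <phoneme alphabet="ipa" ph="&#x3B8;&#x26A;&#x14B;">thing</phoneme>.
</speak>
```

An XML parser resolves the references before the TTS engine sees the attribute value, so this form is equivalent to typing ph="θɪŋ" directly.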



SAMPA – SAM Phonetic Alphabet

• Developed for phonetic transcription in an EU-funded project called Speech Assessment Methods (SAM)

• It is ASCII based (easy to write). It is an “ASCII-ization” of IPA

• Recently, Prof. John C. Wells proposed an alphabet called “X-SAMPA”, which encodes all the IPA symbols in ASCII format

• A few examples:

– “thin” IPA: /θɪn/ X-SAMPA: /TIn/

– “thing” IPA: /θɪŋ/ X-SAMPA: /TIN/

– “flabbergasted” IPA: /ˈflæbəgɑːstɪd/ X-SAMPA: /"fl{b@gA:stId/

– “Weltanschauung” IPA: /ˈvɛltʔanˌʃaʊʊŋ/ X-SAMPA: /"vElt?an%SaUUN/

– en-GB: “vice versa” IPA: /vaɪsə ˈvɜːsə/ X-SAMPA: /vaIs@ "v3:s@/

– it-IT: “vice versa” IPA: /ˈviʧe ˈvɛrsa/ X-SAMPA: /"vitSe "vErsa/



Phonetic Alphabets – Issues

• How can pronunciations be written in a reliable and easy way?

• Problems with fonts, word processors, browsers

• There are very few tools to help with writing pronunciations and to let you listen to what you have written

• The standardization process may encourage the creation of tools and better support in word processors.

• Is IPA used for Asian languages?

• Are there standard phonetic alphabets for Asian languages, such as Pinyin, Jyutping or JEITA?

• Should they be referenced in a standard way, like “ipa”?



The PLS language

• PLS is an XML language

<?xml version="1.0" encoding="UTF-8"?>

• The container element is <lexicon>, attributes:

– version (required): "1.0"

– xmlns (required): "http://www.w3.org/2005/pronunciation-lexicon"

– alphabet (optional): "ipa" (default value)

– xml:lang (optional): "en-US" or "zh-CN" or "ja"

Example:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="zh-CN">
  <!-- The lexicon for Chinese Mandarin! -->
</lexicon>

• The current PLS is monolingual!



The PLS language - metadata

• Metadata (annotation of the document for other uses, …) can be of two varieties:

– <meta> element (for compatibility with other markup, like SRGS and SSML)

– <metadata> element (which contains the annotations in either RDF or other formats)

Example of metadata:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about=""
          dc:title="Pronunciation lexicon for W3C terms"
          dc:description="This lexicon contains common pronunciations for many W3C acronyms and abbreviations, such as I18N, WSDL or WAI"
          dc:publisher="W3C"
          dc:language="en-US"
          dc:date="2005-11-29"
          dc:rights="Copyright 2002 W3C"
          dc:format="application/pls+xml">
        <dc:creator>The W3C Voice Browser Working Group</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!-- Add lexicon entries here!! -->
</lexicon>



The PLS language – <lexeme>

• The <lexeme> element is the container of a lexicon entry. It is composed of:

– One or more <grapheme> elements that indicate the words/phrases to be matched in the input

– One or more <phoneme> or <alias> elements that indicate the possible pronunciations or expansions, respectively

• First considerations:

– Several <grapheme> elements may be present: all of them share the same pronunciations

– Several <phoneme> elements may be present: they are alternative pronunciations

– A mixture of <alias> and <phoneme> elements may be present: a preference mechanism selects the single one used for TTS



The PLS language – <grapheme>

• The <grapheme> element contains character data that represents orthographies:

– Regional spelling variations, e.g. "colour" and "color"

– Free spelling variations, e.g. "judgment" and "judgement"

– Traditional vs. modern spellings, e.g. in German it is common to replace "ö" with "oe"

– Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji) and phonemic spelling systems (Katakana or Hiragana) to represent the orthography of a word or phrase

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Kana">にほんご</grapheme>
    <!-- Here you can insert the pronunciation of "nihongo".
         In IPA it could be: "nɪhɒŋɒ" -->
  </lexeme>
</lexicon>

• Is an explicit “orthography” attribute useful?

• Is it redundant?



The PLS language – <phoneme>

• The <phoneme> elements are contained inside <lexeme>

• <phoneme> contains character data specifying the pronunciation in a given pronunciation alphabet:

– An “alphabet” attribute may be specified to override the alphabet of the whole lexicon

– A “prefer” attribute may be present to indicate precedence among pronunciations

Example of lexeme for Sepulveda:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
    <!-- The IPA transcription reads: "səˈpʌlvɪdə" -->
  </lexeme>
</lexicon>
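The “alphabet” override can be sketched as follows, using the “thin”/“thing” transcriptions from the SAMPA slide. The alphabet name "x-sampa" is an assumption chosen for illustration: the presentation only fixes "ipa" as a standard value, and an engine would have to support whatever other name is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>thin</grapheme>
    <!-- No alphabet attribute: the lexicon-wide default "ipa" applies -->
    <phoneme>θɪn</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>thing</grapheme>
    <!-- Overrides the default for this pronunciation only;
         "x-sampa" is a hypothetical alphabet name -->
    <phoneme alphabet="x-sampa">TIN</phoneme>
  </lexeme>
</lexicon>
```

The override applies per <phoneme> element, so a lexicon can mix ASCII-friendly transcriptions with IPA ones without splitting into separate files.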



The PLS language – <phoneme>

• Other examples

Example for more than one pronunciation of the word “huge”:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="en-US" alphabet="ipa">
  <lexeme>
    <grapheme>huge</grapheme>
    <phoneme prefer="true">hjuːʤ</phoneme>
    <phoneme>juːʤ</phoneme>
  </lexeme>
</lexicon>

Example for the Japanese word “nihongo” with different spellings:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         xml:lang="ja" alphabet="ipa">
  <lexeme>
    <grapheme orthography="Latn">nihongo</grapheme>
    <grapheme orthography="Hani">日本語</grapheme>
    <grapheme orthography="Kana">にほんご</grapheme>
    <phoneme>nɪhɒŋɒ</phoneme>
  </lexeme>
</lexicon>



The PLS language – <alias>

• The <alias> elements are contained inside <lexeme>

• <alias> is used to indicate the pronunciation of an acronym or an abbreviated term in the form of other orthographies.

• <alias> may contain:

– A “prefer” attribute to indicate precedence among pronunciations

• Both <phoneme> and <alias> may occur in a <lexeme>

Example of a lexeme with <alias>:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
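Since <phoneme> and <alias> may be mixed in one <lexeme>, a combined entry could look like the sketch below. The IPA transcription of the spelled-out form and the choice of which alternative carries prefer="true" are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- Preferred for TTS: expand the acronym to its full name -->
    <alias prefer="true">World Wide Web Consortium</alias>
    <!-- Alternative: the acronym spelled out ("double-u three c") -->
    <phoneme>ˈdʌbəljuː θriː siː</phoneme>
  </lexeme>
</lexicon>
```

A TTS engine would pick the preferred alternative, while an ASR engine could accept either spoken form for the same written token.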



Use Cases/Future Issues

The current version of PLS can deal with:

• Multiple Pronunciations for ASR

• Homographs

• Abbreviations

But it cannot deal with:

• Homophones

• Part of speech annotations (and other contextual information)

• Grouping lexemes and external references

These tasks are too challenging to be solved in PLS version 1.0



Issues for the workshop

• Monolingual lexicon?

• Orthography attribute: Useful or redundant?

• Mandate new phonetic alphabets?



Quick demo of SSML+PLS

• Mobile device (with embedded TTS)

• Over GPRS, the device connects to a server, which:

– Downloads news from a news site (RSS)

– Transforms it into SSML

– Returns it to the mobile device

• The device then:

– Shows the news on the screen

– Reads the SSML document (which includes a lexicon) using the TTS engine



Use Cases – Multiple pronunciations

• More than one pronunciation for a word (very common for ASR)

Example of two pronunciations for the word “Newton”:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>



Use Cases – Multiple Orthographies

• More than one orthography for a word (common for ASR and TTS)

Example of two orthographies for colour/color:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>color</grapheme>
    <grapheme>colour</grapheme>
    <phoneme>ˈkʌlə</phoneme>
  </lexeme>
</lexicon>



Final Remarks

• Using PLS:

– Simplifies the development of a speech application

– Improves the performance of speech recognition (in a standard way)

– Enhances TTS output

• A standard language for PLS enables the exchange of pronunciations between applications
