8/2/2019 Korea - POS
Position Paper for W3C Workshop on Internationalizing SSML
The Usage of Part-Of-Speech for
Resolving Multiple Pronunciations in SSML
2005. 11. 3.
Myoung-Wan Koo and Du-Seong Chang
KT/KAIT
The Value Networking Company
Introduction
Multiple pronunciation problem
  Same word, different pronunciations
    Newton: /nju:tən/ vs. /nu:tən/
  Same spelling, different pronunciations (homograph)
    refuse: /rɪ'fju:z/ vs. /'refju:s/
Multiple pronunciations in SSML & PLS
SSML: Speech Synthesis Markup Language Specification Version 1.0
  Pronunciation information in SSML
    Phoneme element
    Lexicon element
PLS: Pronunciation Lexicon Specification Version 1.0
  Pronunciation information in PLS
    Phoneme element
    Prefer attribute
Neither fully supports a pronunciation lexicon for multiple pronunciations or for agglutinative languages.
  Part-of-speech (POS) information is needed.
Pronunciation information in PLS (1/2)
Pronunciation Lexicon Specification
  Version 1.0, Feb 2005, W3C Voice Browser Working Group
  It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
  It is expected to handle multiple pronunciations.
Example of PLS
  grapheme: tomato
  phoneme: təmeiɾou
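To make the structure of such an entry concrete, the sketch below parses a small lexicon with Python's standard xml.etree.ElementTree and collects the pronunciations listed for each grapheme. The lexicon text is an illustrative assumption modeled on the tomato example, the namespace is the one used by the PLS 1.0 Recommendation, and the helper name load_lexicon is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace from the PLS 1.0 Recommendation; the lexicon text is illustrative.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

PLS_TEXT = f"""<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təmeiɾou</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(text):
    """Return {grapheme: [phoneme, ...]} for every lexeme in a PLS document."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(text)
    entries = {}
    for lexeme in root.findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        # A lexeme may list several graphemes; each one gets all phonemes.
        for grapheme in lexeme.findall("pls:grapheme", ns):
            entries.setdefault(grapheme.text, []).extend(phonemes)
    return entries


lexicon = load_lexicon(PLS_TEXT)
```

A lexeme with several phoneme children simply yields a longer list, which is where the question of choosing among candidates arises.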
Pronunciation information in PLS (2/2)
Prefer attribute of the phoneme element
  Gives one pronunciation higher priority among the pronunciation candidates.
  Effective in speech synthesis
    Only for multiple pronunciations of the same orthography
    Not for the homograph problem
      refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
  Provides no information for ASR systems.
Example
  Newton: /nju:tən/, /nu:tən/
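A minimal sketch of how a synthesizer might honor the prefer attribute, assuming it is written as prefer="true" on a phoneme element. The markup omits namespaces for brevity, the choice of which Newton pronunciation is preferred is arbitrary, and preferred_pronunciation is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative lexeme; which candidate carries prefer="true" is an assumption.
LEXEME = """
<lexeme>
  <grapheme>Newton</grapheme>
  <phoneme>nju:tən</phoneme>
  <phoneme prefer="true">nu:tən</phoneme>
</lexeme>
"""


def preferred_pronunciation(lexeme):
    """Return the phoneme marked prefer="true", or the first one if none is marked."""
    phonemes = lexeme.findall("phoneme")
    for phoneme in phonemes:
        if phoneme.get("prefer") == "true":
            return phoneme.text
    return phonemes[0].text


chosen = preferred_pronunciation(ET.fromstring(LEXEME))
```

The function takes no POS or context argument, so for a homograph such as refuse it would return the same candidate in every sentence. That is the limitation noted above.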
Typical Korean TTS system structure
[Figure: processing pipeline. Text -> Morphological Analyzer -> Grapheme-to-Phoneme -> Prosody Analysis -> Waveform production -> Speech]
  Data labels between stages: "Morphemes, POS"; "Phonemes, POS"; "Phonemes, Prosody"; "Structural Information"
  Morpheme Dictionary (fields: morpheme, POS1, POS2)
  Pronunciation Dictionary (fields: morpheme, POS, Pronunciation)
POS for resolving multiple pronunciations
POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  The word "refuse" has two different pronunciations depending on its POS.
Proposal: POS attribute
  refuse (POS: verb): /rɪ'fju:z/
  refuse (POS: noun): /'refju:s/
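One way the proposed attribute could be consumed is sketched below. The attribute name pos, its values, and the un-namespaced markup are assumptions made for illustration (the attribute is a proposal, not part of PLS 1.0), and pronounce is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "pos" attribute is the proposal, not part of PLS 1.0.
LEXICON = """
<lexicon>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def pronounce(lexicon, word, pos):
    """Return the first pronunciation of word whose pos attribute equals pos, else None."""
    for lexeme in lexicon.findall("lexeme"):
        if lexeme.findtext("grapheme") != word:
            continue
        for phoneme in lexeme.findall("phoneme"):
            if phoneme.get("pos") == pos:
                return phoneme.text
    return None


root = ET.fromstring(LEXICON)
verb_form = pronounce(root, "refuse", "verb")
noun_form = pronounce(root, "refuse", "noun")
```

The POS tag itself would come from an upstream analyzer; the lexicon only needs to carry the label so that the lookup becomes a direct match.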
POS information for LVCSR
Large vocabulary continuous speech recognition (LVCSR) of agglutinative languages
  The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.
  The recognition dictionary contains many homographs.
POS information helps the system select the proper pronunciation from the dictionary and resolve words that have multiple pronunciations.
It reduces search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
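The pruning idea can be sketched as a connectivity check applied while expanding the search. Everything here is a hypothetical illustration: the tag names, the table of allowed POS pairs, and the placeholder morphemes are not taken from any real recognizer or tag set.

```python
# Hypothetical POS connectivity table: (previous POS, next POS) pairs that may connect.
ALLOWED = {
    ("noun", "particle"),
    ("verb_stem", "ending"),
}

# Hypothetical recognition dictionary entries as (morpheme, POS) pairs.
CANDIDATES = [("m1", "particle"), ("m2", "ending"), ("m3", "noun")]


def expand(prev_pos, candidates, allowed=ALLOWED):
    """Keep only the candidates whose POS may follow prev_pos."""
    return [(m, pos) for m, pos in candidates if (prev_pos, pos) in allowed]


after_noun = expand("noun", CANDIDATES)
after_stem = expand("verb_stem", CANDIDATES)
```

Candidates rejected here never enter the later stages, which is the source of the search-time saving described above.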
Proposals
Proposal 1: POS attribute of the phoneme element
  Optional attribute
Proposal 2: POS element
  The lexeme element contains optional POS elements.
POS values: language-specific
Type: allow vendor-specific POS types?
Notable POS sets: Penn Treebank, Sejong project (Korean)
Example
  grapheme: refuse
  phoneme: rɪ'fju:z
  POS: verb
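For the element form of the proposal, a consumer might build a lookup keyed by grapheme and POS. In this sketch the child element name pos and the un-namespaced markup are assumptions, the noun entry is added from the earlier refuse example, and build_index is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the <pos> child element is the proposal, not part of PLS 1.0.
LEXICON = """
<lexicon>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>
"""


def build_index(lexicon):
    """Map (grapheme, pos) to its pronunciations; pos is None for lexemes without <pos>."""
    index = {}
    for lexeme in lexicon.findall("lexeme"):
        word = lexeme.findtext("grapheme")
        phonemes = [p.text for p in lexeme.findall("phoneme")]
        # The POS element is optional, so untagged lexemes are filed under None.
        tags = [t.text for t in lexeme.findall("pos")] or [None]
        for tag in tags:
            index.setdefault((word, tag), []).extend(phonemes)
    return index


index = build_index(ET.fromstring(LEXICON))
```

Because the element sits on the lexeme rather than on a single phoneme, one tag covers every pronunciation listed in that lexeme, whereas the attribute form tags each pronunciation individually.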
Conclusion
Current SSML and PLS have no element or attribute for resolving multiple pronunciations.
POS information
  Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  Can reduce the search time in a large vocabulary recognition system.
  Can be effective for agglutinative languages.
Proposals
  POS element
  POS attribute