Korea - POS


  • 8/2/2019 Korea - POS

    1/10

    Position Paper for W3C Workshop on Internationalizing SSML

    The Usage of Part-Of-Speech for

    Resolving Multiple Pronunciations in SSML

    2005. 11. 3.

    Myoung-Wan Koo and Du-Seong Chang

    KT/KAIT


    The Value Networking Company

    Introduction

    Multiple pronunciation problem

    Same word but different pronunciations

    Newton: /nju:tən/ vs. /nu:tən/

    Same spelling but different pronunciations (homograph)

    refuse: /rɪ'fju:z/ vs. /'refju:s/

    Example entries:

    Newton: nju:tən, nu:tən

    refuse: rɪ'fju:z, 'refju:s


    Multiple pronunciations in SSML and PLS

    SSML

    The Speech Synthesis Markup Language Specification Version 1.0

    Pronunciation information in SSML

    Phoneme element

    Lexicon element

    PLS

    Pronunciation Lexicon Specification Version 1.0

    Pronunciation information in PLS

    Phoneme element

    Prefer attribute

    They do not fully support pronunciation lexicons for multiple pronunciations or for agglutinative languages.

    Part-Of-Speech information is needed


    Pronunciation information in PLS (1/2)

    Pronunciation Lexicon Specification

    Version 1.0 / Feb 2005 / W3C Voice Browser Working Group

    It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.

    It is expected to handle multiple pronunciations.

    Example of PLS

    tomato: təmei̥ɾou̥
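As a rough illustration of what such a lexicon entry holds, the following Python sketch parses a simplified PLS-style document and maps each grapheme to its list of pronunciations. The XML is an assumption reconstructed from the element names on these slides (`lexeme`, `grapheme`, `phoneme`); namespaces are omitted, the second transcription is illustrative only, and `load_lexicon` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified PLS-style lexicon (assumed shape; no namespace declarations).
# The second <phoneme> is an illustrative alternative, not taken from the slide.
PLS_SNIPPET = """
<lexicon version="1.0" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təmei̥ɾou̥</phoneme>
    <phoneme>təmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
"""

def load_lexicon(xml_text):
    """Map each grapheme to the list of its phoneme strings, in document order."""
    root = ET.fromstring(xml_text)
    lexicon = {}
    for lexeme in root.iter("lexeme"):
        phonemes = [p.text for p in lexeme.findall("phoneme")]
        for grapheme in lexeme.findall("grapheme"):
            lexicon.setdefault(grapheme.text, []).extend(phonemes)
    return lexicon

lexicon = load_lexicon(PLS_SNIPPET)
print(lexicon["tomato"])  # both candidate pronunciations, in document order
```

Nothing in this structure says which candidate to use in a given context; that is the gap the rest of the deck addresses.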


    Pronunciation information in PLS (2/2)

    Prefer attribute of phoneme element

    Gives one pronunciation higher priority among the pronunciation candidates.

    Effective in speech synthesis

    Only for multiple pronunciations of the same orthography

    Not for the homograph problem

    refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/

    Provides no information for ASR systems.

    Newton: nju:tən, nu:tən
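The limitation can be made concrete with a small Python sketch. It assumes a simplified, namespace-free lexicon in which one `phoneme` per entry carries `prefer="true"`; which candidate is marked preferred here is an arbitrary assumption, and `preferred` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified lexicon; the choice of preferred candidates is arbitrary.
LEXICON = """
<lexicon>
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme prefer="true">rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
"""

def preferred(root, word):
    """Return the preferred pronunciation of word, or the first one listed."""
    for lexeme in root.iter("lexeme"):
        if lexeme.findtext("grapheme") != word:
            continue
        phonemes = lexeme.findall("phoneme")
        for p in phonemes:
            if p.get("prefer") == "true":
                return p.text
        return phonemes[0].text if phonemes else None
    return None

root = ET.fromstring(LEXICON)
newton = preferred(root, "Newton")
# The lookup has no notion of context, so "refuse" as a verb and "refuse"
# as a noun necessarily resolve to the same pronunciation.
refuse_in_verb_context = preferred(root, "refuse")
refuse_in_noun_context = preferred(root, "refuse")
print(newton, refuse_in_verb_context, refuse_in_noun_context)
```

For "Newton" a fixed preference is adequate, since either pronunciation is acceptable. For "refuse" it is wrong in one of the two contexts, whichever candidate is preferred.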


    Typical Korean TTS system structure

    [Figure: pipeline of a typical Korean TTS system]

    Stages: Text → Morphological Analyzer → Grapheme-to-Phoneme → Prosody Analysis → Waveform Production → Speech

    Data passed between stages: Morphemes, POS → Phonemes, POS → Phonemes, Prosody; Structural Information

    Morpheme Dictionary (fields: morpheme, POS1, POS2)

    Pronunciation Dictionary (fields: morpheme, POS, pronunciation)
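The first two stages of this pipeline can be sketched in Python as follows. This is a toy under stated assumptions, not a description of any actual Korean TTS system: the dictionaries use the English "refuse" example from the earlier slides instead of Korean morphemes, whitespace splitting stands in for morphological analysis, and the determiner rule in `analyze` is a hypothetical placeholder for a real POS disambiguator.

```python
# Morpheme dictionary: morpheme -> candidate POS tags (POS1, POS2, ...).
MORPHEME_DICT = {
    "the": ["determiner"],
    "they": ["pronoun"],
    "refuse": ["verb", "noun"],
}

# Pronunciation dictionary: (morpheme, POS) -> pronunciation.
PRON_DICT = {
    ("refuse", "verb"): "rɪ'fju:z",
    ("refuse", "noun"): "'refju:s",
}

DETERMINERS = {"the", "a", "an"}

def analyze(text):
    """Toy morphological analyzer: returns (morpheme, POS) pairs."""
    tagged = []
    prev = None
    for token in text.lower().split():
        candidates = MORPHEME_DICT.get(token, ["unknown"])
        # Placeholder rule: an ambiguous word right after a determiner is a noun.
        if len(candidates) > 1 and prev in DETERMINERS and "noun" in candidates:
            pos = "noun"
        else:
            pos = candidates[0]
        tagged.append((token, pos))
        prev = token
    return tagged

def grapheme_to_phoneme(tagged):
    """Look up each (morpheme, POS) pair; fall back to the spelling itself."""
    return [(PRON_DICT.get((m, pos), m), pos) for m, pos in tagged]

print(grapheme_to_phoneme(analyze("they refuse")))
print(grapheme_to_phoneme(analyze("the refuse")))
```

The point of the sketch is the key of `PRON_DICT`: because the POS tag travels with the morpheme into the grapheme-to-phoneme stage, the homograph is resolved by a plain dictionary lookup.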


    POS for resolving multiple pronunciations

    POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.

    The word "refuse" can have two different pronunciations depending on its POS.

    Proposal: POS attribute

    refuse (verb): rɪ'fju:z

    refuse (noun): 'refju:s
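One way the proposed attribute could look and be consumed is sketched below in Python. The attribute name `pos`, its values, and the two-entry layout are assumptions made for illustration (the slide's markup did not survive extraction), and `pronounce` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a "pos" attribute on <phoneme>, one lexeme per reading.
LEXICON = """
<lexicon>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>
"""

def pronounce(root, word, pos):
    """Return the pronunciation of word whose phoneme is tagged with pos."""
    for lexeme in root.iter("lexeme"):
        if lexeme.findtext("grapheme") != word:
            continue
        for phoneme in lexeme.findall("phoneme"):
            if phoneme.get("pos") == pos:
                return phoneme.text
    return None

root = ET.fromstring(LEXICON)
print(pronounce(root, "refuse", "verb"))
print(pronounce(root, "refuse", "noun"))
```

Because the attribute is optional, entries without it would still be valid; a consumer could fall back to the first listed pronunciation when no tagged match exists.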


    POS information for LVCSR

    Large vocabulary continuous speech recognition (LVCSR) of agglutinative languages

    The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.

    The recognition dictionary contains many homographs.

    POS information helps the system select the proper pronunciation from the dictionary as well as resolve multiple pronunciations of some words.

    It reduces the search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
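The pruning idea can be sketched as a POS connectivity check applied while hypotheses are being extended. Everything here is a toy assumption: the tag names, the `ALLOWED` table, and the placeholder morphemes "A" and "B" are invented for illustration and do not reflect a real Korean tag set or decoder.

```python
# Hypothetical table of POS pairs that may appear next to each other.
ALLOWED = {
    ("noun", "particle"),
    ("particle", "noun"),
    ("particle", "verb"),
    ("verb", "ending"),
}

def can_connect(prev_pos, next_pos):
    """True if a morpheme tagged next_pos may directly follow prev_pos."""
    return (prev_pos, next_pos) in ALLOWED

def expand(partial_paths, candidates):
    """Extend each path with every candidate (morpheme, POS) that may follow it."""
    extended = []
    for path in partial_paths:
        for cand in candidates:
            if not path or can_connect(path[-1][1], cand[1]):
                extended.append(path + [cand])
    return extended

# "A" is a homograph with a noun reading and a verb reading.
paths = expand([[]], [("A", "noun"), ("A", "verb")])
# A particle can follow the noun reading only, so the verb path is dropped
# immediately instead of surviving until a later interpretation stage.
paths = expand(paths, [("B", "particle")])
print(paths)
```

In this toy run the hypothesis count goes from two to one after a single step; in a large search space the same early cut applies at every extension.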


    Proposals

    Proposal 1: POS attribute of phoneme element

    Optional attribute

    Proposal 2: POS element

    The lexeme element contains optional POS elements.

    POS values: language-specific

    Type: allow vendor-specific POS types?

    Well-known POS tag sets: Penn Treebank, Sejong project (Korean)

    refuse: rɪ'fju:z (POS: verb)
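Proposal 2 could be realized as a child element of `lexeme`, as in the Python sketch below. The element name `pos` and the overall layout are assumptions (the slide's original markup was lost), the noun entry is added only to make the example complete, and `index_by_pos` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an optional <pos> child inside each <lexeme>.
LEXICON = """
<lexicon>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>
"""

def index_by_pos(root):
    """Build a (grapheme, POS) -> pronunciation table; POS is None if absent."""
    table = {}
    for lexeme in root.iter("lexeme"):
        word = lexeme.findtext("grapheme")
        pron = lexeme.findtext("phoneme")
        tags = [e.text for e in lexeme.findall("pos")] or [None]
        for tag in tags:
            table[(word, tag)] = pron
    return table

table = index_by_pos(ET.fromstring(LEXICON))
print(table[("refuse", "verb")], table[("refuse", "noun")])
```

Compared with an attribute on `phoneme`, an element allows several POS values per lexeme and leaves room for a type attribute naming the tag set, which is the open question the slide raises about vendor-specific types.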


    Conclusion

    Current SSML and PLS have no element or attribute for resolving multiple pronunciations.

    POS information

    Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.

    Can reduce the search time in a large vocabulary recognition system.

    Can be effective for agglutinative languages.

    Proposals

    POS element

    POS attribute