MAI Internship April-May 2002

Page 1: MAI Internship April-May 2002

MAI Internship April-May 2002

Page 2: MAI Internship April-May 2002

MAI Internship 2002 Slide 2 of 14

What?

• The AST Project promotes development of speech technology for official languages of South Africa

• SAEnglish, Afrikaans, Zulu, Xhosa, Sesotho

• Create reusable databases & software

• Prototype hotel booking dialogue system

• 2000-2003

Page 3: MAI Internship April-May 2002

AST dialogue system: basics

[Diagram: Telephone Network, Speech Recognition, Natural Language Understanding, Dialogue Manager, Speech Synthesis, Database]

Page 4: MAI Internship April-May 2002

AST Speech Database

• Use? Input ASR: acoustic training; output ASR: dictionary

• Start from scratch, even for SAE

• Telephone data based on SpeechDat
  – Datasheet utterances
  – Hierarchical recruiting method

• Labeling Tool: PRAAT

Page 5: MAI Internship April-May 2002

Language Spoken                 Code   No. of Speakers

1 English (E)                          1500-2000
  Speech varieties:
    Mother-tongue English       EE     300-400
    Black English               BE     300-400
    Coloured English            CE     300-400
    Asian English               ASE    300-400
    Afrikaans English           AE     300-400
2 isiXhosa (X)                  XX     300-400
3 Sesotho (S)                   SS     300-400
4 isiZulu (Z)                   ZZ     300-400
5 Afrikaans (A)                        900-1200
  Speech varieties:
    Mother-tongue Afrikaans     AA     300-400
    Black Afrikaans             BA     300-400
    Coloured Afrikaans          CA     300-400
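The group totals in the table are consistent with a target of 300-400 speakers per speech variety. A minimal sketch of that arithmetic follows; the variable and function names are illustrative, and the overall total is derived here rather than stated on the slide.

```python
# Per-variety recruiting target (min, max), as listed in the table.
PER_VARIETY = (300, 400)

# Number of speech varieties per language, as listed in the table.
VARIETIES = {
    "English": 5,
    "isiXhosa": 1,
    "Sesotho": 1,
    "isiZulu": 1,
    "Afrikaans": 3,
}


def target_range(n_varieties, per_variety=PER_VARIETY):
    """Scale the per-variety target by the number of varieties."""
    lo, hi = per_variety
    return (n_varieties * lo, n_varieties * hi)


totals = {lang: target_range(n) for lang, n in VARIETIES.items()}

# Derived figure (not on the slide): sum over all five languages.
overall = (
    sum(lo for lo, _ in totals.values()),
    sum(hi for _, hi in totals.values()),
)

for lang, (lo, hi) in totals.items():
    print(f"{lang}: {lo}-{hi}")
print(f"Overall: {overall[0]}-{overall[1]}")
```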

Page 6: MAI Internship April-May 2002

AST Speech Database

[Diagram: Acoustic signal, Orthographic annotation, Phonemic transcription, Phonetic alignment; methods: Manual labour; Rules & dictionary: Patana; Forced alignment: HTK]

Page 7: MAI Internship April-May 2002

AST Speech Recognition

• Difficult:
  – Speaker independent, noisy conditions
  – Medium-size vocabulary (10,000 words)
  – Training data sparse

• Not so difficult:
  – Dialogue Manager helps

• Phoneme-based HMMs; diphones in future

• Finite-state language model

• Pitch & clicks of African languages ignored

Page 8: MAI Internship April-May 2002

AST Natural Language Understanding

• Same finite-state network as the recogniser's language model
  +: all utterances 'understood'
  -: FSGs are limited

• Makes no sense to recognise more than we can understand

• Semantic labels are activated

• Alternative: robust parsing (Phoenix, ATIS)
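The idea of one finite-state network serving both as language model and as understanding component can be sketched as follows. This is a toy illustration, not the AST grammar: the states, words and the `ROOM_TYPE` label are hypothetical. An utterance is either accepted by the network, in which case the semantic labels on the traversed arcs are collected, or rejected outright.

```python
# Hypothetical toy grammar: state -> {word: (next_state, semantic_label_or_None)}
GRAMMAR = {
    "start": {"i": ("verb", None)},
    "verb": {"want": ("det", None), "need": ("det", None)},
    "det": {"a": ("type", None)},
    "type": {
        "single": ("room", "ROOM_TYPE=single"),
        "double": ("room", "ROOM_TYPE=double"),
    },
    "room": {"room": ("end", None)},
}
FINAL_STATES = {"end"}


def understand(utterance):
    """Follow the utterance through the network, collecting semantic labels.

    Returns the list of activated labels, or None if the network
    does not accept the utterance.
    """
    state = "start"
    labels = []
    for word in utterance.lower().split():
        arcs = GRAMMAR.get(state, {})
        if word not in arcs:
            return None  # word not allowed here: utterance is outside the grammar
        state, label = arcs[word]
        if label is not None:
            labels.append(label)
    return labels if state in FINAL_STATES else None


print(understand("I want a double room"))
print(understand("I want a suite"))
```

Because recognition is constrained to the same network, every recognised utterance has a path and therefore a meaning; the price is that anything outside the grammar is rejected.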

Page 9: MAI Internship April-May 2002

AST Natural Language Understanding

[Diagram: blocks Speech Recognition, NLU, Dialogue Manager and FSG; labels: Recognised utterance, Grammar ID (twice), Meaning]

Page 10: MAI Internship April-May 2002

AST Natural Language Understanding

Embedded semantic tags: 'drie honderd duisend agt en neëntig' (three hundred thousand and ninety-eight) → 3 0 0 0 9 8

[Diagram: finite-state network with labels biljoen, miljard, miljoen, duisend, NIL; $honderd, $honderd_en; @hundreds]

V6=3 V5=0 V4=0 V3=0 V2=9 V1=8

t1=3 t2=0 t3=0
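The mapping from the Afrikaans number phrase to the six digit slots V6 to V1 can be illustrated with a small sketch. It is a plain Python re-implementation of the outcome, not the tagged finite-state grammar from the slide: the function names and word lists are assumptions, and only units, tens, "honderd" and "duisend" are handled (no teens, no miljoen and above).

```python
# Afrikaans number words covered by this sketch (assumed, deliberately incomplete).
UNITS = {"een": 1, "twee": 2, "drie": 3, "vier": 4, "vyf": 5,
         "ses": 6, "sewe": 7, "agt": 8, "nege": 9}
TENS = {"twintig": 2, "dertig": 3, "veertig": 4, "vyftig": 5,
        "sestig": 6, "sewentig": 7, "tagtig": 8, "neëntig": 9}


def group_digits(tokens):
    """Turn the words of one three-digit group into [hundreds, tens, units]."""
    h = t = u = 0
    pending = None  # a unit word not yet assigned to a slot
    for tok in tokens:
        if tok in UNITS:
            pending = UNITS[tok]
        elif tok == "honderd":
            h = pending if pending is not None else 1  # bare "honderd" = 100
            pending = None
        elif tok in TENS:
            t = TENS[tok]
        elif tok == "en":
            continue  # "en" (and) carries no digit
        else:
            raise ValueError(f"unsupported word: {tok}")
    if pending is not None:
        u = pending
    return [h, t, u]


def tag_digits(phrase):
    """Return the digit slots [V6, V5, V4, V3, V2, V1] for a phrase below one million."""
    tokens = phrase.lower().split()
    if "duisend" in tokens:
        i = tokens.index("duisend")
        # Words before "duisend" fill the thousands group; bare "duisend" = 1000.
        thousands = group_digits(tokens[:i]) if i > 0 else [0, 0, 1]
        rest = tokens[i + 1:]
    else:
        thousands, rest = [0, 0, 0], tokens
    return thousands + group_digits(rest)


print(tag_digits("drie honderd duisend agt en neëntig"))
```

Note that Afrikaans places the unit before the ten ("agt en neëntig" is eight and ninety), which is why the unit word is held back until the end of its group.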

Page 11: MAI Internship April-May 2002

AST Dialogue Manager

• Trade-off: naturalness vs. response restriction

• System-directed: predictable user utterances, simple dialogues

• Mixed-initiative: shorter dialogues, more recognition errors

• User-initiative: unpopular

Page 12: MAI Internship April-May 2002

AST Dialogue Manager

Design:

• Early focus on users and task

• Wizard-of-Oz: pay no attention to the man behind the curtain

• System-in-the-loop

• Finite-state structure because of simplicity and functionality

• Possible frame-based approach in future
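A finite-state, system-directed dialogue manager of the kind described can be sketched as a table of states, each asking for one piece of information and naming its successor. The states, prompts and slot names below are hypothetical and only illustrate the structure; they are not taken from the AST prototype.

```python
# Hypothetical states for a system-directed hotel booking dialogue.
DIALOGUE = {
    "ask_date": {"prompt": "On which date will you arrive?",
                 "slot": "date", "next": "ask_nights"},
    "ask_nights": {"prompt": "How many nights will you stay?",
                   "slot": "nights", "next": "ask_room"},
    "ask_room": {"prompt": "Single or double room?",
                 "slot": "room_type", "next": "confirm"},
}


def run_dialogue(answers, start="ask_date", end="confirm"):
    """Walk the state machine, filling one slot per state.

    `answers` stands in for the understood user input; None means the
    utterance was not understood, so the same state is asked again.
    """
    state = start
    booking = {}
    answers = iter(answers)
    while state != end:
        node = DIALOGUE[state]
        print(node["prompt"])
        answer = next(answers)
        if answer is None:
            continue  # stay in the same state and re-prompt
        booking[node["slot"]] = answer
        state = node["next"]
    return booking


print(run_dialogue(["1 May", None, "2", "double"]))
```

Each state expects exactly one kind of answer, which is what makes user utterances predictable. A frame-based manager would instead let the user fill several slots in any order.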

Page 13: MAI Internship April-May 2002

AST Speech Synthesis

• Fixed machine utterances: pre-recorded speech

• Database queries: limited-domain synthesis (Festival platform)

Page 14: MAI Internship April-May 2002

Conclusion

Finite-state approach in
– Recogniser
– NLU component
– Dialogue manager

Workable prototype

New funding 2003