Page 1: Training Statistical Language Models from Grammar-Generated Data: A Comparative Case-Study

Manny Rayner, Geneva University
(joint work with Beth Ann Hockey and Gwen Christian)

Page 2: Structure of talk

Background: Regulus and MedSLT
Grammar-based language models and statistical language models

Page 3: What is MedSLT?

Open Source medical speech translation system for doctor-patient dialogues
Medium vocabulary (400-1500 words)
Grammar-based: uses the Regulus platform
Multilingual: translates through an interlingua

Page 4: MedSLT

Open Source medical speech translator for doctor-patient examinations
Main system is unidirectional (the patient answers non-verbally, e.g. nods or points)
– Also an experimental bidirectional system
Two main purposes
– Potentially useful (could save lives!)
– Vehicle for experimenting with the underlying Regulus spoken dialogue engineering toolkit

Page 5: Regulus: central goals

Reusable grammar-based language models
– Compile into recognisers
Infrastructure for using them in applications
– Speech translation
– Spoken dialogue
Multilingual
Efficient development environment
Open Source

Page 6: The full story…

$25 (paperback edition) from amazon.com

Page 7: What kind of applications?

Grammar-based is
– Good on in-coverage data
– Good for complex, structured utterances
Users need to
– Know what they can say
– Be concerned about accuracy
Good target applications
– Safety-critical
– Medium vocabulary (~200-2000 words)

Page 8: In particular…

Clarissa
– NASA procedure assistant for astronauts
– ~250 word vocabulary, ~75 command types
MedSLT
– Multilingual medical speech translator
– ~400-1000 words, ~30 question types
SDS
– Experimental in-car system from Ford Research
– First prize, Ford internal demo fair, 2007
– ~750 words

Page 9: Key technical ideas

Reusable grammar resources
Use grammars for multiple purposes
– Parsing
– Generation
– Recognition
Appropriate use of statistical methods

Page 10: Reusable grammar resources

Building a good grammar from scratch is very challenging
Need a methodology for rational reuse of existing grammar structure
Use a small corpus of examples to extract structure from a large resource grammar

Page 11: The Regulus picture

[Diagram: the Regulus compilation pipeline. A general unification grammar (UG), together with a lexicon, a training corpus, and operationality criteria, feeds EBL specialization, producing an application-specific UG. The UG-to-CFG compiler turns this into a CFG grammar; the CFG-to-PCFG compiler (using the training corpus) adds weights to give a PCFG grammar; the (P)CFG-to-recogniser compiler produces a Nuance recognizer.]

Page 12: The general English grammar

Loosely based on the SRI Core Language Engine grammar
Compositional semantics (4 different versions)
~200 unification grammar rules
~75 features
Core lexicon, ~450 words
(Also resource grammars for French, Spanish, Catalan, Japanese, Arabic, Finnish, Greek)

Page 13: General grammar → domain-specific grammar

"Macro-rule learning"
Corpus-based process
Remove unused rules and lexicon items
Flatten parsed examples to remove structure
Simpler structure → less ambiguity → smaller search space

Page 14: EBL example (1)

[Parse tree for "when do you get headaches" under the general grammar, with categories UTTERANCE, S, VP, VBAR, NP, NBAR, PP, V, PRO and N.]

Page 15: EBL example (2)

[The same parse tree for "when do you get headaches", repeated as the starting point for the flattening step shown on the next slide.]

Page 16: EBL example (3)

[The flattened parse tree for "when do you get headaches": the intermediate VP and NBAR nodes and one S level have been removed, leaving UTTERANCE, S, VBAR, NP, PP and the lexical categories.]

Main new rules:
S → PP VBAR VBAR NP
NP → N
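To make the flattening step concrete, here is a minimal sketch in Python. The toy tree encoding, the OPERATIONAL set, and the helper names frontier and flattened_rules are all illustrative assumptions, not the actual Regulus EBL machinery.

# A minimal sketch of the tree-flattening step behind "macro-rule learning".
# Tree encoding, OPERATIONAL set and all names are hypothetical.

OPERATIONAL = {"UTTERANCE", "S", "VBAR", "NP", "PP"}

def frontier(children):
    """Highest operational or lexical descendants under a node."""
    out = []
    for cat, sub in children:
        if isinstance(sub, str) or cat in OPERATIONAL:
            out.append(cat)            # cut the tree here
        else:
            out.extend(frontier(sub))  # erase the intermediate node
    return out

def flattened_rules(node, rules=None):
    """Collect one flattened rule for each operational phrase node."""
    if rules is None:
        rules = []
    cat, sub = node
    if isinstance(sub, str):           # lexical leaf, e.g. ("N", "headaches")
        return rules
    if cat in OPERATIONAL:
        rules.append((cat, frontier(sub)))
    for child in sub:
        flattened_rules(child, rules)
    return rules

# Simplified toy tree for "when do you get headaches"; NBAR is not
# operational, so it disappears and NP -> N emerges as a flattened rule.
tree = ("S", [("PP", "when"),
              ("VBAR", [("V", "do")]),
              ("VBAR", [("V", "get")]),
              ("NP", [("NBAR", [("N", "headaches")])])])
print(flattened_rules(tree))
# [('S', ['PP', 'VBAR', 'VBAR', 'NP']), ('VBAR', ['V']), ('VBAR', ['V']), ('NP', ['N'])]

Cutting at operational categories is what removes the intermediate structure, so the specialized grammar has fewer, flatter rules and hence a smaller search space.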

Page 17: Using grammars for multiple purposes

Parsing
– Surface words → logical form
Generation
– Logical form → surface words
Recognition
– Speech → surface words

Page 18: Building a speech translator

Combine Regulus-based components
– Source-language recognizer (speech → words)
– Source-language parser (words → logical form)
– Transfer from source to target, via interlingua (logical form → logical form)
– Target-language generator (logical form → words)
– (3rd party text-to-speech)

Page 19: Adding statistical methods

Two different ways to use statistical methods:
Statistical tuning of grammar
Intelligent help system

Page 20: Impact of statistical tuning

(Regulus book, chapter 11)
Base recogniser
– MedSLT with English recogniser
– Training corpus: 650 utterances
– Vocabulary: 429 surface words
Test data:
– 801 spoken and transcribed utterances

Page 21: Vary vocabulary size

Add lexical items (11 different versions)
Total vocabulary 429-3788 surface words
New vocabulary not used in test data
Expect degradation in performance
– Larger search space
– New possibilities are just a distraction

Page 22: Impact of statistical tuning for different vocabulary sizes

[Chart: semantic error rate (0-25%) for the plain CFG model versus the statistically tuned PCFG model, at vocabulary sizes 429, 1392, 2096, 2698, 3266 and 3788 surface words.]

Page 23: Intelligent help system

Need robustness somewhere
Add a backup statistical recogniser
Use it to advise the user
– Approximate match with in-coverage examples (sketched below)
– Show the user similar things they could say
Original paper: Gorrell, Lewin and Rayner, ICSLP 2002
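A minimal sketch of the matching idea, assuming a simple word-overlap (Jaccard) measure; the actual help system's similarity metric and data structures may well differ, and the suggest helper and example sentences are hypothetical.

# Rank known in-coverage sentences by overlap with the backup SLM's output,
# then show the user the closest ones as things they could say instead.

def suggest(slm_hypothesis, in_coverage_examples, n=3):
    """Return the n in-coverage sentences closest to the SLM output."""
    hyp = set(slm_hypothesis.lower().split())
    def overlap(example):
        ex = set(example.lower().split())
        return len(hyp & ex) / len(hyp | ex)   # Jaccard similarity
    return sorted(in_coverage_examples, key=overlap, reverse=True)[:n]

examples = ["do you get headaches in the morning",
            "is the pain severe",
            "does bright light cause the attacks"]
print(suggest("do your headaches come in the mornings", examples, n=2))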

Page 24: MedSLT experiments

(Chatzichrisafis et al, HLT workshop 2006)
French → English version of the system
Basic questions
– How quickly do novices become experts?
– Can people adapt to limited coverage?
Let subjects use the system several times, and track performance

Page 25: Experimental setup

Subjects
– 8 medical students, no previous knowledge of system
Scenario
– Experimenter simulates headache
– Subject must diagnose it
– 3 sessions, 3 tasks per session
Instruction
– ~20 min instructions & video (headset, push-to-talk)
– All other instruction from help system

Page 26: Results – # Interactions

[Chart: average number of interactions per session, falling from 98.6 in session 1 to 63.4 in session 2 and 53.9 in session 3.]

Page 27: Results – Time/Diagnosis

[Chart: time per diagnosis (0-19 minutes) for diagnoses 1-3 in each of sessions 1-3.]

Page 28: Questionnaire results

I quickly learned how to use the system. 4.4
System response times were generally satisfactory. 4.5
When the system did not understand me, the help system usually showed me another way to ask the question. 4.6
When I knew what I could say, the system usually recognized me correctly. 4.3
I was often unable to ask the questions I wanted. 3.8
I could ask enough questions that I was sure of my diagnosis. 4.3
This system is more effective than non-verbal communication using gestures. 4.3
I would use this system again in a similar situation. 4.1

Page 29: Summary

After 1.5 hours of use, subjects complete the task in an average of 4 minutes
– System implementers average 3 minutes
All coverage learned from the help system
Subjects' impressions very positive

Page 30: A few words about interlingua

Coverage in different languages diverges if left to itself
– Want to enforce uniform coverage
Many-to-many translation
– "N² problem"
Solution: translate through interlingua
– Tight interlingua definition

Page 31: Interlingua grammar

Think of the interlingua as a language
Define it using Regulus
– Mostly for constraining representations
– Also get a surface form
"Semantic grammar"
– Not linguistic, all about domain constraints

Page 32: Example of interlingua

Surface form:
"YN-QUESTION pain become-better sc-when [ you sleep PRESENT ] PRESENT"

Representation:
[[utterance_type, ynq], [symptom, pain], [event, become_better], [tense, present], [sc, when], [clause, [[utterance_type, dcl], [pronoun, you], [action, sleep], [tense, present]]]]

Page 33: Constraints from interlingua

Source-language sentences licensed by the grammar may not produce a valid interlingua
The interlingua can act as a knowledge source to improve language modelling

Page 34: Structure of talk

Background: Regulus and MedSLT
Grammar-based language models and statistical language models

Page 35: Language models

Two kinds of language models
Statistical (SLM)
– Trainable, robust
– Require a lot of corpus data
Grammar-based (GLM)
– Require little corpus data
– Brittle

Page 36: Compromises between SLM and GLM

Put weights on GLM (CFG → PCFG)
– Powerful technique, see earlier
– Doesn't address robustness
Put GLMs inside SLMs (Wang et al, 2002)
Use GLM to generate training data for SLM (Jurafsky et al 1995, Jonson 2005)

Page 37: Generating SLM training data with a GLM

Optimistic view
– Need only a small seed corpus, to build the GLM
– Will be robust, since the final model is an SLM
Pessimistic view
– "Something for nothing"
– Data for the GLM could be used directly to build an SLM
Hard to decide
– Don't know what data went into the GLM
– Often just in the grammar writer's head

Page 38: Regulus permits comparison

Use Regulus to build the GLM
Data-driven process with an explicit corpus
The same corpus can be used to build an SLM
Comparison is meaningful

Page 39: Two ways to build an SLM

Direct
– Seed corpus → SLM
Indirect
– Seed corpus → GLM → corpus → SLM

Page 40: Parameters for indirect method

Size of generated corpus
– Can generate any amount of data
Method of generating corpus
– CFG versus PCFG
Filtering
– Use interlingua to filter the generated corpus

Page 41: CFG versus PCFG generation

CFG
– Use the plain GLM to do random generation
PCFG
– Use the seed corpus to weight GLM rules
– Weights then used in random generation (sketched below)
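A toy illustration of the CFG-versus-PCFG distinction in Python. The grammar, its storage format and the rule weights are all hypothetical; with uniform weights this reduces to plain CFG random generation, while seed-corpus-derived weights give PCFG generation.

import random

# Rules stored as {lhs: [(rhs_tuple, weight), ...]}; weights would come
# from counting rule uses when parsing the seed corpus.
RULES = {
    "S":    [(("do", "NP", "VBAR"), 3.0), (("is", "NP", "ADJ"), 1.0)],
    "NP":   [(("you",), 5.0), (("the", "pain"), 2.0)],
    "VBAR": [(("get", "headaches"), 2.0), (("sleep", "well"), 1.0)],
    "ADJ":  [(("thirsty",), 1.0)],
}

def generate(symbol="S"):
    """Expand symbol top-down, picking each rule with probability
    proportional to its corpus-derived weight."""
    if symbol not in RULES:                      # terminal word
        return [symbol]
    expansions, weights = zip(*RULES[symbol])
    rhs = random.choices(expansions, weights=weights)[0]
    return [word for sym in rhs for word in generate(sym)]

for _ in range(5):
    print(" ".join(generate()))

Because high-weight rules dominate, PCFG sampling concentrates the generated corpus on constructions the seed corpus actually uses, which is why the PCFG-generated examples later in the talk look far more natural than the CFG-generated ones.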

Page 42: Interlingua filtering

Impossible to make the GLM completely tight
Many in-coverage sentences make no sense
Some of these don't produce valid interlingua
Use the interlingua grammar as a filter (sketched below)
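A minimal sketch of the filtering idea over the attribute-value representation shown earlier. The REQUIRED and ALLOWED_VALUES constraints and the well_formed checker are illustrative stand-ins for parsing the interlingua with its Regulus semantic grammar, and to_interlingua is a hypothetical placeholder (not defined here) for the source-language parse plus transfer step.

REQUIRED = {"utterance_type", "tense"}
ALLOWED_VALUES = {
    "utterance_type": {"ynq", "whq", "dcl"},
    "tense": {"present", "past"},
}

def well_formed(avm):
    """Accept an interlingua only if it has the required attributes and
    licensed values; recurse into embedded clauses."""
    attrs = {}
    for key, value in avm:
        if key == "clause":
            if not well_formed(value):
                return False
        else:
            attrs[key] = value
    if not REQUIRED <= attrs.keys():
        return False
    return all(attrs[k] in ok for k, ok in ALLOWED_VALUES.items() if k in attrs)

def keep(sentence):
    """A generated sentence survives the filter only if it maps to a
    well-formed interlingua (to_interlingua is hypothetical)."""
    avm = to_interlingua(sentence)
    return avm is not None and well_formed(avm)

# The representation from the earlier example passes the checks.
example = [["utterance_type", "ynq"], ["symptom", "pain"],
           ["event", "become_better"], ["tense", "present"], ["sc", "when"],
           ["clause", [["utterance_type", "dcl"], ["pronoun", "you"],
                       ["action", "sleep"], ["tense", "present"]]]]
assert well_formed(example)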

Page 43: Example: CFG-generated data

what attacks of them 're your duration all day
have a few sides of the right sides regularly frequently hurt
where 's it increased
what previously helped this headache
have not any often ever helped
are you usually made drowsy at home
what sometimes relieved any gradually during its night
's this severity frequently increased before helping
when are you usually at home
how many kind of changes in temperature help a history

Page 44: Example: PCFG-generated data

does bright light cause the attacks
are there its cigarettes
does a persistent pain last several hours
is your pain usually the same before
were there them when this kind of large meal helped joint pain
do sudden head movements usually help to usually relieve the pain
are you thirsty
does nervousness aggravate light sensitivity
is the pain sometimes in the face
is the pain associated with your headaches

Page 45: Example: PCFG-generated data with interlingua filtering

does a persistent pain last several hours
do sudden head movements usually help to usually relieve the pain
are you thirsty
does nervousness aggravate light sensitivity
is the pain sometimes in the face
have you regularly experienced the pain
do you get the attacks hours
is the headache pain better
are headaches worse
is neck trauma unchanging

Page 46: Experiments

Start with the same English seed corpus
– 948 utterances
Generate GLM recogniser
Generate different types of training corpus
– Train an SLM from each corpus
Compare recognition performance
– Word Error Rate (WER)
– Sentence Error Rate (SER)
McNemar sign test on SER to get significance (sketched below)
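A minimal sketch of the significance test, assuming per-utterance reference transcriptions: score each recogniser's output as a whole-sentence hit or miss, then run a matched-pairs sign test (the exact binomial form of McNemar's test) on the discordant utterances. This is illustrative, not the authors' actual evaluation script.

from math import comb

def sentence_errors(references, hypotheses):
    """True where the recogniser got the whole sentence wrong."""
    return [ref != hyp for ref, hyp in zip(references, hypotheses)]

def sign_test(errors_a, errors_b):
    """Two-sided exact binomial p-value over discordant pairs: utterances
    where exactly one of the two recognisers made a sentence error."""
    a_only = sum(a and not b for a, b in zip(errors_a, errors_b))
    b_only = sum(b and not a for a, b in zip(errors_a, errors_b))
    n, k = a_only + b_only, min(a_only, b_only)
    # P(at most k successes in n fair coin flips), doubled for two sides
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** (n - 1)
    return min(p, 1.0)

Only the discordant pairs carry information: utterances both systems get right (or both get wrong) say nothing about which system is better.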

Page 47: Experiment 1: different methods

Version                Corpus size     WER     SER
GLM                            948  21.96%  50.62%
SLM, seed corpus               948  27.74%  58.40%
SLM, CFG, no filter           4281   49.0%   88.4%
SLM, CFG, filter              4281  44.68%  85.68%
SLM, PCFG, no filter          4281  25.98%  65.31%
SLM, PCFG, filter             4281  25.81%  63.70%

Page 48: Experiment 1: significant differences

GLM >> all SLMs
seed corpus >> all generated corpora
PCFG generation >> CFG generation
filtered > not filtered

However, the generated corpora are small…

Page 49: Experiment 2: different sizes of corpus

Version                Corpus size     WER     SER
GLM                            948  21.96%  50.62%
SLM, seed corpus               948  27.74%  58.40%
SLM, PCFG, no filter        16,619  24.84%  62.47%
SLM, PCFG, filter           16,619  23.80%  59.51%
SLM, PCFG, no filter       497,798  24.38%  59.88%
SLM, PCFG, filter          497,798  23.76%  57.16%

Page 50: Experiment 2: significant differences

GLM >> all SLMs
large corpus > small corpus
large unfiltered generated corpus ~ seed corpus
– SER for large unfiltered corpus about the same
large filtered generated corpus ~/> seed corpus
– SER for large filtered corpus better, but not significant
filtered > not filtered

Page 51: Experiment 3: like 2, but only in-coverage data

Version                Corpus size     WER     SER
GLM                            948   7.00%  22.37%
SLM, seed corpus               948  14.40%  42.02%
SLM, PCFG, no filter        16,619  14.13%  46.11%
SLM, PCFG, filter           16,619  12.76%  40.86%
SLM, PCFG, no filter       497,798  12.35%  40.66%
SLM, PCFG, filter          497,798  11.25%  36.19%

Page 52: Experiment 3: significant differences

GLM >> all SLMs
large corpus > small corpus
large unfiltered generated corpus ~/> seed corpus
– SER for large unfiltered corpus better, not significant
large filtered generated corpus > seed corpus
filtered > not filtered

Page 53: Using GLMs to make SLMs: conclusions

Regulus lets us evaluate fairly
Indirect method for building an SLM only slightly better than the direct one
GLM better than all SLM variants
– Especially clear on in-coverage data
PCFG generation much better than CFG

Page 54: Summary

MedSLT
– Potentially useful tool for doctors in future
– Good test-bed for research now
Using GLMs to build SLMs
– Example of how Regulus lets us evaluate a grammar-based method objectively

Page 55: For more information

Regulus websites:
http://sourceforge.net/projects/regulus/
http://www.issco.unige.ch/projects/regulus/

Rayner, Hockey and Bouillon, "Putting Linguistics Into Speech Recognition" (CSLI Press, June 2006)