Speech-to-Speech MT Design and Engineering. Alon Lavie and Lori Levin, MT Class, April 16 2001. Outline: Design and Engineering of the JANUS speech-to-speech MT system; The Travel & Medical Domain Interlingua (IF); Portability to new domains: ML approaches; Evaluation and User Studies.
Speech-to-Speech MT Design and Engineering
Alon Lavie and Lori Levin
MT Class
April 16 2001
Outline
• Design and Engineering of the JANUS speech-to-speech MT system
• The Travel & Medical Domain Interlingua (IF)
• Portability to new domains: ML approaches
• Evaluation and User Studies
• Open Problems, Current and Future Research
Overview
• Fundamentals of our approach
• System overview
• Engineering a multi-domain system
• Evaluations and user studies
• Alternative translation approaches
• Current and future research
JANUS Speech Translation
• Translation via an interlingua representation
• Main translation engine is rule-based
• Semantic grammars
• Modular grammar design
• System engineered for multiple domains
• Recent focus on domain portability
– using machine learning for rapid extension to a new domain
The C-STAR Travel Planning Domain
General Scenario:
• Dialogue between one traveler and one or more travel agents
• Focus on making travel arrangements for a personal leisure trip (not business)
• Free spontaneous speech
The C-STAR Travel Planning Domain
Natural breakdown into several sub-domains:
• Hotel Information and Reservation
• Transportation Information and Reservation
• Information about Sights and Events
• General Travel Information
• Cross Domain
Semantic Grammars
• Describe structure of semantic concepts instead of syntactic constituency of phrases
• Well suited for task-oriented dialogue containing many fixed expressions
• Appropriate for spoken language - often disfluent and syntactically ill-formed
• Faster to develop reasonable coverage for limited domains
Semantic Grammars
Hotel Reservation Example:
Input: we have two hotels available
Parse Tree:
[give-information+availability+hotel]
( we have
  [hotel-type]
  ( [quantity=] ( two )
    [hotel] ( hotels ) )
  available )
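A minimal sketch of how such a semantic parse tree could be held in memory, so that the covered words and the concept labels can be read back off it. The `Node` class and both helper functions are hypothetical names introduced for illustration, and the nesting of `[quantity=]` and `[hotel]` under `[hotel-type]` is an assumption about the intended bracketing.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """One semantic concept; children are sub-concepts or plain words."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def words(node: Node) -> List[str]:
    """Return the input words covered by the tree, left to right."""
    out: List[str] = []
    for child in node.children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


def labels(node: Node) -> List[str]:
    """Return the concept labels in depth-first order."""
    out = [node.label]
    for child in node.children:
        if isinstance(child, Node):
            out.extend(labels(child))
    return out


# The hotel reservation example (nesting assumed, see lead-in).
tree = Node("give-information+availability+hotel", [
    "we", "have",
    Node("hotel-type", [
        Node("quantity=", ["two"]),
        Node("hotel", ["hotels"]),
    ]),
    "available",
])
```

Calling `words(tree)` recovers the input utterance, while `labels(tree)` lists the semantic concepts that the grammar assigned to it.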
The JANUS-III Translation System
The SOUP Parser
• Specifically designed to parse spoken language using domain-specific semantic grammars
• Robust - can skip over disfluencies in input
• Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities
• Top-Down - parses from top-level concepts of the grammar down to matching of terminals
• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head cat
The SOUP Parser
• Supports parsing with large multiple domain grammars
• Produces a lattice of parse analyses headed by top-level concepts
• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
• Graphical grammar editor
SOUP Disambiguation Heuristics
• Maximize coverage (of input)
• Minimize number of parse trees (fragmentation)
• Minimize number of parse tree nodes
• Minimize the number of wild-card matches
• Maximize the probability of parse trees
• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1,t2,…,tn is a sequence of domain tags
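One way to picture how several of these heuristics might interact is a lexicographic ranking over candidate analyses. This is only a sketch: the slides do not say how SOUP combines or weights the heuristics, so the strict priority order used here is an assumption, the `Analysis` fields are hypothetical, and the domain-tag probability P(T|W) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Analysis:
    """Summary statistics for one candidate path through the parse lattice."""
    covered_words: int   # input words covered by the analysis
    num_trees: int       # number of parse trees (fragmentation)
    num_nodes: int       # total parse tree nodes
    wildcards: int       # wild-card matches used
    log_prob: float      # log probability of the parse trees


def rank_key(a: Analysis) -> Tuple[float, ...]:
    # Smaller tuples are better: quantities to maximize are negated.
    # The priority order is assumed, not taken from SOUP itself.
    return (-a.covered_words, a.num_trees, a.num_nodes, a.wildcards, -a.log_prob)


def best_analysis(analyses: Iterable[Analysis]) -> Analysis:
    """Pick the single best analysis under the assumed ordering."""
    return min(analyses, key=rank_key)
```

Under this ordering, coverage dominates: an analysis covering more of the input wins even if it is more fragmented, and the remaining criteria only break ties.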
JANUS Generation Modules
Two alternative generation modules:
• Top-Down context-free based generator - fast, used for English and Japanese
• GenKit - unification-based generator augmented with Morphe morphology module - used for German
Modular Grammar Design
• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
• Grammars can be developed independently (using shared core grammar)
• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains
• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
Translation with Multiple Domain Grammars
• Parser is loaded with all domain grammars
• Domain tag attached to grammar rules of each domain
• Previously developed grammars for other domains can also be incorporated
• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
Translation with Multiple Domain Grammars
A SOUP Parse Lattice
Domain Portability: Travel to Medical
Knowledge-Based Methods
Re-usability of knowledge sources for translation and speech recognition
Corpus-Based Methods
Reduce the amount of new training data for translation and speech recognition
Background
• New domain: Medical
– Doctor-patient diagnostic conversations
– Global importance in emergencies and in machine translation for remote health care
– Synergy with Lincoln Lab
  • Joint evaluation
  • Joint interlingua
– Test case for portability
Portability
• Advantage: Interlingua
• Problem: Writing semantic grammars
– Domain dependent
– Requires time, effort, and expertise
• Approach:
– Grammar modularity
– Domain action learning
– Automatic/Interactive semantic grammar induction
Hybrid Stat/Rule-based Analysis
• Developing large coverage semantic analysis grammars is time consuming, which makes it difficult to port the analysis system to new domains
• “low-level” argument grammars are more domain-independent: contain many concepts that are used across domains: time, location, prices, etc.
• “high-level” domain-actions are domain-specific, must be redeveloped for each new domain: give-info+onset+symptom
• Tagging data sets with interlingua representations is less time consuming, needed anyway for system development
Hybrid Rule/Stat Approach
• Combines grammar-based and statistical approaches to analysis:
– Develop semantic grammars for phrase-level arguments that are more portable to new domains
– Use statistical machine learning techniques for classifying into domain-actions
• Porting to a new domain requires:
– developing argument parse rules for new domain
– tagging training set with domain-actions for new domain
– training the classifiers for domain-actions on the tagged data
The Hybrid Analysis Process
• Parse an utterance for arguments
• Segment the utterance into sentences
• Extract features from the utterance and the single best parse output
• Use a learned classifier to identify the speech act
• Use a learned classifier to identify the concept sequence
• Combine into a full parse
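The six steps of the hybrid analysis process can be sketched as one driver function that takes each component as a callable. Everything here is illustrative: `analyze`, the `(label, start, end)` argument tuples, and the feature dictionary are hypothetical stand-ins, not the JANUS interfaces.

```python
from typing import Callable, Dict, List, Tuple

Arg = Tuple[str, int, int]   # (argument label, start index, end index exclusive)
Span = Tuple[int, int]       # (start index, end index exclusive) of a segment


def analyze(
    words: List[str],
    parse_arguments: Callable[[List[str]], List[Arg]],
    segment: Callable[[List[str], List[Arg]], List[Span]],
    classify_speech_act: Callable[[Dict], str],
    classify_concepts: Callable[[Dict, str], List[str]],
) -> List[Tuple[str, List[Arg]]]:
    """Return one (domain action, arguments) pair per segment."""
    args = parse_arguments(words)                            # 1. argument parse
    results = []
    for start, end in segment(words, args):                  # 2. segmentation
        seg_args = [a for a in args if start <= a[1] and a[2] <= end]
        features = {                                         # 3. feature extraction
            "words": words[start:end],
            "args": [a[0] for a in seg_args],
        }
        speech_act = classify_speech_act(features)           # 4. speech act
        concepts = classify_concepts(features, speech_act)   # 5. concept sequence
        domain_action = "+".join([speech_act] + concepts)    # 6. combine
        results.append((domain_action, seg_args))
    return results
```

Passing the components in as callables keeps the grammar-based part (argument parsing) and the learned parts (the two classifiers) independently replaceable, which mirrors the porting steps listed earlier.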
Argument Parsing
The SOUP parser produces a forest of parse trees that cover as much of the input as possible
The parse forest can be a mixture of trees allowed by any of the grammars
Only the best parse is used for further processing
Argument Parse Example
We have a double room available for you at twenty-three thousand five hundred yen
[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
[arg-party:for-whom=]::ARG ( for [you] ( you ) )
[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
Automatic Classification of Domain Actions
• Train classifiers for speech acts and concepts
• Training data: Utterances labeled with speech act, concepts, and best argument parse
• Input features:
– n most common words
– Arguments and pseudo-arguments in best parse
– Speaker
– Predicted speech act (for concept classifier)
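The input features listed above could be assembled roughly as follows, paired with a very small memory-based classifier. This is a sketch under stated assumptions: the feature encoding (`w:`, `arg:`, `spk:`, `sa:` prefixes) is invented for illustration, and the classifier is a plain nearest-neighbour lookup that counts shared features, a simplification that does not use TiMBL.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple


def most_common_words(utterances: Iterable[str], n: int) -> Set[str]:
    """Vocabulary of the n most frequent words in the training utterances."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    return {w for w, _ in counts.most_common(n)}


def extract_features(utterance: str, arg_labels: Iterable[str], speaker: str,
                     vocabulary: Set[str],
                     speech_act: Optional[str] = None) -> FrozenSet[str]:
    """Encode one utterance as a set of symbolic features."""
    feats = {"w:" + w for w in utterance.lower().split() if w in vocabulary}
    feats |= {"arg:" + label for label in arg_labels}
    feats.add("spk:" + speaker)
    if speech_act is not None:          # only used by the concept classifier
        feats.add("sa:" + speech_act)
    return frozenset(feats)


class MemoryBasedClassifier:
    """Stores all training examples; labels a query by its closest example."""

    def __init__(self) -> None:
        self.memory: List[Tuple[FrozenSet[str], str]] = []

    def train(self, examples: Iterable[Tuple[FrozenSet[str], str]]) -> None:
        self.memory.extend(examples)

    def classify(self, feats: FrozenSet[str]) -> str:
        # Similarity = number of shared features (assumed metric).
        _, label = max(self.memory, key=lambda ex: len(ex[0] & feats))
        return label
```

The optional `speech_act` argument reflects the cascade in the slide: the concept classifier sees the speech act predicted by the first classifier as one more feature.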
Full Parse Example
We have a double room available for you at twenty-three thousand five hundred yen
give-information+availability+room (
  [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
  [arg-party:for-whom=]::ARG ( for [you] ( you ) )
  [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
  [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
)
Classification Results Using Memory-based (TiMBL) Classifiers
[Figure: Classification Accuracy (16-fold Cross Validation). Mean accuracy (y-axis, 0 to 0.8) against training set size (x-axis: 500, 1000, 2000, 3000, 4000, 5000, 6009), with one curve each for Speech Act, Concept Sequence, and Domain Action.]
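For readers unfamiliar with the evaluation protocol, k-fold cross validation (k = 16 in the figure) can be sketched as below. The fold assignment by striding and the `train` and `predict` callables are assumptions made for illustration; the slides do not describe how the folds were drawn.

```python
from typing import Callable, List, Sequence, Tuple


def k_folds(n_items: int, k: int) -> List[List[int]]:
    """Split indices 0..n_items-1 into k disjoint folds (requires n_items >= k)."""
    if n_items < k:
        raise ValueError("need at least one item per fold")
    return [list(range(i, n_items, k)) for i in range(k)]


def cross_validate(examples: Sequence[Tuple[object, object]], k: int,
                   train: Callable, predict: Callable) -> float:
    """Mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for held_out in k_folds(len(examples), k):
        held = set(held_out)
        train_set = [ex for i, ex in enumerate(examples) if i not in held]
        test_set = [examples[i] for i in held_out]
        model = train(train_set)
        correct = sum(predict(model, x) == y for x, y in test_set)
        accuracies.append(correct / len(test_set))
    return sum(accuracies) / len(accuracies)
```

With `k=16`, each run trains on fifteen sixteenths of the labeled utterances and tests on the remainder, and the reported figure is the mean of the sixteen accuracies.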
Status and Open Research
• Preliminary analysis engine implemented, currently used for travel domain in NESPOLE!
• Areas for further research and development:
– Explore a variety of classifiers
– Explore features for domain-action classification
– Classification compositionality – how to classify the components of the domain-action separately and combine them?
– Taking advantage of additional knowledge sources: the interlingua specification, dialogue context
– Better address segmentation of utterance into DAs
Automatic Induction of Semantic Grammars
• Seed grammar for a new domain has very limited coverage
• Corpus of development data tagged with interlingua representations available
• Expand the seed grammar by learning new rules for covering the same domain-actions
• First step: how well can we do with no human intervention?
Outline of Semantic Grammar Induction
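[Diagram: components of the semantic grammar induction process. Labels shown: Parser, IF, Tree Matching, Linearization, Hypotheses Generation, Rules Induction, Rules Management, Knowledge, Seed Grammar, Learned Grammar.]
Example rule shown in the diagram:
s[gi+onset+sym] ( [manner=] [sym-loc=] *+became [adj:sym-name=] )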
Human vs Machine Experiment
• Seed grammar
• Extended by a human
• Extended by automatic semantic grammar induction
Seed Grammar
Cross Domain
Medical Shared
Around 100 rules and 6000 lexical items
Around 200 rules
Around 600 rules and growing
Medical
Hello. My name is Sam.
I have a burning sensation in my foot.
A Parse Tree
[request-information+existence+body-state]::MED (
  WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) )
  HAVE-GET-FEEL::MED ( GET ( have ) )
  you
  HAVE-GET-FEEL::MED ( HAS ( had ) )
  [super_body-state-spec=]::MED (
    [body-state-spec=]::MED (
      ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) )
      BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )
Manual Grammar Development
• About five additional days of development after the seed grammar was finalized
• Focusing on medical rules only
• Domain-independent rules remain untouched
Development and evaluation sets
• Development set: 133 sentences
– from one dialog
• Evaluation set: 83 sentences
– from two dialogs
– unseen speakers
– Only SDUs that could be manually tagged with a full IF according to the current specification were included.
Grading Procedure: Recall and Precision of IF Components
c:give-information+           speech act
existence+body-state          concepts
(body-state-spec=(pain,       top-level argument
identifiability=no),          sub-argument
body-location=                top-level argument
(inside=head))                sub-argument
• Recall – ignored if number of items is 0
• Precision – ignored if 0 out of 0
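The grading rule can be sketched for a single IF component type as follows. This is an illustrative reading of the two bullets above: "number of items" is taken to mean the items in the reference IF, "0 out of 0" to mean that the system proposed no items, and `None` marks a score that is left out of the average. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def recall_precision(reference: Iterable[str],
                     proposed: Iterable[str]) -> Tuple[Optional[float], Optional[float]]:
    """Recall and precision for one component type of one SDU."""
    ref, hyp = set(reference), set(proposed)
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else None      # ignored if no reference items
    precision = correct / len(hyp) if hyp else None   # ignored if 0 out of 0
    return recall, precision


def mean_ignoring_none(scores: List[Optional[float]]) -> Optional[float]:
    """Average the scores that were not ignored (None if all were ignored)."""
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept) if kept else None
```

For the concepts of the example above, a system output containing only `existence` against the reference `existence+body-state` would score recall 0.5 and precision 1.0.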
Human vs. Machine: Evaluation Results
Component         Measure    Seed  Extended  Learned
Speech Act        Recall     43.3  48.2      49.3
Speech Act        Precision  71.0  75.0      45.8
Concept List      Recall      2.2  10.1      32.5
Concept List      Precision  12.5  42.2      25.1
Top-Level Args    Recall      0.0   7.2      29.6
Top-Level Args    Precision   0.0  42.2      34.4
Top-Level Values  Recall      0.0   8.3      29.8
Top-Level Values  Precision   0.0  50.0      39.2
Sub-Level Args    Recall      0.0  28.3      14.1
Sub-Level Args    Precision   0.0  48.2      12.6
Sub-Level Values  Recall      1.2  28.3      14.1
Sub-Level Values  Precision   6.2  48.2      12.9
User Studies
• We conducted three sets of user tests
• Travel agent played by experienced system user
• Traveler is played by a novice and given five minutes of instruction
• Traveler is given a general scenario - e.g., plan a trip to Heidelberg
• Communication only via ST system, multi-modal interface and muted video connection
• Data collected used for system evaluation, error analysis and then grammar development
System Evaluation Methodology
• End-to-end evaluations conducted at the SDU (sentence) level
• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad
• OK = meaning of SDU comes across
• Perfect = OK + fluent output
• Bad = translation incomplete or incorrect
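Given a list of per-SDU grades, the two figures reported in the evaluations (OK+Perfect and Perfect) could be computed as in this sketch. How the grades of multiple graders were combined is not stated in the slides, so the input here is simply assumed to be one flat list of grades, and `summarize` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def summarize(grades: List[str]) -> Dict[str, float]:
    """Percentage of SDUs graded acceptable (OK or Perfect) and Perfect."""
    if not grades:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    total = len(grades)
    acceptable = counts["Perfect"] + counts["OK"]   # Perfect implies OK
    return {
        "OK+Perfect": 100.0 * acceptable / total,
        "Perfect": 100.0 * counts["Perfect"] / total,
    }
```

Because Perfect is defined as OK plus fluent output, every Perfect SDU also counts toward the OK+Perfect figure.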
August-99 Evaluation
• Data from latest user study - traveler planning a trip to Japan
• 132 utterances containing one or more SDUs, from six different users
• SR word error rate 14.7%
• 40.2% of utterances contain recognition error(s)
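Word error rate is conventionally the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below follows that standard definition; whether the evaluation above used exactly this scoring is an assumption, and the example sentences in the usage note are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)
```

For instance, recognizing "we have to hotels available" for the reference "we have two hotels available" is one substitution in five words, a word error rate of 0.2.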
Evaluation Results
Method            Output Language  OK+Perfect  Perfect
SOUP-Transcribed  English          74%         54%
SOUP-Recognition  English          59%         42%
SOUP-Transcribed  Japanese         77%         59%
SOUP-Recognition  Japanese         62%         45%
SOUP-Transcribed  German           70%         39%
SOUP-Recognition  German           58%         34%
Evaluation - Progress Over Time
Method OK+Perfect Perfect
Jan-99 Transcribed 69% 46%
Apr-99 Transcribed 70% 49%
Aug-99 Transcribed 74% 54%
Jan-99 Recognition 55% 36%
Apr-99 Recognition 57% 38%
Aug-99 Recognition 59% 42%
Current and Future Work
• Expanding the interlingua: covering descriptive as well as task-oriented sentences
• Developing the new portable approaches
• Development of the server-based architecture for supporting multiple applications:
– NESPOLE!: speech-MT for advanced e-commerce
– C-STAR: speech-to-speech MT over mobile phones
– LingWear: MT and language assistance on wearable devices
Students Working on the Project
• Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture
• Ben Han: Automatic Grammar Induction
• Alicia Tribble: Interlingua and grammar development for Medical Domain
• Joy Zhang, Erik Peterson: Chinese EBMT for LingWear
The JANUS Speech-MT Team
• Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel
• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson