Speech-to-Speech MT Design and Engineering


Alon Lavie and Lori Levin

MT Class

April 16, 2001

Outline

• Design and Engineering of the JANUS speech-to-speech MT system

• The Travel & Medical Domain Interlingua (IF)

• Portability to new domains: ML approaches

• Evaluation and User Studies

• Open Problems, Current and Future Research

Overview

• Fundamentals of our approach

• System overview

• Engineering a multi-domain system

• Evaluations and user studies

• Alternative translation approaches

• Current and future research

JANUS Speech Translation

• Translation via an interlingua representation

• Main translation engine is rule-based

• Semantic grammars

• Modular grammar design

• System engineered for multiple domains

• Recent focus on domain portability

– using machine learning for rapid extension to a new domain

The C-STAR Travel Planning Domain

General Scenario:

• Dialogue between one traveler and one or more travel agents

• Focus on making travel arrangements for a personal leisure trip (not business)

• Free spontaneous speech

The C-STAR Travel Planning Domain

Natural breakdown into several sub-domains:

• Hotel Information and Reservation

• Transportation Information and Reservation

• Information about Sights and Events

• General Travel Information

• Cross Domain

Semantic Grammars

• Describe structure of semantic concepts instead of syntactic constituency of phrases

• Well suited for task-oriented dialogue containing many fixed expressions

• Appropriate for spoken language - often disfluent and syntactically ill-formed

• Faster to develop reasonable coverage for limited domains

Semantic Grammars

Hotel Reservation Example:

Input: we have two hotels available

Parse Tree:

[give-information+availability+hotel]
  (we have
   [hotel-type]
     ([quantity=] (two)
      [hotel] (hotels))
   available)
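A minimal sketch of how a semantic grammar for this example could be written down and applied. The rules are assumptions inferred from the parse tree above, not the actual JANUS grammar, and the naive recursive matcher only stands in for a real parser (it has no skipping and no ambiguity handling).

```python
# Toy semantic grammar: non-terminals are concept labels, everything else is a word.
# These rules are hypothetical, reverse-engineered from the slide's parse tree.
GRAMMAR = {
    "[give-information+availability+hotel]": [["we", "have", "[hotel-type]", "available"]],
    "[hotel-type]": [["[quantity=]", "[hotel]"]],
    "[quantity=]": [["one"], ["two"], ["three"]],
    "[hotel]": [["hotel"], ["hotels"]],
}


def parse(symbol, words, pos):
    """Match `symbol` at words[pos:]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:  # terminal: must equal the next input word
        if pos < len(words) and words[pos] == symbol:
            return symbol, pos + 1
        return None
    for rhs in GRAMMAR[symbol]:  # try each alternative in order
        children, cur = [], pos
        for part in rhs:
            result = parse(part, words, cur)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:  # every part of this alternative matched
            return (symbol, children), cur
    return None


def show(tree):
    """Render a tree in the bracketed notation used on the slides."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return f"{label} ( {' '.join(show(c) for c in children)} )"


words = "we have two hotels available".split()
tree, end = parse("[give-information+availability+hotel]", words, 0)
rendered = show(tree)
print(rendered)
```

Note that the rules name concepts (quantity, hotel type) rather than syntactic categories such as NP or VP, which is the point of the semantic-grammar approach.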

The JANUS-III Translation System


The SOUP Parser

• Specifically designed to parse spoken language using domain-specific semantic grammars

• Robust - can skip over disfluencies in input

• Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities

• Top-Down - parses from top-level concepts of the grammar down to matching of terminals

• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category

The SOUP Parser

• Supports parsing with large multiple domain grammars

• Produces a lattice of parse analyses headed by top-level concepts

• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice

• Graphical grammar editor

SOUP Disambiguation Heuristics

• Maximize coverage (of input)

• Minimize number of parse trees (fragmentation)

• Minimize number of parse tree nodes

• Minimize the number of wild-card matches

• Maximize the probability of parse trees

• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1,t2,…,tn is a sequence of domain tags
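One way such heuristics could be combined is a lexicographic ranking, sketched below. The slide does not say how SOUP weighs the criteria against each other, so the priority order, the `Analysis` fields, and the sample numbers are all assumptions made for illustration; the domain-tag model P(T|W) is left out.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """One candidate analysis of an utterance (hypothetical record layout)."""
    name: str
    words_covered: int
    num_trees: int
    num_nodes: int
    wildcard_matches: int
    log_prob: float


def rank_key(a):
    # Smaller key = better. Negate the quantities that should be maximized.
    return (-a.words_covered, a.num_trees, a.num_nodes, a.wildcard_matches, -a.log_prob)


candidates = [
    Analysis("fragmented", words_covered=6, num_trees=3, num_nodes=14, wildcard_matches=0, log_prob=-9.1),
    Analysis("single-tree", words_covered=6, num_trees=1, num_nodes=11, wildcard_matches=0, log_prob=-7.4),
    Analysis("partial", words_covered=4, num_trees=1, num_nodes=7, wildcard_matches=0, log_prob=-3.0),
]

ranked = sorted(candidates, key=rank_key)
best = ranked[0]
print(best.name)
```

Here coverage dominates, so the partial analysis loses despite its higher probability, and fragmentation breaks the tie between the two full-coverage candidates.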

JANUS Generation Modules

Two alternative generation modules:

• Top-Down context-free based generator - fast, used for English and Japanese

• GenKit - unification-based generator augmented with Morphe morphology module - used for German

Modular Grammar Design

• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)

• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)

• Grammars can be developed independently (using shared core grammar)

• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains

• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
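A rough sketch of the modular layout: each sub-domain grammar is kept as a separate module, a shared core supplies the low-level concepts, and every rule is tagged with the module it came from when the grammars are loaded together. The tags (`SHR`, `HTL`, `TRN`) and the rules themselves are hypothetical placeholders, not taken from the JANUS grammars.

```python
# Shared core grammar: low-level concepts reused by every sub-domain (illustrative rules).
SHARED = {
    "[time=]": [["at", "[hour=]"]],
    "[price=]": [["[amount=]", "[currency=]"]],
}

# One independently developed module per sub-domain (illustrative rules).
MODULES = {
    "HTL": {"[give-information+availability+room]": [["we", "have", "[room-type=]", "available"]]},
    "TRN": {"[request-information+departure]": [["when", "does", "[vehicle=]", "leave"]]},
}


def load_grammars(shared, modules, shared_tag="SHR"):
    """Flatten all modules into one rule list, tagging each rule with its domain."""
    tagged = []
    for lhs, alternatives in shared.items():
        for rhs in alternatives:
            tagged.append((shared_tag, lhs, tuple(rhs)))
    for tag, rules in modules.items():
        for lhs, alternatives in rules.items():
            for rhs in alternatives:
                tagged.append((tag, lhs, tuple(rhs)))
    return tagged


rules = load_grammars(SHARED, MODULES)
domains = sorted({tag for tag, _, _ in rules})
print(domains)
```

Because the tag travels with each rule, any parse built from these rules can be traced back to a domain, which is what multi-domain integration in the parser relies on.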

Translation with Multiple Domain Grammars

• Parser is loaded with all domain grammars

• Domain tag attached to grammar rules of each domain

• Previously developed grammars for other domains can also be incorporated

• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts

• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
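Selecting a single best sequence of concepts from a lattice can be sketched as a small dynamic program. This version uses only two of the criteria (maximize coverage, then minimize the number of trees) and allows uncovered words to be skipped. The lattice representation, concept names, and the `HTL` tag are assumptions for illustration, not SOUP's actual data structures.

```python
def best_concept_sequence(num_words, analyses):
    """Pick non-overlapping analyses maximizing coverage, then minimizing tree count.

    analyses: list of (start, end, concept, domain_tag) covering words[start:end],
    with start < end.
    """
    # best[i] = (words covered, -number of trees, chosen analyses) for the first i words
    best = [(0, 0, [])]
    for i in range(1, num_words + 1):
        options = [best[i - 1]]  # leave word i-1 unparsed
        for start, end, concept, tag in analyses:
            if end == i:
                covered, neg_trees, path = best[start]
                options.append((covered + (end - start), neg_trees - 1,
                                path + [(start, end, concept, tag)]))
        best.append(max(options, key=lambda o: (o[0], o[1])))
    return best[num_words][2]


# Utterance: "hello i would like a room" (6 words); lattice entries are made up.
lattice = [
    (0, 1, "greeting", "XDM"),
    (1, 6, "request-action+reservation+room", "HTL"),
    (1, 4, "request-action", "XDM"),
    (4, 6, "room", "HTL"),
]
path = best_concept_sequence(6, lattice)
concepts = [concept for _, _, concept, _ in path]
print(concepts)
```

Both full-coverage paths cover all six words, so the one with fewer trees is selected; the domain tags on the chosen analyses then give the domain sequence for the utterance.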

Translation with Multiple Domain Grammars

A SOUP Parse Lattice

Domain Portability: Travel to Medical

Knowledge-Based Methods

Re-usability of knowledge sources for translation and speech recognition

Corpus-Based Methods

Reduce the amount of new training data for translation and speech recognition

Background

• New domain: Medical

– Doctor-patient diagnostic conversations

– Global importance in emergencies and in machine translation for remote health care

– Synergy with Lincoln Lab

    • Joint evaluation

    • Joint interlingua

– Test case for portability

Portability

• Advantage: Interlingua

• Problem: Writing semantic grammars

– Domain dependent

– Requires time, effort, and expertise

• Approach:

– Grammar modularity

– Domain action learning

– Automatic/Interactive semantic grammar induction

Hybrid Stat/Rule-based Analysis

• Developing large-coverage semantic analysis grammars is time-consuming, so the analysis system is difficult to port to new domains

• “low-level” argument grammars are more domain-independent: contain many concepts that are used across domains: time, location, prices, etc.

• “high-level” domain-actions are domain-specific, must be redeveloped for each new domain: give-info+onset+symptom

• Tagging data sets with interlingua representations is less time-consuming, and is needed anyway for system development

Hybrid Rule/Stat Approach

• Combines grammar-based and statistical approaches to analysis:

– Develop semantic grammars for phrase-level arguments that are more portable to new domains

– Use statistical machine learning techniques for classifying into domain-actions

• Porting to a new domain requires:

– developing argument parse rules for new domain

– tagging training set with domain-actions for new domain

– training the classifiers for domain-actions on the tagged data

The Hybrid Analysis Process

• Parse an utterance for arguments

• Segment the utterance into sentences

• Extract features from the utterance and the single best parse output

• Use a learned classifier to identify the speech act

• Use a learned classifier to identify the concept sequence

• Combine into a full parse

Argument Parsing

The SOUP parser produces a forest of parse trees that cover as much of the input as possible

The parse forest can be a mixture of trees allowed by any of the grammars

Only the best parse is used for further processing

Argument Parse Example

We have a double room available for you at twenty-three thousand five hundred yen

[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
[arg-party:for-whom=]::ARG ( for [you] ( you ) )
[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )

Automatic Classification of Domain Actions

• Train classifiers for speech acts and concepts

• Training data: utterances labeled with speech act, concepts, and best argument parse

• Input features

– n most common words

– Arguments and pseudo-arguments in best parse

– Speaker

– Predicted speech act (for concept classifier)
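The feature set above can be sketched as binary features fed to a nearest-neighbour lookup, in the spirit of memory-based learning. This is not TiMBL and uses no feature weighting; the distance is a plain count of differing features, and the training utterances, argument labels, and helper names are made up for illustration.

```python
from collections import Counter


def most_common_words(utterances, n):
    """Vocabulary of the n most frequent words in the training utterances."""
    counts = Counter(w for words in utterances for w in words)
    return {w for w, _ in counts.most_common(n)}


def extract_features(words, arg_labels, speaker, vocab, speech_act=None):
    """Binary features: frequent words, argument labels from the best parse, speaker."""
    feats = {f"w:{w}" for w in words if w in vocab}
    feats |= {f"arg:{a}" for a in arg_labels}
    feats.add(f"spk:{speaker}")
    if speech_act is not None:  # only supplied to the concept classifier
        feats.add(f"sa:{speech_act}")
    return frozenset(feats)


class MemoryBasedClassifier:
    """Stores all training examples; predicts by majority vote of the k nearest."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, examples):
        self.memory = list(examples)

    def predict(self, feats):
        # Distance = number of features present in one example but not the other.
        ranked = sorted(self.memory, key=lambda ex: len(ex[0] ^ feats))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


train_raw = [
    ("we have a double room available".split(), ["availability", "room-type"], "agent", "give-information"),
    ("do you have a single room".split(), ["room-type"], "client", "request-information"),
    ("is breakfast included".split(), [], "client", "request-information"),
]
vocab = most_common_words([words for words, _, _, _ in train_raw], n=20)

clf = MemoryBasedClassifier(k=1)
clf.fit([(extract_features(w, a, s, vocab), label) for w, a, s, label in train_raw])

query = extract_features("we have two single rooms available".split(),
                         ["availability", "room-type"], "agent", vocab)
predicted = clf.predict(query)
print(predicted)
```

A concept classifier would be trained the same way, with the predicted speech act passed in through the `speech_act` parameter as an extra feature.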

Full Parse Example

We have a double room available for you at twenty-three thousand five hundred yen

give-information+availability+room (
  [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
  [arg-party:for-whom=]::ARG ( for [you] ( you ) )
  [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
  [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
)
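The final combination step can be sketched as string assembly: the predicted speech act and concept sequence are joined into a domain action, which then heads the argument parses. The function names are hypothetical and the argument parses are shortened versions of the ones in the example.

```python
def make_domain_action(speech_act, concepts):
    """Join a speech act and a concept sequence with '+', as in the IF notation."""
    return "+".join([speech_act, *concepts])


def split_domain_action(domain_action):
    """Inverse of make_domain_action: first element is the speech act."""
    speech_act, *concepts = domain_action.split("+")
    return speech_act, concepts


def combine_full_parse(speech_act, concepts, argument_parses):
    """Wrap the argument parses under the classified domain action."""
    head = make_domain_action(speech_act, concepts)
    return f"{head} ( {' '.join(argument_parses)} )"


argument_parses = [
    "[arg-party:for-whom=]::ARG ( for [you] ( you ) )",
    "[currency=] ( [yen] ( yen ) )",
]
full_parse = combine_full_parse("give-information", ["availability", "room"], argument_parses)
print(full_parse)
```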

Classification Results Using Memory-based (TiMBL) Classifiers

[Figure: Classification Accuracy (16-fold Cross Validation). Mean accuracy (y-axis, 0 to 0.8) against training set size (x-axis: 500, 1000, 2000, 3000, 4000, 5000, 6009) for three series: Speech Act, Concept Sequence, Domain Action.]
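For reference, mean accuracy under k-fold cross-validation can be computed as sketched below. How the folds were actually drawn for the reported results is not stated, so the round-robin split is an assumption, and the majority-class baseline and synthetic labels exist only to make the example runnable.

```python
from collections import Counter


def kfold_indices(n_items, k):
    """Round-robin split of item indices into k folds (requires n_items >= k)."""
    return [list(range(i, n_items, k)) for i in range(k)]


def cross_validate(examples, k, train_fn, predict_fn):
    """Mean accuracy over k folds; examples are (features, label) pairs."""
    accuracies = []
    for held_out in kfold_indices(len(examples), k):
        held = set(held_out)
        train = [ex for i, ex in enumerate(examples) if i not in held]
        model = train_fn(train)
        correct = sum(predict_fn(model, examples[i][0]) == examples[i][1] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def train_majority(train):
    """Baseline 'model': the most frequent label in the training portion."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_majority(model, _features):
    return model


# Synthetic data: every fourth item is a request, the rest give information.
examples = [(i, "request-information" if i % 4 == 0 else "give-information") for i in range(32)]
mean_accuracy = cross_validate(examples, 16, train_majority, predict_majority)
print(mean_accuracy)
```

Any classifier exposing a train and a predict function can be plugged in place of the baseline.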

Status and Open Research

• Preliminary analysis engine implemented, currently used for travel domain in NESPOLE!

• Areas for further research and development:

– Explore a variety of classifiers

– Explore features for domain-action classification

– Classification compositionality: how to classify the components of the domain-action separately and combine them?

– Taking advantage of additional knowledge sources: the interlingua specification, dialogue context

– Better address segmentation of utterance into DAs

Automatic Induction of Semantic Grammars

• Seed grammar for a new domain has very limited coverage

• Corpus of development data tagged with interlingua representations available

• Expand the seed grammar by learning new rules for covering the same domain-actions

• First step: how well can we do with no human intervention?

Outline of Semantic Grammar Induction

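[Figure: semantic grammar induction diagram. Components shown: Parser, IF, Tree Matching, Linearization, Hypotheses Generation, Rules Induction, Rules Management, Knowledge, Seed Grammar, Learned Grammar.]

Example rule shown in the figure:

s[gi+onset+sym]
  ( [manner=] [sym-loc=] *+became [adj:sym-name=] )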

Human vs Machine Experiment

• Seed grammar

• Extended by a human

• Extended by automatic semantic grammar induction

Seed Grammar

[Figure: seed grammar modules (Cross Domain, Shared, Medical) with size labels: around 100 rules and 6000 lexical items; around 200 rules; around 600 rules and growing.]

Hello. My name is Sam.

I have a burning sensation in my foot.

A Parse Tree

[request-information+existence+body-state]::MED (
  WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) )
  HAVE-GET-FEEL::MED ( GET ( have ) )
  you
  HAVE-GET-FEEL::MED ( HAS ( had ) )
  [super_body-state-spec=]::MED ( [body-state-spec=]::MED (
    ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) )
    BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )

Manual Grammar Development

• About five additional days of development after the seed grammar was finalized

• Focusing on medical rules only

• Domain-independent rules remain untouched

Development and evaluation sets

• Development set: 133 sentences

– from one dialog

• Evaluation set: 83 sentences

– from two dialogs

– unseen speakers

– Only SDUs that could be manually tagged with a full IF according to the current specification were included.

Grading Procedure: Recall and Precision of IF Components

c:give-information+          speech act
existence+body-state         concepts
(body-state-spec=(pain,      top-level argument
identifiability=no),         sub-argument
body-location=               top-level argument
(inside=head))               sub-argument

• Recall – ignored if number of items is 0

• Precision – ignored if 0 out of 0
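A sketch of how per-component recall and precision with these "ignore" rules could be computed. It reads "ignored" as: recall is undefined when the reference has no items of that component, and precision is undefined when the system produced none. Averaging the defined scores over SDUs is an assumption, since the slide does not say how scores are aggregated; the sample argument names are illustrative.

```python
def recall_precision(reference, hypothesis):
    """Score one IF component of one SDU; None means 'ignored'."""
    ref, hyp = set(reference), set(hypothesis)
    matched = len(ref & hyp)
    recall = matched / len(ref) if ref else None       # nothing to recall
    precision = matched / len(hyp) if hyp else None    # "0 out of 0"
    return recall, precision


def mean_ignoring_none(scores):
    """Average the scores that were not ignored."""
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept) if kept else None


# Top-level arguments for three SDUs: (reference, system output).
sdus = [
    (["body-state-spec", "body-location"], ["body-state-spec"]),
    ([], []),
    (["body-location"], ["body-location", "time"]),
]
scores = [recall_precision(ref, hyp) for ref, hyp in sdus]
mean_recall = mean_ignoring_none([r for r, _ in scores])
mean_precision = mean_ignoring_none([p for _, p in scores])
print(mean_recall, mean_precision)
```

The same function would be applied separately to each component type: speech act, concept list, top-level arguments and values, and sub-level arguments and values.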

Human vs. Machine: Evaluation Results

Component          Metric      Seed   Extended   Learned
Speech Act         Recall      43.3   48.2       49.3
Concept List       Recall       2.2   10.1       32.5
Top-Level Args     Recall       0.0    7.2       29.6
Top-Level Values   Recall       0.0    8.3       29.8
Sub-Level Args     Recall       0.0   28.3       14.1
Sub-Level Values   Recall       1.2   28.3       14.1
Sub-Level Values   Precision    6.2   48.2       12.9

User Studies

• We conducted three sets of user tests

• Travel agent played by experienced system user

• Traveler is played by a novice and given five minutes of instruction

• Traveler is given a general scenario - e.g., plan a trip to Heidelberg

• Communication only via ST system, multi-modal interface and muted video connection

• Data collected used for system evaluation, error analysis and then grammar development

System Evaluation Methodology

• End-to-end evaluations conducted at the SDU (sentence) level

• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad

• OK = meaning of SDU comes across

• Perfect = OK + fluent output

• Bad = translation incomplete or incorrect
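The two figures reported in such evaluations, "OK+Perfect" and "Perfect", follow directly from the grade counts. The sketch below pools all judgments into one list; how grades from multiple graders were actually combined is not stated, so pooling is an assumption, and the sample grades are invented.

```python
from collections import Counter


def summarize(grades):
    """Return (fraction graded OK or Perfect, fraction graded Perfect)."""
    counts = Counter(grades)
    total = len(grades)
    acceptable = (counts["Perfect"] + counts["OK"]) / total
    perfect = counts["Perfect"] / total
    return acceptable, perfect


# One grade per (SDU, grader) judgment; invented data.
grades = ["Perfect", "OK", "Bad", "Perfect"]
acceptable, perfect = summarize(grades)
print(f"OK+Perfect: {acceptable:.0%}, Perfect: {perfect:.0%}")
```

Since Perfect implies OK, the "OK+Perfect" column counts every SDU whose meaning came across, fluent or not.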

August-99 Evaluation

• Data from latest user study - traveler planning a trip to Japan

• 132 utterances containing one or more SDUs, from six different users

• SR word error rate 14.7%

• 40.2% of utterances contain recognition error(s)
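For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below uses that standard definition; the scoring tool actually used for the 14.7% figure is not named on the slide, and the sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            ))
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("i would like a double room", "i would like double rooms")
print(f"{wer:.1%}")
```

In this example one word is deleted and one substituted, giving two errors over six reference words.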

Evaluation Results

Method             Output Language   OK+Perfect   Perfect
SOUP-Transcribed   English           74%          54%
SOUP-Recognition   English           59%          42%
SOUP-Transcribed   Japanese          77%          59%
SOUP-Recognition   Japanese          62%          45%
SOUP-Transcribed   German            70%          39%
SOUP-Recognition   German            58%          34%

Evaluation - Progress Over Time

Method OK+Perfect Perfect

Jan-99 Transcribed 69% 46%

Apr-99 Transcribed 70% 49%

Aug-99 Transcribed 74% 54%

Jan-99 Recognition 55% 36%

Apr-99 Recognition 57% 38%

Aug-99 Recognition 59% 42%

Current and Future Work

• Expanding the interlingua: covering descriptive as well as task-oriented sentences

• Developing the new portable approaches

• Development of the server-based architecture for supporting multiple applications:

– NESPOLE!: speech-MT for advanced e-commerce

– C-STAR: speech-to-speech MT over mobile phones

– LingWear: MT and language assistance on wearable devices

Students Working on the Project

• Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture

• Ben Han: Automatic Grammar Induction

• Alicia Tribble: Interlingua and grammar development for Medical Domain

• Joy Zhang, Erik Peterson: Chinese EBMT for LingWear

The JANUS Speech-MT Team

• Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel

• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson
