Speech and Language Processing:

Where have we been and where are we going?

Kenneth Ward Church

AT&T Labs-Research

[email protected]

www.research.att.com/~kwc

Eurospeech 2003 2

Where have we been? How To Cook A Demo

(After Dinner Talk at TMI-1992 & Invited Talk at TMI-2002)

• Great fun!

• Effective demos
  – Theater, theater, theater
  – Production quality matters
  – Entertainment >> evaluation
  – Strategic vision >> technical correctness

• Success/Catastrophe
  – Warning: demos can be too effective
  – Dangerous to raise unrealistic expectations

Message for After Dinner Talk

Message for After Breakfast Talk

Eurospeech 2003 7

Let’s go to the video tape! (Lesson: manage expectations)

• Lots of predictions
  – Entertaining in retrospect
  – Nevertheless, many of these people went on to very successful careers: president of MIT, Microsoft exec, etc.

1. Machine Translation (1950s) video
  – Classic example of a demo embarrassment in retrospect
2. Translating telephone (late 1980s) video
  – Pierre Isabelle pulled a similar demo because it was so effective
  – The limitations of the technology were hard to explain to the public
    • Though well understood by the research community
3. Apple (~1990) video
  – Still having trouble setting appropriate expectations
  – Factoid: the day of this demo, speech recognition was deployed at scale in the AT&T network, with significant lasting impact, but little media coverage
4. Andy Rooney (~1990): reset expectations video

Eurospeech 2003 8

Outline: Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate

2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: Search >> Compression & Dictation

Managing Expectations

Eurospeech 2003 9

Charles Wayne’s Challenge: Demonstrate Consistent Progress Over Time

• Controversial in 1980s
  – But not in 1990s
  – Though, some grumbling

• Benefits
  1. Agreement on what to do
  2. Limits endless discussion
  3. Helps sell the field
     • Manage expectations
     • Fund raising

• Risks (similar to benefits)
  1. All our eggs are in one basket (lack of diversity)
  2. Not enough discussion
     • Hard to change course
  3. Methodology burden

Managing Expectations

Eurospeech 2003 10

Hockey Stick Business Case

[Figure: $ vs. time (t), 2002 to 2004: Last Year, This Year, Next Year]

Eurospeech 2003 11

Moore’s Law: Ideal Answer
Where have we been and where are we going?

Eurospeech 2003 13

Where have we been and where are we going? Moore’s Law: Ideal Answer

Why different slopes?
1. Progress limited by physics
   – Disk seek: 10 years (normal inflation)
   – Disk capacity: 1 year (hyper-inflation)
2. Progress limited by investment
   – Case history: PCs improved faster than supercomputers (Cray)
     • PCs: larger market → more R&D
   – Irony: “Dis-economy of Scale”
   – Danny Hillis (Thinking Machines)
     • Computing is better (cheaper & faster) on smaller machines
   – PCs >> big iron
   – LAN routers >> 5ESS (big phone switch)
   – Economies of scale depend on size of market, not size of machine
     • Market: PC >> big iron (Economist View)
     • Machine: PC << big iron (CS View)

Physics & Investment → Rate of Progress in Speech & Language (and everything)

Normal Inflation

Hyper-Inflation
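The two slopes can be compared with a little compound-growth arithmetic. The following is a minimal Python sketch, assuming the slide’s time constants (10 years for disk seek, 1 year for disk capacity) are read as doubling times; the helper names are hypothetical. It also checks that the petabyte price drop in the outline ($2,000,000 → $2,000 in 10 years) implies roughly a one-year doubling time.

```python
import math


def improvement_factor(years: float, doubling_time_years: float) -> float:
    """Growth factor after `years` when performance doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


def annual_rate(doubling_time_years: float) -> float:
    """Equivalent compound annual growth rate (the 'money in the bank' view)."""
    return 2 ** (1 / doubling_time_years) - 1


def doubling_time_from_factor(factor: float, years: float) -> float:
    """Doubling time implied by an overall improvement `factor` achieved over `years`."""
    return years * math.log(2) / math.log(factor)


# Hyper-inflation: doubling every year (disk capacity, read as a doubling time).
capacity_per_decade = improvement_factor(10, 1)    # 1024x per decade
# Normal inflation: doubling every decade (disk seek, read as a doubling time).
seek_per_decade = improvement_factor(10, 10)       # 2x per decade
seek_interest_rate = annual_rate(10)               # about 7.2% per year
# Petabyte price: $2,000,000 -> $2,000 in 10 years is a 1000x drop.
petabyte_doubling_years = doubling_time_from_factor(2_000_000 / 2_000, 10)  # about 1 year
```

Under this reading, a tenfold difference in time constant becomes a 512-fold difference in progress after a single decade (1024x vs. 2x), which is why the two curves diverge so quickly.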


Eurospeech 2003 16

Evolution of Speech Coder Performance

[Figure: speech quality (Bad, Poor, Fair, Good, Excellent) vs. bit rate (kb/s) for ITU Recommendations, Cellular Standards (e.g., North American TDMA) and Secure Telephony, with 1980, 1990 and 2000 profiles]

Borrowed Slide: Rich Cox

Eurospeech 2003 20

Speech Coding (Telephony)

• More complicated than Moore’s Law
  – Many dimensions: bit rate, quality, complexity and delay
  – Quality ceiling (imposed by telephone standards)
    • Easy to reach the ceiling at high bit rates (≥ 8 kb/s)
    • More room for progress at low bit rates (≤ 8 kb/s)

• Moore’s Law time constant
  – Bit rates halve every decade (≤ 8 kb/s)
  – Relatively slow by Moore’s Law standards (not hyper-inflation)
    • Performance doubles every decade
    • Like disk seek or money in the bank (normal inflation)
  – Limited more by physics than investment

• Potential compression opportunity
  – At most 10x: 8 kb/s → 2 kb/s → 1 kb/s (?) → 50 bits per sec (??)
    • Entropy: 50 bits per sec (Roger Moore)
    • Speech (2 kb/s) >> text (2 bits/char): 10-1000 times more bits
  – Speech coding will not close this gap for the foreseeable future

Ceiling
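The bit-rate arithmetic behind the last bullets can be sketched as follows. The 2 bits/char, 2 kb/s, 8 kb/s, 50 bits/s and one-halving-per-decade figures come from the slides; the speaking rate (150 words per minute) and word length (6 characters including the space) are assumptions added for illustration, and the function names are hypothetical.

```python
import math


def years_to_reach(start_bps: float, target_bps: float, halving_time_years: float = 10) -> float:
    """Years needed if the bit rate halves every `halving_time_years` (slide: every decade)."""
    return halving_time_years * math.log2(start_bps / target_bps)


def text_bits_per_second(words_per_minute: float = 150, chars_per_word: float = 6,
                         bits_per_char: float = 2) -> float:
    """Bit rate of a transcript; speaking rate and word length are assumptions."""
    return words_per_minute * chars_per_word * bits_per_char / 60


text_bps = text_bits_per_second()            # 30 bits/s under these assumptions
speech_over_text = 2000 / text_bps           # 2 kb/s coded speech vs. its transcript
years_8k_to_2k = years_to_reach(8000, 2000)  # 20 years at one halving per decade
years_8k_to_50 = years_to_reach(8000, 50)    # about 73 years to reach 50 bits/s
```

Under these assumptions coded speech carries roughly 67 times more bits than its transcript, inside the slide’s 10-1000 range, and even the 50 bits/s entropy estimate would take about seven decades to reach at one halving per decade.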

Eurospeech 2003 21

Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law
   • Speech Coding
   • Reducing Speech Recognition Error Rates

2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: Search >> Compression & Dictation

Eurospeech 2003 22

[Figure: error rate vs. date (15 years)]

Moore’s Law Time Constant:
• 10x improvement per decade
• Limited by R&D investment (not physics)

Borrowed Slide: Audrey Le (NIST)
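A “10x improvement per decade” time constant translates into an error rate that halves about every three years. The following minimal Python sketch shows that conversion and the naive extrapolation it invites. The 30% starting error rate is a hypothetical value, not one read off the NIST chart. The function names are also illustrative, and the projection is only as good as the assumption that the trend continues.

```python
import math


def halving_time_years(factor_per_decade: float = 10.0) -> float:
    """Years for the error rate to halve, given a constant improvement factor per decade."""
    return 10 * math.log(2) / math.log(factor_per_decade)


def extrapolate_error_rate(current: float, years: float, factor_per_decade: float = 10.0) -> float:
    """Naive extrapolation: the error rate falls by `factor_per_decade` every 10 years."""
    return current / factor_per_decade ** (years / 10)


t_half = halving_time_years()                    # about 3.0 years
in_ten_years = extrapolate_error_rate(0.30, 10)  # hypothetical 30% now -> 3% in a decade
in_five_years = extrapolate_error_rate(0.30, 5)  # about 9.5%
```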

Eurospeech 2003 23

Milestones in Speech and Multimodal Technology Research

[Figure: timeline, 1962 to 2002, progressing from small-vocabulary, acoustic-phonetics-based isolated-word recognition (filter-bank analysis, time normalization, dynamic programming); through medium-vocabulary, template-based systems (isolated words, connected digits, continuous speech; pattern recognition, LPC analysis, clustering algorithms, level building); large-vocabulary, statistical-based systems (connected words, continuous speech; hidden Markov models, stochastic language modeling); and large-vocabulary syntax and semantics (continuous speech, speech understanding; stochastic language understanding, finite-state machines, statistical learning); to very-large-vocabulary semantics, multimodal dialog and TTS (spoken dialog, multiple modalities; concatenative synthesis, machine learning, mixed-initiative dialog)]

Borrowed Slide

Consistent improvement over time, but unlike Moore’s Law, hard to extrapolate (predict the future)

Eurospeech 2003 24

Speech-Related Technologies: Where will the field go in 10 years?

Niels Ole Bernsen (ed)

2003 Useful speech recognition-based language tutor

2003 Useful portable spoken sentence translation systems

2003 First pro-active spoken dialogue with situation awareness

2004 Satisfactory spoken car navigation systems

2005 Small-vocabulary (> 1000 words) spoken conversational systems

2006 Multiple-purpose personal assistants (spoken dialog, animated characters)

2006 Task-oriented spoken translation systems for the web

2006 Useful speech summarization systems in top languages

2008 Useful meeting summarization systems

2010 Medium-size vocabulary conversational systems

Eurospeech 2003 25

Where have we been and where are we going? Consistent Progress over Time

[Figure: $ vs. time (t), 2002 to 2004, with curves labeled Physics, Investment, and Physics and Investment; panels labeled “Extrapolation/Prediction is Applicable” and “Extrapolation/Prediction is Not Applicable”]

Manage Expectations

Eurospeech 2003 26

Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate

2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: Search >> Compression & Dictation

Eurospeech 2003 30

Progress (or Oscillating Fads)?

It has been claimed that recent progress was made possible by empiricism.

• 1950s: Empiricism was at its peak
  – Dominating a broad set of fields
    • Ranging from psychology (Behaviorism)
    • To electrical engineering (Information Theory)
  – Psycholinguistics: Word frequency norms (correlated with reaction time, errors)
    • Word association norms (priming): bread and butter, doctor / nurse
  – Linguistics/psycholinguistics: focus on distribution (correlate of meaning)
    • Firth: “You shall know a word by the company it keeps”
    • Collocations: strong tea v. powerful computers

• 1970s: Rationalism was at its peak
  – with Chomsky’s criticism of n-grams in Syntactic Structures (1957)
  – and Minsky and Papert’s criticism of neural networks in Perceptrons (1969)

• 1990s: Revival of Empiricism
  – Availability of massive amounts of data (a popular argument, even before the web)
    • “More data is better data”
    • Quantity >> Quality (balance)
  – Pragmatic focus:
    • What can we do with all this data?
    • Better to do something than nothing at all
  – Empirical methods (and focus on evaluation): Speech → Language

• 2010s: Revival of Rationalism (?)

Consistent progress?
• Periodic signals are continuous
• Support extrapolation/prediction
• Progress? Consistent progress?

Extrapolation/Prediction: Applicable?
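Firth’s slogan and the strong tea / powerful computers contrast can be made concrete with pointwise mutual information (PMI), one common word-association measure. The following is a minimal Python sketch. It assumes a made-up five-sentence corpus and counts only adjacent word pairs. Real collocation studies use far larger corpora and wider windows, and PMI is unreliable for rare pairs.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """PMI for adjacent word pairs: log2(P(x, y) / (P(x) * P(y))).

    Probabilities are relative frequencies over the corpus; pairs that never
    occur get no entry (their PMI would be minus infinity).
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus, invented for illustration.
corpus = [
    "she drank strong tea",
    "he ordered strong tea",
    "they bought powerful computers",
    "we need powerful computers",
    "strong arguments and powerful arguments",
]
scores = pmi_scores(corpus)
```

In this toy corpus “strong tea” scores higher than “strong arguments”, and “powerful tea” never co-occurs at all, which is the distributional signal the slide attributes to the 1950s empiricist tradition.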

Eurospeech 2003 34

Speech → Language: Has the pendulum swung too far?

• What happened between TMI-1992 and TMI-2002 (if anything)?
• Have empirical methods become too popular?
  – Has too much happened since TMI-1992?
• I worry that the pendulum has swung so far that
  – We are no longer training students for the possibility
    • that the pendulum might swing the other way
• We ought to be preparing students with a broad education including:
  – Statistics and Machine Learning
  – as well as Linguistic Theory
• History repeats itself:
  – 1950s: empiricism
  – 1970s: rationalism (empiricist methodology became too burdensome)
  – 1990s: empiricism
  – 2010s: rationalism (empiricist methodology is burdensome, again)

Plays well at Machine Translation conferences

Mark Twain; bad idea then and still a bad idea now

Eurospeech 2003 35

Dimension | Rationalism | Empiricism
Well-known advocates | Chomsky, Minsky | Shannon, Skinner, Firth, Harris
Model | Competence Model | Noisy Channel Model
Contexts of interest | Phrase-Structure | N-Grams
Goals | All and Only; Explanatory; Theoretical | Minimize Prediction Error (Entropy); Descriptive; Applied
Linguistic generalizations | Agreement & Wh-movement | Collocations & Word Associations
Parsing strategies | Principle-Based, CKY (Chart), ATNs, Unification | Forward-Backward (HMMs), Inside-Outside (PCFGs)
Applications | Understanding (who did what to whom) | Recognition (noisy channel applications)
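The noisy channel model in the empiricist column decodes an observation by choosing the source w that maximizes P(w) · P(observation | w). The following is a minimal Python sketch using a spelling-correction flavor. All probabilities are hypothetical numbers invented for illustration, and the function name is not from any particular library.

```python
import math


def noisy_channel_decode(observed, candidates, prior, channel):
    """Pick the source w maximizing P(w) * P(observed | w), scored in log space.

    `prior` maps w -> P(w) (the language model); `channel` maps
    (observed, w) -> P(observed | w) (the channel/error model).
    Candidates with zero probability under either model are skipped.
    """
    best, best_score = None, -math.inf
    for w in candidates:
        p_w = prior.get(w, 0.0)
        p_o_given_w = channel.get((observed, w), 0.0)
        if p_w == 0.0 or p_o_given_w == 0.0:
            continue
        score = math.log(p_w) + math.log(p_o_given_w)
        if score > best_score:
            best, best_score = w, score
    return best


# Hypothetical probabilities, for illustration only.
prior = {"actress": 3e-5, "across": 3e-4, "acres": 3e-5}
channel = {
    ("acress", "actress"): 1e-4,
    ("acress", "across"): 5e-6,
    ("acress", "acres"): 3e-5,
}
best = noisy_channel_decode("acress", prior.keys(), prior, channel)
```

Here the language model alone would prefer “across” (highest prior), but the channel model tips the decision to “actress”. Speech recognition applies the same decision rule with an acoustic model playing the role of the channel.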

Eurospeech 2003 36

Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate

2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: Search >> Compression & Dictation

Eurospeech 2003 39

Meeting Demand for Petabytes
Bet: Speech >> Text
(because we aren’t going to solve all “speech” problems)

• Moore’s Law → More and More Supply
  – Disks, Memory, Network Bandwidth, everything…
  – Petabytes are coming: $2,000,000 (today) → $2,000 (in 10 years)

• Can demand keep up?
  – If not, revenues will collapse → tech meltdown
  – Much worse than the Dot-Bomb…

• Ans1: no problem
  – Demand has always kept up
  – Pundits have never been able to explain why
    • Thomas J. Watson (1943): “I think there is a world market for maybe five computers” www.wikipedia.org/wiki/Thomas+J.+Watson
  – But if you build it, they will come

• Ans2: big problem (prices for PCs & Networks are collapsing)
  – Demand is everything
  – Anyone (even a dot-com) can build a network
  – But the challenge is to sell it
  – Need a killer app (more minutes on the network)

Discontinuity

Eurospeech 2003 40

How much is a Petabyte? (10¹⁵ bytes)

• Question from execs:
  – How do I explain to a lay audience
    • How much is a petabyte
    • And why everyone will buy lots of them

• Wrong answer:
  – 10⁶ is a million (a floppy disk/email msg)
  – 10⁹ is a billion (a billion here, a billion there…)
  – 10¹² is a trillion (the US debt)
  – 10¹⁵ is a zillion (an unimaginably large #)

Eurospeech 2003 42

How much is a Petabyte? Some more wrong answers

• Goal: create demand for a petabyte/lifetime
  – ≈ 10¹⁵ bytes/100 years ≈ 18 megabytes/minute
  – Text: 18,000 pages/min
  – Speech: 317 telephone channels for 100 years per capita

• Text won’t do it
  – Speech probably won’t either, but it is closer
  – DVD video will (1.8 gigabytes/hour = 1.6 petabytes/lifetime), but
    • Too much opportunity for compression
    • Not enough demand for Picture Phone (privacy concerns)

• Bank on speech recognition not working too well
  – Can’t afford big improvements in compression:
    • Speech rates → Text rates
  – Fortunately, that won’t happen
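The per-capita figures on this slide can be re-derived with a few lines of arithmetic. The following is a minimal Python sketch. Several values are assumptions, not figures stated in the talk: an 8 kb/s telephone-channel rate, about 1 KiB of text per page, and “megabytes” read as 2^20 bytes. They were chosen because they reproduce the slide’s numbers.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
LIFETIME_SECONDS = 100 * SECONDS_PER_YEAR   # "lifetime" = 100 years, as on the slide
PETABYTE = 10**15                           # bytes

bytes_per_second = PETABYTE / LIFETIME_SECONDS         # about 316,900 bytes/s
mib_per_minute = bytes_per_second * 60 / 2**20         # about 18.1 "megabytes"/minute
pages_per_minute = bytes_per_second * 60 / 1024        # assumes ~1 KiB of text per page
telephone_channels = bytes_per_second * 8 / 8_000      # assumes 8 kb/s coded speech
dvd_bytes_per_lifetime = 1.8e9 * 24 * 365.25 * 100     # 1.8 GB/hour, nonstop, for 100 years
```

The 317-channel figure only works out for compressed (roughly 8 kb/s) speech; at uncompressed 64 kb/s telephone rates the same petabyte would fill about 40 channels.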

Eurospeech 2003 43

New Research Challenges

• New Priorities
  – Increase demand for space >> Data entry
  – New Killer Apps
    • Search >> Dictation
    • Speech Google!
  – Data mining

• Old Priorities
  – Dictation application dates back to days of dictation machines
  – Speech recognition has not displaced typing
    • Speech recognition has improved
    • But typing skills have improved even more
  – My son will learn typing in 1st grade
  – Secretaries rarely take dictation
  – Dictation machines are history
    • My son may never see one
    • Museums have slide rules and steam trains
    • But dictation machines?

Eurospeech 2003 44

Data Mining & Call Centers: An Intelligence Bonanza

• Some companies are collecting information with technology designed to monitor incoming calls for service quality.

• Last summer, Continental Airlines Inc. installed software from Witness Systems Inc. to monitor the 5,200 agents in its four reservation centers.

• But the Houston airline quickly realized that the system, which records customer phone calls and information on the responding agent's computer screen, also was an intelligence bonanza, says André Harris, reservations training and quality-assurance director.

In Search of PetaByte Databases

Jim Gray

Tony Hey

Borrowed Slide

Eurospeech 2003 46

Personal 100 GB today
The Personal Petabyte (someday)

• It’s coming (2M$ today…2K$ in 10 years)

• Today the pack rats have ~10-100 GB
  – 1-10 GB in text (eMail, PDF, PPT, OCR…)
  – 10-50 GB tiff, mpeg, jpeg,…
  – Some have 1 TB (voice + video)

• Video can drive it to 1 PB

• Online PB affordable in 10 years

• Get ready: tools to capture, manage, organize, search, display will be big app

Borrowed Slide

Text won’t do it; Speech won’t either

Eurospeech 2003 47

300 TB (cooked)
Hotmail / Yahoo

• Clone front ends ~10,000 @hotmail

• Application servers
  – ~100 @hotmail
  – Get mail box
  – Get/put mail
  – Disk bound

• ~30,000 disks

• ~20 admins

Borrowed Slide

Cost of storage: People

Per Capita Demand: Tiny

Eurospeech 2003 48

AOL (msn) (1PB?)

• 10 B transactions per day (10% of that)

• Huge storage

• Huge traffic

• Lots of eye candy

• DB used for security/accounting

• GUESS AOL is a petabyte
  – (40M x 10MB = 400 x 10¹²)

Borrowed Slide

Per Capita Demand: Tiny

Eurospeech 2003 49

Google: 1.5PB as of last spring

• 8,000 no-name PCs
  – Each 1/3U, 2 x 80 GB disk, 2 CPU, 256MB RAM

• 1.4 PB online
• 2 TB RAM online
• 8 TeraOps
• Slice-price is 1K$, so 8M$
• 15 admins (!) (== 1/100TB)

Borrowed Slide

Per Capita Demand: Tiny

2001

Cost of storage: People

Eurospeech 2003 50

Digital Immortality: Gordon Bell & Jim Gray (2000)

Estimated Lifetime Storage Requirements

Data type | Per day | Per lifetime
email, papers, text | 0.5 MB | 15 GB
photos | 2 MB | 150 GB
speech | 40 MB | 1.2 TB
music | 60 MB | 5.0 TB
video-lite (200 Kb/s) | 1 GB | 100 TB
DVD video (4.3 Mb/s = 1.8 GB/hour) | 20 GB | 1 PB
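The table’s “4.3 Mb/s = 1.8 GB/hour” conversion can be checked directly. The following is a minimal Python sketch. It assumes “GB” is read as 2^30 bytes (GiB), which reproduces the 1.8 figure, whereas decimal gigabytes give about 1.9. The function names are hypothetical.

```python
def gib_per_hour(bits_per_second: float) -> float:
    """Data produced per hour by a constant-rate stream, in GiB (2**30 bytes)."""
    return bits_per_second / 8 * 3600 / 2**30


def gb_per_hour(bits_per_second: float) -> float:
    """Same quantity in decimal gigabytes (10**9 bytes)."""
    return bits_per_second / 8 * 3600 / 1e9


dvd_gib = gib_per_hour(4.3e6)          # about 1.80, matching the table's "1.8 GB/hour"
dvd_gb = gb_per_hour(4.3e6)            # about 1.94
video_lite_gib = gib_per_hour(200e3)   # about 0.084
```

At these rates, the table’s 20 GB/day of DVD video corresponds to roughly 11 hours of video per day.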

Eurospeech 2003 51

Future of Tech Industry Depends On…

• Supply running into a (physical) limit
  – Moore’s Law breaking down
  – And little progress on compression

• Demand keeping up
  – If we build it, they will come…

• Bell & Gray underestimating demand by a lot
  – Everyone wanting lots and lots of speech
  – Everyone wanting lots of video
  – A miracle (the fat lady might sing…)
  – Big progress on searching speech & video: Best Bet!

Not Likely

Not Likely

Not Optimistic

Eurospeech 2003 52

Bait and Switch Strategy
www.elsnet.org

• Bait: public Internet
  – Large, sexy, available, rich hypertext structure

• Switch: as large as the web is
  – There are larger & more valuable private repositories
    • Private Intranets & telephone networks
  – Exclusivity → Value
    • No one cares about data that everyone can have
    • Just as Groucho Marx doesn’t want to be in a club that…

• Strategy: Use the public Internet to develop, test and socialize new ways to extract value from large linguistic repositories
  – Value to society: Port solutions to private repositories

Eurospeech 2003 56

Switch: How Large is Large?

• Web → Renewed Excitement
  – Large, rich hypertext structure & publicly available
  – Ngram freqs: Google = 1000 * BNC
    • Google: 100 Billion Words
    • British National Corpus (BNC): 100 Million Words

• It is often said that the web is the largest repository, but…
  – Changes to copyright laws could unlock vast resources: www.lexisnexis.com

• Private Intranets and telephone networks >> Public Web
  – American Telephone Network (FCC): 1 line/person
    • Usage: 1 hour/day/line
    • Assume 1 sec ≈ 1 word → 10 Google collections/day
  – Currently, Intranets (data) ≈ telephones (voice)
    • But data is growing faster than voice
  – AT&T networks: 1 PB/day
    • Worldwide networks: tens of PB/day

1 TB (ngram freqs) or 1 PB (Gray)?

A lot of speech, but not PB per capita
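The "10 Google collections/day" figure can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: the usage (1 hour/day/line), the 1 sec ≈ 1 word rate and the corpus sizes come from the slide, while the number of US telephone lines (about 300 million, i.e. roughly one per person) is an assumption the slide does not state.

```python
# Back-of-the-envelope check of "10 Google collections/day".
US_LINES = 300_000_000           # ASSUMPTION: ~1 line/person, US population ~3e8
SECONDS_PER_LINE_PER_DAY = 3600  # slide: usage 1 hour/day/line
WORDS_PER_SECOND = 1             # slide: assume 1 sec ≈ 1 word
GOOGLE_WORDS = 100e9             # slide: Google ≈ 100 billion words
BNC_WORDS = 100e6                # slide: BNC ≈ 100 million words

words_per_day = US_LINES * SECONDS_PER_LINE_PER_DAY * WORDS_PER_SECOND
collections_per_day = words_per_day / GOOGLE_WORDS

print(f"{words_per_day:.2e} words/day")                     # 1.08e+12 words/day
print(f"{collections_per_day:.1f} Google collections/day")  # 10.8 Google collections/day
print(f"Google / BNC = {GOOGLE_WORDS / BNC_WORDS:.0f}")     # Google / BNC = 1000
```

The result is only as good as the line count: halving the assumed number of lines halves the answer, but it stays on the order of ten Google-sized collections per day.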

Eurospeech 2003 60

Privacy Concerns: Private Data is Private (Exclusivity Value)

• Data on private intranets cannot be distributed
  – And most telephone conversations cannot even be recorded
    • let alone distributed

• But attitudes are changing
  – It used to be considered rude to have an answering machine
  – Now it is considered rude not to have one

• Between answering machines and call centers, perhaps 10% of telephone traffic can be recorded (≈ 1 PB/day)
  – Customer expectation: call centers can retrieve recordings of previous calls based on content
    • New capabilities → new public policy
  – Video recording:
    • Expected in banks (ATMs)
    • Prohibited in rest rooms (except children’s YMCA locker room)
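The ≈ 1 PB/day figure for recordable traffic is consistent with the same style of arithmetic. A sketch under two labeled assumptions: about 300 million US lines (as in the earlier estimate), and uncompressed 64 kb/s telephone speech (G.711 PCM: 8,000 samples/s × 8 bits). The 1 hour/day/line and the 10% recordable fraction come from the slides.

```python
US_LINES = 300_000_000           # ASSUMPTION: ~1 line/person
SECONDS_PER_LINE_PER_DAY = 3600  # slide: 1 hour/day/line
BITS_PER_SECOND = 64_000         # ASSUMPTION: G.711 PCM (8,000 samples/s x 8 bits)
RECORDABLE_FRACTION = 0.10       # slide: perhaps 10% of telephone traffic
PETABYTE = 10**15                # decimal petabyte, in bytes

bytes_per_day = US_LINES * SECONDS_PER_LINE_PER_DAY * BITS_PER_SECOND // 8
recorded_pb_per_day = bytes_per_day * RECORDABLE_FRACTION / PETABYTE

print(f"all traffic: {bytes_per_day / PETABYTE:.2f} PB/day")  # all traffic: 8.64 PB/day
print(f"recordable:  {recorded_pb_per_day:.2f} PB/day")       # recordable:  0.86 PB/day
```

Swapping in a compressed codec rate such as 8 kb/s shrinks both totals by a factor of eight, so the estimate depends heavily on the bit-rate assumption.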

Eurospeech 2003 64

In the past, recording all this data would have been prohibitively expensive

• Thanks to Moore’s Law
  – Storage costs have been falling faster than transport
  – And will continue to do so for some time

• Even at current prices, transport >> storage
  – Transport: Long-distance telephone calls: 5 cents per minute of speech
  – Storage: Disk space: ½ cent per minute of speech

• If I am willing to pay for a call
  – I might as well keep the speech online forever

• Similar comments hold for data (web pages)
  – If I am willing to pay to fetch a web page
    • I might as well cache it for a long time
    • Why flush a page if there is any chance that it might be requested again?
  – Web caches → crawlers
    • Go find the pages that I might ask for and keep them forever

• Storage is cheap (compared to transport)
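The "keep it forever" argument can be phrased as a simple cost comparison. A minimal sketch, assuming a hypothetical keep-or-flush rule (not a policy stated in the talk): keep an item when the expected cost of fetching it again exceeds the cost of storing it. The 5 cent and ½ cent prices come from the slide; treating storage as a one-time cost is a simplifying assumption.

```python
TRANSPORT_CENTS_PER_MIN = 5.0  # slide: long-distance call, 5 cents per minute of speech
STORAGE_CENTS_PER_MIN = 0.5    # slide: disk space, 1/2 cent per minute of speech


def should_keep(p_reuse: float,
                transport_cost: float = TRANSPORT_CENTS_PER_MIN,
                storage_cost: float = STORAGE_CENTS_PER_MIN) -> bool:
    """Hypothetical rule: keep an item when the expected cost of fetching it
    again (p_reuse * transport_cost) exceeds the (one-time) cost of storing it."""
    return p_reuse * transport_cost > storage_cost


# Break-even reuse probability: below this, flushing is cheaper.
break_even = STORAGE_CENTS_PER_MIN / TRANSPORT_CENTS_PER_MIN
print(f"break-even reuse probability: {break_even:.0%}")  # break-even reuse probability: 10%
print(should_keep(0.2), should_keep(0.05))                # True False
```

As storage gets cheaper relative to transport, the break-even probability falls toward zero, which is the slide’s point: keep almost everything.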

Eurospeech 2003 70

Bait: Use Web to Establish Excitement: More data is better data

• Shocking at TMI-1992 (Bob Mercer)
  – but less so a decade later (Eric Brill)
  – Many researchers are finding that performance improves with corpus size, over the full range of sizes that are available.

• EMNLP-2002 Best paper (& CL): Using the Web to Overcome Data Sparseness, Keller et al.
  – For many tasks:
    • Language modelling
    • Predicting psycholinguistic judgements
  – Larger corpora (100B Google) >> Smaller corpora (100M BNC)

• Collecting more data is better than tricks for not collecting data
  – Smoothing, balance, etc.
  – Tricks have limited power:
    • Collecting x data with tricks ≈ collecting 10x data without tricks
  – Wish list: more papers measuring the power of various tricks
    • Was balancing the BNC (British National Corpus) worth the effort?
    • Should a corpus be balanced? (Oxford Debate, 1991)

• The rising tide of data will lift all boats!
  1. TREC Question Answering
  2. Collocations: http://labs1.google.com/sets

My spin: Google is displacing BNC just as PCs displaced Crays

Still find papers on “tiny” corpora

[Figure labels: Larger market share; More $$ for R&D; Better Moore’s Law Time Constant]
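Smoothing is one of the "tricks for not collecting data" listed above. As an illustration, here is a minimal sketch of the basic Good-Turing estimate (the Good-Turing smoothing credited to Gale later in this talk): an item seen r times gets the adjusted count r* = (r+1) N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times, and N_1 / N of the probability mass is reserved for unseen items. This sketch does not smooth the N_r values (as Gale and Sampson’s Simple Good-Turing does) and simply keeps the raw count when N_{r+1} is zero, so its adjusted counts are not renormalized; the toy counts are made up.

```python
from collections import Counter


def good_turing(counts):
    """Basic (unsmoothed) Good-Turing.

    counts: mapping item -> observed count r (r >= 1).
    Returns (adjusted counts r*, probability mass reserved for unseen items).
    """
    n_r = Counter(counts.values())   # N_r: number of items seen exactly r times
    total = sum(counts.values())     # N: total number of observations
    adjusted = {}
    for item, r in counts.items():
        if n_r[r + 1] > 0:
            adjusted[item] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            adjusted[item] = float(r)  # no N_{r+1} available: keep the raw count
    p_unseen = n_r[1] / total        # N_1 / N
    return adjusted, p_unseen


counts = {**dict.fromkeys("abcdef", 1), "g": 2, "h": 2, "i": 3}
adjusted, p_unseen = good_turing(counts)
print(round(adjusted["a"], 3), adjusted["g"], round(p_unseen, 3))  # 0.667 1.5 0.462
```

Singletons are discounted from 1 to about 0.67, and that discount is what pays for the 6/13 of the mass set aside for events never seen.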

Eurospeech 2003 71

The rising tide of data will lift all boats! TREC Question Answering & Google:

What is the highest point on Earth?

Eurospeech 2003 75

The rising tide of data will lift all boats! Acquiring Lexical Resources from Data:

Dictionaries, Ontologies, WordNets, Language Models, etc. http://labs1.google.com/sets

Google Sets results, one set per column:

Cat        cat     England    Japan
Dog        more    France     China
Horse      ls      Germany    India
Fish       rm      Italy      Indonesia
Bird       mv      Ireland    Malaysia
Rabbit     cd      Spain      Korea
Cattle     cp      Scotland   Taiwan
Rat        mkdir   Belgium    Thailand
Livestock  man     Canada     Singapore
Mouse      tail    Austria    Australia
Human      pwd     Australia  Bangladesh
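The talk does not say how Google Sets works internally. One simple way to get this kind of behavior is to score candidates by how often they appear in the same list as the seed items. The sketch below is a hypothetical co-occurrence method over made-up toy lists, not Google’s algorithm; it also reproduces the ambiguity visible in the table, where "cat" belongs both with animals and with Unix commands.

```python
from collections import Counter


def expand_set(seeds, lists, k=5):
    """Hypothetical set expansion: score each non-seed item by how many seed
    items share a list with it, summed over all lists; return the top k."""
    seeds = set(seeds)
    scores = Counter()
    for lst in lists:
        items = set(lst)
        overlap = len(items & seeds)
        if overlap == 0:
            continue  # list has nothing to do with the query
        for item in items - seeds:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]


# Toy "web lists" (made up for illustration).
LISTS = [
    ["cat", "ls", "rm"],
    ["ls", "cd", "mkdir"],
    ["cat", "dog", "horse"],
    ["dog", "horse", "fish"],
    ["ls", "rm", "cp"],
]
print(expand_set(["dog"], LISTS, k=1))               # ['horse']
print(sorted(expand_set(["ls", "rm"], LISTS, k=2)))  # ['cat', 'cp']
```

With more lists the scores become more reliable, which is the "rising tide" argument applied to lexical acquisition.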

Eurospeech 2003 76

Rising Tide of Data Lifts all Boats

• More data → better results
  – TREC Question Answering
    • Remarkable performance: Google and not much else
      – Norvig (ACL-02)
      – AskMSR (SIGIR-02)
  – Lexical Acquisition
    • Google Sets
      – We tried similar things
        » but with tiny corpora
        » which we called large

Switch: port these ideas to private repositories

Bait: use public web to create & socialize new ideas

Eurospeech 2003 77

Recommendations: Bait and Switch Strategy

• Strategy: Use the public Internet to develop, test and socialize new ways to extract value from large linguistic repositories
  – Value to society: Port solutions to private repositories

• Research papers:
  – Keep up the good work!
  – There is already considerable interest in evaluation of new ideas on corpora (public repositories)
  – There will be more interest in
    • How well methods port to new corpora
    • How well performance scales with size
  – Hopefully corpus size helps
    • But of course, all the data in the world will not solve all the world’s problems
  – Need to understand when more data will help
    • And when it is better to do something else
      – Revival of Rationalism (Linguistics)

Switch

Bait

Eurospeech 2003 78

More Recommendations: Bait and Switch Strategy

• Infrastructure
  – In addition to traditional public repositories (large)
    • Web data, data collection efforts such as LDC
  – We ought to think more about private repositories (even larger)
    • Most of us do not keep voice mail for long
      – But I have been using Scanmail to copy my voice mail to email
      – And like many, I keep email online for a long time
    • Private repositories would be much larger if
      – it were more convenient to capture private data
      – and there were obvious value in doing so.
    • Currently, tools for public repositories (e.g., Google)
      – are better than comparable tools for private data (e.g., searching email)
    • Better search tools (email, speech & video) → larger private repositories
    • New priorities (consume space) → new killer apps
      – Search (consumes space) >> Dictation (data entry) & Compression

Switch

Bait

Eurospeech 2003 79

Summary: Where have we been and where are we going?

• 1970s: Hot debate: knowledge v. data intensive methods
  – People think about what they can afford to think about
  – Data was expensive
    • Only the richest industrial labs could play
    • Beyond the reach of most universities
    • Victor Zue dreams of having an hour of speech online (with annotations)

• 1990s: Revival of Empiricism: More data is better data!
  – Everyone can afford to play (but still expensive)
  – Linguistic Data Consortium (LDC) → Web
  – Evaluation, evaluation, evaluation demonstrates consistent progress over time, but not as convincingly as Moore’s Law
  – Data intensive: method of choice
    • Pendulum swings (too) far
    • Is this progress, or is the pendulum about to swing back the other way?

• 2010s: Petabytes everywhere (be careful what you ask for)
  – Big problem: Supply >> Demand → tech meltdown (??)
  – No problem: Demand has always kept up → new killer apps
    • Search (consumes space) >> dictation (data entry) & compression
    • Video >> Speech >> Text

Demonstrate consistent progress over time

Oscillations

Discontinuities

More realistic expectations

Don’t see how to consume PB per capita

Eurospeech 2003 80

Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate
   • Time constant limited by: physics and/or R&D investment

2. History repeats itself:
   • Mark Twain; bad idea then and still a bad idea now
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities:
   • Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: data entry → create demand for petabytes
     – New Killer Apps: Search (creates demand) >> Compression & Dictation

Backup

Eurospeech 2003 82

Speech Language

Shannon’s Noisy Channel Model

• I → Noisy Channel → O

• $I' \approx \arg\max_I \Pr(I \mid O) = \arg\max_I \Pr(I)\,\Pr(O \mid I)$

Language Model

Word        Rank   More likely alternatives
We          9      The This One Two A Three Please In
need        7      are will the would also do
to          1
resolve     85     have know do…
all         9      The This One Two A Three Please In
of          2      The This One Two A Three Please In
the         1
important   657    document question first…
issues      14     thing point to

Channel Model

Application                           Input        Output
Speech Recognition                    writer       rider
OCR (Optical Character Recognition)   all          a1l
Spelling Correction                   government   goverment

Language Model: Application Independent
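The decoding rule above can be sketched for the spelling-correction and OCR rows of the table. Assumptions (not from the talk): toy unigram counts stand in for the language model Pr(I), and the channel model Pr(O|I) is taken to be p_err raised to the edit distance between intended and observed strings, with a hypothetical p_err = 0.01. Real systems estimate both models from data.

```python
import math

# Hypothetical toy language model Pr(I): unigram counts (illustrative numbers only).
COUNTS = {"government": 500, "governor": 50, "movement": 300, "all": 2000, "ail": 5}
TOTAL = sum(COUNTS.values())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def channel_logprob(observed: str, intended: str, p_err: float = 0.01) -> float:
    """Assumed channel model: log Pr(O|I) = (number of edits) * log(p_err)."""
    return edit_distance(intended, observed) * math.log(p_err)


def lm_logprob(word: str) -> float:
    """Toy language model: log Pr(I) from unigram counts."""
    return math.log(COUNTS[word] / TOTAL)


def decode(observed: str) -> str:
    """I' = argmax_I Pr(I) Pr(O|I), computed in log space."""
    return max(COUNTS, key=lambda w: lm_logprob(w) + channel_logprob(observed, w))


print(decode("goverment"))  # government
print(decode("a1l"))        # all
```

Only the channel model would change for speech or OCR; the language model can be shared, which is the "application independent" point on the slide.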

Eurospeech 2003 83

Speech Language

Using (Abusing) Shannon’s Noisy Channel Model: Part of Speech Tagging and Machine Translation

• Speech
  – Words → Noisy Channel → Acoustics

• OCR
  – Words → Noisy Channel → Optics

• Spelling Correction
  – Words → Noisy Channel → Typos

• Part of Speech Tagging (POS):
  – POS → Noisy Channel → Words

• Machine Translation: “Made in America”
  – English → Noisy Channel → French
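For the part-of-speech row, the "input" is the tag sequence and the "output" is the word sequence, so decoding means finding the tags that maximize Pr(tags) Pr(words | tags). A common way to do this is a bigram hidden Markov model decoded with the Viterbi algorithm. The sketch below uses a tiny hand-set model whose probabilities are invented for illustration, not estimated from a corpus.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]
# Hypothetical toy parameters (invented for illustration, not estimated from data).
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}  # Pr(first tag)
TRANS = {                                        # language model: Pr(tag | previous tag)
    "DET":  {"DET": 0.01, "NOUN": 0.9, "VERB": 0.09},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {                                         # channel model: Pr(word | tag)
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walks": 0.1, "park": 0.5},
    "VERB": {"walks": 0.8, "park": 0.2},
}


def logp(p, floor=1e-12):
    """Log probability with a small floor, so unseen events are unlikely rather than impossible."""
    return math.log(max(p, floor))


def viterbi(words):
    """Most likely tag sequence under the toy bigram HMM."""
    # best[i][tag] = (best log score of a tag path ending in `tag` at word i, previous tag)
    best = [{t: (logp(START[t]) + logp(EMIT[t].get(words[0], 0.0)), None) for t in TAGS}]
    for w in words[1:]:
        row = {}
        for t in TAGS:
            score, prev = max((best[-1][s][0] + logp(TRANS[s][t]), s) for s in TAGS)
            row[t] = (score + logp(EMIT[t].get(w, 0.0)), prev)
        best.append(row)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "dog", "walks"]))  # ['DET', 'NOUN', 'VERB']
```

"walks" is ambiguous in the toy channel model (NOUN or VERB); the tag-transition model is what picks VERB after a noun.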

Eurospeech 2003 84

I am going to try to avoid making predictions like these because…

• Too falsifiable
• Appearance of conflicts of interest
  – Sound like you are trying to raise money for your favorite stuff
• Committees do what committees do
  – Union of all (represented) positions = no position
  – Advocate what the members are currently working on
    • Rarely establish new strategic direction
• Boring (too obviously correct)

Eurospeech 2003 85

Predictions: Where are we going? Change the subject (engage in meta discussion)

• Set unrealistic expectations (plenty of examples)
  – Sound like you are trying to raise money for your favorite stuff
    • And that you have lost touch with reality
• Come up short (fewer examples)
• Sound like you’re over the hill (old fogies session at Coling)
  – Kids these days don’t get it
  – Everyone should still be working on
    • what we thought was important when we were kids
  – Dress up old-style thinking (empiricism/rationalism)
    • with current fashion (web)

Meta discussion: consistent progress, history repeating itself, discontinuities

Come up with a new angle: bounds
• Lower bound: we will solve such and such (x)
  – Extrapolations based on Moore’s Law
• Upper bound: we won’t solve x (soon/ever)
  – e.g., pass Turing Test, compress speech down to text rates
  – And you can bank on it → good apps based on assumption x can’t be done

Eurospeech 2003 86

Breaking Through Automation Barriers

[Figure (illustrative; borrowed slide). Axes: Complexity of Services; Complexity of User Interaction; Extent of Automation. Labels: Traditional IVR, Word Spotting, Agents, Advanced ASR, Natural Language Dialog]

Eurospeech 2003 87

Past, Present, Future…

MATCH: Multimodal Access To City Help

• 1990+: Keyword spotting, handcrafted grammars, no dialogue
  – Directory Assistance, VRCP
  – Constrained speech; minimal data collection; manual design
• 1995+: Medium size ASR, handcrafted grammars, system initiative
  – Airline reservation, Banking
  – Constrained speech; moderate data collection; some automation
• 2000+: Large size ASR, limited NLU, mixed-initiative
  – Call centers, E-commerce
  – Spontaneous speech; extensive data collection; semi-automation
• 2005+: Unlimited ASR, deeper NLU, adaptive systems
  – Multimodal, Multilingual Help Desks, E-commerce
  – Spontaneous speech/pen; fully automated systems

(Borrowed slide)

Eurospeech 2003 88

Example of Upper Bound: Reverse Turing Test

(Kochanski et al., ICSLP-2002)

• Assume: won’t pass Turing Test (any time soon)
• Assumptions you can bank on
  – Liberace: cry all the way to the bank
• Good apps for crummy (limited) technology
  – “Good Applications for Crummy Machine Translation”
    • Church & Hovy (1993)
• Reverse Turing Test
  – Owner of web site wants to grant access to people but not to spiders
  – Task: distinguish friend from foe, man from beast
  – Solution: assume there is a class of problems (AI-complete) that any person can do and no machine can.
• Currently deployed Reverse Turing Applications
  – Assume OCR is AI-complete
  – User is given a degraded image and asked to enter text into a form
  – Easy for people but challenging for machines
• Problem: OCR is not challenging enough for machines
• Proposal: Speech recognition with noise is more challenging
  – We can bank on not solving the cocktail party effect any time soon

Eurospeech 2003 89

Where have we been and where are we going?

1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate

2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)

3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → tech meltdown
   • New priorities: Search >> Compression & Dictation

Eurospeech 2003 90

Statistical MT: IBM Models 1-5

• E → Noisy Channel → F
• $E' = \arg\max_E \Pr(E)\,\Pr(F \mid E)$
• Language Model, Pr(E):
  – Trigram model (borrowed from speech recognition)
• Channel Model, Pr(F|E):
  – Based on aligned parallel corpora
  – Models 1-5: alignment
• Mercer & Church (Computational Linguistics, 1993)
  – Statistical MT may fail for reasons advanced by Chomsky
  – Regardless of its ultimate success or failure, there is a growing community of researchers in corpus-based linguistics who believe it will produce valuable lexical resources
    • Bilingual concordances
    • Translation tools
    • Training & testing material for word sense disambiguation (Senseval)
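To make the channel model concrete, here is a minimal sketch of IBM Model 1, the simplest of the five: it learns word-translation probabilities t(f | e) from sentence-aligned pairs with expectation-maximization, treating every word alignment as equally likely a priori. Models 2-5 add alignment (distortion), fertility and related parameters that are not shown here, and the three-sentence English-French corpus is made up for illustration.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1.

    pairs: list of (english_tokens, french_tokens) sentence pairs.
    Returns t[(f, e)] = Pr(French word f | English word e).
    """
    french_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(french_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for es, fs in pairs:
            es = ["NULL"] + es      # NULL can generate French words with no English source
            for f in fs:
                # E-step: share f's link among the English words in proportion to t(f|e).
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
    (["a", "house"], ["une", "maison"]),
]
t = ibm_model1(corpus)
print(f"t(la|the) = {t[('la', 'the')]:.2f}, t(maison|the) = {t[('maison', 'the')]:.2f}")
```

No sentence pair says which word translates which; "la" ends up attached to "the" and "maison" to "house" only because they co-occur across pairs, which is why larger parallel corpora help.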

Eurospeech 2003 91

Word Sense Disambiguation

• Knowledge Acquisition Bottleneck
  – Bar-Hillel (1960)
  – Expert systems don’t scale
  – Sense-tagged text: expensive
  – Parallel text!

• Translation = sense-tagged text
  – Sentence (judicial sense) → peine
  – Sentence (syntactic sense) → phrase

• Yarowsky: bilingual → monolingual
  • One sense per discourse
  • Machine Learning: early example of co-training (EM alg)
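The "one sense per discourse" constraint can be sketched as a post-processing step: within each document, relabel every occurrence of an ambiguous word with the sense most often assigned in that document. This is a simplified illustration, not Yarowsky’s full bootstrapping algorithm; the document IDs and sense labels are hypothetical.

```python
from collections import Counter


def one_sense_per_discourse(labels_by_doc):
    """labels_by_doc: {doc_id: [sense label for each occurrence of the word in that doc]}.
    Returns the same structure with every label replaced by the document's majority sense."""
    relabeled = {}
    for doc_id, labels in labels_by_doc.items():
        if not labels:
            relabeled[doc_id] = []
            continue
        majority, _ = Counter(labels).most_common(1)[0]
        relabeled[doc_id] = [majority] * len(labels)
    return relabeled


tagged = {
    "court_report": ["judicial", "judicial", "syntactic"],  # likely one classifier error
    "grammar_notes": ["syntactic", "syntactic"],
}
result = one_sense_per_discourse(tagged)
print(result)
```

The constraint corrects isolated errors in a document, and it can also spread labels from confidently tagged occurrences to unlabeled ones.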

Eurospeech 2003 92

TMI-02 Keynote (similar subject): The organizers asked me…

• What's changed since TMI-92 (if anything)?
  – TMI-92: great excitement over the use of aligned parallel corpora to help human translators (translation tools)
  – Also, much controversy over IBM Models 1-5

• Have IBM Models 1-5 failed to solve all the world’s problems?

• So what's happened (if anything) since 1992?
  – Empiricism has come of age
    • Textbooks: Charniak, Jelinek, Manning & Schütze, Jurafsky & Martin
    • Textbooks → courses in many universities around the world
  – What used to be considered radical is now accepted practice
    • Evaluation is practically required for publication
  – Mercer’s fighting words: More data is better data!
    • Aren’t as shocking when Brill makes the case a decade later
  – The new field of Machine Learning has absorbed many good (and formerly controversial) ideas, including
    • IBM Models 1-5
    • Yarowsky's Word Sense Disambiguation
      – Grew out of Machine Translation,
      – But is now widely cited in Machine Learning as an early example of co-training

Eurospeech 2003 93

What has happened to the IBM Approach to Machine Translation?

• Support for human translators
  – Terminology: translators don’t need help with the easy vocabulary and the easy grammar
  – Translation Memory: translators are often asked to translate the same material again and again (e.g., revisions of manuals)
  – Alignment

• Fully automatic
  – CLIR: cross-language information retrieval
  – Translating web pages

• Academic fields
  – Machine Learning: most important contribution
  – Corpus-based Lexicography: spreading into lots of other fields

Eurospeech 2003 94

Revival of Empiricism: A Personal Perspective

• As a student at MIT, I was solidly opposed to empiricism
  – But that changed soon after moving to AT&T Bell Labs (1983)

• Letter-to-Sound Rules (speech synthesis)
  – Names: Letter stats → Etymology → Pronunciation (video)
  – NetTalk: Neural Nets (video)

• Demo: great theater → unrealistic expectations
• Self-organizing systems v. empiricism
• Machine Learning v. Corpus-based Linguistics
• I did it, I did it, I did it, but…

• Part of Speech Tagging (1988)
• Word Associations (Hanks)
  – Mutual info → collocations & word associations
    • Collocations: Strong tea v. powerful computers
    • Word Associations: bread and butter, doctor/nurse
• Good-Turing Smoothing (Gale)
• Aligning Parallel Corpora (inspired by MT)
• Word Sense Disambiguation
  – Bilingual → Monolingual
• Even if IBM’s approach fails for MT → lasting benefit (tools, linguistic resources, academic contributions to machine learning)
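The mutual-information score behind the word-association work compares how often two words occur together with how often they would co-occur by chance: I(x, y) = log2 [P(x, y) / (P(x) P(y))]. A minimal sketch, assuming y must follow x within a small window and estimating every probability as a count divided by corpus size; the window size and the toy corpus are illustrative choices, not taken from the talk.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=1):
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))) for ordered pairs
    where y occurs within `window` tokens after x; probabilities are counts / N."""
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pair[(x, y)] += 1
    return {
        (x, y): math.log2((c / n) / ((unigram[x] / n) * (unigram[y] / n)))
        for (x, y), c in pair.items()
    }


corpus = "strong tea strong tea powerful computers powerful computers strong computers".split()
scores = pmi_scores(corpus)
print(round(scores[("strong", "tea")], 2))        # 1.74
print(round(scores[("strong", "computers")], 2))  # 0.15
```

PMI is unreliable for rare pairs (a single co-occurrence of two rare words gets a huge score), so in practice low-count pairs are filtered out before ranking.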

Eurospeech 2003 95

Speech Coding (Telephony)

• More complicated than Moore’s Law
  – Many Dimensions: Bit Rate, Quality, Complexity and Delay
  – Quality ceiling (imposed by telephone standards)
    • Easy to reach the ceiling at high bit rates (≥ 8 kb/s)
    • More room for progress at low bit rates (≤ 8 kb/s)

• Moore’s Law Time Constant
  – Bit rates halve every decade (≤ 8 kb/s)
  – Relatively slow by Moore’s Law standards (not hyper-inflation)
    • Performance doubles every decade
    • Like disk seek or money in the bank (normal inflation)
  – Limited more by physics than investment

• Potential compression opportunity
  – At most 10x: 8 kb/s → 2 kb/s → 1 kb/s (?)
    • Speech (2 kb/s) >> text (2 bits/char): 100-1000 times more bits
  – Speech coding will not close this gap for the foreseeable future

Ceiling
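The "100-1000 times more bits" gap and the pace of progress can both be checked with a few lines. A sketch under stated assumptions: the 1 sec ≈ 1 word rate is the talk’s earlier assumption, 2 bits/char and "bit rates halve every decade" come from this slide, and 6 characters per word (about 5 letters plus a space) is an assumption not in the talk.

```python
import math

WORDS_PER_SECOND = 1    # the talk's earlier assumption: 1 sec ≈ 1 word
CHARS_PER_WORD = 6      # ASSUMPTION: ~5 letters plus a space
TEXT_BITS_PER_CHAR = 2  # slide: text at ~2 bits/char

# Bits per second needed to transmit the transcript as text.
text_bps = WORDS_PER_SECOND * CHARS_PER_WORD * TEXT_BITS_PER_CHAR

for speech_bps in (8000, 2000):
    print(f"{speech_bps // 1000} kb/s speech ≈ {speech_bps / text_bps:.0f}x the text rate")
# 8 kb/s speech ≈ 667x the text rate
# 2 kb/s speech ≈ 167x the text rate

# Slide: bit rates halve every decade, so 8 kb/s -> 1 kb/s takes log2(8) decades.
decades_to_1kbps = math.log2(8000 / 1000)
print(f"8 kb/s → 1 kb/s: about {decades_to_1kbps:.0f} decades")  # 8 kb/s → 1 kb/s: about 3 decades
```

Even three more decades of halving would leave speech roughly two orders of magnitude above text, consistent with the slide’s conclusion that coding will not close the gap.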