

Page 1

cmp-lg/9608018 v2 17 Sep 1996

Algorithms for Speech Recognition and Language Processing

Mehryar Mohri, Michael Riley, Richard Sproat
AT&T Laboratories, AT&T Laboratories, Bell Laboratories
[email protected], [email protected], [email protected]

Joint work with Emerald Chung, Donald Hindle, Andrej Ljolje, Fernando Pereira

Tutorial presented at COLING'96, August 3rd, 1996.

Page 2

Introduction (1)

Text and speech processing: hard problems
• Theory of automata
• Appropriate level of abstraction
• Well-defined algorithmic problems

M.Mohri-M.Riley-R.Sproat, Algorithms for Speech Recognition and Language Processing

Page 3

Introduction (2)

Three sections:
• Algorithms for text and speech processing (2h)
• Speech recognition (2h)
• Finite-state methods for language processing (2h)

Page 4

PART I: Algorithms for Text and Speech Processing

Mehryar Mohri
AT&T Laboratories
[email protected]

August 3rd, 1996

Page 5

Definitions: finite automata (1)

A = (Σ, Q, δ, I, F)
• Alphabet Σ
• Finite set of states Q
• Transition function δ: Q × Σ → 2^Q
• I ⊆ Q, set of initial states
• F ⊆ Q, set of final states

A recognizes L(A) = {w ∈ Σ* : δ(I, w) ∩ F ≠ ∅}
(Hopcroft and Ullman, 1979; Perrin, 1990)

Theorem 1 (Kleene, 1956). A set is regular (or rational) iff it can be recognized by a finite automaton.
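The acceptance condition δ(I, w) ∩ F ≠ ∅ can be sketched directly. A minimal sketch: the dict-of-sets encoding of δ is illustrative, and the example automaton follows Figure 1, recognizing L(A) = Σ*aba.

```python
# Acceptance check for a nondeterministic finite automaton A = (Σ, Q, δ, I, F).

def accepts(delta, initial, final, word):
    """Return True iff δ(I, w) ∩ F ≠ ∅."""
    states = set(initial)
    for symbol in word:
        # Image of the current state set under δ for this symbol.
        states = {q2 for q in states for q2 in delta.get((q, symbol), ())}
    return bool(states & final)

# δ for Σ*aba over Σ = {a, b}: stay in state 0 on anything, guess the final "aba".
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'a'): {3},
}
```

For example, accepts(delta, {0}, {3}, "bbaba") holds, while accepts(delta, {0}, {3}, "abab") does not.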

Page 6

Definitions: finite automata (2)

Figure 1: L(A) = Σ*aba. [Automaton diagrams (nondeterministic and deterministic) not reproduced.]

Page 7

Definitions: weighted automata (1)

A = (Σ, Q, δ, σ, λ, ρ, I, F)
• (Σ, Q, δ, I, F) is an automaton
• Initial output function λ
• Output function σ: Q × Σ × Q → K
• Final output function ρ
• Function f: Σ* → (K, +, ·) associated with A:

  ∀u ∈ Dom(f), f(u) = Σ_{(i,q) ∈ I × (δ(i,u) ∩ F)} λ(i) · σ(i, u, q) · ρ(q)
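The definition of f(u) above combines, in the semiring, the weights of all accepting paths. A minimal sketch over the (min, +) semiring, where ⊕ = min and ⊗ = +; the dict encodings of δ, σ, λ, ρ are illustrative.

```python
# Evaluate f(u) = ⊕ over accepting paths of λ(i) ⊗ σ(i, u, q) ⊗ ρ(q)
# in the (min, +) semiring.

def evaluate(delta, sigma, lam, rho, word):
    current = dict(lam)  # state -> best accumulated weight so far (λ at start)
    for symbol in word:
        nxt = {}
        for q, w in current.items():
            for q2 in delta.get((q, symbol), ()):
                cand = w + sigma[(q, symbol, q2)]  # ⊗ is + here
                if q2 not in nxt or cand < nxt[q2]:  # ⊕ is min here
                    nxt[q2] = cand
        current = nxt
    weights = [w + rho[q] for q, w in current.items() if q in rho]
    return min(weights) if weights else None  # None: word not accepted
```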

Page 8

Definitions: weighted automata (2)

Figure 2: Index of t = aba. [Weighted automaton diagram not reproduced.]

Page 9

Definitions: rational power series

• Power series: functions mapping Σ* to a semiring (K, +, ·)
  – Notation: S = Σ_{w ∈ Σ*} (S, w) w, where the (S, w) are the coefficients
  – Support: supp(S) = {w ∈ Σ* : (S, w) ≠ 0}
  – Sum: (S + T, w) = (S, w) + (T, w)
  – Product: (ST, w) = Σ_{uv = w} (S, u)(T, v)
  – Star: S* = Σ_{n ≥ 0} S^n
• Rational power series: closure of the polynomials (polynomial power series) under the rational operations (Salomaa and Soittola, 1978; Berstel and Reutenauer, 1988)

Theorem 2 (Schützenberger, 1961). A power series is rational iff it can be represented by a weighted finite automaton.

Page 10

Definitions: transducers (1)

T = (Σ, Δ, Q, δ, σ, I, F)
• Finite alphabets Σ and Δ
• Finite set of states Q
• Transition function δ: Q × Σ → 2^Q
• Output function σ: Q × Σ × Q → Δ*
• I ⊆ Q, set of initial states
• F ⊆ Q, set of final states

T defines a relation: R(T) = {(u, v) ∈ Σ* × Δ* : v ∈ ∪_{q ∈ δ(I,u) ∩ F} σ(I, u, q)}
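The relation R(T) above maps an input string to the set of outputs along accepting paths. A minimal sketch for ε-free transducers; the dict encodings of δ and σ are illustrative.

```python
# Compute {v : (word, v) ∈ R(T)} for an ε-free transducer.

def transduce(delta, sigma, initial, final, word):
    """Return the set of output strings produced on accepting paths."""
    pairs = {(q, '') for q in initial}  # (state, output accumulated so far)
    for symbol in word:
        pairs = {(q2, out + sigma[(q, symbol, q2)])
                 for (q, out) in pairs
                 for q2 in delta.get((q, symbol), ())}
    return {out for (q, out) in pairs if q in final}
```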

Page 11

Definitions: transducers (2)

Figure 3: Fibonacci normalizer (rewriting abb → baa). [Transducer diagram not reproduced.]

Page 12

Definitions: weighted transducers

Figure 4: Example: aaba → bbcb with weight (0⊗0⊗1⊗0) ⊕ (0⊗1⊗1⊗0). [Weighted transducer diagram not reproduced.]

(min, +): aaba → min{0+0+1+0, 0+1+1+0} = min{1, 2} = 1
(+, ×): aaba → (0×0×1×0) + (0×1×1×0) = 0 + 0 = 0

Page 13

Composition: Motivation (1)

• Construction of complex sets or functions from more elementary ones
• Modularity (modules, distinct linguistic descriptions)
• On-the-fly expansion

Page 14

Composition: Motivation (2)

Figure 5: Phases of a compiler (Aho et al., 1986): source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program.

Page 15

Composition: Motivation (3)

Figure 6: Complex indexation: a source text is mapped to a set of positions through the composition of a spellchecker, inflected forms, and an index.

Page 16

Composition: Example (1)

T1: 0 -a:a-> 1 -b:ε-> 2 -c:ε-> 3 -d:d-> 4
T2: 0 -a:d-> 1 -ε:e-> 2 -d:a-> 3
T1 ∘ T2: (0,0) -a:d-> (1,1) -b:e-> (2,2) -c:ε-> (3,2) -d:a-> (4,3)

Figure 7: Composition of transducers.

Page 17

Composition: Example (2)

T1: 0 -a:a/3-> 1 -b:ε/1-> 2 -c:ε/4-> 3 -d:d/2-> 4
T2: 0 -a:d/5-> 1 -ε:e/7-> 2 -d:a/6-> 3
T1 ∘ T2: (0,0) -a:d/15-> (1,1) -b:e/7-> (2,2) -c:ε/4-> (3,2) -d:a/12-> (4,3)

Figure 8: Composition of weighted transducers (+, ×).

Page 18

Composition: Algorithm (1)

• Construction of pairs of states
  – Match: q1 -a:b/w1-> q1′ and q2 -b:c/w2-> q2′
  – Result: (q1, q2) -a:c/(w1 ⊗ w2)-> (q1′, q2′)
• Elimination of ε-path redundancy: filter
• Complexity: quadratic
• On-the-fly implementation
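The matching rule above can be sketched for ε-free weighted transducers in the (min, +) semiring, where ⊗ is +. The list-of-tuples transition encoding is illustrative.

```python
# ε-free weighted transducer composition: q1 -a:b/w1-> q1' matched with
# q2 -b:c/w2-> q2' yields (q1,q2) -a:c/(w1+w2)-> (q1',q2') over (min, +).

def compose(t1, t2):
    """Each t is a list of (src, inp, out, weight, dst) transitions."""
    result = []
    for (p, a, b, w1, p2) in t1:
        for (q, b2, c, w2, q2) in t2:
            if b == b2:  # output of t1 matches input of t2
                result.append(((p, q), a, c, w1 + w2, (p2, q2)))
    return result
```

A real implementation restricts the pairing to reachable state pairs and uses the ε-filter of Figure 11; this sketch shows only the matching rule.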

Page 19

Composition: Algorithm (2)

Figure 9: Composition of weighted transducers with ε-transitions: the ε's of A and B are renamed (ε2 on A's output side, ε1 on B's input side, with matching self-loops added) to give A′ and B′. [Diagrams not reproduced.]

Page 20

Composition: Algorithm (3)

Figure 10: Redundancy of ε-paths: without filtering, matching ε-moves in the two transducers creates several paths with the same overall label. [Diagram not reproduced.]

Page 21

Composition: Algorithm (4)

Figure 11: Filter for efficient composition (a three-state transducer over x:x, ε1:ε1, ε2:ε2 that admits only one canonical ε-path). [Diagram not reproduced.]

Page 22

Composition: Theory

• Transductions (Elgot and Mezei, 1965; Eilenberg, 1974, 1976; Berstel, 1979)
• Theorem 3. Let τ1 and τ2 be two (weighted) automata or transducers; then τ1 ∘ τ2 is a (weighted) automaton or transducer.
• Efficient composition of weighted transducers (Mohri, Pereira, and Riley, 1996)
• Works with any semiring
• Intersection: composition of (weighted) automata

Page 23

Intersection: Example

Figure 12: Intersection of automata. [Diagrams of the two operand automata and their product not reproduced.]

Page 24

Union: Example

Figure 13: Union of weighted automata (min, +): a new initial state is connected to the initial states of the two operands by ε/0 transitions. [Diagrams not reproduced.]

Page 25

Determinization: Motivation (1)

• Efficiency of use (time)
• Elimination of redundancy
• No loss of information (≠ pruning)

Page 26

Determinization: Motivation (2)

Figure 14: Toy language model (16 states, 53 transitions, 162 paths). [Weighted automaton over the words "which", "flight(s)", "leave(s)", "Detroit" not reproduced.]

Page 27

Determinization: Motivation (3)

Figure 15: Determinized language model (9 states, 11 transitions, 4 paths). [Diagram not reproduced.]

Page 28

Determinization: Example (1)

Figure 16: Determinization of automata: the subset construction yields states such as {0}, {1,2}, {3}. [Diagrams not reproduced.]

Page 29

Determinization: Example (2)

Figure 17: Determinization of weighted automata (min, +): subsets are sets of (state, residual weight) pairs, e.g. {(1,2),(2,0)}. [Diagrams not reproduced.]

Page 30

Determinization: Example (3)

Figure 18: Determinization of transducers: subsets are sets of (state, residual string) pairs, e.g. {(1,a),(2,ε)}. [Diagrams not reproduced.]

Page 31

Determinization: Example (4)

Figure 19: Determinization of weighted transducers (min, +): subsets are sets of (state, residual string, residual weight) triples, e.g. {(1,a,1),(2,ε,0)}. [Diagrams not reproduced.]

Page 32

Determinization: Algorithm (1)

• Generalization of the classical algorithm for automata
  – Powerset construction
  – Subsets made of (state, weight) or (state, string, weight) pairs
• Applies to subsequentiable weighted automata and transducers
• Time and space complexity: exponential (polynomial w.r.t. the size of the result)
• On-the-fly implementation
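The weighted powerset construction above can be sketched over (min, +): subset states are sets of (state, residual weight) pairs, and each output transition carries the minimum of the accumulated weights, the remainders staying in the subset. A sketch for ε-free acceptors with a single initial state, assuming determinizability (e.g., the twin property); the encoding is illustrative.

```python
from collections import defaultdict

def determinize(transitions, initial):
    """transitions: list of (src, label, weight, dst); initial: start state.
    Returns deterministic transitions between frozensets of (state, residual)."""
    start = frozenset({(initial, 0)})
    agenda, seen, det = [start], {start}, []
    while agenda:
        subset = agenda.pop()
        by_label = defaultdict(list)
        for (q, r) in subset:
            for (src, a, w, dst) in transitions:
                if src == q:
                    by_label[a].append((dst, r + w))
        for a, pairs in by_label.items():
            w_min = min(r for _, r in pairs)  # weight emitted on label a
            nxt = frozenset({(q, r - w_min) for q, r in pairs})  # residuals
            det.append((subset, a, w_min, nxt))
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return det
```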

Page 33

Determinization: Algorithm (2)

Conditions of application
• Twin states: q and q′ are twin states iff:
  – If they can be reached from the initial states by the same input string u,
  – then cycles at q and q′ with the same input string v have the same output value.
• Theorem 4 (Choffrut, 1978; Mohri, 1996a). Let τ be an unambiguous weighted automaton (transducer, weighted transducer); then τ can be determinized iff it has the twin property.
• Theorem 5 (Mohri, 1996a). The twin property can be tested in polynomial time.

Page 34

Determinization: Theory

• Determinization of automata
  – General case (Aho, Sethi, and Ullman, 1986)
  – Specific case: failure functions (Mohri, 1995)
• Determinization of transducers, weighted automata, and weighted transducers
  – General description, theory, and analysis (Mohri, 1996a; Mohri, 1996b)
  – Conditions of application and test algorithm
  – Acyclic transducers and acyclic weighted transducers always admit determinization
• Can be used with other semirings (e.g., (R, +, ×))

Page 35

Local determinization: Motivation

• Time efficiency
• Reduction of redundancy
• Control of the resulting size (flexibility)
• Equivalent function (or equal set)
• No loss of information

Page 36

Local determinization: Example

Figure 20: Local determinization of weighted transducers (min, +). [Diagrams not reproduced.]

Page 37

Local determinization: Algorithm

• Predicate, e.g. P(q) ≡ (out-degree(q) > k)
• k: threshold parameter
• Local: Dom(det) = {q : P(q)}
• Determinization only for q ∈ Dom(det)
• On-the-fly implementation
• Complexity: O(|Dom(det)| · max_{q ∈ Q} out-degree(q))

Page 38

Local determinization: Theory

• Various choices of predicate (constraint: local)
• Definition of parameters
• Applies to all automata, weighted automata, transducers, and weighted transducers
• Can be used with other semirings (e.g., (R, +, ×))

Page 39

Minimization: Motivation

• Space efficiency
• Equivalent function (or equal set)
• No loss of information (≠ pruning)

Page 40

Minimization: Motivation (2)

Figure 21: Determinized language model. [Diagram not reproduced.]

Page 41

Minimization: Motivation (3)

Figure 22: Minimized language model. [Diagram not reproduced.]

Page 42

Minimization: Example (1)

Figure 23: Minimization of automata. [Diagrams not reproduced.]

Page 43

Minimization: Example (2)

Figure 24: Minimization of weighted automata (min, +). [Diagrams of the original automaton, the automaton after weight pushing, and the minimized result not reproduced.]

Page 44

Minimization: Example (3)

Figure 25: Minimization of transducers. [Diagrams of the original transducer, the transducer after string pushing, and the minimized result not reproduced.]

Page 45

Minimization: Example (4)

Figure 26: Minimization of weighted transducers (min, +). [Diagrams not reproduced.]

Page 46

Minimization: Algorithm (1)

• Two steps
  – Pushing, or extraction, of strings or weights towards the initial state
  – Classical minimization of automata, with each (input, output) pair treated as a single label
• Algorithms for the first step
  – Transducers: specific algorithm
  – Weighted automata: shortest-paths algorithms
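The pushing step can be sketched for weighted automata over (min, +): compute for each state the shortest distance d[q] to a final state, then reweight each transition as w + d[dst] − d[src] (d[initial] becomes the new initial weight, so total path weights are preserved). A sketch using Bellman-Ford-style relaxation, assuming no negative-weight cycles; the encoding is illustrative.

```python
# Weight pushing over (min, +), the first step of weighted minimization.

def push_weights(transitions, final_weights):
    """transitions: list of (src, label, weight, dst);
    final_weights: {state: final weight}. Returns (pushed transitions, d)."""
    d = dict(final_weights)  # shortest distance to a final state
    changed = True
    while changed:
        changed = False
        for (q, a, w, q2) in transitions:
            if q2 in d:
                cand = w + d[q2]
                if q not in d or cand < d[q]:
                    d[q] = cand
                    changed = True
    pushed = [(q, a, w + d[q2] - d[q], q2) for (q, a, w, q2) in transitions]
    return pushed, d
```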

Page 47

Minimization: Algorithm (2)

• Complexity
  – E: set of transitions
  – S: sum of the lengths of the output strings
  – P_max: the longest of the longest common prefixes of the output paths leaving each state

  Type                General                                Acyclic
  Automata            O(|E| log |Q|)                         O(|Q| + |E|)
  Weighted automata   O(|E| log |Q|)                         O(|Q| + |E|)
  Transducers         O(|Q| + |E| (log |Q| + |P_max|))       O(S + |E| + |Q| + (|E| − (|Q| − |F|)) |P_max|)

Page 48

Minimization: Theory

• Minimization of automata (Aho, Hopcroft, and Ullman, 1974; Revuz, 1991)
• Minimization of transducers (Mohri, 1994)
• Minimization of weighted automata (Mohri, 1996a)
  – Minimal number of transitions
  – Test of equivalence
• Standardization of power series (Schützenberger, 1961)
  – Works only with fields
  – Creates too many transitions

Page 49

Conclusion (1)

• Theory
  – Rational power series
  – Weighted automata and transducers
• Algorithms
  – General (various semirings)
  – Efficient (used in practice, on large problems)

Page 50

Conclusion (2)

• Applications
  – Text processing (spelling checkers, pattern matching, indexation, OCR)
  – Language processing (morphology, phonology, syntax, language modeling)
  – Speech processing (speech recognition, text-to-speech synthesis)
  – Computational biology (matching with errors)
  – Many other applications

Page 51

PART II: Speech Recognition

Michael Riley
AT&T Laboratories
[email protected]

August 3rd, 1996

Page 52

Overview

• The speech recognition problem
• Acoustic, lexical, and grammatical models
• Finite-state automata in speech recognition
• Search in finite-state automata

Page 53

Speech Recognition

Given an utterance, find its most likely written transcription.

Fundamental ideas:
• Utterances are built from sequences of units
• Acoustic correlates of a unit are affected by surrounding units
• Units combine into units at a higher level: phones → syllables → words
• Relationships between levels can be modeled by weighted graphs; we use weighted finite-state transducers
• Recognition: find the best path in a suitable product graph

Page 54

Levels of Speech Representation

[Figure not reproduced.]

Page 55

Maximum A Posteriori Decoding

Overall analysis [4, 57]:
• Acoustic observations: parameter vectors derived by local spectral analysis of the speech waveform at regular (e.g., 10 msec) intervals
• Observation sequence o
• Transcriptions w
• Probability P(o|w) of observing o when w is uttered
• Maximum a posteriori decoding:

  ŵ = argmax_w P(w|o) = argmax_w P(o|w) P(w) / P(o) = argmax_w P(o|w) · P(w)

  where P(o|w) is the generative model and P(w) the language model.

Page 56

Generative Models of Speech

Typical decomposition of P(o|w) into conditionally independent mappings between levels:
• Acoustic model P(o|p): phone sequences → observation sequences. Detailed model:
  – P(o|d): distributions → observation vectors (symbolic → quantitative)
  – P(d|m): context-dependent phone models → distribution sequences
  – P(m|p): phone sequences → model sequences
• Pronunciation model P(p|w): word sequences → phone sequences
• Language model P(w): word sequences

Page 57

Recognition Cascades: General Form

• Multistage cascade: o = s_k → [stage k] → s_{k−1} → … → s_1 → [stage 1] → s_0 = w

Find s_0 maximizing

  P(s_0, s_k) = P(s_k|s_0) P(s_0) = P(s_0) Σ_{s_1, …, s_{k−1}} Π_{1 ≤ j ≤ k} P(s_j|s_{j−1})

• "Viterbi" approximation:

  Cost(s_0, s_k) = Cost(s_k|s_0) + Cost(s_0)
  Cost(s_k|s_0) ≈ min_{s_1, …, s_{k−1}} Σ_{1 ≤ j ≤ k} Cost(s_j|s_{j−1})

  where Cost(·) = −log P(·).
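The Viterbi approximation above replaces the sum over intermediate sequences s_1, …, s_{k−1} by a minimization of accumulated costs (negative log-probabilities). A toy dynamic-programming sketch; the stage costs are hypothetical.

```python
# Minimize summed stage costs across a cascade, Cost(.) = -log P(.).

def viterbi_cost(stage_costs, prior_cost):
    """stage_costs: one dict {(s_prev, s_next): cost} per stage.
    prior_cost: {s0: Cost(s0)}. Returns the minimum total path cost."""
    best = dict(prior_cost)  # best cost to reach each value at current level
    for costs in stage_costs:
        nxt = {}
        for (sp, sn), c in costs.items():
            if sp in best:
                cand = best[sp] + c
                if sn not in nxt or cand < nxt[sn]:
                    nxt[sn] = cand
        best = nxt
    return min(best.values())
```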

Page 58

Speech Recognition Problems

• Modeling: how to describe accurately the relations between levels ⇒ modeling errors
• Search: how to find the best interpretation of the observations according to the given models ⇒ search errors

Page 59: Algorithms for Speech Recognition and Language Processing › pdf › cmp-lg › 9608018.pdf · cmp-lg/9608018 v2 17 Sep 1996 Algorithms for Speech Recognition and Language Processing

Acoustic Modeling – Feature Selection I

• Short-time spectral analysis:

  log | ∫ g(τ) x(t + τ) e^{−i2πfτ} dτ |

  [Figure: short-time (25 msec Hamming window) spectrum of /ae/ – Hz vs. dB]

• Scale selection:
  – Cepstral smoothing
  – Parameter sampling (13 parameters)


Acoustic Modeling – Feature Selection II [40, 38]

• Refinements
  – Time derivatives – 1st and 2nd order
  – Non-Fourier analysis (e.g., Mel scale)
  – Speaker/channel adaptation:
    – mean cepstral subtraction
    – vocal tract normalization
    – linear transformations
• Result: 39-dimensional feature vector (13 cepstra, 13 delta cepstra, 13 delta-delta cepstra) every 10 milliseconds


Acoustic Modeling – Stochastic Distributions [4, 61, 39, 5]

• Vector quantization – find a codebook of prototypes
• Full covariance multivariate Gaussians:

  P[y] = (2π)^{−N/2} |S|^{−1/2} exp( −½ (y − μ)^T S^{−1} (y − μ) )

• Diagonal covariance Gaussian mixtures
• Semi-continuous, tied mixtures
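In the diagonal-covariance case the density above simplifies: S^{−1} becomes elementwise division by the variances and |S| is their product. A minimal illustrative sketch (the function name is ours):

```python
import math

def gaussian_log_density(y, mean, var):
    """Log density of a diagonal-covariance multivariate Gaussian.

    y, mean, var are equal-length sequences; var holds the diagonal of S.
    """
    n = len(y)
    log_det = sum(math.log(v) for v in var)                       # log|S|
    quad = sum((yi - mi) ** 2 / v for yi, mi, v in zip(y, mean, var))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)
```

Working in the log domain like this is standard in recognizers, since the densities of long observation sequences underflow ordinary floating point.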


Acoustic Modeling – Units and Training [61, 36]

• Units
  – Phonetic (sub-word) units – e.g., cat → /k ae t/
  – Context-dependent units – ae_{k,t}
  – Multiple distributions (states) per phone – left, middle, right
• Training
  – Given a segmentation, training is straightforward
  – Obtain segmentation by transcription
  – Iterate until convergence


Generating Lexicons – Two Steps

• Orthography → Phonemes
  "had" → /hh ae d/
  "your" → /y uw r/
  – complex, context-independent mapping
  – usually a small number of alternatives
  – determined by spelling constraints; lexical "facts"
  – large online dictionaries available
• Phonemes → Phones
  /hh ae d y uw r/ → [hh ae dcl jh axr] (60% prob)
  /hh ae d y uw r/ → [hh ae dcl d y axr] (40% prob)
  – complex, context-dependent mapping
  – many possible alternatives
  – determined by phonological and phonetic constraints


Decision Trees: Overview [9]

• Description/Use: Simple structure – a binary tree of decisions; terminal nodes determine the prediction (cf. the game of Twenty Questions). If the dependent variable is categorical (e.g., red, yellow, green), it is called a "classification tree"; if continuous, a "regression tree".
• Creation/Estimation: Creating a binary decision tree for classification or regression involves three steps (Breiman, et al.):
  1. Splitting Rules: Which split to take at a node?
  2. Stopping Rules: When to declare a node terminal?
  3. Node Assignment: Which class/value to assign to a terminal node?


1. Decision Tree Splitting Rules

Which split to take at a node?

• Candidate splits considered:
  – Binary cuts: for continuous −∞ < x < ∞, consider splits of the form
    x ≤ k vs. x > k, ∀k.
  – Binary partitions: for categorical x ∈ {1, 2, …, n} = X, consider splits of the form
    x ∈ A vs. x ∈ X − A, ∀A ⊂ X.


1. Decision Tree Splitting Rules – Continued

• Choosing the best candidate split:
  – Method 1: Choose k (continuous) or A (categorical) that minimizes the estimated classification (regression) error after the split.
  – Method 2 (for classification): Choose k or A that minimizes the estimated entropy after the split.

  Example – two candidate splits of the same parent node (class counts):

    SPLIT #1: (No. 1: 400, No. 2: 400) → (No. 1: 100, No. 2: 300) and (No. 1: 300, No. 2: 100)
    SPLIT #2: (No. 1: 400, No. 2: 400) → (No. 1: 200, No. 2: 0) and (No. 1: 200, No. 2: 400)
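The two candidate splits above are exactly the kind of case where the two methods disagree: both splits leave the same misclassification rate, but Method 2 prefers SPLIT #2, whose entropy is lower because one child is pure. A small sketch (function names are ours):

```python
import math

def weighted_entropy(children):
    """Average class entropy (bits) over child nodes, weighted by node size.

    Each child is a list of per-class counts."""
    total = sum(sum(c) for c in children)
    h = 0.0
    for counts in children:
        n = sum(counts)
        for c in counts:
            if c:
                p = c / n
                h -= (n / total) * p * math.log2(p)
    return h

def weighted_error(children):
    """Misclassification rate if each child predicts its majority class."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) - max(c) for c in children) / total

split1 = [[100, 300], [300, 100]]   # SPLIT #1 children
split2 = [[200, 0], [200, 400]]     # SPLIT #2 children
```

Here both splits have error 0.25, while SPLIT #2's weighted entropy (≈0.69 bits) beats SPLIT #1's (≈0.81 bits).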


2. Decision Tree Stopping Rules

When to declare a node terminal?

• Strategy (cost-complexity pruning):
  1. Grow an over-large tree.
  2. Form a sequence of subtrees, T0, …, Tn, ranging from the full tree to just the root node.
  3. Estimate an "honest" error rate for each subtree.
  4. Choose the tree size with minimum "honest" error rate.
• To form the sequence of subtrees, vary α from 0 (for the full tree) to ∞ (for just the root node) in:

  min_T [ R(T) + α |T| ]

• To estimate the "honest" error rate, test on data different from the training data, e.g., grow the tree on 9/10 of the available data and test on 1/10, repeating 10 times and averaging (cross-validation).
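The subtree selection in the objective above is a one-liner over candidate (R(T), |T|) pairs: as α grows, the minimizer of R(T) + α|T| moves from the full tree toward the root. A sketch with a hypothetical pruning sequence:

```python
def best_subtree(subtrees, alpha):
    """Pick the subtree minimizing R(T) + alpha * |T|.

    subtrees: list of (R, size) pairs, where R is the error of subtree T
    and size is its number of terminal nodes |T|."""
    return min(subtrees, key=lambda t: t[0] + alpha * t[1])

# Hypothetical pruning sequence, from the full tree down to the root.
seq = [(0.05, 40), (0.06, 20), (0.10, 5), (0.30, 1)]
```

Sweeping α and picking, for each selected size, the cross-validated error is exactly the procedure plotted on the next slide.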


End of Declarative Sentence Prediction: Pruning Sequence

[Figure: error rate (0.0–0.025) vs. number of terminal nodes (0–100); + = raw estimate, o = cross-validated estimate.]


3. Decision Tree Node Assignment

Which class/value to assign to a terminal node?

• Plurality vote: choose the most frequent class at that node for classification; choose the mean value for regression.


End-of-Declarative-Sentence Prediction: Features [65]

• Prob[word with "." occurs at end of sentence]
• Prob[word after "." occurs at beginning of sentence]
• Length of word with "."
• Length of word after "."
• Case of word with ".": Upper, Lower, Cap, Numbers
• Case of word after ".": Upper, Lower, Cap, Numbers
• Punctuation after "." (if any)
• Abbreviation class of word with "." – e.g., month name, unit-of-measure, title, address name, etc.


End of Declarative Sentence?

[Figure: decision tree for end-of-declarative-sentence prediction. The root splits on bprob ≶ 27.29 (48294/52895, yes); further splits use eprob ≶ 1.045, the case of the following word, and the abbreviation type; leaves give correct/total counts and yes/no decisions.]


Phoneme-to-Phone Alignment

  PHONEME   PHONE   WORD
  p         p       purpose
  er        er
  p         pcl
  -         p
  ax        ix
  s         s
  ae        ax      and
  n         n
  d         -
  r         r       respect
  ih        ix
  s         s
  p         pcl
  -         p
  eh        eh
  k         kcl
  t         t


Phoneme-to-Phone Realization: Features [66, 10, 62]

• Phonemic context:
  – Phoneme to predict
  – Three phonemes to the left
  – Three phonemes to the right
• Stress (0, 1, 2)
• Lexical position:
  – Phoneme count from start of word
  – Phoneme count from end of word


Phoneme-to-Phone Realization: Prediction Example

Tree splits for /t/ in "your pretty red":

  PHONE    COUNT    SPLIT
  ix       182499
  n        87283    cm0: vstp,ustp,vfri,ufri,vaff,uaff,nas
  kcl+k    38942    cm0: vstp,ustp,vaff,uaff
  tcl+t    21852    cp0: alv,pal
  tcl+t    11928    cm0: ustp
  tcl+t    5918     vm1: mono,rvow,wdi,ydi
  dx       3639     cm-1: ustp,rho,n/a
  dx       2454     rstr: n/a,no


Phoneme-to-Phone Realization: Network Example

Phonetic network for "Don had your pretty...":

  PHONEME   PHONE1        PHONE2       CONTEXT
  d         0.91 d
  aa        0.92 aa
  n         0.98 n
  hh        0.74 hh       0.15 hv
  ae        0.73 ae       0.19 eh
  d         0.51 dcl jh   0.37 dcl d
  y         0.90 y                     (if d → dcl d)
  y         0.84 -        0.16 y       (if d → dcl jh)
  uw        0.48 axr      0.29 er
  r         0.99 -
  p         0.99 pcl p
  r         0.99 r
  ih        0.86 ih
  t         0.73 dx       0.11 tcl t
  iy        0.90 iy


Acoustic Model Context Selection [92, 39]

• Statistical regression trees used to predict contexts based on distribution variance
• One tree per context-independent phone and state (left, middle, right)
• Trees grown until the data criterion of 500 frames per distribution was met
• Trees pruned using cost-complexity pruning and cross-validation to select the best contexts
• About 44000 context-dependent phone models
• About 16000 distributions


N-Grams: Basics

• 'Chain Rule' and joint/conditional probabilities:

  P[x1 x2 … xN] = P[xN | x1…xN−1] P[xN−1 | x1…xN−2] … P[x2 | x1] P[x1]

  where, e.g., P[xN | x1…xN−1] = P[x1…xN] / P[x1…xN−1]

• (First-order) Markov assumption:

  P[xk | x1…xk−1] = P[xk | xk−1] = P[xk−1 xk] / P[xk−1]

• nth-order Markov assumption:

  P[xk | x1…xk−1] = P[xk | xk−n…xk−1] = P[xk−n…xk] / P[xk−n…xk−1]


N-Grams: Maximum Likelihood Estimation

Let N be the total number of n-grams observed in a corpus and c(x1 … xn) be the number of times the n-gram x1 … xn occurred. Then

  P[x1 … xn] = c(x1 … xn) / N

is the maximum likelihood estimate of that n-gram probability.

For conditional probabilities,

  P[xn | x1 … xn−1] = c(x1 … xn) / c(x1 … xn−1)

is the maximum likelihood estimate.

With this method, an n-gram that does not occur in the corpus is assigned zero probability.
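For n = 2 the conditional estimate above reduces to counting bigrams and their one-word histories; a minimal sketch (the toy corpus is ours):

```python
from collections import Counter

def mle_conditional(tokens, history, word):
    """Maximum likelihood estimate of P[word | history] from bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))   # c(x1 x2)
    histories = Counter(tokens[:-1])             # c(x1), over bigram positions
    return bigrams[(history, word)] / histories[history]

corpus = "the cat sat on the mat".split()
```

Note the zero-probability problem in action: any bigram absent from the corpus gets probability exactly 0, which motivates the Good-Turing-Katz estimates on the next slide.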


N-Grams: Good-Turing-Katz Estimation [29, 16]

Let n_r be the number of n-grams that occurred r times. Then

  P[x1 … xn] = c*(x1 … xn) / N

is the Good-Turing estimate of that n-gram probability, where

  c*(x) = (c(x) + 1) n_{c(x)+1} / n_{c(x)}.

For conditional probabilities,

  P[xn | x1 … xn−1] = c*(x1 … xn) / c(x1 … xn−1),   c(x1 … xn) > 0

is Katz's extension of the Good-Turing estimate.

With this method, an n-gram that does not occur in the corpus is assigned the backoff probability

  P[xn | x1 … xn−1] = α P[xn | x2 … xn−1],

where α is a normalizing constant.
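The adjusted count c* depends only on the count-of-counts table n_r; a minimal sketch (note that c* is undefined whenever n_{c+1} = 0, which practical implementations handle by smoothing the n_r; this sketch simply skips those entries):

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """Adjusted counts c*(x) = (c(x) + 1) * n_{c+1} / n_c."""
    freq_of_freqs = Counter(ngram_counts.values())   # n_r table
    return {
        x: (c + 1) * freq_of_freqs[c + 1] / freq_of_freqs[c]
        for x, c in ngram_counts.items()
        if freq_of_freqs[c + 1] > 0   # c* undefined when n_{c+1} = 0
    }

# Hypothetical bigram counts: n_1 = 3, n_2 = 1, n_3 = 0.
counts = {"a b": 1, "b c": 1, "c d": 1, "d e": 2}
```

Each singleton's count is discounted from 1 to 2·n_2/n_1 = 2/3, freeing probability mass for the unseen n-grams that the backoff formula then distributes.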


Finite-State Modeling [57]

Our view of recognition cascades: represent mappings between levels, observation sequences and language uniformly with weighted finite-state machines:

• Probabilistic mapping P(x|y): weighted finite-state transducer.

  Example – word pronunciation transducer for "data":

  [Figure: transducer with arcs d:ε/1, ey:ε/.4, ae:ε/.6, dx:ε/.8, t:ε/.2, ax:"data"/1]

• Language model P(w): weighted finite-state acceptor


Example of Recognition Cascade

  observations o → O → A (phones) → D (words) → M

• Recognition from observations o by composition:
  – Observations: O(s, s) = 1 if s = o, 0 otherwise
  – Acoustic-phone transducer: A(a, p) = P(a|p)
  – Pronunciation dictionary: D(p, w) = P(p|w)
  – Language model: M(w, w) = P(w)
• Recognition: w = argmax_w (O ∘ A ∘ D ∘ M)(o, w)


Speech Models as Weighted Automata

• Quantized observations: o1 over [t0, t1], o2 over [t1, t2], …, on over [tn−1, tn]
• Phone model A_π: observations → phones

  [Figure: three-state left-to-right model with states s0, s1, s2; self-loops oi:ε/p00(i), oi:ε/p11(i), oi:ε/p22(i); transitions oi:ε/p01(i), oi:ε/p12(i); and final arc ε:π/p2f]

  Acoustic transducer: A = (Σ_π A_π)*

• Word pronunciations D_data: phones → words

  [Figure: transducer with arcs d:ε/1, ey:ε/.4, ae:ε/.6, dx:ε/.8, t:ε/.2, ax:"data"/1]

  Dictionary: D = (Σ_w D_w)*


Example: Phone Lattice O ∘ A

• Lattices: weighted acyclic graphs representing possible interpretations of an utterance as sequences of units at a given level of representation (phones, syllables, words, …)
• Example: result of composing the observation sequence for hostile battle with the acoustic model:

  [Figure: phone lattice – an acyclic weighted graph whose arcs carry competing phone hypotheses (e.g., hh, aa/ao, s/f/v, t/th/pau, ax/en/el, d, b/v, ae, n/q/dx, r/ax/uw, l/el) with log-probability weights.]


Sample Pronunciation Dictionary D

Dictionary with hostile, battle and bottle as a weighted transducer:

[Figure: weighted transducer with word-output arcs such as hh:hostile/0.134, hv:hostile/2.635, b:battle/0.000 and b:bottle/0.000, followed by weighted phone arcs such as aa:-/0.055, s:-/0.035, t:-/0.067, ax:-/2.607, el:-/0.164, dx:-/0.240.]


Sample Language Model M

Simplified language model as a weighted acceptor:

[Figure: weighted acceptor with word arcs such as battle/6.603, hostile/9.394, bottle/11.510, battle/9.268, hostile/11.119, and filler arcs with weights between 1.102 and 3.961.]


Recognition by Composition

• From phones to words: compose the dictionary with the phone lattice to yield a word lattice with combined acoustic and pronunciation costs:

  0 –hostile/-32.900→ 1 –battle/-26.825→ 2

• Applying the language model: compose the word lattice with the language model to obtain a word lattice with combined acoustic, pronunciation and language model costs:

  [Figure: word lattice with arcs hostile/-21.781, hostile/-19.407, battle/-17.916, battle/-15.250.]


Context-Dependency Examples

• Context-dependent phone models: maps from CI units to CD units.
  Example: ae/b_d → ae_{b,d}
• Context-dependent allophonic rules: maps from baseforms to detailed phones.
  Example: t/V́_V → dx
• Difficulty: cross-word contexts – where several words enter and leave a state in the grammar, substitution does not apply.


Context-Dependency Transducers

Example – triphonic context transducer for two symbols x and y:

[Figure: four states x.x, x.y, y.x, y.y with arcs x/x_x:x, x/x_y:x, y/x_x:y, y/x_y:y, x/y_x:x, x/y_y:x, y/y_x:y, y/y_y:y; each state remembers the last two symbols read.]


Generalized State Machines

All of the above networks have bounded context and thus can be represented as generalized state machines. A generalized state machine M:

• Supports these operations:
  – M.start – returns the start state
  – M.final(state) – returns 1 if state is final, 0 otherwise
  – M.arcs(state) – returns the transitions (a1, a2, …, aN) leaving state, where ai = (ilabel, olabel, weight, nextstate)
• Does not necessarily support:
  – providing the number of states
  – expanding states that have not already been discovered


On-Demand Composition [69, 53]

Create a generalized state machine C for the composition A ∘ B:

• C.start := (A.start, B.start)
• C.final((s1, s2)) := A.final(s1) ∧ B.final(s2)
• C.arcs((s1, s2)) := Merge(A.arcs(s1), B.arcs(s2))

Merged arcs are defined as:

  (l1, l3, x + y, (ns1, ns2)) ∈ Merge(A.arcs(s1), B.arcs(s2))
  iff (l1, l2, x, ns1) ∈ A.arcs(s1) and (l2, l3, y, ns2) ∈ B.arcs(s2)


State Caching

Create a generalized state machine B for an input machine A:

• B.start := A.start
• B.final(state) := A.final(state)
• B.arcs(state) := A.arcs(state)

Cache disciplines:

• Expand each state of A exactly once, i.e. always save in the cache (memoize).
• Cache, but forget 'old' states using a least-recently-used criterion.
• Use instructions (ref counts) from the user (decoder) to save and forget.


On-Demand Composition – Results

ATIS task – class-based trigram grammar, full cross-word triphonic context-dependency.

                     states       arcs
  context               762      40386
  lexicon              3150       4816
  grammar             48758     359532
  full expansion   ≈1.6×10^6  ≈5.1×10^6

For the same recognition accuracy as with a static, fully expanded network, on-demand composition expands just 1.6% of the total number of arcs.


Determinization in Large Vocabulary Recognition

• For large vocabularies, 'string' lexicons are very non-deterministic
• Determinizing the lexicon solves this problem, but can introduce non-coaccessible states during its composition with the grammar
• Alternate solutions:
  – Off-line compose, determinize, and minimize: Lexicon ∘ Grammar
  – Pre-tabulate the non-coaccessible states in the composition: Det(Lexicon) ∘ Grammar


Search in Recognition Cascades

• Reminder: Cost ≡ −log probability
• Example recognition problem: w = argmax_w (O ∘ A ∘ D ∘ M)(o, w)
• Viterbi search: approximate w by the output word sequence for the lowest-cost path from the start state to a final state in O ∘ A ∘ D ∘ M – this ignores summing over multiple paths with the same output
• Composition preserves acyclicity; O is acyclic ⇒ acyclic search graph


Single-Source Shortest Path Algorithms [83]

• Meta-algorithm:

  Q ← {s0}; ∀s, Cost(s) ← ∞
  While Q not empty:
    s ← Dequeue(Q)
    For each s′ ∈ Adj[s] such that Cost(s′) > Cost(s) + cost(s, s′):
      Cost(s′) ← Cost(s) + cost(s, s′)
      Enqueue(Q, s′)

• Specific algorithms:

  Name          Queue type    Cycles   Neg. weights   Complexity
  acyclic       topological   no       yes            O(|V| + |E|)
  Dijkstra      best-first    yes      no             O(|E| log |V|)
  Bellman-Ford  FIFO          yes      yes            O(|V| · |E|)
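The acyclic row of the table is the case used for Viterbi decoding: relax each state once, visiting states in topological order. A sketch over a hypothetical four-state lattice:

```python
def acyclic_shortest_path(order, lattice, source):
    """Single-source shortest paths in a DAG.

    order: all states in topological order; lattice: state -> [(next, cost)].
    """
    cost = {s: float("inf") for s in order}
    cost[source] = 0.0
    for s in order:                    # topological order: each state final
        for t, w in lattice.get(s, []):
            if cost[s] + w < cost[t]:  # relax edge (s, t)
                cost[t] = cost[s] + w
    return cost

# Hypothetical word lattice; costs stand in for -log probabilities.
lattice = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0), (3, 5.0)], 2: [(3, 1.0)]}
```

Keeping a back-pointer alongside each relaxation recovers the best path itself, i.e. the Viterbi word sequence.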


The Search Problem

• Obvious first approach: use an appropriate single-source shortest-path algorithm
• Problem: impractical to visit all states – can we do better?
  – Admissible methods: guarantee finding the best path, but reorder the search to avoid exploring provably bad regions
  – Non-admissible methods: may fail to find the best path, but may need to explore much less of the graph
• Current practical approaches:
  – Heuristic cost functions
  – Beam search
  – Multipass search
  – Rescoring


Heuristic Cost Function – A* Search [4, 56, 17]

• States in the search are ordered by cost-so-far(s) + lower-bound-to-complete(s)
• With a tight bound, states not on good paths are not explored
• With a loose lower bound, no better than Dijkstra's algorithm
• Where to find a tight bound?
  – Full search of a composition of smaller automata (homomorphic automata with lower-bounding costs?)
  – Non-admissible A* variants: use an averaged estimate of the cost-to-complete, not a lower bound
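With the trivial lower bound h ≡ 0, the ordering above degenerates to Dijkstra's algorithm, matching the "loose bound" remark. A sketch using a priority queue (the graph and heuristic are hypothetical):

```python
import heapq

def a_star(graph, h, source, goal):
    """Best-first search ordered by cost-so-far + lower-bound-to-complete.

    graph: state -> [(next, cost)]; h(state) must never overestimate
    the remaining cost (admissibility) for the result to be optimal.
    """
    frontier = [(h(source), 0.0, source)]   # (f = g + h, g, state)
    best = {source: 0.0}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if s == goal:
            return g                        # first goal pop is optimal
        for t, w in graph.get(s, []):
            if g + w < best.get(t, float("inf")):
                best[t] = g + w
                heapq.heappush(frontier, (g + w + h(t), g + w, t))
    return float("inf")

graph = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0), (3, 5.0)], 2: [(3, 1.0)]}
```

The tighter h is to the true cost-to-complete, the fewer frontier states are ever popped.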


Beam Search [35]

• Only explore states with costs within a beam (threshold) of the cost of the best comparable state
• Non-admissible
• Comparable states ≈ states corresponding to (approximately) the same observations
• Synchronous (Viterbi) search: explore composition states in chronological observation order
• Problem with synchronous beam search: too local – some observation subsequences are unreliable and may locally put the best overall path outside the beam
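At each observation frame, synchronous beam pruning is a single pass over the active hypotheses; a sketch (state names and costs are made up):

```python
def beam_prune(active, beam):
    """Keep hypotheses within `beam` of the best cost at this frame.

    active: dict state -> accumulated cost, all hypotheses covering the
    same observation prefix (i.e. comparable states).
    """
    best = min(active.values())
    return {s: c for s, c in active.items() if c <= best + beam}

# Hypothetical active states after some frame.
frame = {"s1": 10.0, "s2": 12.5, "s3": 19.0}
```

Widening the beam trades computation for accuracy, which is exactly the tradeoff quantified on the next slide.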


Beam-Search Tradeoffs [68]

Word lattice: result of composing the observation sequence, level transducers and language model.

  Beam   Word lattice error rate   Median number of edges
  4      7.3%                      86.5
  6      5.4%                      244.5
  8      4.4%                      827
  10     4.1%                      3520
  12     4.0%                      13813.5


Multipass Search [52, 3, 68]

• Use a succession of binary compositions instead of a single n-way composition – combinable with other methods
• Prune: use a two-pass variant of composition to remove states not in any path close enough to the best
• Pruned intermediate lattices are smaller, so fewer state pairings are considered
• Approximate: use simpler models (context-independent phone models, low-order language models)
• Rescore …


Rescoring

Most successful approach in practice:

[Figure: observations o and cheap models produce an approximate n-best list w1 … wn, which is rescored with detailed models to select wi.]

• Small pruned result built by composing approximate models
• Composition with full models and observations
• Find the lowest-cost path


PART III
Finite-State Methods in Language Processing

Richard Sproat

Speech Synthesis Research Department
Bell Laboratories, Lucent Technologies
[email protected]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 102


Overview

• Text analysis for Text-to-Speech (TTS) synthesis
  – A rich domain with lots of linguistic problems
  – Probably the least familiar application of NLP technologies
• Syntactic analysis
• Some thoughts on text indexation


The Nature of the TTS Problem

[Figure: input text ("It was a dark and stormy night. Four score and seven years ago. Now is the time for all good men. Let them eat cake. Quoth the raven nevermore.") → Linguistic Analysis → phonemes, durations and pitch contours → Speech Synthesis → speech waveforms.]


From Text to Linguistic Representation

[Figure: analysis of the Mandarin sentence 'The rat is eating the oil' (shu3 chi1 you2 lao3; tones L H H L HL) into a hierarchy of phonological phrase (Φ), prosodic words (ω), parts of speech (N V N), syllables (σ) and moras (µ).]


Russian Percentages: The Problem

How do you say '%' in Russian?

Adjectival forms when modifying nouns:
  20% skidka → dvadcati-procentnaja skidka ('20% discount')
  s 20% rastvorom → s dvadcati-procentnym rastvorom ('with 20% solution')

Nominal forms otherwise:
  21% → dvadcat' odin procent
  23% → dvadcat' tri procenta
  20% → dvadcat' procentov
  s 20% → s dvadcat'ju procentami ('with 20%')

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 106


Text Analysis Problems

• Segment text into words.
• Segment text into sentences, checking for and expanding abbreviations:
  St. Louis is in Missouri.
• Expand numbers.
• Lexical and morphological analysis.
• Word pronunciation
  – Homograph disambiguation
• Phrasing
• Accentuation

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 107


Desiderata for a Model of Text Analysis for TTS

• Delay decisions until there is enough information to make them.
• Possibly weight various alternatives.

Weighted finite-state transducers offer an attractive computational model.

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 108


Overall Architectural Matters

Example: word pronunciation in Russian

• Text form: kostra (bonfire + genitive singular)
• Morphological analysis: kost"{E}r{noun}{masc}{inan}+"a{sg}{gen}
• Pronunciation: /kastr"a/
• Minimal Morphologically-Motivated Annotation (MMA): kostr"a (Sproat, 1996)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 109


Overall Architectural Matters

[Figure: cascade of weighted FSTs relating the surface orthographic form #KOSTPA# to the MMA #KOSTP"A#, to the full morphological analysis #KOST"{E}P{noun}{masc}{inan}+"A{sg}{gen}#, and to the pronunciation #kastr"a#. A Lexical Analysis WFST maps orthography to MMA, and a Phonological Analysis WFST maps MMA to pronunciation; each is built by composing the component transducers appearing in the diagram (S, M, P, D, O and their inverses).]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 110


Orthography → Lexical Representation

A Closer Look

[Figure: the orthography-to-lexical-representation map is the closure of a union of component transducers — (Words : Lex. Annot.) composed with (Lex. Annot. : Lex. Anal.), plus (Punc. : Interp.), (SPACE : Interp.), (Special Symbols : Expansions), and (Numerals : Expansions).]

SPACE: white space in German, Spanish, Russian, …; ε in Japanese, Chinese, …

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 111


Chinese Word Segmentation

Dictionary fragment (the Chinese characters are garbled in this transcription; pinyin, tag, weight, and gloss survive):

le0             asp       4.68    perf
liao3jie3       vb        8.11    understand
da4             vb        5.56    big
da4jie1         nc       11.45    avenue
bu4             adv       4.58    not
zai4            vb        4.45    at
wang4           vb       11.77    forget
wang4+bu4liao3  vb++npot 12.23    unable to forget
wo3             np        4.88    I
fang4           vb        8.05    place
fang4da4        vb       10.70    enlarge
na3li3          nc       11.02    where
jie1            nc       10.35    avenue
jie3fang4       nc       10.92    liberation
xie4 fang4da4   np       42.23    name

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 112


Chinese Word Segmentation

Space = ε : #

L = Space (Dictionary (Space ∪ Punc))+

BestPath(input ∘ L) = wo3/pro⟨4.88⟩ # wang4+bu4liao3/npot⟨12.23⟩ # jie3fang4/nc⟨10.92⟩ da4jie1/nc⟨11.45⟩ …

‘I couldn’t forget where Liberation Avenue is.’

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 113
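The BestPath segmentation above can be sketched as a shortest-path dynamic program over a weighted dictionary. This is a toy stand-in for the transducer composition: pinyin strings replace the Chinese characters, and the entries and weights follow the dictionary fragment above.

```python
# Minimal sketch of weighted dictionary segmentation: choose the segmentation
# whose summed weights (negative log probabilities) are lowest, mirroring
# BestPath(input . L).

DICT = {  # word -> weight, from the dictionary fragment above
    "wo3": 4.88, "wang4": 11.77, "bu4": 4.58, "liao3jie3": 8.11,
    "wang4bu4liao3": 12.23,      # wang4+bu4liao3 'unable to forget'
    "jie3fang4": 10.92, "da4jie1": 11.45, "da4": 5.56,
    "jie1": 10.35, "fang4": 8.05, "fang4da4": 10.70, "na3li3": 11.02,
}

def segment(syllables):
    """syllables: list of syllable strings; returns (cost, list of words)."""
    INF = float("inf")
    n = len(syllables)
    best = [(INF, None)] * (n + 1)   # best[i] = cheapest parse of prefix i
    best[0] = (0.0, [])
    for i in range(n):
        if best[i][0] == INF:
            continue
        for j in range(i + 1, n + 1):
            word = "".join(syllables[i:j])
            if word in DICT:
                cost = best[i][0] + DICT[word]
                if cost < best[j][0]:
                    best[j] = (cost, best[i][1] + [word])
    return best[n]

cost, words = segment(["wang4", "bu4", "liao3", "jie3", "fang4", "da4", "jie1"])
print(words)   # -> ['wang4bu4liao3', 'jie3fang4', 'da4jie1']
```

The nested loops implement exactly the dictionary–composition lattice: every in-vocabulary span is an arc, and the cheapest path wins.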


Numeral Expansion

234 → Factorization  ⇒ 2·10² + 3·10¹ + 4
    → DecadeFlop     ⇒ 2·10² + 4 + 3·10¹
    → NumberLexicon  ⇒ zwei+hundert+vier+und+dreißig

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 114
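The three-stage pipeline can be sketched procedurally (a toy stand-in for the transducer cascade; the lexicon is a small illustrative table and teens such as 13–19 are omitted for brevity):

```python
# Toy sketch of 234 -> "zwei+hundert+vier+und+dreißig":
# Factorization, DecadeFlop (units before tens in German), NumberLexicon.

UNITS = {1: "ein", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf",
         6: "sechs", 7: "sieben", 8: "acht", 9: "neun"}
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}

def factorize(n):
    """234 -> [(2, 100), (3, 10), (4, 1)]; zero coefficients are dropped."""
    out = []
    for power in (100, 10, 1):
        d, n = divmod(n, power)
        if d:
            out.append((d, power))
    return out

def expand(n):
    parts = factorize(n)
    hundreds = [p for p in parts if p[1] == 100]
    tens = [p for p in parts if p[1] == 10]
    units = [p for p in parts if p[1] == 1]
    flopped = hundreds + units + tens      # DecadeFlop
    words = []
    for d, power in flopped:               # NumberLexicon lookup
        if power == 100:
            words += [UNITS[d], "hundert"]
        elif power == 1:
            words.append(UNITS[d])
            if tens:                       # "vier und dreißig"
                words.append("und")
        else:
            words.append(TENS[d])
    return "+".join(words)

print(expand(234))   # -> zwei+hundert+vier+und+dreißig
```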


Numeral Expansion

[Figure: the Factorization transducer — from the start state, digit arcs 1:1 … 9:9 lead to a state with an ε:10^2 arc; the next digit arcs 0:0 … 9:9 are followed by an ε:10^1 arc, and a final digit arc 0:0 … 9:9 completes the path, inserting the decimal-power markers between digit positions.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 115


German Numeral Lexicon

/{1} : (’eins{num}({masc}|{neut}){sg}{##})/
/{2} : (zw’ei{num}{##})/
/{3} : (dr’ei{num}{##})/
...
/({0}{+++}{1}{10^1}) : (z’ehn{num}{##})/
/({1}{+++}{1}{10^1}) : (’elf{num}{##})/
/({2}{+++}{1}{10^1}) : (zw’ölf{num}{##})/
/({3}{+++}{1}{10^1}) : (dr’ei{++}zehn{num}{##})/
...
/({2}{10^1}) : (zw’an{++}zig{num}{##})/
/({3}{10^1}) : (dr’ei{++}ßig{num}{##})/
...
/({10^2}) : (h’undert{num}{##})/
/({10^3}) : (t’ausend{num}{neut}{##})/

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 116


Morphology: Paradigmatic Specifications

Paradigm {A1}
# strong inflection (e.g. after an indefinite article)

Suffix {++}er  {sg}{masc}{nom}
Suffix {++}en  {sg}{masc}({gen}|{dat}|{acc})
Suffix {++}e   {sg}{femi}({nom}|{acc})
Suffix {++}en  {sg}({femi}|{neut})({gen}|{dat})
Suffix {++}es  {sg}{neut}({nom}|{acc})
Suffix {++}e   {pl}({nom}|{acc})
Suffix {++}er  {pl}{gen}
Suffix {++}en  {pl}{dat}

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 117


Morphology: Paradigmatic Specifications

##### Possessives ("mein, euer")

Paradigm {A6}
Suffix {++}{Eps} {sg}({masc}|{neut}){nom}
Suffix {++}e     {sg}{femi}{nom}
Suffix {++}es    {sg}({masc}|{neut}){gen}
Suffix {++}er    {sg}{femi}({gen}|{dat})
Suffix {++}em    {sg}({masc}|{neut}){dat}
Suffix {++}en    {sg}{masc}{acc}
Suffix {++}{Eps} {sg}{neut}{acc}
Suffix {++}e     {pl}({nom}|{acc})
Suffix {++}er    {pl}{gen}
Suffix {++}en    {pl}{dat}

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 118


Morphology: Paradigmatic Specifications

/{A1} : (’aal{++}glatt{adj})/
/{A1} : (’ab{++}änder{++}lich{adj}{umlt})/
/{A1} : (’ab{++}artig{adj})/
/{A1} : (’ab{++}bau{++}würdig{adj}{umlt})/

...

/{A6} : (d’ein{adj})/
/{A6} : (’euer{adj})/
/{A6} : (’ihr{adj})/
/{A6} : (’Ihr{adj})/
/{A6} : (m’ein{adj})/
/{A6} : (s’ein{adj})/
/{A6} : (’unser{adj})/

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 119


Morphology: Paradigmatic Specifications

Project(({A6} Endings) ∘ (({A6} : Stems) Id(Σ*)))

[Figure: the resulting acceptor for the stem “mein” — paths read m ’ e i n adj ++ followed by one of the licensed suffix/feature sequences, e.g. {sg}({masc}|{neut}){nom}, e{sg}{femi}{nom}, e{pl}({nom}|{acc}), es{sg}({masc}|{neut}){gen}, em{sg}({masc}|{neut}){dat}, en{sg}{masc}{acc}, en{pl}{dat}, er{sg}{femi}({gen}|{dat}), er{pl}{gen}.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 120


Morphology: Finite-State Grammar

START   PREFIX  {Eps}
PREFIX  STEM    {Eps}
PREFIX  STEM    t"ele{++}  <1.0>

...

STEM    SUFFIX  ’abend
STEM    SUFFIX  ’abenteuer

...

SUFFIX  PREFIX  {++}   <1.0>
SUFFIX  FUGE    {Eps}  <1.0>
SUFFIX  WORD    {Eps}  <2.0>

...

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 121


Morphology: Finite-State Grammar

FUGE    SECOND  {++}       <1.5>
FUGE    SECOND  {++}s{++}  <1.5>

...

SECOND  PREFIX  {Eps}  <1.0>
SECOND  STEM    {Eps}  <2.0>
SECOND  WORD    {Eps}  <2.0>

...

WORD

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 122


Morphology: Finite-State Grammar

Unanständigkeitsunterstellung

‘allegation of indecency’

+"un{++}"an{++}st’änd{++}ig{++}keit{++}s{++}unter{++}st’ell{++}ung

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 123


Rewrite Rule Compilation

Context-dependent rewrite rules

General form: φ → ψ / λ __ ρ, with λ, ρ, φ, ψ regular expressions.

Constraint: φ cannot be rewritten, but can be used as a context.

Example: a → b / c __ b

(Johnson, 1972; Kaplan & Kay, 1994; Karttunen, 1995; Mohri & Sproat, 1996)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 124
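For intuition, a single left-to-right application of such a rule can be sketched with regular-expression lookaround. This is only an approximation of the transducer construction: Python lookbehind requires a fixed-width left context, and iterated or overlapping applications are not modeled.

```python
import re

def apply_rule(s, phi, psi, lam, rho):
    """Apply the context-dependent rewrite rule phi -> psi / lam __ rho.
    Lookbehind/lookahead leave the contexts unconsumed, so adjacent
    occurrences can all be rewritten in one pass."""
    pattern = r"(?<=%s)%s(?=%s)" % (lam, phi, rho)
    return re.sub(pattern, psi, s)

print(apply_rule("cab", "a", "b", "c", "b"))   # -> cbb
```

On input w = cab the rule a → b / c __ b fires once, matching the worked example on the following slides.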


Example

a → b / c __ b,  w = cab

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 125


Example

Input:    c a b

After r:  c a > b   (> inserted before each occurrence of ρ = b)

After f:  c <1 a > b  and  c <2 a > b   (<1, <2 inserted before each occurrence of φ followed by >)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 126


Example

After replace:  c <1 b b  and  c <2 a b   (φ rewritten as ψ after <1, with > deleted)

After l1:       c b b  and  c <2 a b      (<1 deleted where the left context λ = c holds)

After l2:       c b b                      (paths where <2 follows λ are rejected; result: cbb)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 127


Rewrite Rule Compilation

• Principle
  – Based on the use of marking transducers
  – Brackets inserted only where needed
• Efficiency
  – 3 determinizations + additional linear-time work
  – Smaller number of compositions

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 128


Rule Compilation Method

r ∘ f ∘ replace ∘ l1 ∘ l2

r:        Σ* ρ → Σ* > ρ
f:        (Σ ∪ {>})* φ > → (Σ ∪ {>})* {<1, <2} φ >
replace:  <1 φ > → <1 ψ
l1:       Σ* λ <1 → Σ* λ
l2:       Σ* λ <2 → Σ* λ

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 129


Marking Transducers

Proposition. Let α be a deterministic automaton representing Σ*β. Then the transducer τ obtained by the following modification post-marks occurrences of β by #.

[Figure: a final state q of α, with its entering and leaving Id(α) transitions (a:a, b:b, c:c, d:d), is split into states q and q′ connected by an ε:# transition; the leaving transitions are moved to q′.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 130


Marker of Type 2

[Figure: a type-2 marker transducer — a single state q carrying identity self-loops a:a, b:b, c:c, d:d plus a #:ε loop that deletes the marker.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 131


The Transducers as Expressions using Marker

r  = [reverse(Marker(Σ* reverse(ρ), 1, {>}, ∅))]
f  = [reverse(Marker((Σ ∪ {>})* reverse(φ >), 1, {<1, <2}, ∅))]
l1 = [Marker(Σ* λ, 2, ∅, {<1})] with <2:<2 loops added
l2 = [Marker(Σ* λ, 3, ∅, {<2})]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 132


Example: r for rule a → b / c __ b

Σ* reverse(ρ):  an automaton over {a, b, c} whose final state is reached on b (identity transitions a:a, b:b, c:c)

Marker(Σ* reverse(ρ), 1, {>}, ∅):  the same automaton with an ε:> transition leaving the final state

reverse(Marker(Σ* reverse(ρ), 1, {>}, ∅)):  a transducer that, reading left to right, inserts > before every occurrence of b

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 133


The Replace Transducer

[Figure: state 0 (initial and final) with self-loops Σ:Σ, <2:<2, >:ε; an arc <1:<1 leads to state 1; from state 1 a subtransducer [φ × ψ] — with <1, <2, > deleted internally — leads to state 2, and an arc >:ε returns to state 0.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 134


Extension to Weighted Rules

Weighted context-dependent rules:

φ → ψ / λ __ ρ

with λ, ρ, φ regular expressions and ψ a formal power series over the tropical semiring.

Example: c → (.9 c) + (.1 t) / a __ t

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 135


Rational power series

Functions S : Σ* → R+ ∪ {∞}, rational power series

• Tropical semiring: (R+ ∪ {∞}, min, +)
• Notation: S = Σ_{w∈Σ*} (S, w) w
• Example: S = (2a)(3b)(4b)(5b) + (5a)(2b)*
  (S, abbb) = min{2 + 3 + 4 + 5, 5 + 2 + 2 + 2} = min{14, 11} = 11

Theorem 6 (Schützenberger, 1961): S is rational iff it is recognizable

(representable by a weighted transducer).

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 136
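The min-plus evaluation of (S, w) can be sketched as dynamic programming over a small weighted automaton encoding the example series (the state numbering is illustrative):

```python
# Sketch of evaluating a rational power series on the tropical semiring
# (min, +): the weight of a string is the minimum, over accepting paths,
# of the sum of arc weights. Encodes S = (2a)(3b)(4b)(5b) + (5a)(2b)*.

ARCS = {  # state -> {symbol: [(next_state, weight), ...]}
    0: {"a": [(1, 2.0), (4, 5.0)]},
    1: {"b": [(2, 3.0)]},
    2: {"b": [(3, 4.0)]},
    3: {"b": [(5, 5.0)]},
    4: {"b": [(4, 2.0)]},   # (2b)* loop
}
FINAL = {5, 4}   # state 5 ends the first branch, state 4 accepts (5a)(2b)*

def weight(word, start=0):
    """Return (S, word): min over accepting paths of summed arc weights."""
    INF = float("inf")
    current = {start: 0.0}           # reachable state -> best weight so far
    for sym in word:
        nxt = {}
        for state, w in current.items():
            for dest, aw in ARCS.get(state, {}).get(sym, []):
                nxt[dest] = min(nxt.get(dest, INF), w + aw)
        current = nxt
    return min((w for s, w in current.items() if s in FINAL), default=INF)

print(weight("abbb"))   # min{2+3+4+5, 5+2+2+2} -> 11.0
```

The per-symbol update is exactly the (min, +) matrix-vector product, so the same code generalizes to any weighted acceptor over the tropical semiring.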


Compilation of weighted rules

• Extension of the composition algorithm to the weighted case
  – Efficient filter for ε-transitions
  – Addition of weights of matching labels
• Same compilation algorithm
• Single-source shortest-paths algorithms to find the best path

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 137


Rewrite Rules: An Example

s → z / __ ($|#) VStop

[Figure: the five-state transducer compiled from the rule — identity loops on V, $, #, z, and VStop, with competing s:s and s:z arcs whose choice is validated by whether a syllable or word boundary followed by a voiced stop follows.]

/mis$mo$/ ∘ Voicing = /miz$mo$/

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 138


Syllable structure

T1 : (C* V C*⟨1.0⟩ $)+
T2 : ¬(Σ* $ (CC ∩ ¬(CG ∪ OL)) Σ*)
T3 : ¬(Σ* $ ([+cor] ∪ /x/ ∪ …) /l/ Σ*)
T4 : ¬(Σ* $ ([+cor,+strid] ∪ /x/ ∪ …) /r/ Σ*)

estrella: /estreya/ ∘ Intro($) ∘ Syl ⇒

[Figure: weighted lattice of candidate syllabifications of /estreya/, with arc costs penalizing consonants placed in codas.]

BestPath(/estreya/ ∘ Intro($) ∘ Syl) = /es$tre$ya$/

atlas: /atlas/ ∘ Intro($) ∘ Syl ⇒

[Figure: weighted lattice of candidate syllabifications of /atlas/.]

BestPath(/atlas/ ∘ Intro($) ∘ Syl) = /at$las$/

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 139
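The effect of composing with Intro($) ∘ Syl and taking the best path can be approximated by a toy onset-maximizing syllabifier. The legal-onset inventory below is illustrative only, not the T2–T4 constraints:

```python
# Toy syllabifier: insert "$" so that each vowel gets the longest legal
# onset from the preceding consonant cluster — a crude stand-in for
# BestPath(word . Intro($) . Syl).

VOWELS = set("aeiou")
ONSETS = {"", "s", "t", "tr", "l", "y", "m"}   # illustrative legal onsets

def syllabify(word):
    sylls, cur = [], ""
    i = 0
    while i < len(word):
        cur += word[i]
        if word[i] in VOWELS:
            j = i + 1                           # collect following consonants
            while j < len(word) and word[j] not in VOWELS:
                j += 1
            cluster = word[i + 1:j]
            if j == len(word):                  # word-final consonants stay
                sylls.append(cur + cluster)
                cur = ""
            else:                               # maximize the next onset
                for k in range(len(cluster) + 1):
                    if cluster[k:] in ONSETS:
                        break
                sylls.append(cur + cluster[:k])
                cur = cluster[k:]
            i = j
        else:
            i += 1
    if cur:
        sylls.append(cur)
    return "$".join(sylls) + "$"

print(syllabify("estreya"))   # -> es$tre$ya$
print(syllabify("atlas"))     # -> at$las$
```

For /estreya/ the cluster /str/ splits as s.tr because "str" is not a legal onset here but "tr" is; for /atlas/, /tl/ splits as t.l, matching the best paths on the slide.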


Russian Percentage Expansion: An Example

s 5% skidkoĭ

⇓ Lexical Analysis FST

s/prep pjat/num+’nom-procentn/adj?+aja/fem+sg+nom skidk/fem+oj/sg+instr
s/prep pjati/num+gen-procentn/adj?+oj/fem+sg+instr skidk/fem+oj/sg+instr ⟨2.0⟩
s/prep pjat’ju/num+instr-procent/noun+ami/pl+instr skidk/fem+oj/sg+instr ⟨4.0⟩
...

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 140


⇓ Language Model FSTs:

φ → ? / procent/noun (Σ ∩ ¬#)* # (Σ ∩ ¬#)* __ noun
? → φ / procentn/adj (Σ ∩ ¬#)* # (Σ ∩ ¬#)* __ noun
φ → ? / procentn (Σ ∩ ¬#)* Case ∩ ¬instr # (Σ ∩ ¬#)* __ instr
φ → ? / procentn (Σ ∩ ¬#)* sg+Case # (Σ ∩ ¬#)* __ pl

¬(Σ* ? Σ*)+

⇒ s pjati/gen-procentn/adj+oj/sg+instr skidkoj

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 141


Percentage Expansion: Continued

s 5% skidkoĭ
⇓
s pjati/gen-procentn/adj+oj/sg+instr skidkoj
⇓ L ∘ P
s # PiT"!p...r@c"Entn&y # sK"!tk&y   (ASCII phone string, partially garbled in transcription)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 142


Phrasing Prediction

• Problem: predict intonational phrase boundaries in long unpunctuated utterances:

For his part, Clinton told reporters in Little Rock, Ark., on Wednesday ‖ that the pact can be a good thing for America ‖ if we change our economic policy ‖ to rebuild American industry here at home ‖ and if we get the kind of guarantees we need on environmental and labor standards in Mexico ‖ and a real plan ‖ to help the people who will be dislocated by it.

• The Bell Labs synthesizer uses a CART-based predictor trained on labeled corpora (Wang & Hirschberg 1992).

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 143


Phrasing Prediction: Variables

For each ⟨wi, wj⟩:

• length of utterance; distance of wi in syllables/stressed syllables/words … from the beginning/end of the sentence
• automatically predicted pitch accent for wi and wj
• part-of-speech (POS) for a 4-word window around ⟨wi, wj⟩
• (largest syntactic constituent dominating wi but not wj and vice versa, and smallest constituent dominating them both)
• whether ⟨wi, wj⟩ is dominated by an NP and, if so, distance of wi from the beginning of that NP, the NP, and distance/length
• (mutual information scores for a four-word window around ⟨wi, wj⟩)

The most successful of these predictors so far appear to be POS, some constituency information, and mutual information.

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 144


Phrasing Prediction: Sample Tree

[Figure: CART decision tree for phrase-boundary prediction. Internal nodes test features such as punctuation (punc), POS-window features (j1f, j1v, j2n, j3f, j3n, j3w, j4f, j4n), syllable counts (syls, ssylsp), NP distance and position (npdist, nploc), and predicted accent (raj4); each leaf predicts yes/no (boundary) together with its count of correctly classified instances.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 145


Phrasing Prediction: Results

• Results for multi-speaker read speech:
  – major boundaries only: 91.2%
  – collapsed major/minor phrases: 88.4%
  – 3-way distinction between major, minor and null boundary: 81.9%
• Results for spontaneous speech:
  – major boundaries only: 88.2%
  – collapsed major/minor phrases: 84.4%
  – 3-way distinction between major, minor and null boundary: 78.9%
• Results for 85K words of hand-annotated text, cross-validated on training data: 95.4%.

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 146


Tree-Based Modeling: Prosodic Phrase Prediction

[Figure: decision tree for prosodic phrase prediction over the features dpunc (distance from punctuation), lpos (POS to the left), and rpos (POS to the right); numbered internal nodes [1]–[4] branch on dpunc thresholds and POS classes, and leaves (including leaf 16) give yes/no boundary predictions with counts.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 147


The Tree Compilation Algorithm

(Sproat & Riley, 1996)

• Each leaf node corresponds to a single rule defining a constrained weighted mapping for the input symbol associated with the tree
• Decisions at each node are stateable as regular expressions restricting the left or right context of the rule(s) dominated by the branch
• The full left/right contexts of the rule at a leaf node are derived by intersecting the expressions traversed between the root and the leaf node
• The transducer for the entire tree represents the conjunction of all the constraints expressed at the leaf nodes; it is derived by intersecting together the set of WFSTs corresponding to each of the leaves
  – Note that intersection is defined for transducers that express same-length relations
• The alphabet is defined to be an alphabet of all correspondence pairs that were determined empirically to be possible

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 148
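The root-to-leaf intersection of context constraints can be sketched on a toy tree. The feature names and POS sets here are hypothetical, not the actual trees of these slides, and set intersection stands in for regular-expression intersection:

```python
# Sketch: each decision contributes a constraint on the left (lpos) or right
# (rpos) context, and a leaf's full context is the intersection of the
# constraints accumulated on its root-to-leaf path.

POS = {"N", "V", "A", "Adv", "D", "P"}

# Internal node: (feature, allowed_set, true_child, false_child); leaf: label.
tree = ("lpos", {"N", "V", "A", "Adv", "D"},
        ("rpos", {"N", "A"}, "yes", "no"),
        "no")

def leaf_contexts(node, left=frozenset(POS), right=frozenset(POS)):
    """Yield (prediction, left-context set, right-context set) per leaf."""
    if isinstance(node, str):            # leaf
        yield node, set(left), set(right)
        return
    feat, allowed, t_child, f_child = node
    if feat == "lpos":
        yield from leaf_contexts(t_child, left & allowed, right)
        yield from leaf_contexts(f_child, left - allowed, right)
    else:
        yield from leaf_contexts(t_child, left, right & allowed)
        yield from leaf_contexts(f_child, left, right - allowed)

for pred, l, r in leaf_contexts(tree):
    print(pred, sorted(l), sorted(r))
```

Each (prediction, left, right) triple corresponds to one constrained rule; intersecting the rule WFSTs over all leaves would then give the transducer for the whole tree.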


Interpretation of Tree as a Ruleset

Node 16 conjoins the constraints accumulated on its root-to-leaf path (nodes [1]–[4] of the tree):

1: Σ* (I! ∪ I!#! ∪ I!#!#!)
2: Σ* (N ∪ V ∪ A ∪ Adv ∪ D)
3: (right context) N ∪ A
4: Σ* (I! ∪ I!#!)

Resulting rule:

# → (I⟨1.09⟩ ∪ #⟨0.41⟩) / I(!#)? (N ∪ V ∪ A ∪ Adv ∪ D) __ (N ∪ A)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 149


Summary of Compilation Algorithm

Each leaf L represents a weighted two-level surface coercion rule:

Rule_L = Compile(τ_T → L / ∩_{p∈P_L} λ_p __ ∩_{p∈P_L} ρ_p)

Each tree/forest represents a set of simultaneous weighted two-level surface coercion rules:

Rule_T = ∩_{L∈T} Rule_L        Rule_F = ∩_{T∈F} Rule_T

BestPath(,D#N#V#Adv#D#A#N ∘ Tree) ⇒ ,D#N#V#Adv , D#A#N ⟨2.76⟩

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 150


Lexical Ambiguity Resolution

• Word sense disambiguation:
  She handed down a harsh sentence.      (peine)
  This sentence is ungrammatical.        (phrase)
• Homograph disambiguation:
  He plays bass.                         /beɪs/
  This lake contains a lot of bass.      /bæs/
• Diacritic restoration:
  appeler l’autre côté de l’atlantique   (côté ‘side’)
  Côte d’Azur                            (côte ‘coast’)

(Yarowsky, 1992; Yarowsky 1996; Sproat, Hirschberg & Yarowsky, 1992; Hearst 1991)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 151


Homograph Disambiguation 1

• N-Grams

  Evidence         lɛd    lid    Logprob
  lead level/N     219      0    11.10
  of lead in       162      0    10.66
  the lead in        0    301    10.59
  lead poisoning   110      0    10.16
  lead role          0    285    10.51
  narrow lead        0     70     8.49

• Predicate-Argument Relationships

  follow/V + lead    0    527    11.40
  take/V + lead      1    665     7.76

• Wide Context

  zinc ↔ lead      235      0    11.20
  copper ↔ lead    130      0    10.35

• Other Features (e.g. Capitalization)

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 152


Homograph Disambiguation 2

Sort by Abs(Log(Pr(Pron1 | Collocation_i) / Pr(Pron2 | Collocation_i)))

Decision List for lead

  Logprob  Evidence          Pronunciation
  11.40    follow/V + lead   ⇒ lid
  11.20    zinc ↔ lead       ⇒ lɛd
  11.10    lead level/N      ⇒ lɛd
  10.66    of lead in        ⇒ lɛd
  10.59    the lead in       ⇒ lid
  10.51    lead role         ⇒ lid
  10.35    copper ↔ lead     ⇒ lɛd
  10.28    lead time         ⇒ lid
  10.16    lead poisoning    ⇒ lɛd

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 153


Homograph Disambiguation 3: Pruning

• Redundancy by subsumption

  Evidence       lɛd   lid   Logprob
  lead level/N   219     0   11.10
  lead levels    167     0   10.66
  lead level      52     0    8.93

• Redundancy by association

  Evidence            tɛr    tɪr
  tear gas              0   1671
  tear ↔ police         0    286
  tear ↔ riot           0     78
  tear ↔ protesters     0     71

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 154


Homograph Disambiguation 4: Use

Choose the single best piece of matching evidence.

Decision List for lead

  Logprob  Evidence          Pronunciation
  11.40    follow/V + lead   ⇒ lid
  11.20    zinc ↔ lead       ⇒ lɛd
  11.10    lead level/N      ⇒ lɛd
  10.66    of lead in        ⇒ lɛd
  10.59    the lead in       ⇒ lid
  10.51    lead role         ⇒ lid
  10.35    copper ↔ lead     ⇒ lɛd
  10.28    lead time         ⇒ lid
  10.16    lead poisoning    ⇒ lɛd

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 155
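Applying a decision list amounts to returning the first matching entry. The sketch below uses simplified matching predicates (not the WFST construction); "lEd" is an ASCII stand-in for the /lɛd/ pronunciation:

```python
# Sketch of decision-list application: entries are sorted by
# |log(Pr(pron1|evidence)/Pr(pron2|evidence))|, so a linear scan returns
# the single best piece of matching evidence.

decision_list = [   # (logprob, matcher over the token context, pronunciation)
    (11.40, lambda ctx: "follow/V" in ctx,                "lid"),
    (11.20, lambda ctx: "zinc" in ctx,                    "lEd"),
    (11.10, lambda ctx: "lead level/N" in " ".join(ctx),  "lEd"),
    (10.59, lambda ctx: "the lead in" in " ".join(ctx),   "lid"),
]

def disambiguate(context_tokens, default="lEd"):
    # The list is already sorted by log-likelihood ratio, so the first
    # matching entry is the single best piece of evidence.
    for _logprob, matches, pron in decision_list:
        if matches(context_tokens):
            return pron
    return default

print(disambiguate(["they", "follow/V", "the", "lead"]))   # -> lid
```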


Homograph Disambiguation: Evaluation

  Word       Pron1     Pron2     Sample Size   Prior   Performance
  lives      laɪvz     lɪvz      33186         .69     .98
  wound      waʊnd     wund       4483         .55     .98
  Nice       naɪs      nis         573         .56     .94
  Begin      bɪˈɡɪn    beɪɡɪn     1143         .75     .97
  Chi        tʃi       kaɪ        1288         .53     .98
  Colon      koʊˈloʊn  ˈkoʊlən    1984         .69     .98
  lead (N)   lid       lɛd       12165         .66     .98
  tear (N)   tɛr       tɪr        2271         .88     .97
  axes (N)   ˈæksiz    ˈæksɪz     1344         .72     .96
  IV         aɪ vi     fɔr        1442         .76     .98
  Jan        dʒæn      jɑn        1327         .90     .98
  routed     ˈrutɪd    ˈraʊtɪd     589         .60     .94
  bass       beɪs      bæs        1865         .57     .99
  TOTAL                          63660         .67     .97

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 156


Decision Lists: Summary

• Efficient and flexible use of data.
• Easy to interpret and modify.

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 157


Decision Lists as WFSTs

The lead example

• Construct ‘homograph taggers’ H0, H1, … that find and tag instances of a homograph set in a lexical analysis. For example, H1 is:

[Figure: a transducer with φ:φ self-loops at every state and the arc sequence ##:## l:l e:e a:a d:d 1:1 nn:nn ε:H1 — it passes everything through unchanged while inserting the tag H1 after each occurrence of lead1/nn.]

M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 158


Decision Lists as WFSTs

• Construct an environmental classifier consisting of a pair of transducers C1 and C2, where

  – C1 optionally rewrites any symbol, except the word boundary or the homograph tags H0, H1, …, as a single dummy symbol ∆

  – C2 classifies contextual evidence from the decision list according to its type and assigns a cost equal to the position of the evidence in the list; it otherwise passes ∆, word boundary and H0, H1, … through:

    ## follow vb ##    → ## ∆ V0 ##  ⟨1⟩
    ## zinc nn ##      → ## ∆ C1 ##  ⟨2⟩
    ## level(s?) nn ## → ## ∆ R1 ##  ⟨3⟩
    ## of pp ##        → ## ∆ [1 ##  ⟨2⟩
    ## in pp ##        → ## ∆ 1] ##  ⟨2⟩

...


Decision Lists as WFSTs

• Construct a disambiguator D from a set of optional rules of the form:

    H0 → ε / V0 Σ*
    H1 → ε / C1 Σ*
    H1 → ε / Σ* C1
    H0 → ε / ## ∆* R0
    H1 → ε / ## ∆* R1
    H0 → ε / [0 ## ∆* ## ∆* 0]
    H1 → ε / [1 ## ∆* ## ∆* 1]
    ...
    H0 → ε <20>
    H1 → ε <40>

• Construct a filter F that removes all paths containing H0, H1, ...


Decision Lists as WFSTs

• Let an example input T be the automaton over the lexical analysis:

  ## c o p p e r nn ## a n d ## l e a d {0 vb | 1 nn} ##

• Then the disambiguated input T′ is given by:

  T ∩ Project⁻¹[ BestPath[ T ∘ H0 ∘ H1 ∘ C1 ∘ C2 ∘ D ∘ F ] ]
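The BestPath operation used here is a single-source shortest-path search over the weighted automaton produced by the cascade. A minimal sketch with an invented two-path example (the states, labels, and costs are illustrative, loosely echoing the default-rule weights above):

```python
import heapq

def best_path(arcs, start, finals):
    """Dijkstra over a weighted automaton given as {state: [(label, cost, next)]}.
    Returns (total_cost, label_sequence) of the cheapest accepting path."""
    heap = [(0.0, start, [])]
    seen = set()
    while heap:
        cost, state, labels = heapq.heappop(heap)
        if state in finals:
            return cost, labels       # first final state popped is cheapest
        if state in seen:
            continue
        seen.add(state)
        for label, w, nxt in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, labels + [label]))
    return None

# Toy automaton: two competing homograph tags for "lead" with different costs.
arcs = {
    0: [("lead", 0.0, 1)],
    1: [("H0", 20.0, 2), ("H1", 2.0, 2)],   # hypothetical decision-list costs
    2: [],
}
print(best_path(arcs, 0, finals={2}))  # (2.0, ['lead', 'H1'])
```

The cheaper tag survives, which is exactly the "first matching rule wins" behavior of the decision list expressed as path costs.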


Syntactic Parsing and Analysis

• Intersection grammars (Voutilainen, 1994, inter alia)
• FST simulation of top-down parsing (Roche, 1996)
• Local grammars implemented as failure-function automata (Mohri, 1994)


Intersection Grammars

• Text automaton consisting of all possible lexical analyses of the input, including analysis of boundaries.

[Automaton diagram: @@@ the dt @@@ cans {vb | nn {sg | pl}} @@@ hold vb pl @@@ tuna nn sg @@@]


• Series of syntactic FSAs to be intersected with the text automaton, constraining it.

[Constraint FSA diagram: a deterministic automaton with ρ (default) arcs that rules out a vb reading after dt; intersecting it with the text automaton yields the pruned analysis: @@ the dt @ cans nn pl @ hold vb pl @ tuna nn sg @@]

• Experimental grammars with a couple of hundred rules have been constructed.
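The intersection step itself is the standard product construction on automata. The sketch below applies it to a tiny example; the automata, tag alphabet, and the particular constraint (no verb reading right after a determiner) are invented for illustration:

```python
# Product construction for automaton intersection, as used to apply
# constraint FSAs to a text automaton. Automata and tags are illustrative.

def intersect(dfa1, dfa2, alphabet):
    """Return a DFA accepting exactly the words accepted by both inputs.
    A DFA is a triple (start, finals, delta) with delta: {(state, sym): state};
    a missing transition means rejection."""
    (s1, f1, d1), (s2, f2, d2) = dfa1, dfa2
    start = (s1, s2)
    delta, finals, stack, seen = {}, set(), [start], {start}
    while stack:
        p, q = stack.pop()
        if p in f1 and q in f2:
            finals.add((p, q))
        for a in alphabet:
            if (p, a) in d1 and (q, a) in d2:
                nxt = (d1[(p, a)], d2[(q, a)])
                delta[((p, q), a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return start, finals, delta

def accepts(dfa, word):
    state, finals, delta = dfa
    for sym in word:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return state in finals

# "Text automaton": after a determiner, the next word may be vb or nn.
text = (0, {2}, {(0, "dt"): 1, (1, "vb"): 2, (1, "nn"): 2})
# Constraint FSA: forbid vb immediately after dt (no (B, "vb") transition).
constraint = ("A", {"A", "B"},
              {("A", "dt"): "B", ("A", "nn"): "A", ("A", "vb"): "A",
               ("B", "dt"): "B", ("B", "nn"): "A"})

inter = intersect(text, constraint, ["dt", "nn", "vb"])
print(accepts(inter, ["dt", "nn"]))  # True: the nn reading survives
print(accepts(inter, ["dt", "vb"]))  # False: the vb reading is pruned
```

Each constraint FSA is intersected in turn, so readings are pruned monotonically without ever enumerating the analyses one by one.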


Top Down Parsing

S = [S the cans hold tuna S]

[Transducer Tdic diagram: maps the outermost [NP ... NP], [VP ... VP], [S ... S] brackets to (NP ... NP), (VP ... VP), (S ... S), while inserting candidate [NP ... NP] and [VP ... VP] brackets one level down.]

S ∘ Tdic = (S [NP NP] [VP the cans hold tuna VP] S)
           (S [NP the NP] [VP cans hold tuna VP] S)
           (S [NP the cans NP] [VP hold tuna VP] S)
           (S [NP the cans hold NP] [VP tuna VP] S)
           (S [NP the cans hold tuna NP] [VP VP] S)

S ∘ Tdic ∘ Tdic = (S (NP the cans NP) (VP hold [NP tuna NP] VP) S)


Local Grammars

• Descriptions of local syntactic phenomena, compiled into efficient, compact, deterministic automata using failure functions. (Cf. the use of failure functions with (sets of) strings familiar from string matching, e.g. Crochemore & Rytter, 1994.)
• Descriptions may be negative or positive. Example of a negative constraint:
  – Let L(G) = DT @ WORD VB
  – Construct the deterministic automaton for Σ* L(G)

    [Automaton diagram: states 0-4 with transitions dt, @, WORD, vb, and φ (failure) transitions on mismatch.]

  – Given a sentence L(S), compute L(S) − Σ* L(G) Σ*
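The effect of the negative constraint can be checked directly by scanning for the forbidden factor. This naive sketch (tag names invented) recognizes the same language as the failure-function automaton, though without its single-pass efficiency:

```python
# Simulate L(S) - Sigma* L(G) Sigma* for L(G) = DT @ WORD VB:
# reject any tag sequence containing that factor. The compiled
# failure-function automaton of the slide does this in one linear pass;
# here we scan naively for clarity.

PATTERN = ["dt", "@", "WORD", "vb"]  # "WORD" matches any word token

def matches_at(tags, i):
    """True iff the forbidden pattern occurs at position i."""
    if i + len(PATTERN) > len(tags):
        return False
    for p, t in zip(PATTERN, tags[i:i + len(PATTERN)]):
        if p != "WORD" and p != t:
            return False
    return True

def satisfies_constraint(tags):
    """True iff the sequence contains no occurrence of L(G)."""
    return not any(matches_at(tags, i) for i in range(len(tags)))

print(satisfies_constraint(["dt", "@", "cans", "vb"]))  # False: forbidden
print(satisfies_constraint(["dt", "@", "cans", "nn"]))  # True
```

Applied to the automaton of all lexical analyses of a sentence, the constraint removes exactly the paths in which a determiner is followed by a verb reading.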


Indexation of natural language texts

• Motivation
  – Use of linguistic knowledge in indexation
  – Optimal complexities:
    · Preprocessing of a text t: O(|t|)
    · Search for the positions of a string x: O(|x| + NumOccurrences(x))
• Existing efficient indexation algorithms (PAT), but not convenient for use with large linguistic information


Example (1)

[Subsequential transducer diagram over t = aabba, with input:output arcs such as a:0, b:2, b:1, a:1, b:0, a:0; the accumulated outputs encode the positions at which the input string occurs in t.]

Figure 27: Indexation with subsequential transducers, t = aabba.


Example (2)

[Automaton diagram for t = aabba: each state is labeled with the set of ending positions of the factors reaching it, e.g. a → {1,2,5}, b → {3,4}, aa → {2}, aab → {3}, aabb → {4}, aabba → {5}; the initial state carries {0,1,2,3,4,5}.]

Figure 28: Indexation with automata, t = aabba.


Algorithms

• Based on the definition of an equivalence relation R on Σ*:
  (w1 R w2) iff w1 and w2 have the same set of ending positions in t
• Construction:
  – Minimal machines (subsequential transducer or automaton)
  – Use of a failure function to distinguish equivalence classes
• Can be adapted to natural language text
  (not storing the lists of positions of short words)
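For a small text the equivalence relation R can be computed by brute force, which makes the state labels of Figure 28 easy to verify; a sketch (the efficient construction builds the minimal machine directly, without enumerating all factors):

```python
from collections import defaultdict

def ending_positions(t):
    """Map each factor (substring) of t to its set of ending positions
    (1-indexed). This is the relation R of the slide: w1 R w2 iff they
    have the same ending-position set in t."""
    pos = defaultdict(set)
    for end in range(1, len(t) + 1):
        for start in range(end):
            pos[t[start:end]].add(end)
    return dict(pos)

idx = ending_positions("aabba")
print(idx["a"])    # {1, 2, 5}
print(idx["b"])    # {3, 4}
print(idx["aa"])   # {2}
# Factors with equal ending-position sets share a state in the minimal index:
print(idx["aabb"] == idx["bb"])   # True: both {4}
```

Merging all factors with the same ending-position set yields exactly the states of Figure 28, which is why the minimal index automaton stays compact.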


Indexation with finite-state machines

• Complexity
  – Transducers (Crochemore, 1986): Preprocessing O(|t|), Search O(|x| + NumOccurrences(x)) if using complex labels
  – Automata (Mohri, 1996b): Preprocessing quadratic, Search O(|x| + NumOccurrences(x))
• Advantage: use of linguistic information
  – Extended search: composition with a morphological transducer
  – Refinement: composition with a finite-state grammar
• Applications to the WWW (Internet)


References

[1] Aho, Alfred V., John E. Hopcroft, and Jeffrey D. Ullman. 1974. The Design and Analysis of Computer Algorithms. Addison Wesley: Reading, MA.

[2] Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques and Tools. Addison Wesley: Reading, MA.

[3] Austin, S., R. Schwartz, and P. Placeway. 1991. The forward-backward search algorithm. Proc. ICASSP '91, S10.3.

[4] Bahl, L.R., F. Jelinek, and R. Mercer. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on PAMI, pp. 179-190, March 1983.

[5] Bellegarda, J. 1990. Tied mixture continuous parameter modeling for speech recognition. IEEE Transactions on ASSP 38(12): 2033-2045.

[6] Berstel, Jean. 1979. Transductions and Context-Free Languages. Teubner Studienbücher: Stuttgart.

[7] Berstel, Jean, and Christophe Reutenauer. 1988. Rational Series and Their Languages. Springer-Verlag: Berlin-New York.

[8] Bird, Steven, and T. Mark Ellison. 1994. One-level phonology: Autosegmental representations and rules as finite automata. Computational Linguistics, 20(1):55-90.

[9] Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. Wadsworth & Brooks: Pacific Grove, CA.

[10] Chen, F. 1990. Identification of contextual factors for pronunciation networks. Proc. ICASSP '90, S14.9.

[11] Choffrut, Christian. 1978. Contributions à l'étude de quelques familles remarquables de fonctions rationnelles. Ph.D. thesis (thèse de doctorat d'État), Université Paris 7, LITP: Paris, France.

[12] Cormen, T., C. Leiserson, and R. Rivest. 1992. Introduction to Algorithms. The MIT Press: Cambridge, MA.

[13] Eilenberg, Samuel. 1974-1976. Automata, Languages and Machines, volumes A-B. Academic Press.

[14] Elgot, C. C. and J. E. Mezei. 1965. On relations defined by generalized finite automata. IBM Journal of Research and Development, 9.

[15] Gildea, Daniel, and Daniel Jurafsky. 1995. Automatic induction of finite state transducers for simple phonological rules. In 33rd Annual Meeting of the Association for Computational Linguistics, pages 9-15. ACL.

[16] Good, I. 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40(3-4): 237-264.

[17] Gopalakrishnan, P., L. Bahl, and R. Mercer. 1995. A tree search strategy for large-vocabulary continuous speech recognition. Proc. ICASSP '95, 572-575.

[18] Gross, Maurice. 1989. The use of finite automata in the lexical representation of natural language. Lecture Notes in Computer Science, 377.

[19] Hearst, Marti. 1991. Noun homograph disambiguation using local context in large text corpora. In Using Corpora, Waterloo, Ontario. University of Waterloo.

[20] Hopcroft, John E. and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages, and Computation. Addison Wesley: Reading, MA.

[21] Johnson, C. Douglas. 1972. Formal Aspects of Phonological Description. Mouton: The Hague.

[22] Joshi, Aravind K. 1996. A parser from antiquity: an early application of finite state transducers to natural language processing. In ECAI-96 Workshop, Budapest, Hungary. ECAI.

[23] Kaplan, Ronald M. and Martin Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20(3).

[24] Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, and Arto Anttila. 1995. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter.

[25] Karttunen, Lauri. 1995. The replace operator. In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge, Massachusetts. ACL.

[26] Karttunen, Lauri. 1996. Directed replacement. In 34th Annual Meeting of the Association for Computational Linguistics, pages 108-115. ACL.

[27] Karttunen, Lauri, and Kenneth Beesley. 1992. Two-level rule compiler. Technical Report P92-00149, Xerox Palo Alto Research Center.

[28] Karttunen, Lauri, Ronald M. Kaplan, and Annie Zaenen. 1992. Two-level morphology with composition. In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING'92), Nantes, France. COLING.

[29] Katz, S. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. on ASSP, 35(3):400-401.

[30] Kiraz, George Anton. 1996. SemHe: A generalized two-level system. In 34th Annual Meeting of the Association for Computational Linguistics, pages 159-166. ACL.

[31] Koskenniemi, Kimmo. 1983. Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. Ph.D. thesis, University of Helsinki, Helsinki.

[32] Koskenniemi, Kimmo. 1985. Compilation of automata from morphological two-level rules. In Proceedings of the Fifth Scandinavian Conference of Computational Linguistics, Helsinki, Finland.

[33] Koskenniemi, Kimmo. 1990. Finite-state parsing and disambiguation. In Proceedings of the Thirteenth International Conference on Computational Linguistics (COLING'90), Helsinki, Finland. COLING.

[34] Kuich, Werner and Arto Salomaa. 1986. Semirings, Automata, Languages. Springer-Verlag: Berlin-New York.

[35] Lee, C. and L. Rabiner. 1989. A frame-synchronous network search algorithm for connected word recognition. IEEE Transactions on ASSP 37(11): 1649-1658.

[36] Lee, K.F. 1990. Context dependent phonetic hidden Markov models for continuous speech recognition. IEEE Transactions on ASSP 38(4): 599-609.

[37] van Leeuwen, Hugo. 1989. TooLiP: A Development Tool for Linguistic Rules. Ph.D. thesis, Technical University Eindhoven.

[38] Leggetter, C.J. and P.C. Woodland. 1994. Speaker adaptation of continuous density HMMs using linear regression. Proc. ICSLP '94, 2:451-454. Yokohama.

[39] Ljolje, A. 1994. High accuracy phone recognition using context clustering and quasi-triphonic models. Computer Speech and Language, 8:129-151.

[40] Ljolje, A., M. Riley, and D. Hindle. 1996. Recent improvements in the AT&T 60,000 word speech-to-text system. Proc. DARPA Speech and Natural Language Workshop, February 1996. Harriman, NY.

[41] Lothaire, M. 1990. Mots. Hermès: Paris, France.

[42] Lucassen, J., and R. Mercer. 1984. An information theoretic approach to the automatic determination of phonemic baseforms. Proceedings of ICASSP '84, 42.5.1-42.5.4.

[43] Magerman, David. 1995. Statistical decision-tree models for parsing. In 33rd Annual Meeting of the Association for Computational Linguistics, pages 276-283. ACL.

[44] Mohri, Mehryar. 1994a. Compact representations by finite-state transducers. In 32nd Meeting of the Association for Computational Linguistics (ACL 94), Proceedings of the Conference, Las Cruces, New Mexico. ACL.

[45] Mohri, Mehryar. 1994b. Minimization of sequential transducers. Lecture Notes in Computer Science, 807.

[46] Mohri, Mehryar. 1994c. Syntactic analysis by local grammars automata: an efficient algorithm. In Papers in Computational Lexicography: COMPLEX '94, pages 179-191, Budapest. Research Institute for Linguistics, Hungarian Academy of Sciences.

[47] Mohri, Mehryar. 1995. Matching patterns of an automaton. Lecture Notes in Computer Science, 937.

[48] Mohri, Mehryar. 1996a. Finite-state transducers in language and speech processing. Computational Linguistics, to appear.

[49] Mohri, Mehryar. 1996b. On some applications of finite-state automata theory to natural language processing. Journal of Natural Language Engineering, to appear.

[50] Mohri, Mehryar, Fernando C. N. Pereira, and Michael Riley. 1996. Weighted automata in text and speech processing. In ECAI-96 Workshop, Budapest, Hungary. ECAI.

[51] Mohri, Mehryar and Richard Sproat. 1996. An efficient compiler for weighted rewrite rules. In 34th Meeting of the Association for Computational Linguistics (ACL 96), Proceedings of the Conference, Santa Cruz, California. ACL.

[52] Murveit, H., J. Butzberger, V. Digalakis, and M. Weintraub. 1993. Large-vocabulary dictation using SRI's DECIPHER speech recognition system: Progressive search techniques. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages II.319-II.322. ICASSP93.

[53] Odell, J.J., V. Valtchev, P. Woodland, and S. Young. 1994. A one pass decoder design for large vocabulary recognition. Proc. ARPA Human Language Technology Workshop, March 1994. Morgan Kaufmann.

[54] Oncina, José, Pedro García, and Enrique Vidal. 1993. Learning subsequential transducers for pattern recognition tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:448-458.

[55] Ostendorf, M., and N. Veilleux. 1994. A hierarchical stochastic model for automatic prediction of prosodic boundary location. Computational Linguistics, 20(1):27-54.

[56] Paul, D. 1991. Algorithms for an optimal A* search and linearizing the search in the stack decoder. Proc. ICASSP '91, S10.2.

[57] Pereira, Fernando C. N., and Michael Riley. 1996. Speech recognition by composition of weighted finite automata. CMP-LG archive paper 9603001.

[58] Pereira, Fernando C. N., Michael Riley, and Richard Sproat. 1994. Weighted rational transductions and their application to human language processing. In ARPA Workshop on Human Language Technology. Advanced Research Projects Agency.

[59] Pereira, Fernando C. N. and Rebecca N. Wright. 1991. Finite-state approximation of phrase structure grammars. In 29th Annual Meeting of the Association for Computational Linguistics (ACL 91), Proceedings of the Conference, Berkeley, California. ACL.

[60] Perrin, Dominique. 1990. Finite automata. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics. Elsevier, Amsterdam, pages 1-57.

[61] Rabiner, L. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), February 1989.

[62] Randolph, M. 1990. A data-driven method for discovering and predicting allophonic variation. Proc. ICASSP '90, S14.10.

[63] Revuz, Dominique. 1991. Dictionnaires et lexiques: méthodes et algorithmes. Ph.D. thesis, Université Paris 7: Paris, France.

[64] Riccardi, G., E. Bocchieri, and R. Pieraccini. 1995. Non-deterministic stochastic language models for speech recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages I.237-I.240. ICASSP95, May 1995.

[65] Riley, M. 1989. Some applications of tree-based modeling to speech and language. Proc. DARPA Speech and Natural Language Workshop, Cape Cod, MA, 339-352, October 1989.

[66] Riley, M. 1991. A statistical model for generating pronunciation networks. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages S11.1-S11.4. ICASSP91, October 1991.

[67] Riley, M., and A. Ljolje. 1992. Recognizing phonemes vs. recognizing phones: a comparison. Proceedings of ICSLP, Banff, Canada, October 1992.

[68] Riley, M., A. Ljolje, D. Hindle, and F. Pereira. 1995. The AT&T 60,000 word speech-to-text system. Eurospeech '95, Madrid, Spain, September 1995.

[69] Riley, M., F. Pereira, and E. Chung. 1995. Lazy transducer composition: a flexible method for on-the-fly expansion of context-dependent grammar networks. IEEE Workshop on Automatic Speech Recognition, Snowbird, Utah, December 1995.

[70] Roche, Emmanuel. 1993. Analyse syntaxique transformationnelle du français par transducteur et lexique-grammaire. Ph.D. thesis, Université Paris 7: Paris, France.

[71] Roche, Emmanuel. 1996. Finite-state transducers: Parsing free and frozen sentences. In Proceedings of the ECAI-96 Workshop on Extended Finite State Models of Language, Budapest, Hungary. European Conference on Artificial Intelligence.

[72] Salomaa, Arto and Matti Soittola. 1978. Automata-Theoretic Aspects of Formal Power Series. Springer-Verlag: New York.

[73] Schützenberger, Marcel Paul. 1961. A remark on finite transducers. Information and Control, 4:185-196.

[74] Schützenberger, Marcel Paul. 1961. On the definition of a family of automata. Information and Control, 4.

[75] Schützenberger, Marcel Paul. 1977. Sur une variante des fonctions séquentielles. Theoretical Computer Science.

[76] Schützenberger, Marcel Paul. 1987. Polynomial decomposition of rational functions. In Lecture Notes in Computer Science, volume 386. Springer-Verlag: Berlin-Heidelberg-New York.

[77] Silberztein, Max. 1993. Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Masson: Paris, France.

[78] Sproat, Richard. 1995. A finite-state architecture for tokenization and grapheme-to-phoneme conversion in multilingual text analysis. In Proceedings of the ACL SIGDAT Workshop, Dublin, Ireland. ACL.

[79] Sproat, Richard. 1996. Multilingual text analysis for text-to-speech synthesis. In Proceedings of the ECAI-96 Workshop on Extended Finite State Models of Language, Budapest, Hungary. European Conference on Artificial Intelligence.

[80] Sproat, Richard, Julia Hirschberg, and David Yarowsky. 1992. A corpus-based synthesizer. In Proceedings of the International Conference on Spoken Language Processing, pages 563-566, Banff. ICSLP.

[81] Sproat, Richard, and Michael Riley. 1996. Compilation of weighted finite-state transducers from decision trees. In 34th Annual Meeting of the Association for Computational Linguistics, pages 215-222. ACL.

[82] Sproat, Richard, Chilin Shih, William Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22(3).

[83] Tarjan, R. 1983. Data Structures and Network Algorithms. SIAM: Philadelphia.

[84] Traber, Christof. 1995. SVOX: The implementation of a text-to-speech system for German. Technical Report 7, Swiss Federal Institute of Technology, Zürich.

[85] Tzoukermann, Evelyne, and Mark Liberman. 1990. A finite-state morphological processor for Spanish. In COLING-90, Volume 3, pages 277-286. COLING.

[86] Voutilainen, Atro. 1994. Designing a parsing grammar. Technical Report 22, University of Helsinki.

[87] Wang, Michelle, and Julia Hirschberg. 1992. Automatic classification of intonational phrase boundaries. Computer Speech and Language, 6:175-196.

[88] Woods, W.A. 1970. Transition network grammars for natural language analysis. Communications of the ACM, 13(10).

[89] Woods, W.A. 1980. Cascaded ATN grammars. Computational Linguistics, 6(1).

[90] Yarowsky, David. 1992. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proceedings of COLING-92, Nantes, France. COLING.

[91] Yarowsky, David. 1996. Homograph disambiguation in text-to-speech synthesis. In Jan van Santen, Richard Sproat, Joseph Olive, and Julia Hirschberg, editors, Progress in Speech Synthesis. Springer, New York.

[92] Young, S.J., J.J. Odell, and P.C. Woodland. 1994. Tree-based state-tying for high accuracy acoustic modelling. Proc. ARPA Human Language Technology Workshop, March 1994. Morgan Kaufmann.