
This research was partially supported by the U.S. National Science Foundation via STIMULATE grant No. 9618874.

Introduction

The Structured Language Model (SLM)

- An attempt to exploit the syntactic structure of natural language

- Consists of a predictor, a tagger, and a parser

- Jointly assigns a probability to a word sequence and its parse structure

- Still suffers from the data sparseness problem; Deleted Interpolation (DI) has been used

Use of Kneser-Ney smoothing to improve performance
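Deleted Interpolation, mentioned above as the SLM's stock smoothing method, mixes relative frequencies of several n-gram orders. A minimal sketch with fixed, illustrative weights (in practice the weights are estimated by EM on held-out, i.e. "deleted", data):

```python
from collections import Counter

# Deleted Interpolation: mix trigram, bigram, and unigram relative
# frequencies.  The fixed weights below are illustrative only; the real
# weights are tuned by EM on held-out data.

def build_counts(tokens):
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bi = Counter(zip(tokens, tokens[1:]))
    uni = Counter(tokens)
    return tri, bi, uni

def deleted_interp_prob(w, u, v, tri, bi, uni, lams=(0.6, 0.3, 0.1)):
    """P(w | u, v) = l3*f(w|u,v) + l2*f(w|v) + l1*f(w)."""
    l3, l2, l1 = lams
    f3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    f2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    f1 = uni[w] / sum(uni.values())
    return l3 * f3 + l2 * f2 + l1 * f1
```

Unlike backoff schemes, every order contributes to every prediction, which is why DI still wastes probability mass on high-order estimates even when they are unreliable.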

The Structured Language Model (SLM)

Example of a Partial Parse

Probability estimation in the SLM

Kneser-Ney Smoothing

Experiment Result

N-Best Rescoring

Test Set PPL as a Function of Interpolation Weight

ASR WER for SWB

Smoothing Issues in the Structured Language Model

Woosung Kim, Sanjeev Khudanpur, and Jun Wu

The Center for Language and Speech Processing, The Johns Hopkins University
3400 N. Charles Street, Barton Hall, Baltimore, MD 21218
{woosung, sanjeev, junwu}@clsp.jhu.edu

Concluding Remarks

• KN smoothing of the SLM shows modest but consistent improvements

– both PPL and WER

• Future Work

– SLM with Maximum Entropy Models

– But Maximum Entropy Model training requires heavy computation

• Fruitful results in the selection of features for the Maximum Entropy Models

Smoothing              3gram    SLM      Intpl
Deleted Intpl          39.1%    38.6%    38.2%
KN-BO (Predictor)      38.3%    37.7%    37.5%
KN-BO (All Modules)    38.3%    37.8%    37.7%
Nonlinear Intpl        38.1%    37.6%    37.5%
NI w/ Deleted Est.     38.3%    37.7%    37.5%

[Figure: Test Set PPL as a function of the interpolation weight (0 to 1); PPL axis from 50 to 170. Curves shown for KN-BO and KN-NI on both WSJ and SWB.]

Language Model Perplexity

Smoothing             EM Iter.   WSJ 3gram   WSJ SLM   WSJ Intpl   SWB 3gram   SWB SLM   SWB Intpl
Deleted Intpl         EM0        162         166       149         70          73        67
                      EM3        162         154       146         70          72        66
KN-BO (Predictor)     EM0        152         166       139         64          64        60
                      EM3        152         149       137         64          63        60
KN-BO (All Modules)   EM0        152         170       141         64          64        60
                      EM3        152         153       140         64          63        60
Nonlinear Intpl       EM0        146         152       132         65          65        61
                      EM3        146         141       131         65          65        61
NI w/ Deleted Est.    EM0        145         150       131         64          63        60
                      EM3        145         141       130         64          64        60

Database Size Specifications (in Words)

Item                   WSJ           SWB
Word Voc.              10K (open)    21K (closed)
Part-Of-Speech Tags    40            49
Non-Terminal Tags      54            64
Parser Operations      136           112
LM Dev. Set            885K          2.07M
LM Check Set           117K          216K
LM Test Set            82K           20K
ASR Test Set           -             20K

• Two corpora
– Wall Street Journal (WSJ): UPenn Treebank, for the LM PPL test
– Switchboard (SWB): for the ASR WER test as well as LM PPL
• Tokenization
– Original SWB tokenization. Examples: They're, It's, etc. Not suitable for syntactic analysis
– Treebank tokenization. Examples: They 're, It 's, etc.
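The contraction splitting described above can be approximated with a short regex. This is a simplified stand-in for the actual Treebank tokenizer, covering only the common clitic cases:

```python
import re

# Treebank-style splitting of English contractions: clitics like n't, 're,
# 's become separate tokens so the parser can treat them as words.
# Simplified illustration; the real Treebank tokenizer has many more rules.

def treebank_split(text):
    return re.sub(r"(?i)(\w)(n't|'re|'s|'ll|'ve|'d|'m)", r"\1 \2", text)

print(treebank_split("They're sure it's fine"))  # They 're sure it 's fine
```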

Speech → Speech Recognizer (Baseline LM) → 100-best hypotheses → Rescoring (New LM) → 1 hypothesis
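The rescoring step in this pipeline can be sketched as follows; the `lm_weight` scale factor and the score fields are illustrative assumptions, not values from the experiments:

```python
# N-best rescoring: the baseline recognizer produces an N-best list with
# acoustic log-scores; a new LM rescores each hypothesis, and the single
# best combined hypothesis is returned.

def rescore_nbest(nbest, lm_logprob, lm_weight=10.0):
    """nbest: list of (word_tuple, acoustic_logscore).  Returns the word
    sequence maximizing acoustic + lm_weight * language-model log score."""
    best = max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
    return best[0]
```

Rescoring avoids integrating the (expensive) SLM into first-pass decoding: only the 100 hypotheses per utterance need SLM probabilities.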

Parse tree probability:

P(W, T) = \prod_{i=1}^{n+1} \Big[ P(w_i \mid h_{-1}, h_{-2}) \cdot P(tag_i \mid w_i, h_{-1}.tag, h_{-2}.tag) \cdot \prod_{j} P(p_i^j \mid h_{-1}, h_{-2}) \Big]

(predictor, tagger, and parser components, respectively; h_{-1} and h_{-2} are the two most recent exposed heads)

LM PPL uses the word-level probability

P(w_{i+1} \mid W_i) = \sum_{T_i \in S_i} P(w_{i+1} \mid h_{-1}(W_i, T_i), h_{-2}(W_i, T_i)) \cdot \rho(W_i, T_i),

where \rho(W_i, T_i) = P(W_i, T_i) \Big/ \sum_{T'_i \in S_i} P(W_i, T'_i).
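The LM PPL figures reported here follow the usual definition; a small sketch of the computation, with placeholder probabilities standing in for model output:

```python
import math

# Test-set perplexity from per-word conditional probabilities:
# PPL = exp(-(1/N) * sum_i log P(w_i | history_i)).
# In the experiments these values come from the smoothed trigram/SLM models;
# the list passed in below is a placeholder.

def perplexity(word_probs):
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

print(perplexity([0.25] * 8))  # uniform 1/4 probabilities -> PPL of about 4
```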

[Partial parse of "The contract ended with a loss of 7 cents after".
POS tags: DT NN VBD IN DT NN IN CD NNS.
Constituents with headwords: (contract, NP), (ended, VP), (with, PP), (loss, NP), (of, PP), (cents, NP).]

Backoff:

P(w \mid u, v) =
  (N(u, v, w) - D) / N(u, v),                       if N(u, v, w) > 0
  \beta(u, v) \cdot n(v, w) / \sum_{w'} n(v, w'),   otherwise

Nonlinear Interpolation:

P(w \mid u, v) = \max\{N(u, v, w) - D, 0\} / N(u, v) + \beta(u, v) \cdot n(v, w) / \sum_{w'} n(v, w')

where n(v, w) = |\{\tilde{u} : N(\tilde{u}, v, w) > 0\}| is the Kneser-Ney type count and \beta(u, v) is chosen so that \sum_w P(w \mid u, v) = 1.
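The two Kneser-Ney variants can be sketched for a bigram model (a single backoff level keeps the code short; the SLM applies the same idea to its predictor, tagger, and parser contexts). The nonlinear-interpolation form is shown because it normalizes trivially; backoff KN applies the lower-order term only to unseen words with a renormalized beta. The discount D = 0.5 is an illustrative choice, not the value used in the experiments:

```python
from collections import Counter

# Kneser-Ney, nonlinear-interpolation form, for a bigram model:
#   P(w|v) = max(N(v,w) - D, 0)/N(v) + beta(v) * n(w)/sum_w' n(w')
# where n(w) = |{v~ : N(v~, w) > 0}| is the type count and
# beta(v) = D * |{w : N(v,w) > 0}| / N(v), so the distribution sums to 1.

def kn_bigram(tokens, D=0.5):
    N = Counter(zip(tokens, tokens[1:]))       # N(v, w) raw bigram counts
    ctx = Counter(tokens[:-1])                 # N(v) = sum_w N(v, w)
    followers = Counter(v for v, _ in N)       # |{w : N(v, w) > 0}|
    n_w = Counter(w for _, w in N)             # n(w) type counts
    n_total = sum(n_w.values())                # number of distinct bigram types

    def prob(w, v):
        discounted = max(N[(v, w)] - D, 0.0) / ctx[v]
        beta = D * followers[v] / ctx[v]       # mass freed by discounting
        return discounted + beta * n_w[w] / n_total

    return prob
```

Note the lower-order distribution counts contexts (types), not occurrences: a word seen often but only after one context gets little backoff mass, which is the key difference from Deleted Interpolation.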
