
This research was partially supported by the U.S. National Science Foundation via STIMULATE grant No. 9618874.

Introduction

The Structured Language Model (SLM)

- An attempt to exploit the syntactic structure of natural language
- Consists of a predictor, a tagger and a parser
- Jointly assigns a probability to a word sequence and its parse structure
- Still suffers from the data sparseness problem; Deleted Interpolation (DI) has been used to combat it

This work: use of Kneser-Ney smoothing to improve performance
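Deleted interpolation, the baseline smoothing mentioned above, can be sketched as follows (a minimal illustration with hypothetical interpolation weights l1-l3; in practice the weights are estimated by EM on held-out data, and this is not the authors' implementation):

```python
from collections import Counter

# Deleted interpolation for a trigram model:
#   P(w | u, v) = l3*f(w | u, v) + l2*f(w | v) + l1*f(w)
# where f(.) are relative frequencies.  The weights l1..l3 are fixed
# here for illustration; in practice they are estimated by EM on
# held-out ("deleted") data.

corpus = "the contract ended with a loss of seven cents".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def p_deleted_interp(u, v, w, l3=0.6, l2=0.3, l1=0.1):
    f3 = trigrams[(u, v, w)] / bigrams[(u, v)] if bigrams[(u, v)] else 0.0
    f2 = bigrams[(v, w)] / unigrams[v] if unigrams[v] else 0.0
    f1 = unigrams[w] / total
    return l3 * f3 + l2 * f2 + l1 * f1

print(p_deleted_interp("the", "contract", "ended"))
```

Because the interpolation weights never depend on whether the trigram was actually seen, frequent and rare histories get the same treatment; this is one of the weaknesses Kneser-Ney smoothing addresses.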

The Structured Language Model (SLM)

Example of a Partial Parse

Probability estimation in the SLM

Kneser-Ney Smoothing

Experimental Results

N-Best Rescoring

Test Set PPL as a Function of Interpolation Weight

ASR WER for SWB

Smoothing Issues in the Structured Language Model
Woosung Kim, Sanjeev Khudanpur, and Jun Wu

The Center for Language and Speech Processing, The Johns Hopkins University
{woosung, sanjeev, junwu}@clsp.jhu.edu
3400 N. Charles Street, Barton Hall, Baltimore, MD 21218

Concluding Remarks

• KN smoothing of the SLM shows modest but consistent improvements
  – in both PPL and WER

• Future Work
  – SLM with Maximum Entropy models
  – But Maximum Entropy model training requires heavy computation
  – Fruitful results are expected from the selection of features for the Maximum Entropy models

ASR WER for SWB (N-best rescoring):

Smoothing             3gram    SLM      Intpl
Deleted Intpl         39.1%    38.6%    38.2%
KN-BO (Predictor)     38.3%    37.7%    37.5%
KN-BO (All Modules)   38.3%    37.8%    37.7%
Nonlinear Intpl       38.1%    37.6%    37.5%
NI w/Deleted Est.     38.3%    37.7%    37.5%

[Figure: Test set PPL as a function of interpolation weight (x-axis 0 to 1; y-axis PPL, 50 to 170). Four curves: KN BO for WSJ, KN NI for WSJ, KN BO for SWB, KN NI for SWB.]

Language Model Perplexity

WSJ Corpus SWBEM Iter. 3gram SLM Intpl Smoothing EM Iter. 3gram SLM Intpl

EM0

EM370 73

72

67

66

Deleted

Intpl

EM0

EM3

162 166

154

149

146

EM0

EM364 64

63

60

60

KN-BO

(Predictor)

EM0

EM3

152 166

149

139

137

EM0

EM364 64

63

60

60

KN-BO

All Modules

EM0

EM3

152 170

153

141

140

EM0

EM365 65

65

61

61

Nonlinear

Intpl

EM0

EM3

146 152

141

132

131

EM0

EM364 63

64

60

60

NI w/Deleted Estimation

EM0

EM3

145 150

141

131

130

Item                  WSJ          SWB
Word Voc.             10K (open)   21K (closed)
Part-Of-Speech Tags   40           49
Non-Terminal Tags     54           64
Parser Operations     136          112

Database Size Specifications (in Words):

               WSJ     SWB
LM Dev. Set    885K    2.07M
LM Check Set   117K    216K
LM Test Set    82K     20K
ASR Test Set   -       20K

• Two corpora
  – Wall Street Journal (WSJ), UPenn Treebank: for LM PPL tests
  – Switchboard (SWB): for ASR WER tests as well as LM PPL

• Tokenization
  – Original SWB tokenization (e.g., They're, It's): not suitable for syntactic analysis
  – Treebank tokenization (e.g., They 're, It 's)
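The difference between the two tokenizations can be illustrated with a small clitic-splitting rule (an illustrative regular expression, not the actual Treebank tokenizer, which also handles n't, punctuation, and other cases):

```python
import re

# Treebank-style clitic splitting: "They're" -> "They 're",
# "It's" -> "It 's".  A single illustrative rule only.
def treebank_split(text):
    return re.sub(r"(\w)('(?:re|s|ve|ll|d|m)\b)", r"\1 \2", text)

print(treebank_split("They're sure It's fine"))  # They 're sure It 's fine
```

Splitting clitics this way lets the parser treat "'re" and "'s" as separate verb tokens, which is what the syntactic analysis requires.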

Speech → Speech Recognizer (Baseline LM) → 100-best hypotheses → Rescoring (New LM) → 1 hypothesis
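This rescoring step can be sketched as follows (the hypothesis format, the toy_lm stand-in, and the lm_weight parameter are all hypothetical; a real system combines acoustic and LM log-scores in essentially this log-linear way):

```python
# N-best rescoring: each hypothesis keeps its acoustic log-score from
# the first pass; a new LM re-scores the word string, and the best
# combined score wins.
def rescore_nbest(nbest, new_lm_logprob, lm_weight=10.0):
    best = max(nbest,
               key=lambda h: h["acoustic"] + lm_weight * new_lm_logprob(h["words"]))
    return best["words"]

# Toy stand-in for the new LM's log-probability: prefer shorter strings.
toy_lm = lambda words: -0.5 * len(words)

nbest = [
    {"words": ["the", "contract", "ended"], "acoustic": -100.0},
    {"words": ["the", "contract", "ended", "it"], "acoustic": -99.0},
]
print(rescore_nbest(nbest, toy_lm))  # ['the', 'contract', 'ended']
```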

Parse tree probability (h_{-1}, h_{-2} denote the two most recent exposed heads, each carrying a word and a tag):

\[
P(W, T) \;=\; \prod_{i=1}^{n+1}
\underbrace{P(w_i \mid h_{-2}, h_{-1})}_{\text{predictor}} \;
\underbrace{P(tag_i \mid w_i,\, h_{-1}.tag,\, h_{-2}.tag)}_{\text{tagger}} \;
\underbrace{P(T_i \mid w_i,\, tag_i,\, T_{i-1})}_{\text{parser}}
\]

LM PPL:

\[
P(w_{i+1} \mid W_i) \;=\; \sum_{T_i \in S_i} \rho(W_i, T_i)\, P(w_{i+1} \mid W_i, T_i),
\qquad \text{where } \rho(W_i, T_i) = \frac{P(W_i, T_i)}{\sum_{T \in S_i} P(W_i, T)}.
\]
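The SLM's word-level probability, a rho-weighted sum over the stack of partial parses T_i in S_i, can be sketched with hypothetical parse scores (the joint probabilities and conditional distributions below are made up for illustration):

```python
# P(w_{i+1} | W_i) = sum_{T in S_i} rho(W_i, T) * P(w_{i+1} | W_i, T),
# with rho(W_i, T) = P(W_i, T) / sum_{T' in S_i} P(W_i, T').
def next_word_prob(parses, word):
    # parses: list of (joint probability of prefix and parse,
    #                  conditional next-word distribution for that parse)
    z = sum(joint for joint, _ in parses)
    return sum((joint / z) * cond[word] for joint, cond in parses)

parses = [
    (0.03, {"ended": 0.2, "rose": 0.1}),   # hypothetical parse 1
    (0.01, {"ended": 0.4, "rose": 0.05}),  # hypothetical parse 2
]
print(next_word_prob(parses, "ended"))  # 0.75*0.2 + 0.25*0.4, i.e. about 0.25
```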

Example of a partial parse:

  The contract ended with a loss of 7 cents after
  DT  NN       VBD   IN   DT NN  IN CD NNS

[Parse tree figure: constituents with percolated headwords — contract/NP, ended/VP, with/PP, loss/NP, of/PP, cents/NP.]

Backoff (KN-BO):

\[
P(w \mid u, v) =
\begin{cases}
\dfrac{N(u,v,w) - d}{N(u,v)} & \text{if } N(u,v,w) > 0 \\[6pt]
\beta(u,v)\, \dfrac{n(v,w)}{\sum_{\tilde{w}:\, N(u,v,\tilde{w}) = 0} n(v,\tilde{w})} & \text{otherwise}
\end{cases}
\]

Nonlinear Interpolation:

\[
P(w \mid u, v) \;=\; \frac{\max\{N(u,v,w) - d,\, 0\}}{N(u,v)}
\;+\; \beta(u,v)\, \frac{n(v,w)}{\sum_{\tilde{w}} n(v,\tilde{w})}
\]

where \(n(v,w) = |\{\tilde{u} : N(\tilde{u},v,w) > 0\}|\) counts the distinct contexts in which \((v,w)\) occurs, and \(\beta(u,v)\) is set so that \(\sum_w P(w \mid u, v) = 1\).
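The nonlinear-interpolation variant can be sketched on toy trigram counts (a minimal illustration assuming a single discount d and using the bigram count as the context total N(u,v); boundary effects are ignored and this is not the authors' implementation):

```python
from collections import Counter

corpus = "a b c a b d a b c e a b".split()
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
bi = Counter(zip(corpus, corpus[1:]))

# Continuation counts: n(v, w) = |{u~ : N(u~, v, w) > 0}|
n_cont = Counter()
for (u, v, w) in tri:
    n_cont[(v, w)] += 1

def kn_interpolated(u, v, w, d=0.5):
    """Nonlinear (interpolated) KN: discounted relative frequency plus a
    continuation-count lower-order term weighted by the escaped mass."""
    n_uv = bi[(u, v)]
    if n_uv == 0:
        return 0.0  # a full implementation would back off further
    distinct_continuations = sum(1 for key in tri if key[:2] == (u, v))
    beta = d * distinct_continuations / n_uv
    cont_total = sum(c for (v2, _), c in n_cont.items() if v2 == v)
    lower = n_cont[(v, w)] / cont_total if cont_total else 0.0
    return max(tri[(u, v, w)] - d, 0) / n_uv + beta * lower

print(kn_interpolated("a", "b", "c"))  # 0.5
```

The key departure from deleted interpolation is the lower-order term: it uses the number of distinct contexts in which (v, w) was seen rather than its raw frequency, so words that occur often but only in one context do not dominate the back-off distribution.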