Dynamic Match Lattice Spotting


Spoken Term Detection Evaluation

Queensland University of Technology

Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan

Presented by Roy Wallace


Overview

• Open-vocabulary, phone-based index

• Based on the lattice spotting technique

• Two-tier database

• Dynamic-match rules

• Algorithmic optimisations

NOTE: Patented technology

Concept

Search term: greasy

Phone decomposition: g r iy s iy

Phone sequences in the lattice:

ae nx m d ow

nx r n ay th

iy s ax r g

…
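Phone decomposition can be sketched as a dictionary lookup. This is a minimal Python sketch, not the system's implementation: `LEXICON` is a hypothetical three-word lexicon whose entries were written to match the phone strings used on these slides, and `decompose` is an illustrative helper. A full system would draw on a complete pronunciation dictionary (such as the CMU Pronouncing Dictionary) and fall back to letter-to-sound (LTS) rules for words it does not contain; that fallback is only marked here.

```python
# Hypothetical lexicon (stress markers removed), written to match the slides' examples.
LEXICON: dict[str, list[str]] = {
    "greasy": ["g", "r", "iy", "s", "iy"],
    "olympic": ["ow", "l", "ih", "m", "p", "ih", "k"],
    "sites": ["s", "ay", "t", "s"],
}


def decompose(term: str, lexicon: dict[str, list[str]] = LEXICON) -> list[str]:
    """Concatenate the pronunciations of each word in a (possibly multi-word) term."""
    phones: list[str] = []
    for word in term.lower().split():
        if word not in lexicon:
            # A letter-to-sound (LTS) fallback for out-of-vocabulary words would go here.
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return phones
```

Multi-word terms are handled by concatenating word pronunciations, which yields the 11-phone string used for "olympic sites" on the Long Term Merging slide.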

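Concept

Target sequence: g r iy s iy

Observed sequences, each assigned a cost by dynamic matching:

g r ax s ih

th ay n r nx

ow d m nx ae

…

Dynamic matching: g r ax s ih matches the target except for the substitutions iy → ax and iy → ih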

Indexing

Audio → Feature extraction → Segmentation → Speech recognition → Lattices → Sequence generation → Sequence DB → Hyper-sequence generation → Hyper-sequence DB
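The sequence generation step can be illustrated with a small sketch that walks a phone lattice and records every phone sequence of a fixed length together with its start time. Assumptions (not from the slides): the lattice is node-labelled and stored as a dict of `node -> (phone, start_time, successors)`, `seq_len` is a hypothetical parameter, and paths that end before reaching `seq_len` phones are simply dropped.

```python
from collections import defaultdict

# Hypothetical node-labelled lattice: node id -> (phone, start time in seconds, successor ids)
Lattice = dict[int, tuple[str, float, list[int]]]


def generate_sequences(lattice: Lattice, seq_len: int) -> dict[tuple[str, ...], list[float]]:
    """Collect every phone sequence of exactly seq_len nodes along lattice paths,
    keyed by the phone tuple, storing the start time of each occurrence."""
    seq_db: dict[tuple[str, ...], list[float]] = defaultdict(list)

    def extend(node: int, path: list[str], start: float) -> None:
        phone, _, successors = lattice[node]
        path = path + [phone]
        if len(path) == seq_len:
            seq_db[tuple(path)].append(start)
            return
        for nxt in successors:  # paths that run out of nodes early are dropped
            extend(nxt, path, start)

    for node, (_, start_time, _) in lattice.items():
        extend(node, [], start_time)
    return dict(seq_db)
```

Because the database is keyed by the phone tuple, identical sequences found on different lattice paths or at different times share a single entry with several start times.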


Hyper-sequence Mapping

• Map individual phones to “parent” classes

– We use Vowels, Fricatives, Glides, Stops and Nasals

• Simple example

– Parent classes: Vowels, Consonants

– Map each phone to its parent class to create a hyper-sequence

$v_i \mapsto V, \quad c_i \mapsto C, \quad i = 1, 2, 3, \ldots$

Sequence DB → Hyper-sequence DB
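A mapping to the five parent classes named above might look like the sketch below. The slide lists the classes but not their membership, so the assignment in `CLASS_MEMBERS` is an assumption, including where affricates (`ch`, `jh`), liquids (`l`, `r`), the flap `dx` and `nx` are placed; `hyper_sequence` is a hypothetical helper name.

```python
# Assumed membership of the five parent classes (the slide names only the classes).
CLASS_MEMBERS = {
    "V": "aa ae ah ao aw ax ay eh er ey ih iy ow oy uh uw",  # vowels
    "F": "f v th dh s z sh zh hh ch jh",  # fricatives (affricates placed here by assumption)
    "G": "w y l r",                       # glides (liquids included by assumption)
    "S": "p b t d k g dx",                # stops (flap dx included by assumption)
    "N": "m n ng nx",                     # nasals
}
PARENT_CLASS = {p: cls for cls, members in CLASS_MEMBERS.items() for p in members.split()}


def hyper_sequence(phones: list[str]) -> tuple[str, ...]:
    """Replace each phone by its parent class label; unknown phones raise KeyError."""
    return tuple(PARENT_CLASS[phone] for phone in phones)
```

Under this assignment, g r iy s iy and the near miss g r ax s ih from the Concept slide share the hyper-sequence S G V F V, so a lookup through the hyper-sequence DB would still reach that near miss.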

Hyper-sequence Mapping

Search term: g r iy s iy

Hyper-sequence: C C V C V

Sequence DB (example entries):

g r oy s ih

t l ow p iy

nx s eh r ay

d r ax b ae

b f ax d aa

oy b r aa f

eh g r iy m

…

Hyper-sequence DB (example entries):

C C V C V

V C C V C

C V C V C

…

Searching

Term → Phone decomp. → Split long terms → Hyper-mapping → Hyper-sequence DB → Sequence DB → Dynamic matching, followed by keyword verification and merging of long terms to give the Results

Dynamic Matching

• Minimum Edit Distance (MED), i.e. Levenshtein distance

• Allows insertions, deletions and substitutions

• Finds the minimum cost of transforming one phone sequence into another
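The MED recurrence can be sketched in a few lines. This is the textbook Levenshtein dynamic programme with pluggable costs; the default unit costs are placeholders (the next slide derives substitution costs from phone confusion statistics), and the parameter names are our own.

```python
from typing import Callable, Sequence


def med(target: Sequence[str], observed: Sequence[str],
        sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
        ins_cost: float = 1.0, del_cost: float = 1.0) -> float:
    """Minimum edit distance between a target and an observed phone sequence."""
    m = len(observed)
    # prev[j]: cost of turning the target prefix processed so far into observed[:j]
    prev = [j * ins_cost for j in range(m + 1)]
    for i in range(1, len(target) + 1):
        curr = [i * del_cost] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + del_cost,      # target[i-1] deleted
                curr[j - 1] + ins_cost,  # observed[j-1] inserted
                prev[j - 1] + sub_cost(target[i - 1], observed[j - 1]),  # match/substitute
            )
        prev = curr
    return prev[m]
```

Ranking the observed sequences from the Concept slide by `med` against g r iy s iy puts g r ax s ih first, with a cost of 2 under unit costs.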

Dynamic Matching

• Substitution costs

– Derived from phone confusion statistics

$$C_s(x, y) = I(E_x \mid R_y) = -\log p(E_x \mid R_y)$$

where

$C_s(x, y)$: cost of substituting phone $x$ with phone $y$

$E_x$: phone $x$ was emitted by the recogniser

$R_y$: phone $y$ was in the reference transcript
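Estimating these costs from confusion counts is a direct application of the formula. In this sketch `confusions[(y, x)]` is assumed to count how often reference phone y was emitted by the recogniser as phone x, for example after aligning recogniser output with reference transcripts; the function name and data layout are hypothetical. Pairs never observed get no entry, so a practical system would need smoothing or a default cost.

```python
import math
from collections import Counter


def substitution_costs(confusions: Counter) -> dict[tuple[str, str], float]:
    """confusions[(ref_y, emitted_x)] = count of reference phone y recognised as x.
    Returns {(x, y): C_s(x, y)} with C_s(x, y) = -log p(E_x | R_y)."""
    ref_totals: Counter = Counter()
    for (ref_y, _), n in confusions.items():
        ref_totals[ref_y] += n  # total occurrences of each reference phone
    return {
        (emitted_x, ref_y): -math.log(n / ref_totals[ref_y])
        for (ref_y, emitted_x), n in confusions.items()
        if n > 0
    }
```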


Optimisations

• Prefix sequence optimisation

• Early stopping optimisation

• Linearised MED search approximation
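The slide only names these optimisations. The sketch below shows one plausible way to realise the first two, under our own assumptions rather than as the patented method: candidate sequences are visited in sorted order so that neighbours share prefixes, MED rows already computed for a shared prefix of the observed sequence are reused (prefix optimisation), and a sequence is abandoned as soon as every entry of its current row exceeds the acceptance threshold (early stopping). Early stopping is safe because, with non-negative costs, the minimum of a row never decreases as more observed phones are consumed. Unit costs and the names `search`, `INS` and `DEL` are placeholders, and the linearised approximation is not shown.

```python
from typing import Sequence

INS = 1.0  # hypothetical cost of an extra observed phone
DEL = 1.0  # hypothetical cost of a target phone missing from the observation


def sub_cost(x: str, y: str) -> float:
    # Placeholder unit substitution cost; confusion-derived costs could be used instead
    return 0.0 if x == y else 1.0


def search(target: Sequence[str], sequences: list[tuple[str, ...]],
           threshold: float) -> dict[tuple[str, ...], float]:
    """Return {sequence: MED cost} for every sequence within threshold of target."""
    m = len(target)
    hits: dict[tuple[str, ...], float] = {}
    # rows[i] is the DP row after consuming the first i observed phones
    rows: list[list[float]] = [[j * DEL for j in range(m + 1)]]
    prev: tuple[str, ...] = ()
    for seq in sorted(sequences):
        # Prefix optimisation: keep rows for the prefix shared with the previous sequence
        shared = 0
        while shared < min(len(seq), len(prev)) and seq[shared] == prev[shared]:
            shared += 1
        shared = min(shared, len(rows) - 1)  # previous sequence may have stopped early
        del rows[shared + 1:]
        stopped = False
        for i in range(shared + 1, len(seq) + 1):
            above = rows[-1]
            row = [above[0] + INS] + [0.0] * m
            for j in range(1, m + 1):
                row[j] = min(above[j] + INS,    # observed phone seq[i-1] inserted
                             row[j - 1] + DEL,  # target phone target[j-1] deleted
                             above[j - 1] + sub_cost(target[j - 1], seq[i - 1]))
            rows.append(row)
            # Early stopping: no later row can have a smaller minimum than this one
            if min(row) > threshold:
                stopped = True
                break
        if not stopped and rows[-1][m] <= threshold:
            hits[seq] = rows[-1][m]
        prev = seq
    return hits
```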

Long Term Merging

Term: olympic sites → ow l ih m p ih k s ay t s

Split into overlapping parts: ow l ih m p ih k | p ih k s ay t s

Search each part → Merge → Results
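Splitting and merging can be sketched as below. The slide does not give the split length, so `max_len=7` and `overlap=3` are hypothetical values chosen only because they reproduce the slide's example, where the 11-phone term is searched as two 7-phone parts sharing p ih k. The merge rule (a hit for the next part must start within the time span of the chain built so far, plus a tolerance) and the summed cost are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    start: float  # seconds
    end: float    # seconds
    cost: float   # MED cost of the matched part (summed after merging)


def split_term(phones: list[str], max_len: int, overlap: int) -> list[list[str]]:
    """Split a phone sequence into parts of at most max_len phones, where
    consecutive parts share `overlap` phones."""
    if not 0 <= overlap < max_len:
        raise ValueError("need 0 <= overlap < max_len")
    if len(phones) <= max_len:
        return [phones]
    parts, start = [], 0
    while True:
        parts.append(phones[start:start + max_len])
        if start + max_len >= len(phones):
            return parts
        start += max_len - overlap


def merge_hits(part_hits: list[list[Hit]], tolerance: float = 0.1) -> list[Hit]:
    """Chain hits of consecutive parts: a hit for the next part extends a chain if it
    starts within the chain's time span (plus tolerance). Costs are summed."""
    chains = list(part_hits[0])
    for hits in part_hits[1:]:
        chains = [
            Hit(c.start, max(c.end, h.end), c.cost + h.cost)
            for c in chains for h in hits
            if c.start <= h.start <= c.end + tolerance
        ]
    return chains
```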


Keyword Verification

• Acoustic

– Use acoustic score from lattice to boost occurrences with high confidence

• Neural Network

– Produce a confidence score by fusing

• MED score and Acoustic score

• Term phone length

• Term phone classes
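As an illustration of the fusion step, the sketch below combines the four inputs listed on the slide with a single logistic unit. This is deliberately simpler than the neural network the slide refers to, whose architecture and training are not described here; the weights are hypothetical and untrained, the acoustic score scale is assumed, and the vowel fraction is only a stand-in for the "term phone classes" input.

```python
import math
from dataclasses import dataclass

# Assumed vowel set, used only to derive a crude "term phone classes" feature.
VOWELS = set("aa ae ah ao aw ax ay eh er ey ih iy ow oy uh uw".split())


@dataclass
class Candidate:
    med_cost: float          # dynamic-matching (MED) cost of the putative occurrence
    acoustic_score: float    # lattice-derived acoustic score (scale is an assumption)
    term_phones: list[str]   # phone decomposition of the search term


def features(c: Candidate) -> list[float]:
    n = len(c.term_phones)
    vowel_fraction = sum(p in VOWELS for p in c.term_phones) / n
    return [c.med_cost, c.acoustic_score, float(n), vowel_fraction]


def confidence(c: Candidate, weights: list[float], bias: float) -> float:
    """Single logistic unit: sigmoid(bias + weights . features)."""
    z = bias + sum(w * x for w, x in zip(weights, features(c)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained weights: penalise MED cost, reward acoustic score.
WEIGHTS = [-1.5, 0.8, 0.1, 0.5]
BIAS = 0.0
```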

Results

Maximum Term-Weighted Value on EvalSet terms

Source type   DevSet phone error rate   Primary system   No Acous.*   LTS Only*
Bnews         24%                       0.246            0.245        0.208
CTS           45%                       0.104            0.102        0.080
Confmtg       56%                       0.021            0.019        0.016

* Contrastive systems

Index size: 558 MB/Sh (297 MB/Sh for No Acous.)

Index speed: 18x real-time

Search speed: 3 hr searched / CPU-sec
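For reference, the NIST 2006 STD evaluation defines Term-Weighted Value at a decision threshold as one minus the average over terms of P_miss + β·P_FA, with P_FA = N_FA / (T_speech − N_true), T_speech in seconds and β = 999.9; the maximum value is taken over thresholds. The sketch below computes both from per-term counts; the function names and input layout are our own, and terms with no true occurrences are assumed to have been excluded beforehand.

```python
import math


def twv(term_counts: list[tuple[int, int, int]], speech_seconds: float,
        beta: float = 999.9) -> float:
    """term_counts: one (n_true, n_correct, n_false_alarm) triple per term, n_true > 0."""
    total = 0.0
    for n_true, n_correct, n_fa in term_counts:
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_fa / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(term_counts)


def mtwv(detections: list[tuple[str, float, bool]], n_true: dict[str, int],
         speech_seconds: float, beta: float = 999.9) -> float:
    """detections: (term, score, is_correct); every score is tried as a threshold."""
    best = -math.inf
    for theta in sorted({score for _, score, _ in detections}) + [math.inf]:
        counts = []
        for term, n in n_true.items():
            kept = [ok for t, s, ok in detections if t == term and s >= theta]
            counts.append((n, sum(kept), len(kept) - sum(kept)))
        best = max(best, twv(counts, speech_seconds, beta))
    return best
```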


Conclusion

• Open-vocabulary and phone-based

• Patented technology utilises

– sequence and hyper-sequence databases

– optimisations for rapid searches

• Advantages

– Other languages

– Economy of scale


Conclusion

• Limitations

– Indexing speed and size

– Need to split long sequences

• Future work

– Keyword Verification

• Word-level information (e.g. LVCSR)

• Acoustic features (e.g. prosody)

– Indexing/searching frameworks

– Spoken Document Retrieval and other semantic applications

