Dynamic Match Lattice Spotting: Spoken Term Detection Evaluation. Queensland University of Technology. Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan. Presented by Roy Wallace.

Page 1: Dynamic Match Lattice Spotting

1

Dynamic Match Lattice Spotting

Spoken Term Detection Evaluation

Queensland University of Technology

Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan

Presented by Roy Wallace

Page 2: Dynamic Match Lattice Spotting

2

Overview

• Phonetic-based index, hence open-vocabulary

• Based on lattice-spotting technique

• Two-tier database

• Dynamic-match rules

• Algorithmic optimisations

NOTE: Patented technology

Page 3: Dynamic Match Lattice Spotting

3

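Concept

Search term: greasy

Phone decomposition: g r iy s iy

[Figure: the phone sequence of the search term is compared against phone sequences stored in the index, e.g. g r ax s iy, th ay n r nx, ow d m nx ae, …]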

Page 4: Dynamic Match Lattice Spotting

4

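Concept

Target sequence: g r iy s iy

Observed sequences (each assigned a cost by dynamic matching):

g r ax s ih

th ay n r nx

ow d m nx ae

…

[Figure: dynamic matching aligns the target g r iy s iy with the observed sequence g r ax s ih; the mismatched phones ax and ih are highlighted.]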

Page 5: Dynamic Match Lattice Spotting

5

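Indexing

[Figure: indexing pipeline. Input: Audio. Stages shown: Feature Extraction, Segmentation, Speech Recognition (producing Lattices), Sequence Generation (producing the Sequence DB), Hyper-Sequence Generation (producing the Hyper-Sequence DB).]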
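
The slide does not say how sequences are generated from the lattices. One plausible reading, sketched below purely as an assumption, is that every fixed-length phone path through a lattice is stored in the sequence database. The lattice representation (a dictionary of arcs), the function name `generate_sequences` and the toy lattice are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice format: node -> list of (phone, next_node) arcs.
Lattice = Dict[int, List[Tuple[str, int]]]


def generate_sequences(lattice: Lattice, length: int) -> List[Tuple[str, ...]]:
    """Enumerate every distinct phone sequence spanning exactly `length` arcs."""
    found = set()

    def walk(node: int, prefix: Tuple[str, ...]) -> None:
        if len(prefix) == length:
            found.add(prefix)
            return
        for phone, next_node in lattice.get(node, []):
            walk(next_node, prefix + (phone,))

    # A sequence may start at any node of the lattice.
    for start in lattice:
        walk(start, ())
    return sorted(found)


# Toy lattice with two alternative phones at two positions.
toy_lattice = {
    0: [("g", 1)],
    1: [("r", 2)],
    2: [("iy", 3), ("ax", 3)],
    3: [("s", 4)],
    4: [("iy", 5), ("ih", 5)],
}
sequences = generate_sequences(toy_lattice, 5)
```

Because a lattice keeps alternative recognition hypotheses, the toy example yields several sequences for the same stretch of audio, which is what lets the later matching stage recover from recognition errors.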

Page 6: Dynamic Match Lattice Spotting

6

Hyper-sequence Mapping

• Map individual phones to “parent” classes

– We use Vowels, Fricatives, Glides, Stops and Nasals

• Simple example

– Parent classes: Vowels, Consonants

– Map each phone to parent class to create hyper-sequence

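[Equation: each phone i in a sequence (1, 2, 3, …) is mapped to V if it is a vowel, or to C if it is a consonant.]

[Figure: the Sequence DB is mapped to the Hyper-Sequence DB.]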
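
As a rough illustration of the five-class mapping, the sketch below maps each phone to one of Vowels, Fricatives, Glides, Stops and Nasals. The slides name the classes but not their members, so the phone-to-class table here is an assumption covering only the phones that appear in this deck (liquids are placed under Glides because no separate class is listed).

```python
# Assumed class membership; the deck names the five classes but not their phones.
PARENT_CLASS = {
    "Vowel": ["iy", "ih", "ax", "ay", "ow", "ae", "oy", "eh", "aa"],
    "Fricative": ["f", "s", "th"],
    "Glide": ["r", "l"],
    "Stop": ["p", "b", "t", "d", "k", "g"],
    "Nasal": ["m", "n", "nx"],
}

# Invert the table so each phone points at its parent class.
PHONE_TO_CLASS = {
    phone: parent for parent, phones in PARENT_CLASS.items() for phone in phones
}


def to_hyper_sequence(phones):
    """Map a phone sequence to its hyper-sequence of parent classes."""
    return tuple(PHONE_TO_CLASS[p] for p in phones)


hyper = to_hyper_sequence("g r iy s iy".split())
```

A phone missing from the table raises `KeyError`, which is deliberate in a sketch: silently guessing a class would hide gaps in the assumed inventory.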

Page 7: Dynamic Match Lattice Spotting

7

Hyper-sequence Mapping

Search term: g r iy s iy

Hyper-sequence: C C V C V

Hyper-sequence DB     Sequence DB
C C V C V             g r oy s ih
                      t l ow p iy
                      nx s eh r ay
                      d r ax b ae
                      b f ax d aa
V C C V C             oy b r aa f
                      eh g r iy m
C V C V C             …
…                     …
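
The two-tier lookup in this example can be sketched as a dictionary keyed by hyper-sequence. This is a minimal illustration using the simple Vowel/Consonant classes and the sequences from the slide; the vowel set, the function names and the exact-match lookup are assumptions, and the real system may also admit near-matching hyper-sequences.

```python
from collections import defaultdict

# Assumed vowel inventory, limited to the phones shown on the slide.
VOWELS = {"iy", "ih", "oy", "ow", "eh", "ay", "ax", "ae", "aa"}


def hyper(phones):
    """Map each phone to V (vowel) or C (consonant)."""
    return tuple("V" if p in VOWELS else "C" for p in phones)


def build_hyper_db(sequence_db):
    """Group sequences by hyper-sequence so a search only scans one bucket."""
    buckets = defaultdict(list)
    for seq in sequence_db:
        buckets[hyper(seq)].append(seq)
    return buckets


SEQUENCE_DB = [tuple(s.split()) for s in (
    "g r oy s ih", "t l ow p iy", "nx s eh r ay", "d r ax b ae",
    "b f ax d aa", "oy b r aa f", "eh g r iy m",
)]

hyper_db = build_hyper_db(SEQUENCE_DB)
term = tuple("g r iy s iy".split())
candidates = hyper_db.get(hyper(term), [])
```

Only the five sequences sharing the hyper-sequence C C V C V are passed on to dynamic matching; the two V C C V C sequences are never compared against the term.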

Page 8: Dynamic Match Lattice Spotting

8

Searching

[Figure: search pipeline. Input: Term. Stages shown: Phone decomp., Split long terms, Hyper-mapping, lookup in the Hyper-Sequence DB and Sequence DB, Dynamic Matching, Merge long terms, Keyword Verification. Output: Results.]

Page 9: Dynamic Match Lattice Spotting

9

Dynamic Matching

• Minimum Edit Distance (MED)

• i.e. Levenshtein Distance

• Insertions, deletions, substitutions

• Finds minimum cost of transformation
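
The minimum edit distance listed above can be sketched with the standard dynamic-programming recurrence. This version uses unit costs for insertions, deletions and substitutions, which is the textbook Levenshtein distance rather than the system's own cost scheme; the function name is illustrative.

```python
def edit_distance(target, observed):
    """Levenshtein distance between two phone sequences, with unit costs."""
    # prev[j] = cost of turning the first i-1 target phones into observed[:j]
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]  # turning i target phones into an empty sequence
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1,             # delete t
                curr[j - 1] + 1,         # insert o
                prev[j - 1] + (t != o),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


target = "g r iy s iy".split()
close_match = edit_distance(target, "g r ax s ih".split())
poor_match = edit_distance(target, "th ay n r nx".split())
```

With the sequences from the Concept slides, the close match costs 2 (two vowel substitutions) and the poor match costs 5.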

Page 10: Dynamic Match Lattice Spotting

10

Dynamic Matching

• Substitution costs

– Derived from phone confusion statistics

$$C_s(x, y) = I(R_y \mid E_x) = -\log p(R_y \mid E_x)$$

where

C_s(x, y) = cost of substituting phone x with phone y

E_x = phone x was emitted by the recogniser

R_y = phone y was in the reference transcript
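
One way to derive such costs from confusion statistics is sketched below, assuming the statistics are available as aligned (emitted phone, reference phone) pairs and that the cost is the negative log of the conditional probability of the reference phone given the emitted phone. The counts are invented for illustration and are not real confusion data; `substitution_costs` is a hypothetical helper.

```python
import math
from collections import Counter


def substitution_costs(aligned_pairs):
    """Return {(x, y): -log p(reference y | emitted x)} from aligned phone pairs."""
    pairs = list(aligned_pairs)
    joint = Counter(pairs)                     # count of (emitted x, reference y)
    emitted = Counter(x for x, _ in pairs)     # count of emitted x
    return {
        (x, y): -math.log(count / emitted[x])
        for (x, y), count in joint.items()
    }


# Invented alignment counts: (phone emitted by recogniser, phone in reference).
aligned = (
    [("ax", "ax")] * 6 + [("ax", "iy")] * 2
    + [("ih", "ih")] * 1 + [("ih", "iy")] * 1
)
costs = substitution_costs(aligned)
```

Frequently confused phone pairs receive a low cost and rare confusions a high one, so an observed sequence that differs from the target only by common recogniser errors still scores well.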

Page 11: Dynamic Match Lattice Spotting

11

Optimisations

• Prefix sequence optimisation

• Early stopping optimisation

• Linearised MED search approximation
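
The slides do not describe these optimisations in detail. As an illustration of the general idea behind early stopping only, the sketch below abandons an edit-distance computation as soon as no alignment can finish within a cost threshold. It uses unit costs, and the function name and threshold parameter are assumptions rather than the system's actual design.

```python
def bounded_edit_distance(target, observed, max_cost):
    """Edit distance with unit costs, or None once max_cost cannot be met."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1,             # delete t
                curr[j - 1] + 1,         # insert o
                prev[j - 1] + (t != o),  # substitute or match
            ))
        # Costs never decrease along an alignment, so the final distance
        # is at least the smallest value in the current row.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


target = "g r iy s iy".split()
kept = bounded_edit_distance(target, "g r ax s ih".split(), max_cost=2)
pruned = bounded_edit_distance(target, "th ay n r nx".split(), max_cost=2)
```

In the second call the computation stops after the third target phone, since every partial alignment already costs more than the threshold.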

Page 12: Dynamic Match Lattice Spotting

12

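Long Term Merging

Search term: olympic sites

Phone decomposition: ow l ih m p ih k s ay t s

Split into: ow l ih m p ih k | p ih k s ay t s

[Figure: each sub-sequence is searched separately and the two result sets are merged into the final Results.]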
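
The split-and-merge step can be sketched as follows. The maximum sequence length of 7, the hit format `(start_time, end_time, cost)`, the time tolerance and the rule of adding the two costs are all assumptions made for illustration; the slides show only that overlapping sub-sequences are searched separately and their results merged.

```python
import math


def split_term(phones, max_len):
    """Cover a long phone sequence with overlapping windows of max_len phones."""
    n = len(phones)
    if n <= max_len:
        return [list(phones)]
    parts = math.ceil(n / max_len)
    # Spread window starts evenly: the first starts at 0, the last ends at n.
    starts = [round(i * (n - max_len) / (parts - 1)) for i in range(parts)]
    return [list(phones[s:s + max_len]) for s in starts]


def merge_hits(first_hits, second_hits, tolerance=0.1):
    """Join hits of the first part with hits of the second part that follow on."""
    merged = []
    for s1, e1, c1 in first_hits:
        for s2, e2, c2 in second_hits:
            # The second part must start inside (or just after) the first part
            # and extend beyond it.
            if s1 < s2 <= e1 + tolerance and e2 > e1:
                merged.append((s1, e2, c1 + c2))
    return merged


parts = split_term("ow l ih m p ih k s ay t s".split(), max_len=7)

# Invented hit lists for the two parts: (start seconds, end seconds, cost).
first_hits = [(10.0, 10.5, 1.0), (42.0, 42.4, 0.0)]
second_hits = [(10.3, 10.9, 0.5), (77.0, 77.5, 0.0)]
results = merge_hits(first_hits, second_hits)
```

With a window of 7 phones the split reproduces the two sub-sequences on the slide, and only the pair of hits that overlap in time survives the merge.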

Page 13: Dynamic Match Lattice Spotting

13

Keyword Verification

• Acoustic

– Use acoustic score from lattice to boost occurrences with high confidence

• Neural Network

– Produce a confidence score by fusing

• MED score and Acoustic score

• Term phone length

• Term phone classes

Page 14: Dynamic Match Lattice Spotting

14

Results

Maximum Term-Weighted Value on EvalSet terms

Source Type   DevSet phone error rate   Primary system   Contrastive: No Acous.   Contrastive: LTS Only
Bnews         24%                       0.246            0.245                    0.208
CTS           45%                       0.104            0.102                    0.080
Confmtg       56%                       0.021            0.019                    0.016

Index size: 558 MB/Sh (297 MB/Sh for No Acous.)

Index speed: 18x real-time

Search speed: 3 hr searched / CPU-sec
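
For readers unfamiliar with the metric, the sketch below computes Term-Weighted Value at a single decision threshold, following the definition we understand the NIST 2006 STD evaluation to use: one minus the average over terms of the miss probability plus a weighted false-alarm probability, with a weight of 999.9 and one trial per second of speech. These details come from outside the slides and should be checked against the evaluation plan; the function and argument names are hypothetical. The maximum TWV reported in the table would be the best value over all thresholds.

```python
def term_weighted_value(term_stats, speech_seconds, beta=999.9):
    """TWV at one threshold.

    term_stats: iterable of (n_true, n_correct, n_spurious) per search term.
    speech_seconds: duration of the searched speech, in seconds.
    """
    penalties = []
    for n_true, n_correct, n_spurious in term_stats:
        if n_true == 0:
            continue  # terms with no true occurrence are left out of the average
        p_miss = 1 - n_correct / n_true
        p_false_alarm = n_spurious / (speech_seconds - n_true)
        penalties.append(p_miss + beta * p_false_alarm)
    return 1 - sum(penalties) / len(penalties)


perfect = term_weighted_value([(4, 4, 0), (2, 2, 0)], speech_seconds=3600)
silent = term_weighted_value([(4, 0, 0), (2, 0, 0)], speech_seconds=3600)
```

A perfect system scores 1 and a system that returns nothing scores 0, which gives a sense of scale for the values in the table.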

Page 15: Dynamic Match Lattice Spotting

15

Conclusion

• Open-vocabulary and phone-based

• Patented technology utilises

– sequence and hyper-sequence databases

– optimisations for rapid searches

• Advantages

– Other languages

– Economy of scale

Page 16: Dynamic Match Lattice Spotting

16

Conclusion

• Limitations

– Indexing speed and size

– Need to split long sequences

• Future work

– Keyword Verification

• Word-level information (e.g. LVCSR)

• Acoustic features (e.g. prosody)

– Indexing/searching frameworks

– Spoken Document Retrieval and other semantic applications

Page 17: Dynamic Match Lattice Spotting

17

References

1. A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.

2. K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing: accepted for future publication.

3. CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

4. S. J. Young, P.C. Woodland, W.J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.

5. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.