1
Dynamic Match Lattice Spotting
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace
2
Overview
• Phonetic-based index open-vocabulary
• Based on lattice-spotting technique
• Two-tier database
• Dynamic-match rules
• Algorithmic optimisations
NOTE: Patented technology
3
Concept
Search term: greasy
Phone decomposition: g r iy s iy
Indexed phone sequences (is the term among them?):
ow d m nx ae
th ay n r nx
g r ax s iy
… … … … …
4
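Concept
Target sequence: g r iy s iy
Observed sequences:
g r ax s ih
th ay n r nx
ow d m nx ae
… … … … …
Dynamic matching assigns each observed sequence a cost against the target. Here g r ax s ih differs from the target in only two phones (ax and ih in place of iy).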
5
Indexing
[Diagram: indexing pipeline. Labels: Audio, Feature Extraction, Segmentation, Speech Recognition, Lattices, Sequence Generation, Sequence DB, Hyper-Sequence Generation, Hyper-Sequence DB]
6
Hyper-sequence Mapping
• Map individual phones to “parent” classes
– We use Vowels, Fricatives, Glides, Stops and Nasals
• Simple example
– Parent classes: Vowels, Consonants
– Map each phone to parent class to create hyper-sequence
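$$(p_1, p_2, p_3, \ldots) \mapsto (h_1, h_2, h_3, \ldots), \qquad h_i = \begin{cases} V & \text{if } p_i \text{ is a vowel} \\ C & \text{if } p_i \text{ is a consonant} \end{cases}$$
[Diagram: Sequence DB → Hyper-Sequence DB]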
7
Hyper-sequence Mapping
Search term: g r iy s iy
Hyper-sequence: C C V C V

Hyper-sequence DB    Sequence DB
C C V C V            g r oy s ih
                     t l ow p iy
                     nx s eh r ay
                     d r ax b ae
                     b f ax d aa
V C C V C            oy b r aa f
                     eh g r iy m
C V C V C            … … … … …
… … … … …            … … … … …
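The two-tier lookup on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vowel set, the function names (`to_hyper`, `build_index`, `candidates`) and the dict-based index are hypothetical, the parent classes are the simple Vowel/Consonant pair from the example rather than the five classes used in the real system, and the hyper-sequence comparison is an exact match, which may be stricter than what the system does.

```python
from collections import defaultdict

# Assumed vowel inventory, covering only the phones shown on this slide.
VOWELS = {"aa", "ae", "ax", "ay", "eh", "ih", "iy", "ow", "oy"}

def to_hyper(phones):
    """Map each phone to its parent class: 'V' for vowels, 'C' otherwise."""
    return tuple("V" if p in VOWELS else "C" for p in phones)

def build_index(sequences):
    """Group sequence-DB entries under their hyper-sequence key."""
    index = defaultdict(list)
    for seq in sequences:
        index[to_hyper(seq)].append(seq)
    return index

def candidates(index, term):
    """Return only the sequences whose hyper-sequence equals the term's."""
    return index.get(to_hyper(term), [])

# Sequence DB entries copied from the slide.
SEQUENCE_DB = [s.split() for s in [
    "g r oy s ih", "t l ow p iy", "nx s eh r ay", "d r ax b ae",
    "b f ax d aa", "oy b r aa f", "eh g r iy m",
]]

INDEX = build_index(SEQUENCE_DB)
TERM = "g r iy s iy".split()
SHORTLIST = candidates(INDEX, TERM)
```

With the slide's data, `SHORTLIST` holds the five C C V C V sequences, so the two V C C V C entries never reach the more expensive dynamic-matching stage.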
8
Searching
[Diagram: search pipeline. Labels: Term, Phone decomp., Split long terms, Hyper-mapping, Hyper-Sequence DB, Sequence DB, Dynamic Matching, Merge long terms, Keyword Verification, Results]
9
Dynamic Matching
• Minimum Edit Distance (MED)
• i.e. Levenshtein Distance
• Insertions, deletions, substitutions
• Finds minimum cost of transformation
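As a minimal sketch of the Minimum Edit Distance described above, the function below uses the standard dynamic-programming recurrence over phone sequences. The function name `med`, the unit default costs and the optional `sub_cost` callback are assumptions made for illustration. The system itself uses phone-dependent substitution costs.

```python
def med(target, observed, ins_cost=1.0, del_cost=1.0, sub_cost=None):
    """Minimum cost of transforming `target` into `observed`."""
    if sub_cost is None:
        # Plain Levenshtein: a mismatch costs 1, a match costs 0.
        sub_cost = lambda a, b: 0.0 if a == b else 1.0

    # prev[j] = cost of turning an empty target prefix into observed[:j].
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i in range(1, len(target) + 1):
        cur = [i * del_cost]  # delete the first i target symbols
        for j in range(1, len(observed) + 1):
            cur.append(min(
                prev[j] + del_cost,       # delete target[i-1]
                cur[j - 1] + ins_cost,    # insert observed[j-1]
                prev[j - 1] + sub_cost(target[i - 1], observed[j - 1]),
            ))
        prev = cur
    return prev[-1]
```

For example, `med("g r iy s iy".split(), "g r ax s ih".split())` evaluates to `2.0`, corresponding to two substitutions.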
10
Dynamic Matching
• Substitution costs
– Derived from phone confusion statistics
$$C_s(x, y) = I(R_y \mid E_x) = -\log p(R_y \mid E_x)$$
where
$C_s(x, y)$ = cost of substituting phone $x$ with phone $y$
$E_x$ = phone $x$ was emitted by the recogniser
$R_y$ = phone $y$ was in the reference transcript
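To make the derivation from confusion statistics concrete, here is a small sketch. It assumes the cost is the negative log of the probability that phone y was in the reference given that phone x was emitted, estimated by relative frequency. The function name, the count layout and the toy counts are hypothetical. The slide does not say how identical phones (x = y) are costed, so the sketch applies the same formula to every pair.

```python
import math

def substitution_costs(confusions):
    """Estimate substitution costs from confusion counts.

    confusions[(x, y)] is the number of times the recogniser emitted
    phone x while the reference transcript contained phone y.
    Returns {(x, y): -log p(R_y | E_x)} for every pair with a non-zero count.
    """
    emitted_totals = {}
    for (x, _y), count in confusions.items():
        emitted_totals[x] = emitted_totals.get(x, 0) + count

    return {
        (x, y): -math.log(count / emitted_totals[x])
        for (x, y), count in confusions.items()
        if count > 0
    }

# Toy counts, invented purely for illustration.
TOY_COUNTS = {("ax", "ax"): 80, ("ax", "iy"): 20}
TOY_COSTS = substitution_costs(TOY_COUNTS)
```

In the toy data, "ax" was emitted 100 times, 20 of them where the reference had "iy", so the cost for the pair ("ax", "iy") is -log(0.2), roughly 1.61. Rarer confusions get higher costs.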
11
Optimisations
• Prefix sequence optimisation
• Early stopping optimisation
• Linearised MED search approximation
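The slide only names these optimisations. As one plausible reading of early stopping (an assumption, not the authors' documented method), the edit-distance computation can be abandoned as soon as every entry in the current row of the dynamic-programming table exceeds the acceptance threshold. Because all costs are non-negative, later rows cannot become cheaper. The function name and the unit costs below are hypothetical.

```python
def med_with_early_stop(target, observed, threshold):
    """Unit-cost edit distance, or None once it must exceed `threshold`."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        cur = [i]
        for j, o in enumerate(observed, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (t != o),  # substitution (0 if phones match)
            ))
        # The row minimum never decreases from one row to the next,
        # so the final distance is at least min(cur).
        if min(cur) > threshold:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= threshold else None
```

With a threshold of 2, the call for target `g r iy s iy` against `g r ax s ih` returns 2, while the call against `th ay n r nx` returns `None` after three of the five rows.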
12
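Long Term Merging
Search term: olympic sites
Phone decomposition: ow l ih m p ih k s ay t s
Split into overlapping sub-sequences: ow l ih m p ih k | p ih k s ay t s
Search each sub-sequence separately, then merge the results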
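The split-search-merge idea for long terms can be sketched as follows. Everything here is an assumption for illustration: the window length of 7 phones is inferred from the "olympic sites" example, and the function names, the `min_overlap` parameter, the `(start, end)` hit format in seconds and the merge rule are hypothetical, not the system's actual procedure.

```python
def split_long_term(phones, max_len=7, min_overlap=1):
    """Split a phone sequence into overlapping windows of at most max_len."""
    n = len(phones)
    if n <= max_len:
        return [list(phones)]
    stride = max_len - min_overlap
    # Regular windows from the left, plus a final right-aligned window.
    starts = list(range(0, n - max_len, stride)) + [n - max_len]
    return [list(phones[s:s + max_len]) for s in starts]

def merge_hits(first_hits, second_hits):
    """Join (start, end) hits when the second piece begins inside the
    first piece's time span and extends beyond it."""
    merged = []
    for s1, e1 in first_hits:
        for s2, e2 in second_hits:
            if s1 < s2 <= e1 and e2 > e1:
                merged.append((s1, e2))
    return merged

PIECES = split_long_term("ow l ih m p ih k s ay t s".split())
```

Under these assumptions `PIECES` reproduces the two sub-sequences from the slide, and a first-piece hit at (1.0, 1.5) merges with a second-piece hit at (1.3, 1.9) into (1.0, 1.9), while an unrelated hit at (5.0, 5.6) is ignored.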
13
Keyword Verification
• Acoustic
– Use acoustic score from lattice to boost occurrences with high confidence
• Neural Network
– Produce a confidence score by fusing
• MED score and Acoustic score
• Term phone length
• Term phone classes
14
Results
Maximum Term-Weighted Value on EvalSet terms

Source Type | DevSet phone error rate | Primary system | Contrastive: No Acous. | Contrastive: LTS Only
Bnews       | 24%                     | 0.246          | 0.245                  | 0.208
CTS         | 45%                     | 0.104          | 0.102                  | 0.080
Confmtg     | 56%                     | 0.021          | 0.019                  | 0.016

Index size: 558 MB/Sh (297 MB/Sh for No Acous.)
Index speed: 18x real-time
Search speed: 3 hr searched / CPU-sec
15
Conclusion
• Open-vocabulary and phone-based
• Patented technology utilises
– sequence and hyper-sequence databases
– optimisations for rapid searches
• Advantages
– Other languages
– Economy of scale
16
Conclusion
• Limitations
– Indexing speed and size
– Need to split long sequences
• Future work
– Keyword Verification
• Word-level information (e.g. LVCSR)
• Acoustic features (e.g. prosody)
– Indexing/searching frameworks
– Spoken Document Retrieval and other semantic applications
17
References
1. A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005
2. K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing : Accepted for future publication
3. CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
4. S. J. Young, P.C. Woodland, W.J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
5. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.