Machine Learning in Voice Biometrics
Michał Dankiewicz
DataKRK Meetup, 30.01.2017
basic concepts
Agenda
● Biometrics in general
● R&D @ VoicePIN.com
● Machine learning techniques in voice biometrics
● Feature extraction
● Gaussian Mixture Models
● i-vectors
● Gotchas
● Challenge
Biometrics
sources: commons.wikimedia.org, clipartfest.com, pixabay.com
Voice Biometrics
voice = excitation + filter
sources: synthschool.com, commons.wikimedia.org, https://youtu.be/ZQcEyXI1OGM?t=54s
Voice Biometrics
Access control
Transaction authentication
Internet of Things
Law enforcement
sources: giphy.com
Enrollment & verification
Enrollment: Bob
Verification: Bob
sources: pixabay.com
First attempts
Enrollment: Bob
Verification: Bob
● cross-correlation
sources: pixabay.com, commons.wikimedia.org
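The cross-correlation idea can be sketched in a few lines of plain Python. This is only an illustration of the technique named on the slide, not the code used at VoicePIN.com; the function name `normalized_cross_correlation` and the toy signals are assumptions made for the example.

```python
import math

def normalized_cross_correlation(a, b):
    """Peak of the normalized cross-correlation between two signals."""
    # Remove the mean so a constant offset does not affect the score.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a = [x - mean_a for x in a]
    b = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in a) * sum(x * x for x in b))
    if norm == 0.0:
        return 0.0
    best = 0.0
    # Slide b over a and keep the best-matching lag.
    for lag in range(-(len(b) - 1), len(a)):
        total = 0.0
        for j, y in enumerate(b):
            i = lag + j
            if 0 <= i < len(a):
                total += a[i] * y
        best = max(best, total / norm)
    return best

# Hypothetical toy signals standing in for recordings.
enrolled = [math.sin(0.3 * n) for n in range(50)]
other = [1.0 if n % 2 == 0 else -1.0 for n in range(50)]
same_score = normalized_cross_correlation(enrolled, enrolled)
other_score = normalized_cross_correlation(enrolled, other)
```

A score close to 1 means the two waveforms nearly coincide at some lag. Two utterances of the same phrase by the same person rarely match this closely sample by sample, which is one reason to move on to statistical models.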
R&D @ VoicePIN.com
Feature extraction
framing: waveform → matrix
MFCC – mel-frequency cepstral coefficients
● psychoacoustics
bottleneck features
● ANN with a narrow layer
sources: Deep neural networks in speaker recognition – K. Odrzywołek, 2016
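A minimal sketch of the framing step (waveform → matrix) and of the mel scale behind MFCC. The 16 kHz sampling rate, 25 ms frames and 10 ms hop are common choices assumed here, not values from the talk, and `hz_to_mel` uses one widely used variant of the mel formula.

```python
import math

def frame_signal(samples, frame_length, hop_length):
    """Cut a waveform into overlapping frames: one row per frame."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames

def hz_to_mel(frequency_hz):
    """Map frequency in Hz to the perceptual mel scale (one common variant)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

# Assumed setup: 1 second of audio at 16 kHz, 25 ms frames, 10 ms hop.
sample_rate = 16000
waveform = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]
frames = frame_signal(waveform, frame_length=400, hop_length=160)
print(len(frames), len(frames[0]))  # 98 400
```

Each row of the resulting matrix is then turned into one feature vector, for example MFCC or the activations of a bottleneck layer.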
Gaussian Mixture Models
sources: commons.wikimedia.org
λ – model
x – observation (1 point)
X – sequence of observations
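With this notation, scoring a sequence X against a model λ can be sketched as below. The sketch assumes diagonal covariances and reports the average log-likelihood per observation; both are common simplifications chosen for the example, and the function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one observation x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(X, weights, means, variances):
    """Average log p(x | lambda) over all observations x in X."""
    total = 0.0
    for x in X:
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp over components, shifted by the peak for numerical stability.
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(X)

# Hypothetical 2-component model of 2-dimensional features.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
X = [[0.1, -0.2], [2.9, 3.1], [3.2, 2.8]]
score = gmm_log_likelihood(X, weights, means, variances)
```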
Identification
p(X|λ) is computed for every enrolled model: Alice, Bob, Charlie
argmax → Bob
sources: pixabay.com, commons.wikimedia.org
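Identification is an argmax over the enrolled models. In the sketch below each speaker model is a single 1-D Gaussian (mean, variance) standing in for a full GMM; that simplification, and all numbers, are assumptions made to keep the example short.

```python
import math

def log_likelihood(X, model):
    """log p(X | lambda) for a 1-D Gaussian stand-in model (mean, variance)."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in X
    )

def identify(X, models):
    """Return the name of the enrolled model that best explains X."""
    return max(models, key=lambda name: log_likelihood(X, models[name]))

# Hypothetical enrolled speakers.
models = {"Alice": (-2.0, 1.0), "Bob": (0.0, 1.0), "Charlie": (3.0, 1.0)}
X = [0.2, -0.1, 0.4]
print(identify(X, models))  # Bob
```

Note that identification always returns someone from the enrolled set, even if the speaker is a stranger.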
Verification, UBM & LLR
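Bob's recording: p(X|λ) → P(λ|X) → "Totally Bob"

Bayes' theorem:

$$P(\lambda \mid X) = \frac{p(X \mid \lambda)\, P(\lambda)}{P(X)}$$

● P(λ|X) – we want this
● p(X|λ) – GMM formula
● P(λ) – same for every speaker, negligible
● P(X) ≈ p(X|λ_UBM)

$$\mathrm{LLR} = \log \frac{p(X \mid \lambda)}{p(X \mid \lambda_{\mathrm{UBM}})}$$

UBM – Universal Background Model
LLR – log-likelihood ratio
sources: pixabay.com, commons.wikimedia.org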
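A minimal sketch of verification with a log-likelihood ratio. Both the speaker model and the UBM are 1-D Gaussians (mean, variance) standing in for GMMs, the score is averaged per observation, and the threshold of 0.0 is an arbitrary assumption; in practice the threshold is tuned on held-out data.

```python
import math

def log_likelihood(X, model):
    """log p(X | lambda) for a 1-D Gaussian stand-in model (mean, variance)."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in X
    )

def llr(X, speaker_model, ubm):
    """Average log-likelihood ratio: log p(X|speaker) - log p(X|UBM)."""
    return (log_likelihood(X, speaker_model) - log_likelihood(X, ubm)) / len(X)

def verify(X, speaker_model, ubm, threshold=0.0):
    """Accept the identity claim if the LLR exceeds the threshold."""
    return llr(X, speaker_model, ubm) > threshold

# Hypothetical models: Bob is narrow, the UBM is broad (it covers everyone).
bob = (1.0, 0.5)
ubm = (0.0, 4.0)
genuine = [0.9, 1.2, 1.0]
impostor = [-2.0, 3.0, -1.5]
print(verify(genuine, bob, ubm), verify(impostor, bob, ubm))  # True False
```

Dividing by the UBM likelihood removes much of what depends on the recording rather than on the speaker, so one threshold can work across conditions.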
Verification, UBM & LLR
Problem with p(X|λ) - where should we set a threshold?
[chart: log(p(X|λ)) under Alice's model, for Alice's voice and Bob's voice, in clear and noisy conditions; axis from -39 to -25]
LLR is a solution
[chart: log(p(X|λ)) - log(p(X|UBM)), Alice's model in relation to UBM, for Alice's voice and Bob's voice, in clear and noisy conditions; axis from -2 to 10]
i-vectors
● D-dimensional GMM with C components has D*C mean values
● Concatenation of them is a mean supervector
M = m + T*w
● M – speaker supervector [D*C × 1]
● m – UBM supervector [D*C × 1]
● T – total variability matrix [D*C × W]
● w – speaker i-vector [W × 1]
source: Low-dimensional speech representation based on Factor Analysis and its applications - Najim Dehak and Stephen Shum
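The shapes in M = m + T*w can be checked with a tiny example. The sketch below only builds a supervector and applies the formula in the forward direction; estimating T and extracting w from real speech needs factor-analysis training that is not shown. D = 2, C = 3, W = 2 and all values are made up for illustration.

```python
def mean_supervector(means):
    """Concatenate C component means of dimension D into one D*C vector."""
    return [value for mean in means for value in mean]

def speaker_supervector(m, T, w):
    """Apply M = m + T*w, with T stored as a list of D*C rows of length W."""
    return [
        m_i + sum(t_ij * w_j for t_ij, w_j in zip(row, w))
        for m_i, row in zip(m, T)
    ]

# Hypothetical UBM with C = 3 components in D = 2 dimensions.
ubm_means = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
m = mean_supervector(ubm_means)  # length D*C = 6

# Hypothetical total variability matrix [6 x 2] and i-vector [2 x 1].
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
w = [0.5, -0.5]

M = speaker_supervector(m, T, w)
print(M)  # [0.5, -0.5, 1.5, 0.5, 2.5, 1.5]
```

The point of the representation is that W is much smaller than D*C, so each recording is summarized by a short fixed-length vector w.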
Gotchas
● quality of recordings
● gender
● device/channel
● real case (conditions)
● session variability
[figure panels: train, test]
Challenge
Sneakers, 1992
https://www.youtube.com/watch?v=-zVgWpVXb64
Challenge
"What if someone records my voice?"
www.spoofingchallenge.org
@VoicePINcom
Thanks!
VoicePIN.com