Machine Learning in Voice Biometrics


Michał Dankiewicz, DataKRK Meetup, 30.01.2017

Basic concepts

Agenda

● Biometrics in general

● R&D @ VoicePIN.com

● Machine learning techniques in voice biometrics

● Feature extraction

● Gaussian Mixture Models

● i-vectors

● Gotchas

● Challenge

Biometrics

sources: commons.wikimedia.org, clipartfest.com, pixabay.com

Voice Biometrics

Voice = excitation + filter (the source-filter model of speech)

sources: synthschool.com, commons.wikimedia.org, https://youtu.be/ZQcEyXI1OGM?t=54s
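The "excitation + filter" idea can be illustrated with a minimal synthetic-vowel sketch. The assumptions are as follows. The excitation is a plain impulse train at the pitch frequency, which is a crude stand-in for glottal pulses. The filter is a cascade of two 2nd-order all-pole resonators that stand in for vocal-tract formants. The pitch (100 Hz), formants (700 Hz, 1200 Hz), bandwidths and sample rate are all hypothetical values chosen for illustration.

```python
import math

def impulse_train(f0_hz, sample_rate, n_samples):
    """Excitation: one unit pulse per pitch period, a crude stand-in for glottal pulses."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a 2nd-order all-pole resonator standing in for one vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius set by the bandwidth
    theta = 2.0 * math.pi * centre_hz / sample_rate       # pole angle set by the centre frequency
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000
# Hypothetical speaker: 100 Hz pitch (excitation), formants near 700 Hz and 1200 Hz (filter).
excitation = impulse_train(f0_hz=100, sample_rate=SAMPLE_RATE, n_samples=800)  # 0.1 s
voice = excitation
for centre, bw in [(700, 80), (1200, 90)]:
    voice = resonator(voice, centre, bw, SAMPLE_RATE)
```

Changing `f0_hz` changes the excitation, and changing the resonator frequencies changes the filter. Both differ between speakers, and that difference is what the slide's decomposition points at.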

Voice Biometrics

Access control

Transaction authentication

Internet of Things

Law enforcement

sources: giphy.com

Enrollment & verification

Enrollment

Bob

Verification

sources: pixabay.com

Bob

First attempts

Enrollment

Bob

Bob

Verification

cross-correlation

sources: pixabay.com, commons.wikimedia.org
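The "first attempt" compares a new recording with the stored one by cross-correlation. A minimal sketch of that naive baseline looks like this. Enrollment just stores the raw waveform. Verification accepts when the peak of the normalized cross-correlation (over all lags) reaches a threshold. The function names and the 0.8 threshold are hypothetical, and the test signals are synthetic sinusoids rather than speech.

```python
import math

def xcorr_peak(a, b):
    """Peak of the normalized cross-correlation of two waveforms over all lags (1.0 = identical shape)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    a = [v - mean_a for v in a]
    b = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0.0:
        return 0.0
    best = -1.0
    for lag in range(-(len(b) - 1), len(a)):
        # sum of a[j + lag] * b[j] over the overlapping samples only
        total = sum(a[j + lag] * bv for j, bv in enumerate(b) if 0 <= j + lag < len(a))
        best = max(best, total / norm)
    return best

def enroll(recording):
    """Enrollment: simply store Bob's recording as his template."""
    return list(recording)

def verify(template, attempt, threshold=0.8):
    """Verification: accept if the attempt correlates strongly enough with the template."""
    return xcorr_peak(template, attempt) >= threshold

bob_template = enroll([math.sin(0.3 * n) for n in range(200)])
```

Two recordings of the same speaker are never sample-for-sample alike, because timing, pitch and channel all vary. A waveform-level match like this is therefore fragile, which presumably motivates the feature-based models on the following slides.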

R&D @ VoicePIN.com


Feature extraction

framing: waveform → matrix (one row per short, overlapping frame)
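Framing turns a 1-D waveform into a matrix with one frame per row. Here is a minimal sketch, assuming a 16 kHz sample rate, 25 ms frames and a 10 ms hop. These are common choices but not values given on the slide. Windowing (e.g. Hamming) is left out for brevity.

```python
def frame_signal(signal, frame_len, hop):
    """Cut a 1-D waveform into overlapping frames; the result is a matrix (one frame per row)."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]

SAMPLE_RATE = 16000
FRAME_LEN = SAMPLE_RATE * 25 // 1000  # 25 ms -> 400 samples (assumed)
HOP = SAMPLE_RATE * 10 // 1000        # 10 ms -> 160 samples (assumed)

one_second = [0.0] * SAMPLE_RATE
frames = frame_signal(one_second, FRAME_LEN, HOP)
print(len(frames), len(frames[0]))  # 98 400
```

Features such as MFCCs are then computed row by row, so each frame becomes one feature vector.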

MFCC – mel-frequency cepstral coefficients

● based on psychoacoustics: the mel scale follows how humans perceive pitch
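The MFCC pipeline for one frame can be sketched as follows:

1. Compute the power spectrum.
2. Apply triangular filters spaced evenly on the mel scale.
3. Take the log.
4. Apply a DCT.

This sketch uses the common mel formula 2595·log10(1 + f/700), 26 filters, 13 coefficients and a 512-point naive DFT. All of those settings are assumptions. Pre-emphasis, windowing and liftering are omitted, so it is not a drop-in replacement for any particular toolkit.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """|DFT|^2 for bins 0..n_fft/2; naive O(N^2) DFT, frame implicitly zero-padded to n_fft."""
    spec = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for n, x in enumerate(frame[:n_fft]):
            angle = 2.0 * math.pi * k * n / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spec.append(re * re + im * im)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)    # rising edge
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)  # falling edge
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_fft=512, n_filters=26, n_ceps=13):
    spec = power_spectrum(frame, n_fft)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in mel_filterbank(n_filters, n_fft, sample_rate)]
    log_e = [math.log(e + 1e-10) for e in energies]  # log roughly mimics loudness perception
    # DCT-II of the log mel energies; the first n_ceps coefficients are the MFCCs
    return [sum(le * math.cos(math.pi * i * (j + 0.5) / n_filters) for j, le in enumerate(log_e))
            for i in range(n_ceps)]

frame = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(400)]  # 25 ms of a 440 Hz tone
coeffs = mfcc(frame, 16000)
```

The mel spacing gives low frequencies more resolution than high ones, and this is where the psychoacoustics enters.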

bottleneck features

● activations of a narrow (bottleneck) layer of an ANN used as features

sources: Deep neural networks in speaker recognition – K. Odrzywołek, 2016
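To show where bottleneck features come from, here is a minimal forward-pass sketch. It runs a frame through the network only up to the narrow layer and returns that layer's activations. The assumptions are:

* The architecture (39 → 64 → 8 → 64 → 100) is hypothetical.
* The weights are random placeholders. In practice the network would first be trained on some classification target, which the slide does not specify.
* The bottleneck layer is linear, a common choice but an assumption here.

```python
import math
import random

def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random weights: placeholders for a trained network."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# Hypothetical architecture: 39-dim input frame -> 64 -> 8 (bottleneck) -> 64 -> 100 output classes.
SIZES = [39, 64, 8, 64, 100]
LAYERS = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(SIZES, SIZES[1:])]
BOTTLENECK = 2  # LAYERS[0] and LAYERS[1] lead up to the narrow layer

def bottleneck_features(frame):
    """Run the network only up to the narrow layer and return its activations."""
    h = frame
    for i in range(BOTTLENECK):
        weights, biases = LAYERS[i]
        is_bottleneck = i == BOTTLENECK - 1
        h = dense(h, weights, biases, (lambda z: z) if is_bottleneck else math.tanh)
    return h

features = bottleneck_features([0.1] * 39)
```

The layers after the bottleneck matter only during training. Once the network is trained, each input frame is mapped to a compact vector (8 values here) that replaces or complements MFCCs.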

Gaussian Mixture Models

sources: commons.wikimedia.org

λ – model
x – observation (1 point)
X – sequence of observations
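With these symbols, the standard GMM density for one observation x is a weighted sum of C Gaussian components. For a sequence X, the usual simplifying assumption (not stated on the slide) is that frames are independent, so their log-likelihoods add up:

$$p(x|\lambda) = \sum_{c=1}^{C} w_c \, \mathcal{N}(x;\, \mu_c, \Sigma_c), \qquad \sum_{c=1}^{C} w_c = 1, \qquad \lambda = \{w_c, \mu_c, \Sigma_c\}_{c=1}^{C}$$

$$\log p(X|\lambda) = \sum_{t=1}^{T} \log p(x_t|\lambda), \qquad X = \{x_1, \dots, x_T\}$$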

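Identification

An unknown recording X is scored against every enrolled speaker model (Alice, Bob, Charlie) with p(X|λ); the answer is the model with the highest likelihood:

$$\hat{\lambda} = \arg\max_{\lambda} p(X|\lambda)$$

→ Bob

sources: pixabay.com, commons.wikimedia.org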
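The identification step can be sketched as follows. It computes log p(X|λ) for each enrolled speaker's GMM and picks the argmax. The assumptions are:

* Covariances are diagonal.
* The models are stored as plain dicts with hypothetical keys `weights`, `means` and `vars`.
* All parameter values and feature frames are invented toy numbers, not trained models.

```python
import math

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional observation."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))

def gmm_loglik(X, gmm):
    """log p(X|λ) = Σ_t log Σ_c w_c N(x_t; μ_c, Σ_c), using log-sum-exp for stability."""
    total = 0.0
    for x in X:
        terms = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def identify(X, models):
    """Closed-set identification: the enrolled speaker whose model maximises p(X|λ)."""
    return max(models, key=lambda name: gmm_loglik(X, models[name]))

# Toy 2-D, 2-component models; every number here is invented for illustration.
unit = [[1.0, 1.0], [1.0, 1.0]]
models = {
    "Alice":   {"weights": [0.5, 0.5], "means": [[0.0, 0.0], [1.0, 1.0]], "vars": unit},
    "Bob":     {"weights": [0.5, 0.5], "means": [[4.0, 4.0], [5.0, 5.0]], "vars": unit},
    "Charlie": {"weights": [0.5, 0.5], "means": [[-4.0, 4.0], [-5.0, 5.0]], "vars": unit},
}
X = [[4.2, 4.1], [4.8, 5.3], [4.5, 4.4]]  # hypothetical feature frames of an unknown speaker
print(identify(X, models))  # Bob
```

Working in the log domain avoids numerical underflow: with hundreds of frames, the product of raw per-frame probabilities would round to zero.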

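Verification, UBM & LLR

Bayes' theorem:

$$P(\lambda|X) = \frac{p(X|\lambda)\,P(\lambda)}{P(X)}$$

● P(λ|X): we want this (e.g. λ = Bob's model)
● p(X|λ): the GMM formula, same for every speaker
● P(λ): the prior, the same for every speaker, so negligible
● P(X) ≈ p(X|λ_UBM)

$$\mathrm{LLR} = \log\frac{p(X|\lambda)}{p(X|\lambda_{UBM})} = \log p(X|\lambda) - \log p(X|\lambda_{UBM})$$

Totally Bob

UBM – Universal Background Model

LLR – log-likelihood ratio

sources: pixabay.com, commons.wikimedia.org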
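The LLR decision can be sketched as follows. It scores the recording against the claimed speaker's GMM and against the UBM, then accepts when the difference exceeds a threshold. The assumptions are:

* The GMMs are 1-D toy models with invented parameters.
* The UBM is deliberately broad, standing in for "everyone".
* The threshold of 0.0 is a hypothetical choice.

The GMM helpers use the same diagonal-covariance formulation as the identification sketch.

```python
import math

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional observation."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))

def gmm_loglik(X, gmm):
    """log p(X|λ) summed over frames, using log-sum-exp over the components."""
    total = 0.0
    for x in X:
        terms = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def llr(X, speaker, ubm):
    """LLR = log p(X|λ) - log p(X|λ_UBM)."""
    return gmm_loglik(X, speaker) - gmm_loglik(X, ubm)

def verify(X, speaker, ubm, threshold=0.0):
    """Accept the claimed identity when the LLR exceeds a (hypothetical) threshold."""
    return llr(X, speaker, ubm) > threshold

# Toy 1-D models; all numbers are invented for illustration.
ubm = {"weights": [0.5, 0.5], "means": [[-2.0], [2.0]], "vars": [[4.0], [4.0]]}  # broad background model
bob = {"weights": [1.0], "means": [[2.0]], "vars": [[1.0]]}

bob_frames = [[1.8], [2.3], [2.1]]
impostor_frames = [[-2.1], [-1.7], [-2.4]]
print(verify(bob_frames, bob, ubm), verify(impostor_frames, bob, ubm))  # True False
```

Because both terms are computed on the same recording, effects that lower the likelihood under every model tend to cancel in the difference. This is the idea behind using the UBM as a reference.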

Verification, UBM & LLR

Problem with p(X|λ): where should we set a threshold?

[Figure: log(p(X|λ)) scored by Alice's model for Alice's voice and Bob's voice, in clear and noisy conditions]

LLR is a solution

[Figure: log(p(X|λ)) - log(p(X|UBM)), Alice's model in relation to the UBM, for Alice's voice and Bob's voice, in clear and noisy conditions]

i-vectors

● A D-dimensional GMM with C components has D*C mean values.
● Concatenating them gives a mean supervector.

$$M = m + T w$$

M – speaker supervector [D*C × 1]
m – UBM supervector [D*C × 1]
T – total variability matrix [D*C × W]
w – speaker i-vector [W × 1]

source: Low-dimensional speech representation based on Factor Analysis and its applications - Najim Dehak and Stephen Shum
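The shapes in M = m + T*w can be checked with a small sketch:

* Build the UBM mean supervector by concatenation.
* Generate a speaker supervector from a chosen w.
* Recover w by least squares.

The least-squares step is a deliberate simplification, not real i-vector extraction. In practice w is estimated as a posterior mean from Baum-Welch statistics collected with the UBM. Here it only shows that w is a low-dimensional coordinate of M. The sizes (D = 2, C = 3, W = 2), the matrix T and the vector w are hypothetical.

```python
def mean_supervector(means):
    """Concatenate C component means (each D-dimensional) into one D*C vector."""
    return [v for mean in means for v in mean]

def speaker_supervector(m, T, w):
    """M = m + T*w, with T a (D*C x W) matrix stored as a list of rows."""
    return [mi + sum(t * wj for t, wj in zip(row, w)) for mi, row in zip(m, T)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (A square, non-singular)."""
    n = len(A)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares_w(M, m, T):
    """Simplified: w minimising ||M - m - T w||^2 via (T^T T) w = T^T (M - m)."""
    d = [Mi - mi for Mi, mi in zip(M, m)]
    W = len(T[0])
    TtT = [[sum(row[i] * row[j] for row in T) for j in range(W)] for i in range(W)]
    Ttd = [sum(row[i] * di for row, di in zip(T, d)) for i in range(W)]
    return solve(TtT, Ttd)

# Hypothetical sizes: D = 2, C = 3 (so D*C = 6), W = 2.
ubm_means = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
m = mean_supervector(ubm_means)  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.0], [0.0, 0.5]]
w = [0.5, -1.0]
M = speaker_supervector(m, T, w)
w_hat = least_squares_w(M, m, T)  # recovers w because T has full column rank
```

In real systems D*C is in the tens of thousands while W is a few hundred. The i-vector is therefore a much more compact speaker representation than the supervector it describes.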

Gotchas

● quality of recordings

● gender

● device/channel

● real case (conditions)

● session variability

[Figure: train vs. test]

Challenge

Sneakers, 1992

https://www.youtube.com/watch?v=-zVgWpVXb64

Challenge

"What if someone records my voice?"

www.spoofingchallenge.org

@VoicePINcom

Thanks!

VoicePIN.com

Science
