
Audio-Visual Speech Processing

Gérard Chollet, Hervé Bredin, Thomas Hueber, Rémi Landais, Patrick Perrot, Leila Zouari

NOLISP 2007, Paris, May 23rd 2007


Audiovisual identity verification

Compulsory? For:
– Homeland / corporate security: restricted access, …
– Secured computer login
– Secured on-line signature of contracts (e-Commerce)


Audiovisual identity verification

Available features

– Face / facial features (lips, eyes): Face modality
– Speech: Speech modality
– Speech synchrony: Synchrony modality


Audiovisual identity verification

Face modality
– Detection:
  • Generative models (MPT toolbox)
  • Temporal median filtering
  • Eye detection within faces
– Normalization: geometry + illumination (a sketch follows below)
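The normalization step is not detailed on the slide; below is a minimal sketch of one common recipe, assuming eye-based geometric alignment plus histogram equalization for illumination. The function name, output size and target eye positions are illustrative assumptions, not the authors' exact settings.

```python
# Hypothetical sketch of "geometry + illumination" normalization:
# rotate/scale so both eyes land on fixed positions, crop to a standard
# size, then equalize the histogram to reduce lighting variation.
import cv2
import numpy as np

def normalize_face(gray, left_eye, right_eye, size=64, eye_y=0.35, eye_dx=0.30):
    """gray: 2-D uint8 image; left_eye/right_eye: (x, y) pixel coordinates."""
    lx, ly = left_eye
    rx, ry = right_eye
    # Geometric normalization: make the inter-ocular axis horizontal
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    eye_dist = np.hypot(rx - lx, ry - ly)
    scale = (eye_dx * size) / eye_dist            # target inter-ocular distance
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)   # midpoint between the eyes
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Translate so the eye midpoint lands at a fixed location in the crop
    M[0, 2] += size / 2.0 - center[0]
    M[1, 2] += eye_y * size - center[1]
    face = cv2.warpAffine(gray, M, (size, size), flags=cv2.INTER_LINEAR)
    # Illumination normalization: simple histogram equalization
    return cv2.equalizeHist(face)
```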


Audiovisual identity verification

Face modality
– Selection:
  • Keep only the most reliable detection results
  • Based on the distance Rel between a detected zone and its projection onto the eigenface space (see the sketch below)
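The slide does not give the formula for Rel; a minimal sketch of a reliability measure in the same spirit is a distance-from-face-space criterion: project the detected zone onto a precomputed eigenface basis and measure the reconstruction error. The function names, the orthonormal-basis assumption and the threshold are assumptions, not the authors' exact definition.

```python
import numpy as np

def reliability_distance(face, mean_face, eigenfaces):
    """face: flattened detection (d,); mean_face: (d,);
    eigenfaces: (k, d), rows assumed orthonormal."""
    x = face.astype(float) - mean_face
    coeffs = eigenfaces @ x                     # projection onto eigenface space
    reconstruction = eigenfaces.T @ coeffs      # back-projection into image space
    return np.linalg.norm(x - reconstruction)   # small distance -> face-like zone

def select_reliable(faces, mean_face, eigenfaces, threshold):
    """Keep only detections whose distance to the face space is below a threshold."""
    return [f for f in faces
            if reliability_distance(f, mean_face, eigenfaces) < threshold]
```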


Audiovisual identity verification

Face modality:
– Two verification strategies and a single comparison framework
  • Global = eigenfaces (see the sketch below):
    – Calculation of a set of directions (eigenfaces) defining a projection space
    – Two faces are compared through their projections onto the eigenface space
    – Training data: BIOMET (130 persons) + BANCA (30 persons)
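A minimal sketch of the eigenface strategy using scikit-learn PCA: the training matrix stands in for the BIOMET + BANCA faces mentioned above, and the number of components and the distance measure are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def train_eigenfaces(train_faces, n_components=100):
    """train_faces: (n_images, n_pixels) matrix of flattened, normalized faces."""
    pca = PCA(n_components=n_components, whiten=True)
    pca.fit(train_faces)          # principal directions play the role of eigenfaces
    return pca

def face_score(pca, face_a, face_b):
    """Compare two faces through their projections onto the eigenface space."""
    pa, pb = pca.transform(np.vstack([face_a, face_b]))
    return -np.linalg.norm(pa - pb)   # higher score = more similar
```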


Audiovisual identity verification

Face modality:
• SIFT descriptors (sketched below):
  – Keypoint extraction
  – Keypoint representation: 128-dimensional descriptor (gradient orientation histograms, …) + 4-dimensional position vector
  – SIFT descriptor: dim 128; position (x, y) + scale + orientation: dim 4
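A short sketch of this step with OpenCV: each keypoint yields a 128-dimensional descriptor plus the 4-dimensional (x, y, scale, orientation) vector listed above. OpenCV is a stand-in here; the original system may use a different SIFT implementation.

```python
import cv2
import numpy as np

def extract_sift(gray_face):
    """gray_face: 2-D uint8 face image."""
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray_face, None)  # (n, 128)
    positions = np.array([[kp.pt[0], kp.pt[1], kp.size, kp.angle]
                          for kp in keypoints])                      # (n, 4)
    return descriptors, positions
```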


Audiovisual identity verification

Face modality:
• SVD-based matching method (see the sketch after this list):
  – Compare two videos V1 and V2
  – Exclusive principle: one-to-one correspondences between
    » faces (global)
    » descriptors (local)
  – Principle:
    » proximity matrix computation between faces or descriptors
    » extraction of good pairings (made easy by SVD computation)
  – Scores:
    » one matching score between global representations
    » one matching score between local representations
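The slides do not spell out the SVD pairing step; below is a minimal sketch of the classic Scott & Longuet-Higgins scheme, one common way to turn a proximity matrix into exclusive one-to-one correspondences. The Gaussian proximity, sigma and the averaging rule for the matching score are illustrative assumptions.

```python
import numpy as np

def svd_match(feats1, feats2, sigma=1.0):
    """feats1: (m, d), feats2: (n, d) -- faces (global) or SIFT descriptors (local)."""
    # Proximity matrix: Gaussian of pairwise Euclidean distances
    d2 = ((feats1[:, None, :] - feats2[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / (2.0 * sigma ** 2))
    # Replace the singular values by ones: P = U * I * Vt
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt
    # Exclusive pairings: P[i, j] dominates both its row and its column
    pairs = [(i, j) for i in range(P.shape[0])
             for j in range(P.shape[1])
             if P[i, j] == P[i, :].max() and P[i, j] == P[:, j].max()]
    # One matching score: average proximity of the retained pairings
    score = np.mean([G[i, j] for i, j in pairs]) if pairs else 0.0
    return pairs, score
```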


Audiovisual identity verification

Speech modality:
– GMM-based approach (sketched below):
  • One world model
  • Each speaker model is derived from the world model by MAP adaptation
  • Speech verification score: derived from the likelihood ratio
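A hedged sketch of this GMM/UBM pipeline: a world model trained on many background speakers, mean-only MAP adaptation towards each client, and an average log-likelihood ratio as the verification score. The model size, relevance factor and the mean-only simplification are assumptions, not the authors' exact settings.

```python
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(features, n_components=256):
    """features: (n_frames, n_mfcc) pooled over many background speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag')
    ubm.fit(features)
    return ubm

def map_adapt(ubm, client_features, relevance=16.0):
    """Mean-only MAP adaptation of the world model towards one client."""
    gamma = ubm.predict_proba(client_features)            # (n_frames, C)
    n_c = gamma.sum(axis=0) + 1e-10                        # soft counts
    e_c = (gamma.T @ client_features) / n_c[:, None]       # per-component data means
    alpha = (n_c / (n_c + relevance))[:, None]             # adaptation weights
    client = copy.deepcopy(ubm)
    client.means_ = alpha * e_c + (1.0 - alpha) * ubm.means_
    return client

def verification_score(client, ubm, test_features):
    """Average frame-level log-likelihood ratio: client model vs. world model."""
    return np.mean(client.score_samples(test_features)
                   - ubm.score_samples(test_features))
```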


Audiovisual identity verification

Synchrony modality:
– Principle: synchrony between lips and speech carries identity information
– Process (sketched below):
  • Computation of a synchrony model (co-inertia analysis, CoIA) for each person, based on DCT features (visual signal) and MFCCs (speech signal)
  • Comparison of the test sample with the synchrony model
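A minimal sketch of a co-inertia synchrony model, assuming time-aligned frame-level MFCC and DCT features: CoIA seeks audio and visual directions whose projections have maximal covariance (SVD of the cross-covariance matrix), and a test clip is scored by the correlation of its projections. Keeping only the first pair of axes, and the names below, are assumptions.

```python
import numpy as np

def coia_fit(mfcc, dct):
    """mfcc: (n_frames, p) audio features; dct: (n_frames, q) visual features."""
    X = mfcc - mfcc.mean(axis=0)
    Y = dct - dct.mean(axis=0)
    # SVD of the cross-covariance matrix gives the co-inertia axes
    U, s, Vt = np.linalg.svd(X.T @ Y / len(X), full_matrices=False)
    return {"a": U[:, 0], "b": Vt[0, :],              # first pair of axes
            "mx": mfcc.mean(axis=0), "my": dct.mean(axis=0)}

def synchrony_score(model, mfcc, dct):
    """Correlation of the test sample's projections on the client's CoIA axes."""
    u = (mfcc - model["mx"]) @ model["a"]
    v = (dct - model["my"]) @ model["b"]
    return np.corrcoef(u, v)[0, 1]
```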


Audiovisual identity verification

Experiments:
– BANCA database:
  • 52 persons divided into two groups (G1 and G2)
  • 3 recording conditions
  • 8 recordings per person (4 client accesses, 4 impostor accesses)
  • Evaluation based on the P protocol: 234 client accesses and 312 impostor accesses
– Scores:
  • 4 scores per access (PCA face, SIFT face, speech, synchrony)
  • Score fusion based on an RBF-SVM: hyperplane learned on G1 and tested on G2, and conversely (sketched below)
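A short sketch of the fusion step, assuming the four per-access scores are stacked into a vector and an RBF-kernel SVM is trained on one BANCA group and tested on the other (then the roles are swapped). The hyper-parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_scores(scores_g1, labels_g1, scores_g2):
    """scores_*: (n_accesses, 4) arrays of (PCA face, SIFT face, speech, synchrony)
    scores; labels_g1: 1 for client accesses, 0 for impostor accesses."""
    svm = SVC(kernel='rbf', C=1.0, gamma='scale')
    svm.fit(scores_g1, labels_g1)              # hyperplane learned on G1
    return svm.decision_function(scores_g2)    # fused scores for G2 accesses
```

The same function is then called with the groups swapped to obtain fused scores for G1.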


Audiovisual identity verification

Experiments:
