Speaker Change Detection using Support Vector Machines V.Kartik, D.Srikrishna Satish and C.Chandra Sekhar Speech and Vision Laboratory Department of Computer

Speaker Change Detection using Support Vector Machines

V.Kartik, D.Srikrishna Satish and C.Chandra Sekhar

Speech and Vision Laboratory

Department of Computer Science and Engineering

Indian Institute of Technology Madras, Chennai – India

Speaker Change Detection

• Automatic segmentation of multispeaker speech data into segments that each contain one speaker only

• Dissimilarity of distributions of the data before and after a speaker change point

• Proposal: Speaker change detection as a pattern classification problem

• Patterns extracted from the data around the speaker change points as positive examples

• Patterns extracted from the data between the speaker change points as negative examples
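The construction of positive and negative examples can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each pattern is the concatenation of the frame vectors in a window of n frames, that a positive window is centred on a marked change point, and that a negative window contains no change point at all. The names `window_pattern` and `build_examples` are hypothetical.

```python
def window_pattern(frames, center, n):
    """Concatenate the n frame vectors of the window around `center`.

    Returns None when the window would run past either end of the data.
    """
    start = center - n // 2
    if start < 0 or start + n > len(frames):
        return None
    return [x for frame in frames[start:start + n] for x in frame]


def build_examples(frames, change_points, n):
    """Split window patterns into positive and negative examples.

    Assumption: windows centred on a marked change point are positive,
    windows containing no change point are negative, and windows that
    only partly overlap a change point are left out.
    """
    positives, negatives = [], []
    marked = set(change_points)
    for center in range(len(frames)):
        pattern = window_pattern(frames, center, n)
        if pattern is None:
            continue
        start = center - n // 2
        if center in marked:
            positives.append(pattern)
        elif not any(start <= c < start + n for c in marked):
            negatives.append(pattern)
    return positives, negatives
```

Concatenating frames makes the pattern dimension grow linearly with the window length, which is one way to read the "large dimension of segmental pattern vectors" issue listed later.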

Speaker Change Detection using SVMs

• SVM trained using the positive and negative examples of speaker change points

• The trained SVM scans the multispeaker data to hypothesize speaker change points

• Main issues:

- Speaker independent detection of the points

- Silence regions before speaker change points

- Varying durations of speaker turns

- Length of the window used for extraction of patterns

- Large dimension of segmental pattern vectors

- Large number of false alarms

Speaker Change Detection System

Fixed Duration Window based Patterns

Speaker Change Point Hypothesization using Fixed Duration Window based Patterns

• Input: The continuous speech signal of multispeaker speech data without silence regions

• The SVM is trained with pattern vectors extracted from the fixed length windows of n frames

• Sliding window method: A test pattern is extracted from each window of n frames, and the window is shifted by one frame at a time.

• The test patterns with positive output of the SVM are hypothesized as speaker change points

• Several hypotheses may be spurious.
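The sliding window scan described above can be sketched as follows. This is only an illustration under stated assumptions: `decision_fn` stands in for the trained SVM's output (any callable that returns a positive value for a hypothesized change), the hypothesis is placed at the centre frame of the window, and `toy_decision` is a hypothetical stand-in used purely to exercise the loop.

```python
def hypothesize_change_points(frames, n, decision_fn):
    """Scan the data with a window of n frames and a shift of one frame.

    A window whose pattern gets a positive score from `decision_fn`
    yields a hypothesis at the window's centre frame (assumption).
    """
    hypotheses = []
    for start in range(len(frames) - n + 1):
        pattern = [x for frame in frames[start:start + n] for x in frame]
        if decision_fn(pattern) > 0:
            hypotheses.append(start + n // 2)
    return hypotheses


def toy_decision(pattern, threshold=0.75):
    """Stand-in for an SVM: compares the means of the two window halves."""
    half = len(pattern) // 2
    first = sum(pattern[:half]) / half
    second = sum(pattern[half:]) / (len(pattern) - half)
    return abs(second - first) - threshold


# Toy one-dimensional "features": the speaker changes at frame 6.
frames = [[0.0]] * 6 + [[1.0]] * 6
hypotheses = hypothesize_change_points(frames, 4, toy_decision)
```

Lowering the threshold of the stand-in makes neighbouring windows fire as well, which mirrors how one true change point can produce several hypotheses with a one-frame shift.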

False Alarm Reduction

• Two methods are considered for reduction of spurious hypotheses (false alarms)

• 1st method: A threshold of 5 frames on the duration of speaker turns.

• 2nd method: The false hypotheses on validation data are used as the negative examples in training an SVM for false alarm reduction.
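Both methods can be sketched in a few lines. The slides do not say how the 5-frame threshold is applied or how a hypothesis is matched to a marked change point, so the sketch below makes two loud assumptions: a hypothesis is dropped when it falls fewer than `min_turn` frames after the previously kept one, and a hypothesis counts as false when no marked change point lies within `tolerance` frames. The function names and the `tolerance` parameter are hypothetical.

```python
def smooth_hypotheses(hypotheses, min_turn=5):
    """1st method (assumed form): enforce a minimum speaker-turn duration.

    A hypothesis is kept only if it lies at least `min_turn` frames
    after the last kept hypothesis.
    """
    kept = []
    for h in sorted(hypotheses):
        if not kept or h - kept[-1] >= min_turn:
            kept.append(h)
    return kept


def collect_false_alarms(hypotheses, true_points, tolerance):
    """2nd method, data collection step (assumed form).

    Returns the hypotheses on validation data that are not within
    `tolerance` frames of any marked change point. Patterns at these
    frames would serve as negative examples for the second SVM.
    """
    return [
        h for h in hypotheses
        if all(abs(h - t) > tolerance for t in true_points)
    ]
```

For example, `smooth_hypotheses([10, 11, 12, 20, 23, 30])` keeps the frames 10, 20 and 30 under this rule.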

Studies on Speaker Change Detection

• Extended data of NIST2003 speaker recognition evaluation database

• Two-speaker conversations, each about 5 minutes long, including 3 each of M-M, M-F and F-F conversations

• Speaker change points are manually marked

• Data divided into training, validation and test datasets

• Each dataset includes one each of M-M, M-F and F-F

• Training dataset for SVM

• Validation dataset to derive the negative examples for the false alarm reduction SVM

• Test dataset to evaluate the performance of speaker change detection system

Performance of Speaker Change Detection System

• # actual speaker change points in test dataset: 282

• # frames in the test dataset: about 16000

• # speaker change points missed (not detected): M

• # false alarms: FA

Window length   After speaker change   After smoothing   After false alarm
(in msec)       hypothesization                          reduction
                M      FA              M      FA         M      FA

100             14     2488            26     957        43     836
200             27     2269            37     907        57     673
300             30     3277            39     1094       46     884
400             9      5276            24     1330       33     1092
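The counts in the table can be turned into rates with simple arithmetic. The sketch below takes the (M, FA) pairs and the figure of 282 actual change points from the slides; the two ratios it computes (fraction of change points missed, fraction of false alarms removed between two stages) are illustrative summaries chosen here, not metrics reported by the authors.

```python
ACTUAL_CHANGE_POINTS = 282  # from the test dataset description

# window length in msec -> (M, FA) after hypothesization, smoothing,
# and false alarm reduction, copied from the table
RESULTS = {
    100: [(14, 2488), (26, 957), (43, 836)],
    200: [(27, 2269), (37, 907), (57, 673)],
    300: [(30, 3277), (39, 1094), (46, 884)],
    400: [(9, 5276), (24, 1330), (33, 1092)],
}


def miss_rate(missed, actual=ACTUAL_CHANGE_POINTS):
    """Fraction of actual speaker change points that were not detected."""
    return missed / actual


def false_alarm_reduction(fa_before, fa_after):
    """Fraction of false alarms removed between two processing stages."""
    return 1.0 - fa_after / fa_before


for window_ms, stages in RESULTS.items():
    (m0, fa0), _, (m2, fa2) = stages
    print(window_ms,
          round(miss_rate(m0), 3), round(miss_rate(m2), 3),
          round(false_alarm_reduction(fa0, fa2), 3))
```

Read this way, the table shows the trade-off at every window length: each stage removes false alarms at the cost of additional missed change points.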

Summary

• Speaker change detection as a pattern classification problem.

• Fixed duration window method

• SVMs to hypothesize the speaker change points.

• Methods for reduction of the number of false alarms.

• Performance of the proposed method on NIST2003 speaker recognition evaluation database.

Thank You
