
Linda Gerlach1, Finnian Kelly2, Anil Alexander2

1Philipps-Universität Marburg, 2Oxford Wave Research Ltd.

{gerlach8@students.uni-marburg.de, anil|finnian@oxfordwaveresearch.com}

One out of many: A sliding window approach to automatic speaker recognition with multi-speaker files

IAFPA Conference, Istanbul, 14–17 July 2019

© Oxford Wave Research


Background

Situation: a multi-speaker recording has to be analysed.

A typical first step: speaker diarisation.

Problem: diarisation is time-consuming.

Especially when dealing with large numbers of multi-speaker recordings, manual or semi-automatic diarisation is not feasible.


Real-case motivation

Alexander et al. (IAFPA 2017):

Dutch police: 4 years of telephone intercept recordings comprising about 1000 files.

Question: in which calls may a known suspect be present?

Options:

Manual diarisation: takes too long (x4)

Automatic diarisation: varying accuracy; human assistance required

Alexander, A., Forth, O., Atreya, A. A. and Kelly, F. (2017). Not a lone voice: Automatically identifying speakers in multi-speaker recordings. Presentation at IAFPA 2017.


Is it possible to bypass speaker diarisation?

(by adopting a simple sliding window approach to speaker recognition)



Previous approaches

Approach in Alexander et al. (IAFPA 2017):

They used a segmental approach within the i-vector framework, where the overall score was the average of the three highest segment scores across the whole recording.

Disadvantage: tricky for the analysis of live recordings.

The present study explores a further simplified method and compares the performance of the i-vector and x-vector frameworks.


Approach

1. Short segments are extracted from the given multi-speaker recording.

2. Each segment is modelled within the VOCALISE i-vector or x-vector framework.

3. The speaker model of each segment is compared with the model of the target speaker.

4. The comparison scores obtained across all segments are compared with each other.

5. The final match score is the maximum score across the segment comparisons.

Two conditions (controlled versus real-world) were tested.

Sliding window approach

[Diagram: a window of fixed length slides along the multi-speaker recording in 5 s steps; each chunk (chunk 1, chunk 2, …, chunk X) is compared with the target speaker model, yielding one score per chunk (e.g. -12.1877, 11.72301, …, -26.9067).]
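The windowing and scoring scheme above can be sketched as follows. This is a minimal illustration only, not the VOCALISE implementation: `score_segment` is a hypothetical stand-in for an i-vector/x-vector comparison of a chunk against the target speaker model.

```python
# Sketch of the sliding-window scoring loop: slide a fixed-length
# window along the recording in fixed steps, score each chunk against
# the target speaker, and keep the maximum score (step 5 above).

def sliding_window_scores(samples, sample_rate, window_s, slide_s, score_segment):
    """Score each window of a multi-speaker recording against a target model.

    `score_segment` is a hypothetical comparison function (chunk -> score).
    """
    win = int(window_s * sample_rate)
    hop = int(slide_s * sample_rate)
    scores = []
    # Ensure at least one (possibly short) chunk for short recordings.
    for start in range(0, max(1, len(samples) - win + 1), hop):
        chunk = samples[start:start + win]
        scores.append(score_segment(chunk))
    return scores

def max_score(scores):
    # Final match score = maximum over all segment comparisons.
    return max(scores)
```

With a toy signal and a dummy scorer, `sliding_window_scores(list(range(100)), 10, 2, 1, lambda c: sum(c) / len(c))` produces one score per 2 s chunk at 1 s steps, and `max_score` picks the best-matching chunk.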


Experimental data – controlled conditions

Corpus: DyViS (Nolan, 2011)

Task 1 (Interview) as a source of two-speaker data:

Channel 1: predominantly target speaker (+ bleeding from channel 2)

Channel 2: predominantly interviewer (+ bleeding from channel 1)

Merged channels: speech from both speakers

Task 3 (Report), containing only the target speaker

100 speakers in total; 10 speakers used for the sliding window approach


Results for 100 speakers

EERs for DyViS Task 1 (channel 1, channel 2, merged channels) vs Task 3:

           Channel 1   Channel 2   Merged
i-vector   4.85%       23.25%      10.32%
x-vector   2.78%       25.54%      11.45%

[Bar chart: convex hull EER (%) per channel and framework; y-axis 0–30%.]
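The EER reported here is the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). A minimal sketch of estimating it from same-speaker and different-speaker comparison scores; the score lists in the usage example are illustrative, not the DyViS data, and the slides report the convex hull variant, which this plain threshold sweep does not compute.

```python
# Minimal Equal Error Rate (EER) estimate from target (same-speaker)
# and non-target (different-speaker) comparison scores. The slides
# report convex hull EER; this simple sweep is an approximation.

def eer(target_scores, nontarget_scores):
    """Return the approximate EER in [0, 1] via a threshold sweep."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)        # false rejections
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)  # false acceptances
        # EER lies where the FAR and FRR curves cross; keep the
        # threshold where they come closest and average them.
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

For example, `eer([5, 6, 7, 8], [1, 2, 3, 9])` returns 0.25 (one non-target score exceeds the crossing threshold, one target score falls below it), while perfectly separated scores give 0.0. A window-length sweep would compute this over the sliding-window maximum scores at each window length, mirroring the charts that follow.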


Results for 10 speakers

EERs for DyViS Task 1 (channel 1, channel 2, merged channels) vs Task 3:

           Channel 1   Channel 2   Merged
i-vector   2.50%       18.16%      12.50%
x-vector   1.82%       21.58%      4.00%

[Bar chart: convex hull EER (%) per channel and framework; y-axis 0–30%.]


EERs for DyViS Task 1 (merged channels with sliding window approach) vs Task 3: 10 speakers

[Line chart: convex hull EER (%) against window length (s), from 120 s down to 5 s (120, …, 70, 60, 50, 45, 40, 30, 20, 15, 10, 5), one line each for the i-vector and x-vector frameworks; y-axis 0–12%.]


Experimenting with FRIDA – uncontrolled condition

Corpus: NFI-FRIDA*

10 stereo recordings containing 5 speaker pairs

Background noise: wind, birds

Both extracted channels represent a speaker of interest

Merged channels were subjected to the sliding window procedure

Extracted channels contain some bleeding from the other channel

Same VOCALISE settings as for DyViS

[Diagram: ch1 and ch2, extracted and merged.]

*Thank you, David van der Vloed (NFI)


van der Vloed, D., Bouten, J., Kelly, F. and Alexander, A. (2019). Forensically Realistic Inter-Device Audio (NFI-FRIDA) and initial experiments. Nederlands Forensisch Instituut.


Results for merged file comparisons using FRIDA

[Line chart: convex hull EER (%) against window length (s), from 120 s down to 5 s (120, …, 60, 30, 15, 10, 5), one line each for the i-vector and x-vector frameworks; y-axis 0–20%.]


Can we hope to relieve the analyst?

Decreasing the window length led to an increase in discrimination.

In the i-vector framework, an initial decrease in EER was followed by an increase (in the real-world condition).

In the x-vector framework, EER continued to decrease in both conditions.

The sliding window approach looks promising: it is an effective, lightweight method for analysing multi-speaker files and could be extended to live comparisons.


Special Thanks for Data

Netherlands Forensic Institute

University of Cambridge


References

Alexander, A., Forth, O., Atreya, A. A. and Kelly, F. (2016). VOCALISE: A forensic automatic speaker recognition system supporting spectral, phonetic, and user-provided features. Odyssey 2016.

Alexander, A., Forth, O., Atreya, A. A. and Kelly, F. (2017). Not a lone voice: Automatically identifying speakers in multi-speaker recordings. Presentation at IAFPA 2017.

Kelly, F., Forth, O., Kent, S., Gerlach, L. and Alexander, A. (2019). Deep neural network based forensic automatic speaker recognition in VOCALISE using x-vectors. Audio Engineering Society (AES) Forensics Conference 2019, Porto, Portugal.

Nolan, F. (2011). Dynamic Variability in Speech: A Forensic Phonetic Study of British English, 2006–2007. [data collection]. UK Data Service. SN: 6790, http://doi.org/10.5255/UKDA-SN-6790-1

van der Vloed, D., Bouten, J., Kelly, F. and Alexander, A. (2018). NFI-FRIDA – Forensically Realistic Inter-Device Audio. IAFPA 2018.


Questions?
