METISS : Modélisation et Expérimentation pour le Traitement des Informations et des Signaux Sonores (modelling and experimentation for the processing of audio information and signals)

Audio & speech processing

Overview of activities 2002-2005

Scientific leader : Frédéric BIMBOT

INRIA-Rennes

Evaluation INRIA, 17-18 May 2006


Introduction

Framework and foundations

General framework : audio scene analysis, description and recognition
- analysis, processing, modelling, representation, description, decomposition, detection, classification, recognition
- of audio, speech, music, multimedia…
- signals, recordings, streams, tracks…

Scientific foundations :
- Probabilistic models and statistical estimation
- Redundant systems and adaptive representations

Scientific objectives

to design generic, robust, fast and flexible approaches to a variety of problems in speech and audio segmentation, detection and classification, operating in the probabilistic framework

to investigate the theoretical properties and practical applications of adaptive representations and sparseness criteria, for the advanced processing and structured description of audio signals

to extend and adapt approaches classically used in the context of speech processing to other classes of signals and problems

to study convergence between statistical approaches and adaptive decomposition within a common framework embedding signal representations and classification

Application domain and focus

Applicative fields :
- Security, verification, authentication, rights management
- Rich audio transcription, content-based indexing, multi-purpose navigation, information retrieval and summarization
- Advanced audio processing : segmentation, separation, spatialisation, sound object extraction, music modeling
- Audio and audio-visual authoring, production and repurposing
- Education and entertainment

Primary focuses :
- Speaker characterisation
- Audio structuring and indexing
- Sparse representations : theory and applications
- Audio source separation (under-determined case)

Team composition

[Chart : team members per year, 2002-2005. Names shown : MAILHE, ARBERET, TENG, HUET, FORTHOFER, SALL, OZEROV, LESAGE, COLLET, BEN, BENAROYA, BLOUET, MC DONAGH, POREE, BETSER, KIJAK, KRSTULOVIC, GONON, MORARU, BIMBOT, GRAVIER, GRIBONVAL]

Legend :
- Permanent researchers (CR - CNRS or INRIA)
- Non-permanent staff (Engineers, ATER, Post-Doc)
- PhD - 100 % with METISS
- PhD ~ 50 % with METISS

+ Marie-Noëlle Georgeault, administrative assistant (~ 25 %)

Probabilistic modeling of audio signals

Probabilistic modeling (1)

1 audio class or 1 sound object → a variety of observations

1 family of sounds → 1 probabilistic model

1 probability density function : $P(Y_1^T \mid X)$

1 likelihood function : $\hat{P}(Y_1^T \mid X)$
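As an illustration of "one family of sounds, one probabilistic model", the sketch below fits one model per class and scores a sequence of observations with the resulting likelihood function. It is only a minimal example under stated assumptions: the diagonal Gaussian model, the independence of frames, the class names and the toy 2-D feature values are all hypothetical and not taken from the slides.

```python
import math

def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so that the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """log P(Y_1^T | X), assuming independent frames and a diagonal Gaussian X."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

# Hypothetical toy observations for two sound classes.
speech_train = [[1.0, 0.2], [1.2, 0.1], [0.8, 0.3], [1.1, 0.0]]
music_train = [[-1.0, 2.0], [-0.8, 2.2], [-1.2, 1.9], [-0.9, 2.1]]
models = {"speech": fit_diag_gaussian(speech_train),
          "music": fit_diag_gaussian(music_train)}

# Score an unseen sequence against each class model and keep the best one.
test_sequence = [[0.9, 0.2], [1.1, 0.15]]
scores = {name: log_likelihood(test_sequence, m) for name, m in models.items()}
best = max(scores, key=scores.get)
```

Real systems typically use richer models (Gaussian mixtures, HMMs), but the pattern of estimating a model per class and comparing likelihoods is the same.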

Probabilistic modeling (2)

- Probabilistic modeling
- Statistical estimation
- State-sequence decoding
- Bayesian decision

+ « know-how »

- Detection
- Classification
- Verification
- Segmentation
- …

Probabilistic models offer a well-understood, generic, inter-operable framework for the description and the classification of audio and speech signals

Dominant position of Hidden Markov Models (HMM) (and variants)

Highly competitive field in speech processing (research & industry)

More open in audio indexing (additional factors of complexity)
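State-sequence decoding with an HMM is commonly done with the Viterbi algorithm. The sketch below is a minimal log-domain version for a discrete-observation HMM, used here to segment a toy stream into two classes. The state names, the quantised observation symbols and all probabilities are hypothetical values chosen for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # delta[s] = best log-probability of any path ending in state s.
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        delta = new_delta
        backpointers.append(pointers)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

states = ["speech", "music"]
log_start = {"speech": math.log(0.5), "music": math.log(0.5)}
log_trans = logs({"speech": {"speech": 0.9, "music": 0.1},
                  "music": {"speech": 0.1, "music": 0.9}})
log_emit = logs({"speech": {"lo": 0.8, "hi": 0.2},
                 "music": {"lo": 0.2, "hi": 0.8}})

observations = ["lo", "lo", "lo", "hi", "hi", "hi"]
path, score = viterbi(observations, states, log_start, log_trans, log_emit)
```

The decoded path gives a segmentation of the stream: each change of state marks a boundary between two audio classes.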

Challenges and positioning

Robustness
- to unseen acoustic conditions
- to scarce training data
- to poorly representative samples
- to missing observations
- to …

Implementability
- size
- speed
- scalability
- distribution
- etc …

Generalisation to wider classes of signals with an audio component
- multiple scales
- multiple sources
- multiple structures
- multiple sensors
- multiple levels of underlying processes
- heterogeneous streams (audio-visual)
- external sources of knowledge

METISS positioning :
- robust training and test methods
- compact distributed algorithms
- versatility / migration of formalism
- methodology and evaluation

speaker verification, audio segmentation, broad sound-class indexing (speech recognition)

Adaptive representations

Adaptive representations (1)

Audio signal :
- diversity of structures (time, frequency, statistics, …)
- superimposition of objects (notes, sources, tracks, …)

Signal : $s = \{s(t)\}_{1 \le t \le T}$

Redundant system (dictionary of atoms) : $D = \{g_i(t)\}_{1 \le i \le N,\ 1 \le t \le T}$ with $N > T$

Large set of vectors with various :
- scales
- time structures
- frequency structures
- phases
- statistical properties
- …

Adaptive decomposition : $s(t) = \sum_{i=1}^{N} \alpha_i \, g_i(t)$

Selection of the « best » decomposition, according to a given criterion :
- sparsity
- perception criterion
- separability
- conditional entropy
- …

Adaptive representations (2)

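Decomposition : $s(t) = \sum_{i=1}^{N} \alpha_i \, g_i(t)$, with coefficients $\alpha = \{\alpha_i\}_{1 \le i \le N}$

Criterion : $F(\alpha) = \sum_{i=1}^{N} |\alpha_i|^p$

Selection : $\hat{\alpha} = \arg\min_{\alpha} F(\alpha)$

Constraint : $s(t) = \sum_{i=1}^{N} \alpha_i \, g_i(t)$

Sparsity criteria :
- $p = 2$ : quadratic norm, maximizes dispersion
- $p = 0$ : minimum number of non-zero coefficients, NP-complete
- $p = 1$ : tractable « compromise »

Pursuit algorithms (Matching Pursuit)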
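The greedy principle behind Matching Pursuit can be sketched in a few lines: at each step, pick the atom most correlated with the residual, record its coefficient, and subtract its contribution. The version below is a plain textbook variant written for readability, not the optimised implementation discussed later in the deck. The toy dictionary (4 Dirac atoms plus 2 extra atoms, so 6 atoms for a signal of length 4) and the test signal are assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy decomposition of signal as a weighted sum of unit-norm atoms."""
    residual = list(signal)
    alpha = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Pick the atom most correlated with the current residual.
        corr = [dot(residual, g) for g in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[best]) < tol:
            break
        # Accumulate the coefficient and remove the atom's contribution.
        alpha[best] += corr[best]
        residual = [r - corr[best] * x for r, x in zip(residual, atoms[best])]
    return alpha, residual

# Hypothetical redundant dictionary: 6 unit-norm atoms in dimension 4.
diracs = [[1.0 if t == i else 0.0 for t in range(4)] for i in range(4)]
flat = normalise([1.0, 1.0, 1.0, 1.0])
alternating = normalise([1.0, -1.0, 1.0, -1.0])
atoms = diracs + [flat, alternating]

# Signal built from two atoms: 2 * dirac_0 + 3 * flat.
signal = [2.0 * d + 3.0 * f for d, f in zip(diracs[0], flat)]
alpha, residual = matching_pursuit(signal, atoms, n_iter=60)
```

Because the two atoms used to build the signal are not orthogonal, the coefficients are approached over several iterations rather than found in two steps; this is characteristic of plain Matching Pursuit.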

Recent fast-growing field

High applicative potential

Intense emerging competition

Ongoing scientific issues
- Optimality and convergence of adaptive decompositions
- Dictionary design (knowledge-based, data driven, …)
- Deformable, stochastic, multi-dimensional, … atoms
- Efficient decomposition algorithms and implementations
- Application scope

METISS positioning :
- theoretical results
- concepts and methodologies
- decomposition algorithms

audio source separation (under-determined case)

Achievements 2002-2005 and selected results

- Speaker characterisation
- Audio structuring and indexing
- Sparse representations : theory and applications
- Audio source separation (under-determined case)

Speaker characterisation

CART trees for scalable and distributable speaker verification

Model-based metrics and normalisations for speaker verification

Structural adaptation of speaker models (hierarchical Bayesian networks)

Methodology and algorithms for optimizing the coverage of a speaker database

Relative speaker space and metrics for efficient speaker indexing and retrieval [ongoing]

CART based speaker verification

Direct score function assignment : $S(y_t) = \log \dfrac{\hat{P}(y_t \mid X)}{\hat{P}(y_t \mid \bar{X})}$

[Figure : binary decision tree with YES/NO tests on the components of $y_t$ against thresholds $a_1$, $a_2$, $b_1$, $b_2$, $b_3$, and the corresponding partition of the feature space; each leaf or region carries a score : -0.8, 0.3, -0.5, 0.9, 0.7, -0.4]

CART trees used as a family of approximating functions

+ Extension to oblique trees

complexity down 200 x, error rate up 33% only

Blouet, Bimbot, Gonon, et al.

EU-IST INSPIRED Project
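The idea of letting a tree assign a score directly to each frame can be sketched as follows. Each internal node compares one feature component with a threshold, and each leaf returns a score that stands in for the frame-level log-likelihood ratio. The tree layout and the threshold values are hypothetical; only the leaf scores reuse values shown in the figure, and averaging the frame scores before thresholding is an assumption made for this example.

```python
# Each internal node tests one feature component against a threshold;
# each leaf stores a score standing in for the frame log-likelihood ratio.
TREE = {
    "feature": 0, "threshold": 0.0,           # hypothetical test: y[0] <= 0.0
    "yes": {"feature": 1, "threshold": 1.0,   # hypothetical test: y[1] <= 1.0
            "yes": {"score": -0.8},
            "no": {"score": 0.3}},
    "no": {"feature": 1, "threshold": -1.0,   # hypothetical test: y[1] <= -1.0
           "yes": {"score": -0.5},
           "no": {"score": 0.9}},
}

def frame_score(tree, y):
    """Walk down the tree with a few comparisons and return the leaf score."""
    node = tree
    while "score" not in node:
        branch = "yes" if y[node["feature"]] <= node["threshold"] else "no"
        node = node[branch]
    return node["score"]

def utterance_score(tree, frames):
    """Average the frame-level scores over the whole utterance."""
    return sum(frame_score(tree, y) for y in frames) / len(frames)

def verify(tree, frames, threshold=0.0):
    """Accept the claimed identity when the average score exceeds the threshold."""
    return utterance_score(tree, frames) > threshold

frames = [[1.0, 0.0], [-1.0, 2.0]]
accepted = verify(TREE, frames)
```

Scoring a frame costs only a handful of comparisons, instead of evaluating every Gaussian of a mixture model, which is the kind of trade-off behind the complexity reduction reported on the slide.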

Speaker recognition in the model space (1)

Formal links between LLR and KL-divergence

mean-only adaptation training procedure + likelihood ratio test ≈ Euclidean distance in the model space

Ben, Bimbot et al.
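To make the "Euclidean distance in the model space" idea concrete, the sketch below uses a commonly cited approximation for Gaussian mixture models that share weights and diagonal variances and differ only in their means: the divergence between two such models is bounded by a weighted, variance-normalised squared distance between their means. Mapping each model to a vector of normalised means turns that quantity into a plain Euclidean distance. This is an illustrative assumption and not necessarily the exact formulation used by the authors; all numerical values are made up.

```python
import math

def supervector(means, weights, variances):
    """Stack the normalised means sqrt(w_g) * mu_gd / sigma_gd into one vector."""
    return [math.sqrt(w) * m / math.sqrt(v)
            for mu, w, var in zip(means, weights, variances)
            for m, v in zip(mu, var)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def weighted_mean_distance(means_a, means_b, weights, variances):
    """sqrt( sum_g w_g * sum_d (mu_a - mu_b)^2 / sigma^2 ), computed directly."""
    total = 0.0
    for mu_a, mu_b, w, var in zip(means_a, means_b, weights, variances):
        total += w * sum((a - b) ** 2 / v for a, b, v in zip(mu_a, mu_b, var))
    return math.sqrt(total)

# Hypothetical shared parameters (2 Gaussians, 2 dimensions).
weights = [0.6, 0.4]
variances = [[1.0, 4.0], [0.25, 1.0]]
# Hypothetical mean-adapted models for two speakers.
means_a = [[0.0, 0.0], [1.0, 1.0]]
means_b = [[1.0, 2.0], [1.0, 0.0]]

direct = weighted_mean_distance(means_a, means_b, weights, variances)
in_model_space = euclidean(supervector(means_a, weights, variances),
                           supervector(means_b, weights, variances))
```

Once each model is mapped to such a vector, comparing two speakers reduces to a vector distance, which is much cheaper than evaluating likelihoods frame by frame.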

Speaker recognition in the model space (2)

Consequences :
- faster score computation procedure (at least -50%)
- simpler normalization schemes (M-Norm)
- no need for additional development data

with no performance degradation

Ben, Bimbot et al.

Tested successfully for speaker recognition in the NIST and ESTER campaigns

Audio indexing

HMM-based audio and audio-visual structuring (applied to sports programmes)

Audio segmentation and tracking using probabilistic models and statistical tests

Detection of simultaneous events in audio tracks

Granular models of audio signals using deformable atoms

Comparison and evaluation of beam-search techniques and hypothesis rescoring using external sources of knowledge [ongoing]

Algebraic representations and statistical modeling of formal music [ongoing]

Multi-stream HMM modeling (1) of a tennis match

- multi-level state-sequence representation of a tennis match, inspired and adapted from the speech recognition paradigms
- multi-stream audio-visual HMM

Kijak et al. (with TMM)

Multi-stream HMM modeling (2)

Delakis, Gravier et al. (with TexMex)

- Video-only, shot-based : C = 77%
- Video+Audio, shot-based + segmental : C = 85%

segmental models, relaxed synchrony constraints

Sparse representations

Mathematical test for the optimality of a sparse representation

Matching pursuit made tractable (1 hour of audio processed in 0.25 x real time)

Structured matching pursuit incorporating explicit signal family models

Adaptive computational strategies

Beyond sparsity : recovering structured representations…

Learning shift-invariant atoms (MoTIF algorithms) [ongoing]

Sparse solutions to inverse linear problems

In the under-determined case :

Gribonval et al.

BUT if :

If a sparse representation is sparse enough, then it is the sparsest one
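The conditions shown on the slide did not survive extraction. One well-known sufficient condition of this kind is based on the mutual coherence μ of a dictionary of unit-norm atoms: a representation with fewer than (1 + 1/μ)/2 non-zero coefficients is the unique sparsest one. The sketch below checks that condition on a toy dictionary. Whether this is the exact condition presented on the slide is an assumption, and the dictionary (16 Dirac atoms plus one flat atom) is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    return max(abs(dot(atoms[i], atoms[j]))
               for i in range(len(atoms)) for j in range(i + 1, len(atoms)))

def sparsity_bound(mu):
    """Representations with fewer non-zeros than this bound are the sparsest."""
    return math.inf if mu == 0 else 0.5 * (1.0 + 1.0 / mu)

def is_guaranteed_sparsest(alpha, atoms, eps=1e-12):
    """Check the coherence-based sufficient condition for a coefficient vector."""
    n_nonzero = sum(1 for a in alpha if abs(a) > eps)
    return n_nonzero < sparsity_bound(mutual_coherence(atoms))

# Hypothetical dictionary: 16 Dirac atoms plus one flat unit-norm atom.
T = 16
diracs = [[1.0 if t == i else 0.0 for t in range(T)] for i in range(T)]
flat = [1.0 / math.sqrt(T)] * T
atoms = diracs + [flat]

mu = mutual_coherence(atoms)
bound = sparsity_bound(mu)

two_terms = [0.0] * len(atoms)
two_terms[0], two_terms[16] = 2.0, 3.0
three_terms = list(two_terms)
three_terms[5] = 1.0
```

The condition is only sufficient: a representation that fails the test may still be the sparsest one, the test simply gives no guarantee in that case.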

Matching Pursuit made tractable

Gribonval, Krstulovic et al.

MPTK : C++ toolkit, GPL licence
- for a 1 hour audio signal, processing time reduced from 20 h to 0.25 h
- flexible operation
- reproducible results
- usable in other fields : medical signals, seismology, etc …

Source separation (with primary focus on underdetermined problems)

Statistical schemes and adaptive training for single-channel separation

Source separation approaches using multi-channel Matching Pursuit in the underdetermined case

Contributions in evaluation methodology : task definition & performance measurements

Speech « denoising » using underdetermined source separation techniques

Dictionary design methods for source separation [ongoing]

DEMIX : a robust algorithm to estimate the number of sources using clustering techniques [ongoing]

Single sensor audio source separation

[Diagram : the observed signal (voice + music) is modelled by a factorial GMM built from a voice GMM and a music GMM; the result drives a Wiener filter that outputs the estimated voice signal]

Use of a factorial GMM to build a time-varying Wiener filter

- innovative scheme for underdetermined source separation
- compatibility with speech processing state-of-the-art
- strong links with sparse decomposition problems
- versatile and efficient for a range of audio description tasks

Benaroya, Bimbot, Gribonval, Ozerov (with FTR&D)

Article in IEEE Trans SAP 2006 + new results to come
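The principle of a Wiener filter driven by a pair of source models can be sketched for a single frame as follows. Each GMM state is represented here by a power spectral density (PSD); the most likely pair of voice and music states is selected for the observed frame, and the gain in each frequency bin is the voice PSD divided by the sum of both PSDs. This is a simplified illustration: it keeps only the single best state pair and ignores state priors, and the PSD values and the 3-bin spectrum are hypothetical.

```python
import math

def pair_log_likelihood(power, voice_psd, music_psd):
    """Log-likelihood (up to a constant) of an observed power spectrum when
    voice and music are independent zero-mean Gaussians with the given PSDs."""
    total = 0.0
    for p, sv, sm in zip(power, voice_psd, music_psd):
        var = sv + sm
        total -= math.log(var) + p / var
    return total

def wiener_voice_estimate(spectrum, voice_states, music_states):
    """Pick the most likely (voice, music) state pair, then apply the Wiener gain."""
    power = [abs(x) ** 2 for x in spectrum]
    voice_psd, music_psd = max(
        ((v, m) for v in voice_states for m in music_states),
        key=lambda vm: pair_log_likelihood(power, vm[0], vm[1]))
    gains = [sv / (sv + sm) for sv, sm in zip(voice_psd, music_psd)]
    estimate = [g * x for g, x in zip(gains, spectrum)]
    return estimate, gains

# Hypothetical PSDs: two voice states, one music state, three frequency bins.
voice_states = [[4.0, 0.1, 0.1], [0.1, 4.0, 0.1]]
music_states = [[0.1, 0.1, 2.0]]

# One observed frame of complex short-time spectrum (voice + music).
frame = [2.0 + 0.0j, 0.3j, 1.4 + 0.0j]
voice_estimate, gains = wiener_voice_estimate(frame, voice_states, music_states)
```

Because the selected state pair changes from frame to frame, the resulting filter is time-varying, which is the property highlighted on the slide.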

Underdetermined stereophonic source separation using sparse methods

[Figure : mixing matrix and separation, with panels labelled « least squares » and « sparsity »]

Lesage, Gribonval et al.

Audio examples available

Collaborations, Dissemination and Visibility

Privileged cooperation with the TEXMEX group at IRISA (+ VISTA)

Consistent network of academic and industrial partners outside IRISA

Regular participation in collaborative projects (EU-IST, RNRT, bilateral partnership, …)

Strong involvement in concerted research actions (ESTER, MathSTIC, GDR-ISIS, NIST evaluations, …)

Visible participation in and production of free software : ELISA platform, AudioSeg, MPTK, SIROCCO, BSS-EVAL

Sustained effort in publishing and disseminating the group's research results

Additional visibility through responsibilities taken in scientific societies, workshop organisation and editorial boards

Summary 2002-2005

Strategy and perspectives 2006-2010

Achievements 2002-2005 (1)

solid contributions to the state of the art on several topics related to speaker and audio class modelling and recognition

key extension, experimentation and validation of the Hidden Markov Model framework for joint audio and video modelling and structuring

major theoretical and experimental progress in the field of sparse representations and adaptive decomposition

pioneering work in mono- and multi-channel source separation in the underdetermined case

Achievements 2002-2005 (2)

strategic improvement in the efficiency of pursuit algorithms both in terms of search strategy and implementation

development of a usable know-how in keyword spotting and speech recognition

sustained activities in assessment methodology, resource distribution and evaluation campaigns

scientific objective #4 needs consolidation

Strategy 2006-2010

To keep our position in our initial field of expertise : models, algorithms and tools for automatic processing of audio and speech signals

To push our advantage in the field of sparse representations, from both the theoretical and applicative viewpoints

To extend our scope towards more powerful approaches for the representation and modeling of audio and multi-modal signals with an audio component

To step in and progress in the area of compressing large-scale high-dimensional multi-modal data

Scientific challenges

- Probabilistic multi-level multi-stream dependency models for the representation of multiple sources and the integration of heterogeneous levels of knowledge in audio (-visual) streams → Bayesian networks
- Data-driven representations, model discovery and self-structuring of information in audio and audio-visual streams and contents → theoretical consolidation
- Experimental platforms and numerically efficient algorithms for large scale data and near real-time processing → engineering work
- Deeper understanding of the links between theoretical concepts of adaptive representation, sparse decomposition, multi-scale analysis and practical implications in terms of robustness, separability and adaptability → potential links with SVM
- Compressing large-scale high-dimensional multimodal data for storage, description and classification → compressed sensing

Questions