The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Application of Audio and Video Processing Methods for Language Research Przemyslaw Lenkiewicz, Peter Wittenburg Oliver Schreer, Stefano Masneri Daniel Schneider, Sebastian Tschöpel Max Planck Institute for Psycholinguistics Fraunhofer-Heinrich Hertz Institute Fraunhofer IAIS Institute

Application of Audio and Video Processing Methods for Language Research

Page 1: Application of Audio and Video Processing Methods for Language Research

The Language Archive – Max Planck Institute for Psycholinguistics

Nijmegen, The Netherlands

Application of Audio and Video Processing Methods for Language Research

Przemyslaw Lenkiewicz, Peter Wittenburg
Oliver Schreer, Stefano Masneri
Daniel Schneider, Sebastian Tschöpel

Max Planck Institute for Psycholinguistics
Fraunhofer-Heinrich Hertz Institute
Fraunhofer IAIS Institute

Page 2: Application of Audio and Video Processing Methods for Language Research

AVATECH

Advancing Video and Audio Technology in Humanities research

Page 3: Application of Audio and Video Processing Methods for Language Research

AVATECH

Max Planck Institute for Psycholinguistics
Fraunhofer-Heinrich Hertz Institute
Fraunhofer IAIS Institute

Page 4: Application of Audio and Video Processing Methods for Language Research

Annotations

Base of research analysis

Page 5: Application of Audio and Video Processing Methods for Language Research

Annotations – challenges

• Annotations are of different types, almost all manual
• Different quality, conditions – mostly bad
• Different languages – mostly minority languages
• Annotation time is anywhere between 10 and 100 times the length of the media
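To make the 10 to 100 times figure concrete, the following minimal sketch turns a recording length into a range of annotation effort. The function name and the two-hour example are illustrative assumptions, not part of the slides.

```python
def annotation_hours(media_hours, low_factor=10, high_factor=100):
    """Return the (minimum, maximum) manual annotation effort in hours.

    The default factors are the 10x and 100x bounds quoted on the slide.
    """
    return media_hours * low_factor, media_hours * high_factor


# Hypothetical example: a two-hour recording.
low, high = annotation_hours(2)
print(f"Annotating 2 h of media takes roughly {low} to {high} hours")
```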

Page 6: Application of Audio and Video Processing Methods for Language Research

Manual Annotation Gap

We have around 200 TB of data at MPI, in particular digitized audio/video recordings, brain images, hand tracking, etc. An increasing share of the data is neither described nor annotated.

[Figure: MPI Digital Archive, size in terabytes (axis 0 to 300) by year, 2000 to 2013. Series: organized and annotated data; not annotated data. Chart note: switch to lossless mJPEG2000, HD video and brain imaging.]

Page 7: Application of Audio and Video Processing Methods for Language Research

AVATecH Main Goals

• Reduce the time necessary for annotating.

• Develop communication interfaces and human-machine interfaces.

• Develop A/V processing algorithms.

Page 8: Application of Audio and Video Processing Methods for Language Research

Recognizers

• Small applications executed from ELAN
• Each has a specific purpose and recognizes specific things
• They usually create annotations or visualize things for you
• Aim at tasks that can be trivial but time-consuming

Page 9: Application of Audio and Video Processing Methods for Language Research

RECOGNIZERS

Page 10: Application of Audio and Video Processing Methods for Language Research

Audio recognizers

• Audio segmentation
– Autonomously splits audio stream into homogeneous segments
– Approach: Model-free approach based on clustering with help of Bayesian information criterion
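As an illustration of how a Bayesian information criterion can locate a segment boundary, here is a minimal sketch of the widely used delta-BIC test for a single change point. It assumes one scalar feature per frame and a single Gaussian per segment; the slides do not state which features or which BIC variant the recognizer uses, and all names and the synthetic signal are hypothetical.

```python
import math
import random


def variance(xs):
    mean = sum(xs) / len(xs)
    # A small floor keeps log() defined for constant segments.
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(xs, i, penalty_weight=1.0):
    """BIC gain of modelling xs[:i] and xs[i:] separately instead of jointly."""
    n = len(xs)
    left, right = xs[:i], xs[i:]
    gain = 0.5 * (n * math.log(variance(xs))
                  - len(left) * math.log(variance(left))
                  - len(right) * math.log(variance(right)))
    # Splitting adds one mean and one variance (2 parameters for 1-D data).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


def find_change_point(xs, margin=10, penalty_weight=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(xs) - margin):
        score = delta_bic(xs, i, penalty_weight)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Synthetic example: the feature distribution changes at frame 200.
random.seed(0)
signal = ([random.gauss(0.0, 1.0) for _ in range(200)]
          + [random.gauss(5.0, 1.0) for _ in range(200)])
change_point = find_change_point(signal)
print("Estimated boundary at frame", change_point)
```

A full segmenter would typically slide or grow a window over the recording and repeat this test; `penalty_weight` trades missed boundaries against false alarms.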

Page 11: Application of Audio and Video Processing Methods for Language Research

Audio recognizers

• Audio segmentation: Goals
– Find coherent parts in a recording
– Detect speaker changes
– Detect environment changes
– Detect utterances
– Preprocessing step for speaker ID, clustering

Page 12: Application of Audio and Video Processing Methods for Language Research

• Speech/Non-speech detection
– Detects whether a segment contains speech or not
– Approach: Offline training of Gaussian Mixture Models for speech & non-speech; each segment is assigned the model with the highest likelihood
– Integrates further user-driven feedback mechanism
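The decision step can be sketched as follows: score a segment under each mixture model and keep the label with the highest likelihood. This is only an illustration under stated assumptions: the feature is a single hypothetical log-energy value per frame, and the mixture parameters in `MODELS` are made-up placeholders standing in for models trained offline.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods under a 1-D Gaussian mixture.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-0.5 * (x - mu) ** 2 / var)
                / math.sqrt(2 * math.pi * var)
                for w, mu, var in gmm)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


# Placeholder parameters (assumption): frame log-energy in dB.
MODELS = {
    "speech": [(0.6, -20.0, 25.0), (0.4, -12.0, 16.0)],
    "non-speech": [(0.7, -55.0, 36.0), (0.3, -45.0, 49.0)],
}


def classify_segment(frames, models=MODELS):
    """Return the name of the model that gives the segment the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


loud_segment = [-18.0, -22.0, -15.0, -19.0]
quiet_segment = [-52.0, -58.0, -49.0]
print(classify_segment(loud_segment), classify_segment(quiet_segment))
```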

Page 13: Application of Audio and Video Processing Methods for Language Research

• Local speaker clustering
– Joins and labels segments according to underlying speaker
– Approach: Iterative calculation of Bayesian Information Criterion and BIC-dependent merging of speech-segment combinations
– Baseline tested on single documents with mediocre results; robustification needed
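BIC-dependent merging can be illustrated with a small agglomerative loop: repeatedly merge the pair of clusters whose joint model is most favoured by the BIC, and stop when no merge is favoured. The sketch assumes one scalar feature per frame and a single Gaussian per cluster; the segment data, function names and penalty weight are hypothetical and not taken from the slides.

```python
import math
import random


def log_var(xs):
    mean = sum(xs) / len(xs)
    return math.log(max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12))


def merge_delta_bic(a, b, penalty_weight=1.0):
    """Negative values mean one shared model is preferred, so merge."""
    merged = a + b
    gain = 0.5 * (len(merged) * log_var(merged)
                  - len(a) * log_var(a) - len(b) * log_var(b))
    return gain - penalty_weight * math.log(len(merged))


def cluster_segments(segments, penalty_weight=1.0):
    """Return one speaker label per input segment."""
    clusters = [{"members": [i], "frames": list(seg)}
                for i, seg in enumerate(segments)]
    while len(clusters) > 1:
        pairs = [(merge_delta_bic(clusters[i]["frames"], clusters[j]["frames"],
                                  penalty_weight), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        score, i, j = min(pairs)
        if score >= 0:  # no remaining pair is better modelled jointly
            break
        clusters[i]["members"] += clusters[j]["members"]
        clusters[i]["frames"] += clusters[j]["frames"]
        del clusters[j]
    labels = {}
    for label, cluster in enumerate(clusters):
        for member in cluster["members"]:
            labels[member] = label
    return [labels[i] for i in range(len(segments))]


# Synthetic example: two alternating "speakers" with different feature means.
random.seed(1)
segments = [[random.gauss(6.0 * (k % 2), 1.0) for _ in range(100)]
            for k in range(4)]
labels = cluster_segments(segments, penalty_weight=2.0)
print(labels)
```

The penalty weight controls how readily clusters merge, which is one of the knobs that typically needs tuning when a baseline gives mediocre results.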

Page 14: Application of Audio and Video Processing Methods for Language Research

• Speaker Identification
– Identifies well-known speakers from given speech segments
– Approach: Based on Adapted Gaussian Mixture Models & probability functions
– Currently developing fast, iterative training workflow to train a speaker model for detection
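One common way to obtain an adapted Gaussian mixture model is MAP adaptation of the component means of a background model towards a speaker's data. The slides do not say which adaptation scheme is used, so the sketch below is an assumption for illustration only; it uses scalar features, and the background model, relevance factor and data are hypothetical.

```python
import math


def responsibilities(x, gmm):
    """Posterior probability of each (weight, mean, variance) component for x."""
    dens = [w * math.exp(-0.5 * (x - mu) ** 2 / var)
            / math.sqrt(2 * math.pi * var)
            for w, mu, var in gmm]
    total = sum(dens) or 1e-300
    return [d / total for d in dens]


def adapt_means(background, frames, relevance=16.0):
    """Shift each component mean towards the data it explains (MAP, means only)."""
    counts = [0.0] * len(background)
    sums = [0.0] * len(background)
    for x in frames:
        for k, g in enumerate(responsibilities(x, background)):
            counts[k] += g
            sums[k] += g * x
    adapted = []
    for (w, mu, var), n, s in zip(background, counts, sums):
        alpha = n / (n + relevance)  # more data gives more trust in the data mean
        data_mean = s / n if n > 0 else mu
        adapted.append((w, alpha * data_mean + (1 - alpha) * mu, var))
    return adapted


# Hypothetical background model and 48 frames of one speaker's data.
background = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
speaker_frames = [1.0] * 48
speaker_model = adapt_means(background, speaker_frames)
print(speaker_model)
```

Components that see little speaker data stay close to the background model, which is why this kind of adaptation works with short enrolment recordings.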

Page 15: Application of Audio and Video Processing Methods for Language Research

• Language Independent Alignment
– Accurate alignment between speech and text in a multilingual context.

Page 16: Application of Audio and Video Processing Methods for Language Research

• Query-by-example
– Accurate alignment between speech and text in a multilingual context.

Page 17: Application of Audio and Video Processing Methods for Language Research

RECOGNIZERS EXAMPLE

Page 18: Application of Audio and Video Processing Methods for Language Research
Page 19: Application of Audio and Video Processing Methods for Language Research

Detect how many persons are in the video, detect who is speaking and when, create the appropriate number of tiers and annotations for all of them, and align their speech with the transcription from a text file.

Page 21: Application of Audio and Video Processing Methods for Language Research

Video recognizers

Page 22: Application of Audio and Video Processing Methods for Language Research

Shot detection/keyframe extraction

Page 23: Application of Audio and Video Processing Methods for Language Research

Skin color estimation

Page 24: Application of Audio and Video Processing Methods for Language Research

Skin color estimation

Page 25: Application of Audio and Video Processing Methods for Language Research

Hand/Head Detection & Tracking

Page 26: Application of Audio and Video Processing Methods for Language Research

We can calculate

• Boundaries of the gesture space
• Speed, acceleration of hand movement
• Segment recording into units:
– Stroke
– Hold
– Retreat
• Hand movement related to body
• Which parts of speech overlap with gestures
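Several of these quantities follow directly from the tracked hand positions. The sketch below assumes the tracker yields one (x, y) hand centre per video frame; it derives speed and acceleration by finite differences, the gesture space as a bounding box, and a crude hold/move labelling by speed threshold. The track, frame rate and threshold are hypothetical, and real stroke or retreat detection would need more than a single threshold.

```python
import math


def kinematics(track, fps):
    """Return per-frame speeds (units/s) and accelerations (units/s^2)."""
    speeds = [math.hypot(x1 - x0, y1 - y0) * fps
              for (x0, y0), (x1, y1) in zip(track, track[1:])]
    accels = [(v1 - v0) * fps for v0, v1 in zip(speeds, speeds[1:])]
    return speeds, accels


def gesture_space(track):
    """Bounding box (min_x, min_y, max_x, max_y) covered by the hand."""
    xs, ys = zip(*track)
    return min(xs), min(ys), max(xs), max(ys)


def label_frames(speeds, hold_threshold):
    """Mark frame transitions with little movement as holds."""
    return ["hold" if v < hold_threshold else "move" for v in speeds]


# Hypothetical track in pixels at 25 frames per second.
track = [(0, 0), (3, 4), (3, 4), (6, 8)]
speeds, accels = kinematics(track, fps=25)
box = gesture_space(track)
labels = label_frames(speeds, hold_threshold=10.0)
print(speeds, accels, box, labels)
```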

Page 27: Application of Audio and Video Processing Methods for Language Research

Hand/Head Detection & Tracking

• Demo (ellipses video)