Improving Speech Recognition with Embodied Cognition and Behaviour-based Robotics

Jorge Davila-Chacon

University of Hamburg - Knowledge Technology

www.informatik.uni-hamburg.de/WTM/

Spotify ML Meetup – November 3rd 2014

Motivation

Jorge Davila-Chacon Bio-Inspired SSL for Robot ASR 2

• Why is bio-inspired sound source localisation (SSL) interesting / useful?

Neurobotic Experiments

Virtual Reality Lab

Bauer, J., Davila-Chacon, J., Strahl, E., Wermter, S. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic Robotics. In: Multisensor Fusion and Integration for Intelligent Systems, 2012


Bio-Inspired Sound Source Localisation

Spatial cues allow sound source localisation:

• Interaural Time Difference (ITD): ITDs from low frequencies
• Interaural Level Difference (ILD): ILDs from high frequencies

[Figure labels: ITD, ILD, same frequency component]
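A minimal Python sketch of how the two spatial cues could be computed from a binaural recording. It is not the spiking model used in the talk: it assumes plain cross-correlation for the ITD and an energy ratio in decibels for the ILD, and it works on the full band instead of separating low and high frequencies. The function name `estimate_itd_ild` and the synthetic test signal are hypothetical.

```python
import numpy as np

def estimate_itd_ild(left, right, sample_rate):
    """Estimate ITD (seconds) and ILD (dB) from two equal-length channels.

    A positive ITD means the left channel lags the right one.
    A negative ILD means the left channel is quieter than the right one.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Cross-correlate the channels; the lag of the peak is the ITD in samples.
    corr = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))
    lag = lags[np.argmax(corr)]
    itd = lag / sample_rate

    # ILD as the ratio of channel energies, expressed in decibels.
    eps = 1e-12  # avoids division by zero for silent channels
    ild = 10.0 * np.log10((np.sum(left**2) + eps) / (np.sum(right**2) + eps))
    return itd, ild

# Hypothetical test signal: white noise that reaches the left ear
# 5 samples later and at half the amplitude.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
delay = 5
right = source
left = 0.5 * np.concatenate([np.zeros(delay), source[:-delay]])

itd, ild = estimate_itd_ild(left, right, sample_rate=16000)
print(f"ITD = {itd * 1e6:.1f} microseconds, ILD = {ild:.2f} dB")
```

To follow the slide more closely, the ITD could be estimated after a low-pass filter and the ILD after a high-pass filter; the filter design is left open here.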

Interaural Time Differences: Neuroanatomy

ITDs are extracted in the Medial Superior Olive (MSO).

• AN - Auditory Nerve

• AVCN - Anterior Ventral Cochlear Nucleus

• IC - Inferior Colliculus

Interaural Time Differences: Computational Principle
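A common textbook account of ITD extraction is the Jeffress model: delay lines from both ears feed an array of coincidence detectors, and the detector whose internal delay cancels the ITD fires most often. Whether the talk used exactly this formulation is an assumption. The sketch below illustrates the idea on binary spike trains; `jeffress_coincidence` and the test data are hypothetical.

```python
import numpy as np

def jeffress_coincidence(left_spikes, right_spikes, max_delay):
    """Count coincidences for each internal delay of a Jeffress-style array.

    left_spikes, right_spikes: equal-length binary arrays (1 = spike).
    Returns (delays, counts). A positive delay d pairs left[n + d] with
    right[n], i.e. it matches a left train that lags the right one by d.
    """
    left = np.asarray(left_spikes, dtype=int)
    right = np.asarray(right_spikes, dtype=int)
    delays = np.arange(-max_delay, max_delay + 1)
    counts = np.zeros(len(delays), dtype=int)
    for i, d in enumerate(delays):
        if d >= 0:
            a, b = left[d:], right[:len(right) - d]
        else:
            a, b = left[:len(left) + d], right[-d:]
        # A detector fires whenever both inputs spike in the same time step.
        counts[i] = int(np.sum(a & b))
    return delays, counts

# Hypothetical spike trains: the left train is the right one delayed by 3 steps.
rng = np.random.default_rng(1)
right_spikes = (rng.random(500) < 0.2).astype(int)
true_delay = 3
left_spikes = np.concatenate(
    [np.zeros(true_delay, dtype=int), right_spikes[:-true_delay]]
)

delays, counts = jeffress_coincidence(left_spikes, right_spikes, max_delay=10)
best_delay = delays[np.argmax(counts)]
print("Most active detector has delay", best_delay)
```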

Interaural Level Differences: Neuroanatomy

ILDs are extracted in the Lateral Superior Olive (LSO).

• MNTB - Medial Nucleus of the Trapezoid Body

• IC - Inferior Colliculus

Bio-Inspired Sound Source Localisation

The outputs of the MSO and LSO are integrated in the IC.

J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation. International Conference on Artificial Neural Networks, 2012.
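One way to picture the integration of MSO and LSO outputs is as Bayesian evidence combination over candidate source angles. The sketch below is a simplification, not the IC model from the cited paper: it assumes the two cues are conditionally independent given the angle, and the angle grid, likelihood values, and the name `integrate_cues` are made up for illustration.

```python
import numpy as np

def integrate_cues(mso_likelihood, lso_likelihood, prior=None):
    """Posterior over candidate angles from MSO (ITD) and LSO (ILD) evidence.

    Assumes the cues are conditionally independent given the angle, so
    posterior is proportional to prior * p(mso | angle) * p(lso | angle).
    """
    mso = np.asarray(mso_likelihood, dtype=float)
    lso = np.asarray(lso_likelihood, dtype=float)
    if prior is None:
        prior = np.full(mso.shape, 1.0 / mso.size)  # uniform prior
    unnormalised = prior * mso * lso
    return unnormalised / unnormalised.sum()

# Hypothetical evidence for five candidate angles (degrees).
angles = np.array([-90, -45, 0, 45, 90])
mso = np.array([0.05, 0.10, 0.50, 0.30, 0.05])  # ITD evidence favours 0
lso = np.array([0.05, 0.05, 0.20, 0.60, 0.10])  # ILD evidence favours 45

posterior = integrate_cues(mso, lso)
best_angle = angles[np.argmax(posterior)]
print("Posterior:", np.round(posterior, 3), "-> estimate:", best_angle)
```

Multiplying the likelihoods lets a confident cue outweigh an ambiguous one, which is the main appeal of combining the two pathways this way.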

[Figure labels: MLP, IC]

J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation. International Joint Conference on Neural Networks, 2013.

Simple IC output

Complex IC output

[Figure labels: Static SSL, Dynamic SSL, feed-forward neural network]

Robotic Automatic Speech Recognition

Platforms used for ASR: iCub and Soundman

Binary measure - Static ASR

Continuous measure - Static ASR

J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks, 2014.

Conclusion

● Robotics as a “sandbox” for learning ML

● Neuroscience provides clues for computational principles

● Embodiment

• iCub allows computation of spatial cues

• Interaction with environment can reduce noise

● Signal processing with ANN

• Spiking ANN are an effective representation of spatial cues

• Bayesian integration important for dimensionality reduction

• Softmax neural layer robust to ego-noise and reverberation
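For reference, a softmax layer turns raw output activations into a probability distribution over the candidate source angles. The sketch below shows only the softmax function itself, assuming one output unit per angle; the activation values are hypothetical and the network from the talk is not reproduced.

```python
import numpy as np

def softmax(activations):
    """Map raw activations to probabilities that sum to 1."""
    z = np.asarray(activations, dtype=float)
    z = z - np.max(z)  # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

# Hypothetical activations of three output units (one per candidate angle).
activations = [2.0, 1.0, 0.1]
probabilities = softmax(activations)
print(np.round(probabilities, 3))
```

Because the outputs are normalised against each other, a uniform offset added to all units, such as a constant noise floor, leaves the resulting distribution unchanged.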

Future Work

● Neural SSL

• Integrate GPU version of MSO and LSO

• Propagation of probabilities through time

• From discrete to continuous

● Integration with vision

• From supervised to unsupervised SSL

• Possible extension to sensorimotor contingencies

• Vision to select between multiple sound sources

• Vision for speech segregation

Thank you for your attention.

jorgedch@gmail.com

LinkedIn: Jorge Davila Chacon

• J. Liu, D. Perez-Gonzalez, A. Rees, H. Erwin, S. Wermter. A biologically inspired spiking neural network model of the auditory midbrain for sound source localisation. Neurocomputing (2010)

• J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation. International Conference on Artificial Neural Networks (2012)

• J. Bauer, J. Davila-Chacon, E. Strahl, S. Wermter. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic Robotics. Multisensor Fusion and Integration for Intelligent Systems (2012)

• J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation. International Joint Conference on Neural Networks (2013)

• J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks (2014)

Appendix

Best performances with clustering layer

Bayesian IC model

Levenshtein distance

J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks, 2014.
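The Levenshtein distance counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another, which makes it a natural basis for a continuous ASR score. The sketch below is the standard dynamic-programming algorithm; the `similarity` normalisation by the longer sequence and the example sentences are assumptions, since the exact scoring used in the talk is not given here.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def similarity(ref, hyp):
    """Continuous score in [0, 1]; 1 means identical (assumed normalisation)."""
    longest = max(len(ref), len(hyp))
    return 1.0 if longest == 0 else 1.0 - levenshtein(ref, hyp) / longest

# Hypothetical reference sentence and recogniser output, compared word by word.
reference = "the robot hears speech".split()
hypothesis = "the robot here speech".split()
print(levenshtein(reference, hypothesis), similarity(reference, hypothesis))
```

A binary measure would simply check `reference == hypothesis`, whereas the continuous score gives partial credit to near misses.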
