
NEURAL VISUAL OBJECTS ENHANCEMENT FOR SENSORIAL SUBSTITUTION FROM VISION TO AUDITION

Damien Lescal, Louis-Charles Caron and Jean Rouat NECOTIS, Université de Sherbrooke, GEGI, Sherbrooke QC, Canada, J1K 2R1

Introduction

Existing sensorial substitution systems:
•  From vision to touch [1]
•  From vision to audition [2]

Studies have shown that simple tasks such as locating objects [3], recognizing shapes [4,5] and reading can be achieved using these systems.

Object-based approach: Different methods exist to find and track objects [6] or to represent the structure of objects as graphs [7], but they need large sets of data and require prior knowledge of the visual environment.

Overview of the substitution system

[Figure: block diagram of the system. (1) Captured image → (2) enhanced saliencies → (3) Sound 1, Sound 2, Sound 3 → (4) filters F1, F2, F3 → (5) summation (+) into the final sound.]

1. Captured image: each pixel of the image is captured and then represented by one neuron.
2. Enhanced saliencies: strongly connected neurons form clusters which represent regions of interest in the image.
3. Sound generation: each region is then associated with a simple sound.

4. Filtering: each sound is then filtered according to its position in the image, using the Virtual Acoustic Space (VAS) model [11].

5. Complex sound: all simple sounds are added together to form a complex sound which represents the visual scene. An illustrative sketch of this pipeline is given below.
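To fix ideas, the following Python sketch strings the five stages together. Everything in it is an illustrative stand-in rather than the authors' implementation: regions of interest are approximated by thresholding and connected-component labeling (the actual system uses the neural network described in the next section), each region is mapped to a pure tone whose frequency depends on its height, and spatialization uses a supplied bank of left/right HRTF impulse responses.

```python
import numpy as np
from scipy import ndimage, signal

def image_to_sound(image, hrir_bank, fs=44100, dur=0.5, thresh=0.5):
    """Illustrative five-stage pipeline: grayscale image -> stereo sound."""
    # Stages 1-2, captured image and enhanced saliencies: stand-in via
    # thresholding and connected-component labeling.
    labels, n_regions = ndimage.label(image > thresh)
    t = np.arange(int(fs * dur)) / fs
    scene = np.zeros((len(t), 2))  # stereo scene
    for r in range(1, n_regions + 1):
        ys, xs = np.nonzero(labels == r)
        cy = ys.mean() / image.shape[0]   # vertical position in [0, 1]
        cx = xs.mean() / image.shape[1]   # horizontal position in [0, 1]
        # Stage 3, sound generation: one simple sound (a tone) per region,
        # with frequency mapped from the region's height in the image.
        tone = np.sin(2 * np.pi * (200.0 + 800.0 * (1.0 - cy)) * t)
        # Stage 4, filtering: pick the HRIR pair measured closest to the
        # region's horizontal position (hrir_bank: list of (left, right)).
        hl, hr = hrir_bank[int(round(cx * (len(hrir_bank) - 1)))]
        left = signal.fftconvolve(tone, hl)[: len(t)]
        right = signal.fftconvolve(tone, hr)[: len(t)]
        # Stage 5, complex sound: sum all spatialized simple sounds.
        scene += np.stack([left, right], axis=1)
    return scene
```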

Object enhancement algorithm

[Figure: a 3×3 patch of the network. Neurons $n_1$–$n_9$ lie over pixels $p_1$–$p_9$; synapses $s_{15}, s_{25}, s_{35}, s_{45}, s_{56}, s_{57}, s_{58}, s_{59}$ connect neuron $n_5$ to its 8 neighbors.]

Each pixel $p$ of the input image is represented by a neuron $n$ in the network, and synapses connect each neuron to its 8 neighboring neurons.

Weight of a synapse between two neurons:

$$s_{ij}.w = s_{ji}.w = f\left(\left|\,n_i.p - n_j.p\,\right|\right)$$

where $s_{ij}.w$ is the weight of the synapse connecting neurons $n_i$ and $n_j$, $f()$ is a possibly nonlinear function, and $n_i.p$ is neuron $n_i$'s pixel value.

Computation of an iteration:

$$n_i.s(k) = \frac{n_i.s(k-1) + \sum_j n_j.s(k-1) \times s_{ij}.w}{\mathrm{NORM}}$$

where $n_i.s(k)$ is neuron $n_i$'s state at iteration $k$, and NORM equals 9 in this case.

Thresholding:

$$n_i.s(k) \leftarrow \begin{cases} 0 & \text{if } n_i.s(k) \le \mathrm{THRESH} \\ n_i.s(k) & \text{otherwise} \end{cases}$$

A minimal numerical sketch of these steps is given below.
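Here is a minimal numpy sketch of one iteration followed by thresholding. The poster leaves $f$, THRESH and the initial states open; this sketch assumes pixel values are floats in [0, 1], that $f(d) = 1 - d$ (so similar neighboring pixels are strongly connected), and that states are initialized to the pixel values.

```python
import numpy as np

NORM = 9          # normalization constant given in the poster
THRESH = 0.3      # illustrative value; the poster does not fix it

def iterate(states, pixels):
    """One update of every neuron's state from its 8 neighbors."""
    height, width = pixels.shape
    new_states = np.zeros_like(states)
    for i in range(height):
        for j in range(width):
            acc = states[i, j]  # the n_i.s(k-1) term
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if (di, dj) == (0, 0) or not (0 <= a < height and 0 <= b < width):
                        continue
                    # s_ij.w = f(|n_i.p - n_j.p|) with the assumed f(d) = 1 - d
                    weight = 1.0 - abs(pixels[i, j] - pixels[a, b])
                    acc += states[a, b] * weight
            new_states[i, j] = acc / NORM
    # Thresholding: weakly activated neurons are silenced.
    new_states[new_states <= THRESH] = 0.0
    return new_states
```

Starting from `states = pixels.copy()` and calling `iterate` a few times lets clusters of similar pixels reinforce each other while isolated activity is thresholded away, yielding the enhanced saliencies of stage 2.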

The sounds are created in two steps:
•  Recording the transfer function of the head: head-related transfer functions (HRTFs) are measured by placing miniature probe microphones into the subject's ears and recording the impulse responses to broad-band sounds presented from a range of directions in space.
•  Playing back the sounds through a VAS filter: the bank of HRTF impulse responses is converted into a filter bank. Any desired sound can be convolved with one of these filters and played over headphones, creating the perception of an externalized sound source. A sketch of this convolution step is given below.
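As a concrete illustration, here is a minimal Python sketch of the playback step: a mono sound is convolved with the left- and right-ear impulse responses measured for one direction. The function name and HRIR inputs are assumptions for illustration; real impulse responses would come from the probe-microphone measurements described above.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(sound, hrir_left, hrir_right):
    """Return a stereo signal perceived as coming from the direction
    in which this HRIR pair was measured (equal-length HRIRs assumed)."""
    left = fftconvolve(sound, hrir_left)
    right = fftconvolve(sound, hrir_right)
    return np.stack([left, right], axis=1)  # play over headphones
```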

Discussion

•  How much information from the visual scene can be carried by the auditory pathway?

•  Would our system be fast enough to process images and generate sound in real time?

References
[1] P. Bach-y-Rita et al., "Vision substitution by tactile image projection," Nature, vol. 221, pp. 963–964, 1969.
[2] P. B. L. Meijer, "An experimental system for auditory image representations," IEEE Trans. on Biomedical Engineering, vol. 39, no. 2, pp. 112–121, 1992.
[3] G. Jansson, "Tactile guidance of movement," International Journal of Neuroscience, vol. 19, pp. 37–46, 1983.
[4] E. Sampaio et al., "Brain plasticity: 'Visual' acuity of blind persons via the tongue," Brain Research, vol. 908, pp. 204–207, 2001.
[5] K. A. Kaczmarek and S. J. Haase, "Pattern identification… electrotactile display," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 11, pp. 9–16, 2003.
[6] I. Kokkinos and A. Yuille, "HOP: Hierarchical object parsing," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 802–809.
[7] M. I. Jordan (ed.), Learning in Graphical Models, MIT Press, 1998.
[8] P. M. Milner, "A model for visual shape recognition," Psychological Review, vol. 81, no. 6, pp. 521–535, Nov. 1974.
[9] C. von der Malsburg, "The correlation theory of brain function," in Models of Neural Networks II: Temporal Aspects … Biological Systems, pp. 95–119, 1994 (first circulated July 1981).
[10] L.-C. Caron et al., "FPGA implementation of a spiking neural network for pattern matching," in IEEE Int. Symp. on Circuits and Systems, Rio de Janeiro, 15–18 May 2011, p. 1342.
[11] J. Schnupp et al., Auditory Neuroscience: Making Sense of Sound, MIT Press, 2011.
