

NEURAL VISUAL OBJECTS ENHANCEMENT FOR SENSORIAL SUBSTITUTION FROM VISION TO AUDITION

Damien Lescal, Louis-Charles Caron and Jean Rouat
NECOTIS, Université de Sherbrooke, GEGI, Sherbrooke QC, Canada, J1K 2R1

Introduction

Existing sensorial substitution systems:
•  From vision to touch [1]
•  From vision to audition [2]

Studies have shown that simple tasks such as locating objects [3], shape recognition [4,5] and reading can be achieved using these systems.

Object-based approach: different methods exist to find and track objects [6] or to represent the structure of objects as graphs [7], but they need large sets of data and require prior knowledge of the visual environment.

Overview of the substitution system

[Figure: block diagram of the substitution system. The numbered stages 1–5 correspond to the steps below: regions of interest are mapped to Sound 1, Sound 2 and Sound 3, each sound is passed through its own filter (F1, F2, F3), and the filtered sounds are summed into the final sound.]

1. Captured image: each pixel of the image is captured and represented by one neuron.
2. Enhanced saliencies: strongly connected neurons form clusters which represent regions of interest in the image.
3. Sound generation: each region is then associated with a simple sound (a possible mapping is sketched below).
4. Filtering: each simple sound is then filtered according to its position in the image, using a Virtual Acoustic Space (VAS) model [11].
5. Complex sound: all the simple sounds are added together to form a complex sound which represents the visual scene.
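The poster does not specify how a region of interest is turned into a simple sound in step 3. As an illustration only, here is a minimal Python sketch of one plausible mapping, where pitch encodes the region's vertical position and amplitude encodes its size; the function name, frequency range, sample rate and duration are all assumptions.

```python
import numpy as np

def region_to_sound(cy, size, sr=16000, dur=0.5, f_min=200.0, f_max=2000.0):
    """Map one region of interest to a simple sound (illustrative mapping,
    not taken from the poster). cy is the region's vertical position in
    [0, 1] (0 = top of the image); size scales the amplitude."""
    freq = f_min + (1.0 - cy) * (f_max - f_min)  # higher in the image -> higher pitch
    t = np.arange(int(sr * dur)) / sr
    return size * np.sin(2.0 * np.pi * freq * t)
```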

Object enhancement algorithm

Each pixel $p_i$ of the input image is represented by a neuron $n_i$ in the network, and a synapse $s_{ij}$ connects each neuron to each of its 8 neighboring neurons.

[Figure: a 3×3 patch of the network. Neurons n1–n9 correspond to pixels p1–p9; synapses s15, s25, s35, s45, s56, s57, s58 and s59 link the central neuron n5 to its 8 neighbors.]

Weight of the synapse between two neurons:

$s_{ij}.w = s_{ji}.w = f(\mathrm{abs}(n_i.p - n_j.p))$

where $s_{ij}.w$ is the weight of the synapse connecting neurons $n_i$ and $n_j$, $f()$ is a possibly nonlinear function and $n_i.p$ is neuron $n_i$'s pixel value.

Computation of an iteration:

$n_i.s(k) = \dfrac{n_i.s(k-1) + \sum_j n_j.s(k-1) \times s_{ij}.w}{\mathrm{NORM}}$

where $n_i.s(k)$ is neuron $n_i$'s state at iteration $k$; NORM equals 9 in this case.

Thresholding:

$n_i.s(k) \leftarrow \begin{cases} n_i.s(k) & \text{if } n_i.s(k) \le \mathrm{THRESH} \\ 0 & \text{otherwise} \end{cases}$
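The weight, iteration and thresholding rules above translate directly into code. The following Python sketch implements them on a grayscale image; the choice of f (a decaying exponential of the pixel difference), the initialization n_i.s(0) = n_i.p, and the THRESH and iteration-count values are assumptions, since the poster does not fix them.

```python
import numpy as np

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1),
             ( 0, -1),          ( 0, 1),
             ( 1, -1), ( 1, 0), ( 1, 1)]  # the 8 neighbors of a pixel

def enhance(img, f=lambda d: np.exp(-5.0 * d), norm=9.0, thresh=0.5, n_iter=10):
    """Object enhancement on a 2-D array of pixel values in [0, 1].
    The initialization n_i.s(0) = n_i.p is an assumption."""
    h, w = img.shape
    state = img.astype(float)
    for _ in range(n_iter):
        new_state = np.empty_like(state)
        for y in range(h):
            for x in range(w):
                acc = state[y, x]                              # n_i.s(k-1)
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        wgt = f(abs(img[y, x] - img[ny, nx]))  # s_ij.w
                        acc += state[ny, nx] * wgt             # + n_j.s(k-1) * s_ij.w
                new_state[y, x] = acc / norm                   # divide by NORM = 9
        new_state[new_state > thresh] = 0.0                    # keep states <= THRESH
        state = new_state
    return state
```

With f decreasing in the pixel difference, neurons inside a uniform region reinforce each other while neurons across an edge exchange little activity, so after a few iterations the clusters of strongly connected neurons stand out as the regions of interest.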

The sounds are created in two steps:
•  Recording the transfer function of the head: HRTFs are measured by placing miniature probe microphones into the subject's ears and recording the impulse responses to broad-band sounds presented from a range of directions in space.
•  Playing back the sounds through a VAS filter: the bank of HRTF impulse responses is converted into a filter bank. Any desired sound can be convolved with one of these filters and played over headphones, which creates the perception of an externalized sound source.
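The VAS playback step amounts to convolving each simple sound with the HRIR pair measured for its direction, then summing the results (steps 4 and 5 above). A minimal Python sketch, assuming the HRIRs are already available as equal-length left/right arrays, one pair per region position:

```python
import numpy as np

def spatialize(sound, hrir_left, hrir_right):
    """Convolve a mono sound with the left/right HRIR pair of one direction
    (the VAS filtering step)."""
    return np.stack([np.convolve(sound, hrir_left),
                     np.convolve(sound, hrir_right)])

def render_scene(sounds, hrirs):
    """Spatialize each simple sound and sum them into the final complex sound.
    sounds: list of mono signals, one per region of interest.
    hrirs:  list of (left, right) HRIR pairs, one per region position."""
    length = max(len(s) + len(left) - 1 for s, (left, _) in zip(sounds, hrirs))
    out = np.zeros((2, length))
    for s, (left, right) in zip(sounds, hrirs):
        binaural = spatialize(s, left, right)
        out[:, :binaural.shape[1]] += binaural
    return out
```

Summing in the time domain is valid because the filtering is linear; for long impulse responses, an FFT-based convolution such as scipy.signal.fftconvolve would be the usual optimization.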

Discussion

•  How much information from the visual scene can be carried by the auditory pathway?
•  Would our system be fast enough to process images and generate sound in real time?

References

[1] P. Bach-y-Rita et al., “Vision substitution by tactile image projection,” Nature, vol. 221, pp. 963–964, 1969.
[2] P. B. L. Meijer, “An experimental system for auditory image representations,” IEEE Trans. on Biomedical Engineering, vol. 39, no. 2, pp. 112–121, 1992.
[3] G. Jansson, “Tactile guidance of movement,” International Journal of Neuroscience, vol. 19, pp. 37–46, 1983.
[4] E. Sampaio et al., “Brain plasticity: ‘Visual’ acuity of blind persons via the tongue,” Brain Research, vol. 908, pp. 204–207, 2001.
[5] K. A. Kaczmarek and S. J. Haase, “Pattern identification … electrotactile display,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 11, pp. 9–16, 2003.
[6] I. Kokkinos and A. Yuille, “HOP: Hierarchical object parsing,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 802–809.
[7] M. I. Jordan (ed.), Learning in Graphical Models, MIT Press, 1998.
[8] P. M. Milner, “A model for visual shape recognition,” Psychological Review, vol. 81, no. 6, pp. 521–535, Nov. 1974.
[9] C. von der Malsburg, “The correlation theory of brain function,” in Models of Neural Networks II: Temporal Aspects … Biological Systems, pp. 95–119, 1994 (original report, July 1981).
[10] L.-C. Caron et al., “FPGA implementation of a spiking neural network for pattern matching,” in IEEE Int. Symp. on Circuits and Systems, Rio de Janeiro, 15–18 May 2011, p. 1342.
[11] J. Schnupp et al., Auditory Neuroscience: Making Sense of Sound, The MIT Press, 2011.
