An Information-Maximization Approach to Blind Separation and Blind Deconvolution
A.J. Bell and T.J. Sejnowski
Computational Modeling of Intelligence
11.03.11 (Fri), summarized by Joon Shik Kim
Abstract
• A self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units.
• The nonlinearities are able to pick up higher-order moments of the input distribution and perform true redundancy reduction between units in the output representation.
• We apply the network to the source separation (or "cocktail party") problem, successfully separating unknown mixtures of up to 10 speakers.
Cocktail Party Problem
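The problem setting behind this slide, as formulated in the paper: N independent source signals s are linearly mixed by an unknown matrix A, and the network must learn an unmixing matrix W:

$$\mathbf{x} = A\mathbf{s}, \qquad \mathbf{u} = W\mathbf{x}, \qquad WA \approx PD$$

where P is a permutation and D a diagonal scaling, since the sources can only be recovered up to ordering and amplitude.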
Introduction (1/2)
• The development of information-theoretic unsupervised learning rules for neural networks.
• The use, in signal processing, of higher-order statistics for separating out mixtures of independent sources (blind separation) or reversing the effect of an unknown filter (blind deconvolution).
Introduction (2/2)
• The approach we take to these problems is a generalization of Linsker's infomax principle to nonlinear units with arbitrarily distributed inputs.
• When inputs are passed through a sigmoid function, maximum information transmission is achieved when the sloping part of the sigmoid is optimally lined up with the high-density parts of the input distribution.
• Generalizing this rule to multiple units leads to a system that, in maximizing information transfer, also reduces the redundancy between the units in the output layer (see the identity below).
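Why entropy maximization reduces redundancy follows from a standard decomposition of the joint output entropy (not spelled out on the slide, but implicit in the argument):

$$H(y_1, \ldots, y_N) = \sum_{i=1}^{N} H(y_i) - I(y_1, \ldots, y_N)$$

Since the sigmoid's bounded range caps each marginal entropy H(y_i), pushing the joint entropy up forces the redundancy term I(y_1, ..., y_N) down toward zero.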
Information Maximization
• The basic problem is how to maximize the mutual information that the output Y of a neural network processor contains about its input X:

$$I(Y, X) = H(Y) - H(Y \mid X)$$
Information Maximization
• Information between inputs and outputs can be maximized by maximizing the entropy of the outputs alone.
• H(Y|X) tends to minus infinity as the noise variance goes to zero, but it does not depend on the weights w, so

$$\frac{\partial}{\partial w} I(Y, X) = \frac{\partial}{\partial w} H(Y)$$
For One Input and One Output
• When we pass a single input x through a transforming function g(x) to give an output variable y, both I(y, x) and H(y) are maximized when we align high-density parts of the probability density function (pdf) of x with highly sloping parts of the function g(x).
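Below is a minimal Python sketch of this alignment for the logistic sigmoid, using the paper's stochastic rules Δw ∝ 1/w + x(1 − 2y) and Δw0 ∝ 1 − 2y; the function name, data distribution, and learning rate are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def infomax_single_unit(x, lr=0.01, n_epochs=50):
    """Stochastic gradient ascent on the output entropy H(y)
    for y = sigmoid(w*x + w0), using the paper's rules:
    dw ∝ 1/w + x(1 - 2y),  dw0 ∝ 1 - 2y."""
    w, w0 = 1.0, 0.0
    for _ in range(n_epochs):
        for xi in rng.permutation(x):
            y = 1.0 / (1.0 + np.exp(-(w * xi + w0)))
            w += lr * (1.0 / w + xi * (1.0 - 2.0 * y))
            w0 += lr * (1.0 - 2.0 * y)
    return w, w0

# Gaussian input centered at 2: the learned sigmoid's steep part
# should slide over the high-density region, i.e. -w0/w ≈ 2.
x = rng.normal(loc=2.0, scale=0.5, size=1000)
w, w0 = infomax_single_unit(x)
print(f"w = {w:.2f}, w0 = {w0:.2f}, midpoint = {-w0 / w:.2f}")
```

After training, the sigmoid's midpoint −w0/w should sit near the data mean, and the slope w should scale inversely with the data's spread, so the steep part of g covers the high-density part of the pdf.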
For an N→N Network
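The rule the paper derives for this case: with u = Wx + w0 and y_i = g(u_i) for the logistic g, gradient ascent on the joint output entropy H(y) gives

$$\Delta W \propto \left[W^{T}\right]^{-1} + (\mathbf{1} - 2\mathbf{y})\,\mathbf{x}^{T}, \qquad \Delta \mathbf{w}_0 \propto \mathbf{1} - 2\mathbf{y}$$

The [W^T]^{-1} term is what forces each unit to know the cofactors of the weight matrix, which is why the next slide reviews matrix inversion.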
Inverse of a Matrix
For an invertible matrix A,

$$A^{-1} = \frac{1}{\det A}\,\operatorname{adj}(A)$$

where adj(A) is the transpose of the cofactor matrix, with cofactors $(-1)^{i+j}\det(A_{ij})$ (here $A_{ij}$ is A with row i and column j deleted). In the 2×2 case:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
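A quick numerical sanity check of the 2×2 formula (a throwaway numpy snippet; the matrix entries are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])      # arbitrary invertible 2x2 matrix
a, b = A[0]
c, d = A[1]
adj = np.array([[ d, -b],
                [-c,  a]])      # transpose of the cofactor matrix
inv_by_cofactors = adj / (a * d - b * c)
assert np.allclose(inv_by_cofactors, np.linalg.inv(A))
```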
Blind Separation and Blind Deconvolution
Blind Separation Results
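A toy version of this experiment, offered as a sketch only: two synthetic Laplacian (super-Gaussian) sources, a made-up mixing matrix, and the batch form of the ΔW rule above. All sizes, rates, and iteration counts are illustrative and may need tuning for clean convergence.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

S = rng.laplace(size=(2, n))          # two independent super-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])            # "unknown" mixing matrix (hypothetical values)
X = A @ S                             # observed mixtures

W = np.eye(2)                         # unmixing matrix to be learned
lr = 0.01
for _ in range(500):
    Y = 1.0 / (1.0 + np.exp(-(W @ X)))
    # batch average of  dW ∝ [W^T]^{-1} + (1 - 2y) x^T
    dW = np.linalg.inv(W.T) + (1.0 - 2.0 * Y) @ X.T / n
    W += lr * dW

# If separation worked, W @ A is close to a scaled permutation matrix.
print(np.round(W @ A, 2))
```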
Different Aspects from Previous Work
• There is no noise, or rather, there is no noise model in this system.
• There is no assumption that inputs or outputs have Gaussian statistics.
• The transfer function is in general nonlinear.
Conclusion
• The learning rule is decidedly nonlocal. Each "neuron" must know the cofactors either of all the weights entering it, or of all those leaving it. The network rule remains unbiological.
• We believe that the information maximization approach presented here could serve as a unifying framework that brings together several lines of research, and as a guiding principle for further advances.