A Common Neural Network Model for Unsupervised Exploratory Data Analysis and Independent Component Analysis

Keywords: Unsupervised Learning, Independent Component Analysis, Data Clustering, Data Visualisation, Blind Source Separation.

Mark Girolami‡†, Andrzej Cichocki, and Shun-Ichi Amari‡

‡RIKEN Brain Science Institute
Laboratory for Open Information Systems & Laboratory for Information Synthesis
Hirosawa 2-1, Wako-shi, Saitama 351-01, Japan.
Tel (+81) 48 467-9669   Fax (+81) 48 467-9687
{cia, amari}@brain.riken.go.jp   [email protected]

†Currently on secondment from
Department of Computing and Information Systems
The University of Paisley, Paisley, PA1 2BE, Scotland
Tel (+44) 141 848 3963   Fax (+44) 848 3404
[email protected]

IEEE Membership Number: 40216669
Manuscript Number: TNN#3323
Accepted for IEEE Transactions on Neural Networks
BRIEF PAPER. Submitted: 13-08-97. Accepted: 28-07-98.
† Corresponding Author


Abstract

This paper presents the derivation of an unsupervised learning algorithm which enables the identification and visualisation of latent structure within ensembles of high dimensional data. The algorithm provides a linear projection of the data onto a lower dimensional subspace so as to identify the characteristic structure of the observations' independent latent causes, and it is shown to be a very promising tool for unsupervised exploratory data analysis and data visualisation. Experimental results confirm the attractiveness of this technique for exploratory data analysis, and an empirical comparison is made with the recently proposed Generative Topographic Mapping (GTM) and standard principal component analysis (PCA). Based on standard probability density models, a generic nonlinearity is developed which allows both (1) identification and visualisation of dichotomised clusters inherent in the observed data, and (2) separation of sources with arbitrary distributions from mixtures whose dimensionality may be greater than the number of sources. The resulting algorithm is therefore also a generalised neural approach to independent component analysis (ICA), and it is considered a promising method for the analysis of real world data, such as biomedical signals, which may consist of both sub- and super-Gaussian components.

1. Introduction

This paper develops a generalisation of the adaptive neural forms of the independent component analysis (ICA) algorithm, primarily as a method for interactive unsupervised data exploration, clustering and visualisation.

The ICA transformation has attracted a great deal of research focus in an attempt to solve the signal-processing problem of blind source separation (BSS) [1, 2, 3, 6, 7, 8, 10, 14, 15, 16, 18]. However, the work reported in this paper has been motivated by unsupervised data exploration and data visualisation. Unsupervised statistical analysis for the classification or clustering of data is a subject of great interest when no classes are defined a priori. The projection pursuit (PP) methodology, as detailed in [13], was developed to seek or pursue P-dimensional projections of multi-dimensional data which maximise some measure of statistical interest, where P = 1 or 2 to enable visualisation. Projection pursuit therefore provides a means of identifying latent data structure through visualisation [13].


The link between projection pursuit and independent component analysis (ICA) is discussed in [18], and a neural implementation of projection pursuit is developed and utilised for BSS in [15]. It is argued in [18] that the maximal independence index of projection offered by ICA best describes the fundamental nature of the data. This is of course in accordance with the latent variable model of data [12], which invariably assumes that the latent variables are orthogonal, i.e. independent.

A stochastic gradient-based algorithm is developed which is shown, ultimately, to be an extension of the natural gradient family of algorithms proposed in [1, 2]. The paper has the following structure: Section 2 introduces the ICA or latent variable model of data and briefly reviews the ICA data transformation. Section 3 presents the derivation of the algorithm for data visualisation and ICA. In Section 4 the classical Pearson Mixture of Gaussians (MOG) density model [19] for cluster identification is utilised as the nonlinear term of the developed algorithm; it is found that this provides an elegant closed form generic activation function, which also yields a method of separating arbitrary mixtures of non-Gaussian sources. Section 5 reports on a data exploration simulation and a source separation experiment. The concluding section discusses potential extensions of the proposed approach.

2. The Independent Component Analysis Data Model

The particular ICA data model considered in this paper is defined as follows:

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t) \qquad (1)$$

The observed zero-mean data vector is real valued, $\mathbf{x}(t) \in \mathbb{R}^{N}$; the vector of underlying independent sources or factors is given as $\mathbf{s}(t) \in \mathbb{R}^{M}$, with $M \le N$, and due to source independence the multivariate joint density factorises as $p(\mathbf{s}) = \prod_{i=1}^{M} p_i(s_i)$. The noise vector $\mathbf{n}(t)$ is assumed to be Gaussian with a diagonal covariance matrix $\mathbf{R}_{nn} = E\{\mathbf{n}\mathbf{n}^{T}\}$, where $E$ denotes expectation, and the variance of each noise component is usually assumed constant such that $\mathbf{R}_{nn} = \sigma_n^{2}\mathbf{I}$. The unknown real valued matrix $\mathbf{A} \in \mathbb{R}^{N \times M}$ is designated the mixing matrix in the ICA literature [6] or, in factor analysis, the factor loading matrix [12].

Our objective is then to find a linear transformation

$$\mathbf{y} = \mathbf{W}\mathbf{x} \qquad (2)$$


with $\mathbf{W} \in \mathbb{R}^{P \times N}$, where $P \ll N$, typically with $P = 2$ for visualisation purposes, such that the elements of $\mathbf{y}$ are as non-Gaussian and mutually independent as possible.

The objective of standard ICA is to recover all or some of the original source signals $\mathbf{s}(t)$, or indeed to extract one specific source, when only the observation vector $\mathbf{x}(t)$ is available [16]. Alternatively, the objective may be to estimate the mixing matrix $\mathbf{A}$. In contrast to these objectives, the primary task in this paper is not to estimate or extract any specific source signal $s_i(t)$, but rather to cluster the data into logical groupings and allow their visualisation. The non-Gaussian nature of the marginal components of $\mathbf{s}$ is, in terms of exploratory data analysis, indicative of interesting structure such as bi-modalities, i.e. clusters and intrinsic classes. This indicates the potential for ICA to be applied to the clustering of data, and this shall be explored further herein. The following section derives an unsupervised learning algorithm which is capable of identifying latent structure within data.
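As a concrete illustration of the generative model (1) and the linear projection (2), the following minimal Python sketch draws synthetic observations from independent non-Gaussian sources and applies a random initial projection. All dimensions, source distributions and variable names here are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: N sensors, M latent sources, T samples.
N, M, T = 12, 3, 5000

# Independent non-Gaussian sources: two sub-Gaussian (uniform), one super-Gaussian (Laplacian).
S = np.vstack([rng.uniform(-1.0, 1.0, T),
               rng.uniform(-1.0, 1.0, T),
               rng.laplace(0.0, 1.0, T)])

A = rng.normal(size=(N, M))             # unknown mixing / factor loading matrix
noise = 0.01 * rng.normal(size=(N, T))  # isotropic Gaussian noise, R_nn = sigma^2 I
X = A @ S + noise                       # observed data, eq. (1)
X -= X.mean(axis=1, keepdims=True)      # zero-mean observations, as assumed

P = 2                                   # dimensionality of the projection, eq. (2)
W = rng.normal(scale=0.1, size=(P, N))  # initial projection matrix
Y = W @ X                               # network outputs y = W x
```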

3. A Gradient Algorithm Based on Maximal Marginal Negentropy Criterion

The projection pursuit methodology, which seeks linear projections of the observed data, can be considered as a means of seeking latent non-Gaussian structure within the observations. In many ways the ICA model, which assumes non-Gaussian sources, can be a more realistic data model than the FA model, whose latent variables are independent Gaussians. In [13] the maximisation of higher order moments is utilised to pursue projections that identify structure associated with the maximised moment, i.e. multiple modes or skewness. However, if the resulting moments are small, describing a mesokurtic (only slightly deviated from Gaussian) structure, then moment based indices may not be suitable. Marriott, in the discussion of [13], argues that information theoretic criteria for maximisation may need to be considered in this case. The most obvious choice of an information theoretic measure signifying departure from Gaussianity is negentropy [9].

Negentropy [9] is defined as the Kullback-Leibler divergence of a distribution $p_y(y_i)$ from a Gaussian distribution $p_G(y_i)$ with identical mean and variance. In the univariate case this is

$$KL\left(p_y \,\|\, p_G\right) = \int p_y(y_i)\,\log\frac{p_y(y_i)}{p_G(y_i)}\,dy_i \qquad (3)$$


where the subscript i denotes the i'th marginal density of the data vector $\mathbf{y}$. Negentropy is always non-negative and vanishes only when the distribution $p_y(y)$ is Gaussian [9]. This lends itself to stochastic gradient ascent techniques.

The goal of this section is the derivation of a learning rule for a simple single layer structure which will drive the output of each neuron maximally away from normality. Each output neuron is parameterised individually, as the intention is for each neuron to respond optimally to differing independent features within the data. This is also in accordance with the factorial representation of the density of the underlying sources (or latent variables) in both the FA and ICA data models. We therefore use the factorial parametric form for the density of the network output, where $\mathbf{y} = \mathbf{W}\mathbf{x}$ and $\mathbf{W} \in \mathbb{R}^{P \times N}$, $P \le N$. The following criterion is proposed as a measure of non-Gaussianity for the P outputs

$$KL\left(p_F \,\|\, p_G\right) = \int p_{\mathbf{y}}(\mathbf{y})\,\log\frac{\prod_{i=1}^{P} p_i(y_i)}{p_G(\mathbf{y})}\,d\mathbf{y} \qquad (4)$$

where the subscript F denotes the factored form. This criterion can then be written as

$$KL\left(p_F \,\|\, p_G\right) = \frac{1}{2}\log\left((2\pi e)^{P}\det\left(\mathbf{R}_{yy}\right)\right) + \int p_{\mathbf{y}}(\mathbf{y})\,\log\prod_{i=1}^{P} p_i(y_i)\,d\mathbf{y} \qquad (5)$$
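For completeness, the step from (4) to (5) follows from a standard decomposition (this intermediate step is added here for clarity and is not spelled out in the original text): splitting the logarithm in (4) and using the fact that $p_G$ is Gaussian with the same mean $\mathbf{m}$ and covariance $\mathbf{R}_{yy}$ as $\mathbf{y}$ gives

$$-\int p_{\mathbf{y}}(\mathbf{y})\log p_G(\mathbf{y})\,d\mathbf{y} = \frac{1}{2}E\left\{(\mathbf{y}-\mathbf{m})^{T}\mathbf{R}_{yy}^{-1}(\mathbf{y}-\mathbf{m})\right\} + \frac{1}{2}\log\left((2\pi)^{P}\det\mathbf{R}_{yy}\right) = \frac{1}{2}\log\left((2\pi e)^{P}\det\mathbf{R}_{yy}\right),$$

which is the first term of (5); the remaining term of (4) is the expectation of $\log\prod_{i} p_i(y_i)$.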

The covariance matrices of the observed and transformed data, $\mathbf{R}_{xx} = E\{\mathbf{x}\mathbf{x}^{T}\}$ and $\mathbf{R}_{yy} = E\{\mathbf{y}\mathbf{y}^{T}\} = \mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}$, are positive definite. In considering the maximisation of this criterion we note that the two individual terms play a significant role in the emergent properties of the learning. Consider the left-hand term, $\frac{1}{2}\log\left((2\pi e)^{P}\det(\mathbf{R}_{yy})\right)$. The Hadamard inequality [9] is

$$\det\left(\mathbf{R}_{yy}\right) \le \prod_{i=1}^{P}\sigma_{y_i}^{2} \qquad (6)$$

where $\sigma_{y_i}^{2} = E\{y_i^{2}\}$ is the variance of the i-th output. This indicates that the maximal value of the term is attained when the covariance matrix of the network output is diagonalised. This is precisely the effect of the 'sphering' data transformation often discussed in the projection pursuit literature [12]. Likewise, the second term will be maximal once the transformed data conforms to a factorial representation, thus ensuring that each neuron will indeed respond to distinct independent underlying characteristics of the data.

In order to derive the learning algorithm, let us compute the gradients of the proposed criterion with respect to the weights $w_{ij}$ of the transformation matrix:

$$\frac{\partial}{\partial \mathbf{W}}\left[\frac{P}{2}\log(2\pi e) + \frac{1}{2}\log\det\left(\mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}\right) + \sum_{i=1}^{P}E\left\{\log p_i(y_i)\right\}\right] \qquad (7)$$

It is interesting to note that the term $\log\det\left(\mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}\right)$ ensures that the rank of $\mathbf{W}$ is equal to P, so each of the rows of $\mathbf{W}$ will be linearly independent. This naturally occurring term ensures that each output neuron will seek mutually independent directions. Now it is straightforward to show that

$$\frac{\partial}{\partial \mathbf{W}}\log\det\left(\mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}\right) = \left(\mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}\right)^{-1}\mathbf{W}\mathbf{R}_{xx} \qquad (8)$$

Taking the gradient of the output entropy term gives

$$\frac{\partial}{\partial \mathbf{W}}\sum_{i=1}^{P}E\left\{\log p_i(y_i)\right\} = E\left\{\mathbf{f}(\mathbf{y})\,\mathbf{x}^{T}\right\} \qquad (9)$$

The function $\mathbf{f}(\mathbf{y})$ operates element-wise on the vector $\mathbf{y}$, such that $\mathbf{f}(\mathbf{y}) = \left[f_1(y_1), \ldots, f_P(y_P)\right]^{T}$ with $f_i(y_i) = p_i'(y_i)/p_i(y_i)$. The final gradient term is then

$$\frac{\partial}{\partial \mathbf{W}}KL\left(p_F \,\|\, p_G\right) = \left(\mathbf{W}\mathbf{R}_{xx}\mathbf{W}^{T}\right)^{-1}\mathbf{W}\mathbf{R}_{xx} + E\left\{\mathbf{f}(\mathbf{y})\,\mathbf{x}^{T}\right\} \qquad (10)$$

The standard instantaneous gradient ascent technique can now be used in a stochastic update equation; however, we consider utilising the more efficient natural gradient for the weight update [1, 2]. In the particular case under consideration, where $P \ll N$, the square symmetric matrix $\mathbf{W}^{T}\mathbf{W}$ is only positive semi-definite, so the standard natural gradient formula

$$\Delta\mathbf{W}_t = \eta_t\left(\frac{\partial KL\left(p_F \,\|\, p_G\right)}{\partial \mathbf{W}_t}\right)\mathbf{W}_t^{T}\mathbf{W}_t \qquad (11)$$

cannot be used directly. We propose the modified formula

$$\Delta\mathbf{W}_t = \eta_t\left(\frac{\partial KL\left(p_F \,\|\, p_G\right)}{\partial \mathbf{W}_t}\right)\left(\mathbf{W}_t^{T}\mathbf{W}_t + \varepsilon_t\mathbf{I}\right) \qquad (12)$$

where $\varepsilon_t$ is a small positive constant which ensures that the term $\left(\mathbf{W}_t^{T}\mathbf{W}_t + \varepsilon_t\mathbf{I}\right)$ is always positive definite. This approach is similar to that used in many optimisation methods to keep the inverse of the Hessian positive definite.


Using (10) and (12) yields the following stochastic weight update:

$$\Delta\mathbf{W}_t = \eta_t\left(\mathbf{I} + \mathbf{f}(\mathbf{y}_t)\mathbf{y}_t^{T}\right)\mathbf{W}_t + \eta_t\varepsilon_t\left(\left(\mathbf{W}_t\mathbf{R}_{xx}\mathbf{W}_t^{T}\right)^{-1}\mathbf{W}_t\mathbf{R}_{xx} + \mathbf{f}(\mathbf{y}_t)\mathbf{x}_t^{T}\right) \qquad (13)$$

For small values of $\varepsilon_t$ we have $O(\eta_t) \gg O(\eta_t\varepsilon_t)$, so the rightmost term has a negligible effect on the weight update and the learning equation can be approximated by

$$\Delta\mathbf{W}_t = \eta_t\left(\mathbf{I} + \mathbf{f}(\mathbf{y}_t)\mathbf{y}_t^{T}\right)\mathbf{W}_t \qquad (14)$$

The weight updates (13) and (14) will seek maximally non-Gaussian projections onto lower dimensional subspaces for unsupervised exploratory data analysis. However, (14) can also be seen as a generalisation of the original equivariant ICA algorithms [1, 2, 3, 6, 7, 8], as it is capable of finding P independent components in an N-dimensional observation space. An alternative approach to this problem is proposed in, for example, [16], where the notion of 'extracting' sources sequentially from the observed mixture is utilised.
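For concreteness, a minimal per-sample sketch of the approximate update (14) is given below. It assumes NumPy, treats the step size as a free illustrative parameter, and leaves the nonlinearity f unspecified, since it is supplied by the density model developed in Section 4.

```python
import numpy as np

def ica_step(W, x, f, eta=1e-3):
    """One stochastic step of the approximate natural-gradient rule (14):
    delta W = eta * (I + f(y) y^T) W, with y = W x, where f applies the
    marginal score p_i'(y_i) / p_i(y_i) element-wise to the P outputs."""
    y = W @ x
    P = W.shape[0]
    return W + eta * (np.eye(P) + np.outer(f(y), y)) @ W
```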

4. A Mixture Model to Identify Latent Clustered Data Structure

Distributions that are bi-modal exhibit one form of latent structure which it is of interest to identify; multiple modes may indicate specific clusters and classes inherent in the data. Maximum likelihood estimation (MLE) approaches to data clustering [11] employ Mixture of Gaussians (MOG) models to seek transition regions of high and low density and so identify potential data clusters.

One particular univariate MOG model which is of particular interest in this study was originally proposed in [19]. The generic form of the Pearson model is

$$p(y) = a\,p_G\left(y + \mu_1, \sigma_1^{2}\right) + (1 - a)\,p_G\left(y - \mu_2, \sigma_2^{2}\right) \qquad (15)$$

where $0 \le a \le 1$ and $p_G(\cdot, \sigma^{2})$ denotes a zero-mean Gaussian density of variance $\sigma^{2}$ (see Figure 1). The distribution is symmetric, possessing two distinct modes, when the mixing coefficient a = ½.

For the strictly symmetric case where a = ½, $\mu_1 = \mu_2 = \mu$ and $\sigma_1^{2} = \sigma_2^{2} = \sigma^{2}$, the above MOG density model (15) can be written as

$$p(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,\exp\left(-\frac{y^{2}}{2\sigma^{2}}\right)\exp\left(-\frac{\mu^{2}}{2\sigma^{2}}\right)\cosh\left(\frac{\mu y}{\sigma^{2}}\right) \qquad (16)$$
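As a quick numerical sanity check (added here, not part of the paper), the following sketch verifies that the closed form (16) coincides with the equal-weight mixture of two Gaussians centred at ±μ with common variance σ², using the μ = 2, σ² = 1 values of Figure 1.

```python
import numpy as np

mu, sigma2 = 2.0, 1.0
y = np.linspace(-6.0, 6.0, 1001)

def gauss(y, mean, var):
    # Univariate Gaussian density with the given mean and variance.
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

mixture = 0.5 * gauss(y, -mu, sigma2) + 0.5 * gauss(y, mu, sigma2)
closed = (np.exp(-y**2 / (2.0 * sigma2)) * np.exp(-mu**2 / (2.0 * sigma2))
          * np.cosh(mu * y / sigma2) / np.sqrt(2.0 * np.pi * sigma2))

assert np.allclose(mixture, closed)  # the two expressions agree pointwise
```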


Employing (16) to compute the individual nonlinear terms in (14) produces

$$f(y) = \frac{p'(y)}{p(y)} = -\frac{y}{\sigma^{2}} + \frac{\mu}{\sigma^{2}}\tanh\left(\frac{\mu y}{\sigma^{2}}\right) \qquad (17)$$

This is a particularly interesting form of nonlinearity, as both the linear and hyperbolic tangent terms have been studied as activations for single layer unsupervised neural networks. A density model has now been identified with these particular activations, which allows a probabilistic interpretation of the hyperbolic tangent activation function.

To gain some insight into the statistical nature of the proposed MOG model, the associated cumulant generating function (CGF) is employed [14]. The explicit form of the CGF for the generic Pearson model (15), in the case where $\mu_1 = \mu_2 = \mu$ and $\sigma_1^{2} = \sigma_2^{2} = \sigma^{2}$, is simply $\phi(w) = \log\left((1-a)\exp(A) + a\exp(B)\right)$, where $A = i\mu w - \sigma^{2}w^{2}/2$ and $B = -i\mu w - \sigma^{2}w^{2}/2$ [14]. The related cumulants of the distribution can now be computed, and after some tedious algebraic manipulation the kurtosis of the distribution under consideration is

$$\kappa_4 = \frac{16a(1-a)\left(1 - 6a + 6a^{2}\right)\mu^{4}}{\left(4a(1-a)\mu^{2} + \sigma^{2}\right)^{2}} \qquad (18)$$

For the symmetric case where a = ½ the expression for the kurtosis reduces to

$$\kappa_4 = \frac{-2\mu^{4}}{\left(\mu^{2} + \sigma^{2}\right)^{2}} \qquad (19)$$

which, interestingly, takes strictly negative values for all $\mu > 0$ [14]. This is of particular significance, as this distribution and nonlinear function can therefore also be utilised for BSS of strictly sub-Gaussian sources.

For the case where two distinct modes are defined such that $\mu = 2$ and $\sigma^{2} = 1$ (see Figure 1), the nonlinearity takes the very simple form

$$f(y) = 2\tanh(2y) - y \qquad (20)$$

The following weight adaptation will then seek projections which identify maximally dichotomised clustered structure inherent in the data:

$$\Delta\mathbf{W}_t = \eta_t\left[\mathbf{I} + 2\tanh\left(2\mathbf{y}_t\right)\mathbf{y}_t^{T} - \mathbf{y}_t\mathbf{y}_t^{T}\right]\mathbf{W}_t \qquad (21)$$
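A small sketch of the cluster-seeking nonlinearity and of one step of rule (21) follows; the function names and step size are illustrative. The first check simply confirms that the general score (17) reduces to (20) at μ = 2, σ² = 1.

```python
import numpy as np

def f_pearson(y, mu=2.0, sigma2=1.0):
    """Score function (17) of the symmetric Pearson mixture:
    f(y) = -y/sigma^2 + (mu/sigma^2) * tanh(mu * y / sigma^2)."""
    return -y / sigma2 + (mu / sigma2) * np.tanh(mu * y / sigma2)

y = np.linspace(-4.0, 4.0, 401)
assert np.allclose(f_pearson(y), 2.0 * np.tanh(2.0 * y) - y)  # reduces to (20)

def cluster_seeking_step(W, x, eta=1e-3):
    """One sample update of rule (21): dW = eta * (I + 2 tanh(2y) y^T - y y^T) W."""
    y = W @ x
    P = W.shape[0]
    G = np.eye(P) + np.outer(2.0 * np.tanh(2.0 * y), y) - np.outer(y, y)
    return W + eta * G @ W
```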

An alternative density model, which defines a unimodal super-Gaussian density, is given as $p(y) \propto \exp\left(-y^{2}/2\right)\mathrm{sech}(y)$, where the normalising constant is neglected [14]. The associated derivative of the log-density is then $f(y) = -\tanh(y) - y$. This can be combined with the nonlinear term based on the symmetric Pearson model with $\mu = 1$ and $\sigma^{2} = 1$, yielding, in vector format,

$$\mathbf{f}(\mathbf{y}) = -\mathbf{K}_4\tanh(\mathbf{y}) - \mathbf{y} \qquad (22)$$

The square diagonal matrix which contains the sign of each individual output kurtosis is defined as

$$\mathbf{K}_4 = \mathrm{diag}\left(\mathrm{sgn}\left(\kappa_4(y_1)\right), \ldots, \mathrm{sgn}\left(\kappa_4(y_P)\right)\right) \qquad (23)$$

The kurtosis of each output can be estimated online using the following moving average estimator

$$\hat{m}_p\left[y_{t+1}\right] = \left(1 - \mu_t\right)\hat{m}_p\left[y_t\right] + \mu_t\,y_t^{p} \qquad (24)$$

$$\hat{\kappa}_4\left[y_t\right] = \frac{\hat{m}_4\left[y_t\right]}{\left(\hat{m}_2\left[y_t\right]\right)^{2}} - 3 \qquad (25)$$

The sample moments of order p are estimated using (24), where $\mu_t$ is the averaging rate; in this case the second and fourth order moments are required. The sample kurtosis estimate is then given by (25).

The generic term (22) can be substituted into (14), finally giving

$$\Delta\mathbf{W}_t = \eta_t\left(\mathbf{I} - \mathbf{K}_{4,t}\tanh\left(\mathbf{y}_t\right)\mathbf{y}_t^{T} - \mathbf{y}_t\mathbf{y}_t^{T}\right)\mathbf{W}_t \qquad (26)$$

This update equation can then also be applied to general ICA where the number of outputs is less than the number of sensors. From the form of (26) it is clear that this adaptation rule can be utilised to separate scalar mixtures which may contain arbitrary numbers of both sub- and super-Gaussian sources. The use of (21) will specifically seek linear projections identifying bi-modal and dichotomised clustered structure within the data.
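The generic rule (26), together with the online kurtosis estimator (24)-(25), can be sketched per sample as below; the step sizes and the moment initialisation are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def generic_ica_step(W, x, m2, m4, eta=1e-4, mu_t=1e-3):
    """One stochastic step of the generic rule (26) with the online
    kurtosis estimator (24)-(25). m2 and m4 are running estimates of the
    2nd and 4th moments of each of the P outputs."""
    y = W @ x
    # Moving-average moment estimates, eq. (24)
    m2 = (1.0 - mu_t) * m2 + mu_t * y**2
    m4 = (1.0 - mu_t) * m4 + mu_t * y**4
    # Sample kurtosis (25) and its sign, giving the diagonal of K4 in (23)
    k4_sign = np.sign(m4 / m2**2 - 3.0)
    # Generic nonlinearity (22): f(y) = -K4 tanh(y) - y
    fy = -k4_sign * np.tanh(y) - y
    P = W.shape[0]
    W = W + eta * (np.eye(P) + np.outer(fy, y)) @ W   # equivalent to (26)
    return W, m2, m4
```

A typical (hypothetical) usage would initialise m2 = np.ones(P) and m4 = 3.0 * np.ones(P), so that the early kurtosis estimates are near zero, and then call the function once per randomly drawn sample, as in the image separation experiment of Section 5.2 where a fixed learning rate of 0.0001 is used.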

5. Experimental Results

5.1 Data Visualisation

The dataset used in this experiment arises from synthetic data modelling of a non-invasive monitoring system used to measure the relative quantity of oil within a multi-phase pipeline carrying oil, water and gas. The data consists of twelve dimensions, which correspond to the measurements from six dual powered gamma ray densitometers [4]. There are three particular flow regimes which may occur within the pipeline, namely laminar, annular and homogeneous. (This data set is available from http://www.ncrg.aston.ac.uk/GTM/3PhaseData.html.)

The Generative Topographic Mapping (GTM) [4] has been applied successfully to the problem of visualising the latent structure within this data set and is used here as a means of comparison with the derived adaptation rule (21). The data is first made zero-mean and then presented sequentially to the network until the weights reach a steady value. A fixed learning rate of 0.001 was used in this simulation.

The results using the nonlinear GTM mapping under the conditions reported in [4] are given in Figure 2a. It is clear that the three clusters corresponding to the different phases have been clearly identified and separated. In comparison with principal component analysis (PCA, Figure 2c), the GTM results provide considerably more distinct separation of the clusters corresponding to the three flow regimes. Figure 2b shows the results using the adaptation rule (21); again it is clear that the points relating to the laminar, annular and homogeneous flow regimes have been distinctly clustered together. It is of significance to note, however, that there exist two clusters corresponding to the laminar flow.

As the proportions of each phase change within the laminar flow over time, there is a change in the physical boundary between the phases which triggers a step change in the across-pipe beams. It is this physical effect which gives rise to the distinct clusters within the laminar flow. This identification of the additional clustered structure within the laminar flow otherwise requires the use of a linear hierarchical approach to data visualisation, as is demonstrated in [5].
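A hypothetical end-to-end sketch of how rule (21) might be applied to data such as the twelve dimensional pipeline measurements to produce a two dimensional scatter plot is given below. The zero-mean pre-processing, sequential presentation and 0.001 learning rate follow the description above; the sweep count, initialisation scale and function name are illustrative assumptions.

```python
import numpy as np

def visualise_2d(X, n_sweeps=50, eta=1e-3, seed=0):
    """Apply the cluster-seeking rule (21) to an (N, T) data array X and
    return T two-dimensional coordinates for a scatter plot."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)            # zero-mean data
    W = rng.normal(scale=0.1, size=(2, X.shape[0]))  # P = 2 for visualisation
    for _ in range(n_sweeps):
        for t in range(X.shape[1]):                  # sequential presentation
            y = W @ X[:, t]
            G = np.eye(2) + np.outer(2.0 * np.tanh(2.0 * y), y) - np.outer(y, y)
            W = W + eta * G @ W                      # rule (21)
    return (W @ X).T                                 # T x 2 projected points
```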

5.2 Blind Source Separation

This simulation focuses on image enhancement and is used here to demonstrate the performance of the algorithm when applied to ICA. The problem consists of three original source images which are mixed by a non-square linear memoryless mixing matrix, such that the number of mixtures is greater than the number of sources.

The pixel distributions of the images are such that two of them are sub-Gaussian, with negative values of kurtosis, and the other is super-Gaussian, with a positive value of kurtosis. The values of kurtosis for the images are computed as 0.3068, -1.3753 and -0.2415. It is interesting to note that two of the images (Figure 3) have relatively small absolute values of kurtosis and as such are approximately Gaussian. This is a particularly difficult problem due to the non-square mixing and the presence of both sub- and super-Gaussian sources within the mixture; the difficulty is further compounded by the small absolute values of kurtosis of two of the sources.

The first problem that has to be addressed is identifying the number of sources; this can be done simply by computing the rank of the covariance matrix of the mixture. Historically the next problem would be two-fold, as the mixture consists of some sources which are sub-Gaussian and some which are super-Gaussian; this of course affects the choice of the nonlinearity required to separate the sources successfully. However, from (26) all that is required is to 'learn' the diagonal terms of the K4 matrix. Each sample is drawn randomly from the mixture and (26) is used to update the network weights, with the learning rate kept at a fixed value of 0.0001. It should be stressed that it is not required to make any assumptions about the types of non-Gaussian sources present in the mixture, nor is it necessary to choose another form of nonlinearity or to change the simple form of the algorithm. Figure 3 shows the observations and the final separated images, indicating the good performance of the algorithm.
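The rank-based estimate of the number of sources mentioned above can be sketched as follows; the eigenvalue tolerance is an illustrative choice rather than a value specified in the paper.

```python
import numpy as np

def estimate_num_sources(X, tol=1e-8):
    """Estimate the number of sources as the numerical rank of the
    sample covariance of the (N, T) mixture array X."""
    X = X - X.mean(axis=1, keepdims=True)
    Rxx = (X @ X.T) / X.shape[1]             # sample covariance of the observations
    eigvals = np.linalg.eigvalsh(Rxx)
    return int(np.sum(eigvals > tol * eigvals.max()))
```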

6. Conclusions

By considering an information theoretic index of projection based on negentropy, a generalised learning algorithm has been derived which may be applied to both unsupervised exploratory data analysis and independent component analysis with an arbitrary number of outputs. The powerful capability of this approach for unsupervised exploratory data analysis has been demonstrated using the oil pipeline data and compared with the probabilistic GTM. The technique has also been applied to other classical data-sets such as the Iris, Crab and Swiss Banknote data; in each case the intrinsic clustered nature of the data is revealed by the use of the proposed learning algorithm (26).

In terms of ICA, a particularly difficult image enhancement problem has been used to demonstrate the performance of the algorithm for blind source separation. Current work, addressed from the perspective of data analysis, concerns the means by which this technique can be extended to a hierarchical method for dichotomising and clustering data.

Acknowledgements

M. Girolami is supported by a grant from the Knowledge Laboratory Advanced Technology Centre, NCR Financial Systems Limited, Dundee, Scotland. We are indebted to Dr. J.-F. Cardoso for helpful discussions regarding this work. Mark Girolami is grateful to Dr. Michael Tipping and Prof. Chris Bishop for providing the oil pipeline data and for helpful insights regarding the physical interpretation of the data analysis. This work was completed whilst Mark Girolami was an invited visiting researcher at the Laboratory for Open Information Systems, Brain Science Institute, RIKEN, The Institute of Physical and Chemical Research, Wako-shi, Japan.

References

[1] Amari, S., Chen, T.-P., and Cichocki, A., Stability Analysis of Learning Algorithms for Blind Source Separation, Neural Networks, Vol. 10, No. 8, pp. 1345-1351, 1997.

[2] Amari, S., Cichocki, A., and Yang, H., A New Learning Algorithm for Blind Signal Separation, Advances in Neural Information Processing Systems, Vol. 8, pp. 757-763, MIT Press, 1995.

[3] Bell, A. and Sejnowski, T., An Information Maximisation Approach to Blind Separation and Blind Deconvolution, Neural Computation, Vol. 7, pp. 1129-1159, 1995.

[4] Bishop, C., Svensen, M., and Williams, C., GTM: The Generative Topographic Mapping, Neural Computation, Vol. 10, No. 1, pp. 215-234, 1998.

[5] Bishop, C. and Tipping, M., A Hierarchical Latent Variable Model for Data Visualisation, Technical Report NCRG/96/028, Aston University, 1997.

[6] Cardoso, J.-F. and Laheld, B. H., Equivariant Adaptive Source Separation, IEEE Transactions on Signal Processing, Vol. SP-43, pp. 3017-3029, 1997.

[7] Cichocki, A., Unbehauen, R., and Rummert, E., Robust Learning Algorithm for Blind Separation of Signals, Electronics Letters, Vol. 30, No. 17, pp. 1386-1387, 1994.

[8] Cichocki, A. and Unbehauen, R., Robust Neural Networks with On-Line Learning for Blind Identification and Blind Separation of Sources, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 43, pp. 894-906.

[9] Cover, T. and Thomas, J. A., Elements of Information Theory, Wiley Series in Telecommunications, 1991.

[10] Douglas, S. C., Cichocki, A., and Amari, S., Multichannel Blind Separation and Deconvolution of Sources with Arbitrary Distributions, Proc. IEEE Workshop on Neural Networks for Signal Processing, pp. 436-444, 1997.

[11] Everitt, B. S., Cluster Analysis, Heinemann Educational Books, 1993.


[12] Everitt, B. S., An Introduction to Latent Variable Models, London: Chapman and Hall, 1984.

[13] Friedman, J. H., Exploratory Projection Pursuit, Journal of the American Statistical Association, Vol. 82, No. 397, pp. 249-266, 1987.

[14] Girolami, M., An Alternative Perspective on Adaptive Independent Component Analysis Algorithms, Neural Computation, Vol. 10, No. 8, pp. 2103-2114, 1998.

[15] Girolami, M. and Fyfe, C., Extraction of Independent Signal Sources using a Deflationary Exploratory Projection Pursuit Network with Lateral Inhibition, IEE Proceedings on Vision, Image and Signal Processing, Vol. 144, No. 5, pp. 299-306, 1997.

[16] Hyvarinen, A. and Oja, E., A Fixed-Point Algorithm for Independent Component Analysis, Neural Computation, Vol. 9, No. 7, pp. 1483-1492, 1997.

[17] Jones, M. C. and Sibson, R., What is Projection Pursuit?, Journal of the Royal Statistical Society, Series A, Vol. 150, No. 1, pp. 1-36, 1987.

[18] Karhunen, J., Oja, E., Wang, L., Vigario, R., and Joutsensalo, J., A Class of Neural Networks for Independent Component Analysis, IEEE Transactions on Neural Networks, Vol. 8, pp. 487-504, 1997.

[19] Pearson, K., Contributions to the Mathematical Theory of Evolution, Phil. Trans. Roy. Soc. A, Vol. 185, p. 71, 1894.


Figure 1


Figure 2a (plot legend: Laminar, Annular, Homogenous)


Figure 2b (plot legend: Laminar, Annular, Homogenous)


Figure 2c (plot legend: Laminar, Annular, Homogenous)


Figure 3 (panel headings: Original Source Images; Observed Mixed Images; panel titles: Image No 1: Kurtosis = -1.37, Image No 2: Kurtosis = +0.31, Image No 3: Kurtosis = -0.25; annotations: A ∈ R^(N×M), W ∈ R^(M×N))


Figure Captions

Figure 1: Examples of the univariate Pearson mixture model for μ = 2 and σ² = 1 and various values of the mixing parameter a.

Figure 2a: Plot of the posterior mean for each point in latent space using the GTM; the three flow regimes responsible for generating the twelve dimensional measurements have clearly been clustered successfully.

Figure 2b: Plot of the twelve dimensional data projected onto a two dimensional subspace which maximises the negentropy of the data within the subspace. The three flow regimes responsible for the measurements have been clearly defined and, in addition, the clustered nature of the laminar flow has been identified.

Figure 2c: Plot of the twelve dimensional data projected onto the two dimensional subspace whose basis is the first two principal components. The three flow regimes responsible for the measurements have not been clearly defined or separated into distinct clusters.

Figure 3: Independent Component Analysis performed on a 5 × 3 mixture of three images. One image is sub-Gaussian with a kurtosis of -1.37, another has a very small positive kurtosis of +0.31, and the third has a small negative kurtosis of -0.25. The two images with small absolute values of kurtosis could be considered approximately mesokurtic.