
Chapter 14

Nonlinear mixtures

C. Jutten, M. Babaie-Zadeh and J. Karhunen

14.1 Introduction

Blind source separation problems in linear mixtures have been intensively investigated and are now well understood; see for example [26, 53, 29]. Since 1985, many methods have been proposed [60, 28, 30, 21, 27, 10, 54], especially based on independent component analysis (ICA) [31], exploiting the assumption of statistical independence of the source signals. These methods have been applied in various domains, and many examples can be found in the application chapters 16-19 of this book.

The linear mixing model, either without memory (instantaneous mixtures) or with memory (convolutive mixtures), is an approximate model, which is valid provided that the nonlinearities in the mixing system are weak, or that the amplitudes of the signals are limited. In various situations this approximation does not hold: for instance, when one uses sensors with hard nonlinearities, or when signal levels lead to saturation of the conditioning electronic circuits. Thus, it is relevant to consider the blind source separation problem in the more general framework of nonlinear mixtures.

This problem was sketched in the early works by Jutten [59], where the best linear separating solution was estimated, by Burel [25] for known nonlinearities with unknown parameters, and by Parra et al. [84, 83] for nonlinear transforms with Jacobian equal to 1. The essential problem of the existence of solutions was considered at the end of the 1990s by Hyvärinen and Pajunen [55] in a general framework, and by Taleb and Jutten [99] for particular nonlinear mixtures.¹ In addition to their theoretical interest, nonlinear mixtures are relevant for a few realistic applications, for example in image processing [6, 77, 44] and in instrumentation [92, 20, 22, 37, 36, 38].

¹In the framework of blind source separation. One can find results in statistics [34, 62] on the related problem of factorial decomposition from the beginning of the 1950s.


Assume now that T samples of a random vector x with P components are available, modeled by

x = A(s) + b, (14.1)

where s is a random source vector in R^N whose components, the source signals s_1, s_2, . . . , s_N, are assumed to be statistically independent, A is a nonlinear transform from R^N to R^P, and b is an additive noise vector which is independent of the source signals.

In this chapter we assume, unless other assumptions are explicitly made, that (i) there are as many observations as sources: P = N; (ii) the nonlinear transform A is invertible; and (iii) the noise vector is negligible (b = 0).
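As a minimal illustration of the model (14.1), the following sketch generates a toy noiseless nonlinear mixture; the componentwise tanh saturation and the matrix M are hypothetical choices, not assumptions of the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 2                            # as many observations as sources

s = rng.uniform(-1.0, 1.0, size=(N, T))   # independent source signals
M = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # linear stage of the toy transform

# Noiseless observations x = A(s), with the invertible (hypothetical)
# nonlinear transform A(s) = tanh(M s); b = 0 as assumed in the text.
x = np.tanh(M @ s)
print(x.shape)                            # (2, 1000): P = N observations
```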

The nonlinear blind source separation problem is then the following: is it possible to estimate the sources s from the nonlinear observations x only? Without extra prior information, this problem is ill-posed and has no solution. To achieve a solution, we can add assumptions on the statistics of the source signals, or prior information on their coloredness or nonstationarity.

This chapter is organized as follows. Section 14.2 is devoted to the existence and uniqueness of the solutions provided by ICA in the general nonlinear framework. In Section 14.3, we consider the influence of structural constraints on identifiability and separability. In Section 14.4, we address the regularization effect due to priors on the sources. In Section 14.5, we focus on some properties of mutual information as an independence criterion, and of a quadratic criterion. A Bayesian approach for general nonlinear mixtures is considered in Section 14.6. In Section 14.7, we briefly discuss other methods introduced for nonlinear mixtures; the interested reader can find more information on them in the given references. We finish this chapter with a short presentation of a few applications of nonlinear BSS (Section 14.8) before the conclusions (Section 14.9).

14.2 Nonlinear ICA in the general case

In this section, we present theoretical considerations which justify that statistical independence alone is not sufficient for solving the blind source separation problem for general nonlinear mixtures. These results clearly show that nonlinear ICA is a highly non-unique concept.

14.2.1 Nonlinear independent component analysis (ICA)

A natural extension of the linear ICA method to the nonlinear case consists in estimating, using only the observations x, a nonlinear transform B from R^N to R^N such that the random vector

y = B(x)   (14.2)

has mutually independent components.


Does the mapping B ◦ A, which preserves the independence of the components of the source vector, allow the sources to be separated? If it is possible, what are the necessary or sufficient conditions?

14.2.2 Definitions and preliminary results

We do not recall the definition of the mutual independence of random vectors, which is based on the factorization of probability density functions. We just provide the definition of the σ-diagonal transform.

Definition 14.1 A bijective function H from R^n to R^n is called σ-diagonal if it preserves the independence of any random vector.

This definition implies that any random vector² x ∈ R^n with mutually independent components is transformed into a random vector y = H(x) with mutually independent components, too. The set of σ-diagonal transforms will be denoted T. One can prove the following theorem [96].

Theorem 14.1 A bijective function H from R^n to R^n is σ-diagonal if and only if its components H_i, i = 1, . . . , n, satisfy
\[
H_i(u_1, \dots, u_n) = h_i(u_{\sigma(i)}), \quad i = 1, \dots, n, \tag{14.3}
\]
where the functions h_i are from R to R, and σ denotes a permutation of the set {1, . . . , n}.

This theorem has the following corollary.

Corollary 14.2 A bijective function H from R^n to R^n is σ-diagonal if and only if its Jacobian matrix is diagonal, up to a permutation.

In the following, transforms whose Jacobian matrices are diagonal up to a permutation σ will be called σ-diagonal. A priori, ICA provides a transform which preserves mutual independence only for particular sources, namely the sources involved in the mixing. This means that we cannot claim that the transform preserves independence for other distributions, and especially for all distributions, so that the transform would be σ-diagonal. Consequently, separation cannot be guaranteed. Moreover, even when the sources are separated, this happens only up to a permutation (which is not a problem) and up to an unknown nonlinear function (which is much more annoying). In fact, if u and v are two independent random variables, then for any transforms f and g, the random variables f(u) and g(v) are independent, too. Therefore, due to the distortions, the estimated sources can strongly differ from the original sources. Of course, the sources are separated, but the remaining indeterminacy is very undesirable for restoring them.

²That is, whatever its distribution is.


14.2.3 Existence and uniqueness of transforms preserving independence

14.2.3.1 The problem

Following Darmois [33, 34], consider the factorial representation of a random vector x in terms of a random vector ζ with mutually independent components ζ_i:

x = H_1(ζ).   (14.4)

For studying the uniqueness of the representation, one can look for another factorial representation of the random vector x in terms of a random vector ω with mutually independent components ω_i, such that:

x = H_1(ζ) = H_2(ω).   (14.5)

If there exist two factorial representations of x, with two different random vectors ζ and ω, then there is no uniqueness.

14.2.3.2 Existence

Generally, for any random vector x = A(s) whose components have no particular properties (in particular, they are not mutually independent), one can design a transform H such that H ◦ A preserves independence but is not σ-diagonal, so that H is not a separating transform as defined by (14.3). This result, which is based on a constructive method similar to a Gram-Schmidt procedure, to be discussed soon, was proved in the 1950s by Darmois [33]. It has also been used in [55] for designing parametric families of solutions for nonlinear ICA.

14.2.3.2.1 A simple example  We now present a simple example of a transform which preserves independence while still being mixing [99]. Let s_1 be a Rayleigh-distributed random variable (with values in R^+), with probability density function (pdf) p_{s_1}(s_1) = s_1 exp(−s_1²/2), and let s_2 be a random variable independent of s_1, with a uniform distribution on [0, 2π). Let us then consider the nonlinear transform
\[
[y_1, y_2] = H(s_1, s_2) = [s_1 \cos(s_2),\; s_1 \sin(s_2)] \tag{14.6}
\]

whose Jacobian matrix is non-diagonal:
\[
J = \begin{pmatrix} \cos(s_2) & -s_1 \sin(s_2) \\ \sin(s_2) & s_1 \cos(s_2) \end{pmatrix}. \tag{14.7}
\]

The joint pdf of y_1 and y_2 is
\[
p_{y_1,y_2}(y_1, y_2) = \frac{p_{s_1,s_2}(s_1, s_2)}{|\det(J)|}
= \frac{1}{2\pi} \exp\Big(\frac{-y_1^2 - y_2^2}{2}\Big)
= \Big(\frac{1}{\sqrt{2\pi}} \exp\frac{-y_1^2}{2}\Big) \Big(\frac{1}{\sqrt{2\pi}} \exp\frac{-y_2^2}{2}\Big).
\]


The joint pdf can thus be factorized, and one can conclude that the random variables y_1 and y_2 are independent. However, it is clear that these variables are still mixtures of the random variables s_1 and s_2. The transform H preserves independence for the random variables s_1 and s_2 (Rayleigh and uniform), but not for arbitrary random variables. Other examples can be found in the literature (for instance in [73]), or can easily be invented.
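This example is easy to check numerically. The following sketch samples s_1 and s_2, applies (14.6), and verifies that the outputs are uncorrelated (and remain so through simple nonlinear functions) while each output mixes both sources.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000

# s1 Rayleigh (inverse-CDF sampling), s2 uniform on [0, 2*pi),
# mutually independent.
s1 = np.sqrt(-2.0 * np.log(1.0 - rng.uniform(size=T)))
s2 = rng.uniform(0.0, 2.0 * np.pi, size=T)

# Transform (14.6): polar to Cartesian coordinates.
y1 = s1 * np.cos(s2)
y2 = s1 * np.sin(s2)

# y1 and y2 are two independent standard Gaussians, although each one
# mixes both s1 and s2: correlations of the outputs, and of simple
# nonlinear functions of them, are all close to zero.
print(np.corrcoef(y1, y2)[0, 1])
print(np.corrcoef(y1**2, y2**2)[0, 1])
print(y1.std(), y2.std())               # both close to 1
```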

14.2.3.2.2 Method for designing non-separating transforms preserving independence  Let x be a random vector, resulting for instance from a mixture x = A(s) of mutually independent random variables s, where A is a transform with a non-diagonal Jacobian³. We propose to design an invertible transform B (independent of A) which preserves independence while still being mixing, i.e. such that the random vector y = B(x) provided by the invertible transform B has mutually independent components and B ◦ A has a non-diagonal Jacobian.

Since the transform B is invertible, one can write
\[
p_y(y) = p_x(x) / |\det J_B(x)|, \tag{14.8}
\]
where J_B(x) is the Jacobian matrix associated with the transform B. Without loss of generality, one can assume that the random variables y_i, i = 1, . . . , n, are uniformly distributed in [0, 1]. Moreover, because the variables y_i are assumed to be independent, they satisfy p_y(y) = ∏_i p_{y_i}(y_i) = 1, and Eq. (14.8) simplifies to
\[
p_x(x) = |\det J_B(x)|. \tag{14.9}
\]

Looking for solutions of the following form:
\[
\begin{aligned}
B_1(x) &= h_1(x_1) \\
B_2(x) &= h_2(x_1, x_2) \\
&\;\vdots \\
B_n(x) &= h_n(x_1, x_2, \dots, x_n),
\end{aligned} \tag{14.10}
\]

Eq. (14.9) becomes
\[
p_x(x) = \prod_{i=1}^{n} \frac{\partial B_i(x)}{\partial x_i} \tag{14.11}
\]
or, using the Bayes theorem,
\[
p_{x_1}(x_1)\, p_{x_2|x_1}(x_1, x_2) \cdots p_{x_n|x_1,\dots,x_{n-1}}(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{\partial B_i(x)}{\partial x_i}. \tag{14.12}
\]

³Otherwise, the random vector x would already have independent components.


By integrating (14.12), one gets the following solution:
\[
\begin{aligned}
B_1(x_1) &= F_{x_1}(x_1) \\
B_2(x_1, x_2) &= F_{x_2|x_1}(x_1, x_2) \\
&\;\vdots \\
B_n(x_1, x_2, \dots, x_n) &= F_{x_n|x_1,\dots,x_{n-1}}(x_1, x_2, \dots, x_n)
\end{aligned} \tag{14.13}
\]
where F_{x_1} is the cumulative distribution function of the random variable x_1, and F_{x_{k+1}|x_1,...,x_k} is the conditional cumulative distribution function of the random variable x_{k+1}, conditionally on x_1, . . . , x_k. Generally, the transform B ◦ A is a nonlinear transform which is not σ-diagonal, since its Jacobian matrix is not diagonal, but it transforms any random vector x into a random vector y with independent components.

This result by Darmois is negative, since it shows that for any mixture x = A(s) there exists at least one non-σ-diagonal transform H⁴ which mixes the variables x although it preserves statistical independence. Consequently, using statistical independence without other constraints or priors, one gets a transform which preserves statistical independence but which is not a separating (that is, σ-diagonal) transform.
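The construction (14.13) can be imitated empirically. The sketch below replaces the exact marginal and conditional cumulative distribution functions by rank-based estimates on a toy two-dimensional mixture; the window size and the crude rank estimator are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(1)
T = 5000

# Any mixture x = A(s) of independent sources; a toy linear one here.
s = rng.uniform(-1.0, 1.0, size=(2, T))
x1 = s[0] + 0.5 * s[1]
x2 = 0.5 * s[0] + s[1]

# B1 = F_{x1}: empirical marginal CDF (rank transform).
y1 = rankdata(x1) / (T + 1)

# B2 = F_{x2|x1}: conditional CDF, crudely estimated by ranking x2
# among the samples whose x1 is closest (a kernel estimator would be
# smoother; the window size below is an arbitrary choice).
width = 100
order = np.argsort(x1)
y2 = np.empty(T)
for pos, idx in enumerate(order):
    lo = max(0, pos - width // 2)
    neighbors = order[lo:lo + width]
    y2[idx] = (x2[neighbors] < x2[idx]).mean()

# y1 and y2 are nearly independent and uniform on [0, 1], yet each is
# still a mixture of s1 and s2: independence alone did not separate.
print(np.corrcoef(y1, y2)[0, 1])
```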

14.2.3.3 Conclusion

Independent components estimated from (14.1) can then be very different from the actual sources. Generally, using ICA for solving the source separation problem in nonlinear mixtures requires additional information on the sources, or constraints, in order to look for a transform G = B ◦ A in a restricted manifold which regularizes the solutions.

14.3 ICA for constrained nonlinear mixtures

In this section, using the previous results, we first present the theoretical ICA framework when the transforms A and B have structural constraints. We then study a few constraints on the mixtures for which ICA is able to identify the separating σ-diagonal transforms.

14.3.1 Structural constraints

Let us assume that we restrict the transforms to a set denoted Q. For characterizing the indeterminacies for transforms G ∈ Q, one must solve the tricky equation for independence preservation, which can be written
\[
\forall E \in \mathcal{M}_N, \quad \int_E dF_{s_1}\, dF_{s_2} \cdots dF_{s_N} = \int_{G(E)} dF_{y_1}\, dF_{y_2} \cdots dF_{y_N}. \tag{14.14}
\]

⁴H depends on x and, generally, is not a transform preserving independence for other random vectors u ≠ x.


Here \(\mathcal{M}_N\) is the set of all measurable compact sets in R^N, and F_u is the cumulative distribution function of the random variable u.

Let us denote by T the set of σ-diagonal mappings. Then one can define the set⁵ P:
\[
P = \{(F_{s_1}, F_{s_2}, \dots, F_{s_N}) \ /\ \exists G \in Q \setminus T : G(s) \text{ has independent components}\}. \tag{14.15}
\]
This is the set of all source distributions for which there exists a non-σ-diagonal transform G ∈ Q (not belonging to the set T) which preserves the mutual independence of the components of the source vector s.

Ideally, P should be an empty set, and T ∩ Q should contain the identity function as its unique element. However, this is generally not satisfied:

1. Source separation is possible if the distributions of the sources belong to the set P̄, the complement of P;

2. The sources are then restored up to a σ-diagonal transform, that is, a transform belonging to the set T ∩ Q.

Solving (14.14), or defining the set of distributions P, is generally a tricky problem, except for particular models Q such as linear invertible transforms.

14.3.2 Smooth transforms

Recently, multi-layer perceptron (MLP) neural networks [47] have been used in [114, 8, 7] for estimating the nonlinear separating transform B. For justifying this choice, in addition to the universal approximation property of MLPs, Almeida claims that restricting the target transforms to the set of smooth transforms⁶ generated by an MLP provides regularized solutions, which ensures that nonlinear ICA leads to source separation. However, the following example [12] proves that the smoothness property is not a sufficient condition for this purpose.

Without loss of generality, let us consider two independent random variables s = (s_1, s_2)^T uniformly distributed in the interval [−1, 1], and the smooth nonlinear transform represented by the matrix
\[
M = \begin{pmatrix} \cos(\theta(r)) & -\sin(\theta(r)) \\ \sin(\theta(r)) & \cos(\theta(r)) \end{pmatrix} \tag{14.16}
\]

where \(r \triangleq \sqrt{s_1^2 + s_2^2}\). This transform is a rotation whose rotation angle θ(r) depends on the radius r:
\[
\theta(r) = \begin{cases} \theta_0 (1-r)^q, & 0 \le r \le 1 \\ 0, & r > 1 \end{cases} \tag{14.17}
\]

where q ≥ 2. Figure 14.1 shows the image of the region {−1 ≤ s_1 ≤ 1, −1 ≤ s_2 ≤ 1} under the transformation for q = 2 and θ_0 = π/2. One can easily compute

⁵In Eq. (14.15), the difference of two sets, \, is defined as: Q \ T = {x / x ∈ Q and x ∉ T}.
⁶f is a smooth transform if its derivatives of any order exist and are continuous.


[Figure 14.1 about here]

Figure 14.1: The smooth transform M is a mixing transform which preserves the independence of a uniformly distributed vector. The curves y_1 and y_2, images of the lines s_1 and s_2 under the transform M, clearly show these properties.

the Jacobian matrix of this transform [12]:
\[
J_M(r) = \begin{pmatrix} \cos(\theta(r)) & -\sin(\theta(r)) \\ \sin(\theta(r)) & \cos(\theta(r)) \end{pmatrix}
\begin{pmatrix} 1 - s_2 \frac{\partial\theta}{\partial s_1} & -s_2 \frac{\partial\theta}{\partial s_2} \\ s_1 \frac{\partial\theta}{\partial s_1} & 1 + s_1 \frac{\partial\theta}{\partial s_2} \end{pmatrix}. \tag{14.18}
\]

Computing the determinant, one gets
\[
\det J_M(r) = 1 + s_1 \frac{\partial\theta}{\partial s_2} - s_2 \frac{\partial\theta}{\partial s_1} \tag{14.19}
\]
and since
\[
s_1 \frac{\partial\theta}{\partial s_2} = s_2 \frac{\partial\theta}{\partial s_1} = \frac{s_1 s_2}{r}\, \theta'(r), \tag{14.20}
\]
one finally gets det J_M(r) = 1, and
\[
p_{y_1 y_2}(y_1, y_2) = p_{s_1 s_2}(s_1, s_2). \tag{14.21}
\]

From Eq. (14.18), one can deduce that the Jacobian matrix of this smooth transform is non-diagonal, and consequently that the transform is mixing. However, by Eq. (14.21), this transform preserves the independence of the two random variables uniformly distributed on [−1, 1]. This counterexample proves that restricting the nonlinear transforms to the set of smooth transforms is not a sufficient condition for separation.
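The counterexample can be reproduced numerically. The sketch below applies the radius-dependent rotation (14.16)-(14.17) to uniform sources and checks that the outputs remain uncorrelated and uniformly distributed on the square.

```python
import numpy as np

def theta(r, theta0=np.pi / 2, q=2):
    """Radius-dependent rotation angle (14.17)."""
    return np.where(r <= 1.0, theta0 * (1.0 - r) ** q, 0.0)

def smooth_mix(s1, s2):
    """Rotation M of (14.16) with angle theta(r)."""
    r = np.sqrt(s1**2 + s2**2)
    t = theta(r)
    return (np.cos(t) * s1 - np.sin(t) * s2,
            np.sin(t) * s1 + np.cos(t) * s2)

rng = np.random.default_rng(0)
s1 = rng.uniform(-1.0, 1.0, 100_000)
s2 = rng.uniform(-1.0, 1.0, 100_000)
y1, y2 = smooth_mix(s1, s2)

# Since det J_M = 1 by (14.19)-(14.20), p_{y1 y2} = p_{s1 s2}: the
# outputs remain independent and uniform on the square, although the
# transform mixes s1 and s2 inside the unit disk.
print(np.corrcoef(y1, y2)[0, 1])                   # ~ 0
counts, _, _ = np.histogram2d(y1, y2, bins=4,
                              range=[[-1, 1], [-1, 1]])
print(np.round(counts / len(y1), 3))               # ~ 1/16 in each cell
```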


14.3.3 Example of linear mixtures

For linear invertible mixtures, the transform A is linear and can be represented by a square regular mixing matrix A. In this case, it is sufficient to constrain the separating model B to belong to the set of square invertible matrices. Source separation is obtained by estimating a matrix B such that y = Bx = Gs has mutually independent components. The global mapping G = BA is then an element of the set Q of square invertible matrices.

The set of σ-diagonal linear transforms, T ∩ Q, is the set of matrices equal to the product of a permutation matrix and a diagonal matrix. From the Darmois-Skitovich theorem [33], it is clear that the set P contains the distributions with at least two Gaussian components. This result is similar to Comon's identifiability theorem [31], which proves that for a linear invertible mixture, independence allows the sources to be separated up to a permutation matrix and a diagonal matrix, provided that there is at most one Gaussian source.

14.3.4 Conformal mappings

Hyvärinen and Pajunen [55] show that if one restricts the nonlinear mappings to the set Q of conformal mappings, ICA allows the sources to be separated.

Definition 14.2 A conformal mapping is a mapping which preserves oriented angles.

Conformal mappings are often considered in the framework of functions of complex-valued variables, which are restricted to plane (two-dimensional) mappings⁷. We then have the following theorem:

Theorem 14.3 Let Z = f(z) be a holomorphic function defined in a domain D. If ∀z ∈ D, f′(z) ≠ 0, the mapping Z = f(z) is a conformal mapping.

This result shows that the set of conformal mappings is contained in the set of smooth mappings, due to the property of algebraic angle preservation. Hyvärinen and Pajunen prove that ICA is able to estimate a separating mapping, up to a rotation, provided that the following conditions hold:

• the mixing mapping A is a conformal mapping, such that A(0) = 0,

• each source has a known bounded support.

It seems that the extension of this result to conformal mappings in N dimensions has not been considered. Of course, the angle preservation condition seems very restrictive. In particular, it is not very realistic in the framework of the nonlinear mappings associated with a nonlinear sensor array.

⁷This kind of mapping is frequently used for solving problems with intricate geometry in a transformed domain where the geometry becomes simple. For instance, the Joukowski mapping is a classical example for studying the profiles of airplane wings in aeronautics.


[Figure 14.2 about here: in the mixing system, the matrix A is applied to the sources s_1, . . . , s_n, and a nonlinearity f_i is applied on each channel output e_i, producing the observations x_i; in the separating system, a nonlinearity h_i is applied on each channel, producing z_i, followed by the matrix B, producing the outputs y_i.]

Figure 14.2: Mixing and separating models for the post-nonlinear (PNL) model.

14.3.5 Post-nonlinear (PNL) mixtures

Initially, post-nonlinear mixtures were inspired by devices for which the sensors, and the amplifiers used for signal conditioning, are assumed to be nonlinear, for example due to saturation. In addition to its relevance for most sensor arrays, as we shall see, this model has the nice property that it is separable using ICA with weak indeterminacies.

14.3.5.1 PNL model

In post-nonlinear (PNL) mixtures, the observations have the following form (Figure 14.2):
\[
x_i(t) = f_i\Big(\sum_{j=1}^{N} a_{ij}\, s_j(t)\Big), \quad i = 1, \dots, N. \tag{14.22}
\]

The PNL mixture consists of a linear mixture As(t), followed on each channel by a nonlinear mapping f_i. In addition, we assume that the linear mixing matrix A is regular, with P = N, and that the nonlinear mappings f_i are invertible. In the following, a PNL mixture x(t) = f(As(t)) will simply be denoted by (A, f). In addition to its theoretical interest, the PNL model belongs to the class of L-ZMNL⁸ models, which suits many realistic applications perfectly. For instance, one meets such models in sensor arrays [82, 20], satellite communication systems [89], and biological systems [65].

As we explained above, the main issue concerns the identifiability of the mixture model (leading to separability if A is regular and f invertible) from the statistical independence assumption. For this purpose, it is first necessary to constrain the separation structure B so that:

1. B is able to invert the mixture in the sense of Eq. (14.3);

2. B is as simple as possible, reducing the residual distortions g_i by using only the independence of the sources.

⁸L for Linear and ZMNL for Zero-Memory NonLinearity: it is then a separable system, with a linear stage followed by a nonlinear mapping.


Under these two constraints, we choose as the separation structure B the mirror structure of the mixing structure A = (A, f) (see Figure 14.2). We denote the post-nonlinear separating structure y(t) = Bh(x(t)) by (h, B). The global mapping G is then an element of the set Q of mappings which are the cascade of a linear invertible mixture (regular matrix A), followed by componentwise invertible nonlinear mappings, and then by another invertible linear mapping (regular matrix B).

14.3.5.2 Separability using ICA

In [99], it is shown that the independence of the components of the output vector y is sufficient to identify PNL mixtures with the same indeterminacies as for linear mixtures.

Lemma 14.4 Consider a PNL model (A, f) and a separating structure (h, B) such that (H1) A is a regular matrix having at least two nonzero entries in each row or each column, (H2) the functions f_i are invertible, (H3) B is a regular matrix, (H4) g_i = h_i ◦ f_i satisfies g_i′(u) ≠ 0, ∀i, ∀u ∈ R, (H5) at most one source s_i is Gaussian, and each source has a pdf which is equal to zero on a compact set. Then the vector y = B ◦ A(s) has mutually independent components if and only if the mappings h_i are affine and B satisfies BA = ΠΔ, where Π is a permutation matrix and Δ is a diagonal matrix.

The condition on the mixing matrix A shows that the estimation of the nonlinear mappings is possible only if the mixture is "sufficiently" mixing. Although this seems surprising at first glance, the result is easy to understand. Let us assume that the mixing matrix A is diagonal; the observations f_i(a_{ii} s_i) are then mutually independent random variables. The sources are already separated, and consequently it is impossible (without extra priors) to estimate the sources with a weaker indeterminacy than an unknown nonlinear mapping.

The condition that the probability density function (pdf) must be equal to zero is a technical condition used in the proof, but it does not seem to be theoretically necessary. In fact, Achard and Jutten [1] extended these results by relaxing this assumption.

Lemma 14.5 Let (A, f) be a PNL mixture and (h, B) a PNL separating structure such that (H1) A is invertible, and ∀i, j such that a_{ij} ≠ 0, ∃k ≠ j such that a_{ik} ≠ 0, or ∃l ≠ i such that a_{lj} ≠ 0, (H2) the mappings h_i(·) are differentiable and invertible, (H3) the random vector s has independent components s_i, at most one of which is Gaussian, (H4) the pdf of each source is differentiable and its derivative is continuous on its support. Then the random vector y = B ◦ A(s) has independent components if and only if h_i ◦ f_i, ∀i = 1, . . . , N, are linear functions and BA = ΠΔ, where Π is a permutation matrix and Δ is a diagonal matrix.

Assumption H1 is a necessary condition which ensures that the mixing matrix is sufficiently mixing. If this condition is satisfied, there is no nonzero isolated entry a_{ij}, i.e. an entry without any other nonzero entry in row i or in column j. If a nonzero isolated entry a_{ij} existed, the mixture x_i = f_i(a_{ij} s_j) would be independent of all the other mixtures: the source s_j would then already be separated in x_i = f_i(a_{ij} s_j), and it would not be possible to estimate the inverse of the function f_i, and consequently to retrieve the source s_j up to a scaling factor.

In conclusion, PNL mixtures are identifiable using ICA for sources of which at most one is Gaussian (the set P contains the multivariate distributions which have at least two Gaussian components), with the same indeterminacies as for linear mixtures (the set of linear σ-diagonal mappings T ∩ Q is the set of square matrices which are the product of a permutation matrix and a diagonal matrix), provided the mixing matrix A is "sufficiently" mixing.

14.3.5.3 Extension to nonlinear mixtures with memory

The identifiability of PNL mixtures can be generalized to convolutive PNL (CPNL) mixtures, in which the scalar mixing matrix A is replaced by a matrix A(z) whose entries are linear filters, and each source is independent and identically distributed (iid) [14]. In fact, denoting \(A(z) = \sum_k A_k z^{-k}\), and using the notations
\[
\mathbf{s} \triangleq \big(\dots, s^T(k-1), s^T(k), s^T(k+1), \dots\big)^T \tag{14.23}
\]
\[
\mathbf{x} \triangleq \big(\dots, x^T(k-1), x^T(k), x^T(k+1), \dots\big)^T, \tag{14.24}
\]
one gets
\[
\mathbf{x} = f\big(\mathcal{A}\,\mathbf{s}\big) \tag{14.25}
\]
where f acts componentwise, and
\[
\mathcal{A} = \begin{pmatrix} \cdots & \cdots & \cdots & \cdots & \cdots \\ \cdots & A_{k+1} & A_k & A_{k-1} & \cdots \\ \cdots & A_{k+2} & A_{k+1} & A_k & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{pmatrix}. \tag{14.26}
\]

The iid nature of each source, i.e. the temporal independence of the samples s_i(k), k = 1, . . . , T, ensures the spatial independence of s. Thus, CPNL mixtures can be considered as particular PNL mixtures (the mixing matrix A is a block-Toeplitz matrix). For mixing matrices A(z) whose entries are finite impulse response (FIR) filters, Eq. (14.25) is associated with a PNL mixture of finite dimension, and the separation results (for instantaneous PNL mixtures) hold. For mixing matrices whose entries are infinite impulse response (IIR) filters, Eq. (14.25) is equivalent to a PNL mixture of infinite dimension, for which we have no separability proof; we merely conjecture separability.

Moreover, by using a suitable parameterization, Wiener systems (Figure 14.3) can be viewed as particular PNL mixtures. Consequently, the identifiability of PNL mixtures leads to the invertibility of Wiener systems [100].


[Figure 14.3 about here: s(k) → A(z) → e(k) → f(·) → x(k)]

Figure 14.3: A Wiener system consists of a filter A(z) followed by a nonlinear mapping f(·).

14.3.6 Bilinear mixtures

S. Hosseini and Y. Deville have addressed [50, 51] the separation of "bilinear" (or "linear-quadratic") mixing models, that is, mixing systems of the form:
\[
\begin{cases} x_1 = s_1 - l_1 s_2 - q_1 s_1 s_2 \\ x_2 = s_2 - l_2 s_1 - q_2 s_1 s_2 \end{cases} \tag{14.27}
\]

The Jacobian of the above mixing model is:
\[
J = 1 - l_1 l_2 - (q_2 + l_2 q_1) s_1 - (q_1 + l_1 q_2) s_2 \tag{14.28}
\]

In their work, Hosseini and Deville have shown that the nonlinear mapping (14.27) is invertible if the values of the parameters (l_1, l_2, q_1, q_2) and the range of variations of s_1, s_2 are such that either J > 0 for all values of (s_1, s_2), or J < 0 for all values of (s_1, s_2). On the other hand, if J > 0 for some values of (s_1, s_2) and J < 0 for some other values, then the above bilinear transformation is not bijective, and we cannot determine (s_1, s_2) from (x_1, x_2) even if the values of the parameters l_1, l_2, q_1, q_2 were known. Figure 14.4 shows the transformation of the region s_i ∈ [−0.5, 0.5] for two different sets of parameter values, one of which corresponds to an invertible mapping and the other to a non-invertible one. Note that one may consider the non-invertible case as a 'highly nonlinear' mixture, which is not separable.

Assuming that for the problem at hand the bilinear model is invertible, Hosseini and Deville have proposed that the recurrent structure of Figure 14.5 (which is inspired by the early work of Hérault et al. [48]) is able to retrieve the source signals. In fact, this recurrent structure can be written as:
\[
\begin{cases} y_1^{(k+1)}(\cdot) = x_1(\cdot) + l_1\, y_2^{(k)}(\cdot) + q_1\, y_1^{(k)}(\cdot)\, y_2^{(k)}(\cdot) \\ y_2^{(k+1)}(\cdot) = x_2(\cdot) + l_2\, y_1^{(k)}(\cdot) + q_2\, y_1^{(k)}(\cdot)\, y_2^{(k)}(\cdot) \end{cases} \tag{14.29}
\]

Comparing the above equation with (14.27), one sees that if \(y_i^{(k)}(\cdot) = s_i(\cdot)\), i ∈ {1, 2}, then \(y_i^{(k+1)}(\cdot) = y_i^{(k)}(\cdot)\), i ∈ {1, 2}, that is, the above iteration has converged. They have also studied the stability of the above recurrent structure and shown that it is stable at the point (y_1, y_2) = (s_1, s_2) if and only if the absolute values of the two eigenvalues of the Jacobian matrix of the mapping (14.27) are smaller than one.
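As an illustration, here is a minimal sketch of the fixed-point iteration (14.29), assuming the parameters are known and the mixture is in the invertible and stable regime (the left configuration of Figure 14.4).

```python
import numpy as np

def bilinear_mix(s1, s2, l1, l2, q1, q2):
    """Bilinear mixing model (14.27)."""
    x1 = s1 - l1 * s2 - q1 * s1 * s2
    x2 = s2 - l2 * s1 - q2 * s1 * s2
    return x1, x2

def recurrent_invert(x1, x2, l1, l2, q1, q2, n_iter=100):
    """Fixed-point iteration (14.29), assuming known parameters and a
    stable configuration; (s1, s2) is then the fixed point."""
    y1, y2 = x1.copy(), x2.copy()          # initialize at the observations
    for _ in range(n_iter):
        y1, y2 = (x1 + l1 * y2 + q1 * y1 * y2,
                  x2 + l2 * y1 + q2 * y1 * y2)
    return y1, y2

rng = np.random.default_rng(0)
s1 = rng.uniform(-0.5, 0.5, 1000)
s2 = rng.uniform(-0.5, 0.5, 1000)
# Invertible setting of Figure 14.4 (left): l2 = -l1 = 0.5, q2 = -q1 = 0.8.
l1, l2, q1, q2 = -0.5, 0.5, -0.8, 0.8
x1, x2 = bilinear_mix(s1, s2, l1, l2, q1, q2)
y1, y2 = recurrent_invert(x1, x2, l1, l2, q1, q2)

print(np.max(np.abs(y1 - s1)), np.max(np.abs(y2 - s2)))  # ~ 0 if stable
```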


[Figure 14.4 about here]

Figure 14.4: The transformation of the region s_i ∈ [−0.5, 0.5] of the (s_1, s_2) plane to the (x_1, x_2) plane by the bilinear transformation (14.27), for l_2 = −l_1 = 0.5 and (left) q_2 = −q_1 = 0.8, (right) q_2 = −q_1 = 3.2. The left transform is bijective (J > 0 everywhere) and hence invertible, while the right transform is not bijective and hence non-invertible.

[Figure 14.5 about here]

Figure 14.5: Recurrent structure used by Hosseini and Deville [50, 51] for inverting bilinear mixtures.

In the previous paragraphs, it has been assumed that all the parameters l_1, l_2, q_1, q_2 are known. In blind source separation, however, these parameters have to be estimated from the data. Hosseini and Deville assume that these parameters can be estimated from the statistical independence of the outputs⁹. They then propose an iterative algorithm of the form:

• Repeat

  1. Repeat iterations (14.29) until convergence.

  2. Update the estimated values of the parameters l_1, l_2, q_1, q_2, based on a measure of statistical independence of y_1 and y_2.

• Until convergence

⁹Note that here Hosseini and Deville have implicitly assumed that the invertibility of the bilinear mapping (14.27) ensures its separability. In fact, invertibility of (14.27) means that if l_1, l_2, q_1, q_2 are known, one can obtain (s_1, s_2) from (x_1, x_2), while (blind) separability means that in a separating structure, the independence of the outputs guarantees the separation of the sources. Hence, the (blind) separability of invertible bilinear mixing models remains an open question.

For the second step (updating the values of the parameters), Hosseini and Deville have proposed two methods. In [50], they use an iteration very similar to that of the original Hérault-Jutten algorithm [48], based on nonlinear decorrelation as a measure of independence. In [51], they develop a maximum likelihood (ML) criterion and use a steepest-ascent iteration to maximize it in the second step of the above general algorithm. Another method, based on minimizing the mutual information of the outputs, has recently been proposed by Mokhtari et al. [78].

14.3.7 A class of separable nonlinear mappings

Given Darmois' results [33, 34], which are very favorable for linear mixtures, a natural approach is to apply a transform so that nonlinear mixtures become linear ones.

14.3.7.1 Multiplicative mixtures

As an example, consider the multiplicative mixture
\[
x_j(t) = \prod_{i=1}^{N} s_i^{\alpha_{ji}}(t), \quad j = 1, \dots, N \tag{14.30}
\]

where the s_i(t) are independent sources with values in \(R_+^*\). Taking the logarithm of (14.30) leads to
\[
\log x_j(t) = \sum_{i=1}^{N} \alpha_{ji} \log s_i(t), \quad j = 1, \dots, N, \tag{14.31}
\]

which is a linear mixture of the random variables log s_i(t).

This kind of mixture can model the dependence between temperature and magnetic field in Hall effect sensors [9], or between incident light and object reflectance in an image [41]. Consider now the first example in more detail. The Hall voltage [88] is equal to

V_H = k B T^α,   (14.32)

where α depends on the semiconductor type (N or P), because the temperature influence is related to the mobility of the majority carriers. Thus, using sensors of both N-type and P-type, one gets:
\[
\begin{cases} V_{HN}(t) = k_N\, B(t)\, T^{\alpha_N}(t), \\ V_{HP}(t) = k_P\, B(t)\, T^{\alpha_P}(t). \end{cases} \tag{14.33}
\]

Page 16: Nonlinear mixtures - Aaltousers.ics.aalto.fi/juha/papers/finalchapter.pdf · problem for general nonlinear mixtures. These results clearly show that nonlin-ear ICA is a highly non-unique

594 CHAPTER 14. NONLINEAR MIXTURES

To simplify the notation, we drop the variable t in the following. Since the temperature T is positive, but the magnetic field B can be either positive or negative, taking the logarithm yields the following equations:
\[
\begin{cases} \log|V_{HN}| = \log k_N + \log|B| + \alpha_N \log T, \\ \log|V_{HP}| = \log k_P + \log|B| + \alpha_P \log T. \end{cases} \tag{14.34}
\]

These equations constitute a linear mixture of the two sources log|B| and log T. They can easily be solved by simple decorrelation, because B appears with the same power in both equations (14.33). It is even simpler to compute directly the ratio of the two equations (14.33):
\[
R = \frac{V_{HN}}{V_{HP}} = \frac{k_N}{k_P}\, T^{\alpha_N - \alpha_P}, \tag{14.35}
\]
which only depends on the temperature T. For estimating the magnetic field, it is sufficient to estimate the parameter k such that \(V_{HN}/R^k\) is uncorrelated with R. One can then deduce B(t), up to a multiplicative gain. The final estimation of B and T requires a calibration step to restore sign and scale.
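A minimal numerical sketch of this ratio-and-decorrelation procedure follows; the constants k_N, k_P, α_N, α_P and the source distributions are made up, and, following (14.34), decorrelation is tested on logarithms of absolute values.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

# Hypothetical constants and sources (all values are made up).
kN, kP, aN, aP = 1.0, 0.8, 1.5, 0.5
B = rng.normal(size=T)                      # magnetic field (signed)
Temp = rng.uniform(290.0, 310.0, size=T)    # temperature, independent of B

VHN = kN * B * Temp**aN                     # N-type sensor, (14.33)
VHP = kP * B * Temp**aP                     # P-type sensor, (14.33)
R = VHN / VHP                               # ratio (14.35): function of T only

# Scan for k such that V_HN / R^k is decorrelated from R; the test is
# done on the logarithms, as in (14.34).
ks = np.linspace(0.0, 5.0, 501)
corrs = [abs(np.corrcoef(np.log(np.abs(VHN) / R**k), np.log(R))[0, 1])
         for k in ks]
k_hat = ks[int(np.argmin(corrs))]
print(k_hat, aN / (aN - aP))                # estimate vs. theoretical value

B_hat = VHN / R**k_hat                      # proportional to B, up to a gain
print(abs(np.corrcoef(B_hat, B)[0, 1]))     # ~ 1
```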

This idea is also used in homomorphic filtering [95] in image processing, more precisely for removing the incident light contribution from the object reflectance. Assuming that the incident light contribution is a low-frequency signal, simple low-pass filtering applied to the logarithm of the image is a simple but efficient processing method.

14.3.7.2 Linearizable mappings

The extension of the Darmois-Skitovich theorem to nonlinear mappings has been addressed by Kagan et al. [62]. These results have been reconsidered in the framework of source separation of nonlinear mixtures by Eriksson and Koivunen [41]. The main idea is to consider particular mappings F satisfying an addition theorem in the sense of the theory of functional equations.

14.3.7.2.1 A simple example  Let us consider the nonlinear mapping F(s_1, s_2) of two independent random variables s_1 and s_2:
\[
\begin{cases} x_1 = (s_1 + s_2)(1 - s_1 s_2)^{-1}, \\ x_2 = (s_1 - s_2)(1 + s_1 s_2)^{-1}. \end{cases}
\]
Using the change of variables \(u_i = \tan^{-1}(s_i)\), the above system becomes:
\[
\begin{cases} x_1 = \tan(u_1 + u_2), \\ x_2 = \tan(u_1 - u_2). \end{cases}
\]
Applying the transform \(\tan^{-1}\) to the variables x_i, and denoting \(v_i = \tan^{-1}(x_i)\), one finally gets:
\[
\begin{cases} v_1 = \tan^{-1}(x_1) = u_1 + u_2, \\ v_2 = \tan^{-1}(x_2) = u_1 - u_2. \end{cases}
\]

Page 17: Nonlinear mixtures - Aaltousers.ics.aalto.fi/juha/papers/finalchapter.pdf · problem for general nonlinear mixtures. These results clearly show that nonlin-ear ICA is a highly non-unique

14.3. ICA FOR CONSTRAINED NONLINEAR MIXTURES 595

These equations are linear mixtures of the two independent variables u_1 and u_2. This result is simply due to the fact that tan(a + b) (and tan(a − b)) is a function of tan a and tan b; in other words, there exists a function F such that tan(a + b) = F(tan a, tan b).
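A small numerical check of this linearization follows; the sources are kept in [−0.5, 0.5] so that the denominators never vanish and the arctan sums stay inside (−π/2, π/2).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

s1 = rng.uniform(-0.5, 0.5, T)
s2 = rng.uniform(-0.5, 0.5, T)

# Nonlinear mixture built on the tangent addition theorem.
x1 = (s1 + s2) / (1.0 - s1 * s2)
x2 = (s1 - s2) / (1.0 + s1 * s2)

# Linearization: with u_i = arctan(s_i) and v_i = arctan(x_i),
# v1 = u1 + u2 and v2 = u1 - u2 hold exactly.
u1, u2 = np.arctan(s1), np.arctan(s2)
v1, v2 = np.arctan(x1), np.arctan(x2)
print(np.max(np.abs(v1 - (u1 + u2))))   # ~ machine precision
print(np.max(np.abs(v2 - (u1 - u2))))   # ~ machine precision
```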

14.3.7.2.2 General result  More generally, Kagan et al. [62] show that this property appears provided that there exist a transform F and an invertible function f with values in an open set S of R satisfying the addition theorem:

f(s_1 + s_2) = F[f(s_1), f(s_2)].   (14.36)

The properties required for the transform F (stated for two variables; the generalization to higher dimensions is straightforward) are the following:

• F is continuous, at least separately with respect to each variable;

• F is commutative, i.e. ∀(u, v) ∈ S², F(u, v) = F(v, u);

• F is associative, i.e. ∀(u, v, w) ∈ S³, F(F(u, v), w) = F(u, F(v, w));

• There exists a neutral element e ∈ S such that ∀u ∈ S, F(u, e) = F(e, u) = u;

• ∀u ∈ S, there exists an inverse element u⁻¹ ∈ S such that F(u, u⁻¹) = F(u⁻¹, u) = e.

In other words, denoting u ◦ v = F(u, v), these conditions imply that the set S with the operation ◦ is a commutative group. Under this condition, Aczél [5] shows that there exists a monotonous and continuous function f : R → S such that

f(x + y) = F(f(x), f(y)) = f(x) ◦ f(y).   (14.37)

In fact, by applying f⁻¹ (which exists since f is monotonous) to Eq. (14.37), one gets

x + y = f⁻¹(F(f(x), f(y))) = f⁻¹(f(x) ◦ f(y)).   (14.38)

By using the associativity of F and the relation (14.37), and setting y = x, one can define a multiplication by an integer c, denoted ⋆:

c ⋆ f(x) = f(cx).   (14.39)

This multiplication can be extended to multiplication by a real number α: α ⋆ f(x) = f(αx). By computing the inverse f⁻¹ and setting f(x) = u, one obtains:

c f⁻¹(u) = f⁻¹(c ⋆ u).   (14.40)

Then, for any constants c_1, . . . , c_n and any random variables u_1, . . . , u_n, the following relation holds:

c_1 f⁻¹(u_1) + · · · + c_n f⁻¹(u_n) = f⁻¹(c_1 ⋆ u_1 ◦ · · · ◦ c_n ⋆ u_n).   (14.41)

Finally, Kagan et al. [62] present the following theorem:

Page 18: Nonlinear mixtures - Aaltousers.ics.aalto.fi/juha/papers/finalchapter.pdf · problem for general nonlinear mixtures. These results clearly show that nonlin-ear ICA is a highly non-unique

596 CHAPTER 14. NONLINEAR MIXTURES

Theorem 14.6 Let u_1, . . . , u_n be independent random variables such that
\[
\begin{cases} x_1 = a_1 ⋆ u_1 ◦ \cdots ◦ a_n ⋆ u_n \\ x_2 = b_1 ⋆ u_1 ◦ \cdots ◦ b_n ⋆ u_n \end{cases} \tag{14.42}
\]
are independent, and such that the operators ⋆ and ◦ satisfy the above conditions. Denoting by f the function defined by the operator ◦, f⁻¹(u_i) is Gaussian if a_i b_i ≠ 0.

In fact, this theorem is nothing but the Darmois-Skitovich theorem with light modifications: by applying f⁻¹ to equations (14.42), setting f⁻¹(u_i) = s_i, and taking into account the properties of the operators ◦ and ⋆, one gets exactly Darmois' equations [34].

14.3.7.2.3 Application to source separation  This theorem can be used for source separation. With such mixtures, a source separation algorithm consists of three main steps [41]:

1. Transform the nonlinear observations x_i using f⁻¹, in order to obtain linear mixtures of "new" sources s_i = f⁻¹(u_i);

2. Solve the linear mixtures of the s_i using a method for linear source separation (e.g. ICA);

3. Restore the actual independent sources by applying the transform u_i = f(s_i) to the sources s_i.
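A minimal sketch of these three steps for the multiplicative mixture (14.30), where f = exp and f⁻¹ = log are known; FastICA from scikit-learn stands in for the linear separation step of item 2.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
T = 5000

# Setup: multiplicative mixture (14.30), for which f = exp and
# f^{-1} = log satisfy the addition theorem. Positive sources.
s = rng.uniform(0.5, 1.5, size=(T, 2))
alpha = np.array([[1.0, 0.6],
                  [0.4, 1.0]])
x = np.exp(np.log(s) @ alpha.T)           # x_j = prod_i s_i^(alpha_ji)

# Step 1: apply f^{-1} = log to obtain a linear mixture of log(s_i).
v = np.log(x)

# Step 2: linear ICA on the linearized observations.
t = FastICA(n_components=2, random_state=0).fit_transform(v)

# Step 3: apply f = exp to restore positive sources, up to the usual
# linear-ICA scale/permutation indeterminacies (here: power laws).
s_hat = np.exp(t)

# |correlation| between estimated and true log-sources: close to a
# permutation matrix.
C = np.corrcoef(np.log(s_hat).T, np.log(s).T)[:2, 2:]
print(np.round(np.abs(C), 2))
```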

Unfortunately, this algorithm is not blind, since the mapping f must be known. If this condition is not satisfied, a possible architecture is a three-stage cascade. The first stage consists of nonlinear blocks (for instance, multi-layer perceptrons) able to approximate f⁻¹. The second stage is a matrix B able to separate the sources of linear mixtures. The structure of the third stage, which must approximate the mapping f, is similar to the structure of the first stage. One can remark that the first two stages are similar to the separating structure of PNL mixtures¹⁰. One can then compute the independent (distorted) sources s_i using a separation algorithm for PNL mixtures. Then, using the first stage (which provides an approximation of f⁻¹), one can identify the third stage (which must approximate f) and restore the initial independent sources.

PNL mixtures are similar to these nonlinear mappings. In fact, they are more general, since the nonlinear functions f_i can be different and unknown. Consequently, the algorithms developed for source separation of PNL mixtures [99, 14, 2, 45, 46] can be used for separating these nonlinear mixtures blindly, avoiding the third stage. Other examples of nonlinear mappings satisfying the addition theorem are proposed in [62, 41]. However, realistic mixtures belonging to this class of mappings do not seem to be commonplace, except for PNL mixtures (14.22) and multiplicative mixtures (14.30).

¹⁰In fact, the first stage is slightly simpler, since the functions f_i are all the same, contrary to PNL mixtures.


[Figure 14.6 about here: three panels titled "Sources", "Linear mixtures" and "PNL mixtures"]

Figure 14.6: Joint distributions of the sources s (left), of the linear mixtures e (middle), and of the PNL mixtures x (right).

14.4 Priors on sources

In this section, we show that priors on the sources can relax indeterminacies or simplify algorithms. The first example takes into account the fact that the sources are bounded. The second exploits the temporal correlation of the sources.

14.4.1 Bounded sources in PNL mixtures

Let us consider sources with a nonzero probability density function (pdf) on a bounded support, with nonzero values at the bounds of the support. As examples, sources having a uniform distribution, or the distribution associated with a regularly sampled sinewave signal, satisfy this condition. For the sake of simplicity, we only consider PNL mixtures (Figure 14.2) of two sources, but the results can easily be extended to mixtures of more than two sources. Using the independence condition, p_{s_1 s_2}(s_1, s_2) = p_{s_1}(s_1) p_{s_2}(s_2), one can deduce that the joint distribution of the random vector s is bounded by a rectangle. After a linear mixture A, the joint distribution of e = As is bounded by a parallelogram. Then, after componentwise mapping with the nonlinear functions f_i, the joint distribution of the PNL observations x is bounded by a distorted parallelogram (see Figure 14.6). One can prove [15] the following theorem:

Theorem 14.7 Consider the mapping
\[
\begin{cases} z_1 = g_1(e_1) \\ z_2 = g_2(e_2) \end{cases} \tag{14.43}
\]

where g_1 and g_2 are two analytic functions¹¹. Assume that the borders of any parallelogram in the plane (e_1, e_2) are transformed into the borders of another parallelogram in the plane (z_1, z_2), and that the borders of this parallelogram are

¹¹A function is said to be analytic on an interval if it can be expanded in a Taylor series on this interval.


not parallel to the coordinate axes. Then there exist real constants a_1, a_2, b_1, and b_2 such that
\[
\begin{cases} g_1(u) = a_1 u + b_1 \\ g_2(u) = a_2 u + b_2 \end{cases} \tag{14.44}
\]

Remarks

• This theorem provides another separability proof for PNL mixtures, for bounded sources.

• The existence of the constants b_1 and b_2 clearly shows another indeterminacy on the sources: in fact, the sources are estimated up to an additive constant. This indeterminacy also exists in linear mixtures, but disappears because we assume that the sources have zero mean. In other words, in PNL mixtures, the estimated sources can be written as y_i(t) = α_i s_{σ(i)}(t) + β_i.

• One also finds that if the linear part A of the PNL model is not mixing, i.e. if A is diagonal, the joint distribution of the mixtures e = As is bounded by a rectangle (a parallelogram whose borders are parallel to the axes). Consequently, the PNL observations x = f(e) are bounded by a rectangle, too, and the estimation of the nonlinear functions f_i is impossible.

• These results have recently been generalized to more than two sources with bounded densities in [102].

This theorem suggests a two-step geometrical algorithm for separating bounded sources in PNL mixtures:

• Estimate the invertible functions g_1 and g_2 which transform the joint distribution of the observations into a parallelogram. By the above theorem, this step compensates the nonlinear distortion of the mixture.

• Separate the sources of the resulting linear mixture by using an algorithm for linear ICA.

Details of the algorithm and a few experimental results are given in [15]. This method shows that the bounded-source assumption provides very useful extra information, which simplifies source separation in PNL mixtures: the linear and nonlinear parts of the separation structure can be optimized independently, according to two different criteria.

14.4.2 Temporally correlated sources in nonlinear mixtures

As explained in the previous sections, independence is generally not a sufficient assumption for separating the sources of nonlinear mixtures. Without regularization, ICA can provide either good solutions, corresponding to a σ-diagonal mapping of the sources, or bad solutions, for which the estimated sources are still mixtures of the original sources. The main question is then the following: how can we distinguish the good solutions from the bad ones? Hosseini and Jutten [52] suggest that the temporal correlation between successive samples of each source can be useful for this purpose.

14.4.2.1 A simple example

Let us return to the simple example presented in paragraph 14.2.3.2.1, where we now assume that s_1, s_2, y_1 and y_2 are signals. If the signals s_1(t) and s_2(t) are temporally correlated and mutually independent, one can write:

E{s1(t1)s2(t2)} = E{s1(t1)}E{s2(t2)}, ∀t1, t2. (14.45)

So, in addition to the decorrelation condition

E{y1(t)y2(t)} = E{y1(t)}E{y2(t)}, (14.46)

many supplementary equations, allowing the bad solutions y_1(t) and y_2(t) to be rejected, can be written:

E{y1(t1)y2(t2)} = E{y1(t1)}E{y2(t2)}, ∀t1 ≠ t2. (14.47)

It is clear that if y_1(t) and y_2(t) are the original sources (up to any σ-diagonal transform), the above equality holds ∀t_1, t_2. Moreover, if the independent components are obtained from the mapping (14.6) (this mapping is nonlinear, but preserves the independence of the random variables), the right-hand side of (14.47) is equal to zero, because y_1 and y_2 are zero-mean Gaussian variables. The left-hand side of (14.47) is equal to
\[
E\{y_1(t_1)\, y_2(t_2)\} = E\{s_1(t_1)\cos(s_2(t_1))\, s_1(t_2)\sin(s_2(t_2))\}
= E\{s_1(t_1) s_1(t_2)\}\, E\{\cos(s_2(t_1))\sin(s_2(t_2))\}. \tag{14.48}
\]
If s_1(t) and s_2(t) are temporally correlated, there probably exists a pair t_1, t_2 such that (14.48) is not equal to zero (of course, this depends on the nature of the temporal correlation between successive samples of each source), so that the equality (14.47) does not hold, and the solution can be rejected. In fact, the two stochastic processes y_1(t) and y_2(t) obtained by (14.6) are not independent, although at each instant their samples (which are two independent random variables) are independent.

This simple example shows how, by using the temporal correlation of the sources, one can distinguish the σ-diagonal mappings, which preserve independence, or at least reduce the set of non-σ-diagonal mappings preserving independence. Here, we just used the cross-correlation of the signals (second-order statistics), which is a first (but rough) step towards independence. We could also consider supplementary equations for improving the independence test, for example by using cross-correlations of orders greater than two for y_1(t_1) and y_2(t_2), which must satisfy:
\[
E\{y_1^p(t_1)\, y_2^q(t_2)\} = E\{y_1^p(t_1)\}\, E\{y_2^q(t_2)\}, \quad \forall t_1, t_2,\ \forall p, q \ne 0. \tag{14.49}
\]


14.4.2.2 Darmois decomposition with colored sources

As presented in paragraph 14.2.3, another classical example showing the non-identifiability of nonlinear mixtures is the Darmois decomposition method. Let us consider two random signals s_1(t) and s_2(t), whose samples are independent and identically distributed (iid), and assume that x_1(t) and x_2(t) are nonlinear mixtures of them. By using the Darmois decomposition [33, 55], one can derive new signals y_1(t) and y_2(t) which are statistically independent, although the related mapping is a mixing mapping (that is, not a σ-diagonal mapping):
\[
\begin{cases} y_1(t) = F_{x_1}(x_1(t)), \\ y_2(t) = F_{x_2|x_1}(x_1(t), x_2(t)). \end{cases}
\]
Here F_{x_1} and F_{x_2|x_1} denote the cumulative distribution functions of the observations. If the sources are temporally correlated, one can show [52] that the independent components y_1 and y_2 obtained from the above decomposition generally do not satisfy the following equality for t_1 ≠ t_2:
\[
p_{y_1,y_2}(y_1(t_1), y_2(t_2)) = p_{y_1}(y_1(t_1))\, p_{y_2}(y_2(t_2)). \tag{14.50}
\]

Conversely, σ-diagonal mappings of the actual sources, which can be written y_1 = f_1(s_1) and y_2 = f_2(s_2), clearly satisfy the above equality, due to the assumption of independent sources. So the above equations can be used for rejecting (or at least restricting) the non-σ-diagonal solutions provided by ICA using the Darmois decomposition.

Of course, this theoretical result does not constitute a proof of separability of nonlinear mixtures for temporally correlated sources. It simply shows that even weak prior information on the sources is able to reduce the typical indeterminacies of ICA in nonlinear mixtures. In fact, with this information, ICA provides many equations (constraints) which can be used for regularizing the solutions and achieving semi-blind source separation¹².

14.5 Independence criteria

14.5.1 Mutual information

ICA algorithms exploit the independence assumption. This assumption can be expressed using contrast functions, for which we shall look for the simplest expression. For linear mixtures, the simplest contrast functions are based on fourth-order cumulants. However, generally, one can consider that these particular contrasts are derived from the contrast −I{y}, the opposite of the mutual information, using various approximations of the pdf [10, 31, 53]. In this section, we shall mainly focus on the minimization of the mutual information, and then, in less detail, on a quadratic independence criterion.

¹²Semi-blind, since prior information is used.


Denoting by H{x} = −E{log p_x} the differential entropy, one can write
\[
I\{y\} = \sum_i H\{y_i\} - H\{y\}, \tag{14.51}
\]
where H{y_i} is the marginal entropy of the estimated source y_i, and H{y} is the joint entropy of the estimated source vector.

14.5.1.1 Special case of nonlinear invertible mappings

For nonlinear invertible mappings B, the estimated source vector is y = B(x), and with a simple change of variables one can write [32]
\[
I\{y\} = \sum_i H\{y_i\} - H\{x\} - E\{\log|\det J_B|\}, \tag{14.52}
\]
where J_B is the Jacobian matrix of the mapping B. For estimating the separating structure B, minimizing I{y} is then equivalent to minimizing the simplified criterion
\[
C(y) = \sum_i H\{y_i\} - E\{\log|\det J_B|\} \tag{14.53}
\]
because H{x} does not depend on B.

This criterion is simpler to use, since its estimation and optimization only require the estimation of the mathematical expectation of a Jacobian and of the marginal entropies, and consequently only marginal pdfs (and not the joint multivariate pdf).

14.5.1.2 Mutual information for PNL mixture

For PNL mixtures, the separating mapping is B(x) = B ◦ h, where B is a regular matrix and h(x) is the vector (h_1(x_1), . . . , h_N(x_N))^T. The simplified criterion then becomes
\[
C(y) = \sum_i H\{y_i\} - \log|\det B| - E\Big\{\log\Big|\prod_i h_i'(x_i)\Big|\Big\} \tag{14.54}
\]

14.5.1.2.1 Linear part  With respect to the linear part (the matrix B), the minimization of the criterion (14.54) leads to the same estimating equations as for linear mixtures:
\[
\frac{\partial C(y)}{\partial B} = E\{\varphi_y(y)\, z^T\} - B^{-T} = 0 \tag{14.55}
\]

In this relation, the components {ϕ_y(y)}_i of the vector ϕ_y are the score functions of the components y_i of the estimated source vector y:
\[
\{\varphi_y(y)\}_i = \varphi_{y_i}(y_i) = -\frac{d}{dy_i} \log p_{y_i}(y_i) = -\frac{p'_{y_i}(y_i)}{p_{y_i}(y_i)} \tag{14.56}
\]


where pyi(yi) is the pdf of yi, and p′yi

(yi) its derivative. By multiplying from

the right with BT , one obtains the estimating equations

E{ϕyyT } − I = 0 (14.57)

In practice, one can derive from this equation an equivariant algorithm by computing the natural gradient or the relative gradient [30, 27, 10]. This provides equivariant performance, i.e., performance that does not depend on A, for noiseless mixtures.
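As a minimal sketch of such an equivariant algorithm, the following Python function performs one relative-gradient update derived from (14.57). The fixed cubic score shown in the usage comment is our own illustrative choice (a rough score estimate valid for sub-Gaussian sources), not the estimator used in the cited works.

```python
import numpy as np

def relative_gradient_step(B, x, score, mu=0.1):
    """One relative-gradient update of the unmixing matrix B based on
    (14.57): B <- B + mu (I - E{phi(y) y^T}) B, which is equivariant.

    x: (N, T) observations; score: componentwise score-function estimate.
    """
    y = B @ x                                        # current outputs y(t)
    phi = score(y)                                   # componentwise scores
    G = np.eye(B.shape[0]) - (phi @ y.T) / x.shape[1]
    return B + mu * G @ B

# Example with a crude cubic score (illustrative choice, sub-Gaussian case):
# B = relative_gradient_step(B, x, score=lambda y: y**3)
```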

14.5.1.2.2 Nonlinear part  Concerning the nonlinear stage, by modeling the functions h_i(·) with parametric models h_i(θ_i, ·), the gradient of the simplified criterion (14.54) can be written [99]

    ∂C(y)/∂θ_k = − E{ ∂ log |h′_k(θ_k, x_k)| / ∂θ_k } − E{ ∑_{i=1}^N ϕ_{y_i}(y_i) b_{ik} ∂h_k(θ_k, x_k)/∂θ_k },    (14.58)

where x_k is the k-th component of the observation vector, and b_{ik} is the element ik of the separation matrix B. Of course, the exact equation depends on the parametric model. In [99], a multi-layer perceptron (MLP) was used for modeling each function h_k(θ_k, ·), k = 1, . . . , N, but other models, parametric or not, can be used [2].

By considering a non-parametric model of the nonlinear functions h_i, i.e. by simply considering the random variables z_k, one obtains the following estimating equations [98, 2]:

    E{ ∑_{i=1}^N b_{ik} ϕ_{y_i}(y_i) | z_k } = ϕ_{z_k}(z_k),    (14.59)

where E{. | .} denotes conditional expectation.

14.5.1.2.3 Influence of the accuracy of score function estimation  Contrary to linear mixtures, separation performance for nonlinear mixtures strongly depends on the estimation accuracy of the score functions (14.56) [99]. In fact, for the linear part it is clear that if one is close to the solution, i.e. if the estimated outputs y_j are mutually independent, the estimating equations (14.57) become

    E{ϕ_{y_i}(y_i) y_j} = E{ϕ_{y_i}(y_i)} E{y_j} = 0,  ∀ i = 1, . . . , N and i ≠ j,    (14.60)

because the random variables y_j are zero mean. The equality thus holds whatever the accuracy of the score function estimation!

For the nonlinear part, the estimating equations (14.59) depend on the score functions of both y_i and z_k, and they are satisfied only if the score function estimates of both y_i and z_k are very accurate. This result explains the weak performance obtained (except when the mixtures are weakly nonlinear) by estimating the pdf with a fourth-order Gram-Charlier expansion [114, 99] and then differentiating it to obtain the score functions. For harder nonlinearities, it is much better to compute a kernel estimate of the pdf. One can also estimate the score functions directly through a least-squares minimization [87, 97].
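The following Python sketch illustrates one such direct estimator: a Gaussian kernel density estimate whose analytic derivative yields the score (14.56). The bandwidth rule is our own choice, and this simple kernel-based estimator is an alternative to, not the specific least-squares estimator of, [87, 97].

```python
import numpy as np

def kernel_score(samples, bandwidth=None):
    """Kernel estimate of the score phi(u) = -p'(u)/p(u) from 1-D samples,
    using a Gaussian kernel density estimate and its analytic derivative
    (the Silverman-type default bandwidth is our own choice)."""
    y = np.asarray(samples, dtype=float)
    h = bandwidth or 1.06 * y.std() * len(y) ** (-0.2)
    def phi(u):
        d = (np.atleast_1d(u)[:, None] - y[None, :]) / h   # scaled distances
        w = np.exp(-0.5 * d ** 2)                          # Gaussian kernel values
        p = w.sum(axis=1)                                  # proportional to pdf
        dp = (-d * w).sum(axis=1) / h                      # proportional to pdf'
        return -dp / p                                     # constants cancel
    return phi
```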

14.5.2 Differential of the mutual information

Minimization of the criterion (14.54) is simple since it only requires estimation of the marginal pdfs. However, this leads to biased estimates [3]. Moreover, this method is not applicable to convolutive mixtures, since there exists no simple relationship between the pdfs of x(t) and y(t) = [B(z)]x(t), where the filter matrix B(z) models (in discrete time) the convolutive mixture. In [17, 13], Babaie-Zadeh et al. considered direct minimization of the mutual information (14.51), and we present the main results in the following.

14.5.2.1 Definitions

For a random vector x, marginal and joint score functions are defined as follows.

Definition 14.3 (MSF) The marginal score function (MSF) of a random vector x is the vector denoted ϕ_x(x), whose i-th component is equal to

    {ϕ_x(x)}_i = − d log p_{x_i}(x_i) / dx_i.    (14.61)

Definition 14.4 (JSF) The joint score function (JSF) of a random vector x is the gradient of − log p_x(x). It is denoted by ψ_x(x), and its i-th component is equal to

    {ψ_x(x)}_i = − ∂ log p_x(x) / ∂x_i.    (14.62)

One can now define the score function difference (SFD) of a random vector x.

Definition 14.5 (SFD) The score function difference (SFD) of x is the difference between the marginal and joint score functions:

    β_x(x) ≜ ϕ_x(x) − ψ_x(x).    (14.63)

14.5.2.2 Results

One can show [17] the following result concerning the SFD:

Proposition 14.8 The components of the random vector x = (x_1, . . . , x_N)^T are independent if and only if β_x(x) ≡ 0, i.e. if and only if

    ψ_x(x) = ϕ_x(x).    (14.64)


More generally, one can compute a quantity similar to the differential of mutual information (MI) by using the following theorem [17].

Theorem 14.9 (Differential of MI) Let x be a random vector and δ a "small" random vector of the same dimension. Then one has

    I{x + δ} − I{x} = E{δ^T β_x(x)} + o(δ),    (14.65)

where o(δ) represents the higher order terms in δ, and β_x is the score function difference (SFD) of x.

Recall that, for a multivariate (differentiable) function f(x), one has

    f(x + δ) − f(x) = δ^T ∇f(x) + o(δ).    (14.66)

By comparing the above equation with (14.65), one observes that the SFD can be interpreted as the "stochastic gradient" of the mutual information. Finally, the following theorem [17] clearly shows that mutual information has no "local minima".

Theorem 14.10 Let x_0 be a random vector whose pdf is continuously differentiable. If for any "small" random vector δ the condition I{x_0} ≤ I{x_0 + δ} holds, then I{x_0} = 0.

14.5.2.3 Practical consequences

Using the differential of MI requires estimating the marginal (MSF) and joint (JSF) score functions. Now, the JSF is a multivariate function of N variables (its dimension is equal to the number N of sources), so its estimation becomes very costly as the number of sources increases. Fast algorithms for computing joint and conditional entropies and score functions have been proposed by Pham [86]. With these methods, the computational load increases with the factor 3^{n−1}, where n is the dimension.

It is also interesting to exploit the fact that MI has no local minima. At first glance, this result seems to contradict other works [112]. In fact, it does not: the observed local minima are related to the parametric model of the separation structure. Thus, although MI itself has no local minima, even for linear mixtures I{Bx} as a function of B can have local minima. Following this observation, Babaie-Zadeh et al. [16, 13] proposed a new class of algorithms, called Minimization-Projection (MP) algorithms, which estimate y = B(x) in two successive steps:

• a minimization step on I{y}, without any constraint related to the model B;

• a projection step, where the model B is estimated by minimizing the mean-square error E{‖B(x) − y‖²}.

MP algorithms can be designed for any mixture model, in particular for linear convolutive models and for PNL mixtures with memory (convolutive PNL) or without memory; a schematic sketch of one MP iteration is given below.
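The following Python sketch shows the shape of one MP iteration under these two steps. All callables (sfd, fit_model, apply_model) are placeholders for estimators that a concrete implementation must supply; this is a schematic outline, not the algorithm of [16, 13] itself.

```python
def mp_iteration(params, x, sfd, fit_model, apply_model, mu=0.1):
    """One Minimization-Projection iteration (schematic sketch).

    Step 1 (minimization): move y along the SFD, the stochastic gradient
    of the mutual information (Theorem 14.9), ignoring the model.
    Step 2 (projection): refit the model parameters by least squares so
    that B(x) matches the updated y.
    """
    y = apply_model(params, x)
    y = y - mu * sfd(y)            # unconstrained descent on I{y}
    return fit_model(x, y)         # projection: min E ||B(x) - y||^2
```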


14.5.3 Quadratic criterion

Following the works of Kankainen [63] and Eriksson et al. [40], S. Achard et al. [4] proposed a quadratic measure of dependence. The measure is based on a kernel K whose Fourier transform is almost everywhere nonzero. For N random variables s = (s_1, . . . , s_N)^T, one defines the quadratic criterion C_Q:

    C_Q(s_1, . . . , s_N) = ∫ D_s²(u_1, . . . , u_N) du_1 · · · du_N,    (14.67)

where

    D_s(u_1, . . . , u_N) = E{ ∏_{i=1}^N K((u_i − s_i)/σ_{s_i}) } − ∏_{i=1}^N E{ K((u_i − s_i)/σ_{s_i}) },    (14.68)

in which σ_{s_i} is a scaling factor. From this definition, and using the kernel trick [79, 18], the criterion (14.67) can be computed easily. One obtains a simple estimator, which does not suffer from the curse of dimensionality and whose asymptotic properties can be derived. One can also show that the measure is related to the quadratic error between the joint first characteristic function and the product of the marginal first characteristic functions. The choice of the kernel K remains an open issue, although experimental results show robustness with respect to this choice.

14.6 A Bayesian approach for general mixtures

Bayesian inference methods are well known for the quality of their results, the simplicity with which priors on the model or on the sources can be taken into account, and their robustness. Their main drawback is that the computational cost can be quite high, sometimes preventing their application to realistic unsupervised or blind learning problems when the number of unknown parameters to be estimated grows.

Bayesian approaches have been used for source separation and ICA in linear mixtures and their various extensions [11, 66, 43, 53, 91, 64]. Valpola et al. have developed variational Bayesian methods for various nonlinear mixture models in many papers, the main ones being [67, 106, 107, 108, 109]. Their research efforts have been summarized in [61, 49]. In the following, we briefly present the principles of their basic approach to nonlinear BSS, called nonlinear factor analysis (NFA). For more details and various extensions, see the references mentioned above.

Variational Bayesian learning, formerly also called Bayesian ensemble learning [23, 68], is in general based on an approximation fitted to the posterior distribution of the parameters to be estimated. The approximating distribution is often chosen to be Gaussian because of its simplicity and computational efficiency. In our setting, the method tries to estimate the sources s(t) and the mixing mapping A(s(t)) which have most probably generated the observed data x(t). Roughly speaking, this provides the regularization that is necessary for making the nonlinear BSS problem solvable and tractable.


14.6.1 The nonlinear factor analysis (NFA) method

14.6.1.1 The model and cost function

The NFA method assumes that the data are generated by the noisy nonlinear mixture model (14.1):

    x(t) = A(s(t)) + b(t),    (14.69)

where x(t) and s(t) are the observation and source vectors for the sample index or time t, b(t) is the additive noise term at time t, and A(·) is a nonlinear mixing mapping. The dimensions P and N of the vectors x(t) and s(t), respectively, are usually different, and the components of the mapping A(·) are sufficiently smooth real functions of the source vector s(t).

The nonlinear factor analysis (NFA) algorithm approximates the observed data using an MLP network with one hidden layer:

    A(s(t)) = A_2 tanh(A_1 s(t) + a_1) + a_2,    (14.70)

where (A_1, a_1) and (A_2, a_2) are the weight matrices and bias vectors of the hidden layer and of the output layer, respectively. In this equation and in the following, the functions tanh and exp are applied componentwise to their argument vectors.
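For concreteness, a minimal Python sketch of this generative model (14.69)-(14.70) is given below; all dimensions and parameter values are hypothetical and serve only to show how observations are produced from sources.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, T, H = 3, 8, 1000, 20          # sources, observations, samples, hidden units

# Hypothetical parameter values, just to make the generative model concrete.
A1, a1 = rng.normal(size=(H, N)), rng.normal(size=(H, 1))
A2, a2 = rng.normal(size=(P, H)), rng.normal(size=(P, 1))

S = rng.normal(size=(N, T))                 # Gaussian sources (the NFA prior)
X = A2 @ np.tanh(A1 @ S + a1) + a2          # MLP mixing mapping (14.70)
X += 0.01 * rng.normal(size=(P, T))         # additive noise b(t), eq. (14.69)
```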

Let us denote by

• X = {x(1), . . . , x(T)} the set of T observation vectors and S = {s(1), . . . , s(T)} the set of T associated source vectors;

• θ the vector containing all the unknown model parameters, including sources, noise and the parameters of the MLP network;

• p(S,θ|X) the theoretical a posteriori pdf and q(S,θ) its parametric approximation.

In the variational Bayesian framework, one assigns a prior distribution to each parameter of the vector θ. For instance, assuming that the noise vector b(t) is jointly Gaussian and spatially and temporally white, one can write the likelihood of the observations

    p(X|S,θ) = ∏_{i,t} p(x_i(t)|s(t),θ) = ∏_{i,t} N(x_i(t); A_i(s(t)), exp(2v_i)),    (14.71)

where N(x; μ, σ²) denotes a Gaussian distribution of the variable x with mean μ and variance σ², and A_i is the i-th component of the nonlinear mapping A. Variances are parameterized according to the exponential law exp(2v), where v is a parameter with a Gaussian prior:

    v ∼ N(m_v, σ_v).    (14.72)

The goal of the Bayesian approach is to estimate the a posteriori pdf of all the parameters of the vector θ. This is obtained by estimating a distribution q(S,θ) which approximates the true a posteriori distribution p(S,θ|X). The difference between the approximation q(S,θ) and the true pdf p(S,θ|X) is measured using the Kullback-Leibler divergence:

    K{q, p} = ∫_S ∫_θ q(S,θ) ln [ q(S,θ) / p(S,θ|X) ] dθ dS.    (14.73)

The posterior distribution p(S,θ|X) cannot usually be evaluated, and therefore the actual cost function used in variational Bayesian learning is

    C = K{q, p} − ln p(X) = ∫_S ∫_θ q(S,θ) ln [ q(S,θ) / p(S,θ,X) ] dθ dS.    (14.74)

This can be split into two parts arising from the numerator and the denominator of the logarithm:

    C_q = ∫_S ∫_θ q(S,θ) ln q(S,θ) dθ dS,    (14.75)

    C_p = − ∫_S ∫_θ q(S,θ) ln p(S,θ,X) dθ dS.    (14.76)

The cost function (14.74) can also be used for model selection, as explained in [68]. In the nonlinear BSS problem, it provides the necessary regularization. For each source signal and parameter, the posterior pdf is estimated instead of some point estimate. In many cases, an appropriate point estimate is given by the mean of the posterior pdf of the desired quantity, and the respective variance provides at least a rough measure of the confidence of this estimate.

For evaluating the cost function C = C_q + C_p, we need two things: the exact formulation of the joint probability density p(S,θ,X), and its parametric approximation q(S,θ). Usually the joint pdf p(S,θ,X) is a product of simple terms due to the definition of the model. It can be written

    p(S,θ,X) = p(X|S,θ) p(S|θ) p(θ).    (14.77)

The pdf p(X|S,θ) has already been evaluated in (14.71), and the pdfs p(S|θ) and p(θ) are also products of univariate Gaussian distributions. They can be obtained directly from the model structure [67, 109].

The cost function can be minimized efficiently if one assumes that the parameters θ and the sources S are independent:

    q(S,θ) = q(S) q(θ).    (14.78)

Finally, one assumes that the parameters θ_i are Gaussian and independent, too:

    q(θ) = ∏_i q(θ_i) = ∏_i N(θ_i; θ̄_i, θ̃_i),    (14.79)

where θ̄_i and θ̃_i denote the posterior mean and variance of θ_i.

The distribution q(S) follows a similar law. Estimation and evaluation of the cost function C = C_q + C_p are discussed in detail in [67, 106].


The NFA algorithm also assumes that the sources s(t) are Gaussian: s(t) ∼ N(0, exp(2v_s)). This assumption leads only to a nonlinear PCA (principal component analysis) subspace in which the independent sources lie. The independent sources are then estimated by finding an appropriate rotation of this subspace using a standard linear ICA algorithm, such as FastICA [53].

14.6.1.2 The learning method

The parameters of the approximating distribution q(S,θ) are optimized using gradient-based iterative algorithms. During one sweep of the algorithm, all the parameters are updated once using all the available data. One sweep consists of two different phases. The order of computations in these two phases is the same as in the standard back-propagation algorithm for MLP networks [47], but otherwise the learning procedure is quite different. The most important differences are that in the NFA method learning is unsupervised, the cost function is different, and the unknown variables are characterized by distributions instead of point estimates.

In the forward phase, the distributions of the outputs of the MLP networks are computed from the current values of the inputs, and the value of the cost function is evaluated as explained in the previous subsection. In the backward phase, the partial derivatives of the cost function with respect to all the parameters are fed back through the MLP, and the parameters are updated using this information.

An update rule for the posterior variances θ̃_i is obtained by differentiating (14.74) with respect to θ̃_i, yielding [67, 109]

    ∂C/∂θ̃_i = ∂C_p/∂θ̃_i + ∂C_q/∂θ̃_i = ∂C_p/∂θ̃_i − 1/(2 θ̃_i).    (14.80)

Equating this to zero yields a fixed-point iteration:

    θ̃_i = [ 2 ∂C_p/∂θ̃_i ]^{−1}.    (14.81)

The posterior means θ̄_i can be estimated from the approximate Newton iteration [67, 109]

    θ̄_i ← θ̄_i − (∂C_p/∂θ̄_i) [ ∂²C/∂θ̄_i² ]^{−1} ≈ θ̄_i − (∂C_p/∂θ̄_i) θ̃_i.    (14.82)

In the NFA method, one tries to learn a nonlinear model (14.69), (14.70) with many parameters and unknowns in a completely blind manner, from the data X only. Therefore care is required, especially in the beginning of learning; otherwise the whole learning method could converge to some false solution far from the correct one. The initialization process and tricks for avoiding false solutions are explained in [67, 61].


14.6.2 Extensions and experimental results

The original NFA algorithm has been extended in [67, 107, 106] by modeling the sources as mixtures of Gaussians instead of plain Gaussians. This nonlinear independent factor analysis (NIFA) method provides somewhat better estimation results than the NFA method followed by linear ICA, at the expense of a more complicated learning process and a higher computational load. Experimental results with simulated data [67, 107, 53], for which the true sources are known, show that both the NFA method followed by linear ICA and the NIFA method approximate the true sources quite well. These methods have also been applied to real-world data sets, including 30-dimensional pulp data [53, 67, 107] and speech data, but interpretation of the results is somewhat difficult, requiring problem-specific expertise.

Furthermore, the NFA method has been extended to sources generated by a dynamic system [109], leading to a nonlinear dynamic factor analysis (NDFA) method. The NDFA method performs much better than the compared methods in blind estimation of the dynamic system and its source signals. It has also been successfully applied to the detection of changes in the states (sources) of the dynamic process [58]; there the NDFA method again performed much better than the compared state-of-the-art change detection techniques.

The work on applying variational Bayesian learning to nonlinear BSS and related problems is reported and summarized in more detail in [110, 61, 49], with more references and results. MATLAB codes of the NFA and NDFA algorithms are available on the web page [108].

14.6.3 Comparisons on PNL mixtures

The NFA method has been applied to PNL mixtures (14.22) with additive noise b(t) and compared with the method based on mutual information (MI) minimization [56, 57].

MI minimization is more efficient than NFA when the mixing model is a PNL model, even with additive noise. On the other hand, NFA can separate mixings which are locally non-invertible provided that they are globally invertible, i.e. the nonlinear functions f_i are not invertible but the global mapping A is.

These experimental results show the importance of structural constraints for the achieved performance. Accurate modeling of the mixing and separating system is thus an essential step. It is directly related to the existence, simplicity, performance, and robustness of separating methods, even in the Bayesian framework.


14.7 Other methods and algorithms

14.7.1 Algorithms for PNL mixtures

A large number of works and algorithms have been devoted to PNL models. These methods mainly differ in the independence criterion [70], the parameterization of the nonlinearity [85, 2], and the exploitation of temporal correlation [116] or of a Markovian model for the sources [69]. If certain conditions hold, geometrical approaches, which avoid statistical estimation but require a large number of samples, can be used; they have been proposed and studied in several papers [90, 15, 104, 103, 80, 81].

Furthermore, a Bayesian method using an MLP network structure has been introduced for blind separation of underdetermined post-nonlinear mixtures in [113]. For the same problem, a spectral clustering approach is proposed for sparse sources in [105]. A method based on flexible spline neural network structures and minimization of mutual information is proposed for convolutive PNL mixtures in [111].

A very simple and efficient idea has also been proposed for enhancing the convergence speed of source separation algorithms in PNL mixtures. From the central limit theorem, one can claim that the linear mixtures e_i (before application of the nonlinear functions f_i) are approximately Gaussian. One can then achieve a rough approximation of the inverse h_i of the function f_i by enforcing (h_i ∘ f_i)(e_i) to be Gaussian, too. This idea, developed independently by Sole et al. [93] and Ziehe et al. [117], leads to a very simple and fast estimate of h_i:

    h_i = Φ^{−1} ∘ F_{x_i},    (14.83)

where F_{x_i} is the cumulative distribution function of the random variable x_i, and Φ is the cumulative distribution function of the Gaussian random variable. Of course, the Gaussian assumption is just an approximation, but the method is robust with respect to this assumption and provides an initial value of h_i 13 which greatly increases the convergence speed of the algorithms [94]. Recently, this method has been extended in [115] to PNL mixtures in which the linear mixtures are close to Gaussian.

14.7.2 Constrained MLP-like structures

Marques and Almeida [75, 8] generalized the Infomax principle [21] to nonlinear mixtures. For this purpose, they propose a separation structure B realized with a multi-layer perceptron (MLP) neural network. Samples y_i(t) at the output of the MLP network are then transformed by a mapping F_i in order to provide z_i = F_i(y_i), whose distribution is uniform on [0, 1]. Such an F_i always exists: it is the cumulative distribution function of y_i.

13 Sometimes surprisingly accurate.


Since mutual information is invariant under any diagonal invertible mapping, one can write

    I{y} = I{z} = ∑_i H{z_i} − H{z}.    (14.84)

Since the random variables z_i are always uniformly distributed on [0, 1], their entropies are constants. Consequently, minimizing I{y} is equivalent to maximizing H{z}. Under the generic name MISEP, Almeida proposed a few algorithms, which mainly differ in the parameterization of F_i, in which the parameters of the MLP network are updated in order to maximize H{z} [7].

14.7.3 Other approaches

Tan, Wang, and Zurada [101] have proposed a radial basis function (RBF) neural network structure for approximating the separating mapping (14.2). Their contrast function consists of both the mutual information and partial moments of the estimated separated sources, which are used to provide the regularization needed in nonlinear BSS. Simulation results are presented for several artificially generated nonlinear mixture sets, confirming the validity of the method introduced in [101].

Levin has developed a nonlinear BSS method based on differential geometry and phase-space density in [71]. In [24], the authors claim that temporal slowness complements statistical independence well, and that a combination of these principles leads to unique solutions of the nonlinear BSS problem; they introduce an algorithm called independent slow feature analysis for nonlinear BSS. In [74], a noisy nonlinear version of ICA is proposed: assuming that the pdf of the sources is known, the authors derive a learning rule based on maximum likelihood estimation. A new method for solving nonlinear BSS problems is derived in [76] by exploiting second-order statistics in a kernel-induced feature space; experimental results are presented on realistic nonlinear mixtures of speech signals, gas multisensor data, and visual disparity data. A new nonlinear mixing model, which the authors call the multinonlinearity constrained mixing model, and a separation method for it are proposed in [42]. New nonlinear mixture models, which the authors call "additive-target mixtures" and "extractable-target mixtures", and a separation method for them based on recurrent neural networks are introduced in [35].

More references, especially to early works on nonlinear ICA and BSS, can be found in the reviews [7, 61].

14.8 A few applications

Currently, only a few real-world problems have been modeled by nonlinear mixtures. We present here three examples which seem quite interesting.


14.8.1 Chemical sensors

ISFET-like chemical sensors are designed with field-effect transistors (MOSFET) whose metal gate is replaced by a polymer membrane sensitive to certain ions. Each sensor typically provides a drain current

    I_d = A + B log( a_i + ∑_j k_{ij} a_j^{z_i/z_j}(t) ),    (14.85)

where A and B are constants depending on technological and geometrical parameters of the transistor, k_{ij} measures the sensitivity of the sensor to secondary ions, and a_i and z_i are the activity and the valence of the ion i, respectively. Of course, the activities of the different ions act like sources and are assumed independent, which is a realistic assumption. Moreover, the unknown parameters A, B, and k_{ij} may vary between the sensors, and this spatial diversity allows using source separation methods.

In the general case, where the ions have different valences, the quantity inside the log is nonlinear with respect to a_j. But if the different (or the main) ions have the same valence, the ratios z_i/z_j are equal to 1, and the responses provided by ISFET sensors can be modeled by PNL mixtures. The problem is, however, simpler than the generic PNL problem, since the nonlinearities f_i are known functions (log). PNL methods can then be applied in a semi-blind manner, leading to simpler and more efficient algorithms. In fact, from the physical model (14.85), one can see that the observation (the drain current) I_d can be linearized by applying exp[(I_d − A)/B]. Concerning the nonlinear part of the separation structure, one then uses the parametric model h_i(u) = exp[(u − α)/β], where α and β are the two parameters to be estimated [20]. In this way, one gets good estimates of the unknown activities a_i up to a constant, due to the scaling indeterminacy. The estimation process is then finalized with a calibration step.
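A minimal sketch of this semi-blind linearization, with hypothetical parameter names, is the following:

```python
import numpy as np

def isfet_linearize(Id, alpha, beta):
    """Parametric nonlinearity h_i(u) = exp[(u - alpha)/beta] used to invert
    the log response (14.85) of an ISFET sensor. The parameters alpha and
    beta play the role of the unknown constants A and B and are estimated
    by a semi-blind PNL algorithm [20]."""
    return np.exp((Id - alpha) / beta)
```

Once α and β are estimated, the linearized observations can be processed by a standard linear source separation algorithm, followed by the calibration step mentioned above.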

For ions with different valences, the ratio k = z_i/z_j is no longer equal to 1, and the mixture is much more intricate than a PNL mixture. This problem has been addressed by first assuming that the log nonlinearity is canceled as suggested above. The remaining mixture is then still nonlinear:

    x_i = a_i + ∑_j k_{ij} a_j^k(t).    (14.86)

This problem can be solved by extending the recursive structure suggested by Hosseini and Deville [50, 51] to this particular mapping. The parameters of the separating structure can be estimated by considering higher-order statistics [36], mutual information [37], or source sparsity [38], or by working in a Bayesian framework [39], possibly also using a temporal prior between successive values of the ion activities.

14.8.2 Gas sensors

In a tin-oxide gas sensor, the concentration of initially absorbed oxygen decreases if a combustible gas is present, which increases the sensor resistivity. If N different gases are present, one can model the variation of the conductance [72] of each sensor i by

    G_i = G_0 + ∑_{j=1}^{N} ( a_{ij} C_j^{−r_j} + ∑_{k=j}^{N} a_{ijk} C_j^{−r_j} C_k^{−r_k} ),    (14.87)

where a_{ij}, a_{ijk}, r_j, and r_k are unknown parameters and C_j is the concentration of gas j. Denoting s_j = C_j^{−r_j}, the conductance varies according to a bilinear mixture of the sources s_j. For two gases and two sensors, Bedoya [19] has shown that one can estimate the concentrations of both gases by using the methods developed by Hosseini and Deville [50] for source separation in bilinear mixtures.

14.8.3 Mixtures of images

Let us consider a sheet of paper whose both sides have been printed. If the paper sheet is thin, each side is in fact a mixture of the front and back images due to transparency. This phenomenon is called the show-through effect. Almeida [6] proposes to obtain two different mixtures (spatial diversity) by successively scanning the front and the back side of the paper sheet. After geometrical registration, he gets two nonlinear mixtures of the front and back images. He then uses his MISEP algorithm for separating the sources (front and back images) from their nonlinear mixtures. The separation performance is better than with a source separation algorithm for linear mixtures, but not very high. The separation results could be enhanced by employing a better model of the nonlinearities in the observations.

This problem has then been addressed by Merrikh-Bayat et al. in [77] using a refined model of the image mixtures and a suitable separation structure. First, their model takes into account the nonlinearity of the mixture, which is experimentally estimated by an exponential function. This function can be approximated by a first-order expansion, leading to a bilinear model of the mixtures. Merrikh-Bayat et al. [77] then apply the algorithm of Hosseini and Deville [50, 51] (the recurrent structure of Figure 14.5) to these mixtures to obtain the result shown in the middle column of Figure 14.8. This result shows that although the bilinear mixing model may be effective for removing show-through, it leaves some traces of the borders of the text from the other side.

Merrikh-Bayat et al. [77] then suggest that this problem arises because the bilinear mixing model and the separating structure of Figure 14.5 are too simple to model the show-through effect in scanned documents. In effect, when the light passes through the paper, it is scattered in different directions, and hence what we see from the other side is a blurred version of the actual image of the other side. In other words, what the scanner sees is not "the image of the recto mixed with the image of the verso" but "the image of the recto mixed with a blurred version of the image of the verso". But the model (14.27) and the separating structure of Figure 14.5 do not take this blurring phenomenon into account. To remedy this problem, Merrikh-Bayat et al. propose the separating structure of Figure 14.7. In this structure, near convergence, y_2 is close to the image of the verso, and before subtracting it from x_1 (the scanned image of the recto),


[Figure 14.7: The structure proposed by Merrikh-Bayat et al. [77] for solving the show-through problem in scanned documents: a recurrent network with linear cross branches l_1 W(z) and l_2 W(z) and bilinear branches q_1 and q_2, mapping the observations x_1, x_2 to the outputs y_1, y_2.]

it passes through a blurring filter W(z). This corresponds to writing the first equation of (14.27) as

    x_1 = s_1 − l_1 [W(z)] s_2 − q_1 s_1 s_2.    (14.88)

A similar operation is applied to y_1, too. Note that the authors did not use W(z) in the bilinear term, because the original equation is itself approximate and because q_1 is usually very small. Then, as a first-order approximation, they choose the blurring filters W(z) as fixed low-pass filters (3 × 3 point filters with all coefficients equal to 1). They then apply the same algorithm of Hosseini and Deville [51], originally designed for the structure of Figure 14.5, to this new structure, and they obtain the result shown in the right column of Figure 14.8. Applying Hosseini and Deville's algorithm to a different structure is just a first-order approximation; this is justified by the fact that, although the image of the recto and its blurred version are different, they are not too different, so the same algorithm may give a first-order approximation on the new separating structure. The results of Figure 14.8 show that the approach using the structure of Figure 14.7 may be very promising for solving the show-through effect in scanned documents.
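A minimal Python sketch of this first-order recto estimate is given below; it inverts (14.88) approximately, replacing s_1 by x_1 in the small bilinear term, and implements W(z) with scipy's uniform_filter (an averaging 3 × 3 filter, i.e. the all-ones filter of the text up to normalization). The function name is our own.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def recto_estimate(x1, y2, l1, q1):
    """First-order recto output for the structure of Figure 14.7, inverting
    (14.88) approximately: y1 ≈ x1 + l1 * blur(y2) + q1 * x1 * y2.
    x1: scanned recto image; y2: current verso estimate (2-D arrays)."""
    blurred = uniform_filter(y2, size=3)   # fixed low-pass W(z), 3x3 average
    return x1 + l1 * blurred + q1 * x1 * y2
```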

We finish this section by emphasizing that in this application (removing the show-through effect) an important problem is the registration of the images of both sides, that is, ensuring that corresponding pixels of the two images correspond to the same points on the paper. Indeed, while scanning the other side of the paper, there is usually some amount of shift and rotation, which strongly affects the quality of the results given by the different algorithms. In the results of this section, it was assumed that the scanned images of the recto and verso have already been registered.

[Figure 14.8: Results of removing the show-through effect in scanned documents. Left column: two registered images distorted by show-through. Middle column: separation result using the structure of Figure 14.5. Right column: separation result using the structure of Figure 14.7 for fixed W(z)'s.]

14.9 Conclusion

Blind source separation in nonlinear mixtures is generally not possible using ICA alone. In fact, for any random vector s with independent components, one can design a mapping H which provides independent components y although the mapping is still mixing (not σ-diagonal).

In this chapter, we explored three main ideas for separating sources in nonlinear mixtures:

• The first one is based on structural constraints (on the mixing and separating structures), which restrict the solution space. Post-nonlinear mixtures constitute a particularly interesting example, due to their realism, their theoretical separability characterized by weak indeterminacies, and the performance of the associated algorithms.

• The second one consists in using prior information on the sources, like the temporal coloredness or the boundedness of the sources. This idea is more ad hoc, since each prior leads to a particular processing and algorithm. It can lead to simpler algorithms, or simply restrict the solution space through the additional constraints introduced by extra equations.

• The third approach, which is quite promising although computationally demanding, is the Bayesian framework, which allows prior information and constraints on sources and mixtures to be taken into account in a unified probabilistic formalism.


In the framework of ICA for post-nonlinear mixtures, we used the mutual information (MI) and showed that optimization of the nonlinear part of the model requires accurate estimation of the score functions, contrary to the linear part, for which a very rough estimate is sufficient. To overcome the MI estimation problems, other criteria can be introduced, e.g. the quadratic criterion, whose kernel-trick-based estimation is both simple and efficient.

In the future, the most efficient methods will be those able to jointly take into account structural constraints and priors on the sources. Finally, nonlinear source separation methods will remain of interest only if they can be applied to actual problems and applications. Currently, the most promising application domain is the design of smart sensor arrays, especially for chemicals and gases, whose mixing models fall into structurally constrained classes.

Acknowledgments. This chapter is based on results obtained since 1996, in the framework of PhD theses and post-doctoral works, especially by A. Taleb, M. Babaie-Zadeh, S. Hosseini, S. Achard, L. Duarte, and in cooperation with D.T. Pham and other partners of the European project BLISS (BLInd Sources Separation and applications, IST-1999-14190), especially H. Valpola, J. Karhunen, A. Honkela and A. Ilin (Helsinki University of Technology, Finland), as well as K.-R. Muller, A. Ziehe and S. Harmeling (Fraunhofer Institute, Berlin, Germany).

Software. Software (Java applets and Matlab sources) for the PNL algorithms is available on the Web pages of GIPSA-lab: http://www.lis.inpg.fr/demos/sep sourc/ICAdemo/index.html

Other software (Matlab sources, especially for the PNL algorithms and NFA) is available on the Web pages of the European project BLISS: http://www.lis.inpg.fr/bliss/deliverables.php.

Bibliography

[1] S. Achard and C. Jutten, Identifiability of post nonlinear mixtures, IEEE Signal Processing Letters, 12 (2005), pp. 423–426.

[2] S. Achard, D. Pham, and C. Jutten, Blind source separation in post nonlinear mixtures, in Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), San Diego (California, USA), 2001, pp. 295–300.

[3] ———, Criteria based on mutual information minimization for blind source separation in post-nonlinear mixtures, Signal Processing, 85 (2004), pp. 965–974.

[4] ———, Quadratic dependence measure for nonlinear blind source separation, in Proc. of 4th Int. Symp. on Independent Component Analysis and Blind Source Separation (ICA2003), S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, eds., Nara, Japan, April 2003, pp. 263–268.


[5] J. Aczel, Lectures on Functional Equations and Their Applications, Academic Press, New York, 1966.

[6] L. Almeida, Separating a real-life nonlinear image mixture, Journal of Machine Learning Research, 6 (2005), pp. 1199–1229.

[7] ———, Nonlinear Source Separation, Synthesis Lectures on Signal Processing, vol. 2, Morgan & Claypool Publishers, 2006.

[8] ———, Linear and nonlinear ICA based on mutual information, in Proc. IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (AS-SPCC), Lake Louise, Canada, October 2000, pp. 117–122.

[9] L. Almeida, C. Jutten, and H. Valpola, Realistic models of nonlinear mixtures, EU project BLISS (IST1999-14190) Report D5, May 2001. Available at http://www.lis.inpg.fr/bliss/deliverables.php.

[10] S. Amari, A. Cichocki, and H. Yang, A new learning algorithm for blind signal separation, in Advances in Neural Information Processing Systems 1995, M. M. et al., ed., vol. 8, Cambridge, MA, USA, 1996, MIT Press, pp. 757–763.

[11] H. Attias, Independent factor analysis, Neural Computation, 11 (1999), pp. 803–851.

[12] M. Babaie-Zadeh, On blind source separation in convolutive and nonlinear mixtures, PhD thesis, INPG, Grenoble, France, Sept. 2002.

[13] M. Babaie-Zadeh and C. Jutten, A general approach for mutual information minimization and its application to blind source separation, Signal Processing, 85 (2005), pp. 975–995.

[14] M. Babaie-Zadeh, C. Jutten, and K. Nayebi, Separating convolutive post non-linear mixtures, in Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), San Diego (California, USA), 2001, pp. 138–143.

[15] ———, A geometric approach for separating post nonlinear mixtures, in Proc. of the XI European Signal Processing Conf. (EUSIPCO 2002), vol. II, Toulouse, France, September 2002, pp. 11–14.

[16] ———, Minimization-projection (MP) approach for blind source separation in different mixing models, in Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Source Separation (ICA2003), Nara, Japan, April 2003, pp. 1083–1088.

[17] ———, Differential of mutual information, IEEE Signal Processing Letters, 11 (2004), pp. 48–51.


[18] F. Bach and M. Jordan, Kernel independent component analysis, Journal of Machine Learning Research, 3 (2002), pp. 1–48.

[19] G. Bedoya, Non-linear Blind Signal Separation for Chemical Solid-State Sensor Arrays, PhD thesis, Technical Univ. of Catalonia, Dept. of Electrical Eng., Barcelona, Spain, 2006.

[20] G. Bedoya, C. Jutten, S. Bermejo, and J. Cabestany, Improving semiconductor-based chemical sensor arrays using advanced algorithms for blind source separation, in Proc. of the ISA/IEEE Sensors for Industry Conference (SIcon 04), New Orleans, USA, January 2004, pp. 149–154.

[21] A. Bell and T. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Computation, 7 (1995), pp. 1129–1159.

[22] S. Bermejo, C. Jutten, and J. Cabestany, ISFET source separation: foundations and techniques, Sensors and Actuators B: Chemical, B (2006), pp. 222–233.

[23] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

[24] T. Blaschke, T. Zito, and L. Wiskott, Independent slow feature analysis and nonlinear blind separation, Neural Computation, 19 (2007), pp. 994–1021.

[25] G. Burel, Blind separation of sources: A nonlinear neural algorithm, Neural Networks, 5 (1992), pp. 937–947.

[26] J.-F. Cardoso, Blind signal separation: statistical principles, Proceedings of the IEEE, 86 (1998), pp. 2009–2025.

[27] J.-F. Cardoso and B. Laheld, Equivariant adaptive source separation, IEEE Trans. on Signal Processing, 44 (1996), pp. 3017–3030.

[28] J.-F. Cardoso and A. Souloumiac, Blind beamforming for non-Gaussian signals, IEE Proceedings-F, 140 (1993), pp. 362–370.

[29] A. Cichocki and S.-I. Amari, Adaptive Blind Signal and Image Processing - Learning Algorithms and Applications, J. Wiley, 2002.

[30] A. Cichocki, R. Unbehauen, and E. Rummert, Robust learning algorithm for blind separation of signals, Electronics Letters, 30 (1994), pp. 1386–1387.

[31] P. Comon, Independent component analysis, a new concept?, Signal Processing, 36 (1994), pp. 287–314.

[32] T. Cover and J. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, 1991.


[33] G. Darmois, Analyse des liaisons de probabilite, in Proc. of Int. Statistics Conferences 1947, vol. III A, Washington (D.C.), 1951, p. 231.

[34] G. Darmois, Analyse generale des liaisons stochastiques, Rev. Inst. Intern. Stat., 21 (1953), pp. 2–8.

[35] Y. Deville and S. Hosseini, Recurrent networks for separating extractable-target nonlinear mixtures, part I: Non-blind configurations, Signal Processing, 89 (2009), pp. 378–393.

[36] L. Duarte and C. Jutten, Blind source separation of a class of nonlinear mixtures, in Proc. of the 7th Int. Conf. on Independent Component Analysis and Signal Separation (ICA2007), Lecture Notes in Computer Science, vol. 4666, London, United Kingdom, September 2007, Springer-Verlag, pp. 41–48.

[37] ———, A mutual information minimization approach for a class of nonlinear recurrent separating systems, in Proc. of the 2007 IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP2007), Thessaloniki, Greece, August 2007.

[38] ———, A nonlinear source separation approach for the Nicolsky-Eisenman model, in Proc. of the 16th European Signal Processing Conf. (EUSIPCO2008), Lausanne, Switzerland, August 2008.

[39] L. Duarte, C. Jutten, and S. Moussaoui, Ion-selective electrode array based on a Bayesian nonlinear source separation method, in Proc. of the 8th Int. Conf. on Independent Component Analysis and Signal Separation (ICA2009), Paraty, Brazil, March 2009.

[40] J. Eriksson, A. Kankainen, and V. Koivunen, Novel characteristic function based criteria for ICA, in Proc. of the 3rd Int. Conf. on Independent Component Analysis and Signal Separation (ICA2001), San Diego, CA, USA, December 2001, pp. 108–113.

[41] J. Eriksson and V. Koivunen, Blind identifiability of a class of nonlinear instantaneous ICA models, in Proc. of the XI European Signal Proc. Conf. (EUSIPCO2002), vol. 2, Toulouse, France, September 2002, pp. 7–10.

[42] P. Gao, W. Woo, and S. Dlay, Nonlinear signal separation for multinonlinearity constrained mixing model, IEEE Trans. on Neural Networks, 17 (2006), pp. 796–802.

[43] M. Girolami, ed., Advances in Independent Component Analysis, Springer-Verlag, 2000.

[44] M. Haritopoulos, H. Yin, and N. Allison, Image denoising using self-organizing map-based nonlinear independent component analysis, Neural Networks, 15 (2002), pp. 1085–1098.


[45] S. Harmeling, A. Ziehe, B. Blankertz, and K.-R. Muller, Nonlinear blind source separation using kernel feature bases, in Proc. of the 3rd Int. Conf. on Independent Component Analysis and Signal Separation (ICA2001), San Diego, CA, USA, December 2001, pp. 102–107.

[46] ———, Kernel-based nonlinear blind source separation, Neural Computation, 15 (2003), pp. 1089–1124.

[47] S. Haykin, Neural Networks - A Comprehensive Foundation, Prentice Hall, 2nd ed., 1998.

[48] J. Herault, C. Jutten, and B. Ans, Detection de grandeurs primitives dans un message composite par une architecture de calcul neuromimetique en apprentissage non supervise, in Actes du Xeme colloque GRETSI, Nice, France, May 1985, pp. 1017–1022.

[49] A. Honkela, H. Valpola, A. Ilin, and J. Karhunen, Blind separation of nonlinear mixtures by variational Bayesian learning, Digital Signal Processing, 17 (2007), pp. 914–934.

[50] S. Hosseini and Y. Deville, Blind separation of linear-quadratic mixtures of real sources using a recurrent structure, in Proc. of the 7th Int. Work-Conference on Artificial and Natural Neural Networks (IWANN2003), Lecture Notes in Computer Science, vol. 2686, Menorca, Spain, June 2003, Springer-Verlag, pp. 241–248.

[51] ———, Blind maximum likelihood separation of a linear-quadratic mixture, in Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA2004), C. Puntonet and A. Prieto, eds., Granada, Spain, September 2004, pp. 694–701. Lecture Notes in Computer Science, vol. 3195, Springer-Verlag.

[52] S. Hosseini and C. Jutten, On the separability of nonlinear mixtures of temporally correlated sources, IEEE Signal Processing Letters, 10 (2003), pp. 43–46.

[53] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis, J. Wiley, 2001.

[54] A. Hyvarinen and E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Computation, 9 (1997), pp. 1483–1492.

[55] A. Hyvarinen and P. Pajunen, Nonlinear independent component analysis: Existence and uniqueness results, Neural Networks, 12 (1999), pp. 429–439.

[56] A. Ilin, S. Achard, and C. Jutten, Bayesian versus constrained structure approaches for source separation in post-nonlinear mixtures, in Proc. of the 2004 Int. J. Conf. on Neural Networks (IJCNN2004), Budapest, Hungary, July 2004, pp. 2188–2186.


[57] A. Ilin and A. Honkela, Post-nonlinear independent component analysis by variational Bayesian learning, in Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA2004), C. Puntonet and A. Prieto, eds., Granada, Spain, September 2004, pp. 766–773. Lecture Notes in Computer Science, vol. 3195, Springer-Verlag.

[58] A. Ilin, H. Valpola, and E. Oja, Nonlinear dynamical factor analysis for state change detection, IEEE Trans. on Neural Networks, 15 (2004), pp. 559–575.

[59] C. Jutten, Calcul Neuromimetique et Traitement du Signal, Analyse en Composantes Independantes, PhD thesis, INPG, Univ. Grenoble, France, 1987. (in French).

[60] C. Jutten and J. Herault, Blind separation of sources, Part I: an adaptive algorithm based on a neuromimetic architecture, Signal Processing, 24 (1991), pp. 1–10.

[61] C. Jutten and J. Karhunen, Advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear mixtures, Int. J. of Neural Systems, 14 (2004), pp. 267–292.

[62] A. Kagan, Y. Linnik, and C. Rao, Extension of Darmois-Skitovic theorem to functions of random variables satisfying an addition theorem, Communications in Statistics, 1 (1973), pp. 471–474.

[63] A. Kankainen, Consistent testing of total independence based on empirical characteristic functions, PhD thesis, University of Jyvaskyla, Jyvaskyla, Finland, 1995.

[64] K. Knuth and E. K. (Eds.), Special issue on Bayesian source separation, Digital Signal Processing, 17 (2007).

[65] M. Korenberg and I. Hunter, The identification of nonlinear biological systems: LNL cascade models, Biol. Cybernetics, 43 (1995), pp. 125–134.

[66] H. Lappalainen, Ensemble learning for independent component analysis, in Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), Aussois, France, 1999, pp. 7–12.

[67] H. Lappalainen and A. Honkela, Bayesian nonlinear independent component analysis by multi-layer perceptrons, in Advances in Independent Component Analysis, M. Girolami, ed., Springer-Verlag, 2000, pp. 93–121.

[68] H. Lappalainen and J. Miskin, Ensemble learning, in Advances in Independent Component Analysis, M. Girolami, ed., Springer-Verlag, Berlin, 2000, pp. 75–92.


[69] A. Larue, C. Jutten, and S. Hosseini, Markovian source separation in non-linear mixtures, in Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA2004), Lecture Notes in Computer Science, vol. 3195, C. Puntonet and A. Prieto, eds., Granada, Spain, September 2004, Springer-Verlag, pp. 702–709.

[70] T.-W. Lee, B. Koehler, and R. Orglmeister, Blind source separation of nonlinear mixing models, in Neural Networks for Signal Processing VII, Proc. of the 1997 IEEE Signal Processing Society Workshop, IEEE Press, 1997, pp. 406–415.

[71] D. Levin, Using state space differential geometry for nonlinear blind source separation, Journal of Applied Physics, 103 (2008). Article ID 044906, 12 pages.

[72] E. Llobet, X. Vilanova, J. Brezmes, and X. Correig, Electrical equivalent models of semiconductor gas sensors using PSpice, Sensors and Actuators B, 77 (2001), pp. 275–280.

[73] E. Lukacs, A characterization of the Gamma distribution, Annals of Mathematical Statistics, 26 (1955), pp. 319–324.

[74] S. Maeda, W.-J. Song, and S. Ishii, Nonlinear and noisy extension of independent component analysis: Theory and its application to a pitch sensation model, Neural Computation, 17 (2005), pp. 115–144.

[75] G. Marques and L. Almeida, Separation of nonlinear mixtures using pattern repulsion, in Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), Aussois, France, 1999, pp. 277–282.

[76] D. Martinez and A. Bray, Nonlinear blind source separation using kernels, IEEE Trans. on Neural Networks, 14 (2003), pp. 228–235.

[77] F. Merrikh-Bayat, M. Babaie-Zadeh, and C. Jutten, A nonlinear blind source separation solution for removing the show-through effect in the scanned documents, in Proc. of the 16th European Signal Processing Conf. (EUSIPCO2008), Lausanne, Switzerland, August 2008.

[78] F. Mokhtari, M. Babaie-Zadeh, and C. Jutten, Blind separating bilinear mixtures using mutual information minimization, in 2009 IEEE Int. Conf. on Machine Learning for Signal Processing (MLSP2009), Grenoble, France, September 2009. Submitted.

[79] K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, An introduction to kernel-based learning algorithms, IEEE Trans. on Neural Networks, 12 (2001), pp. 181–201.


[80] T. Nguyen, J. Patra, and A. Das, A post nonlinear geometric algorithm for independent component analysis, Digital Signal Processing, 15 (2005), pp. 276–294.

[81] T. Nguyen, J. Patra, and S. Emmanuel, gpICA: A novel nonlinear ICA algorithm using geometric linearization, EURASIP Journal on Advances in Signal Processing, (2007). Special issue on Advances in Blind Source Separation, Article ID 31951, 12 pages.

[82] A. Parashiv-Ionescu, C. Jutten, and G. Bouvier, Source separation based processing for integrated Hall sensor arrays, IEEE Sensors Journal, 2 (2002), pp. 663–673.

[83] L. Parra, Symplectic nonlinear component analysis, in Advances in Neural Information Processing Systems 1995, M. M. et al., ed., vol. 8, MIT Press, Cambridge, MA, USA, 1996, pp. 437–443.

[84] L. Parra, G. Deco, and S. Miesbach, Statistical independence and novelty detection with information-preserving nonlinear maps, Neural Computation, 8 (1996), pp. 260–269.

[85] H. Peng, Z. Chi, and W. Siu, A semi-parametric hybrid neural model for nonlinear blind signal separation, Int. J. of Neural Systems, 10 (2000), pp. 79–94.

[86] D. Pham, Fast algorithms for estimating mutual information, entropies and score functions, in Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, eds., Nara, Japan, Apr. 2003, pp. 17–22.

[87] D. Pham, P. Garat, and C. Jutten, Separation of mixtures of independent sources through a maximum likelihood approach, in Signal Processing VI, Theories and Applications, J. Vandewalle, R. Boite, M. Moonen, and A. Oosterlinck, eds., vol. 2, Brussels, Belgium, Aug. 1992, Elsevier, pp. 771–774.

[88] R. Popovic, Hall-Effect Devices, Adam Hilger, Bristol, 1991.

[89] S. Prakriya and D. Hatzinakos, Blind identification of LTI-ZMNL-LTI nonlinear channel models, IEEE Trans. on Signal Processing, 43 (1995), pp. 3007–3013.

[90] C. Puntonet, M. Alvarez, A. Prieto, and B. Prieto, Separation of sources in a class of post-nonlinear mixtures, in Proc. of the 6th European Symp. on Artificial Neural Networks (ESANN'98), Bruges, Belgium, April 1998, pp. 321–326.

[91] S. Roberts and R. Everson, eds., Independent Component Analysis: Principles and Practice, Cambridge Univ. Press, 2001.


[92] S. Senecal, P.-O. Amblard, and L. Cavazzana, Particle filteringequalization method for a satellite communication channel, J. of AppliedSignal Processing, 15 (2004), pp. 2317–2327.

[93] J. Sole, M. Babaie-Zadeh, C. Jutten, and D. Pham, Improvingalgorithm speed in PNL mixture separation and Wiener system inversion,in Proc. of 4th Int. Symp. on Independent Component Analysis and BlindSource Separation (ICA2003), S.-I. Amari, A. Cichocki, S. Makino, andN. Murata, eds., Nara, Japan, April 2003, pp. 639–644.

[94] J. Sole, C. Jutten, and D. Pham, Fast approximation of nonlinearitiesfor improving inversion algorithms of PNL mixtures and Wiener systems,Signal Processing, 85 (2005), pp. 1780–1786.

[95] T. Stockham, T. Cannon, and R. Ingerbretsen, Blind deconvolu-tion through digital signal processing, Proceedings of the IEEE, 63 (1975),pp. 678–692.

[96] A. Taleb, A generic framework for blind source separation in structurednonlinear models, IEEE Trans. on Signal Processing, 50 (2002), pp. 1819–1830.

[97] A. Taleb and C. Jutten, Nonlinear source separation: The post-nonlinear mixtures, in Proc. of the 5th Europ. Symp. on Artificial Neural Networks (ESANN’97), Bruges, Belgium, April 1997, pp. 279–284.

[98] A. Taleb and C. Jutten, Batch algorithm for source separation in post-nonlinear mixtures, in Proc. First Int. Workshop on Independent Component Analysis and Signal Separation (ICA’99), Aussois, France, 1999, pp. 155–160.

[99] A. Taleb and C. Jutten, Source separation in post-nonlinear mixtures, IEEE Trans. on Signal Processing, 47 (1999), pp. 2807–2820.

[100] A. Taleb, J. Sole, and C. Jutten, Quasi-nonparametric blind inversion of Wiener systems, IEEE Trans. on Signal Processing, 49 (2001), pp. 917–924.

[101] Y. Tan, J. Wang, and J. Zurada, Nonlinear blind source separation using a radial basis function network, IEEE Trans. on Neural Networks, 12 (2001), pp. 124–134.

[102] F. Theis and P. Gruber, On model identifiability in analytic postnonlinear ICA, Neurocomputing, 64 (2005), pp. 223–234.

[103] F. Theis and E. Lang, Postnonlinear blind source separation via linearization identification, in Proc. of the 2004 Int. Joint Conf. on Neural Networks (IJCNN2004), Budapest, Hungary, July 2004, pp. 2199–2204.

[104] F. Theis, C. Puntonet, and E. Lang, Nonlinear geometrical ICA, in Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, eds., Nara, Japan, April 2003, pp. 275–280.

[105] S. Van Vaerenbergh and I. Santamaría, A spectral clustering approach to underdetermined postnonlinear blind source separation of sparse sources, IEEE Trans. on Neural Networks, 17 (2006), pp. 811–814.

[106] H. Valpola, Nonlinear independent component analysis using ensemble learning: Theory, in Proc. of the 2nd Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), Helsinki, Finland, 2000, pp. 251–256.

[107] H. Valpola, X. Giannakopoulos, A. Honkela, and J. Karhunen, Nonlinear independent component analysis using ensemble learning: Experiments and discussion, in Proc. of the 2nd Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), Helsinki, Finland, 2000, pp. 351–356.

[108] H. Valpola, A. Honkela, and X. Giannakopoulos, MATLAB codes for the NFA and NDFA algorithms, Available at http://www.cis.hut.fi/projects/bayes/software/, (2002).

[109] H. Valpola and J. Karhunen, An unsupervised ensemble learning method for nonlinear dynamic state-space models, Neural Computation, 14 (2002), pp. 2647–2692.

[110] H. Valpola, E. Oja, A. Ilin, A. Honkela, and J. Karhunen, Nonlinear blind source separation by variational Bayesian learning, IEICE Transactions (Japan), E86-A (2003), pp. 532–541.

[111] D. Vigliano, R. Parisi, and A. Uncini, An information theoretic approach to a novel nonlinear independent component analysis paradigm, Signal Processing, 85 (2005), pp. 997–1028.

[112] F. Vrins and M. Verleysen, On the entropy minimization of a linear mixture of variables for source separation, Signal Processing, 85 (2005), pp. 1029–1044.

[113] C. Wei, W. Woo, and S. Dlay, Nonlinear underdetermined blind signal separation using Bayesian neural network approach, Digital Signal Processing, 17 (2007), pp. 50–68.

[114] H. Yang, S.-I. Amari, and A. Cichocki, Information-theoretic approach to blind separation of sources in non-linear mixture, Signal Processing, 64 (1998), pp. 291–300.

[115] K. Zhang and L.-W. Chan, Extended gaussianization method for blind separation of post-nonlinear mixtures, Neural Computation, 17 (2005), pp. 425–452.

[116] A. Ziehe, M. Kawanabe, S. Harmeling, and K.-R. Müller, Separation of post-nonlinear mixtures using ACE and temporal decorrelation, in Proc. of the 3rd Int. Conf. on Independent Component Analysis and Signal Separation (ICA2001), San Diego, CA, USA, December 2001, pp. 433–438.

[117] A. Ziehe, M. Kawanabe, S. Harmeling, and K.-R. Müller, Blind separation of post-nonlinear mixtures using gaussianizing transformations and temporal decorrelation, in Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Source Separation (ICA2003), S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, eds., Nara, Japan, April 2003, pp. 269–274.