Jeffery-Williams Lecture
On the effectiveness of operator-valued
free probability theory
Roland Speicher
Universität des Saarlandes
Saarbrücken, Germany
joint work with Serban Belinschi, Tobias Mai, John Treilhard,
Carlos Vargas
Once Upon a Time ....
... There Were Large Random Matrices
We are interested in the limiting eigenvalue distribution of an
N ×N random matrix for N →∞.
Typical phenomena for basic random matrix ensembles:
• almost sure convergence to a deterministic limit eigenvalue
distribution
• this limit distribution can be effectively calculated
Wigner Random Matrix
A Wigner random matrix X = (x_ij)_{i,j=1}^N
• is symmetric: X∗ = X
• {x_ij | 1 ≤ i ≤ j ≤ N} are independent and identically distributed
8 eigenvalues of an 8×8 matrix with random ±1 entries
[Figure: eigenvalue histograms for four sign patterns, including the all +1 and all −1 matrices and two random ±1 matrices]
100 eigenvalues of a 100×100 matrix with random ±1/√N entries
[Figure: eigenvalue histograms for three realisations, N = 100]
4000 eigenvalues of a 4000×4000 matrix with random ±1/√N entries
[Figure: eigenvalue histograms for three realisations, N = 4000]
Almost Sure Convergence to a Deterministic
Limit Eigenvalue Distribution
For large N , the eigenvalue distribution of X is with very high
probability very close to a deterministic “limit distribution”.
Wishart Random Matrix
A Wishart random matrix X is of the form X = AA∗ where
• A is an N ×M matrix A = (a_ij)_{i=1,...,N; j=1,...,M}
• all entries are independent and identically distributed:
{a_ij | 1 ≤ i ≤ N, 1 ≤ j ≤ M} are iid
For N →∞, one keeps the ratio λ := N/M fixed.
Eigenvalues of Wishart matrices with λ = 0.25
[Figure: eigenvalue histograms for N = 100, N = 1000, and three realisations with N = 3000]
Almost Sure Convergence to a Deterministic
Limit Eigenvalue Distribution
For large N , the eigenvalue distribution of X is with very high
probability (for generic choices of X) very close to a deterministic
“limit distribution”, which depends on λ.
We are interested in the limiting eigenvalue distribution of an
N ×N random matrix for N →∞.
Typical phenomena for basic random matrix ensembles:
• almost sure convergence to a deterministic limit eigenvalue distribution
• this limit distribution can be effectively calculated
The Cauchy (or Stieltjes) Transform
For any probability measure µ on R we define its Cauchy transform
G(z) := ∫_R 1/(z − t) dµ(t)
This is an analytic function G : C+ → C− and we can recover µ from G by the Stieltjes inversion formula
dµ(t) = −(1/π) lim_{ε→0} Im G(t + iε) dt
For our basic random matrix ensembles one can derive equations for the Cauchy transform of the limiting eigenvalue distribution, solve those equations and then get the density via Stieltjes inversion:
Wigner random matrix
G(z)² + 1 = zG(z),
which can be solved as
G(z) = (z − √(z² − 4))/2, thus dµ_s(t) = (1/(2π))√(4 − t²) dt
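As a small numerical sanity sketch (not from the lecture): the closed form above satisfies the quadratic equation exactly, and Stieltjes inversion recovers the semicircle density. The branch-selection step is an implementation detail chosen so that G(z) ~ 1/z on C+.

```python
import numpy as np

def G_semicircle(z):
    # Cauchy transform of the standard semicircle law;
    # pick the square-root branch with G(z) ~ 1/z on C^+
    s = np.sqrt(z * z - 4)
    if np.imag(s) * np.imag(z) < 0:
        s = -s
    return (z - s) / 2

# the quadratic equation G(z)^2 + 1 = z G(z) at a test point in C^+
z = 0.7 + 0.5j
g = G_semicircle(z)
assert abs(g * g + 1 - z * g) < 1e-12

# Stieltjes inversion: density(t) = -(1/pi) Im G(t + i*eps) for small eps
t = 1.0
density = -np.imag(G_semicircle(t + 1e-8j)) / np.pi
exact = np.sqrt(4 - t * t) / (2 * np.pi)
assert abs(density - exact) < 1e-6
```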
Wigner random matrix and Wigner’s semicircle
[Figure: eigenvalue histogram of a Wigner matrix with the semicircle density overlaid]
For our basic random matrix ensembles one can derive equations for the Cauchy transform of the limiting eigenvalue distribution, solve those equations and then get the density via Stieltjes inversion:
Wishart random matrix
λ/(1 − G(z)) + 1/G(z) = z,
which can be solved as
G(z) = (z + 1 − λ − √((z − (1 + λ))² − 4λ))/(2z) and thus
dµ(t) = (1/(2πλt))√(4λ − (t − (1 + λ))²) dt
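Again a small numerical sketch (not from the lecture): the closed-form G above satisfies the stated equation. Since both roots of the underlying quadratic zG² − (z + 1 − λ)G + 1 = 0 satisfy it, this check is independent of the branch choice.

```python
import numpy as np

lam = 0.25  # the ratio lambda from the slides

def G_wishart(z):
    # closed-form solution of the equation lam/(1 - G) + 1/G = z
    return (z + 1 - lam - np.sqrt((z - (1 + lam)) ** 2 - 4 * lam)) / (2 * z)

z = 3.0 + 1.0j
g = G_wishart(z)
assert abs(lam / (1 - g) + 1 / g - z) < 1e-10
```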
Wishart random matrix and the Marchenko-Pastur distribution
[Figure: eigenvalue histogram of a Wishart matrix with the Marchenko-Pastur density overlaid]
The Saga Begins ...
.... Consider Functions of Several Independent
Random Matrices
We are interested in the limiting eigenvalue distribution of
functions of several N ×N random matrices for N →∞.
Typical phenomena:
• almost sure convergence to a deterministic limit eigenvalue
distribution
• this limit distribution can be effectively calculated
Wigner + Wishart random matrices, N = 3000
[Figure: eigenvalue histograms for three realizations of the sum of a Wigner and a Wishart matrix, N = 3000]
We are interested in the limiting eigenvalue distribution of
functions of several N ×N random matrices for N →∞.
Typical phenomena:
• almost sure convergence to a deterministic limit eigenvalue
distribution
• this limit distribution can be effectively calculated only in
very simple situations
For simple situations one can derive equations for the Cauchy transform of the limiting eigenvalue distribution; those can usually not be solved explicitly; however, as fixed point equations they have good analytic behaviour and can be solved numerically by iteration algorithms.
Wigner + Wishart: For G(z) := G_{Wigner+Wishart}(z) one finds the fixed point equation (in subordination form)
G(z) = G_{Wishart}(z − G(z)),
which can be easily solved by iteration.
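The iteration is easy to sketch numerically (a minimal illustration, using the closed-form Wishart Cauchy transform from the earlier slide; the ratio λ = 0.25 and the test point are arbitrary choices):

```python
import numpy as np

lam = 0.25  # ratio lambda = N/M, as on the Wishart slide

def G_wishart(z):
    # closed-form Cauchy transform of the limiting Wishart distribution,
    # choosing the square-root branch so that G(z) ~ 1/z on C^+
    s = np.sqrt((z - (1 + lam)) ** 2 - 4 * lam)
    if np.imag(s) * np.imag(z) < 0:
        s = -s
    return (z + 1 - lam - s) / (2 * z)

def G_sum(z, iterations=100):
    # subordination fixed point: G(z) = G_wishart(z - G(z))
    g = 1 / z  # initial guess with the right asymptotics
    for _ in range(iterations):
        g = G_wishart(z - g)
    return g

z = 2.0 + 2.0j
g = G_sum(z)
assert abs(g - G_wishart(z - g)) < 1e-10  # the fixed point equation holds
assert np.imag(g) < 0  # a Cauchy transform maps C^+ to C^-
```

For Im z = 2, the iteration map is a strict contraction (|G′| ≤ 1/Im(z)² ≤ 1/4 for a probability measure), so the convergence here is guaranteed and fast; the density is then obtained by Stieltjes inversion near the real axis.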
Wigner + Wishart random matrices, N = 3000
[Figure: eigenvalue histogram of Wigner + Wishart, N = 3000]
Results for Calculations of the Limit Eigenvalue
Distribution
• Marchenko, Pastur 1967: general Wishart matrices ADA∗
• Pastur 1972: deterministic + Wigner (deformed semicircle)
• Speicher, Nica 1998; Vasilchuk 2003: commutator or anti-commutator: X1X2 ± X2X1
• more general models in wireless communications (Tulino, Verdu 2004; Couillet, Debbah, Silverstein 2011): RADA∗R or ∑_i R_iA_iD_iA_i∗R_i
The Quest:
But What About More Complicated or Even
General Selfadjoint Polynomials
.... something like
P(X,Y ) = XY + YX + X²
or
P(X1, X2, X3) = X1X2X1 + X2X3X2 + X3X1X3
or even just
P(X1, . . . , Xk), P a selfadjoint polynomial
P(X,Y ) = XY + YX + X², for independent X, Y ; X is Wigner, Y is Wishart
[Figure: eigenvalue histograms for N = 100, N = 300, and two realizations with N = 3000]
The Hero:
Free Probability Theory
Definition of Freeness (Voiculescu 1985)
Let (A, ϕ) be a non-commutative probability space, i.e., A is a unital algebra and ϕ : A → C is a unital linear functional (i.e., ϕ(1) = 1).
Unital subalgebras Ai (i ∈ I) are free or freely independent, if ϕ(a1 · · · an) = 0 whenever
• ai ∈ Aj(i), j(i) ∈ I for all i, and j(1) ≠ j(2) ≠ · · · ≠ j(n)
• ϕ(ai) = 0 for all i
Random variables x1, . . . , xn ∈ A are freely independent if their generated unital subalgebras Ai := algebra(1, xi) are so.
What Is Freeness?
Freeness between x and y is an infinite set of equations relating various moments in x and y:
ϕ(p1(x) q1(y) p2(x) q2(y) · · · ) = 0
Basic observation: free independence between x and y is actually a rule for calculating mixed moments in x and y from the moments of x and the moments of y:
ϕ(x^{m1} y^{n1} x^{m2} y^{n2} · · · ) = polynomial(ϕ(x^i), ϕ(y^j))
If x and y are freely independent, then we have
ϕ(x^m y^n) = ϕ(x^m) · ϕ(y^n)
ϕ(x^{m1} y^n x^{m2}) = ϕ(x^{m1+m2}) · ϕ(y^n)
but also
ϕ(xyxy) = ϕ(x²) · ϕ(y)² + ϕ(x)² · ϕ(y²) − ϕ(x)² · ϕ(y)²
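The mixed-moment formula can be checked numerically against large independent random matrices, anticipating the asymptotic freeness result quoted later (an illustrative sketch only; the matrix size, the seed, and the shift 0.5·I are arbitrary choices, and equality holds only up to finite-N fluctuations):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

def wigner(n):
    # symmetric random matrix with entries of variance 1/n (semicircular limit)
    a = rng.standard_normal((n, n)) / np.sqrt(n)
    return (a + a.T) / np.sqrt(2)

X = wigner(N)
Y = wigner(N) + 0.5 * np.eye(N)  # an independent shifted Wigner, so phi(y) != 0

def tr(a):
    # normalized trace, playing the role of phi = (1/N) Tr
    return np.trace(a) / N

lhs = tr(X @ Y @ X @ Y)
# freeness rule: phi(xyxy) = phi(x^2) phi(y)^2 + phi(x)^2 phi(y^2) - phi(x)^2 phi(y)^2
rhs = tr(X @ X) * tr(Y) ** 2 + tr(X) ** 2 * tr(Y @ Y) - tr(X) ** 2 * tr(Y) ** 2
assert abs(lhs - rhs) < 0.05
```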
Free independence is a rule for calculating mixed moments,
analogous to the concept of independence for random variables.
Note: free independence is a different rule from classical indepen-
dence; free independence occurs typically for non-commuting
random variables, like operators on Hilbert spaces
Consequence: The Distribution of a Polynomial in
Freely Independent Variables Is Determined by the
Distributions of the Variables
If x1, . . . , xk are freely independent, and p is a polynomial in k
variables, then the distribution of p(x1, . . . , xk) is determined by
the moments of each of the xi and by the fact that they are
freely independent.
Where Does Free Independence Show Up?
• generators of the free group in the corresponding free group
von Neumann algebras L(Fn)
• creation and annihilation operators on full Fock spaces
• for many classes of random matrices
Asymptotic Freeness of Random Matrices
Basic result of Voiculescu (1991):
Large classes of independent random matrices (like Wigner or Wishart matrices) become asymptotically freely independent, with respect to ϕ = (1/N)Tr, as N →∞.
Consequence: Reduction of Our Random Matrix
Problem to the Problem of Polynomials in Freely
Independent Variables
If the random matrices X1, . . . , Xk are asymptotically freely independent, then the distribution of a polynomial p(X1, . . . , Xk) is
asymptotically given by the distribution of p(x1, . . . , xk), where
• x1, . . . , xk are freely independent variables, and
• the distribution of xi is the asymptotic distribution of Xi
Can We Actually Calculate Polynomials in
Freely Independent Variables?
Free probability can deal effectively with simple polynomials
• the sum of variables (Voiculescu 1986, R-transform)
p(x, y) = x+ y
• the product of variables (Voiculescu 1987, S-transform)
p(x, y) = xy (= √x y √x)
• the commutator of variables (Nica, Speicher 1998)
p(x, y) = xy − yx
There is no hope of effectively calculating more
complicated or general polynomials in freely
independent variables with usual free probability
theory!
The Superhero:
Operator-Valued Extension of Free
Probability
Let B ⊂ A. A linear map
E : A → B
is a conditional expectation if
E[b] = b ∀b ∈ B
and
E[b1ab2] = b1E[a]b2 ∀a ∈ A, ∀b1, b2 ∈ B
An operator-valued probability space consists of B ⊂ A and a
conditional expectation E : A → B
Consider an operator-valued probability space E : A → B.
Random variables xi ∈ A (i ∈ I) are freely independent with
respect to E (or operator-valued freely independent) if
E[a1 · · · an] = 0
whenever ai ∈ B〈xj(i)〉 are polynomials in some xj(i) with coefficients from B and
E[ai] = 0 for all i and j(1) ≠ j(2) ≠ · · · ≠ j(n).
Calculation Rule for Mixed Moments
For operator-valued freely independent variables, one has analogous formulas as in the scalar-valued case, ...
The formula
ϕ(xyxy) = ϕ(xx)ϕ(y)ϕ(y) + ϕ(x)ϕ(x)ϕ(yy) − ϕ(x)ϕ(y)ϕ(x)ϕ(y)
has now to be written as
E[xyxy] = E[x E[y] x] · E[y] + E[x] · E[y E[x] y] − E[x]E[y]E[x]E[y]
Can We Actually Calculate Polynomials in
Operator-Valued Freely Independent Variables?
Again, in principle all operator-valued polynomials in freely independent variables are determined, but effectively we can again only deal with simple polynomials:
• the sum of variables (Voiculescu 1995; Belinschi, Mai, Speicher 2012)
• the product of variables (Voiculescu 1995; Dykema 2006; Belinschi, Speicher, Treilhard, Vargas 2012)
The Miracle:
The Linearization Trick
Operator-Valued Polynomials Are Matrices of
Polynomials
Operator-valued polynomials in variables x1, . . . , xk are matrices
with entries given by polynomials in those random variables:
( p11(x1, . . . , xk) · · · p1r(x1, . . . , xk) )
( ...                 . . . ...                )
( pr1(x1, . . . , xk) · · · prr(x1, . . . , xk) )
The Linearization Philosophy:
In order to understand matrices of polynomials it suffices to understand (bigger) matrices of linear polynomials.
In particular, in order to understand polynomials in non-commuting variables, it suffices to understand matrices of linear polynomials in those variables.
• Voiculescu 1987: motivation
• Haagerup, Thorbjørnsen 2005: largest eigenvalue
• Anderson 2012: the selfadjoint version
The selfadjoint linearization of p = xy + yx + x² is
p =
( 0        x    y + x/2 )
( x        0    −1      )
( y + x/2  −1   0       )
This means: the Cauchy transform Gp(z) of p = xy + yx + x²
is given as the (1,1)-entry of the operator-valued (3×3 matrix)
Cauchy transform of p:
Gp(b) = id⊗ϕ[(b − p)⁻¹] =
( Gp(z) ∗ ∗ )
( ∗     ∗ ∗ )
( ∗     ∗ ∗ )
for b =
( z 0 0 )
( 0 0 0 )
( 0 0 0 )
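For commuting (scalar) x and y the linearization claim can be checked directly by a Schur complement computation (a sanity sketch only; in the actual theory x and y are non-commuting operators and one applies id⊗ϕ):

```python
import numpy as np

x, y, z = 0.7, 1.3, 2.0 + 1.0j  # scalar stand-ins for x, y and a point z in C^+

# selfadjoint linearization of p = xy + yx + x^2
p_hat = np.array([
    [0,         x,  y + x / 2],
    [x,         0,  -1       ],
    [y + x / 2, -1, 0        ],
], dtype=complex)

b = np.diag([z, 0, 0])

# the (1,1)-entry of (b - p_hat)^{-1} should equal 1/(z - p)
p = x * y + y * x + x**2
entry = np.linalg.inv(b - p_hat)[0, 0]
assert abs(entry - 1 / (z - p)) < 1e-12
```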
But
p =
( 0        x    y + x/2 )
( x        0    −1      )
( y + x/2  −1   0       )
= x + y
with
x =
( 0    x   x/2 )
( x    0   0   )
( x/2  0   0   )
and y =
( 0  0   y  )
( 0  0   −1 )
( y  −1  0  )
So p is just the sum of two operator-valued variables
p =
( 0    x   x/2 )
( x    0   0   )
( x/2  0   0   )
+
( 0  0   y  )
( 0  0   −1 )
( y  −1  0  )
where we understand the operator-valued distributions of x and of y.
Are x and y freely independent?
Another Miracle
Matrices of Freely Independent Variables are Matrix-Valued Freely Independent
If x and y are freely independent with respect to ϕ, then for any polynomials pij in x and any polynomials qkl in y the matrices
( p11(x) . . . p1r(x) )
( ...    . . . ...    )
( pr1(x) . . . prr(x) )
and
( q11(y) . . . q1r(y) )
( ...    . . . ...    )
( qr1(y) . . . qrr(y) )
are free with respect to id ⊗ ϕ.
The Final Battle:
Algorithm and Calculation for Arbitrary
Selfadjoint Polynomial in Freely
Independent Variables
Input: p(x, y), Gx(z), Gy(z)
↓
Linearize p(x, y) to p = x + y
↓
Compute Gx(b) out of Gx(z) and Gy(b) out of Gy(z)
↓
Get ω1(b) as the fixed point of the iteration
w ↦ Gy(b + Gx(w)⁻¹ − w)⁻¹ − (Gx(w)⁻¹ − w)
↓
Gp(b) = Gx(ω1(b))
↓
Recover Gp(z) as one entry of Gp(b)
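The subordination step can be illustrated in the scalar case (a hedged sketch: here Gx and Gy are ordinary Cauchy transforms of two standard semicircle laws, so the fixed point must reproduce the semicircle of variance 2, the free convolution of the two; the operator-valued algorithm replaces these numbers by matrices):

```python
import numpy as np

def G_semi(z, var=1.0):
    # Cauchy transform of a semicircle law of variance var,
    # with the square-root branch giving G(z) ~ 1/z on C^+
    s = np.sqrt(z * z - 4 * var)
    if np.imag(s) * np.imag(z) < 0:
        s = -s
    return (z - s) / (2 * var)

def h(G, w):
    # the "h-transform" h(w) = G(w)^{-1} - w appearing in the iteration
    return 1 / G(w) - w

def omega1(b, Gx, Gy, iterations=200):
    # fixed point of w -> Gy(b + h_x(w))^{-1} - h_x(w)
    w = b
    for _ in range(iterations):
        w = 1 / Gy(b + h(Gx, w)) - h(Gx, w)
    return w

Gx = Gy = G_semi
b = 1.0 + 2.0j
g = Gx(omega1(b, Gx, Gy))
# the result agrees with the semicircle of variance 2
assert abs(g - G_semi(b, var=2.0)) < 1e-8
```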
P(X,Y ) = XY + YX + X², for independent X, Y ; X is Wigner and Y is Wishart
[Figure: eigenvalue histogram with the computed distribution of p(x, y) = xy + yx + x² overlaid, for free x, y; x semicircular, y Marchenko-Pastur]
P(X1, X2, X3) = X1X2X1 + X2X3X2 + X3X1X3, for independent X1, X2, X3; X1, X2 Wigner, X3 Wishart
[Figure: eigenvalue histogram with the computed distribution of p(x1, x2, x3) = x1x2x1 + x2x3x2 + x3x1x3 overlaid, for free x1, x2, x3; x1, x2 semicircular, x3 Marchenko-Pastur]
The Happy End
Theorem (Belinschi, Mai, Speicher 2012):
Combining the selfadjoint linearization trick with our
new analysis of operator-valued free convolution, we
can provide an efficient and analytically controllable
algorithm for calculating the asymptotic eigenvalue
distribution of any selfadjoint polynomial in
asymptotically free random matrices.