Complex-Valued Matrix Differentiation:

Techniques and Key Results

Are Hjørungnes, Senior Member, IEEE, and David Gesbert, Senior Member, IEEE.

Abstract

A systematic theory is introduced for finding the derivatives of complex-valued matrix functions with respect

to a complex-valued matrix variable and the complex conjugate of this variable. In the framework introduced,

the differential of the complex-valued matrix function is used to identify the derivatives of this function. Matrix

differentiation results are derived and summarized in tables which can be exploited in a wide range of signal

processing related situations. The Hessian matrix of a scalar complex function is also defined, and it is shown how

it can be obtained from the second-order differential of the function. The methods given are general, so many

other results can be derived using the framework introduced, although only important cases are treated.

Keywords: Complex differentials, non-analytical complex functions, complex matrix derivatives, complex Hessian.

I. INTRODUCTION

In many engineering problems, the unknown parameters are complex-valued vectors and matrices and, often, the

task of the system designer is to find the values of these complex parameters which optimize a chosen criterion

function. Examples of areas where these types of optimization problems appear include telecommunications,

where digital filters can contain complex-valued coefficients [1], electric circuits [2], control theory [3], adaptive

filters [4], acoustics, optics, mechanical vibrating systems, heat conduction, fluid flow, and electrostatics [5]. For

solving this kind of optimization problem, one approach is to find necessary conditions for optimality. When a

scalar real-valued function depends on a complex-valued matrix parameter, the necessary conditions for optimality

Corresponding author: Are Hjørungnes is with UniK - University Graduate Center, University of Oslo, Instituttveien 25, P. O. Box 70,

N-2027 Kjeller, Norway, email: [email protected].

David Gesbert is with Mobile Communications Department, Eurécom Institute, 2229 Route des Crêtes, BP 193, F-06904 Sophia Antipolis

Cédex, France, email: [email protected].

This work was supported by the Research Council of Norway and the French Ministry of Foreign Affairs through the Aurora project

entitled "Optimization of Broadband Wireless Communications Network".


can be found by either setting the derivative of the function with respect to the complex-valued matrix parameter or

its complex conjugate to zero. Differentiation results are well-known for certain classes of functions, e.g., quadratic

functions, but can be tricky for others. This paper provides the tools for finding derivatives in a systematic way.

In an effort to build adaptive optimization algorithms, it will also be shown that the direction of maximum rate

of change of a real-valued scalar function, with respect to the complex-valued matrix parameter, is given by the

derivative of the function with respect to the complex conjugate of the complex-valued input matrix parameter. Of

course, this is a generalization of a well-known result for scalar functions of vector variables. A general framework

is introduced here showing how to find the derivative of complex-valued scalar-, vector-, or matrix functions with

respect to the complex-valued input parameter matrix and its complex conjugate. The main contribution of this

paper lies in the proposed approach for how to obtain derivatives in a way that is both simple and systematic,

based on the so-called differential of the function. In this paper, it is assumed that the functions are differentiable

with respect to the complex-valued parameter matrix and its complex conjugate, and it will be seen that these two

parameter matrices should be treated as independent when finding the derivatives, as is classical for scalar variables.

It is also shown how the Hessian matrix, i.e., the second-order derivative, of a complex scalar function can be calculated.

The proposed theory is useful when solving numerous problems which involve optimization when the unknown

parameter is a complex-valued matrix.

The problem at hand has been treated for real-valued matrix variables in [6], [7], [8], [9], [10]. Four additional

references that give a brief treatment of the case of real-valued scalar functions which depend on complex-valued vectors

are Appendix B of [11], Appendix 2.B in [12], Subsection 2.3.10 of [13], and the article [14]. The article [15]

serves as an introduction to this area for complex-valued scalar functions with complex-valued argument vectors.

Results on complex differentiation theory are given in [16], [17] for differentiation with respect to complex-valued

scalars and vectors; however, the more general matrix case is not considered. Examples of problems where the

unknown matrix is complex-valued are wide-ranging, including precoding of MIMO systems [18], linear

equalization design [19], and array signal processing [20], to cite only a few.

Some of the most relevant applications to signal and communication problems are presented here, with key results

highlighted and other illustrative examples listed in tables. The complex Hessian matrix is defined for


TABLE I
CLASSIFICATION OF FUNCTIONS.

| Function type | Scalar variables z, z* ∈ C | Vector variables z, z* ∈ C^(N×1) | Matrix variables Z, Z* ∈ C^(N×Q) |
|---|---|---|---|
| Scalar function f ∈ C | f(z, z*), f : C × C → C | f(z, z*), f : C^(N×1) × C^(N×1) → C | f(Z, Z*), f : C^(N×Q) × C^(N×Q) → C |
| Vector function f ∈ C^(M×1) | f(z, z*), f : C × C → C^(M×1) | f(z, z*), f : C^(N×1) × C^(N×1) → C^(M×1) | f(Z, Z*), f : C^(N×Q) × C^(N×Q) → C^(M×1) |
| Matrix function F ∈ C^(M×P) | F(z, z*), F : C × C → C^(M×P) | F(z, z*), F : C^(N×1) × C^(N×1) → C^(M×P) | F(Z, Z*), F : C^(N×Q) × C^(N×Q) → C^(M×P) |

complex scalar functions, and it is shown how it can be obtained from the second-order differential of the functions.

Hessian matrices can be used to check if a stationary point is a minimum, maximum, or a saddle point.

The rest of this paper is organized as follows: Section II contains a discussion on the differences between analytical

functions, which are usually studied in mathematical courses for engineers, and non-analytical functions, which

are often encountered when dealing with engineering problems of complex variables. In Section III, the complex

differential is introduced, and based on this differential, the definition of the derivatives of a complex-valued matrix

function with respect to the complex-valued matrix argument and its complex conjugate is given in Section IV. The

key procedure showing how the derivatives can be found from the differential of a function is also presented in

Section IV. Section V contains important results such as the chain rule, equivalent conditions for finding stationary

points, and in which direction the function has the maximum rate of change. In Section VI, several key results

are placed in tables and some results are derived for various cases with high relevance for signal processing and

communication problems. The Hessian matrix of scalar complex-valued functions dependent on complex matrices

is defined in Section VII, and it is shown how it can be obtained. Section VIII contains some conclusions. Some

of the proofs are given in the appendices.

Notation: The following conventions are always used in this article: Scalar quantities (variables z or functions f )

are denoted by lowercase symbols, vector quantities (variables z or functions f ) are denoted by lowercase boldface

symbols, and matrix quantities (variables Z or functions F ) are denoted by capital boldface symbols. The types

of functions used throughout this paper are classified in Table I. From the table, it is seen that all the functions

depend on a complex variable and the complex conjugate of the same variable. Let j = √−1 be the imaginary

unit. Let the symbol z = x + jy denote a complex variable, where the real and imaginary parts are denoted by

x and y, respectively. The real and imaginary operators return the real and imaginary parts of the input matrix,


respectively. These operators are denoted by Re{·} and Im{·}. If Z ∈ C^(N×Q) is a complex-valued¹ matrix, then Z = Re{Z} + j Im{Z} and Z* = Re{Z} − j Im{Z}, where Re{Z} ∈ R^(N×Q), Im{Z} ∈ R^(N×Q), and the operator (·)* denotes the complex conjugate of the matrix it is applied to. The real and imaginary operators can be expressed as Re{Z} = (1/2)(Z + Z*) and Im{Z} = (1/(2j))(Z − Z*).

II. ANALYTICAL VERSUS NON-ANALYTICAL FUNCTIONS

Mathematical courses on complex functions for engineers often only involve the analysis of analytical functions:

Definition 1: Let D ⊆ C be the domain of definition of the function f : D → C. The function f, which depends on a complex scalar variable z, is an analytical function in the domain D [5] if

lim_(Δz→0) [f(z + Δz) − f(z)] / Δz

exists for all z ∈ D.

If f(z) satisfies the Cauchy-Riemann equations [5], then it is analytical. A function that is complex differentiable is also named analytic, holomorphic, or regular. The Cauchy-Riemann condition can be formulated as the single equation ∂f/∂z* = 0, where the derivatives of f with respect to z* and z are found by treating z and z*, respectively, as constants, i.e., ∂z/∂z* = 0 and ∂z*/∂z = 0; see also Definition 2, where this kind of partial derivative is defined in more detail. From ∂f/∂z* = 0, it is seen that an analytical function f does not depend on the variable z*. This can also be seen from Theorem 1, page 804 in [5], which states that any analytical function f(z) can be written as an infinite power series² with non-negative exponents of the complex variable z; this power series is called the Taylor series. The series does not contain any terms that depend on z*.

The derivative of a complex-valued function in mathematical courses on complex analysis for engineers is often defined only for analytical functions. However, in many engineering problems, the functions of interest are often not analytic, since they are often real-valued. Conversely, if a function depends only on z, like an analytical function, and not implicitly or explicitly on z*, then this function cannot in general be real-valued, since the function can only be real if the imaginary parts of z can be eliminated, and this is handled by the terms that depend on z*. A treatment of how to find the derivative of real functions dependent on complex variables, different from the one used for analytical functions, is therefore needed. In this article, a systematic theory is provided for finding the derivative of non-analytical functions.

1R and C are the sets of the real and complex numbers, respectively.

²A power series in the variable z ∈ C is an infinite sum of the form Σ_(n=0)^∞ a_n (z − z0)^n, where a_n, z0 ∈ C [5].


A. Euclidean Distance

In engineering problems, the squared Euclidean distance is often used. Let f(z) = |z|² = zz*. If the traditional definition of the derivative given in Definition 1 is used, then the function f is not differentiable because:

lim_(Δz→0) [f(z0 + Δz) − f(z0)] / Δz = lim_(Δz→0) [|z0 + Δz|² − |z0|²] / Δz = lim_(Δz→0) [(Δz) z0* + z0 (Δz)* + Δz (Δz)*] / Δz. (1)

This limit does not exist because different values are found depending on how Δz approaches 0. Let Δz = Δx + jΔy. First, let Δz approach zero such that Δx = 0; then the last fraction in (1) is [j(Δy) z0* − j z0 Δy + (Δy)²] / (jΔy) = z0* − z0 − jΔy, which approaches z0* − z0 = −2j Im{z0} when Δy → 0. Second, let Δz approach zero such that Δy = 0; then the last fraction in (1) is [(Δx) z0* + z0 Δx + (Δx)²] / Δx = z0* + z0 + Δx, which approaches z0 + z0* = 2 Re{z0} when Δx → 0. For any non-zero complex number z0, Re{z0} ≠ −j Im{z0}, and therefore the function f(z) = |z|² is not differentiable when using the commonly encountered definition given in Definition 1.

There are two alternative ways [13] to find the derivative of a scalar real-valued function f ∈ R with respect to the complex-valued matrix parameter Z ∈ C^(N×Q). The standard way is to rewrite f as a function of the real part X and imaginary part Y of the complex variable Z, and then to find the derivatives of the rewritten function with respect to these two independent real variables, X and Y, separately. Notice that the NQ independent complex parameters in Z correspond to 2NQ independent real parameters in X and Y. In this paper, we bring the reader's attention to an alternative approach that can lead to a simpler derivation. In this approach, one treats the differentials of the variables Z and Z* as independent, in the way that will be shown in Lemma 1; see also [15]. It will be shown later that both the derivatives of f with respect to Z and Z* can be identified from the differential of f.

B. Formal Partial Derivatives

Definition 2: The derivatives with respect to z and z* of f at z0 ∈ C are called the formal partial derivatives of f at z0. Let z = x + jy, where x, y ∈ R; then the formal derivatives, or Wirtinger derivatives [21], are defined as:

∂/∂z f(z0) ≜ (1/2)(∂/∂x f(z0) − j ∂/∂y f(z0))  and  ∂/∂z* f(z0) ≜ (1/2)(∂/∂x f(z0) + j ∂/∂y f(z0)). (2)

Alternatively, when finding ∂/∂z f(z0) and ∂/∂z* f(z0), the parameters z and z* are treated as independent variables; see [15] for more details.
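As a quick illustration (a numerical sketch, not part of the paper), the Wirtinger derivatives in (2) can be checked with finite differences for f(z) = zz*, whose formal derivatives, found in Example 1 below, are ∂f/∂z = z* and ∂f/∂z* = z:

```python
# A minimal numerical sketch: verify the Wirtinger derivatives (2) of
# f(z) = z z* = |z|^2 by finite differences on the real and imaginary parts.
import numpy as np

def f(z):
    return z * np.conj(z)

z0 = 1.5 + 0.7j
h = 1e-6

# Real partial derivatives of f (f is real-valued here).
df_dx = (f(z0 + h) - f(z0 - h)).real / (2 * h)
df_dy = (f(z0 + 1j * h) - f(z0 - 1j * h)).real / (2 * h)

# Formal (Wirtinger) derivatives assembled according to (2).
df_dz     = 0.5 * (df_dx - 1j * df_dy)   # expected: z0*
df_dzconj = 0.5 * (df_dx + 1j * df_dy)   # expected: z0

print(np.isclose(df_dz, np.conj(z0)), np.isclose(df_dzconj, z0))  # True True
```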


From Definition 2, it follows that ∂/∂x f(z0) and ∂/∂y f(z0) can be found as:

∂/∂x f(z0) = ∂/∂z f(z0) + ∂/∂z* f(z0)  and  ∂/∂y f(z0) = j (∂/∂z f(z0) − ∂/∂z* f(z0)). (3)

If f is dependent on several variables, Definition 2 can be extended. Later in this article, it will be shown how

the derivatives, with respect to the complex-valued matrix parameter and its complex conjugate, of all the function

types given in Table I can be identified from the differentials of these functions.

Example 1: Let f : C × C → R be f(z, z*) = zz*. f is differentiable³ with respect to both variables z and z*, and ∂/∂z f(z, z*) = z* and ∂/∂z* f(z, z*) = z. When the complex variable z and its complex conjugate twin z* are treated as independent variables [15] when finding the derivatives of f, then f is differentiable in both of these variables. However, as shown earlier in this section, the same function is not differentiable when studying analytical functions defined according to Definition 1.

III. COMPLEX DIFFERENTIALS

The differential has the same size as the matrix it is applied to. The differential can be found component-wise, that is, (dZ)_(k,l) = d(Z)_(k,l). A procedure that can often be used for finding the differentials of a complex matrix function⁴ F(Z0, Z1) is to calculate the difference

F(Z0 + dZ0, Z1 + dZ1) − F(Z0, Z1) = First-order(dZ0, dZ1) + Higher-order(dZ0, dZ1), (4)

where First-order(·,·) returns the terms that depend on either dZ0 or dZ1 to the first order, and Higher-order(·,·) returns the terms that depend on higher-order terms in dZ0 and dZ1. The differential is then given by First-order(·,·), i.e., the first-order term of F(Z0 + dZ0, Z1 + dZ1) − F(Z0, Z1). As an example, let F(Z0, Z1) = Z0 Z1. Then the difference in (4) can be developed and readily expressed as: F(Z0 + dZ0, Z1 + dZ1) − F(Z0, Z1) = Z0 dZ1 + (dZ0)Z1 + (dZ0)(dZ1). The differential of Z0 Z1 can then be identified as all the terms of first order in either dZ0 or dZ1: d(Z0 Z1) = Z0 dZ1 + (dZ0)Z1.
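The identification in (4) is easy to check numerically; the following sketch (an assumed example, not from the paper) confirms that the residual after removing the first-order terms is of second order:

```python
# A numerical sketch: for F(Z0, Z1) = Z0 Z1, the first-order part of
# F(Z0 + dZ0, Z1 + dZ1) - F(Z0, Z1) identified via (4) is (dZ0) Z1 + Z0 (dZ1).
import numpy as np

rng = np.random.default_rng(0)
def cmat(n, q):
    return rng.standard_normal((n, q)) + 1j * rng.standard_normal((n, q))

Z0, Z1 = cmat(3, 4), cmat(4, 2)
eps = 1e-6
dZ0, dZ1 = eps * cmat(3, 4), eps * cmat(4, 2)

exact_diff  = (Z0 + dZ0) @ (Z1 + dZ1) - Z0 @ Z1
first_order = dZ0 @ Z1 + Z0 @ dZ1

# The residual is the higher-order term (dZ0)(dZ1), of size O(eps^2).
print(np.linalg.norm(exact_diff - first_order))  # ~1e-12
```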

Definition 3: The Moore-Penrose inverse of Z ∈ C^(N×Q) is denoted by Z⁺ ∈ C^(Q×N), and it is defined through the following four relations [22]:

(ZZ⁺)^H = ZZ⁺, (5)

³Here, Definition 2 is used for finding the formal partial derivatives. ⁴The indexes are chosen to start with 0 everywhere in this article.


(Z⁺Z)^H = Z⁺Z, (6)

ZZ⁺Z = Z, (7)

Z⁺ZZ⁺ = Z⁺, (8)

where the operator (·)H is the Hermitian operator, or the complex conjugate transpose.

Let ⊗ and ⊙ denote the Kronecker and Hadamard product [23], respectively. Some of the most important rules on complex differentials are listed in Table II, assuming A, B, and a to be constants, and Z, Z0, and Z1 to be complex-valued matrix variables. The vectorization operator vec(·) stacks the column vectors of the argument matrix into a long column vector in chronological order [23]. The differentiation rule for the reshaping operator reshape(·) in Table II is valid for any linear reshaping⁵ operator, and examples of such operators are the transpose (·)^T and vec(·). Some of the basic differential results in Table II can be derived by means of (4), and others can be derived by generalizing some of the results found in [7], [9] to the complex differential case.

Differential of the Moore-Penrose Inverse: The differential of the real-valued Moore-Penrose inverse can be found in [7], [8], but the complex-valued version is derived here.

Proposition 1: Let Z ∈ C^(N×Q); then

dZ⁺ = −Z⁺(dZ)Z⁺ + Z⁺(Z⁺)^H (dZ^H)(I_N − ZZ⁺) + (I_Q − Z⁺Z)(dZ^H)(Z⁺)^H Z⁺. (9)

The proof of Proposition 1 can be found in Appendix I.
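Proposition 1 lends itself to a quick numerical sanity check (a sketch, not from the paper, under the assumption that the small perturbation does not change the rank of Z):

```python
# Compare pinv(Z + dZ) - pinv(Z) against the complex differential formula (9).
import numpy as np

rng = np.random.default_rng(1)
N, Q = 4, 3
Z = rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q))
eps = 1e-7
dZ = eps * (rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q)))

Zp = np.linalg.pinv(Z)
dZH = dZ.conj().T
dZp_formula = (-Zp @ dZ @ Zp
               + Zp @ Zp.conj().T @ dZH @ (np.eye(N) - Z @ Zp)
               + (np.eye(Q) - Zp @ Z) @ dZH @ Zp.conj().T @ Zp)

dZp_numeric = np.linalg.pinv(Z + dZ) - Zp
print(np.linalg.norm(dZp_numeric - dZp_formula))  # ~O(eps^2)
```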

From Table II, the following four equalities follow: dZ = d Re{Z} + j d Im{Z}, dZ* = d Re{Z} − j d Im{Z}, d Re{Z} = (1/2)(dZ + dZ*), and d Im{Z} = (1/(2j))(dZ − dZ*).

The following two lemmas are used to identify the first- and second-order derivatives later in the article. The real variables Re{Z} and Im{Z} are independent of each other, and hence so are their differentials. Although the complex variables Z and Z* are related, their differentials are linearly independent in the following way:

Lemma 1: Let Z ∈ C^(N×Q) and let A_i ∈ C^(M×NQ). If A_0 d vec(Z) + A_1 d vec(Z*) = 0_(M×1) for all dZ ∈ C^(N×Q), then A_i = 0_(M×NQ) for i ∈ {0, 1}.

5The output of the reshape operator has the same number of elements as the input, but the shape of the output might be different, so

reshape(·) performs a reshaping of its input argument.


TABLE II
IMPORTANT RESULTS FOR COMPLEX DIFFERENTIALS.

| Function | Differential |
|---|---|
| A | 0 |
| aZ | a dZ |
| AZB | A(dZ)B |
| Z0 + Z1 | dZ0 + dZ1 |
| Tr{Z} | Tr{dZ} |
| Z0 Z1 | (dZ0)Z1 + Z0(dZ1) |
| Z0 ⊗ Z1 | (dZ0) ⊗ Z1 + Z0 ⊗ (dZ1) |
| Z0 ⊙ Z1 | (dZ0) ⊙ Z1 + Z0 ⊙ (dZ1) |
| Z^(−1) | −Z^(−1)(dZ)Z^(−1) |
| det(Z) | det(Z) Tr{Z^(−1) dZ} |
| ln(det(Z)) | Tr{Z^(−1) dZ} |
| reshape(Z) | reshape(dZ) |
| Z* | (dZ)* |
| Z^H | (dZ)^H |
| Z⁺ | −Z⁺(dZ)Z⁺ + Z⁺(Z⁺)^H(dZ^H)(I_N − ZZ⁺) + (I_Q − Z⁺Z)(dZ^H)(Z⁺)^H Z⁺ |

The proof of Lemma 1 can be found in Appendix II.

Lemma 2: Let Z ∈ C^(N×Q) and let B_i ∈ C^(NQ×NQ). If (d vec^T(Z)) B_0 d vec(Z) + (d vec^T(Z*)) B_1 d vec(Z) + (d vec^T(Z*)) B_2 d vec(Z*) = 0 for all dZ ∈ C^(N×Q), then B_0 = −B_0^T, B_1 = 0_(NQ×NQ), and B_2 = −B_2^T.

The proof of Lemma 2 can be found in Appendix III.

IV. DEFINITION OF THE DERIVATIVE WITH RESPECT TO COMPLEX-VALUED MATRICES

The most general definition of the derivative is given here; the definitions for less general cases follow from it and will later be given in an identification table which shows how the derivatives can be obtained from the differential of the function.

Definition 4: Let F : C^(N×Q) × C^(N×Q) → C^(M×P). Then the derivative of the matrix function F(Z, Z*) ∈ C^(M×P) with respect to Z ∈ C^(N×Q) is denoted D_Z F, and the derivative with respect to Z* ∈ C^(N×Q) is denoted D_Z* F; the size of both these derivatives is MP × NQ. The derivatives D_Z F and D_Z* F are defined by the following differential expression:

d vec(F) = (D_Z F) d vec(Z) + (D_Z* F) d vec(Z*). (10)

D_Z F(Z, Z*) and D_Z* F(Z, Z*) are called the Jacobian matrices of F with respect to Z and Z*, respectively.

Remark 1: Definition 4 is a generalization of Definition 1, page 173 in [7] to include complex matrices. In [7],

several alternative definitions of the derivative of real-valued functions with respect to a matrix are discussed, and


TABLE III
IDENTIFICATION TABLE.

| Function type | Differential | Derivative with respect to z, z, or Z | Derivative with respect to z*, z*, or Z* | Size of derivatives |
|---|---|---|---|---|
| f(z, z*) | df = a0 dz + a1 dz* | D_z f(z, z*) = a0 | D_z* f(z, z*) = a1 | 1 × 1 |
| f(z, z*) | df = a0 dz + a1 dz* | D_z f(z, z*) = a0 | D_z* f(z, z*) = a1 | 1 × N |
| f(Z, Z*) | df = vec^T(A0) d vec(Z) + vec^T(A1) d vec(Z*) | D_Z f(Z, Z*) = vec^T(A0) | D_Z* f(Z, Z*) = vec^T(A1) | 1 × NQ |
| f(Z, Z*) | df = Tr{A0^T dZ + A1^T dZ*} | ∂f/∂Z = A0 | ∂f/∂Z* = A1 | N × Q |
| f(z, z*) | df = b0 dz + b1 dz* | D_z f(z, z*) = b0 | D_z* f(z, z*) = b1 | M × 1 |
| f(z, z*) | df = B0 dz + B1 dz* | D_z f(z, z*) = B0 | D_z* f(z, z*) = B1 | M × N |
| f(Z, Z*) | df = β0 d vec(Z) + β1 d vec(Z*) | D_Z f(Z, Z*) = β0 | D_Z* f(Z, Z*) = β1 | M × NQ |
| F(z, z*) | d vec(F) = c0 dz + c1 dz* | D_z F(z, z*) = c0 | D_z* F(z, z*) = c1 | MP × 1 |
| F(z, z*) | d vec(F) = C0 dz + C1 dz* | D_z F(z, z*) = C0 | D_z* F(z, z*) = C1 | MP × N |
| F(Z, Z*) | d vec(F) = ζ0 d vec(Z) + ζ1 d vec(Z*) | D_Z F(Z, Z*) = ζ0 | D_Z* F(Z, Z*) = ζ1 | MP × NQ |

it is concluded that the definition that matches Definition 4 is the only reasonable definition. Definition 4 is also a

generalization of the definition used in [15] for complex-valued vectors to the case of complex-valued matrices.

Table III shows how the derivatives of the different function types in Table I can be identified from the differentials of these functions.⁶ To show the uniqueness of the representation in (10), we subtract the differential in (10) from the corresponding differential in Table III to get (ζ0 − D_Z F(Z, Z*)) d vec(Z) + (ζ1 − D_Z* F(Z, Z*)) d vec(Z*) = 0_(MP×1). The derivatives in the last line of Table III then follow by using Lemma 1 on this equation. Table III is an extension of the corresponding table given in [7], valid in the real variable case. In Table III, z ∈ C, z ∈ C^(N×1), Z ∈ C^(N×Q), f ∈ C, f ∈ C^(M×1), and F ∈ C^(M×P). Furthermore, a_i ∈ C^(1×1), a_i ∈ C^(1×N), A_i ∈ C^(N×Q), b_i ∈ C^(M×1), B_i ∈ C^(M×N), β_i ∈ C^(M×NQ), c_i ∈ C^(MP×1), C_i ∈ C^(MP×N), ζ_i ∈ C^(MP×NQ), and each of these might be a function of z, z, Z, z*, z*, or Z*.

Definition 5: Let f : C^(N×1) × C^(N×1) → C^(M×1). The partial derivatives ∂f/∂z^T (z, z*) and ∂f/∂z^H (z, z*), both of size M × N, are defined as

∂f/∂z^T = [ ∂f_0/∂z_0 ⋯ ∂f_0/∂z_(N−1) ; ⋮ ⋱ ⋮ ; ∂f_(M−1)/∂z_0 ⋯ ∂f_(M−1)/∂z_(N−1) ],
∂f/∂z^H = [ ∂f_0/∂z_0* ⋯ ∂f_0/∂z_(N−1)* ; ⋮ ⋱ ⋮ ; ∂f_(M−1)/∂z_0* ⋯ ∂f_(M−1)/∂z_(N−1)* ], (11)

where z_i and f_i are component number i of the vectors z and f, respectively.

⁶For the functions of the type f(Z, Z*), two alternative definitions for the derivatives are given. The notation ∂f/∂Z (Z, Z*) and ∂f/∂Z* (Z, Z*) will be defined in Subsection VI-C.


Notice that ∂f/∂z^T = D_z f and ∂f/∂z^H = D_z* f. Using the partial derivative notation in Definition 5, the derivatives of the function F(Z, Z*) in Definition 4 are:

D_Z F(Z, Z*) = ∂ vec(F(Z, Z*)) / ∂ vec^T(Z), (12)

D_Z* F(Z, Z*) = ∂ vec(F(Z, Z*)) / ∂ vec^T(Z*). (13)

This is a generalization of the real-valued matrix variable case treated in [7] to the complex-valued matrix variable case. (12) and (13) show how all the MPNQ partial derivatives of all the components of F with respect to all the components of Z and Z* are arranged when using the notation introduced in Definition 5.

Key result: Finding the derivatives of the complex-valued matrix function F with respect to the complex-valued matrices Z and Z* can be achieved using the following three-step procedure:

1) Compute the differential d vec(F).

2) Manipulate the expression into the form given in (10).

3) Read out the result using Table III.

For the less general function types in Table I, a similar procedure can be used; a numerical illustration is sketched below.
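As an illustration (a sketch, not from the paper; it assumes the derivatives of F(Z, Z*) = ZZ^H listed later in Table VI), the identification (10) can be checked numerically:

```python
# Three-step procedure check for F(Z, Z*) = Z Z^H, whose derivatives are
# D_Z F = Z* ⊗ I_N and D_Z* F = K_{N,N} (Z ⊗ I_N) (see Table VI).
import numpy as np

def vec(A):                      # column-stacking operator vec(·)
    return A.reshape(-1, 1, order="F")

def commutation(n, q):           # K_{n,q} vec(A) = vec(A^T) for A of size n×q
    K = np.zeros((n * q, n * q))
    for i in range(n):
        for j in range(q):
            K[i * q + j, j * n + i] = 1
    return K

rng = np.random.default_rng(2)
N = 3
Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
eps = 1e-7
dZ = eps * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

F = lambda Z: Z @ Z.conj().T
dvecF = vec(F(Z + dZ)) - vec(F(Z))                 # step 1: the differential

DZ  = np.kron(Z.conj(), np.eye(N))                 # steps 2-3: derivatives
DZc = commutation(N, N) @ np.kron(Z, np.eye(N))    # identified from (10)
print(np.linalg.norm(dvecF - (DZ @ vec(dZ) + DZc @ vec(dZ.conj()))))  # ~O(eps^2)
```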

V. FUNDAMENTAL RESULTS

In this section, some useful theorems are presented that exploit the theory introduced earlier. All of these results

are important when solving practical optimization problems involving differentiation with respect to a complex-

valued matrix. These results include the chain rule, conditions for finding stationary points for a real scalar function

dependent on complex-valued matrices, and in which direction the same type of function has the minimum or

maximum rate of change. A comment will be made on how this result should be used in the steepest descent

method [24]. For certain types of functions, the same procedure used for the real-valued matrix case [7] can be

used, and this result is stated in a theorem.

1) Chain Rule: One big advantage of the way the derivative is defined in Definition 4 compared to other

definitions of the derivative of F (Z,Z∗) is that the chain rule is valid. The chain rule is now formulated.

Theorem 1: Let S0 × S1 ⊆ C^(N×Q) × C^(N×Q), and let F : S0 × S1 → C^(M×P) be differentiable with respect to both its first and second argument at an interior point (Z, Z*) of the set S0 × S1. Let T0 × T1 ⊆ C^(M×P) × C^(M×P) be such that (F(Z, Z*), F*(Z, Z*)) ∈ T0 × T1 for all (Z, Z*) ∈ S0 × S1. Assume that G : T0 × T1 → C^(R×S) is differentiable at an interior point (F(Z, Z*), F*(Z, Z*)) ∈ T0 × T1. Define the composite function H : S0 × S1 → C^(R×S) by H(Z, Z*) ≜ G(F(Z, Z*), F*(Z, Z*)). The derivatives D_Z H and D_Z* H are obtained through:

D_Z H = (D_F G)(D_Z F) + (D_F* G)(D_Z F*), (14)

D_Z* H = (D_F G)(D_Z* F) + (D_F* G)(D_Z* F*). (15)

Proof: From Definition 4, it follows that:

d vec(H) = d vec(G) = (D_F G) d vec(F) + (D_F* G) d vec(F*). (16)

The differentials of vec(F) and vec(F*) are given by:

d vec(F) = (D_Z F) d vec(Z) + (D_Z* F) d vec(Z*), (17)

d vec(F*) = (D_Z F*) d vec(Z) + (D_Z* F*) d vec(Z*). (18)

By substituting the results from (17) and (18) into (16), and then using the definition of the derivatives with respect to Z and Z*, the theorem follows.
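A scalar special case makes (14) and (15) concrete; the following sketch (an assumed example, not from the paper) composes g(w, w*) = ww* with f(z) = z² and checks the chain rule numerically:

```python
# Chain rule (14)-(15) for h(z, z*) = g(f(z)) with f(z) = z^2, g(w, w*) = w w*.
import numpy as np

z = 0.8 - 0.3j
w = z**2

# Inner derivatives: D_z f = 2z, D_z* f = 0, D_z f* = 0, D_z* f* = 2z*.
# Outer derivatives at w = f(z): D_w g = w*, D_w* g = w.
Dz_h  = np.conj(w) * (2 * z)            # (14): w* · 2z + w · 0
Dzc_h = w * (2 * np.conj(z))            # (15): w* · 0  + w · 2z*

# Finite-difference check via dh = (D_z h) dz + (D_z* h) dz*.
h = lambda z: abs(z**2)**2
eps = 1e-6
for dz in (eps, 1j * eps):
    dh_numeric = h(z + dz) - h(z)
    dh_formula = (Dz_h * dz + Dzc_h * np.conj(dz)).real   # h is real-valued
    assert abs(dh_numeric - dh_formula) < 1e-9
print("chain rule verified")
```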

2) Stationary Points: The next theorem presents conditions for finding stationary points of f(Z,Z ∗) ∈ R.

Theorem 2: Let f : C^(N×Q) × C^(N×Q) → R. A stationary point⁷ of the function f(Z, Z*) = g(X, Y), where g : R^(N×Q) × R^(N×Q) → R and Z = X + jY, is then found by one of the following three equivalent conditions:⁸

D_X g(X, Y) = 0_(1×NQ) ∧ D_Y g(X, Y) = 0_(1×NQ), (19)

D_Z f(Z, Z*) = 0_(1×NQ), (20)

or

D_Z* f(Z, Z*) = 0_(1×NQ). (21)

Proof: In optimization theory [7], a stationary point is defined as a point where the derivatives with respect to all the independent variables vanish. Since Re{Z} = X and Im{Z} = Y contain only independent variables, (19)

⁷Notice that a stationary point can be a local minimum, a local maximum, or a saddle point. ⁸In (19), the symbol ∧ means that both of the equations stated in (19) must be satisfied at the same time.


gives a stationary point by definition. By using the chain rule in Theorem 1 on both sides of f(Z, Z*) = g(X, Y) and taking the derivative with respect to X and Y, the following two equations are obtained:

(D_Z f)(D_X Z) + (D_Z* f)(D_X Z*) = D_X g, (22)

(D_Z f)(D_Y Z) + (D_Z* f)(D_Y Z*) = D_Y g. (23)

From Table II, it follows that D_X Z = D_X Z* = I_(NQ) and D_Y Z = −D_Y Z* = j I_(NQ). If these results are inserted into (22) and (23), these two equations can be formulated in block matrix form in the following way:

[ D_X g ; D_Y g ] = [ 1 1 ; j −j ] [ D_Z f ; D_Z* f ]. (24)

This equation is equivalent to the following matrix equation:

[ D_Z f ; D_Z* f ] = [ 1/2 −j/2 ; 1/2 j/2 ] [ D_X g ; D_Y g ]. (25)

Since D_X g ∈ R^(1×NQ) and D_Y g ∈ R^(1×NQ), it is seen from (25) that (19), (20), and (21) are equivalent.

(24) and (25) are multi-variable generalizations of (2) and (3).

3) Direction of Extremal Rate of Change: The next theorem states how to find the maximum and minimum

rate of change of f(Z,Z∗) ∈ R.

Theorem 3: Let f : CN×Q×C

N×Q → R. The directions where the function f have the maximum and minimum

rate of change with respect to vec(Z) are given by [DZ∗f(Z,Z∗)]T and − [DZ∗f(Z,Z∗)]T , respectively.

Proof: Since f ∈ R, df can be written in the following two ways: df = (D_Z f) d vec(Z) + (D_Z* f) d vec(Z*), and df = df* = (D_Z f)* d vec(Z*) + (D_Z* f)* d vec(Z), where df = df* since f ∈ R. Subtracting the two different expressions of df from each other and then applying Lemma 1 gives that D_Z* f = (D_Z f)* and D_Z f = (D_Z* f)*. From these results, it follows that df = (D_Z f) d vec(Z) + ((D_Z f) d vec(Z))* = 2 Re{(D_Z f) d vec(Z)} = 2 Re{(D_Z* f)* d vec(Z)}. Let a_i ∈ C^(K×1), where i ∈ {0, 1}. Then

Re{a_0^H a_1} = ⟨ [Re{a_0} ; Im{a_0}], [Re{a_1} ; Im{a_1}] ⟩, (26)


where ⟨·, ·⟩ is the ordinary Euclidean inner product [25] between real vectors in R^(2K×1). Using this,

df = 2 ⟨ [Re{(D_Z* f)^T} ; Im{(D_Z* f)^T}], [Re{d vec(Z)} ; Im{d vec(Z)}] ⟩. (27)

The Cauchy-Schwarz inequality gives that the maximum value of df occurs when d vec(Z) = α (D_Z* f)^T for α > 0, and from this it follows that the minimum rate of change occurs when d vec(Z) = −β (D_Z* f)^T, for β > 0.

It is now explained why [D_Z f(Z, Z*)]^T is not the direction of maximum rate of change of the function f. Let g : C^(K×1) × C^(K×1) → R be g(a_0, a_1) = 2 Re{a_0^T a_1}. If K = 2 and a_0 = [1, j]^T, then g(a_0, a_0) = 0 despite the fact that [1, j]^T ≠ 0_(2×1). Therefore, the function g is not an inner product, and the Cauchy-Schwarz inequality is not valid for this function. By examining the proof of Theorem 3, it is seen that the reason why [D_Z f(Z, Z*)]^T is not the direction of maximum rate of change is that the function g is not an inner product.

4) Steepest Descent Method: If a real-valued function f is being optimized with respect to the parameter Z by means of the steepest descent method, it follows from Theorem 3 that the updating term must be proportional to D_Z* f(Z, Z*) and not D_Z f(Z, Z*). The update equation for optimizing the real-valued function in Theorem 3 by means of the steepest descent method can be expressed as vec^T(Z_(k+1)) = vec^T(Z_k) + μ D_Z* f(Z_k, Z_k*), where μ is a real positive constant if it is a maximization problem or a real negative constant if it is a minimization problem, and Z_k ∈ C^(N×Q) is the value of the matrix after k iterations. The size of vec^T(Z_k) and D_Z* f(Z_k, Z_k*) is 1 × NQ.

Example 2: Let f : C × C → R be f(z, z*) = |z|² = zz*. The partial derivatives are D_z* f(z, z*) = z and D_z f(z, z*) = z*. This result is in accordance with the intuitive picture for optimizing f, i.e., it is more efficient to maximize f in the direction of D_z* f(z, z*) = z than in the direction of D_z f(z, z*) = z* when standing at z.
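The update rule above is easy to exercise on a least-squares cost (an assumed example, not from the paper), where f(z) = ‖Az − b‖² and [D_z* f]^T = A^H(Az − b):

```python
# Steepest descent with the conjugate derivative, minimizing f(z) = ||A z - b||^2.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
b = rng.standard_normal((6, 1)) + 1j * rng.standard_normal((6, 1))

z = np.zeros((3, 1), dtype=complex)
mu = -0.02                                  # negative step: minimization
for _ in range(5000):
    grad_conj = A.conj().T @ (A @ z - b)    # [D_z* f]^T as a column vector
    z = z + mu * grad_conj

print(np.allclose(z, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```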

5) Complex Case versus Real Case: In order to show consistency with the real-valued case given in [7], the following theorem shows how the complex-valued case is an extension of [7].

Theorem 4: Let F : C^(N×Q) × C^(N×Q) → C^(M×P) and G : C^(N×Q) → C^(M×P). If F(Z0, Z1) = G(Z0), then D_Z F(Z, Z*) = D_Z G(Z) can be obtained by the procedure given in [7] for finding the derivative of the function G, and D_Z* F(Z, Z*) = 0_(MP×NQ).

If F(Z0, Z1) = G(Z1), then D_Z F(Z, Z*) = 0_(MP×NQ), and D_Z* F(Z, Z*) = D_Z G(Z)|_(Z=Z*) can be obtained by the procedure given in [7].


TABLE IV
DERIVATIVES OF FUNCTIONS OF THE TYPE f(z, z*).

| f(z, z*) | Differential df | D_z f(z, z*) | D_z* f(z, z*) |
|---|---|---|---|
| a^T z = z^T a | a^T dz | a^T | 0_(1×N) |
| a^T z* = z^H a | a^T dz* | 0_(1×N) | a^T |
| z^T A z | z^T (A + A^T) dz | z^T (A + A^T) | 0_(1×N) |
| z^H A z | z^H A dz + z^T A^T dz* | z^H A | z^T A^T |
| z^H A z* | z^H (A + A^T) dz* | 0_(1×N) | z^H (A + A^T) |

Proof: Assume that F(Z0, Z1) = G(Z0), where Z0 and Z1 have independent differentials. Applying the vec(·) and differential operators to this equation leads to: d vec(F) = (D_Z0 F) d vec(Z0) + (D_Z1 F) d vec(Z1) = (D_Z0 G) d vec(Z0). By setting Z0 = Z and Z1 = Z* in Lemma 1, it is seen that D_Z* F = 0_(MP×NQ) and D_Z F(Z, Z*) = D_Z G(Z). Since the last equation only contains the single matrix variable Z, the same techniques as given in [7] can be used. The first part of the theorem is then proved, and the second part can be shown similarly.

VI. DEVELOPMENT OF DERIVATIVE FORMULAS

A. Derivative of f(z, z∗)

The case of a scalar function of scalar variables is treated in the literature for one input variable, see for example [5], [26]. If the variables z and z* are treated as independent variables, the derivatives D_z f(z, z*) and D_z* f(z, z*) can be found as for scalar functions having two independent variables; see Example 1.

B. Derivative of f(z,z∗)

Let f : C^(N×1) × C^(N×1) → C be f(z, z*) = z^H A z. This type of function is frequently used in the context of filter optimization and in array signal processing [20]. For instance, a frequently occurring example is that of output power z^H A z optimization, where z contains the filter coefficients and A is the input covariance matrix. The differential of f is df = (dz^H) A z + z^H A dz = z^H A dz + z^T A^T dz*, where Table II was used. From df, the derivatives of z^H A z with respect to z and z* follow from Table III, and are D_z f(z, z*) = z^H A and D_z* f(z, z*) = z^T A^T; see Table IV. The other selected examples given in Table IV can be derived similarly. Some of the results included in Table IV can be found in [15].
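A numerical sketch (not from the paper) of the z^H A z entry of Table IV:

```python
# Verify the Table IV entries for f(z, z*) = z^H A z:
# D_z f = z^H A and D_z* f = z^T A^T, via df = (D_z f) dz + (D_z* f) dz*.
import numpy as np

rng = np.random.default_rng(4)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
z = rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1))
eps = 1e-7
dz = eps * (rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1)))

f = lambda z: (z.conj().T @ A @ z).item()
df_numeric = f(z + dz) - f(z)
df_formula = (z.conj().T @ A @ dz + z.T @ A.T @ dz.conj()).item()
print(abs(df_numeric - df_formula))  # ~O(eps^2)
```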

C. Derivative of f(Z,Z∗)

For functions of the type f(Z, Z*), it is common to arrange the partial derivatives ∂f/∂z_(k,l) and ∂f/∂z*_(k,l) in an alternative way [7] compared to the expressions D_Z f(Z, Z*) and D_Z* f(Z, Z*). The notation for the alternative way


of organizing all the partial derivatives is ∂f/∂Z and ∂f/∂Z*. In this alternative way, the partial derivatives of the elements of the matrix Z ∈ C^(N×Q) are arranged as:

∂f/∂Z = [ ∂f/∂z_(0,0) ⋯ ∂f/∂z_(0,Q−1) ; ⋮ ⋱ ⋮ ; ∂f/∂z_(N−1,0) ⋯ ∂f/∂z_(N−1,Q−1) ],
∂f/∂Z* = [ ∂f/∂z*_(0,0) ⋯ ∂f/∂z*_(0,Q−1) ; ⋮ ⋱ ⋮ ; ∂f/∂z*_(N−1,0) ⋯ ∂f/∂z*_(N−1,Q−1) ]. (28)

∂f/∂Z and ∂f/∂Z* are called the gradient of f with respect to Z and Z*, respectively. (28) generalizes to the complex case one of the ways to define the derivative of real scalar functions with respect to real matrices in [7]. The way of arranging the partial derivatives in (28) is different from the way given in (12) and (13). If df = vec^T(A0) d vec(Z) + vec^T(A1) d vec(Z*) = Tr{A0^T dZ + A1^T dZ*}, where A_i, Z ∈ C^(N×Q), then it can be shown that ∂f/∂Z = A0 and ∂f/∂Z* = A1, where the matrices A0 and A1 depend on Z and Z* in general. This result is included in Table III. The size of ∂f/∂Z and ∂f/∂Z* is N × Q, while the size of D_Z f(Z, Z*) and D_Z* f(Z, Z*) is 1 × NQ, so these two ways of organizing the partial derivatives are different. It can be shown that D_Z f(Z, Z*) = vec^T(∂f/∂Z (Z, Z*)) and D_Z* f(Z, Z*) = vec^T(∂f/∂Z* (Z, Z*)). The steepest descent method can then be formulated as Z_(k+1) = Z_k + μ ∂f/∂Z* (Z_k, Z_k*). Key examples of derivatives are developed in the text below, while other useful results are stated in Table V.

1) Determinant Related Problems: Objective functions that depend on the determinant appear in several parts

of signal processing related problems, e.g., in the capacity of wireless multiple-input multiple-output (MIMO)

communication systems [19], in an upper bound for the pair-wise error probability [18], [27], and for the exact

symbol error rate (SER) for precoded orthogonal space-time block-code (OSTBC) systems [28].

Let f : C^(N×Q) × C^(N×Q) → C be f(Z, Z*) = det(P + ZZ^H), where P ∈ C^(N×N) is independent of Z and Z*. df is found by the rules in Table II as df = det(P + ZZ^H) Tr{(P + ZZ^H)^(−1) d(ZZ^H)} = det(P + ZZ^H) Tr{Z^H (P + ZZ^H)^(−1) dZ + Z^T (P + ZZ^H)^(−T) dZ*}. From this, the derivatives with respect to Z and Z* of det(P + ZZ^H) can be found, but since these results are relatively long, they are not included in Table V.


TABLE V
DERIVATIVES OF FUNCTIONS OF THE TYPE f(Z, Z*).

| f(Z, Z*) | Differential df | ∂f/∂Z | ∂f/∂Z* |
|---|---|---|---|
| Tr{Z} | Tr{I_N dZ} | I_N | 0_(N×N) |
| Tr{Z*} | Tr{I_N dZ*} | 0_(N×N) | I_N |
| Tr{AZ} | Tr{A dZ} | A^T | 0_(N×Q) |
| Tr{Z A0 Z^T A1} | Tr{(A0 Z^T A1 + A0^T Z^T A1^T) dZ} | A1^T Z A0^T + A1 Z A0 | 0_(N×Q) |
| Tr{Z A0 Z A1} | Tr{(A0 Z A1 + A1 Z A0) dZ} | A1^T Z^T A0^T + A0^T Z^T A1^T | 0_(N×Q) |
| Tr{Z A0 Z^H A1} | Tr{A0 Z^H A1 dZ + A0^T Z^T A1^T dZ*} | A1^T Z* A0^T | A1 Z A0 |
| Tr{Z A0 Z* A1} | Tr{A0 Z* A1 dZ + A1 Z A0 dZ*} | A1^T Z^H A0^T | A0^T Z^T A1^T |
| Tr{A Z^(−1)} | −Tr{Z^(−1) A Z^(−1) dZ} | −(Z^T)^(−1) A^T (Z^T)^(−1) | 0_(N×N) |
| Tr{Z^p} | p Tr{Z^(p−1) dZ} | p (Z^T)^(p−1) | 0_(N×N) |
| det(Z) | det(Z) Tr{Z^(−1) dZ} | det(Z) (Z^T)^(−1) | 0_(N×N) |
| det(A0 Z A1) | det(A0 Z A1) Tr{A1 (A0 Z A1)^(−1) A0 dZ} | det(A0 Z A1) A0^T (A1^T Z^T A0^T)^(−1) A1^T | 0_(N×Q) |
| det(Z²) | 2 det²(Z) Tr{Z^(−1) dZ} | 2 det²(Z) (Z^T)^(−1) | 0_(N×N) |
| det(ZZ^T) | 2 det(ZZ^T) Tr{Z^T (ZZ^T)^(−1) dZ} | 2 det(ZZ^T) (ZZ^T)^(−1) Z | 0_(N×Q) |
| det(ZZ*) | det(ZZ*) Tr{Z* (ZZ*)^(−1) dZ + (ZZ*)^(−1) Z dZ*} | det(ZZ*) (Z^H Z^T)^(−1) Z^H | det(ZZ*) Z^T (Z^H Z^T)^(−1) |
| det(ZZ^H) | det(ZZ^H) Tr{Z^H (ZZ^H)^(−1) dZ + Z^T (Z* Z^T)^(−1) dZ*} | det(ZZ^H) (Z* Z^T)^(−1) Z* | det(ZZ^H) (ZZ^H)^(−1) Z |
| det(Z^p) | p det^p(Z) Tr{Z^(−1) dZ} | p det^p(Z) (Z^T)^(−1) | 0_(N×N) |
| λ(Z) | v0^H (dZ) u0 / (v0^H u0) = Tr{(u0 v0^H / (v0^H u0)) dZ} | v0* u0^T / (v0^H u0) | 0_(N×N) |
| λ*(Z) | v0^T (dZ*) u0* / (v0^T u0*) = Tr{(u0* v0^T / (v0^T u0*)) dZ*} | 0_(N×N) | v0 u0^H / (v0^T u0*) |

2) Eigenvalue Related Problems: Let λ0 be a simple eigenvalue⁹ of Z0 ∈ C^(N×N), and let u0 ∈ C^(N×1) be the corresponding normalized eigenvector, such that Z0 u0 = λ0 u0. Let λ : C^(N×N) → C and u : C^(N×N) → C^(N×1) be defined for all Z in some neighborhood of Z0 such that:

Z u(Z) = λ(Z) u(Z), u0^H u(Z) = 1, λ(Z0) = λ0, u(Z0) = u0. (29)

Let the normalized left eigenvector of Z0 corresponding to λ0 be denoted v0 ∈ C^(N×1), i.e., v0^H Z0 = λ0 v0^H, or equivalently Z0^H v0 = λ0* v0. In order to find the differential of λ(Z) at Z = Z0, take the differential of both sides of the first expression in (29) evaluated at Z = Z0:

(dZ) u0 + Z0 du = (dλ) u0 + λ0 du. (30)

Pre-multiplying (30) by v0^H gives: v0^H (dZ) u0 = (dλ) v0^H u0. From Lemma 6.3.10 in [22], v0^H u0 ≠ 0, and hence

dλ = v0^H (dZ) u0 / (v0^H u0) = Tr{ (u0 v0^H / (v0^H u0)) dZ }. (31)

⁹The matrix Z0 ∈ C^(N×N) has in general N different complex eigenvalues. However, the roots of the characteristic equation, i.e., the eigenvalues, need not be distinct. The number of times an eigenvalue appears is equal to its multiplicity. If an eigenvalue appears only once, it is called a simple eigenvalue [22].


This result is included in Table V, and it will be used later when the derivatives of the eigenvector u and the

Hessian of λ are found. The differential of λ∗ at Z0 is also included in Table V. These results are derived in [7].
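Formula (31) can be checked against a numerical eigendecomposition (a sketch, not from the paper; it assumes the perturbed eigenvalue is matched to λ0 by proximity):

```python
# Check the simple-eigenvalue differential (31): dλ = v0^H (dZ) u0 / (v0^H u0).
import numpy as np

rng = np.random.default_rng(6)
N = 4
Z0 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
w, V = np.linalg.eig(Z0)
lam0, u0 = w[0], V[:, [0]]                  # a (generically simple) eigenvalue
wl, U = np.linalg.eig(Z0.conj().T)          # left eigenvectors: Z0^H v0 = λ0* v0
v0 = U[:, [np.argmin(np.abs(wl - np.conj(lam0)))]]

eps = 1e-7
dZ = eps * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
dlam_formula = (v0.conj().T @ dZ @ u0 / (v0.conj().T @ u0)).item()

w_new = np.linalg.eig(Z0 + dZ)[0]
dlam_numeric = w_new[np.argmin(np.abs(w_new - lam0))] - lam0
print(abs(dlam_numeric - dlam_formula))  # ~O(eps^2)
```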

D. Derivative of f(z, z∗)

In engineering, the objective function is often a real scalar, and this might be a very complicated function of the parameter matrix. To be able to find the derivative in a simple way, the chain rule in Theorem 1 might be very useful, and then functions of the types f and F might occur.

Let f(z, z*) = a f(z, z*); then df = a df = a (D_z f(z, z*)) dz + a (D_z* f(z, z*)) dz*, where df follows from Table III. From this equation, it follows that D_z f = a D_z f(z, z*) and D_z* f = a D_z* f(z, z*). The derivatives of the vector functions az and azz* follow from these expressions.

E. Derivative of f(z,z∗)

First a useful result is introduced. In order to extract the vec(·) of an inner matrix from the vec(·) of a multiple matrix product, the following formula [23] is very useful:

vec(ABC) = (C^T ⊗ A) vec(B). (32)

Let f : C^(N×1) × C^(N×1) → C^(M×1) be given by f(z, z*) = F(z, z*) a, where F : C^(N×1) × C^(N×1) → C^(M×P). Then df = d vec(F(z, z*) a) = (a^T ⊗ I_M) d vec(F) = (a^T ⊗ I_M) [(D_z F(z, z*)) dz + (D_z* F(z, z*)) dz*], where (32) and Table III were utilized. From df, the derivatives of f with respect to z and z* follow.

F. Derivative of f(Z,Z∗)

1) Eigenvector Related Problems: First a lemma is stated.

Lemma 3: Let A ∈ C^(N×Q); then the following equalities hold:

R(A) = R(A⁺A), C(A) = C(AA⁺), rank(A) = rank(A⁺A) = rank(AA⁺), (33)

where C(A), R(A), and N(A) are the symbols used for the column, row, and null space of A ∈ C^(N×Q), respectively.¹⁰

The proof of Lemma 3 can be found in Appendix IV.

¹⁰C(A) = {w ∈ C^(N×1) | w = Az for some z ∈ C^(Q×1)}, R(A) = {w ∈ C^(1×Q) | w = zA for some z ∈ C^(1×N)}, and N(A) = {z ∈ C^(Q×1) | Az = 0_(N×1)}.


The differential of the eigenvector u(Z) is now found at Z = Z0. The derivation here is similar to the one in [7], where the same result for du at Z = Z0 was derived; however, more details are included here. Let Y0 = λ0 I_N − Z0; then it follows from (30) that Y0 du = (dZ) u0 − (dλ) u0 = (dZ) u0 − [v0^H (dZ) u0 / (v0^H u0)] u0 = (I_N − u0 v0^H / (v0^H u0)) (dZ) u0, where (31) was utilized. Pre-multiplying Y0 du with Y0⁺ results in:

Y0⁺ Y0 du = Y0⁺ (I_N − u0 v0^H / (v0^H u0)) (dZ) u0. (34)

The operator dim(·) returns the dimension of the vector space. Since λ0 is a simple eigenvalue, dim(N(Y0)) = 1 [5]. Hence, it follows from 0.4.4 (g) on page 13 in [7] that rank(Y0) = N − dim(N(Y0)) = N − 1. From Y0 u0 = 0_(N×1), it follows from (71), see Appendix V, that u0⁺ Y0⁺ = 0_(1×N). It can be shown by direct insertion in Definition 3 of the Moore-Penrose inverse that u0⁺ = u0^H. From these results it follows that u0^H Y0⁺ = 0_(1×N). Set C0 = Y0⁺ Y0 + u0 u0^H; then it can be shown from the two facts u0^H Y0⁺ = 0_(1×N) and Y0 u0 = 0_(N×1) that C0² = C0, i.e., C0 is idempotent. It can be shown by direct use of Definition 3 that the matrix Y0⁺ Y0 is also idempotent. By the use of Lemma 10.1.1 and Corollary 10.2.2 in [8], it is found that rank(C0) = Tr{C0} = Tr{Y0⁺ Y0} + Tr{u0 u0^H} = rank(Y0⁺ Y0) + 1 = rank(Y0) + 1 = N, where the third equation of (33) was used. From Lemma 10.1.1 and Corollary 10.2.2 in [8], and rank(C0) = N, it follows that C0 = I_N. Using the differential operator on both sides of the second equation in (29) yields u0^H du = 0. Using these results, it follows that:

Y0⁺ Y0 du = (I_N − u0 u0^H) du = du − u0 u0^H du = du. (35)

(34) and (35) lead to:

du = (λ0 I_N − Z0)⁺ (I_N − u0 v0^H / (v0^H u0)) (dZ) u0. (36)

From (36), it is possible to find the differential of u(Z) evaluated at Z0 with respect to the matrix Z:

du = vec(du) = vec( (λ0 I_N − Z0)⁺ (I_N − u0 v0^H / (v0^H u0)) (dZ) u0 )
   = ( u0^T ⊗ [ (λ0 I_N − Z0)⁺ (I_N − u0 v0^H / (v0^H u0)) ] ) d vec(Z), (37)

where (32) was used. From (37), it follows that D_Z u = u0^T ⊗ [ (λ0 I_N − Z0)⁺ (I_N − u0 v0^H / (v0^H u0)) ]. The differential and the derivative of u* follow by the use of Table II and the results found here.


The left eigenvector function v : C^(N×N) → C^(N×1) with the argument Z ∈ C^(N×N), denoted v(Z), is defined through the following four relations: v^H(Z) Z = λ(Z) v^H(Z), v0^H v(Z) = 1, λ(Z0) = λ0, and v(Z0) = v0. The differential of v(Z) at Z = Z0 can be found using a procedure similar to the one used earlier in this subsection for finding du at Z = Z0, giving dv^H = v0^H (dZ) (I_N − u0 v0^H / (v0^H u0)) (λ0 I_N − Z0)⁺.

G. Derivative of F (z, z∗)

Examples of functions of the type F(z, z*) are Az, Azz*, and A f(z, z*), where A ∈ C^(M×P). These functions can be differentiated by finding the differentials of the scalar functions z, zz*, and f(z, z*).

H. Derivative of F (z,z∗)

Let F : C^(N×1) × C^(N×1) → C^(N×N) be F(z, z*) = zz^H. The differential of F is dF = (dz)z^H + z dz^H, and from this equation, d vec(F) = [z* ⊗ I_N] d vec(z) + [I_N ⊗ z] d vec(z^H) = [z* ⊗ I_N] dz + [I_N ⊗ z] dz*. Hence, the derivatives of F(z, z*) = zz^H are given by D_z F = z* ⊗ I_N and D_z* F = I_N ⊗ z.

I. Derivative of F (Z,Z∗)

1) Kronecker Product Related Problems: Examples of objective functions which depend on the Kronecker product of the unknown complex-valued matrix are found in [18], [28].

Now, the commutation matrix is introduced:

Definition 6: Let A ∈ C^(N×Q); then there exists a permutation matrix that connects the vectors vec(A) and vec(A^T). The commutation matrix K_(N,Q) has size NQ × NQ, and it defines the connection between vec(A) and vec(A^T) in the following way: K_(N,Q) vec(A) = vec(A^T).

The matrix K_(Q,N) is a permutation matrix, and it can be shown [7] that K_(Q,N)^T = K_(Q,N)^(−1) = K_(N,Q). The following result is Theorem 3.9 in [7]: Let A_i ∈ C^(N_i×Q_i), where i ∈ {0, 1}; then the reason why the commutation matrix received its name can be seen from the following identity: K_(N1,N0) (A0 ⊗ A1) = (A1 ⊗ A0) K_(Q1,Q0).
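Both Definition 6 and the identity above are easy to confirm numerically (a sketch, not from the paper):

```python
# Build K_{n,q} explicitly and check Definition 6 and Theorem 3.9 in [7].
import numpy as np

def commutation(n, q):           # K_{n,q} vec(A) = vec(A^T) for A of size n×q
    K = np.zeros((n * q, n * q))
    for i in range(n):
        for j in range(q):
            K[i * q + j, j * n + i] = 1
    return K

rng = np.random.default_rng(7)
(N0, Q0), (N1, Q1) = (2, 3), (4, 2)
A0 = rng.standard_normal((N0, Q0)) + 1j * rng.standard_normal((N0, Q0))
A1 = rng.standard_normal((N1, Q1)) + 1j * rng.standard_normal((N1, Q1))

vec = lambda A: A.reshape(-1, 1, order="F")
print(np.allclose(commutation(N0, Q0) @ vec(A0), vec(A0.T)))   # Definition 6
print(np.allclose(commutation(N1, N0) @ np.kron(A0, A1),
                  np.kron(A1, A0) @ commutation(Q1, Q0)))      # commutation identity
```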

Let F : C^(N0×Q0) × C^(N1×Q1) → C^(N0N1×Q0Q1) be given by F(Z0, Z1) = Z0 ⊗ Z1, where Z_i ∈ C^(N_i×Q_i). The differential of this function follows from Table II: dF = (dZ0) ⊗ Z1 + Z0 ⊗ dZ1. Applying the vec(·) operator to dF yields: d vec(F) = vec((dZ0) ⊗ Z1) + vec(Z0 ⊗ dZ1). From (A ⊗ B)(C ⊗ D) = AC ⊗ BD and Theorem 3.10 in [7], it follows that

vec((dZ0) ⊗ Z1) = (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [(d vec(Z0)) ⊗ vec(Z1)]
= (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [(I_(N0Q0) d vec(Z0)) ⊗ (vec(Z1) · 1)]
= (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [(I_(N0Q0) ⊗ vec(Z1)) (d vec(Z0) ⊗ 1)]
= (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [I_(N0Q0) ⊗ vec(Z1)] d vec(Z0), (38)

and in a similar way it follows that

vec(Z0 ⊗ dZ1) = (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [vec(Z0) ⊗ d vec(Z1)]
= (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [vec(Z0) ⊗ I_(N1Q1)] d vec(Z1). (39)

Inserting the results from (38) and (39) into d vec(F) gives:

d vec(F) = (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [I_(N0Q0) ⊗ vec(Z1)] d vec(Z0) + (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [vec(Z0) ⊗ I_(N1Q1)] d vec(Z1). (40)

Define the matrices A(Z1) and B(Z0) by A(Z1) ≜ (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [I_(N0Q0) ⊗ vec(Z1)] and B(Z0) ≜ (I_(Q0) ⊗ K_(Q1,N0) ⊗ I_(N1)) [vec(Z0) ⊗ I_(N1Q1)]. It is then possible to rewrite the differential of F(Z0, Z1) = Z0 ⊗ Z1 as d vec(F) = A(Z1) d vec(Z0) + B(Z0) d vec(Z1). From d vec(F), the differentials and derivatives of Z ⊗ Z, Z ⊗ Z*, and Z* ⊗ Z* can be derived, and these results are included in Table VI. In the table, diag(·) returns the square diagonal matrix with the input column vector elements on the main diagonal [22] and zeros elsewhere.

2) Moore-Penrose Inverse Related Problems: In pseudo-inverse matrix based receiver design, the Moore-

Penrose inverse might appear [19]. This is applicable for MIMO, CDMA, and OFDM systems.

Let F : C^(N×Q) × C^(N×Q) → C^(Q×N) be given by F(Z, Z*) = Z⁺, where Z ∈ C^(N×Q). The reason for including both variables Z and Z* in this function definition is that the differential of Z⁺, see Table II, depends on both dZ and dZ*. Using the vec(·) operator on the differential of the Moore-Penrose inverse in Table II, in addition to (32) and Definition 6, results in:

d vec(F) = −[(Z⁺)^T ⊗ Z⁺] d vec(Z) + [(I_N − (Z⁺)^T Z^T) ⊗ Z⁺(Z⁺)^H] K_(N,Q) d vec(Z*)
+ [(Z⁺)^T (Z⁺)* ⊗ (I_Q − Z⁺Z)] K_(N,Q) d vec(Z*). (41)


TABLE VI
DERIVATIVES OF FUNCTIONS OF THE TYPE F(Z, Z*).

| F(Z, Z*) | Differential d vec(F) | D_Z F(Z, Z*) | D_Z* F(Z, Z*) |
|---|---|---|---|
| Z | I_(NQ) d vec(Z) | I_(NQ) | 0_(NQ×NQ) |
| Z^T | K_(N,Q) d vec(Z) | K_(N,Q) | 0_(NQ×NQ) |
| Z* | I_(NQ) d vec(Z*) | 0_(NQ×NQ) | I_(NQ) |
| Z^H | K_(N,Q) d vec(Z*) | 0_(NQ×NQ) | K_(N,Q) |
| ZZ^T | (I_(N²) + K_(N,N)) (Z ⊗ I_N) d vec(Z) | (I_(N²) + K_(N,N)) (Z ⊗ I_N) | 0_(N²×NQ) |
| Z^T Z | (I_(Q²) + K_(Q,Q)) (I_Q ⊗ Z^T) d vec(Z) | (I_(Q²) + K_(Q,Q)) (I_Q ⊗ Z^T) | 0_(Q²×NQ) |
| ZZ^H | (Z* ⊗ I_N) d vec(Z) + K_(N,N) (Z ⊗ I_N) d vec(Z*) | Z* ⊗ I_N | K_(N,N) (Z ⊗ I_N) |
| Z^(−1) | −[(Z^T)^(−1) ⊗ Z^(−1)] d vec(Z) | −(Z^T)^(−1) ⊗ Z^(−1) | 0_(N²×N²) |
| Z^p | Σ_(i=1)^p [(Z^T)^(p−i) ⊗ Z^(i−1)] d vec(Z) | Σ_(i=1)^p (Z^T)^(p−i) ⊗ Z^(i−1) | 0_(N²×N²) |
| Z ⊗ Z | (A(Z) + B(Z)) d vec(Z) | A(Z) + B(Z) | 0_(N²Q²×NQ) |
| Z ⊗ Z* | A(Z*) d vec(Z) + B(Z) d vec(Z*) | A(Z*) | B(Z) |
| Z* ⊗ Z* | (A(Z*) + B(Z*)) d vec(Z*) | 0_(N²Q²×NQ) | A(Z*) + B(Z*) |
| Z ⊙ Z | 2 diag(vec(Z)) d vec(Z) | 2 diag(vec(Z)) | 0_(NQ×NQ) |
| Z ⊙ Z* | diag(vec(Z*)) d vec(Z) + diag(vec(Z)) d vec(Z*) | diag(vec(Z*)) | diag(vec(Z)) |
| Z* ⊙ Z* | 2 diag(vec(Z*)) d vec(Z*) | 0_(NQ×NQ) | 2 diag(vec(Z*)) |
| exp(Z) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k [(Z^T)^(k−i) ⊗ Z^i] d vec(Z) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k (Z^T)^(k−i) ⊗ Z^i | 0_(N²×N²) |
| exp(Z*) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k [(Z^H)^(k−i) ⊗ (Z*)^i] d vec(Z*) | 0_(N²×N²) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k (Z^H)^(k−i) ⊗ (Z*)^i |
| exp(Z^H) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k [(Z*)^(k−i) ⊗ (Z^H)^i] K_(N,N) d vec(Z*) | 0_(N²×N²) | Σ_(k=0)^∞ (1/(k+1)!) Σ_(i=0)^k [(Z*)^(k−i) ⊗ (Z^H)^i] K_(N,N) |

This leads to D_Z* F = { [(I_N − (Z⁺)^T Z^T) ⊗ Z⁺(Z⁺)^H] + [(Z⁺)^T (Z⁺)* ⊗ (I_Q − Z⁺Z)] } K_(N,Q) and D_Z F = −[(Z⁺)^T ⊗ Z⁺]. If Z is invertible, then the derivative of Z⁺ = Z^(−1) with respect to Z* is equal to the zero matrix, and the derivative of Z⁺ = Z^(−1) with respect to Z can be found from Table VI. Since the expressions associated with the differential of the Moore-Penrose inverse are so long, they are not included in Table VI.

VII. HESSIAN MATRIX FORMULAS

In the previous sections, we have proposed tools for finding first-order derivatives that can be of help to find

stationary points in optimization problems. To identify the nature of these stationary points (minimum, maximum,

or saddle point) or to study the stability of iterative algorithms, the second-order derivatives are useful. This can

be done by examining the Hessian matrix.

The Hessian matrix of a function is a matrix containing the second-order derivatives of the function. In this

section, the Hessian matrix is defined and it is shown how it can be obtained. Only the case of scalar functions

f ∈ C is treated, since this is the case of interest in most practical situations. However, the results can be extended

to vector and matrix functions as well. The way the Hessian is defined here follows the same lines as the definition given


in [7], treating the real case. A complex version of Newton’s recursion formula is derived in [29], [30], and there

the topic of Hessian matrices is briefly treated for real scalar functions. Therefore, the complex Hessian might be

used to accelerate the convergence of iterative optimization algorithms.

A. Identification of the Complex Hessian Matrices

When dealing with the Hessian matrix, it is the second-order differential that has to be calculated in order to identify the Hessian matrix. Neither of the matrices dZ nor dZ* is a function of Z or Z* and, hence, their differentials are the zero matrix. Mathematically, this can be formulated as:

d²Z = d(dZ) = 0_(N×Q) = d(dZ*) = d²Z*. (42)

If f ∈ C, then (d²f)^T = d(df)^T = d²f^T = d²f, and if f ∈ R, then (d²f)^H = d(df)^H = d²f^H = d²f.

The Hessian depends on two variables, so the notation must indicate which variables the Hessian is calculated with respect to. If the Hessian is calculated with respect to Z0 and Z1, the Hessian is denoted H_(Z0,Z1) f. The following definition is used for H_(Z0,Z1) f, and it is an extension of the definition in [7] to complex scalar functions.

Definition 7: Let Z_i ∈ C^(N_i×Q_i), i ∈ {0, 1}; then H_(Z0,Z1) f ∈ C^(N1Q1×N0Q0) and H_(Z0,Z1) f ≜ D_(Z0) (D_(Z1) f)^T.

Later, the following proposition is important for showing the symmetry of the complex Hessian matrix.

Proposition 2: Let f : C^(N×Q) × C^(N×Q) → C. Assume that the function f(Z, Z*) is twice differentiable with respect to all of the variables inside Z and Z*, when these variables are treated as independent. Then [7]

∂²f/(∂z_(k,l) ∂z_(m,n)) = ∂²f/(∂z_(m,n) ∂z_(k,l)), ∂²f/(∂z*_(k,l) ∂z*_(m,n)) = ∂²f/(∂z*_(m,n) ∂z*_(k,l)), and ∂²f/(∂z*_(k,l) ∂z_(m,n)) = ∂²f/(∂z_(m,n) ∂z*_(k,l)),

where m, k ∈ {0, 1, …, N−1} and n, l ∈ {0, 1, …, Q−1}.

Let p_i = N_i k_i + l_i, where i ∈ {0, 1}, k_i ∈ {0, 1, …, Q_i − 1}, and l_i ∈ {0, 1, …, N_i − 1}. As a consequence of Definition 7 and (12), it follows that element number (p0, p1) of H_(Z0,Z1) f is given by:

(H_(Z0,Z1) f)_(p0,p1) = (D_(Z0) (D_(Z1) f)^T)_(p0,p1) = [ ∂/∂ vec^T(Z0) (∂f/∂ vec^T(Z1))^T ]_(p0,p1)
= ∂²f / (∂(vec(Z0))_(p1) ∂(vec(Z1))_(p0)) = ∂²f / (∂(vec(Z0))_(N1 k1 + l1) ∂(vec(Z1))_(N0 k0 + l0)) = ∂²f / (∂(Z0)_(l1,k1) ∂(Z1)_(l0,k0)). (43)

As an immediate consequence of (43) and Proposition 2, it follows for twice differentiable functions f that:

(H_{Z,Z}f)^T = H_{Z,Z}f,  (H_{Z∗,Z∗}f)^T = H_{Z∗,Z∗}f,  (H_{Z,Z∗}f)^T = H_{Z∗,Z}f. (44)


In order to find an identification equation for the Hessians of the function f with respect to all possible combinations of the variables Z and Z∗, an appropriate form of the expression d^2 f is required. This expression is now derived. From (10), the first-order differential of the function f can be written as df = (D_Z f) d vec(Z) + (D_{Z∗} f) d vec(Z∗), where D_Z f ∈ C^{1×NQ} and D_{Z∗} f ∈ C^{1×NQ}. The differentials of these two derivatives can in turn be expressed by the use of (10) as:

(dD_Z f)^T = [D_Z (D_Z f)^T] d vec(Z) + [D_{Z∗} (D_Z f)^T] d vec(Z∗), (45)

(dD_{Z∗} f)^T = [D_Z (D_{Z∗} f)^T] d vec(Z) + [D_{Z∗} (D_{Z∗} f)^T] d vec(Z∗). (46)

From (45) and (46) it follows that:

dD_Z f = [d vec^T(Z)][D_Z (D_Z f)^T]^T + [d vec^T(Z∗)][D_{Z∗} (D_Z f)^T]^T, (47)

dD_{Z∗} f = [d vec^T(Z)][D_Z (D_{Z∗} f)^T]^T + [d vec^T(Z∗)][D_{Z∗} (D_{Z∗} f)^T]^T. (48)

d^2 f is found by applying the differential operator to df and then utilizing the results from (42), (47), and (48):

d^2 f = (dD_Z f) d vec(Z) + (dD_{Z∗} f) d vec(Z∗)
= [d vec^T(Z)][D_Z (D_Z f)^T]^T d vec(Z) + [d vec^T(Z∗)][D_{Z∗} (D_Z f)^T]^T d vec(Z)
  + [d vec^T(Z)][D_Z (D_{Z∗} f)^T]^T d vec(Z∗) + [d vec^T(Z∗)][D_{Z∗} (D_{Z∗} f)^T]^T d vec(Z∗)
= [d vec^T(Z)][D_Z (D_Z f)^T] d vec(Z) + [d vec^T(Z)][D_{Z∗} (D_Z f)^T] d vec(Z∗)
  + [d vec^T(Z∗)][D_Z (D_{Z∗} f)^T] d vec(Z) + [d vec^T(Z∗)][D_{Z∗} (D_{Z∗} f)^T] d vec(Z∗)
= [d vec^T(Z∗), d vec^T(Z)] [ D_Z (D_{Z∗} f)^T , D_{Z∗} (D_{Z∗} f)^T ; D_Z (D_Z f)^T , D_{Z∗} (D_Z f)^T ] [ d vec(Z) ; d vec(Z∗) ]. (49)

Here, the second equality uses the fact that each quadratic term is a scalar and hence equals its own transpose; in the block notation, the two rows of the middle matrix are separated by a semicolon.

From Definition 7, it follows that d^2 f can be rewritten as:

d^2 f = [d vec^T(Z∗), d vec^T(Z)] [ H_{Z,Z∗}f , H_{Z∗,Z∗}f ; H_{Z,Z}f , H_{Z∗,Z}f ] [ d vec(Z) ; d vec(Z∗) ]. (50)

Assume that it is possible to find an expression for d^2 f of the following form:

d^2 f = [d vec^T(Z∗)] A_{0,0} d vec(Z) + [d vec^T(Z∗)] A_{0,1} d vec(Z∗)
  + [d vec^T(Z)] A_{1,0} d vec(Z) + [d vec^T(Z)] A_{1,1} d vec(Z∗)
= [d vec^T(Z∗), d vec^T(Z)] [ A_{0,0} , A_{0,1} ; A_{1,0} , A_{1,1} ] [ d vec(Z) ; d vec(Z∗) ], (51)

where A_{k,l} ∈ C^{NQ×NQ}, k, l ∈ {0, 1}, may depend on Z and Z∗ but not on d vec(Z) or d vec(Z∗). The four complex Hessian matrices in (50) can now be identified from the A_{k,l} in (51) in the following way: subtract the second-order differential in (50) from the one in (51), and then use (44) together with Lemma 2 to get

H_{Z∗,Z}f = (A_{0,0}^T + A_{1,1})/2,  H_{Z,Z∗}f = (A_{0,0} + A_{1,1}^T)/2,  H_{Z,Z}f = (A_{1,0} + A_{1,0}^T)/2,  H_{Z∗,Z∗}f = (A_{0,1} + A_{0,1}^T)/2. (52)

The complex Hessian matrices of the scalar function f can be computed using a three-step procedure:

1) Compute d^2 f.
2) Manipulate d^2 f into the form given in (51).
3) Use (52) to identify the four Hessian matrices.

In order to check convexity of f, the middle matrix on the right-hand side of (50) must be positive definite.

B. Examples of Calculation of the Complex Hessian Matrices

1) Complex Hessian of f(z, z∗): Let f : C^{N×1} × C^{N×1} → C be defined as f(z, z∗) = z^H Φ z. The second-order differential of f is d^2 f = 2(dz^H) Φ dz. From this, the matrices A_{k,l} in (51) are A_{0,0} = 2Φ and A_{0,1} = A_{1,0} = A_{1,1} = 0_{N×N}, and (52) gives the four complex Hessian matrices H_{Z,Z}f = 0_{N×N}, H_{Z∗,Z∗}f = 0_{N×N}, H_{Z∗,Z}f = Φ^T, and H_{Z,Z∗}f = Φ.
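This identification can also be verified numerically. The sketch below is illustrative only (NumPy is assumed and all names are ours); it exploits that f is quadratic, so a central second difference along a direction dz reproduces d^2 f exactly up to rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
Phi = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

def f(z):
    return (z.conj().T @ Phi @ z).item()           # f(z, z*) = z^H Phi z

z  = rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1))
dz = rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1))

# Prediction from d^2 f = 2 (dz)^H Phi dz, i.e. A_{0,0} = 2 Phi in (51).
d2f_pred = 2 * (dz.conj().T @ Phi @ dz).item()

# Central second difference along dz; exact here since f is quadratic.
t = 1e-3
d2f_num = (f(z + t * dz) - 2 * f(z) + f(z - t * dz)) / t**2

print(abs(d2f_pred - d2f_num))                     # ~1e-9, rounding only
```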

2) Complex Hessian of f(Z, Z∗): See the discussion around (29) for an introduction to the eigenvalue problem. d^2 λ(Z) is now found at Z = Z_0. The derivation here is similar to the one in [7], where the same result for d^2 λ was found. Applying the differential operator to both sides of (30) results in 2(dZ)(du) + Z_0 d^2 u = (d^2 λ)u_0 + 2(dλ)du + λ_0 d^2 u. Left-multiplying this equation by v_0^H, and solving for d^2 λ, gives

d^2 λ = 2 v_0^H (dZ − I_N dλ) du / (v_0^H u_0)
= 2 v_0^H (dZ − I_N v_0^H (dZ) u_0 / (v_0^H u_0)) du / (v_0^H u_0)
= 2 (v_0^H dZ − v_0^H (dZ) u_0 v_0^H / (v_0^H u_0)) du / (v_0^H u_0)
= 2 v_0^H (dZ) (I_N − u_0 v_0^H / (v_0^H u_0)) (λ_0 I_N − Z_0)^+ (I_N − u_0 v_0^H / (v_0^H u_0)) (dZ) u_0 / (v_0^H u_0), (53)


where (31) and (36) were utilized. d^2 λ can be reformulated by means of Theorem 2.3 in [7] in the following way:

d^2 λ = (2/(v_0^H u_0)) Tr{ u_0 v_0^H (dZ) (I_N − u_0 v_0^H / (v_0^H u_0)) (λ_0 I_N − Z_0)^+ (I_N − u_0 v_0^H / (v_0^H u_0)) (dZ) }
= (2/(v_0^H u_0)) d vec^T(Z) [ {(I_N − u_0 v_0^H / (v_0^H u_0)) (λ_0 I_N − Z_0)^+ (I_N − u_0 v_0^H / (v_0^H u_0))} ⊗ v_0^∗ u_0^T ] d vec(Z^T)
= (2/(v_0^H u_0)) d vec^T(Z) [ {(I_N − u_0 v_0^H / (v_0^H u_0)) (λ_0 I_N − Z_0)^+ (I_N − u_0 v_0^H / (v_0^H u_0))} ⊗ v_0^∗ u_0^T ] K_{N,N} d vec(Z). (54)

From this expression, it is possible to identify the four complex Hessian matrices by means of (51) and (52).
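For a simple eigenvalue, (53) can likewise be checked numerically. In the following sketch (illustrative only; NumPy is assumed and all names are ours), λ(t) denotes the eigenvalue of Z_0 + t dZ that tracks λ_0, and its second difference in t is compared with the right-hand side of (53):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
Z0 = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
dZ = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

# Right eigenvector u0 and left eigenvector v0 of a simple eigenvalue lam0.
w, U = np.linalg.eig(Z0)
lam0, u0 = w[0], U[:, [0]]
wl, V = np.linalg.eig(Z0.conj().T)                 # Z0^H v = conj(lam) v
v0 = V[:, [np.argmin(np.abs(wl.conj() - lam0))]]   # so v0^H Z0 = lam0 v0^H

P = (u0 @ v0.conj().T) / (v0.conj().T @ u0)        # oblique projector onto u0
M = (np.eye(N) - P) @ np.linalg.pinv(lam0 * np.eye(N) - Z0) @ (np.eye(N) - P)
d2lam_pred = (2 * (v0.conj().T @ dZ @ M @ dZ @ u0)
              / (v0.conj().T @ u0)).item()         # right-hand side of (53)

def lam(t):                                        # eigenvalue tracking lam0
    wt = np.linalg.eig(Z0 + t * dZ)[0]
    return wt[np.argmin(np.abs(wt - lam0))]

t = 1e-4
d2lam_num = (lam(t) - 2 * lam0 + lam(-t)) / t**2   # central second difference
print(abs(d2lam_pred - d2lam_num))                 # small, O(t^2)
```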

Let f : C^{N×Q} × C^{N×Q} → C be given by f(Z, Z∗) = Tr{Z A Z^H}, where Z ∈ C^{N×Q} and A ∈ C^{Q×Q}. From Theorem 2.3 in [7], f can be written as f = vec^T(Z∗)(A^T ⊗ I_N) vec(Z). By applying the differential operator twice to the latter expression, it follows that d^2 f = 2(d vec^T(Z∗))(A^T ⊗ I_N) d vec(Z). From this expression, it is possible to identify the four complex Hessian matrices by means of (51) and (52).
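Again, a numerical check is straightforward, since f = Tr{Z A Z^H} is quadratic in (Z, Z∗). The sketch below (illustrative only; NumPy is assumed and all names are ours) confirms the expression for d^2 f:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 3, 5
A  = rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q))
Z  = rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q))
dZ = rng.standard_normal((N, Q)) + 1j * rng.standard_normal((N, Q))

vec = lambda M: M.reshape(-1, 1, order="F")        # column-stacking vec(.)

def f(Z):
    return np.trace(Z @ A @ Z.conj().T)            # f = Tr{Z A Z^H}

# Prediction from d^2 f = 2 d vec^T(Z*) (A^T kron I_N) d vec(Z).
d2f_pred = 2 * (vec(dZ.conj()).T @ np.kron(A.T, np.eye(N)) @ vec(dZ)).item()

t = 1e-3
d2f_num = (f(Z + t * dZ) - 2 * f(Z) + f(Z - t * dZ)) / t**2
print(abs(d2f_pred - d2f_num))                     # rounding error only
```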

VIII. CONCLUSIONS

An introduction has been given to a set of powerful tools that can be used to systematically find the derivatives of complex-valued matrix functions which depend on complex-valued matrices. The key idea is to work via the complex differential of the function and to treat the differential of the complex variable and of its complex conjugate as independent. This general framework can be used in many optimization problems that depend on complex parameters. Many results are given in tabular form.

APPENDIX I

PROOF OF PROPOSITION 1

Proof: (8) leads to dZ^+ = d(Z^+ Z Z^+) = (d(Z^+ Z)) Z^+ + Z^+ Z dZ^+. If Z dZ^+ is found from d(Z Z^+) = (dZ) Z^+ + Z dZ^+, and inserted into the expression for dZ^+, then it is found that:

dZ^+ = (d(Z^+ Z)) Z^+ + Z^+ (d(Z Z^+) − (dZ) Z^+) = (d(Z^+ Z)) Z^+ + Z^+ d(Z Z^+) − Z^+ (dZ) Z^+. (55)

It is seen from (55) that it remains to express d(Z^+ Z) and d(Z Z^+) in terms of dZ and dZ∗. Firstly, d(Z^+ Z) is handled:

d(Z^+ Z) = d(Z^+ Z Z^+ Z) = (d(Z^+ Z)) Z^+ Z + Z^+ Z (d(Z^+ Z)) = (Z^+ Z (d(Z^+ Z)))^H + Z^+ Z (d(Z^+ Z)). (56)

The expression Z (d(Z^+ Z)) can be found from dZ = d(Z Z^+ Z) = (dZ) Z^+ Z + Z (d(Z^+ Z)), and it is given by Z (d(Z^+ Z)) = dZ − (dZ) Z^+ Z = (dZ)(I_Q − Z^+ Z). If this expression is inserted into (56), it is found that:

d(Z^+ Z) = (Z^+ (dZ)(I_Q − Z^+ Z))^H + Z^+ (dZ)(I_Q − Z^+ Z) = (I_Q − Z^+ Z)(dZ^H)(Z^+)^H + Z^+ (dZ)(I_Q − Z^+ Z).

Secondly, it can be shown in a similar manner that d(Z Z^+) = (I_N − Z Z^+)(dZ) Z^+ + (Z^+)^H (dZ^H)(I_N − Z Z^+). If the expressions for d(Z^+ Z) and d(Z Z^+) are inserted into (55), (9) is obtained.

APPENDIX II

PROOF OF LEMMA 1

Proof: Let A_i ∈ C^{M×NQ} be an arbitrary complex-valued function of Z ∈ C^{N×Q} and Z∗ ∈ C^{N×Q}. From Table II, it follows that d vec(Z) = d vec(Re{Z}) + j d vec(Im{Z}) and d vec(Z∗) = d vec(Re{Z}) − j d vec(Im{Z}). If these two expressions are substituted into A_0 d vec(Z) + A_1 d vec(Z∗) = 0_{M×1}, then it follows that A_0 (d vec(Re{Z}) + j d vec(Im{Z})) + A_1 (d vec(Re{Z}) − j d vec(Im{Z})) = 0_{M×1}. The last expression is equivalent to (A_0 + A_1) d vec(Re{Z}) + j (A_0 − A_1) d vec(Im{Z}) = 0_{M×1}. Since the differentials d Re{Z} and d Im{Z} are independent, so are d vec(Re{Z}) and d vec(Im{Z}). Therefore, A_0 + A_1 = 0_{M×NQ} and A_0 − A_1 = 0_{M×NQ}, from which it follows that A_0 = A_1 = 0_{M×NQ}.

APPENDIX III

PROOF OF LEMMA 2

First, three other results are stated and proven.

Lemma 4: Let A, B ∈ C^{N×N}. Then z^T A z = z^T B z ∀ z ∈ C^{N×1} is equivalent to A + A^T = B + B^T.

Proof: Let (A)_{k,l} = a_{k,l} and (B)_{k,l} = b_{k,l}. Assume first that z^T A z = z^T B z ∀ z ∈ C^{N×1}, and set z = e_k, where k ∈ {0, 1, . . . , N−1}. Then e_k^T A e_k = e_k^T B e_k gives a_{k,k} = b_{k,k} for all k ∈ {0, 1, . . . , N−1}. Setting z = e_k + e_l leads to (e_k^T + e_l^T) A (e_k + e_l) = (e_k^T + e_l^T) B (e_k + e_l), which results in a_{k,k} + a_{l,l} + a_{k,l} + a_{l,k} = b_{k,k} + b_{l,l} + b_{k,l} + b_{l,k}. Eliminating equal terms gives a_{k,l} + a_{l,k} = b_{k,l} + b_{l,k}, which can be written as A + A^T = B + B^T. Conversely, assuming that A + A^T = B + B^T, it follows that z^T A z = (1/2)(z^T A z + z^T A^T z) = (1/2) z^T (A + A^T) z = (1/2) z^T (B + B^T) z = (1/2)(z^T B z + z^T B^T z) = z^T B z, for all z ∈ C^{N×1}.

Corollary 1: Let A ∈ C^{N×N}. z^T A z = 0 ∀ z ∈ C^{N×1} is equivalent to A^T = −A.

Proof: Set B = 0_{N×N} in Lemma 4; then the corollary follows.

Lemma 5: Let A, B ∈ C^{N×N}. z^H A z = z^H B z ∀ z ∈ C^{N×1} is equivalent to A = B.

Proof: Let (A)_{k,l} = a_{k,l} and (B)_{k,l} = b_{k,l}. Assume first that z^H A z = z^H B z ∀ z ∈ C^{N×1}, and set z = e_k, where k ∈ {0, 1, . . . , N−1}. This gives a_{k,k} = b_{k,k} for all k ∈ {0, 1, . . . , N−1}. In the same way as in the proof of Lemma 4, setting z = e_k + e_l leads to A + A^T = B + B^T. Next, set z = e_k + j e_l; then similar manipulations give A − A^T = B − B^T. The equations A + A^T = B + B^T and A − A^T = B − B^T imply that A = B. Conversely, if A = B, then it follows that z^H A z = z^H B z for all z ∈ C^{N×1}.

Now, the proof of Lemma 2 is given.

Proof: Substitution of d vec(Z) = d vec(Re{Z}) + j d vec(Im{Z}) and d vec(Z∗) = d vec(Re{Z}) − j d vec(Im{Z}) into the second-order differential expression given in the lemma leads to:

[d vec^T(Re{Z})][B_0 + B_1 + B_2] d vec(Re{Z}) + [d vec^T(Im{Z})][−B_0 + B_1 − B_2] d vec(Im{Z})
+ [d vec^T(Re{Z})][j(B_0 + B_0^T) + j(B_1 − B_1^T) − j(B_2 + B_2^T)] d vec(Im{Z}) = 0. (57)

(57) is valid for all dZ, and the differentials d vec(Re{Z}) and d vec(Im{Z}) are independent. If d vec(Im{Z}) is set to the zero vector, then it follows from (57) and Corollary 1 that:

B_0 + B_1 + B_2 = −B_0^T − B_1^T − B_2^T. (58)

In the same way, setting d vec(Re{Z}) to the zero vector, it follows from (57) and Corollary 1 that:

−B_0 + B_1 − B_2 = B_0^T − B_1^T + B_2^T. (59)

Because of the skew symmetry in (58) and (59) and the linear independence of d vec(Re{Z}) and d vec(Im{Z}), it follows from (57) and Corollary 1 that:

(B_0 + B_0^T) + (B_1 − B_1^T) − (B_2 + B_2^T) = 0_{NQ×NQ}. (60)

(58), (59), and (60) lead to B_0 = −B_0^T, B_1 = −B_1^T, and B_2 = −B_2^T. Since the matrices B_0 and B_2 are skew-symmetric, Corollary 1 reduces (d vec^T(Z)) B_0 d vec(Z) + (d vec^T(Z∗)) B_1 d vec(Z) + (d vec^T(Z∗)) B_2 d vec(Z∗) = 0 to (d vec^T(Z∗)) B_1 d vec(Z) = 0. Then Lemma 5 results in B_1 = 0_{NQ×NQ}.

APPENDIX IV

PROOF OF LEMMA 3

Proof: Let the symbol ⊆ denote "subset of". From (7) and the definition of R(A), it follows that R(A) = {w ∈ C^{1×Q} | w = z A (A^+ A), for some z ∈ C^{1×N}} ⊆ R(A^+ A). From the definition of R(A^+ A), it follows that R(A^+ A) = {w ∈ C^{1×Q} | w = z A^+ A, for some z ∈ C^{1×Q}} ⊆ R(A). From R(A) ⊆ R(A^+ A) and R(A^+ A) ⊆ R(A), the first equality of (33) follows. From (7) and the definition of C(A), it follows that C(A) = {w ∈ C^{N×1} | w = (A A^+) A z, for some z ∈ C^{Q×1}} ⊆ C(A A^+). From the definition of C(A A^+), it follows that C(A A^+) = {w ∈ C^{N×1} | w = A A^+ z, for some z ∈ C^{N×1}} ⊆ C(A). From C(A) ⊆ C(A A^+) and C(A A^+) ⊆ C(A), the second equality of (33) follows. The third equation of (33) is a direct consequence of the first two equations of (33).

APPENDIX V

LEMMA SHOWING PROPERTIES OF THE MOORE-PENROSE INVERSE

Lemma 6: Let A ∈ C^{N×Q} and B ∈ C^{Q×R}. Then the following properties are valid for the Moore-Penrose inverse:

A^+ = A^{−1} for non-singular A, (61)
(A^+)^+ = A, (62)
(A^H)^+ = (A^+)^H, (63)
A^H = A^H A A^+ = A^+ A A^H, (64)
A^+ = A^H (A^+)^H A^+ = A^+ (A^+)^H A^H, (65)
(A^H A)^+ = A^+ (A^+)^H, (66)
(A A^H)^+ = (A^+)^H A^+, (67)
A^+ = (A^H A)^+ A^H = A^H (A A^H)^+, (68)
A^+ = (A^H A)^{−1} A^H if A has full column rank, (69)
A^+ = A^H (A A^H)^{−1} if A has full row rank, (70)
AB = 0_{N×R} ⇔ B^+ A^+ = 0_{R×N}. (71)

Proof: (61), (62), and (63) can be proved by direct insertion into Definition 3 of the Moore-Penrose inverse. The first part of (64) can be proved like this: A^H = A^H (A^H)^+ A^H = A^H (A A^+)^H = A^H A A^+, where the result from (63) was used. The second part of (64) can be proved in a similar way: A^H = A^H (A^H)^+ A^H = (A^+ A)^H A^H = A^+ A A^H. The first part of (65) can be shown by A^+ = A^+ A A^+ = (A^+ A)^H A^+ = A^H (A^+)^H A^+. The second part of (65) follows from A^+ = A^+ A A^+ = A^+ (A A^+)^H = A^+ (A^+)^H A^H. (66) and (67) can be proved by using the results from (64) and (65) in the definition of the Moore-Penrose inverse. (68) follows from (65), (66), and (67). (69) and (70) follow from (61), (68), and rank(A) = rank(A^H A) = rank(A A^H), taken from 0.4.6 in [22].

Now, (71) will be shown. Firstly, it is shown that AB = 0_{N×R} implies B^+ A^+ = 0_{R×N}. Assume that AB = 0_{N×R}. From (68), it follows that:

B^+ A^+ = (B^H B)^+ B^H A^H (A A^H)^+. (72)

AB = 0_{N×R} leads to B^H A^H = 0_{R×N}, and then (72) yields B^+ A^+ = 0_{R×N}. Secondly, it will be shown that B^+ A^+ = 0_{R×N} implies AB = 0_{N×R}. Assume that B^+ A^+ = 0_{R×N}. Using the implication just proved, i.e., if CD = 0_{M×P}, then D^+ C^+ = 0_{P×M}, where M and P are integers given by the sizes of the matrices C and D, gives (A^+)^+ (B^+)^+ = 0_{N×R}, and then the desired result follows from (62).

REFERENCES

[1] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge, United Kingdom: Cambridge University Press, 2003.
[2] F. J. González-Vázquez, "The differentiation of functions of conjugate complex variables: Application to power network analysis," IEEE Trans. Education, vol. 31, no. 4, pp. 286–291, Nov. 1988.
[3] S. Alexander, "A derivation of the complex fast Kalman algorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1230–1232, Dec. 1984.
[4] A. I. Hanna and D. P. Mandic, "A fully adaptive normalized nonlinear gradient descent algorithm for complex-valued nonlinear adaptive filters," IEEE Trans. Signal Process., vol. 51, no. 10, pp. 2540–2549, Oct. 2003.
[5] E. Kreyszig, Advanced Engineering Mathematics, 7th ed. New York, USA: John Wiley & Sons, Inc., 1993.
[6] J. W. Brewer, "Kronecker products and matrix calculus in system theory," IEEE Trans. Circuits, Syst., vol. CAS-25, no. 9, pp. 772–781, Sept. 1978.
[7] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Application in Statistics and Econometrics. Essex, England: John Wiley & Sons, Inc., 1988.
[8] D. A. Harville, Matrix Algebra from a Statistician's Perspective. Springer-Verlag, 1997, corrected second printing, 1999.
[9] T. P. Minka. (December 28, 2000) Old and new matrix algebra useful for statistics. [Online]. Available: http://research.microsoft.com/~minka/papers/matrix/
[10] K. Kreutz-Delgado, "Real vector derivatives and gradients," Dept. of Electrical and Computer Engineering, UC San Diego, Tech. Rep. Course Lecture Supplement No. ECE275A, December 5, 2005, http://dsp.ucsd.edu/~kreutz/PEI05.html.
[11] S. Haykin, Adaptive Filter Theory, 4th ed. Englewood Cliffs, New Jersey, USA: Prentice Hall, 1991.
[12] A. H. Sayed, Fundamentals of Adaptive Filtering. IEEE Computer Society Press, 2003.
[13] M. H. Hayes, Statistical Digital Signal Processing and Modeling. John Wiley & Sons, Inc., 1996.
[14] A. van den Bos, "Complex gradient and Hessian," Proc. IEE Vision, Image and Signal Process., vol. 141, no. 6, pp. 380–383, Dec. 1994.
[15] D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," IEE Proc., Parts F and H, vol. 130, no. 1, pp. 11–16, Feb. 1983.
[16] M. Brookes. Matrix reference manual. [Online]. Available: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html
[17] K. Kreutz-Delgado, "The complex gradient operator and the CR-calculus," Dept. of Electrical and Computer Engineering, UC San Diego, Tech. Rep. Course Lecture Supplement No. ECE275A, Sept.–Dec. 2005, http://dsp.ucsd.edu/~kreutz/PEI05.html.
[18] G. Jöngren, M. Skoglund, and B. Ottersten, "Combining beamforming and orthogonal space-time block coding," IEEE Trans. Inform. Theory, vol. 48, no. 3, pp. 611–627, Mar. 2002.
[19] D. Gesbert, M. Shafi, D. Shiu, P. J. Smith, and A. Naguib, "From theory to practice: An overview of MIMO space-time coded wireless systems," IEEE J. Select. Areas Commun., vol. 21, no. 3, pp. 281–302, Apr. 2003.
[20] D. H. Johnson and D. A. Dudgeon, Array Signal Processing: Concepts and Techniques. Prentice-Hall, Inc., 1993.
[21] W. Wirtinger, "Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen," Mathematische Annalen, vol. 97, pp. 357–375, 1927.
[22] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985, reprinted 1999.
[23] ——, Topics in Matrix Analysis. Cambridge, UK: Cambridge University Press, 1991, reprinted 1999.
[24] D. G. Luenberger, Introduction to Linear and Nonlinear Programming. Reading, Massachusetts: Addison–Wesley, 1973.
[25] N. Young, An Introduction to Hilbert Space. Cambridge: Press Syndicate of the University of Cambridge, 1990, reprinted edition.
[26] C. H. Edwards and D. E. Penney, Calculus and Analytic Geometry, 2nd ed. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1986.
[27] H. Sampath and A. Paulraj, "Linear precoding for space-time coded systems with known fading correlations," IEEE Commun. Lett., vol. 6, no. 6, pp. 239–241, June 2002.
[28] A. Hjørungnes and D. Gesbert, "Precoding of orthogonal space-time block codes in arbitrarily correlated MIMO channels: Iterative and closed-form solutions," IEEE Trans. Wireless Commun., 2005, accepted 11.10.2005.
[29] T. J. Abatzoglou, J. M. Mendel, and G. A. Harada, "The constrained total least squares technique and its applications to harmonic superresolution," IEEE Trans. Signal Process., vol. 39, no. 5, pp. 1070–1087, May 1991.
[30] G. Yan and H. Fan, "A Newton-like algorithm for complex variables with applications in blind equalization," IEEE Trans. Signal Process., vol. 48, no. 2, pp. 553–556, Feb. 2000.