


Journal of Computational and Applied Mathematics 267 (2014) 195–217

Contents lists available at ScienceDirect

Journal of Computational and Applied Mathematics

journal homepage: www.elsevier.com/locate/cam

Algorithms for the Geronimus transformation for orthogonal polynomials on the unit circle

Matthias Humet ∗, Marc Van Barel
Department of Computer Science, KU Leuven, Celestijnenlaan 200a - bus 2402, 3001 Heverlee, Belgium

Article info

Article history:
Received 30 January 2012
Received in revised form 20 December 2013

MSC:
42C05
15A23
65F99

Keywords:
Orthogonal polynomials
Geronimus transformation
QR step
Semiseparable matrices
RQ factorization
Unitary Hessenberg matrices

Abstract

Let L̂ be a positive definite bilinear functional on the unit circle defined on Pn, the space of polynomials of degree at most n. Then its Geronimus transformation L is defined by L̂(p, q) = L((z − α)p(z), (z − α)q(z)) for all p, q ∈ Pn, α ∈ C. Given L̂, there are infinitely many such L, which can be described by a complex free parameter. The Hessenberg matrix that appears in the recurrence relations for orthogonal polynomials on the unit circle is unitary, and can be factorized using its associated Schur parameters. Recent results show that the unitary Hessenberg matrices associated with L and L̂, respectively, are related by a QR step where all the matrices involved are of order n + 1. For the analogue on the real line of this so-called spectral transformation, the tridiagonal Jacobi matrices associated with the respective functionals are related by an LR step. In this paper we derive algorithms that compute the new Schur parameters after applying a Geronimus transformation. We present two forward algorithms and one backward algorithm. The QR step between unitary Hessenberg matrices plays a central role in the derivation of each of the algorithms, where the main idea is to do the inverse of a QR step. Making use of the special structure of unitary Hessenberg matrices, all the algorithms are efficient and need only O(n) flops. We present several numerical experiments to analyse the accuracy and to explain the behaviour of the algorithms.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Let Lm,n, m ≤ n, be the vector space of Laurent polynomials

    p(z) = ∑_{k=m}^{n} ak z^k,  ak ∈ C.

We denote by P = L0,∞ the vector space of polynomials with complex coefficients and by Pn = L0,n its subspace of polynomials of degree less than or equal to n.

The research was partially supported by the Research Council KU Leuven, project OT/10/038 (Multi-parameter model order reduction and its applications), CoE EF/05/006 Optimization in Engineering (OPTEC), by the Fund for Scientific Research–Flanders (Belgium), G.0828.14N (Multivariate polynomial and rational interpolation and approximation), and by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office, Belgian Network DYSCO (Dynamical Systems, Control, and Optimization).
∗ Corresponding author. Tel.: +32 16 327563; fax: +32 16 327996.

E-mail addresses: [email protected] (M. Humet), [email protected] (M. Van Barel).

http://dx.doi.org/10.1016/j.cam.2014.02.017
0377-0427/© 2014 Elsevier B.V. All rights reserved.


Next we consider a bilinear functional L defined on Pn, which is Hermitian, L(p, q) = conj(L(q, p)), and unitary, L(zp, zq) = L(p, q). The moment matrix Tn ∈ C^((n+1)×(n+1)) associated with L is the Toeplitz matrix

    Tn = [L(z^i, z^j)]_{i,j=0}^{n} =
         ⎡ μ0    μ̄1    ⋯    μ̄n−1   μ̄n   ⎤
         ⎢ μ1    μ0    ⋱          μ̄n−1  ⎥
         ⎢ ⋮     ⋱     ⋱     ⋱    ⋮    ⎥     (1)
         ⎢ μn−1        ⋱     μ0    μ̄1   ⎥
         ⎣ μn    μn−1  ⋯     μ1    μ0   ⎦

where μk = L(z^k, 1), k = 0, …, n, are the moments associated with L.
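As a quick illustration (our own sketch, not part of the paper; the helper name `moment_matrix` is ours), the Hermitian Toeplitz moment matrix (1) can be assembled with numpy:

```python
import numpy as np

def moment_matrix(mu):
    """Hermitian Toeplitz moment matrix T_n from moments mu = [mu_0, ..., mu_n].

    Entry (i, j) equals L(z^i, z^j) = mu_{i-j}, with mu_{-k} = conj(mu_k).
    """
    mu = np.asarray(mu, dtype=complex)
    n = len(mu)
    return np.array([[mu[i - j] if i >= j else np.conj(mu[j - i])
                      for j in range(n)] for i in range(n)])

# The normalized Lebesgue measure dtheta/(2*pi) has moments mu_k = delta_{k,0},
# so its moment matrix is the identity: positive definite, as in Definition 1.1.
T = moment_matrix([1.0, 0.0, 0.0])
```

A successful `np.linalg.cholesky(T)` is then a cheap numerical test of positive definiteness.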

Definition 1.1 ([1]).
(i) L is quasi-definite on Pn if Tn is strongly regular, i.e., if all the leading principal submatrices of Tn are nonsingular.
(ii) L is positive definite on Pn if Tn > 0, i.e., if Tn is positive definite.

For P∞ = P, we will simply say that L is quasi-definite (or positive definite).

As mentioned in [2], L can be written as L(p, q) = F(p(z) q̄(1/z)), where F is a linear functional defined on L−n,n with

    F(z^k) = μk for k ≥ 0,   F(z^k) = μ̄−k for k ≤ 0.

The bar on q̄(z) denotes complex conjugation of the coefficients of q(z). It is well known that if L is positive definite, then it has an integral representation given by

    L(p, q) = ∫_T p(z) q̄(1/z) dμ(z),     (2)

where dμ(z) is a positive measure on the unit circle T = {z ∈ C : |z| = 1} (see [2–4]). If L is positive definite on Pn, then there exists a unique sequence of orthonormal polynomials {φk}₀ⁿ defined by¹

n0 defined by1

φk(z) = κkzk + lower degree terms, κk > 0,L(φk, φl) = δk,l,

(3)

where δk,l is the Kronecker delta. The polynomials φk(z) satisfy the following recurrence relations (see [5])

ρkφk(z) = zφk−1(z)+ akφ∗

k−1(z),φk(z) = zρkφk−1(z)+ akφ∗

k (z),

where ρk =1 − |ak|2 and φ∗

k (z) = zk φk (1/z) is the so-called reversed polynomial of φk(z). Both recurrence relationsare equivalent, the first is a forward relation, while the second is a backward one. The numbers akn1 are known as Schurparameters, Verblunsky parameters and reflection coefficients. They lie inside the open unit disc D = z ∈ C : |z| < 1 andtogether with the first moment µ0, they determine the polynomials φk

n0 completely.
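For concreteness, the forward recurrence can be run on coefficient vectors. The sketch below is our own (helper name and normalization μ0 = 1 are assumptions, not the paper's code); it uses the convention ρk φk(z) = z φk−1(z) + ak φ*k−1(z) and returns coefficients in increasing degree:

```python
import numpy as np

def orthonormal_polys(a, mu0=1.0):
    """Coefficients of phi_0, ..., phi_n from Schur parameters a = [a_1, ..., a_n].

    Forward recurrence: rho_k phi_k(z) = z phi_{k-1}(z) + a_k phi*_{k-1}(z),
    where phi*_k is obtained by reversing and conjugating the coefficients.
    """
    phi = [np.array([1.0 / np.sqrt(mu0)], dtype=complex)]  # phi_0 = 1/sqrt(mu_0)
    phistar = [phi[0].copy()]
    for ak in a:
        rho = np.sqrt(1.0 - abs(ak) ** 2)
        new = (np.concatenate(([0.0], phi[-1]))            # z * phi_{k-1}
               + ak * np.concatenate((phistar[-1], [0.0]))) / rho
        phi.append(new)
        phistar.append(np.conj(new[::-1]))                 # phi*_k = z^k conj(phi_k)(1/z)
    return phi

# Zero Schur parameters (Lebesgue measure): phi_k(z) = z^k.
polys = orthonormal_polys([0.0, 0.0])
```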

If L is quasi-definite but not positive definite on Pn, then there exists a sequence of monic orthogonal polynomials {Φk}₀ⁿ, so Φk(z) = z^k + lower degree terms, and L(Φk, Φl) = γk δk,l with γk ≠ 0. The polynomials Φk(z) satisfy similar recurrence relations determined by the same Schur parameters {ak}₁ⁿ, which satisfy |ak| ≠ 1. Note that in the quasi-definite case the Schur parameters can lie outside the unit circle, while ak ∈ D for all k ≤ n iff L is positive definite on Pn.

Suppose L is positive definite, with associated orthonormal polynomials {φk}₀^∞, and let φ(z) = [φ0(z) φ1(z) ⋯]ᵀ. Then

    z φ(z)ᵀ = φ(z)ᵀ H     (4)

where H is a semi-infinite Hessenberg matrix with orthonormal columns (see [5, Chapter 4]). It is completely determined by the sequence of Schur parameters {ak}ₖ≥₁ that appear in the recurrence relations for {φk}₀^∞. We call H the Hessenberg matrix associated with L.

In [6,2,7,8] the following linear spectral transformation of L has been studied,

    L̂(p, q) = L((z − α)p(z), (z − α)q(z)),  α ∈ C,     (5)

called the Christoffel transformation of L. We use the notation L̂ = |z − α|²L, since for a positive definite L associated with a measure dμ(z), it amounts to multiplying the measure by |z − α|². Our work builds on an important connection

¹ We will always assume that orthonormal polynomials are defined with a positive highest degree coefficient.


between the Christoffel transformation and a QR step, studied in [6,2]. If H and Ĥ are the Hessenberg matrices associated with positive definite L and L̂, respectively, then

    H − αI = QR,
    Ĥ − αI = RQ,     (6)

where Q has orthonormal columns and R is upper triangular with positive real diagonal elements. I is the semi-infinite identity matrix.

The inverse of the Christoffel transformation has been considered in [8,9]. In [8] it is proven that if L̂ is a bilinear functional, |α| > 1 and (5) holds, then L is given by

    L(p, q) = L̂( p(z)/(z − α), q(z)/(z − α) ) + m p(α) q̄(1/ᾱ) + m̄ p̄(1/ᾱ) q(α),     (7)

where m ∈ C is a free parameter. (7) is called the Geronimus transformation of L̂. Hence, a third linear spectral transformation appears naturally, called the Uvarov transformation, which has been studied in [6,8]:

    U(p, q) = L(p, q) + m p(α) q̄(1/ᾱ) + m̄ p̄(1/ᾱ) q(α),  |α| > 1, m ∈ C.     (8)

A recent overview of these linear spectral transformations is given in [10], both for the real line and for the unit circle. The analogue for the real line of the Hessenberg matrix H is a symmetric positive definite tridiagonal matrix J. It is often referred to as the Jacobi matrix, associated with a positive measure dν(x) on the real line. The Christoffel transformation on the real line comes down to multiplication of dν(x) by x − β, which yields a new Jacobi matrix Ĵ that satisfies

    J − βI = LLᵀ,
    Ĵ − βI = LᵀL.     (9)

This relation corresponds to one step of the Cholesky LR algorithm with shift β [11, Chapter 8, Sections 20, 55], disregarding the fact that J is semi-infinite.² When multiplying dν(x) by (x − β)², the resulting Jacobi matrix can be found by a QR step with shift β. This can be proven directly (see [12–14]) or by noticing that doing two steps of a Cholesky LR algorithm is the same as doing one step of a QR algorithm, a fact discussed in [15].
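One step of (9) is easy to state in dense form. The sketch below is our own code (not the paper's implementation) and assumes J − βI is positive definite, as footnote 2 requires:

```python
import numpy as np

def cholesky_lr_step(J, beta):
    """One shifted Cholesky LR step: factor J - beta*I = L L^T, return L^T L + beta*I.

    Since L^T L = L^{-1} (L L^T) L, the result is similar to J (same eigenvalues)
    and is again symmetric tridiagonal.
    """
    n = J.shape[0]
    L = np.linalg.cholesky(J - beta * np.eye(n))
    return L.T @ L + beta * np.eye(n)

# A small symmetric tridiagonal Jacobi matrix with J - I positive definite.
J = (np.diag([2.0, 3.0, 4.0])
     + np.diag([0.5, 0.5], 1) + np.diag([0.5, 0.5], -1))
Jhat = cholesky_lr_step(J, beta=1.0)
```

Doing this step twice with the same shift reproduces one shifted QR step, which is the fact from [15] used above.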

These relations between Jacobi matrices and the Christoffel transformation on the real line first appeared in [16,12,17] in the context of constructing Gaussian quadrature rules. The resulting algorithms were studied again in [18], where a modified algorithm is presented, including a stability analysis.

Analogously to (7), the Geronimus transformation of a measure dν(x) on the real line depends on a free parameter (see [19,10]). Only one value of this parameter corresponds to the modified measure dν(x)/(x − β). Finding the Jacobi matrix associated with this measure has been studied in [20–22]. In [19] the more general case for any value of the free parameter has been considered, including a stability analysis.

In this paper we present efficient algorithms for the Geronimus transformation on the unit circle and we discuss their accuracy by doing several numerical experiments. We only focus on positive definite functionals, hence our algorithms can be written in terms of unitary Hessenberg matrices. More precisely, we need to do an inverse QR step, by computing some sort of RQ factorization. By working directly on a finite sequence of n Schur parameters that constitute these Hessenberg matrices, the algorithms need O(n) floating point operations (flops), as in the real line case.

Eq. (7) shows that the Geronimus transformation is not unique, but depends on a free parameter m. While this equation holds for quasi-definite functionals, we will only consider functionals that are positive definite. Moreover, since we work with a finite sequence of Schur parameters, the functional L is not necessarily defined on the entire space P. In Section 2 we redefine the Geronimus transformation for a positive definite functional defined on Pn. Using linear algebra, we prove that the solution set can again be described by an Uvarov relation with parameter m ∈ C, a result analogous to (7).

We proceed by reviewing some linear algebra concepts in Section 3 that are needed in the following sections, such asGivens transformations and semiseparable matrices. Then in Section 4 we deduce a finite generalization of (6) which servesas a basis for all the algorithms discussed.

Before heading to our main contribution, we give an efficient algorithm for the Christoffel transformation in Section 5. Although fast implementations of this algorithm already exist in the context of the QR algorithm to find eigenvalues of unitary Hessenberg matrices (see [23–25]), we include our version, because it is closely related to the algorithms in the following sections.

In Section 6 a backward algorithm for the Geronimus transformation is presented, built on the Christoffel algorithm ofSection 5. Section 7 is devoted to computing a unique RQ factorization in a forward sense. To this end, we show that the

² Note that the Cholesky factorization in (9) only exists if J − βI is positive definite. More general relations for Jacobi matrices can be found in the references that are given later.


matrix R is a semiseparable matrix and we analyse some of its properties. This RQ factorization then leads to two forwardalgorithms for the Geronimus transformation in Section 8.

We finish the paper with some numerical experiments in Section 9. First we show that the backward method can be very accurate. Then we discuss how the backward method can converge to a numerically identical sequence of Schur parameters for any value of the free parameter. The convergence behaviour depends on α and on the input sequence. Next we show some results of the forward method and we analyse the conditioning of the forward problem. Both the convergence of the backward method and the conditioning of the forward problem are linked with the Uvarov transformation. Finally we compare both variants of the forward method.

During the preparation of this manuscript, we became aware of similar results by Cantero, Moral and Velázquez in [26]. Instead of using linear matrix algebra, they work with direct relations between Schur parameters. The algorithms they present are more general, since they compute Schur parameters associated with Hermitian Laurent polynomial modifications of arbitrary degree of quasi-definite functionals. Indeed, the Christoffel transformation is a special case of a Hermitian Laurent polynomial modification of degree one. However, from a numerical point of view, there is no guarantee that their algorithms will be accurate, since no numerical tests have been done. Although focusing on a more specific problem, we think that our contribution is valuable, since we adopt an entirely different approach, we include several numerical experiments to show when our algorithms work well, and we analyse the numerical behaviour of the problem. By restricting ourselves to positive definite functionals, we can implement an inverse QR step using Givens transformations, which is a numerically attractive approach.

2. Geronimus transformation redefined

In this section, we redefine the Geronimus transformation in Pn and describe the solution set of this transformation using linear algebra.

Suppose a functional L is only defined on Pn, which we denote as L[Pn]. Then we can write its Christoffel transformation as the functional L̂ that satisfies

    L̂(p, q) = L((z − α)p(z), (z − α)q(z)),  for all p, q ∈ Pn−1.     (10)

The algorithms in this paper solve the inverse problem: given the functional L̂[Pn−1] by its associated Schur parameters {âk}₁ⁿ⁻¹, find the Geronimus transformation L[Pn] that satisfies (10), by computing the Schur parameters {ak}₁ⁿ. In terms of the moments μ̂k and μk associated with L̂ and L, respectively, we can write (10) as

    μ̂k = −α μk−1 + (1 + |α|²) μk − ᾱ μk+1,  k = −n + 1, …, n − 1.

Note that μ−k = μ̄k and μ̂−k = conj(μ̂k). If L̂ is given by its moments {μ̂k}₀ⁿ⁻¹, then the problem of finding L is equivalent to solving the linear system

    ⎡  β  −ᾱ                       ⎤ ⎡ μ−n   ⎤   ⎡ γ     ⎤
    ⎢ −α   β  −ᾱ                   ⎥ ⎢ μ−n+1 ⎥   ⎢ μ̂−n+1 ⎥
    ⎢     ⋱   ⋱   ⋱                ⎥ ⎢  ⋮    ⎥   ⎢  ⋮    ⎥
    ⎢        −α   β  −ᾱ            ⎥ ⎢ μ0    ⎥ = ⎢ μ̂0    ⎥
    ⎢            ⋱   ⋱   ⋱         ⎥ ⎢  ⋮    ⎥   ⎢  ⋮    ⎥
    ⎢               −α   β  −ᾱ     ⎥ ⎢ μn−1  ⎥   ⎢ μ̂n−1  ⎥
    ⎣                   −α   β     ⎦ ⎣ μn    ⎦   ⎣ γ̄     ⎦

where β = 1 + |α|². We write this as

    A μ(γ) = μ̂(γ).     (11)

The matrix A is tridiagonal, Toeplitz and Hermitian. The number γ ∈ C acts as a free parameter describing the solution set of Geronimus transformations of L̂. Since the matrix A is positive definite,³ every value of γ corresponds to exactly one solution and vice versa. Note that μ(γ) and μ̂(γ) satisfy the following property:

    P μ(γ) = conj(μ(γ)),

where P ∈ R^((2n+1)×(2n+1)) is defined by Pi,j = δi+j,2n+2. So reversing the ordering of the elements of μ(γ) is equivalent to taking the complex conjugate; the same holds for μ̂(γ).

3 This can be checked by computing its leading principal minors.


We can write the solution of (11) as

    μ(γ) = μ(0) + d,

where μ(0) is the particular solution for γ = 0 and d satisfies

    A d = [γ  0  ⋯  0  γ̄]ᵀ.     (12)

Let Ã be the matrix A without its first row and its last row; then (12) holds iff Ã d = 0 and P d = d̄. The nullspace of Ã is two-dimensional. One can easily check that the following vectors are in this nullspace:

    u = [α⁻ⁿ  ⋯  α⁻¹  1  α  ⋯  αⁿ]ᵀ,
    v = [ᾱⁿ  ⋯  ᾱ  1  ᾱ⁻¹  ⋯  ᾱ⁻ⁿ]ᵀ.

If |α| ≠ 1, then u and v are linearly independent, so there exist δu, δv ∈ C such that

    d = δu u + δv v.

Note that P u = v̄ and P v = ū. The condition P d = d̄ then translates to

    δu v̄ + δv ū = δ̄u ū + δ̄v v̄,

and from the linear independence of u and v it follows that δv = δ̄u. Hence, writing m = δu, we can write the solution of (11) as

    μ(γ) = μ(0) + m u + m̄ v.

We have proven the following proposition, which is a variant of (7) for finite functionals.

Proposition 2.1. Let L̂[Pn−1] be a functional given by the sequence of moments {μ̂k}₀ⁿ⁻¹. If |α| ≠ 1, then the Geronimus transformation L[Pn] defined by (10) is characterized by a solution set described by one complex parameter m. The moments {μk}₀ⁿ associated with a solution L satisfy

    μk = μ̃k + m α^k + m̄ ᾱ^(−k),  m ∈ C,

where {μ̃k}₀ⁿ is a particular solution.
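Proposition 2.1 can be checked numerically. The sketch below is our own code (helper names `geronimus_moments` and `christoffel` are ours): it assembles the tridiagonal system (11), solves it for γ = 0, and verifies that shifting the moments by m α^k + m̄ ᾱ^(−k) reproduces the same μ̂k.

```python
import numpy as np

def geronimus_moments(mu_hat, alpha, gamma=0.0):
    """Solve (11) for [mu_0, ..., mu_n], given mu_hat = [mu_hat_0, ..., mu_hat_{n-1}].

    The unknown vector holds mu_{-n}, ..., mu_n with mu_{-k} = conj(mu_k); A is
    tridiagonal Toeplitz with diagonal 1 + |alpha|^2 and off-diagonals -alpha,
    -conj(alpha).
    """
    mu_hat = np.asarray(mu_hat, dtype=complex)
    n = len(mu_hat)
    N = 2 * n + 1
    beta = 1 + abs(alpha) ** 2
    A = (beta * np.eye(N)
         - np.conj(alpha) * np.eye(N, k=1)
         - alpha * np.eye(N, k=-1))
    rhs = np.concatenate(([gamma], np.conj(mu_hat[:0:-1]), mu_hat, [np.conj(gamma)]))
    return np.linalg.solve(A, rhs)[n:]          # mu_0, ..., mu_n

def christoffel(mu, alpha, k):
    """mu_hat_k = -alpha*mu_{k-1} + (1+|alpha|^2)*mu_k - conj(alpha)*mu_{k+1}."""
    get = lambda j: mu[j] if j >= 0 else np.conj(mu[-j])
    return (-alpha * get(k - 1) + (1 + abs(alpha) ** 2) * get(k)
            - np.conj(alpha) * get(k + 1))

alpha = 0.5
mu_hat = np.array([1.0, 0.2])
mu = geronimus_moments(mu_hat, alpha)

# Any member of the solution set gives back the same mu_hat (Proposition 2.1).
m = 0.3 + 0.1j
mu_m = np.array([mu[k] + m * alpha**k + np.conj(m) * np.conj(alpha)**(-k)
                 for k in range(len(mu))])
```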

We adopt the following notation.

Definition 2.2. Given a functional L̂[Pn−1], S is the set of Geronimus transformations of L̂:

    S = { L[Pn] : L̂ = |z − α|² L }.

If L̂ is positive definite on Pn−1, then S⁺ is the set of positive definite Geronimus transformations of L̂:

    S⁺ = { positive definite L[Pn] : L̂ = |z − α|² L }.

Note that the set S⁺ can be described by all the positive definite Uvarov transformations of one particular L ∈ S⁺, a topic we have discussed in [27].

Our algorithms for the Geronimus transformation are restricted to computing Schur parameters ak ∈ D. Hence, we can only compute solutions pertaining to the set S⁺. Therefore, this set will be of importance to explain the numerical behaviour of the algorithms that compute a Geronimus transformation. We cover this in Section 9, where we also make connections with the results in [27].

If L̂ is known by its measure dμ̂, then

    L(p, q) = ∫_T p(z) q̄(1/z) dμ̂(z) / |z − α|²     (13)

is the unique positive definite Geronimus transformation defined by (5). Its uniqueness follows from (7) and the result in [27] that m = 0 iff the Uvarov transformation is positive definite. In practice only the first n moments are known, corresponding to L̂[Pn−1], yielding a set S⁺ of Geronimus transformations. The functional L[Pn] given by (13) of course belongs to S⁺, and its Schur parameters can be computed with the algorithms presented in this paper.

3. Givens transformations, semiseparability and Hessenberg matrices

In this section, we briefly define some numerical linear algebra concepts, which are needed in other sections. These include Givens transformations, semiseparable matrices and unitary Hessenberg matrices.


Givens transformations

Givens transformations are unitary matrices that act on only two rows (columns) when multiplied on the left (right). Throughout this article, we will make extensive use of Givens reflections, which are Givens transformations with determinant equal to −1. Note that Givens rotations have determinant equal to 1. We consider Givens reflections Ωk ∈ C^(n×n) that act on consecutive rows, which can be written as

    Ωk = diag(Ik−1, Ω̃k, In−k−1),   Ω̃k = ⎡ c   s  ⎤ ,  |c|² + |s|² = 1.     (14)
                                          ⎣ s̄  −c̄ ⎦

The subindex k indicates that Ωk acts on rows k and k + 1, and Ω̃k denotes the nontrivial part of the Givens transformation Ωk.

We will rely on two properties, discussed in [28, pp. 392–395] for real matrices, to manipulate sequences of Givens transformations. We also introduce a visualization for Givens transformations that will clarify our reasoning in the following sections.

Lemma 3.1 (Fusion). Given two Givens transformations Ωk and Ω̂k that operate on the same two rows, Γk = Ωk Ω̂k is again a Givens transformation. We will call this the fusion of Givens transformations.

Lemma 3.2 (Shift-Through). Suppose three Givens transformations Ak, Bk+1 and Ck are given. Then there exist Givens transformations Ak+1, Bk and Ck+1 such that

    Ak Bk+1 Ck = Ak+1 Bk Ck+1.

Note that the fusion of two reflections is a rotation. In our implementation of shift-through, we will always choose Ak+1 and Bk to be reflections, leaving Ck+1 uniquely determined. They can easily be found as the Givens reflections that induce zeros at the appropriate locations in the unitary 3 × 3 matrix Ak Bk+1 Ck.
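Both lemmas are easy to experiment with numerically. The sketch below is our own code (function names are ours); the factors of the shift-through are extracted by zeroing entries of the unitary 3 × 3 product, following the remark above.

```python
import numpy as np

def givens(c, s):
    """2x2 Givens reflection [[c, s], [conj(s), -conj(c)]] with |c|^2 + |s|^2 = 1."""
    return np.array([[c, s], [np.conj(s), -np.conj(c)]])

def zeroing_reflection(a, b):
    """Reflection G with G @ [a, b]^T = [r, 0]^T, r >= 0."""
    r = np.hypot(abs(a), abs(b))
    if r == 0:
        return np.eye(2, dtype=complex)
    return givens(np.conj(a) / r, np.conj(b) / r)

def embed(G2, k, n=3):
    """Embed a 2x2 block on rows/columns k, k+1 (0-based) of an n x n identity."""
    G = np.eye(n, dtype=complex)
    G[k:k + 2, k:k + 2] = G2
    return G

def shift_through(A, B, C):
    """Given M = A(rows 1-2) B(rows 2-3) C(rows 1-2), return (A', B', C') with
    M = A'(rows 2-3) B'(rows 1-2) C'(rows 2-3)."""
    M = embed(A, 0) @ embed(B, 1) @ embed(C, 0)
    G2 = zeroing_reflection(M[1, 0], M[2, 0])   # annihilate M[2, 0] from the left
    N = embed(G2, 1) @ M
    G1 = zeroing_reflection(N[0, 0], N[1, 0])   # annihilate N[1, 0]; N[0, 0] -> 1
    P = embed(G1, 0) @ N                        # P is C' embedded on rows 2-3
    return G2.conj().T, G1.conj().T, P[1:, 1:]

# Fusion (Lemma 3.1): two reflections on the same rows fuse to one rotation.
F = givens(0.6, 0.8) @ givens(0.8, -0.6)

# Shift-through (Lemma 3.2) on a concrete triple of reflections.
A, B, C = givens(0.6, 0.8), givens(0.8j, 0.6), givens(1 / np.sqrt(2), 1 / np.sqrt(2))
A2, B2, C2 = shift_through(A, B, C)
```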

Semiseparability and Hessenberg matrices

Another concept that we need is semiseparability, which we define as follows (similar definitions can be found in [28]):

Definition 3.3. A matrix A ∈ C^(n×n) is called k-semiseparable if all submatrices whose lower-left element lies on the k-th superdiagonal have rank one or zero, i.e.,

    rank(A(1 : j, j + k : n)) ≤ 1,  1 ≤ j ≤ n − k.     (15)

We will visualize this structure as follows. Consider from left to right A0, A1 and A2, respectively 0-, 1- and 2-semiseparable 5 × 5 matrices:

    ⊗ ⊗ ⊗ ⊗ ⊗      × ⊗ ⊗ ⊗ ⊗      × × ⊗ ⊗ ⊗
    × ⊗ ⊗ ⊗ ⊗      × × ⊗ ⊗ ⊗      × × × ⊗ ⊗
    × × ⊗ ⊗ ⊗      × × × ⊗ ⊗      × × × × ⊗
    × × × ⊗ ⊗      × × × × ⊗      × × × × ×
    × × × × ⊗      × × × × ×      × × × × ×

Every submatrix consisting only of elements ⊗ has rank one or rank zero. Note that a k-semiseparable matrix is also (k + m)-semiseparable, when m ≥ 0. Later on, we will need the following proposition.

Proposition 3.4. Let A, H ∈ C^(n×n). If A is k-semiseparable and H is lower Hessenberg, then the products M = HA and N = AH will be at least (k + 1)-semiseparable. Furthermore, the rows contained in the corresponding semiseparable parts of M and A are proportional, and the columns contained in the corresponding semiseparable parts of N and A are proportional. More precisely, if 1 ≤ j ≤ n − k − 1 and

    A(1:j, j+k:n) = a bᵀ,   M(1:j−1, j+k:n) = c dᵀ,   N(1:j, j+k+1:n) = e f ᵀ,     (16)

then d = γ b and e = δ a for some γ, δ ∈ C.

Page 7: Algorithms for the Geronimus transformation for orthogonal polynomials on the unit circle

M. Humet, M. Van Barel / Journal of Computational and Applied Mathematics 267 (2014) 195–217 201

Proof. We prove that if (16) holds for j ∈ {1, …, n − k − 1}, then d = γ b. Let hlᵀ and mlᵀ be the l-th row vectors of H and M, respectively. Then for any l ∈ {1, …, j − 1} we can write hlᵀ = [h̃lᵀ  0ᵀ] with h̃l ∈ C^j, so we get

    mlᵀ = hlᵀ A = [h̃lᵀ  0ᵀ] ⎡ A1  a bᵀ ⎤     (17)
                             ⎣ A2  A3  ⎦
        = h̃lᵀ [A1  a bᵀ]                     (18)
        = [h̃lᵀ A1  (h̃lᵀ a) bᵀ],             (19)

where A1 ∈ C^(j×(j+k−1)). This shows that M is (k + 1)-semiseparable and that d = γ b. The proof for e = δ a is analogous.
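Definition 3.3 translates directly into a rank test. The checker below is our own illustration (0-based numpy slicing of the 1-based ranges in (15)):

```python
import numpy as np

def is_k_semiseparable(A, k, tol=1e-10):
    """Check rank(A(1:j, j+k:n)) <= 1 for j = 1, ..., n-k (1-based as in (15))."""
    n = A.shape[0]
    for j in range(1, n - k + 1):
        block = A[:j, j + k - 1:]
        if min(block.shape) > 0 and np.linalg.matrix_rank(block, tol=tol) > 1:
            return False
    return True

# The upper triangular part of a rank-one matrix is 0-semiseparable: every
# block A0[:j, j-1:] lies in the upper triangle and equals outer(u[:j], v[j-1:]).
u, v = np.arange(1.0, 6.0), np.arange(2.0, 7.0)
A0 = np.triu(np.outer(u, v))
```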

Next, consider the following definition.

Definition 3.5. Let R, H ∈ C^(n×n):
(i) R is proper triangular if it is upper or lower triangular and has strictly positive real diagonal elements.
(ii) H is proper Hessenberg if it is Hessenberg and has strictly positive real subdiagonal elements.

It is well known that a proper Hessenberg matrix H ∈ C^(n×n) with orthonormal columns can be factorized in its Schur parametric form (see [23])

    H = Ω1 Ω2 ⋯ Ωn−1 Iβ,   Ω̃k = ⎡ ck  sk  ⎤ ,  sk ∈ R₊,     (20)
                                  ⎣ sk  −c̄k ⎦

where Ωk, Iβ ∈ C^(n×n) and Iβ is the identity matrix with the unimodular number β as its lower right element. As we noted in Section 1, if H is the semi-infinite Hessenberg matrix associated with a positive definite functional L, then it has orthonormal columns and is proper Hessenberg. Therefore, we have

    H = ∏_{k=1}^{∞} Ωk,   Ω̃k = ⎡ −ak  ρk ⎤ ,
                                ⎣  ρk  āk ⎦

where ak is the k-th Schur parameter associated with L (see [5]). This shows that the matrix H is completely determined by the sequence {ak}ₖ≥₁ of Schur parameters.
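Expanded to a dense matrix (an O(n²) illustration, our own sketch, of what the Schur parameters represent in O(n) storage):

```python
import numpy as np

def unitary_hessenberg(a):
    """Dense (n+1) x (n+1) product of Givens reflections built from Schur
    parameters a = [a_1, ..., a_n] with |a_k| < 1; the 2x2 block of the k-th
    factor is [[-a_k, rho_k], [rho_k, conj(a_k)]] on rows/columns k-1, k."""
    n = len(a)
    H = np.eye(n + 1, dtype=complex)
    for k, ak in enumerate(a):
        rho = np.sqrt(1.0 - abs(ak) ** 2)
        G = np.eye(n + 1, dtype=complex)
        G[k:k + 2, k:k + 2] = [[-ak, rho], [rho, np.conj(ak)]]
        H = H @ G
    return H

H = unitary_hessenberg([0.3, -0.2 + 0.1j, 0.5])
```

The result is unitary and proper Hessenberg (its subdiagonal entries are the positive numbers ρk), illustrating why such matrices are 0-semiseparable.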

Using the Givens factorization of (20), we can deduce the following proposition:

Proposition 3.6. If H is proper Hessenberg and has orthonormal columns, then H is 0-semiseparable.

4. Finite QR step

The Hessenberg matrices corresponding to a positive definite functional and its Christoffel transformation satisfy (6). This result can be found in [6,2]. In this section we adapt the proofs in these references to give a more precise relation when L is only positive definite on Pn. Then we simplify the relation to contain only square matrices.

Let {φk}₀ⁿ and {ψk}₀ⁿ⁻¹ be the sequences of orthonormal polynomials associated with L and L̂ = |z − α|²L, respectively. If φn = [φ0(z) φ1(z) ⋯ φn(z)]ᵀ, then

    z φn−1ᵀ = φnᵀ H′(n+1)     (21)

is a finite version of (4). The matrix H′(n+1) ∈ C^((n+1)×n) is upper Hessenberg, where the lower index between brackets denotes the number of rows, and the prime denotes that the matrix is one column short of being square. Note that H′(n+1) also has orthonormal columns, since it is the nonzero part of the first n columns of the Hessenberg matrix H of (4). We will call H′(n+1) the Hessenberg matrix of order n associated with L.

Let ψk = [ψ0(z) ψ1(z) ⋯ ψk(z)]ᵀ. It is clear that

    φn−1ᵀ = ψn−1ᵀ R(n),
    (z − α) ψn−1ᵀ = φnᵀ Q′(n+1),     (22)

where R(n) ∈ C^(n×n) is upper triangular and Q′(n+1) ∈ C^((n+1)×n) is upper Hessenberg. Note that

    In = L̂(ψn−1, ψn−1ᵀ)
       = L((z − α)ψn−1, (z − α)ψn−1ᵀ)
       = L(Q′(n+1)ᵀ φn, φnᵀ Q′(n+1))
       = (Q′(n+1))* Q′(n+1),

so the matrix Q′(n+1) has orthonormal columns. Using Eq. (22), we can prove the following proposition, which generalizes (6).

Page 8: Algorithms for the Geronimus transformation for orthogonal polynomials on the unit circle

202 M. Humet, M. Van Barel / Journal of Computational and Applied Mathematics 267 (2014) 195–217

Proposition 4.1. Let L be a bilinear functional that is positive definite on Pn and let L̂ = |z − α|²L, so L̂ is positive definite on Pn−1. Let H′(n+1) and Ĥ′(n) be the Hessenberg matrices of order n and n − 1, associated with L and L̂, respectively. Then

    H′(n+1) − α I′(n+1) = Q′(n+1) R(n),
    Ĥ′(n) − α I′(n) = R(n) Q′(n).     (23)

The matrix R(n) ∈ C^(n×n) is upper triangular and has positive real diagonal elements, while Q′(k) ∈ C^(k×(k−1)) is upper Hessenberg and has orthonormal columns. I′(k) ∈ C^(k×(k−1)) denotes the identity matrix without its last column.

Proof. Both equations follow from (21) and (22):

    φnᵀ (H′(n+1) − α I′(n+1)) = (z − α) φn−1ᵀ = (z − α) ψn−1ᵀ R(n) = φnᵀ Q′(n+1) R(n),
    ψn−1ᵀ (Ĥ′(n) − α I′(n)) = (z − α) ψn−2ᵀ = φn−1ᵀ Q′(n) = ψn−1ᵀ R(n) Q′(n).

In the rest of this section, we simplify (23) to a form containing only square matrices. This form is also the basis for the following sections.

Let H ∈ C^((n+1)×(n+1)) be the unitary matrix H = [H′(n+1)  h]. Note that the column vector h is unique up to a unimodular factor. Next consider the following QR step:

    H − αI = QR,
    Ĥ = RQ + αI = Q* H Q,     (24)

where R is proper upper triangular, Q is unitary and all the matrices involved are of order n + 1. Note that the QR factorization is uniquely determined and Q is proper Hessenberg. It follows that Ĥ is also proper Hessenberg. Moreover, it is clear that Q and R can be partitioned as

    Q = [Q′(n+1)  q],   R = ⎡ R(n)  r ⎤ .     (25)
                            ⎣  0ᵀ   ρ ⎦

Hence, writing H = [H′(n+1)  h1] and Ĥ = [Ĥ′(n+1)  h2],

    H′(n+1) − α I′(n+1) = Q′(n+1) R(n),   Ĥ′(n) − α I′(n) = R(n) Q′(n).     (26)

The QR step (24) thus encapsulates (23). Note that there is some freedom left, as h can be scaled by a unimodular factor, but this only changes the columns h1 and h2.

The unitary matrix H is proper Hessenberg and can be written in its Schur parametric form (20). We choose β = 1, since this only affects the last column of H, giving

    H = ∏_{k=1}^{n} Gk,   G̃k = ⎡ −ak  ρk ⎤ .     (27)
                                ⎣  ρk  āk ⎦

The matrix Ĥ is clearly unitary and has the Schur parametric form

    Ĥ = ∏_{k=1}^{n} Ĝk,     (28)

where the nontrivial part of Ĝk is defined as in (27), with ak and ρk replaced by âk and ρ̂k, because det(Ĥ) = det(H) and det(Gk) = det(Ĝk) = −1.

Proposition 4.1 implies that {ak}₁ⁿ are the Schur parameters associated with the functional L, while {âk}₁ⁿ⁻¹ are those associated with L̂. Note that ân is not necessarily the n-th Schur parameter associated with L̂. In Section 6, we will treat ân as a free parameter while computing the Geronimus transformation of L̂.

5. Christoffel algorithm

In this section we give a fast algorithm to compute the QR step (24), corresponding to a Christoffel transformation. Even though the ideas we use are not new, the algorithm provides valuable insights for the following sections, where we handle an RQ step, corresponding to a Geronimus transformation.


The computation of Ĥ is equivalent to one step of the implicitly shifted QR algorithm to compute the eigenvalues of H. This algorithm has been extensively studied (see [29–31]). In every step, the matrix Q is typically built up implicitly by using Givens transformations and bulge chasing. Moreover, fast algorithms have been developed for unitary Hessenberg matrices, involving only O(n) flops per step (see [23,24,32,33,25]). We will briefly present an equivalent algorithm using the elementary operations on Givens transformations of Lemmas 3.1 and 3.2. This strategy is not new (see e.g. [34,35]), but it will give a groundwork for the following sections.

First, consider a general proper Hessenberg matrix H. The implicit QR step using bulge chasing is then as follows. The matrix Q is unitary proper Hessenberg, so it can be written as Q = Q_1⋯Q_n I_β, where the reflections Q_k and the matrix I_β satisfy

$$Q_k = \begin{bmatrix} c_k & s_k \\ s_k & -\bar c_k \end{bmatrix}, \quad s_k \in \mathbb{R}^+, \qquad I_\beta = \begin{bmatrix} I_n & \\ & \beta \end{bmatrix}, \quad |\beta| = 1,$$

where I_n is the identity matrix of order n. The first reflection Q_1 induces a zero in the first column of H − αI. If h_{1,1} = H(1,1) and h_{2,1} = H(2,1), then

$$Q_1^* \begin{bmatrix} h_{1,1} - \alpha \\ h_{2,1} \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}, \quad r \in \mathbb{R}^+. \tag{29}$$

Let H_k := Q_k^*⋯Q_1^* H Q_1⋯Q_k. The reflections Q_k are found as follows. In general, H_k is Hessenberg except for one nonzero element, the bulge, at position (k+2, k). We compute Q_{k+1} such that Q_{k+1}^* H_k is back in Hessenberg form, but applying Q_{k+1} to the right creates a bulge at position (k+3, k+1). We visualize this for k = 1, looking only at the leading principal submatrices of order 5.

Hence, after applying Q_1, the other Givens reflections are found by eliminating the bulge in each step, chasing it down to the lower right corner. The unitary matrix H_n is Hessenberg and has strictly positive and real subdiagonal elements, except possibly for the element H_n(n+1, n). With the right β, we have Ĥ = I_β^* H_n I_β, which is proper Hessenberg.

Applying this procedure to a general Hessenberg matrix requires O(n²) flops. If we exploit the Schur parametric form (20) of H, we can reduce this to O(n) flops. The idea is never to work with the full matrix H, but to manipulate the Schur parameters directly using the fusion and shift-through operations of Lemmas 3.1 and 3.2. We explain the algorithm for n = 5 to keep the notation simple. To start, Q_1 is the Givens reflection that satisfies (29). Multiplication of Q_1 on both sides of H is done as follows,

where F denotes fusion and ST denotes shift-through on the bold matrices on the left hand side. It is important to note the following:

• Q_1 can be moved before G_3, because two Givens transformations that operate on different rows can be interchanged.
• H_1 is a product of Q_2 and a unitary Hessenberg matrix, so by multiplying with Q_2^* we remove the bulge. This explains why we immediately name this Givens reflection Q_2.
• N_1 is a rotation, which implies that N_2 is also a rotation, because we choose Q_2 and Ĝ_1 to be reflections in our shift-through implementation.
• A straightforward calculation shows that s_k = Q_k(2,1) ∈ ℝ⁺ and ρ̂_k = Ĝ_k(2,1) ∈ ℝ⁺.


The following steps are very similar.

and so on, until we get

Finally, I_β is applied to the reflection M_5 in order to compute Ĝ_5.

The resulting Algorithm 5.1 is given below. We assume that Givens(a, b) computes the Givens reflection Q satisfying

$$Q \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}, \quad r \in \mathbb{R}^+,$$

and that ShiftThrough(A, B, C) is our implementation of Lemma 3.2. In the algorithm itself, we drop the hat notation to make the notation lighter. Note that the subindices of N_k and Q_k can be omitted without affecting the output.

Algorithm 5.1 QR step on unitary Hessenberg matrix
INPUT: α; G_1, ..., G_n such that H = ∏_{k=1}^{n} G_k.
OUTPUT: Ĝ_1, ..., Ĝ_n such that Ĥ = ∏_{k=1}^{n} Ĝ_k = RQ + αI, with H − αI = QR.

  a_1 = −G_1(1,1);  ρ_1 = G_1(2,1)
  Q_1^* = Givens(−a_1 − α, ρ_1)
  N_1 = Q_1^* G_1
  for k = 1 : n−1
    [Q_{k+1}, G_k, N_{k+1}] = ShiftThrough(N_k, G_{k+1}, Q_k)
  end
  M_n = N_n Q_n
  β = sign(M_n(2,1))
  G_n = diag(1, β̄) · M_n · diag(1, β)
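For reference, the effect of Algorithm 5.1 can be reproduced with a dense O(n³) computation: factor H − αI = QR, normalize the diagonal of R to be real and positive (so that Q is the proper Hessenberg factor), and form Ĥ = RQ + αI. The sketch below is our own illustration under those assumptions (the function names are ours, not the paper's); it checks that Ĥ is again unitary proper Hessenberg.

```python
import numpy as np

def givens_reflection(a):
    rho = np.sqrt(1.0 - abs(a) ** 2)
    return np.array([[-a, rho], [rho, np.conj(a)]])

def unitary_hessenberg(schur):
    n = len(schur)
    H = np.eye(n + 1, dtype=complex)
    for k, a in enumerate(schur):
        G = np.eye(n + 1, dtype=complex)
        G[k:k + 2, k:k + 2] = givens_reflection(a)
        H = H @ G
    return H

def qr_step(H, alpha):
    """Dense QR step: H - alpha*I = QR with R proper; return RQ + alpha*I."""
    m = H.shape[0]
    Q, R = np.linalg.qr(H - alpha * np.eye(m))
    d = np.diag(R) / np.abs(np.diag(R))   # unimodular phases of diag(R)
    Q, R = Q * d, (R.T * d.conj()).T      # make diag(R) real and positive
    return R @ Q + alpha * np.eye(m)

a = np.array([0.3 + 0.2j, -0.5j, 0.1 - 0.6j, 0.4, -0.2 + 0.3j])
alpha = 1.2 * np.exp(1j * np.pi / 5)
H = unitary_hessenberg(a)
Hhat = qr_step(H, alpha)
assert np.allclose(Hhat.conj().T @ Hhat, np.eye(6))   # unitary (Hhat = Q* H Q)
assert np.allclose(np.tril(Hhat, -2), 0)              # Hessenberg
sub = np.diag(Hhat, -1)
assert np.allclose(sub.imag, 0) and np.all(sub.real > 0)  # proper subdiagonal
```

Since Ĥ = RQ + αI = Q^*HQ, unitarity is automatic; the phase normalization of R is what makes the subdiagonal of Ĥ real and positive.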

6. Backward Geronimus algorithm

In this section, we present a first algorithm to compute the Geronimus transformation. It is very similar to the Christoffel algorithm of the previous section. Given the matrix Ĥ, computing the unique RQ factorization of Ĥ − αI yields H = QR + αI. This can be done efficiently, by doing the RQ step implicitly using the factorizations (27) and (28). Instead of repeating the scheme of Section 5 with some adaptations, we show how the RQ step can be implemented using Algorithm 5.1. At the end of the section we explain how the resulting algorithm computes a Geronimus transformation where the last Givens reflection Ĝ_n of Ĥ acts as a free parameter.

Let

$$\prod_{k=1}^{n} G_k - \alpha I = QR, \tag{30a}$$

$$\prod_{k=1}^{n} \hat G_k - \alpha I = RQ, \tag{30b}$$

where G_k, Ĝ_k, Q, R ∈ ℂ^{(n+1)×(n+1)}, Q is unitary and R is proper upper triangular. G_k and Ĝ_k are Givens reflections of the form

$$G_k = \begin{bmatrix} -a_k & \rho_k \\ \rho_k & \bar a_k \end{bmatrix}, \qquad \hat G_k = \begin{bmatrix} -\hat a_k & \hat\rho_k \\ \hat\rho_k & \bar{\hat a}_k \end{bmatrix}, \qquad \rho_k, \hat\rho_k \in \mathbb{R}^+. \tag{31}$$

While Algorithm 5.1 computes {Ĝ_k}_{k=1}^{n} from {G_k}_{k=1}^{n} and α, the algorithm of this section is its exact inverse, since it computes {G_k}_{k=1}^{n} from {Ĝ_k}_{k=1}^{n} and α.

Let P ∈ ℝ^{(n+1)×(n+1)} be the orthogonal matrix defined by P_{i,j} = δ_{i+j,n+2}, where δ_{n,k} is the Kronecker delta. Taking the conjugate transpose of (30) and multiplying with P on both sides, we get

$$\prod_{k=n}^{1} P G_k^* P - \bar\alpha I = (PR^*P)(PQ^*P), \qquad \prod_{k=n}^{1} P \hat G_k^* P - \bar\alpha I = (PQ^*P)(PR^*P),$$

since P² = I. Note that multiplying with P on both sides reverses the order of rows and columns. It follows that if Ω_k ∈ ℂ^{(n+1)×(n+1)} is a Givens transformation operating on rows k and k+1, then PΩ_kP operates on rows n−k+1 and n−k+2. We define

$$\Gamma_{n-k+1} = P \hat G_k^* P, \qquad \hat\Gamma_{n-k+1} = P G_k^* P, \tag{32}$$

for k = 1, ..., n, and Q̆ = PQ^*P, R̆ = PR^*P, yielding

$$\prod_{k=1}^{n} \Gamma_k - \bar\alpha I = \breve Q \breve R, \qquad \prod_{k=1}^{n} \hat\Gamma_k - \bar\alpha I = \breve R \breve Q.$$

Hence, we can find {G_k}_{k=1}^{n} by applying Algorithm 5.1 after reversing the given sequence {Ĝ_k}_{k=1}^{n}, and reversing its output again, where we 'reverse' according to (32). The result is Algorithm 6.1. It is clear that the algorithm works in a backward sense, computing first G_n, then G_{n−1}, and so on.
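The reversal identity can be tried out on any square matrix: an RQ factorization is obtained from a QR factorization of the flipped conjugate transpose. A minimal NumPy sketch (our own illustration; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
P = np.fliplr(np.eye(n))              # flip matrix: P A* P reverses rows and columns

Q1, R1 = np.linalg.qr(P @ A.conj().T @ P)
R = P @ R1.conj().T @ P               # upper triangular again after flipping
Q = P @ Q1.conj().T @ P               # unitary

assert np.allclose(R @ Q, A)          # A = R Q, an RQ factorization of A
assert np.allclose(np.tril(R, -1), 0)
assert np.allclose(Q @ Q.conj().T, np.eye(n))
```

This is exactly the mechanism that lets Algorithm 5.1 (a QR step) deliver an RQ step after the two reversals.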

Algorithm 6.1 RQ step on unitary Hessenberg matrix
INPUT: α; Ĝ_1, ..., Ĝ_n such that Ĥ = ∏_{k=1}^{n} Ĝ_k.
OUTPUT: G_1, ..., G_n such that H = ∏_{k=1}^{n} G_k = QR + αI, with Ĥ − αI = RQ.

  P = [0 1; 1 0]
  Γ_{n−k+1} = P (Ĝ_k)^* P for k = 1, ..., n
  (Γ̂_1, ..., Γ̂_n) = Algorithm 5.1(ᾱ; Γ_1, ..., Γ_n)
  G_k = P (Γ̂_{n−k+1})^* P for k = 1, ..., n
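A dense O(n³) reference for this RQ step (our own check under the conventions of (27)-(28) and the reversal (32); function names are ours, and this is not the paper's O(n) implementation): build Ĥ from given parameters {â_k} with a freely chosen â_n, RQ-factor Ĥ − αI with R proper, and read the new Schur parameters off H = QR + αI via H(k,k) = −a_k·conj(a_{k−1}) and H(k+1,k) = ρ_k.

```python
import numpy as np

def givens_reflection(a):
    rho = np.sqrt(1.0 - abs(a) ** 2)
    return np.array([[-a, rho], [rho, np.conj(a)]])

def unitary_hessenberg(schur):
    n = len(schur)
    H = np.eye(n + 1, dtype=complex)
    for k, a in enumerate(schur):
        G = np.eye(n + 1, dtype=complex)
        G[k:k + 2, k:k + 2] = givens_reflection(a)
        H = H @ G
    return H

def rq_step(Hhat, alpha):
    """Dense RQ step: Hhat - alpha*I = RQ with R proper; return QR + alpha*I."""
    m = Hhat.shape[0]
    P = np.fliplr(np.eye(m))
    Q1, R1 = np.linalg.qr(P @ (Hhat - alpha * np.eye(m)).conj().T @ P)
    d = np.diag(R1) / np.abs(np.diag(R1))
    Q1, R1 = Q1 * d, (R1.T * d.conj()).T      # diag(R1) real and positive
    R, Q = P @ R1.conj().T @ P, P @ Q1.conj().T @ P
    return Q @ R + alpha * np.eye(m)

a_hat = np.array([0.1, 0.1, 0.1, 0.1, -0.95])  # given parameters, free last one
alpha = 2.1 * np.exp(1j * np.pi / 5)
Hhat = unitary_hessenberg(a_hat)
H = rq_step(Hhat, alpha)

a = [-H[0, 0]]                                 # a_0 = 1, so a_1 = -H(1,1)
for k in range(1, 5):
    a.append(-H[k, k] / np.conj(a[k - 1]))
a = np.array(a)
assert np.all(np.abs(a) < 1)                   # a valid Schur sequence
assert np.allclose(np.diag(H, -1), np.sqrt(1 - np.abs(a) ** 2), atol=1e-6)
```

The input values mirror the experiment of Section 9 (constant sequence 0.1, free parameter −0.95, α = 2.1e^{iπ/5}).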

Note that we could simplify the algorithm by using only the Schur parameters as inputs and outputs. Since

$$P \hat G_k^* P = P \begin{bmatrix} -\bar{\hat a}_k & \hat\rho_k \\ \hat\rho_k & \hat a_k \end{bmatrix} P = \begin{bmatrix} \hat a_k & \hat\rho_k \\ \hat\rho_k & -\bar{\hat a}_k \end{bmatrix},$$

the algorithm would only have one line:

$$(-a_n, \ldots, -a_1) = \text{Algorithm 5.1}(\bar\alpha;\ -\hat a_n, \ldots, -\hat a_1).$$

Nevertheless, we stick with Givens transformations for numerical reasons. That is, the complementary parameters ρ_k cannot be computed accurately from the Schur parameters a_k if those are close to the unit circle (see [36, p. 59]).

Let us explain how Algorithm 6.1 computes the Geronimus transformation. Suppose that {â_k}_{k=1}^{n−1} are the Schur parameters associated with L̂ and â_n is an arbitrary number inside the unit circle. From Proposition 4.1 and the discussion in Section 4, we know that the computed {a_k}_{k=1}^{n} must be associated with some L ∈ S₊, according to Definition 2.2. Vice versa, any L ∈ S₊ is associated with some value of â_n ∈ 𝔻, which can be computed with Algorithm 5.1. Hence, there is a one-to-one relation between â_n ∈ 𝔻 and L ∈ S₊, and â_n can be interpreted as a complex free parameter.


One question that arises is how Algorithm 6.1 can be useful. If L̂ is given by its associated measure (2), then one could be interested in computing L given by (7) for some value of m. Unfortunately, there is no simple expression relating m (or L) to â_n.

The answer, however, lies in the Uvarov relation between the solutions in S₊ and the set 𝒟 of values for m where the Uvarov transformation is positive definite, which we discussed in [27]. Numerical experiments in Section 9 show that for certain ranges of the modulus of α, and depending on the input sequence, we can accurately compute a numerically unique Geronimus transformation using the backward algorithm.

7. RQ factorization computed in forward sense

The main equation considered in the previous section is

$$\hat H - \alpha I = RQ, \qquad H - \alpha I = QR, \tag{33}$$

where all matrices involved are of order n + 1. If the Schur parameters {â_k}_{k=1}^{n−1} associated with L̂ are given, then the idea of Section 6 is to fix the parameter â_n in order to determine Ĥ given by (28), thus making the RQ factorization uniquely determined.

In this section, we take as free parameter the first Schur parameter a_1 associated with the Geronimus transformation L. Using (33), we show how a_1 determines the first columns of R and Q, and we deduce formulas to compute R and Q in a recursive fashion. These formulas form the basis for two forward algorithms for the Geronimus transformation, described in the following section. We will assume that |α| > 1.

Recall that Q is proper Hessenberg and unitary, so we can write

$$Q = Q_1 \cdots Q_n I_\gamma, \qquad Q_k = \begin{bmatrix} c_k & s_k \\ s_k & -\bar c_k \end{bmatrix}, \tag{34}$$

according to (20). The matrices H and Ĥ are given by (27) and (28). Note that in the current setting G_n is unknown, so the last two columns of H are undetermined. Next, we define the vectors w^{(l)}, v^{(l)} ∈ ℂ^{l+1} for l = 1, ..., n as follows:

$$w^{(l)} = \begin{bmatrix} \bar{\hat a}_0\,\hat\rho_1\hat\rho_2\cdots\hat\rho_l \\ \bar{\hat a}_1\,\hat\rho_2\cdots\hat\rho_l \\ \vdots \\ \bar{\hat a}_{l-1}\,\hat\rho_l \\ \bar{\hat a}_l \end{bmatrix}, \qquad \hat a_0 := 1, \tag{35}$$

$$R\,Q_1\cdots Q_l = \begin{bmatrix} \big(\hat H - \alpha I\big)_l & \begin{matrix} v^{(l)} \\ 0 \end{matrix} & V_l \end{bmatrix}. \tag{36}$$

The notation (·)_l means that we take the first l columns. Note three things. First, w^{(l)} generates the first l+1 rows of the semiseparable part of Ĥ starting at the (l+1)-th column, i.e., all these columns are proportional to w^{(l)}. Second,

$$w^{(l)}_1 = \hat\rho_1\hat\rho_2\cdots\hat\rho_l \neq 0, \tag{37}$$

since |â_k| < 1 for all k. Third, (36) follows from (33) and the fact that applying Q_{l+1} and the following Givens transformations leaves the first l columns unchanged. The matrix V_l is the part of R that has remained untouched while multiplying by Q_1⋯Q_l.

The outline of the remaining part of this section is as follows. First, formulas for the first columns of R and Q are given, and the computation of R is discussed. Then we show that R has a semiseparable structure. Finally, we present a way of computing Q, using this structure in R.

7.1. Computing the first columns

Let r_{i,j} = R(i,j). From (27) and (34), the first column of the second equation of (33) can be written as

$$\begin{bmatrix} -a_1 - \alpha \\ \rho_1 \end{bmatrix} = r_{1,1} \begin{bmatrix} c_1 \\ s_1 \end{bmatrix}.$$

It follows that

$$r_{1,1}^2 = |a_1 + \alpha|^2 + \rho_1^2 = |a_1 + \alpha|^2 + (1 - |a_1|^2).$$

We find the following expressions for r_{1,1} and c_1:

$$r_{1,1} = \sqrt{1 + |\alpha|^2 + 2\,\mathrm{Re}(\bar\alpha a_1)}, \tag{38}$$

$$c_1 = -\frac{a_1 + \alpha}{r_{1,1}}. \tag{39}$$
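As a quick numerical sanity check of (38) and (39) (an illustration of ours, with illustrative parameter values):

```python
import numpy as np

alpha = 2.1 * np.exp(1j * np.pi / 5)   # assume |alpha| > 1
a1 = 0.3 - 0.4j                        # free parameter, |a1| < 1
rho1 = np.sqrt(1 - abs(a1) ** 2)

r11 = np.sqrt(1 + abs(alpha) ** 2 + 2 * (np.conj(alpha) * a1).real)  # eq. (38)
c1 = -(a1 + alpha) / r11                                             # eq. (39)
s1 = np.sqrt(1 - abs(c1) ** 2)

# the first column of H - alpha*I equals r11 * (c1, s1)^T:
assert np.isclose(r11 * c1, -a1 - alpha)
assert np.isclose(r11 * s1, rho1)
```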


7.2. Computation of R

Suppose for now that aside from r_{1,1} and c_1, also the values c_2, ..., c_{n−1} of the Givens transformations Q_2, ..., Q_{n−1} are given. Consider (36) for l = 1:

$$RQ_1 = \begin{bmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,n+1} \\ & r_{2,2} & \cdots & \vdots \\ & & \ddots & \vdots \\ & & & r_{n+1,n+1} \end{bmatrix} \begin{bmatrix} c_1 & s_1 & \\ s_1 & -\bar c_1 & \\ & & I_{n-1} \end{bmatrix} \tag{40a}$$

$$= \begin{bmatrix} \hat h_{1,1} - \alpha & v^{(1)}_1 & r_{1,3} & \cdots & r_{1,n+1} \\ \hat h_{2,1} & v^{(1)}_2 & r_{2,3} & \cdots & \vdots \\ & & r_{3,3} & \cdots & \vdots \\ & & & \ddots & \vdots \\ & & & & r_{n+1,n+1} \end{bmatrix}, \tag{40b}$$

where r_{i,j} = R(i,j), ĥ_{i,j} = Ĥ(i,j) and v^{(1)} =: [v^{(1)}_1, v^{(1)}_2]^T. It follows that

$$\begin{bmatrix} r_{1,2} \\ r_{2,2} \end{bmatrix} = \left( \begin{bmatrix} \hat h_{1,1} - \alpha \\ \hat h_{2,1} \end{bmatrix} - c_1 \begin{bmatrix} r_{1,1} \\ 0 \end{bmatrix} \right) \bigg/ s_1. \tag{41}$$

The vector v^{(1)} can then be found using (40). This process can be repeated for Q_2, Q_3, ..., Q_n. Each step consists of the computation of r_{i,k+1} and v^{(k)}_i for i = 1, ..., k+1. E.g., when multiplying by Q_k we get

$$\begin{bmatrix} v^{(k-1)}_1 & r_{1,k+1} \\ \vdots & \vdots \\ v^{(k-1)}_k & r_{k,k+1} \\ 0 & r_{k+1,k+1} \end{bmatrix} \begin{bmatrix} c_k & s_k \\ s_k & -\bar c_k \end{bmatrix} = \begin{bmatrix} \hat h_{1,k} & v^{(k)}_1 \\ \vdots & \vdots \\ \hat h_{k,k} - \alpha & v^{(k)}_k \\ \hat h_{k+1,k} & v^{(k)}_{k+1} \end{bmatrix}, \tag{42}$$

yielding a direct formula for the (k+1)-th column of R:

$$\begin{bmatrix} r_{1,k+1} \\ \vdots \\ r_{k,k+1} \\ r_{k+1,k+1} \end{bmatrix} = \left( \begin{bmatrix} \hat h_{1,k} \\ \vdots \\ \hat h_{k,k} - \alpha \\ \hat h_{k+1,k} \end{bmatrix} - c_k \begin{bmatrix} v^{(k-1)}_1 \\ \vdots \\ v^{(k-1)}_k \\ 0 \end{bmatrix} \right) \bigg/ s_k. \tag{43}$$

The computation of v^{(k)} follows from (42). It is clear that given {Ĝ_k}_{k=1}^{n−1}, r_{1,1} and {c_k}_{k=1}^{n−1}, we can compute the first n columns of R and v^{(k)} for k = 1, ..., n−1.

7.3. Semiseparable structure of R

The matrices H and Ĥ are both 0-semiseparable, according to Definition 3.3 and Proposition 3.6. From (33), we can write R as

$$Q^*(H - \alpha I) = R = (\hat H - \alpha I)Q^*.$$

These are two equations expressing R as a product of a lower Hessenberg matrix and a 1-semiseparable matrix. Applying Proposition 3.4 leads to the following result.

Proposition 7.1. The matrix R of (33) satisfies the following properties:
(i) R is 2-semiseparable.
(ii) The rows contained in the semiseparable part of R are proportional to the corresponding rows in H.
(iii) The columns contained in the semiseparable part of R are proportional to the corresponding columns in Ĥ.

7.4. Computation of Q_k

The computation of the Givens factors Q_k relies on the structural properties of the matrix R of Proposition 7.1. Before heading to the actual derivation of the formulas for c_k, note the following property.

Proposition 7.2. Let L be a positive definite functional and let H be given by (27), with {a_k}_{k=1}^{n} the Schur parameters associated with L. Suppose that |α| > 1. If Q satisfies (33) and if Q_k are the Givens factors of (34), then c_k ≠ 0 for all k ≤ n.


Proof. Suppose c_k = 0, or equivalently

$$Q_k = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

and let e_k be the k-th column of the identity matrix. Then

$$Q e_k = Q_1 \cdots Q_k e_k = Q_1 \cdots Q_{k-1} e_{k+1} = e_{k+1}.$$

Note that (25) implies Q e_k = Q'_{(n+1)} ĕ_k, where ĕ_k equals e_k without its last element. It follows from (22) that

$$\phi_{k+1}(z) = (z - \alpha)\psi_k(z),$$

where {φ_k(z)}_{k≥0} and {ψ_k(z)}_{k≥0} are the orthonormal polynomials associated with L and |z − α|²L, respectively. This means that φ_{k+1}(α) = 0, which is in contradiction with the fact that the zeros z* of orthogonal polynomials on the unit circle satisfy |z*| < 1 (see [5, Theorem 1.7.1]). □

From Proposition 7.1 and the fact that Ĥ is 0-semiseparable, the first k−1 rows of (42) become

$$\begin{bmatrix} v^{(k-1)}_1 & \gamma w^{(k-2)}_1 \\ \vdots & \vdots \\ v^{(k-1)}_{k-1} & \gamma w^{(k-2)}_{k-1} \end{bmatrix} \begin{bmatrix} c_k & s_k \\ s_k & -\bar c_k \end{bmatrix} = \begin{bmatrix} \delta w^{(k-2)}_1 & v^{(k)}_1 \\ \vdots & \vdots \\ \delta w^{(k-2)}_{k-1} & v^{(k)}_{k-1} \end{bmatrix}, \qquad 3 \le k \le n,$$

with γ, δ ∈ ℂ and w^{(k−2)} defined by (35). Since c_k ≠ 0 by Proposition 7.2, it follows that

$$\begin{bmatrix} v^{(k-1)}_1 \\ \vdots \\ v^{(k-1)}_{k-1} \end{bmatrix} = \epsilon \begin{bmatrix} w^{(k-2)}_1 \\ \vdots \\ w^{(k-2)}_{k-1} \end{bmatrix}, \qquad 3 \le k \le n, \tag{44}$$

for some ε ∈ ℂ which depends on k. This also holds for k = n + 1,⁴ so the first k rows of (42) can be written as

$$\begin{bmatrix} v^{(k-1)}_1 & r_{1,k+1} \\ \vdots & \vdots \\ v^{(k-1)}_{k-1} & r_{k-1,k+1} \\ v^{(k-1)}_k & r_{k,k+1} \end{bmatrix} \begin{bmatrix} c_k & s_k \\ s_k & -\bar c_k \end{bmatrix} = \begin{bmatrix} \eta w^{(k-1)}_1 & \epsilon w^{(k-1)}_1 \\ \vdots & \vdots \\ \eta w^{(k-1)}_{k-1} & \epsilon w^{(k-1)}_{k-1} \\ \big(\eta w^{(k-1)}_k - \alpha\big) & \epsilon w^{(k-1)}_k \end{bmatrix}, \qquad 2 \le k \le n,$$

leading to

$$\begin{bmatrix} v^{(k-1)}_1 \\ \vdots \\ v^{(k-1)}_{k-1} \\ v^{(k-1)}_k \end{bmatrix} = \epsilon' \begin{bmatrix} w^{(k-1)}_1 \\ \vdots \\ w^{(k-1)}_{k-1} \\ w^{(k-1)}_k \end{bmatrix} + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ -\alpha \bar c_k \end{bmatrix}.$$

Hence, c_k is given by

$$\bar c_k = \left( v^{(k-1)}_j\,\frac{w^{(k-1)}_k}{w^{(k-1)}_j} - v^{(k-1)}_k \right) \bigg/ \alpha, \qquad 2 \le k \le n, \tag{45}$$

for any j ≤ k−1 such that w^{(k−1)}_j ≠ 0. This equation is always defined, since at least w^{(k−1)}_1 ≠ 0, from (37). We conclude that given {Ĝ_k}_{k=1}^{n−1}, r_{1,1} and c_1, the numbers {c_k}_{k=2}^{n} are completely determined by (45).

Remark. Given {Ĝ_k}_{k=1}^{n−1} and a_1, we can compute r_{1,1} and c_1 using (38) and (39), the first columns of R using (43), and the numbers {c_k}_{k=2}^{n} using (45). Note that the numbers {c_k}_{k=1}^{n} represent Q'_{(n+1)} of Proposition 4.1, which follows from (25) and (34), the first n columns of R agree with R_{(n)}, and H'_{(n)} is defined by {Ĝ_k}_{k=1}^{n−1}. It follows that given the matrix H'_{(n)} of Proposition 4.1, α and the free parameter a_1, Eq. (23) gives us just enough information to find Q'_{(n+1)} and R_{(n)}, thus determining H'_{(n+1)}.

8. Forward algorithms for the Geronimus transformation

In the previous section, we showed how to compute the leading principal submatrix of order n of R and {c_k}_{k=1}^{n} from {Ĝ_k}_{k=1}^{n−1}, α and the free parameter a_1. Making use of these results, in this section two algorithms are presented to compute {G_k}_{k=2}^{n} from the same inputs. Both algorithms require only O(n) flops, by exploiting the structure of the problem.

⁴ It follows from (36), (34) and RQ = Ĥ − αI that v^{(n)} is proportional to the last column of Ĥ − αI.


8.1. Direct forward algorithm

From (33) we have

$$\big( H - \alpha I \big)_n = \big( Q_1 \cdots Q_n R \big)_n, \tag{46}$$

giving a straightforward way to compute the first n columns of H and hence the associated Givens transformations {G_k}_{k=1}^{n}. Indeed, note that with G_k given by (27) and h_{i,j} = H(i,j), we have

$$\begin{bmatrix} -a_k \bar a_{k-1} \\ \rho_k \end{bmatrix} = \begin{bmatrix} h_{k,k} \\ h_{k+1,k} \end{bmatrix}$$

and from (46) we deduce

$$\begin{bmatrix} \times \\ h_{k,k} \\ h_{k+1,k} \end{bmatrix} = \begin{bmatrix} Q_{k-1} & \\ & 1 \end{bmatrix} \begin{bmatrix} 1 & \\ & Q_k \end{bmatrix} \begin{bmatrix} r_{k-1,k} \\ r_{k,k} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \alpha \\ 0 \end{bmatrix},$$

where × denotes some number. For the computation of Q_k we use (45) with j = 1, because in this case the equation is always defined. Since we only need the first and the last two elements of each column of R, it suffices to work with the following matrix and vectors:

$$K_k = \begin{bmatrix} \hat h_{1,k} & w^{(k)}_1 \\ \hat h_{k,k} & w^{(k)}_k \\ \hat h_{k+1,k} & w^{(k)}_{k+1} \end{bmatrix}, \qquad r_k = \begin{bmatrix} r_{1,k} \\ r_{k-1,k} \\ r_{k,k} \end{bmatrix}, \qquad v_k = \begin{bmatrix} v^{(k)}_1 \\ v^{(k)}_k \\ v^{(k)}_{k+1} \end{bmatrix}. \tag{47}$$

From (35), (42) and (43), we find the following formulas for k ≥ 2:

$$K_{k+1} = \begin{bmatrix} K_k(1,2) & 0 \\ K_k(3,2) & 0 \\ 0 & 1 \end{bmatrix} \hat G_{k+1},$$

$$r_{k+1} = \left( K_k(:,1) - \begin{bmatrix} 0 \\ \alpha \\ 0 \end{bmatrix} - c_k \begin{bmatrix} v_{k-1}(1) \\ v_{k-1}(3) \\ 0 \end{bmatrix} \right) \bigg/ s_k,$$

$$v_k = s_k \begin{bmatrix} v_{k-1}(1) \\ v_{k-1}(3) \\ 0 \end{bmatrix} - \bar c_k\, r_{k+1}.$$

The resulting Algorithm 8.1 can be found below. Note that K_1, r_1 and v_1 do not agree with (47) and have to be initialized separately. This also explains the auxiliary variables q and i.

8.2. Forward algorithm using shift-through

In the previous subsection, the Schur parameters a_k and the complementary parameters ρ_k are found by computing two elements of each column of H. In this subsection we give another approach that uses the shift-through operation of Lemma 3.2. The computation of Q_k remains the same, but instead of first computing the (k+1)-th column of H, we compute G_{k+1} directly from Q_{k+1} and Ĝ_k. The motivation behind this variant is that computing Schur parameters using only the diagonal of a unitary Hessenberg matrix can be numerically unstable. We illustrate this in Section 9, where we compare both variants.

Algorithm 5.1 in Section 5 computes {Ĝ_k}_{k=1}^{n} from α and {G_k}_{k=1}^{n}. One of the key elements is the equality (48). Given N_k, G_{k+1} and Q_k, application of the shift-through lemma yields Q_{k+1}, Ĝ_k and N_{k+1}. In this case, however, we are interested in finding G_{k+1}. To this end, we rewrite (48): by doing one shift-through operation, we find G_{k+1} and N_{k+1} from N_k, Q_{k+1} and Ĝ_k.

In Algorithm 8.1, we had to compute the first and the two last elements of each column of R in order to be able to compute Q_k and G_k. Using the shift-through approach, we only need to find Q_k in each step, so it suffices to compute only the first and the last element of each column of R. This results in some minor adaptations concerning v_k and K_k. Furthermore, let E_k = N_k^* and note from Algorithm 5.1 that N_1 = Q_1^* G_1. The resulting Algorithm 8.2 is given below.


Algorithm 8.1 Forward Geronimus algorithm: direct approach
INPUT: α; Ĝ_1, ..., Ĝ_{n−1}; a_1
OUTPUT: G_2, ..., G_n

  Compute r_{1,1} and c_1 from a_1 using (38) and (39)
  s_1 = sqrt(1 − |c_1|²);  Q_1 = [c_1 s_1; s_1 −c̄_1]
  K = Ĝ_1;  v = [r_{1,1}; 0];  q = [α; 0];  i = 2
  for k = 1 : n−1
    r = (K(:,1) − q − c_k · v) / s_k
    v = s_k · v − c̄_k · r
    c̄_{k+1} = ( v(1) · K(i,2)/K(1,2) − v(i) ) / α
    s_{k+1} = sqrt(1 − |c_{k+1}|²)
    Q_{k+1} = [c_{k+1} s_{k+1}; s_{k+1} −c̄_{k+1}]
    y = blkdiag(Q_k, 1) · blkdiag(1, Q_{k+1}) · [r(i−1); r(i); 0] + [0; α; 0]
    a_{k+1} = −y(2)/ā_k;  ρ_{k+1} = y(3)
    G_{k+1} = [−a_{k+1} ρ_{k+1}; ρ_{k+1} ā_{k+1}]
    if k < n−1
      K = [K(1,2) 0; K(i,2) 0; 0 1] · Ĝ_{k+1}
      v = [v(1); v(i); 0];  q = [0; α; 0];  i = 3
    end
  end

9. Numerical experiments

In what follows we show the results of several numerical experiments. The goal is to give a good picture of the numerical behaviour of the algorithms presented in the previous sections. To improve readability, we will adopt the following names for the algorithms:

• QRstep is Algorithm 5.1.
• RQstep is Algorithm 6.1.
• GFwdDirect is the direct forward Geronimus algorithm: Algorithm 8.1.
• GFwdST is the shift-through forward Geronimus algorithm: Algorithm 8.2.

First we briefly examine the accuracy of QRstep and RQstep, which are numerically the same. Next we show how RQstep can converge to a numerically unique Geronimus transformation, given any value of the free parameter. The convergence behaviour depends on the input sequence and the parameter α. We show how this fact is closely connected with the solution set S₊ of the problem, discussed in Section 2. Then, we visualize the numerical behaviour of the forward algorithms and show how the conditioning of the problem solved by GFwdDirect and GFwdST is again related to the solution set S₊. We also show that the free parameter of these algorithms is restricted to lie in some subset of the unit circle. Finally, we compare both forward algorithms, pointing to stability issues GFwdDirect can have.

All experiments have been run in MATLAB v7.11.0.584 (R2010b). Calculations in high precision arithmetic are done with the function vpa using 256 decimal digits. The code to generate the figures in this section is available at the webpage http://people.cs.kuleuven.be/~marc.vanbarel/software/.


Algorithm 8.2 Forward Geronimus algorithm: approach using shift-through
INPUT: α; Ĝ_1, ..., Ĝ_{n−1}; a_1
OUTPUT: G_2, ..., G_n

  Compute r_{1,1} and c_1 from a_1 using (38) and (39)
  s_1 = sqrt(1 − |c_1|²);  Q_1 = [c_1 s_1; s_1 −c̄_1]
  K = Ĝ_1;  v = [r_{1,1}; 0];  q = [α; 0]
  E_1 = G_1^* Q_1   (G_1 from a_1)
  for k = 1 : n−1
    r = (K(:,1) − q − c_k · v) / s_k
    v = s_k · v − c̄_k · r
    c̄_{k+1} = ( v(1) · K(2,2)/K(1,2) − v(2) ) / α
    s_{k+1} = sqrt(1 − |c_{k+1}|²)
    Q_{k+1} = [c_{k+1} s_{k+1}; s_{k+1} −c̄_{k+1}]
    [G_{k+1}, X, E_{k+1}] = ShiftThrough(E_k, Q_{k+1}, Ĝ_k)   (X is not used)
    if k < n−1
      K = [K(1,2) 0; 0 1] · Ĝ_{k+1};  v = [v(1); 0];  q = [0; 0]
    end
  end

Accuracy of the backward algorithm

Algorithm RQstep computes a sequence of Givens reflections from a given sequence and the parameter α. In Fig. 1, we show the absolute errors on the sequence of computed Schur parameters, which are mathematically equivalent with the Givens reflections. The input Givens sequence has length n = 200 and is associated with Schur parameters in 𝔻 with random moduli and angles, and α = 1.2e^{iπ/5}. The errors are computed by comparing with the output of RQstep applied on the same input, but in high precision.

The absolute errors shown in Fig. 1 are of the order of the machine precision. This means that RQstep is accurate for this example. We do not intend to show its accuracy in general, but note that the algorithm only consists of computations involving Givens transformations, more precisely the fusion and shift-through operations of Section 3. The problem of computing Givens transformations accurately and reliably is studied, e.g., in [37].

Convergence of the backward algorithm

As we explained in Section 6, every value of the free parameter â_n of RQstep yields a solution in S₊. The usefulness of RQstep lies in the fact that the solutions in S₊ can be close to each other, i.e., for any value of â_n we get a sequence {a_k}_{k=1}^{n} that is identical up to machine precision for k ≤ K, for some K < n. From Section 2 we know that if the input of RQstep comes from a measure dμ, then the sequence {a_k}_{k=1}^{n} will agree up to machine precision with the measure dμ(z)/|z − α|² for k ≤ K.

We illustrate this with the following example. In Fig. 2 we plot the absolute difference between two solutions obtained from two different values of the free parameter â_n. We consider input sequences of length n = 100. For some input sequences and values of |α|, the solutions converge to each other as k diminishes. We note the general trend that for higher values of the modulus of α, the slope of the differences is steeper. The same seems to hold if the modulus of the input Schur parameters is higher.

These phenomena are explained by the fact that both solutions are related by an Uvarov transformation defined by α and some m ∈ ℂ, see Section 2. Since the Schur parameters lie inside the unit circle, the possible values of m lie in the convex set 𝒟 for which the Uvarov transformation is positive definite on P_n. We have studied this topic in [27], where we also gave


Fig. 1. Absolute errors on the sequence a_k, the output of RQstep applied on a sequence of Schur parameters â_k ∈ 𝔻 with random moduli and angles and α = 1.2e^{iπ/5}, compared with the sequence b_k that is computed in high precision.

Fig. 2. Absolute difference between two output sequences a_k and b_k associated with the free parameters â_n = −0.95 and â_n = 0.5 + 0.5i, respectively. The angle of α equals π/5 and three different values of the modulus are considered. The input sequence equals â_k = 0.1 on the left and â_k = 0.6 on the right.

the necessary condition that m lies inside a circle if the Uvarov transformation is positive definite on P_n. This circle contains the convex set 𝒟.

We visualize these ideas in Fig. 3. On the left we show the same differences as in Fig. 2 for the input sequence on the left with |α| = 2.1, but now also computed in high precision. On the right we plot the circle of the necessary condition for m and the convex set 𝒟, regarding the Uvarov transformation of the solution for â_n = −0.95.⁵

The size of the set 𝒟 of possible values of m determines the range of the first Schur parameter of all the Geronimus solutions, as we show later in Fig. 6 using (49). This is also visible in Fig. 3 on the left, where the difference between the first Schur parameters of the sequences computed in high precision is clearly of the same order of magnitude as the set 𝒟. This would hold for any two values of the free parameter â_n. Hence, in the current example of Fig. 3, both values of the free parameter yield sequences {a_k}_{k=1}^{n} that are numerically the same for k < K = 50. It remains an open question how long the input sequence must be, given the value of K, and how to compute this length.

We note that there is a strong analogy with the backward algorithm provided in [22, Section 2.4] for the modification of measures on the real line by a linear divisor. In that case, the coefficients of the modified Jacobi matrix have to be computed. Likewise, the convergence of the computed sequence is discussed, although it is not linked with the Uvarov transformation. For some classical measures, lower bounds have been obtained in [38] for the length of the input sequence, to guarantee convergence up to a given error tolerance.

⁵ Details on how to compute the right plot can be found in [27]. Note that we need the moments associated with the solution for â_n = −0.95. Using the recurrence relation (3), we can compute the orthonormal polynomial coefficients λ_{k,i}, where φ_k(z) = Σ_{i=0}^{k} λ_{k,i} z^i. Since L(φ_k, 1) = 0, we get Σ_{i=0}^{k} λ_{k,i} μ_i = 0, relating the moments μ_i. Choosing μ_0 = 1, this equation then determines all the moments.


Fig. 3. (Left) Absolute difference between two output sequences a_k and b_k associated with the free parameters â_n = −0.95 and â_n = 0.5 + 0.5i, respectively. The input sequence equals â_k = 0.1 and α = 2.1e^{iπ/5}. (Right) Convex set 𝒟 and circle of the necessary condition for the Uvarov transformation of the sequence a_k.

Fig. 4. Absolute errors of GFwdDirect and GFwdST. The angle of α equals π/5 and three different values of the modulus are considered. The input sequence equals â_k = 0.1 on the left and â_k = 0.6 on the right. The errors are computed with respect to the output sequence {a_k}_{k=1}^{n} of RQstep associated with â_n = −0.95.

Accuracy of forward algorithms

In order to examine the errors of GFwdDirect and GFwdST, consider the following experiment. The input sequences {â_k}_{k=1}^{n−1} are the same as for the experiment of Fig. 2 on the left and on the right, respectively, with the same values of α and with n = 100. The output sequence {a_k}_{k=1}^{n} of RQstep associated with â_n = −0.95 is the reference sequence we want to compute. Consequently, the free parameter for both forward algorithms equals the first Schur parameter a_1 of this sequence. In Fig. 4, we plot the errors on the outputs of GFwdDirect and GFwdST, compared with the reference sequence.

First, observe that GFwdDirect gives somewhat larger errors than GFwdST. Later in this section we will compare both algorithms more thoroughly. More important now is the remarkable correspondence between the errors of the forward algorithms in Fig. 4 and the convergence of the backward algorithm in Fig. 2. When RQstep shows steep convergence, the errors of the forward algorithms seem to grow at a similar rate. If, on the contrary, there is no convergence, then both forward algorithms seem to give small errors. Obviously, this behaviour is connected with the solution set S₊. In the following subsection, we show how this connection can make the problem of computing a Geronimus transformation in a forward sense very badly conditioned.

Conditioning issues

In the previous experiment, both forward algorithms show bad numerical results in many of the cases. We repeat the experiment with GFwdST for |α| = 2.1 and the input sequence â_k = 0.1, again with n = 100. The reference sequence is again the output sequence of RQstep associated with â_n = −0.95, computed in double precision. To analyse the source of the errors, we include calculations in high precision arithmetic. The main question to answer is whether the errors can be attributed to the problem or whether they are due to instabilities of the algorithm.


Fig. 5. (Left) Absolute errors of GFwdST in double precision, high precision and partially in high precision. The input sequence equals â_k = 0.1 and α = 2.1e^{iπ/5}. As a reference, also the absolute difference between two backward sequences is included. (Right) Absolute errors of the Christoffel transformation applied on the output of GFwdST in double precision and in high precision, and of RQstep in double precision.

Fig. 5 on the left shows the absolute errors of GFwdST in double precision and the absolute difference between the outputs of RQstep for â_n = −0.95 and â_n = 0.5 + 0.5i, respectively. Besides these numbers, already visible in previous figures, we also show the absolute errors of GFwdST

• when all computations are done in high precision (VPA),
• when all computations are done in high precision, except for the computation of c_1 in the beginning of the algorithm, which is done in double precision (partial VPA).

On the right hand side of Fig. 5, we plot the absolute errors of the Christoffel transformation on the outputs of GFwdST in double precision, GFwdST in high precision and RQstep, respectively. The Christoffel transformation is computed with QRstep and the errors are taken with respect to the input sequence â_k.

We make the following observations for Fig. 5 on the left.

• The slope of the errors for GFwdST carried out in high precision, is the same as the slope of the differences for RQstep.• GFwdST performs worse in double precision than in high precision.• Only doing one line in double precision instead of high precision in GFwdSTalready gives errors of similar magnitude as

when everything is done in double precision.

Taking the Christoffel transformation and comparing the errors with the original sequence is ameasure of howwell we havecomputed a Geronimus solution. From the right plot of Fig. 5 we observe the following.

• RQstep computes a Geronimus solution up to machine precision. This is no surprise, since QRstep and RQstep are eachother’s inverse in exact arithmetic andboth algorithms showhigh accuracy in the example at the beginning of this section.

• GFwdST in high precision computes a Geronimus solution up to machine precision, but only for k < K , for some K . Keep-ing track of the values ck in the algorithm, we note that |c48| ≥ 1, while |ck| < 1 for k < 48, so the algorithm breaksdown at K = 48.

• GFwdST in double precision drifts away exponentially from being a numerically correct Geronimus solution.

First, we explain the output of GFwdST in high precision and connect it with the subset of valid values for a_1. Then we discuss the behaviour of GFwdST in double precision. Finally we comment on the similar slope of the error of GFwdST in high precision and the difference between two backward sequences.

Given the reference sequence, all the possible positive definite Geronimus transformations of the input sequence are Uvarov transformations of this reference sequence. If a_1^e is the exact first Schur parameter of the reference sequence, then the first Schur parameter of an Uvarov transformation can be written as

$$a_1(m) = -\frac{c_1(m)}{c_0(m)} = \frac{-c_1 - m\alpha - \bar{m}\,\bar{\alpha}^{-1}}{c_0 + m + \bar{m}} = \frac{a_1^e - m\alpha - \bar{m}\,\bar{\alpha}^{-1}}{1 + m + \bar{m}}. \tag{49}$$
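The dependence of a_1 on the free parameter m can be evaluated directly from (49). Below is a minimal Python sketch, assuming c_0 = 1 and c_1 = −a_1^e for the reference sequence; the sample values of m are hypothetical and only serve to illustrate the sensitivity discussed in this section.

```python
import numpy as np

def a1_uvarov(m, a1e, alpha):
    """First Schur parameter a1(m) of the Uvarov transformation, formula (49)."""
    num = a1e - m * alpha - np.conj(m) / np.conj(alpha)
    den = 1.0 + m + np.conj(m)
    return num / den

# Values from the experiment in the text.
alpha = 2.1 * np.exp(1j * np.pi / 5)
a1e = 0.1

# For m = 0 the Uvarov transformation is trivial, so a1(0) = a1e.
assert abs(a1_uvarov(0.0, a1e, alpha) - a1e) < 1e-15

# Hypothetical small perturbations of m shift a1 away from a1e.
for m in (1e-12, 1e-8, 1e-4):
    print(m, abs(a1_uvarov(m, a1e, alpha) - a1e))
```

Even perturbations of m far below machine precision change a_1 by a comparable amount, which is consistent with the forward problem being extremely sensitive to the choice of the free parameter.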

For a_1(m) to agree with a positive definite Geronimus solution, m must lie within the convex set D, shown in Fig. 3. In Fig. 6 on the left we plot a_1(m) − a_1^e for m on the boundary of the convex set D. This gives the boundary of possible values of a_1, shifted to the origin. Observe that the free parameter a_1 of the forward algorithms is constrained to a certain subset of D, in contrast to the backward algorithm, where the free parameter a_n can be anywhere in the open unit disc.

In this example, the subset is smaller than the machine precision. Since the free parameter a_1 is computed in double precision, it lies outside this subset and is not associated with a Geronimus solution of length n. However, together with the set D, the set of possible values of a_1 grows bigger as the length K of the reference sequence diminishes. We recompute the


Fig. 6. a_1(m) − a_1^e for m on the boundary of the convex set D computed for the entire reference sequence of length n = 100 (left) and for a part of length 47 and 48, respectively, of the reference sequence. We also include the location of the computed value a_1 (right).

convex set D and its associated set of valid values for a_1 for subsequences of length 47 and 48, respectively, of the reference sequence. In Fig. 6 on the right we see that the computed value of a_1 lies outside the set for K = 48 but inside this set for K = 47. This agrees with the fact that GFwdST in high precision breaks down at step K = 48 as the computed value of c_48 lies outside the unit disc.

Thus the problem of computing the Geronimus transformation in a forward sense is extremely sensitive to small perturbations on the free parameter. Yet there is still a difference between the errors in high precision, which come merely from the difference between two Uvarov transformations, and the errors in double precision.

We observed that when GFwdST is applied in double precision, its errors grow much faster than predicted by the perturbation of the free parameter a_1. This behaviour can again be connected to the sensitivity of the problem to small perturbations on intermediate results. To demonstrate this, we apply GFwdST again in high precision, except for the computation of c_1, which is done in double precision. As can be seen in Fig. 5 on the left, a perturbation on c_1 of the order of machine precision results in errors on the output that are similar in magnitude to the errors when GFwdST is done entirely in double precision. We conclude that when working in double precision, GFwdST seems as accurate as one can get to solve the forward problem in this example.

The similar slope in Fig. 5 is due to the following. Both the difference between two backward sequences and the error of GFwdST in high precision are differences between two Uvarov related sequences. The fact that we use a constant input sequence a_k = 0.1 for all k explains that these differences are straight lines (on a semilogarithmic plot) that are shifted when the input sequence shortens.

Comparison of forward algorithms

GFwdDirect and GFwdST only differ in how the Schur parameters are computed from the values c_k and r_{i,j}. Recall that GFwdDirect first computes the elements h_{k,k} and h_{k+1,k} of the unitary Hessenberg matrix H, and then a_k and ρ_k follow from

$$a_k = -\frac{h_{k,k}}{a_{k-1}}, \qquad a_1 = -h_{1,1}, \qquad \rho_k = h_{k+1,k}. \tag{50}$$
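As a sketch of this direct recovery, the recursion (50) can be written out in Python as follows; the function name and the synthetic test data are our own choices, and we follow (50) exactly as printed, with h_{k,k} = −a_k a_{k−1} and a_0 = 1.

```python
import numpy as np

def schur_from_hessenberg(h_diag, h_sub):
    """Recover Schur parameters via (50): a_1 = -h_{1,1}, a_k = -h_{k,k}/a_{k-1},
    rho_k = h_{k+1,k}.  The division by a_{k-1} is the potentially unstable step."""
    a = [-h_diag[0]]
    for k in range(1, len(h_diag)):
        a.append(-h_diag[k] / a[k - 1])
    return np.array(a), np.array(h_sub)

# Synthetic round-trip check: build h_{k,k} = -a_k a_{k-1} (with a_0 = 1)
# from chosen parameters and recover them.
a_true = np.array([0.5, -0.3 + 0.2j, 0.4j])
h_diag = -a_true * np.concatenate(([1.0], a_true[:-1]))
h_sub = np.sqrt(1 - np.abs(a_true) ** 2)      # complementary parameters rho_k
a_rec, rho = schur_from_hessenberg(h_diag, h_sub)
assert np.max(np.abs(a_rec - a_true)) < 1e-14
```

The round trip succeeds here because all moduli are well away from zero; the analysis below shows what happens when some |a_{k−1}| is small.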

The approach of GFwdST is to use a shift-through step using only the computed c_k and the input Givens transformation G_k. While the first approach is simpler, it can suffer from instabilities. This can be seen as follows. Suppose that $\hat{a}_k$ is the computed $a_k$ and let $\hat{a}_k = a_k(1 + \delta_k)$. In addition, suppose that the elements of the Hessenberg matrix H carry an absolute error $\epsilon_k$, which can be due to rounding errors. The computed $\hat{a}_k$ can then be written as

$$\hat{a}_k = \frac{-h_{k,k} + \epsilon_k}{\hat{a}_{k-1}} = \frac{a_k a_{k-1} + \epsilon_k}{a_{k-1}(1 + \delta_{k-1})}\,(1 + \gamma_k) \approx a_k\left(1 + \frac{\epsilon_k}{a_k a_{k-1}} - \delta_{k-1} + \gamma_k\right),$$

where $\gamma_k$ is the division error and where the approximation is more accurate when the error terms between the brackets are closer to zero. It follows that the relative error on $a_k$ is given by

$$\delta_k \approx \frac{\epsilon_k}{a_k a_{k-1}} - \delta_{k-1} + \gamma_k.$$


Fig. 7. (Left) Absolute error of GFwdDirect, GFwdST and RQstep when applied on a given input sequence. (Right) Moduli of the elements of the input sequence.

Ignoring the division error, this means that δ_k is at least of the same magnitude as δ_{k−1}. More importantly, if a_{k−1} is small, then the absolute error ϵ_k is blown up and can dominate in the expression for the absolute error a_k δ_k. This shows that the formulae (50) yield an unstable method to compute the Schur parameters from a unitary Hessenberg matrix H.
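A small numerical illustration of this amplification (all numbers hypothetical): a fixed absolute perturbation ϵ on h_{k,k}, of the order of machine precision, is divided by a_{k−1} in (50), so the error on the recovered a_k grows like ϵ/|a_{k−1}|.

```python
# Hypothetical illustration of the instability of (50): a fixed absolute
# perturbation eps on h_{k,k} is amplified by 1/|a_{k-1}|.
eps = 1e-16          # absolute error on h_{k,k}, e.g. a rounding error
a_k = 0.05           # exact Schur parameter to recover

for a_km1 in (0.5, 1e-3, 1e-8):
    h_kk = -a_k * a_km1                      # exact diagonal element
    a_k_hat = -(h_kk + eps) / a_km1          # the division in (50)
    print(a_km1, abs(a_k_hat - a_k))         # error grows like eps / a_km1
```

For a_{k−1} = 0.5 the error stays near machine precision, while for a_{k−1} = 1e−8 it is inflated by eight orders of magnitude, matching the behaviour of GFwdDirect seen in Fig. 7.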

These issues are well known: in [23, p. 2] a stable algorithm is given to compute Schur parameters from a unitary Hessenberg matrix. By using all the elements of the matrix H, the problematic division is avoided. The drawback is that the process requires O(n²) operations. In our case, however, we can implicitly compute the Schur parameters using only one shift-through operation in each step. In this fashion we need only O(n) operations.

We illustrate this in Fig. 7, where we compare both forward algorithms for a specific sequence of Schur parameters of length n = 200 and α = 1.001e^{iπ/5}. The sequence is visible on the right. All the moduli are chosen randomly in [0.01, 0.1], except for a few numbers with smaller moduli.⁶ The input sequence is then obtained from QRstep, applied on the chosen sequence. On the left of Fig. 7, the absolute errors are shown for GFwdDirect, GFwdST and RQstep, respectively. Small Schur parameters clearly affect the errors of GFwdDirect, while there seems to be no influence at all for GFwdST and RQstep.
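A test sequence of this kind can be generated as follows; the random seed, the phases and the positions of the small moduli are our own hypothetical choices, since the text only specifies the moduli range, n and α.

```python
import numpy as np

# Hypothetical reconstruction of the test data of Fig. 7: n = 200 Schur
# parameters with random moduli in [0.01, 0.1] and a few much smaller moduli,
# which trigger the instability of GFwdDirect.
rng = np.random.default_rng(0)
n = 200
moduli = rng.uniform(0.01, 0.1, n)
moduli[[50, 100, 150]] = 1e-6                  # a few small moduli (hypothetical positions)
schur = moduli * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
alpha = 1.001 * np.exp(1j * np.pi / 5)

# The input sequence for the comparison would then be obtained by applying
# QRstep (not reproduced here) to this sequence.
print(schur.shape, np.min(np.abs(schur)))
```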

10. Conclusions

In this paper we derive and test several algorithms for the Geronimus transformation for measures on the unit circle. The main relation we use is a QR step for unitary Hessenberg matrices associated with these measures. Using Givens reflections to represent these matrices, all the algorithms we present can work directly with Schur parameters, requiring only O(n) operations.

We present three algorithms: two compute the Schur parameters in a forward sense and one computes them in a backward sense. All algorithms consist of computing an RQ factorization, though this happens implicitly for the backward algorithm. Here an RQ step is computed as a QR step, for which Gragg already proposed an algorithm in [23]. We follow a different approach, making use of fusion and shift-through operations on Givens transformations. In the forward algorithms the matrix R is computed straightforwardly, but to compute Q, the semiseparable properties of R have to be considered. The variants of the forward algorithm only differ in how the Schur parameters are computed, once the RQ factorization is (partially) known.

The Geronimus transformation is defined as the inverse of the Christoffel transformation, a Hermitian polynomial modification of degree one. It is known that this implies that there is a set of Geronimus transformations which can be described by the Uvarov transformation. Using linear algebra, we prove this fact and make it more precise for finite dimensional functionals.

We illustrate the numerical behaviour of our methods with several numerical experiments, showing the convergence of the backward method and the (ill) conditioning of the forward problem, and comparing both forward methods. By means of the Uvarov transformation, we explain the convergence of the backward method and the conditioning of the forward problem.

Future work can involve the computation of a condition number for the forward method, as was done in [19]. This number is then computed with the same order of operations as the algorithm itself and can indicate ill conditioning and possibly associated good convergence of the backward method. Another numerical improvement that can be studied is the choice of j = 1 when using the formula (45). Although we note that the forward problem is often ill conditioned, choosing

6 Note that we choose a sequence with small moduli and with |α| close to one, to make the problem well conditioned (referring to the previous subsection).


j depending on the situation could influence the stability of the forward method. Finally, determining a suitable length for the input sequence of the backward method, given an error tolerance for convergence, is still an open question.

References

[1] W.B. Jones, O. Njåstad, W.J. Thron, Moment theory, orthogonal polynomials, quadrature, and continued fractions associated with the unit circle, Bull. Lond. Math. Soc. 21 (1989) 113–152.
[2] F. Marcellán, J. Hernández, Christoffel transforms and Hermitian linear functionals, Mediterr. J. Math. 2 (2005) 451–458.
[3] Y.L. Geronimus, Polynomials orthogonal on a circle and their applications, Amer. Math. Soc. Transl. 1954 (1954) 79.
[4] U. Grenander, G. Szegő, Toeplitz Forms and their Applications, second ed., University of California Press, Berkeley, 1958.
[5] B. Simon, Orthogonal Polynomials on the Unit Circle: Part 1: Classical Theory; Part 2: Spectral Theory, in: Colloquium Publications, vol. 54, American Mathematical Society, Providence, Rhode Island, USA, 2004.
[6] L. Daruis, J. Hernández, F. Marcellán, Spectral transformations for Hermitian Toeplitz matrices, J. Comput. Appl. Math. 202 (2007) 155–176.
[7] E. Godoy Malvar, F. Marcellán, An analogue of the Christoffel formula for polynomial modification of a measure on the unit circle, Boll. Unione Mat. Ital. Sez. A 5 (1991) 1–12.
[8] F. Marcellán, Polinomios ortogonales no estándar. Aplicaciones en análisis numérico y teoría de la aproximación, Rev. Acad. Colombiana Cienc. Exact. Fís. Nat. 30 (2006) 563–579 (in Spanish).
[9] L. Garza, J. Hernández, F. Marcellán, Orthogonal polynomials and measures on the unit circle: the Geronimus transformations, J. Comput. Appl. Math. 233 (2010) 1220–1231.
[10] K. Castillo, L. Garza, F. Marcellán, Linear spectral transformations, Hessenberg matrices, and orthogonal polynomials, Rend. Circ. Mat. Palermo Ser. II Suppl. 82 (2010) 3–26.
[11] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, New York, USA, 1965.
[12] G.H. Golub, J. Kautsky, Calculation of Gauss quadratures with multiple free and fixed knots, Numer. Math. 41 (1983) 147–163.
[13] W. Gautschi, The interplay between classical analysis and (numerical) linear algebra: a tribute to Gene H. Golub, Electron. Trans. Numer. Anal. 13 (2002) 119–147.
[14] D.S. Watkins, Some perspectives on the eigenvalue problem, SIAM Rev. 35 (1993) 430–471.
[15] H. Xu, The relation between the QR and LR algorithms, SIAM J. Matrix Anal. Appl. 19 (1998) 551–555.
[16] D. Galant, An implementation of Christoffel's theorem in the theory of orthogonal polynomials, Math. Comp. 25 (1971) 111–113.
[17] J. Kautsky, G.H. Golub, On the calculation of Jacobi matrices, Linear Algebra Appl. 52/53 (1983) 439–455.
[18] M.I. Bueno, F.M. Dopico, A more accurate algorithm for computing the Christoffel transformation, J. Comput. Appl. Math. 205 (2007) 567–582.
[19] M.I. Bueno, A. Deaño, E. Tavernetti, A new algorithm for computing the Geronimus transformation with large shifts, Numer. Algorithms (2009) 1–39. Published online: 8 August 2009.
[20] S. Elhay, J. Kautsky, Jacobi matrices for measures modified by a rational factor, Numer. Algorithms 6 (1994) 205–227.
[21] D. Galant, Algebraic methods for modified orthogonal polynomials, Math. Comp. 59 (1992) 541–546.
[22] W. Gautschi, Orthogonal Polynomials: Computation and Approximation, Oxford University Press, New York, USA, 2004.
[23] W.B. Gragg, The QR algorithm for unitary Hessenberg matrices, J. Comput. Appl. Math. 16 (1986) 1–8.
[24] W.B. Gragg, Stabilization of the UHQR algorithm, in: Advances in Computational Mathematics: Proceedings of the Guangzhou International Symposium, pp. 139–154.
[25] M. Stewart, An error analysis of a unitary Hessenberg QR algorithm, SIAM J. Matrix Anal. Appl. 28 (2006) 40–67.
[26] M.J. Cantero, L. Moral, L. Velázquez, Direct and inverse polynomial perturbations of Hermitian linear functionals, J. Approx. Theory 163 (2011) 988–1028.
[27] M. Humet, M. Van Barel, When is the Uvarov transformation positive definite? Numer. Algorithms 59 (2012) 51–62.
[28] R. Vandebril, M. Van Barel, N. Mastronardi, Matrix Computations and Semiseparable Matrices, Volume I: Linear Systems, Johns Hopkins University Press, Baltimore, Maryland, USA, 2008.
[29] D.S. Watkins, The QR algorithm revisited, SIAM Rev. 50 (2008) 133–145.
[30] D.S. Watkins, Fundamentals of Matrix Computations, second ed., in: Pure and Applied Mathematics, John Wiley & Sons, Inc., New York, 2002.
[31] G.H. Golub, C.F. Van Loan, Matrix Computations, third ed., Johns Hopkins University Press, Baltimore, Maryland, USA, 1996.
[32] T.L. Wang, W.B. Gragg, Convergence of the shifted QR algorithm for unitary Hessenberg matrices, Math. Comp. 71 (2002) 1473–1496.
[33] T.L. Wang, W.B. Gragg, Convergence of the unitary QR algorithm with unimodular Wilkinson shift, Math. Comp. 72 (2003) 375–385.
[34] M. Van Barel, R. Vandebril, P. Van Dooren, K. Frederix, Implicit double shift QR-algorithm for companion matrices, Numer. Math. 116 (2010) 177–212.
[35] R. Vandebril, M. Van Barel, N. Mastronardi, Rational QR-iteration without inversion, Numer. Math. 110 (2008) 561–575.
[36] M. Stewart, Stability properties of several variants of the unitary Hessenberg QR-algorithm, in: V. Olshevsky (Ed.), Structured Matrices in Mathematics, Computer Science and Engineering, II, in: Contemporary Mathematics, vol. 281, American Mathematical Society, Providence, Rhode Island, USA, 2001, pp. 57–72.
[37] D. Bindel, J.W. Demmel, W. Kahan, O.A. Marques, On computing Givens rotations reliably and efficiently, ACM Trans. Math. Software 28 (2002) 206–238.
[38] W. Gautschi, Minimal solutions of three-term recurrence relations and orthogonal polynomials, Math. Comp. 36 (1981) 547–554.