83
Introduction Reduced space Krylov methods Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic and atmospheric applications) S. Gratton 1 S. Gurol 3 Ph.L. Toint 2 J. Tshimanga 1 A. Weaver 3 1 ENSEEIHT, Toulouse, France 2 FUNDP, Namur, Belgium 3 CERFACS, Toulouse, France Joint French-Czech Workshop on Krylov Methods for Inverse Problems Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Data Assimilation:concepts and algorithms

(for oceanic and atmospheric applications)

S. Gratton1

S. Gurol3 Ph.L. Toint2 J. Tshimanga1 A. Weaver3

1ENSEEIHT, Toulouse, France2FUNDP, Namur, Belgium

3CERFACS, Toulouse, France

Joint French-Czech Workshop on Krylov Methods for InverseProblems

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 2: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Outline

1 IntroductionLooking at it from different sidesAn academic example

2 Reduced space Krylov methodsWorking in the observation spaceImplementation and numerical experimentation

3 Acceleration techniques for nonlinear-least squares (optional)Further improvements

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 3: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

What is data assimilation?

You use a kind of data assimilation scheme if you sneeze whilstdriving along the motorway.As your eyes close involuntary; you retain in your mind a picture ofthe road ahead and traffic nearby [background],as well as a mental model of how the car will behave in the shorttime [dynamical system]before you reopen your eyes and make a course correction[adjustment to observations].

O’Neil et al (2004)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 4: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Outline

1 IntroductionLooking at it from different sidesAn academic example

2 Reduced space Krylov methodsWorking in the observation spaceImplementation and numerical experimentation

3 Acceleration techniques for nonlinear-least squares (optional)Further improvements

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 5: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Predicting the state of the atmosphere, of the ocean

The state of the atmosphere or the ocean (the system) ischaracterized by state variables that are classically designated asfields:

velocity components

pressure

density

temperature

salinity

A dynamical model predicts the state of the system at a time giventhe state of the ocean at a earlier time. We address here thisestimation problem. Applications are found in climate,meteorology, ocean,... forecasting problems. Involving largecomputers and nearly real-time computations.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 6: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Predicting the state of the atmosphere of the ocean

The fundamental properties of the system appear in the model asparameters:

viscosities

diffusivities

rates of earth-rotation

The initial and boundary conditions necessary for integration of thedynamical model may also be regarded as parameters.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 7: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 8: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Optimal control problem

The fundamental problem of optimal control reads:

Definition

Find the control u (initial state parameters) out of a set of admissible controlsU which minimizes the cost functional

J =

Z t1

t0

F (t, x, u)dt

subject to

x = f(t, x, u), with x0 depending on u

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 9: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

DA as an optimal control problem

Since the problem of DA is to bring the model state closer to a given setobservations, this may be expressed in terms of minimizing:

J =

Z t1

t0

(H(x)− y)TR−1(H(x)− y)dt

subject to

x = f(t, x, u)

or in discrete form (that we will consider for the rest)

J =NXi=0

(H(xi)− yi)TR−1(H(xi)− yi)

subject to

xi =M(t,x0,u)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 10: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

High performance computing point of view

The simplest instance of a DA problem is a linearleast-squares problem

Typical sizes would be for this problem 107 unknowns and2 · 107 observations (including a priori information)

The problem is not sparse

If no particular structure taken into account, the solution ofthe problem on a modern (3 · 109 operations/s) computerwould take 200 centuries of computation by the normalequations

In terms of memory, working with the matrix in core memoryof a computer not practicable

Therefore iterative methods are used on parallel computers

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 11: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Regularization technique

If all mapping involved in the problem where linear, the dataassimilation problem would often result

in a linear least squares problem with more unknown thanequations

in a very ill-conditioned problem

A regularization technique is often needed. This is done using thebackground information

J (x0) =12‖x0 − xb‖2B−1 +

12

N∑i=0

‖Hi(xi)− yi‖2R−1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 12: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

A vibrating string

We consider a vibrating string, hold fixed at both ends

It is released with a zero initial speed, from an unknownposition

The string remains in the vertical plane

The string is observed with a set of physical devices measuringthe position string at regularly spaced points during a giventime span

We would like to make aforecast of the string position outside the observation time span

Observations10

x

u(x)String

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 13: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

A vibrating string : the model

The string position u(x) is the solution of the partialdifferential equation

∂2

∂t2u(x, t)− ∂2

∂x2u(x, t) = 0 in]0, 1[×]0, T [u(0, t) = u(1, t) = 0, t ∈]0, T [u(x, 0) = u0(x), ∂

∂tu(x, 0) = 0, x ∈]0, 1[

Under regularity assumptions on u0, this system has oneunique solution

We suppose that the system is observed at times tn

The problem reads minu0

∑Nobtn=0 ‖yn − u(:, tn)‖2

This is an infinite dimensional linear least squares problem,that has to be discretized to be solved on a computer.Discretize then minimize.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 14: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

The observations

We consider now that the string is observed regularly in timeand space. No noise, more observations than unkonwns.

The discretized version of linear least-squares problemminu0

∑Nobtn=0 ‖yn − Un‖2 is solved with a conjugate gradient

technique

→ test(’over’)

Very good agreement between truth an analysis

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 15: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Realistic difficult case

In practice, observing a 3D field at all space points is out ofreach

The observations are noisy, which introduces high frequenciesin the analysis

Both effects (always) come together

→ test(’under-noisy’)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 16: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Exploiting ”a priori” information

We do not consider the previous solution acceptable, becausewe doubt a string might take such positions. We expect thesolution to be smooth enough

We would like to introduce the fact that the string positionshould not vary too much when considering points that areclose in the physical space

purely algebraic approach, e.g.minu0

∑Nobx

j=01σ |u

0j − u0

j+1|2 +∑Nobt

n=0 ‖yn − Un‖2

using a pseudo-physical smoothing process

Sum of background (a priori) term and observation term

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 17: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Smoothing in the discretized space with the heat equation

We consider the discretized heat equation∂∂tu(x, t)−

∂2

∂x2u(x, t) = 0 in]0, 1[×]0, T [u(0, t) = u(1, t) = 0, t ∈]0, T [u(x, 0) = u0(x), ∂

∂tu(x, 0) = 0, x ∈]0, 1[

For a given T , u(., T ) is smoother than u0, because highfrequency terms get strongly damped.

→ simul heat

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 18: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Eigenbasis of few steps in the heat equation

Quickly decaying spectrum

The resulting matrix writes B = UDUT , where U isorthonormal

The Fourier components of any u in this basis are the entriesof UTu

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 19: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

Eigenbasis of few steps in the heat equation

Page 20: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Application to the Data Assimilation problem

A smooth vector u has most of its energy on the ”largest”eigenvectors of B : uTBx = (Uu)TD(Uu) is large

A high-frequency vector has most of its energy on the”smallest” eigenvectors of B : uTB−1u = (Uu)TD−1(Uu) islarge

We introduce the penalization of high frequencies with respectto a guess Ub, called the background :minU0

12‖U

0 − Ub‖2B−1 + 12

∑Nobtn=0 ‖yn − Un‖2R−1 , where R is

the covariance matrix of the observation errors

This is the 4D-Var functional

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 21: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Back on the realistic difficult case

Underdetermined case

→ test(’under-reg’)

Noisy case

→ test(’noisy-reg’)

Underdetermined and noisy case

→ test(’under-noisy-reg’)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 22: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Issues on background regularization

The modelling enables to introduce a physical process todetermine the background, and make the parameterization ofthe background error covariance matrix easy. Backgroundmatrix mat-vec in CG : another differential equation has to besolved

In case of modeling,when a direct solution not applicable, aninner-outer iteration scheme has to be controlled

Determining a reasonable background matrix : based onphysical considerations, possibly on statistics over pastassimilation periods

Introduction of balanced relations in the background : whenvariables are related to each other by relations that are notaccounted for in the model and not properly observed, anadditional (weak) penalty term is added

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 23: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Four-Dimensional Variational (4D-Var) formulation

→ Very large-scale nonlinear weighted least-squares problem:

minx∈Rn

f(x) =1

2||x− xb||2B−1 +

1

2

NXj=0

||Hj(Mj(x))− yj ||2R−1j

where:

Size of real (operational) problems: x, xb ∈ R106, yj ∈ R105

The observations yj and the background xb are noisy

Mj are model operators (nonlinear)

Hj are observation operators (nonlinear)

B is the covariance background error matrix

Rj are covariance observation error matrices

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 24: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

The forward problemControl theoryAn academic example

Incremental 4D-Var

Let rewrite the problem as:

minx∈Rn

f(x) =1

2||ρ(x)||22

Incremental 4D-Var is an inexact/truncated Gauss-Newton algorithm:

It linearizes ρ around the current iterate x and solves

minx∈Rn

1

2‖ρ(x) + J(x)(x− x)‖22

where J(x) is the Jacobian of ρ(x) at x

It thus solves a sequence of linear systems (normal equations)

JT (x)J(x)(x− x) = −JT (x)ρ(x)

where the matrix is symmetric positive definite and varies along theiterations

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 25: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Outline

1 IntroductionLooking at it from different sidesAn academic example

2 Reduced space Krylov methodsWorking in the observation spaceImplementation and numerical experimentation

3 Acceleration techniques for nonlinear-least squares (optional)Further improvements

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 26: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Context

We want to find the minimizer x(t0) of the 4D-Var functional

J [x(t0)] =12(x(t0)− xb)TB−1(x(t0)− xb)

+12

p∑j=0

(Hj(x(tj))− yoj )

TR−1j (Hj(x(tj))− yo

j ),

wherex(tj) =Mj,0(x(t0));B : background-error covariance matrix;Rj : observation-error covariance matrices,Hj : maps the model field at time tj to the observation space.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 27: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Incremental 4D-Var Approach: algo overview

1 Transform the 4D-Var in a sequence of quadratic minimizationproblems

2 Increments δx(k)0 are min. of functions J (k) defined by

J [δx0] =12‖δx0 − [xb − x0]‖2B−1 +

12‖Hδx0 − d‖2R−1

3 Perform update

x(k+1)(t0) = x(k)(t0) + δx(k)0 .

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 28: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Inner minimization

Minimizing

J [δx0] =12‖δx0 − [xb − x0]‖2B−1 +

12‖Hδx0 − d‖2R−1

amounts to solve

(B−1 + HTR−1H)δx0 = B−1(xb − x0) + HTR−1d.

Exact solution writes

xb − x0 +(B−1 + HTR−1H

)−1HTR−1

(d−H(xb − x0)

),

or equivalently (using the S-M-Woodbury formula)

xb − x0 + BHT(R + HBHT

)−1(d−H(xb − x0)

).

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 29: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Dual formulation : PSAS

1 Very popular when few observations compared to modelvariables. Stimulated a lot of discussion in the Ocean andAtmosphere communities

2 Relies on

xb − x0 + BHT(R + HBHT

)−1(d−H(xb − x0)

)3 Iteratively solve(

I + R−1HBHT)w = R−1(d−H(xb − x0)) for w

4 Setδx0 = xb − x0 + BHTw

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 30: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Motivation : PSAS and CG-like algorithm

1 CG minimizes the Incremental 4D-Var function during itsiterations. It minimizes a quadratic approximation of the nonquadratic function : Gauss-Newton in the model space.

2 PSAS does not minimize the Incremental 4D-Var functionduring its iterations but works in the observation space.

Our goal : put the advantages of both approaches together in aTrust-Region framework, to guarantee convergence:

Keeping the variational property, to get the so-called Cauchydecrease even when iterations are truncated.

Being computationally efficient whenever the number ofobservations is significantly smaller than the size of the statevector.

Getting global convergence in the observation space !

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 31: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

CG-like algorithm : assumptions 1

1 Suppose the CG algorithm is applied to solve the Inc-4D usinga preconditioning matrix F

2 Suppose there exists Gm×m such that

FHT = BHTG

3 For ”exact” preconditioners

(B−1 + HTR−1H

)−1HT = BHT

(I + R−1HBHT

)−1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 32: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Preconditioned CG on Incremental 4D-Var cost function

Initialization steps

Loop: WHILE

1 qi−1 = Api−1

2 αi−1 = rTi−1zi−1 /qT

i−1pi−1

3 vi = vi−1 + αi−1pi−1

4 ri = ri−1 + αi−1qi−1

5 zi = Fri

6 βi = rTi zi / rT

i−1zi−1

7 pi = −zi + βipi−1

Initialization steps

Loop: WHILE

1 qi−1 =(HTR−1H + B−1)pi−1

2 αi−1 = rTi−1zi−1 /qT

i−1pi−1

3 vi = vi−1 + αi−1pi−1

4 ri = ri−1 + αi−1qi−1

5 zi = Fri

6 βi = rTi zi / rT

i−1zi−1

7 pi = −zi + βipi−1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 33: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

An useful observation

Theorem

Suppose that

1 BHTG = FHT.

2 v0 = xb − x0.

→ vectors ri, pi, vi, zi and qi such that

ri = HTri,

pi = BHTpi,

vi = v0 + BHTvi,

zi = BHTzi,

qi = HTqi

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 34: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Preconditioned CG on Incremental 4D-Var cost function(bis)

Initialization steps

given v0; r0 = (HTR−1H + B−1)v0 − b, . . .

Loop: WHILE

1 HTqi−1 = HT(R−1HB−1HT + Im)pi−1

2 αi−1 = rTi−1zi−1 / qT

i−1pi−1

3 BHTvi = BHT(vi−1 + αi−1pi−1)4 HTri = HT(ri−1 + αi−1qi−1)5 BHTzi = FHTri = BHTGri

6 βi = (rTi zi / rT

i−1zi−1)7 BHTpi = BHT(−zi + βipi−1)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 35: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Restricted PCG (version 1)

Initialization steps

given v0; r0 = (HTR−1H + B−1)v0 − b, . . .

Loop: WHILE

1 qi−1 = (Im + R−1HB−1HT)pi−1

2 αi−1 = rTi−1HBHT zi−1 / qT

i−1HBHT pi−1

3 vi = vi−1 + αi−1pi−1

4 ri = ri−1 + αi−1qi−1

5 zi = FHTri = Gri

6 βi = rTi HBHT zi / rT

i−1HBHT zi−1

7 pi = −zi + βipi−1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 36: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

More transformations

1 Consider w and t defined by

wi = HBHTzi and ti = HBHTpi

2 From Restricted PCG (version 1)

ti ={−w0 if i = 0,−wi + βiti−1 if i > 0,

3 Use these relations into Restricted PCG (version 1)

4 Transform Restricted PCG (version 1) into Restricted PCG(version 2)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 37: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Restricted PCG (version 2)

Initialization steps

Loop: WHILE

1 qi−1 = R−1ti−1 + pi−1

2 αi−1 = wTi−1ri−1 / qT

i−1ti−1

3 vi = vi−1 + αi−1pi−1

4 ri = ri−1 + αi−1qi−1

5 zi = Gri

6 wi = HBHTzi

7 βi = wTi ri /wT

i−1ri−1

8 pi = −zi + βipi−1

9 ti = −wi + βiti−1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 38: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Comments

We summarize here the main features of RPCG:

It amounts to solve the observation system with the rightinner-product HBHT

It is mathematically equivalent to PCG in the sense that, inexact arithmetic, both algorithms generate exactly the sameiterates.

It contains a single occurrence of the matrix-vector productsby B, H, HT and R−1 per iteration.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 39: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Loss (and recovery) of orthogonality

1 The modified (G-S) orthogonalization scheme writes

ri ←i−1∏j=1

(In −

rjrTj

rTj Frj

)ri.

2 We suggest the following re-orthogonalization scheme

ri ←i−1∏j=1

(Im −

rjwTj

rTj wj

)ri. (1)

3 Note that the total number of pairs to be stored can bereduced if selective reorthogonalization is performed.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 40: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Loss (and recovery) of orthogonality : experiment

0 10 20 30 40 5010

0

105

1010

1015

Iterations i

Cos

t fun

ctio

n J[

v i]

n= 200 m= 40

Algorithm 5Algorithm 5 + orthoAlgorithm 3Algorithm 3 + ortho

Figure: Orthogonalization issues.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 41: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Experiments

0 20 40 60 8010

2

103

104

105

size(B) = 1024 size(R) = 256

Iterations

Co

st f

un

ctio

n

RPCGPSAS

Figure: The cost function versus the inner iterations for 1 Gauss-Newton

iteration.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 42: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Conclusions

Have proposed a reformulation of the PCG for

(B−1 + HTR−1H)δx0 = B−1(xb − x0) + HTR−1d

The RPCG is mathematically equivalent to PCG

Exploits the fact that all vectors lie in a subspace of IRm

Cheaper than CG (memory and computation)

Some numerical experiments shown

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 43: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Perpectives

Perpectives

Behaviour in presence of round-off error

Find efficient preconditioners F such that

FHT = BHTG

Implement RPCG in a real life data assimilation system :RTRA project

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 44: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Towards further reduction of the cost

We have shown that RPCG allows memory and computational costreduction whenever the number of observation is smaller than the size ofthe control vector

Similar results are possible with other Krylov methods (GMRES, FOM, ...)

The question now is: can we reduce cost further ?

Possible answer: inexact (cheap) matrix-vector products (truncated B−1,R−1, simplified models, ...)

(Simoncini and Szyld, van den Eshop and Sleipen, Giraud, Gratton andLangou, ...)

→ But, there is a need of a stable modification of RPCG.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 45: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

The Arnoldi process

Define (in the full space) A = In +BHTR−1H and set

K = BHT , L = R−1H

the successive nested Krylov subspaces generated by the sequence

b, (γIn +KTL)b, (γIn +KTL)2b, (γIn +KTL)3b, . . . (2)

or, equivalently, by

b, (KTL)b, (KTL)2b, (KTL)3b, . . . (3)

The Arnoldi process generates an orthonormal basis of each of the thesesubspaces, i.e. a set of vectors {vi}k+1

i=1 with v1 = b/‖b‖ such that, after ksteps,

KTLVk = Vk+1Hk, (4)

where Vk ≡ [v1, . . . , vk] and Hk is a (k + 1)× k upper-Hessenberg matrix.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 46: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Related methods: GMRES, MINRES, FOM, CG

Depending on how the matrix Hk is exploited to solve the problem we have

The GMRES algorithm (≡ MINRES for KT = L)

yk = arg miny‖Hky − β1e1‖, sk = Vkyk

The FOM algorithm (≡ CG for KT = L)

H�k y = β1e1, sk = Vkyk

here, H�k is the leading k × k submatrix of Hk.

GMRES (FOM) use long recurrences while MINRES (CG) use short ones.

Let

rk = (I+KTK)Vkyk− b and fk =1

2yTk V

Tk (γI+KTK)Vkyk− bTVkyk

→ GMRES and MINRES monotionically minimize rk while FOM and CGmonotically minimize fk along the iterations.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 47: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Range-space GMRES and FOM (RSGMR and RSFOM)

As CG may be rewritten in the observation space to yield RPCG, algorithmsGMRES, MINRES and FOM may be rewritten to yields similar variants.

Why a range-space GMRES and FOM (RSGMR and RSFOM)?

The FOM setting provides better accuracy and is much better suited touse inexact matrix-vector products.

The cost of storing an orthonormal basis of the successive Krylov spacesis much lower for range-space methods than for full-space ones.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 48: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Exact and inexact products: FOM vs CG

Is CG a reasonable framework for inexact products ?

Comparing ‖rk‖/(‖A‖‖s∗‖) for FOM, CG with reortho and CG for exact (left) and inexact (right) products

(τ = 10−9, κ ≈ 106)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 49: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Stability and convergence with inexact product

We want to bound ‖rk‖ in the context of Arnoldi process under inexactmatrix-vector products.

Some reasons to consider this question

The inexact nature of computer arithmetic implies that such such errorsare inevitable

To allow matrix-vector products in an inexact but cheaper form

Note that

the analysis is for GMRES but that in the context of FOM similarconclusions will hold.

standard CG and MINRES are no longer equivalent to FOM and GMRESin the context of unsymmetric perturbations.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 50: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Two error models

Assume that each iteration i product by K, K or L is inexact, that is

Li = L+ EL,i, KTi = KT + EKT ,i, and Ki = K + EK,i

for some errors EL,i, EKT ,i, and EK,i. Consider two error models todescribe the inaccuracy in the matrix-vector products.

Backward:

‖EK,i+1‖ ≤ τK,i+1‖K‖,‖EKT ,i+1‖ ≤ τKT ,i+1‖K‖,‖EL,i+1‖ ≤ τL,i+1‖L‖,‖EKT ,∗‖ ≤ τ∗‖K‖

Forward:

‖EK,i+1 un‖ ≤ τK,i+1‖Kun‖,‖EKT ,i+1 um‖ ≤ τKT ,i+1‖Kum‖‖EL,i+1 un‖ ≤ τL,i+1‖Lun‖‖EKT ,∗ um‖ ≤ τ∗‖Kum‖

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 51: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Results for the backward error model

Define

qk = Hkyk − βe1, G = max[‖K‖, ‖L‖], ωk = maxi,...,k

‖vi‖

κ(K) = condition number of K

(... after some analysis ...)

Theorem

Assume the backward-error model. Then

‖rk‖ ≤p

2(k + 1) ‖qk‖+ ‖K‖ωk»τ∗γ√k‖yk‖+ 4G2Pk

i=1 |[yk]i| τi–

≤p

2(k + 1)ˆ‖qk‖+ τmaxκ(K) (γ + 4G2)‖yk‖

˜.

where τmax = max[τ1, . . . , τk].

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 52: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Results for the forward error model

Theorem

Assume the forward-error model. Then

‖rk‖ ≤p

2(k + 1) ‖qk‖+√

2

»τ∗γ√k‖yk‖+ 4G ‖K‖

Pki=1 |[yk]i| τi

–≤

p2(k + 1)

»‖qk‖+ τmax (γ + 4G ‖K‖)‖yk‖

–.

Note in both sets of bounds:

The first of these bounds allows for variable accuray requirements

special role of τ∗.

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 53: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Error models (1)

Is the error model important? (ε = 10−5, κ ≈ 102)

Backward error model Forward error model

(normalized ‖rk‖, normalized ‖qk‖, accuracy threshold τ )

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 54: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Error models (2)

Yes, it can definitely make the difference (ε = 10−5, κ ≈ 109)

Backward error model Forward error model

(normalized ‖rk‖, normalized ‖qk‖, accuracy threshold τ )

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 55: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Fixed vs variable accuracy threshold (1)

Can we use variable accuracy thresholds efficiently? (ε = 10−5, κ ≈ 102)

Fixed τ τ ≈ 1/‖qk‖

(normalized ‖rk‖, normalized ‖qk‖, accuracy threshold τ )

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 56: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Fixed vs variable accuracy threshold (2)

Maybe..., not obvious. (ε = 10−5, κ ≈ 102)

Fixed τ τ ≈ 1/‖qk‖

(normalized ‖rk‖, normalized ‖qk‖, accuracy threshold τ )

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 57: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Working in the observation spaceImplementation and numerical experimentation

Conclusions

Range space methods may be designed to gain from low rank

Further gains may be obtained from inexact products

Formal bounds on the residual norm are available in this context

Forward error modelling gives more flexibility than backward

True application: a real challenge (but we are working on it!)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 58: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Outline

1 IntroductionLooking at it from different sidesAn academic example

2 Reduced space Krylov methodsWorking in the observation spaceImplementation and numerical experimentation

3 Acceleration techniques for nonlinear-least squares (optional)Further improvements

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 59: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Linear systems in sequence

Let

A: symmetric and positive definite matrix of order n

b1, . . . , br ∈ Rn: right-hand sides available in sequence

Solve in sequence:

Ax = b1, Ax = b2, . . . by an iterative method (Krylov solvers)

Preconditioning each system using information obtained during thesolution of the previous system(s)

→ Extend the idea to the case where A varies along the iterations(Gauss-Newton method – variational ocean data assimilation)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 60: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Preconditioning technique

Solve Ax = b1 and extract information info1

Solve Ax = b2 using info1 to precondition and extract information info2

Solve Ax = b3 using info2 (and possibly info1) to precondition andextract information info3

. . .

where infok contains (in our case):

Descent directions pi

Ritz pairs (θi, zi) (approximations to eigenpairs)

produced by a conjugate gradient algorithm (or an equivalent Lanczos process)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 61: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Conjugate gradient (CG) method

→ Solves minx∈Rn12xTAx− bTx or equivalently Ax = b

Given x0, set r0 ← Ax0 − b, p0 ← −r0, k ← 1

Loop on k

αk−1 ←rTk−1rk−1

pk−1TApk−1

Compute the step length

xk ← xk−1 + αk−1pk−1 Update the iterate

rk ← rk−1 + αk−1Apk−1 Update the residual

βk ← rTk rkrTk−1rk−1

Ensure A-conjugate directions

pk ← −rk + βkpk−1 Update the descent direction

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 62: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Elementary properties of the LMP

H =hIn − S(STAS)−1STA

iMhIn −AS(STAS)−1ST

i+ S(STAS)−1ST

Proposition

H is symmetric and positive definite

H is invariant under a change of basis for the columns of S(S ← Z = SX, X nonsingular)

H = A−1 if S is of order n (k = n)

(Possibly cheap) factored form: H = GGT with

G = L − SR−1R−TSTAL + SR−1X−TSTL−T

where

M = LLT (L of order n)STAS = RTR (R of order k)STL−TL−1S = XTX (X of order k)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 63: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Connection with the existing L-BFGS form

(Let M = In)

Using Y = AS and letting B = Y TS = STAS we have:

H =hIn − SB−1Y T

i hIn − Y B−1ST

i+ SB−1ST

Letting R = triu(B) and D = diag(B), the classical L-BFGS update

reads [Gilbert, Nocedal, 1993], [Byrd, Nocedal, Schnabel, 1994]:

hIn − SR−TY T

i hIn − Y R−1ST

i+ SR−TDR−1ST

This last formula is not invariant under transformations of S

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 64: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

First-level preconditioner

f(x) =

1

2||ρ(x)||22 =

1

2||x− xb||2B−1 +

1

2

NXj=0

||Hj(Mj(x))− yj ||2R−1j

!

At the background xb:

JT (xb)J(xb) = B−1 +

NXj=0

MTj HT

j R−1j HjMj

Choosing M = B1/2(B1/2)T as first-level preconditioner yields:

(B1/2)TJT (xb)J(xb)B1/2 = In +

NXj=0

(B1/2)TMTj HT

j R−1j HjMjB

1/2 (= A0)

→ Large amount of eigenvalues already clustered at 1

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 65: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

The framework

[Tshimanga, Gratton, Weaver, Sartenaer, QJRMS, 2007]

System with 107 degrees of freedom

A realistic outer/inner loop configuration is considered:

3 outer loops of Gauss-Newton (linearization)

10 inner loops of conjugate gradient (on each of the 3 systems)

The performance is measured by the value of the quadratic cost function

The convergence of Ritz pairs is measured by the backward errors:

‖Azi − θizi‖‖A‖‖zi‖

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 66: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Unpreconditioned runs

0 2 4 6 8 1010

1

102

103

104

Couting index of Ritz values

Ritz

val

ues

outer iteration 1outer iteration 2outer iteration 3

0 2 4 6 8 1010

−5

10−4

10−3

10−2

10−1

Couting index of Ritz valuesE

rror

bou

nds

on R

itz p

airs

outer iteration 1outer iteration 2outer iteration 3

→ The Ritz values for the three matrices are close together

→ The extremal Ritz pairs have the smallest backward errors (better approx.)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 67: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Preconditioned runs

We consider the three forms:

Quasi-Newton LMP

Inexact spectral-LMP

Ritz-LMP

In order to

Analyse, for each, the effect of increasing the number of vectors in S(second and third systems)

Compare their performance(second system)

To this aim, an unpreconditioned conjugate gradient is run on the first systemto produce 10 vectors from which 2, 6 and 10 relevant ones are selected:

Ritz-vectors are selected according to their convergence

Descent directions are selected as the latest ones

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 68: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Quasi-Newton LMP

10 15 20 25 300.8

0.9

1

1.1

1.2

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondQN LMP (2 vectors)QN LMP (6 vectors)QN LMP (10 vectors)

→ Positive impact of an increase in the number of vectors in S

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 69: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Inexact spectral-LMP

10 15 20 25 300.85

0.9

0.95

1

1.05

1.1

1.15

1.2

1.25

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondspectral−LMP (2 vect)spectral−LMP (6 vect)spectral−LMP (10 vect)

→ Negative impact of an increase in the number of vectors in S

(Ritz pairs may be bad eigenpairs approximation)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 70: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Ritz-LMP

10 15 20 25 300.8

0.9

1

1.1

1.2

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondRitz−LMP (2 vectors)Ritz−LMP (6 vectors)Ritz−LMP (10 vectors)

→ Positive and faster impact of an increase in the number of vectors in S

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 71: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Ranking LMP (2 vectors)

10 12 14 16 18 200.95

1

1.05

1.1

1.15

1.2

1.25

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondQN−LMP (2 vect)spectral−LMP (2 vect)Ritz−LMP (2 vect)

→ Inexact spectral-LMP ≡ Ritz-LMP – Quasi-Newton LMP is worse

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 72: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Ranking LMP (6 vectors)

10 12 14 16 18 200.9

0.95

1

1.05

1.1

1.15

1.2

1.25

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondQN−LMP (6 vect)spectral−LMP (6 vect)Ritz−LMP (6 vect)

→ Ritz-LMP is the best – Inexact spectral-LMP deteriorates

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 73: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Ranking LMP (10 vectors)

10 12 14 16 18 200.9

0.95

1

1.05

1.1

1.15

1.2

1.25

1.3x 10

5

Cumulated preconditioned CG iterations

Qua

drat

ic c

ost f

unct

ion

no precondQN−LMP (10 vect)spectral−LMP (10 vect)Ritz−LMP (10 vect)

→ Quasi-Newton LMP ≡ Ritz-LMP

→ Inexact spectral-LMP even worse than no preconditioning

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 74: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

What about the first system (A0)?

Appropriate starting point for CG

LMP again!

→ Illustration on a one-dimensional shallow water model

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 75: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

One-dimensional shallow water model

→ Estimate the velocity and geopotential of a fluid flow over an obstacle:

1D-grid with 250 mesh-points

x, xb (background) ∈ R500

yj (observations) ∈ R80

→ Outer/inner loop configuration:

3 outer loops of Gauss-Newton (linearization)

5 inner loops of conjugate gradient (on each of the 3 systems)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 76: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Gauss-Newton (with x00 = xb)

0 5 10 15220

230

240

250

260

270

280

first system second system third system

Cumulated inner conjugate gradient iteration

Cos

t fun

ctio

ns v

alue

first system second system third system

Gauss−Newton

→ Computational cost dominated by 15 matrix-vector products

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 77: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Improving the starting point x00

Physical considerations:

The ocean and the atmosphere exhibit an attractor

Most of the variability can be explained in the “attractor subspace”(of low dimension r)

→ Minimize first in this subspace (of basis L)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 78: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Empirical Orthogonal Functions (EOFs)

Construction of L:

Let x1, . . . , xp ∈ Rn be a set of state vectors (p = 200)

Build C = 1p−1

Ppi=1(xi − x)(xi − x)T

Compute the eigenvectors of C (EOFs)

Store r eigenvectors corresponding to the largest eigenvalues

→ Already used in the reduced Kalman filters (SEEK filter)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 79: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Choice for r

Select r such that:

Pri=1 λiPni=1 λi

≥ 0.8

(λi ↘)

For the shallow water model

0 5 10 15 20 25 30 35 40 45 5050

55

60

65

70

75

80

85

90

95

100

Number of selected EOFs

Per

cent

age

of e

xpla

ined

sys

tem

’s v

aria

bilit

y

→ The five first EOFs are computed (r = 5)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 80: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Ritz-Galerkin starting point

The solution of the first system in the subspace spanned by L:

x00 = xb + L(LTA0L)−1LT b0

is called the Ritz-Galerkin starting point

is used as starting point in the CG for the first system (A0x = b0)

→ Computational cost dominated by r = 5 matrix-vector products

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 81: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

First improvement

0 5 10 15210

220

230

240

250

260

270

280

Cumulated inner conjugate gradient iteration

Cos

t fun

ctio

n

Gauss−Newtonwith Ritz−Galerkin starting point

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 82: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Second improvement

0 5 10 15190

200

210

220

230

240

250

260

270

280

Cumulated inner conjugate gradient iteration

Cos

t fun

ctio

n

Gauss−Newtonwith Ritz−Galerkin starting pointwith Ritz−Galerkin starting point and LMP

(Same H for the 3 systems)

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms

Page 83: Data Assimilation: concepts and algorithms (for oceanic ... · Acceleration techniques for nonlinear-least squares (optional) Data Assimilation: concepts and algorithms (for oceanic

IntroductionReduced space Krylov methods

Acceleration techniques for nonlinear-least squares (optional)

Incremental 4D-Var approachNumerical experimentsFurther improvements

Thank you for your attention !

Gurol, Toint, Tshimanga, Weaver Data Assimilation: concept and some algorithms