
The ROMES method for reduced-order-model uncertainty quantification:

application to data assimilation

M. Drohmann and K. Carlberg

Sandia National Laboratories

Workshop on Model Order Reduction and Data, Paris, France

January 6, 2014

ROMES M. Drohmann and K. Carlberg 1 / 24

Data assimilation by Bayesian inference

Structural health monitoring

image sources: Theoretical & Computational Biophysics Group, UIUC; Holger Speckmann, Airbus

Given sensor data, what is the updated knowledge of material properties throughout the aircraft?

Bayesian inference problem

inputs µ → high-fidelity model → outputs y

Given measurements of the outputs, what is the posterior distribution of the inputs?

ROMES M. Drohmann and K. Carlberg 2 / 24

Bayesian inference

inputs µ → high-fidelity model → outputs y

Bayes’ theorem

P(µ|y) = P(y|µ) P(µ) / P(y)

- measured outputs y = y(µ*) + ε with noise ε ∼ N(0, σ²I)
- posterior P(µ|y) is sought
- prior P(µ) is given
- normalizing factor P(y) is handled indirectly
- likelihood P(y|µ) ∼ N(y(µ), σ²I): sampling requires high-fidelity model evaluations

Objective: numerically sample the posterior distribution

+ achievable in principle, e.g., by MCMC or importance sampling
- barrier: sampling requires high-fidelity forward solves
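As a concrete illustration of this sampling loop, here is a minimal random-walk Metropolis sketch in Python (not necessarily the sampler used in the talk); `high_fidelity_output` and `log_prior` are hypothetical placeholders for the expensive forward model and the prior:

```python
import numpy as np

def high_fidelity_output(mu):
    # Hypothetical stand-in for the expensive forward model y(mu).
    return np.array([np.sum(mu**2)])

def log_posterior(mu, y_meas, sigma, log_prior):
    # log P(mu | y) up to the normalizing factor P(y):
    # Gaussian likelihood N(y(mu), sigma^2 I) plus the log-prior.
    y = high_fidelity_output(mu)  # one high-fidelity solve per evaluation
    return -0.5 * np.sum((y_meas - y) ** 2) / sigma**2 + log_prior(mu)

def metropolis(y_meas, sigma, log_prior, mu0, n_samples=10000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    lp = log_posterior(mu, y_meas, sigma, log_prior)
    samples = []
    for _ in range(n_samples):
        cand = mu + step * rng.standard_normal(mu.size)  # random-walk proposal
        lp_cand = log_posterior(cand, y_meas, sigma, log_prior)
        if np.log(rng.uniform()) < lp_cand - lp:         # Metropolis accept/reject
            mu, lp = cand, lp_cand
        samples.append(mu.copy())
    return np.array(samples)

# e.g. samples = metropolis(np.array([1.9]), 0.1, lambda m: 0.0, np.ones(2))
```

Every accepted or rejected step costs one high-fidelity solve, which is exactly the barrier noted above.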

ROMES M. Drohmann and K. Carlberg 3 / 24

Reduced-order modeling and Bayesian inference

inputs µ → reduced-order model → outputs yred

Replace the high-fidelity model with the reduced-order model:
- measured outputs y = y(µ*) + ε ≈ yred(µ*) + ε
- likelihood P(y|µ) ∼ N(yred(µ), σ²I): sampling requires only reduced-order-model evaluations
- sampling from the posterior becomes tractable

Problem: neglects reduced-order-model errors

y = y(µ*) + ε (1)

= yred(µ*) + δy(µ*) + ε (2)

“An interesting future research direction is the inclusion of estimates of reduced model error as an additional source of uncertainty in the Bayesian formulation.” [Galbally et al., 2009]

Goal: construct a statistical model of the reduced-order-model error

ROMES M. Drohmann and K. Carlberg 4 / 24

Strategies for ROM error quantification

1 Rigorous error bounds
+ independent of input-space dimension
- not amenable to statistical analysis
- often overestimate the error (i.e., high effectivity)
- improving effectivity incurs larger costs [Huynh et al., 2010, Wirtz et al., 2012] or intrusive reformulation of the discretization [Yano et al., 2012]

2 Multifidelity correction [Eldred et al., 2004]

[Figure: schematic of outputs vs. inputs µ showing the high-fidelity output y, the low-fidelity output ylow, and the corrected output ỹ.]

- surrogate model of the low-fidelity error as a function of the inputs
- ‘correct’ the low-fidelity outputs with the surrogate

+ amenable to statistics
- curse of dimensionality
- ROM errors often highly oscillatory in the input space

[Ng and Eldred, 2012]

ROMES M. Drohmann and K. Carlberg 5 / 24

Our key observation

[Figure: error in the energy norm |||uh − ured||| plotted against the residual r and the error bound ∆u over many inputs (log-log scale); legend: (r; |||δu|||), (∆u; |||δu|||).]

Residual and error bound often correlate with the true error

Main idea: construct a stochastic process that maps error indicators (not inputs µ!) to a distribution of the error

+ independent of input-space dimension
+ amenable to statistics

Reduced-order model error surrogates

ROMES M. Drohmann and K. Carlberg 6 / 24

ROMES

[Figures: (left) schematic of outputs vs. inputs µ for the multifidelity-correction setting; (right) ROMES Gaussian processes mapping the residual R to the energy-norm error |||uh − ured|||, panels (i) Kernel method and (ii) RVM, showing training points, GP mean, 95% confidence interval, and the ‘uncertainty of the mean’.]

Construct a stochastic process of the ROM error δ(ρ)

Select a small number of error indicators ρ = ρ(µ) that are

1 cheaply computable online, and
2 lead to low variance of the stochastic process.

First attempt: Gaussian process (GP) such that the random variables (δ(ρ1), δ(ρ2), . . .) have a joint Gaussian distribution
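A minimal sketch of this idea using scikit-learn's GP regressor, fitting the map from log-residual to log-error; the training arrays here are hypothetical, and the talk's own kernel-regression implementation may differ:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Offline training data: residual norms and true errors at sampled inputs mu
# (hypothetical values; in practice these come from high-fidelity solves).
residuals = np.logspace(-5, -4, 50)
errors = 2.0 * residuals * np.exp(0.1 * np.random.randn(50))

# Work in log-log space, where indicator and error correlate roughly linearly.
X = np.log(residuals).reshape(-1, 1)
y = np.log(errors)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# Online: map a new residual to a distribution over the log-error.
mean, std = gp.predict(np.log([[3e-5]]), return_std=True)
print(mean, mean - 2 * std, mean + 2 * std)  # mean and ~95% band in log space
```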

ROMES M. Drohmann and K. Carlberg 7 / 24

Gaussian process [Rasmussen and Williams, 2006]

Definition (Gaussian process)

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

δ(ρ) ∼ GP(m(ρ), k(ρ,ρ′))

mean function m(ρ); covariance function k(ρ,ρ′)

given a training set (δi, ρi), the mean and covariance functions can be inferred via Bayesian analysis

consider two types of Gaussian processes

1 kernel regression [Rasmussen and Williams, 2006]

2 relevance vector machine (RVM) [Tipping, 2001]

ROMES M. Drohmann and K. Carlberg 8 / 24

GP #1: Kernel regression [Rasmussen and Williams, 2006]

[Figure: panels (a) prior and (b) posterior, reproduced from Rasmussen & Williams (2006), Fig. 2.2: random functions drawn from a GP prior and from the posterior conditioned on five noise-free observations; the shaded area is the pointwise mean ± two standard deviations (the 95% confidence region).]

prior: δ(ρ) ∼ N(0, K(ρ, ρ) + σ²I)

k(ρi, ρj) = exp(−‖ρi − ρj‖² / r²) is a positive-definite kernel

ρ := [ρtrain, ρpredict]^T

posterior: δ(ρpredict) ∼ N(m(ρpredict), cov(ρpredict))

infer hyperparameters σ² and r²
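The posterior above has a standard closed form; a NumPy sketch under the slide's squared-exponential kernel, with the hyperparameters σ² and r² taken as given rather than inferred:

```python
import numpy as np

def sq_exp_kernel(A, B, r2):
    # k(rho_i, rho_j) = exp(-||rho_i - rho_j||^2 / r^2); A, B have shape (n, n_indicators).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / r2)

def gp_posterior(rho_train, delta_train, rho_predict, sigma2, r2):
    # prior: delta(rho) ~ N(0, K(rho_train, rho_train) + sigma^2 I)
    K = sq_exp_kernel(rho_train, rho_train, r2) + sigma2 * np.eye(len(rho_train))
    Ks = sq_exp_kernel(rho_predict, rho_train, r2)
    Kss = sq_exp_kernel(rho_predict, rho_predict, r2)
    alpha = np.linalg.solve(K, delta_train)
    mean = Ks @ alpha                           # m(rho_predict)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)   # cov(rho_predict)
    return mean, cov

# e.g. mean, cov = gp_posterior(rho_tr, delta_tr, rho_new, sigma2=1e-4, r2=1.0)
```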

ROMES M. Drohmann and K. Carlberg 9 / 24

GP #2: Relevance vector machine [Tipping, 2001]

δ(ρ) = Σ_{m=1}^{M} wm φm(ρ) + ε

fixed basis functions φm (e.g., polynomials, radial-basis functions)

stochastic coefficients wm

noise ε ∼ N(0, σ²I)

prior: w ∼ N(0, diag(αi))

posterior: w ∼ N(m, Σ) leads to the posterior distribution of δ(ρ)

infer hyperparameters σ² and αi
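Tipping's RVM is not part of scikit-learn, but its underlying model (a linear combination of fixed basis functions with an automatic-relevance prior on the weights) can be sketched with `ARDRegression` on polynomial features; this is an approximation for illustration, not the talk's implementation, and the data below are hypothetical:

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical training data: error indicator rho and ROM error delta.
rho = np.logspace(-5, -4, 50).reshape(-1, 1)
delta = 2.0 * rho.ravel() * np.exp(0.05 * np.random.randn(50))

# Fixed basis functions phi_m: low-order polynomials in log(rho).
poly = PolynomialFeatures(degree=3, include_bias=True)
Phi = poly.fit_transform(np.log(rho))

# Automatic-relevance prior on the weights (the slide writes w ~ N(0, diag(alpha_i)));
# hyperparameters are inferred by evidence maximization, as in the RVM.
rvm_like = ARDRegression(fit_intercept=False)
rvm_like.fit(Phi, np.log(delta))

# Predictive mean and standard deviation of the (log) error for a new indicator value.
mean, std = rvm_like.predict(poly.transform(np.log([[3e-5]])), return_std=True)
```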

ROMES M. Drohmann and K. Carlberg 10 / 24

ROMES Algorithm

Offline

1 Populate the ROMES database {(δ(µ), ρ̄(µ)) : µ ∈ Dtrain}, where ρ̄ collects the candidate indicators.

2 Identify a few error indicators ρ ⊂ ρ̄ that lead to low variance in the Gaussian process.

3 Construct the Gaussian process δ(ρ) ∼ GP(m(ρ), k(ρ, ρ′)) by Bayesian inference.

Online (for any µ* ∈ D)

1 compute the ROM solution

2 compute the error indicators ρ(µ*)

3 obtain δ(ρ(µ*)) ∼ N(m(ρ(µ*)), k(ρ(µ*), ρ(µ*)))

4 correct the ROM solution
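A schematic sketch tying the two phases together (scalar output assumed; `solve_hifi`, `solve_rom`, and `residual_norm` are hypothetical problem-specific callbacks, and the GP choice mirrors the kernel variant above):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def romes_offline(mu_train, solve_hifi, solve_rom, residual_norm):
    """Offline: build the database of (indicator, error) pairs and fit the GP."""
    rho, delta = [], []
    for mu in mu_train:
        y_hf = solve_hifi(mu)                  # expensive; offline only
        y_rom = solve_rom(mu)
        rho.append(residual_norm(mu, y_rom))   # cheaply computable error indicator
        delta.append(y_hf - y_rom)             # true ROM output error (scalar)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(np.log(np.abs(rho)).reshape(-1, 1), np.asarray(delta))
    return gp

def romes_online(mu, gp, solve_rom, residual_norm):
    """Online: ROM solve, indicator, statistical error model; no high-fidelity solve."""
    y_rom = solve_rom(mu)
    rho = residual_norm(mu, y_rom)
    mean, std = gp.predict(np.log([[abs(rho)]]), return_std=True)
    return y_rom + mean[0], std[0]             # corrected output and its uncertainty
```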

ROMES M. Drohmann and K. Carlberg 11 / 24

Thermal block (Parametrically coercive and compliant, affine, linear, elliptic)

[Figure: unit square partitioned into subdomains 1–9, with Dirichlet boundary ΓD, homogeneous-Neumann boundary ΓN0, and unit-flux boundary ΓN1.]

∇·(c(x; µ)∇u(x; µ)) = 0 in Ω,    u(µ) = 0 on ΓD

c(µ)∇u(µ) · n = 0 on ΓN0,    c(µ)∇u(µ) · n = 1 on ΓN1

Inputs µ ∈ [0.1, 10]^9 define the diffusivity c in the subdomains

Output y(µ) = ∫_{ΓN1} u(µ) dx is compliant

ROM constructed via the RB–Greedy method [Patera and Rozza, 2006]

Consider two errors: 1) energy norm of the state-space error |||u(µ) − ured(µ)|||, and 2) output bias y(µ) − yred(µ)
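For reference, the affine parameter dependence implied by the slide title takes the standard thermal-block form (a sketch; the talk's exact parameterization may differ):

```latex
a(u, v; \mu) = \sum_{q=1}^{9} \mu_q \int_{\Omega_q} \nabla u \cdot \nabla v \, dx,
\qquad
f(v) = \int_{\Gamma_{N1}} v \, ds,
\qquad
y(\mu) = f(u(\mu)).
```

This affine structure is what makes the RB–Greedy construction and the residual-based error indicators cheap to evaluate online.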

ROMES M. Drohmann and K. Carlberg 12 / 24

Error #1: energy norm |||u(µ)− ured(µ)|||

[Figure: energy-norm error |||uh − ured||| vs. the residual r and the error bound ∆u (log-log scale); legend: (r; |||δu|||), (∆u; |||δu|||).]

Residual and error bound correlate with error

[Figures: ROMES Gaussian processes mapping the residual R to the energy-norm error |||uh − ured|||, panels (i) Kernel method and (ii) RVM; each shows training points, GP mean, 95% confidence interval, and the ‘uncertainty of the mean’.]

ROMES (Kernel, residual indicator) in log-log space is promising

ROMES M. Drohmann and K. Carlberg 13 / 24

Gaussian-process assumptions verified (Kernel, residual indicator)

[Figure: histograms of the deviation from the GP mean vs. the normalized rate of occurrence for training-sample sizes T = 10, 35, 65, 95, each compared with the inferred pdf.]

Observed estimates in predicted interval:

predicted   T = 10   T = 35   T = 65   T = 95
80 %         49.00    70.95    76.21    77.74
90 %         59.21    82.05    86.95    88.26
95 %         67.53    89.11    91.95    93.26
98 %         75.58    93.00    95.11    95.68
99 %         80.16    94.42    96.26    96.68

ROMES M. Drohmann and K. Carlberg 14 / 24

GP variables converge (Kernel GP, residual indicator)

[Figure: (i) convergence of the GP mean m, measured as ‖ν − νfine‖, and (ii) convergence of the variance σ², both vs. the size of the training sample.]

ROMES M. Drohmann and K. Carlberg 15 / 24

Can achieve ‘statistical rigor’ (Kernel GP, residual indicator)

[Figures: effectivity (1 = good) of the 50%-rigorous and 90%-rigorous ROMES estimates vs. the size of the training sample, compared with the rigorous bound, together with the frequency of non-rigorous estimations; statistics shown: mean ± std, median, minimum, maximum.]

ROMES M. Drohmann and K. Carlberg 16 / 24

ROMES: Kernel and RVM GP comparison

[Figures: energy-norm error |||uh − ured||| vs. the residual R for (i) the Kernel method and (ii) the RVM; training points, GP mean, 95% confidence interval, and ‘uncertainty of the mean’.]

RVM (Legendre-polynomial basis functions): significant uncertainty in the mean’s high-order polynomial coefficients

ROMES M. Drohmann and K. Carlberg 17 / 24

ROMES: Kernel and RVM GP comparison

[Figure: histograms of the deviation from the GP mean vs. the normalized rate of occurrence for (i) the Kernel method and (ii) the RVM, each compared with the inferred pdf.]

Observed estimates in the c-confidence interval of a normal distribution with variance σ² and with variance σ² + diag(Σ):

           variance σ²         variance σ² + diag(Σ)
c          (i)      (ii)       (i)       (ii)
80 %       77.7     77.9       84.32      92.74
90 %       88.1     88.7       92.53      97.79
95 %       92.9     94.7       95.11      99.74
98 %       95.6     97.4       97.11     100.00
99 %       96.6     99.2       98.53     100.00

GP structure and confidence intervals verified in both cases

[Figures: effectivity (1 = good) vs. the size of the training sample for (i) the Kernel-based GP and (ii) the RVM-based GP; statistics shown: mean ± std, median, minimum, maximum.]

Kernel GP produces better effectivity

ROMES M. Drohmann and K. Carlberg 18 / 24

Error #2: output bias y(µ)− yred(µ)

[Figures: (i) error indicators (residual r, error bound ∆s) mapped to the output bias sred − sh, and (ii) parameters (µ1, µ2) mapped to the output bias; legend: (r; δs), (∆s; δs), (µ1; δs(µ)), (µ2; δs(µ)).]

+ Residual and error bound are good indicators (ROMES)

- Inputs are poor indicators (Multifidelity correction)

ROMES M. Drohmann and K. Carlberg 19 / 24

Gaussian-process assumption verification

[Figures: frequency of validations falling within the 80%/90%/95%/98%/99% confidence intervals vs. the size of the training sample, for the residual-based GP and the parameter-based GP.]

+ ROMES (Kernel GP, residual indicator): confidence intervals converge.

- Multifidelity correction (Kernel GP, input indicator): confidence intervals do not converge.

ROMES M. Drohmann and K. Carlberg 20 / 24

Bias improvement

bias reduction = E(yred(µ) + δy(µ) − y(µ)) / (yred(µ) − y(µ))
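Reading E as the expectation over the ROMES error model, the numerator reduces to the GP-mean-corrected error, which suggests the following tiny sketch for evaluating the metric on a validation set (array names are hypothetical):

```python
import numpy as np

def bias_reduction(y_hf, y_rom, delta_mean):
    # Per-input ratio of the GP-mean-corrected error to the uncorrected error;
    # values well below 1 mean the correction removes most of the bias.
    return (y_rom + delta_mean - y_hf) / (y_rom - y_hf)

# e.g. report np.mean, np.median, np.min, np.max of bias_reduction(...) over validation inputs
```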

[Figures: bias reduction (smaller = better) vs. the size of the training sample for (i) the residual-based GP and (ii) the parameter-based GP; statistics shown: mean ± std, median, minimum, maximum.]

+ ROMES reduces bias by roughly an order of magnitude

- Multifidelity correction often increases the bias

ROMES M. Drohmann and K. Carlberg 21 / 24

Conclusions

ROMES
- combines existing ROM error indicators with supervised machine learning to statistically quantify ROM error
- relies on identifying error indicators that yield low variance
- ‘statistical rigor’ is achievable
- outperforms multifidelity correction (inputs are poor error indicators)
- highlights a strength of reduced-order models for data assimilation: other surrogates (likely) do not have such powerful error indicators

Future work
- apply to nonlinear, time-dependent problems
- incorporate in the likelihood function (a sketch follows this list):

  y = yred(µ*) + δy(µ*) + ε

  where δy and ε may have different distributions
- develop error indicators for this purpose
- automated selection of indicators and the Gaussian process
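One concrete way to carry this out is to fold the ROMES distribution into the likelihood; this is a sketch assuming the ROM-error model is Gaussian and independent of the measurement noise:

```latex
\delta_y(\mu) \sim \mathcal{N}\bigl(m(\rho(\mu)),\, \Sigma_\delta(\rho(\mu))\bigr)
\;\;\Longrightarrow\;\;
P(y \mid \mu) \sim \mathcal{N}\bigl(y^{\mathrm{red}}(\mu) + m(\rho(\mu)),\;
\sigma^2 I + \Sigma_\delta(\rho(\mu))\bigr)
```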

ROMES M. Drohmann and K. Carlberg 22 / 24

Acknowledgments

Khachik Sargsyan: helpful discussions

This research was supported in part by an appointment to the Sandia National Laboratories Truman Fellowship in National Security Science and Engineering, sponsored by Sandia Corporation (a wholly owned subsidiary of Lockheed Martin Corporation) as Operator of Sandia National Laboratories under its U.S. Department of Energy Contract No. DE-AC04-94AL85000.

ROMES M. Drohmann and K. Carlberg 23 / 24

Questions?

M. Drohmann & K. Carlberg, “The ROMES method for reduced-order-model uncertainty quantification,” in preparation.

[Backup figures: repeat of the ‘statistical rigor’ plots — effectivity (1 = good) of the 50%- and 90%-rigorous estimates and the frequency of non-rigorous estimations vs. the size of the training sample; statistics shown: mean ± std, median, minimum, maximum.]

ROMES M. Drohmann and K. Carlberg 24 / 24

Bibliography I

Eldred, M. S., Giunta, A. A., Collis, S. S., Alexandrov, N. A., and Lewis, R. M. (2004). Second-order corrections for surrogate-based optimization with model hierarchies. In Proceedings of the 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, NY, AIAA Paper 2004-4457.

Galbally, D., Fidkowski, K., Willcox, K., and Ghattas, O. (2009). Non-linear model reduction for uncertainty quantification in large-scale inverse problems. International Journal for Numerical Methods in Engineering, 81(12):1581–1608.

ROMES M. Drohmann and K. Carlberg 25 / 24

Bibliography II

Huynh, D. B. P., Knezevic, D. J., Chen, Y., Hesthaven, J. S., and Patera, A. T. (2010). A natural-norm successive constraint method for inf-sup lower bounds. Comput. Methods Appl. Mech. Engrg., 199:1963–1975.

Ng, L. and Eldred, M. S. (2012). Multifidelity uncertainty quantification using non-intrusive polynomial chaos and stochastic collocation. In AIAA 2012-1852.

Patera, A. T. and Rozza, G. (2006). Reduced basis approximation and a posteriori error estimation for parametrized partial differential equations. MIT.

ROMES M. Drohmann and K. Carlberg 26 / 24

Bibliography III

Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning. MIT Press.

Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, 1:211–244.

Wirtz, D., Sorensen, D. C., and Haasdonk, B. (2012). A-posteriori error estimation for DEIM reduced nonlinear dynamical systems. Preprint Series, Stuttgart Research Centre for Simulation Technology.

ROMES M. Drohmann and K. Carlberg 27 / 24

Bibliography IV

Yano, M., Patera, A. T., and Urban, K. (2012). A space-time certified reduced-basis method for Burgers’ equation. Math. Mod. Meth. Appl. S., submitted.

ROMES M. Drohmann and K. Carlberg 28 / 24