Benchmarking robust regression techniques for global energy confinement scaling in tokamaks

Geert Verdoolaege

Department of Applied Physics, Ghent University, Ghent, Belgium

Laboratory for Plasma Physics, Royal Military Academy (LPP-ERM/KMS), Brussels, Belgium

IAEA TM Fusion Data Processing, Validation and Analysis, May 30, 2017

Overview

1 Motivation

2 Geodesic least squares regression (GLS)

3 Energy confinement scaling

4 Conclusion

Regression analysis

Parametric dependencies

Validation, prediction

Ordinary least squares

Uncertainties:

All variables ('x' and 'y')

Heterogeneous data, outliers

Model: deterministic + stochastic component

$y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$

Collinearity: regularization

Power scaling laws: astronomy, biology, geology, finance, ...
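As a minimal sketch of the ordinary least squares model y = β0 + β1 x + ε, the following fits a straight line in closed form and then shows how a single gross outlier shifts the estimate. The data are synthetic and the function name `ols_fit` is hypothetical; nothing here is taken from the talk's own analysis.

```python
import random

def ols_fit(x, y):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Synthetic data (hypothetical values, for illustration only):
# true line y = 1 + 2x with Gaussian noise on y only.
rng = random.Random(0)
x = [0.1 * i for i in range(50)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

b0, b1 = ols_fit(x, y)

# Contaminate the last point with one gross outlier and refit.
y_out = list(y)
y_out[-1] += 20.0
b0_out, b1_out = ols_fit(x, y_out)

print(f"clean data:   b0={b0:.3f}, b1={b1:.3f}")
print(f"with outlier: b0={b0_out:.3f}, b1={b1_out:.3f}")
```

Because the outlier sits at the largest x, it pulls the fitted slope upward, which is the kind of sensitivity that motivates robust alternatives.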

Robust regression analysis

Need a robust, general-purpose regression technique that is easy to apply.

Two measurements

Zooming in...

Example 1: electron density

Example 1: electron density distribution

Example 2: inter-ELM time

Example 2: inter-ELM time distribution

Difference/distance between measurements

Euclidean distance

Which distance?

A point and a distribution

Sum of squares

Mahalanobis distance

$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right]$ → maximum likelihood

$\mu_i = f_i(x_i, \theta)$, e.g. $\mu_i = \beta_0 + \beta_1 x_i$
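The link between the Gaussian likelihood, the Mahalanobis distance and the sum of squares can be checked numerically. The sketch below assumes a fixed, known σ and a straight-line mean; the data values and function names are hypothetical. It shows that the negative log-likelihood differs from half the sum of squared one-dimensional Mahalanobis distances only by a constant, so maximizing the likelihood amounts to minimizing the sum of squares.

```python
import math

def mahalanobis_1d(y, mu, sigma):
    """One-dimensional Mahalanobis distance between a point and N(mu, sigma^2)."""
    return abs(y - mu) / sigma

def neg_log_likelihood(y, mu, sigma):
    """Negative log-likelihood of independent Gaussian observations."""
    const = len(y) * math.log(math.sqrt(2.0 * math.pi) * sigma)
    half_sq = 0.5 * sum(mahalanobis_1d(yi, mi, sigma) ** 2 for yi, mi in zip(y, mu))
    return const + half_sq

# Hypothetical data and noise level, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
sigma = 0.5

def nll_line(b0, b1):
    return neg_log_likelihood(y, [b0 + b1 * xi for xi in x], sigma)

def sum_sq(b0, b1):
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# The constant cancels when two parameter choices are compared:
d_nll = nll_line(1.0, 2.0) - nll_line(0.0, 1.0)
d_ss = (sum_sq(1.0, 2.0) - sum_sq(0.0, 1.0)) / (2.0 * sigma ** 2)
print(d_nll, d_ss)  # the two differences agree up to rounding
```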

Telling cats from dogs

Rao geodesic distance

Information geometry

The Gaussian probability space

Pseudosphere model

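GLS with linear model

Modeled distribution:

$\frac{1}{\sqrt{2\pi\left(\sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2\right)}} \exp\left\{-\frac{1}{2}\,\frac{\left[y - \left(\beta_0 + \sum_{j=1}^{m} \beta_j x_{ij}\right)\right]^2}{\sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2}\right\}$

Observed distribution:

$\frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{obs}}} \exp\left[-\frac{1}{2}\,\frac{(y - y_i)^2}{\sigma_{\mathrm{obs}}^2}\right]$

Rao GD: distance between the modeled and the observed distribution

To be estimated: $\sigma_{\mathrm{obs}}, \beta_0, \beta_1, \ldots, \beta_m$

iid data: minimize sum of squared GDs ⟹ geodesic least squares (GLS) regression

If $\sigma_{\mathrm{mod}} = \sigma_{\mathrm{obs}}$ ⇒ Mahalanobis distance

G. Verdoolaege et al., Nucl. Fusion 55, 113019, 2015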
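A minimal sketch of the GLS objective for a single predictor follows. It uses the standard closed-form Rao geodesic distance between two univariate Gaussians, GD = 2√2 artanh(δ) with δ² = [(μ1 − μ2)² + 2(σ1 − σ2)²] / [(μ1 − μ2)² + 2(σ1 + σ2)²]. Reading σmod² = σy² + β1² σx² off the modeled distribution is an interpretation of the slide. The function names and the data are hypothetical, and the minimization itself is left to any general-purpose optimizer.

```python
import math

def rao_gd(mu1, sigma1, mu2, sigma2):
    """Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    dmu2 = (mu1 - mu2) ** 2
    delta = math.sqrt((dmu2 + 2.0 * (sigma1 - sigma2) ** 2)
                      / (dmu2 + 2.0 * (sigma1 + sigma2) ** 2))
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)

def gls_objective(params, x, y, sigma_x, sigma_y):
    """Sum of squared Rao GDs for a linear model with one predictor.

    params = (beta0, beta1, sigma_obs); sigma_x and sigma_y are the
    (assumed known) measurement standard deviations.
    """
    beta0, beta1, sigma_obs = params
    sigma_mod = math.sqrt(sigma_y ** 2 + beta1 ** 2 * sigma_x ** 2)
    total = 0.0
    for xi, yi in zip(x, y):
        mu_mod = beta0 + beta1 * xi  # mean of the modeled distribution
        total += rao_gd(yi, sigma_obs, mu_mod, sigma_mod) ** 2
    return total

# Hypothetical data and error bars, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.2, 9.0]
good = gls_objective((1.0, 2.0, 0.3), x, y, sigma_x=0.1, sigma_y=0.2)
bad = gls_objective((0.0, 1.0, 0.3), x, y, sigma_x=0.1, sigma_y=0.2)
print(good, bad)  # the objective is smaller near the line that generated y
```

For equal standard deviations and nearby means, `rao_gd` reduces approximately to |μ1 − μ2| / σ, consistent with the Mahalanobis limit noted on the slide.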

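Global confinement scaling

Engineering parameters:

$\tau_{E,\mathrm{th}} = \beta_0\, I_p^{\beta_I}\, B_t^{\beta_B}\, \bar{n}_e^{\beta_n}\, P_l^{\beta_P}\, R^{\beta_R}\, \kappa^{\beta_\kappa}\, \varepsilon^{\beta_\varepsilon}\, M_{\mathrm{eff}}^{\beta_M}$

Dimensionless variables:

$\omega_{ci} \tau_{E,\mathrm{th}} = \alpha_0\, \rho_*^{\alpha_\rho}\, \beta_t^{\alpha_\beta}\, \nu_*^{\alpha_\nu}\, q_{95}^{\alpha_q}\, \kappa^{\alpha_\kappa}\, \varepsilon^{\alpha_\varepsilon}\, M_{\mathrm{eff}}^{\alpha_M}$

ITPA global H-mode database: 1296 measurements from 9 tokamaks

IPB98(y,2):

$\tau_{E,\mathrm{th}} \propto I_p^{0.93}\, B_t^{0.15}\, \bar{n}_e^{0.41}\, P_l^{-0.69}\, R^{1.97}\, \kappa^{0.78}\, \varepsilon^{0.58}\, M_{\mathrm{eff}}^{0.19}$

$\omega_{ci} \tau_{E,\mathrm{th}} \propto \rho_*^{-2.70}\, \beta_t^{-0.90}\, \nu_*^{-0.01}\, q_{95}^{-3.0}\, \kappa^{3.3}\, \varepsilon^{0.73}\, M_{\mathrm{eff}}^{0.96}$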
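To make the power-law form concrete, the sketch below evaluates a scaling of the engineering-parameter type and its logarithmic counterpart, which is linear in the exponents and therefore suitable for log-linear regression. The exponents are the IPB98(y,2) values quoted above, and the prefactor 0.056 is the β0 listed for IPB98 in the regression results. The operating point is hypothetical and chosen only to exercise the functions; the dictionary keys and function names are likewise illustrative.

```python
import math

# IPB98(y,2) exponents as quoted in the slides; beta0 from the results table.
IPB98 = {"beta0": 0.056, "I": 0.93, "B": 0.15, "n": 0.41, "P": -0.69,
         "R": 1.97, "kappa": 0.78, "eps": 0.58, "M": 0.19}

def tau_power_law(coef, values):
    """Evaluate tau = beta0 * prod_k values[k] ** coef[k]."""
    tau = coef["beta0"]
    for name, v in values.items():
        tau *= v ** coef[name]
    return tau

def log_tau_linear(coef, values):
    """Same scaling after taking logs: ln(tau) = ln(beta0) + sum_k beta_k * ln(v_k)."""
    return math.log(coef["beta0"]) + sum(
        coef[name] * math.log(v) for name, v in values.items())

# Hypothetical operating point (not a real device), in the scaling's own units.
point = {"I": 2.0, "B": 2.5, "n": 5.0, "P": 10.0,
         "R": 3.0, "kappa": 1.6, "eps": 0.3, "M": 2.0}

tau = tau_power_law(IPB98, point)
print(tau, math.exp(log_tau_linear(IPB98, point)))  # both routes agree
```

The log-linear and nonlinear fits compared in the talk differ in which of these two forms the noise model is attached to, so they need not give the same exponents on real data.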

Issues with IPB98

ITER-relevance

Unconfirmed predictions

New predictor variables

Not robust:

Heterogeneous data

Outliers

Log-linear vs. nonlinear

Methodology

Proportional error bars

Unconstrained

100 bootstrap samples:

Average

95% confidence interval

Benchmarking:

Ordinary least squares (OLS)

Iteratively reweighted least squares (ROB)

Bayesian: uninformative priors, marginalized σ (BAY)

Kullback-Leibler least squares (KLD)

Geodesic least squares (GLS)
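The bootstrap step can be sketched as follows: resample the data with replacement 100 times, refit each time, then report the average and a 95% interval of the estimates. The slides do not say how the interval was computed, so the simple percentile interval used here is an assumption, as are the synthetic data, the OLS slope standing in for the fitted quantity, and all function names.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of the ordinary least squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def bootstrap_slope(x, y, n_boot=100, seed=0):
    """Average and crude 95% percentile interval of the slope over resamples."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    lo = slopes[int(0.025 * n_boot)]        # approx. 2.5th percentile
    hi = slopes[int(0.975 * n_boot) - 1]    # approx. 97.5th percentile
    return statistics.mean(slopes), (lo, hi)

# Synthetic data (hypothetical): y = 1 + 2x plus Gaussian noise.
data_rng = random.Random(1)
x = [0.1 * i for i in range(40)]
y = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 0.2) for xi in x]

mean_slope, (ci_lo, ci_hi) = bootstrap_slope(x, y)
print(f"slope = {mean_slope:.3f}, 95% interval = [{ci_lo:.3f}, {ci_hi:.3f}]")
```

The same loop applies to any of the benchmarked estimators by swapping `ols_slope` for the corresponding fit.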

Regression results

Method   β0     βI     βB     βn     βP     βR     βκ     βε     βM     τ̂E,th (s)
IPB98    0.056  0.93   0.15   0.41   −0.69  1.97   0.78   0.58   0.19   4.9
OLS ll.  0.049  0.78   0.32   0.44   −0.67  2.24   0.39   0.58   0.18   4.3 ± 0.25
OLS nl.  0.058  0.67   0.50   0.47   −0.83  2.60   1.0    0.86   −0.26  3.5 ± 0.33
ROB      0.046  0.77   0.32   0.45   −0.66  2.26   0.33   0.57   0.24   4.4 ± 0.24
BAY      0.051  0.87   0.13   0.47   −0.67  2.13   0.17   0.49   0.23   4.3
KLD ll.  0.056  0.61   0.49   0.46   −0.81  2.53   0.93   1.0    0.18   3.2 ± 0.29
KLD nl.  0.053  0.60   0.49   0.49   −0.81  2.57   0.94   1.0    0.18   3.3 ± 0.37
GLS ll.  0.048  0.65   0.44   0.49   −0.76  2.52   0.63   0.87   0.27   4.0 ± 0.23
GLS nl.  0.047  0.65   0.44   0.50   −0.75  2.52   0.62   0.85   0.22   4.1 ± 0.25

ll. = log-linear, nl. = nonlinear

Robustness w.r.t. error bars

J.G. Cordey et al., Nucl. Fusion 45, 1078, 2005

Interpretation on pseudosphere: JET data

Conclusions

Geodesic least squares regression: flexible and robust

Easy to use, fast optimization

Works for linear and nonlinear relations and any distribution model

Revisit established scaling laws, contribute to new regression analyses

Robust estimation of confinement scaling

Comparing probability distributions:

Quantification of stochasticity

Model validation
