
Method-independent, Computationally Frugal Convergence Testing for Sensitivity Analyses
J. Mai¹ and B. Tolson¹

¹ University of Waterloo, Waterloo, Canada. Contact: [email protected]

1. Introduction
The increasing complexity and runtime of environmental models lead to the current situation that the calibration of all model variables, or the estimation of all of their uncertainties, is often computationally infeasible. Hence, techniques to determine the sensitivity of model variables are used to identify the most important variables or model processes. While the examination of the convergence of calibration and uncertainty methods is state-of-the-art, the convergence of the sensitivity methods is usually not checked. If it is checked at all, bootstrapping of the sensitivity results is used to determine the reliability of the estimated indexes. Bootstrapping, however, requires non-negligible implementation effort and can also become computationally expensive in case of large model outputs and a high number of bootstraps. It also does not perform well for small sample sizes. We therefore present a Model Variable Augmentation (MVA) approach to check the convergence of sensitivity indexes without performing any additional model run. This technique is method- and model-independent. It can be applied either during the sensitivity analysis (SA) or afterwards.

The method is tested using the variance-based, global method of Sobol' sensitivity indexes. Different numbers of reference variable sets N_S were used, i.e. 10, 100, and 1000. To compare the results of MVA with standard approaches, the sensitivity indexes were also bootstrapped; the number of bootstrap samples was set to N_B = 1000. We employed 12 benchmark functions with different numbers of variables N_X, already presented in several studies such as Cuntz et al. (2016), to demonstrate the reliability of MVA.
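To make this setup concrete, below is a minimal sketch of how such an experiment can be wired up with NumPy, using the Jansen estimator for the total-order Sobol' indexes ST_i and row-wise resampling for the bootstrap. The poster does not state which estimator was used, so the estimator choice and the names (model, sobol_total, bootstrap_total) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sobol_total(model, n_s, n_x, seed=0):
    """Total-order Sobol' indexes ST_i from n_s reference variable sets,
    using the Jansen estimator on an A/A_B design (sketch only)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n_s, n_x))          # reference sets A
    b = rng.uniform(size=(n_s, n_x))          # independent sample B
    f_a = model(a)
    f_ab = np.empty((n_s, n_x))
    for i in range(n_x):
        a_b = a.copy()
        a_b[:, i] = b[:, i]                   # A with column i replaced from B
        f_ab[:, i] = model(a_b)
    st = np.mean((f_a[:, None] - f_ab) ** 2, axis=0) / (2.0 * np.var(f_a, ddof=1))
    return st, f_a, f_ab

def bootstrap_total(f_a, f_ab, n_b=1000, seed=1):
    """Bootstrap distribution of ST_i by resampling the reference sets
    (rows) with replacement, reusing the stored model outputs."""
    rng = np.random.default_rng(seed)
    n_s = f_a.size
    st_boot = np.empty((n_b, f_ab.shape[1]))
    for k in range(n_b):
        idx = rng.integers(0, n_s, size=n_s)
        st_boot[k] = (np.mean((f_a[idx, None] - f_ab[idx]) ** 2, axis=0)
                      / (2.0 * np.var(f_a[idx], ddof=1)))
    return st_boot
```

Running this with n_s = 10, 100, 1000 and n_b = 1000 corresponds to the N_S and N_B values described above.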

                                                      variables x_i with importance
f(x)              N_X   Distr.       µ_f     σ²_f      low    interm.    high
Sobol's G           6   U[0, 1]      1.0      0.2       4        1         1
Saltelli's G*_1    10   U[0, 1]      1.0      0.8       8        0         2
Saltelli's G*_2    10   U[0, 1]      1.0      3.0       6        4         0
Saltelli's G*_3    10   U[0, 1]      1.0      0.3       8        0         2
Saltelli's G*_4    10   U[0, 1]      1.0      0.7       8        2         0
Saltelli's G*_5    10   U[0, 1]      1.0      2.5       8        0         2
Saltelli's G*_6    10   U[0, 1]      1.0     20.0       4        6         0
Bratley's K        10   U[0, 1]     -0.3      0.1       8        1         1
Saltelli's B       10   N[0, σ]      0.0      2.0       6        4         0
Ishigami-H.         3   U[-π, π]     1.0   2100.0       1        0         2
Oakley-O'H.        15   N[0, 1]     11.0     37.0      15        0         0
Morris             20   U[0, 1]     30.0   1100.0      10        5         5
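As a concrete example, two of the benchmark functions from the table above are sketched below. The coefficients (a = 7, b = 0.1 for Ishigami-Homma; the a_i for Sobol's G) are the values most commonly used in the literature and are assumptions here; the exact constants behind the tabulated means and variances are not given on the poster.

```python
import numpy as np

def ishigami(x, a=7.0, b=0.1):
    """Ishigami-Homma function; x has shape (n, 3) with x_i ~ U[-pi, pi]."""
    return (np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2
            + b * x[:, 2] ** 4 * np.sin(x[:, 0]))

def sobol_g(x, a=(0.0, 1.0, 4.5, 9.0, 99.0, 99.0)):
    """Sobol's G function; x has shape (n, 6) with x_i ~ U[0, 1].
    A small a_i makes x_i important, a large a_i makes it unimportant."""
    a = np.asarray(a)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)
```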

2. Test functions & Experimental Setup

• augment the true model with artificial model variables z0, z1, and z2 with known properties

• the original model output f(x) is converted into y(x, z, c), where c is a correction factor such that σ²_f ≡ σ²_y (see the sketch below)

• z0 is a dummy variable to check the correctness of the sensitivity method itself (S_z0 = 0)

• z1 and z2 are variables whose variance is controlled by a parameter Δ to check for sampling uncertainty (S_z1 = S_z2)
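The poster does not spell out the functional form of y(x, z, c), so the sketch below uses one plausible multiplicative construction that has the stated properties: z0 never enters the output (S_z0 = 0), z1 and z2 enter symmetrically with a strength set by Δ (S_z1 = S_z2), and c rescales the output so that its variance matches that of f(x). Treat it as an illustration of the idea, not as the authors' exact transformation.

```python
import numpy as np

def make_augmented_model(model, n_x, delta):
    """Assumed MVA-style augmentation (illustrative form only).
    The augmented model expects n_x + 3 input columns: the original
    variables followed by z0, z1, z2 ~ U[0, 1]."""
    def augmented(xz, c=1.0):
        x = xz[:, :n_x]
        z1, z2 = xz[:, n_x + 1], xz[:, n_x + 2]   # column n_x is the dummy z0
        return (c * model(x)
                * (1.0 + delta * (z1 - 0.5))
                * (1.0 + delta * (z2 - 0.5)))
    return augmented

# The correction factor c can be estimated from the same samples,
# e.g. c = sqrt(var(f(x)) / var(y with c = 1)), so that sigma^2_y = sigma^2_f.
```

Because y depends on x only through f(x), evaluating y at points where only z0, z1, or z2 change reuses already computed values of f(x), which is why the augmentation does not require additional runs of the original model.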

How bootstrapping and MVA address convergence testing, screening, and ranking:

Convergence
• Bootstrapping: analyze the mean µ^(B) and variance σ^(B) of the bootstrapped distribution D^(B) of the sensitivity indexes; convergence if the relative error is below δ_C: σ^(B)_i / µ^(B)_i < δ_C ∀ x_i.
• MVA: determine the number of variables with a sensitivity between the sensitivities of the augmented variables z1 and z2; no evidence of non-convergence if ∄ x_i : S_z1 < S_x_i < S_z2.

Screening
• Bootstrapping: identify variables above a certain threshold δ_B as important: S_i > δ_B.
• MVA: identify variables above a certain threshold δ_M as important: S_i > δ_M = δ_B − |S_z1 − S_z2|.

Ranking
• Bootstrapping: Kolmogorov-Smirnov test to check whether the bootstrap distributions of the sensitivity indexes of two variables x_i and x_j are significantly different: H_0 : D^(B)_S_i = D^(B)_S_j.
• MVA: check whether the difference between the indexes of two variables, S_i and S_j, is smaller than the difference for the augmented variables z1 and z2; the variables are indistinguishable if |S_i − S_j| ≤ |S_z1 − S_z2|.
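The MVA column of this comparison reduces to a few array operations once the sensitivity indexes have been estimated. A minimal sketch that directly implements the three criteria above (the function name and the use of NumPy are illustrative assumptions):

```python
import numpy as np

def mva_checks(s_x, s_z1, s_z2, delta_b):
    """Apply the three MVA checks to estimated sensitivity indexes s_x
    (one value per original variable) given the indexes of z1 and z2."""
    s_x = np.asarray(s_x, dtype=float)
    lo, hi = min(s_z1, s_z2), max(s_z1, s_z2)
    # Convergence: variables enveloped by z1 and z2 indicate non-convergence.
    enveloped = np.flatnonzero((s_x > lo) & (s_x < hi))
    no_evidence_of_nonconvergence = enveloped.size == 0
    # Screening: tighten the usual threshold delta_B by the z1-z2 spread.
    delta_m = delta_b - abs(s_z1 - s_z2)
    informative = np.flatnonzero(s_x > delta_m)
    # Ranking: pairs closer together than |S_z1 - S_z2| are indistinguishable.
    indistinguishable = np.abs(s_x[:, None] - s_x[None, :]) <= abs(s_z1 - s_z2)
    return no_evidence_of_nonconvergence, enveloped, informative, indistinguishable
```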

3. Model Variable Augmentation (MVA)


Fig. 1: Results of a Sobol' sensitivity analysis for the Oakley-O'Hagan test function using 10 reference variable sets and a confidence threshold Δ of 0.2. (A) shows the Sobol' sensitivity indexes with and without MVA as well as the true indexes. (B) shows the individual indexes of all 1000 bootstraps and (C) the ranking of the variables. (D) shows how many variables are enveloped by the augmented variables (red) and hence are not converged.


Fig. 2: Results of a Sobol' sensitivity analysis for the Oakley-O'Hagan test function using 100 reference variable sets and a confidence threshold Δ of 0.2. The individual subplots are the same as in Figure 1.


Fig. 3: Results of a Sobol' sensitivity analysis for the Oakley-O'Hagan test function using 10, 100, and 1000 reference variable sets (N_S) and confidence thresholds Δ of 0.1 and 0.2. The number of variables enveloped by the augmented variables z1 and z2 is a surrogate for non-convergence.


Fig. 4: Testing whether variables are significantly different. In the case of bootstrapping, the Kolmogorov-Smirnov test is applied (B, C); low p-values indicate distinguishable variables. In the case of MVA, variables whose difference is smaller than the difference between the augmented variables are identified as indistinguishable. Plots D and E show the probability that a pair of variables is distinguishable.
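For the bootstrapping branch shown in panels B and C, the pairwise test can be done with SciPy's two-sample Kolmogorov-Smirnov test. A sketch, assuming a bootstrap matrix st_boot of shape (N_B, N_X) as produced in the estimation sketch further above:

```python
import numpy as np
from scipy.stats import ks_2samp

def pairwise_ks_pvalues(st_boot):
    """p-values of the two-sample KS test between the bootstrap distributions
    of the sensitivity indexes; low p-values mean distinguishable variables."""
    n_x = st_boot.shape[1]
    p = np.ones((n_x, n_x))
    for i in range(n_x):
        for j in range(i + 1, n_x):
            p[i, j] = p[j, i] = ks_2samp(st_boot[:, i], st_boot[:, j]).pvalue
    return p
```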


Fig. 5: The ratio of correctly determined informative variables using different confidence thresholds Δ and different numbers of reference sets N_S used to estimate the Sobol' sensitivities. The variable augmentation (MVA) increases the number of correctly identified informative variables in all experiments (compare squares and circles).

4. Results

• MVA is computationally less expensive than bootstrapping, since it is computed automatically during the sensitivity estimation

• MVA indicates the reliability of sensitivity estimates (Figs. 1 & 2)

• MVA indicates non-converged sensitivity estimates (Fig. 3)

• MVA identifies indistinguishable variables (Fig. 4)

• MVA identifies important variables more reliably than the standard fixed-threshold method (Fig. 5)

5. Conclusions