Materials for Lecture 11

• Chapters 3 and 6
• Chapter 16, Sections 4.0 and 5.0
• Lecture 11 Pseudo Random LHC.xls
• Lecture 11 Validation Tests.xls
• The next 4 slides were added because right about now most students are confused about PDF parameters and what functions to use

Parameter Estimation

• Parameters for a distribution define the shape and position on the number scale
– Uniform(Min, Max)
– Norm(Mean, Std Dev)
– Mean (Ỹ or Ῡ) and risk as Empirical(Si, P(Si))
• Shape can be skewed right or left, can be tall or squatty (kurtosis)
• Parameters reflect the amount of variability in the stochastic variable
• Must validate random variables against their parameters
• We use the parameters to simulate the distributions
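As a minimal sketch of what these parameters look like once they are estimated from data, the Python below computes them for a small made-up sample. The data values and the function name `estimate_parameters` are hypothetical, and the evenly spaced cumulative probabilities for the Empirical distribution are an assumed convention, not necessarily the one Simetar uses.

```python
import statistics

def estimate_parameters(data):
    """Return parameters for three candidate distributions fit to data."""
    n = len(data)
    sorted_vals = sorted(data)
    return {
        # Uniform(Min, Max): the observed range
        "uniform": (sorted_vals[0], sorted_vals[-1]),
        # Norm(Mean, Std Dev): sample mean and sample standard deviation
        "normal": (statistics.mean(data), statistics.stdev(data)),
        # Empirical(S_i, P(S_i)): sorted values with cumulative probabilities
        # spaced evenly from 0 to 1 (an assumed convention)
        "empirical": (sorted_vals, [i / (n - 1) for i in range(n)]),
    }

# Hypothetical observations, for illustration only
history = [12.0, 9.5, 11.2, 8.1, 10.4, 13.3, 9.9, 10.8]
params = estimate_parameters(history)
print("Uniform(Min, Max):   ", params["uniform"])
print("Norm(Mean, Std Dev): ", params["normal"])
print("Empirical S_i:       ", params["empirical"][0])
```

The same data yields three different parameter sets, which is why the later slides compare the simulated distributions before picking one.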

Same Mean Different Std Dev

[Figure: two charts comparing Variable 1 and Variable 2, which share the same mean but have different standard deviations. One chart shows the PDF approximations; the other shows the CDFs, with Prob on the vertical axis.]

Review Steps for Parameter Estimation

• Step 1: Check for presence of a trend, cycle or structural pattern
– If trend or structural model, work with the residuals (ẽt)
– If no trend, use actual data (X’s)
• Step 2: Estimate parameters for several assumed distributions using the X’s or the residuals (ẽt)
• Step 3: Simulate the different distributions
• Step 4: Pick the best match based on
– Mean and variability (use validation tests)
– Minimum and Maximum
– Shape of the CDF vs. historical series
– Penalty function CDFDEV() to quantify differences
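Step 1 can be sketched as follows, assuming a simple linear trend fit by ordinary least squares. The series and the function name `linear_trend_residuals` are hypothetical. Whether the slope is statistically significant still has to be judged separately before deciding to work with the residuals.

```python
import statistics

def linear_trend_residuals(y):
    """Fit y_t = a + b*t by ordinary least squares; return (a, b, residuals)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = statistics.mean(t)
    y_bar = statistics.mean(y)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    a = y_bar - b * t_bar
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, residuals

# Hypothetical series with an upward trend, for illustration only
series = [10.0, 12.5, 13.1, 16.2, 17.0, 19.8, 21.1, 23.5]
a, b, e = linear_trend_residuals(series)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print("residual std dev:", round(statistics.stdev(e), 3))
```

The residuals `e` are what Step 2 would use to estimate the distribution parameters when a trend is present.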

Univariate Parameter Estimation

• When do you use UPES?
• When there is no trend in the data
• When you want to use the historical mean as your forecasted y-hat
• Test an unknown random variable for its shape
• Or use residuals

Univariate Parameter Estimation

• Empirical distribution fits your data best because it lets the data define the shape
• Prefer to use the EMP with deviations as a percent or fraction from Y-hat
• If there is a trend, then account for it with deviations from trend
• Else use deviations from mean
• EMP allows us to model low probability events
• Test with =CDFDEV(original data, sim data)
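A minimal sketch of an empirical distribution built from fractional deviations from the mean is shown below. The data, the function names, the evenly spaced probabilities, and the linear interpolation between sorted deviates are all assumptions made for illustration; Simetar's EMP function is not called and may differ in detail.

```python
import random
import statistics

def empirical_parameters(history):
    """Sorted fractional deviations from the mean and their cumulative probabilities."""
    y_hat = statistics.mean(history)
    s = sorted((x - y_hat) / y_hat for x in history)
    n = len(s)
    p = [i / (n - 1) for i in range(n)]  # assumed: evenly spaced from 0 to 1
    return y_hat, s, p

def simulate_empirical(y_hat, s, p, u):
    """Inverse-transform draw: interpolate the deviate at probability u."""
    for i in range(1, len(p)):
        if u <= p[i]:
            frac = (u - p[i - 1]) / (p[i] - p[i - 1])
            deviate = s[i - 1] + frac * (s[i] - s[i - 1])
            return y_hat * (1 + deviate)
    return y_hat * (1 + s[-1])

history = [12.0, 9.5, 11.2, 8.1, 10.4, 13.3, 9.9, 10.8]  # hypothetical data
y_hat, s, p = empirical_parameters(history)
rng = random.Random(31517)
draws = [simulate_empirical(y_hat, s, p, rng.random()) for _ in range(500)]
print("simulated min and max:", min(draws), max(draws))
```

Because the deviates are fractions of Y-hat, the same `s` and `p` can be reused with a different forecast mean, which is the reason for preferring deviations as a percent or fraction.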

Model Validation

• Do the simulated values for the random variables reproduce their parameters?
• Does the model accurately forecast the system?
• Do the results conform to theoretical expectations?
• Do the results conform to expectations of experts?
– Turing Test of simulation model results
– Show the results to experts, using alternative assumptions about the input values

Four P’s for Validation

• Planning – in the initial model preparation mode, the developer should plan how to validate the model
• Personal – it’s the developer’s responsibility to verify every equation, coefficient, and random variable, and to check that results are theoretically correct
• Peers – utilize experts in the field to review model results using a Turing Test; use sensitivity testing of the model
• Prospective Clients – do the results conform to their expectations? Are the results useful to the client?

Model Verification

• Check all equations for arithmetic accuracy
– Use Excel’s “Trace Precedents” and “Trace Dependents” functions
• Check linkage of variables coming into each equation
– Check model in “Expected Value” and “Stochastic” mode
• Ensure that the variables in each equation are theoretically correct
• Make sure the model contains all of the necessary equations to calculate the KOVs

Model Validation

• Use statistical tests of each random variable to ensure that it:
– reproduces the historical distribution
– reproduces the historical correlation matrix among random variables
• Statistical Tests
– Student t test
– F test
– Chi Square test

Statistical Tests for Validation

• Test the means of the random variables against their historical values
– Statistically equal at 95% level based on a t-test?
• Test the variance against historical values
– Statistically equal at 95% level based on an F-test?
• Check the historical vs. simulated coefficient of variation
– Needs to be constant over time
• Check the minimum and maximum
– For a Normal distribution are they reasonable? Should be:
Min ≈ Mean + Std Dev * (-3)
Max ≈ Mean + Std Dev * (3)
– For an Empirical distribution compare simulated min and max to the values the model “should” simulate:
Xmin should be = Y-hat * (1 + Minimum Fractional Deviate)
Xmax should be = Y-hat * (1 + Maximum Fractional Deviate)
• Check the correlation matrix for the simulated variables vs. the historical correlation matrix using t-tests
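The min and max checks and the two test statistics can be sketched in a few lines. The function names are hypothetical, and the Welch form of the t statistic is an assumed choice since the slide does not say which variant is used. The statistics still have to be compared with critical values from t and F tables at the 95% level.

```python
import statistics

def normal_min_max(mean, std_dev):
    """Approximate extremes for a Normal: mean plus or minus 3 std devs."""
    return mean - 3 * std_dev, mean + 3 * std_dev

def empirical_min_max(y_hat, fractional_deviates):
    """Extremes an Empirical distribution should simulate around y_hat."""
    return (y_hat * (1 + min(fractional_deviates)),
            y_hat * (1 + max(fractional_deviates)))

def two_sample_statistics(history, simulated):
    """Welch t statistic for the means and F ratio for the variances."""
    m1, m2 = statistics.mean(history), statistics.mean(simulated)
    v1, v2 = statistics.variance(history), statistics.variance(simulated)
    t_stat = (m1 - m2) / ((v1 / len(history) + v2 / len(simulated)) ** 0.5)
    f_stat = v1 / v2
    return t_stat, f_stat

print(normal_min_max(10, 3))                       # (1, 19)
print(empirical_min_max(100, [-0.25, 0.0, 0.5]))   # (75.0, 150.0)
```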

Validation Tests in Simetar

• Verification/Validation tests in Simetar
– Hypothesis tests icon
– Compare Two Series: Historical Data vs. Simulated Values
– 1st Data Series is history
– 2nd Data Series is simulated
– Test means and variances for two series, i.e., are they statistically equal
– Test works for a pair of variables and for comparing two multivariate distributions (matrices)

Statistical Tests for Validation

• Compare Two Series: Historical Data vs. Simulated Values
– 1st Data Series is history
– 2nd Data Series is simulated

Validation Tests in Simetar

• Compare mean and standard deviation of simulated data to the user’s specified values
– “Data Series” is the simulated values
– Type in the mean as a value or a cell location
– Specify the Std Dev as a value or a cell location
• The test is used when
– Only mean and std dev are known, i.e., there is no history for the variable
– Mean is a projected value which is different from the history

Validation Tests in Simetar

• Compare mean and standard deviation of simulated data to the user’s specified values
• The test is used when only mean and std dev are known, i.e., there is no history for the variable, or the mean is a projected value different from history
• Note the Given Values are Mean = 10 and Std Dev = 3

Test of Hypothesis for Parameters for Variable 1
Confidence Level: 95.0000%

                 Given Value  Test Value  Critical Value          P-Value
t-Test           10           0.00        2.25                    1.00     Fail to Reject the Ho that the Mean is Equal to 10
Chi-Square Test  3            500.47      LB: 439.00, UB: 562.79  0.95     Fail to Reject the Ho that the Standard Deviation is Equal to 3
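The two statistics in this test can be sketched as below. This is not Simetar's code: the function name is hypothetical, Python's normal generator stands in for the simulated values, and the chi-square bounds use the Wilson-Hilferty approximation rather than exact quantiles. With 500 values (499 degrees of freedom) the approximate bounds come out close to the LB and UB shown in the table, which is consistent with a 500-iteration simulation, although the slide does not state the number of iterations.

```python
import math
import random
import statistics

def parameter_tests(sim, mean0, sd0, confidence=0.95):
    """One-sample t statistic for the mean and chi-square test for the std dev."""
    n = len(sim)
    df = n - 1
    s = statistics.stdev(sim)
    t_stat = (statistics.mean(sim) - mean0) / (s / math.sqrt(n))
    chi2_stat = df * s ** 2 / sd0 ** 2

    # Wilson-Hilferty approximation to chi-square quantiles (good for large df)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    c = 2 / (9 * df)
    lower = df * (1 - c - z * math.sqrt(c)) ** 3
    upper = df * (1 - c + z * math.sqrt(c)) ** 3
    return t_stat, chi2_stat, lower, upper

rng = random.Random(31517)
sim = [rng.gauss(10, 3) for _ in range(500)]  # stand-in for simulated values
t_stat, chi2_stat, lower, upper = parameter_tests(sim, mean0=10, sd0=3)
print(f"t = {t_stat:.2f}")
print(f"chi-square = {chi2_stat:.2f}, bounds = ({lower:.2f}, {upper:.2f})")
print("std dev not rejected:", lower <= chi2_stat <= upper)
```

The standard deviation hypothesis is not rejected when the chi-square statistic falls between the lower and upper bounds, which is how the LB and UB in the table are read.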

Validation Tests in Simetar

• Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation
– Data Series is the simulated values for all random variables in the MV distribution, a matrix of variables in SimData
– The original correlation matrix used to simulate the MVE or MVN distribution
• OK, if the majority of correlation coefficients are statistically the same as their values in the historical correlation matrix
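One way to check a single simulated correlation coefficient against its historical value is sketched below. It swaps in the Fisher z transformation, a common textbook test, which may differ from the t-test Simetar reports. The target correlation of 0.6, the function names, and the two-variable setup are hypothetical.

```python
import math
import random
import statistics

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_statistic(r_sim, rho_hist, n):
    """Approximately standard normal when the true correlation is rho_hist."""
    return (math.atanh(r_sim) - math.atanh(rho_hist)) * math.sqrt(n - 3)

# Simulate two correlated normal series with a target correlation of 0.6
rng = random.Random(31517)
rho = 0.6
x, y = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x.append(z1)
    y.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)

r = correlation(x, y)
z_stat = fisher_z_statistic(r, rho, len(x))
print(f"simulated r = {r:.3f}, z = {z_stat:.2f}")
print("reproduces historical correlation at 95%:", abs(z_stat) < 1.96)
```

For a full matrix the same check would be repeated for every pair of variables, and the slide's rule of thumb is that the majority of pairs should pass.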

Using Charts for Visual Validation

• Use a CDF to compare historical series to simulated series, tests the min and max

• Use a PDF to compare historical series to simulated series, tests the shape

• Use a Box Plot to compare historical series to simulated series, checks the variability

• Use a Probability graph to compare historical series to simulated series, P(x) vs. F(x)

• Use a Fan graph to show the range of the risk and level of the mean over time, visual test of CV constant over time

How Simetar Simulates Random Numbers

• A pseudo random number generator is used so we can reproduce the simulation results from day to day with the same inputs

• Pseudo random number generator uses a seed to start the sampling sequence
– The default seed in Simetar is 31517
– Change the seed if you like

• If you do not use a pseudo random number generator then every time you simulate the model you get different answers, even if the input has not changed
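The effect of a seed can be illustrated with Python's own pseudo random number generator, which is a different generator from the one in Simetar. The `simulate` helper is hypothetical, and only the seed value 31517 comes from the slide.

```python
import random

def simulate(seed, iterations=5):
    """Draw uniform(0, 1) values from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(iterations)]

run_1 = simulate(31517)
run_2 = simulate(31517)
run_3 = simulate(12345)

print(run_1 == run_2)  # True: same seed, same sequence
print(run_1 == run_3)  # False: different seed, different sequence
```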

Latin Hyper Cube vs. Monte Carlo Simulated Numbers

• Monte Carlo simulation procedure samples randomly from the full range of the possible values for a random variable
– Requires a large number of iterations for adequate coverage over the possible range of a variable
– For a small number of iterations does not sample adequately

Latin Hyper Cube vs. Monte Carlo Simulated Numbers

• Latin Hyper Cube systematically samples all segments of the distribution for a random variable
– If 100 iterations are to be simulated, LHC samples one value randomly from each of 100 intervals of equal length on the 0 to 1 USD scale
– Ensures all segments of the distribution are sampled, even at small numbers of iterations
• With LHC get “adequate” sampling coverage of a distribution with fewer iterations
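The interval sampling described above can be sketched as follows. This is a generic one-dimensional Latin Hyper Cube sampler written for illustration, not Simetar's implementation, and the function names are hypothetical.

```python
import random

def monte_carlo_usd(n, rng):
    """n independent uniform(0, 1) draws."""
    return [rng.random() for _ in range(n)]

def latin_hypercube_usd(n, rng):
    """One uniform draw from each of n equal-length intervals on 0 to 1."""
    draws = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(draws)  # randomize the order so iterations are not sorted
    return draws

def occupied_intervals(draws, n):
    """Count how many of the n equal intervals contain at least one draw."""
    return len({min(int(u * n), n - 1) for u in draws})

rng = random.Random(31517)
n = 100
print("Monte Carlo intervals hit:    ", occupied_intervals(monte_carlo_usd(n, rng), n))
print("Latin Hyper Cube intervals hit:", occupied_intervals(latin_hypercube_usd(n, rng), n))
```

With independent draws, roughly 63 of the 100 intervals are expected to be hit on average, while the Latin Hyper Cube sample places one draw in every interval by construction.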

Latin Hyper Cube vs. Monte Carlo

• The CDF of a Uniform distribution defined as U(0,1) is a straight line at a 45° angle out of the origin
• A perfect sample would lie on the straight line
• Use the following USDs
– Excel’s =RAND()
– Simetar’s =UNIFORM()
• Simulate these two USDs
• Draw a CDF with the two random variables. Which one lies on the straight line between 0 and 1?

[Figure: Example of Latin Hyper Cube vs. Monte Carlo Simulation of USD. CDF chart with X on the horizontal axis (0.0 to 1.0) and F(x) on the vertical axis (up to 1.0).]
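The comparison with the straight line can also be put into a number: the largest vertical gap between each sample's CDF and the line F(x) = x. In this sketch Python's `random` module stands in for both samplers, so neither Excel's =RAND() nor Simetar's =UNIFORM() is actually called, and the function name is hypothetical.

```python
import random

def max_distance_from_line(draws):
    """Largest vertical gap between the sample CDF and the line F(x) = x."""
    n = len(draws)
    gap = 0.0
    for i, x in enumerate(sorted(draws), start=1):
        # the step CDF jumps from (i - 1)/n to i/n at x
        gap = max(gap, abs(i / n - x), abs((i - 1) / n - x))
    return gap

rng = random.Random(31517)
n = 100
monte_carlo = [rng.random() for _ in range(n)]                # independent draws
latin_hypercube = [(i + rng.random()) / n for i in range(n)]  # one draw per interval

print("Monte Carlo gap:     ", round(max_distance_from_line(monte_carlo), 4))
print("Latin Hyper Cube gap:", round(max_distance_from_line(latin_hypercube), 4))
```

For the Latin Hyper Cube sample the gap can never exceed 1/n, because the i-th smallest draw is confined to the i-th interval. The Monte Carlo gap has no such bound at a fixed number of iterations.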