Materials for Lecture 11
• Chapters 3 and 6
• Chapter 16 Section 4.0 and 5.0
• Lecture 11 Pseudo Random LHC.xls
• Lecture 11 Validation Tests.xls
• Next 4 slides were added because right about now most students are confused about PDF parameters and what functions to use
Parameter Estimation
• Parameters for a distribution define the shape and position on the number scale
– Uniform(Min, Max)
– Norm(Mean, Std Dev)
– Mean (Ỹ or Ῡ) and risk as Empirical(Si, P(Si))
• Shape can be skewed right or left, can be tall or squatty (kurtosis)
• Parameters reflect amount of variability in the stochastic variable
• Must validate random variables against their parameters
• We use the parameters to simulate the distributions
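A minimal sketch of estimating the parameters listed above from one data series. The function name `estimate_parameters`, the sample numbers, and the choice of evenly spaced probabilities P(Si) from 0 to 1 are assumptions made for illustration, not Simetar's own routine.

```python
import statistics

def estimate_parameters(data):
    """Estimate parameters for three candidate distributions from one sample."""
    sorted_vals = sorted(data)
    n = len(sorted_vals)
    # Empirical: sorted values S_i paired with cumulative probabilities P(S_i).
    # Assumption: probabilities are evenly spaced from 0 to 1.
    probs = [i / (n - 1) for i in range(n)]
    return {
        "uniform": (min(data), max(data)),                            # Min, Max
        "normal": (statistics.mean(data), statistics.stdev(data)),    # Mean, Std Dev
        "empirical": (sorted_vals, probs),                            # S_i, P(S_i)
    }

history = [12.0, 9.5, 10.8, 8.7, 11.4, 10.1, 9.0, 12.6]  # made-up numbers
params = estimate_parameters(history)
print("Uniform(Min, Max):", params["uniform"])
print("Norm(Mean, Std Dev):", params["normal"])
```

The same sample yields a different parameter set for each assumed distribution, which is why the simulated values have to be validated against whichever parameters were used.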
Same Mean Different Std Dev
[Figure: PDF approximations and CDFs of Variable 1 and Variable 2; the CDF's vertical axis is Prob, from 0 to 1]
Review Steps for Parameter Estimation
• Step 1: Check for presence of a trend, cycle or structural pattern
– If trend or structural model, work with the residuals (ẽt)
– If no trend use actual data (X’s)
• Step 2: Estimate parameters for several assumed distributions using the X’s or the residuals (ẽt)
• Step 3: Simulate the different distributions
• Step 4: Pick the best match based on
– Mean, Variability -- use validation tests
– Minimum and Maximum
– Shape of the CDF vs. historical series
– Penalty function CDFDEV() to quantify differences
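Step 1 can be sketched as follows, assuming a simple linear trend fitted by ordinary least squares. The function name `linear_trend_residuals` and the data are hypothetical, and the sketch does not test whether the slope is statistically significant.

```python
def linear_trend_residuals(y):
    """Fit y_t = a + b*t by ordinary least squares; return (a, b, residuals)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = sxy / sxx                  # slope of the trend line
    a = y_bar - b * t_bar          # intercept
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, residuals

series = [100.0, 104.0, 103.0, 109.0, 112.0, 111.0, 118.0]  # made-up data
a, b, resid = linear_trend_residuals(series)
print("slope:", b)
print("residuals:", [round(e, 2) for e in resid])
```

If the slope matters, the residuals (not the raw X's) feed Step 2; otherwise the raw series is used.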
Univariate Parameter Estimation
• When do you use UPES?
• When there is no trend in the data
• When you want to use the historical mean as your forecasted y-hat
• Test an unknown random variable for its shape
• Or use residuals
Univariate Parameter Estimation
• Empirical distribution fits your data best because it lets the data define the shape
• Prefer to use the EMP with deviations as a percent or fraction from Y-hat
• If there is a trend, then account for it with deviations from trend
• Else use deviations from mean
• EMP allows us to model low probability events
• Test with =CDFDEV(original data, sim data)
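A minimal sketch of an empirical distribution built from fractional deviations from Y-hat and simulated by inverse transform with linear interpolation. The function names, the evenly spaced probabilities, and the data are assumptions; Simetar's EMP function may assign probabilities differently.

```python
import random

def empirical_params(history, y_hat):
    """Sorted fractional deviations from y_hat and their cumulative probabilities."""
    devs = sorted((x - y_hat) / y_hat for x in history)
    n = len(devs)
    probs = [i / (n - 1) for i in range(n)]   # assumption: evenly spaced on 0..1
    return devs, probs

def simulate_empirical(devs, probs, y_hat, u):
    """Inverse-transform draw: interpolate the deviation at uniform value u."""
    for i in range(1, len(probs)):
        if u <= probs[i]:
            w = (u - probs[i - 1]) / (probs[i] - probs[i - 1])
            dev = devs[i - 1] + w * (devs[i] - devs[i - 1])
            return y_hat * (1.0 + dev)
    return y_hat * (1.0 + devs[-1])

history = [8.0, 9.0, 10.0, 11.0, 12.0]   # made-up data with no trend
y_hat = sum(history) / len(history)      # no trend, so deviations are from the mean
devs, probs = empirical_params(history, y_hat)
rng = random.Random(31517)               # fixed seed so the draws repeat
draws = [simulate_empirical(devs, probs, y_hat, rng.random()) for _ in range(500)]
print(min(draws), max(draws))
```

Because the deviations are stored as fractions, the same `devs` and `probs` can be reused with a different forecasted Y-hat.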
Model Validation
• Do the simulated values for the random variables reproduce their parameters?
• Does the model accurately forecast the system?
• Do the results conform to theoretical expectations?
• Do the results conform to expectations of experts?
– Turing Test of simulation model results
– Show the results to experts, using alternative assumptions about the input values
Four P’s for Validation
• Planning – in the initial model preparation mode, developer should plan how to validate the model
• Personal – it’s the developer’s responsibility to verify every equation, coefficient, and random variable, and to check that the results are theoretically correct
• Peers – utilize experts in the field to review model results using a Turing Test; use sensitivity testing of the model
• Prospective Clients – do the results conform to their expectations? Are the results useful to the client?
Model Verification
• Check all equations for arithmetic accuracy
– Use Excel’s “Trace Dependents” functions
• Check linkage of variables coming into each equation
– Check model in “Expected Value” and “Stochastic” mode
• Ensure that the variables in each equation are theoretically correct
• Make sure the model contains all of the necessary equations to calculate the KOVs
Model Validation
• Use statistical tests of each random variable to ensure that it:
– reproduces the historical distribution
– reproduces the historical correlation matrix among random variables
• Statistical Tests
– Student t test
– F test
– Chi Square test
Statistical Tests for Validation
• Test the means of the random variables against their historical values
– Statistically equal at 95% level based on a t-test?
• Test the variance against historical values
– Statistically equal at 95% level based on an F-test?
• Check the historical vs. simulated coefficient of variation
– Needs to be constant over time
• Check the minimum and maximum
– For a Normal distribution are they reasonable? Should be:
Min ≈ Mean + Std Dev * (-3)
Max ≈ Mean + Std Dev * (3)
– For an Empirical distribution compare simulated min and max to the values the model “should” simulate:
Xmin = Y-hat * (1 + Minimum Fractional Deviate)
Xmax = Y-hat * (1 + Maximum Fractional Deviate)
• Check the correlation matrix for the simulated variables vs. the historical correlation matrix using t-tests
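The checks above can be sketched as below. The helper names are hypothetical, and the unequal-variance form of the t statistic is an assumption, since the slides do not say which form Simetar uses. Critical values at the 95% level still have to be looked up in t and F tables.

```python
import statistics

def two_sample_stats(history, simulated):
    """t statistic (unequal-variance form, an assumption) and F ratio for two series."""
    m1, m2 = statistics.mean(history), statistics.mean(simulated)
    v1, v2 = statistics.variance(history), statistics.variance(simulated)
    n1, n2 = len(history), len(simulated)
    t_stat = (m1 - m2) / ((v1 / n1 + v2 / n2) ** 0.5)
    f_stat = v1 / v2                      # ratio of sample variances
    return t_stat, f_stat

def normal_min_max(mean, std_dev):
    """Rule-of-thumb range for a Normal: mean minus/plus 3 standard deviations."""
    return mean - 3 * std_dev, mean + 3 * std_dev

def empirical_min_max(y_hat, min_frac_dev, max_frac_dev):
    """Range an Empirical distribution 'should' simulate around y_hat."""
    return y_hat * (1 + min_frac_dev), y_hat * (1 + max_frac_dev)

print(two_sample_stats([9.0, 10.0, 11.0], [9.5, 10.0, 10.5, 10.0]))
print(normal_min_max(10, 3))
print(empirical_min_max(100.0, -0.2, 0.3))
```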
Validation Tests in Simetar
• Verification/Validation tests in Simetar
– Hypothesis tests icon
– Compare Two Series Historical Data vs. Simulated Values
– 1st Data Series is history
– 2nd Data Series is simulated
– Test means and variances for two series, i.e., are they statistically equal
– Test works for a pair of variables and for comparing two multivariate distributions (matrices)
Statistical Tests for Validation
• Compare Two Series Historical Data vs. Simulated Values
– 1st Data Series is history
– 2nd Data Series is simulated
Validation Tests in Simetar
• Compare mean and standard deviation of simulated data to the user’s specified values
– “Data Series” is the simulated values
– Type in the mean or cell
– Specify the Std Dev as a value or a cell location
• The test is used when
– Only mean and std dev are known, i.e., there is no history for the variable
– Mean is a projected value which is different from the history
Validation Tests in Simetar
• Compare mean and standard deviation of simulated data to the user’s specified values
• The test is used when only mean and std dev are known, i.e., there is no history for the variable
• Or the mean is a projected value different from history
• Note the Given Values are Mean = 10 and Std Dev = 3

Test of Hypothesis for Parameters for Variable 1
Confidence Level 95.0000%

                  Given Value   Test Value   Critical Value           P-Value   Result
t-Test            10            0.00         2.25                     1.00      Fail to Reject the Ho that the Mean is Equal to 10
Chi-Square Test   3             500.47       LB: 439.00, UB: 562.79   0.95      Fail to Reject the Ho that the Standard Deviation is Equal to 3
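A minimal sketch of the two test statistics behind a parameter test of this kind. The function names are hypothetical. The chi-square bounds use the Wilson-Hilferty approximation rather than exact quantiles, and the 499 degrees of freedom (500 iterations) is an assumption; under it the approximation lands close to the LB and UB reported above.

```python
import statistics
from statistics import NormalDist

def mean_t_stat(sim, given_mean):
    """One-sample t statistic for Ho: mean equals given_mean."""
    n = len(sim)
    return (statistics.mean(sim) - given_mean) / (statistics.stdev(sim) / n ** 0.5)

def std_dev_chi_square(sim, given_std):
    """Chi-square statistic for Ho: standard deviation equals given_std."""
    n = len(sim)
    return (n - 1) * statistics.variance(sim) / given_std ** 2

def chi_square_bounds(df, confidence=0.95):
    """Two-sided critical values via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    a = 2 / (9 * df)
    lower = df * (1 - a - z * a ** 0.5) ** 3
    upper = df * (1 - a + z * a ** 0.5) ** 3
    return lower, upper

lb, ub = chi_square_bounds(499)   # assumption: 500 simulated iterations
print(round(lb, 2), round(ub, 2))
```

The Ho for the standard deviation is not rejected when the chi-square statistic falls between the lower and upper bounds.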
Validation Tests in Simetar
• Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation
– Data Series is the simulated values for all random variables in the MV distribution, a matrix of variables in SimData
– The original correlation matrix used to simulate the MVE or MVN distribution
• OK, if the majority of correlation coefficients are statistically the same as those in the historical correlation matrix
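One way to sketch the "majority of coefficients" check is with the Fisher z-transformation. This is a swapped-in standard technique, not necessarily the test Simetar runs, and the function names and matrices below are made up for illustration.

```python
import math
from statistics import NormalDist

def correlation_z_test(r_sim, r_hist, n, confidence=0.95):
    """Fisher z test of a simulated correlation against a historical value.

    Returns (z statistic, True if Ho of equality is NOT rejected).
    """
    z_stat = (math.atanh(r_sim) - math.atanh(r_hist)) * math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_stat, abs(z_stat) <= z_crit

def share_matching(sim_corr, hist_corr, n, confidence=0.95):
    """Fraction of off-diagonal coefficients statistically equal to history."""
    k = len(hist_corr)
    results = [
        correlation_z_test(sim_corr[i][j], hist_corr[i][j], n, confidence)[1]
        for i in range(k) for j in range(i + 1, k)
    ]
    return sum(results) / len(results)

hist = [[1.0, 0.60, 0.20], [0.60, 1.0, -0.30], [0.20, -0.30, 1.0]]   # made-up
sim = [[1.0, 0.58, 0.23], [0.58, 1.0, -0.28], [0.23, -0.28, 1.0]]    # made-up
share = share_matching(sim, hist, n=500)   # n = number of simulated iterations
print(share)
```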
Charts for Validation
• Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation
Using Charts for Visual Validation
• Use a CDF to compare historical series to simulated series, tests the min and max
• Use a PDF to compare historical series to simulated series, tests the shape
• Use a Box Plot to compare historical series to simulated series, checks the variability
• Use a Probability graph to compare historical series to simulated series, P(x) vs. F(x)
• Use a Fan graph to show the range of the risk and level of the mean over time, visual test of CV constant over time
How Simetar Simulates Random Numbers
• A pseudo random number generator is used so we can reproduce the simulation results from day to day with the same inputs
• Pseudo random number generator uses a seed to start the sampling sequence – The default seed in Simetar is 31517– Change the seed if you like
• If you do not use a pseudo random number generator then every time you simulate the model you get different answers, even if the input has not changed
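The effect of a seed can be sketched with Python's own generator. This is the Mersenne Twister in the `random` module, not Simetar's generator; only the seed value 31517 is taken from the slide, and `simulate` is a hypothetical helper.

```python
import random

def simulate(seed, iterations=5):
    """Draw uniform values from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(iterations)]

run_1 = simulate(31517)
run_2 = simulate(31517)   # same seed, same inputs
run_3 = simulate(12345)   # a different seed

print(run_1 == run_2)     # same seed reproduces the same draws
print(run_1 == run_3)     # a different seed gives a different sequence
```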
Latin Hyper Cube vs. Monte Carlo Simulated Numbers
• Monte Carlo simulation procedure samples randomly from the full range of the possible values for a random variable
– Requires large number of iterations for adequate coverage over possible range of a variable
– For small number of iterations does not sample adequately
Latin Hyper Cube vs. Monte Carlo Simulated Numbers
• Latin Hyper Cube systematically samples all segments of the distribution for a random variable
– If 100 iterations are to be simulated, LHC samples one value randomly from each of 100 intervals of equal length on the 0 to 1 USD scale
– Ensures all segments of the distribution are sampled, even at small numbers of iterations
• With LHC get “adequate” sampling coverage of a distribution with fewer iterations
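A minimal sketch of the two sampling procedures for a single uniform variable. The function names are hypothetical, and this illustrates the idea rather than Simetar's implementation.

```python
import random

def monte_carlo_usd(n, rng):
    """n independent uniform draws on [0, 1)."""
    return [rng.random() for _ in range(n)]

def latin_hypercube_usd(n, rng):
    """One random draw from each of n equal-length intervals, in shuffled order."""
    draws = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(draws)
    return draws

def intervals_hit(draws, n):
    """Number of the n equal-length intervals that contain at least one draw."""
    return len({min(int(u * n), n - 1) for u in draws})

def one_per_interval(draws):
    """True if the i-th smallest draw lies in the i-th interval."""
    n = len(draws)
    return all(i / n <= u <= (i + 1) / n for i, u in enumerate(sorted(draws)))

rng = random.Random(31517)
mc = monte_carlo_usd(100, rng)
lhc = latin_hypercube_usd(100, rng)
print("Monte Carlo intervals hit:", intervals_hit(mc, 100))
print("LHC covers every interval:", one_per_interval(lhc))
```

With only 100 iterations, plain Monte Carlo typically leaves a sizable share of the 100 intervals empty, while LHC covers each one by construction.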
Latin Hyper Cube vs. Monte Carlo
• A Uniform distribution defined as U(0,1) has a CDF that is a straight line at a 45° angle out of the origin
• A perfect sample would lie on the straight line
• Use the following USDs
– Excel’s =RAND()
– Simetar’s =UNIFORM()
• Simulate these two USDs
• Draw a CDF with the two random variables. Which one lies on the straight line between 0 and 1?
[Figure: CDF of the two USDs, F(x) vs. X, both axes from 0.0 to 1.0]
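The comparison can be quantified by measuring how far each sample's CDF strays from the 45° line. The sketch assumes =RAND() behaves like plain Monte Carlo sampling and =UNIFORM() like Latin hypercube sampling, which is what the comparison on this slide suggests; `max_cdf_deviation` is a hypothetical helper.

```python
import random

def max_cdf_deviation(draws):
    """Largest vertical gap between the sample CDF and the line F(x) = x."""
    xs = sorted(draws)
    n = len(xs)
    # Check the step of the sample CDF just before and just after each point.
    return max(max(abs((i + 1) / n - x), abs(i / n - x)) for i, x in enumerate(xs))

rng = random.Random(31517)
n = 100
mc = [rng.random() for _ in range(n)]                 # stand-in for =RAND()
lhc = [(i + rng.random()) / n for i in range(n)]      # stand-in for =UNIFORM()

print("Monte Carlo:", max_cdf_deviation(mc))
print("Latin hypercube:", max_cdf_deviation(lhc))
```

For the stratified sample the gap can never exceed 1/n, so its CDF hugs the straight line; the Monte Carlo sample has no such bound.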