TDEI52 Modelling and Simulation
Lecture 6 and 7: Overview of a Simulation Study
Steps in a Simulation Study
• Problem structuring and formulation
• System and simulation specification
• Data collection and input modeling
• Model formulation and construction
• Verification and validation
• Experimentation and analysis
• Presenting and preserving the results
• Implementation
Problem Formulation
• The initial problem is rarely defined clearly
• Puzzles, problems and messes
  – Problem structuring: from messes to well-defined problems to which a useful contribution can be made
• Some types of problems
  – improving the performance of existing systems
  – avoiding future problems (design of new systems)
  – testing designs before implementation
Project Objectives and Overall Plan
• Objectives indicate questions to be answered by simulation
• Establish the performance metrics that will be used to measure:
  – quality of the system under study
  – success of the study
• Is simulation the right tool to use?
• Make the overall plan for the project (develop further into specification)
System and Simulation Specification
• Investigate and describe the system under study
• Make sure all involved parties understand what is being done
• Typical specification includes:
  – Simulation objectives
  – System description and modeling approach
  – Animation exactness
  – Model input and output
  – Project deliverables
Conceptual Model
• Construction of a model is as much art as science.
  – Abstract the essential features of the problem
  – Select and modify basic assumptions that characterize the system
  – Start with a simple model, and grow in complexity
  – Keep the model as simple as possible, but not simpler
  – Involve the user in the modeling process
  – Use some methodological support
• Only experience with real systems can “teach” the art of model building
Input Analysis: Specifying Model Parameters, Distributions
• Structural modeling
  – Logical aspects — entities, resources, paths, etc.
• Quantitative modeling
  – Numerical, distributional specifications
  – Like structural modeling, need to observe the system’s operation and take data if possible
Deterministic vs. Random Inputs
• Deterministic: nonrandom, fixed values
  – Number of units of a resource
  – Entity transfer time (?)
  – Interarrival, processing times (?)
• Random (a.k.a. stochastic): model as a distribution, “draw” or “generate” values from it to drive the simulation
  – Transfer, interarrival, processing times
  – What distribution? What distributional parameters?
  – Causes simulation output to be random, too
• Don’t just assume randomness away — validity
Collecting Data
• Generally hard, expensive, frustrating, boring
  – System might not exist
  – Data available on the wrong things — might have to change the model according to what’s available
  – Incomplete, “dirty” data
  – Too much data (!)
• Sensitivity of outputs to uncertainty in inputs
• Match model detail to quality of data
• Cost — should be budgeted in project
• Capture variability in data — model validity
• Garbage In, Garbage Out (GIGO)
Using Data: Alternatives and Issues
• Use data “directly” in simulation
  – Read actual observed values to drive the model inputs (interarrivals, service times, part types, …)
  – All values will be “legal” and realistic
  – But can never go outside your observed data
  – May not have enough data for long or many runs
  – Computationally slow (reading disk files)
• Or, fit a probability distribution to the data
  – “Draw” or “generate” synthetic observations from this distribution to drive the model inputs
  – We’ve done it this way so far
  – Can go beyond observed data (good and bad)
  – May not get a good “fit” to the data — validity?
Fitting Distributions via the Arena Input Analyzer
• Assume:
  – Have sample data: an Independent and Identically Distributed (IID) list of observed values from the actual physical system
  – Want to select or fit a probability distribution for use in generating inputs for the simulation model
• Arena Input Analyzer
  – Separate application, also accessible via the Tools menu in Arena
  – Fits distributions, gives a valid Arena expression for generation to paste directly into the simulation model
Fitting Distributions via the Arena Input Analyzer (cont’d.)
• Fitting = deciding on distribution form (exponential, gamma, empirical, etc.) and estimating its parameters
  – Several different methods (maximum likelihood, moment matching, least squares, …)
  – Assess goodness of fit via hypothesis tests
    • H0: the fitted distribution adequately represents the data
    • Get a p-value for the test (small = poor fit)
• Fitted “theoretical” vs. empirical distribution
• Continuous vs. discrete data, distribution
• “Best” fit from among several distributions
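Outside Arena, the same fitting-and-testing idea is easy to sketch. Below is a minimal, hypothetical Python illustration (not Arena code, and not the Input Analyzer's exact algorithm): a maximum-likelihood fit of an exponential distribution, with the Kolmogorov-Smirnov distance computed by hand.

```python
import math
import random

def fit_exponential(data):
    """Maximum-likelihood estimate for an exponential: rate = 1 / sample mean."""
    return len(data) / sum(data)

def ks_statistic(data, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF and the
    fitted exponential CDF F(x) = 1 - exp(-rate * x)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-rate * x)
        # the empirical CDF jumps from i/n to (i+1)/n at each sorted point
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d

random.seed(42)
sample = [random.expovariate(0.5) for _ in range(500)]   # pretend: observed data
rate_hat = fit_exponential(sample)                       # should land near 0.5
d = ks_statistic(sample, rate_hat)                       # small distance = good fit
print(round(rate_hat, 2), round(d, 3))
```

A real goodness-of-fit test would convert the K-S distance into the p-value the slide describes; the Input Analyzer does that step for you.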
Data Files for the Input Analyzer
• Create the data file (editor, word processor, spreadsheet, ...)
  – Must be plain ASCII text (save as text or export)
  – Data values separated by white space (blanks, tabs, linefeeds)
  – Otherwise free format
• Open the data file from within the Input Analyzer
  – File/New menu, or
  – File/Data File/Use Existing … menu
  – Get a histogram and a basic summary of the data
  – To see the data file: Window/Input Data menu
• Can generate a “fake” data file to play around with
  – File/Data File/Generate New … menu
The Fit Menu
• Fits distributions, does goodness-of-fit tests
• Fit a specific distribution form
  – Plots the density over the histogram for a visual “test”
  – Gives the exact expression to Copy and Paste (Ctrl+C, Ctrl+V) over into the simulation model
  – May include an “offset” depending on the distribution
  – Gives results of goodness-of-fit tests
    • Chi-square, Kolmogorov-Smirnov tests
    • Most important part: the p-value, always between 0 and 1: the probability of getting a data set that’s more inconsistent with the fitted distribution than the data set you actually have, if the fitted distribution is truly “the truth”
    • “Small” p (< 0.05 or so): poor fit (try again or give up)
The Fit Menu (cont’d.)
• Fit all of Arena’s (theoretical) distributions at once
  – Fit/Fit All menu
  – Returns the minimum square-error distribution
    • Square error = sum of squared discrepancies between histogram frequencies and fitted-distribution frequencies
    • Can depend on the histogram intervals chosen: different intervals can lead to a different “best” distribution
  – Could still be a poor fit, though (check the p-value)
  – To see all distributions, ranked: Window/Fit All Summary
The Fit Menu (cont’d.)
• “Fit” an Empirical distribution (continuous or discrete): Fit/Empirical
  – Can interpret the results as a Discrete or Continuous distribution
    • Discrete: get pairs (Cumulative Probability, Value)
    • Continuous: Arena will linearly interpolate within the data range according to these pairs (so you can never generate values outside the range, which might be good or bad)
  – The Empirical distribution can be used when “theoretical” distributions fit poorly, or intentionally
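The continuous Empirical option amounts to inverting a piecewise-linear CDF. A sketch in Python (the function name and data are mine, not Arena's):

```python
import bisect
import random

def empirical_continuous(pairs):
    """Sample from (cumulative probability, value) pairs by linear
    interpolation of the CDF.  pairs must start at probability 0.0
    and end at 1.0, with probabilities increasing."""
    probs = [p for p, _ in pairs]
    vals = [v for _, v in pairs]
    u = random.random()
    # find the bracketing pair, then interpolate linearly within it
    i = min(bisect.bisect_right(probs, u) - 1, len(pairs) - 2)
    frac = (u - probs[i]) / (probs[i + 1] - probs[i])
    return vals[i] + frac * (vals[i + 1] - vals[i])

random.seed(1)
cdf = [(0.0, 2.0), (0.4, 5.0), (1.0, 9.0)]   # hypothetical fitted pairs
draws = [empirical_continuous(cdf) for _ in range(10000)]
print(min(draws) >= 2.0 and max(draws) <= 9.0)
```

Because interpolation stays between the smallest and largest recorded values, generated draws can never leave the observed range, which is exactly the good-or-bad property the slide notes.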
Some Issues in Fitting Input Distributions
• Not an exact science — no “right” answer
• Consider theoretical vs. empirical
• Consider the range of the distribution
  – Infinite both ways (e.g., normal)
  – Positive (e.g., exponential, gamma)
  – Bounded (e.g., beta, uniform)
• Consider ease of parameter manipulation to affect means, variances
• Simulation model sensitivity analysis
• Outliers, multimodal data
  – Maybe split the data set
No Data?
• Happens more often than you’d like
• No good solution; some (bad) options:
  – Interview “experts”
    • Min, Max: Uniform
    • Avg., % error or absolute error: Uniform
    • Min, Mode, Max: Triangular
      – Mode can be different from Mean — allows asymmetry
  – Interarrivals — independent, stationary
    • Exponential — still need some value for the mean
  – Number of “random” events in an interval: Poisson
  – Sum of independent “pieces”: normal
  – Product of independent “pieces”: lognormal
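The Min/Mode/Max option maps directly onto Python's standard-library triangular generator; a small sketch with made-up expert estimates (note the argument order: low, high, mode):

```python
import random

random.seed(7)
low, mode, high = 2.0, 3.0, 8.0   # hypothetical expert Min, Mode, Max (minutes)

# NOTE: random.triangular takes (low, high, mode), not (low, mode, high)
draws = [random.triangular(low, high, mode) for _ in range(10000)]

mean = sum(draws) / len(draws)
# triangular mean is (a + m + b) / 3; asymmetry comes from mode != midpoint
print(min(draws) >= low and max(draws) <= high, round(mean, 2))
```

The distribution is bounded on both sides, unlike the normal, which is one reason it is a popular no-data stand-in.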
Cautions on Using Normal Distributions
• Probably the most familiar distribution – the normal “bell curve” used widely in statistical inference
• But it has infinite tails in both directions … in particular, an infinite left tail, so it can always (theoretically) generate negative values
  – Many simulation input quantities (e.g., time durations) must be positive to make sense
  – Arena truncates negatives to 0
• If the mean μ is big relative to the standard deviation σ, then P(negative value) is small … one in a million
• But in simulation, one in a million can happen
• Moral – avoid the normal distribution as an input model
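The "one in a million" remark can be checked with the standard library; for an assumed mean of 10 and standard deviation of 2, the probability of a negative draw is Φ(-5):

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0                        # e.g., a processing time in minutes
p_negative = NormalDist(mu, sigma).cdf(0.0)  # P(X < 0) = Phi(-mu/sigma) = Phi(-5)
print(p_negative)                            # roughly 3e-7: "one in a million"
```

In a long run drawing millions of values, such a draw will eventually appear, and silently truncating it to 0 distorts the input distribution.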
Nonstationary Arrival Processes
• External events (often arrivals) whose rate varies over time
  – Lunchtime at fast-food restaurants
  – Rush-hour traffic in cities
  – Telephone call centers
  – Seasonal demands for a manufactured product
• It can be critical to model this nonstationarity for model validity
  – Ignoring peaks and valleys can mask important behavior
  – Can miss rush hours, etc.
• Good model: nonstationary Poisson process
Nonstationary Arrival Processes (cont’d.)
• Two issues:
  – How to specify/estimate the rate function
  – How to generate from it properly during the simulation
• Several ways to estimate the rate function — we’ll just do the piecewise-constant method
  – Divide the time frame of the simulation into subintervals over which you think the rate is fairly flat
  – Compute the observed rate within each subinterval
  – Be very careful about time units!
    • Model time units = minutes
    • Subintervals = half hour (= 30 minutes)
    • 45 arrivals in the half hour; rate = 45/30 = 1.5 per minute
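Generating correctly from such a piecewise-constant rate function is commonly done by thinning: generate candidate arrivals at the maximum rate and keep each with probability rate(t)/max rate. A sketch (the rates other than the slide's 1.5 per minute are made up):

```python
import random

# Assumed rates per MINUTE over successive 30-minute subintervals
# (the 1.5 matches the slide's 45 arrivals / 30 minutes).
rates = [0.5, 1.5, 0.8]
interval_len = 30.0
horizon = interval_len * len(rates)
lam_max = max(rates)

def rate(t):
    return rates[min(int(t // interval_len), len(rates) - 1)]

def nspp_arrivals(seed):
    """Thinning: candidate arrivals at rate lam_max, each kept with
    probability rate(t) / lam_max."""
    random.seed(seed)
    t, arrivals = 0.0, []
    while True:
        t += random.expovariate(lam_max)
        if t >= horizon:
            return arrivals
        if random.random() < rate(t) / lam_max:
            arrivals.append(t)

counts = [len(nspp_arrivals(s)) for s in range(30)]
print(sum(counts) / len(counts))   # averages near (0.5 + 1.5 + 0.8) * 30 = 84
```

Stretching or compressing a stationary stream instead of thinning is exactly the improper-generation trap the slide warns about.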
Multivariate and Correlated Input Data
• Usually we assume that all generated random observations across a simulation are independent (though from possibly different distributions)
• Sometimes this isn’t true:
  – A “difficult” part requires long processing in both the Prep and Sealer operations
  – This is positive correlation
• Ignoring such relations can invalidate model
• See textbook for ideas, references
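One simple way to see, and to induce, such positive correlation is a shared "difficulty" shock added to both processing times; this is only an illustrative sketch, not the textbook's method:

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(3)
prep, seal = [], []
for _ in range(5000):
    difficulty = random.expovariate(1.0)           # shared part-difficulty shock
    prep.append(2.0 + difficulty + random.expovariate(2.0))
    seal.append(1.0 + difficulty + random.expovariate(2.0))

r = pearson(prep, seal)
print(round(r, 2))   # strongly positive (theory here: 1 / 1.25 = 0.8)
```

Generating the two times independently would give r near 0 and, as the slide warns, an invalid model of this system.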
Data Collection and Input Modeling – Summary
• Interconnected with model construction, and one of the most time-consuming tasks
• Decide on deterministic vs. random inputs
• Decide on using available, historical data
• Collect data from system under study
• Use expert knowledge for estimates when no data is available or collection is not possible
Model Formulation and Construction
• Moving from specification and conceptual model to computer implementation
• Elaborate your modeling approach further
  – data structures, animation, different scenarios, etc.
• Decide what constructs to use
• Start constructing your model
• Remember: the model rarely works right the first time!
Verification and Validation
• System → Model → “Code”
• Validation: Is Model = System?
• Verification: Is “Code” = Model? (debugging)
• The Truth: Can probably never completely verify, especially for large models
Verification
• Verification is a procedure to ensure that the model is built according to specifications and to eliminate errors in the structure, algorithm, and computer implementation of the model.
• In other words:
  – Building the model right, or
  – Debugging the model
Some Techniques to Attempt Verification
• Eliminate error messages (obviously)
• Single-entity release; step through the logic
  – Set Max Batches = 1 in Arrive
  – Replace the part-type distribution with a constant
• “Stress” the model under extreme conditions
• Inspection by peers
• Check the model logic by drawing a flow diagram of all possible actions
• Verify using animation
• Debugging tools: watches, breakpoints, traces
Validation
• A process by which we ensure that the model is suitable for the purpose for which it was built
  – in other words: building the right model
• Is validation possible?
  – Can we demonstrate that a model is fully correct?
  – We can only invalidate or fail to invalidate.
• Black-box validation
  – Turing test, predictive power of the model
• Open-box validation
  – The detailed internal structure of the model should be compared with that of its reference structure
Face Validity
• The first goal of the simulation modeler is to construct a model that appears reasonable on its face to model users and others who are knowledgeable about the real system being simulated
• Conversations with system “experts”
• Observations of the system
• Existing theory
• Relevant results from similar simulation models
• Experience, intuition
Validation of Model Assumptions
• Structural assumptions
  – involve questions of how the system operates, and usually involve simplifications and abstractions of reality
• Data assumptions
  – based on the collection of reliable data
  – and correct statistical analysis of the data
Validating Input-Output Transformations
• Testing the model’s ability to predict the future behavior of the real system
  – under known historical circumstances
  – under new circumstances
• Validating models of new, non-existing systems
  – if it is an improvement or changed version of some existing system, consider modeling and validating that first
Model Credibility and Acceptance
• Credibility is reflected in the willingness of people to base their decisions on the information obtained from a model
• Can we actually produce reliable predictions of living systems such as companies?
• Instead of making predictions, models can be used to help a manager or a group of managers “make up their own minds”
• Suggests modeling for learning
Experimentation and Analysis
• Simple experimentation and design of experiments
  – The statistical design of experiments is a set of principles for evaluating and maximizing the information gained from an experiment.
  – For each system design, determine:
    • warm-up period (if any)
    • replication length
    • number of replications
• Appropriate statistical techniques should be used to analyze the simulation experiments
Statistical Analysis of Output Data: Terminating Simulations
• Random input leads to random output (RIRO)
• Run a simulation (once) — what does it mean?
  – Was this run “typical” or not?
  – Variability from run to run (of the same model)?
• Need statistical analysis of output data
• Time frame of simulations
  – Terminating: specific starting and stopping conditions
  – Steady-state: long-run (technically forever)
  – First: terminating
Strategy for Data Collection and Analysis
• For the terminating case, make IID replications
  – Simulate module: Number of Replications field
  – Check both boxes for Initialization Between Reps.
  – Get multiple independent Summary Reports
• How many replications?
  – Trial and error (for now)
  – Approximate number for acceptable precision (below)
  – Sequential sampling
• Save summary statistics across replications
  – Statistic module, Output Type, save to files
• Maybe turn off animation (Run/Setup/Mode)
Confidence Intervals for Terminating Systems
• Output Analyzer on files saved from the Outputs (cross-replication) area of the Statistic module
• Define, read in, and save Data Group(s)
• In the Output Analyzer
  – Analyze/Conf. Interval on Mean/Classical… menu
  – Add the desired files; select Lumped for Replications
Confidence Interval Dialogs
• Add files to the Data Group
• Select the files for the confidence intervals
• Select the “Lumped” Replications treatment to use all replications
• Can change the confidence level (95% is the default)
Confidence Interval Results
• Interpretation of confidence intervals
  – What’s being estimated
  – Coverage, precision, robustness to non-normality
Automatic Text-Only 95% Confidence Intervals
• At the end of the summary report, get 95% confidence intervals as above, in text-only format, if
  – You ask for more than one replication, and
  – You have a Statistics module with Outputs entries
• Done only for output measures in the Statistics module’s Outputs area

OUTPUTS
Identifier            Average    Half-width   Minimum    Maximum    # Replications
----------------------------------------------------------------------------------
avg WIP               11.327     1.7801       6.3666     18.704     20
Part 1 cycle avg      148.38     23.809       93.763     296.79     20
Part 2 cycle avg      189.47     25.602       124.38     296.29     20
Cell 4 avg Q length   1.5652     .64270       .48813     6.4610     20
Cell 2 avg Q length   1.7943     .66541       .45236     6.3355     20
Cell 1 avg Q length   1.7960     .52429       .45934     4.5093     20
Part 3 cycle avg      107.80     15.704       65.692     193.10     20
Cell 3 avg Q length   1.2976     .40001       .45141     3.3876     20
Half Width and Number of Replications
• Prefer smaller confidence intervals — precision
• Notation:
  – n = number of replications
  – X̄ = sample mean
  – s = sample standard deviation
  – t_{n-1, 1-α/2} = critical value from t tables
• Confidence interval: X̄ ± t_{n-1, 1-α/2} · s / √n
• Half-width = t_{n-1, 1-α/2} · s / √n
  – Want this to be “small,” say < h where h is prespecified
• Can’t control t or s
• Must increase n — how much?
Half Width and Number of Replications (cont’d.)
• Set half-width = h, solve for n:
  n = t²_{n-1, 1-α/2} · s² / h²
• Not really solved for n (t, s depend on n)
• Approximation:
  – Replace t by z_{1-α/2}, the corresponding normal critical value
  – Pretend that the current s will hold for larger samples
  – Get n ≅ z²_{1-α/2} · s² / h²
    (s = sample standard deviation from an “initial” number n₀ of replications)
• Easier but different approximation:
  n ≅ n₀ · h₀² / h²
    (h₀ = half width from an “initial” number n₀ of replications)
• n grows quadratically as h decreases.
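Both approximations are one-liners. A sketch, again z-approximated, with a hypothetical pilot study:

```python
import math
from statistics import NormalDist

# Pilot study: n0 replications gave sample std dev s and half width h0.
n0, s, h0 = 10, 1.09, 0.68
h = 0.25                                  # desired (prespecified) half width

z = NormalDist().inv_cdf(0.975)           # ~1.96 for 95% confidence
n_z = math.ceil((z * s / h) ** 2)         # n ~ z^2 s^2 / h^2
n_scaled = math.ceil(n0 * (h0 / h) ** 2)  # n ~ n0 h0^2 / h^2
print(n_z, n_scaled)                      # both grow quadratically as h shrinks
```

Halving the target half width roughly quadruples the required number of replications under either formula.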
Comparing Alternatives
• Usually, we want to compare alternative system configurations, layouts, scenarios, sensitivity analyses …
• Here: the transfer time (2 min) smells like a guess — does it matter if it’s, say, 1 vs. 3?
  – Call these alternatives A and B in Arena
• Single measure of performance: average WIP
• Make two sets of 20 replications, for A and B
  – Must rename the output files to distinguish them
Comparing Alternatives (cont’d.)
• Analyze/Compare Means menu (no button)
  – Add comparable data files for A and B
  – Lumped Replications
Comparing Alternatives (cont’d.)
• Results:
• The c.i. on the difference misses 0, so conclude that there is a (statistically) significant difference
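The Compare Means test is essentially a two-sample confidence interval on the difference of means. A sketch with hypothetical avg-WIP outputs for alternatives A and B (z-approximated):

```python
import math
from statistics import NormalDist, mean, variance

# Hypothetical avg-WIP outputs, one per replication, for each alternative:
a = [11.3, 10.2, 12.1, 11.8, 10.7, 11.5, 12.4, 10.9, 11.1, 11.6]  # transfer = 1
b = [13.0, 12.1, 14.2, 12.8, 13.5, 12.3, 14.0, 13.1, 12.6, 13.4]  # transfer = 3

diff = mean(a) - mean(b)
se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
z = NormalDist().inv_cdf(0.975)
lo, hi = diff - z * se, diff + z * se
print(hi < 0 or lo > 0)   # CI misses 0 here, so the difference is significant
```

If the interval had straddled 0, the honest conclusion would be "no detectable difference at this sample size," not "no difference."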
More analysis options
• Process Analyzer
  – The Process Analyzer is focused on post-model-development comparison of models.
  – At this stage, the simulation model is assumed to be complete, validated, and configured appropriately for use by the Process Analyzer.
  – The role of the Process Analyzer is then to allow comparison of the outputs from validated models based on different model inputs.
• OptQuest for Arena
  – Automatically searches for the best solution by changing specified controls and running Arena simulation models.
Statistical Analysis of Steady-State Simulations
• Recall the difference between terminating and steady-state simulations
  – Which is appropriate depends on the model and the study
• Now, assume steady-state is desired
  – Be sure this is so, since running and analysis are a lot harder than for terminating simulations
• Naturally, simulation run lengths can be long
  – Opportunity for a different internal computation order
  – Can change numerical results
  – Underscores the need for statistical analysis of output
Warm Up and Run Length
• Most models start empty and idle
  – Empty: no entities present at time 0
  – Idle: all resources idle at time 0
  – In a terminating simulation this is OK if realistic
  – In a steady-state simulation, though, this can bias the output for a while after startup
    • Bias can go either way
    • Usually downward (results are biased low) in queueing-type models that eventually get congested
    • Depending on the model, parameters, and run length, the bias can be very severe
Warm Up and Run Length (cont’d.)
• Remedies for initialization bias
  – Better starting state, more typical of steady state
    • Throw some entities around the model
    • Can be inconvenient to do this in the model
    • How do you know how many to throw, and where? (This is what you’re trying to estimate in the first place.)
  – Make the run so long that the bias is overwhelmed
    • Might work if the initial bias is weak or dissipates quickly
  – Let the model warm up, still starting empty and idle
    • Simulate module: Warm-Up Period (time units!)
    • “Clears” all statistics at that point for the summary report and any cross-replication data saved with the Statistics module’s Outputs area (but not Time-Persistent or Tallies)
Warm Up and Run Length (cont’d.)
• Warm-up and run-length times?
  – Most practical idea: preliminary runs, plots
  – Simply “eyeball” them
  – Statistics module, Time-Persistent and Tallies areas, then Plot with the Output Analyzer
    • Be careful about variability — make multiple replications, superimpose plots
    • Also, be careful to note “explosions”
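The "eyeball" procedure can be mimicked numerically: superimpose several replications and smooth with a moving average (the idea behind Welch's procedure). The queue path below is a toy stand-in, not a real model:

```python
import random
from statistics import mean

def toy_queue_path(p_arrive, steps, seed):
    """A toy queue-length path that starts empty (length 0) and drifts up
    toward steady state -- a stand-in for a saved Time-Persistent record,
    NOT a full queueing model."""
    random.seed(seed)
    q, path = 0, []
    for _ in range(steps):
        q = max(q + (1 if random.random() < p_arrive else -1), 0)
        path.append(q)
    return path

# Superimpose replications, then smooth with a centered moving average:
reps = [toy_queue_path(0.45, 2000, s) for s in range(5)]
avg_path = [mean(col) for col in zip(*reps)]
w = 100
smooth = [mean(avg_path[i - w:i + w]) for i in range(w, len(avg_path) - w)]

# The smoothed curve rises from 0 and then flattens; where the rise ends
# suggests the warm-up period, and the flat part suggests steady state.
print(mean(smooth[:200]) < mean(smooth[-200:]))
```

In practice you would plot `smooth` and eyeball it, exactly as the slide suggests.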
Warm Up and Run Length (cont’d.)
• No explosions
• All seem to be settling into steady state
• Run length seems adequate to reach steady state
• Hard to judge warm-up ...
Warm Up and Run Length (cont’d.)
• “Crop” the plots to time 0–5,000
  – Plot dialog, “Display Time from … to …”
• Conservative warm-up: maybe 2,000
• If measures disagreed, use the maximum warm-up
Truncated Replications
• If you can identify appropriate warm-up and run-length times, just make replications as for terminating simulations
  – Only difference: specify the Warm-Up Period in the Simulate module
  – Proceed with confidence intervals, comparisons, and all statistical analysis as in the terminating case
Truncated Replications (cont’d.)
• Output Analyzer, Classical Confidence Intervals
• Separate invocations due to different units
• Interpretation for steady-state expectations here
• Want them smaller?
  – More replications, same length
  – Longer replications, same number of them
Batching in a Single Run
• If the model warms up very slowly, truncated replications can be costly
  – Have to “pay” the warm-up on each replication
• Alternative: just one R E A L L Y long run
  – Only have to “pay” the warm-up once
  – Problem: you have only one “replication,” and you need more than that to form a variance estimate (the basic quantity needed for statistical analysis)
    • Big no-no: using the individual points within the run as “data” for a variance estimate
      – Usually correlated (not independent); variance estimate biased
Batching in a Single Run (cont’d.)
• Break each output record from the run into a few large batches
  – Tally (discrete-time) outputs: observation-based
  – Time-Persistent (continuous-time): time-based
• Take averages over the batches as the “basic” statistics for estimation: batch means
  – Tally outputs: simple arithmetic averages
  – Time-Persistent: continuous-time averages
• Treat the batch means as IID
  – Key: batch size large enough for low correlation (details in the text)
  – Still might want to truncate (once, time-based)
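The batch-means computation itself is straightforward. A sketch on i.i.d. data for simplicity; real within-run data is autocorrelated, which is exactly why the batches must be large:

```python
import math
import random
from statistics import mean, stdev

random.seed(11)
# Pretend this is one long within-run record of a Tally output.
run = [random.expovariate(0.2) for _ in range(10000)]    # true mean 5

def batch_means(data, n_batches):
    """Split the record into n_batches contiguous batches; return their means."""
    size = len(data) // n_batches
    return [mean(data[i * size:(i + 1) * size]) for i in range(n_batches)]

bm = batch_means(run, 20)                 # 20 large batches of 500 points each
xbar, s = mean(bm), stdev(bm)
half = 1.96 * s / math.sqrt(len(bm))      # z-approximate 95% half width
print(round(xbar, 2), round(half, 2))
```

With autocorrelated data, too-small batches understate `s` and give a deceptively narrow interval, which is the bias the slide on single runs warns about.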
Batching in a Single Run (cont’d.)
• Picture for WIP (time-persistent):
  – For observation-based Tallies, just count points
• To batch and analyze (details in the text):
  – Statistics module, Time-Persistent and Tally areas to save within-run records (could be big files)
  – Output Analyzer, Analyze/Batch/Truncate
    • Warning if the batches are too small to be treated as IID
  – Get the batch-means .flt file; Classical C.I. as before
Automatic Run-Time Confidence Intervals via Batch Means
• Arena will automatically attempt to form 95% confidence intervals on steady-state output measures via batch means
  – “Half Width” column in the summary output
  – Ignore it if you’re doing a terminating simulation
  – Uses internal rules for batch sizes (details in the text)
  – Won’t report anything if your run is not long enough
    • “(Insuf)” if you don’t have the minimum amount of data Arena requires even to form a c.i.
    • “(Correl)” if you don’t have enough data to form nearly-uncorrelated batch means, required to be safe
Recommendations, Other Methods for Steady-State Analysis
• What to do?
  – Frankly, try to avoid steady-state simulations
    • Look at the goal of the study
  – If you really do want steady state
    • First try warm-up and truncated replications
    • Automatic run-time batch-means c.i.’s
    • Batch-means c.i.’s “by hand” if necessary
• Other methods, goals
  – Large literature on steady-state analysis
  – Other goals: comparison, selection, ranking
Documenting and Presenting the Results
• A written final report or presentation
  – the quality of the report may determine whether your results are accepted
  – include an executive summary (one page or less)
• Program and progress documentation
• Disseminating the model
  – using the “Pack and Go” feature you can create run-only versions of your models
  – these models can be run with the Arena Viewer (no license required)
Implementation
• Depends on how well all the previous steps were completed
• Tangible product: clear recommendations about the action to be taken
• Improved knowledge and insight
  – can occur already at the problem-structuring phase
  – insight is gained during experimentation