Comparison of Alternatives Notes

Personal Notes on Simulation Methodology Azim Houshyar, Ph.D.

ANALYSIS OF SIMULATION DATA

Comparison and Evaluation of Alternative System Designs

Introduction

In most manufacturing settings, management is faced with decisions among competing system designs or alternative operating policies. Since choosing among alternatives without a detailed analysis of the suitability of each is not acceptable, simulation can be used to compare the alternatives before implementation; remember that simulation is a powerful tool for answering what-if questions. In this section, we discuss statistical analyses of the output from several different simulation models that might represent competing system designs or alternative operating policies. This is a very important subject, since the real utility of simulation lies in comparing such alternatives before implementation, and appropriate statistical methods are essential if we are to avoid making serious errors.

As an example, suppose we want to compare an M/M/1 queue, in which customers arrive at a rate of 1 per minute and are served with a mean service time of 0.9 minute, with a comparable M/M/2 queue, in which customers arrive at a rate of 1 per minute and are served by one of two servers, each with a mean service time of 1.8 minutes. Even though queueing theory tells us that customers wait less in queue in the second system, a single simulation run of each system will mislead us much of the time. There are some very useful statistical tools for this purpose; one example is a confidence interval on the difference between two means. For instance, suppose your company is considering the purchase of new equipment that is believed to improve throughput, and you are asked to run two simulations to compare the performance of the existing system with that of the new system to see whether there is a significant difference. How would you respond to management's inquiry?
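As a quick check on the queueing-theory claim in the M/M/1 versus M/M/2 example above, the steady-state mean waits in queue can be computed from the standard M/M/1 and Erlang C formulas. The short Python sketch below is not part of the original notes; it simply evaluates those textbook formulas for the rates given above.

```python
# Analytical mean waiting time in queue for the two systems described above.
# M/M/1: arrival rate lambda = 1/min, mean service time 0.9 min.
# M/M/2: arrival rate lambda = 1/min, two servers, mean service time 1.8 min each.

lam = 1.0                      # arrival rate (customers per minute)

# M/M/1 queue
mu1 = 1.0 / 0.9                # service rate of the single fast server
rho1 = lam / mu1               # utilization (0.9)
wq_mm1 = rho1 / (mu1 - lam)    # mean wait in queue: rho / (mu - lambda)

# M/M/2 queue (Erlang C formula for the probability of waiting)
mu2 = 1.0 / 1.8                # service rate of each of the two slower servers
c = 2
a = lam / mu2                  # offered load (1.8)
rho2 = a / c                   # per-server utilization (0.9)
p0 = 1.0 / (1 + a + a**2 / (2 * (1 - rho2)))     # P(system empty)
p_wait = (a**2 / (2 * (1 - rho2))) * p0          # Erlang C: P(all servers busy)
wq_mm2 = p_wait / (c * mu2 - lam)                # mean wait in queue

print(f"M/M/1 mean wait in queue: {wq_mm1:.2f} min")   # about 8.1 min
print(f"M/M/2 mean wait in queue: {wq_mm2:.2f} min")   # about 7.7 min
```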

Comparison of Two System Designs

Forming a confidence interval for the difference of the two expectations is a better comparison than a hypothesis test of whether the observed difference is significantly different from zero: it gives more information than simply accepting or rejecting the hypothesis. In most manufacturing processes the mean and standard deviation of a process are unknown and must be estimated from sample observations. Suppose you want to compare the performance of two processes. Because the simulation output data are stochastic, comparing the two systems on the basis of only one run of each disregards basic simulation methodology. For i = 1, 2, let {Xi1, Xi2, …, Xi,ni} be a sample of ni IID observations from system i, and let μi = E(Xij) be the expected response of interest. We want to construct a confidence interval for ζ = μ1 - μ2.

A Paired-t Confidence Interval: For simulating two different configurations with the same number of replications, say n, we can pair the observations from X1 and X2 for the j-th replication and calculate Dj = X1j - X2j, the difference between the two outputs on the j-th replication.

The Dj's are IID random variables with E(Dj) = ζ = μ1 - μ2. The sample mean of the differences and the estimated variance of that sample mean are:

D̄(n) = Σ Dj / n

S²[D̄(n)] = Σ [Dj - D̄(n)]² / [n(n-1)]

where both summations run from j = 1 to n. (The numerator divided by n-1 is the ordinary sample variance of the Dj's; dividing once more by n gives the estimated variance of the sample mean D̄(n).)

If the random variables Dj are normally distributed, then a 100(1-α)% confidence interval for the difference in means μ1 - μ2 is:

D̄(n) - t(n-1, 1-α/2)·S[D̄(n)] < μ1 - μ2 < D̄(n) + t(n-1, 1-α/2)·S[D̄(n)]

where t(n-1, 1-α/2) is the upper 1-α/2 critical point of the t-distribution with (n-1) degrees of freedom, and S[D̄(n)] is the square root of the estimated variance above. Even if the Dj are not normally distributed, the interval becomes approximately valid as the number of replications increases, by the central limit theorem. The importance of this approach is that there is no need to assume that X1 and X2 are independent, and certainly no need to assume that their variances are equal. In fact, positively correlating the two random variables helps reduce the variance and tighten the confidence interval. To positively correlate them, we can use the same streams of random numbers for simulating both configurations.

Allowing positive correlation between X1j and X2j can be of great importance, since this leads to a reduction of Var(Dj), and thus a smaller confidence interval. We will see that the method of common random numbers can induce this positive correlation between observations on the two systems - note that the Xij’s are random variables defined over an entire replication. For example, Xij might be the average of the 100 delays on the j-th replication, and is not the delay of an individual customer.
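A minimal Python sketch of the paired-t interval follows (not part of the original notes), assuming the per-replication outputs of the two configurations are already available and that SciPy is installed for the t critical point; the data in the example are purely illustrative.

```python
import math
from statistics import mean, variance
from scipy.stats import t   # t critical point

def paired_t_ci(x1, x2, alpha=0.10):
    """Paired-t confidence interval for mu1 - mu2.

    x1[j], x2[j] : outputs of configurations 1 and 2 on replication j
                   (same number of replications n for both systems).
    """
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]       # D_j = X_1j - X_2j
    d_bar = mean(d)                           # sample mean of the differences
    var_dbar = variance(d) / n                # estimated Var[D-bar(n)]
    hw = t.ppf(1 - alpha / 2, n - 1) * math.sqrt(var_dbar)
    return d_bar - hw, d_bar + hw

# Purely illustrative numbers: average delays from 10 paired replications
# (common random numbers would be used to induce positive correlation).
existing = [8.2, 7.9, 8.5, 8.1, 7.7, 8.4, 8.0, 8.3, 7.8, 8.6]
proposed = [7.6, 7.5, 8.0, 7.4, 7.2, 7.9, 7.5, 7.7, 7.3, 8.1]
print(paired_t_ci(existing, proposed))        # 90% CI for mu1 - mu2
```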

A Modified Two-Sample-t Confidence Interval: This method does not pair up the observations from the two systems, but does require that the X1j’s be independent of the X2j’s. However, n1 and n2 can now be different. Assume we make n1 replications for the first configuration and n2 replications for the second configuration. Then we have two independent random variables:

a) X1 with unknown mean μ1 and unknown variance σ1²

b) X2 with unknown mean μ2 and unknown variance σ2²

If it is reasonable to assume that the two variances are approximately equal, we can use classical statistics to find a 100(1-α)% confidence interval on the difference in means, μ1 - μ2. Denote the sample mean and variance of the n1 replications of the first system by X̄1 and S1², and the sample mean and variance of the n2 replications of the second system by X̄2 and S2². Then, if σ1² is approximately equal to σ2², the pooled estimator SD² is a good estimate of the common variance σ²:

SD² = [(n1-1)·S1² + (n2-1)·S2²] / (n1 + n2 - 2)

Moreover, the statistic T below is t-distributed with (n1 + n2 - 2) degrees of freedom (dof):

T = [(X̄1 - X̄2) - (μ1 - μ2)] / {SD·[1/n1 + 1/n2]^0.5}

A 100(1-α)% two-sided confidence interval on the difference in means, μ1 - μ2, is therefore:

(X̄1 - X̄2) - t(n1+n2-2, 1-α/2)·SD·[1/n1 + 1/n2]^0.5 < μ1 - μ2 < (X̄1 - X̄2) + t(n1+n2-2, 1-α/2)·SD·[1/n1 + 1/n2]^0.5
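A minimal Python sketch of this modified two-sample-t interval follows (not part of the original notes), under the same assumptions as the paired-t sketch above except that the runs of the two systems are independent and n1 and n2 may differ.

```python
import math
from statistics import mean, variance
from scipy.stats import t

def two_sample_t_ci(x1, x2, alpha=0.10):
    """Modified two-sample-t CI for mu1 - mu2, assuming independent runs
    and approximately equal variances; n1 and n2 may differ."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)       # sample variances S1^2, S2^2
    # pooled estimate of the common variance sigma^2
    sd_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    hw = (t.ppf(1 - alpha / 2, n1 + n2 - 2)
          * math.sqrt(sd_sq * (1.0 / n1 + 1.0 / n2)))
    diff = mean(x1) - mean(x2)
    return diff - hw, diff + hw
```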

The choice of either the paired-t or the modified approach will usually be made according to the situation. Note that the basic ingredient for most comparison techniques is a sample of IID observations whose expectation equals the performance measure on which the comparison is to be made. This is easily obtained for terminating simulations, because such observations come naturally by simply replicating the simulation some number of times. But what if we want to compare two (or more) systems on the basis of a steady-state measure of performance? Here we can no longer simply replicate the models, since initialization effects may bias the output.

There are means to solve this problem. For instance, if the warm-up period is long, we might want to use batch means on each alternative system to obtain approximately IID, unbiased observations; but to eliminate correlation between batches, we must take care to define the batches appropriately.

Comparison of Several System Designs

The Bonferroni inequality implies that if we want to make some number, say c, of confidence-interval statements, then we should make each separate interval at level 1 - α/c, so that the overall confidence level associated with all intervals covering their targets will be at least 1 - α. Note that the Bonferroni inequality is quite general: it does not matter how the individual confidence intervals are formed, they need not result from the same number of replications, nor must they be independent. Although there are many goals for comparing k systems, we will focus on the following two procedures:

1. Comparisons with a standard

2. All pairwise comparisons.

Comparisons with a standard: Suppose that one of the model variants is a standard, perhaps representing the existing system or policy. If we call the standard system 1 and the other variants systems 2, 3, …, k, the goal is to construct k-1 confidence intervals for the k-1 differences μ2-μ1, μ3-μ1, …, μk-μ1, with overall confidence level 1-α. Since we are making c = k-1 individual intervals, each should be constructed at level 1 - α/(k-1).

All pairwise comparisons: If we want to compare each system with every other system to detect and quantify any significant pairwise differences, one approach is to form confidence intervals for the differences μi2 - μi1 for all i1 and i2 between 1 and k with i1 < i2. We will have c = k(k-1)/2 individual intervals, so each must be made at level 1 - α/c in order to have a confidence level of at least 1 - α for all the intervals together.
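To make the Bonferroni arithmetic concrete, the short Python sketch below (not part of the original notes) computes the per-interval confidence level required for an overall level of 1 - α under each of the two goals above; the function and its names are purely illustrative.

```python
from itertools import combinations

def per_interval_level(k, alpha, goal="standard"):
    """Per-interval confidence level needed for an overall level of 1 - alpha."""
    c = (k - 1) if goal == "standard" else k * (k - 1) // 2   # all pairwise
    return 1 - alpha / c

# Example: k = 4 systems, overall alpha = 0.10
print(per_interval_level(4, 0.10, "standard"))   # 3 intervals, each at ~0.967
print(per_interval_level(4, 0.10, "pairwise"))   # 6 intervals, each at ~0.983

# The c = 6 pairwise comparisons for k = 4 systems
print(list(combinations(range(1, 5), 2)))        # (1, 2), (1, 3), ..., (3, 4)
```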

Ranking and selection of one of the k systems as being the best one: Let Xij be the random variable of interest from the j-th replication of the i-th system, and let μi = E(Xij). Assume that the Xij's are all independent of each other, i.e., the replications for a given alternative are independent, and the runs for different alternatives are also made independently. For example, Xij could be the average total cost per month for the j-th replication of policy i.

Let μil denote the l-th smallest of the μi's, so that μi1 ≤ μi2 ≤ … ≤ μik. Our goal is to select the system with the smallest expected response, μi1. Let "CS" denote this event of correct selection. Note that if μi1 and μi2 are actually very close together, we might not care if we erroneously choose system i2, so we want a method that avoids making a large number of replications just to resolve this unimportant difference.

The exact problem formulation is that we want P(CS) > P* provided that μi2 - μi1 > d*, where P* and d* are specified by the analyst. Consider two-stage sampling from each of the k systems. In the first stage, we make a fixed number of replications of each system, then use the resulting variance estimates to determine how many more replications of each system are necessary in the second stage of sampling in order to reach a decision. It must be assumed that the Xij's are normally distributed, but we do not have to assume that the values of σi² = Var(Xij) are known, nor that the σi² are the same for different i's. In the first-stage sampling, we make n0 > 2 replications of each of the k systems, and define the first-stage sample means and variances as follows:

X̄i(1)(n0) = Σ Xij / n0

Si²(n0) = Σ [Xij - X̄i(1)(n0)]² / (n0 - 1)

where both summations run from j = 1 to n0.

We then make Ni - n0 more replications of system i (i = 1, 2, …, k) and obtain the second-stage sample means:

X̄i(2)(Ni - n0) = Σ Xij / (Ni - n0), where the summation runs from j = n0 + 1 to Ni.

Finally, we define the weighted sample mean as:

X̄i(Ni) = wi1·X̄i(1)(n0) + wi2·X̄i(2)(Ni - n0)

The last step is to select the system with the smallest X̄i(Ni). The values of the total sample size Ni needed for system i, and the weights wi1 and wi2, can be found in Law and Kelton, page 597.
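The notes defer the expressions for Ni and the weights wi1, wi2 to Law and Kelton (page 597). Purely as an illustration of the two-stage structure, the Python sketch below follows the usual Dudewicz-Dalal form of those expressions, taking the tabled constant h1 and a hypothetical user-supplied model runner as inputs; the exact formulas and constants should be taken from Law and Kelton rather than from this sketch.

```python
import math
from statistics import mean, variance

def two_stage_select(first_stage, simulate, h1, d_star, n0):
    """Sketch of the two-stage select-the-smallest-mean procedure.

    first_stage[i] : the n0 first-stage outputs X_ij of system i
    simulate(i, m) : hypothetical user-supplied runner that performs m more
                     replications of system i and returns their outputs
    h1             : constant from the tables in Law and Kelton (depends on
                     k, P*, and n0)
    d_star         : the indifference amount d*
    """
    results = []
    for i, xs in enumerate(first_stage):
        xbar1 = mean(xs)                     # first-stage sample mean
        s2 = variance(xs)                    # first-stage sample variance Si^2(n0)
        # total sample size Ni (usual Dudewicz-Dalal form; verify in Law and Kelton)
        n_i = max(n0 + 1, math.ceil(h1 ** 2 * s2 / d_star ** 2))
        xbar2 = mean(simulate(i, n_i - n0))  # second-stage sample mean
        # weights wi1, wi2 (usual Dudewicz-Dalal form; verify in Law and Kelton)
        w1 = (n0 / n_i) * (1 + math.sqrt(
            1 - (n_i / n0) * (1 - (n_i - n0) * d_star ** 2 / (h1 ** 2 * s2))))
        w2 = 1 - w1
        results.append((w1 * xbar1 + w2 * xbar2, i))   # weighted sample mean
    return min(results)[1]                   # index of the selected system
```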

Statistical Methods for Estimating the Effect of Design Alternatives

In a previous section we assumed that various configurations were externally designated as the only feasible choices, and we ventured to determine the best configuration. In the design of experiments, however, there is very little advance guidance as to which model specifications may lead to optimal system performance. For instance, we may be interested in determining which of possibly many parameters and assumptions have the greatest effect on the system's effectiveness. Experimental design helps us decide, before the runs are made, which configurations to simulate so that the desired knowledge is extracted with minimal data collection and simulation effort. To learn more about systematic design of experiments we need to be familiar with some additional statistical procedures. Many standard experimental designs have little or no importance for simulation experiments because they were developed for physical experiments in which the experimenter lacks complete control over the experiment, or is unable to collect data for certain combinations of factor conditions. In design of experiments, we want to plan the strategic data-collection procedure so as to perform only the informative experiments. For this purpose, we need to define some experimental-design terminology: factors, responses, and levels.

Factor: Input parameters and structural assumptions of a model that will be changed in the course of the simulation. Factors can be either quantitative or qualitative. A quantitative factor is one whose levels can be measured on a numerical scale, while a qualitative factor represents structural assumptions that are not naturally quantified. WIP, number of machines, mean inter-arrival time, reorder point, and processing time are examples of quantitative factors, whereas queuing policy, ordering policy, and maintenance policy are examples of qualitative factors.

Response: Output performance measures, or quantities to be measured. Throughput, makespan, machine utilization, and delay in queue are some examples of responses.

Levels: Different settings or values used for the factors. A combination of the factors, each at a specified level, is called a treatment.

Because simulation is conducted in a completely controlled environment, and in particular, the simulation analyst controls the sources of random variation, it is possible to replicate a simulation model under identical conditions. There are several experimental designs used for estimating the effects of the factors, including:

1. the single-factor completely randomized experimental design

2. the factorial design with two factors.

In a model with only one factor, the design of experiments is simple: run n replications at each level of the factor, and use the concepts of the previous section to determine whether there is a significant difference between levels of the factor under consideration. Even with two factors, the techniques for comparing means can still be applied. But when the number of factors is three or more, we need to know more about factorial design. In a factorial design, we want to study the actual impact of the factors on the system's responses. When there are numerous factors of interest in an experiment, a factorial design should be used. These are designs in which factors are varied together; that is, in each replication of the experiment, all combinations of the levels of the factors are examined. A factorial design is a strategic plan for gaining information about the impact of the factors on the response. The design specifies how many runs of the simulation are to be performed, and what level or value of each factor is to be used for each run. So the factorial design provides more than a way to compare pre-specified alternatives; it also provides a strategy for determining which alternatives should be compared.

Single-Factor Completely Randomized Experimental Design (SFCR): When there is only one factor having some number of levels, say k, the experiment is called a single-factor experiment. The effect of level j of the factor is called τj. If the experiments of the model at each level and for different levels of the factor are based on independent streams of random numbers, the design is called a completely randomized design. Note that this condition implies that correlated sampling (use of common random numbers) is not used across factor levels. Note also that the numbers of replications at the different levels of the factor do not have to be equal. The statistical model for the analysis of the SFCR experimental design with k treatment levels is:

Yrj = μ + τj + εrj,    r = 1, 2, …, Rj and j = 1, 2, …, k

where:

Yrj is observation r of the response variable at level j of the factor.

μ is the overall mean effect.

τj is the effect due to level j of the factor.

εrj is a random error in observation r at level j, assumed to be N(0, σ²).

Rj is the number of observations made at level j.

k is the number of levels of the factor under study.

We will look at the model for which μ and the τj are assumed to be fixed and to satisfy Σ τj = 0 (the summation is for j going from 1 to k). This model is called the fixed-effects model. If the effects τj cannot be regarded as fixed but instead are chosen at random from some population that is assumed to be normally distributed, then the resulting model is a random-effects model.

The initial analysis of a single-factor fixed-effects completely randomized experiment consists of a statistical test of the hypothesis:

H0: τj = 0 (j = 1, 2, …, k). That is, the factor has no statistically significant effect on the response variable. The applicable statistical test is a one-way analysis of variance (ANOVA). The test consists of computing an F-statistic and comparing its value to an appropriate critical value. The layout used for the ANOVA is as follows:

                        Level j of the single factor
Replication r      1        2       …      j        …      k
     1            Y11      Y12      …     Y1j       …     Y1k
     2            Y21      Y22      …     Y2j       …     Y2k
     …             …        …              …               …
     Rj          YR1,1    YR2,2     …    YRj,j      …    YRk,k
Totals            T01      T02      …     T0j       …     T0k      T00
Means             Y01      Y02      …     Y0j       …     Y0k      Y00

The variation of the response variable, Yrj, about the overall sample mean, Y00, can be written as: Yrj - Y00 = (Y0j - Y00) + (Yrj - Y0j), where (Y0j - Y00) is due to the variation of a treatment mean from the grand mean, and (Yrj - Y0j) represents the deviation of a response from its treatment mean.

ΣΣ (Yrj - Y00)² = Σ Rj·(Y0j - Y00)² + ΣΣ (Yrj - Y0j)²

SSTotal = SSTreat + SSError

In the first and last terms the double summation runs over j = 1 to k and r = 1 to Rj; in the middle term the summation runs over j = 1 to k. Let R = Σ Rj denote the total number of observations. Note:

1. If the assumption of a common variance is correct, then MSE = SSE/(R-k) is an unbiased estimate of the variance σ² of the response variable Y, that is, E(MSE) = σ².

2. If H0 is true, then MSTreat = SSTreat/(k-1) is also an unbiased estimate of σ².

3. In any case, MSE and MSTreat are statistically independent when the data are normally distributed.

When H0 is true, SSTreat/σ² and SSError/σ² have chi-square distributions with (k-1) and (R-k) degrees of freedom, respectively. Therefore, the test statistic for testing the hypothesis H0 is:

F = MSTreat / MSE = [SSTreat/(k-1)] / [SSError/(R-k)]

When H0 is true, this test statistic has an F-distribution with (k-1) and (R-k) degrees of freedom. The ANOVA test of the hypothesis H0 is:

Reject H0 if F > F(1-α, k-1, R-k); fail to reject H0 if F ≤ F(1-α, k-1, R-k). Note that if the test indicates a statistically significant effect due to the factor, the analyst may be interested in estimating a 100(1-α)% confidence interval for (μ + τj) using:

Y0j ± t(R-k, 1-α/2) · sqrt(MSE/Rj)

Even though there are numerous software packages available for ANOVA, the following table can be used for manual calculation and is common to almost all software applications.

Source of Variation    Sum of squares    d.o.f.    Mean square                F
Treatment              SSTreat           k-1       MSTreat = SSTreat/(k-1)    MSTreat/MSE
Error                  SSE               R-k       MSE = SSE/(R-k)
Total                  SSTotal           R-1
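As a check on the manual calculation, the sums of squares and the F statistic above can be computed directly from the data. The Python sketch below is a minimal illustration (not from the original notes), assuming the observations are supplied as one list per factor level.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way (single-factor) ANOVA from the formulas above.

    groups[j] is the list of R_j observations at level j of the factor.
    Returns (SSTreat, SSE, MSTreat, MSE, F).
    """
    k = len(groups)
    big_r = sum(len(g) for g in groups)               # R = total number of observations
    grand_mean = sum(sum(g) for g in groups) / big_r  # Y00
    group_means = [mean(g) for g in groups]           # Y0j
    ss_treat = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    ms_treat = ss_treat / (k - 1)                     # MSTreat
    ms_error = ss_error / (big_r - k)                 # MSE
    return ss_treat, ss_error, ms_treat, ms_error, ms_treat / ms_error

# The resulting F value would then be compared with F(1-alpha, k-1, R-k),
# e.g. via scipy.stats.f.ppf(1 - alpha, k - 1, big_r - k).
```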

Factorial Designs with Two Factors: The statistical model for the analysis of a factorial design with two factors is:

Yijr = μ + Ai + Bj + ABij + εijr

where Yijr is the observation of the response variable Y for replication r at level i of the first factor (A) and level j of the second factor (B), Ai and Bj are the main effects of the two factors, ABij is their interaction effect, and εijr is the random error term.

To conduct the ANOVA test, assume that there are a levels of factor A, b levels of factor B, and k replications at each treatment combination, for a total of R = abk observations. Then:

SSTotal = ΣΣΣ (Yijr - Y000)²

SSA = b·k·Σ (Yi00 - Y000)²

SSB = a·k·Σ (Y0j0 - Y000)²

SSAB = k·ΣΣ (Yij0 - Yi00 - Y0j0 + Y000)²

SSE = SSTotal - SSA - SSB - SSAB

In the first summation, i goes from 1 to a, j goes from 1 to b, and r goes from 1 to k. In the second summation, i goes from 1 to a. In the third summation, j goes from 1 to b; and in the last summation, i goes from 1 to a and j goes from 1 to b. The layout used for manual calculation of the ANOVA is as follows:

Source of Variation    Sum of squares    d.o.f.        Mean square                    F
Factor A               SSA               a-1           MSA = SSA/(a-1)                MSA/MSE
Factor B               SSB               b-1           MSB = SSB/(b-1)                MSB/MSE
Factor AB              SSAB              (a-1)(b-1)    MSAB = SSAB/[(a-1)(b-1)]       MSAB/MSE
Error                  SSE               ab(k-1)       MSE = SSE/[ab(k-1)]
Total                  SSTotal           abk-1

This layout allows three hypotheses to be tested:

H01: Ai = 0 for all i

H02: Bj = 0 for all j

H03: ABij = 0 for all i and j
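The two-factor sums of squares above can likewise be computed directly. The following Python sketch is a minimal illustration (not from the original notes), assuming a full a x b layout with the k replications stored per treatment.

```python
from statistics import mean

def two_way_anova_ss(y):
    """Sums of squares for a two-factor factorial design.

    y[i][j] is the list of k replications at level i of factor A and
    level j of factor B (a x b treatments, k replications each).
    """
    a, b, k = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for row in y for cell in row for v in cell)        # Y000
    yi = [mean(v for cell in row for v in cell) for row in y]         # Yi00
    yj = [mean(v for row in y for v in row[j]) for j in range(b)]     # Y0j0
    yij = [[mean(cell) for cell in row] for row in y]                 # Yij0

    ss_total = sum((v - grand) ** 2
                   for row in y for cell in row for v in cell)
    ss_a = b * k * sum((m - grand) ** 2 for m in yi)
    ss_b = a * k * sum((m - grand) ** 2 for m in yj)
    ss_ab = k * sum((yij[i][j] - yi[i] - yj[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_error = ss_total - ss_a - ss_b - ss_ab
    return ss_total, ss_a, ss_b, ss_ab, ss_error
```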

Metamodeling: Suppose that there is a simulation output response variable, Y, that is related to k independent variables, say X1, X2, …, Xk. In most cases the functional relationship is unknown, and the analyst must select an appropriate function containing unknown parameters and then estimate those parameters from a set of data (Y, X). Regression analysis is one such method for estimating the parameters. As an example, suppose that it is desired to estimate the relationship between a single independent variable X and a dependent variable Y, and suppose that the true relationship between Y and X is linear:

E(Y | x) = β0 + β1·x

It is further assumed that each observation of Y can be described by the model:

Y = β0 + β1·x + ε

where ε is a random error with mean zero and constant variance σ². Suppose that there are n pairs of observations (y1, x1), …, (yn, xn). These observations may be used to estimate β0 and β1. In the method of least squares, β0 and β1 are estimated such that the sum of the squares of the deviations between the observations and the regression line is minimized.

β̂1 = [Σ yi·(xi - x̄)] / [Σ (xi - x̄)²]      and      β̂0 = ȳ - β̂1·x̄

where ȳ = Σ yi/n and x̄ = Σ xi/n, and the summations run over i = 1 to n (if the xi are coded so that x̄ = 0, this reduces to β̂0 = ȳ). Testing for significance of regression is one of many hypothesis tests that can be developed. Suppose the null hypothesis is H0: β1 = 0; then the appropriate test statistic for significance of regression is:

T0 = β̂1 / sqrt(MSE/Sxx), where MSE = Σ êi²/(n-2), Sxx = Σ xi² - (Σ xi)²/n,

and êi denotes the i-th residual.
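A minimal Python sketch of this least-squares fit and significance test follows (not part of the original notes; SciPy is assumed to be available for the t critical point).

```python
import math
from statistics import mean
from scipy.stats import t

def fit_and_test(x, y, alpha=0.05):
    """Least-squares fit of y = b0 + b1*x and the t-test for H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                 # Sxx
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar          # reduces to y_bar if x is coded so x_bar = 0
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)           # MSE = SSE/(n-2)
    t0 = b1 / math.sqrt(mse / sxx)                           # test statistic T0
    reject = abs(t0) > t.ppf(1 - alpha / 2, n - 2)           # significance of regression
    return b0, b1, t0, reject
```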
