Calibration of Microsimulation Models Using Nonparametric Statistical Techniques

Seung-Jun Kim, Wonho Kim, and L. R. Rilett

S.-J. Kim, Texas Transportation Institute, Texas A&M University System, 3136 TAMU, College Station, TX 77843-3136. W. Kim and L. R. Rilett, Mid-America Transportation Center, University of Nebraska, W348 Nebraska Hall, Lincoln, NE 68588-0531. Current affiliation for W. Kim: Seoul Development Institute, 391 Secho-Dong, Secho-Gu, Seoul, South Korea.

Transportation Research Record: Journal of the Transportation Research Board, No. 1935, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 111–119.


The calibration of traffic microsimulation models has received widespread attention in transportation modeling. A recent concern is whether these models can simulate traffic conditions realistically. The recent widespread deployment of intelligent transportation systems in North America has provided an opportunity to obtain traffic-related data. In some cases the distribution of the traffic data, rather than simple measures of central tendency such as the mean, is available. This paper examines a method for calibrating traffic microsimulation models so that simulation results, such as travel time, represent observed distributions obtained from the field. The approach is based on developing a statistically based objective function for use in an automated calibration procedure. The Wilcoxon rank–sum test, the Moses test, and the Kolmogorov–Smirnov test are used to test the hypothesis that the distributions of the simulated and the observed travel times are statistically identical. The approach is tested on a signalized arterial roadway in Houston, Texas. It is shown that potentially many different parameter sets result in statistically valid simulation results. More important, it is shown that using simple metrics, such as the mean absolute error, may lead to erroneous calibration results.

The calibration of traffic microsimulation models has received widespread attention because of the increase in their use in the evaluation of both traffic operations and transportation planning applications. The ability to accurately and efficiently model traffic flow characteristics, drivers’ behavior, and traffic control operations is critical for obtaining realistic microsimulation results. Because of the difficulty in collecting data in the field and the lack of readily available automatic calibration procedures, traffic microsimulation models are often used with default parameter values. If the parameters are adjusted, they are typically based on educated guesses or manual trial-and-error calibration approaches. If the traffic microsimulation model has inaccurate or inappropriate parameters, then there is a greater probability that incorrect results will be obtained and, ultimately, that could lead to faulty decisions.

In this paper calibration is defined as the process of adjusting the value of the microsimulation model parameters such that the observed data are “consistent” with the simulated data. There are three key points to the last sentence. The first is that collecting empirical data is difficult, expensive, and time-consuming (1). However, with the recent widespread deployment of intelligent transportation systems (ITS) nationwide there is an abundance of data on traffic systems and there is an opportunity to use these data for calibration. The second is that with the recent growth in computational resources it is now possible to develop automatic calibration procedures, based on standard optimization theory, that take account of these ITS data. Last, and more important from the perspective of this paper, it is unclear what “consistent” means in the context of data calibration. There are many types of ITS data acquisition technologies, and the data may be archived in different ways. For example, travel times of individual vehicles may be stored, the distributions of travel times may be stored, or measures of central tendency (e.g., the mean) and dispersion (e.g., the variance) may be stored.

The focus of this paper is on developing a statistically based definition of consistency and demonstrating this definition using an automatic calibration procedure. A genetic algorithm (GA) is used to find the best parameters. In this case the observed data are in the form of a distribution, as opposed to an average. The Wilcoxon rank–sum test, the Moses test, and the Kolmogorov–Smirnov test are used to test whether the observed and simulated travel time populations are statistically equivalent. To the authors’ knowledge this is the first research on the calibration of traffic microsimulation models that uses that approach. This paper consists of the following six sections: (a) literature overview, (b) implementation, (c) statistically based objective function, (d) calibration procedure, (e) analysis of results, and (f) concluding remarks.

OVERVIEW

Historically, traffic microsimulation model calibration was a simple procedure because of the lack of available data and the relatively high cost of manual search techniques. With the greater availability of ITS data and higher-speed computers, it is now possible to apply optimization theory to the automated calibration of simulation models.

Although there are numerous optimization procedures that have been used to calibrate traffic microsimulation models, the following techniques have been used most frequently: (a) manual search, (b) gradient approach, (c) simplex-based approach, and (d) artificial intelligence techniques.

In a manual search the selected parameters that need to be calibrated are explicitly changed on the basis of previous knowledge and experience with the simulation model. It is a commonly used, less complicated, and intuitive approach, and typically the transportation engineer decides a priori the criteria for starting and stopping points (2, 3). The gradient approach changes initial parameters on the basis of the perceived direction of the maximum increase of the objective function. The goal is to produce an optimal value of the objective function. Each parameter is changed in proportion to the magnitude of its slope (4, 5). The simplex-based approach can be thought of as one of the pattern search techniques, which assume that a successful move is worth being repeated. In general, a series of simple moves are repeated continually. The resulting simplex either grows or shrinks. The process repeats itself until no further improvement can be made (5, 6).

A GA is a problem-solving algorithm that emulates biological evolutionary theories to solve optimization problems. The use of a GA does not require mathematically sophisticated knowledge of the objective function to be optimized. The GA changes coded parameters based on probabilistic, not deterministic, rules, which are applied through selection, crossover, and mutation. The GA has been employed in a wide range of transportation applications, most of which are for timing traffic signals and calibrating simulation models (6–10).

Of more critical importance is the measure of similarity between the empirical observations and simulated results used in the calibration process. Many authors have used aggregated performance measures, such as average travel time or total traffic volume, and attempted to find the best parameter set that minimizes some objective function. A common measure is the mean absolute error ratio (MAER), which is shown in Equation 1. The parameter set that has the lowest MAER is selected as the “best” one.

$$\mathrm{MAER} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert S_i - O_i\rvert}{O_i} \qquad (1)$$

where

Si = metric (e.g., travel time or speed) from simulation model,
Oi = observed metric, and
n = number of observations.

When an aggregated performance measure such as the MAER is used, it is automatically assumed that the parameter set that produces the minimum MAER value is the best descriptor for real traffic conditions. However, that assumption is valid only when the distributions for the simulated and observed travel times are identical. That is, the only difference between the results from different parameter sets is the measure of central tendency (i.e., mean or median).

Kim calibrated a freeway corridor using empirical ITS data, in which travel time data were extracted from the automatic vehicle identification system (6). Both the average absolute error and relative absolute error of average volume and speed have been used by Cheu et al. in the calibration objective function and the fitness function (8). Bloomberg and May used the average speed of vehicles obtained from the main lane for calibrating a parameter set that was indicative of the real system (11). Rakha et al. noted a variation in observed field data due to the stochastic nature of traffic (12). Consequently, they emphasized a statistical approach that would quantify the variability in the simulated data relative to the field data. However, that approach was limited to daily variation in flow and to different simulation random seeds.

Although some authors have used more disaggregate data obtained from individual probe vehicles equipped with a Global Positioning System, these approaches tended to focus on validating the car-following logic on which the traffic microsimulation model is based rather than on calibrating a fully operational microsimulation model (13).
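The MAER of Equation 1, and the reason it can mislead when only aggregated measures are compared, can be illustrated with a minimal Python sketch. The function name `maer` and all travel times below are hypothetical values chosen for illustration; they are not data from the study.

```python
def maer(simulated, observed):
    """Mean absolute error ratio (Equation 1) for paired metrics S_i and O_i."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [abs(s - o) / o for s, o in zip(simulated, observed)]
    return sum(ratios) / len(ratios)


# Hypothetical travel times in seconds (illustration only).
observed_tt = [120.0, 120.0, 210.0, 210.0]   # bimodal: two distinct groups
simulated_tt = [165.0, 165.0, 165.0, 165.0]  # unimodal: every vehicle identical

mean_obs = sum(observed_tt) / len(observed_tt)
mean_sim = sum(simulated_tt) / len(simulated_tt)

# Comparing only the aggregated measure (the mean) reports a perfect match,
# even though the two samples clearly have different distributions.
print(maer([mean_sim], [mean_obs]))
```

In this toy case the means coincide, so the MAER of the aggregated metric is zero although no simulated vehicle resembles any observed one, which is the situation the distribution-based tests are meant to detect.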


IMPLEMENTATION

Test Bed

An arterial section of Bellaire Boulevard, which is located in the southwest section of Houston, Texas, was selected as the test bed. The location and map are shown in Figure 1. This arterial section is approximately 1.1 km in length and includes four intersections: three signalized intersections and one two-way stop–controlled intersection. The test bed is part of the Houston Bus Priority Project, and there are three metro bus stops located within its boundaries. Bellaire Boulevard, one of the major east–west arterials, has wide landscaped medians. As such the test bed is heavily traveled and serves relatively high-density residential areas. The traffic model used was VISSIM Version 3.70. It was selected because of its ability to obtain detailed information for each vehicle, which can be used to generate the distribution of the simulated travel times.

Travel Time Data

Travel time data were collected on October 16, 2003, during the a.m. peak period (7:30 to 8:30 a.m.). The average travel time was 164 s, and the standard deviation was 54 s. A relatively large variability in travel time was expected because the test bed is a signalized arterial that operates under a coordinated control system. The observed travel times in the study site exhibit a bimodal distribution as shown in Figure 2. The vehicles in the first peak represent those that receive a green signal; those in the second peak represent vehicles that are stopped by a red signal. The average travel time in the first peak (i.e., travel time less than 160 s) is 124 s, and the standard deviation is 18 s. In contrast, the average travel time in the second peak (i.e., travel time greater than 160 s) is 214 s, and the standard deviation is 42 s. Intuitively, the vehicles in the first peak would have a smaller standard deviation than those in the second peak because the vehicles in the progression band tend to travel at a similar speed to stay in the progression band.
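As a rough consistency check on these figures, the two peaks can be treated as a two-component mixture. The calculation below is only a sketch: it assumes that every observation falls in exactly one of the two groups and that the reported means and standard deviations are exact. The weight w (the share of vehicles in the first peak) is not reported in the text and is inferred here.

$$124\,w + 214\,(1 - w) = 164 \;\Rightarrow\; w = \tfrac{50}{90} \approx 0.56$$

$$\sigma^2 \approx w\,(18)^2 + (1 - w)\,(42)^2 + w\,(1 - w)\,(214 - 124)^2 \approx 180 + 784 + 2000 = 2964, \qquad \sigma \approx 54\ \mathrm{s}$$

Under these assumptions roughly 56% of the vehicles fall in the first peak, and the implied overall standard deviation agrees with the reported 54 s. Most of the overall variance comes from the separation between the two peaks rather than from the spread within either one.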

STATISTICALLY BASED OBJECTIVE FUNCTION

When traffic conditions in a test bed have large variability and a highly nonnormal distribution, an aggregated performance measure such as mean travel time may not be the most appropriate measure of effectiveness for calibration purposes. If the aggregated performance measure is used in the calibration process, there is a danger that inappropriate parameter sets may be selected. To avoid that situation this paper proposes a statistically based approach, which is based on a more disaggregate form of the observed travel time. Specifically, the “closeness” of the observed travel time distribution to that of the simulated travel time distribution is chosen as the objective function.

A conceptualization of the calibration process is shown in Figure 3. Note that because the process is statistically based, there may be numerous parameter sets (or none) that match the observed data. When that occurs, an alternative selection technique for identifying the “best” parameter set is required, as will be discussed later.

There are numerous statistical methods for testing whether two samples are drawn from the same population (14). The more popular of these techniques focus on testing the equality of the means or variances of the different distributions. The most popular methods are the Student's t-test for testing means and the F-test for testing variances. However, these tests do not examine the distribution of the metric, which may be rather restrictive for many transportation applications. In this study the metric of interest is the travel time on a signalized arterial street, and for this situation modeling the distribution of travel times would be important.

FIGURE 1 Site map for (a) Houston and (b) test bed. [Map; panel (b) shows Bellaire Blvd with bus stops A, B, and C.]

In these situations nonparametric or distribution-free methods for testing the difference between two sample populations are required. These techniques do not require a priori assumptions about the distribution of the underlying population other than that it is continuous. Two nonparametric testing approaches are used in this paper to test the difference between two populations, as discussed in the following sections.

Moses’ Distribution Free Rank-Like and Wilcoxon Rank–Sum Test

The Moses distribution-free rank-like test is used for testing the equality of dispersion (15). The Moses test was carried out before the Wilcoxon test for the median simply because, for the Moses test, knowledge of the true value of the median is not a prerequisite. The Moses test is made up of two subtasks. The first subtask, noted as Step 1 through Step 3 inclusive, constructs subsamples that are used to estimate dispersions. These dispersion estimates are similar to variance estimates but are, in fact, simply sums of squares. The second subtask, Step 4, is used to compare the dispersions in the two groups using the Wilcoxon test. Let X be the field observations and Y be the observations from the microsimulation.

Step 1. Select a positive integer k ≥ 2, and randomly divide the X and Y observations into m′ and n′ subgroups of size k.

Step 2. For i = 1, . . . , m′, define the ith subgroup of X consisting of k observations by Xi1, . . . , Xik; for i = 1, . . . , n′, define the ith subgroup of Y consisting of k observations by Yi1, . . . , Yik.

Step 3. Define C1, . . . , Cm′ and D1, . . . , Dn′ by the following:

$$C_i = \sum_{s=1}^{k}\left(X_{is} - \bar{X}_i\right)^2, \quad i = 1, \ldots, m' \qquad (2)$$

$$D_i = \sum_{s=1}^{k}\left(Y_{is} - \bar{Y}_i\right)^2, \quad i = 1, \ldots, n' \qquad (3)$$

where $\bar{X}_i$ and $\bar{Y}_i$ are the means of the ith subgroups of X and Y, respectively.

Step 4. Use the Wilcoxon rank–sum test on C and D values. This test examines whether the adjusted estimates of the dispersion for the two samples differ.

Step 5. If the Moses test is satisfied in Step 4, then use the Wilcoxon rank–sum test for the median.

If the test for equality of dispersion for the two samples is not satisfied in Step 4, the procedure ends at Step 4. Otherwise, the Wilcoxon rank–sum test, Step 5, follows the Moses test and is used to check the equality of location for the two populations. In other words, it checks whether the underlying distributions of the populations have equal medians. Note that the Wilcoxon test assumes that the distributions of the two populations have the same shape and spread and differ only in their locations. If the statistic for either of the tests significantly exceeds the value expected when the null hypothesis is true, that provides evidence against the hypothesis that the two distributions are identical.
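Steps 1 through 5 can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the default subgroup size k = 5 is arbitrary, leftover observations that do not fill a subgroup are dropped, and the rank–sum p-value uses the large-sample normal approximation without tie or continuity corrections.

```python
import math
import random


def moses_dispersions(values, k, rng):
    """Steps 1-3: random subgroups of size k and their sums of squares."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    usable = len(shuffled) - len(shuffled) % k  # assumption: drop leftovers
    sums = []
    for start in range(0, usable, k):
        group = shuffled[start:start + k]
        mean = sum(group) / k
        sums.append(sum((v - mean) ** 2 for v in group))
    return sums


def midranks(pooled):
    """Ranks 1..N of the pooled sample; tied values share their average rank."""
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    m, n = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:m])                       # rank sum of the first sample
    mean_w = m * (m + n + 1) / 2
    sd_w = math.sqrt(m * n * (m + n + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def moses_then_wilcoxon(x, y, k=5, alpha=0.05, seed=0):
    """Return (accepted, p_dispersion, p_location) for field data x and simulated data y."""
    rng = random.Random(seed)
    c = moses_dispersions(x, k, rng)
    d = moses_dispersions(y, k, rng)
    p_dispersion = rank_sum_p_value(c, d)    # Step 4
    if p_dispersion < alpha:
        return False, p_dispersion, None     # procedure ends at Step 4
    p_location = rank_sum_p_value(x, y)      # Step 5
    return p_location >= alpha, p_dispersion, p_location
```

Because the subgroups are formed at random, the dispersion result can vary with the seed; fixing the seed, as done here, keeps a calibration run repeatable.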

FIGURE 2 Observed travel time distribution from Houston test bed. [Histogram of frequency versus travel time (seconds); first peak: mean = 124, S.D. = 18; second peak: mean = 214, S.D. = 42.]

FIGURE 3 Conceptualization of disaggregated performance measure in calibration. [Diagram: simulated travel time data (xS1, . . . , xSm) and observed travel time data (xO1, . . . , xOn) enter a statistical test of the same distribution, FS(x) = FO(x), which yields acceptable parameter sets from a statistically “same” population.]

Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov test can be used to test the hypothesis that two populations have the same distribution. Let x1, . . . , xm be the field observations with cumulative distribution function (CDF) F1, and let y1, . . . , yn be the observations from the microsimulation with CDF F2. The null hypothesis is shown below:

$$H_0: F_1(x) = F_2(x) \quad \text{for all } x$$

The Kolmogorov–Smirnov test statistic is defined as follows:

$$D = \max_{x}\,\lvert F_1(x) - F_2(x)\rvert \qquad (4)$$

The hypothesis is rejected if the test statistic, D, is greater than the critical value obtained from a KS table, which is found in most statistics textbooks (16, 17). Note that there are a limited number of entries in these tables with respect to level of significance and number of samples. Consequently, the functions shown in Equations 5, 6, and 7, which can be used to approximate the KS table entries, were coded into the calibration program (17).

$$S(D) = Q_{KS}\!\left(\left[\sqrt{N_e} + 0.12 + \frac{0.11}{\sqrt{N_e}}\right] \times D\right) \qquad (5)$$

$$N_e = \frac{N_1 N_2}{N_1 + N_2} \qquad (6)$$

$$Q_{KS}(x) = 2\sum_{i=1}^{\infty}(-1)^{i-1} e^{-2 i^2 x^2} \qquad (7)$$

where

S(D) = level of significance,
Ne = effective number of data points,
N1, N2 = number of data points in simulated and observed distributions, and
QKS = monotonic function for computing significance level.

CALIBRATION PROCEDURE

The proposed automated calibration method employs a genetic algorithm (GA), which uses the nonparametric statistical testing methods discussed earlier. This section contains a brief overview of the VISSIM calibration parameters, the fundamentals of the GA, and the calibration procedure.
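The Kolmogorov–Smirnov statistic of Equation 4 and the significance approximation of Equations 5 through 7 can be sketched in Python as follows. The function names are hypothetical, the infinite series is truncated after a fixed number of terms, and the shortcut for small arguments is an assumption added here for numerical stability; none of these details are taken from the authors' calibration program.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Equation 4: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the gap can only change at observed values
        f1 = bisect_right(xs, v) / len(xs)
        f2 = bisect_right(ys, v) / len(ys)
        d = max(d, abs(f1 - f2))
    return d


def q_ks(lam, terms=100):
    """Equation 7, with the infinite series truncated after `terms` terms."""
    if lam < 0.3:
        # Assumption: the alternating series converges slowly near zero,
        # where the true value differs from 1 by roughly 1e-5 or less.
        return 1.0
    total = sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * lam * lam)
                for i in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_significance(x, y):
    """Equations 5 and 6: returns (D, approximate significance level S(D))."""
    d = ks_statistic(x, y)
    n_e = len(x) * len(y) / (len(x) + len(y))
    root = math.sqrt(n_e)
    return d, q_ks((root + 0.12 + 0.11 / root) * d)
```

A parameter set would then be retained when S(D) is at least the chosen significance level, that is, when the hypothesis of identical distributions is not rejected.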


VISSIM Calibration Parameters

VISSIM, which is based on a psychophysical driver behavior model developed by Wiedemann, attempts to capture both the physical and the human components of traffic (18). The basic concept of the psychophysical car-following model is that drivers of faster-moving vehicles are sensitive to the changes in distance and speed of (slower) moving vehicles located in front of them. Complete details of the theory may be found elsewhere (18).

VISSIM includes a variety of user-controlled parameters that can be used in a calibration process so that the simulated traffic output can match observed traffic data. The VISSIM calibration parameters can be placed into two general categories: (a) driver behavior parameters and (b) vehicle performance parameters. The base calibration parameters for VISSIM that have been considered in this research are the driver behavior parameters, which include both car-following and lane-changing parameters. The parameters are shown in Table 1, and a brief description of each is provided below:

1. Number of observed preceding vehicles. This parameter defines the number of vehicles (located ahead of the current vehicle) that a driver will consider when making a decision.

2. Look ahead distance. This parameter defines the distance (ahead) a driver will consider when making a decision.

3. Average standstill distance. This parameter defines the average desired distance between stopped cars.

4. Desired safety distance. There are two parameters associated with the desired safety distance: an additive parameter and a multiplicative parameter. These parameters affect the computation of the desired minimum following distance for low speed differences and are used to identify the range of the desired safety distance. The saturation flow rate of VISSIM is determined by adjusting these parameters.

5. Lane change distance. This defines the distance at which vehicles will begin to attempt to change lanes.

The minimum and maximum allowable values used in the calibration procedure and the ranges of calibrated parameters are also provided in Table 1. The minimum and maximum values were based on the engineering judgment of the authors.

Genetic Algorithm Process

Although the detailed theory behind the GA can be found in the literature, a basic description of the GA methodology and logic is presented here to aid in the comprehension of the simulation and calibration results.

TABLE 1 VISSIM Calibration Parameters

Parameter (Pi)                                      Unit   Default Value   Allowable Value (Min. to Max.)   Calibrated Value (Min. to Max.)
P1  Number of observed preceding vehicles           veh.   2               0 to 4                           0 to 4
P2  Look ahead distance                             m      250             0 to 400                         40 to 390
P3  Average standstill distance (AX)                m      2               1 to 4                           1 to 4
P4  Additive part of desired safety distance        NA     2               1 to 10                          1 to 8
P5  Multiplicative part of desired safety distance  NA     3               1 to 10                          1 to 10
P6  Lane change distance                            m      200             50 to 300                        80 to 290

The calibration parameters are encoded as strings of chromosomes that are uniquely mapped to each of the parameters. In each generation (i.e., iteration), the GA performs the following three operations: reproduction, crossover, and mutation. Reproduction starts with assigning a probability of being selected to each chromosome. The probabilities are calculated on the basis of the fitness values obtained by a predetermined fitness function whose input is, in this research, the p-value obtained from the KS test. The chromosomes with a higher fitness value are more likely to be selected during reproduction than those with a lower fitness value. After reproduction, the crossover operations are performed to create new offspring chromosomes from the parent chromosomes by exchanging genes. The mutation operation is performed to ensure that fresh solutions are considered. Following the process of reproduction, crossover, and mutation, a newly derived population is generated and another new competition takes place in which the weak candidates are discarded. This entire process is continued until the stopping rules are met. Previous work has shown that the GA can be successfully used for microsimulation calibration. Its main advantage is that it searches over multiple locations and consequently has less chance of becoming trapped in a local minimum.
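The three GA operations described above can be sketched in Python on bit-string chromosomes. The specific operator choices here (fitness-proportional selection, one-point crossover, and independent bit-flip mutation) are common textbook variants assumed for illustration; the text does not specify which variants were used, and the function names are hypothetical.

```python
import random


def select_parent(population, fitness, rng):
    """Reproduction: pick a chromosome with probability proportional to fitness."""
    if sum(fitness) == 0:  # no chromosome has any support yet
        return rng.choice(population)
    return rng.choices(population, weights=fitness, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: swap the tails of two equal-length bit strings."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if bit == "0" else "0") if rng.random() < rate else bit
                   for bit in chromosome)


rng = random.Random(42)
population = ["000000", "111111", "101010"]
fitness = [0.10, 0.60, 0.30]  # hypothetical KS p-values used as fitness
mother = select_parent(population, fitness, rng)
father = select_parent(population, fitness, rng)
child_a, child_b = crossover(mother, father, rng)
child_a = mutate(child_a, 0.3, rng)
```

Using the p-value directly as the selection weight means that parameter sets whose simulated distribution is closer to the observed one are proportionally more likely to become parents.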

Figure 4 provides an overview of the calibration procedure. It can be seen that the procedure is essentially iterative in that a series of candidates are identified, simulation results of the candidates are evaluated, and then a new population of candidates is generated. On the basis of the statistical test for the microscopic simulation output, parent chromosomes are identified and stored in the pool of accepted chromosomes. As can be seen in Figure 4, there are five steps in the procedure; each step is explained in the following subsections.


Step 1. Initialize GA Parameters and Significance Level for Statistical Test

The first step in the calibration procedure is the initialization of the GA parameters including the population size (P), mutation probability (Pm), and probability of crossover (Pc). In addition, the maximum number of accepted chromosomes (Nt), the maximum number of iterations (Nr), and the significance level (α) are identified along with a scheme of statistical tests. In this research, the following values were used: P = 30, Pm = 0.3, Pc = 0.5, Nt = 150, Nr = 100, and α = 0.05.

Step 2. Operate Microscopic Simulation Model

The microscopic traffic simulation model is run with the input file in which the parameters generated in the format of binary strings are translated into the appropriate VISSIM format.
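The translation from binary strings to parameter values can be sketched in Python as follows. The allowable ranges are those listed in Table 1; the linear mapping, the use of 8 bits per parameter, and the rounding of P1 to a whole number of vehicles are assumptions made for illustration, since the encoding details are not given in the text. Writing the decoded values into a VISSIM input file is not shown.

```python
# Allowable (min, max) ranges for P1..P6, taken from Table 1.
RANGES = [(0, 4), (0, 400), (1, 4), (1, 10), (1, 10), (50, 300)]
BITS = 8  # hypothetical number of bits per parameter


def decode(bits, low, high):
    """Map a bit string linearly onto the closed interval [low, high]."""
    return low + int(bits, 2) * (high - low) / (2 ** len(bits) - 1)


def decode_chromosome(chromosome):
    """Split a chromosome into BITS-wide genes and decode each parameter."""
    if len(chromosome) != BITS * len(RANGES):
        raise ValueError("chromosome length does not match the encoding")
    values = [decode(chromosome[i * BITS:(i + 1) * BITS], low, high)
              for i, (low, high) in enumerate(RANGES)]
    values[0] = round(values[0])  # assumption: P1 is a whole number of vehicles
    return values


print(decode_chromosome("1" * 48))  # every parameter at its allowable maximum
```

With this encoding the all-zero chromosome decodes to the allowable minimum of every parameter and the all-one chromosome to the maximum, so the search never leaves the ranges set by engineering judgment.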

Step 3. Evaluate Model Output, and Select Parameter Set

The evaluation of the model output and selection of the potential parameter set are the major components of the procedure proposed in this paper that distinguish it from other calibration methods. The model output for each candidate (i.e., chromosome) is evaluated using the statistical tests for equality of the populations. As discussed previously, two descriptive statistics, median and dispersion, and the maximum difference in the cumulative distribution function, are tested using nonparametric testing methods. The chromosomes that are accepted by both tests are stored in the pool of “acceptable” solutions, which are used in forming the new set of parent chromosomes. Any candidate parameter set that is rejected by either test is discarded.

FIGURE 4 Overview of calibration procedure. [Flowchart boxes: (1) Initialization: generate parents, set GA probabilistic parameters, set significance level; (2) Micro-Simulation Model: run N simulations, generate simulation results; (3) Statistically Based Objective Function: Moses’ test, Wilcoxon test, Kolmogorov–Smirnov test; (4) Stopping Criteria; (5) Genetic Algorithm: crossover and mutation. Other elements: ITS data (travel time distribution), pool of “acceptable” parameters, calibrated model.]
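The acceptance rule in Step 3 can be illustrated with a two-sample Kolmogorov–Smirnov check written in plain Python. This is a sketch under stated assumptions rather than the implementation used in the study: the p value uses the standard asymptotic series with a commonly used small-sample correction, the function names are hypothetical, and the Wilcoxon rank–sum and Moses tests, which would be applied in the same accept-or-discard manner, are omitted.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Maximum vertical gap between the two empirical CDFs."""
    obs, sim = sorted(observed), sorted(simulated)
    gap = 0.0
    # Both CDFs are step functions, so the gap only needs checking at data points.
    for value in obs + sim:
        f_obs = bisect_right(obs, value) / len(obs)
        f_sim = bisect_right(sim, value) / len(sim)
        gap = max(gap, abs(f_obs - f_sim))
    return gap


def ks_p_value(d: float, n: int, m: int, terms: int = 100) -> float:
    """Approximate two-sample p value from the asymptotic Kolmogorov series."""
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        # The alternating series converges slowly here; the p value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def is_accepted(observed: Sequence[float], simulated: Sequence[float],
                alpha: float = 0.05) -> bool:
    """Keep a candidate when equality of distributions is not rejected at alpha."""
    d = ks_statistic(observed, simulated)
    return ks_p_value(d, len(observed), len(simulated)) > alpha
```

A candidate is kept when the hypothesis of identical distributions is not rejected, which is why many different parameter sets can end up in the pool.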

Step 4. Check Stopping Rules

After the parent chromosomes are selected, the stopping rules established for the analysis are checked. If either the maximum number of acceptable chromosomes or the maximum number of iterations, both identified a priori, is reached, the process ends. If not, the algorithm proceeds to Step 5, and new offspring chromosomes are generated.
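The two stopping rules are easiest to see inside a skeleton of the overall loop. The sketch below is a simplified reading of the procedure, not the authors' code: the simulation and statistical tests are hidden behind a hypothetical `passes_tests` callable, offspring generation behind a hypothetical `make_offspring` callable, and the fallback used when the pool is still empty is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def run_calibration(
    initial: Sequence[str],
    passes_tests: Callable[[str], bool],
    make_offspring: Callable[[List[str]], List[str]],
    max_accepted: int = 150,
    max_iterations: int = 100,
) -> Tuple[List[str], int]:
    """Return the pool of accepted chromosomes and the number of iterations run."""
    accepted: List[str] = []
    candidates = list(initial)
    iterations = 0
    while True:
        # Steps 2 and 3: simulate each candidate and keep those that pass the tests.
        accepted.extend(c for c in candidates if passes_tests(c))
        iterations += 1
        # Step 4: stop as soon as either limit is reached.
        if len(accepted) >= max_accepted or iterations >= max_iterations:
            return accepted, iterations
        # Step 5: breed the next generation from the accepted pool. Falling back to
        # the current candidates while the pool is empty is an assumption of this sketch.
        parents = list(accepted) if accepted else candidates
        candidates = make_offspring(parents)
```

Because the pool is checked only after a whole generation has been evaluated, it may end slightly above the limit on accepted chromosomes.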

Step 5. Perform Crossover and Mutation Operations

The total number of offspring created in this step is the sum of the operation results from crossover (O1) and mutation (O2). The algorithm then returns to Step 2, in which the offspring chromosome parameter sets are simulated, and the process continues.
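A minimal sketch of Step 5 on bit-string chromosomes is given below. The specific operators are not described in this section, so single-point crossover, single-bit mutation, pairing of consecutive parents, and the interpretation of Pc and Pm as per-pair and per-chromosome probabilities are all assumptions, as are the function names.

```python
import random
from typing import List, Sequence, Tuple


def crossover(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Single-point crossover of two equal-length bit strings (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits: str, rng: random.Random) -> str:
    """Flip one randomly chosen bit."""
    i = rng.randrange(len(bits))
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]


def make_offspring(parents: Sequence[str], pc: float, pm: float,
                   rng: random.Random) -> List[str]:
    """Offspring are the crossover results (O1) plus the mutation results (O2)."""
    offspring: List[str] = []
    # O1: each consecutive pair of parents is crossed with probability pc.
    for a, b in zip(parents[::2], parents[1::2]):
        if rng.random() < pc:
            offspring.extend(crossover(a, b, rng))
    # O2: each parent yields one mutated copy with probability pm.
    for parent in parents:
        if rng.random() < pm:
            offspring.append(mutate(parent, rng))
    return offspring


rng = random.Random(0)  # fixed seed so the sketch is repeatable
children = make_offspring(["0000", "1111", "0101", "1010"], pc=0.5, pm=0.3, rng=rng)
```

Passing the random generator explicitly keeps runs reproducible, which helps when comparing calibration results across experiments.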

ANALYSIS OF CALIBRATION RESULTS

The proposed calibration procedure was applied to the observed link travel times of the test bed. The number of acceptable parameter sets was 74 for the Moses’ distribution-free rank-like and Wilcoxon rank–sum tests and 128 for the Kolmogorov–Smirnov (KS) test at a 0.05 level of significance.

Table 2 shows a sample of the acceptable parameter sets identified in the calibration. Also shown are the p values for the statistical tests and the MAER, which is calculated using observed and simulated travel times. The examples are listed in ascending order of MAER. Most of the acceptable parameter sets have a relatively low travel time MAER, which ranges from 0.007 (0.7%) to 0.067 (6.7%). Only 11 of the acceptable parameter sets have a travel time MAER above 0.07 (7%). It should be noted that the default parameter set results in a MAER of 0.215 (21.5%), indicating that, on average, the travel times obtained from an uncalibrated model would have an error of about 21%. More important, the travel time distribution obtained using the default parameter set passed neither of the distribution tests, which illustrates the importance of calibrating microsimulation models before using them.

Kim, Kim, and Rilett 117

There are a fairly large number of acceptable or “statistically valid” parameter sets. That shows the danger of identifying the “best” parameter set based on a single metric such as average travel time. Therefore, another technique will be required to select the best parameter set. Intuitively, the one with the lowest MAER might be selected because it would represent the parameter set that provides the closest measure of central tendency while still providing a statistically valid distribution. Alternatively, the parameter set with the highest p value could be selected as the one that most closely represents the observed distribution. Another selection criterion would be to select the parameter set that is the least different from the default values. Engineering judgment could also be used; for example, the parameter set that best represents the saturation flow rate might be an appropriate decision-making metric.
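To make the first two selection criteria concrete, the sketch below applies them to the seven sample rows of Table 2 only, not to the full pool of accepted sets. The variable names are illustrative.

```python
# Sample rows of Table 2: (No., Moses' p, Wilcoxon p, KS p, MAER).
TABLE_2_SAMPLE = [
    (1, 0.273, 0.610, 0.611, 0.007),
    (2, 0.767, 0.191, 0.691, 0.017),
    (3, 0.217, 0.345, 0.779, 0.028),
    (4, 0.110, 0.179, 0.915, 0.037),
    (5, 0.110, 0.475, 0.312, 0.045),
    (6, 0.263, 0.620, 0.696, 0.054),
    (7, 0.058, 0.766, 0.377, 0.067),
]

# Criterion 1: closest central tendency (lowest MAER).
lowest_maer = min(TABLE_2_SAMPLE, key=lambda row: row[4])
# Criterion 2: closest distribution according to the KS test (highest p value).
highest_ks_p = max(TABLE_2_SAMPLE, key=lambda row: row[3])

print("Lowest MAER: set", lowest_maer[0])
print("Highest KS p value: set", highest_ks_p[0])
```

On these sample rows the two criteria pick different parameter sets (No. 1 and No. 4, respectively), which is why a secondary criterion or engineering judgment is still needed.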

It is important to reiterate that simply using the MAER as the sole metric for identifying the best parameter set could lead to erroneous results. Figure 5 illustrates two simulated travel time distributions identified during the calibration process. The cumulative travel time distribution for each parameter set is shown in Figure 6. Although both distributions have a MAER value of 1%, it can be seen that the shapes of the two distributions are considerably different. The statistically valid parameter set results in a bimodal distribution similar to the observed distribution shown in Figure 2.
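The same point can be reproduced with synthetic numbers. In the sketch below, all travel times are made up, and `relative_mean_error` is a simple stand-in for a mean-based metric rather than the MAER definition used in the paper. Both simulated samples match the observed mean exactly, yet only one of them follows the bimodal shape.

```python
from bisect import bisect_right
from statistics import mean


def relative_mean_error(observed, simulated):
    """Stand-in mean-based metric: relative difference of the sample means."""
    return abs(mean(simulated) - mean(observed)) / mean(observed)


def max_cdf_gap(observed, simulated):
    """Largest vertical distance between the two empirical CDFs."""
    obs, sim = sorted(observed), sorted(simulated)
    return max(
        abs(bisect_right(obs, v) / len(obs) - bisect_right(sim, v) / len(sim))
        for v in obs + sim
    )


# Synthetic travel times in seconds; every sample has a mean of 149.5.
observed = list(range(90, 110)) + list(range(190, 210))   # two peaks
bimodal = list(range(91, 111)) + list(range(189, 209))    # two peaks, slightly shifted
unimodal = list(range(130, 170))                          # a single peak in between

for name, sample in (("bimodal", bimodal), ("unimodal", unimodal)):
    print(name, relative_mean_error(observed, sample), max_cdf_gap(observed, sample))
```

Both candidates have a mean error of zero, but the maximum CDF gap is 0.025 for the bimodal sample and 0.5 for the unimodal one, so only a distribution-based test separates them.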

The MAERs are plotted in Figure 7 as a function of the corresponding p values obtained from the KS test. The parameter sets that are circled (lower left corner) have relatively low MAERs but failed the nonparametric tests related to their distributions. With the statistical approach proposed in this paper, none of the circled parameter sets is ever selected.

To validate the calibrated parameter sets, the saturation flow rates for the accepted parameter sets were compared with that from the Highway Capacity Manual (HCM) (19). Figure 8 shows the saturation flow rates obtained using Equation 16-4 in the HCM and by the accepted parameter sets. The statistical test revealed that the saturation flow rates for the accepted parameter sets coincide with the values from the HCM. That is a further indication that the calibrated parameter sets were selected appropriately. It should be noted that for the given traffic network the default parameter set resulted in a saturation flow rate of 2,180 vehicles per hour per lane, which is approximately 36.2% higher than recommended by the HCM.

CONCLUDING REMARKS

There are numerous advantages of the proposed nonparametric calibration approach. First, it provides a statistically based approach that goes beyond simply identifying a parameter set that is closest to one simple metric. As shown in the paper, simple metrics such as MAER are not robust enough to identify parameter sets that mimic the actual travel time distribution. Second, there is the capacity for additional analysis of the candidate parameter sets. Therefore, analysts can bring their own knowledge in identifying the most appropriate of the candidate parameter sets. For example, the authors used the lowest difference in bus travel times between stops as their secondary objective because the simulation model was used in the performance evaluation of a transit signal priority system. In addition, although it was not done in this paper, it would be a relatively easy extension to incorporate the decision making into the automatic calibration procedure.

TABLE 2 Example of Accepted Parameter Sets

      Calibrated Parameters               P Value
No.   P1   P2     P3   P4   P5   P6       Moses’   Wilcoxon   KS Test   MAER
1     2    1300   1    5    5    80       0.273    0.610      0.611     0.007
2     2    290    1    8    1    210      0.767    0.191      0.691     0.017
3     2    240    3    5    4    270      0.217    0.345      0.779     0.028
4     4    190    2    5    3    110      0.110    0.179      0.915     0.037
5     3    80     2    5    4    270      0.110    0.475      0.312     0.045
6     3    90     2    5    5    270      0.263    0.620      0.696     0.054
7     1    270    2    4    7    280      0.058    0.766      0.377     0.067

KS = Kolmogorov–Smirnov

FIGURE 5 Comparison of travel time distributions: (a) simulated travel time distribution (1% MAER), rejected, and (b) simulated travel time distribution (1% MAER), accepted. [Histograms; x-axis: travel time; y-axis: frequency.]

FIGURE 6 Comparison of travel time CDF. [x-axis: travel time; y-axis: CDF; series: observed travel time CDF, simulated rejected travel time CDF, simulated accepted travel time CDF.]

FIGURE 7 Relationship between MAER and p value from KS test. [x-axis: p value; y-axis: MAER; series: accepted parameter, rejected parameter.]

For the test bed, the proposed calibration procedure was successful in reproducing travel time distributions that are a bimodal mixture of two distributions produced by the effects of signal progression. The travel time MAER was improved for all accepted parameter sets as compared with the default values. It should be noted that the procedure should be applied to other networks and other network types (e.g., freeways) to determine whether the results identified in this paper hold in other situations as well.

REFERENCES

1. Hu, K., S. Skehan, and R. Gephart. Implementation of a Smart Transit Priority System for Metro Rapid Bus in Los Angeles. Presented at 80th Annual Meeting of the Transportation Research Board, Washington, D.C., 2001.

2. Daigle, G., M. Thomas, and M. Vasudevan. Field Applications of Corsim: I-40 Freeway Design Evaluation, Oklahoma City. Proc., 1998 Winter Simulation Conference, Washington, D.C., 1998.

3. Horst, R., and P. Pardalos. Handbook of Global Optimization. Kluwer Academic Publishers, Dordrecht, Netherlands, 1995.

4. Beightler, C. S., D. Philips, and D. Wilde. Foundations of Optimization, 2nd ed. Prentice-Hall, Englewood, N.J., 1979.

5. Kleijnen, J. P. Sensitivity Analysis and Optimization in Simulation: Design of Experiments and Case Studies. Proc., 1995 Winter Simulation Conference, Los Angeles, Calif., 1995.

6. Kim, K. Optimization Methodology for the Calibration of Transportation Network Microsimulation Models. Ph.D. dissertation. Texas A&M University, College Station, 2002.

7. Kim, K., and L. R. Rilett. Genetic-Algorithm-Based Approach for Calibrating Microscopic Simulation Models. Proc., 2001 IEEE Intelligent Transportation Systems Conference, Oakland, Calif., 2001, pp. 698–704.

8. Cheu, R. L., X. Jin, K. C. Ng, Y. L. Ng, and D. Srinivasan. Calibration of FREESIM for Singapore Expressway Using Genetic Algorithm. Journal of Transportation Engineering, 1998, pp. 526–535.

9. Park, B., C. J. Messer, and T. Urbanik II. Enhanced Genetic Algorithm for Signal-Timing Optimization of Oversaturated Intersections. In Transportation Research Record: Journal of the Transportation Research Board, No. 1727, TRB, National Research Council, Washington, D.C., 2000, pp. 32–41.

10. Yin, Y. Genetic-Algorithms-Based Approach for Bilevel Programming Models. Journal of Transportation Engineering, 2000, pp. 115–120.

11. Bloomberg, L., and A. D. May. Simulation Modeling of the Santa Monica Freeway. California PATH Working Paper. UCB-ITS-PWP-94-14. 1994.

12. Rakha, H., M. Van Aerde, L. Bloomberg, and X. Huang. Construction and Calibration of a Large-Scale Microsimulation Model of the Salt Lake Area. In Transportation Research Record 1644, TRB, National Research Council, Washington, D.C., 1998, pp. 93–102.

13. Brockfeld, E., R. D. Kühne, and P. Wagner. Calibration and Validation of Microscopic Traffic Flow Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 1876, Transportation Research Board of the National Academies, Washington, D.C., 2004, pp. 62–70.

14. Devore, J. L. Probability and Statistics for Engineering and the Sciences, 4th ed. Duxbury Press, New York, N.Y., 1995.

15. Milton, J. S., and J. C. Arnold. Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 3rd ed. McGraw-Hill International Editions, New York, 1995.

16. O’Connor, P. D. T. Practical Reliability Engineering, 4th ed. John Wiley & Sons Ltd, Chichester, England, 2002.

17. Press, W. H., S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2002.

18. Brackstone, M., and M. McDonald. Car-Following: A Historical Review. Transportation Research Part F, 1999, pp. 181–196.

19. Highway Capacity Manual. TRB, National Research Council, Washington, D.C., 2000.

The Artificial Intelligence and Advanced Computing Applications Committee sponsored publication of this paper.

FIGURE 8 Saturation flow rate obtained by accepted parameter sets from KS test. [x-axis: parameter set; y-axis: saturation flow rate (veh/hr/ln); reference line: HCM 1606 veh/hr/ln.]