DCA Disaggregate 1

  • 7/28/2019 DCA Disaggregate 1

    Discrete choice models have played an important role in transportation modeling for the last 25
    years. They are mainly used to provide a detailed representation of the complex aspects of
    transportation demand, based on strong theoretical justifications. Moreover, several packages
    and tools are available to help practitioners use these models in real applications, making
    discrete choice models more and more popular.

    Discrete choice models are powerful but complex. The art of finding the appropriate model for a
    particular application requires from the analyst both a close familiarity with the reality of
    interest and a strong understanding of the methodological and theoretical background of the
    model.

    The main theoretical aspects of discrete choice models are reviewed in this paper. The main
    assumptions used to derive discrete choice models in general, and random utility models in
    particular, are covered in detail. The Multinomial Logit Model, the Nested Logit Model and the
    Generalized Extreme Value model are also discussed.

    In the context of transportation demand analysis, disaggregate models have played an important
    role over the last 25 years. These models consider that demand is the result of several decisions
    made by each individual in the population under consideration. These decisions usually consist of a
    choice made among a finite set of alternatives. An example of a sequence of choices in the context
    of transportation demand is described in Figure 1: choice of an activity (play-yard), choice of
    destination (6th street), choice of departure time (early), choice of transportation mode (bike) and
    choice of itinerary (local streets). For this reason, discrete choice models have been extensively
    used in this context.

    Figure 1: A sequence of choices

    A model, as a simplified description of reality, provides a better understanding of complex
    systems. Moreover, it allows for obtaining predictions of future states of the considered system,
    controlling or influencing its behavior, and optimizing its performance.

    The complex system under consideration here is a specific aspect of human behavior dedicated
    to choice decisions. The complexity of this "system" clearly requires many simplifying
    assumptions in order to obtain operational models. A specific model will correspond to a specific
    set of assumptions, and it is important from a practical point of view to be aware of these
    assumptions when prediction, control or optimization is performed.

    The assumptions associated with discrete choice models in general are detailed in Section 2.
    Section 3 focuses specifically on assumptions related to random utility models. Some of the most
    used models, the Multinomial Logit Model (Section 4), the Nested Logit Model (Section 5) and
    the Generalized Extreme Value Model (Section 6), are then introduced, with special emphasis on
    the Nested Logit model.

    Among the many publications that can be found in the literature, we refer the reader to
    Ben-Akiva and Lerman (1985), Anderson, De Palma and Thisse (1992), Hensher and Johnson (1981)
    and Horowitz, Koppelman and Lerman (1986) for more comprehensive developments.

    In order to develop models capturing how individuals are making choices, we have to make

    specific assumptions. We will distinguish here among assumptions about

    1. the decision-maker: these assumptions define who the decision-maker is, and what his/her
       characteristics are;
    2. the alternatives: these assumptions determine what the possible options of the
       decision-maker are;
    3. the attributes: these assumptions identify the attributes of each potential alternative that
       the decision-maker is taking into account to make his/her decision;
    4. the decision rules: they describe the process used by the decision-maker to reach his/her
       choice.

    In order to narrow down the huge number of potential models, we will consider some of these

    assumptions as fixed throughout the paper. It does not mean that there is no other valid

    assumption, but we cannot cover everything in this context. For example, even if continuous

    models will be briefly described, discrete models will be the primary focus of this paper.

    Decision-maker

    As mentioned in the introduction, choice models are referred to as disaggregate models. It means
    that the decision-maker is assumed to be an individual. In general, for most practical
    applications, this assumption is not restrictive. The concept of "individual" may easily be
    extended, depending on the particular application. We may consider that a group of persons (a
    household or a government, for example) is the decision-maker. In doing so, we decide to ignore
    all internal decisions within the group, and to consider only the decision of the group as a whole.
    The example described in Figure 1 reflects the decisions of a household, without accounting for
    all potential negotiations among the parents and the children. We will refer to "decision-maker"
    and "individual" interchangeably throughout the rest of the paper.

    Because of its disaggregate nature, the model has to include the characteristics, or attributes, of
    the individual. Many attributes, like age, gender, income, eye color or social security number,
    may be considered in the model.

    The analyst has to identify those that are likely to explain the choice of the individual. There is
    no automatic process to perform this identification. The knowledge of the actual application and
    the data availability play an important role in this process.

    Alternatives

    Analyzing the choice of an individual requires the knowledge of what has been chosen, but also
    of what has not been chosen. Therefore, assumptions must be made about options, or
    alternatives, that were considered by the individual to perform the choice. The set containing
    these alternatives, called the choice set, must be characterized.

    The characterization of the choice set depends on the context of the application. If we consider
    the example described in Figure 2, the time spent on each Internet site may be anything, as long as
    the total time is not more than two hours. The resulting choice set is represented in Figure 3,
    and is defined by the constraints that the time spent on each site is nonnegative and that the
    times sum to at most two hours.

    It is a typical example of a continuous choice set, where the alternatives are defined by some
    constraints and cannot be enumerated.

    Figure 2: Choice on Internet

    Figure 3: Example of a continuous choice set

    In this paper, we focus on discrete choice sets. A discrete choice set contains a finite number of
    alternatives that can be explicitly listed. The corresponding choice models are called discrete
    choice models. The choice of a transportation mode is a typical application leading to a discrete
    choice set. In this context, the characterization of the choice set consists in the identification of
    the list of alternatives. To perform this task, two concepts of choice set are considered: the
    universal choice set and the reduced choice set.

    The universal choice set contains all potential alternatives in the context of the application.
    Considering the mode choice in the example of Figure 1, the universal choice set may contain all
    potential transportation modes, like walk, bike, bus, car, etc. The alternative plane, which is
    also a transportation mode, is clearly not an option in this context and, therefore, is not included
    in the universal choice set.

    The reduced choice set is the subset of the universal choice set considered by a particular
    individual. Alternatives in the universal choice set that are not available to the individual under
    consideration are excluded (for example, the alternative car may not be an option for individuals
    without a driver's license). The decision-maker's awareness of the availability of the alternative
    should be considered as well. The reader is referred to Swait (1984) for more details on
    choice set generation. In the following, "choice set" will refer to the reduced choice set, except
    when explicitly mentioned.
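
    The construction of a reduced choice set can be sketched in a few lines of Python. The list of modes, the availability rules and the function name `reduced_choice_set` are assumptions made for this illustration only; they are not taken from Swait (1984) or from any particular software package.

```python
# Hypothetical universal choice set for the mode choice example of Figure 1.
UNIVERSAL_CHOICE_SET = ["walk", "bike", "bus", "car"]


def reduced_choice_set(universal, has_license, owns_bike, knows_bus_line):
    """Keep only the alternatives that are available to, and known by, the individual."""
    available = {
        "walk": True,
        "bike": owns_bike,        # physical availability
        "bus": knows_bus_line,    # awareness of the alternative
        "car": has_license,       # legal availability
    }
    # Alternatives without an explicit rule are treated as unavailable.
    return [alt for alt in universal if available.get(alt, False)]


# An individual without a driver's license who owns a bike and knows the bus line.
print(reduced_choice_set(UNIVERSAL_CHOICE_SET, has_license=False,
                         owns_bike=True, knows_bus_line=True))
```

    In a real application the availability rules would come from survey data rather than from hard-coded flags.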

    Attributes

    Each alternative in the choice set must be characterized by a set of attributes. Similarly to the
    characterization of the decision-maker described in Section 2.1, the analyst has to identify the
    attributes of each alternative that are likely to affect the choice of the individual. In the context
    of a transportation mode choice, the list of attributes for the mode car could include the travel
    time, the out-of-pocket cost and the comfort. The list for bus could include the travel time, the
    out-of-pocket cost, the comfort and the bus frequency. Note that some attributes may be generic
    to all alternatives, and some may be specific to an alternative (bus frequency is specific to bus).
    Also, qualitative attributes, like comfort, may be considered.

    An attribute is not necessarily a directly observed quantity. It can be any function of available
    data. For example, instead of considering travel time as an attribute, the logarithm of the travel
    time may be considered. The out-of-pocket cost may be replaced by the ratio between the
    out-of-pocket cost and the income of the individual. The definition of attributes as a function of
    available data depends on the problem. Several definitions must usually be tested to identify the
    most appropriate.

    Decision rules

    At this point, we have identified and characterized both the decision-maker and all available
    alternatives. We will now focus on the assumptions about the rules used by the decision-maker to
    come up with the actual choice. Different sets of assumptions can be considered, leading to
    different families of models. We will describe here three theories on decision rules, and the
    corresponding models. The neoclassical economic theory, described in Section 2.4.1, introduces
    the concept of utility. The Luce model (Section 2.4.2) and the random utility models (introduced
    in Section 2.4.3 and developed in Section 3) are designed to capture uncertainty.

    Neoclassical Economic Theory

    The neoclassical economic theory assumes that each decision-maker is able to compare two
    alternatives a and b in the choice set $C$ using a preference-indifference operator $\succeq$.
    If $a \succeq b$, the decision-maker either prefers a to b, or is indifferent. The
    preference-indifference operator is supposed to have the following properties:

    1. Reflexivity: $a \succeq a \quad \forall a \in C$

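    2. Transitivity: $a \succeq b$ and $b \succeq c \Rightarrow a \succeq c \quad \forall a, b, c \in C$

    3. Comparability: $a \succeq b$ or $b \succeq a \quad \forall a, b \in C$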

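    Because the choice set $C$ is finite, the existence of an alternative which is preferred to all
    the others is guaranteed, that is

    $$\exists\, a^* \in C \text{ such that } a^* \succeq a \quad \forall a \in C. \qquad (2)$$

    More interestingly, and because of the three properties listed above, it can be shown that the
    existence of a function

    $$U : C \rightarrow \mathbb{R} : a \mapsto U(a) \qquad (3)$$

    such that

    $$a \succeq b \iff U(a) \geq U(b) \quad \forall a, b \in C$$

    is guaranteed. Therefore, the alternative $a^*$ defined in (2) may be identified as

    $$a^* = \operatorname{argmax}_{a \in C} U(a).$$

    It follows that using the preference-indifference operator $\succeq$ to make a choice is equivalent to
    assigning a value, called utility, to each alternative, and selecting the alternative $a^*$ associated
    with the highest utility.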

    The concept of utility associated with the alternatives plays an important role in the context of
    discrete choice models. However, the assumptions of neoclassical economic theory present
    strong limitations for practical applications. Indeed, the complexity of human behavior suggests
    that a choice model should explicitly capture some level of uncertainty. The neoclassical
    economic theory fails to do so.

    The exact source of uncertainty is an open question. Some models assume that the decision rules
    are intrinsically stochastic, and even a complete knowledge of the problem would not overcome
    the uncertainty. Others consider that the decision rules are deterministic, and attribute the
    uncertainty to the analyst's inability to observe and capture all dimensions of the
    problem, due to its high complexity. Anderson et al. (1992) compare this debate with the one
    between Einstein and Bohr, about the uncertainty principle in theoretical physics. Bohr argued
    for the intrinsic stochasticity of nature and Einstein claimed that "Nature does not play dice".

    Two families of models can be derived, depending on the assumptions about the source of
    uncertainty. Models with stochastic decision rules, like the model proposed by Luce (1959),
    described in Section 2.4.2, or the "elimination by aspects" approach, proposed by Tversky
    (1972), assume a deterministic utility and a probabilistic decision process. Random Utility
    Models, introduced in Section 2.4.3 and developed in Section 3, are based on the deterministic
    decision rules from the neoclassical economic theory, where uncertainty is captured by random
    variables representing utilities.

    The Luce model

    An important characteristic of models dealing with uncertainty is that, instead of identifying one
    alternative as the chosen option, they assign to each alternative a probability to be chosen.

    Luce (1959) proposed the choice axiom to characterize a choice probability law. The choice
    axiom can be stated as follows.

    Denoting by $P_C(a)$ the probability of choosing a in the choice set $C$, and by $P_C(S)$ the
    probability of choosing one element of the subset $S$ within $C$, the two following properties hold
    for any choice set $U$, $C$ and $S$, such that $S \subseteq C \subseteq U$.

    1. If an alternative $a \in C$ is dominated, that is if there exists $b \in C$ such that b is always
       preferred to a or, equivalently, $P_{\{a,b\}}(a) = 0$, then removing a from $C$ does not modify
       the probability of any other alternative to be chosen, that is

       $$P_C(S) = P_{C \setminus \{a\}}(S \setminus \{a\}).$$

    2. If no alternative is dominated, that is if $0 < P_{\{a,b\}}(a) < 1$ for all $a, b \in C$, then the
       choice probability is independent from the sequence of decisions, that is

       $$P_C(a) = P_C(S)\, P_S(a) \quad \forall a \in S. \qquad (7)$$

    The independence described by (7) can be illustrated using an example of transportation mode
    choice, where we consider $C = \{\text{Car}, \text{Bike}, \text{Bus}\}$. We apply two different
    assumptions to compute the probability of choosing "car" as a transportation mode.

    1. The decision-maker may decide first to use a motorized mode (car or bus, in this case).
       The probability of choosing "car" is then given by

       $$P_C(\text{car}) = P_{\{\text{car},\text{bus}\}}(\text{car})\; P_C(\{\text{car},\text{bus}\}).$$

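    2. Alternatively, the decision-maker may decide first to use a private transportation mode
       (car or bike, in this case). The probability of choosing "car" is then given by

       $$P_C(\text{car}) = P_{\{\text{car},\text{bike}\}}(\text{car})\; P_C(\{\text{car},\text{bike}\}).$$

    Equation (7) of the choice axiom imposes that both assumptions produce the same probability,
    that is

    $$P_{\{\text{car},\text{bus}\}}(\text{car})\; P_C(\{\text{car},\text{bus}\}) = P_{\{\text{car},\text{bike}\}}(\text{car})\; P_C(\{\text{car},\text{bike}\}).$$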

    The second part of the choice axiom can be interpreted in a different way. Luce (1959) has
    shown that (7) is a sufficient and necessary condition for the existence of a function
    $v : C \rightarrow \mathbb{R}^+$, such that, for all $S \subseteq C$, we have

    $$P_S(a) = \frac{v(a)}{\sum_{b \in S} v(b)}. \qquad (11)$$

    Also, function v is unique up to a proportionality factor. If there exists $v'$ verifying (11), then

    $$v'(a) = k\, v(a) \quad \forall a \in C,$$

    where $k > 0$. Similarly to (3), $v$ may be interpreted as a utility function. We will elaborate
    more on this result in Section 4.
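
    The ratio form of the Luce model and the path independence of the car example can be checked numerically. The following Python sketch assumes illustrative positive values v(a) that are not taken from the paper; the helper names `luce_probability` and `subset_probability` are hypothetical.

```python
def luce_probability(v, subset, a):
    """Luce choice probability P_S(a) = v(a) / sum of v(b) over b in S."""
    if a not in subset:
        return 0.0
    return v[a] / sum(v[b] for b in subset)


def subset_probability(v, choice_set, subset):
    """P_C(S): probability that the alternative chosen in C belongs to S."""
    return sum(luce_probability(v, choice_set, b) for b in subset)


# Illustrative (assumed) positive values v(a) for each alternative.
v = {"car": 3.0, "bike": 1.0, "bus": 2.0}
C = ["car", "bike", "bus"]

# Direct probability of choosing the car within C.
direct = luce_probability(v, C, "car")

# First choose "motorized", then the car within {car, bus}.
via_motorized = (luce_probability(v, ["car", "bus"], "car")
                 * subset_probability(v, C, ["car", "bus"]))

# First choose "private", then the car within {car, bike}.
via_private = (luce_probability(v, ["car", "bike"], "car")
               * subset_probability(v, C, ["car", "bike"]))

# Multiplying v by a constant k > 0 leaves the probabilities unchanged.
v_scaled = {a: 10.0 * value for a, value in v.items()}
direct_scaled = luce_probability(v_scaled, C, "car")

print(direct, via_motorized, via_private, direct_scaled)
```

    With these values the three computations of the car probability agree up to rounding, and rescaling v has no effect, which is the uniqueness up to a proportionality factor stated above.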

    Random Utility Models

    Random utility models assume, like neoclassical economic theory, that the decision-maker has a
    perfect discrimination capability. In this context, however, the analyst is supposed to have
    incomplete information and, therefore, uncertainty must be taken into account. Manski (1997)
    identifies four different sources of uncertainty: unobserved alternative attributes, unobserved
    individual attributes (called "unobserved taste variations" by Manski, 1997), measurement errors
    and proxy, or instrumental, variables.

    The utility is modeled as a random variable in order to reflect this uncertainty. More specifically,
    the utility that individual i associates with alternative a is given by

    $$U_a^i = V_a^i + \varepsilon_a^i, \qquad (13)$$

    where $V_a^i$ is the deterministic part of the utility, and $\varepsilon_a^i$ is the stochastic part,
    capturing the uncertainty. Similarly to the neoclassical economic theory, the alternative with the
    highest utility is supposed to be chosen. Therefore, the probability that alternative a is chosen by
    decision-maker i within choice set $C$ is

    $$P_C^i(a) = P\left[\, U_a^i \geq U_b^i \quad \forall b \in C \,\right].$$

    Random utility models are the most used discrete choice models for transportation applications.

    Therefore, the rest of the paper is devoted to them.
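
    The choice probability of a random utility model can be approximated by simulation: draw the error terms, pick the alternative with the highest utility, and count. The sketch below is a minimal Python illustration under assumed deterministic utilities and independent standard normal errors; neither the values nor the function name `simulate_choice_probabilities` come from the paper.

```python
import random


def simulate_choice_probabilities(V, draw_error, n_draws=20000, seed=0):
    """Estimate P(a) = P[U_a >= U_b for all b] by Monte Carlo simulation.

    V maps each alternative to its deterministic utility; draw_error(rng)
    returns one draw of the random term.
    """
    rng = random.Random(seed)
    counts = {a: 0 for a in V}
    for _ in range(n_draws):
        # Random utility = deterministic part + random part.
        utilities = {a: V[a] + draw_error(rng) for a in V}
        chosen = max(utilities, key=utilities.get)
        counts[chosen] += 1
    return {a: counts[a] / n_draws for a in V}


# Assumed deterministic utilities, for illustration only.
V = {"car": 0.5, "bus": 0.0, "bike": -0.5}
probs = simulate_choice_probabilities(V, lambda rng: rng.gauss(0.0, 1.0))
print(probs)
```

    Changing `draw_error` changes the model family; the rest of the procedure stays the same.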

    Random utility models

    The derivation of random utility models is based on a specification of the utility as defined by
    (13). Different assumptions about the random term and the deterministic term will produce
    specific models. We present here the most usual assumptions that are used in practice. In
    Section 3.1, common assumptions about the random part of the utility are discussed. The
    deterministic part is treated in Section 3.2.

    Assumptions on the random term

    We will focus here on assumptions about the mean, the variance and the functional form of the

    random term.

    Figure 4: A binary model

    For all practical purposes, the mean of the random term is usually supposed to be zero. It can be
    shown that this assumption is not restrictive. We show it here on a simple example. Considering the
    example described in Figure 4, with utilities $U_1 = V_1 + \varepsilon_1$ and
    $U_2 = V_2 + \varepsilon_2$, we denote the mean of the error term of each alternative by
    $m_1$ and $m_2$, respectively. Then, the error terms can be specified as

    $$\varepsilon_1 = m_1 + \varepsilon_1'$$

    and

    $$\varepsilon_2 = m_2 + \varepsilon_2'$$

    where $\varepsilon_1'$ and $\varepsilon_2'$ are random variables with zero mean. Therefore,

    $$P_{\{1,2\}}(1) = P\left[ V_1 + m_1 + \varepsilon_1' \geq V_2 + m_2 + \varepsilon_2' \right]. \qquad (17)$$

    The terms $m_1$ and $m_2$, called Alternative Specific Constants (ASC), capture the mean of
    the error term. Therefore, it can be assumed without loss of generality that the error terms have
    zero mean if the model specification includes these ASCs.

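    In practice, it is impossible to estimate the value of all ASCs from observed data. Considering
    again the example of Figure 4, the probability of choosing alternative 1, say, is not modified if an
    arbitrary constant K is added to both utilities. Therefore, only the difference between the two
    ASCs can be identified. Indeed, from (17), we have

    $$P_{\{1,2\}}(1) = P\left[ V_1 + m_1 + K + \varepsilon_1' \geq V_2 + m_2 + K + \varepsilon_2' \right]$$

    for any $K \in \mathbb{R}$. If $K = -m_1$, we obtain

    $$P_{\{1,2\}}(1) = P\left[ V_1 + \varepsilon_1' \geq V_2 + (m_2 - m_1) + \varepsilon_2' \right]$$

    or, equivalently, defining $M = m_2 - m_1$,

    $$P_{\{1,2\}}(1) = P\left[ V_1 + \varepsilon_1' \geq V_2 + M + \varepsilon_2' \right].$$

    Defining $K = -m_2$ produces the same result. This property can be generalized easily to models
    with more than two alternatives, where only differences between ASCs can be identified.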
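
    The fact that only the difference between the two constants matters can be verified by simulation. The following Python sketch assumes a binary model with independent standard normal error terms and arbitrary illustrative values; the function name `choice_frequency` is hypothetical.

```python
import random


def choice_frequency(V1, V2, asc1, asc2, n_draws=10000, seed=1):
    """Simulated frequency with which alternative 1 has the highest utility."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)  # zero-mean error of alternative 1
        e2 = rng.gauss(0.0, 1.0)  # zero-mean error of alternative 2
        if V1 + asc1 + e1 >= V2 + asc2 + e2:
            wins += 1
    return wins / n_draws


# Same error draws (same seed) in the three cases below.
base = choice_frequency(1.0, 0.5, asc1=0.3, asc2=0.8)
# Adding the same constant K = 5 to both utilities.
shifted = choice_frequency(1.0, 0.5, asc1=5.3, asc2=5.8)
# Constraining the first constant to zero and keeping the difference 0.5.
normalized = choice_frequency(1.0, 0.5, asc1=0.0, asc2=0.5)

print(base, shifted, normalized)
```

    The three frequencies coincide (up to floating-point ties, which are practically impossible here), because each comparison depends on the constants only through their difference.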

    It is common practice to constrain one ASC in the model to zero. From a modeling viewpoint,
    the choice of the particular alternative whose ASC is constrained is purely arbitrary. However,
    Bierlaire, Lotan and Toint (1997) have shown that the estimation process is influenced by this
    choice. They propose a different technique of ASC specification which is optimal from an
    estimation perspective.

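    To derive assumptions about the variance of the random term, we observe that the scale of the
    utility may be arbitrarily specified. Indeed, for any $\alpha > 0$, we have

    $$P\left[ U_1 \geq U_2 \right] = P\left[ \alpha U_1 \geq \alpha U_2 \right].$$

    The arbitrary decision about $\alpha$ is equivalent to assuming a particular variance v of the
    distribution of the error term. Indeed, if

    $$\operatorname{Var}(\alpha \varepsilon) = \alpha^2 \operatorname{Var}(\varepsilon) = v,$$

    we have also

    $$\alpha = \sqrt{\frac{v}{\operatorname{Var}(\varepsilon)}}. \qquad (21)$$

    We will illustrate this relationship with several examples in the remainder of this section.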

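    Once assumptions about the mean and the variance of the error term distribution have been
    defined, the focus is now on the actual functional form of this distribution. We will consider here
    three different distributions yielding three different families of models: linear, probit and logit
    models.

    The linear model is obtained from the assumption that the density function of the error term is
    given by

    $$f(x) = \begin{cases} \dfrac{1}{2L} & \text{if } -L \leq x \leq L \\ 0 & \text{otherwise} \end{cases}$$

    where $L \in \mathbb{R}$, $L > 0$, is an arbitrary constant. This density function is used to derive the
    probability of choosing one particular alternative. Considering the example presented in
    Figure 4, and applying this density to the difference $\varepsilon_2 - \varepsilon_1$, the probability
    is given by (23) (see Figure 5).

    $$P_{\{1,2\}}(1) = \begin{cases} 0 & \text{if } V_1 - V_2 < -L \\ \dfrac{V_1 - V_2 + L}{2L} & \text{if } -L \leq V_1 - V_2 \leq L \\ 1 & \text{if } V_1 - V_2 > L \end{cases} \qquad (23)$$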

    Figure 5: Linear model

    The linear model presents some problems for real applications. First, the probability associated
    with extreme values ($|x| > L$ in the example) is exactly zero. Therefore, if any extreme
    event happens in reality, the model will never capture it. Second, the discontinuity of the
    derivatives at $-L$ and $L$ causes problems for most estimation procedures. We conclude the
    presentation of the linear model by emphasizing that the constant L determines the scale of the
    distribution. For the binary example, $\operatorname{Var}(\varepsilon_2 - \varepsilon_1) = L^2/3$.
    Using (21), we have that assuming $\operatorname{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = v$ is
    equivalent to assuming $\alpha = \sqrt{3v}/L$. A common value for
    L is 1/2, that is $\alpha = 2\sqrt{3v}$.
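
    The probability of the linear model is easy to compute once the error difference is taken to be uniformly distributed on [-L, L]. The Python sketch below is written under that assumption; the function name `linear_probability` and the argument names are hypothetical.

```python
def linear_probability(V1, V2, L=0.5):
    """Probability of choosing alternative 1 in the binary linear model.

    Assumes that the difference of the two error terms is uniformly
    distributed on [-L, L], so the probability is linear in V1 - V2
    between the two bounds and constant outside.
    """
    d = V1 - V2
    if d <= -L:
        return 0.0
    if d >= L:
        return 1.0
    return (d + L) / (2.0 * L)


print(linear_probability(0.0, 0.0))    # equal utilities
print(linear_probability(0.25, 0.0))   # inside the linear range
print(linear_probability(2.0, 0.0))    # beyond L: the alternative is always chosen
```

    The last call illustrates the first drawback mentioned above: once the utility difference exceeds L, the model assigns zero probability to the other alternative.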

    The Normal Probability Unit, or Probit, model is derived from the assumption that the error
    terms are normally distributed, that is

    $$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2}$$

    where $\sigma > 0$ is an arbitrary constant. This density function is used to derive the
    probability of choosing one particular alternative. Considering the example presented in

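    Figure 4, and assuming that $\varepsilon_1$ and $\varepsilon_2$ are normally distributed with zero
    mean, variances $\sigma_1^2$ and $\sigma_2^2$ respectively, and covariance $\sigma_{12}$, the
    probability is given by (25) (see Figure 6).

    $$P_{\{1,2\}}(1) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{V_1 - V_2} e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2} dx \qquad (25)$$

    where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$ is the variance of $\varepsilon_2 - \varepsilon_1$.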

    Figure 6: Probit model

    The probit model is motivated by the Central Limit Theorem, assuming that the error terms
    are the sum of independent unobserved quantities. Unfortunately, the probability function (25)
    has no closed analytical form, which limits the practical use of this model. We refer the reader to
    Daganzo (1979) for a comprehensive development of probit models. We conclude this short
    introduction to the probit model by looking at the scale parameter. Considering again the binary
    example presented in Figure 4 in the probit context, we have
    $\operatorname{Var}(\varepsilon_2 - \varepsilon_1) = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12} = \sigma^2$.
    Using (21), we have that assuming $\operatorname{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = v$ is
    equivalent to assuming $\alpha = \sqrt{v}/\sigma$. It is common practice to arbitrarily define
    $\sigma = 1$, that is $\alpha = \sqrt{v}$.
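
    Although the probit probability has no closed form, the binary case reduces to one evaluation of the standard normal distribution function, which can be computed numerically from the error function. The Python sketch below assumes the binary setting of Figure 4 with jointly normal error terms; the function names and default values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, evaluated through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binary_probit_probability(V1, V2, var1=0.5, var2=0.5, cov12=0.0):
    """Probability of choosing alternative 1 in a binary probit model.

    The difference of the two error terms is normal with variance
    var1 + var2 - 2 * cov12, so the probability is the normal
    distribution function evaluated at (V1 - V2) / sigma.
    """
    sigma = math.sqrt(var1 + var2 - 2.0 * cov12)
    return normal_cdf((V1 - V2) / sigma)


print(binary_probit_probability(0.0, 0.0))
print(binary_probit_probability(1.0, 0.0))
```

    The default variances are chosen so that the difference of the error terms has unit variance, which corresponds to the usual normalization of the scale. For more than two alternatives the probability involves a multidimensional integral, which is where the practical difficulty lies.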

    Despite its complexity, the probit model has been applied to many practical problems (see
    Whynes, Reed and Newbold, 1996, Bolduc, Fortin and Fournier, 1996, Yai, Iwakura and
    Morichi, 1997 among recent publications). However, the most widely used model in practical
    applications is probably the Logistic Probability Unit, or Logit, model. The error terms are now
    assumed to be independent and identically Gumbel distributed. The density function of the
    Gumbel distribution is given by (26) (see Figure 7).

    $$f(x) = \mu\, e^{-\mu (x - \eta)} \exp\left( -e^{-\mu (x - \eta)} \right) \qquad (26)$$

    where $\eta$ is the location parameter, and $\mu > 0$ is the scale parameter.

    Figure 7: Gumbel distribution

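    The mean of the Gumbel distribution is

    $$\eta + \frac{\gamma}{\mu}$$

    where

    $$\gamma = \lim_{k \rightarrow \infty} \left( \sum_{i=1}^{k} \frac{1}{i} - \ln k \right) \approx 0.5772$$

    is the Euler constant. The variance is

    $$\frac{\pi^2}{6 \mu^2}.$$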

    The Gumbel distribution is an approximation of the Normal law, as shown in Figure 8, where the
    solid line represents the Normal distribution, and the dotted line the Gumbel distribution.
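
    The mean and variance of the Gumbel distribution can be checked empirically by drawing from it with the inverse transform method. The Python sketch below uses assumed parameter values (location 1, scale 2) and hypothetical names; it relies on the Gumbel distribution function exp(-exp(-mu (x - eta))).

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def sample_gumbel(rng, eta=0.0, mu=1.0):
    """Draw from a Gumbel distribution with location eta and scale mu.

    Inverts the distribution function F(x) = exp(-exp(-mu * (x - eta))).
    """
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu


rng = random.Random(42)
eta, mu = 1.0, 2.0  # assumed values, for illustration only
draws = [sample_gumbel(rng, eta, mu) for _ in range(50000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)

theoretical_mean = eta + EULER_GAMMA / mu
theoretical_var = math.pi ** 2 / (6.0 * mu ** 2)

print(sample_mean, theoretical_mean)
print(sample_var, theoretical_var)
```

    The sample moments should be close to the theoretical ones, with a sampling error that shrinks as the number of draws grows.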

    Figure 8: Comparison between Normal and Gumbel distribution

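    We derive the probability function for the binary example of Figure 4 from the following
    property of the Gumbel distribution. If $\varepsilon_1$ is Gumbel distributed with location
    parameter $\eta_1$ and scale parameter $\mu$, and $\varepsilon_2$ is Gumbel distributed with
    location parameter $\eta_2$ and scale parameter $\mu$, independently of $\varepsilon_1$, then
    $\varepsilon = \varepsilon_2 - \varepsilon_1$ follows a Logistic distribution with location
    parameter $\eta_2 - \eta_1$ and scale parameter $\mu$ (the name of the Logit model comes from
    this property). For a zero location parameter, which is the case here since the error terms are
    identically distributed, the density function of the Logistic distribution is given by

    $$f(x) = \frac{\mu\, e^{-\mu x}}{\left( 1 + e^{-\mu x} \right)^2}$$

    where $\mu > 0$ is the scale parameter. As a consequence, we have,

    $$P_{\{1,2\}}(1) = \frac{1}{1 + e^{-\mu (V_1 - V_2)}}$$

    or, equivalently,

    $$P_{\{1,2\}}(1) = \frac{e^{\mu V_1}}{e^{\mu V_1} + e^{\mu V_2}}.$$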

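    In order to determine the relationship between the scale parameter and the variance of the
    distribution, we compute
    $\operatorname{Var}(\varepsilon_2 - \varepsilon_1) = \operatorname{Var}(\varepsilon_2) + \operatorname{Var}(\varepsilon_1) = \pi^2 / (3 \mu^2)$.
    Using (21), we have that assuming $\operatorname{Var}(\alpha(\varepsilon_2 - \varepsilon_1)) = v$ is
    equivalent to assuming $\alpha = \mu \sqrt{3v} / \pi$. It is
    common practice to arbitrarily define $\mu = 1$, that is $\alpha = \sqrt{3v} / \pi$.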
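
    The binary logit probability has a closed form, which is what makes the model so convenient. The Python sketch below implements the two standard, equivalent expressions of the binary logit probability; the function names are hypothetical and the utilities are illustrative.

```python
import math


def logit_probability(V1, V2, mu=1.0):
    """Binary logit: distribution function of the Logistic law at V1 - V2."""
    return 1.0 / (1.0 + math.exp(-mu * (V1 - V2)))


def logit_probability_exp_form(V1, V2, mu=1.0):
    """Same probability, written as a ratio of exponentials."""
    e1 = math.exp(mu * V1)
    e2 = math.exp(mu * V2)
    return e1 / (e1 + e2)


# Assumed utilities, for illustration only.
p_a = logit_probability(1.2, 0.4, mu=1.5)
p_b = logit_probability_exp_form(1.2, 0.4, mu=1.5)
print(p_a, p_b)
```

    Both expressions return the same value up to rounding. The first one is usually preferable numerically, since it exponentiates only the difference of the utilities.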

    In most cases, the arbitrary decision about the scale parameter does not matter and can be safelyignored. But it is important not to completely forget its existence. Indeed, it may sometimes play

    an important role. For example, utilities derived from different models can be compared only if

    the value of is the same for all of them. It is usually not the case with the scale parameters commonly used in practice, as shown in Table 1. Namely, a utility estimated with a logit model

    has to be divided by before being compared with a utility estimated with a probit model.
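    The link between the scale parameter and the variance can be checked numerically. The sketch below is only an illustration under an assumed parameterization: it takes the Gumbel distribution function to be F(x) = exp(-exp(-mu (x - eta))), for which a single error term has variance pi^2 / (6 mu^2) and the difference of two independent error terms (a Logistic variable) has variance pi^2 / (3 mu^2). The function names are hypothetical.

```python
import math
import random


def sample_gumbel(eta, mu, rng):
    """Draw one Gumbel(eta, mu) variate by inverting F(x) = exp(-exp(-mu * (x - eta)))."""
    # Keep u strictly inside (0, 1) so that both logarithms are defined.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return eta - math.log(-math.log(u)) / mu


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def simulate(mu, n=100_000, seed=0):
    """Return (variance of one Gumbel term, variance of the difference of two terms)."""
    rng = random.Random(seed)
    e1 = [sample_gumbel(0.0, mu, rng) for _ in range(n)]
    e2 = [sample_gumbel(0.0, mu, rng) for _ in range(n)]
    diff = [a - b for a, b in zip(e1, e2)]
    return sample_variance(e1), sample_variance(diff)


if __name__ == "__main__":
    for mu in (1.0, 2.0):
        v_single, v_diff = simulate(mu)
        print(f"mu={mu}: single {v_single:.3f} vs {math.pi**2 / (6 * mu**2):.3f}, "
              f"difference {v_diff:.3f} vs {math.pi**2 / (3 * mu**2):.3f}")
```

    Doubling mu divides both variances by four, which is why utilities estimated under different normalizations are not directly comparable.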

    Table 1: Model comparison

    The list of models presented above is not exhaustive. Other assumptions about the distribution of the error term will lead to other families of models. For instance, Ben-Akiva and Lerman (1985) cite the arctan and the truncated exponential models. These models are not often used in practice and we will not consider them here.

    Assumptions on the deterministic term

    The utility of each alternative must be a function of the attributes of the alternative itself and of the decision-maker identified in Sections 2.1 and 2.3. We can write the deterministic part of the utility that individual i is associating with alternative a as

    $$ V_{ia} = V(x_{ia}), \qquad (33) $$

    where $x_{ia}$ is a vector containing all attributes, both of individual i and alternative a. The function defined in (33) is commonly assumed to be linear in the parameters, that is, if n attributes are considered,

    $$ V(x_{ia}) = \sum_{k=1}^{n} \beta_k \, x_{ia,k}, $$

    where $\beta_1, \ldots, \beta_n$ are parameters to be estimated. This assumption simplifies the formulation and the estimation of the model, and is not as restrictive as it may seem. Indeed, nonlinear effects can still be captured in the attributes definition, as mentioned in Section 2.3.
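    A minimal sketch of a utility that is linear in the parameters while nonlinear in the raw data may help here. The attribute names, the transformations (a logarithm and a ratio) and the coefficient values are all hypothetical, chosen only for illustration.

```python
import math


def deterministic_utility(beta, x):
    """V = sum_k beta_k * x_k, linear in the parameters beta."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * xk for b, xk in zip(beta, x))


def attributes(travel_time_min, cost, income):
    """Hypothetical attribute vector built from raw data.

    The logarithm and the cost/income ratio are nonlinear in the raw data,
    yet the utility remains linear in the parameters.
    """
    return [1.0, math.log(travel_time_min), cost / income]


beta = [0.5, -1.2, -3.0]  # illustrative values, not estimates
V = deterministic_utility(beta, attributes(30.0, 2.0, 40.0))
print(V)
```

    The nonlinearity lives entirely in `attributes`, so the estimation problem still only involves the coefficients in `beta`.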

    Multinomial logit model

    As introduced in the previous section, the logit model is derived from the assumption that the error terms of the utility functions are independent and identically Gumbel distributed. These models were first introduced in the context of binary choice models, where the logistic distribution is used to derive the probability. Their generalization to more than two alternatives is referred to as multinomial logit models.

    If the error terms are independent and identically Gumbel distributed, with location parameter 0 and scale parameter $\mu$, the probability that a given individual chooses alternative i within $\mathcal{C}$ is given by

    $$ P(i \mid \mathcal{C}) = \frac{e^{\mu V_i}}{\sum_{k \in \mathcal{C}} e^{\mu V_k}}. \qquad (35) $$

    The derivation of this result is attributed to Holman and Marley by Luce and Suppes (1965). We

    refer the reader to Ben-Akiva and Lerman (1985) and Anderson et al. (1992) for additional

    details.
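    As an illustration, the multinomial logit probability can be computed in a few lines. This sketch assumes the usual closed form, P(i|C) = exp(mu V_i) / sum over k in C of exp(mu V_k); the function name, the alternative labels and the utility values are hypothetical.

```python
import math


def mnl_probabilities(V, mu=1.0):
    """Multinomial logit probabilities.

    V maps each alternative of the choice set to its deterministic utility;
    mu > 0 is the scale parameter.
    """
    # Subtracting the largest utility leaves the ratios unchanged and avoids overflow.
    v_max = max(V.values())
    weights = {i: math.exp(mu * (v - v_max)) for i, v in V.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


if __name__ == "__main__":
    utilities = {"car": -1.0, "bus": -1.5, "bike": -2.0}  # illustrative values
    for alt, p in mnl_probabilities(utilities).items():
        print(alt, round(p, 4))
```

    With two alternatives the expression reduces to the binary logistic form 1 / (1 + exp(-mu (V_1 - V_2))).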

    It is interesting to note that the multinomial logit model can also be derived from the choice

    axiom defined by (6) and (7). Indeed, defining and , we have that (11) is equivalent to (35).

    An important property of the multinomial logit model is the Independence from Irrelevant Alternatives (IIA). This property can be stated as follows. The ratio of the probabilities of any two alternatives is independent from the choice set. That is, for any choice sets $\mathcal{C}_1$ and $\mathcal{C}_2$ such that $\mathcal{C}_1 \subseteq \mathcal{C}_2 \subseteq \mathcal{C}$, for any alternatives $i$ and $j$ in $\mathcal{C}_1$, we have

    $$ \frac{P(i \mid \mathcal{C}_1)}{P(j \mid \mathcal{C}_1)} = \frac{P(i \mid \mathcal{C}_2)}{P(j \mid \mathcal{C}_2)}. $$

    This result can be proven easily using (35). Ben-Akiva and Lerman (1985) propose an equivalent definition: The ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other alternatives.
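    The IIA property is easy to verify numerically. The sketch below assumes the standard multinomial logit formula and uses hypothetical alternatives and utilities: the ratio of two probabilities is unchanged when a third alternative is removed, and adding a duplicate alternative with the same utility pulls probability away from every other alternative proportionally.

```python
import math


def mnl(V, mu=1.0):
    """Multinomial logit probabilities for a dict alternative -> utility."""
    v_max = max(V.values())
    weights = {i: math.exp(mu * (v - v_max)) for i, v in V.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# The ratio P(car) / P(bus) does not depend on whether "bike" is available.
full = mnl({"car": -1.0, "bus": -1.5, "bike": -2.0})
sub = mnl({"car": -1.0, "bus": -1.5})
ratio_full = full["car"] / full["bus"]
ratio_sub = sub["car"] / sub["bus"]

# Duplicating an alternative: the car share drops from 1/2 to 1/3,
# although the new bus only differs from the old one by its colour.
two = mnl({"car": 0.0, "red bus": 0.0})
three = mnl({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})

if __name__ == "__main__":
    print(ratio_full, ratio_sub)
    print(two["car"], three["car"])
```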

    The IIA property of multinomial logit models is a limitation for some practical applications. This limitation is often illustrated by the red bus/blue bus paradox (see, for example, Ben-Akiva and Lerman, 1985) in the modal choice context. We prefer here the path choice example presented in Figure 9.

    Figure 9: A path choice example

    The probabilities provided by the multinomial logit model (35) for this example are

    $$ P(1 \mid \mathcal{C}) = P(2a \mid \mathcal{C}) = P(2b \mid \mathcal{C}) = \frac{1}{3}, \qquad (37) $$

    which is not consistent with the intuitive result. This situation appears in choice problems with significantly correlated alternatives, as is clearly the case in the example. Indeed, alternatives 2a and 2b are so similar that their utilities share many unobserved attributes of the path and, therefore, the assumption of independence of the random part of these utilities is not valid in this context.

    The Nested Logit Model, presented in the next section, partly overcomes this limitation of the multinomial logit model.

    Nested logit model


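    The nested logit model, first derived by Ben-Akiva (1973), is an extension of the multinomial logit model designed to capture correlations among alternatives. It is based on the partitioning of the choice set $\mathcal{C}$ into several nests $\mathcal{C}_m$, $m = 1, \ldots, M$, such that

    $$ \mathcal{C} = \bigcup_{m=1}^{M} \mathcal{C}_m $$

    and

    $$ \mathcal{C}_m \cap \mathcal{C}_{m'} = \emptyset \quad \text{for all } m \neq m'. $$

    The utility function of each alternative is composed of a term specific to the alternative, and a term associated with the nest. If $i \in \mathcal{C}_m$, we have

    $$ U_i = V_i + \varepsilon_i + V_{\mathcal{C}_m} + \varepsilon_{\mathcal{C}_m}. $$

    The error terms $\varepsilon_i$ and $\varepsilon_{\mathcal{C}_m}$ are supposed to be independent. As for the multinomial logit model, error terms $\varepsilon_i$ are supposed to be independent and identically Gumbel distributed, with scale parameter $\mu_m$. The distribution of $\varepsilon_{\mathcal{C}_m}$ is such that the random variable $\max_{j \in \mathcal{C}_m} U_j$ is Gumbel distributed with scale parameter $\mu$.

    Each nest within the choice set is associated with a pseudo-utility, called composite utility, expected maximum utility, inclusive value or accessibility in the literature. The composite utility for nest $\mathcal{C}_m$ is defined as

    $$ V'_{\mathcal{C}_m} = V_{\mathcal{C}_m} + \frac{1}{\mu_m} \ln \sum_{j \in \mathcal{C}_m} e^{\mu_m V_j}, $$

    where $V_{\mathcal{C}_m}$ is the component of the utility which is common to all alternatives in the nest $\mathcal{C}_m$.

    The probability model is then given by

    $$ P(i \mid \mathcal{C}) = P(\mathcal{C}_m \mid \mathcal{C}) \, P(i \mid \mathcal{C}_m), $$

    where

    $$ P(\mathcal{C}_m \mid \mathcal{C}) = \frac{e^{\mu V'_{\mathcal{C}_m}}}{\sum_{l=1}^{M} e^{\mu V'_{\mathcal{C}_l}}} $$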


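    and

    $$ P(i \mid \mathcal{C}_m) = \frac{e^{\mu_m V_i}}{\sum_{j \in \mathcal{C}_m} e^{\mu_m V_j}}. $$

    The parameters $\mu$ and $\mu_m$ reflect the correlation among alternatives in the nest $\mathcal{C}_m$. Indeed, if $i, j \in \mathcal{C}_m$ with $i \neq j$, we have

    $$ \mathrm{corr}(U_i, U_j) = 1 - \frac{\mu^2}{\mu_m^2}. $$

    Clearly, we have

    $$ 0 \leq \frac{\mu}{\mu_m} \leq 1. \qquad (46) $$

    Ben-Akiva and Lerman (1985) derive condition (46) directly from utility theory. Note also that if $\mu / \mu_m = 1$, we have $\mathrm{corr}(U_i, U_j) = 0$.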

    The parameters $\mu$ and $\mu_m$ are closely related in the model. Actually, only their ratio is meaningful. It is not possible to identify them separately. A common practice is to arbitrarily constrain one of them to a value (usually 1). The impacts of this arbitrary decision on the model are briefly discussed in Section 5.1. We illustrate here the Nested Logit Model with the path choice example

    described in Figure 9. First, the choice set is divided into and

    . The deterministic components of the utilities are , ,

    and . The composite utilities of each nest are

    and

    The probability of choosing each nest is then


    and

    where the value of $\mu$ has been assumed to be 1, without loss of generality. The probability of each alternative is then computed. We obtain

    and

    The values of $P(1 \mid \mathcal{C})$, $P(2a \mid \mathcal{C})$ and $P(2b \mid \mathcal{C})$ as a function of $\mu_2$, the scale parameter of the nest containing 2a and 2b, are plotted in Figure 10. From (46), we have that $\mu_2 \geq 1$ because $\mu$ has been arbitrarily defined as 1. We observe that, when $\mu_2 = 1$, the nested logit model produces the same results as the multinomial logit model (37), and all probabilities are 1/3. On the other hand, when $\mu_2$ goes to infinity, and $\mu / \mu_2$ goes to 0, the probability of each nest is closer and closer to 1/2. At the limit, the model becomes a binary choice model, where the small detours a and b are ignored in the choice process.
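    The behaviour described above can be reproduced with a short sketch. It assumes the usual two-level nested logit formulas (composite utility of a nest equal to its common term plus the scaled log-sum of its alternatives, nest probability of logit form with scale mu, conditional probability of logit form with nest scale mu_m). The example assumes, hypothetically, that the three paths have identical deterministic utilities and that the nests are {1} and {2a, 2b}; all names are illustrative.

```python
import math


def nested_logit(V, nests, mu, mu_m, V_nest=None):
    """Two-level nested logit probabilities.

    V:      dict alternative -> alternative-specific utility V_i
    nests:  dict nest -> list of alternatives (a partition of the choice set)
    mu:     scale parameter of the upper level
    mu_m:   dict nest -> scale parameter of that nest
    V_nest: optional dict nest -> utility common to the nest (defaults to 0)
    No overflow protection is included; this is a sketch for small utilities.
    """
    V_nest = V_nest or {m: 0.0 for m in nests}
    inner = {m: sum(math.exp(mu_m[m] * V[j]) for j in alts) for m, alts in nests.items()}
    # Composite utility (inclusive value) of each nest.
    composite = {m: V_nest[m] + math.log(inner[m]) / mu_m[m] for m in nests}
    denom = sum(math.exp(mu * c) for c in composite.values())
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(mu * composite[m]) / denom
        for i in alts:
            probs[i] = p_nest * math.exp(mu_m[m] * V[i]) / inner[m]
    return probs


def path_example(mu_2):
    """Three paths with equal utilities (an assumption), normalized from the top (mu = 1)."""
    V = {"1": 0.0, "2a": 0.0, "2b": 0.0}
    nests = {"C1": ["1"], "C2": ["2a", "2b"]}
    return nested_logit(V, nests, mu=1.0, mu_m={"C1": 1.0, "C2": mu_2})


if __name__ == "__main__":
    for mu_2 in (1.0, 2.0, 5.0, 50.0):
        print(mu_2, {k: round(p, 4) for k, p in path_example(mu_2).items()})
```

    Under these assumptions the probability of path 1 equals 1 / (1 + 2^(1/mu_2)), which moves from 1/3 at mu_2 = 1 towards 1/2 as mu_2 grows.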


    Figure 10: Probability of each alternative as a function of $\mu_2$.

    Normalization of nested logit models

    In order to compute the probabilities in the previous example, we have arbitrarily decided to constrain $\mu$ to 1. Alternatively, we could have decided to constrain $\mu_2$ to 1. It is easy to show that, in this case, we have

    and


    which is equivalent to (51) and (52), replacing $1/\mu_2$ by $\mu$.
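    The equivalence of the two normalizations can be checked numerically. The sketch below assumes the standard two-level nested logit formulas, without a nest-specific utility term, and a single scale s shared by all nests; the utilities are hypothetical. Dividing both scale parameters by s while multiplying the utilities by s leaves every probability unchanged, which is why only the ratio of the scale parameters can be identified.

```python
import math


def nested_logit(V, nests, mu, mu_m):
    """V: alt -> utility; nests: nest -> alts; mu: upper scale; mu_m: nest -> nest scale."""
    inner = {m: sum(math.exp(mu_m[m] * V[j]) for j in alts) for m, alts in nests.items()}
    composite = {m: math.log(inner[m]) / mu_m[m] for m in nests}
    denom = sum(math.exp(mu * c) for c in composite.values())
    return {
        i: math.exp(mu * composite[m]) / denom * math.exp(mu_m[m] * V[i]) / inner[m]
        for m, alts in nests.items()
        for i in alts
    }


V = {"1": -1.0, "2a": -1.2, "2b": -0.9}  # illustrative utilities
nests = {"C1": ["1"], "C2": ["2a", "2b"]}
s = 2.5  # nest scale under the normalization from the top

# Normalized from the top: mu = 1, nest scales equal to s.
top = nested_logit(V, nests, mu=1.0, mu_m={"C1": s, "C2": s})

# Normalized from the bottom: nest scales equal to 1, mu = 1/s, utilities rescaled by s.
V_rescaled = {i: s * v for i, v in V.items()}
bottom = nested_logit(V_rescaled, nests, mu=1.0 / s, mu_m={"C1": 1.0, "C2": 1.0})

if __name__ == "__main__":
    for alt in V:
        print(alt, round(top[alt], 6), round(bottom[alt], 6))
```

    In estimation, the rescaling of the utilities is absorbed by the coefficients, so the two normalizations fit the data equally well but report coefficients on different scales.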

    A model where the scale parameter $\mu$ is arbitrarily constrained to 1 is said to be ``normalized from the top''. A model where one of the parameters $\mu_m$ is constrained to 1 is said to be ``normalized from the bottom''. The latter may produce a simpler formulation of the model. We illustrate it using the example of Figure 11.

    Figure 11: A mode choice example

    We have

    and

    If we impose , we can define , , and to obtain

    the following expressions.

    and


    This formulation, proposed by Daly (1987), simplifies the estimation process. For this reason, it

    has been adopted in estimation packages like ALOGIT (Daly, 1987) or HieLoW (Bierlaire, 1995,

    Bierlaire and Vandevyvere, 1995).

    We emphasize here that this formulation should be used with caution when the same parameters are present in more than one nest. In this case, specific techniques, inspired by the artificial trees proposed by Bradley and Daly (1991), must be used to obtain a correct specification of the model. The description of these techniques is beyond the scope of this paper.

    A direct extension of the nested logit model consists of partitioning some or all nests into sub-nests, which can, in turn, be divided into sub-nests. Because of the complexity of these models, their structure is usually represented as a tree, as suggested by Daly (1987). Clearly, the number of potential structures, reflecting the correlation among alternatives, can be very large. No technique has been proposed thus far to identify the most appropriate correlation structure directly from the data.

    We conclude our introduction of nested logit models by mentioning their limitations. These models are designed to capture choice problems where alternatives within each nest are correlated. No correlation across nests can be captured by the Nested Logit Model. When alternatives cannot be partitioned into well separated nests to reflect their correlation, Nested Logit Models are not applicable. This is the case for most route choice problems. Several models within the ``logit family'' have been designed to capture specific correlation structures. For example, Cascetta (1996) captures overlapping paths in a route choice context using commonality factors, Koppelman and Wen (1997) capture correlation between pairs of alternatives, and Vovsha (1997) proposes a cross-nested model allowing alternatives to belong to more than one nest. The last two models are derived from the Generalized Extreme Value model, presented in the next section.

    Generalized extreme value model

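    The Generalized Extreme Value (GEV) model was introduced by McFadden (1978) in the context of residential location. This general model actually consists of a large family of models that are consistent with random utility theory. The probability of choosing alternative i within $\mathcal{C}$ is given by

    $$ P(i \mid \mathcal{C}) = \frac{e^{V_i} \, \frac{\partial G}{\partial y_i}\left(e^{V_1}, \ldots, e^{V_J}\right)}{\mu \, G\left(e^{V_1}, \ldots, e^{V_J}\right)}, $$

    where $J$ is the number of alternatives in $\mathcal{C}$ and $G : \mathbb{R}^J_+ \to \mathbb{R}$ is a differentiable function with the following properties.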


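    1. $G(y) \geq 0$ for all $y \in \mathbb{R}^J_+$,

    2. G is homogeneous of degree $\mu > 0$, that is $G(\alpha y) = \alpha^{\mu} G(y)$, for all $\alpha > 0$,

    3. $\lim_{y_i \to +\infty} G(y_1, \ldots, y_J) = +\infty$ for all $i = 1, \ldots, J$, and

    4. the kth partial derivative with respect to k distinct $y_i$ is non-negative if k is odd, and non-positive if k is even, that is, for any distinct $i_1, \ldots, i_k \in \{1, \ldots, J\}$, we have

    $$ (-1)^k \, \frac{\partial^k G}{\partial y_{i_1} \cdots \partial y_{i_k}}(y) \leq 0 \quad \text{for all } y \in \mathbb{R}^J_+. $$

    As an example, we consider

    $$ G(y) = \sum_{i=1}^{J} y_i^{\mu}, $$

    which has the required properties, as can be easily verified. Then,

    $$ P(i \mid \mathcal{C}) = \frac{e^{\mu V_i}}{\sum_{k=1}^{J} e^{\mu V_k}}, $$

    which is the multinomial logit model. Similarly, the nested logit model can be derived with

    $$ G(y) = \sum_{m=1}^{M} \left( \sum_{i \in \mathcal{C}_m} y_i^{\mu_m} \right)^{\mu / \mu_m}. $$

    It can be shown that property 4 holds if $0 \leq \mu / \mu_m \leq 1$, which is consistent with condition (46).

    The Generalized Extreme Value model provides a nice theoretical framework for the development of new discrete choice models, like those of Koppelman and Wen (1997) and Vovsha (1997).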
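    The way a generating function G produces a choice model can be illustrated numerically. The sketch below assumes the GEV probability takes the form P(i) = y_i (dG/dy_i)(y) / (mu G(y)) with y_i = exp(V_i), and approximates the partial derivatives by central finite differences, so the results are approximate. The generating functions used are the sum of y_i^mu (multinomial logit) and the nested form with one exponent per nest; function names and numerical values are hypothetical.

```python
import math


def gev_probabilities(G, V, mu, h=1e-6):
    """Choice probabilities from a generating function G homogeneous of degree mu.

    G takes a list y of positive numbers; V is the list of deterministic utilities.
    Partial derivatives are approximated by central finite differences.
    """
    y = [math.exp(v) for v in V]
    g = G(y)
    probs = []
    for i in range(len(y)):
        step = h * y[i]
        up = y[:i] + [y[i] + step] + y[i + 1:]
        down = y[:i] + [y[i] - step] + y[i + 1:]
        dG_dyi = (G(up) - G(down)) / (2.0 * step)
        probs.append(y[i] * dG_dyi / (mu * g))
    return probs


def G_mnl(mu):
    """G(y) = sum_i y_i^mu, which yields the multinomial logit model."""
    return lambda y: sum(yi ** mu for yi in y)


def G_nested(mu, nests, mu_m):
    """G(y) = sum_m (sum_{i in nest m} y_i^mu_m)^(mu/mu_m); nests are lists of indices."""
    return lambda y: sum(
        sum(y[i] ** s for i in nest) ** (mu / s) for nest, s in zip(nests, mu_m)
    )


if __name__ == "__main__":
    V = [-1.0, -1.5, -2.0]
    print(gev_probabilities(G_mnl(1.3), V, mu=1.3))
    print(gev_probabilities(G_nested(1.0, [[0], [1, 2]], [1.0, 2.0]), [0.0, 0.0, 0.0], mu=1.0))
```

    Because G is homogeneous of degree mu, Euler's theorem guarantees that the probabilities sum to one, which is a convenient sanity check when experimenting with a new generating function.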

    Conclusion

    We have covered in this paper the main theoretical aspects of discrete choice models in general, and random utility models in particular. A good awareness of underlying assumptions is

    necessary for an efficient use of these models for practical applications. In particular, we have

    focused on the location parameters and the scale parameters in multinomial and nested logit


    models. Despite its importance, the role of these parameters tends to be underestimated by

    practitioners. This may lead to incorrect specifications of the models, or incorrect interpretation

    of the results.

    Acknowledgments

    This paper is based on a lecture given at the NATO Advanced Studies Institute Operations Research and Decision Aid Methodologies in Traffic and Transportation Management, Balatonfured, Hungary, March 1997. Comments from the students and other lecturers of the ASI

    have been very useful to write this paper. Moreover, I am very grateful to Moshe Ben-Akiva and

    John Bowman for their valuable discussions and comments.

    References

    1
    Simon P. Anderson, André de Palma, and Jacques-François Thisse. Discrete Choice Theory of Product Differentiation. MIT Press, Cambridge, Ma, 1992.

    2
    M. E. Ben-Akiva. Structure of passenger travel demand models. PhD thesis, Department of Civil Engineering, MIT, Cambridge, Ma, 1973.

    3
    M. E. Ben-Akiva and S. R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge, Ma, 1985.

    4
    Moshe Ben-Akiva and B. François. $\mu$ homogeneous generalized extreme value model. Working paper, Department of Civil Engineering, MIT, Cambridge, Ma, 1983.

    5
    M. Bierlaire. A robust algorithm for the simultaneous estimation of hierarchical logit models. GRT Report 95/3, Department of Mathematics, FUNDP, 1995.

    6
    M. Bierlaire, T. Lotan, and Ph. L. Toint. On the overspecification of multinomial and nested logit models due to alternative specific constants. Transportation Science, 1997. (forthcoming).

    7
    M. Bierlaire and Y. Vandevyvere. HieLoW: the interactive user's guide. Transportation Research Group - FUNDP, Namur, 1995.

    8
    Denis Bolduc, Bernard Fortin, and Marc-André Fournier. The effect of incentive policies on the practice location of doctors: A multinomial probit analysis. Journal of Labor Economics, 14(4):703, 1996.

    9
    M. A. Bradley and A. J. Daly. Estimation of logit choice models using mixed stated preferences and revealed preferences information. In Methods for understanding travel behaviour in the 1990's, pages 116-133, Québec, May 1991. International Association for Travel Behaviour. 6th international conference on travel behaviour.


    10
    Ennio Cascetta. A modified logit route choice model overcoming path overlapping problems. Specification and some calibration results for interurban networks. In Proceedings of the 13th International Symposium on the Theory of Road Traffic Flow (Lyon, France), 1996.

    11
    C. F. Daganzo. Multinomial Probit: The theory and its application to demand forecasting. Academic Press, New York, 1979.

    12
    A. Daly. Estimating ``tree'' logit models. Transportation Research B, 21(4):251-268, 1987.

    13
    D. A. Hensher and L. W. Johnson. Applied discrete choice modelling. Croom Helm, London, 1981.

    14
    J. L. Horowitz, F. S. Koppelman, and S. R. Lerman. A self-instructing course in disaggregate mode choice modeling. Technology Sharing Program, US Department of Transportation, Washington, D.C. 20590, 1986.

    15
    F. S. Koppelman and Chieh-Hua Wen. The paired combinatorial logit model: properties, estimation and application. Transportation Research Board, 76th Annual Meeting, Washington DC, January 1997. Paper #970953.

    16
    R. Luce. Individual choice behavior: a theoretical analysis. J. Wiley and Sons, New York, 1959.

    17
    R. D. Luce and P. Suppes. Preference, utility and subjective probability. In R. D. Luce, R. R. Bush, and E. Galanter, editors, Handbook of Mathematical Psychology, New York, 1965. J. Wiley and Sons.

    18
    C. Manski. The structure of random utility models. Theory and Decision, 8:229-254, 1977.

    19
    Andrey Andreyevich Markov. Calculation of probabilities. Tip. Imperatorskoi Akademii Nauk, Saint Petersburg, 1900. (in Russian).

    20
    D. McFadden. Modelling the choice of residential location. In A. Karlquist et al., editor, Spatial interaction theory and residential location, pages 75-96, Amsterdam, 1978. North-Holland.

    21
    J. Swait. Probabilistic choice set formation in transportation demand models. PhD thesis, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Ma, 1984.

    22
    A. Tversky. Elimination by aspects: a theory of choice. Psychological Review, 79:281-299, 1972.


    23
    Peter Vovsha. Cross-nested logit model: an application to mode choice in the Tel-Aviv metropolitan area. Transportation Research Board, 76th Annual Meeting, Washington DC, January 1997. Paper #970387.

    24
    D. K. Whynes, G. Reed, and P. Newbold. General practitioners' choice of referral destination: A probit analysis. Managerial and Decision Economics, 17(6):587, 1996.

    25
    T. Yai, S. Iwakura, and S. Morichi. Multinomial probit with structured covariance for route choice behavior. Transportation Research B, 31(3):195-208, June 1997.

    Chapter 5: Discrete Dependent Variable Models

    CHAPTER 5; SECTION A: LOGIT, NESTED LOGIT, & PROBIT

    Purpose of Logit, Nested Logit, and Probit Models:

    Logit, Nested Logit, and Probit models are used to model a relationship between a dependent variable Y and one or more independent variables X. The dependent variable, Y, is a discrete variable that represents a choice, or category, from a set of mutually exclusive choices or categories. For instance, an analyst may wish to model the choice of automobile purchase (from a set of vehicle classes), the choice of travel mode (walk, transit, rail, auto, etc.), the manner of an automobile collision (rollover, rear-end, sideswipe, etc.), or residential location choice (high-density, suburban, exurban, etc.). The independent variables are presumed to affect the choice or category or the choice maker, and represent a priori beliefs about the causal or associative elements important in the choice or classification process. In the case of ordinal scale variables, an ordered logit or probit model can be applied to take advantage of the additional information provided by the ordinal over the nominal scale (not discussed here).


    Examples: An analyst wants to model:

    1. The effect of household member characteristics, transportation network characteristics, and alternative mode characteristics on choice of transportation mode: bus, walk, auto, carpool, single occupant auto, rail, or bicycle.

    2. The effect of consumer characteristics on choice of vehicle purchase: sport utility vehicle, van, auto, light pickup truck, or motorcycle.

    3. The effect of traveler characteristics and employment characteristics on airline carrier choice: Delta, United Airlines, Southwest, etc.

    4. The effect of involved vehicle types, pre-crash conditions, and environmental factors on vehicle crash outcome: property damage only, mild injury, severe injury, fatality.

    Basic Assumptions/Requirements of Logit, Nested Logit, and Probit Models:

    1) The observations on dependent variable Y are assumed to have been randomly sampled from the population of interest (even for stratified samples or choice-based samples).

    2) Y is caused by or associated with the Xs, and the Xs are determined by influences (variables) outside of the model.

    3) There is uncertainty in the relation between Y and the Xs, as reflected by a scattering of observations around the functional relationship.

    4) The distribution of error terms must be assessed to determine if a selected model is appropriate.

    Inputs for Logit, Nested Logit, and Probit Models:

    Discrete variable Y is the observed choice or classification, such as brand selection, transportation mode selection, etc. For grouped data, where choices are observed for homogeneous experimental units or observed multiple times per experimental unit, the dependent variable is the proportion of choices observed.

    One or more continuous and/or discrete variables X, which describe the attributes of the choice

    maker or event and/or various attributes of the choices thought to be causal or influential in the

    decision or classification process.

    Outputs of Logit, Nested Logit, and Probit Models:

    Functional form of relation between Y and Xs.

    Strength of association between Y and Xs (individual Xs and collective set of Xs).

    Proportion of choice or classification uncertainty explained by hypothesized relation.

    Confidence in predictions of future/other observations on Y given X.


    Logit, Nested Logit, and Probit Methodology:


    Examples of Logit, Nested Logit, and Probit:

    Pavements

    Koehne, Jodi, Fred Mannering, and Mark Hallenbeck (1996). Analysis of Trucker and Motorist Opinions Toward Truck-lane Restrictions. Transportation Research Record #1560 pp. 73-82. National Academy of Sciences.

    Traffic

    Mannering, Fred, Jodi Koehne and Soon-Gwan Kim (1995). Statistical Assessment of Public Opinion Toward Conversion of General-Purpose Lanes to High-Occupancy Vehicle Lanes. Transportation Research Record #1485 pp. 168-176. National Academy of Sciences.

    Planning

    Koppelman, Frank S., and Chieh-Hua Wen (1998). Nested Logit Models: Which Are You Using? Transportation Research Record #1645 pp. 1-9. National Academy of Sciences.

    Yai, Tetsuo, and Tetsuo Shimizu (1998). Multinomial Probit with Structured Covariance for Choice Situations with Similar Alternatives. Transportation Research Record #1645 pp. 69-75. National Academy of Sciences.

    McFadden, Daniel (1978). Modeling the Choice of Residential Location. Transportation Research Record #673 pp. 72-77. National Academy of Sciences.

    Horowitz, Joel L. (1984). Testing Disaggregate Travel Demand Models by Comparing Predicted and Observed Market Shares. Transportation Research Record #976 pp. 1-7. National Academy of Sciences.

    Interpretation of Logit, Nested Logit, and Probit:

    How is a choice model equation interpreted?
    How do continuous and indicator variables differ in the choice model?
    How are beta coefficients interpreted?
    How is the Likelihood Ratio Test interpreted?
    How are t-statistics interpreted?
    How are phi and adjusted phi interpreted?
    How are confidence intervals interpreted?
    How are degrees of freedom interpreted?
    How are elasticities computed and interpreted?
    When is the independence of irrelevant alternatives (IIA) assumption violated?

    Troubleshooting: Logit, Nested Logit, and Probit:

    Shouldinteraction terms be included in the model?How many variables should be included in the model?What methods can be used to specify the relation between choice and the Xs?What methods are available for fixing heteroscedastic errors?


    What methods are used for fixing serially correlated errors?
    What can be done to deal with multi-collinearity?
    What is endogeneity and how can it be fixed?
    How does one know if the errors are Gumbel distributed?

    Logit, Nested Logit, and Probit References:

    Ben-Akiva, Moshe and Steven R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge, MA. 1985.

    Greene, William H. Econometric Analysis. Macmillan Publishing Company, New York, New York. 1990.

    Ortuzar, J. de D. and L. G. Willumsen. Modelling Transport. Second Edition. John Wiley and Sons, New York, New York. 1994.

    Train, Kenneth. Qualitative Choice Analysis: Theory, Econometrics, and an Application to Automobile Demand. The MIT Press, Cambridge, MA. 1993.

    Logit, Nested Logit, and Probit Methodology:

    Postulate mathematical models from theory and past research.

    Discrete choice models (logit, nested logit, and probit) are used to develop models of behavioral choice or of event classification. It is accepted a priori that the analyst does not know the full complexity of the underlying relationships, and that any model of reality will be wrong to some degree. The estimated choice models will reflect the a priori assumptions of the modeler as to what factors affect the decision process. Common applications of discrete choice models include choice of transportation mode, choice of travel destination, and vehicle purchase decisions. There are many other potential applications, including choice of residential location, choice of business location, and transportation project contractor selection.

    In order to postulate meaningful choice models, the modeler should review past literature regarding the choice context and identify factors with potential to affect the decision making process. These factors should drive the data-collection process, usually a survey instrument given to experimental units, to collect the information relevant in the decision making process. There is much written about survey design and data collection, and these sources should be consulted for detailed discussions of this complex and critical aspect of choice modeling.

    Transportation Planning Example: An analyst is interested in modeling the mode choice decision made by individuals in a region. The analyst reviews the literature and develops the following list of potential factors influencing the mode choice decision for most travelers in the region.

    1. Trip maker characteristics (within the household context): vehicle availability, possession of driver's license, household structure (stage of life-cycle), role in household, household income (value of time)


    2. Characteristics of the journey or activity: journey or activity purpose (work, grocery shopping, school, etc.), time of day, accessibility and proximity of activity destination

    3. Characteristics of transport facility:
       Qualitative factors: comfort and convenience, reliability and regularity, protection, security
       Quantitative factors: in-vehicle travel times, waiting and walking times, out-of-pocket monetary costs, availability and cost of parking, proximity/accessibility of transport mode

    Estimate choice models

    Qualitative choice analysis methods are used to describe and/or predict discrete choices of decision-makers or to classify a discrete outcome according to a host of regressors. The need to model choice and/or classification arises in transportation, energy, marketing, telecommunications, and housing, to name but a few fields. There is, as always, a set of assumptions or requirements about the data that need to be satisfied. The response variable (choice or classification) must meet the following three criteria.

    1. The set of choices or classifications must be finite.

    2. The set of choices or classifications must be mutually exclusive; that is, a particular outcome can only be represented by one choice or classification.

    3. The set of choices or classifications must be collectively exhaustive; that is, all choices or classifications must be represented by the choice set or classification.

    Even when the 2nd and 3rd criteria are not met, the analyst can usually re-define the set of

    alternatives or classifications so that the criteria are satisfied.

    Planning Example: An analyst wishing to model mode choice for commute decisions defines the choice set as AUTO, BUS, RAIL, WALK, and BIKE. The modeler observed that a person in the database drove her personal vehicle to the transit station and then took a bus, violating the second criterion. To remedy this modeling problem and similar problems that might arise, the analyst introduces some new choices (or classifications) into the modeling process: AUTO-BUS, AUTO-RAIL, WALK-BUS, WALK-RAIL, BIKE-BUS, BIKE-RAIL. By introducing these new categories the analyst has made the discrete choice data comply with the stated modeling requirements.
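    The re-definition in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_trip, the encoding of a commute as an ordered list of modes, and the split into access and transit modes are hypothetical and do not come from the source.

```python
# Hypothetical sketch: map each observed commute (an ordered list of modes)
# to exactly one alternative, so that the choice set is mutually exclusive.
ACCESS_MODES = {"AUTO", "WALK", "BIKE"}   # assumed access modes
TRANSIT_MODES = {"BUS", "RAIL"}           # assumed transit modes


def classify_trip(legs):
    """Return the single alternative that represents one observed commute."""
    used = list(dict.fromkeys(legs))  # drop repeated modes, keep their order
    if len(used) == 1:
        return used[0]
    access = [m for m in used if m in ACCESS_MODES]
    transit = [m for m in used if m in TRANSIT_MODES]
    if len(access) == 1 and len(transit) == 1 and used[0] == access[0]:
        # Combined alternative such as AUTO-BUS or WALK-RAIL.
        return f"{access[0]}-{transit[0]}"
    raise ValueError(f"no alternative defined for legs {legs}")


observations = [["AUTO"], ["AUTO", "BUS"], ["WALK", "RAIL"], ["BIKE"]]
choices = [classify_trip(legs) for legs in observations]
print(choices)
```

    Raising an error for unclassified combinations is a deliberate choice in this sketch: it flags observations that would violate the collectively exhaustive criterion instead of silently dropping them.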

    Deriving Choice Models from Random Utility Theory

    Choice models are developed from economic theories of random utility, whereas classification models (classifying crash type, for example) are developed by minimizing classification errors with respect to the Xs and classification levels Y. Because most of the literature in transportation is focused on choice models, and because choice models and classification models are mathematically equivalent, the discussion here is based on choice models. Several assumptions are made when deriving discrete choice models from random utility theory:


    1. An individual is faced with a finite set of choices from which only one can be chosen.

    2. Individuals belong to a homogeneous population, act rationally, possess perfect information, and always select the option that maximizes their net personal utility.

    3. If $C$ is defined as the universal choice set of discrete alternatives, and $J$ the number of elements in $C$, then each member of the population has some subset of $C$ as his or her choice set. Most decision-makers, however, have some subset $C_n$ that is considerably smaller than $C$. It should be recognized that defining the subset $C_n$ that is the feasible choice set for an individual is not a trivial task; however, it is assumed that it can be determined.

    4. Decision-makers are endowed with a subset of attributes $x_n \in X$, all measured attributes relevant in the decision making process.

    Planning Example: In identifying the choice set of travel mode, the analyst identifies the universal choice set C to consist of the following:
    1. driving alone
    2. sharing a ride
    3. taxi
    4. motorcycle
    5. bicycle
    6. walking
    7. transit bus
    8. light rail transit

    The analyst identifies a family whose choice set is fairly restricted because they do not own a vehicle, and so their choice set Cn is given by:
    1. sharing a ride
    2. taxi
    3. bicycle
    4. walking
    5. transit bus
    6. light rail transit
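    The step from the universal choice set to an individual's feasible choice set can be written as a simple filter. The sketch below is illustrative only: the function name and the rule that exactly "driving alone" and "motorcycle" require a household vehicle are assumptions, chosen so that the result matches the family in the example.

```python
# Universal choice set C from the planning example.
UNIVERSAL_CHOICE_SET = [
    "driving alone", "sharing a ride", "taxi", "motorcycle",
    "bicycle", "walking", "transit bus", "light rail transit",
]

# Assumption: these are the alternatives that need a household vehicle.
REQUIRES_OWN_VEHICLE = {"driving alone", "motorcycle"}


def feasible_choice_set(owns_vehicle):
    """Return the restricted choice set C_n for one household."""
    if owns_vehicle:
        return list(UNIVERSAL_CHOICE_SET)
    return [alt for alt in UNIVERSAL_CHOICE_SET
            if alt not in REQUIRES_OWN_VEHICLE]


c_n = feasible_choice_set(owns_vehicle=False)
print(len(UNIVERSAL_CHOICE_SET), len(c_n))
```

    In practice the availability rules depend on many more attributes (licence holding, distance, service coverage), which is why the text calls the definition of the feasible set a non-trivial task.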

    The modeler, who is an OBSERVER of the system, does not possess complete information about all elements considered important in the decision making process by all individuals making a choice, so utility is broken down into two components, $V$ and $\varepsilon$:

    \[ U_{in} = V_{in} + \varepsilon_{in} \]

    where:

    $U_{in}$ is the overall utility of choice $i$ for individual $n$,
    $V_{in}$ is the systematic or measurable utility, which is a function of $x_n$ and $i$ for individual $n$ and choice $i$,
    $\varepsilon_{in}$ includes idiosyncrasies and taste variations, combined with measurement or observation errors made by the modeler, and is the random utility component.

    The error term allows for a couple of important cases: 1) two persons with the same measured attributes and facing the same choice set make different decisions; 2) some individuals do not select the best alternative (from the modeler's point of view, they demonstrate irrational behavior).
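    The role of the error term can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the random components are independent standard Gumbel draws and that every traveler has the same systematic utilities; the function names, the utility values, and the seed are hypothetical.

```python
import math
import random


def gumbel_draw(rng):
    """Draw one standard Gumbel variate by inverting its CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
    return -math.log(-math.log(u))


def simulate_share_alt1(v1, v2, n_people, seed=0):
    """Share of simulated travelers whose total utility favors alternative 1."""
    rng = random.Random(seed)
    chosen_1 = 0
    for _ in range(n_people):
        u1 = v1 + gumbel_draw(rng)  # U_1n = V_1n + eps_1n
        u2 = v2 + gumbel_draw(rng)  # U_2n = V_2n + eps_2n
        if u1 >= u2:
            chosen_1 += 1
    return chosen_1 / n_people


# Identical measured attributes for everyone: V1 = 1.0, V2 = 0.5 (assumed).
share = simulate_share_alt1(v1=1.0, v2=0.5, n_people=20000)
print(share)
```

    Even though alternative 1 has the higher systematic utility for every simulated traveler, a sizeable minority chooses alternative 2, which corresponds to both cases described above.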


    The decision maker n chooses the alternative from which he derives the greatest utility. In the

    binomial or two-alternative case, the decision-maker chooses alternative 1 if and only if:

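    \[ U_{1n} \ge U_{2n} \]

    or when:

    \[ V_{1n} + \varepsilon_{1n} \ge V_{2n} + \varepsilon_{2n}. \]

    In probabilistic terms, the probability that alternative 1 is chosen is given by:

    \[
    \begin{aligned}
    \Pr(1) &= \Pr(U_1 \ge U_2) \\
           &= \Pr(V_1 + \varepsilon_1 \ge V_2 + \varepsilon_2) \\
           &= \Pr(\varepsilon_2 - \varepsilon_1 \le V_1 - V_2).
    \end{aligned}
    \]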

    Note that this equation looks like a cumulative distribution function for a probability density. That is, the probability of choosing alternative 1 (in the binomial case) is equal to the probability that the difference in random utility is less than or equal to the difference in deterministic utility.

    If $\varepsilon = \varepsilon_2 - \varepsilon_1$, which is the difference in unobserved utilities between alternatives 2 and 1 for travelers 1 through $N$ (subscript not shown), then the probability distribution or density of $\varepsilon$, $f(\varepsilon)$, can be specified to form specific classes of models.


    A couple of important observations about the probability distribution given by $F(V_1 - V_2)$ can be made.

    1. The error is small when there are large differences in systematic utility between alternatives one and two.

    2. Large errors are likely when differences in utility are small; thus decision makers are more likely to choose an alternative on the wrong side of the indifference line ($V_1 - V_2 = 0$).

    Alternative 1 is chosen when $V_1 - V_2 > 0$ (or when > 0), and alternative 2 is chosen when $V_1 - V_2 < 0$.

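    Thus, for binomial models of discrete choice:

    \[ \Pr(1) = \Pr(\varepsilon_2 - \varepsilon_1 \le V_1 - V_2) = F(V_1 - V_2), \]

    where $F$ is the cumulative distribution function of the error difference $\varepsilon_2 - \varepsilon_1$.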


    The cumulative distribution function, or CDF, typically looks like:

    [Figure: cumulative distribution function plotted against V1 - V2]

    This structure for the error term is a general result for binomial choice models. By making assumptions about the probability density of the residuals, the modeler can choose between several different binomial choice model formulations. Two types of binomial choice models are most common in practice: the logit and the probit models. The logit model assumes logistically distributed errors, and the probit model assumes normally distributed errors. These binomial models, however, are not practical when there are more than two alternatives, and the probit model is not easy to estimate (mathematically) for more than 4 to 5 choices.
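    The two binomial formulations differ only in the cumulative distribution function applied to the utility difference. The sketch below assumes a standard logistic distribution (scale 1) for the logit error difference and a normal distribution with standard deviation sigma for the probit error difference; the function names and the sigma parameter are illustrative assumptions.

```python
import math


def binary_logit_prob(v1, v2):
    """Pr(1) when the error difference follows a standard logistic law."""
    return 1.0 / (1.0 + math.exp(-(v1 - v2)))


def binary_probit_prob(v1, v2, sigma=1.0):
    """Pr(1) when the error difference is normal with std. deviation sigma."""
    z = (v1 - v2) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for diff in (-2.0, 0.0, 2.0):
    print(diff, binary_logit_prob(diff, 0.0), binary_probit_prob(diff, 0.0))
```

    Both functions return 0.5 on the indifference line and approach 0 or 1 as the difference in systematic utility grows, which is the S-shaped behavior of the CDF described above.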

    Mathematical Estimation of Choice Models

    Recall that choice models involve a response Y with various levels (a set of choices or classifications), and a set of Xs that reflect important attributes of the choice decision or classification. Usually the choice or classification of Y is modeled as a linear function or combination of the Xs. Maximum likelihood methods are employed to solve for the betas in choice models.

    Consider the likelihood of a sample of $N$ independent observations with probabilities $p_1, p_2, \ldots, p_N$. The likelihood of the sample is simply the product of the individual likelihoods. The product is a maximum when the most likely set of $p$s is used, i.e.

    \[ L^* = p_1 p_2 p_3 \cdots p_N = \prod_{n=1}^{N} p_n \]

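    For the binary choice model:

    \[ L^*(\beta_1, \ldots, \beta_K) = \prod_{n=1}^{N} \Pr_n(i)^{y_{in}} \, \Pr_n(j)^{y_{jn}} \]

    where $\Pr_n(i)$ is a function of the betas, $i$ and $j$ are alternatives 1 and 2 respectively, and $y_{in}$ equals 1 if individual $n$ chose alternative $i$ and 0 otherwise. It is generally mathematically simpler to analyze the logarithm of $L^*$, rather than the likelihood function itself. Using the fact that $\ln(z_1 z_2) = \ln(z_1) + \ln(z_2)$, $\ln(z^x) = x \ln(z)$, $\Pr(j) = 1 - \Pr(i)$, and $y_{jn} = 1 - y_{in}$, the equation becomes:

    \[ L = \ln L^* = \sum_{n=1}^{N} \left[ y_{in} \ln \Pr_n(i) + (1 - y_{in}) \ln\bigl(1 - \Pr_n(i)\bigr) \right] \]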


    The maximum of $L$ is found by differentiating the function with respect to each of the betas and setting the partial derivatives equal to zero; the solution gives the values of $\beta_1, \ldots, \beta_K$ that provide the maximum of $L$. In many cases the log likelihood function is globally concave, so that if a solution to the first order conditions exists, it is unique. This does not always have to be the case, however. Under general conditions the likelihood estimators can be shown to be consistent, asymptotically efficient, and asymptotically normal.
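    The first order condition can be solved numerically. The sketch below is a minimal illustration, not the procedure of any particular estimation package: it assumes a binary logit model with a single coefficient, so that $V_1 - V_2 = \beta x$, where $x$ is the difference in one attribute between the two alternatives. The data values and function names are made up for the example, and Newton's method is one of several possible solvers.

```python
import math


def prob_alt1(beta, x):
    """Binary logit probability of alternative 1 when V1 - V2 = beta * x."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def log_likelihood(beta, xs, ys):
    """Sum of y*ln(P) + (1 - y)*ln(1 - P) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob_alt1(beta, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def score(beta, xs, ys):
    """First derivative of the log-likelihood with respect to beta."""
    return sum((y - prob_alt1(beta, x)) * x for x, y in zip(xs, ys))


def hessian(beta, xs):
    """Second derivative; it is negative, so the function is concave."""
    return -sum(prob_alt1(beta, x) * (1.0 - prob_alt1(beta, x)) * x * x
                for x in xs)


def fit_beta(xs, ys, beta=0.0, tol=1e-10, max_iter=50):
    """Newton's method on the first order condition score(beta) = 0."""
    for _ in range(max_iter):
        step = score(beta, xs, ys) / hessian(beta, xs)
        beta -= step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: attribute difference and observed choice (1 = alt. 1).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, -1.5, 1.5]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
beta_hat = fit_beta(xs, ys)
print(beta_hat, log_likelihood(beta_hat, xs, ys))
```

    Because the second derivative is negative for every value of beta, any point where the score is zero is the unique maximum, which is the global concavity property mentioned in the text.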

    In more complex and realistic models, the likelihood function is evaluated as before, but instead of estimating one parameter, there are many parameters associated with Xs that must be estimated, and there are as many equations as there are Xs to solve. In practice the probabilities that maximize the likelihood function are likely to be different across individuals (unlike the simplified example below, where all individuals have the same probability).

    Because the likelihood function lies between 0 and 1, the log likelihood function is negative. The maximum of the log-likelihood function, therefore, is the negative value closest to zero given the data and specified probability functions.


    Planning Example. Suppose 10 individuals making travel choices between auto (A) and transit (T) were observed. All travelers are assumed to possess identical attributes (a really poor assumption), and so the probabilities are not functions of betas but simply a function of $p$, the probability of choosing Auto. The analyst also does not have any alternative-specific attributes, which makes this a very naive model that doesn't reflect reality. The likelihood function will be:

    \[ L^* = p^x (1-p)^{n-x} = p^7 (1-p)^3 \]

    where:
    $p$ = probability that a traveler chooses A,
    $1-p$ = probability that a traveler chooses T,
    $n$ = number of travelers = 10,
    $x$ = number of travelers choosing A.

    Recall that the analyst is trying to estimate $p$, the probability that a traveler chooses A. If 7 travelers were observed taking A and 3 taking T, then it can be shown that the maximum likelihood estimate of $p$ is 0.7, or in other words, the value of $L^*$ is maximized when $p = 0.7$ and $1 - p = 0.3$. All other combinations of $p$ and $1-p$ result in lower values of $L^*$. To see this, the analyst plots $L^*$ for values of P(T) from 0.0 to 1.0. The following plot is obtained:


    Similarly (and in practice), one could use the log likelihood function to derive the maximum likelihood estimates, where

    \[ L = \log(L^*) = \log\left[p^7 (1-p)^3\right] = \log p^7 + \log (1-p)^3 = 7 \log p + 3 \log(1-p). \]

    [Figure: Log-Likelihood Function]


    Note that in this simple model $p$ is the only parameter being estimated, so maximizing the likelihood function $L^*$ or $\log(L^*)$ only requires one first order condition: the derivative of $\log(L^*)$ with respect to $p$ set equal to zero.
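    The example above can be checked numerically. The sketch below evaluates both the likelihood and the log likelihood on a grid of values of p and picks the maximizer of each; the grid spacing of 0.01 and the function names are arbitrary choices made for this illustration.

```python
import math


def likelihood(p, x=7, n=10):
    """L* = p^x (1 - p)^(n - x) for x auto users among n travelers."""
    return p ** x * (1.0 - p) ** (n - x)


def log_likelihood(p, x=7, n=10):
    """L = x log p + (n - x) log(1 - p)."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


# Grid over the open interval (0, 1); the end points are excluded because
# the log likelihood is undefined there.
grid = [k / 100 for k in range(1, 100)]
p_best_likelihood = max(grid, key=likelihood)
p_best_log_likelihood = max(grid, key=log_likelihood)

# First order condition: x/p - (n - x)/(1 - p) = 0, which gives p = x/n.
p_closed_form = 7 / 10

print(p_best_likelihood, p_best_log_likelihood, p_closed_form)
```

    Both searches return the same maximizer because the logarithm is a monotonically increasing transformation, and that value agrees with the closed-form solution of the first order condition.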

    The Multinomial Logit Model

    The multinomial logit (MNL) model is the most commonly applied model to explain and forecast discrete choices due to its ease of estimation and foundation in utility theory. The MNL model is a general extension of the binomial choice model to more than two alternatives. The universal choice set is $C$, which contains $J$ elements, and a subset of $C$ for each individual, $C_n$, defines that individual's restricted choice set. It should be noted that it is not a trivial task to define restricted choice sets for individuals. The number of alternatives $J_n$ available to decision maker $n$ is less than or equal to $J$, the total number of alternatives in the universal choice set; however, it is often assumed that all decision makers face the same set of universal alternatives.

    Without showing the derivation, which can be found in the references for this chapter, the MNL model is expressed as:

    \[ P_n(i) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}} \]

    where:

    1. Utility for traveler $n$ and mode $i$ is $U_{in} = V_{in} + \varepsilon_{in}$.

    2. $P_n(i)$ is the probability that traveler $n$ chooses mode $i$.

    3. The numerator is the exponentiated systematic utility of mode $i$ for traveler $n$; the denominator is the sum of the exponentiated systematic utilities of all alternative modes in $C_n$ for traveler $n$.

    4. The disturbances $\varepsilon_{in}$ are independently distributed.

    5. The disturbances $\varepsilon_{in}$ are identically distributed.

    6. The disturbances are Gumbel distributed with a location parameter $\eta$ and a scale parameter $\mu > 0$.
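    The MNL probabilities for one traveler can be computed directly from the systematic utilities of the alternatives in that traveler's choice set. The sketch below is a minimal illustration: the function name and the utility values are hypothetical, and subtracting the largest utility before exponentiating is a common numerical safeguard that leaves the probabilities unchanged.

```python
import math


def mnl_probabilities(systematic_utilities):
    """Map {alternative: V_in} over C_n to {alternative: P_n(i)}."""
    v_max = max(systematic_utilities.values())
    # exp(V - v_max) avoids overflow; the common factor cancels in the ratio.
    exp_v = {alt: math.exp(v - v_max)
             for alt, v in systematic_utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical systematic utilities for a traveler without a vehicle.
v_n = {"sharing a ride": -0.5, "transit bus": -1.0, "walking": -2.0}
p_n = mnl_probabilities(v_n)
for alt, p in p_n.items():
    print(alt, round(p, 3))
```

    The probabilities sum to one over the choice set, each lies strictly between zero and one, and the alternative with the highest systematic utility receives the largest probability.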

    The MNL model expresses the probability that a specific alternative is chosen as the exponential of the utility of that alternative divided by the sum of the exponentials of the utilities of all alternatives (chosen and not chosen). The predicted probabilities are bounded by zero and one. There are several assumptions embedded in the estimation of MNL models.


    Linear in parameters restriction:

    The linear in parameters restriction is made for convenience of estimation, which enables simple and efficient estimation of parameters. When the functional form of the systematic component of the utility function is linear in parameters, the MNL model can be written as:

    \[ P_n(i) = \frac{e^{\beta' x_{in}}}{\sum_{j \in C_n} e^{\beta' x_{jn}}} \]

    where $x_{in}$ and $x_{jn}$ are vectors describing the attributes of alternatives $i$ and $j$ as well as attributes of traveler $n$.
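    With a linear in parameters specification, the systematic utility is a dot product of a coefficient vector and an attribute vector. The sketch below assumes two attributes, travel time in minutes and cost in dollars; the coefficient values, attribute values, and function names are invented for illustration and are not estimates from any data set.

```python
import math


def linear_utility(beta, x):
    """Systematic utility V = beta' x for one alternative."""
    return sum(b * xi for b, xi in zip(beta, x))


def mnl_linear(beta, attributes):
    """MNL probabilities when V_in = beta' x_in for every alternative."""
    v = {alt: linear_utility(beta, x) for alt, x in attributes.items()}
    denominator = sum(math.exp(value) for value in v.values())
    return {alt: math.exp(value) / denominator for alt, value in v.items()}


# Assumed coefficients: utility per minute of travel time, per dollar of cost.
beta = [-0.05, -0.4]

# Assumed attribute vectors x_in = [travel time (min), cost ($)].
attributes = {
    "auto": [20.0, 4.0],
    "bus": [35.0, 1.5],
    "rail": [30.0, 2.0],
}

probabilities = mnl_linear(beta, attributes)
for alt, p in probabilities.items():
    print(alt, round(p, 3))
```

    Because the same coefficient vector applies to every alternative, estimation only has to recover one set of betas, which is the convenience the restriction is meant to provide.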

    Independence from Irrelevant Alternatives Property (IIA)

    Succinctly stated, the IIA property states that for a specific individual the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other

    alternatives. This property arises from the assumption in the deriv