
Title: Scale Heterogeneity in Healthcare Discrete Choice Experiments: A Primer

Authors: Caroline M Vass a,*, PhD; Stuart Wright a, MSc; Michael Burton b, PhD; Katherine Payne a, PhD

a Manchester Centre for Health Economics, The University of Manchester, UK

b UWA School of Agriculture and Environment, University of Western Australia, Australia

*Corresponding author: Caroline Vass

Telephone: +44 (0) 161 306 7906

Fax: +44 (0) 161 275 5205

Email: [email protected]

Journal: The Patient

Word count: 2,964/3,000 max

No. of tables: 3

No. of figures: 1

Appendices: 1

Journal reference: PTTA-D-16-00217

Keywords: error variance, heteroskedastic errors, discrete choice models, choice consistency, discrete choice experiments

Ethical approval: No ethical approval was required for this study.

Conflict of interest: CV and KP were supported in the preparation and submission of this paper by Mind the Risk, from The

Swedish Foundation for Humanities and Social Sciences. The views and opinions expressed are those of the authors, and not

necessarily those of other Mind the Risk members or The Swedish Foundation for Humanities and Social Sciences. SW and MB

declare that they have no conflict of interest.

Running head: Scale heterogeneity in healthcare DCEs

Author contributions: Caroline Vass was involved in the drafting and editing of the manuscript. Stuart Wright was involved in

the drafting and editing of the manuscript. Michael Burton was involved in the simulation of choice data, and drafting and editing

of the manuscript. Katherine Payne was involved in the drafting and editing of the manuscript.

Acknowledgements: The authors wish to thank Dr Arne Hole from the University of Sheffield for reading and commenting on a

draft of the manuscript.


Abstract

Discrete choice experiments (DCEs) are used to quantify the preferences of specified sample populations for different aspects of a good or service and are increasingly used to value interventions and services related to healthcare. Systematic reviews of healthcare DCEs have focussed on trends over time in specific design issues and changes in the approach to analysis, with a more recent move towards consideration of a specific type of variation in preferences within the sample population, called taste heterogeneity, noting rises in the popularity of mixed logit and latent class models. Another type of variation, called scale heterogeneity, which relates to differences in the randomness of choice behaviour, may also account for some of the observed 'differences' in preference weights. The issue of scale heterogeneity becomes particularly important when comparing preferences across subgroups of the sample population, as apparent differences in preferences could be due to taste and/or choice consistency. This primer aims to define and describe the relevance of scale heterogeneity in a healthcare context and illustrate key points with a simulated data-set provided to readers in the appendix.

Key points:

Many health care discrete choice experiments (DCEs) are analysed using models which estimate scaled preference

parameters indicating the effect of each attribute relative to the variance of the error term.

The confounding of scale and preferences in the estimated parameters becomes an issue when comparing parameters; for example, when considering differences in tastes between two populations.

There exist various approaches for identifying and accounting for scale heterogeneity.


1. Introduction

Discrete choice experiments (DCEs) have become a key tool for researchers seeking to quantify the preferences of specified

populations of the public, patients, clinicians or other key stakeholders. Systematic reviews have shown an increasing trend in the

number of healthcare DCEs [1–3]. Originating in market research, and underpinned by economic theory [4], DCEs aim to elicit respondents' preferences by assuming that the choices made when trading off specifically designed alternatives provide information about respondents' preferences for the characteristics of the good or service being valued, described using attributes and levels. In a DCE,

respondents are required to choose the alternative which they perceive provides the most utility from a choice set [5]. The choice

data generated are commonly analysed using some form of discrete choice model which reveals to the researcher if, and how, the

attributes in the design contributed to a particular alternative being chosen [6,7]. A systematic review of scale heterogeneity in

healthcare DCEs found that it was an important issue but that, even in studies comparing preferences, it was not always acknowledged [8].

Some commentators [5,7,9,10] have also noted the importance of scale heterogeneity in the healthcare context but the existing

guides are broad in focus and are consequently brief or assume background knowledge in their description of the subject.

This primer aims to define and describe the relevance of scale heterogeneity, focussing on DCEs in the healthcare context. The

primer starts by defining the ‘scale parameter’ (which is related to the variance of the error term) and describing how it affects the

coefficients estimated by discrete choice models. We then describe why scale heterogeneity can be an issue, particularly when

comparing coefficients from datasets from different populations or data generated from different sources (for example, sources

using different sampling strategies). The last section of the primer aims to introduce methods to test for heterogeneity in scale and

direct the reader to relevant discrete choice models which account for it. In this primer we use mathematical notation, as a basic understanding of it is required in order to access the further literature on this topic. However, we aim to make the notation accessible

and use a Technical Appendix which presents the simulation and estimation of choice data to illustrate the impact of scale

heterogeneity in a hypothetical DCE.

2. What is the ‘scale parameter’?

When asking respondents to make a choice, most experimental designs implicitly encourage them to consider the relative balance

between desirable (preferred) and undesirable (not preferred) attribute levels. Analysis of these choice data involves estimating

parameters for each attribute to quantify the individuals’ tastes (or preferences) within the specified population who completed the

experiment. Assume an individual, n, has to choose from J alternatives (j=1,..,J), where each alternative is described by K

attributes. This individual has a vector of preferences1, $\beta_k$, associated with each attribute, and the deterministic element of utility derived from alternative j is a function of their preferences and the levels of the attributes ($X_{j,k}$):

$V_{n,j} = f(\beta_k, X_{j,k})$   [1]

However, random utility theory acknowledges that there may be an additional component of utility that is unobservable to the

researcher, so that the total utility is defined as:

$U_{n,j} = V_{n,j} + \varepsilon_{n,j}$   [2]

An individual's utility function can therefore be seen as comprising two elements: (i) a deterministic component ($V_{n,j}$), based on their preferences for the observed attributes of the alternative j, and (ii) random noise ($\varepsilon_{n,j}$) associated with the alternative.

1 In some studies, preferences are allowed to vary by individual, introducing the subscript n: $\beta_{n,k}$. This primer focuses on scale heterogeneity and not preference heterogeneity. For this reason, issues relating to the joint identification of preference and scale heterogeneity are not addressed in this primer. We refer readers to Hess & Train [36] for a description of these issues, which are beyond the scope of this primer.


Individuals are assumed to act rationally and select the alternative in a choice set that provides them with the most utility [11,12].

Equation 3 shows that, under utility maximisation, the individual, n, will choose alternative j over any other alternative i if:

$U_{n,j} > U_{n,i} \quad \text{for all } i \neq j$   [3]

The key determinant of choice is the difference in utility, i.e. whether $U_{n,j} > U_{n,i}$. However, as ε is unobservable, utility, U, is also unobservable. For this reason, probabilistic models are used to acknowledge that U is measured with some degree of error (ε).

Under Random Utility Theory, the probability of an alternative being chosen increases with deterministic utility [13]. That is to

say, as the observable part of utility increases, choices are less random, and therefore the probability of an individual choosing the

alternative increases. The probability of an individual, n, choosing alternative j over alternative i is shown in equations 4 and 5:

$P_{n,j} = \text{Prob}(V_{n,j} + \varepsilon_{n,j} > V_{n,i} + \varepsilon_{n,i} \ \forall i \neq j)$   [4]

$P_{n,j} = \text{Prob}(\varepsilon_{n,i} - \varepsilon_{n,j} < V_{n,j} - V_{n,i} \ \forall i \neq j)$   [5]

The analysis is therefore concerned with understanding the probability that the unobserved difference in utility between alternatives, $\varepsilon_{n,i} - \varepsilon_{n,j}$, is less than the observed difference, $V_{n,j} - V_{n,i}$. Using the density $f(\varepsilon_n)$, the cumulative probability is therefore:

$P_{n,j} = \int_{\varepsilon} I(\varepsilon_{n,i} - \varepsilon_{n,j} < V_{n,j} - V_{n,i} \ \forall i \neq j) \, f(\varepsilon_n) \, d\varepsilon_n$   [6]

Equation 6 is a multidimensional integral over the density of the unobserved portion of utility ($\varepsilon_n$), and only certain distributions of ε result in a closed-form expression (allowing parameters to be estimated). As part of the specification of the random component, ε, inferences must be made regarding the shape, location, and dispersion of its distribution. Different discrete choice

models are obtained from different specifications of this density. For a guide to more general DCE model specification, estimation

and software, we refer readers to Lancsar et al., (2017) [7]. In this primer, we will focus on one of the most commonly used

discrete choice models [2], the multinomial logit (MNL) model,2 but the general arguments extend to other models such as, for

example, the nested logit or multinomial probit models.

In the MNL it is assumed that the error term has an independent and identical Gumbel (type 1 extreme value) distribution, i.e. there is no correlation in the error term across alternatives or across choice occasions. It is also assumed that the random term has a mean

of zero. This means that any systematic deviation in the mean error common to all alternatives will have no effect on choices, as

the model is defined in utility differences (see equations 4 and 5).

If the random part of utility follows a Gumbel type 1 distribution, then the probability of selecting an alternative can be expressed

as:

$P_{n,j} = \dfrac{e^{V_{n,j}}}{\sum_{i=1}^{J} e^{V_{n,i}}}$   [7]

2 The MNL is often used interchangeably with 'conditional logit'. In this case, we use the term 'multinomial logit' or 'MNL' as this is commonly used in the literature cited, but acknowledge that estimation of this model in Stata uses the clogit command.


As utility has no units, analysis of discrete choice models is based on differences in utility. Therefore the 'scale' (the x-axis of the distribution) is often normalised so that the scale parameter, λ, is set to a value of one. In the Gumbel distribution, the variance, $\sigma^2_{\varepsilon_{n,j}}$, is defined as $\dfrac{\pi^2}{6\lambda^2}$. Thus, in the MNL, the scale parameter, λ, is inversely related to the variance of the error term.

At this point it is useful to mention some differences in terminology between the language used in the discrete choice modelling

literature and the statistics literature. In discrete choice modelling the lambda term, λ, is termed the ‘scale parameter’. In contrast,

statisticians often refer to a 'dispersion parameter'. In economics and econometrics, it is therefore almost always said that the scale parameter (λ) is inversely related to the variance of the error term, $\sigma^2_{\varepsilon_{n,j}}$.
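Rearranging the Gumbel variance expression given above makes this inverse relationship explicit:

$\lambda = \dfrac{\pi}{\sqrt{6}\,\sigma_{\varepsilon_{n,j}}}$

so a noisier error term (a larger $\sigma_{\varepsilon_{n,j}}$) implies a smaller scale parameter.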

Arbitrarily scaling by λ has implications for the estimated choice probabilities, as the deterministic utility is multiplied through by λ:

$P_{n,j} = \dfrac{e^{\lambda V_{n,j}}}{\sum_{i=1}^{J} e^{\lambda V_{n,i}}}$   [8]

This means the model is unidentified, as scale and preference parameters cannot be separately estimated. Reported estimates ($\hat{\beta}$) are therefore scaled preference parameters ($\lambda\beta$) which indicate the effect of each observable variable relative to the variance of the error term (the unobservable components).

As the scale parameter (λ) decreases, or the variance increases and the errors become more dispersed, the estimated coefficients ($\hat{\beta}$) appear smaller. This is logical, as the increased error variance means the random part of utility becomes larger relative to the deterministic component ($\varepsilon_{n,j}$ has relatively larger weight than $V_{n,j}$) and choices will tend to be more random. Completely random choices occur when there is an equal probability of selecting any alternative irrespective of the attribute levels.
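For example, for a choice between only two alternatives, Equation 8 reduces to a binary logit in the scaled utility difference:

$P_{n,j} = \dfrac{e^{\lambda V_{n,j}}}{e^{\lambda V_{n,j}} + e^{\lambda V_{n,i}}} = \dfrac{1}{1 + e^{-\lambda (V_{n,j} - V_{n,i})}}$

As λ approaches zero (the error variance grows without bound), $P_{n,j}$ approaches 0.5 and the choice becomes effectively random; as λ grows large, the alternative with the higher deterministic utility is chosen almost with certainty.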

3. Why is scale an issue?

Remember that utility has no scale, so the estimated coefficients are uninterpretable as numbers alone because they are all scaled by the constant scale parameter (λ). This scaling becomes an important issue when comparisons are being made across estimated $\hat{\beta}$s and the scale (λ) differs systematically across groups of individuals. Within healthcare, studies have compared the preferences of healthcare professionals, patients and the public for various healthcare choices [14–16]. However, the scale parameter may differ between these samples (e.g. when patients are more familiar with the illness than the public), but also across survey modes (e.g. when completing questions face-to-face versus online) or data sources (e.g. stated and revealed preference data), or even within the choices made by an individual (e.g. when an individual suffers from fatigue as the

choice-tasks progress).3 The confounding of the β coefficients and the scale parameter makes it difficult to establish whether differences in preference weights arise because, for example, V is larger or because the variance of ε is smaller. Parameter estimates of utility will depend on the variance of the error term ($\sigma^2_{\varepsilon_{n,j}}$) and/or preferences.
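To illustrate with the values used to generate the simulated data in the Technical Appendix (where the true coefficient on Life is 2 and the public's error scale is twice that of the patients, so that $\lambda_{public} \approx 0.5$ relative to $\lambda_{patient} = 1$), identical tastes produce different reported estimates:

$\hat{\beta}^{patient}_{Life} = \lambda_{patient}\beta_{Life} = 1 \times 2 = 2 \qquad \hat{\beta}^{public}_{Life} = \lambda_{public}\beta_{Life} = 0.5 \times 2 = 1$

A naive comparison would suggest that the public care half as much about additional life expectancy, when in fact only their choice consistency differs.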

Table 1 shows the attributes and levels for an example of a simulated DCE comprising four attributes, assumed to be continuous, each with ten levels, used to elicit and compare the preferences of a sample of patients (n=1000) and members of the public (n=1000) for a healthcare treatment. As these data were simulated, the 'preferences' and scale are known. The Technical Appendix provides Stata code for the generation and analysis of these data. In the simulation, 'respondents' were asked to select their preferred treatment from two alternatives.

3 Comparisons of coefficients estimated with different models can be problematic too. For example, estimating a probit model with a standard normal distribution (with a variance of 1) and a logit model with a standard logistic distribution (with a variance of $\pi^2/3$) will result in the estimated coefficients differing by a factor of $\sqrt{\pi^2/3} \approx 1.8$. For a worked example see Part D of the Technical Appendix and, for more details, Chapters 2 and 3 of Train [37].

<insert table 1 here>

Table 2 shows the results of pooled and split-sample models. The first two columns of preference parameters are from the MNL

analysis of the ‘patient’ and ‘public’ samples, respectively (see Part A of the Technical Appendix). At first glance, the estimated

coefficients may simply, but incorrectly, be interpreted as showing that the patient sample had different preferences than the

public for all of the attributes. From the larger estimated coefficients, one could also erroneously conclude that patients were more

sensitive to each attribute.

<insert table 2 here>

It is feasible to formally test whether the coefficients are different by performing a log-likelihood ratio test: estimate a 'pooled' model, where the coefficients are restricted to be the same for both samples, and compare the fit of the restricted model to a model that allows them to vary. Column 3 reports the pooled model, and the log-likelihood ratio test statistic has a p-value <0.001, which would imply that the null hypothesis that the coefficients for the two samples are the same is rejected [17].
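The statistic computed by this comparison is the standard likelihood-ratio construction (this is what the lrtest command in Part A of the Technical Appendix reports):

$LR = 2\left[(LL_{patients} + LL_{public}) - LL_{pooled}\right] \sim \chi^2_{df}$

where the degrees of freedom equal the number of restrictions imposed by pooling (here, the four attribute coefficients).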

However, the estimated coefficients actually reflect the combination of both preferences towards the attribute and the variance of

the unobservable element of utility, in the sample of the patients and the public, respectively. Therefore, the observed difference

between estimates for patients and the public may reflect the relative variance of the error term of patients and the public i.e. their

different scales. Consequently, the estimated model may be revealing differences in the randomness of choice between patients

and the public, and not a difference in sensitivity to the attributes of treatment.

When the variance of the error term is not the same across two groups of individual respondents, there are said to be

'heteroskedastic errors' and normalising $\lambda_{public} = \lambda_{patient} = 1$ (which is what the pooled model does) is incorrect. Making comparisons across groups, when there are heteroskedastic errors, is when issues related to scale heterogeneity become apparent

and important. However, this issue of scale heterogeneity disappears if the aim is to examine differences in willingness-to-pay or other marginal rates of substitution. In the calculation of these ratios, the confounding disappears because of the impact of simple division:

$\dfrac{\hat{\beta}_1}{\hat{\beta}_2} = \dfrac{\lambda\beta_1}{\lambda\beta_2} = \dfrac{\beta_1}{\beta_2}$

In Table 3, the marginal willingness-to-pay estimates from the MNL models for each sub-sample are presented, confirming that the marginal rates of substitution are very similar across the two samples.

<insert table 3 here>
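As a sketch of how such a ratio can be calculated directly after estimation (the Technical Appendix instead uses the user-written wtp command; the alternative below relies on the delta method via nlcom and assumes the Part A models have been run), the marginal willingness-to-pay for the Life attribute in the patient sample could be obtained with:

* delta-method willingness-to-pay for Life in the patient sample; the ratio
* -_b[life]/_b[cost] is unaffected by the scale parameter, as shown above
clogit choi health life risk cost if public==0, group(id)
nlcom (wtp_life: -_b[life]/_b[cost])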

It is common, and attractive for identification purposes, to assume that the variance of the error term is equal to one and is the same for all individuals. It is also appealing to present marginal rates of substitution, thereby removing the issue of scale and preference confounding. However, in some healthcare DCEs, such as those valuing health states, there may not be a meaningful numeraire. Furthermore, if some respondents have a very large error variance (verging on random choice) it may be incorrect to conclude that there is a common willingness-to-pay for all respondents [18]. Therefore, it may be necessary to formally test whether the error term is the same for all individuals within a population sample; discovering differences in choice consistency could be an important finding in itself.

4. Identifying scale heterogeneity


It is not possible to separately identify scale and preference parameters if there is only a single sample where everyone has the

same error variance. However, if there are two samples it is possible to test if the error variances are different, subject to assuming

that they have the same preferences. Testing is carried out by normalising the variance in one selected group, for example the

patient sample, and identifying the relative scale parameter of the public group. In this hypothetical example, the utility functions

of each group are therefore:

$U_{n,j} = \beta_1 Health_{n,j} + \beta_2 Life_{n,j} + \beta_3 Risk_{n,j} + \beta_4 Cost_{n,j} + \varepsilon_{n,j}$ for patients   [9]

$U_{n,j} = \lambda_{public}\beta_1 Health_{n,j} + \lambda_{public}\beta_2 Life_{n,j} + \lambda_{public}\beta_3 Risk_{n,j} + \lambda_{public}\beta_4 Cost_{n,j} + \varepsilon_{n,j}$ for the public   [10]

where $\lambda_{public}$ is the relative scale parameter for the public and $\lambda_{patient}$ is normalised to one.

To identify scale heterogeneity, Swait and Louviere [19] formalised the normalisation of the error variance. The authors also

proposed plotting the coefficients of the MNL model results for each sample of respondents. Figure 1 graphs the coefficients from

Table 2 and suggests they differ by a scalar of approximately 0.5 (the slope of the straight trend line fitted through the points,

which passes through the origin). This graph can be reproduced by following the code in Part B of the Technical Appendix.4 This

approach is favourable because of its simplicity as, regardless of the software package used to analyse the data, plots can easily be

drawn-up in Excel. This diagnostic approach is recommended as a minimum in DCE studies that want to compare preferences

across different samples and was used, for example, in Payne et al [14].

<insert figure 1 here>

The disadvantage of the approach is that it does not allow for a formal evaluation of whether the preference parameters are the

same, once one has accounted for scale (in actual studies, the alignment of the coefficients is seldom as clear as in Figure 1).5 The

heteroskedastic multinomial logit model [20], also known as the heteroskedastic conditional logit (HCL) [21], allows unequal

error variances across individuals in a data set. The HCL model allows the scale parameter $\lambda_n$ to be a function of the individual n's characteristics. The model parameterises $\lambda_n$ as $\exp(Z_n\gamma)$, where $Z_n$ is a vector of individual characteristics (for example, being a patient or a member of the public) and γ is a vector of parameters reflecting the effect of individuals' characteristics on the scale parameter. The heteroskedastic conditional logit model can be estimated using maximum likelihood methods in Stata [22] using the command clogithet [23]. Advantageously, the command uses the same data set-up as clogit, a (fixed-effects) multinomial logit model. The HCL model can also be implemented in Nlogit [24]. Part C of the Technical Appendix reproduces

the HCL model results of column 4 in Table 2. The scale term shows that the public sample had a statistically significantly smaller scale parameter, which suggests increased error variance. The scale parameter is the exponential of this term, approximately 0.42 in this example, and its standard error can be estimated using the nlcom command in Stata. It is important to note that, similar to the Swait and Louviere test and the use of coefficient plots, the HCL assumes that the groups of interest have homogeneous preferences.6
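To make the link between the reported scale term and the relative scale explicit, using the estimate in column 4 of Table 2:

$\hat{\lambda}_{public} = \exp(\hat{\gamma}_{public}) = \exp(-0.870) \approx 0.42$

which is broadly consistent with the ratios between the public and patient coefficients in columns 1 and 2 of Table 2.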

If one now conducts a formal test of whether the preferences of the two samples can be restricted to be the same, conditional upon

there being different scales, the associated p-value is 0.340, suggesting that the null hypothesis of equivalent preferences cannot

be rejected. This is not surprising, as the only difference between the simulated respondents was the error variance used in generating the public sample data, which was larger (and hence the scale smaller) than for the patient sample.

4 For simplicity, alternative specific constants are not included in this model. It is noted in some literature that alternative specific constants should also not be included on these coefficient plots [38].

5 For a more detailed discussion of issues which arise when comparing preference data, and a formal test to identify scale heterogeneity, we refer readers to chapters 8 and 13 of Louviere et al. [13] and Swait & Louviere [38].

6 Part E of the Technical Appendix generates data with different scale and preference parameters in the two samples and an example estimation where the equivalence of the preference parameters is rejected, even once one has controlled for scale.


Using the HCL, studies have established that scale heterogeneity could be due to many reasons. Bech et al [25] found that the

following all explained heterogeneity in the error variance: age over 60 years; response time; certainty of choices; familiarity with

attributes; perceived task difficulty; choice certainty; choice inconsistency; and number of choice sets. Flynn et al [26]

hypothesised that literacy (educational qualifications), age and a previous diagnosis of a psychological condition would affect error variance in a best-worst scaling DCE eliciting preferences for quality of life. Their analysis revealed six significant predictors of error variance, including: time spent completing the interview; having a qualification; owning a car; having a higher than population-level quality of life; and better self-rated general health. There are other examples where accounting for scale heterogeneity has improved models [20,26–29]. Identification of significant scale heterogeneity may also be an interesting finding in its own right for decision makers using the results of a choice-based stated preference survey, as it reveals additional information about heterogeneity in the strategies used by the sample when making choices.
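As a purely illustrative sketch of how such hypotheses could be examined with the HCL (the covariates age60 and taskdiff below are hypothetical and are not part of the simulated data-set in the Technical Appendix), several candidate drivers of error variance can be entered in the scale function at once:

* hypothetical extension of the Part C model: age60 and taskdiff are assumed
* to be an over-60 indicator and a perceived task difficulty score collected
* alongside the choice data; they do not exist in the simulated data-set
clogithet choi health life risk cost, group(id) het(public age60 taskdiff)
* the significance of the het() coefficients, individually or jointly, then
* indicates whether these characteristics are associated with choice consistency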

5. Taking account of both scale heterogeneity and preference heterogeneity

The MNL and HCL models described here are limited in their ability to account for preference heterogeneity and, therefore,

assume that all individuals in the selected population have the same preferences. Systematic reviews of the healthcare DCE

literature have found examples of more sophisticated models being used in healthcare DCEs [2]. Substantial efforts have been

made to understand how to extend the MNL models to allow for different preference distributions [30,31]. The generalised

multinomial logit (GMNL) model allows for both preference and scale heterogeneity [32] and can also be implemented in Stata

[33]. Discrete preference distributions and scale heterogeneity can be modelled with scale-adjusted latent class analysis in Latent

Gold Choice [34]. It has been argued that it is impossible to disentangle the heterogeneity further where differences in preference

and scale parameters exist [35]. For further explanation, we refer readers to Hess & Train [36].
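As a minimal sketch of how the GMNL could be fitted to the pooled simulated data (assuming the mixlogit-style syntax of the user-written gmnl command described by Gu et al. [33]; the option names below should be checked against that article), one might specify:

* locate and install the user-written gmnl command described in [33]
findit gmnl
* sketch only: random (normally distributed) coefficients on all four
* attributes; gmnl additionally estimates the scale-heterogeneity parameters
* of the generalised multinomial logit model
gmnl choi, group(id) rand(health life risk cost)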

6. Conclusion

The potential impact of scale heterogeneity in the context of DCEs has not gone unnoticed [25,27], although it has received relatively little attention in the healthcare context [32]. This primer has provided an introduction to the importance of scale heterogeneity, particularly when comparing preferences elicited from different samples. The next logical step is to understand the degree to which published studies have erroneously concluded that there are differences in preferences which could actually be attributed to differences in the variance of the error term.


Bibliography

1. De Bekker-Grob EW, Ryan M, Gerard K. Discrete choice experiments in health economics: a review of the literature. Health Econ. 2012;21(2):145–72.

2. Clark M, Determann D, Petrou S, Moro D, de Bekker-Grob EW. Discrete choice experiments in health economics: a review of the literature. Pharmacoeconomics. 2014;32(9):883–902.

3. Vass C, Rigby D, Payne K. The role of qualitative research methods in discrete choice experiments: a systematic review and survey of authors. Med. Decis. Mak. 2017;37(3):298–313.

4. Lancaster KJ. A new approach to consumer theory. J. Polit. Econ. 1966;74(2):132–57.

5. Lancsar E, Louviere J. Conducting discrete choice experiments to inform healthcare decision making: a user’s guide. Pharmacoeconomics. 2008;26(8):661–77.

6. McFadden D. The choice theory approach to market research. Mark. Sci. 1986;5(4):275–97.

7. Lancsar E, Fiebig DG, Hole AR. Discrete choice experiments: a guide to model specification, estimation and software. Pharmacoeconomics. 2017;35(7):697–716.

8. Wright SJ, Vass CM, Sim G, Burton M, Fiebig DG, Payne K. Accounting for scale heterogeneity in health-related discrete choice experiments: the current state of play. Patient. 2017;

9. Hauber AB, González JM, Groothuis-Oudshoorn CGM, Prior T, Marshall DA, Cunningham C, et al. Statistical methods for the analysis of discrete choice experiments: a report of the ISPOR conjoint analysis good research practices task force. Value Heal. 2016;19(4):300–15.

10. Viney R, Lancsar E, Louviere J. Discrete choice experiments to measure consumer preferences for health and healthcare. Expert Rev. Pharmacoeconomics Outcomes Res. 2002;2(4):319–26.

11. Thurstone L. A law of comparative judgment. Psychol. Rev. 1927;34(4):273–86.

12. Marschak J. Binary-choice constraints and random utility indicators. Math. Methods Soc. Sci. Dordrecht: Springer Netherlands; 1960:312–29.

13. Louviere J, Hensher D, Swait J. Stated choice methods: analysis and application. Cambridge University Press; 2000.

14. Payne K, Fargher EA, Roberts SA, Tricker K, Elliott RA, Ratcliffe J, et al. Valuing pharmacogenetic testing services: a comparison of patients' and health care professionals' preferences. Value Heal. 2011;14(1):121–34.

15. Najafzadeh M, Johnston KM, Peacock SJ, Connors JM, Marra MA, Lynd LD, et al. Genomic testing to determine drug response: measuring preferences of the public and patients using discrete choice experiment (DCE). BMC Health Serv. Res. 2013;13(1):454.

16. Morillas C, Feliciano R, Catalina PF, Ponte C, Botella M, Rodrigues J, et al. Patients' and physicians' preferences for type 2 diabetes mellitus treatments in Spain and Portugal: a discrete choice experiment. Patient Prefer. Adherence. 2015;9:1443–58.

17. Wooldridge J. Introductory econometrics. 4th ed. South Western College; 2008.

18. Burton M, Davis KJ, Kragt ME. Interpretation issues in heteroscedastic conditional logit models. Univ. West. Aust. Sch. Agric. Resour. Econ. Work. Pap. 2016;1603.

19. Swait J, Louviere J. The role of the scale parameter in the estimation and comparison of multinomial logit models. J. Mark. Res. 1993;30(3):305–14.

20. Hensher D, Louviere J, Swait J. Combining sources of preference data. J. Econom. 1998;89(1-2):197–221.


21. Hole AR. Small-sample properties of tests for heteroscedasticity in the conditional logit model. Econ. Bull. 2006;3:1–14.

22. StataCorp. Stata Statistical Software: Release 12. College Station, TX: StataCorp LP; 2011.

23. Hole AR. CLOGITHET: Stata module to estimate heteroscedastic conditional logit model. Stat. Softw. Components. 2006;(S456737).

24. LIMDEP. Nlogit. 2015.

25. Bech M, Kjaer T, Lauridsen J. Does the number of choice sets matter? results from a web survey applying a discrete choice experiment. Health Econ. 2011;20(3):273–86.

26. Flynn T, Louviere J, Peters T, Coast J. Using discrete choice experiments to understand preferences for quality of life. variance-scale heterogeneity matters. Soc. Sci. Med. 2010;70(12):1957–65.

27. DeShazo JR, Fermo G. Designing choice sets for stated preference methods: the effects of complexity on choice consistency. J. Environ. Econ. Manage. 2002;44(1):123–43.

28. Pedersen LB, Kjaer T, Kragstrup J, Gyrd-Hansen D. Do general practitioners know patients’ preferences? an empirical study on the agency relationship at an aggregate level using a discrete choice experiment. Value Heal. 2012;15(3):514–23.

29. Vass CM, Rigby D, Payne K. Investigating the heterogeneity in women’s preferences for breast screening: does the communication of risk matter? Value Heal. 2017;

30. Hensher D, Greene W. The mixed logit model: the state of practice. Transport. 2003;30:133–76.

31. Greene WH, Hensher D. A latent class model for discrete choice analysis: contrasts with mixed logit. Transp. Res. Part B Methodol. 2003;37(8):681–98.

32. Fiebig D, Keane M, Louviere J, Wasi N. The generalized multinomial logit model: accounting for scale and coefficient heterogeneity. Mark. Sci. 2010;29(3):393–421.

33. Gu Y, Hole AR, Knox S. Fitting the generalized multinomial logit model in stata. Stata J. 2013;13(2):382–97.

34. Statistical Innovations. Latent Gold. 2013.

35. Hess S, Rose JM. Can scale and coefficient heterogeneity be separated in random coefficients models? Transportation (Amst). 2012;39(6):1225–39.

36. Hess S, Train K. Correlation and scale in mixed logit models. J. Choice Model. 2017;23:1–8.

37. Train K. Discrete choice methods with simulation. 2nd ed. Cambridge University Press; 2009.

38. Swait J, Louviere J. The role of the scale parameter in the estimation and comparison of multinomial logit models. J. Mark. Res. 1993;30(3):305–14.


Tab. 1: Attribute and level definitions for a hypothetical DCE

Attribute | Label | Levels
Improvement on disease activity scale | Health | 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9 points
Additional life expectancy | Life | 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9 years
For every 10 people, number who will have a side effect | Risk | 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9 people
Cost of treatment | Cost | £0, £1, £2, £3, £4, £5, £6, £7, £8 or £9

Tab. 2: Pooled and split-sample estimates of discrete choice data using different model specifications

Attribute | Patients multinomial logit model [1] | Public multinomial logit model [2] | Pooled multinomial logit model [3] | Heteroskedastic conditional logit model [4]
Health | 1.161 (0.159) | 0.426 (0.050) | 0.592 (0.048) | 1.116 (0.154)
Life | 2.368 (0.305) | 1.011 (0.088) | 1.295 (0.088) | 2.359 (0.300)
Risk | -3.563 (0.456) | -1.483 (0.125) | -1.928 (0.128) | -3.525 (0.449)
Cost | -1.128 (0.149) | -0.465 (0.052) | -0.613 (0.048) | -1.113 (0.146)
Scale term (public) | | | | -0.870 (0.152)
Log-likelihood | -234.504 | -156.835 | -238.648 | -221.802
Sample | n=1000 | n=1000 | n=2000 | n=2000
Standard errors in parentheses

Tab. 3: Marginal willingness-to-pay for a marginal increase in one unit by sample (multinomial logit model)

 | Health | Life | Risk
Patients' willingness-to-pay | £1.03 | £2.10 | -£3.16
Public's willingness-to-pay | £0.92 | £2.17 | -£3.19


Fig. 1: 'Swait and Louviere' [19] plot of hypothetical coefficients estimated from the preferences of a sample of patients and the public. Dotted line is at 45° for reference.


Technical Appendix

This appendix presents Stata code for the simulation of the choice data and estimation of the models described in the manuscript. It assumes that there are two sub-populations which have identical preferences, but the random component of utility has different variances/scale.

clear

*** Generating choice data: this code simulates hypothetical choice data for the healthcare DCE described in Table 1 ***

* Setting the number of observations
* a defines the overall sample size, and s identifies two subsamples when s=0 and s=1

local a=2000
set obs `a'
gen id=_n
gen s=0
replace s=1 if id>`a'/2

* Setting the number of alternatives
* The number of alternatives in each choice set is set to 2

expand 2
bysort id: gen alt=_n

* Generating attribute levels
* Four attributes are defined. They are assumed to take on integer values from 0 to 9
* There is no formal 'experimental design' employed: attribute levels across individuals and alternatives are assigned at random.
* Given the large sample size used in the simulation, the small number of attributes and the homogeneity of preferences, this still allows accurate identification of estimates, whilst simplifying the generation of the data.
* Setting a seed value allows the results reported in the paper to be reproduced exactly: changing this value will draw a different sample.

set seed 26
gen x1=int(10*uniform())
gen x2=int(10*uniform())
gen x3=int(10*uniform())
gen x4=int(10*uniform())

* Generating error and utilities
* The random element of utility is drawn from a Gumbel distribution where the standard deviation of the distribution is twice as large when s=1 compared to s=0
* The simulated utility from each alternative includes the deterministic part (1*x1+2*x2-3*x3-1*x4) and the stochastic part (err)

gen err=(1+s)*log(-log(uniform()))
gen u=1*x1+2*x2-3*x3-1*x4+err
sum err if s==0
sum err if s==1

* Generating choices
* First the maximum utility of the two alternatives is identified for each individual, and then this is used to make the 'choice' of the alternative with the highest utility

bysort id: egen max=max(u)
gen choi=0
replace choi=1 if u==max

* Renaming variables to attribute labels in Table 1
rename x1 health
rename x2 life


rename x3 risk
rename x4 cost

* Renaming sample identifier
rename s public

*** PART A: MNL models. This code reproduces the results of columns 1-3 of Table 2. ***
* Patients MNL [1]

* Estimates the model on just the patient sample, and saves results
clogit choi health life risk cost if public==0, group(id)
estimates store m0
matrix b0=e(b)'

* Public MNL [2]
* Estimates the model on just the public sample, and saves results

clogit choi health life risk cost if public==1, group(id)
estimates store m1
matrix b1=e(b)'

* Pooled MNL [3]
* Estimates the model on the pooled sample, and saves results

clogit choi health life risk cost, group(id)
estimates store m01

* Conducts a loglikelihood test for whether the parameter restrictions implied by the pooled model can be accepted
lrtest m01 (m0 m1)

* Estimating willingness to pay
* This re-estimates the sub-sample models, and reports the WTP estimates

ssc install wtp
clogit choi health life risk cost if public==0, group(id)
wtp cost health life risk
clogit choi health life risk cost if public==1, group(id)
wtp cost health life risk

*** PART B: Swait and Louviere plot. This code plots the coefficients from each sub-sample to reproduce Figure 1. ***
* Converts matrix into variables for plotting
svmat b0, names(b0)
svmat b1, names(b1)
sort b01

* Graph of the two sets of parameters against each other with a best fit line through them
regress b01 b11, nocons
predict hat, xb
twoway (scatter b01 hat b11 b11, c(i l l) m(s i i) lpattern(solid dash solid) ytitle(patients' coefficients) xtitle(public's coefficients)), legend(off)

*** PART C: HCL model. This code produces the results in column 4 of Table 2. ***
* Estimating HCL model

* This model uses the pooled data, but allows the scale coefficient to differ between them
ssc install clogithet
clogithet choi health life risk cost, group(id) het(public)
estimates store m01het
lrtest m01het (m0 m1)

* Identifying the scale parameter (as λ=exp(β))


nlcom exp([het]_b[public])
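* the nlcom command above applies the delta method, reporting exp(coefficient)
* with a standard error; the result is the scale of the public sample relative
* to the patient sample (which is normalised to one)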

* Conducting a log-likelihood test to see if the preference parameters can be restricted to be the same, conditional upon the scale coefficient being different in the two samples
lrtest m01het (m0 m1)

*** PART D: Logit and Probit comparison as described in Footnote 3. ***
* Converting the data into standard logit/probit form
bysort id (alt): gen healthd=health-health[_n+1]
bysort id (alt): gen lifed=life-life[_n+1]
bysort id (alt): gen riskd=risk-risk[_n+1]
bysort id (alt): gen costd=cost-cost[_n+1]
* Estimating the logit
logit choi healthd lifed riskd costd if alt==1, nocons
matrix l=e(b)'

* Estimating the probit
probit choi healthd lifed riskd costd if alt==1, nocons
matrix p=e(b)'
svmat l, names(l)
svmat p, names(p)
* Checking the scaling of logit to probit values as per footnote 3
gen lp=l1/p1
list l1 p1 lp if l1~=.


*** PART E: Supplementary analysis described in Footnote 6. ***
* This set of commands repeats the analysis above, but changes both scale and preference parameters in the two samples.

* This analysis is not reported in the paper, but it is informative, as it shows an example where the equivalence of the preference parameters is rejected, even once one has controlled for scale.

clear all

* Setting the number of observations
* a defines the overall sample size, and s identifies two subsamples when s=0 and s=1
local a=2000
set obs `a'
gen id=_n
gen s=0
replace s=1 if id>`a'/2

* Setting the number of alternatives
* The number of alternatives in each choice set is set to 2
expand 2
bysort id: gen alt=_n

* Generating attribute levels

* Four attributes are defined. They are assumed to take on integer values from 0 to 9
* There is no formal 'experimental design' employed: attribute levels across individuals and alternatives are assigned at random.
* Given the large sample size used in the simulation, the small number of attributes and the homogeneity of preferences, this still allows accurate identification of estimates, whilst simplifying the generation of the data.
* Setting a seed value allows the results reported in the paper to be reproduced exactly: changing this value will draw a different sample.

set seed 26
gen x1=int(10*uniform())
gen x2=int(10*uniform())
gen x3=int(10*uniform())
gen x4=int(10*uniform())

* Generating error and utilities

* The random element of utility is drawn from a Gumbel distribution, where the standard deviation of the distribution is twice as large when s=1 compared to s=0
* The simulated utility from each alternative includes the deterministic part ((1+1*s)*x1+(2+2*s)*x2-3*x3-1*x4) and the stochastic part (err)
* Note here that some of the preference parameters are being influenced by the sample dummy, s, i.e. preferences are truly different between the two samples.

gen err=(1+s)*log(-log(uniform()))
gen u=(1+1*s)*x1+(2+2*s)*x2-3*x3-1*x4+err
sum err if s==0
sum err if s==1

* Generating choices

* First the maximum utility of the two alternatives is identified for each individual, and then this is used to make the ‘choice’ of the alternative with the highest utility

bysort id: egen max=max(u)
gen choi=0
replace choi=1 if u==max

* Renaming variables to attribute labels in Table 1
rename x1 health
rename x2 life
rename x3 risk
rename x4 cost


* Renaming sample identifier
rename s public

*** PART E1: MNL models ***
* Patients MNL [1]
* Estimates the model on just the patient sample, and saves results
clogit choi health life risk cost if public==0, group(id)
estimates store m0
matrix b0=e(b)'

* Public MNL [2]
* Estimates the model on just the public sample, and saves results
clogit choi health life risk cost if public==1, group(id)
estimates store m1
matrix b1=e(b)'

* Pooled MNL [3]

* Estimates the model on the pooled sample, and saves results
clogit choi health life risk cost, group(id)
estimates store m01

* Conducts a loglikelihood test for whether the parameter restrictions implied by the pooled model can be accepted
lrtest m01 (m0 m1)

* Estimating willingness to pay
* This re-estimates the sub-sample models, and reports the WTP estimates
ssc install wtp
clogit choi health life risk cost if public==0, group(id)
wtp cost health life risk
clogit choi health life risk cost if public==1, group(id)
wtp cost health life risk

*** PART E2: Swait and Louviere graphs ***
* Converts matrix into variables for plotting
svmat b0, names(b0)
svmat b1, names(b1)
sort b01

* Graph of the two sets of parameters against each other with a best fit line through them
regress b01 b11, nocons
predict hat, xb
twoway (scatter b01 hat b11 b11, c(i l l) m(s i i) lpattern(solid dash solid) ytitle(patients) xtitle(public)), legend(off)

*** PART E3: HCL model ***
* Estimating HCL model
* This model uses the pooled data, but allows the scale coefficient to differ between them
ssc install clogithet
clogithet choi health life risk cost, group(id) het(public)
estimates store m01het

* Identifying the scale parameter (as λ=exp(β))
nlcom exp([het]_b[public])

* Conducting a log-likelihood test to see if the preference parameters can be restricted to be the same, conditional upon the scale coefficient being different in the two samples
lrtest m01het (m0 m1)
