
Application and Test of Web-based Adaptive Polyhedral Conjoint Analysis

Ely Dahan

John R. Hauser

Duncan Simester

Olivier Toubia

May 28, 2002

Ely Dahan is an Assistant Professor of Management Science, John Hauser is the Kirin Professor of Marketing, Duncan I. Simester is an Associate Professor of Management Science, and Olivier Toubia is a Ph.D. candidate, all at the Sloan School of Management, Massachusetts Institute of Technology, 38 Memorial Drive, Cambridge, MA 02142-1307. The Sloan School of Management, the Center for Innovation in Product Development at MIT and the EBusiness Center at MIT all supported this research. This paper may be downloaded from http://mitsloan.mit.edu/vc. That website also contains (1) open source code to implement the methods described in this paper, (2) demonstrations of web-based questionnaires used to collect data for the tests described in this paper, and (3) related papers and demonstrations of web-based interviewing methods. The authors are listed in alphabetical order. We wish to thank Limor Weisberg for creating the graphics used on the web-based questionnaire and Rob Hardy, Adlar Kim, and Bryant Lin, who assisted in the development of the web-based questionnaire. We gratefully acknowledge the contribution of Robert M. Freund, who proposed the use of the analytic center and approximating ellipsoids for the math programs required for question design and partworth estimation. Bryan Orme and Richard Johnson at Sawtooth Software provided many useful comments on initial drafts of the paper and generously cooperated in providing information about Sawtooth Software's ACA algorithm and verification of our code. The study benefited from comments by numerous pretesters at the MIT Operations Research Center, Analysis Group Economics, Applied Marketing Science, the Center for Innovation in Product Development, and Dražen Prelec's market measurement class. Our thanks to Brennan Mulligan of Timbuk2 Designs, Inc. for supplying, at reduced cost, the laptop computer bags used in our experiments.


Abstract

In response to the need for more rapid and iterative feedback on customer preferences, researchers are developing new web-based conjoint analysis methods that adapt the design of conjoint questions based on a respondent's answers to previous questions. Adapting within a respondent is a difficult dynamic optimization problem and, until recently, adaptive conjoint analysis' (ACA) utility-balance heuristic was the only widely-used method for addressing this adaptation. In this paper we apply and test a new polyhedral method that uses "interior-point" math programming techniques. This question-design method is benchmarked against both ACA and an efficient, non-adaptive design (Fixed).

Over 300 respondents were randomly assigned to different experimental conditions and were asked to complete a web-based conjoint exercise. The conditions varied based on the design of the conjoint exercise. Respondents in one group completed a conjoint exercise designed using the ACA method, respondents in another group completed an exercise designed using the Fixed method, and the remaining respondents completed an exercise designed using the polyhedral method. Following the conjoint exercise, respondents were given $100 and allowed to make a purchase from a Pareto choice set of five new-to-the-market laptop computer bags. The respondents received their chosen bag together with the difference in cash between the price of their chosen bag and the $100.

We compare the methods on both internal and external validity. Internal validity is evaluated by comparing how well the different conjoint methods predict several holdout conjoint questions. External validity is evaluated by comparing how well the conjoint methods predict the respondents' selections from the choice sets of five bags. Each method combines a modular question-design component with a modular estimation component, and so we conducted additional analysis to separately evaluate the accuracy of the question-design and estimation components. As part of this analysis we explored alternative estimation methods.

The results reveal a remarkable level of consistency across the two validation tasks. In the initial comparisons the polyhedral method was significantly more accurate than the Fixed method and offered similar performance to the ACA method. In our analysis of hybrid methods, the preferred estimation method varied based on the question-design method, although the Hierarchical Bayes estimation methods consistently performed well. The comparison of question-design methods offered strong evidence supporting polyhedral question-design over the Fixed and utility-balance (ACA) question-design methods. This finding holds across estimation methods.

At the time of the test, the bags were prototypes. Based, in part, on the results of this study these bags are now available commercially.


1. Recent Developments

Preference measurement has a long history in marketing. Firms routinely rely upon various forms of conjoint analysis, self-stated importances, and voice-of-the-customer methods to design new products, expand product lines, and set segmentation strategies (e.g., Cattin and Wittink 1982; Green and Srinivasan 1990; Wittink and Cattin 1989). Until recently, preference measurement could safely be called a mature technology with a rich history of applications, improvements, simulations, and empirical tests. However, three recent developments have caused researchers to refocus on this fertile area of research.

The first development is a change in the product development (PD) process. Business-to-business and complex durable goods manufacturers are adopting consumer-packaged-good methods. Because these products are typically complex with many features, PD teams require preference measurement methods that can handle more customer needs and provide prioritizations among substantially more features (Eppinger 1998; Eppinger et al. 1994; Ulrich and Eppinger 2000). At the same time, concurrent engineering, rapid web-based product design, and "spiral" processes view preference measurement as iterative (Cusumano and Selby 1995; McGrath 1996; Smith and Reinertsen 1998; Tessler, Wada and Klein 1993). PD teams routinely screen large numbers of features, then focus, remeasure, and iterate. Accuracy need only be sufficient to separate the winning features from the losing features, shifting the tradeoff toward lower research costs per iteration. PD teams are more willing to sacrifice some precision in the measurement of preferences in order to handle more features.

The second development is the advent of the Internet as a means to collect data. Web-based panels with large numbers of respondents provide the potential for very rapid and reliable feedback (Buckman 2000; Gonier 1999; Nadilo 1999; Willkie, Adams, and Girnius 1999). A conjoint analysis study can be fielded nationally or internationally in days. Furthermore, with broadband capabilities, virtual concepts can be created with vivid representations, animation, context-sensitive help, and sound (Dahan and Hauser 2002; Dahan and Srinivasan 2000). In addition, the opportunity for computation between questions has renewed interest in new forms of adaptive questions – questions selected and created for each customer based on his or her prior answers. The Internet has also brought a new set of limitations. In particular, web-based interviewing techniques must address reduced respondent tolerance for answering multiple questions. In central location interviewing, respondents are remarkably tolerant of multiple questions. However, the literature and commercial experience indicate much lower tolerance levels in a web context (e.g., De Angelis 2001). This creates greater demand for conjoint methods that gather more information with fewer questions.

The third development is the introduction of projective interior-point methods in math programming. Conjoint analysis has used math programming successfully for almost thirty years (e.g., Srinivasan and Shocker 1973a, 1973b), but its application to question design is new. For large problems, optimization for individual-respondent question design is a complex, difficult-to-solve problem, which, if solved by traditional methods, would lead to excessive delays between questions – a situation that is unacceptable to web-based respondents. The new algorithms search the interior of a parameter space and find near-optimal solutions extremely fast (Freund 1993; Karmarkar 1984; Nesterov and Nemirovskii 1994; Sonnevend 1985a, 1985b; Vaidya 1989). This represents a change in philosophy from a primarily statistical focus to a focus on feasible heuristics. These methods do well in simulation, but have not yet been tested on real customers making real decisions.

In this paper we evaluate the interior-point method recently proposed by Toubia, Simester and Hauser (2002). We use a real product – a laptop computer bag worth approximately $100. After completing the various web-based conjoint questionnaires (assigned randomly), respondents in the study were given $100 to spend on a choice set of five bags. Respondents received their chosen bag together with the difference in cash between the price of their chosen bag and the $100. These data enable us to compare the methods on both internal and external validity. Internal validity is evaluated by comparing how well the methods predict several holdout conjoint questions. External validity is evaluated by comparing how well the different conjoint methods predict which bag respondents later chose to purchase using their $100. We then explore whether there is a question-order effect, whether the performance of the "pure-bred" methods can be improved with hybrids that combine components from the different methods, and whether alternative estimation methods (such as Hierarchical Bayes) improve accuracy.

Our goal is to evaluate whether the types of methods proposed by Toubia, Simester and Hauser (2002) are sufficiently promising to justify further development. Consistent with their simulations, we do not expect that any one method will always out-perform the other benchmarks, nor do we intend that our findings be interpreted as criticism of any of the benchmarks. Rather, this study examines relative and absolute performance in a product category that is typical of web-based methods. If the methods are promising in this category, future studies might explore further which situations favor each of the different methods.

We begin with a brief review of the methods and then describe the research design. The core of the paper reports on and interprets the internal and external validity tests. We close with a summary and discussion of the findings.

2. Three Conjoint Methods for Metric Paired-Comparison Judgments

Toubia, Simester and Hauser's (2002) FastPace method adapts metric paired-comparison questions for each respondent using that respondent's answers to previous questions. Adapting within a respondent is a difficult dynamic optimization problem that should be distinguished from techniques that adapt across respondents. The only other approach that we found to address this problem is Sawtooth Software's adaptive conjoint analysis (ACA). In contrast, techniques such as the Arora and Huber (2001), Huber and Zwerina (1996), and Sandor and Wedel (2001) swapping/relabeling methods adapt choice-based conjoint designs across respondents. While the Toubia et al. method can be adapted to choice-based conjoint analysis, it is beyond the scope of this initial test to evaluate that algorithm versus choice-based and random-utility benchmarks. We hope our tests encourage such comparisons.

We compare the so-called FastPace method with two benchmarks that are appropriate for metric paired-comparison data: ACA and an efficient non-adaptive (Fixed) method. We also explore hybrid techniques and improvements based on these data collection models. For this first empirical test, we focus on paired-comparison questions in which respondents provide a metric rating of the extent to which they prefer one product over the other (metric paired-comparison questions) because this is the question format that ACA uses. This is a common format, well suited to the web, standard in the ACA method, and shown to provide interval-scaled preferences (Hauser and Shugan 1980; Johnson 1991; Mahajan and Wind 1992; Wittink and Cattin 1989). We hold this question format constant across methods. If polyhedral methods work well for this task, future empirical tests can evaluate them for alternative tasks including choice-based tasks and full-profile evaluation.

The methods we evaluate vary in at least two respects: (1) the heuristics that they use to select the product profiles that respondents compare, and (2) the manner in which the partworths associated with each feature are estimated. In the discussion that follows, we review the design of the product profiles and the basic estimation procedures for the three methods. As all three methods are described elsewhere, our review is brief (readers seeking additional details should consult the references). We consider alternative estimation methods in later sections.

Fast Polyhedral Adaptive Conjoint Estimation (FP)

Fast polyhedral adaptive conjoint estimation (FP) uses recent advances in interior-point methods to design questions (product profiles) and estimate partworths.1 Toubia, Simester and Hauser (2002) propose two polyhedral algorithms, one to address metric paired-comparison questions and one to address choices from a set of profiles. They test the metric paired-comparison version using simulated respondents. Their methods interpret the problem of estimating partworths for p different product features as searching a p-dimensional space.2 A point in this p-dimensional space represents one set of relative importances (partworths) for the features. The goal is to identify the point in the space that best represents the true partworths for an individual respondent.

1 Acronyms are used throughout the paper to simplify exposition. A complete summary of the acronyms is provided in the Appendix.
2 FP allows the initial space to be constrained by prior information or other constraints. For features with more than two levels or for features that interact, FP simply expands the dimensionality of the space accordingly.


When respondents answer a metric question comparing two product profiles, their answers are generally consistent with only a subset of the points in the p-dimensional space. By eliminating points that are inconsistent with the responses, the method is designed to converge to the point that represents the true set of relative importances. Responses are more informative if they allow the researcher to eliminate a larger subset of the points, thus FP designs questions to reduce quickly the set of consistent (feasible) points. The FP question-design method approximates the current feasible (polyhedral) region using ellipsoids. By designing question vectors that are parallel to the longest axis of the ellipsoids, the feasible region shrinks rapidly. We refer interested readers to their paper, where the authors describe the details and rationale for this approach, including methods for modeling respondent errors. (Their paper and open-source code are available at the website listed in the acknowledgments section of their paper.)

The managerial problem imposed by the recent changes in product development requires estimates of partworths with relatively few questions – often fewer questions than there are features. This means that the researcher must often estimate the respondent's revealed partworths even though there remains a (hopefully small) region of feasible partworths that are consistent with the respondent's answers to the chosen questions. Based on research that justifies centrality as a criterion (Dawes and Corrigan 1974; Einhorn 1971; Huber 1975; Moore and Semenik 1988; Srinivasan and Park 1997; Srinivasan and Shocker 1973a), FP uses an estimate that characterizes the set of remaining feasible points by the center of gravity of the set. Because it is computationally difficult to identify the center of gravity of a polyhedron, they use the analytic center, which provides an excellent approximation. The analytic center can be estimated with effectively no computational delay between questions.
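To make these two computations concrete, the following is a minimal sketch (ours, not the authors' open-source implementation). It assumes the feasible region has been written as a polyhedron {w : Cw ≤ d} and that a strictly feasible starting point w0 is available; it finds an analytic center by Newton's method on the log-barrier and then reads the longest axis of the associated ellipsoidal approximation from the barrier Hessian:

```python
import numpy as np

def analytic_center(C, d, w0, iters=50):
    """Newton's method on the log-barrier of {w : Cw <= d}.
    w0 must satisfy C @ w0 < d (strictly feasible)."""
    w = w0.astype(float).copy()
    for _ in range(iters):
        s = d - C @ w                            # slacks, kept strictly positive
        grad = C.T @ (1.0 / s)                   # gradient of -sum(log(slacks))
        H = C.T @ ((1.0 / s**2)[:, None] * C)    # barrier Hessian
        step = np.linalg.solve(H, -grad)
        t = 1.0                                  # damping keeps w inside the polyhedron
        while not np.all(C @ (w + t * step) < d):
            t *= 0.5
        w = w + t * step
    return w, H

def next_question_direction(H):
    """The ellipsoid {v : v.T @ H @ v <= 1} approximates the feasible
    region around the center; its longest axis is the eigenvector of H
    with the smallest eigenvalue, and the next question vector is
    chosen parallel to that axis."""
    _, eigvecs = np.linalg.eigh(H)               # eigenvalues ascend; column 0 is smallest
    return eigvecs[:, 0]
```

In this sketch the analytic-center estimate is w itself, and asking the next question along the returned direction constrains the answer along the dimension in which the feasible region is widest, which is why the region shrinks quickly.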

Tests conducted with simulated respondents suggest that the FP algorithm offers improved accuracy over ACA and an efficient Fixed design when there are many features relative to the number of questions and/or when the self-explicated measures in ACA are noisy (Toubia, Simester and Hauser 2002). When the number of questions is sufficiently high, all three methods have similar performance.

Adaptive Conjoint Analysis (ACA)

Sawtooth Software's Adaptive Conjoint Analysis (ACA) has been the dominant approach for adapting questions within a respondent. It has become an industry standard for conjoint analysis (Green, Krieger and Agarwal 1991, p. 215; Green, Krieger, and Wind 2001).3 Academic and commercial experience suggest that ACA is often reliable, accurate, and managerially valuable.

3 See also: Carroll and Green 1995; Choi and DeSarbo 1994; Green and Krieger 1995; Green, Krieger and Agarwal 1991; Huber, Wittink, Fiedler and Miller 1993; Johnson 1987; Klein 1988; Orme 1999; Wittink and Cattin 1989. We conducted an informal poll among practitioners. ACA is still strong and seems to be second only to full-profile methods in application. However, CBC methods are coming on strong. Recent sales by Sawtooth Software suggest a 45% share for CBC, a 37% share for ACA, and an 18% share for full-profile software.


The ACA interview consists of up to five sequential tasks (Klein 1988; Sawtooth Software 1996). Respondents (1) eliminate unacceptable levels, (2) rank levels within attributes, (3) state the relative importances of improving a product from one feature level to another (the "self-explicated" questions), (4) respond to metric paired-comparison questions, and (5) evaluate a small set of full product profiles on a purchase-intention scale. The first task is often skipped. The "adaptive" task is the fourth task. The ACA algorithm uses information from a respondent's earlier responses to provide intermediate estimates of a respondent's preferences and then designs product profiles that are closest in estimated preference, subject to constraints that ensure the overall design is nearly balanced and orthogonal.

The intermediate estimates are based on a modified regression that combines the information from the self-explicated (SE) and the metric paired-comparison (PC) data. The intermediate regression has one data point for each level in each SE response and one data point for each PC response. This makes it feasible to compute intermediate estimates even after a single PC question.
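Sawtooth Software's actual intermediate regression includes scaling and weighting details that we do not reproduce; the following is only a schematic sketch (ours, with illustrative names) of the stacking idea:

```python
import numpy as np

def intermediate_estimates(se_rows, se_vals, pc_rows, pc_vals):
    """Schematic only: stack SE observations (rows linking each feature
    level's partworth to its stated importance) on top of PC observations
    (rows holding the difference between the two profiles' feature
    codings, linked to the metric preference rating), then solve the
    combined system by least squares."""
    X = np.vstack([se_rows, pc_rows]) if len(pc_rows) else np.asarray(se_rows)
    y = np.concatenate([se_vals, pc_vals]) if len(pc_vals) else np.asarray(se_vals)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Because every partworth appears in at least one SE row, the stacked system is identified even when only one PC response has been collected, which is what makes intermediate estimates available from the very first adaptive question.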

The procedure used to derive final parameter estimates, once the data collection is complete, has changed over time with new versions of the ACA algorithm. (The procedure to design PC questions has remained more or less constant.) In our initial comparisons we focus on the version that was available at the time of our data collection. This approach uses a logit regression based on the purchase intention (PI) responses (the fifth task) to determine how heavily to weight the SE and PC data (for more details see Green, Krieger and Agarwal 1991 and Sawtooth Software 1996). Our code was written using Sawtooth Software's documentation together with e-mail interactions with the company's representatives.4 It is available from the website listed in the acknowledgements section of this paper. We confirmed the accuracy of the estimation code by using executable files generously provided by Sawtooth Software to re-estimate partworths for a sample of data.

Fixed Efficient Design

To test whether the adaptive nature of the questions improves or degrades accuracy, we included a "Fixed" benchmark in which the set of PC questions was chosen in advance. Every respondent saw the same set of questions, although the order of the questions was randomized. For the Fixed design we used an algorithm that uses the geometric mean of the eigenvalues of the information matrix to maximize a D-efficiency criterion – a criterion that yields profile designs that are as balanced and orthogonal as feasible (Kuhfeld 1999; Kuhfeld, Tobias and Garratt 1994). Because the PC data are metric, the partworths are estimated with ordinary least-squares (OLS) regression. OLS requires more data points than estimated parameters, and so the basic Fixed method provides estimates only after the number of PC questions exceeds the number of partworths (parameters).

4 After seeing the initial comparisons, Sawtooth Software revealed that undocumented heuristics are used for 40-60% of the purchase-intention regressions. We programmed these heuristics and they are now included in our estimates. Except for a few undocumented special cases, a representative of Sawtooth confirmed that our code for paired-comparison question design captures the "essential approach" they use and document.
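For readers unfamiliar with the criterion, a minimal sketch of the core quantity follows (ours; Kuhfeld's D-efficiency adds a normalization for the number of runs and a design-search procedure that we omit):

```python
import numpy as np

def d_criterion(X):
    """Geometric mean of the eigenvalues of the information matrix X'X
    for a candidate design matrix X (one row per paired-comparison
    question). Requires a full-rank design so all eigenvalues are
    positive; balanced, orthogonal designs score highest."""
    eigvals = np.linalg.eigvalsh(X.T @ X)
    return float(np.exp(np.log(eigvals).mean()))
```

An exchange-type search that swaps candidate questions in and out while this value increases would yield a fixed design of the kind used here.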

Alternative Estimation Methods

Following the initial comparisons we separately evaluate the accuracy of the question-design and estimation methods. As part of this analysis we explore several alternative estimation methods. These include an estimation method introduced by Sawtooth Software subsequent to our data collection, an alternative analytic-center method, and Hierarchical Bayes estimation, which has been the subject of considerable recent interest amongst marketing academics (see, for example, Allenby and Rossi 1999; Arora, Allenby and Ginter 1998; Lenk et al. 1996; and Liechty, Ramaswamy and Cohen 2001).

3. The Across-Subjects Research Design

Conjoint analysis represents a product as a profile of its features and price. The preference for a product is then a function of the "partworths" assigned to the levels of each feature. The product used in the study was an innovative new laptop computer bag that includes a removable padded sleeve to hold and protect a laptop computer. At the time of our study, this product was not yet on the market, so respondents had no prior experience with it. Subsequent to our research and based, in part, on data from our research, the product is now commercially available from Timbuk2 (http://www.timbuk2.com). Figure 1 offers a photograph illustrating examples of the bags.

Figure 1 Examples of Product Category (Laptop Computer Bags)


The bag includes a range of separable product features, such as the inclusion of a mobile-phone holder, side pockets, or a logo. We focused on nine product features, each with two levels, and included price as a tenth feature. Price is restricted to two levels ($70 and $100) – the extreme prices for the bags in both the internal and external validity tests. For all three conjoint methods, we estimated the partworths associated with prices between $70 and $100 by linearly interpolating. A more detailed description of the product features can be found on the website listed in the acknowledgements section of this paper.

Based on pretests, the features appear to satisfy preferential independence, allowing us to model preference for each bag as a linear additive combination of these features (Keeney and Raiffa 1976, p. 101-105; Krantz et al. 1971); the formula below makes this model explicit. We recognize that in other applications features may have more than two levels and may not be separable, discrete, or additive. The separable nature of the product features may also influence the accuracy of the self-explicated (SE) questions used by ACA. In particular, we expect SE responses to be more accurate in categories where customers make purchasing decisions about features separately, perhaps by choosing from a menu of features. In contrast, we expect SE responses to be less accurate for products where the features are typically bundled together, so that customers have little experience in evaluating the importance of the individual features.
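In our notation (introduced here only for concreteness), with binary feature indicators $x_1, \dots, x_9$ and price $p$ between \$70 and \$100, respondent $r$'s preference for a bag is modeled as

$$u_r \;=\; \sum_{j=1}^{9} w_{rj}\,x_j \;+\; w_{r,\text{price}}\cdot\frac{100-p}{100-70}, \qquad x_j \in \{0,1\},$$

where $w_{r,\text{price}}$ is the partworth of the \$70 price level relative to \$100, and the final fraction implements the linear interpolation described above.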

The basic Fixed, FastPace and ACA methods all use metric PC questions, but only the basic ACA method requires the SE (and PI) questions. Therefore, as the accuracy of the SE responses improves, we expect an improvement in the relative performance of ACA. Consistent with this, Toubia, Simester and Hauser's (2002) simulations suggest that if the SE responses are sufficiently accurate, then ACA outperforms both the Fixed and FP methods. In their simulations ACA does not perform as well as the other two methods when the SE responses are noisy. Thus, our choice of separable features is likely to favor ACA relative to the FP and Fixed methods (and other methods that do not make use of the SE responses).

Research Design

We begin with an overview of the research design and then discuss each element of the design in greater detail. The overview of the research design is summarized in Table 1 and the detailed research design is summarized in Table 2. Subjects were randomly assigned to one of the three conjoint question-design methods. After completing the respective conjoint tasks, all of the respondents were presented with the same validation exercises. The internal validation exercise involved four holdout metric paired-comparison (PC) questions, which occurred immediately after the sixteen PC questions designed by the respective conjoint methods. The external validation exercise was the selection of a laptop computer bag from a choice set of five bags. This exercise occurred in the same session as the conjoint tasks and holdout questions, but was separated from these activities by a filler task designed to cleanse memory.


Table 1
Overview of the Research Design

FastPace                  Fixed                     ACA
FP Conjoint               Fixed Conjoint            ACA Conjoint
Internal Validity Task    Internal Validity Task    Internal Validity Task
External Validity Task    External Validity Task    External Validity Task

Conjoint Tasks

Recall that ACA requires five sets of questions. Pretests confirmed that all of the features were acceptable to the target market, allowing us to skip the unacceptability task. This left four remaining tasks: ranking of levels within attributes, self-explicated (SE) questions, metric paired-comparison (PC) questions, and purchase intention (PI) questions. ACA uses the SE questions to select the PC questions, thus the SE questions in ACA must come first, followed by the PC questions and then the PI questions. To test ACA fairly, we adopted this question order for the ACA condition.

The Fixed and FP question-design techniques do not require SE or PI questions. Because asking the SE questions first could create a question-order effect, we asked only the PC questions (not the SE or PI questions) prior to the validation task in the Fixed condition.5 To investigate the question-order effect we included two FP data collection procedures: one that matched that of fixed-efficient designs (FP1) and one that matched that of ACA (FP2). In FP1 the PC questions precede the validation task, while in FP2 the SE, PC, and PI questions precede the validation task. This enables us to (a) explore whether the SE questions affect the responses to the PC questions and (b) evaluate hybrid techniques that combine data from the PC and SE questions.6

Our complete research design, including the question order, is summarized in Table 2. Questions associated with the conjoint tasks appear in Rows 1, 2 and 4, the validation tasks appear in Rows 3 and 6, and the filler task appears in Row 5. In this design, FP1 can be matched with Fixed; FP2 can be matched with ACA.

5 See Alreck and Settle 1995; Green, Krieger and Agarwal 1991; Huber et al. 1993; Johnson 1991. For more general discussions of question-order effects, see Bickart 1993; Feldman and Lynch 1988; Nowlis and Simonson 1997; Tourangeau, Rips and Rasinski 2000.
6 Although the SE responses are collected in the FP2 condition, they are not used in analytic-center estimation or FP question design. However, they do provide the opportunity to post-test alternative hybrid estimation methods.


Table 2
Detailed Research Design

Row   FP1                      Fixed                    FP2                      ACA
1                                                       Self-explicated          Self-explicated
2     FP paired comparison     Fixed paired comparison  FP paired comparison     ACA paired comparison
3     Internal validity task   Internal validity task   Internal validity task   Internal validity task
4                                                       Purchase intentions      Purchase intentions
5     Filler task              Filler task              Filler task              Filler task
6     External validity task   External validity task   External validity task   External validity task

Internal Validity Task: Holdout PC Questions

We allowed each of the conjoint methods to design sixteen metric paired-comparison (PC) questions. Following these sixteen questions, respondents were presented with four additional PC questions. These four questions served as holdout questions and were used to evaluate internal validity. This is a common test that has been used extensively in the literature.7

The same procedure was used to design the four holdout questions in each of the experimental conditions, and this design was independent of the sixteen questions that the respondents had just answered. In particular, the product profiles in these four questions were randomly selected from an independent efficient design. There was no separation between the sixteen initial questions and the four holdout questions, so that respondents were not aware that the questions were serving a different role.

Filler Task

The filler task was designed to separate the conjoint tasks and the external validity task. It was hoped that this separation would mitigate any memory effects that might influence how accurately the information from the conjoint tasks predicted which bags respondents chose in the external validity tasks. The filler task was the same in all four experimental conditions and comprised a series of questions asking respondents about their satisfaction with the survey questions. There was no significant difference in the responses to the filler task across the four conditions.

7 The number of holdout questions varies, but is generally between two and four questions. Examples include Acito and Jain (1980), Akaah and Korgaonkar (1983), Cattin, Hermet and Pioche (1982), Elrod, Louviere and Davey (1992), Green, Goldberg and Wiley (1982), Green, Goldberg and Montemayor (1981), Green and Helsen (1989), Green, Helsen and Shandler (1988), Green and Krieger (1995), Green, Krieger and Agarwal (1991), Green, Krieger and Bansal (1988), Haaijer, Wedel, Vriens and Wansbeek (1998), Hagerty (1985), Huber (1975), Huber, Wittink, Fiedler and Miller (1993), Hauser and Koppelman (1979), Hauser and Urban (1977), Hauser, Tybout and Koppelman (1981), Jain, Acito, Malhotra and Mahajan (1979), Johnson, Meyer and Ghosh (1989), Johnson (1999), Lenk, DeSarbo, Green and Young (1996), Malhotra (1986), Moore (1980), Moore and Semenik (1988), Orme, Alpert and Christensen (1998), Orme and King (1998), Parker and Srinivasan (1976), and Tybout and Hauser (1981).

External Validity Task: Final Bag Selection

Respondents were told that they had $100 to spend and were asked to choose among five bags. The five bags shown to each respondent were drawn randomly from an orthogonal fractional factorial design of sixteen bags. This design was the same across all four experimental conditions, so that there was no difference, on average, in the bags shown to respondents in each condition. The five bags were also independent of responses to the earlier conjoint questions. The price of the bags varied between $70 and $100, reflecting the difference in the market price of the features included with each bag. Pricing the bags in this manner ensured that the choice set represented a Pareto frontier, as recommended by Elrod, Louviere, and Davey (1992), Green, Helsen and Shandler (1988), and Johnson, Meyer and Ghosh (1989).

Respondents were instructed that they would receive the bag that they chose. If the bag was priced at less than $100, they were promised cash for the difference. In order to obtain a complete ranking, we told respondents that if one or more alternatives were unavailable, they might receive a lower-ranked bag. The page used to solicit these rankings is presented in Figure 2. We acknowledge two trade-offs in this design. The first is an endowment effect because we endow each respondent with $100. The second is the lack of a "no bag" option. While both are interesting research opportunities and quite relevant to market forecasting, neither should favor one of the three methods relative to the others; the endowment/forced-choice design is common to all treatments. Pragmatically, we designed the task to maximize the power of the statistical comparisons of the four treatments. The forced choice also helped to reduce the (substantial) cost of this research.

At the end of the study the chosen bags were distributed to respondents together with the cash difference (if any) between the price of the selected bag and $100.

Self-Explicated and Purchase Intention Questions

The self-explicated questions asked respondents to rate the importance of each of the ten product features. For a fair comparison to ACA, we used the wording for the questions and (four-point) response scale proposed by Sawtooth Software (1996). Further details regarding the design of the questions and the measurement scales are available in Sawtooth Software (1996) and by visiting the website listed in the acknowledgements section of this paper (see also Figure 3d). For the purchase intentions questions, respondents were shown six bags and were asked how likely they were to purchase each bag. We adopted the wording, response scale, and algorithms for profile selection suggested by Sawtooth Software.8

Figure 2 Respondents Choose and Keep a Laptop Computer Bag

Subjects

The subjects (respondents) were first-year MBA students. Although the subjects had taken basic marketing, they neither knew the objectives of our experiment nor had they taken a course in which conjoint analysis was taught in detail. A total of 360 students were sent an email inviting their participation. We received 330 complete responses (there was one incomplete response), resulting in a response rate of over 91%. This high response rate reflects the perceived desirability of the laptop bag relative to the short time investment required to respond. Random assignment yielded 80 subjects for the ACA condition, 88 for the Fixed condition, and 162 for the FP conditions, broken out as 88 for the standard question order (FP1) and 74 for the alternative question order (FP2).

8 The six profiles include the most attractive, the least attractive, and four "middling" profiles as determined by the respondent's preferences. The four middling profiles are chosen to differ most strongly based on partworths as determined by the SE and PC questions.


Figure 3 Example Screens from Questionnaires

(a) Price as change from $100 (b) Introduction of “sleeve” feature

(c) Metric paired-comparison (PC) question (d) Self-explicated (SE) questions

The questionnaires were pretested on a total of 69 subjects drawn from professional market research and consulting firms, former students, graduate students in Operations Research, and second-year students in an advanced marketing course that studied conjoint analysis. The pretests were valuable for fine-tuning the question wording and the web-based interfaces. By the end of the pretest, respondents found the questions unambiguous and easy to answer. Following standard scientific procedures, the pretest data were not merged with the experimental data. However, analysis of this small sample suggests that the findings agree directionally with those reported here, albeit not at the same level of significance.

Additional Details

Figure 3 illustrates some of the key screens in the conjoint analysis questionnaires. In Figure 3a respondents are introduced to the price feature.9 Figure 3b illustrates one of the dichotomous features – the closure on the sleeve. This is an animated screen that provides more detail as respondents move their pointing devices past the picture. Figure 3c illustrates one of the PC tasks. Each respondent was asked to rate his/her relative preference for two profiles that varied on three features. Sawtooth Software (1996) recommends that respondents can handle three features once they are familiar with the task and cites experience that little benefit is gained by using more than three features. To provide a fair comparison to ACA, we adopted this guideline. Both text and pictures were used to describe the profiles. Features that did not vary between the products were chosen to coincide with the respondent's choices in the tasks illustrated in Figure 3b. The format was identical for all four experimental treatments. Finally, Figure 3d illustrates the first three self-explicated questions. The full questionnaires for each treatment are available on the website listed in the acknowledgements to this paper.

4. Tests of Internal Validity

To test internal validity we compare the ability of each method to predict preferences in four holdout metric paired-comparison (PC) questions. As an initial comparison we compare the standard versions of each method, as described in the previous section.10 These basic methods vary on both question design and estimation method. In later analysis we consider hybrid versions (and additional estimation methods) in which we can isolate question design versus estimation. Recall that we fielded four experimental conditions. In two of the conditions (ACA and FP2) self-explicated (SE) questions preceded the paired-comparison (PC) questions, while in the other two conditions (Fixed and FP1) only the PC questions preceded the validation tasks. This experimental design anticipates that asking the SE questions before the PC questions (as required by ACA) may have an effect on the responses to the PC questions. For this reason we also distinguish between the different question orders when presenting the findings.

For each respondent we calculated the correlation between the predicted and observed responses and then averaged these correlations across respondents in each condition.11 We present the resulting averages in two formats. First, we present the values of the average correlations after 8 and 16 PC questions in Table 3. Second, we display the averages graphically in Figure 4, where we recalculated the correlations for each respondent after each PC question, so that we can observe the incremental accuracy from asking additional questions.
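A minimal sketch of this internal-validity measure follows (ours; it assumes each profile is coded as a feature vector and that a respondent's predicted PC response is the estimated-utility difference between the two profiles, which is valid up to the affine scaling that the correlation measure ignores):

```python
import numpy as np

def predict_pc(w, z_left, z_right):
    """Predicted metric paired-comparison response for one holdout
    question: the estimated-utility difference between the profiles."""
    return (z_left - z_right) @ w

def holdout_correlation(w, holdouts, answers):
    """Pearson correlation between predicted and observed responses on
    the four holdout PC questions for a single respondent."""
    preds = [predict_pc(w, zl, zr) for zl, zr in holdouts]
    return float(np.corrcoef(preds, answers)[0, 1])

# Averaging holdout_correlation over the respondents in a condition,
# with w re-estimated after q = 1..16 questions, traces out the curves
# reported in Figure 4.
```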

9 Due to human-subject concerns, the details of the $100 incentive were described in an introductory screen that included other experimental protocols.
10 In this initial comparison we use the version of ACA that was standard when the data were collected (Version 4).
11 As an alternative metric, we compared how well the methods predicted which product the respondents favored from each pair. The two metrics provide a very similar pattern of results and so, for ease of exposition, we focus on the correlation measure in Figure 4. There is an additional reason to focus on correlations in the validity tests. First-preference prediction is a dichotomous variable and has higher variance than the Spearman correlation, which is based on four observations per respondent. A similar argument applies to the external validity tests.


Table 3
Performance on the Internal Validity Task for the Basic Methods:
Correlations Between Predicted and Observed Responses

                After 8 Questions   After 16 Questions   Sample Size
Experimental Cells Without SE Questions
  Fixed         na                  0.73                 88
  FP1           0.69                0.79                 88
Experimental Cells With SE Questions
  ACA           0.74                0.81                 80
  FP2           0.74                0.83                 74

Figure 4
Internal Validity Test for the Basic Methods – Holdout Pairs

[Two line charts plot the average holdout correlation (0 to 1) against the number of questions (1 to 16): a "Without SE Questions" panel showing the Fixed and FP1 conditions, and a "With SE Questions" panel showing the FP2 and ACA conditions.]

There are several findings of interest. First, all of the correlations are within the ranges reported in prior research. When metric data are gathered, published correlations of predicted and observed ratings vary from 0.25 to 0.82 and are typically in the range of 0.60 to 0.75. Second, for the Fixed method, OLS estimation is only feasible after at least eleven PC responses (there are ten parameters to estimate). Although accuracy after eleven questions is lower than the corresponding FP results (FP1), accuracy improves when additional PC responses are received. Toubia, Simester and Hauser (2002) report simulations suggesting a similar pattern of results. They found that fixed designs do not do as well as FP when responses are available for only a small number of questions. It is only after the number of questions exceeds 170% of the number of parameters (17 questions in our experiment) that the Fixed method performs as well as other methods. Were we to ask 20 questions from an efficient design, Toubia, Simester and Hauser's (2002) simulations suggest that the Fixed method would approach the accuracy of FP1. Third, there is effectively no difference in the accuracy of the basic FP2 and ACA methods. This is true irrespective of the number of PC questions that are asked.12

Significance Tests

We used independent sample t-tests to compare the mean correlations across the different methods. The t-tests were conducted after 8 and 16 PC questions for the ACA versus FP2 comparison and after 16 questions for the Fixed versus FP1 comparison. None of these differences were statistically significant (p<0.05). However, these independent sample t-tests do not use all of the information available in the data. For this reason, we also evaluate significance by a method that pools the correlation measures calculated after each additional PC question. This results in a total of sixteen observations for each respondent in the two adaptive conditions, and six observations for respondents in the Fixed condition. To control for heteroscedasticity we estimate a separate intercept for each question number. The multivariate specification also allows us to explicitly control for heterogeneity in the respondent samples. In particular, the null model in each condition is that the ten laptop bag features are equally important. The methods each seek to improve on this null model by estimating variation in the partworths for each respondent. If, despite the random assignment of respondents to conditions, the responses in one condition are more consistent with the null model, then the comparisons will be biased in favor of this condition. We explicitly control for this possibility by including a measure describing how accurately the equal weights (null) model performs on the respective validity measures for each respondent.

The complete specification (for the FP2 versus ACA comparison) is described in Equation 1, where r indexes the respondents and q indexes the number of PC questions used in the partworth estimates. The α's and β's are coefficients in the regression and ε_rq is an error term. The equation is estimated using OLS.

(1)   $\text{Correlation}_{rq} \;=\; \sum_{q=1}^{16} \alpha_q\,\text{Question}_q \;+\; \beta_1\,\text{ACA}_r \;+\; \beta_2\,\text{EqualWeight}_r \;+\; \varepsilon_{rq}$

The independent variables are defined as follows:

ACA_r           1 if respondent r was in the basic ACA condition; 0 otherwise.
Question_q      1 if the correlation was calculated using only the first q PC questions; 0 otherwise.
EqualWeight_r   The correlation obtained for respondent r with an equal-weights model.
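A sketch of how such a pooled test might be assembled follows (ours; the function and variable names are illustrative, and we assume one stacked observation per respondent-by-question-count pair):

```python
import numpy as np
import statsmodels.api as sm

def pooled_test(corr, q_used, is_aca, equal_weight, n_q=16):
    """Estimate Equation (1) by OLS: a separate intercept for each
    question count q (absorbing the question effects), a treatment
    dummy, and the equal-weights control, with White (1980)
    heteroscedasticity-robust standard errors."""
    corr = np.asarray(corr, dtype=float)
    Q = np.zeros((corr.size, n_q))
    Q[np.arange(corr.size), np.asarray(q_used) - 1] = 1.0  # question dummies
    X = np.column_stack([Q, is_aca, equal_weight])          # dummies span the constant
    return sm.OLS(corr, X).fit(cov_type="HC0")
```

In this sketch the coefficient on is_aca plays the role of β1 and the coefficient on equal_weight the role of β2.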

12 Undocumented heuristics were necessary for 63% of the ACA observations. Based on documented procedures, the predictive performance of ACA was significantly lower at 0.53 and 0.61 for q = 8 and 16, respectively.


This equation was estimated using correlations from the FP2 and ACA conditions (basic methods). Under this specification, the intercepts (the α_q's) describe the correlations for FP2 after each question, while the β1 coefficient represents the expected increase or decrease in this correlation in the ACA condition. Positive (negative) values for the β1 coefficient indicate higher (lower) correlation for the basic ACA method compared to the basic FP method in FP2. The findings are presented in Table 4, together with the findings from an analogous analysis comparing the Fixed and FP1 results. This analysis was only conducted using correlations calculated after questions 11 through 16 (the Fixed method does not allow predictions for fewer questions). We also present the results of a similar analysis conducted on the external validity task.

In Table 4 we report the β1 and β2 coefficients (the other coefficients are omitted for ease of exposition). We control for heterogeneity introduced by the panel nature of the data by reporting robust standard errors (White 1980). We also estimated a random effects model, but there was almost no difference in the coefficients of interest. Moreover, the Hausman specification test favored the fixed-effects specification over the random-effects specification.

The findings from this multivariate analysis indicate that the basic FP method is significantly more accurate (p<0.01) than the basic Fixed method on the internal validity task. There was no significant difference in the performance of the basic FP and ACA methods (t=0.86). We next consider external validity by examining whether the various methods can be used to predict actual customer choice.

Table 4
Multivariate Analysis of the Pooled Data: Basic Comparisons Only

                        Internal Validity Task          External Validity Task
                        (Holdout Questions)             (Bag Choice)
                        FP1 vs. Fixed   FP2 vs. ACA     FP1 vs. Fixed   FP2 vs. ACA

Basic Fixed (β1)        -0.166**                        -0.275**
                        (0.019)                         (0.023)

Basic ACA (β1)                          0.008                           0.027*
                                        (0.009)                         (0.012)

EqualWeight_r (β2)      0.445**         0.507**         0.327**         0.413**
                        (0.033)         (0.022)         (0.029)         (0.020)

Adjusted R2             0.33            0.40            0.21            0.25
Sample Size             1,149           2,368           1,144           2,464

Coefficients describing the question effects are omitted from the table. Robust standard errors are in parentheses.
** Significant at p<0.01. * Significant at p<0.05.


5. Tests of External Validity

While tests of internal validity are common in the conjoint-analysis literature, tests of external validity at the individual level are rare.13 A search of the literature revealed four studies that predict choices in the context of natural experiments and one study based on a lottery choice. Wittink and Montgomery (1979), Srinivasan (1988), and Srinivasan and Park (1997) all use conjoint analysis to predict MBA job choice. Samples of 48, 45, and 96 student subjects, respectively, completed a conjoint questionnaire prior to accepting job offers. The methods were compared on their ability to predict these students' job choices. First-preference predictions ranged from 64% to 76% versus random-choice percentages of 26-36%. This corresponds to an improvement relative to random in the range of 50-56%.14

In another natural experiment, Wright and Kriewall (1980) used conjoint analysis (Linmap) to predict college applications by 120 families. They were able to correctly predict 20% of the applications when families were prompted to think seriously about the features measured in the conjoint analysis, and 15% when they were not. This converts to a 16% improvement relative to their null model. Leigh, MacKay and Summers (1984) allocated 122 undergraduate business majors randomly to twelve different conjoint tasks designed to measure partworths for five features. Respondents indicated their preferences for ten calculators offered in a lottery. There were no significant differences among methods, with first-preference predictions in the range of 26-41% and percentage improvements of 28%. The authors also compared the performance of estimates based solely on SE responses and observed similar performance to the conjoint methods.

In the related choice-based conjoint literature, Louviere, Hensher, and Swait (2000) compare the convergent validity between revealed-preference and "stated preference" methods to suggest that, in most cases, the two methods measure the same relative coefficients (partworths). Although this is not predictive validity per se, it does provide evidence that stated-preference questions, where respondents compare feature-based product profiles, provide data consistent with actual choices made by consumers. Urban and Katz (1983) provide further evidence that models based on metric paired-comparison questions, such as their Assessor model, predict well.

Against this background we examine the results from the external validity task in which respon-

dents “purchased” a bag from a selection of five bags in transactions involving real money. In case their

chosen bag was unavailable, the respondents ranked all five bags. To evaluate the conjoint methods we

calculated the correlation between the predicted and actual rankings for the five bags shown to each re-

spondent. We report the findings in the same format as the findings for the internal validity task. The

13 Some researchers report aggregate predictions relative to observed market share. See Bucklin and Srinivasan (1991), Currim (1981), Davidson (1973), Green and Srinivasan (1978), Griffin and Hauser (1993), Hauser and Gaskin (1984), McFadden (2000), Page and Rosenbaum (1987), and Robinson (1980).


correlations averaged across respondents in each condition are presented in Figure 5, where we re-

calculate the correlations after each of the PC questions. In Table 5 we present the average correlations

and average proportions after questions 8 and 16.
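To make this metric concrete, the following is a minimal sketch of the respondent-level calculation, assuming the partworths sit in a NumPy vector, the five bags are coded as rows of a feature matrix, and a Pearson correlation is taken between the predicted and actual rank vectors (the function and variable names, the feature coding, and the correlation variant are illustrative assumptions, not taken from the study):

```python
# Illustrative sketch (not the study's code): correlate the bag ranking
# implied by a respondent's estimated partworths with the ranking that
# respondent actually stated in the choice task.
import numpy as np

def choice_correlation(partworths, bag_profiles, actual_ranks):
    """partworths: (n_features,) estimates for one respondent.
    bag_profiles: (5, n_features) feature coding of the five bags.
    actual_ranks: (5,) stated ranking, 1 = most preferred."""
    utilities = bag_profiles @ partworths
    # Higher predicted utility implies a better (lower) predicted rank.
    predicted_ranks = (-utilities).argsort().argsort() + 1
    # Correlation between the predicted and actual rank vectors.
    return np.corrcoef(predicted_ranks, actual_ranks)[0, 1]
```

Averaging this statistic across the respondents in a condition, after each PC question, yields curves of the kind plotted in Figure 5.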

The findings from this task are strikingly similar to the results of the internal validity test. The

basic FP method yields more accurate predictions of the respondents’ bag rankings than the basic Fixed

method. The difference in the average correlation after 16 questions is significant (p<0.01). The pooled

analysis conducted using Equation 1 confirmed that the difference calculated across multiple questions

was also significant (p<0.01).

Table 5  Performance on the External Validity Task for the Basic Methods

                                          After 8 Questions   After 16 Questions   Sample Size

Experimental Cells Without SE Questions
Fixed                                     na                  0.54                 88
FP1                                       0.59                0.68                 88

Experimental Cells With SE Questions
ACA                                       0.61                0.67                 80
FP2                                       0.62                0.68                 74

There is little difference between the basic FP and ACA methods in both Figure 5 and Table

5. However, the findings of the pooled analysis (Equation 1) revealed a small improvement in accuracy

when using ACA. This highlights the additional power and better controls offered by this multivariate

analysis. The relatively strong performance of the basic ACA method suggests that the SE questions may

contain useful information. Recall that in the FP2 condition we collected SE measures but did not use

them in the analytic-center estimation. This finding suggests that we may be able to improve the accuracy

of the predictions in the FP2 condition by incorporating SE responses into the FP estimates. In later

analysis we evaluate estimation techniques that allow us to investigate this opportunity.

We also compared how frequently the different methods identified which bag each respondent

most preferred. After 16 questions FP1 correctly identified the first-preference bag for approximately 59% of respondents, compared to 52% for the Fixed method, while ACA identified the favored bag 64% of the time, compared to 54% for FP2 (neither difference is statistically significant at p<0.05). A ran-

dom selection would have predicted the first preference bag just 20% of the time. These represent similar

percentage improvements to those reported elsewhere in the literature.

14 This commonly reported metric is calculated as (predicted – random)/(100% - random).
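For instance, applying the metric to the basic FP1 result above (a 59% first-preference hit rate against a 20% random baseline):

```latex
\frac{\text{predicted} - \text{random}}{100\% - \text{random}}
  = \frac{59\% - 20\%}{100\% - 20\%} \approx 49\%
```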


Figure 5  External Validity Test – Correlation with Actual Choice

[Two panels plot the average choice correlation against the number of questions (1 to 16). The "Without SE Questions" panel compares FP1 and Fixed; the "With SE Questions" panel compares FP2 and ACA. The vertical axes run from 0 to 1.]

Summary of Internal and External Validity Tests

Recall that polyhedral methods for conjoint analysis were developed to provide reasonable esti-

mates with fewer questions, especially for web-based questionnaires and for use in new product-development processes. The basic FP method (FP question design and analytic-center estimation) per-

formed well compared to the basic Fixed and ACA methods in the internal and external validity tests

conducted in this study, out-performing the basic Fixed method and offering similar performance to basic

ACA. When interpreting the comparison to ACA it is helpful to recall that ACA has undergone extensive

development and improvements in its approximately twenty years of existence and is the only method in

Figure 5 to use the SE and PI (Purchase Intentions) data. In comparison, FP has benefited from just two

years of development and this is its first empirical test. Given further development and refinement, it

seems likely that researchers will develop polyhedral methods that can do even better than current FP

methods. Indeed, the findings suggest at least one avenue for improvement. The relatively strong per-

formance of the ACA method suggests that the SE measures may contribute useful information, which

could enhance the accuracy of the basic FP method. In a later section we investigate this opportunity.

Although basic FP performed significantly better than the basic Fixed benchmark in this test, this

should not be interpreted as evidence rejecting the Fixed method. Fixed methods have proven valuable in

many managerial applications and have strengths that might allow them to outshine FP in other empirical

contexts. Finally, these initial comparisons evaluated only the basic methods – both question design and

estimation varied among methods. All question-design methods might do better with improved estima-

tion methods, such as Hierarchical Bayes estimation, that are investigated later in this paper. This later

analysis also enables us to isolate the relative contributions of question design versus estimation method.


6. Question-Order Effects – FP1 vs. FP2

The results reported above suggest an additional finding of interest. The two conditions in which

we collected SE questions (FP2 and ACA) generally yielded more accurate predictions than the condi-

tions in which SE questions were not collected prior to the validation tasks (FP1 and Fixed). In the FP1

and FP2 conditions we used the same method to design questions and estimate partworths (the SE ques-

tions were not used in estimation in FP2), suggesting that, if there is a difference in the relative accuracy

of FP1 and FP2, it could be explained by the question order.

The literature offers some guidance on the influence that question order may have on the accu-

racy of responses. If the SE questions “wear out” or tire respondents, causing them to pay less attention

to the PC questions, we might expect that inclusion of the SE questions will degrade the accuracy of the

PC responses (see for example Alreck and Settle 1995). Alternatively, the SE questions may improve the

accuracy of the PC questions by acting as a training or “warm-up” task. Some researchers have observed

learning or priming phenomena, in which the SE questions help respondents clarify their values, increasing the accuracy of the PC questions (Green, Krieger and Agarwal 1991; Huber et al. 1993; Johnson

1991). The literature advances several theories that support this training hypothesis, including task learn-

ing, self-preference learning, memory accessibility, and context effects (Bickart 1993; Feldman and

Lynch 1988; Nowlis and Simonson 1997; Tourangeau, Rips and Rasinski 2000; Simmons, Bickart and

Lynch 1993). Because any “warm-up” effect might counter the “wear out” effect, the directional impact

of question order is an empirical issue.15

By comparing FP1 and FP2 we examine whether or not there is a significant question-order ef-

fect. When using all sixteen questions, the predictive accuracy of the FP1 and FP2 conditions are not sta-

tistically different. This suggests that there was no measurable wearout due to the SE questions, and that

by the sixteenth question any warm-up/learning advantage had disappeared. However, there might still be

an effect for low numbers of questions. When we plot FP1 and FP2 for questions 1 through 16 we ob-

serve that FP2 performs slightly better than FP1 after a small number of questions (this plot is not re-

ported). When we estimate the performance using a version of Equation 1, the effect is not significant for

the choice correlations (t = 0.68), but is significant on the holdout correlations (t = 2.60).

In summary, the evidence is mixed. FP2 performs better than FP1, but the difference is not sig-

nificant for external validity and only marginally significant for internal validity. The SE questions might

improve the accuracy of the PC questions for the holdout correlations, but further testing is warranted.

15 We might also observe a question-order effect if the PC questions preceded the SE questions or if the choice task preceded the SE questions. For example, Green, Krieger and Agarwal (1991) observe an effect that is consistent with the PC-affects-SE hypothesis, but Huber et al. (1993) do not. In addition, theories in Allen 1982; Allen and Dillon 1983; Bem 1972; Folkes and Kiesler 1991; and Tybout and Yalch 1980 suggest that SEs that follow a choice task might be stated to justify choices.


7. Improving Predictive Performance with Hybrid Methods

The term heterosis is used in biology to describe the principle under which genetically-diverse

offspring of two pure species often have traits superior to either of their purebred parents (Campbell 1996,

p. 428). Analogously, there is evidence that combinations of conjoint methods often yield more accurate

or more efficient predictions than either of the parent methods.16 Successful hybrid conjoint models have

combined the power of compositional methods, such as self-explicated (SE) questions, and decomposi-

tional methods, such as metric paired-comparison (PC) questions, to produce new estimates. In this con-

text, ACA itself can be considered a hybrid. Although there are instances in which purebred methods ei-

ther outperform or provide equivalent accuracy to hybrid methods, there are many situations and product

categories in which hybrid methods outperform their purebred progenitors (Green 1984; Wittink and

Bergestuen 2001).

To explore heterosis in conjoint analysis, we compare six estimation procedures: three that make

use of self-explicated responses and three that do not. We begin by introducing the estimation procedures

and then presenting the findings.

Alternative Estimation Procedures When Self-Explicated Responses Are Not Available

We consider three estimation procedures that do not require self-explicated responses: OLS (or-

dinary least squares); Analytic Center estimation (AC); and Hierarchical Bayes estimation (HB). The

first two methods were used in the initial comparisons reported above, with the Fixed predictions using

OLS estimation and the FP1 and FP2 conditions using Analytic Center estimation. The third method,

Hierarchical Bayes estimation, was not used in the initial comparisons, but has been the subject of consid-

erable interest amongst marketing academics (see for example Allenby and Rossi 1999; Arora, Allenby

and Ginter 1998; Lenk et al. 1996; and Liechty, Ramaswamy and Cohen 2001). This method uses pre-

dictions at the population level to mitigate the effect of errors in individual-level responses. In particular,

HB methods use data from the full sample to iteratively estimate individual-level partworths by constrain-

ing the distribution of those partworths. The estimation is based on Gibbs sampling and the Metropolis

Hastings Algorithm (Sawtooth Software 1999). For starting values we use equal weights for the impor-

tance of the nine product features and randomly draw the starting values for price from a uniform

distribution.17 Further investigation revealed that the final parameter estimates (although not the perform-

ance metrics) were sensitive to this random draw. For this reason, we repeated the analysis three times,

with different random draws in each case, and then averaged the findings across the three repetitions.
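As an illustration of the starting values just described, here is a minimal sketch; the function name, the respondent-count argument, and the bounds of the uniform draw for price are our assumptions, and the Gibbs/Metropolis Hastings sampler itself is omitted:

```python
# Illustrative sketch of the HB starting values described above: equal
# importance weights for the nine product features plus a uniform random
# draw for price, so the initial covariance matrix is not singular
# (cf. footnote 17).
import numpy as np

N_FEATURES = 9  # product features other than price in this study

def hb_starting_values(n_respondents, rng, price_low=0.0, price_high=1.0):
    features = np.full((n_respondents, N_FEATURES), 1.0 / N_FEATURES)
    price = rng.uniform(price_low, price_high, size=(n_respondents, 1))
    return np.hstack([features, price])

# The analysis was repeated three times with different random draws and the
# findings averaged; e.g., one set of starting values per seeded generator:
# starts = [hb_starting_values(n, np.random.default_rng(s)) for s in (1, 2, 3)]
```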

16 See for example: Akaah and Korgaonkar 1983; Cattin, Hermet, and Pioche 1982; Carroll and Green 1995; Green 1984; Green, Goldberg and Montemayor 1981; Green, Goldberg and Wiley 1982; Green and Krieger 1995; Huber 1974; Huber et al. 1993; Leigh, MacKay, and Summers 1984; Moore and Semenik 1988; Wind et al. 1989.

17 If price also had the same starting value, the initial covariance matrix would be singular.


Alternative Estimation Procedures When Self-Explicated Responses Are Available

We consider three estimation methods that make use of self-explicated responses: Weighted hy-

brid estimation (WH); Analytic Center estimation with self-explicated constraints (ACSE); and Hierar-

chical Bayes estimation with self-explicated constraints (HBSE).

In the WH method the SE and PC partworths are normalized to have equal sums of differences

between the best and worst levels of each feature and then weighted by the number of feature levels in the

SE questions and the number of PC questions. This is the method used to generate final parameter esti-

mates in the most recent version of ACA (Version 5), which was released following completion of the

design and data collection portion of this study. (All versions of ACA use the same method to design

questions.) As with the previous version, we coded this estimation procedure and used a sample of data to

confirm the accuracy of the code with Sawtooth Software.
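A minimal sketch of this normalize-then-weight rule follows, assuming the partworths are stored as NumPy vectors indexed by one slice per feature; the names and data layout are illustrative, while the scaling convention (equal sums of best-minus-worst ranges) and the weights (number of SE feature levels versus number of PC questions) follow the description above:

```python
# Illustrative sketch of the Weighted Hybrid (WH) combination: normalize the
# SE and PC partworths so their per-feature ranges (best level - worst level)
# sum to the same constant, then weight each source by the amount of data
# behind it.
import numpy as np

def normalize(partworths, feature_slices):
    """Rescale so the feature ranges sum to 1. feature_slices holds one
    slice into the partworth vector per feature."""
    total_range = sum(partworths[s].max() - partworths[s].min()
                      for s in feature_slices)
    return partworths / total_range

def weighted_hybrid(se_pw, pc_pw, feature_slices, n_se_levels, n_pc_questions):
    se_norm = normalize(se_pw, feature_slices)
    pc_norm = normalize(pc_pw, feature_slices)
    # The counts act as sample-size weights, so the PC data gain influence
    # as more paired-comparison questions are asked.
    return ((n_se_levels * se_norm + n_pc_questions * pc_norm)
            / (n_se_levels + n_pc_questions))
```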

The ACSE method is a modification of AC estimation. We incorporate the SE responses by in-

troducing constraints on the feasible polyhedron that forms the basis of the AC estimation procedure. For

example, if Feature 1 is given a higher SE importance weight than Feature 2, we add a constraint to the

FP polyhedron such that the partworth of Feature 1 is larger than the partworth of Feature 2.
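For illustration, the added constraints can be written as extra rows of a linear inequality system appended to the FP polyhedron before the analytic center is computed; this matrix representation, and the choice of one constraint per adjacent pair in the SE importance ordering, are our assumptions:

```python
# Illustrative sketch of the ACSE constraints: whenever feature a received a
# higher SE importance than feature b, require partworth_a - partworth_b >= 0.
# Each returned row is one such inequality (rows @ u >= 0) to append to the
# polyhedron used for analytic-center estimation.
import numpy as np

def se_order_constraints(se_importances):
    se = np.asarray(se_importances, dtype=float)
    order = np.argsort(-se)           # features from most to least important
    rows = []
    for a, b in zip(order[:-1], order[1:]):
        row = np.zeros(se.size)
        row[a], row[b] = 1.0, -1.0    # partworth_a - partworth_b >= 0
        rows.append(row)
    return np.vstack(rows)

# Example: importances [0.5, 0.2, 0.3] yield two rows enforcing
# partworth_0 >= partworth_2 and partworth_2 >= partworth_1.
```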

The HB estimation procedure was modified in two respects to incorporate the SE responses.

First, the SE responses were used to set the starting values. Second, we used the SE responses to intro-

duce constraints on the estimation. These constraints were analogous to the constraints introduced to the

AC estimation in the ACSE method described above.18

Results

We separately combine the different estimation methods with data from each of the respective

conditions. For the methods that do not require self-explicated responses we used data from the Fixed

and FP1 conditions, and for the methods requiring SE responses we used data from the ACA and FP2

conditions. The findings are reported in Table 6. We obtain similar results for both the internal validity

(holdout questions) and the external validity (final bag choice) tests, and so we focus primarily on the ex-

ternal-validity choice correlations. We briefly discuss the holdout results at the end of this section.

To support these findings we re-estimated Equation 1 using appropriate specifications and data-

sets to evaluate the significance of the various comparisons. We summarize the findings in Table 7. Ref-

erences to statistical significance in the following discussion relate to analysis with Equation 1.

18 We also considered introducing the SE responses into the objective function. However, additional analysis revealed that this did not improve accuracy.


Table 6  Performance of Hybrid Methods on the External Validity Tests
Correlation with Actual Choice

                                   After 8 Questions      After 16 Questions
                                   Fixed      FP1         Fixed      FP1
Ordinary Least Squares (OLS)       na         na          0.54       0.60
Analytic Center (AC)               0.51       0.59        0.62       0.68
Hierarchical Bayes (HB)            0.53       0.57        0.61       0.64
Sample size                        88         88          88         88

                                   ACA        FP2         ACA        FP2
Weighted Hybrid (WH)               0.66       0.68        0.68       0.72
AC with SE constraints (ACSE)      0.63       0.70        0.65       0.71
HB with SE constraints (HBSE)      0.64       0.73        0.65       0.74
Sample size                        80         74          80         74

Column headings give the question-design condition (Fixed, FP1, ACA, or FP2 questions).

Table 7  Performance of Hybrid Methods on the External Validity Tests:
Conclusions from the Multivariate Analysis

Comparison of Estimation Methods
Without SE Questions:
  Fixed Questions   HB > AC >>> OLS
  FP1 Questions     AC >> HB >>> OLS
With SE Questions:
  ACA Questions     HBSE > WH > ACSE
  FP2 Questions     HBSE >>> WH > ACSE

Comparison of Question-Design Methods
Without SE Questions:
  Ordinary Least Squares Estimation (OLS)   FP >>> Fixed
  Analytic Center Estimation (AC)           FP >>> Fixed
  Hierarchical Bayes Estimation (HB)        FP >>> Fixed
With SE Questions:
  Weighted Hybrid Estimation (WH)           FP > ACA
  AC with SE Constraints (ACSE)             FP >> ACA
  HB with SE Constraints (HBSE)             FP >>> ACA

Method A > Method B: Method A is more accurate than Method B, but the difference is not significant (at p<0.05). Method A >> Method B: Method A is significantly more accurate than Method B (p<0.05). Method A >>> Method B: Method A is significantly more accurate than Method B (p<0.01).


Figure 6  Comparing Estimation Methods Holding the Question Design Constant
Correlation with Actual Choice

[Four panels plot average choice correlations against question number (1 to 16). Without SE responses: Figure 6a (Fixed questions) and Figure 6b (FP1 questions) compare OLS, AC, and HB. With SE responses: Figure 6c (ACA questions) and Figure 6d (FP2 questions) compare WH, ACSE, and HBSE. The vertical axes run from 0.3 to 0.8.]

Comparison of Estimation Methods

We can compare the accuracy of the different estimation methods by comparing the findings from

Table 6 within a column (for a specific set of questions). This comparison holds the question design con-

stant and varies the estimation method. To facilitate this comparison in Figures 6a-6d we have graphi-

cally depicted the performance of the different estimation methods for each set of questions.

For those experimental cells that were designed to obtain estimates without the SE questions, Hi-

erarchical Bayes and AC estimation offer similar predictive accuracy. Both techniques are significantly

more accurate than the OLS estimates (p<0.01). The evidence that HB estimation outperforms OLS is consistent with the recent simulation analyses by Andrews, Ansari, and Currim (2002), who suggest that

individual-level OLS estimates might overfit data and do less well than HB and finite mixture models on

predictive validity. However, the authors did not evaluate AC estimation. Because HB estimation is both


computationally demanding and time-consuming, some readers may prefer to use the AC approach in

rapid-response circumstances.

If SE responses are available, the preferred estimation method appears to depend upon both the

question-design method that is chosen and the number of PC responses used in the estimation. By using

population-level PC data, HBSE performs well after only a single PC question. However, increasing the

number of PC responses yields less improvement in the accuracy of HBSE relative to WH and ACSE.

When using PC questions designed by FP2, all three estimation methods eventually converge to compa-

rable accuracy levels. When using PC questions designed by ACA, the improvement in the WH method

is eventually sufficient to out-perform the HBSE estimates. Because the WH method corresponds to the

default estimation method in the currently available commercial version of ACA, we asked Sawtooth

Software about the relative accuracy of WH and HBSE. In email communication they advocated “the

ACA/HB approach as the gold standard, with [ACA Version 5] as the default (and a good) method pro-

vided by the base software.” Our findings support this recommendation for small numbers of PC ques-

tions.

Figure 6c suggests an interesting hypothesis. Perhaps as the number of questions gets large, there

might be sufficient data at the individual level to provide estimates that need not depend on population

distributions. This is an interesting avenue for further investigation.

Comparison of Question-Design Methods

The findings in Table 6 also facilitate comparison of the question-design methods. Comparing

across rows holds the estimation method constant and varies the question-design method. This compari-

son favors the two conditions in which the FP question-design method was used. When the SE measures

were not collected, the FP1 question design yields significantly (p<0.01) more accurate predictions than

the Fixed question design. This is true for all tested estimation methods, although the difference is par-

ticularly large when using OLS estimation.

When SE data are collected, the result comparing FP2 and ACA question design is almost as

strong. The FP2 question design is better for all three estimation methods (WH, ACSE, and HBSE). In-

deed, further investigation reveals that for every estimation method we tested, the estimates derived using

the FP2 questions outperform the corresponding estimates derived using ACA questions after each and

every question number. This finding is consistent with the results of the simulations reported by Toubia,

Simester and Hauser (2002).

The Incremental Predictive Value of the SE Questions

Table 6 and Figure 6 also highlight the incremental value of SE questions by offering a compari-

son of methods that use both the SE and PC questions to methods that rely on just the PC questions. The

results suggest that methods that use both SE and PC questions consistently outperform methods that use


only the PC questions. Thus, at least in this category, the SE questions provide incremental predictive

ability. We caution that when interpreting this finding readers should recall that the product category was

chosen at least in part because the SE responses were expected to be accurate.

In turn, we can evaluate whether the PC responses also contribute incremental accuracy. Predic-

tions that use the SE responses alone (without the PC responses) yield an average correlation with actual

choice of 0.64. This is lower than the performance of the methods that use both SE and PC responses (see

Table 6) and so we conclude that both SE and PC questions add incremental predictive ability in our data.

This conclusion is consistent with previous evidence in the literature (Green, Goldberg, and Montemayor

1981; Huber et al. 1993; Johnson 1999; Leigh, MacKay, and Summers 1984).
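For comparison, the SE-alone benchmark is purely compositional; a minimal sketch, assuming each bag is scored by the importance-weighted sum of its stated feature levels (the coding and names are illustrative assumptions):

```python
# Illustrative sketch of an SE-only prediction: rank the five bags by the
# sum of stated SE importances for the feature levels each bag offers,
# using no paired-comparison data at all.
import numpy as np

def se_only_ranking(se_importances, bag_profiles):
    """se_importances: (n_features,) stated importances.
    bag_profiles: (5, n_features) feature coding of the five bags."""
    utilities = bag_profiles @ np.asarray(se_importances, dtype=float)
    return (-utilities).argsort().argsort() + 1  # 1 = predicted favorite

# The resulting ranking can be correlated with the actual choice ranking in
# the same way as the model-based predictions above.
```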

Do Small Numbers of PC Questions Suffice?

Recall that we motivated this study by recognizing the need for methods that provide reasonable

estimates after few questions. The findings in Figure 6 allow us to evaluate how many PC questions are

required to provide meaningful estimates. The answer depends upon whether SE responses are available.

Without SE responses, there is a steady improvement in predictive ability as the number of PC questions

increases. On the other hand, when SE responses are available it will often not be necessary to ask all 16

PC questions; the improvement in accuracy is quite small, especially for FP question design (Figure 6d)

and HBSE estimation. We again caution that this conclusion might be specific to categories in which the

SE responses are sufficiently accurate, and that "small" must be viewed as relative to the num-

ber of features and levels.

The Internal Validity Task

We repeat the analysis of hybrid methods using the correlation measures from the internal valid-

ity (holdout questions) task. For completeness we present the analog of Tables 6 and 7 in the Appendix.

As in our initial comparisons, the findings from the holdout task are similar to the findings for the choice

task. However, there are two differences that are worth noting. First, in the choice task the HBSE

estimation method was more accurate than WH for FP2. In the holdout task there is no significant

difference between the two methods. Second, the FP question-design method is not significantly better than the Fixed method for the holdout correlations.

Summary

Overall, the most accurate hybrid methods appear to be HBSE or WH estimation combined with

FP question design. When SE data are unavailable, the most accurate method appears to be either AC or

HB estimation combined with FP question design. However, the methods all do well and add signifi-

cantly to a product-development team’s knowledge of customer preferences.


While we are encouraged by the hybrid findings, we must be cautious in making a general rec-

ommendation. The hybrids appear to improve predictions, but are based on post hoc analysis of the data in a

category where SE questions do well. Before embracing SE-based hybrids for all categories, further test-

ing is required in product categories in which SE responses are less accurate. This may include categories

where the product is less separable and more holistic and categories where the direct scaling of partworths

by respondents is difficult. We expect that such categories exist. For example, in a review of forty-two

SE applications, Wilkie and Pessemier (1973) identify 45% with favorable results, 33% questioning the

SE model, and the remainder mixed.

8. Discussion

Polyhedral methods do well in simulation, but, prior to this paper, they were untested in empirical

situations. This paper provides an initial evaluation of polyhedral methods in an actual transaction involv-

ing a new product and real money. We report tests of internal validity and tests of external validity. The

internal validity tests comprise four holdout metric paired-comparison questions, while the external valid-

ity tests represent an actual choice from a set of five laptop computer bags. The findings for the internal

and external tests are consistent with one another and consistent with prior simulations. In this first head-

to-head comparison, the basic FastPace (FP) method appears to improve predictions relative to the effi-

cient Fixed (non-adaptive) design with OLS estimation and to offer similar accuracy to basic ACA.

We also tested several hybrid methods that combine separate estimation and question-design

components. These hybrids perform well and suggest that both the self-explicated and metric paired-

comparison questions contribute incremental information. These findings suggest that hybrid improve-

ments are worth considering in categories where self-explicated data are accurate. The preferred ques-

tion-design method appears to be FP’s polyhedral method. The preferred estimation method varied based

on the question-design method, although the Hierarchical Bayes estimation methods consistently per-

formed well.

In the end we conclude that all of the tested methods perform well on these validity tests. This is

good news for the many academics and product-development professionals who rely on conjoint analysis.

We conclude further that the polyhedral methods appear to have passed their first empirical test. The per-

formance of FP question design and the FP hybrid estimation suggest that further developments based on

polyhedral methods have the potential to advance conjoint analysis theory and practice. In the future,

researchers might consider testing the adaptive CBC polyhedral method versus appropriate CBC bench-

marks and testing the adaptive CBC polyhedral method versus the adaptive polyhedral metric-pairs

method.


References

Acito, Franklin and Arun K. Jain (1980), "Evaluation of Conjoint Analysis Results: A Comparison of Methods," Journal of Marketing Research, 17, (February), 106-112.

Akaah, Ishmael P. and Pradeep K. Korgaonkar (1983), "An Empirical Comparison of the Predictive Validity of Self-explicated, Huber-hybrid, Traditional Conjoint, and Hybrid Conjoint Models," Journal of Marketing Research, 20, (May), 187-197.

Allen, Chris T. (1982), "Self-Perception Based Strategies for Stimulating Energy Conservation," Journal of Consumer Research, 8, (March), 381-390.

---- and William R. Dillon (1983), "Self-Perception Development and Consumer Choice Criteria: Is There A Linkage?" in Advances in Consumer Research, 10, Richard P. Bagozzi and Alice M. Tybout, eds., (Ann Arbor, MI: Association for Consumer Research), 45-50.

Allenby, Greg M. and Peter E. Rossi (1999), "Marketing Models of Consumer Heterogeneity," Journal of Econometrics, 89, (March/April), 57-78.

Alreck, Pamela L. and Robert B. Settle (1995), The Survey Research Handbook: Guidelines and Strategies for Conducting a Survey, 2E, (New York, NY: McGraw Hill), p. 102.

Andrews, Rick L., Asim Ansari, and Imran S. Currim (2002), "Hierarchical Bayes vs. Finite Mixture Conjoint Analysis Models: A Comparison of Fit, Prediction, and Partworth Recovery," Journal of Marketing Research, 39, (February), 87-98.

Arora, Neeraj, Greg M. Allenby and James L. Ginter (1998), "A Hierarchical Bayes Model of Primary and Secondary Demand," Marketing Science, 17, 1, 29-44.

---- and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28, (September), 273-283.

Ben-Akiva, Moshe and S. Gershenfeld (1998), "Multi-featured Products and Services: Analyzing Pricing and Bundling Strategies," Journal of Forecasting, 17, 175-196.

Bem, Daryl (1972), "Self Perception Theory," in Advances in Experimental Social Psychology, 6, Leonard Berkowitz, ed., (New York, NY: Academic Press).

Bickart, Barbara A. (1993), "Carryover and Backfire Effects in Marketing Research," Journal of Marketing Research, 30, 1, (February), 52-62.

Bucklin, Randolph E. and V. Srinivasan (1991), "Determining Interbrand Substitutability Through Survey Measurement of Consumer Preference Structures," Journal of Marketing Research, 28, (February), 58-71.

Campbell, Neil A. (1996), Biology, 4E, (Reading, MA: The Benjamin/Cummings Publishing Company, Inc.).

Carroll, J. Douglas and Paul E. Green (1995), "Psychometric Methods in Marketing Research: Part I, Conjoint Analysis," Journal of Marketing Research, 32, (November), 385-391.

Cattin, Philippe and Dick R. Wittink (1982), "Commercial Use of Conjoint Analysis: A Survey," Journal of Marketing, 46, (Summer), 44-53.

----, Gerard Hermet and Alain Pioche (1982), "Alternative Hybrid Models for Conjoint Analysis: Some Empirical Results," in Analytic Approaches to Product and Marketing Planning: The Second Conference, R. K. Srinivasan and Allen D. Shocker, eds., (Cambridge, MA: Marketing Science Institute), 142-152.

Choi, S. Chan and Wayne S. DeSarbo (1994), "A Conjoint-based Product Designing Procedure Incorporating Price Competition," Journal of Product Innovation Management, 11, 451-459.

Currim, Imran S. (1981), "Using Segmentation Approaches for Better Prediction and Understanding from Consumer Mode Choice Models," Journal of Marketing Research, 18, (August), 301-309.

Cusumano, Michael A. and Richard W. Selby (1995), Microsoft Secrets, (New York, NY: The Free Press).

Dahan, Ely and John R. Hauser (2002), "The Virtual Customer," forthcoming, Journal of Product Innovation Management.

---- and V. Srinivasan (2000), "The Predictive Power of Internet-Based Product Concept Testing Using Visual Depiction and Animation," Journal of Product Innovation Management, 17, 99-109.

Davidson, J. D. (1973), "Forecasting Traffic on STOL," Operations Research Quarterly, 22, 561-569.

Dawes, Robin M. and Bernard Corrigan (1974), "Linear Models in Decision Making," Psychological Bulletin, 81, (March), 95-106.

De Angelis, Chris (2001), "Sampling for Web-Based Surveys," paper presented at the 22nd Annual Marketing Research Conference, Atlanta, GA, (September).

Einhorn, Hillel J. (1971), "Use of Nonlinear, Noncompensatory, Models as a Function of Task and Amount of Information," Organizational Behavior and Human Performance, 6, 1-27.

Elrod, Terry, Jordan Louviere, and Krishnakumar S. Davey (1992), "An Empirical Comparison of Ratings-Based and Choice-based Conjoint Models," Journal of Marketing Research, 29, 3, (August), 368-377.

Eppinger, Steven D. (1998), "Information-Based Product Development," Presentation to the Research Advisory Committee of the Center for Innovation in Product Development, MIT, Cambridge, MA, (January 12).

----, Daniel E. Whitney, Robert P. Smith, and David A. Gebala (1994), "A Model-Based Method for Organizing Tasks in Product Development," Research in Engineering Design, 6, 1, 1-13.

Feldman, Jack M. and John G. Lynch, Jr. (1988), "Self-generated Validity: Effects of Measurement on Belief, Attitude, Intention, and Behavior," Journal of Applied Psychology, 73, (August), 421-435.

Folkes, Valerie S. and Tina Kiesler (1991), "Social Cognition: Consumers' Inferences About the Self and Others," in Handbook of Consumer Behavior, Thomas S. Robertson and Harold H. Kassarjian, eds., 281-315.

Freund, Robert (1993), "Projective Transformations for Interior-Point Algorithms, and a Superlinearly Convergent Algorithm for the W-Center Problem," Mathematical Programming, 58, 385-414.

Gonier, Dennis E. (1999), "The Emperor Gets New Clothes," paper presented at the Advertising Research Foundation's On-line Research Day and available at www.dmsdallas.com, (January).

Green, Paul E. (1984), "Hybrid Models for Conjoint Analysis: An Expository Review," Journal of Marketing Research, 155-169.

---- and Kristiaan Helsen (1989), "Cross-Validation Assessment of Alternatives to Individual-Level Conjoint Analysis: A Case Study," Journal of Marketing Research, 346-350.

----, ----, and Bruce Shandler (1988), "Conjoint Internal Validity Under Alternative Profile Presentations," Journal of Consumer Research, 15, (December), 392-397.

----, Stephen M. Goldberg, and Mila Montemayor (1981), "A Hybrid Utility Estimation Model for Conjoint Analysis," Journal of Marketing, 33-41.

----, ----, and James B. Wiley (1982), "A Cross Validation Test of Hybrid Conjoint Models," in Advances in Consumer Research, 10, Richard P. Bagozzi and Alice M. Tybout, eds., (Ann Arbor, MI: Association for Consumer Research), 147-150.

---- and Abba Krieger (1995), "Attribute Importance Weights Modification in Assessing a Brand's Competitive Potential," Marketing Science, 14, 3, Part 1 of 2, 253-270.

----, ----, and Manoj K. Agarwal (1991), "Adaptive Conjoint Analysis: Some Caveats and Suggestions," Journal of Marketing Research, 215-222.

----, ----, and Pradeep Bansal (1988), "Completely Unacceptable Levels in Conjoint Analysis: A Cautionary Note," Journal of Marketing Research, 25, (August), 293-300.

----, ----, and Jerry Wind (2001), "Thirty Years of Conjoint Analysis: Reflections and Prospects," Interfaces, 31, (May-June), S56-S73.

---- and V. Srinivasan (1990), "Conjoint Analysis in Marketing: New Developments With Implications for Research and Practice," Journal of Marketing, 3-19.

---- and ---- (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5, 2, (September), 103-123.

Griffin, Abbie and John R. Hauser (1993), "The Voice of the Customer," Marketing Science, 12, 1, (Winter), 1-27.

Haaijer, Rinus, Michel Wedel, Marco Vriens, and Tom Wansbeek (1998), "Utility Covariances and Context Effects in Conjoint MNP Models," Marketing Science, 17, 3, 236-252.

Hagerty, Michael R. (1985), "Improving the Predictive Power of Conjoint Analysis: The Use of Factor Analysis and Cluster Analysis," Journal of Marketing Research, 22, (May), 168-184.

Hauser, John R. and Steven P. Gaskin (1984), "Application of the 'Defender' Consumer Model," Marketing Science, 3, 4, (Fall), 327-351.

---- and Frank S. Koppelman (1979), "Alternative Perceptual Mapping Techniques: Relative Accuracy and Usefulness," Journal of Marketing Research, 16, 4, (November), 495-506.

---- and Steven M. Shugan (1980), "Intensity Measures of Consumer Preference," Operations Research, 28, 2, (March-April), 278-320.

----, Alice M. Tybout, and Frank Koppelman (1981), "Consumer-Oriented Transportation Service Planning: Consumer Analysis and Strategies," Applications of Management Science, 1, 91-138.

---- and Glen L. Urban (1977), "A Normative Methodology for Modeling Consumer Response to Innovation," Operations Research, 25, 5, (July-August), 579-619.

Huber, George P. (1974), "Multiattribute Utility Models: A Review of Field and Field-like Studies," Management Science, 20, (June), 1393-1402.

Huber, Joel (1975), "Predicting Preferences on Experimental Bundles of Attributes: A Comparison of Models," Journal of Marketing Research, 12, (August), 290-297.

----, Dick R. Wittink, John A. Fiedler, and Richard Miller (1993), "The Effectiveness of Alternative Preference Elicitation Procedures in Predicting Choice," Journal of Marketing Research, 105-114.

Jain, Arun K., Franklin Acito, Naresh K. Malhotra, and Vijay Mahajan (1979), "A Comparison of the Internal Validity of Alternative Parameter Estimation Methods in Decompositional Multiattribute Preference Models," Journal of Marketing Research, 16, (August), 313-322.

Johnson, Eric J., Robert J. Meyer, and Sanjoy Ghose (1989), "When Choice Models Fail: Compensatory Models in Negatively Correlated Environments," Journal of Marketing Research, 255-270.

Johnson, Richard (1987), "Accuracy of Utility Estimation in ACA," Working Paper, Sawtooth Software, Sequim, WA, (April).

---- (1991), "Comment on 'Adaptive Conjoint Analysis: Some Caveats and Suggestions,'" Journal of Marketing Research, 28, (May), 223-225.

---- (1999), "The Joys and Sorrows of Implementing HB Methods for Conjoint Analysis," Working Paper, Sawtooth Software, Sequim, WA, (November).

Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T. C. Lee (1985), The Theory and Practice of Econometrics, (New York, NY: John Wiley and Sons).

Karmarkar, N. (1984), "A New Polynomial Time Algorithm for Linear Programming," Combinatorica, 4, 373-395.

Keeney, Ralph and Howard Raiffa (1976), Decisions with Multiple Consequences: Preferences and Value Tradeoffs, (New York, NY: John Wiley & Sons).

Klein, Noreen M., "Assessing Unacceptable Attribute Levels in Conjoint Analysis," in Advances in Consumer Research, 14, 154-158.

Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971), Foundations of Measurement, (New York, NY: Academic Press).

Kuhfeld, Warren F. (1999), "Efficient Experimental Designs Using Computerized Searches," Working Paper, SAS Institute, Inc.

----, Randall D. Tobias, and Mark Garratt (1994), "Efficient Experimental Design with Marketing Research Applications," Journal of Marketing Research, (November), 545-557.

Leigh, Thomas W., David B. MacKay, and John O. Summers (1984), "Reliability and Validity of Conjoint Analysis and Self-Explicated Weights: A Comparison," Journal of Marketing Research, 456-462.

Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, and Martin R. Young (1996), "Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs," Marketing Science, 15, 2, 173-191.

Liechty, John, Venkatram Ramaswamy, and Steven Cohen (2001), "Choice-Menus for Mass Customization: An Experimental Approach for Analyzing Customer Demand With an Application to a Web-based Information Service," Journal of Marketing Research, 38, 2, (May).

Louviere, Jordan J., David A. Hensher, and Joffre D. Swait (2000), Stated Choice Methods: Analysis and Application, (New York, NY: Cambridge University Press), 354-381.

Mahajan, Vijay and Jerry Wind (1992), "New Product Models: Practice, Shortcomings and Desired Improvements," Journal of Product Innovation Management, 9, 128-139.

---- (1986), "An Approach to the Measurement of Consumer Preferences Using Limited Information," Journal of Marketing Research, 23, (February), 33-40.

McGrath, Michael E. (1996), Setting the Pace in Product Development: A Guide to Product and Cycle-Time Excellence, (Boston, MA: Butterworth-Heinemann).

Montgomery, David B. and Dick R. Wittink (1980), "The Predictive Validity of Conjoint Analysis for Alternative Aggregation Schemes," in Market Measurement and Analysis: Proceedings of the 1979 ORSA/TIMS Conference on Marketing, David B. Montgomery and Dick R. Wittink, eds., (Cambridge, MA: Marketing Science Institute), 298-309.

Moore, William L. (1980), "Levels of Aggregation in Conjoint Analysis: An Empirical Comparison," Journal of Marketing Research, 516-523.

---- and Richard J. Semenik (1988), "Measuring Preferences with Hybrid Conjoint Analysis: The Impact of a Different Number of Attributes in the Master Design," Journal of Business Research, 261-274.

Nadilo, Rudy (1999), "On-line Research: The Methodology for the Next Millennium," Advertising Research Foundation Journal, (Spring). Available at www.greenfield.com.

Nesterov, Y. and A. Nemirovskii (1994), "Interior-Point Polynomial Algorithms in Convex Programming," SIAM, Philadelphia.

Nowlis, Stephen M. and Itamar Simonson (1997), "Attribute-Task Compatibility as a Determinant of Consumer Preference Reversals," Journal of Marketing Research, 36, (May), 205-218.

Orme, Bryan (1999), "ACA, CBC, or Both?: Effective Strategies for Conjoint Research," Working Paper, Sawtooth Software, Sequim, WA.

----, M. I. Alpert, and E. Christensen (1997), "Assessing the Validity of Conjoint Analysis – Continued," Sawtooth Software Conference Proceedings, 209-225.

Page, Albert L. and Harold F. Rosenbaum (1987), "Redesigning Product Lines with Conjoint Analysis: How Sunbeam Does It," Journal of Product Innovation Management, 4, 120-137.

Park, C. Whan, Sung Youl Jun, and Deborah MacInnis (2000), "Choosing What I Want versus Rejecting What I Do Not Want: An Application of Decision Framing to Product Option Choice Decisions," Journal of Marketing Research, 37, 2, (May), 187-202.

Parker, B. R. and V. Srinivasan (1976), "A Consumer Preference Approach to the Planning of Rural Primary Health-Care Facilities," Operations Research, 24, 5, (September-October), 991-1025.

Robinson, P. J. (1980), "Applications of Conjoint Analysis to Pricing Problems," in Market Measurement and Analysis: Proceedings of the 1979 ORSA/TIMS Conference on Marketing, David B. Montgomery and Dick R. Wittink, eds., (Cambridge, MA: Marketing Science Institute), 183-205.

Sandor, Zsolt and Michel Wedel (2001), "Designing Conjoint Choice Experiments Using Managers' Prior Beliefs," Journal of Marketing Research, 38, 4, (November), 430-444.

Sawtooth Software, Inc. (1999), "CVA: Full-Profile Conjoint System from Sawtooth Software," CVA Manual, (Sequim, WA: Sawtooth Software, Inc.).

---- (1996), "ACA System: Adaptive Conjoint Analysis," ACA Manual, (Sequim, WA: Sawtooth Software, Inc.).

---- (1999), "The ACA/HB Module for Hierarchical Bayes Estimation," ACA/HB Manual, (Sequim, WA: Sawtooth Software, Inc.).

Sawyer, Alan G. (1975), "Demand Artifacts in Laboratory Experiments in Consumer Research," Journal of Consumer Research, 1, (March), 20-30.

Sheth, Jagdish N. and Wayne W. Talarzyk (1972), "Perceived Instrumentality and Value Importance as Determinants of Attitudes," Journal of Marketing Research, 9, 6-9.

Shimp, Terry, Eva Hyatt, and David Snyder (1991), "A Critical Appraisal of Demand Artifacts in Laboratory Experiments in Consumer Research," Journal of Consumer Research, 18, (December), 273-283.

Simmons, Carolyn J., Barbara A. Bickart, and John G. Lynch, Jr. (1993), "Capturing and Creating Public Opinion in Survey Research," Journal of Consumer Research, 30, (September), 316-329.

Smith, Preston G. and Donald G. Reinertsen (1998), Developing Products in Half the Time, 2E, (New York, NY: John Wiley & Sons, Inc.).

Sonnevend, G. (1985a), "An 'Analytic' Center for Polyhedrons and New Classes of Global Algorithms for Linear (Smooth, Convex) Programming," Proceedings of the 12th IFIP Conference on System Modeling and Optimization, Budapest.

---- (1985b), "A New Method for Solving a Set of Linear (Convex) Inequalities and its Applications for Identification and Optimization," Preprint, Department of Numerical Analysis, Institute of Mathematics, Eötvös University, Budapest.

Srinivasan, V. (1988), "A Conjunctive-Compensatory Approach to the Self-Explication of Multiattributed Preferences," Decision Sciences, 295-305.

---- and Chan Su Park (1997), "Surprising Robustness of the Self-Explicated Approach to Customer Preference Structure Measurement," Journal of Marketing Research, 34, (May), 286-291.

---- and Allen D. Shocker (1973a), "Estimating the Weights for Multiple Attributes in a Composite Criterion Using Pairwise Judgments," Psychometrika, 38, 4, (December), 473-493.

---- and ---- (1973b), "Linear Programming Techniques for Multidimensional Analysis of Preferences," Psychometrika, 38, 3, (September), 337-369.

Tessler, Amy, Norm Wada, and Robert L. Klein (1993), "QFD at PG&E," Transactions from The Fifth Symposium on Quality Function Deployment, (June).

Toubia, Olivier, Duncan Simester, and John R. Hauser (2002), "Fast Polyhedral Adaptive Conjoint Estimation," Working Paper, (Cambridge, MA: Center for Innovation in Product Development, MIT), (January).

Tourangeau, Roger, Lance J. Rips, and Kenneth A. Rasinski (2000), The Psychology of Survey Response, (New York, NY: Cambridge University Press).

Tybout, Alice M. and John R. Hauser (1981), "A Marketing Audit Using a Conceptual Model of Consumer Behavior: Application and Evaluation," Journal of Marketing, 45, 3, (Summer), 81-101.

Ulrich, Karl T. and Steven D. Eppinger (2000), Product Design and Development, (New York, NY: McGraw-Hill, Inc.).

Urban, Glen L. and Gerald M. Katz (1983), "Pre-Test Market Models: Validation and Managerial Implications," Journal of Marketing Research, 20, (August), 221-234.

Vaidya, P. (1989), "A Locally Well-Behaved Potential Function and a Simple Newton-Type Method for Finding the Center of a Polytope," in Progress in Mathematical Programming: Interior Points and Related Methods, N. Megiddo, ed., (New York, NY: Springer), 79-90.

White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.

Wilkie, William L. and Edgar A. Pessemier (1973), "Issues in Marketing's Use of Multi-attribute Attitude Models," Journal of Marketing Research, 10, (November), 428-441.

Willkie, Joseph, Christine Adams, and Azra Girnius (1999), "First Systematic Comparison Between Mall-Intercept and Internet Interviewing," European Society for Opinion and Marketing Research, Worldwide Internet Conference, London, UK, (February 21).

Wind, Jerry, Paul E. Green, Douglas Shifflet, and Marsha Scarbrough (1989), "Courtyard by Marriott: Designing a Hotel Facility with Consumer-Based Marketing Models," Interfaces, 25-47.

Wittink, Dick R. and Trond Bergestuen (2001), "Forecasting with Conjoint Analysis," in Principles of Forecasting: A Handbook for Researchers and Practitioners, J. Scott Armstrong, ed., (Norwell, MA: Kluwer Academic Publishers).

---- and Philippe Cattin (1989), "Commercial Use of Conjoint Analysis: An Update," Journal of Marketing, 91-96.

---- and David B. Montgomery (1979), "Predictive Validity of Trade-off Analysis for Alternative Segmentation Schemes," in 1979 AMA Educators' Conference Proceedings, Neil Beckwith, ed., (Chicago, IL: American Marketing Association).

Wright, Peter and Mary Ann Kriewall (1980), "State-of-Mind Effects on Accuracy with which Utility Functions Predict Marketplace Utility," Journal of Marketing Research, 17, (August), 277-293.


Appendix

Summary of Acronyms Used to Simplify Exposition

AC Analytic Center estimation; the polyhedral estimation method proposed by Toubia, Simester and Hauser (2002).

ACA Refers to an experimental question-design condition (see Table 2). When clear in context, ACA is also used to describe the combination of the question design and an estimation method proposed by Sawtooth Software (ACA Version 4).

ACSE Analytic Center (AC) estimation with Self-Explicated constraints.

CBC Choice-based conjoint analysis. An alternative data-collection method for conjoint analysis in which respondents are asked to state choices among sets of full-profile product descriptions. CBC data are not tested in this paper, but polyhedral methods exist for such data.

Fixed Refers to an experimental question-design condition (see Table 2) and to a question-design method (non-adaptive D-efficient experimental design).

FP Fast Polyhedral Adaptive Conjoint Estimation (FastPace); the question-design method proposed by Toubia, Simester and Hauser (2002). When clear in context, basic FP also describes the combination of FP question design with AC estimation.

FP1 Refers to both an experimental question-design condition (see Table 2) and the basic FastPace method (FP question design and AC estimation) using data from that condition. In the FP1 (and Fixed) conditions, the SE and PI questions are not collected prior to the external validity task.

FP2 Refers to both an experimental question-design condition (see Table 2) and the basic FastPace method (FP question design and AC estimation) using data from that condition. In FP2 (and ACA) conditions, the SE and PI questions are collected prior to the external validity task.

HB Hierarchical Bayes estimation.

HBSE Hierarchical Bayes estimation with Self-Explicated constraints.

OLS Ordinary Least Squares; an estimation method.

PC Metric Paired-Comparison Questions. The respondent provides a rating to indicate his or her strength of preference between two alternatives. All tested methods use PCs.

PD Product Development.

PI Purchase Intention Questions. These questions are used in standard ACA applications. They are only collected in the ACA and FP2 conditions prior to the external validity task.

SE Self-Explicated Questions. The respondent states the importance of a feature directly. They are only collected in the ACA and FP2 conditions prior to the external validity task.

WH Weighted Hybrid estimation; an estimation method recently proposed by Sawtooth Software (ACA Version 5).


Hybrid Holdout Results

Performance of Hybrid Methods on the Internal Validity Tests
Correlation with Actual Response

                                   After 8 Questions      After 16 Questions
                                   Fixed      FP1         Fixed      FP1
Ordinary Least Squares (OLS)       na         na          0.73       0.71
Analytic Center (AC)               0.65       0.69        0.80       0.79
Hierarchical Bayes (HB)            0.70       0.67        0.76       0.72
Sample size                        88         87          88         87

                                   ACA        FP2         ACA        FP2
Weighted Hybrid (WH)               0.77       0.81        0.81       0.84
AC with SE constraints (ACSE)      0.74       0.78        0.77       0.84
HB with SE constraints (HBSE)      0.76       0.80        0.78       0.82
Sample size                        80         71          80         71

The missing observations reflect respondents who gave the same response for all four holdout questions (in which case the correlations were undefined).

Performance of Hybrid Methods on the Internal Validity Tests:
Conclusions from the Multivariate Analysis

Comparison of Estimation Methods
Without SE Questions:
  Fixed Questions   HB > AC >>> OLS
  FP1 Questions     AC >>> HB >>> OLS
With SE Questions:
  ACA Questions     WH > HBSE >>> ACSE
  FP2 Questions     WH > HBSE > ACSE

Comparison of Question-Design Methods
Without SE Questions:
  Ordinary Least Squares Estimation (OLS)   Fixed > FP
  Analytic Center Estimation (AC)           FP > Fixed
  Hierarchical Bayes Estimation (HB)        Fixed > FP
With SE Questions:
  Weighted Hybrid Estimation (WH)           FP >>> ACA
  AC with SE Constraints (ACSE)             FP >>> ACA
  HB with SE Constraints (HBSE)             FP >>> ACA
