21
MANAGEMENT SCIENCE Vol. 60, No. 6, June 2014, pp. 1392–1412 ISSN 0025-1909 (print) ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.2014.1952 © 2014 INFORMS Path to Purchase: A Mutually Exciting Point Process Model for Online Advertising and Conversion Lizhen Xu Scheller College of Business, Georgia Institute of Technology, Atlanta, Georgia 30308, [email protected] Jason A. Duan, Andrew Whinston McCombs School of Business, The University of Texas at Austin, Austin, Texas 78712 {[email protected], [email protected]} T his paper studies the effects of various types of online advertisements on purchase conversion by capturing the dynamic interactions among advertisement clicks themselves. It is motivated by the observation that certain advertisement clicks may not result in immediate purchases, but they stimulate subsequent clicks on other advertisements, which then lead to purchases. We develop a novel model based on mutually exciting point processes, which consider advertisement clicks and purchases as dependent random events in continuous time. We incorporate individual random effects to account for consumer heterogeneity and cast the model in the Bayesian hierarchical framework. We construct conversion probability to properly evaluate the conversion effects of online advertisements. We develop simulation algorithms for mutually exciting point processes to compute the conversion probability and for out-of-sample prediction. Model comparison results show the proposed model outperforms the benchmark models that ignore exciting effects among advertisement clicks. Using a proprietary data set, we find that display advertisements have relatively low direct effect on purchase conversion, but they are more likely to stimulate subsequent visits through other advertisement formats. We show that the commonly used measure of conversion rate is biased in favor of search advertisements and underestimates the conversion effect of display advertisements the most. Our model also furnishes a useful tool to predict future purchases and advertisement clicks for the purpose of targeted marketing and customer relationship management. Keywords : attribution model; online advertising; conversion; mutually exciting point process; multivariate stochastic model; search advertisement; display advertisement; business analytics History : Received September 16, 2012; accepted March 9, 2014, by Eric Bradlow, special issue on business analytics. Published online in Articles in Advance April 16, 2014. 1. Introduction As the Internet grows to become the leading advertis- ing medium, firms invest heavily to attract consumers to visit their websites through advertising links in various formats, among which search advertisements (i.e., sponsored links displayed on the search engine results pages) and display advertisements (i.e., digital graphics linking to the advertiser’s website embed- ded in Web content pages) are the two leading online advertising formats (Interactive Advertising Bureau and PricewaterhouseCoopers 2012). Thanks to the advancement of information technology, which makes tremendous individual-level online clickstream data available, business analytics of how to evaluate the effectiveness of these different formats of online adver- tisements (ads) has been attracting constant academic and industrial interest. Marketing researchers and practitioners are especially interested in the conversion effect of each type of online advertisement, that is, given an individual consumer clicked on a certain type of advertisement, what is the probability of her making a purchase (or performing certain actions such as registration or subscription) thereafter. The most common measure of conversion effects is conversion rate, which is the percentage of the adver- tisement clicks that directly lead to purchases among all advertisement clicks of the same type. This simple statistic provides an intuitive assessment of advertising effectiveness. However, it overemphasizes the effect of the “last click” (i.e., the advertisement click directly preceding a purchase) and completely ignores the effects of all previous advertisement clicks, which natu- rally leads to biased estimates. Existing literature has developed more sophisticated models to analyze the conversion effects of website visits and advertisement clicks (e.g., Moe and Fader 2004, Manchanda et al. 2006). These models account for the entire clickstream history of individual consumers and model the pur- chases as a result of the accumulative effects of all previous clicks, which can more precisely evaluate the conversion effects and predict the purchase probability. Nevertheless, because existent studies on conversion effects focus solely on how nonpurchase activities 1392

Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Embed Size (px)

Citation preview

Page 1: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

MANAGEMENT SCIENCEVol. 60, No. 6, June 2014, pp. 1392–1412ISSN 0025-1909 (print) � ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.2014.1952

© 2014 INFORMS

Path to Purchase: A Mutually Exciting Point ProcessModel for Online Advertising and Conversion

Lizhen XuScheller College of Business, Georgia Institute of Technology, Atlanta, Georgia 30308, [email protected]

Jason A. Duan, Andrew WhinstonMcCombs School of Business, The University of Texas at Austin, Austin, Texas 78712

{[email protected], [email protected]}

This paper studies the effects of various types of online advertisements on purchase conversion by capturing thedynamic interactions among advertisement clicks themselves. It is motivated by the observation that certain

advertisement clicks may not result in immediate purchases, but they stimulate subsequent clicks on otheradvertisements, which then lead to purchases. We develop a novel model based on mutually exciting pointprocesses, which consider advertisement clicks and purchases as dependent random events in continuous time. Weincorporate individual random effects to account for consumer heterogeneity and cast the model in the Bayesianhierarchical framework. We construct conversion probability to properly evaluate the conversion effects ofonline advertisements. We develop simulation algorithms for mutually exciting point processes to compute theconversion probability and for out-of-sample prediction. Model comparison results show the proposed modeloutperforms the benchmark models that ignore exciting effects among advertisement clicks. Using a proprietarydata set, we find that display advertisements have relatively low direct effect on purchase conversion, but they aremore likely to stimulate subsequent visits through other advertisement formats. We show that the commonly usedmeasure of conversion rate is biased in favor of search advertisements and underestimates the conversion effectof display advertisements the most. Our model also furnishes a useful tool to predict future purchases andadvertisement clicks for the purpose of targeted marketing and customer relationship management.

Keywords : attribution model; online advertising; conversion; mutually exciting point process; multivariatestochastic model; search advertisement; display advertisement; business analytics

History : Received September 16, 2012; accepted March 9, 2014, by Eric Bradlow, special issue on business analytics.Published online in Articles in Advance April 16, 2014.

1. IntroductionAs the Internet grows to become the leading advertis-ing medium, firms invest heavily to attract consumersto visit their websites through advertising links invarious formats, among which search advertisements(i.e., sponsored links displayed on the search engineresults pages) and display advertisements (i.e., digitalgraphics linking to the advertiser’s website embed-ded in Web content pages) are the two leading onlineadvertising formats (Interactive Advertising Bureauand PricewaterhouseCoopers 2012). Thanks to theadvancement of information technology, which makestremendous individual-level online clickstream dataavailable, business analytics of how to evaluate theeffectiveness of these different formats of online adver-tisements (ads) has been attracting constant academicand industrial interest. Marketing researchers andpractitioners are especially interested in the conversioneffect of each type of online advertisement, that is,given an individual consumer clicked on a certaintype of advertisement, what is the probability of her

making a purchase (or performing certain actions suchas registration or subscription) thereafter.

The most common measure of conversion effects isconversion rate, which is the percentage of the adver-tisement clicks that directly lead to purchases amongall advertisement clicks of the same type. This simplestatistic provides an intuitive assessment of advertisingeffectiveness. However, it overemphasizes the effect ofthe “last click” (i.e., the advertisement click directlypreceding a purchase) and completely ignores theeffects of all previous advertisement clicks, which natu-rally leads to biased estimates. Existing literature hasdeveloped more sophisticated models to analyze theconversion effects of website visits and advertisementclicks (e.g., Moe and Fader 2004, Manchanda et al.2006). These models account for the entire clickstreamhistory of individual consumers and model the pur-chases as a result of the accumulative effects of allprevious clicks, which can more precisely evaluate theconversion effects and predict the purchase probability.Nevertheless, because existent studies on conversioneffects focus solely on how nonpurchase activities

1392

Page 2: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1393

Figure 1 Illustrative Examples of the Interactions Among Ad Clicks

t1� t3

� t4�

Time

Search Search PurchaseB

t2�

Search

t1 t2 t3 Time

Display Search PurchaseA

(e.g., advertisement clicks, website visits) affect theprobability of purchasing, they usually consider thenonpurchase activities as deterministic data rather thanstochastic events and neglect the dynamic interactionsamong these activities themselves, which motivates usto fill this gap.

To illustrate the importance of capturing the dynamicinteractions among advertisement clicks when studyingtheir conversion effects, let us consider a hypotheticalexample illustrated in Figure 1. Suppose consumer Asaw firm X’s display advertisement for its productwhen browsing a webpage, clicked on the ad, andwas linked to the product webpage at time t1. Later,she searched for firm X’s product in a search engineand clicked on the firm’s search advertisement thereat time t2. Shortly afterward, she made a purchaseat firm X’s website at time t3. In this case, how shallwe attribute this purchase and evaluate the respectiveconversion effects of the two advertisement clicks? Ifwe attribute the purchase solely to the search advertise-ment click, like how the conversion rate is computed,we ignore the fact that the search advertisement clickmight not have occurred without the initial click on thedisplay advertisement. In other words, the occurrenceof the display ad click at time t1 is likely to increase theprobability of the occurrence of the subsequent adver-tisement clicks, which eventually lead to a purchase.Without considering such an effect, we might under-value the first click on the display ad and overvaluethe next click on the search ad. Therefore, to properlyevaluate the conversion effects of different types ofadvertisement clicks, it is imperative to account forthe exciting effects between advertisement clicks, thatis, how the occurrence of an earlier advertisementclick affects the probability of occurrence of subse-quent advertisement clicks. Neglecting the excitingeffects between different types of advertisement clicks,the simple measurement of conversion rates mighteasily underestimate the conversion effects of thoseadvertisements that tend to catch consumers’ attentioninitially and trigger their subsequent advertisementclicks but are less likely to directly lead to a purchase,for instance, the display advertisements.

In addition to the exciting effects between differenttypes of advertisement clicks, neglecting the excitingeffects between the same type of advertisement clicks

may also lead to underestimation of their conversioneffects. Consider consumer B in Figure 1, who clickedon search advertisements three times before makinga purchase at time t′4. If we take the occurrence ofadvertisement clicks as given and consider only theiraccumulative effects on the probability of purchasing,like the typical conversion models, we may concludethat it takes the accumulative effects of three searchadvertisement clicks for consumer B to make the pur-chase decision, so each click contributes one third.Nevertheless, it is likely that the first click at t′1 stimu-lates the subsequent two clicks, all of which togetherlead to the purchase at time t′4. When we considersuch exciting effects, the (conditional) probability ofconsumer B making a purchase eventually given heclicked on a search advertisement at time t′1 clearlyneeds to be reevaluated.

This study aims to develop an innovative model-ing approach that captures the exciting effects amongadvertisement clicks to contribute to the attributionmodels for properly evaluating the effectiveness ofonline advertisements using individual-level onlineclickstream data. To properly characterize the dynamicsof consumers’ online behaviors, the model also needsto account for the following unique properties and pat-terns of online advertisement clickstream and purchasedata. First, different types of online advertisementshave their distinct natures and therefore differ greatlyin their probabilities of being clicked, their impacts onpurchase conversions, and their interactions with othertypes of advertisements as well. Therefore, unlike thetypical univariate approach in modeling the conver-sion effects of website visits, to study the conversioneffects of various types of online advertisements froma holistic perspective, the model needs to account forthe multivariate nature of nonpurchase activities.

Second, consumers vary from individual to individ-ual in terms of their online purchase and ad clickingbehaviors, which could be affected by their inherentpurchase intention, exposure to marketing communica-tion tools, or simply preference for one advertisingformat over another. Because most of these factorsare usually unobservable in online clickstream data,it is important to incorporate consumers’ individualheterogeneity in the model.

Third, online clickstream data often contain the pre-cise occurrence time of various activities. Although

Page 3: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1394 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

the time data are very informative about the under-lying dynamics of interest, most existing modelingapproaches have yet to adequately exploit such infor-mation. Prevalent approaches to address the time effectsusually involve aggregating data by an arbitrary fixedtime interval or considering the activity counts only, butdiscarding the actual time of occurrence. It is appealingto cast the model in a continuous-time framework toduly examine the time effects between advertisementclicks and purchases. Notice that the effects of a previ-ous ad click on later ones and purchases should decayover time. In other words, an ad click one month agoshould have less direct impact on a purchase at presentcompared to a click several hours ago. Moreover, someadvertisement formats may have more lasting effectsthan others, so the decaying effects may vary across dif-ferent advertisement formats. Therefore, incorporatingthe decaying effects of different types of advertisementclicks in the model is crucial in accurately evaluatingtheir conversion effects.

Furthermore, a close examination of the online adver-tisement click and purchase data set used for this studyreveals noticeable clustering patterns, that is, adver-tisement clicks and purchases tend to concentrate inshorter time spans and there are longer time intervalswithout any activity, which is also termed clumpy datain statistics literature (e.g., Zhang et al. 2013).1 If weare to model advertisement clicks and purchases as astochastic process, the commonly used Poisson processmodel will perform poorly, because its intensity at anytime is independent of its own history, and such amemoryless property implies no clustering at all (Coxand Isham 1980). For this reason, a more sophisticatedmodel with history-dependent intensity functions isespecially desirable.

In this paper, we develop a stochastic model foronline purchasing and advertisement clicking thatincorporates mutually exciting point processes withindividual heterogeneity in a Bayesian hierarchicalmodeling framework. The mutually exciting pointprocess is a multivariate stochastic process in whichdifferent types of advertisement clicks and purchasesare modeled as different types of random points incontinuous time. The occurrence of an earlier pointaffects the probability of occurrence of later pointsof all types so that the exciting effects among alladvertisement clicks are well captured. As a result, theintensities of the point process, which can be interpretedas the instant probabilities of point occurrence, dependon the previous history of the process. Moreover, the

1 Using the clumpiness metric (entropy value) proposed by Zhanget al. (2013), we find the median clumpiness of the individuals in ourdata sample is 0.50. In comparison, the median clumpiness would be0.17 if the click and purchase data were generated by memorylessPoisson processes.

exciting effects are modeled to be decaying over time ina natural way. The hierarchical structure of the modelallows each consumer to have her own propensity forclicking on various advertisements and purchasing sothat consumers’ individual processes are heterogeneous.

Our model offers a novel method to more preciselyevaluate the effectiveness of various formats of onlineadvertisements. In particular, the model manages tocapture the exciting effects among advertisement clicksso that advertisement clicks, instead of being determin-istic data as given, are also stochastic events dependenton the past occurrences. In this way, even for thoseadvertisements that have little direct effect on pur-chase conversion but may trigger subsequent clicks onother types of advertisements that eventually lead toconversion, our model can properly account for theircontributions. Compared with the benchmark modelthat ignores all the exciting effects among advertise-ment clicks, our proposed model outperforms it to aconsiderable degree in terms of model fit, which indi-cates that the mutually exciting model better capturesthe complex dynamics of online advertising responseand purchase processes.

Based on our model and its Bayesian estimationresults, we construct conversion probability to betterevaluate the conversion effects of different types ofonline advertisements. We find that the commonly usedmeasure of conversion rate is biased in favor of searchadvertisements by overemphasizing the “last click”effects and underestimates the effectiveness of displayadvertisements the most severely. We show that displayadvertisements have little direct effect on purchaseconversion, but are likely to stimulate visits throughother advertising channels. As a result, ignoring themutually exciting effects between different types ofadvertisement clicks undervalues the efficacy of displayadvertisements the most. Likewise, ignoring the self-exciting effects leads to significant underestimationof search advertisement’s conversion effects. A moreaccurate understanding of the effectiveness of variousonline advertising formats can help firms rebalancetheir marketing investment and optimize their portfolioof advertising spending.

Our model also better predicts individual consumers’online behavior based on their past behavioral data.Compared with the benchmark model that ignores allthe exciting effects, incorporating the exciting effectsamong all types of online advertisements improves themodel predictive power for consumers’ future ad clickand purchase patterns. Because our modeling approachallows us to predict both purchase and nonpurchaseactivities in the future, it thus furnishes a useful toolfor marketing managers in targeted advertising andcustomer relationship management.

In addition to the substantive contributions, thispaper also makes several methodological contributions.

Page 4: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1395

We model the dynamic interactions among onlineadvertisement clicks and their effects on purchaseconversion with a mutually exciting point process. Tothe best of our knowledge, we are the first to apply themutually exciting point process model in a marketing-or ecommerce-related context. We are also the first toincorporate individual random effects into the mutuallyexciting point process model in the applied econometricand statistic literature. This is the first study thatsuccessfully applies Bayesian inference using Markovchain Monte Carlo (MCMC) method to a mutuallyexciting point process model, which enables us to fit amore complex hierarchical model with random effectsin correlated stochastic processes. In evaluating theconversion effects for different online advertisementformats and predicting consumers’ future behaviors,we develop algorithms to simulate the point processes,which extend the thinning algorithm in Ogata (1981)to mutually exciting point processes with parametervalues sampled from posterior distributions.

The rest of this paper is organized as follows. In §2,we survey the related literature. We then provide anoverview of the data used for this study with summarystatistics in §3. In §4, we construct the model andexplore some of its theoretical properties. In §5, wediscuss the inference and present the estimation results,which will be used to evaluate the conversion effectsof different types of online advertisements and predictfuture consumer behaviors in §6. We also extend themodel to incorporate additional data in §6. We concludethis paper with discussions in §7.

2. Literature ReviewThis study is related to various streams of existing lit-erature on online advertising, consumer Web browsingbehaviors, and their effects on purchase conversion.Our modeling approach using the mutually excitingpoint process also relates to existing theoretical andapplied studies in statistics and probability. In addition,the rich literature on consumer behavior theory andcognitive psychology provides behavioral support forour model specifications. We next review the relevantliterature in these domains.

Our work is related to the literature on the dynamicsof online advertising exposure, website visit, web-page browsing, and purchase conversion. For example,Manchanda et al. (2006) study the effects of banneradvertising exposure on the probability of repeatedpurchase using a survival model. Moe and Fader (2004)propose a model of accumulative effects of websitevisits to investigate their effects on purchase conversion.Both studies consider the conversion effects of a singletype of activity and focus on the effects of the non-purchase activities on purchase conversion, whereaswe study the effects of various types of online adver-tisement clicks and consider the dynamic interactions

among nonpurchase activities as well. Montgomeryet al. (2004) consider the sequence of webpage viewswithin a single site-visit session. They develop a Markovmodel in which, given the occurrence of a webpageview, the type of the webpage being viewed is affectedby the type of the last webpage view. In contrast, weconsider multiple visits over a long period of timeand capture the actual time effect between differentactivities. To account for correlations among multi-variate activities, Park and Fader (2004) apply theSarmanov family of bivariate distributions to modelthe dependence of website visit durations across twodifferent websites. Danaher and Smith (2011) furtherdemonstrate that a more general class of copula modelscan be used to model multivariate distributions invarious marketing applications. As we discuss in moredetail in §7.1, our model based on the mutually excitingpoint process offers a new approach to induce correla-tion among all time durations between activities in aparsimonious way. Most recently, an emerging streamof research is dedicated to attribution modeling, whichdemonstrates that the simplistic approach of attributingconversion to the very last stop is erroneous (e.g., Liand Kannan 2014, Abhishek et al. 2013, Zantedeschiet al. 2013). Our paper enriches this increasingly vibrantstream of literature with a novel modeling framework.

In the area of statistics and probability, mutuallyexciting point processes were first proposed in Hawkes(1971a, b), where their theoretical properties are studied.Statistical models using Hawkes’ processes, includ-ing the simpler version of self-exciting processes, areapplied in seismology (e.g., Ogata 1998), sociology (e.g.,Mohler et al. 2011), and finance (e.g., Ait-Sahalia et al.2013, Bowsher 2007). These studies do not considerindividual heterogeneity, and the estimation is usuallyconducted using method of moments or maximumlikelihood estimation, whose asymptotic consistencyand efficiency is studied in Ogata (1978). Our paper isthus the first to incorporate random coefficients intothe mutually exciting point process model, cast it ina hierarchical framework, and obtain Bayesian infer-ence for it. Bijwaard et al. (2006) proposes a countingprocess model for interpurchase duration, which isclosely related to our model. A counting process isone way of representing a point process (e.g., Cox andIsham 1980). The model in Bijwaard et al. (2006) is anonhomogeneous Poisson process where the depen-dence on the purchase history is introduced throughcovariates. Our model is not a Poisson process wherethe dependence on history is parsimoniously modeledby making the intensity directly as a function of theprevious path of the point process itself. Bijwaard et al.(2006) also incorporates unobserved heterogeneity inthe counting process model and estimates it using theexpectation–maximization algorithm. Our Bayesianinference using the MCMC method not only provides

Page 5: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1396 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

an alternative and efficient way to estimate this typeof stochastic model, but it facilitates straightforwardsimulation and out-of-sample prediction as well.

2.1. Conceptual BackgroundA large volume of behavioral literature on consumerinformation processing and responses to advertisingprovides theories and evidence supporting our quantita-tive model formulation. First, a consumer’s prior clickson one format of online advertisement could increasethe probability that she clicks on the advertisements ineither the same or a different format. Studies on process-ing fluency show that prior exposures to advertisingcan enhance both perceptual and conceptual fluency(e.g., Jacoby and Dallas 1981, Shapiro 1999). Perceptualfluency refers to the ease with which consumers canidentify a target stimulus on subsequent encountersand involves the processing of physical features (suchas modality, shape, and color), whereas conceptualfluency refers to the ease with which the target comesto consumers’ minds subsequently and involves theprocessing of meanings (Lee and Labroo 2004). Enhance-ment in these two dimensions therefore implies theincrease of the likelihood of the consumer recognizingand clicking on the ads either in the same format withsimilar physical features or in different formats thatpertain to the same product information. Additionally,extant studies find that prior ad exposure increasesthe probability of consideration-set membership of theadvertised product (e.g., Lee 2002, Nedungadi 1990,Shapiro et al. 1997), which makes consumers morewilling to consult ads that they would otherwise haveignored for informational purposes (Engel et al. 1995).Such positive effects have been shown robust for bothstimulus-based and memory-based consideration-setformation (Lee 2002, Nedungadi 1990), suggesting theincrease in the probability of the consumer clicking onthe ads that either are physically presented or requireactive searches.

Furthermore, it is also well documented thatenhanced perceptual and conceptual fluency positivelyinfluence consumers’ affective responses (e.g., Anandand Sternthal 1991, Lee and Labroo 2004). Increasedliking of the product, on one hand, will positivelyinfluence the attention and reception consumers giveto marketing communications; on the other hand, itwill directly increase the probability of purchase (Engelet al. 1995). Together with the above literature on theconsideration-set inclusion because of prior exposure toadvertisements, we argue that a consumer’s successiveclicks on various formats of online advertisementsjointly contribute to the increase in the probability ofpurchase conversion.

The theories and evidence discussed above suggestthat prior exposures to advertising positively influencesubsequent clicks and purchases; meanwhile, such

effects are subject to decay over time. Cognitive psy-chology provides theories explaining why memorytraces fade with the passage of time (Anderson andMilson 1989). Evidence is well documented that theprobability of retrieval failure increases as a function oftime, in many cases quite rapidly (e.g., Brown 1958,Muter 1980). Meanwhile, various features of the mar-keting communications being processed (e.g., modalityof information, interrelations among components) caninfluence people’s physiologic reactions that affect reten-tion and retrieval (e.g., Janiszewski 1990, Rothschildand Hyun 1990), suggesting that the lasting effectscould vary with different advertising formats.

Given the dynamic interaction among different for-mats of advertising and purchase discussed above,we hypothesize that a model of purchase conversionthat fails to account for the interactions among varioustypes of advertising will lead to biased estimationof the conversion effects of advertising and inferiorpredictive capability.

3. Data OverviewWe obtained the data for this study from a majormanufacturer and vendor of consumer electronics (e.g.,computers and accessories) that sells most of its prod-ucts online through its own website.2 The firm recordsconsumers’ responses to its online advertisements invarious formats. Every time a consumer clicks on oneof the firm’s online advertisements and visits the firm’swebsite through it, the exact time of the click and thetype of the online advertisement being clicked arerecorded. Consumers are identified by user IDs thatare primarily based on the tracking cookies stored ontheir computers.3 The firm also provided the purchasedata (including the time of a purchase) associated withthese user IDs. By combining the advertisement clickand purchase data, we form a panel of individualswho have visited the firm’s website through advertise-ments at least once, which comprises the entire historyof clicking on different types of advertisements andpurchasing by each individual.

2 We are unable to reveal the identity of the firm because of anondisclosure agreement.3 The firm constructs the so-called generalized user IDs to identifyusers by linking the cookie IDs with other user identity information(such as user accounts and order IDs) whenever available. Althoughmitigated by this approach, the general limitations of cookie-baseddata still exist in our data set (Dreze and Zufryden 1998). Ideally, moreadvanced technology capable of identifying the same user acrossmultiple devices would be preferable for more precise estimationresults. Nevertheless, given that the technological reliability ofcookie-based tracking is robust, and general users are increasinglyreceptive to the use of tracking cookies (Specific Media 2011), cookiedata are commonly used in the literature studying consumer onlinebehavior (e.g., Bucklin and Sismeiro 2003, De et al. 2010, Manchandaet al. 2006).

Page 6: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1397

One unique aspect of our data is that, instead ofbeing limited to one particular type of advertisement,our data offer a holistic view covering most majoronline advertising formats, which allows us to studythe dynamic interactions among different types ofadvertisements. Because we are especially interested inthe two leading formats of online advertising, namely,search and display advertisements, we categorize theadvertisement clicks in our data set into three categories:search, display, and other. Search advertisements, alsocalled sponsored search or paid search advertisements,refer to the sponsored links displayed by search engineson their search result pages alongside the generalsearch results. Display advertisements, also calledbanner advertisements, refer to the digital graphicsthat are embedded in Web content pages and link tothe advertiser’s website. The “other” category includesall the remaining types of online advertisements exceptsearch and display, such as classified advertisements(i.e., textual links included in specialized online listingsor Web catalogs) and affiliate advertisements (i.e.,referral links provided by partners in affiliate networks).Notice that our data only contain visits to the firm’swebsite through advertising links, and we do not havedata on consumers’ direct visits (such as by typingthe URL of the firm’s website directly in the Webbrowser) or visits through organic search results (i.e.,general search results on search engine results pages).Therefore, we focus on the conversion effects of onlineadvertisements rather than the general website visits.We will discuss data availability and limitations inmore detail in §7.2.

For this study, we use a random sample of 12,000user IDs spanning a four-month period from April 1 toJuly 31, 2008. We use the first three months for estima-tion and leave the last month as the holdout samplefor out-of-sample validation. The data of the first threemonths contain 17,051 ad clicks and 457 purchases.Table 1 presents a detailed breakdown of differenttypes of ad clicks. There are 2,179 individuals whohave two or more ad clicks within the first threemonths, among whom 26.3% clicked on multiple typesof advertisements.

We first perform a simple calculation of the conver-sion rates for different online advertisements, whichare shown in Table 1. In calculating the conversionrates, we consider a certain ad click leads to a con-version if it is succeeded by a purchase of the same

Table 1 Data Description

Number of Percentage ofad clicks ad clicks (%) Conversion rate

Search 6,886 40.4 0.01990Display 3,456 20.3 0.00203Other 6,709 39.3 0.01774

individual within one day;4 we then divide the numberof the ad clicks that lead to conversion by the totalnumber of the ad clicks of the same type. Because ofthe nature of different types of advertisements, it is notsurprising that their conversion rates vary significantly.The conversion rates presented in Table 1 are consistentwith the general understanding in industry that searchadvertising leads all Internet advertising formats interms of conversion rate, whereas display advertisinghas much lower conversion rates.5 Nevertheless, as wasdiscussed earlier, the simple calculation of conversionrate attributes every purchase solely to the most recentad click preceding the purchase. Naturally, it would bebiased against those advertisements that are not likelyto lead to immediate purchase decisions (e.g., displayadvertisements).

To further obtain intuition about the interactingdynamics among different types of advertisementsand purchases before starting the modeling analysis,we summarize the strings of ad click and purchasesequences and perform some descriptive analysis.Because we are interested in how ad clicks couldinfluence each other and together lead to conversion,we focus on those individuals with multiple ad clicksand purchase as well. We draw from the original dataa random sample of 10,000 individuals who have atleast two ad clicks and a purchase from April throughJune, 2008. We then summarize the strings of ad clickand purchase sequences for all these individuals andobtain the counts for each unique sequence pattern.Table 2 presents the most frequent sequences.

A scrutiny of Table 2 reveals several noteworthydata patterns. First, sequences involving repeatingclicks on the same types of ads before making a pur-chase are very common, indicating the imperative toaccount for the interactions among the same types ofadvertisements. Second, there is also a considerableportion that involves one type of ad click succeeded byanother type of ad click, which indicates the necessityto capture the interactions among different types ofadvertisements as well. Third, display ads appear to bemore likely to excite the other two types of ads than theother way around. Note that the number of display adclicks is relatively low in absolute terms, as is shown inTable 1. When focusing on the sequences that containdisplay ad clicks, we can see that the sequences of“DSP” and “DOP” are more frequent than “SDP,” and“ODP” is absent from the top list. To explore further

4 As we can show, choosing a different time interval longer or shorterthan one day (e.g., several hours, a few days) delivers essentiallythe same outcomes with regard to conversion rate and conversionprobability, which will be discussed in §6.1.5 Industry reports show that search ads’ conversion rates varywith a median of 3.5% (MarketingSherpa 2012), whereas the aver-age conversion rates for display ads lie between 0.15% and 0.2%(MediaMind 2010).

Page 7: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1398 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

Table 2 Summary of Ad Click and Purchase Sequences

Rank Sequence Count Rank Sequence Count

1 OOP 11920 15 DOP 872 SSP 11781 16 SOOP 833 OOOP 886 17 SDP 794 SSSP 635 18 OOOOOOOOP 755 OOOOP 478 19 SSSSSSP 716 SOP 327 20 DDDP 637 OOOOOP 287 0 0 0 0 0 0 0 0 0

8 SSSSP 2749 DDP 19310 OOOOOOP 189 Count 1 Count 2 Ratio (%)11 OSP 166 Search 1,199 5,032 23.812 SSSSSP 145 Display 418 1,323 31.613 OOOOOOOP 119 Other 811 6,237 13.014 DSP 98

Notes. S, search ad click; D, display ad click; O, other ad click; P, purchase.Count 1 is the number of sequences that start with the particular type of adclick and also contain a different type of ad click. Count 2 is the number of allthe sequences that contain the particular type of ad click.

along this line, we parse all sequences and calculate thepercentage of the sequences that start with a particulartype of ad click and contain a different type of ad clickafterward among all the sequences that contain thisparticular type of ad click, as is shown in the lower-right panel of Table 2. The highest ratio for displayads thus suggests that ignoring the interactions amongdifferent types of ads could result in underestimatingthe conversion effect of display ads the most severely.

4. Model DevelopmentTo capture the interacting dynamics among differentonline advertising formats to properly evaluate theirconversion effects, we propose a model based onmutually exciting point processes. We also accountfor heterogeneity among individual consumers, whichcasts our model in a hierarchical framework. In thissection, we first provide a brief overview of mutuallyexciting point processes and then specify our proposedmodel in detail.

4.1. Mutually Exciting Point ProcessesA point process is a type of stochastic process thatmodels the occurrence of events as a series of randompoints in time or geographical space. For example, inthe context of this study, a click on an online adver-tisement or a purchase can be modeled as a pointoccurring along the time line. We can describe sucha point process by N4t5, which is an increasing non-negative integer-valued counting process such thatN4t25−N4t15 is the total number of points that occurredwithin the time interval 4t11 t27. Most point processescan be fully characterized by the conditional intensityfunction defined as follows (Daley and Vere-Jones 2003):

�4t �Ht5= limãt→0

Pr8N 4t +ãt5−N4t5 > 0 �Ht9

ãt1 (1)

where Ht is the history of the point process given therealization of the stochastic process up to time instant t,which includes all information and summary statisticsbefore t (e.g., all points occurred and their respectiveoccurrence times).6 The intensity measures the prob-ability of instantaneous point occurrence given theprevious realization. By the definition of Equation (1),given the event history Ht , the probability of a pointoccurring within 4t1 t +ãt7 is �4t � Ht5ãt. Note that�4t �Ht5 is always positive by definition.

Hawkes (1971a, b) first systematically studied a classof point processes, namely, the mutually exciting pointprocesses, in which past events affect the probabilityof future event occurrence, and different series ofevents interact with each other. A mutually excitingpoint process is a multivariate point process composedof multiple univariate point processes (or marginalprocesses), often denoted as N4t5= 6N14t51 0 0 0 1NK4t57,such that the conditional intensity function for eachmarginal process can be written as

�k4t �Ht5=�k+

K∑

j=1

∫ t

−�

gjk4t−u5dNj4u51 �k>00 (2)

Here gjk4t −u5 is the response function capturing theeffect of the past occurrence of a type j point at time uon the probability of a type k point occurring at time t(for u < t). The most common specification of theresponse function takes the form of exponential decaysuch that

gjk4�5= �jke−�jk�1 �jk > 01�jk > 00 (3)

As is indicated by Equation (2), the intensity for thetype k marginal process, �k4t �Ht5, is determined by theaccumulative effects of the past occurrence of points ofall types (not only the type k points, but also pointsof the other types), and meanwhile, such excitingeffects decay over time, as captured by Equation (3).In other words, in a mutually exciting point process,the intensity for each marginal process at any timeinstant depends on the entire history of all the marginalprocesses. For this reason, the intensity itself is actuallya random process, depending on the realization of thepoint process in the past.

It is worth noting that the commonly used Poissonprocess is a special point process such that the intensitydoes not depend on the history. The most commonPoisson process is homogeneous, which means theintensity is constant over the entire process; that is,�4t �Ht5≡ �. For a nonhomogeneous Poisson process,the intensity can be a deterministic function of the timebut still independent of the realization of the stochasticprocess.

6 Mathematically, Ht is a version of � -field generated by the randomprocess up to time t. Summary statistics such as how many pointsoccurred before t or the passage of time since the most recent pointare all probability events (sets) belonging to the �-field Ht .

Page 8: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1399

4.2. The Proposed ModelThe mutually exciting point process provides a flexibleframework that well suits the nature of the researchquestion of our interest. It allows us to model not onlythe effect of a particular ad click on future purchase,but also the dynamic interactions among ad clicksthemselves, and all these effects can be neatly cast intoa continuous-time framework to properly account forthe time effect. We therefore construct our model basedon mutually exciting point processes as follows.

For an individual consumer i (i = 11 0 0 0 1 I), we con-sider her interactions with the firm’s online marketingcommunication and her purchase actions as a mul-tivariate point process, N i4t5, which consists of Kmarginal processes, N i4t5= 6N i

14t51 0 0 0 1NiK4t57. Each of

her purchases as well as clicks on various online adver-tisements is viewed as a point occurring in one of theK marginal processes. The random variable N i

k4t5 is anonnegative integer counting the total number of typek points that occurred within the time interval 601 t7. Welet k =K stand for purchases and k = 11 0 0 0 1K− 1 standfor various types of ad clicks. For our data, we considerK = 4 so that N i

44t5 stands for purchases and N i14t51N

i24t5,

and N i34t5 stand for clicks on search, display, and other

advertisements, respectively. When individual i, forexample, clicks on search advertisements for the secondtime at time t0, then a type 1 point occurs, and N i

14t5jumps from 1 to 2 at t = t0.

The conditional intensity function (defined by Equa-tion (1)) for individual i’s type k process is modeled as

�ik4t �Hi

t5 = �ik exp4�kN

iK4t55

+

K−1∑

j=1

∫ t

0�jk exp4−�j4t − s55 dN i

j 4s5 (4)

= �ik exp4�kN

iK4t55

+

K−1∑

j=1

N ij 4t5∑

l=1

�jk exp4−�j4t − tj4i5

l 551 (5)

for k = 11 0 0 0 1K, where �ik > 0, �jk > 0, �j > 0, and t

j4i5

l

is the time instant when the lth point in individuali’s type j process occurs. Note that time t here iscontinuous and measures the exact time lapse sincethe start of observation (using day as unit, e.g., for atime lapse of 32.5 hours, t = 10354). Capable of dealingwith continuous time directly, our modeling approachavoids the assumption of arbitrary fixed time intervalsor the visit-by-visit analysis that merely considers visitcounts and ignores the time effect.

The first component of the intensity �ik specified in

Equation (4) is the baseline intensity, �ik. It represents

the general risk of the occurrence of a particular typeof event (i.e., an ad click or a purchase) for a particularindividual, which can be a result of the consumer’s

inherent purchase intention, intrinsic tendency to clickon certain types of online advertisements, and degreeof exposure to the firm’s Internet marketing com-munication. Apparently, the baseline intensity variesfrom individual to individual. We hence model suchheterogeneity among consumers by considering that�i = 6�i

11 0 0 0 1�iK7 follows a multivariate log-normal

distribution,

�i∼ log-MVNK4��1è�50 (6)

The multivariate log-normal distribution facilitates thelikely right-skewed distribution of �i

k (>0). In addition,the variance–covariance matrix è� allows for correlationbetween different types of baseline intensities; that is,for example, an individual having a higher tendencyto click on display advertisements may also have acorrelated tendency (higher or lower) to click on searchadvertisements.

In modeling the effects of ad clicks, we focus ontheir exciting effects on future purchases as well as onsubsequent clicks on advertisements. As is discussedin §2, behavioral studies on consumers’ processingof advertising show that prior interactions with mar-keting communications can increase both perceptualand conceptual fluency, which directly contributes tothe increase of the probability of consumers’ futureresponses to various types of advertisements. Further-more, increased processing fluency leads to positiveaffective evaluation and consideration-set membershipof the advertised brand, which not only increases thepurchase probability directly, but also makes consumersmore active to consult the focal firm’s advertisementsfor informational purposes. As a result, prior clicks ononline advertisements could increase the probabilityof not only purchase conversion, but also later clickson online ads in either the same or a different format.We hence model the effects of ad clicks in a formsimilar to Equation (3). We use �jk (j = 11 0 0 0 1K − 1and k = 11 0 0 0 1K) to measure the magnitude of increasein the intensity of type k process (i.e., ad clicks orpurchase) when a type j point (i.e., a type j ad click)occurs. Hence, �jK captures the direct effects of varioustypes of ad clicks on purchase conversion, and �jk

(k = 11 0 0 0 1K − 1) represents the exciting effects of priorad clicks on later ad clicks in various formats. Amongthem, �jj indicates the effect between the same typeof points and is therefore called the self-exciting effect;for j 6= k, �jk is the mutually exciting effects betweendifferent types of points.

As discussed in §2, the effects of prior ad clicksdecay over time, as memory traces fade and retrievalfailures occur with the passage of time. We thereforeuse �j to measure how fast such effects decay overtime. To keep our model parsimonious, we let �jk = �j

for all k = 11 0 0 0 1K, which implies that the decay rates

Page 9: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1400 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

of the exciting effects of a type j point on various typesof processes would be the same.7 Whereas a larger �jk

indicates a greater exciting effect instantaneously, asmaller �j means such exciting effect is more lasting.

The effects of purchases are different from the effectsof ad clicks in at least two aspects. First, compared toa single click on an advertisement, a past purchaseshould have much more lasting effects on purchasesand responses to advertising in the near future, espe-cially given the nature of the products in our data (i.e.,major personal electronics). With respect to the timeframe of our study (i.e., three months), it is reasonableto consider such effects constant over time. Second,past purchases may impact the likelihood of futurepurchases and the willingness to respond to advertisingin either a positive or negative way. A recent purchasemay reduce the purchase need in the near future andthus lower the purchase intention and the interest inrelevant ads; on the other hand, a pleasant purchaseexperience could eliminate purchase-related anxietyand build up brand trust, which would increase theprobability of repurchase or further browsing of adver-tising information. Therefore, it is appropriate not topredetermine the sign of the effects of purchases. Basedon these two considerations, we model the effects ofpurchases as a multiplicative term shifting the base-line intensity, exp4�kN

iK4t55, so that each past purchase

changes the baseline intensity of the type k process (i.e.,purchase or one type of ad click) by exp4�k5, where�k can be either positive or negative. A positive �k

means a purchase increases the probability of futureoccurrence of type k points, whereas a negative �k

indicates the opposite.As is discussed earlier, the intensity �i = 6�i

11 0 0 0 1 �iK7

defined in Equation (5) is a vector random process anddepends on the realization of the stochastic processN i4t5 itself. As a result, �i keeps changing over theentire process. Figure 2 illustrates how the intensityof different marginal processes changes over time fora certain realization of the point process. It is alsoworth noting that the intensity function specified inEquation (5) only indicates the probability of eventoccurrence, whereas the actual occurrence could also beaffected by many other unobservable factors, for exam-ple, unexpected incidents or impulse actions. In thissense, the model implicitly accounts for nonsystematicunobservables and idiosyncratic shocks.

7 The model can be easily revised into different versions by allowing�jk to take different values. In fact, we also estimated two alternativemodels: one allowing values of �jk’s to be different from eachother, and the other considering that �jk’s take the same valuefor k = 11 0 0 0 1K − 1, which is different from �jK . It is shown thatthe performance of our proposed model is superior to that ofboth alternative models: the Bayes factors of the proposed modelrelative to the two alternative models are exp41200245' 107 × 1052

and exp41900185' 309 × 1082, respectively.

Notice that Equation (4) implicitly assumes thatthe accumulative effects from the infinite past up totime t = 0, which is unobserved in the data, equalzero; that is, �i

k405−�ik = 0. In fact, the initial effect

should not affect the estimates as long as the responsefunction diminishes to zero at infinite and the studyperiod is long enough. Ogata (1978) theoretically showsthat the maximum likelihood estimates when omittingthe history from the infinite past are consistent andefficient, as long as the data observation period issufficiently long. Using simulated data, we can alsoshow that Bayesian estimation using left-censored datacan still recover true model parameters for the lengthof our observed period. The details of the simulationstudy are omitted because of page limitation and areavailable upon request.

Based on the intensity function specified in Equa-tion (5), the likelihood function for any realization ofall individuals’ point processes 8N i4t59Ii=1 can be writtenas (Daley and Vere-Jones 2003)

L =

I∏

i=1

K∏

k=1

{[N ik4T 5∏

l=1

�ik4t

k4i5l �Hi

tk4i5l

5

]

· exp(

∫ T

0�ik4t �Hi

t5 dt

)}

0 (7)

The likelihood function involving a stochastic integra-tion can be derived in a full closed form, which ispresented in detail in §A.1 of the appendix. It is worthemphasizing that unlike the typical conversion modelsin which advertising responses are treated only asexplanatory variables for purchases, our model treatsad clicks also as random events that are impactedby the history, and hence their probability densitiesdirectly enter the likelihood function, in the sameway as purchases. This fully multivariate modelingapproach avoids the structure of conditional (partial)likelihood, which often arbitrarily specifies “dependent”and “independent” variables, resulting in statisticallyinefficient estimates for an observational study.

To summarize, we constructed a mutually excitingpoint process model with individual random effects.Given the hierarchical nature of the model, we castit in the Bayesian hierarchical framework. The fullhierarchical model is described as follows:

N i4t5 � �1�1�1�i∼ �i4t �Hi

t51

�i� ��1è� ∼ log-MVNK4��1è�51

�jk ∼Gamma4a�1 b�51 �j ∼Gamma4a�1 b�51 � ∼MVNK4��1 è�51

�� ∼MVNK4���1 è��51 è� ∼ IW4S−11 �51 (8)

where � is a 4K − 15×K matrix whose 4j1 k5th elementis �jk, and � = 6�11 0 0 0 1�K−17 and � = 6�11 0 0 0 1�K7are both vectors. The parameters to be estimated are

Page 10: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1401

Figure 2 Illustration of the Intensity Functions

8�1�1�1 8�i91 ��1è�9. Notice that �, �, �, and 8�i9 playdistinct roles in the data-generating process, and themodel is therefore identified (Bowsher 2007).

4.3. Alternative and Benchmark ModelsOur modeling framework is general enough to incor-porate a class of nested models. We are particularlyinterested in a special case in which �jk = 0 for j 6= kand j1 k = 11 0 0 0 1K − 1. It essentially ignores the excit-ing effects among different types of ad clicks. A pastclick on advertisement still has impact on the prob-ability of future occurrence of purchases as well asad clicks of the same type, but it will not affect thefuture occurrence of ad clicks of different types. There-fore, in contrast with our proposed mutually excitingmodel, we call this special case the self-exciting model,because it only captures the self-exciting effects amongadvertisement clicks.

For model comparison purpose, we are interested indifferent benchmark models for purchase conversion.A nested benchmark model within our point processmodeling framework is letting �jk = �k = 0 for allj1 k = 11 0 0 0 1K − 1. In other words, this benchmark

model completely ignores the exciting effects amongall advertisement clicks. Ad clicks still have effects onpurchases, but the occurrence of ad clicks themselvesis not impacted by the history of the process (neitherpast ad clicks nor past purchases), and hence theirintensities are taken as given and constant over time.As a result, the processes for all types of advertisementclicks are homogeneous Poisson processes, and we thuscall this benchmark model the Poisson process model.

A commonly used model for purchase conversionis the binary response model that models the pur-chase conversion as a 0-or-1 variable with consumers’nonpurchase activities as explanatory variables. Weare thus interested in a benchmark model in whichthe probability of an individual consumer makinga purchase within a certain time interval (e.g., eachmonth) is a logistic regression on the counts of herclicks on different types of online ads within the sametime interval. We will refer to this benchmark model asthe logistic conversion model. Notice that this logisticconversion model cannot be directly compared withthe point process models using the model comparisoncriteria such as the deviance information criterion (DIC)

Page 11: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1402 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

or log-marginal likelihood, because what constitutethe random data within the likelihood function aredifferent. The occurrence of ad clicks and the timeintervals between them all enter the likelihood functionas in Equation (7) in the point process models, whereasthe likelihood function only accounts for the occurrenceof purchase conversions in the logistic conversionmodel. Therefore, we compare our mutually excitingmodel with this benchmark model using out-of-sampleprediction on purchase conversion in §6.2.

5. EstimationTo estimate the parameters in the model, we use theMCMC method for Bayesian inference. We apply theMetropolis–Hastings algorithm to sample the parame-ters. Section A.2 in the appendix presents the detailedsteps of the MCMC algorithm. For each model, we ranthe sampling chain for 50,000 iterations using the Rprogramming language on a Windows workstationcomputer and discarded the first 20,000 iterations toensure convergence.

5.1. Estimation ResultsBefore estimating the model using the real data, wefirst conduct a simulation study by estimating ourmodel using simulated data. Results show that ourmodel can correctly recover the true parameter values.The details of the simulation study are omitted becauseof page limitation and are available upon request. We

Table 3 Parameter Estimates for the Mutually Exciting Model

Search Display Other Purchase

(a) Exciting effectsSearch �11 �12 �13 �14

208617 (0.1765) 000860 (0.0214) 005381 (0.0562) 006167 (0.0633)Display �21 �22 �23 �24

001614 (0.0496) 107818 (0.2314) 002055 (0.0572) 000845 (0.0347)Other �31 �32 �33 �34

004647 (0.0654) 001270 (0.0367) 800526 (0.4117) 008384 (0.0867)Purchase �1 �2 �3 �4

−005664 (0.1228) −007556 (0.2348) −006235 (0.1229) 002787 (0.2160)Decay �1 �2 �3

3400188 (1.7426) 4608854 (4.9370) 5105114 (2.3241)

(b) Individual heterogeneityMean ��11 ��12 ��13 ��14

−503926 (0.0166) −601027 (0.0212) −508063 (0.0221) −907704 (0.0762)Covariance

Search è�111

004584 (0.0246)Display è�121 è�122

−001197 (0.0212) 005934 (0.0335)Other è�131 è�132 è�133

−004942 (0.0257) −003380 (0.0304) 100014 (0.0365)Purchase è�141 è�142 è�143 è�144

002256 (0.1228) −004157 (0.1665) 000762 (0.2440) 203914 (0.2575)

Note. Posterior means and posterior standard deviations (in parentheses) are reported.

then apply the real observational data for estimation.We estimate the mutually exciting model for the dataof the first three months. We report the posterior meansand posterior standard deviations for major parametersin Table 3. (The estimates for 12,000 different �i’s areomitted due to the page limit.)

The estimation results in Table 3(a) demonstrate sev-eral interesting findings regarding the effects of onlineadvertisement clicks. First of all, it is shown that thereexist significant exciting effects between the same typesof advertisement clicks as well as between differenttypes of advertisement clicks. Compared to the base-line intensities for the occurrence of ad clicks (i.e., �i

j ,j = 11213), whose expected values (exp8��1j9, j = 11213)range from exp8−60109' 000022 to exp8−50399' 000046,the values of �jk (j1 k = 11213) are greater by orders ofmagnitude. It implies that given the occurrence of aparticular type of ad click, the probability of ad clicksof the same type or different types occurring in the nearfuture is significantly increased. Therefore, the resultsunderscore the necessity and importance of accountingfor the dynamic interactions among advertisementclicks in studying their conversion effects.

Compared with the mutually exciting effects, self-exciting effects between the same type of advertisementclicks are more salient, as �jj (j = 11213) are greater than�jk (j 6= k and j1 k = 11213). This result is consistent withthe observed data pattern that it is more common for

Page 12: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1403

consumers to click on the same type of advertisementmultiple times.

When we compare the mutually exciting effectsbetween different types of advertisement clicks, inter-estingly, display advertisements tend to have greaterexciting effects on the other two types of advertisementclicks than the other way round. The posterior prob-ability of �21 being greater than �12 is 0092, and theposterior probability of �23 being greater than �32 is0087. This result implies that when there is a sequenceof clicks on different types of advertisements in a shorttime period, display ad clicks are more likely to occurat the beginning of the sequence than toward the end,because they are more likely to excite the other twotypes of ad clicks than to be excited by them.

Regarding the direct effects on purchase conver-sion, the values of �j4 (j = 11213) are much greatercompared with the baseline intensity for purchaseoccurrence (i.e., �i

4), whose expected value exp8��149 isabout exp8−90779' 0000006. It indicates that clickingon an advertisement and visiting the firm’s websiteincrease the probability of purchase directly, whichis consistent with the previous findings in literature.Although all three types of advertisement clicks havedirect conversion effects, display advertisement’s directconversion effect (�24) is much smaller, which partiallyexplains the low conversion rate of display adver-tisements and the general understanding of its lowconversion efficacy.

Past purchases are shown to negatively affect theprobability of future clicks on advertisements. Theparameters �j for j = 11213 take significantly negativevalues, whereas the effect on repeated purchases (i.e.,�4) is insignificant. It suggests that past purchases ingeneral suppress consumers’ purchase need from thisparticular firm and thus diminish their interest in thefirm’s online advertisements; although some consumersmight make repeated purchases, as the positive �4suggests, they tend to make the repeated purchasesdirectly, rather than through clicking advertising linksagain.

There are also interesting results regarding thevariance–covariance matrix for individual baselineintensities in Table 3(b). First, notice that the covari-ances between the individual baseline intensities forany two types of ad clicks (i.e., è�1211è�1311è�132) arenegative. In other words, a consumer having higherbaseline intensity for clicking search advertisements, forexample, is likely to have lower baseline intensity forclicking display advertisements. Such negative covari-ances imply that consumers are initially inclined torespond to one particular type of online advertisement,whereas clicking this particular type of advertisementmay increase the probability of clicking other types ofadvertisements subsequently. In addition, it is interest-ing to find that the individual baseline intensity for

Table 4 Model Comparison Results

Model DIC Log-marginal likelihood

Mutually exciting 192,002.56 −93,454.69Self-exciting 193,513.08 −94,219.22Poisson process 206,146.80 −99,748.95

clicking display advertisements is negatively correlatedwith the individual baseline intensity for purchases;that is, è�142 < 0. In other words, consumers whoare more likely to respond to display ads usuallyhave lower initial purchase intention, which partiallyexplains the lower conversion rate of display ads.

5.2. Model ComparisonWe next estimate the alternative self-exciting model andthe benchmark Poisson process model and comparetheir goodness of fit with the mutually exciting modelby computing the DIC and the log-marginal likelihoodfor the Bayes factor. In computing the log-marginallikelihood, we draw from the posterior distributionbased on the MCMC sampling chain using the methodproposed by Gelfand and Dey (1994). Table 4 showsthe results of the model comparison criteria for thethree models.

According to Table 4, the Bayes factor of the mutuallyexciting model relative to the self-exciting model isexp4−931454069+9412190325' 100×10332, and the Bayesfactor of the mutually exciting model relative to thePoisson process model is exp4−931454069+9917480955'

307 × 1021733. The mutually exciting model also has thelowest DIC value. Therefore, both the DIC and Bayesfactor indicate that the proposed mutually excitingmodel outperformed the other two models by a greatextent. Recall that the Poisson process model fails tocapture any exciting effect among advertisement clicksat all. Given the estimation results showing that sucheffects do exist, it is not surprising that such a modelperforms poorly in terms of model fit. In contrast, theself-exciting model captures the exciting effects amongthe same type of advertisement clicks, which accountfor a considerable portion of the dynamic interactionsamong all advertisement clicks. Consequently, the self-exciting model improves noticeably beyond the Poissonprocess model. Nevertheless, its performance is stillsubstantially inferior to that of the mutually excitingmodel because of its omission of the exciting effectsbetween different types of advertisement clicks.

6. Model ApplicationsThe existence of both mutually exciting and self-excitingeffects indicated by the estimation results suggests thenecessity of reassessing the effectiveness of differentonline advertising formats in a more proper approach.In this section, we apply our model and develop a

Page 13: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1404 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

measure to evaluate the conversion effects of differenttype of online advertisements. To derive the probabilityof purchase occurring after clicking on a certain typeof advertisement, we develop a simulation algorithmto simulate the mutually exciting processes specified inour model. The simulation approach also allows us toexplore individual’s future behavior, which we utilizefor out-of-sample validation and prediction purposes.We also demonstrate how to generalize our model toincorporate additional information such as marketingmix variables.

6.1. Conversion EffectBy considering the occurrence of advertisement clicksas stochastic events, our modeling approach enables usto more precisely measure different advertisements’conversion effects by capturing the dynamic inter-actions among advertisement clicks themselves. Inparticular, it enables us to explicitly examine the proba-bility of purchase occurring within a certain periodof time given a click on a particular type of adver-tisement initially, which subsumes the cases wherevarious subsequent advertisement clicks are triggeredafter the initial click and lead to the eventual purchaseconversion altogether.

Formally, we define the conversion probability (CP) asfollows. Suppose a representative consumer i clickedon a type k advertisement at time t0, and no clickoccurred in the history before t0.8 Then the conversionprobability for type k advertisement in time period tgiven the parameters for the processes 8�1�1�1�i9 canbe defined as

CPk4t3�i1�1�1�5

=Pr4N iK4t0 +t5−N i

K4t05>0 �N ik4t05−N i

k4t0−5=151 (9)

where k= 11 0 0 0 1K − 1. Note that N ik4t0−5 is defined

as the limit of the type k ad click count up to but notincluding time instant t0, i.e., N i

k4t0−5= limt↑t0N i

k4t5. ByEquation (9), the conversion probability CPk4t5 measuresthe probability of purchase conversion occurring withinthe time period t given a type k ad click occurredinitially at time t0. Note that CPk4t5 captures both thedirect and the indirect effects of a type k advertisementclick on purchase conversion, because the probabilitymeasure includes not only the cases in which a purchaseoccurs directly after the initial ad click (without anyother points in between), but also those cases in whichvarious advertisement clicks occur after the initial clickand before the purchase conversion. Therefore, as ameasure of the conversion effects of different types ofadvertisement clicks, the conversion probability defined

8 In reality, an “initial click” can be approximated as long as therewas no click in the recent history before t0.

in Equation (9) manages to account for the excitingeffects among advertisement clicks themselves.

Based on the Bayesian inference of our proposedmodel, we can define the average conversion probability(ACP) by taking the expectation over the posteriordistribution of the model parameters, p4�1�1�1��1è� �Data5, as follows:

ACPk4t5

=E6CPk4t3�i1�1�1�5 �Data7

=

CPk4t3�i1�1�1�5p4�i

���1è�5

·p4�1�1�1��1è� �Data5d�i d�d�d�d�� dè�0 (10)

Note that we are interested in the average conversionprobability of different types of advertisements for arepresentative consumer (i.e., a typical consumer) ratherthan any specific consumer in the data set. Therefore,in Equation (10), the conversion probability is averagedover the distribution of individual baseline intensities,p4�i � ��1è�5, as is specified in Equation (6), insteadof using the posterior distribution p4�i � Data5 for aspecific consumer i.

Given the complexity of the mutually exciting pointprocesses, the conversion probability cannot be explic-itly derived in a closed form. Instead, we use theMonte Carlo method to calculate such probabilities.For this purpose, we develop an algorithm to simulatethe mutually exciting point processes in our proposedmodel. This simulation algorithm is an extension ofthe thinning algorithm for self-exciting point processes(Ogata 1981) to mutually exciting point processes withposterior samples of the model parameters. The algo-rithm details are presented in §A.3 of the appendix.Here, we provide a brief overview of this simulationalgorithm. The basic idea of this algorithm is simi-lar to the typical acceptance–rejection Monte Carlomethod: we first simulate a homogeneous Poissonprocess with a high intensity and then drop someof the extra points probabilistically according to theactual conditional intensity function. More specifically,we first draw the model parameters from the MCMCposterior sample and draw the individual baselineintensity as well. We then find a constant intensitythat dominates the aggregate intensity function of themutually exciting point process. We can thus simulatethe next point of the homogeneous Poisson processwith this constant dominating intensity by generatingthe time interval from an exponential distribution. Next,we probabilistically reject this point according to theratio of the aggregate intensity of the mutually excitingpoint process to the constant intensity of the Poissonprocess. Finally, we assign a type to the generatedpoint using the intensities for different types of pointsas probability weights.

Applying the above algorithm to repeatedly sim-ulate the point processes in our model, we can

Page 14: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1405

Table 5 Average Conversion Probabilities (%) of DifferentAdvertisement Formats

Model Search Display Other

Conversion rate 1.990 0.203 1.774Mutually exciting 2.030 0.246 1.978Self-exciting 2.005 0.238 1.968Poisson process 1.818 0.234 1.703

approximate the average conversion probability inEquation (10) by

ACPk4t5

=

E6I8N iK4t0 +t5−N i

K4t05>09 �N ik4t05−N i

k4t0−5=17

·p4�i���1è�5

·p4�1�1�1��1è� �Data5d�i d�d�d�d�� dè�

'1R

R∑

r=1

I8Ni4r5K 4t0 +t5−N

i4r5K 4t05>091 (11)

where R is the total number of simulation rounds, N i4r54t5is the point process simulated in the rth round, andI8 · 9 is the indicator function such that I8N i4r5

K 4t0 + t5−N

i4r5K 4t05 > 09 equals 1 if there is at least one purchase

point within the time interval 4t01 t0 + t7 in the rthsimulated point process.

We use the approach described above to compute theaverage conversion probabilities for search, display, andother types of advertisements based on the Bayesianinference outcome of our proposed model. For eachk ∈ 8112139, we run the simulation for 1,000,000 timesto compute ACPk4t5 according to Equation (11).9 Wechoose the time interval t equal to one day so thatthe average conversion probabilities for each typeof advertisements are directly comparable to theirconversion rates as in §3. Table 5 presents the averageconversion probabilities of different advertisementformats computed based on our proposed mutuallyexciting model (the second row) in contrast to theirconversion rates (the first row).

As a common measure of the effectiveness of vari-ous online advertisement formats, conversion ratessimply attribute a purchase completely to the lastadvertisement click preceding it. As a result, for thoseadvertisement formats that tend to be used as thelast stop before a purchase action, such as searchadvertisements, their conversion effects can easily beamplified by such a measure. On the contrary, for thoseadvertisement formats that are more likely to attractconsumers’ initial attention but less likely to directlylead to immediate purchase decision, such as display

9 We sample 1,000,000 times with replacement from the 30,000posterior samples to complete this exercise.

advertisements, their contribution are largely ignored.The results presented in the first two rows of Table 5confirm such bias against display advertisements by themeasure of conversion rates. If we compare the ratio ofthe conversion rates of display advertisements versussearch advertisements (i.e., 00102) with the ratio of theiraverage conversion probabilities based on our proposedmutually exciting model (i.e., 00121), we find that therelative conversion effect of display advertisements isunderestimated by as much as 1507%. Such underesti-mation originates from the fact that although displayadvertisements have little direct effect on purchaseconversion (recall that display advertisements havethe lowest direct conversion effect, i.e., �24 is muchlower than �14 and �34 according to the estimationresults), they may stimulate subsequent clicks on othertypes of advertisements, which in turn leads to thepurchase conversion. In contrast, the proposed measureof average conversion probability properly capturessuch contribution from display advertisements. Noticethat the relative conversion rates often serve as animportant guide for marketing managers to determinetheir portfolios of online advertising spending and foradvertising providers to price their advertising vehicles.In this sense, our analysis results suggest that displayadvertisements might have long been undervalued inthe online advertising practice.

To further investigate how neglecting different typesof exciting effects among advertisement clicks wouldaffect the estimation of their conversion effects, we usethe same approach and compute the average conversionprobabilities for the self-exciting and Poisson processmodels based on their respective model inferenceoutcomes. The results are presented in the third andfourth rows of Table 5.

Comparing the conversion probabilities evaluatedbased on the self-exciting model (the third row ofTable 5) with those based on the mutually excitingmodel (the second row of Table 5), we can see that theconversion effect of display advertisements is underes-timated by 3.3% by the self-exciting model, whereas theconversion effects of search and other advertisementsare underestimated by 1.2% and 0.5%, respectively.Notice that in comparison with our proposed mutuallyexciting model, the nested self-exciting model cap-tures the exciting effects only among the same type ofadvertisement clicks, but ignores the exciting effectsbetween different types of advertisement clicks. Thisresult thus suggests that among all online advertisingformats we studied, display advertisements have themost salient effects in stimulating subsequent clickson advertisements of different types. If we ignore themutually exciting effects among different types ofadvertisements, display advertisements’ conversioneffects would be underestimated the most severely.

Page 15: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1406 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

If we further compare the conversion probabilitiesevaluated based on the Poisson process model (thefourth row of Table 5) with those based on the self-exciting model (the third row of Table 5), it is clearthat the Poisson process model underestimates theconversion effect of search advertisements more greatlythan display advertisements. Recall that comparedwith the self-exciting model, Poisson process modelfurther ignores the self-exciting effects among the sametype of advertisement clicks. This result thus indi-cates that search advertisements have more salientself-exciting effects; that is, a search advertisement clickis more likely to be succeeded by further clicks on thesame type of advertisements, which altogether leadto the purchase conversion. Therefore, ignoring suchself-exciting effects would underestimate the conver-sion effects of search advertisements more severelythan display advertisements. In conclusion, to obtainan unbiased assessment of different advertisements’conversion effects, it is important to account for themutually exciting effects as well as the self-excitingeffects among advertisement clicks.

6.2. Prediction and ValidationThe simulation algorithm developed to evaluate theconversion probabilities also enables us to predict eachindividual’s future behavior based on their historicaldata. It allows us to perform out-of-sample validation ofour proposed model and compare model performancesin terms of predicative power.

Recall that in our data set that spans a four-monthperiod from April through July, 2008, we use the dataof the first three months for model estimation andleave the fourth month’s data as a holdout sample. Toperform out-of-sample validation, we randomly selecta sample of 1,000 individuals out of all individualsused for estimation. For each individual, we predicttheir advertisement clicking and purchasing behaviorsfor the fourth month (31 days) based on their pastbehaviors in the previous three months (91 days) andthe Bayesian inference for model parameters obtainedduring the estimation step. The algorithm used tosimulate individuals’ future behaviors is similar tothe one developed in §6.1. The primary difference istwofold: the baseline intensities �i no longer reflect arepresentative consumer, but are now individual spe-cific and are drawn from the posterior distribution foreach specific individual obtained during the estimationstep; the initial effects at the beginning of the simulatedprocesses are the accumulated effects of the actual pastbehavior for each specific individual over the first threemonths. (See §A.3 in the appendix for more details.)

For each of the selected 1,000 individuals, we simulate10,000 point processes according to the proposed model.We then calculate the predicted average numbers ofpurchases and ad clicks in the fourth month over

Table 6 Average Numbers of Ad Clicks and Purchases per Customer inthe Fourth Month

Search Display Other Purchase

Actual data 0.12 0.070 0.14 0.013Model 0.16 0.074 0.13 0.013

prediction (0.065, 0.60) (0.027, 0.23) (0.045, 0.82) (0.0083, 0.056)

the 1,000 individuals by taking the median valuesfrom the 10,000 sets of simulation outcomes. We alsoconstruct the 95% interval of these numbers based onthe simulation outcomes. We contrast the predictednumbers and intervals with the actual data from theholdout sample. Table 6 presents the out-of-samplevalidation results, which show that the actual dataall fall into the 95% predicative intervals and arequite close to the predicted values, indicating thatthe proposed model adequately captures the complexdynamics underlying consumers’ online advertisementclicking and purchasing processes.

As out-of-sample prediction can provide statisti-cally corroborating evidence for the model comparisonresults in Table 4, we next compare the predicative per-formance across different models. For the self-excitingand the Poisson process models, we use the same simu-lation approach to forecast individual behaviors in thefourth month for the same predicative sample based onthe Bayesian estimation outcomes from the two models.For each of the three comparative models, we calculatethe predicted numbers of purchases and advertisementclicks of different types for each individual by aver-aging over the 10,000 simulated processes, and thenwe compute the sum of squared errors (SSE) betweenthe predicted numbers and the observed data fromthe holdout sample. Table 7 shows the average sum ofsquared errors over the 1,000 selected individuals forthe three models.

As we can see from Table 7, the out-of-sample per-formances confirm the model comparison results basedon the DIC and Bayes factors reported in Table 4.The proposed mutually exciting model has the lowestaverage sum of squared errors and thus performs thebest in terms of both out-of-sample predicative powerand within-sample model fit. In comparison, the nestedself-exciting model underperforms in predicative poweronly slightly thanks to the capture of a considerableportion of the exciting effects among advertisementclicks. Ignoring all exciting effects among advertise-ment clicks completely, the Poisson process model

Table 7 Model Comparison for Out-of-Sample Prediction

Model Mutually exciting Self-exciting Poisson process

Average sum of 1.514 1.545 1.705squared errors

Page 16: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1407

demonstrates the poorest model performance in all thethree criteria.

As is noted in §4.3, we are also interested in compar-ing the model performance with the logistic conversionmodel, a benchmark representing the commonly usedbinary response model for purchase conversion. Tocompare the out-of-sample prediction performance ofthis model with the mutually exciting model, we usethe logistic conversion model to predict the probabilityof purchase occurring in the fourth month for the 1,000individuals in the predictive sample randomly selectedabove. The numbers of each individual’s various typesof ad clicks in the fourth month are simulated usingPoisson distributions, whose parameters are drawnfrom the posterior distributions estimated by Bayesianinference given the previous three months’ data. Thesimulated ad click numbers are then plugged into thelogistic regression outcomes to calculate the predictedpurchase probability in the fourth month for eachindividual. Out-of-sample prediction performance iscompared using the SSE between the predicted pur-chase probability and the observed purchase occurrence.We find that the logistic conversion model has an SSEequal to 18.838, whereas the mutually exciting modelhas an SSE equal to 14.475. The substantial differencein prediction accuracy (over 30%) thus confirms thesuperior performance of our proposed model.

6.3. Incorporating Additional InformationMarketing researchers are often interested in gainingricher implications by incorporating marketing mixvariables (such as prices and promotions) and consumerdemographic information (such as age, gender, andincome level) into the model when these data areavailable. Our modeling framework is general enoughto facilitate such extension, as we demonstrate below.

To incorporate individual-specific and/or time-varying covariates into the model, we can allow theindividual baseline intensity �i to take a more gen-eral form, �i4t5, which is dependent on these newlyintroduced explanatory variables. Specifically, we canrewrite the conditional intensity function in Equa-tion (5) as

�ik4t �Hi

t5 = �ik4t5exp4�kN

iK4t55

+

K−1∑

j=1

N ij 4t5∑

l=1

�jk exp4−�j4t − tj4i5

l 551 (12)

wherelog�i4t5= �i +êXi4t51

�i ∼MVNK4�� +ìZi1è�50(13)

Here �i4t5= 6�i14t51 0 0 0 1�

iK4t57

′ and �i = 6�i11 0 0 0 1�

iK7

are both vectors; Xi4t5 are M -dimensional time-varyingcovariates (e.g., marketing mix variables); Zi are L-dimensional individual-specific covariates (e.g., con-sumer demographic information); and ê and ì are

K ×M and K ×L matrices of model parameters, respec-tively. We can estimate the extended model usingsimilar Bayesian estimation strategy as the originalmodel and applying additional Metropolis and Gibbssampling steps for ê and ì. The main challenge liesin the integral over the time-varying covariates incomputing the likelihood function. Complex as it mightbecome, the likelihood function can be derived in aclosed form in most cases (e.g., when Xi4t5 is a stepfunction over time, such as prices). In other cases,numerical integration or Monte Carlo methods shouldalways apply.

To illustrate how the above model extension can beimplemented in the context of our study, we acquireadditional data from the focal firm on its daily averagemarkup ratio to account for the firm’s marketing mixduring the study period. The daily average markupratio is calculated by averaging the ratio between thesales price and the cost over all orders placed every dayfrom April to June, 2008. The data exhibit stable patternand indicate no major change in the firm’s sales andpromotion strategy during the study period. We usethese data as the Xi4t5 variables in Equation (13) andintroduce four new parameters, ê = 6�11�21�31�47

′,into the original mutually exciting model. The likeli-hood function and the MCMC algorithm for estimationare detailed in §A.4 of the appendix. The estimationresults show that our original results are robust aftercontrolling the marketing mix. The estimates of allmajor parameters 8�1�1�9 from the extended modelare very close to those from the original model, and theposterior mean values of ê are unsurprisingly negative.Because of the page limit, we omit the detailed estima-tion results in this paper, which are available uponrequest. In addition, we also compute the fit statisticsof the extended model: the DIC is 192,050.06, and thelog-marginal likelihood is −94106308, which are bothinferior to those of the original mutually exciting modelas in Table 4. It thus indicates that incorporating thisadditional marketing mix component in the context ofour study does not improve the model performance.

7. Discussion and ConclusionIn this paper, we develop a Bayesian hierarchical modelthat incorporates the mutually exciting point processand individual heterogeneity to study the conversioneffects of different online advertising formats. Themutually exciting point process offers us a flexibleframework to model the dynamic and stochastic interac-tions among online consumers’ advertisement clickingand purchasing behaviors. To account for heterogeneityamong consumers, our model allows them to havedifferent propensities for ad clicking and purchas-ing using random effects for their baseline intensities.We successfully apply the MCMC method to obtain

Page 17: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1408 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

Bayesian inference for our model. We construct con-version probability based on our proposed mutuallyexciting model to properly evaluate the conversioneffects of various types of online advertisements. Tocompute the conversion probability and predict con-sumers’ future behaviors, we develop a simulationalgorithm by extending the existing algorithm to mutu-ally exciting point processes with posterior samplingof parameters. Using proprietary data from a majorvendor of consumer electronics, we demonstrate thatour proposed mutually exciting model has superiorgoodness of fit and leads to proper evaluation of con-version effects by successfully capturing the excitingeffects among advertisement clicks.

7.1. Related Modeling FrameworksEmpirical studies of conversion and attribution throughmultiple advertising channels typically require multi-variate models. As we discuss below, compared withother existing multivariate modeling frameworks, themutually exciting point process provides a flexiblecontinuous-time model to accommodate multivariateevent data with precise time-stamp records and accountfor the complete event history.

When the data are only recorded as counts withinfixed time periods, a Poisson count model with latentmultivariate linear or simultaneous equation modelsfor the unobserved Poisson parameters has been suc-cessfully applied in marketing (e.g., Dong et al. 2011).However, if the original data are disaggregate (e.g.,generated by a continuous-time process), temporalaggregation such as converting time-stamped inci-dences into daily or monthly counts has been known tocause biased estimation of the magnitude and durationof advertising effects (Leone 1995, Tellis and Franses2006). Indeed, when daily aggregation is applied to ourdata, it is obvious that the loss of ordering in clumpyad-click and purchase events within a day can causeerroneous estimation of the advertising effects. To testit, we aggregate our data into daily counts and test aversion of the multivariate Poisson count model, where(a) the Poisson parameter for number of purchases is alog-linear model of the ad clicks in the same periodand the lagged purchase counts, and (b) the Poissonparameters for the numbers of various types of adclicks are log-linear models of the lagged ad clicksand purchases. We compare out-of-sample predictionwith our model and find that its performance is sub-stantially inferior (the prediction is inferior to thatof the Poisson process model in Table 7). Therefore,instead of selecting a time interval for aggregation,the mutually exciting point process is a natural andneeded development in modeling multivariate eventdata in continuous time.

Another approach to modeling multivariate eventdata is to correlate interincidence durations with

Sarmanov class distributions (e.g., Park and Fader 2004)or, more generally, copulas (e.g., Danaher and Smith2011). A Sarmanov class model, which can be shownas a special case of copulas, does not easily extendto more than three dimensions (Danaher and Smith2011). The number of parameters in the Sarmanovclass distribution increases exponentially with themodel dimension, and it often involves considerablealgebraic derivation of the likelihood function. Forour research context, in which past purchases andclicks on the various types of ads can all influence theprobability of clicking or buying at the current time,correlating high-dimensional interincidence durationsby the Sarmanov distribution could be computation-ally impractical. Other copula models can be morecomputationally tractable. But these models still haveto assume that the current time hazard is correlatedwith a fixed number of concurrent and past durations,which amounts to the Markov property (Park andFader 2004). For our data, in which a random numberof events often occur in a clustered fashion, it is hardto justify selecting an arbitrary number of correlateddurations and assuming it is a Markov process. Incontrast, the mutually exciting point process does notassume Markov property by allowing dependence onall previous events and time intervals and letting theireffects decay over time, which is a more natural choicein our research context.

7.2. Data Availability and LimitationIn addition to the ad clicking data, other online activitydata may also be relevant in studying the conversioneffects of online advertisements, for example, users’direct visit data (i.e., directly visiting the firm’s websitewithout clicking any advertising link) or ad exposuredata (i.e., individual’s level of exposure to variousadvertising broadcasting including display ads, tele-vision ads, etc.). Thanks to the abundance of digitalfootprints, these types of data are becoming increas-ingly available to researchers (e.g., Zantedeschi et al.2013). Our modeling framework is general enough toincorporate these data once available. For example, wecan incorporate direct visits into our current model asan additional marginal process (i.e., the (K + 1)th pro-cess), which can be influenced by other processes andcontributes to purchase conversions as well. We canfurther extend the model specification in §6.3 to allowXi4t5 to include each individual’s time-varying ad expo-sure levels to capture their effects on the occurrence ofad clicks and purchases.

Neither direct visit data nor ad exposure data areavailable in our data set. We would thus like to discussto what extent our results are robust given such datalimitation. Consumers might visit the website directly anumber of times after clicking on an ad before makinga purchase, and these direct visits might contribute to

Page 18: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1409

the conversion as well. Without observing the directvisits, the expected conversion effects of all these unob-served events are implicitly incorporated in the modelparameters formulating the conditional intensity forpurchase occurrence as in Equation (4). If direct visitswere observed, their effects would be separated outfrom the current model parameters and explicitly rep-resented by the model parameters for the additional4K + 15th process. Therefore, in the two cases with andwithout direct visit data, the model parameters incor-porate different effects and hence should be interpreteddifferently. Nevertheless, the conversion probabilityas we estimate in this paper, which delivers the mainimplications on ad effectiveness, would remain robust,because, by its definition, conversion probability takesexpectation over all possible events (either explicitlymodeled or implicitly incorporated) happening betweenan initial ad click and the final purchase conversion.Likewise, without observing the ad exposure data, weincorporate the expected ad exposure level for eachindividual in the individual-specific baseline intensity.The actual ad exposure data would generally vary overtime if they were observed. As long as the temporalvariation of the exposure data is stationary withoutsystematic trends or disproportionately large shocks,the conversion probability will remain robust afteraveraging across all individuals over time. However,if there were dramatic or systematic variations in adexposure levels (e.g., caused by major changes in thefirm’s advertising strategy), then missing advertisingexposure data could bias the estimates. Nonetheless, toour knowledge, there was not any major change in thefirm’s marketing efforts during the period of our study,as is verified by the top marketing manager of the firmfurnishing the data. It thus sustains our confidence inthe robustness of our results.

7.3. ConclusionThis study provides valuable managerial implica-tions for marketing managers seeking optimal onlineadvertising strategies as well as Internet advertisingproviders. We underscore a new perspective in mea-suring the effects of online advertisement clicks onpurchase conversion. We suggest that to properly assessthe conversion effects of various types of online adver-tisements, it is inadequate to merely focus on the directeffects of advertisement clicks on purchase probabilitiesper se. Even though an advertisement click does notlead to immediate purchase, it may increase the prob-abilities of subsequent clicks through other formatsof advertisements, which in turn contribute to thefinal conversion. Such indirect contribution should notbe neglected in evaluating the conversion effects ofadvertisements, which calls for novel modeling meth-ods. Our proposed mutually exciting model and theassociated conversion probability provide marketing

managers and Internet advertising providers with aninnovative method readily applicable to the propermeasurement of the efficacy of online advertisementsactualizing this particular perspective.

The results from our analysis shed new light onthe understanding of the effectiveness of differenttypes of online advertisements. We show that displayadvertisements are likely to stimulate subsequent visitsthrough other online advertisement formats, suchas search advertisements, though they have a lowdirect effect on purchase conversions. Neglecting sucheffects and overemphasizing the “last click” effects, thecommonly used measure of conversion rate is biasedtoward search advertisements and underestimates therelative effectiveness of display advertisements the mostseverely. For decision makers who are to allocate anadvertising budget among various online advertisingformats, our results suggest display advertisementshave not been given their due share of appreciation,and a rebalance of the advertising spending portfoliocould optimize the return on investment. On the otherhand, a better understanding of the effectiveness ofdifferent online advertising formats can help onlineadvertising providers to reassess their pricing strategiesfor these online advertising vehicles.

In addition, our method furnishes a useful tool forInternet marketers to assess the future values of theirpotential customers and target their marketing efforts.We demonstrate the superior predictive power of ourmodel in forecasting consumers’ future advertisementclicking and purchasing behaviors. Beyond the typicalpredictive models for future purchase activities, ourmodeling approach also enables us to predict nonpur-chase activities at the same time. The ability to predictfuture responses to different online advertising formatsis especially important for online marketing managersto deliver targeted advertisements to potential cus-tomers in an effective manner.

This study also leads to interesting directions forfuture research. For instance, because we provide anapproach to more properly estimate the conversionprobabilities for various types of online ads, futurestudies that combine the cost information for differenttypes of online advertisements with our findings canhelp marketers design a more efficient budget alloca-tion scheme for online advertising. In addition, giventhat our data structure only involves user clicks onadvertisements, the exciting effects we study focuson the effects of prior ad clicks on the probabilityof later ad clicks occurring. As discussed previously,when additional user online behavior data becomeavailable (e.g., advertising exposures, direct websitevisits), our model can be easily adapted to incorporatesuch data to deliver richer and deeper implicationsabout more detailed processes of consumer respond-ing to online advertisements. Moreover, our general

Page 19: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1410 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

modeling framework can be applied to many othercontexts involving multiple interdependent activitieswith individual heterogeneity in continuous time.

AcknowledgmentsThe authors thank Eric Bradlow, the associate editor, and twoanonymous reviewers, whose insightful and constructivecomments greatly improved this paper. The authors alsothank Yu (Jeffrey) Hu, Szu-Chi Huang, Preston McAfee, CarlMela, Eric Overby, Sandra Slaughter, Frenkel ter Hofstede,and D. J. Wu for generous input. They thank the seminarparticipants at the Georgia Institute of Technology, JointStatistical Meetings (2011), the Atlanta–Athens Conference onResearch in Information Systems (2012), the Workshop onInformation Systems and Economics (2012), and the ChinaSummer Workshop on Information Management (2013) forhelpful feedback.

Appendix

A.1. Likelihood FunctionThe likelihood function for individual i can be written asfollows:

Li4�1�1�1�i�Datai5 =

K∏

k=1

{N ik4T 5∏

l=1

[

�ikexp4�kN

iK4t

k4i5l 55

+

K−1∑

j=1

N ij 4t

k4i5l 5∑

m=1

�jkexp4−�j4tk4i5l −t

j4i5m 55

]

·exp[

−�ik

N iK 4T 5∑

m=0

exp4�km54tK4i5m+1 −tK4i5m 5

K−1∑

j=1

N ij 4T 5∑

m=1

�jk

�j

41−exp4−�j4T −tj4i5m 555

]}

0

The likelihood function for all individuals is thus

L4�1�1�1� �Data5=

I∏

i=1

Li4�1�1�1�i�Datai51

where �= 8�i9Ii=1.

A.2. MCMC AlgorithmBelow are the details of the MCMC algorithm used in thisstudy.

Step 12 Sample �. We consider the prior distribution of �jk

following Gamma4a�1 b�5, where a� = b� = 1/2. We use theMetropolis–Hastings algorithm to sample �. The proposalfunction is a random-walk generation such that the nextdraw �∗ is drawn from a log-normal distribution, i.e., �∗

jk ∼

log-N4log4�jk51 �2jk5. We adjust the variance �2

jk adaptivelysuch that the acceptance rate of proposed �∗ is between 0.1and 0.4. The accepting probability for �∗ is given as

min{L4�∗1�1�1� �Data5

j1kGamma4�∗jk � a�1b�5·

j1k�∗jk

L4�1�1�1� �Data5∏

j1kGamma4�jk � a�1b�5·∏

j1k�jk

11}

1

where the last terms in the numerator and denominator cor-respond to the Jacobian in the log-normal proposal function.

Step 22 Sample �. We consider the prior distribution of �j

following Gamma4a�1 b�5, where a� = b� = 1/2. We use theMetropolis–Hastings algorithm to sample �. The proposalfunction is a random-walk generation such that the nextdraw �∗ is drawn from a log-normal distribution, i.e., �∗

j ∼

log-N4log4�j 51 �2j 5. We adjust the variance �2

j adaptively suchthat the acceptance rate of proposed �∗ is between 0.1 and0.4. The accepting probability for �∗ is given as

min{

L4�1�∗1�1� �Data5∏

j Gamma4�∗j � a�1 b�5 ·

j �∗j

L4�1�1�1� �Data5∏

j Gamma4�j � a�1 b�5 ·∏

j �j

11}

0

Step 32 Sample �. We consider the prior distribution of� following MVNK4��1 è�5, where �� = 0, and è� = 104IK(IK is the K ×K identity matrix). We use the Metropolis–Hastings algorithm to sample �. The proposal function isa random-walk generation such that the next draw �∗ isdrawn from a normal distribution, i.e., �∗

k ∼N4�k1 �2k 5. We

adjust the variance �2k adaptively such that the acceptance

rate of proposed �∗ is between 0.1 and 0.4. The acceptingprobability for �∗ is given as

min{

L4�1�1�∗1� �Data5MVNK4�∗ � ��1 è�5

L4�1�1�1� �Data5MVNK4� � ��1 è�511}

0

Step 42 Sample �i. We consider the prior distribution of each�i following a K-dimensional multivariate log-normal distri-bution log-MVNK4��1è�5. We use the Metropolis–Hastingsalgorithm to sample �i. The proposal function is a random-walk generation such that the next draw �i∗ is drawn froma log-normal distribution, i.e., �i∗

k ∼ log-N4log4�ik51 �

2k 5. We

adjust the variance �2k adaptively such that the average

acceptance rate of all proposed �i∗’s is between 0.1 and 0.4.The accepting probability for �i∗ is given as

min{

Li4�1�1�1�i∗ �Datai5·log-MVNK4�

i∗ ���1è�5·∏

k�i∗k

Li4�1�1�1�i �Datai5·log-MVNK4�

i ���1è�5·∏

k�ik

11}

0

Step 52 Sample ��. We consider the conjugate prior distri-bution for ��, which follows a K-dimensional multivariatenormal distribution MVNK4���1 �

5, where ��� = 0 and�

= 106IK . The next draw �∗� is drawn from a multivariate

normal distribution

�∗

� ∼MVNK4A1B51

where A = B′44∑I

i=1 log�i5′è−1� + �′

��è−1

��5′ and B = 4Iè−1

� +

è−1��5−1.

Step 62 Sample è�. We consider the conjugate prior distribu-tion for è�, which follows a K-dimensional inverse Wishartdistribution IW4S−11 �5, where S = IK and � = 1. The nextdraw è∗

� is drawn from an inverse Wishart distribution,

è∗

� ∼ IW

(( I∑

i=1

4log�i− ��54log�i

− ��5′+ S

)−1

1 I + �

)

0

A.3. Simulation AlgorithmsTo simulate the point processes according to our proposedmodel, we extend the thinning algorithm in Ogata (1981) tomutually exciting point processes with the parameter valuesdrawn from the posterior distribution obtained during the

Page 20: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and ConversionManagement Science 60(6), pp. 1392–1412, © 2014 INFORMS 1411

estimation process. The following is the detailed algorithmused in §6.1 to simulate the behavior of a representativeconsumer after clicking on a particular type of advertisementat time t0 = 0.

1. Draw �, �, �, ��, è� with replacement from the pos-terior samples obtained during estimation. Generate �i ∼

MVNK4��1è�5.2. Simulate a point process in 601 T 7 given �, �, �, �i, and

a realized type j0 point at t0 = 0 (j0 = 11 0 0 0 1K − 1).(a) Let t = 0, n= 0, nK = 0, m=

∑Kk=14�

ik +�j0k

)(b) Repeat until t > T

(i) Simulate s ∼ Exp4m5(ii) Set t = t + s(iii) If t < T , calculate

�k = �ik exp4�knK5+�j0k

exp4−�j0t5

+

n∑

l=11 jl 6=K

�jlkexp4−�jl

4t − tl55

and let �=∑K

k=1 �k. Generate U ∼ Unif40115.(A) If U ≤ �/m, n = n + 1, tn = t. Simulate jn ∼

multinomial411�1/�1 0 0 0 1�K/�5.—If jn =K, nK = nK + 1,

m= �−�ik exp4�k4nK − 155+�i

k exp4�knK50

—If jn 6=K,

m= �+�jnk0

(B) If U >�/m, m= �.(c) The simulation output is 8t11 0 0 0 1 tn9 and 8j11 0 0 0 1 jn9.3. Repeat R times of Steps 1 and 2.

The simulation algorithm used in §6.2 to predict an indi-vidual consumer’s behavior in the fourth month is similarto the above algorithm, with two differences: (1) Becausethe prediction is based on the activity history of a specificindividual consumer i, �i is now supposed to be sampledfrom the posterior �4�i � data5. Therefore, in Step 1, we draw�i with replacement from the posterior samples obtainedin the estimation step. (2) In Step 2, instead of assuminga certain type of point realized at time t0, now the initialeffect at t0 is the accumulative effect of the observed behaviorof individual i in the data of the first three months. Conse-quently, the intensity we use to simulate Poisson processbefore thinning is specific to individual i’s history, and thecalculation of m in Step 2(a) becomes

m=

K∑

k=1

{

�ik exp4�kN

iK4t055+

K−1∑

j=1

N ij 4t05∑

l=1

�jk exp4−�j4t0 − tj4i5l 55

}

0

Accordingly, �k in Step 2(b)(iii) also needs to be modified as

�k = �ik exp4�k4nK +N i

K4t0555+K−1∑

j=1

N ij 4t05∑

l=1

�jk exp4−�j4t − tj4i5l 55

+

n∑

l=11 jl 6=K

�jlkexp4−�jl

4t − tl550

A.4. Model ExtensionFor the extended model incorporating the additional market-ing mix data described in §6.3, the likelihood function forindividual i can be derived as

Li4�1�1�1�1�i�Datai5

=

K∏

k=1

{N ik4T 5∏

l=1

[

�ik exp8�kptk4i5l

9exp4�kNiK4t

k4i5l 55

+

K−1∑

j=1

N ij 4t

k4i5l 5∑

m=1

�jk exp4−�j4tk4i5l − t

j4i5m 55

]

· exp[

−�ik

N iK 4T 5∑

m=0

exp4�km5∫ t

K4i5m+1

tK4i5m

exp8�kpt9 dt

K−1∑

j=1

N ij 4T 5∑

m=1

�jk

�j

41 − exp4−�j4T − tj4i5m 555

]}

1

where 8pt9 is the daily average markup ratio. The likelihoodfunction for all individuals is thus

L4�1�1�1�1� �Data5=

I∏

i=1

Li4�1�1�1�1�i�Datai50

The MCMC algorithm for estimating the extended modelis similar to the one described in §A.2, where the likelihoodfunctions are replaced with Li and L. To sample �, we needto add an additional step between Steps 3 and 4 in §A.2, asis described below.

Step for sampling �. We consider the prior distribution of �following MVNK4��1 è�5, where �� = 0 and è� = 104IK (IKis the K ×K identity matrix). We use Metropolis–Hastingsalgorithm to sample �. The proposal function is a random-walk generation such that the next draw �∗ is drawn froma normal distribution, i.e., �∗

k ∼ N4�k1 �2k 5. We adjust the

variance �2k adaptively such that the acceptance rate of

proposed �∗ is between 0.1 and 0.4. The accepting probabilityfor �∗ is given as

min{

L4�1�1�1�∗1� �Data5MVNK4�∗ � ��1 è�5

L4�1�1�1�1� �Data5MVNK4� � ��1 è�511}

0

ReferencesAbhishek V, Fader PS, Hosanagar K (2013) Media exposure through

the funnel: A model of multi-stage attribution. Working paper,Carnegie Mellon University, Pittsburgh.

Ait-Sahalia Y, Cacho-Diaz J, Laeven RJA (2013) Modeling financialcontagion using mutually exciting jump processes. Workingpaper, Princeton University, Princeton, NJ.

Anand P, Sternthal B (1991) Perceptual fluency and affect withoutrecognition. Memory Cognition 19(3):293–300.

Anderson JR, Milson R (1989) Human memory: An adaptive perspec-tive. Psych. Rev. 96(4):703–719.

Bijwaard GE, Franses PH, Paap R (2006) Modeling purchases asrepeated events. J. Bus. Econom. Statist. 24(4):487–502.

Bowsher CG (2007) Modeling security market events in continu-ous time: Intensity based, multivariate point process models.J. Econometrics 141(2):876–912.

Brown J (1958) Some tests of the decay theory of immediate memory.Quart. J. Experiment. Psych. 10(1):12–21.

Page 21: Path to Purchase: A Mutually Exciting Point Process Model ...lxu6/papers/Xu-etal_2014_MS_PointProcess.pdf · Path to Purchase: A Mutually Exciting Point Process ... heterogeneity

Xu, Duan, and Whinston: Point Process Model for Advertising and Conversion1412 Management Science 60(6), pp. 1392–1412, © 2014 INFORMS

Bucklin RE, Sismeiro C (2003) A model of Web site browsing behaviorestimated on clickstream data. J. Marketing Res. 40(3):249–267.

Cox DR, Isham V (1980) Point Processes (Chapman and Hall,New York).

Daley DJ, Vere-Jones D (2003) An Introduction to the Theory of PointProcesses (Springer, New York).

Danaher PJ, Smith MS (2011) Modeling multivariate distributionsusing copulas: Applications in marketing. Marketing Sci. 30(1):4–21.

De P, Hu Y(J), Rahman MS (2010) Technology usage and online sales:An empirical study. Management Sci. 56(11):1930–1945.

Dong X, Chintagunta PK, Manchanda P (2011) A new multivariatecount data model to study multi-category physician prescriptionbehavior. Quant. Marketing Econom. 9(3):301–337.

Dreze X, Zufryden F (1998) Is internet advertising ready for primetime? J. Advertising Res. 38(3):7–18.

Engel JF, Blackwell RD, Miniard PW (1995) Consumer Behavior (TheDryden Press, Oak Brook, IL).

Gelfand AE, Dey DK (1994) Bayesian model choice: Asymptoticsand exact calculations. J. Royal Statist. Soc. Ser. B (Methodol.)56(3):501–514.

Hawkes AG (1971a) Point spectra of some mutually exciting pointprocesses. J. Royal Statist. Soc.: Ser. B 33(3):438–443.

Hawkes AG (1971b) Spectra of some self-exciting and mutuallyexciting point processes. Biometrika 58(1):83–90.

Interactive Advertising Bureau, PricewaterhouseCoopers (2012)IAB Internet advertising revenue report 2011 full-year results.Market research report, Interactive Advertising Bureau andPricewaterhouseCoopers, New York.

Jacoby LL, Dallas M (1981) On the relationship between autobio-graphical memory and perceptual learning. J. Experiment. Psych.:General 110(3):306–340.

Janiszewski C (1990) The influence of nonattended material on theprocessing of advertising claims. J. Marketing Res. 27(3):263–278.

Lee AY (2002) Effects of implicit memory on memory-based versusstimulus-based brand choice. J. Marketing Res. 39(4):440–454.

Lee AY, Labroo AA (2004) The effect of conceptual and perceptualfluency on brand evaluation. J. Marketing Res. 41(2):151–165.

Leone RP (1995) Generalizing what is known about temporal aggre-gation and advertising carryover. Marketing Sci. 14(3):141–150.

Li H(A), Kannan PK (2014) Attributing conversions in a multichannelonline marketing environment: An empirical model and a fieldexperiment. J. Marketing Res. 51(1):40–56.

Manchanda P, Dube J-P, Goh KY, Chintagunta PK (2006) The effectof banner advertising on internet purchasing. J. Marketing Res.43(1):98–108.

MarketingSherpa (2012) 2012 search marketing benchmark report—PPC edition. Market research report, MarketingSherpa, Jack-sonville, FL.

MediaMind (2010) MediaMind global benchmark report: Standardbanners—Non-standard results. Market research report, Media-Mind, New York.

Moe WW, Fader PS (2004) Dynamic conversion behavior ate-commerce sites. Management Sci. 50(3):326–335.

Mohler GO, Short MB, Brantingham PJ, Schoenberg FP, Tita GE(2011) Self-exciting point process modeling of crime. J. Amer.Statist. Assoc. 106(493):100–108.

Montgomery AL, Li S, Srinivasan K, Liechty JC (2004) Modelingonline browsing and path analysis using clickstream data.Marketing Sci. 23(4):579–595.

Muter P (1980) Very rapid forgetting. Memory Cognition 8(2):174–179.Nedungadi P (1990) Recall and consumer consideration sets: Influ-

encing choice without altering brand evaluations. J. ConsumerRes. 17(3):263–276.

Ogata Y (1978) The asymptotic behavior of maximum likelihoodestimators for stationary point processes. Ann. Inst. Statist. Math.30(2):243–261.

Ogata Y (1981) On Lewis’ simulation method for point processes.IEEE Trans. Inform. Theory 27(1):23–31.

Ogata Y (1998) Space-time point-process models for earthquakeoccurrences. Ann. Inst. Statist. Math. 50(2):379–402.

Park Y-H, Fader PS (2004) Modeling browsing behavior at multiplewebsites. Marketing Sci. 23(3):280–303.

Rothschild ML, Hyun YJ (1990) Predicting memory for compo-nents of TV commercials from EEG. J. Consumer Res. 16(4):472–478.

Shapiro S (1999) When an ad’s influence is beyond our consciouscontrol: Perceptual and conceptual fluency effects caused byincidental ad exposure. J. Consumer Res. 26(1):16–36.

Shapiro S, Macinnis DJ, Heckler SE (1997) The effects of incidentalad exposure on the formation of consideration sets. J. ConsumerRes. 24(1):94–104.

Specific Media (2011) Consumer privacy: Choice, clarity, control.Market research report, Specific Media, London.

Tellis GJ, Franses PH (2006) Optimal data interval for estimatingadvertising response. Marketing Sci. 25(3):217–229.

Zantedeschi D, Feit EM, Bradlow ET (2013) Measuring multi-channeladvertising effectiveness using consumer-level advertisingresponse data. Working paper, The Wharton School, Universityof Pennsylvania, Philadelphia.

Zhang Y, Bradlow ET, Small DS (2013) New measures of clumpinessfor incidence data. J. Appl. Statist. 40(11):2533–2548.