A Game-theoretic Machine Learning Approach for Revenue ... · A Game- heoretic Machine Learning Approach for Revenue Maximization in Sponsored Search Di He†, Wei Chen‡, Liwei

A Game- heoretic Machine Learning Approachfor Revenue Maximization in Sponsored Search

Di He†, Wei Chen‡, Liwei Wang† and Tie-Yan Liu‡

†Key Laboratory of Machine Perception, MOE, School of Electronics Engineeringand Computer Science, Peking University, Beijing, P.R.China

‡Microsoft Research Asia, Building 2, No. 5 Danling Street, Beijing, P.R.China†{hedi,wanglw}@cis.pku.edu.cn, ‡{wche,tyliu}@microsoft.com

AbstractSponsored search is an important monetizationchannel for search engines, in which an auctionmechanism is used to select the ads shown to usersand determine the prices charged from advertisers.There have been several pieces of work in the lit-erature that investigate how to design an auctionmechanism in order to optimize the revenue of thesearch engine. However, due to some unrealistic as-sumptions used, the practical values of these stud-ies are not very clear. In this paper, we propose anovel game-theoretic machine learning approach,which naturally combines machine learning andgame theory, and learns the auction mechanism us-ing a bilevel optimization framework. In particular,we first learn a Markov model from historical datato describe how advertisers change their bids in re-sponse to an auction mechanism, and then for anygiven auction mechanism, we use the learnt modelto predict its corresponding future bid sequences.Next we learn the auction mechanism through em-pirical revenue maximization on the predicted bidsequences. We show that the empirical revenue willconverge when the prediction period approaches in-finity, and a Genetic Programming algorithm caneffectively optimize this empirical revenue. Ourexperiments indicate that the proposed approachis able to produce a much more effective auctionmechanism than several baselines.

1 IntroductionSponsored search is an important means of Internet moneti-zation, and is the driving force of major search engines to-day. In sponsored search, keyword auction is used to de-termine the ranking and pricing of the ads, and thus affectsthe revenue of the search engine. The research on auctionmechanism design has attracted the attention of many re-searchers from the areas of artificial intelligence and elec-tronic commerce [Broder et al., 2007; Graepel et al., 2010;Edelman et al., 2005].

In keyword auctions, each advertiser is required to submit abid for his/her ad. When the keyword is issued by a web user,the search engine will rank the ads, show the top-ranked ones

to the user, and then charge the corresponding advertisers incertain conditions (e.g., if their ads are clicked). Generalizedsecond price (GSP) auction is a family of auction mechanismsthat has been popularly used by today’s search engines, whichranks ads according to the products of their bid prices andquality scores; and charges a clicked ad by the minimum bidprice to maintain its current rank position.

Given the critical role of the auction mechanism in spon-sored search, many people have studied the optimization ofits performance in terms of search engine revenue [Lahaieand Pennock, 2007; Garg et al., 2007; Zhu et al., 2009b;Radlinski et al., 2008; Zhu et al., 2009a]. These studiescan be categorized into two groups. The first group [Lahaieand Pennock, 2007; Garg et al., 2007] addresses the prob-lem from a game-theoretic perspective. Some works in thisgroup optimize the worse-case revenue in symmetric Nashequilibria; some others consider Bayesian optimal mecha-nism design with the knowledge of bidders’ value distribu-tion. They usually require strong (and somewhat unrealistic)assumptions on the game, such as the full information aboutthe values/bids of the bidders (or their distributions) and thefull rationality of the bidders. However, empirical evidencehas shown that these assumptions might not hold in prac-tice, especially considering that sponsored search auctionsare complex, fast-paced, and involve a very large number ofbidders[Pin and Key, 2011; Duong and Lahaie, 2011; Edel-man and Ostrovsky, 2007]. In such a real situation, advertis-ers will have limited information access and bounded ratio-nality, therefore it is inappropriate to perform classical game-theoretic analysis based on the aforementioned assumptions.The second group [Zhu et al., 2009b; Radlinski et al., 2008;Zhu et al., 2009a] tackles the task from a machine learningperspective, in particular, directly adopts conventional ma-chine learning methods to optimize the revenue on historicalbidding data. The underlying assumption is that the bid dis-tribution will not change over time thus the optimal auctionmechanism learnt from historical data can generalize to futuretest data. However, as we know, there is a so-called second-

order effect in sponsored search, i.e., many advertisers willadjust the bids according to their utilities, as the responses toa new auction mechanism. As a result, the “optimal” auctionmechanism directly learnt from historical bidding data willno longer be optimal after it is deployed since advertisers willchange their bids in response to it.

T

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence

206

To overcome the above drawbacks, we propose a novel ap-proach, which can naturally combine game theory and ma-chine learning using a bilevel optimization framework, so asto simultaneously avoid the strong assumptions and handlethe second-order effect. For ease of reference, we call theapproach a game-theoretic machine learning approach. Gen-erally speaking, in the approach, we first learn an advertiserbehavior model from historical data to describe how adver-tisers change their bids in response to a given auction mech-anism. We then predict advertiser’s bid changes using thismodel during the process of auction mechanism learning.

Specifically, when building the advertiser behavior model,we consider that the advertisers do not have a very goodknowledge about the detailed auction mechanism or thebids of other advertisers. Instead, they adjust their bidsmainly based on the previous bids of their own and the sig-nals provided by the search engine. We assume the time-homogeneous Markov property for this dependency, and thesignals mainly include the number of impressions, the num-ber of clicks, and the average cost per click (which are re-ferred to as Key Performance Index, or KPI for short). Welearn the parametric transition matrix of the Markov model bya maximum likelihood method. For the next step, the learntadvertiser behavior model is used to predict the future bid se-quences of the advertisers after a new auction mechanism isdeployed. Then we estimate the revenue of the search engineover the predicted mechanism-dependent bids in a finite timeperiod (based on the historical logs), and call this quantitythe empirical revenue under the advertiser behavior model.We prove that the empirical revenue of an arbitrary auctionmechanism will converge when the length of the sequenceapproaches infinity, and then apply Genetic Programming tosearch for the optimal auction mechanism in terms of this em-pirical revenue.

We have conducted preliminary experiments to test the pro-posed approach. The experiments show very promising re-sults: our learnt auction mechanism can outperform severalbaselines including the widely used classic GSP auction.

2 Game-theoretic Machine LearningApproach

In this section, we introduce our proposed approach for rev-enue maximization in sponsored search, which we call agame-theoretic machine learning approach1.

2.1 PreliminariesAccording to [Caragiannis et al., 2012; Graepel et al., 2010;Edelman et al., 2005; Varian, 2007], sponsored search auc-tions can be mathematically formulated as follows. Assumethere are m ads from m advertisers; each advertiser i has avalue per click vi on his/her ad; and each ad i is representedby a feature vector xi ∈ Rd.

1In the literature of online learning, there is a branch calledgame-theoretic learning or adaptive mechanism design [Pardoe et

al., 2006]. The main difference between our work and theirs is thatwe learn a mechanism based on historical data, while they optimizethe performance of the mechanism using an online scheme.

Before the auction starts, the search engine receives a bidvector b = (b1, · · · , bi, · · · , bm) from the advertisers wherethe i-th component bi is the bid of the i-th advertiser. Whenthe keyword is issued by a web user u, the search enginewill rank the ads according to the products of their qualityscores f(xi) and bid prices bi, and show the top-ranked adsto the user. As a common practice, the quality score f(xi)

is defined as a compound function f(xi) = g(�CTR(xi)),where �CTR(xi) predicts the click probability of ad i at thetop-1 position, and g(.) is a monotone function. In the liter-ature, different g(.) functions have been used. For example,g(t) = 1 was used by Yahoo! in early 2000s; later g(t) =t has been used by several search engines; and recentlyg(t) = tα has been considered [Lahaie and McAfee, 2011;Lahaie and Pennock, 2007].

If ad i is placed at position j and is clicked by the user, ad-vertiser i will be charged by the following amount of moneyaccording to the Generalized Second Price rule,

Payf,b(i) =f(xσf,b(j+1))bσf,b(j+1)

f(xi)

where σf,b(j) is the index of the ad ranked at position jaccording to f(x)b.

The corresponding utility function for advertiser i can bewritten as follows,

Utif,b(i) = [vi − Payf,b(i)] {user u clicks on ad i}

where {.} is the indicator function.The revenue of the search engine can be obtained as below,

r(b, f, u) =�

i

Payf,b(σf,b(i)) {user u clicks on the ad at position i} (1)

Considering that in GSP, both the ranking and pricing rulesare determined once the quality score f is given, we will alsorefer to f as the auction mechanism if there is no confusionin the context.

2.2 Game Theory vs. Machine LearningGiven an auction mechanism space F , one of the goals of thesearch engine is to design an auction mechanism f∗ ∈ F tomaximize its revenue. In the literature of game theory, therehave been some attempts along this direction. For example,in [Lahaie and Pennock, 2007] the worst-case revenue in thesymmetric Nash equilibria is maximized, and in [Garg andNarahari, 2009; Garg et al., 2007] the Bayesian optimal auc-tion mechanism design is investigated with the value distri-bution of the bidders as public knowledge. In these works,some ideal assumptions have been employed. For instance,one usually assumes that the values/bids (or their distribu-tions) of the advertisers as well as the auction mechanism ofthe search engine are publicly accessible; however, in reality,an advertiser can only see the information about his/her ownad (e.g., the bids and KPI associated with the ad in a certainperiod of time) but have no access to the information of otheradvertisers or the search engine. For another instance, it isusually assumed that advertisers are rational enough to takethe best responses so as to maximize their utilities; however,

207

in reality, advertisers have very different and diverse biddingbehaviors: some advertisers are very active while some othersseldom change their bids; some are highly capable of plac-ing near-optimal bids while some others are not. As a result,the practical values of the aforementioned game-theoretic at-tempts are not very clear.

In recent years, some machine learning researchers havetried to optimize the empirical revenue of an auction mech-anism based on historical bidding data, in order to avoid theuse of unrealistic game-theoretic assumptions. For example,in [Zhu et al., 2009b; 2009a],the authors propose to simul-taneously optimize the revenue and relevance of the auctionmechanism on historical bidding data. However, these worksactually introduce another kind of assumption, which is aboutthe i.i.d. distribution of bidding data. This is a very founda-tional assumption used by statistical machine learning, whichguarantees that the model learnt from the training set can gen-eralize to the future test set. Unfortunately, this assumptiondoes not hold either in the context of sponsored search. Asmentioned in the introduction, the so-called second-order ef-fect makes the optimal auction mechanism on the historicalbids no longer optimal since advertisers will change their bidsin response to the auction mechanism in the future.

2.3 Bilevel Optimization FrameworkWe propose a bilevel optimization framework to address theaforementioned problems with previous works. The frame-work contains two levels of optimization, which models theadvertiser’s responses in the inner level and optimizes the rev-enue of the search engine in the outer level. The frameworknaturally combines game theory and machine learning, andtherefore we call it a game-theoretic machine learning ap-proach.

The proposed framework can be intuitively depicted usingFigure 1, and can also be mathematically characterized as fol-lows.

maxf∈F

R(f, g, S) (2)

s.t.ming∈G

L(g, S) (3)

where S represents the historical data; G is the space of ad-vertiser behavior models; L(g, S) is the loss function to facil-itate the learning of the behavior model; F is the space of theauction mechanisms; and R(f, g, S) is the empirical revenuedefined based on the historical data, the advertiser behaviormodel, and the auction mechanism f .

As can be seen from the figure, two kinds of training dataare used: advertisers’ auction logs and web users’ query andclick-through logs. The former record the historical bids ofthe advertisers in a period of time; the latter contain the key-words issued by users and their click behaviors in the sameperiod. As indicated by the formulas, the inner level of ourframework is to learn a mechanism-dependent advertiser be-havior model from historical auction logs, which is used topredict advertisers’ bid changes in the future. Since the ad-vertiser behavior model is mechanism-dependent, we can usethe model to predict advertisers’ future bid sequences basedon the users’ query and click-through data with respect to dif-ferent auction mechanisms. Then the outer level is to learn the

Figure 1: Game-Theoretic Machine Learning Framework forAuction Design

optimal auction mechanism in terms of the empirical revenueon the predicted bid sequences.

Here we would like to emphasize that it is actually veryfundamental to model sponsored search using bilevel opti-mization (and we regard this as one of our contributions). Thebilevel optimization framework cannot only characterize ourproposed approach, but also contain many previous works asits special cases. For example, the framework can cover theworst-case optimal mechanism design [Lahaie and Pennock,2007] if the inner level optimization problem characterizes aset of equilibrium conditions on the bids.

To realize the proposed framework, first of all, we needto build a reasonable model space G to describe the advertis-ers’ bidding behaviors. Second, we need to ensure that therevenue computed on the predicted bids by using this modelcan converge when the prediction period approaches infinity.Only in this way, the subsequent revenue optimization pro-cess will make sense. We will make detailed introduction ofhow we deal with these difficulties in the next sections.

3 Markov Advertiser ModelIn this section, we show that a Markov process can be usedto model how advertisers change their bids. Please note thatone can also employ other (more complicated) models in ourproposed bilevel optimization framework.

Intuitively, if an advertiser finds that his/her KPI in the lastperiod is below expectation, it is very likely that he/she willincrease the bid price in hope to get a better KPI (e.g., rankedin higher positions or receive more clicks). On the otherhand, if an advertiser feels that the numbers of impressionsand clicks are satisfactory but the cost per click is higher thanexpected, he/she may lower down the bid price. The Markovproperty indicates that advertisers’ bid changes only dependon their bids in a fixed number of previous time periods. Forsimplicity here we only consider the time-homogenous first-order Markov property (i.e., the next bid is independent ofpervious bids if the current bid is given, and the transitionmatrix is time-invariant).

To be more specific, we denote the bid vector at time t asbt = (bt1, · · · , bti, · · · , btm), in which the i-th component btiis the bid of the i-th advertiser at time t. We assume the bidprices that advertisers can assign come from a finite space

208

B. This assumption is reasonable considering that there is aminimum bid unit in real sponsored search systems and thebid price is always bounded.

Suppose in the t-th time period, a stream of users U t issuethe keyword and observe a number of ads. For each user u ∈U t, if u clicks on an ad, the corresponding advertiser will paya certain amount of money to the search engine according tothe pricing rule of the auction. At the beginning of the (t+1)-th time period, advertiser i will receive a report about the KPIof his/her ad in the t-th period. We use kpiti to denote thisreport, which can be considered as the output of a functionKPIi(bt, U t, f). Based on kpiti, advertiser i may changehis/her bid price to bt+1

i in order to be better off. That is,

P (bt+1i |bt, · · · ,bt, U1, · · · , U t, f) = P (bt+1

i |bti, kpiti) (4)

Denote Mi,kpi as the transition matrix for advertiser i withkpi.

Mi,kpi =

· · · · · · · · ·· · · pi,kpi(j, k) · · ·· · · · · · · · ·

(5)

where the element pi,kpi(j, k) is the probability of advertiseri changing his/her bid price from j to k given kpi, i.e.,

pi,kpi(j, k) = P (bt+1i = k|bti = j, kpi) (6)

Considering that kpit = (kpit1, · · · , kpitm) is a function ofbt, U t and auction mechanism f , and given kpit, advertis-ers independently change their bids according to the Markovprocesses, the probability of advertisers’ bid vector changingfrom b to b� can be written as,

P (bt+1 = b�|bt = b, U t, f) =�

i

pi,KPIi(b,Ut,f)(bi, b�i) (7)

It is clear that bt can also be considered as a Markov processand the transition matrix QUt,f is a function of auction mech-anism f and user stream U t. We refer to the matrix QU,f asthe advertiser behavior model.

For each advertiser i we observe {(bti, kpiti, bt+1i )}, t =

1, · · · , T in the historical auction logs and the correspond-ing transition matrices Mi,kpi can be learnt from these ob-servations. In particular, we assume for any t, the probabil-ity of bt+1

i is proportional to a truncated Gaussian functionp(b) = e−(b−µ)2 , b ≥ 0, in which the value of µ depends on(bti, kpi

ti). We use a linear function to model this dependency:

for any advertiser i, denote zti = (bti, kpiti, 1), then we have

µ =< w, zti > (8)

where w can be learnt by maximizing the log likelihood ofthe observed data using gradient descent(which correspondsto the inner level optimization).

w∗ = argminw(< w, zti > −bt+1i )2 (9)

By using (7), the advertiser behavior model can be con-structed with the learnt Mi,kpi, and we can use the modelto predict the future bid prices given any auction mechanism.Then the search engine revenue can be computed and opti-mized on the predicted mechanism-dependent bid prices. De-tails of this process will be introduced in the next section.

4 Empirical Revenue MaximizationIn this section, we discuss how to optimize search engine rev-enue over the bids predicted by the Markov advertiser be-havior model. One may have noticed that with the Markovmodel, the bid prices of the advertisers will always changeaccording to the transition matrix. Then a natural question iswhether the empirical revenue on such dynamically changingbid prices will converge when the time period approaches in-finity. In the following subsections, we will first give a prooffor the convergence, and then propose a Genetic Program-ming method to optimize the empirical revenue.

4.1 Convergence AnalysisFirst, we formally define the empirical revenue for a givenperiod of time.

Definition 1. Given the bid vector btand user stream U t

at

each time t = 1, · · · , T , the empirical revenue for the search

engine with auction mechanism f is defined as follows,

R(b1, · · · ,bT, U1, · · · , UT , f) =1T

T�

t=1

�

u∈Ut

r(bt, f, u) (10)

Next we show that under certain conditions the empiricalrevenue will converge in probability. The proof has two com-ponents: first we prove the existence of the expectation ofthe empirical revenue (we call it expected revenue), and thenwe prove the variance of empirical revenue approaches zerowhen the time period approaches infinity.Lemma 1. If user stream U t

is stochastically i.i.d. sampled

and the expectation of the Markov transition matrix with re-

spect to user stream EUQU,f is positive for auction mecha-

nism f , limT→∞ ER(b1, · · · , bT , U1, · · · , UT , f) exists.

Proof. Considering that U t is independent of bt, the ex-pected revenue can be formulated as follows,

ER(b1, · · · ,bT, U1, · · · , UT , f)

=1T

T�

t=1

�

b∈{B}M[EU1,··· ,Ut−1P (bt = b)× E

�

u∈U

r(b, f, u)]

It is clear that by taking expectation over U1, · · · , U t−1, thebid vector b follows a Markov process with transition matrixQf = EUQU,f . Considering that EUQU,f is positive, theMarkov process will lead to its stationary distribution πQf ,therefore we have,

limT→∞

ER(b1, · · · ,bT, U1, · · · , UT , f)

=�

b∈{B}m[πQf (b)× EU

�

u∈U

r(b, f, u)]

where πQf (b) is the probability of the state b in the station-ary distribution.

Lemma 2. If the user stream U tis stochastically

i.i.d. sampled, and the expected Markov transition ma-

trix EUQU,f is positive for auction mechanism f , then

limT→∞ V arR(b1, · · · ,bT , U1, · · · , UT , f) = 0 .

209

The proof of Lemma 2 is similar to that of Lemma 1.Weomit it due to space restrictions. By jointly considering thetwo lemmas, it is easy to obtain the following theorem.Theorem 1. For ∀� > 0, there exists δ(�, T ) that converges

to zero as T approaches infinity, satisfying

P (|R(b1, · · · ,bT, U1, · · · , UT , f) (11)

− ER(b1, · · · ,bT, U1, · · · , UT , f)| > �) < δ(�, T ).

4.2 Optimization AlgorithmAccording to the theoretical guarantee given in the previoussubsection, the optimal auction mechanism f∗ ∈ F can belearnt based on the converged value of the empirical revenue.In this section, we introduce our proposed algorithm to ful-fill this task. The algorithm has three components, advertiserbehavior learning, empirical revenue simulation, and auctionmechanism learning. We call it a bilevel optimization algo-rithm, or BOA for short. For ease of reference, we summarizeour algorithm in Table 1.

Assume that we have advertisers’ auction logs and users’query and click-through logs in T time periods for training.The first step is to learn the parametric transition matricesM∗

i,kpi based on this training data and construct Q∗U,f . As

shown in Section 3, this step can be achieved by means ofmaximum likelihood estimation. Then given the learnt ad-vertiser behavior model, it is a natural idea to select the opti-mal auction mechanism according to the following expectedrevenue.

�

b∈{B}m[πQf (b)× EU

�

u∈U

r(b, f, u)]

Considering that the empirical revenue will converge to theexpected revenue, it suffices to optimize the empirical rev-enue on a sufficiently long period of time. To achieve thisgoal, we leverage the learnt advertiser behavior model to pre-dict the bids for another N time periods (please note that thisprediction is a part of our learning algorithm, and N can bemuch larger than T ). The prediction is performed in the fol-lowing manner. With an initial bid profile b1, for any time1 ≤ t ≤ N , we randomly sample a piece of user data U t

from the training set. Then we are able to rank the ads ac-cording to bt and auction mechanism f , check the clicks of U t

on the shown ads based on the historical users click-throughlogs, and compute kpiti for each advertiser i. Based on kpiti,we sample advertiser i’s next bid bt+1

i according to the learnttransition matrices M∗

i,kpi. Therefore the empirical revenueof mechanism f over these N time periods can be computed.As shown in Eqn(1), the empirical revenue is a complex func-tion of f due to the second-price formula. We therefore em-ploy the Genetic Programming method for its optimization.Genetic Programming is an evolutionary algorithm developedin the field of artificial intelligence, which can handle com-plex, non-linear functional relationships.

5 ExperimentsIn this section, we report our experimental results on the ef-fectiveness of our proposed approach. For simplicity andwithout loss of generality, we assume that the quality score

Table 1: The Bilevel Optimization AlgorithmBilevel Optimization Algorithm (BOA):Advertiser Behavior Learning (ABL):

Input : Advertisers’ bids bt, t = 1, · · · , T .Advertiser’s KPI reports kpit, t = 1, · · · , T .

1. Train the advertiser behavior model by maximizing thelog likelihood function in Equ (9).

Output : The parametric transition matrix M∗i,kpi

Empirical Revenue Simulation (ERS)

Input : The transition matrices M∗i,kpi, auction mechanism f .

User streams U t, t = 1, · · · , TAdvertisers’ initial bids b1, sampling length N

1. For each t, uniformly sampling U t from {U t, t = 1, · · · , T}2. At each time 1 ≤ t ≤ N , use U t, auction mechanism f ,

M∗i,kpi, and bt to predictbt+1.

3. Compute the empirical revenue over the predicted bid sequence.Output: The empirical revenue of mechanism f .Auction Mechanism Learning (AML)

Input : The transition matrices M∗i,kpi.

Auction mechanism space FUser streams U t, t = 1, · · · , T

1. Initialize K random auction mechanisms f1, · · · fK .2. For each auction mechanism f do

2.1. Compute each f ’s revenue according to the ERS step.2.2. Use f ’s empirical revenue as the fitness function.

3. New K mechanisms are created by applying genetic operators.Output: The best mechanism f∗.

has the mathematical form of f(x) = (�CTR(x))α. The taskof auction mechanism learning then reduces to finding thebest parameter α. This setting has also been used in someother works, such as [Lahaie and Pennock, 2007].

5.1 Experimental SettingsSince it is impractical to get the online responses of the ad-vertisers and web users to any learnt auction mechanism, wechoose to use a simulation-based experiment. Simulation-based experiments have been widely used in the field of on-line advertising for algorithm evaluation [Li et al., 2010;Jordan and Wellman, 2010]. Generally speaking, we first col-lect a dataset containing the queries and clicks of web usersas well as the bids of advertisers in a period of time. Con-sidering that the clicks and bids in the dataset were generatedwhen a previous auction mechanism was used, we could notuse them to evaluate new auction mechanisms due to the sec-ond order effect. Therefore, we remove all such informationin the dataset and simulate advertisers’ bid changes and users’clicks with respect to an underlying auction mechanism usingsome state-of-the-art behavior models.

Specifically, we use a log data of 300 days obtained froma commercial search engine. We randomly sample ten key-words from the data that contains 1053 advertisers’ bid his-tory, and collect all the information about the advertisers andusers that have bidden or queried these keywords in the logdata. We use the first 100 days to estimate some basic pa-rameters in our experiments (e.g., the CTR and the valua-tion of each ad) and to construct the training data with thesimulation-based method by assuming the auction mecha-nism to be standard GSP (α = 1 in the quality score). Wethen use the remaining 200 days as the test set to compare

210

different mechanisms, and the simulated bids on the test setwill be different when different mechanisms are evaluated.

We follow the method described in [Richardson et al.,2007] to estimate �CTR(xi). We also compute a position dis-count factor βj by aggregating all the click information. Wethen simulate users’ click behaviors by assuming ad i to havea probability of �CTR(xi)βj to be clicked if it is ranked at po-sition j. When estimating the valuation of an ad, we simplyuse its largest bid in the original dataset.2 Then we use adver-tisers’ bids in the first day as their initial bids and simulate thebid changes using a mixture model, which is based on threeadvertiser models proposed in the literature.

1) Best Response Model(BRM) [Edelman et al., 2005]. Itassumes that each advertiser knows exactly the bids of allhis/her competitors in any round of auction. On this basis,he/she will take the best-response bid strategy.

2) Analytical model(AM) [Pin and Key, 2011]. The modeldoes not assume full information about the bids of advertis-ers, instead assumes the distribution of the number of adver-tisers and the distribution of the bid prices of advertisers to beavailable as public knowledge. On this basis, each advertisertakes the strategy that maximizes his/her expected utility.

3) Stable Behavior Model(SBM). As discussed in the in-troduction, many advertisers rarely change their bids. Thismodel assumes that advertisers are lazy and never changetheir bid prices even if the auction mechanism is changed.

In the mixture model, we uniformly sample the coefficientsp1, p2, p3 from a multinomial distribution, and then select p1fraction of advertisers to behave according to BRM, p2 frac-tion of advertisers to behave according to AM, and the rest tobehave according to SBM.

We implement three baselines to compare with our pro-posed approach.

1) The GSP model: the classical GSP auction mechanismwith α = 1 in the quality score f .

2) The Worst-Case Analysis (WCA) model: a state-of-the-art optimal mechanism design method [Lahaie and Pennock,2007]which studies the revenue in symmetric Nash equilibriaand selects the auction mechanism by worst case analysis.

3) The Directly-Learnt Auction (DLA) model: the optimalauction mechanism is directly learnt based on historical logdata without considering the second order effect.

When estimating the Markov transition matrix, we runthe gradient decent algorithm for 500 iterations. Our pro-posed BOA algorithm leverages this Markov model and com-putes the empirical revenue with N = 1000. When us-ing the Genetic Programming algorithm for empirical rev-enue optimization, we set the number of individuals in eachgeneration as 10, the number of generations as 50, andcrossover/mutation/reproduction rates as 70%/20%/10% re-spectively. To make our experiments robust, we sample thecoefficients in the mixture model for 100 times, and reportthe average performance in the experimental evaluation.

2Please note that we take this approach for sake of simplicity.One can choose more complicated methods [Pin and Key, 2011] toestimate the values.

Figure 2: Revenue Comparisons

5.2 Experimental ResultsFigure 2 shows the performances of the auction mechanismlearnt by our proposed BOA approach as well as the baselinemechanisms. The x-axis corresponds to the length of simu-lated test data, whose unit is day; and the y-axis correspondsto the average revenue, which is normalized to [0,10] to avoidthe disclosure of sensitive business information.

We have the following observations from the figure:1) When the size of the test data grows, the average perfor-

mances of different methods become stable and the compar-isons between their performances become clear.

2) The performances of BOA and WCA are better thanGSP (with relative improvements of 8.9% and 2.2% respec-tively), which indicates that both our proposed approach andthe game-theoretic approach for revenue optimization are ef-fective. Furthermore, BOA is statistically significantly betterthan WCA and GSP with a p-value=0.05, while WCA doesnot pass the significance test against GSP. This implies thatour approach is more effective than previous approaches.

3) The performance of DLA is worse than GSP (with a rel-ative decrease of 4.8%), and the result is statistically signifi-cant with a p-value=0.05. This demonstrates the huge impactof the second-order effect on the experimental results. Due tothe strong response from the advertisers to the auction mech-anism, simply adopting classical machine learning methodscannot lead to an effective auction mechanism.

From the experiments, we can clearly see the effectivenessof our proposed algorithm and verify our theoretical analysis.

6 Conclusion and Future WorkIn this paper, we have proposed a game-theoretic machine

learning approach to deal with the revenue optimization insponsored search auctions. Specifically, we have proposed aMarkov model to describe how advertisers change their bids,and then use the model to learn the auction mechanism thatoptimizes search engine’s revenue on the predicted bids. Theexperimental results demonstrate the effectiveness of our pro-posal. As for the future work, we plan to consider other fac-tors in our learning process, e.g.,the reserved price. We alsoplan to investigate on more comprehensive advertiser behav-ior models.

211

AcknowledgmentsThis work is supported by NSFC(61222307, 61075003) andNCET-12-0015, and the work was done when the first authorwas visiting Microsoft Research Asia.

References[Broder et al., 2007] A. Broder, M. Fontoura, V. Josifovski,

and L. Riedel. A semantic approach to contextual advertis-ing. In Proceedings of the 30th annual international ACM

SIGIR conference on Research and development in infor-

mation retrieval, pages 559–566. ACM, 2007.[Caragiannis et al., 2012] I. Caragiannis, C. Kaklamanis,

P. Kanellopoulos, M. Kyropoulou, B. Lucier, R.P.Leme, and E. Tardos. On the efficiency of equilib-ria in generalized second price auctions. Arxiv preprint

arXiv:1201.6429, 2012.[Duong and Lahaie, 2011] Q. Duong and S. Lahaie. Discrete

choice models of bidder behavior in sponsored search. In-

ternet and Network Economics, pages 134–145, 2011.[Edelman and Ostrovsky, 2007] B. Edelman and M. Ostro-

vsky. Strategic bidder behavior in sponsored search auc-tions. Decision support systems, 43(1):192–198, 2007.

[Edelman et al., 2005] B. Edelman, M. Ostrovsky, andM. Schwarz. Internet advertising and the generalized sec-ond price auction: Selling billions of dollars worth of key-words. Technical report, National Bureau of EconomicResearch, 2005.

[Garg and Narahari, 2009] D. Garg and Y. Narahari. An op-timal mechanism for sponsored search auctions on the weband comparison with other mechanisms. Automation Sci-

ence and Engineering, IEEE Transactions on, 6(4):641–657, 2009.

[Garg et al., 2007] D. Garg, Y. Narahari, and S.S. Reddy.Design of an optimal auction for sponsored search auc-tion. In E-Commerce Technology and the 4th IEEE

International Conference on Enterprise Computing, E-

Commerce, and E-Services, 2007. CEC/EEE 2007. The

9th IEEE International Conference on, pages 439–442.IEEE, 2007.

[Graepel et al., 2010] T. Graepel, J.Q. Candela, T. Borchert,and R. Herbrich. Web-Scale Bayesian Click-Through RatePrediction for Sponsored Search Advertising in MicrosoftsBing Search Engine. In ICML 2010. Citeseer, 2010.

[Jordan and Wellman, 2010] P.R. Jordan and M.P. Wellman.Designing an ad auctions game for the trading agent com-petition. Agent-Mediated Electronic Commerce. Design-

ing Trading Strategies and Mechanisms for Electronic

Markets, pages 147–162, 2010.[Lahaie and McAfee, 2011] S. Lahaie and R. McAfee. Effi-

cient ranking in sponsored search. Internet and Network

Economics, pages 254–265, 2011.[Lahaie and Pennock, 2007] S. Lahaie and D.M. Pennock.

Revenue analysis of a family of ranking rules for keywordauctions. In Proceedings of the 8th ACM Conference on

Electronic Commerce, pages 50–56. ACM, 2007.

[Li et al., 2010] W. Li, X. Wang, R. Zhang, Y. Cui, J. Mao,and R. Jin. Exploitation and exploration in a perfor-mance based contextual advertising system. In Proceed-

ings of the 16th ACM SIGKDD international conference

on Knowledge discovery and data mining, pages 27–36.ACM, 2010.

[Pardoe et al., 2006] David Pardoe, Peter Stone, MaytalSaar-Tsechansky, and Kerem Tomak. Adaptive mecha-nism design: a metalearning approach. In Proceedings of

the 8th international conference on Electronic commerce:

The new e-commerce: innovations for conquering current

barriers, obstacles and limitations to conducting success-

ful business on the internet, pages 92–102. ACM, 2006.[Pin and Key, 2011] F. Pin and P. Key. Stochastic variabil-

ity in sponsored search auctions: observations and models.In Proceedings of the 12th ACM conference on Electronic

commerce, pages 61–70, 2011.[Radlinski et al., 2008] F. Radlinski, A. Broder, P. Ciccolo,

E. Gabrilovich, V. Josifovski, and L. Riedel. Optimizingrelevance and revenue in ad search: a query substitutionapproach. In SIGIR 2008, pages 403–410. ACM, 2008.

[Richardson et al., 2007] M. Richardson, E. Dominowska,and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th inter-

national conference on World Wide Web, pages 521–530.ACM, 2007.

[Varian, 2007] H.R. Varian. Position auctions. Interna-

tional Journal of Industrial Organization, 25(6):1163–1178, 2007.

[Zhu et al., 2009a] Y. Zhu, G. Wang, J. Yang, D. Wang,J. Yan, and Z. Chen. Revenue optimization with relevanceconstraint in sponsored search. In Proceedings of the Third

International Workshop on Data Mining and Audience In-

telligence for Advertising, pages 55–60. ACM, 2009.[Zhu et al., 2009b] Y. Zhu, G. Wang, J. Yang, D. Wang,

J. Yan, J. Hu, and Z. Chen. Optimizing search engine rev-enue in sponsored search. In SIGIR 2009, pages 588–595.ACM, 2009.

212

Documents

A Game-theoretic Machine Learning Approach for Revenue ... · A Game- heoretic Machine Learning Approach for Revenue Maximization in Sponsored Search Di He†, Wei Chen‡, Liwei