Recent advances in computational advertising: design and analysis of ad retrieval systems
Evgeniy Gabrilovich
gabr@yahoo-inc.com
What is “Computational Advertising”?
• A new scientific sub-discipline that provides the foundation for building online ad retrieval platforms
– To wit: given a certain user in a certain context, find the most suitable ad
• At the intersection of
– Large scale text analysis
– Information retrieval
– Statistical modeling and machine learning
– Optimization
– Microeconomics
Technologies described might or might not be in actual use at Yahoo!
© Yahoo! Research 2010
Computational Advertising at Yahoo! Research
Online advertising spending
Textual advertising
1. Ads driven by search keywords – Sponsored Search (a.k.a. “keyword driven ads”, “paid search”, etc.)
2. Ads directly driven by the content of a web page – Content Match (a.k.a. “context driven ads”, “contextual ads”, etc.)
Textual advertising on the Web is strongly related to NLP and information retrieval
Sponsored search
Text-based ads driven by a keyword search
Content match ads
Text-based ads driven by the page content
Anatomy of an ad
Bid phrases: {SIGIR 2010, computational advertising, Evgeniy Gabrilovich, ...}
Bid: $0.10
Title
Creative
Display URL
Landing URL: http://research.yahoo.com/tutorials/sigir10_compadv
Landing page
So when do advertising dollars actually change hands?
– CPM = cost per thousand impressions
• Typically used for graphical/banner ads (brand advertising)
– CPC = cost per click
• Typically used for textual ads
– CPT/CPA = cost per transaction/action, a.k.a. referral fees or affiliate fees
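As a rough illustration of the three pricing models, the sketch below computes what an advertiser would pay under each. The function name `advertiser_cost` and its parameters are hypothetical, introduced only for this example.

```python
def advertiser_cost(model, price, impressions=0, clicks=0, actions=0):
    """Return the advertiser's cost under a given pricing model.

    model: "CPM", "CPC", "CPA" or "CPT" (the last two are treated alike).
    price: price per thousand impressions, per click, or per action.
    """
    if model == "CPM":
        # CPM is quoted per thousand impressions.
        return price * impressions / 1000
    if model == "CPC":
        return price * clicks
    if model in ("CPA", "CPT"):
        return price * actions
    raise ValueError(f"unknown pricing model: {model}")
```

For instance, 50,000 impressions at a $2.00 CPM cost the same $100 regardless of how many clicks follow, whereas under CPC only the clicks are billed.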
Beyond keyword matching
• Matching ads is relatively simple for explicitly bid keywords
• What about queries on which there are no bids?
– Advertisers should be able to bid on “broad queries” and/or “concept queries”
– Advertisers need volume – the total amount of searches on bid phrases is not enough!
• Suppose your ad is “Good prices on Seattle hotels”
• Naïve approach: bid on any query that contains the word Seattle
• Problems
– “Seattle's Best Coffee Chicago”
– “Alaska cruises start point”
• Ideally: bid on any query related to Seattle as a travel destination
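The two problems with the naïve approach can be reproduced in a few lines. This is a minimal sketch; `naive_match` is a hypothetical helper, and the tokenization is a simplifying assumption.

```python
import re


def naive_match(query, bid_word):
    """Return True if the bid word appears as a token of the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return bid_word.lower() in tokens


# False positive: the query is about a coffee chain, not travel to Seattle.
coffee = naive_match("Seattle's Best Coffee Chicago", "Seattle")
# False negative: the query is related to Seattle travel but never names the city.
cruise = naive_match("Alaska cruises start point", "Seattle")
```

Here `coffee` is true and `cruise` is false, which is the opposite of what an advertiser of Seattle hotels would want.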
The old school: heuristic ad matching
• Sponsored search
– Exact match between the query and the bid phrase of the ad (modulo simple normalization, e.g., stemming)
– Advertisers cannot possibly bid on all relevant queries (especially rare ones)
• Use advanced match (e.g., through query-to-query rewrites)
• Content match
– Extract bid phrases from pages, thus reducing the problem to exact match
Both essentially perform record lookup
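Exact match as record lookup can be sketched as a dictionary keyed by normalized bid phrases. Everything here is an assumption for illustration: the helper names are hypothetical, and the crude plural stripping merely stands in for a real stemmer.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, tokenize, and strip a trailing 's' (a stand-in for stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens)


def build_index(ads):
    """Map each normalized bid phrase to the ids of the ads that bid on it."""
    index = defaultdict(list)
    for ad_id, bid_phrases in ads.items():
        for phrase in bid_phrases:
            index[normalize(phrase)].append(ad_id)
    return index


def exact_match(index, query):
    """Record lookup: return ads whose bid phrase equals the normalized query."""
    return index.get(normalize(query), [])
```

A query such as "hotels in seattle" retrieves nothing from an index that only contains the bid phrase "Seattle hotels", which is why advertisers cannot rely on exact match alone.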
The old school (cont’d)
[Diagram: Query (“Abbey Road”) → Front end → Query rewriting module → Query rewrites (e.g., “lyrics”) → Exact match → Candidate ads → Revenue reordering → Ad slate]
• Simplistic query expansion
• Ignoring (or underusing) the multitude of information available
The new approach: knowledge-based ad retrieval
• Ad indexing and scoring based on all the information available (bid terms, title, creative, URL, landing page, ...)
– Similar to document indexing in IR
• Use standard IR tools (text preprocessing – tokenization, stemming, entity extraction; inverted indexes etc.)
– Use multiple features of the query and the ad
• Elaborate query expansion
• 2nd pass relevance reordering (re-ranking)
– Using features not available to the 1st pass model (e.g., set-level features, click history)
The new approach (cont’d)
[Diagram: Query (“Miele”) → Front end → Ad query generation → Rich query → Ad search engine (First pass retrieval → Relevance reordering) → Candidate ads → Revenue reordering → Ad slate]
Ad query: <Miele, appliances, kitchen, “appliances repair”, “appliance parts”, Business/Shopping/Home/Appliances>
The hidden parts of ads (bid phrases + landing pages) allow us to augment the ads (cf. query expansion)
Research questions
• Should we show ads at all?
• How to select relevant ads?
• How to index the ad corpus?
• Can we generate bid phrases (or even entire ad campaigns) automatically?
• What is the interplay between the organic and sponsored results?
• Should we use the landing page for indexing?
• Can we optimally choose the landing page?
How to select relevant ads?
Feature generation for improved ad retrieval
(SIGIR 2007, w. Broder et al.; ACM TWEB 2009, Gabrilovich et al.)
Query classification using Web search results
• Humans often find it hard to readily see what the query is about …
– But they can easily make sense of it once they look at the search results…
• Let computers do the same thing
– Infer the query intent from the top algorithmic search results (“pseudo relevance feedback”)
• Classify search results (either summaries or full pages)
• Let these results “vote” to determine the query class(es) in a large taxonomy of commercial topics
• Our goal: Construct additional features to retrieve better ads
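The voting step can be sketched as follows, assuming each search result has already been assigned one or more taxonomy classes by some page classifier. That classifier, the class labels, and the function name are hypothetical and not taken from the talk.

```python
from collections import Counter


def classify_query_by_voting(result_classes, top_k=1):
    """Pick the query class(es) that receive the most votes.

    result_classes: one list of taxonomy classes per search result,
    as produced by a (hypothetical) page or snippet classifier.
    """
    votes = Counter()
    for classes in result_classes:
        for cls in classes:
            votes[cls] += 1  # each search result casts one vote per class
    return [cls for cls, _ in votes.most_common(top_k)]
```

Unweighted voting is only one possible aggregation; weighting votes by result rank or classifier confidence would be a natural variation.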
Example: ex560lku
CATEGORIES
1. Computing/Computer/Hardware/Computer/Peripherals/ComputerModems
If we know it is about actiontec usb modem then we have plenty of ads …
Our approach
Traditional approach: Query → Classifier (insufficient data)
Our approach: Query → Search engine → Search results → Classifier
• Using the Web as external knowledge
• Very large scale: pre-classify all pages just once!
Research questions
• Snippets or full pages?
• Number of search results to obtain
• Number of classes per search result
• Aggregation: bundling or voting?
The effect of using Web search results
Beyond the bag of words: matching textual ads in the enriched feature space
(SIGIR 2007, Broder et al.; CIKM 2008, w. Broder et al.)
What can we do about non-English queries?
(iNEWS @ CIKM 2008, w. Wang et al.; WSDM 2009, w. Wang et al.)
• Developing a taxonomy and building a query classifier for every language is prohibitively expensive
• Solution: apply off-the-shelf MT to the search results in the source language
[Diagram: Machine Translation of very short text vs. sufficiently long text]
The effect of query expansion prior to applying MT
The gap for infrequent queries is wider
Baseline = translate the query (using MT), then classify the result as an English query
[Figure: query frequency axis from more frequent (Head) to less frequent (Tail)]
How to index the ad corpus?
The Anatomy of an ad: Structured indexing and retrieval for sponsored search
(WWW 2010, w. Bendersky et al.)
Structure of online ad campaigns: the ad schema
[Diagram: Advertiser → Account 1, Account 2, … → Campaign 1, Campaign 2, … → Ad group 1, Ad group 2, … → Creatives + Bid phrases → Ad]
Example labels in the diagram: “New Year deals on lawn & garden tools”, “Buy appliances on Black Friday”, “Kitchen appliances”
Example creative: Brand name appliances / Compare prices and save money / www.appliances-r-us.com
Example bid phrases: { Miele, KitchenAid, Cuisinart, … }
Can be just a single bid phrase, or thousands of bid phrases (which are not necessarily topically coherent)
Implications of the campaign structure
• What is the appropriate indexing unit?
– Cartesian product of creatives and bid phrases? Ad group?
• Leveraging information from higher levels to address data sparsity at children nodes
• What is the right approach to document length normalization?
– Large variability of document lengths
– Probability of shorter documents (smaller ad groups) to be retrieved is higher than their probability of being relevant
• How to index and score templated ads?
• Prior work mostly considered ads as independent atomic units and ignored hierarchical campaign structure
Possible approaches
1. Term index (Cartesian product of all creatives and bid terms)
• Huge index, small focused documents
2. Creative index (a creative is coupled with all the bid terms in the ad group)
• Two-stage retrieval (first choose the creative, then pick the term)
• Bid terms are duplicated across creatives
3. Ad group index
• Indexing units are entire ad groups
• Three-stage retrieval (first choose the ad group, then the creative, and finally pick the term)
• Most compact index
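The difference in index size between the three options can be seen by enumerating the indexing units for a single ad group. This is a sketch under assumed data structures: the dictionary layout and function names are hypothetical.

```python
from itertools import product


def term_index_units(ad_group):
    """One unit per (creative, bid term) pair: the Cartesian product."""
    return list(product(ad_group["creatives"], ad_group["bid_terms"]))


def creative_index_units(ad_group):
    """One unit per creative, each carrying a copy of all bid terms."""
    terms = tuple(ad_group["bid_terms"])
    return [(creative, terms) for creative in ad_group["creatives"]]


def ad_group_index_units(ad_group):
    """A single unit holding the whole ad group."""
    return [(tuple(ad_group["creatives"]), tuple(ad_group["bid_terms"]))]
```

With 2 creatives and 3 bid terms this yields 6, 2, and 1 units respectively, which matches the ordering from "huge index" to "most compact index".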
Retrieval speed vs. relevance
Term index yields most relevant ads, yet is least efficient (20x slower than the ad group index)
Are we trading effectiveness for efficiency?
Ad group index is most efficient (2x faster than creative index), yet least effective
Using learning to rank techniques: structured re-ranking
• Step 1: Retrieve an initial set of candidates using the ad group index
• Step 2: Re-rank the candidate set using structural features (instead of ignoring the structure and scoring creatives and terms independently)
– Ad group score, creative-term pair score
– # bid terms in the ad group
– Unigram entropy (cohesiveness) of the ad group
– Ratio of query words covered by the ad group text
– Fraction of the titles / terms / URLs that contain at least one query term
– Other features are possible!
[Figure: feature functions]
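Two of the structural features listed on this slide can be sketched as below. The exact definitions used in the WWW 2010 paper may differ; the tokenization, the base-2 logarithm, and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_entropy(texts):
    """Entropy (in bits) of the unigram distribution over the ad group text.

    Low entropy suggests a cohesive ad group; high entropy a scattered one.
    """
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def query_coverage(query, texts):
    """Fraction of distinct query words that occur in the ad group text."""
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    vocabulary = {tok for text in texts for tok in tokenize(text)}
    return len(query_words & vocabulary) / len(query_words)
```

Features such as these would be fed, together with the retrieval scores, to a learning-to-rank model in Step 2.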
Re-ranking retrieval performance
nDCG@5                  Len 1 (143 queries)   Len 2-3 (443 queries)   Len 4+ (187 queries)
Term index              0.841                 0.716                   0.656
Structured re-ranking   0.849 (+0.95%)        0.731 (+2.1%)           0.686 (+4.6%)
• Structured re-ranking is superior for all query lengths
• Most notable improvements are obtained for longer queries
• Still very efficient!
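The percentages in the table read as relative gains over the term index baseline. A quick check, using a hypothetical helper name:

```python
def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# nDCG@5 values from the table: (term index, structured re-ranking)
gains = {
    "Len 1": relative_gain_percent(0.841, 0.849),
    "Len 2-3": relative_gain_percent(0.716, 0.731),
    "Len 4+": relative_gain_percent(0.656, 0.686),
}
```

Rounded, these come out to about 0.95%, 2.1%, and 4.6%, consistent with the reported figures.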
To swing or not to swing: learning when (not) to advertise (CIKM 2008, w. Broder et al.)
Should we show ads at all?
• Repeatedly showing non-relevant ads can have detrimental long-term effects
• Want to be able to predict when (not) to show individual ads or a set of ads (“swing”)
• Modeling actual short- and long-term costs of showing non-relevant ads is very difficult
Thresholding approach
• Decision made on individual ads based on ad scores
– Set a global score threshold
– Only retrieve ads with scores above it
– If none of the ad scores are above the threshold, then no ads are shown (“no swing”)
• Scores are not necessarily comparable across queries!
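The thresholding rule amounts to a simple filter. In this minimal sketch the function name and the (ad, score) pair representation are assumptions.

```python
def select_ads(scored_ads, threshold):
    """Keep ads whose score is above a global threshold.

    scored_ads: list of (ad, score) pairs for one query.
    An empty result means "no swing": no ads are shown.
    """
    return [ad for ad, score in scored_ads if score > threshold]
```

Because one threshold is shared by all queries, the caveat above applies directly: a score of 0.6 for one query need not mean the same thing as 0.6 for another.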
Machine learning approach
• Decision made on sets of ads based on a variety of features
– Learn a binary prediction model (“swing” / “no swing”) for sets of ads
– If we swing, then all ads are retrieved
– If we do not swing, then no ads are retrieved
• Features defined over sets of ads, rather than individual ads
Features
• Relevance features
– Word overlap, cosine similarity between ad and query/page
• Vocabulary mismatch features
– Translation models
– PMI between query/page terms and bid terms
• Ad-based features
– Bid price (higher bids may indicate better ads)
• Result set cohesiveness features
– Coefficient of variation of ad scores (std/mean)
– Result set clarity
• If the set of ads is very cohesive and focused on 1-2 topics, the relevance language model is very different from the collection model
– Entropy
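Two of the set-level cohesiveness features can be sketched as below. The slide does not say whether the standard deviation is population or sample, nor what distribution the entropy is taken over; this sketch assumes the population standard deviation and an entropy over ad scores normalized to sum to one. Function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(scores):
    """std / mean of the ad scores in a result set (population std assumed)."""
    mean = statistics.mean(scores)
    if mean == 0:
        return 0.0
    return statistics.pstdev(scores) / mean


def score_entropy(scores):
    """Entropy (in bits) of the ad scores normalized into a distribution.

    Assumption: scores are non-negative; zero scores contribute nothing.
    """
    total = sum(scores)
    if total == 0:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Both values are computed once per candidate set, which fits the requirement that features be defined over sets of ads rather than individual ads.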
What happens after an ad click? Quantifying the impact of landing pages in Web advertising
(CIKM 2009, w. Becker et al.)
Can we optimally choose the landing page?
Conceptually: context transfer
[Diagram: Search engine result page → Click! → Landing page → User’s activity on the advertiser’s Web site → Conversion (e.g., purchase of the product or service being advertised)]
All landing pages are not created equal (and neither are the corresponding conversion rates)
• We propose a concise taxonomy of landing page types:
I. Homepage (25%) – top-level page of the advertiser’s site (e.g., Verizon.com)
II. Category browse (37.5%) – main page of a sub-section of the advertiser’s site, which describes a category of related products
III. Search transfer (26%) – search within the advertiser’s site OR on other Web sites
IV. Other (11.5%) – terminal pages (e.g., promotion pages or forms)
Examples: Homepage
Examples: Category browse
Examples: search transfer
Landing page classifier
• Features: bag of words, HTML patterns
– [ST] “search results”, “found”
– [CB] “Home > Verizon > LG phones”
– [HP] HTML overlap between given URL and base URL
– [O] ratio of form elements to text, few outgoing links
• Accuracy on the pilot dataset (10-fold xval): 83%
• Accuracy on additional 100 labeled pages: 80%
• Distribution of landing page types in a set of 20,000 landing pages from Yahoo! Toolbar logs:
Homepage   Search Transfer   Category Browse   Other
34.4%      22.3%             36.0%             7.3%
Using the landing page taxonomy
Picking the right landing page type for each ad
Improving the conversion rate
Improving advertisers’ ROI!
Landing page type usage vs. conversion: breakdown by query frequency
Navigational queries
Category and search transfer become more popular for rare queries
Observed conversion rates are in sharp contrast with usage frequency of the different page types
Landing page type usage vs. conversion: breakdown by query price
Category and search transfer are dominant for cheaper queries
As the price goes up, so does the conversion rate (higher quality pages?)
What is the interplay between the organic and sponsored results?
Competing for users’ attention: On the interplay between organic and sponsored search results
(WWW 2010, w. Danescu-Niculescu-Mizil et al.)
The interplay between ads and organic results
“... in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
-- Herbert Simon, “Designing Organizations for an Information-Rich World”, 1971.
• Is there competition for clicks between ads and organic results?
• Do users prefer ads that are similar to the organic results, or do they prefer diversity?
We found that the nature of this interplay depends on the type of the query
Relation between the CTR of ads and the CTR of organic results
• Negative correlation (competition)
– Users are only willing to spend limited time and effort on each query
• Positive correlation (depends on the quality of results)
– Easy query (“online radio”) – decent ads and organic results – clicks on both
– Hard query (“who is giving this talk?”) – poor results on both sides – no clicks on either
• Independence (null hypothesis)
– Users consider ads and organic results as two independent sources of information
Findings: competition + positive correlation
Decoupling the forces
• Users are willing to invest limited effort in each query → competition
• In order to single out the competition effect, we tried to explicitly model the amount of effort the user is willing to invest
• Low effort = navigational queries [Broder, 2002] (27% of queries)
– “Pandora radio”, “Bank of America”
• High effort = non-navigational queries
– “Meaning of life”, “academia vs. industry”
Competition clearly exists for navigational queries
We also examined different degrees of navigationality: the less navigational the query is, the less competition we observed
Another viewpoint: Do users prefer ads that are more similar to the organic results or more diverse ads?
• Both have been argued for in prior work
• Preference for similarity
– Ads are more likely to be relevant
– This assumption is often made in query expansion for advertising [Broder et al., 2008]
• Preference for diversity
– Diversity among organic search results has often been shown to be desirable (e.g., entire session on diversity @ WWW 2010)
We found evidence for users’ preferring both diversity and similarity
So we need to dig deeper again ...
Overlap measured using the Jaccard coefficient between titles of ads and organic results
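The Jaccard coefficient between two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it to title words; pooling the words of all ad titles into one set and those of all organic titles into another is an assumption, as are the function names and tokenization.

```python
import re


def title_words(titles):
    """Pool the lowercase word tokens of a list of titles into one set."""
    return {tok for title in titles for tok in re.findall(r"[a-z0-9]+", title.lower())}


def jaccard_overlap(ad_titles, organic_titles):
    """Jaccard coefficient between the word sets of ad and organic titles."""
    ads, organic = title_words(ad_titles), title_words(organic_titles)
    union = ads | organic
    if not union:
        return 0.0
    return len(ads & organic) / len(union)
```

For example, the titles "Pandora Internet Radio" and "Free Internet Radio" share 2 of 4 distinct words, giving an overlap of 0.5.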
Let’s break down by navigationality again
Break down by navigationality (cont’d)
Counterintuitive?
Responsive and incidental ads
• Responsive ads directly address the user’s information need
– More likely to be similar to the organic results
• Incidental ads are only somewhat related to the user’s information need
– Unreasonable as organic results, but ok for ads
– More likely to be different from the organic results
• Example: query = “free internet radio”
– Responsive: “Pandora Internet Radio”
– Incidental: “Discount Bose Computer Speakers”
Now it all makes sense ...
Using the features that quantify this interplay, we improved the accuracy of CTR prediction by 5%
Summary
1. The financial scale is huge
2. Advertising is a form of information
3. Finding the “best ad” is an information retrieval problem
• Multiple, possibly contradictory utility functions
• Classical IR needs significant adaptation
4. The optimal solution requires extensive use of external knowledge
Thank you!
gabr@yahoo-inc.com
http://research.yahoo.com/~gabr
This talk is Copyright Yahoo! 2010. Yahoo! and the Author retain all rights, including copyright and distribution rights. No publication or further distribution in full or in part is permitted without explicit written permission.
The opinions expressed herein are the responsibility of the author and do not necessarily reflect the opinion of Yahoo! Inc.
This talk benefitted from the contributions of many colleagues and co-authors at Yahoo! and elsewhere. Their help is gratefully acknowledged.