Advisor: Koh Jia-Ling Nonhlanhla Shongwe 2010-09-28 EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT...


Advisor: Koh Jia-Ling
Nonhlanhla Shongwe

2010-09-28

EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH
WANG H., LIANG Y., FU L., XUE G., YU Y.
SIGIR’09

Preview

• Introduction
• AdSearch
    • Bid phrase clustering
    • Index structure for efficient ad search
    • Query processing
• Experimental evaluation
• Conclusion

Introduction

• The Web has become an important venue for advertising, e.g. Google, Yahoo
• There are mainly two kinds of advertising channels
    • Contextual advertising
    • Sponsored advertising
• Ranking is derived from
    • relevance to the user query
    • page content

Introduction (cont’d)

• Ads are characterized by bid phrases: keywords the advertisers choose for their ads
• Syntactic approaches suffer from low recall
• Example
    • Query: “job training”
    • Ad: career college
    • The ad has no syntactic match with the query, so it is not proposed

Introduction (cont’d)

• The problem is even worse because of
    • the shorter length of ads
    • the sparsity of the bid phrases
• The paper proposes an efficient ad search solution
    • It tackles these issues with query expansion

AdSearch Overview

AdSearch (cont’d)

• Bid phrase clustering
    • Bipartite Graph Construction for Bid Phrases and Ads
    • Agglomerative Iterative Clustering

Bipartite Graph Construction for Bid Phrases and Ads

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

1. B = {A, B, C} (bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
3. G = {vba, vbb, vbc} (bid phrase vertices)
4. G = {va0, va1, va2, va3, va4} (ad vertices)
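The construction on this slide can be sketched in a few lines of Python. This is only an illustration on the slide's toy corpus: the function name `build_bipartite_graph` and the representation of edges as (bid phrase, ad) tuples are assumptions, not taken from the paper.

```python
# Corpus data from the slide: each bid phrase maps to the ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}


def build_bipartite_graph(corpus):
    """Return (phrase_nodes, ad_nodes, edges) for a phrase-to-ads mapping.

    Hypothetical helper: one vertex per bid phrase, one vertex per ad,
    and one edge for every (bid phrase, ad) pair found in the corpus.
    """
    phrase_nodes = sorted(corpus)
    ad_nodes = sorted({ad for ads in corpus.values() for ad in ads})
    edges = sorted((phrase, ad) for phrase, ads in corpus.items() for ad in ads)
    return phrase_nodes, ad_nodes, edges


phrase_nodes, ad_nodes, edges = build_bipartite_graph(corpus)
print(phrase_nodes)  # the three bid phrase vertices
print(ad_nodes)      # the five ad vertices
print(len(edges))    # one edge per (bid phrase, ad) pair
```

Edges only ever connect a bid phrase to an ad, never two vertices of the same kind, which is what makes the graph bipartite.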

Agglomerative Iterative Clustering

Jaccard Similarity

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

(A, B) = 1/4
(B, C) = 2/4
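The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. A minimal sketch that reproduces the two values on the slide follows; the function name `jaccard` is chosen for illustration, and `Fraction` is used only so the results print as exact ratios.

```python
from fractions import Fraction

# Ad sets of the three bid phrases, as given in the corpus data on the slide.
A = {"Ad0", "Ad3"}
B = {"Ad1", "Ad2", "Ad3"}
C = {"Ad2", "Ad3", "Ad4"}


def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y|, returned as an exact fraction."""
    union = x | y
    if not union:
        return Fraction(0)
    return Fraction(len(x & y), len(union))


sim_ab = jaccard(A, B)  # shared: Ad3; union: Ad0, Ad1, Ad2, Ad3
sim_bc = jaccard(B, C)  # shared: Ad2, Ad3; union: Ad1, Ad2, Ad3, Ad4
print(sim_ab, sim_bc)
```

`Fraction` reduces 2/4 to 1/2, so the second value prints in lowest terms rather than as written on the slide.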

Agglomerative Iterative Clustering (cont’d)

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Bipartite graph
Bid phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4

Bid phrase similarities:
(A, B) = 0.25
(A, C) = 0.25
(B, C) = 0.5

Merge: B with C, then A

Bid phrases per ad:
Ad0 = {A}
Ad1 = {B}
Ad2 = {B, C}
Ad3 = {A, B, C}
Ad4 = {C}

Ad similarities:
(Ad0, Ad1) = 0
(Ad0, Ad2) = 0
(Ad0, Ad3) = 0.33
(Ad0, Ad4) = 0
(Ad1, Ad2) = 0.5
(Ad1, Ad3) = 0.33
(Ad1, Ad4) = 0
(Ad2, Ad3) = 0.66
(Ad2, Ad4) = 0.5
(Ad3, Ad4) = 0.33

Merge: (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)

Resulting clusters
Bid phrases: {A}, {B, C}
Ads: {Ad0}, {Ad1, Ad4}, {Ad2, Ad3}
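The merge order on the bid phrase side (B with C first, then A) can be reproduced with a simple greedy agglomerative loop. This is a simplified sketch, not the paper's iterative algorithm: it clusters bid phrases only, it represents a merged cluster by the union of its ad sets (an assumed linkage rule), and the names `agglomerate` and `min_similarity` are hypothetical.

```python
from itertools import combinations

corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}


def jaccard(x, y):
    """Jaccard similarity of two sets as a float."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def agglomerate(corpus, min_similarity=0.0):
    """Greedily merge the two most similar clusters until one remains or
    the best similarity is not above min_similarity.

    Assumption: a merged cluster is described by the union of its ad sets.
    Returns the merges in order as (members_a, members_b, similarity).
    """
    clusters = {frozenset([p]): set(ads) for p, ads in corpus.items()}
    merges = []
    while len(clusters) > 1:
        ordered = sorted(clusters, key=sorted)  # deterministic pair order
        (a, b), sim = max(
            (((a, b), jaccard(clusters[a], clusters[b]))
             for a, b in combinations(ordered, 2)),
            key=lambda item: item[1],
        )
        if sim <= min_similarity:
            break
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)
        merges.append((sorted(a), sorted(b), sim))
    return merges


merges = agglomerate(corpus)
for left, right, sim in merges:
    print(left, "+", right, "similarity", sim)
```

On the toy corpus the first merge joins B and C at similarity 0.5, and the second joins A with the {B, C} cluster, matching the order shown on the slide.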

AdSearch (cont’d)

• Index structure for efficient ad search
    • Mapping clusters of Bid Phrases to Index Terms
    • Block-based Index Structure
    • Dictionaries

Mapping clusters of Bid Phrases to Index Terms

[Figure: clusters of the bid phrases A, B, C, D and E]

Block-based Index Structure

Query = B

Traditional index: 3 inverted lists
    Index = bid phrase
    List = ad

Block-based index: 1 inverted list
    Index = 3 bid phrases
    List = ad and bid phrase

Block-based Index Structure (cont’d)

Advantages over the traditional method
• Similar bid phrases and their corresponding ads are placed together
• Merge operations become fewer or can even be avoided
• Expanding phrase B with phrases A and C is not efficient in the traditional method
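The difference can be illustrated on the toy corpus. The sketch below is an assumption-laden simplification: the block is modelled as one list of (ad, bid phrase) pairs for the whole cluster, and the block id `cluster_ABC` and both function names are invented for the example.

```python
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# Traditional inverted index: one posting list per bid phrase.
inverted = {phrase: sorted(ads) for phrase, ads in corpus.items()}


def expand_traditional(phrases):
    """Fetch one posting list per phrase and merge them.

    Returns (ads, number_of_lists_read).
    """
    lists = [inverted[p] for p in phrases]
    return sorted(set().union(*lists)), len(lists)


# Block-based index (assumed layout): a single block stores the
# (ad, bid phrase) pairs of every phrase in the cluster.
blocks = {
    "cluster_ABC": sorted(
        (ad, phrase) for phrase, ads in corpus.items() for ad in ads
    ),
}


def expand_block(block_id):
    """Read one block; no merging of separate lists is needed.

    Returns (ads, number_of_lists_read).
    """
    block = blocks[block_id]
    return sorted({ad for ad, _phrase in block}), 1


traditional_ads, traditional_reads = expand_traditional(["B", "A", "C"])
block_ads, block_reads = expand_block("cluster_ABC")
print(traditional_reads, block_reads)
```

Both calls return the same five ads, but expanding B with A and C touches three lists in the traditional layout and one list in the block layout.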

Dictionaries

Dictionary D is used to
• record the mapping from each bid phrase to its corresponding artificial words
• locate the block corresponding to a bid phrase

Bid phrase    Artificial words (path)
A             6:0
B             6_5:1
C             6_5:2

Dictionaries (cont’d)

Dictionary C (counter dictionary) is used to record the number of distinct ads per cluster

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Cluster path    Number of distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

Entries: (6, 2), (6_5, 4)
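Both dictionaries fit naturally into plain mappings. The sketch below rebuilds the counter dictionary from dictionary D and the corpus. It assumes that the part of an artificial word before the colon is the cluster path and the part after it is a position inside the cluster, which is an interpretation of the slide's notation; `build_counter_dictionary` is a hypothetical helper.

```python
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# Dictionary D from the slide: bid phrase -> artificial word (path).
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}


def build_counter_dictionary(dictionary_d, corpus):
    """Count the distinct ads of the bid phrases stored under each cluster path.

    Assumption: "6_5:1" means cluster path "6_5", position 1.
    """
    ads_per_cluster = {}
    for phrase, word in dictionary_d.items():
        cluster_path, _position = word.split(":")
        ads_per_cluster.setdefault(cluster_path, set()).update(corpus[phrase])
    return {path: len(ads) for path, ads in ads_per_cluster.items()}


dictionary_c = build_counter_dictionary(dictionary_d, corpus)
print(dictionary_c)
```

Ad2 and Ad3 are shared by B and C but counted once, which is why cluster 6_5 holds 4 distinct ads rather than 6.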

AdSearch (cont’d)

• Query processing
    • Finding Related Bid Phrases with Corresponding Ads
    • Ranking Top-k Relevant Ads

Finding Related Bid Phrases with Corresponding Ads

The process to find related bid phrases
• Input: user queries
• Look up dictionary D to get the corresponding artificial words
• Find the minimum clusters that contain enough ads

Query: A, B, D

Bid phrase    Artificial words (path)
A             6:0
B             6_5:1
C             6_5:2

Cluster path    Number of distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

e.g. for the top 2 ads, M = 1.5 × 2 = 3
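The lookup steps above can be sketched as follows. Several details are assumptions made for the example: that the underscore in a path separates a parent cluster from its child, that "enough ads" means at least M = 1.5 × k distinct ads, and that a query term missing from dictionary D is simply skipped. The function name `find_min_cluster` is hypothetical.

```python
# Dictionaries D and C as shown on the slide.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
dictionary_c = {"6": 2, "6_5": 4}


def find_min_cluster(phrase, k, dictionary_d, dictionary_c, factor=1.5):
    """Return the smallest cluster on the phrase's path holding at least
    factor * k distinct ads, or None when the phrase is not indexed.

    Assumption: "6_5" is a child of "6", so dropping the last path
    component moves one level up in the cluster hierarchy.
    """
    if phrase not in dictionary_d:
        return None  # e.g. query term D has no entry in dictionary D
    needed = factor * k
    path = dictionary_d[phrase].split(":")[0]
    while dictionary_c.get(path, 0) < needed and "_" in path:
        path = path.rsplit("_", 1)[0]  # step up to the parent cluster
    return path


# Query A, B, D asking for the top 2 ads, so M = 1.5 * 2 = 3.
clusters = {term: find_min_cluster(term, 2, dictionary_d, dictionary_c)
            for term in ["A", "B", "D"]}
print(clusters)
```

With M = 3, phrase B is already satisfied by its own cluster 6_5, which holds 4 distinct ads. Phrase A sits in cluster 6, which has no parent to climb to in this toy data, so that cluster is returned even though it holds only 2 ads.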


Finding Related Bid Phrases with Corresponding Ads (cont’d)

The process to find related bid phrases
• Return the clusters; those containing at least one bid phrase are stored in one group
• Perform a multi-way merge operation to get the final results

Ad     Bid phrases
Ad1    A
Ad2    B, C
Ad3    A, B, C
Ad4    C
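A multi-way merge combines several sorted lists in a single pass. The sketch below uses Python's standard `heapq.merge` on the posting lists of the toy corpus from the clustering slides. It illustrates the general technique only; the paper's exact merge procedure is not shown on the slide, and `multiway_merge` is a hypothetical name.

```python
import heapq


def multiway_merge(posting_lists):
    """Merge sorted ad lists into one sorted list without duplicates."""
    merged = []
    for ad in heapq.merge(*posting_lists):
        # Equal ads arrive next to each other, so comparing with the
        # last emitted ad is enough to drop duplicates.
        if not merged or merged[-1] != ad:
            merged.append(ad)
    return merged


# Sorted posting lists of bid phrases A, B and C from the corpus data.
postings = [
    ["Ad0", "Ad3"],
    ["Ad1", "Ad2", "Ad3"],
    ["Ad2", "Ad3", "Ad4"],
]
result = multiway_merge(postings)
print(result)
```

`heapq.merge` requires every input list to be sorted already, which holds for posting lists kept in ad order.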


Ranking Top-k Relevant Ads

• A procedure expands the user query with related bid phrases and gets a list of ads
• To get the top k, use a scoring function, where
    • Q: the query
    • B(x): the set of related bid phrases
    • Similarity between x and y
    • tfidf(y, ad): term frequency and inverse document frequency
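The scoring formula itself did not survive on the slide. One reading consistent with the symbols listed, assumed here rather than taken from the paper, sums similarity(x, y) × tfidf(y, ad) over every query phrase x in Q and every related phrase y in B(x). The sketch below implements that assumed form; all numeric values are made up for illustration, and the names `score` and `top_k` are hypothetical.

```python
import heapq


def score(query_phrases, ad, related, similarity, tfidf):
    """Assumed scoring form: sum of similarity(x, y) * tfidf(y, ad)
    for x in the query and y in the related set B(x)."""
    total = 0.0
    for x in query_phrases:
        for y in related.get(x, ()):
            total += similarity.get((x, y), 0.0) * tfidf.get((y, ad), 0.0)
    return total


def top_k(query_phrases, ads, k, related, similarity, tfidf):
    """Return the k ads with the highest score, best first."""
    return heapq.nlargest(
        k, ads,
        key=lambda ad: score(query_phrases, ad, related, similarity, tfidf),
    )


# Illustrative values only, not from the paper.
related = {"B": ["B", "C"]}                        # B(x) for x = B
similarity = {("B", "B"): 1.0, ("B", "C"): 0.5}
tfidf = {
    ("B", "Ad1"): 1.0,
    ("B", "Ad2"): 0.5,
    ("C", "Ad2"): 0.5,
    ("C", "Ad4"): 1.0,
}

best = top_k(["B"], ["Ad1", "Ad2", "Ad4"], 2, related, similarity, tfidf)
print(best)
```

Under these toy values Ad2 scores 1.0 × 0.5 + 0.5 × 0.5 = 0.75, which places it between Ad1 (1.0) and Ad4 (0.5).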

Experimental evaluation

Evaluated on both Chinese and English data

Experimental evaluation (cont’d)

Name: Description
• CQS1 (Chinese) or EQS1 (English): Randomly sampled 100 bid phrases; each bid phrase is associated with few distinct ads
• CQS2 (Chinese) or EQS2 (English): Selected 100 pairs of bid phrases; each pair could return ads associated with both bid phrases in it
• CQS3 (Chinese) or EQS3 (English): Constructed similarly, with queries composed of 3 to 4 bid phrases
• CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set): 100 popular bid phrases used to build the CQF and EQF

Experimental evaluation (cont’d)

Evaluation of the clustering step

Experimental evaluation (cont’d)

Efficiency evaluation
• AdSearch was implemented with fixed and unfixed block sizes
• The block size is defined as the fraction of distinct ads in the block relative to all ads
• AdSearch(0.001): the number of distinct ads in each block is 0.001 of all ads
• For example, on the Chinese data set: 524,868 × 0.001 ≈ 525 distinct ads per block
• Inv: performs query expansion on top of the traditional inverted index
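The block size arithmetic for the Chinese data set can be checked directly. Rounding up is an assumption: the slide only shows that 524.868 becomes 525, which rounding to the nearest integer would also give. The function name `ads_per_block` is hypothetical.

```python
import math


def ads_per_block(total_distinct_ads, fraction):
    """Number of distinct ads per block for a given block-size fraction.

    Assumption: the fractional result is rounded up to a whole ad.
    """
    return math.ceil(total_distinct_ads * fraction)


# Chinese data set from the slide: 524,868 distinct ads, AdSearch(0.001).
chinese_block = ads_per_block(524_868, 0.001)
print(chinese_block)
```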

Experimental evaluation (cont’d)

Effectiveness evaluation

• Randomly selected 50 queries
• 10 people were invited to evaluate the ads returned by AdSearch and Baidu

Experimental evaluation (cont’d)

Effectiveness evaluation

Conclusion

Introduced the AdSearch system, which consists of
• Bid phrase clustering
    • Constructs a bipartite graph over the bid phrases and ads
    • Uses agglomerative iterative clustering to cluster similar ads
• Index structure for efficient ad search
    • Uses a block-based index structure to index all ads and bid phrases
    • Uses dictionaries to record mappings between bid phrases and ads
• Query processing
    • Explained how ads are retrieved and ranked to get the top-k results

THANK YOU

Introduction (cont’d)

[Figure: for Q = “job training”, the sets All Docs, Relevant Docs (R), Relevant Ads, and Relevant Ads in the Ads set (Ra)]
