EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH

Wang H., Liang Y., Fu L., Xue G., Yu Y.

SIGIR’09

Advisor: Koh Jia-Ling

Nonhlanhla Shongwe

2010-09-28

Preview

• Introduction
• AdSearch
  - Bid phrase clustering
  - Index structure for efficient ad search
  - Query processing
• Experimental evaluation
• Conclusion

Introduction

• The Web has become an important venue for advertising, e.g. Google, Yahoo

• There are mainly two kinds of advertising channels
  - Contextual advertising
  - Sponsored advertising

• Ranking: derived from relevance to the user query or the page content

Introduction cont'd

• Ads are characterized by bid phrases: keywords the advertisers choose for their ads

• Syntactic approaches suffer from low recall
  Example
  - Query: “job training”
  - Ad: “career college”
  - The ad has no syntactic match with the query, so it is not proposed

• The problem is even worse because of
  - The short length of ads
  - The sparsity of the bid phrases

• Proposal: an efficient ad search solution
  - Tackles these issues with query expansion

AdSearch Overview

AdSearch cont'd

• Bid phrase clustering
  - Bipartite Graph Construction for Bid Phrase and Ads
  - Agglomerative Iterative Clustering

Bipartite Graph Construction for Bid Phrase and Ads

1. B = {A, B, C} (set of bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (set of ads)
3. Bid phrase vertices of G: vba, vbb, vbc
4. Ad vertices of G: va0, va1, va2, va3, va4

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
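The corpus data on this slide can be turned into a bipartite graph in a few lines. The sketch below is only an illustration in Python, which the slides do not use; the names `corpus`, `build_bipartite_graph`, `edges` and `ad_to_phrases` are made up for this example and are not taken from the paper.

```python
# Corpus data C from the slide: bid phrase -> ads that carry it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_bipartite_graph(corpus):
    """Return (edges, ad_to_phrases) for a bid phrase / ad bipartite graph."""
    edges = set()
    ad_to_phrases = {}
    for phrase, ads in corpus.items():
        for ad in ads:
            # One edge per (bid phrase, ad) pair.
            edges.add((phrase, ad))
            # Reverse view of the same edge: ad -> bid phrases.
            ad_to_phrases.setdefault(ad, set()).add(phrase)
    return edges, ad_to_phrases

edges, ad_to_phrases = build_bipartite_graph(corpus)
for ad in sorted(ad_to_phrases):
    print(ad, sorted(ad_to_phrases[ad]))
```

Keeping both directions (phrase to ads and ad to phrases) is a convenience choice here, since the clustering step compares bid phrases by their ads and ads by their bid phrases.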

Agglomerative Iterative Clustering

Jaccard Similarity

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

(A, B) = 1/4
(B, C) = 2/4
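The values above follow from the standard Jaccard definition, the size of the intersection divided by the size of the union. A minimal Python sketch that reproduces them on the slide's corpus (the function and variable names are chosen for this example only):

```python
def jaccard(x, y):
    """Jaccard similarity |x ∩ y| / |x ∪ y| of two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Corpus data C from the slide: bid phrase -> ads that carry it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

sim_ab = jaccard(corpus["A"], corpus["B"])  # share Ad3 out of 4 distinct ads
sim_bc = jaccard(corpus["B"], corpus["C"])  # share Ad2, Ad3 out of 4 distinct ads
sim_ac = jaccard(corpus["A"], corpus["C"])  # share Ad3 out of 4 distinct ads
print(sim_ab, sim_bc, sim_ac)
```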

Agglomerative Iterative Clustering cont'd

Corpus data C:
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Bipartite graph
Bid phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4

Bid phrase similarities:
(A, B) = 0.25
(A, C) = 0.25
(B, C) = 0.5

Bid phrases per ad:
Ad0 = {A}
Ad1 = {B}
Ad2 = {B, C}
Ad3 = {A, B, C}
Ad4 = {C}

Ad similarities:
(Ad0, Ad1) = 0
(Ad0, Ad2) = 0
(Ad0, Ad3) = 0.33
(Ad0, Ad4) = 0
(Ad1, Ad2) = 0.5
(Ad1, Ad3) = 0.33
(Ad1, Ad4) = 0
(Ad2, Ad3) = 0.67
(Ad2, Ad4) = 0.5
(Ad3, Ad4) = 0.33

Merge (ads): (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)

Merge (bid phrases): B with C, then A

Ad1, Ad4

Ad2, Ad3
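The merge order for the bid phrases (B with C, then A) can be reproduced with a simple greedy procedure. The sketch below is an assumption about the details, not the paper's algorithm: it always merges the most similar pair and scores a merged cluster by the Jaccard similarity of the union of its ads. The name `agglomerate` is hypothetical.

```python
from itertools import combinations

def jaccard(x, y):
    """Jaccard similarity of two sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def agglomerate(phrase_ads):
    """Greedily merge the most similar clusters; return the list of merges."""
    # Each cluster is keyed by the tuple of bid phrases it contains.
    clusters = {(name,): set(ads) for name, ads in phrase_ads.items()}
    merges = []
    while len(clusters) > 1:
        # Assumption: cluster similarity = Jaccard over the clusters' ad sets.
        left, right = max(
            combinations(sorted(clusters), 2),
            key=lambda pair: jaccard(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((left, right))
        merged_ads = clusters.pop(left) | clusters.pop(right)
        clusters[left + right] = merged_ads
    return merges

corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}
merges = agglomerate(corpus)
print(merges)
```

On this corpus the first merge joins B and C (similarity 0.5), and the second joins A with the merged cluster, which matches the order on the slide.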

AdSearch cont'd

• Index structure for efficient ad search
  - Mapping clusters of Bid Phrases to Index Terms
  - Block-based Index Structure
  - Dictionaries

Mapping clusters of Bid Phrases to Index Terms

Clusters

Block-based Index Structure

3 inverted lists (traditional index)
  Contains:
  - Index = bid phrase
  - List = ad

1 inverted list (block-based index)
  Contains:
  - Index = 3 bid phrases
  - List = ad and bid phrase

Query = B

Block-based Index Structure cont'd

Advantages over the traditional method
• Similar bid phrases and their corresponding ads are placed together
• Merge operations become fewer or can even be avoided
• Expanding phrase B with phrases A and C is not efficient in the traditional method
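To make the difference concrete, here is a small Python sketch that expands query B with A and C under both layouts. The data is the corpus from the clustering slides; the block layout (one list of (ad, bid phrase) postings per cluster) and the names `inverted`, `block`, `expand_traditional` and `expand_block` are assumptions made for this illustration, not the paper's on-disk format.

```python
# Traditional inverted index: one posting list per bid phrase.
inverted = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

# Block-based index (assumed layout): one list for the whole cluster,
# where each posting keeps the ad together with its bid phrase.
block = sorted((ad, phrase) for phrase, ads in inverted.items() for ad in ads)

def expand_traditional(phrases):
    """Read one list per phrase and combine them; returns (ads, lists_read)."""
    ads = set()
    for phrase in phrases:
        ads.update(inverted[phrase])
    return sorted(ads), len(phrases)

def expand_block(phrases):
    """Scan the single block once; returns (ads, lists_read)."""
    wanted = set(phrases)
    ads = {ad for ad, phrase in block if phrase in wanted}
    return sorted(ads), 1

print(expand_traditional(["B", "A", "C"]))
print(expand_block(["B", "A", "C"]))
```

Both calls return the same ads, but the traditional layout reads three lists while the block layout reads one, which is the saving the slide refers to.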

Dictionaries

Dictionary D is used to record the mapping
• From a bid phrase to its corresponding artificial word
• Used to locate the block corresponding to a bid phrase

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2

Cluster path    Number of distinct ads

Dictionaries cont'd

Dictionary C (counter dictionary) is used to record the number of distinct ads per cluster

Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

Entries: (6, 2), (6_5, 4)
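The two dictionaries can be pictured as plain key-value maps. The Python sketch below rebuilds the counter dictionary from dictionary D and the corpus; it assumes that the part of an artificial word before the colon is the cluster path and that a cluster counts only the ads of bid phrases stored directly under that path, which is what the numbers on the slide suggest. The helper names are hypothetical.

```python
# Dictionary D from the slide: bid phrase -> artificial word "path:index".
D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

# Corpus data from the clustering slides: bid phrase -> ads.
phrase_ads = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def cluster_path(word):
    """Assumption: the cluster path is the part before the colon."""
    return word.split(":")[0]

def build_counter(D, phrase_ads):
    """Count the distinct ads of the bid phrases stored under each path."""
    ads_per_path = {}
    for phrase, word in D.items():
        ads_per_path.setdefault(cluster_path(word), set()).update(phrase_ads[phrase])
    return {path: len(ads) for path, ads in ads_per_path.items()}

C = build_counter(D, phrase_ads)
print(C)
```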

AdSearch cont'd

• Query processing
  - Finding Related Bid phrases with Corresponding Ads
  - Ranking Top-k Relevant Ads

Finding Related Bid phrases with Corresponding Ads

The process to find related bid phrases
• Input: user queries
• Look up the dictionary D to get the corresponding artificial words
• Find the minimum clusters that contain enough ads

Query: ABD

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2

Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

e.g. for the top 2 ads, M = 1.5 * 2 = 3
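The lookup can be sketched as follows. This is a guess at the mechanics rather than the paper's procedure: it treats the query ABD as the bid phrases A, B and D, takes M = 1.5 * k from the example, and assumes that a parent cluster is reached by dropping the last "_" component of the path. The names `required_ads` and `candidate_clusters` are hypothetical.

```python
import math

# Dictionary D and counter dictionary C from the slides.
D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
C = {"6": 2, "6_5": 4}

def required_ads(k, factor=1.5):
    """M from the slide's example: 1.5 times the number of wanted ads."""
    return math.ceil(k * factor)

def candidate_clusters(query_phrases, k):
    """Map each known query phrase to (cluster path, has_enough_ads)."""
    m = required_ads(k)
    result = {}
    for phrase in query_phrases:
        word = D.get(phrase)
        if word is None:
            continue  # e.g. "D" has no entry in dictionary D
        path = word.split(":")[0]
        # Assumption: move to the parent cluster while the count is too low.
        while C.get(path, 0) < m and "_" in path:
            path = path.rsplit("_", 1)[0]
        result[phrase] = (path, C.get(path, 0) >= m)
    return result

print(required_ads(2))
print(candidate_clusters(["A", "B", "D"], k=2))
```

With the slide's numbers, B resolves to cluster 6_5, which holds 4 distinct ads and so satisfies M = 3, while cluster 6 holds only 2.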


Finding Related Bid phrases with Corresponding Ads

The process to find related bid phrases
• Return clusters; those containing at least one bid phrase are stored in one group
• Perform a multi-way merge operation to get the final results

Ad             Ad1    Ad2    Ad3      Ad4
Bid phrases    A      B,C    A,B,C    C

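A multi-way merge over sorted posting lists can be written with Python's standard `heapq.merge`. The sketch below is illustrative only: it uses the corpus from the clustering slides, and the posting format (ad, bid phrase) and the name `multiway_merge` are assumptions for this example.

```python
import heapq
from itertools import groupby

# One sorted posting list per bid phrase; each posting is (ad, bid phrase).
postings = {
    "A": [("Ad0", "A"), ("Ad3", "A")],
    "B": [("Ad1", "B"), ("Ad2", "B"), ("Ad3", "B")],
    "C": [("Ad2", "C"), ("Ad3", "C"), ("Ad4", "C")],
}

def multiway_merge(lists):
    """Merge sorted posting lists and group the bid phrases by ad."""
    merged = heapq.merge(*lists)  # lazily merges already sorted inputs
    return {
        ad: [phrase for _, phrase in group]
        for ad, group in groupby(merged, key=lambda posting: posting[0])
    }

result = multiway_merge(postings.values())
for ad, phrases in result.items():
    print(ad, ",".join(phrases))
```

Because every input list is already sorted by ad, the merge visits each posting once and all postings for the same ad arrive next to each other.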

Ranking Top-k Relevant Ads

• A procedure expands the user query with related bid phrases and gets a list of ads

• To get the top k, use a scoring function
  - Q: the query
  - B(x): set of bid phrases related to x
  - Similarity between x and y
  - tfidf(y, ad): term frequency and inverse document frequency
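The formula itself is not preserved in the slide text, so the sketch below is only one plausible way to combine the listed ingredients: it assumes the score of an ad is the sum, over every query phrase x and every related phrase y in B(x), of the similarity between x and y times tfidf(y, ad). The tf-idf variant (tf is 1 when the ad carries the phrase, idf is the log of ads over ads carrying it), the similarity values (Jaccard values from the clustering slides, with a phrase fully similar to itself) and all names are assumptions for this example.

```python
import heapq
import math

# Bid phrases per ad, from the clustering slides.
ad_phrases = {
    "Ad0": {"A"},
    "Ad1": {"B"},
    "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"},
    "Ad4": {"C"},
}

# B(x) with similarities: here only for the query phrase "B" (assumed values).
related = {"B": {"B": 1.0, "A": 0.25, "C": 0.5}}

def tfidf(phrase, ad):
    """Assumed variant: tf = 1 if the ad carries the phrase, idf = log(N / df)."""
    if phrase not in ad_phrases[ad]:
        return 0.0
    df = sum(1 for phrases in ad_phrases.values() if phrase in phrases)
    return math.log(len(ad_phrases) / df)

def score(query, ad):
    """Assumed form: sum of similarity(x, y) * tfidf(y, ad) over x in Q, y in B(x)."""
    return sum(
        sim * tfidf(y, ad)
        for x in query
        for y, sim in related.get(x, {}).items()
    )

def top_k(query, k):
    """Return the k highest scoring ads."""
    return heapq.nlargest(k, ad_phrases, key=lambda ad: score(query, ad))

print(top_k(["B"], 2))
```

Under these assumptions an ad that carries several related bid phrases, such as Ad3, ranks above an ad that carries only the query phrase.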

Experimental evaluation

Both Chinese and English

Experimental evaluation cont'd

Name | Description
CQS1 (Chinese) or EQS1 (English) | Randomly sampled 100 bid phrases; each bid phrase is associated with a few distinct ads
CQS2 (Chinese) or EQS2 (English) | Selected 100 pairs of bid phrases; each pair could return ads associated with both bid phrases in it
CQS3 (Chinese) or EQS3 (English) | Constructed similarly, with queries composed of 3 to 4 bid phrases
CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set) | 100 popular bid phrases were used to build the CQF and EQF

Evaluation of the clusters step

Efficiency evaluation

• AdSearch was implemented with fixed and unfixed block sizes

• The block size is defined as the fraction of distinct ads in the block with respect to all ads

• AdSearch(0.001): the number of distinct ads in each block is 0.001 of all ads
  - For example, for the Chinese data set: 524,868 * 0.001 ≈ 525

• Inv: performs query expansion on top of the traditional inverted index
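The block size arithmetic is a single multiplication. A tiny Python sketch, assuming the result is rounded to the nearest whole ad (the slides only show the final value, so the rounding mode is a guess, and the function name is hypothetical):

```python
def ads_per_block(total_ads, fraction):
    """Number of distinct ads per block for a block size given as a fraction."""
    return round(total_ads * fraction)

# Chinese data set from the slide: 524,868 ads, block size 0.001.
print(ads_per_block(524_868, 0.001))
```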

Effectiveness evaluation

• Randomly selected 50 queries
• 10 people were invited to evaluate the ads returned by AdSearch and Baidu

Effectiveness evaluation

Conclusion

Introduced the AdSearch system, which consists of

• Bid phrase clustering
  - Constructs a bipartite graph from the bid phrases and ads
  - Uses agglomerative iterative clustering to cluster similar ads

• Index structure for efficient ad search
  - Uses a block-based index structure to index all ads and bid phrases
  - Uses the dictionary to record mappings between bid phrases and ads

• Query processing
  - Explained how ads are retrieved and ranked to get the top-k results

THANK YOU

[Figure: diagram for the query Q = “job training”, with regions labelled All Docs, Relevant Docs (R), Relevant Ads, and Relevant Ads in the Ads set (Ra)]
