Context-Aware Query Classification
Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang
Microsoft Research Asia
SIGIR 2009
2010.04.27
Summarized and presented by Sang-il Song, IDS Lab., Seoul National University
Copyright 2010 by CEBT 2
Query Classification
  Query Classification (QC)
    Understanding the user's search intent
    Classifying user queries into predefined target categories
  Difference from traditional text classification
    – Queries are usually very short
    – Many queries are ambiguous, so a query may belong to multiple categories
  Approaches
    – Augmenting the queries with extra data (search results)
    – Leveraging unlabeled data to help improve the accuracy of supervised learning
    – Expanding training data by automatically labeling some queries in click-through data via self-training
  These approaches do not consider user behavior history
Context Query Classification
  Motivation example
    Query "jaguar" without context
      – Ambiguous whether the user is interested in "car" or "animal"
    Query "jaguar" before "BMW"
      – Clear that the user is interested in "car"
  Context information
    Adjacent queries
    Clicked URLs
  This paper models context information with a conditional random field (CRF)
User Session
  User search session
    Series of observations
    Each observation consists of a query and the set of URLs clicked by the user for that query
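The session definition above can be sketched as a small data structure. This is only an illustration: the class name `Observation`, its field names, and the pairing of each query with one clicked URL are assumptions, with the example strings borrowed from the "world cup" example later in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """One step of a search session: a query plus the URLs clicked for it."""
    query: str
    clicked_urls: List[str] = field(default_factory=list)


# A session is an ordered series of observations (hypothetical pairing of
# queries and clicked URLs, for illustration only).
session = [
    Observation("world cup", ["worldcup.fifa.com"]),
    Observation("fifa", ["fifa10.ea.com"]),
    Observation("fifa news", ["fifaworldcup.ea.com"]),
]

for step, obs in enumerate(session, start=1):
    print(step, obs.query, obs.clicked_urls)
```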
Taxonomy
  Tree of categories
  Each node corresponds to a predefined category
Conditional Random Field
  Undirected graphical model conditioned on the input sequence
  p_ij depends on the feature functions
  Motivation for using CRF
    Suitable for capturing context information
    Does not need any prior knowledge
    Flexible to richer features
  [Figure: graph over states s1 to s4 with an edge weight p_ij for every pair of states i, j]
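Context-Aware QC with CRF
  [Figure: example session labeled with the categories "soccer" and "game"]
  Queries and clicked URLs, in order: "world cup", worldcup.fifa.com; "fifa", fifa10.ea.com; "fifa news", fifaworldcup.ea.com
  Label probabilities for the first query: 0.8, 0.2
  Conditional probabilities for the second query: 0.3, 0.7 | 0.05, 0.95
  Path probabilities after two queries: 0.24, 0.56, 0.01, 0.19
  Conditional probabilities for the third query: 0.7, 0.3 | 0.4, 0.6 | 0.7, 0.3 | 0.4, 0.6
  Path probabilities after three queries: 0.168, 0.072, 0.224, 0.336, 0.007, 0.003, 0.076, 0.114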
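The numbers in the figure are consistent with multiplying probabilities along each label path, for example 0.8 × 0.3 = 0.24 and 0.24 × 0.7 = 0.168. The sketch below reproduces that arithmetic. It treats the values as normalized probabilities in a simple chain, which is a simplification of the CRF; the variable names are illustrative, and which label index corresponds to "soccer" or "game" is left open because the figure itself is not recoverable.

```python
# Values read off the figure; index 0 and 1 stand for the two category labels
# (assumption: the mapping of indices to "soccer"/"game" is not recoverable).
first = [0.8, 0.2]
second_given_first = [[0.3, 0.7], [0.05, 0.95]]
third_given_second = [[0.7, 0.3], [0.4, 0.6]]

# Probability of every label path after two queries.
two_step = [
    first[i] * second_given_first[i][j]
    for i in range(2)
    for j in range(2)
]

# Probability of every label path after three queries (last index varies fastest).
three_step = [
    first[i] * second_given_first[i][j] * third_given_second[j][k]
    for i in range(2)
    for j in range(2)
    for k in range(2)
]

# Summing over all earlier labels gives the marginal for the last query's label.
last_label_marginal = [sum(three_step[k::2]) for k in range(2)]

print([round(p, 3) for p in two_step])
print([round(p, 3) for p in three_step])
print([round(p, 3) for p in last_label_marginal])
```

The last line illustrates how context changes the decision: the label of the final query is chosen from a distribution that already accounts for every earlier label path.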
Conditional Probability
  Category label sequence c
  Observation sequence o
  Conditional probability of c given o
    – Z(o): normalization factor
  Potential function
    – f_k: feature function
    – λ_k: weight of f_k
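The formulas on this slide did not survive extraction. Assuming the standard linear-chain CRF form, which matches the quantities named above (Z(o), f_k, λ_k), they would read roughly as follows. The sequence length T and the potential symbol φ are notation assumed here, not taken from the slide.

```latex
\[
p(\mathbf{c} \mid \mathbf{o})
  = \frac{1}{Z(\mathbf{o})} \prod_{t=1}^{T} \phi(c_{t-1}, c_t, \mathbf{o}),
\qquad
Z(\mathbf{o}) = \sum_{\mathbf{c}'} \prod_{t=1}^{T} \phi(c'_{t-1}, c'_t, \mathbf{o})
\]
\[
\phi(c_{t-1}, c_t, \mathbf{o})
  = \exp\Big( \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{o}) \Big)
\]
```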
Training and Classification
  Training
    Given training data
    Objective
      – Find the set of parameters that maximizes the conditional log-likelihood
  Classification
    Inferring the category label c_t for the test query
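The objective and the inference rule were shown as formulas that are missing here. Under the usual CRF training setup they could be written as below. The training-set size N, the superscript (i) for the i-th training session, and the use of the marginal of c_t in the argmax are assumptions, not details confirmed by the slide.

```latex
\[
\hat{\lambda}
  = \arg\max_{\lambda} \sum_{i=1}^{N}
    \log p\big(\mathbf{c}^{(i)} \mid \mathbf{o}^{(i)}; \lambda\big)
\]
\[
c_t^{*} = \arg\max_{c_t} \; p\big(c_t \mid \mathbf{o}; \hat{\lambda}\big)
\]
```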
Features
  Feature type        Feature                                             What does it use?
  Local feature       Query terms                                         Query terms
  Local feature       Pseudo feedback                                     External Web directory
  Local feature       Implicit feedback                                   External Web directory + click information
  Contextual feature  Direct association between adjacent labels          Previous labels
  Contextual feature  Taxonomy-based association between adjacent labels  Taxonomy structure
Local Feature
  Query terms
    Elementary feature
    Too sparse: training data cannot cover the terms sufficiently
  Pseudo feedback
    Using the top M results returned by an external Web directory
    Mapping each result's category label to a category in the target taxonomy
    General label confidence
      – Based on the number of returned related search results of the query whose category labels, after mapping, match the candidate category
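A minimal sketch of the general label confidence follows. It counts how many of the top M directory results map to each target category and, as an assumption, divides by M to get a score between 0 and 1. The function name, the directory labels, and the mapping table are all hypothetical.

```python
from collections import Counter


def general_label_confidence(result_labels, label_map, top_m):
    """Score each target category by the share of top-M results mapped to it.

    result_labels: directory category labels of the returned results, in rank order.
    label_map: mapping from directory labels to target-taxonomy categories.
    Dividing by top_m is an assumption; the slide only mentions the count.
    """
    mapped = [label_map[label] for label in result_labels[:top_m] if label in label_map]
    counts = Counter(mapped)
    return {category: n / top_m for category, n in counts.items()}


# Hypothetical directory labels and mapping, for illustration only.
label_map = {
    "Sports/Soccer": "soccer",
    "Sports/Events": "soccer",
    "Games/Video": "game",
}
results = ["Sports/Soccer", "Games/Video", "Sports/Soccer", "Sports/Events"]
confidence = general_label_confidence(results, label_map, top_m=4)
print(confidence)
```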
Local Features (contd.)
  Implicit feedback
    Similar to pseudo feedback, but using click information
    Click-based label confidence score
    Calculating
      1. Using the Web directory, get the corresponding categories
      2. Obtain a document collection for each possible query
      3. Build a vector space model for each category
      4. Use the cosine similarity between the term vector of each category and the snippets of the clicked URLs
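Steps 3 and 4 above can be sketched with plain term-frequency vectors. The slide does not say how terms are weighted or how several clicked snippets are combined, so raw counts and simple concatenation are assumptions here, and all function names and example texts are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (raw counts; weighting is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def click_based_confidence(category_docs, clicked_snippets):
    """Score each category by similarity between its documents and the clicked snippets."""
    snippet_vec = term_vector(" ".join(clicked_snippets))
    return {
        category: cosine(term_vector(" ".join(docs)), snippet_vec)
        for category, docs in category_docs.items()
    }


# Hypothetical category documents and clicked snippet, for illustration only.
category_docs = {
    "soccer": ["world cup match football team"],
    "game": ["video game console play download"],
}
scores = click_based_confidence(category_docs, ["fifa 10 video game download"])
print(scores)
```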
Contextual Features
  Direct association between adjacent labels
    Using the occurrence of a pair of adjacent labels
    The higher the weight, the larger the probability that the previous label transits into the current label
  Taxonomy-based association between adjacent labels
    Limited by the size of the training data, some transitions may not occur
    Using the structure of the taxonomy
    The association between two sibling categories is stronger than that between two non-sibling categories
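Both contextual features can be illustrated with a few lines of code: counting adjacent label pairs in labeled sessions, and checking whether two categories share a parent in the taxonomy. This is a sketch under assumptions: categories are written as backslash-separated paths (as in "Living\Travel & Vacation"), the function names are hypothetical, and the way the model turns these signals into weighted features is not shown.

```python
from collections import Counter


def adjacent_pair_counts(label_sequences):
    """Count how often each (previous label, current label) pair occurs."""
    counts = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def parent(path, sep="\\"):
    """Parent of a category path; top-level categories get the empty root."""
    return path.rpartition(sep)[0]


def are_siblings(a, b, sep="\\"):
    """Two distinct categories are siblings if they share the same parent."""
    return a != b and parent(a, sep) == parent(b, sep)


# Hypothetical labeled sessions, for illustration only.
pair_counts = adjacent_pair_counts([
    ["soccer", "game", "game"],
    ["soccer", "game"],
])
print(pair_counts)

# "Living\Food & Cooking" is a made-up category used only for this example.
print(are_siblings("Living\\Travel & Vacation", "Living\\Food & Cooking"))
print(are_siblings("Living\\Travel & Vacation", "Information\\Local & Regional"))
```

The sibling check gives a fallback signal for label pairs that never appear together in the training sessions, which is the gap the taxonomy-based feature is meant to cover.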
Experimental Setup
  Taxonomy of ACM KDD Cup'05
    Target taxonomy
    7 level-one categories
    67 level-two categories
  Data set
    Extracting 10,000 sessions from one day's search log
    Each session contains at least two queries
    Three human labelers label the queries of each session
Baseline
  Bridging classifier (BC)
    Training a classifier on an intermediate taxonomy
    Bridging the queries and the target taxonomy in the online step of QC
    Outperforming the winning approach in KDD Cup'05
  Collaborating classifier (CC)
    Naïve context-aware approach
    Defining the score function of query q and category c by BC
    Using the current query and the past query, plus the association between the previous category and the estimated category
Evaluation
  For each test query with a known true category label
  The classification result is the set of the top K predicted category labels
  Recall
  Precision
  F1 score
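The metric formulas are missing from this slide. One plausible reading, assumed here, is that a query counts as a hit when its true label is among the top K predictions, recall is hits divided by the number of queries, precision is hits divided by the total number of predicted labels, and F1 is their harmonic mean. The function name and the toy data are hypothetical.

```python
def evaluate_top_k(true_labels, ranked_predictions, k):
    """Top-K precision, recall and F1 under the assumed definitions above."""
    n = len(true_labels)
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    recall = hits / n
    precision = hits / (n * k)  # every query contributes K predicted labels
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data, for illustration only.
truth = ["a", "b", "c", "d"]
predictions = [["a", "x"], ["y", "b"], ["x", "y"], ["d", "z"]]

for k in (1, 2):
    print(k, evaluate_top_k(truth, predictions, k))
```

Under these definitions a larger K can only raise recall, while precision is penalized for every extra label returned.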
Results
  CRF-B: CRF with basic features
    – Query terms, general label confidence, and direct association between adjacent labels
  CRF-B-C: CRF-B + click-based label confidence
  CRF-B-C-T: CRF-B-C + taxonomy-based association
  [Figure: the average overall recall]
Results (contd.)
  [Figure: the average overall F1 score]
  [Figure: the average overall precision]
Case Study
  Without considering context, the query has many possible search intents
    – General information about Santa Fe => Information\Local & Regional
    – Travel information about Santa Fe => Living\Travel & Vacation
Conclusions
  Novel approach for leveraging context information to classify queries by modeling search context through CRFs
  This approach consistently outperforms a non-context-aware baseline and a naïve context-aware baseline
  The results show the effectiveness of context information
Discussions
  Experiments on a real data set clearly show that this approach outperforms the non-context-aware baseline
  The first-query problem
    No search context is available if the query is at the beginning of the session
  Experiments are too simple
    Size of session
    Height of taxonomy
Q & A
Thank you