Evaluating Query-Independent Object Features for Relevancy Prediction
Andres R. Masegosa1, Hideo Joho2, Joemon Jose2
1 Department of Computer Science and A.I., University of Granada, Spain
2 Department of Computing Science, University of Glasgow, UK
ECIR’07: Rome, 5th April 2007
Outline
1. Introduction
2. Methodology
2.1. Conceptual Categories of Object Features
2.2. Probabilistic Classification Approach
2.3. Feature Selection and Validation Scheme
3. Experiments
3.1. Effect of Contextual Features
3.2. Effect of Feature Selection and Combination
3.3. Effect of Highly Relevant Documents
4. Discussion
5. Conclusions and Future Work
Background
IR in contexts
To gain further improvements in retrieval effectiveness and user experience
To make an IR system more adaptive to a search environment (i.e., context-aware)
Many potential contexts proposed
Work task, searcher, interaction, system, document, environment, temporality, etc. (See Ingwersen and Järvelin, 2005)
How can we effectively find significant contexts and use them in IR?
Background (cont’d)

Contexts are abstract.
We need tangible variables that can represent a context in order to offer context-aware retrieval and user support in interactive IR.
How can we determine which tangible variables are effective at representing a context?
Many potential contexts
Many potential tangible variables
Our approach
Machine learning techniques as a diagnostic tool
Group click-through documents to represent contexts
Train classifiers with candidate variables to predict document relevancy
Prediction accuracy as a measure of variables’ effectiveness
Click-through documents and relevance judgements were collected from a user study
More details later
Our assumptions

If a context is significant, the effect of the context can be represented in the relevance of retrieved documents.
If a variable is effective, it can increase the power of discriminating relevant documents from non-relevant ones in a context.
Focus of this study

As a preliminary study of our approach, we focused on:
Topics as an example of contexts
Query-independent document features as candidate variables
Investigation of other contexts and variables is under way, but not presented here.
Methodology I: Introduction

Around 150 features were extracted as candidate features for relevancy prediction.
Based on informal experiments and a literature survey.
Features such as the number of digits, number of words, number of bold tags, number of links, PageRank value, etc.
Grouped into four independent functional groups based on their role in a document.
Methodology II: Conceptual Categories of Object Features
A. Document Textual Features (DOC): 14 features.
B. Visual/Layout Features:
B.1. Visual Appearance (V-VS): 28 features.
B.2. Visual HTML Tags (V-TG): 27 features.
B.3. Visual HTML Attributes (V-AT): 16 features.
C. Structural Features (STR): 18 features.
D. Other Selective Features:
D.1. Selective Words in anchor texts (O-AC): 11 features.
D.2. Selective Words in document (O-WD): 11 features.
D.3. Selective HTML tags (O-TG): 7 features.
D.4. Selective HTML attributes (O-AT): 16 features.
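To make the categories concrete, here is a minimal sketch of how a few such query-independent features could be counted from raw HTML. It uses only Python's standard library; the feature names and their mapping to the categories above are illustrative assumptions, not the authors' actual extraction code.

```python
import re
from html.parser import HTMLParser

class FeatureExtractor(HTMLParser):
    """Collects tag counts, attribute counts, and visible text from HTML."""
    def __init__(self):
        super().__init__()
        self.tag_counts = {}   # counts per HTML tag (V-TG-style features)
        self.attr_counts = {}  # counts per attribute name (V-AT-style features)
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        for name, _ in attrs:
            self.attr_counts[name] = self.attr_counts.get(name, 0) + 1

    def handle_data(self, data):
        self.text.append(data)

def extract_features(html: str) -> dict:
    p = FeatureExtractor()
    p.feed(html)
    text = " ".join(p.text)
    return {
        "n_words": len(text.split()),             # DOC-style textual feature
        "n_digits": sum(c.isdigit() for c in text),
        "n_bold_tags": p.tag_counts.get("b", 0),  # V-TG-style feature
        "n_links": p.tag_counts.get("a", 0),      # STR-style feature
        "n_distinct_attrs": len(p.attr_counts),   # V-AT-style feature
    }

print(extract_features("<html><body><b>IR</b> <a href='x'>paper 1</a></body></html>"))
```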
Methodology III: Probabilistic Classification Approach

Example topic: "IR related papers". From the document corpus, per-word relevance statistics are estimated over the documents' words (features):

Word      Rel    non-Rel
"IR"      96%    4%
"office"  16%    84%
...

The classifier estimates P(R | Word1, Word2, ..., Wordn). For example, P(R | "IR", no "office", ...) = (0.99, 0.01), so the document is predicted relevant.
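A minimal sketch of this estimate under the naive Bayes independence assumption, reusing the toy numbers from the table above (the uniform class prior is an assumption):

```python
import math

# Illustrative per-word likelihoods from the slide: P(word present | class).
p_word_given_rel    = {"IR": 0.96, "office": 0.16}
p_word_given_nonrel = {"IR": 0.04, "office": 0.84}
prior_rel = 0.5  # assumed uniform prior over {R, NR}

def p_relevant(words_present: set) -> float:
    """Naive-Bayes estimate of P(R | Word1, ..., Wordn), assuming
    conditional independence of word-presence features given the class."""
    log_r  = math.log(prior_rel)
    log_nr = math.log(1 - prior_rel)
    for word in p_word_given_rel:
        if word in words_present:
            log_r  += math.log(p_word_given_rel[word])
            log_nr += math.log(p_word_given_nonrel[word])
        else:  # word absent: use the complement probability
            log_r  += math.log(1 - p_word_given_rel[word])
            log_nr += math.log(1 - p_word_given_nonrel[word])
    r, nr = math.exp(log_r), math.exp(log_nr)
    return r / (r + nr)

# Document containing "IR" but not "office", as in the slide:
print(round(p_relevant({"IR"}), 2))  # ~0.99 -> predicted relevant
```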
Methodology IV: Classifiers Used

Four different classifiers were used. They estimate the probability distribution in different ways (changing the assumptions).
Classifiers:
Naive Bayes: assumes features are independent given the class.
AODE: allows a single dependency among features.
HNB: assumes a hidden variable.
K2-MDL: learns a general probabilistic network.
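For reference, the standard factorisations behind the first two classifiers; these are textbook definitions, not formulas taken from the paper:

```latex
% Naive Bayes: features are conditionally independent given the class R
\hat{P}(R \mid f_1,\dots,f_n) \;\propto\; P(R)\,\prod_{i=1}^{n} P(f_i \mid R)

% AODE: average over one-dependence models, each conditioning every
% feature on the class and on a single parent feature f_j
\hat{P}(R, f_1,\dots,f_n) \;=\; \frac{1}{n}\sum_{j=1}^{n} P(R, f_j)\,\prod_{i=1}^{n} P(f_i \mid R, f_j)
```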
Methodology V: Feature Selection Scheme

[Diagram: greedy forward selection. Features (F1-F7) are divided into a non-selected set and a selected set; each candidate move into the selected set is scored by the resulting classifier accuracy (78%, 57%, 63%).]

Methodology VI: Feature Selection Scheme

[Diagram: further candidates are scored (77%, 69%) and the best-scoring feature (F3) is moved into the selected set, raising the classifier score to 86%.]
Methodology VII: Feature Selection Scheme

This FS method depends on the data set: changing the data set changes the selected feature set.
FS is therefore applied 100 times, each time on a random 90% of the instances of the data set.
A feature may or may not be selected in each run.
Each feature thus has a percentage of selection, which is used as a confidence threshold.
The feature selection scheme is applied over each Conceptual Category (see the sketch below).
[Diagram: the final selected feature set (F3, F4, F5, F6, F7).]
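A minimal sketch of this two-level scheme: greedy forward selection wrapped in repeated 90% subsampling. The classifier (scikit-learn's GaussianNB), the 5-fold inner scoring, and the toy data are all stand-in assumptions for the paper's Bayesian classifiers and real feature sets.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def forward_selection(X, y, feature_names):
    """Greedy forward selection as in the diagrams: repeatedly move the
    candidate whose addition yields the best cross-validated accuracy
    into the selected set, while the score keeps improving."""
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = [(cross_val_score(GaussianNB(), X[:, selected + [j]], y,
                                   cv=5).mean(), j) for j in remaining]
        top, j = max(scores)
        if top <= best:
            break
        best = top
        selected.append(j)
        remaining.remove(j)
    return [feature_names[j] for j in selected]

def stable_features(X, y, names, runs=100, threshold=0.9):
    """Stability wrapper from Methodology VII: run FS `runs` times on random
    90% subsamples; keep features selected in more than `threshold` of runs."""
    counts = {n: 0 for n in names}
    n = len(y)
    for _ in range(runs):
        idx = rng.choice(n, size=int(0.9 * n), replace=False)
        for f in forward_selection(X[idx], y[idx], names):
            counts[f] += 1
    return [f for f, c in counts.items() if c / runs > threshold]

# Toy data: 2 informative features out of 5 (illustrative only).
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5))
X[:, 0] += y          # informative feature
X[:, 3] += 0.8 * y    # informative feature
# runs=20 keeps the demo fast; the study used 100 runs.
print(stable_features(X, y, [f"F{i+1}" for i in range(5)], runs=20))
```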
Methodology VIII: Validation Scheme

[Diagram: labelled data (feature vectors F1-F4 plus a Relevance label, R/NR) are split into a training set and a test set. A classifier trained on the training set predicts a label for each test document, and the predictions are compared against the true labels: with 6 of the 10 test documents predicted correctly, accuracy = 6/10 = 60%.]
Methodology IX: Validation Scheme

The training set contains 90% of the data set and the test set the remaining 10%.
Random sampling is carried out 100 times, and the final estimated accuracy is the mean of these estimations.
Concretely, 10-fold cross-validation was repeated 10 times (giving the 100 train/test estimations).
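A minimal sketch of this validation scheme with scikit-learn: repeated stratified 10-fold cross-validation produces 100 train/test splits at the 90/10 proportion, and the mean over the per-split accuracies is the final estimate. The classifier and the toy data are stand-in assumptions.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Toy stand-in data (illustrative only): 200 documents, 5 features.
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += y  # one informative feature

# 10-fold CV repeated 10 times = 100 train/test splits, each training
# on ~90% of the data and testing on the held-out ~10%.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)

# The final estimated accuracy is the mean over the 100 estimations.
print(f"accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```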
Experiments I: Data Set

1038 click-through documents were extracted from a user study with 24 participants.
Each participant was given four topics and asked to bookmark perceived relevant documents.
375 unique documents were relevant and 362 unique documents were non-relevant.
Baseline performance was 50.9%.
In the per-topic data sets, the baselines were:
Topic 1: 50.0% (corrected from 60.6% by re-sampling)
Topic 2: 52.1%, Topic 3: 55.2%, Topic 4: 51.7%
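The 50.9% baseline is simply the majority-class rate over the 375 relevant and 362 non-relevant unique documents (the Topic 1 correction presumably re-samples until the classes are balanced, bringing this rate to ~50%). A quick check:

```python
from collections import Counter

# The baseline is the accuracy of always predicting the majority class.
labels = ["R"] * 375 + ["NR"] * 362
majority = Counter(labels).most_common(1)[0][1]
print(f"baseline = {majority / len(labels):.3f}")  # 375/737 ~ 0.509
```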
Experiments II: Effect of Contextual Features

The results table shows the relative improvement with respect to the baseline performance.
Significant improvements are in bold.
Experiments III: Effect of Feature Selection

Features selected in more than 90% of the runs were kept.
Larger improvements were observed on individual topics.
Experiments IV: Effect of Feature Combination

The selected features from each category were taken and used all together.
This was found to be stable across topics (except Topic 4).
Experiments V: Effect of Highly Relevant Documents

Highly Relevant Document: judged as relevant by at least two participants on the same topic.
Not a perfect definition, but found to yield a better data set for mining significant features (a sketch of deriving such a set follows below).
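A minimal sketch of deriving such a set from per-participant judgements; the (topic, document) pairs below are hypothetical:

```python
from collections import Counter

# Hypothetical (topic, document) relevance judgements, one per participant.
judgements = [
    ("topic1", "docA"), ("topic1", "docA"), ("topic1", "docB"),
    ("topic2", "docA"), ("topic2", "docC"), ("topic2", "docC"),
]

# A document is "highly relevant" for a topic if at least two
# participants judged it relevant on that same topic.
counts = Counter(judgements)
highly_relevant = {pair for pair, n in counts.items() if n >= 2}
print(highly_relevant)  # {('topic1', 'docA'), ('topic2', 'docC')} (order may vary)
```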
The effect with the “combined features” is shown in the results table.
Discussion I: Effectiveness of QI Features

In the topic-independent set, textual features and visual/layout HTML attributes were significant.
The effectiveness of QI features varies across topics.
Feature selection and combination can improve the prediction accuracy and robustness.
Highly relevant documents are a promising data set for eliciting significant features.
Discussion II: Example of significant features
Conclusion & Future Work

Presented an approach to mining significant contextual features.
Investigated the effectiveness of query-independent features for relevancy prediction.
Machine learning techniques allowed us to examine a large number of candidate features.
Disadvantage: significant features are sometimes difficult to interpret.
A promising approach to finding significant contextual features.
Conclusion & Future Work

Implications of our study:
Categorisation of search topics can facilitate the mining of significant contextual features.
Exploiting highly relevant documents can help find more robust contextual features.

Future work:
Investigate other data (e.g., transaction logs, user feedback, etc.).
Develop adaptive search models for re-ranking, grouping search results, recommendation, etc.