Evaluating Query-Independent Object Features for Relevancy Prediction
Andres R. Masegosa1, Hideo Joho2, Joemon Jose2
1 Department of Computer Science and A.I., University of Granada, Spain
2 Department of Computing Science, University of Glasgow, UK
ECIR’07: Rome, 5th April 2007
Outline
1. Introduction
2. Methodology
2.1. Conceptual Categories of Object Features
2.2. Probabilistic Classification Approach
2.3. Feature Selection and Validation Scheme
3. Experiments
3.1. Effect of Contextual Features
3.2. Effect of Feature Selection and Combination
3.3. Effect of Highly Relevant Documents
4. Discussion
5. Conclusions and Future Work
Background
IR in contexts
To gain further improvements in retrieval effectiveness and user experience
To make an IR system more adaptive to a search environment (i.e., context-aware)
Many potential contexts proposed
Work task, searcher, interaction, system, document, environment, temporality, etc. (See Ingwersen and Järvelin, 2005)
How can we effectively find significant contexts and use them in IR?
Background (cont’d)

Contexts are abstract.
We need tangible variables that can represent a context in order to offer context-aware retrieval and user support in interactive IR.
How can we determine which tangible variables are effective at representing a context?
Many potential contexts
Many potential tangible variables
Our approach
Machine learning techniques as a diagnostic tool
Group click-through documents to represent contexts
Train classifiers with candidate variables to predict document relevancy
Prediction accuracy as a measure of variables’ effectiveness
Click-through documents and relevance judgements were collected from a user study
More details later
Our assumptions

If a context is significant, the effect of the context can be represented in the relevance of retrieved documents.
If a variable is effective, it can increase the power of discriminating relevant documents from non-relevant ones in a context.
Focus of this study

As a preliminary study of our approach, we focused on:
Topics as an example of contexts
Query-independent document features as candidate variables
Investigation of other contexts and variables is under way, but not presented here.
Methodology I: Introduction

Around 150 features were extracted as candidate features for relevancy prediction.
Based on informal experiments and a literature survey.
Features such as the number of digits, number of words, number of bold tags, number of links, PageRank value, etc.
Grouped into four independent functional groups based on their role in a document.
Methodology II: Conceptual Categories of Object Features
A. Document Textual Features (DOC): 14 features.
B. Visual/Layout Features:
B.1. Visual Appearance (V-VS): 28 features.
B.2. Visual HTML Tags (V-TG): 27 features.
B.3. Visual HTML Attributes (V-AT): 16 features.
C. Structural Features (STR): 18 features.
D. Other Selective Features:
D.1. Selective Words in anchor texts (O-AC): 11 features.
D.2. Selective Words in document (O-WD): 11 features.
D.3. Selective HTML tags (O-TG): 7 features.
D.4. Selective HTML attributes (O-AT): 16 features.
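To make the categories concrete, here is a minimal sketch of how a few such query-independent features could be counted from raw HTML. It uses only Python's standard library; the feature names and their mapping to the categories above are illustrative assumptions, not the authors' actual extraction code.

```python
import re
from html.parser import HTMLParser

class FeatureExtractor(HTMLParser):
    """Collects tag counts, attribute counts, and visible text from HTML."""
    def __init__(self):
        super().__init__()
        self.tag_counts = {}   # counts per HTML tag (V-TG-style features)
        self.attr_counts = {}  # counts per attribute name (V-AT-style features)
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        for name, _ in attrs:
            self.attr_counts[name] = self.attr_counts.get(name, 0) + 1

    def handle_data(self, data):
        self.text.append(data)

def extract_features(html: str) -> dict:
    p = FeatureExtractor()
    p.feed(html)
    text = " ".join(p.text)
    return {
        "n_words": len(text.split()),             # DOC-style textual feature
        "n_digits": sum(c.isdigit() for c in text),
        "n_bold_tags": p.tag_counts.get("b", 0),  # V-TG-style feature
        "n_links": p.tag_counts.get("a", 0),      # STR-style feature
        "n_distinct_attrs": len(p.attr_counts),   # V-AT-style feature
    }

print(extract_features("<html><body><b>IR</b> <a href='x'>paper 1</a></body></html>"))
```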
Methodology III: Probabilistic Classification Approach

Example topic: "IR related papers". From the document corpus, per-word relevance statistics are estimated over the documents' words (features):

Word      Rel    non-Rel
"IR"      96%    4%
"office"  16%    84%
...

The classifier estimates P(R | Word1, Word2, ..., Wordn). For example, P(R | "IR", no "office", ...) = (0.99, 0.01), so the document is predicted relevant.
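A minimal sketch of this estimate under the naive Bayes independence assumption, reusing the toy numbers from the table above (the uniform class prior is an assumption):

```python
import math

# Illustrative per-word likelihoods from the slide: P(word present | class).
p_word_given_rel    = {"IR": 0.96, "office": 0.16}
p_word_given_nonrel = {"IR": 0.04, "office": 0.84}
prior_rel = 0.5  # assumed uniform prior over {R, NR}

def p_relevant(words_present: set) -> float:
    """Naive-Bayes estimate of P(R | Word1, ..., Wordn), assuming
    conditional independence of word-presence features given the class."""
    log_r  = math.log(prior_rel)
    log_nr = math.log(1 - prior_rel)
    for word in p_word_given_rel:
        if word in words_present:
            log_r  += math.log(p_word_given_rel[word])
            log_nr += math.log(p_word_given_nonrel[word])
        else:  # word absent: use the complement probability
            log_r  += math.log(1 - p_word_given_rel[word])
            log_nr += math.log(1 - p_word_given_nonrel[word])
    r, nr = math.exp(log_r), math.exp(log_nr)
    return r / (r + nr)

# Document containing "IR" but not "office", as in the slide:
print(round(p_relevant({"IR"}), 2))  # ~0.99 -> predicted relevant
```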
Methodology IV: Classifiers Used

Four different classifiers were used. They estimate the probability distribution in different ways (changing the assumptions).
Classifiers:
Naive Bayes: assumes features are independent given the class.
AODE: allows a single dependency among features.
HNB: assumes a hidden variable.
K2-MDL: learns a general probabilistic network.
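For reference, the standard factorisations behind the first two classifiers; these are textbook definitions, not formulas taken from the paper:

```latex
% Naive Bayes: features are conditionally independent given the class R
\hat{P}(R \mid f_1,\dots,f_n) \;\propto\; P(R)\,\prod_{i=1}^{n} P(f_i \mid R)

% AODE: average over one-dependence models, each conditioning every
% feature on the class and on a single parent feature f_j
\hat{P}(R, f_1,\dots,f_n) \;=\; \frac{1}{n}\sum_{j=1}^{n} P(R, f_j)\,\prod_{i=1}^{n} P(f_i \mid R, f_j)
```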
Methodology V: Feature Selection Scheme

[Diagram: greedy forward selection. Features (F1-F7) are divided into a non-selected set and a selected set; each candidate move into the selected set is scored by the resulting classifier accuracy (78%, 57%, 63%).]

Methodology VI: Feature Selection Scheme

[Diagram: further candidates are scored (77%, 69%) and the best-scoring feature (F3) is moved into the selected set, raising the classifier score to 86%.]
Methodology VII: Feature Selection Scheme

This FS method depends on the data set: changing the data set changes the selected feature set.
FS is therefore applied 100 times, each time on a random 90% of the instances of the data set.
A feature may or may not be selected in each run.
Each feature thus has a percentage of selection, which is used as a confidence threshold.
The feature selection scheme is applied over each Conceptual Category (see the sketch below).
[Diagram: the final selected feature set (F3, F4, F5, F6, F7).]
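A minimal sketch of this two-level scheme: greedy forward selection wrapped in repeated 90% subsampling. The classifier (scikit-learn's GaussianNB), the 5-fold inner scoring, and the toy data are all stand-in assumptions for the paper's Bayesian classifiers and real feature sets.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def forward_selection(X, y, feature_names):
    """Greedy forward selection as in the diagrams: repeatedly move the
    candidate whose addition yields the best cross-validated accuracy
    into the selected set, while the score keeps improving."""
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = [(cross_val_score(GaussianNB(), X[:, selected + [j]], y,
                                   cv=5).mean(), j) for j in remaining]
        top, j = max(scores)
        if top <= best:
            break
        best = top
        selected.append(j)
        remaining.remove(j)
    return [feature_names[j] for j in selected]

def stable_features(X, y, names, runs=100, threshold=0.9):
    """Stability wrapper from Methodology VII: run FS `runs` times on random
    90% subsamples; keep features selected in more than `threshold` of runs."""
    counts = {n: 0 for n in names}
    n = len(y)
    for _ in range(runs):
        idx = rng.choice(n, size=int(0.9 * n), replace=False)
        for f in forward_selection(X[idx], y[idx], names):
            counts[f] += 1
    return [f for f, c in counts.items() if c / runs > threshold]

# Toy data: 2 informative features out of 5 (illustrative only).
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5))
X[:, 0] += y          # informative feature
X[:, 3] += 0.8 * y    # informative feature
# runs=20 keeps the demo fast; the study used 100 runs.
print(stable_features(X, y, [f"F{i+1}" for i in range(5)], runs=20))
```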
Methodology VIII: Validation Scheme

[Diagram: labelled data (feature vectors F1-F4 plus a Relevance label, R/NR) are split into a training set and a test set. A classifier trained on the training set predicts a label for each test document, and the predictions are compared against the true labels: with 6 of the 10 test documents predicted correctly, accuracy = 6/10 = 60%.]
Methodology IX: Validation Scheme

The training set contains 90% of the data set and the test set the remaining 10%.
Random sampling is carried out 100 times, and the final estimated accuracy is the mean of these estimations.
Concretely, 10-fold cross-validation was repeated 10 times (giving the 100 train/test estimations).
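A minimal sketch of this validation scheme with scikit-learn: repeated stratified 10-fold cross-validation produces 100 train/test splits at the 90/10 proportion, and the mean over the per-split accuracies is the final estimate. The classifier and the toy data are stand-in assumptions.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Toy stand-in data (illustrative only): 200 documents, 5 features.
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += y  # one informative feature

# 10-fold CV repeated 10 times = 100 train/test splits, each training
# on ~90% of the data and testing on the held-out ~10%.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)

# The final estimated accuracy is the mean over the 100 estimations.
print(f"accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```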
Experiments I: Data Set

1038 click-through documents were extracted from a user study with 24 participants.
Each participant was given four topics and asked to bookmark perceived relevant documents.
375 unique documents were relevant and 362 unique documents were non-relevant.
Baseline performance was 50.9%.
In the per-topic data sets, the baselines were:
Topic 1: 50.0% (corrected from 60.6% by re-sampling)
Topic 2: 52.1%, Topic 3: 55.2%, Topic 4: 51.7%
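The 50.9% baseline is simply the majority-class rate over the 375 relevant and 362 non-relevant unique documents (the Topic 1 correction presumably re-samples until the classes are balanced, bringing this rate to ~50%). A quick check:

```python
from collections import Counter

# The baseline is the accuracy of always predicting the majority class.
labels = ["R"] * 375 + ["NR"] * 362
majority = Counter(labels).most_common(1)[0][1]
print(f"baseline = {majority / len(labels):.3f}")  # 375/737 ~ 0.509
```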
Experiments II: Effect of Contextual Features

The results table shows the relative improvement with respect to the baseline performance.
Significant improvements are in bold.
Experiments III: Effect of Feature Selection

Features selected in more than 90% of the runs were kept.
Larger improvements were observed on individual topics.
Experiments IV: Effect of Feature Combination

The selected features from each category were taken and used all together.
This was found to be stable across topics (except Topic 4).
Experiments V: Effect of Highly Relevant Documents

Highly Relevant Document: judged as relevant by at least two participants on the same topic.
Not a perfect definition, but found to yield a better data set for mining significant features (a sketch of deriving such a set follows below).
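A minimal sketch of deriving such a set from per-participant judgements; the (topic, document) pairs below are hypothetical:

```python
from collections import Counter

# Hypothetical (topic, document) relevance judgements, one per participant.
judgements = [
    ("topic1", "docA"), ("topic1", "docA"), ("topic1", "docB"),
    ("topic2", "docA"), ("topic2", "docC"), ("topic2", "docC"),
]

# A document is "highly relevant" for a topic if at least two
# participants judged it relevant on that same topic.
counts = Counter(judgements)
highly_relevant = {pair for pair, n in counts.items() if n >= 2}
print(highly_relevant)  # {('topic1', 'docA'), ('topic2', 'docC')} (order may vary)
```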
The effect with the “combined features” is shown in the results table.
Discussion I: Effectiveness of QI Features

In the topic-independent set, textual features and visual/layout HTML attributes were significant.
The effectiveness of QI features varies across topics.
Feature selection and combination can improve the prediction accuracy and robustness.
Highly relevant documents are a promising data set for eliciting significant features.
Discussion II: Example of significant features
Conclusion & Future Work

Presented an approach to mining significant contextual features.
Investigated the effectiveness of query-independent features for relevancy prediction.
Machine learning techniques allowed us to examine a large number of candidate features.
Disadvantage: significant features are sometimes difficult to interpret.
A promising approach to finding significant contextual features.
Conclusion & Future Work

Implications of our study:
Categorisation of search topics can facilitate the mining of significant contextual features.
Exploiting highly relevant documents can help find more robust contextual features.

Future work:
Investigate other data (e.g., transaction logs, user feedback, etc.).
Develop adaptive search models for re-ranking, grouping search results, recommendation, etc.