Ranking and Relevance Feedback
Ray Larson & Marti Hearst
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
9/21/2000
Review
• Inverted files
• The Vector Space Model
• Term weighting
tf x idf

$$w_{ik} = tf_{ik} \times \log(N / n_k)$$

where:
$T_k$ = term $k$ in document $D_i$
$tf_{ik}$ = frequency of term $T_k$ in document $D_i$
$idf_k$ = inverse document frequency of term $T_k$ in collection $C$
$N$ = total number of documents in collection $C$
$n_k$ = the number of documents in $C$ that contain $T_k$
$idf_k = \log\left(\frac{N}{n_k}\right)$
Inverse Document Frequency
• IDF provides high values for rare words and low values for common words
For a collection of 10,000 documents:

$\log(10000 / 1) = 4$
$\log(10000 / 20) = 2.698$
$\log(10000 / 5000) = 0.301$
$\log(10000 / 10000) = 0$
Similarity Measures
Simple matching (coordination level match): $|Q \cap D|$

Dice's Coefficient: $\dfrac{2\,|Q \cap D|}{|Q| + |D|}$

Jaccard's Coefficient: $\dfrac{|Q \cap D|}{|Q \cup D|}$

Cosine Coefficient: $\dfrac{|Q \cap D|}{|Q|^{1/2}\,|D|^{1/2}}$

Overlap Coefficient: $\dfrac{|Q \cap D|}{\min(|Q|, |D|)}$
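To make these formulas concrete, here is a minimal sketch (ours, not from the slides) that computes each coefficient for a query and a document represented as sets of terms:

```python
import math

def simple_matching(q: set, d: set) -> float:
    """Coordination level match: count of shared terms."""
    return len(q & d)

def dice(q: set, d: set) -> float:
    """Dice's coefficient: 2|Q n D| / (|Q| + |D|)."""
    return 2 * len(q & d) / (len(q) + len(d))

def jaccard(q: set, d: set) -> float:
    """Jaccard's coefficient: |Q n D| / |Q u D|."""
    return len(q & d) / len(q | d)

def cosine(q: set, d: set) -> float:
    """Cosine coefficient: |Q n D| / (|Q|^0.5 * |D|^0.5)."""
    return len(q & d) / math.sqrt(len(q) * len(d))

def overlap(q: set, d: set) -> float:
    """Overlap coefficient: |Q n D| / min(|Q|, |D|)."""
    return len(q & d) / min(len(q), len(d))

q = {"information", "retrieval", "ranking"}
d = {"information", "retrieval", "systems", "evaluation"}
for f in (simple_matching, dice, jaccard, cosine, overlap):
    print(f.__name__, round(f(q, d), 3))
```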
Vector Space Visualization
Text Clustering

Clustering is "The art of finding groups in data." -- Kaufman and Rousseeuw

[Figure: document clusters plotted against axes Term 1 and Term 2]
tf x idf normalization

• Normalize the term weights (so longer documents are not unfairly given more weight)
  – normalize usually means force all values to fall within a certain range, usually between 0 and 1, inclusive
$$w_{ik} = \frac{tf_{ik}\,\log(N / n_k)}{\sqrt{\sum_{k=1}^{t} (tf_{ik})^2\,[\log(N / n_k)]^2}}$$
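A minimal sketch (ours) of this cosine-normalized tf x idf weighting for a single document; the counts are made up:

```python
import math

def tfidf_weights(tf: dict, df: dict, N: int) -> dict:
    """Cosine-normalized tf x idf weights for one document.

    tf: term -> raw frequency in this document
    df: term -> number of documents containing the term (n_k)
    N:  total number of documents in the collection
    """
    # Base-10 log, matching the slide's idf examples.
    raw = {t: f * math.log10(N / df[t]) for t, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

tf = {"ranking": 3, "feedback": 1, "the": 9}
df = {"ranking": 50, "feedback": 20, "the": 10000}
print(tfidf_weights(tf, df, N=10000))  # "the" gets weight 0 (idf = 0)
```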
Vector space similarity (use the weights to compare the documents)

Now, the similarity of two documents is:

$$sim(D_i, D_j) = \sum_{k=1}^{t} w_{ik}\,w_{jk}$$

This is also called the cosine, or normalized inner product. (Normalization was done when weighting the terms.)
Vector Space Similarity Measure (combine tf x idf into a similarity measure)

$D_i = w_{d_{i1}}, w_{d_{i2}}, \ldots, w_{d_{it}}$
$Q = w_{q_1}, w_{q_2}, \ldots, w_{q_t}$, where $w = 0$ if a term is absent

If the term weights are normalized:

$$sim(Q, D_i) = \sum_{j=1}^{t} w_{q_j}\,w_{d_{ij}}$$

Otherwise, normalize in the similarity comparison:

$$sim(Q, D_i) = \frac{\sum_{j=1}^{t} w_{q_j}\,w_{d_{ij}}}{\sqrt{\sum_{j=1}^{t} (w_{q_j})^2}\;\sqrt{\sum_{j=1}^{t} (w_{d_{ij}})^2}}$$
Vector Space with Term Weights and Cosine Matching
[Figure: Q, D1, and D2 plotted as vectors in a two-dimensional space with axes Term A and Term B]

$D_i = (d_{i1}, w_{d_{i1}};\; d_{i2}, w_{d_{i2}};\; \ldots;\; d_{it}, w_{d_{it}})$
$Q = (q_{i1}, w_{q_{i1}};\; q_{i2}, w_{q_{i2}};\; \ldots;\; q_{it}, w_{q_{it}})$

$$sim(Q, D_i) = \frac{\sum_{j=1}^{t} w_{q_j}\,w_{d_{ij}}}{\sqrt{\sum_{j=1}^{t} (w_{q_j})^2}\;\sqrt{\sum_{j=1}^{t} (w_{d_{ij}})^2}}$$

Q = (0.4, 0.8)   D1 = (0.8, 0.3)   D2 = (0.2, 0.7)

$$sim(Q, D_2) = \frac{(0.4 \cdot 0.2) + (0.8 \cdot 0.7)}{\sqrt{[(0.4)^2 + (0.8)^2] \cdot [(0.2)^2 + (0.7)^2]}} = \frac{0.64}{\sqrt{0.42}} \approx 0.98$$

$$sim(Q, D_1) = \frac{0.56}{\sqrt{0.58}} \approx 0.74$$
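A minimal sketch (ours) that reproduces the worked example with a generic cosine function:

```python
import math

def cosine_sim(q, d):
    """Cosine similarity between two weight vectors of equal length."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    return dot / (math.sqrt(sum(w * w for w in q)) *
                  math.sqrt(sum(w * w for w in d)))

Q, D1, D2 = (0.4, 0.8), (0.8, 0.3), (0.2, 0.7)
print(round(cosine_sim(Q, D1), 2))  # 0.73 (the slide's 0.74 reflects rounding)
print(round(cosine_sim(Q, D2), 2))  # 0.98
```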
Term Weights in SMART
• In SMART weights are decomposed into three factors:
$$w_{kd} = \frac{freq_{kd} \times collect_k}{norm}$$

a term-frequency component ($freq_{kd}$), a collection component ($collect_k$), and a normalization component ($norm$), each chosen from the options on the next slides.
SMART Freq Components
Binary:    $freq_{kd} \in \{0, 1\}$
maxnorm:   $\dfrac{freq_{kd}}{\max(freq)}$
augmented: $\dfrac{1}{2} + \dfrac{freq_{kd}}{2\,\max(freq)}$
log:       $\ln(freq_{kd}) + 1$
Collection Weighting in SMART
Inverse:       $collect_k = \log \dfrac{NDoc}{Doc_k}$
squared:       $\left[ \log \dfrac{NDoc}{Doc_k} \right]^2$
probabilistic: $\log \dfrac{NDoc - Doc_k}{Doc_k}$
frequency:     $\dfrac{1}{Doc_k}$

where $NDoc$ is the number of documents in the collection and $Doc_k$ is the number of documents containing term $k$.
Term Normalization in SMART
sum:    $norm = \sum_{j \in vector} w_j$
cosine: $\sqrt{\sum_{j \in vector} w_j^2}$
fourth: $\sqrt[4]{\sum_{j \in vector} w_j^4}$
max:    $\max_{j \in vector} w_j$
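The three SMART factors can be treated as interchangeable components, as in this sketch (ours; the variant names mirror the slides, everything else is an assumption):

```python
import math

# Frequency (tf) components
freq_variants = {
    "binary":    lambda f, fmax: 1 if f > 0 else 0,
    "maxnorm":   lambda f, fmax: f / fmax,
    "augmented": lambda f, fmax: 0.5 + f / (2 * fmax),
    "log":       lambda f, fmax: math.log(f) + 1 if f > 0 else 0,
}

# Collection components
collect_variants = {
    "inverse":       lambda ndoc, doc_k: math.log(ndoc / doc_k),
    "squared":       lambda ndoc, doc_k: math.log(ndoc / doc_k) ** 2,
    "probabilistic": lambda ndoc, doc_k: math.log((ndoc - doc_k) / doc_k),
    "frequency":     lambda ndoc, doc_k: 1 / doc_k,
}

# Normalization components (applied to the vector of unnormalized weights)
norm_variants = {
    "sum":    lambda ws: sum(ws),
    "cosine": lambda ws: math.sqrt(sum(w * w for w in ws)),
    "fourth": lambda ws: sum(w ** 4 for w in ws) ** 0.25,
    "max":    lambda ws: max(ws),
}

# Example: log tf * inverse collection weight for one term, made-up counts;
# the final weight would then be divided by a norm over the whole vector.
f, fmax, ndoc, doc_k = 3, 7, 10000, 20
print(freq_variants["log"](f, fmax) * collect_variants["inverse"](ndoc, doc_k))
```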
Probabilistic Models: Logistic Regression attributes

$X_1 = \frac{1}{M} \sum_{j=1}^{M} \log QAF_{t_j}$ -- Average Absolute Query Frequency
$X_2 = \sqrt{QL}$ -- Query Length
$X_3 = \frac{1}{M} \sum_{j=1}^{M} \log DAF_{t_j}$ -- Average Absolute Document Frequency
$X_4 = \sqrt{DL}$ -- Document Length
$X_5 = \frac{1}{M} \sum_{j=1}^{M} IDF_{t_j}$ -- Average Inverse Document Frequency, where $IDF_t = \log \frac{N}{n_t}$ is the Inverse Document Frequency
$X_6 = \log M$ -- Number of terms in common between query and document -- logged
Probabilistic Models: Logistic Regression
Probability of relevance is based on logistic regression from a sample set of documents to determine the values of the coefficients. At retrieval, the probability estimate is obtained from the six X attribute measures shown previously:

$$\log O(R \mid Q, D) = c_0 + \sum_{i=1}^{6} c_i X_i$$

(The probability itself follows by inverting the log odds.)
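A minimal sketch (ours) of the retrieval-time estimate; the coefficient values are placeholders, not the coefficients actually trained on TREC data:

```python
import math

def relevance_probability(x, c):
    """P(R|Q,D) via the logistic transform of log odds c0 + sum(ci * xi)."""
    log_odds = c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Six attribute values X1..X6 and seven coefficients c0..c6 (made up):
X = [0.5, 2.0, 1.2, 9.5, 3.1, 1.1]
C = [-3.5, 0.4, -0.1, 0.3, -0.05, 0.6, 0.7]
print(relevance_probability(X, C))
```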
Probabilistic Models
Advantages:
• Strong theoretical basis
• In principle should supply the best predictions of relevance given available information
• Can be implemented similarly to Vector

Disadvantages:
• Relevance information is required -- or is "guesstimated"
• Important indicators of relevance may not be terms -- though terms only are usually used
• Optimally requires on-going collection of relevance information
Vector and Probabilistic Models
• Support "natural language" queries
• Treat documents and queries the same
• Support relevance feedback searching
• Support ranked retrieval
• Differ primarily in theoretical basis and in how the ranking is calculated
  – Vector assumes relevance
  – Probabilistic relies on relevance judgments or estimates
Current use of Probabilistic Models
• Virtually all the major systems in TREC now use the “Okapi BM25 formula” which incorporates the Robertson-Sparck Jones weights…
$$w^{(1)} = \log \frac{(r + 0.5) / (R - r + 0.5)}{(n - r + 0.5) / (N - n - R + r + 0.5)}$$
Okapi BM25
$$\sum_{T \in Q} w^{(1)}\,\frac{(k_1 + 1)\,tf}{K + tf}\,\frac{(k_3 + 1)\,qtf}{k_3 + qtf}$$

• Where:
  – Q is a query containing terms T
  – K is $k_1((1 - b) + b \cdot dl / avdl)$
  – $k_1$, $b$ and $k_3$ are parameters, usually 1.2, 0.75 and 7-1000 respectively
  – tf is the frequency of the term in a specific document
  – qtf is the frequency of the term in a topic from which Q was derived
  – dl and avdl are the document length and the average document length measured in some convenient unit
  – $w^{(1)}$ is the Robertson-Sparck Jones weight
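A minimal sketch (ours) of the BM25 score for one document, using the predictive $w^{(1)}$ with no relevance information ($r = R = 0$) and the parameter defaults from the slide:

```python
import math

def rsj_weight(n, N, r=0, R=0):
    """Robertson-Sparck Jones weight w(1), predictive (0.5-smoothed) form."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))

def bm25(query_tf, doc_tf, doc_df, N, dl, avdl, k1=1.2, b=0.75, k3=7.0):
    """Okapi BM25 score of one document for a query.

    query_tf: term -> frequency in the query topic (qtf)
    doc_tf:   term -> frequency in the document (tf)
    doc_df:   term -> number of documents containing the term (n)
    k3 can be anywhere from 7 to 1000 per the slide; 7 is used here.
    """
    K = k1 * ((1 - b) + b * dl / avdl)
    score = 0.0
    for t, qtf in query_tf.items():
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        score += (rsj_weight(doc_df[t], N)
                  * ((k1 + 1) * tf / (K + tf))
                  * ((k3 + 1) * qtf / (k3 + qtf)))
    return score

# Toy example with made-up counts:
print(bm25({"ranking": 2, "feedback": 1},
           {"ranking": 3, "relevance": 1},
           {"ranking": 50, "feedback": 20}, N=10000, dl=120, avdl=300))
```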
Today
• Logistic regression and Cheshire
• Relevance Feedback
Logistic Regression and Cheshire II
• The Cheshire II system uses Logistic Regression equations estimated from TREC full-text data.
• Demo (?)
Querying in an IR System

[Diagram: an Information Storage and Retrieval System. Interest profiles & queries are formulated in terms of descriptors and stored (Store 1: profiles/search requests); documents & data pass down a storage line, are indexed (descriptive and subject), and are stored (Store 2: document representations). The "rules of the game" are the rules for subject indexing plus a thesaurus, which consists of a lead-in vocabulary and an indexing language. Comparison/matching of the two stores yields potentially relevant documents.]
Relevance Feedback in an IR System
[Diagram: the same Information Storage and Retrieval System as above, with one addition: selected relevant documents feed back from the retrieved set into the query formulation step.]
Query Modification

• Problem: how to reformulate the query?
  – Thesaurus expansion: suggest terms similar to query terms
  – Relevance feedback: suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
Relevance Feedback

• Main Idea:
  – Modify the existing query based on relevance judgments
    • Extract terms from relevant documents and add them to the query
    • and/or re-weight the terms already in the query
  – Two main approaches:
    • Automatic (pseudo-relevance feedback)
    • Users select relevant documents
      – Users/system select terms from an automatically-generated list
Relevance Feedback

• Usually do both:
  – expand the query with new terms
  – re-weight terms in the query
• There are many variations
  – usually positive weights for terms from relevant docs
  – sometimes negative weights for terms from non-relevant docs
  – remove terms that appear ONLY in non-relevant documents
Rocchio Method

$$Q_1 = Q_0 + \beta \sum_{i=1}^{n_1} \frac{R_i}{n_1} - \gamma \sum_{i=1}^{n_2} \frac{S_i}{n_2}$$

where:
$Q_0$ = the vector for the initial query
$R_i$ = the vector for relevant document $i$
$S_i$ = the vector for non-relevant document $i$
$n_1$ = the number of relevant documents chosen
$n_2$ = the number of non-relevant documents chosen
$\beta$ and $\gamma$ tune the importance of relevant and non-relevant terms (in some studies best to set $\beta$ to 0.75 and $\gamma$ to 0.25)
Rocchio/Vector Illustration
[Figure: Q0, Q', Q", D1, and D2 plotted in a two-dimensional space with axes Information and Retrieval]

Q0 = retrieval of information = (0.7, 0.3)
D1 = information science = (0.2, 0.8)
D2 = retrieval systems = (0.9, 0.1)

Q' = ½ Q0 + ½ D1 = (0.45, 0.55)
Q" = ½ Q0 + ½ D2 = (0.80, 0.20)
Example Rocchio Calculation
Relevant docs:
R1 = (.030, .00, .00, .025, .025, .050, .00, .00, .120)
R2 = (.020, .009, .020, .002, .050, .025, .100, .100, .120)

Non-rel doc:
S1 = (.030, .010, .020, .00, .005, .025, .00, .020, .00)

Original query:
Q0 = (.00, .00, .00, .00, .500, .00, .450, .00, .950)

Constants: $\beta = 0.75$, $\gamma = 0.25$, $n_1 = 2$, $n_2 = 1$

Rocchio calculation:

$$Q_{new} = Q_0 + 0.75\,\frac{R_1 + R_2}{2} - 0.25\,\frac{S_1}{1}$$

Resulting feedback query:
Q_new = (.011, .000875, .002, .01, .527, .022, .488, .033, 1.04)
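A minimal sketch (ours) that implements the update and reproduces the feedback query above, up to the slide's rounding:

```python
def rocchio(q0, rel, nonrel, beta=0.75, gamma=0.25):
    """Rocchio update: Q1 = Q0 + beta*mean(rel) - gamma*mean(nonrel)."""
    n1, n2 = len(rel), len(nonrel)
    return [q + beta * sum(r[j] for r in rel) / n1
              - gamma * sum(s[j] for s in nonrel) / n2
            for j, q in enumerate(q0)]

Q0 = [.00, .00, .00, .00, .500, .00, .450, .00, .950]
R1 = [.030, .00, .00, .025, .025, .050, .00, .00, .120]
R2 = [.020, .009, .020, .002, .050, .025, .100, .100, .120]
S1 = [.030, .010, .020, .00, .005, .025, .00, .020, .00]
print([round(w, 4) for w in rocchio(Q0, [R1, R2], [S1])])
# -> [0.0113, 0.0009, 0.0025, 0.0101, 0.5269, 0.0219, 0.4875, 0.0325, 1.04]
```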
Rocchio Method
• Rocchio automatically
  – re-weights terms
  – adds in new terms (from relevant docs)
    • have to be careful when using negative terms
• Rocchio is not a machine learning algorithm
• Most methods perform similarly
  – results heavily dependent on test collection
• Machine learning methods are proving to work better than standard IR approaches like Rocchio
Probabilistic Relevance Feedback (Robertson & Sparck Jones)

Given a query term t:

                          Document Relevance
                            +              -
Document      +             r            n - r              n
indexing      -           R - r      N - n - R + r        N - n
                            R            N - R              N

Where N is the number of documents seen.
Robertson-Sparck Jones Weights

• Retrospective formulation --

$$w_t^{new} = \log \frac{r / (R - r)}{(n - r) / (N - n - R + r)}$$
Robertson-Sparck Jones Weights
• Predictive formulation --

$$w^{(1)} = \log \frac{(r + 0.5) / (R - r + 0.5)}{(n - r + 0.5) / (N - n - R + r + 0.5)}$$
Using Relevance Feedback
• Known to improve results
  – in TREC-like conditions (no user involved)
• What about with a user in the loop?
  – How might you measure this?
Relevance Feedback Summary
• Iterative query modification can improve precision and recall for a standing query
• In at least one study, users were able to make good choices by seeing which terms were suggested for R.F. and selecting among them
Alternative Notions of Relevance Feedback
Alternative Notions of Relevance Feedback
• Find people whose taste is “similar” to yours. Will you like what they like?
• Follow a user's actions in the background. Can this be used to predict what the user will want to see next?
• Track what lots of people are doing. Does this implicitly indicate what they think is good and not good?
Alternative Notions of Relevance Feedback
• Several different criteria to consider:
  – Implicit vs. Explicit judgments
  – Individual vs. Group judgments
  – Standing vs. Dynamic topics
  – Similarity of the items being judged vs. similarity of the judges themselves
Collaborative Filtering (social filtering)
• If Pam liked the paper, I’ll like the paper
• If you liked Star Wars, you’ll like Independence Day
• Rating based on ratings of similar people
  – Ignores the text, so works on text, sound, pictures, etc.
  – But: initial users can bias ratings of future users
                   Sally  Bob  Chris  Lynn  Karen
Star Wars            7     7     3     4     7
Jurassic Park        6     4     7     4     4
Terminator II        3     4     7     6     3
Independence Day     7     7     2     2     ?
Ringo Collaborative Filtering (Shardanand & Maes 95)

• Users rate musical artists from like to dislike
  – 1 = detest, 7 = can't live without, 4 = ambivalent
  – There is a normal distribution around 4
  – However, what matters are the extremes
• Nearest Neighbors Strategy: find similar users and predict the (weighted) average of their ratings
• Pearson r algorithm: weight by degree of correlation between user U and user J
  – 1 means very similar, 0 means no correlation, -1 dissimilar
  – Works better to compare against the ambivalent rating (4), rather than the individual's average score

$$r_{UJ} = \frac{\sum (U - \bar{U})(J - \bar{J})}{\sqrt{\sum (U - \bar{U})^2 \sum (J - \bar{J})^2}}$$
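A minimal sketch (ours) of Pearson-r-weighted prediction applied to the movie table above to fill in Karen's missing Independence Day rating; the helper names are our own:

```python
import math

ratings = {  # rows from the table above; None = not yet rated
    "Sally": [7, 6, 3, 7],
    "Bob":   [7, 4, 4, 7],
    "Chris": [3, 7, 7, 2],
    "Lynn":  [4, 4, 6, 2],
    "Karen": [7, 4, 3, None],
}

def pearson(u, j):
    """Pearson r over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(u, j) if a is not None and b is not None]
    mu = sum(a for a, _ in pairs) / len(pairs)
    mj = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mu) * (b - mj) for a, b in pairs)
    den = math.sqrt(sum((a - mu) ** 2 for a, _ in pairs) *
                    sum((b - mj) ** 2 for _, b in pairs))
    return num / den if den else 0.0

def predict(who, item):
    """My mean rating plus correlation-weighted deviations of the neighbors.
    (Ringo found comparing against the ambivalent rating 4 can work even
    better than each user's mean.)"""
    me = ratings[who]
    mine = [r for r in me if r is not None]
    num = den = 0.0
    for user, rs in ratings.items():
        if user == who or rs[item] is None:
            continue
        w = pearson(me, rs)
        others = [r for i, r in enumerate(rs) if i != item]
        num += w * (rs[item] - sum(others) / len(others))
        den += abs(w)
    return sum(mine) / len(mine) + num / den

print(round(predict("Karen", 3), 1))  # predicts a high rating (~7)
```

Negatively correlated users (like Chris here) contribute through their deviation from their own mean, so a low rating from a dissimilar user pushes the prediction up rather than down.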
Social Filtering

• Ignores the content, only looks at who judges things similarly
• Works well on data relating to "taste"
  – something that people are good at predicting about each other too
• Does it work for topic?
  – GroupLens results suggest otherwise (preliminary)
  – Perhaps for quality assessments
  – What about for assessing if a document is about a topic?
Learning Interface Agents

• Add agents in the UI, delegate tasks to them
• Use machine learning to improve performance
  – learn user behavior, preferences
• Useful when:
  – 1) past behavior is a useful predictor of the future
  – 2) there is a wide variety of behaviors amongst users
• Examples:
  – mail clerk: sort incoming messages into the right mailboxes
  – calendar manager: automatically schedule meeting times?
Summary
• Relevance feedback is an effective means for user-directed query modification.
• Modification can be done with either direct or indirect user input
• Modification can be done based on an individual’s or a group’s past input.