
Landmarks in Information Retrieval: the message out of the bottle

Keith van Rijsbergen, Tampere, 12th August 2002

© CvR SIGIR2002


Introductory Remarks

• Exclusions – IE, TM, ..

• Commercial successes and failures

• Caveats

• Why we have survived.

• Where we were, where we are, where we are going.


Pre-history

• Smee (1850)
• Wells (1936)
• Bush (1945)
• Bagley (1951) MIT
• Fairthorne (1945-52) RAE
• Luhn (1958)
• Mooers (1952)


Experimental Methodology

• Cleverdon: Cranfield
• Lancaster: Medlars
• Keen: Cranfield/Smart
• Saracevic: CWRU
• Salton: Smart
• Sparck Jones: Ideal Test Collection
• Blair & Maron: Stairs
• Harman: TREC


Evaluation

• ABNO/OBNA (Fairthorne)
• Precision, Recall -> trade-off (Cleverdon)
• Probabilistic versions (Swets)
• Measure-theoretic (Bollmann)
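The precision/recall trade-off named above can be illustrated with a minimal sketch. It assumes set-based relevance judgments for a single query; the function names, document ids, and the convention of returning 0.0 for an empty denominator are illustrative assumptions, not taken from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_recall_at_cutoffs(ranking, relevant):
    """Precision and recall after each rank position of a ranked list."""
    return [precision_recall(ranking[:k], relevant)
            for k in range(1, len(ranking) + 1)]


if __name__ == "__main__":
    ranking = ["d3", "d7", "d1", "d9", "d4"]   # hypothetical system output
    relevant = {"d3", "d1", "d8"}              # hypothetical judgments
    for k, (p, r) in enumerate(precision_recall_at_cutoffs(ranking, relevant), 1):
        print(f"cutoff {k}: precision={p:.2f} recall={r:.2f}")
```

Reading down the cutoffs, recall can only stay level or rise as more documents are retrieved, while precision typically falls; that is the trade-off in its simplest form.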


‘the world in 1980 according to Belver Griffith’

Who is missing?


Landmarks

• Luhn’s tf weighting
• Architecture
• Relevance Feedback
• Stemming
• Poisson Model -> BM25
• Statistical weighting tf*idf
• Various models
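As a rough illustration of the tf*idf landmark, here is a minimal sketch. It assumes one common variant, raw term frequency multiplied by log(N/df); many other variants exist, and the function name and input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term in each document by tf * log(N / df).

    docs: list of token lists.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw within-document term frequency
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights
```

Under this variant a term that occurs in every document gets weight zero, which reflects the idea that such a term does not help discriminate between documents.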


Luhn’s curve


What about evaluation?

[Figure: retrieval process diagram with boxes labelled Information Problem, Query, Indexed Objects, Fictive Objects, Representation (twice) and Compare]


Architecture (Brenda Gerrie, 1983)


Time I (highlights for me)

• 1952 Mooers coins IR
• 1958 International Conference on Scientific Information
• 1960 Cranfield I
• 1960 Maron and Kuhns paper
• 1961 Towards IR, RAF
• 1961 (-1965) Smart built
• 1964 Washington conference on Association Methods
• 1966 Cranfield II
• 1968 Salton’s first book
• 197- Cranfield conferences
• 1975 CvR’s book
• 1975 Ideal test collection
• 1976 KSJ/SER JASIS paper


Time II

• 1978 1st SIGIR
• 1979 1st BCS IRSG
• 1980 1st joint ACM/BCS conference on IR
• 1981 KSJ book on IR Experiments
• 1982 Belkin et al. ASK hypothesis
• 1983 – Okapi started
• 1985 RIAO-1
• 1986 CvR logic model
• 1990 Deerwester et al., LSI paper
• 1991 CoLIS 1 (in Tampere!)
• 1991 – Inquery started
• 1992 Ingwersen’s book
• 1992 TREC-1
• 1998 Croft/Ponte paper on language models


Dimensions

• Matching: Exact Match / Partial (best) Match
• Inference: Deduction / Induction
• Model: Deterministic / Probabilistic
• Classification: Monothetic / Polythetic
• Query Language: Artificial / Natural
• Query Definition: Complete / Incomplete
• Query Dependence: Yes / No
• Items wanted: Matching / Relevant
• Error response: Sensitive / Insensitive
• Logic: Classical / Non-classical
• Representation: a priori / a posteriori
• Language Models: Logical / Statistical


Probabilistic Retrieval

• Maron and Kuhns
• Miller (following Goffman)
• SER/KSJ
• Croft


Vector Space Model

• Salton
• Murray
• Rocchio
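Rocchio's name is usually associated with relevance feedback in the vector space model. The sketch below assumes the commonly cited form of the update: the new query is alpha times the old query, plus beta times the mean relevant document vector, minus gamma times the mean non-relevant document vector. The default coefficients, the sparse-dict representation, and the dropping of non-positive weights are illustrative assumptions, not details from the talk.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector after one round of feedback.

    query:       {term: weight} for the original query
    relevant:    list of {term: weight} vectors judged relevant
    nonrelevant: list of {term: weight} vectors judged non-relevant
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Add the relevant centroid and subtract the non-relevant centroid.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coeff * weight / len(docs)
    # Assumed convention: terms with non-positive weight are dropped.
    return {t: w for t, w in new_query.items() if w > 0}
```

The effect is to move the query towards documents the user marked relevant and away from those marked non-relevant, which also expands the query with terms it did not originally contain.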


Logical Model

For

• Mooers/Fairthorne 1960+
• Hillman 1965
• Cooper/Maron 1970+
• CvR 1986
• Nie/Amati/Bruza/Huibers 1990+

Against

• Bar-Hillel 1950+
• Kasher 1966


Buried Treasure

• Dependence: e.g. C.T. Yu
• Unified Probabilistic Model: Maron/Cooper/SER
• Co-relevance: Ivie
• Stochastic Processes: Mandelbrot/Herdan
• Brouwerian Logics: Hillman
• Error Analysis: Hughes/Cover/Duda


Hypotheses/Principles

• P & R trade-off – ABNO/OBNA
• Exhaustivity/Specificity
• Cluster Hypothesis
• Association Hypothesis
• Probability Ranking Principle
• Logical Uncertainty Principle
• ASK
• Polyrepresentation

Items may be associated without apparent meaning but exploiting their association may help retrieval


Postulates of Impotence (according to Swanson, 1988)

• An information need cannot be expressed independent of context

• It is impossible to instruct a machine to translate a request into adequate search terms

• A document’s relevance depends on other seen documents

• It is never possible to verify whether all relevant documents have been found

• Machines cannot recognise meaning -> can’t beat human indexing etc


….more postulates

• Word-occurrence statistics can neither represent meaning nor substitute for it

• The ability of an IR system to support an iterative process cannot be evaluated in terms of single-iteration human relevance judgment

• You can have either subtle relevance judgments or highly effective mechanised procedures, but not both

• Thus, consistently effective fully automatic indexing and retrieval is not possible


?

Conclusions


Matching

Co-ordination is positively correlated with external relevance
Jackson, 1969 – Association Hypothesis

The larger the number of matching descriptive items, for a request and document, the more likely the document is to be relevant to the request
Sparck Jones, 1971 – Relevance Hypothesis
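The relevance hypothesis quoted on this slide suggests ranking documents by how many descriptive items they share with the request (co-ordination level matching). A minimal sketch, assuming documents and requests are plain term lists; the function names and the tie-break by document id are hypothetical choices.

```python
def coordination_level(query_terms, doc_terms):
    """Number of distinct terms shared by the request and the document."""
    return len(set(query_terms) & set(doc_terms))


def rank_by_coordination(query_terms, docs):
    """Rank documents by descending co-ordination level.

    docs: {doc_id: list of terms}.
    Returns a list of (doc_id, level); ties are broken by doc_id.
    """
    scored = [(doc_id, coordination_level(query_terms, terms))
              for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

This is a partial (best) match: a document need not contain every request term to be retrieved, it is simply ranked lower.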


Inference

It is a common fallacy, underwritten at this date by the investment of several million dollars in a variety of retrieval hardware, that the algebra of Boole (1847) is the appropriate formalism for retrieval design….. The ‘logic’ of Brouwer, as invoked by Fairthorne, is one such weakening of the postulate system,……
Mooers, 1961

Another one: Logical Uncertainty Principle
CvR, 1986


Classification

Co-occurrence [of terms] as a basis for grouping makes for good swops i.e. permits substitutions which retrieve relevant rather than irrelevant documents.
Sparck Jones, 1971 – Classification Hypothesis

If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this.
CvR, 1979 – Association Hypothesis

Closely associated documents tend to be relevant to the same requests
CvR, 1971 – Cluster Hypothesis


Models

• Vector Space/LSI
• Probabilistic
• Logical


Query Language

Artificial/Natural

Multilingual/cross-lingual

images

none at all


Query Definition

Complete/Incomplete

Independence/Dependence

Weighted/Unweighted

Query Expansion/one shot (feedback, web)

Sense disambiguation

Cross-lingual


Query Dependence

Relevance Feedback

Ostensive Retrieval

Context

Query Expansion


Items wanted

Relevance

ASK: Anomalous State of Knowledge

Situated Relevance


Error response

Precision and Recall


Logic

standard/non-standard

probabilistic logic

information flow/logic


Representation

Discrimination/Representation

Specificity/Exhaustivity


Language Models

NLP

Montague Semantics

Stochastic
