Topic by Topic Performance of Information Retrieval Systems Walter Liggett National Institute of Standards and Technology TREC-7 (1999)

Topic by Topic Performance of Topic by Topic Performance of Information Retrieval SystemsInformation Retrieval Systems

Walter LiggettWalter LiggettNational Institute of Standards and TechnologyNational Institute of Standards and Technology

TREC-7 (1999)TREC-7 (1999)

2

AbstractAbstract Formulation of topic properties is the goal of this paper.Formulation of topic properties is the goal of this paper. Applying statistical method to both TREC-6 and TREC-7 IR Applying statistical method to both TREC-6 and TREC-7 IR

results, we identify topic pairs that exemplify topic results, we identify topic pairs that exemplify topic properties useful in relating topic statements to system properties useful in relating topic statements to system performance.performance.

We formulate topic properties by relating the We formulate topic properties by relating the corresponding topic statements to what is known about corresponding topic statements to what is known about IR systems.IR systems.

Some properties apparent in the topic pairs identified are Some properties apparent in the topic pairs identified are linked to topic expansion, and these pairs exemplify both linked to topic expansion, and these pairs exemplify both the need for expansion and the danger in automatic the need for expansion and the danger in automatic expansion.expansion.

3

Topic PropertiesTopic Properties System developers would like to be able to judge topic System developers would like to be able to judge topic

difficulty from reasoning involving topic properties.difficulty from reasoning involving topic properties. Ex: a set of topic properties to describe whether several Ex: a set of topic properties to describe whether several

sentences are needed to narrow the topic sufficiently?sentences are needed to narrow the topic sufficiently? There is some doubt that formulation of topic There is some doubt that formulation of topic

properties is even possible.properties is even possible. ““Little is known about what makes a topic difficult,” Little is known about what makes a topic difficult,”

conclude Voorhees and Harman in their TREC-6 conclude Voorhees and Harman in their TREC-6 overview.overview.

Our analysis of the ad hoc task shows that one can Our analysis of the ad hoc task shows that one can connect system performance to generic topic connect system performance to generic topic properties.properties.

4

Topic PropertiesTopic Properties The effect of topic properties on system performance The effect of topic properties on system performance

depends on the document collection.depends on the document collection. Our statistical methods might connect two topics not Our statistical methods might connect two topics not

because the topic statements share some property but because the topic statements share some property but because the document collection has some unexpected because the document collection has some unexpected characteristic.characteristic.

Topic properties formulated through the TREC ad hoc Topic properties formulated through the TREC ad hoc task might not be as useful with other document task might not be as useful with other document collections.collections.

For each topic in the ad hoc task, the TREC evaluation For each topic in the ad hoc task, the TREC evaluation provides system performances, a collection of numbers provides system performances, a collection of numbers that might be termed a performance profile.that might be termed a performance profile. Partial answer to the question “How do topics differ?”Partial answer to the question “How do topics differ?”

5

Topic PropertiesTopic Properties This paper presents a pair of TREC-6 topics and three This paper presents a pair of TREC-6 topics and three

pairs of TREC-7 topics for the reader to study.pairs of TREC-7 topics for the reader to study. It’s hoped that the reader will be able to offer an It’s hoped that the reader will be able to offer an

opinion.opinion. Presentation of the four pairs involves two alternative Presentation of the four pairs involves two alternative

measures of system performance and a method for measures of system performance and a method for decomposing the system-by-topic table of decomposing the system-by-topic table of performance measurements.performance measurements. Average precisionAverage precision

Partially depend on the number of relevant documentsPartially depend on the number of relevant documents Depth at 25 percent recallDepth at 25 percent recall Two-way analysis of varianceTwo-way analysis of variance

6

Topic Pairs for StudyTopic Pairs for Study Statistical analysis suggests the following two pairs of Statistical analysis suggests the following two pairs of

TREC-7 topics for careful study.TREC-7 topics for careful study. 372 and 379 for study with respect to the 40 best 372 and 379 for study with respect to the 40 best

systemssystems 372 and 391 with respect to the 14 best automatic 372 and 391 with respect to the 14 best automatic

systems that use all parts of the topic statementsystems that use all parts of the topic statement

7

Topic Pairs for StudyTopic Pairs for Study

8

Topic Pairs for StudyTopic Pairs for Study The reason that statistical analysis of system The reason that statistical analysis of system

performance connects these topics.performance connects these topics. The appearance of common system successes and The appearance of common system successes and

common system failures in expansion of these topics.common system failures in expansion of these topics. Manual systems seem to do relatively well with topics Manual systems seem to do relatively well with topics

372 and 379 whereas automatic systems do relatively 372 and 379 whereas automatic systems do relatively poorly.poorly.

In trying to conceive of the topic property common to In trying to conceive of the topic property common to the members of these pairs, one must also recognize the members of these pairs, one must also recognize other topic properties in which these topics differ.other topic properties in which these topics differ.

9

Data AnalysisData Analysis Comparison of two topics in search of a common topic Comparison of two topics in search of a common topic

property involves for each topic:property involves for each topic: The statementThe statement The performance for a group of systemsThe performance for a group of systems

Average precision, depth at 25 percent recallAverage precision, depth at 25 percent recall Performance is divided into components.Performance is divided into components.

The overall componentThe overall component Compare two topics in terms of the overall abilities of the systems.Compare two topics in terms of the overall abilities of the systems.

The distinctive componentThe distinctive component Compare two topics in terms of deviations from these overall Compare two topics in terms of deviations from these overall

abilities.abilities. The topic-by-topic similarity of the distinctive componentThe topic-by-topic similarity of the distinctive component

10

Computation of the ComponentsComputation of the Components Let Let NNss be the number of systems, be the number of systems, NNtt be the number of topics, and be the number of topics, and yyijij be the performance measure for system be the performance measure for system ii and topic and topic jj..

The difficulty of topic The difficulty of topic jj is: is:

The (centered) average performance of system The (centered) average performance of system ii is: is:

Variation from topic to topic in the effect of overall system abilitiVariation from topic to topic in the effect of overall system abilities:es:

11

Computation of the ComponentsComputation of the Components The overall component is given by:The overall component is given by:

The remainder is:The remainder is:

The remainder reflects The remainder reflects interactionsinteractions, cases where after , cases where after adjustment for overall performance, one system is better adjustment for overall performance, one system is better than another for one topic but not for another topic.than another for one topic but not for another topic. The remainder is not easy to interpret because it’s noisy.The remainder is not easy to interpret because it’s noisy.

The distinctive component = the remainder excluding noiseThe distinctive component = the remainder excluding noise

12

Overall Component for Pair-1Overall Component for Pair-1

Topic 372=“Native American casino”, 379=“mainstreaTopic 372=“Native American casino”, 379=“mainstreaming”ming” “ “t”=title, “d”=description, “s”=title+description, t”=title, “d”=description, “s”=title+description, “l”=“s”+narrative, “m”=manual“l”=“s”+narrative, “m”=manual

13

Distinctive Component for Pair-1Distinctive Component for Pair-1

Too many query types …

14

Interpretation of the DataInterpretation of the Data System-to-system variation in tSystem-to-system variation in the overall component reflects the overall component reflects the average performance.he average performance. There is discrepancy between tThere is discrepancy between the two measures.he two measures.

AP depends on # of rel-docs.AP depends on # of rel-docs. There is little to suggest that oThere is little to suggest that one topic is easier than the othene topic is easier than the other.r.

15

Interpretation of the DataInterpretation of the Data Figure 4 shows that 372-379 Figure 4 shows that 372-379

share one or more topic share one or more topic properties related to properties related to challenges in topic challenges in topic expansion.expansion. Manual vs. automaticManual vs. automatic

16

Components for 372-391Components for 372-391

Automatic systems that use the entire topic statement.Automatic systems that use the entire topic statement. What features of these systems cause them to favor topics 372 and 391 when What features of these systems cause them to favor topics 372 and 391 when

overall performance of these systems is variable, both better and worse than overall performance of these systems is variable, both better and worse than the average?the average?

17

The Distinctive ComponentThe Distinctive Component A A NNss × × NNtt matrix with elements matrix with elements

Analysis of the residual matrix for choosing topic pairsAnalysis of the residual matrix for choosing topic pairs Singular value decompositionSingular value decomposition ApproximationApproximation OptimizationOptimization ……

18

Topic 352 and 385Topic 352 and 385

19


The title-only systems perform relatively better than description-The title-only systems perform relatively better than description-only systems in terms of the distinctive component.only systems in terms of the distinctive component.

The key noun phrase differs between title and description.The key noun phrase differs between title and description.

20

Topic 312 and 316Topic 312 and 316

21


Very specific key words: “hydroponics” and “polygamy”.Very specific key words: “hydroponics” and “polygamy”. The manual systems that did well seem better able to take The manual systems that did well seem better able to take

advantage of these key words than automatic systems.advantage of these key words than automatic systems.

22

ConclusionsConclusions This paper offers four pairs of topics chosen by This paper offers four pairs of topics chosen by

statistical methods.statistical methods. Faced with the challenge of hypothesizing what each Faced with the challenge of hypothesizing what each

pair has in common, it seems that one has some basis pair has in common, it seems that one has some basis for a response.for a response.

One topic property that seems to have surfaced is the One topic property that seems to have surfaced is the need for parsimonious expansion.need for parsimonious expansion.

23

[email protected]@.25RMeasure of depth at 25 percent recall Measure of depth at 25 percent recall ＝＝ -log(-log(rr.25.25))

Documents

Topic by Topic Performance of Information Retrieval Systems Walter Liggett National Institute of Standards and Technology TREC-7 (1999)