LIS618 lecture 11 i/r performance evaluation Thomas Krichel 2011-11-21

Page 1

LIS618 lecture 11: i/r performance evaluation

Thomas Krichel, 2011-11-21

Page 2

retrieval performance evaluation

• "Recall" and "Precision" are two classic measures of the performance of information retrieval for a single query.

• Both assume that there is an answer set: the set of documents that are relevant to the query.

• Performance is optimal if
– the database returns all the documents in the answer set
– the database returns only documents in the answer set

• Recall is the fraction of the relevant documents that the query result has captured.

• Precision is the fraction of the retrieved documents that are relevant.
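
In symbols (a standard set-based restatement of the two definitions above; the labels A and R are introduced here, not in the slides), with A the answer set and R the set of retrieved documents:

    \mathrm{recall} = \frac{|A \cap R|}{|A|}
    \qquad
    \mathrm{precision} = \frac{|A \cap R|}{|R|}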

Page 3

recall and precision curves

• Assume that all the retrieved documents arrive at once and are examined one by one, in ranked order.

• During that process, the user discovers more and more relevant documents. Recall increases.

• During the same process, at least eventually, there are fewer and fewer useful documents among those examined. Precision (usually) declines.

• This can be represented as a recall/precision curve.

Page 4

Example

• Let the answer set be {0,1,2,3,4,5,6,7,8,9} and let non-relevant documents be represented by letters.

• A query returns the following ranked result: 7, a, 3, b, c, 9, n, j, l, 5, r, o, s, e, 4.
• At the first document, (recall, precision) is (10%, 100%); at the third, (20%, 66.7%); at the sixth, (30%, 50%); at the tenth, (40%, 40%); and at the last, (50%, 33.3%).
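
A minimal Python sketch of this calculation (the code and its variable names are illustrative additions, not part of the original slides; it reproduces the numbers above and the rising-recall, falling-precision pattern of the previous page):

    # recall and precision at each rank where a new relevant document appears
    relevant = set("0123456789")        # the answer set, one character per document
    result = list("7a3bc9njl5rose4")    # the ranked query result

    found = 0
    for rank, doc in enumerate(result, start=1):
        if doc in relevant:
            found += 1
            recall = found / len(relevant)
            precision = found / rank
            print(f"rank {rank:2d}: recall {recall:.0%}, precision {precision:.1%}")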

Page 5

recall/precision curves

• Such curves can be formed for each query.
• An average curve can be calculated over several queries, averaging precision at each recall level.
• Recall and precision levels can also be used to calculate two single-valued summaries:
– average precision at seen relevant documents
– R-precision

Page 6

R-precision

• This is a fairly ad-hoc measure.
• Let R be the size of the answer set.
• Take the first R results of the query.
• Count the relevant documents among them.
• Divide by R.
• In our example, R = 10 and four of the first ten results are relevant, so the R-precision is 40%.
• An average can be calculated over a number of queries.
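
A short continuation of the earlier sketch (again illustrative, not from the slides; `relevant` and `result` are redefined so it runs on its own):

    # R-precision: precision after the first R results, where R = |answer set|
    relevant = set("0123456789")
    result = list("7a3bc9njl5rose4")

    R = len(relevant)                                   # R = 10
    r_precision = sum(doc in relevant for doc in result[:R]) / R
    print(f"R-precision: {r_precision:.0%}")            # prints 40%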

Page 7

average precision at seen relevant documents

• To find it, sum the precision levels recorded at each new relevant document the user discovers, and divide by the number of relevant documents seen (five in our example).

• In our example, it is (100 + 66.7 + 50 + 40 + 33.3)/5 = 58%.

• This measure favors retrieval methods that get the relevant documents to the top.
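
The same measure in the illustrative sketch (self-contained again; the names are assumptions of the sketch, not the slides'):

    # average precision at seen relevant documents: mean of the precision
    # values recorded each time a new relevant document is encountered
    relevant = set("0123456789")
    result = list("7a3bc9njl5rose4")

    precisions = []
    found = 0
    for rank, doc in enumerate(result, start=1):
        if doc in relevant:
            found += 1
            precisions.append(found / rank)
    print(f"average precision: {sum(precisions) / len(precisions):.0%}")  # prints 58%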

Page 8

critique of recall & precision

• Recall has to be estimated by an expert, since the full answer set must be known.
• Recall is very difficult to estimate in a large collection.
• Both measures focus on one query only. No serious user works like this.
• There are some other measures, but those are more for an advanced course in IR.

Page 9

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!