IR Evaluation (college assignment)

Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan

Lecture 8: Evaluation

Measures for a search engine

How fast does it index?
  Number of documents/hour
  (Average document size)
How fast does it search?
  Latency as a function of index size
Expressiveness of query language
  Ability to express complex information needs
  Speed on complex queries
Uncluttered UI
Is it free?

Sec. 8.6

Measures for a search engine

All of the preceding criteria are measurable:
  we can quantify speed/size
  we can make expressiveness precise
The key measure: user happiness. What is this?
  Speed of response / size of index are factors
  But blindingly fast, useless answers won't make a user happy
Need a way of quantifying user happiness

Sec. 8.6

Happiness: elusive to measure

Most common proxy: relevance of search results
But how do you measure relevance?
We will detail a methodology here, then examine its issues.
Relevance measurement requires 3 elements:
  1. A benchmark document collection
  2. A benchmark suite of queries
  3. A usually binary assessment of either Relevant or Nonrelevant for each query and each document
     (Some work uses more-than-binary assessments, but binary is the standard.)

Sec. 8.1

Evaluating an IR system

Note: the information need is translated into a query.
Relevance is assessed relative to the information need, not the query.
E.g., Information need: I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
Query: wine red white heart attack effective
Evaluate whether the doc addresses the information need, not whether it has these words.

Sec. 8.1

Standard relevance benchmarks

TREC: the National Institute of Standards and Technology (NIST) has run a large IR test bed for many years.
Reuters and other benchmark doc collections are also used.
Retrieval tasks are specified, sometimes as queries.
Human experts mark, for each query and for each doc, Relevant or Nonrelevant (or at least for the subset of docs that some system returned for that query).

Sec. 8.2

Unranked retrieval evaluation: Precision and Recall

Precision: fraction of retrieved docs that are relevant = P(relevant|retrieved)
Recall: fraction of relevant docs that are retrieved = P(retrieved|relevant)

                  Relevant   Nonrelevant
  Retrieved       tp         fp
  Not Retrieved   fn         tn

Precision P = tp/(tp + fp)
Recall    R = tp/(tp + fn)

Sec. 8.3
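
As a quick illustration (not part of the original slides), the two formulas translate directly into code; the helper below is a minimal Python sketch and its name is illustrative:

  def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
      """Precision and recall from the contingency-table counts above."""
      precision = tp / (tp + fp) if (tp + fp) else 0.0
      recall = tp / (tp + fn) if (tp + fn) else 0.0
      return precision, recall

  # e.g. 3 relevant retrieved, 1 nonrelevant retrieved, 3 relevant missed:
  print(precision_recall(tp=3, fp=1, fn=3))  # (0.75, 0.5)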

Unranked retrieval evaluation: Precision and Recall

Precision: the ability to retrieve top-ranked documents that are mostly relevant.
Recall: the ability of the search to find all of the relevant items in the corpus.

Sec. 8.3

Precision/Recall

You can get high recall (but low precision) by retrieving all docs for all queries!
Recall is a non-decreasing function of the number of docs retrieved.
In a good system, precision decreases as either the number of docs retrieved or recall increases.
  This is not a theorem, but a result with strong empirical confirmation.

Sec. 8.3

Trade-off between Recall and Precision

[Figure: precision (y-axis) vs. recall (x-axis). The ideal is the upper-right corner. A system at high precision but low recall returns relevant documents but misses many useful ones too; a system at high recall but low precision returns most relevant documents but includes lots of junk.]

F-Measure

One measure of performance that takes into account both recall and precision.
Harmonic mean of recall and precision:

  F = 2PR / (P + R) = 2 / (1/R + 1/P)

Compared to the arithmetic mean, both P and R need to be high for the harmonic mean to be high.
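
A minimal sketch of the harmonic-mean computation (the function name is illustrative, not from the slides):

  def f_measure(precision: float, recall: float) -> float:
      """Harmonic mean of precision and recall."""
      if precision + recall == 0:
          return 0.0
      return 2 * precision * recall / (precision + recall)

  print(f_measure(0.75, 0.5))  # 0.6; the arithmetic mean would be 0.625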

E Measure (parameterized F Measure)

A variant of the F measure that allows weighting emphasis on precision over recall:

  E = (β² + 1)PR / (β²P + R)

The value of β controls the trade-off:
  β = 1: Equally weight precision and recall (E = F).
  β > 1: Weight recall more.
  β < 1: Weight precision more.

Sec. 8.3
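
A sketch of the parameterized measure, assuming the formula as reconstructed above (the name e_measure is illustrative):

  def e_measure(precision: float, recall: float, beta: float = 1.0) -> float:
      """Parameterized F: beta > 1 weights recall more, beta < 1 weights precision more."""
      b2 = beta * beta
      denom = b2 * precision + recall
      if denom == 0:
          return 0.0
      return (b2 + 1) * precision * recall / denom

  print(e_measure(0.75, 0.5, beta=1.0))  # 0.6, same as F
  print(e_measure(0.75, 0.5, beta=2.0))  # ~0.536, pulled toward the (lower) recall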

Ranked retrieval evaluation: Precision and Recall

The same definitions of precision and recall apply, but for a ranked list they are computed at each cut-off point in the ranking rather than for a single retrieved set (see the next slide).

Sec. 8.3

Computing Recall/Precision Points

For a given query, produce the ranked list of retrievals.
Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures.
Mark each document in the ranked list that is relevant according to the gold standard.
Compute a recall/precision pair for each position in the ranked list that contains a relevant document.
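
This procedure can be sketched in a few lines of Python; recall_precision_points and its inputs are illustrative names, not from the slides:

  def recall_precision_points(relevant_flags, total_relevant):
      """(recall, precision) at each rank that holds a relevant document.

      relevant_flags: one boolean per ranked position (True = relevant per the
      gold standard); total_relevant: number of relevant docs for the query.
      """
      points, hits = [], 0
      for rank, is_relevant in enumerate(relevant_flags, start=1):
          if is_relevant:
              hits += 1
              points.append((hits / total_relevant, hits / rank))
      return points

  # Example 1 below: relevant docs at ranks 1, 2, 4, 6 and 13, with 6 relevant in total
  flags1 = [i in (1, 2, 4, 6, 13) for i in range(1, 15)]
  print(recall_precision_points(flags1, total_relevant=6))
  # ~[(0.17, 1.0), (0.33, 1.0), (0.5, 0.75), (0.67, 0.67), (0.83, 0.38)]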

Example 1

Let total # of relevant docs = 6. Check each new recall point:

  n   doc #   relevant
  1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1
  2   589     x          R = 2/6 = 0.333;  P = 2/2 = 1
  3   576
  4   590     x          R = 3/6 = 0.5;    P = 3/4 = 0.75
  5   986
  6   592     x          R = 4/6 = 0.667;  P = 4/6 = 0.667
  7   984
  8   988
  9   578
  10  985
  11  103
  12  591
  13  772     x          R = 5/6 = 0.833;  P = 5/13 = 0.38
  14  990

One relevant document is missing, so we never reach 100% recall.

Example 2

Let total # of relevant docs = 6. Check each new recall point:

  n   doc #   relevant
  1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1
  2   576
  3   589     x          R = 2/6 = 0.333;  P = 2/3 = 0.667
  4   342
  5   590     x          R = 3/6 = 0.5;    P = 3/5 = 0.6
  6   717
  7   984
  8   772     x          R = 4/6 = 0.667;  P = 4/8 = 0.5
  9   321     x          R = 5/6 = 0.833;  P = 5/9 = 0.556
  10  498
  11  113
  12  628
  13  772
  14  592     x          R = 6/6 = 1.0;    P = 6/14 = 0.429
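
For reference, the hypothetical helper sketched earlier reproduces Example 2's points as well:

  flags2 = [i in (1, 3, 5, 8, 9, 14) for i in range(1, 15)]
  print(recall_precision_points(flags2, total_relevant=6))
  # ~[(0.17, 1.0), (0.33, 0.67), (0.5, 0.6), (0.67, 0.5), (0.83, 0.56), (1.0, 0.43)]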

Interpolating a Recall/Precision Curve

Interpolate a precision value for each standard recall level:
  r_j ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
  r_0 = 0.0, r_1 = 0.1, ..., r_10 = 1.0
The interpolated precision at the j-th standard recall level is the maximum known precision at any recall level between the j-th and (j+1)-th level:

  P(r_j) = max P(r)  for  r_j ≤ r ≤ r_(j+1)
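
One way to sketch this in Python, reusing the (recall, precision) points from the earlier helper. Note this sketch uses the common convention of taking the maximum precision at any measured recall ≥ r_j; it agrees with the slide's formula whenever the interval [r_j, r_(j+1)] contains a measured point, and carries precision leftward otherwise:

  def interpolate(points, levels=None):
      """Interpolated precision at the 11 standard recall levels."""
      if levels is None:
          levels = [j / 10 for j in range(11)]
      result = []
      for r_j in levels:
          candidates = [p for r, p in points if r >= r_j]
          result.append(max(candidates) if candidates else 0.0)
      return result

  print(interpolate(recall_precision_points(flags1, 6)))
  # ~[1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.67, 0.38, 0.38, 0.0, 0.0]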

Example 1 (interpolated)

[Figure: interpolated recall/precision curve for Example 1; recall on the x-axis and precision on the y-axis, both from 0 to 1.]

Example 2 (interpolated)

[Figure: interpolated recall/precision curve for Example 2; recall on the x-axis and precision on the y-axis, both from 0 to 1.]

Compare Two or More Systems

The curve closest to the upper right-hand corner of the graph indicates the best performance.

[Figure: recall/precision curves for two systems, NoStem and Stem, with recall on the x-axis (0.1 to 1) and precision on the y-axis (0 to 1).]

Sample RP Curve for CF Corpus

[Figure: recall/precision curve for the CF corpus.]

R-Precision

Precision at the R-th position in the ranking of results for a query that has R relevant documents.

  n   doc #   relevant
  1   588     x
  2   589     x
  3   576
  4   590     x
  5   986
  6   592     x
  7   984
  8   988
  9   578
  10  985
  11  103
  12  591
  13  772     x
  14  990

R = # of relevant docs = 6
R-Precision = 4/6 = 0.67
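
R-precision is a one-liner once the relevance flags are known; again a minimal illustrative sketch:

  def r_precision(relevant_flags, total_relevant):
      """Precision at rank R, where R = number of relevant docs for the query."""
      return sum(relevant_flags[:total_relevant]) / total_relevant

  # The slide's example: relevant docs at ranks 1, 2, 4, 6, 13 and R = 6
  print(r_precision(flags1, 6))  # 0.666..., i.e. 4 of the top 6 are relevant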

Mean Average Precision (MAP)

Average Precision: the average of the precision values at the points at which each relevant document is retrieved (a relevant document that is never retrieved contributes a precision of 0).
  Ex1: (1 + 1 + 0.75 + 0.667 + 0.38 + 0)/6 = 0.633
  Ex2: (1 + 0.667 + 0.6 + 0.5 + 0.556 + 0.429)/6 = 0.625
Mean Average Precision: the average of the average precision values over a set of queries.
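
A sketch of average precision and MAP along these lines (names illustrative); a never-retrieved relevant document adds nothing to the sum but still counts in the denominator:

  def average_precision(relevant_flags, total_relevant):
      """Mean of precision values at the ranks where relevant docs appear."""
      hits, precision_sum = 0, 0.0
      for rank, is_relevant in enumerate(relevant_flags, start=1):
          if is_relevant:
              hits += 1
              precision_sum += hits / rank
      return precision_sum / total_relevant

  def mean_average_precision(per_query):
      """MAP over a set of queries given as (flags, total_relevant) pairs."""
      return sum(average_precision(f, n) for f, n in per_query) / len(per_query)

  ex1 = ([i in (1, 2, 4, 6, 13) for i in range(1, 15)], 6)
  ex2 = ([i in (1, 3, 5, 8, 9, 14) for i in range(1, 15)], 6)
  print(average_precision(*ex1))             # ~0.633
  print(average_precision(*ex2))             # ~0.625
  print(mean_average_precision([ex1, ex2]))  # ~0.629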
