I256: Applied Natural Language Processing
Marti Hearst, Sept 27, 2006


Page 1: Title slide

Page 2: Evaluation Measures

Page 3: Evaluation Measures

Precision: Proportion of the items you labeled X that the gold standard agrees really are X
= #correctly labeled by alg / #all labels assigned by alg
= #True Positive / (#True Positive + #False Positive)

Recall: Proportion of the items labeled X in the gold standard that you actually labeled X
= #correctly labeled by alg / #all possible correct labels
= #True Positive / (#True Positive + #False Negative)
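The two definitions above reduce to simple arithmetic over confusion counts. A minimal sketch, with hypothetical counts for some label X:

```python
# Precision and recall from confusion counts, as defined on this slide.

def precision(tp, fp):
    """Proportion of items the algorithm labeled X that are truly X."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of truly-X items that the algorithm labeled X."""
    return tp / (tp + fn)

# Example: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 0.666...
```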

Page 4: F-measure

You can “cheat” on precision by labeling (almost) nothing as X, and “cheat” on recall by labeling everything as X. The better you do on precision, the worse you do on recall, and vice versa. The F-measure is a balance between the two.

F = 2 * precision * recall / (precision + recall)
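Because F is the harmonic mean, the “cheating” strategies above don't pay off: a perfect score on one measure cannot compensate for a terrible score on the other. A quick sketch:

```python
def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Labeling everything X may buy recall = 1.0, but precision collapses,
# and F stays low:
print(f_measure(0.1, 1.0))   # ~0.18
print(f_measure(0.8, 0.75))  # ~0.77
```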

Page 5: Evaluation Measures

Accuracy: Proportion of items you got right
= (#True Positive + #True Negative) / N, where N = TP + TN + FP + FN

Error: Proportion of items you got wrong
= (#False Positive + #False Negative) / N
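Accuracy and error are complements over the same four counts, so they always sum to 1. A small sketch with made-up counts:

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of the N judged items that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def error(tp, tn, fp, fn):
    """Proportion of the N judged items that were labeled incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)

# Hypothetical counts: N = 8 + 85 + 2 + 5 = 100 items.
print(accuracy(8, 85, 2, 5))  # 0.93
print(error(8, 85, 2, 5))     # 0.07
```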

Page 6: Prec/Recall vs. Accuracy/Error

When to use Precision/Recall?
Useful when there are only a few positives and many, many negatives. Also good for ranked orderings.

– Search results ranking

When to use Accuracy/Error?
Useful when every item has to be judged, and it's important that every item be correct. Error is better when the differences between algorithms are very small; it lets you focus on small improvements.

– Speech recognition

Page 7: Evaluating Partial Parsing

How do we evaluate it?

Page 8: Evaluating Partial Parsing

Page 9: Testing our Simple Rule

Let’s see where we missed:

Page 10: Update Rules; Evaluate Again

Page 11: Evaluate on More Examples

Page 12: Incorrect vs. Missed

Add code to print out which were incorrect.
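The distinction on this slide comes down to set differences between the chunker's output and the gold standard. A minimal sketch, using hypothetical (start, end, label) spans in place of the lecture's actual chunk structures:

```python
# Missed chunks are in the gold standard but never produced; incorrect
# chunks were produced but the gold standard disagrees.

gold = {(0, 2, "NP"), (3, 5, "NP"), (6, 7, "NP")}
predicted = {(0, 2, "NP"), (3, 4, "NP")}

missed = gold - predicted      # hurts recall
incorrect = predicted - gold   # hurts precision
correct = predicted & gold     # the true positives

print("missed:", sorted(missed))
print("incorrect:", sorted(incorrect))
print("correct:", sorted(correct))
```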

Page 13: Missed vs. Incorrect

Page 14: What is a Good Chunking Baseline?
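One common NP-chunking baseline (an assumption here, not necessarily the one shown in lecture) is to mark every maximal run of noun-like tags as a chunk, ignoring context entirely:

```python
# Baseline sketch: any maximal run of determiner/adjective/noun tags
# becomes an NP chunk. The tag set below is illustrative Penn Treebank.

NOMINAL = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS", "PRP"}

def baseline_chunk(tagged):
    """Group maximal runs of nominal tags into NP chunks (word lists)."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NOMINAL:
            current.append(word)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
        ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(baseline_chunk(sent))  # [['the', 'little', 'dog'], ['the', 'cat']]
```

A trained chunker should be compared against something like this, since a trivial rule already recovers many NPs.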


Page 16: The Tree Data Structure
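A chunk parser returns a tree: a labeled node whose children are either (word, tag) leaves or subtrees. The minimal class below mimics the spirit of NLTK's Tree; the names and methods are illustrative, not NLTK's actual API:

```python
# A labeled tree node. Children are either (word, tag) tuples (leaves)
# or nested Tree instances (chunks).

class Tree:
    def __init__(self, label, children):
        self.label = label
        self.children = children

    def leaves(self):
        """Flatten the tree back into its (word, tag) sequence."""
        out = []
        for c in self.children:
            out.extend(c.leaves() if isinstance(c, Tree) else [c])
        return out

    def __repr__(self):
        kids = " ".join(
            repr(c) if isinstance(c, Tree) else "/".join(c)
            for c in self.children)
        return f"({self.label} {kids})"

t = Tree("S", [Tree("NP", [("the", "DT"), ("dog", "NN")]),
               ("barked", "VBD")])
print(t)           # (S (NP the/DT dog/NN) barked/VBD)
print(t.leaves())  # [('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD')]
```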

Page 17: Baseline Code (continued)

Page 18: Evaluating the Baseline

Page 19: Cascaded Chunking
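The idea behind cascaded chunking is to run chunk rules in passes, with later passes free to match chunks built by earlier ones. A sketch with assumed, illustrative rules (level 1 builds NPs; level 2 folds an IN preposition plus a following NP into a PP):

```python
def is_chunk(item):
    """A chunk is (label, children); a plain token is (word, tag)."""
    return isinstance(item[1], list)

def np_pass(tokens, nominal=("DT", "JJ", "NN", "NNS")):
    """Level 1: wrap each maximal run of nominal tags as an NP chunk."""
    out, run = [], []
    for tok in tokens:
        if tok[1] in nominal:
            run.append(tok)
        else:
            if run:
                out.append(("NP", run))
                run = []
            out.append(tok)
    if run:
        out.append(("NP", run))
    return out

def pp_pass(seq):
    """Level 2: fold each IN token followed by an NP chunk into a PP."""
    out, i = [], 0
    while i < len(seq):
        if (i + 1 < len(seq) and not is_chunk(seq[i])
                and seq[i][1] == "IN"
                and is_chunk(seq[i + 1]) and seq[i + 1][0] == "NP"):
            out.append(("PP", [seq[i], seq[i + 1]]))
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

sent = [("the", "DT"), ("dog", "NN"), ("in", "IN"),
        ("the", "DT"), ("park", "NN"), ("barked", "VBD")]
chunked = pp_pass(np_pass(sent))
print(chunked)  # NP, then a PP containing the second NP, then the verb
```

The cascade matters: the PP rule can only fire because the NP pass already grouped "the park" into a single unit.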


Page 21: Next Time

Summarization