Novelty Detection in Repeated MEAD Summarization
Richard Murphy, EECS 597, 06 December 2002



The Problem with MEAD

Works well for one-time summaries
    Summaries produced are readable, fairly informative

News stories are on-going, not one-time
    New, relevant articles may appear after cluster is summarized

Expanded cluster will include new information

Second summary of a cluster will include lots of known information

New information often demoted--further from centroid

Repeated summaries lose value
    Reader can be assumed to remember past summaries
    Most informative summary will focus on new information with only brief repetition of key points
    More repetition = Less new information = Less useful summary


[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
[3] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[4] Several storeys of the building were engulfed in fire, she said.
[5] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[6] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[7] The building houses government offices and is next to the city's central train station.

[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[3] The building houses government offices and is next to the city's central train station.
[4] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[5] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[6] The Pirelli Building in Milan, Italy, was hit by a small plane.
[7] (ABCNEWS.com) -- A small plane crashed into a skyscraper in downtown Milan today, setting several floors of the 30-story building on fire.
[8] The plane crashed into the 25th floor of the Pirelli building in downtown Milan.
[9] A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported.
[10] WITNESSES REPORTED hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station.
[11] Italian state television said the crash put a hole in the 25th floor of the Pirelli building.
[12] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.


Solution: MEAD with a memory

Save summaries with cluster information

When summarizing cluster in future, check for archived summaries

During reranking, compare sentences to sentences in old summaries

Existing default-reranker.pl module compares sentences in summary to each other using cosine similarity metric, eliminates those that are too similar to other sentences in the summary

After this process, use cosine similarity to demote sentences in new summary that are too similar to sentences in old summary

Don’t completely eliminate sentences similar to known information--if user requests large enough summary, “background” (already seen) information should appear lower in new summary

User specific
    In a MEAD-based system like NewsInEssence, users could log in to get updated summaries of on-going stories
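
The demotion step described on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the actual default-reranker.pl code: the function names `cosine_similarity` and `demote_known` are hypothetical, the similarity uses raw term counts (MEAD's own metric may weight terms differently), and the defaults of 0.7 and 0.1 are the cutoff and demotion amount quoted in the results settings.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two sentences.

    Assumption: plain word counts, no IDF weighting or stop-word removal.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def demote_known(scored, old_summary, threshold=0.7, penalty=0.1):
    """Lower the score of sentences that resemble an archived summary sentence.

    scored: list of (sentence, score) pairs left after the normal reranking.
    old_summary: sentences from previously archived summaries.
    Sentences are demoted rather than dropped, so already-seen background
    can still appear near the bottom of a long enough summary.
    """
    result = []
    for sentence, score in scored:
        seen = any(
            cosine_similarity(sentence, old) >= threshold for old in old_summary
        )
        result.append((sentence, score - penalty if seen else score))
    # Highest score first; ties keep their original order.
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

With an old summary containing "Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.", the near-duplicate "Italian state television said the crash put a hole in the 25th floor of the Pirelli building." would be demoted under these assumptions, while the witness sentence about the explosion would keep its score.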


Evaluating Multiple Summaries

Evaluation of single (first) summary
    Create manual extract from current cluster
    Run meadeval.pl to calculate precision/recall/kappa of automated summary

Evaluation of subsequent summaries
    Create manual extract from current cluster and past automated summaries (not past manual summaries--reader will have seen the automated output)
    Run meadeval.pl

Always use the cluster which was available to MEAD at time of automated summarization
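
For readers unfamiliar with the three measures, the sketch below shows one common way to compute them for a pair of sentence extracts. It is an illustration only: `extract_agreement` is a hypothetical helper, the kappa is a Cohen-style formulation assumed here (the slides do not give meadeval.pl's exact definition), and the counts in the example are invented rather than taken from the test cluster.

```python
def extract_agreement(auto_ids, manual_ids, cluster_size):
    """Return (precision, recall, kappa) for two sentence extracts.

    auto_ids / manual_ids: identifiers of the sentences each extract selected.
    cluster_size: number of sentences in the cluster both were drawn from.
    """
    auto, manual = set(auto_ids), set(manual_ids)
    both = len(auto & manual)
    precision = both / len(auto)
    recall = both / len(manual)

    # Observed agreement: sentences both picked plus sentences both skipped.
    neither = cluster_size - len(auto | manual)
    p_observed = (both + neither) / cluster_size

    # Chance agreement from each extract's selection rate (Cohen-style).
    p_auto = len(auto) / cluster_size
    p_manual = len(manual) / cluster_size
    p_chance = p_auto * p_manual + (1 - p_auto) * (1 - p_manual)

    kappa = (p_observed - p_chance) / (1 - p_chance)
    return precision, recall, kappa


# Hypothetical counts: two 7-sentence extracts sharing 4 sentences,
# drawn from a 100-sentence cluster.
precision, recall, kappa = extract_agreement(range(0, 7), range(3, 10), 100)
print(round(precision, 3), round(recall, 3), round(kappa, 3))
```

Under this formulation, precision and recall coincide whenever the two extracts contain the same number of sentences, and kappa drops below zero when the overlap is smaller than chance alone would predict.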


Comparing MEAD to MEAD with memory

System            Summary  Precision           Recall              Kappa
Default MEAD      Initial  0.571428571428571   0.571428571428571   0.539170506912442
Default MEAD      Second   0.25                0.25                0.147727272727273
Default MEAD      Third    0.0833333333333333  0.0833333333333333  -0.0416666666666663
MEAD with memory  Initial  0.571428571428571   0.571428571428571   0.539170506912442
MEAD with memory  Second   0.333333333333333   0.333333333333333   0.242424242424242
MEAD with memory  Third    0.833333333333333   0.833333333333333   0.81060606060606

Settings: demote on cosine-similarity >= 0.7, demote by 0.1 points


Remaining / Future Work

More testing
    More test clusters
    Different values of demotion increment, demotion similarity cutoff
    Command-line options for demotion settings
    Varying levels of demotion based on position in old summary

Multiple users
    Currently assumes cluster belongs to an individual user
    Add command-line identification of user so that multiple users can summarize cluster without being affected by each other’s archives

NewsInEssence interface
    Remember website visitors, keep unique archives for each
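
One possible shape for the per-user archives mentioned above is sketched below. Everything in it is an assumption made for illustration: the slides do not describe a storage layout, so the directory scheme (`root/user/cluster.json`), the JSON format, and the names `archive_path`, `load_summaries`, and `save_summary` are all hypothetical.

```python
import json
from pathlib import Path


def archive_path(root: Path, user: str, cluster: str) -> Path:
    """Location of one user's archive for one cluster (hypothetical layout)."""
    return Path(root) / user / f"{cluster}.json"


def load_summaries(root: Path, user: str, cluster: str) -> list:
    """Return this user's past summaries for the cluster, oldest first.

    Each summary is a list of sentences; an unseen cluster yields [].
    """
    path = archive_path(root, user, cluster)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_summary(root: Path, user: str, cluster: str, sentences) -> None:
    """Append a newly produced summary to this user's archive."""
    path = archive_path(root, user, cluster)
    path.parent.mkdir(parents=True, exist_ok=True)
    history = load_summaries(root, user, cluster)
    history.append(list(sentences))
    path.write_text(json.dumps(history), encoding="utf-8")
```

Keying the archive on both user and cluster means one reader's history never affects the demotion applied to another reader's summary of the same cluster.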