Novelty Detection in Repeated MEAD Summarization
Richard Murphy
EECS 597
06 December 2002
The Problem with MEAD
Works well for one-time summaries
Summaries produced are readable and fairly informative
News stories are ongoing, not one-time
New, relevant articles may appear after a cluster has been summarized
An expanded cluster will include new information
A second summary of the cluster will include a lot of already-known information
New information is often demoted, because it lies further from the centroid
Repeated summaries lose value
The reader can be assumed to remember past summaries
The most informative summary will focus on new information, with only brief repetition of key points
More repetition = less new information = less useful summary
Earlier summary of the Milan plane-crash cluster:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
[3] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[4] Several storeys of the building were engulfed in fire, she said.
[5] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[6] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[7] The building houses government offices and is next to the city's central train station.
Later summary, after new articles were added to the cluster:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[3] The building houses government offices and is next to the city's central train station.
[4] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[5] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[6] The Pirelli Building in Milan, Italy, was hit by a small plane.
[7] (ABCNEWS.com) -- A small plane crashed into a skyscraper in downtown Milan today, setting several floors of the 30-story building on fire.
[8] The plane crashed into the 25th floor of the Pirelli building in downtown Milan.
[9] A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported.
[10] WITNESSES REPORTED hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station.
[11] Italian state television said the crash put a hole in the 25th floor of the Pirelli building.
[12] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
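Six of the 12 sentences in the longer summary repeat the 7-sentence one verbatim, and others only paraphrase it: sentence [11] of the longer summary restates sentence [5] of the shorter one. A bag-of-words cosine similarity, the kind of metric MEAD's reranker uses, scores such paraphrases high as well. The following is a minimal sketch, assuming raw lower-cased word counts with no stemming, stop-word removal, or idf weighting; MEAD's own sentence vectors may be weighted differently, so its scores will differ.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased word counts (assumption: no stemming, stop words, or idf)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


first = ("Italian TV says the crash put a hole in the 25th floor of the Pirelli "
         "building, and that smoke is pouring from the opening.")
second = ("Italian state television said the crash put a hole in the 25th floor "
          "of the Pirelli building.")

print(round(cosine(bag_of_words(first), bag_of_words(second)), 3))  # prints 0.799
```

A verbatim repeat scores 1.0, so both kinds of repetition can be caught with a single similarity cutoff rather than exact string matching.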
Solution: MEAD with a memory
Save summaries along with the cluster information
When summarizing a cluster in the future, check for archived summaries
During reranking, compare sentences to the sentences in old summaries
The existing default-reranker.pl module compares the sentences in a summary to each other using a cosine similarity metric and eliminates those that are too similar to other sentences in the summary
After this step, use cosine similarity to demote sentences in the new summary that are too similar to sentences in an old summary
Don’t completely eliminate sentences similar to known information--if the user requests a large enough summary, “background” (already seen) information should still appear, lower in the new summary
User-specific
In a MEAD-based system like NewsInEssence, users could log in to get updated summaries of ongoing stories
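The two reranking steps above can be sketched as follows. This is a minimal illustration, not the actual default-reranker.pl or its memory extension: it assumes sentences arrive as (text, score) pairs from MEAD's scoring stage and uses raw word-count vectors for cosine similarity. The demotion cutoff of 0.7 and the demotion of 0.1 points are the settings reported with the results; `redundancy_cutoff` is a hypothetical placeholder, since default-reranker.pl's own threshold is not given here. Demoting a sentence only once, however many old sentences it resembles, is also an assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    # Lower-cased word counts; a stand-in for MEAD's own sentence vectors.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def rerank_with_memory(scored, old_summaries, redundancy_cutoff=0.7,
                       demote_cutoff=0.7, demote_by=0.1):
    """scored: list of (sentence, score); old_summaries: list of lists of sentences.

    Returns (sentence, score) pairs, best first; take the top k for a k-sentence summary.
    redundancy_cutoff is a hypothetical value; demote_cutoff/demote_by follow the slides.
    """
    # Step 1 (the existing reranker's job): walk sentences from best to worst and
    # drop any that is too similar to a sentence already kept.
    kept = []
    for sentence, score in sorted(scored, key=lambda p: p[1], reverse=True):
        vec = bag_of_words(sentence)
        if all(cosine(vec, kept_vec) < redundancy_cutoff for _, _, kept_vec in kept):
            kept.append((sentence, score, vec))

    # Step 2 (memory): demote, but never drop, sentences that resemble any
    # sentence the reader has already seen in an archived summary.
    seen = [bag_of_words(s) for summary in old_summaries for s in summary]
    reranked = []
    for sentence, score, vec in kept:
        if any(cosine(vec, old_vec) >= demote_cutoff for old_vec in seen):
            score -= demote_by
        reranked.append((sentence, score))
    reranked.sort(key=lambda p: p[1], reverse=True)
    return reranked
```

Because step 2 only lowers scores, a request for a long enough summary still reaches the demoted "background" sentences, just further down the list.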
Evaluating Multiple Summaries
Evaluation of the single (first) summary
Create a manual extract from the current cluster
Run meadeval.pl to calculate the precision, recall, and kappa of the automated summary
Evaluation of subsequent summaries
Create a manual extract from the current cluster and the past automated summaries (not past manual summaries--the reader will have seen the automated output)
Run meadeval.pl
Always use the cluster that was available to MEAD at the time of automated summarization
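meadeval.pl's internals are not described in these slides, so the following is only a minimal sketch of the three measures under stated assumptions: each extract is a set of sentence IDs from the cluster, and kappa is taken to be Cohen's kappa over the per-sentence include/exclude decision, which requires the cluster size N. The name `evaluate_extract` is hypothetical and not part of MEAD.

```python
def evaluate_extract(system, manual, cluster_size):
    """Precision, recall, and Cohen's kappa for two sentence extracts.

    system, manual: iterables of sentence IDs chosen by MEAD and by the human judge.
    cluster_size: number of sentences in the cluster MEAD saw (N in the kappa formula).
    Assumption: this mirrors meadeval.pl's measures, whose code is not shown here.
    """
    system, manual = set(system), set(manual)
    both = len(system & manual)
    precision = both / len(system) if system else 0.0
    recall = both / len(manual) if manual else 0.0

    n = cluster_size
    # Observed agreement: sentences both sides selected plus sentences both rejected.
    p_observed = (both + n - len(system | manual)) / n
    # Chance agreement from each side's selection rate.
    p_sys, p_man = len(system) / n, len(manual) / n
    p_expected = p_sys * p_man + (1 - p_sys) * (1 - p_man)
    # If chance agreement is total (both select all or nothing), treat it as perfect agreement.
    kappa = 1.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)
    return precision, recall, kappa


precision, recall, kappa = evaluate_extract(range(7), [0, 1, 2, 3, 7, 8, 9], cluster_size=100)
```

As a consistency check only: with 7 sentences chosen by each side, 4 of them shared, and an assumed N of 100, the sketch gives precision = recall = 4/7 and kappa of about 0.5392, matching the initial-summary figures reported below; the actual cluster sizes are not given. Following the last rule above, N should be the size of the cluster MEAD saw at summarization time, not the current cluster.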
Comparing MEAD to MEAD with memory
System            Summary   Precision           Recall              Kappa
Default MEAD      Initial   0.571428571428571   0.571428571428571   0.539170506912442
Default MEAD      Second    0.25                0.25                0.147727272727273
Default MEAD      Third     0.0833333333333333  0.0833333333333333  -0.0416666666666663
MEAD with memory  Initial   0.571428571428571   0.571428571428571   0.539170506912442
MEAD with memory  Second    0.333333333333333   0.333333333333333   0.242424242424242
MEAD with memory  Third     0.833333333333333   0.833333333333333   0.81060606060606
Settings for MEAD with memory: a sentence is demoted if its cosine similarity to a sentence in an old summary is >= 0.7, and the demotion is 0.1 points
Remaining / Future Work
More testing
More test clusters
Different values for the demotion increment and the demotion similarity cutoff
Command-line options for demotion settings
Varying levels of demotion based on position in the old summary
Multiple users
Currently assumes the cluster belongs to an individual user
Add command-line identification of the user so that multiple users can summarize a cluster without being affected by each other’s archives
NewsInEssence interface
Remember website visitors and keep a unique archive for each
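The per-user archive these items call for could be as simple as one JSON file per user and cluster. A minimal sketch; the directory layout and the `SummaryArchive` class are hypothetical and are not MEAD's or NewsInEssence's actual storage.

```python
import json
from pathlib import Path


class SummaryArchive:
    """Hypothetical on-disk archive: <root>/<user>/<cluster>.json holds a list of
    past summaries (each a list of sentences), oldest first.

    User and cluster names are used directly as path components, so a real
    deployment would need to sanitize them first.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, user, cluster):
        return self.root / user / f"{cluster}.json"

    def load(self, user, cluster):
        """Return this user's past summaries of the cluster ([] if none archived)."""
        path = self._path(user, cluster)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, user, cluster, summary):
        """Append a newly produced summary to this user's archive for the cluster."""
        summaries = self.load(user, cluster)
        summaries.append(list(summary))
        path = self._path(user, cluster)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(summaries), encoding="utf-8")
```

Keying by user means one reader's archive never demotes sentences for another reader. A user with no archive gets an empty list, so no demotion applies to their first summary.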