Comparing Twitter Summarization Algorithms for Multiple Post Summaries

David Inouye and Jugal K. Kalita, SocialCom 2011

2013 May 10, Hyewon Lim

Outline
– Introduction
– Related Work
– Problem Definition
– Selected Approaches for Twitter Summaries
– Experimental Setup
– Results and Analysis
– Conclusion

Introduction Motivation of the summarizer

Introduction Prior work

– “A torch extinguished: Ted Kennedy dead at 77.”
– “A legend gone: Ted Kennedy died of brain cancer.”
– “Ted Kennedy was a leader.”
– “Ted Kennedy died today.”

B. Sharifi et al., “Automatic Summarization of Twitter Topics”

Introduction Prior work (cont.)

– “A torch extinguished: Ted Kennedy dead at 77.”
– “A legend gone: Ted Kennedy died of brain cancer.”
– “Ted Kennedy was a leader.”
– “Ted Kennedy died today.”

Best final summary: Ted Kennedy died

B. Sharifi et al., “Automatic Summarization of Twitter Topics”

Introduction We create summaries that contain multiple posts

– Several sub-topics or themes in a specified topic

Related Work Text summarization

– Reduce the amount of content to read
– Reduce the number of features required for classifying or clustering

Multi-document summarization
– Potential redundancy

Algorithms
– SumBasic, Centroid, LexRank, TextRank, MEAD, …

Related Work SumBasic

Centroid

“A torch extinguished: Ted Kennedy dead at 77.”
“A legend gone: Ted Kennedy died of brain cancer.”
“Ted Kennedy was a leader.”
“Ted Kennedy died today.”

Ted Kennedy died

(D. R. Radev et al., “Centroid-based summarization of multiple documents”)

Related Work LexRank

– Adjacency matrix for computing the relative importance of sentences

TextRank
– Find the most highly ranked sentences using PageRank

Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.
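The graph-based ranking idea shared by LexRank and TextRank can be sketched as a PageRank-style iteration over a sentence-similarity graph. This is only an illustration, not the exact formulation of either method: the word-overlap similarity, the damping factor of 0.85, the fixed iteration count, and the function names are all assumptions made for this example.

```python
import math

def overlap_similarity(a, b):
    """Word-overlap similarity between two token lists (an assumed, simple measure)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    shared = len(set(a) & set(b))
    return shared / (math.log(len(a)) + math.log(len(b)))

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Return one PageRank-style score per sentence."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    # Weighted adjacency matrix: edge weight = similarity between two sentences.
    weights = [[0.0 if i == j else overlap_similarity(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to the edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

Sentences that share words with many other sentences accumulate score; a sentence with no overlap keeps only the baseline share (1 - damping) / n.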

Problem Definition Given

– A topic keyword or phrase T
– Length k for the summary

Output
– A set of representative posts S with a cardinality of k

such that
1) ∀s ∈ S, T is in the text of s
2) ∀s_i, s_j ∈ S with i ≠ j, s_i ≁ s_j (no two selected posts are similar to each other)
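The two output conditions can be written as a small validity check. This is a sketch under assumptions: the slides do not say how the similarity relation is measured, so Jaccard word overlap with a threshold of 0.7 stands in for it here, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of the word sets of two posts (an assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def is_valid_summary(summary, topic, k, threshold=0.7):
    """Check |S| = k, that every post contains the topic, and that no two posts are similar."""
    if len(summary) != k:
        return False
    # Condition 1: the topic phrase T appears in the text of every post.
    if any(topic.lower() not in post.lower() for post in summary):
        return False
    # Condition 2: no pair of distinct posts is similar (threshold is an assumption).
    for i in range(len(summary)):
        for j in range(i + 1, len(summary)):
            if jaccard(summary[i], summary[j]) >= threshold:
                return False
    return True
```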

Selected Approaches for Twitter Summaries TF-IDF

(Term frequency) * (Inverse document frequency)

A microblog post is not a traditional document
– Define a single document that encompasses all the posts => IDF↓
– Define each post as a document => TF↓
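A small example makes the two failure modes concrete. It is a sketch assuming whitespace tokenization and a log2-based IDF; the helper name and the sample posts (adapted from the Ted Kennedy example) are chosen for illustration only.

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """Classic TF-IDF of `term` in `doc`, where `docs` is the whole corpus."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    idf = math.log2(len(docs) / df) if df else 0.0
    return tf * idf

posts = [
    "ted kennedy died today",
    "ted kennedy was a leader",
    "a legend gone ted kennedy died of brain cancer",
]
tokenized = [p.lower().split() for p in posts]

# Option 1: one document holding every post. Any term occurs in 1 of 1
# documents, so IDF = log2(1/1) = 0 and every weight collapses to zero.
single_document = [[w for d in tokenized for w in d]]
option1_weight = tf_idf("kennedy", single_document[0], single_document)

# Option 2: each post is a document. Posts are so short that a term's
# frequency within a post is almost always 0 or 1, so TF carries little signal.
option2_tf = [Counter(d)["died"] for d in tokenized]
```

Here `option1_weight` is zero for every term, and `option2_tf` never exceeds 1, which is the motivation for mixing the two definitions.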


Selected Approaches for Twitter Summaries Hybrid TF-IDF

– Define a document as a single post when computing the IDF
– When computing the term frequencies, assume the document is the entire collection of posts

Select the top k most weighted posts
– Cosine similarity for avoiding redundancy
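A minimal sketch of this hybrid weighting and selection might look as follows. Several details are assumptions not stated on the slide: whitespace tokenization, a log2 IDF, dividing a post's weight by its length (with a hypothetical `min_length` floor), cosine similarity over raw word counts, and a similarity threshold of 0.7.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists using raw word counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_tfidf_summary(posts, k, sim_threshold=0.7, min_length=1):
    docs = [p.lower().split() for p in posts]
    all_words = [w for d in docs for w in d]
    total = len(all_words)
    tf = Counter(all_words)                        # TF over the entire collection
    df = Counter(w for d in docs for w in set(d))  # DF with each post as a document
    n = len(docs)
    weight = {w: (tf[w] / total) * math.log2(n / df[w]) for w in tf}

    def score(doc):
        # Length normalization is an assumption; it keeps long posts from dominating.
        return sum(weight[w] for w in doc) / max(min_length, len(doc))

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    chosen = []
    for i in ranked:
        # Skip posts that are too similar to one already selected.
        if all(cosine(docs[i], docs[j]) < sim_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [posts[i] for i in chosen]
```

With a duplicated post in the input, the cosine check ensures that only one copy can appear in the summary.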

Selected Approaches for Twitter Summaries Cluster summarizer

1. Cluster the tweets into k clusters based on a similarity measure
2. Summarize each cluster by picking the most weighted post

Bisecting k-means++ algorithm
– Bisecting k-means

– k-means++: choose the next centroid c_i by selecting c_i = v' ∈ V with probability D(v')² / Σ_{v ∈ V} D(v)², where D(v) is the distance from v to the closest centroid already chosen
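The k-means++ seeding step can be sketched as below: each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far, which is the standard k-means++ weighting. The sketch assumes points are numeric tuples compared with Euclidean distance (the summarizer itself would compare posts with a text similarity measure), and the function names and fixed random seed are illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids from `points` using k-means++ weighting."""
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # D(v)^2: squared distance from each point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to a uniform choice.
            centroids.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to be selected.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

Because already-chosen points have weight zero, the seeds tend to spread across the data instead of clustering around an unlucky first pick.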

Selected Approaches for Twitter Summaries k-means++

k-means

Outlier problem

k-means++

http://blog.sragent.pe.kr/

Selected Approaches for Twitter Summaries Algorithms to compare results

– Baseline: random summarizer, most recent summarizer

– SumBasic: depends only on the frequency of words

– MEAD: comparison between the more structured document domain and Twitter

– Graph-based methods: LexRank, TextRank
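Since SumBasic relies only on word frequencies, it fits in a few lines. The version below is a simplified sketch: it scores each post by the average probability of its words and squares the probabilities of words already used, which discourages redundancy. Whitespace tokenization, the absence of stop-word removal, and the function name are assumptions for this example.

```python
from collections import Counter

def sumbasic(posts, k):
    """Select k posts by average word probability, damping words already covered."""
    docs = [p.lower().split() for p in posts]
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    remaining = [i for i, d in enumerate(docs) if d]  # ignore empty posts
    chosen = []
    while remaining and len(chosen) < k:
        # Score = mean probability of the words in the post.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in docs[i]) / len(docs[i]))
        chosen.append(best)
        remaining.remove(best)
        # Square the probability of every word just used so that
        # later picks favour content not yet covered.
        for w in set(docs[best]):
            prob[w] **= 2
    return [posts[i] for i in chosen]
```

For the posts "ted kennedy died", "ted kennedy died today" and "weather is nice" with k = 2, the first post is selected, its words are damped, and the third post then outranks the near-duplicate second one.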

Experimental Setup Data collection

– 5 consecutive days
– Top ten currently trending topics every day
– Approximately 1500 tweets for each topic

ROUGE
– Automated summary vs. manual summaries

Choice of k
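The ROUGE comparison of an automated summary against manual summaries can be illustrated with a unigram (ROUGE-1 style) score. This is a hedged sketch and not the ROUGE package itself: it assumes whitespace tokenization, no stemming or stop-word handling, and pools the matches over all manual summaries; the function name is hypothetical.

```python
from collections import Counter

def rouge_1(candidate, references):
    """Return (precision, recall, f_measure) of unigram overlap with the references."""
    cand = Counter(candidate.lower().split())
    matches = cand_total = ref_total = 0
    for ref in references:
        ref_counts = Counter(ref.lower().split())
        # Clipped overlap: a word counts at most as often as it appears on both sides.
        matches += sum((cand & ref_counts).values())
        ref_total += sum(ref_counts.values())
        cand_total += sum(cand.values())
    precision = matches / cand_total if cand_total else 0.0
    recall = matches / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Recall rewards covering the words of the manual summaries, precision penalizes extra words in the automated summary, and the F-measure balances the two.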

Results and Analysis Average F-measure, precision and recall

Results and Analysis Average score for human evaluation

Results and Analysis Paired two-sided T-test

[Figure: recall and precision for the Hybrid TF-IDF and SumBasic summarizers]

Conclusion The best techniques for summarizing Twitter topics

– Simple word frequency
– Redundancy reduction

Simple algorithms seem to perform well
– Not clear that added complexity will improve the quality of the summaries

Extension
– Extrinsic evaluations (e.g., user survey)
– Dynamically discovering a good value for k for k-means
– Detect named entities and events in the documents
