Upload
cody-buntain
View
111
Download
0
Tags:
Embed Size (px)
Citation preview
Discovering Credible Events in Near Real Time from Social Media Streams
WWW2015Cody Buntain, @codybuntain, [email protected] by Dr. Jen Golbeck
Social networks are
powerful resources
during crises.
2 Carvin, A. "The 2008 Mumbai Attacks As They Happened On Twitter." Storify.com. https://storify.com/acarvin/the-2008-mumbai-attacks-as-they-happened-on-twitte
Social networks are
powerful resources
during crises.
3 B. Parr. “#IranElection Crisis: A Social Media Timeline.” 21 June 2009. http://mashable.com/2009/06/21/iran-election-timeline/
Social networks are
powerful resources
during crises.
4Huffington Post. "Chile Earthquake PICTURES: Twitter Photos Record The Wreckage In Chile (PHOTOS)."
http://www.huffingtonpost.com/2010/02/27/chile-earthquake-pictures_n_479535.html
Social networks are
powerful resources
during crises.
5 Alex Nunns, Nadia Idle. "Tweets from Tahrir: Egypt's Revolution as it Unfolded, in the Words of the People who Made it." OR Books. 2011.
Social networks are
powerful resources
during crises.
6
E. Guskin, P. Hitlin. "Hurricane Sandy and Twitter." November 6, 2012. http://www.journalism.org/2012/11/06/hurricane-sandy-and-twitter/
C. Ngak. "Social media a news source and tool during Superstorm Sandy." October 30, 2012. http://www.cbsnews.com/news/social-media-a-news-source-and-tool-during-superstorm-sandy/
Social networks are
powerful resources
during crises.
7C. Kanalley. "Boston Marathon Bombing Timeline: The Week in 50 Tweets, 5 Videos." Huffington Post. 21 April 2013. http://www.huffingtonpost.com/craig-kanalley/boston-marathon-bombing-timeline_b_3125721.html
Social networks are
powerful resources
during crises.
7C. Kanalley. "Boston Marathon Bombing Timeline: The Week in 50 Tweets, 5 Videos." Huffington Post. 21 April 2013. http://www.huffingtonpost.com/craig-kanalley/boston-marathon-bombing-timeline_b_3125721.html
Social media is not completely
trustworthy.
13
C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.
http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/
ZOYLJpEydYgJ8UYNUT674H/story.html
Social media is not completely
trustworthy.
13
C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.
http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/
ZOYLJpEydYgJ8UYNUT674H/story.html
Social media is not completely
trustworthy.
13
C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.
http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/
ZOYLJpEydYgJ8UYNUT674H/story.html
Social media is not completely
trustworthy.
14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html
Social media is not completely
trustworthy.
14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html
Social media is not completely
trustworthy.
14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html
Social media is not completely
trustworthy.
14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html
By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.
17
Thesis Statement
By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.
18
Thesis Statement
By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.
19
Thesis Statement
By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.
20
Thesis Statement
Completed Work
22 * www.iflscience.com
Discovering Interesting Moments with
Burst Detection
Typical event detection
systems track human-
generated, predefined keywords.
23
Tweets per second mentioning “gol copa, gool copa, goool, golaço” during the match June 12th, 2014 [1]
Tweets per hour related to earthquakes [2]
[1] L. Cipriani, “Goal! Detecting the most important World Cup moments,” Twitter Developer’s Blog. 23 June 2014. https://blog.twitter.com/2014/goal-detecting-the-most-important-world-cup-moments [2] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes Twitter users: real-time event detection by social sensors,” in Proceedings of the 19th international conference on World wide web, 2010, pp. 851–860.
Weaknesses
• Unable to detect unanticipated events
• Disregards languages not represented in the keyword list
25
Weaknesses
• Unable to detect unanticipated events
• Disregards languages not represented in the keyword list
• Often requires language models for text normalization
25
Weaknesses
• Unable to detect unanticipated events
• Disregards languages not represented in the keyword list
• Often requires language models for text normalization
• Limited insight into the context of the event
25
Step 1Identify Keywords
26
goal, score
Step 2
LABurst Algorithm
Find Bursts
goooal, 0-1, 0:1,1-0, gollll, holandaaaa, penal, penalti, persie
Can LABurst be useful in
other domains?
29
Earthquake Detection
Honshu, Japan Earthquake - 25 October 2013
Can LABurst be useful in
other domains?
29
Earthquake Detection
Honshu, Japan Earthquake - 25 October 2013
Iwaki, Japan Earthquake - 11 July 2014
Events occur across
different temporal and geographical
scales.
35
Japanese Earthquake, Oct. 2013
Boston Marathon, 15 Apr. 2013
Research Question:
How well do sports models transfer to non-sports events?
36
Event Date
Argentine Cacerolazo
Protest8 Nov. 2012
Boston Marathon Bombing
15 Apr. 2013
Westgate Mall Attack
21 Sept. 2013
Ukrainian Revolution
18-23 Feb. 2014
Ferguson, MO Protests
9-19 Aug. 2014
Research Question:
How can we track related moments?
39
Per-Half-Hour Bursty Tokens/Topics
Per-Minute Bursty Tokens
argentina
deutschland
germany
arg
ger
worldcup
goalkick
gol
goaaaalllgoalll
gooolll
goetze
Semantic Similarity
46M. O
sbor
ne, S
. Mor
an, R
. McC
read
ie, A
. Von
Lun
en,
M. S
ykor
a, E
. Can
o, N
. Ire
son,
C. M
acdo
nald
, I. O
unis
, Y.
He,
and
oth
ers,
“Rea
l-Tim
e D
etec
tion,
Tra
ckin
g, a
nd
Mon
itorin
g of
Aut
omat
ical
ly D
isco
vere
d Ev
ents
in
Soci
al M
edia
,” As
soc.
Com
put.
Ling
uist
., 20
14.
English “Security-Related” Keywords
One needs confidence in information
used to make critical
decisions.
52 http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news
One needs confidence in information
used to make critical
decisions.
52
“The Most Busted Name in News”
http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news
One needs confidence in information
used to make critical
decisions.
52
“The Most Busted Name in News”
http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news
[1] N. Diakopoulos, M. De Choudhury, and M. Naaman, “Finding and assessing social media information sources in the context of journalism,” in Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2451–2460.
Most credibility analyses focus
on people rather than information.
53
[1] N. Diakopoulos, M. De Choudhury, and M. Naaman, “Finding and assessing social media information sources in the context of journalism,” in Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2451–2460.
Most credibility analyses focus
on people rather than information.
53
56
The “majority of content generated on Twitter during the [Mumbai terror attacks]
comes from non authority users.” [1]
56
The “majority of content generated on Twitter during the [Mumbai terror attacks]
comes from non authority users.” [1]
[1] A. Gupta and P. Kumaraguru, “Twitter explodes with activity in mumbai blasts! a lifeline or an unmonitored daemon in the lurking?,” 2012.
Research Question:
How can we evaluate the credibility of aggregate
event information?
57 http://davedecampblog.wordpress.com/shareyourstory/wordclouds/
58
Moment 8
Moment 7
Moment 6
Moment 5
Moment 4
Moment 3
Moment 2
Moment 1
0 22.5 45 67.5 90
Credibility Ranking
Research Question:
How can we evaluate the credibility of aggregate
event information?
59
Sparse versus Dense Networks
Moment 1 Moment 2
Research Question:
How can we evaluate the credibility of aggregate
event information?
59
Sparse versus Dense Networks
More Credible
Less Credible
Moment 1 Moment 2
Research Question:
How can we evaluate the credibility of aggregate
event information?
60 [1] M. Mendoza, B. Poblete, and C. Castillo, “Twitter Under Crisis: Can We Trust What We RT?,” in Proceedings of the First Workshop on Social Media Analytics, 2010, pp. 71–79.
“… false rumors tend to be questioned much more than
confirmed truths…” [1]
Discord
Moment 1 Moment 2
Research Question:
How can we evaluate the credibility of aggregate
event information?
60 [1] M. Mendoza, B. Poblete, and C. Castillo, “Twitter Under Crisis: Can We Trust What We RT?,” in Proceedings of the First Workshop on Social Media Analytics, 2010, pp. 71–79.
“… false rumors tend to be questioned much more than
confirmed truths…” [1]
Discord
Moment 1 Moment 2
More Credible
Less Credible
Summary of Proposed Work
• Extending LABurst • Adapting sports models to crisis-level events• Tracking related moments• Detecting and tracking events from streaming sources at scale
62
Summary of Proposed Work
• Extending LABurst • Adapting sports models to crisis-level events• Tracking related moments• Detecting and tracking events from streaming sources at scale
• Credibility Analysis • Evaluating the credibility of aggregate event information in near
real time
62
By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.
63
Thesis Statement
Keyword bursts across
language barriers
65
Match Event Bursty Tokens
Brazil v. Netherlands, 12 July
2014
Netherlands' Van Persie scores a goal
on a penalty at 3', 1-0
0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,
holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red
Brazil v. Netherlands, 12 July
2014
Brazil's Oscar gets a yellow card at 68'
dive, juiz, penalty, ref
Germany v. Argentina, 13 July
2014
Germany’s Götze scores a goal at
113’, 1-0
goaaaaallllllll, goalllll, godammit,
goetze, gollllll, gooooool, gotze, gotzeeee, götze,
nooo, yessss, ドイツ
Keyword bursts across
language barriers
65
Match Event Bursty Tokens
Brazil v. Netherlands, 12 July
2014
Netherlands' Van Persie scores a goal
on a penalty at 3', 1-0
0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,
holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red
Brazil v. Netherlands, 12 July
2014
Brazil's Oscar gets a yellow card at 68'
dive, juiz, penalty, ref
Germany v. Argentina, 13 July
2014
Germany’s Götze scores a goal at
113’, 1-0
goaaaaallllllll, goalllll, godammit,
goetze, gollllll, gooooool, gotze, gotzeeee, götze,
nooo, yessss, ドイツ
Keyword bursts across
language barriers
65
Match Event Bursty Tokens
Brazil v. Netherlands, 12 July
2014
Netherlands' Van Persie scores a goal
on a penalty at 3', 1-0
0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,
holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red
Brazil v. Netherlands, 12 July
2014
Brazil's Oscar gets a yellow card at 68'
dive, juiz, penalty, ref
Germany v. Argentina, 13 July
2014
Germany’s Götze scores a goal at
113’, 1-0
goaaaaallllllll, goalllll, godammit,
goetze, gollllll, gooooool, gotze, gotzeeee, götze,
nooo, yessss, ドイツ