119
Discovering Credible Events in Near Real Time from Social Media Streams WWW2015 Cody Buntain, @codybuntain, [email protected] Advised by Dr. Jen Golbeck

Discovering Credible Events in Near Real Time from Social Media Streams

Embed Size (px)

Citation preview

Discovering Credible Events in Near Real Time from Social Media Streams

WWW2015Cody Buntain, @codybuntain, [email protected] by Dr. Jen Golbeck

Social networks are

powerful resources

during crises.

2

Social networks are

powerful resources

during crises.

2 Carvin, A. "The 2008 Mumbai Attacks As They Happened On Twitter." Storify.com. https://storify.com/acarvin/the-2008-mumbai-attacks-as-they-happened-on-twitte

Social networks are

powerful resources

during crises.

3 B. Parr. “#IranElection Crisis: A Social Media Timeline.” 21 June 2009. http://mashable.com/2009/06/21/iran-election-timeline/

Social networks are

powerful resources

during crises.

4Huffington Post. "Chile Earthquake PICTURES: Twitter Photos Record The Wreckage In Chile (PHOTOS)."

http://www.huffingtonpost.com/2010/02/27/chile-earthquake-pictures_n_479535.html

Social networks are

powerful resources

during crises.

5

Social networks are

powerful resources

during crises.

5 Alex Nunns, Nadia Idle. "Tweets from Tahrir: Egypt's Revolution as it Unfolded, in the Words of the People who Made it." OR Books. 2011.

Social networks are

powerful resources

during crises.

6

E. Guskin, P. Hitlin. "Hurricane Sandy and Twitter." November 6, 2012. http://www.journalism.org/2012/11/06/hurricane-sandy-and-twitter/

C. Ngak. "Social media a news source and tool during Superstorm Sandy." October 30, 2012. http://www.cbsnews.com/news/social-media-a-news-source-and-tool-during-superstorm-sandy/

Social networks are

powerful resources

during crises.

7C. Kanalley. "Boston Marathon Bombing Timeline: The Week in 50 Tweets, 5 Videos." Huffington Post. 21 April 2013. http://www.huffingtonpost.com/craig-kanalley/boston-marathon-bombing-timeline_b_3125721.html

Social networks are

powerful resources

during crises.

7C. Kanalley. "Boston Marathon Bombing Timeline: The Week in 50 Tweets, 5 Videos." Huffington Post. 21 April 2013. http://www.huffingtonpost.com/craig-kanalley/boston-marathon-bombing-timeline_b_3125721.html

8 http://srogers.cartodb.com/viz/64f6c0f4-745d-11e4-b4e1-0e4fddd5de28/embed_map

8 http://srogers.cartodb.com/viz/64f6c0f4-745d-11e4-b4e1-0e4fddd5de28/embed_map

11

Social media is not completely

trustworthy.

11

Social media is not completely

trustworthy.

12

Social media is not completely

trustworthy.

12

Social media is not completely

trustworthy.

12

Social media is not completely

trustworthy.

13

C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.

http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/

ZOYLJpEydYgJ8UYNUT674H/story.html

Social media is not completely

trustworthy.

13

C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.

http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/

ZOYLJpEydYgJ8UYNUT674H/story.html

Social media is not completely

trustworthy.

13

C. Reinwald. "What Twitter Got Wrong During the Week Following Last Year’s Boston Marathon." Boston.com. 18 April 2014.

http://www.boston.com/news/local/massachusetts/2014/04/18/what-twitter-got-wrong-during-the-boston-marathon-bombing-week/

ZOYLJpEydYgJ8UYNUT674H/story.html

Social media is not completely

trustworthy.

14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html

Social media is not completely

trustworthy.

14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html

Social media is not completely

trustworthy.

14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html

Social media is not completely

trustworthy.

14Lu Wang, Whitney Kisling and Eric Lam. "Fake Post Erasing $136 Billion Shows Markets Need Humans." Bloomberg.com. 23 April, 2013. http://www.bloomberg.com/news/2013-04-23/fake-report-erasing-136-billion-shows-market-s-fragility.html

How can we decide what is important and what to trust in these streams?

15

How can we decide what is important and what to trust in these streams?

15

How can we decide what is important and what to trust in these streams?

15

And can we make these decisions rapidly?

By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.

17

Thesis Statement

By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.

18

Thesis Statement

By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.

19

Thesis Statement

By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.

20

Thesis Statement

Two Research

Areas

21

Two Research

Areas

21

Event

Event

Event

EventRapid Event Discovery

Two Research

Areas

21

Event

Event

Event

EventRapid Event Discovery

Credibility Analysis

Completed Work

22 * www.iflscience.com

Discovering Interesting Moments with

Burst Detection

Typical event detection

systems track human-

generated, predefined keywords.

23

Tweets per second mentioning “gol copa, gool copa, goool, golaço” during the match June 12th, 2014 [1]

Tweets per hour related to earthquakes [2]

[1] L. Cipriani, “Goal! Detecting the most important World Cup moments,” Twitter Developer’s Blog. 23 June 2014. https://blog.twitter.com/2014/goal-detecting-the-most-important-world-cup-moments [2] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes Twitter users: real-time event detection by social sensors,” in Proceedings of the 19th international conference on World wide web, 2010, pp. 851–860.

24

Step 1Identify Keywords

24

goal, score

Step 2Find Bursts

Typical Approach

Weaknesses

25

Weaknesses

• Unable to detect unanticipated events

25

Weaknesses

• Unable to detect unanticipated events

• Disregards languages not represented in the keyword list

25

Weaknesses

• Unable to detect unanticipated events

• Disregards languages not represented in the keyword list

• Often requires language models for text normalization

25

Weaknesses

• Unable to detect unanticipated events

• Disregards languages not represented in the keyword list

• Often requires language models for text normalization

• Limited insight into the context of the event

25

Step 1Identify Keywords

26

goal, score

Step 2

LABurst Algorithm

Find Bursts

Step 1Identify Keywords

26

goal, score

Step 2

LABurst Algorithm

Find Bursts

Step 1Identify Keywords

26

goal, score

Step 2

LABurst Algorithm

Find Bursts

Step 1Identify Keywords

26

goal, score

Step 2

LABurst Algorithm

Find Bursts

goooal, 0-1, 0:1,1-0, gollll, holandaaaa, penal, penalti, persie

LABurst Algorithm

Discover Unexpected Moments

27

LABurst Algorithm

Discover Unexpected Moments

27

LABurst Algorithm

Discover Unexpected Moments

27

suarez, bit,

biting

Identify Keywords

Performance vs. Baselines

28

Can LABurst be useful in

other domains?

29

Can LABurst be useful in

other domains?

29

Earthquake Detection

Can LABurst be useful in

other domains?

29

Earthquake Detection

Honshu, Japan Earthquake - 25 October 2013

Can LABurst be useful in

other domains?

29

Earthquake Detection

Honshu, Japan Earthquake - 25 October 2013

Iwaki, Japan Earthquake - 11 July 2014

Proposed Work

30

Event

Event

Event

EventRapid Event Discovery

Credibility Analysis

31

Event

Event

Event

EventRapid Event Discovery

Extending LABurst to New Domains and Scales

31

Event

Event

Event

EventRapid Event Discovery

Extending LABurst to New Domains and Scales

Is LABurst adaptable to more critical

events?

32

Is LABurst adaptable to more critical

events?

32

Is LABurst adaptable to more critical

events?

32

Is LABurst adaptable to more critical

events?

33

Is LABurst adaptable to more critical

events?

33

Is LABurst adaptable to more critical

events?

34

Events occur across

different temporal and geographical

scales.

35

Events occur across

different temporal and geographical

scales.

35

Japanese Earthquake, Oct. 2013

Events occur across

different temporal and geographical

scales.

35

Japanese Earthquake, Oct. 2013

Boston Marathon, 15 Apr. 2013

Research Question:

How well do sports models transfer to non-sports events?

36

Research Question:

How well do sports models transfer to non-sports events?

36

Event Date

Argentine Cacerolazo

Protest8 Nov. 2012

Boston Marathon Bombing

15 Apr. 2013

Westgate Mall Attack

21 Sept. 2013

Ukrainian Revolution

18-23 Feb. 2014

Ferguson, MO Protests

9-19 Aug. 2014

vs.

Events Discovered by LABurst

Research Question:

How can we track related moments?

38

Research Question:

How can we track related moments?

38

Similarity

Research Question:

How can we track related moments?

39

Research Question:

How can we track related moments?

39

Per-Half-Hour Bursty Tokens/Topics

Per-Minute Bursty Tokens

argentina

deutschland

germany

arg

ger

worldcup

goalkick

gol

goaaaalllgoalll

gooolll

goetze

Semantic Similarity

Research Question:

How can we track related moments?

40

Moment 1

Moment 2

Location Similarity

Research Question:

How can we track related moments?

41

Moment 1

Moment 2

Network Similarity

Event Detection at Scale

42

43

44

45

46M. O

sbor

ne, S

. Mor

an, R

. McC

read

ie, A

. Von

Lun

en,

M. S

ykor

a, E

. Can

o, N

. Ire

son,

C. M

acdo

nald

, I. O

unis

, Y.

He,

and

oth

ers,

“Rea

l-Tim

e D

etec

tion,

Tra

ckin

g, a

nd

Mon

itorin

g of

Aut

omat

ical

ly D

isco

vere

d Ev

ents

in

Soci

al M

edia

,” As

soc.

Com

put.

Ling

uist

., 20

14.

English “Security-Related” Keywords

47

LABurst at Scale

48

LABurst at Scale

No Seed Keywords

Proposed Work

50

Event

Event

Event

EventRapid Event Discovery

Credibility Analysis

51

Credibility Analysis

51

Credibility Analysis

51

Credibility Analysis

Near Real-Time Credibility Evaluation of Discovered Events

One needs confidence in information

used to make critical

decisions.

52 http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news

One needs confidence in information

used to make critical

decisions.

52

“The Most Busted Name in News”

http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news

One needs confidence in information

used to make critical

decisions.

52

“The Most Busted Name in News”

http://thedailyshow.cc.com/videos/9qx6fp/the-most-busted-name-in-news

Most credibility analyses focus

on people rather than information.

53

[1] N. Diakopoulos, M. De Choudhury, and M. Naaman, “Finding and assessing social media information sources in the context of journalism,” in Proceedings of

the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2451–2460.

Most credibility analyses focus

on people rather than information.

53

[1] N. Diakopoulos, M. De Choudhury, and M. Naaman, “Finding and assessing social media information sources in the context of journalism,” in Proceedings of

the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2451–2460.

Most credibility analyses focus

on people rather than information.

53

Other approaches

trust a subset of curated,

“trustworthy” authors.

54

Other approaches

trust a subset of curated,

“trustworthy” authors.

55

56

The “majority of content generated on Twitter during the [Mumbai terror attacks]

comes from non authority users.” [1]

56

The “majority of content generated on Twitter during the [Mumbai terror attacks]

comes from non authority users.” [1]

[1] A. Gupta and P. Kumaraguru, “Twitter explodes with activity in mumbai blasts! a lifeline or an unmonitored daemon in the lurking?,” 2012.

Research Question:

How can we evaluate the credibility of aggregate

event information?

57 http://davedecampblog.wordpress.com/shareyourstory/wordclouds/

58

Moment 8

Moment 7

Moment 6

Moment 5

Moment 4

Moment 3

Moment 2

Moment 1

0 22.5 45 67.5 90

Credibility Ranking

Research Question:

How can we evaluate the credibility of aggregate

event information?

59

Sparse versus Dense Networks

Moment 1 Moment 2

Research Question:

How can we evaluate the credibility of aggregate

event information?

59

Sparse versus Dense Networks

More Credible

Less Credible

Moment 1 Moment 2

Research Question:

How can we evaluate the credibility of aggregate

event information?

60 [1] M. Mendoza, B. Poblete, and C. Castillo, “Twitter Under Crisis: Can We Trust What We RT?,” in Proceedings of the First Workshop on Social Media Analytics, 2010, pp. 71–79.

“… false rumors tend to be questioned much more than

confirmed truths…” [1]

Discord

Moment 1 Moment 2

Research Question:

How can we evaluate the credibility of aggregate

event information?

60 [1] M. Mendoza, B. Poblete, and C. Castillo, “Twitter Under Crisis: Can We Trust What We RT?,” in Proceedings of the First Workshop on Social Media Analytics, 2010, pp. 71–79.

“… false rumors tend to be questioned much more than

confirmed truths…” [1]

Discord

Moment 1 Moment 2

More Credible

Less Credible

Research Question:

How can we evaluate the credibility of aggregate

event information?

61

Summary of Proposed Work

62

Summary of Proposed Work

• Extending LABurst • Adapting sports models to crisis-level events• Tracking related moments• Detecting and tracking events from streaming sources at scale

62

Summary of Proposed Work

• Extending LABurst • Adapting sports models to crisis-level events• Tracking related moments• Detecting and tracking events from streaming sources at scale

• Credibility Analysis • Evaluating the credibility of aggregate event information in near

real time

62

By integrating machine learning and high-volume streams across social networks, one can detect high-impact events, identify specific occurrences within those events, and evaluate credibility of those occurrences in near real time.

63

Thesis Statement

Thanks!

Questions?

64

Contact: @codybuntain

[email protected]

Keyword bursts across

language barriers

65

Keyword bursts across

language barriers

65

Match Event Bursty Tokens

Brazil v. Netherlands, 12 July

2014

Netherlands' Van Persie scores a goal

on a penalty at 3', 1-0

0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,

holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Brazil v. Netherlands, 12 July

2014

Brazil's Oscar gets a yellow card at 68'

dive, juiz, penalty, ref

Germany v. Argentina, 13 July

2014

Germany’s Götze scores a goal at

113’, 1-0

goaaaaallllllll, goalllll, godammit,

goetze, gollllll, gooooool, gotze, gotzeeee, götze,

nooo, yessss, ドイツ

Keyword bursts across

language barriers

65

Match Event Bursty Tokens

Brazil v. Netherlands, 12 July

2014

Netherlands' Van Persie scores a goal

on a penalty at 3', 1-0

0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,

holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Brazil v. Netherlands, 12 July

2014

Brazil's Oscar gets a yellow card at 68'

dive, juiz, penalty, ref

Germany v. Argentina, 13 July

2014

Germany’s Götze scores a goal at

113’, 1-0

goaaaaallllllll, goalllll, godammit,

goetze, gollllll, gooooool, gotze, gotzeeee, götze,

nooo, yessss, ドイツ

Keyword bursts across

language barriers

65

Match Event Bursty Tokens

Brazil v. Netherlands, 12 July

2014

Netherlands' Van Persie scores a goal

on a penalty at 3', 1-0

0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool,

holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Brazil v. Netherlands, 12 July

2014

Brazil's Oscar gets a yellow card at 68'

dive, juiz, penalty, ref

Germany v. Argentina, 13 July

2014

Germany’s Götze scores a goal at

113’, 1-0

goaaaaallllllll, goalllll, godammit,

goetze, gollllll, gooooool, gotze, gotzeeee, götze,

nooo, yessss, ドイツ