Credibility Ranking of Tweets during High Impact Events

Aditi Gupta & Ponnurangam Kumaraguru
PSOSM@WWW, April 17, 2012


precog.iiitd.edu.in    IIIT-Delhi

Problem Motivation

• Information
• Opinion
• Spam

Outline

• Research statement
• Architecture
• Data collection
• Analysis
• Results
• Implementation
• Future direction

Research Statement

• Identify parameters that affect credibility of content on Twitter
• Develop a semi-automated algorithm to assess credibility of tweets

Terminology

• TWEET: A status (140 chars)
• RETWEET
• URL
• HASHTAG
• USER PROFILE
• Tweets
• USER NAME @screen_name
• FOLLOWERS
• @-MENTIONS

Credibility

• “The quality of being trusted and believed in.”
• In this research
  – Assess the credibility of the information in the content of a tweet (message) by a user on Twitter.
  – A tweet is said to contain credible information about a news event if you trust or believe the information in the tweet to be correct / true.

News on Twitter

• Topics on Twitter
  – News events, e.g. #Irene, #Libyacrisis
  – Chit-chat, e.g. #nothingwrongwith, #goodmorningtwitter
• News on Twitter
  – Credible information
  – Non-credible information: fake news / rumors / spam / personal opinions

Our Contributions

• 30% of tweets provide information (17% credible information) and 14% were spam
• Linear logistic regression identifies the features that best predict credibility
  – Content based: number of unique characters, swear words, pronouns and emoticons
  – User based: number of followers and length of username
• Present an automated algorithm (supervised ML and relevance feedback) to assess credibility in tweets

Data Statistics

• High impact events:
  – More than 25K tweets
  – More than 48 hours in trending topics

Total tweets                         35,748,136
Total unique users                    6,877,320
Tweets with URLs                      4,973,457
Number of singleton tweets           22,481,898
Number of retweets / replies         13,266,238
Start date                      12th July, 2011
End date                      30th August, 2011


Data Statistics

Events                              Tweets  Trending Topics
UK Riots                           542,685  #ukriots, #londonriots, #prayforlondon
Libya Crisis                       389,506  libya, tripoli
Earthquake in Virginia             277,604  #earthquake, Earthquake in SF
JanLokPal Bill Agitation           182,692  Anna Hazare, #janlokpal, #anna
Apple CEO Steve Jobs resigns       158,816  Steve Jobs, Tim Cook, Apple CEO
US Downgrading                     148,047  S&P, AAA to AA
Hurricane Irene                     90,237  Hurricane Irene, Tropical Storm Irene
Google acquires Motorola Mobility   68,527  Google, Motorola Mobility
News of the World Scandal           67,602  Rupert Murdoch, #murdoch
Abercrombie & Fitch stocks drop     54,763  Abercrombie & Fitch, A&F
Muppets Bert and Ernie were gay     52,401  Bert and Ernie
Indiana State Fair Tragedy          49,924  Indiana State Fair
Mumbai Blast, 2011                  32,156  #mumbaiblast, Dadar, #needhelp
New Facebook Messenger              28,206  Facebook Messenger

Architecture

Human Annotation

• For each tweet:
  – Tweet contains information about the event. Rate the credibility of information present:
      • Definitely Credible
      • Seems Credible
      • Definitely Incredible
      • I can’t Decide
  – Tweet is related to the news event, but contains no information
  – Tweet is not related to news event
  – Skip tweet
• Each tweet annotated by 3 people
• Inter-annotator agreement (Cronbach Alpha) = 0.748
• 30% of tweets provide information (17% credible information) and 14% were spam
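The agreement figure above is a Cronbach's alpha over the three annotators. A minimal sketch of how such a value could be computed is given here; the numeric coding of the labels and the function name `cronbach_alpha` are assumptions for illustration, since the slides do not say how the labels were encoded.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of numeric ratings.

    ratings[i][j] is the (assumed numeric) label that annotator j gave
    to tweet i. Annotators play the role of "items" in the formula.
    """
    k = len(ratings[0])  # number of annotators
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 annotators and 2 tweets")
    # Variance of each annotator's labels across all tweets.
    per_annotator_var = [variance(col) for col in zip(*ratings)]
    # Variance of the per-tweet sum of labels.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("all tweets received the same total score")
    return (k / (k - 1)) * (1 - sum(per_annotator_var) / total_var)


# Hypothetical labels: three annotators who agree on every tweet.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Values close to 1 indicate that the annotators rank tweets consistently; the labels in the usage line are made up and are not taken from the study.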

Analysis

Feature Sets

Message based features
• Length of the tweet
• Number of words
• Number of unique characters
• Number of hashtags
• Number of retweets
• Number of swear language words
• Number of positive sentiment words
• Number of negative sentiment words
• Tweet is a retweet
• Number of special symbols [$, !]
• Number of emoticons [:-), :-(]
• Tweet is a reply
• Number of @-mentions
• Time lapse since the query
• Has URL
• Number of URLs
• Use of URL shortener service

Source based features
• Registration age of the user
• Number of statuses
• Number of followers
• Number of friends
• Is a verified account
• Length of description
• Length of screen name
• Has URL
• Ratio of followers to followees
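Several of the message based features can be computed from the tweet text alone. The sketch below illustrates a subset of them; the function name `message_features`, the tiny swear-word lexicon, the emoticon list and the shortener domains are all placeholder assumptions, not the resources used in the study.

```python
import re

# Placeholder resources (assumptions for illustration only).
EMOTICONS = (":-)", ":-(", ":)", ":(")
SWEAR_WORDS = {"damn", "hell"}
SHORTENERS = ("bit.ly", "t.co", "goo.gl", "tinyurl.com")
URL_RE = re.compile(r"https?://\S+")


def message_features(text):
    """Return a dict with a subset of the message based features."""
    urls = URL_RE.findall(text)
    words = text.split()
    lowered = [w.strip(".,!?").lower() for w in words]
    return {
        "length": len(text),
        "num_words": len(words),
        "num_unique_chars": len(set(text)),
        "num_hashtags": sum(w.startswith("#") for w in words),
        "num_mentions": sum(w.startswith("@") for w in words),
        "num_swear_words": sum(w in SWEAR_WORDS for w in lowered),
        "num_special_symbols": sum(text.count(s) for s in "$!"),
        "num_emoticons": sum(text.count(e) for e in EMOTICONS),
        "is_retweet": text.startswith("RT @"),
        "is_reply": text.startswith("@"),
        "has_url": bool(urls),
        "num_urls": len(urls),
        "uses_url_shortener": any(s in u for u in urls for s in SHORTENERS),
    }


# Hypothetical tweet text.
example = "RT @bbc: Earthquake in Virginia! http://bit.ly/abc #earthquake :-)"
print(message_features(example))
```

Features such as the number of retweets, the time lapse since the query and all source based features need tweet and user metadata, so they are left out of this text-only sketch.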

PRF

• PRF (Pseudo Relevance Feedback)
  – Extract k ranked documents and then re-rank those documents according to a defined score
  – Re-ranking based on ‘context’ of the event
  – Top n unigrams based on BM25 metric
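One plausible reading of the PRF step is sketched below: take the top k tweets from an initial ranking, pick the n most frequent unigrams among them as the event "context", and re-rank those k tweets by their BM25 score against that context. The function names, the defaults for `k1` and `b`, and the non-negative IDF variant are assumptions; the slides do not specify these details.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """BM25 score of each tokenized document against query_terms."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Non-negative IDF variant (an assumption of this sketch).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def prf_rerank(ranked_docs, k=3, n=2):
    """Re-rank the top k documents using the n most frequent unigrams."""
    top = ranked_docs[:k]
    counts = Counter(t for d in top for t in d)
    context = [t for t, _ in counts.most_common(n)]
    scores = bm25_scores(context, top)
    # Stable sort: ties keep their position from the initial ranking.
    order = sorted(range(len(top)), key=lambda i: -scores[i])
    return [top[i] for i in order] + ranked_docs[k:]


# Hypothetical tokenized tweets, best-first from an initial ranker.
initial = [
    ["lol", "so", "funny"],
    ["earthquake", "virginia", "damage"],
    ["earthquake", "virginia", "reported"],
    ["good", "morning"],
]
print(prf_rerank(initial, k=3, n=2))
```

In this toy input the off-topic first tweet drops below the two tweets that share the dominant unigrams, which is the intended effect of context based re-ranking.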

Algorithm

Evaluation Metric

• NDCG (Normalized Discounted Cumulative Gain)
• NDCG is a standard metric for evaluating ranked results that carry “graded” relevance judgments
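For reference, a minimal sketch of NDCG follows. It assumes the linear-gain form, where the gain at rank i (1-indexed) is divided by log2(i + 1); the slides do not state which gain formulation was used, and the function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of graded gains, best-first."""
    # enumerate is 0-indexed, so rank i + 1 is discounted by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains, n=None):
    """NDCG at cutoff n: DCG of the ranking divided by the ideal DCG."""
    n = len(gains) if n is None else n
    ideal = dcg(sorted(gains, reverse=True)[:n])
    return dcg(gains[:n]) / ideal if ideal > 0 else 0.0


# Hypothetical graded credibility labels in ranked order.
print(ndcg([3, 2, 3, 0, 1], n=5))
```

A ranking that already lists tweets in decreasing order of their graded labels scores 1.0, and any misordering lowers the value.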

Ranking Results

• Tweet and user based features both contribute to determining credibility: it matters “what you post and who you are”
• Context based (PRF) ranking greatly enhances the performance (up to 0.74 NDCG)

Web-portal Implementation

Limitations & Future Work

• Human input required
  – Need to develop self-learning (completely automated) solutions
• Analyze events with greater temporal variation
• Understand users’ perspective on the credibility of content on Twitter

Challenges

• Large volume of data being generated
• Real-time solutions needed
• Only 140 characters
• Informal language

Acknowledgements

• All members of our research group
• Dept. of Information Technology, Government of India


Questions?

Thank You!

[email protected]  [email protected]

precog.iiitd.edu.in

For any further information, please write to [email protected]

precog.iiitd.edu.in