Computational Journalism at Columbia, Fall 2013, Lecture 4: Social Filtering

  • 7/27/2019 Computational Journalism at Columbia, Fall 2013, Lecture 4: Social Filtering

    Frontiers of Computational Journalism

    Columbia Journalism School

    Week 3: Social Filtering
    September 25, 2013

    Week 5: Social Filtering

    Finding sources on social media
    Participatory Journalism
    Information Distribution on Social Networks
    Social Software

    Classify Users

    Classic machine learning problem. Classify each user as one of:

    journalist/blogger
    organization
    ordinary individual

    First, need to encode as a vector / select features...

    Features for user classifier

    # of followers/following
    # of posts, favorites
    percentage of posts that are RTs, @replies, links
    presence/absence of named entities
    topic distribution of tweets (IPTC top level topics)
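
As a rough illustration of what "encode as a vector" could look like for the features listed above, here is a minimal sketch. The `UserStats` fields, the `encode` function, and the three-entry `TOPICS` list are all hypothetical names chosen for this example; the slides do not say how the original system stored or scaled these values, and fractions are used here in place of percentages.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the IPTC top-level topics; the real list is longer.
TOPICS = ["politics", "sport", "economy"]


@dataclass
class UserStats:
    """Raw per-user counts (field names are assumptions for this sketch)."""
    followers: int
    following: int
    posts: int
    favorites: int
    retweets: int
    replies: int
    link_posts: int
    has_named_entities: bool
    topic_counts: dict = field(default_factory=dict)


def encode(user):
    """Turn one user's statistics into a fixed-length list of floats."""
    n_posts = max(user.posts, 1)  # avoid dividing by zero for silent accounts
    n_topic = sum(user.topic_counts.get(t, 0) for t in TOPICS) or 1
    vector = [
        float(user.followers),
        float(user.following),
        float(user.posts),
        float(user.favorites),
        user.retweets / n_posts,    # fraction of posts that are RTs
        user.replies / n_posts,     # fraction that are @replies
        user.link_posts / n_posts,  # fraction that contain links
        1.0 if user.has_named_entities else 0.0,
    ]
    # Topic distribution: share of topic-tagged tweets falling in each topic.
    vector.extend(user.topic_counts.get(t, 0) / n_topic for t in TOPICS)
    return vector


example = UserStats(100, 50, 10, 5, 2, 3, 5, True, {"politics": 3, "sport": 1})
print(encode(example))
```

Every user maps to a vector of the same length, which is what a distance-based classifier needs.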

    Digression: IPTC Media Topic Codes

    International standard hierarchical taxonomy, part of the NewsML markup system. Defined by Reuters, AP, NY Times...

    K-nearest neighbor classifier

    Take K closest training points (in high dimensional feature space), choose majority label.
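
The rule on this slide can be sketched in a few lines. This is a minimal illustration, not the classifier used in the study: `knn_classify` and the toy two-dimensional training points are made up for the example, Euclidean distance is an assumption, and real features would probably need rescaling first so that large counts such as followers do not dominate the distance.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(training, key=lambda item: math.dist(query, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two clusters in a 2-D feature space (values are illustrative only).
training = [
    ([0.0, 0.0], "individual"),
    ([0.1, 0.2], "individual"),
    ([5.0, 5.0], "organization"),
    ([5.2, 4.9], "organization"),
    ([4.8, 5.1], "organization"),
]

print(knn_classify([0.2, 0.1], training, k=3))
print(knn_classify([5.0, 5.1], training, k=3))
```

Note that with k=3 the first query still has one "organization" point among its neighbors; the majority vote is what absorbs that.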

    Creating the training data

    1,850 random users
    1,532 known organizations
    1,490 known journalists and bloggers

    Hired Mechanical Turk workers to apply labels.
    Each user labeled by two workers, discarded if disagreement.
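
The "two workers, discard on disagreement" rule is simple enough to show directly. This is a sketch under assumptions: the `keep_agreed` name and the dictionary layout are hypothetical, and the user names are invented.

```python
def keep_agreed(labels_by_user):
    """Keep only users whose two worker labels match.

    `labels_by_user` maps a user name to a (label_a, label_b) pair.
    """
    return {user: a for user, (a, b) in labels_by_user.items() if a == b}


raw = {
    "user_1": ("journalist/blogger", "journalist/blogger"),
    "user_2": ("organization", "ordinary individual"),  # disagreement: dropped
    "user_3": ("organization", "organization"),
}
print(keep_agreed(raw))
```

The cost of this rule is a smaller training set; the benefit is that ambiguous accounts do not add label noise.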

    Classifier Accuracy

    Eyewitness classifier

    Goal is to find individual tweets that are eyewitness reports.

    Started with LIWC (linguistic inquiry and word count) dictionary that classifies English words along 70 different dimensions, including emotion, cognition, time, health...

    Word Aspects

    Used perception category words
    plus insight and certainty words

    Eyewitness tweet classifier

    It's an eyewitness tweet if it contains any of these special words! (or their stems)

    High precision! Low recall.

    89% of tweets classified as eyewitness actually were.
    But only 32% of eyewitness tweets detected.
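
To make the precision/recall trade-off concrete, here is a small sketch of a keyword rule and how the two numbers are computed against hand labels. The word list, the four tweets, and their labels are all invented for illustration; the slide's actual LIWC-derived word list is not reproduced here, and stemming is left out for brevity (only exact word matches count).

```python
import re

# Hypothetical word list; the real classifier used LIWC perception,
# insight and certainty words, which are not shown in this transcript.
EYEWITNESS_WORDS = {"see", "saw", "hear", "heard", "feel", "felt"}


def is_eyewitness(text):
    """Flag a tweet if it contains any of the listed words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in EYEWITNESS_WORDS for token in tokens)


def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


tweets = [
    "I just saw the explosion",
    "Reports say the road is closed",
    "I heard a loud bang",
    "I am at the scene and it is chaos",  # eyewitness, but no listed word
]
hand_labels = [True, False, True, True]

predicted = [is_eyewitness(t) for t in tweets]
precision, recall = precision_recall(predicted, hand_labels)
print(precision, recall)
```

The last tweet shows why recall suffers: an eyewitness can report what is happening without using any of the listed words.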

    Other dimensions

    Tweet contains URL to photo or video (used table of domain names, e.g. flickr.com = photo)
    Posted from mobile device (from tweet metadata naming posting app)
    Geocode user's stated location (this is painful and unreliable)
    Distribution of friends' locations. (Friend = mutual following)
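
Two of the signals above are easy to sketch: the domain-name table and the "friend = mutual following" definition. Only the `flickr.com = photo` entry comes from the slide; the other table entry, the function names, and the sample accounts are assumptions made for this example.

```python
from urllib.parse import urlparse

# Only flickr.com comes from the slide; youtube.com is an assumed extra entry.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}


def media_type(url):
    """Return "photo" or "video" if the URL's domain is in the table, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return MEDIA_DOMAINS.get(host)


def friends(following, followers):
    """Friends are accounts the user follows that also follow the user back."""
    return set(following) & set(followers)


print(media_type("http://www.flickr.com/photos/example"))
print(media_type("http://example.org/story"))
print(friends({"ann", "bob", "cnn"}, {"bob", "dan", "ann"}))
```

A table lookup like this misses shortened links, which would have to be expanded first; that step is not shown.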

    Test user reactions

    "This gives you context, you have the context for whether or not you think they're reputable or whether or not they're worth reaching out to."

    "It's giving me a lot of context which is really useful when you're trying to verify if someone is reputable or not."

    "I would tend to focus on the eyewitnesses and journalists/bloggers. Eventually I'd look at everyone else but I'd want to start my search with those two groups because they would normally provide me with the most information."

    Test user reactions

    Popular features:
    Eyewitness filtering, user location, image/video filter

    Unpopular features:
    Entity extraction not helpful, no ability to filter by location and eyewitness status, focus on users instead of content

    Week 5: Social Filtering

    Finding sources on social media
    Participatory Journalism
    Information Distribution on Social Networks
    Social Software

    [Diagram labels: User; stories not covered; filtering]

    Who user chooses to follow = social filtering
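
Read literally, this means the follow list is the filter: of everything posted, the user sees only what comes from accounts they chose. A minimal sketch of that idea, with a hypothetical `timeline` function and invented posts:

```python
def timeline(posts, following):
    """Keep only posts whose author is in the user's follow list."""
    return [post for post in posts if post["author"] in following]


posts = [
    {"author": "local_reporter", "text": "Council vote tonight"},
    {"author": "stranger", "text": "Lunch photo"},
    {"author": "news_org", "text": "Storm warning issued"},
]
following = {"local_reporter", "news_org"}

for post in timeline(posts, following):
    print(post["author"], "-", post["text"])
```

No content analysis happens here at all; the filtering is done entirely by the user's choice of whom to follow.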

    Week 5: Social Filtering

    Finding sources on social media
    Participatory Journalism
    Information Distribution on Social Networks
    Social Software

    Twitter follower network

    "We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks"

    - Kwak et al., What is Twitter, a Social Network or a News Media?
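
"Low reciprocity" can be made concrete with a small calculation. The sketch below uses one common definition, the fraction of follow edges that are returned; whether this matches the exact measure in the paper is an assumption, and the `reciprocity` function and sample edges are invented for illustration.

```python
def reciprocity(edges):
    """Fraction of directed (follower, followee) edges whose reverse also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (a, b) in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


edges = [
    ("a", "b"), ("b", "a"),  # a and b follow each other
    ("a", "c"),              # c does not follow a back
    ("d", "a"),              # a does not follow d back
]
print(reciprocity(edges))
```

On a network where following is mostly a way to subscribe to sources, many edges look like the last two and this number stays low.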

    More followings than followers

    Small avg distance between two nodes (why? and what does this mean?)
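
For readers who want to see what "average distance between two nodes" measures, here is a sketch that computes it with breadth-first search over a small follow graph. The `average_distance` function and the three-account example are assumptions for illustration; the distance is averaged over ordered pairs where one account can reach the other, which is only one of several reasonable conventions.

```python
from collections import deque


def average_distance(adjacency):
    """Mean shortest-path length over all reachable ordered pairs of nodes.

    `adjacency` maps each account to the list of accounts it follows.
    """
    total = 0
    pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# a follows b, b follows c, c follows a: a directed triangle.
follows = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(average_distance(follows))
```

Running one search per node is fine for a toy graph; a crawl of tens of millions of accounts would need sampling or approximation instead.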

    It's a news network

    Small number of high-degree hubs
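
One way to check for "a small number of high-degree hubs" is to count followers per account and ask what share of all follow edges point at the top few. The sketch below does that; `top_hubs`, `hub_share`, and the sample edges are hypothetical and only illustrate the idea.

```python
from collections import Counter


def top_hubs(edges, n=1):
    """Return the n most-followed accounts as (account, follower_count) pairs."""
    follower_counts = Counter(followee for _, followee in edges)
    return follower_counts.most_common(n)


def hub_share(edges, n=1):
    """Fraction of all follow edges that point at the n most-followed accounts."""
    edges = list(edges)
    if not edges:
        return 0.0
    return sum(count for _, count in top_hubs(edges, n)) / len(edges)


# Each pair is (follower, followee); "news" plays the role of a hub.
edges = [
    ("a", "news"), ("b", "news"), ("c", "news"), ("d", "news"),
    ("a", "b"), ("b", "a"),
]
print(top_hubs(edges, 1))
print(hub_share(edges, 1))
```

In this toy graph a single account receives two thirds of all follows, which is the hub pattern in miniature.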

    It's a news network

    Small number of high-degree hubs

    Different network structure than e.g. Facebook.
    Different uses.

    why?

    Week 5: Social Filtering

    Finding sources on social media
    Participatory Journalism
    Information Distribution on Social Networks
    Social Software

    Social Software

    Basic assumption: structure of software influences how groups use it.
    or: architecture influences behavior

    Three ways to influence behavior

    Norms: culture, habits, etiquette, the users' sense of what is right or appropriate
    Laws: rules enforced by the administrator
    Code: what it is actually possible to do

    Design problem...

    What do we want the users to accomplish together?
    How do we encourage this?

    We can write the code, but the culture is to some degree beyond our prediction or control.