Retweeting Behavior and Spectral Graph Analysis in Social Media

Slide 1

Xintao Wu Jan 18, 2013

Retweeting Behavior and Spectral Graph Analysis in Social Media

Social Media Customer Analytics

  • Network topology
  • Structured profile

    name  sex  age  disease  salary
    Ada   F    18   cancer   25k
    Bob   M    25   heart    110k

    id  Sex  age  address  Income
    5   F    Y    NC       25k
    3   M    Y    SC       110k

  • Retweet sequence
  • Unstructured text (e.g., blog, tweet)
  • Customer profile
  • Customer transaction
  • Inventory
  • Product desc and review
  • Entity resolution
  • Patterns
  • Temporal/spatial
  • Scalability
  • Visualization
  • Sentiment
  • Privacy

Outline

  • Examining retweeting behavior to understand information propagation
      • Multi-factor interaction analysis
      • Coverage prediction
      • Burst detection
  • Spectral graph analysis
      • Community partition
      • Fraud detection

Multi-factor interaction analysis

For each following relationship in which user A follows user B, what factors affect user A's decision on whether to forward messages from B to A's followers?

We examine users' retweet behaviors using the following features:

  • Power ratio (A)
  • Link structure (B)
  • Location factor (C)
  • Gender factor (D)

We fit a log-linear model to capture and interpret the interaction patterns among features A-D and the retweet outcome (E).
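As a minimal sketch of what fitting a log-linear model involves, the example below fits the simplest such model, the independence model log m_ij = λ + λ_i^B + λ_j^E, to a two-way table of link structure (B) against retweet (E), and computes the likelihood-ratio statistic G². The counts and the function names are invented for illustration and do not come from the study. The model in the talk covers all of A-D and E with interaction terms, which this sketch omits.

```python
import math


def fit_independence(table):
    """Fitted counts m_ij = (row total * column total) / n under independence."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def g_squared(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum(n_ij * ln(n_ij / m_ij))."""
    return 2.0 * sum(
        n_ij * math.log(n_ij / m_ij)
        for obs_row, fit_row in zip(observed, fitted)
        for n_ij, m_ij in zip(obs_row, fit_row)
        if n_ij > 0  # empty cells contribute zero
    )


# Hypothetical counts (assumption, not real data).
# Rows: link structure (friend, non-friend); columns: (retweeted, not retweeted).
observed = [[30, 70], [10, 90]]
fitted = fit_independence(observed)
g2 = g_squared(observed, fitted)
print(fitted, round(g2, 2))
```

With one degree of freedom, a G² well above 3.84 (the 5% critical value of the chi-square distribution) suggests that the independence model fits poorly, so a B-by-E interaction term would be needed. The same comparison of nested models is how higher-order interactions are usually assessed.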

Interpreting interaction effect

Interpretation example

  • Neither gender nor location alone has a significant effect on retweeting.
  • However, when link structure is taken into account:
      • Females are more conservative: they have a lower tendency to retweet messages from non-friend (especially female) users, but a higher tendency to retweet messages from friends or superstars.
      • Males are more open-minded: they have a higher tendency to retweet messages from non-friend (especially female) users.

Retweet Sequence

Information dynamically flows through a social network.

[Figure: a network of users Alice, Bob, Cathy, David, Ellen, and Fred, with labels D1, D2, D3. Retweet records are added one step at a time: (t1, m1, A); then (t2, m2, B) with original (t1, m1, A); then (t3, m3, D) via B with original (t1, m1, A); then (t4, m4, C) with original (t1, m1, A).]

Flow Through Tree Structure

Information dynamically flows through a social network.

WISE12 Challenge

Sina Weibo

  • # of users: 5,636,858
  • # of tweets: 46,584,914
  • # of retweets: 190,920,026

33 test messages each with 100 initial retweets composed by 27 users from 6 events

For each message, predict:

  • M1: the number of retweets in 30 days
  • M2: the number of possible-views in 30 days

Idea

We treat the retweeting activity of each original message in the training data as a time series. Each value corresponds to the number of times that the original message is retweeted during time period t.

For each message in the test data:

  • Known from 100 retweets
  • Use ARMA to predict

Prediction Result

Runner-up award (2nd place) on WISE 2012 Challenge Mining Track.

[Figure panels: Death of Steve Jobs; Xiaomi Release; Yao Jiaxin Murder Case; Xiaomi Release]
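The time-series idea can be sketched in a few lines. The slides do not give the ARMA order or the fitting procedure, so this example swaps in an AR(1) model, the simplest special case of ARMA (ARMA(1,0)), fitted by ordinary least squares. The per-period counts, the function names, and the forecast horizon are all assumptions made for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    if var == 0:  # constant history: fall back to predicting the mean
        return mean_next, 0.0
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = cov / var
    return mean_next - phi * mean_prev, phi


def forecast(series, steps):
    """Iterate the fitted recursion forward, clipping negative counts to zero."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        out.append(last)
    return out


def predicted_total(series, steps):
    """Observed retweets plus forecast retweets over the next `steps` periods."""
    return sum(series) + sum(forecast(series, steps))


# Hypothetical retweets per period covering the first 100 retweets (assumption).
observed = [40, 24, 14, 9, 6, 4, 3]
print(predicted_total(observed, steps=24))
```

A full ARMA model adds moving-average terms on past forecast errors. In practice one would fit it with a statistics library and choose the order by an information criterion rather than hand-code the recursion.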

Bursts

[Figure labels: Peak Time, Duration Time]

We found that some tweets stay hot and continue to be widely retweeted long after they were posted, while most tweets stop being retweeted within a very short period after they were posted.

  • 81.8% of the tweets are not retweeted any more one day after they were originally posted.
  • 6.28% of the tweets survive for two days.
  • A few tweets are still retweeted after 100 days.
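The slides name peak time and duration time without defining them. The sketch below assumes one plausible reading: peak time is the offset (from posting) of the time bin with the most retweets, and duration time is the offset of the last retweet. The function name, the one-hour bin width, and the timestamps are hypothetical.

```python
from collections import Counter


def burst_stats(post_time, retweet_times, bin_hours=1.0):
    """Return assumed peak time and duration time, both in hours since posting."""
    if not retweet_times:
        return None
    offsets = [t - post_time for t in retweet_times]
    # Count retweets per fixed-width bin.
    bins = Counter(int(o // bin_hours) for o in offsets)
    top = max(bins.values())
    # If several bins tie for the maximum, take the earliest one.
    peak_bin = min(b for b, count in bins.items() if count == top)
    return {"peak_time": peak_bin * bin_hours, "duration": max(offsets)}


# Hypothetical retweet timestamps in hours (assumption, not real data).
stats = burst_stats(0.0, [0.5, 1.2, 1.4, 1.9, 30.0])
print(stats)
```

Under this reading, a tweet that is never retweeted after its first day has a duration of at most 24 hours, which is the quantity the survival percentages above describe.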

Topic

Retweet vs. Time

[Figures: Retweet vs. Time plots]

Burst Analysis: Users

Top 100 users tend to have shorter path length, shorter peak time, and shorter duration time.

Burst Prediction

  • Extract features
      • User-related, including profile and history information
      • Tweet-related, including time series and retweet tree
  • Run classifiers
      • Logistic regression
      • Random forest
      • Decision tree
      • Naive Bayes
      • SVM
      • KNN
  • Achieve 83.2% accuracy
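The extract-features-then-classify pipeline can be illustrated with KNN, one of the listed classifiers, written out by hand. This is only a sketch: the two features (a log-scaled follower count and the number of retweets in the first hour), the labels, and every data point are invented, and the result says nothing about the 83.2% figure reported in the talk.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature vectors: [log10(followers), retweets in first hour].
# Label 1 = burst, 0 = no burst. All values are assumptions.
train_X = [[1.0, 2.0], [1.5, 1.0], [2.0, 3.0], [5.0, 40.0], [5.5, 55.0], [6.0, 35.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1.2, 2.5], [5.8, 50.0], [2.5, 4.0]]
test_y = [0, 1, 0]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
print(predictions, accuracy(test_y, predictions))
```

The features here are left unscaled, so the retweet count dominates the distance. Real feature sets usually need normalization before distance-based classifiers such as KNN or SVM are applied.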

Spectral graph analysis

Spectral coordinate:

Polbook Network

Accuracy of AdjCluster

  • Lap [Miller and Teng 1998]: Laplacian based
  • Ncut [Shi and Malik, 2000]: Normalized cut
  • HE [Wakita and Tsurumi, 2007]: Modularity based agglomerative clustering
  • SpokEn [Prakash et al., 2010]: EigenSpoke

Accuracy: [equation not preserved in the slide text], where the missing symbol denotes the i-th community produced by the different algorithms.
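To make the notion of a spectral coordinate concrete, here is a minimal sketch. The slide's own formula is not preserved, so the code assumes the common definition in adjacency-eigenspace methods: node u's coordinate is the u-th entry of each of the k leading eigenvectors of the adjacency matrix. The eigenvectors are computed by power iteration with deflation, and nodes are split by the sign of their second coordinate. That sign split is a simplified stand-in, not AdjCluster itself, and the toy graph (two 4-cliques joined by one edge) is invented.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def top_eigenpairs(A, k, iters=1000):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation.

    Assumes the k leading eigenvalues are also the largest in magnitude.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed, generic starting vector
        for _ in range(iters):
            w = matvec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for r in range(n):  # deflate: remove the found component
            for c in range(n):
                A[r][c] -= lam * v[r] * v[c]
    return pairs


def spectral_coordinates(A, k):
    """Row u holds node u's entries in the k leading eigenvectors."""
    vectors = [v for _, v in top_eigenpairs(A, k)]
    return [[v[u] for v in vectors] for u in range(len(A))]


def split_by_second_coordinate(coords):
    return [1 if c[1] >= 0 else 0 for c in coords]


# Toy graph (assumption): cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4.
n = 8
A = [[0.0] * n for _ in range(n)]
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                A[i][j] = 1.0
A[3][4] = A[4][3] = 1.0

coords = spectral_coordinates(A, 2)
labels = split_by_second_coordinate(coords)
print(labels)
```

On this toy graph the second eigenvector has opposite signs on the two cliques, so the split recovers them. Which clique gets label 1 depends on the arbitrary sign of the eigenvector.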

Refer to IJCAI 11 for details.

Or add an introduction of Lap, Ncut, HE, and SpokEn here instead of in the introduction section?

Evaluation on Web spam challenge data

SPCTRA fraud detection

GREEDY: based on outer-triangles [Shrivastava, ICDE, 2008]

100-1000 times faster

Refer to ICDE 11 for details.

Acknowledgments

This work was supported in part by U.S. National Science Foundation CNS-0831204 and CCF-1047621, and UNC Charlotte Chancellor's Special Fund.

Thank You! Questions?
