Retweeting Behavior and Spectral Graph Analysis in Social Media
Xintao Wu Jan 18, 2013
Social Media Customer Analytics

Network topology
Structured profile

name  sex  age  disease  salary
Ada   F    18   cancer   25k
Bob   M    25   heart    110k

id  Sex  age  address  Income
5   F    Y    NC       25k
3   M    Y    SC       110k

Retweet sequence
Unstructured text (e.g., blog, tweet)
Customer profile
Customer transaction
Inventory
Product desc and review

Entity resolution
Patterns
Temporal/spatial
Scalability
Visualization
Sentiment
Privacy

Outline
Examining retweeting behavior to understand information propagation
  Multi-factor interaction analysis
  Coverage prediction
  Burst detection
Spectral graph analysis
  Community partition
  Fraud detection
Multi-factor interaction analysis

For each following relationship (user A follows user B), what factors affect user A's decision on whether to forward messages from B to A's followers?

We examine users' retweet behaviors using various features:
  Power ratio (A)
  Link structure (B)
  Location factor (C)
  Gender factor (D)

We fit a log-linear model to capture and interpret interaction patterns among features A-D and retweet (E).
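The idea behind a log-linear analysis can be sketched on a single pair of variables. The following is a minimal illustration only, assuming a hypothetical 2x2 table of link structure against the retweet outcome; the counts are made up and are not results from the talk, and the talk's model covers all of A-D and E rather than one pair. It fits the independence model (no interaction term) and computes the likelihood-ratio statistic G^2; a large G^2 suggests the interaction term is needed.

```python
import math

def independence_fit(table):
    """Fitted counts under the independence log-linear model
    log m_ij = u + u_row(i) + u_col(j), i.e. m_ij = row_i * col_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def g_squared(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * ln(obs / fit))."""
    total = 0.0
    for obs_row, fit_row in zip(observed, fitted):
        for obs, fit in zip(obs_row, fit_row):
            if obs > 0:  # empty cells contribute nothing
                total += obs * math.log(obs / fit)
    return 2.0 * total

# Hypothetical counts (illustration only, not data from the talk):
# rows = link structure (friend, non-friend),
# columns = retweet outcome E (retweeted, not retweeted).
observed = [[30, 70],
            [10, 90]]
fitted = independence_fit(observed)
g2 = g_squared(observed, fitted)

# 3.84 is the 5% critical value of chi-square with 1 degree of freedom.
interaction_needed = g2 > 3.84
```

With more than two variables the same comparison is made between nested models, with and without a given interaction term, which usually calls for an iterative fit rather than the closed form used here.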
Interpreting interaction effect

Interpretation example
Neither gender nor location alone has a significant effect on retweeting.
However, when link structure is taken into account:
  Females are more conservative: they have a lower tendency to retweet messages from non-friend (especially female) users, but a higher tendency to retweet messages from friends or superstars.
  Males are more open-minded: they have a higher tendency to retweet messages from non-friend (especially female) users.
Retweet Sequence
Information dynamically flows through a social network.
[Figure: animation over users Alice, Bob, Cathy, David, Ellen, and Fred, with labels D1-D3. Records are added step by step (t1 m1 A; t2 m2 B; t3 m3 D, B; t4 m4 C), each retweet record shown alongside the original record t1 m1 A. The later steps are titled "Flow Through Tree Structure".]

WISE12 Challenge
Sina Weibo
  # of users: 5,636,858
  # of tweets: 46,584,914
  # of retweets: 190,920,026
33 test messages, each with 100 initial retweets, composed by 27 users, from 6 events

For each message, predict
  M1: the number of retweets in 30 days
  M2: the number of possible views in 30 days
Idea
We treat the retweeting activities of each original message in the training data as a time series. Each value corresponds to the number of times the original message is retweeted during time period t.
For each message in the test data:
  The initial part of the series is known from the 100 retweets
  Use ARMA to predict the rest

Prediction Result
[Figure: prediction results; panel titles: Death of Steve Jobs, Xiaomi Release, Yao Jiaxin Murder Case, Xiaomi Release]
Runner-up award (2nd place) in the WISE 2012 Challenge Mining Track.
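The time-series idea used for coverage prediction can be sketched as follows. This is a simplified stand-in, not the model from the talk: it buckets retweet timestamps into per-period counts and fits an AR(1) model by least squares, which is an ARMA model with no moving-average term. The function names, the bucket width, and the clamping of forecasts at zero are all assumptions made for illustration.

```python
def to_series(timestamps, period, n_periods):
    """Count retweets per time period; timestamps are offsets from the post time."""
    counts = [0] * n_periods
    for ts in timestamps:
        idx = int(ts // period)
        if 0 <= idx < n_periods:
            counts[idx] += 1
    return counts

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0  # constant history: predict its mean
    phi = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return mean_y - phi * mean_x, phi

def forecast(series, steps):
    """Iterate the fitted AR(1) recurrence forward, clamping counts at zero."""
    c, phi = fit_ar1(series)
    preds, last = [], series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        preds.append(last)
    return preds
```

Under these assumptions, an estimate of M1 would be the retweets already observed plus the sum of the forecast values over the remaining periods of the 30-day window.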
Bursts
[Figure labels: Peak Time, Duration Time]
We found that a few tweets stay hot and keep being widely retweeted long after they were posted, while most tweets stop being retweeted within a very short period after they were posted.
  81.8% of the tweets are no longer retweeted one day after they were originally posted;
  6.28% of the tweets survive for two days;
  a few tweets are still retweeted after 100 days.
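The two burst measures can be computed from retweet timestamps roughly as below. The exact definitions are not given in the text here, so this sketch assumes that peak time is the offset of the busiest time bucket after the original post and that duration time is the offset of the last retweet; the function name and the one-hour bucket are hypothetical.

```python
def burst_profile(post_time, retweet_times, bucket=3600):
    """Return assumed peak time and duration time (in seconds) for one tweet.

    peak_time: start offset of the bucket holding the most retweets
               (earliest such bucket on ties).
    duration:  offset of the last retweet from the original post.
    """
    if not retweet_times:
        return None
    offsets = sorted(t - post_time for t in retweet_times)
    counts = {}
    for off in offsets:
        b = int(off // bucket)
        counts[b] = counts.get(b, 0) + 1
    # max() keeps the first maximal item, so iterating buckets in
    # ascending order picks the earliest bucket on ties.
    peak_bucket = max(sorted(counts), key=lambda b: counts[b])
    return {"peak_time": peak_bucket * bucket, "duration": offsets[-1]}

# Hypothetical timestamps in seconds since the original post.
profile = burst_profile(0, [100, 200, 4000, 4100, 4200, 90000])
survived_one_day = profile["duration"] > 24 * 3600
```

A tweet counted among those that survive past one day is then one whose duration exceeds 86,400 seconds under this definition.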
[Figure slides: Topic; Retweet vs. Time (two slides)]

Burst Analysis: Users
Top 100 users tend to have shorter path length, shorter peak time, and shorter duration time.
Burst Prediction
Extract features
  User-related, including profile and history information
  Tweet-related, including time series and retweet tree
Run classifiers
  Logistic regression
  Random forest
  Decision tree
  Naive Bayes
  SVM
  KNN
Achieve 83.2% accuracy
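As an illustration of the classification step, here is a minimal k-nearest-neighbors sketch, one of the classifier families listed above. The two features (retweets in the first hour, retweet tree depth), the labels, and every data point are hypothetical and chosen only to make the example run; they are not the features or data from the talk, and real features would normally be rescaled before computing distances.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (retweets in first hour, retweet tree depth).
train = [
    ((120, 5), "burst"),
    ((90, 4), "burst"),
    ((150, 6), "burst"),
    ((3, 1), "no_burst"),
    ((5, 1), "no_burst"),
    ((1, 1), "no_burst"),
]

label_hot = knn_predict(train, (100, 5))
label_cold = knn_predict(train, (2, 1))
```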
Spectral graph analysis
Spectral coordinate: [formula not recovered from the slide]
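A minimal sketch of one common reading of a spectral coordinate, stated as an assumption rather than as the definition used in the talk: node u's coordinate is taken to be row u of the matrix whose columns are the k leading eigenvectors of the adjacency matrix. The eigenvectors are found by power iteration with a diagonal shift and Gram-Schmidt deflation so that the example needs only the standard library; the function names are hypothetical.

```python
import math

def top_eigenvectors(adj, k, iters=500):
    """Eigenvectors for the k largest eigenvalues of a symmetric adjacency
    matrix, via shifted power iteration with Gram-Schmidt deflation."""
    n = len(adj)
    # Eigenvalues of A lie in [-dmax, dmax], so A + dmax*I has no negative
    # eigenvalues and power iteration targets the largest eigenvalue of A.
    shift = max(sum(row) for row in adj)
    vectors = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(adj[r][c] * v[c] for c in range(n)) + shift * v[r]
                 for r in range(n)]
            for u in vectors:  # remove components along earlier eigenvectors
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                break
            v = [a / norm for a in w]
        vectors.append(v)
    return vectors

def spectral_coordinates(adj, k):
    """Assumed definition: node u maps to (x_1[u], ..., x_k[u])."""
    vecs = top_eigenvectors(adj, k)
    return [tuple(vec[u] for vec in vecs) for u in range(len(adj))]

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

coords = spectral_coordinates(adj, 2)
```

On this toy graph the second coordinate has one sign on nodes 0-2 and the opposite sign on nodes 3-5, which is the kind of separation a community partition in the eigenspace relies on.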
Polbook Network
[Figure: Polbook network]

Accuracy of AdjCluster
  Lap [Miller and Teng 1998]: Laplacian based
  Ncut [Shi and Malik, 2000]: Normalized cut
  HE [Wakita and Tsurumi, 2007]: Modularity based agglomerative clustering
  SpokEn [Prakash et al., 2010]: EigenSpoke
Accuracy: [formula not recovered from the slide], where [symbol not recovered] is the i-th community produced by the different algorithms
Refer to IJCAI'11 for details.
Or add an introduction of Lap, NCut, HE, and SpokEn here instead of in the introduction section?

Evaluation on Web spam challenge data
SPCTRA fraud detection
GREEDY: based on outer-triangles [Shrivastava, ICDE, 2008]
100-1000 times faster
Refer to ICDE'11 for details.

Acknowledgments
This work was supported in part by U.S. National Science Foundation grants CNS-0831204 and CCF-1047621, and the UNC Charlotte Chancellor's Special Fund.
Thank You! Questions?