Mining and Comparing Engagement Dynamics Across Multiple Social Media Platforms

Matthew Rowe, Lancaster University, UK
Harith Alani, Knowledge Media Institute, The Open University, UK
@halani | harith-alani | http://people.kmi.open.ac.uk/harith/

ACM Web Science Conference (WebSci) 2014, Bloomington, Indiana
Engagement in Social Media
Moving on …
§ How can we move on from these (micro) studies?
§ Are results consistent across datasets and platforms?
§ One way forward:
  § Multiple platforms
  § Multiple topics
[Figure: publications on "social media analysis" per year, 2006-2013; y-axis from 0 to 600.]
[Figure: papers studying single vs. multiple social media platforms.]
Apples and Oranges
§ We mix and compare different features, datasets, and platforms
§ The aim is to identify where they are similar and where they differ
Contributions
§ Examine replying dynamics as a modality of engagement
§ Define a framework of engagement analysis that fits multiple social platforms
§ Show the varying features at play in different platforms, and where the similarities and differences are
§ Contrast the role of different features on engagement likelihood across five social media platforms
§ Compare results to relevant literature on the same or different platforms and engagement indicators
7 datasets from 5 platforms

Platform                            Posts      Users    Seeds    Non-seeds  Replies
Boards.ie                           6,120,008  65,528   398,508  81,273     5,640,227
Twitter (Random)                    1,468,766  753,722  144,709  930,262    390,795
Twitter (Haiti Earthquake)          65,022     45,238   1,835    60,686     2,501
Twitter (Obama State of the Union)  81,458     67,417   11,298   56,135     14,025
SAP                                 427,221    32,926   87,542   7,276      332,403
Server Fault                        234,790    33,285   65,515   6,447      162,828
Facebook                            118,432    4,745    15,296   8,123      95,013

Seed posts are those that receive at least one reply; non-seed posts are those with no replies.
Data Balancing

Platform                            Seeds    Non-seeds  Instance Count
Boards.ie                           398,508  81,273     162,546
Twitter (Random)                    144,709  930,262    289,418
Twitter (Haiti Earthquake)          1,835    60,686     3,670
Twitter (Obama State of the Union)  11,298   56,135     22,596
SAP                                 87,542   7,276      14,552
Server Fault                        65,515   6,447      12,894
Facebook                            15,296   8,123      16,246
Total                                                   521,922

For each dataset, an equal number of seed and non-seed posts is used in the analysis, i.e. twice the size of the minority class (sketched below).
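A minimal sketch of this balancing step, assuming each dataset lives in a pandas DataFrame with a boolean is_seed column (the function name, column name, and DataFrame layout are assumptions, not from the slides):

    import pandas as pd

    def balance_dataset(df: pd.DataFrame, label_col: str = "is_seed",
                        random_state: int = 42) -> pd.DataFrame:
        """Undersample the majority class so seeds and non-seeds are equal."""
        seeds = df[df[label_col]]
        non_seeds = df[~df[label_col]]
        n = min(len(seeds), len(non_seeds))  # size of the minority class
        balanced = pd.concat([seeds.sample(n=n, random_state=random_state),
                              non_seeds.sample(n=n, random_state=random_state)])
        # e.g. Boards.ie: min(398,508, 81,273) = 81,273 -> 162,546 instances
        return balanced.sample(frac=1, random_state=random_state)  # shuffle rows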
Features

Content Features
§ Post Length: number of words in the post
§ Complexity: the cumulative entropy of the terms in a post
§ Readability: the Gunning Fog index, which gauges how hard the post is for readers to parse, and the LIX readability metric, which measures word complexity from letter counts
§ Referral Count: number of URLs in the post
§ Informativeness: TF-IDF of the post
§ Polarity: average sentiment polarity of the post (using SentiWordNet)

Social Features
§ In-degree: number of incoming social connections (explicit or implicit)
§ Out-degree: number of outgoing social connections (explicit or implicit)
§ Post Count: number of posts made in the previous 6 months
§ User Age: length of membership in the community, in days
§ Post Rate: number of posts by the user per day

(A computation sketch for some of the content features follows.)
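As a rough illustration, several of the content features might be computed as below. This is a sketch, not the authors' implementation: the tokenization, sentence splitting, and syllable heuristic are simplifying assumptions, and the Shannon entropy here is one reading of "cumulative entropy". Informativeness and Polarity would additionally need a corpus-wide TF-IDF model and SentiWordNet.

    import math
    import re
    from collections import Counter

    URL_RE = re.compile(r"https?://\S+")

    def syllables(word: str) -> int:
        # crude heuristic: count groups of consecutive vowels
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def content_features(post: str) -> dict:
        words = re.findall(r"[A-Za-z']+", post)
        n_words = max(1, len(words))
        n_sents = max(1, len([s for s in re.split(r"[.!?]+", post) if s.strip()]))
        # Complexity: Shannon entropy of the post's term distribution
        counts = Counter(w.lower() for w in words)
        entropy = -sum((c / n_words) * math.log2(c / n_words)
                       for c in counts.values())
        long_words = sum(1 for w in words if len(w) > 6)            # LIX: > 6 letters
        complex_words = sum(1 for w in words if syllables(w) >= 3)  # Fog: 3+ syllables
        return {
            "post_length": len(words),
            "complexity": entropy,
            "readability_lix": n_words / n_sents + 100.0 * long_words / n_words,
            "readability_fog": 0.4 * (n_words / n_sents
                                      + 100.0 * complex_words / n_words),
            "referral_count": len(URL_RE.findall(post)),
        }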
Classification of Posts
§ Binary classification task: seed posts vs. non-seed posts
§ Logistic regression model trained with social, content, and combined feature sets
§ 80/20 training/testing split (see the sketch below)
§ Compare results across platforms, to see how a change in each feature is associated with the likelihood of engagement
§ Compare engagement dynamics on our platforms against the literature
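A minimal sketch of this setup; scikit-learn and macro-averaging over the two classes are assumptions, as the slides only specify logistic regression with an 80/20 split. Each of the seven balanced datasets would be passed through this once per feature set (social, content, social+content).

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split

    def evaluate_feature_set(X, y, random_state=42):
        """Train on 80% of the balanced data; report P/R/F1 on the held-out 20%."""
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=random_state)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        p, r, f1, _ = precision_recall_fscore_support(
            y_te, model.predict(X_te), average="macro")
        return p, r, f1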
Classification Results

Dataset                             Features        P      R      F1
Boards.ie                           Social          0.592  0.591  0.591
                                    Content         0.664  0.660  0.658
                                    Social+Content  0.670  0.666  0.665
Twitter (Random)                    Social          0.561  0.561  0.560
                                    Content         0.612  0.612  0.611
                                    Social+Content  0.628  0.628  0.628
Twitter (Haiti Earthquake)          Social          0.968  0.966  0.966
                                    Content         0.752  0.747  0.747
                                    Social+Content  0.974  0.973  0.973
Twitter (Obama State of the Union)  Social          0.542  0.540  0.539
                                    Content         0.650  0.642  0.639
                                    Social+Content  0.656  0.649  0.646
SAP                                 Social          0.650  0.631  0.628
                                    Content         0.575  0.541  0.521
                                    Social+Content  0.652  0.632  0.629
Server Fault                        Social          0.528  0.380  0.319
                                    Content         0.626  0.380  0.275
                                    Social+Content  0.568  0.407  0.359
Facebook                            Social          0.635  0.632  0.632
                                    Content         0.641  0.641  0.641
                                    Social+Content  0.660  0.660  0.660

Performance of the logistic regression classifier trained over each feature set and applied to the held-out test set.
Effect of features on engagement

[Figure: bar charts of the logistic regression coefficients (β) for the 12 features (In-degree, Out-degree, Post Count, Age, Post Rate, Post Length, Referrals Count, Polarity, Complexity, Readability, Readability Fog, Informativeness), one panel per dataset: Boards.ie, Twitter Random, Twitter Haiti, Twitter Union, Server Fault, SAP, Facebook.]
Significance of regression coefficients

[Figure: p-values of the regression coefficients for the same 12 features, one panel per dataset: Boards.ie, Twitter Random, Twitter Haiti, Twitter Union, Server Fault, SAP, Facebook.]
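Both figures' quantities could be reproduced roughly as below, fitting the logistic model and reading off each feature's β and p-value. The use of statsmodels is an assumption (the slides do not name a toolkit); X is the 12-feature matrix and y the seed/non-seed labels.

    import statsmodels.api as sm

    def coefficients_and_pvalues(X, y):
        """Fit a logistic regression; return per-feature betas and their p-values."""
        X_const = sm.add_constant(X)  # add the intercept term
        result = sm.Logit(y, X_const).fit(disp=0)
        return result.params, result.pvalues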
Comparison to literature
§ How does the performance of our features compare to other studies on different datasets and platforms?

[Figure: feature-by-feature comparison with the literature, coding each feature's impact as positive or negative and as a match or mismatch with our findings.]
Summary
§ We tested the consistency and applicability of engagement patterns across multiple platforms
§ Used 12 social/content features that map to 5 platforms
§ Studied the impact of those features on engagement across these platforms
§ Compared the impact of our features against relevant studies in the literature
§ Showed that the same features can play different roles on different platforms, or in different non-random datasets
So What’s Next?
§ LOTS!
§ Apply same study to more datasets from the same platforms, and from other platforms
§ Expand from replies to other engagement indicators
§ Improve classification of seeds/non-seeds with more common features
§ Further study on impact of topics and non-randomness on engagement dynamics
§ Take user type into account, e.g. posts from news agencies are more likely to be retweeted than replied to
Questions!
1. Why those specific datasets and platforms?
2. What about platform-specific features?
3. Could we ever get a full understanding of these dynamics across all social platforms?
4. Could these findings be used to increase engagement?
5. Who’s right/wrong when the same feature appears to have conflicting impact on the same platform?
6. Couldn’t it be the case that the same feature is used differently on different platforms?
7. How could we study event-specific engagement dynamics?
@halani
harith-alani
http://people.kmi.open.ac.uk/harith/
ACM Web Science Conference (WebSci) 2014, middle of nowhere!