
Individual Models:

• A CSM is built for every user (e.g., User 1, User 2, User 3)

References

Aggregated Models:

• The data of all users at a level (e.g., User 1, User 2, User 3) are aggregated together

• A CSM is built for each level

Detecting and Combating

Social Media Influencers

Agent Based Model

Data

Anthony Weishampel

Department of Statistics

[email protected]

William Rand

Poole College of Management

[email protected]

This project has the following three objectives:

• Develop methods to assist in the identification of influencers and bots with malicious intent by analyzing each user’s Twitter behavior.

• Two approaches: Causal State Models (CSM) & Functional Data Analysis (FDA)

• Explore ways to prevent or reduce the power of negative influence campaigns.

• Construct the framework for a platform that uses Agent Based Models to simulate and test theories of influence and influence combat.

Data Sources & User Types:

• Labeled data were provided on various types of users (Cresci, Stefano, et al. “The paradigm-shift of social spambots.” Proceedings of the 26th International Conference on World Wide Web Companion, 2017)

• Genuine users vs. e-commerce bots (advertisements for mobile apps and Amazon products)

• Data on spambots and political influencers were also collected and analyzed

• Each user’s Twitter timeline was required (tweet history of up to 3,200 tweets)

• Social media provides a popular platform for marketers and organizations to diffuse content.

• The fear that malicious influencers (e.g., bots, trolls, and extremists) could spread falsehoods or attempt to radicalize the general public on social media has been validated in recent years.

• Some of this influence is perpetrated not directly by humans but by social media bots. These are artificial agents whose behavior is controlled by predetermined algorithms, often in an attempt to imitate authentic users and to have a particular effect on public discourse on social media.

• These automated users have impacted modern political campaigns and have attacked brands, including Nestle, Harley-Davidson, and AMD.

Alessandro Bessi and Emilio Ferrara. Social bots distort the 2016 U.S. Presidential election online discussion. First Monday, 21(11). 2016.

David Darmon, Jared Sylvester, Michelle Girvan, and William Rand. Predictability of user behavior in social media: Bottom-up v. top-down modeling. In Social Computing (SocialCom), 2013 International Conference on, pages 102-107. IEEE, 2013.

Clayton Allen Davis, et al. BotOrNot: A system to evaluate social bots. Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.

Jeff Goldsmith, Vadim Zipunnikov, and Jennifer Schrack. Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis. Biometrics, 71: 344-353. 2015.

Eric Johnson. How social media bots could tank your stock price. Recode. https://www.recode.net/2018/7/16/17574512/

S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. WWW '17: Proceedings of the 26th International Conference on World Wide Web Companion, 963-972, 2017.

Figure 3: Example of how the CSMs are made for a level of an Aggregated model.

Time Series Data Structure:

• Data were converted into a time series structure with 30-minute intervals

Binary Time Series Data:

0 – User doesn’t tweet

1 – User tweets

Figure 1: Binary Time Series Data Structure
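The conversion to a binary series can be sketched as follows. This is a minimal illustration, not the project's code: the function name `binarize_timeline`, its parameters, and the example timestamps are all hypothetical, and it assumes a bin is marked 1 when it contains at least one tweet.

```python
from datetime import datetime, timedelta


def binarize_timeline(tweet_times, start, end, bin_minutes=30):
    """Return one 0/1 flag per bin in [start, end): 1 if any tweet falls in the bin."""
    bin_width = timedelta(minutes=bin_minutes)
    n_bins = int((end - start) / bin_width)  # only whole bins are kept
    series = [0] * n_bins
    for t in tweet_times:
        if start <= t < end:
            idx = int((t - start) / bin_width)
            if idx < n_bins:  # guard against a trailing partial bin
                series[idx] = 1
    return series


# Hypothetical example: three tweets in a two-hour window (four 30-minute bins).
window_start = datetime(2018, 1, 1, 0, 0)
window_end = window_start + timedelta(hours=2)
tweets = [
    datetime(2018, 1, 1, 0, 10),
    datetime(2018, 1, 1, 0, 20),
    datetime(2018, 1, 1, 1, 45),
]
series = binarize_timeline(tweets, window_start, window_end)
print(series)
```

With 30-minute bins, one full day of activity becomes a vector of 48 zeros and ones.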

Objective

Background

• Widely applicable to high-resolution individual-level data

• Harada et al. (2015) showed that Causal State Models can be used to forecast a user’s activity

• CSMs are found by the Causal State Splitting Reconstruction (CSSR) algorithm (Shalizi and Shalizi, 2004)

• CSMs are minimally complex and maximally predictive for social media (Darmon et al., 2013)

• Assume the time series data are generated by a conditionally stationary stochastic process

Classification:

• Distance Metric:

• Based on the probability of a sequence such as 0000 occurring in model a

• Defines the distance between two models a and b
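The exact distance formula is not reproduced in the text, so the sketch below is only one plausible reading. It assumes the distance is the sum of absolute differences between the probabilities of all 16 binary words of length 4, and it uses empirical word frequencies from a binary series as a stand-in for the probabilities a fitted CSM would imply. The function names are hypothetical.

```python
from itertools import product


def word_probabilities(series, length=4):
    """Empirical probability of each binary word (e.g. '0000') in a 0/1 series."""
    words = ["".join(w) for w in product("01", repeat=length)]
    counts = dict.fromkeys(words, 0)
    text = "".join(str(b) for b in series)
    total = len(text) - length + 1  # number of sliding windows
    if total <= 0:
        raise ValueError("series is shorter than the word length")
    for i in range(total):
        counts[text[i:i + length]] += 1
    return {w: c / total for w, c in counts.items()}


def model_distance(p_a, p_b):
    """Assumed metric: sum of absolute differences in word probabilities."""
    return sum(abs(p_a[w] - p_b[w]) for w in p_a)


# Hypothetical example: a silent user versus a user with one burst of activity.
quiet = word_probabilities([0, 0, 0, 0, 0, 0, 0, 0])
bursty = word_probabilities([1, 1, 1, 1, 0, 0, 0, 0])
print(model_distance(quiet, bursty))
```

Under this assumed metric, identical models are at distance 0 and models with no words in common are at the maximum distance of 2.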

User Type          Number of Users
Genuine Users      3,474
E-commerce Bots    464

Figure 2: Sample daily time series for 50 genuine users and 50 e-commerce bots (panels: Sample of Genuine Users, Sample of Bots)

Figure 4: Example of how the CSMs are made for the Individual Models

• Group Distance Metric: Create aggregated models for every category, then classify the user by the closest CSM

• Individual k-NN: Build a CSM for every user in the testing set, find the k closest users’ CSMs, and classify the user by their majority label
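Both classification rules can be sketched in a few lines. The sketch assumes each CSM has already been summarized as a dictionary mapping binary words to probabilities, and it assumes an L1 distance between those dictionaries. The function names and the toy numbers are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def l1_distance(p, q):
    """Assumed distance: sum of absolute differences over all words."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify_by_group(user_model, group_models):
    """Group Distance Metric: return the label of the closest aggregated model."""
    return min(group_models, key=lambda label: l1_distance(user_model, group_models[label]))


def classify_by_knn(user_model, labeled_models, k=3):
    """Individual k-NN: majority label among the k closest individual models."""
    neighbors = sorted(labeled_models, key=lambda pair: l1_distance(user_model, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy word-probability summaries (made-up values for illustration only).
group_models = {
    "genuine": {"0000": 0.9, "1111": 0.1},
    "bot": {"0000": 0.4, "1111": 0.6},
}
labeled_models = [
    ({"0000": 0.95, "1111": 0.05}, "genuine"),
    ({"0000": 0.85, "1111": 0.15}, "genuine"),
    ({"0000": 0.45, "1111": 0.55}, "bot"),
    ({"0000": 0.35, "1111": 0.65}, "bot"),
    ({"0000": 0.55, "1111": 0.45}, "bot"),
]
new_user = {"0000": 0.5, "1111": 0.5}
print(classify_by_group(new_user, group_models))
print(classify_by_knn(new_user, labeled_models, k=3))
```

The group rule needs only one model per category, while the k-NN rule compares the new user against every labeled individual model.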

Bot Combat

• Identified 72 active bots from the dataset

• Collected information on the American users who have interacted with these bots.

• 307 users have interacted with the bots

• We reached out to some of these users, informing them of the bot interaction.

• It is currently too early to determine whether reaching out and informing a user about the bot alters the user’s interaction with the bots.

CSM: Results

Functional Data Analysis

• Functional Principal Component Analysis is used to extract the functions and scores for various types of users

• Calculate scores for new users & classify each user based on scores from the training set

• $\mu_{ij}(t)$ is the behavior for user i, on day j, at time t.

• Can only work with users who have a time zone listed
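The score-then-classify idea can be sketched with ordinary PCA on discretized daily curves. This is a simplification under stated assumptions: it is not the generalized multilevel method of Goldsmith et al., the nearest-centroid rule is an assumed classifier, the function names are hypothetical, and the curves are synthetic.

```python
import numpy as np


def fit_fpca(curves, n_components=2):
    """Estimate the mean curve and leading components from (n_curves, n_times) data."""
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    # Rows of vt are orthonormal directions of greatest variation.
    _, _, vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
    return mean_curve, vt[:n_components]


def fpca_scores(curves, mean_curve, components):
    """Project centered curves onto the components to obtain scores."""
    return (np.asarray(curves, dtype=float) - mean_curve) @ components.T


def classify_by_scores(new_scores, train_scores, train_labels):
    """Assumed rule: assign each new curve to the nearest class centroid in score space."""
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    centroids = np.stack([train_scores[train_labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(new_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic daily curves on 48 half-hour bins (not real Twitter data).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 48)
genuine = np.sin(np.pi * t) + 0.1 * rng.normal(size=(20, 48))  # daytime peak
bots = 0.5 + 0.1 * rng.normal(size=(20, 48))                   # flat activity
train = np.vstack([genuine, bots])
labels = ["genuine"] * 20 + ["bot"] * 20

mean_curve, components = fit_fpca(train, n_components=2)
train_scores = fpca_scores(train, mean_curve, components)
new_curves = np.vstack([np.sin(np.pi * t), np.full(48, 0.5)])
predicted = classify_by_scores(fpca_scores(new_curves, mean_curve, components), train_scores, labels)
print(predicted)
```

New users are scored against the mean curve and components estimated from the training set, so no refitting is needed at classification time.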

Figure 6: Number of States in Individual CSMs (panels: Bots, Genuine Users)

Classification Results    Precision    Recall
Aggregated Models         0.349        1.0
Individual Models         0.982        0.947
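For reference, precision and recall can be computed as in the sketch below. It assumes bots are the positive class, which the table does not state, and the labels in the example are made up to show how a classifier can reach perfect recall while its precision stays low.

```python
def precision_recall(true_labels, predicted_labels, positive="bot"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: every bot is caught, but one genuine user is flagged as a bot.
truth = ["bot", "bot", "genuine", "genuine", "genuine"]
guess = ["bot", "bot", "bot", "genuine", "genuine"]
precision, recall = precision_recall(truth, guess)
print(precision, recall)
```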

Figure 5: Aggregated CSMs for the two types of users

Detection Method: Causal State Models

• From the aggregate CSMs it is clear that the bots’ behavior requires more states and is consequently more complicated than that of the typical genuine user.

• Similar conclusions can be drawn from the individual CSMs

Results & Discussion

Aggregate Model

• Individual Models proved to be much better at classification than Aggregated Models

• The FDA model requires high computation time but proved to be the most accurate classification method. FDA results were excluded because the model has multiple restrictions (for example, it only works with users who have a time zone listed)

• Results from the bot combat will be used to determine how agents in the Agent Based Model will react.

• Agents – Various Twitter Users

• Actions – Follow/Unfollow a user, Tweet, Retweet, or no Tweet

• Both the FDA and the CSMs can be used to model the agents’ behavior

• The framework provided in the project can create an ABM platform to explore both the effects of influence campaigns on social media and the results of counter-narrative campaigns
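A skeleton of such an agent-based model might look like the sketch below. It is an illustration only: the `Agent` class, the `simulate` function, and the fixed action probabilities are hypothetical, and the probabilities stand in for a behavior model that would be fitted with a CSM or FDA.

```python
import random

ACTIONS = ("tweet", "retweet", "no_tweet")


class Agent:
    """A Twitter user that picks one action per time step."""

    def __init__(self, name, action_probs):
        self.name = name
        self.action_probs = action_probs  # placeholder for a fitted behavior model
        self.following = set()

    def follow(self, other):
        self.following.add(other.name)

    def unfollow(self, other):
        self.following.discard(other.name)

    def step(self, rng):
        actions = list(self.action_probs)
        weights = [self.action_probs[a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]


def simulate(agents, n_steps, seed=0):
    """Run all agents for n_steps and record the action each one takes."""
    rng = random.Random(seed)
    history = {agent.name: [] for agent in agents}
    for _ in range(n_steps):
        for agent in agents:
            history[agent.name].append(agent.step(rng))
    return history


# Hypothetical agents: a mostly silent genuine user and a bot that always tweets.
genuine = Agent("genuine_user", {"tweet": 0.1, "retweet": 0.1, "no_tweet": 0.8})
bot = Agent("bot", {"tweet": 1.0, "retweet": 0.0, "no_tweet": 0.0})
genuine.follow(bot)
history = simulate([genuine, bot], n_steps=48)
genuine.unfollow(bot)
```

Counter-narrative interventions could then be modeled as rules that change an agent's follow set or action probabilities during the run.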
